Cisco Systems Inc.

07/18/2019 | News release | Distributed by Public on 07/18/2019 10:17

Using Automation for a DataCenter Network Core Migration

This blog is authored by Ioannis Theodoridis (Yannis for friends and workmates). I am a network and systems engineer specializing in fault, performance, and service level monitoring for the Network and IT services. I work in the Network Department, in the IT Division of the Bank of Greece, as a member of the team of NOC engineers that design, deploy and support operations for the WAN, the LAN and the DC Network.
The text posted in this article is not in any way representative or binding for my organization. Some details are purposefully omitted for confidentiality reasons.

My relationship with network automation and programmability

The idea of network automation and programmability has been with me for a long time, long before the term had any meaning for me. In the past I have used TCL/Expect, and then Perl coupled with regular expressions to manipulate network equipment and get performance metrics through Telnet and SSH. Although Perl is a powerful scripting language, it required a lot of work to build things from scratch, as there was no community for this particular application of the language. I have been learning about Network Programmability for some time now, using various sources:

  • I have attended Cisco Live DevNet sessions and labs (Cisco Live Berlin & Barcelona) and two different DevNet Express events in Athens (2018 & 2019).
  • I have followed Kirk Byers's email course on Python for Network Engineers and enrolled in several online courses for Python and other development tools.
  • I have followed both seasons of NetDevOps Live series of webinars from the Cisco DevNet Team.
  • I have joined several communities and forums for the best-known Open Source Network Automation Frameworks, and I follow automation experts, teams and projects on Twitter and other social media.

[Image: Getting hands-on at an expert-led DevNet Express event]

The DC Core Net upgrade project - every change is an opportunity

Project requirements and project launch
A few years back, my workmate K.D. and I started working on a big network upgrade project on behalf of our Department, as the team of network engineers responsible for seeing it through. The project demanded most of our attention, starting from the research stage and a feasibility study, leading to writing up specification documents for an RFQ, choosing a partner, and finally to the actual project design and implementation phase, for which I was also assigned the role of Project Manager.

It consisted of switching the DC Network Core of our Datacenters to Nexus DC Switches, separating the User Access Network Layer from the DC Server Access and Core Layers. For 13 years, the user and server networks had been mixed together: the same pairs of legacy Cisco Catalyst 6k switches were operating as the core switches for everything.

For the change to be successful, we needed to plan first for a series of large-scale infrastructure upgrades and modifications in Power, Cabling and Rack space in multiple buildings. With those complete, we would still need to:

  • seamlessly integrate the new Core Switches into the existing network and migrate to the new design (upgrading all the old Core Switches was needed first)
  • leverage VDC technology to isolate critical networks in several separate L2 fault domains
  • introduce OTV and MACsec encryption across DC sites
  • provide a new strong, scalable and resilient DC Core
  • minimize impact to bank services.

Learning in parallel
By the end of January 2018, we were deep in studying the relevant technologies and products available. My workmate and I attended Cisco Live Barcelona, gathering information that could be of use to the project while keeping up with current developments, including all we could find out about DevNet. The DevNet Express events were already going strong all over the world at the time. We were finally able to attend one, the first in Athens, Greece, in June 2018. Attending such a live event can be transformative: its structure helps you retrace your steps, find your path, and start exploring. In the beginning, we didn't think there would be enough time to practice and apply our new skills to the project, so we didn't plan for it to happen. However, as time went on, such occasions presented themselves almost naturally.

After DevNet Express Athens, I took the time to find out a lot more about Network Automation Frameworks: I took an email course from Kirk Byers about using the Netmiko Python library, started using Ansible, enrolled in Python courses, and tracked down any possible source of information on Twitter, YouTube and LinkedIn. My workmate had also been learning Python and studying a lot of relevant material available on Cisco DevNet. We often exchanged knowledge, working together, discussing possibilities and sharing ideas, each of us taking on roles, pushing for change and at the same time keeping our feet firmly on the ground, trying to find practical uses for our new skills.

Use Cases - network automation tools used

By November 2018, we were in the project implementation phase, moving things around, destroying and rebuilding infrastructure to get to our goal. That also involved invoking many Disaster Recovery processes. Instead of applying those standardized DR processes by hand, I started writing small Netmiko-based Python automation scripts for them. It was easy and fun, and it worked. Besides, what could go wrong? We were already performing Disaster Recovery, right? After that, coming up with practical ideas for Network Automation in our workplace started getting easier for us. I prepared a few more scripts for small everyday operational tasks (some examples here and here).
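A DR step of this kind can be sketched with Netmiko roughly as follows. This is a minimal illustration under stated assumptions, not one of the scripts mentioned above: the device details, the interface names and the helper `apply_dr_step` are hypothetical. Only `ConnectHandler`, `send_config_set` and `save_config` are actual Netmiko calls.

```python
# Hypothetical DR step: shut the primary uplink and bring up the backup one.
# The interface names are placeholders, not taken from the real environment.
DR_COMMANDS = [
    "interface GigabitEthernet1/0/1",
    "shutdown",
    "interface GigabitEthernet1/0/2",
    "no shutdown",
]


def apply_dr_step(session, commands):
    """Push one batch of config lines, save the config, return the device output.

    `session` is anything exposing Netmiko's send_config_set() and
    save_config() methods, so the helper can also be tried with a stub.
    """
    output = session.send_config_set(commands)
    output += session.save_config()
    return output


def main():
    # Imported here so apply_dr_step() can be exercised without Netmiko installed.
    from netmiko import ConnectHandler

    device = {  # placeholder values, not a real device
        "device_type": "cisco_ios",
        "host": "192.0.2.10",
        "username": "admin",
        "password": "change-me",
    }
    # The connection is closed automatically when the block exits.
    with ConnectHandler(**device) as conn:
        print(apply_dr_step(conn, DR_COMMANDS))
```

Calling `main()` would run the step against the placeholder device. Keeping the command list separate from the connection code makes it easier to review a DR step before it is pushed.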

We first migrated to the new DC Core and then reached the point where we were finally ready to migrate the DC servers from the old switches to the new DC access layer, so we wanted to find out exactly where every server was connected. That would allow us to:

  • verify steps and coordinate the migration
  • make sure no server is left behind when the migration is over
  • perform clean up later

It was a 15-year-old environment, with layers added as time passed, so writing a reusable script for it proved a little more difficult than in other cases. It involved reading switch addresses from an Excel file, integrating TextFSM and NTC-templates to turn command output into structured data, gathering ARP information from the core switches, cross-referencing it with MAC address table information from each access switch, and finally exporting the results to Excel files. Automating this process allowed us to run it repeatedly, offering a practical way to check our progress between migration windows scheduled with teams of system admins and helping ensure the success of the project. You can find the code and a little more detailed explanation here.
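The cross-referencing step can be illustrated with a short sketch. It is not the script linked above: it assumes the command output has already been parsed into lists of dictionaries, the key names (`ip`, `mac`, `vlan`, `port`) and the sample rows are made up, the helper names are hypothetical, and CSV from the standard library stands in for the Excel export.

```python
import csv
import io

# Rows shaped like parsed ARP output from a core switch (assumed key names).
arp_rows = [
    {"ip": "10.0.0.11", "mac": "0011.2233.4455"},
    {"ip": "10.0.0.12", "mac": "0011.2233.6677"},
]

# Rows shaped like parsed MAC address table output, one list per access switch.
mac_rows_by_switch = {
    "access-sw1": [
        {"mac": "0011.2233.4455", "vlan": "10", "port": "Gi1/0/5"},
        {"mac": "0011.2233.6677", "vlan": "10", "port": "Po1"},  # seen via uplink
    ],
    "access-sw2": [
        {"mac": "0011.2233.6677", "vlan": "10", "port": "Gi1/0/7"},
    ],
}


def locate_hosts(arp, mac_tables, uplink_prefixes=("Po",)):
    """Return one row per host, giving the edge port where its MAC is learned."""
    ip_by_mac = {row["mac"].lower(): row["ip"] for row in arp}
    found = []
    for switch, rows in mac_tables.items():
        for row in rows:
            if row["port"].startswith(tuple(uplink_prefixes)):
                continue  # learned over an uplink, so the host lives elsewhere
            ip = ip_by_mac.get(row["mac"].lower())
            if ip is not None:
                found.append({"ip": ip, "mac": row["mac"], "switch": switch,
                              "vlan": row["vlan"], "port": row["port"]})
    return sorted(found, key=lambda r: r["ip"])


def to_csv(rows):
    """Render the result rows as CSV text (standing in for the Excel export)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["ip", "mac", "switch", "vlan", "port"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


hosts = locate_hosts(arp_rows, mac_rows_by_switch)
print(to_csv(hosts))
```

Skipping MACs learned on uplink ports is one simple way to avoid reporting the same server on every switch along the path. In an environment with many layers, the list of uplink prefixes would need to be adapted per switch.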

After the successful migration of both the DC Core and the Server Access layer to the new design, we had to perform acceptance tests for the whole environment. I had become aware of pyATS, the Python-based automated testing framework created by Cisco and released publicly. As I had very little time to go through the documentation and get to a point where I could set up something useful, I turned to Twitter for help. Cisco DevNet's Stuart Clark responded by calling the Cisco pyATS team to the rescue. They were kind enough to provide me with details on how to use the Genie CLI to produce results with minimal effort. I set up my DC testbed very quickly and successfully gathered data between the tests by running Genie Learn and Parse commands, and in the end compared results by running Genie Diff against the result sets.

Using pyATS and Genie, we were able to draw very interesting and useful conclusions with ease and speed. For example, we realized that the Mainframe routing process switched the traffic from one Nexus peer to the other during the tests, which led to further investigation. We also easily verified that all services were successfully taken over by active links and equipment during the tests, and that all switch processes maintained the necessary information at all times.
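The idea behind comparing 'before' and 'after' snapshots can be shown without any network access. The sketch below is not Genie's implementation and does not use the Genie libraries. It assumes each snapshot is a nested dictionary, the function `diff_state` is hypothetical, and the sample data is made up to echo the kind of next-hop change described above.

```python
MISSING = "<missing>"  # marker for a key present in only one snapshot


def diff_state(before, after, path=""):
    """Yield (path, old, new) for every leaf that differs between two nested dicts."""
    for key in sorted(set(before) | set(after)):
        here = f"{path} > {key}" if path else str(key)
        old = before.get(key, MISSING)
        new = after.get(key, MISSING)
        if isinstance(old, dict) and isinstance(new, dict):
            # Both sides have a subtree here, so compare it leaf by leaf.
            yield from diff_state(old, new, here)
        elif old != new:
            yield here, old, new


# Made-up snapshots taken before and after a failover test.
before = {
    "routes": {"10.1.0.0/16": {"next_hop": "nexus-a", "active": True}},
    "interfaces": {"Eth1/1": {"oper_status": "up"}},
}
after = {
    "routes": {"10.1.0.0/16": {"next_hop": "nexus-b", "active": True}},
    "interfaces": {"Eth1/1": {"oper_status": "up"}},
}

changes = list(diff_state(before, after))
for where, old, new in changes:
    print(f"{where}: {old} -> {new}")
```

Only the leaves that changed are reported, which is what makes a snapshot comparison practical on large state trees: unchanged interfaces and routes stay out of the way.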

Project Conclusion - Current Status

The DC Network Core upgrade project was successfully concluded by the end of May 2019. We have learned a lot from our automation attempts and we are still learning: we pick smaller projects on a case-by-case basis and invest time cautiously when choosing new learning paths, as our time is valuable. The pyATS & Genie suite remains ready for use in the DC environment every time we need to gather status data 'before' and 'after' big changes.

We feel confident that our Python scripting skills give us the ability to fill in gaps in our monitoring environment, moving data directly in and out of the network equipment or through management software APIs. We are also considering writing code for automated network recovery and network traffic engineering. Right now, we are queuing our requirements and lining up the small development projects to cover them, making sure we are not neglecting our other duties and separate projects.

Future plans - Practical Approach

I have since prepared more scripts, aimed at saving time or standardizing tasks, and tried additional Network Automation tools, such as Nornir, for a use case described on DevNet Code Exchange. In the months to come I would also like to go deeper with pyATS, using its library framework within Python scripts, so that a network operational status may be turned into a monitored operational state variable and produce relevant alerts through a separate alerting tool like Nagios when things go wrong.
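As a rough sketch of that last idea, a check script could map an observed state to the exit codes that Nagios plugins conventionally use (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Everything else here is an assumption: the function names are hypothetical, the thresholds are arbitrary, and the observed neighbor set would in practice come from parsed device output rather than being passed in by hand.

```python
import sys

# Nagios plugin convention: the process exit code carries the state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_neighbors(expected, observed):
    """Map expected vs. observed neighbor sets to a (code, message) pair."""
    if observed is None:
        return UNKNOWN, "UNKNOWN - could not collect neighbor state"
    missing = sorted(set(expected) - set(observed))
    if not missing:
        return OK, f"OK - all {len(expected)} expected neighbors present"
    if len(missing) < len(expected):
        return WARNING, "WARNING - missing neighbors: " + ", ".join(missing)
    return CRITICAL, "CRITICAL - no expected neighbor is present"


def run_check(expected, observed):
    """Print the one-line status and exit with the matching code."""
    code, message = check_neighbors(expected, observed)
    print(message)
    sys.exit(code)
```

For example, `run_check({"nexus-a", "nexus-b"}, {"nexus-a"})` would print a WARNING line and exit with code 1, which the alerting tool can then act on.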

My work partner, who is proficient with Network Management tools such as HP NNMi and Cisco Prime Infrastructure, has been focusing on REST APIs and on extracting information from our Management Tools. Working together, setting up a database model for monitoring our network equipment, or even creating a complete self-service CI monitoring platform, seems possible in the long run.

With Network Programmability and Automation, our team has expanded its collaboration onto new ground. However, our purpose is not to automate for the sake of automation. Learning to solve a problem with automation does not always mean you must actually use it. It gives you an excuse to practice, learn more and go deeper, and it provides you with an alternative. Deciding to use that alternative in production is up to you, with different factors coming into play, like validation, scalability and security.

Advice for newcomers

If you are new to all this, or wondering where to start and whether this is for you, don't trouble yourself any longer. Start anywhere; try automating anything, no matter how small or insignificant it may seem at first. You might still be looking for use cases that would justify using automation. No need for that. Those will come later, as you get accustomed to the tools and techniques that are available to you.

  • Start getting your feet wet using one of the DevNet Sandboxes, or in a small lab at work/home, or use a network emulator as a testbed (not sure if I can mention any of them here, but you can easily find out which they are).
  • For learning, you can start with DevNet learning tracks or Kirk Byers's course for Netmiko, or find an online course for Python/Ansible/Network automation on one of several online learning platforms.
  • Watch both seasons of NetDevOps Live and stay alert for anything new that comes up from the DevNet Team by following them on Twitter or subscribing by email. Find Dmitry Figol's streams on Network Programmability on his YouTube channel.
  • Join forums or Slack channels for the Open Source projects (Netmiko, NAPALM, Nornir). Check the docs and sites of all the known automation frameworks (the above plus pyATS, Ansible, Salt, Puppet, etc.).
  • Get to know NETCONF and RESTCONF, as those will be the methods of choice for the newer platforms. Learn how to interact with software REST APIs using Postman and Python.
  • Find mentors, join communities like DevNet.

Some links to get you started:

https://youtu.be/LxiBHPMves8

Thank you for reading this post. Keep following Cisco DevNet's developer blog. Best of luck in your adventures!

Ioannis Theodoridis | Twitter: @mythryll

Author Bio: For the past 15 years I have worked in the Network Department, in the IT Division of the Bank of Greece, as a member of the team of NOC engineers that design, deploy and support operations for the WAN, the LAN and the DC Network, operating Cisco Systems network equipment and software. Before that, I worked for an integrator in Greece (Space Hellas) and a telecom operator in France (Cegetel/SFR). I started my studies in Mechanical Engineering (National Technical University of Athens - NTUA) and moved to Network Conception and Architecture (Mastère Spécialisé en Conception & Architecture des Réseaux, Ecole Nationale Supérieure des Télécommunications - Telecom Paris, France).
