Capgemini SE

04/15/2024 | News release | Distributed by Public on 04/16/2024 10:31

Why and how to monitor a SONiC NOS network

A complete approach to monitoring a network based on SONiC NOS, to enable analytics and automation

Thovi Keerthi Kumar

Apr 15, 2024

This blog examines why monitoring networks built on SONiC NOS (Software for Open Networking in the Cloud) is vital to meet growing user expectations for reliability and speed. We discuss the move towards autonomous networks that adapt to changing traffic and faults, reducing disruptions.

According to the market intelligence company IDC, worldwide spending on public cloud services is forecast to reach $1.35 trillion in 2027.

As the cloud becomes more ubiquitous and capable, consumers expect an ever-improving quality of service from the services they use. Whether it's accessing a website, streaming media, or gaming, these consumers want it to be faster and more reliable. Network operators are under pressure to ensure that their infrastructure can deliver on these expectations.

This 'always-on, always-connected, always-available' expectation drives the need for autonomous networks which can respond dynamically to changing traffic demands, degradation or faults, so that users are impacted minimally by changes.

One critical aspect of an autonomous network is the ability to monitor network elements and traffic flows at various points in that network. Datacenter networks, based on the well-established Clos or Leaf-Spine topologies, along with technologies like MC-LAG (Multi-Chassis Link Aggregation), have an extremely high level of redundancy and, as a result, high fault tolerance. However, enterprise and service provider access/aggregation networks use different topologies and may not have the luxury of building such a high level of redundancy into the topology itself. Indeed, enterprises that operate less fault-tolerant infrastructures require even closer monitoring.

A variety of monitoring tools are currently available and widely deployed in networks. However, many of these take a performance-monitoring approach, rather than a telemetry approach. This matters mainly from an investment perspective - large datacenters can often afford expensive monitoring systems, some of which are bespoke to the equipment they use. In contrast, smaller enterprises usually can't. Most importantly for the focus of this blog, these bespoke tools cannot monitor SONiC NOS based networks.

Why is this important? SONiC is a major enabler for open networking in datacenters, which require accurate, up-to-the-second monitoring. SONiC is typically installed on a router or switch in a network, but network monitoring must be done by external software, e.g. running on a server. Existing installed tools may not readily be able to collect the necessary information from SONiC, due to compatibility issues.

To this end, we have put together a demonstration and proof of concept that shows a completely open solution to the challenge of properly monitoring your network, leveraging popular and proven open source components.

Our SONiC monitoring solution

We use a combination of sFlow-RT, Prometheus and Grafana for this. The sFlow-RT tool collects telemetry data using the established sFlow methodology. Prometheus stores the collected data in a real-time database and provides access to this data to other tools/systems. Grafana offers an administrator a rich visual view of the network traffic. We used the open-source Debian Linux distribution as the platform for these tools, and we generated the traffic data patterns using TRex, an open-source traffic generator.

The benefits of such network monitoring include: 

  • Immediate network visibility
  • Real-time insights 
  • Critical metrics on utilization and error statistics 
  • Being able to identify irregular traffic patterns 
  • The ability to respond to emerging network issues 
  • Maintaining optimal performance 

More about sFlow

SONiC-based network elements include switches and routers. We will now delve into the details of sFlow - why it is important and how it is integrated into these network elements to monitor and collect essential network traffic data. This data includes packet and byte counts, traffic patterns, and flow information.

sFlow (an abbreviation of "sampled flow") is a network monitoring technology that provides real-time insight into network traffic. It doesn't capture every single packet of data that flows through a network, but rather samples subsets of packets for analysis. sFlow collects valuable information about the source and destination of the sample packets, the type of traffic, and traffic volumes - identifying patterns, trends, and irregularities within the network's traffic. This is especially useful in troubleshooting to spot inefficiencies and problem areas in a network. 
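To make the sampling idea concrete, here is a minimal Python sketch of how totals can be estimated from a 1-in-N packet sample. The function name and the sampling rate used in the example are illustrative assumptions, not part of the demonstration described in this blog.

```python
def estimate_totals(sampled_sizes, sampling_rate):
    """Estimate total packets and bytes from a 1-in-N packet sample.

    sampled_sizes: sizes in bytes of the packets that were sampled.
    sampling_rate: N, meaning on average one packet in N is sampled.
    """
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be >= 1")
    # Each sampled packet stands in for roughly N packets on the wire.
    est_packets = len(sampled_sizes) * sampling_rate
    est_bytes = sum(sampled_sizes) * sampling_rate
    return est_packets, est_bytes


# Three sampled packets at an assumed 1-in-1000 sampling rate.
print(estimate_totals([100, 200, 300], 1000))  # (3000, 600000)
```

The estimate becomes more accurate as more samples are collected, which is why sampling suits high-speed links: the monitoring load stays small while the statistics remain meaningful.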

The SONiC NOS includes an sFlow agent. This sFlow agent, when correctly set up, samples and collects data about network traffic flowing through the device. It sends the collected information (encapsulated in sFlow datagrams) to a designated destination, typically an sFlow collector. These datagrams contain details about the sampled packets, such as source and destination addresses, ports, and other relevant information.
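As an illustration of what a collector sees first in each datagram, the following Python sketch decodes the fixed header of an sFlow version 5 datagram (version, agent address, sub-agent ID, sequence number, uptime and sample count, all in network byte order). It is a simplified sketch, not the collector used in our setup: the function name is hypothetical, and the sample records that follow the header are not decoded.

```python
import ipaddress
import struct


def parse_sflow_header(datagram: bytes) -> dict:
    """Decode the header of an sFlow v5 datagram (samples are not parsed)."""
    version, addr_type = struct.unpack_from("!II", datagram, 0)
    if version != 5:
        raise ValueError(f"unsupported sFlow version: {version}")
    # Address type 1 is IPv4 (4 bytes), 2 is IPv6 (16 bytes).
    addr_len = {1: 4, 2: 16}.get(addr_type)
    if addr_len is None:
        raise ValueError(f"unknown agent address type: {addr_type}")
    offset = 8
    agent = ipaddress.ip_address(datagram[offset:offset + addr_len])
    offset += addr_len
    sub_agent_id, sequence, uptime_ms, num_samples = struct.unpack_from(
        "!IIII", datagram, offset
    )
    return {
        "agent": str(agent),
        "sub_agent_id": sub_agent_id,
        "sequence": sequence,
        "uptime_ms": uptime_ms,
        "num_samples": num_samples,
    }


# Build a synthetic header for a hypothetical agent at 10.0.0.1 and decode it.
example = struct.pack("!II4sIIII", 5, 1, bytes([10, 0, 0, 1]), 0, 42, 1000, 3)
print(parse_sflow_header(example))
```

The sequence number lets a collector detect lost datagrams, and the agent address identifies which switch or router the samples came from.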

sFlow-RT (sFlow-RT installation) is the sFlow Collector, a tool to collect sFlow data sent out by sFlow agents embedded in devices in the network. It can also provide critical metrics, like packets-received, bytes-received, packets-transmitted, bytes-transmitted, utilization, and error statistics. A Collector receives, stores and analyzes sFlow datagrams from multiple agents across the network. It stores the data in a real-time database, providing real-time visibility into network performance, traffic patterns, and anomalies in the network. 
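A metric such as utilization is derived from the byte counters rather than measured directly. The sketch below shows one common way to compute it from two counter readings; the function name is hypothetical, and it assumes 64-bit octet counters, so a counter wrap is handled modulo 2^64.

```python
def utilization_percent(bytes_start, bytes_end, interval_s, link_speed_bps):
    """Link utilization (%) from two byte-counter readings taken interval_s apart."""
    if interval_s <= 0 or link_speed_bps <= 0:
        raise ValueError("interval_s and link_speed_bps must be positive")
    # Assumption: counters are 64-bit, so a wrap-around is corrected modulo 2**64.
    delta_bytes = (bytes_end - bytes_start) % (2 ** 64)
    bits = delta_bytes * 8
    return 100.0 * bits / (interval_s * link_speed_bps)


# 125 MB received in 10 seconds on a 1 Gbit/s link.
print(utilization_percent(0, 125_000_000, 10, 1_000_000_000))  # 10.0
```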

Prometheus (Prometheus installation) is a monitoring and alerting toolkit that collects, stores, analyzes, and visualizes time-series data, including network flow data. It stores the collected telemetry data in a real-time database and provides an elegant, user-friendly interface to read and use the collected data - both in real time and after the fact.
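Other tools can read the stored data through the Prometheus HTTP API. The Python sketch below builds an instant-query URL for the `/api/v1/query` endpoint and extracts label/value pairs from the JSON response. The server address and the metric name `sflow_ifinoctets` are assumptions for illustration only; the metric names actually exported depend on your sFlow-RT configuration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Extract (labels, value) pairs from an instant-vector API response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is [timestamp, "value-as-string"]; keep only the value.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def query(base_url, promql, timeout=5.0):
    """Run an instant query against a live Prometheus server."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))


# Hypothetical usage against a server on localhost:
# for labels, value in query("http://localhost:9090", "rate(sflow_ifinoctets[1m])"):
#     print(labels, value)
```

Grafana issues queries of the same kind when it renders a dashboard panel, so a script like this is a quick way to check that data is arriving before building visualizations.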

Grafana (Grafana installation) is a data visualizer. It transforms the data/metrics from other tools (like Prometheus) into meaningful visualizations. The inbuilt dashboards make it easy to get started, by setting up the most common parameters that network administrators are interested in. The dashboards can be customized and additional elements can be added, depending upon need. This helps users to interpret and analyze network performance data.

The detailed configurations are shown in the video. A summary of the configurations is below.

Switch/router with SONiC
· Enable sFlow
· Configure polling interval
· Add an agent-id
· Enable sFlow on an interface and map it to the added agent-id
· Configure the name and IP-address of the system on which sFlow-RT is installed

sFlow-RT/Collector
· Open configuration file sflow-rt.conf
· Configure exported details (type=Prometheus, IP-address and port number of system running Prometheus)
· Restart sFlow-RT service

Prometheus
· Add one or more jobs into the prometheus.yaml file

Grafana
· Configure Prometheus as the type of data source
· Configure the IP address of the system on which Prometheus is running, as the data source
· Import a pre-created dashboard by selecting from the available list
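As a sketch of the Prometheus step, the Python snippet below renders a scrape job that could be added to the Prometheus configuration file. All values are assumptions to be checked against your own installation: the collector address 192.0.2.10, port 8008 (commonly the sFlow-RT HTTP port), and the metrics path `/prometheus/metrics/ALL/ALL/txt` are illustrative and are not taken from the configuration shown in the video.

```python
def render_scrape_job(job_name, target, metrics_path, scrape_interval="15s"):
    """Render a minimal Prometheus scrape_configs section as YAML text."""
    lines = [
        "scrape_configs:",
        f"  - job_name: '{job_name}'",
        f"    scrape_interval: {scrape_interval}",
        f"    metrics_path: {metrics_path}",
        "    static_configs:",
        f"      - targets: ['{target}']",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values: adjust the host, port and path to your sFlow-RT instance.
print(render_scrape_job(
    job_name="sflow-rt",
    target="192.0.2.10:8008",
    metrics_path="/prometheus/metrics/ALL/ALL/txt",
))
```

After editing the configuration, Prometheus needs to reload it before the new job appears as a scrape target.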

Better oversight: better network performance

This setup helps network administrators gain insights into network traffic, so that they can oversee their networks, identify hot spots, troubleshoot issues and optimize performance. This allows you to get the best out of your deployed network resources. 

Capgemini Engineering helps clients to best use SONiC in their projects. Contact our experts today to see how we can help you leverage the benefits of open networking.

Meet our expert

Thovi Keerthi Kumar

Expert II-Lead Connectivity & NW Engineer at Capgemini Engineering

With 17+ years of hands-on IT experience as a Software R&D Engineer, Integration Specialist, Technical Lead, Mentor & Associate Architect, Keerthi is an expert in the fields of networking and telecom for product development and support services. He also has years of experience in the domains of VoIP, home networking, STB middleware, PTT-based mission-critical services and L2/L3 networking. Keerthi is an enthusiastic learner, holding a Master's Degree in Computer Networks from Manipal Institute of Technology and a Bachelor's Degree from VTU University.