Monitoring Cisco Switch Ports, with bandwidth and STP status

I have been trying to monitor Cisco switch ports ever since I started working with Nagios. However, I never found a fully satisfying solution. Either the plugins didn't show correct bandwidth data, or they just didn't work properly.

Finally I came across the plug-in check_snmp_netint.pl from William Leibzon. William recoded an older plug-in which can collect performance data from Cisco switches, as well as from other manufacturers. What is really cool is that the plug-in is still actively developed and optimized. On his website http://william.leibzon.org/nagios/ you can also find a handful of other useful Nagios plug-ins.

The check_snmp_netint.pl plug-in offers you tons of configuration possibilities, such as monitoring bandwidth, STP status or error counters of a switch port. This can also be done for Port-Channels. A Port-Channel is an aggregation of multiple switch ports into a single logical link. For example, you can build a 4 Gbit/s link by aggregating four 1 Gbit/s links.

Here I want to show you how I set up the plug-in together with a suitable pnp4nagios template.

First of all, go to http://william.leibzon.org/nagios/ and get the plug-in check_snmp_netint.pl. Copy it to your Nagios plugins directory (path: /omd/sites/<sitename>/lib/nagios/plugins) and make it executable (chmod +x check_snmp_netint.pl).

To see the full syntax, execute the script with --help.


I wanted to achieve two goals:

1)    Keep track of the used bandwidth / have helpful data for troubleshooting.

2)    Keep track of changes on how the data is flowing through the network.


This is how my check command looked (command line for NConf):

$USER1$/check_snmp_netint.pl -H $HOSTADDRESS$ -C $ARG1$ --stp -n $ARG2$ -w 0,0,0,0,0,0 -c 0,0,0,0,0,0 -k -f -Y -z -M -B -d 1000 -r

Argument --> Purpose

-C --> SNMP community of the switch (preferably read-only!). Keep an eye on your ACLs (access lists): the IP of the Nagios server needs access rights.
--stp --> Reads the STP status. This gives you the values stp_state and stp_changetime.
-n --> Defines the name of the switch port. If you are unsure about the correct name, you can get a full table of all ports by adding -v at the end of the command.
-w -c --> Define the warning and critical levels.
-k --> Activates the standard usage (bandwidth) check.
-f --> Activates performance output.
-Y --> Activates output in bits/s or bytes/s, see -M -B.
-z --> Makes the use of -w and -c optional. I still kept these parameters for later use 😉 For me it didn't make sense to set warning or critical levels, since there are times when the bandwidth is fully used (during backups or software distribution, for example).
-M -B --> Activates output in Mbit/s.
-d --> Defines the delta time (the preferred time between two values that the script uses to calculate the averages). It should be larger than the check interval. Here I took 1000 seconds since my check interval is 900 seconds (15 minutes). Generally, a delta time of the check interval multiplied by 1.1 seems suitable.
-r --> Without this option the name is treated as a regular expression, so the input eth would also select eth0, eth1 and so on. When set, the script only selects interfaces whose name fully matches.
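If you generate many of these checks, the argument list and the delta time can be put together programmatically. The following is only a rough sketch and not part of the plug-in: the helper names suggested_delta and build_check_args, the example address and the community string are made up for illustration. It uses only the flags described above and applies the rule of thumb of check interval times 1.1.

```python
def suggested_delta(check_interval_s: int) -> int:
    """Return the check interval times 1.1, rounded up to whole seconds.

    Integer arithmetic is used on purpose: 900 * 1.1 is not exactly 990
    in binary floating point, so rounding up a float could give 991.
    """
    return (check_interval_s * 11 + 9) // 10


def build_check_args(host, community, port_name, check_interval_s):
    """Build the argument list for check_snmp_netint.pl (hypothetical helper)."""
    return [
        "check_snmp_netint.pl",
        "-H", host,
        "-C", community,
        "--stp",
        "-n", port_name,
        "-w", "0,0,0,0,0,0",
        "-c", "0,0,0,0,0,0",
        "-k", "-f", "-Y", "-z", "-M", "-B",
        "-d", str(suggested_delta(check_interval_s)),
        "-r",
    ]


if __name__ == "__main__":
    # Example values only; replace them with your own host and community.
    args = build_check_args("192.0.2.10", "public", "GigabitEthernet0/1", 900)
    print(" ".join(args))
```

For a 900 second check interval this suggests 990 seconds, which is close to the 1000 seconds I used.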


The output looks as follows:

GigabitEthernet0/1:UP [STP:forwarding] (21.6Mbps/14.2Mbps):(1 UP): OK


Performance data output is:

'GigabitEthernet0/1_stp_state'=5 'GigabitEthernet0/1_stp_changetime'=1326796820 'GigabitEthernet0/1_in_bps'=21577281;;; 'GigabitEthernet0/1_out_bps'=14221734;;;

As you can see, there are four values given: In-bandwidth, out-bandwidth, stp_state, stp_changetime. Now I can use them for the goals defined earlier.
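If you want to use these values outside of Nagios, the performance data string is easy to take apart. Below is a small sketch, assuming the usual Nagios perfdata layout of 'label'=value followed by optional semicolon-separated thresholds, with plain single quotes around the labels. The function name parse_perfdata is made up for this example.

```python
import re

# One perfdata item looks like: 'label'=value[unit];warn;crit;min;max
# Only the label (quoted or bare) and the numeric value are captured.
PERF_ITEM = re.compile(r"(?:'([^']+)'|([^\s'=]+))=(-?\d+(?:\.\d+)?)")


def parse_perfdata(perfdata: str) -> dict:
    """Return a dict mapping each perfdata label to its numeric value."""
    values = {}
    for quoted, bare, number in PERF_ITEM.findall(perfdata):
        values[quoted or bare] = float(number)
    return values


if __name__ == "__main__":
    sample = (
        "'GigabitEthernet0/1_stp_state'=5 "
        "'GigabitEthernet0/1_stp_changetime'=1326796820 "
        "'GigabitEthernet0/1_in_bps'=21577281;;; "
        "'GigabitEthernet0/1_out_bps'=14221734;;;"
    )
    for label, value in parse_perfdata(sample).items():
        print(label, value)
```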


Goal: Keep track of the used bandwidth / have helpful data for troubleshooting.

In-bandwidth, out-bandwidth: These collect historical bandwidth data. That gives you reliable information about link utilization and helps answer questions such as why response times are changing or why connectivity is temporarily unavailable.
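To put a bandwidth value into perspective, it helps to relate it to the link speed. The sketch below is my own illustration, not something the plug-in does in this setup: the function name utilization_percent is made up, and the link speed is something you have to supply yourself. For a Port-Channel, the speed of one member link is multiplied by the number of members.

```python
def utilization_percent(rate_bps: float, link_speed_bps: float, members: int = 1) -> float:
    """Return the utilization of a link (or Port-Channel) in percent.

    rate_bps       -- measured rate in bits per second (e.g. the in_bps value)
    link_speed_bps -- nominal speed of one member link in bits per second
    members        -- number of aggregated links (1 for a plain switch port)
    """
    if link_speed_bps <= 0 or members < 1:
        raise ValueError("link speed must be positive and members at least 1")
    return 100.0 * rate_bps / (link_speed_bps * members)


if __name__ == "__main__":
    # in_bps from the sample output on a 1 Gbit/s port
    print(round(utilization_percent(21577281, 1_000_000_000), 2))
    # 2 Gbit/s of traffic on a Port-Channel made of four 1 Gbit/s links
    print(round(utilization_percent(2_000_000_000, 1_000_000_000, members=4), 1))
```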

Goal: Keep track of changes on how the data is flowing through the network.
stp_state, stp_changetime: The state shows whether the port is forwarding or blocking. After the initial configuration, some ports will be in forwarding and others in blocking mode (depending on your network topology). The recorded values are a snapshot of the currently running STP configuration. If the STP state of a port changes later on, the data flow has changed somehow, for example because a switch has two connections to the core router and the preferred one was cut off.
This was at least the plan when I implemented the script (optimizations are welcome).
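The idea of comparing the current STP states against a known-good snapshot can be sketched in a few lines. This is just an illustration with made-up function names (stp_state_name, stp_changes). It assumes that the stp_state value is the numeric port state from the BRIDGE-MIB (dot1dStpPortState), which fits the sample output where 5 is shown as forwarding.

```python
# Port states as defined for dot1dStpPortState in the BRIDGE-MIB.
STP_STATES = {
    1: "disabled",
    2: "blocking",
    3: "listening",
    4: "learning",
    5: "forwarding",
    6: "broken",
}


def stp_state_name(code) -> str:
    """Translate a numeric STP port state into its name."""
    return STP_STATES.get(int(code), f"unknown({code})")


def stp_changes(baseline: dict, current: dict) -> dict:
    """Return {port: (old_name, new_name)} for ports whose state changed.

    Ports that are missing from the current snapshot are skipped.
    """
    changes = {}
    for port, old in baseline.items():
        new = current.get(port)
        if new is not None and new != old:
            changes[port] = (stp_state_name(old), stp_state_name(new))
    return changes


if __name__ == "__main__":
    baseline = {"GigabitEthernet0/1": 5, "GigabitEthernet0/2": 2}
    current = {"GigabitEthernet0/1": 2, "GigabitEthernet0/2": 5}
    for port, (old, new) in stp_changes(baseline, current).items():
        print(f"{port}: {old} -> {new}")
```

In this example the two uplinks have swapped roles, which is exactly the kind of change in data flow I wanted to notice.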

Experiences

My experience so far shows that bandwidth monitoring works perfectly. The extended Cisco functionality enabled with the --cisco option (see the script source code for more information) didn't work on all Cisco switches (the 2960 worked, the 4500 series didn't).

However, I'm still a little unsure about the STP values and their correct interpretation. I will investigate this further (comments welcome). It seems that the STP feature does not work for every port or Port-Channel.

Finally I wrote a small pnp4nagios template that displays both bandwidth values in one graph (the stp values are printed in separate graphs).

<?php

# Index n in $RRDFILE[n] / $DS[n] follows the order of the performance data:
# 1 = stp_state, 2 = stp_changetime, 3 = in_bps, 4 = out_bps.
# Using $DS[n] with the same index as $RRDFILE[n] works with both
# single and multiple RRD storage.

# In-bandwidth and out-bandwidth (values are in bits per second)
$opt[1] = " --vertical-label \"bit/s\" -b 1000 --title \"Interface traffic for $hostname / $servicedesc\" ";
$def[1] = "DEF:var1=$RRDFILE[3]:$DS[3]:AVERAGE " ;
$def[1] .= "DEF:var2=$RRDFILE[4]:$DS[4]:AVERAGE " ;
$def[1] .= "LINE1:var1#FF7F24:\"in  \" " ;
$def[1] .= "GPRINT:var1:LAST:\"%7.2lf %Sbit/s last\" " ;
$def[1] .= "GPRINT:var1:AVERAGE:\"%7.2lf %Sbit/s avg\" " ;
$def[1] .= "GPRINT:var1:MAX:\"%7.2lf %Sbit/s max\\n\" " ;
$def[1] .= "LINE1:var2#FF4040:\"out \" " ;
$def[1] .= "GPRINT:var2:LAST:\"%7.2lf %Sbit/s last\" " ;
$def[1] .= "GPRINT:var2:AVERAGE:\"%7.2lf %Sbit/s avg\" " ;
$def[1] .= "GPRINT:var2:MAX:\"%7.2lf %Sbit/s max\\n\" ";
# stp_state
$opt[2] = " --vertical-label \"State\" -b 1000 --title \"STP State $hostname / $servicedesc\" ";
$def[2] = "DEF:var1=$RRDFILE[1]:$DS[1]:AVERAGE " ;
$def[2] .= "LINE1:var1#003300:\"State \" " ;
$def[2] .= "GPRINT:var1:LAST:\"%6.0lf last\" " ;
$def[2] .= "GPRINT:var1:AVERAGE:\"%6.0lf avg\" " ;
$def[2] .= "GPRINT:var1:MAX:\"%6.0lf max\\n\" " ;
# stp_changetime
$opt[3] = " --vertical-label \"t\" -b 1000 --title \"STP Change Time $hostname / $servicedesc\" ";
$def[3] = "DEF:var1=$RRDFILE[2]:$DS[2]:AVERAGE " ;
$def[3] .= "LINE1:var1#003300:\"Change Time  \" " ;
$def[3] .= "GPRINT:var1:LAST:\"%6.0lf last\" " ;
$def[3] .= "GPRINT:var1:AVERAGE:\"%6.0lf avg\" " ;
$def[3] .= "GPRINT:var1:MAX:\"%6.0lf max\\n\" " ;
?>

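Remember to remove the $opt[2]/$def[2] and $opt[3]/$def[3] rows if you are not using the --stp option, since there is no performance data available for them (you will see an error page). Without --stp only the two bandwidth values are returned, so the traffic graph then has to reference indexes 1 and 2 instead of 3 and 4.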

The template has to be stored in /omd/sites/<sitename>/etc/pnp4nagios/templates.

About sitweak
Monitoring, Network, Firewall, Mobile Security. I'm totally into that stuff!

5 Responses to Monitoring Cisco Switch Ports, with bandwidth and STP status

  1. me says:

    Thanks for this post, it was exactly what I was looking for. I setup everything per your post, but the graphing doesn’t appear to be working correctly. I don’t get any errors on the template, but all the graphs show a value of 5. I verified the performance data is being returned correctly when the check runs. If I dump the contents of the rrd, all values are NaN. Any ideas?

    • sitweak says:

      What you could try: Try to delete the template. PNP4nagios should then use the default template for each value returned, therefore it should show a couple of graphs. This would confirm that the performance data is working in general.

      If this is the case, I'm pretty sure it has something to do with the template itself. The template probably only reads the value for the STP state, as "5" indicates an STP status of "forwarding".

      Feel free to contact me on twitter for further help! twitter.com/sitweak

  2. stijn says:

    Hi sitweak

    I’m trying to monitor the bandwidth from the HP procurves 2626 switches, with success!
    But when i try the cisco SGE2000 and SGE2010, it is a failure.

    I always get the result: ERROR: Cisco port-index map table : Requested table is empty or does not exist. I have the OIDs of all the interfaces, I have the community string and so on.

    Is it possible to help me with this problem with teamviewer or email?

    I would really appreciate it, it is killing me for weeks.
    Thanks for the help!

    Stijn Vanbinnebeeck

  3. sitweak says:

    Hi Stijn,
    unfortunately I'm not using Nagios myself anymore. A couple of months ago I switched to OMD (including check_mk). Since then I haven't followed the progress of the mentioned plug-in. It seems like the plug-in can't read the interface table, which is weird, because it is usually in a standard format.

    What you could try to verify SNMP functionality:
    – Make a bulkwalk from the nagios server.
    – Use a MIB Browser and browse through the SNMP tables.

    If this works, you can be 100% confident that SNMP is working as it should. If not, you may want to check the SNMP configuration and access lists.

    Feel free to contact me on twitter for further troubleshooting! https://twitter.com/sitweak

    • Anonymous says:

      I would like to monitor a certain port on a switch to see the state of the port (forwarding or blocking). This is my command, but it gives me too many ports:
      ./check_snmp_netint.pl -v 2c -C public -H 10.77.255.254 -N 1.3.6.1.2.1.17.2.15.1.3 --stp -n 5 -v
      How can I change my check so I can monitor only one port?
