Earlier today I had an issue with a customer experiencing latency on one of their MPLS circuits. MPLS, or Multi-Protocol Label Switching, defines a mechanism for packet forwarding in network routers. It was originally developed to provide faster packet forwarding than traditional IP routing, although improvements in router hardware have reduced the importance of speed in packet forwarding. However, the flexibility of MPLS has led to it becoming the default way for modern networks to achieve Quality of Service (QoS), next generation VPN services, and optical signaling.
Traditional IP networks are connectionless: when a packet is received, the router determines the next hop using the destination IP address on the packet alongside information from its own forwarding table. The router’s forwarding tables contain information on the network topology. Routers use an IP routing protocol, such as OSPF, IS-IS, BGP, or RIP, or static configuration, to keep that information synchronized with changes in the network.
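The lookup described above can be sketched in a few lines of Python. This is only an illustration of longest-prefix matching, not how any particular router implements it; the table contents, the next-hop addresses, and the `lookup_next_hop` name are all made up for the example.

```python
import ipaddress

# Hypothetical forwarding table: (prefix, next hop). All addresses are illustrative.
FORWARDING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),    # default route
    (ipaddress.ip_network("10.0.0.0/8"), "192.0.2.2"),
    (ipaddress.ip_network("10.1.0.0/16"), "192.0.2.3"),
]


def lookup_next_hop(destination, table=FORWARDING_TABLE):
    """Return the next hop of the longest prefix that contains destination, or None."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if address in net]
    if not matches:
        return None
    # The most specific route (largest prefix length) wins.
    _, best_hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return best_hop
```

Every packet is looked up independently like this at each hop, which is the "connectionless" behaviour the paragraph refers to.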
I was not able to diagnose the issue myself, so I called in one of my mentors/gurus/bosses, Dean Zerbe, to help me isolate the problem. If you want to know more about MPLS, check out his blog.
Dean gave me some advice on how to troubleshoot things…
The customer has four routers running on an MPLS circuit. First on the checklist was to verify Layer 1 and Layer 2 connectivity by looking at the interfaces and checking for any errors on the circuit.
- show interfaces – The purpose of the show interfaces command is rather self-explanatory: it displays the interfaces and their status. It is a preliminary troubleshooting step to check for errors, current traffic, and the status of each interface.
- show processes – displays information about the active processes in a Cisco router.
router#show processes
CPU utilization for five seconds: 0%/0%; one minute: 0%; five minutes: 0%
PID Q Ty PC Runtime(uS) Invoked uSecs Stacks TTY Process
1 C sp 602F3AF0 0 1627 0 2600/3000 0 Load Meter
2 L we 60C5BE00 4 136 29 5572/6000 0 CEF Scanner
3 L st 602D90F8 1676 837 2002 5740/6000 0 Check heaps
4 C we 602D08F8 0 1 0 5568/6000 0 Chunk Manager
5 C we 602DF0E8 0 1 0 5592/6000 0 Pool Manager
6 M st 60251E38 0 2 0 5560/6000 0 Timers
7 M we 600D4940 0 2 0 5568/6000 0 Serial Backgroun
8 M we 6034B718 0 1 0 2584/3000 0 OIR Handler
9 M we 603FA3C8 0 1 0 5612/6000 0 IPC Zone Manager
10 M we 603FA1A0 0 8124 0 5488/6000 0 IPC Periodic Tim
11 M we 603FA220 0 9 0 4884/6000 0 IPC Seat Manager
12 L we 60406818 124 2003 61 5300/6000 0 ARP Input
13 M we 60581638 0 1 0 5760/6000 0 HC Counter Timer
14 M we 605E3D00 0 2 0 5564/6000 0 DDR Timers
15 M we 605FC6B8 0 2 011568/12000 0 Dialer event
- show processes cpu – displays information about the active processes in the router and their corresponding CPU utilization statistics.
router#show processes cpu
CPU utilization for five seconds: 8%/4%; one minute: 6%; five minutes: 5%
PID Runtime(uS) Invoked uSecs 5Sec 1Min 5Min TTY Process
1 384 32789 11 0.00% 0.00% 0.00% 0 Load Meter
2 2752 1179 2334 0.73% 1.06% 0.29% 0 Exec
3 318592 5273 60419 0.00% 0.15% 0.17% 0 Check heaps
4 4 1 4000 0.00% 0.00% 0.00% 0 Pool Manager
5 6472 6568 985 0.00% 0.00% 0.00% 0 ARP Input
6 10892 9461 1151 0.00% 0.00% 0.00% 0 IP Input
7 67388 53244 1265 0.16% 0.04% 0.02% 0 CDP Protocol
8 145520 166455 874 0.40% 0.29% 0.29% 0 IP Background
9 3356 1568 2140 0.08% 0.00% 0.00% 0 BOOTP Server
10 32 5469 5 0.00% 0.00% 0.00% 0 Net Background
11 42256 163623 258 0.16% 0.02% 0.00% 0 Per-Second Jobs
12 189936 163623 1160 0.00% 0.04% 0.05% 0 Net Periodic
13 3248 6351 511 0.00% 0.00% 0.00% 0 Net Input
14 168 32790 5 0.00% 0.00% 0.00% 0 Compute load avgs
15 152408 2731 55806 0.98% 0.12% 0.07% 0 Per-minute Jobs
CPU utilization for five seconds – CPU utilization for the last five seconds. The first number is the total; the second number is the percentage of CPU time spent at the interrupt level.
PID – The Process ID
Runtime (uS) – CPU time the process has used, expressed in microseconds.
Invoked – The number of times the process has been invoked.
uSecs – Microseconds of CPU time for each process invocation.
5Sec – CPU utilization by task in the last five seconds.
1Min – CPU utilization by task in the last minute.
5Min – CPU utilization by task in the last five minutes.
TTY – Terminal that controls the process.
Process – Name of Process.
What matters at this point is the five-second CPU utilization: if the value has increased significantly or is sitting at 100%, something is likely wrong.
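If you capture this output in a log, the header line is easy to pull apart. The sketch below assumes the header has exactly the format shown in the sample output above; `parse_cpu_header` and the dictionary keys are names invented for the example. The process-level figure is simply the total minus the interrupt-level figure.

```python
import re

# Matches the header line as it appears in the sample output above.
CPU_HEADER = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_header(line):
    """Split the header into total, interrupt, and process-level percentages."""
    match = CPU_HEADER.search(line)
    if match is None:
        return None
    total, interrupt, one_min, five_min = (int(v) for v in match.groups())
    return {
        "five_sec_total": total,
        "five_sec_interrupt": interrupt,
        # Whatever is not interrupt-level time is process-level time.
        "five_sec_process": total - interrupt,
        "one_min": one_min,
        "five_min": five_min,
    }
```

For the sample header, `parse_cpu_header("CPU utilization for five seconds: 8%/4%; one minute: 6%; five minutes: 5%")` reports a total of 8, of which 4 is interrupt level and 4 is process level.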
- show processes cpu history – displays in ASCII graphical form the total CPU usage on the router over a period of time: one minute, one hour, and 72 hours, displayed in increments of one second, one minute, and one hour, respectively. Maximum usage is measured and recorded every second; average usage is calculated on periods over one second.
The following is a sample output of the one-hour portion of the output:
router#show processes cpu history
!--- One minute output omitted
6665776865756676676666667667677676766666766767767666566667
6378016198993513709771991443732358689932740858269643922613
100
90
80 * * * * * * * *
70 * * ***** * ** ***** *** **** ****** * ******* * *
60 #***##*##*#***#####*#*###*****#*###*#*#*##*#*##*#*##*****#
50 ##########################################################
40 ##########################################################
30 ##########################################################
20 ##########################################################
10 ##########################################################
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per minute (last 60 minutes)
* = maximum CPU% # = average CPU%
!--- 72-hour output omitted
- The Y-axis of the graph is the CPU utilization.
- The X-axis of the graph is the increment within the period displayed in the graph; in this instance, it is the individual minutes during the previous hour. The most recent measurement is on the left end of the X-axis.
- The top two rows, read vertically, display the highest percentage of CPU utilization recorded during the increment.
In the above example, the CPU utilization for the last minute recorded is 66 percent. The router may have reached 66 percent only once during that minute, or it may have reached 66 percent multiple times; the router records only the peak reached during the increment and the average over the course of that increment.
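Reading the top two rows vertically is tedious by eye, so here is a small sketch that does it for the sample above. It assumes the digit rows have been copied out of the output as equal-length strings; `decode_peak_rows` is a name made up for the example.

```python
def decode_peak_rows(rows):
    """Read digit rows vertically: column i across all rows forms one percentage."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same length")
    return [int("".join(row[i] for row in rows)) for i in range(width)]


# The two digit rows from the one-hour sample above (tens on top, ones below).
TENS = "6665776865756676676666667667677676766666766767767666566667"
ONES = "6378016198993513709771991443732358689932740858269643922613"

# One peak value per minute, most recent minute first.
peaks = decode_peak_rows([TENS, ONES])
```

The first entry of `peaks` is the 66 percent discussed above, followed by 63, 67, and 58 for the minutes before it.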
For the issue above, Dean checked all four routers and verified their CPU utilization; it turned out that one of the routers had high utilization. The high utilization of that router correlated with an overutilized T1. He then checked the NAT translations and found that one of the servers was sending a lot of traffic to the Internet, creating many NAT translations. That drove up the router's CPU utilization and explained both the overutilized T1 and the slow inside network. We isolated that server and blocked its Internet access by removing its one-to-one NAT translation and denying ports 80 and 21 for it.
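Spotting the noisy server comes down to counting translation entries per inside host. The sketch below is not what Dean ran; it assumes the inside local addresses have already been pulled out of the translation table into plain `address:port` strings, and the sample addresses and the `top_talkers` name are invented for illustration.

```python
from collections import Counter


def top_talkers(inside_locals, limit=3):
    """Count translation entries per inside host, ignoring the port."""
    # "10.1.1.10:1025" -> "10.1.1.10"; entries without a port are kept as-is.
    hosts = Counter(entry.rsplit(":", 1)[0] for entry in inside_locals)
    return hosts.most_common(limit)


# Hypothetical entries, for illustration only.
entries = [
    "10.1.1.10:1025",
    "10.1.1.10:1026",
    "10.1.1.10:1027",
    "10.1.1.20:4000",
    "10.1.1.30",
]
```

A host that owns far more entries than its neighbours, like `10.1.1.10` in this made-up list, is the first one to look at.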
It should be noted that high CPU utilization, by itself, does not indicate a problem with your device.
As a rough guideline, only consistently high CPU utilization over an extended period of time indicates a problem. Further, these commands are more useful for figuring out what went wrong than as indicators that something is wrong in the first place.
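One way to turn that guideline into a check is to require every sample in a recent window to be high before raising a flag. This is a rough sketch: the 80 percent threshold and 15-sample window are arbitrary assumptions, not Cisco recommendations, and `is_sustained_high` is a made-up name. Samples are expected oldest first, which is the reverse of the history graph.

```python
def is_sustained_high(samples, threshold=80, window=15):
    """True only if the most recent `window` samples all meet or exceed threshold."""
    if len(samples) < window:
        return False  # not enough history to call it sustained
    return all(value >= threshold for value in samples[-window:])
```

A single spike to 100 percent in an otherwise quiet hour returns False here, while fifteen straight minutes at 90 percent returns True.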
Need some advice on your network? You may check out some technical stuff at Techblog, Intelletrace Site or Tweet askintelletrace.