IOS-XE controllers System Health Check for Debugging Issues
- Monitor the cpu utilization history and make sure that there are no CPU spikes for longer period (more than couple of minutes).
- The hourly average CPU utilization of any core for any given minute going higher than ~70% needs further close monitoring. Identify the process that is taking most of the CPU and see the spike can be attributed to any kind of network activity.
- NOTE: If the controller is added to the Prime Infrastructure(PI), then there could be periodic spikes of some of the processes listed below:
-
- snmp_subagent, eicored, eicore_ipc(wcm) and SNMP ENGINE (iosd)
- The SNMP ENGINE utilization should drop down if there is any other higher priority activity that requires the CPU.
-
Commands
show processes cpu sorted detailed | exclude 0.00 # Sorted list of processes to the thread level based on the CPU utilization.
show processes cpu historySample output with PI activity:
#sh proc cpu sort det | e 0.00 Core 0: CPU utilization for five seconds: 45%; one minute: 16%; five minutes: 13% Core 1: CPU utilization for five seconds: 44%; one minute: 14%; five minutes: 10% Core 2: CPU utilization for five seconds: 52%; one minute: 15%; five minutes: 11% Core 3: CPU utilization for five seconds: 44%; one minute: 15%; five minutes: 11% PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process (%) (%) (%) 5805 L 2621250 1821404 85 17.69 3.14 1.03 0 snmp_subagent 5805 L 2 5805 1074 1478 0 15.99 1.47 0.39 0 snmp_subagent 5805 L 0 6626 1389355 7685676 0 8.42 7.46 0.41 0 snmp_subagent 5805 L 2 6586 762745 9850229 0 3.94 2.63 0.18 0 CMI default xdm 5805 L 1 13461 874 33226 0 2.56 1.84 1.79 0 SNMP SA CACHE 5805 L 2 6497 276868 724659 0 0.13 0.09 0.09 0 snmp_subagent 8127 L 3936457 4684750 100 16.91 5.53 4.32 0 iosd 8127 L 2 8127 600155 2944832 0 13.54 3.74 2.66 0 iosd 8127 L 3 8742 2624104 1473118 0 2.47 1.69 1.62 0 iosd.fastpath 8127 L 2 8743 707123 2610018 0 0.90 0.11 0.03 0 CMI Thread 281 I 89336 1250203 0 24.44 4.11 1.11 0 SNMP ENGINE 276 I 859231 18049 0 11.66 0.99 0.22 0 SNMP Timers 22 I 1967404 1384683 0 8.55 1.66 0.44 0 CMI IOSd task 59 I 1908371 5190050 0 3.00 2.99 2.88 0 ARP Snoop 312 I 1547565 190253 0 2.77 1.33 1.33 0 QoS stats process 274 I 671863 5192881 0 0.88 0.88 0.88 0 DAI Packet Process 30 I 2510652 2756779 0 0.66 0.55 0.44 0 ARP Input 200 I 911297 1344167 0 0.66 0.22 0.11 0 Tunnel IOSd shim DB 195 I 1717254 4141977 0 0.33 0.22 0.22 0 IP Host Track Proce 232 I 1066616 5559004 0 0.33 0.22 0.22 0 UDLD 223 I 1590722 5162371 0 0.33 0.33 0.33 0 IP Input 219 I 977672 3720297 0 0.11 0.11 0.11 0 DHCP Snooping 8123 L 3603041 7383618 63 4.31 2.04 2.04 0 wcm 8123 L 1 8808 2457789 5526106 0 2.34 4.40 0.40 0 eicore_ipc
#show processes cpu history History information for system: 11111 11111 11111 11111 11111 1111 811111999991111177777111118888800000888880000099999999990000 100 90 80 70 60 50 40 30 20 10 ************************************************************ 0....5....1....1....2....2....3....3....4....4....5....5.... 0 5 0 5 0 5 0 5 0 5 CPU% per second (last 60 seconds) 164111111111111111454411116611111111111111135441113651111111 157124331111111613114211143611111104010000091641006601002200 100 90 80 70 * * * 60 * ** # 50 ** * ** ** #* 40 #* *##* ## *##* *#* 30 #* ###* ## *##* *#* 20 #* * #### ## #### *## 10 ############################################################ 0....5....1....1....2....2....3....3....4....4....5....5.... 0 5 0 5 0 5 0 5 0 5 CPU% per minute (last 60 minutes) * = maximum CPU% # = average CPU% 677776677677777766656666666565554555555555556575556556565555555655665644 634359546523207032491111314839989215282111231924203084021809628696108257 100 90 80 * * * 70 **************** * * 60 ******************************** * * *** * ** * * ** ******** 50 ************************************************************************ 40 ************************************************************************ 30 ************************************************************************ 20 #*##*#**#########**********************************************#******** 10 ######################################################################## 0....5....1....1....2....2....3....3....4....4....5....5....6....6....7. 0 5 0 5 0 5 0 5 0 5 0 5 0 CPU% per hour (last 72 hours) * = maximum CPU% # = average CPU%
- Monitor the RSS column of the individual processes. If this component is monotonically growing then it is alarming and further investigation needed based on the process that is growing.
- Monitor the free memory component of the system level memory. Don’t be alarmed if it drops down suddenly as IOSd usage could spike up and the memory freed will be held back in IOSd itself. If the system level free memory goes down below 40% it is alarming and further investigation needed.
- If the IOSd is process is growing, the run IOSd process specific command and look for Processor 8E2F3008 1610612736 344548600 1266064136 1114693628 1102149236 line in the output. Make sure the Lowest and Largest Columns are not too much apart.
Commands
show processes memory sorted
show memory detailed process iosd allocating-process totals # If the IOSd process RSS is growing.Sample output
#show processes memory sorted System memory : 7995664K total, 2601088K used, 5394576K free, 2227952K kernel reserved Lowest(b) : 568393300 PID Text Data Stack Heap RSS Total Process 8123 20048 1140084 88 34448 748636 1312796 wcm 8127 63252 1601300 104 932 740208 1924924 iosd 5240 9272 422520 92 115968 486208 783940 fed 5782 4260 157088 88 81200 139288 256792 ffm 5242 612 93880 88 1184 117536 306532 stack-mgr 5805 760 219200 88 22908 102992 283952 snmp_subagent
#show memory detailed process iosd allocating-process totals System memory : 7995664K total, 2601040K used, 5394624K free, 2227952K kernel reserved Lowest(b) : 568393300 Process iosd, type L, PID = 8127 1924924K total, 63252K text, 1601300K data, 104K stack, 932K heap Head Total(b) Used(b) Free(b) Lowest(b) Largest(b) Processor 8E2F3008 1610612736 344548600 1266064136 1114693628 1102149236 IOS Proce 8D2F2008 16777216 7691620 9085596 8954256 8466680 Allocator PC Summary for: Processor PC Total Count Name 0xF4CFA0B7 43112624 19401 *Packet Data* 0xF4CFA036 23628624 19560 *Packet Header* 0xF5936761 20985680 320 AAA AttrL Sub 0xF5A01BE7 12586816 168267 Init
- Monitor the number of APs, clients and make sure there is no sudden drop in the count at any point.
- Monitor the APs up time to make sure there are no AP drop offs.
- Monitor the client stats for learn-ip state and clients with self-assigned IP addresses.
- Monitor the AP load and channel utilization of all the APs.
Commands
show ap uptime # Look at AP uptime and Association uptime columns and make sure those are not too much apart, otherwise there are AP flappings
show wcdb database all | include Num # Client count
show wcdb database all | count LEARN|169.254. # Clients with self assigned IP, there may be a problem with the DHCP
show ap dot11 24ghz load-info # Make sure the channel utilization is not high
show ap dot11 5ghz load-info
show wireless exclusion # Make sure the clients are not listed if disabled, Make sure the clients are not stuck if enabled with 0 timer value.
show wireless client summary | i Idle # Make sure the clients are not stuck in this state.Sample output
#show ap uptime Number of APs : 498 AP Name Ethernet MAC AP Up Time Association Up Time --------------------------------------------------------------------------------------------------------------------------- tsim-01-1 00aa.0100.02f0 1 day 8 hours 42 minutes 15 seconds 1 day 8 hours 38 minutes 57 seconds tsim-01-2 00aa.0100.03f0 1 day 8 hours 42 minutes 15 seconds 1 day 8 hours 38 minutes 57 seconds tsim-01-3 00aa.0100.04f0 1 day 8 hours 42 minutes 14 seconds 1 day 8 hours 38 minutes 56 seconds #show wcdb database all | include Number Total Number of Wireless Clients = 3486 #show ap dot11 24ghz load-info AP Name Radio MAC Slot Channel Utilization (%) Clients ---------------------------------------------------------------------------------------- ap-01-1 00aa.0100.0200 0 40 10 ap-01-2 00aa.0100.0300 0 30 5 #show wireless exclusionlist Manually Excluded Clients Dynamically Excluded Clients
- Look for % in the logs to get all the syslog messages.
- Look for the keywords like Trace, CAPWAP, AVL in the system logs
- Monitor the traffic handled by the CPU and watch for any buffer exhaustion/suspension happening frequently.
Commands
show logging
show platform punt client # Look for failures column for any packets punted to CPU, should be zero. It should not increase fast if it is non-zero
show controllers ethernet-controller port-asic statistics exceptions # Look for any drop counter increments
Sample output
show logging | include Trace|%|AVL|CAPWAP *May 8 11:50:53.940 PDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface Capwap3, changed state to up -Traceback= 1#b1a3e3fd1d3138ac635fe72eb77c44c3 :F39AF000+20DA8A3 :F39AF000+1380F83 :F39AF000+1382D17 :F39AF000+138C274 #show platform punt client tag buffer jumbo fallback packets received failures alloc free bytes conv buf 27 0/1024/2048 0/5 0/5 0 0 0 0 0 65536 0/1024/1600 0/0 0/512 2687494 2687494 171999616 0 0 65537 0/ 512/1600 0/0 0/512 174604 174604 34383056 0 0 65538 0/ 5/5 0/0 0/5 0 0 0 0 0 65539 0/2048/1600 0/16 0/512 60657819 60657819 2060395227 0 0 65540 0/ 128/1600 0/8 0/0 0 0 0 0 0 65541 0/ 128/1600 0/16 0/32 1490044 1490729 340860488 0 0 65542 0/ 768/1600 0/4 0/0 5623288 5646925 337397580 0 0 65544 0/ 96/1600 0/4 0/0 0 0 0 0 0 65545 0/ 96/1600 0/8 0/32 0 0 0 0 0 65546 0/ 512/1600 0/32 0/512 2482982 2914147 240875710 0 0 65547 0/ 96/1600 0/8 0/32 0 0 0 0 0 65548 0/ 512/1600 0/32 0/256 0 0 0 0 0 65551 0/ 512/1600 0/0 0/256 0 0 0 0 0 65556 0/ 16/1600 0/4 0/0 0 0 0 0 0 65557 0/ 16/1600 0/4 0/0 0 0 0 0 0 65558 0/ 16/1600 0/4 0/0 0 0 0 0 0 65559 0/ 16/1600 0/4 0/0 0 0 0 0 0 65560 0/ 16/1600 0/4 0/0 0 0 0 0 0 65561 0/ 512/1600 0/0 0/256 100170460 129432612 1711903376 0 2 65562 0/ 512/1600 0/0 0/256 3740090 9643238 1266448617 0 0 65563 0/ 512/1600 0/0 0/256 0 0 0 0 0 65565 0/ 512/1600 0/16 0/256 0 0 0 0 0 65566 0/ 512/1600 0/16 0/256 0 0 0 0 0 65567 0/ 512/1600 0/16 0/256 0 0 0 0 0 65568 0/ 512/1600 0/16 0/256 0 0 0 0 0 65583 0/ 1/1 0/0 0/0 0 0 0 0 0 131071 0/ 96/1600 0/4 0/0 0 0 0 0 0 fallback pool: 0/1500/1600 jumbo pool: 0/128/9300