Understanding code yellow

Understanding a sample code yellow alert:

Oct 3 01:00:10, cms01, Error, Cisco CallManager, ccm: 147896: Oct
03 05:00:10.616 UTC : %CCM_CALLMANAGER-CALLMANAGER-3-CodeYellowEntry:
CodeYellowEntry Expected Average Delay:362 Entry Latency:20 Exit Latency:8
Sample Size:10 Total Code Yellow Entry:14 High Priority Queue Depth:2 Normal
Priority Queue Depth:12 Low Priority Queue Depth:70 Cluster
ID:StandAloneCluster Node ID:cms01, 3652


The alert above is the code yellow entry alert. Its timestamp shows when the server entered the code yellow state.
There is nothing specific to note in this alert. CM enters code yellow when it meets the conditions for the code yellow state defined in the service parameters.

Oct 3 01:00:12, cms01, Error, Cisco CallManager, ccm: 147897: Oct
03 05:00:12.268 UTC : %CCM_CALLMANAGER-CALLMANAGER-3-CodeYellowExit:
CodeYellowExit Expected Average Delay:0 Entry Latency:20 Exit Latency:8
Sample Size:10 Time Spent in Code Yellow:2 Number of Calls Rejected Due to
Call Throttling:60 Total Code Yellow Exit:14 High Priority Queue Depth:0
Normal Priority Queue Depth:5 Low Priority Queue Depth:4 Cluster
ID:StandAloneCluster Node ID:cms01, 3653


The alert above is the code yellow exit alert. Its timestamp shows when the server left the code yellow state.
The important fields to note here are:

Time Spent in Code Yellow:2 - the server stayed in code yellow for about 2 seconds. You can verify this against the timestamps of the entry and exit alerts (05:00:10.616 UTC and 05:00:12.268 UTC, roughly 1.7 seconds apart).
Number of Calls Rejected Due to Call Throttling:60 - CM rejected 60 calls during this time.
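
The check described above can also be scripted. The following is only a rough sketch, not a Cisco tool: it pulls the numeric fields and the UTC timestamp out of the two sample alerts and computes the time elapsed between entry and exit. The function name parse_alert and the regular expressions are my own assumptions based on the sample alerts in this post. The alerts do not carry a year, so a placeholder year is used.

```python
import re
from datetime import datetime

# The two sample alerts from this post, joined onto single lines.
ENTRY_ALERT = (
    "Oct 3 01:00:10, cms01, Error, Cisco CallManager, ccm: 147896: Oct "
    "03 05:00:10.616 UTC : %CCM_CALLMANAGER-CALLMANAGER-3-CodeYellowEntry: "
    "CodeYellowEntry Expected Average Delay:362 Entry Latency:20 Exit Latency:8 "
    "Sample Size:10 Total Code Yellow Entry:14 High Priority Queue Depth:2 Normal "
    "Priority Queue Depth:12 Low Priority Queue Depth:70 Cluster "
    "ID:StandAloneCluster Node ID:cms01, 3652"
)
EXIT_ALERT = (
    "Oct 3 01:00:12, cms01, Error, Cisco CallManager, ccm: 147897: Oct "
    "03 05:00:12.268 UTC : %CCM_CALLMANAGER-CALLMANAGER-3-CodeYellowExit: "
    "CodeYellowExit Expected Average Delay:0 Entry Latency:20 Exit Latency:8 "
    "Sample Size:10 Time Spent in Code Yellow:2 Number of Calls Rejected Due to "
    "Call Throttling:60 Total Code Yellow Exit:14 High Priority Queue Depth:0 "
    "Normal Priority Queue Depth:5 Low Priority Queue Depth:4 Cluster "
    "ID:StandAloneCluster Node ID:cms01, 3653"
)

# Hypothetical patterns, written only against the sample alerts above.
ALARM_RE = re.compile(r"-(CodeYellow(?:Entry|Exit)):\s*\1\s+(.*)", re.DOTALL)
TIME_RE = re.compile(r"([A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC")
FIELD_RE = re.compile(r"([A-Z][A-Za-z ]*?):(\d+)")


def parse_alert(text, year=2000):
    """Return (alarm name, UTC timestamp, numeric fields) for one alert.

    The year is a placeholder because the alert text does not include one.
    """
    flat = " ".join(text.split())  # undo line wrapping
    alarm = ALARM_RE.search(flat)
    stamp = TIME_RE.search(flat)
    if alarm is None or stamp is None:
        raise ValueError("not a CodeYellowEntry/CodeYellowExit alert")
    when = datetime.strptime(f"{year} {stamp.group(1)}", "%Y %b %d %H:%M:%S.%f")
    fields = {key: int(value) for key, value in FIELD_RE.findall(alarm.group(2))}
    return alarm.group(1), when, fields


_, entered, entry_fields = parse_alert(ENTRY_ALERT)
_, exited, exit_fields = parse_alert(EXIT_ALERT)
elapsed = (exited - entered).total_seconds()

print(f"Elapsed between entry and exit: {elapsed:.3f} s")
print("Time Spent in Code Yellow:", exit_fields["Time Spent in Code Yellow"])
print("Calls rejected:", exit_fields["Number of Calls Rejected Due to Call Throttling"])
```

Only numeric fields are collected, so Cluster ID and Node ID are skipped on purpose.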


What is code yellow?
CodeYellow is a state a CallManager server can enter when it is under heavy call activity, has too little CPU available to Cisco Unified Communications Manager, or is affected by routing loops, disk I/O limitations, disk fragmentation or other such events.
Basically, the alert notifies you that the call processing power of this node may be diminished and calls may take longer to set up.


How to troubleshoot code yellow?
First, we need to check the application logs for the time when the server entered the code yellow state.
If we see a core dump file alert in the application logs, like this one:
Oct 12 11:36:03 CMC2 local7 2 LpmTool: 2:
CMC2.hk.hsbc: Oct 12 2011 03:36:03.798 UTC :
%UC_LPMTCT-2-CoreDumpFileFound: %[TotalCoresFound=1][CoreDetails=The
following lists up to 6 cores dumped by corresponding
applications.][Core1=Cisco CallManager
(core.8045.6.ccm.1318390519)][AppID=Cisco Log Partition Monitoring
Tool][ClusterID=][NodeID=CMC2]: The new core dump file(s) have
been found in the system.


Then all we need to do is track down the core dump file. You can analyze this file over SSH by running the following command during off-hours:
utils core active analyze <core file name>
A core dump generally points to a bug, so you may have to contact TAC.
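
If you collect these alerts from syslog, the core file name can be pulled out with a small script. This is a minimal sketch that assumes the alert always looks like the sample above. The function name core_files and the regular expression are hypothetical, and the script only prints the CLI command for you to run by hand.

```python
import re

# The CoreDumpFileFound sample alert from this post.
CORE_ALERT = """Oct 12 11:36:03 CMC2 local7 2 LpmTool: 2:
CMC2.hk.hsbc: Oct 12 2011 03:36:03.798 UTC :
%UC_LPMTCT-2-CoreDumpFileFound: %[TotalCoresFound=1][CoreDetails=The
following lists up to 6 cores dumped by corresponding
applications.][Core1=Cisco CallManager
(core.8045.6.ccm.1318390519)][AppID=Cisco Log Partition Monitoring
Tool][ClusterID=][NodeID=CMC2]: The new core dump file(s) have
been found in the system."""

# Matches "[Core<N>=<application> (<core file name>)]"; an assumption
# based on the single sample alert above.
CORE_RE = re.compile(r"\[Core\d+=([^\]]*?)\s*\((core\.[^)\s]+)\)\]")


def core_files(alert_text):
    """Return a list of (application, core file name) pairs from one alert."""
    flat = " ".join(alert_text.split())  # undo line wrapping
    return CORE_RE.findall(flat)


for application, core_name in core_files(CORE_ALERT):
    print(f"{application}: utils core active analyze {core_name}")
```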

If we do not see any core dump from around the time of the code yellow, then we may have to check the perfmon logs to see if there was high CPU utilization.
It is also necessary to check whether code yellow keeps recurring periodically and whether there is a pattern to it, for example an MWI resync that puts heavy load on CUCM and triggers the code yellow alert. In such a situation, we need to increase the 'System Throttle Sample Size' service parameter.
We might also need to check the CCM SDI/SDL traces to see how CUCM was behaving before the code yellow alert was raised.
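
One simple way to look for a recurring pattern is to count code yellow entries by hour of day. The sketch below assumes you have already collected the entry timestamps. The timestamps in it are made up purely for illustration, and entries_by_hour is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime


def entries_by_hour(timestamps):
    """Count code yellow entries per hour of day (0-23)."""
    return Counter(ts.hour for ts in timestamps)


# Hypothetical entry times, for illustration only (not real data).
sample_entries = [
    datetime(2011, 10, 3, 5, 0, 10),
    datetime(2011, 10, 4, 5, 0, 12),
    datetime(2011, 10, 5, 5, 1, 3),
    datetime(2011, 10, 5, 14, 22, 40),
]

by_hour = entries_by_hour(sample_entries)
busiest_hour, count = by_hour.most_common(1)[0]
print(f"Most entries at {busiest_hour:02d}:00 UTC ({count} of {len(sample_entries)})")
```

If most entries cluster around the same hour, compare that time with scheduled jobs such as an MWI resync.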


What is code red?
Cisco Unified Communications Manager enters a Code Red state when it has remained in a Code Yellow state for an extended period and cannot recover. When Cisco Unified Communications Manager enters a Code Red state, the Cisco CallManager service restarts.
