Tuesday, November 29, 2011

FXO calls disconnect: fxols_line_reversal_clid_wait. On_hook line reversal detected possibily due to crossed cable

Customer running CME on c2800nm-advipservicesk9-mz.124-4.XC4.bin with 6 FXO ports on 2 VWIC2-4FXO cards.  The complaint was that callers in conversation with outside parties disconnect at random times.  The duration of the call and destination didn't seem to matter.

Running "debug voip ccapi inout" and "debug vpm all" to investigate, occasionally found the following:

Nov 22 13:06:39: htsp_process_event: [0/1/0, FXOLS_ONHOOK, E_DSP_SIG_0110]

Nov 22 13:06:39: fxols_line_reversal_clid_wait. On_hook line reversal detected possibily
due to crossed cable
Nov 22 13:06:39: htsp_process_event: [0/1/1, FXOLS_ONHOOK, E_DSP_SIG_0110]
Nov 22 13:06:39: fxols_line_reversal_clid_wait. On_hook line reversal detected possibily
due to crossed cable
Nov 22 13:06:39: htsp_process_event: [0/1/2, FXOLS_ONHOOK, E_DSP_SIG_0110]
Nov 22 13:06:39: fxols_line_reversal_clid_wait. On_hook line reversal detected possibily
due to crossed cable
Nov 22 13:06:39: htsp_process_event: [0/1/3, FXOLS_ONHOOK, E_DSP_SIG_0110]

To confirm this was not a crossed cable or a flip in tip and ring leads, cables were inspected. No physical layer changes were made in years.

To confirm the issue was not related to the FXO card (errors were only related to one card), I swapped the two card positions in the router.  The errors above continued to happen on the same ports (0/1/X), ruling out a card failure.

Running "show voice dsp group all" indicated the DSPs were up and looked normal.

DSP groups on slot 0:

dsp 1:
State: UP, firmware: 8.4.1
Max signal/voice channel: 16/16
Max credits: 240
Group: FLEX_GROUP_VOICE, complexity: FLEX
Shared credits: 240, reserved credits: 0
Signaling channels allocated: 12
Voice channels allocated: 0
Credits used: 0

dsp 5:
State: UP, firmware: 8.4.1
Max signal/voice channel: 16/16
Max credits: 240
Group: FLEX_GROUP_VOICE, complexity: FLEX
Shared credits: 240, reserved credits: 0
Signaling channels allocated: 0
Voice channels allocated: 0
Credits used: 0
Tracing physical cables further, I found the 4 ports in question were served by an Adtran Total Access 616, while the other 2 appeared to be standard POTS lines.  In discussions with the TELCO carrier, errors on the T1 serving the Adtran were found and remote access to the device was not possible.

A site visit by a carrier representative indicated "this thing is toast" and the Adtran was replaced.

Problem solved.

Monday, November 07, 2011

Unity Connection 8.6 Delayed Messaging Complaints Troubleshooting

The issue to be addressed was a user claiming that they "cleared" their mailbox one afternoon, but upon arriving the next morning, heard new messages with time stamps earlier than the time they claimed they cleared it.  This is interesting in so much as they were not depending on MWI to trigger their message retrieval, but rather checking via the TUI occasionally and paying attention to message counts and time stamps.

While I might normally attribute this to user error, the user complained of multiple instances, only occasionally, leading me to believe there might be some validity to their concern.

A combination of the "User Phone Login and MWI Report" and the "User Message Activity Report" in Unity Connection should provide a basic log of messages left, user retrieval and MWI actions, but they seemed slightly broken on this system, TAC can't confirm what I am presented with is to be expected and is working on it in their own lab.  If these reports were reliable, they might be all I needed to confirm what the user was experiencing, or if they were in error.

Rather, I needed to use the UC logs and RTMT to decipher the user and system actions.

Step one was to enable micro traces on Unity Connection.  You can access these trace setting via Unity Connection Serviceability. Below they are set for Conversation Manager, MiuSkinny and Notifier, although I was most concerned with notifier (related to MWI functions with information re: message counts as well), given I suspected other traces and output might be suspect per TAC's research.






Once your traces are set and the problem is reproduced, you can use RTMT to retrieve the associated log files. You'll need as much detail from the end user as possible regarding their experience, but at very least, times associated with the message timestamps or user retrieval will be helpful.  The logs can be quite voluminous and confusing.  The correct version of RTMT can be downloaded from the Unity Connection server, not Cisco.com.  If you are like me, you may already have multiple versions installed.   It will be helpful to note what version you are installing so it can be run effectively when complete.

Below are screenshots of appropriate options to retrieve Conversation Manager and Notifier traces.






After downloading the trace files, finding a tool to search giant ugly text files for you is invaluable.  I personally use Windows Grep by Huw Millington, but that might not be the best and certainly isn't the only way to proceed.

You can start by searching for the user's alias or display name to find relevant data. Nice variables in the notifier logs also include NOTIFYQ_ACTION_MSG_NEW (associated with users receiving new messages), NOTIFYQ_ACTION_MSG_READ (users reading messages),
NOTIFYQ_ACTION_SEEN_MSG_DELETED (users deleting messages), etc.  Your needs or results may vary.

You will also find things like "resyncCheck found expected msgcounts 0,4,0,0,0,0,0,0,0,4,0,4,0,0,0,0,0" that are easy to ignore due to the comma delimited overload, but are extremely helpful.

Depending on your text editor, you may not see the data schema unless you loosen your search.  In this case its: URG VOI, VOI, RCPT, URG FAX, FAX, URG TXT, TXT, URG DISP, DISP, NONPROC,
SAVED URG VOI, SAVED VOI, SAVED RCPT, SAVED URG FAX, SAVED FAX, SAVED URG TXT, SAVED TXT.

So, the 0,4,0,0,0,0,0,0,0,4,0,4,0,0,0,0,0 indicates 4 voice messages, 4 not processed (?), and 4 saved messages associated with whatever user you are dwelling on.

Following the actions of users in the logs, you can determine exactly what times messages were received, lights lit, users acted on messages and how, lights extinguished, etc.

Ultimately, in this case I discovered, although the user claimed to check a mailbox at a specific time, they did not and all messages were retrieved and deleted the next time they reported to check messages.  There was no evidence of delayed messages or lights. The user retrieval was delayed.

It is important to note after all this headache, the problem could have been averted by a better phone and message delivery design.  The user responsible for checking messages in the box did not have a MWI available on their phone, but rather a secretary in another room (that was recently separated from the company) was the intended MWI recipient. 

After some discussion re: my findings, and some gentle finger pointing, I simply added a line appearance associated with the mailbox to the users phone and explained the difference between a lit LED under the handset and the envelope icon next to the line.

So far so good, and the user is grateful to not have to guess when to check a mailbox.