Wednesday, August 14, 2013

Cisco CUCM NTP servers, VMWare, re-licensing and what NOT to do

Let's assume you have one or more Cisco CUCM servers set up to use public NTP servers.
Let's assume those servers are unreliable, have been retired or are unreachable and you want to change them in CUCM.
Let's also assume you are running CUCM in VMware and you realize changing the NTP servers will invalidate your licenses because the license MAC will change.
Let's assume rehosting the licenses might be time consuming, or for whatever reason, an issue.
Let's assume you know you could, but don't want to root the CUCM server to spoof the license MAC.

You know what I mean, right?

Here's a dirty little hack.

Use a voice gateway as your NTP master (you could use any device with a real clock):

! set up your time zone correctly
clock timezone EST -5
clock summer-time EDT recurring
! set up name lookups
ip name-server
ip name-server
! set up some NTP servers
ntp server
ntp server
ntp server
! set up the gateway as an NTP master
ntp master 5
! add a loopback interface
! use the address of the CUCM NTP server that's unreliable
interface Loopback10
ip address
! verify your NTP status
show ntp associations
show ntp status

If the voice gateway is not your CUCM's default gateway, add a route to your new time server (you could have just made the default gateway the NTP master, but let's say it's an old layer 3 switch without an internal clock):

! create a static route to the old unreliable NTP server
! and route it to the voice gateway address
ip route

You are now using an internal NTP server (the voice gateway's loopback) that you control (courtesy of some basic routing), syncing with a reliable list of public clocks, and you don't have to fiddle with CUCM.
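To confirm from another host that the gateway's loopback really answers NTP, a small SNTP probe can help. The following is a minimal Python sketch, not part of the original setup; the function names are made up for illustration and the address you pass in is whatever you assigned to Loopback10. Assuming standard IOS behavior, a gateway that is synced to its upstream servers should report a stratum one higher than its upstream, while a stratum of 5 would suggest it has fallen back to its own clock via ntp master 5.

```python
import socket
import struct

NTP_PORT = 123
NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request() -> bytes:
    """48-byte SNTP client request: LI=0, version=3, mode=3 (client)."""
    return bytes([0x1B]) + bytes(47)


def parse_response(packet: bytes) -> dict:
    """Pull the mode, stratum and transmit time out of an NTP reply."""
    if len(packet) < 48:
        raise ValueError("NTP packet too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return {
        "mode": packet[0] & 0x07,  # 4 means the reply came from a server
        "stratum": packet[1],
        "unix_time": seconds - NTP_EPOCH_OFFSET + fraction / 2**32,
    }


def query(host: str, timeout: float = 2.0) -> dict:
    """Send one request to host (hypothetical helper) and decode the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (host, NTP_PORT))
        packet, _ = sock.recvfrom(512)
    return parse_response(packet)
```

Calling query() with the loopback address from a machine that follows the same static route should return a dictionary with mode 4; a timeout usually means the route or the loopback is not in place yet.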

Maybe dirty. Definitely works. Go nuts.

Sunday, August 11, 2013

IPCelerate IPSession, Status Solutions, Dukane nurse call and stattap.exe crashing

Running IPCelerate IPSession version 5.8.4 with a Dukane nurse call integration.  Status Solutions provides the interface between the Dukane TAP page output and IPCelerate SAM.

After the server hung while applying a Microsoft update, it was hard booted. At first the problem was that alerts designed to be presented to a single 7925 WiFi phone were being presented to all phones in the IPSession page group.  Then I found the stattap.exe application (the "Status Solutions TAP Interface" service) was crashing shortly after startup.  Several pending alerts could be sent and then stattap.exe would crash.  A symptom was that the IPCelerate SAM web GUI would display "Got error 134 from table handler" rather than a grid of recent alerts.

Ultimately, the issue was a corrupted alarm_device table in the MySQL statsol database.

You can gain access to the MySQL database by running:
c:\ipcelerate\statsol\mysql\bin\mysql -u root -pstatsoldb statsol

You can see the tables in the statsol database by using:
mysql> show tables;

The output should look similar to the following:
| Tables_in_statsol           |
| action                      |
| alarm                       |
| alarm_device                |
| alarm_device_image          |
| alarm_device_type           |
| alarm_log                   |
| alarm_log_2000_q1           |

The list continues but it's not shown here...

Wanting to check tables without being potentially destructive, and to keep the most control during the process, I started checking them one at a time with check table.

Given the number of tables in the list, I was happy to find the alarm_device table was an offender.  The check, repair and recheck output is shown below:

The bad check...

mysql> check table alarm_device;
| Table                | Op    | Msg_type | Msg_text
| statsol.alarm_device | check | warning  | 3 clients are using or haven't closed the table properly |
| statsol.alarm_device | check | error    | Unexpected byte: 0 at link: 179536
| statsol.alarm_device | check | error    | Corrupt
3 rows in set (0.05 sec)

The repair...

mysql> repair table alarm_device;
| Table                | Op     | Msg_type | Msg_text
| statsol.alarm_device | repair | warning  | Number of rows changed from 1645 to 1644 |
| statsol.alarm_device | repair | status   | OK
2 rows in set (0.09 sec)

A good check...

mysql> check table alarm_device;
| Table                | Op    | Msg_type | Msg_text |
| statsol.alarm_device | check | status   | OK       |
1 row in set (0.01 sec)
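The table-by-table check above can also be scripted so every table is checked in one go without repairing anything. This is a rough Python sketch, not something from Status Solutions: the helper names are hypothetical, it reuses the mysql path and credentials shown earlier, and it assumes the bundled mysql client accepts the usual -B (tab-separated batch output), -N (no column headers) and -e (execute statement) options.

```python
import subprocess

MYSQL = r"c:\ipcelerate\statsol\mysql\bin\mysql"  # path used earlier in the post


def mysql_argv(statement: str) -> list:
    """Command line that runs one statement and prints tab-separated rows."""
    return [MYSQL, "-u", "root", "-pstatsoldb", "-B", "-N", "-e", statement, "statsol"]


def problem_tables(check_output: str) -> list:
    """Tables with at least one CHECK TABLE row whose Msg_type is 'error'."""
    bad = []
    for line in check_output.splitlines():
        fields = line.split("\t")  # Table, Op, Msg_type, Msg_text
        if len(fields) >= 4 and fields[2] == "error" and fields[0] not in bad:
            bad.append(fields[0])
    return bad


def run(statement: str) -> str:
    """Run a statement through the mysql client and return its output."""
    result = subprocess.run(mysql_argv(statement), capture_output=True,
                            text=True, check=True)
    return result.stdout


def main() -> None:
    # CHECK TABLE only reads; any repair is still a separate, deliberate step.
    tables = run("SHOW TABLES").split()
    report = run("CHECK TABLE " + ", ".join(tables))
    for name in problem_tables(report):
        print("needs repair:", name)
```

Calling main() on the server prints only the tables that report errors, which can then be repaired by hand with repair table as shown above.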

After some discussions with Status Solutions support staff, they confirmed that corruption of the alarm_device table is a common issue after unclean reboots.  They also indicated that running mysqlcheck should not cause any issues while MySQL is running.

A much more efficient command is then:
C:\IPcelerate\statsol\mysql\bin\mysqlcheck --repair --check-only-changed -uroot -pstatsoldb statsol

The output should be similar to this:
statsol.action                                     Table is already up to date
statsol.alarm                                      OK
statsol.alarm_device                               OK
statsol.alarm_device_image                         Table is already up to date
statsol.alarm_device_type                          OK
statsol.alarm_log                                  OK
statsol.alarm_log_2000_q1                          Table is already up to date

The list continues below but is not shown...
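With a long listing like the one above, it can be handy to filter for the tables that were not simply reported as healthy. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes any line that starts with the database name and is not followed by "OK" or "Table is already up to date" deserves a closer look. The post does not show what mysqlcheck prints for a table it had to repair, so treat the matching rule as a guess.

```python
HEALTHY = {"OK", "Table is already up to date"}


def tables_needing_attention(output: str, prefix: str = "statsol.") -> list:
    """Return table names whose mysqlcheck status is not a known healthy one."""
    flagged = []
    for line in output.splitlines():
        if not line.startswith(prefix):
            continue  # skip blank lines and any detail/message lines
        name, _, status = line.partition(" ")
        if status.strip() not in HEALTHY:
            flagged.append(name)
    return flagged
```

Feeding it the saved mysqlcheck output returns an empty list when every table came back clean.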

mysqlcheck appears to run the manual check and the appropriate repair in one pass.

Monday, August 05, 2013

Cisco UCCX Application Manager in partial service

See here or the like for typical troubleshooting techniques:

Problem: You find your Cisco UCCX server(s) in partial service, and after drilling down find the Application Manager is in partial service.  One option is to change the trace levels to debug on the APP_MGR, restart the engine(s) or server(s) and then examine the MIVR logs.  Searching the logs for PARTIAL_SERVICE or ERROR at the time of the reboots can be telling, as it may reference an offending script.  At high-profile installations, restarting or rebooting is not a practical option, and pulling logs from the Linux appliances is not as easy as it was on the older Windows-based systems.

Possible easy solution:  Check the Real Time Reporting in UCCX.  From the administrative GUI, choose Tools | Real Time Reporting | Report | Applications.  You should find a grid of the applications on your deployment and a column on the right titled Valid.

If any of them have a value of false, you can be sure the partial service is at least partially attributable to that application.  Validating the script via the script editor, checking for valid sub flows, etc. is in order.

If you don't "own" the administration of UCCX and have various administrators not checking server health after their changes, this is a nice easy option, pointed out to me by a customer in just such an environment.  It appears to be available in at least versions 7 through 9.