
Troubleshoot Connection Center

If you're having trouble with Connection Center, our troubleshooting steps can help you work out what is going wrong. They are loosely ordered in the way you would typically approach a problem, although not every step is relevant to every problem.

Below the steps is the Specific Issues section, which covers a number of issues and quirks that we have encountered and felt were worth calling out directly. Not all of these have resolutions; some are simply quirks of SCOM.

This page is focused on SCOM. If you are looking for troubleshooting steps related to a specific destination, you can find these in the child pages of this article.

Troubleshooting Steps

1. Connection UI

Connection Center comes with a UI for each of the different connection types, showing you the properties and state of each connection.

In the example above, two things stand out: the state of the connection is shown as Disabled, and there is no option to enable the connection. In this scenario, attempt to modify the connection; doing so reveals that the license has expired.

If the connection is showing an error state, check its properties to make sure they are set up correctly. Common issues include:

Typos in the URL (which can be clicked on to verify it is correct)
Choosing the incorrect App configuration
- Check with your ServiceNow admin whether you should be using the Cookdown App or Event Management
- Check with your Cherwell admin that you are using the correct Incident values
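As a rough illustration of the URL check, the following Python sketch flags a few common typos before you paste a URL into a connection. It is a minimal sketch, not part of Connection Center: the function name `find_url_problems` and the example host names are hypothetical, and the checks only cover obvious slips.

```python
from urllib.parse import urlsplit


def find_url_problems(url: str) -> list[str]:
    """Return a list of likely typos in a connection URL (empty if none found)."""
    problems = []
    if url != url.strip():
        problems.append("leading or trailing whitespace")
    parts = urlsplit(url.strip())
    # Only http and https are treated as valid schemes in this sketch.
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host name")
    elif " " in parts.hostname or ".." in parts.hostname:
        problems.append(f"malformed host name: {parts.hostname!r}")
    return problems


# Hypothetical URLs used purely for illustration.
print(find_url_problems("https://example.service-now.com"))
print(find_url_problems("htps://example.service-now.com"))
```

A check like this only catches structural mistakes. A URL that is well formed but points at the wrong instance still needs to be verified by opening it.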

2. View Test Events

Each connection periodically attempts to run connection tests, and these generate events that can be viewed in the UI.

In the case of REST API-based connections (such as ServiceNow or Cherwell), these tests happen on startup. For webhook-based connections (such as Slack or Microsoft Teams), these checks occur the first time an alert is sent via the connection.

Selecting the connection in the UI and then selecting 'View Events' will show you all of the recent events for the connection type. Any recent errors (such as those shown in the examples below) should be investigated.

Incorrect or absent credentials can cause problems connecting, resulting in an error.

Typos in the URL can also result in connection errors.

The event IDs across all connection types are structured to allow you to quickly narrow down your search to only those that matter at the time.

Event ID	Meaning
224_	The connection test has passed
225_	The connection test has failed
22_1	An Outbound Notification test
22_2	An Inbound Notification test
22_3	An Inbound Maintenance test

Applying this to the examples above, the first event is a failed Outbound Notification test (2251) and the second is a failed Inbound Notification test (2252).
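The event ID structure in the table can be decoded mechanically. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes that every test event ID is four digits starting with 22, with the third digit giving the result and the fourth giving the test type, as the table suggests.

```python
RESULTS = {"4": "passed", "5": "failed"}
TEST_TYPES = {
    "1": "Outbound Notification",
    "2": "Inbound Notification",
    "3": "Inbound Maintenance",
}


def decode_test_event_id(event_id: int) -> tuple[str, str]:
    """Split a four-digit connection test event ID into (result, test type)."""
    text = str(event_id)
    if len(text) != 4 or not text.startswith("22"):
        raise ValueError(f"{event_id} is not a connection test event ID")
    try:
        # Third digit is the result, fourth digit is the test type.
        return RESULTS[text[2]], TEST_TYPES[text[3]]
    except KeyError:
        raise ValueError(f"{event_id} is not a connection test event ID") from None


print(decode_test_event_id(2251))  # ('failed', 'Outbound Notification')
print(decode_test_event_id(2252))  # ('failed', 'Inbound Notification')
```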

Note that this view will only show you test events. Successes/Failures that occur outside of these scenarios will not be shown in this view.

These are SCOM events and are subject to your grooming settings. You may find that the 'View Events' view is empty if no connection test has occurred within the retention period. You may still be able to view these events on the management server in question or via log aggregation tools if you employ them.

3. Check for Connection Alerts

When a connection fails, by default, an alert is raised to notify you of the occurrence. This will be a warning alert and the source will be the 'Connection Hub' of the destination type in question. In the following example, this is the 'ServiceNow Connection Hub'.

Viewing the alert properties and 'Alert Context' in particular can help shed some light on what is going on and can go into further detail than the connection tests.

3a. Check Runas Account/Profile

If you’re seeing authentication/authorization issues in the previous two steps, check that your RunAs accounts and profiles are configured and distributed correctly. Correlate the RunAs profile in the alert with the correct account, and ensure that the account is distributed to the management server named in the test event. We have further guidance on this here.

If you are using a proxy, check whether it requires authentication. We have seen proxies return 401 and 403 errors instead of 407, which implies an authentication issue on the target service and masks the real issue. If your proxy does require authentication, we have a proxy RunAs profile that can be used to provide credentials.
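The reasoning about proxy status codes can be written down as a small decision helper. This Python sketch is only an aid to thinking: the function name and the wording of the suggestions are hypothetical, and it encodes nothing beyond the observation above that a proxy may answer with 401 or 403 where 407 would be expected.

```python
def interpret_auth_failure(status_code: int, uses_proxy: bool) -> str:
    """Suggest where to look first for an HTTP authentication-style failure."""
    if status_code == 407:
        return "proxy requires authentication"
    if status_code in (401, 403):
        if uses_proxy:
            # A proxy may be masking its own 407 as a 401 or 403.
            return "check proxy authentication first, then destination credentials"
        return "check destination credentials and permissions"
    return "not an authentication failure"


print(interpret_auth_failure(401, uses_proxy=True))
```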

4. Disable/Re-enable the Connection

If there have been repeated failures over a long period of time, SCOM can unload the module and stop it from reloading. Disabling the connection, waiting for the configuration MP to be distributed, and then re-enabling the connection will reset this, allowing the connection to start working again (assuming the underlying problem is no longer present).

In high availability scenarios, this might also cause the process to be started on a different management server. If you regularly see failure events from one server and not another, there could be an environmental factor at play.

There are two ways to approach this.

By selecting the ‘Disable Connection’ option from the UI
By modifying the connection and turning the 'Connection is enabled' option off

Under the hood, these behave slightly differently. The former preserves the current connection; the latter destroys it and then recreates it.

For example, if you have an ITSM maintenance period where you temporarily want to stop generating incidents, you should use the Disable Connection option. Any alerts that are changed whilst the connection is disabled will be queued up and sent when the connection is re-enabled using the 'Enable Connection' option.

On the other hand, if you have an alert storm in SCOM, you may wish to modify the connection to change the state. This recreates the connection, effectively resetting the queue. You could achieve the same behavior by using the Disable Connection link and then modifying the connection later to be enabled. Bear in mind that legitimate alerts may get thrown out with the unwanted alerts and that any manual customizations made to the connection via the XML may also be removed.
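The difference between the two approaches can be pictured with a toy model. The Python sketch below is purely illustrative and is not how Connection Center is implemented: the class `ToyConnection` and its methods are hypothetical, and it models only the queueing behavior described above.

```python
class ToyConnection:
    """Illustrative model only; not how Connection Center is implemented."""

    def __init__(self) -> None:
        self.enabled = True
        self.queue: list[str] = []
        self.sent: list[str] = []

    def alert_changed(self, alert: str) -> None:
        # While disabled, changes wait in the queue; otherwise they are sent.
        (self.sent if self.enabled else self.queue).append(alert)

    def disable(self) -> None:
        self.enabled = False

    def enable(self) -> None:
        # Re-enabling flushes whatever was queued while disabled.
        self.enabled = True
        self.sent.extend(self.queue)
        self.queue.clear()

    def modify(self, enabled: bool) -> None:
        # Modifying recreates the connection, so the queue is discarded.
        self.queue = []
        self.enabled = enabled


maintenance = ToyConnection()
maintenance.disable()
maintenance.alert_changed("alert-1")
maintenance.enable()
print(maintenance.sent)  # ['alert-1']

storm = ToyConnection()
storm.disable()
storm.alert_changed("noisy-alert")
storm.modify(enabled=True)
print(storm.sent)  # []
```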

5. View Health Explorer

The Health Explorer can help narrow down the timings of state changes very quickly.

From the Monitoring Tab, select Discovered Inventory, followed by Change Target Type. From the list of all targets search for Cookdown to see all the available classes. If you are interested in one particular connection type, select the relevant connection hub. Otherwise, select 'Cookdown Connection Center Hub' to view all of the available connection types.

You should then be shown all of the available destinations.

Select the destination you are interested in, followed by the 'Health Explorer'. By default, you will be shown any monitors in an error state; however, you can remove the monitor filter to see healthy monitors under the Availability monitor.

Select the monitor you are interested in and then select the 'State Change Events' tab to see what might have caused failures (or been affected by failures).

6. Check Alert History

If you are looking at an individual or small group of alerts you might want to check the ‘Alert History’ from the alert Properties. Connection Center will attempt to update each alert when it makes changes.

There are sometimes underlying issues that can prevent Connection Center from performing the expected tasks. For example, if an object is unavailable and in a 'grey state' the action to reset the monitor will fail. In this case, the history would still be updated to indicate that this has been attempted even if the task is not successful.

7. Find the active Management Server in the pool

From here we need to look at what is going on with individual members of the resource pool. For high availability purposes, there are usually several servers in the resource pool; however, only one server in the pool is ever doing the work assigned.

If you have test events logged, you should be able to work out which server is being used from the 'Logging Computer' property. If you have alerts raised for the connection, some of these are raised against the health service of the management server doing the work, in which case you can use these names to narrow your focus.

Failing that, you can definitively determine the management server using the Operations Manager database with a SQL query adapted from this Catapult Systems blog article:

-- Find the management server assigned to the Connection Center resource pool
SELECT
    BaseManagedEntity.DisplayName
    ,cs.agent.AGentGuid
    ,cs.WorkFlowExecutionLocationAgent.AgentRowId
    ,cs.workflowexecutionlocation.WorkflowExecutionLocationRowId
    ,cs.workflowexecutionlocation.DisplayName
FROM cs.WorkFlowExecutionLocationAgent
INNER JOIN cs.workflowexecutionlocation
    ON cs.WorkFlowExecutionLocationAgent.WorkFlowExecutionLocationAgentRowId = cs.workflowexecutionlocation.WorkflowExecutionLocationRowId
INNER JOIN CS.agent
    ON CS.agent.AgentRowId = cs.WorkFlowExecutionLocationAgent.AgentRowId
INNER JOIN BaseManagedEntity
    ON BaseManagedEntity.BaseManagedEntityId = CS.agent.AGentGuid
-- Restrict the results to the Connection Center pool by its display name
WHERE cs.workflowexecutionlocation.DisplayName = 'Cookdown Connection Center Resource Pool'

Once you know which management server is handling your connections, you can usually narrow your focus to that server.
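If you have exported the test events, the 'Logging Computer' approach mentioned earlier in this step can be automated. The following Python sketch assumes each event has already been loaded into a dictionary; the keys `time` and `logging_computer` and the server names are hypothetical and depend on how you export your events.

```python
from datetime import datetime
from typing import Optional


def latest_logging_computer(events: list[dict]) -> Optional[str]:
    """Return the logging computer of the most recent test event, if any."""
    if not events:
        return None
    # The newest event is the best indicator of which server is doing the work now.
    newest = max(events, key=lambda event: event["time"])
    return newest["logging_computer"]


# Hypothetical sample data for illustration.
sample = [
    {"time": datetime(2024, 1, 1, 9, 0), "logging_computer": "MS01"},
    {"time": datetime(2024, 1, 1, 10, 0), "logging_computer": "MS02"},
]
print(latest_logging_computer(sample))  # MS02
```

The most recent event is used rather than the most frequent server because the work can move between pool members over time.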

8. Check Event Log

Connection Center logs Warning and Error events to the Operations Manager event log as required. Generally, these will come from the 'Cookdown Connection Center' and 'Cookdown Managed Modules' sources, but you may find events from other sources, such as 'HealthService', that relate to Connection Center in the run-up to or aftermath of these events. It can be worth filtering down to Cookdown events to find rough timescales and then looking at that time period more generally.

In our example (event ID 4513 followed by 1103), you can see that a persistent network issue has caused SCOM to force Connection Center to abandon its reconnection attempts. Whilst this is not directly related to the root cause, it will compound the issue.
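The filter-then-widen approach described in this step could be sketched as follows in Python. This is an illustration under assumptions: events are taken to be dictionaries with hypothetical `time` and `source` keys (for example, from an exported log), and the five-minute window is an arbitrary starting point.

```python
from datetime import datetime, timedelta

COOKDOWN_SOURCES = {"Cookdown Connection Center", "Cookdown Managed Modules"}


def events_around_cookdown(events, window=timedelta(minutes=5)):
    """Return every event logged within `window` of any Cookdown-sourced event."""
    # First pass: use Cookdown events to establish rough timescales.
    anchors = [e["time"] for e in events if e["source"] in COOKDOWN_SOURCES]
    # Second pass: keep events from any source that fall near those times.
    return [
        e for e in events
        if any(abs(e["time"] - anchor) <= window for anchor in anchors)
    ]
```

Widening or narrowing `window` trades noise against the chance of missing a related event from another source.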

9. Restart the Health Service

In the previous example, we saw that SCOM can force our connection offline after too many failures. The easiest way to get the connection to reload in this scenario is to restart the health service of the management server looking after the connection. From PowerShell you can do this simply:

Get-Service -Name healthservice | Restart-Service

If you have Kevin Holman's SCOM Management MP, you can do this via an agent task:

And of course, you could do this in person on the server by restarting the service from the Services List.

10. Try a Duplicate/Similar Connection

It can be worth re-creating the connection. This can help highlight any typos that might have crept into the initial connection.

In the case of complex alert criteria, test with simpler criteria first and then build up from there. Resolution states, for example, can cause issues if used in criteria, as they can stop alert updates from being sent if they reach an unexpected state. We go into this particular scenario a bit deeper with some examples in our section about the difference between internal connectors and subscriptions.

11. Restrict Connection Center to a Single Management Server

In the case of intermittent issues or for debug logs it can be worth restricting the Connection Center Resource Pool down to a single server. This will ensure that you have all of your event logs in the same consistent location (especially useful in the case of debug logging). Rotating this pool through the servers manually should allow you to identify if a single server has an issue.

We have detailed instructions on how to achieve this here.

12. Flush the health service cache

This is a more extreme version of the health service restart. It forces the health service to completely unload its processes, clear out any cached information, and restart. With Connection Center, this can be useful if the connection has gotten itself into a strange state. For example, it may be hanging on to expired licenses or it may be trying to process something that it shouldn’t.

If you have Kevin Holman's SCOM Management MP, you can do this via an agent task:

Otherwise, you can do this from the Monitoring Menu:

Select OperationsManager > Management Server > Management Server State > Select Server from Management Server State.

Finally, select 'Flush Health Service State and Cache' from the task menu:

There are also options to do this manually (or via script) by stopping the health service, removing its health service state folder, and restarting the service. But this is outside the scope of this article and not something that we would necessarily recommend doing.

13. Manual Checks

Depending on the destination, you may be able to do some manual checks using tools like Postman or PowerShell to validate the details you have been provided and your connection to the target endpoint. The specifics of how to check these details manually vary by destination, so this is largely covered in the subpages for each destination.
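As a generic starting point for a manual check, the Python sketch below builds a request with credentials and reports the HTTP status it gets back. It is a sketch under assumptions: it presumes the destination accepts HTTP Basic authentication, which may not be true for yours, and the function names and the `example.invalid` URL are hypothetical placeholders.

```python
import base64
import urllib.error
import urllib.request


def build_basic_auth_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying HTTP Basic credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


def check_endpoint(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code, including error statuses."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions; report their code instead.
        return error.code


# Placeholder values; replace them with your own endpoint and credentials.
request = build_basic_auth_request("https://example.invalid/api", "user", "pass")
print(request.get_method(), request.full_url)
```

Calling `check_endpoint(request)` from the management server that runs the connection shows whether the endpoint is reachable with those credentials from that machine.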

14. Enable Debug Logging

Debug logging can place a fairly heavy load on your management server, depending on how busy it is, but it can give a lot more insight into what is going on. If you are going to enable it, we typically recommend that you do so server by server (using your resource pool) and that you increase the size of the Operations Manager log.

This may allow you to catch more specific errors or small details that can point you in the correct direction. Should you need to go down this route, we have further detail about the registry key and value here. If you’re at this stage it’s also probably worth talking to our support team by raising a ticket through the web portal or by emailing support@cookdown.com.

Specific Issues

There are some specific issues that may be encountered during the use of Connection Center or after upgrading from Alert Sync. The following may provide you with a direct solution or workaround.

Orphaned Internal Connectors

Problem

You created an Internal Connector in SCOM using Alert Sync and have removed this MP before cleaning up unused connectors. You may also have created a Connector-based connection in Connection Center using the Advanced UI option and run into a scenario where this has become orphaned.

Solution

Remove the duplicate/unused connector from the SCOM SDK. We have a simple PowerShell script that will enable you to do this by following the steps below:

Open PowerShell
Download our PowerShell script
Run the script
Select the orphaned internal connector to remove and hit OK

SDK/Destination Severity	Console Severity
Error	Critical
Warning	Warning
Information	Information

SDK/Destination Priority	Console Priority
High	High
Normal	Medium
Low	Low
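If you need to reconcile values between a destination and the console in a script, the two tables above translate directly into lookups. This Python sketch is illustrative only; the names are hypothetical and the mappings are copied from the tables.

```python
SEVERITY_TO_CONSOLE = {
    "Error": "Critical",
    "Warning": "Warning",
    "Information": "Information",
}
PRIORITY_TO_CONSOLE = {"High": "High", "Normal": "Medium", "Low": "Low"}


def to_console_labels(severity: str, priority: str) -> tuple[str, str]:
    """Translate SDK/destination severity and priority into console labels."""
    return SEVERITY_TO_CONSOLE[severity], PRIORITY_TO_CONSOLE[priority]


print(to_console_labels("Error", "Normal"))  # ('Critical', 'Medium')
```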
