Downloading diagnostics and logs

Info
Diagnostics are not available for the Apama-ctrl-smartrules and Apama-ctrl-smartrulesmt microservices.

If a user has READ permission for “CEP management”, then two links for downloading diagnostics information are available from the bottom of the Advanced Rules application: one for downloading basic diagnostics information (the Diagnostics link) and another one for downloading enhanced (more resource-intensive) diagnostics information (the Enhanced link). These links are shown at the bottom of the home screen and also on the pages that appear when you go to the Analytics Builder and EPL Apps pages (that is, in the EPL app manager and in the model manager).

It may be useful to capture this diagnostics information when you experience problems or when you debug EPL apps. It is also useful to provide this information to product support when you file a support ticket. You can see a version number next to the links.

Basic diagnostics information is provided in a ZIP file named diagnostic-overview<timestamp>.zip and includes the following information (the file is typically a few megabytes in size and is generated in about 5 seconds):

Enhanced diagnostics information is provided in a ZIP file named diagnostic-enhanced<timestamp>.zip and includes the following information:

What a user can see or do depends on the permissions:

Log files of the Apama-ctrl microservice

There are two ways to get the logs of the Apama-ctrl microservice:

Contact product support if needed.

Diagnostics REST endpoints

Info
These endpoints are not available for the Apama-ctrl-smartrules and Apama-ctrl-smartrulesmt microservices.

The following diagnostics endpoints are available for REST requests. These require authentication as a user with READ permission for “CEP management”:

Monitoring REST endpoints

The following monitoring endpoints are available for REST requests. These require authentication as a valid user, but do not require any special roles.

Alarms generated by the Apama-ctrl microservice

Alarms are created by user applications in the Cloud of Things tenant (for example, by an analytic model, an activated EPL file, or a smart rule). To learn about alarms in general, refer to Device management > Working with alarms in the User guide. The Apama-ctrl microservice also generates alarms of its own when it encounters a problem, so that the user is notified about the situation. The information below covers the alarms that are generated by the Apama-ctrl microservice, their causes, their consequences, and possible ways to resolve them.

Info
Alarms generated by Apama-ctrl about its own state are available as of Analytics Builder 10.5.0 and EPL Apps 10.5.0.

You can view alarms in the following ways:

  1. In the Cockpit application. See Cockpit in the User guide for detailed information.
  2. In the Administration application, under Ecosystem > Microservices. Click the Apama-ctrl microservice and then click Status. See Administration > Managing and monitoring microservices in the User guide for detailed information.
  3. From the Advanced Rules application. Click the Diagnostics (or Enhanced) link which is provided at the bottom of the home screen. A ZIP file is then downloaded that contains alarms information under /alarm/alarms_apama-ctrl-object.json. See Downloading diagnostics and logs for detailed information.
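Once the ZIP file has been downloaded, the alarms file can also be read programmatically. The following is a minimal Python sketch, assuming that the entry is stored under the path named above (without the leading slash) and that it contains valid JSON. The function name load_alarms is hypothetical, and the structure of the JSON content is deliberately not assumed.

```python
import json
import zipfile

# Path of the alarms file inside the diagnostics ZIP, as given in the text
# (ZIP entry names normally omit the leading slash).
ALARMS_ENTRY = "alarm/alarms_apama-ctrl-object.json"


def load_alarms(zip_source, entry=ALARMS_ENTRY):
    """Return the parsed JSON content of the alarms entry.

    zip_source may be a file path or a binary file-like object.
    The structure of the returned JSON is not assumed here.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Tolerate entry names stored with or without a leading slash.
        names = {name.lstrip("/"): name for name in archive.namelist()}
        if entry not in names:
            raise KeyError(f"{entry} not found; entries: {sorted(names)}")
        with archive.open(names[entry]) as handle:
            return json.load(handle)
```

Call load_alarms with the path of the downloaded ZIP file. If the entry is missing, the raised error lists the entries that are actually present, which helps to spot a different layout.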

Alarm severities

Severity    Description
CRITICAL    Apama-ctrl was unable to continue running the user’s applications and will require corrective action.
MAJOR       Apama-ctrl has encountered a situation that will result in some loss of service (for example, due to a restart).
MINOR       Apama-ctrl has a problem that you might want to fix.
WARNING     There is a warning.

Alarms created by the Apama-ctrl microservice

Apama-ctrl can create alarms to notify users in scenarios such as the correlator running out of memory, uncaught exceptions in activated EPL files, and so on. Once you see an alarm in the Cloud of Things tenant, you should diagnose it and resolve it depending on the severity level of the raised alarm. Each alarm has details such as title, text, type, date, and count (the number of times the alarm has been raised).

The following is a list of the alarms. The information further down below explains when these alarms will occur, their consequences, and how to resolve them.

Once the cause of an alarm is resolved, you must acknowledge and clear the alarm in the Cloud of Things tenant. Otherwise, you will continue to see the alarm until a further restart of the Apama-ctrl microservice.

Info
The alarm texts for the alarms below may undergo minor changes in the future.

Change in tenant options and restart of Apama-ctrl

This alarm is raised when a tenant option changes in the analytics.builder or streaminganalytics category. For details on the tenant options, refer to the Tenant API in the Cloud of Things OpenAPI Specification.

Analytics Builder allows you to configure its settings by changing the tenant options, using key names such as numWorkerThreads or status_device_name. For example, if you want to process things in parallel, you can set numWorkerThreads to 3 by sending a REST request to Cloud of Things, which will update the tenant option. Such a change automatically restarts the Apama-ctrl microservice. To notify the users about the restart, Apama-ctrl raises an alarm, saying that changes have been detected in a tenant option and that Apama-ctrl will restart in order to use it.

Once you see this alarm, you can be sure that your change is effective.
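The REST request mentioned above can be sketched in Python as follows. This is an illustration under assumptions, not the definitive request: it assumes that the tenant options endpoint follows the pattern /tenant/options/<category>/<key>, that it accepts a JSON body with a string value, and that basic authentication is used. Verify all of these against the Tenant API specification. The function name and all URL and credential values are hypothetical.

```python
import base64
import json
import urllib.request


def build_tenant_option_request(base_url, category, key, value, user, password):
    """Build (but do not send) a PUT request that sets one tenant option."""
    # Assumed endpoint pattern; check the Tenant API specification.
    url = f"{base_url.rstrip('/')}/tenant/options/{category}/{key}"
    # Assumption: the option value is sent as a string in a "value" field.
    body = json.dumps({"value": str(value)}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


def send(request):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

To set numWorkerThreads to 3, build the request with the category analytics.builder, the key numWorkerThreads and the value 3, and then pass it to send. Keeping the construction separate from the sending makes it possible to inspect the URL and body before anything reaches the tenant.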

Safe mode on startup

This alarm is raised whenever the Apama-ctrl microservice switches to safe mode.

Apama-ctrl detects if it has been repeatedly restarting and if user assets (EPL apps, analytic models, extensions) have been modified recently. In that case, it disables all user assets as a precaution. Potential causes are, for example, an EPL app that consumes more memory than is available or an extension containing bugs.

You can check the mode of the microservice (either normal or safe mode) by making a REST request to service/cep/diagnostics/apamaCtrlStatus (available as of EPL Apps 10.5.7 and Analytics Builder 10.5.7), which contains a safe_mode flag in its response.
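The status check can be sketched in Python as follows. This is a minimal sketch, assuming basic authentication and assuming that safe_mode is a top-level field of the JSON response holding either a boolean or the string "true"/"false". The function names, the base URL and the credentials are hypothetical.

```python
import base64
import json
import urllib.request

# Endpoint named in the text, relative to the tenant URL.
STATUS_PATH = "/service/cep/diagnostics/apamaCtrlStatus"


def is_safe_mode(status):
    """Return True if the parsed status response reports safe mode."""
    flag = status.get("safe_mode", False)
    if isinstance(flag, str):
        # Assumption: a string value would be "true" or "false".
        return flag.strip().lower() == "true"
    return bool(flag)


def fetch_status(base_url, user, password):
    """GET the status endpoint and return the parsed JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        base_url.rstrip("/") + STATUS_PATH,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as is_safe_mode(fetch_status(base_url, user, password)) then returns True if the microservice reports safe mode. A missing flag is treated as normal mode.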

To diagnose the cause of an unexpected restart, you can try the following:

In safe mode, all previously active analytic models and EPL apps are deactivated and must be manually re-activated.

Deactivating models in the Apama-ctrl-starter microservice

This alarm is raised when Apama-ctrl switches from the fully capable microservice to the Apama-ctrl-starter microservice with more than 3 active models.

With the Apama-ctrl-starter microservice, a user can have a maximum of 3 active models. For example, a user is working with the fully capable Apama-ctrl microservice and has 5 active models, and then switches to Apama-ctrl-starter. Since Apama-ctrl-starter does not allow more than 3 active models, it deactivates all 5 active models and raises an alarm to notify the user.
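The rule can be expressed as a short Python sketch. It only models the behavior described above and is not the microservice's implementation. The function name is hypothetical, and the handling of 3 or fewer active models (nothing is deactivated) is an assumption derived from the condition under which the alarm is raised.

```python
STARTER_MODEL_LIMIT = 3  # maximum number of active models, from the text


def active_models_after_switch(active_models, limit=STARTER_MODEL_LIMIT):
    """Return (models still active, whether an alarm is raised)."""
    if len(active_models) > limit:
        # All active models are deactivated, not just the surplus ones.
        return [], True
    # Assumption: at or below the limit, nothing is deactivated.
    return list(active_models), False
```

With 5 active models, the function returns an empty list and True. The point to note is that no subset of 3 models is kept active.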

High memory usage

This alarm is raised whenever the Apama-ctrl microservice consumes 90% of the maximum memory permitted for the microservice container. During this time, the Apama-ctrl microservice automatically generates the diagnostics overview ZIP file which contains diagnostics information used for identifying the most likely cause for memory consumption.

There are 3 variants of this alarm, depending on the time and count restrictions of the generated diagnostics overview ZIP file.

First variant:

Second variant:

Third variant:

Running EPL apps (and to a lesser extent, smart rules and analytic models) consumes memory. The amount depends heavily on the nature of the app that is running. The memory usage should be approximately constant for a given set of apps, but it is possible to create a “memory leak”, particularly in an EPL file or a custom block. The Apama-ctrl microservice monitors memory usage. When the 90% memory limit is reached, it raises an alarm with WARNING severity, generates the diagnostics overview ZIP file, and saves that file to the files repository (as mentioned in the alarm text).

Apama-ctrl generates the diagnostics overview ZIP files with the following conditions:

To diagnose models and EPL apps that consume a lot of memory, you can try the following (possible causes include listener leaks, excessive stored state, leaking spawned monitors, and so on):

If the memory continues to grow, then when it reaches the limit, the correlator will run out of memory and Apama-ctrl will shut down. To prevent the microservice from going down, you must fix this as a priority.

See also Diagnostic tools for Apama in Cloud of Things in DT IoT’s Tech Community.

Warning or higher level logging from an EPL file

This alarm is raised whenever messages are logged by EPL files with specific log levels (including CRITICAL, FATAL, ERROR and WARNING).

The Advanced Rules application allows you to deploy EPL files to the correlator. The Apama-ctrl microservice analyzes the content logged by the EPL files and raises an alarm for specific log levels. The alarm includes details such as the monitor name, the log text and the alarm type (either WARNING or MAJOR, based on the log level).

For example, the following is a simple monitor which logs some texts at different EPL log levels.

monitor Sample{
   action onload() {
      log "Info"; // default log level is now INFO
      log "Fatal Error" at FATAL; // log level is FATAL
      log "Critical Error" at CRIT; // log level is CRITICAL
      log "Warning" at WARN; // log level is WARNING
   }
}

Apama-ctrl analyzes all the log messages, filters out only certain log messages, and raises an alarm for the identified ones. Thus, Apama-ctrl generates the following three alarms for the above example:

First alarm:

Second alarm:

Third alarm:
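The filtering step described above can be sketched in Python as follows. This only illustrates which log entries lead to an alarm, not how Apama-ctrl is implemented. The set of levels is taken from the text, which introduces it with "including", so it may not be complete. The function name is hypothetical, and the mapping from log level to alarm type is not modeled.

```python
# Log levels named in the text as raising alarms.
ALARM_LEVELS = {"CRITICAL", "FATAL", "ERROR", "WARNING"}

# EPL keywords used in the sample, mapped to the level names used in the text.
EPL_LEVEL_NAMES = {"CRIT": "CRITICAL", "WARN": "WARNING"}


def messages_raising_alarms(log_entries):
    """Keep only (level, text) pairs whose level is in ALARM_LEVELS."""
    selected = []
    for level, text in log_entries:
        level = EPL_LEVEL_NAMES.get(level.upper(), level.upper())
        if level in ALARM_LEVELS:
            selected.append((level, text))
    return selected


# The four log statements of the sample monitor.
sample_log = [
    ("INFO", "Info"),
    ("FATAL", "Fatal Error"),
    ("CRIT", "Critical Error"),
    ("WARN", "Warning"),
]
alarm_entries = messages_raising_alarms(sample_log)
```

For the sample monitor, alarm_entries contains the FATAL, CRITICAL and WARNING entries, which corresponds to the three alarms. The INFO entry is filtered out.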

An EPL file throws an uncaught exception

You have seen that the Apama-ctrl microservice raises alarms for logged messages. In addition, there can also be uncaught exceptions (during runtime). Apama-ctrl identifies such exceptions and raises alarms so that you can identify and fix the problem.

For example, the following monitor throws IndexOutOfBoundsException during runtime:

monitor Sample{
   sequence<string> values := ["10", "20", "30"];
   action onload() {
      // IndexOutOfBoundsException (runtime error)
      log "Value = " + values[10] at ERROR;
   }
}

Apama-ctrl generates the following alarm for the above example:

You can diagnose the issue by the monitor name and line number given in the alarm.

For more details, you can also check the Apama logs if the tenant has the “microservice hosting” feature enabled. Alarms of this type should be fixed as a priority as these uncaught exceptions will terminate the execution of that monitor instance, which will typically mean that your app is not going to function correctly. This might even lead to a correlator crash if not handled properly.

An EPL file blocks the correlator context for too long

If an EPL app has an infinite loop, it may block the correlator context for too long, not letting any other apps run in the same context or, even worse, causing excessive memory usage (as the correlator is unable to perform any garbage collection cycles), which leads to the app running out of memory. The Apama-ctrl microservice identifies such scenarios (the correlator logs warning messages if an app is blocking a context for too long) and raises alarms, so that the user can identify and fix the problem.

For example, the following monitor blocks the correlator main context:

event MyEvent {
}

monitor Sample{
    action onload() {
        while true {
            // do something
            send MyEvent() to "foo";
        }
    }
}

Apama-ctrl generates the following alarm for the above example:

You can diagnose the issue by the monitor name and context name given in the alarm.

For more details, you can also check the Apama logs if the tenant has the “microservice hosting” feature enabled. Alarms of this type should be fixed as a priority as these scenarios may lead to the microservice and correlator running out of memory.

EPL app restore timeout on restart of Apama-ctrl

If restoring an EPL app on a restart of the Apama-ctrl microservice takes a long time and exceeds the time limit specified by the recovery.timeoutSecs tenant option (in the streaminganalytics category) or a default of 60 seconds, the Apama-ctrl microservice times out and raises an alarm, indicating that it will restart and reattempt to restore the EPL app. The alarm text includes the names of any EPL apps that are considered to be the reason for the timeout.

The following information is only included in the alarm text if the Apama-ctrl microservice detects that the timeout is due to some EPL apps: “The following EPL apps may be the cause of this: <comma-separated list of app names>.” If no such apps are detected, this information is omitted from the alarm text.

Multiple extensions with the same name

This alarm is raised when the Apama-ctrl microservice tries to activate the deployed extensions during its startup process and there are multiple extensions with the same name.

This disables all extensions that were deployed to Apama-ctrl. In order to use the deployed extensions, the user must decide which extensions to keep and then delete the duplicate ones.

Info
In case of multiple duplicates, this alarm is only listed once.
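When deciding which extensions to delete, it can help to list the names that occur more than once. The following Python sketch does this for a list of extension names. How that list is obtained is not covered here, and the function name is hypothetical.

```python
from collections import Counter


def duplicate_names(extension_names):
    """Return the sorted names that occur more than once in the input."""
    counts = Counter(extension_names)
    return sorted(name for name, count in counts.items() if count > 1)
```

For each name that is returned, keep one extension and delete the others.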

Smart rule configuration failed

This alarm is raised if a smart rule contains an invalid configuration.

To diagnose the cause, download the diagnostics overview ZIP file as described in Downloading diagnostics and logs. Or, if that fails, log on as an administrator and look at the result of a GET request to /service/smartrule/smartrules?withPrivateRules=true. Review the smart rules JSON and look for invalid smart rule configurations. Such smart rules must be corrected.

The Apama microservice log contains more details on the reason for the smart rule configuration failure. For example, it is invalid to configure an “On measurement threshold create alarm” smart rule with a data point that does not exist.
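The GET request for the smart rules can be sketched in Python as follows. This is a minimal sketch, assuming basic authentication with administrator credentials. The function names, the base URL and the credentials are hypothetical, and the structure of the returned JSON is not assumed. The response is only formatted so that it is easier to review.

```python
import base64
import json
import urllib.request

# Request path given in the text, relative to the tenant URL.
SMART_RULES_PATH = "/service/smartrule/smartrules?withPrivateRules=true"


def build_smart_rules_request(base_url, user, password):
    """Build (but do not send) the GET request for the smart rules."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + SMART_RULES_PATH,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def fetch_smart_rules(base_url, user, password):
    """Send the request and return the parsed JSON body."""
    request = build_smart_rules_request(base_url, user, password)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def format_for_review(smart_rules):
    """Return indented JSON with sorted keys for manual review."""
    return json.dumps(smart_rules, indent=2, sort_keys=True)
```

Printing format_for_review(fetch_smart_rules(base_url, user, password)) gives a readable listing in which invalid configurations can be looked for by hand.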

Smart rule restore failed

This alarm is raised if a corrupt smart rule is present in the inventory and the correlator therefore fails to recover it correctly during startup.

To diagnose the cause, download the diagnostics overview ZIP file as described in Downloading diagnostics and logs. Or, if that fails, log on as an administrator and look at the result of a GET request to /service/smartrule/smartrules?withPrivateRules=true. Review the smart rules JSON and look for invalid smart rule configurations. Such smart rules may need to be deleted or corrected.

Connection to correlator lost

This alarm is raised in certain cases when the connection between the Apama-ctrl microservice and the correlator is lost. This should not happen, but can be triggered by high load situations.

Apama-ctrl will automatically restart. Report this to product support if this is happening frequently.

Performance alarms

Input or output queues that are filling up are a symptom of a serious performance degradation, suggesting that events or requests are being produced by Apama or Cloud of Things faster than they can be processed by Apama or Cloud of Things.

The performance of the correlator’s input and output queues is periodically monitored. Different types of alarms can be raised, where the alarm text contains a snapshot of the correlator status at the time of raising the alarm.

This alarm is raised for the input queues:

This alarm is raised for the output queues:

This alarm is raised for both the input and output queues:

See also List of correlator status statistics in the Apama documentation.

Check the text from the above alarms to get an indication of which queue is blocking. A problem is likely to trigger these alarms, followed by this alarm:

This alarm is raised whenever the CEP queue for the respective tenant is full. It originates from Cloud of Things Core, but concerns Apama-ctrl.

Karaf nodes that send events to the CEP engine maintain per-tenant queues for the incoming events. This data gets processed by the CEP engine for the hosted CEP rules. For various reasons, these queues can become full and cannot accommodate newly arriving data. In such cases, an alarm is sent to the platform so that the end users are notified about the situation.

If the CEP queue is full, older events are removed to handle new incoming events. To avoid this, you must diagnose the cause of the queue being full and resolve it as soon as possible.

The CEP queue size is based on the number of CEP events, not raw bytes.
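The effect of a full queue can be illustrated with a short Python sketch. This is a conceptual model only, not the platform's implementation: it uses a bounded deque whose capacity is counted in events, and it records which events are discarded when new ones arrive. The function name is hypothetical.

```python
from collections import deque


def simulate_bounded_queue(events, max_events):
    """Return (final queue content, events dropped because the queue was full)."""
    if max_events < 1:
        raise ValueError("max_events must be at least 1")
    queue = deque(maxlen=max_events)  # capacity counted in events, not bytes
    dropped = []
    for event in events:
        if len(queue) == queue.maxlen:
            # The oldest event is evicted to make room for the new one.
            dropped.append(queue[0])
        queue.append(event)
    return list(queue), dropped
```

With a capacity of 3 and the events 1 to 5, the queue ends up holding 3, 4 and 5, and the events 1 and 2 are lost. This is why the cause of a full queue should be resolved quickly.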

To diagnose the cause, you can try the following. It may be that the Apama-ctrl microservice is running slowly because of time-consuming smart rules, analytic models or EPL apps, because the microservice is deprived of resources, because code is not optimized, and so on. Check the correlator input and output queues from the above alarms (or from the microservice logs or from the diagnostics overview ZIP file under /correlator/status.json).
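The queue-related values can be pulled out of the status file with a short Python sketch. It assumes that the entry is stored as correlator/status.json inside the ZIP file and that it contains a flat JSON object. Because the exact statistic names are not listed here, the sketch selects every key whose name contains "queue" instead of relying on specific names. The function names are hypothetical.

```python
import json
import zipfile

# Path given in the text, without the leading slash.
STATUS_ENTRY = "correlator/status.json"


def load_status(zip_source, entry=STATUS_ENTRY):
    """Return the parsed correlator status from the diagnostics ZIP."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(entry) as handle:
            return json.load(handle)


def queue_statistics(status):
    """Return only the entries whose key mentions a queue."""
    return {key: value for key, value in status.items() if "queue" in key.lower()}
```

Comparing the result of queue_statistics for diagnostics files taken at different times shows whether the input or the output side is growing.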

Parent tenant not subscribed

This alarm is raised for a subtenant that was subscribed before the parent tenant was subscribed.

The Apama-ctrl microservice allows you to subscribe to tenants in any order. However, as long as the parent tenant is not subscribed, the microservice functionality will not work on the subtenant.

This alarm is cleared once the parent tenant is subscribed.