Azure Activity Logs Test

The Activity log is a platform log in Azure that provides insight into subscription-level events. This includes such information as when a resource is modified or when a virtual machine is started. Events of varying severity levels (e.g., critical, warning, and information events) and of different categories (e.g., administrative, service health, and resource health) are logged in the Activity log.

To promptly, and sometimes proactively, capture problem conditions, resolve bottlenecks, and avert potential disasters, administrators need to be alerted as soon as a critical/warning event, a serious health issue, or a crucial operational failure is logged in the Activity log. This is exactly what the Azure Activity Logs test does!

This test monitors the Activity log and reports the count of events logged per severity/category. In the process, the test notifies administrators every time a problem condition is captured by the log. Detailed diagnostics provide additional problem insights to administrators, thereby easing troubleshooting.
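The per-severity and per-category counting described above can be illustrated with a minimal Python sketch. The record layout (the level and category keys) and the sample data are assumptions made purely for illustration; they do not represent the eG agent's internal data model.

```python
from collections import Counter

# Hypothetical sample records. The "level" and "category" keys are
# illustrative assumptions, not the test's actual data model.
SAMPLE_EVENTS = [
    {"level": "Critical", "category": "ServiceHealth"},
    {"level": "Warning", "category": "ResourceHealth"},
    {"level": "Informational", "category": "Administrative"},
    {"level": "Informational", "category": "Administrative"},
]


def count_events(events):
    """Return (per-severity counts, per-category counts) for the events."""
    by_level = Counter(event["level"] for event in events)
    by_category = Counter(event["category"] for event in events)
    return by_level, by_category


def needs_attention(by_level):
    """True when at least one critical, error, or warning event was counted."""
    return any(by_level[level] > 0 for level in ("Critical", "Error", "Warning"))


by_level, by_category = count_events(SAMPLE_EVENTS)
print(dict(by_level))
print(dict(by_category))
print(needs_attention(by_level))
```

A Counter returns 0 for a severity that never occurred, which mirrors the way a measure with no matching events would report 0.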

Note:

To consolidate log entries, correlate log data, and perform complex analysis, the Activity log is often sent to one or more Log Analytics Workspaces. This test reports valid metrics on events by reading data from these Log Analytics Workspaces only. If the Activity log is not sent to any Log Analytics Workspace, then this test will report only the value 0 for all its measures. To avoid this, before configuring this test, make sure that the Activity log is configured to be sent to at least one Log Analytics Workspace. To achieve this, follow the steps discussed in Configuring the Activity Log to be Sent to a Log Analytics Workspace.
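To see why an unconfigured workspace results in zero-valued measures, consider the following Python sketch. It assumes that the Activity log lands in the AzureActivity table of the workspace and that severity is held in the Level column; the query text, the function names, and the look-back window are illustrative assumptions, not the queries that the eG agent actually issues.

```python
# Severity values assumed to appear in the Level column.
LEVELS = ("Critical", "Error", "Warning", "Informational")


def build_activity_query(lookback_minutes=5):
    """Build a KQL query that counts Activity log entries per severity level."""
    if lookback_minutes <= 0:
        raise ValueError("lookback_minutes must be positive")
    return (
        "AzureActivity\n"
        f"| where TimeGenerated > ago({lookback_minutes}m)\n"
        "| summarize EventCount = count() by Level"
    )


def counts_from_rows(rows):
    """Turn (level, count) result rows into a dict with every level present.

    An empty result, which is what a workspace that receives no Activity
    log data would return, yields 0 for every level.
    """
    counts = {level: 0 for level in LEVELS}
    for level, event_count in rows:
        if level in counts:
            counts[level] += event_count
    return counts


print(build_activity_query(10))
print(counts_from_rows([]))
```

Running such a query by hand in the workspace is a quick way to confirm that Activity log data is arriving before the test is configured.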

Target of the Test: A Microsoft Azure Subscription

Agent deploying the test: A remote agent

Output of the test: One set of results for the configured SUBSCRIPTION ID

Configurable parameters for the test
Parameters Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Subscription ID

Specify the GUID which uniquely identifies the Microsoft Azure Subscription to be monitored. To know the ID that maps to the target subscription, do the following:

  1. Log in to the Microsoft Azure Portal.

  2. When the portal opens, click on the Subscriptions option (as indicated by Figure 3).

    Figure 3 : Clicking on the Subscriptions option

  3. Figure 4 that appears next will list all the subscriptions that have been configured for the target Azure AD tenant. Locate the subscription that is being monitored in the list, and check the value displayed for that subscription in the Subscription ID column.

    Figure 4 : Determining the Subscription ID

  4. Copy the Subscription ID in Figure 4 to the text box corresponding to the SUBSCRIPTION ID parameter in the test configuration page.
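Since the Subscription ID is a GUID, a copied value can be sanity-checked before it is pasted into the test configuration page. The helper below is a minimal sketch using Python's standard uuid module; the function name is hypothetical, and the check only verifies the format, not that the subscription exists.

```python
import uuid


def is_valid_subscription_id(value):
    """Return True if value is a GUID in the canonical 8-4-4-4-12 form."""
    if not isinstance(value, str):
        return False
    candidate = value.strip().lower()
    try:
        # uuid.UUID also accepts braces and hyphen-less input, so compare
        # against the canonical rendering to insist on the hyphenated form.
        return str(uuid.UUID(candidate)) == candidate
    except ValueError:
        return False


print(is_valid_subscription_id("12345678-1234-1234-1234-123456789abc"))
print(is_valid_subscription_id("not-a-guid"))
```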

Tenant ID

Specify the Directory ID of the Azure AD tenant to which the target subscription belongs. To know how to determine the Directory ID, refer to Configuring the eG Agent to Monitor a Microsoft Azure Subscription Using Azure ARM REST API.

Client ID, Client Password, and Confirm Password

To connect to the target subscription, the eG agent requires an Access token in the form of an Application ID and the client secret value. For this purpose, you should register a new application with the Azure AD tenant. To know how to create such an application and determine its Application ID and client secret, refer to Configuring the eG Agent to Monitor a Microsoft Azure Subscription Using Azure ARM REST API. Specify the Application ID of the created Application in the Client ID text box and the client secret value in the Client Password text box. Confirm the Client Password by retyping it in the Confirm Password text box.
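For readers who want to verify the Tenant ID, Client ID, and client secret independently, the sketch below builds (but does not send) a token request using the standard OAuth 2.0 client credentials flow of the Microsoft identity platform. It is an assumption that the eG agent obtains its token in an equivalent way; the function name and the choice of the Azure Resource Manager scope are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret):
    """Build a client-credentials token request without sending it."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,          # Application ID
            "client_secret": client_secret,  # client secret value
            # Assumed scope: Azure Resource Manager.
            "scope": "https://management.azure.com/.default",
        }
    ).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values only; substitute the real IDs when testing.
request = build_token_request("my-tenant-id", "my-app-id", "my-secret")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would return a JSON document containing the access token if the three values are correct; a 401 response usually points to a wrong secret or Application ID.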

Proxy Host and Proxy Port

In some environments, all communication with the Azure cloud could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.

Proxy Username, Proxy Password and Confirm Password

If the proxy server requires authentication, specify a valid proxy user name and password in the Proxy Username and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box.
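The way the four proxy parameters combine can be sketched in Python as follows. The helper names are hypothetical, and treating none as "no proxy" simply mirrors the default described above; how the eG agent itself applies these settings is not documented here.

```python
from urllib.parse import quote
from urllib.request import ProxyHandler, build_opener


def build_proxy_url(host, port, username=None, password=None):
    """Return a proxy URL, or None when the parameters are left at 'none'."""
    if str(host).strip().lower() == "none" or str(port).strip().lower() == "none":
        return None
    credentials = ""
    if username:
        # Percent-encode credentials so characters such as '@' stay unambiguous.
        credentials = quote(username, safe="")
        if password:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return f"http://{credentials}{host}:{port}"


def make_opener(proxy_url):
    """Build a urllib opener that uses the proxy, or no proxy at all."""
    if proxy_url is None:
        return build_opener(ProxyHandler({}))  # empty dict disables proxies
    return build_opener(ProxyHandler({"http": proxy_url, "https": proxy_url}))


print(build_proxy_url("none", "none"))
print(build_proxy_url("10.0.0.5", 8080))
```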

Log Analytics Workspace Name

Typically, the Activity log is sent to a Log Analytics Workspace to:

  • Correlate Activity log data with other monitoring data collected by Azure Monitor.

  • Consolidate log entries from multiple Azure subscriptions and tenants into one location for analysis together.

  • Use log queries to perform complex analysis and gain deep insights on Activity Log entries.

  • Use log alerts with Activity entries allowing for more complex alerting logic.

  • Store Activity log entries for longer than the Activity Log retention period.

By default, the Log Analytics Workspace Name parameter is set to All. This indicates that the test reads event data from all Log Analytics Workspaces configured for the target subscription. However, if you want the test to use only those Log Analytics Workspaces to which the Activity logs have been sent, then provide the names of these workspaces here as a comma-separated list. To determine the names of the workspaces, do the following:

  1. Log in to the Microsoft Azure Portal and select the Activity log option (as shown below).

Selecting the Activity log option

  2. Figure 5 will then appear listing the log entries. Next, click on the Diagnostic settings button indicated by Figure 5.

    Figure 5 : Clicking on the Diagnostic settings button

  3. Figure 6 will then appear. From the Subscription drop-down in Figure 6, select the Azure subscription being monitored currently. The diagnostic settings that pre-exist for the chosen subscription will then appear. If any of the existing diagnostic settings have already been configured with Log Analytics Workspaces, then the Log Analytics workspace column highlighted in Figure 6 will display these workspace names. You can configure the LOG ANALYTICS WORKSPACE NAME parameter of this test with any of these workspace names. If required, you can even configure this parameter with two or more of the workspaces listed in Figure 6, as a comma-separated list.

    Figure 6 : List of workspace names

However, if the Log Analytics workspace column in Figure 6 is blank for all the existing diagnostic settings, it is a clear indication that the Activity log is yet to be configured to be sent to any Log Analytics Workspace. In this case, you should create a new diagnostic setting for the target subscription, where a Log Analytics Workspace is configured as the destination for the Activity log. To achieve this, follow the procedure detailed in Configuring the Activity Log to be Sent to a Log Analytics Workspace.
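The two forms that the Log Analytics Workspace Name parameter can take, All or a comma-separated list, can be resolved as in the Python sketch below. The function name and the decision to reject names that are not among the known workspaces are assumptions made for illustration; the test's own handling of unknown names is not documented here.

```python
def select_workspaces(parameter_value, available):
    """Resolve the parameter value to a list of workspace names.

    'available' is the list of workspaces known for the subscription.
    """
    value = parameter_value.strip()
    if value.lower() == "all":
        return list(available)
    requested = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in requested if name not in available]
    if unknown:
        # Design choice of this sketch: fail loudly on a mistyped name.
        raise ValueError("unknown workspace(s): " + ", ".join(unknown))
    return requested


# Hypothetical workspace names.
known = ["ops-workspace", "audit-workspace", "dev-workspace"]
print(select_workspaces("All", known))
print(select_workspaces("ops-workspace, audit-workspace", known))
```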

Exclude Operations

By default, this test does not monitor list and view operations logged in the Activity log. Accordingly, this parameter is set to *list*|*view*, by default. In a typical Azure organization, 'list' and 'view' operations are very commonplace, and numerous events related to such operations will be logged. If these operations are monitored and detailed analytics related to them are periodically stored in the eG database, the stored data can grow uncontrollably over time and may even end up choking the database. This is why the 'list' and 'view' operations are excluded from monitoring by default. If required, you can exclude more operations from monitoring by appending their patterns as a pipe-separated list. For instance, if you want to exclude 'restart' operations and 'health update' operations from monitoring, then your complete Exclude Operations specification will be: *list*|*view*|*restart*|*health*
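To preview which operations a given Exclude Operations specification would filter out, the patterns can be tried against sample operation names. The Python sketch below assumes that each pattern is a wildcard (glob-style) expression and that matching is case-insensitive; both are assumptions about the test's matching rules, and the operation names are examples only.

```python
from fnmatch import fnmatchcase


def is_excluded(operation_name, exclude_spec="*list*|*view*"):
    """Return True if the operation matches any pattern in the specification."""
    name = operation_name.lower()
    patterns = [p.strip().lower() for p in exclude_spec.split("|") if p.strip()]
    return any(fnmatchcase(name, pattern) for pattern in patterns)


extended_spec = "*list*|*view*|*restart*|*health*"
for operation in (
    "Microsoft.Storage/storageAccounts/listKeys/action",
    "Microsoft.Compute/virtualMachines/write",
    "Microsoft.Compute/virtualMachines/restart/action",
):
    print(operation, is_excluded(operation), is_excluded(operation, extended_spec))
```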

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability

  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
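The DD Frequency format and the two conditions above can be expressed as a small Python sketch. The function names are hypothetical, and the second condition is read here as "the normal and abnormal frequencies are not both 0"; that reading is an assumption, as is the treatment of none as disabling the capability.

```python
def parse_dd_frequency(value):
    """Return (normal, abnormal) as integers, or None when set to 'none'."""
    text = value.strip().lower()
    if text == "none":
        return None
    normal, separator, abnormal = text.partition(":")
    if not separator:
        raise ValueError("expected 'normal:abnormal' or 'none'")
    return int(normal), int(abnormal)


def dd_toggle_available(dd_frequency, license_allows_dd):
    """Apply the two conditions listed above (as interpreted in this sketch)."""
    frequency = parse_dd_frequency(dd_frequency)
    if frequency is None or not license_allows_dd:
        return False
    normal, abnormal = frequency
    return not (normal == 0 and abnormal == 0)


print(parse_dd_frequency("1:1"))
print(dd_toggle_available("1:1", license_allows_dd=True))
print(dd_toggle_available("0:0", license_allows_dd=True))
```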

Measures made by the test:
Measurement Description Measurement Unit Interpretation

Number of critical events

Indicates the number of critical events logged in the activity log.

Number

Critical events are events that demand the immediate attention of a system administrator. The incidence of a critical event may indicate that an application or system has failed or stopped responding.

Ideally therefore, the value of this measure should be 0. If a non-zero value is reported, then use the detailed diagnosis of this measure to view the complete details of the critical events.

Number of error events

Indicates the number of error events logged in the activity log.

Number

Error events are events that indicate a problem, but do not require immediate attention.

Ideally therefore, the value of this measure should be 0. If a non-zero value is reported, then use the detailed diagnosis of this measure to view the complete details of the error events.

Number of warning events

Indicates the number of warning events logged in the activity log.

Number

Warning events are those that provide forewarning of potential problems. Such events indicate that a resource is not in an ideal state and may degrade later into showing errors or critical events.

Ideally therefore, the value of this measure should be 0. If a non-zero value is reported, then use the detailed diagnosis of this measure to view the complete details of the warning events. Studying these events closely may proactively alert you to probable problem situations.

Number of information events

Indicates the number of information events logged in the activity log.

Number

Information events are those that pass non-critical information to the administrator - similar to a note that says: "For your information".

Use the detailed diagnosis of this measure to view the complete details of information events logged in the activity log.

Number of administrative events

Indicates the number of events in the activity log that belong to the Administrative category.

Number

The Administrative category contains the record of all create, update, delete, and action operations performed through Resource Manager. Examples of Administrative events include create virtual machine and delete network security group.

Every action taken by a user or application using Resource Manager is modeled as an operation on a particular resource type. If the operation type is Write, Delete, or Action, the records of both the start and success or fail of that operation are recorded in the Administrative category. Administrative events also include any changes to Azure role-based access control in a subscription.

Number of policy events

Indicates the number of events in the activity log that belong to the Policy category.

Number

The Policy category contains records of all effect action operations performed by Azure Policy. Examples of Policy events include Audit and Deny. Every action taken by Policy is modeled as an operation on a resource.

Number of security events

Indicates the number of events in the activity log that belong to the Security category.

Number

The Security category contains the record of alerts generated by Microsoft Defender for Cloud. An example of a Security event is Suspicious double extension file executed.

Number of service health events

Indicates the number of events in the activity log that belong to the Service Health category.

Number

The Service Health category contains the record of any service health incidents that have occurred in Azure. An example of a Service Health event is SQL Azure in East US is experiencing downtime.

Service Health events come in six varieties: Action Required, Assisted Recovery, Incident, Maintenance, Information, or Security. These events are only created if you have a resource in the subscription that would be impacted by the event.

Number of resource health events

Indicates the number of events in the activity log that belong to the Resource Health category.

Number

The Resource Health category contains the record of any resource health events that have occurred to your Azure resources. An example of a Resource Health event is Virtual Machine health status changed to unavailable.

Resource Health events can represent one of four health statuses: Available, Unavailable, Degraded, and Unknown. Additionally, Resource Health events can be categorized as being Platform Initiated or User Initiated.

Number of alert events

Indicates the number of events in the activity log that belong to the Alert category.

Number

The Alert category contains the record of activations for Azure alerts. An example of an Alert event is CPU % on myVM has been over 80 for the past 5 minutes.

Number of autoscale events

Indicates the number of events in the activity log that belong to the Autoscale category.

Number

The Autoscale category contains the record of any events related to the operation of the autoscale engine based on any autoscale settings you have defined in your subscription. An example of an Autoscale event is Autoscale scale up action failed.

Number of recommendation events

Indicates the number of events in the activity log that belong to the Recommendation category.

Number

The Recommendation category contains recommendation events from Azure Advisor.

Number of resource create/write operations

Indicates the number of resource create/write operations logged in the activity log.

Number

 

Number of failed events

Indicates the number of events logged in the activity log that indicate operational failures.

Number

Ideally, the value of this measure should be 0.

Number of succeeded resource create/write operations

Indicates the number of resource create/write operations that succeeded.

Number

 

Number of failed resource create/write operations

Indicates the number of resource create/write operations that failed.

Number

Use the detailed diagnosis of this measure to know which resource create/write operations failed.

Number of autoscale up events

Indicates the number of autoscale up events that were logged in the Activity log.

Number

Scaling up and down, or vertical scaling, keeps the number of resources constant, but gives those resources more capacity in terms of memory, CPU speed, disk space and network. Vertical scaling is limited by the availability of larger hardware, which eventually reaches an upper limit.

To know when resources were scaled up/down and why, use the detailed diagnosis of the Number of autoscale up events or Number of autoscale down events measure (as the case may be).

Number of autoscale down events

Indicates the number of autoscale down events that were logged in the Activity log.

Number

Resource delete operations

Indicates the number of resource delete events that were logged in the Activity log.

Number

 

Succeeded resource delete operations

Indicates the number of resource delete operations that succeeded.

Number

Use the detailed diagnosis of this measure to know which delete operations succeeded.

Failed resource delete operations

Indicates the number of resource delete operations that failed.

Number

Use the detailed diagnosis of this measure to know which delete operations failed.
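The relationship between the create/write and delete measures and their succeeded/failed variants can be illustrated with the Python sketch below. It assumes that an operation's type can be read from the last segment of its name (write or delete) and that its outcome is recorded as Success or Failure; these conventions, the helper names, and the sample records are illustrative assumptions rather than the test's actual classification logic.

```python
from collections import Counter

# Hypothetical (operation name, status) records.
SAMPLE_RECORDS = [
    ("Microsoft.Compute/virtualMachines/write", "Success"),
    ("Microsoft.Compute/virtualMachines/write", "Failure"),
    ("Microsoft.Network/networkSecurityGroups/delete", "Success"),
    ("Microsoft.Compute/virtualMachines/restart/action", "Success"),
]


def classify(operation_name, status):
    """Map a record to (kind, outcome), or None if it is out of scope."""
    operation = operation_name.lower()
    if operation.endswith("/write"):
        kind = "create/write"
    elif operation.endswith("/delete"):
        kind = "delete"
    else:
        return None
    outcome = {"success": "succeeded", "failure": "failed"}.get(status.lower())
    return (kind, outcome) if outcome else None


def tally(records):
    """Count records per (kind, outcome) bucket."""
    counts = Counter()
    for operation_name, status in records:
        bucket = classify(operation_name, status)
        if bucket is not None:
            counts[bucket] += 1
    return counts


for bucket, count in sorted(tally(SAMPLE_RECORDS).items()):
    print(bucket, count)
```

In this sketch, records with other statuses (for instance, an operation that has only started) are left out of the succeeded and failed buckets.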