Hyper-V VM Heartbeat Status Test

User access to a VM can be disrupted by many factors. A slow or broken network link can delay or deny user access to a VM. Besides such external network connectivity issues, a user may be unable to reach a VM owing to internal issues as well, such as a VM lock, a VM crash, or a sudden termination of the VM’s operations. This is why, when a user complains of being unable to access a VM, the administrator needs to quickly determine the reason for the inaccessibility, so that the correct remedial action can be initiated and access to the VM swiftly restored.

The Hyper-V VM Heartbeat Status test periodically monitors the heartbeat service installed on each VM and reports whether or not that service, and the VM it is operating on, are functioning properly. The heartbeat service allows the parent partition to detect when a virtual machine has locked up, crashed, or otherwise ceased to function. The parent partition sends heartbeat messages to the guest operating system at regular intervals. The Hyper-V Heartbeat Service installed on the guest operating system is then responsible for responding to each of these messages. When the parent partition fails to receive responses from the child partition, it assumes that the child's Heartbeat Service, and therefore the guest operating system on which it is running, has encountered problems.

By closely monitoring the heartbeat service, this test enables administrators to determine whether or not internal issues (e.g., a VM lock or a VM crash) are affecting the accessibility of a VM. If the test reports that the heartbeat service and the VM it is installed on are up and running, the administrator can safely conclude that internal factors are not responsible for the unavailability of that VM; the investigation can then move on to other possible causes.
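The request/response exchange described above can be sketched in a few lines of Python. This is an illustrative model only, not Hyper-V's actual implementation: the class name `HeartbeatMonitor`, the `missed_limit` threshold, and the rule for deciding when a VM has stopped responding are all hypothetical and are introduced purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy model of a parent partition tracking heartbeat replies from VMs."""

    missed_limit: int = 3  # hypothetical: unanswered heartbeats tolerated
    missed: dict = field(default_factory=dict)  # VM name -> consecutive misses

    def record(self, vm: str, replied: bool) -> None:
        """Record the outcome of one heartbeat message sent to a VM."""
        if replied:
            self.missed[vm] = 0  # any reply resets the miss counter
        else:
            self.missed[vm] = self.missed.get(vm, 0) + 1

    def status(self, vm: str) -> str:
        """Classify a VM from its run of unanswered heartbeats."""
        if vm not in self.missed:
            return "Lost contact"  # the VM has not been contacted yet
        if self.missed[vm] >= self.missed_limit:
            return "Lost communication"
        return "OK"


monitor = HeartbeatMonitor(missed_limit=3)
for replied in (True, False, False, False):
    monitor.record("vm-a", replied)
monitor.record("vm-b", True)

for name in ("vm-a", "vm-b", "vm-c"):
    print(name, monitor.status(name))
```

In this sketch a single reply clears the miss counter, so a VM is only flagged after several consecutive unanswered heartbeats rather than after one dropped message.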

Target of the test : A Hyper-V / Hyper-V VDI server

Agent executing the test : An internal agent

Output of the test : One set of results for the Hyper-V host monitored

Configurable parameters for the test
  1. Test period - How often the test should be executed.
  2. Host - The host for which the test is to be configured.
  3. Detailed Diagnosis - To make diagnosis more efficient and accurate, the eG Enterprise suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

    The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

    • The eG manager license should allow the detailed diagnosis capability.
    • Neither the normal nor the abnormal frequency configured for the detailed diagnosis measures should be 0.
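The two conditions above amount to a simple conjunction, sketched here in Python. The function and parameter names are hypothetical, and representing the frequencies as plain integers is an assumption made for illustration; this is not how the eG manager itself evaluates them.

```python
def dd_option_available(license_allows_dd: bool,
                        normal_freq: int,
                        abnormal_freq: int) -> bool:
    """Return True when the detailed diagnosis On/Off option would be offered."""
    # Both conditions must hold: the license permits detailed diagnosis,
    # and neither the normal nor the abnormal frequency is 0.
    return license_allows_dd and normal_freq != 0 and abnormal_freq != 0


# Hypothetical settings: licensed, but the normal frequency is set to 0.
print(dd_option_available(True, 0, 1))
```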
Measurements reported by the test
Measurement : OK status
Description : Indicates the number of VMs on which the heartbeat service is operating normally.
Measurement Unit : Number
Interpretation : A high value is desired for this measure. A low value indicates that the parent partition is unable to communicate with the heartbeat service on many VMs; this in turn implies that many VMs are currently unreachable. If this is the case, you will have to figure out why those VMs are unavailable and initiate the required corrective action. You can use the detailed diagnosis of this measure to find out which VMs are operating normally.

Measurement : Error status
Description : Indicates the number of VMs that do not support a compatible protocol version.
Measurement Unit : Number
Interpretation : Ideally, the value of this measure should be low. You can use the detailed diagnosis of this measure to find out which VMs have encountered errors.

Measurement : Lost contact status
Description : Indicates the number of VMs on which the heartbeat service has not been installed yet or has not yet been contacted by the parent partition.
Measurement Unit : Number
Interpretation : Ideally, the value of this measure should be 0. You can use the detailed diagnosis of this measure to find out on which VMs the heartbeat service has not been installed or is yet to be contacted.

Measurement : Lost communication status
Description : Indicates the number of VMs on which the heartbeat service is not responding to the heartbeat messages sent by the parent partition.
Measurement Unit : Number
Interpretation : Ideally, the value of this measure should be 0. A high value indicates that the heartbeat service on many VMs is not responding to heartbeat messages. This could be owing to a VM lock, a VM crash, or any other activity that can temporarily or permanently suspend VM operations. You can use the detailed diagnosis of this measure to find out which VMs the parent partition is unable to communicate with.

Measurement : Unknown status
Description : Indicates the number of VMs that have been powered off.
Measurement Unit : Number
Interpretation : If a VM is powered off, the parent partition will not be able to contact the heartbeat service on that VM at all. This, too, can cause user access to that VM to be denied. You can use the detailed diagnosis of this measure to find out which VMs are in an Unknown state.
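As a rough illustration of how the five measures relate to per-VM heartbeat states, the sketch below tallies a set of states into the counts this test reports. The numeric codes are an assumption based on the `OperationalStatus` values commonly documented for the Hyper-V `Msvm_HeartbeatComponent` WMI class, and counting every other code as Unknown is a simplification of this sketch; verify both against your environment. The function name and sample data are hypothetical, and this is not a description of how the eG agent collects these measures.

```python
from collections import Counter

# ASSUMPTION: numeric codes as commonly documented for the OperationalStatus
# property of the Hyper-V Msvm_HeartbeatComponent WMI class; verify before use.
STATUS_TO_MEASURE = {
    2: "OK status",
    7: "Error status",
    12: "Lost contact status",
    13: "Lost communication status",
}
MEASURES = [
    "OK status",
    "Error status",
    "Lost contact status",
    "Lost communication status",
    "Unknown status",
]


def tally_heartbeat_measures(vm_status: dict) -> dict:
    """Count VMs per measure; codes not in the table count as Unknown."""
    counts = Counter(
        STATUS_TO_MEASURE.get(code, "Unknown status")
        for code in vm_status.values()
    )
    # Report every measure, including those with a count of zero.
    return {measure: counts.get(measure, 0) for measure in MEASURES}


# Hypothetical sample: VM name -> heartbeat status code.
sample = {"vm-a": 2, "vm-b": 2, "vm-c": 13, "vm-d": 12, "vm-e": 0}
for measure, count in tally_heartbeat_measures(sample).items():
    print(f"{measure}: {count}")
```

Reporting zero counts explicitly keeps the output shape stable, which makes it easier to compare successive runs or to alert when Lost contact or Lost communication rises above 0.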

The detailed diagnosis of the OK status measure reveals the VMs that are currently operating normally.

Figure 1 : The detailed diagnosis of the OK status measure