Network - ESX Test

This test reports key statistics pertaining to the network traffic to and from every network interface supported by the ESX server host.

Target of the test : An ESX server host

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for every network interface supported by the ESX server host monitored

Configurable parameters for the test
  1. Test period - How often the test should be executed.
  2. Host - The host for which the test is to be configured.
  3. port - The port at which the specified host listens. By default, this is NULL.
  4. esx user and esx password - In order to enable the test to extract the desired metrics from a target ESX server, you need to configure the test with an ESX USER and ESX PASSWORD. The user credentials to be passed here depend upon the mechanism used by the eG agent for collecting performance statistics from the ESX server and its VMs. These monitoring methodologies and their corresponding configuration requirements have been discussed hereunder:

    • Monitoring using the web services interface of the ESX server: Starting with ESX server 3.0, a VMware ESX server offers a web services interface through which the eG agent collects metrics from the ESX server. The agent uses the VMware VI SDK to implement the web services interface. To use this interface for monitoring, this test should be configured with an ESX USER who has “Read-only” privileges to the target ESX server. By default, the root user is authorized to execute the test. However, it is preferable that you create a new user on the target ESX host and assign the “Read-only” role to that user. The steps for achieving this are discussed in the Increasing the Memory Settings of the eG Agent that Monitors ESX Servers section.

      ESX servers terminate user sessions based on timeout periods. The default timeout period is 30 mins. When you stop an agent, sessions currently in use by the agent will remain open for this timeout period until ESX times out the session. If the agent is restarted within the timeout period, it will open a new set of sessions. If you want the eG agent to close already existing sessions before it opens new sessions, then you would have to configure all the tests with the credentials of an ESX user with permissions to View and stop sessions (prior to vSphere/ESX server 4.1, this was called the View and Terminate Sessions privilege). To know how to grant this permission to an ESX user, refer to Creating a Special Role on an ESX Server and Assigning the Role to a New User to the Server section.

      Sometimes, the VMware VI SDK may cache the hardware status metrics it collects and provide the test with the cached results. This may cause the eG agent to receive obsolete hardware status information from the SDK, which is why you may at times notice a mismatch between the hardware status reported by the eG agent and by the vSphere client. To ensure that the eG agent always reports the current hardware status, configure the eG agent to obtain the hardware metrics from the VMware VI SDK only after the SDK resets the cache to clear its contents and then refreshes the cache so that the latest hardware status information is fetched into it. To enable the eG agent to make the reset and refresh SDK calls, the esx user and esx password parameters must be configured with the credentials of a vSphere user with the Change Settings privilege. To do this, create a special role on vSphere, assign the Change Settings privilege to that role, and then map the role to a new user on vSphere. The procedure is detailed in the Configuring the eG Agent to Collect Current Hardware Status Metrics section.

    • Monitoring using the vCenter in the target environment: By default, the eG agent connects to each ESX server and collects metrics from it. While this approach scales well, it requires additional configuration for each server being monitored. For example, separate user accounts may need to be created on each server for read-only access to VM details. While monitoring large virtualized installations however, the agents can be optionally configured to monitor ESX servers using the statistics already available with different vCenter installations in the environment.

    In this case, the ESX USER and ESX PASSWORD that you specify should be those of an Administrator or Virtual Machine Administrator in vCenter. However, if security constraints prevent you from using the credentials of such users, you can create a special role on vCenter with ‘Read-only’ privileges.

    Refer to Assigning the ‘Read-Only’ Role to a Local/Domain User to vCenter section to know how to create a user on vCenter.

    If the ESX server for which this test is being configured had been discovered via vCenter, then the eG manager automatically populates the esx user and esx password text boxes with the vCenter user credentials using which the ESX discovery was performed.

    Like ESX servers, vCenter servers too terminate user sessions based on timeout periods. The default timeout period is 30 mins. When you stop an agent, sessions currently in use by the agent will remain open for this timeout period until vCenter times out the session. If the agent is restarted within the timeout period, it will open a new set of sessions. If you want the eG agent to close already existing sessions before it opens new sessions, then you would have to configure all the tests with the credentials of a vCenter user with permissions to View and stop sessions (prior to vCenter 4.1, this was called the View and Terminate Sessions permission). To know how to grant this permission to a user to vCenter, refer to Creating a Special Role on vCenter and Assigning the Role to a Local/Domain User section. When the eG agent is started/restarted, it first attempts to connect to the vCenter server and terminate all existing sessions for the user whose credentials have been provided for the tests.

    This is done to ensure that unnecessary sessions do not remain established in the vCenter server for the session timeout period.  Ideally, you should create a separate user account with the required credentials and use this for the test configurations. If you provide the credentials for an existing user for the test configuration, when the eG agent starts/restarts, it will close all existing sessions for this user (including sessions you may have opened using the Virtual Infrastructure client). Hence, in this case, you may notice that your VI client sessions are terminated when the eG agent starts/restarts.

    Sometimes, the VMware VI SDK may cache the hardware status metrics it collects and provide the test with the cached results. This may cause the eG agent to receive obsolete hardware status information from the SDK, which is why you may at times notice a mismatch between the hardware status reported by the eG agent and by the vSphere client. To ensure that the eG agent always reports the current hardware status, configure the eG agent to obtain the hardware metrics from the VMware VI SDK only after the SDK resets the cache to clear its contents and then refreshes the cache so that the latest hardware status information is fetched into it. To enable the eG agent to make the reset and refresh SDK calls, the esx user and esx password parameters must be configured with the credentials of a vCenter user with the Change Settings privilege. To do this, create a special role on vCenter, assign the Change Settings privilege to that role, and then map the role to a new user on vCenter. The procedure is detailed in the Configuring the eG Agent to Collect Current Hardware Status Metrics section.

  5. confirm password - Confirm the password by retyping it here.
  6. ssl - By default, the ESX server is SSL-enabled. Accordingly, the SSL flag is set to Yes by default. This indicates that the eG agent will communicate with the ESX server via HTTPS by default.

    Like the ESX server, vCenter is also SSL-enabled by default. If you have chosen to use vCenter for monitoring, then you have to set the SSL flag to Yes.

  7. webport - By default, in most virtualized environments, the vSphere/ESX server and vCenter listen on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled). This implies that while monitoring an SSL-enabled vSphere/ESX server directly, the eG agent, by default, connects to port 443 of the vSphere/ESX server to pull out metrics, and while monitoring a non-SSL-enabled server, the eG agent connects to port 80. Similarly, while monitoring a vSphere/ESX server via an SSL-enabled vCenter, the eG agent connects to port 443 of vCenter to pull out the metrics, and while monitoring via a non-SSL-enabled vCenter, the eG agent connects to port 80 of vCenter. 

    Accordingly, the webport parameter is set to 80 or 443 depending upon the status of the ssl flag.  In some environments however, the default ports 80 or 443 might not apply. In such a case, against the webport parameter, you can specify the exact port at which the vSphere/ESX server or vCenter in your environment listens so that the eG agent communicates with that port.

  8. VIRTUAL CENTER - If the eG manager had discovered the target ESX server by connecting to vCenter, then the IP address of the vCenter server used for discovering this ESX server is automatically displayed against the VIRTUAL CENTER parameter; similarly, the esx user and esx password text boxes are automatically populated with the vCenter user credentials with which ESX discovery was performed.

    If this ESX server has not been discovered using vCenter, but you still want to monitor the ESX server via vCenter, then select the IP address of the vCenter host that you wish to use for monitoring the ESX server from the VIRTUAL CENTER list. By default, this list is populated with the IP addresses of all vCenter hosts that were added to the eG Enterprise system at the time of discovery. Upon selection, the esx user and esx password that were pre-configured for that vCenter server are automatically displayed against the respective text boxes.

    If the IP address of the vCenter server of interest to you is not available in the list, you can add the details of the vCenter server on the fly by selecting the Other option from the VIRTUAL CENTER list. This will invoke the add vcenter server details page. Refer to the Adding the Details of a vCenter Server for VM Discovery section to know how to add a vCenter server using this page. Once the vCenter server is added, its IP address, esx user, and esx password will be displayed against the corresponding text boxes.

    On the other hand, if you want the eG agent to behave in the default manner (i.e., communicate with each ESX server directly to monitor it), then set the VIRTUAL CENTER parameter to ‘none’. In this case, the ESX USER and ESX PASSWORD parameters can be configured with the credentials of a user who has at least ‘Read-only’ privileges to the target ESX server.

  9. reportactiveonly - By default, this test reports metrics for only those network interfaces that are connected to a vSwitch. Accordingly, the reportactiveonly flag is set to Yes by default. To ensure that the test reports metrics for all network interfaces, regardless of whether or not they are connected to a vSwitch, set this flag to No.
  10. DETAILED DIAGNOSIS - To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

    The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

    • The eG manager license should allow the detailed diagnosis capability
    • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
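
The way the ssl, webport, VIRTUAL CENTER, and reportactiveonly parameters interact can be pictured with a short sketch. The Python below is illustrative only: the EsxTestConfig and Nic classes, their field names, and the helper functions are hypothetical and are not part of eG Enterprise. Only the defaults (port 443 with SSL, port 80 without, ‘none’ for direct monitoring, and reporting only vSwitch-connected interfaces) are taken from the parameter descriptions above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EsxTestConfig:
    """Hypothetical holder for a few of the test parameters (not an eG API)."""
    host: str
    ssl: bool = True                 # the SSL flag is Yes by default
    webport: Optional[int] = None    # None = use the default port for the ssl flag
    virtual_center: str = "none"     # 'none' = connect to each ESX server directly
    report_active_only: bool = True  # Yes = only NICs connected to a vSwitch


@dataclass
class Nic:
    """Hypothetical NIC record; vswitch is None when the NIC is not connected."""
    name: str
    vswitch: Optional[str] = None


def effective_webport(cfg: EsxTestConfig) -> int:
    """Return the explicit webport if set, else 443 (SSL) or 80 (no SSL)."""
    if cfg.webport is not None:
        return cfg.webport
    return 443 if cfg.ssl else 80


def connection_target(cfg: EsxTestConfig) -> Tuple[str, str, int]:
    """Return (scheme, server, port) that would be contacted in this sketch."""
    via_vcenter = cfg.virtual_center.strip().lower() != "none"
    server = cfg.virtual_center if via_vcenter else cfg.host
    scheme = "https" if cfg.ssl else "http"
    return scheme, server, effective_webport(cfg)


def nics_to_report(cfg: EsxTestConfig, nics: List[Nic]) -> List[str]:
    """Apply the reportactiveonly rule to a list of NIC records."""
    if not cfg.report_active_only:
        return [nic.name for nic in nics]
    return [nic.name for nic in nics if nic.vswitch]


if __name__ == "__main__":
    direct = EsxTestConfig(host="192.168.10.5")
    via_vc = EsxTestConfig(host="192.168.10.5", virtual_center="192.168.10.2")
    print(connection_target(direct))
    print(connection_target(via_vc))
```

In this sketch an explicitly configured webport always wins over the ssl-derived default, which mirrors the case where the server or vCenter listens on a non-default port.
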
Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Network packets transmitted:

Indicates the rate at which packets were transmitted by this network interface.

Packets/Sec

An increase in traffic from the server can indicate an increase in accesses from the server (from users on the ESX server, or from the virtual machines on the server).

Use the detailed diagnosis of the Network packets transmitted measure to know the rate at which packets were transmitted by each VM on the target virtual host via this NIC. Likewise, you can use the detailed diagnosis of the Network data transmitted measure to know how much data was transmitted per second by every VM on this hypervisor via this NIC.

From these detailed metrics, you can precisely pinpoint the VM that is imposing the maximum load on this NIC.  

Network data transmitted:

Indicates the rate at which data was transmitted by this network interface.

Mbps

Network packets received:

Indicates the rate at which data packets were received by this network interface.

Packets/Sec

An increase in traffic to the server can indicate an increase in accesses to the server (from users or from other applications external to the server) or that the server is under an attack of some form.

Use the detailed diagnosis of the Network packets received measure to know the rate at which packets were received by each VM on the target virtual host via this NIC. Likewise, you can use the detailed diagnosis of the Network data received measure to know how much data was received per second by every VM on this hypervisor via this NIC.

From these detailed metrics, you can precisely pinpoint the VM that is imposing the maximum load on this NIC.  

Network data received:

Indicates the rate at which data was received by this network interface.

Mbps

Status:

Indicates whether the network interface card is available or not.

 

If this measure reports the value Up, the NIC is accessible. If it reports the value Down, the NIC cannot be accessed.

The numeric values that correspond to each of the states discussed above are listed in the table below:

State    Value

Up       100

Down     0

Note:

By default, this measure reports the value Up or Down to indicate the availability of the NIC. The graph of this measure, however, represents the status of a NIC using the numeric equivalents: 100 or 0.

Speed:

Indicates the current speed of this network interface card.

Mbps

Compare the value of this measure across NICs to identify the slowest NIC and to assess network usage across NICs. You can also use the value of this measure to predict the maximum throughput that can be reached.

Packets dropped during transmission:

Indicates the rate at which packets were dropped by this interface during transmission.

Packets/Sec

A low value is typically desired for this measure.

During transmission dropped packets:

Indicates the percentage of packets that were dropped by this network interface during transmissions.

Percent

A very high value of this measure is a cause for concern, as it could indicate network congestion.

Packets dropped during reception:

Indicates the rate at which packets were dropped by this interface during reception.

Packets/Sec

A low value is typically desired for this measure.

During reception dropped packets:

Indicates the percentage of packets that were dropped by this network interface during reception.

Percent

A very high value of this measure is a cause for concern, as it could indicate network congestion.

Network IOPS:

Indicates the rate at which packets were received and transmitted by this network interface.

Packets/Sec

A high value is desired for this measure. A consistent drop in the value of this measure is a cause for concern, as it may indicate a strain on the network interface that is causing it to slow down.

Network throughput:

Indicates the rate at which data was received and transmitted by this network interface.

MB/Sec

A high value is desired for this measure. A consistent drop in the value of this measure is a cause for concern, as it may indicate a strain on the network interface that is causing it to slow down.

Unknown protocol frames received:

Indicates the rate at which packets with unknown protocol were received by this network interface.

Packets/Sec

For packet-oriented interfaces, this measure will report the number of packets received via the interface which were discarded because of an unknown or unsupported protocol. For character-oriented or fixed-length interfaces that support protocol multiplexing, this measure reports the number of transmission units received via the interface which were discarded because of an unknown or unsupported protocol. For any interface that does not support protocol multiplexing, this counter will always be 0.

Multicast packets received:

Indicates the rate at which multicast packets were received by this interface.

Packets/Sec

Multicast is the term used to describe communication where a piece of information is sent from one or more points to a set of other points. In this case there may be one or more senders, and the information is distributed to a set of receivers (there may be no receivers, or any other number of receivers).

Multicasting is the networking technique of delivering the same packet simultaneously to a group of clients.

Multicast packets transmitted:

Indicates the rate at which multicast packets were transmitted by this interface.

Packets/Sec

Broadcast packets received:

Indicates the rate at which broadcast packets were received by this interface.

Packets/Sec

Broadcast is the term used to describe communication where a piece of information is sent from one point to all other points. In this case there is just one sender, but the information is sent to all connected receivers.

Broadcast transmission is supported on most LANs (e.g. Ethernet), and may be used to send the same message to all computers on the LAN (e.g. the Address Resolution Protocol (ARP) uses this to send an address resolution query to all computers on a LAN). Network layer protocols (such as IPv4) also support a form of broadcast that allows the same packet to be sent to every system in a logical network (in IPv4 this consists of the IP network ID and an all-1s host number).

Broadcast packets transmitted:

Indicates the rate at which broadcast packets were sent by this interface.

Packets/Sec

Packets received with errors:

Indicates the rate at which packets with errors were received by this interface.

Packets/Sec

 

Ideally, the value of these measures should be 0.

Packets transmitted with errors:

Indicates the rate at which packets with errors were transmitted by this interface.

Packets/Sec
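
Several of the measures above are related to one another, and a small sketch may help when cross-checking values. The Python below is a hedged illustration, not the eG agent's actual computation. It assumes that the dropped-packet percentage uses dropped plus successfully handled packets as its denominator, that Network IOPS is the sum of the received and transmitted packet rates, and that Mbps means megabits per second when converting to MB/Sec. All function names are hypothetical.

```python
# Numeric equivalents of the Status measure, as listed in the state table.
STATUS_VALUES = {"Up": 100, "Down": 0}


def dropped_percent(dropped_rate: float, handled_rate: float) -> float:
    """Percentage of packets dropped.

    ASSUMPTION: the denominator is dropped + successfully handled packets;
    the documentation does not state the exact formula.
    """
    total = dropped_rate + handled_rate
    if total <= 0:
        return 0.0
    return 100.0 * dropped_rate / total


def network_iops(rx_packets_per_sec: float, tx_packets_per_sec: float) -> float:
    """Combined packet rate (ASSUMED to be received + transmitted)."""
    return rx_packets_per_sec + tx_packets_per_sec


def throughput_mb_per_sec(rx_mbps: float, tx_mbps: float) -> float:
    """Combined data rate in MB/Sec, ASSUMING Mbps = megabits per second."""
    return (rx_mbps + tx_mbps) / 8.0


if __name__ == "__main__":
    print(STATUS_VALUES["Up"], STATUS_VALUES["Down"])
    print(dropped_percent(dropped_rate=5, handled_rate=95))
    print(network_iops(1200, 800))
    print(throughput_mb_per_sec(40, 24))
```

Comparing a computed throughput against the Speed measure of the same NIC gives a rough sense of how close the interface is to its ceiling.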

To know the network health across all network interfaces supported by the ESX server, just click on the descriptor with the name of the vSphere/ESXi server being monitored. For instance, in the case of Figure 1, click on the descriptor esx3i.chn.egurkha.com to view the following measure:

Measurement Description Measurement Unit Interpretation

Network usage:

Indicates the rate at which data was transmitted and received across all the NIC instances of the host.

Mbps

Use the detailed diagnosis of this measure to know the data usage per VM on the monitored hypervisor. From this, you can quickly identify the VM that is making maximum use of the network resources of the vSphere/ESX server.
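
As a rough illustration of how the host-level Network usage value and its per-VM breakdown might be read together, the sketch below sums per-NIC rates and picks the busiest VM. It assumes that host usage is the sum of transmitted and received rates over all NICs, which the description suggests but does not state explicitly. The function names and sample data are hypothetical.

```python
from typing import Dict, Optional, Tuple


def host_network_usage(per_nic_mbps: Dict[str, Tuple[float, float]]) -> float:
    """Sum (transmitted, received) Mbps over all NICs.

    ASSUMPTION: host-level usage is a plain sum across NIC instances.
    """
    return sum(tx + rx for tx, rx in per_nic_mbps.values())


def busiest_vm(per_vm_mbps: Dict[str, float]) -> Optional[str]:
    """Return the VM with the highest usage, or None if there is no data."""
    if not per_vm_mbps:
        return None
    return max(per_vm_mbps, key=lambda vm: per_vm_mbps[vm])


if __name__ == "__main__":
    # Hypothetical sample values, not real measurements.
    nics = {"vmnic0": (10.0, 5.0), "vmnic1": (2.5, 2.5)}
    vms = {"vm-a": 3.0, "vm-b": 12.5, "vm-c": 4.5}
    print(host_network_usage(nics))
    print(busiest_vm(vms))
```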