Citrix Web Servers Test

If users accessing a web server complain of slowness, administrators must be able to quickly determine the cause: is it a processing bottleneck on the web server, or a latent server network? The Citrix Web Servers test points administrators to the source of the slowness. This test tracks requests to each web server managed by NetScaler and reports the time each server takes to process the requests. The test can thus proactively alert administrators if any web server is responding very slowly to client requests. The test also indicates whether the slowdown experienced by the user can be attributed to a latent server-side network. This way, the test helps administrators identify slow servers and rapidly isolate the reason for the slowness, so that the problem can be fixed quickly.

Target of the test : An AppFlow-enabled NetScaler appliance

Agent deploying the test : A remote agent

Outputs of the test : One set of results for every web server that hosts web applications managed by a NetScaler appliance

Configurable parameters for the test
Parameter Description

Test period

How often the test should be executed. It is recommended that you set the test period to 5 minutes, because the eG AppFlow Collector is capable of capturing and aggregating AppFlow data related to the last 5 minutes only.

Host

The host for which the test is to be configured.

Cluster IPs

This parameter applies only if the NetScaler appliance being monitored is part of a NetScaler cluster. In this case, configure this parameter with a comma-separated list of IP addresses of all other nodes in that cluster.

If the monitored NetScaler appliance is down/unreachable, then the eG AppFlow Collector uses the Cluster IPs configuration to figure out which other node in the cluster it should connect to for pulling AppFlow statistics. Typically, the collector attempts to connect to every IP address that is configured against Cluster IPs, in the same sequence in which they are specified. Metrics are pulled from the first cluster node that the collector successfully establishes a connection with.
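The failover order described above can be pictured with a minimal Python sketch. This is not the eG AppFlow Collector's code: the function names, the TCP probe, and the port number are all hypothetical, chosen only to illustrate "try each IP in the configured sequence and use the first one that connects".

```python
import socket
from typing import Callable, Optional


def tcp_reachable(ip: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Hypothetical probe: True if a TCP connection to ip:port succeeds.

    The port is an assumption; the source does not say how the collector connects.
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_cluster_node(
    cluster_ips: str,
    can_connect: Callable[[str], bool] = tcp_reachable,
) -> Optional[str]:
    """Return the first node in the comma-separated list that accepts a connection."""
    for ip in (part.strip() for part in cluster_ips.split(",")):
        if ip and can_connect(ip):
            return ip  # first success wins; later nodes are not tried
    return None  # none of the configured nodes was reachable
```

Passing a custom `can_connect` callable makes the ordering easy to check without touching the network, for example `pick_cluster_node("10.0.0.2, 10.0.0.3", can_connect=lambda ip: ip == "10.0.0.3")`.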

Enable Logs

This flag is set to No by default. This means that, by default, the eG agent does not create AppFlow logs. You can set this flag to Yes to enable AppFlow logging. If this is done, then the eG agent automatically writes the raw AppFlow records it reads from the collector into individual CSV files. These CSV files are stored in the <EG_AGENT_INSTALL_DIR>\NetFlow\data\<IP_of_Monitored_NetScaler>\webappflow\actual_csv folder on the eG agent host. These CSV files provide administrators with granular insights into the web appflows, thereby enabling effective troubleshooting.

Note:

By default, the eG agent creates a maximum of 10 CSV files in the actual_csv folder. Beyond this point, the older CSV files will be automatically deleted by the eG agent to accommodate new files with current data. Likewise, a single CSV file can by default contain a maximum of 99999 records only. If the records to be written exceed this default value, then the eG agent automatically creates another CSV file to write the data.
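The rotation rules in this note can be modeled in a few lines of Python. The sketch below is an in-memory illustration only, under the assumption that the agent starts a new file when the current one is full and then deletes the oldest file once the file limit is exceeded; the class and attribute names are hypothetical and do not reflect the eG agent's implementation.

```python
from collections import deque
from typing import Deque, List


class CsvRotationModel:
    """In-memory model of the rotation rules; not the eG agent's actual code."""

    def __init__(self, retention_count: int = 10, max_records_per_file: int = 99999) -> None:
        self.retention_count = retention_count
        self.max_records_per_file = max_records_per_file
        self.files: Deque[List[str]] = deque()  # oldest file first

    def write(self, record: str) -> None:
        # Start a new file when none exists or the current one is full.
        if not self.files or len(self.files[-1]) >= self.max_records_per_file:
            self.files.append([])
            # Delete the oldest files once the retention limit is exceeded.
            while len(self.files) > self.retention_count:
                self.files.popleft()
        self.files[-1].append(record)
```

With the defaults, the model never holds more than 10 files of at most 99999 records each, so only the most recent flow records remain available.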

If required, you can override these default settings. For this, do the following:

  1. Log in to the eG agent host.
  2. Edit the Netflow.Properties file in the <EG_AGENT_INSTALL_DIR>\NetFlow\config directory.
  3. In the file, look for the parameter, csv_file_retention_count.
  4. This is the parameter that governs the maximum number of CSV files that can be created in the actual_csv folder. By default, this parameter is set to 10. If you want to retain more CSV files at any given point in time, increase the value of this parameter. If you want to retain fewer CSV files, decrease the value of this parameter.
  5. Next, look for the parameter, csv_max_flow_record_per_file.
  6. This is the parameter that governs the number of flow records that can be written to a single CSV. By default, this parameter is set to 99999. If you want a single file to accommodate more records, so that the creation of new CSVs is delayed, then increase the value of this parameter. On the other hand, if you want to reduce the capacity of a CSV file, so that new CSVs are quickly created, then decrease the value of this parameter.
  7. Finally, save the file.
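If this edit needs to be repeated on several agent hosts, the steps above could be scripted. The following Python sketch assumes, without confirmation from the source, that Netflow.Properties stores each setting as a `key=value` line; the helper name `set_property` is hypothetical. It works on the file's text only, so reading and writing the file is left to the caller.

```python
def set_property(text: str, key: str, value: int) -> str:
    """Return properties text with `key` set to `value`, appending the key if absent.

    Assumes one key=value pair per line (an assumption about the file format).
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"  # replace the existing setting
            break
    else:
        lines.append(f"{key}={value}")  # key not found: add it at the end
    return "\n".join(lines) + "\n"
```

For example, `set_property(original_text, "csv_file_retention_count", 20)` returns the same text with only that parameter changed. Back up the file before writing the result to it.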

Show Top N Servers

By default, this is set to Yes. This means that, by default, the test will report metrics for only the top web servers (in terms of number of hits or bandwidth usage). In this case, only the most-used or the most bandwidth-intensive web servers (depending upon the option chosen against the Show Top N Servers By parameter) will be the descriptors of this test. If you want the test to report metrics for all web servers, then set this flag to No.

Show Top N Servers By

By default, this parameter is set to Hits. This means that, by default, the test will report metrics for only those web servers that have been used the most. If required, you can configure the test to report metrics for those web servers that are bandwidth-intensive. For that, set this parameter to Bandwidth.

Top N Servers Limit

By default, this is set to 10. This denotes that the test will report metrics for the top-10 web servers only (in terms of number of hits or bandwidth usage, depending upon the Show Top N Servers By parameter setting). You can change the 'N' in top-N by specifying a higher or a lower value here.
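The combined effect of the three Top N server parameters can be sketched as follows. This Python example is illustrative only: the function name, the dictionary layout, and the `hits` and `bandwidth_kb` keys are hypothetical, and the ordering returned when all servers are shown is an assumption.

```python
from typing import Dict, List


def top_n_servers(
    stats: Dict[str, Dict[str, float]],
    by: str = "Hits",          # mirrors Show Top N Servers By
    limit: int = 10,           # mirrors Top N Servers Limit
    show_top_n: bool = True,   # mirrors Show Top N Servers
) -> List[str]:
    """Return the server names that would be reported under these settings."""
    if by not in ("Hits", "Bandwidth"):
        raise ValueError("by must be 'Hits' or 'Bandwidth'")
    metric = "hits" if by == "Hits" else "bandwidth_kb"
    # Rank servers in descending order of the chosen metric.
    ranked = sorted(stats, key=lambda name: stats[name][metric], reverse=True)
    return ranked[:limit] if show_top_n else ranked
```

Called with the defaults, the function keeps the 10 servers with the most hits; passing `by="Bandwidth"` ranks by bandwidth instead, and `show_top_n=False` returns every server.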

Show Top N in DD

By default, this flag is set to Yes. This indicates that, by default, the detailed diagnosis of this test will display the details of only the top requests for a server (in terms of the number of hits or bandwidth usage, depending upon the Sort DD Data By setting). If you set this flag to No, then detailed diagnosis will provide the details of all requests.

Sort DD Data By

By default, this test sorts the detailed diagnostics it reports in the descending order of those HTTP request method:response status pairs that have seen the maximum hits. Accordingly, the Hits option is by default chosen against this parameter. Detailed diagnosis so sorted will point you to those server requests that frequently returned error responses. If required, you can sort the detailed diagnostics in the descending order of bandwidth usage, so you can quickly identify those requests that resulted in bandwidth-intensive responses. For this, choose the Bandwidth option against this parameter.

Top N DD Limit

This parameter applies only if the Show Top N in DD flag is set to 'Yes'.

By default, this parameter is set to 10, indicating that the detailed diagnostics will report the top-10 HTTP request method:response status pairs (in terms of the number of hits or bandwidth usage, depending upon the Sort DD Data By setting). You can change the 'N' in Top N by specifying any number of your choice in this text box.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD Frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Neither the normal nor the abnormal frequency configured for the detailed diagnosis measures should be 0.

 

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Hits

Indicates the number of requests received by this web server.

Number

This is a good indicator of the load on the web server. 

Compare the value of this measure across web servers to know which server is receiving the maximum number of requests. If a single server appears to be servicing a significantly larger number of requests than the rest, it could imply that the server is overloaded. This in turn may indicate that a faulty/ineffective load-balancing algorithm is in use.

Use the detailed diagnosis of this measure to identify the bandwidth-intensive requests to the web server and requests that have often failed/resulted in error responses.
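The cross-server comparison suggested for this measure can be automated once the Hits values are exported. The Python sketch below flags a server whose hit count is well above the average of the other servers. The function name and the factor of 2 are illustrative assumptions, not thresholds defined by the test.

```python
from statistics import mean
from typing import Dict, List


def overloaded_servers(hits: Dict[str, int], factor: float = 2.0) -> List[str]:
    """Return servers whose hits exceed `factor` times the mean of the other servers."""
    flagged = []
    for name, count in hits.items():
        others = [c for other, c in hits.items() if other != name]
        # A lone server has nothing to be compared against, so it is never flagged.
        if others and count > factor * mean(others):
            flagged.append(name)
    return flagged
```

A server flagged this way is only a candidate for investigation; whether the imbalance is a problem depends on the load-balancing method configured on the NetScaler.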

Bandwidth

Indicates the total amount of data received by this web server.

KB

Compare the value of this measure across web servers to know which server is consuming bandwidth excessively.

Server processing time

Indicates the elapsed time from when the server starts to receive the first byte of a request from the NetScaler appliance until the NetScaler appliance receives the first byte of the response.

msecs

A high value for this measure indicates that the web server is processing requests slowly.

Compare the value of this measure across web servers to isolate the slowest web server.

In the event that a user complains of slowness, you can compare the value of this measure with that of the Server avg latency measure to determine what is causing the slowness: the poor processing power of the web server, or a latent server network.

Server avg latency

Indicates the average latency caused by the server network.

msecs

A high value for this measure indicates that the server network is latent.

Compare the value of this measure across web servers to know which server’s network is the slowest.

In the event that a user complains of slowness, you can compare the value of this measure with that of the Server processing time measure to determine what is causing the slowness: the poor processing power of the web server, or a latent server network.
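The comparison of Server processing time and Server avg latency can be expressed as a small decision rule. The Python sketch below is a hypothetical illustration: the function name and both threshold values are assumptions and should be replaced with values that suit your environment.

```python
def likely_cause(
    processing_ms: float,
    latency_ms: float,
    processing_threshold_ms: float = 500.0,  # illustrative assumption
    latency_threshold_ms: float = 100.0,     # illustrative assumption
) -> str:
    """Classify slowness for one web server from its two time measures (in msecs)."""
    slow_processing = processing_ms >= processing_threshold_ms
    slow_network = latency_ms >= latency_threshold_ms
    if slow_processing and slow_network:
        return "both"
    if slow_processing:
        return "server processing"
    if slow_network:
        return "server network"
    return "neither"
```

For instance, a server reporting 900 msecs of processing time and 20 msecs of latency would be classified as "server processing", pointing to the web server rather than the network.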

The detailed diagnosis of the Hits measure groups server requests on the basis of the HTTP request method and response status of the requests. For each unique HTTP request method:response status pair, the detailed diagnosis reveals the client from which the requests were received, the OS of the client, the device used for sending the requests, and the web server to which the requests were sent. The detailed diagnostics also report the number of hits, bandwidth usage, and responsiveness of each HTTP request method:response status pair. In the process, the test points to request methods that often resulted in error responses, request methods that took too long to be serviced, and the probable cause for the poor responsiveness: did the server take too long to process requests of that type, or is the slowness owing to a latent server network?
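The grouping and sorting behavior of the detailed diagnosis can be approximated with a short Python sketch. It covers only the hits and bandwidth columns; the client, OS, device, and responsiveness details are omitted. The record layout and function name are hypothetical and do not represent the format of the AppFlow records or CSV files.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, int, float]  # (HTTP method, response status, bandwidth in KB)


def detailed_diagnosis(
    records: Iterable[Record],
    sort_by: str = "Hits",        # mirrors Sort DD Data By
    top_n: Optional[int] = 10,    # mirrors Top N DD Limit; None shows all pairs
) -> List[Tuple[str, int, float]]:
    """Return (method:status, hits, bandwidth_kb) rows in descending order."""
    if sort_by not in ("Hits", "Bandwidth"):
        raise ValueError("sort_by must be 'Hits' or 'Bandwidth'")
    totals = defaultdict(lambda: [0, 0.0])
    for method, status, kb in records:
        entry = totals[f"{method}:{status}"]
        entry[0] += 1    # one more hit for this pair
        entry[1] += kb   # accumulate bandwidth for this pair
    column = 1 if sort_by == "Hits" else 2
    rows = sorted(
        ((pair, hits, kb) for pair, (hits, kb) in totals.items()),
        key=lambda row: row[column],
        reverse=True,
    )
    return rows if top_n is None else rows[:top_n]
```

Sorting by hits surfaces pairs such as GET:404 or POST:500 when they occur frequently, while sorting by bandwidth surfaces the pairs that produced the largest responses.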

Figure 1 : The detailed diagnosis of the Hits measure reported by the Citrix Web Servers test