Distributed Cache Usage Analytics Test

The Distributed Cache service, which is built on Windows Server AppFabric Cache, is set to run in a collocated mode on all SharePoint 2013 Servers by default. It's essential for maintaining the large amounts of information on your SharePoint Server, ensuring that the information is fresh and readily available for the end user.

Caching functionalities, provided by the Distributed Cache service, enable web applications deployed on SharePoint to quickly retrieve data without any dependency on databases stored in SQL Server, as everything is stored in memory.

Any SharePoint server in the farm running the Distributed Cache service is known as a cache host. Cache size is the memory allocated to the Distributed Cache service on the cache host.

At any given point in time, sufficient memory resources should be available to the Distributed Cache service to ensure optimum cache usage and to assure SharePoint users of a satisfactory experience with their web applications. In the absence of adequate memory, cache lookups will be delayed or even missed, thus affecting overall SharePoint performance and adversely impacting the health of user interactions with the web applications.

It is hence imperative that administrators keep an eye on the usage of the cache service by each dependent web application, rapidly detect unexpected slowness in cache reads and writes, capture cache misses, and figure out if such anomalies are owing to improper sizing of the Distributed Cache service. This is what the Distributed Cache Usage Analytics test helps administrators do.

This test queries the SharePoint Logging database and retrieves from it metrics revealing how each web application uses the distributed cache. The metrics so collected reveal the following:

  • Is any web application reading from and/or writing to the cache slowly? If so, which host is slow?
  • Is any web application overloading the cache with read/write requests?
  • Which web application is experiencing many cache misses?
  • Are there any cache failures? If so, which web application failed to read from or write to the cache?

This way, the test brings cache usage and sizing irregularities to light, pinpoints the exact web application that is being impacted by these abnormalities, and thus prompts administrators to right-size the cache to ensure peak application performance.
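
The per-web-application grouping described above can be illustrated with a minimal Python sketch. It does not reflect how the test is actually implemented: the record layout, the field names (web_app, op, duration_ms), and the sample values are all assumptions made for illustration, since the Logging database schema is not described here.

```python
from collections import defaultdict

# Hypothetical sample records; the real Logging database schema is not shown
# in this document, so these field names and values are assumptions.
SAMPLE_ROWS = [
    {"web_app": "Portal", "op": "read", "duration_ms": 4.0},
    {"web_app": "Portal", "op": "read", "duration_ms": 6.0},
    {"web_app": "Portal", "op": "write", "duration_ms": 10.0},
    {"web_app": "Intranet", "op": "read", "duration_ms": 2.0},
]


def summarize_cache_usage(rows):
    """Return read/write counts and average durations per web application."""
    # For each web application keep [count, total duration] per operation.
    totals = defaultdict(lambda: {"read": [0, 0.0], "write": [0, 0.0]})
    for row in rows:
        bucket = totals[row["web_app"]][row["op"]]
        bucket[0] += 1
        bucket[1] += row["duration_ms"]

    summary = {}
    for app, ops in totals.items():
        reads, read_ms = ops["read"]
        writes, write_ms = ops["write"]
        summary[app] = {
            "reads": reads,
            "avg_read_ms": read_ms / reads if reads else 0.0,
            "writes": writes,
            "avg_write_ms": write_ms / writes if writes else 0.0,
        }
    return summary


if __name__ == "__main__":
    for app, stats in summarize_cache_usage(SAMPLE_ROWS).items():
        print(app, stats)
```

Grouping by web application first is what allows a slow or overloaded application to be singled out, rather than only seeing farm-wide averages.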

Note:

  • This test will run only if a SharePoint Usage and Health Service application is created and is configured to collect usage and health data. To know how to create and configure this application, follow the steps detailed in Configuring the eG Agent to Collect Usage Analytics.
  • This test is not applicable when the target server is a Microsoft SharePoint 2010 server.

Target of the test : A Microsoft SharePoint Server

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for each web application in the monitored SharePoint server

Configurable parameters for the test
Parameters Description

Test period

This indicates how often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port at which the host server listens.

SQL Port Number

Specify the port number of the SQL server that hosts the SharePoint Logging database.

Instance

If the SQL server that hosts the SharePoint Logging database is instance-based, then provide the instance name here. If not, then set this to none.

SSL

If the SQL server hosting the SharePoint Logging database is SSL-enabled, then set this flag to Yes. If not, set it to No.

Isntlmv2

In some Windows networks, NTLM (NT LAN Manager) may be enabled. NTLM is a suite of Microsoft security protocols that provides authentication, integrity, and confidentiality to users. NTLM version 2 (“NTLMv2”) was introduced to address the security issues present in NTLM. By default, the Isntlmv2 flag is set to No, indicating that NTLMv2 is not enabled by default on the SQL server that hosts the SharePoint Logging database. Set this flag to Yes if NTLMv2 is enabled on that SQL server.

Database Domain

Specify the fully qualified name of the domain in which the Microsoft SQL server hosting the SharePoint Logging database operates. For instance, your specification can be: SharePoint.eginnovations.com

Database server Name

Specify the name of the Microsoft SQL server that hosts the SharePoint Logging database to be accessed by this test.

Database Name

Specify the name of the SharePoint Logging database that this test should access.

Database User Name, Database Password, Confirm Password

In the Database User Name and Database Password text boxes, specify the credentials of a user who has db_datareader access to the configured SharePoint Logging database. Then, confirm the password by retyping it in the Confirm Password text box.

URL patterns to be ignored from monitoring

By default, this test does not track requests to the following URL patterns: *.js,*.css,*.jpeg,*.jpg,*.png,*.asmx,*.ashx,*.svc,*.dlll. If required, you can remove one or more patterns from this default list so that such patterns are monitored, or append more patterns to this list in order to exclude them from monitoring. For instance, to additionally ignore URLs that end with .gif and .bmp when monitoring, you need to alter the default specification as follows: *.js,*.css,*.jpeg,*.jpg,*.png,*.asmx,*.ashx,*.svc,*.dlll,*.gif,*.bmp
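
As an illustration only, the effect of such a comma-separated pattern list can be approximated in Python with shell-style wildcard matching. The function name is_ignored is hypothetical, and both the use of fnmatch and the case-insensitive comparison are assumptions; the exact matching rules the test applies are not documented here.

```python
from fnmatch import fnmatch

# Default list copied verbatim from the parameter description.
DEFAULT_IGNORE = "*.js,*.css,*.jpeg,*.jpg,*.png,*.asmx,*.ashx,*.svc,*.dlll"


def is_ignored(url, patterns=DEFAULT_IGNORE):
    """Return True if the URL matches any pattern in the comma-separated list.

    Matching is case-insensitive here, which is an assumption.
    """
    url = url.lower()
    return any(
        fnmatch(url, pattern.strip().lower())
        for pattern in patterns.split(",")
        if pattern.strip()
    )


if __name__ == "__main__":
    extended = DEFAULT_IGNORE + ",*.gif,*.bmp"
    print(is_ignored("/sites/hr/scripts/app.js"))         # matched by *.js
    print(is_ignored("/sites/hr/images/logo.gif"))        # not in the default list
    print(is_ignored("/sites/hr/images/logo.gif", extended))
```

Appending a pattern widens the set of ignored requests, while removing one brings those requests back into monitoring.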

Ignore AjaxDelta Pages

By default, this test ignores all requests to AjaxDelta pages. This is why the Ignore AjaxDelta Pages flag is set to Yes by default. If you want the test to track requests to the AjaxDelta pages as well, set this flag to No.

Fetch Farm Measures

Typically, farm-level metrics (e.g., metrics on farm status, site collections, and usage analytics) will not vary from one SharePoint server in the farm to another. If these metrics are collected and stored in the eG database for each monitored server in the SharePoint farm, they are bound to unnecessarily consume space in the database and increase processing overheads. To avoid this, farm-level metrics collection is by default switched off for the member servers in the SharePoint farm, and enabled only if the server being monitored is provisioned as the Central Administration site. Accordingly, this parameter is set to If Central Administration by default. This default setting ensures that farm-level metrics are collected from and stored in the database for only a single SharePoint server in the farm.

If you want to completely switch off farm-level metrics collection for a SharePoint farm, then set this parameter to No.

Some high-security environments may not allow an eG agent to be deployed on the Central Administration site. Administrators of such environments may however require farm-level insights into status and performance. To provide these insights for such environments, you can optionally enable farm-level metrics collection from any monitored member server in the farm, even if that server is not provisioned as the Central Administration site. For this, set this parameter to Yes when configuring this test for that member server. 
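
The three settings of this parameter amount to a simple decision rule, sketched below in Python. This is an illustration of the behavior described above, not the agent's actual logic; the function name and its arguments are hypothetical.

```python
def collects_farm_metrics(fetch_farm_measures, is_central_admin):
    """Decide whether farm-level metrics are collected from a server.

    fetch_farm_measures: "Yes", "No", or "If Central Administration".
    is_central_admin: True if the monitored server is provisioned as the
    Central Administration site.
    """
    setting = fetch_farm_measures.strip().lower()
    if setting == "yes":
        # Collect from this member server regardless of its role.
        return True
    if setting == "no":
        # Farm-level collection is switched off.
        return False
    if setting == "if central administration":
        # Default: collect only from the Central Administration server.
        return is_central_admin
    raise ValueError(f"Unrecognized setting: {fetch_farm_measures!r}")


if __name__ == "__main__":
    print(collects_farm_metrics("If Central Administration", False))
    print(collects_farm_metrics("Yes", False))
```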

Domain, Domain User, Password, and Confirm Password

If the Fetch Farm Measures flag of this test is set to No or to If Central Administration, then this test should be configured with the credentials of a user with the following privileges:

On the other hand, if the Fetch Farm Measures flag of this test is set to Yes, then the user configured for the test not only requires the four privileges discussed above, but should also be part of the following groups on the eG agent host:

  • Administrators

  • WSS_ADMIN_WPG

  • IIS_IUSRS

  • Performance Monitor Users

  • WSS_WPG

  • Users

It is recommended that you create a special user for this purpose and assign the aforesaid privileges to him/her. Once such a user is created, specify the domain to which that user belongs in the Domain text box, and then, enter the credentials of the user in the Domain User and Password text boxes. To confirm the password, retype it in the Confirm Password text box.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Reads

Indicates the number of cache reads performed by this web application. 

Number

 

Average read duration

Indicates the average time taken by this web application to read from the cache.

Msecs

A low value is desired for this measure. A consistent increase in the value of this measure could indicate a reading bottleneck. One of the reasons for reading delays is insufficient memory for the cache on the host. You may want to right-size the cache to make sure that your requests are serviced quickly and efficiently. 

Reading delays can also occur if the cache is overloaded with read requests or too much data is to be read. Whenever this measure registers an abnormally high value for a web application, look up the values reported by the Reads and Average read size measures for the same web application to determine whether or not the slowness can be attributed to the count and size of the reads.

Average read size

Indicates how many kilobytes of data, on an average, are read from the cache by this web application.

KB

 

Writes

Indicates the number of writes to the cache by this web application. 

Number

 

Average write duration

Indicates the average time taken by this web application to write to the cache.

Msecs

A low value is desired for this measure. A consistent increase in the value of this measure could indicate a writing bottleneck. One of the reasons for writing delays is insufficient memory for the cache. You may want to right-size the cache to make sure that your requests are serviced quickly and efficiently.

Writing delays can also occur if the web application is overloading the cache with write requests or too much data is to be written. Whenever this measure registers an abnormally high value for a web application, look up the values reported by the Writes and Average writes size measures for the same application to determine whether or not the slowness can be attributed to the unusually high number and size of the writes.

Average writes size

Indicates how many kilobytes of data, on an average, are written to the cache by this web application.

KB

 

Misses

Indicates the number of requests from this web application that were not serviced by the cache.

Number

Ideally, the value of this measure should be 0. If, on the other hand, the value of this measure is close to the value of the Objects requested measure, it is a cause for serious concern, as it implies that almost none of the objects requested were found in the cache. Under such circumstances, use the detailed diagnosis of this measure to know which web site addresses could not be found in the cache.

One of the reasons for a high number of misses could be insufficient memory allocation to the cache service. In such a situation, you may want to increase the cache size by adding more memory.

Hits

Indicates the number of requests from this web application that were successfully served by the cache.

Number

Ideally, the value of this measure should be the same as the value of the Objects requested measure. If not, check whether the cache has enough memory, and if required, add more memory to it. 

Failures

Indicates the number of cache failures experienced by this web application.

Number

Ideally, the value of this measure should be 0. If this measure reports a non-zero value, then use the detailed diagnosis of this measure to know which web site addresses were being looked up in the cache when the failures occurred.

Objects requested

Indicates the number of objects requested by this web application.

Number

 

Use the detailed diagnosis of the Misses measure to know which web site addresses could not be found in the cache.

Figure 1 : The detailed diagnosis of the Misses measure

Use the detailed diagnosis of the Failures measure to know which web site addresses were being looked up in the cache when the failures occurred.

Figure 2 : The detailed diagnosis of the Failures measure
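
The interpretation of the Misses and Hits measures compares each of them against the Objects requested measure. A minimal Python sketch of that comparison follows. The function names and the 0.9 threshold for "close to" are hypothetical; this document does not define a specific threshold.

```python
def cache_ratios(hits, misses, objects_requested):
    """Return hit and miss ratios relative to the objects requested.

    Returns None when nothing was requested, since the ratios are undefined.
    """
    if objects_requested <= 0:
        return None
    return {
        "hit_ratio": hits / objects_requested,
        "miss_ratio": misses / objects_requested,
    }


def is_mostly_missing(misses, objects_requested, threshold=0.9):
    """Flag a web application whose misses approach its objects requested.

    The threshold is an assumed value chosen for illustration only.
    """
    if objects_requested <= 0:
        return False
    return misses / objects_requested >= threshold


if __name__ == "__main__":
    print(cache_ratios(hits=75, misses=25, objects_requested=100))
    print(is_mostly_missing(misses=95, objects_requested=100))
```

A web application flagged this way is a candidate for the detailed diagnosis of the Misses measure and for a review of the memory allocated to the cache.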