Search Submission Test

Like problems in the content acquisition process, snags in the content processing routine can also delay searching. Content processing in SharePoint is performed by the content processing component (CPC) and the index component. Once crawling is complete, the content plug-in on the crawl component first routes the content to the Content Submission Service (CSS) of the content processing component. An instance of the CSS runs alongside each instance of a content processing component. Once the content plug-in establishes a session with the CSS, the CSS load-balances the incoming content by distributing it uniformly across the content processing components. Upon receiving documents from the CSS, the CPC processes them and then sends them to the index component for indexing.
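The uniform distribution described above can be pictured with a small sketch. It assumes a plain round-robin rotation, which is only one way to spread documents evenly; the text does not state which algorithm the CSS actually uses, and the component names and document IDs below are hypothetical.

```python
from itertools import cycle


def distribute_round_robin(documents, components):
    """Assign each document to a component in strict rotation."""
    if not components:
        raise ValueError("at least one content processing component is required")
    assignment = {name: [] for name in components}
    rotation = cycle(components)
    for doc in documents:
        # Each document goes to the next component in the rotation.
        assignment[next(rotation)].append(doc)
    return assignment


# Hypothetical component names and document IDs, for illustration only.
batches = distribute_round_robin(
    [f"doc-{i}" for i in range(7)], ["cpc-1", "cpc-2", "cpc-3"]
)
sizes = {name: len(docs) for name, docs in batches.items()}
print(sizes)
```

With seven documents and three components, no component ends up with more than one document above any other, which is the property "uniform" implies here.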

If a crawler session is unexpectedly terminated by the CSS, some crawled content may never reach the CSS and will therefore not be processed or indexed; this will eventually impact the search service. Moreover, if the CSS cannot push its document load to the content processing component fast enough, documents may time out in the CSS itself and be omitted from the search index; this again results in a poor search experience. Likewise, if the content processing component suffers a slowdown, document processing and indexing will be significantly delayed, which in turn can affect querying. To avoid such problems, administrators should closely monitor the availability and processing ability of the CSS and the CPC, and rapidly isolate bottlenecks. This is where the Search Submission test helps.

This test periodically checks the sessions to the CSS, monitors how quickly the CSS load-balances the content and transmits it to the CPC, and measures the processing capacity of the CPC. When users complain that their search queries are slow, this test sheds light on the probable cause of the delay: is it owing to sudden or sporadic breaks in the crawler sessions to the CSS? Is it because of a load-balancing bottleneck experienced by the CSS? Or is it due to a processing slowdown at the CPC? Based on the findings reported by this test, administrators can initiate the appropriate remedial measures.
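As a rough illustration of how these three questions map onto the measures described later on this page, the sketch below compares two successive readings. The `SubmissionSample` class, the `probable_causes` function, and the `pending_limit` threshold are hypothetical names and values chosen for the example; they are not part of the test itself.

```python
from dataclasses import dataclass


@dataclass
class SubmissionSample:
    """One reading of the measures reported by the test."""
    aborted_sessions: int      # cumulative since component start
    skipped_documents: int     # cumulative since component start
    timed_out_documents: int   # cumulative since component start
    pending_items: int         # current value


def probable_causes(previous, current, pending_limit=1000):
    """List the likely bottlenecks; pending_limit is an assumed threshold."""
    causes = []
    if current.aborted_sessions > previous.aborted_sessions:
        causes.append("crawler sessions to the CSS are breaking")
    if (current.timed_out_documents > previous.timed_out_documents
            or current.skipped_documents > previous.skipped_documents):
        causes.append("the CSS is not delivering documents to the CPC")
    if current.pending_items > pending_limit:
        causes.append("the CPC is processing content slowly")
    return causes


# Illustrative readings only, not real measurements.
earlier = SubmissionSample(aborted_sessions=2, skipped_documents=0,
                           timed_out_documents=10, pending_items=150)
later = SubmissionSample(aborted_sessions=5, skipped_documents=0,
                         timed_out_documents=10, pending_items=4200)
findings = probable_causes(earlier, later)
print(findings)
```

The cumulative measures are compared against the previous reading rather than against zero, because a counter that rose long ago and has since stayed flat does not explain a current slowdown.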

Target of the test : A Microsoft SharePoint Server

Agent deploying the test : An internal agent

Outputs of the test : One set of results for the Microsoft SharePoint server that is being monitored

Configurable parameters for the test
Parameter | Description

Test period

Indicates how often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port at which the host server listens.

Measurements made by the test
Measurement | Description | Measurement Unit | Interpretation

Aborted sessions

Indicates the number of sessions that were aborted since the start of the component.

Number

Ideally, the value of this measure should be 0. A high value is a cause for concern as it indicates frequent breaks in the crawler sessions on the CSS. Too many broken sessions can seriously impede the transfer of crawled content from the crawler to the CSS, resulting in incomplete transfers. This warrants an investigation into the reason for the frequent session failures.

Active sessions

Indicates the number of crawler sessions that are currently active on the CSS.

Number

This is a good indicator of the current load on the CSS.

Available callbacks

Indicates the current number of callbacks ready for consumption, but not yet consumed by the client.

Number

Once the content processing component processes the content it receives and writes it to the index, it sends a callback to the content plug-in on the crawler indicating the processing status of that content.

A high value for this measure indicates that while the CPC has been able to generate callbacks, many of these callbacks have not yet been consumed by (i.e., have not yet reached) the crawler. This hints at an error in network communication between the crawler and the CPC.

Total callbacks

Indicates the total number of callbacks produced by the submission service since the start of the component.

Number

You may want to compare the value of the Available callbacks measure with that of this measure to understand what fraction of callbacks is still to be consumed by the crawl component.

Client polls

Indicates the total number of client polls since the start of the component.

Number

Each time a client refreshes the session to check for callbacks, this measure is incremented.

Client submits

Indicates the total number of submits performed by clients since the start of the component.

Number

 

Skipped documents

Indicates the total number of documents skipped in the submission service before being delivered to the content processing component.

Number

Ideally, the value of this measure should be 0 or very low. A high value is disconcerting as it indicates that too many crawled documents are not reaching the CPC for processing because the CSS disregards them. This calls for further investigation into the reasons.

Timed out documents

Indicates the total number of documents that timed out in the submission service.

Number

A low value is desired for this measure. A high value implies that the search index may be missing many crawled documents because they timed out in the submission queue itself. This in turn may result in ineffective search queries. You may therefore want to adjust the timeout value for documents in the submission service.

Flows used for feeding

Indicates the current number of flows used for feeding.

Number

The CPC uses flows and operators to process the content. Flows define how to process content, queries, and results, and each flow processes one item at a time. The number of current flows is hence an indicator of the number of documents that are being processed by the CPC.

Pending items

Indicates the current number of items that have been delivered to the content processing component but for which no callback has yet been received.

Number

A high value or a consistent rise in the value for this measure could indicate a bottleneck in content processing.
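Two of the interpretations above lend themselves to simple arithmetic: the fraction of callbacks still awaiting consumption (Available callbacks compared with Total callbacks) and a consistent rise in Pending items. The following is a minimal sketch of both checks; the function names, the `min_samples` default, and the sample readings are assumptions made for illustration.

```python
def unconsumed_callback_fraction(available_callbacks, total_callbacks):
    """Fraction of all callbacks produced that are still waiting for the client."""
    if total_callbacks <= 0:
        # No callbacks produced yet, so nothing can be outstanding.
        return 0.0
    return available_callbacks / total_callbacks


def is_consistently_rising(values, min_samples=3):
    """True if every reading is strictly higher than the one before it."""
    if len(values) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(values, values[1:]))


# Illustrative readings only, not real measurements.
fraction = unconsumed_callback_fraction(250, 1000)
rising = is_consistently_rising([120, 180, 260, 410])
dipping = is_consistently_rising([120, 180, 150])
print(fraction, rising, dipping)
```

Requiring several strictly increasing readings before flagging Pending items avoids reacting to a single momentary spike, though the right number of samples depends on how often the test runs.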