OSS Search Gatherer Databases Test

The search architecture consists of search components and databases. Crawling is the process of gathering content for search. The Crawl component crawls content sources to collect crawled properties and metadata from crawled items, connecting to each content source through the appropriate out-of-the-box or custom connector. After retrieving the content, the Crawl component passes the crawled items to the Content Processing component. It also stores tracking and historical information about crawled items, such as documents and URLs, in the Crawl database.

The Content Processing component transforms the crawled items and sends them to the Index component. This component also maps crawled properties to managed properties. Additionally, it stores the unprocessed information it extracts in the Link database, so that the Analytics component can use that information for search and usage analytics.

The Index component receives the processed items from the Content Processing component and writes them to the search index. This component also handles incoming queries, retrieves information from the search index and sends back the result set to the Query Processing component.
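The flow described above can be pictured with a small in-memory sketch. This is only an illustration of how items move between the components and databases, not SharePoint's implementation: the class name, method names, URLs, and the `ows_Title` to `Title` property mapping are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SearchPipelineSketch:
    """Toy model of the Crawl -> Content Processing -> Index flow (hypothetical names)."""
    crawl_db: list = field(default_factory=list)  # stands in for the Crawl database
    link_db: list = field(default_factory=list)   # stands in for the Link database
    index: dict = field(default_factory=dict)     # stands in for the search index

    def crawl(self, url, crawled_properties, links=()):
        # Crawl component: record tracking information, then hand the item on.
        self.crawl_db.append(url)
        return {"url": url, "crawled": dict(crawled_properties), "links": list(links)}

    def process(self, item, property_map):
        # Content Processing component: store extracted links and
        # map crawled properties to managed properties.
        self.link_db.extend(item["links"])
        managed = {
            property_map[name]: value
            for name, value in item["crawled"].items()
            if name in property_map
        }
        return {"url": item["url"], "managed": managed}

    def write_to_index(self, processed):
        # Index component: write the processed item to the search index.
        self.index[processed["url"]] = processed["managed"]

    def query(self, managed_property, value):
        # Index component: answer a query from the search index.
        return sorted(
            url for url, props in self.index.items()
            if props.get(managed_property) == value
        )


pipeline = SearchPipelineSketch()
item = pipeline.crawl(
    "http://intranet/docs/budget.docx",
    {"ows_Title": "Budget"},
    links=["http://intranet/docs/plan.docx"],
)
pipeline.write_to_index(pipeline.process(item, {"ows_Title": "Title"}))
print(pipeline.query("Title", "Budget"))
```

In this sketch, a slow `crawl` step or a slow `process` step would delay everything downstream, which is why the test focuses on those two components.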

Because the success of the search function depends on how well the Crawl and Content Processing components perform, administrators need to watch for irregularities in the functioning of these two components, so that anomalies are detected promptly and corrected before they can stall searching.

This test monitors the Crawl (gatherer) component and the Content Processing component, and reports issues (if any) in their performance.

Target of the test : A Microsoft SharePoint Server

Agent deploying the test : An internal agent

Outputs of the test : One set of results for the Microsoft SharePoint server being monitored

Configurable parameters for the test

Parameter: Test period
Description: How often the test should be executed.

Parameter: Host
Description: The host for which the test is to be configured.

Parameter: Port
Description: The port at which the host server listens.

Measurements made by the test

Measurement: Documents in crawl history
Description: Indicates the number of documents in the crawl history since the gatherer service was started.
Measurement unit: Number
Interpretation: Too many documents in the crawl history may unnecessarily clutter the Crawl database. If the value of this measure rises consistently, you may want to clean up the crawl history to conserve storage space.

Measurement: Documents in crawl queue
Description: Indicates the number of documents currently waiting in the crawl queue.
Measurement unit: Number
Interpretation: If the value of this measure increases over time, it could mean that the Crawl component is crawling content very slowly.

Measurement: Links waiting to be processed
Description: Indicates the number of links currently waiting to be processed.
Measurement unit: Number
Interpretation: If the value of this measure increases over time, it could mean that the Content Processing component is experiencing bottlenecks when processing links.

Measurement: Links processed
Description: Indicates the number of links that were processed since the gatherer service was started.
Measurement unit: Number
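
The interpretations above all depend on spotting a measure that keeps rising across successive test executions. A minimal sketch of such a check follows. It assumes that "rising" means the last few samples are strictly increasing, which is only one possible reading. The function names, the `window` parameter, and the sample values are hypothetical and are not part of the test itself.

```python
# Hints taken from the interpretations in the measurement list above.
HINTS = {
    "Documents in crawl history":
        "Consider cleaning up the crawl history to conserve storage space.",
    "Documents in crawl queue":
        "The Crawl component may be crawling content very slowly.",
    "Links waiting to be processed":
        "The Content Processing component may be bottlenecked when processing links.",
}


def is_rising(samples, window=3):
    """Return True if the last `window` samples are strictly increasing.

    Assumption: this is our own definition of a measure that "increases over time".
    """
    if len(samples) < window:
        return False
    recent = samples[-window:]
    return all(earlier < later for earlier, later in zip(recent, recent[1:]))


def interpret(history, window=3):
    """Map each rising measure that has a documented interpretation to its hint."""
    return {
        name: HINTS[name]
        for name, samples in history.items()
        if name in HINTS and is_rising(samples, window)
    }


# Made-up sample values, one per test execution, oldest first.
history = {
    "Documents in crawl queue": [120, 180, 260, 410],
    "Links waiting to be processed": [40, 35, 38, 30],
    "Links processed": [1000, 1500, 2100, 2600],
}

for name, hint in interpret(history).items():
    print(f"{name}: {hint}")
```

"Links processed" is ignored here even though it rises, because it has no documented interpretation. A stricter or more tolerant rule (for example, comparing averages over longer windows) may suit noisy environments better.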