SharePoint Foundation Search Gatherer Test

In its simplest form, the search functionality can be described as a Web page where users define their search queries. The index role can be configured to run on its own Microsoft SharePoint server, or together with all the other roles, such as the Web service, Excel Services, and Forms Services. The index role performs its indexing tasks according to this general workflow:

  1. SharePoint stores all configuration settings for indexing in its database.
  2. When activated, the index service looks in SharePoint's databases to determine which content sources to index and what type of indexing to perform, such as full or incremental indexing.
  3. The index service starts a program called the Gatherer, which attempts to open the content that is to be indexed.
  4. For each information type, the Gatherer needs an index filter (IFilter) that knows how to read the text inside that particular type of information. For example, to read a Microsoft Word file, an IFilter for .DOC is needed.
  5. The Gatherer receives a stream of Unicode characters from the IFilter. It then uses a small program called a Word Breaker, whose job is to convert the stream of Unicode characters into words.
  6. Some words, such as "the", "a", and numbers, are not worth storing in the index, so the Gatherer compares each word found against a list of Noise Words. This is a text file containing all the words that are removed from the stream of words.
  7. The remaining words are stored in an index file, together with a link to their source. If a word already exists in the index, only the new source is added, so one word can point to multiple sources.
  8. If the source is information stored in SharePoint, or a file in the file system, the index also stores the security settings for that source. This prevents users from getting search results that they are not allowed to open.
  9. Since the success of an indexing operation also depends on how well the Gatherer functions, administrators need to watch for irregularities in the Gatherer's functioning, so that such anomalies are detected promptly and corrected before they can stall the indexing process.
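Steps 5 through 8 above can be sketched as a toy inverted index. This is a simplified illustration only, not SharePoint's implementation: the regex tokenizer stands in for a real language-aware Word Breaker, and `NOISE_WORDS`, `build_index`, `search`, the document names, and the user names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical noise-word list for illustration; real noise-word files are
# per-language text files with many more entries.
NOISE_WORDS = {"the", "a", "an", "and", "of"}


def break_words(text: str) -> list[str]:
    """Step 5 (simplified): split the Unicode stream returned by an IFilter
    into lowercase words. Real word breakers are language-aware."""
    return [w.lower() for w in re.findall(r"\w+", text)]


def is_noise(word: str) -> bool:
    """Step 6: drop listed noise words and pure numbers."""
    return word in NOISE_WORDS or word.isdigit()


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Step 7: map each remaining word to every source that contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for source, text in documents.items():
        for word in break_words(text):
            if not is_noise(word):
                # An existing word only gains another source.
                index[word].add(source)
    return dict(index)


def search(index: dict[str, set[str]], acl: dict[str, set[str]],
           word: str, user: str) -> set[str]:
    """Step 8: return only the sources the user is allowed to open."""
    return {src for src in index.get(word.lower(), set())
            if user in acl.get(src, set())}


if __name__ == "__main__":
    docs = {
        "report.docx": "The budget report for 2010",
        "notes.txt": "A budget meeting",
    }
    acl = {"report.docx": {"alice"}, "notes.txt": {"alice", "bob"}}
    index = build_index(docs)
    print(sorted(index["budget"]))              # ['notes.txt', 'report.docx']
    print(search(index, acl, "budget", "bob"))  # {'notes.txt'}
```

The word "budget" points to both sources, while "The", "A", and "2010" never reach the index. Because security settings are stored alongside the index, the query for user `bob` is trimmed to the one document he may open.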

This test monitors the performance of the SharePoint Foundation Search Gatherer and reports any performance issues.

Target of the test : A Microsoft SharePoint Server

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results each for the ProfileImport and Portal_Content instances

Configurable parameters for the test
Parameters Description

Test period

This indicates how often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port at which the host server listens.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Filtering threads in the system

Indicates the current number of filtering threads in the system.

Number

 

Threads waiting for documents

Indicates the number of threads that are currently waiting for documents.

Number

These threads are not currently doing any work and will eventually be terminated. If this measure consistently reports more than Max Threads/Hosts idle threads, you can schedule an additional crawl in this time period. If the value is 0, the gatherer is starved of threads. In that case, do not schedule another crawl in this time period, and analyze the durations of your crawls during this time to check whether they are meeting your freshness goals. If your goals are not being met, reduce the number of crawls.

Threads waiting for network response from the filter process

Indicates the number of threads that are currently waiting for a response from the filter process.

Number

If this measure shows no activity over time and its value equals the value of the Filtering threads in the system measure, it indicates a network issue or that the server being crawled is unavailable.

Threads committing transactions

Indicates the number of threads that are committing transactions.

Number

 

Threads waiting for plug-ins to complete an operation

Indicates the number of threads currently waiting for plug-ins to complete an operation.

Number

These threads hold filtered documents and are processing them in one of several plug-ins. This is the stage at which the index and the property store are created.

Threads loading transactions from persisted crawl queue

Indicates the number of threads that are loading transactions from the persisted crawl queue.

Number

 

Threads processing links

Indicates the number of threads that are processing links.

Number

 

Filtering processes in the system

Indicates the number of filtering processes that are active in the system.

Number

 

Filter objects in the system

Indicates the number of filter objects in the system.

Number

 

Documents waiting for robot threads

Indicates the number of documents that are waiting for robot threads.

Number

A non-zero value implies that all the threads are busy filtering documents.

Currently connected admin clients

Indicates the number of currently connected admin clients.

Number

 

Amount of resources allowed for the Gatherer service

Indicates the amount of resources that the Gatherer service is allowed to use.

Number

 

Servers recently accessed by the system

Indicates the number of servers that were recently accessed by the system.

Number

 

Servers currently unavailable

Indicates the number of servers that are currently unavailable to the system.

Number

A server becomes unavailable if requests made to it time out.

Available cached stemmer instances

Indicates the number of cached stemmer instances in the system.

Number

Stemmers are components, shared by the search and indexing engines, that generate the inflected forms of a word. A large number of cached stemmer instances may indicate a resource usage problem.

System I/O rate

Indicates the rate of system disk I/O traffic detected during a back-off period.

KB/Sec

During a back-off period, indexing is suspended. To manually back off the gatherer service, pause the search service. If the search service itself generates the back-off, an event will be recorded and the search service will be paused automatically. There is no automatic restart, so you must manually start the search service in order to end a back-off state. Note that there is little reason to start the search service until you have solved the problem that caused the back-off in the first place.

Timeouts

Indicates the number of timeouts detected by the system during the last measurement period.

Number

Ideally, this value should be zero.

Documents filtered

Indicates the rate at which the documents are filtered in the system.

KB/Sec

If this rate decreases over time, troubleshoot to find out why your server is filtering fewer documents.

Look for memory issues, processor issues, network issues, or site hit frequency rules that slow the gatherer process.

Documents successfully filtered

Indicates the rate at which the documents are filtered successfully in the system.

KB/Sec

 

Documents delayed due to site hit frequency rules

Indicates the number of documents that are currently delayed due to site hit frequency rules.

Number

If you have many site hit frequency rules and this number increases steadily over time, consider relaxing or simplifying them. A very high number may indicate a conflict between rules that the gatherer cannot resolve or follow efficiently.

Document entries currently in memory

Indicates the number of document entries that are currently available in the memory of the system.

Number

 

Documents filtered

Indicates the total number of documents filtered in the system during the last measurement period.

Number

 

Documents successfully filtered

Indicates the total number of documents that were successfully filtered in the system during the last measurement period.

Number

If the value of this measure is less than the value of the Documents filtered measure, use the gatherer logs to determine why some documents fail to be filtered.
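Several of the interpretation rules in the table can be expressed as simple checks over successive measurement periods. The sketch below is hypothetical: `GathererSnapshot`, its field names, `interpret`, and `max_idle_threads` (a stand-in for the Max Threads/Hosts value) are illustrative names, not part of the test or of SharePoint. "No activity" for the network measure is approximated by the value equalling the filtering-thread count in every period.

```python
from dataclasses import dataclass


@dataclass
class GathererSnapshot:
    """Measure values from one measurement period (hypothetical field names)."""
    filtering_threads: int
    threads_waiting_for_documents: int
    threads_waiting_for_network: int
    timeouts: int
    documents_filtered: int
    documents_successfully_filtered: int


def interpret(history: list[GathererSnapshot], max_idle_threads: int) -> list[str]:
    """Apply the table's interpretation rules; history is oldest-first."""
    if not history:
        raise ValueError("need at least one measurement period")
    latest = history[-1]
    findings = []
    # Threads waiting for documents: 0 means starved; consistently more
    # than the idle threshold means spare capacity.
    if latest.threads_waiting_for_documents == 0:
        findings.append("starved: do not schedule another crawl in this period")
    elif all(s.threads_waiting_for_documents > max_idle_threads for s in history):
        findings.append("idle capacity: an additional crawl can be scheduled")
    # Network waits equal to filtering threads in every period.
    if all(0 < s.threads_waiting_for_network == s.filtering_threads for s in history):
        findings.append("possible network issue or unavailable crawled server")
    # Timeouts should ideally be zero.
    if latest.timeouts > 0:
        findings.append(f"{latest.timeouts} timeout(s); ideally this is zero")
    # Fewer successful filterings than attempts: check the gatherer logs.
    failed = latest.documents_filtered - latest.documents_successfully_filtered
    if failed > 0:
        findings.append(f"{failed} document(s) failed filtering; check gatherer logs")
    return findings


if __name__ == "__main__":
    history = [
        GathererSnapshot(8, 0, 8, 2, 120, 115),
        GathererSnapshot(8, 0, 8, 3, 100, 100),
    ]
    for finding in interpret(history, max_idle_threads=4):
        print(finding)
```

Only the latest period drives the starvation, timeout, and filtering-failure checks, while the "consistently" and "no activity" conditions look across the whole history, which mirrors the wording of the corresponding interpretations.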