Indexing Statistics Test

The content index, also called the search index, allows searching of Confluence content. It is also used for a number of related functions such as building email threads in the mail archive, the space activity feature, and lists of recently-updated content.

For reasons of efficiency, Confluence does not immediately add content to the index. New and modified Confluence content is first placed in a queue, and the queue is processed once every five seconds (by default). If the queue keeps growing without the index being updated, newly added or modified content will not be picked up by search or by the other features that rely on the index. Confluence users will then end up consuming outdated content, which can adversely impact their experience with Confluence.

To avoid this, administrators need to monitor indexing activity continuously, track changes to the index queue length, and proactively detect processing bottlenecks well before they impact the user experience. This is where the Indexing Statistics test helps!

This test monitors the index queue and promptly alerts administrators if the count of pending tasks in the queue keeps increasing. This way, the test points administrators to a probable index processing bottleneck, prompts them to investigate its causes, and helps them arrive at the appropriate corrective action. If, based on these observations, administrators decide to rebuild the index to clear the bottleneck, they can use the test to find out whether re-indexing is already in progress. The test additionally reports the time taken for the last indexing/re-indexing, and thus indicates whether re-indexing is slow. Detailed diagnostics also reveal the precise task that was last re-indexed, which is the probable cause of the indexing slowness.

Target of the test: Atlassian Confluence

Agent deploying the test: An internal/remote agent

Outputs of the test: One set of results for the Confluence server monitored

Configurable parameters for the test
Parameter Description

Test period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port number at which the specified host listens.

JMX remote port

Here, specify the port at which JMX listens for requests from remote hosts. Ensure that you specify the same port that you configured in the catalina.bat file in the <<ATLASSIAN_CONFLUENCE_INSTALL_DIR>>\confluence\bin directory.

JNDIname

The JNDIname is a lookup name for connecting to the JMX connector. By default, this is jmxrmi. If you have registered the JMX connector in the RMI registry using a different lookup name, then change this default value to reflect that name.
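To illustrate how the Host, JMX remote port, and JNDIname parameters fit together, the sketch below assembles them into the standard RMI-based JMX service URL. The function name, host name, and port value are hypothetical, and the sketch makes no claim about how the eG agent itself builds its connection.

```python
def jmx_service_url(host: str, jmx_port: int, jndi_name: str = "jmxrmi") -> str:
    """Build an RMI-based JMX service URL from the three connection parameters.

    Illustrative helper only; the name and defaults are assumptions.
    """
    if not 0 < jmx_port < 65536:
        raise ValueError(f"invalid JMX remote port: {jmx_port}")
    # Standard form: service:jmx:rmi:///jndi/rmi://<host>:<port>/<lookup name>
    return f"service:jmx:rmi:///jndi/rmi://{host}:{jmx_port}/{jndi_name}"


# Hypothetical host and port, using the default lookup name.
print(jmx_service_url("confluence.example.com", 9999))
```

If the connector was registered under a different lookup name, that name takes the place of jmxrmi as the last path segment.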

User, Password, and Confirm password

If JMX requires authentication only (but no security), then ensure that the user and password parameters are configured with the credentials of a user with read-write access to JMX. Confirm the password by retyping it in the Confirm Password text box.

Timeout

Specify the duration (in seconds) for which this test should wait for a response from the target server. If the target does not respond within the configured duration, the test will time out. By default, this is set to 10 seconds.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against this parameter.
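One way to read a DD Frequency value is sketched below. It assumes the value has the form normal:abnormal, and that a value of n means "every n-th run". The second assumption is an interpretation that this document does not confirm. The function names are hypothetical.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse 'normal:abnormal' into a pair of integers; 'none' disables DD."""
    text = value.strip().lower()
    if text == "none":
        return None
    normal, abnormal = text.split(":")
    return int(normal), int(abnormal)


def should_collect_dd(value: str, run_number: int, problem_detected: bool) -> bool:
    """Decide whether detailed measures are generated on a given test run.

    Assumption: a frequency of n means every n-th run, and 0 means never.
    """
    frequency = parse_dd_frequency(value)
    if frequency is None:
        return False
    normal, abnormal = frequency
    every = abnormal if problem_detected else normal
    return every > 0 and run_number % every == 0


# With the default 1:1, detailed measures are generated on every run.
print(should_collect_dd("1:1", run_number=7, problem_detected=False))
```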

Detailed Diagnosis

To make diagnosis more efficient and accurate, the eG Enterprise suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Is cache currently flushing

Indicates whether or not the cache is currently flushing.

 

Sometimes, cached content can be added to the index. This is very often done to make external content - i.e., content from external sources such as SQL databases, Excel, etc. - searchable in Confluence. For this, typically, a Cache Macro is used. The Confluence page displaying the external content should include a cache macro instance with indexing enabled. The cache macro renders contents of the macro into HTML and stores the HTML in the cache. The cache content extractor processes the HTML data from the cache, extracts only the text and attribute fields, and inserts them into the index for that external content. Where a cache macro is in use, you can use this measure to figure out if the HTML data in the cache is being flushed into the index.

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Yes              1
No               0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether or not cache flushing is in progress. In the graph of this measure, however, these states are represented by their numeric equivalents.

Is confluence content currently reindexing

Indicates whether or not Confluence is currently re-indexing.

 

Typically, when the index is rebuilt, content is re-indexed. This measure therefore indicates whether or not an index rebuild is in progress. The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Yes              1
No               0

If the number of items in the index queue keeps increasing, it is a sign of an indexing bottleneck. Rebuilding the index from scratch usually resolves this bottleneck. This is why you may want to check the value of this measure if you find the value of the Task queue length measure increasing consistently. In such a situation, the value Yes for this measure could indicate that re-indexing is being performed, probably to resolve an indexing bottleneck. On the other hand, if the value of this measure is No in such a situation, it tells administrators that an index rebuild is needed but has not yet begun.
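The reasoning above can be sketched as a small decision function. It is only an illustration of how the two measures are read together. The function name and the returned messages are hypothetical and are not part of the test's output.

```python
# Numeric equivalents of the measure values, as listed in the table above.
MEASURE_VALUES = {"Yes": 1, "No": 0}


def interpret_reindexing(queue_increasing: bool, reindexing: str) -> str:
    """Combine the queue trend with the re-indexing state into a finding."""
    state = MEASURE_VALUES[reindexing]
    if not queue_increasing:
        return "No indexing bottleneck indicated."
    if state == 1:
        return "Queue is growing; re-indexing is already in progress."
    return "Queue is growing; an index rebuild is needed but has not begun."


print(interpret_reindexing(queue_increasing=True, reindexing="No"))
```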

If the value of this measure is Yes for a long time, it could indicate that reindexing is stuck. Probable causes for this are as follows:

  • Intermittent database disruptions might cause problems during the reindex, but resolve quickly enough that Confluence as a whole is not impacted. Adding a validation query should resolve any intermittent connection problems to the database.
  • Some attachments cause problems when being read. You can disable indexing of attachments temporarily, to give the reindex a chance to complete. Debug logging will help determine which file was being indexed when the process got stuck.
  • Through analysis of the thread dumps, you might find that the indexer threads are waiting on an external resource, such as a lock on a file or a database connection. This may also occur in other scheduled jobs.

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether or not re-indexing is in progress. In the graph of this measure, however, these states are represented by their numeric equivalents.

Time taken for last indexing

Indicates the time taken by the last indexing/re-indexing operation.

Msecs

Ideally, the value of this measure should be low. A very high value indicates that re-indexing is slow. In such situations, you can use the detailed diagnosis of this measure to determine the task that was last re-indexed.

Typically, the length of time depends on the following factors:

  • Number of pages in your Confluence instance.
  • Number, type and size of attachments.
  • Amount of memory allocated to Confluence.
  • Disk throughput.

It may help to increase the heap memory allocated to Confluence. The process for doing so is basically the same for Confluence and Jira applications. If you are running an older version of Confluence and find that the index rebuild is not progressing, you may need to shut down Confluence and restart it with the following Java system property set: bucket.indexing.threads.fixed=1.

Task queue length

Indicates the number of tasks in the index queue.

Number

If the value of this measure keeps increasing, it could indicate that indexing is stuck. Usually, re-indexing from scratch resolves the problem, but it is not a permanent solution. You may want to track down what is responsible for the stuck state (a page, an attachment, or some other problem). For this, you can use the detailed diagnosis of the Time taken for last indexing measure. This reveals the task that was last re-indexed and has probably contributed to the stuck state. Other causes for the queue being stuck include:

  • Intermittent database disruptions might cause problems during the reindex, but resolve quickly enough that Confluence as a whole is not impacted. Adding a validation query should resolve any intermittent connection problems to the database.
  • Some attachments cause problems when being read. You can disable indexing of attachments temporarily, to give the reindex a chance to complete. Debug logging will help determine which file was being indexed when the process got stuck.
  • Through analysis of the thread dumps, you might find that the indexer threads are waiting on an external resource, such as a lock on a file or a database connection. This may also occur in other scheduled jobs.
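As an illustration of what "keeps increasing" could mean in practice, the sketch below flags a queue whose length has risen on each of the last few measurement periods. The function name and the three-period threshold are assumptions made for this example; they do not describe how the test or its alerting thresholds are implemented.

```python
from typing import Sequence


def queue_keeps_increasing(samples: Sequence[int], min_rises: int = 3) -> bool:
    """Return True if the queue length rose on each of the last min_rises runs.

    samples holds successive Task queue length values, oldest first.
    """
    if len(samples) < min_rises + 1:
        return False  # not enough history to judge a trend
    recent = samples[-(min_rises + 1):]
    # Compare each sample with its successor; every step must be a rise.
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# A steadily growing queue is flagged; one that drains in between is not.
print(queue_keeps_increasing([5, 8, 12, 20]))
print(queue_keeps_increasing([5, 8, 3, 4]))
```

Requiring several consecutive rises avoids flagging the short bursts that occur whenever a batch of pages is edited and then drained from the queue.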