What is Apache Solr?

Apache Solr is an open source enterprise search platform built on Apache Lucene. Solr is fast, highly reliable, and fault tolerant. It provides distributed indexing, replication, and load-balanced querying, along with automated failover and recovery. Solr powers the navigation and search features of some of the world's largest internet sites.

Apache Solr is a J2EE-based application that uses the Apache Lucene libraries internally to build indexes and to provide user-friendly search. The block diagram below describes the architecture of Apache Solr.

Figure 1: Apache Solr Architecture

The overall architecture of Solr can be divided logically into four layers: storage, container, Solr engine, and interaction. The storage layer is responsible for managing indexes and configuration metadata. The container is the J2EE container in which the instance runs, and the Solr engine is the application package that runs on top of the container. Finally, the interaction layer denotes how clients, such as a web browser, interact with the Apache Solr server.

Apache Solr storage is used mainly for metadata and the necessary index information. It is typically local file storage, configured in the Apache Solr configuration file. The installation package comes with a Jetty servlet container and HTTP server by default; the related configuration can be found in the $solr.home/conf folder inside the Solr installation. An index contains a sequence of documents, and external storage devices can also be configured in Apache Solr.

Below are some of the major building blocks of Apache Solr:

Request Handler: Any incoming request to Apache Solr, such as a query request or an update request, is processed by a request handler. The relevant handler is chosen depending on the requirement.
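As a rough illustration of this dispatch idea, the sketch below routes a request path to a handler function. It is a toy model in Python, not Solr code: `HANDLERS`, `handle_select`, `handle_update`, and `dispatch` are hypothetical names, and only the `/select` and `/update` paths mirror handler names commonly seen in Solr configurations.

```python
from typing import Callable, Dict

# Hypothetical handlers for illustration only; real Solr request
# handlers are Java classes registered in the Solr configuration.
def handle_select(params: Dict[str, str]) -> str:
    return f"searching for {params.get('q', '*:*')}"

def handle_update(params: Dict[str, str]) -> str:
    return f"indexing document {params.get('id', '<no id>')}"

# Map each request path to the handler responsible for it.
HANDLERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "/select": handle_select,
    "/update": handle_update,
}

def dispatch(path: str, params: Dict[str, str]) -> str:
    """Pick the handler registered for the path and run it."""
    handler = HANDLERS.get(path)
    if handler is None:
        raise KeyError(f"no request handler registered for {path}")
    return handler(params)
```

Calling `dispatch("/select", {"q": "title:solr"})` runs the query handler, while an unregistered path raises an error instead of silently doing nothing.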

Search Component: A search component is a search feature available in Apache Solr, such as spell checking, faceting, querying, or hit highlighting. Components are registered with search handlers, and you can register multiple components with a single search handler.
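The relationship between a search handler and its components can be pictured as a small pipeline. The following is a minimal sketch under stated assumptions, not Solr's implementation: `SearchHandler`, `query_component`, `facet_component`, and the shared `ctx` dictionary are all hypothetical, and the matching logic is a plain substring test.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Component = Callable[[Context], None]

def query_component(ctx: Context) -> None:
    """Toy query step: keep documents whose title contains the term."""
    term = ctx["q"].lower()
    ctx["hits"] = [d for d in ctx["docs"] if term in d["title"].lower()]

def facet_component(ctx: Context) -> None:
    """Toy faceting step: count hits per category."""
    counts: Dict[str, int] = {}
    for doc in ctx["hits"]:
        counts[doc["category"]] = counts.get(doc["category"], 0) + 1
    ctx["facets"] = counts

class SearchHandler:
    """Runs every registered component, in order, over a shared context."""

    def __init__(self, components: List[Component]) -> None:
        self.components = list(components)

    def handle(self, q: str, docs: List[Dict[str, str]]) -> Context:
        ctx: Context = {"q": q, "docs": docs}
        for component in self.components:
            component(ctx)
        return ctx
```

Because each component only reads and writes the shared context, adding another feature (for example, highlighting) in this model means appending one more function to the list.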

Query Parser: Any query passed to Apache Solr is parsed by a query parser, which checks it for syntax errors. After parsing, it translates the query into a format that Lucene understands.
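To make the parse-then-translate step concrete, here is a deliberately tiny parser for queries of the form `field:value AND value`. It is only a sketch: the grammar is far simpler than any real Solr query parser, and `parse_query` and its `default_field` argument are hypothetical names chosen for this example.

```python
from typing import List, Tuple

def parse_query(query: str, default_field: str = "text") -> List[Tuple[str, str]]:
    """Parse 'field:value AND value2' into a list of (field, value) clauses.

    Raises ValueError when a clause is empty or malformed, which stands in
    for the syntax checking a real query parser performs.
    """
    clauses: List[Tuple[str, str]] = []
    for part in query.split(" AND "):
        part = part.strip()
        if not part:
            raise ValueError(f"empty clause in query: {query!r}")
        field, sep, value = part.partition(":")
        if not sep:
            # No explicit field given, so fall back to the default field.
            field, value = default_field, part
        if not field or not value:
            raise ValueError(f"malformed clause: {part!r}")
        clauses.append((field, value))
    return clauses
```

The returned list of clauses plays the role of the internal representation handed on to the search engine; a malformed input such as `title:` is rejected before any search runs.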

Response Writer: In Apache Solr, the response writer is the component that generates the formatted output for user queries. Apache Solr supports response formats such as XML, JSON, and CSV.
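In practice, a client usually selects the response format with the `wt` request parameter. The sketch below only builds such a request URL with the Python standard library; the helper name `build_select_url`, the host `http://localhost:8983`, and the collection name `books` are assumptions made for illustration, and the exact URL layout should be checked against your Solr version.

```python
from urllib.parse import urlencode

def build_select_url(base_url: str, collection: str, query: str, wt: str = "json") -> str:
    """Build a query URL whose 'wt' parameter names the response format."""
    params = urlencode({"q": query, "wt": wt})
    return f"{base_url}/solr/{collection}/select?{params}"

# Assumed host and collection name; adjust to your own installation.
url = build_select_url("http://localhost:8983", "books", "title:solr", wt="xml")
```

Switching `wt` between values such as `json`, `xml`, and `csv` leaves the query itself unchanged and only alters how the results are serialized.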

Analyzer/Tokenizer: Apache Solr recognizes data in the form of tokens. It analyzes the content, divides it into tokens, and passes those tokens to Lucene. An analyzer examines the text of fields and creates a token stream; a tokenizer is the part that breaks the text into individual tokens.
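A simplified picture of analysis is a tokenizer followed by a few filters. The example below is a minimal sketch of that idea in plain Python, not Solr's analysis chain: `tokenize`, `lowercase_filter`, `stop_filter`, `analyze`, and the small stop-word set are all hypothetical.

```python
import re
from typing import FrozenSet, List

# Assumed stop-word list for this example only.
STOP_WORDS: FrozenSet[str] = frozenset({"the", "a", "an", "of"})

def tokenize(text: str) -> List[str]:
    """Break raw text into word tokens."""
    return re.findall(r"\w+", text)

def lowercase_filter(tokens: List[str]) -> List[str]:
    """Normalize tokens so matching is case-insensitive."""
    return [t.lower() for t in tokens]

def stop_filter(tokens: List[str], stop_words: FrozenSet[str] = STOP_WORDS) -> List[str]:
    """Drop very common words that add little to a search."""
    return [t for t in tokens if t not in stop_words]

def analyze(text: str) -> List[str]:
    """Tokenize first, then apply each filter in order."""
    return stop_filter(lowercase_filter(tokenize(text)))
```

With these definitions, `analyze("The Architecture of Apache Solr")` yields the tokens `architecture`, `apache`, and `solr`, which is the form in which the text would be handed to the index.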

Update Request Processor: When an update request is passed to Apache Solr, it runs through a collection of plugins (e.g., signature, logging, indexing), collectively known as the update request processor chain. The chain is responsible for modifications such as adding a field or dropping a field.
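The chain idea can be sketched as a list of functions that each take a document and return a modified copy. This is an illustrative model only: `signature_processor`, `drop_field_processor`, `run_chain`, and the `body` and `tmp` field names are hypothetical, and the MD5 hash is just one assumed way to compute a signature.

```python
import hashlib
from typing import Callable, Dict, List

Doc = Dict[str, str]
Processor = Callable[[Doc], Doc]

def signature_processor(doc: Doc) -> Doc:
    """Add a field: a hash of the body, usable for duplicate detection."""
    out = dict(doc)
    body = doc.get("body", "")
    out["signature"] = hashlib.md5(body.encode("utf-8")).hexdigest()
    return out

def drop_field_processor(field: str) -> Processor:
    """Build a processor that removes one named field from the document."""
    def process(doc: Doc) -> Doc:
        return {k: v for k, v in doc.items() if k != field}
    return process

def run_chain(doc: Doc, chain: List[Processor]) -> Doc:
    """Pass the document through every processor in order."""
    for processor in chain:
        doc = processor(doc)
    return doc
```

For instance, `run_chain({"id": "1", "body": "hello", "tmp": "x"}, [signature_processor, drop_field_processor("tmp")])` returns a document that has gained a `signature` field and lost the `tmp` field, while the input dictionary is left unmodified.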