What is Apache ZooKeeper?

Apache ZooKeeper is an open source Apache project that provides a centralized service for configuration information, naming, synchronization, and group services across large clusters in distributed systems. The goal is to make these systems easier to manage by propagating changes more reliably.

Apache ZooKeeper is a service used by the nodes of a cluster to coordinate among themselves and maintain shared data with robust synchronization techniques. ZooKeeper is itself a distributed application that provides services for writing distributed applications.

The common services provided by ZooKeeper are as follows:

  • Naming service: Identifying the nodes in a cluster by name. It is similar to DNS, but for nodes.

  • Configuration management: Providing the latest, up-to-date configuration information of the system to a joining node.

  • Cluster management: Tracking nodes joining or leaving the cluster, and node status in real time.

  • Leader election: Electing a node as leader for coordination purposes.

  • Locking and synchronization service: Locking the data while modifying it. This mechanism helps with automatic failure recovery when connecting to other distributed applications such as Apache HBase.

  • Highly reliable data registry: Availability of data even when one or a few nodes are down.
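
To make cluster management and leader election more concrete, here is a minimal sketch of the idea in plain Python. It is a toy in-memory model, not the ZooKeeper client API: the class `ToyRegistry` and its methods are hypothetical names chosen for illustration. It assumes the common approach in which each joining node receives an increasing sequence number and the live node with the lowest number is treated as the leader.

```python
import itertools


class ToyRegistry:
    """In-memory stand-in for a coordination service (NOT the ZooKeeper API)."""

    def __init__(self):
        self._counter = itertools.count()  # source of increasing sequence numbers
        self._members = {}                 # sequence number -> node name

    def join(self, name):
        """Register a node and return the sequence number assigned to it."""
        seq = next(self._counter)
        self._members[seq] = name
        return seq

    def leave(self, seq):
        """Remove a node, as would happen when it fails or disconnects."""
        self._members.pop(seq, None)

    def members(self):
        """Current membership, ordered by join sequence."""
        return [self._members[s] for s in sorted(self._members)]

    def leader(self):
        """The live node with the lowest sequence number, or None if empty."""
        if not self._members:
            return None
        return self._members[min(self._members)]


registry = ToyRegistry()
a = registry.join("node-a")
b = registry.join("node-b")
c = registry.join("node-c")
print(registry.leader())   # node-a
registry.leave(a)          # node-a goes down
print(registry.leader())   # node-b
print(registry.members())  # ['node-b', 'node-c']
```

In a real deployment the registry would live in the ZooKeeper ensemble rather than in one process, so that membership and leadership survive the failure of any single machine.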

ZooKeeper follows a client-server architecture. For high availability, a ZooKeeper ensemble typically contains three or more ZooKeeper servers (nodes). Each server provides all services to the clients connected to it. One node in the ensemble is elected as the leader, and all other nodes in the ensemble act as followers.
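
The ensemble size matters because ZooKeeper needs a majority of its servers (a quorum) to be running in order to keep serving requests. The short sketch below works out the arithmetic; the function names are illustrative only and are not part of ZooKeeper.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

A four-server ensemble tolerates no more failures than a three-server one, which is why odd ensemble sizes are the usual choice.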