
Introduction to Apache ZooKeeper



In this article by Saurav Haloi, author of the book Apache ZooKeeper Essentials, we will learn about Apache ZooKeeper. Apache ZooKeeper is a software project of the Apache Software Foundation; it provides an open source solution to the various coordination problems in large distributed systems. ZooKeeper, as a centralized coordination service, is distributed and highly reliable, running on a cluster of servers called a ZooKeeper ensemble. Distributed consensus, group management, presence protocols, and leader election are implemented by the service so that applications do not need to reinvent the wheel by implementing them on their own. On top of these, the primitives exposed by ZooKeeper can be used by applications to build much more powerful abstractions for solving a wide variety of problems.


Apache ZooKeeper is implemented in Java. It ships with C, Java, Perl, and Python client bindings, and community-contributed client libraries are available for a plethora of languages such as Go, Scala, and Erlang. Apache ZooKeeper is widely used by a large number of organizations, such as Yahoo! Inc., Twitter, Netflix, and Facebook, in their distributed application platforms as a coordination service.

In this article, we will look into the installation and configuration of Apache ZooKeeper and some of the concepts associated with it, followed by programming ZooKeeper using its Python client library. We will also see how some of the important constructs of distributed programming can be implemented using ZooKeeper.

Download and installation

ZooKeeper is supported by a wide variety of platforms. GNU/Linux and Oracle Solaris are supported as development and production platforms for both server and client. Windows and Mac OS X are recommended only as development platforms for both server and client.

ZooKeeper is implemented in Java and requires Java 6 or later versions to run.

Let's download the stable version from one of the mirrors, say Georgia Tech's Apache download mirror (http://b.gatech.edu/1xElxRb) in the following example:

$ wget http://www.gtlib.gatech.edu/pub/apache/zookeeper/stable/zookeeper-3.4.6.tar.gz
$ ls -alh zookeeper-3.4.6.tar.gz
-rw-rw-r-- 1 saurav saurav 17M Feb 20 2014 zookeeper-3.4.6.tar.gz

Once we have downloaded the ZooKeeper tarball, installing and setting up a standalone ZooKeeper node is pretty simple and straightforward. Let's extract the compressed tar archive into /usr/share:

$ tar -C /usr/share -zxf zookeeper-3.4.6.tar.gz
$ cd /usr/share/zookeeper-3.4.6/
$ ls
bin  CHANGES.txt  contrib  docs  ivy.xml  LICENSE.txt
README_packaging.txt  recipes  zookeeper-3.4.6.jar  zookeeper-3.4.6.jar.md5
build.xml  conf  dist-maven  ivysettings.xml  lib
NOTICE.txt  README.txt  src  zookeeper-3.4.6.jar.asc  zookeeper-3.4.6.jar.sha1

The location where the ZooKeeper archive is extracted in our case, /usr/share/zookeeper-3.4.6, can be exported as ZK_HOME as follows:

$ export ZK_HOME=/usr/share/zookeeper-3.4.6

Configuration

Once we have extracted the tarball, the next thing is to configure ZooKeeper. The conf folder inside the extracted ZooKeeper folder holds the configuration files. ZooKeeper needs a configuration file called zoo.cfg in this conf folder. A sample configuration file, zoo_sample.cfg, is shipped there and contains some of the configuration parameters for reference.

Let's create our configuration file with the following minimal parameters and save it in the conf directory:

$ cat conf/zoo.cfg
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181

The configuration parameters' meanings are explained here:

  • tickTime: This is measured in milliseconds; it is used by clients for session registration and for sending regular heartbeats to the ZooKeeper service. The minimum session timeout is twice the tickTime parameter.
  • dataDir: This is the location where ZooKeeper stores the in-memory state of the system; it includes database snapshots and the transaction log of updates to the database. Extracting the ZooKeeper archive won't create this directory, so if it doesn't exist in the system, you will need to create it and make it writable.
  • clientPort: This is the port that listens for client connections, so it is where ZooKeeper clients will initiate a connection. The client port can be set to any number, and different servers can be configured to listen on different ports. The default is 2181.

ZooKeeper needs the JAVA_HOME environment variable to be set correctly. To see if this is set in your system, run the following command:

$ echo $JAVA_HOME
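
If it is not set, export it so that it points to your Java installation; the path below is only an example and will vary from system to system:

$ export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64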

Starting the ZooKeeper server

Now, considering that Java is installed and working properly, let's go ahead and start the ZooKeeper server. All the ZooKeeper administration scripts to start/stop the server and to invoke the ZooKeeper command shell are shipped along with the archive in the bin folder:

$ pwd
/usr/share/zookeeper-3.4.6/bin
$ ls
README.txt  zkCleanup.sh  zkCli.cmd  zkCli.sh  zkEnv.cmd  zkEnv.sh  zkServer.cmd  zkServer.sh

The scripts with the .sh extension are for Unix platforms (GNU/Linux, Mac OS X, and so on), and the scripts with the .cmd extension are for Microsoft Windows operating systems.

To start the ZooKeeper server in a GNU/Linux system, you need to execute the zkServer.sh script as follows. This script gives options to start, stop, restart, and see the status of the ZooKeeper server:

$ ./zkServer.sh 
JMX enabled by default
Using config: /usr/share/zookeeper-3.4.6/bin/../conf/zoo.cfg
Usage: ./zkServer.sh {start|start-foreground|stop|restart|status|upgrade|print-cmd}

To avoid going to the ZooKeeper install directory to run these scripts, you can include it in your PATH variable as follows:

export PATH=$PATH:/usr/share/zookeeper-3.4.6/bin

Executing zkServer.sh with the start argument will start the ZooKeeper server. A successful start of the server will show the following output:

$ zkServer.sh start
JMX enabled by default
Using config: /usr/share/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

To verify that the ZooKeeper server has started, you can use the following ps command:

$ ps -ef | grep zookeeper | grep -v grep | awk '{print $2}'
5511

The ZooKeeper server's status can be checked with the zkServer.sh script as follows:

$ zkServer.sh status
JMX enabled by default
Using config: /usr/share/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: standalone

Connecting to ZooKeeper with a Java-based shell

To start the Java-based ZooKeeper command-line shell, we simply need to run zkCli.sh from the ZK_HOME/bin folder with the server IP and port as follows:

${ZK_HOME}/bin/zkCli.sh -server zk_server:port

In our case, we are running the ZooKeeper server on the same machine, so the server address is localhost, or the loopback address 127.0.0.1. The default port we configured was 2181:

$ zkCli.sh -server localhost:2181

As we connect to the running ZooKeeper instance, we will see output similar to the following in the terminal (some output is omitted):

Connecting to localhost:2181
...............
...............
Welcome to ZooKeeper!
JLine support is enabled
.............
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]

To see a listing of the commands supported by the ZooKeeper Java shell, you can run the help command in the shell prompt:

[zk: localhost:2181(CONNECTED) 0] help
ZooKeeper -server host:port cmd args
  connect host:port
  get path [watch]
  ls path [watch]
  set path data [version]
  rmr path
  delquota [-n|-b] path
  quit 
  printwatches on|off
  create [-s] [-e] path data acl
  stat path [watch]
  close 
  ls2 path [watch]
  history 
  listquota path
  setAcl path acl
  getAcl path
  sync path
  redo cmdno
  addauth scheme auth
  delete path [version]
  setquota -n|-b val path

We can execute a few simple commands to get a feel of the command-line interface. Let's start by running the ls command, which, as in Unix, is used for listing:

[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper]

Now, the ls command returned a string called zookeeper, which is a znode in ZooKeeper terminology. We can also create znodes through the ZooKeeper shell.

To begin with, let's create a HelloWorld znode with empty data:

[zk: localhost:2181(CONNECTED) 2] create /HelloWorld ""
Created /HelloWorld
[zk: localhost:2181(CONNECTED) 3] ls /
[zookeeper, HelloWorld]

We can delete the znode created by issuing the delete command as follows:

[zk: localhost:2181(CONNECTED) 4] delete /HelloWorld
[zk: localhost:2181(CONNECTED) 5] ls /
[zookeeper]

The ZooKeeper data model

ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace of data registers. The namespace looks quite similar to a Unix filesystem. The data registers are known as znodes in the ZooKeeper nomenclature.


ZooKeeper has two types of znodes: persistent and ephemeral. There is a third type that you might have heard of, called a sequential znode, which is a kind of a qualifier for the other two types. Both persistent and ephemeral znodes can be sequential znodes as well.

The persistent znode

As the name suggests, persistent znodes have a lifetime in ZooKeeper's namespace until they're explicitly deleted. A znode can be deleted by calling the delete API.

The ephemeral znode

An ephemeral znode is deleted by the ZooKeeper service when the creating client’s session ends. An end to a client’s session can happen because of disconnection due to a client crash or explicit termination of the connection.

The sequential znode

A sequential znode is assigned a sequence number by ZooKeeper as a part of its name during its creation. The value of a monotonically increasing counter (maintained by the parent znode) is appended to the name of the znode.
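
For illustration, the -e and -s flags of the shell's create command produce ephemeral and sequential znodes, respectively. The znode names below are arbitrary, and the actual sequence suffix depends on the parent znode's counter:

[zk: localhost:2181(CONNECTED) 0] create -e /my-ephemeral "session-data"
Created /my-ephemeral
[zk: localhost:2181(CONNECTED) 1] create -s /my-seq- "data"
Created /my-seq-0000000001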

ZooKeeper watches

ZooKeeper is designed to be a scalable and robust centralized service for very large distributed applications. A common design anti-pattern in how clients access such services is polling, a pull model. A pull model often suffers from scalability problems when implemented in large and complex distributed systems. To solve this problem, the ZooKeeper designers implemented a mechanism where clients can get notifications from the ZooKeeper service instead of polling for events. This resembles a push model, where notifications are pushed to the registered clients of the ZooKeeper service.

Clients can register with the ZooKeeper service for any changes associated with a znode. This registration is known as setting a watch on a znode in ZooKeeper terminology. Watches allow clients to get notifications when a znode changes in any way. A watch is a one-time operation, which means that it triggers only one notification. To continue receiving notifications over time, the client must reregister the watch upon receiving each event notification.

Whenever a watch is triggered, a notification is dispatched to the client that set the watch. Watches are maintained in the ZooKeeper server to which a client is connected, and this makes them a fast and lean means of event notification.

The watches are triggered for the following three changes to a znode:

  1. Any changes to the data of a znode, such as when new data is written to the znode’s data field using the setData operation.
  2. Any changes to the children of a znode. For instance, children of a znode are deleted with the delete operation.
  3. A znode being created or deleted, which could happen in the event that a new znode is added to a path or an existing one is deleted.

Again, ZooKeeper asserts the following guarantees with respect to watches and notifications:

  • ZooKeeper ensures that watches are always ordered in FIFO manner and that notifications are always dispatched in order
  • Watch notifications are delivered to a client before any other change is made to the same znode
  • Watch events are ordered with respect to the updates seen by the ZooKeeper service

ZooKeeper operations

ZooKeeper's data model and its API support nine basic operations: create, delete, exists, getChildren, getData, setData, getACL, setACL, and sync.

Watches and ZooKeeper operations

Watches can be set on the read operations on znodes, such as exists, getChildren, and getData. Watches are triggered by the write operations on znodes: create, delete, and setData. ACL operations do not participate in watches. The following table lists the watch-setting operations and the events that trigger their watches:

Operation      Event-generating actions
exists         A znode is created or deleted, or its data is updated
getChildren    A child of a znode is created or deleted, or the znode itself is deleted
getData        A znode is deleted or its data is updated

The following are the types of watch events that might occur during a znode state change:

  • NodeChildrenChanged: A znode’s child is created or deleted
  • NodeCreated: A znode is created in a ZooKeeper path
  • NodeDataChanged: The data associated with a znode is updated
  • NodeDeleted: A znode is deleted in a ZooKeeper path
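
For instance, in the ZooKeeper shell, reading a znode with the watch flag set and then updating its data dispatches a one-time NodeDataChanged notification. The /mynode path below is hypothetical, and parts of the output are omitted:

[zk: localhost:2181(CONNECTED) 0] create /mynode "v1"
Created /mynode
[zk: localhost:2181(CONNECTED) 1] get /mynode watch
...
[zk: localhost:2181(CONNECTED) 2] set /mynode "v2"
WATCHER::
WatchedEvent state:SyncConnected type:NodeDataChanged path:/mynode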

Programming Apache ZooKeeper with Python

ZooKeeper is easily programmable and has client bindings for a plethora of languages. It ships with official Java, C, Perl, and Python client libraries. Here, we will look at programming ZooKeeper with Python.

Apache ZooKeeper is shipped with an official client binding for Python, which is developed on top of the C bindings. It can be found in the contrib/zkpython directory of the ZooKeeper distribution. To build and install the Python binding, refer to the instructions in the README file there. In this section, we will look at another popular Python client library for ZooKeeper, called Kazoo (https://kazoo.readthedocs.org/).

Kazoo is a pure Python library for ZooKeeper, which means that unlike the official Python bindings, Kazoo is implemented fully in Python and has no dependency on the C bindings of ZooKeeper. Along with providing both synchronous and asynchronous APIs, the Kazoo library also provides APIs for some distributed data structure primitives such as distributed locks, leader election, distributed queues, and so on.

Installing Kazoo is very simple; it can be done with either the pip or easy_install installer:

Using pip, Kazoo can be installed with the following command:

$ pip install kazoo

Using easy_install, Kazoo is installed as follows:

$ easy_install kazoo

To verify that Kazoo is installed properly, let's try to connect to the ZooKeeper instance and print the list of znodes in the root path of the tree.

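A minimal sketch of such a session is shown below; it assumes a ZooKeeper server running locally on the default port 2181:

from kazoo.client import KazooClient

# Create the client instance, pointing it at the local ZooKeeper server
zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()                       # initiate the connection and session

# List the znodes under the root path of the namespace
print(zk.get_children('/'))      # for example: ['zookeeper']

zk.stop()                        # close the connection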

In the preceding example, we imported KazooClient, which is the main ZooKeeper client class. Then, we created an object of the class (an instance of KazooClient) that connects to the ZooKeeper instance running on the localhost. Calling the start() method initiates a connection to the ZooKeeper server, and once it is successfully connected, the instance holds the handle to the ZooKeeper session. When we call the get_children() method on the root path of the ZooKeeper namespace, it returns a list of the children. Finally, we close the connection by calling the stop() method.

A watcher implementation

Kazoo provides higher-level child and data watching APIs as a recipe through a module called kazoo.recipe.watchers. This module provides the implementation of DataWatch and ChildrenWatch, along with another class called PatientChildrenWatch. The PatientChildrenWatch class returns values once the children of a node have not changed for a period of time, unlike the other two, which return each time an event is generated.


Let's look at the implementation of a simple children watcher client, which will generate an event each time a znode is added or deleted from the ZooKeeper path:

import signal

from kazoo.client import KazooClient
from kazoo.recipe.watchers import ChildrenWatch

zoo_path = '/MyPath'

# Connect to the ZooKeeper server running on the local machine
zk = KazooClient(hosts='localhost:2181')
zk.start()
zk.ensure_path(zoo_path)

# Register a children watcher on /MyPath; the callback is invoked
# with the current list of children on every change
@zk.ChildrenWatch(zoo_path)
def child_watch_func(children):
    print "List of Children %s" % children

# Keep the process alive so that it continues to receive events
while True:
    signal.pause()

In this simple implementation of a children watcher, we connect to the ZooKeeper server running on the localhost and create the path /MyPath:

zk.ensure_path(zoo_path)

We then set a children watcher on this path with the @zk.ChildrenWatch(zoo_path) decorator, registering the callback method child_watch_func, which prints the current list of children whenever an event is generated in /MyPath.

When we run this client watcher in a terminal, it starts listening for events.


In another terminal, we will create some znodes in /MyPath with the ZooKeeper shell.

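For example, the shell session might look like the following; the child znode names child-1 and child-2 are arbitrary choices for this illustration:

[zk: localhost:2181(CONNECTED) 0] create /MyPath/child-1 ""
Created /MyPath/child-1
[zk: localhost:2181(CONNECTED) 1] create /MyPath/child-2 ""
Created /MyPath/child-2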

We observe that the children watcher client receives these znode creation events and prints the current list of children in its terminal window.
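Assuming the two creations above, the watcher's output would look roughly like this (the exact list representation depends on the znode names used):

List of Children ['child-1']
List of Children ['child-1', 'child-2']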

Similarly, if we delete the znodes that we just created, the watcher receives the deletion events and prints the updated children listing in the terminal where it is running.

ZooKeeper recipes

In this section, you will learn to develop high-level distributed system constructs and data structures using ZooKeeper. As mentioned earlier, most of these constructs and functions are of utmost importance in building scalable distributed architectures, but they are fairly complicated to implement from scratch. Developers can often get bogged down while implementing them and integrating them with their application logic. In this section, you will learn how to develop algorithms that build some of these high-level functions using ZooKeeper's primitives and data model, and you will see how ZooKeeper makes them simple, scalable, and error free, with much less code.

Barrier

A barrier is a type of synchronization method used in distributed systems to block the processing of a set of nodes until a condition is satisfied. It defines a point where all nodes must stop their processing and cannot proceed until all the other nodes reach this barrier.

The algorithm to implement a barrier using ZooKeeper is as follows:

  1. To start with, a znode is designated to be the barrier znode, say /zk_barrier.
  2. The barrier is said to be active in the system if this barrier znode exists.
  3. Each client calls the ZooKeeper API's exists() function on /zk_barrier, registering for watch events on the barrier znode (the watch flag is set to true).
  4. If the exists() method returns false, the barrier no longer exists, and the client proceeds with its computation.
  5. Else, if the exists() method returns true, the client just waits for watch events.
  6. Whenever the barrier exit condition is met, the client in charge of the barrier deletes /zk_barrier.
  7. The deletion triggers a watch event, and on getting this notification, each client calls the exists() function on /zk_barrier again.
  8. The exists() call in step 7 now returns false, and the clients can proceed further.

    The barrier exists until the barrier znode ceases to exist!

In this way, we can implement a barrier using ZooKeeper without much of an effort.
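
A rough sketch of the waiting side of this algorithm, using the Kazoo client introduced earlier, might look like the following; the barrier path /zk_barrier and the callback structure are illustrative assumptions, not a complete recipe:

import threading
from kazoo.client import KazooClient

barrier_path = '/zk_barrier'    # designated barrier znode (illustrative)
released = threading.Event()

zk = KazooClient(hosts='localhost:2181')
zk.start()

def barrier_watch(event):
    # One-time watch callback: fires when /zk_barrier changes
    if event.type == 'DELETED':
        released.set()
    elif not zk.exists(barrier_path, watch=barrier_watch):
        # Some other change happened; re-register the watch and re-check
        released.set()

# If the barrier znode exists, register the watch and wait for its deletion
if zk.exists(barrier_path, watch=barrier_watch):
    released.wait()

print('Barrier released, proceeding with the computation')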

The example cited so far is of a simple barrier that makes a group of distributed processes wait on some condition and then proceed together when the condition is met. There is another type of barrier that aids in synchronizing the beginning and end of a computation; this is known as a double barrier. The logic of a double barrier states that a computation is started when the required number of processes join the barrier. The processes leave after completing the computation, and when the number of processes participating in the barrier becomes zero, the computation is said to end.

The algorithm for a double barrier is implemented by having a barrier znode that serves as the parent for the individual process znodes participating in the computation. Its algorithm is outlined as follows:

Phase 1: Joining the barrier can be done as follows:

  1. Suppose the barrier znode is represented by the znode /barrier. Every client process registers with the barrier znode by creating an ephemeral znode with /barrier as the parent. In real scenarios, clients might register using their hostnames.
  2. The client process sets a watch event for the existence of another znode called ready under the /barrier znode and waits for the node to appear.
  3. A number N is predefined in the system; this governs the minimum number of clients that must join the barrier before the computation can start.
  4. While joining the barrier, each client process finds the number of child znodes of /barrier:
    M = getChildren(/barrier, watch=false)
  5. If M is less than N, the client waits for the watch event registered in step 2.
  6. Else, if M is equal to N, the client process creates the ready znode under /barrier.
  7. The creation of the ready znode in step 6 triggers the watch event, and each client starts the computation that it has been waiting to do.

Phase 2: Leaving the barrier can be done as follows:

  1. A client process, on finishing its computation, deletes the ephemeral znode it created under /barrier in step 1 of Phase 1: Joining the barrier.
  2. The client process then finds the number of children under /barrier:

    M = getChildren(/barrier, watch=True)

    If M is not equal to 0, this client waits for notifications (observe that we have set the watch event to True in the preceding call).

    If M is equal to 0, the client exits the barrier.

The preceding procedure suffers from a potential herd effect where all client processes wake up to check the number of children left in the barrier whenever a notification is triggered. To avoid this, we can use a sequential ephemeral znode in step 1 of Phase 1: Joining the barrier. Every client process then watches for its next-lowest sequential ephemeral znode to go away as an exit criterion. This way, only a single event is generated for any client completing the computation, and hence, not all clients need to wake up together to check their exit condition. For a large number of client processes participating in a barrier, the herd effect can negatively impact the scalability of the ZooKeeper service, and developers should be aware of such scenarios.

A Java language implementation of a double barrier can be found in the ZooKeeper documentation at http://zookeeper.apache.org/doc/r3.4.6/zookeeperTutorial.html.

Queue

A distributed queue is a very common data structure used in distributed systems. A special implementation of a queue, called a producer-consumer queue, is where a collection of processes called producers generate or create new items and put them in the queue, while consumer processes remove the items from the queue and process them. The addition and removal of items in the queue follow a strict ordering of first in first out (FIFO).

A producer-consumer queue can be implemented using ZooKeeper. A znode will be designated to hold a queue instance, say queue-znode. All queue items are stored as znodes under this znode. Producers add an item to the queue by creating a znode under the queue-znode, and consumers retrieve the items by getting and then deleting a child from the queue-znode.

The FIFO order of the items is maintained using the sequential znode property provided by ZooKeeper. When a producer process creates a znode for a queue item, it sets the sequential flag. This makes ZooKeeper append a monotonically increasing sequence number to the znode name as a suffix. ZooKeeper guarantees that the sequence numbers are applied in order and are not reused. The consumer processes the items in the correct order by looking at the sequence number of each znode.

The pseudocode for the algorithm to implement a producer-consumer queue using ZooKeeper is shown here:

  1. Let /_QUEUE_ represent the top-level znode of our queue implementation; this is also called the queue-node.
  2. Clients acting as producer processes put something into the queue by calling the create() method with the znode name "queue-" and with the sequence and ephemeral flags set:

    create("queue-", SEQUENCE_EPHEMERAL)

    The sequence flag makes the new znode get a name of the form queue-N, where N is a monotonically increasing number.

  3. Clients acting as consumer processes issue a getChildren() method call on the queue-node with the watch event set to true:

    M = getChildren(/_QUEUE_, true)

    The consumer sorts the children list M, takes out the lowest-numbered child znode from the list, starts processing it by taking the data out of the znode, and then deletes it.

  4. The client picks up items from the list and continues processing them. On reaching the end of the list, the client checks again whether any new items were added to the queue by issuing another getChildren() call.
  5. The algorithm continues until getChildren() returns an empty list; this means that no more znodes or items are left under /_QUEUE_.

It's quite possible that in step 3, the deletion of a znode by a client will fail because some other client has gained access to the znode while this client was retrieving the item. In such scenarios, the client should retry the delete call.

Using this algorithm for a generic queue, we can also build a priority queue, where each item has a priority tagged to it. The algorithm and implementation are left as an exercise for the reader.

C and Java implementations of the distributed queue recipe are shipped along with the ZooKeeper distribution under the recipes folder. Developers can use this recipe to implement distributed queues in their applications.

Kazoo, the Python client library for ZooKeeper, has distributed queue implementations in the kazoo.recipe.queue module. This queue implementation has built-in support for assigning priorities to queue items as well as for queue locking.
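
A minimal sketch of Kazoo's queue recipe is shown below; the path /my_queue and the item values are arbitrary choices for this example:

from kazoo.client import KazooClient

zk = KazooClient(hosts='localhost:2181')
zk.start()

queue = zk.Queue('/my_queue')        # queue recipe rooted at an arbitrary znode

queue.put(b'item-1')                 # producer: append an item
queue.put(b'urgent', priority=10)    # lower priority values are served earlier
print(zk.get_children('/my_queue'))  # the items are stored as sequential znodes

item = queue.get()                   # consumer: fetch and remove the next item
print(item)

zk.stop()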

Lock

A lock in a distributed system is an important primitive that provides the applications with a means to synchronize their access to shared resources. Distributed locks need to be globally synchronous to ensure that no two clients can hold the same lock at any instance of time.

Typical scenarios where locks are inevitable are when the system as a whole needs to ensure that only one node of the cluster is allowed to carry out an operation at a given time, such as:

  • Write to a shared database or file
  • Act as a decision subsystem
  • Process all I/O requests from other nodes

 

ZooKeeper can be used to implement mutually exclusive locks for processes that run on different servers across different networks and even geographically apart.

 

To build a distributed lock with ZooKeeper, a persistent znode is designated to be the main lock-znode. Client processes that want to acquire the lock will create an ephemeral znode with a sequential flag set under the lock-znode. The crux of the algorithm is that the lock is owned by the client process whose child znode has the lowest sequence number. ZooKeeper guarantees the order of the sequence number, as sequence znodes are numbered in a monotonically increasing order. Suppose there are three znodes under the lock-znode: l1, l2, and l3. The client process that created l1 will be the owner of the lock. If the client wants to release the lock, it simply deletes l1, and then, the owner of l2 will be the lock owner and so on.

The pseudocode for the algorithm to implement a distributed lock service with ZooKeeper is shown here:

Let the parent lock node be represented by a persistent znode, /_locknode_, in the ZooKeeper tree.

Phase 1: Acquire a lock with the following steps:

  1. Call the create("/_locknode_/lock-", CreateMode=EPHEMERAL_SEQUENTIAL) method.
  2. Call the getChildren("/_locknode_", false) method on the lock node. Here, the watch flag is set to false, as otherwise it can lead to a herd effect.
  3. If the znode created by the client in step 1 has the lowest sequence number suffix, then the client is the owner of the lock, and it exits the algorithm.
  4. Call the exists() method, with the watch flag set to true, on the znode from the children list that has the next-lowest sequence number.
  5. If the exists() method returns false, go to step 2.
  6. If the exists() method returns true, wait for notifications of the watch event set in step 4.

Phase 2: Release a lock as follows:

  1. The client holding the lock deletes the node, thereby triggering the next client in line to acquire the lock.
  2. The client that created the znode with the next higher sequence number will be notified and will then hold the lock. The watch for this event was set in step 4 of Phase 1: Acquire a lock.

While it's not recommended in a distributed system with a large number of clients due to the herd effect, if the other clients also need to know about changes in lock ownership, they could set a watch on the /_locknode_ lock node for events of the NodeChildrenChanged type and determine the current owner.

If there was a partial failure in the creation of znode due to connection loss, it's possible that the client won't be able to correctly determine whether it successfully created the child znode. To resolve such a situation, the client can store its session ID in the znode data field or even as a part of the znode name itself. As a client retains the same session ID after a reconnect, it can easily determine whether the child znode was created by it by looking at the session ID.

The use of an ephemeral znode prevents a potential deadlock that might arise when a client dies while holding a lock. As the property of the ephemeral znode dictates that it gets deleted when the session times out or expires, ZooKeeper will delete the znode created by the dead client, and the algorithm runs as usual. However, if the client hangs for some reason but its ZooKeeper session is still active, then we might get into a deadlock. This can be solved by having a monitor client that triggers an alarm when the lock-holding time for a client crosses a predefined timeout.

The ZooKeeper distribution is shipped with the C and Java language implementation of a distributed lock in the recipes folder. The recipe implements the algorithm you have learned so far and also takes into account the problems associated with partial failure and herd effect.
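
Kazoo also ships a lock recipe that implements this pattern. A minimal sketch is shown below; the lock path /app/lock and the identifier worker-1 are arbitrary names for this example:

from kazoo.client import KazooClient

zk = KazooClient(hosts='localhost:2181')
zk.start()

lock = zk.Lock('/app/lock', 'worker-1')   # the identifier is stored in the lock znode

with lock:
    # Critical section: only one client across the cluster runs this at a time
    print('lock acquired, doing exclusive work')

zk.stop()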

The previous recipe of a mutually exclusive lock can be modified to implement a shared lock as well. Readers can find the algorithm and pseudocode for a shared lock using ZooKeeper in the documentation at http://zookeeper.apache.org/doc/r3.4.6/recipes.html#Shared+Locks.

More ZooKeeper recipes are available at:

http://zookeeper.apache.org/doc/trunk/recipes.html

Summary

In this article, we covered the fundamentals of Apache ZooKeeper, how to program it, and how to implement common distributed data structures with ZooKeeper. For more details on Apache ZooKeeper, please visit its project page.
