Distributed hash tables

In this article by Richard Reese, author of the book Learning Network Programming with Java, you will learn that a distributed hash table (DHT) uses a key/value pair to locate resources in a network. This mapping function is spread across peers, thus making it a distributed one. This architecture allows P2P networks to scale easily to a large number of nodes and handle the peers joining and leaving a network randomly. A DHT is the basis for supporting core P2P services. Many applications use DHT including BitTorrent, Freenet, and YaCy.

(For more resources related to this topic, see here.)

The following figure illustrates mapping a key to a value. The key is usually a string containing the identity of a resource, such as the name of a book, and the value is a number generated to represent the resource. The number can be used to locate the resource in a network and may correspond to the identifier of a node.

P2P networks have been in use for a while. The evolution of these networks is reflected in how resources are mapped, as typified by Napster, Gnutella, and Freenet.

  • Napster (https://en.wikipedia.org/wiki/Napster) was the first large-scale P2P content delivery system. It used a server to keep track of the nodes in the network, and the nodes held the actual data. When a client needed this data, the server would look up the current set of nodes that holds the data and this node's location would be sent back to the client. The client would then contact the node holding the data. This made it easy for attacks to be launched against it and eventually its demise through lawsuits.
  • Gnutella (https://web.archive.org/web/20080525005017, http://www.gnutella.com/) did not use a central server but broadcasted to every node in a network. This resulted in the network being flooded with messages and the approach was modified in later versions.
  • Freenet (https://freenetproject.org/) uses a heuristic key-based routing scheme and focuses on censorship and anonymity issues. However, DHS uses a more structured key-based routing approach, which results in:
    • Decentralization
    • Fault tolerant
    • Scalability
    • Efficiency

However, DHT does not support exact-match searches. If this type of search is needed, it has to be added.

DHT components

A keyspace is a set of 160-bit strings (or keys) used to identify an element. Keyspace partitioning is the process of splitting the keyspace among the nodes of the network. An overlay network connects the nodes.

A commonly used hashing algorithm is Secure Hash Algorithm (SHA-1) (https://en.wikipedia.org/wiki/SHA-1). SHA-1 was designed by NSA and generates a 160-bit hash value known as a message digest. Most P2P models do not require the developer to explicitly perform the hashing function , but it is instructive to see how it is done. The following is an example of using Java to create a digest.

The MessageDigest class in the getInstance method accepts a string specifying the algorithm to use and returns a MessageDigest instance. Its update method requires an array of bytes containing the key to hash. In this example, a string is used. The digest method returns an array of bytes holding the hash value. The byte array is then displayed as a hexadecimal number:

      String message = "String to be hashed";
       try {
        MessageDigest messageDigest =
           MessageDigest.getInstance("SHA-1");
         messageDigest.update(message.getBytes());
         byte[] digest = messageDigest.digest();

         StringBuffer buffer = new StringBuffer();
         for (byte element : digest) {
           buffer.append(Integer
            .toString((element & 0xff) + 0x100, 16)
             .substring(1));
         }
         System.out.println("Hex format : " +
          buffer.toString());

       }
       catch (NoSuchAlgorithmException ex) {
         // Handle exceptions
       }

Executing this sequence will produce the following output:

Hex format : 434d902b6098ac050e4ed79b83ad93155b161d72

To store data, such as a file, we can use the filename to create a key. A put type function is then used to store the data, as follows:

put(key, data)

To retrieve the data corresponding to a key, a get type function is used, as follows:

data = get(key)

Every node in an overlay either contains data represented by the key or is a node closer to the node containing the data. The routing algorithm determines the next node to visit on the way to the node containing the data.

DHT implementations

There are several Java implementations of DHTs, as listed in the following table:

Implementation

URL

openkad

https://code.google.com/p/openkad/

Open Chord

http://open-chord.sourceforge.net/

TomP2P

http://tomp2p.net/

JDHT

http://dks.sics.se/jdht/

We will use the Java distributed hash table (JDHT) to illustrate the use of a DHT.

Using a JDHT

In order to use a JDHT, you will need the JAR files listed in the following table. The dks.jar file is the main JAR file used. However, the other two JAR files are used by the JDHT. An alternate source for the dks.jar file is also listed:

JAR

Site

dks.jar

http://dks.sics.se/jdht/

https://www.ac.upc.edu/projects/cms/browser/cms/trunk/lib/dks.jar?rev=2

xercesImpl.jar

http://www.java2s.com/Code/Jar/x/DownloadxercesImpljar.htm

Apache log4j 1.2.17

https://logging.apache.org/log4j/1.2/download.html

 The following example is adapted from the one at the website. First, we will create a JDHT instance. The JDHT instance uses port 4440 as default. With this instance, we can then use its put method to add a key/value pair to the table:

   try {
    JDHT DHTExample = new JDHT();
     DHTExample.put("Java SE API",
      "http://docs.oracle.com/javase/8/docs/api/");
     ...
   } catch (IOException ex) {
    // Handle exceptions
   }

In order for a client to connect with this instance, we need to get a reference to this node. This is achieved, as shown here, where it is displayed:

   System.out.println(((JDHT) DHTExample).getReference());

The following code will keep the program running until the user terminates it. The close method is then used to close the table, as follows:

   Scanner scanner = new Scanner(System.in);
   System.out.println("Press Enter to terminate application: ");
   scanner.next();
   DHTExample.close();

When the program is executed, you will get an output similar to the following:

dksref://192.168.1.9:4440/0/2179157225/0/1952355557247862269
Press Enter to terminate application: 

The client application will follow. A new JDHT instance is created using a different port. The second argument is the reference to the first application. You will need to copy the reference and paste it into the client. A different reference will be generated each time the first application is executed:

   try {
    JDHT myDHT = new JDHT(5550, "dksref://192.168.1.9:4440"
      + "/0/2179157225/0/1952355557247862269");
     ...
   }
   catch (IOException | DKSTooManyRestartJoins |
    DKSIdentifierAlreadyTaken | DKSRefNoResponse ex) {
      // Handle exceptions
    }

Next, we will use the get method to retrieve the value associated with the key. The value is then displayed, and the application is closed.

  String value = (String) myDHT.get("Java SE API");
   System.out.println(value);
   myDHT.close();

The output is as follows:

http://docs.oracle.com/javase/8/docs/api/

This simple demonstration illustrates the basics of a distributed hash table.

Using FreePastry

Pastry (http://www.freepastry.org/) is a P2P routing overlay system. FreePastry(http://www.freepastry.org/FreePastry/) is an open source implementation of Pastry and is simple enough for us to illustrate many of the features of a P2P system. Pastry routes messages with a network of n nodes in O(log n) steps. This means that given a network of nodes, it requires at the most a log base 2 of n steps to reach the node. This is an efficient routing approach. However, while it may only require traversing three nodes to get to a resource, it may require a considerable number of IP hops to get to it.

Pastry uses the concept of leaf sets in the routing process. Each node has a leaf set. A leaf set is a collection of GUIDs and IP addresses of nodes that is numerically the closest to this node. The nodes are logically arranged in a circle, as shown in the next section. The leaf set of a node are those that are close to this node.

In the following figure, each dot represents a node with an identifier. The addresses used here range from 0 to FFFFFF. The real addresses range from 0 to 2128. If a message representing a request originates at address 9341A2 and needs to be sent to address E24C12, based on the numerical address, the overlay router may route the messages through the intermediate nodes, as depicted by the arrows:

Other applications are built on top of FreePastry, including:

  • SCRIBE: This is a group communication and event notification system supporting the publisher/subscriber paradigm
  • PAST: This is an archival storage utility system
  • SplitStream: This supports content streaming and distribution
  • Pastiche: This is a backup system

Each of these applications uses an API to support their use.

The FreePastry demonstration

To demonstrate how FreePastry supports a P2P application, we will create an application based on the FreePastry tutorials found at https://trac.freepastry.org/wiki/FreePastryTutorial. In this demonstration, we will create two nodes and demonstrate how they can send and receive messages. The demonstration uses three classes:

  • FreePastryExample: This is used to bootstrap the network
  • FreePastryApplication: This executes the functionality of the node
  • PastryMessage: This is the message sent between nodes

Let's start with the bootstrap application.

Understanding the FreePastryExample class

There are several components used with FreePastry applications. These include:

  • Environment: This class represents the application's environment
  • The bind port: This represents the local port that the application will bind to
  • The boot port: This is the boot port used for the node's InetAddress
  • The boot address: This is the IP address of the boot node

The FreePastryExample class is defined in the following code. It contains a main method and a constructor:

public class FreePastryExample {
...
}

We will start with the main method. An instance of the Environment class is created first. This class holds the parameter settings for the node. Next, the NAT search policy is set to never, which allows us to use the program in a LAN without difficulty. Take a look at the following code:

   public static void main(String[] args) throws Exception {
    Environment environment = new Environment();
     environment.getParameters()
     .setString("nat_search_policy", "never");
     ...
   }

The ports and the InetSocketAddress instance are initialized. We will set both ports to the same number at this time. We used the IP address 192.168.1.14 to instantiate the InetAddress object. You would need to use the address of your machine instead. This is a LAN address. Do not use 127.0.0.1 as it will not work properly. The InetAddress object, along with the bootPort value, is used to create the InetSocketAddress instance. All of this is placed in a try block to handle exceptions. Take a look at the following code:

 

  try {
    int bindPort = 9001;
     int bootPort = 9001;
     InetAddress bootInetAddress =
      InetAddress.getByName("192.168.1.14");
     InetSocketAddress bootAddress =
      new InetSocketAddress(bootInetAddress, bootPort);
     System.out.println("InetAddress: " + bootInetAddress);
     ...
   } catch (Exception e) {
    // Handle exceptions
   }

The last task is to create an instance of the FreePastryExample class by calling the constructor, as follows:

   FreePastryExample freePastryExample =
    new FreePastryExample(bindPort, bootAddress, environment);

The constructor will follow and create and launch the node's application. To accomplish this, we need to create a PastryNode instance and attach the application to it. To create the node, we will use a factory.

Every node needs a unique ID. The RandomNodeIdFactory class generates an ID based on the current environment. Using this object with the bind port and environment, an instance of SocketPastryNodeFactory is created. With this factory, the newNode method is invoked to create our PastryNode instance. Now, take a look at the following code:

   public FreePastryExample(int bindPort,
  InetSocketAddress bootAddress,
   Environment environment) throws Exception {
    NodeIdFactory nidFactory =
      new RandomNodeIdFactory(environment);
     PastryNodeFactory factory =
      new SocketPastryNodeFactory(
     nidFactory, bindPort, environment);
     PastryNode node = factory.newNode();
     ...
   }

Next, an instance of the FreePastryApplication class is created, and the node is started using the boot method, as follows:

   FreePastryApplication application =
    new FreePastryApplication(node);
   node.boot(bootAddress);
   ...

The node's ID is then displayed, as shown in the next code sequence. As there are multiple nodes in the network, we will pause for 10 seconds to allow the other nodes to start. We used the FreePastry timer to put this delay into effect. A random node ID is created, and the application's routeMessage message is called to send a message to this node. Take a look at the following code:

  System.out.println("Node " + node.getId().toString() + "
     created");
   environment.getTimeSource().sleep(10000);
   Id randomId = nidFactory.generateNodeId();
   application.routeMessage (randomId);

Before we execute the program, we need to develop the application class.

Understanding the FreePastryApplication class

The FreePastryApplication class implements the Application interface and the functionality of the node. The constructor creates and registers an Endpoint instance and initializes a message. Then, the Endpoint instance is used by the node to send messages. The class and constructor are shown here:

public class FreePastryApplication implements Application {
protected Endpoint endpoint;
private final String message;
private final String instance = " Instance ID";

public FreePastryApplication(Node node) {
 this.endpoint = node.buildEndpoint(this, instance);
   this.message = "Hello there! from Instance: "
    + instance + " Sent at: [" + getCurrentTime()
     + "]";
   this.endpoint.register();
}

...
}

You may get a leaking this in constructor warning when this code is compiled. This is caused by a reference to the constructor object being passed as an argument to the buildEndpoint method using the keyword this. This is potentially bad practice because the object may not be fully constructed when it is passed, and another thread may try to do something with the object before it is ready. It is not as much of a problem if it is passed to a package-private method that performs common initialization. In this situation, it is not likely to cause problems.

The Application interface requires that three methods be implemented:

  • deliver: This is called when a message is received
  • forward: This is used to forward a message
  • update: This informs the application that a node has joined or left a set of local nodes

We are only interested in the deliver method for this application. In addition, we will add the getCurrentTime and routeMessage methods to the application. We will use getCurrentTime to show the time that our messages are sent and arrive. The routeMessage method will send a message to another node.

The getCurrentTime method will follow. It uses the EndPoint object to access the node's environment and then the time:

   private long getCurrentTime() {
    return this.endpoint
     .getEnvironment()
       .getTimeSource()
       .currentTimeMillis();
   }

The routeMessage method is passed through the identifier of the destination node. The message text is constructed by adding endpoint and time information. A PastryMessage instance is created using the endpoint identifier and message text. The route method is then called to send the message. Take a look at the following code:

   public void routeMessage(Id id) {
    System.out.println(
      "Message Sent\n\tCurrent Node: " +
        this.endpoint.getId()
         + "\n\tDestination: " + id
         + "\n\tTime: " + getCurrentTime());
     Message msg = new PastryMessage(endpoint.getId(),
      id, message);
     endpoint.route(id, msg, null);
   }

When a message is received by a node, the deliver method is invoked. The implementation of this method will then follow. The endpoint identifier, message, and time of arrival are displayed. This will help us understand how messages are sent and received. Take a look at the following code:

   public void deliver(Id id, Message message) {
    System.out.println("Message Received\n\tCurrent Node: "
      + this.endpoint.getId() + "\n\tMessage: "
       + message + "\n\tTime: " + getCurrentTime());
   }

The PastryMessage class implements the Message interface, as shown in the following code. The constructor accepts the destination, source, and message:

public class PastryMessage implements Message {
private final Id from;
private final Id to;
private final String messageBody;

public PastryMessage(Id from, Id to, String messageBody) {
   this.from = from;
   this.to = to;
   this.messageBody = messageBody;
}

...
}

The Message interface possesses a single getPriority method that needs to be overridden. Here, we will return a low priority so that it does not interfere with the underlying P2P maintenance traffic:

public int getPriority() {
   return Message.LOW_PRIORITY;
}

The toString method is overridden to provide a more detailed description of the message, as shown in the following code:

public String toString() {
   return "From: " + this.from
    + " To: " + this.to
     + " [" + this.messageBody + "]";
}

Now, we are ready to execute the example. Execute the FreePastryExample class. The initial output will consist of the following. The abbreviated node identifier is displayed, which in this case is <0xB36864..>. The identifier you get will be different:

InetAddress: /192.168.1.14 Node <0xB36864..> created

After a pause, a message is sent and subsequently received by the current node. This message is created in the FreePastryExample class using the code repeated here for your convenience:

   Id randomId = nidFactory.generateNodeId();
   application.routeMessage(randomId);

A random identifier is used because we do not have a specific node to send the message to. When the message is sent, the following output is generated. The random identifier for this run is <0x83C7CD..>:

Message Sent
Current Node: <0xB36864..>
Destination: <0x83C7CD..>
Time: 1441844742906
Message Received
Current Node: <0xB36864..>
Message: From: <0xB36864..> To: <0x83C7CD..> [Hello there! from Instance: Instance ID Sent at: [1441844732905]]
Time: 1441844742915

The time between the sending and receiving of the message is minimal. If a larger set of nodes comprised the P2P network, more significant delays would show up.

In the previous output, the node addresses are truncated. We can use the toStringFull method, as shown here, to get the full address:

   System.out.println("Node " + node.getId().toStringFull()
    + " created");

This will produce an output similar to the following:

Node B36864DE0C4F9E9C1572CBCC095D585EA943B1B4 created

We did not provide a specific address for our messages; instead, we randomly generated addresses. This application demonstrates the basic elements of a FreePastry application. Additional layers are used to facilitate communication between nodes, such as the publisher/provider paradigm supported by Scribe.

We can start a second node using the same program but will need to use a different bind port to avoid binding conflicts. The message sent by either node will not necessarily be received by the other node. This is the result of routes generated by FreePastry.

Sending a message to a specific node

To send a message directly to a node, we need its identifier. To get a remote node's identifier, we need to use a leaf set. This collection is not strictly a set because for small networks, such as the one we are using, the same node may appear twice.

The LeafSet class represents this collection and has a get method that returns a NodeHandle instance for each node. We can send messages to nodes if we have this node handle.

To demonstrate this approach, add the following method to the FreePastryApplication class. This is similar to the routeMessage method but uses a node handle as the argument of the route method:

   public void routeMessageDirect(NodeHandle nh) {
    System.out.println("Message Sent Direct\n\tCurrent Node: "
      + this.endpoint.getId() + " Destination: " + nh
       + "\n\tTime: " + getCurrentTime());
     Message msg =
      new PastryMessage(endpoint.getId(), nh.getId(),
       "DIRECT-" + message);
     endpoint.route(null, msg, nh);
   }

Add the following sequences of code to the end of the FreePastryExample constructor. Optionally, comment out the previous code that uses the routeMessage method. First, we will pause for 10 seconds to allow other nodes to join the network:

   environment.getTimeSource().sleep(10000);

Next, we will create an instance of the LeafSet class. The getUniqueSet method returns the leaf set, which excludes the current node. A for-each statement will then use routeMessageDirect to send the message to the nodes of the collection:

   LeafSet leafSet = node.getLeafSet();
   Collection<NodeHandle> collection = leafSet.getUniqueSet();
   for (NodeHandle nodeHandle : collection) {
    application.routeMessageDirect(nodeHandle);
     environment.getTimeSource().sleep(1000);
   }

Start the FreePastryExample class using a bind port of 9001. Then, change the bind port to 9002 and start the class a second time. After several seconds, you will see an output similar to the following. The first set of output corresponds to the first instance of the application, while the second set corresponds to the second instance. Each instance will send one message to the other instance. Note the time stamps used when the messages are sent and received:

InetAddress: /192.168.1.9
Node <0xA5BFDA..> created
Message Sent Direct
Current Node: <0xA5BFDA..> Destination: [SNH: <0x2C6D18..>//192.168.1.9:9002]
Time: 1441849240310
Message Received
Current Node: <0xA5BFDA..>
Message: From: <0x2C6D18..> To: <0xA5BFDA..> [DIRECT-Hello there! from Instance: Instance ID Sent at: [1441849224879]]
Time: 1441849245038

InetAddress: /192.168.1.9
Node <0x2C6D18..> created
Message Received
Current Node: <0x2C6D18..>
Message: From: <0xA5BFDA..> To: <0x2C6D18..> [DIRECT-Hello there! from Instance: Instance ID Sent at: [1441849220308]]
Time: 1441849240349
Message Sent Direct
Current Node: <0x2C6D18..> Destination: [SNH: <0xA5BFDA..>//192.168.1.9:9001]
Time: 1441849245020

There is a lot more to FreePastry than we are able to illustrate here. However, the examples provide a feel for the nature of P2P application development. Other P2P frameworks work in a similar manner.

Summary

In this article, we saw how a distributed hash table supports identifying and locating nodes in a network. A routing algorithm uses this table to fulfill requests by sending messages between nodes. We demonstrated the JDHT to illustrate the used of DHTs.

There are several open source Java-based P2P frameworks available. We used FreePastry to demonstrate how P2P networks work. Specifically, we showed how nodes join a network and how messages are sent between nodes. This provides you with a better understanding of how these frameworks function.

Resources for Article:


Further resources on this subject:


You've been reading an excerpt of:

Learning Network Programming with Java

Explore Title