Infinispan Data Grid Platform

By Francesco Marchioni, Manik Surtani

About this book

In today's competitive business world, Enterprise systems must be highly available and sustain high transaction volumes for an increasing number of users. Infinispan enables you to do this, and to share and distribute data among servers efficiently, so that you achieve faster response times while avoiding single points of failure.

Infinispan Data Grid Platform will teach you the most important concepts for building Enterprise applications. Using Infinispan will give you a decisive competitive advantage over the standard clustered applications that are typical in the enterprise today. This, the only book to cover Infinispan, offers detailed instructions for installing, configuring, and effectively using the Infinispan platform. You will learn how to utilize and make the most out of every feature of its API.

Progress from examples of adding, removing, and evicting data from a cache, to more complex scenarios such as clustering and distributing data more efficiently in the grid. Throughout the book, you will follow a simple example of an API using a ticket booking system, which will help you to learn how to set up robust and scalable Infinispan configurations. You will also see a complete demonstration of integrating the Infinispan data grid platform with JBoss AS 7.

Publication date: August 2012
Publisher: Packt
Pages: 150
ISBN: 9781849518222

 

Chapter 1. Installing Infinispan

This chapter will introduce the Infinispan data grid platform, starting with two obvious questions: What is a data grid? and Why do we need to learn about it? We will cover the following topics in detail:

  • Data grid definition and its conceptual background

  • Installing the Infinispan platform and a common set of tools that will help us in the development process

 

What is a data grid?


Data growth is one of the biggest challenges faced by today's organizations. It's a fact that the amount of data being used by applications is growing in size, mostly due to the inherent complexity of new Enterprise systems.

In such a scenario, traditional, centralized solutions for storing and retrieving data are not feasible for two reasons: they cannot scale as needed, and they cannot fully address availability and efficiency at such a large scale. For these reasons, many vendors are turning to data grid products, a form of middleware that stores large sets of data across distributed applications over a network.

The most evident benefits of a data grid solution can be summarized in a set of key factors, as follows:

  • Large data set: Data grids are specifically designed to read and distribute huge sets of data across a set of servers communicating over a network, forming a cluster.

  • Heterogeneity: The data grid environment is intrinsically heterogeneous both from the software and hardware points of view. This requires a middleware that is able to deal with diverse platforms and storage systems.

  • Scalability: The data grid is optimized to scale out in environments that produce and use huge amounts of data.

Against this background, we introduce the Infinispan project, an open source data grid solution written in Java that provides all the features we have just mentioned (and more!). The Infinispan project has grown out of the experience gained with JBoss Cache, the former JBoss AS caching solution; however, it is in no way related to or dependent on JBoss Cache. As a matter of fact, there are several differences between JBoss Cache and Infinispan, the most significant one being the scope: JBoss Cache was focused on being a clustered caching library, whereas Infinispan is a data grid platform, complete with GUI management tooling and the potential to scale to thousands of nodes.

By automatically and dynamically partitioning the data in memory across multiple servers, Infinispan enables continuous data availability and transactional integrity, even in the event of server failure.

The concept of a data grid might be a little difficult to perceive at the beginning because developers are usually accustomed to dealing with simpler entities, such as Hashtables, to cache their data. We will, therefore, start our journey from this simple point of view, and then we will smoothly demystify all the functionalities that make a caching system a data grid.

 

Introducing Infinispan as a cache


The term cache generally refers to a component that temporarily stores, in memory, data that is hard to calculate or expensive to retrieve, so that future requests for that data can be served faster. Programmers often employ data structures such as Hashtables or ArrayLists to maintain in-memory data that is frequently read by their applications.
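As a minimal sketch of this idea, the following example keeps the result of a slow calculation in a ConcurrentHashMap so that repeated requests are served from memory. It uses only the JDK; the class name SimpleCacheDemo and the stand-in calculation are assumptions made for illustration and are not part of Infinispan.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; not part of Infinispan or JSR-107.
final class SimpleCacheDemo {
    private static final ConcurrentMap<Integer, Long> CACHE = new ConcurrentHashMap<>();
    // Counts how many times the expensive calculation really ran.
    static final AtomicInteger COMPUTATIONS = new AtomicInteger();

    // Stand-in for a slow calculation or a remote lookup.
    private static long expensiveSquare(int n) {
        COMPUTATIONS.incrementAndGet();
        return (long) n * n;
    }

    // Returns the cached value, computing and storing it on the first request only.
    static long square(int n) {
        return CACHE.computeIfAbsent(n, SimpleCacheDemo::expensiveSquare);
    }

    public static void main(String[] args) {
        System.out.println(square(12)); // computed on this call
        System.out.println(square(12)); // served from the cache
        System.out.println("computations: " + COMPUTATIONS.get());
    }
}
```

A plain map like this never expires or evicts entries and lives in a single JVM, which is exactly the limitation that a dedicated caching API sets out to address.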

However, the plain java.util collections are often too basic to cache your data effectively. To address this, a Java Specification Request (JSR-107) has been created to define a temporary caching API for the Java platform.

Its primary interface is javax.cache.Cache, which is similar to java.util.concurrent.ConcurrentMap, with some modifications for distributed environments. In particular, it adds the ability to register, deregister, and list event listeners, and it defines a CacheLoader interface for loading/storing cached data. Cache instances can be retrieved using an appropriate CacheManager, which represents a collection of caches.
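To make the roles of a loader and of event listeners more concrete, here is a small JDK-only sketch. It is not the javax.cache API: ReadThroughCache, EntryListener, and their methods are hypothetical names chosen for this illustration, and a java.util.function.Function stands in for the loader.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

// Hypothetical read-through cache; names do not come from JSR-107 or Infinispan.
final class ReadThroughCache<K, V> {
    // Callback invoked whenever a new entry is created in the cache.
    interface EntryListener<K, V> {
        void entryCreated(K key, V value);
    }

    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final List<EntryListener<K, V>> listeners = new CopyOnWriteArrayList<>();
    private final Function<K, V> loader; // plays the role of a cache loader

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    void addListener(EntryListener<K, V> listener) {
        listeners.add(listener);
    }

    void removeListener(EntryListener<K, V> listener) {
        listeners.remove(listener);
    }

    // Returns the cached value, asking the loader for it on a cache miss.
    V get(K key) {
        V cached = entries.get(key);
        if (cached != null) {
            return cached;
        }
        V loaded = loader.apply(key);
        if (loaded == null) {
            return null; // the loader knows nothing about this key
        }
        V previous = entries.putIfAbsent(key, loaded);
        if (previous != null) {
            return previous; // another thread stored a value first
        }
        for (EntryListener<K, V> listener : listeners) {
            listener.entryCreated(key, loaded);
        }
        return loaded;
    }

    public static void main(String[] args) {
        ReadThroughCache<String, String> cache =
                new ReadThroughCache<>(key -> "ticket-for-" + key);
        cache.addListener((k, v) -> System.out.println("created " + k + " -> " + v));
        System.out.println(cache.get("rome")); // miss: loader runs, listener fires
        System.out.println(cache.get("rome")); // hit: neither runs
    }
}
```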

Going beyond JSR-107

Although JSR-107 defines some standard APIs for storing and managing data in a local or distributed cache across several nodes, certain APIs specific to a distributed data structure are missing. For example, there is no Future-based asynchronous API.

The JSR-107 specification does not define a mechanism to configure caches. As such, implementations have their own proprietary configuration mechanisms.

With Infinispan, you can either use its simple XML configuration file or define the configuration programmatically. Chapter 3, Introducing Infinispan Configuration, and Chapter 4, Developing Advanced Configurations, of this book will describe in detail how to configure Infinispan, either in a standalone environment or in a clustered distribution.

As far as replication of data is concerned, it would be worthwhile for a data grid to support both full and partial replication of data, in both a synchronous and an asynchronous manner.

In a fully replicated mode (known simply as replicated in Infinispan), all nodes in a cluster hold copies of all entries (if an entry exists on one node, it will also exist on all the other nodes). In a partially replicated mode (known as distributed in Infinispan), a fixed number of copies are maintained to provide redundancy and fault tolerance, regardless of cluster size. This is typically far fewer than the number of nodes in the cluster. A partially replicated data grid provides a far greater degree of scalability than a fully replicated one. It is thus the recommended clustering mode in Infinispan.
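The difference between the two modes can be sketched as a question of how many nodes own each key. The example below is a deliberately simplified, JDK-only model: the OwnerSelection class and its owners method are hypothetical, and the placement rule (walk the node list from a position derived from the key's hash code) is an assumption for illustration, not Infinispan's actual hashing algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical owner-selection sketch; Infinispan's real consistent hashing differs.
final class OwnerSelection {
    // Picks up to numOwners distinct nodes for a key by walking the node list
    // from a position derived from the key's hash code.
    static List<String> owners(Object key, List<String> nodes, int numOwners) {
        List<String> result = new ArrayList<>();
        if (nodes.isEmpty()) {
            return result;
        }
        int copies = Math.min(numOwners, nodes.size());
        int start = Math.floorMod(key.hashCode(), nodes.size());
        for (int i = 0; i < copies; i++) {
            result.add(nodes.get((start + i) % nodes.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-A", "node-B", "node-C", "node-D", "node-E");
        // Partial replication: a fixed number of copies, whatever the cluster size.
        System.out.println(owners("ticket-42", nodes, 2));
        // Full replication: every node holds a copy.
        System.out.println(owners("ticket-42", nodes, nodes.size()));
    }
}
```

With two owners per key, adding a sixth or a sixtieth node does not increase the number of copies that each write must update, which is where the scalability advantage of partial replication comes from.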

Finally, invalidation is a clustered mode that does not actually share any data at all, but simply aims to remove data that may be stale from remote caches (see Chapter 4, Developing Advanced Configurations, for a detailed discussion about cache modes).
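The following sketch models invalidation inside a single JVM, under the assumption that each node keeps its own local map and that a write simply tells the other nodes to drop their copy. InvalidatingNode and its methods are hypothetical names; no networking or real Infinispan API is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process model of invalidation; no networking is involved.
final class InvalidatingNode {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final List<InvalidatingNode> peers = new ArrayList<>();

    // Links two nodes so that each one invalidates the other on writes.
    void connect(InvalidatingNode other) {
        peers.add(other);
        other.peers.add(this);
    }

    // Stores the value locally and tells peers to drop their possibly stale copy.
    void put(String key, String value) {
        local.put(key, value);
        for (InvalidatingNode peer : peers) {
            peer.local.remove(key);
        }
    }

    // Returns the local copy, or null if this node has none.
    String get(String key) {
        return local.get(key);
    }

    public static void main(String[] args) {
        InvalidatingNode a = new InvalidatingNode();
        InvalidatingNode b = new InvalidatingNode();
        a.connect(b);
        b.put("seat-1", "free");   // b caches a value
        a.put("seat-1", "booked"); // a writes; b's copy is invalidated, not updated
        System.out.println(a.get("seat-1")); // booked
        System.out.println(b.get("seat-1")); // null
    }
}
```

Note that the new value is never sent to the peer: after the write, node b holds nothing for the key and would have to fetch the current value from wherever the data is kept authoritatively.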

Additionally, with Infinispan, you can also use asynchronous methods to add/remove entries from the cache by using the non-blocking API; this API combines the advantage of non-blocking calls with the ability to handle communication failures and exceptions.
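To illustrate what a Future-based, non-blocking call looks like from the caller's side, here is a JDK-only sketch built on CompletableFuture. AsyncCacheSketch and its putAsync method are illustrative names for this example; the sketch does not reproduce Infinispan's own classes or return types.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper; it models a Future-based cache API with the JDK only.
final class AsyncCacheSketch {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Starts the write on another thread and returns immediately.
    // The future completes with the previous value (or null), or with the failure.
    CompletableFuture<String> putAsync(String key, String value) {
        return CompletableFuture.supplyAsync(() -> store.put(key, value));
    }

    String get(String key) {
        return store.get(key);
    }

    public static void main(String[] args) throws Exception {
        AsyncCacheSketch cache = new AsyncCacheSketch();
        CompletableFuture<String> pending = cache.putAsync("seat-1", "booked");
        // The caller is free to do other work here while the write proceeds.
        String previous = pending.get(); // waits for completion and rethrows any failure
        System.out.println("previous=" + previous + ", current=" + cache.get("seat-1"));
    }
}
```

The point of returning a future rather than nothing is visible in the last step: the caller decides when to wait, and any error raised by the write surfaces when the future is inspected instead of being lost.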

In conclusion, Infinispan exposes a JSR-107-compatible cache interface in which you can store data, and enhances it by providing additional APIs and features.

 

Installing the required software


Having introduced the basics of Infinispan, we will now get our feet wet by installing the Infinispan platform.

Installing Java SE

The first mandatory requirement is to install a JDK 1.6/JDK 1.7 environment. The Java SE download site can be located at http://www.oracle.com/technetwork/java/javase/downloads/index.html.

Choose to download either Java SE 6 or Java SE 7, and install it. If you don't know how to install it, please take a look at the following link:

http://docs.oracle.com/javase/7/docs/webnotes/install/index.html

Testing the installation

Once you have completed your installation, run java -version to verify that it is correctly installed:

C:\Windows>java -version
java version "1.7.0_02"
Java(TM) SE Runtime Environment (build 1.7.0_02-b13)
Java HotSpot(TM) Client VM (build 22.0-b10, mixed mode, sharing)
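The java -version command only exercises the runtime. As an optional extra check, compiling and running a trivial class confirms that the compiler is available too. The class below is a hypothetical helper written for this purpose; save it as JdkCheck.java, compile it with javac JdkCheck.java, and run it with java JdkCheck.

```java
// Hypothetical helper class used only to confirm that the JDK can compile and run code.
final class JdkCheck {
    // Builds a short description from standard system properties.
    static String describeRuntime() {
        return System.getProperty("java.version")
                + " (" + System.getProperty("java.vendor") + ")";
    }

    public static void main(String[] args) {
        System.out.println("Running on Java " + describeRuntime());
    }
}
```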

Installing Maven

The examples contained in this book can be executed from within any development environment of your choice. We will not cover the steps required to install these tools, which in most cases require as little as following a guided wizard procedure.

On the other hand, we would like to describe the installation of Apache Maven, a popular software build and release tool. By using Maven, you will enjoy:

  • A standard structure for all your projects

  • A centralized and automatic management of dependencies

Maven is distributed in several formats, for your convenience, and can be downloaded from http://maven.apache.org/download.html.

Once the download is complete, unzip the distribution archive (for example, apache-maven-3.0.4-bin.zip) to the directory in which you wish to install Maven 3.0.4 (or the latest available version), for example C:\apache-maven-3.0.4.

Once done, add the M2_HOME environment variable to your system, so that it will point to the folder where Maven has been unpacked.

Next, update the PATH environment variable by adding the Maven binaries to your system path. For example, on the Windows platform, you should include %M2_HOME%\bin, in order to make Maven available on the command line.

Testing the installation

Once you have completed your installation, run mvn --version, to verify that Maven has been correctly installed:

mvn --version
Apache Maven 3.0.4 (r1075438; 2011-02-28 18:31:09+0100)
Maven home: C:\apache-maven-3.0.4\bin\..
Java version: 1.6.0, vendor: Sun Microsystems Inc.
Java home: C:\Programmi\Java\jdk1.6.0\jre
Default locale: it_IT, platform encoding: Cp1252
OS name: "windows xp", version: "5.1", arch: "x86", family: "windows"

Installing Infinispan

The Infinispan platform can be freely downloaded from the JBoss community site, http://www.jboss.org/infinispan/downloads.html. In this book, we will target the 5.1 release, named Brahma.

As with most Java libraries, Infinispan does not require running an installer; just unzip the archive in a folder. Let's have a look at its content once you have unpacked the distribution:

+---bin
+---doc
+---etc
|   +---config-samples
+---lib
+---licenses
+---modules
|   +---cachestores
|   +---demos
|   +---hotrod
|   +---hotrod-client
|   +---lucene-directory
|   +---memcached
|   +---query
|   +---rhq-plugin
|   +---spring
|   +---tree
|   +---websocket
infinispan-core.jar

The bin folder contains a few batch scripts that can be used to manage Infinispan.

The most interesting ones are:

  • startServer.bat/startServer.sh, which can be used to start Infinispan in a standalone JVM process (more about that in the next chapter).

  • importConfig.bat/importConfig.sh, which can be used to migrate JBoss Cache configurations into the Infinispan configuration file.

  • runGuiDemo.bat/runGuiDemo.sh, which can be used to test Infinispan's cache using a Java Swing-based demo. We will use it in the next section of this chapter to test the Infinispan installation.

The doc folder contains the Javadocs API documentation for Infinispan.

The etc directory contains the XML schemas for the Infinispan configuration file (infinispan-5.1.xsd) along with some sample configuration files.

The lib folder contains some additional libraries that need to be on your classpath (or packaged with your deployment), along with the main library infinispan-core.jar, which can be located at the root of the Infinispan archive.

The modules directory contains a number of optional modules (such as the Infinispan query module or REST interface). To use one of them, add the module's JAR file and all of its dependencies (modules/MODULE_NAME/lib) to your classpath, in addition to the Infinispan JARs mentioned earlier.

Finally, the licenses directory contains the licenses for some of the other libraries shipped with the distribution and not covered by the LGPL-2.1 license, such as Apache's Lucene libraries.

Extending Infinispan with its additional modules

Most Infinispan functionalities are provided by the core library (infinispan-core.jar), which is placed at the root of the distribution. Infinispan, however, has a highly extensible architecture, making it easy to plug in additional extensions. By default, the Infinispan distribution ships with a set of additional modules, which are contained in the modules folder of the distribution.

The cachestores module, for example, allows Infinispan to store cached data in a persistent location, such as a shared JDBC database or a local filesystem, among others.

The hotrod folder contains a server module featuring a custom binary protocol for communicating with a remote Infinispan cluster.

Note

The hotrod wire protocol is designed to enable faster client/server interactions and also allows clients to make more intelligent decisions with regard to load balancing, failover, and even data location operations. The next chapter shows an example of a remote client connecting to an Infinispan server, using the hotrod protocol.

Another server endpoint is found in the memcached directory. This allows memcached clients to talk to one or more Infinispan servers using the memcached wire protocol.

Note

The memcached wire protocol was developed as a means to connect to the simple, non-clustered memcached caching daemon. As a simple cache, memcached was designed to speed up access to dynamic database-driven websites by caching data and objects in memory, to reduce the number of times an external data source must be read. It is currently adopted in most of the popular social networking sites, such as Facebook or Twitter.

Next, the lucene-directory folder contains a highly scalable Apache Lucene directory implementation that can be used to provide reliable index sharing across the cluster.

The query module, also making use of Apache Lucene in combination with Hibernate Search, adds querying capabilities to Infinispan. This module allows users to search for stored data without access to a specific key. So, you can now search for your data based on their attributes (for example, the tickets sold in one country).
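The contrast between a key-based lookup and an attribute-based search can be sketched with plain JDK collections. This is only a model of the idea: TicketSearch, Ticket, and soldIn are hypothetical names, and the linear scan used here is an assumption for illustration, whereas the query module relies on Lucene indexes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model; the real query module uses Lucene indexes rather than a scan.
final class TicketSearch {
    static final class Ticket {
        final String id;
        final String country;

        Ticket(String id, String country) {
            this.id = id;
            this.country = country;
        }
    }

    // Returns the ids of all tickets sold in the given country,
    // in the iteration order of the supplied map.
    static List<String> soldIn(Map<String, Ticket> cache, String country) {
        List<String> ids = new ArrayList<>();
        for (Ticket ticket : cache.values()) {
            if (ticket.country.equals(country)) {
                ids.add(ticket.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Ticket> cache = new LinkedHashMap<>();
        cache.put("t1", new Ticket("t1", "IT"));
        cache.put("t2", new Ticket("t2", "UK"));
        cache.put("t3", new Ticket("t3", "IT"));
        System.out.println(cache.get("t2").country); // key-based lookup: UK
        System.out.println(soldIn(cache, "IT"));     // attribute-based search: [t1, t3]
    }
}
```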

The rhq-plugin directory contains libraries that can be used to manage multiple Infinispan instances using the web-based, open source RHQ management platform. Thanks to RHQ's agent and auto-discovery capabilities, monitoring both CacheManager and Cache instances is a very simple task (see Chapter 5, Monitoring Infinispan, for more information about managing Infinispan with RHQ agent).

Another addition can be located in the spring folder. The libraries contained in this folder allow you to use Infinispan as the implementation behind the Spring Cache API, instead of the default implementations that ship with Spring.

Finally, the tree folder contains Infinispan's tree API module, which offers clients the possibility of storing data using a hierarchical (tree-like) structured API. This API is similar to the one provided by JBoss Cache, hence the tree module is perfect for those users wanting to migrate their applications from JBoss Cache to Infinispan.
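To give a feel for what a hierarchical, tree-like store means in practice, the sketch below keeps a separate key/value map for each path and can remove a whole subtree at once. It uses only the JDK; TreeStore and its methods are hypothetical and do not reproduce the API of the Infinispan tree module or of JBoss Cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tree-structured store; not the Infinispan tree module API.
final class TreeStore {
    // Each path, such as "/tickets/italy", owns its own key/value map.
    private final TreeMap<String, Map<String, String>> nodes = new TreeMap<>();

    // Stores a value under the given path, creating the node if needed.
    void put(String path, String key, String value) {
        nodes.computeIfAbsent(path, p -> new HashMap<>()).put(key, value);
    }

    // Returns the value stored under the path, or null if the node or key is missing.
    String get(String path, String key) {
        Map<String, String> data = nodes.get(path);
        return data == null ? null : data.get(key);
    }

    // Removes a node and every node beneath it; returns how many nodes were removed.
    int removeSubtree(String path) {
        int before = nodes.size();
        nodes.keySet().removeIf(p -> p.equals(path) || p.startsWith(path + "/"));
        return before - nodes.size();
    }

    public static void main(String[] args) {
        TreeStore tree = new TreeStore();
        tree.put("/tickets/italy", "t1", "Rome");
        tree.put("/tickets/italy/rome", "t2", "Colosseum");
        tree.put("/tickets/uk", "t3", "London");
        System.out.println(tree.removeSubtree("/tickets/italy")); // 2
        System.out.println(tree.get("/tickets/uk", "t3"));        // London
    }
}
```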

Note

If you feel more adventurous, you can even develop your own extensions to take Infinispan beyond its core use case. See the following resource for more information:

https://docs.jboss.org/author/display/ISPN/Extending+Infinispan

Testing the installation with the GUI demo

As we mentioned, Infinispan ships with a GUI demo that can be used to test its basic caching functionalities, either as a standalone server or as a clustered server. We will use it to test our environment without the need to write code:

  1. Move into the bin folder and execute the runGuiDemo.bat/runGuiDemo.sh script.

    Note

    Please note that this GUI demo uses the configuration file named gui-demo-cache-config.xml, contained in infinispan-gui-demo.jar. You can switch to another configuration file by passing the -Dinfinispan.configuration.file parameter to the start demo script.

  2. Once launched, the GUI will display the following frame:

  3. In order to get started, hit the Start Cache button, which will instantiate a cache.

  4. Move to the Manipulate data tab. In this tab, you can perform CRUD operations on the cache, including generating bulk random inserts. In the following example, we are adding a sample entry in the cache by filling the Key and Value textboxes and hitting the Go button:

  5. Once added, the Demo GUI will switch automatically to the Data view tab, which will display the list of entries contained in the cache.

    The Infinispan GUI demo application is a simple but effective example application that starts a standalone Infinispan cache and is also able to distribute the cache data across a cluster of JVMs. You can check out the cluster functionalities by starting several demo GUIs and verifying that data is distributed across the other JVMs.

 

Summary


In this chapter, we have described the requirements for highly available, fast access to data in today's Enterprise-class systems, and the common solution involving in-memory data grids.

Although JSR-107 sets some standards for distributed cached data, it is likely that Enterprise services will need a much more complete approach, often resorting to vendor-specific extensions that reside outside the standard JSR-107 APIs.

Infinispan already provides all of what JSR-107 requires, and much more, by using its core libraries and the additional modules that are contained in the modules folder of the platform distribution.

Next, we covered how to install Infinispan and a set of tools (Java SE and Maven) that are needed to develop applications with this platform.

In the next chapter, we will familiarize ourselves with the Infinispan API by running our first code samples, which will grow more complex as we go through the other chapters of this book.

About the Authors

  • Francesco Marchioni

    Francesco Marchioni is a Red Hat Certified JBoss Administrator (RHCJA) and Sun Certified Enterprise Architect working at Red Hat in Rome, Italy. He started learning Java in 1997, and since then he has followed the path to the newest Application Program Interfaces released by Sun. In 2000, he joined the JBoss community when the application server was running the 2.X release.

    He has spent years as a software consultant, where he has envisioned many successful software migrations from vendor platforms to open source products, such as JBoss AS, fulfilling the tight budget requirements of current times.

    Over the last 10 years, he has authored many technical articles for O'Reilly Media and has run an IT portal focused on JBoss products (http://www.mastertheboss.com).

  • Manik Surtani

    Manik Surtani is a core R&D engineer at JBoss, a division of Red Hat. He is the founder of the Infinispan project, which he currently leads. He is also the spec lead of JSR 347 (Data Grids for the Java Platform), and represents Red Hat on the Expert Group of JSR 107 (Temporary caching for Java). His interests lie in cloud and distributed computing, big data and NoSQL, autonomous systems and highly available computing. He has a background in artificial intelligence and neural networks, highly available e-commerce systems and enterprise Java. Surtani is a strong proponent of open source development methodologies, ethos, and collaborative processes, and has been involved in open source since his first forays into computing.
