This chapter will introduce the Infinispan data grid platform, starting with two obvious questions: what is a data grid, and why do we need to learn about it? We will cover the following topics in detail:
Data grid definition and its conceptual background
Installing the Infinispan platform and a common set of tools that will help us in the development process
Data growth is one of the biggest challenges faced by today's organizations. The amount of data used by applications keeps growing, mostly because of the inherent complexity of new enterprise systems.
In such a scenario, traditional, centralized solutions for storing and retrieving data are not feasible, for two reasons: first, they cannot scale as needed, and second, they cannot fully deliver availability and efficiency at such a large scale. For these reasons, many vendors are turning to data grid products, a form of middleware that can store large data sets across distributed applications over a network.
Heterogeneity: The data grid environment is intrinsically heterogeneous both from the software and hardware points of view. This requires a middleware that is able to deal with diverse platforms and storage systems.
In this context, we introduce the Infinispan project, an open source data grid solution written in Java that provides all the features (and more!) that we have just mentioned. Infinispan grew out of the experience gained with JBoss Cache, the former JBoss AS caching solution; however, it is a separate project and does not depend on JBoss Cache. There are several differences between the two, the most significant being scope: JBoss Cache was focused on being a clustered caching library, whereas Infinispan is a data grid platform, complete with GUI management tooling and the potential to scale to thousands of nodes.
By automatically and dynamically partitioning the data in memory across multiple servers, Infinispan enables continuous data availability and transactional integrity, even in the event of server failure.
The concept of a data grid might be a little difficult to grasp at first, because developers are usually accustomed to caching their data in simpler structures, such as Hashtables. We will therefore start our journey from this simple point of view, and then gradually demystify the functionalities that turn a caching system into a data grid.
The term cache generally refers to a component that temporarily stores in memory data that is expensive to compute or retrieve, so that future requests for that data can be served faster. Programmers often use data structures such as Hashtables or ArrayLists to keep in memory the data that their applications read frequently.
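To make the idea concrete, here is a minimal sketch of the kind of in-process cache developers often build on top of a plain map (it uses Java 8 or later, newer than the JDK 6/7 targeted in this chapter). The names LocalCache and LocalCacheDemo, and the slowSquare() method that simulates an expensive lookup, are hypothetical and chosen only for illustration; this is not Infinispan code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// A minimal in-process cache: each value is computed once, then served from memory.
class LocalCache<K, V> {
    private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;                     // the expensive computation or lookup
    private final AtomicInteger loads = new AtomicInteger(); // how many times the loader ran

    LocalCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent invokes the loader only if the key is not cached yet
        return store.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }
}

class LocalCacheDemo {
    // Hypothetical stand-in for a slow database query or remote call.
    static long slowSquare(int n) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (long) n * n;
    }

    public static void main(String[] args) {
        Function<Integer, Long> loader = LocalCacheDemo::slowSquare;
        LocalCache<Integer, Long> cache = new LocalCache<>(loader);
        System.out.println(cache.get(12));                 // 144, computed (slow)
        System.out.println(cache.get(12));                 // 144, served from memory
        System.out.println("loads: " + cache.loadCount()); // loads: 1
    }
}
```

Such a cache lives inside a single JVM: it has no expiration, no eviction, and no way to share entries with other processes, which are precisely the gaps that a caching standard and a data grid set out to fill.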
However, the collections in the java.util packages are often too basic to cache data effectively. To address this, a Java Specification Request, JSR-107 (the Java Temporary Caching API, also known as JCache), has been created to define a standard caching API for the Java platform.
Its primary interface is javax.cache.Cache, which is similar to java.util.concurrent.ConcurrentMap, with some modifications for distributed environments. In particular, it adds the ability to register, deregister, and list event listeners, and it defines a CacheLoader interface for loading/storing cached data. Cache instances can be retrieved from an appropriate CacheManager, which represents a collection of caches.
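The three ideas in this paragraph (map-like access, listener registration and deregistration, and a loader consulted on cache misses) can be sketched over a ConcurrentHashMap (Java 8 or later). This is a hypothetical illustration, not the javax.cache API: the names PutListener, ListeningCache, registerListener, and deregisterListener are invented here, and the real JSR-107 interfaces have different signatures.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

// Hypothetical callback type (not javax.cache.event.CacheEntryListener).
interface PutListener<K, V> {
    void entryPut(K key, V value);
}

// Hypothetical sketch of three JSR-107 ideas: map-like access, listener
// registration/deregistration, and a loader consulted on cache misses.
class ListeningCache<K, V> {
    private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();
    private final List<PutListener<K, V>> listeners = new CopyOnWriteArrayList<>();
    private final Function<K, V> loader; // plays the role of a CacheLoader

    ListeningCache(Function<K, V> loader) {
        this.loader = loader;
    }

    void registerListener(PutListener<K, V> listener) {
        listeners.add(listener);
    }

    void deregisterListener(PutListener<K, V> listener) {
        listeners.remove(listener);
    }

    int listenerCount() {
        return listeners.size();
    }

    void put(K key, V value) {
        data.put(key, value);
        for (PutListener<K, V> l : listeners) {
            l.entryPut(key, value); // notify every registered listener
        }
    }

    V get(K key) {
        // Read-through: on a miss, the loader supplies the value, which is then cached
        return data.computeIfAbsent(key, loader);
    }
}
```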
Although JSR-107 defines some standard APIs for storing and managing data in a local or distributed cache across several nodes, certain APIs specific to a distributed data structure are missing. For example, there is no Future-based asynchronous API.
The JSR-107 specification does not define a mechanism to configure caches. As such, implementations have their own proprietary configuration mechanisms.
With Infinispan, you can either use its simple XML configuration file or define the configuration programmatically. Chapter 3, Introducing Infinispan Configuration, and Chapter 4, Developing Advanced Configurations, describe in detail how to configure Infinispan, both in a standalone environment and in a cluster.
As far as data replication is concerned, a data grid should support both full and partial replication of data, in both synchronous and asynchronous modes.
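The difference between synchronous and asynchronous replication can be illustrated with a toy model (Java 8 or later) in which a thread pool stands in for the network and plain maps stand in for backup nodes. ReplicatingStore and its methods are hypothetical names; in Infinispan, the replication mode is chosen through configuration rather than hand-coded like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model: a primary copy plus backup copies; a thread pool stands in for the network.
class ReplicatingStore {
    private final Map<String, String> primary = new ConcurrentHashMap<>();
    private final List<Map<String, String>> backups = new ArrayList<>();
    private final ExecutorService network = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // do not keep the JVM alive
        return t;
    });

    ReplicatingStore(int backupCount) {
        for (int i = 0; i < backupCount; i++) {
            backups.add(new ConcurrentHashMap<>());
        }
    }

    // Synchronous replication: return only after every backup has applied the write.
    void putSync(String key, String value) {
        primary.put(key, value);
        awaitAll(sendToBackups(key, value));
    }

    // Asynchronous replication: return at once; backups apply the write later.
    List<Future<?>> putAsync(String key, String value) {
        primary.put(key, value);
        return sendToBackups(key, value);
    }

    String readBackup(int index, String key) {
        return backups.get(index).get(key);
    }

    void shutdown() {
        network.shutdown();
    }

    private List<Future<?>> sendToBackups(String key, String value) {
        List<Future<?>> acks = new ArrayList<>();
        for (Map<String, String> backup : backups) {
            acks.add(network.submit(() -> backup.put(key, value)));
        }
        return acks;
    }

    // Blocks until every pending backup write has been acknowledged.
    static void awaitAll(List<Future<?>> futures) {
        try {
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for backups", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a backup failed to apply the write", e.getCause());
        }
    }
}
```

The synchronous variant pays the latency of the slowest backup on every write, but all copies agree when the call returns; the asynchronous variant returns sooner, at the cost of a window in which a backup may still hold the old value.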
In a fully replicated mode (known simply as replicated in Infinispan), all nodes in a cluster hold copies of all entries (if an entry exists on one node, it will also exist on all the other nodes). In a partially replicated mode (known as distributed in Infinispan), a fixed number of copies are maintained to provide redundancy and fault tolerance, regardless of cluster size. This is typically far fewer than the number of nodes in the cluster. A partially replicated data grid provides a far greater degree of scalability than a fully replicated one. It is thus the recommended clustering mode in Infinispan.
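The scalability argument comes down to how many copies of each entry exist. The following simplified sketch (Java 8 or later) picks numOwners consecutive nodes starting from the key's hash. It is a stand-in for illustration only, not Infinispan's actual consistent-hashing algorithm, and OwnerSelector is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class OwnerSelector {
    // Simplified placement: start at hash(key) mod N and take numOwners consecutive nodes.
    static List<String> owners(Object key, List<String> nodes, int numOwners) {
        int copies = Math.min(numOwners, nodes.size());
        int start = Math.floorMod(key.hashCode(), nodes.size());
        List<String> result = new ArrayList<>(copies);
        for (int i = 0; i < copies; i++) {
            result.add(nodes.get((start + i) % nodes.size()));
        }
        return result;
    }

    static List<String> cluster(int size) {
        List<String> nodes = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            nodes.add("node-" + i);
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Replicated: copies grow with the cluster. Distributed: copies stay at numOwners.
        for (int size : new int[] {4, 16, 64}) {
            List<String> nodes = cluster(size);
            System.out.println(size + " nodes -> replicated copies: " + nodes.size()
                    + ", distributed copies: " + owners("ticket-42", nodes, 2).size());
        }
    }
}
```

With two owners, each node added to the cluster takes a share of the entries instead of another full copy of them, so total capacity grows with the cluster. Note that this modulo scheme would relocate most keys whenever the cluster size changes, which is why real data grids rely on consistent hashing instead.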
Finally, invalidation is a clustered mode that does not actually share any data at all, but simply aims to remove data that may be stale from remote caches (see Chapter 4, Developing Advanced Configurations, for a detailed discussion about cache modes).
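Invalidation can be sketched in the same toy style: a write on one node ships no data to the others, only a message telling them to drop their copy. This model (Java 8 or later) uses hypothetical names; cacheLocally() simulates a node caching a value it has just read from a shared store, such as a database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy cluster in invalidation mode (single-threaded, for illustration only).
class InvalidationCluster {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    InvalidationCluster(int size) {
        for (int i = 0; i < size; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // A node caches a value it has just read from the shared store; nothing is broadcast.
    void cacheLocally(int node, String key, String value) {
        nodes.get(node).put(key, value);
    }

    // A write updates the local copy and invalidates every other copy: no data is shipped.
    void put(int node, String key, String value) {
        nodes.get(node).put(key, value);
        for (int i = 0; i < nodes.size(); i++) {
            if (i != node) {
                nodes.get(i).remove(key);
            }
        }
    }

    // Returns null on a miss; a real application would then reload from the shared store.
    String getLocal(int node, String key) {
        return nodes.get(node).get(key);
    }
}
```

This is why invalidation mode makes sense mainly when every node can reload missing entries from a shared data source.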
Infinispan also lets you add and remove cache entries asynchronously through its non-blocking API, which combines the advantages of non-blocking calls with the ability to handle communication failures and exceptions.
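Infinispan's Cache provides asynchronous variants of its write operations, such as putAsync() and removeAsync(), which return a Future. The sketch below illustrates the same idea with the JDK's CompletableFuture (Java 8 or later) rather than Infinispan's own types; AsyncMapCache and describe() are hypothetical names, and the JDK's common ForkJoinPool stands in for network I/O.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a Future-based cache API; the common ForkJoinPool stands in for network I/O.
class AsyncMapCache<K, V> {
    private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();

    // Returns immediately; the future completes with the previous value (or null).
    CompletableFuture<V> putAsync(K key, V value) {
        return CompletableFuture.supplyAsync(() -> data.put(key, value));
    }

    // Returns immediately; the future completes with the removed value (or null).
    CompletableFuture<V> removeAsync(K key) {
        return CompletableFuture.supplyAsync(() -> data.remove(key));
    }

    // Turns the outcome of an operation into a message instead of letting the exception escape.
    static String describe(Object previous, Throwable error) {
        if (error == null) {
            return "ok";
        }
        Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                ? error.getCause() : error;
        return "failed: " + cause.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        AsyncMapCache<String, String> cache = new AsyncMapCache<>();
        CompletableFuture<String> pending = cache.putAsync("ticket-1", "sold");
        // ... the caller is free to do other work while the write completes ...
        System.out.println(pending.handle(AsyncMapCache::describe).join()); // ok
        // ConcurrentHashMap rejects null keys, so this operation fails asynchronously
        System.out.println(cache.putAsync(null, "x").handle(AsyncMapCache::describe).join());
    }
}
```

The caller never blocks on the write itself; it decides later whether to wait (join()) or to react to success or failure (handle()), which is the combination of non-blocking calls and error handling described above.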
In conclusion, Infinispan exposes a JSR-107-compatible cache interface in which you can store data, and enhances it by providing additional APIs and features.
Having introduced the basics of Infinispan, we will now get our feet wet by installing the Infinispan platform.
The first mandatory requirement is to install a JDK 1.6/JDK 1.7 environment. The Java SE download site can be located at http://www.oracle.com/technetwork/java/javase/downloads/index.html.
Choose to download either Java SE 6 or Java SE 7, and install it. If you don't know how to install it, please take a look at the following link:
The examples contained in this book can be executed from within any development environment of your choice. We will not cover the steps required to install these tools, which in most cases require as little as following a guided wizard procedure.
On the other hand, we would like to describe the installation of Apache Maven, a popular software build and release tool. By using Maven, you will enjoy:
A standard structure for all your projects
A centralized and automatic management of dependencies
Maven is distributed in several formats, for your convenience, and can be downloaded from http://maven.apache.org/download.html.
Once the download is complete, unzip the distribution archive (for example, apache-maven-3.0.4-bin.zip) to the directory in which you wish to install Maven 3.0.4 (or the latest available version).
Once done, add the M2_HOME environment variable to your system, so that it points to the folder where Maven has been unpacked.
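Next, update the PATH environment variable by adding the Maven binaries to your system path. For example, on the Windows platform, you should include %M2_HOME%\bin to make Maven available from the command line (on Linux or Mac OS X, the equivalent is $M2_HOME/bin). You can then verify the installation by opening a new command prompt and running mvn --version, which should print output similar to the following: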
```
mvn --version
Apache Maven 3.0.4 (r1075438; 2011-02-28 18:31:09+0100)
Maven home: C:\apache-maven-3.0.4\bin\..
Java version: 1.6.0, vendor: Sun Microsystems Inc.
Java home: C:\Programmi\Java\jdk1.6.0\jre
Default locale: it_IT, platform encoding: Cp1252
OS name: "windows xp", version: "5.1", arch: "x86", family: "windows"
```
The Infinispan platform can be freely downloaded from the JBoss community site, http://www.jboss.org/infinispan/downloads.html. In this book, we will target the 5.1 release, named Brahma. Once unpacked, the distribution has the following structure:
```
+---bin
+---doc
+---etc
|   +---config-samples
+---lib
+---licenses
+---modules
|   +---cachestores
|   +---demos
|   +---hotrod
|   +---hotrod-client
|   +---lucene-directory
|   +---memcached
|   +---query
|   +---rhq-plugin
|   +---spring
|   +---tree
|   +---websocket
infinispan-core.jar
```
The bin folder contains a few scripts that can be used to manage Infinispan. The most interesting ones are:
startServer.bat/startServer.sh, which can be used to start Infinispan in a standalone JVM process (more about that in the next chapter).
importConfig.bat/importConfig.sh, which can be used to migrate JBoss Cache configurations into the Infinispan configuration file.
runGuiDemo.bat/runGuiDemo.sh, which can be used to test Infinispan's cache using a Java Swing-based demo. We will use it in the next section of this chapter to test the Infinispan installation.
The lib folder contains some additional libraries that need to be on your classpath (or packaged with your deployment), along with the main library, infinispan-core.jar, which is located at the root of the Infinispan archive.
The modules directory contains a number of optional modules (such as the Infinispan query module or the REST interface). To use one of them, you need to add the module's JAR file and all of its dependencies (modules/MODULE_NAME/lib) to your classpath, in addition to the Infinispan JARs mentioned earlier.
The licenses directory contains the licenses for some of the other libraries shipped with the distribution and not covered by the LGPL-2.1 license, such as Apache's Lucene libraries.
Most Infinispan functionality is provided by the core library (infinispan-core.jar), which is placed at the root of the distribution. Infinispan, however, has a highly extensible architecture that makes it easy to plug in additional extensions. By default, the Infinispan distribution ships with a set of additional modules, which are contained in the modules folder of the distribution.
The hotrod folder contains a server endpoint for Hot Rod, Infinispan's own wire protocol, while the matching Java client library is in the hotrod-client folder. The Hot Rod protocol is designed to enable faster client/server interactions, and it also allows clients to make more intelligent decisions with regard to load balancing, failover, and even data location. The next chapter shows an example of a remote client connecting to an Infinispan server using the Hot Rod protocol.
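The "intelligent decisions" a topology-aware client can make are easier to picture with a sketch. The following hypothetical router (Java 8 or later) knows the server list, sends each key to the server derived from the key's hash, and fails over to the next server when the preferred one is down. It is a simplified illustration, not the Hot Rod client's actual routing logic; the class name and the server addresses used in the usage example are assumptions (11222 is used there only as a label).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical client-side router: it knows the server list, sends each key to
// the server responsible for it, and fails over to the next server if needed.
class TopologyAwareRouter {
    private final List<String> servers;

    TopologyAwareRouter(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers);
    }

    String route(Object key, Predicate<String> isAlive) {
        // Data location: the preferred server is derived from the key itself,
        // which also spreads different keys across servers (load balancing)
        int start = Math.floorMod(key.hashCode(), servers.size());
        for (int i = 0; i < servers.size(); i++) {
            String candidate = servers.get((start + i) % servers.size());
            if (isAlive.test(candidate)) {
                return candidate; // failover: skip servers that are down
            }
        }
        throw new IllegalStateException("no server available");
    }
}
```

For example, with servers "server-a:11222", "server-b:11222", and "server-c:11222", route("ticket-42", s -> true) always returns the same server, and marking that server as down makes the router pick one of the other two. A client that only knows a single address would instead have to send every request there and let the server forward it.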
Another server endpoint is found in the memcached directory. It allows memcached clients to talk to one or more Infinispan servers using the memcached wire protocol.
The memcached wire protocol was developed to connect to memcached, a simple, non-clustered caching daemon. memcached was designed to speed up dynamic, database-driven websites by caching data and objects in memory, reducing the number of times an external data source must be read. It is used by many popular social networking sites, such as Facebook and Twitter.
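To show what speaking the memcached wire protocol involves, here is a sketch (Java 8 or later) that builds requests in memcached's text protocol: a storage command has the form set followed by key, flags, expiry time, and data length in bytes, then the data block on its own line, and the server replies STORED on success. MemcachedText and its methods are hypothetical names. store() needs a running memcached-compatible endpoint; the host and port you pass to it are assumptions to check against your server configuration (11211 is the conventional memcached port).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that speaks a small subset of the memcached text protocol.
class MemcachedText {
    // "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"; <bytes> counts bytes, not characters.
    // Keys must not contain spaces or control characters.
    static String setCommand(String key, String value, int expirySeconds) {
        int length = value.getBytes(StandardCharsets.UTF_8).length;
        return "set " + key + " 0 " + expirySeconds + " " + length + "\r\n" + value + "\r\n";
    }

    // "get <key>\r\n"; a hit is answered with "VALUE <key> <flags> <bytes>", the data, then "END".
    static String getCommand(String key) {
        return "get " + key + "\r\n";
    }

    // Sends a set command and returns the server's status line, such as "STORED".
    static String store(String host, int port, String key, String value) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(setCommand(key, value, 0).getBytes(StandardCharsets.UTF_8));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            return in.readLine();
        }
    }
}
```

Because the protocol is plain text over TCP, any existing memcached client library can issue these same commands against Infinispan's memcached endpoint without knowing that a data grid sits behind it.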
The lucene-directory folder contains a highly scalable Apache Lucene Directory implementation that can be used to share indexes reliably across the cluster.
The query module, which also uses Apache Lucene in combination with Hibernate Search, adds querying capabilities to Infinispan. It allows users to search stored data without knowing a specific key, so you can look up data based on its attributes (for example, all the tickets sold in one country).
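Conceptually, a query selects values by their attributes rather than by key. The following sketch (Java 8 or later) does this with a linear scan over the cached values; the Ticket and TicketSearch classes are hypothetical, and the real query module avoids the full scan by maintaining Lucene indexes through Hibernate Search.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical value type stored in the cache under its ticket ID.
class Ticket {
    final String id;
    final String country;

    Ticket(String id, String country) {
        this.id = id;
        this.country = country;
    }
}

class TicketSearch {
    // Key lookup: requires the exact key.
    static Ticket byKey(Map<String, Ticket> cache, String ticketId) {
        return cache.get(ticketId);
    }

    // Attribute search: no key needed, but every value is inspected (no index in this sketch).
    static List<Ticket> soldIn(Map<String, Ticket> cache, String country) {
        return cache.values().stream()
                .filter(t -> t.country.equals(country))
                .sorted(Comparator.comparing((Ticket t) -> t.id))
                .collect(Collectors.toList());
    }
}
```

The scan costs time proportional to the number of entries on every query, which is exactly what an index avoids as the data set grows.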
The rhq-plugin directory contains libraries that can be used to manage multiple Infinispan instances with RHQ, the web-based, open source management platform. Thanks to RHQ's agent and auto-discovery capabilities, monitoring both CacheManager and Cache instances is a simple task (see Chapter 5, Monitoring Infinispan, for more information about managing Infinispan with the RHQ agent).
Another addition can be found in the spring folder. The libraries in this folder let you use Infinispan as the cache provider for Spring's cache abstraction, instead of the default implementations that ship with Spring.
The tree folder contains Infinispan's tree API module, which lets clients store data through a hierarchical (tree-like) API. This API is similar to the one provided by JBoss Cache, which makes the tree module a good fit for users who want to migrate their applications from JBoss Cache to Infinispan.
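A hierarchical API addresses data by a path of nodes rather than by a single key. The following sketch (Java 8 or later) models this on top of a flat map from path strings to attribute maps; PathTree and its methods are hypothetical and do not reproduce the signatures of Infinispan's tree module.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tree-structured store: each node is addressed by a path such as
// "/europe/italy" and holds its own key/value attributes.
class PathTree {
    private final Map<String, Map<String, Object>> nodes = new ConcurrentHashMap<>();

    void put(String path, String key, Object value) {
        nodes.computeIfAbsent(normalize(path), p -> new ConcurrentHashMap<>()).put(key, value);
    }

    Object get(String path, String key) {
        Map<String, Object> attributes = nodes.get(normalize(path));
        return attributes == null ? null : attributes.get(key);
    }

    // Names of the direct children of a node, derived from the stored paths.
    Set<String> children(String path) {
        String parent = normalize(path);
        String prefix = parent.equals("/") ? "/" : parent + "/";
        Set<String> result = new TreeSet<>();
        for (String p : nodes.keySet()) {
            if (p.startsWith(prefix) && p.length() > prefix.length()) {
                String rest = p.substring(prefix.length());
                int slash = rest.indexOf('/');
                result.add(slash < 0 ? rest : rest.substring(0, slash));
            }
        }
        return result;
    }

    // Removes a node together with its whole subtree.
    void removeNode(String path) {
        String target = normalize(path);
        if (target.equals("/")) {
            nodes.clear();
            return;
        }
        nodes.keySet().removeIf(p -> p.equals(target) || p.startsWith(target + "/"));
    }

    // Ensures a leading slash and strips a trailing one ("asia/japan/" becomes "/asia/japan").
    private static String normalize(String path) {
        String p = path.startsWith("/") ? path : "/" + path;
        return (p.length() > 1 && p.endsWith("/")) ? p.substring(0, p.length() - 1) : p;
    }
}
```

Operations such as removing a whole subtree in one call are what distinguish a tree API from a flat key/value map, and they are the kind of operation that JBoss Cache users relied on.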
If you feel more adventurous, you can even develop your own extensions to let you extend Infinispan beyond its core use case. See this resource for more information about it:
As we mentioned, Infinispan ships with a GUI demo that can be used to test its basic caching functionalities, either as a standalone server or as a clustered server. We will use it to test our environment without the need to write code:
1. Move into the bin folder and execute the runGuiDemo.bat (Windows) or runGuiDemo.sh (Unix) script.
2. Once launched, the GUI will display the following frame:
4. Move to the Manipulate data tab. In this tab, you can perform CRUD operations on the cache, including generating bulk random inserts. In the following example, we are adding a sample entry in the cache by filling the Key and Value textboxes and hitting the Go button:
The Infinispan GUI demo application is a simple but effective example application that starts a standalone Infinispan cache and is also able to distribute the cache data across a cluster of JVMs. You can check out the cluster functionalities by starting several demo GUIs and verifying that data is distributed across the other JVMs.
In this chapter, we have described the requirements for highly available, fast access to data in today's enterprise-class systems, and the common solution to them: in-memory data grids.
Although JSR-107 sets some standards for distributed cached data, it is likely that Enterprise services will need a much more complete approach, often resorting to vendor-specific extensions that reside outside the standard JSR-107 APIs.
Infinispan already provides everything JSR-107 requires, and much more, through its core library and the additional modules contained in the modules folder of the platform distribution.
Next, we covered how to install Infinispan and the tools (the Java SE JDK and Maven) needed to develop applications with this platform.
In the next chapter, we will familiarize ourselves with the Infinispan API by running our first code samples, which will grow more complex as we progress through the rest of this book.