IBM WebSphere eXtreme Scale 6

By Anthony Chaves
About this book
A data grid is a means of combining computing resources. Data grids provide a way to distribute object storage and add capacity on demand in the form of CPU, memory, and network resources from additional servers. All three resource types play an important role in how fast data can be processed, and how much data can be processed at once. WebSphere eXtreme Scale provides a solution to scalability issues through caching and grid technology. Working with a data grid requires new approaches to writing highly scalable software; this book covers both the practical eXtreme Scale libraries and design patterns that will help you build scalable software. Starting with a blank slate, this book assumes you don't have experience with IBM WebSphere eXtreme Scale. It is a tutorial-style guide detailing the installation of WebSphere eXtreme Scale right through to using the developer libraries. It covers installation and configuration, and discusses the reasons why a data grid is a viable middleware layer. It also covers many different ways of interacting with objects in eXtreme Scale. It will also show you how to use eXtreme Scale in new projects, and integrate it with relational databases and existing applications. This book covers the ObjectMap, Entity, and Query APIs for interacting with objects in the grid. It shows client/server configurations and interactions, as well as the powerful DataGrid API. DataGrid allows us to send code into the grid, which can be run where the data lives. Equally important are the design patterns that go alongside using a data grid. This book covers the major concepts you need to know that prevent your client application from becoming a performance bottleneck. By the end of the book, you'll be able to write software using the eXtreme Scale APIs, and take advantage of a linearly scalable middleware layer.
Publication date:
November 2009
Publisher
Packt
Pages
292
ISBN
9781847197443

 

Chapter 1. What is a Data Grid

We have many software packages which make up the so-called "middleware" layer. Application servers, message brokers, enterprise service buses, and caching packages are examples of this middleware layer that powers an application. The last few years have seen the introduction of more powerful caching solutions that can also execute code on objects stored in the cache. The combination of a shared cache and executable code spread over many processes is a data grid.

Caching data is an important factor in making an application feel more responsive, or finish a request more quickly. As we favor horizontal scale-out more, we have many different processes sharing the same source data. In order to increase processing speed, we cache data in each process. This leads to data duplication. Sharing a cache between processes lets us cache a larger data set versus duplicating cached data in each process. A common example of a shared cache program is the popular Memcached. A shared cache moves the cache out of the main application process and into a dedicated process for caching. However, we trade speed of access for the ability to cache a larger data set. This trade is acceptable when working with larger data sets.

Typically, our applications pull data from a data source such as a relational database and perform some operations on it. When we're done, we write the changes back to the data source. Moving data between the data source and the point where we execute code is costly, especially when operating on a large data set. Typically, our compiled code is much smaller than the data we move. Rather than pulling data to our code, a data grid lets us push our code to the data. Co-locating our code and data by moving code to data is another important feature of a data grid.
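The difference between pulling data to the code and pushing code to the data can be sketched in plain Java. This is only an illustration, not the eXtreme Scale API: the class DataNodeSketch, its methods, and the itemsShipped counter are hypothetical names that stand in for a remote process holding part of the data and for the amount of data crossing the network.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a remote process that owns a slice of the data.
class DataNodeSketch {
    private final List<Integer> values = new ArrayList<>();
    // Counts how many items would have crossed the network (an assumption for illustration).
    int itemsShipped = 0;

    DataNodeSketch(List<Integer> initial) {
        values.addAll(initial);
    }

    // Pull model: every item is copied to the caller, which then does the work.
    List<Integer> fetchAll() {
        itemsShipped += values.size();
        return new ArrayList<>(values);
    }

    // Push model: the caller's function runs where the data lives,
    // and only the single result travels back.
    <R> R execute(Function<List<Integer>, R> task) {
        itemsShipped += 1;
        return task.apply(Collections.unmodifiableList(values));
    }
}
```

With four stored values, summing them through fetchAll() ships four items, while summing them through execute() ships only the one result. The gap grows with the size of the data set.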

Because of their distributed nature, data grids allow near-linear horizontal scalability. Adding more hardware to a data grid lets it service more clients without diminishing returns. Additional hardware also lets us have redundancy for our cached data. Ease of scalability and data availability are two major benefits of using data grids.

A shared cache and a container to execute application code are just two factors which make up a data grid. We'll cover those features most extensively in this book. There are several different data grid platforms available from major vendors. IBM is one of those vendors, and we'll use IBM WebSphere eXtreme Scale in this book. We will cover the major features of eXtreme Scale, including the APIs used to interact with the object cache, running code in the grid, and design patterns that help us get the most out of a data grid.

This chapter offers a tutorial on how to get IBM WebSphere eXtreme Scale, configure our development environment to use it, and write a "Hello, world!" type application. After reading this chapter, you will:

  • Understand the uses for a shared cache

  • Set up a development environment with WebSphere eXtreme Scale (WXS)

  • Write and understand a sample WXS application that uses the ObjectMap API

Data grid basics

One part of a data grid is the object cache. An object cache stores the serialized form of Java objects in memory. This is an alternative to the more common approach of using a relational database for storage. A relational database stores data in tabular form, and needs object-relational mapping to turn objects into tuples and back again. An object cache only deals with Java objects and requires no mapping to use. A class must be serializable, though.
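As a minimal sketch of what "the serialized form of Java objects" means, the following uses standard Java serialization to turn an object into bytes and back. The Payment class and the SerializationSketch helper are hypothetical names chosen for illustration, and the sketch does not claim to show how eXtreme Scale stores objects internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical value class; implementing Serializable is the only requirement.
class Payment implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final long amountInCents;

    Payment(String id, long amountInCents) {
        this.id = id;
        this.amountInCents = amountInCents;
    }
}

class SerializationSketch {
    // Turn an object into the byte form that a cache could hold.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object from its serialized form; no mapping layer is involved.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }
}
```

Calling fromBytes(toBytes(payment)) yields a new Payment with the same field values, with no table schema or mapping file involved.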

Caching objects is done using key/value tables that look like a hash table data structure. In eXtreme Scale terminology, this hash table data structure is a class that implements the com.ibm.websphere.objectgrid.BackingMap interface. A BackingMap can work like a simple java.util.Map, used within one application process. It can also be partitioned across many dedicated eXtreme Scale processes. The APIs for working with an unpartitioned BackingMap and a partitioned BackingMap are the same, which makes learning how to use eXtreme Scale easy. The programming interface is the same whether our application is made up of one process or many.

Using a data grid in our software requires some trade-offs. With the great performance of caching objects in memory, we still need to be aware of the consequences of our decisions. In some cases, we trade faster performance for predictable scalability. One of the most important factors driving data grid adoption is predictable scalability in working with growing data sets and more simultaneous client applications.

An important feature of data grids that separates them from simple caches is database integration. Even though the object cache part of a data grid can be used as primary storage, it's often useful to integrate with a relational database. One reason we want to do this is that reporting tools based on RDBMSs are far more capable than reporting tools for data grids today. This may change in the coming years, but right now, we use reporting tools tied to a database.

WXS uses Loaders to integrate with databases. Though not limited to databases, Loaders are most commonly used to integrate with a database. A Loader can take an object in the object cache and call an existing ORM framework that transforms an object and saves it to a database. Using a Loader makes saving an object to a database transparent to the data grid client. When the client puts the object into the object cache, the Loader pushes the object through the ORM framework behind the scenes. From the client's point of view, it only ever writes to the cache and no longer talks to the database directly.

Using a Loader can make the object cache the primary point of object read/write operations in an application. This greatly reduces the load on a database server by making the cache act as a shock absorber. Finding an object is as simple as looking it up in the cache. If it's not there, then the Loader looks for it in the database. Writing objects to the cache may not touch the database in the course of the transaction. Instead, a Loader can store updated objects and then batch update the database after a certain period of time or after a certain number of objects are written to the cache. Adding a data grid between an application and database can help the database serve more clients when those clients are eXtreme Scale clients, since the load is not directly on the database server.

This topology is in contrast to one where the database is used directly by client applications. In that topology, the limiting factor in the number of simultaneous clients is the database.

Applications can start up, load a grid full of data, and then shut down while the data in the grid remains there for use by another application. Applications can put objects in the grid for caching purposes and remove them upon application completion. Or, the application can leave them and those objects will far outlive the process that placed them in the grid.
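Returning to the Loader behavior described earlier, the idea of reading through the cache and batching writes to the database can be sketched in plain Java. This is not the WXS Loader interface. WriteBehindCacheSketch, its batchSize parameter, and the use of a Map as a stand-in database are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache in front of a slower store; not the WXS Loader interface.
class WriteBehindCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> pendingWrites = new LinkedHashMap<>();
    private final Map<String, String> database; // stand-in for a relational database
    private final int batchSize;
    int databaseReads = 0;
    int databaseBatches = 0;

    WriteBehindCacheSketch(Map<String, String> database, int batchSize) {
        this.database = database;
        this.batchSize = batchSize;
    }

    // Read-through: look in the cache first, fall back to the database on a miss.
    String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            databaseReads++;
            value = database.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // Write-behind: update the cache now and queue the database write for later.
    void put(String key, String value) {
        cache.put(key, value);
        pendingWrites.put(key, value);
        if (pendingWrites.size() >= batchSize) {
            flush();
        }
    }

    // Push all queued writes to the database as one batch.
    void flush() {
        if (pendingWrites.isEmpty()) {
            return;
        }
        database.putAll(pendingWrites);
        pendingWrites.clear();
        databaseBatches++;
    }
}
```

With a batch size of two, the first put touches only the cache, and the second put triggers a single batch write containing both objects. A time-based flush would follow the same shape with a scheduled call to flush().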

Notice how we are dealing with Java objects. Our cache is a key/value store where keys and values are POJOs. In contrast, a simple cache may limit keys and values to strings. An object in a data grid cache is the serialized form of our Java object. Putting an object from our application into the cache only requires serialization. Mapping to a data grid specific type is not required, nor does the object require a transform layer. Getting an object out of the cache is just as easy. An object need only be deserialized once in the client application process. It is ready for use upon deserialization and does not require any transformation or mapping before use. This is in contrast to persisting an object by using an ORM framework where the framework generates a series of SQL queries in order to save or load the object state. By storing our objects in the grid, we also free ourselves from calling our ORM to save the objects to the database if we choose. We can use the data grid cache as our primary data store or we can take advantage of the database integration features of eXtreme Scale and have the grid write our objects to the database for us.

Data grids typically don't use hard disks or tapes for storing objects. Instead, they store objects in memory, which may seem obvious based on the name in-memory data grid. Storing objects in memory has the advantage of keeping objects in a location with much lower access time compared to disk-based storage. A network hop to connect to a database is going to take the same amount of time as a network hop to a data grid instance. However, storing or retrieving data on the remote grid server is much faster than the equivalent operation on a database due to the nature of the storage medium. A network hop is required in a distributed deployment. This means that an object isn't initially available in the same address space where it will be used. This is one of those trade-offs mentioned earlier. We trade initial locality of reference for predictable performance over a large data set. What works for caching small data sets may not be a good idea when caching large data sets.

Though the access time of storing objects in memory is an advantage over a database hit, it's hardly a new concept. Developers have been creating in-memory caches for a long time. Looking at a single-threaded application, we may have the cache implemented as a simple hash-map. Examples of things we might cache are objects that result from CPU-intensive calculations. By caching the result, we save ourselves the trouble of recalculating it the next time it is needed. Large amounts of read-only data are another good candidate for caching.
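A hash-map cache of this kind might look like the following sketch. The class name LocalCacheSketch and the sum-of-squares calculation are hypothetical; the calculation simply stands in for any CPU-intensive work whose result is worth keeping.

```java
import java.util.HashMap;
import java.util.Map;

// A single-threaded, in-process cache backed by a plain HashMap.
class LocalCacheSketch {
    private final Map<Integer, Long> cache = new HashMap<>();
    // Counts how often the expensive path actually runs.
    int computations = 0;

    // Stand-in for a CPU-intensive calculation.
    private long expensiveSumOfSquares(int n) {
        computations++;
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Return the cached result if present; otherwise compute and remember it.
    long sumOfSquares(int n) {
        Long cached = cache.get(n);
        if (cached != null) {
            return cached;
        }
        long result = expensiveSumOfSquares(n);
        cache.put(n, result);
        return result;
    }
}
```

Asking for the same input twice runs the calculation only once; the second call is a map lookup in the same address space.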

In a single-threaded application, we have one cache in which to put data. The amount of data that fits in our cache is limited by the amount of JVM heap available to it. Depending on the JVM settings, garbage collection may become an issue if large numbers of objects are frequently removed from the cache and go out of the application scope. However, this typically isn't an issue.

This cache is located in the same address space and thread as the code that operates on objects in the cache. Cached objects are local to the application, and accessing them is about as fast as we can get. This works great for data sets that fit in the available heap space, and when no other processes need to access these cached objects.

Building multi-threaded applications changed the scope of the cache a little bit. In a single-threaded application, we have one cache per thread. As we introduce more threads, this method will not continue to work for long:

Each cache contains the same key/value pairs as the other caches.

As each of our N threads has its own cache, the JVM heap size must now be shared among N caches. The most prominent problem with this method of caching is that data will be duplicated in multiple caches. Loading data into one cache will not load it into the others. Depending on the eviction policy, we could end up with a cache hit rate that is close to 0 percent over time. Rather than maintaining a cache per thread, developers started to use a singleton cache:

The singleton cache is protected by the Singleton design pattern in Java. The Singleton design pattern (almost) assures us that only one instance of a particular object exists in the JVM, and it also provides us with a canonical reference to that object. In this way, we can create a hash-map to act as our cache if one doesn't already exist, and then get that same instance every time we look for it. With one cache, we won't duplicate data in multiple caches and each thread has access to all of the data in the cache.
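One common way to write such a singleton cache is sketched below. The class name SingletonCacheSketch is hypothetical, and the synchronized methods are one simple choice for thread safety among several; the sketch creates the instance lazily, on first request, to match the "if one doesn't already exist" behavior described above.

```java
import java.util.HashMap;
import java.util.Map;

// One cache instance per JVM (per class loader), shared by every thread.
class SingletonCacheSketch {
    private static SingletonCacheSketch instance;
    private final Map<String, Object> entries = new HashMap<>();

    // A private constructor prevents other classes from creating more instances.
    private SingletonCacheSketch() {
    }

    // Create the cache on first use, then hand back the same instance every time.
    static synchronized SingletonCacheSketch getInstance() {
        if (instance == null) {
            instance = new SingletonCacheSketch();
        }
        return instance;
    }

    // Access to the underlying HashMap is synchronized so threads can share it safely.
    synchronized void put(String key, Object value) {
        entries.put(key, value);
    }

    synchronized Object get(String key) {
        return entries.get(key);
    }
}
```

Any thread that calls SingletonCacheSketch.getInstance() sees the entries put there by any other thread, so data is stored once per JVM rather than once per thread.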

With the introduction of the java.util.concurrent package, developers have safer options available for caching objects between multiple threads. Again, these strategies work best for data sets that fit comfortably in one heap. Running multiple processes of the same application will cache the same data in each process.
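As one example of what java.util.concurrent offers, the sketch below shares a ConcurrentHashMap between threads. It assumes Java 8 or later for computeIfAbsent, which is newer than the Java versions current when this chapter was written. The class name, the load method, and the loads counter are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// A cache shared by many threads without explicit locking in our own code.
class ConcurrentCacheSketch {
    private final ConcurrentHashMap<String, String> entries = new ConcurrentHashMap<>();
    // Counts how many times the slow lookup actually runs.
    final AtomicInteger loads = new AtomicInteger();

    // Stand-in for a slow lookup such as a database query.
    private String load(String key) {
        loads.incrementAndGet();
        return key.toUpperCase();
    }

    // ConcurrentHashMap.computeIfAbsent runs the loader at most once per key,
    // even when several threads ask for the same missing key at once.
    String get(String key) {
        return entries.computeIfAbsent(key, this::load);
    }

    // Helper: read one key from several threads and report how many loads occurred.
    static int loadsAfterConcurrentReads(int threadCount, String key) {
        ConcurrentCacheSketch cache = new ConcurrentCacheSketch();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> cache.get(key));
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return cache.loads.get();
    }
}
```

The cache is still bound to one process. Starting a second copy of the application creates a second, independent ConcurrentCacheSketch holding its own copy of the data.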

What if our application continues to scale out to 20 running instances to do the processing for us? We're once again in the position of maintaining multiple caches that contain the same data set (or subset of the same data set). When we have a large data set that does not fit in one heap, our cache hit rate may approach 0 percent over time. Each application instance cache can be thought of as a very small window into the entire data set. As each instance sees only a small portion of the data set, our cache hit rate per instance is lower than an application with a larger window into the data set. Locality of reference for an object most likely requires a database hit to get the object and then cache it. As our locality of reference is already poor, we may want to insert a shared cache to provide a larger window into the data set. Getting an object from an object cache is faster than retrieving the object from a database, provided that object is already cached.

What we really want is an object cache where any thread in any application process can access the data. We need something that looks like this:

A data grid is made up of many different processes running on different servers. These are data grid (eXtreme Scale) processes, not our application processes. For each eXtreme Scale process, we have one more JVM heap available to an object cache. eXtreme Scale handles the hard work of distributing objects across the different data grid processes, making our cache look like one large logical cache, instead of many small caches. This provides the largest window possible into our large data set. Caching more objects is as simple as starting more eXtreme Scale processes on additional servers.

We still have the same number of application instances, but now the cache is not stored inside the application process. It's no longer a hash-map living inside the same JVM alongside the business logic, nor is it stored in an RDBMS. Instead, we have conscripted several computers to donate their memory. This lets us create a distributed cache reachable by any of our application instances. Though the cache is distributed across several computers, there is no data duplication. The data is still stored as a map where the keys are stored across different partitions, such that the data is distributed as evenly as possible.

When using an object cache, the goal is to provide a window as large as possible into a large data set. We want to cache as much data as we can in memory so that any application can access it. We accept that this is slower than caching locally because using a small cache does not produce acceptable cache hit rates. A network hop is a hop, whether it is to connect to a database or data grid. Because both pay the same network overhead, a distributed object cache only needs to be faster than a database at the read and write operations themselves.

Each partition in a distributed object cache holds a subset of the keys that our applications use for objects. No cache partition stores all of the keys. Instead, eXtreme Scale determines which partition to store an object in, based on its key. Again, the hard work is handled by eXtreme Scale. We don't need to have knowledge of which partition an object is stored in, or how to connect to that partition. We interact with the object cache as if it were a java.util.Map and eXtreme Scale handles the rest.
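The idea of routing each key to one partition while presenting a single map-like interface can be sketched as follows. Hashing the key and taking the remainder by the partition count is an assumption chosen for illustration, not a description of how eXtreme Scale actually places keys. PartitionedMapSketch is a hypothetical class, and each partition here is a local HashMap rather than a separate process.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One logical map whose entries are spread over several partitions.
class PartitionedMapSketch {
    private final List<Map<Object, Object>> partitions = new ArrayList<>();

    PartitionedMapSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Choose a partition from the key's hash code; floorMod keeps the index non-negative.
    int partitionFor(Object key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Callers use put and get like a map and never name a partition themselves.
    void put(Object key, Object value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    Object get(Object key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    int sizeOfPartition(int index) {
        return partitions.get(index).size();
    }
}
```

With four partitions and the Integer keys 0 through 7, each partition ends up holding two keys, and no partition holds them all.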

In-memory data grids can do a lot more than object caching, though that's the use we will explore first. Throughout this book, we will explore additional features that make up a data grid and put them to use in several sample applications.

 

Data grid basics


One part of a data grid is the object cache. An object cache stores the serialized form of Java objects in memory. This approach is an alternative to the most common form of using a relational database for storage. A relational database stores data in column form, and needs object-relational mapping to turn objects into tuples and back again. An object cache only deals with Java objects and requires no mapping to use. A class must be serializeable though.

Caching objects is done using key/value tables that look like a hash table data structure. In eXtreme Scale terminology, this hash table data structure is a class that implements the com.ibm.websphere.objectgrid.BackingMap interface. A BackingMap can work like a simple java.util.Map, used within one application process. It can also be partitioned across many dedicated eXtreme Scale processes. The APIs for working with an unpartitioned BackingMap and a partitioned BackingMap are the same, which makes learning how to use eXtreme Scale easy. The programming interface is the same whether our application is made up of one process or many.

Using a data grid in our software requires some trade-offs. With the great performance of caching objects in memory, we still need to be aware of the consequences of our decisions. In some cases, we trade faster performance for predictable scalability. One of the most important factors driving data grid adoption is predictable scalability in working with growing data sets and more simultaneous client applications.

An important feature of data grids that separates them from simple caches is database integration. Even though the object cache part of a data grid can be used as primary storage, it's often useful to integrate with a relational database. One reason we want to do this is that reporting tools based on RDBMS's are far more capable than reporting tools for data grids today. This may change in the coming years, but right now, we use reporting tools tied in to a database.

WXS uses Loaders to integrate with databases. Though not limited to databases, Loaders are most commonly used to integrate with a database. A Loader can take an object in the object cache and call an existing ORM framework that transforms an object and saves it to a database. Using a Loader makes saving an object to a database transparent to the data grid client. When the client puts the object into the object cache, the Loader pushes the object through the ORM framework behind the scenes. If you are writing to the cache, then the database is a thing of the past.

Using a Loader can make the object cache the primary point of object read/write operations in an application. This greatly reduces the load on a database server by making the cache act as a shock absorber. Finding an object is as simple as looking it up in the cache. If it's not there, then the Loader looks for it in the database. Writing objects to the cache may not touch the database in the course of the transaction. Instead, a Loader can store updated objects and then batch update the database after a certain period of time or after certain number of objects are written to the cache. Adding a data grid between an application and database can help the database serve more clients when those clients are eXtreme Scale clients since the load is not directly on the database server:

This topology is in contrast to one where the database is used directly by client applications. In the following topology the limiting factor in the number of simultaneous clients is the database.

Applications can start up, load a grid full of data, and then shut down while the data in the grid remains there for use by another application. Applications can put objects in the grid for caching purposes and remove them upon application completion. Or, the application can leave them and those objects will far outlive the process that placed them in the grid.

Notice how we are dealing with Java objects. Our cache is a key/value store where keys and values are POJOs. In contrast, a simple cache may limit keys and values to strings. An object in a data grid cache is the serialized form of our Java object. Putting an object from our application into the cache only requires serialization. Mapping to a data grid specific type is not required, nor does the object require a transform layer. Getting an object out of the cache is just as easy. An object need only be deserialized once in the client application process. It is ready for use upon deserialization and does not require any transformation or mapping before use. This is in contrast to persisting an object by using an ORM framework where the framework generates a series of SQL queries in order to save or load the object state. By storing our objects in the grid, we also free ourselves from calling our ORM to save the objects to the database if we choose. We can use the data grid cache as our primary data store or we can take advantage of the database integration features of eXtreme Scale and have the grid write our objects to the database for us.

Data grids typically don't use hard disks or tapes for storing objects. Instead, they store objects in the memory, which may seem obvious based on the name in-memory data grid. Storing objects in the memory has the advantage of keeping objects in a location with much lower access time compared to physical storage. A network hop to connect to a database is going to take the same amount of time as a network hop to a data grid instance. The remote server storing or retrieving of the data from the grid is much faster than the equivalent operation on a database due to the nature of the storage medium. A network hop is required in a distributed deployment. This means that an object isn't initially available in the same address space where it will be used. This is one of those trade-offs mentioned earlier. We trade initial locality of reference for predictable performance over a large data set. What works for caching small data sets may not be a good idea when caching large data sets.

Though the access time of storing objects in memory is an advantage over a database hit, it's hardly a new concept. Developers have been creating in-memory caches for a long time. Looking at a single-threaded application, we may have the cache implemented as a simple hash-map (see below). Examples of things we might cache are objects that result from CPU-intensive calculations. By caching the result, we save ourselves the trouble of recalculating it again the next time it is needed. Another good candidate for caching is storing large amounts of read-only data.

In a single-threaded application, we have one cache available to put data. The amount of data that fits in our cache is limited by the amount of JVM heap size available to it. Depending on the JVM settings, garbage collection may become an issue if large numbers of objects are frequently removed from the cache and go out of the application scope. However, this typically isn't an issue.

This cache is located in the same address space and thread as the code that operates on objects in the cache. Cached objects are local to the application, and accessing them is about as fast as we can get. This works great for data sets that fit in the available heap space, and when no other processes need to access these cached objects.

Building multi-threaded applications changed the scope of the cache a little bit. In a single-threaded application, we have one cache per thread. As we introduce more threads, this method will not continue to work for long:

Each cache contains the same key/value pairs as the other caches.

As each of our N threads has its own cache, the JVM heap size must now be shared among N caches. The most prominent problem with this method of caching is that data will be duplicated in multiple caches. Loading data into one cache will not load it into the others. Depending on the eviction policy, we could end up with a cache hit rate that is close to 0 percent over time. Rather than maintaining a cache per thread, developers started to use a singleton cache:

The singleton cache is protected by the Singleton design pattern in Java. The Singleton design pattern (almost) assures us that only one instance of a particular object exists in the JVM, and it also provides us with a canonical reference to that object. In this way, we can create a hash-map to act as our cache if one doesn't already exist, and then get that same instance every time we look for it. With one cache, we won't duplicate data in multiple caches and each thread has access to all of the data in the cache.
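One common way to write such a singleton cache is sketched here. The class name SingletonCache and the holder idiom are choices made for this illustration, not something prescribed by eXtreme Scale:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a singleton cache shared by every thread in one JVM.
final class SingletonCache {
    // Holder idiom: the JVM creates INSTANCE once, lazily, on first access.
    private static final class Holder {
        static final SingletonCache INSTANCE = new SingletonCache();
    }

    private final Map<String, Object> entries = new HashMap<>();

    private SingletonCache() {
        // private constructor: nobody else can create a second cache
    }

    static SingletonCache getInstance() {
        return Holder.INSTANCE;
    }

    // synchronized because a plain HashMap is not safe for concurrent use
    synchronized Object get(String key) {
        return entries.get(key);
    }

    synchronized void put(String key, Object value) {
        entries.put(key, value);
    }

    public static void main(String[] args) {
        SingletonCache.getInstance().put("greeting", "hello");
        System.out.println(SingletonCache.getInstance().get("greeting"));
    }
}
```

Every caller that asks for getInstance() receives the same object, so data loaded by one thread is visible to all the others. The "almost" in the text still applies: a second class loader in the same JVM would get its own instance.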

With the introduction of the java.util.concurrent package, developers have safer options available for caching objects between multiple threads. Again, these strategies work best for data sets that fit comfortably in one heap. Running multiple processes of the same application will cache the same data in each process:

What if our application continues to scale out to 20 running instances to do the processing for us? We're once again in the position of maintaining multiple caches that contain the same data set (or subset of the same data set). When we have a large data set that does not fit in one heap, our cache hit rate may approach 0 percent over time. Each application instance cache can be thought of as a very small window into the entire data set. As each instance sees only a small portion of the data set, our cache hit rate per instance is lower than an application with a larger window into the data set. Locality of reference for an object most likely requires a database hit to get the object and then cache it. As our locality of reference is already poor, we may want to insert a shared cache to provide a larger window into the data set. Getting an object from an object cache is faster than retrieving the object from a database, provided that object is already cached.
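The effect of a small window can be seen in a toy experiment. The sketch below assumes a least-recently-used eviction policy and a worst-case access pattern that reads every key in order, over and over. All names and sizes are invented for the illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a small LRU cache read with a cyclic access pattern.
class SmallWindowDemo {
    // Access-ordered LinkedHashMap that evicts the least recently used entry.
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = order entries by access
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    // Reads keys 0..dataSetSize-1 in order, 'passes' times, and returns the hit count.
    static int countHits(int cacheSize, int dataSetSize, int passes) {
        LruCache<Integer, String> cache = new LruCache<>(cacheSize);
        int hits = 0;
        for (int p = 0; p < passes; p++) {
            for (int key = 0; key < dataSetSize; key++) {
                if (cache.get(key) != null) {
                    hits++;
                } else {
                    cache.put(key, "value-" + key); // stands in for a database read
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("small window hits: " + countHits(100, 1000, 5));
        System.out.println("full window hits:  " + countHits(1000, 1000, 5));
    }
}
```

With room for only 100 of the 1,000 keys, each key is evicted before it is requested again, so the cache never produces a hit. With room for the whole data set, every read after the first pass is a hit. Real workloads fall somewhere in between, but the larger window always helps.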

What we really want is an object cache where any thread in any application process can access the data. We need something that looks like this:

A data grid is made up of many different processes running on different servers. These processes are data grid, or eXtreme Scale processes, not our application processes. For each eXtreme Scale process, we have one more JVM heap available to an object cache. eXtreme Scale handles the hard work of distributing objects across the different data grid processes, making our cache look like one large logical cache, instead of many small caches. This provides the largest window possible into our large data set. Caching more objects is as simple as starting more eXtreme Scale processes on additional servers.

We still have the same number of application instances, but now the cache is not stored inside the application process. It's no longer a hash-map living inside the same JVM alongside the business logic, nor is it stored in an RDBMS. Instead, we have conscripted several computers to donate their memory. This lets us create a distributed cache reachable by any of our application instances. Though the cache is distributed across several computers, there is no data duplication. The data is still stored as a map where the keys are stored across different partitions, such that the data is distributed as evenly as possible.
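To make the idea of keys spread across partitions more concrete, here is a minimal sketch that routes each key to one of several maps by its hash code. This is only an illustration of hash-based routing inside one JVM. It is not eXtreme Scale's actual placement algorithm, and the class PartitionedMapSketch is hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one logical map spread over several partitions.
// NOT eXtreme Scale's placement algorithm, only simple hash-based routing.
class PartitionedMapSketch {
    private final List<Map<Object, Object>> partitions = new ArrayList<>();

    PartitionedMapSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    int partitionFor(Object key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(Object key, Object value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    Object get(Object key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    int sizeOfPartition(int index) {
        return partitions.get(index).size();
    }

    public static void main(String[] args) {
        PartitionedMapSketch map = new PartitionedMapSketch(4);
        for (int id = 0; id < 100; id++) {
            map.put(id, "payment-" + id);
        }
        for (int p = 0; p < 4; p++) {
            System.out.println("partition " + p + " holds " + map.sizeOfPartition(p) + " entries");
        }
    }
}
```

Each key lives in exactly one partition, so nothing is duplicated, and the caller only ever sees put and get. In a real grid, each partition would sit in a different process on a different server.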

When using an object cache, the goal is to provide a window as large as possible into a large data set. We want to cache as much data as we can in memory so that any application can access it. We accept that this is slower than caching locally because using a small cache does not produce acceptable cache hit rates. A network hop is a hop, whether it is to connect to a database or data grid. A distributed object cache needs to be faster than a database for read and write operations, only after paying the overhead of making a network connection.

Each partition in a distributed object cache holds a subset of the keys that our applications use for objects. No cache partition stores all of the keys. Instead, eXtreme Scale determines which partition to store an object in, based on its key. Again, the hard work is handled by eXtreme Scale. We don't need to have knowledge of which partition an object is stored in, or how to connect to that partition. We interact with the object cache as if it were a java.util.Map and eXtreme Scale handles the rest:

In-memory data grids can do a lot more than object caching, though that's the use we will explore first. Throughout this book, we will explore additional features that make up a data grid and put them to use in several sample applications.

 

Getting IBM WebSphere eXtreme Scale


IBM WebSphere eXtreme Scale is an in-memory data grid formerly known by the brand name Object Grid. There are two ways to get eXtreme Scale. First, eXtreme Scale is integrated with certain versions of IBM WebSphere Application Server. If you have a WebSphere Application Server 6.1 (or higher) deployment capable of integrating with WebSphere eXtreme Scale, then you should follow the instructions provided with your WebSphere software. WebSphere Application Server 6.1 contains additional features that are enabled only when WebSphere eXtreme Scale is present.

If you do not have access to WebSphere eXtreme Scale through a WebSphere Application Server 6.1 license, then you can use the standalone edition. The standalone WebSphere eXtreme Scale trial edition is functionally equivalent to the full licensed version. Everything that can be done with the licensed edition can be done with the trial edition. The programming and configuration interfaces are identical. If you develop an application using the trial edition, it can be deployed to the full edition. All of the examples in this book have been tested with the WebSphere eXtreme Scale 6.1.0.4 FIX2 trial edition available as a multi-platform download. You can download the trial edition from IBM developerWorks at http://www.ibm.com/developerworks/downloads/ws/wsdg/. The file you're looking for is named objectgridtrial610.zip.

IBM strongly recommends that you use an IBM JVM for developing and running your WebSphere eXtreme Scale application. In the event that you use a non-IBM JVM, you should manually integrate the IBM Object Request Broker (ORB) with your JVM. Other ORBs might work, but they are not tested by IBM. The Sun JVM ORB does not work as of this writing. Please see http://www.ibm.com/developerworks/wikis/x/niQ for more information. You can download IBM Java developer kits from http://www.ibm.com/developerworks/java/jdk/. I created the examples with the IBM Development Package for Eclipse, though these examples will work with any of the JVMs listed there.

 

Setting up your environment


Unzip the objectgridtrial610.zip into an empty directory. Unzipping the file produces a directory named ObjectGrid. This directory contains everything you need to run local and distributed WebSphere eXtreme Scale instances.

In order to use the Object Grid classes in our first example, we need to add a few JAR files to our Java classpath. If you're using the command line tools, then add the following classpath option to your javac and java commands, while replacing the paths here with the appropriate paths for your environment and operating system:

-cp .;c:\wxs\ObjectGrid\lib\cglib.jar;c:\wxs\ObjectGrid\lib\ogclient.jar

That's all the setup you need for the command line tools at this time. If you're using the Eclipse environment, then we need to add these JAR files to the project build path:

  1. Create a new Java project in Eclipse.

  2. Right-click on the project folder in the package explorer and select Build Path | Configure Build Path.

  3. Open the Libraries tab and click Add External Jars.

  4. Navigate to the ObjectGrid/lib directory and highlight the cglib.jar and ogclient.jar files. Click Open.

  5. Click OK on the Build Path dialog.

We're now ready to work with a short sample to get our feet wet in the WebSphere eXtreme Scale world.

 

Hello, world!


An object cache stores objects as key/value pairs. The first thing we should do is define a class that we want to store in the cache. Let's store credit card payments for now:

import java.io.Serializable;
import java.math.BigDecimal;

public class Payment implements Serializable {
    private int id;
    private String cardNumber;
    private BigDecimal amount;
    private long version = 0L;
    // getters and setters omitted for brevity...
}

The Payment class is a simple POJO with getters and setters. As it may be used in a distributed eXtreme Scale deployment, it implements Serializable. That's it! All we need to use a class with eXtreme Scale's object cache is for it to implement Serializable.

Objects of type Payment will be the value part of the key/value pair when we store them in the BackingMap. The key should also implement Serializable if it is a class. The key, if it is not a primitive type, should also implement reasonable equals(Object obj) and hashCode() methods.
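Our sample uses an int as the key, so we get equals and hashCode for free from the boxed Integer. If we used a class as the key instead, it might look like the following sketch. The PaymentKey class and its two fields are hypothetical and are not used elsewhere in this chapter:

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key: Serializable, with value-based equals and hashCode.
final class PaymentKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String merchantId;
    private final int paymentId;

    PaymentKey(String merchantId, int paymentId) {
        this.merchantId = Objects.requireNonNull(merchantId);
        this.paymentId = paymentId;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof PaymentKey)) {
            return false;
        }
        PaymentKey other = (PaymentKey) obj;
        return paymentId == other.paymentId && merchantId.equals(other.merchantId);
    }

    @Override
    public int hashCode() {
        // must agree with equals: equal keys produce equal hash codes
        return Objects.hash(merchantId, paymentId);
    }

    public static void main(String[] args) {
        Map<PaymentKey, String> map = new HashMap<>();
        map.put(new PaymentKey("m-1", 7), "payment 7");
        // a different but equal key object finds the same entry
        System.out.println(map.get(new PaymentKey("m-1", 7)));
    }
}
```

Without these two methods, two key objects built from the same values would be treated as different keys, and a lookup with a freshly created key would never find the stored value.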

Now that we know what we will store in the cache, let's see what it takes to actually store it. In order to put objects in a BackingMap, we need an instance of com.ibm.websphere.objectgrid.ObjectMap. We don't interact directly with objects in a BackingMap. Instead, we do it by using a proxy ObjectMap.

We obtain a reference to an ObjectMap from the com.ibm.websphere.objectgrid.Session#getMap(String mapName) method. A Session lets us perform operations, like GET and PUT operations, in the context of a transaction.

A Session object is returned from the com.ibm.websphere.objectgrid.ObjectGrid#getSession() method. We get an instance of ObjectGrid from the ObjectGridManager#createObjectGrid() method.

public class PaymentLoader {
    ObjectGrid grid;

    private void initialize() throws ObjectGridException {
        ObjectGridManager ogm = ObjectGridManagerFactory.getObjectGridManager();
        grid = ogm.createObjectGrid();
        BackingMap bm = grid.defineMap("Payment");
    }
    // other methods omitted for now...
}

The PaymentLoader class has an instance variable grid which holds a reference to our ObjectGrid instance (seen above). The initialize() method sets up the ObjectGrid instance and defines one BackingMap used within the grid.

Let's take it one step at a time and walk through what we did. We want to interact with an ObjectGrid instance. The ObjectGrid interface is the gateway to interacting with WebSphere eXtreme Scale. It allows us to define BackingMaps and create Sessions. ObjectGrid instances are created with the ObjectGridManager interface. We get an ObjectGridManager reference by calling the helper class ObjectGridManagerFactory.getObjectGridManager(). ObjectGridManager is a singleton which provides access to methods which create local ObjectGrid instances, or connect to remote ObjectGrid instances. For now, we call the createObjectGrid() method on the ObjectGridManager. It returns an ObjectGrid instance, which is exactly what we were looking for. There are several createObjectGrid methods on the ObjectGridManager that take varying arguments for naming grids and configuring them through XML files. Right now this is unnecessary, though we will eventually need to use them. For now, createObjectGrid() meets our needs.

The ObjectGridManager.createObjectGrid() method creates a local instance of an ObjectGrid. This means the grid lives inside the application process along with our business logic. At this point, the grid is API equivalent to any WebSphere eXtreme Scale topology we create. No matter how interesting our deployment becomes with partitions, shards, and catalog servers, we always use the same APIs to interact with it.

After creating the grid, we must define maps within it (as seen above). We store our application data inside the maps defined in the grid. Creating a map to store Payment objects is done by calling grid.defineMap("Payment"). This method creates and returns a BackingMap which lives in the grid and holds our Payments. If we were to store different classes in the grid, then we would call the grid.defineMap(String mapName) method for each one. We aren't limited to one BackingMap per class though. If we were to split up our Payment by card type, then our maps would be defined by:

BackingMap bmapVisa = grid.defineMap("VisaPayments");
BackingMap bmapMC = grid.defineMap("MasterCardPayments");
BackingMap bmapAmEx = grid.defineMap("AmExPayments");

Defining a BackingMap gives it a place to live inside the ObjectGrid instance. ObjectGrid instances manage more than one BackingMap. Creating the previous BackingMaps would give us a runtime grid that looked like this:

All BackingMaps must be defined before the call to grid.initialize(). An explicit call to initialize() is optional since the grid.getSession() method calls it if it has not been called by our application. A call to grid.defineMap(String mapName) will throw an IllegalStateException if the initialize() method has already been called. In the sample code, we rely on the implicit call to grid.initialize(), rather than explicitly calling it. This approach is acceptable as long as all BackingMaps are defined before the first call to grid.getSession().

The most important thing to remember about BackingMaps is that they contain objects which live inside the ObjectGrid instance. Once an object is outside of the grid instance, we are no longer dealing with the BackingMap. We never directly interact with an object while it is inside a BackingMap. So how do we interact with objects in the grid?

Any interaction with objects in the grid must be done through an instance of ObjectMap. An ObjectMap is the application-side representation of the objects in a particular BackingMap. ObjectMap instances are obtained through the getMap(String mapName) method in the Session interface:

Session session = grid.getSession();
ObjectMap paymentsMap = session.getMap("Payment");

Sessions are used to gain access to ObjectMaps backed by BackingMaps that live inside an ObjectGrid instance. The ObjectMap instance can be thought of as a "near cache". Objects copied from remote ObjectGrid instances live inside the near cache when a "get" operation is performed. Objects in the near cache are synchronized to main ObjectGrid cache when a transaction commits. The following diagram should make the relationship clearer:

The previous diagram shows the BackingMap named Payment defined in our application. This BackingMap exists inside the ObjectGrid instance, and we cannot directly add, remove, or modify the objects inside it. Our sample code calls the grid.getSession() method (which actually creates the Payment BackingMap with an implicit call to grid.initialize()). The Session interface is used to create the ObjectMap object that lets us interact with the objects in the BackingMap named Payment with the call to session.getMap("Payment"). session.getMap("Payment") throws UndefinedMapException if it is passed the name of a BackingMap that does not exist.

Now that we have an ObjectMap, we can start adding objects to the cache. Interacting with an instance of ObjectMap is similar to interacting with java.util.Maps. ObjectMap contains methods to put objects in, and get objects out. The two simplest methods to use in this interface are put(Object key, Object value) and get(Object key). While ObjectMap contains other methods to put and get objects in bulk, we'll use these two methods in our example:

private void persistPayment(Payment payment) throws ObjectGridException {
    Session session = grid.getSession();
    ObjectMap paymentMap = session.getMap("Payment");
    session.begin();
    paymentMap.put(payment.getId(), payment);
    session.commit();
}

The persistPayment(Payment payment) method uses the instance variable grid to get a Session. The Session instance can get a reference to the ObjectMap used to interact with the Payment BackingMap. We call the Session#getMap(String mapName) to get a reference to an ObjectMap. Once we have an ObjectMap, we can interact with it using GET and PUT methods.

When we want to put or get objects from an ObjectMap, we must do so under the context of a transaction. A transaction in WebSphere eXtreme Scale is similar in concept to a database transaction. The Session interface is responsible for transaction management, with explicit calls to session.begin() and session.commit() or session.rollback() to start and end transactions. If a put or get on an ObjectMap does not take place under an explicitly created transaction, then a transaction will begin and commit implicitly. Though implicit transactions may be usable for occasional one-off reads or writes, it is considered poor form to use them. You are encouraged to call session.begin() and session.commit() yourself so that the boundaries of each transaction are explicit:

session.begin();
paymentMap.put(payment.getId(), payment);
session.commit();

Starting a transaction alerts the grid that we are about to read from, or write to an ObjectMap. In this case, we are simply putting a Payment into the ObjectMap named Payment. Right now, that object only exists in our application context. The ObjectGrid instance does not know about it yet. The call to session.commit() signals that we are finished with our actions, and any changes made to any ObjectMap inside the transaction may be safely written out to their BackingMap inside the grid:

Eventually, we're going to get data out of the grid. In the event that we already have a reference to an ObjectMap, we can begin a new transaction in the Session and read from the ObjectMap using the get(Object) method. Our example shows what we need to do in the event we do not have a reference to an ObjectMap on hand:

private Payment findPayment(int id) throws ObjectGridException {
    Session session = grid.getSession();
    ObjectMap paymentMap = session.getMap("Payment");
    session.begin();
    Payment payment = (Payment)paymentMap.get(id);
    session.rollback();
    return payment;
}

The findPayment(int id) method shows how to get a Payment out of the Payment BackingMap. Just like the persistPayment(Payment payment) method, findPayment(int id) obtains a reference to a Session and an ObjectMap for Payments. The ObjectMap#get(Object key) method returns an Object with the key ID. If that key does not exist in the map, then the get method returns null. We cast the Object returned from the get method into a Payment and return it after rolling back the transaction. We roll back because we did not change the payment object at all, and we only read from the map.
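The begin, commit, and rollback behavior we have used so far can be modeled with two plain hash-maps. The following sketch is a conceptual model only. It is not how eXtreme Scale is implemented, the class BufferedMapSketch is hypothetical, and unlike eXtreme Scale it has no implicit transactions, so every operation requires an explicit begin():

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model only: a session-side map that buffers writes until commit.
// It is NOT how eXtreme Scale is implemented.
class BufferedMapSketch {
    private final Map<Object, Object> backing = new HashMap<>(); // plays the BackingMap
    private final Map<Object, Object> pending = new HashMap<>(); // uncommitted puts
    private boolean active = false;

    void begin() {
        if (active) {
            throw new IllegalStateException("transaction already active");
        }
        active = true;
    }

    void put(Object key, Object value) {
        requireActive();
        pending.put(key, value); // visible only to this transaction for now
    }

    Object get(Object key) {
        requireActive();
        // our own uncommitted writes win over committed data
        return pending.containsKey(key) ? pending.get(key) : backing.get(key);
    }

    void commit() {
        requireActive();
        backing.putAll(pending); // write the buffered changes out
        pending.clear();
        active = false;
    }

    void rollback() {
        requireActive();
        pending.clear(); // discard the buffered changes
        active = false;
    }

    boolean isCommitted(Object key) {
        return backing.containsKey(key);
    }

    private void requireActive() {
        if (!active) {
            throw new IllegalStateException("no active transaction");
        }
    }

    public static void main(String[] args) {
        BufferedMapSketch map = new BufferedMapSketch();
        map.begin();
        map.put(1, "payment-1");
        System.out.println("before commit: " + map.isCommitted(1));
        map.commit();
        System.out.println("after commit:  " + map.isCommitted(1));
    }
}
```

A put is held on the application side until commit() copies it into the backing store, and rollback() simply throws the buffered changes away. For a read-only transaction there is nothing buffered, which is why rolling back after a get is harmless.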

When we're done using the grid, we should make sure we don't leave the grid open to accidental operations. We call the tearDown() method to make sure our reference to ObjectGrid doesn't work anymore:

private void tearDown() {
    grid.destroy();
}

Finally, grid.destroy() frees any resources used by the grid. Any attempt to get a Session, or begin a transaction after calling grid.destroy(), results in a thrown IllegalStateException.

For completeness, here is the entire PaymentLoader class:

package wxs.sample;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.math.BigDecimal;
import com.ibm.websphere.objectgrid.BackingMap;
import com.ibm.websphere.objectgrid.ObjectGrid;
import com.ibm.websphere.objectgrid.ObjectGridException;
import com.ibm.websphere.objectgrid.ObjectGridManager;
import com.ibm.websphere.objectgrid.ObjectGridManagerFactory;
import com.ibm.websphere.objectgrid.ObjectMap;
import com.ibm.websphere.objectgrid.Session;

public class PaymentLoader {
    ObjectGrid grid;
    static int pId = 0;

    private void initialize() throws ObjectGridException {
        ObjectGridManager ogm = ObjectGridManagerFactory.getObjectGridManager();
        grid = ogm.createObjectGrid();
        BackingMap bm = grid.defineMap("Payment");
    }

    public static void main(String[] args) {
        PaymentLoader pl = new PaymentLoader();
        try {
            pl.initialize();
            pl.loadPayments(args[0]);
            pl.tearDown();
        } catch (ObjectGridException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("All done!");
    }

    private void loadPayments(String filename) throws IOException, ObjectGridException {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader(filename));
            String line;
            while ((line = br.readLine()) != null) {
                Payment payment = createPayment(line);
                persistPayment(payment);
                findPayment(payment.getId());
            }
        } finally {
            if (br != null) {
                br.close();
            }
        }
    }

    private Payment findPayment(int id) throws ObjectGridException {
        Session session = grid.getSession();
        ObjectMap paymentMap = session.getMap("Payment");
        session.begin();
        Payment payment = (Payment)paymentMap.get(id);
        session.rollback();
        return payment;
    }

    private void persistPayment(Payment payment) throws ObjectGridException {
        Session session = grid.getSession();
        ObjectMap paymentMap = session.getMap("Payment");
        session.begin();
        paymentMap.put(payment.getId(), payment);
        session.commit();
    }

    private Payment createPayment(String line) {
        String[] tokens = line.split(":");
        Payment payment = new Payment();
        payment.setId(pId++);
        payment.setCardNumber(tokens[0]);
        payment.setAmount(new BigDecimal(tokens[4]));
        return payment;
    }

    private void tearDown() {
        grid.destroy();
    }
}

So far, we have created the ObjectGrid instance and BackingMaps in our application code. This isn't always the best way to do it, and it adds clutter to our code. Only local ObjectGrid instances are configurable through the programmatic interface. If we were to continue creating grids like this, we would not be able to take advantage of many of the features that make WebSphere eXtreme Scale so powerful. Instead of the programmatic configuration, we can use XML configuration files to keep information about our ObjectGrid deployment, and then load it when our application runs. This will eventually allow us to build the linearly scalable grids which we have discussed in this chapter. Let's take a look at what the XML configuration for our sample program looks like:

<?xml version="1.0" encoding="UTF-8"?>
<objectGridConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ibm.com/ws/objectgrid/config ../objectGrid.xsd"
    xmlns="http://ibm.com/ws/objectgrid/config">
  <objectGrids>
    <objectGrid name="MyGrid">
      <backingMap name="Payment"/>
    </objectGrid>
  </objectGrids>
</objectGridConfig>

This is about as simple as an XML configuration file can get. The <objectGridConfig> tag encompasses everything else in the file. We can define multiple ObjectGrid instances inside the <objectGrids> tag, although in this file we define just one. ObjectGrids defined in an XML configuration file must have a name set with the name attribute. When we load the ObjectGrid from the config file in our code, we must provide a name to the createObjectGrid(...) method so that it can return the correct grid instance. This is a departure from using anonymous ObjectGrids, which is what we have done in the rest of this chapter. Our Payment BackingMap is defined in a <backingMap> tag nested under the ObjectGrid instance to which it belongs. Defining an ObjectGrid instance in an XML configuration file changes the way we obtain a reference to it in our code:

private void initialize() throws ObjectGridException, MalformedURLException {
    ObjectGridManager ogm = ObjectGridManagerFactory.getObjectGridManager();
    String filename = "c:/objectgrid.xml";
    // three slashes: an empty host followed by the absolute path
    URL configFile = new URL("file:///" + filename);
    grid = ogm.createObjectGrid("MyGrid", configFile);
}

The main difference here is that the createObjectGrid(...) method has changed from giving us an anonymous ObjectGrid instance, to requesting the ObjectGrid instance named MyGrid that is defined in the XML file. Notice how the grid.defineMap("Payment") call disappears. We have already defined the Payment BackingMap for the new ObjectGrid instance in the XML file. Once we have a reference to the ObjectGrid instance, we have everything we need to get to work.
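Building a file URL by string concatenation is easy to get wrong, especially across operating systems. One alternative, sketched here with only standard JDK classes, is to let java.io.File produce the URL. The helper class ConfigUrlSketch and the relative file name are assumptions for this example. The resulting URL could then be passed as the second argument to createObjectGrid(...):

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper: turns a file name into a well-formed file: URL.
class ConfigUrlSketch {
    static URL toConfigUrl(String filename) {
        try {
            // File.toURI() resolves relative names and adds the correct slashes
            return new File(filename).toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("cannot convert to URL: " + filename, e);
        }
    }

    public static void main(String[] args) {
        // prints an absolute file: URL for objectgrid.xml in the working directory
        System.out.println(toConfigUrl("objectgrid.xml"));
    }
}
```

This approach works the same way on Windows and Unix-like systems, and it does not require us to remember how many slashes a file URL needs.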

 

Summary


Data grids give us a great way to better utilize our computing resources. They allow us to cache objects in memory that could be located on a computer, on the LAN, or in a data center thousands of miles away. Caching data is one of the core features found in all data grid products. Understanding how to cache objects with WebSphere eXtreme Scale is the natural starting point in learning how to use its many other features to your advantage. Knowledge of where your objects live will help you create powerful grid topologies that scale linearly and keep up with the demand of your applications. WebSphere eXtreme Scale will provide your applications with fast access to remote objects, while giving you a much larger cache to rely on than you would have without a data grid. It allows us to logically join numerous computers together, whether the hardware is real or virtual, and create grids that can store terabytes of live Java objects, all while avoiding costly database hits and transformations to and from SQL statements.

You should now feel comfortable getting started with WebSphere eXtreme Scale, and creating ObjectGrid instances by using the programmatic API, or by creating a simple XML configuration file. Explore what happens when you examine the ObjectMap instances and BackingMaps with a debugger after puts and transaction commits.

You should be familiar with local ObjectGrid instances. As we explore more features of WebSphere eXtreme Scale, you will be able to tell when using a local instance is right for the situation, and when a distributed grid is more suitable. In Chapter 2, we'll find out more about interacting with data in the grid.

About the Author
  • Anthony Chaves

    Anthony writes software for customers of all sizes. He likes building scalable, robust software. Customers have thrown all kinds of different development environments at him: Java, C, Rails, mobile device platforms – but no .NET (yet). Anthony particularly likes user/device authentication problems and applied scalability practices. Cloud-computing buzzword bingo doesn't fly with him. He started the Boston Scalability User Group in 2007.
