Integrating WebSphere eXtreme Scale Data Grid with Relational Database: Part 2

by Anthony Chaves | November 2009 | Java

Read Part One of The DataGrid API with IBM WebSphere eXtreme Scale 6 here.

Removal versus eviction

Setting an eviction policy on a BackingMap makes more sense now that we're using a Loader. Imagine that our cache holds only a fraction of the total data stored in the database. Under heavy load, the cache is constantly asked to hold more and more data, but it operates at capacity. What happens when we ask the cache to hold on to one more payment? The BackingMap needs to remove some payments in order to make room for more.

BackingMaps have three basic eviction policies: LRU (least-recently used), LFU (least-frequently used), and TTL (time-to-live). Each policy tells the BackingMap which objects should be removed to make room for more. When an object is evicted from the cache, its status in the database is unchanged. With eviction, objects may enter and leave the cache countless times due to cache misses and evictions, while their rows in the database remain untouched.
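For illustration, here is a minimal sketch of plugging in one of the built-in evictors programmatically; the map name matches our earlier examples, but the maxSize value is an arbitrary assumption:

BackingMap paymentMap = grid.getMap("Payment");

// Evict the least-recently used entries once the map grows past a threshold.
LRUEvictor lruEvictor = new LRUEvictor(); // com.ibm.websphere.objectgrid.plugins.builtins
lruEvictor.setMaxSize(10000);             // arbitrary capacity chosen for this sketch
paymentMap.setEvictor(lruEvictor);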

The only thing that affects an object in the database is an explicit call from our application to change (either persist or merge) or remove it. Removal means the object is removed from the cache, and the Loader executes a SQL DELETE to delete the corresponding row(s) from the database. Your data is safe when using evictions; the cache simply provides a window into your data. A remove operation explicitly tells both ObjectGrid and the database to delete an object.
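As a minimal sketch using the entity API, assuming grid is our ObjectGrid instance and paymentId is a known key, a removal looks like this:

EntityManager em = grid.getSession().getEntityManager();
em.getTransaction().begin();
Payment payment = (Payment) em.find(Payment.class, paymentId);
em.remove(payment); // removes the cache entry; the Loader issues the SQL DELETE at commit
em.getTransaction().commit();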

Write-through and write-behind

Let's get back to the slowdown caused by the Loader configuration. By default, the Loader uses write-through behavior:

[Figure: write-through behavior, where each ObjectGrid transaction wraps a database transaction]

Now we know the problem. Write-through behavior wraps a database transaction around every write! For every ObjectGrid transaction, we execute one database transaction. On the plus side, every object assuredly reaches the database, provided it doesn't violate any relational constraints. Despite this harsh reaction to write-through behavior, it is essential for objects that absolutely must get to the database as fast as possible. The problem is that we hit the database for every write operation on every BackingMap. It would be nice not to incur the cost of a database transaction every time we write to the cache.

Write-behind behavior gives us the help we need. It combines the speed of an ObjectGrid transaction with the flexibility that comes with storing data in a database:

[Figure: write-behind behavior, with ObjectGrid transactions decoupled from database transactions]

Each ObjectGrid transaction is now separate from a database transaction. The BackingMap now has two jobs. The first is to store our objects as it always has. The second is to send those objects to the JPAEntityLoader, which generates the SQL statements that insert the data into the database.

We configured each BackingMap with its own JPAEntityLoader. Each BackingMap requires its own Loader because each Loader is specific to a JPA entity class. The relationship between JPAEntityLoader and a JPA entity is established when the BackingMap is initialized. The jpaTxCallback we specified in the ObjectGrid configuration coordinates the transactions between ObjectGrid and a JPA EntityManager.
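Programmatically, wiring up that callback might look like the following sketch; the persistence unit name here is an assumption:

JPATxCallback jpaTxCallback = new JPATxCallback();
jpaTxCallback.setPersistenceUnitName("paymentProcessor"); // assumed persistence unit name
grid.setTransactionCallback(jpaTxCallback);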

In a write-through situation, our database transactions are only as large as our ObjectGrid transactions. Update one object in the BackingMap and one object is written to the database. With write-behind, our ObjectGrid transaction is complete, and our objects are put in a write-behind queue map. That queue map does not immediately synchronize with the database. It waits for some specified time or for some number of updates, to write out its contents to the database:

[Figure: the write-behind queue map sitting between the BackingMap and the Loader]

We configure the database synchronization conditions with the setWriteBehind("T{time};C{count}") method on a BackingMap instance. Programmatically, calling setWriteBehind looks like this:

BackingMap paymentMap = grid.getMap("Payment");
paymentMap.setLoader(new JPAEntityLoader());
paymentMap.setWriteBehind("T120;C5001");

The same configuration in XML looks like this:

<backingMap name="Payment" writeBehind="T120;C5001"
            pluginCollectionRef="Payment" />

Enabling write-behind is as simple as that. The setWriteBehind method takes one string parameter, but it is actually a two-in-one. First, the T part is the time in seconds between syncs with the database. Here, we set the payment BackingMap to wait two minutes between syncs. Second, the C part indicates the number (count) of changes made to the BackingMap that triggers a database sync.

Between these two parameters, the sync occurs on a whichever-comes-first basis. If two minutes elapse between syncs and only 400 changes (persists, merges, or removals) have been put in the write-behind queue map, then those 400 changes are written out to the database. If only 30 seconds have elapsed but we reach 5001 changes, then those changes are written to the database.

ObjectGrid does not guarantee that the sync will take place exactly when either of those conditions is met. The sync may happen a little bit before (116 seconds or 4998 changes) or a little bit later (123 seconds or 5005 changes). The sync will happen as close to those conditions as ObjectGrid can reasonably do it.

The default value is "T300;C1000", which syncs a BackingMap to the database every five minutes, or after 1000 changes to the BackingMap. This default is specified either with the string "T300;C1000" or with an empty string (""). Omitting either part of the sync parameters is acceptable; the missing part uses the default value. Calling setWriteBehind("T60") has the BackingMap sync to the database every 60 seconds, or after 1000 changes. Calling setWriteBehind("C500") syncs every five minutes, or after 500 changes.
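To make the combinations concrete, each of these calls enables write-behind on the paymentMap from earlier; missing parts fall back to the defaults:

paymentMap.setWriteBehind("");           // defaults: T300;C1000
paymentMap.setWriteBehind("T300;C1000"); // the same defaults, spelled out
paymentMap.setWriteBehind("T60");        // every 60 seconds or 1000 changes
paymentMap.setWriteBehind("C500");       // every five minutes or 500 changes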

Note that write-behind is enabled even when setWriteBehind is called with an empty string. If you do not want write-behind behavior on a BackingMap, do not call the setWriteBehind method at all.

A great feature of the write-behind behavior is that an object changed multiple times in the cache is only written in its final form to the database. If a payment object is changed in three different ObjectGrid transactions, the SQL produced by the JPAEntityLoader will reflect the object's final state before the sync. For example:

entityManager.getTransaction().begin();
Payment payment = createPayment(line, batch);
entityManager.getTransaction().commit();

// some time later...
entityManager.getTransaction().begin();
payment.setAmount(new BigDecimal("44.95"));
entityManager.getTransaction().commit();

// some time later...
entityManager.getTransaction().begin();
payment.setPaymentType(PaymentType.REAUTH);
entityManager.getTransaction().commit();

With write-through behavior, this would produce the following SQL:

insert into payment (id, amount, batch_id, card_id, payment_type)
values (12345, 75.00, 31, 6087, 'AUTH');

update payment set amount = 44.95, batch_id = 31, card_id = 6087,
payment_type = 'AUTH' where id = 12345;

update payment set amount = 44.95, batch_id = 31, card_id = 6087,
payment_type = 'REAUTH' where id = 12345;

Now that we're using write-behind, that same application behavior produces just one SQL statement:

insert into payment (id, amount, batch_id, card_id, payment_type) 
values (12345, 44.95, 31, 6087, 'REAUTH');

BackingMap and Loader

The following problem does not exist in WebSphere eXtreme Scale version 7.0 and later; eXtreme Scale 7.0 solved the problem of writing data to a database out of order. The solution below is kept as an example of using "soft references" to other objects, a technique that remains useful.

We've seen that each BackingMap has its own instance of a Loader. Because each BackingMap uses its Loader to sync with the database according to its own conditions, different BackingMaps end up syncing at different times. Most of the time, we expect this to be a good thing. There are only four BackingMaps in our application that sync with the database (as seen below), but a larger application can have many more. Letting the BackingMaps sync on their own schedules reduces the peak database load generated by an ObjectGrid instance.

[Figure: the four BackingMaps that sync with the database, each through its own JPAEntityLoader]

Our PaymentProcessor application should be pretty fast again after enabling write-behind on the BackingMaps. Each ObjectGrid transaction is no longer encumbered by a database transaction. Letting the application run for a bit, we see the output speed on the console. Part of that output includes this:

An exception occurred: javax.persistence.RollbackException: Error while commiting the transaction
at org.hibernate.ejb.TransactionImpl.commit(TransactionImpl.java:71)
at com.ibm.websphere.objectgrid.jpa.JPATxCallback.commit(JPATxCallback.java:158)
at com.ibm.ws.objectgrid.SessionImpl.commit(SessionImpl.java:1242)
at com.ibm.ws.objectgrid.writebehind.WriteBehindLoader.pushChanges(WriteBehindLoader.java:1147)
at com.ibm.ws.objectgrid.writebehind.WriteBehindLoader.pushChanges(WriteBehindLoader.java:1058)
at com.ibm.ws.objectgrid.writebehind.WriteBehindLoader.run(WriteBehindLoader.java:768)
at java.lang.Thread.run(Thread.java:735)
Caused by: org.hibernate.TransientObjectException: object references an unsaved transient instance - save the transient instance before flushing: wxs.sample.models.Payment.card -> wxs.sample.models.Card
at org.hibernate.engine.CascadingAction$9.noCascade(CascadingAction.java:353)
at org.hibernate.engine.Cascade.cascade(Cascade.java:139)
at org.hibernate.event.def.AbstractFlushingEventListener.cascadeOnFlush(AbstractFlushingEventListener.java:131)
at org.hibernate.event.def.AbstractFlushingEventListener.prepareEntityFlushes(AbstractFlushingEventListener.java:122)
at org.hibernate.event.def.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:65)
at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:26)
at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:338)
at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:106)
at org.hibernate.ejb.TransactionImpl.commit(TransactionImpl.java:54)

This is an artifact of using a JPAEntityLoader with our BackingMaps. The problem is the org.hibernate.TransientObjectException, which indicates that something is wrong with our ORM:

save the transient instance before flushing: 
wxs.sample.models.Payment.card -> wxs.sample.models.Card

Our objects are JPA entities. A JPA entity manager enforces referential integrity through the ORM mapping files specified in persistence.xml. When we persist a payment instance, the instances of card and batch that it references must already be in the database, due to the relationships on the Payment entity and the foreign key constraints on the payment table:

<entity class="wxs.sample.models.Payment" access="FIELD">
  <attributes>
    <id name="id"/>
    <basic name="paymentType">
      <enumerated>STRING</enumerated>
    </basic>
    <many-to-one name="batch"
                 target-entity="wxs.sample.models.Batch"
                 fetch="LAZY"/>
    <many-to-one name="card"
                 target-entity="wxs.sample.models.Card"
                 fetch="LAZY"/>
  </attributes>
</entity>

In the ORM definition, we notify the entity manager that a payment references an instance of a Card and an instance of a Batch. A row for each of these objects must exist in the database before the JPA entity manager persists the payment. If that's not enough, the database schema should enforce these constraints:

[Figure: the payment table definition, with foreign key constraints referencing the batch and card tables]
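As a sketch, a payment table definition with those constraints might look like this; the column names follow the SQL shown earlier, but the exact types are assumptions:

create table payment (
    id           integer primary key,
    amount       decimal(10,2),
    batch_id     integer references batch(id),
    card_id      integer references card(id),
    payment_type varchar(16)
);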

Whether the JPA entity manager allows it or not, the payment table definition clearly sets foreign key constraints on the batch and card tables. What's going wrong here?

Remember, each BackingMap has its own Loader, and each Loader syncs with the database according to its own rules. Because the Loaders do not sync at the same time, in the same order, or in the correct order according to the relationships set up by our ORM mappings, we run into referential integrity violations.

[Figure: the Payment BackingMap syncing to the database before the Card BackingMap]

We're getting this exception because the BackingMap holding our payment objects syncs to the database before the BackingMap that holds a card referenced by one of those payments (as seen above). Because we can't specify the order in which BackingMap objects write to the database, we face a few difficult decisions.

Picking battles

The first thing we can do is disable write-behind on the card BackingMap. Each time we create a card and persist it to the BackingMap, the JPAEntityLoader writes it to the database as part of the ObjectGrid transaction commit:

BackingMap batchMap = grid.getMap("Batch");
batchMap.setLoader(new JPAEntityLoader());
batchMap.setLockStrategy(LockStrategy.PESSIMISTIC);

BackingMap paymentMap = grid.getMap("Payment");
paymentMap.setLoader(new JPAEntityLoader());
paymentMap.setWriteBehind("T120;C5001"); // payments keep write-behind

BackingMap addressMap = grid.getMap("Address");
addressMap.setLoader(new JPAEntityLoader());
addressMap.setWriteBehind("T60;C11");

BackingMap cardMap = grid.getMap("Card");
cardMap.setLoader(new JPAEntityLoader()); // no setWriteBehind: cards are now write-through

Can you guess what happens when we run the program now?

org.hibernate.TransientObjectException: object references 
an unsaved transient instance - save the transient instance
before flushing: wxs.sample.models.Card.address ->
wxs.sample.models.Address

Same problem, different relationship! This time the problem is between Address and Card. In fact, we could have encountered this exception before we encountered the TransientObjectException between Payment and Card. Now what? Disable write-behind on Address too? Sure, why not:

BackingMap batchMap = grid.getMap("Batch");
batchMap.setLoader(new JPAEntityLoader());
batchMap.setLockStrategy(LockStrategy.PESSIMISTIC);

BackingMap paymentMap = grid.getMap("Payment");
paymentMap.setLoader(new JPAEntityLoader());
paymentMap.setWriteBehind("T120;C5001");

BackingMap addressMap = grid.getMap("Address");
addressMap.setLoader(new JPAEntityLoader()); // addresses are now write-through too

BackingMap cardMap = grid.getMap("Card");
cardMap.setLoader(new JPAEntityLoader());

Potentially, we now have two database writes before we create a payment: one for the address commit and one for the card commit. We're losing the write-behind benefits very quickly; only one of our entities still has write-behind enabled. One of our goals in using ObjectGrid was to reduce the volume of database write transactions. This is still faster than a database-only approach, but it is significantly slower than having write-behind enabled on the address and card BackingMaps.

If strict referential integrity is a high priority and unchangeable in the current schema, then we are stuck with what we've got. This is nothing to scoff at. Read requests are significantly faster for objects in the cache. The BackingMap for payments still has write-behind enabled, and the creation of payments is still the highest-volume processing we do. Because of the huge number of payments we create and modify, inserting ObjectGrid between the database and our application is worth it for this alone.

However, if you're willing to loosen up the database schema requirements a little bit, and throw ORM relationships out of the window, then we can do much better than what we've got.

JPALoader

We're having problems because the BackingMaps do not sync with regard to JPA entity relationships. The order of insertion is important in a database, and we cannot specify an order for BackingMaps to load their objects. Rather than fighting that, we can embrace it and dumb down our ObjectGrid and JPA entities. Decoupling entities removes the insertion-order requirement between the two. As long as we don't mind having a payment with a card_id that doesn't yet exist in the card table, we can get some work done with this approach. Besides, that card will be in the table within the next five minutes, or 1000 BackingMap changes, if we're using the default write-behind values:

<entity class="wxs.sample.models.Payment" access="FIELD">
  <attributes>
    <id name="id"/>
    <basic name="paymentType">
      <enumerated>STRING</enumerated>
    </basic>
  </attributes>
</entity>
<entity class="wxs.sample.models.Batch" access="FIELD">
  <attributes>
    <id name="id"/>
    <basic name="status">
      <enumerated>STRING</enumerated>
    </basic>
  </attributes>
</entity>
<entity class="wxs.sample.models.Card" access="FIELD">
  <attributes>
    <id name="id"/>
    <basic name="cardType">
      <enumerated>STRING</enumerated>
    </basic>
  </attributes>
</entity>
<entity class="wxs.sample.models.Address" access="FIELD">
  <attributes>
    <id name="id"/>
  </attributes>
</entity>

The new orm.xml file removes all of the relationship metadata. That takes care of the JPA part. We now need to pull the ObjectGrid entity metadata out of our model classes:

@javax.persistence.Entity
public class Batch implements Serializable {
    @javax.persistence.Id int id;
    BatchStatus status;
    String originalFileName;
}

Note that we keep the JPA entity metadata in the class while removing the ObjectGrid entity metadata. The JPA entity metadata stays because we're still using JPA to map the entity to a database table. The ObjectGrid entity metadata goes away because we need to get a little "closer to the metal" as the saying goes:

@javax.persistence.Entity
public class Payment implements Serializable {
    @javax.persistence.Id int id;
    int batchId;
    int cardId;
    PaymentType paymentType;
    BigDecimal amount;
}

We've removed the ObjectGrid entity annotations. We can no longer use the ORM features of entities to define relationships between our models. It's back to the ObjectMap API, including defining the map in our useLocalObjectGrid() method:

grid.defineMap("Payment");
BackingMap paymentMap = grid.getMap("Payment");
paymentMap.setLoader(new JPALoader());
paymentMap.setWriteBehind(""); // empty string enables write-behind with the defaults

Because we're not using the ORM of ObjectGrid entities, we must remove the references to Batch and Card from the Payment class, replacing them with their respective IDs. When the JPALoader generates SQL from a payment object, it now has the batch and card IDs that the payment references. This technique works because we generate object IDs within our application. The combination of application-generated object IDs and backing off from ORM allows us to insert objects "out of order".

Without the foreign key constraints on the payment table, we can insert payments with a cardId and batchId that don't exist yet. Additionally, we need to rework our application to use the ObjectMap and Session APIs in place of the EntityManager API.
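A sketch of creating a payment through the ObjectMap API follows; nextPaymentId() stands in for our application-generated ID scheme, and batch and card are assumed to be in scope:

Payment payment = new Payment();
payment.id = nextPaymentId();  // hypothetical application-generated ID
payment.batchId = batch.id;    // soft reference: just the ID, no object graph
payment.cardId = card.id;
payment.paymentType = PaymentType.AUTH;
payment.amount = new BigDecimal("75.00");

Session session = grid.getSession();
ObjectMap payments = session.getMap("Payment");
session.begin();
payments.insert(payment.id, payment); // the key matches the JPA @Id value
session.commit();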

Changing all of our classes to use the ObjectMap and Session APIs removes some useful functionality. The query functionality we gain with ObjectGrid entities is too good to pass up in some situations; in particular, when we query for addresses and cards that already exist, we'd like to preserve it. There is nothing stopping us from taking a hybrid approach: use ObjectGrid entities where they work and are helpful, and drop down to the lower-level ObjectMap and Session APIs where we need to. In this case, we're going to leave the ObjectGrid entity annotations on Address and Card.
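As a sketch of the hybrid in action, cards still go through the entity API while payments go through ObjectMap; cardId here is assumed to be a known key:

Session session = grid.getSession();
EntityManager em = session.getEntityManager(); // entity API still works for Card and Address
em.getTransaction().begin();
Card card = (Card) em.find(Card.class, cardId); // read-through via the Card Loader
em.getTransaction().commit();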

Running the PaymentProcessor program with the hybrid approach gives us all of the advantages of read-through/write-behind behavior. We gain this error-free functionality by loosening our grip on the database schema and letting our classes know a little more about it.

Summary

As usual, we covered a lot of ground in this article. This was a good opportunity to discuss what in-memory data grids do well. We also saw some effects of their relative immaturity. Data grids don't yet have the widespread use that relational databases enjoy, so tool support for integrating with an IMDG is lacking. The IMDG is also inherently platform-specific for now: WebSphere eXtreme Scale caches Java objects themselves, not an interoperable representation. Where interoperability is an issue, relational databases shine.

Fortunately, we've seen how we can integrate our IMDG with our relational database. We can now build new applications using the data grid as a cache, and integrate it with a database for durability, data capacity, and reporting uses.

Configuring an ObjectGrid instance to use a JPALoader or JPAEntityLoader is easy. Starting the flow of objects to SQL to database rows is like turning on a switch. The most important thing to remember when using Loaders is that referential integrity will come back to haunt you if write-behind is enabled. There are a few ways to look at that problem. Is referential integrity the most important requirement in an application? Perhaps write-behind isn't the right behavior for that application. More realistically, a hybrid approach of write-through for objects with critical referential integrity, and write-behind for high-volume or change-heavy objects, works well. If write speed is important, then relaxing the relational constraints in the database schema and in the ORM file makes sense.

Among WebSphere eXtreme Scale, a relational database, and a JPA provider, we can now achieve very high performance with increased durability and interoperability. Now that we know our way around WebSphere eXtreme Scale, and have covered the legacy issues, we can focus on building highly scalable, highly available deployments in the future.

 


About the Author


Anthony Chaves

Anthony writes software for customers of all sizes. He likes building scalable, robust software. Customers have thrown all kinds of different development environments at him: Java, C, Rails, mobile device platforms – but no .NET (yet).

Anthony particularly likes user/device authentication problems and applied scalability practices. Cloud-computing buzzword bingo doesn't fly with him. He started the Boston Scalability User Group in 2007.
