Spring Data

By Petri Kainulainen

About this book

Spring Framework has always had good support for different data access technologies. However, developers had to use technology-specific APIs, which often meant writing a lot of boilerplate code to implement even the simplest operations. Spring Data changed all this. Spring Data makes it easier to implement Spring-powered applications that use cloud-based storage services, NoSQL databases, map-reduce frameworks, or relational databases.

"Spring Data" is a practical guide that is full of step-by-step instructions and examples which ensure that you can start using the Java Persistence API and Redis in your applications without extra hassle.

This book provides a brief introduction to the underlying data storage technologies, gives step-by-step instructions that will help you utilize the discussed technologies in your applications, and provides a solid foundation for expanding your knowledge beyond the concepts described in this book.

You will learn an easier way to manage your entities and to create database queries with Spring Data JPA. This book also demonstrates how you can add custom functions to your repositories. You will also learn how to use the Redis key-value store as data storage and to use its other features for enhancing your applications.

"Spring Data" includes all the practical instructions and examples that provide you with all the information you need to create JPA repositories with Spring Data JPA and to utilize the performance of Redis in your applications by using Spring Data Redis.

Publication date:
November 2012


Chapter 1. Getting Started

In this book, we will concentrate on two specific subprojects that offer support for Java Persistence API 2.0 and the Redis key-value store. But before we get to the point, we need to get a brief introduction to both the technologies. We need to do this for two reasons:

First, if we want to truly understand the benefits of Spring Data JPA, we need to have an idea of how database queries are created when the standard API is used. As soon as we compare these code samples to query creation code that uses Spring Data JPA, its benefits are revealed to us.

Second, the basic knowledge about the Redis key-value store will help us to understand the second part of this book which describes how we can use it in our applications. After all, we should be familiar with any technology that we use in our applications. Right?

In this chapter, we will cover the following topics:

  • The motivation behind the Java Persistence API

  • The main components of the Java Persistence API

  • How we can create database queries with the Java Persistence API

  • The data types supported by the Redis key-value store

  • The main features of the Redis key-value store


Java Persistence API

Before the Java Persistence API (JPA) was introduced, we had the following three alternative technologies which we could use to implement our persistence layer:

  • The persistence mechanism provided by Enterprise JavaBeans (EJB) 2.x specifications

  • The JDBC API

  • The third party object-relational mapping (ORM) frameworks such as Hibernate

This gave us some freedom when selecting the best tool for the job but as always, none of these options were problem free.

The problem with EJB 2.x was that it was too heavyweight and complicated. Its configuration relied on complicated XML documents and its programming model required a lot of boilerplate code. Also, EJB required that the application be deployed to a Java EE application server.

Programming against the JDBC API was rather simple and we could deploy our application in any servlet container. However, we had to write a lot of boilerplate code that was needed when we were transforming the information of our domain model to queries or building domain model objects from query results.

Third party ORM frameworks were often a good choice because they freed us from writing the unnecessary code that was used to build queries or to construct domain objects from query results. This freedom came with a price tag: objects and relational data are not compatible creatures, and even though ORM frameworks can solve most of the problems caused by the object-relational mismatch, the problems that they cannot solve efficiently are the ones that cause us the most pain.

The Java Persistence API provides a standard mechanism for implementing a persistence layer that uses relational databases. Its main motivation was to replace the persistence mechanism of EJB 2.x and to provide a standardized approach for object-relational mapping. Many of its features were originally introduced by the third party ORM frameworks, which have later become implementations of the Java Persistence API. The following section introduces its key concepts and describes how we can create queries with it.

Key concepts

An entity is a persistent domain object. Each entity class generally represents a single database table, and an instance of such a class contains the data of a single table row. Each entity instance always has a unique object identifier, which is the same thing to an entity that a primary key is to a database table.

An entity manager factory creates entity manager instances. All entity manager instances created by the same entity manager factory will use the same configuration and database. If you need to access multiple databases, you must configure one entity manager factory per used database. The methods of the entity manager factory are specified by the EntityManagerFactory interface.

The entity manager manages the entities of the application. The entity manager can be used to perform CRUD (Create, Read, Update, and Delete) operations on entities and run complex queries against a database. The methods of an entity manager are declared by the EntityManager interface.

A persistence unit specifies all entity classes, which are managed by the entity managers of the application. Each persistence unit contains all classes representing the data stored in a single database.

A persistence context contains entity instances. Inside a persistence context, there must be only one entity instance for each object identifier. Each persistence context is associated with a specific entity manager that manages the lifecycle of the entity instances contained by the persistence context.
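
The uniqueness rule of a persistence context can be pictured as an identity map. The following is a minimal plain-Java sketch of that idea, not JPA code: the names IdentityMapSketch and ContactRecord, the loader function, and the loads counter are all hypothetical and only stand in for what a JPA provider does internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for an entity; this is not a JPA class.
class ContactRecord {
    final long id;
    String firstName;

    ContactRecord(long id, String firstName) {
        this.id = id;
        this.firstName = firstName;
    }
}

// Toy model of a persistence context: at most one instance per identifier.
class IdentityMapSketch {
    private final Map<Long, ContactRecord> managed = new HashMap<>();
    private final Function<Long, ContactRecord> loader;

    // Counts how many times the loader (the assumed "database") was hit.
    int loads = 0;

    IdentityMapSketch(Function<Long, ContactRecord> loader) {
        this.loader = loader;
    }

    // Returns the already managed instance, or loads it once and remembers it.
    ContactRecord find(long id) {
        return managed.computeIfAbsent(id, key -> {
            loads++;
            return loader.apply(key);
        });
    }
}
```

Calling find twice with the same identifier returns the very same object in this sketch, which mirrors the rule that a persistence context holds only one entity instance per object identifier.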

Creating database queries

The Java Persistence API introduced two new methods for creating database queries: Java Persistence Query Language (JPQL) and the Criteria API. The queries written by using these technologies do not deal directly with database tables. Instead, queries are written over the entities of the application and their persistent state. This ensures, in theory, that the created queries are portable and not tied to a specific database schema or database provider.

It is also possible to use SQL queries, but this ties the application to a specific database schema. If database provider specific extensions are used, our application is tied to the database provider as well.

Next we will take a look at how we can use the Java Persistence API to build database queries by using SQL, JPQL, and the Criteria API. Our example query will fetch all contacts whose first name is "John" from the database. This example uses a simple entity class called Contact that represents the data stored in the contacts table. The following table maps the entity's properties to the columns of the database:





Native SQL queries

SQL is a standardized query language that is designed to manage data that is stored in relational databases. The following code example describes how we can implement the specified query by using SQL:

//Obtain an instance of the entity manager
EntityManager em = ...

//Build the SQL query string with a query parameter
String getByFirstName="SELECT * FROM contacts c WHERE c.first_name = ?1";

//Create the Query instance
Query query = em.createNativeQuery(getByFirstName, Contact.class);

//Set the value of the query parameter
query.setParameter(1, "John");

//Get the list of results
List contacts = query.getResultList();

This example teaches us three things:

  • We don't have to learn a new query language in order to build queries with JPA.

  • The created query is not type safe and we must cast the results before we can use them.

  • We have to run the application before we can verify our query for spelling or syntactical errors. This increases the length of the developer feedback loop and decreases productivity.

Because SQL queries are tied to a specific database schema (or to the used database provider), we should use them only when it is absolutely necessary. Often the reason for using SQL queries is performance, but we might also have other reasons for using them. For example, we might be migrating a legacy application to JPA and we don't have time to do it right at the beginning.

Java Persistence Query Language

JPQL is a string-based query language with a syntax resembling that of SQL. Thus, learning JPQL is fairly easy as long as you have some experience with SQL. The code example that executes the specified query is as follows:

//Obtain an instance of the entity manager
EntityManager em = ...

//Build the JPQL query string with named parameter
String getByFirstName="SELECT c FROM Contact c WHERE c.firstName = :firstName";

//Create the Query instance
TypedQuery<Contact> query = em.createQuery(getByFirstName, Contact.class);

//Set the value of the named parameter
query.setParameter("firstName", "John");

//Get the list of results
List<Contact> contacts = query.getResultList();

This example tells us three things:

  • The created query is type safe and we don't have to cast the query results.

  • The JPQL query strings are very readable and easy to interpret.

  • The created query strings cannot be verified during compilation. The only way to verify our query strings for spelling or syntactical errors is to run our application. Unfortunately, this means that the length of the developer feedback loop is increased, which decreases productivity.

JPQL is a good choice for static queries. In other words, if the number of query parameters is always the same, JPQL should be our weapon of choice. But implementing dynamic queries with JPQL is often cumbersome as we have to build the query string manually.
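
To make the last point concrete, the following sketch shows what building a JPQL string by hand can look like when the search criteria are optional. It is only an illustration under stated assumptions: the ContactQueryBuilder class is hypothetical, and so is the lastName property (the examples in this chapter only use firstName). The sketch produces the query string and the parameter map. Passing them to an entity manager is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that assembles a JPQL string from optional criteria.
class ContactQueryBuilder {
    // Named parameters that would later be bound to the query.
    final Map<String, Object> parameters = new LinkedHashMap<>();

    // Either argument may be null, which means "do not filter by this property".
    String build(String firstName, String lastName) {
        parameters.clear();
        List<String> conditions = new ArrayList<>();
        if (firstName != null) {
            conditions.add("c.firstName = :firstName");
            parameters.put("firstName", firstName);
        }
        if (lastName != null) {
            // lastName is an assumed property used only for illustration.
            conditions.add("c.lastName = :lastName");
            parameters.put("lastName", lastName);
        }
        StringBuilder jpql = new StringBuilder("SELECT c FROM Contact c");
        if (!conditions.isEmpty()) {
            jpql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return jpql.toString();
    }
}
```

Even with two optional criteria, the code has to keep the query string and the bound parameters in sync by hand, and every new criterion adds another branch.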

The Criteria API

The Criteria API was introduced to address the problems found while using JPQL and to standardize the criteria efforts of third party ORM frameworks. It is used to construct query definition objects, which are transformed to the executed SQL query. The next code example demonstrates how we can implement our query by using the Criteria API:

//Obtain an instance of entity manager
EntityManager em = ...

//Get criteria builder
CriteriaBuilder cb = em.getCriteriaBuilder();

//Create criteria query
CriteriaQuery<Contact> query = cb.createQuery(Contact.class);

//Create query root
Root<Contact> root = query.from(Contact.class);

//Create condition for the first name by using static meta
//model. You can also use "firstName" here.
Predicate firstNameIs = cb.equal(root.get(Contact_.firstName), "John");

//Specify the where condition of query
query.where(firstNameIs);

//Create typed query and get results
TypedQuery<Contact> q = em.createQuery(query);
List<Contact> contacts = q.getResultList();

We can see three things from this example:

  • The created query is type safe and results can be obtained without casting

  • The code is not as readable as the corresponding code that uses SQL or JPQL

  • Since we are dealing with a Java API, the Java compiler ensures that it is not possible to create syntactically incorrect queries

The Criteria API is a great tool if we have to create dynamic queries. The creation of dynamic queries is easier because we can deal with objects instead of building query strings manually. Unfortunately, when the complexity of the created query grows, the creation of the query definition object can be troublesome and the code becomes harder to understand.



Redis

Redis is an in-memory data store that keeps its entire data set in memory and uses disk space only as secondary persistent storage. Therefore, Redis can provide very fast read and write operations. The catch is that the size of the Redis data set cannot exceed the amount of available memory. The other features of Redis include:

  • Support for complex data types

  • Multiple persistence mechanisms

  • Master-slave replication

  • Implementation of the publish/subscribe messaging pattern

These features are described in the following subsections.

Supported data types

Each value stored by Redis has a key. Both keys and values are binary safe, which means that the key or the stored value can be either a string or the content of a binary file. However, Redis is more than just a simple key-value store. It supports multiple binary safe data types, which should be familiar to every programmer. These data types are as follows:

  • String: This is a data type where one key always refers to a single value.

  • List: This is a data type where one key refers to multiple string values, which are sorted in insertion order.

  • Set: This is a collection of unordered strings that cannot contain the same value more than once.

  • Sorted set: This is similar to a set but each of its values has a score which is used to order the values of a sorted set from the lowest score to the highest. The same score can be assigned to multiple values.

  • Hash: This is a data type where a single hash key always refers to a specific map of string keys and values.
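
Of these data types, the sorted set is probably the least familiar one. The following plain-Java sketch models its behavior as described above: each member appears only once, members are returned from the lowest score to the highest, and several members may share a score. It is an in-memory illustration and not Redis client code. The class name SortedSetSketch and its methods are hypothetical, and ordering members with equal scores by name is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of a sorted set; this is not a Redis client.
class SortedSetSketch {
    // Each member appears once and carries a score.
    private final Map<String, Double> scores = new HashMap<>();

    // Adds a member or updates its score; returns true if the member is new.
    boolean add(String member, double score) {
        return scores.put(member, score) == null;
    }

    // Returns the members from the lowest score to the highest.
    // Members with equal scores are ordered by name in this sketch.
    List<String> range() {
        List<String> members = new ArrayList<>(scores.keySet());
        members.sort(Comparator
                .comparingDouble((String member) -> scores.get(member))
                .thenComparing(Comparator.<String>naturalOrder()));
        return members;
    }
}
```

Adding an existing member again does not create a duplicate. It only replaces the score, which can move the member to a different position.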


Persistence

Redis supports two persistence mechanisms that can be used to store the data set on disk. They are as follows:

  • RDB is the simplest persistence mechanism of Redis. It takes snapshots from the in-memory data sets at configured intervals, and stores the snapshot on disk. When a server is started, it will read the data set back to the memory from the snapshot file. This is the default persistence mechanism of Redis.

    RDB maximizes the performance of your Redis server, and its file format is really compact, which makes it a very useful tool for disaster recovery. Also, if you want to use the master-slave replication, you have to use RDB because the RDB snapshots are used when the data is synchronized between the master and the slaves.

    However, if you have to minimize the chance of data loss in all situations, RDB is not the right solution for you. Because RDB persists the data at configured intervals, you can always lose the data stored in your Redis instance after the last snapshot was saved to disk.

  • Append Only File (AOF) is a persistence model, which logs each operation changing the state of the in-memory data set to a specific log file. When a Redis instance is started, it will reconstruct the data set by executing all operations found from the log file.

    The advantage of the AOF is that it minimizes the chance of data loss in all situations. Also, since the log file is an append log, it cannot be irreversibly corrupted. On the other hand, AOF log files are usually larger than RDB files for the same data, and AOF can be slower than RDB if the server is experiencing a huge write load.
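
The difference between the two mechanisms can be illustrated with a small in-memory model. The sketch below is not Redis code and does not touch the disk: the PersistenceSketch class and its methods are hypothetical, the snapshot is triggered by hand instead of at configured intervals, and only a simple set operation is modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of snapshot-style and log-style persistence; not a Redis client.
class PersistenceSketch {
    private final Map<String, String> memory = new HashMap<>();

    // RDB-style: a full copy of the data set taken at some point in time.
    private Map<String, String> snapshot = new HashMap<>();

    // AOF-style: every write operation appended in the order it happened.
    private final List<String[]> appendLog = new ArrayList<>();

    void set(String key, String value) {
        memory.put(key, value);
        appendLog.add(new String[] {key, value});
    }

    void saveSnapshot() {
        snapshot = new HashMap<>(memory);
    }

    // Restart using only the snapshot: writes made after it are lost.
    Map<String, String> recoverFromSnapshot() {
        return new HashMap<>(snapshot);
    }

    // Restart by replaying the log: every logged write is applied again.
    Map<String, String> recoverFromLog() {
        Map<String, String> rebuilt = new HashMap<>();
        for (String[] operation : appendLog) {
            rebuilt.put(operation[0], operation[1]);
        }
        return rebuilt;
    }
}
```

If a value is written after saveSnapshot is called, recoverFromSnapshot does not contain it while recoverFromLog does. The price is visible too: the log keeps one entry per write, so it grows even when the same key is overwritten repeatedly.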

You can also enable both persistence mechanisms and get the best of both worlds. You can use RDB for creating backups of your data set and still ensure that your data is safe. In this case, Redis will use the AOF log file for building the data set at server startup because it is most likely that it contains the latest data.

If you are using Redis as temporary data storage and do not need persistence, you can disable both persistence mechanisms. This means that the data sets will be destroyed when the server is shut down.


Replication

Redis supports master-slave replication where a single master can have one or multiple slaves. Each slave is an exact copy of its master, and it can connect to both master and other slaves. In other words, a slave can be a master of other slaves. Since Redis 2.6, each slave is read-only by default, and all write operations to a slave are rejected. If we need to store temporary information to a slave, we have to configure that slave to allow write operations.

Replication is non-blocking on both sides. It will not block the queries made to the master even when a slave or slaves are synchronizing their data for the very first time. Slaves can be configured to serve the old data when they are synchronizing their data with the master. However, incoming connections to a slave will be blocked for a short period of time when the old data is replaced with the new data.

If a slave loses connection to the master, it will either continue serving the old data or return an error to the clients, depending on its configuration. When a connection between master and a slave is lost, the slave will automatically reopen the connection and send a synchronization request to the master.

Publish/subscribe messaging pattern

The publish/subscribe messaging pattern is a messaging pattern where the message sender (publisher) does not send messages directly to the receiver (subscriber). Instead, an additional element called a channel is used to transport messages from the publisher to the subscriber. Publishers can send a message to one or more channels. Subscribers can select the interesting channels and receive messages sent to these channels by subscribing to those channels.

Let's think of a situation where a single publisher is publishing messages to two channels, Channel 1 and Channel 2. Channel 1 has two subscribers: Subscriber 1 and Subscriber 2. Channel 2 also has two subscribers: Subscriber 2 and Subscriber 3. This situation is illustrated in the following figure:

The publish/subscribe pattern ensures that the publishers are not aware of the subscribers and vice versa. This gives us the possibility to divide our application into smaller modules, which have loose coupling between them. This makes the modules easier to maintain and replace if needed.

However, the greatest advantage of the publish/subscribe pattern is also its greatest weakness. Firstly, our application cannot rely on the fact that a specific component has subscribed to a specific channel. Secondly, there is no clean way for us to verify if this is the case. In fact, our application cannot assume that anyone is listening.
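
The pattern and its weakness can be sketched with a small in-memory broker. This is a plain-Java illustration, not Redis client code: the ChannelBroker class, its method names, and the decision to return the number of receivers from publish are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy in-memory broker illustrating the pattern; this is not a Redis client.
class ChannelBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // Registers a subscriber for the named channel.
    void subscribe(String channel, Consumer<String> subscriber) {
        subscribers.computeIfAbsent(channel, name -> new ArrayList<>()).add(subscriber);
    }

    // Delivers the message to the current subscribers of the channel and
    // returns how many received it. Zero means that nobody was listening.
    int publish(String channel, String message) {
        List<Consumer<String>> receivers =
                subscribers.getOrDefault(channel, new ArrayList<>());
        for (Consumer<String> receiver : receivers) {
            receiver.accept(message);
        }
        return receivers.size();
    }
}
```

The publisher only knows the channel name. It never holds a reference to a subscriber, which is the loose coupling described above. A message published to a channel without subscribers is simply dropped in this sketch, which is why the publisher cannot assume that anyone is listening.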

Redis offers solid support for the publish/subscribe pattern. The main features of its publish/subscribe implementation are:

  • Publishers can publish messages to one or more channels at the same time

  • Subscribers can subscribe to the interesting channels by using the name of the channel or a pattern containing a wildcard

  • Unsubscribing from channels also supports both name and pattern matching
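
Subscribing with a wildcard pattern can be pictured as matching channel names against a glob. The sketch below assumes glob-style wildcards where * matches any run of characters and ? matches exactly one character, and it handles only these two. The ChannelPatternSketch class is hypothetical and only models the matching step. It is not how a Redis client is used.

```java
import java.util.regex.Pattern;

// Toy matcher for channel name patterns; this is not a Redis client.
class ChannelPatternSketch {
    // Translates a pattern with * and ? wildcards to a regular expression.
    static Pattern compile(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Every other character must match literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Returns true if the channel name matches the whole pattern.
    static boolean matches(String glob, String channel) {
        return compile(glob).matcher(channel).matches();
    }
}
```

With this sketch, a subscriber interested in every channel under a common prefix could use a single pattern such as news.* instead of listing each channel name.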



Summary

In this chapter, we have learned that:

  • Java Persistence API was introduced to address the concerns related to EJB 2.x and to provide a standard approach for object-relational mapping. Its features were selected from the features of the most popular third party persistence frameworks.

  • Redis is an in-memory data store, which keeps its entire data set in memory, supports complex data types, can use disk as a persistent storage, and supports master-slave replication. It also has an implementation of the publish/subscribe messaging pattern.

In the next chapter we will learn how we can set up a web application project that uses Spring Data JPA and use it to implement a simple contact manager application.

About the Author

  • Petri Kainulainen

    Petri Kainulainen is a software developer living in Tampere, Finland. He specializes in application development with the Java programming language and the Spring framework. Petri has over 10 years of experience in software development, and during his career he has participated in the development projects of Finland's leading online marketplaces as a software architect. He is currently working at Vincit Oy as a passionate software developer.
