What do you get with Print?

Instant access to your digital copy whilst your Print order is Shipped

Paperback book shipped to your preferred address

Redeem a companion digital copy on all Print orders

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

Neo4j Graph Data Modelling

Chapter 1. Graphs Are Everywhere

Graphs are all around us. Each time we access the Internet, the data packets travel across a network of routers, switches, and cables and deliver what we have requested. While representing key concepts/objects in a problem and defining relationships or interactions between the concepts/objects involved, we generally draw bubbles or boxes to denote the objects, and arrows between those objects to represent the interactions or relationships. We use a similar notation while drawing a map to explain routes to others. The beauty of these notations, such as bubbles and arrows, is their expressiveness, a property that is usually lost when we obfuscate the model into records and tables. Graphs allow us to discover information and ease the modeling pain, which eventually makes our life smoother. To be able to use graphs better, we will need to understand a few basic concepts related to a graph database. In this chapter, we will explore the following:

Graphs in mathematics
The property graph model
Reasons for using a graph database
Usage of graphs—some obvious and some not-so-obvious graph problems
Advantages of using Neo4j

We chose Neo4j to explain graph data modeling in this book. However, the modeling concepts discussed here will apply to any graph database.

A few readers might be experienced Neo4j users and if you fall into this category, you might want to skip this chapter. However, if you are new to Neo4j or want a brief refresher, please carry on.

The property graph model

In Neo4j, we use a property graph model to represent information. The property graph model is an extension of the graphs from mathematics. The following figure gives an example of how data from Figure 1.1 can be represented in Neo4j:

Figure 1.2: Nodes, relationships and properties

The preceding figure introduces the following concepts that we use to model a property graph:

Nodes: Entities are modelled as nodes. In Figure 1.2, London, Bob, Alice are all entities.
Labels: These are used to represent the role of the node in our domain. A node can have multiple labels at the same time. Apart from adding more meaning to nodes, labels are also used to add constraints and indices that are local to the particular label. In the preceding figure, :Person and :Location are the two labels that we used. We can add an index or constraint on name for each of these labels, which will result in two separate indices—one for :Location and the other for :Person.
Relationships: These depict directed, semantically relevant connections between two nodes. A relationship in Neo4j will always have a start node, an end node, and a single type. While relationships need to be created with a direction, we can ignore the direction while traversing them. :LIVES_IN and :IS_MARRIED_TO in Figure 1.2 are relationship types.
Properties: These are key-value pairs that contain information about the node or relationship. In the previous figure, name and since are both properties that divulge more information about the node or relationship they are associated with. Neo4j can accept any Java Virtual Machine (JVM) type as a property, including but not limited to, date, string, double, and arrays.

This property graph model allows us to model data as close to the real world as possible.

The resultant model is simpler and more expressive. It also explicitly calls out relationships. In contrast to an RDBMS, which uses foreign keys to imply relationships, having them explicitly defined allows us to retrieve data by traversing relationships to find the information we need. This is a deliberate, practical algorithmic approach that uses the connectedness of data, rather than relying on some index lookups or joins to find the related data. Explicit relationships also make the property graph model a natural fit for most problem domains, as they are interconnected.

Storage – native graph storage versus non-native graph storage

As with all database management systems, graph databases have the concept of storage and query engines, which deal with persistence and queries over connected data. The query engine of the database is responsible for running the queries and retrieving or modifying data. The query engine exposes the graph data model through Create, Read, Update, and Delete operations (commonly referred to as CRUD). Storage deals with how the data is stored physically and how it is represented logically when retrieved. Its knowledge can help in choosing a graph database.

Relationships are an important part of any domain model and need to be traversed frequently. In a graph database, the relationships are explicit rather than inferred. Making relationships explicit is achieved either via the query engine working on a non-native graph storage (such as RDBMS, column stores, document stores) or using a native graph storage.

In a graph database relying on non-native graph storage, relationships need to be inferred at runtime. For example, if we want to model a graph in an RDBMS, our processing engine will have to infer the relationships using foreign keys and reify the relationships at runtime. This problem is computationally expensive and is infeasible for traversing multiple relationships because of the recursive joins involved. There are other graph databases in which NoSQL stores such as HDFS, column stores such as Cassandra, or documents are used to store data and expose a Graph API. Though there are no joins in a graph database using NoSQL stores, the database still has to use index lookups. In cases where non-native storage is used, the query engines have to make more computational effort.

Neo4j uses a native graph storage. Each node has a handle to all the outgoing relationships it has and each relationship, in turn, knows its terminal nodes. At runtime, to find neighboring nodes, Neo4j doesn't have to do an index lookup. Instead, neighboring nodes can be identified by looking at the relationships of the current node. This feature is called index-free adjacency. Index-free adjacency is mechanically sympathetic and allows the Neo4j query engine to have a significant performance boost while traversing the graph.

Reasons to use graph databases

Every morning when we check our Facebook feed, we are welcomed by a stream of updates from friends and news. Using information about how data is connected and matching it with our individual preferences, Facebook builds a stream of activities from our network that are relevant and interest us. LinkedIn does something similar while suggesting jobs within our network. When we fire up Google Maps or some application such as TomTom or Sygic maps and start navigating to a destination, we use the data that represents connections of various intersections within the city, and work out how best to traverse it. While shopping online, products are recommended to us based on how closely they are connected to what we have already bought or similar products that others have bought. We leverage connected data more and more every day without realizing it.

When dealing with connected data, a graph database gives us the following advantages:

The query performance of a graph database is a few orders of magnitude better than RDBMS or other NoSQL alternatives. As the dataset grows, RDBMS join performance deteriorates because of the ever-increasing size of the join tables. On the other hand, graph traversals are localized to a portion of the graph. So query execution time is proportional to the number of nodes visited, rather than being proportional to the overall amount of data stored. This makes the query performance fairly constant over time even though the data might increase exponentially.
Flexibility and agility are major considerations in today's world where business needs are constantly evolving. Developers need to have a tool that allows them to incrementally think of the model rather than locking down the data model before they start coding. Graph databases allow for addition of relationships, node types, and properties without making any changes to the existing queries. We can connect the model incrementally, thereby allowing for more sophisticated querying. This flexibility also means fewer migrations. Even in case of changes to the data model, migrations are relatively pain free and can be done without taking the database offline for a long time, thus helping teams deliver software faster while concentrating on the domain rather than managing infrastructure and communication.
Lesser ambiguity leads to better models. Since graph databases are schema-less, the schema is dictated by the application and hence is better validated. It allows for better design thinking by developers since there is no ambiguity of the domain model compared to how it is stored in tables.
The design to delivery time is reduced. From a developer's standpoint, one of the best features of a graph database is that it is whiteboard friendly. We can make a data model on a whiteboard and not worry about trying to translate it to a set of tables, which don't necessarily represent the data model as is. This allows the developers to concentrate on development rather than translation, thereby saving time.

While all that has been said might seem like jargon, it boils down to economics. Graph databases make more economic sense when the data is highly connected.

Choosing Neo4j for exploring graph databases

Neo4j is a fast, native graph database that satisfies Atomicity, Consistency, Isolation, Durability (ACID) properties. Through usage of transactions, developers can ensure that the failure of a transaction leaves the database's state unchanged ensuring atomicity. Any change to the database doesn't destroy data, ensuring consistency. Data modified by a transaction is isolated from other transactions till it is committed. Since Neo4j is a persistent graph database, the results of a committed transaction can always be retrieved, thus making it durable.

It started off supporting the TinkerPop stack. More information about the TinkerPop stack can be found at http://www.tinkerpop.com.

Neo4j provides numerous modeling and technical affordances, which are valuable when building real-world systems such as:

Neo4j is the most mature graph database and has been in production round the clock since 2003. Neo4j is open source with an enormous community. The Neo4j development team is highly engaged with that community so that the features and bugs are rapidly addressed. Neo4j provides native graph storage that enables its engine to perform native graph processing. From the query language to disks, everything is mechanically sympathetic to the transactional storage and rapid retrieval of graph data.
Cypher is a very expressive query language used to retrieve data from Neo4j. While it is superficially similar to SQL in some respect, Cypher is the only declarative query language that is built ground-up for humane yet performant graph queries and writes. The Neo4j Java API can be used on JVM-based languages as a more imperative and performant method of querying. This gives the best of both worlds by supporting imperative and declarative querying. (Neo4j plans to move away from supporting Gremlin in the long run, and currently Gremlin is supported through a plugin). Neo4j is open source and allows plugins to enhance or add functionalities, and there is a vibrant ecosystem of tooling around the core database.
Any Cypher statement that updates the graph is run within a transaction. If a transaction exists, the newly fired Cypher query will be run in it. If no transaction exists, then the statement will itself be transactional.
The community being fostered is incredible. This is also partly made possible by the project being open source. Neo4j is currently being used in production by companies such as UBS, Cisco, Walmart, eBay, Telenor, HP, Pitney Bowes, Accenture, Lockheed Martin, Glassdoor, and many others.

What you will learn

Translate a problem domain from a whiteboard to your database

Make design decisions based on the nature of data and how it is going to be used

Use Cypher to create and query data

Evolve your database in stages

Optimize the performance of your application with data design

Design paradigms to ensure flexibility, ease of querying, and performance

Move from an existing model to a new model without losing consistency

What do you get with Print?

Instant access to your digital copy whilst your Print order is Shipped

Paperback book shipped to your preferred address

Redeem a companion digital copy on all Print orders

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

Frequently bought together

$48.99

$32.99

$29.99

Total $ 111.97

G. A. Patino Aug 17, 2015

If you are looking for a book to give you a quick overview of the capabilities of graph databases, or how to think in terms of developing a graph database this is a good book to start. The explanations on how to decide whether something should be a node or a property are clear, and the example of a flight and hotel booking application make them very relatable to a wide array of users. The way business considerations are integrated in the modelling decisions will be great guidance for somebody getting into this field.It is also good if you want to get the basics of Neo4j very quickly, as the book is only 109 pages long. However, I think that only people with some programming experience and that are comfortable deducing language syntax rules from example code will be able to do it. This is because the authors rely too much on the declarative nature of Neo4j/Cypher and don't go into any details about syntax rules.The book could have benefited from an initial chapter explaining the Neo4j ecosystem better, as is not clear how to switch from the shell to the browser, or in what conditions to use each. Also, as mentioned above, an explanation of the language syntax would have greatly improved he accessibility of the text. There is no mention either of saving and loading databases either in the shell or the browser. Finally, expanding the last chapter to explain the main limitations of Neo4j, as opposed to graph databases, and what other open-source packages could be used to complement the language would have been very useful.

Amazon Verified review

Neo4j Graph Data Modelling: Design efficient and flexible databases by optimizing the power of Neo4j

What do you get with Print?

Neo4j Graph Data Modelling

Chapter 1. Graphs Are Everywhere

Graphs in mathematics

The property graph model

Storage – native graph storage versus non-native graph storage

Reasons to use graph databases

What to use a graph database for

Choosing Neo4j for exploring graph databases

The structure of the book

Summary

Page 1 of 9

Description

Who is this book for?

What you will learn

Product Details

What do you get with Print?

Product Details

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

People who bought this also bought

About the author

FAQs

Neo4j Graph Data Modelling: Design efficient and flexible databases by optimizing the power of Neo4j

What do you get with Print?

Contact Details

Shipping Address

Billing Address

Description

Who is this book for?

What you will learn

Product Details

What do you get with Print?

Contact Details

Shipping Address

Billing Address

Product Details

Packt Subscriptions

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

People who bought this also bought

About the author

FAQs

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access