Neo4j Graph Data Modelling

Neo4j Graph Data Modelling: Design efficient and flexible databases by optimizing the power of Neo4j

Book Jul 2015 138 pages 1st Edition
What do you get with eBook?

  • Instant access to your digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM free: read whenever, wherever, and however you want

Product Details


Publication date : Jul 27, 2015
Length 138 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781784393441


Chapter 1. Graphs Are Everywhere

Graphs are all around us. Each time we access the Internet, data packets travel across a network of routers, switches, and cables to deliver what we requested. When we represent the key concepts or objects in a problem and define the relationships or interactions between them, we generally draw bubbles or boxes for the objects and arrows between them for the interactions or relationships. We use a similar notation when drawing a map to explain a route to others. The beauty of notations such as bubbles and arrows is their expressiveness, a property that is usually lost when we force the model into records and tables. Graphs allow us to discover information and ease the pain of modeling. To use graphs well, we need to understand a few basic concepts related to graph databases. In this chapter, we will explore the following:

  • Graphs in mathematics

  • The property graph model

  • Reasons for using a graph database

  • Usage of graphs—some obvious and some not-so-obvious graph problems

  • Advantages of using Neo4j

We chose Neo4j to explain graph data modeling in this book. However, the modeling concepts discussed here will apply to any graph database.

A few readers might be experienced Neo4j users and if you fall into this category, you might want to skip this chapter. However, if you are new to Neo4j or want a brief refresher, please carry on.

Graphs in mathematics


A graph is a mathematical structure in which some pairs of objects are connected by links. The objects are represented by abstractions called nodes (also known as vertices), and the links between them by relationships (also known as edges). A relationship is directed when it makes semantic sense in only one direction. When the semantics hold in both directions, we can safely use an undirected relationship.

Figure 1.1: Edges, vertices, directionality

In Figure 1.1, we have three actors or entities, Alice, Bob, and London, which are represented as nodes. The links between them are denoted by relationships. Alice is married to Bob, and Bob is married to Alice. Both statements are true, so we represent Is Married To as an undirected relationship. However, the fact that Alice lives in London is represented by a directed relationship, Lives In, from Alice to London, because "London lives in Alice" cannot be true.
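
The distinction between directed and undirected relationships can be sketched in a few lines of Python. This is an illustrative model only, not how any graph database stores data; the `Graph` class and its method names are assumptions made for this example.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory graph; all names here are hypothetical."""

    def __init__(self):
        # Maps (source, relationship name) -> set of target nodes.
        self._edges = defaultdict(set)

    def add_directed(self, source, name, target):
        # Holds in one direction only: source -> target.
        self._edges[(source, name)].add(target)

    def add_undirected(self, a, name, b):
        # Holds in both directions, so record it both ways.
        self.add_directed(a, name, b)
        self.add_directed(b, name, a)

    def related(self, source, name, target):
        return target in self._edges.get((source, name), set())


graph = Graph()
graph.add_undirected("Alice", "Is Married To", "Bob")
graph.add_directed("Alice", "Lives In", "London")

print(graph.related("Bob", "Is Married To", "Alice"))  # True
print(graph.related("Alice", "Lives In", "London"))    # True
print(graph.related("London", "Lives In", "Alice"))    # False
```

Storing an undirected relationship as two directed entries is one simple choice among several; it keeps the lookup logic identical for both kinds.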

The property graph model


In Neo4j, we use a property graph model to represent information. The property graph model is an extension of the graphs from mathematics. The following figure gives an example of how data from Figure 1.1 can be represented in Neo4j:

Figure 1.2: Nodes, relationships and properties

The preceding figure introduces the following concepts that we use to model a property graph:

  • Nodes: Entities are modeled as nodes. In Figure 1.2, London, Bob, and Alice are all entities.

  • Labels: These are used to represent the role of the node in our domain. A node can have multiple labels at the same time. Apart from adding more meaning to nodes, labels are also used to add constraints and indices that are local to the particular label. In the preceding figure, :Person and :Location are the two labels that we used. We can add an index or constraint on name for each of these labels, which will result in two separate indices—one for :Location and the other for :Person.

  • Relationships: These depict directed, semantically relevant connections between two nodes. A relationship in Neo4j will always have a start node, an end node, and a single type. While relationships need to be created with a direction, we can ignore the direction while traversing them. :LIVES_IN and :IS_MARRIED_TO in Figure 1.2 are relationship types.

  • Properties: These are key-value pairs that contain information about the node or relationship. In the previous figure, name and since are both properties that give more information about the node or relationship they are associated with. In Neo4j 2.x, property values can be Java primitive types (such as boolean, long, and double), strings, or arrays of these.

This property graph model allows us to model data as close to the real world as possible.
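
As a rough illustration of the four concepts above, the following Python sketch models nodes, labels, relationships, and properties as plain data classes. It is not the Neo4j API: the class names, the `neighbours` helper, the direction chosen for `IS_MARRIED_TO`, and the `since` value are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                      # e.g. {"Person"}; a node may carry several
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                      # every relationship has a start node,
    end: Node                        # an end node,
    type: str                        # and exactly one type
    properties: dict = field(default_factory=dict)


alice = Node({"Person"}, {"name": "Alice"})
bob = Node({"Person"}, {"name": "Bob"})
london = Node({"Location"}, {"name": "London"})

relationships = [
    # The "since" value is invented; it is not taken from the figure.
    Relationship(alice, bob, "IS_MARRIED_TO", {"since": 2005}),
    Relationship(alice, london, "LIVES_IN"),
]


def neighbours(node, rel_type):
    """Follow relationships of one type, ignoring their direction."""
    found = []
    for rel in relationships:
        if rel.type != rel_type:
            continue
        if rel.start is node:
            found.append(rel.end)
        elif rel.end is node:
            found.append(rel.start)
    return found


print([n.properties["name"] for n in neighbours(bob, "IS_MARRIED_TO")])  # ['Alice']
```

Although the marriage is stored with a direction (from Alice to Bob), the traversal from Bob still finds Alice, which mirrors the point that direction is required at creation but can be ignored when traversing.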

The resultant model is simpler and more expressive. It also explicitly calls out relationships. In contrast to an RDBMS, which uses foreign keys to imply relationships, a property graph defines them explicitly, so we can retrieve data by traversing relationships to find the information we need. This is a deliberate, practical algorithmic approach that exploits the connectedness of the data rather than relying on index lookups or joins to find related data. Explicit relationships also make the property graph model a natural fit for most problem domains, since most domains are themselves interconnected.

Storage – native graph storage versus non-native graph storage


As with all database management systems, graph databases have the concept of storage and query engines, which deal with persistence and queries over connected data. The query engine is responsible for running queries and retrieving or modifying data. It exposes the graph data model through Create, Read, Update, and Delete operations (commonly referred to as CRUD). Storage deals with how the data is stored physically and how it is represented logically when retrieved. Understanding how a database handles storage helps when choosing a graph database.

Relationships are an important part of any domain model and need to be traversed frequently. In a graph database, the relationships are explicit rather than inferred. Making relationships explicit is achieved either via the query engine working on a non-native graph storage (such as RDBMS, column stores, document stores) or using a native graph storage.

In a graph database that relies on non-native graph storage, relationships need to be inferred at runtime. For example, if we model a graph in an RDBMS, the processing engine has to infer the relationships from foreign keys and reify them at runtime. This process is computationally expensive and becomes infeasible when traversing multiple relationships because of the recursive joins involved. Other graph databases store their data in general-purpose stores such as HDFS, column stores such as Cassandra, or document stores, and expose a graph API on top. Though there are no joins in a graph database built on a NoSQL store, the database still has to use index lookups. Whenever non-native storage is used, the query engine has to expend more computational effort.

Neo4j uses a native graph storage. Each node has a handle to all the outgoing relationships it has and each relationship, in turn, knows its terminal nodes. At runtime, to find neighboring nodes, Neo4j doesn't have to do an index lookup. Instead, neighboring nodes can be identified by looking at the relationships of the current node. This feature is called index-free adjacency. Index-free adjacency is mechanically sympathetic and allows the Neo4j query engine to have a significant performance boost while traversing the graph.
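
The idea behind index-free adjacency can be sketched as follows. This is a simplified in-memory illustration, not Neo4j's actual storage format; the classes and the two `neighbours_*` functions are hypothetical. For brevity, each node here keeps references to both its incoming and outgoing relationships.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.relationships = []      # direct references, no lookup table needed


class Relationship:
    def __init__(self, start, end, rel_type):
        self.start, self.end, self.type = start, end, rel_type
        start.relationships.append(self)
        end.relationships.append(self)

    def other(self, node):
        # A relationship knows both of its terminal nodes.
        return self.end if node is self.start else self.start


def neighbours_native(node):
    # Cost depends only on how many relationships this node has.
    return [rel.other(node).name for rel in node.relationships]


def neighbours_via_table(name, rows):
    # Non-native style: scan (or index into) a global table of key pairs.
    return [b if a == name else a for a, b in rows if name in (a, b)]


alice, bob, london = Node("Alice"), Node("Bob"), Node("London")
Relationship(alice, bob, "IS_MARRIED_TO")
Relationship(alice, london, "LIVES_IN")

rows = [("Alice", "Bob"), ("Alice", "London")]
print(neighbours_native(alice))             # ['Bob', 'London']
print(neighbours_via_table("Alice", rows))  # ['Bob', 'London']
```

Both functions return the same answer, but the table-based version has to consult a structure whose size grows with the whole dataset, whereas the native version only follows references held by the current node.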

Reasons to use graph databases


Every morning when we check our Facebook feed, we are welcomed by a stream of updates from friends and news. Using information about how data is connected and matching it with our individual preferences, Facebook builds a stream of activities from our network that are relevant and interest us. LinkedIn does something similar while suggesting jobs within our network. When we fire up Google Maps or some application such as TomTom or Sygic maps and start navigating to a destination, we use the data that represents connections of various intersections within the city, and work out how best to traverse it. While shopping online, products are recommended to us based on how closely they are connected to what we have already bought or similar products that others have bought. We leverage connected data more and more every day without realizing it.

When dealing with connected data, a graph database gives us the following advantages:

  • For queries over connected data, the performance of a graph database can be orders of magnitude better than that of an RDBMS or other NoSQL alternatives. As the dataset grows, RDBMS join performance deteriorates because of the ever-increasing size of the tables being joined. Graph traversals, on the other hand, are localized to a portion of the graph, so query execution time is proportional to the number of nodes visited rather than to the overall amount of data stored. This keeps query performance fairly constant over time, even as the dataset grows.

  • Flexibility and agility are major considerations in today's world where business needs are constantly evolving. Developers need to have a tool that allows them to incrementally think of the model rather than locking down the data model before they start coding. Graph databases allow for addition of relationships, node types, and properties without making any changes to the existing queries. We can connect the model incrementally, thereby allowing for more sophisticated querying. This flexibility also means fewer migrations. Even in case of changes to the data model, migrations are relatively pain free and can be done without taking the database offline for a long time, thus helping teams deliver software faster while concentrating on the domain rather than managing infrastructure and communication.

  • Less ambiguity leads to better models. Since graph databases are schema-less, the schema is dictated by the application and hence is better validated. This encourages better design thinking by developers, since there is no gap between the domain model and the way it is stored, as there is with tables.

  • The design to delivery time is reduced. From a developer's standpoint, one of the best features of a graph database is that it is whiteboard friendly. We can make a data model on a whiteboard and not worry about trying to translate it to a set of tables, which don't necessarily represent the data model as is. This allows the developers to concentrate on development rather than translation, thereby saving time.
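
The claim in the first point, that traversal cost tracks the number of nodes visited rather than the total size of the dataset, can be illustrated with a small breadth-first traversal. The graph contents and the function names below are made up for the example, and the sketch counts visited nodes rather than measuring real query times.

```python
from collections import deque


def visit_within(adjacency, start, max_hops):
    """Breadth-first traversal; returns the nodes reached within max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen


def build_graph(unrelated_nodes):
    # A small friend cluster plus any number of nodes unconnected to it.
    adjacency = {
        "Alice": ["Bob", "Carol"],
        "Bob": ["Alice", "Dave"],
        "Carol": ["Alice"],
        "Dave": ["Bob"],
    }
    for i in range(unrelated_nodes):
        adjacency[f"stranger-{i}"] = []
    return adjacency


small = build_graph(unrelated_nodes=10)
large = build_graph(unrelated_nodes=100_000)

# The traversal touches the same four nodes regardless of total graph size.
print(len(visit_within(small, "Alice", 2)))  # 4
print(len(visit_within(large, "Alice", 2)))  # 4
```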

While all that has been said might seem like jargon, it boils down to economics. Graph databases make more economic sense when the data is highly connected.

What to use a graph database for


Let's start by citing a few problem statements that are more suited to graph databases.

Routing is a graph problem and much research has been done in that respect. One of the leading delivery services in the world uses a Neo4j-based solution to route packages in real time based on information being collected worldwide.
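
To see why routing maps naturally onto a graph, consider this minimal sketch of Dijkstra's shortest-path algorithm in Python. It is a generic textbook technique, not the solution used by the delivery service mentioned above; the depot names and costs are invented for the example.

```python
import heapq


def cheapest_route(roads, origin, destination):
    """Dijkstra's algorithm over a dict of {node: {neighbour: cost}}."""
    queue = [(0, origin, [origin])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, step in roads.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + step, neighbour, path + [neighbour]))
    return None  # no route exists


# Hypothetical depots and travel costs, invented for this example.
roads = {
    "Depot": {"A": 4, "B": 1},
    "A": {"Customer": 1},
    "B": {"A": 2, "Customer": 6},
}

print(cheapest_route(roads, "Depot", "Customer"))  # (4, ['Depot', 'B', 'A', 'Customer'])
```

The cheapest route goes through two intermediate stops rather than taking either direct-looking option, which is the kind of answer that falls out of traversing connections rather than comparing rows.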

Social networks are problems suited for graphs since they leverage the connections of users to fetch data and decide on what is accessible and what isn't. Facebook, in particular, uses its graph search and has exposed it to the users to enable them to make better searches. Facebook relies heavily on the graph of people and their friends to curate the feed.

Recommendation is again a graph problem that can be solved using graph databases. While companies such as eBay originally relied on MySQL, they eventually turned to Neo4j.

While routing, social networks, and recommendations are all obvious graph problems, companies have recently solved a host of other problems by fitting their data into graphs.

Search, for example, does not intuitively come across as a graph problem. However, Google uses its Knowledge Graph to give you search results based on how well connected a piece of content is to the term being searched. More recently, Facebook has leveraged its social graph to improve search.

Medical research is another domain where graphs are being used. Medical data is highly interconnected and hence can benefit greatly from the use of graph databases. Companies are now using graph databases for drug discovery and storing medical information.

Storage of ontologies is increasingly being solved using graph databases, which are rapidly finding applications in machine learning and analytics. Companies are also using graph databases in domains such as energy supply and transportation.

Choosing Neo4j for exploring graph databases


Neo4j is a fast, native graph database that satisfies the Atomicity, Consistency, Isolation, and Durability (ACID) properties. Through the use of transactions, developers can ensure that a failed transaction leaves the database's state unchanged, ensuring atomicity. No change to the database destroys or corrupts existing data, ensuring consistency. Data modified by a transaction is isolated from other transactions until it is committed. Since Neo4j is a persistent graph database, the results of a committed transaction can always be retrieved, thus making it durable.
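
Atomicity, the first of these properties, can be illustrated with a toy in-memory store. The `TinyStore` class below is entirely hypothetical and has nothing to do with Neo4j's transaction implementation; it only shows the all-or-nothing behavior described above and does not attempt isolation or durability.

```python
import copy
from contextlib import contextmanager


class TinyStore:
    """Hypothetical in-memory store used only to illustrate atomicity."""

    def __init__(self):
        self.nodes = {}

    @contextmanager
    def transaction(self):
        snapshot = copy.deepcopy(self.nodes)   # state before the transaction
        try:
            yield self.nodes
        except Exception:
            self.nodes = snapshot              # failure: restore the old state
            raise


store = TinyStore()

with store.transaction() as nodes:
    nodes["alice"] = {"name": "Alice"}         # kept when the block exits cleanly

try:
    with store.transaction() as nodes:
        nodes["bob"] = {"name": "Bob"}
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass

print(sorted(store.nodes))  # ['alice']
```

The write made inside the failed transaction is discarded, so the store looks exactly as it did before that transaction began.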

Neo4j started off supporting the TinkerPop stack. More information about the TinkerPop stack can be found at http://www.tinkerpop.com.

Neo4j provides numerous modeling and technical affordances that are valuable when building real-world systems:

  • Neo4j is the most mature graph database and has been in production round the clock since 2003. Neo4j is open source with an enormous community. The Neo4j development team is highly engaged with that community so that the features and bugs are rapidly addressed. Neo4j provides native graph storage that enables its engine to perform native graph processing. From the query language to disks, everything is mechanically sympathetic to the transactional storage and rapid retrieval of graph data.

  • Cypher is a very expressive query language used to retrieve data from Neo4j. While it is superficially similar to SQL in some respects, Cypher is the only declarative query language built from the ground up for humane yet performant graph queries and writes. The Neo4j Java API can be used from JVM-based languages as a more imperative and performant method of querying. Supporting both imperative and declarative querying gives us the best of both worlds. (Neo4j plans to move away from supporting Gremlin in the long run; currently, Gremlin is supported through a plugin.) Neo4j is open source and allows plugins to enhance or add functionality, and there is a vibrant ecosystem of tooling around the core database.

  • Any Cypher statement that updates the graph is run within a transaction. If a transaction exists, the newly fired Cypher query will be run in it. If no transaction exists, then the statement will itself be transactional.

  • The community being fostered is incredible. This is also partly made possible by the project being open source. Neo4j is currently being used in production by companies such as UBS, Cisco, Walmart, eBay, Telenor, HP, Pitney Bowes, Accenture, Lockheed Martin, Glassdoor, and many others.

The structure of the book


This book is divided into two sections:

  • Section 1 (Chapter 2, Modeling Flights and Cities, to Chapter 5, Refactoring the Data Model) is essential for understanding the graph modeling concepts that you will use in your daily routine. We cover how to model a graph, how to query it, how to evolve a graph database to accommodate changes in the domain, and how to translate an RDBMS data model into a graph design.

  • Section 2 (Chapter 6, Modeling Communication Chains, to Chapter 8, Recommendations and Analysis of Historical Data) is more reference oriented, with models that you might need for optimization or for specialized cases. The topics covered are modeling chains and their advantages, modeling access control, and designing recommendation systems based on the data present.

Summary


In this chapter, we saw that graph databases represent data as nodes, relationships, and properties. Relationships explicitly specify and qualify the connection between two entities; labels add semantic meaning to nodes and allow indices and constraints to be added; and properties add more information to nodes and relationships. We also looked at a few use cases in which graphs are currently used.

From the next chapter onward, we will delve into designing a data model and use actual Cypher queries to feed it into Neo4j. The queries used in this book are compatible with Neo4j 2.2.3. They have also been tested with Neo4j 2.3.0-M02.


What you will learn

  • Translate a problem domain from a whiteboard to your database
  • Make design decisions based on the nature of data and how it is going to be used
  • Use Cypher to create and query data
  • Evolve your database in stages
  • Optimize the performance of your application with data design
  • Design paradigms to ensure flexibility, ease of querying, and performance
  • Move from an existing model to a new model without losing consistency


Table of Contents

16 Chapters
Neo4j Graph Data Modeling
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Graphs Are Everywhere
Modeling Flights and Cities
Formulating an Itinerary
Modeling Bookings and Users
Refactoring the Data Model
Modeling Communication Chains
Modeling Access Control
Recommendations and Analysis of Historical Data
Wrapping Up
Index

