Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Neo4j Graph Data Modelling
Neo4j Graph Data Modelling

Neo4j Graph Data Modelling: Design efficient and flexible databases by optimizing the power of Neo4j

$15.99 per month
Book Jul 2015 138 pages 1st Edition
eBook
$25.99 $17.99
Print
$32.99
Subscription
$15.99 Monthly
eBook
$25.99 $17.99
Print
$32.99
Subscription
$15.99 Monthly

What do you get with a Packt Subscription?

Free for first 7 days. $15.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details


Publication date : Jul 27, 2015
Length 138 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781784393441
Category :
Concepts :
Table of content icon View table of contents Preview book icon Preview Book

Neo4j Graph Data Modelling

Chapter 1. Graphs Are Everywhere

Graphs are all around us. Each time we access the Internet, the data packets travel across a network of routers, switches, and cables and deliver what we have requested. While representing key concepts/objects in a problem and defining relationships or interactions between the concepts/objects involved, we generally draw bubbles or boxes to denote the objects, and arrows between those objects to represent the interactions or relationships. We use a similar notation while drawing a map to explain routes to others. The beauty of these notations, such as bubbles and arrows, is their expressiveness, a property that is usually lost when we obfuscate the model into records and tables. Graphs allow us to discover information and ease the modeling pain, which eventually makes our life smoother. To be able to use graphs better, we will need to understand a few basic concepts related to a graph database. In this chapter, we will explore the following:

  • Graphs in mathematics

  • The property graph model

  • Reasons for using a graph database

  • Usage of graphs—some obvious and some not-so-obvious graph problems

  • Advantages of using Neo4j

We chose Neo4j to explain graph data modeling in this book. However, the modeling concepts discussed here will apply to any graph database.

A few readers might be experienced Neo4j users and if you fall into this category, you might want to skip this chapter. However, if you are new to Neo4j or want a brief refresher, please carry on.

Graphs in mathematics


A graph is a mathematical structure of objects in which some pairs of objects are connected by links. The objects are denoted by abstractions called nodes (also known as vertices) and their links are represented by relationships (also known as edges). The relationships might be directed where it makes semantic sense in one particular direction. In cases where the semantics work in both directions, we can safely use undirected relationships to denote the link.

Figure 1.1: Edges, vertices, directionality

In Figure 1.1, we have three actors or entities, Alice, Bob, and London, which are represented as nodes. The links between them are denoted by relationships. Alice is married to Bob and Bob is married to Alice. Both true, hence we represent Is Married To as an undirected relationship. However, Alice lives in London is represented by a directed relationship, Lives In, from Alice to London. This is because London lives in Alice cannot be true.

The property graph model


In Neo4j, we use a property graph model to represent information. The property graph model is an extension of the graphs from mathematics. The following figure gives an example of how data from Figure 1.1 can be represented in Neo4j:

Figure 1.2: Nodes, relationships and properties

The preceding figure introduces the following concepts that we use to model a property graph:

  • Nodes: Entities are modelled as nodes. In Figure 1.2, London, Bob, Alice are all entities.

  • Labels: These are used to represent the role of the node in our domain. A node can have multiple labels at the same time. Apart from adding more meaning to nodes, labels are also used to add constraints and indices that are local to the particular label. In the preceding figure, :Person and :Location are the two labels that we used. We can add an index or constraint on name for each of these labels, which will result in two separate indices—one for :Location and the other for :Person.

  • Relationships: These depict directed, semantically relevant connections between two nodes. A relationship in Neo4j will always have a start node, an end node, and a single type. While relationships need to be created with a direction, we can ignore the direction while traversing them. :LIVES_IN and :IS_MARRIED_TO in Figure 1.2 are relationship types.

  • Properties: These are key-value pairs that contain information about the node or relationship. In the previous figure, name and since are both properties that divulge more information about the node or relationship they are associated with. Neo4j can accept any Java Virtual Machine (JVM) type as a property, including but not limited to, date, string, double, and arrays.

This property graph model allows us to model data as close to the real world as possible.

The resultant model is simpler and more expressive. It also explicitly calls out relationships. In contrast to an RDBMS, which uses foreign keys to imply relationships, having them explicitly defined allows us to retrieve data by traversing relationships to find the information we need. This is a deliberate, practical algorithmic approach that uses the connectedness of data, rather than relying on some index lookups or joins to find the related data. Explicit relationships also make the property graph model a natural fit for most problem domains, as they are interconnected.

Storage – native graph storage versus non-native graph storage


As with all database management systems, graph databases have the concept of storage and query engines, which deal with persistence and queries over connected data. The query engine of the database is responsible for running the queries and retrieving or modifying data. The query engine exposes the graph data model through Create, Read, Update, and Delete operations (commonly referred to as CRUD). Storage deals with how the data is stored physically and how it is represented logically when retrieved. Its knowledge can help in choosing a graph database.

Relationships are an important part of any domain model and need to be traversed frequently. In a graph database, the relationships are explicit rather than inferred. Making relationships explicit is achieved either via the query engine working on a non-native graph storage (such as RDBMS, column stores, document stores) or using a native graph storage.

In a graph database relying on non-native graph storage, relationships need to be inferred at runtime. For example, if we want to model a graph in an RDBMS, our processing engine will have to infer the relationships using foreign keys and reify the relationships at runtime. This problem is computationally expensive and is infeasible for traversing multiple relationships because of the recursive joins involved. There are other graph databases in which NoSQL stores such as HDFS, column stores such as Cassandra, or documents are used to store data and expose a Graph API. Though there are no joins in a graph database using NoSQL stores, the database still has to use index lookups. In cases where non-native storage is used, the query engines have to make more computational effort.

Neo4j uses a native graph storage. Each node has a handle to all the outgoing relationships it has and each relationship, in turn, knows its terminal nodes. At runtime, to find neighboring nodes, Neo4j doesn't have to do an index lookup. Instead, neighboring nodes can be identified by looking at the relationships of the current node. This feature is called index-free adjacency. Index-free adjacency is mechanically sympathetic and allows the Neo4j query engine to have a significant performance boost while traversing the graph.

Reasons to use graph databases


Every morning when we check our Facebook feed, we are welcomed by a stream of updates from friends and news. Using information about how data is connected and matching it with our individual preferences, Facebook builds a stream of activities from our network that are relevant and interest us. LinkedIn does something similar while suggesting jobs within our network. When we fire up Google Maps or some application such as TomTom or Sygic maps and start navigating to a destination, we use the data that represents connections of various intersections within the city, and work out how best to traverse it. While shopping online, products are recommended to us based on how closely they are connected to what we have already bought or similar products that others have bought. We leverage connected data more and more every day without realizing it.

When dealing with connected data, a graph database gives us the following advantages:

  • The query performance of a graph database is a few orders of magnitude better than RDBMS or other NoSQL alternatives. As the dataset grows, RDBMS join performance deteriorates because of the ever-increasing size of the join tables. On the other hand, graph traversals are localized to a portion of the graph. So query execution time is proportional to the number of nodes visited, rather than being proportional to the overall amount of data stored. This makes the query performance fairly constant over time even though the data might increase exponentially.

  • Flexibility and agility are major considerations in today's world where business needs are constantly evolving. Developers need to have a tool that allows them to incrementally think of the model rather than locking down the data model before they start coding. Graph databases allow for addition of relationships, node types, and properties without making any changes to the existing queries. We can connect the model incrementally, thereby allowing for more sophisticated querying. This flexibility also means fewer migrations. Even in case of changes to the data model, migrations are relatively pain free and can be done without taking the database offline for a long time, thus helping teams deliver software faster while concentrating on the domain rather than managing infrastructure and communication.

  • Lesser ambiguity leads to better models. Since graph databases are schema-less, the schema is dictated by the application and hence is better validated. It allows for better design thinking by developers since there is no ambiguity of the domain model compared to how it is stored in tables.

  • The design to delivery time is reduced. From a developer's standpoint, one of the best features of a graph database is that it is whiteboard friendly. We can make a data model on a whiteboard and not worry about trying to translate it to a set of tables, which don't necessarily represent the data model as is. This allows the developers to concentrate on development rather than translation, thereby saving time.

While all that has been said might seem like jargon, it boils down to economics. Graph databases make more economic sense when the data is highly connected.

What to use a graph database for


Let's start by citing a few problem statements that are more suited to graph databases.

Routing is a graph problem and much research has been done in that respect. One of the leading delivery services in the world uses a Neo4j-based solution to route packages in real time based on information being collected worldwide.

Social networks are problems suited for graphs since they leverage the connections of users to fetch data and decide on what is accessible and what isn't. Facebook, in particular, uses its graph search and has exposed it to the users to enable them to make better searches. Facebook relies heavily on the graph of people and their friends to curate the feed.

Recommendation is again a graph problem that can be solved using graph databases. While companies such as eBay originally relied on MySQL, they eventually turned to Neo4j.

While routing, social networks and recommendations are all obvious graph problems, companies have solved a host of problems by fitting the data into graphs in the recent past.

Search, for example, doesn't come across as a graph problem and is not a very intuitive one. However, Google uses its Knowledge Graph to give you search results based on how well connected a piece of content is to the term being searched. More recently, Facebook has leveraged its social graph to help search become better.

Medical research is another domain where graphs are being used. Medical data is highly interconnected and hence can benefit greatly from the use of graph databases. Companies are now using graph databases for drug discovery and storing medical information.

Storage of ontologies is increasingly being solved using graph databases, which are rapidly finding applications in machine learning and analytics. Companies are also using graph databases in domains such as energy supply and transportation.

Choosing Neo4j for exploring graph databases


Neo4j is a fast, native graph database that satisfies Atomicity, Consistency, Isolation, Durability (ACID) properties. Through usage of transactions, developers can ensure that the failure of a transaction leaves the database's state unchanged ensuring atomicity. Any change to the database doesn't destroy data, ensuring consistency. Data modified by a transaction is isolated from other transactions till it is committed. Since Neo4j is a persistent graph database, the results of a committed transaction can always be retrieved, thus making it durable.

It started off supporting the TinkerPop stack. More information about the TinkerPop stack can be found at http://www.tinkerpop.com.

Neo4j provides numerous modeling and technical affordances, which are valuable when building real-world systems such as:

  • Neo4j is the most mature graph database and has been in production round the clock since 2003. Neo4j is open source with an enormous community. The Neo4j development team is highly engaged with that community so that the features and bugs are rapidly addressed. Neo4j provides native graph storage that enables its engine to perform native graph processing. From the query language to disks, everything is mechanically sympathetic to the transactional storage and rapid retrieval of graph data.

  • Cypher is a very expressive query language used to retrieve data from Neo4j. While it is superficially similar to SQL in some respect, Cypher is the only declarative query language that is built ground-up for humane yet performant graph queries and writes. The Neo4j Java API can be used on JVM-based languages as a more imperative and performant method of querying. This gives the best of both worlds by supporting imperative and declarative querying. (Neo4j plans to move away from supporting Gremlin in the long run, and currently Gremlin is supported through a plugin). Neo4j is open source and allows plugins to enhance or add functionalities, and there is a vibrant ecosystem of tooling around the core database.

  • Any Cypher statement that updates the graph is run within a transaction. If a transaction exists, the newly fired Cypher query will be run in it. If no transaction exists, then the statement will itself be transactional.

  • The community being fostered is incredible. This is also partly made possible by the project being open source. Neo4j is currently being used in production by companies such as UBS, Cisco, Walmart, eBay, Telenor, HP, Pitney Bowes, Accenture, Lockheed Martin, Glassdoor, and many others.

The structure of the book


This book is divided into two sections:

  • Section 1 (Chapter 2, Modeling Flights and Cities, to Chapter 5, Refactoring the Data Model) is essential to understand graph modeling concepts that you will use in your daily routine. We cover how to model a graph, how to query it, how to evolve a graph database to accommodate changes in the domain, and how to translate a RDBMS data model into a graph design.

  • Section 2 (Chapter 6, Modeling Communication Chains, to Chapter 8, Recommendations and Analysis of Historical Data) are more reference oriented with models that you might need for optimization or for specialized cases. Topics covered are modeling chains and advantages of modeling chains, modeling access control, and designing recommendation systems based on the data present.

Summary


In this chapter, we discussed that graph databases are structures that help represent data as nodes, relationships, and properties; relationships explicitly specify and qualify the connection between two entities; labels add semantic meaning to nodes and allow for addition of indices and constraints; properties add more information to the nodes and relationships. We saw a few use cases in which graphs are used currently.

From the next chapter onward, we will delve into designing a data model and use actual Cypher queries to feed it into Neo4j. The queries used in this book are compatible with Neo4j 2.2.3. They have also been tested with Neo4j 2.3.0-M02.

Left arrow icon Right arrow icon

Key benefits

What you will learn

Translate a problem domain from a whiteboard to your database Make design decisions based on the nature of data and how it is going to be used Use Cypher to create and query data Evolve your database in stages Optimize the performance of your application with data design Design paradigms to ensure flexibility, ease of querying, and performance Move from an existing model to a new model without losing consistency

What do you get with a Packt Subscription?

Free for first 7 days. $15.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details


Publication date : Jul 27, 2015
Length 138 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781784393441
Category :
Concepts :

Table of Contents

16 Chapters
Neo4j Graph Data Modeling Chevron down icon Chevron up icon
Credits Chevron down icon Chevron up icon
About the Author Chevron down icon Chevron up icon
About the Reviewers Chevron down icon Chevron up icon
www.PacktPub.com Chevron down icon Chevron up icon
Preface Chevron down icon Chevron up icon
Graphs Are Everywhere Chevron down icon Chevron up icon
Modeling Flights and Cities Chevron down icon Chevron up icon
Formulating an Itinerary Chevron down icon Chevron up icon
Modeling Bookings and Users Chevron down icon Chevron up icon
Refactoring the Data Model Chevron down icon Chevron up icon
Modeling Communication Chains Chevron down icon Chevron up icon
Modeling Access Control Chevron down icon Chevron up icon
Recommendations and Analysis of Historical Data Chevron down icon Chevron up icon
Wrapping Up Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon

Customer reviews

Filter icon Filter
Top Reviews
Rating distribution
Empty star icon Empty star icon Empty star icon Empty star icon Empty star icon 0
(0 Ratings)
5 star 0%
4 star 0%
3 star 0%
2 star 0%
1 star 0%

Filter reviews by


No reviews found
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.