Graph model and Neo4j - where data is designed around relationships


Königsberg bridge problem

Can one find out whether or not it is possible to cross each bridge exactly once?

In the 18th century, the Swiss mathematician Leonhard Euler was intrigued by the question of whether a route existed that would traverse each of the seven bridges exactly once. In demonstrating that the answer is no, he laid the foundation for graph theory. (Source: Encyclopedia Britannica)

This famous question from the 18th century, about seven bridges connecting two small islands over Pregel river (a small town Königsberg, today’s Kaliningrad, Russia) became the foundation of Graph Theory. This question provoked the curiosity of Swiss mathematician Leonhard Euler; he was obsessed with finding a mathematical solution to this problem.

The key to this problem is understanding the degree of each node (how many bridges are connected to each landmass). In order to meet the rule (one can travel over each bridge only once), there should be a distinguished path which one can take to leave a node, as long as there is a path which one can take to arrive to this node. In other words, paths connecting to each node should always make up a pair, one for arriving and another one for leaving, hence the degrees of each node must always be an even number. The only exceptions are nodes at the very beginning and end of the entire path.

The degrees of each node must be even, except for the start and end, in order to meet the requirement.

In the case of the Königsberg bridge problem, there are 4 landmasses (2 islands and 2 contiguous lands) or nodes, and all degrees of the nodes are odd numbers, therefore, it’s impossible to travel without crossing the same bridge at least twice. As shown below, if we add one more bridge, we can create a continuous path without crossing the same bridge twice.

Including this rule of degree, Euler introduced a theory, Geometry of Position, which is currently known as Graph Theory. The theory became the foundation for the graph database model later on. The graph is an abstract data type, which is powerful for analyzing relationships between different elements, and can be applied in social media networks, navigation systems, and recommendation engines.

Source: How the Königsberg bridge problem changed mathematics — Dan Van der Vieren

What is the advantage of graph?

The Graph database model, similar to the relational database model, stores and organizes data items that relate to each other. Unlike the relational model which organizes items and relationships with rigid schemas, which requires indexing multiple tables to find a relationship. The graph model is structured entirely around relationships.

In the relational model example below, people and friends all have unique IDs, and a Person-Friend table stores relationships row by row with IDs of a person and friend. Therefore, in order to find Andreas’s friends, the relational model requires looking up three different tables to find friends’ details. In contrast, the graph model retains metadata that reflects each item’s relationship to others. Without requiring indexing multiple locations, the graph model is much faster in finding relationships, especially when the relationship is far from the origin.

Relational Versus Graph Model Is (Image source: William Lyon via.Neo4j — get off the ground in 30min or less!)

Facebook’s Open Graph

Facebook’s famous, now somewhat controversial, Open Graph was built based off of the graph model. As Facebook initiated the concept of Social Graph, which connects individuals to their friends and families, Facebook applied their relationships to non-human things. First, Facebook introduced Like to link individuals with different entities on Facebook that might interest them, including individual photos, events, and business pages.

Later, Facebook began interpreting different users actions, beyond likes, to better understand their aspirations. Actions could include eating at a restaurant, listening to a particular song, reading a news article, and going to a music festival, for example.

What is Open Graph (Source: Facebook Developers)

Neo4j and fundamental entities of graph

In researching the graph model, I ran into a product called Neo4j, and took a couple of tutorial courses for it. Neo4j is a graph database management system developed by Neo4j, Inc. Neo4j, Inc. now has over 200 employees in their main offices in San Francisco, and Malmo, Sweden. They raised $80 million from Series E funding in 2018, and is the most popular database according to DB-Engines ranking.

Including Neo4j’s system, graph models are a collection of 2 basic entities, nodes and relationships. They can be categorized by labels for an easier search. For instance, if we want to find travel options between two locations, Beijing and Sao Paulo, we search for relationships between two nodes with a label :City. Also, multiple labels can be applied to a node, for example, Beijing may also have a label :Capital.

In a graph, each relationship is one directional. For instance, if there is a flight path from Beijing to Sao Paulo via Dubai, but the airline does not provide a service from Dubai back to Beijing, the entire model can be described with a one directional relationship from Beijing to Dubai, and two relationships between Dubai and Sao Paulo pointing in both directions. Each relationship can also have a label just like a node. For instance, a label :Flight may be applied to these relationships, and :Bus:Train, or :Ship labels for non-air transportations.

relationship does not need to connect two nodes, but can connect back to the same node. In Australia, there is a seasonal flight from Sydney, which flies over Antarctica and comes back to Sydney. Their boarding passes say “from Sydney to Mystery Flight”, since the boarding pass format is not designed for this type relationship that leaves from and to the same node.

Lastly, both node and relationship can have different properties, which essentially is information that they are comprised of, such as a name and population for city nodes, and airline names, flight numbers, and fleet types for flight relationships.

Creating this flight example using Cypher

In Neo4j, a simple syntax but powerful graph query language, Cypher is used. Cypher is designed around nodes and relationships, and it is suitable for recognizing patterns in graph modeling. In this quick exercise, I tried to replicate the above travel search example in Neo4j. The first step was to create initial nodes and their relationships.

Nodes and relationships can be described as ()-[]->() using Cypher. () indicates a node, and [] as a relationship with > showing the direction of the relationship. Each node will have labels :City, and the relationship is :FLY_TO, (:City)-[:FLY_TO]->(:City). To create new nodes and relationships, we use the function CREATE and properties for each node and relationship are inserted together. Below is the initial line.

CREATE (:City{name: ‘Beijing’, population: 21000000})-[:FLY_TO{airline: “Dubai Air”, time: 5.2, fleet: “A320”}]->(:City{name: ‘Dubai’, population: 3200000})

Once these initial nodes are created, we use the MATCH function to run a query. In order to RETURN something, we need to find nodes with information we can identify them with, and assign variable names to them; I assigned a variable name ‘beijing’ to a node that can be found with a property ‘name’.

What is Open Graph (Source: Facebook Developers)

RETURN beijing, dubai

I accidentally created 2 nodes with the same property name “Dubai”, but one without population. I had to delete the duplicated node with some conditionals, however in Neo4j, we must delete relationships connecting to a node before deleting it. I used the conditional exists() to find the duplicated nodes and deleted the relationship (the variable name: rel) together.

DELETE rel, dubai
MATCH (dubai:City{name:’Dubai’})

WHERE exists(dubai.population) = false

MATCH (dubai)-[rel]->()

DELETE rel, dubai

In order to change, add or remove properties, I have to use SET or REMOVE functions to do so, SET dubai.population = 3,200,000, for example.

MERGE is another useful function. When you declare with a node or relationship with an existing label or property, it will work just like the MATCH function. Otherwise, it will generate a new node or relationship. Here, routes between more locations are added.

MATCH (dubai :City{name: “Dubai”}), (sao: City{name: “Sao Paulo”})

MERGE (dubai)-[:FLY_TO{airline: “Dubai Air”, fleet: “A350”, time: 6.8, cost: 600}]->(sao)

MERGE (sao)-[:FLY_TO{airline: “Dubai Air”, fleet: “A350”, time: 7.8, cost: 620}]->(dubai)

And some more nodes and relationships were added…

With this data, we can find the cheapest flights departing from Beijing, for example.

MATCH (:City{name: “Beijing”})-[flight:FLY_TO]->(destination:City)

RETURN destination.name, flight.cost ORDER BY flight.cost

This is based off of the fake data I made up.

Also, the fastest path between two cities by the property values can be found easily using the APOC plugin. For instance, the code below can find the flight path with the shortest time based on the time property from Beijing to Sao Paulo.

MATCH (beijing: City{name: “Beijing”}),(sao: City{name: “Sao Paulo”})

CALL apoc.algo.dijkstra(beijing, sao, ‘FLY_TO’, ‘time’) YIELD path, weight

RETURN path, weight

Follow on medium: https://medium.com/@takuma.kakehi/graph-model-and-neo4j-where-data-is-designed-around-relationships-e70428d10ff0

Reference

Neo4j raises $80 million for next-generation graph databases | Venturebeat

Neo4j — get off the ground in 30min or less! | Neo4j via. Medium

Neo4j: GraphDB Foundations with Cypher | Udemy course

Concepts: Relational to Graph | Neo4j

What is Open Graph | Facebook Developers

How the Königsberg bridge problem changed mathematics — Dan Van der Vieren | TED-Ed

Königsberg: Seven Small Bridges, One Giant Graph Problem | base cs

Neo4j — Finding the Shortest Path between two nodes based on relationship property | Stack Overflow

Leave a comment

Your email address will not be published. Required fields are marked *