Data Modeling with Neo4j

Michael Hunger, Neo Technology
@neo4j | michael@neo4j.org
Thanks to: Ian Robinson, Mark Needham, Alistair Jones
Saturday, 31 August 2013

Please ask questions
in the chat
I'll answer at the end.
Follow-up email with missing answers,
video and slides.

(Michael) -[:WORKS_ON]-> (Neo4j)
[Figure labels: ME, Spring, Cloud, Community, Cypher, console, community graph, Server]

This Webinar
๏ Graphs are everywhere
๏ Graph Model Building Blocks
๏ (NOSQL) Data Models
๏ Designing a Data Model
๏ Embrace the Paradigm

Addressing Data Complexity
With Graphs

complexity = f(size, semi-structure, connectedness)
Data Complexity

Are Graphs Everywhere?

Social Network

(Network) Impact Analysis

Route Finding

Recommendations

Logistics

Access Control

Fraud Analysis

Securities and Debt
Image: orgnet.com

Graphs Are Everywhere!!

Graph Model Building
Blocks

Property Graph Data Model

Four Building Blocks
1. Nodes
2. Relationships
3. Properties
4. Labels

Nodes
๏ Used to represent entities in your domain
๏ Can contain properties
• Used to represent entity attributes and/or metadata
(e.g. timestamps, version)
• Key-value pairs
‣Java primitives
‣Arrays
‣null is not a valid value
• Every node can have different properties
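
The property rules above can be illustrated with a small in-memory sketch. This is plain Python, not a Neo4j API: `validate_properties` is a hypothetical helper, and treating strings as allowed values alongside the primitives is an assumption made for the example.

```python
# Hypothetical helper, not part of any Neo4j API: checks a property map
# against the rules on the slide (primitive values, arrays, no null).
PRIMITIVES = (bool, int, float, str)  # assumption: strings count as allowed values


def validate_properties(props):
    for key, value in props.items():
        if value is None:
            raise ValueError(f"property {key!r}: null is not a valid value")
        if isinstance(value, (list, tuple)):
            # Arrays are allowed, but only when every element is a primitive.
            if not all(isinstance(v, PRIMITIVES) for v in value):
                raise ValueError(f"property {key!r}: arrays must hold primitives")
        elif not isinstance(value, PRIMITIVES):
            raise ValueError(f"property {key!r}: unsupported type {type(value).__name__}")
    return dict(props)


# Every node can carry a different set of properties (values are made up).
alice = validate_properties({"name": "Alice", "age": 34})
acme = validate_properties({"name": "Acme", "tags": ["b2b", "eu"], "created": 1377900000})
```

Leaving a key out entirely is how "no value" is expressed in this sketch, which mirrors the rule that null is not stored.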


Relationships
๏ Every relationship has a name and a direction
• Add structure to the graph
• Provide semantic context for nodes
๏ Can contain properties
• Used to represent quality or weight of relationship,
or metadata
๏ Every relationship must have a start node and end node
• No dangling relationships

Relationships (continued)
๏ Nodes can have more than one relationship
๏ Self relationships are allowed
๏ Nodes can be connected by more than one relationship
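
A minimal in-memory sketch of these relationship rules, assuming a toy `Graph` class invented for illustration (it is not how Neo4j stores data). The `since` value and the `LIKES` and `KNOWS` types are made up; `WORKS_ON` comes from the introduction slide.

```python
class Graph:
    """Toy property graph: nodes plus typed, directed relationships."""

    def __init__(self):
        self.nodes = {}          # node id -> property dict
        self.relationships = []  # (start id, type, end id, property dict)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, start, rel_type, end, **props):
        # No dangling relationships: both ends must already exist.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("start and end node must exist")
        self.relationships.append((start, rel_type, end, props))

    def outgoing(self, node_id):
        return [r for r in self.relationships if r[0] == node_id]


g = Graph()
g.add_node("michael", name="Michael")
g.add_node("neo4j", name="Neo4j")
g.relate("michael", "WORKS_ON", "neo4j", since=2010)  # relationship with a property
g.relate("michael", "LIKES", "neo4j")    # second relationship between the same pair
g.relate("michael", "KNOWS", "michael")  # self relationship
```

Because relationships are attached to node instances rather than to a class of nodes, nothing here forces every node to have the same set of relationship types.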

Variable Structure
๏ Relationships are defined with regard to node
instances, not classes of nodes
• Different nodes can be connected in different ways
• Allows for structural variation in the domain
• Contrast with relational schemas, where foreign key
relationships apply to all rows in a table

Labels
๏ Every node can have zero or more labels attached
๏ Used to represent roles (e.g. user, product, company)
• Group nodes
• Allow us to associate indexes and constraints with
groups of nodes
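
As a rough illustration of how labels group nodes so that lookups and constraints can be scoped to a group, here is a plain-Python sketch. The `by_label` index and the `find` and `check_unique` helpers are hypothetical names, and the sample nodes are made up.

```python
from collections import defaultdict

# Nodes may carry zero or more labels.
nodes = {
    1: {"labels": {"Person", "User"}, "name": "Peter"},
    2: {"labels": {"Company"}, "name": "Neo Technology"},
    3: {"labels": set(), "name": "unlabeled"},
}

# Group node ids by label, so lookups only scan the relevant group.
by_label = defaultdict(set)
for node_id, node in nodes.items():
    for label in node["labels"]:
        by_label[label].add(node_id)


def find(label, key, value):
    """Label-scoped lookup: only nodes carrying the label are considered."""
    return sorted(n for n in by_label[label] if nodes[n].get(key) == value)


def check_unique(label, key):
    """Label-scoped constraint: property values must be unique within the group."""
    values = [nodes[n][key] for n in by_label[label] if key in nodes[n]]
    return len(values) == len(set(values))
```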

Four Building Blocks
๏ Nodes
• Entities
๏ Relationships
• Connect entities and structure domain
๏ Properties
• Attributes and metadata
๏ Labels
• Group nodes by role

Aggregate vs.
Connected Data-Model

NOSQL Databases
๏ Key-Value: Riak, Redis
๏ Column-oriented: Cassandra
๏ Document: Mongo, Couch
๏ Graph: Neo4j
๏ Relational: MySQL, Postgres

Aggregate Oriented Model
“There is a significant downside - the whole approach works
really well when data access is aligned with the aggregates, but
what if you want to look at the data in a different way? Order
entry naturally stores orders as aggregates, but analyzing
product sales cuts across the aggregate structure. The
advantage of not using an aggregate structure in the database
is that it allows you to slice and dice your data different ways
for different audiences.
This is why aggregate-oriented stores talk so much about map-reduce.”
Martin Fowler

Connected Data Model
The connected data model is based on fine grained elements
that are richly connected; the emphasis is on extracting many
dimensions and attributes as elements.
Connections are cheap and can be used not only for the
domain-level relationships but also for additional structures
that allow efficient access for different use-cases. The fine
grained model requires an external scope for mutating
operations that ensures Atomicity, Consistency, Isolation and
Durability - ACID also known as Transactions.
Michael Hunger

Relational vs. Graph

You know relational
Tables: foo, bar, and the join table foo_bar
... now consider a graph

Designing a Graph Model
an Example

Models
Images: en.wikipedia.org
Purposeful abstraction of a domain designed to satisfy
particular application/end-user goals

Design for Queryability
Model
Query

Method
1. Identify application/end-user goals
2. Figure out what questions to ask of the domain
3. Identify entities in each question
4. Identify relationships between entities in each question
5. Convert entities and relationships to paths
   These become the basis of the data model
6. Express questions as graph patterns
   These become the basis for queries

From User Story to Model and Query
1. User story
2. Questions we want to ask
3. Entities and relationships
4. Paths
5. Data model
6. Query

1. Application/End-User Goals
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge

2. Questions To Ask of the Domain
Which people, who work for the same
company as me, have similar skills to me?
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge

3. Identify Entities
Person
Company
Skill

4. Identify Relationships Between Entities
Person WORKS_FOR Company
Person HAS_SKILL Skill

5. Convert to Cypher Paths

Relationship
Label

(:Person)-[:WORKS_FOR]->(:Company),
(:Person)-[:HAS_SKILL]->(:Skill)


Consolidate Paths
(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

Candidate Data Model

6. Express Question as Graph Pattern

Cypher Query
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}  // query parameter; current Cypher writes it as $name
RETURN colleague.name AS name,
       count(skill) AS score,
       collect(skill.name) AS skills
ORDER BY score DESC
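
To make the semantics of the query concrete, the same question can be answered over a tiny in-memory data set. This is only a sketch in plain Python: the people, companies and skills are invented sample data, and `similar_colleagues` is a hypothetical function, not generated from the Cypher above.

```python
# Invented sample data: who works where, and who has which skills.
works_for = {"Ian": "Acme", "Mark": "Acme", "Alistair": "Acme", "Zoe": "Other"}
has_skill = {
    "Ian": {"Java", "Cypher", "Ruby"},
    "Mark": {"Java", "Cypher"},
    "Alistair": {"Cypher"},
    "Zoe": {"Java", "Cypher", "Ruby"},
}


def similar_colleagues(me):
    rows = []
    for colleague, company in works_for.items():
        # Same company as me, and not me.
        if colleague == me or company != works_for[me]:
            continue
        shared = sorted(has_skill[me] & has_skill[colleague])
        if shared:  # the pattern only matches when at least one skill is shared
            rows.append({"name": colleague, "score": len(shared), "skills": shared})
    # ORDER BY score DESC
    return sorted(rows, key=lambda row: row["score"], reverse=True)


result = similar_colleagues("Ian")
```

With this data, Zoe shares the most skills with Ian but works for a different company, so she is not returned; that is the effect of anchoring both halves of the pattern on the same company node.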

Graph Pattern (the MATCH clause)
Anchor Pattern in Graph (the WHERE clause)
Create Projection of Results (the RETURN clause)

First Match

Second Match

Third Match

From User Story to Model and Query
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge
Which people, who work for the same company as me, have similar
skills to me?

Embrace the
paradigm

Use the building blocks
๏ Nodes
๏ Relationships (RELATIONSHIP_NAME)
๏ Properties (name: value)

Anti-pattern: rich properties
name: “Canada”
languages_spoken: “[ ‘English’, ‘French’ ]”
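
A short sketch of why the stringified list is a problem and what the normalized form could look like. It is plain Python over dictionaries and tuples, not a Neo4j API; the `IS_SPOKEN_IN` type is borrowed from the language/country example later in the deck, and the tuple layout is an assumption for illustration.

```python
import ast

country = {"name": "Canada", "languages_spoken": "['English', 'French']"}

# The list is trapped inside a string, so it has to be parsed before it can be used.
languages = ast.literal_eval(country["languages_spoken"])

# Normalized form: one node per language, one relationship per (language, country) pair.
nodes = [{"label": "Country", "name": country["name"]}]
relationships = []
for language in languages:
    nodes.append({"label": "Language", "name": language})
    relationships.append((language, "IS_SPOKEN_IN", country["name"]))
```

Once languages are nodes, a question such as "which countries share a language" becomes a traversal instead of string parsing on every node.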

Normalize Nodes

Anti-Pattern: Node represents multiple concepts
Person: name, age, position, company, department, project, skills

Normalize into separate concepts
Person (name, age) -[:WORKS_FOR]-> Company (name, number_of_employees)
Person -[:HAS_SKILL]-> Skill (name)

Challenge: Property or Relationship?
๏ Can every property be replaced by a relationship?
• Hint: triple stores. Are they easy to use?
๏ Should every entity with the same property values be
connected?

Object Mapping
๏ Similar to how you would map objects to a relational
database, using an ORM such as Hibernate
๏ Generally simpler and easier to reason about
๏ Examples
• Java: Spring Data Neo4j
• Ruby: Active Model
๏ Why Map?
• Do you use mapping because you are scared of SQL?
• Following DDD, could you write your repositories
directly against the graph API?

CONNECT for fast access
In-Graph Indices

Relationships for querying
๏ like in other databases
• same structure for different use-cases (OLTP and
OLAP) doesn't work
• graph allows: add more structures
๏ Relationships should be the primary means to access
nodes in the database
๏ Traversing relationships is cheap – that’s the whole
design goal of a graph database
๏ Use lookups only to find starting nodes for a query
Data Modeling examples in Manual

Anti-pattern: unconnected graph
[Figure: many unconnected nodes, each with name: “Jones”]

Pattern: Linked List

Pattern: Multiple Relationships

Pattern-Trees: Tags and Categories

Pattern-Tree: Multi-Level-Tree

Pattern-Trees: R-Tree (spatial)

Example: Activity Stream

Graph Evolution

Evolution: Relationship to Node
Before: Peter SENT_EMAIL Michael
After: an Email node, connected by EMAIL_FROM to Peter, EMAIL_TO to Michael,
EMAIL_CC to Emil, TAGGED to Community, ...
see Hyperedges
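
The refactoring can be sketched over a list of (start, type, end) triples. `promote_to_node` is a hypothetical helper written for this example, and the direction of the new relationships (pointing from the Email node outwards) is an assumption, since the slide residue does not preserve it.

```python
def promote_to_node(relationships, rel_type, node_name, from_type, to_type):
    """Replace each (start, rel_type, end) triple with an intermediate node."""
    result = []
    counter = 0
    for start, kind, end in relationships:
        if kind != rel_type:
            result.append((start, kind, end))
            continue
        counter += 1
        new_node = f"{node_name}#{counter}"
        result.append((new_node, from_type, start))
        result.append((new_node, to_type, end))
    return result


before = [("Peter", "SENT_EMAIL", "Michael")]
after = promote_to_node(before, "SENT_EMAIL", "Email", "EMAIL_FROM", "EMAIL_TO")

# The new Email node can now take part in further relationships,
# which a plain SENT_EMAIL relationship between two people could not.
after.append(("Email#1", "EMAIL_CC", "Emil"))
after.append(("Email#1", "TAGGED", "Community"))
```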

Combine multiple Domains in a Graph
๏ you start with a single domain
๏ add more connected domains as your system evolves
๏ more domains let you ask different queries
๏ one domain "indexes" the other
๏ Example: Facebook Graph Search
• social graph
• location graph
• activity graph
• favorite graph
• ...

Notes on the Graph Data Model
๏ Schema free, but constraints
๏ Model your graph with a whiteboard and a wise man
๏ Nodes are the main entities, but useless without connections
๏ Relationships are first-class citizens in the model and database
๏ Normalize more than in a relational database
๏ Use meaningful relationship types, not generic ones like IS_
๏ Use in-graph structures to allow different access paths
๏ Evolve your graph to your needs, incremental growth

How to get started?
๏ Worldwide one-day Neo4j Tutorials
๏ Books
• Graph Databases, Neo4j in Action
๏ neo4j.org
• http://neo4j.org/develop/modeling
๏ docs.neo4j.org
• Data Modeling Examples
๏ http://console.neo4j.org
๏ http://gist.neo4j.org
๏ Get Neo4j
• http://neo4j.org/download
๏ Participate
• http://groups.google.com/group/neo4j

Thank You!
Questions?

An Example

What language do they speak here?
Language Country

Tables
Language (language_code, language_name, word_count)
Country (country_code, country_name, flag_uri)

Need to model the relationship
Language (language_code, language_name, word_count)
Country (country_code, country_name, flag_uri, language_code)

What if the cardinality changes?
Language (language_code, language_name, word_count, country_code)
Country (country_code, country_name, flag_uri)

Or we go many-to-many?
Language (language_code, language_name, word_count)
Country (country_code, country_name, flag_uri)
LanguageCountry (language_code, country_code)

Or we want to qualify the relationship?
Language (language_code, language_name, word_count)
Country (country_code, country_name, flag_uri)
LanguageCountry (language_code, country_code, primary)

Start talking about
Graphs

Explicit Relationship
Language (name, word_count) -[:IS_SPOKEN_IN]-> Country (name, flag_uri)

Relationship Properties
Language (name, word_count) -[:IS_SPOKEN_IN {as_primary}]-> Country (name, flag_uri)

What’s different?
Relational: Language (language_code, language_name, word_count),
Country (country_code, country_name, flag_uri),
LanguageCountry (language_code, country_code, primary)
Graph: the LanguageCountry join table becomes the IS_SPOKEN_IN relationship

What’s different?
๏ Implementation of maintaining relationships is left up
to the database
๏ Artificial keys disappear or are unnecessary
๏ Relationships get an explicit name
• can be navigated in both directions

Relationship specialisation
Language (name, word_count) -[:IS_SPOKEN_IN {as_primary}]-> Country (name, flag_uri)

Bidirectional relationships
Language (name, word_count) -[:IS_SPOKEN_IN]-> Country (name, flag_uri)
Country -[:PRIMARY_LANGUAGE]-> Language

Weighted relationships
Country (name, flag_uri) -[:POPULATION_SPEAKS {population_fraction}]-> Language (name, word_count)

Keep on adding relationships
Country -[:POPULATION_SPEAKS {population_fraction}]-> Language
Language -[:SIMILAR_TO]-> Language
Country -[:ADJACENT_TO]-> Country
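
A weighted relationship can be pictured as a triple that carries its own property. The sketch below is plain Python with fictional countries, languages and fractions (all made up for illustration); `languages_of` and `countries_speaking` are hypothetical helpers showing that the same relationship can be read from either end.

```python
# (country, language, population_fraction): made-up illustrative values
population_speaks = [
    ("Freedonia", "Freedonian", 0.9),
    ("Freedonia", "Sylvanian", 0.25),
    ("Sylvania", "Sylvanian", 0.95),
]


def languages_of(country, min_fraction=0.0):
    """Follow POPULATION_SPEAKS from a country, filtering on the weight."""
    rows = [(lang, frac) for c, lang, frac in population_speaks
            if c == country and frac >= min_fraction]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def countries_speaking(language):
    """Follow the same relationships in the opposite direction."""
    return sorted(c for c, lang, _ in population_speaks if lang == language)
```

No separate join table or reverse index is declared here; the weight lives on the relationship itself, which is the point the slides make against the LanguageCountry table.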

Real World Examples

Real World Use Cases:
• [A] ACL from Hell
• [B] Timely recommendations
• [C] Global collaboration
[A] ACL from Hell
๏ Customer:
• leading consumer utility company with tons and tons of users
๏ Goal:
• comprehensive access control administration for customers
๏ Benefits:
• Flexible and dynamic architecture
• Exceptional performance
• Extensible data model supports new applications and features
• Low cost

• A reliable access control administration system for 5 million customers, subscriptions and agreements
• Complex dependencies between groups, companies, individuals, accounts, products, subscriptions, services and agreements
• Broad and deep graphs (master customers with 1000s of customers, subscriptions & agreements)
[Figure: example access-control graph. Nodes: name: Andreas; account: 9758352794; agreement: ultimate; subscription: sports; subscription: local; service: NFL; service: Ravens; group: graphistas; promotion: fall; company: Neo Technology. Relationships: owns, subscribes to, has plan, includes, provides, member of, offered, discounts, works with, gets discount on]
[B] Timely Recommendations
๏ Customer:
• a professional social network
• 35 million users, adding 30,000+ each day
๏ Goal: up-to-date recommendations
• Scalable solution with real-time end-user experience
• Low maintenance and reliable architecture
• 8-week implementation
๏ Problem:
• Real-time recommendation imperative to attract new users and maintain positive user retention
• Clustered MySQL solution not scalable or fast enough to support real-time requirements
๏ Upgrade from running a batch job
• initial hour-long batch job
• but then success happened, and it became a day
• then two days
๏ With Neo4j, real time recommendations
[Figure: social graph of people connected by knows relationships. Nodes (name, job): Andreas, talking; Allison, plumber; Tobias, coding; Peter, building; Emil, plumber; Stephen, DJ; Delia, barking; Tiberius, dancer]

[C] Collaboration on Global Scale
๏ Customer: a worldwide software leader
• highly collaborative end-users
๏ Goal: offer an online platform for global collaboration
• Highly flexible data analysis
• Sub-second results for large, densely-connected data
• User experience - competitive advantage

• Massive amounts of data tied to members, user groups, member content, etc. all interconnected
• Infer collaborative relationships through user-generated content
• Worldwide Availability
[Figure labels: Asia, North America, Europe]

© All Rights Reserved 2013 | Neo Technology, Inc.
Neo Technology Confidential

Cisco.com
San Jose, CA
Industry: Communications
Use case: Recommendations
Business problem
• Call center volumes needed to be lowered by improving the efficacy of online self service
• Leverage large amounts of knowledge stored in service cases, solutions, articles, forums, etc.
• Problem resolution times, as well as support costs, needed to be lowered
Background
• Cisco.com serves customer and business customers with Support Services
• Needed real-time recommendations, to encourage use of online knowledge base
• Cisco had been successfully using Neo4j for its internal master data management solution.
• Identified a strong fit for online recommendations
Solution & Benefits
• Cases, solutions, articles, etc. continuously scraped for cross-reference links, and represented in Neo4j
• Real-time reading recommendations via Neo4j
• Neo4j Enterprise with HA cluster
• The result: customers obtain help faster, with decreased reliance on customer support
[Figure labels: Support, Support, Solution, Message]

Cisco HMP
San Jose, CA
Use case: Master Data Management
Business problem
• Sales compensation system had become unable to meet Cisco’s needs
• Existing Oracle RAC system had reached its limits:
‣ Insufficient flexibility for handling complex organizational hierarchies and mappings
‣ “Real-time” queries were taking > 1 minute!
• Business-critical “P1” system needs to be continually available, with zero downtime
Background
• One of the world’s largest communications equipment manufacturers
• #91 Global 2000. $44B in annual sales.
• Needed a system that could accommodate its master data hierarchies in a performant way
• HMP is a Master Data Management system at whose heart is Neo4j. Data access services available 24x7 to applications companywide
Solution & Benefits
• Cisco created a new system: the Hierarchy Management Platform (HMP)
• Allows Cisco to manage master data centrally, and centralize data access and business rules
• Neo4j provided “Minutes to Milliseconds” performance over Oracle RAC, serving master data in real time
• The graph database model provided exactly the flexibility needed to support Cisco’s business rules
• HMP so successful that it has expanded to include product hierarchy

Industry: Logistics
Use case: Parcel Routing
Business problem
• 24x7 availability, year round
• Peak loads of 2500+ parcels per second
• Complex and diverse software stack
• Need predictable performance & linear scalability
• Daily changes to logistics network: route from any point, to any point
Background
• One of the world’s largest logistics carriers
• Projected to outgrow capacity of old system
• New parcel routing system
• Single source of truth for entire network
• B2C & B2B parcel tracking
• Real-time routing: up to 5M parcels per day
Solution & Benefits
• Neo4j provides the ideal domain fit:
‣ a logistics network is a graph
• Extreme availability & performance with Neo4j clustering
• Hugely simplified queries, vs. relational for complex routing
• Flexible data model can reflect real-world data variance much better than relational
• “Whiteboard friendly” model easy to understand

Glassdoor
Sausalito, CA
Industry: Online Job Search
Use case: Social / Recommendations
Business problem
• Wanted to leverage known fact that most jobs are found through personal & professional connections
• Needed to rely on an existing source of social network data. Facebook was the ideal choice.
• End users needed to get instant gratification
• Aiming to have the best job search service, in a very competitive market
Background
• Online jobs and career community, providing anonymized inside information to job seekers
Solution & Benefits
• First-to-market with a product that let users find jobs through their network of Facebook friends
• Job recommendations served real-time from Neo4j
• Individual Facebook graphs imported real-time into Neo4j
• Glassdoor now stores > 50% of the entire Facebook social graph
• Neo4j cluster has grown seamlessly, with new instances being brought online as graph size and load have increased
[Figure: graph with KNOWS and WORKS_AT relationships]

Adobe
San Jose, CA
Industry: Web/ISV
Use case: Content Management, Social, Access Control
Business problem
• Adobe needed a highly robust and available, 24x7 distributed global system, supporting collaboration for users of its highest revenue product line
• Storing creative artifacts in the cloud meant managing access rights for (eventually) millions of users, groups, collections, and pieces of content
• Complex access control rules controlling who was connected to whom, and who could see or edit what, proved a significant technical challenge
Background
• One of the ten largest software companies globally
• $4B+ in revenue. Over 11,000 employees.
• Launched Creative Cloud in 2012, allowing its Creative Suite users to collaborate via the Cloud
Solution & Benefits
• Selected Neo4j to meet very aggressive project deadlines. The flexibility of the graph model, and performance, were the two major selection factors.
• Easily evolve the system to meet tomorrow’s needs
• Extremely high availability and transactional performance requirements. 24x7 with no downtime.
• Neo4j allows consistently fast response times with complex queries, even as the system grows
• First (and possibly still only) database cluster to run across three Amazon EC2 regions: U.S., Europe, Asia
[Figure label: User-Content-Access]

SFR
Paris, France
Use case: Network Management
Business problem
• Infrastructure maintenance took one full week to plan, because of the need to model network impacts
• Needed rapid, automated “what if” analysis to ensure resilience during unplanned network outages
• Identify weaknesses in the network to uncover the need for additional redundancy
• Network information spread across > 30 systems, with daily changes to network infrastructure
• Business needs sometimes changed very rapidly
Background
• Second largest communications company in France
• Part of Vivendi Group, partnering with Vodafone
Solution & Benefits
• Flexible network inventory management system, to support modeling, aggregation & troubleshooting
• Single source of truth (Neo4j) representing the entire network
• Dynamic system loads data from 30+ systems, and allows new applications to access network data
• Modeling efforts greatly reduced because of the near 1:1 mapping between the real world and the graph
• Flexible schema highly adaptable to changing business requirements
[Figure: network graph of Router, Switch and Fiber nodes connected by DEPENDS_ON and LINKED relationships]

Deutsche Telekom
Frankfurt, Germany
Use case: Social gaming
Business problem
• The Fanorakel application allows fans to have an interactive experience while watching sports
• Fans can vote for referee decisions and interact with other fans watching the game
• Highly connected dataset with real-time updates
• Queries need to be served real-time on rapidly changing data
• One technical challenge is to handle the very high spikes of activity during popular games
Background
• Europe’s largest communications company
• Provider of mobile & land telephone lines to consumers and businesses, as well as internet services, television, and other services
Solution & Benefits
• Interactive, social offering gives fans a way to experience the game more closely
• Increased customer stickiness for Deutsche Telekom
• A completely new channel for reaching customers with information, promotions, and ads
• Clear competitive advantage
[Figure label: Interactive Television]

Hewlett Packard
Global (U.S., France)
Industry: Web/ISV, Communications
Use case: Network Management
Business problem
• Use network topology information to identify root causes of problems on the network
• Simplify alarm handling by human operators
• Automate handling of certain types of alarms
• Help operators respond rapidly to network issues
• Filter/group/eliminate redundant Network Management System alarms by event correlation
Background
• World’s largest provider of IT infrastructure, software & services
• HP’s Unified Correlation Analyzer (UCA) application is a key application inside HP’s OSS Assurance portfolio
• Carrier-class resource & service management, problem determination, root cause & service impact analysis
• Helps communications operators manage large, complex and fast changing networks
Solution & Benefits
• Accelerated product development time
• Extremely fast querying of network topology
• Graph representation a perfect domain fit
• 24x7 carrier-grade reliability with Neo4j HA clustering
• Met objective in under 6 months

Telenor
Oslo, Norway
Use case: Resource Authorization & Access Control
Business problem
• Degrading relational performance. User login taking minutes while system retrieved access rights
• Millions of plans, customers, admins, groups. Highly interconnected data set w/massive joins
• Nightly batch workaround solved the performance problem, but meant data was no longer current
• Primary system was Sybase. Batch pre-compute workaround projected to reach 9 hours by 2014: longer than the nightly batch window
Background
• 10th largest Telco provider in the world, leading in the Nordics
• Online self-serve system where large business admins manage employee subscriptions and plans
• Mission-critical system whose availability and responsiveness is critical to customer satisfaction
Solution & Benefits
• Moved authorization functionality from Sybase to Neo4j
• Modeling the resource graph in Neo4j was straightforward, as the domain is inherently a graph
• Able to retire the batch process, and move to real-time responses: measured in milliseconds
• Users able to see fresh data, not yesterday’s snapshot
• Customer retention risks fully mitigated
[Figure labels, partly truncated in the source: User, USER_ACCESS, SUBSCRIBED_BY, CONTROLLED_B, PART_O]

Viadeo
Silicon Valley & France
Industry: Professional Social Network
Use case: Social, Recommendations
Business problem
• Business imperative for real-time recommendations: to attract new users and retain existing ones
• Key differentiator: show members how they are connected to any other member
• Real-time traversals of social graph not feasible with MySQL cluster. Batch precompute meant stale data.
• Process taking longer & longer: > 1 week!
Background
• World’s second-largest professional network (after LinkedIn)
• 50M members. 30K+ new members daily.
• Over 400 staff with offices in 12 countries
Solution & Benefits
• Neo4j solution implemented in 8 weeks with 3 part-time programmers
• Able to move from batch to real-time: improved responsiveness with up-to-date data.
• Viadeo (at the time) had 8M members and 35M relationships.
• Neo4j cluster now sits at the heart of Viadeo’s professional network, connecting 50M+ professionals

Maaii
Hong Kong
Use case: Social, Mobile
Business problem
• Launched a new mobile communication app “Maaii” allowing consumers to communicate by voice & text (Similar to Line, Viber, Rebtel, VoxOx...)
• Needed to store & relate devices, users, and contacts
• Import phone numbers from users’ address books. Rapidly serve up contacts from central database to the mobile app
• Currently around 3M users w/200M nodes in the graph
Background
• Hong Kong based telephony infrastructure provider (aka M800 aka Pop Media)
• Exclusive China Mobile partner for international toll-free services. SMS Hub & other offerings
• 2012 Red Herring Top 100 Global Winner
Solution & Benefits
• Quick transactional performance for key operations:
‣ friend suggestions (“friend of friend”)
‣ updating contacts, blocking calls, etc.
‣ etc.
• High availability telephony app uses Neo4j clustering
• Strong architecture fit: Scala w/Neo4j embedded

Junisphere
Zürich, Switzerland
Industry: Web/ISV, Communications
Use case: Data Center Management
Business problem
• “Business Service Management” requires mapping of complex graph, covering: business processes --> business services --> IT infrastructure
• Embed capability of storing and retrieving this information into OEM application
• Re-architecting outdated C++ application based on relational database, with Java
Background
• Junisphere AG is a Zurich-based IT solutions provider
• Founded in 2001. Profitable. Self funded.
• Software & services.
• Novel approach to infrastructure monitoring: Starts with the end user, mapped to business processes and services, and dependent infrastructure
Solution & Benefits
• Actively sought out a Java-based solution that could store data as a graph
• Domain model is reflected directly in the database:
‣ “No time lost in translation”
‣ “Our business and enterprise consultants now speak the same language, and can model the domain with the database on a 1:1 ratio.”
• Spring Data Neo4j strong fit for Java architecture

Teachscape
San Francisco, CA
Industry: Education
Use case: Resource Authorization & Access Control
Business problem
• Neo4j was selected to be at the heart of a new architecture.
• The user management system, centered around Neo4j, will be used to support single sign-on, user management, contract management, and end-user access to their subscription entitlements.
Background
• Teachscape, Inc. develops online learning tools for K-12 teachers, school principals, and other instructional leaders.
• Teachscape evaluated relational as an option, considering MySQL and Oracle.
• Neo4j was selected because the graph data model provides a more natural fit for managing organizational hierarchy and access to assets.
Solution & Benefits
• Domain and technology fit
‣ simple domain model where the relationships are relatively complex. Secondary factors included support for transactions, strong Java support, and well-implemented Lucene indexing integration
• Speed and Flexibility
‣ The business depends on being able to do complex walks quickly and efficiently. This was a major factor in the decision to use Neo4j.
• Ease of Use
‣ accommodate efficient access for home-grown and commercial off-the-shelf applications, as well as ad-hoc use.

SevenBridges Genomics
Cambridge, Massachusetts
Industry: Life Sciences
Use case: Content Management
Business problem
• Neo4j is used to store metadata about each sequenced genome (including a pointer to the sequenced genome itself, which is a binary file stored on Amazon S3), and to support search and other forms of information processing against the genomic data.
• graph database was chosen because “Our specific domain maps naturally onto graph paradigm”.
Background
• Bioinformatics company offering gene sequencing "as a service" (over the web)
• Provider of genomic information services
• Needed a new platform to support storage & retrieval of sequenced genomes in the cloud
Solution & Benefits
• Domain fit
‣ Domain naturally lends itself to a graph representation.
‣ Graph model determined to be a perfect fit.
• Agility & Performance
‣ Saved time with Neo4j as compared to the alternatives.
‣ Queries “practically write themselves.”
• Solution Completeness
‣ “Neo4j is incomparably better than other graph databases.”

Really, once you start thinking in graphs it's hard to stop
Recommendations, MDM, Systems Management, Geospatial, Social computing,
Business intelligence, Biotechnology, Making Sense of all that data,
your brain, access control, linguistics, catalogs, genealogy, routing,
compensation, market vectors
What will you build?

Neo4j is a NOSQL Graph Database

A graph database...
NO: not for charts & diagrams, or vector artwork
YES: for storing data that is structured as a graph
remember linked lists, trees?
graphs are the general-purpose data structure
“A relational database may tell you the average age of everyone
in this place,
but a graph database will tell you who is most likely to buy you a
beer.”

Data Modeling

Why Data Modeling
๏ What is modeling?
๏ Aren't we schema free?
๏ How does it work in a graph?
๏ Where should modeling happen? DB or Application

Data Models

Model mis-match
Real World Model

Application Model Database Model

Trinity of models

Whiteboard --> Data
[Figure: Andreas, Peter, Emil and Allison, connected by knows relationships]
// Cypher query - friend of a friend
START n=node(0)
MATCH (n)-->()-[:KNOWS|LIKES]->(foaf)
WHERE NOT((n)-[:KNOWS]->(foaf))
RETURN foaf
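
The friend-of-a-friend idea can also be traced by hand on a small adjacency map. This is a plain-Python sketch: the four names come from the whiteboard figure, but which person knows whom is an assumption, and unlike the query it follows only knows relationships and also drops the starting person from the result.

```python
# Assumed topology for the four people on the whiteboard.
knows = {
    "Andreas": {"Peter", "Allison"},
    "Peter": {"Emil", "Allison"},
    "Allison": set(),
    "Emil": set(),
}


def friends_of_friends(person):
    direct = knows[person]
    result = set()
    for friend in direct:
        result |= knows[friend]  # second hop
    # Keep only people who are not already direct friends (the WHERE NOT part).
    return sorted(result - direct - {person})


foaf = friends_of_friends("Andreas")
```

Allison is reachable in two hops via Peter but is filtered out because Andreas already knows her directly.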

You traverse the graph
// lookup starting point in an index
START n=node:People(name = 'Andreas')

// then traverse to find results
START me=node:People(name = 'Andreas')
MATCH (me)-[:FRIEND]-(friend)-[:FRIEND]-(friend2)
RETURN friend2

SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1

START user = node(1)
MATCH user -[user_skill]-> skill
RETURN skill, user_skill
