Eifrem neo4j

Neo4j
the benefits of
graph databases

What's the plan?
Why now? – Four trends

NoSQL overview

Graph databases && Neo4j

Conclusions

Food

Trend 1: data set size
[Chart: data set size, 2007 to 2010; the values shown are 40 and 988]

Trend 2: connectedness
[Chart: information connectivity over time, 1990 to 2020, spanning web 1.0, web 2.0 and “web 3.0”. In order of increasing connectivity: text documents, hypertext, RSS, blogs, user-generated content, wikis, tagging, folksonomies, RDF, ontologies, Giant Global Graph (GGG)]

Trend 3: semi-structure
Individualization of content!
In the salary lists of the 1970s, all elements had exactly
one job
In the salary lists of the 2000s, we need 5 job columns!
Or 8? Or 15?

Trend accelerated by the decentralization of content
generation that is the hallmark of the age of participation
(“web 2.0”)

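Aside: RDBMS performance
[Chart: relational database performance (y axis) against data complexity (x axis). Marked along the complexity axis: salary list, majority of webapps, social network, semantic trading. A bracket labeled “custom” groups part of the range.]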

Trend 4: architecture

1990s: Database as integration hub

Trend 4: architecture

2000s: (Slowly towards...)
Decoupled services with own backend

Why NoSQL 2009?
Trend 1: Size.

Trend 2: Connectivity.

Trend 3: Semi-structure.

Trend 4: Architecture.

First off: the damn name
NoSQL is NOT “Never SQL”

NoSQL is NOT “No To SQL”

NoSQL is NOT “WE HATE CHRIS' DOG”

NoSQL
is simply

Not Only SQL!

Four (emerging) NoSQL categories
Key-value stores
Based on Amazon's Dynamo paper
Data model: (global) collection of K-V pairs
Example: Dynomite, Voldemort, Tokyo

BigTable clones
Based on Google's BigTable paper
Data model: big table, column families
Example: HBase, Hypertable

Four (emerging) NoSQL categories
Document databases
Inspired by Lotus Notes
Data model: collections of K-V collections
Example: CouchDB, MongoDB

Graph databases
Inspired by Euler & graph theory
Data model: nodes, rels, K-V on both
Example: AllegroGraph, VertexDB, Neo4j
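
The first three data models can be sketched as plain Java collection shapes. This is only an illustration built on the standard library: the class name DataModelShapes, the keys and the sample values are hypothetical, and none of it reflects the API of the products named above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of data model shapes only; not any product's API.
class DataModelShapes {

    // Key-value store: one (global) collection of key-value pairs.
    static Map<String, String> keyValueStore() {
        Map<String, String> store = new LinkedHashMap<>();
        store.put("user:1:name", "Emil");
        return store;
    }

    // BigTable clone: row key -> column family -> column -> value.
    static Map<String, Map<String, Map<String, String>>> bigTable() {
        Map<String, String> infoColumns = new LinkedHashMap<>();
        infoColumns.put("name", "Emil");
        Map<String, Map<String, String>> families = new LinkedHashMap<>();
        families.put("info", infoColumns);
        Map<String, Map<String, Map<String, String>>> table = new LinkedHashMap<>();
        table.put("row-1", families);
        return table;
    }

    // Document database: a collection of key-value collections (documents).
    static List<Map<String, Object>> documentCollection() {
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("name", "Emil");
        document.put("age", 29);
        List<Map<String, Object>> collection = new ArrayList<>();
        collection.add(document);
        return collection;
    }

    public static void main(String[] args) {
        System.out.println(keyValueStore());
        System.out.println(bigTable());
        System.out.println(documentCollection());
    }
}
```

The graph model does not reduce to nested maps in the same way, because relationships are first-class entries that carry their own key-value properties.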

NoSQL data models
[Chart: data size (y axis) against data complexity (x axis). From largest size and lowest complexity to smallest size and highest complexity: key-value stores, Bigtable clones, document databases, graph databases]

NoSQL data models
[Same chart with two annotations: “90% of use cases” and “(This is still ... of nodes & relationships)”]

The Graph DB model: representation
Core abstractions:
Nodes
Relationships between nodes
Properties on both

[Diagram: a node with name = “Emil”, age = 29, sex = “yes”; a relationship with type = KNOWS, time = 4 years; a node with type = car, vendor = “SAAB”, model = “95 Aero”]
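
The three core abstractions can be modelled in a few lines of plain Java. The sketch below is a hypothetical in-memory stand-in, not the Neo4j API: the names PropertyGraphSketch, SketchNode and SketchRelationship are invented for illustration, and which nodes the KNOWS relationship connects is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; these names are not the Neo4j API.
class PropertyGraphSketch {

    // A node carries key-value properties and its outgoing relationships.
    static class SketchNode {
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<SketchRelationship> outgoing = new ArrayList<>();

        SketchRelationship relateTo(SketchNode end, String type) {
            SketchRelationship rel = new SketchRelationship(this, end, type);
            outgoing.add(rel);
            return rel;
        }
    }

    // A relationship connects two nodes, has a type, and carries its own properties.
    static class SketchRelationship {
        final SketchNode start;
        final SketchNode end;
        final String type;
        final Map<String, Object> properties = new LinkedHashMap<>();

        SketchRelationship(SketchNode start, SketchNode end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Builds a two-node graph using property values from the slide.
    static SketchNode buildExample() {
        SketchNode emil = new SketchNode();
        emil.properties.put("name", "Emil");
        emil.properties.put("age", 29);

        SketchNode acquaintance = new SketchNode(); // assumed second person, unnamed
        SketchRelationship knows = emil.relateTo(acquaintance, "KNOWS");
        knows.properties.put("time", "4 years");
        return emil;
    }

    public static void main(String[] args) {
        SketchNode emil = buildExample();
        SketchRelationship rel = emil.outgoing.get(0);
        System.out.println(emil.properties + " -[" + rel.type + " " + rel.properties + "]->");
    }
}
```

Properties live on both the node and the relationship, which is what separates this model from a plain adjacency list.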

Example: The Matrix
[Diagram: a graph of characters from The Matrix.
Nodes:
name = “Thomas Anderson”, age = 29
name = “Morpheus”, rank = “Captain”, occupation = “Total badass”
name = “Trinity”
name = “Cypher”, last name = “Reagan”
name = “Agent Smith”, version = 1.0b, language = C++
name = “The Architect”
Relationships: KNOWS edges between the characters and one CODED_BY edge. Relationship properties shown: disclosure = public, disclosure = secret, age = 3 days, age = 6 months]

Code (1): Building a node space
NeoService neo = ... // Get factory

// Create Thomas 'Neo' Anderson
Node mrAnderson = neo.createNode();
mrAnderson.setProperty( "name", "Thomas Anderson" );
mrAnderson.setProperty( "age", 29 );

// Create Morpheus
Node morpheus = neo.createNode();
morpheus.setProperty( "name", "Morpheus" );
morpheus.setProperty( "rank", "Captain" );
morpheus.setProperty( "occupation", "Total bad ass" );

// Create a relationship representing that they know each other
mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
// ...create Trinity, Cypher, Agent Smith, Architect similarly

Code (1): Building a node space
NeoService neo = ... // Get factory
Transaction tx = neo.beginTx();

// Create Thomas 'Neo' Anderson
Node mrAnderson = neo.createNode();
mrAnderson.setProperty( "name", "Thomas Anderson" );
mrAnderson.setProperty( "age", 29 );

// Create Morpheus
Node morpheus = neo.createNode();
morpheus.setProperty( "name", "Morpheus" );
morpheus.setProperty( "rank", "Captain" );
morpheus.setProperty( "occupation", "Total bad ass" );

// Create a relationship representing that they know each other
mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
// ...create Trinity, Cypher, Agent Smith, Architect similarly

tx.commit();

Code (1b): Defining RelationshipTypes
// In package org.neo4j.api.core
public interface RelationshipType
{
    String name();
}

// In package org.yourdomain.yourapp
// Example on how to roll dynamic RelationshipTypes
class MyDynamicRelType implements RelationshipType
{
    private final String name;
    MyDynamicRelType( String name ){ this.name = name; }
    public String name() { return this.name; }
}

// Example on how to kick it, static-RelationshipType-like
enum MyStaticRelTypes implements RelationshipType
{
    KNOWS,
    WORKS_FOR,
}

Whiteboard friendly
[Diagram: whiteboard sketch of a graph with the nodes Björn, Big Car and DayCare and relationships labeled owns, build and drives]

The Graph DB model: traversal
Traverser framework for high-performance traversing
across the node space

[Diagram: a node with name = “Emil”, age = 29, sex = “yes”; a relationship with type = KNOWS, time = 4 years; a node with type = car, vendor = “SAAB”, model = “95 Aero”]

Example: Mr Anderson's friends
[Diagram: the Matrix graph, with KNOWS relationships leading out from Thomas Anderson]


Code (2): Traversing a node space
// Instantiate a traverser that returns Mr Anderson's friends
Traverser friendsTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_GRAPH,
    ReturnableEvaluator.ALL_BUT_START_NODE,
    RelTypes.KNOWS,
    Direction.OUTGOING );

// Traverse the node space and print out the result
System.out.println( "Mr Anderson's friends:" );
for ( Node friend : friendsTraverser )
{
    System.out.printf( "At depth %d => %s%n",
        friendsTraverser.currentPosition().getDepth(),
        friend.getProperty( "name" ) );
}

$ bin/start-neo-example
Mr Anderson's friends:

At depth 1 => Morpheus
At depth 1 => Trinity
At depth 2 => Cypher
At depth 3 => Agent Smith
$
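
What a breadth-first traversal over outgoing KNOWS relationships does can be sketched without Neo4j at all. The code below is a plain-Java approximation, not the Traverser framework: the class FriendsTraversalSketch is hypothetical, and the edge list is an assumption chosen to be consistent with the depths printed above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java sketch of a breadth-first traversal; not the Neo4j Traverser API.
class FriendsTraversalSketch {

    // Outgoing KNOWS edges. Assumed edge list, consistent with the printed depths.
    static Map<String, List<String>> matrixGraph() {
        Map<String, List<String>> knows = new LinkedHashMap<>();
        knows.put("Thomas Anderson", List.of("Morpheus", "Trinity"));
        knows.put("Morpheus", List.of("Trinity", "Cypher"));
        knows.put("Cypher", List.of("Agent Smith"));
        return knows;
    }

    // Visits every node reachable from start, breadth first, and returns one
    // line per visited node (all but the start node) with its depth.
    static List<String> friendsOf(Map<String, List<String>> knows, String start) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (!current.equals(start)) {
                result.add("At depth " + depth.get(current) + " => " + current);
            }
            for (String friend : knows.getOrDefault(current, List.of())) {
                if (visited.add(friend)) { // add() is false if already visited
                    depth.put(friend, depth.get(current) + 1);
                    queue.add(friend);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Mr Anderson's friends:");
        for (String line : friendsOf(matrixGraph(), "Thomas Anderson")) {
            System.out.println(line);
        }
    }
}
```

The queue gives the breadth-first order, and the visited set keeps Trinity from being reported twice even though two KNOWS edges lead to her.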

Example: Friends in love?
[Diagram: the Matrix graph with a LOVES relationship added alongside the KNOWS relationships]


Code (3a): Custom traverser
// Create a traverser that returns all “friends in love”
Traverser loveTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_GRAPH,
    new ReturnableEvaluator()
    {
        public boolean isReturnableNode( TraversalPosition pos )
        {
            return pos.currentNode().hasRelationship(
                RelTypes.LOVES, Direction.OUTGOING );
        }
    },
    RelTypes.KNOWS,
    Direction.OUTGOING );

Code (3a): Custom traverser
// Traverse the node space and print out the result
System.out.println( "Who’s a lover?" );
for ( Node person : loveTraverser )
{
    System.out.printf( "At depth %d => %s%n",
        loveTraverser.currentPosition().getDepth(),
        person.getProperty( "name" ) );
}

$ bin/start-neo-example
Who’s a lover?

At depth 1 => Trinity
$
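
The idea behind a custom returnable evaluator is to walk everything but return only the nodes a predicate accepts. A standard java.util.function.Predicate can show the same idea. This is a sketch under assumptions, not Neo4j code: LoveTraversalSketch is a hypothetical name, and both edge lists, including the target of the LOVES edge, are assumed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Plain-Java sketch of "walk everything, return only accepted nodes".
// Not the Neo4j API; the graph contents below are assumptions.
class LoveTraversalSketch {

    // Assumed outgoing KNOWS edges.
    static final Map<String, List<String>> KNOWS = Map.of(
            "Thomas Anderson", List.of("Morpheus", "Trinity"),
            "Morpheus", List.of("Trinity", "Cypher"),
            "Cypher", List.of("Agent Smith"));

    // Assumed outgoing LOVES edges (the target node is an assumption).
    static final Map<String, List<String>> LOVES = Map.of(
            "Trinity", List.of("Thomas Anderson"));

    // Breadth-first walk over KNOWS; a node is returned only if the
    // predicate accepts it, but traversal continues either way.
    static List<String> traverse(String start, Predicate<String> isReturnable) {
        List<String> returned = new ArrayList<>();
        Set<String> visited = new HashSet<>(Set.of(start));
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (isReturnable.test(current)) {
                returned.add(current);
            }
            for (String next : KNOWS.getOrDefault(current, List.of())) {
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        return returned;
    }

    // Counterpart of the anonymous evaluator: accept nodes with an outgoing LOVES edge.
    static List<String> friendsInLove(String start) {
        return traverse(start, person -> LOVES.containsKey(person));
    }

    public static void main(String[] args) {
        System.out.println("Who's a lover? " + friendsInLove("Thomas Anderson"));
    }
}
```

Keeping the filter separate from the walk means the same traversal can answer different questions by swapping the predicate.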

Bonus code: domain model
How do you implement your domain model?
Use the delegator pattern, i.e. every domain entity wraps a
Neo4j primitive:
// In package org.yourdomain.yourapp
class PersonImpl implements Person
{
    private final Node underlyingNode;
    PersonImpl( Node node ){ this.underlyingNode = node; }

    public String getName()
    {
        // getProperty returns Object, so cast to the expected type
        return (String) this.underlyingNode.getProperty( "name" );
    }
    public void setName( String name )
    {
        this.underlyingNode.setProperty( "name", name );
    }
}

Domain layer frameworks
Qi4j (www.qi4j.org)
Framework for doing DDD in pure Java5
Defines Entities / Associations / Properties
Sound familiar? Nodes / Rel's / Properties!
Neo4j is an “EntityStore” backend

NeoWeaver (http://components.neo4j.org/neo-weaver)
Weaves Neo4j-backed persistence into domain objects
at runtime (dynamic proxy / cglib based)
Veeeery alpha

Neo4j system characteristics
Disk-based
Native graph storage engine with custom binary on-disk
format
Transactional
JTA/JTS, XA, 2PC, Tx recovery, deadlock detection,
MVCC, etc
Scales up (what's the x and the y?)
Several billions of nodes/rels/props on single JVM
Robust
6+ years in 24/7 production

Social network pathExists()
~1k persons
Avg 50 friends per person
pathExists(a, b), limited to depth 4
Two backends
Caches warmed up to eliminate disk IO

Social network pathExists()
[Diagram: social network with the persons Emil, Mike, Kevin, John, Marcus, Bruce and Leigh]

Backend                   # persons    Query time
Relational database       1 000        2 000 ms
Graph database (Neo4j)    1 000        2 ms
Graph database (Neo4j)    1 000 000    2 ms
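
A depth-limited pathExists(a, b) query can be sketched as a breadth-first search that gives up after a fixed number of hops. This is a plain-Java illustration, not the code behind the measurements above: PathExistsSketch and its sample edges are hypothetical, and friendships are treated as directed for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java sketch of a depth-limited reachability check; not benchmark code.
class PathExistsSketch {

    // Returns true if 'to' can be reached from 'from' in at most maxDepth hops.
    static boolean pathExists(Map<String, List<String>> friends,
                              String from, String to, int maxDepth) {
        if (from.equals(to)) {
            return true;
        }
        Set<String> visited = new HashSet<>();
        visited.add(from);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(from);
        // Expand one level of the search per iteration, up to maxDepth levels.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String person : frontier) {
                for (String friend : friends.getOrDefault(person, List.of())) {
                    if (friend.equals(to)) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    // Names come from the slide's diagram; the edges are hypothetical.
    static Map<String, List<String>> exampleNetwork() {
        Map<String, List<String>> friends = new HashMap<>();
        friends.put("Emil", List.of("Mike", "Kevin"));
        friends.put("Mike", List.of("John"));
        friends.put("John", List.of("Marcus"));
        friends.put("Marcus", List.of("Bruce"));
        friends.put("Bruce", List.of("Leigh"));
        return friends;
    }

    public static void main(String[] args) {
        Map<String, List<String>> network = exampleNetwork();
        System.out.println(pathExists(network, "Emil", "Bruce", 4));
        System.out.println(pathExists(network, "Emil", "Leigh", 4));
    }
}
```

The work done depends on how many persons sit within the depth limit of the start node, not on the total size of the network.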

Pros & Cons compared to RDBMS
+ No O/R impedance mismatch (whiteboard friendly)
+ Can easily evolve schemas
+ Can represent semi-structured info
+ Can represent graphs/networks (with performance)

- Lacks tool and framework support
- Few other implementations => potential lock-in
- No support for ad-hoc queries

Language bindings
Neo4j.py – bindings for Jython and CPython
http://components.neo4j.org/neo4j.py
Neo4jrb – bindings for JRuby (incl RESTful API)
http://wiki.neo4j.org/content/Ruby
Clojure
http://wiki.neo4j.org/content/Clojure
Scala (incl RESTful API)
http://wiki.neo4j.org/content/Scala
… .NET? Erlang?

Conclusion
Graphs && Neo4j => teh awesome!
Available NOW under AGPLv3 / commercial license
AGPLv3: “if you're open source, we're open source”
If you have proprietary software, you must buy a commercial
license
But up to 1M primitives it's free for all uses!
Download
http://neo4j.org
Feedback
http://lists.neo4j.org

Poop 1
Key-value stores?
=> the awesome
… if you have 1000s of BILLIONS of records OR you don't
care about programmer productivity

What if you had no variables at all in your programs except
a single globally accessible hashtable?
Would your software be maintainable?

Poop 2
In a not-suck architecture...

… the only thing that makes sense is to have an
embedded database.

Looking ahead: polyglot persistence

Questions?

Image credit: lost again! Sorry :(
