Neo4j: Graph-like power
Graph-like power
Roman R.
MATCH (a:Actor),(m:Movie)
WHERE a.name='Keanu Reeves'
AND m.title='The Matrix'
CREATE (a)-[:ACTS_IN]->(m)
Today
○ Graphs in NoSQL world
○ classification
○ definition
○ components
○ Neo4j
○ nodes, rels, props, indexes
○ Cypher
○ PHP and Neo4j
○ Demo
○ Alternatives
○ Q/A
NoSQL Databases
(figure: NoSQL database landscape; legible labels include Tokyo Cabinet, InfiniteGraph, AllegroGraph)

What is a Graph in math
● represents a set of connected objects
● graph:
○ vertex (node/point)
○ edge (arc/line/relationship/arrow) - undirected
○ attribute (property) - on a node or relationship
● types:
○ undirected graph, an ordered pair: G = (V, E)
○ digraph (directed graph): D = (V, A)
○ mixed graph: G = (V, E, A)
V = {1, 2, 3, 4, 5, 6}
E = {{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}}
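The pair G = (V, E) above can be held in code as an adjacency map. This is only a minimal sketch: the helper name `adjacency` is made up for illustration, and each edge is treated as undirected, as in the definition.

```python
# V and E are copied from the slide; each edge is an unordered pair.
V = {1, 2, 3, 4, 5, 6}
E = [{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}]


def adjacency(vertices, edges):
    """Build a vertex -> set-of-neighbours map for an undirected graph."""
    adj = {v: set() for v in vertices}
    for edge in edges:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)  # undirected: store the edge in both directions
    return adj


adj = adjacency(V, E)
degrees = {v: len(neighbours) for v, neighbours in sorted(adj.items())}
print(degrees)
```

Every edge contributes to the degree of two vertices, so the degrees always sum to twice the number of edges.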
What is a Graph database
● stores data in a graph and retrieves vast networks of data
● shines when storing richly-connected data
● consists of nodes, connected by relationships
○ A Graph —records data in→ Nodes —which have→ Properties
○ Nodes —are organized by→ Rels —which also have→ Properties
○ Nodes —are grouped by→ Labels —into→ Sets
○ A Traversal —navigates→ a Graph
it —identifies→ Paths —which order→ Nodes
○ An Index —maps from→ Properties —to either→ Nodes or Rels
○ A Graph Database —manages a→ Graph and
—also manages related→ Indexes
Nodes, Rels, Props, Labels
A Graph
—records data in→ Nodes
—which have→ Properties
Nodes
—are organized by→ Relationships
—which also have→ Properties
Nodes
—are grouped by→ Labels
—into→ Sets
Graph Traversal
A Traversal
—navigates→ a Graph
it —identifies→ Paths
—which order→ Nodes
Example questions:
what music do my friends like that I don’t yet own?
if this power supply goes down, what web services are affected?
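The power-supply question is a reachability traversal: start at one node and follow "depends on me" relationships outward. The sketch below uses a breadth-first walk over a small dependency map. All node names and the `affected` helper are hypothetical, and this is plain Python rather than the Neo4j traversal API.

```python
from collections import deque

# Hypothetical data: an entry X -> [Y, ...] means "Y stops working if X goes down".
POWERS = {
    "power-supply-1": ["rack-a", "rack-b"],
    "rack-a": ["db-server"],
    "rack-b": ["app-server"],
    "db-server": ["billing-service"],
    "app-server": ["billing-service", "search-service"],
    "power-supply-2": ["rack-c"],
    "rack-c": ["mail-service"],
}


def affected(graph, start):
    """Breadth-first traversal: everything reachable from start, in visit order."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependant in graph.get(node, []):
            if dependant not in seen:  # visit each node only once
                seen.add(dependant)
                order.append(dependant)
                queue.append(dependant)
    return order


services = [n for n in affected(POWERS, "power-supply-1") if n.endswith("-service")]
print(services)
```

Only the subgraph reachable from the start node is visited; the second power supply and its rack are never touched.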

Graph Index
An Index
—maps from→ Properties
—to either→ Nodes or Rels
find the Account
for username master-of-graphs
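An index of this kind can be pictured as a map from a property value to the ids of the nodes that carry it. The sketch below is an illustrative in-memory stand-in, not how Neo4j or Lucene stores indexes. The class name `PropertyIndex` and the sample accounts are assumptions.

```python
from collections import defaultdict


class PropertyIndex:
    """Maps the values of one property key to the ids of matching nodes."""

    def __init__(self, key):
        self.key = key
        self._entries = defaultdict(set)

    def add(self, node_id, properties):
        # Only nodes that actually carry the indexed property are recorded.
        if self.key in properties:
            self._entries[properties[self.key]].add(node_id)

    def lookup(self, value):
        return sorted(self._entries.get(value, ()))


# Hypothetical Account nodes, keyed by node id.
accounts = {
    1: {"username": "alice"},
    2: {"username": "master-of-graphs"},
    3: {"email": "no-username@example.org"},
}

index = PropertyIndex("username")
for node_id, props in accounts.items():
    index.add(node_id, props)

print(index.lookup("master-of-graphs"))
```

The lookup goes straight to the matching node ids instead of scanning every node, which is what makes an index a useful starting point for a traversal.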
A Graph Database
—manages a→ Graph and
—also manages related→ Indexes
What a Graph database looks like
A Graph Database transforms an RDBMS

A Graph Database elaborates a Key-Value Store
K* = key
V* = value
A Graph Database relates Column-Family
● BigTable databases are an evolution of key-value,
using "families" to allow grouping of rows
● stored in a graph, the families could become
hierarchical, and the relationships among data
become explicit
A Graph Database navigates a Document Store
D = Document, S = Subdocument, V = Value, D2/S2 = reference
NoSQL Data Models
90% of all use cases
Relational Databases

● intuitive, using a graph model for data representation
● reliable, fully transactional, upholds ACID
● durable and fast, using a custom disk-based, native storage engine
● massively scalable, up to several billion nodes/relationships/properties
● highly-available, when distributed across multiple machines
● expressive, with a powerful, human-readable declarative graph query language
● fast, with a powerful traversal framework for high-speed graph queries
● embeddable, with a few small jars
● simple, accessible through a convenient REST API or an object-oriented Java API
● indexes are based on Apache Lucene; secondary indexes are supported
● has been in commercial development for 10 years (since 2003) and in
production for over 7 years
● Cross-platform; Simple set-up; Well documented; Open source;
● GPL for Community, AGPL for Enterprise
Neo4j features
● CPU - Intel Core i3/i7
● Memory - 2GB .. 16/32GB
● Disk - 10GB SATA .. SSD w/ SATA
● Filesystem - ext4 .. ext4/ZFS
● Software - Oracle Java 7
Neo4j requirements
● Neo4j Community
○ Open-Source High Performance
○ fully ACID transactional graph database
● Neo4j Enterprise
○ High-Performance Cache (up to 10x faster)
○ Horizontal scalability with Neo4j Clustering (predictable scalability)
○ High-availability and online backups
○ Cache based sharding (shard your graph in memory)
○ Advanced Monitoring (operational metrics)
○ Certified for Windows and Linux
○ Email/Phone Support (10x5, 24x7 hours)
○ Subscriptions
■ Personal (up to 3 devs, $100k annual revenue) = FREE
■ Startups (<$10M funding, <$5M annual revenue) = $12k
■ Business (medium, to Global 2000) = Contact Sales
Neo4j license

● for the simple friends-of-friends query, Neo4j is 60% faster than MySQL
● for friends of friends of friends, Neo4j is 180 times faster
● and for the depth-four query, Neo4j is 1,135 times faster
● and MySQL just chokes on the depth-five query
Neo4j vs. MySQL
Neo4j: Nodes
● fundamental units that form a graph
● can have key/value-style properties
● index nodes and relationships
by {key, value} pairs
● represent entities
Neo4j: Relationships #1/2
● connect entities and structure domain
● allow for finding related data
● are always directed (outgoing or incoming)
● are equally well traversed in either direction
● a node can have a relationship to itself
● have a relationship type (label)
Neo4j: Relationships #2/2

Neo4j: Properties
● nodes and relationships can have properties
● are key-value pairs
○ key is a string
○ values can be either a primitive or an array of
one primitive type
■ boolean, String, int, int[], etc.
■ types follow the Java Language Specification
● used for entity attributes, relationship qualities,
and metadata
Neo4j: Labels
● used to group nodes into sets
● any number of labels, including none
● can be added and removed during runtime
● can be used to mark temporary states for nodes
● names are case-sensitive
● CamelCase by convention
Neo4j: Paths
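● a path is one or more nodes with connecting relationships
● shortest possible path: a single node (length zero)
● a path of length one: two nodes joined by one relationship
● another path of length one: a node with a relationship to itself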
Neo4j: Traversal
● Traversal Framework out of the box
● means visiting nodes, following relationships by rules
● in most cases only a subgraph is visited
● callback based traversal API
○ you can specify the traversal rules
● traversing breadth- or depth-first
● open Java API

Neo4j: graph algorithms
● A* (uses the A* algorithm to find the cheapest path between two nodes)
● Dijkstra (dijkstra: uses the Dijkstra algorithm to find the cheapest path
between two nodes)
● PathWithLength (all paths of a certain length (depth)
between two nodes)
● Shortest paths (shortestPath, the default: find all the
shortest paths between two nodes)
● All simple paths (allSimplePaths: find all simple paths
between two nodes, without loops)
● All paths (allPaths: find all available paths between two nodes)
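To make "cheapest path between two nodes" concrete, here is a textbook Dijkstra sketch in plain Python. It is not Neo4j's implementation. The function name, the `ROADS` graph and its weights are illustrative assumptions.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (cost, path) of the cheapest path, or (inf, []) if unreachable."""
    queue = [(0, source, [source])]  # priority queue ordered by accumulated cost
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue  # a cheaper route to this node was already processed
        settled.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical weighted, undirected graph: node -> {neighbour: weight}.
ROADS = {
    "A": {"B": 7, "C": 9, "F": 14},
    "B": {"A": 7, "C": 10, "D": 15},
    "C": {"A": 9, "B": 10, "D": 11, "F": 2},
    "D": {"B": 15, "C": 11, "E": 6},
    "E": {"D": 6, "F": 9},
    "F": {"A": 14, "C": 2, "E": 9},
}

cost, path = dijkstra(ROADS, "A", "E")
print(cost, path)
```

A* follows the same loop but orders the queue by cost plus a heuristic estimate of the remaining distance.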
Neo4j: Schema
● Neo4j is a schema-optional graph database
● introduced in Neo4j 2.0
● eventually available (populated in the background, so
not immediately available for querying)
○ comes online once fully populated
○ on failed status, drop and recreate the index
● can be created for the group of nodes sharing a label
● indexes Nodes & Rels
● node_auto_indexing=false, node_keys_indexable
Neo4j: Index
Neo4j: Constraints
● can help you keep your data clean
● specify the rules for what your data should
look like
● uniqueness constraints are the only available
constraint type

● single server instance
○ nodes = 2^35 (~34 billion)
○ relationships = 2^35 (~34 billion)
○ labels = 2^31 (~2 billion)
○ properties = 2^36 to 2^38 depending on
property types (maximum ~274 billion, always
at least ~68 billion)
○ relationship types = 2^15 (~ 32’000)
Neo4j: Data Size
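The limits on this slide are plain powers of two, so the rounded figures can be checked with a few lines of arithmetic. The dictionary keys below are just labels chosen for this sketch.

```python
# Exponents as listed on the slide; the key names are illustrative labels.
EXPONENTS = {
    "nodes": 35,
    "relationships": 35,
    "labels": 31,
    "properties (at least)": 36,
    "properties (at most)": 38,
    "relationship types": 15,
}

capacities = {name: 2 ** exponent for name, exponent in EXPONENTS.items()}

for name, value in capacities.items():
    print(f"{name}: 2^{EXPONENTS[name]} = {value:,}")
```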
● powerful graph query language
● relatively simple
● declarative grammar (say what you want, not how)
● humane query language
● self-explanatory (based on English prose and neat iconography)
● written in Scala
● pattern-matching (borrows expression approaches from SPARQL)
● aggregation, ordering, limits
● create, update, delete
● structure and most keywords inspired by SQL
● changing rather rapidly (CYPHER 1.9 START ...)
Cypher Query Language
“Makes the simple things easy, and the complex things possible”
Cypher patterns #1/2
● (a)
● (b)
● (a)-->(b)
● (a)-->(b)-->(c)
● (b)-->(c)<--(a)
● (b)-->()<--(a)
● (a)--(b)
● (a)-[*5]->(b)
● (a)-[*3..5]->(b)
○ (a)-[*3..]->(b)
○ (a)-[*..5]->(b)
○ (a)-[*]->(b)
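A variable-length pattern asks for every path between two nodes whose hop count falls inside a range. The sketch below enumerates such paths over a list of directed edges and never reuses the same relationship within one path. That restriction is an assumption about matching semantics made for this illustration. The function name and the edge list are hypothetical too.

```python
def variable_length_paths(edges, start, end, min_hops, max_hops):
    """List paths from start to end that use between min_hops and max_hops edges."""
    paths = []

    def walk(node, used, path):
        if node == end and min_hops <= len(used) <= max_hops:
            paths.append(path)
        if len(used) == max_hops:
            return  # hop budget exhausted
        for i, (src, dst) in enumerate(edges):
            if src == node and i not in used:  # each relationship at most once
                walk(dst, used | {i}, path + [dst])

    walk(start, frozenset(), [start])
    return paths


# Hypothetical directed relationships.
EDGES = [("a", "x"), ("x", "b"), ("a", "b"), ("x", "y"), ("y", "b")]

short_paths = variable_length_paths(EDGES, "a", "b", 1, 2)
longer_paths = variable_length_paths(EDGES, "a", "b", 1, 3)
print(short_paths)
print(longer_paths)
```

Raising the upper bound from 2 to 3 admits the extra route through y, which is why unbounded patterns can become expensive on dense graphs.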
Cypher patterns #2/2
● (a:Label)-->(m)
● (a:User:Admin)-->(m)
● (a)--(m)
● (a)-[r]->(m)
● (a)-[:ACTED_IN]->(m)
● (a)-[r:SOME|ELSE|WTH]->(m)

Cypher: START / RETURN
“It all starts with the START”
Michael Hunger, Cypher webinar, Sep 2012
● designates the start points
● START is optional (in Neo4j >= 2.0)
● START <lookup> RETURN <expression>
● START n=node(0) RETURN n
● START n=node(*) RETURN
Cypher: MATCH
● primary way of getting data from the database
● START <lookup> MATCH <pattern> RETURN <expr>
● OPTIONAL MATCH <pattern> RETURN <expr>
● MATCH (n) RETURN count(n)
● MATCH (actor:Actor) RETURN;
● START me=node(0) MATCH (me)--(f) RETURN
● MATCH (n)-[r]->(m) RETURN n AS FROM, r AS `->`, m AS TO
● creates nodes and relationships
● CREATE (<name>[:label] [properties,..])
● CREATE (<node-in>)-[<var>:RELATION [properties,..]]->(<node-out>);
● CREATE UNIQUE ...
● CREATE (n:Actor { name:"Keanu Reeves" });
● CREATE (keanu)-[:ACTED_IN]->(matrix)
● MATCH (keanu {name:".."}) SET keanu.age=49 RETURN
Cypher: CREATE / SET
Cypher: WHERE
● filters the results
● MATCH <pattern> WHERE <condition> RETURN <expr>
● WHERE =~ "(?i)John.*"
● WHERE NOT ..
● WHERE type(rel) =~ "Perso.*"

Cypher: RETURN
● creates the result table
● any query can return data
● can be nodes, relationships, or properties on these
● RETURN DISTINCT <expression> AS x
● RETURN aggregate(expr) as alias
● RETURN nodes, rels, properties
● RETURN expressions of funcs and operators
● RETURN aggregation funcs on the above
Cypher: etc
● CASE / WHEN / ELSE
● ORDER BY node.key, node2.key, .. ASC|DESC
● LIMIT / SKIP
● WITH (WITH count(*) as c)
● UNION / UNION ALL (combining results from multiple queries)
● USING INDEX/SCAN
● MERGE / SET / DELETE / REMOVE / FOREACH
● Expressions
● Operators
● Comments
● Functions: ALL, ANY, LENGTH, {Math}, {String}, ...
● any updating query will run in a transaction
● ACID
● “it is very important to finish each transaction”
● write lock on node/rel:
○ adding, changing or removing a property on a node/rel
● write lock on node:
○ creating or deleting a node
● write lock on the relationship and both its nodes:
○ creating or deleting a relationship
Cypher: Transactions
Cypher: Aggregation
● count(node/rel/prop)
● count(n), count(n.prop)
● sum(n.prop)
● avg(n.prop)
● percentileDisc(n.prop, {percentile}) - e.g. 0.5 for the median
● stdev(n.prop) - standard deviation within the group
● max(n.prop)
● collect(n.prop)
● RETURN n, count(*)
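The aggregation functions listed above have direct counterparts in plain Python, which can help when checking query results by hand. This is a sketch under assumptions: the `ages` values are made up, `percentile_disc` uses the nearest-rank rule (Neo4j's exact rounding may differ), and stdev is taken to mean the sample standard deviation.

```python
import math
import statistics

ages = [25, 30, 35, 49]  # hypothetical n.prop values for one group


def percentile_disc(values, percentile):
    """Nearest-rank discrete percentile: always returns an existing value."""
    ordered = sorted(values)
    rank = max(math.ceil(percentile * len(ordered)), 1)
    return ordered[rank - 1]


summary = {
    "count": len(ages),
    "sum": sum(ages),
    "avg": statistics.mean(ages),
    "percentileDisc(0.5)": percentile_disc(ages, 0.5),
    "stdev": statistics.stdev(ages),  # sample standard deviation
    "max": max(ages),
    "collect": list(ages),
}

for name, value in summary.items():
    print(name, value)
```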

● SELECT *
FROM Person
WHERE name = 'Valentin' AND age > 30
● START person=node:Person(name="Valentin")
WHERE person.age > 30
RETURN person
Cypher: back to SQL #1/5
Cypher: back to SQL #2/5
● SELECT "Email".*
FROM "Person"
JOIN "Email" ON "Person".id = "Email".person_id
WHERE "Person".name = 'Benedikt'
● START person=node:Person(name="Benedikt")
MATCH person-[:email]->email
RETURN email
Cypher: back to SQL #3/5
● show me all people that are both actors and directors
● SELECT name FROM Person
WHERE person_id IN (SELECT person_id FROM Actor) AND
person_id IN (SELECT person_id FROM Director)
● START person=node:Person("name:*")
WHERE (person)-[:ACTS_IN]->()
AND (person)-[:DIRECTED]->()
RETURN
Cypher: back to SQL #4/5
● show me all Tom Hanks’s co-actors
● SELECT DISTINCT FROM Person tom
JOIN Actor a1 ON tom.person_id = a1.person_id
JOIN Actor a2 ON a1.movie_id = a2.movie_id
JOIN Person co_actor ON co_actor.person_id = a2.person_id
WHERE = "Tom Hanks"
● START tom=node:Person(name="Tom Hanks")
MATCH tom-[:ACTS_IN]->movie,
co_actor-[:ACTS_IN]->movie
RETURN DISTINCT
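The co-actor query boils down to two hops over ACTS_IN relationships: from the actor to their movies, then from those movies back to other actors. Here is a small plain-Python sketch of that logic. The relationship list is a hand-picked sample and the function name is made up for illustration.

```python
# Illustrative sample of (actor, movie) ACTS_IN relationships.
ACTS_IN = [
    ("Tom Hanks", "Apollo 13"),
    ("Kevin Bacon", "Apollo 13"),
    ("Bill Paxton", "Apollo 13"),
    ("Tom Hanks", "Cast Away"),
    ("Helen Hunt", "Cast Away"),
    ("Helen Hunt", "Twister"),
    ("Bill Paxton", "Twister"),
]


def co_actors(acts_in, name):
    """Actors sharing at least one movie with the given actor (DISTINCT, sorted)."""
    movies = {movie for actor, movie in acts_in if actor == name}  # first hop
    return sorted(
        {actor for actor, movie in acts_in if movie in movies and actor != name}
    )  # second hop, de-duplicated by the set


print(co_actors(ACTS_IN, "Tom Hanks"))
```

The set comprehension plays the role of DISTINCT: Bill Paxton and Helen Hunt each appear once even though they share more than one movie in the sample.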

Cypher: back to SQL #5/5
● show me all Lucy’s favorite directors
● SELECT, count(*) FROM Person lucy
JOIN Actor ON lucy.person_id = Actor.person_id
JOIN Director ON Actor.movie_id = Director.movie_id
JOIN Person dir ON Director.person_id = dir.person_id
WHERE = "Lucy Liu"
GROUP BY
ORDER BY count(*) DESC
● START lucy=node:Person(name="Lucy Liu")
MATCH lucy-[:ACTS_IN]->movie,
director-[:DIRECTED]->movie
RETURN, count(*)
ORDER BY, count(*) DESC
START lucy = node:Person(name="Lucy Liu"),
kevin = node:Person(name="Kevin Bacon")
MATCH p = shortestPath( lucy-[:ACTS_IN*]-kevin )
RETURN EXTRACT (n in NODES(p): COALESCE(, n.title?))
Cypher: back to SQL #6/5
Neo4j Shell
● command-line shell for running Cypher queries
● supports remote shell
● :schema
● bash# neo4j-shell -path data/graph.db -readonly
-config conf/
-c "<command>"
Neo4j: Security
● does not deal with data encryption explicitly
● all means built into the Java platform can be used
● an encrypted datastore can be used
● webadmin over HTTPS

SPARQL
● manipulates data stored in RDF format
● focused on matching sets of triples
PREFIX foaf: <>
SELECT ?name ?email
WHERE {
?person a foaf:Person.
?person foaf:name ?name.
?person foaf:mbox ?email.
}
Gremlin
● graph traversal language
● scripting language
● Pipe & Filter (similar to jQuery)
● across different graph databases
● based on Groovy (limited to Java)
● not as stable in Neo4j
● XPath like
● ./outE[label="family"]/inV/@name
● g.v(1).out('likes').in('likes').out('likes').groupCount(m)
●'x').out.groupCount(m).loop('x'){c++ < 1000}
● g.v(1).in('LOVE_OF').out('SOME_IN').has('title','abc').back(2)
Neo4j and PHP
● everyman/neo4jphp
○ PHP wrapper for the Neo4j using REST interface
○ Follows the PSR-0 autoloading standard
○ Basic wrappers for all components
○ Last update - a month ago
○ supports Gremlin
● Neo4j-PHP OGM (built largely on top of an existing client library)
○ Object Graph Mapper, inspired by Doctrine
○ based on DoctrineCommon
○ borrows significantly from the Doctrine ORM design
○ uses annotations on classes
○ MIT Licence
● Neo4J PHP REST API client
○ Using Neo4j REST API
○ Node create/find/delete
○ Relationship create/list/filter
High Availability with Neo4j
● in HA there is a single master and zero or more slaves
● slaves synchronize with the master to preserve consistency
● the master writes to a slave before the transaction completes

Demo Example Datasets:
● DrWho (nodes=1'060; rels=2'286)
● Cineasts Movies & Actors (nodes=64'069; rels=121'778)
● Hubway Data Challenge (nodes=554'674; rels=2'011'904)
GraphGist:
● JIRA and neo4j
● PHP and neo4j
● Kant in neo4j
Gephi (win, nix, mac)
Neoclipse (eclipse plugin)

KeyLines (JavaScript library)
Graffeine (npm package)
Neovigator (neography + processing.js)
Cloud
● Heroku
○ GrapheneDB beta
○ bash$ heroku addons:add graphenedb
● Jelastic Cloud PaaS

Analogues
● GrapheneDB - based on neo4j
● AllegroGraph - Closed Source, Commercial, RDF-QuadStore
○ graph database built around the W3C spec for the Resource
Description Framework
○ supports SPARQL, RDFS++, and Prolog
● Sones - Closed Source, .NET focused
● Virtuoso - Closed Source, RDF focused
● GraphDB - graph database built in .NET by the German company sones
● InfiniteGraph - goal is to create a graph database with "virtually
unlimited scalability."
● FlockDB
● - book, O'REILLY
● http://www.cs.usfca.edu/~galles/visualization/Algorithms.html - Graph
Algorithms visualization
Conclusion
● best used for graph-style data:
rich or complex,
structured and dense,
deep graphs with unlimited depth and cycles,
with weighted connections,
interconnected data
● new functionality can be added quickly without impacting
existing deployments
● being schema-less forces you to rethink your entire approach to data
● not a silver bullet for all problems