This document provides an overview of graph databases and Neo4j. It defines what a graph is mathematically and in the context of databases. It describes the key components of Neo4j including nodes, relationships, properties, labels, paths, traversals, and indexes. It also discusses the Cypher query language, performance advantages of Neo4j over SQL databases, and basic requirements and licensing options.
Introduction to Spark Datasets - Functional and relational together at last
Spark Datasets are an evolution of Spark DataFrames that let us apply both functional and relational transformations to big data with the speed of Spark.
Beyond Wordcount with Spark Datasets (and scaling) - Nike PDX Jan 2018
The document discusses Apache Spark Datasets and how they compare to RDDs and DataFrames. Some key points:
- Datasets provide better performance than RDDs due to a smarter optimizer, more efficient storage formats, and faster serialization. They also offer simplicity advantages over RDDs for things like windowed operations and multi-column aggregates.
- Datasets allow mixing of functional and relational styles more easily than RDDs or DataFrames. The optimizer has more information from Datasets' schemas and can perform optimizations like partial aggregation.
- Datasets address some of the limitations of DataFrames, making it easier to write UDFs and handle iterative algorithms. They provide a typed API, compared to the untyped API of DataFrames.
- Spark ML pipelines involve estimators that are trained on datasets to produce immutable transformers.
- A transformer must define transformSchema() to validate the input schema, transform() to do the work, and copy() for cloning.
- Configurable transformers take parameters like inputCol and outputCol to allow configuration for meta algorithms.
- Estimators are similar but fit() returns a model instead of directly transforming.
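The estimator/transformer pattern in the bullets above can be sketched in plain Python. This is an illustrative stand-in, not the real Spark ML API; the method names (transform_schema, transform, copy, fit) simply mirror the description:

```python
# Plain-Python sketch of the estimator/transformer pattern described
# above. Names mirror Spark ML's shape, but this is NOT the real API.

class WordCountTransformer:
    """A 'transformer', configurable via inputCol/outputCol-style params."""
    def __init__(self, input_col, output_col):
        self.input_col, self.output_col = input_col, output_col

    def transform_schema(self, schema):
        # Validate the input schema and describe the output schema.
        if self.input_col not in schema:
            raise ValueError(f"missing column: {self.input_col}")
        return schema + [self.output_col]

    def transform(self, rows):
        # Do the work: add a word-count column (rows are dicts here).
        return [{**r, self.output_col: len(r[self.input_col].split())}
                for r in rows]

    def copy(self):
        return WordCountTransformer(self.input_col, self.output_col)


class MeanModel:
    """The immutable 'model' a fitted estimator produces."""
    def __init__(self, col, mean):
        self.col, self.mean = col, mean

    def transform(self, rows):
        return [{**r, "centered": r[self.col] - self.mean} for r in rows]


class MeanEstimator:
    """An 'estimator': fit() is trained on a dataset and returns a model."""
    def __init__(self, input_col):
        self.input_col = input_col

    def fit(self, rows):
        mean = sum(r[self.input_col] for r in rows) / len(rows)
        return MeanModel(self.input_col, mean)
```

The key point the bullets make is visible here: the estimator itself never transforms data; fitting it produces a separate, immutable model that does.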
This session will cover our own and the community's experiences scaling Spark jobs to large datasets, and the resulting best practices, along with code snippets to illustrate them.
The planned topics are:
Using Spark counters for performance investigation
Spark collects a large number of statistics about our code, but how often do we really look at them? We will cover how to investigate performance issues and figure out where to best spend our time using both counters and the UI.
Working with Key/Value Data
Replacing groupByKey for awesomeness
groupByKey makes it too easy to accidentally collect individual records that are too large to process. We will talk about how to replace it in different common cases with more memory-efficient operations.
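The difference can be sketched in plain Python (a conceptual model, not Spark itself): a groupByKey-style operation materializes every value for a key before anything is reduced, while a reduceByKey-style combine folds each value into a running result as it arrives.

```python
from collections import defaultdict

# Conceptual sketch (plain Python, not Spark): why combining per key
# is more memory-friendly than grouping per key.

def group_by_key(pairs):
    # Materializes EVERY value per key before any reduction --
    # one very hot key can blow up memory.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)

def reduce_by_key(pairs, combine):
    # Folds each value into a running result -- only one value per
    # key is held at a time, no matter how skewed the data is.
    acc = {}
    for k, v in pairs:
        acc[k] = combine(acc[k], v) if k in acc else v
    return acc

pairs = [("a", 1), ("b", 2), ("a", 3), ("a", 4)]
# Both approaches yield the same per-key sums, but the second never
# builds the full [1, 3, 4] list for key "a".
```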
Effective caching & checkpointing
Being able to reuse previously computed RDDs without recomputing can substantially reduce execution time. Choosing when to cache, checkpoint, or what storage level to use can have a huge performance impact.
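The trade-off can be illustrated with a toy example (plain Python, not Spark's persist/checkpoint machinery): once a result is cached, repeated reuse no longer pays the recomputation cost.

```python
import functools

# Toy illustration of the reuse-vs-recompute trade-off. In Spark the
# analogous choice is rdd.cache()/persist() vs recomputing lineage.

calls = {"n": 0}

def expensive(x):
    calls["n"] += 1          # stands in for an expensive recomputation
    return x * x

# Wrapping with a cache means repeated use computes the result once.
cached_expensive = functools.lru_cache(maxsize=None)(expensive)
```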
Considerations for noisy clusters
Functional transformations with Spark Datasets
How to keep some of the benefits of Spark’s DataFrames while retaining the ability to work with arbitrary Scala code
Big Data Processing using Apache Spark and Clojure
Talk given at ClojureD conference, Berlin
Apache Spark is an engine for efficiently processing large amounts of data. We show how to apply the elegance of Clojure to Spark - fully exploiting the REPL and dynamic typing. There will be live coding using our gorillalabs/sparkling API.
In the presentation, we will of course introduce the core concepts of Spark, like resilient distributed datasets (RDDs). You will also learn how Spark’s concepts resemble ones well known from Clojure, like persistent data structures and functional programming.
Finally, we will provide some Do’s and Don’ts for you to kick off your Spark program based upon our experience.
About Paulus Esterhazy and Christian Betz
A Lisp hacker for several years, and a Java guy for some more, Chris turned to Clojure for production code in 2011. He has been Project Lead, Software Architect, and VP Tech along the way, and is interested in AI and data visualization.
Now, working on the heart of data driven marketing for Performance Media in Hamburg, he turned to Apache Spark for some Big Data jobs. Chris released the API-wrapper ‘chrisbetz/sparkling’ to fully exploit the power of his compute cluster.
Paulus Esterhazy
Paulus is a philosophy PhD turned software engineer with an interest in functional programming and a penchant for hammock-driven development.
He currently works as Senior Web Developer at Red Pineapple Media in Berlin.
A super fast introduction to Spark and glance at BEAM
Apache Spark is one of the most popular general-purpose distributed systems, with built-in libraries to support everything from ML to SQL. Spark has APIs across languages including Scala, Java, Python, and R -- with more third-party language support (like Julia & C#). Apache BEAM is a cross-platform tool for building on top of different distributed systems, but it is in its early stages. This talk will introduce the core concepts of Apache Spark and look to the potential future of Apache BEAM.
Apache Spark has two core abstractions for representing distributed data and computations. This talk will introduce the basics of RDDs and Spark DataFrames & Datasets, and Spark’s method for achieving resiliency. Since it’s a big data talk, we will include the almost-required wordcount example, and end the Spark part with follow-up pointers on Spark’s new ML APIs. For folks who are interested, we’ll then talk a bit about portability, and how Apache BEAM aims to improve portability (as well as its unique approach to cross-language support).
Slides from Holden's talk at https://www.meetup.com/Wellington-Data-Scaling-Chats/events/mdcsdpyxcbxb/
MongoDB is a document-oriented NoSQL database that uses a document-data model. It provides horizontal scaling with auto-sharding and replication. MongoDB can store documents in collections without a defined schema and support dynamic queries and indexing. RealNetworks uses MongoDB with Scala and other technologies for an Android app to send notifications to devices with installed RealNetworks applications at scale.
Scaling with apache spark (a lesson in unintended consequences) strange loo...
This document discusses scaling Apache Spark applications and some of the unintended consequences that can arise. It covers Spark's core abstractions of RDDs and DataFrames for distributed data and computation. It explains how Spark's lazy evaluation model and use of deterministic partitioning can impact reusing data and operations like groupByKey. It also discusses challenges that can arise from Spark's support for arbitrary functions and working with non-JVM languages like Python.
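Two of the ideas in that summary, lazy evaluation and deterministic partitioning, can be sketched in plain Python (a conceptual model, not Spark itself):

```python
# Plain-Python sketch of two Spark ideas mentioned above.

def lazy_map(f, data):
    # Nothing runs until the result is consumed -- like an RDD
    # transformation, which only executes when an action is called.
    for x in data:
        yield f(x)

def partition_for(key, num_partitions):
    # Deterministic partitioning: the same key always lands in the
    # same partition, which is what makes shuffle output reusable.
    return hash(key) % num_partitions
```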
Streaming ML on Spark: Deprecated, experimental and internal APIs galore!
Slides from: https://www.meetup.com/Sydney-Apache-Spark-User-Group/events/246892684/
Welcome to the first Sydney Spark Meetup in 2018!
We are very glad to have visiting Apache Spark committer Holden Karau give a talk on streaming machine learning. Title: Streaming ML w/Spark (and why it's a bit painful today & #workingonit)
Apache Spark is one of the most popular distributed systems, and it has built in libraries for both machine learning and streaming. This talk will cover Spark's two streaming libraries, look at the future, and how to make streaming ML work today (for both serving and prediction). If you aren't familiar with Spark, that's ok! We'll spend the first ~5 minutes covering just enough to get through the rest of the talk, and for those of you already familiar you can spend those ~5 minutes downloading the sample code :)
About Holden:
Holden is a transgender Canadian open source developer advocate @ Google with a focus on Apache Spark, BEAM, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer on the Apache Spark, SystemML, and Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.
A couple of us will be at the doors of 60 Margaret St to let people in until 6.10pm.
Beyond Shuffling - Effective Tips and Tricks for Scaling Spark (Vancouver Sp...
This document provides a summary of a presentation on scaling Apache Spark. It discusses techniques for reusing RDDs through caching, persistence levels and checkpointing. It also covers best practices for working with key-value data to avoid problems from groupByKey, and using Spark SQL and accumulators. Finally, it previews bringing code generation to Spark ML to improve performance.
This document discusses extending Spark ML pipelines with custom estimators and transformers. It begins with an overview of Spark ML and the pipeline API. Then it demonstrates how to build a simple hardcoded word count transformer and configurable transformer. It discusses important aspects like transforming the input schema, parameters, and model fitting. The document provides guidance on configuration, persistence, serving models, and resources for learning more about custom Spark ML components.
Java Performance Tips (SoCal Code Camp San Diego 2014)
Slides for my presentation at SoCal Code Camp, June 29, 2014 (http://www.socalcodecamp.com/socalcodecamp/session.aspx?sid=68942cd0-6714-4753-a218-20d4b48da07d)
The document discusses programming techniques for the semantic web including LITEQ, a language for integrating RDF types and queries into programming languages. LITEQ allows programmers to navigate schemas, define types aligned with programming languages, and retrieve typed instances. The document also presents SchemEX, an index for efficiently searching RDF data sources in the linked open data cloud based on their schemas.
These are the slides for the session I presented at SoCal Code Camp San Diego on July 27, 2013.
http://www.socalcodecamp.com/socalcodecamp/session.aspx?sid=6b28337d-6eae-4003-a664-5ed719f43533
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
These are the slides for the session I presented at SoCal Code Camp Los Angeles on November 10, 2013.
http://www.socalcodecamp.com/socalcodecamp/session.aspx?sid=8cdfd955-2cd4-44a2-ad08-5353e079685a
Keanu Reeves dropped out of high school at age 17 to become an actor. He had early roles in commercials and television shows in the 1980s before gaining fame for his role in the Matrix trilogy of films. Some key facts about Reeves: he was chosen as one of the 50 most beautiful people by People magazine and ranked 23rd in Empire magazine's top 100 movie stars list. He also plays bass guitar, and film roles inspired his hobbies of horseback riding and surfing.
This document proposes a method for recommending new bands to users based on bands they already like. It involves taking a band a user likes, finding other bands that are connected or related to that band through various sources like being on the same record label, having collaborated, or been reviewed together. These connected bands would then be recommended to the user, ranked by the strength of their aggregate connections. However, the method is critiqued for only considering directly connected bands, ignoring bands' quality, popularity, and how users' tastes may evolve over time.
This document discusses approaches to music recommendation using the Million Song Dataset. It examines using k-means clustering on listening histories to group users and songs, and user-based collaborative filtering to find similar users and make recommendations. K-means achieved a mean average precision of 0.01008 using multiple centroids and modified metadata. Collaborative filtering on 1,000 users achieved 0.00822 precision, improving to 0.1127 on 110,000 users. Future work could include ensemble techniques, additional metadata, and distributed k-means.
Graph Adoption at Gamesys - Toby O'Rourke @ GraphConnect SF 2013
Gamesys is a major online gaming company that handles billions in wagers annually. They built an internal social network graph database using Neo4j to model relationships between players and incentivize referrals. This helped reduce customer acquisition costs. Neo4j provided stable performance with Spring Data for building the application and Cypher for querying and analytics. The graph structure was well-suited to model complex game economies and detect fraud.
This document summarizes a presentation about the graph database Neo4j. The presentation included an agenda that covered graphs and their power, how graphs change data views, and real-time recommendations with graphs. It introduced the presenters and discussed how data relationships unlock value. It described how Neo4j allows modeling data as a graph to unlock this value through relationship-based queries, evolution of applications, and high performance at scale. Examples showed how Neo4j outperforms relational and NoSQL databases when relationships are important. The presentation concluded with examples of how Neo4j customers have benefited.
The document discusses how audio fingerprinting and identification works, as used by apps like Shazam. It covers topics like Fourier transforms, spectrograms, acoustic fingerprints, time-invariant hashing, storing and matching fingerprints to identify songs. The presenter then demonstrates generating fingerprints from songs in a database and using audio input to identify matches in real-time.
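The time-invariant hashing idea can be sketched as follows: pair each spectrogram peak with a few nearby later peaks and hash (freq1, freq2, time delta), which does not depend on where in the track a clip starts. A toy sketch under that assumption (real systems like the one described add amplitude filtering, windowed FFTs, and offset-alignment scoring):

```python
from collections import defaultdict

# Toy peak-pair fingerprinting. peaks: list of (time, freq), by time.

def fingerprints(peaks, fanout=3):
    # Hash each peak against the next few peaks; the (f1, f2, dt)
    # triple is invariant to where the clip starts in the song.
    prints = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fanout]:
            prints.append(((f1, f2, t2 - t1), t1))
    return prints

def build_index(song_peaks):
    # Store every fingerprint hash -> (song, time) for matching.
    index = defaultdict(list)
    for song, peaks in song_peaks.items():
        for h, t in fingerprints(peaks):
            index[h].append((song, t))
    return index

def identify(clip_peaks, index):
    # Vote for the song whose stored hashes best match the clip's.
    votes = defaultdict(int)
    for h, _ in fingerprints(clip_peaks):
        for song, _ in index.get(h, []):
            votes[song] += 1
    return max(votes, key=votes.get) if votes else None
```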
Graph technologies allow modeling of complex relationships and connections through nodes and edges. There are three main layers of graph technologies: graph databases to store graph data, graph analysis frameworks to analyze large graphs, and graph visualization solutions to interact with graphs. Popular tools in each layer include Neo4j and Titan for databases, Giraph and GraphX for analysis, and Gephi and Cytoscape for visualization. Graph technologies are gaining more attention due to their ability to extract insights from connected data.
William Lyon presented on Neo4j 3.0 which introduces a new storage engine allowing unlimited graph size, new language drivers for easier application development, and improved operability for deploying Neo4j in the cloud, containers, and on premises. Key features include the new Bolt binary protocol, Java stored procedures, and an upgraded Cypher query engine with a new cost-based optimizer.
This introduction to graph databases is specifically designed for Enterprise Architects who need to map business requirements to architectural components like graph databases. It explains how and why graphs matter for Enterprise Architecture and reviews the architectural differences between relational and graph models.
This document provides an overview of Neo4j, a graph database management system. It discusses how Neo4j stores data as nodes and relationships, allowing for fast querying of connected data. Traditional relational databases struggle with complex relationships, while NoSQL databases don't support relationships at all. Neo4j addresses these issues through its native graph storage and processing capabilities. The document highlights key Neo4j features like scalability, high performance, and its Cypher query language.
Music Information Retrieval: Overview and Current Trends 2008
The document provides an overview of music information retrieval (MIR), including its applications, history, and techniques. MIR aims to extract semantic information from music to help organize and search large digital music collections. Key points include that MIR techniques analyze low-level audio features and integrate top-down information to determine higher-level attributes like genre, emotion, and similarity. This facilitates applications like music recommendation, identification, and discovery.
These webinar slides are an introduction to Neo4j and Graph Databases. They discuss the primary use cases for Graph Databases and the properties of Neo4j which make those use cases possible. They also cover the high-level steps of modeling, importing, and querying your data using Cypher and touch on RDBMS to Graph.
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
A quick overview of the history, motivation, and uses of graph modeling and graph databases in various industries. Covers a brief introduction to graph databases with an emphasis on the Tinkerpop stack and Gremlin query language. These concepts are then solidified through a hands-on lab modeling a blog engine using Titan and Gremlin.
See more at http://allthingsgraphed.com.
Working With a Real-World Dataset in Neo4j: Import and Modeling
This webinar will cover how to work with a real-world dataset in Neo4j, with a focus on how to build a graph from an existing dataset (in this case a series of JSON files). We will explore how to import the data into Neo4j efficiently - both for an initial import and when scaling writes for your graph application. We will demonstrate different approaches for data import (neo4j-import, LOAD CSV, and using the official Neo4j drivers), and discuss when it makes sense to use each import technique. If you've ever asked these questions, then this webinar is for you!
- How do I design a property graph model for my domain?
- How do I use the official Neo4j drivers?
- How can I deal with concurrent writes to Neo4j?
- How can I import JSON into Neo4j?
This document provides an overview of graph databases and their use cases. It begins with definitions of graphs and graph databases. It then gives examples of how graph databases can be used for social networking, network management, and other domains where data is interconnected. It provides Cypher examples for creating and querying graph patterns in a social networking and IT network management scenario. Finally, it discusses the graph database ecosystem and how graphs can be deployed for both online transaction processing and batch processing use cases.
This document discusses graph databases and the graph database Neo4j. It provides an introduction to graph databases, explaining that they are well-suited for storing relationships and sparse data. It then discusses Neo4j and its Cypher query language. Examples using GraphGists are provided and use cases and resources for getting started with Neo4j are listed.
This document provides an overview of graph databases. It discusses how graph data is naturally represented as nodes connected by edges, unlike relational databases which require joins. Graph databases allow for fast traversal of connected data and enable querying connected subgraphs. Popular graph database models include property graphs and RDF triple stores. Neo4j is introduced as a widely used graph database management system that uses labels, properties, relationships, and Cypher query language.
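The fast-traversal point can be made concrete: with data stored as adjacency lists (each node pointing directly at its neighbors), finding a connection is a walk rather than a series of relational joins. A plain-Python sketch, not a real graph database engine, with a made-up social graph:

```python
from collections import deque

# Illustrative adjacency-list graph; traversal is pointer-chasing,
# not join computation.
graph = {
    "Alice": ["Bob"],
    "Bob": ["Carol", "Dave"],
    "Carol": [],
    "Dave": ["Erin"],
    "Erin": [],
}

def shortest_path(graph, start, goal):
    # Breadth-first search: each step just follows stored edges.
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None
```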
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
This document discusses property graphs and how they are represented and queried using Morpheus, a graph query engine for Apache Spark.
Morpheus allows querying property graphs using Cypher and represents property graphs using DataFrames, with node and relationship data stored in tables. It integrates with various data sources and supports federated queries across multiple property graphs. The document provides examples of loading property graph data from sources like JSON, SQL databases and Neo4j, creating graph projections, running analytical queries, and recommending businesses based on graph algorithms.
OQGraph 3 is a graph computation engine for MariaDB that allows for representing graphs and hierarchies using plain SQL. It stores graph data in tables but operates differently than typical storage engines, focusing on graph computations rather than data storage and retrieval. Key features include improved performance over previous versions using Judy arrays, and the ability to handle larger graphs by holding only the bitmap array in memory. It represents graph data using nodes and edges stored in tables and allows querying to find paths and perform other graph algorithms.
Druid is an analytics-focused, distributed, scale-out data store. Existing Druid clusters have scaled to petabytes of data and trillions of events, ingesting millions of events every second. Up until version 0.10, Druid could only be queried in a JSON-based language that many users found unfamiliar.
Enter Apache Calcite. It includes an industry-standard SQL parser, validator, and JDBC driver, as well as a cost-based relational optimizer. Calcite bills itself as “the foundation for your next high-performance database” and is used by Hive, Drill, and a variety of other projects. Druid uses Calcite to power Druid SQL, a standards-based query API that vaults Druid out of the NoSQL world and into the SQL world.
Gian Merlino offers an overview of Druid SQL and explains how Druid and Calcite are integrated and why you should stop worrying and learn to love relational algebra in your own projects.
This document discusses processing large graphs. It introduces graph processing with MapReduce and Apache Giraph. MapReduce algorithms for finding triangles and connected components in graphs are described. The limitations of MapReduce for graph processing are discussed. Alternative graph processing technologies including Neo4j, a graph database, are presented. A movie recommendation use case is demonstrated using Neo4j to find similar users and recommend unseen movies.
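For comparison with the MapReduce/Giraph approach mentioned above, connected components has a compact single-machine formulation using union-find, which merges the endpoints of every edge into one component label. A sketch (illustrative, not the distributed algorithm from the document):

```python
# Connected components via union-find. The iterative MapReduce and
# Giraph versions described above converge to the same grouping.

def connected_components(nodes, edges):
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Union the two endpoints of every edge.
    for a, b in edges:
        parent[find(a)] = find(b)

    # Group nodes by their final root.
    comps = {}
    for n in nodes:
        comps.setdefault(find(n), set()).add(n)
    return list(comps.values())
```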
The presentation gives brief information about graph databases and their usage today. It then covers the popular graph database Neo4j and its Cypher query language, which is used to query the graph.
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...
Highlighting the progress in Neo4j 3.3 and 3.4, especially Neo4j Desktop, Graph Algorithms, NLP, Date-Time, Geospatial, and performance.
Also featuring the new visualization tool Neo4j Bloom.
Neo4j is an open source graph database that uses nodes, relationships, and properties to store and query data. It supports ACID transactions and is high performance. Cypher is Neo4j's query language that allows matching patterns of nodes and relationships. A graph database model uses nodes connected by relationships, unlike a relational database that uses tables and rows.
This document provides an overview of using graphs and hierarchies in SQL databases with OQGRAPH. It discusses how trees and graphs differ, examples of each, and some of the challenges of representing them in relational databases. It then introduces OQGRAPH as a storage engine that can perform graph computations directly in SQL. Key features of OQGRAPH like inserting edges, performing path queries, and joining to other tables are demonstrated. Later versions provide additional optimizations and the ability to use an existing table as the source of edges.
This document discusses Spark, an open-source cluster computing framework. It begins with an introduction to distributed computing problems related to processing large datasets. It then provides an overview of Spark, including its core abstraction of resilient distributed datasets (RDDs) and how Spark builds on the MapReduce model. The rest of the document demonstrates Spark concepts like transformations and actions on RDDs and the use of key-value pairs. It also discusses SparkSQL and shows examples of finding the most retweeted tweet using core Spark and SparkSQL.
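The two examples named in that summary have the same map/reduce shape, which can be sketched in plain Python standing in for Spark transformations (the tweet data here is made up for illustration):

```python
from collections import Counter
from functools import reduce

# Wordcount in the functional style the talk builds on.
lines = ["to be or not", "to be"]

# "flatMap" the lines into words, then fold counts per key.
words = (w for line in lines for w in line.split())
counts = reduce(lambda acc, w: acc.update([w]) or acc, words, Counter())

# "Most retweeted tweet" is the same shape: map each record to
# (tweet, retweet_count), then take the max by value.
tweets = [("t1", 5), ("t2", 12), ("t3", 7)]
most_retweeted = max(tweets, key=lambda kv: kv[1])
```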
Streaming machine learning is being integrated in Spark 2.1+, but you don’t need to wait. Holden Karau and Seth Hendrickson demonstrate how to do streaming machine learning using Spark’s new Structured Streaming and walk you through creating your own streaming model. By the end of this session, you’ll have a better understanding of Spark’s Structured Streaming API as well as how machine learning works in Spark.
Introducing Apache Spark's Data Frames and Dataset APIs workshop series - Holden Karau
This session of the workshop introduces Spark SQL along with DataFrames and Datasets. Datasets give us the ability to easily intermix relational and functional-style programming. So that we can explore the new Dataset API, this iteration will be focused on Scala.
Spark ML for custom models - FOSDEM HPC 2017 - Holden Karau
Beyond shuffling - Scala Days Berlin 2016 - Holden Karau
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!Holden Karau
Slides from: https://www.meetup.com/Sydney-Apache-Spark-User-Group/events/246892684/
Welcome to the first Sydney Spark Meetup in 2018!
We are very glad to have an visiting Apache Spark committer Holden Karau to give a talk on streaming machine learning. Title: Streaming ML w/Spark (and why it's a bit painful today & #workingonit)
Apache Spark is one of the most popular distributed systems, and it has built in libraries for both machine learning and streaming. This talk will cover Spark's two streaming libraries, look at the future, and how to make streaming ML work today (for both serving and prediction). If you aren't familiar with Spark, that's ok! We'll spend the first ~5 minutes covering just enough to get through the rest of the talk, and for those of you already familiar you can spend those ~5 minutes downloading the sample code :)
About Holden:
Holden is a transgender Canadian open source developer advocate @ Google with a focus on Apache Spark, BEAM, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer on the Apache Spark, SystemML, and Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.
Beyond Shuffling - Effective Tips and Tricks for Scaling Spark (Vancouver Sp...Holden Karau
This document provides a summary of a presentation on scaling Apache Spark. It discusses techniques for reusing RDDs through caching, persistence levels and checkpointing. It also covers best practices for working with key-value data to avoid problems from groupByKey, and using Spark SQL and accumulators. Finally, it previews bringing code generation to Spark ML to improve performance.
Introduction to and Extending Spark MLHolden Karau
This document discusses extending Spark ML pipelines with custom estimators and transformers. It begins with an overview of Spark ML and the pipeline API. Then it demonstrates how to build a simple hardcoded word count transformer and configurable transformer. It discusses important aspects like transforming the input schema, parameters, and model fitting. The document provides guidance on configuration, persistence, serving models, and resources for learning more about custom Spark ML components.
Java Performance Tips (So Code Camp San Diego 2014)Kai Chan
Slides for my presentation at SoCal Code Camp, June 29, 2014 (http://www.socalcodecamp.com/socalcodecamp/session.aspx?sid=68942cd0-6714-4753-a218-20d4b48da07d)
The document discusses programming techniques for the semantic web including LITEQ, a language for integrating RDF types and queries into programming languages. LITEQ allows programmers to navigate schemas, define types aligned with programming languages, and retrieve typed instances. The document also presents SchemEX, an index for efficiently searching RDF data sources in the linked open data cloud based on their schemas.
Search Engine-Building with Lucene and SolrKai Chan
These are the slides for the session I presented at SoCal Code Camp San Diego on July 27, 2013.
http://www.socalcodecamp.com/socalcodecamp/session.aspx?sid=6b28337d-6eae-4003-a664-5ed719f43533
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Kai Chan
These are the slides for the session I presented at SoCal Code Camp Los Angeles on November 10, 2013.
http://www.socalcodecamp.com/socalcodecamp/session.aspx?sid=8cdfd955-2cd4-44a2-ad08-5353e079685a
This document proposes a method for recommending new bands to users based on bands they already like. It involves taking a band a user likes, finding other bands that are connected or related to that band through various sources like being on the same record label, having collaborated, or been reviewed together. These connected bands would then be recommended to the user, ranked by the strength of their aggregate connections. However, the method is critiqued for only considering directly connected bands, ignoring bands' quality, popularity, and how users' tastes may evolve over time.
This document discusses approaches to music recommendation using the Million Song Dataset. It examines using k-means clustering on listening histories to group users and songs, and user-based collaborative filtering to find similar users and make recommendations. K-means achieved a mean average precision of 0.01008 using multiple centroids and modified metadata. Collaborative filtering on 1,000 users achieved 0.00822 precision, improving to 0.1127 on 110,000 users. Future work could include ensemble techniques, additional metadata, and distributed k-means.
Graph Adoption at Gamesys - Toby O'Rourke @ GraphConnect SF 2013Neo4j
Gamesys is a major online gaming company that handles billions in wagers annually. They built an internal social network graph database using Neo4j to model relationships between players and incentivize referrals. This helped reduce customer acquisition costs. Neo4j provided stable performance with Spring Data for building the application and Cypher for querying and analytics. The graph structure was well-suited to model complex game economies and detect fraud.
This document summarizes a presentation about the graph database Neo4j. The presentation included an agenda that covered graphs and their power, how graphs change data views, and real-time recommendations with graphs. It introduced the presenters and discussed how data relationships unlock value. It described how Neo4j allows modeling data as a graph to unlock this value through relationship-based queries, evolution of applications, and high performance at scale. Examples showed how Neo4j outperforms relational and NoSQL databases when relationships are important. The presentation concluded with examples of how Neo4j customers have benefited.
The document discusses how audio fingerprinting and identification works, as used by apps like Shazam. It covers topics like Fourier transforms, spectrograms, acoustic fingerprints, time-invariant hashing, storing and matching fingerprints to identify songs. The presenter then demonstrates generating fingerprints from songs in a database and using audio input to identify matches in real-time.
Introduction to the graph technologies landscapeLinkurious
Graph technologies allow modeling of complex relationships and connections through nodes and edges. There are three main layers of graph technologies: graph databases to store graph data, graph analysis frameworks to analyze large graphs, and graph visualization solutions to interact with graphs. Popular tools in each layer include Neo4j and Titan for databases, Giraph and GraphX for analysis, and Gephi and Cytoscape for visualization. Graph technologies are gaining more attention due to their ability to extract insights from connected data.
William Lyon presented on Neo4j 3.0 which introduces a new storage engine allowing unlimited graph size, new language drivers for easier application development, and improved operability for deploying Neo4j in the cloud, containers, and on premises. Key features include the new Bolt binary protocol, Java stored procedures, and an upgraded Cypher query engine with a new cost-based optimizer.
This introduction to graph databases is specifically designed for Enterprise Architects who need to map business requirements to architectural components like graph databases. It explains how and why graphs matter for Enterprise Architecture and reviews the architectural differences between relational and graph models.
This document provides an overview of Neo4j, a graph database management system. It discusses how Neo4j stores data as nodes and relationships, allowing for fast querying of connected data. Traditional relational databases struggle with complex relationships, while NoSQL databases don't support relationships at all. Neo4j addresses these issues through its native graph storage and processing capabilities. The document highlights key Neo4j features like scalability, high performance, and its Cypher query language.
Music Information Retrieval: Overview and Current Trends 2008Rui Pedro Paiva
The document provides an overview of music information retrieval (MIR), including its applications, history, and techniques. MIR aims to extract semantic information from music to help organize and search large digital music collections. Key points include that MIR techniques analyze low-level audio features and integrate top-down information to determine higher-level attributes like genre, emotion, and similarity. This facilitates applications like music recommendation, identification, and discovery.
These webinar slides are an introduction to Neo4j and Graph Databases. They discuss the primary use cases for Graph Databases and the properties of Neo4j which make those use cases possible. They also cover the high-level steps of modeling, importing, and querying your data using Cypher and touch on RDBMS to Graph.
Intro to Graph Databases Using Tinkerpop, TitanDB, and GremlinCaleb Jones
A quick overview of the history, motivation, and uses of graph modeling and graph databases in various industries. Covers a brief introduction to graph databases with an emphasis on the Tinkerpop stack and Gremlin query language. These concepts are then solidified through a hands-on lab modeling a blog engine using Titan and Gremlin.
See more at http://allthingsgraphed.com.
Working With a Real-World Dataset in Neo4j: Import and ModelingNeo4j
This webinar will cover how to work with a real-world dataset in Neo4j, with a focus on how to build a graph from an existing dataset (in this case a series of JSON files). We will explore how to performantly import the data into Neo4j - both in the case of an initial import and scaling writes for your graph application. We will demonstrate different approaches for data import (neo4j-import, LOAD CSV, and using the official Neo4j drivers), and discuss when it makes sense to use each import technique. If you've ever asked these questions, then this webinar is for you!
- How do I design a property graph model for my domain?
- How do I use the official Neo4j drivers?
- How can I deal with concurrent writes to Neo4j?
- How can I import JSON into Neo4j?
This document provides an overview of graph databases and their use cases. It begins with definitions of graphs and graph databases. It then gives examples of how graph databases can be used for social networking, network management, and other domains where data is interconnected. It provides Cypher examples for creating and querying graph patterns in a social networking and IT network management scenario. Finally, it discusses the graph database ecosystem and how graphs can be deployed for both online transaction processing and batch processing use cases.
This document discusses graph databases and the graph database Neo4j. It provides an introduction to graph databases, explaining that they are well-suited for storing relationships and sparse data. It then discusses Neo4j and its Cypher query language. Examples using GraphGists are provided and use cases and resources for getting started with Neo4j are listed.
This document provides an overview of graph databases. It discusses how graph data is naturally represented as nodes connected by edges, unlike relational databases which require joins. Graph databases allow for fast traversal of connected data and enable querying connected subgraphs. Popular graph database models include property graphs and RDF triple stores. Neo4j is introduced as a widely used graph database management system that uses labels, properties, relationships, and Cypher query language.
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Databricks
This document discusses property graphs and how they are represented and queried using Morpheus, a graph query engine for Apache Spark.
Morpheus allows querying property graphs using Cypher and represents property graphs using DataFrames, with node and relationship data stored in tables. It integrates with various data sources and supports federated queries across multiple property graphs. The document provides examples of loading property graph data from sources like JSON, SQL databases and Neo4j, creating graph projections, running analytical queries, and recommending businesses based on graph algorithms.
OQGraph 3 is a graph computation engine for MariaDB that allows for representing graphs and hierarchies using plain SQL. It stores graph data in tables but operates differently than typical storage engines by focusing on graph computations rather than data storage and retrieval. Key features include improved performance over previous versions using Judy arrays and ability to handle larger graphs by only holding the bitmap array in memory. It represents graph data using nodes and edges stored in tables and allows querying to find paths and perform other graph algorithms.
NoSQL no more: SQL on Druid with Apache Calcitegianmerlino
Druid is an analytics-focused, distributed, scale-out data store. Existing Druid clusters have scaled to petabytes of data and trillions of events, ingesting millions of events every second. Up until version 0.10, Druid could only be queried in a JSON-based language that many users found unfamiliar.
Enter Apache Calcite. It includes an industry-standard SQL parser, validator, and JDBC driver, as well as a cost-based relational optimizer. Calcite bills itself as “the foundation for your next high-performance database” and is used by Hive, Drill, and a variety of other projects. Druid uses Calcite to power Druid SQL, a standards-based query API that vaults Druid out of the NoSQL world and into the SQL world.
Gian Merlino offers an overview of Druid SQL and explains how Druid and Calcite are integrated and why you should stop worrying and learn to love relational algebra in your own projects.
This document discusses processing large graphs. It introduces graph processing with MapReduce and Apache Giraph. MapReduce algorithms for finding triangles and connected components in graphs are described. The limitations of MapReduce for graph processing are discussed. Alternative graph processing technologies including Neo4j, a graph database, are presented. A movie recommendation use case is demonstrated using Neo4j to find similar users and recommend unseen movies.
Getting started with Graph Databases & Neo4jSuroor Wijdan
The presentation gives a brief information about Graph Databases and its usage in today's scenario. Moving on the presentation talks about the popular Graph DB Neo4j and its Cypher Query Language i.e., used to query the graph.
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...jexp
Highlighting the progress in Neo4j 3.3 and 3.4 especially
Neo4j Desktop, Graph Algorithms, NLP, Date-Time, Geospatial, and performance.
Also featuring the new visualization tool Neo4j Bloom.
Neo4j is an open source graph database that uses nodes, relationships, and properties to store and query data. It supports ACID transactions and is high performance. Cypher is Neo4j's query language that allows matching patterns of nodes and relationships. A graph database model uses nodes connected by relationships, unlike a relational database that uses tables and rows.
This document provides an overview of using graphs and hierarchies in SQL databases with OQGRAPH. It discusses how trees and graphs differ, examples of each, and some of the challenges of representing them in relational databases. It then introduces OQGRAPH as a storage engine that can perform graph computations directly in SQL. Key features of OQGRAPH like inserting edges, performing path queries, and joining to other tables are demonstrated. Later versions provide additional optimizations and the ability to use an existing table as the source of edges.
This document discusses Spark, an open-source cluster computing framework. It begins with an introduction to distributed computing problems related to processing large datasets. It then provides an overview of Spark, including its core abstraction of resilient distributed datasets (RDDs) and how Spark builds on the MapReduce model. The rest of the document demonstrates Spark concepts like transformations and actions on RDDs and the use of key-value pairs. It also discusses SparkSQL and shows examples of finding the most retweeted tweet using core Spark and SparkSQL.
Scaling Search at Lendingkart discusses how Lendingkart scaled their search capabilities to handle large increases in data volume. They initially tried scaling databases vertically and horizontally, but searches were still slow at 8 seconds. They implemented ElasticSearch for its near real-time search, high scalability, and out-of-the-box functionality. Logstash was used to seed data from MySQL and MongoDB into ElasticSearch. Custom analyzers and mappings were developed. Searches then reduced to 230ms and aggregations to 200ms, allowing the business to scale as transactional data grew 3000% and leads 250%.
This document provides an introduction and overview of Neo4j, a graph database. It discusses trends in big data, NoSQL databases, and different types of NoSQL databases like key-value stores, column family databases, and document databases. It then defines what a graph and graph database are, and introduces Neo4j as a native graph database that uses a property graph model. It outlines some of Neo4j's features and provides examples of how it can be used to represent social network, spatial, and interconnected data.
This document discusses using graphs and graph databases for machine learning. It provides an overview of graph analytics algorithms that can be used to solve problems with graph data, including recommendations, fraud detection, and network analysis. It also discusses using graph embeddings and graph neural networks for tasks like node classification and link prediction. Finally, it discusses how graphs can be used for machine learning infrastructure and metadata tasks like data provenance, audit trails, and privacy.
This document discusses GraphQL and DGraph with GO. It begins by introducing GraphQL and some popular GraphQL implementations in GO like graphql-go. It then discusses DGraph, describing it as a distributed, high performance graph database written in GO. It provides examples of using the DGraph GO client to perform CRUD operations, querying for single and multiple objects, committing transactions, and more.
Anatomy of Data Frame API : A deep dive into Spark Data Frame APIdatamantra
In this presentation, we discuss about internals of spark data frame API. All the code discussed in this presentation available at https://github.com/phatak-dev/anatomy_of_spark_dataframe_api
5. What is a Graph in math
● represent a connected set of objects
● graph:
○ vertex (node/points)
○ edge (arc/line/relationship/arrow) - undirected
○ attribute (property) - on node/relationship
● types:
○ pair: G = (V, E)
○ digraph: D = (V, A)
○ mixed: G = (V, E, A)
V = {1, 2, 3, 4, 5, 6}
E = {{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}}
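As a quick illustration (a Python sketch, not part of the original deck), the example graph above can be stored directly as its vertex and edge sets:

```python
# The undirected example graph G = (V, E) from the slide.
V = {1, 2, 3, 4, 5, 6}
E = {frozenset(e) for e in [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]}

def neighbors(v):
    """Vertices adjacent to v (undirected: order inside an edge is irrelevant)."""
    return {w for e in E for w in e if v in e and w != v}

print(neighbors(2))  # vertices connected to 2: {1, 3, 5}
```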
6. What is a Graph database
● stores data in a graph, built for storing and retrieving vast networks of data
● shines when storing richly-connected data
● consists of nodes, connected by relationships
○ A Graph —records data in→ Nodes —which have→ Properties
○ Nodes —are organized by→ Rels —which also have→ Properties
○ Nodes —are grouped by→ Labels —into→ Sets
○ A Traversal —navigates→ a Graph; it —identifies→ Paths —which order→ Nodes
○ An Index —maps from→ Properties —to either→ Nodes or Rels
○ A Graph Database —manages a→ Graph and —also manages related→ Indexes
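The component relationships above can be sketched as a toy property-graph model. This is a hypothetical Python illustration, not Neo4j's actual API; the `Node`/`Rel` classes and the Alice/Bob data are invented for the example:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    props: dict = field(default_factory=dict)   # nodes have properties
    labels: set = field(default_factory=set)    # labels group nodes into sets

@dataclass
class Rel:
    start: Node
    end: Node
    rel_type: str                               # relationships are typed...
    props: dict = field(default_factory=dict)   # ...and also have properties

# A tiny graph: (Alice)-[:FRIEND {since: 2010}]->(Bob)
alice = Node({"name": "Alice"}, {"Person"})
bob = Node({"name": "Bob"}, {"Person"})
friendship = Rel(alice, bob, "FRIEND", {"since": 2010})
```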
8. Graph Traversal
A Traversal —navigates→ a Graph; it —identifies→ Paths —which order→ Nodes
Example: what music do my friends like that I don’t yet own?
Example: if this power supply goes down, what web services are affected?
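The first question ("what music do my friends like that I don't yet own") is a short traversal. A minimal Python sketch, where the `friends`, `likes`, and `owns` data are invented for the example:

```python
# friends-of relationships and likes, as adjacency dicts (hypothetical data)
friends = {"me": ["ann", "bob"]}
likes = {"ann": {"Jazz Album", "Rock Album"}, "bob": {"Folk Album"}}
owns = {"me": {"Rock Album"}}

def music_my_friends_like(person):
    """Traverse person -> friends -> liked music, dropping what person owns."""
    liked = set()
    for friend in friends.get(person, []):
        liked |= likes.get(friend, set())
    return liked - owns.get(person, set())

print(music_my_friends_like("me"))  # the albums friends like that "me" doesn't own
```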
9. Graph Index
An Index —maps from→ Properties —to either→ Nodes or Rels
Example: find the Account for username master-of-graphs
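That lookup is essentially a map from a property value to a node. A Python sketch of the idea (the `accounts` data is invented for the example):

```python
# An index maps from property values to nodes, avoiding a full graph scan.
accounts = [
    {"username": "master-of-graphs", "id": 1},
    {"username": "other-user", "id": 2},
]

# build the index once: property value -> node
by_username = {node["username"]: node for node in accounts}

# O(1) lookup instead of scanning every node
account = by_username["master-of-graphs"]
print(account["id"])  # 1
```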
13. A Graph Database elaborates a Key-Value Store
(figure legend: K* = key, V* = value)
14. A Graph Database relates Column-Family
● BigTable databases are an evolution of key-value stores, using "families" to allow grouping of rows
● stored in a graph, the families could become hierarchical, and the relationships among the data become explicit
15. A Graph Database navigates a Document Store
(figure legend: D = Document, S = Subdocument, V = Value, D2/S2 = reference)
18. Neo4j features
● intuitive, using a graph model for data representation
● reliable, fully transactional, upholds ACID
● durable and fast, using a custom disk-based, native storage engine
● massively scalable, up to several billion nodes/relationships/properties
● highly available, when distributed across multiple machines
● expressive, with a powerful, human-readable declarative graph query language
● fast, with a powerful traversal framework for high-speed graph queries
● embeddable, with a few small jars
● simple, accessible through a convenient REST API or an object-oriented Java API
● indexes are based on Apache Lucene; supports secondary indexes
● in commercial development since 2003 and in production for over 7 years
● cross-platform; simple set-up; well documented; open source
● GPL for Community, AGPL for Enterprise
19. Neo4j requirements
● CPU - Intel Core i3/i7
● Memory - 2GB .. 16/32GB
● Disk - 10GB SATA .. SSD w/ SATA
● Filesystem - ext4 .. ext4/ZFS
● Software - Oracle Java 7
20. Neo4j license
● Neo4j Community
○ open-source, high-performance
○ fully ACID transactional graph database
● Neo4j Enterprise
○ High-Performance Cache (up to 10x faster)
○ horizontal scalability with Neo4j Clustering (predictable scalability)
○ high availability and online backups
○ cache-based sharding (shard your graph in memory)
○ Advanced Monitoring (operational metrics)
○ certified for Windows and Linux
○ email/phone support (10x5, 24x7 hours)
○ subscriptions:
■ Personal (up to 3 devs, $100k annual revenue) = FREE
■ Startups (<$10M funding, <$5M annual revenue) = $12k
■ Business (medium, to Global 2000) = Contact Sales
21. ● for the simple friends-of-friends query, Neo4j is 60% faster than MySQL
● for friends of friends of friends, Neo4j is 180 times faster
● for the depth-four query, Neo4j is 1,135 times faster
● and MySQL simply chokes on the depth-five query
Neo4j vs. MySQL
22. Neo4j: Nodes
● fundamental units that form a graph
● can have key/value-style properties
● nodes and relationships can be indexed
by {key, value} pairs
● represent entities
23. Neo4j: Relationships #1/2
● connect entities and structure the domain
● allow for finding related data
● are always directed (outgoing or incoming)
● are equally well traversed in either direction
● a node can have relationships to itself
● have a relationship type (label)
25. Neo4j: Properties
● nodes and relationships can have properties
● are key-value pairs
○ the key is a string
○ values can be either a primitive or an array of
one primitive type
■ boolean, String, int, int[], etc.
■ as defined by the Java Language Specification
● represent entity attributes, relationship qualities,
and metadata
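The property-graph model described above can be sketched with a couple of illustrative Python classes (hypothetical names, not the Neo4j Java API):

```python
# Nodes and relationships both carry key/value properties:
# keys are strings, values are primitives or arrays of one primitive type.
class Entity:
    def __init__(self, **properties):
        self.properties = dict(properties)

class Node(Entity):
    pass

class Relationship(Entity):
    def __init__(self, start, rel_type, end, **properties):
        super().__init__(**properties)
        self.start, self.type, self.end = start, rel_type, end

alice = Node(name="Alice", age=30, tags=["dev", "dba"])  # array of one type
bob = Node(name="Bob")
knows = Relationship(alice, "KNOWS", bob, since=2010)    # rel property

print(alice.properties["name"], knows.type, knows.properties["since"])
```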
26. Neo4j: Labels
● used to group nodes into sets
● a node can have any number of labels, including none
● can be added and removed at runtime
● can be used to mark temporary states for nodes
● names are case-sensitive
● CamelCase (by convention)
27. Neo4j: Paths
● a path is one or more nodes with connecting relationships
● shortest path
● a path of length one
28. Neo4j: Traversal
● Traversal Framework out of the box
● means visiting nodes, following relationships according to rules
● in most cases only a subgraph is visited
● callback-based traversal API
○ you can specify the traversal rules
● traverses breadth- or depth-first
● open Java API
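A rough Python sketch of a callback-based traversal in this spirit; the `traverse` helper and the evaluator signature are assumptions for illustration, not the real framework API:

```python
# The caller supplies an evaluator deciding (include this node?, keep
# expanding from it?), plus the traversal order (breadth- vs depth-first).
from collections import deque

def traverse(graph, start, evaluator, breadth_first=True):
    frontier = deque([start])
    visited = {start}
    result = []
    while frontier:
        node = frontier.popleft() if breadth_first else frontier.pop()
        include, expand = evaluator(node)
        if include:
            result.append(node)
        if expand:
            for neighbor in graph.get(node, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    frontier.append(neighbor)
    return result

g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
# Evaluator: include every node except the start, always keep expanding.
found = traverse(g, "a", lambda n: (n != "a", True))
print(found)  # breadth-first: both neighbors of "a" before "d"
```

Because the evaluator can stop expansion early, only the relevant subgraph is visited.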
29. Neo4j: graph algorithms
● A* (uses the A* algorithm to find the cheapest path between two nodes)
● Dijkstra (dijkstra: uses Dijkstra's algorithm to find the cheapest path
between two nodes)
● PathWithLength (all paths of a certain length (depth) between two nodes)
● Shortest paths (shortestPath, the default: find all the shortest paths
between two nodes)
● All simple paths (allSimplePaths: find all simple paths between two
nodes, without loops)
● All paths (allPaths: find all available paths between two nodes)
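The "cheapest path" idea behind the dijkstra entry can be sketched with a textbook Dijkstra in Python; the weighted-edge data is hypothetical:

```python
# Dijkstra over a tiny weighted graph: node -> list of (neighbor, weight).
import heapq

def dijkstra(graph, source, target):
    """Return (cost, path) of the cheapest path from source to target."""
    queue = [(0, source, [source])]
    best = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # already reached this node more cheaply
        best[node] = cost
        for neighbor, weight in graph.get(node, []):
            heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

g = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 1), ("d", 5)],
    "c": [("d", 1)],
    "d": [],
}
cost, path = dijkstra(g, "a", "d")
print(cost, path)  # 3 ['a', 'b', 'c', 'd']
```

A* works the same way but additionally guides the search with a heuristic estimate of the remaining cost.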
31. ● introduced in Neo4j 2.0
● eventually available (populated in the background, not
immediately available for querying)
○ comes online once fully populated
○ on failed status, drop and recreate the index
● can be created on a label (for a given property)
● both nodes and rels can be indexed
● node_auto_indexing=false,
node_keys_indexable
Neo4j: Index
32. Neo4j: Constraints
● can help you keep your data clean
● specify the rules for what your data should
look like
● unique constraints are the only available
constraint type
33. ● single server instance
○ nodes = 2^35 (~34 billion)
○ relationships = 2^35 (~34 billion)
○ labels = 2^31 (~2 billion)
○ properties = 2^36 to 2^38 depending on
property types (maximum ~274 billion, always
at least ~68 billion)
○ relationship types = 2^15 (~ 32’000)
Neo4j: Data Size
34. ● powerful graph query language
● relatively simple
● declarative grammar (say what you want, not how)
● a humane query language
● self-explanatory (based on English prose and neat iconography)
● written in Scala
● pattern matching (borrows expression approaches from SPARQL)
● aggregation, ordering, limits
● create, update, delete
● structure and most keywords inspired by SQL
● changing rather rapidly (CYPHER 1.9 START ...)
Cypher Query Language
“Makes the simple things easy, and the complex things possible”
37. Cypher: START / RETURN
“It all starts with the START”
Michael Hunger, Cypher webinar, Sep 2012
● designates the start points
● START is optional (in Neo4j >= 2.0)
Examples:
● START <lookup> RETURN <expression>
● START n=node(0) RETURN n
● START n=node(*) RETURN n.name
38. Cypher: MATCH
● primary way of getting data from the database
● START <lookup> MATCH <pattern> RETURN <expr>
● OPTIONAL MATCH <lookup> RETURN <expr>
Examples:
● MATCH (n) RETURN count(n)
● MATCH (actor:Actor) RETURN actor.name;
● START me=node(0) MATCH (me)--(f) RETURN f.name
● MATCH (n)-[r]->(m) RETURN n AS FROM, r AS `->`, m AS TO
40. Cypher: WHERE
● filters the results
● MATCH <pattern> WHERE <condition> RETURN <expr>
Examples:
● WHERE n.name =~ "(?i)John.*"
● WHERE NOT ...
● WHERE type(rel) =~ "Perso.*"
41. Cypher: RETURN
● creates the result table
● any query can return data
● can be nodes, relationships, or properties on these
● RETURN DISTINCT <expression> AS x
● RETURN aggregate(expr) as alias
● RETURN nodes, rels, properties
● RETURN expressions of funcs and operators
● RETURN aggregation funcs on the above
42. Cypher: etc
● CASE / WHEN / ELSE
● ORDER BY node.key, node2.key, .. ASC|DESC
● LIMIT / SKIP
● WITH (WITH count(*) as c)
● UNION / UNION ALL (combining results from multiple queries)
● USING INDEX/SCAN
● MERGE / SET / DELETE / REMOVE / FOREACH
● Expressions
● Operators
● Comments
● Functions: ALL, ANY, LENGTH, {Math}, {String}, ...
43. ● any updating query will run in a transaction
● ACID
● “it is very important to finish each transaction”
● write lock on node/rel:
○ adding, changing or removing a property on a node/rel
● write lock on node:
○ creating or deleting a node
● write lock on the relationship and both its nodes:
○ creating or deleting a relationship
Cypher: Transactions
45. ● SELECT *
FROM Person
WHERE name="Valentin" AND age > 30
● START person=node:Person(name="Valentin")
WHERE person.age > 30
RETURN person
Cypher: back to SQL #1/5
46. Cypher: back to SQL #2/5
● SELECT "Email".*
FROM Person
JOIN "Email" ON "Person".id = "Email".person_id
WHERE "Person".name = "Benedikt"
● START person=node:Person(name="Benedikt")
MATCH person-[:email]->email
RETURN email
47. Cypher: back to SQL #3/5
● show me all people that are both actors and
directors
● SELECT name FROM Person
WHERE
person_id IN (SELECT person_id FROM Actor) AND
person_id IN (SELECT person_id FROM Director)
● START person=node:Person("name:*")
WHERE (person)-[:ACTS_IN]->()
AND (person)-[:DIRECTED]->()
RETURN person.name
48. Cypher: back to SQL #4/5
● show me all Tom Hanks’s co-actors
● SELECT DISTINCT co_actor.name FROM Person tom
JOIN Actor a1 ON tom.person_id = a1.person_id
JOIN Actor a2 ON a1.movie_id = a2.movie_id
JOIN Person co_actor ON co_actor.person_id = a2.person_id
WHERE tom.name = "Tom Hanks"
● START tom=node:Person(name="Tom Hanks")
MATCH tom-[:ACTS_IN]->movie,
co_actor-[:ACTS_IN]->movie
RETURN DISTINCT co_actor.name
49. Cypher: back to SQL #5/5
● show me all Lucy’s favorite directors
● SELECT dir.name, count(*) FROM Person lucy
JOIN Actor ON lucy.person_id = Actor.person_id
JOIN Director ON Actor.movie_id = Director.movie_id
JOIN Person dir ON Director.person_id = dir.person_id
WHERE lucy.name = "Lucy Liu"
GROUP BY dir.name
ORDER BY count(*) DESC
● START lucy=node:Person(name="Lucy Liu")
MATCH lucy-[:ACTS_IN]->movie,
director-[:DIRECTED]->movie
RETURN director.name, count(*)
ORDER BY count(*) DESC
50. START
lucy = node:Person(name="Lucy Liu"),
kevin = node:Person(name="Kevin Bacon")
MATCH
p = shortestPath( lucy-[:ACTS_IN*]-kevin )
RETURN
EXTRACT(n in NODES(p):
COALESCE(n.name?, n.title?))
Cypher: back to SQL #6/5
52. Neo4j: Security
● does not deal with data encryption
explicitly
● all means built into Java can be used
● an encrypted datastore can be used
● webadmin over HTTPS
53. ● manipulates data stored in RDF format
● focused on matching triple sets
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
?person a foaf:Person.
?person foaf:name ?name.
?person foaf:mbox ?email.
}
SPARQL
54. ● graph traversal language
● scripting language
● Pipe & Filter (similar to jQuery)
● works across different graph databases
● based on Groovy (limited to the JVM)
● not as stable in Neo4j
● XPath-like
● ./outE[label="family"]/inV/@name
● g.v(1).out('likes').in('likes').out('likes').groupCount(m)
● g.V.as('x').out.groupCount(m).loop('x'){c++ < 1000}
● g.v(1).in('LOVE_OF').out('SOME_IN').has('title','abc').back(2)
Gremlin
55. Neo4j and PHP
● everyman/neo4jphp (on packagist.org)
○ PHP wrapper for Neo4j using the REST interface
○ follows the PSR-0 autoloading standard
○ basic wrappers for all components
○ last update: a month ago
○ supports Gremlin
● Neo4j-PHP OGM
○ Object Graph Mapper, inspired by Doctrine
○ based on DoctrineCommon
○ borrows significantly from the DoctrineORM design
○ uses annotations on classes
○ MIT licence
● Neo4j PHP REST API client
○ uses the Neo4j REST API
○ node create/find/delete
○ relationship create/list/filter
56. High Availability with Neo4j
● in HA there is a single master and zero or more slaves
● slaves synchronize with the master to preserve
consistency
● the master writes to a slave before a transaction completes
57. Demo
Neo4j.org Example Datasets:
● DrWho (nodes=1'060; rels=2'286)
● Cineasts Movies & Actors (nodes=64'069; rels=121'778)
● Hubway Data Challenge (nodes=554'674; rels=2'011'904)
GraphGist:
● JIRA and neo4j
● PHP and neo4j
● Kant in neo4j
65. ● GrapheneDB - hosted service based on Neo4j
● AllegroGraph - Closed Source, Commercial, RDF-QuadStore
○ graph database built around the W3C spec for the Resource
Description Framework
○ supports SPARQL, RDFS++, and Prolog
● Sones GraphDB - Closed Source, .NET focused; built by the German
company sones
● Virtuoso - Closed Source, RDF focused
● InfiniteGraph - goal is to create a graph database with "virtually
unlimited scalability"
● FlockDB
Analogues
67. ● best used for graph-style data:
rich or complex,
structured, dense,
deep graphs with unlimited depth and cycles,
with weighted connections,
interconnected data
● quickly add new functionality without impacting
existing deployments
● schema-less, forcing you to re-think your entire approach to data
● not a silver bullet for all problems
Conclusion