Since the NoSQL concept burst onto the market, graph databases have traditionally been designed to be used from Java or C. With some honorable exceptions, there isn't an easy way to manage graph databases from Python. In this talk, I will introduce some of the tools you can use today to work with these new and challenging databases from our favorite language, Python.
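As a taste of what such tooling looks like, here is a minimal sketch using the official neo4j Python driver; the URI, credentials, and data are illustrative assumptions, and the talk itself may cover different libraries.

```python
# Minimal sketch: querying a Neo4j graph from Python.
# Assumes a local Neo4j server and the official driver (pip install neo4j);
# URI, credentials, and data are illustrative placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Find people known by Alice, one hop away.
    result = session.run(
        "MATCH (a:Person {name: $name})-[:KNOWS]->(b:Person) RETURN b.name AS friend",
        name="Alice",
    )
    for record in result:
        print(record["friend"])

driver.close()
```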
Data Day Texas 2017: Scaling Data Science at Stitch Fix
At Stitch Fix we have a lot of Data Scientists, around eighty at last count. One reason I think we have so many is that we do things differently. To get their work done, Data Scientists have access to whatever resources they need (within reason), because they're responsible for their work end to end; they collaborate with their business partners on objectives and then prototype, iterate, productionize, monitor and debug everything and anything required to get the desired output. They're full data-stack data scientists!
The teams in the organization do a variety of different tasks:
- Clothing recommendations for clients.
- Clothes reordering recommendations.
- Time series analysis & forecasting of inventory, client segments, etc.
- Warehouse worker path routing.
- NLP.
… and more!
They're also quite prolific at what they do -- we are approaching 4500 job definitions at last count. So one might wonder: how have we enabled them to get their jobs done without getting in each other's way?
This is where the Data Platform team comes into play. With the goal of lowering the cognitive overhead and engineering effort required on the part of the Data Scientist, the Data Platform team tries to provide abstractions and infrastructure to help the Data Scientists. The relationship is a collaborative partnership, where the Data Scientist is free to make their own decisions and thus choose the way they do their work; the onus then falls on the Data Platform team to convince Data Scientists to use their tools, and the easiest way to do that is by designing the tools well.
In regard to scaling Data Science, the Data Platform team has helped establish patterns and infrastructure that alleviate contention on:
- Access to Data
- Access to Compute Resources:
  - Ad-hoc compute (think prototype, iterate, workspace)
  - Production compute (think where things are executed once they're needed regularly)
For the talk (and this post) I focused only on how we reduced contention on Access to Data and Access to Ad-hoc Compute to enable Data Science to scale at Stitch Fix. With that, I invite you to take a look through the slides.
H2O Deep Water - Making Deep Learning Accessible to Everyone
Deep Water is H2O's integration with multiple open source deep learning libraries such as TensorFlow, MXNet and Caffe. On top of the performance gains from GPU backends, Deep Water naturally inherits all H2O properties in scalability, ease of use, and deployment. In this talk, I will go through the motivation and benefits of Deep Water. After that, I will demonstrate how to build and deploy deep learning models with or without programming experience using H2O's R/Python/Flow (Web) interfaces.
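For a flavour of the Python interface, here is a hedged sketch using H2O's core deep learning estimator as a stand-in for the Deep Water backends (whose API may differ); the dataset path and column names are placeholders.

```python
# Sketch of the H2O Python workflow; uses the core H2ODeepLearningEstimator
# as a stand-in for Deep Water's GPU-backed estimators. The path and column
# names are illustrative placeholders.
import h2o
from h2o.estimators import H2ODeepLearningEstimator

h2o.init()  # starts or connects to a local H2O cluster

frame = h2o.import_file("data.csv")          # placeholder dataset
train, valid = frame.split_frame([0.8], seed=42)

model = H2ODeepLearningEstimator(hidden=[64, 64], epochs=10)
model.train(x=["f1", "f2", "f3"], y="label",
            training_frame=train, validation_frame=valid)
print(model.model_performance(valid))
```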
Jo-fai (or Joe) is a data scientist at H2O.ai. Before joining H2O, he was in the business intelligence team at Virgin Media in the UK, where he developed data products to enable quick and smart business decisions. He also worked remotely for Domino Data Lab in the US as a data science evangelist, promoting products via blogging and giving talks at meetups. Joe has a background in water engineering. Before his data science journey, he was an EngD research engineer at the STREAM Industrial Doctorate Centre, working on machine learning techniques for drainage design optimization. Prior to that, he was an asset management consultant specializing in data mining and constrained optimization for the utilities sector in the UK and abroad. He also holds an MSc in Environmental Management and a BEng in Civil Engineering.
The document compares Neo4j, Titan, and Cassandra graph databases. It provides details on each database such as Neo4j using the Cypher query language, Cassandra being highly distributed and able to scale linearly, and Titan running on Cassandra or HBase but not supporting Cypher queries. It also gives a 15 point comparison of Cassandra vs Neo4j and examples of querying the same data in Gremlin, Cypher, and SQL. The conclusion recommends a graph database like Neo4j for recommendation queries and only using Titan for very large graphs or high loads.
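To make the Gremlin/Cypher/SQL comparison concrete, here is a hedged sketch of the kind of "friends of friends" query such comparisons typically use, with the Cypher and SQL variants as strings; the schema is an assumption, not the document's actual example.

```python
# Illustrative "friends of friends" query in two styles. The schema is a
# placeholder; the document's 15-point comparison uses its own examples.

# Cypher: the traversal is the query.
cypher = """
MATCH (me:Person {name: $name})-[:KNOWS]->()-[:KNOWS]->(fof)
WHERE fof.name <> $name
RETURN DISTINCT fof.name
"""

# SQL: the same traversal needs a self-join per hop.
sql = """
SELECT DISTINCT p2.name
FROM knows k1
JOIN knows k2 ON k1.friend_id = k2.person_id
JOIN person p1 ON k1.person_id = p1.id
JOIN person p2 ON k2.friend_id = p2.id
WHERE p1.name = %s AND p2.name <> %s
"""

print(cypher, sql)
```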
LDQL: A Query Language for the Web of Linked Data - Olaf Hartig
I used this slideset to present our research paper at the 14th Int. Semantic Web Conference (ISWC 2015). Find a preprint of the paper here:
http://olafhartig.de/files/HartigPerez_ISWC2015_Preprint.pdf
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Presented at JAX London
In this session we'll look at some of the design and implementation strategies you can employ when building a Neo4j-based graph database solution, including architectural choices, data modelling, and testing.
How Graph Databases efficiently store, manage and query connected data at s...
Graph Databases try to make it easy for developers to leverage huge amounts of connected information for everything from routing to recommendations. Doing that poses a number of challenges on the implementation side. In this talk we want to look at the different storage, query and consistency approaches that are used behind the scenes. We'll check out current and future solutions used in Neo4j and other graph databases for addressing global consistency, query and storage optimization, indexing and more, and see which papers and research database developers take inspiration from.
Microservices, containers, and machine learning - Paco Nathan
http://www.oscon.com/open-source-2015/public/schedule/detail/41579
In this presentation, an open source developer community considers itself algorithmically. This shows how to surface data insights from the developer email forums for just about any Apache open source project. It leverages advanced techniques for natural language processing, machine learning, graph algorithms, time series analysis, etc. As an example, we use data from the Apache Spark email list archives to help understand its community better; however, the code can be applied to many other communities.
Exsto is an open source project that demonstrates Apache Spark workflow examples for SQL-based ETL (Spark SQL), machine learning (MLlib), and graph algorithms (GraphX). It surfaces insights about developer communities from their email forums. Natural language processing services in Python (based on NLTK, TextBlob, WordNet, etc.) get containerized and used to crawl and parse email archives. These produce JSON data sets; we then run machine learning on a Spark cluster to find insights such as:
* What are the trending topic summaries?
* Who are the leaders in the community for various topics?
* Who discusses most frequently with whom?
This talk shows how to use cloud-based notebooks for organizing and running the analytics and visualizations. It reviews the background for how and why the graph analytics and machine learning algorithms generalize patterns within the data, based on open source implementations for two advanced approaches, Word2Vec and TextRank. The talk also illustrates best practices for leveraging functional programming for big data.
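As a rough illustration of one of the two approaches named (Word2Vec), here is a minimal gensim sketch; the toy corpus is a placeholder, not the Spark-based pipeline Exsto actually uses.

```python
# Toy Word2Vec sketch (gensim), standing in for the Spark-based
# implementation the talk describes; the corpus is a placeholder.
from gensim.models import Word2Vec

sentences = [
    ["spark", "streaming", "micro", "batch"],
    ["spark", "sql", "dataframe", "query"],
    ["graphx", "pregel", "graph", "algorithm"],
]

model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, epochs=50)
print(model.wv.most_similar("spark", topn=2))
```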
A super fast introduction to Spark and glance at BEAM
Apache Spark is one of the most popular general purpose distributed systems, with built-in libraries to support everything from ML to SQL. Spark has APIs across languages including Scala, Java, Python, and R -- with more 3rd party language support (like Julia & C#). Apache BEAM is a cross-platform tool for building on top of different distributed systems, but it's in its early stages. This talk will introduce the core concepts of Apache Spark, and look to the potential future of Apache BEAM.
Apache Spark has two core abstractions for representing distributed data and computations. This talk will introduce the basics of RDDs and Spark DataFrames & Datasets, and Spark's method for achieving resiliency. Since it's a big data talk, we will include the almost-required wordcount example, and end the Spark part with follow-up pointers on Spark's new ML APIs. For folks who are interested, we'll then talk a bit about portability, and how Apache BEAM aims to improve portability (as well as its unique approach to cross-language support).
Slides from Holden's talk at https://www.meetup.com/Wellington-Data-Scaling-Chats/events/mdcsdpyxcbxb/
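Since the abstract mentions the "almost required" wordcount, here is what that example typically looks like in PySpark; a generic sketch assuming a local Spark installation and an input.txt file, not Holden's actual slide code.

```python
# The customary wordcount, RDD-style. Assumes a local Spark install and
# an input.txt in the working directory; placeholders, not the talk's code.
from operator import add
from pyspark import SparkContext

sc = SparkContext("local[*]", "wordcount")

counts = (
    sc.textFile("input.txt")
    .flatMap(lambda line: line.split())
    .map(lambda word: (word, 1))
    .reduceByKey(add)
)
for word, count in counts.take(10):
    print(word, count)

sc.stop()
```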
This document provides an overview of GraphDB and Neo4j. It discusses why graphs are useful for modeling connected data and common use cases. It also summarizes Neo4j's transactional graph database capabilities, performance advantages, and deployment options. Key topics covered include causal clustering, query planning, and driver and tooling support for developers.
Treasure Data is a cloud-based big data analytics company based in Silicon Valley with about 20 employees. The document discusses Treasure Data's services and architecture, which includes collecting data from various sources using Fluentd, storing the data in a columnar format on AWS S3, and performing analytics using Hadoop and SQL queries. Treasure Data aims to simplify big data adoption through its fully-managed platform and quick setup process. Example customers discussed were able to see results within 2 weeks of signing up.
This document summarizes an introduction to data analysis in Python using Wakari. It discusses why Python is a good language for data analysis, highlighting key Python packages like NumPy, Pandas, Matplotlib and IPython. It also introduces Wakari, a browser-based Python environment for collaborative data analysis and reproducible research. Wakari allows sharing of code, notebooks and data through a web link. The document recommends several talks at the PyData conference on efficient computing, machine learning and interactive plotting.
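For a small taste of the packages highlighted (NumPy and Pandas), here is a minimal snippet; the data is made up for illustration.

```python
# Tiny taste of NumPy + Pandas; the data is made up for illustration.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["Austin", "Toronto", "Austin", "Toronto"],
    "temp": [35.0, 21.5, 33.2, 19.8],
})
# Group-by aggregation: mean and max temperature per city.
print(df.groupby("city")["temp"].agg(["mean", "max"]))
```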
This document discusses persistent graphs in Python with Neo4j. It begins by explaining the limitations of relational databases and how graph databases like Neo4j focus on modeling complex relationships through nodes and edges. It then provides an overview of Neo4j, describing it as an open source graph database that is stable, actively developed, and can handle billions of nodes and relationships to model complex data.
PyCon India 2012: Rapid development of website search in python
The document discusses developing website search capabilities in Python. It provides an overview of typical search engine components like indexing, analyzing, and searching. It then compares two Python search libraries - Pylucene and Whoosh. Benchmark tests on indexing, committing, and searching a 1GB dataset showed Whoosh to outperform Pylucene in speed. The document recommends designing search as an independent, pluggable component and considers Whoosh and Pylucene as good options for rapid development and integration into Python web projects.
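As a hedged sketch of the Whoosh side of that comparison (a generic index-and-search example, not the talk's 1GB benchmark code):

```python
# Minimal Whoosh index-and-search sketch; the documents are placeholders,
# not the benchmark dataset from the talk.
import os
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID
from whoosh.qparser import QueryParser

schema = Schema(path=ID(stored=True), content=TEXT)
os.makedirs("indexdir", exist_ok=True)
ix = create_in("indexdir", schema)

writer = ix.writer()
writer.add_document(path="/a", content="rapid website search in python")
writer.add_document(path="/b", content="whoosh is a pure python search library")
writer.commit()

with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse("python search")
    for hit in searcher.search(query):
        print(hit["path"])
```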
This document summarizes a talk about making Django and NoSQL databases like MongoDB play nicely together. Currently, Django's ORM is optimized for SQL databases and makes assumptions that don't always apply to NoSQL databases. The talk proposes some changes to address this, including having the Query object do less database-specific work and pushing more of that down to the individual database compilers. This would make the Query more agnostic and allow the compilers to generate queries optimized for their specific databases. An example backend for MongoDB would be built to demonstrate this approach.
This document discusses building knowledge graphs using DIG (Distributed Information Graphs) to integrate heterogeneous data sources. It describes the steps involved, including data acquisition, feature extraction, mapping to an ontology, entity resolution, graph construction, and deployment. As a use case, DIG has been used to build a knowledge graph from over 100 million web pages related to human trafficking to help law enforcement identify victims and prosecute traffickers.
This document compares relational and non-relational databases. It discusses how in 2003 the main databases were relational, but by 2010 non-relational databases grew popular in the "NoSQL movement". However, the document argues that there are no truly new database designs and that relational and non-relational databases can be combined. It advises to choose a database based on the specific problem and features needed rather than general classifications. The document provides examples of which types of databases fit certain data and access needs.
This document describes Bubbles, a Python framework for data processing and quality probing. Bubbles focuses on representing data objects and defining operations that can be performed on those objects. Key aspects include:
- Data objects define the structure and representations of data without enforcing a specific storage format.
- Operations can be performed on data objects and are dispatched dynamically based on the objects' representations.
- A context stores available operations and handles dispatching.
- Stores provide interfaces to load and save objects from formats like SQL, CSV, etc.
- Pipelines allow sequencing operations to transform and process objects from source to target stores.
- The framework includes common operations for filtering, joining, and aggregating.
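To make the dispatch-by-representation idea above concrete, here is a toy model of the design in plain Python; all names in it are hypothetical illustrations, not Bubbles' actual API.

```python
# Toy model of dispatch-by-representation, in the spirit of Bubbles;
# every name here is hypothetical, not the framework's real API.

class DataObject:
    """Carries data plus the representations it offers (e.g. 'sql', 'rows')."""
    def __init__(self, data, representations):
        self.data = data
        self.representations = representations

OPERATIONS = {}  # (operation name, representation) -> implementation

def operation(name, representation):
    def register(fn):
        OPERATIONS[(name, representation)] = fn
        return fn
    return register

def dispatch(name, obj, *args):
    """Pick the implementation matching one of the object's representations."""
    for rep in obj.representations:
        fn = OPERATIONS.get((name, rep))
        if fn:
            return fn(obj, *args)
    raise TypeError(f"no '{name}' implementation for {obj.representations}")

@operation("filter", "rows")
def filter_rows(obj, predicate):
    return DataObject([r for r in obj.data if predicate(r)], ["rows"])

rows = DataObject([{"x": 1}, {"x": 5}], ["rows"])
print(dispatch("filter", rows, lambda r: r["x"] > 2).data)  # [{'x': 5}]
```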
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Search engines frequently miss the mark when it comes to understanding user intent. This talk will walk through some of the key building blocks necessary to turn a search engine into a dynamically-learning "intent engine", able to interpret and search on meaning, not just keywords. We will walk through CareerBuilder's semantic search architecture, including semantic autocomplete, query and document interpretation, probabilistic query parsing, automatic taxonomy discovery, keyword disambiguation, and personalization based upon user context/behavior. We will also see how to leverage an inverted index (Lucene/Solr) as a knowledge graph that can be used as a dynamic ontology to extract phrases, understand and weight the semantic relationships between those phrases and known entities, and expand the query to include those additional conceptual relationships.
As an example, most search engines completely miss the mark at parsing a query like (Senior Java Developer Portland, OR Hadoop). We will show how to dynamically understand that "senior" designates an experience level, that "java developer" is a job title related to "software engineering", that "portland, or" is a city with a specific geographical boundary (as opposed to a keyword followed by a boolean operator), and that "hadoop" is the skill "Apache Hadoop", which is also related to other terms like "hbase", "hive", and "map/reduce". We will discuss how to train the search engine to parse the query into this intended understanding and how to reflect this understanding to the end user to provide an insightful, augmented search experience.
Topics: Semantic Search, Apache Solr, Finite State Transducers, Probabilistic Query Parsing, Bayes Theorem, Augmented Search, Recommendations, Query Disambiguation, NLP, Knowledge Graphs
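As a toy rendering of the parsing example above, here is a dictionary-lookup sketch; the real system described is probabilistic and knowledge-graph-backed, and every entry below is a made-up assumption.

```python
# Toy intent tagging for "Senior Java Developer Portland, OR Hadoop".
# A lookup-table sketch only; the talk's approach is probabilistic and
# graph-backed, and these entries are illustrative.
TAXONOMY = {
    "senior": ("experience_level", "senior"),
    "java developer": ("job_title", "related to software engineering"),
    "portland, or": ("city", "Portland, OR (geo boundary)"),
    "hadoop": ("skill", "Apache Hadoop; related: hbase, hive, map/reduce"),
}

def tag(query):
    text = query.lower()
    found = []
    # Greedily match the longest known phrases first.
    for phrase in sorted(TAXONOMY, key=len, reverse=True):
        if phrase in text:
            found.append((phrase,) + TAXONOMY[phrase])
            text = text.replace(phrase, " ")
    return found

for phrase, kind, meaning in tag("Senior Java Developer Portland, OR Hadoop"):
    print(f"{phrase!r} -> {kind}: {meaning}")
```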
Kovelinas is a painter born in 1978 in Lithuania. He is known for his Divas Powerpoint series of paintings. The document provides brief biographical information about the Lithuanian painter Kovelinas and references one of his art series but does not provide any additional context or details.
This document summarizes different market structures: pure competition, pure monopoly, monopolistic competition, oligopoly, and collusive oligopoly. It describes key characteristics of each like perfect competition having many firms and monopoly having a single firm. It also discusses profit maximization under these structures and provides examples.
Mobisfera is a mobile marketing agency based in Barcelona. We imagine, design and develop solutions for mobile devices. We are also committed to the Internet of Things (IoT) and wearable technology.
The document outlines the stages and funding of the Dementias Platform UK project. It establishes cohort(s) in stage 1 with £6M funding and establishes an imaging platform in stage 2 with another £6M. An additional £36M in capital funding will go toward imaging, stem cells, and informatics. It lists the director as John Gallacher from Oxford and describes the 14 work packages and informatics network leads. Simon Lovestone is the informatics network lead, and various informatics sub-networks and their leads are listed. It provides a conceptual model for the imaging informatics component with a central XNAT hub and nodes at various research centers.
Fastrack is a sub-brand of Titan that was established in 1998 and focuses on watches and accessories. The document discusses Fastrack's digital marketing campaign objectives of engaging 1000 students at top private universities in Bangladesh by 2016 through social media campaigns on platforms like Facebook and YouTube. The campaigns aim to raise brand awareness and engage customers among the target 20-25 year old male and female demographic interested in style, fashion and experiencing new things.
The document discusses three paths to designing digital experiences: structural, community, and customer. It advocates writing an experience brief to define goals and mapping the customer journey. The presentation provides recommendations for libraries to focus on the customer experience by asking questions, emphasizing conversation, and staging experiences on their website. The overall message is that experience design improves the ordinary interactions people have with an organization online.
The document presents a list of Spanish and Italian authors, including brief biographies and short excerpts from their most famous works. Among the authors are Juan Ramón Jiménez, Rafael Alberti, Miguel de Cervantes, Gustavo Adolfo Bécquer, and William Shakespeare. The document provides basic information about the lives and works of these important writers.
The document provides an introduction to hidden Markov models (HMM) and their applications. It begins with an overview of HMM and its advantages for modeling sequential data. It then describes the basic concepts of Markov models, including their graphical representation, definitions, and algorithms for calculating sequence and state probabilities. The document introduces HMM and the hidden aspect, which is the state transition information that cannot be directly observed. It provides the formal definition of HMM and describes the three main problems in HMM: model evaluation, decoding, and training. It focuses on explaining the forward algorithm for efficient model evaluation in linear time complexity. The document uses examples throughout to illustrate key concepts such as Markov models, HMM, and the forward algorithm.
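To pin down the forward algorithm the document focuses on, here is the standard textbook recursion in Python; the two-state model parameters are made up for illustration.

```python
# Standard HMM forward algorithm: computes P(observations | model)
# in O(T * N^2) time. The two-state toy model below is made up.
import numpy as np

A = np.array([[0.7, 0.3],      # state transition probabilities a_ij
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # emission probabilities b_j(o)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # initial state distribution

def forward(obs):
    alpha = pi * B[:, obs[0]]          # initialization: alpha_1(j) = pi_j * b_j(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # recursion: sum over previous states
    return alpha.sum()                 # termination: P(O | model)

print(forward([0, 1, 0]))  # likelihood of the observation sequence
```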
This Nielsen report summarizes the results of a global survey of over 28,000 online consumers in 56 countries regarding their multi-screen media usage. The survey found that watching video on computers has become as popular as watching TV among online users. Reported online and mobile video viewing is rising, with over half of global online consumers watching videos on mobile phones monthly. Smartphone ownership is up significantly since 2010 and tablets are also gaining popularity globally. The report concludes that portable devices will continue affecting media consumption as their adoption increases.
The document summarizes a presentation on using R and Hadoop together. It includes:
1) An outline of topics to be covered including why use MapReduce and R, options for combining R and Hadoop, an overview of RHadoop, a step-by-step example, and advanced RHadoop features.
2) Code examples from Jonathan Seidman showing how to analyze airline on-time data using different R and Hadoop options - naked streaming, Hive, RHIPE, and RHadoop.
3) The analysis calculates average departure delays by year, month and airline using each method.
My talk at August's joint meeting of Chicago's R and Hadoop user groups providing an introduction to using R with Hadoop. It starts with a quick introduction to and overview of available options, then focuses on using RHadoop's rmr library to perform an analysis on the publicly-available 'airline' data set.
Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of ...
This document discusses scalability issues with publishing, exchanging, and consuming large RDF datasets on the semantic web. It proposes an integrated solution called Binary RDF that includes (1) a binary serialization format for efficient publication and exchange of RDF data, and (2) basic data structures for direct consumption without decompression. Preliminary results show Binary RDF in the form of HDT can provide a compact representation of RDF and support direct pattern matching queries during consumption. Further work is needed to fully understand RDF structure and apply it to innovative dictionary and triple indexes.
Roberto García presented on exploring linked data. He discussed how semantic data is fine for computers but difficult for people to interact with. He proposed automatically generating user interfaces from ontologies and datasets, including overview menus, faceted browsing, and interaction patterns to allow users to build queries without knowledge of SPARQL or dataset structure. He demonstrated examples of his approach applied to DBPedia and LinkedMDB data.
This document summarizes Rodrigo Dias Arruda Senra's 2012 doctoral thesis defense at the University of Campinas. The thesis studied how to organize digital information for sharing across heterogeneous systems and proposed three main contributions: 1) SciFrame, a conceptual framework for scientific digital data processing; 2) database descriptors to enable loose coupling between applications and database management systems; and 3) organographs, a method for explicitly organizing information based on tasks.
This document discusses using R Shiny and related tools to create cloud-based spatial data analytics applications. It describes a case study of an app called VectorPoint created to analyze spatial disease data from Peru. The app allows users to collect field data via smartphones, calculate disease probabilities on a map, and track inspections. R Shiny allows rapid prototyping by combining R code and interactive web interfaces. While powerful for prototyping, R Shiny has limitations like requiring an online connection and not being optimized for speed.
This is part of an introductory course to Big Data Tools for Artificial Intelligence. These slides introduce students to the use of Apache Pig as an ETL tool over Hadoop.
Scalable Hadoop with succinct Python: the best of both worlds
The document discusses using Python with Hadoop frameworks. It outlines some of the benefits of Hadoop like scalability and schema flexibility, and benefits of Python like succinct code and many data science libraries. It then reviews several projects that aim to bridge Python and Hadoop, including mrjob for MapReduce jobs, Pydoop for faster MapReduce, Pig for higher-level data flows, Snakebite for a Python HDFS client, and PySpark for working with Spark. However, it notes that Python support is often an afterthought or fringe project compared to the native Java support, and lacks commercial backing or cohesive APIs.
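For a flavour of the mrjob option mentioned above, here is the canonical wordcount job; a generic sketch that runs locally or, with -r hadoop, on a cluster.

```python
# Canonical mrjob wordcount; run locally with
#   python wordcount.py input.txt
# or on Hadoop with -r hadoop. A generic sketch, not the talk's code.
from mrjob.job import MRJob

class MRWordCount(MRJob):
    def mapper(self, _, line):
        for word in line.split():
            yield word.lower(), 1

    def reducer(self, word, counts):
        yield word, sum(counts)

if __name__ == "__main__":
    MRWordCount.run()
```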
This document summarizes an introductory webinar on building an enterprise knowledge graph from RDF data using TigerGraph. It introduces RDF and knowledge graphs, demonstrates loading DBpedia data into a TigerGraph graph database using a universal schema, and provides examples of queries to extract information from the graph such as related people, publishers by location, and related topics for a given predicate. The webinar encourages attendees to learn more about graph databases and TigerGraph through additional resources and future webinar episodes.
Programmers love Python because of how fast and easy it is to use. Python cuts development time in half with its simple-to-read syntax and easy compilation feature. Debugging your programs is a breeze in Python with its built-in debugger. Python continues to be a favourite option for data scientists, who use it for building machine learning applications and other scientific computations.
Python has evolved into the most preferred language for data analytics, and the increasing search trends on Python also indicate that Python is the next "Big Thing" and a must for professionals in the data analytics domain.
This document discusses Grails integration with Neo4j graph databases. It begins with an introduction to graph databases and Neo4j. It then covers the Grails Neo4j plugin which allows using Neo4j as the persistence layer for Grails domain classes. Finally, it addresses some challenges in mapping the Grails domain model to the Neo4j nodespace and potential solutions.
This document proposes a mapping between the Web Ontology Language (OWL) and the OpenAPI Specification (OAS) to generate REST APIs from OWL ontologies. It describes a mapping method, discusses related work, and details the mapping's coverage of OWL constructs. While some constructs like complex boolean restrictions are not supported, the mapping specification and implementation aim to make ontology knowledge graphs accessible via RESTful APIs in accordance with FAIR principles. Future work includes enhancing path/schema naming and adding metadata annotations.
introduction to Neo4j (Tabriz Software Open Talks) - Farzin Bagheri
This document provides an overview of Neo4j, a graph database. It begins with definitions of relational and NoSQL databases, categorizing NoSQL into key-value, document, column-oriented, and graph databases. Graph databases are explained to contain nodes, relationships, and properties. Neo4j is introduced as an example graph database, with Cypher listed as its query language. Examples of using Cypher to create nodes and relationships are provided. Finally, potential uses of Neo4j are listed, including social networks, network analysis, recommendations, and more.
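A hedged sketch of the kind of Cypher create-and-read examples described, driven from Python; the connection details and data are illustrative assumptions.

```python
# Creating nodes and a relationship in Cypher, then reading them back.
# Connection details and data are illustrative placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    session.run(
        "CREATE (a:Person {name: $a})-[:FRIENDS_WITH]->(b:Person {name: $b})",
        a="Ada", b="Grace",
    )
    for record in session.run(
        "MATCH (a:Person)-[:FRIENDS_WITH]->(b:Person) RETURN a.name, b.name"
    ):
        print(record["a.name"], "->", record["b.name"])

driver.close()
```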
Presto as a Service - Tips for operation and monitoring - Taro L. Saito
- Presto as a Service in Treasure Data involves deploying Presto using blue-green deployments with no downtime and automatic error recovery of failed queries.
- Monitoring Presto involves using its JSON API to view queries and query plans as well as collecting Presto metrics with Fluentd and detecting anomalies.
- Benchmarking compares query performance between Presto versions by running predefined query sets and aggregating the results.
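As a minimal sketch of the JSON-API monitoring idea (Presto coordinators expose query information at /v1/query; the host and port are placeholders, and Treasure Data's actual tooling is more involved):

```python
# Poll a Presto coordinator's JSON API for running/finished queries.
# Host/port are placeholders; the monitoring described feeds such
# metrics into Fluentd for anomaly detection.
import requests

resp = requests.get("http://presto-coordinator:8080/v1/query", timeout=10)
resp.raise_for_status()

for q in resp.json():
    print(q["queryId"], q["state"])
```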
The document discusses how search engines are incorporating knowledge graphs and rich snippets to provide more detailed information to users. It describes Google's Knowledge Graph and how search engines like Bing are implementing similar features. The document then outlines how the Schema.org standard and modules like Schema.org and Rich Snippets for Drupal can help structure Drupal content to be understood by search engines and displayed as rich snippets in search results. Integrating these can provide benefits like a consistent search experience across public and private Drupal content.
PyCon Colombia 2020: Python for Data Analysis: Past, Present, and Future - Wes McKinney
Wes McKinney gave a presentation on the past, present, and future of Python for data analysis. He discussed the origins and development of pandas over the past 12 years from the first open source release in 2009 to the current state. Key points included pandas receiving its first formal funding in 2019, its large community of contributors, and factors driving Python's growth for data science like its package ecosystem and education. McKinney also addressed early concerns about Python and looked to the future, highlighting projects like Apache Arrow that aim to improve performance and interoperability.
The document discusses graph databases and their advantages over traditional relational databases. It covers the NoSQL movement, graph databases, use cases for graph databases like social networks and semantic web applications. It provides an overview of graph database technologies like Neo4j and DEX and examples of querying and modeling data in a graph database using Neo4j.rb.
The document discusses the Spark ecosystem. It provides an overview of Spark, a cluster computing framework developed at UC Berkeley, including its core components like Resilient Distributed Datasets (RDDs) and projects like Shark. Spark aims to improve on Hadoop and MapReduce by allowing more interactive queries and streaming data analysis through its use of RDDs to cache data in memory across clusters.
This document describes a course on big data analytics. The course aims to provide an overview of big data storage, retrieval, and processing technologies. It will cover tools for storing and analyzing large datasets as well as challenges in big data system design and analytics. Students will learn to build distributed systems with Apache Hadoop, write MapReduce applications, and develop applications using Hive, Pig, and Spark. Course units will introduce big data concepts, Hadoop, MapReduce programming, Hive, Pig, and Spark. Upon completing the course, students will be able to develop applications on Hadoop and with related big data tools.
The document discusses Big Data, MapReduce, Hadoop, and Pydoop. It provides an overview of MapReduce and how it works, describing the map and reduce functions. It also describes Hadoop, the popular open-source implementation of MapReduce, including its architecture and core components like HDFS and how tasks are executed in a distributed manner. Finally, it briefly introduces Pydoop as a way to use Python with Hadoop.
The seminar presents the emerging topic of the Web of Data within the context of the Semantic Web. It examines the difficulties encountered in accessing the enormous amount of information currently available on the Web and the advantages of an approach based on the interactive construction of queries.
Similar to Graph Databases in Python (PyCon Canada 2012)
This document discusses bringing the educational app Dr. Glearning to Firefox OS. It describes Dr. Glearning's history and functionality, challenges in porting from Android/iOS to Firefox OS, and solutions considered. The authors decided a hosted web app approach using standard web technologies like jQuery Mobile worked best to overcome restrictions of the packaged app model and enable third party API access. They demonstrated a working hosted version of Dr. Glearning for Firefox OS.
This document summarizes a study on the neutralization of the consonant /l/ by /r/ in the Andalusian dialect of Spain. The study analyzed the productions of 4 speakers from Huelva and Seville using 180 words in fast and slow readings. The authors found that neutralization occurs most frequently in trisyllabic words beginning with "al-", especially in coda position. Neutralization occurred in 29% of the productions, with an average of 20 cases per speaker. The study concludes that this...
Presentation of "Hybrid Page Layout Analysis via Tab-Stop Detection"Javier de la Rosa
Presentation of the proceeding article "Hybrid Page Layout Analysis via Tab-Stop Detection" by Ray Smith to the Page Segmentation Competition hold on ICDAR 2009.
Mejora de un problema combinatorio sobre vectores ordenadosJavier de la Rosa
Este documento presenta un problema combinatorio sobre vectores ordenados y propone una solución eficiente en tiempo y espacio. Se analizan dos formas de modelar el problema como un árbol y como un grafo, pero la solución propuesta genera el conjunto de soluciones como la traspuesta de una matriz, aprovechando patrones en los vectores de entrada. El algoritmo propuesto genera todas las soluciones en tiempo lineal respecto al número total de elementos.
Quantum Communications Q&A with Gemini LLM. These are based on Shannon's Noisy channel Theorem and offers how the classical theory applies to the quantum world.
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsMydbops
This presentation, delivered at the Postgres Bangalore (PGBLR) Meetup-2 on June 29th, 2024, dives deep into connection pooling for PostgreSQL databases. Aakash M, a PostgreSQL Tech Lead at Mydbops, explores the challenges of managing numerous connections and explains how connection pooling optimizes performance and resource utilization.
Key Takeaways:
* Understand why connection pooling is essential for high-traffic applications
* Explore various connection poolers available for PostgreSQL, including pgbouncer
* Learn the configuration options and functionalities of pgbouncer
* Discover best practices for monitoring and troubleshooting connection pooling setups
* Gain insights into real-world use cases and considerations for production environments
This presentation is ideal for:
* Database administrators (DBAs)
* Developers working with PostgreSQL
* DevOps engineers
* Anyone interested in optimizing PostgreSQL performance
Contact info@mydbops.com for PostgreSQL Managed, Consulting and Remote DBA Services
Transcript: Details of description part II: Describing images in practice - T...BookNet Canada
This presentation explores the practical application of image description techniques. Familiar guidelines will be demonstrated in practice, and descriptions will be developed “live”! If you have learned a lot about the theory of image description techniques but want to feel more confident putting them into practice, this is the presentation for you. There will be useful, actionable information for everyone, whether you are working with authors, colleagues, alone, or leveraging AI as a collaborator.
Link to presentation recording and slides: https://bnctechforum.ca/sessions/details-of-description-part-ii-describing-images-in-practice/
Presented by BookNet Canada on June 25, 2024, with support from the Department of Canadian Heritage.
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxSynapseIndia
Your comprehensive guide to RPA in healthcare for 2024. Explore the benefits, use cases, and emerging trends of robotic process automation. Understand the challenges and prepare for the future of healthcare automation
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfNeo4j
Presented at Gartner Data & Analytics, London Maty 2024. BT Group has used the Neo4j Graph Database to enable impressive digital transformation programs over the last 6 years. By re-imagining their operational support systems to adopt self-serve and data lead principles they have substantially reduced the number of applications and complexity of their operations. The result has been a substantial reduction in risk and costs while improving time to value, innovation, and process automation. Join this session to hear their story, the lessons they learned along the way and how their future innovation plans include the exploration of uses of EKG + Generative AI.
The Rise of Supernetwork Data Intensive ComputingLarry Smarr
Invited Remote Lecture to SC21
The International Conference for High Performance Computing, Networking, Storage, and Analysis
St. Louis, Missouri
November 18, 2021
Blockchain technology is transforming industries and reshaping the way we conduct business, manage data, and secure transactions. Whether you're new to blockchain or looking to deepen your knowledge, our guidebook, "Blockchain for Dummies", is your ultimate resource.
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Erasmo Purificato
Slide of the tutorial entitled "Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Emerging Trends" held at UMAP'24: 32nd ACM Conference on User Modeling, Adaptation and Personalization (July 1, 2024 | Cagliari, Italy)
Are you interested in dipping your toes in the cloud native observability waters, but as an engineer you are not sure where to get started with tracing problems through your microservices and application landscapes on Kubernetes? Then this is the session for you, where we take you on your first steps in an active open-source project that offers a buffet of languages, challenges, and opportunities for getting started with telemetry data.
The project is called openTelemetry, but before diving into the specifics, we’ll start with de-mystifying key concepts and terms such as observability, telemetry, instrumentation, cardinality, percentile to lay a foundation. After understanding the nuts and bolts of observability and distributed traces, we’ll explore the openTelemetry community; its Special Interest Groups (SIGs), repositories, and how to become not only an end-user, but possibly a contributor.We will wrap up with an overview of the components in this project, such as the Collector, the OpenTelemetry protocol (OTLP), its APIs, and its SDKs.
Attendees will leave with an understanding of key observability concepts, become grounded in distributed tracing terminology, be aware of the components of openTelemetry, and know how to take their first steps to an open-source contribution!
Key Takeaways: Open source, vendor neutral instrumentation is an exciting new reality as the industry standardizes on openTelemetry for observability. OpenTelemetry is on a mission to enable effective observability by making high-quality, portable telemetry ubiquitous. The world of observability and monitoring today has a steep learning curve and in order to achieve ubiquity, the project would benefit from growing our contributor community.
Kief Morris rethinks the infrastructure code delivery lifecycle, advocating for a shift towards composable infrastructure systems. We should shift to designing around deployable components rather than code modules, use more useful levels of abstraction, and drive design and deployment from applications rather than bottom-up, monolithic architecture and delivery.
1. GRAPH DATABASES IN PYTHON
Javier de la Rosa
@versae
The CulturePlex Lab
Western University, London, ON
PyCon Canada 2012
2. WHO I AM
● Javier de la Rosa
● versae
● Computer Scientist and Humanist
● CulturePlex Lab
3. FIRST OF ALL
“You do not really understand something unless you can explain it to your grandmother”
– (Frequently attributed to) Richard Feynman
4. DATABASES (in the last 30 years)
● Data in tables, rows and columns
● Pretty basic mechanism to make connections:
– Primary keys, foreign keys, and... that's all
● Relational, ahem, really?
5. DATABASES (in the last 30 years)
● Rigid data schemas
– Have you ever tried to make a schema migration?
● Relational Algebra and SQL
– Terrible for highly interconnected data
– JOINs can take a lifetime to finish (a bit overdramatized)
6. NoSQL, Not Only SQL
● Document
– MongoDB, CouchDB, etc.
● Key-value stores
– Redis, Riak, Voldemort, Dynamo, etc.
● Big Tables
– Cassandra, HBase, etc.
● Graph
– Neo4j, OrientDB, HyperGraphDB, Titan, etc.
● Analytic
– Hadoop
● Other
– Objectivity/DB, ZODB, etc.
7. DATABASES LANDSCAPE
Source: 451Research, https://451research.com/report-long?icid=2289
8. WHO IS USING GRAPHS?
● Mozilla with Pancake and Pacer
– https://wiki.mozilla.org/Pancake & http://pangloss.github.com/pacer/
● Twitter with FlockDB
– https://github.com/twitter/flockdb
● Facebook with Open Graph
– https://developers.facebook.com/docs/opengraph/
● Google with Knowledge Graph
– http://www.google.ca/insidesearch/.../knowledge.html
9. WHY GRAPHS?
● Data is getting more and more connected
– From text documents, to wikis, to ontologies, to folksonomies, etc.
● And more semi-structured
– Think about the decentralization of content generation
● And more complex
– Social networks, semantic trending, etc.
Source: Neo Technology, http://www.slideshare.net/emileifrem/neo4j-the-benefits-of-graph-databases-oscon-2009
10. A FEW OF THE CURRENT USES
● Social Networking and Recommendations
● Network and Cloud Management
● Master Data Management
● Geospatial
● Bioinformatics
● Content Management and Security and Access Control
Source: Mashable, http://mashable.com/2012/09/26/graph-databases/
11. AND WHY ELSE?
● Because graphs are cool!
[Figure: portrait of Leonhard Euler]
12. WHAT IS A GRAPH?
● G = (V, E), where
– G is a graph
– V is a set of vertices
– E is a set of edges
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
13. WHAT IS A GRAPH?
● G = (V, E)
– Graph, aka network, diagram, etc.
– Vertex, aka point, dot, node, element, etc.
– Edge, aka relationship, arc, line, link, etc.
● Basically, “a graph states that something is related to something else”
– Svetlana Sicular, Research Director at Gartner
Source: Gartner, http://blogs.gartner.com/svetlana-sicular/think-graph/
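To make G = (V, E) concrete, here is a minimal sketch in plain Python; no database involved, and the names are purely illustrative:

V = {"alice", "bob", "carol"}             # vertices
E = {("alice", "bob"), ("bob", "carol")}  # edges as pairs of vertices

# Derive an adjacency view from E: who is connected to whom.
adjacency = {v: set() for v in V}
for tail, head in E:
    adjacency[tail].add(head)
    adjacency[head].add(tail)  # undirected, so record both directions

print(adjacency["bob"])  # {'alice', 'carol'}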
14. TYPES OF GRAPH
[Figure: an undirected graph and a digraph]
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
15. TYPES OF GRAPH
[Figure: a multigraph and a hypergraph]
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
16. SOME GRAPHS EVEN HAVE A NAME
● Complete graphs
[Figure: the complete graphs K3, K5 and K8]
Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
17. SOME GRAPHS EVEN HAVE A NAME
● Stars
[Figure: the star graphs S3, S4, S5 and S6]
Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
18. SOME GRAPHS EVEN HAVE A NAME
● Snarks
[Figure: the second Blanuša snark, the Szekeres snark, and the double-star snark]
Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
19. THINGS CAN GET COMPLICATED...
[Figure: the local McLaughlin graph]
Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
20. WAIT A SEC,
21. DON'T WORRY
● Just one more type: the Property Graph
[Figure: an example graph with numbered nodes and edges]
22. THE PROPERTY GRAPH
● Directed, attributed and multi-relational
[Figure: an example property graph; person nodes “Name: Javi”, “Name: David” and “Name: John”, and a book node (“Title: The Art of Computer Programming”, “Price: $135”); “Knows” edges carrying “Since: 2009” and “Since: 1990” properties, and “Likes” edges pointing to the book]
23. THE PROPERTY GRAPH
● A set of nodes, and each node has:
– A unique identifier.
– A set of outgoing edges.
– A set of incoming edges.
– A collection of properties defined by a map from key to value.
● A set of relationships, and each relationship has:
– A unique identifier.
– An outgoing tail vertex.
– An incoming head vertex.
– And a collection of properties defined by a map from key to value.
Source: TinkerPop, https://github.com/tinkerpop/gremlin/wiki/Defining-a-Property-Graph
24. IN SHORT
● A Property Graph is composed of:
– A set of nodes
– A set of relationships
– Properties and ids on both
● Sometimes, nodes and relationships can be typed
– In Blueprints and Neo4j, a label denotes the type of relationship between its two nodes.
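To ground that definition, here is a minimal, database-agnostic sketch of the property graph model in Python; the class and attribute names are mine, not part of any of the libraries discussed later:

import itertools

class PropertyGraph:
    """Toy property graph: identified nodes and relationships, both with properties."""

    def __init__(self):
        self._ids = itertools.count(1)   # unique identifiers for nodes and relationships
        self.nodes = {}                  # id -> properties (map from key to value)
        self.relationships = {}          # id -> (tail id, head id, label, properties)

    def add_node(self, **properties):
        node_id = next(self._ids)
        self.nodes[node_id] = properties
        return node_id

    def add_relationship(self, tail, head, label, **properties):
        rel_id = next(self._ids)
        self.relationships[rel_id] = (tail, head, label, properties)
        return rel_id

g = PropertyGraph()
javi = g.add_node(name="Javi")
david = g.add_node(name="David")
g.add_relationship(javi, david, "Knows", since=1990)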
25. GRAPH DATABASES
● A graph database uses graph structures with nodes, edges, and properties to represent and store data
– ...but there is not an easy way to visualize this
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
26. HOW DOES IT LOOK IN PYTHON?
27. HOW DOES IT LOOK IN PYTHON?
# Let's create a graph
>>> silvester = g.nodes.create(name="Silvester")
28. HOW DOES IT LOOK IN PYTHON?
# Let's create a graph
>>> silvester = g.nodes.create(name="Silvester")
[Figure: a node labeled “Name: Silvester”]
29. HOW DOES IT LOOK IN PYTHON?
# Let's create a graph
>>> silvester = g.nodes.create(name="Silvester")
>>> arnold = g.nodes.create(name="Arnold")
[Figure: a node labeled “Name: Silvester”]
30. HOW DOES IT LOOK IN PYTHON?
# Let's create a graph
>>> silvester = g.nodes.create(name="Silvester")
>>> arnold = g.nodes.create(name="Arnold")
[Figure: two nodes, “Name: Silvester” and “Name: Arnold”]
31. HOW DOES IT LOOK IN PYTHON?
# Let's create a graph
>>> silvester = g.nodes.create(name="Silvester")
>>> arnold = g.nodes.create(name="Arnold")
>>> punch = arnold.punches(silvester)
[Figure: two nodes, “Name: Silvester” and “Name: Arnold”]
32. HOW DOES IT LOOK IN PYTHON?
# Let's create a graph
>>> silvester = g.nodes.create(name="Silvester")
>>> arnold = g.nodes.create(name="Arnold")
>>> punch = arnold.punches(silvester)
[Figure: “Arnold” -punches-> “Silvester”]
33. HOW DOES IT LOOK IN PYTHON?
[Figure: “Arnold” -punches-> “Silvester”]
34. HOW DOES IT LOOK IN PYTHON?
>>> chuck = g.nodes.create(name="Chuck")
[Figure: “Arnold” -punches-> “Silvester”]
35. HOW DOES IT LOOK IN PYTHON?
>>> chuck = g.nodes.create(name="Chuck")
[Figure: “Arnold” -punches-> “Silvester”, plus a new node “Name: Chuck”]
36. HOW DOES IT LOOK IN PYTHON?
>>> chuck.dropkicks(silvester)
>>> chuck.dropkicks(arnold)
[Figure: “Arnold” -punches-> “Silvester”, plus the node “Name: Chuck”]
37. HOW DOES IT LOOK IN PYTHON?
>>> chuck.dropkicks(silvester)
>>> chuck.dropkicks(arnold)
[Figure: “Arnold” -punches-> “Silvester”; “Chuck” -dropkicks-> “Silvester” and -dropkicks-> “Arnold”]
39. GRAPH DATABASES LANDSCAPE
And more:
– AffinityDB
– YarcData uRiKA
– Apache Giraph
– Cassovary
– StigDB
– NuvolaBase
– Pegasus
– Microsoft Trinity
– Sherlock
– And so on
41. GREMLIN, BLUEPRINTS, WAT?
Let me introduce you to the TinkerPop stack
Source: TinkerPop, http://www.tinkerpop.com/
42. BLUEPRINTS AND REXSTER
● Blueprints is a property graph model interface
● Rexster is a server that exposes any Blueprints graph through REST
Source: TinkerPop, http://www.tinkerpop.com/
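Because Rexster speaks plain HTTP, you can already poke at it from Python with nothing but the requests library. A rough sketch, assuming a local Rexster instance on its default port exposing a graph named "tinkergraph"; the URL, port, graph name, and response layout are all assumptions that depend on your configuration:

import requests

REXSTER = "http://localhost:8182"  # Rexster's default port (assumption)

# Ask the server which graphs it exposes.
print(requests.get(REXSTER + "/graphs").json())

# Fetch the vertices of one of them.
response = requests.get(REXSTER + "/graphs/tinkergraph/vertices").json()
for vertex in response["results"]:
    print(vertex)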
43. AND WHAT ABOUT PYTHON?
● Options to connect to a Blueprints Graph Database
[Figure: graph databases (OrientDB, Neo4j, DEX, Titan) expose the Blueprints API; Rexster serves it over REST; the Python clients bulbflow, python-blueprints and pyblueprints talk to Rexster]
44. BULBFLOW
● Create
>>> alice = g.vertices.create(name="Alice")
>>> bob = g.vertices.create(name="Bob")
>>> g.edges.create(alice, "knows", bob)
● Get
>>> alice = g.vertices.get(1)
>>> bob = g.vertices.get(2)
● Update
>>> alice.age = 21
>>> alice.save()
● Delete
>>> alice.delete()
Source: Bulbflow, http://bulbflow.com/docs/
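The g object in the snippets above is a Bulbs Graph. A minimal setup against a local Neo4j Server, following the bulbflow quickstart; the default URL is an assumption and may differ in your deployment:

from bulbs.neo4jserver import Graph

g = Graph()  # defaults to http://localhost:7474/db/data/ (assumption)

alice = g.vertices.create(name="Alice")
bob = g.vertices.create(name="Bob")
g.edges.create(alice, "knows", bob)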
45. PYBLUEPRINTS
● Create
>>> alice = g.addVertex()
>>> alice.setProperty("name", "Alice")
>>> bob = g.addVertex()
>>> bob.setProperty("name", "Bob")
>>> g.addEdge(alice, bob, "knows")
● Get
>>> alice = g.getVertex(1)
>>> bob = g.getVertex(2)
● Update
>>> alice.setProperty("age", 21)
● Delete
>>> g.removeVertex(alice.getId())
Source: PyBlueprints, https://github.com/escalant3/pyblueprints
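Likewise, the PyBlueprints g has to be constructed first. A sketch for the Neo4j backend, per the project's README; treat the import path and server URL as assumptions for your setup:

from pyblueprints.neo4j import Neo4jGraph

g = Neo4jGraph("http://localhost:7474/db/data")  # assumed local Neo4j server

alice = g.addVertex()
alice.setProperty("name", "Alice")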
46. BUT NEO4J HAS ITS OWN CLIENTS!
● REST clients for Neo4j
[Figure: the same Blueprints/Rexster stack as before, plus neo4j-rest-client and py2neo speaking REST directly to Neo4j]
47. HOW CAN I LOOK THINGS UP?
● An index is a data structure that supports the fast lookup of elements by some key/value pair
Source: TinkerPop, https://github.com/tinkerpop/blueprints/wiki/Graph-Indices
48. INDICES
● In the Python bindings, indices behave much like a dict
– bulbflow
# bulbflow creates automatic indices to make basic lookups easier
>>> nodes = g.vertices.index.lookup(name="Alice")
>>> for node in nodes:
...:     print node
– PyBlueprints
>>> index = g.getIndex("names", "vertex")
>>> index.put("name", alice.getProperty("name"), alice)
>>> nodes = index.get("name", "Alice")
>>> for node in nodes:
...:     print node
49. INDICES
● Some graph databases provide full-text queries
– bulbflow
>>> nodes = g.vertices.index.query(name="ali*")
>>> for node in nodes:
...:     print node
– PyBlueprints
>>> index = g.getIndex("names", "vertex")
>>> nodes = index.query("name", "ali*")
>>> for node in nodes:
...:     print node
50. ...MORE COMPLEX SEARCHES?
“Without traversals [FlockDB] is only a persisted graph. But not a graph database.”
– Alex Popescu
Source: myNoSQL, http://nosql.mypopescu.com/
51. LET'S TRAVERSE THE GRAPH!
● “A graph traversal is the problem of visiting all the nodes in a graph in a particular manner”
– A* search
– Alpha-beta pruning
– Breadth-First Search (BFS) (see the sketch below)
– Depth-First Search (DFS)
– Dijkstra's algorithm
– The Floyd-Warshall algorithm
– Etc.
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_traversal
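For intuition about what a traversal engine does under the hood, here is the simplest of those algorithms, breadth-first search, over a plain adjacency dict; pure Python, no database involved:

from collections import deque

def bfs(adjacency, start):
    """Yield nodes reachable from start, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)

adjacency = {"alice": ["bob"], "bob": ["carol"], "carol": []}
print(list(bfs(adjacency, "alice")))  # ['alice', 'bob', 'carol']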
52. NEO4J TRAVERSAL API
● Python-embedded (native Neo4j Python binding)
>>> traverser = gdb.traversal().relationships('knows').traverse(alice)
# The graph is traversed as you loop through the result
>>> for node in traverser.nodes:
...:     print node
● neo4j-rest-client
>>> traverser = alice.traverse(types=[client.All.knows])
# The graph is traversed as you loop through the result
>>> for node in traverser:
...:     print node
53. BLUEPRINTS GREMLIN
● Gremlin is a domain-specific language for traversing property graphs
– Defines how to do a query based on the graph structure
>>> gremlin = g.extensions.GremlinPlugin.execute_script
>>> params = {'alice_id': alice.id}
>>> script = "g.v(alice_id).out('knows')"
>>> node = gremlin(script=script, params=params)
>>> node == bob
Source: TinkerPop Gremlin, https://github.com/tinkerpop/gremlin/wiki
Source: Marko Rodríguez, The Graph Traversal Programming Pattern, http://www.slideshare.net/slidarko/graph-windycitydb2010
54. NEO4J CYPHER QUERY LANGUAGE
● Declarative graph query language
– Expressive and efficient querying
– Focused on expressing what to retrieve from a graph
– Inspired by SQL
– Pattern-matching expressions from SPARQL
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
55. NEO4J CYPHER QUERY LANGUAGE
● Declarative graph query language
– Expressive and efficient querying
– Focused on expressing what to retrieve from a graph
– Inspired by SQL
– Pattern-matching expressions from SPARQL
[Figure: nodes 1 and 2 joined by an edge labeled “label”]
(1) -[:label]- (2)
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
56. NEO4J CYPHER QUERY LANGUAGE
● Declarative graph query language
– Expressive and efficient querying
– Focused on expressing what to retrieve from a graph
– Inspired by SQL
– Pattern-matching expressions from SPARQL
[Figure: nodes 1 and 2 joined by an edge labeled “label”]
START n=(1), m=(2)
MATCH n-[r:label]-m
RETURN r
Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
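Cypher can also be sent from Python. A sketch using neo4j-rest-client, which grew a query() helper for exactly this; the method name, the server URL, and the example pattern are assumptions to check against the version you have installed:

from neo4jrestclient.client import GraphDatabase

gdb = GraphDatabase("http://localhost:7474/db/data")  # assumed local server

# Old-style Cypher, matching the syntax of the era: node(*) means "all nodes".
q = "START n=node(*) MATCH n-[r:knows]->m RETURN n, type(r), m"
for row in gdb.query(q):
    print(row)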
57. PY2NEO CYPHER HELPERS
● Get or create elements
>>> g.get_or_create_relationships(
...:     (bob, "WORKS WITH", carol, {"since": 2004}),
...:     (alice, "DISLIKES!", carol, {"reason": "youth"}),
...:     (bob, "WORKS WITH", dave, {"since": 2009}))
● Get counts
>>> nodes_count = g.get_node_count()
>>> rels_count = g.get_relationship_count()
● Delete
>>> g.delete()
Source: py2neo, http://py2neo.org/
59. LET'S PLAY!
● Deploy Neo4j on Heroku or Amazon
● Use one of the available clients
60. NEO4J HEROKU ADD-ON
● Create a Heroku app and add the Neo4j add-on
$ heroku apps:create pyconca
$ heroku addons:add neo4j --app pyconca
$ xdg-open `heroku config:get NEO4J_URL --app pyconca`
$ export NEO4J_URL=`heroku config:get NEO4J_URL --app pyconca`
● Create a virtualenv with neo4j-rest-client
$ mkvirtualenv --no-site-packages pyconca
$ workon pyconca
$ pip install ipython neo4jrestclient
$ ipython
61. NEO4J HEROKU ADD-ON
● Run IPython and that's it!
>>> import os
>>> NEO4J_URL = os.environ["NEO4J_URL"]
>>> from neo4jrestclient import client
>>> gdb = client.GraphDatabase(NEO4J_URL + "/db/data")
>>> gdb.url
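Once connected, the same client can create data. These calls follow the neo4jrestclient documentation; the property values are just examples, and the outgoing() filter signature is an assumption to verify against your installed version:

# Continuing from the gdb created above:
alice = gdb.nodes.create(name="Alice")
bob = gdb.nodes.create(name="Bob")
alice.relationships.create("knows", bob, since=2012)

# Read the relationship back from Alice's outgoing edges.
for rel in alice.relationships.outgoing(types=["knows"]):
    print(rel.start["name"], "knows", rel.end["name"])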
63. THANKS!
Questions?
Javier de la Rosa
@versae
The CulturePlex Lab
Western University, London, ON
PyCon Canada 2012
64. APPENDIX: DATA MODELS
● neo4django
– https://github.com/scholrly/neo4django
● neomodel
– https://github.com/robinedwards/neomodel
● bulbflow models
– http://bulbflow.com/quickstart/#models
65. APPENDIX: VISUALIZE YOUR GRAPH
● Export somehow to .gexf for Gephi
– http://gephi.org/
● Use D3.js
– http://d3js.org/
● Use sigma.js
– http://sigmajs.org/
● Take a look at Max De Marzi's work
– http://maxdemarzi.com/category/visualization/
● Use Sylva (for newbies)
– http://www.sylvadb.com/