GRAPH DATABASES
   IN PYTHON
      Javier de la Rosa
            @versae
       The CulturePlex Lab
  Western University, London, ON

      PyCon Canada 2012
WHO I AM
●
    Javier de la Rosa
●
     versae
●
     versae
●
    Computer Scientist and
    Humanist
●
    CulturePlex Lab
●
     CulturePlex


               Graph Databases in Python, Javier de la Rosa, PyCon Canada, 2012   2
FIRST OF ALL

“You do not really understand something
         unless you can explain it to your
                           grandmother”

                – (Frequently attributed to) Richard Feynman




DATABASES (in the last 30 years)
●
    Data in tables, rows and columns


●
    Pretty basic mechanism to make connections:
    –   Primary keys, Foreign keys, and... that's all


●
    Relational, ahem, really?
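To make the point about primary and foreign keys concrete, here is a minimal sketch using Python's built-in sqlite3 module (an assumption; the slides name no particular relational database). The person and knows tables and their rows are hypothetical. In this model, "Javi knows John" exists only as a row holding two foreign keys, and reading it back as names takes one JOIN per side of the relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- The only way to say "A knows B" is a row holding two foreign keys
    CREATE TABLE knows (
        person_id INTEGER NOT NULL REFERENCES person(id),
        friend_id INTEGER NOT NULL REFERENCES person(id),
        PRIMARY KEY (person_id, friend_id)
    );
""")
conn.executemany("INSERT INTO person (id, name) VALUES (?, ?)",
                 [(1, "Javi"), (2, "John"), (3, "David")])
conn.executemany("INSERT INTO knows VALUES (?, ?)", [(1, 2), (1, 3)])

try:
    conn.execute("INSERT INTO knows VALUES (1, 99)")  # there is no person with id 99
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# Turning the key pairs back into names needs one JOIN per end of the relationship
rows = conn.execute("""
    SELECT p.name, f.name
    FROM knows k
    JOIN person p ON p.id = k.person_id
    JOIN person f ON f.id = k.friend_id
    ORDER BY f.name
""").fetchall()
print(rows)  # [('Javi', 'David'), ('Javi', 'John')]
```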





DATABASES (in the last 30 years)
●
    Rigid data schemas
    –   Have you ever tried to make a schema migration?


●
    Relational Algebra and SQL
    –   Terrible for highly interconnected data
    –   JOINs can take forever to finish (a bit overdramatized)
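A rough sketch of why deep traversals are awkward in SQL, again assuming sqlite3 and hypothetical person and knows tables: following k hops of a relationship needs one self-join per hop, so the query text itself grows with the depth of the question. The helper reachable_in is illustrative only and makes no claim about actual query performance.

```python
import sqlite3


def reachable_in(conn, start_id, hops):
    """Names reachable from start_id in exactly `hops` steps (hops >= 1).

    The SQL gets one extra self-join on `knows` for every additional hop.
    """
    joins = "\n".join(
        f"JOIN knows k{i} ON k{i}.person_id = k{i - 1}.friend_id"
        for i in range(1, hops)
    )
    sql = f"""
        SELECT DISTINCT p.name
        FROM knows k0
        {joins}
        JOIN person p ON p.id = k{hops - 1}.friend_id
        WHERE k0.person_id = ?
        ORDER BY p.name
    """
    return [name for (name,) in conn.execute(sql, (start_id,))]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE knows (person_id INTEGER, friend_id INTEGER);
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Ana"), (2, "Bo"), (3, "Cy"), (4, "Di")])
# A simple chain: Ana knows Bo, Bo knows Cy, Cy knows Di
conn.executemany("INSERT INTO knows VALUES (?, ?)", [(1, 2), (2, 3), (3, 4)])

for hops in (1, 2, 3):
    print(hops, reachable_in(conn, 1, hops))
```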




NoSQL, Not Only SQL
●
    Document
    –   MongoDB, CouchDB, etc.
●
    Key-value stores
    –   Redis, Riak, Voldemort, Dynamo, etc.
●
    Big Tables
    –   Cassandra, HBase, etc.
●
    Analytic
    –   Hadoop
●
    Graph
    –   Neo4j, OrientDB, HyperGraphDB, Titan, etc.
●
    Other
    –   Objectivity/DB, ZODB, etc.

DATABASES LANDSCAPE




                                          Source: 451Research, https://451research.com/report-long?icid=2289

WHO IS USING GRAPHS?
●
    Mozilla with Pancake and Pacer
    –   https://wiki.mozilla.org/Pancake &
        http://pangloss.github.com/pacer/
●
    Twitter with FlockDB
    –   https://github.com/twitter/flockdb
●
    Facebook with Open Graph
    –   https://developers.facebook.com/docs/opengraph/
●
    Google with Knowledge Graph
    –   http://www.google.ca/insidesearch/.../knowledge.html


WHY GRAPHS?
●
    Data is getting more and more connected
    –   From text documents, to wikis, to ontologies, to
        folksonomies, etc.


●
    And more semi-structured
    –   Think about the decentralization of content generation


●
    And more complex
    –   Social networks, semantic trending, etc.
                             Source: Neo Technology, http://www.slideshare.net/emileifrem/neo4j-the-benefits-of-graph-databases-oscon-2009

A FEW OF THE CURRENT USES
●
    Social Networking and Recommendations
●
    Network and Cloud Management
●
    Master Data Management
●
    Geospatial
●
    Bioinformatics
●
    Content Management and Security and Access
    Control


                                                           Source: Mashable, http://mashable.com/2012/09/26/graph-databases/

AND WHY ELSE?
●
    Because graphs are cool!




                               Leonhard Euler
WHAT IS A GRAPH?


●
    G = (V, E)
    Where
    –   G is a graph
    –   V is a set of vertices
    –   E is a set of edges



                                                                 Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
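In Python, the definition G = (V, E) maps directly onto two sets. A minimal sketch with made-up vertex names: undirected edges are stored as two-element frozensets so the order of the endpoints does not matter, and the final check is the handshake lemma (each edge adds one to the degree of both of its endpoints).

```python
# G = (V, E): V is a set of vertices, E a set of edges
V = {"a", "b", "c", "d"}
# Undirected edges as frozensets, so {a, b} and {b, a} are the same edge
E = {frozenset(pair) for pair in [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]}
G = (V, E)

# Every edge must join vertices that belong to V
assert all(edge <= V for edge in E)


def degree(v):
    """Number of edges touching vertex v."""
    return sum(1 for edge in E if v in edge)


degrees = {v: degree(v) for v in sorted(V)}
print(degrees)  # {'a': 2, 'b': 2, 'c': 3, 'd': 1}

# Handshake lemma: the degrees add up to twice the number of edges
assert sum(degrees.values()) == 2 * len(E)
```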


WHAT IS A GRAPH?
●
    G = (V, E)
    –   Graph, aka network, diagram, etc.
    –   Vertex, aka point, dot, node, element, etc.
    –   Edge, aka relationship, arc, line, link, etc.


●
    Basically, “a graph states that something is related
    to something else”
                                                              – Svetlana Sicular,
                                                    Research Director at Gartner
                                                            Source: Gartner, http://blogs.gartner.com/svetlana-sicular/think-graph/

TYPES OF GRAPH




Undirected                                            Digraph



                                             Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
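The difference between the two pictures is whether an edge is an unordered or an ordered pair. A small sketch with hypothetical vertices: adding the pair (b, a) changes nothing in the undirected graph but adds a new edge to the digraph, which then has to distinguish out-neighbours from in-neighbours.

```python
pairs = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "a")]

# Undirected: an edge is an unordered pair, so (a, b) and (b, a) collapse into one
undirected = {frozenset(p) for p in pairs}
# Digraph: an edge is an ordered pair (tail, head), so both directions are kept
directed = set(pairs)

print(len(undirected), len(directed))  # 3 4


def out_neighbours(v):
    """Vertices that v points to."""
    return sorted(head for tail, head in directed if tail == v)


def in_neighbours(v):
    """Vertices that point to v."""
    return sorted(tail for tail, head in directed if head == v)


print(out_neighbours("b"), in_neighbours("b"))  # ['a', 'c'] ['a']
```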

TYPES OF GRAPH




Multigraph                                      Hypergraph



                                             Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
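Both generalisations are easy to sketch with standard containers (the vertex names are hypothetical). A multigraph allows parallel edges, so edges are kept in a list and a Counter gives each edge's multiplicity. A hypergraph allows one edge to join any number of vertices, so each hyperedge is simply a set.

```python
from collections import Counter

# Multigraph: parallel edges are allowed, so edges live in a list, not a set
multi_edges = [("a", "b"), ("a", "b"), ("b", "c")]
multiplicity = Counter(frozenset(e) for e in multi_edges)
print(multiplicity[frozenset(("a", "b"))])  # 2

# Hypergraph: a hyperedge may join any number of vertices
hyperedges = [
    frozenset({"a", "b", "c"}),
    frozenset({"c", "d"}),
    frozenset({"b", "d", "e", "f"}),
]


def incident(v):
    """Hyperedges containing vertex v, each as a sorted list."""
    return [sorted(edge) for edge in hyperedges if v in edge]


print(incident("c"))  # [['a', 'b', 'c'], ['c', 'd']]
```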

SOME GRAPHS EVEN HAVE A NAME
●
    Complete graphs




          K3                                  K5                                             K8



                                                       Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
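A complete graph K_n joins every pair of its n vertices, so its edges are exactly the 2-combinations of the vertex set and there are n(n - 1)/2 of them. A short sketch with itertools.combinations checks this for the three graphs pictured.

```python
from itertools import combinations


def complete_graph(n):
    """Edges of K_n: every unordered pair of the vertices 0..n-1."""
    return list(combinations(range(n), 2))


for n in (3, 5, 8):
    edges = complete_graph(n)
    assert len(edges) == n * (n - 1) // 2
    print(f"K{n}: {len(edges)} edges")  # K3: 3, K5: 10, K8: 28
```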


SOME GRAPHS EVEN HAVE A NAME
●
    Stars




            The star graphs S3, S4, S5 and S6



                                                     Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
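A star joins one centre to a set of leaves. Conventions differ between authors; this sketch assumes S_k denotes K_{1,k} (k leaves, k + 1 vertices) and checks the defining degree pattern for S3 to S6: the centre touches every edge and each leaf touches exactly one.

```python
from collections import Counter


def star_graph(k):
    """Edges of S_k, taken here as K_{1,k}: centre 0 joined to leaves 1..k."""
    return [(0, leaf) for leaf in range(1, k + 1)]


for k in (3, 4, 5, 6):
    edges = star_graph(k)
    degrees = Counter(v for edge in edges for v in edge)
    # The centre has degree k, every leaf has degree 1
    assert degrees[0] == k
    assert all(degrees[leaf] == 1 for leaf in range(1, k + 1))
    print(f"S{k}: {k + 1} vertices, {len(edges)} edges")
```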

SOME GRAPHS EVEN HAVE A NAME
●
    Snarks




    Blanuša (second)                  Szekeres                                 Double star



                                                      Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs

THINGS CAN GET COMPLICATED...




       Local McLaughlin graph
                                          Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs

WAIT A SEC,





DON'T WORRY
●
    Just one more type: the Property Graph



[Figure: a small directed graph whose four nodes and four edges carry only numeric ids, 1 to 4]




THE PROPERTY GRAPH
●
    Directed, attributed and multi-relational
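[Figure: the same graph with properties: nodes Name: Javi (1), Name: John (2), Name: David (3) and a book (4) with Title: The Art of Computer Programming and Price: $135, linked by two Knows edges (Since: 2009 and Since: 1990) and two Likes edges]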
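The figure can be written down as plain Python data. This is a sketch, not a database API: node ids 1 to 4 follow the slide, but which node sits at each end of every edge (for instance, that Javi is the tail of both Knows edges) is an assumption read off the slide layout, and the price is stored as a bare number of dollars. Because edges carry labels, the graph is multi-relational and traversals filter on the label.

```python
# Hypothetical encoding of the slide's property graph; edge endpoints are
# an assumption read off the slide layout.
nodes = {
    1: {"name": "Javi"},
    2: {"name": "John"},
    3: {"name": "David"},
    4: {"title": "The Art of Computer Programming", "price": 135},  # dollars
}

# edge id -> (tail node id, label, head node id, properties)
edges = {
    1: (1, "Knows", 3, {"since": 1990}),
    2: (1, "Knows", 2, {"since": 2009}),
    3: (1, "Likes", 4, {}),
    4: (3, "Likes", 4, {}),
}


def neighbours(node_id, label):
    """Properties of the nodes reached from node_id through edges with this label."""
    return [nodes[head] for tail, lbl, head, _ in edges.values()
            if tail == node_id and lbl == label]


def who_likes(node_id):
    """Names of the nodes with an outgoing Likes edge to node_id."""
    return sorted(nodes[tail]["name"] for tail, lbl, head, _ in edges.values()
                  if head == node_id and lbl == "Likes")


print([n["name"] for n in neighbours(1, "Knows")])  # ['David', 'John']
print(who_likes(4))                                 # ['David', 'Javi']
```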


THE PROPERTY GRAPH
●
    A set of nodes, and each node has:
    –   A unique identifier.
    –   A set of outgoing edges.
    –   A set of incoming edges.
    –   A collection of properties defined by a map from key to value.
●
    A set of relationships, and each relationship has:
    –   A unique identifier.
    –   An outgoing tail vertex.
    –   An incoming head vertex.
    –   A collection of properties defined by a map from key to value.
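The definition above translates almost field by field into Python dataclasses. A minimal in-memory sketch: the classes Node, Relationship and PropertyGraph are hypothetical (not TinkerPop's Blueprints API), all ids come from one shared counter, and relationships also get a label field for the relationship type that Blueprints and Neo4j attach to edges.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)
    out_edges: set = field(default_factory=set)  # ids of relationships leaving this node
    in_edges: set = field(default_factory=set)   # ids of relationships arriving here


@dataclass
class Relationship:
    id: int
    tail: int   # id of the outgoing tail vertex
    head: int   # id of the incoming head vertex
    label: str  # relationship type, e.g. "Knows"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Minimal in-memory property graph; every id comes from one shared counter."""

    def __init__(self):
        self._ids = count(1)
        self.nodes = {}          # id -> Node
        self.relationships = {}  # id -> Relationship

    def add_node(self, **properties):
        node = Node(next(self._ids), properties)
        self.nodes[node.id] = node
        return node

    def add_relationship(self, tail, label, head, **properties):
        rel = Relationship(next(self._ids), tail.id, head.id, label, properties)
        self.relationships[rel.id] = rel
        # Register the edge on both endpoints: outgoing for the tail, incoming for the head
        tail.out_edges.add(rel.id)
        head.in_edges.add(rel.id)
        return rel


g = PropertyGraph()
javi = g.add_node(name="Javi")
david = g.add_node(name="David")
knows = g.add_relationship(javi, "Knows", david, since=1990)
print(javi.out_edges, david.in_edges)  # {3} {3}
```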

                                                 Source: TinkerPop, https://github.com/tinkerpop/gremlin/wiki/Defining-a-Property-Graph

IN SHORT
●
    A Property Graph is composed of:
    –   A set of nodes
    –   A set of relationships
    –   Properties and IDs on both


●
    Sometimes, nodes and relationships can be typed
    –   In Blueprints and Neo4j, a label denotes the type of
        relationship between its two nodes.




GRAPH DATABASES
●
    A graph database uses graph structures with nodes,
    edges, and properties to represent and store data
    –   ...but there is no easy way to visualize this




                                                                Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database



HOW IT LOOKS IN PYTHON?
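# Let's create a graph (assumes the neo4jrestclient library and a Neo4j server on localhost)
>>> from neo4jrestclient.client import GraphDatabase
>>> g = GraphDatabase("http://localhost:7474/db/data/")
>>> silvester = g.nodes.create(name="Silvester")
>>> arnold = g.nodes.create(name="Arnold")

>>> punch = arnold.punches(silvester)  # a "punches" relationship from Arnold to Silvester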




[Figure: Name: Arnold --punches--> Name: Silvester]
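How can arnold.punches(silvester) work when no punches method was ever defined? One plausible mechanism is Python's __getattr__ hook, which is consulted only for attributes that normal lookup cannot find. The sketch below is a hypothetical in-memory stand-in (Graph, NodeManager and Node are illustrative names), not the implementation of neo4jrestclient or of any Neo4j client; it only shows how an unknown attribute name can become a relationship type.

```python
class Node:
    """A node whose unknown attributes act as relationship constructors (hypothetical)."""

    def __init__(self, graph, **properties):
        self._graph = graph
        self.properties = properties

    def __getattr__(self, rel_type):
        # Only reached when normal lookup fails, e.g. for arnold.punches
        if rel_type.startswith("_"):
            raise AttributeError(rel_type)

        def create_relationship(other, **properties):
            rel = (self, rel_type, other, properties)  # (tail, type, head, properties)
            self._graph.relationships.append(rel)
            return rel

        return create_relationship


class NodeManager:
    """Provides the g.nodes.create(...) entry point."""

    def __init__(self, graph):
        self._graph = graph

    def create(self, **properties):
        node = Node(self._graph, **properties)
        self._graph.node_list.append(node)
        return node


class Graph:
    def __init__(self):
        self.node_list = []
        self.relationships = []
        self.nodes = NodeManager(self)


g = Graph()
silvester = g.nodes.create(name="Silvester")
arnold = g.nodes.create(name="Arnold")
punch = arnold.punches(silvester)
print(punch[0].properties["name"], punch[1], punch[2].properties["name"])  # Arnold punches Silvester
```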



HOW IT LOOKS IN PYTHON?
  >>> chuck = g.nodes.create(name="Chuck")

[Diagram: nodes "Silvester" and "Arnold" linked by a "punches" relationship, plus a new, unconnected node "Chuck"]
HOW IT LOOKS IN PYTHON?
  >>> chuck.dropkicks(silvester)
  >>> chuck.dropkicks(arnold)

[Diagram: "Silvester" and "Arnold" linked by "punches"; "Chuck" now has a "dropkicks" relationship to each of them]
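The `chuck.dropkicks(silvester)` calls read as if every relationship type were a method on the node. A minimal in-memory sketch of how that style of dispatch can work in plain Python 3; `Graph`, `Node` and `NodeCollection` are hypothetical stand-ins, not neo4j-rest-client's actual classes, and nothing is persisted:

```python
class Node:
    """A node with properties; unknown attributes act as relationship creators."""

    def __init__(self, graph, **properties):
        self._graph = graph
        self.properties = dict(properties)

    def __getattr__(self, rel_type):
        # Called only when normal attribute lookup fails (e.g. chuck.dropkicks).
        if rel_type.startswith("_"):
            raise AttributeError(rel_type)

        def create_relationship(other, **properties):
            rel = (self, rel_type, other, dict(properties))
            self._graph.relationships.append(rel)
            return rel

        return create_relationship

    def __repr__(self):
        return "Node(%r)" % self.properties.get("name")


class NodeCollection(list):
    """Hypothetical stand-in for the g.nodes manager seen on the slides."""

    def __init__(self, graph):
        super().__init__()
        self._graph = graph

    def create(self, **properties):
        node = Node(self._graph, **properties)
        self.append(node)
        return node


class Graph:
    def __init__(self):
        self.nodes = NodeCollection(self)
        self.relationships = []  # (start, type, end, properties) tuples


g = Graph()
silvester = g.nodes.create(name="Silvester")
arnold = g.nodes.create(name="Arnold")
chuck = g.nodes.create(name="Chuck")
chuck.dropkicks(silvester)
chuck.dropkicks(arnold)
print([(s.properties["name"], t, e.properties["name"]) for s, t, e, _ in g.relationships])
# [('Chuck', 'dropkicks', 'Silvester'), ('Chuck', 'dropkicks', 'Arnold')]
```

The guard on names starting with `_` keeps Python's own dunder lookups (copying, pickling) from accidentally creating relationships.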
GRAPH DATABASES LANDSCAPE
  Database       Data Model          Query Method                License            Python Binding
  Neo4j          Property Graph      Cypher, Gremlin, Traversal  GPL, AGPL          Native, Blueprints, REST
  OrientDB       Property Graph      Gremlin, Traversal          Apache 2           Blueprints
  HyperGraphDB   Typed Hypergraph    HGQuery, Traversal          LGPL               Nope
  DEX            Property Graph      Traversal                   Commercial         Blueprints
  Titan          Property Graph      Gremlin                     Apache 2           Blueprints
  InfoGrid       Property Graph      Traversal                   AGPL, Commercial   Nope
  InfiniteGraph  Property Graph      Gremlin                     Commercial         Nope

Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
GRAPH DATABASES LANDSCAPE
And more:
–   AffinityDB
–   YarcData uRiKA
–   Apache Giraph
–   Cassovary
–   StigDB
–   NuvolaBase
–   Pegasus
–   Microsoft Trinity
–   Sherlock
–   And so on

GREMLIN, BLUEPRINTS, WAT?
Let me introduce you to the TinkerPop stack

Source: TinkerPop, http://www.tinkerpop.com/
BLUEPRINTS AND REXSTER
● Blueprints is a property graph model interface
● Rexster is a server that exposes any Blueprints graph through REST

Source: TinkerPop, http://www.tinkerpop.com/
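Because Blueprints is an interface, client code written against it should not care which database sits underneath. In Python that idea maps naturally onto an abstract base class. The sketch below is hypothetical: the `Graph` ABC borrows method names such as `addVertex` and `addEdge` (which PyBlueprints also uses), and `MemoryGraph` is a toy backend standing in for a real database.

```python
from abc import ABC, abstractmethod
import itertools


class Graph(ABC):
    """Hypothetical property-graph interface, loosely modelled on Blueprints."""

    @abstractmethod
    def addVertex(self): ...

    @abstractmethod
    def getVertex(self, vertex_id): ...

    @abstractmethod
    def addEdge(self, out_vertex, in_vertex, label): ...


class MemoryGraph(Graph):
    """One possible backend: everything kept in Python containers."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.vertices = {}  # vertex id -> property dict
        self.edges = []     # (out_vertex, label, in_vertex)

    def addVertex(self):
        vertex_id = next(self._ids)
        self.vertices[vertex_id] = {}
        return vertex_id

    def getVertex(self, vertex_id):
        return self.vertices.get(vertex_id)

    def addEdge(self, out_vertex, in_vertex, label):
        self.edges.append((out_vertex, label, in_vertex))


def connect_friends(graph: Graph):
    # Depends only on the interface, so any conforming backend would do.
    alice, bob = graph.addVertex(), graph.addVertex()
    graph.addEdge(alice, bob, "knows")
    return alice, bob


g = MemoryGraph()
print(connect_friends(g), g.edges)  # (1, 2) [(1, 'knows', 2)]
```

Swapping `MemoryGraph` for another subclass would leave `connect_friends` untouched, which is the point Rexster and the Python bindings build on.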
AND WHAT ABOUT PYTHON?
● Options to connect to a Blueprints graph database

[Diagram: OrientDB, Neo4j, DEX and Titan behind the Blueprints API; Rexster exposes them over REST to the Python clients bulbflow, python-blueprints and pyblueprints]
BULBFLOW
● Create
    >>> alice = g.vertices.create(name="Alice")
    >>> bob = g.vertices.create(name="Bob")
    >>> g.edges.create(alice, "knows", bob)
● Get
    >>> alice = g.vertices.get(1)
    >>> bob = g.vertices.get(2)
● Update
    >>> alice.age = 21
    >>> alice.save()
● Delete
    >>> alice.delete()

Source: Bulbflow, http://bulbflow.com/docs/

PYBLUEPRINTS
● Create
    >>> alice = g.addVertex()
    >>> alice.setProperty("name", "Alice")
    >>> bob = g.addVertex()
    >>> bob.setProperty("name", "Bob")
    >>> g.addEdge(alice, bob, "knows")
● Get
    >>> alice = g.getVertex(1)
    >>> bob = g.getVertex(2)
● Update
    >>> alice.setProperty("age", 21)
● Delete
    >>> g.removeVertex(alice.getId())

Source: PyBlueprints, https://github.com/escalant3/pyblueprints
BUT NEO4J HAS ITS OWN CLIENTS!
● REST clients for Neo4j

[Diagram: the previous stack, now also showing the Neo4j-specific REST clients neo4j-rest-client and py2neo next to bulbflow, python-blueprints and pyblueprints]
HOW CAN I LOOK UP ELEMENTS?
● An index is a data structure that supports the fast lookup of elements by some key/value pair

Source: TinkerPop, https://github.com/tinkerpop/blueprints/wiki/Graph-Indices
INDICES
● In the Python bindings, indices behave much like dicts
    –   bulbflow
    # bulbflow creates automatic indices, which make basic lookups easier
    >>> nodes = g.vertices.index.lookup(name="Alice")
    >>> for node in nodes:
    ...:    print node

    –   PyBlueprints
    >>> index = g.getIndex("names", "vertex")
    >>> index.put("name", alice.getProperty("name"), alice)
    >>> nodes = index.get("name", "Alice")
    >>> for node in nodes:
    ...:    print node
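Conceptually, an exact-match key/value index is a dict of dicts: key, then value, then the elements stored under that pair. A minimal in-memory sketch, assuming a hypothetical `Index` class that only borrows the `put`/`get` method names from the PyBlueprints example (real indices are persisted by the database):

```python
from collections import defaultdict


class Index:
    """Hypothetical exact-match index: key -> value -> list of elements."""

    def __init__(self):
        self._data = defaultdict(lambda: defaultdict(list))

    def put(self, key, value, element):
        self._data[key][value].append(element)

    def get(self, key, value):
        # Use .get() so a miss does not create empty entries as a side effect.
        return list(self._data.get(key, {}).get(value, []))


names = Index()
names.put("name", "Alice", "vertex-1")
names.put("name", "Bob", "vertex-2")
print(names.get("name", "Alice"))  # ['vertex-1']
print(names.get("name", "Carol"))  # []
```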

INDICES
● Some graph databases provide full-text queries
    –   bulbflow
    >>> nodes = g.vertices.index.query(name="ali*")
    >>> for node in nodes:
    ...:    print node

    –   PyBlueprints
    >>> index = g.getIndex("names", "vertex")
    >>> nodes = index.query("name", "ali*")
    >>> for node in nodes:
    ...:    print node
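Real backends hand wildcard queries such as `ali*` to a full-text engine (Neo4j's indices of this era are backed by Lucene). As a rough stand-in only, a linear scan with `fnmatch` and case folding returns the same kind of answer on small data; the dict layout of `index` is an assumption of this sketch:

```python
from fnmatch import fnmatchcase


def wildcard_query(index, key, pattern):
    """Return elements whose value for `key` matches a wildcard pattern.

    `index` is a plain dict: key -> {value: [elements]} (hypothetical layout).
    Matching is case-insensitive; '*' matches any run, '?' one character.
    """
    pattern = pattern.lower()
    results = []
    for value, elements in index.get(key, {}).items():
        if fnmatchcase(value.lower(), pattern):
            results.extend(elements)
    return results


index = {"name": {"Alice": ["v1"], "Alicia": ["v3"], "Bob": ["v2"]}}
print(wildcard_query(index, "name", "ali*"))  # ['v1', 'v3']
```

Unlike a real inverted index, this scans every value, so it only illustrates the semantics, not the performance.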
...MORE COMPLEX SEARCHES?

“Without traversals [FlockDB] is only a persisted graph. But not a graph database.”
                                                                   – Alex Popescu

Source: myNoSQL, http://nosql.mypopescu.com/
LET'S TRAVERSE THE GRAPH!
● “A graph traversal is the problem of visiting all the nodes in a graph in a particular manner”
    –   A* search
    –   Alpha-beta pruning
    –   Breadth-First Search (BFS)
    –   Depth-First Search (DFS)
    –   Dijkstra's algorithm
    –   Floyd-Warshall's algorithm
    –   Etc.

Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_traversal
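Breadth-first and depth-first search from the list above differ mainly in the frontier data structure: a FIFO queue versus a LIFO stack. A compact sketch over a hypothetical adjacency dict of "knows" relationships:

```python
from collections import deque


def bfs(graph, start):
    """Breadth-first: visit nodes level by level using a FIFO queue."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


def dfs(graph, start):
    """Depth-first: follow one branch as far as possible using a LIFO stack."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so neighbours are explored in their listed order.
        stack.extend(reversed(graph.get(node, [])))
    return order


knows = {"Alice": ["Bob", "Carol"], "Bob": ["Dave"], "Carol": ["Dave"], "Dave": []}
print(bfs(knows, "Alice"))  # ['Alice', 'Bob', 'Carol', 'Dave']
print(dfs(knows, "Alice"))  # ['Alice', 'Bob', 'Dave', 'Carol']
```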
NEO4J TRAVERSAL API
● Python-embedded (native Neo4j Python binding)
    >>> traverser = gdb.traversal().relationships('knows').traverse(alice)

    # The graph is traversed as you loop through the result
    >>> for node in traverser.nodes:
    ...:    print node

● neo4j-rest-client
    >>> traverser = alice.traverse(types=[client.All.knows])

    # The graph is traversed as you loop through the result
    >>> for node in traverser:
    ...:    print node
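Both bindings traverse lazily, doing the work as you loop over the result. A Python generator captures that behaviour. The sketch below is hypothetical (a `traverse` function over `(start, type, end)` tuples, not either binding's implementation); it follows only the requested relationship types, in either direction, breadth-first:

```python
from collections import deque


def traverse(relationships, start, types):
    """Lazily yield nodes reachable from `start`, breadth-first,
    following only relationships whose type is in `types`, in either direction.
    """
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        yield node  # execution pauses here until the caller asks for more
        for a, rel_type, b in relationships:
            if rel_type not in types or node not in (a, b):
                continue
            other = b if node == a else a
            if other not in seen:
                seen.add(other)
                queue.append(other)


rels = [("Alice", "knows", "Bob"), ("Carol", "knows", "Alice"), ("Bob", "likes", "Dave")]
traverser = traverse(rels, "Alice", types={"knows"})
print(next(traverser))  # Alice (her neighbours have not been scanned yet)
print(list(traverser))  # ['Bob', 'Carol']
```

Dave is never reached because the only path to him uses a "likes" relationship.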

BLUEPRINTS GREMLIN
● Gremlin is a domain-specific language for traversing property graphs
    –   Queries are defined in terms of the graph structure
    >>> gremlin = g.extensions.GremlinPlugin.execute_script
    >>> params = {'alice_id': alice.id}
    >>> script = "g.V(alice_id).out('knows')"
    >>> node = gremlin(script=script, params=params)
    >>> node == bob

Source: TinkerPop Gremlin, https://github.com/tinkerpop/gremlin/wiki
Source: Marko Rodríguez, The Graph Traversal Programming Pattern, http://www.slideshare.net/slidarko/graph-windycitydb2010
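Gremlin queries are chains of steps, each transforming the current set of vertices. The toy pipeline below imitates `g.V(...).out('knows')` in plain Python to show the idea; `ToyGraph` and `Traversal` are hypothetical toy classes, not Gremlin itself or any Python client API:

```python
class Traversal:
    """Toy Gremlin-like pipeline: each step maps the current list of vertices."""

    def __init__(self, edges, current):
        self._edges = edges      # list of (out_vertex, label, in_vertex)
        self._current = current  # vertices at this step of the pipeline

    def out(self, label):
        # Follow outgoing edges with the given label from every current vertex.
        return Traversal(
            self._edges,
            [dst for v in self._current
             for src, lbl, dst in self._edges if src == v and lbl == label],
        )

    def toList(self):
        return list(self._current)


class ToyGraph:
    def __init__(self, edges):
        self.edges = edges

    def V(self, *ids):
        # Only explicit start vertices are supported in this toy; real Gremlin's
        # V() without ids starts from every vertex.
        return Traversal(self.edges, list(ids))


g = ToyGraph([("alice", "knows", "bob"), ("bob", "knows", "carol")])
print(g.V("alice").out("knows").toList())               # ['bob']
print(g.V("alice").out("knows").out("knows").toList())  # ['carol']
```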
NEO4J CYPHER QUERY LANGUAGE
● Declarative graph query language
    –   Expressive and efficient querying
    –   Focused on expressing what to retrieve from a graph
    –   Inspired by SQL
    –   Pattern matching expressions from SPARQL

[Diagram: node 1 connected to node 2 by a relationship of type "label"]

    (1) -[:label]- (2)

    START n=node(1), m=node(2) MATCH n-[r:label]-m RETURN r

Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
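The pattern `n-[r:label]-m` has no arrow, so it matches relationships of type `label` between the two bound nodes in either direction. A stdlib sketch of just that matching rule (not how Neo4j actually evaluates Cypher), with relationships as hypothetical dicts:

```python
def match_undirected(relationships, n, m, rel_type):
    """Return every relationship r satisfying n-[r:rel_type]-m, ignoring direction."""
    return [
        r for r in relationships
        if r["type"] == rel_type and {r["start"], r["end"]} == {n, m}
    ]


rels = [
    {"id": 10, "start": 1, "type": "label", "end": 2},
    {"id": 11, "start": 2, "type": "label", "end": 1},
    {"id": 12, "start": 1, "type": "other", "end": 2},
]
print([r["id"] for r in match_undirected(rels, 1, 2, "label")])  # [10, 11]
```

A directed pattern such as `n-[r:label]->m` would instead require `r["start"] == n and r["end"] == m`.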

PY2NEO CYPHER HELPERS
● Get or create elements
    >>> g.get_or_create_relationships(
    ...:    (bob, "WORKS WITH", carol, {"since": 2004}),
    ...:    (alice, "DISLIKES!", carol, {"reason": "youth"}),
    ...:    (bob, "WORKS WITH", dave, {"since": 2009}), )
● Get counts
    >>> nodes_count = g.get_node_count()
    >>> rels_count = g.get_relationship_count()
● Delete
    >>> g.delete()

Source: py2neo, http://py2neo.org/
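The value of a get-or-create helper is idempotence: running the same call twice must not duplicate relationships. A minimal in-memory sketch of that contract; the function name mirrors py2neo's, but the dict storage and the rule that properties are applied only on first creation are assumptions of this sketch, not py2neo's documented behaviour:

```python
def get_or_create_relationships(store, *abstracts):
    """Return one (key, properties) pair per (start, type, end, properties) tuple.

    `store` is a dict keyed by (start, type, end). A relationship is created
    only if that key is absent; its properties are set on creation only
    (an assumption of this sketch).
    """
    results = []
    for start, rel_type, end, properties in abstracts:
        key = (start, rel_type, end)
        if key not in store:
            store[key] = dict(properties)
        results.append((key, store[key]))
    return results


store = {}
get_or_create_relationships(store, ("bob", "WORKS WITH", "carol", {"since": 2004}))
get_or_create_relationships(
    store,
    ("bob", "WORKS WITH", "carol", {"since": 2004}),  # already exists: reused
    ("bob", "WORKS WITH", "dave", {"since": 2009}),   # new: created
)
print(len(store))  # 2
```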
NEO4J-REST-CLIENT CYPHER HELPERS
● Query casting
    >>> from neo4jrestclient.client import Node, Relationship
    >>> q = ("""start n=node(*) match n-[r:punches]-() """
    ...:     """return n, n.name, r, r.since""")
    >>> results = g.query(q, returns=(Node, unicode, Relationship, int))
● Complex filtering
    from neo4jrestclient.query import Q

    lookups = (
        Q("name", exact="Arnold") &
        (Q("surname", istartswith="swar") &
         ~Q("surname", iendswith="chenegger"))
    )
    arnolds = g.nodes.filter(lookups)

Source: neo4j-rest-client, https://github.com/versae/neo4j-rest-client
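The `Q` objects compose with `&` and `~` into a single predicate, in the style of Django's Q objects. A hypothetical, pure-Python sketch of how such composition can work; it is not neo4j-rest-client's implementation, supports only the lookups used on the slide (`exact`, `istartswith`, `iendswith`), and filters plain dicts rather than nodes on a server. The sample people are made up:

```python
class Q:
    """Hypothetical minimal Q: a predicate over a dict of node properties."""

    LOOKUPS = {
        "exact": lambda value, arg: value == arg,
        "istartswith": lambda value, arg: value.lower().startswith(arg.lower()),
        "iendswith": lambda value, arg: value.lower().endswith(arg.lower()),
    }

    def __init__(self, prop=None, predicate=None, **lookup):
        if predicate is not None:
            self._predicate = predicate
        else:
            (name, arg), = lookup.items()  # exactly one lookup per Q
            test = self.LOOKUPS[name]
            self._predicate = lambda props: prop in props and test(props[prop], arg)

    def __call__(self, props):
        return self._predicate(props)

    def __and__(self, other):
        return Q(predicate=lambda p: self(p) and other(p))

    def __or__(self, other):
        return Q(predicate=lambda p: self(p) or other(p))

    def __invert__(self):
        return Q(predicate=lambda p: not self(p))


lookups = (
    Q("name", exact="Arnold") &
    (Q("surname", istartswith="swar") & ~Q("surname", iendswith="chenegger"))
)
people = [
    {"name": "Arnold", "surname": "Swarchenegger"},  # excluded by ~iendswith
    {"name": "Arnold", "surname": "Swartz"},         # matches
    {"name": "Arnie", "surname": "Swartz"},          # excluded by exact name
]
arnolds = [p for p in people if lookups(p)]
print(arnolds)  # [{'name': 'Arnold', 'surname': 'Swartz'}]
```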
LET'S PLAY!
● Deploy Neo4j on Heroku or Amazon
● Use one of the available clients
NEO4J HEROKU ADD-ON
● Create a Heroku app and add the Neo4j add-on
    $ heroku apps:create pyconca
    $ heroku addons:add neo4j --app pyconca
    $ xdg-open `heroku config:get NEO4J_URL --app pyconca`
    $ export NEO4J_URL=`heroku config:get NEO4J_URL --app pyconca`
● Create a virtualenv with neo4j-rest-client
    $ mkvirtualenv --no-site-packages pyconca
    $ workon pyconca
    $ pip install ipython neo4jrestclient
    $ ipython

NEO4J HEROKU ADD-ON
● Run IPython and that's it!
    >>> import os
    >>> NEO4J_URL = os.environ["NEO4J_URL"]
    >>> from neo4jrestclient import client
    >>> gdb = client.GraphDatabase(NEO4J_URL + "/db/data")
    >>> gdb.url
THANKS!
     Questions?
    Javier de la Rosa
          @versae
     The CulturePlex Lab
Western University, London, ON

    PyCon Canada 2012
APPENDIX: DATA MODELS
● neo4django
    –   https://github.com/scholrly/neo4django
● neomodel
    –   https://github.com/robinedwards/neomodel
● bulbflow models
    –   http://bulbflow.com/quickstart/#models

APPENDIX: VISUALIZE YOUR GRAPH
●
    Export somehow to .gexf for Gephi
    –   http://gephi.org/
●
    Use D3.js
    –   http://d3js.org/
●
    Use sigma.js
    –   http://sigmajs.org/
●
    Take a look at Max De Marzi's work
    –   http://maxdemarzi.com/category/visualization/
●
    Use Sylva (for newbies)
    –   http://www.sylvadb.com/
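Gephi reads the GEXF XML format, so one way to handle the "export somehow" step is to write the file yourself with the Python standard library. The sketch below assumes the graph is small enough to hold in memory as a dict of node labels plus a list of edge pairs. The function names `build_gexf` and `write_gexf` are hypothetical helpers, not part of any library, and only the minimal GEXF 1.2 elements (nodes, edges, labels) are emitted. If NetworkX is installed, its own `write_gexf` function covers the same ground and also handles attributes.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the GEXF 1.2 primer. It is set as a plain attribute so the
# element tags stay unprefixed (<gexf>, <node>, <edge>) in the written file.
GEXF_NS = "http://www.gexf.net/1.2draft"


def build_gexf(nodes, edges, directed=True):
    """Build a GEXF 1.2 element tree for a simple labelled graph.

    nodes: dict mapping a node id to its label (ids are written as strings).
    edges: iterable of (source_id, target_id) pairs referring to those ids.
    """
    root = ET.Element("gexf", xmlns=GEXF_NS, version="1.2")
    graph = ET.SubElement(
        root,
        "graph",
        mode="static",
        defaultedgetype="directed" if directed else "undirected",
    )
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes.items():
        ET.SubElement(nodes_el, "node", id=str(node_id), label=str(label))
    edges_el = ET.SubElement(graph, "edges")
    for i, (source, target) in enumerate(edges):
        # Reject dangling edges instead of writing a file Gephi would complain about.
        if source not in nodes or target not in nodes:
            raise KeyError(f"edge {source!r}->{target!r} references an unknown node")
        ET.SubElement(
            edges_el, "edge", id=str(i), source=str(source), target=str(target)
        )
    return root


def write_gexf(nodes, edges, path, directed=True):
    """Write the graph to `path` (a filename or binary file object) as UTF-8 XML."""
    tree = ET.ElementTree(build_gexf(nodes, edges, directed))
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

For example, `write_gexf({"javier": "Javier", "lab": "CulturePlex Lab"}, [("javier", "lab")], "culture.gexf")` writes a two-node graph that Gephi should open via File > Open. When the graph lives in a database, the `nodes` dict and `edges` list would come from a query against it.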

Graph Databases in Python (PyCon Canada 2012)

  • 1. GRAPH DATABASES IN PYTHON Javier de la Rosa @versae The CulturePlex Lab Western University, London, ON PyCon Canada 2012
  • 2. WHO I AM ● Javier de la Rosa ● versae ● versae ● Computer Scientist and Humanist ● CulturePlex Lab ● CulturePlex Graph Databases in Python, Javier de la Rosa, PyCon Canada, 2012
  • 3. FIRST OF ALL “You do not really understand something unless you can explain it to your grandmother” – (Frequently attributed to) Richard Feynman
  • 4. DATABASES (in the last 30 years) ● Data in tables, rows and columns ● Pretty basic mechanism to make connections: – Primary keys, Foreign keys, and... that's all ● Relational, ahem, really?
  • 5. DATABASES (in the last 30 years) ● Rigid data schemas – Have you ever tried to make a schema migration? ● Relational Algebra and SQL – A poor fit for highly interconnected data: every hop between related rows is another JOIN – JOINs can take ages to finish (a bit overdramatized)
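To make the JOIN point concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `person` and `knows` tables and their rows are hypothetical. A two-hop "friends of friends" question already needs the link table joined with itself, and each further hop adds one more self-join.

```python
import sqlite3

# Hypothetical schema: people plus a many-to-many "knows" link table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE knows (src INTEGER REFERENCES person(id),
                        dst INTEGER REFERENCES person(id));
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
    INSERT INTO knows VALUES (1, 2), (2, 3), (3, 4);
""")

# Two hops: "knows" is joined with itself once (k1 -> k2).
# A three-hop query would need a third copy of "knows", and so on.
FRIENDS_OF_FRIENDS = """
    SELECT DISTINCT p.name
    FROM knows k1
    JOIN knows k2 ON k2.src = k1.dst
    JOIN person p ON p.id = k2.dst
    WHERE k1.src = ?
"""

def friends_of_friends(person_id):
    """Names of people exactly two 'knows' hops away from person_id."""
    return [row[0] for row in conn.execute(FRIENDS_OF_FRIENDS, (person_id,))]

print(friends_of_friends(1))  # ['Carol']
```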
  • 6. NoSQL, Not Only SQL ● Document – MongoDB, CouchDB, etc. ● Key-value stores – Redis, Riak, Voldemort, Dynamo, etc. ● Big Tables – Cassandra, HBase, etc. ● Analytic – Hadoop ● Graph – Neo4j, OrientDB, HyperGraphDB, Titan, etc. ● Other – Objectivity/DB, ZODB, etc.
  • 7. DATABASES LANDSCAPE Source: 451Research, https://451research.com/report-long?icid=2289
  • 8. WHO IS USING GRAPHS? ● Mozilla with Pancake and Pacer – https://wiki.mozilla.org/Pancake & http://pangloss.github.com/pacer/ ● Twitter with FlockDB – https://github.com/twitter/flockdb ● Facebook with Open Graph – https://developers.facebook.com/docs/opengraph/ ● Google with Knowledge Graph – http://www.google.ca/insidesearch/.../knowledge.html
  • 9. WHY GRAPHS? ● Data is getting more and more connected – From text documents, to wikis, to ontologies, to folksonomies, etc. ● And more semi-structured – Think about the decentralization of content generation ● And more complex – Social networks, semantic trending, etc. Source: Neo Technology, http://www.slideshare.net/emileifrem/neo4j-the-benefits-of-graph-databases-oscon-2009
  • 10. A FEW OF THE CURRENT USES ● Social Networking and Recommendations ● Network and Cloud Management ● Master Data Management ● Geospatial ● Bioinformatics ● Content Management, Security and Access Control Source: Mashable, http://mashable.com/2012/09/26/graph-databases/
  • 11. AND WHY ELSE? ● Because graphs are cool! Leonhard Euler
  • 12. WHAT IS A GRAPH? ● G = (V, E) where – G is a graph – V is a set of vertices – E is a set of edges Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
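The definition G = (V, E) translates almost literally into Python sets. A minimal sketch for an undirected graph, with made-up vertex names, storing each edge as a `frozenset` of its two endpoints; the `neighbours` and `degree` helpers are illustrative only:

```python
# G = (V, E): V is a set of vertices, E is a set of edges.
# In an undirected graph an edge is an unordered pair, so a frozenset fits.
V = {"a", "b", "c", "d"}
E = {frozenset(pair) for pair in [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]}
G = (V, E)

def neighbours(graph, v):
    """Vertices that share an edge with v."""
    _, edges = graph
    return {u for e in edges if v in e for u in e if u != v}

def degree(graph, v):
    """Number of edges incident to v."""
    _, edges = graph
    return sum(1 for e in edges if v in e)

print(sorted(neighbours(G, "c")))  # ['a', 'b', 'd']
print(degree(G, "d"))              # 1
```

Summing the degrees of all vertices gives twice the number of edges, since every edge touches exactly two vertices.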
  • 13. WHAT IS A GRAPH? ● G = (V, E) – Graph, aka network, diagram, etc. – Vertex, aka point, dot, node, element, etc. – Edge, aka relationship, arc, line, link, etc. ● Basically, “a graph states that something is related to something else” – Svetlana Sicular, Research Director at Gartner Source: Gartner, http://blogs.gartner.com/svetlana-sicular/think-graph/
  • 14. TYPES OF GRAPH Undirected Digraph Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
  • 15. TYPES OF GRAPH Multigraph Hypergraph Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
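The four types differ mainly in what counts as an edge. One possible encoding in plain Python, with made-up vertices (an illustration, not how any particular database stores them):

```python
from collections import Counter

# Undirected: an edge is an unordered pair, so (a, b) and (b, a) collapse into one.
undirected = {frozenset(("a", "b")), frozenset(("b", "a"))}

# Digraph: an edge is an ordered pair (tail, head), so direction matters.
digraph = {("a", "b"), ("b", "a")}

# Multigraph: the same pair may occur several times (parallel edges),
# so we count occurrences instead of storing a plain set.
multigraph = Counter([frozenset(("a", "b")), frozenset(("a", "b")), frozenset(("b", "c"))])

# Hypergraph: a (hyper)edge may join any number of vertices.
hypergraph = [frozenset(("a", "b", "c")), frozenset(("c", "d"))]

print(len(undirected))                      # 1
print(len(digraph))                         # 2
print(multigraph[frozenset(("a", "b"))])    # 2
print([len(edge) for edge in hypergraph])   # [3, 2]
```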
  • 16. SOME GRAPHS EVEN HAVE A NAME ● Complete graphs: K3, K5, K8 Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
  • 17. SOME GRAPHS EVEN HAVE A NAME ● Stars: the star graphs S3, S4, S5 and S6 Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
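Both families can be generated in a few lines. This sketch assumes the convention used on Wikipedia, where the star S_k has one centre and k leaves (so k + 1 vertices); the complete graph K_n always has n(n-1)/2 edges, so K3, K5 and K8 have 3, 10 and 28 edges.

```python
from itertools import combinations

def complete_graph(n):
    """K_n: every pair of the n vertices is joined by an edge."""
    vertices = set(range(n))
    edges = {frozenset(pair) for pair in combinations(vertices, 2)}
    return vertices, edges

def star_graph(k):
    """S_k (assumed convention): centre vertex 0 joined to leaves 1..k."""
    vertices = set(range(k + 1))
    edges = {frozenset((0, leaf)) for leaf in range(1, k + 1)}
    return vertices, edges

for n in (3, 5, 8):
    _, edges = complete_graph(n)
    assert len(edges) == n * (n - 1) // 2
    print(f"K{n}: {len(edges)} edges")  # prints K3: 3 edges, K5: 10 edges, K8: 28 edges

_, star_edges = star_graph(6)
print(len(star_edges))  # 6
```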
  • 18. SOME GRAPHS EVEN HAVE A NAME ● Snarks: Blanuša (second), Szekeres, Double star Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
  • 19. THINGS CAN GET COMPLICATED... Local McLaughlin graph Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
  • 20. WAIT A SEC,
  • 21. DON'T WORRY ● Just one more type: the Property Graph
  • 22. THE PROPERTY GRAPH ● Directed, attributed and multi-relational [Figure: example property graph with person nodes (Name: Javi, Name: David, Name: John), two Knows edges (Since: 2009, Since: 1990), two Likes edges, and a book node (Title: The Art of Computer Programming, Price: $135)]
  • 23. THE PROPERTY GRAPH ● A set of nodes, and each node has: – A unique identifier. – A set of outgoing edges. – A set of incoming edges. – A collection of properties defined by a map from key to value. ● A set of relationships, and each relationship has: – A unique identifier. – An outgoing tail vertex. – An incoming head vertex. – A collection of properties defined by a map from key to value. Source: TinkerPop, https://github.com/tinkerpop/gremlin/wiki/Defining-a-Property-Graph
  • 24. IN SHORT ● A Property Graph is composed of: – A set of nodes – A set of relationships – Properties and ids on both ● Sometimes, nodes and relationships can be typed – In Blueprints and Neo4j, a label denotes the type of relationship between its two nodes.
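The definition above can be sketched as two small Python classes. This is an in-memory toy, not Blueprints or Neo4j code; ids come from a simple shared counter, and the `Alice`/`Bob` data is made up. Registering each relationship on both endpoints is what gives every node its sets of outgoing and incoming edges.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical id generator shared by nodes and relationships

@dataclass(eq=False)
class Node:
    properties: dict = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))
    # repr=False avoids infinite recursion (node -> relationship -> node ...)
    outgoing: list = field(default_factory=list, repr=False)  # this node is the tail
    incoming: list = field(default_factory=list, repr=False)  # this node is the head

@dataclass(eq=False)
class Relationship:
    tail: Node   # outgoing (start) vertex
    label: str   # type of the relationship, e.g. "knows"
    head: Node   # incoming (end) vertex
    properties: dict = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))

    def __post_init__(self):
        # Register the edge on both endpoints so it can be walked either way.
        self.tail.outgoing.append(self)
        self.head.incoming.append(self)

alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
knows = Relationship(alice, "knows", bob, {"since": 2009})

print([r.head.properties["name"] for r in alice.outgoing])  # ['Bob']
print([r.tail.properties["name"] for r in bob.incoming])    # ['Alice']
```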
  • 25. GRAPH DATABASES ● A graph database uses graph structures with nodes, edges, and properties to represent and store data – ...but this is not easy to visualize Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
  • 26. HOW DOES IT LOOK IN PYTHON?
  • 32. HOW DOES IT LOOK IN PYTHON?
      # Let's create a graph (g is a neo4j-rest-client GraphDatabase;
      # the URL assumes a local Neo4j server)
      >>> from neo4jrestclient.client import GraphDatabase
      >>> g = GraphDatabase("http://localhost:7474/db/data/")
      >>> silvester = g.nodes.create(name="Silvester")
      >>> arnold = g.nodes.create(name="Arnold")
      >>> punch = arnold.punches(silvester)
      [Figure: (Name: Arnold) -punches-> (Name: Silvester)]
  • 33. HOW DOES IT LOOK IN PYTHON? [Figure: (Name: Arnold) -punches-> (Name: Silvester)]
  • 35. HOW DOES IT LOOK IN PYTHON?
      >>> chuck = g.nodes.create(name="Chuck")
      [Figure: (Name: Arnold) -punches-> (Name: Silvester); new node (Name: Chuck)]
  • 37. HOW DOES IT LOOK IN PYTHON?
      >>> chuck.dropkicks(silvester)
      >>> chuck.dropkicks(arnold)
      [Figure: (Name: Arnold) -punches-> (Name: Silvester); (Name: Chuck) -dropkicks-> (Name: Silvester); (Name: Chuck) -dropkicks-> (Name: Arnold)]
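Calls like `arnold.punches(silvester)` work because the client treats an unknown attribute name as a relationship type. The toy class below sketches that idea with Python's `__getattr__` hook; it is an in-memory illustration, not neo4j-rest-client's actual implementation, and `ToyNode` is a hypothetical name.

```python
class ToyNode:
    """In-memory node where node.<type>(other) creates a relationship of that type."""

    def __init__(self, **properties):
        self.properties = properties
        self.relationships = []  # (type, other_node) pairs leaving this node

    def __getattr__(self, rel_type):
        # Only called for attributes that don't exist normally, e.g. "punches".
        if rel_type.startswith("_"):
            raise AttributeError(rel_type)

        def create(other):
            self.relationships.append((rel_type, other))
            return (self, rel_type, other)

        return create

arnold = ToyNode(name="Arnold")
silvester = ToyNode(name="Silvester")
chuck = ToyNode(name="Chuck")

arnold.punches(silvester)
chuck.dropkicks(silvester)
chuck.dropkicks(arnold)

print([(t, n.properties["name"]) for t, n in chuck.relationships])
# [('dropkicks', 'Silvester'), ('dropkicks', 'Arnold')]
```

The guard on names starting with `_` keeps Python's own lookups of special names (for example `copy.deepcopy` checking for `__deepcopy__`) from being mistaken for relationship types.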
  • 38. GRAPH DATABASES LANDSCAPE
      Database      | Data Model       | Query Method               | License          | Python Binding
      Neo4j         | Property Graph   | Cypher, Gremlin, Traversal | GPL, AGPL        | Native, Blueprints, REST
      OrientDB      | Property Graph   | Gremlin, Traversal         | Apache 2         | Blueprints
      HyperGraphDB  | Typed Hypergraph | HGQuery, Traversal         | LGPL             | Nope
      DEX           | Property Graph   | Traversal                  | Commercial       | Blueprints
      Titan         | Property Graph   | Gremlin                    | Apache 2         | Blueprints
      InfoGrid      | Property Graph   | Traversal                  | AGPL, Commercial | Nope
      InfiniteGraph | Property Graph   | Gremlin                    | Commercial       | Nope
      Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
  • 39. GRAPH DATABASES LANDSCAPE And more: – AffinityDB – YarcData uRiKA – Apache Giraph – Cassovary – StigDB – NuvolaBase – Pegasus – Microsoft Trinity – Sherlock – And so on
  • 41. GREMLIN, BLUEPRINTS, WAT? Let me introduce you to the TinkerPop stack Source: TinkerPop, http://www.tinkerpop.com/
  • 42. BLUEPRINTS AND REXSTER ● Blueprints is a property graph model interface ● Rexster is a server that exposes any Blueprints graph through REST Source: TinkerPop, http://www.tinkerpop.com/
  • 43. AND WHAT ABOUT PYTHON? ● Options to connect to a Blueprints graph database: bulbflow, python-blueprints, pyblueprints [Diagram: OrientDB, Neo4j, DEX and Titan behind the Blueprints API, with Rexster exposing them over REST]
  • 44. BULBFLOW
      ● Create
      >>> from bulbs.neo4jserver import Graph
      >>> g = Graph()  # assumes a Neo4j server at the default local URL
      >>> alice = g.vertices.create(name="Alice")
      >>> bob = g.vertices.create(name="Bob")
      >>> g.edges.create(alice, "knows", bob)
      ● Get
      >>> alice = g.vertices.get(1)
      >>> bob = g.vertices.get(2)
      ● Update
      >>> alice.age = 21
      >>> alice.save()
      ● Delete
      >>> alice.delete()
      Source: Bulbflow, http://bulbflow.com/docs/
  • 45. PYBLUEPRINTS
      ● Create
      >>> alice = g.addVertex()
      >>> alice.setProperty("name", "Alice")
      >>> bob = g.addVertex()
      >>> bob.setProperty("name", "Bob")
      >>> g.addEdge(alice, bob, "knows")
      ● Get
      >>> alice = g.getVertex(1)
      >>> bob = g.getVertex(2)
      ● Update
      >>> alice.setProperty("age", 21)
      ● Delete
      >>> g.removeVertex(alice.getId())
      Source: PyBlueprints, https://github.com/escalant3/pyblueprints
  • 46. BUT NEO4J HAS ITS OWN CLIENTS! ● REST clients for Neo4j: neo4j-rest-client, py2neo [Diagram: the Blueprints/Rexster picture (OrientDB, Neo4j, DEX, Titan; bulbflow, python-blueprints, pyblueprints) with neo4j-rest-client and py2neo added as REST clients of Neo4j]
  • 47. HOW CAN I LOOK THINGS UP? ● An index is a data structure that supports the fast lookup of elements by some key/value pair Source: TinkerPop, https://github.com/tinkerpop/blueprints/wiki/Graph-Indices
  • 48. INDICES
      ● In Python bindings, indices are similar to dicts
      – bulbflow
      # bulbflow creates automatic indices to make basic lookups easier
      >>> nodes = g.vertices.index.lookup(name="Alice")
      >>> for node in nodes:
      ...:     print(node)
      – PyBlueprints
      >>> index = g.getIndex("names", "vertex")
      >>> index.put("name", alice.getProperty("name"), alice)
      >>> nodes = index.get("name", "Alice")
      >>> for node in nodes:
      ...:     print(node)
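Because an index behaves much like a dict, a minimal exact-match index can be sketched as a mapping from `(key, value)` pairs to sets of element ids. The `ExactIndex` class is hypothetical; its `put`/`get` names loosely echo the PyBlueprints calls above, but it is not any library's API.

```python
from collections import defaultdict

class ExactIndex:
    """Toy exact-match index: (key, value) -> set of element ids."""

    def __init__(self):
        self._entries = defaultdict(set)

    def put(self, key, value, element_id):
        self._entries[(key, value)].add(element_id)

    def get(self, key, value):
        # A dict lookup: constant time on average, no scan over all nodes.
        # .get() avoids creating empty entries for missing keys.
        return set(self._entries.get((key, value), set()))

index = ExactIndex()
index.put("name", "Alice", 1)
index.put("name", "Bob", 2)
index.put("name", "Alice", 7)   # two nodes can share a value

print(sorted(index.get("name", "Alice")))  # [1, 7]
print(index.get("name", "Carol"))          # set()
```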
  • 49. INDICES
      ● Some graph databases provide full-text queries
      – bulbflow
      >>> nodes = g.vertices.index.query(name="ali*")
      >>> for node in nodes:
      ...:     print(node)
      – PyBlueprints
      >>> index = g.getIndex("names", "vertex")
      >>> nodes = index.query("name", "ali*")
      >>> for node in nodes:
      ...:     print(node)
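A pattern like `ali*` is a wildcard query rather than an exact lookup. As a rough illustration only, the sketch below matches shell-style patterns with the standard `fnmatch` module over a made-up id-to-name mapping. It scans every value; real full-text indices (Neo4j's legacy indices are backed by Lucene) rely on an inverted index instead.

```python
from fnmatch import fnmatchcase

names = {1: "Alice", 2: "Bob", 3: "Alicia", 4: "Malik"}  # hypothetical id -> name

def wildcard_query(pattern, values):
    """Ids whose value matches a shell-style pattern, case-insensitively."""
    pattern = pattern.lower()
    return sorted(i for i, v in values.items() if fnmatchcase(v.lower(), pattern))

print(wildcard_query("ali*", names))  # [1, 3]
```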
  • 50. ...MORE COMPLEX SEARCHES? “Without traversals [FlockDB] is only a persisted graph. But not a graph database.” – Alex Popescu Source: myNoSQL, http://nosql.mypopescu.com/
  • 51. LET'S TRAVERSE THE GRAPH! ● “A graph traversal is the problem of visiting all the nodes in a graph in a particular manner” – A* search – Alpha-beta pruning – Breadth-First Search (BFS) – Depth-First Search (DFS) – Dijkstra's algorithm – Floyd-Warshall algorithm – Etc. Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_traversal
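Three of the listed strategies fit in a few lines of standard-library Python. A sketch over a hypothetical directed, weighted adjacency dict (weights matter only to Dijkstra's algorithm, which assumes they are non-negative):

```python
import heapq
from collections import deque

# Hypothetical directed graph: node -> {neighbour: weight}
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"D": 5},
    "C": {"D": 1},
    "D": {},
}

def bfs(start):
    """Breadth-first: visit nodes level by level using a FIFO queue."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

def dfs(start, seen=None):
    """Depth-first: follow one branch as far as possible before backtracking."""
    seen = set() if seen is None else seen
    seen.add(start)
    order = [start]
    for nxt in graph[start]:
        if nxt not in seen:
            order += dfs(nxt, seen)
    return order

def dijkstra(start):
    """Cheapest path cost from start to every reachable node."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a cheaper path was already found
        for nxt, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist

print(bfs("A"))       # ['A', 'B', 'C', 'D']
print(dfs("A"))       # ['A', 'B', 'D', 'C']
print(dijkstra("A"))  # {'A': 0, 'B': 1, 'C': 4, 'D': 5}
```

On this graph BFS and DFS visit C and D in different orders, and Dijkstra's algorithm finds that the cheapest route to D (cost 5) goes through C rather than B.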
  • 52. NEO4J TRAVERSAL API
      ● Python-embedded (native Neo4j Python binding)
      >>> traverser = gdb.traversal().relationships('knows').traverse(alice)
      # The graph is traversed as you loop through the result
      >>> for node in traverser.nodes:
      ...:     print(node)
      ● neo4j-rest-client
      >>> traverser = alice.traverse(types=[client.All.knows])
      # The graph is traversed as you loop through the result
      >>> for node in traverser:
      ...:     print(node)
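The comment that the graph is traversed as you loop through the result describes lazy evaluation. A Python generator shows the same behaviour in miniature. The `knows` adjacency dict and the `visited_log` list are hypothetical, the latter only there to show that nodes are expanded on demand:

```python
from collections import deque

knows = {"alice": ["bob", "carol"], "bob": ["dave"], "carol": [], "dave": []}
visited_log = []  # records when each node is actually expanded

def traverse(start):
    """Lazily yield nodes reachable via 'knows' edges, breadth-first."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        visited_log.append(node)
        yield node
        for nxt in knows[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

traverser = traverse("alice")
print(visited_log)        # [] (nothing traversed yet)
print(next(traverser))    # alice
print(visited_log)        # ['alice']
print(list(traverser))    # ['bob', 'carol', 'dave']
```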
  • 53. BLUEPRINTS GREMLIN
      ● Gremlin is a domain-specific language for traversing property graphs
      – Defines how to do a query based on the graph structure
      >>> gremlin = g.extensions.GremlinPlugin.execute_script
      >>> params = {'alice_id': alice.id}
      >>> script = "g.v(alice_id).out('knows')"  # g.v(id): one vertex by id; g.V iterates all vertices
      >>> node = gremlin(script=script, params=params)
      >>> node == bob
      Source: TinkerPop Gremlin, https://github.com/tinkerpop/gremlin/wiki
      Source: Marko Rodríguez, The Graph Traversal Programming Pattern, http://www.slideshare.net/slidarko/graph-windycitydb2010
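Gremlin expresses a query as a pipeline of steps, each consuming the vertices produced by the previous one. The toy `Pipe` class below mimics only the "start at a vertex, then follow `out` edges with a label" shape in plain Python over a made-up edge list; it is not Gremlin and does not talk to a server, and `vertex` is a hypothetical helper.

```python
class Pipe:
    """Toy traversal pipeline: each step maps the current vertices to new ones."""

    def __init__(self, edges, current):
        self.edges = edges        # list of (tail_id, label, head_id)
        self.current = current    # vertex ids at this point of the pipeline

    def out(self, label):
        """Follow outgoing edges with the given label."""
        nxt = [h for v in self.current for (t, l, h) in self.edges if t == v and l == label]
        return Pipe(self.edges, nxt)

    def __iter__(self):
        return iter(self.current)

def vertex(edges, vid):
    """Start a pipeline at a single vertex."""
    return Pipe(edges, [vid])

edges = [(1, "knows", 2), (1, "knows", 3), (2, "knows", 4), (1, "likes", 5)]
print(list(vertex(edges, 1).out("knows")))               # [2, 3]
print(list(vertex(edges, 1).out("knows").out("knows")))  # [4]
```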
  • 55. NEO4J CYPHER QUERY LANGUAGE ● Declarative graph query language – Expressive and efficient querying – Focused on expressing what to retrieve from a graph – Inspired by SQL – Pattern matching expressions from SPARQL
      ● Node 1 related to node 2 by a "label" relationship: (1) -[:label]- (2)
      Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
  • 56. NEO4J CYPHER QUERY LANGUAGE ● Declarative graph query language – Expressive and efficient querying – Focused on expressing what to retrieve from a graph – Inspired by SQL – Pattern matching expressions from SPARQL
      ● Query for node 1 related to node 2 by a "label" relationship:
      START n=(1), m=(2)
      MATCH n-[r:label]-m
      RETURN r
      Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
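Because the pattern `n-[r:label]-m` has no arrowhead, it matches `label` relationships between the two start nodes in either direction. A plain-Python sketch of that matching rule over a hypothetical list of `(id, start, type, end)` relationships:

```python
rels = [
    (10, 1, "label", 2),
    (11, 2, "label", 1),
    (12, 1, "other", 2),
    (13, 1, "label", 3),
]

def match_undirected(n, m, rel_type):
    """Ids of rel_type relationships joining n and m, ignoring direction."""
    return [rid for rid, start, typ, end in rels
            if typ == rel_type and (start, end) in ((n, m), (m, n))]

print(match_undirected(1, 2, "label"))  # [10, 11]
```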
  • 57. PY2NEO CYPHER HELPERS
      ● Get or create elements
      >>> g.get_or_create_relationships(
      ...:     (bob, "WORKS WITH", carol, {"since": 2004}),
      ...:     (alice, "DISLIKES!", carol, {"reason": "youth"}),
      ...:     (bob, "WORKS WITH", dave, {"since": 2009}),
      ...: )
      ● Get counts
      >>> nodes_count = g.get_node_count()
      >>> rels_count = g.get_relationship_count()
      ● Delete
      >>> g.delete()
      Source: py2neo, http://py2neo.org/
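Get-or-create is meant to be idempotent: running the same call twice should not duplicate relationships. The set-based sketch below shows that semantics in memory only; `get_or_create` is a hypothetical helper, not py2neo's implementation, and treating properties as part of a relationship's identity is an assumption of this toy.

```python
relationships = set()  # (start, type, end, frozen properties)

def get_or_create(start, rel_type, end, properties=None):
    """Return (relationship key, created?) creating it only if it is missing."""
    key = (start, rel_type, end, frozenset((properties or {}).items()))
    created = key not in relationships
    relationships.add(key)
    return key, created

print(get_or_create("bob", "WORKS WITH", "carol", {"since": 2004})[1])  # True
print(get_or_create("bob", "WORKS WITH", "carol", {"since": 2004})[1])  # False
print(len(relationships))                                               # 1
```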
  • 58. NEO4J-REST-CLIENT CYPHER HELPERS
      ● Query casting
      >>> from neo4jrestclient.client import Node, Relationship
      >>> q = ("start n=node(*) match n-[r:punches]-() "
      ...:      "return n, n.name, r, r.since")
      >>> results = g.query(q, returns=(Node, unicode, Relationship, int))
      ● Complex filtering
      >>> from neo4jrestclient.query import Q
      >>> lookups = (
      ...:     Q("name", exact="Arnold") &
      ...:     (Q("surname", istartswith="swar") &
      ...:      ~Q("surname", iendswith="chenegger"))
      ...: )
      >>> arnolds = g.nodes.filter(lookups)
      Source: neo4j-rest-client, https://github.com/versae/neo4j-rest-client
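The `&` and `~` in the filter above rely on Python operator overloading, much like Django's `Q` objects. A minimal sketch of composable lookups; the `Lookup` class, its `where` constructor and the three supported operators (`exact`, `istartswith`, `iendswith`) are hypothetical, not neo4j-rest-client's code, and the surnames are made up:

```python
class Lookup:
    """Composable predicate over a node's properties (toy version of a Q object)."""

    def __init__(self, test):
        self.test = test  # function: dict -> bool

    @classmethod
    def where(cls, key, **op):
        (name, expected), = op.items()  # exactly one operator per lookup
        checks = {
            "exact": lambda v: v == expected,
            "istartswith": lambda v: v.lower().startswith(expected.lower()),
            "iendswith": lambda v: v.lower().endswith(expected.lower()),
        }
        check = checks[name]
        # A missing property never matches.
        return cls(lambda props: key in props and check(props[key]))

    def __and__(self, other):
        return Lookup(lambda props: self.test(props) and other.test(props))

    def __or__(self, other):
        return Lookup(lambda props: self.test(props) or other.test(props))

    def __invert__(self):
        return Lookup(lambda props: not self.test(props))

    def __call__(self, props):
        return self.test(props)

nodes = [
    {"name": "Arnold", "surname": "Swartz"},
    {"name": "Arnold", "surname": "Swarchenegger"},
    {"name": "Silvester", "surname": "Swann"},
]
lookups = Lookup.where("name", exact="Arnold") & (
    Lookup.where("surname", istartswith="swar")
    & ~Lookup.where("surname", iendswith="chenegger")
)
print([n["surname"] for n in nodes if lookups(n)])  # ['Swartz']
```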
  • 59. LET'S PLAY! ● Deploy Neo4j on Heroku or Amazon ● Use one of the available clients
  • 60. NEO4J HEROKU ADD-ON
      ● Create a Heroku app and add the Neo4j add-on
      $ heroku apps:create pyconca
      $ heroku addons:add neo4j --app pyconca
      $ xdg-open `heroku config:get NEO4J_URL --app pyconca`
      $ export NEO4J_URL=`heroku config:get NEO4J_URL --app pyconca`
      ● Create a virtualenv with neo4j-rest-client
      $ mkvirtualenv --no-site-packages pyconca
      $ workon pyconca
      $ pip install ipython neo4jrestclient
      $ ipython
  • 61. NEO4J HEROKU ADD-ON
      ● Run IPython and that's it!
      >>> import os
      >>> NEO4J_URL = os.environ["NEO4J_URL"]
      >>> from neo4jrestclient import client
      >>> gdb = client.GraphDatabase(NEO4J_URL + "/db/data")
      >>> gdb.url
  • 63. THANKS! Questions? Javier de la Rosa @versae The CulturePlex Lab Western University, London, ON PyCon Canada 2012
  • 64. APPENDIX: DATA MODELS ● neo4django – https://github.com/scholrly/neo4django ● neomodel – https://github.com/robinedwards/neomodel ● bulbflow models – http://bulbflow.com/quickstart/#models
  • 65. APPENDIX: VISUALIZE YOUR GRAPH ● Export to .gexf for Gephi – http://gephi.org/ ● Use D3.js – http://d3js.org/ ● Use sigma.js – http://sigmajs.org/ ● Take a look at Max De Marzi's work – http://maxdemarzi.com/category/visualization/ ● Use Sylva (for newbies) – http://www.sylvadb.com/