An Introduction to
    Neo4j
   Michal Bachman
    @bachmanm
Roadmap
•   Intro to NOSQL
•   Intro to Graph Databases
•   Intro to Neo4j
•   A bit of hacking
•   Current research
•   Q&A



Not Only SQL

Why NOSQL now?

   Driving trends




Trend 1: Data Size




Trend 2: Connectedness
[Figure: information connectivity rising from Documents and Text through Hypertext, Feeds, Blogs, UGC, Wikis, Tagging, Folksonomies and RDFa up to Ontologies and the GGG]
Trend 3: Semi-structured Data




Trend 4: Application Architecture (80’s)



[Figure: one Application talking to one DB]

Trend 4: Application Architecture (90’s)



[Figure: three Apps sharing a single DB]
[Figure: three Applications, each with its own DB]
Side note: RDBMS performance
[Figure: RDBMS performance chart, with "Salary List" as a labelled example]
Four NOSQL Categories




Key-Value Stores
• “Dynamo: Amazon’s Highly Available Key-
  Value Store” (2007)
• Data model:
  – Global key-value mapping
  – Big scalable HashMap
  – Highly fault tolerant (typically)
• Examples:
  – Riak, Redis, Voldemort
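To make the "big scalable HashMap" model concrete, here is a minimal sketch of a toy store that spreads keys over a few in-memory partitions by hashing them. The class and method names are hypothetical, and the simple modulo placement is an assumption standing in for the consistent hashing and replication that Dynamo-style stores actually use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy key-value store: one logical map split across partitions.
class PartitionedKeyValueStore {
    private final List<Map<String, String>> partitions = new ArrayList<>();

    PartitionedKeyValueStore(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Assumed placement rule: hash the key, take it modulo the partition count.
    // floorMod keeps the index non-negative even for negative hash codes.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, String value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    // The only query the model offers: fetch a value by its exact key.
    String get(String key) {
        return partitions.get(partitionFor(key)).get(key);
    }
}
```

Because every operation needs only the key, any partition can serve it independently, which is what makes scaling out simple. With modulo placement, adding a partition moves most keys, one reason real stores prefer consistent hashing.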

Pros and Cons
• Strengths
  – Simple data model
  – Great at scaling out horizontally
     • Scalable
     • Available
• Weaknesses:
  – Simplistic data model
  – Poor for complex data


Column Family (BigTable)
• Google’s “Bigtable: A Distributed Storage
  System for Structured Data” (2006)
• Data model:
  – A big table, with column families
  – Map-reduce for querying/processing
• Examples:
  – HBase, HyperTable, Cassandra



Pros and Cons
• Strengths
  – Data model supports semi-structured data
  – Naturally indexed (columns)
  – Good at scaling out horizontally
• Weaknesses:
  – Unsuited for interconnected data




Document Databases
• Data model
  – Collections of documents
  – A document is a key-value collection
  – Index-centric, lots of map-reduce
• Examples
  – CouchDB, MongoDB
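A minimal sketch of the document model, assuming each document is a plain Java Map and the collection maintains a single secondary index on one chosen field. It illustrates why queries are limited to keys and indexes: any other field would need a full scan (or, in the real systems, map-reduce). All names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a document collection: each document is a
// schema-free key-value map, reachable by its id or through a single
// secondary index declared on one field.
class DocumentCollection {
    private final Map<String, Map<String, Object>> byId = new HashMap<>();
    private final Map<Object, Set<String>> index = new HashMap<>();
    private final String indexedField;

    DocumentCollection(String indexedField) {
        this.indexedField = indexedField;
    }

    void insert(String id, Map<String, Object> document) {
        Map<String, Object> previous = byId.put(id, document);
        // Keep the index consistent when a document is replaced.
        if (previous != null && previous.get(indexedField) != null) {
            Set<String> ids = index.get(previous.get(indexedField));
            if (ids != null) {
                ids.remove(id);
            }
        }
        Object value = document.get(indexedField);
        if (value != null) {
            index.computeIfAbsent(value, v -> new TreeSet<>()).add(id);
        }
    }

    // Lookup by key: the primary access path.
    Map<String, Object> findById(String id) {
        return byId.get(id);
    }

    // Lookup by the indexed field only; other fields would need a scan.
    Set<String> findIdsBy(Object value) {
        return Collections.unmodifiableSet(index.getOrDefault(value, Collections.emptySet()));
    }
}
```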




Pros and Cons
• Strengths
  – Simple, powerful data model (just like SVN!)
  – Good scaling (especially if sharding supported)
• Weaknesses:
  – Unsuited for interconnected data
  – Query model limited to keys (and indexes)
     • Map reduce for larger queries




Graph Databases
• Data model:
  – Nodes with properties
  – Named relationships with properties
  – Hypergraph, sometimes
• Examples:
  – Neo4j (of course), Sones GraphDB, OrientDB,
    InfiniteGraph, AllegroGraph
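The property graph data model fits in a few lines of plain Java. This is a hypothetical in-memory structure, not Neo4j's API: nodes hold optional properties, and each relationship has a name (type), a required start and end node, and its own optional properties. Storing each relationship on both of its nodes lets a traversal hop straight to neighbours without an index lookup, mirroring the "index-free adjacency" idea often used to describe native graph stores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory property graph (NOT Neo4j's API).
class PropertyGraph {
    // A node: optional key-value properties plus links to its relationships.
    static final class Node {
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();
        final List<Relationship> incoming = new ArrayList<>();
    }

    // A named relationship: type, start and end node are required;
    // properties are optional.
    static final class Relationship {
        final String type;
        final Node start;
        final Node end;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(String type, Node start, Node end) {
            this.type = Objects.requireNonNull(type);
            this.start = Objects.requireNonNull(start);
            this.end = Objects.requireNonNull(end);
        }
    }

    private final List<Node> nodes = new ArrayList<>();

    Node createNode() {
        Node node = new Node();
        nodes.add(node);
        return node;
    }

    // Store the relationship on both ends so traversals can hop in either direction.
    Relationship createRelationship(Node start, String type, Node end) {
        Relationship rel = new Relationship(type, start, end);
        start.outgoing.add(rel);
        end.incoming.add(rel);
        return rel;
    }

    int nodeCount() {
        return nodes.size();
    }
}
```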



Pros and Cons
• Strengths
  – Powerful data model
  – Fast
     • For connected data, can be many orders of magnitude
       faster than RDBMS
• Weaknesses:
  – Sharding
     • Though they can scale reasonably well
     • And for some domains you can shard too!

Social Network “path exists” Performance
• Experiment:
  • ~1k persons
  • Average 50 friends per person
  • pathExists(a,b) limited to depth 4
  • Caches warm to eliminate disk IO

                         # persons    query time
  Relational database    1000         2000 ms
  Neo4j                  1000         2 ms
  Neo4j                  1000000      2 ms
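The pathExists(a,b) query from the experiment can be sketched as a depth-limited breadth-first search over an adjacency list. This is a hypothetical, stdlib-only illustration with made-up names, not the code behind the benchmark numbers; it shows that the traversal only touches people reachable from a within the depth limit instead of joining a whole friendship table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: depth-limited breadth-first search over friendships.
class FriendGraph {
    private final Map<Integer, List<Integer>> friends = new HashMap<>();

    // Friendship is symmetric, so record it in both directions.
    void addFriendship(int a, int b) {
        friends.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        friends.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // True if 'to' is reachable from 'from' in at most maxDepth hops.
    boolean pathExists(int from, int to, int maxDepth) {
        if (from == to) {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add(from);
        List<Integer> frontier = List.of(from);
        // Each round expands the frontier by exactly one hop.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<Integer> next = new ArrayList<>();
            for (int person : frontier) {
                for (int friend : friends.getOrDefault(person, List.of())) {
                    if (friend == to) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }
}
```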
Four NOSQL Categories




What are graphs good for?
•   Recommendations
•   Business intelligence
•   Social computing
•   Geospatial
•   MDM (master data management)
•   Systems management
•   Web of things
•   Genealogy
•   Time series data
•   Product catalogue
•   Web analytics
•   Scientific computing (especially bioinformatics)
•   Indexing your slow RDBMS
•   And much more!


Neo4j is a Graph Database

So we need to detour through a little
           graph theory



Meet Leonhard Euler
    • Swiss mathematician
    • Inventor of Graph
      Theory (1736)




http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg
http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
Property Graph Model
• nodes / vertices
• relationships / edges
• properties

[Figure: example property graph with nodes name: Michal Bachman; title: Intro to Neo4j, duration: 45; name: Neo4j; name: NOSQL]

Graphs are very whiteboard-friendly




Neo4j




32 billion nodes
32 billion relationships
64 billion properties
http://opfm.jpl.nasa.gov/




http://news.xinhuanet.com




Community


  Advanced



    Enterprise


How do I use it?




Getting started is easy
• Single package download, includes server stuff
  – http://neo4j.org/download/
• For developer convenience, Ivy (or whatever):
  –   <dependency org="org.neo4j" name="neo4j-community" rev="1.9.M04"/>




Run it!
• Server is easy to start and stop
  – cd <install directory>
  – bin/neo4j start
  – bin/neo4j stop
• Provides a REST API in addition to the other
  APIs we’ve seen
• Provides some ops support
  – JMX, data browser, graph visualisation

Embed it!
• If you want to host the database in your
  process, just load the jars

• And point the config at the right place on disk

• Embedded databases can be HA too
  – You don’t have to run as a server



[Figure: conference graph with nodes name: Phil Johnson; title: Cognitive Psychology, duration: 30; name: Michal Bachman; name: UX; title: Intro to Neo4j, duration: 45; name: Martin Macke; name: Jeremy White; name: Neo4j; name: NOSQL; one relationship is labelled INTERESTED]
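import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

// Open (or create) an embedded database stored at this path
GraphDatabaseService neo = new GraphDatabaseFactory().newEmbeddedDatabase("/data/webexpo");

Transaction tx = neo.beginTx();
try {
    // Two nodes, each with a property
    Node speaker = neo.createNode();
    speaker.setProperty("name", "Michal Bachman");

    Node talk = neo.createNode();
    talk.setProperty("title", "Intro to Neo4j");

    // A named relationship between them, with its own property
    Relationship delivers = speaker.createRelationshipTo(talk,
            DynamicRelationshipType.withName("DELIVERS"));
    delivers.setProperty("day", "Saturday");

    // Index the speaker by name for later lookups
    neo.index().forNodes("people").add(speaker, "name", "Michal Bachman");

    tx.success(); // mark for commit; otherwise finish() rolls back
} finally {
    tx.finish();
}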


[Figure: name: Michal Bachman -DELIVERS (day: Saturday)-> title: Intro to Neo4j]

Neo4j Introduction at Imperial College London
Core API
• Nodes
  – Properties (optional K-V pairs)
• Relationships
  – Start node (required)
  – End node (required)
  – Properties (optional K-V pairs)




All Conference Topics




[Figure: the conference graph (Phil Johnson, Michal Bachman, Martin Macke, Jeremy White; talks Cognitive Psychology and Intro to Neo4j; topics UX, Neo4j and NOSQL)]
All Conference Topics
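import static org.neo4j.graphdb.Direction.INCOMING;
import static org.neo4j.graphdb.Direction.OUTGOING;
import java.util.LinkedHashSet;
import java.util.Set;
import org.neo4j.graphdb.*;

// Relationship types used in the conference graph
enum RelTypes implements RelationshipType { TALKS_AT, DELIVERS, ABOUT }

// WebExpo <-TALKS_AT- speaker -DELIVERS-> talk -ABOUT-> topic
Set<String> topics = new LinkedHashSet<>(); // first-seen order, no duplicates
Node webExpo = neo.getReferenceNode();
for (Relationship talksAt : webExpo.getRelationships(INCOMING, RelTypes.TALKS_AT)) {
    Node speaker = talksAt.getStartNode();
    for (Relationship delivers : speaker.getRelationships(OUTGOING, RelTypes.DELIVERS)) {
        Node talk = delivers.getEndNode();
        for (Relationship about : talk.getRelationships(OUTGOING, RelTypes.ABOUT)) {
            topics.add((String) about.getEndNode().getProperty("name"));
        }
    }
}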




-------------------
Printing all topics
All topics: development, data, advertising, education, usa, business, microsoft, webdesign, software,
responsiveness, ux, e-commerce, php, psychology, crm, api, chef, javascript, patterns, product design,
marketing, metro, social media, web, startup, analytics, lean, cqrs, node.js, branding, cloud, testing, neo4j,
rest, css, design, publishing, nosql. Took: 2 ms
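The nested loops walk conference <-TALKS_AT- speaker -DELIVERS-> talk -ABOUT-> topic. To try the same walk without a running Neo4j instance, here is a minimal plain-Java sketch. The `Rel` record, the `incoming`/`outgoing` helpers and the sample data are hypothetical stand-ins for the Neo4j API and the real conference dataset, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Plain-Java stand-in for the nested Neo4j loops: no Neo4j classes, just a list of
// hypothetical directed, typed relationships between node names.
public class ConferenceTopics {

    record Rel(String start, String type, String end) {}

    // Like node.getRelationships(OUTGOING, type): end nodes of matching outgoing relationships.
    static List<String> outgoing(List<Rel> rels, String node, String type) {
        List<String> result = new ArrayList<>();
        for (Rel r : rels) {
            if (r.start().equals(node) && r.type().equals(type)) {
                result.add(r.end());
            }
        }
        return result;
    }

    // Like node.getRelationships(INCOMING, type): start nodes of matching incoming relationships.
    static List<String> incoming(List<Rel> rels, String node, String type) {
        List<String> result = new ArrayList<>();
        for (Rel r : rels) {
            if (r.end().equals(node) && r.type().equals(type)) {
                result.add(r.start());
            }
        }
        return result;
    }

    // conference <-TALKS_AT- speaker -DELIVERS-> talk -ABOUT-> topic;
    // the TreeSet drops topics shared by several talks and sorts the rest.
    static SortedSet<String> allTopics(List<Rel> rels, String conference) {
        SortedSet<String> topics = new TreeSet<>();
        for (String speaker : incoming(rels, conference, "TALKS_AT")) {
            for (String talk : outgoing(rels, speaker, "DELIVERS")) {
                topics.addAll(outgoing(rels, talk, "ABOUT"));
            }
        }
        return topics;
    }

    public static void main(String[] args) {
        List<Rel> rels = List.of(
                new Rel("Michal Bachman", "TALKS_AT", "WebExpo"),
                new Rel("Michal Bachman", "DELIVERS", "Intro to Neo4j"),
                new Rel("Intro to Neo4j", "ABOUT", "neo4j"),
                new Rel("Intro to Neo4j", "ABOUT", "nosql"),
                new Rel("Phil Johnson", "TALKS_AT", "WebExpo"),
                new Rel("Phil Johnson", "DELIVERS", "Cognitive Psychology"),
                new Rel("Cognitive Psychology", "ABOUT", "psychology"),
                new Rel("Cognitive Psychology", "ABOUT", "ux"));
        // prints "All topics: [neo4j, nosql, psychology, ux]"
        System.out.println("All topics: " + allTopics(rels, "WebExpo"));
    }
}
```

Scanning the whole relationship list at every hop is what an index-free graph store avoids: Neo4j hands each node its own relationships directly, which is why the real traversal stays in the millisecond range.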
Which talks should I attend?
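   // From an attendee, follow INTERESTED out to topics, then ABOUT back in to talks;
   // the talks are the nodes found at depth 2. With Uniqueness.NONE a talk about
   // several of the attendee's topics is returned once per matching topic.
   TraversalDescription talksTraversal = Traversal.description()
        .uniqueness(Uniqueness.NONE)
        .breadthFirst()
        .relationships(INTERESTED, OUTGOING)
        .relationships(ABOUT, INCOMING)
        .evaluator(Evaluators.atDepth(2));

   Node attendee =
        neo.index().forNodes("people").get("name", "Jeremy White").getSingle();

   Iterable<Node> talks = talksTraversal.traverse(attendee).nodes();

   // iterate over talks and print their titles
   for (Node talk : talks) {
        System.out.println(talk.getProperty("title"));
   }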




------------------------------------------
Suggesting talks for 100 random attendees.
...
Aneta Lebedova: Measure Everything!, To the USA, The real me. Took: 1 ms
Bohumir Kubat: Beyond the polar bear, How (not) to do API, Critical interface design. Took: 1 ms
Vladimir Vales: Application Development for Windows 8 Metro. Took: 1 ms
Suggested talks for 100 random attendees in 449 ms
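Because the traversal uses `Uniqueness.NONE`, a talk covering two topics the attendee is interested in should come back once per matching topic. The plain-Java sketch below imitates the two-hop walk attendee -INTERESTED-> topic <-ABOUT- talk over hypothetical in-memory maps (no Neo4j classes, made-up sample data) and shows how de-duplicating changes the result. In the Neo4j 1.x traversal framework a node-level setting such as `Uniqueness.NODE_GLOBAL` would presumably be the analogous switch; that has not been checked against the original code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory version of the two-hop walk attendee -INTERESTED-> topic <-ABOUT- talk.
public class TalkSuggestions {

    // interests: attendee -> topics (INTERESTED); talkTopics: talk -> topics (ABOUT).
    // unique = false mimics Uniqueness.NONE, unique = true keeps each talk once.
    static List<String> suggest(Map<String, List<String>> interests,
                                Map<String, List<String>> talkTopics,
                                String attendee, boolean unique) {
        // Index talks by topic so ABOUT can be followed "backwards" (INCOMING) from a topic.
        Map<String, List<String>> talksByTopic = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : talkTopics.entrySet()) {
            for (String topic : e.getValue()) {
                talksByTopic.computeIfAbsent(topic, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Depth 1 reaches topics, depth 2 reaches talks; only depth-2 nodes are collected.
        Collection<String> result = unique ? new LinkedHashSet<String>() : new ArrayList<String>();
        for (String topic : interests.getOrDefault(attendee, List.of())) {
            result.addAll(talksByTopic.getOrDefault(topic, List.of()));
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        Map<String, List<String>> interests = Map.of("Jeremy White", List.of("neo4j", "nosql"));
        Map<String, List<String>> talkTopics = new LinkedHashMap<>();
        talkTopics.put("Intro to Neo4j", List.of("neo4j", "nosql"));
        talkTopics.put("Cognitive Psychology", List.of("psychology", "ux"));
        System.out.println(suggest(interests, talkTopics, "Jeremy White", false)); // [Intro to Neo4j, Intro to Neo4j]
        System.out.println(suggest(interests, talkTopics, "Jeremy White", true));  // [Intro to Neo4j]
    }
}
```

A `LinkedHashSet` is used rather than a plain `HashSet` so the de-duplicated talks keep the order in which they were reached, much like a breadth-first traversal would report them.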
What do we have in common?
      //retrieve attendeeOne and attendeeTwo from index

      // every path of at most two relationships, of any type and in either direction
      int maxDepth = 2;
      Iterable<Path> paths = GraphAlgoFactory
            .allPaths(Traversal.expanderForAllTypes(), maxDepth)
            .findAllPaths(attendeeOne, attendeeTwo);

      for (Path path : paths) {
            //print it
      }



------------------------------------------------------------
Finding things in common for 100 random couples of attendees
...
Karel Kunc and Phil Smith:

(Karel Kunc)--[INTERESTED]-->(ux)<--[INTERESTED]--(Phil Smith),
(Karel Kunc)--[DISLIKED]-->(Be a punk consumer!)<--[DISLIKED]--(Phil Smith),
(Karel Kunc)--[DISLIKED]-->(Beyond the polar bear)<--[LIKED]--(Phil Smith),
(Karel Kunc)--[LIKED]-->(Shipito.com – business in USA)<--[LIKED]--(Phil Smith).
Took: 0 ms.
...

Found things in common for 100 random couples of attendees in 142 ms.
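`allPaths` with `maxDepth = 2` returns every path of at most two relationships between the two attendees; the all-types expander follows relationships in both directions, which is why shared interests, likes and dislikes all show up. Below is a plain-Java sketch of the same idea over a hypothetical in-memory relationship list (not Neo4j's `GraphAlgoFactory`), printing paths in the format used in the output above. The helper names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory sketch: every path of one or two relationships between two
// people, following relationships of any type in either direction.
public class ThingsInCommon {

    record Rel(String start, String type, String end) {}

    static boolean touches(Rel r, String node) {
        return r.start().equals(node) || r.end().equals(node);
    }

    // The node on the far side of r when standing on 'node'.
    static String other(Rel r, String node) {
        return r.start().equals(node) ? r.end() : r.start();
    }

    // One hop rendered from the point of view of 'from': --[T]-->(x) or <--[T]--(x).
    static String hop(Rel r, String from) {
        return r.start().equals(from)
                ? "--[" + r.type() + "]-->(" + r.end() + ")"
                : "<--[" + r.type() + "]--(" + r.start() + ")";
    }

    static List<String> pathsUpToTwo(List<Rel> rels, String a, String b) {
        List<String> paths = new ArrayList<>();
        for (Rel first : rels) {
            if (!touches(first, a)) continue;
            String mid = other(first, a);
            if (mid.equals(a)) continue;           // skip self-loops on a
            if (mid.equals(b)) {                   // length 1: a and b are directly related
                paths.add("(" + a + ")" + hop(first, a));
                continue;
            }
            for (Rel second : rels) {              // length 2: a - mid - b
                if (second != first && touches(second, mid) && other(second, mid).equals(b)) {
                    paths.add("(" + a + ")" + hop(first, a) + hop(second, mid));
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<Rel> rels = List.of(
                new Rel("Karel Kunc", "INTERESTED", "ux"),
                new Rel("Phil Smith", "INTERESTED", "ux"),
                new Rel("Karel Kunc", "DISLIKED", "Beyond the polar bear"),
                new Rel("Phil Smith", "LIKED", "Beyond the polar bear"));
        pathsUpToTwo(rels, "Karel Kunc", "Phil Smith").forEach(System.out::println);
    }
}
```

Run as-is, this prints the two lines `(Karel Kunc)--[INTERESTED]-->(ux)<--[INTERESTED]--(Phil Smith)` and `(Karel Kunc)--[DISLIKED]-->(Beyond the polar bear)<--[LIKED]--(Phil Smith)`.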

Youngsters, Y U No Like Java?




Who is my beer mate?

myself                     beerMate:?




                talk:?



                                 @bachmanm
Who is my beer mate?

[Diagram: the same pattern with Cypher-style node names (myself), (beerMate) and (talk)]
Who is my beer mate?
start myself=node:people(name = "Emil Votruba")
match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)
return distinct beerMate.name, count(beerMate)
order by count(beerMate) desc
limit 5;

Cypher Query
start myself=node:people(name = "Alex Smart")
match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)
return distinct beerMate.name, count(beerMate)
order by count(beerMate) desc
limit 5;
Cypher Query
start myself=node:people(name = "Emil Votruba")
match (myself)-[:LIKED]->()<-[:LIKED]-(beerMate)
return distinct beerMate.name, count(beerMate)
order by count(beerMate) desc
limit 5;
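The query counts, for every other attendee, how many talks both of you LIKED, and returns the five people with the largest overlap; the non-aggregated `beerMate.name` acts as the grouping key for `count`. The Java sketch below reproduces those semantics over a hypothetical in-memory `liked` map (attendee to set of liked talks, made-up sample data). It illustrates what the query computes, not how Neo4j executes it, and the alphabetical tie-break is an added assumption because the query leaves the order of equal counts unspecified.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class BeerMates {

    // liked: attendee -> talks they LIKED (hypothetical data, not the conference dataset).
    // Returns up to 'limit' other attendees with the number of talks liked in common with 'myself'.
    static List<Map.Entry<String, Long>> beerMates(Map<String, Set<String>> liked, String myself, int limit) {
        Set<String> myTalks = liked.getOrDefault(myself, Set.of());
        Map<String, Long> shared = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : liked.entrySet()) {
            if (e.getKey().equals(myself)) continue;
            // one count per (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate) match
            long common = e.getValue().stream().filter(myTalks::contains).count();
            if (common > 0) shared.put(e.getKey(), common);
        }
        // order by count desc, then by name so ties come out in a stable order; keep the top 'limit'
        return shared.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> liked = Map.of(
                "Emil Votruba", Set.of("Intro to Neo4j", "Cognitive Psychology", "Measure Everything!"),
                "Alex Smart", Set.of("Intro to Neo4j", "Cognitive Psychology"),
                "Jeremy White", Set.of("Intro to Neo4j"),
                "Martin Macke", Set.of("The real me"));
        // prints "Alex Smart: 2" then "Jeremy White: 1"
        for (Map.Entry<String, Long> mate : beerMates(liked, "Emil Votruba", 5)) {
            System.out.println(mate.getKey() + ": " + mate.getValue());
        }
    }
}
```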
Who is my beer mate?




Current Research
•   Graph partitioning
•   Graph analytics (“OLAP” and predictive)
•   Performance improvements
•   Query languages
•   MVCC and single-threaded write models
•   ACID (tradeoffs for weakening C and I)
•   Yield and Harvest in distributed systems
•   Application-level
    – Recommendations
    – Protein interactions
    – …

Questions?
Neo4j: http://neo4j.org
Neo Technology: http://neotechnology.com
Twitter: @bachmanm
Code: git://github.com/bachmanm/neo4j-imperial.git

Pigging Solutions Sustainability brochure.pdf
 
Cookies program to display the information though cookie creation
Cookies program to display the information though cookie creationCookies program to display the information though cookie creation
Cookies program to display the information though cookie creation
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
 
20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
 
Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
The Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive ComputingThe Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive Computing
 

Neo4j Introduction at Imperial College London

  • 1. An Introduction to Neo4j Michal Bachman @bachmanm
  • 2. Roadmap • Intro to NOSQL • Intro to Graph Databases • Intro to Neo4j • A bit of hacking • Current research • Q&A
  • 3. Not Only SQL
  • 4. Why NOSQL now? Driving trends
  • 5. Trend 1: Data Size
  • 6. Trend 2: Connectedness (diagram: information connectivity across text documents, hypertext, feeds, blogs, UGC, wikis, tagging, folksonomies, RDFa, ontologies and the GGG)
  • 7. Trend 3: Semi-structured Data
  • 8. Trend 4: Application Architecture (80’s) (diagram: one application, one DB)
  • 9. Trend 4: Application Architecture (90’s) (diagram: several apps sharing one DB)
  • 10. (diagram: three applications, each with its own DB)
  • 11. Side note: RDBMS performance (chart labelled "Salary List")
  • 13. Key-Value Stores • “Dynamo: Amazon’s Highly Available Key-Value Store” (2007) • Data model: – Global key-value mapping – Big scalable HashMap – Highly fault tolerant (typically) • Examples: – Riak, Redis, Voldemort
  • 14. Pros and Cons • Strengths: – Simple data model – Great at scaling out horizontally • Scalable • Available • Weaknesses: – Simplistic data model – Poor for complex data
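Dynamo-style stores spread one global key-value mapping over many machines, commonly by consistent hashing. The sketch below is a toy, single-process Java illustration of that idea, not code from Riak, Redis or Voldemort; the class name `ConsistentHashRing`, the FNV-1a-style hash and the number of virtual nodes are assumptions chosen for brevity.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy consistent-hash ring mapping keys to storage nodes (hypothetical class, not a real library API).
final class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // 32-bit FNV-1a over the string's chars, masked to a non-negative int.
    private static int hash(String s) {
        int h = 0x811c9dc5;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x01000193;
        }
        return h & 0x7fffffff;
    }

    // Each physical node owns several points on the ring to even out the load.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Remove only the ring points this node still owns.
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i), node);
        }
    }

    // A key belongs to the first node point at or after its hash, wrapping around the ring.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

The design point: when a node leaves, only the keys it owned move to its ring neighbours and every other key stays where it was, which is what lets such stores scale out horizontally without reshuffling all of the data.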
  • 15. Column Family (BigTable) • Google’s “Bigtable: A Distributed Storage System for Structured Data” (2006) • Data model: – A big table, with column families – Map-reduce for querying/processing • Examples: – HBase, Hypertable, Cassandra
  • 16. Pros and Cons • Strengths: – Data model supports semi-structured data – Naturally indexed (columns) – Good at scaling out horizontally • Weaknesses: – Unsuited for interconnected data
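To make "a big table, with column families" concrete, here is a minimal in-memory Java model of the layout, assuming a simplified scheme of row key, then column family, then column, then value. It is not the HBase or Cassandra API, and it leaves out the timestamps/versions, persistence and distribution that real Bigtable-style stores add; `ColumnFamilyTable` is a hypothetical name.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy column-family table (illustration only).
final class ColumnFamilyTable {
    // row key -> column family -> column qualifier -> value, all kept sorted
    private final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Rows need not share columns: each row stores only the columns it has.
    Map<String, String> getFamily(String rowKey, String family) {
        TreeMap<String, TreeMap<String, String>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(family)) {
            return Collections.emptyMap();
        }
        return Collections.unmodifiableMap(row.get(family));
    }

    // Rows are sorted by key, so a contiguous range scan is cheap.
    SortedMap<String, TreeMap<String, TreeMap<String, String>>> scan(String fromInclusive, String toExclusive) {
        return rows.subMap(fromInclusive, toExclusive);
    }
}
```

Two rows in the same family may hold different columns, which is the semi-structured part, and the sorted row keys give the "naturally indexed" access. Nothing in the model links one row to another, which is why the slide calls it unsuited for interconnected data.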
  • 17. Document Databases • Data model – Collections of documents – A document is a key-value collection – Index-centric, lots of map-reduce • Examples – CouchDB, MongoDB
  • 18. Pros and Cons • Strengths: – Simple, powerful data model (just like SVN!) – Good scaling (especially if sharding supported) • Weaknesses: – Unsuited for interconnected data – Query model limited to keys (and indexes) • Map-reduce for larger queries
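The "query model limited to keys (and indexes)" point can be illustrated with a toy Java collection: documents are key-value maps stored by id, one field has a secondary index, and anything else needs a full scan. `DocumentCollection` and its methods are hypothetical, not the MongoDB or CouchDB driver API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy document collection with one secondary index (illustration only).
final class DocumentCollection {
    private final Map<String, Map<String, Object>> docsById = new HashMap<>();
    private final Map<Object, Set<String>> index = new HashMap<>();  // field value -> document ids
    private final String indexedField;

    DocumentCollection(String indexedField) {
        this.indexedField = indexedField;
    }

    void put(String id, Map<String, Object> doc) {
        Map<String, Object> old = docsById.put(id, doc);
        if (old != null && old.get(indexedField) != null) {
            index.get(old.get(indexedField)).remove(id);  // keep the index in sync on overwrite
        }
        Object value = doc.get(indexedField);
        if (value != null) {
            index.computeIfAbsent(value, v -> new HashSet<>()).add(id);
        }
    }

    Map<String, Object> get(String id) {
        return docsById.get(id);
    }

    // Cheap: answered from the index.
    Set<String> findByIndexedField(Object value) {
        return index.getOrDefault(value, Collections.emptySet());
    }

    // Any other field means touching every document (a real store would use map-reduce or a new index).
    List<String> scan(String field, Object value) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : docsById.entrySet()) {
            if (value.equals(e.getValue().get(field))) {
                ids.add(e.getKey());
            }
        }
        Collections.sort(ids);
        return ids;
    }
}
```

Each document is self-contained; there is no first-class way to say that one document points at another, so connected questions fall back to scans or batch jobs.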
  • 19. Graph Databases • Data model: – Nodes with properties – Named relationships with properties – Hypergraph, sometimes • Examples: – Neo4j (of course), Sones GraphDB, OrientDB, InfiniteGraph, AllegroGraph
  • 20. Pros and Cons • Strengths: – Powerful data model – Fast • For connected data, can be many orders of magnitude faster than RDBMS • Weaknesses: – Sharding • Though they can scale reasonably well • And for some domains you can shard too!
  • 21. Social Network "path exists" Performance • Experiment: • ~1k persons • Average 50 friends per person • pathExists(a,b) limited to depth 4 • Caches warm to eliminate disk IO

      Database              # persons   Query time
      Relational database   1000        2000 ms
      Neo4j                 1000        2 ms
      Neo4j                 1000000     2 ms
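The experiment's `pathExists(a,b)` is not shown on the slide. One way to implement it, assuming an in-memory adjacency list and a breadth-first search capped at `maxDepth` hops, is sketched next; the class and method signature are hypothetical. Its cost depends on how much of the neighbourhood around `a` it explores, not on the total number of people, which is the intuition behind the flat 2 ms for both 1,000 and 1,000,000 persons; in a relational schema each extra hop is typically another self-join on the friendship table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Depth-limited breadth-first search (hypothetical helper, not Neo4j code).
final class PathExists {
    // friends: person id -> ids of their friends (store each friendship in both directions)
    static boolean pathExists(Map<Integer, List<Integer>> friends, int a, int b, int maxDepth) {
        if (a == b) {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add(a);
        List<Integer> frontier = new ArrayList<>();
        frontier.add(a);
        // Expand one ring of friends per iteration, at most maxDepth times.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<Integer> next = new ArrayList<>();
            for (int person : frontier) {
                for (int friend : friends.getOrDefault(person, Collections.emptyList())) {
                    if (friend == b) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }
}
```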
  • 23. What are graphs good for? • Recommendations • Business intelligence • Social computing • Geospatial • MDM • Systems management • Web of things • Genealogy • Time series data • Product catalogue • Web analytics • Scientific computing (especially bioinformatics) • Indexing your slow RDBMS • And much more!
  • 24. Neo4j is a Graph Database. So we need to detour through a little graph theory.
  • 26. Meet Leonhard Euler • Swiss mathematician • Inventor of Graph Theory (1736) (image: http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg)
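Euler's 1736 starting point was the Seven Bridges of Königsberg: four land masses joined by seven bridges, and the question whether one walk can cross every bridge exactly once. He showed that such a walk requires zero or two land masses with an odd number of bridges; for a connected graph the condition is also sufficient (proved later, by Hierholzer). A small Java check of the degree counts, with land-mass labels I, N, S and E chosen here purely for illustration:

```java
import java.util.Map;
import java.util.TreeMap;

final class Koenigsberg {
    // Count how many bridge ends touch each land mass (its degree).
    static Map<String, Integer> degrees(String[][] bridges) {
        Map<String, Integer> degree = new TreeMap<>();
        for (String[] bridge : bridges) {
            degree.merge(bridge[0], 1, Integer::sum);
            degree.merge(bridge[1], 1, Integer::sum);
        }
        return degree;
    }

    // Euler's condition for a connected multigraph: 0 or 2 vertices of odd degree.
    static boolean hasEulerWalk(String[][] bridges) {
        long odd = degrees(bridges).values().stream().filter(d -> d % 2 == 1).count();
        return odd == 0 || odd == 2;
    }

    public static void main(String[] args) {
        // I = Kneiphof island, N/S = north/south banks, E = eastern land mass
        String[][] bridges = {
            {"I", "N"}, {"I", "N"}, {"I", "S"}, {"I", "S"},
            {"I", "E"}, {"N", "E"}, {"S", "E"}
        };
        System.out.println(degrees(bridges));      // {E=3, I=5, N=3, S=3}
        System.out.println(hasEulerWalk(bridges)); // false
    }
}
```

All four land masses have odd degree, so no such walk exists; the answer depends only on how the land masses are connected, not on distances or geography, which is the abstraction graph theory starts from.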
  • 28. Property Graph Model • nodes / vertices • relationships / edges • properties (diagram: nodes name: Michal Bachman; title: Intro to Neo4j, duration: 45; name: Neo4j; name: NOSQL, linked by relationships)
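The property graph model can be written down in a few lines of plain Java. This is an illustrative sketch, not Neo4j's implementation (its real Core API appears on the following slides); `PropertyGraph`, `createNode` and `relate` are names made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal in-memory property graph (illustration only).
final class PropertyGraph {
    static final class Node {
        final long id;
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();
        final List<Relationship> incoming = new ArrayList<>();

        Node(long id) {
            this.id = id;
        }
    }

    // A relationship is named (typed), directed, always has a start and an end node, and may carry properties.
    static final class Relationship {
        final String type;
        final Node start;
        final Node end;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(String type, Node start, Node end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    private long nextId = 0;

    // Properties are passed as alternating key, value arguments.
    Node createNode(Object... keyValues) {
        Node node = new Node(nextId++);
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            node.properties.put((String) keyValues[i], keyValues[i + 1]);
        }
        return node;
    }

    Relationship relate(Node start, String type, Node end) {
        Relationship rel = new Relationship(type, start, end);
        start.outgoing.add(rel);
        end.incoming.add(rel);
        return rel;
    }
}
```

Storing each relationship on both of its nodes means a traversal follows references from node to node, in either direction, instead of computing joins.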
  • 29. Graphs are very whiteboard-friendly
  • 31. Neo4j
  • 32. 32 billion nodes, 32 billion relationships, 64 billion properties
  • 38. Community, Advanced, Enterprise
  • 39. How do I use it?
  • 40. Getting started is easy • Single package download, includes the server – http://neo4j.org/download/ • For developer convenience, Ivy (or whatever): – <dependency org="org.neo4j" name="neo4j-community" rev="1.9.M04"/>
  • 41. Run it! • Server is easy to start and stop – cd <install directory> – bin/neo4j start – bin/neo4j stop • Provides a REST API in addition to the other APIs we’ve seen • Provides some ops support – JMX, data browser, graph visualisation
  • 42. Embed it! • If you want to host the database in your process, just load the jars • And point the config at the right place on disk • Embedded databases can be HA too – You don’t have to run as a server
  • 43. (diagram: the WebExpo example graph, with people (Phil Johnson, Michal Bachman, Martin Macke, Jeremy White), talks (title: Cognitive Psychology, duration: 30; title: Intro to Neo4j, duration: 45) and topics (UX, Neo4j, NOSQL), connected by relationships such as INTERESTED)
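  • 44. Creating a small graph in embedded mode (Neo4j 1.9-era API):

      import org.neo4j.graphdb.DynamicRelationshipType;
      import org.neo4j.graphdb.GraphDatabaseService;
      import org.neo4j.graphdb.Node;
      import org.neo4j.graphdb.Relationship;
      import org.neo4j.graphdb.Transaction;
      import org.neo4j.kernel.EmbeddedGraphDatabase;

      public class CreateWebExpoGraph {
          public static void main(String[] args) {
              GraphDatabaseService neo = new EmbeddedGraphDatabase("/data/webexpo");
              Transaction tx = neo.beginTx();
              try {
                  Node speaker = neo.createNode();
                  speaker.setProperty("name", "Michal Bachman");

                  Node talk = neo.createNode();
                  talk.setProperty("title", "Intro to Neo4j");

                  Relationship delivers = speaker.createRelationshipTo(talk,
                          DynamicRelationshipType.withName("DELIVERS"));
                  delivers.setProperty("day", "Saturday");

                  // add the speaker to the legacy "people" index so it can be looked up by name
                  neo.index().forNodes("people").add(speaker, "name", "Michal Bachman");

                  tx.success(); // mark for commit; without it, finish() rolls the transaction back
              } finally {
                  tx.finish();  // commits or rolls back, then ends the transaction
              }
              neo.shutdown();
          }
      }

    Resulting graph: (name: Michal Bachman)-[DELIVERS {day: Saturday}]->(title: Intro to Neo4j)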
  • 47. Core API • Nodes – Properties (optional K-V pairs) • Relationships – Start node (required) – End node (required) – Properties (optional K-V pairs)
  • 49. (diagram: the WebExpo example graph, as on slide 43)
  • 50. All Conference Topics

      // node 0, the reference node, represents WebExpo in this example graph
      Node webExpo = neo.getReferenceNode();
      for (Relationship talksAt : webExpo.getRelationships(INCOMING, TALKS_AT)) {
          Node speaker = talksAt.getStartNode();
          for (Relationship delivers : speaker.getRelationships(OUTGOING, DELIVERS)) {
              Node talk = delivers.getEndNode();
              for (Relationship about : talk.getRelationships(OUTGOING, ABOUT)) {
                  String topicName = (String) about.getEndNode().getProperty(NAME);
                  //add to result...
              }
          }
      }

    Output:

      Printing all topics
      All topics: development, data, advertising, education, usa, business, microsoft, webdesign, software, responsiveness, ux, e-commerce, php, psychology, crm, api, chef, javascript, patterns, product design, marketing, metro, social media, web, startup, analytics, lean, cqrs, node.js, branding, cloud, testing, neo4j, rest, css, design, publishing, nosql.
      Took: 2 ms
  • 51. Which talks should I attend?
  • 52. (diagram: the WebExpo example graph, as on slide 43)
  • 53. Which talks should I attend?

      // attendee -INTERESTED-> topic <-ABOUT- talk, so the suggested talks sit at depth 2;
      // with Uniqueness.NONE a talk matching several of the attendee's topics may be returned more than once
      TraversalDescription talksTraversal = Traversal.description()
              .uniqueness(Uniqueness.NONE)
              .breadthFirst()
              .relationships(INTERESTED, OUTGOING)
              .relationships(ABOUT, INCOMING)
              .evaluator(Evaluators.atDepth(2));

      Node attendee = neo.index().forNodes("people").get("name", "Jeremy White").getSingle();
      Iterable<Node> talks = talksTraversal.traverse(attendee).nodes();
      //iterate over talks and print

    Output:

      Suggesting talks for 100 random attendees.
      ...
      Aneta Lebedova: Measure Everything!, To the USA, The real me. Took: 1 ms
      Bohumir Kubat: Beyond the polar bear, How (not) to do API, Critical interface design. Took: 1 ms
      Vladimir Vales: Application Development for Windows 8 Metro. Took: 1 ms
      Suggested talks for 100 random attendees in 449 ms
  • 54. What do we have in common?
  • 55. (diagram: the WebExpo example graph, as on slide 43)
  • 56. What do we have in common?

      //retrieve attendeeOne and attendeeTwo from index
      int maxDepth = 2;
      Iterable<Path> paths = GraphAlgoFactory
              .allPaths(Traversal.expanderForAllTypes(), maxDepth)
              .findAllPaths(attendeeOne, attendeeTwo);
      for (Path path : paths) {
          //print it
      }

    Output:

      Finding things in common for 100 random couples of attendees
      ...
      Karel Kunc and Phil Smith:
      (Karel Kunc)--[INTERESTED]-->(ux)<--[INTERESTED]--(Phil Smith),
      (Karel Kunc)--[DISLIKED]-->(Be a punk consumer!)<--[DISLIKED]--(Phil Smith),
      (Karel Kunc)--[DISLIKED]-->(Beyond the polar bear)<--[LIKED]--(Phil Smith),
      (Karel Kunc)--[LIKED]-->(Shipito.com – business in USA)<--[LIKED]--(Phil Smith).
      Took: 0 ms.
      ...
      Found things in common for 100 random couples of attendees in 142 ms.
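For readers without Neo4j at hand, the idea behind `findAllPaths` can be sketched in plain Java as a depth-first search that enumerates every path of at most `maxDepth` relationships between two nodes, walking relationships in either direction and printing them in the slide's notation. This sketch returns only simple paths (no node visited twice) and is not Neo4j's implementation, whose uniqueness rules may differ; `CommonGround` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Enumerates simple paths of bounded length between two nodes (illustration only).
final class CommonGround {
    // node name -> relationships touching it; each relationship is {start, type, end}
    private final Map<String, List<String[]>> rels = new HashMap<>();

    void relate(String start, String type, String end) {
        String[] rel = {start, type, end};
        rels.computeIfAbsent(start, k -> new ArrayList<>()).add(rel);
        rels.computeIfAbsent(end, k -> new ArrayList<>()).add(rel);
    }

    List<String> allPaths(String from, String to, int maxDepth) {
        List<String> found = new ArrayList<>();
        Set<String> onPath = new HashSet<>();
        onPath.add(from);
        walk(from, to, maxDepth, "(" + from + ")", onPath, found);
        return found;
    }

    private void walk(String node, String to, int hopsLeft, String path,
                      Set<String> onPath, List<String> found) {
        if (node.equals(to)) {
            found.add(path);
            return;
        }
        if (hopsLeft == 0) {
            return;
        }
        for (String[] rel : rels.getOrDefault(node, Collections.emptyList())) {
            boolean outgoing = rel[0].equals(node);
            String next = outgoing ? rel[2] : rel[0];
            if (!onPath.add(next)) {
                continue;  // node already on this path: keep paths simple
            }
            String step = outgoing ? "--[" + rel[1] + "]-->" : "<--[" + rel[1] + "]--";
            walk(next, to, hopsLeft - 1, path + step + "(" + next + ")", onPath, found);
            onPath.remove(next);  // backtrack
        }
    }
}
```

With `maxDepth = 2`, every hit has the shape person, shared thing, person, which is exactly what the output above lists.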
  • 57. Youngsters, Y U No Like Java?
  • 58. Who is my beer mate? (diagram: myself, beerMate:?, talk:?)
  • 59. Who is my beer mate? (diagram: (myself) (beerMate) (talk))
  • 60. Who is my beer mate?

      start myself=node:people(name = "Emil Votruba")
      match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)
      return distinct beerMate.name, count(beerMate)
      order by count(beerMate) desc
      limit 5;

  • 61. Cypher Query

      start myself=node:people(name = "Alex Smart")
      match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)
      return distinct beerMate.name, count(beerMate)
      order by count(beerMate) desc
      limit 5;

  • 62. Cypher Query

      start myself=node:people(name = "Emil Votruba")
      match (myself)-[:LIKED]->()<-[:LIKED]-(beerMate)
      return distinct beerMate.name, count(beerMate)
      order by count(beerMate) desc
      limit 5;
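Spelled out imperatively, the beer-mate query counts, for every other person, how many of "my" liked talks they also liked, and keeps the five highest counts. A plain-Java sketch over a hypothetical in-memory map from person to liked talks (not Neo4j's Core API; `BeerMates` and `topBeerMates` are made-up names):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class BeerMates {
    // likedTalks: person -> titles of the talks they LIKED
    static List<Map.Entry<String, Integer>> topBeerMates(
            Map<String, Set<String>> likedTalks, String myself, int limit) {
        Set<String> mine = likedTalks.getOrDefault(myself, Collections.emptySet());
        Map<String, Integer> shared = new HashMap<>();
        for (Map.Entry<String, Set<String>> person : likedTalks.entrySet()) {
            if (person.getKey().equals(myself)) {
                continue;  // skip myself
            }
            for (String talk : person.getValue()) {
                if (mine.contains(talk)) {
                    shared.merge(person.getKey(), 1, Integer::sum);  // count(beerMate)
                }
            }
        }
        return shared.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()  // order by ... desc
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)                                                      // limit 5
                .collect(Collectors.toList());
    }
}
```

Ties are broken alphabetically here; the Cypher query leaves the order of equal counts unspecified. The declarative version is a handful of lines, which is the point these slides make about Cypher.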
  • 63. Who is my beer mate?
  • 64. Current Research • Graph partitioning • Graph analytics (“OLAP” and predictive) • Performance improvements • Query languages • MVCC and single-threaded write models • ACID (tradeoffs for weakening C and I) • Yield and Harvest in distributed systems • Application-level – Recommendations – Protein interactions – …
  • 65. Questions? Neo4j: http://neo4j.org Neo Technology: http://neotechnology.com Twitter: @bachmanm Code: git://github.com/bachmanm/neo4j-imperial.git

Editor's Notes

  1. Welcome. Introduce myself, NeoTech. Motivations: presented this at a conference; conversations with friends; talked to Serena, no affiliation; BigData and NOSQL are popular terms; graphs are getting more and more popular (Facebook); not much attention at Imperial. Ask the audience: heard about graph databases? Graphs? Databases? Outcomes: learn about a new technology; see graph theory applied in practice; tailored to students (not industry). Agenda: Intro to NOSQL; Intro to Graph Databases; Intro to Neo4j; practical part (how to work with one); real experiences; current research; Q&A.
  2. Why now? We didn't wake up one day thinking relational DBs aren't cool any more; it is driven by trends.
  3. Generate, process, store and work with
  4. UGC = User Generated Content. GGG = Giant Global Graph (what the web will become): every bit, every unit of interesting data is semantically connected to every other interesting unit of data (Tim Berners-Lee). Data is more connected (linearly). RDFa (Resource Description Framework in attributes) is a technology for carrying structured information inside web pages; it is one way of writing (serialising) the Resource Description Framework (RDF) data format. In computer science, an ontology is an explicit, formalised description of a domain: a formal, declarative representation containing a glossary (definitions of terms) and a thesaurus (definitions of the relationships between terms). An ontology is a vocabulary for storing and sharing knowledge about a domain.
  5. Data is losing its predictable structure. Individualisation of data: you can't box each individual, and people want data about "me". The shape of data is less predictable. Decentralisation of data creation accelerates this trend.
  6. Applications can choose whatever makes sense for storing their data.
  7. This is strictly about connected data: joins kill performance there. No bashing of RDBMS performance for tabular transaction processing.
  8. The beauty of the NOSQL world: nobody dictates to you; you choose the database that fits the type and characteristics of the data you work with. Key-value databases: one key, one value, hash maps; Redis, Riak (modelled on Amazon's Dynamo); usually highly fault-tolerant; simple data model; excellent horizontal scalability; availability. BigTable databases: a k-vvvvvvv store (one key, many values) with implicit indexes, e.g. Cassandra (the model comes from Google's BigTable); support for semi-structured data; automatic index (columns); good horizontal scalability; again unsuited to connected data. Document databases (a well-known example is Subversion), MongoDB, CouchDB, …: a collection of documents; a document is a collection of key-value pairs; the index is important; lots of map-reduce; scalability fairly good (not as good as key-value, because of the more complex data model); a simple yet powerful data model, like Subversion. The drawback of all three is that they are not really suited to densely connected data. A too-simple data model (a HashMap: fast, but…) means that any immediate deeper insight into the stored data has to be the responsibility of the application tier (i.e. we have to program it ourselves). That is why these databases are very often paired with frameworks like Map-Reduce, for which we have to write jobs that give us that insight. Map-reduce is a batch operation (I'd contrast it with an online, in-the-click-stream synchronous operation) for getting a view of your connected data. All three work with aggregated data, i.e. they require the structure up front: data that logically belongs together (like an order and its line items) is stored together in the database, and queries access it as a whole. In key-value stores that whole is the value, in column-family stores the column family, and in document databases the document. That is fine when data access needs exactly this structure. But if we want to look at the data differently, for example to analyse total sales per product from the orders, we have to fight that structure a bit, and that is why map-reduce comes up so often in connection with these databases.
The advantage of storing data in non-aggregated form is that it can be analysed and presented from different angles, depending on the case at hand. And, of course, graph databases, which are why we are here today and which we'll learn more about in a minute.
  9. History: Amazon decided that they always wanted the shopping basket to be available, but couldn't take a chance on an RDBMS. So they built their own. Big risk, but a simple data model and well-known computer science underpinning it (e.g. consistent hashing, Bloom filters for sensible replication). + Massive read/write scale. - Simplistic data model moves heavy lifting into the app tier (e.g. map reduce).
  10. MongoDB has a reputation for taking liberties with durability to get speed. CouchDB has good multi-master replication, with roots in Lotus Notes.
  11. People talk about Codd's relational model being mature because it was proposed in 1969 – 42 years old. Euler's graph theory was proposed in 1736 – 275 years old.
  12. Can't easily shard graphs like documents or KV stores. This means that high-performance graph databases are limited in the data set size a single machine can handle. Replicas can speed things up (and improve availability), but the data set size is still limited to a single machine's disk/memory. Some domains can shard easily (e.g. geo, most web apps) using a consistent routing approach and cache sharding; we'll cover that later.
  13. Graph theory studies the properties of structures called graphs. They consist of vertices connected to one another by edges, and are usually drawn as a set of points joined by lines. Formally, a graph is an ordered pair G = (V, E) of a set of vertices V and a set of edges E.
  14. The Seven Bridges of Königsberg (today Kaliningrad). For those of you who work for a big company, by which I mean several layers of management and a software architect on a different floor from the developers, this is for you: in such companies it is often hard to push through "new" technologies. But the relational model, which E.F. Codd came up with in 1969, is only 43 years old. The graph model is 276 years old. So next time your boss or a clever architect answers a proposal to adopt NOSQL with something like "we only use mature, proven technologies here", you know where to send them… by which I mean, for example, this talk online or the relevant Wikipedia pages. So how do we store data in a graph…
  15. So how do we store data in a graph… In a graph we store data as vertices, and vertices are really documents that can have arbitrary keys with values assigned to them, just like a document in MongoDB. What distinguishes a graph from MongoDB is that a graph has relationships between vertices. That is the trade-off: MongoDB scales better because it doesn't do this; Neo4j is better for connected data because it does. It stores the relationships between individual vertices, but it doesn't scale as well. We have to take that into account when solving your problems: do you want massive scalability, or immediate insight into how your data is connected? DESCRIBE THE GRAPH. Relationships carry semantic meaning! Speakers and talks in an RDBMS… It is a fairly intuitive way of storing data! The job of a graph database is to take this intuitive data, which we can easily sketch on a whiteboard or a piece of paper, and traverse it quickly in your programs.
  16. And that is one nice property of graphs: they are ideal for whiteboards, the backs of envelopes, beer mats and cigarette packets… which is where the best designs (especially in startups) usually come from. I chose WebExpo as the example; originally I wanted to map the corruption scandals of Czech politicians, but this is a bit more harmless. We can draw the relationships between speakers, talks, topics, attendees and so on on a beer mat! WebExpo is a domain with lots of relationships: speakers have talks, … You can easily draw that on a whiteboard, which, by the way, is what we do as programmers when we sit with people who need a piece of software and we try to understand their business problem, their domain. We sit down at a whiteboard and draw customers, orders, invoices, products and so on, and the relationships between them! And what do we do then? We take our nice design and normalise it. We sweat over how to cram it all into tables. And we are happy and smiling until we put it live, into production… and it runs like a tortoise… What do we do? We denormalise our model! All the energy we invested, blood, sweat and tears, all wasted. With a graph database, what is on paper is exactly what you put into the database.
  17. That doesn't mean you are excused from the design phase. You still have to think hard about which entities (or objects) make up your domain and what the relationships between them are! You still need a design. You can't simply take the data from the tables you have and force it into your brand-new graph database. You have to start thinking in nodes and relationships. When designing the data model for WebExpo we had to make a lot of design decisions: how do we tell speakers from attendees? Do we even need to? Should Friday and Saturday be nodes, or just a property on each talk? You still have to do design, but the point is that designing a data model for a graph database can be a pleasant and natural experience.
  18. It takes care of nodes, the relationships between them, and indexes for you. Neo4j is stable and has been running since 2003. It is under active development. Primarily for Java, but usable with many other technologies. Ideal at the scale of tens of servers in a cluster, not hundreds. For densely connected data; it is not a KV store.
  19. 32 billion nodes, 32 billion relationships, 64 billion properties
  20. Fully and militantly ACID. Who doesn't know what that means? Explain quickly: atomicity, consistency, isolation, durability. Some other NOSQL databases give up some of these guarantees in favour of performance; in Neo4j this cannot be switched off. Data is always written to disk.
  21. Look up the starting point in the index (Lucene). Explore the neighbourhood.
  23. Neo has a whole built-in library of graph algorithms, such as shortest path, all paths, etc.
  24. 1M hops per second on an ordinary laptop, with no difference when the amount of data is multiplied. High-performance graph operations: traverses 1,000,000+ relationships/second on commodity hardware.
  25. In general, if you use MySQL and don't pay for it, you won't pay for Neo either.
  26. Let's look at using it in embedded mode with a concrete example. I built a graph from WebExpo: the speakers and talks are real, and the 1000 attendees have randomly generated names. Describe the graph and the scenario. Who doesn't read Java? The code will be on GitHub.
  27. Relationship types can be either strings or an Enum, which gives you the advantage of static typing in the IDE; to Neo4j it makes no difference. We repeat the procedure until we have the whole graph.
  28. This is a screenshot of the web console, where we can browse the graph visually. It runs on my laptop; I'll give you the URL at the end so you can play with it. So we have a graph, but how do we now get data out of it?
  29. There are several ways to write queries in Neo4j; they differ in readability, complexity, performance and level of abstraction. I'll show you some of them, starting from the bottom, i.e. from the native, fastest API.
  30. The Core API works directly with the units we stored in the database: vertices, edges and their properties.
  31. Let's look at the whole graph once more. A new graph always has one node with ID 0; we turned that into WebExpo.
  32. This is an imperative API: the programmer does all the work, and it is the most performant.
  33. Let's go one level up in abstraction, to the so-called Traversal API, which lets us write queries declaratively, i.e. describe how we want to traverse the graph. Neo4j does the actual traversing for us.
  34. We can write our own evaluators.
  35. Another nice feature is the library of algorithms for finding paths between two nodes.
  36. Also shortest path, Dijkstra and others.
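For reference, a textbook Dijkstra shortest-path computation in plain Java, assuming a weighted adjacency map with non-negative weights. It illustrates the algorithm the note mentions; it is not the implementation inside Neo4j's graph-algorithm library, and the class name `Dijkstra` is hypothetical.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

final class Dijkstra {
    // graph: node -> (neighbour -> non-negative edge weight); returns shortest distances from source
    static Map<String, Double> distancesFrom(Map<String, Map<String, Double>> graph, String source) {
        Map<String, Double> dist = new HashMap<>();
        PriorityQueue<Map.Entry<String, Double>> queue =
                new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        dist.put(source, 0.0);
        queue.add(new AbstractMap.SimpleEntry<>(source, 0.0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Double> current = queue.poll();
            String node = current.getKey();
            double d = current.getValue();
            if (d > dist.get(node)) {
                continue;  // stale queue entry: a shorter path was already settled
            }
            for (Map.Entry<String, Double> edge : graph.getOrDefault(node, Collections.emptyMap()).entrySet()) {
                double candidate = d + edge.getValue();
                Double known = dist.get(edge.getKey());
                if (known == null || candidate < known) {
                    dist.put(edge.getKey(), candidate);  // relax the edge
                    queue.add(new AbstractMap.SimpleEntry<>(edge.getKey(), candidate));
                }
            }
        }
        return dist;
    }
}
```

Nodes that cannot be reached from the source simply do not appear in the returned map.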
  37. Hard for non-programmers; let's look at something simpler.
  38. At the highest level of abstraction, Neo4j provides its own query language, partly inspired by SQL. The language is called Cypher, and it understands human-readable statements like the one you see here.
  39. We have to start somewhere, so we use the index called people, where we find Mr Emil Votruba by name. Next we have to specify what data we actually want, in this case the person's name and a score of how many things we have in common. Finally, we probably don't want to go for a beer with absolutely everyone, just with, say, the 5 people we have the most in common with. You can probably see the influence of SQL. ----- Meeting Notes (09/09/2012 20:18) ----- animation
  42. And the result for Mr Votruba.
  43. “Tales from the Trenches” for further tips