Enterprise Ready: Neo4j in Production
#1 Database for Connected Data
Jeff Morris
Head of Product Marketing
June 2017
Who We Are: The Graph Database for Connected Data
Neo4j is an enterprise-grade native graph database that enables you to:
• Store and query data relationships
• Traverse any level of depth in real time
• Add and connect new data on the fly
Designed, built and tested natively
for graphs from the start to ensure:
• Performance
• ACID Transactions
• Agility
• Developer Productivity
• Hardware Efficiency
Graph is Top Trending Database Type
Neo4j Enterprise Metadata &
Content Management Use Cases
Sample of Connected Graphs
Organization Identity & Access Network & IT Ops
Background
• SF-based C2C rental platform
• Dataportal democratizes data access for
growing number of employees while improving
discoverability and trust
• Data strewn everywhere—in silos, in segmented
departments, nothing was universally accessible
Business Problem
• Data-driven culture hampered by variety and
dependability of data, tribal knowledge and
word-of-mouth distribution
• Needed visibility into information usage, context,
lineage and popularity across company of 3,000+
Solution and Benefits
• Offers search with context & metadata, user &
team-centric pages for origin & lineage
• Nodes are resources: data tables, dashboards,
reports, users, teams, business outcomes, etc.
• Relationships reflect consumption, production,
association, etc.
• Neo4j, Elasticsearch, Python
Airbnb Dataportal TRAVEL TECHNOLOGY
Knowledge Graph, Metadata Management
CE users since 2017
Background
• Large global bank
• Deploying Reference Data to users and systems
• 12 data domains, 18 datasets, 400+ integrations
• Complex data management infrastructure
Business Problem
• Master data silos were inflexible and hard to consume
• Needed simplification to reduce redundancy
• Reduce risk when data is in consumers’ hands
• Dramatically improve efficiency
Solution and Benefits
• Data distribution flows improved dramatically
• Knowledge Base improves consumer access
• Ad-hoc analytics improved
• Governance, lineage and trust improved
• Better service level from IT to data consumers
UBS FINANCIAL SERVICES
Master Data Management / Metadata
CE Customer since 2016 Q1
EE Customer since 2015
Background
• 5-year-long drug discovery research
• Parse and navigate over 25 million scientific papers
• Sourced from National Library of Research and
tagging of “Medical Subject Headings” (MeSH tags)
Business Problem
• Seeking to automate phenotype, compound and
protein cell behavior research by using previously
documented research more effectively
• Text mining for research elements like DNA strings,
proteins, RNA, chemicals and diseases
Solution and Benefits
• Found ways to identify compound interaction
behavior from millions of research documents
• Relations between biological entities can be
identified and validated by biologic experts
• Still very challenging to keep up-to-date, add
genomics data, and find a breakthrough
Novartis PHARMACEUTICAL RESEARCH
Content Management / Biomedical Research
CE Customer since 2016 Q1
CE Customer since 2012
Background
• How Neo4j is used in investigations
• Non-technical reporters manually gather data
• “Low-tech” data curation
• Journalists want to model data as a story, not
as data
Business Problem
• Identify repeated business relationships among
individuals and their holdings and accounts
• Scan documents and identify possible entities,
then create relationships between people and documents
• Names and alias variances
Solution and Benefits
• Uses Neo4j in “story discovery” phase
• Uncovers shortest paths for leads for reporters
• Many investigations underway now
Columbia University EDUCATION
Investigative Journalism / Fraud Detection
CE Customer since 2016 Q1
EE Customer since 2015 Q4
Background
• eBay Israel, Entity Management Platform
• Taxonomy for hundreds of thousands of entities like
categories, products, sellers, sales, buyers, stores, etc.
• Entities have permanent “souls” and “states” in
which they exist throughout their lifecycle
• All to make editing product items easy and fast
Business Problem
• Users demand highly interactive, isolated
workspaces with inheritance for building pages
• Support versioning of entities so that users can
easily make changes while preserving the history of
their previous states
Solution and Benefits
• Chose Neo4j for performance, flexibility,
developer productivity
• Easy to learn
• Flexible way to represent how data entities
change throughout their lifecycle
• Patent pending
eBay Israel ONLINE RETAILER
Master Data Management / Metadata
CE Customer since 2016 Q1
EE Customer since 2015 Q4
Background
• French Telecom
• Big Data Governance in support of GDPR
• Environment with Hadoop, Analytics,
Recommendation engines, etc.
Business Problem
• Manage people, roles & rights, flow, audit, log
management, processes, policies, lineage,
metadata, lifecycles, security, etc…
• All because GDPR arrives in May 2018
Solution and Benefits
• Governance system oversees all systems
• Enforces correct policies
• Allows flexibility beyond Hadoop
• Architect has written Neo4j French manual
ORANGE TELECOMMUNICATIONS
Master Data Management / Metadata
CE Customer since 2016 Q1
EE Customer since 2015
Neo4j in the Enterprise
Native Graph Differentiation
Graph Overview
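Figure: example labeled property graph
• PERSON node: name: “Dan”, born: May 29, 1970, twitter: “@dan”
• PERSON node: name: “Ann”, born: Dec 5, 1975
• CAR node: brand: “Volvo”, model: “V70”
• Relationship types shown: MARRIED TO, LIVES WITH; one relationship carries since: Jan 10, 2011
Neo4j Invented the Labeled Property Graph Model
Nodes
• Can have name-value properties
• Can have Labels to classify nodes
Relationships
• Relate nodes by type and direction
• Can have name-value properties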
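The model above can be sketched in a few lines of plain Python. This is only an illustration of the concepts (labels, properties, typed and directed relationships), not how Neo4j stores data. The class names, the `neighbours` helper and the DRIVES relationship type are assumptions made for the example; the slide does not say which relationship carries the `since` property.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                       # labels classify a node, e.g. {"Person"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                       # direction runs from start to end
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "MARRIED_TO", ann),
    Relationship(dan, "LIVES_WITH", ann),
    # Assumption: DRIVES is a hypothetical type chosen to carry the "since" property.
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
]


def neighbours(node, rel_type):
    """Return the end nodes of outgoing relationships of the given type."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


spouse_names = [n.properties["name"] for n in neighbours(dan, "MARRIED_TO")]
```

Here `spouse_names` holds the names reached from Dan over MARRIED_TO relationships, which is the same question the Cypher pattern on the next slide asks.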
Neo4j Advantage - Agility
Cypher: Powerful and Expressive Query Language
MATCH (:Person {name: "Dan"})-[:MARRIED_TO]->(spouse)
(Diagram: Dan -[:MARRIED_TO]-> Ann, annotated with NODE, RELATIONSHIP TYPE, LABEL, PROPERTY and VARIABLE)
Neo4j Advantage – Developer productivity
Example HR Query in SQL
The Same Query using Cypher
MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate,
       count(report) AS Total
Project Impact
Less time writing queries
• More time understanding the answers
• Leaving time to ask the next question
Less time debugging queries:
• More time writing the next piece of code
• Improved quality of overall code base
Code that’s easier to read:
• Faster ramp-up for new project members
• Improved maintainability & troubleshooting
Productivity Gains with Graph Query Language
The query asks: “Find all direct reports and how many people they manage, up to three levels down”
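To make the question concrete, here is a minimal Python sketch that answers the plain-English version of it over a small in-memory org chart. It is not a translation of the Cypher query and does not use Neo4j. The `MANAGES` dictionary, the employee names other than John Doe, and both function names are made up for illustration, and the sketch assumes a tree-shaped hierarchy (one manager per person, no cycles).

```python
from collections import deque

# Hypothetical org chart: manager -> direct reports.
MANAGES = {
    "John Doe": ["Alice", "Bob"],
    "Alice": ["Carol", "Dave"],
    "Carol": ["Erin"],
    "Bob": [],
}


def reports_within(manager, max_depth):
    """Everyone below `manager`, at most `max_depth` MANAGES hops away."""
    found = []
    queue = deque([(manager, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for report in MANAGES.get(person, []):
            found.append(report)
            queue.append((report, depth + 1))
    return found


def subordinate_totals(boss, max_depth=3):
    """For each direct report of `boss`, count the people they manage."""
    return {
        sub: len(reports_within(sub, max_depth))
        for sub in MANAGES.get(boss, [])
    }


totals = subordinate_totals("John Doe")
```

Even in this toy form the depth limit, the queue and the per-subordinate count all have to be written by hand, which is the bookkeeping a variable-length pattern such as `[:MANAGES*1..3]` expresses declaratively.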
• Open Source (available to anyone): Apache 2.0
• Open Source (as part of Neo4j): GPL v3
• Open Process (open to anyone): CIR, CIP, oCIM
• Formal Standard (standards body): e.g. ANSI, ISO
openCypher: Documentation, TCK, Grammar, Parser
Opening the Language
Vendor Support & Interest
Graph Composition
SQL Integration
SQL Cypher
Next Steps for Cypher
Feb ’17, May ’17, Sep '17
oCIM Summer of Syntax
Twice-Monthly Calls
Cypher Language Group
Cypher Improvement
Requests & Proposals
Evolving the Language Together
Graph Visualizations in Neo4j Browser
One more thing…
RDBMS Vocabulary Mapped to Graph Modeling
Relational DB Construct | Graph DB Construct
Entity table | Node labels
Row | Node
Columns | Node properties
Technical primary keys | Replace with business primary keys
Constraints | Unique constraints for business keys
Indexes | Indexes on any property
Foreign keys | Relationships
Default values | Node keys
De-normalized or duplicated data | Create separate nodes
Join tables | Relationships
Join table columns | Relationship properties
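A few rows of the mapping can be illustrated with a short Python sketch that turns relational rows into nodes and relationships. The tables, column names and the DRIVES relationship type are hypothetical, and plain dictionaries stand in for a real graph database; the point is only to show which relational construct ends up where.

```python
# Hypothetical relational data: two entity tables and one join table.
persons = [{"id": 1, "name": "Dan"}, {"id": 2, "name": "Ann"}]
cars = [{"id": 10, "brand": "Volvo", "model": "V70"}]
person_car = [{"person_id": 1, "car_id": 10, "since": "Jan 10, 2011"}]


def rows_to_nodes(rows, label):
    """Entity table -> label, row -> node, columns -> node properties.

    The technical primary key ("id") is used only to wire things up and is
    not copied into the node's properties.
    """
    return {
        row["id"]: {
            "label": label,
            "properties": {k: v for k, v in row.items() if k != "id"},
        }
        for row in rows
    }


person_nodes = rows_to_nodes(persons, "Person")
car_nodes = rows_to_nodes(cars, "Car")

# Join table row -> relationship; remaining join-table columns -> relationship properties.
relationships = [
    {
        "start": person_nodes[row["person_id"]],
        "type": "DRIVES",  # assumption: a relationship type chosen for the example
        "end": car_nodes[row["car_id"]],
        "properties": {
            k: v for k, v in row.items() if k not in ("person_id", "car_id")
        },
    }
    for row in person_car
]
```

After the conversion the foreign keys are gone: the relationship refers to its start and end nodes directly, and `since` lives on the relationship rather than in a separate table.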
Relational DBMSs Can’t Handle Relationships Well
• Cannot model or store data and relationships
without complexity
• Performance degrades with number and levels
of relationships, and database size
• Query complexity grows with need for JOINs
• Adding new types of data and relationships
requires schema redesign, increasing time to market
… making traditional databases inappropriate
when data relationships are valuable in real time
Slow development
Poor performance
Low scalability
Hard to maintain
Queries can take non-sequential,
arbitrary paths through data
Real-time queries need speed and
consistent response times
Queries must run reliably
with consistent results
A single query can
touch a lot of data
Relationship Queries Strain Traditional Databases
At Write Time:
data is connected
as it is stored
At Read Time:
Lightning-fast retrieval of data and relationships via
pointer chasing
Index-free adjacency
Graph Optimized Memory & Storage
Neo4j: Native Graph from the Start
Native graph storage
Optimized for real-time reads and ACID writes
• Relationships stored as physical objects,
eliminating need for joins and join tables
• Nodes connected at write time, enabling
scale-independent response times
Native graph querying
Memory structures and algorithms optimized for graphs
• Index-free adjacency enables 1M+ hops per second via in-memory pointer chasing
• Off-heap page cache improves operational robustness
and scaling compared with JVM-based caches
• “Minutes to milliseconds” performance improvement
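The idea behind index-free adjacency can be sketched conceptually in Python: each node holds direct references to its neighbours, so a hop is a pointer dereference and not a lookup in a shared edge table. This is a toy illustration, not Neo4j's storage format; the class, the function names and the four-node chain are assumptions made for the example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to neighbouring Node objects


def connect(a, b):
    a.out.append(b)


def hops(start, depth):
    """Pointer chasing: follow stored references `depth` times."""
    frontier = [start]
    for _ in range(depth):
        frontier = [neighbour for node in frontier for neighbour in node.out]
    return [node.name for node in frontier]


# Join-style alternative: every hop has to search a table of all edges.
EDGE_TABLE = [("a", "b"), ("b", "c"), ("c", "d")]


def hops_via_table(start_name, depth):
    frontier = [start_name]
    for _ in range(depth):
        frontier = [dst for name in frontier
                    for (src, dst) in EDGE_TABLE if src == name]
    return frontier


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
connect(a, b)
connect(b, c)
connect(c, d)

via_pointers = hops(a, 3)
via_table = hops_via_table("a", 3)
```

Both functions return the same answer, but the cost of `hops` depends only on the neighbours actually visited, while `hops_via_table` scans the whole edge table on every hop (a real relational engine would use an index, whose lookup cost still grows with table size).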
Neo4j Advantage - Performance
Neo4j Advantage - ACID Transactions
Figure: response time vs. connectedness and size of data set
• Relational and other NoSQL databases: 0 to 2 hops, 0 to 3 degrees, thousands of connections
• Graph: tens to hundreds of hops, thousands of degrees, billions of connections (1000x advantage, “minutes to milliseconds”)
“Minutes to Milliseconds” Real-Time Query Performance
Equivalent Cypher Query
MATCH (you)-[:BOUGHT]->(something)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco)
WHERE id(you)={id}
RETURN reco
Traversal Speeds on Amazon Retail Dataset
Threads | Hops per second
1 | 3-4 million
10 | 17-29 million
20 | 34-50 million
30 | 36-60 million
Social Recommendation Example
Neo4j Advantage - Performance
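The co-purchase pattern in the recommendation query (you bought something, someone else bought it too, and they also bought another item) can be approximated in plain Python as below. The shoppers, items and the `recommend` function are hypothetical, and the sketch makes two choices the Cypher pattern does not state: it skips items you already own and it ranks candidates by the number of connecting paths.

```python
from collections import Counter

# Hypothetical purchase data: shopper -> set of items bought.
BOUGHT = {
    "you": {"book", "lamp"},
    "ana": {"book", "desk"},
    "raj": {"lamp", "desk", "chair"},
    "zoe": {"sofa"},
}


def recommend(you):
    """Rank items bought by people who share at least one purchase with `you`."""
    mine = BOUGHT[you]
    scores = Counter()
    for other, theirs in BOUGHT.items():
        if other == you:
            continue
        shared = mine & theirs
        if not shared:
            continue  # no common purchase, so no path through this shopper
        for reco in theirs - mine:
            scores[reco] += len(shared)  # one path per shared purchase
    return [item for item, _ in scores.most_common()]


recommendations = recommend("you")
```

With this data, "desk" is reached through two shoppers and "chair" through one, so "desk" ranks first; "sofa" is never reached because its only buyer shares no purchase with you.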
Graph databases are designed for data relationships
Discrete Data
Minimally connected data
Fit for Purpose: The Right Architecture for the Right Job
Other NoSQL Relational DBMS Graph DB
Connected Data
Focused on
Data Relationships
Development Benefits
Easy model maintenance
Easy query
Deployment Benefits
Ultra high performance
Minimal resource usage
Graph Database
RDBMS: Tabular
Aggregate Oriented (3): Key-Value, Column-Family,
Document Database
Source: Martin Fowler NoSQL Distilled
Database Management Systems
Five Key Sub-Patterns (incl. SQL)
NoSQL Databases Don’t Handle Relationships
• No data structures to model or store relationships
• No query constructs to support data relationships
• Relating data requires “JOIN logic”
in the application
• No ACID support for transactions
… making NoSQL databases inappropriate when
data relationships are valuable in real-time
queries due to
replicated in-memory
architecture and
index-free adjacency
Slow queries
due to
index lookups +
network hops
Using Graph
Using Other NoSQL to Join Data
Relationship Queries on non-native Graph Architectures
Neo4j Scalability
Dynamic pointer compression
Unlimited-sized graphs with no
performance compromise
Index partitioning
Auto-partitioning of indexes into
2GB partitions
Causal clustering architecture
Enables unlimited read scaling
with ACID writes and a choice
of consistency levels
Multi-Data Center Support
Creates HA, Fault Tolerant Global
Efficient processing
Native graph processing and storage
often requires 10x less hardware
Efficient storage
One-tenth the disk and memory
requirements of certain alternatives
Neo4j Advantage – Scalability
Figure: Neo4j Performance Improvements by Version
Complex Mixed-Workload Throughput for Neo4j 2.2, 2.3, 3.0, 3.1 and 3.2, shown relative to 2.2
Raft-based architecture
• Continuously available
• Consensus commits
• Third-generation cluster architecture
Cluster-aware stack
• Seamless integration among drivers,
Bolt protocol and cluster
• No need for external load balancer
• Stateful, cluster-aware sessions with
encrypted connections
Streamlined development
• Relieves developers from complex infrastructure concerns
• Faster and easier to develop distributed graph applications
Neo4j Enterprise: Causal Clustering Architecture
Modern and Fault-Tolerant to Guarantee Graph Safety
Neo4j Advantage – Scalability
Global Cluster
Geo Aware
Load Balancing
Tiered Replicas
Full-Stack API
Now in Neo4j 3.2: Multi-DC Clustering
How Causally Consistent Reads Work
(Diagram: app servers reading from and writing to the cluster; surviving labels: “1: Read”, “2. Create”, “an order”)
How it Works:
• Application chooses a consistency level
“Read Any” vs “Read your own writes”
• Cluster chooses appropriate members
Default optimizes for scalability
(i.e. read replica server for reads)
Causal Clustering Enables:
• Application-driven SLAs
• Optimizing for freshness vs. cost
• Tunability within an application
On an application & session basis
1: Read any replica | 2: Write [Tx 101] | 3: RYOW*[Tx 101] | 4: Write [Tx 102] | 5: RYOW [Tx 102]
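The sequence above (read any replica, write, then read your own write) can be simulated with a small Python model. This is a simplified sketch and not the Neo4j driver API: the `Cluster` and `Member` classes, the integer transaction ids used as bookmarks, and the fall-back-to-core behaviour are all assumptions chosen to keep the example short. A real cluster member could instead wait until it has caught up to the bookmark.

```python
class Member:
    def __init__(self):
        self.applied_tx = 0  # id of the last transaction applied here
        self.data = {}


class Cluster:
    def __init__(self, replicas=2):
        self.core = Member()
        self.replicas = [Member() for _ in range(replicas)]
        self.log = []  # committed transactions as (tx_id, key, value)

    def write(self, key, value):
        """Commit on the core and return the transaction id as a bookmark."""
        tx_id = len(self.log) + 1
        self.log.append((tx_id, key, value))
        self.core.data[key] = value
        self.core.applied_tx = tx_id
        return tx_id

    def replicate(self, replica):
        """Apply committed transactions the replica has not seen yet."""
        for tx_id, key, value in self.log[replica.applied_tx:]:
            replica.data[key] = value
            replica.applied_tx = tx_id

    def read(self, key, bookmark=None):
        if bookmark is None:
            # "Read any": cheapest option, but the answer may be stale.
            return self.replicas[0].data.get(key)
        # "Read your own writes": only members at or past the bookmark qualify.
        for member in self.replicas:
            if member.applied_tx >= bookmark:
                return member.data.get(key)
        return self.core.data.get(key)  # simplification: fall back to the core


cluster = Cluster()
bookmark = cluster.write("order", "created")
stale_read = cluster.read("order")                    # replica 0 has not caught up
ryow_read = cluster.read("order", bookmark=bookmark)  # guaranteed to see the write
```

In this run `stale_read` is `None` while `ryow_read` returns the new order, which is the trade-off the slide describes: the application decides per read whether freshness is worth the extra routing constraint.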
The Importance of ACID Graph Writes
Graph Transactions Over ACID Consistency: Maintains Integrity Over Time
Graph Transactions Over Eventual Consistency: Becomes Corrupt Over Time
• Ghost vertices
• Stale indexes
• Half-edges
• Uni-directed ghost edges
Summary of Neo4j: Built for the Enterprise
Native Graph Storage
Designed, built, and tested for graphs
Native Graph Query Processing
For real-time, relationship-based apps
Evaluate millions of relationships in a blink
Whiteboard-Friendly Data Modeling
Faster projects compared to RDBMS
Data Integrity and Security
Fully ACID transactions, causal consistency
and enterprise security
Powerful, Expressive Query Language
Improved productivity, with 10x to 100x
less code than SQL
Scalability and High Availability
Architecture provides ideal balance of
performance, availability, scale for graphs
Built-in ETL
Seamless import from other databases
Fits easily into your IT environment, with
drivers and APIs for popular languages
Case Studies for Knowledge Graphs
and Recommendation Engines
Neo4j Case Studies
Networks are Graphs
network topology
Mobile Mobile
Sys Admins
Servers, on-premise virtual machines,
cloud virtual machines, etc.
Network Admins
Switches, Routers, Egress Points
App Admins
E.g. Salesforce, Marketo, SAP, Oracle
Apps, Tableau, SharePoint, DBAs, etc.
Internal Users
HR, Sales, Marketing, Data Analysts,
E-staff etc.
Numerous Customers & Partners
Background
• Large Nordic Telecom Provider
• 1M Broadband routers deployed in Sweden
• Half of subscribers are over 55 years old
• Each household connects 10 devices
• Goal to improve customer experience
Business Problem
• Broadband router enhancement to improve
customer experience
• Context-based in-home services
• How to build smart home platform that allows
vendors to build new “home-centric” apps
Solution and Benefits
• New Features deployed to 1M homes
• API-based platform for easy apps that:
• Automatically assemble Spotify playlists
based on who is in the house
• Notify parents when children get home
• Build smart shopping lists
Smart Home / Internet of Things
EE Customer since 2016 Q4
Background
• Large Public University – “U-Dub”
• IT staff for 80K+ students and employees
• Transforming IT systems from mainframe to cloud
• Providing IT & data warehousing services to 3
campuses, 6 hospitals, and 6,300 EDW users
Business Problem
• Old SharePoint metadata was too complicated
for users, not flexible and not transparent
• $1B project to migrate HR system from
mainframe to Workday needed to be smooth
• Future projects needed repeatable predictability
• Needed new glossary, impact analysis, analytics
Solution and Benefits
• Consulted with NDU peers, built simple model
• Built Visualizer with Elasticsearch, Neo4j & D3.js
• Improved predictability, lineage, and impact
understanding for over 6,300 users
University of Washington EDUCATION & RESEARCH
Metadata Management, IT & Network Operations
CE Customer since 2016 Q1
Background
• Ad-Tech supplier in NYC identifies "intent signals"
• Collects device-born consumer data from mobile,
desktops & tablets
• Contains device and buyer data on more than
90% of American households
• Supersized Graph
Business Problem
• Recognize buyer receptivity to offers near time of
• Device data and consumer behaviors change
• Triangulate who is holding a device, where and
when it happens, to signal active purchase intent,
and create real-time offers to assist user
Solution and Benefits
• 3 Billion nodes, 9 billion relationships
• 1 Billion daily transactions on 3 servers
• Hybrid solution with Neo4j, Hadoop, Spark,
MongoDB and Ruby
• Breakthrough results 60%-250%
higher than industry benchmarks
Social Network, Internet of Things, and Real-Time Buyer Identification
EE Customer since 2014 Q3
Background
• World's largest hospitality / hotel company
• 7th largest web site on internet
• 1.5 M hotel rooms offered online by 2018
• Revenue Management System that allows
property managers to update their pricing rates
Business Problem
• Provide the right room & price at the right time
• Old rate program was inflexible and bogged down
as they increased the pricing options per property
per day
• Lay the path to be an innovator in the future
Solution and Benefits
• 2016-era rate program embeds Neo4j as "cache"
• Created a graph per hotel for 4500 properties in 3
• 1000% increase in volume over 4 years
• 50% decrease in infrastructure costs
• "Use Neo4j Support!"
Pricing Recommendations Engine
EE Customer since 2014 Q2
Background
• Personal shopping assistant
• Converses with buyer via text, picture and voice
to provide real-time recommendations
• Combines AI and natural language understanding
(NLU) in Neo4j Knowledge Graph
• First of many apps in eBay's AI Platform
Business Problem
• Improve personal context in online shopping
• Transform buyer-provided context into ideal
purchase recommendations over social platforms
• "Feels like talking to a friend"
Solution and Benefits
• 3 developers, 8M nodes, 20M relationships
• Needed high-performance traversals to respond
to live customer requests
• Easy to train new algorithms and grow model
• Generating revenue since launch
Knowledge Graph powers Real-Time Recommendations
EE Customer since 2016 Q3
Case Study: Knowledge Graphs at eBay
Men’s Backpack
Try it out at:
Enterprise or Community Edition
Enterprise-Class Technology
Ready for real-time enterprise applications
Performance and Scalability
• Clustered replication across
data centers
• Unlimited graph sizes
• Intelligent online space reuse
• Enterprise lock manager
• Compiled runtime for common queries
• Kerberos authentication add-on
• Clustering on CAPI flash add-on
Monitoring and Administration
• Advanced monitoring by role
• Cypher query tracing
• Hot backups
• Enterprise security
Enterprise Schema Governance
• Property existence constraints
• Composite and node key constraints
Features in Community and Enterprise Editions
Both Editions—GRAPH Features Database Features Architecture Features
Labeled Property Graph Model ACID Transactions Language drivers for Java, Python, C# & JavaScript
Native Graph Processing & Storage High-performance Native API HTTPS plug-in
Graph Query Language “Cypher” High-performance caching REST API
Neo4j Browser w/ Syntax Highlighting Cost-based query optimizer RPM, Azure & AWS Cloud Delivery
Fast Writes via Native Label Index
Fast Reads via Composite Indexes
Enterprise Edition—GRAPH Features Database Features Architecture Features
Database storage reallocation Query monitoring with enriched metrics Enterprise Lock Manager accesses all available cores on server
Cypher query tracing
Compiled Cypher Runtime to
accelerate common queries
Causal Clustering, core and read-replica design
Node Key schema constraints User & role-based security Multi-Data Center Support for global scale
Property existence constraints LDAP & Active Directory Integration Driver-based load balancing
Kerberos Security plug-in Driver-based Causal Clustering API exposes routing logic
Bold is new in 3.2
Licensing Options
Edition / Program | Audience | License | Price Point
Community Edition | IT Developers | GPLv3 | Free
Enterprise Edition, Fair Trade | | AGPL3 | Free, but must publish source code
Enterprise Edition | | Commercial | ~$500/month/core
Early Startups | Early Stage, <20 | Commercial | Free until traction established
Startups w/ Traction | <3M ARR | Commercial | $1,500/month
Most deployments require only a 3-server cluster for fault tolerance & HA
Enterprise-Class Expertise
Neo4j Customer Success
Expert design, development and
deployment services
• Graph and application design
• Application deployment
• Data center configuration
• Developer and user training
• World-class support with SLAs
• Support portal and knowledge base
Graph Innovation Network
Worldwide community of Neo4j and
graph database experts
• Service providers
• OEMs and VARs
• Technology partners
• Open source community
Use Neo4j experts and join the Innovation Network.
Develop your apps right the first time.
The Largest Graph Innovation Network
3,000,000+ with 50k additional per month
Neo4j Downloads
225+ customers
50% from Global 2000
Technology and Services Partners
450+ annual events & 10k attendees
Graph and Neo4j awareness and training
Neo4j Meetup Members
Online and Classroom Education Registrants
Users Love Neo4j
Graph Visionaries
Enterprise Customers
System Integrators
IaaS, PaaS, DBaaS
The Density of the Neo4j Innovation Network
OEM & Tech
Graph Solutions
Data Science
Data Models
Technical Support
Packaged Services
Custom Services
Online Training
Custom Onsite
The Connected Enterprise Value Proposition
Fastest path to Graph Success
Innovation Launchpad
• Neo4j Enterprise Edition
• HA, Causal Cluster, MDC
• Better performance
• Hardened product
The Next Innovation
• Density of the network accelerates
innovation opportunity
• Thousands of project successes
• Partners, Service Providers,
Vendors, Academics, Researchers
Millions of Graph Hours
• Shrink learning curve
• Design advice
• Contextual experience
• Deploy & Ops support
Analysts are Invited to Attend GraphConnect NYC

Enterprise ready: a look at Neo4j in production

  Enterprise Ready: Neo4j in Production #1 Database for Connected Data Jeff Morris Head of Product Marketing Enterprise Ready Neo4j in Production June 2017
  • 2. 6/28/2017 2 Who We Are: The Graph Database for Connected Data Neo4j is an enterprise-grade native graph database that enables you to: • Store and query data relationships • Traverse any levels of depth on real-time • Add and connect new data on the fly • Performance • ACID Transactions • Agility 3 Designed, built and tested natively for graphs from the start to ensure: • Developer Productivity • Hardware Efficiency Graph is Top Trending Database Type
  • 3. 6/28/2017 3 Neo4j Enterprise Metadata & Content Mangement Use Cases Sample of Connected Graphs Organization Identity & Access Network & IT Ops
  • 4. 6/28/2017 4 Background • SF-based C2C rental platform • Dataportal democratizes data access for growing number of employees while improving discoverability and trust • Data strewn everywhere—in silos, in segmented departments, nothing was universally accessible Business Problem • Data-driven culture hampered by variety and dependability of data, tribal knowledge and word-of-mouth distribution • Needed visibility into information usage, context, lineage and popularity across company of 3,000+ Solution and Benefits • Offers search with context & metadata, user & team-centric pages for origin & lineage • Nodes are resources: data tables, dashboards, reports, users, teams, business outcomes, etc. • Relationships reflect consumption, production, association, etc. • Neo4j, Elasticsearch, Python Airbnb Dataportal TRAVEL TECHNOLOGY Knowledge Graph, Metadata Management8 CE users since 2017 Background • Large global bank • Deploying Reference Data to users and systems • 12 data domains, 18 datasets, 400+ integrations • Complex data management infrastructure Business Problem • Master data silos were inflexible and hard to consume • Needed simplification to reduce redundancy • Reduce risk when data is in consumers’ hands • Dramatically improve efficiency Solution and Benefits • Data distribution flows improved dramatically • Knowledge Base improves consumer access • Ad-hoc analytics improved • Governance, lineage and trust improved • Better service level from IT to data consumers UBS FINANCIAL SERVICES Master Data Management / Metadata9 CE Customer since 2016 Q1EE Customer since 2015
  • 5. 6/28/2017 5 Background • 5 year long drug discovery research • Parse & Navigate over 25 Million scientific papers • Sourced from National Library of Research and tagging of “Medical Subject Headers” (MeSH tags) Business Problem • Seeking to automate phenotype, compound and protein cell behavior research by using previously documented research more effectively • Text mining for research elements like DNA strings, proteins, RNA, chemicals and diseases Solution and Benefits • Found ways to identify compound interaction behavior from millions of research documents • Relations between biological entities can be identified and validated by biologic experts • Still very challenging to keep up-to-date, add genomics data, and find a breakthrough Novartis PHARMACEUTICAL RESEARCH Content Management / Biomedical Research10 CE Customer since 2016 Q1CE Customer since 2012 Background • How Neo4j is used in investigations • Non-technical reporters manually gather data • “Low-tech” data curation • Journalists want to model data as a story, not as data Business Problem • Identify repeated business relationships among individuals and their holdings and accounts • Scan documents and identify possible entities, then create relationships between people and documents. • Names and alias variances Solution and Benefits • Uses Neo4j in “story discovery” phase • Uncovers shortest paths for leads for reporters • Many investigations underway now Columbia University EDUCATION Investigative Journalism / Fraud Detection11 CE Customer since 2016 Q1EE Customer since 2015 Q4
  • 6. 6/28/2017 6 Background • eBay Israel, Entity Management Platform • Taxonomy for hundreds of thousands of entities like categories, products, sellers, sales, buyers, stores, etc. • Entities have permanent “souls” and “states” in which they exist throughout their lifecycle • All to make editing product items easy and fast Business Problem • Users demand high interactivity isolated workspaces with inheritance for building pages • Support versioning of entities so that users can easily make changes, while preserving history of its previous states Solution and Benefits • Chose Neo4j for performance, flexibility, developer productivity • Easy to learn • Flexible way to represent how data entities change throughout their lifecycle • Patent pending eBay Israel ONLINE RETAILER Master Data Management / Metadata12 CE Customer since 2016 Q1EE Customer since 2015 Q4 Background • French Telecom • Big Data Governance in support for GDPR • Environment with Hadoop, Analytics, Recommendation engines, etc. Business Problem • Manage people, roles & rights, flow, audit, log management, processes, policies, lineage, metadata, lifecycles, security, etc… • All because GDPR arrives in May 2018 Solution and Benefits • Governance system oversees all systems • Enforces correct policies • Allows flexibility beyond Hadoop • Architect has written Neo4j French manual ORANGE TELECOMMUNICATIONS Master Data Management / Metadata13 CE Customer since 2016 Q1EE Customer since 2015
  • 7. 6/28/2017 7 Neo4j in the Enterprise Native Graph Differentiation Graph Overview CAR name: “Dan” born: May 29, 1970 twitter: “@dan” name: “Ann” born: Dec 5, 1975 since: Jan 10, 2011 brand: “Volvo” model: “V70” Neo4j Invented the Labeled Property Graph Model Nodes • Can have name-value properties • Can have Labels to classify nodes Relationships • Relate nodes by type and direction • Can have name-value properties MARRIED TO LIVES WITH PERSON PERSON 15 Neo4j Advantage - Agility
  • 8. 6/28/2017 8 Cypher: Powerful and Expressive Query Language MATCH (:Person { name:“Dan”} ) -[:MARRIED_TO]-> (spouse) MARRIED_TO Dan Ann NODE RELATIONSHIP TYPE LABEL PROPERTY VARIABLE Neo4j Advantage – Developer productivity 17 Example HR Query in SQL The Same Query using Cypher MATCH (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report) WHERE = “John Doe” RETURN AS Subordinate, count(report) AS Total Project Impact Less time writing queries • More time understanding the answers • Leaving time to ask the next question Less time debugging queries: • More time writing the next piece of code • Improved quality of overall code base Code that’s easier to read: • Faster ramp-up for new project members • Improved maintainability & troubleshooting Productivity Gains with Graph Query Language The query asks: “Find all direct reports and how many people they manage, up to three levels down”
  • 9. 6/28/2017 9 Open Source (Available to anyone) Apache 2.0 Open Source (As part of Neo4j) GPL v3 Open Process (Open to anyone) CIR, CIP, oCIM Formal Standard (Standards Body) e.g. ANSI, ISO openCypher Documentation, TCK, Grammar, Parser Opening the Language Databases Tools ruruki Vendor Support & Interest
  • 10. 6/28/2017 10 Graph Composition Cypher Query Return SQL Integration SQL Cypher Next Steps for Cypher Feb ’17, May ’17, Sep '17 oCIM Summer of Syntax Twice-Monthly Calls Cypher Language Group Github Cypher Improvement Requests & Proposals Evolving the Language Together
  • 12. 6/28/2017 12 One more thing… RDBMS Vocabulary Mapped to Graph Modeling Relational DB Construct Graph DB Construct Entity table Node labels Row Node Columns Node properties Technical primary keys Replace with business primary keys Constraints Unique constraints for business keys Indexes Indexes on any property Foreign keys Relationships Default values Node keys De-normalized or duplicated data Create separate nodes Join tables Relationships Join table columns Relationship properties
  • 13. 6/28/2017 13 Relational DBMSs Can’t Handle Relationships Well • Cannot model or store data and relationships without complexity • Performance degrades with number and levels of relationships, and database size • Query complexity grows with need for JOINs • Adding new types of data and relationships requires schema redesign, increasing time to market … making traditional databases inappropriate when data relationships are valuable in real-time Slow development Poor performance Low scalability Hard to maintain Queries can take non-sequential, arbitrary paths through data Real-time queries need speed and consistent response times Queries must run reliably with consistent results Q A single query can touch a lot of data Relationship Queries Strain Traditional Databases 2 7
  • 14. Neo4j: Native Graph from the Start
      • At write time: data is connected as it is stored
      • At read time: lightning-fast retrieval of data and relationships via pointer chasing
      • Index-free adjacency
      • Graph-optimized memory & storage
    Neo4j Advantage - Performance
      Native graph storage: optimized for real-time reads and ACID writes
      • Relationships stored as physical objects, eliminating need for joins and join tables
      • Nodes connected at write time, enabling scale-independent response times
      Native graph querying: memory structures and algorithms optimized for graphs
      • Index-free adjacency enables 1M+ hops per second via in-memory pointer chasing
      • Off-heap page cache improves operational robustness and scaling compared with JVM-based caches
      • “Minutes to milliseconds” performance improvement
    Neo4j Advantage - ACID Transactions
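As a rough illustration of index-free adjacency, the Python sketch below stores each relationship as an object that both endpoints reference directly, so a traversal follows references instead of consulting a global index. The `Node`, `Relationship`, and `neighbors` names are hypothetical, and this is a toy in-memory model, not a description of Neo4j's actual storage format.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.outgoing = []  # direct references to Relationship objects
        self.incoming = []

class Relationship:
    def __init__(self, rel_type, start, end):
        self.type = rel_type
        self.start = start
        self.end = end
        # "Connected at write time": both endpoints record the relationship now,
        # so reads never need a join or an index lookup to find it.
        start.outgoing.append(self)
        end.incoming.append(self)

def neighbors(node, rel_type):
    """Follow references from `node`; cost depends on its degree, not on graph size."""
    return [rel.end for rel in node.outgoing if rel.type == rel_type]

dan, ann = Node("Dan"), Node("Ann")
Relationship("MARRIED_TO", dan, ann)
print([n.name for n in neighbors(dan, "MARRIED_TO")])
```

The point of the design is that a hop touches only the records adjacent to the current node, which is why response time can stay independent of total data size.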
  • 15. “Minutes to Milliseconds” Real-Time Query Performance
      [Chart: response time (y-axis) versus connectedness and size of data set (x-axis)]
      • Relational and other NoSQL databases: 0 to 2 hops, 0 to 3 degrees, thousands of connections
      • Graph: tens to hundreds of hops, thousands of degrees, billions of connections
      • 1000x advantage: “minutes to milliseconds”
    Neo4j Advantage - Performance: Social Recommendation Example
      Equivalent Cypher query:
      MATCH (you)-[:BOUGHT]->(something)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco)
      WHERE id(you)={id}
      RETURN reco
      Traversal speeds on Amazon retail dataset:
      Threads → Hops per second
      1 → 3-4 million
      10 → 17-29 million
      20 → 34-50 million
      30 → 36-60 million
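The three-hop recommendation pattern can be approximated in plain Python as follows. The purchase data and the `recommend` function are hypothetical; the sketch returns one entry per matching path, roughly as the Cypher query returns one row per match, and it skips paths that would reuse the same BOUGHT relationship.

```python
from collections import defaultdict

# Hypothetical purchase data: buyer -> items bought.
BOUGHT = {
    "you": ["book"],
    "ann": ["book", "lamp"],
    "bob": ["book", "lamp", "desk"],
    "cat": ["desk"],
}

# Reverse lookup (item -> buyers) so the middle hop can be walked backwards.
BUYERS = defaultdict(list)
for buyer, items in BOUGHT.items():
    for item in items:
        BUYERS[item].append(buyer)

def recommend(you):
    """Items bought by people who bought something `you` bought, one entry per path."""
    recos = []
    for something in BOUGHT.get(you, []):
        for other in BUYERS[something]:
            if other == you:  # would reuse the same BOUGHT relationship
                continue
            for reco in BOUGHT[other]:
                if reco != something:  # same relationship again, skip
                    recos.append(reco)
    return recos

print(sorted(recommend("you")))
```

An item that appears several times was reached over several paths, which is a simple signal of recommendation strength.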
  • 16. Fit for Purpose: The Right Architecture for the Right Job
      Graph databases are designed for data relationships.
      • Other NoSQL and relational DBMS: discrete data, minimally connected data
      • Graph DB: connected data, focused on data relationships
      Development benefits: easy model maintenance, easy query
      Deployment benefits: ultra high performance, minimal resource usage
    Database Management Systems: Five Key Sub-Patterns (incl. SQL)
      • Tabular: RDBMS
      • Aggregate-oriented (3): Key-Value, Column-Family, Document Database
      • Graph: Graph Database
      Source: Martin Fowler, NoSQL Distilled
  • 17. NoSQL Databases Don’t Handle Relationships
      • No data structures to model or store relationships
      • No query constructs to support data relationships
      • Relating data requires “JOIN logic” in the application
      • No ACID support for transactions
      … making NoSQL databases inappropriate when data relationships are valuable in real time
    Relationship Queries on Non-Native Graph Architectures
      • Using graph (unified, in-memory map): lightning-fast queries due to replicated in-memory architecture and index-free adjacency
      • Using other NoSQL to join data (Machine 1, Machine 2, Machine 3): slow queries due to index lookups + network hops
  • 18. Neo4j Advantage – Scalability
      • Dynamic pointer compression: unlimited-sized graphs with no performance compromise
      • Index partitioning: auto-partitioning of indexes into 2GB partitions
      • Causal clustering architecture: enables unlimited read scaling with ACID writes and a choice of consistency levels
      • Multi-data center support: creates HA, fault-tolerant global applications
      • Efficient processing: native graph processing and storage often requires 10x less hardware
      • Efficient storage: one-tenth the disk and memory requirements of certain alternatives
    Neo4j Performance Improvements by Version
      [Chart: complex mixed-workload throughput for Neo4j 2.2, 2.3, 3.0, 3.1 and 3.2. Annotations: 32%, 50%, 27%, 70%; 320% faster than 2.2]
  • 19. Neo4j Enterprise: Causal Clustering Architecture
      Modern and fault-tolerant to guarantee graph safety
      Raft-based architecture
      • Continuously available
      • Consensus commits
      • Third-generation cluster architecture
      Cluster-aware stack
      • Seamless integration among drivers, Bolt protocol and cluster
      • No need for external load balancer
      • Stateful, cluster-aware sessions with encrypted connections
      Streamlined development
      • Relieves developers from complex infrastructure concerns
      • Faster and easier to develop distributed graph applications
    Neo4j Advantage – Scalability. Now in Neo4j 3.2: Multi-DC Clustering
      • Global cluster topologies
      • Geo-aware load balancing
      • Tiered replicas
      • Full-stack API
      [Diagram: US East group, UK group, HK group, SA group]
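"Consensus commits" in a Raft-style cluster mean a write is committed once a majority of voting members has accepted it. The arithmetic can be sketched as below; the function names are hypothetical, and the sketch only covers the general majority rule, not Neo4j's specific configuration options.

```python
def quorum_size(core_servers: int) -> int:
    """Smallest majority of a cluster with `core_servers` voting members."""
    return core_servers // 2 + 1

def tolerated_failures(core_servers: int) -> int:
    """How many voting members can fail while a majority can still commit."""
    return core_servers - quorum_size(core_servers)

for n in (3, 5, 7):
    print(n, "cores: quorum", quorum_size(n), "tolerates", tolerated_failures(n), "failures")
```

With three core servers the quorum is two, so the cluster keeps accepting writes after losing any single member.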
  • 20. How Causally Consistent Reads Work (Enterprise Edition)
      [Diagram: app servers with drivers, core servers linked by Raft replication, and a replica server fed by async replication]
      1: Read product catalog (read any replica)
      2: Create account (write, Tx 101)
      3: Review profile (RYOW*, Tx 101)
      4: Create an order (write, Tx 102)
      5: Review orders (RYOW, Tx 102)
      * RYOW: read your own writes
      How it works:
      • Application chooses a consistency level: “Read Any” vs “Read your own writes”
      • Cluster chooses appropriate members; the default optimizes for scalability (i.e. read replica server for reads)
      Causal clustering enables:
      • Application-driven SLAs
      • Optimizing for freshness vs. cost
      • Tunability within an application, on an application & session basis
    The Importance of ACID Graph Writes
      • Graph transactions over ACID consistency: maintains integrity over time
      • Graph transactions over non-ACID DBMSs: eventual consistency becomes corrupt over time
          • Ghost vertices
          • Stale indexes
          • Half-edges
          • Uni-directed ghost edges
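The read-your-own-writes flow can be simulated with a toy cluster in Python. All names here (`Member`, `Cluster`, the `bookmark` argument) are hypothetical stand-ins, and the routing rule is a deliberate simplification: this sketch falls back to the core server when the replica has not yet applied the requested transaction, whereas a real cluster has other options such as waiting for the replica to catch up.

```python
class Member:
    def __init__(self, name):
        self.name = name
        self.applied_tx = 0  # id of the last transaction applied on this member
        self.data = {}

    def apply(self, tx_id, key, value):
        self.data[key] = value
        self.applied_tx = tx_id

class Cluster:
    def __init__(self):
        self.core = Member("core")
        self.replica = Member("replica")
        self.log = []  # committed transactions, in order

    def write(self, key, value):
        """Commit on the core and return the transaction id as a bookmark."""
        tx_id = len(self.log) + 1
        self.log.append((tx_id, key, value))
        self.core.apply(tx_id, key, value)
        return tx_id

    def replicate(self):
        """Asynchronous replication step: the replica catches up with the log."""
        for tx_id, key, value in self.log[self.replica.applied_tx:]:
            self.replica.apply(tx_id, key, value)

    def read(self, key, bookmark=0):
        """bookmark=0 means 'read any'; otherwise the member must have applied that tx."""
        member = self.replica if self.replica.applied_tx >= bookmark else self.core
        return member.name, member.data.get(key)

cluster = Cluster()
bookmark = cluster.write("account", "dan")    # step 2: create account
print(cluster.read("account"))                # read any: replica may be stale
print(cluster.read("account", bookmark))      # read your own writes
cluster.replicate()
print(cluster.read("account", bookmark))      # replica has caught up
```

Passing the bookmark is what lets the application choose, per read, between the cheapest member and a member that is guaranteed to reflect its own earlier write.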
  • 21. Summary of Neo4j: Built for the Enterprise
      • Native Graph Storage: designed, built, and tested for graphs
      • Native Graph Query Processing: for real-time, relationship-based apps; evaluate millions of relationships in a blink
      • Whiteboard-Friendly Data Modeling: faster projects compared to RDBMS
      • Data Integrity and Security: fully ACID transactions, causal consistency and enterprise security
      • Powerful, Expressive Query Language: improved productivity, with 10x to 100x less code than SQL
      • Scalability and High Availability: architecture provides ideal balance of performance, availability and scale for graphs
      • Built-in ETL: seamless import from other databases
      • Integration: fits easily into your IT environment, with drivers and APIs for popular languages
    Neo4j Case Studies
      Case Studies for Knowledge Graphs and Recommendation Engines
  • 24. [Diagram of infrastructure components and the people who depend on them]
      Components: Router, Servers, Apps, Firewall, Cloud, Switch
      • Sys Admins: servers, on-premise virtual machines, cloud virtual machines, etc.
      • Network Admins: switches, routers, egress points
      • App Admins: e.g. Salesforce, Marketo, SAP, Oracle Apps, Tableau, SharePoint, DBAs, etc.
      • Internal Users: HR, Sales, Marketing, Data Analysts, E-staff, etc.
      • Numerous Customers & Partners
  • 27. Telia Zone (Telecommunications): Smart Home / Internet of Things. EE customer since 2016 Q4
      Background
      • Large Nordic telecom provider
      • 1M broadband routers deployed in Sweden
      • Half of subscribers are over 55 years old
      • Each household connects 10 devices
      • Goal to improve customer experience
      Business Problem
      • Broadband router enhancement to improve customer experience
      • Context-based in-home services
      • How to build a smart home platform that allows vendors to build new “home-centric” apps
      Solution and Benefits
      • New features deployed to 1M homes
      • API-based platform for easy apps that:
          • Automatically assemble Spotify playlists based on who is in the house
          • Notify parents when children get home
          • Build smart shopping lists
    University of Washington (Education & Research): Metadata Management, IT & Network Operations. CE customer since 2016 Q1
      Background
      • Large public university – “U-Dub”
      • IT staff for 80K+ students and employees
      • Transforming IT systems from mainframe to cloud
      • Providing IT & data warehousing services to 3 campuses, 6 hospitals, and 6,300 EDW users
      Business Problem
      • Old SharePoint metadata was too complicated for users, not flexible and not transparent
      • $1B project to migrate HR system from mainframe to Workday needed to be smooth
      • Future projects needed repeatable predictability
      • Needed new glossary, impact analysis, analytics
      Solution and Benefits
      • Consulted with NDU peers, built simple model
      • Built Visualizer with Elasticsearch, Neo4j & D3.js
      • Improved predictability, lineage, and impact understanding for over 6,300 users
  • 28. Qualia (Advertising Technology): Social Network, Internet of Things, and Real-Time Buyer Identification. EE customer since 2014 Q3
      Background
      • Ad-tech supplier in NYC that identifies "intent signals"
      • Collects device-borne consumer data from mobile, desktops & tablets
      • Contains device and buyer data on more than 90% of American households
      • Supersized graph
      Business Problem
      • Recognize buyer receptivity to offers near time of purchase
      • Device data and consumer behaviors change frequently
      • Triangulate who is holding a device, and where and when, to signal active purchase intent and create real-time offers to assist the user
      Solution and Benefits
      • 3 billion nodes, 9 billion relationships
      • 1 billion daily transactions on 3 servers
      • Hybrid solution with Neo4j, Hadoop, Spark, MongoDB and Ruby
      • Breakthrough results, 60% to 250% higher than industry benchmarks
    Marriott (Travel & Hospitality Services): Pricing Recommendations Engine. EE customer since 2014 Q2
      Background
      • World's largest hospitality / hotel company
      • 7th largest web site on the internet
      • 1.5M hotel rooms offered online by 2018
      • Revenue management system that allows property managers to update their pricing rates
      Business Problem
      • Provide the right room & price at the right time
      • Old rate program was inflexible and bogged down as pricing options per property per day increased
      • Lay the path to be an innovator in the future
      Solution and Benefits
      • 2016-era rate program embeds Neo4j as "cache"
      • Created a graph per hotel for 4500 properties in 3 clusters
      • 1000% increase in volume over 4 years
      • 50% decrease in infrastructure costs
      • "Use Neo4j Support!"
  • 29. eBay ShopBot (Online Retail): Knowledge Graph Powers Real-Time Recommendations. EE customer since 2016 Q3
      Background
      • Personal shopping assistant
      • Converses with buyer via text, picture and voice to provide real-time recommendations
      • Combines AI and natural language understanding (NLU) in Neo4j Knowledge Graph
      • First of many apps in eBay's AI Platform
      Business Problem
      • Improve personal context in online shopping
      • Transform buyer-provided context into ideal purchase recommendations over social platforms
      • "Feels like talking to a friend"
      Solution and Benefits
      • 3 developers, 8M nodes, 20M relationships
      • Needed high-performance traversals to respond to live customer requests
      • Easy to train new algorithms and grow model
      • Generating revenue since launch
    Case Study: Knowledge Graphs at eBay
  • 30-32. Case Study: Knowledge Graphs at eBay [image slides; visible labels: Bags, Men’s Backpack, Handbag]
      Try it out at:
  • 33. Enterprise or Community Edition
    Summary: Enterprise-Class Technology
      Ready for real-time enterprise applications
      Performance and Scalability
      • Clustered replication across data centers
      • Unlimited graph sizes
      • Intelligent online space reuse
      • Enterprise lock manager
      • Compiled runtime for common queries
      • Kerberos authentication add-on
      • Clustering on CAPI flash add-on
      Monitoring and Administration
      • Advanced monitoring by role
      • Cypher query tracing
      • Hot backups
      • Enterprise security
      Enterprise Schema Governance
      • Property existence constraints
      • Composite and node key constraints
  • 34. Features in Community and Enterprise Editions
      (Columns on the slide, read left to right: GRAPH Features; Database Features; Architecture Features. Bold on the slide marks features new in 3.2.)
      Both Editions
      • Labeled Property Graph Model; ACID Transactions; Language drivers for Java, Python, C# & JavaScript
      • Native Graph Processing & Storage; High-performance Native API; HTTPS plug-in
      • Graph Query Language “Cypher”; High-performance caching; REST API
      • Neo4j Browser w/ Syntax Highlighting; Cost-based query optimizer; RPM, Azure & AWS Cloud Delivery
      • Fast Writes via Native Label Index; Fast Reads via Composite Indexes
      Enterprise Edition
      • Database storage reallocation; Query monitoring with enriched metrics; Enterprise Lock Manager accesses all available cores on server
      • Cypher query tracing; Compiled Cypher Runtime to accelerate common queries; Causal Clustering, core and read-replica design
      • Node Key schema constraints; User & role-based security; Multi-Data Center Support for global scale
      • Property existence constraints; LDAP & Active Directory Integration; Driver-based load balancing
      • Kerberos Security plug-in; Driver-based Causal Clustering API exposes routing logic
    Licensing Options
      Edition / Program → Audience → License → Price Point
      Community Edition → IT Developers → GPLv3 → Free
      Enterprise Edition → Fair Trade Projects → AGPLv3 → Free, but must publish source code
      Enterprise Edition → Real-time applications → Commercial → ~$500/month/core
      Early Startups → Early stage, <20 employees → Commercial → Free until traction established
      Startups w/ Traction → <3M ARR → Commercial → $1,500/month
      Most deployments require only a 3-server cluster for fault tolerance & HA.
  • 35. Enterprise-Class Expertise
      Neo4j Customer Success: expert design, development and deployment services
      • Graph and application design
      • Application deployment
      • Data center configuration
      • Developer and user training
      • World-class support with SLAs
      • Support portal and knowledge base
      Graph Innovation Network: worldwide community of Neo4j and graph database experts
      • Service providers
      • OEMs and VARs
      • Technology partners
      • Open source community
      Use Neo4j experts and join the Innovation Network. Develop your apps right the first time.
    The Largest Graph Innovation Network
      • 3,000,000+ Neo4j downloads, with 50k additional per month
      • 225+ customers, 50% from Global 2000
      • 100+ technology and services partners
      • 450+ annual events & 10k attendees: graph and Neo4j awareness and training
      • 43,000+ Neo4j Meetup members
      • 50,000+ online and classroom education registrants
  • 36. Users Love Neo4j
    The Density of the Neo4j Innovation Network
      • Graph Visionaries: enterprise customers
      • Partners: system integrators, trainers, OEMs
      • Cloud: IaaS, PaaS, DBaaS, marketplace
      • OSS Community: events, forums, add-ons
      • Tech Ecosystem: OEM & tech partners
      • Graph Solutions: data science, architecture, data models
      • Commercial Support: technical support, packaged services, custom services
      • Education: documents, online training, classroom, custom onsite
      • Standards Initiatives: openCypher, LDBC
  • 37. Neo4j Commercial Value: The Connected Enterprise Value Proposition
      Fastest path to graph success
      Graph Database Platform: Enterprise-Grade Innovation Launchpad
      • Neo4j Enterprise Edition
      • HA, Causal Cluster, MDC
      • Better performance
      • Hardened product
      Innovation Network: The Next Innovation
      • Density of the network accelerates innovation opportunity
      • Thousands of project successes
      • Partners, service providers, vendors, academics, researchers
      Graph Expertise: Millions of Graph Hours
      • Shrink learning curve
      • Design advice
      • Contextual experience
      • Deploy & ops support
    Analysts are Invited to Attend GraphConnect NYC