Choosing the Right Data Management Architecture for Cognitive Computing

Adrian Bowles, PhD

Founder, STORM Insights, Inc.

Lead Analyst, AI, Aragon Research

info@storminsights.com
Copyright (c) 2017 by STORM Insights Inc. All Rights Reserved.
OCTOBER 12, 2017
AGENDA - CHOOSING THE RIGHT DATA MANAGEMENT ARCHITECTURE FOR COGNITIVE COMPUTING
The Role of Data In AI & CC

What do we need to manage?

Application, Data, and Algorithm Attributes that Influence Architecture

Database Options

Open Source Infrastructure

Prebuilt Knowledge

Getting Started: Basic Principles
COGNITIVE COMPUTING FUNDAMENTALS: MODELS & ASSUMPTIONS
A Model is an Abstract Representation of Reality
Model: the corpus, assumptions, and algorithms used to generate and score hypotheses or calculate the strength of a relationship.
Principles that control the development and representation of natural intelligence in the neocortex provide a guide to the implementation of machine intelligence. (Numenta Hierarchical Temporal Memory)
Essential Data for Cognitive Computing
A function applied to a string representing data or a concept results in a value or vector meaningful for comparison.
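The last idea above is a function that turns a string into a vector that can be compared. A deliberately simple stand-in illustrates it: character-trigram counts compared by cosine similarity. This is an illustrative assumption, not the author's method. Production systems usually rely on learned embeddings, and `trigram_vector` and `cosine` are hypothetical names.

```python
import math
from collections import Counter


def trigram_vector(text: str) -> Counter:
    """Map a string to a sparse vector of character-trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 to 1.0)."""
    dot = sum(count * v[gram] for gram, count in u.items())  # missing grams count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Compare `cosine(trigram_vector("data management"), trigram_vector("data managment"))` with the same string against `"jet engine"`. The first score is far higher, which is the "meaningful for comparison" property described above.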
MODELS WILL MAKE OR BREAK YOUR APPLICATION
[Figure: Your Model vs. The Real World]
“When the map and the terrain disagree, believe the terrain.”
Gause and Weinberg (Exploring Requirements)
COGNITIVE SYSTEMS: COMMUNICATIONS & CONTROL
[Figure: human and machine input and output channels (gestures, emotions, language, sensors/IoT, systems controls; narrative generation, visualization, reports, haptics) connected through perception to a core that understands, reasons, plans, and learns over the model and data management]
WHERE YOU ARE DICTATES WHAT YOU NEED
[Figure: Ingest, Analyze, Maintain/Manage]
THE IMPACT OF THE IOT
When everything is connected…
New sources of data emerge
New sources of value emerge
Old assumptions must be challenged
CHOICES HAVE CONSEQUENCES
How you think about a domain influences your choice of maps and models, rules and representations, and required operations.
HOW YOU ORGANIZE CONSTRAINS HOW YOU WORK - DESIGN WORKFLOW FIRST
START WITH A TAXONOMY
A taxonomy represents the formal structure of classes or types of objects within a domain.
• Taxonomies are generally hierarchical and provide names for each class in the domain.
• They may also capture the membership properties of each object in relation to the other objects.
• The rules of a specific taxonomy are used to classify or categorize any object in the domain, so they must be complete, consistent, and unambiguous. This rigor in specification should ensure that any newly discovered object fits into one, and only one, category or object class.
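The "one, and only one, category" requirement can be checked mechanically. The sketch below assumes a hypothetical two-level device taxonomy (mobility, then data complexity), flattened to its leaf classes. Each class is defined by a membership rule. `classify` rejects any object that matches no class (the taxonomy is incomplete) or more than one class (the taxonomy is ambiguous).

```python
from typing import Callable, Dict

Rule = Callable[[dict], bool]

# Hypothetical taxonomy: mobility at the top level, data complexity below it,
# flattened to leaf classes, each with a membership rule over an object's attributes.
TAXONOMY: Dict[str, Rule] = {
    "stationary/low-complexity": lambda o: not o["mobile"] and o["complexity"] == "low",
    "stationary/high-complexity": lambda o: not o["mobile"] and o["complexity"] == "high",
    "mobile/low-complexity": lambda o: o["mobile"] and o["complexity"] == "low",
    "mobile/high-complexity": lambda o: o["mobile"] and o["complexity"] == "high",
}


class TaxonomyError(ValueError):
    """Raised when an object fits no class (incomplete) or several (ambiguous)."""


def classify(obj: dict, taxonomy: Dict[str, Rule] = TAXONOMY) -> str:
    """Return the single class whose rule accepts obj, enforcing 'one, and only one'."""
    matches = [name for name, rule in taxonomy.items() if rule(obj)]
    if not matches:
        raise TaxonomyError(f"incomplete: no class accepts {obj}")
    if len(matches) > 1:
        raise TaxonomyError(f"ambiguous: {obj} fits {matches}")
    return matches[0]
```

For example, `classify({"mobile": True, "complexity": "high"})` returns `"mobile/high-complexity"`. An object with `"complexity": "medium"` raises `TaxonomyError`, which signals that the taxonomy is not complete for that domain.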
ONTOLOGIES
An ontology formalizes and specifies the names, definitions,
and attributes of entities within a domain. For practical
purposes, an accepted ontology defines the domain.
THE SEMANTIC WEB - ALL DATA SHOULD BE ASSOCIATED WITH SEMANTIC ATTRIBUTES (MEANING)
BASICS OF THE W3C SEMANTIC WEB ONTOLOGY STACK
RDF - Resource Description Framework - A directed, labeled graph.
RDFS - RDF Schema (a language for representing RDF vocabularies)
SPARQL - SPARQL Protocol and RDF Query Language, for querying RDF data
OWL - The Web Ontology Language is a Semantic Web language designed to represent knowledge about things and relationships between things on the Web. An OWL document is an ontology.
https://www.w3.org/2013/data/
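RDF reduces everything to subject-predicate-object triples. A SPARQL basic graph pattern is a set of such triple patterns containing variables. The toy evaluator below illustrates that idea in plain Python. It is not a SPARQL engine: a real project would use an RDF store or library. The `ex:` data and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical data with made-up "ex:" names; real RDF uses full IRIs and typed literals.
TRIPLES: List[Triple] = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Match one triple pattern against one triple.
    Terms starting with '?' are variables; returns the extended binding or None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if result.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            result[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return result


def query(patterns: List[Triple], triples: List[Triple] = TRIPLES) -> List[Binding]:
    """Naive evaluation of a basic graph pattern: join the patterns left to right."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in triples:
                extended = unify(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings
```

Consider `query([("ex:alice", "ex:knows", "?p"), ("?p", "ex:name", "?name")])`. It returns `[{"?p": "ex:bob", "?name": "Bob"}]`. That is roughly what the SPARQL query `SELECT ?name WHERE { ex:alice ex:knows ?p . ?p ex:name ?name }` asks for (PREFIX declarations omitted).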
CRITICAL QUESTIONS…
What data do we need?

What data will be produced?

Where does the data get created?

Where does the data get analyzed/refined?

How do we present/output the data?
And for each data category & data lifecycle phase,

What does it look like?

How much is there?
Architectural Influences
DEEP STRUCTURE REQUIRES STRONGER METHODS FOR ANALYSIS
Perception: obvious structure is easy to process… but most of the interesting stuff isn’t obvious to a computer.
Issue: Do we store or generate all intermediate forms?
DATA - SLOTH KILLS
[Figure: static, stored, diverted or sampled, "stop and frisk," and streaming (in motion) data]
To understand (analyze) data, should you:
Divert the flow?
Pool the data?
Evaluate everything without changing the flow?
Sample? (catch and release?)
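"Sample? (catch and release?)" has a classic answer when the stream's length is unknown: reservoir sampling (Algorithm R). It keeps a uniform random sample of k items in O(k) memory. This is a swapped-in standard technique, not one the deck prescribes, and the function name is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    if k <= 0:
        raise ValueError("k must be positive")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)       # inclusive range, so P(j < k) = k / (i + 1)
            if j < k:
                reservoir[j] = item     # "catch" this item, "release" an older one
    return reservoir
```

Each item is seen once and then dropped unless it lands in the reservoir. The flow is never stored, yet every item has the same chance of being in the final sample.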
COMPLEXITY VS MOBILITY
[Figure: data complexity (low to high) versus mobility (stationary to mobile) for CCTV, smartphone, traffic counter, Fitbit, weather station, and telematic device]
DATA ATTRIBUTES DICTATE ARCHITECTURE CHOICES
[Figure: speed (static to streaming) versus structure/complexity (surface/shallow to dense/deep)]
DATA LOCATION INFLUENCES ARCHITECTURE CHOICES
[Figure: speed (static to streaming) versus location (sensor, gateway, cloud, data center)]
ALGORITHM ATTRIBUTES DICTATE ARCHITECTURE CHOICES
[Figure: parallelism (sequential to embarrassingly parallel) versus computational complexity (n to p, polynomial)]
(Parallelism and computational complexity are not actually orthogonal…)
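Two kinds of sensor work illustrate the parallelism axis. Summarizing each sensor's readings is embarrassingly parallel, because no sensor depends on another. Exponential smoothing, written as a loop, carries state from one reading to the next. The function names are hypothetical. The sketch uses threads only to stay self-contained: CPU-bound pure-Python work usually needs a process pool for real speedup. Linear recurrences like `smooth` can in principle be parallelized with prefix-scan techniques, but not as a trivial map.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Dict, List


def summarize(readings: List[float]) -> Dict[str, float]:
    """Independent per-sensor work: no shared state, so order does not matter."""
    return {"min": min(readings), "max": max(readings), "mean": mean(readings)}


def summarize_all(sensors: Dict[str, List[float]],
                  workers: int = 4) -> Dict[str, Dict[str, float]]:
    """Embarrassingly parallel: map summarize() over all sensors with a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(summarize, sensors.values())  # results keep input order
        return dict(zip(sensors.keys(), results))


def smooth(readings: List[float], alpha: float = 0.5) -> List[float]:
    """Sequential as written: each smoothed value depends on the previous one."""
    out: List[float] = []
    for x in readings:
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out
```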
DATABASE OPTIONS
What Do You Want/Need to Store?

How much? How complex? How fast?
What Do You Want/Need to DO With What You Store?
Options Include…
Files, tables, trees, queues, stacks, lists…

Hierarchical

RDBMS

Object DBMS

NoSQL

Graph
How you think about a domain influences your choice of maps and models, rules and representations, and required operations.
Data Management Options
EVOLUTION OF DATA MANAGEMENT SOLUTIONS
Images courtesy of Wikipedia
Today:
Delta Airlines processes 5,000,000 business events per day.
Pratt & Whitney jet engine: 5,000 sensors producing 10 GB/s per engine.
Formula 1 car sensors produce about 1.2 GB/s.
…and we need to predict the future…
Perform Operations on Data at Rest
GRAPH DATABASES FOR GRAPH DATA!
Why choose a graph database?
Speed to delivery when the data is naturally modeled as a graph

Simplifies multi-hop queries

Visualization? Baked-in
Do you need an on-premises solution, or do you need to manage your own database?
You Probably Already Think In Graphs if…
You watch detective shows
You remember relationships between people
You took a biology class
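A breadth-first search shows what "simplifies multi-hop queries" means. In an adjacency list, finding everyone within k hops takes one loop. A relational schema typically needs one self-join per hop. The sketch assumes a hypothetical `KNOWS` graph and function name.

```python
from collections import deque
from typing import Dict, Iterable

# Hypothetical "knows" relationships, stored as an adjacency list.
KNOWS: Dict[str, Iterable[str]] = {
    "ann": ["bo", "cy"],
    "bo": ["dee"],
    "cy": ["dee", "eli"],
    "dee": ["fay"],
    "eli": [],
    "fay": [],
}


def within_hops(graph: Dict[str, Iterable[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search: every vertex reachable from start in 1..max_hops hops,
    mapped to its hop distance (start itself is excluded)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue  # do not expand past the hop limit
        for w in graph.get(v, ()):
            if w not in dist:  # first visit in BFS is the shortest hop count
                dist[w] = dist[v] + 1
                queue.append(w)
    del dist[start]
    return dist
```

`within_hops(KNOWS, "ann", 2)` finds friends (`bo`, `cy`) and friends of friends (`dee`, `eli`). Raising the limit to 3 adds `fay` with no change to the query's shape.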
GRAPHS 101
[Figure: biological taxonomy. Source: Wikipedia contributors. "Taxonomy (biology)." Wikipedia, The Free Encyclopedia, 11 May 2016. Web. 12 May 2016.]
GRAPHS 101
[Figure: a typical "crazy wall" whiteboard, from Fargo; a screen from IBM i2 COPLINK]
GRAPHS 101
[Figure: family tree; LinkedIn tree]
GRAPHS SHOULD BE PART OF YOUR TOOLKIT
A graph is a structure with vertices and edges.
[Figure: vertices a through e connected by roads labeled Old Post Road, Cross Highway, Compo, Shinbone Alley, and Elk Road]
Edge properties:
Old Post Road: paved, 11 miles
Elk Road: dirt, 2 miles
Cross Highway: toll road, 250 miles
Main Street: 1 mile
Shinbone Alley: 0.5 miles
Vertex properties:
a: bus stop
b: gas station, Shell
c: elementary school
d: house
e: office building
Vertices and edges may be labeled, edges may be directed, and all may be stored and processed by properties represented as key/value pairs.
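The road example can be made concrete as a property graph, where vertices and edges carry key/value properties and a query can filter on them before computing a route. The sketch below uses the slide's road names, surfaces, and mileages. It assumes which vertices each road joins, because that is not recoverable from the text. It then runs Dijkstra's shortest-path algorithm over only the edges the caller allows. The function name is hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Properties come from the slide; WHICH vertices each road joins is a
# hypothetical choice made for this sketch. Roads are treated as undirected.
VERTICES: Dict[str, dict] = {
    "a": {"kind": "bus stop"},
    "b": {"kind": "gas station", "brand": "Shell"},
    "c": {"kind": "elementary school"},
    "d": {"kind": "house"},
    "e": {"kind": "office building"},
}
EDGES: List[Tuple[str, str, dict]] = [
    ("a", "c", {"name": "Main Street", "miles": 1.0}),
    ("c", "e", {"name": "Elk Road", "surface": "dirt", "miles": 2.0}),
    ("c", "b", {"name": "Shinbone Alley", "miles": 0.5}),
    ("b", "e", {"name": "Old Post Road", "surface": "paved", "miles": 11.0}),
    ("d", "e", {"name": "Cross Highway", "toll": True, "miles": 250.0}),
]


def shortest_route(src: str, dst: str,
                   allow: Callable[[dict], bool] = lambda props: True
                   ) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm over the edges whose properties pass allow().
    Returns (total miles, road names in order) or None if dst is unreachable."""
    adj: Dict[str, List[Tuple[str, dict]]] = {v: [] for v in VERTICES}
    for u, v, props in EDGES:
        if allow(props):
            adj[u].append((v, props))
            adj[v].append((u, props))
    heap: List[Tuple[float, str, List[str]]] = [(0.0, src, [])]
    settled = set()
    while heap:
        miles, v, roads = heapq.heappop(heap)
        if v in settled:
            continue  # stale heap entry: a shorter route was already found
        if v == dst:
            return miles, roads
        settled.add(v)
        for w, props in adj[v]:
            if w not in settled:
                heapq.heappush(heap, (miles + props["miles"], w, roads + [props["name"]]))
    return None
```

With every road allowed, `shortest_route("a", "e")` takes Main Street and Elk Road (3.0 miles). Excluding dirt roads with `allow=lambda p: p.get("surface") != "dirt"` reroutes the trip over Shinbone Alley and Old Post Road (12.5 miles).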
GRAPHS HAVE RELEVANT MATHEMATICAL PROPERTIES
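e.g., if you represent a graph as an adjacency matrix M, then the entry in row i, column j of M^n counts the walks of length n (paths that may revisit vertices) from vertex i to vertex j in the original graph.
[Figure: example graph on vertices a through e and its adjacency matrix M, with one 1 in each row]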
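The walk-counting property can be checked directly. The slide's edges cannot be recovered, so the sketch assumes a hypothetical directed graph on vertices a through e. It builds the graph's adjacency matrix and raises it to a power by repeated squaring.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[int]]

# Hypothetical directed graph on a..e; the slide's own edges are not recoverable.
VERTICES = ["a", "b", "c", "d", "e"]
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e"), ("e", "a")]


def adjacency(vertices: Sequence[str], edges: Sequence[Tuple[str, str]]) -> Matrix:
    """Build M with M[i][j] = 1 when there is an edge from vertex i to vertex j."""
    index = {v: i for i, v in enumerate(vertices)}
    m = [[0] * len(vertices) for _ in vertices]
    for u, v in edges:
        m[index[u]][index[v]] = 1
    return m


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product: entry (i, j) sums x[i][k] * y[k][j] over k."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def matpow(m: Matrix, n: int) -> Matrix:
    """M^n for n >= 1 by repeated squaring."""
    if n < 1:
        raise ValueError("n must be >= 1")
    result: Optional[Matrix] = None
    base = m
    while n:
        if n & 1:
            result = base if result is None else matmul(result, base)
        n >>= 1
        if n:
            base = matmul(base, base)
    assert result is not None
    return result
```

For this graph, entry (a, d) of `matpow(adjacency(VERTICES, EDGES), 2)` is 2, counting a→b→d and a→c→d.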
OVERVIEW OF THE GRAPH DATABASE MARKET
[Figure: graph database market overview, covering property graph and RDF databases. Source: Wikipedia contributors. "Graph database." Wikipedia, The Free Encyclopedia, 11]
RDF - Resource Description Framework, W3C specifications for metadata modeling, now used in knowledge management.
OPEN SOURCE FOR GRAPH DATA
Apache TinkerPop, TinkerPop, Apache, Apache feather logo, and Apache TinkerPop project logo are
either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
Apache TinkerPop™ is a graph computing framework for both
graph databases (OLTP) and graph analytic systems (OLAP).
“A graph is a structure composed of vertices and edges. Both vertices and edges
can have an arbitrary number of key/value-pairs called properties. Vertices denote
discrete objects such as a person, a place, or an event. Edges denote relationships
between vertices. For instance, a person may know another person, have been
involved in an event, and/or was recently at a particular place. Properties express
non-relational information about the vertices and edges. Example properties include
a vertex having a name, an age and an edge having a timestamp and/or a weight.
Together, the aforementioned graph is known as a property graph and it is the
foundational data structure of Apache TinkerPop.”
Apache TinkerPop™ is an open source, vendor-agnostic, graph computing
framework distributed under the commercial friendly Apache2 license. When a data
system is TinkerPop-enabled, its users are able to model their domain as a graph
and analyze that graph using the Gremlin graph traversal language.
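Gremlin expresses traversals as chained steps, for example `g.V().has('name','ann').out('knows').values('name')`. The sketch below mimics the `has` and `out` steps over a tiny in-memory property graph in plain Python. It shows what a property graph stores: key/value properties on both vertices and edges. It is not the Gremlin or TinkerPop API, and the classes, method names, and data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Vertex:
    vid: int
    label: str
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    out_v: int  # id of the vertex the edge leaves
    in_v: int   # id of the vertex the edge enters
    label: str
    props: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Minimal in-memory property graph. Hypothetical helper, not a TinkerPop API."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: List[Edge] = []

    def add_vertex(self, vid: int, label: str, **props: Any) -> None:
        self.vertices[vid] = Vertex(vid, label, props)

    def add_edge(self, out_v: int, in_v: int, label: str, **props: Any) -> None:
        self.edges.append(Edge(out_v, in_v, label, props))

    def has(self, key: str, value: Any) -> List[Vertex]:
        """Analogous to Gremlin's has(key, value): vertices whose property matches."""
        return [v for v in self.vertices.values() if v.props.get(key) == value]

    def out(self, start: List[Vertex], label: str,
            edge_ok: Callable[[Edge], bool] = lambda e: True) -> List[Vertex]:
        """Analogous to Gremlin's out(label): follow outgoing edges with that label,
        optionally filtering on edge properties."""
        ids = {v.vid for v in start}
        return [self.vertices[e.in_v] for e in self.edges
                if e.out_v in ids and e.label == label and edge_ok(e)]


def build_example() -> PropertyGraph:
    """Hypothetical data: people, an event, and 'knows'/'attended' edges."""
    g = PropertyGraph()
    g.add_vertex(1, "person", name="ann", age=29)
    g.add_vertex(2, "person", name="bo", age=27)
    g.add_vertex(3, "person", name="cy", age=32)
    g.add_vertex(4, "event", name="webinar")
    g.add_edge(1, 2, "knows", weight=0.5)
    g.add_edge(1, 3, "knows", weight=1.0)
    g.add_edge(3, 4, "attended", timestamp="2017-10-12")
    return g
```

With `g = build_example()`, `[v.props["name"] for v in g.out(g.has("name", "ann"), "knows")]` gives `["bo", "cy"]`. Passing `lambda e: e.props["weight"] > 0.7` as the edge filter narrows that to `["cy"]`.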
OPEN SOURCE PROJECTS
Apache Spark, UIMA, Hadoop
Registered trademarks or trademarks of The Apache Software Foundation
Open Source for Infrastructure
RELEVANT APACHE SOFTWARE FOUNDATION OPEN SOURCE PROJECTS
Apache Storm: “a free and open source distributed realtime
computation system. Storm makes it easy to reliably process
unbounded streams of data, doing for realtime processing what
Hadoop did for batch processing.”
Apache Spark Streaming: “Spark Streaming brings Apache
Spark's language-integrated API to stream processing, letting you
write streaming jobs the same way you write batch jobs.”
RELEVANT APACHE SOFTWARE FOUNDATION OPEN SOURCE PROJECTS
Apache Flink: “open-source stream processing framework for
distributed, high-performing, always-available, and accurate data
streaming applications.”
Apache Samza: “a distributed stream processing framework. It
uses Apache Kafka for messaging, and Apache Hadoop YARN to
provide fault tolerance, processor isolation, security, and resource
management.”
Apache Apex: “Enterprise-grade unified stream and batch
processing engine.”
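Engines such as Storm, Spark Streaming, and Flink share one core idea: aggregating an unbounded stream over time windows. The plain-Python sketch below shows that idea with tumbling (fixed, non-overlapping) windows. It does not use any of these projects' APIs. The event shape `(timestamp_seconds, key)` and the function name are hypothetical. The sketch assumes events arrive in timestamp order, while real engines also handle late and out-of-order data.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[float, str]  # (event time in seconds, key): a hypothetical event shape


def tumbling_counts(events: Iterable[Event],
                    window: float) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Count events per key in fixed, non-overlapping windows.
    Works lazily on unbounded streams; windows with no events are skipped."""
    current_start: Optional[float] = None
    counts = Counter()
    for ts, key in events:
        start = ts - (ts % window)  # start of the window containing this event
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, dict(counts)  # close the previous window
            counts.clear()
            current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the final (possibly partial) window
```

Because the function is a generator, it emits each window's counts as soon as the window closes and never holds more than one window in memory. That is the "process unbounded streams" property quoted above.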
USE PRE-BUILT KNOWLEDGE RESOURCES
Off The Shelf Knowledge
OPENCYC
OFF THE SHELF KNOWLEDGE - NEED TO ASSOCIATE/RECOGNIZE/UNDERSTAND TO ORGANIZE/REPRESENT
WordNet® (Princeton University. "About WordNet." Princeton University. 2010. <http://wordnet.princeton.edu>)
GETTING STARTED…
Basic Principles
It’s All About the Data
Do you have, or can you capture, streaming data that can increase your value proposition?
Data about your product that can improve performance, reliability, predictability…
Can you create value from new analysis of open data?
Adding your own data/algorithms to open data creates value.
Start by evaluating the emerging open source de facto standards.
Choose an infrastructure that allows you to evaluate live streaming data in the context of relevant historical data.
AS THE SCOPE CHANGES, SO MUST THE SOLUTIONS
Today:
Delta Airlines processes 5,000,000 business events per day.
Pratt & Whitney jet engine: 5,000 sensors producing 10 GB/s per engine.
Formula 1 car sensors produce about 1.2 GB/s.
…and we need to predict the future…
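A quick back-of-the-envelope conversion puts the figures above in perspective. It assumes decimal units (1 TB = 1,000 GB, 1 GB = 1,000 MB) and rates sustained over the stated period:

```latex
\begin{aligned}
\text{Jet engine: } & 10\ \text{GB/s} \times 3{,}600\ \text{s/h} = 36{,}000\ \text{GB/h} = 36\ \text{TB/h per engine} \\
\text{Per sensor: } & \frac{10\ \text{GB/s}}{5{,}000\ \text{sensors}} = 2\ \text{MB/s per sensor (average)} \\
\text{Formula 1: } & 1.2\ \text{GB/s} \times 3{,}600\ \text{s/h} = 4{,}320\ \text{GB/h} \approx 4.3\ \text{TB/h} \\
\text{Delta: } & \frac{5{,}000{,}000\ \text{events/day}}{86{,}400\ \text{s/day}} \approx 58\ \text{events/s (average)}
\end{aligned}
```

At tens of terabytes per hour per engine, storing everything centrally before analyzing it becomes the expensive option.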
PRODUCTION ARCHITECTURE VS TRAINING ARCHITECTURE: CHALLENGE YOUR ASSUMPTIONS
In production, the system may scale UP or DOWN.
SHOULD YOU MOVE THE COMPUTATION TO THE DATA, OR DATA TO THE PROCESSOR?
[Figure: the source, gateway, fog, cloud, data center]
STREAMING ANALYTICS: MOVE THE PROCESS TO THE DATA
STREAMING ANALYTICS: STATISTICAL ANALYSIS OF DATA IN MOTION
[Figure: streaming analytics on data, from descriptive to predictive]
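Welford's online algorithm is one standard way to analyze data in motion statistically. It updates the mean and variance one reading at a time in constant memory, so statistics can be computed at the source without storing the stream. This textbook technique is swapped in here; the slide does not specify it. The class name and the 3-sigma `is_anomaly` rule are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: one pass, O(1) memory, numerically stable."""
    n: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses both the old and the new mean

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def stdev(self) -> float:
        return math.sqrt(self.variance)

    def is_anomaly(self, x: float, k: float = 3.0) -> bool:
        """Flag x if it lies more than k standard deviations from the running mean."""
        return self.n > 1 and abs(x - self.mean) > k * self.stdev
```

A gateway or fog node could keep one `RunningStats` per sensor, forward only summaries and flagged readings upstream, and discard the rest. That is one way to "move the process to the data."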
3-TIER IOT ARCHITECTURE ENABLES DISTRIBUTED INTELLIGENCE & ANALYTICS
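[Figure: three tiers (sensors/devices, network, and data center/cloud/cluster); the deep learning model is trained in the data center, cloud, or cluster, then compressed and run closer to the sensors/devices]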
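The slide does not say how the deep learning model is compressed before it runs near the devices. Post-training 8-bit quantization is one common technique. The sketch below shows affine (scale and zero-point) quantization of a list of weights in plain Python. It is an assumption about one possible approach, and the function names are hypothetical. Storing each weight as one byte instead of a 32-bit float cuts weight storage roughly fourfold. The cost is a rounding error of about one quantization step.

```python
from typing import List, Tuple


def quantize_uint8(weights: List[float]) -> Tuple[List[int], float, int]:
    """Affine 8-bit quantization: each weight w is approximated as scale * (q - zero_point)."""
    lo = min(min(weights), 0.0)   # the representable range must include 0.0
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255 or 1.0            # fall back to 1.0 for an all-zero input
    zero_point = round(-lo / scale)           # the integer code that represents 0.0
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float weights from 8-bit codes."""
    return [scale * (v - zero_point) for v in q]
```

Zero maps exactly to `zero_point`, so zero weights survive the round trip unchanged. Other weights come back within about one `scale` step of their original values.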
STREAMING DATA ARCHITECTURE
Data Flows on the Edges, Queries Everywhere
Sampling vs Monitoring Everything…
[Figure key: data sources, store, process/transform, observe]
PRIMUS INTER PARES
Cloud First!
Mobile First!
AI First!
Data First!!!
6 RECOMMENDATIONS
Define Your Application Requirements in Terms of Data

Streaming? Plan for it

Process/Analyze As Close to the Source as Possible

Move Intelligence To The Edge (Fog)

Parallelism in Algorithms? Exploit it with hardware

Start With Open Source for Infrastructure
KEEP IN TOUCH
adrian@storminsights.com
Twitter @ajbowles
Skype ajbowles
If you would like to connect on LinkedIn, please let me know that you registered for the Smart Data webinar series.
NEXT WEEK…
October 18, Enterprise Analytics Online, 1PM Eastern: Modern AI: From Machine Learning to Cognitive Computing
Upcoming SmartData Webinar Dates & Topics
Nov. 9 See Me, Feel Me, Touch Me, Heal Me: The Rise of the Cognitive Interface
Dec. 14 The Road to Autonomous Applications
Jan. 11 AI At The Edge: Pushing Intelligence to Fog Computing Nodes