Smart Data Webinar: Choosing the Right Data Management Architecture for Cognitive Computing
- 1. Choosing the Right Data Management Architecture
for Cognitive Computing
Adrian Bowles, PhD
Founder, STORM Insights, Inc.
Lead Analyst, AI, Aragon Research
info@storminsights.com
Copyright (c) 2017 by STORM Insights Inc. All Rights Reserved.
OCTOBER 12, 2017
- 2.
AGENDA - CHOOSING THE RIGHT DATA MANAGEMENT ARCHITECTURE FOR COGNITIVE COMPUTING
The Role of Data In AI & CC
What do we need to manage?
Application, Data, and Algorithm Attributes that Influence Architecture
Database Options
Open Source Infrastructure
Prebuilt Knowledge
Getting Started: Basic Principles
- 3. Model
COGNITIVE COMPUTING FUNDAMENTALS: MODELS & ASSUMPTIONS
Model
The Corpus, Assumptions, Algorithms
Used to
Generate & Score Hypotheses
or
Calculate The Strength of a Relationship
Principles that control the
development and representation
of natural intelligence in the
neocortex provide a guide to the
implementation of machine
intelligence. (Numenta
Hierarchical Temporal Memory)
A function applied to a string
representing data or a concept
results in a value or vector
meaningful for comparison.
A Model is an Abstract Representation of Reality
Essential Data for
Cognitive Computing
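The idea of a function that maps a string to a vector meaningful for comparison can be sketched as follows. This is a minimal illustration only: the character-trigram encoding and the helper names `trigram_vector` and `cosine_similarity` are assumptions chosen for the example, not the method used in the webinar.

```python
import math
from collections import Counter


def trigram_vector(text: str) -> Counter:
    """Map a string to a sparse vector of character-trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Compare two sparse vectors; 1.0 means identical direction."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical strings, used only to show that similar text scores higher.
a = trigram_vector("cognitive computing")
b = trigram_vector("cognitive computer")
c = trigram_vector("graph database")

similar = cosine_similarity(a, b)
dissimilar = cosine_similarity(a, c)
```

Here `similar` comes out larger than `dissimilar`, which is the property the slide describes: the vectors are only useful because they can be compared.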
- 4.
MODELS WILL MAKE OR BREAK YOUR APPLICATION
Your Model The Real World
“When the map and the terrain disagree, believe the terrain.”
Gause and Weinberg (Exploring Requirements)
- 6.
WHERE YOU ARE DICTATES WHAT YOU NEED
Ingest Analyze Maintain/Manage
- 7. When everything is connected…
New sources of data emerge
New sources of value emerge
Old assumptions must be challenged
THE IMPACT OF THE IOT
- 8. CHOICES HAVE CONSEQUENCES
How You Think About a Domain…
…influences your choice of maps and models…
rules and representations…and required operations.
- 9. HOW YOU ORGANIZE CONSTRAINS HOW YOU WORK - DESIGN WORKFLOW FIRST
- 10.
START WITH A TAXONOMY
A taxonomy represents the formal structure of classes or types of objects within a domain.
•Taxonomies are generally hierarchical and provide names for each class in the domain.
•May also capture the membership properties of each object in relation to the other objects.
•The rules of a specific taxonomy are used to classify or categorize any object in the domain, so
they must be complete, consistent, and unambiguous. This rigor in specification should ensure that
any newly discovered object must fit into one, and only one, category or object class.
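The "one, and only one, category" requirement can be checked mechanically. The sketch below is a hypothetical illustration: the `TaxonNode` class, the `classify` function, and the device taxonomy are all assumptions made for the example. It raises an error whenever the rules at some level are incomplete (no class matches) or ambiguous (more than one matches).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Obj = Dict[str, object]


@dataclass
class TaxonNode:
    """One class in a hierarchical taxonomy, with its membership rule."""
    name: str
    rule: Callable[[Obj], bool]
    children: List["TaxonNode"] = field(default_factory=list)


def classify(node: TaxonNode, obj: Obj) -> List[str]:
    """Return the path from the root to the single class that accepts obj."""
    if not node.rule(obj):
        raise ValueError(f"object does not belong under {node.name!r}")
    if not node.children:
        return [node.name]
    matches = [child for child in node.children if child.rule(obj)]
    if len(matches) != 1:
        problem = "incomplete" if not matches else "ambiguous"
        raise ValueError(f"taxonomy is {problem} below {node.name!r}")
    return [node.name] + classify(matches[0], obj)


# Hypothetical two-level taxonomy of devices.
devices = TaxonNode("Device", lambda o: True, [
    TaxonNode("Stationary", lambda o: not o["mobile"]),
    TaxonNode("Mobile", lambda o: bool(o["mobile"])),
])

path = classify(devices, {"name": "smartphone", "mobile": True})
```

With these assumed rules, `path` is `["Device", "Mobile"]`. A taxonomy whose sibling rules overlap would fail at classification time instead of silently picking one class.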
- 11.
ONTOLOGIES
An ontology formalizes and specifies the names, definitions,
and attributes of entities within a domain. For practical
purposes, an accepted ontology defines the domain.
- 12.
RDF - Resource Description Framework - A directed, labeled graph.
RDFS - RDF Schema - A language for representing RDF vocabularies.
SPARQL - SPARQL Protocol and RDF Query Language - A query language and protocol for RDF data.
OWL - The Web Ontology Language is a Semantic Web
language designed to represent knowledge about things
and relationships between things on the Web.
An OWL Document is an Ontology.
https://www.w3.org/2013/data/
THE SEMANTIC WEB - ALL DATA SHOULD BE ASSOCIATED WITH SEMANTIC ATTRIBUTES (MEANING)
BASICS OF THE W3C SEMANTIC WEB ONTOLOGY STACK
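To make "a directed, labeled graph" concrete, the sketch below stores RDF-style statements as (subject, predicate, object) triples in plain Python. It is an illustration under stated assumptions, not a real RDF library or SPARQL engine: the `ex:` resources, the `match` helper, and the `types_of` helper are hypothetical, and only the `rdf:type` and `rdfs:subClassOf` names come from the W3C vocabularies.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Each triple is one directed edge: subject --predicate--> object.
TRIPLES = [
    ("ex:Fitbit", "rdf:type", "ex:Sensor"),
    ("ex:Sensor", "rdfs:subClassOf", "ex:Device"),
    ("ex:Fitbit", "ex:measures", "ex:HeartRate"),
]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard."""
    for ts, tp, to in triples:
        if s in (None, ts) and p in (None, tp) and o in (None, to):
            yield (ts, tp, to)


def types_of(triples: Iterable[Triple], resource: str) -> Set[str]:
    """Collect asserted types plus superclasses reached via rdfs:subClassOf."""
    triples = list(triples)
    found = {o for _, _, o in match(triples, s=resource, p="rdf:type")}
    frontier = list(found)
    while frontier:
        cls = frontier.pop()
        for _, _, parent in match(triples, s=cls, p="rdfs:subClassOf"):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


fitbit_types = types_of(TRIPLES, "ex:Fitbit")
```

The wildcard pattern in `match` loosely mirrors a single SPARQL triple pattern, and `types_of` shows the kind of inference a schema vocabulary makes possible: the data never states that the Fitbit is a device, yet that fact follows from the subclass statement.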
- 13.
CRITICAL QUESTIONS…
What data do we need?
What data will be produced?
Where does the data get created?
Where does the data get analyzed/refined?
How do we present/output the data?
And for each data category & data lifecycle phase,
What does it look like?
How much is there?
Architectural
Influences
- 14. Copyright (c) 2014-2017 by STORM Insights Inc. All Rights Reserved.
DEEP STRUCTURE REQUIRES STRONGER METHODS FOR ANALYSIS
Perception: obvious
structure is easy to
process…
but most of the
interesting stuff isn’t
obvious to a
computer.
Issue:
Do we store or
generate all
intermediate forms?
- 15. DATA - SLOTH KILLS
STATIC: STORED
DIVERTED OR SAMPLED: STOP AND FRISK
STREAMING: IN MOTION
To understand (analyze) data…
Divert the flow?
Pool the data?
Evaluate everything without changing the flow?
Sample? (catch and release?)
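One way to "sample" a stream without diverting or pooling it is reservoir sampling, which keeps a fixed-size uniform random sample while every item passes through exactly once. The sketch below is an illustration of that general technique, not something the slides prescribe; the function name and the simulated stream are assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Simulated stream of 10,000 sensor readings (hypothetical data).
readings = (n * 0.1 for n in range(10_000))
sample = reservoir_sample(readings, k=100, rng=random.Random(42))
```

Memory use stays at `k` items regardless of how long the stream runs, which is what makes this a "catch and release" approach.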
- 16.
COMPLEXITY VS MOBILITY
[Figure: devices plotted by Data Complexity (Low to High) against mobility (Stationary to Mobile): CCTV, SmartPhone, Traffic Counter, Fitbit, Weather Station, Telematic Device]
- 17. DATA ATTRIBUTES DICTATE ARCHITECTURE CHOICES
[Figure: axes are Speed (Static to Streaming) and Structure/Complexity (Surface/Shallow to Dense/Deep)]
- 18. DATA LOCATION INFLUENCES ARCHITECTURE CHOICES
[Figure: axes are Speed (Static to Streaming) and Location (Sensor, Gateway, Cloud, Data Center)]
- 19. ALGORITHM ATTRIBUTES DICTATE ARCHITECTURE CHOICES
[Figure: axes are Parallelism (Sequential to Embarrassing) and Computational Complexity (n to p (polynomial))]
(Parallelism and computational complexity are not actually orthogonal…)
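An "embarrassingly" parallel algorithm is one whose work splits into chunks that need no communication with each other. A minimal sketch, assuming a hypothetical sum-of-squares workload and hypothetical helper names:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def chunked(values: Sequence[int], size: int) -> List[Sequence[int]]:
    """Split the input into independent chunks."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares(chunk: Sequence[int]) -> int:
    """Work on one chunk; needs nothing from any other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values: Sequence[int],
                            chunk_size: int = 1000,
                            workers: int = 4) -> int:
    """Map chunks to workers, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunked(values, chunk_size)))
    return sum(partials)


total = parallel_sum_of_squares(list(range(10_000)))
```

Threads are used here only to keep the sketch portable. For CPU-bound work in CPython, the same map-then-combine shape would more typically be run on processes or on parallel hardware, which is where the architecture choice comes in.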
- 20.
DATABASE OPTIONS
What Do You Want/Need to Store?
How much? How complex? How fast?
What Do You Want/Need to DO With What You Store?
Options Include…
Files, tables, trees, queues, stacks, lists…
Hierarchical
RDBMS
Object DBMS
NoSQL
Graph
How You Think About a Domain…
…influences your choice of maps and models…
rules and representations…and required operations.
Data Management
Options
- 21.
EVOLUTION OF DATA MANAGEMENT SOLUTIONS
Images courtesy of Wikipedia
Today:
Delta Airlines processes 5,000,000 business events per day
Pratt & Whitney jet engine: 5,000 sensors producing 10 GB/s per engine.
Formula 1 car sensors produce about 1.2 GB/s
and we need to predict the future…
Perform Operations on Data at Rest
- 22.
GRAPH DATABASES FOR GRAPH DATA!
Why choose a graph database?
Speed to delivery when the data is naturally modeled as a graph
Simplifies multi-hop queries
Visualization? Baked-in
Do you need an on-premises solution, or to manage your own database?
You Probably Already Think In Graphs if…
You watch detective shows
You remember relationships between people
You took a biology class
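The claim that graphs simplify multi-hop queries can be illustrated with a breadth-first traversal over an adjacency list. This is a plain-Python sketch, not a graph database query; the `KNOWS` relationships and the `within_hops` function are hypothetical.

```python
from collections import deque
from typing import Dict, List

# Hypothetical "who knows whom" relationships, as on a detective's wall.
KNOWS: Dict[str, List[str]] = {
    "alice": ["bob"],
    "bob": ["carol", "dave"],
    "carol": ["erin"],
    "dave": [],
    "erin": [],
}


def within_hops(graph: Dict[str, List[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Return each vertex reachable in 1..max_hops steps with its hop count."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if dist[vertex] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(vertex, []):
            if neighbor not in dist:
                dist[neighbor] = dist[vertex] + 1
                queue.append(neighbor)
    del dist[start]
    return dist


two_hops = within_hops(KNOWS, "alice", 2)
```

With this data, `two_hops` is `{"bob": 1, "carol": 2, "dave": 2}`. In a relational schema the same question usually means one self-join per hop, whereas a graph store treats following an edge as the basic operation.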
- 23.
Wikipedia contributors. "Taxonomy (biology)." Wikipedia,
The Free Encyclopedia. Wikipedia, The Free Encyclopedia,
11 May. 2016. Web. 12 May. 2016.
GRAPHS 101
- 24.
Typical crazy wall whiteboard - from Fargo.
A screen from IBM I2 Coplink
GRAPHS 101
- 25.
GRAPHS 101
Family Tree
LinkedIn Tree
- 26. GRAPHS SHOULD BE PART OF YOUR TOOLKIT
A graph is a structure with vertices and edges.
[Figure: a graph with vertices a, b, c, d, e]
Edges: Old Post Road, Cross Highway, Compo, Shinbone Alley, Elk Road
Edge properties:
Old Post Road: paved, 11 miles
Elk Road: dirt, 2 miles
Cross Highway: toll road, 250 miles
Main Street: 1 mile
Shinbone Alley: .5 miles
Vertex properties:
a: bus stop
b: gas station, Shell
c: elementary school
d: house
e: office building
Vertices and edges may be labeled, edges may be directed, and all may
be stored/processed by properties
represented as key/value pairs.
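A property graph of this kind can be sketched with two small record types. The vertex and road properties below come from the slide, but which vertices each road connects is not recoverable from the slide text, so the endpoints are assumptions made purely for illustration, as are the class and function names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Vertex:
    id: str
    properties: Dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: Dict[str, object] = field(default_factory=dict)


vertices = {
    "a": Vertex("a", {"kind": "bus stop"}),
    "b": Vertex("b", {"kind": "gas station", "brand": "Shell"}),
    "c": Vertex("c", {"kind": "elementary school"}),
    "d": Vertex("d", {"kind": "house"}),
    "e": Vertex("e", {"kind": "office building"}),
}

# Endpoints are ASSUMED; only the labels and properties come from the slide.
edges = [
    Edge("a", "b", "Old Post Road", {"surface": "paved", "miles": 11}),
    Edge("b", "c", "Elk Road", {"surface": "dirt", "miles": 2}),
    Edge("c", "d", "Shinbone Alley", {"miles": 0.5}),
    Edge("d", "e", "Cross Highway", {"toll": True, "miles": 250}),
]


def edges_where(all_edges: List[Edge], **wanted: object) -> List[Edge]:
    """Select edges whose key/value properties match every requested pair."""
    return [e for e in all_edges
            if all(e.properties.get(k) == v for k, v in wanted.items())]


dirt_roads = [e.label for e in edges_where(edges, surface="dirt")]
total_miles = sum(e.properties["miles"] for e in edges)
```

Because properties are ordinary key/value pairs, new attributes (a speed limit, a timestamp) can be attached to any vertex or edge without changing a schema.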
- 27.
GRAPHS HAVE RELEVANT MATHEMATICAL PROPERTIES
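e.g. If you represent a graph as an adjacency matrix M, then entry (i, j) of M^n
is the number of paths (walks) of length n from vertex i to vertex j in the original graph.
[Figure: a graph on vertices a, b, c, d, e and its 5 x 5 adjacency matrix M, with rows and columns ordered a to e]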
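The matrix-power property can be verified on a small example. The edge set below is an assumption chosen for illustration (the slide's own matrix is not reproduced here), and `mat_mul` and `mat_pow` are hypothetical helper names.

```python
from typing import List

Matrix = List[List[int]]

NAMES = ["a", "b", "c", "d", "e"]
# ASSUMED directed edges, for illustration only.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]

# Adjacency matrix: M[i][j] == 1 when there is an edge from i to j.
INDEX = {name: i for i, name in enumerate(NAMES)}
M: Matrix = [[0] * len(NAMES) for _ in NAMES]
for src, dst in EDGES:
    M[INDEX[src]][INDEX[dst]] = 1


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Multiply two square integer matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A: Matrix, n: int) -> Matrix:
    """Raise a square matrix to a non-negative integer power."""
    size = len(A)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, A)
    return result


M2 = mat_pow(M, 2)
M3 = mat_pow(M, 3)
walks_a_to_c_len2 = M2[INDEX["a"]][INDEX["c"]]
walks_a_to_e_len3 = M3[INDEX["a"]][INDEX["e"]]
```

For this assumed graph there is exactly one walk of length 2 from a to c (through b) and exactly one walk of length 3 from a to e (through c and d), and the corresponding matrix entries agree.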
- 28.
OVERVIEW OF THE GRAPH DATABASE MARKET
Wikipedia contributors. "Graph database." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 11
Property
graph
RDF
RDF - Resource Description Framework, W3C specs for
metadata modeling, now used in knowledge management
- 29.
OPEN SOURCE FOR GRAPH DATA
Apache TinkerPop, TinkerPop, Apache, Apache feather logo, and Apache TinkerPop project logo are
either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
Apache TinkerPop™ is a graph computing framework for both
graph databases (OLTP) and graph analytic systems (OLAP).
“A graph is a structure composed of vertices and edges. Both vertices and edges
can have an arbitrary number of key/value-pairs called properties. Vertices denote
discrete objects such as a person, a place, or an event. Edges denote relationships
between vertices. For instance, a person may know another person, have been
involved in an event, and/or was recently at a particular place. Properties express
non-relational information about the vertices and edges. Example properties include
a vertex having a name, an age and an edge having a timestamp and/or a weight.
Together, the aforementioned graph is known as a property graph and it is the
foundational data structure of Apache TinkerPop.”
Apache TinkerPop™ is an open source, vendor-agnostic, graph computing
framework distributed under the commercial friendly Apache2 license. When a data
system is TinkerPop-enabled, its users are able to model their domain as a graph
and analyze that graph using the Gremlin graph traversal language.
- 30. OPEN SOURCE PROJECTS
Apache Spark
Registered trademarks or trademarks of The Apache Software Foundation
UIMA
Hadoop
Open Source
for Infrastructure
- 31. RELEVANT APACHE SOFTWARE FOUNDATION OPEN SOURCE PROJECTS
Apache Storm: “a free and open source distributed realtime
computation system. Storm makes it easy to reliably process
unbounded streams of data, doing for realtime processing what
Hadoop did for batch processing.”
Apache Spark Streaming: “Spark Streaming brings Apache
Spark's language-integrated API to stream processing, letting you
write streaming jobs the same way you write batch jobs.”
- 32. RELEVANT APACHE SOFTWARE FOUNDATION OPEN SOURCE PROJECTS
Apache Flink: “open-source stream processing framework for
distributed, high-performing, always-available, and accurate data
streaming applications.”
Apache Samza: “a distributed stream processing framework. It
uses Apache Kafka for messaging, and Apache Hadoop YARN to
provide fault tolerance, processor isolation, security, and resource
management.”
Apache Apex: “Enterprise-grade unified stream and batch
processing engine.”
- 33.
USE PRE-BUILT KNOWLEDGE RESOURCES
Off The Shelf
Knowledge
- 35.
OFF THE SHELF KNOWLEDGE - NEED TO ASSOCIATE/RECOGNIZE/UNDERSTAND TO
ORGANIZE/REPRESENT
WordNet(R)
Princeton University. "About WordNet." Princeton University. 2010. <http://wordnet.princeton.edu>
- 36. Do you have or can you capture streaming data that can increase your value proposition?
Data about your product that can improve performance, reliability, predictability…
Can you create value from new analysis of open data?
Adding your own data/algorithms to open data creates value.
Start by evaluating the emerging open source de facto standards.
Choose an infrastructure that allows you to evaluate live streaming data in the context of
relevant historical data.
It’s All About the Data
GETTING STARTED…
Basic
Principles
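Evaluating live streaming data in the context of relevant historical data can be as simple as scoring each new reading against a stored baseline. The sketch below uses a z-score test, which is one common choice and an assumption here; the function name, the threshold, and the temperature values are hypothetical.

```python
import statistics
from typing import Sequence


def is_anomalous(reading: float, history: Sequence[float], threshold: float = 3.0) -> bool:
    """Flag a live reading that sits far outside the historical distribution."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)  # needs at least two historical points
    if stdev == 0:
        return reading != mean
    return abs(reading - mean) / stdev > threshold


# Hypothetical historical temperatures for one sensor.
history = [70.1, 69.8, 70.3, 70.0, 69.9]

normal = is_anomalous(70.2, history)
spike = is_anomalous(85.0, history)
```

Here `normal` is `False` and `spike` is `True`. The architectural point is that the decision is made on the live reading but depends on stored data, so the infrastructure has to serve both.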
- 37. Today:
Delta Airlines processes 5,000,000 business events per day
Pratt & Whitney jet engine: 5,000 sensors producing 10 GB/s per engine.
Formula 1 car sensors produce about 1.2 GB/s
and we need to predict the future…
AS THE SCOPE CHANGES, SO MUST THE SOLUTIONS
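To get a feel for the scale of the figures quoted on this slide, the rates can be converted to daily volumes. The arithmetic below assumes continuous operation for a full 24 hours and decimal units (1 TB = 1,000 GB); both are simplifying assumptions, not claims from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Rates quoted on the slide.
engine_gb_per_second = 10
f1_gb_per_second = 1.2
airline_events_per_day = 5_000_000

# Derived figures, assuming continuous 24-hour operation.
engine_tb_per_day = engine_gb_per_second * SECONDS_PER_DAY / 1000
f1_tb_per_day = f1_gb_per_second * SECONDS_PER_DAY / 1000
airline_events_per_second = airline_events_per_day / SECONDS_PER_DAY

print(f"Jet engine: {engine_tb_per_day:.0f} TB/day")
print(f"Formula 1 car: {f1_tb_per_day:.2f} TB/day")
print(f"Airline: {airline_events_per_second:.1f} events/s")
```

Under those assumptions a single engine would produce about 864 TB per day, while the airline's event stream averages only about 58 events per second. The workloads differ by orders of magnitude, which is why the solutions must change with the scope.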
- 38.
PRODUCTION ARCHITECTURE VS TRAINING ARCHITECTURE: CHALLENGE YOUR ASSUMPTIONS
In Production,
May Scale UP or DOWN.
- 39.
[Figure: The Source, Gateway, Fog, Cloud, Data Center]
SHOULD YOU MOVE THE COMPUTATION TO THE DATA, OR DATA TO THE PROCESSOR?
- 41.
3-TIER IOT ARCHITECTURE ENABLES DISTRIBUTED INTELLIGENCE & ANALYTICS
Sensors/
Devices
Train the Deep Learning Model
Data Center
Cloud
Cluster
Network
Compress & Run
The DL Model
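The slide does not say how the model is compressed before it runs near the sensors. One widely used option is weight quantization, sketched below as an illustration under that assumption: weights trained as floats in the data center are mapped to 8-bit integers plus a single scale factor. The function names and weight values are hypothetical.

```python
from typing import List, Tuple


def quantize(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to integers in [-127, 127] with one shared scale."""
    if not weights:
        return [], 1.0
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights on the device."""
    return [q * scale for q in quantized]


# Hypothetical trained weights.
weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now needs one byte instead of four or eight, at the cost of an error of at most half the scale per weight. That trade is what lets a model trained on a cluster run on a constrained device.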
- 43. PRIMUS INTER PARES
Cloud First!
Mobile First!
AI First!
Data First!!!
- 44.
6 RECOMMENDATIONS
Define Your Application Requirements in Terms of Data
Streaming? Plan for it
Process/Analyze As Close to the Source as Possible
Move Intelligence To The Edge (Fog)
Parallelism in Algorithms? Exploit it with hardware
Start With Open Source for Infrastructure
- 45. adrian@storminsights.com
Twitter @ajbowles
Skype ajbowles
If you would like to connect on LinkedIn,
please let me know that you
registered for the Smart Data webinar series.
NEXT WEEK…
October 18 Enterprise Analytics Online
1PM Eastern:
Modern AI: From Machine Learning to Cognitive Computing
KEEP IN TOUCH
Upcoming SmartData Webinar Dates & Topics
Nov. 9 See Me, Feel Me, Touch Me, Heal Me:
The Rise of the Cognitive Interface
Dec. 14 The Road to Autonomous Applications
Jan. 11 AI At The Edge:
Pushing Intelligence to Fog Computing Nodes