© 2017 MapR Technologies 1
Real World Impact
of a Global Data Fabric
Contact Information
Ellen Friedman, PhD
Principal Technologist, MapR Technologies
Committer Apache Drill & Apache Mahout projects
O’Reilly author
Email efriedman@mapr.com ellenf@apache.org
Twitter @Ellen_Friedman #BigDataDLN
A big idea - when executed well -
can have enormous reach
Image credit: NASA, June 2015, DSCOVR mission
http://bit.ly/nasa-europe-from-space
From 1 million miles away:
Sometimes all the pieces are
already there – just need
the big idea
It all happened in the 19th century…
Matthew Fontaine Maury was an officer in the US
Navy in the 1830s
He injured his leg, so the
Navy gave him a “desk job”
Oddly, that’s where his
real adventure started
Big data project: Maury’s Wind and Currents charts
At first, nobody was
interested in them…
…until Captain Jackson
shaved a month off the run
from Baltimore to
Rio de Janeiro
Then everybody
wanted one!
Building a global data
fabric can have huge impact
Pre-Existing Environments – Complex Silos
Build a Global Data Fabric
Flexibility & agility to respond well as life changes
Business Happens at the Speed of Life
Singapore, San Jose, London, New York
Images © E. Friedman 2017
Data fabric doesn’t have to
be global to be a big
advantage
But lots of people want to
take advantage of cloud –
data fabric helps
Cloud Neutrality for Optimization
[Diagram: Core (private, on-premise data center) and Burst; annotations: 4x cheaper for base load, 4x cheaper for peak loads]
Impact of a Data Fabric
• Agility & speed: respond to changes in outside world
• Handle many types of data, from many sources, at scale
• Comprehensive data, fine-grained control of who has access
• From edge to center: Computation where you need it
• Take advantage of flexibility offered by new technologies
– Share “raw” data; localize refined, special-purpose data
– You don’t have to know all decisions about data use from the start
• Don’t need a huge team to administer multi-tenant system
• Uniform security
Does it have real world
impact?
Example: MapR Customer with Metrics from Many Sources
[Diagram: in each of three data centers, web servers write web-server logs that log-stash forwards as log_events to a log consolidator. In the GHQ data center, the log_events feed an events stream; the events are elaborated (log-stash), aggregated, and used for signal detection.]
“A year of developer time in
the bank”
MapR Edge Computing: Changes Decision Time
• Connected car industry
• Telecommunications industry
• Hospitals and medical testing facilities
Images © Ellen Friedman and © Wes Abrams
MapR Edge: Improves Time to Insight
Industry          Before MapR Edge    After MapR Edge
Oil & gas         48 hours            < 2 hours
Medical device    12 hours            < 15 minutes
Car test & dev    24 hours            < 5 minutes
Global Data Fabric:
How do you build it?
Heart of Stream-1st Architecture: Message Transport
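[Diagram: medical tests produce a stream of medical test results, consumed by real-time analytics, EMR, patient facilities management, and insurance audit; the consumers are grouped into use-case classes A, B, and C]
The right messaging tool supports several classes of use cases (A, B & C)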
Image © 2016 Ted Dunning & Ellen Friedman from Chap 1 O’Reilly book Streaming Architecture
Stream Transport that Decouples Producers & Consumers
[Diagram: producers (P) write to the transport layer (Kafka / MapR Streams); consumers (C) in the processing layer read from it]
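The decoupling on this slide can be sketched in a few lines of Python. This is a toy, in-memory illustration only: the Stream and Consumer classes are hypothetical names invented for this example and are not the Kafka or MapR Streams API. It assumes the usual log-style model, in which producers append messages and each consumer tracks its own read offset.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """Append-only log; a toy stand-in for one topic partition (hypothetical)."""
    messages: list = field(default_factory=list)

    def append(self, message):
        """Producer side: add a message and return its offset."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset, max_messages=10):
        """Consumer side: return messages starting at the given offset."""
        return self.messages[offset:offset + max_messages]


@dataclass
class Consumer:
    """Tracks its own position, so it never coordinates with producers."""
    name: str
    offset: int = 0
    seen: list = field(default_factory=list)

    def poll(self, stream):
        batch = stream.read(self.offset)
        self.seen.extend(batch)
        self.offset += len(batch)
        return batch


stream = Stream()
analytics = Consumer("analytics")
audit = Consumer("audit")

stream.append("event-1")
stream.append("event-2")
analytics.poll(stream)       # analytics reads both events right away

stream.append("event-3")     # the producer keeps writing regardless
audit.poll(stream)           # audit starts late and still sees everything
analytics.poll(stream)       # analytics picks up only what it has not seen
```

Because the producer never waits for a consumer, and each consumer only advances its own offset, a new consumer can be added later without touching existing code.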
Stream transport supports
microservices
Stream-first Architecture: Basis for Micro-Services
Stream instead of database as the shared “truth”
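[Diagram: POS terminals (1..n) send transactions to a fraud detector; a card activity stream connects the fraud detector, the updater of the last-card-use store, card analytics, and other consumers]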
Image © 2016 Ted Dunning & Ellen Friedman from Chap 6 of O’Reilly book Streaming Architecture used with permission
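One way to read "stream instead of database as the shared truth" is sketched below. It is a toy under stated assumptions: the function names, the event fields, and the fraud rule (flag a card whose location differs from its last use) are all hypothetical and chosen only to keep the example short. The point is that each service derives its own view by reading the card activity stream rather than by sharing one database.

```python
card_activity = []   # the stream: an append-only list of events
last_card_use = {}   # local view owned by the updater service


def fraud_detector(card, location):
    """Score a POS transaction, then publish it to the card activity stream."""
    previous = last_card_use.get(card)
    # Toy rule (an assumption): a change of location counts as suspicious.
    suspicious = previous is not None and previous != location
    card_activity.append(
        {"card": card, "location": location, "suspicious": suspicious}
    )
    return suspicious


def updater(offset):
    """Replay the stream from `offset` to refresh the last-card-use view."""
    for event in card_activity[offset:]:
        last_card_use[event["card"]] = event["location"]
    return len(card_activity)


def card_analytics():
    """An independent consumer that builds its own view from the same stream."""
    counts = {}
    for event in card_activity:
        counts[event["card"]] = counts.get(event["card"], 0) + 1
    return counts


updater_offset = 0
first = fraud_detector("card-42", "London")
updater_offset = updater(updater_offset)
second = fraud_detector("card-42", "Singapore")
updater_offset = updater(updater_offset)
usage = card_analytics()
```

Card analytics was added here without changing the fraud detector or the updater, which is the practical benefit of keeping the stream, not a shared table, as the common record.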
Platform Has a Big Role: Example
Legacy Applications, Big Data 1.0 Applications, Next-Gen Applications
MapR Converged Data Platform
Web-Scale Storage, Real-Time NoSQL Database, Stream Transport
High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace
MapR: Universal Pathnames & Global Namespace
[Diagram: a cluster namespace with a volume mount point, containing directories, files, tables, and streams]
MapR Stream Replication
[Diagrams: first a data source, one stream, and a consumer; then the same with a second, replicated stream]
Geo-distributed Data Appears Local
[Diagram: a data source and a consumer use streams replicated between a global data center and a regional data center]
Streams Replication Across Data Centers
• Replication across data centers with preserved offsets (unlike Kafka)
• Opens new use cases
Example: Shared inventory in ad-tech industry
[Diagram: data center 1 and data center 2 each hold an inventory model and local state; a central data center holds global analytics and a database]
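Why preserved offsets matter can be shown with a small sketch. This is a toy model, not the MapR Streams replication mechanism: ReplicatedStream and its methods are hypothetical names, and the bid messages are made up. It assumes only what the slide states, namely that a replicated message keeps the offset it had in the source stream.

```python
class ReplicatedStream:
    """Toy stream whose replica keeps the same offsets as the source."""

    def __init__(self):
        self.log = {}  # offset -> message

    def append(self, message):
        offset = len(self.log)
        self.log[offset] = message
        return offset

    def replicate_to(self, replica):
        """Copy missing messages, keeping each one at its original offset."""
        for offset, message in self.log.items():
            replica.log.setdefault(offset, message)

    def read_from(self, offset):
        return [self.log[o] for o in sorted(self.log) if o >= offset]


regional = ReplicatedStream()
central = ReplicatedStream()

for bid in ["bid-a", "bid-b", "bid-c", "bid-d"]:
    regional.append(bid)
regional.replicate_to(central)

# A consumer has processed two messages in the regional data center...
committed_offset = 2
# ...then moves to the central data center and resumes at the same offset.
resumed = central.read_from(committed_offset)
```

If the replica had assigned its own offsets, the committed position would mean something different in each data center and the consumer could skip or reprocess messages after moving.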
Financial Services: Leveraging Many Topics
Multi-Master Table Replication
[Diagram: two scenarios, A and B, each showing a source writing to a database in SF and a source writing to a database in NY]
Bi-directional replication helps
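A minimal sketch of bi-directional replication between the SF and NY tables follows. The slide does not say how conflicting writes are resolved, so the rule used here (the newest timestamp wins, with ties broken by site name) is an assumption made for illustration, and the Table class is hypothetical rather than any MapR API.

```python
class Table:
    """Toy multi-master table: each row stores (timestamp, site, value)."""

    def __init__(self, site):
        self.site = site
        self.rows = {}

    def put(self, key, value, timestamp):
        self._apply(key, (timestamp, self.site, value))

    def _apply(self, key, cell):
        # Assumed rule: keep the newer cell; ties break on site name so
        # that both sides reach the same answer.
        current = self.rows.get(key)
        if current is None or cell[:2] > current[:2]:
            self.rows[key] = cell

    def replicate_to(self, other):
        for key, cell in self.rows.items():
            other._apply(key, cell)

    def get(self, key):
        return self.rows[key][2]


sf = Table("SF")
ny = Table("NY")

sf.put("account-1", "limit=500", timestamp=1)
ny.put("account-2", "limit=900", timestamp=2)
ny.put("account-1", "limit=750", timestamp=3)  # later update to the same row

# Replicate in both directions.
sf.replicate_to(ny)
ny.replicate_to(sf)
```

After both replication passes the two tables hold identical rows, even though each site accepted writes locally.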
Three Aspects
Global Data Fabric
Data Platform
DataOps
Developers get excited about
DataOps
DataOps: Brings Flexibility & Focus
• You don’t have to be a data scientist to contribute to machine learning
• Software engineers and developers play a role, but you need good data skills
Machine Learning Everywhere
Image courtesy of Mtell, used with permission. Images © Ellen Friedman.
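Rendezvous Architecture for Easier ML Logistics
[Diagram: raw data plus features / profiles form the input; models m1, m2, and m3 and a decoy all read the input; the decoy writes to an archive; model scores go to the rendezvous, which emits results; metrics are collected from the models and the rendezvous]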
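The selection step in a rendezvous-style design can be sketched as follows. This is a simplified, synchronous toy: the function names, the latency figures, the deadline, and the fallback value are all assumptions for illustration, and a real deployment would read scores from a stream rather than a dictionary.

```python
def rendezvous(scores, preference, deadline_ms, fallback):
    """Return the result of the most preferred model that met the deadline.

    scores maps model name -> (score, latency_ms); preference lists model
    names from most to least preferred.
    """
    for model in preference:
        if model in scores and scores[model][1] <= deadline_ms:
            return model, scores[model][0]
    return "fallback", fallback


archive = []


def decoy(request):
    """The decoy sits beside the models but only archives the input it sees."""
    archive.append(request)


# Hypothetical models: each returns (score, latency in milliseconds).
models = {
    "m1": lambda request: (0.91, 120),   # preferred, but slow on this request
    "m2": lambda request: (0.87, 40),
    "m3": lambda request: (0.60, 5),
}

request = {"id": 1, "features": [0.2, 0.7]}
decoy(request)
scores = {name: model(request) for name, model in models.items()}
chosen = rendezvous(scores, ["m1", "m2", "m3"], deadline_ms=50, fallback=0.5)
```

Because every model sees the same input and the decoy archives exactly that input, a new model can later be evaluated against recorded traffic before it is moved up the preference list.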
New: Machine Learning Logistics
Model Management in the Real World
O’Reilly book by Ellen Friedman & Ted Dunning © Sept 2017
Free pdf copy of book courtesy of MapR:
https://mapr.com/ebook/machine-learning-logistics/
Visit MapR booth for free book signings & booth theater
presentation
Thur schedule:
Book signing: 12:30 – 13:30
Booth presentation by Ted Dunning: 15:15 – 15:30
Please support women in tech – help build
girls’ dreams of what they can accomplish
© Ellen Friedman 2015 #womenintech #datawomen
Thank you!


Big Data LDN 2017: Real World Impact of a Global Data Fabric
