Privileged and confidential
Open Blueprint for Real-Time Analytics in Retail
Victoria Livschitz, Founder & CTO, Grid Dynamics
03/16/2017
Business Need
About the speaker: Victoria Livschitz
• Chairman & CTO: present
• Founder and CEO: 2006 – 2013
• Principal engineer @Sun: 1997 – 2006
About Grid Dynamics:
• Engineering IT services company focused on digital transformation through cloud & open source for Fortune 500 clients.
• Pioneer in real-time processing from inception in 2006.
• Frequent contributor to open source projects: Hadoop, Solr, Lucene, Storm, others.
What is “real-time”, anyways?
What is “real-time” in analytics, ML, DS & AI?
[Diagram: Event → Receive event → Analyze event → Act on event → Response, with an “Augment model” step. The cycle is labeled “Learning” and “Analysis” and placed on a time scale: weeks, days, hours, seconds.]
• How long is the cycle?
• What is done online vs. offline?
Value of “real-time”
1. Offline learning/analytics, online response
• Online (a few seconds): Event → Receive event → Act on event → Response
• Offline (a day or more): receive event, augment model, analyze event, modify reaction
2. Offline learning, real-time analytics, online response
• Online (a few seconds): Event → Receive event → Analyze event → Act on event → Response
• Offline (a day or more): receive event, augment model
3. Real-time learning/analytics, online response
• Online (a few seconds): Event → Receive event → Analyze event → Act on event → Response, with “Augment model” in the same cycle
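The difference between offline and real-time learning can be sketched in a few lines of Python. This is only an illustration of the idea on the slides, not code from the blueprint: the `MeanModel` class, the "twice the average" rule, and the order amounts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MeanModel:
    """Toy model (hypothetical): remembers the average order amount seen so far."""
    count: int = 0
    mean: float = 0.0

    def augment(self, amount: float) -> None:
        # Incremental mean update: one event at a time, no stored history.
        self.count += 1
        self.mean += (amount - self.mean) / self.count

    def analyze(self, amount: float) -> bool:
        # "Analysis": flag an order more than twice the learned average.
        return self.count > 0 and amount > 2 * self.mean


def offline_learning(events, model, backlog):
    """Mode 2: analyze and respond per event; learning waits for a batch job."""
    responses = []
    for amount in events:
        responses.append(model.analyze(amount))
        backlog.append(amount)  # kept for the next offline retraining run
    return responses


def retrain(model, backlog):
    """The offline step: fold the accumulated backlog into the model."""
    for amount in backlog:
        model.augment(amount)
    backlog.clear()


def realtime_learning(events, model):
    """Mode 3: analyze, respond, and augment the model in the same cycle."""
    responses = []
    for amount in events:
        responses.append(model.analyze(amount))
        model.augment(amount)
    return responses
```

With an empty model and the events `[10, 10, 10, 50]`, `realtime_learning` flags the last order because the model has already learned from the first three, while `offline_learning` flags nothing until `retrain` has run.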
Top 6 drivers of real-time applications
#1. Personalized search
Augment search hits and relevancy ranking based on personal context & history
#2. Personalized offers
Motivate “buy now” behavior by offering deals based on personal context & history
#3. Dynamic pricing
Determine “right price” for products based on availability, trending, personal context & competitive price
#4. Dynamic inventory
Predict inventory needs & re-stock products in stores based on fluctuations in inventory & demand
#5. Intelligent sourcing
Determine which order to source from which store to optimize delivery SLAs & shipment costs
#6. Real-time alerts
Detect unusual patterns: fraud, surge in demand, weather changes, shift in brand sentiment. Respond right away
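As one illustration of driver #6, a surge in demand could be detected by comparing each new reading with a short rolling window. The sketch below is an assumption for illustration only: the `SurgeDetector` class, the window size, and the z-score threshold are hypothetical and do not come from the slides.

```python
from collections import deque
from statistics import mean, pstdev


class SurgeDetector:
    """Flags a reading that sits far above the recent rolling window."""

    def __init__(self, window: int = 5, threshold: float = 3.0):
        self.history = deque(maxlen=window)  # most recent readings only
        self.threshold = threshold           # z-score that counts as a surge

    def observe(self, orders_per_minute: float) -> bool:
        alert = False
        # Only judge a reading once the window is full.
        if len(self.history) == self.history.maxlen:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            # Guard against a flat window, where sigma is zero.
            if sigma > 0 and (orders_per_minute - mu) / sigma > self.threshold:
                alert = True
        self.history.append(orders_per_minute)
        return alert


detector = SurgeDetector()
alerts = [detector.observe(x) for x in [100, 102, 98, 101, 99, 180]]
```

Here only the final reading (180) raises an alert. The detector keeps a fixed-size window, so memory use stays constant however long the stream runs.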
Emerging platform for real-time analytics: In-Stream Processing (ISP)
In a complex landscape of Big Data systems…
…in-stream processing service is an approach to building real-time extensions of Big Data applications (today’s focus)
Rapidly growing applications in multiple industries
• Fraud detection
• Sentiment analytics
• Preventive maintenance
• Facilities optimization
• Network monitoring
• Intelligence and surveillance
• Risk management
• E-commerce
• Clickstream analytics
• Dynamic pricing
• Supply chain optimization
• Predictive medicine
• Transaction cost analysis
• Market data management
• Algorithmic trading
• Data warehouse augmentation
ISP is ideal for:
• Real-time data ingress to replace batch ETLs
• Real-time identification of one-in-a-million “actionable insights”
• Real-time response to actionable insights
• Real-time learning from new data
Grid Dynamics open blueprint for ISP
Blueprint goals
• Pre-integrated
• Real-time streaming; real-time ML
• Cloud-ready
• Proven mission-critical use
• Open source (and built 100% with open source)
• Production-ready
• Portable across clouds
• Extendable
Target performance & reliability SLAs
• Throughput: scales to 100,000s of events per second
• Latency: seconds to compute; minutes to deliver results
• ML strength: full power of streaming algorithms
• Reliability: built-in data loss mitigation mechanisms
• Availability: 99.99%+ on commodity cloud infrastructure
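To make these targets concrete, the arithmetic below converts them into a yearly downtime budget and a daily event volume. It assumes that "99.99+" means percent availability and takes 100,000 events per second as the low end of "100,000s". The function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Downtime budget per year implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def events_per_day(events_per_second: int) -> int:
    """Daily volume implied by a sustained per-second throughput."""
    return events_per_second * SECONDS_PER_DAY


downtime = allowed_downtime_minutes(99.99)  # about 52.56 minutes per year
daily_volume = events_per_day(100_000)      # 8,640,000,000 events per day
```

At 99.99% the platform may be unavailable for roughly 52.56 minutes per year, and a sustained 100,000 events per second adds up to 8.64 billion events per day.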
Selected stack for ISP blueprint
• REST API
• Message Queue
• HDFS
• Other
Common ISP systems interfaces
Every component is scalable in its own way
• No single point of failure
• Automatic failover
• Data replication
Designed as a complete platform
• No single points of failure
• No bottlenecks
• Built-in scaling
• Dockerized
• Deployable to any cloud
• Bindings for Mesos/Marathon
• Reference implementation for AWS (open source)
• Reference demo: real-time Twitter sentiment analytics for new movie reviews
ISP reference implementation: fully-automated DevOps for running ISP on any modern cloud
Chosen DevOps stack for RI
• Cloud: AWS
• Deployment unit: Docker container
• Container management: Mesos & Marathon
• Bare cloud infrastructure deployment: Ansible
• Orchestration & application management: Tonomi (for now)
How to achieve cloud portability?
• Phase 1: bootstrap management cluster
  • [manual] Choose a cloud. Get a set of VMs (6) to host the management cluster
  • [automated] Deploy & configure Mesos/Marathon cluster on available VMs
• Phase 2: use management cluster to provision ISP environments
  • [automated] Deploy all ISP components as Docker containers
  • [automated] Deploy analytics application components (like Twitter API)
  • [automated] Configure all dependencies
  • [automated] Scale on-demand
  • [automated] Shut down when done
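The Phase 2 steps map naturally onto Marathon's REST API, where an application is described by a JSON document and resized by updating its instance count. The sketch below only builds those payloads and sends nothing over the network. The app id, image name, and resource figures are hypothetical, and the blueprint's actual automation is not shown in the slides.

```python
import json


def marathon_app(app_id: str, image: str, instances: int,
                 cpus: float, mem_mb: int) -> dict:
    """Build a minimal Marathon app definition for a Docker container."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem_mb,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},  # hypothetical image name
        },
    }


def scale_request(app_id: str, instances: int):
    """Describe the request that resizes an app: (method, path, body)."""
    return "PUT", f"/v2/apps{app_id}", {"instances": instances}


# Deploy: this JSON would be POSTed to /v2/apps on the Marathon master.
app = marathon_app("/isp/ingest", "example/isp-ingest:latest",
                   instances=3, cpus=0.5, mem_mb=512)
print(json.dumps(app, indent=2))

# Scale on demand: raise the instance count of the same app.
print(scale_request("/isp/ingest", 5))
```

Keeping every component behind the same container definition is what makes the second phase cloud-neutral: only Phase 1 needs to know which cloud supplied the VMs.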
Topology with Twitter data analytics demo
“Take ISP for a spin” demo: Real-time Twitter sentiment analytics for new movie reviews
Real-time demo, a.k.a. “Data Science Kitchen”
• Provide reference example on how to use ISP platform…
• ... and learn the basics of data science along the way
• Gets actual Twitter data via streaming API
• Analyses & visualizes what people think about latest movies
• Exposes data science “kitchen”: models, training sets, dictionaries
• Provides nice web UI to play with data
• Uses our ISP RI (reference implementation)
• Demo is running on AWS as a public service
• Everything is open sourced
• Documentation on our Tech Blog
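The slides do not describe the demo's actual models, so the following is only a stand-in showing how a dictionary could drive sentiment scoring. The word lists, function names, and sample tweets are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical sentiment dictionaries; a real deployment would use far larger ones.
POSITIVE = {"great", "love", "amazing", "brilliant"}
NEGATIVE = {"boring", "awful", "hate", "terrible"}


def score(tweet: str) -> int:
    """Positive-word count minus negative-word count."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def label(tweet: str) -> str:
    """Map the numeric score to a sentiment label."""
    s = score(tweet)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


def summarize(tweets) -> Counter:
    """Count tweets per sentiment label, e.g. for one movie."""
    return Counter(label(t) for t in tweets)


sample = [
    "I love this movie, amazing cast",
    "Boring and awful",
    "Saw it yesterday",
]
summary = summarize(sample)
```

For the three sample tweets the summary contains one positive, one negative, and one neutral entry. A dictionary approach is easy to inspect, which fits the "kitchen" idea of exposing the ingredients, but it misses negation and sarcasm.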
Demo app: pick movies you want to monitor
Compare different views on data
Compare trending between different movies
[Chart annotations: Carrie Fisher dies; Star Wars releases new movie; Oscar night. Callout: examples of positive & negative Carrie Fisher tweets]
Where to learn more
1. Read our blog: blog.griddynamics.com
• 7-part blog series on ISP
• 7-part blog series on Data Science Kitchen
2. Connect
• Twitter: @griddynamics
• Subscribe to our blog
• Drop email: info@griddynamics.com

Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 San Jose, CA
