This presentation outlines key business drivers for real-time analytics applications in retail and describes the emerging architectures based on In-Stream Processing (ISP) technologies. The slides present a complete open blueprint for an ISP platform - including a demo application for real-time Twitter Sentiment Analytics - designed with 100% open source components and deployable to any cloud.
To learn more, read the accompanying blog series on this topic: https://blog.griddynamics.com/in-stream-processing-service-blueprint
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 San Jose, CA
1. Privileged and confidential
Open Blueprint for Real-Time Analytics in Retail
Victoria Livschitz, Founder & CTO, Grid Dynamics
03/16/2017
2. Business Need
About the speaker: Victoria Livschitz
• Chairman & CTO: present
• Founder and CEO: 2006 – 2013
• Principal engineer @Sun: 1997 – 2006
About Grid Dynamics:
• Engineering IT services company focused on digital transformation through cloud & open source for Fortune 500 clients.
• Pioneer in real-time processing from inception in 2006.
• Frequent contributor to open source projects: Hadoop, Solr, Lucene, Storm, others.
4. What is “real-time” in analytics, ML, DS & AI?
[Diagram: Event → Receive event → Analyze event → Act on event → Response, with an “Augment model” step; the cycle is labeled with Learning and Analysis]
How long is the cycle?
What is done online vs. offline?
5. What is “real-time” in analytics, ML, DS & AI?
[Diagram: Event → Receive event → Analyze event → Act on event → Response, with an “Augment model” step; the cycle is labeled with Learning and Analysis and set against a time scale of Weeks, Days, Hours, Seconds]
How long is the cycle?
What is done online vs. offline?
7. Value of “real-time”
• 1. Offline learning/analytics, online response
[Diagram: Event → Receive event → Analyze event → Act on event → Response; Receive event → Augment model. Time labels: a few seconds; Day +]
• 2. Offline learning, real-time analytics, online response
[Diagram: Event → Receive event → Act on event → Response; Receive event → Augment model; Analyze event → Modify reaction. Time labels: a few seconds; a day or more]
8. Value of “real-time”
• 1. Offline learning/analytics, online response
[Diagram: Event → Receive event → Analyze event → Act on event → Response; Receive event → Augment model. Time labels: a few seconds; Day +]
• 2. Offline learning, real-time analytics, online response
[Diagram: Event → Receive event → Act on event → Response; Receive event → Augment model; Analyze event → Modify reaction. Time labels: a few seconds; a day]
• 3. Real-time learning/analytics, online response
[Diagram: Event → Receive event → Analyze event → Act on event → Response, with Augment model in the same cycle. Time label: a few seconds]
9. Top 6 drivers of real-time applications
#1. Personalized search
Augment search hits and relevancy ranking based on personal context & history
#2. Personalized offers
Motivate “buy now” behavior by offering deals based on personal context & history
#3. Dynamic pricing
Determine “right price” for products based on availability, trending, personal context & competitive price
#4. Dynamic inventory
Predict inventory needs & re-stock products in stores based on fluctuations in inventory & demand
#5. Intelligent sourcing
Determine what order to source from what store to optimize delivery SLAs & shipment costs
#6. Real-time alerts
Detect unusual patterns: fraud, surge in demand, weather changes, shift in brand sentiment. Respond right away
14. ISP is ideal for:
• Real-time data ingress to replace batch ETLs
• Real-time identification of one-in-a-million “actionable insights”
• Real-time response to actionable insights
• Real-time learning from new data
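The last three bullets (identify, respond, learn) can be illustrated with a minimal Python sketch. It is not part of the blueprint: the surge rule (a z-score against a running mean), the `threshold` and `warmup` parameters, and the sample data are all assumptions chosen for illustration. Each event is first checked against the current statistics and then folded into them, so the "model" keeps learning from new data.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online mean/variance: a tiny model updated by every event."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two events are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def detect_surges(events, threshold=3.0, warmup=5):
    """Yield (index, value) for events far above the running mean.

    `threshold` (in standard deviations) and `warmup` (events to observe
    before alerting) are illustrative assumptions, not blueprint settings.
    """
    stats = RunningStats()
    for i, x in enumerate(events):
        std = stats.std()
        if stats.n >= warmup and std > 0 and (x - stats.mean) / std > threshold:
            yield i, x      # identify and respond to the actionable insight
        stats.update(x)     # learn from the new event (outliers included)


# Hypothetical orders-per-minute stream with one surge.
orders_per_minute = [10, 12, 11, 9, 10, 11, 10, 95, 12, 10]
alerts = list(detect_surges(orders_per_minute))
print(alerts)
```

In a production pipeline the same check would run inside a stream processor rather than over a list, and one might exclude flagged outliers from the model update; this sketch keeps them for simplicity.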
17. Blueprint goals
• Pre-integrated
• Real-time streaming; real-time ML
• Cloud-ready
• Proven mission-critical use
• Open source (and built 100% with open source)
• Production-ready
• Portable across clouds
• Extendable
18. Target performance & reliability SLAs
• Throughput: Scales to 100,000s of events per second
• Latency: Seconds to compute; minutes to deliver results
• ML strength: Full power of streaming algorithms
• Reliability: Built-in data loss mitigation mechanisms
• Availability: 99.99%+ on commodity cloud infrastructure
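To put these targets in concrete terms, here is a small worked calculation in Python. It assumes the availability figure is a percentage (99.99%) measured over a 365-day year and takes 100,000 events per second as the low end of the throughput range; both readings are assumptions about the slide, not figures stated elsewhere in the deck.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes, ignoring leap years
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Yearly downtime budget implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Assumed reading of the slide: 99.99% availability.
downtime_budget = downtime_minutes_per_year(99.99)

# Assumed low end of "100,000s of events per second".
events_per_day = 100_000 * SECONDS_PER_DAY

print(f"Downtime budget: {downtime_budget:.2f} minutes per year")
print(f"Events per day at 100,000/s: {events_per_day:,}")
```

Under these assumptions the platform may be unavailable for roughly 53 minutes per year while absorbing on the order of 8.6 billion events per day.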
21. Every component is scalable in its own way
• No single point of failure
• Automatic failover
• Data replication
22. Designed as a complete platform
• No single points of failure
• No bottlenecks
• Built-in scaling
• Dockerized
• Deployable to any cloud
• Bindings for Mesos/Marathon
• Reference implementation for AWS (open source)
• Reference demo: real-time Twitter sentiment analytics for new movie reviews
25. How to achieve cloud portability?
• Phase 1: bootstrap the management cluster
  • [manual] Choose a cloud. Get a set of VMs (6) to host the management cluster
  • [automated] Deploy & configure a Mesos/Marathon cluster on the available VMs
• Phase 2: use the management cluster to provision ISP environments
  • [automated] Deploy all ISP components as Docker containers
  • [automated] Deploy analytics application components (like Twitter API)
  • [automated] Configure all dependencies
  • [automated] Scale on-demand
  • [automated] Shut down when done
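The automated Phase 2 steps map naturally onto Marathon's REST API, where an application is described as a JSON document posted to `/v2/apps`, scaled with a `PUT` that changes `instances`, and removed with a `DELETE`. The Python sketch below only builds those requests and does not send them. The app id `/isp/stream-processor`, the image name, and the resource sizes are hypothetical placeholders, not the blueprint's actual components or settings.

```python
import json


def marathon_app(app_id, image, instances=1, cpus=0.5, mem=512):
    """Build a Marathon app definition that runs a Docker container."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem,  # megabytes
        "container": {"type": "DOCKER", "docker": {"image": image}},
    }


def deploy_request(app):
    """(method, path, body) to deploy an app."""
    return ("POST", "/v2/apps", app)


def scale_request(app_id, instances):
    """(method, path, body) to scale an app on demand."""
    return ("PUT", f"/v2/apps{app_id}", {"instances": instances})


def shutdown_request(app_id):
    """(method, path, body) to shut an app down when done."""
    return ("DELETE", f"/v2/apps{app_id}", None)


# Hypothetical component: id, image, and sizes are placeholders.
app = marathon_app("/isp/stream-processor", "example/stream-processor:latest",
                   instances=3, cpus=1.0, mem=2048)
plan = [
    deploy_request(app),
    scale_request(app["id"], 6),
    shutdown_request(app["id"]),
]
for method, path, body in plan:
    print(method, path, json.dumps(body))
```

Because each request is plain data, the same plan could be sent to a Marathon endpoint on any cloud once the Phase 1 management cluster exists, which is one way to read the portability claim on this slide.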
28. Real-time demo, a.k.a. “Data Science Kitchen”
• Provides a reference example of how to use the ISP platform…
• … and teaches the basics of data science along the way
• Gets actual Twitter data via the streaming API
• Analyzes & visualizes what people think about the latest movies
• Exposes the data science “kitchen”: models, training sets, dictionaries
• Provides a web UI to play with the data
• Uses our ISP RI (reference implementation)
• Demo is running on AWS as a public service
• Everything is open sourced
• Documentation is on our Tech Blog
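As a rough illustration of what a dictionary-based approach to tweet sentiment can look like, here is a minimal Python sketch. It is not the demo's actual model: the word lists, the scoring rule, and the sample tweets are invented for illustration, and the real demo reads from the Twitter streaming API rather than from a list.

```python
from collections import defaultdict

# Hypothetical sentiment dictionaries; the demo's real ones are not shown here.
POSITIVE = {"great", "love", "amazing", "brilliant"}
NEGATIVE = {"boring", "awful", "hate", "terrible"}


def score(text):
    """Count positive words minus negative words in a tweet."""
    words = [w.strip(".,!?#\"'").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text):
    """Map a score to a coarse sentiment label."""
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


def tally(tweets):
    """Aggregate labels per movie from an iterable of (movie, text) pairs."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for movie, text in tweets:
        counts[movie][label(text)] += 1
    return dict(counts)


# Invented sample tweets standing in for the live stream.
sample = [
    ("Movie A", "I love this film, amazing cast!"),
    ("Movie A", "Boring and awful."),
    ("Movie B", "Saw it last night."),
]
result = tally(sample)
print(result)
```

Per-movie tallies like these are the kind of aggregate a web UI could chart over time to compare how different movies are trending.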
31. Compare trending between different movies
[Chart annotations: Carrie Fisher dies; Star Wars releases new movie; Oscar night. Callout: examples of positive & negative Carrie Fisher tweets]
32. Where to learn more
1. Read our blog: blog.griddynamics.com
• 7-part blog series on ISP
• 7-part blog series on Data Science Kitchen
2. Connect
• Twitter: @griddynamics
• Subscribe to our blog
• Drop email: info@griddynamics.com