VoltDB Big Data Camp LA 2014 - Scott Jar
- 8. VoltDB
Are we really forced to choose?
Timely:
• Sampled recommendations
• Sensor indicates out of band
• Some fraud detected now
or
Accurate:
• Suggestion after purchase
• Trapped miners too late
• All fraud found days later
- 9. VoltDB
The analytics stack is taking shape
• Enterprise Apps: CRM, ERP, etc. (via ETL)
• BIG DATA: Data Lake (HDFS, etc.)
• Map Reduce: batch manipulate, pre-process, etc.
• SQL on Hadoop: Impala, Hawq, Big SQL, …
• Exploratory Analytics: Netezza / BLU, Redshift, Vertica, Greenplum, …
• BI Reporting
- 11. VoltDB
If we think about big data purely as historical analytics…
…we will miss the opportunity
- 13. VoltDB
Applications Require Data To
• Ingest huge amounts of events
• Make data-driven decisions on each event
• Analyze in real time for operational visibility
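The three bullets above can be illustrated with a minimal in-memory sketch. It assumes a made-up rule (flag a key that produces more than three events inside a 60-second window); the class name `EventProcessor`, the key `card-42`, and the thresholds are hypothetical and are not part of VoltDB or of the talk.

```python
from collections import defaultdict, deque


class EventProcessor:
    """Toy in-memory stand-in for a fast operational store (hypothetical)."""

    def __init__(self, window_seconds=60, max_events_per_window=3):
        self.window_seconds = window_seconds
        self.max_events = max_events_per_window
        self.recent = defaultdict(deque)  # key -> timestamps inside the window
        self.stats = {"accepted": 0, "flagged": 0}  # running totals for visibility

    def ingest(self, key, timestamp):
        """Ingest one event and decide on it immediately."""
        window = self.recent[key]
        # Drop timestamps that have fallen out of the sliding window.
        while window and timestamp - window[0] >= self.window_seconds:
            window.popleft()
        window.append(timestamp)
        # Per-event decision based on the state accumulated so far.
        decision = "flagged" if len(window) > self.max_events else "accepted"
        self.stats[decision] += 1
        return decision


processor = EventProcessor()
decisions = [processor.ingest("card-42", t) for t in (0, 10, 20, 30, 100)]
print(decisions)
print(processor.stats)
```

Each call ingests an event, returns a decision for that event, and updates counters that can be read at any time, which is the combination the slide asks for.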
- 14. VoltDB
The RDBMS is Getting Crushed
• Reports, Analytics, Dashboards, Alerts
• Mobile, M2M, Market Data, Clickstreams, Web Interactions, Social Media, Internet of Things
• RDBMS: Bottleneck
• (In Real-Time) vs. Batch
• Scale with unnatural acts?
  – SSD or Fusion IO cards
  – Sharding
  – Cache/Grid
  – NoSQL, No ACID
Unlimited Data with Real Time Analytics
- 15. VoltDB
Future Corporate Data Architecture
• Enterprise Apps: CRM, ERP, etc. (via ETL)
• BIG DATA: Data Lake (HDFS, etc.), SQL on Hadoop, Map Reduce, Exploratory Analytics, BI Reporting
• FAST DATA: Fast Operational Database
  – Ingest / Interactive
  – Real-Time Analytics
  – Decisioning
  – Export
  – Fast Serve Analytics
- 16. VoltDB
Requirements for Fast Data
Diagram: Enterprise Apps (CRM, ERP, etc.) via ETL; BIG DATA: Data Lake (HDFS), SQL on Hadoop, Map Reduce, Exploratory Analytics, BI Reporting; FAST DATA: Fast Operational Database with Ingest / Interactive, Decisioning, Real-Time Analytics, Export, Fast Serve Analytics (callouts 1 to 5).
1) Ingest & interact on streams of inbound data
2) Make per-event, data-driven decisions
3) Real-time analytics on fast-moving data
4) Integrated export to data warehouse
5) High-speed serving of warehouse-derived analytics
- 17. VoltDB
Requirements for Fast Data – Stream Processing
Diagram: Enterprise Apps (CRM, ERP, etc.) via ETL; BIG DATA: Data Lake (HDFS), SQL on Hadoop, Map Reduce, Exploratory Analytics, BI Reporting; FAST DATA: Ingest, Stream Processing (Continuous Computation for RTA), SQL database, Decisioning.
1) Ingest & interact on streams of inbound data
2) Make per-event, data-driven decisions
3) Real-time analytics on fast-moving data
4) Integrated export to data warehouse
5) High-speed serving of warehouse-derived analytics
6) System of Record OLTP (requires different system)
Callouts on the diagram:
• Hand-coded computations
• Decisions only on aggregated or predefined
• Unable to do fast serving of analytics from warehouse
- 18. VoltDB
How fast?
• Yahoo Cloud Serving Benchmark (YCSB) is a popular industry-standard benchmark for cloud databases
• Workload “B” is most widely reported
  – 95% reads with 5% updates
• Results: best-in-class cloud performance (run in the cloud)
  – 285k TPS for 3 nodes, scaling linearly to 724k TPS for a 12-node cluster
Charts: Latency (ms) vs. Throughput; Linear Scalability (724k)
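The two throughput figures quoted above can be turned into per-node numbers with a few lines of arithmetic. This sketch uses only the two endpoints given on the slide; the intermediate cluster sizes from the chart are not available here, and the helper name `per_node` is illustrative.

```python
# Throughput figures quoted on the slide (YCSB workload B).
results = {3: 285_000, 12: 724_000}  # nodes -> transactions per second


def per_node(nodes, tps):
    """Average throughput contributed by each node."""
    return tps / nodes


for nodes, tps in sorted(results.items()):
    print(f"{nodes:>2} nodes: {tps:,} TPS total, {per_node(nodes, tps):,.0f} TPS per node")

# How much throughput grew compared with how much the cluster grew.
speedup = results[12] / results[3]
node_ratio = 12 / 3
print(f"throughput grew {speedup:.2f}x while the node count grew {node_ratio:.0f}x")
```

On these two data points, total throughput grows by roughly 2.5x for a 4x larger cluster, so per-node throughput is lower at 12 nodes than at 3.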
- 19. VoltDB
Example: Log management is a … mess
Diagram: many Web Servers, each producing Log files; FTP; Hadoop; steps Import, Aggregate, Clean, Filter, …, Write to Disk; Analysis.
- 20. VoltDB
Log management is a Fast Data problem
Diagram: Web Servers (…); Kafka; VoltDB (Read Queue, Aggregate, Clean, Filter, Export); Hadoop; Analysis.
https://github.com/VoltDB/app-log-ingestion
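The steps named on this slide (read queue, clean, filter, aggregate, export) can be sketched in plain Python. This is not the code from the repository above and does not use the Kafka or VoltDB client APIs: a `queue.Queue` stands in for the message queue, a list stands in for the export target, and the three-field log format and function names are assumptions made for illustration.

```python
import queue
from collections import Counter


def parse(line):
    """Clean: split a raw 'host status path' line; return None if malformed."""
    parts = line.strip().split()
    if len(parts) != 3 or not parts[1].isdigit():
        return None
    host, status, path = parts
    return {"host": host, "status": int(status), "path": path}


def run_pipeline(source, exported):
    """Read from the queue, clean, filter, aggregate, then export."""
    errors_by_host = Counter()
    while True:
        try:
            line = source.get_nowait()  # read queue
        except queue.Empty:
            break
        record = parse(line)
        if record is None:  # clean: drop malformed lines
            continue
        if record["status"] < 500:  # filter: keep only server errors
            continue
        errors_by_host[record["host"]] += 1  # aggregate
        exported.append(record)  # export (stand-in for a downstream store)
    return errors_by_host


source = queue.Queue()
for raw in ["web1 200 /index", "web1 500 /cart", "garbage",
            "web2 503 /pay", "web1 502 /cart"]:
    source.put(raw)

exported = []
summary = run_pipeline(source, exported)
print(dict(summary))
print(len(exported), "records exported")
```

The point of the sketch is the shape: every log line is handled as it arrives, so the aggregate is current at any moment and only the filtered records move on for later analysis.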
- 21. VoltDB
There are lots of Fast Data Problems
• IoT, Energy, Sensor: smart grid/meters, asset tracking & management
• Personalized Targeting: ad optimization, audience segmenting
• Telco: billing and rights management, subscriber data, etc.
• Capital Markets: risk, market data management, customer mgt
• Infrastructure: data pipeline, system performance, streaming ETL
UK Smart Meter