Srinath Perera
VP Research, WSO2
srinath@wso2.com
The Rise of Streaming SQL and
Evolution of Streaming Applications
What is Streaming?
• A stream is a series of events
• Query data streams
• Detect conditions fast (within the time of receiving the data, 10ms-1m)
e.g. receive an alert by querying a data stream coming from a temperature sensor and detecting when the temperature has reached the freezing point.
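The temperature-sensor example can be sketched in a few lines of Python. This is only an illustration of "querying a stream": the `Reading` type, the `freezing_alerts` function, and the alert text are hypothetical names chosen here, and the readings are assumed to be in Celsius.

```python
from typing import Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    """One event from a (hypothetical) temperature sensor."""
    sensor_id: str
    celsius: float


FREEZING_POINT_C = 0.0  # assumption: readings are in Celsius


def freezing_alerts(readings: Iterable[Reading]) -> Iterator[str]:
    """Yield an alert as each reading at or below freezing arrives."""
    for reading in readings:
        if reading.celsius <= FREEZING_POINT_C:
            yield f"ALERT {reading.sensor_id}: {reading.celsius:.1f} C"


if __name__ == "__main__":
    stream = [Reading("s1", 3.2), Reading("s1", 0.4), Reading("s1", -0.5)]
    for alert in freezing_alerts(stream):
        print(alert)
```

Because `freezing_alerts` is a generator, it emits each alert when the matching event is consumed rather than after the whole input has been read, which is the property that distinguishes a streaming query from a batch one.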
Almost all new data is Streaming
Almost all new data arrives as streams; even batch data is, at some point, potential streaming data
One can choose to consume it as streaming data or batch data, based on the value of responding to it fast
• Transaction data
• Log data
• Sensor data
• Health data
• Traffic data
Stream Processing Market
Lack of proficient developers is slowing it
Success depends on
Analytics
Positive trends
• Microservices and Observability
• Security analytics
• EDA and Messaging
A lot of analytics and machine learning use cases will eventually shift to stream processing
Stream processing and IoT depend on each other
Market
200-500m
30% growth
Building a Streaming App
Code it Yourself
• Publish data to a message topic
• Write an actor: subscribe, process, and put back to a topic
Use a Streaming SQL based Stream Processor
• Just write Streaming SQL (discussed later)
Use a Stream Processor
• You just write the actors, and the stream processor handles data flow, scale, and failures
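The "code it yourself" option can be sketched as follows. This is a minimal, single-process illustration that assumes in-memory `queue.Queue` objects standing in for message topics; a real deployment would use a message broker, and the names `actor`, `run_pipeline`, and `STOP` are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the (finite) demo stream


def actor(in_topic: queue.Queue, out_topic: queue.Queue, process) -> None:
    """Subscribe to in_topic, process each event, put the result on out_topic."""
    while True:
        event = in_topic.get()
        if event is STOP:
            out_topic.put(STOP)  # propagate end-of-stream downstream
            return
        out_topic.put(process(event))


def run_pipeline(events, process) -> list:
    """Publish events to a topic, run one actor, and collect its output."""
    raw, processed = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=actor, args=(raw, processed, process))
    worker.start()
    for event in events:
        raw.put(event)  # publish data to the message topic
    raw.put(STOP)
    worker.join()
    results = []
    while True:
        item = processed.get()
        if item is STOP:
            return results
        results.append(item)


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3], lambda x: x * 2))
```

Everything a stream processor would otherwise provide (data flow between actors, scaling, and failure handling) is left to the author in this style, which is the trade-off the slide points at.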
History of Stream Processing
Started with active databases: users wanted to act when data met a condition
TelegraphCQ (based on PostgreSQL)
People thought about this outside of databases as well
Stream Processing
Complex Event Processing
History of Stream Processing
Stream Processing
• Create a graph of actors and run them using many machines
• e.g. Aurora, PIPES, STREAM, Borealis (academic)
Complex Event Processing
• Provide a query language and focus on efficient matching on 1-2 nodes
• SASE, Esper, Cayuga, and Siddhi (powers WSO2 SP), Apama, IBM InfoSphere
Niche applications: stock markets, monitoring and alerts, surveillance
Stream Processing enters Big Data
Yahoo S4 (2010)
Twitter Storm (2011)
Both were donated to Apache
Described as “like
Hadoop, but realtime”
Wide adoption and
visibility
Spark Streaming, Samza, Flink
Rise of Streaming SQL
Apache-based SP engines used code as the API
Big data switched from MapReduce to SQL
CEP and Stream Processing merged to support SQL over many nodes
Streaming SQL
• Apache Storm
• Apache Flink
• WSO2 CEP -> WSO2 SP
• Apache Kafka (KSQL)
• Apache Samza and Calcite
What is Streaming SQL?
Time bID T
07:23:30 B1 210
07:23:37 B1 234
…
…
A stream is a never-ending table: think of a table to which new data (events) keeps being added
select bid, t*9/5 + 32 as tF
from BoilerStream
where t > 350
Streaming SQL is SQL written against such a never-ending table
Unlike SQL, which returns data when the query is done, Streaming SQL outputs data as new events are added
You get a trigger whenever
data matches
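To make the "trigger whenever data matches" behaviour concrete, here is a rough Python equivalent of the BoilerStream query. It is only a sketch: events are assumed to be dicts with `bid` and `t` keys, `t` is assumed to be in Celsius, and the projection uses the standard Celsius-to-Fahrenheit conversion (t*9/5 + 32). The function name `boiler_query` is hypothetical.

```python
from typing import Iterable, Iterator


def boiler_query(boiler_stream: Iterable[dict]) -> Iterator[dict]:
    """Filter (where t > 350) and project (bid, tF) one event at a time."""
    for event in boiler_stream:
        if event["t"] > 350:
            # Emit a result immediately; there is no "end of table" to wait for.
            yield {"bid": event["bid"], "tF": event["t"] * 9 / 5 + 32}


if __name__ == "__main__":
    events = [
        {"time": "07:23:30", "bid": "B1", "t": 210},
        {"time": "07:23:37", "bid": "B1", "t": 234},
        {"time": "07:23:44", "bid": "B1", "t": 360},  # made-up third event
    ]
    for match in boiler_query(events):
        print(match)
```

The first two events come from the table on the slide and do not match; the third is an invented event added so the query has something to emit.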
Why Streaming SQL?
Core operations cover 90% of use cases without code; the rest is handled via extensions
Easy to learn for the many people who know SQL.
It's expressive, short, sweet and fast!!
Manipulate streaming data declaratively without having to write code.
A query engine can better optimize execution with a streaming SQL model.
Common Solutions with SP
Detect a condition and trigger an alert that brings the user back to the dashboard
Detect a condition and update a dashboard
Detect a condition and trigger an action
The condition can be
• a simple limit
• a complex trend over time
• correlations across streams
• a machine learning model
Train an ML model, apply it over streaming data, and switch models as they drift
Calculate short-term values, store them long-term in a database, and show a single view
Stream Processors are Stateful
Stream processors work from memory; that is the secret behind their 50K-plus throughput
Stream queries never end
Most stream queries are stateful (e.g. patterns, windows, joins)
When a stream processor fails, which it eventually must, the streaming app will lose its state
To avoid this, stream processors must have HA
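A sliding-window average shows what "stateful" means in practice. The sketch below keeps the window in memory, so a restarted operator starts empty and gives a different answer unless its state was saved. The `snapshot`/`restore` pair is one possible way to preserve state and is an assumption of this example, not a description of how any particular stream processor implements HA.

```python
from collections import deque


class SlidingAverage:
    """Average over the last `size` events; the window is in-memory state."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.window = deque(maxlen=size)

    def add(self, value: float) -> float:
        """Add one event and return the current window average."""
        self.window.append(value)
        return sum(self.window) / len(self.window)

    def snapshot(self) -> list:
        """Copy the state so it can be kept outside this process."""
        return list(self.window)

    @classmethod
    def restore(cls, size: int, saved: list) -> "SlidingAverage":
        """Rebuild an operator from a previously saved snapshot."""
        operator = cls(size)
        operator.window.extend(saved)
        return operator


if __name__ == "__main__":
    op = SlidingAverage(3)
    for v in (10, 20, 30):
        op.add(v)
    saved = op.snapshot()
    print(SlidingAverage(3).add(40))               # restarted without state
    print(SlidingAverage.restore(3, saved).add(40))  # restarted with state
```

The two printed values differ because the first operator has forgotten the earlier events, which is exactly the loss of state the slide warns about.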
Most Stream Processors are Obese
Most famous stream processors come from large internet companies
Their use cases are large, and so are their deployments; 5-plus nodes are not a problem for large deployments
Most stream processors need 5+ nodes to set up an HA environment
However, given that a stream processor can do 50,000 events per second, most use cases need only one node
Then the minimal HA size matters
Stream Processing Needs ML
As stream processing is the real-time extension of batch processing, most batch ML use cases will apply in real time as well.
Use streaming machine learning that learns on the fly
Train the models offline and apply them online. When the model drifts from the data, retrain and swap the model.
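The "train offline, apply online, retrain and swap on drift" loop can be sketched as below. Everything here is a deliberately simplified assumption: the "model" is just the mean of the training data, drift is measured as the gap between that mean and the mean of a recent window, and the names `train`, `OnlineScorer`, and `drift_threshold` are hypothetical.

```python
from collections import deque
from statistics import mean


def train(values) -> float:
    """'Offline' training: the toy model is the mean of the training data."""
    return mean(values)


class OnlineScorer:
    """Apply a trained model to a stream and swap it when the data drifts."""

    def __init__(self, model: float, window: int = 5,
                 drift_threshold: float = 10.0) -> None:
        self.model = model
        self.recent = deque(maxlen=window)
        self.drift_threshold = drift_threshold
        self.swaps = 0

    def score(self, value: float) -> float:
        """Return the deviation from the model; retrain and swap on drift."""
        deviation = abs(value - self.model)
        self.recent.append(value)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and abs(mean(self.recent) - self.model) > self.drift_threshold:
            self.model = train(self.recent)  # retrain on recent data
            self.swaps += 1                  # swap in the new model
        return deviation


if __name__ == "__main__":
    scorer = OnlineScorer(train([100, 102, 98]), window=3)
    for v in (101, 99, 100, 130, 131, 129):
        scorer.score(v)
    print(scorer.swaps, scorer.model)
```

In a real system the retraining step would usually run outside the event path and the new model would be swapped in once ready; it is done inline here only to keep the sketch short.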
SP Needs Advanced Query Authoring Environments
We need integrated development environments that let developers write, simulate, debug, trace, and verify and do it
Lack of programmers who are comfortable with stream processing is holding it back
Stream processing queries are like regular expressions, which are
• based on simple rules
• very powerful
• tough on new programmers
So Far
• Introduction to Stream Processing
• Two branches: Stream Processing and CEP
• Apache Storm and inclusion in Big Data
• Rise of Streaming SQL
Stream Processors are
• Stateful and need HA
• Obese
• Need ML
• Need authoring tools
WSO2 SP
When to use WSO2 SP?
• When you want to detect complex patterns over time
• When you want to fuse data in motion and data at rest in the same application
• When you are not sure about the final load (scale with Kafka with the same queries)
• When you want to do ML within your queries
• When your load is less than 100,000 events/sec (WSO2 SP supports this with just two nodes)
• When you want your end users to tweak your queries
Next Steps
• Check out WSO2 Stream Processor
• Learn about Streaming Applications with 13 Stream Processing Patterns for building Streaming Applications
• Learn about Streaming SQL with Streaming SQL 101
• Webinar: Distributed Stream Processing with WSO2 SP
• Webinar: WSO2 Stream Processor
Questions?
I write at
https://medium.com/@srinathperera
