Bootstrapping Microservices with Kafka, Akka and Spark
- 2. Who am I?
- Data Platform Architect at Pluralsight
- Rackspace
- WDW
- 15. What data do we share?
How do we do it?
(diagram: Invoices, Customer, Job, Returns, Printer)
- 25. Indexes are awesome and we need them,
they make lookups fast!
Why do you want to scan all the data if you know
what you want?
That’s dumb.
- Dustin Vannoy
- 28. Mutation vs. facts
UPDATE wishlist SET qty=3
WHERE user_id=121 AND product_id=123
→ a state mutation
At 2:39pm, user 121 updated his wish list,
changing the quantity of product 123 from 1 to 3.
→ a fact
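The contrast above can be sketched in a few lines of plain Python (illustrative only, not code from the talk): the mutation overwrites the old quantity and loses history, while the fact records what happened, and the current state is derived by folding over the facts.

```python
# State mutation: the previous quantity (1) is lost after the update.
wishlist = {(121, 123): 1}
wishlist[(121, 123)] = 3  # UPDATE wishlist SET qty=3 WHERE ...

# Fact: an immutable record of what happened, including the old value.
facts = [
    {"at": "14:39", "user_id": 121, "product_id": 123, "qty_from": 1, "qty_to": 3},
]

# Current state is a fold (left-to-right reduction) over the facts.
def current_state(events):
    state = {}
    for e in events:
        state[(e["user_id"], e["product_id"])] = e["qty_to"]
    return state

assert current_state(facts) == wishlist  # same end state, but history survives
```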
- 34. A Messaging system based on
distributed log semantics
Scalable
Fault tolerant
Stateful
Strong ordering
High concurrency
- 37. Looks like a globally ordered queue
(diagram: applications write to a broker; a consumer application reads from it)
- 38. The log is a linear structure
(diagram: old → new; messages are added at the new end)
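A minimal sketch of the append-only structure (plain Python, purely illustrative): offsets are assigned sequentially, and writes only ever touch the new end.

```python
# The log: an append-only list; existing entries are never modified.
log = []

def append(message):
    offset = len(log)   # each message gets the next sequential offset
    log.append(message)
    return offset

assert append("m0") == 0   # oldest message, offset 0
assert append("m1") == 1
assert append("m2") == 2   # newest message, added at the tail
assert log == ["m0", "m1", "m2"]
```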
- 39. Consumers have a position
Only sequential access: read to an offset and scan.
(diagram: old → new, with Consumer 1 and Consumer 2 at different offsets)
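The offset semantics can be modeled in a few lines (illustrative Python, not Kafka's implementation; the `poll` name merely echoes the consumer API): each consumer tracks its own position and scans forward sequentially from it.

```python
# Two consumers reading the same log, each with an independent offset.
log = ["m0", "m1", "m2", "m3", "m4"]
offsets = {"consumer1": 0, "consumer2": 0}

def poll(consumer, max_records=2):
    start = offsets[consumer]
    batch = log[start:start + max_records]  # sequential scan from the offset
    offsets[consumer] = start + len(batch)  # advance this consumer's position
    return batch

assert poll("consumer1") == ["m0", "m1"]
assert poll("consumer1") == ["m2", "m3"]
assert poll("consumer2") == ["m0", "m1"]   # consumer2's position is independent
```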
- 40. Messages can be replayed
for as long as they exist in the log.
(diagram: old → new, with Consumer 1 and Consumer 2 at different offsets)
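Replay follows directly from the offset model: rewinding a consumer's position makes it re-read the same messages. A sketch in plain Python (the `seek`/`poll` names only mirror Kafka's consumer API; this is not its implementation):

```python
class Consumer:
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        batch = self.log[self.offset:]  # read everything from the current position
        self.offset = len(self.log)
        return batch

    def seek(self, offset):
        self.offset = offset  # rewinding the offset enables replay

c = Consumer(["m0", "m1", "m2"])
assert c.poll() == ["m0", "m1", "m2"]
assert c.poll() == []                   # caught up, nothing new
c.seek(0)                               # rewind to the beginning
assert c.poll() == ["m0", "m1", "m2"]   # the same messages are replayed
```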
- 43. Log clean-up policy: delete
(diagram: a scan over offsets 1–12, old → new)
After log.retention.ms or log.retention.bytes is exceeded,
messages are dropped from the log.
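The delete policy can be sketched as dropping whole messages from the old end once a budget is exceeded (illustrative Python; `retention_bytes` merely stands in for the broker's log.retention.bytes setting, and real Kafka deletes whole segments, not single messages):

```python
def enforce_retention(log, retention_bytes):
    size = sum(len(m) for m in log)
    while log and size > retention_bytes:
        size -= len(log[0])
        log.pop(0)  # the oldest messages are dropped first
    return log

log = ["aaaa", "bbbb", "cccc", "dddd"]        # 16 bytes in total
assert enforce_retention(log, 8) == ["cccc", "dddd"]  # only the newest 8 bytes survive
```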
- 44. Log clean-up policy: compact
(diagram: log tail with offsets 1, 8, 12, 13, 15; log head with offsets 16–26;
cleaner point and delete retention point, governed by delete.retention.ms)
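Compaction semantics in miniature (a plain-Python sketch, not Kafka's log cleaner): history is truncated, but the last record for every key is retained, in log order.

```python
def compact(log):
    # Index of the last occurrence of each key in the (key, value) log.
    last_index = {key: i for i, (key, _) in enumerate(log)}
    # Keep only each key's final record, preserving the order of the survivors.
    return [(k, v) for i, (k, v) in enumerate(log) if last_index[k] == i]

log = [("user:121", "qty=1"), ("user:7", "qty=2"), ("user:121", "qty=3")]
assert compact(log) == [("user:7", "qty=2"), ("user:121", "qty=3")]
```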
- 52. Why Spark?
Support for many different data formats
Structured streaming
Failover and lifecycle management
Medium latency
Unified API
- 53. Event stream / log
(diagram: a service writes to the event stream/log, which replicates to
materialized views/cache, Hadoop, and ETL/transform jobs)
• Reproducible
• Stays in sync
- 62. Always capture metadata at ingestion time
Automate data replication
Automate data pipelines
Automate data discovery
- 67. What is it?
Abstraction layer on top of datasets
Models data flows
Sources and operations
Based on a custom DSL
API-driven
- 73. Putting it all together…
(diagram: Customer, Invoices, and Returns data enter through Hydra ingestion
to the broker, then flow to Hydra stream dispatch, driven by DSLs at the
/dsls endpoint)
- 74. WE ARE ON GITHUB!
github.com/pluralsight/hydra-spark
github.com/pluralsight/hydra
Editor's Notes
- Generate a lot of data and leverage it to make the product better
- Single process
Codebase and development
- Split the monolith into Many processes, different codebases, different deployment pipelines
- Independently Built
The build process for creating a service should be completely separate from building another service.
Independently Testable
Our microservice should be testable independently of the test lifecycle of other services and components.
Independently Deployable
Our microservice must be independently deployable, this is a fundamental aspect of enabling rapid change.
Independent Teams
Small independent teams owning the full lifecycle of a service, from inception through to its final death.
Independent Data
One of the hardest aspects for the microservice purist to achieve is data independence.
- When it comes to being independent, data is usually a nagging point.
Services still need to share data somehow
Around deployment, contract schemas, deprecation, interconnectivity, etc.
Only rarely will you find a service whose context is so tightly bounded that data sharing is secondary. Maybe AuthN services, but even then.
- Most services will fall in this area, where they slice and dice the same core business facts and data; they just slice them differently.
- These applications/services must work together.
Services force us to think about what we need to expose and share with the outside world.
Mostly an afterthought
- Future services will become even more interconnected and intertwined.
- Because of this, you end up with multiple copies of data across different services that will get out of sync.
The more mutable copies of data there are, the more divergent the data will become.
- What do you do? Keep changing the contracts of your services to add more attributes?
Turn your services into DAOs?
A transaction is a sequence of one or more SQL operations that are treated as a unit.
Specifically, each transaction appears to run in isolation, and furthermore, if the system fails, each transaction is either executed in its entirety or not at all.
The concept of transactions is actually motivated by two completely independent concerns. One has to do with concurrent access to the database by multiple clients and the other has to do with having a system that is resilient to system failures.
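This all-or-nothing behavior is easy to see with the stdlib sqlite3 module (an illustrative sketch; the schema and account names are invented): when a later statement in a transaction fails, the earlier statements are rolled back with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # begins a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")
        conn.execute("INSERT INTO accounts VALUES ('alice', 0)")  # duplicate key -> fails
except sqlite3.IntegrityError:
    pass  # the whole unit is rolled back, including the two successful updates

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
assert balances == {"alice": 100, "bob": 0}  # neither transfer leg took effect
```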
- ACID is overkill, or as some would say, old school
- Databases do this really well!
- the idea is that you have a copy of the same data on multiple machines (nodes), so that you can serve reads in parallel, and so that the system keeps running if you lose a machine.
- This distinction between an imperative modification and an immutable fact is something you may have seen in the context of event sourcing. That’s a method of database design that says you should structure all of your data as immutable facts, and it’s an interesting idea.
- However, there’s something really compelling about this idea of materialized views. I see a materialized view almost as a kind of cache that magically keeps itself up-to-date. Instead of putting all of the complexity of cache invalidation in the application (risking race conditions and all the discussed problems), materialized views say that cache maintenance should be the responsibility of the data infrastructure.
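The "cache that magically keeps itself up-to-date" idea can be sketched in a few lines (illustrative Python): the view is maintained incrementally from the event stream, and because the stream is the source of truth, a full rebuild yields the identical result, which is what makes the view reproducible and in sync.

```python
# An event stream of (product, price) updates: the source of truth.
stream = [("p1", 10), ("p2", 5), ("p1", 7)]

# Incremental maintenance: each event updates the view as it arrives,
# so the cache never needs manual invalidation by the application.
view = {}
for product, price in stream:
    view[product] = price

# A full rebuild from the log produces exactly the same view.
rebuilt = dict(stream)
assert view == rebuilt == {"p1": 7, "p2": 5}
```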
- A stream of immutable facts is used to segregate reads from writes.
Shared state lives only in the cache, so the data cannot diverge.
- Let’s talk about Kafka as a commit log / source for a replication stream
- Kafka messages have a key and value.
- You’ll see the benefit if you’ve ever used a regular message queue
- Data becomes an immutable stream of facts
- Keeps the latest record per key.
History is truncated, but at least the latest version of every key will be present in the log.
- What differentiates Kafka from a traditional messaging system
- Medium latency
High volume
data flows, SQL
en masse processing
massive scaling - 10,000s nodes
not for small volumes
rich options for SQL, etc.
Low limit: 0.5 seconds (we are ok with that)
Failover and lifecycle management from cluster itself - restartability (ADD TO WHY SPARK)
- Why did we choose Akka and Scala?
Distributed systems
Functional paradigm and datasets
Akka is really the backbone of this platform