Designing a reactive real-time data platform: Architecture and Infrastructure Challenges
- 2. Who am I?
- Data Platform Architect at Pluralsight
- Rackspace
- WDW
- 41. akka {
  actor {
    deployment {
      /services-manager/handler_registry/segment_handler {
        router = round-robin-pool
        optimal-size-exploring-resizer {
          enabled = on
          action-interval = 5s
          downsize-after-underutilized-for = 2h
        }
      }
      /services-manager/kafka_producer {
        router = round-robin-pool
        resizer {
          lower-bound = 5
          upper-bound = 50
          messages-per-resize = 500
        }
      }
    }
  }
}
- 51. HYDRA REPLICATION PROTOCOL
HYDRA REQUEST
INGESTORS
Publish
Akka Actors (remote)
Transport
Transports
Akka Actors (remote)
Kafka
Postgres
Elasticsearch
Inspect metadata and decide
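The "inspect metadata and decide" step can be illustrated with a small sketch. This is not Hydra's code: the `HydraRequest`, `RecordingTransport`, and `Ingestor` classes, the `transport` metadata key, and the Kafka default are all assumptions made for illustration, and plain objects stand in for the remote Akka actors.

```python
from dataclasses import dataclass, field


@dataclass
class HydraRequest:
    """Hypothetical request shape: a payload plus routing metadata."""
    payload: str
    metadata: dict = field(default_factory=dict)


class RecordingTransport:
    """Stand-in for a remote transport actor; it only records what it receives."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def publish(self, request):
        self.received.append(request.payload)


class Ingestor:
    """Inspects request metadata and decides which transport gets the request."""

    def __init__(self, transports, default="kafka"):
        self.transports = transports
        self.default = default  # assumed fallback when no metadata is given

    def decide(self, request):
        # "transport" is a hypothetical metadata key, not a documented Hydra header.
        return request.metadata.get("transport", self.default)

    def ingest(self, request):
        name = self.decide(request)
        transport = self.transports.get(name)
        if transport is None:
            raise ValueError(f"no transport registered for {name!r}")
        transport.publish(request)
        return name


transports = {n: RecordingTransport(n) for n in ("kafka", "postgres", "elasticsearch")}
ingestor = Ingestor(transports)
ingestor.ingest(HydraRequest("page-view", {"transport": "postgres"}))
ingestor.ingest(HydraRequest("click"))  # no metadata, so the default is used
```

Keeping the decision in one method means a new transport only needs to be registered in the dictionary; the ingestion path itself does not change.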
- 56. HYDRA Message delivery guarantees
ALSO METADATA DRIVEN
AT-LEAST-ONCE SEMANTICS
AKKA PERSISTENT ACTOR
hydra-delivery-strategy
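At-least-once delivery can be sketched as "record first, send, retry until acknowledged". The code below is a minimal illustration under stated assumptions, not Hydra's implementation: `Journal` is an in-memory stand-in for the durable storage a persistent actor would use, and `FlakyTransport` is a hypothetical transport that loses the first acknowledgement for every delivery.

```python
class Journal:
    """In-memory stand-in for a durable journal of unconfirmed deliveries."""

    def __init__(self):
        self.pending = {}  # delivery_id -> message not yet acknowledged
        self.next_id = 0

    def record(self, message):
        delivery_id = self.next_id
        self.next_id += 1
        self.pending[delivery_id] = message
        return delivery_id

    def confirm(self, delivery_id):
        self.pending.pop(delivery_id, None)


class FlakyTransport:
    """Hypothetical transport: receives every send, but the first ack is lost."""

    def __init__(self):
        self.received = []
        self.seen = set()

    def send(self, delivery_id, message):
        self.received.append(message)
        if delivery_id in self.seen:
            return True  # ack arrives on a retry
        self.seen.add(delivery_id)
        return False  # ack lost, so the sender cannot know the message arrived


def deliver_at_least_once(journal, transport, message, max_attempts=5):
    """Record the message, then resend until it is acknowledged."""
    delivery_id = journal.record(message)  # persist before the first send
    for attempt in range(1, max_attempts + 1):
        if transport.send(delivery_id, message):
            journal.confirm(delivery_id)
            return attempt
    return None  # still pending; it would be redelivered after recovery


journal = Journal()
transport = FlakyTransport()
attempts = deliver_at_least_once(journal, transport, "invoice-42")
```

Note that the transport ends up with the message twice. That is the trade-off of at-least-once semantics: nothing is lost, but receivers have to tolerate duplicates.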
- 58. A messaging system based on distributed log semantics
Scalable
Fault tolerant
Stateful
Strong ordering
High concurrency
- 61. Looks like A GLOBALLY ORDERED QUEUE
BROKER
APPLICATION
APPLICATION
CONSUMER
APPLICATION
- 62. THE LOG is a linear structure
Old New
Messages are added here
- 63. Consumers have a position
Only sequential access: read to an offset and scan
Old New
Consumer 1
Consumer 2
- 64. MESSAGES CAN BE REPLAYED FOR AS LONG AS THEY EXIST IN THE LOG
Old New
Consumer 1
Consumer 2
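The three log slides (append at the new end, one position per consumer, replay by moving the position back) can be modeled in a few lines. This is a toy in-memory sketch, not a Kafka client; the `Log` and `Consumer` classes are hypothetical and ignore partitions, retention, and persistence.

```python
class Log:
    """Append-only sequence: messages are only ever added at the new end."""

    def __init__(self):
        self._entries = []

    def append(self, message):
        self._entries.append(message)
        return len(self._entries) - 1  # offset of the appended message

    def read_from(self, offset):
        # Sequential access only: start at an offset and scan forward.
        return self._entries[offset:]


class Consumer:
    """Each consumer owns its position; reading never removes anything from the log."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        batch = self.log.read_from(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset


log = Log()
for message in ("a", "b", "c"):
    log.append(message)

consumer_1 = Consumer(log)            # starts at the oldest message
consumer_2 = Consumer(log, offset=2)  # starts further along the log

first_1 = consumer_1.poll()  # everything from offset 0
first_2 = consumer_2.poll()  # only what lies at or after offset 2

log.append("d")
next_1 = consumer_1.poll()   # only the message added since the last poll

consumer_1.seek(0)           # replay: move the position back
replayed = consumer_1.poll()
```

Because the position lives with the consumer rather than the broker's data, two consumers can read the same log at different speeds, and a replay is nothing more than a `seek`.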
- 73. Why Spark?
Support for many different data formats
Structured streaming
Failover and lifecycle management
Medium latency
Unified API
- 74. EVENT STREAM / LOG
MATERIALIZED VIEWS / CACHE
HADOOP
ETL
SERVICE
TRANSF
Writes to
Replicates to
• Reproducible
• Stays in sync
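The "reproducible" and "stays in sync" properties follow from deriving every view from the same log. The sketch below assumes a hypothetical event shape of `(key, value)` pairs, with `None` meaning delete; it compares a full rebuild from offset 0 with a view that only applies the events it has not yet seen.

```python
def apply_event(state, event):
    """Apply one (key, value) event; a value of None deletes the key."""
    key, value = event
    if value is None:
        state.pop(key, None)
    else:
        state[key] = value


def rebuild_view(events):
    """Reproducible: replaying the whole log always yields the same view."""
    state = {}
    for event in events:
        apply_event(state, event)
    return state


class IncrementalView:
    """Stays in sync by applying only the events after its last offset."""

    def __init__(self):
        self.state = {}
        self.offset = 0

    def catch_up(self, events):
        for event in events[self.offset:]:
            apply_event(self.state, event)
        self.offset = len(events)


event_log = [("user:1", "Ann"), ("user:2", "Bob")]
view = IncrementalView()
view.catch_up(event_log)

# New events arrive; the view only processes the tail of the log.
event_log += [("user:1", "Anne"), ("user:2", None)]
view.catch_up(event_log)

rebuilt = rebuild_view(event_log)
```

If a view is lost or its logic changes, it can be discarded and rebuilt from the log, which is why the log rather than the cache is treated as the source of truth.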
- 76. IT WORKS FOR MICROSERVICES TOO
HYDRA
Sends
BROKER
stores
(at a minimum)
INGESTION
Customer
HYDRA
STREAM DISPATCH
{ }
/dsls
submits
POSTs
Invoices Returns
joins/normalizes
streams
- 78. What is it?
Abstraction layer on top of Spark Datasets
Models data flows
Sources and operations
Based on a custom DSL
API-driven
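To make "sources and operations" concrete, here is a toy interpreter for a declarative flow. It does not reproduce the hydra-spark DSL: the flow schema, the `filter` and `select` operation names, and the `run_flow` function are all hypothetical, and plain Python lists stand in for Spark Datasets.

```python
def op_filter(rows, params):
    """Keep rows whose column equals the given value."""
    return [r for r in rows if r.get(params["column"]) == params["equals"]]


def op_select(rows, params):
    """Project each row onto the listed columns."""
    return [{c: r[c] for c in params["columns"]} for r in rows]


# Registry of operations the hypothetical DSL understands.
OPERATIONS = {"filter": op_filter, "select": op_select}


def run_flow(flow, sources):
    """Read the named source, then apply each operation in order."""
    rows = sources[flow["source"]]
    for op in flow["operations"]:
        rows = OPERATIONS[op["type"]](rows, op)
    return rows


# A flow is plain data, so it could be submitted to a service as JSON.
flow = {
    "source": "invoices",
    "operations": [
        {"type": "filter", "column": "status", "equals": "paid"},
        {"type": "select", "columns": ["id", "amount"]},
    ],
}

sources = {
    "invoices": [
        {"id": 1, "amount": 10, "status": "paid"},
        {"id": 2, "amount": 5, "status": "open"},
    ]
}

result = run_flow(flow, sources)
```

Describing the flow as data instead of code is what makes an API-driven model possible: a client posts the description and the platform decides how to execute it.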
- 80. WE ARE ON GITHUB!
github.com/pluralsight/hydra
github.com/pluralsight/hydra-spark