HBase Meetup @ Cask HQ 09/25
- 1. HBase Meetup @ Cask HQ
September 25, 2014
CASK DATA APP PLATFORM
tigon.io cdap.io coopr.io
- 2. PROPRIETARY & CONFIDENTIAL
HBase Meetup Agenda
• Cask Open Source Project Announcements by Jonathan Gray
• Project CDAP: Cask Data App Platform by Jonathan Gray
• Project Coopr: Cluster Provisioning by Albert Shau
• Project Tigon: RT Streaming on YARN + HBase by Gokul Gunasekaran
• HBase at Flipboard by Sang Chi
• Master Topologies post HBase 1.0 by Mikhail Antonov of WANDisco
- 4. Simple access to powerful technology
• Continuuity is now Cask
• Same Mission. Same Team. Same Technology. Now Open Source!
• We are an open source software company focused on developers and
applications on Hadoop
• We have been building our platform and technologies for 3 years; major
projects are released today, with everything to follow by the end of the year
• We are committed to building vibrant communities around these projects
and will drive towards a true community-driven process
- 5. Cask Newly Launched Projects
• Tigon (tigon.io): Real-time streaming for the real world
• CDAP, the Cask Data App Platform (cdap.io): Virtualization for Hadoop Data and Apps
• Coopr (coopr.io): Clusters with a click
- 6. Cask Launch Day
Thursday, September 25, 2014
• Website launch at cask.co with lots of documentation and technical content
• Project sites launch at cdap.io / coopr.io / tigon.io / tephra.io
• Reactor released as CDAP v2.5 under ASL2 on GitHub
• Loom released as Coopr v0.9.8 under ASL2 on GitHub
• Tigon dev release as Tigon v0.1.0 (Cask + AT&T) under ASL2 on GitHub
• Hosting HBase Meetup @ Cask HQ w/ sessions on CDAP, Coopr, Tigon
- 8. Why virtualize?
Runtime Languages
•Simpler programming
•Portability
Virtual machines and containers
•More efficient resource utilization
•Reuse
Software defined networks
•Adaptability
•New applications
Bringing the concepts of virtualization to Hadoop and HBase data and applications
augments existing Hadoop-ecosystem open source technologies to enable more use
cases to be built by more developers in less time.
•Broader use cases
•Faster development
•Accelerated disruption of proprietary incumbents
- 9. Cask Data Application Platform
Data Virtualization
Logical representations of data
App Virtualization
Standardized containers for apps
Application innovation
• Enable a new class of applications to drive greater business
value, including those requiring real-time and batch processing
Simplified development
• Simplify big data app development: more apps, faster, with
less dependence on Hadoop expertise
Production-ready applications
• Avoid compromising operational transparency and control:
security, logging, metrics, lineage, and more
- 10. App Virtualization
What is App Virtualization?
• Applications deployed in CDAP containers with runtime services
Features
• Framework level guarantees
• Applications aren’t required to be idempotent
• Support for development life cycle and operational deployment
• Portable from laptop to cluster
• Logging, metrics, security with no developer overhead
• Standardization of containers across programming paradigms
• Take advantage of Spark, Cascading, Hive, etc. using programmatic
APIs without worrying about system implementation
Benefits
• Developers can build a broader range of apps focusing on business
logic, not core system services
• Speed up time from coding to testing to operational deployment
• Take advantage of new technology with less need for training and
expertise
- 11. Data Virtualization
What is Data Virtualization?
• Logical representations of underlying data in CDAP datasets
Features
• Streams for data ingestion
• Supports Kafka, Flume, REST, user-defined protocols
• Time-stamped and ordered
• Horizontally scalable
• Logical representations in commonly used access patterns
• Time series, Key value, objects, geospatial index, OLAP cube and more
• Data available to multiple applications
• MapReduce, Hive, Spark, Flows and more
• REST APIs
• Unified batch and real-time processing
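The idea of a logical representation layered over underlying data can be sketched in a few lines. This is a minimal illustration, not CDAP's dataset API: `SortedKeyValueTable` and `TimeSeriesDataset` are hypothetical names, and the in-memory table only stands in for an ordered store such as an HBase table. The time-series view encodes (metric, timestamp) into sorted row keys so that a time-range read becomes a key-range scan.

```python
import struct
from bisect import bisect_left, insort


class SortedKeyValueTable:
    """Hypothetical stand-in for an ordered, byte-keyed store (e.g. an HBase table)."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> value

    def put(self, key: bytes, value: float) -> None:
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start: bytes, stop: bytes):
        """Yield (key, value) for start <= key < stop, in key order."""
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


class TimeSeriesDataset:
    """Logical time-series view over the raw table (illustrative only)."""

    def __init__(self, table: SortedKeyValueTable):
        self._table = table

    @staticmethod
    def _row_key(metric: str, ts: int) -> bytes:
        # Metric name, a zero separator, then a big-endian timestamp, so that
        # byte order matches time order within one metric.
        return metric.encode("utf-8") + b"\x00" + struct.pack(">Q", ts)

    def write(self, metric: str, ts: int, value: float) -> None:
        self._table.put(self._row_key(metric, ts), value)

    def read(self, metric: str, start_ts: int, end_ts: int):
        """Return [(timestamp, value)] with start_ts <= timestamp < end_ts."""
        lo = self._row_key(metric, start_ts)
        hi = self._row_key(metric, end_ts)
        return [
            (struct.unpack(">Q", key[-8:])[0], value)
            for key, value in self._table.scan(lo, hi)
        ]


table = SortedKeyValueTable()
series = TimeSeriesDataset(table)
series.write("cpu", 30, 0.7)
series.write("cpu", 10, 0.5)
series.write("mem", 20, 0.9)
series.write("cpu", 20, 0.6)
window = series.read("cpu", 10, 30)
```

Callers work with metrics and timestamps only; the row-key layout stays an implementation detail of the view, which is the point of a logical representation.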
Benefits
• Simplify data ingestion and Extract, Transform, Load (ETL) to accelerate time to
value
• Maximize value of data by making it easy to find and easy to explore through
multiple query methods
• Protect the data through security, audit, lineage, and reporting
- 13. TIGON
Real-time Streaming for the Real World
Gokul, Software Engineer, Cask Data
HBase Meetup, September 2014
- 14. Meet Tigon
Open-source Distributed Real-time Stream Processing Framework
Exactly-once processing guarantees
Provides both an imperative Java API and a SQL-like declarative language for building powerful apps
Built on top of Hadoop YARN™ and Apache HBase™
Leverages Twill, an Apache Incubator project, and Tephra, Cask's open source transaction engine
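One way to picture an exactly-once guarantee is to tie the state update and the queue acknowledgement to the same transaction, so that a failure before commit discards both. The sketch below is a toy, in-memory illustration under that assumption; `Transaction`, `run`, and `FlakyError` are hypothetical names, and this is not the Tigon or Tephra API.

```python
class FlakyError(Exception):
    """Simulated failure in the middle of processing."""


class Transaction:
    """Buffers state writes and the queue ack until commit (hypothetical helper)."""

    def __init__(self, store, queue):
        self._store, self._queue = store, queue
        self._writes = {}
        self._consumed = 0

    def dequeue(self):
        event = self._queue[self._consumed]  # peek only; nothing is removed yet
        self._consumed += 1
        return event

    def get(self, key):
        return self._writes.get(key, self._store.get(key, 0))

    def put(self, key, value):
        self._writes[key] = value

    def commit(self):
        # State changes and the queue ack are applied together.
        self._store.update(self._writes)
        del self._queue[: self._consumed]


def run(queue, store, fail_on_attempts):
    """Process every queued event, retrying after simulated failures."""
    attempt = 0
    while queue:
        attempt += 1
        tx = Transaction(store, queue)
        try:
            word = tx.dequeue()
            tx.put(word, tx.get(word) + 1)  # non-idempotent increment
            if attempt in fail_on_attempts:
                raise FlakyError("crash before commit")
            tx.commit()
        except FlakyError:
            continue  # transaction discarded; the event stays queued
    return attempt


events = ["a", "b", "a"]
counts = {}
attempts = run(events, counts, fail_on_attempts={1, 3})
```

Two of the five attempts fail, yet each event is counted once, even though the increment itself is not idempotent. A real system would need the commit to be atomic across process crashes, which is the job of a transaction engine.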
- 15. Tigon Stack
Evolution of Tigon
[Diagram: Events, a TigonSQL flowlet, and a series of Flowlets]
Tigon Architecture
• Standalone: threads, in-memory queues
• Distributed: YARN containers, HBase tables
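The standalone arrangement, with processing stages running as threads connected by in-memory queues, can be sketched roughly as follows. This is only an analogy written with Python's standard library, not Tigon code; the `flowlet` helper, the `_STOP` sentinel, and the two processing functions are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the stream


def flowlet(process, inbox, outbox):
    """Run `process` on each inbox event in its own thread; emit results to outbox."""

    def loop():
        while True:
            event = inbox.get()
            if event is _STOP:
                outbox.put(_STOP)  # pass the end-of-stream marker downstream
                return
            for out in process(event):
                outbox.put(out)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


def parse(line):
    # Split a line into words, one output event per word.
    yield from line.split()


def keep_long(word):
    # Drop short words and upper-case the rest.
    if len(word) > 3:
        yield word.upper()


source, middle, sink = queue.Queue(), queue.Queue(), queue.Queue()
threads = [flowlet(parse, source, middle), flowlet(keep_long, middle, sink)]

for line in ["real time streams", "on yarn and hbase"]:
    source.put(line)
source.put(_STOP)
for thread in threads:
    thread.join()

results = []
while True:
    item = sink.get()
    if item is _STOP:
        break
    results.append(item)
```

In the distributed arrangement listed above, the threads would correspond to YARN containers and the in-memory queues to queues backed by HBase tables.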
- 16. Tigon in Action
Sample case: real-time filter and join of data streams
[Diagram: Events (<id>, <name>) and Events (<id>, <age>) feed a TigonSQL flowlet (Join & Filter), which emits <id, name, age>; Flowlets then produce [name, count]]
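The sample case can be approximated in plain Python to show the data flow: two event streams are joined on id, filtered, and then counted by name. This is a sketch, not TigonSQL or the Tigon Java API; the `min_age` predicate, the tagged-tuple event format, and the function names are assumptions made for illustration.

```python
from collections import Counter


def join_and_filter(events, min_age):
    """Streaming hash join on id; yields (id, name, age) once both sides arrive."""
    names, ages = {}, {}
    for kind, ident, value in events:
        if kind == "name":
            names[ident] = value
        else:
            ages[ident] = value
        if ident in names and ident in ages:
            name, age = names.pop(ident), ages.pop(ident)
            if age >= min_age:  # the filter predicate is an assumption
                yield ident, name, age


def count_names(joined):
    """Downstream step: count joined records per name."""
    counts = Counter()
    for _ident, name, _age in joined:
        counts[name] += 1
    return counts


# Interleaved events from the two streams, tagged by stream of origin.
events = [
    ("name", 1, "ann"),
    ("age", 2, 17),
    ("age", 1, 34),
    ("name", 2, "bob"),
    ("name", 3, "ann"),
    ("age", 3, 52),
]
joined = list(join_and_filter(events, min_age=18))
name_counts = count_names(joined)
```

Buffering each side until its partner arrives lets the join work regardless of which stream delivers an id first.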
- 17. Help Tigon Grow
Developer Preview Release available for download on www.tigon.io
Tigon source available on GitHub (www.github.com/caskdata/tigon)
Download, Develop, Launch, Fork, Contribute!