HBase Meetup @ Cask HQ 
September 25, 2014 
CASK DATA APP PLATFORM 
tigon.io cdap.io coopr.io
PROPRIETARY & CONFIDENTIAL 
HBase Meetup Agenda 
• Cask Open Source Project Announcements by Jonathan Gray 
• Project CDAP: Cask Data App Platform by Jonathan Gray 
• Project Coopr: Cluster Provisioning by Albert Shau 
• Project Tigon: RT Streaming on YARN + HBase by Gokul Gunasekaran 
• HBase at Flipboard by Sang Chi 
• Master Topologies post HBase 1.0 by Mikhail Antonov of WANDisco
Cask Open Source Project Announcements 
100% Apache 2.0 Licensed Software
Simple access to powerful technology 
• Continuuity is now Cask 
• Same Mission. Same Team. Same Technology. Now Open Source! 
• We are an open source software company focused on developers and 
applications on Hadoop 
• We have been building our platform and technologies for 3 years; major 
projects are released today, with everything released by the end of the year 
• We are committed to building vibrant communities around these projects 
and will drive towards a true community-driven process
Cask Newly Launched Projects 
• Tigon (tigon.io): Real-time streaming for the real world 
• CDAP, the Cask Data App Platform (cdap.io): Virtualization for Hadoop Data and Apps 
• Coopr (coopr.io): Clusters with a click
Cask Launch Day 
Thursday, September 25, 2014 
• Website launch at cask.co with lots of documentation and technical content 
• Project sites launch at cdap.io / coopr.io / tigon.io / tephra.io 
• Reactor released as CDAP v2.5 under ASL2 on GitHub 
• Loom released as Coopr v0.9.8 under ASL2 on GitHub 
• Tigon dev release as Tigon v0.1.0 (Cask + AT&T) under ASL2 on GitHub 
• Hosting HBase Meetup @ Cask HQ w/ sessions on CDAP, Coopr, Tigon
Cask Data App Platform 
cdap.io
Why virtualize? 
Runtime Languages 
• Simpler programming 
• Portability 
Virtual machines and containers 
• More efficient resource utilization 
• Reuse 
Software defined networks 
• Adaptability 
• New applications 
Bringing the concepts of virtualization to Hadoop and HBase data and applications 
augments existing Hadoop-ecosystem open source technologies to enable more use 
cases to be built by more developers in less time. 
• Broader use cases 
• Faster development 
• Accelerated disruption of proprietary incumbents
Cask Data Application Platform 
Data Virtualization: Logical representations of data 
App Virtualization: Standardized containers for apps 
Application innovation 
• Enable a new class of applications to drive greater business 
value, including those requiring real-time and batch processing 
Simplified development 
• Simplify big data app development – more apps faster with 
less dependence on Hadoop expertise 
Production-ready applications 
• Avoid compromising operational transparency and control: 
security, logging, metrics, lineage, and more
App Virtualization 
What is App Virtualization? 
• Applications deployed in CDAP containers with runtime services 
Features 
• Framework-level guarantees 
• Applications aren’t required to be idempotent 
• Support for development life cycle and operational deployment 
• Portable from laptop to cluster 
• Logging, metrics, security with no developer overhead 
• Standardization of containers across programming paradigms 
• Take advantage of Spark, Cascading, Hive, etc. using programmatic 
APIs without worrying about system implementation 
Benefits 
• Developers can build a broader range of apps focusing on business 
logic, not core system services 
• Speed up time from coding to testing to operational deployment 
• Take advantage of new technology with less need for training and 
expertise 
Data Virtualization 
What is Data Virtualization? 
• Logical representations of underlying data in CDAP datasets 
Features 
• Streams for data ingestion 
• Supports Kafka, Flume, REST, user-defined protocols 
• Time-stamped and ordered 
• Horizontally scalable 
• Logical representations in commonly used access patterns 
• Time series, key-value, objects, geospatial index, OLAP cube, and more 
• Data available to multiple applications 
• MapReduce, Hive, Spark, Flows and more 
• REST APIs 
• Unified batch and real-time processing 
Benefits 
• Simplify data ingestion and Extract, Transform, Load (ETL) to accelerate time to 
value 
• Maximize value of data by making it easy to find and easy to explore through 
multiple query methods 
• Protect the data through security, audit, lineage, and reporting 
Thank You! 
Questions?
TIGON 
Real-time Streaming for the Real World 
Gokul, Software Engineer, Cask Data 
HBase Meetup, September 2014
Meet Tigon 
Open-source Distributed Real-time Stream Processing Framework 
Exactly-once processing guarantees 
Provides both an imperative Java API and a SQL-like declarative language for building powerful apps! 
Built on top of Hadoop YARN™ and Apache HBase™ 
Leverages Twill, an Apache incubator project, and Tephra, Cask's open source transaction engine
Tigon Stack 
Evolution of Tigon 
[Diagram: Events, a TigonSQL Flowlet, and four Flowlets connected in a flow] 
Tigon Architecture 
• STANDALONE: Threads, In Memory Queues 
• DISTRIBUTED: YARN Containers, HBase Tables
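The standalone mode named on this slide (threads plus in-memory queues) can be pictured with a minimal Python sketch. This is not Tigon's Java API: `run_flowlet`, `run_pipeline`, `parse`, and `keep_even_ids` are hypothetical names made up for illustration, and the sketch makes no attempt at exactly-once guarantees. It only shows processing stages running as threads and handing events to each other through in-memory queues.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the event stream


def run_flowlet(process, inbox, outbox):
    """Run one stage: read events from inbox, apply process, write results to outbox."""
    while True:
        event = inbox.get()
        if event is SENTINEL:
            outbox.put(SENTINEL)  # pass the shutdown signal downstream
            return
        result = process(event)
        if result is not None:  # a stage returns None to drop an event
            outbox.put(result)


def run_pipeline(events, stages):
    """Wire the stages together with in-memory queues, one thread per stage."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_flowlet, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    for event in events:
        queues[0].put(event)
    queues[0].put(SENTINEL)

    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


def parse(line):
    """Hypothetical first stage: turn 'id,name' text into a record."""
    ident, name = line.split(",")
    return {"id": int(ident), "name": name}


def keep_even_ids(event):
    """Hypothetical second stage: drop records with odd ids."""
    return event if event["id"] % 2 == 0 else None
```

For example, `run_pipeline(["1,ann", "2,bob", "4,cy"], [parse, keep_even_ids])` passes each line through both stages in order. Per the slide, the distributed mode would use YARN containers in place of threads and HBase tables in place of in-memory queues.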
Tigon in Action 
Sample Case: Real-time filter and join of data streams 
[Diagram: two Events inputs, (<id>, <name>) and (<id>, <age>), enter a TigonSQL Join & Filter step that emits <id, name, age>; Flowlets downstream produce [name, count]]
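The data flow in this sample case can be sketched in plain Python, under stated assumptions. The slide does not say what the filter condition is, so the `min_age` threshold below is an assumption, and `join_and_filter` and `count_by_name` are hypothetical helper names rather than TigonSQL or Tigon API calls. The sketch also works on finite lists, whereas a real streaming join would run continuously over incoming events.

```python
from collections import Counter


def join_and_filter(name_events, age_events, min_age):
    """Join (id, name) events with (id, age) events on id.

    Keeps only joined rows whose age is at least min_age
    (the threshold is an assumed filter, not taken from the slide).
    """
    age_by_id = dict(age_events)
    return [
        (ident, name, age_by_id[ident])
        for ident, name in name_events
        if ident in age_by_id and age_by_id[ident] >= min_age
    ]


def count_by_name(joined_rows):
    """Aggregate joined (id, name, age) rows into name -> count."""
    return Counter(name for _, name, _ in joined_rows)
```

Calling `join_and_filter` on the two event lists yields `(id, name, age)` rows, and `count_by_name` turns those rows into the `[name, count]` output shown at the end of the flow.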
Help Tigon Grow 
Developer Preview Release available for download on www.tigon.io 
Tigon source available on GitHub (www.github.com/caskdata/tigon) 
Download, Develop, Launch, Fork, Contribute!
Coopr 
clusters with a click 
