FOSS rEvolution in data
Dr. Konstantin Boudnik
Vice President
Open Source Development, WANdisco
What I do in my spare time
● VP, Apache Bigtop
● Hadoop committer
● Committer & contributor to various projects in
the Apache Hadoop ecosystem
● On Linux desktop since 1995 :)
● Built the first all-Java cluster platform in 2000
● Java in LSB
FOSS Peaceful Evolution
● Anarchy: ἀν + ἀρχός (an + arkhos) without a ruler
● Evolution = Revolution
● Iterative build-up liberating people's creativity
● Multiple complementary licenses
– GPL (!), LGPL
– ASF
– CC
● Varieties of business models
Non-aggressive competition:
driven by data
● This presentation was created w/o commercial
software
● Open-source world
– Operating system & cloud computing
– VoIP, video communications
– SSL, GPG, Tor: security
– Bitcoin+: currencies & exchange
– Journalism & News
– Music & Video; P2P
– Storage: keeping and preserving the data
Users demand
● Simplicity
● Data preservation & uninterrupted data
● Consistency guarantees
● Improvements to traditional HA
– Too complex
– Error-prone
– Low on SLAs
● Proactive fault tolerance
– Prevention instead of a cure
FOSS responds by
● Changing use cases lead to fast progress
● Hadoop alone
– 10+ releases of 1.x in last 2 years
– 12 releases of 2.x in last 2 years
● A typical Hadoop stack has 18+ components
– Each evolving independently of the next
● But it is mind boggling. Consider this....
It goes further
● Today's Hadoop is more than storage + MR
● Baby elephant outgrew his cradle
– HBase
– SQL frontends
– In-memory processing
– Storage caching
– Connectors
– DSLs
– <your name is here>
OMG!
FOSS complexity management
● Bigtop software stack framework
– Define component versions in the BOM
– Make component changes if needed
● Run Bigtop build
● Deploy the cluster w/ provided Puppet recipes
● Test the cluster with integration suite
● Rinse and repeat as needed
– Seamless integration into CI
– Easy provisioning and incremental updates
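The "incremental updates" idea can be illustrated with a small sketch: compare two versions of a bill of materials (BOM) and rebuild only what changed. This is not Bigtop's actual BOM format or tooling; the dictionaries, version strings, and the function name changed_components are hypothetical and exist only for illustration.

```python
def changed_components(old_bom, new_bom):
    """Return the names of components that are new or whose version changed.

    Both arguments are plain dicts mapping component name -> version string.
    This is a hypothetical stand-in for a real BOM, not Bigtop's format.
    """
    return sorted(
        name for name, version in new_bom.items()
        if old_bom.get(name) != version
    )


# Illustrative version strings only.
previous = {"hadoop": "2.3.0", "hbase": "0.98.1", "zookeeper": "3.4.5"}
current = {"hadoop": "2.4.0", "hbase": "0.98.1", "zookeeper": "3.4.5", "spark": "1.0.0"}

to_rebuild = changed_components(previous, current)
print("Rebuild and retest:", ", ".join(to_rebuild))
```

In a CI loop, the resulting list would decide which packages go through the build, deploy, and integration-test cycle again.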
FOSS HA solutions
● Hadoop2 HA: delivered on top of ZK quorum
● But...
– Failure mitigation is always reactive
– High operational complexity
– Potential for NN split-brain
– Still topping out at five '9s'
– Oh, and what about “Strong Consistency”?
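To put five '9s' in perspective, an availability target can be converted into a yearly downtime budget with simple arithmetic. A minimal sketch, assuming a 365-day year; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines.

    Three nines means 99.9% availability, i.e. an unavailability of 10**-3.
    """
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 7):
        minutes = downtime_seconds_per_year(n) / 60
        print(f"{n} nines: {minutes:.2f} minutes of downtime per year")
```

Five nines leaves a budget of roughly 5.26 minutes per year, which a single slow failover can consume.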
What are the alternatives?
● Consensus based replication
– Preventative care is both cheaper and better
– Coordination of intent in DSM
– Multiple active masters with same state
– Single-copy equivalence of DSM
– HADOOP-10641, HBASE-10909
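The idea behind "multiple active masters with same state" can be sketched as a toy replicated state machine: every write is first agreed on in one global order, and each master applies the agreed operations in that order. This is only an illustration under strong assumptions, not the design in HADOOP-10641 or any product: the AgreedLog class is an in-process stand-in for a real consensus protocol, and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgreedLog:
    """Stand-in for consensus: fixes a single global order of proposals.

    A real system would agree on this order over the network with a
    consensus protocol; here it is just an in-memory list.
    """
    entries: list = field(default_factory=list)

    def propose(self, op):
        self.entries.append(op)
        return len(self.entries) - 1  # global sequence number


@dataclass
class Replica:
    """An active master: accepts writes and applies agreed ops in order."""
    name: str
    log: AgreedLog
    state: dict = field(default_factory=dict)
    applied: int = 0

    def submit(self, key, value):
        # Coordinate the intent first, then apply whatever has been agreed.
        self.log.propose((key, value))
        self.catch_up()

    def catch_up(self):
        # Apply agreed operations strictly in sequence order.
        while self.applied < len(self.log.entries):
            key, value = self.log.entries[self.applied]
            self.state[key] = value
            self.applied += 1


log = AgreedLog()
a, b = Replica("a", log), Replica("b", log)
a.submit("/data/x", 1)   # write accepted by master a
b.submit("/data/x", 2)   # conflicting write accepted by master b
a.submit("/data/y", 3)
a.catch_up()
b.catch_up()
print(a.state == b.state)
```

Because both replicas apply the same operations in the same order, they end in the same state no matter which master accepted each write.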
WANdisco: continuous availability company
WANdisco := Wide Area Network Distributed Computing
We solve availability problems for enterprises. If you can’t afford
99.999%, we’ll help
Publicly traded on the London Stock Exchange since mid-2012 (LSE:WAND)
Apache Software Foundation sponsor; actively contributing to Hadoop,
SVN, and others
US-patented active-active replication technology
Located on three continents
Enterprise-ready, high-availability software solutions for globally
distributed organizations
Subversion, Git, Hadoop HDFS at 200+ customer sites
Thank you
@c0sin
