FOSS Evolution (cos-boudnik)
- 1. FOSS rEvolution in data
Dr. Konstantin Boudnik
Vice President
Open Source Development, WANdisco
- 2. What I do in my spare time
● VP, Apache Bigtop
● Hadoop committer
● Committer & contributor to various projects in the
Apache Hadoop ecosystem
● On Linux desktop since 1995 :)
● Built first all-Java cluster platform in 2000
● Java in LSB
- 3. FOSS Peaceful Evolution
● Anarchy: ἀν + ἀρχός (an + arkhos): "without a ruler"
● Evolution = Revolution
● Iterative build-up that liberates people's creativity
● Multiple complementary licenses
– GPL (!), LGPL
– ASF
– CC
● A variety of business models
- 4. Non-aggressive competition: driven by data
● This presentation was created w/o commercial software
● Open-source world
– Operating system & cloud computing
– VoIP, video communications
– SSL, GPG, Tor: security
– Bitcoin+: currencies & exchange
– Journalism & News
– Music & Video; P2P
– Storage: keeping and preserving the data
- 5. Users demand
● Simplicity
● Data preservation & uninterrupted data access
● Consistency guarantees
● Improvements to traditional HA
– Too complex
– Error-prone
– Low on SLAs
● Proactive fault tolerance
– Prevention instead of a cure
- 6. FOSS responds by
● Changing use cases drive fast progress
● Hadoop alone
– 10+ releases of 1.x in last 2 years
– 12 releases of 2.x in last 2 years
● A typical Hadoop stack has 18+ components
– Each evolving independently of the others
● But it is mind-boggling. Consider this...
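To make the scale concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each of the 18 components could be pinned to one of 3 released versions; the figure of 3 is an assumption, not measured data. When each version is chosen independently, the number of distinct stacks is 3^18.

```python
# Illustrative only: the per-component version count is an assumption,
# not real Hadoop ecosystem data.
COMPONENTS = 18  # "A typical Hadoop stack has 18+ components"
ASSUMED_VERSIONS_PER_COMPONENT = 3


def stack_combinations(components: int, versions_each: int) -> int:
    """Count distinct stacks when each component's version is picked independently."""
    return versions_each ** components


if __name__ == "__main__":
    total = stack_combinations(COMPONENTS, ASSUMED_VERSIONS_PER_COMPONENT)
    print(f"{total:,} possible stack combinations")  # 387,420,489
```

Even with this modest assumption, nearly 400 million combinations exist, which is why pinning one tested set of versions matters.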
- 8. It goes further
● Today's Hadoop is more than storage + MR
● Baby elephant outgrew his cradle
– HBase
– SQL frontends
– In-memory processing
– Storage caching
– Connectors
– DSLs
– <your name here>
- 12. FOSS complexity management
● Bigtop software stack framework
– Define component versions in the BOM
– Make component changes if needed
● Run Bigtop build
● Deploy the cluster w/ provided puppet
● Test the cluster with integration suite
● Rinse and repeat as needed
– Seamless integration into CI
– Easy provisioning and incremental updates
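A BOM pins one version per component, and the build must respect dependencies between components. Below is a hypothetical model of that idea, not Bigtop's actual bigtop.bom format. The component list, version strings, and dependency edges are illustrative assumptions. The sketch validates the BOM and derives a dependency-respecting build order with a topological sort from Python's standard graphlib module.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical BOM: component -> (pinned version, build-time dependencies).
# Versions and edges are illustrative, not taken from Bigtop's real BOM.
BOM = {
    "zookeeper": ("3.4.6", []),
    "hadoop": ("2.4.1", ["zookeeper"]),
    "hbase": ("0.98.5", ["hadoop", "zookeeper"]),
    "hive": ("0.13.1", ["hadoop"]),
    "pig": ("0.12.1", ["hadoop"]),
}


def check_bom(bom):
    """Fail fast if a component depends on something the BOM does not define."""
    for name, (_version, deps) in bom.items():
        missing = [d for d in deps if d not in bom]
        if missing:
            raise ValueError(f"{name} depends on undefined components: {missing}")


def build_order(bom):
    """Return component names so that each one comes after its dependencies."""
    graph = {name: deps for name, (_version, deps) in bom.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    check_bom(BOM)
    for component in build_order(BOM):
        version, _deps = BOM[component]
        print(f"build {component}-{version}")
```

Changing a single version in the BOM and rerunning yields the same consistent ordering, so the build, deploy, and test steps on the slide can be repeated mechanically in CI.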
- 14. FOSS HA solutions
● Hadoop 2 HA: delivered on top of a ZooKeeper (ZK) quorum
● But...
– Failure mitigation is always reactive
– High operational complexity
– Potential for NN split-brain
– Still topping out at five 9s
– Oh, and what about “Strong Consistency”?
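Two numbers behind this slide are easy to check. Five 9s (99.999%) leaves roughly 5.26 minutes of downtime per year. A ZooKeeper ensemble stays available only while a strict majority of its servers is up, so 2f+1 servers tolerate f failures. Here is a minimal sketch of that arithmetic, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year (no leap day)


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed yearly downtime at `nines` nines of availability (5 -> 99.999%)."""
    return MINUTES_PER_YEAR * 10 ** -nines


def quorum_size(ensemble: int) -> int:
    """Smallest strict majority of an ensemble of `ensemble` servers."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """How many servers may fail while a majority quorum can still form."""
    return ensemble - quorum_size(ensemble)


if __name__ == "__main__":
    print(f"five 9s: {downtime_minutes_per_year(5):.2f} min/year")  # 5.26
    for n in (3, 4, 5):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

A 4-server ensemble tolerates no more failures than a 3-server one, which is why odd ensemble sizes are the usual choice.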
- 15. What are the alternatives?
● Consensus-based replication
– Preventive care is both cheaper and better
– Coordination of intent in the distributed state machine (DSM)
– Multiple active masters with the same state
– Single-copy equivalence of DSM
– HADOOP-10641, HBASE-10909
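To illustrate coordination of intent and single-copy equivalence, here is a toy model. Every proposed change first gets a slot in one agreed log, and every active master applies that log in the same order, so all copies end up identical. The ToyAgreement class is a hypothetical stand-in for a real consensus protocol such as Paxos. It only models the outcome of agreement and refuses proposals that lack a majority. It is not the design in HADOOP-10641, HBASE-10909, or any WANdisco product, and the replica names and path/block values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One active master holding a full copy of a (toy) namespace."""
    name: str
    state: dict = field(default_factory=dict)
    applied: int = 0  # sequence number of the last agreed proposal applied

    def apply(self, seq: int, op: tuple) -> None:
        # Applying agreed proposals strictly in order, deterministically,
        # is what keeps every copy equivalent to a single copy.
        if seq != self.applied + 1:
            raise ValueError(f"{self.name}: expected seq {self.applied + 1}, got {seq}")
        kind, key, value = op
        if kind == "put":
            self.state[key] = value
        elif kind == "delete":
            self.state.pop(key, None)
        self.applied = seq


class ToyAgreement:
    """Hypothetical stand-in for a consensus protocol (e.g. Paxos).

    It models only the outcome of agreement: a single ordered log that is
    extended only when a strict majority of replicas takes part.
    """

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.log = []  # agreed proposals; log[i] has sequence number i + 1

    def propose(self, op, reachable):
        # A minority partition cannot agree on anything, so no split-brain.
        if 2 * len(reachable) <= len(self.replicas):
            raise RuntimeError("no majority quorum; proposal rejected")
        self.log.append(op)
        for replica in reachable:
            self.catch_up(replica)
        return len(self.log)

    def catch_up(self, replica):
        # Replay every agreed proposal this replica has not applied yet.
        for seq in range(replica.applied + 1, len(self.log) + 1):
            replica.apply(seq, self.log[seq - 1])


if __name__ == "__main__":
    nn1, nn2, nn3 = Replica("nn1"), Replica("nn2"), Replica("nn3")
    agreement = ToyAgreement([nn1, nn2, nn3])
    agreement.propose(("put", "/user/a", "blk_1"), reachable=[nn1, nn2, nn3])
    agreement.propose(("put", "/user/b", "blk_2"), reachable=[nn1, nn2])  # nn3 partitioned
    agreement.propose(("delete", "/user/a", None), reachable=[nn2, nn3])  # nn3 rejoins
    agreement.catch_up(nn1)
    print(nn1.state == nn2.state == nn3.state)  # True
```

A single isolated master cannot get a proposal agreed. A master that missed proposals while partitioned replays them from the agreed log instead of diverging, so every active master converges on the same state.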
- 16. WANdisco: continuous availability company
WANdisco := Wide Area Network Distributed Computing
We solve availability problems for enterprises. If you can’t afford anything below
99.999%, we’ll help
Publicly traded on the London Stock Exchange since mid-2012 (LSE:WAND)
Apache Software Foundation sponsor; actively contributing to Hadoop, SVN, and others
US-patented active-active replication technology
Located on three continents
Enterprise-ready, high-availability software solutions for globally distributed organizations
Subversion, Git, Hadoop HDFS at 200+ customer sites