Building a next-generation database

    david [dot]
            Twitter: @FoundationDB
Motivation
Ease of building successful applications:
• High performance
• Ease of scaling out
• Ease of building abstractions
• Ease of operation
Historical Perspective: 2008

 NoSQL doesn’t really exist yet

Databases in 2008
Relational is entrenched; NoSQL emerging
with some interesting advantages:
• Voldemort
• Cassandra
• HBase
 …but the fine print about data guarantees
            doesn’t look so good.
The CAP2008 theorem
• Brewer: Pick 2 out of 3
• Werner Vogels (CTO, Amazon): “Data
  inconsistency in large-scale reliable
  distributed systems has to be tolerated …
  [for performance and to handle faults]”
• Wrong descriptions all over the web: “The
  availability property means that the system
  is ‘online’ and the client of the system can
  expect to receive a response for its request.”
CAP2008 Conclusions?
• Scaling requires distributed design
• Distributed requires high availability
• Availability requires no C

 So, if we want scalability we have to give up C,
            the cornerstone of ACID.

                     Right?

Thinking about CAP2008
• Is a partition worse than a failure?
• Three computers can’t agree?
• Keyword: Availability…

       Availability != high availability
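
One way to make the distinction concrete, as a sketch that assumes simple majority-quorum replication (the slides do not name a protocol, and the function names are illustrative): during a partition the minority side has to stop answering, so the system is not "available" in the strict CAP sense, yet the majority side keeps serving, which is what high availability usually means in practice.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def can_serve(side_size: int, cluster_size: int) -> bool:
    """A side of a partition may keep serving only if it holds a majority."""
    return side_size >= majority(cluster_size)


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can fail (or be cut off) while a majority remains."""
    return cluster_size - majority(cluster_size)


if __name__ == "__main__":
    # Three computers, one cut off by a partition or simply crashed.
    print("side with 2 of 3 nodes serves:", can_serve(2, 3))
    print("side with 1 of 3 nodes serves:", can_serve(1, 3))
    print("failures tolerated by 3 nodes:", tolerated_failures(3))
```

Under this assumption a partition that isolates one machine looks the same to the majority as that machine failing, which is the point of the question "Is a partition worse than a failure?".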

Flash forward to CAP2012
• Brewer: “Why ‘2 of 3’ is misleading”
• Brewer: “CAP prohibits … perfect availability”
• Vogels: “Achieving strict consistency can come at
  a cost in update or read latency, and may result in
  lower throughput…”
• Google (Spanner): “…it is better to have
  application programmers deal with performance
  problems due to overuse of transactions as
  bottlenecks arise, rather than always coding
  around the lack of transactions.”
The FoundationDB concept
• Attack CAP2008 and deliver transactions at
  NoSQL performance and scale
• Reduce core to minimal feature set
• Add features back with higher-level
  abstractions: “Layers”
• Decouple choice of data model and
  choice of storage technology
FoundationDB
Database software:
• Ordered key-value API
• Scalable
• Transactional
• Fault tolerant

Stack diagram (top to bottom): Application / Layer / Key-value API

Engineering pressures
Engineering Challenge | Strategy
Engineering for extreme reliability and fault tolerance of large clusters under adverse conditions | Simulation
Many asynchronous communicating processes | Erlang?
Fast algorithms; efficient I/O | C++

              We need new tools!
First tool: Flow
• A new programming language
• Adds actor-model concurrency to C++11
• New keywords: ACTOR, future, promise,
  wait, choose, when, streams
• Flow code -> C++11 code -> binary

                  Seriously?

Flow allows…
• Testability by enabling simulation.
• Performance by compiling to native.
• Easier ACTOR-model coding.
Flow eases development

Flow output
Flow performance
Joe Armstrong (author of “Programming Erlang”):

“Write a ring benchmark. Create N processes in a ring.
Send a message round the ring M times so that a total
of N * M messages get sent. Time how long this takes
for different values of N and M. Write a similar
program in some other programming language you are
familiar with. Compare the results. Write a blog, and
publish the results on the internet!”
Flow performance
                 (N=1000, M=1000)
•   Ruby (using threads): 1990 seconds
•   Ruby (queues): 360 seconds
•   Objective C (using threads): 26 seconds
•   Java (threads): 12 seconds
•   Stackless Python: 1.68 seconds
•   Erlang: 1.09 seconds
•   Google Go: 0.87 seconds
•   Flow: 0.075 seconds
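
For readers who want to try the benchmark themselves, here is a minimal sketch of the ring in Python using `asyncio` queues. It is not the Flow program behind the number above, and the names `node` and `ring` are illustrative. The demo uses a smaller N and M than the slide so that it finishes quickly.

```python
import asyncio
import time


async def node(inbox, outbox, total, done):
    """Receive a hop counter, count this delivery, and pass it on."""
    while True:
        hops = await inbox.get()
        if hops is None:            # shutdown marker: pass it on and stop
            await outbox.put(None)
            return
        hops += 1                   # this delivery counts as one message
        if hops == total:           # N * M messages have been delivered
            done.set_result(hops)
            await outbox.put(None)
            return
        await outbox.put(hops)


async def ring(n, m):
    """Send one token around a ring of n nodes m times; return messages sent."""
    queues = [asyncio.Queue() for _ in range(n)]
    done = asyncio.get_running_loop().create_future()
    tasks = [
        asyncio.create_task(node(queues[i], queues[(i + 1) % n], n * m, done))
        for i in range(n)
    ]
    await queues[0].put(0)          # inject the token at node 0
    delivered = await done
    await asyncio.gather(*tasks)    # wait for every node to shut down
    return delivered


if __name__ == "__main__":
    start = time.perf_counter()
    count = asyncio.run(ring(100, 100))
    print(f"{count} messages in {time.perf_counter() - start:.3f} seconds")
```

Timings from a sketch like this depend heavily on the machine and runtime, so they are not directly comparable with the figures in the list.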
Second Tool: Lithium
•   Enabled by Flow
•   Simulate physical interfaces
•   Simulate failure modes
•   Deterministic simulation of entire system
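
The idea can be illustrated with a toy single-threaded simulator. This is a sketch of the general technique, not Lithium itself: the `Simulator` class, the drop rate, and the retry scenario are all assumptions made for illustration. All randomness comes from one seeded generator and time is a virtual clock, so a given seed always replays the same sequence of failures.

```python
import heapq
import random


class Simulator:
    def __init__(self, seed):
        self.rng = random.Random(seed)  # the only source of randomness
        self.now = 0.0                  # virtual clock; real time is never read
        self._queue = []                # entries are (time, sequence, callback)
        self._seq = 0
        self.trace = []

    def schedule(self, delay, callback):
        self._seq += 1                  # tie-breaker keeps ordering deterministic
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))

    def send(self, label, deliver, drop_rate=0.3):
        """Simulated network: random latency, and some messages are lost."""
        if self.rng.random() < drop_rate:
            self.trace.append((round(self.now, 3), "drop", label))
            return
        self.schedule(self.rng.uniform(0.01, 0.1), deliver)

    def run(self):
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()


def run_scenario(seed):
    """A client retries a request over the lossy network until it is acked."""
    sim = Simulator(seed)
    state = {"acked": False, "attempts": 0}

    def on_ack():
        if not state["acked"]:
            state["acked"] = True
            sim.trace.append((round(sim.now, 3), "acked", state["attempts"]))

    def on_request():
        sim.send("ack", on_ack)         # the server replies; the reply may be lost

    def attempt():
        if state["acked"] or state["attempts"] >= 50:
            return
        state["attempts"] += 1
        sim.send("request", on_request)
        sim.schedule(0.5, attempt)      # retry timer

    sim.schedule(0.0, attempt)
    sim.run()
    return sim.trace


if __name__ == "__main__":
    print(run_scenario(42))
    print("same seed, same trace:", run_scenario(42) == run_scenario(42))
```

Because a failing seed reproduces the failure exactly, a bug found in simulation can be replayed and debugged as often as needed.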

Third tool: Magnesium
Traditional approaches
• Glue together smaller transactional systems
  – Two-phase commit (X/Open XA)
  – Paxos
• Build on a distributed file system
  – BigTable/HBase

The FoundationDB approach
• Deconstruct a traditional transactional
  database and scale the individual parts
• Each part must also be fault tolerant
• The parts:
  – Accept requests
  – Check for transaction conflicts
  – Log transactions
  – Store data
Key insight
Checking for transaction conflicts
• Problem is scalable
• When highly optimized, is a small
  percentage of the total work.
• Is tricky to make fault tolerant…
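
To show what checking for conflicts can look like, here is a minimal sketch that assumes an optimistic scheme: each transaction records the version it read at, and a commit is rejected if any key it read has been written since. The slides do not describe FoundationDB's actual algorithm, so the `Transaction` and `ConflictChecker` classes are hypothetical, and this version tracks single keys rather than key ranges.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    read_version: int                        # database version the reads saw
    reads: set = field(default_factory=set)  # keys that were read
    writes: set = field(default_factory=set) # keys that will be written


class ConflictChecker:
    def __init__(self):
        self.version = 0
        self.last_write = {}  # key -> version of the latest committed write

    def try_commit(self, txn):
        """Return the commit version, or None if the transaction conflicts."""
        for key in txn.reads:
            if self.last_write.get(key, 0) > txn.read_version:
                return None   # someone wrote this key after we read it
        self.version += 1
        for key in txn.writes:
            self.last_write[key] = self.version
        return self.version


if __name__ == "__main__":
    checker = ConflictChecker()
    # Two transactions read "balance" at version 0 and both try to update it.
    first = Transaction(read_version=0, reads={"balance"}, writes={"balance"})
    second = Transaction(read_version=0, reads={"balance"}, writes={"balance"})
    print("first commit:", checker.try_commit(first))
    print("second commit:", checker.try_commit(second))
    # A retry that reads at the new version goes through.
    retry = Transaction(read_version=1, reads={"balance"}, writes={"balance"})
    print("retry commit:", checker.try_commit(retry))
```

The check only touches the keys a transaction read and wrote, never the data itself, which is one reason it can be a small share of the total work.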
Training montage
•   Paxos coordination algorithm
•   Multi-versioned data structures
•   SSD optimizations
•   Application-managed page cache
•   Prioritization deeply integrated
•   Control theory for queue sizes
•   Testing, testing, testing

Did we reach our big goals?
•   High performance
•   Ease of scaling out
•   Ease of building abstractions
•   Ease of operation
High performance
FoundationDB delivers performance
exceeding other
NoSQL databases, but
with transactions!
Ease of scaling out
• Add and remove nodes on-the-fly
• Single key-space with global transactions
• Validated to 96 cores, 48 SSDs
Ease of building abstractions
• Transactions enable abstraction
• Abstractions are very hard to build on
  non-transactional systems
• Ordered data model for performance

     Abstractions built on a scalable, fault
tolerant, transactional foundation inherit those
                  properties.
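
A small sketch shows why transactions and ordering matter to a layer. `ToyStore` is a hypothetical in-memory stand-in for an ordered, transactional key-value store, not the FoundationDB API, and the key layout is an assumption. The layer writes a row and its index entry in one transaction, so the index can never disagree with the data, and it answers queries with an ordered range read.

```python
import bisect


class ToyStore:
    """In-memory stand-in for an ordered, transactional key-value store."""

    def __init__(self):
        self._keys = []   # sorted list of tuple keys
        self._data = {}

    def transact(self, fn):
        """Run fn against a write buffer; apply all of its writes or none."""
        writes = {}
        result = fn(writes)           # if fn raises, nothing is applied
        for key, value in writes.items():
            if key not in self._data:
                bisect.insort(self._keys, key)
            self._data[key] = value
        return result

    def get_range(self, prefix):
        """Return (key, value) pairs whose key starts with prefix, in order."""
        start = bisect.bisect_left(self._keys, prefix)
        found = []
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            found.append((key, self._data[key]))
        return found


def add_user(store, user_id, name, city):
    """Layer operation: write the row and its index entry atomically."""
    def body(writes):
        writes[("user", user_id)] = (name, city)
        writes[("idx", "city", city, user_id)] = ""
    store.transact(body)


def users_in_city(store, city):
    """Layer query: one ordered range read over the index."""
    return [key[3] for key, _ in store.get_range(("idx", "city", city))]


if __name__ == "__main__":
    store = ToyStore()
    add_user(store, "u2", "Bob", "Paris")
    add_user(store, "u1", "Alice", "Paris")
    add_user(store, "u3", "Carol", "Oslo")
    print(users_in_city(store, "Paris"))
```

Without the transaction, a crash between the two writes would leave a row with no index entry, and every layer would need its own repair logic.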

Examples of “ease”
• SQL database in one day
• Indexed table layer (3 days * 1 intern)
• Fractal spatial index in 200 lines
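
The slide's 200-line index is not reproduced here. As a hedged illustration of how a spatial index can sit on an ordered key-value store, the sketch below assumes a Z-order (Morton) curve, which may not be the curve the authors used; the function names are illustrative. Interleaving the bits of x and y gives a single key under which nearby points tend to sort close together.

```python
def interleave(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into one Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code


def spatial_key(x: int, y: int, bits: int = 16) -> bytes:
    """Fixed-width big-endian bytes, so byte order matches numeric order."""
    return interleave(x, y, bits).to_bytes((2 * bits + 7) // 8, "big")


if __name__ == "__main__":
    # The four points of one aligned 2x2 cell get four consecutive codes.
    cell = [(2, 2), (3, 2), (2, 3), (3, 3)]
    print(sorted(interleave(x, y) for x, y in cell))
    print(spatial_key(3, 5))
```

Each aligned square of the grid maps to one contiguous key range, so a bounding-box query becomes a handful of ordered range reads.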
Ease of operation
• Automatic data partitioning/replication
• Highly fault-tolerant
• Minimal management

          Try to break it yourself!
Conclusion
• Our mission is to solve the problem of state
  management so that developers can focus on
  building their applications
• 3+ years in the making, now ready for your
  applications
• Bindings for C, Python, JVM, Node.js, Ruby
Free at

Join our Alpha community
