This document discusses data storage and management. It covers various data storage models like relational, document, key-value, and graph databases. It emphasizes that the most important factors are providing fast response times under all loads, and selecting a consistency model appropriate for the application's needs. New data storage technologies have made consistency more complex, so applications need to make their consistency rules explicit rather than relying on the storage engine. Data should be engineered to have good response time distributions and be agile, adaptable, and sustainable over the long term as applications and technologies change.
32. [Figure: Response Time Histogram. x-axis: Response Time (ms), 0 to 1000; y-axis: % of Responses, 0% to 100%.]
Empty Box Time
Response time of a single request on an unloaded system.
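One way to read a response time histogram is as a cumulative distribution: what share of responses finished within a given time. The sketch below is illustrative only. The sample values and bucket edges are made up, and `cumulative_percent` is a hypothetical helper, not something from the talk.

```python
from bisect import bisect_right

def cumulative_percent(samples_ms, edges_ms):
    """Return the percent of samples at or below each edge (in ms)."""
    ordered = sorted(samples_ms)
    total = len(ordered)
    return [100.0 * bisect_right(ordered, edge) / total for edge in edges_ms]

# Hypothetical response times in milliseconds (made-up data).
samples = [40, 55, 60, 75, 90, 120, 180, 250, 400, 900]
edges = [100, 200, 500, 1000]

percents = cumulative_percent(samples, edges)
for edge, pct in zip(edges, percents):
    print(f"<= {edge} ms: {pct:.0f}% of responses")
```

The long tail (the single 900 ms sample here) is what an average hides and what a distribution makes visible.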
33. Scalability is a means, not an end.
What we need is
fast response time,
under all loads.
34. [Diagram: system architecture. Labels: App Web Media Web API Gateway; CMS; App layer; Cache farm; Search farm; Indexer; Search; Admin; Jobbers; Caching Proxy/LB (x2); CDN; Egress.]
36. [Diagram: system architecture. Labels: App Web Media Web API Gateway; CMS; App layer; Cache farm; Search farm; Indexer; Search; Admin; Jobbers; Caching Proxy/LB (x2); CDN; Egress.]
37. [Diagram: system architecture. Labels: App Web Media Web API Gateway; CMS; App layer; Cache farm; Search farm; Indexer; Search; Admin; Jobbers; Caching Proxy/LB (x2); CDN; Egress; ISP.]
38. UNCERTAINTY PRINCIPLE
You cannot be sure the data is unchanged since your
observation, except by making another observation.
P(unch) = F(dC/dt, dt)
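The slide gives only the general shape of the relationship: the probability that data is unchanged depends on its rate of change and on the time elapsed since it was observed. As a sketch of one possible F, assume changes arrive as a Poisson process with a constant rate. That assumption is not from the talk, and `p_unchanged` is a hypothetical name.

```python
import math

def p_unchanged(change_rate_per_s, elapsed_s):
    """Probability of zero changes in elapsed_s seconds.

    Assumes changes follow a Poisson process with a constant rate
    (an assumption made for this sketch only).
    """
    return math.exp(-change_rate_per_s * elapsed_s)

# A value that changes about once every 2 seconds, read 2 seconds ago.
print(round(p_unchanged(0.5, 2.0), 3))
```

Under this model, confidence in a cached observation decays exponentially with its age, and only a fresh observation resets it.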
39. THE ROLE OF SURPRISE
Unlikely answers are often more interesting.
47. SUPER-OBSERVER
There are no one-to-many mappings from the super-
observer’s states to any other observer’s states.
48. SUPER-OBSERVER
A super-observer is maximally present if it can discriminate
among the Cartesian product of all other observations.
Observer          Set of States
Steve             {L, R}
Brian             {L, R}
Super-Observer    {L → B, R → F} × {L, R}
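The two super-observer slides can be illustrated with a small sketch. It uses the plain product {L, R} × {L, R} and does not reproduce the relabeling shown in the table's last row. The helper `has_one_to_many` is hypothetical.

```python
from itertools import product

steve = ("L", "R")
brian = ("L", "R")

# Every combined situation: the Cartesian product of the two observations.
world = list(product(steve, brian))

def has_one_to_many(pairs):
    """True if some observer state co-occurs with more than one other state."""
    seen = {}
    for observer_state, other_state in pairs:
        seen.setdefault(observer_state, set()).add(other_state)
    return any(len(others) > 1 for others in seen.values())

# The super-observer sees the whole pair, so each of its states fixes Brian's.
super_vs_brian = [((s, b), b) for s, b in world]

# An observer that sees only Steve's state cannot tell Brian's states apart.
steve_only_vs_brian = [(s, b) for s, b in world]

print(has_one_to_many(super_vs_brian))       # False
print(has_one_to_many(steve_only_vs_brian))  # True
```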
50. PORKY PIG’S
WINDOW SHADE
If Porky Pig is looking at the window shade,
he always observes it to be down.
If he is looking away from the window shade,
it rolls up.
54. STATE SPACE
Cartesian product of all possible sets of states.
Example
1,000,000 bytes of RAM
8 bits per byte
2 states per bit
8,000,000 dimensions with 2 values each
or
1,000,000 dimensions with 256 values each
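The two descriptions of the RAM example describe the same state space. A quick check, using a hypothetical helper that measures the size of a state space in bits (the base-2 logarithm of the number of states):

```python
import math

def state_space_bits(dimensions, values_per_dimension):
    """log2 of the number of points in the state space."""
    return dimensions * math.log2(values_per_dimension)

by_bit = state_space_bits(8_000_000, 2)     # 8,000,000 dimensions, 2 values each
by_byte = state_space_bits(1_000_000, 256)  # 1,000,000 dimensions, 256 values each

print(by_bit == by_byte)  # True: both views contain 2**8,000,000 states
```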
55. STATE SPACE
10,000,000 rows in a table
20 columns
Whole database is a single point in a 200,000,000
dimensional space.
Changes to data are transforms of that point.
State over time is the trajectory of that point.
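A toy version of this picture, assuming a table small enough to hold as a tuple: the database is one point, each change is a function from point to point, and the history is the list of points visited. The names `set_cell` and `trajectory` are illustrative, not from the talk.

```python
ROWS, COLUMNS = 10_000_000, 20
DIMENSIONS = ROWS * COLUMNS  # 200,000,000, as on the slide

def set_cell(point, row, column, value, columns):
    """Return a new point with one coordinate (one cell) replaced."""
    index = row * columns + column
    return point[:index] + (value,) + point[index + 1:]

def trajectory(start, transforms):
    """Apply each transform in order and record every state visited."""
    states = [start]
    for transform in transforms:
        states.append(transform(states[-1]))
    return states

# A tiny table: 2 rows x 2 columns, so a point in a 4-dimensional space.
start = ("ann", 1, "bob", 2)
path = trajectory(start, [
    lambda p: set_cell(p, 0, 1, 5, columns=2),
    lambda p: set_cell(p, 1, 0, "bea", columns=2),
])
for state in path:
    print(state)
```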
65. WHAT ABOUT CAP?
Consistency:
“...there must exist a total order on all operations such that
each operation looks as if it were completed at a single instant.”
Seth Gilbert and Nancy Lynch. 2002.
Brewer's conjecture and the feasibility of consistent, available,
partition-tolerant web services.
SIGACT News 33, 2 (June 2002), 51-59. DOI=10.1145/564585.564601
http://doi.acm.org/10.1145/564585.564601
66. WHAT ABOUT CAP?
Linearizability
67. The data base consists of entities which are
related in certain ways. These relationships are
best thought of as assertions about the data.
68. Examples of such assertions are:
“Names is an index for Telephone_numbers.”
“The value of Count_of_X gives the number of
employees in department X.”
69. The data base is said to be consistent if it satisfies
all its assertions. In some cases, the data base must
become temporarily inconsistent in order to
transform it to a new consistent state.
From "Granularity of Locks and Degrees of Consistency in a
Shared Data Base",
J.N. Gray, R.A. Lorie, G.R. Putzolu, I.L. Traiger, 1976
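The Count_of_X assertion can be written as an executable check. The sketch below uses an in-memory dictionary as a stand-in for the data base (the layout and function names are assumptions for illustration) and shows the temporary inconsistency in the middle of a transaction.

```python
from collections import Counter

def is_consistent(db):
    """Assertion: count_of[X] equals the number of employees in department X."""
    actual = Counter(db["employees"].values())
    departments = set(actual) | set(db["count_of"])
    return all(actual[d] == db["count_of"].get(d, 0) for d in departments)

def hire(db, name, department, observe):
    """A two-step transaction; observe() is called after each step."""
    db["employees"][name] = department
    observe(db)  # mid-transaction: the count has not caught up yet
    db["count_of"][department] = db["count_of"].get(department, 0) + 1
    observe(db)  # end of transaction: the assertion holds again

db = {"employees": {"ann": "X"}, "count_of": {"X": 1}}
observations = [is_consistent(db)]
hire(db, "bob", "X", lambda d: observations.append(is_consistent(d)))
print(observations)  # [True, False, True]
```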
70. Consistency is a predicate C on entities and their
values. The predicate is generally not known to the
system but is embodied in the structure of the
transactions.
From "Transactions and Consistency in Distributed Database Systems",
I.L. Traiger, J.N. Gray, C.A. Galtieri, and B.G. Lindsay, 1982
71. “C” VERSUS “A”?
Response Time
Consistency
Resilience
See also: http://goo.gl/1Yv3
→ http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html
81. ENFORCING C
Must account for overlapping wavefronts of information.
There is no master clock.
Simultaneity is positional.
Make C explicit in the application, don’t rely on storage engine.
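One possible reading of "make C explicit in the application" is sketched below: the application states its predicate as code and commits a change only if the predicate holds and the data has not changed since it was read. This is a single-process illustration under stated assumptions. `VersionedStore` and `update_with_predicate` are hypothetical names, not the API of any storage engine.

```python
class VersionedStore:
    """Hypothetical in-memory store offering only a versioned compare-and-set."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def compare_and_set(self, expected_version, new_value):
        if self.version != expected_version:
            return False  # someone else changed the data since our observation
        self.value = new_value
        self.version += 1
        return True

def update_with_predicate(store, transform, predicate, retries=3):
    """Apply transform only if the result satisfies the application's predicate C."""
    for _ in range(retries):
        value, version = store.read()
        candidate = transform(value)
        if not predicate(candidate):
            raise ValueError("update would violate the consistency predicate")
        if store.compare_and_set(version, candidate):
            return candidate
    raise RuntimeError("gave up after repeated concurrent changes")

# Predicate C for this example: an account balance never goes negative.
account = VersionedStore(100)
balance = update_with_predicate(account, lambda b: b - 30, lambda b: b >= 0)
print(balance, account.version)  # 70 1
```

The predicate lives in application code, so it stays the same if the storage engine underneath is swapped for one with weaker guarantees.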