Cloud Foundry Summit 2015: Using Service Brokers to Manage Data Lifecycle
- 1. Using Service Brokers to Manage Data Lifecycle
Josh Kruck | @krujos
jkruck@pivotal.io
github.com/krujos
- 16. 16
Good code needs good tests.
Good tests need good data.
Good data needs… a copy.
So let's get one!
A play in 3 acts
- 21. 21
Once you find a copy, it needs a curator
Sizing (don’t use all 10 TB of prod to test)
But your sample must represent the entirety of the dataset.
Representative curation is futile with most datasets (unknown unknowns).
Sizing means you restrict your tests to what you left in.
Sizing hides performance issues (missing index)
So maybe it’s not worth it…
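To make the sizing risk concrete, here is a minimal sketch, assuming a made-up table of 10,000 rows in which three rows predate a schema change. The row layout, the `email` field, and the legacy ids are all hypothetical and chosen only for illustration.

```python
# Hypothetical dataset: 10,000 rows, three of which predate a schema change
# and therefore lack the "email" field.
LEGACY_IDS = {1234, 5678, 9012}
rows = [
    {"id": i} if i in LEGACY_IDS else {"id": i, "email": f"user{i}@example.com"}
    for i in range(10_000)
]

# A simple 1% "sized" copy: keep every 100th row.
sample = rows[::100]


def count_legacy(data):
    """Count rows that are missing the email field."""
    return sum(1 for r in data if "email" not in r)


print(len(sample), count_legacy(rows), count_legacy(sample))  # 100 3 0
```

The sized copy contains none of the legacy rows, so any test that would have tripped over them passes against the sample and fails in production.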
- 22. 22
Once you find a copy, it needs a curator
Sanitize it!
Can’t have SSNs and credit card numbers in test
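As a rough illustration of what sanitizing can look like, the sketch below masks anything shaped like a US SSN or a card number in a text field. The patterns, the mask strings, and the `sanitize` helper are assumptions made for this example, not something from the talk; real sanitization usually also has to preserve formats and referential integrity.

```python
import re

# Assumed formats: SSNs written as 123-45-6789, and card numbers as
# 13 to 16 digits optionally separated by single spaces or dashes.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def sanitize(text: str) -> str:
    """Replace anything that looks like an SSN or card number with a fixed mask."""
    text = SSN_RE.sub("XXX-XX-XXXX", text)
    return CARD_RE.sub("[CARD REDACTED]", text)


row = "Jane Doe, 123-45-6789, 4111 1111 1111 1111"
print(sanitize(row))  # Jane Doe, XXX-XX-XXXX, [CARD REDACTED]
```

A fixed mask like this keeps secrets out of test, but it also flattens the variety in the data, which is the same trade-off the sizing slide describes.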
- 27. 27
The sum of the mess is worth more than its parts
There are 5,475 secondary copies with no load. Can we leverage them for testing?
Fix: Let CF manage your data.
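Cloud Foundry's service broker API is built around four lifecycle calls: provision, bind, unbind, and deprovision. The sketch below is one way those calls could map onto data copies, assuming an in-memory stand-in for the storage layer. The `CopyBroker` class, the clone naming, and the credential shape are hypothetical and are not taken from the talk.

```python
class CopyBroker:
    """In-memory stand-in for a broker that hands out writable data copies."""

    def __init__(self):
        self.instances = {}  # instance_id -> clone name
        self.bindings = {}   # binding_id -> instance_id

    def provision(self, instance_id):
        # A real broker would ask the storage layer for a clone here.
        clone = f"clone-{instance_id}"
        self.instances[instance_id] = clone
        return clone

    def bind(self, instance_id, binding_id):
        # The returned credentials are what a bound app would connect with.
        clone = self.instances[instance_id]
        self.bindings[binding_id] = instance_id
        return {"credentials": {"database": clone}}

    def unbind(self, binding_id):
        self.bindings.pop(binding_id)

    def deprovision(self, instance_id):
        # Refuse to drop a copy that an app is still bound to.
        if instance_id in self.bindings.values():
            raise RuntimeError("instance still has bindings")
        self.instances.pop(instance_id)


broker = CopyBroker()
broker.provision("i-1")
creds = broker.bind("i-1", "b-1")
print(creds["credentials"]["database"])  # clone-i-1
broker.unbind("b-1")
broker.deprovision("i-1")
```

The point of the mapping is that a test copy gets created, handed to an app, and cleaned up through the same lifecycle as any other service, rather than by hand.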
- 29. 29
Most copies do nothing, but when the sky is falling you need them
First, do no harm
- 32. 32
Putting the E in Enterprise
Buy a CDM Product
Actifio, Delphix, ViPR
Great if they support your workloads!
And you can consume the form factors they deliver
- 33. 33
Based on technology to allow layered writes
Layered FS (Docker, Docker, Docker)?
Clones, Linked Clones, VM Snaps
Writeable Snapshots (FlexClone, XtremIO, LVM Snaps)
Building is harder than buying
BYO
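The idea behind layered writes can be sketched in a few lines, assuming a dictionary stands in for the shared read-only copy. Python's `collections.ChainMap` happens to behave this way: lookups fall through the layers, while writes only touch the first one. The `new_clone` helper and the sample keys are hypothetical.

```python
from collections import ChainMap

# Read-only "golden" copy shared by every test run (stand-in for a snapshot).
base = {"user:1": "alice", "user:2": "bob"}


def new_clone(base_layer):
    """Return a writable view: a fresh, empty top layer over the shared base."""
    return ChainMap({}, base_layer)


clone_a = new_clone(base)
clone_b = new_clone(base)

clone_a["user:1"] = "mallory"  # the write lands in clone_a's own layer
clone_a["user:3"] = "carol"

print(clone_a["user:1"], clone_b["user:1"], base["user:1"])  # mallory alice alice
print(len(clone_a.maps[0]))  # 2
```

Each clone costs only the data it changes, which is what makes it plausible to hand every test run its own full-size copy.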
Editor's Notes
- Act I: how do I get the copies?
- Much sleuthing and failed attempts to generate legit test data later…
- Act II
- Act III
I have a customer who hasn’t refreshed test data in three years.
- Representing the entirety of the dataset means covering things like previous schemas, rows with missing additive fields, FKs, etc. Is selecting those records going to cause issues? What about formats assumed in the data itself (but surely no one stores encoded information in their database)?
Does everyone know the data well enough to know what representative is? (No.)