Advanced Benchmarking at Parse
- 2. Parse?
• Parse is a backend service for mobile apps
• Data Storage
• Server-side code
• Push Notifications
• Analytics
• … all by dropping an SDK into your app
- 3. Parse Stats
• Parse has 400,000 apps
• Rapidly growing MongoDB deployment with:
• 500 databases
• 2.5M collections
• 8M indexes
• 50 TB of storage (excluding replication)
• We have all kinds of workloads!
- 4. Variety is Fun
• We support just about any kind of workload you can imagine
• Games, social networking, events, travel, music, etc
• Apps that are read heavy or write heavy
• Heavy push users (time sensitive notifications)
• Apps that store large objects
• Apps that use us for backups
• Inefficient queries
- 5. 2.6 - Why Upgrade?
• General desire to stay current; a precursor to 2.8 and pluggable storage engines
• Specific features in 2.6
• Background indexing on secondaries
• Index intersection
• Query plan summary logging
- 6. Upgrading is Scary
• In the early days, we just upgraded
• Put a new version on a secondary
• ???
• Upgrade primaries
• ???
• Fix bugs as we find them - LIVE!
- 7. Upgrading
• We’re too big now to cowboy it up
• Upgrading blindly is a potential catastrophe
• In particular, we want to avoid:
• Significant performance regressions
• Unexpected bugs that break customer
apps
- 8. Benchmarking
• We know that:
• Benchmarking can detect performance
regressions between versions
• Tools and sample workloads (sysbench, YCSB,
…) already exist
• MongoDB runs its own benchmarks
• Our workload is complex - we want more
confidence
- 9. A Customized Approach
• Why not test with production
workloads?
• Flashback: https://github.com/ParsePlatform/flashback
• Record - python tool to record ops
• Replay - go tool to play back ops
- 10. Record
• Record leverages MongoDB’s profiling and the oplog
• Profiling is enabled on all DBs
• Inserts are collected from the oplog
• All other ops are taken from the system.profile collection
• Ops are recorded for specified time period
(24H) and then merged
• Produces a JSON file of ops to feed the replay
tool
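The merge step above boils down to a timestamp-ordered merge of the two op streams. A minimal sketch, assuming each op carries a numeric `ts` field (the field names here are illustrative, not Flashback's actual schema):

```python
import heapq
import json

def merge_ops(oplog_inserts, profile_ops):
    """Merge two timestamp-sorted op streams into one ordered stream.

    Each op is a dict with a numeric 'ts' (epoch seconds). Field names
    are illustrative, not Flashback's actual schema.
    """
    return list(heapq.merge(oplog_inserts, profile_ops, key=lambda op: op["ts"]))

def dump_ops(ops, path):
    """Write one JSON document per line, ready to feed a replay tool."""
    with open(path, "w") as f:
        for op in ops:
            f.write(json.dumps(op) + "\n")

inserts = [{"ts": 1, "op": "insert"}, {"ts": 5, "op": "insert"}]
others = [{"ts": 2, "op": "query"}, {"ts": 4, "op": "update"}]
merged = merge_ops(inserts, others)
```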
- 12. Base Snapshot
• Need to replay prod ops on prod data
• It’s best to play back ops against a consistent copy of the data,
otherwise:
• inserts fail with duplicate key errors
• deletes become no-ops
• queries don’t return the right data
• Using EBS snapshots, we grab a copy of the db during the
recording
• Discard ops before the snapshot
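Discarding ops recorded before the snapshot is just a timestamp filter. A minimal sketch, again assuming a `ts` field on each op:

```python
def trim_to_snapshot(ops, snapshot_ts):
    """Drop ops recorded before the EBS snapshot was taken, so playback
    starts from a state consistent with the restored data."""
    return [op for op in ops if op["ts"] >= snapshot_ts]

ops = [{"ts": 10, "op": "insert"}, {"ts": 20, "op": "query"}, {"ts": 30, "op": "update"}]
replayable = trim_to_snapshot(ops, snapshot_ts=20)
```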
- 14. Base Snapshot
• Snapshot is restored to our benchmark server(s)
• EBS volume has to be “warmed” because snapshot
blocks are not instantiated
• Multi TB volumes can take a few hours to warm
• After warming we create an LVM snapshot
• We can “rewind” (merge) after each playback,
iterating faster
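The snapshot/rewind cycle maps onto standard LVM commands. A sketch of the command lines involved (volume-group and LV names are hypothetical; sizes depend on your write volume during playback):

```python
def lvm_snapshot_cmd(vg, lv, snap_name, size="100G"):
    """Create a writable LVM snapshot of the restored volume.
    The VG/LV names and snapshot size here are illustrative."""
    return ["lvcreate", "--size", size, "--snapshot",
            "--name", snap_name, f"/dev/{vg}/{lv}"]

def lvm_rewind_cmd(vg, snap_name):
    """Merging the snapshot back rewinds the origin volume to its state
    at snapshot time, so each playback starts from the same base data."""
    return ["lvconvert", "--merge", f"/dev/{vg}/{snap_name}"]

snap = lvm_snapshot_cmd("vg0", "mongodata", "benchsnap")
rewind = lvm_rewind_cmd("vg0", "benchsnap")
```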
- 15. Playback
1. Freeze the LVM volume
2. Start the version of mongo being tested
3. Adjust replay parameters
• # of workers
• # of ops
• timestamp to start at (when base snapshot was taken)
4. Go!
5. Client-side results are logged to file, server-side collected
from monitoring tools
- 17. Our Workload
• 24h of ops collected
• 10M ops at a time, as fast as possible
• 10 workers
• No warming of the replica set
• LVM snapshot reset, mongod restarted for
each version
• Rinse and repeat for multiple replica sets
- 20. Results
• 33% loss in throughput
• A second workload showed a ~73% drop in
throughput
• 3669.73 ops/sec vs 975.64 ops/sec
• Ouch! What do we do next?
- 21. Replay Data
op       2.4.10 P99   2.4.10 MAX   2.6.3 P99   2.6.3 MAX
query    18.45 ms     20,953 ms    19.21 ms    60,001 ms
insert   23.5 ms      6,290 ms     50.29 ms    48,837 ms
update   21.87 ms     3,835 ms     21.79 ms    48,776 ms
FAM      21.99 ms     6,159 ms     24.91 ms    49,254 ms
(FAM = findAndModify)
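P99 and MAX latencies like these fall out of the client-side replay logs. A simplified nearest-rank percentile over a list of per-op latencies (the actual replay tooling may compute this differently):

```python
import math

def percentile(latencies_ms, pct):
    """Nearest-rank percentile of a latency sample (simplified)."""
    ranked = sorted(latencies_ms)
    rank = max(1, math.ceil(pct / 100 * len(ranked)))
    return ranked[rank - 1]

lat = list(range(1, 101))  # 1..100 ms, a toy sample
p99 = percentile(lat, 99)
mx = max(lat)
```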
- 23. Bug Hunt!
• Old fashioned troubleshooting begins
• Began isolating query patterns and collections
with high max times
• Reproduced issue, confirmed slowness in 2.6
• Lots of documentation and log gathering,
including extremely verbose QLOG
• Started investigation with the Mongo team that ran
several weeks
- 24. What we found
• Basically, the new query planner in 2.6 meets the
Parse auto-indexer
• We create lots of indexes automatically
• More indexes to score and potentially race
• Increased likelihood of running into query
planner bugs
- 25. Example 1
Remove op on “Installation”
{ "installationId": { "$ne": "?" }, "appIdentifier": "?",
"deviceToken": "?" }
• 9M documents
• installationId is UUID, unique value
• "installationId": {"$ne": ? } matches most documents
• deviceToken is a unique token identifying the device
- 26. { "installationId": { "$ne": "?" }, "appIdentifier": "?", "deviceToken": "?" }
• Three candidate indexes:
{installationId: 1, deviceToken: 1}
{deviceToken: 1, installationId: 1}
{deviceToken: 1}
• The second and third indexes are clearly better candidates
for this query, since the device token is a simple point lookup.
• Mongo bug where the work required to skip keys was not
factored into the plan ranking, causing the inefficient plan to
sometimes tie
• Since it’s a remove op, held the write lock for the DB
• Fixed in: https://jira.mongodb.org/browse/SERVER-14311
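A toy scoring function (not MongoDB's real planner) captures the intuition: an index is only useful up to the first leading field without an equality predicate, and a leading `$ne` models the key-skipping cost that SERVER-14311 failed to charge for:

```python
def point_lookup_score(index_fields, query):
    """Toy plan ranking: count leading index fields matched by simple
    equality predicates. A leading $ne stops the usable prefix --
    a simplified illustration, not MongoDB's actual scoring."""
    score = 0
    for field in index_fields:
        pred = query.get(field)
        if pred is None or (isinstance(pred, dict) and "$ne" in pred):
            break
        score += 1
    return score

query = {"installationId": {"$ne": "x"}, "appIdentifier": "app1",
         "deviceToken": "tok"}
candidates = [
    ["installationId", "deviceToken"],   # leading $ne: useless prefix
    ["deviceToken", "installationId"],   # point lookup on deviceToken
    ["deviceToken"],                     # point lookup on deviceToken
]
scores = [point_lookup_score(idx, query) for idx in candidates]
```

Under this model the first index scores zero while the deviceToken-leading indexes score higher, matching the intuition on the slide.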
- 27. Example 2
Query on “Activity”:
{ $or: [ { _p_project: "?" }, { _p_newProject: "?" } ], acl: { $in: [ "a", "b", "c" ] } }
• 25M documents
• _p_project and _p_newProject are pointers to unique IDs of other objects
• acl matches most documents
• Four candidate indexes for this query
{ _p_newProject: 1 }
{ _p_project: 1 }
{ _p_project: 1, _created_at: 1 }
{ acl: 1 }
- 28. { $or: [ { _p_project: "?" }, { _p_newProject: "?" } ], acl: { $in: [ "a", "b", "c" ] } }
• Query Planner would race multiple plans using indexes
• Due to a bug, one of the raced indexes would do a full
index scan (acl)
• Index scan was non-yielding, tying up the lock until it had
completed
• Parse query killer job kills non-yielding queries after 45s
• Query planner would fail to cache plan, and would re-run
on next query with the same pattern
• Fixed: https://jira.mongodb.org/browse/SERVER-15152
- 29. Example 3
Query on "Activity": { $or: [ { _p_project: "?" }, { _p_newProject: "?" } ], acl:
{ $in: [ "a", "b", "c" ] } } (same as previous example)
• Usually fast, but occasionally saw high nscanned and query time > 60s
• Since there were indexes on all fields in AND condition, this was a
candidate for index intersection
• planSummary: IXSCAN { _p_project: 1 }, IXSCAN
{ _p_newProject: 1 }, IXSCAN { acl: 1.0 }
• acl was not selective, but _p_project and _p_newProject would
sometimes match 0 documents during race
• intersection-based query plan would get cached, subsequent queries
slow
• Fixed in https://jira.mongodb.org/browse/SERVER-14961
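The caching failure mode can be illustrated with a toy plan cache keyed by query shape: if the intersection plan wins a race run against values that match zero documents, it gets cached, and every later query with the same shape reuses the slow plan (a simplified sketch of the behavior fixed in SERVER-14961, not MongoDB's actual cache):

```python
class PlanCache:
    """Toy plan cache keyed by query shape -- illustration only."""

    def __init__(self):
        self.cache = {}

    def choose(self, shape, plan_costs):
        """Pick the cheapest plan for this run, unless a plan for the
        same query shape is already cached."""
        if shape in self.cache:
            return self.cache[shape]
        best = min(plan_costs, key=plan_costs.get)
        self.cache[shape] = best
        return best

cache = PlanCache()
# Race run: the queried project matches 0 docs, so intersection looks cheap.
first = cache.choose("or_project_acl", {"ixscan_project": 5, "intersection": 1})
# Later queries with selective values still get the cached, now-slow plan.
later = cache.choose("or_project_acl", {"ixscan_project": 5, "intersection": 90000})
```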
- 31. Comparison
op       2.4.10 P99   2.4.10 MAX   2.6.4 P99   2.6.4 MAX   2.6.5 P99   2.6.5 MAX
query    18 ms        20,953 ms    19 ms       60,001 ms   10 ms       4,352 ms
insert   23 ms        6,290 ms     50 ms       48,837 ms   24 ms       2,225 ms
update   22 ms        3,835 ms     21 ms       48,776 ms   23 ms       4,535 ms
FAM      22 ms        6,159 ms     24 ms       49,254 ms   23 ms       4,353 ms
- 32. More Results
                    2.4.10          2.6.5
Ops: 10M, W: 10     3061 ops/sec    4443 ops/sec
Ops: 10M, W: 250    10666 ops/sec   12248 ops/sec
Ops: 20M, W: 1000   11735 ops/sec   14335 ops/sec
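The relative gains behind these numbers are a one-line calculation over the table's ops/sec figures:

```python
def speedup(old_ops, new_ops):
    """Relative throughput change between versions, as a percentage."""
    return (new_ops - old_ops) / old_ops * 100

# (2.4.10 ops/sec, 2.6.5 ops/sec) from the table above
results = {
    "10M ops, 10 workers": (3061, 4443),
    "10M ops, 250 workers": (10666, 12248),
    "20M ops, 1000 workers": (11735, 14335),
}
gains = {k: round(speedup(old, new), 1) for k, (old, new) in results.items()}
```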
- 33. What now?
• 2.6 has a green light on performance
• Working through functionality testing
• Unit/integration testing catching the
majority of issues
• Bonus: Flashback error log helping us to
identify problems not caught by tests
- 34. Wrap Up
• Benchmarking with something representative of your
production workload is worth the time
• Saved us from discovering slowness in production,
followed by an inevitable and painful rollback
• Using actual production data is even better
• Helped us avoid new bugs
• Learned a lot about our own service (indexing
algorithms need some work)
• Initial work can be reused to efficiently test future versions
- 35. Questions?
• Flashback: https://github.com/ParsePlatform/flashback
• Links to bugs:
• https://jira.mongodb.org/browse/SERVER-14311
• https://jira.mongodb.org/browse/SERVER-15152
• https://jira.mongodb.org/browse/SERVER-14961