Production
MongoDB
in the Cloud
From Essentials
to Corner Cases
Who are we?
Mike Hobbs & Bridget Kromhout
Social Commerce
&
Brand Interest Graph Analytics
Why MongoDB?
● Scalable, high-performance, open source
● Dynamic schemas for unstructured data
● Query language close to SQL in power
● "Eventually consistent" is hard to program right
Our configuration
12-node cluster (4 shards x 3 replica sets)
Several other non-sharded replica sets
Desired webapp response time is < 10ms
Total data size: 110 GB
Total index size: 28 GB
Largest collection: 49 GB
Largest index: 8.1 GB
EC2: EBS, instance size, replication
MongoDB: right for only some data sets
Memory & iowait
Working set needs to fit in memory
● Indexes
● Frequently accessed records
Avoid swapping!!!
EBS latency in EC2 is an issue.
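As a rough sanity check, the fit can be sketched as back-of-envelope arithmetic. The sizes, the 68 GB instance, and the 80% headroom factor below are illustrative assumptions, not measurements:

```python
def working_set_fits(index_gb, hot_data_gb, ram_gb, headroom=0.8):
    """True if indexes + frequently accessed data fit in a fraction
    of RAM, leaving headroom for connections, journal, and the OS."""
    return index_gb + hot_data_gb <= ram_gb * headroom

# 28 GB of indexes (as in this deck) plus a guess at hot records,
# on a hypothetical 68 GB instance:
print(working_set_fits(28, 20, 68))  # True: fits
print(working_set_fits(28, 50, 68))  # False: will swap / hit EBS
```

Once the check fails, you are paging against EBS on every miss, which is exactly the iowait problem above.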
Fragmentation
Fragmentation steals from your most precious resource by
reserving memory that is not used.
Run a compaction when your storageSize significantly
exceeds your data size
mongos> db.widgets.stats()
...
"size" : 5097988,
"storageSize" : 22507520,
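A quick ratio check against the stats above shows why this collection is a compaction candidate; the 2x threshold is a rule of thumb, not an official MongoDB cutoff:

```python
def needs_compaction(size, storage_size, threshold=2.0):
    """True when allocated storage significantly exceeds live data."""
    return storage_size > size * threshold

size, storage_size = 5097988, 22507520  # from db.widgets.stats() above
print(round(storage_size / size, 1))         # 4.4x allocated vs. used
print(needs_compaction(size, storage_size))  # True
```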
Padding can reduce fragmentation and I/O
db.widgets.insert({widg_id: "72120", padding: "XXXX...XXX"})
db.widgets.update({widg_id: "72120"}, {
  $unset: {padding: ""},
  $set: {desc: "Grout remover", price: "13.39", instock: true}
})
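The same pre-padding trick, sketched in Python. The 200-byte target and the repr-based size estimate are simplifying assumptions; real code would measure the encoded BSON size:

```python
def padded_doc(doc, target_bytes=200):
    """Copy doc with a throwaway 'padding' field sized so the record
    roughly reaches target_bytes (repr length is a crude stand-in
    for the encoded BSON size)."""
    pad_len = max(0, target_bytes - len(repr(doc)))
    out = dict(doc)
    out["padding"] = "X" * pad_len
    return out

doc = padded_doc({"widg_id": "72120"})
print(len(doc["padding"]) > 0)  # True: padded up to the target size
```

The later update then $unsets the padding while $setting the real fields, so the document does not outgrow its allocated slot and move on disk.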
Replica sets
"optime" : { "t" : 1365165841000 , "i" : 1 },
"optimeDate" : { "$date" : "Fri Apr 5 07:44:01 2013" },
[Diagram: replica set "test" with members test-3-1.yourdomain, test-3-2.yourdomain, test-3-3.yourdomain]
Elections
08:52:06 [rsMgr] can't see a majority of the set, relinquishing
primary
08:52:06 [rsMgr] replSet relinquishing primary state
08:52:06 [rsMgr] replSet SECONDARY
08:52:12 [rsMgr] replSet can't see a majority, will not try to
elect self
The primary is always determined by an election.
2-member replSet without an arbiter: if the secondary goes
offline, the primary will step down, as in the log above.
Priorities can rig elections.
Ensure availability of an odd number of voting members.
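The majority rule behind the log above fits in a couple of lines; `can_elect` is a hypothetical helper for illustration, not a MongoDB API:

```python
def can_elect(visible_voters, total_voters):
    """A member may be elected (or remain) primary only while it
    can see a strict majority of the set's voting members."""
    return visible_voters > total_voters // 2

print(can_elect(1, 2))  # False: 2-member set loses its primary
print(can_elect(2, 3))  # True: 3 voters tolerate one failure
print(can_elect(2, 4))  # False: an even split elects nobody
```

This is why an arbiter or an odd voter count matters: even numbers buy no extra failure tolerance, only more ways to deadlock.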
Manual primary changes
No "become primary now" command exists.
A manual stepdown with a recusal timeout is the best option:
test-1:PRIMARY> rs.stepDown(300)
Wed Apr 3 11:45:36 DBClientCursor::init call() failed
Wed Apr 3 11:45:36 query failed : admin.$cmd { replSetStepDown: 300.0 } to:
127.0.0.1:27017
Wed Apr 3 11:45:36 Error: error doing query: failed
src/mongo/shell/collection.js:155
Wed Apr 3 11:45:36 trying reconnect to 127.0.0.1:27017
Wed Apr 3 11:45:36 reconnect 127.0.0.1:27017 ok
test-1:SECONDARY>
This triggers an election.
(Obviously, make sure your preferred candidate(s) can win.)
States: down (initializing), startup2, secondary, primary
replSet back to standalone? No.
Test server: replica set of 1, shard of 1. Removed --replSet, but
the shard configuration needed a manual update:
db.shards.update({host:"testreplset/test.domain.net"},
  {$set: {host:"test.domain.net"}})
updatedExisting values were no longer returned via mongos, but
were visible when connected directly to mongod:
> db.schedule.update({_id:...}, {$set:{lock:true}}, false, true);
db.runCommand("getlasterror")
{
"updatedExisting" : true,
"n" : 1,
"connectionId" : 73,
"err" : null,
"ok" : 1
}
Solution: re-add --replSet to the mongod startup line and
revert the shard configs. (Bug open with 10gen.)
Sharding
Can increase parallelization of CPU & I/O
Carefully choose a shard key (nontrivial to change)
Must run config servers & mongos
Doesn't ensure high availability
Doesn't help if you're already out of memory
256GB collection max
for initial sharding
Rebalancing data across shards
Queries block while servers negotiate the final hand-off.
Updating indexes after hand-off can be slow.
Best run off-peak
mongos> use config
switched to db config
mongos> db.settings.find()
{ "_id" : "balancer", "activeWindow" :
{ "start" : "23:00", "stop" : "6:00" }
}
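The activeWindow above wraps past midnight. The containment logic is sketched below for illustration; it is not mongos's actual implementation, which compares server-local times:

```python
def in_window(hhmm, start="23:00", stop="6:00"):
    """True if hhmm falls inside the balancing window; handles a
    window that wraps past midnight, as "23:00"-"6:00" does."""
    to_min = lambda s: int(s.split(":")[0]) * 60 + int(s.split(":")[1])
    t, a, b = to_min(hhmm), to_min(start), to_min(stop)
    return a <= t < b if a < b else (t >= a or t < b)

print(in_window("23:30"))  # True:  off-peak, balancer may run
print(in_window("12:00"))  # False: peak traffic, balancer idle
```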
Mongos & replSet primary changes
Application-level errors talking to mongos after an election:
pymongo.errors.AutoReconnect: could not connect to
localhost:27020: [Errno 111] Connection refused
pymongo.errors.OperationFailure: database error: error
querying server
Mongos errors talking to mongod on original primary:
Tue Apr 2 09:01:05 [conn3288] Socket say send() errno:110
Connection timed out 10.141.131.214:27017
Tue Apr 2 09:01:05 [conn3288] DBException in process: socket
exception [SEND_ERROR] for 10.141.131.214:27017
Connection pool checked lazily; invalid connections can persist
for days, depending on load. Can clear manually:
mongos> db.adminCommand({connPoolSync:1});
{ "ok" : 1 }
mongos>
Failure handling
Applications must handle fail-over outages:
AutoReconnect & OperationFailure in pymongo
import time
import pymongo

def auto_reconnect(func, *args, **kwargs):
    """Executes func, retrying on AutoReconnect/OperationFailure"""
    for _ in range(100):
        try:
            return func(*args, **kwargs)
        except pymongo.errors.AutoReconnect:
            pass
        except pymongo.errors.OperationFailure:
            pass
        time.sleep(0.1)
    raise TimeoutError()
MMS (MongoDB Monitoring Service)
● free; hosted by 10gen
● need to run agent locally
● 10gen's commercial support relies on MMS
Profiling queries [1]
Finding bad queries that are actively running:
$ mongo | tee mongo.log
> db.currentOp()
...
bye
$ grep numYields mongo.log
"numYields" : 0,
"numYields" : 62247,
"numYields" : 0,
...
# Use your favorite viewer to find the op with 62247 yields
Helpful to get server back to a responsive state:
$ mongo
> db.killOp(10883898)
Profiling queries [2]
Using nscanned to find queries that likely aren't
using indexes:
$ grep -P 'nscanned:\d\d' /var/log/mongodb.log
... or in real-time:
$ tail -f /var/log/mongodb.log | grep -P 'nscanned:\d\d'
MongoDB also provides the setProfilingLevel() command,
which can log all queries to the system.profile collection.
> db.system.profile.find({nscanned:{$gte:10}})
system.profile does incur some performance
overhead, though.
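The same grep can live in Python if you want thresholds or aggregation; the sample log line below is illustrative of the 2.x slow-query format, not taken from a live system:

```python
import re

# Illustrative 2.x slow-query log line (hypothetical values)
line = ('Tue Apr  2 09:01:05 [conn3288] query app.widgets '
        'ntoreturn:1 nscanned:62247 nreturned:1 reslen:380 941ms')

def nscanned(log_line):
    """Extract the nscanned count from a slow-query log line."""
    m = re.search(r'nscanned:(\d+)', log_line)
    return int(m.group(1)) if m else None

count = nscanned(line)
if count is not None and count >= 10:
    print("likely unindexed:", count)  # likely unindexed: 62247
```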
Nagios
● plugin uses pymongo
● set up service groups
Ideas for the future
● Better reconnect handling in applications
● Lose the EBS? Ephemeral disk is faster; rely
on replication to keep data persistent.
● Intelligent use of mongo profiling (reduce
observer effect of setProfilingLevel)
● Use more MMS alerts
● Going to 2.4.x (fast counts, hashed
sharding)