This document discusses MongoDB configuration and operations in the cloud. It describes:
1) The authors' 12-node MongoDB cluster configuration running on AWS EC2 with several replica sets and a total data size of 110GB.
2) Key considerations for MongoDB in the cloud including memory usage, fragmentation, elections, and manual primary changes.
3) Additional topics like sharding, rebalancing data, mongos behavior during elections, failure handling, and monitoring with MMS and Nagios.
2. Who are we?
Mike Hobbs & Bridget Kromhout
Social Commerce
&
Brand Interest Graph Analytics
3. Why MongoDB?
● Scalable, high-performance, open source
● Dynamic schemas for unstructured data
● Query language close to SQL in power
● "Eventually consistent" is hard to program right
4. Our configuration
12-node cluster (4 shards x 3-member replica sets)
Several other non-sharded replica sets
Desired webapp response time is < 10ms
Total data size: 110 GB
Total index size: 28 GB
Largest collection: 49 GB
Largest index: 8.1 GB
EC2: EBS, instance size, replication
MongoDB: right for only some data sets
5. Memory & iowait
Working set needs to fit in memory
● Indexes
● Frequently accessed records
Avoid swapping!!!
EBS latency in EC2 is an issue.
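A back-of-the-envelope check on the "working set fits in memory" rule, as a Python sketch. The hot-data estimate, RAM figure, and headroom fraction are assumptions for illustration, not measured values:

```python
def working_set_fits(index_bytes, hot_data_bytes, ram_bytes, headroom=0.8):
    """Rough check: indexes plus frequently accessed records should fit
    in a fraction of RAM, leaving headroom for connections and the OS."""
    return index_bytes + hot_data_bytes <= ram_bytes * headroom

# e.g. 28 GB of indexes plus ~20 GB of hot records on a 68 GB instance
GB = 1024 ** 3
print(working_set_fits(28 * GB, 20 * GB, 68 * GB))  # True
```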
6. Fragmentation
Fragmentation steals from your most precious resource by reserving memory that is not used.
Run a compaction when your storageSize significantly exceeds your data size.
mongos> db.widgets.stats()
...
"size" : 5097988,
"storageSize" : 22507520,
Padding can reduce fragmentation and I/O
db.widgets.insert({widg_id: "72120", padding: "XXXX...XXX"})
db.widgets.update({widg_id: "72120"}, {
    $unset: {padding: ""},
    $set: {desc: "Grout remover", price: "13.39", instock: true}
})
8. Elections
08:52:06 [rsMgr] can't see a majority of the set, relinquishing primary
08:52:06 [rsMgr] replSet relinquishing primary state
08:52:06 [rsMgr] replSet SECONDARY
08:52:12 [rsMgr] replSet can't see a majority, will not try to elect self
Primary is always determined by an election.
2-member replSet without an arbiter: if the secondary goes offline, the primary will step down (as in the log above).
Priorities can rig elections.
Ensure availability of an odd number of voting members.
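The arithmetic behind the odd-number advice, sketched in Python:

```python
def majority(voting_members):
    """Strict majority of votes needed to elect (or remain) primary."""
    return voting_members // 2 + 1

# A 2-member set without an arbiter: if one member goes offline, the
# survivor's single vote can't reach the required majority of 2.
print(majority(2))  # 2
print(majority(3))  # 2 -- a third voter (even just an arbiter) fixes this
```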
9. Manual primary changes
No "become primary now" command.
Manual stepdowns with recusal timeout are best option.
test-1:PRIMARY> rs.stepDown(300)
Wed Apr 3 11:45:36 DBClientCursor::init call() failed
Wed Apr 3 11:45:36 query failed : admin.$cmd { replSetStepDown: 300.0 } to: 127.0.0.1:27017
Wed Apr 3 11:45:36 Error: error doing query: failed
src/mongo/shell/collection.js:155
Wed Apr 3 11:45:36 trying reconnect to 127.0.0.1:27017
Wed Apr 3 11:45:36 reconnect 127.0.0.1:27017 ok
test-1:SECONDARY>
This triggers an election.
(Obviously, make sure your preferred candidate(s) can win.)
States: down (initializing), startup2, secondary, primary
10. replSet back to standalone? No.
Test server: a replica set of one, a shard of one. Removed --replSet, but the shard configuration needed a manual update:
db.shards.update({host: "testreplset/test.domain.net"},
                 {$set: {host: "test.domain.net"}})
updatedExisting values were no longer returned through mongos, but were visible when connected directly to mongod:
> db.schedule.update({_id:...}, {$set:{lock:true}}, false, true);
db.runCommand("getlasterror")
{
"updatedExisting" : true,
"n" : 1,
"connectionId" : 73,
"err" : null,
"ok" : 1
}
Solution: re-add --replSet to the mongod startup line and revert the shard config. (Bug open with 10gen.)
11. Sharding
Can increase parallelization of CPU & I/O
Carefully choose a shard key (nontrivial to change)
Must run config servers & mongos
Doesn't ensure high availability
Doesn't help if you're already out of memory
256 GB collection max for initial sharding
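A toy Python sketch of range-based chunk routing, showing why the shard key choice matters. The chunk boundaries and shard names are made up:

```python
import bisect

# Hypothetical chunk boundaries on a shard key; each chunk lives on one shard.
chunk_bounds = ["g", "p"]          # chunks: (-inf,"g"), ["g","p"), ["p",+inf)
chunk_shards = ["shard0", "shard1", "shard2"]

def shard_for(key):
    """Route a shard-key value to its owning chunk's shard, as mongos would."""
    return chunk_shards[bisect.bisect_right(chunk_bounds, key)]

print(shard_for("apple"))   # shard0
print(shard_for("grape"))   # shard1
print(shard_for("zebra"))   # shard2
```

Note the failure mode this makes visible: with a monotonically increasing key, every new document routes to the last chunk, so a single shard absorbs all inserts.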
12. Rebalancing data across shards
Queries block while servers negotiate the final hand-off.
Updating indexes after hand-off can be slow.
Best run off-peak.
mongos> use config
switched to db config
mongos> db.settings.find()
{ "_id" : "balancer", "activeWindow" :
{ "start" : "23:00", "stop" : "6:00" }
}
13. Mongos & replSet primary changes
Application-level errors talking to mongos after an election:
pymongo.errors.AutoReconnect: could not connect to localhost:27020: [Errno 111] Connection refused
pymongo.errors.OperationFailure: database error: error querying server
Mongos errors talking to mongod on original primary:
Tue Apr 2 09:01:05 [conn3288] Socket say send() errno:110 Connection timed out 10.141.131.214:27017
Tue Apr 2 09:01:05 [conn3288] DBException in process: socket exception [SEND_ERROR] for 10.141.131.214:27017
The connection pool is checked lazily; invalid connections can persist for days, depending on load. They can be cleared manually:
mongos> db.adminCommand({connPoolSync:1});
{ "ok" : 1 }
mongos>
14. Failure handling
Applications must handle failover outages:
AutoReconnect & OperationFailure in pymongo
import time
import pymongo

def auto_reconnect(func, *args, **kwargs):
    """Executes func, retrying on AutoReconnect/OperationFailure"""
    for _ in range(100):
        try:
            return func(*args, **kwargs)
        except (pymongo.errors.AutoReconnect,
                pymongo.errors.OperationFailure):
            time.sleep(0.1)
    raise TimeoutError()
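A variation worth considering (not from the talk): capped exponential backoff with jitter, so retries thin out during a long election instead of hammering the cluster at a fixed 100 ms:

```python
import random

def backoff_delay(attempt, base=0.1, cap=5.0):
    """Capped exponential backoff with full jitter: kinder to a cluster
    mid-election than a fixed-interval retry loop."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Upper bounds grow 0.1, 0.2, 0.4, ... and never exceed the 5-second cap.
print(max(backoff_delay(n) for n in range(10)) <= 5.0)  # True
```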
15. MMS (MongoDB Monitoring Service)
● free; hosted by 10gen
● need to run agent locally
● 10gen's commercial support relies on MMS
16. Profiling queries [1]
Finding bad queries that are actively running:
$ mongo | tee mongo.log
> db.currentOp()
...
bye
$ grep numYields mongo.log
"numYields" : 0,
"numYields" : 62247,
"numYields" : 0,
...
# Use your favorite viewer to find the op with 62247 yields
Helpful to get server back to a responsive state:
$ mongo
> db.killOp(10883898)
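Instead of eyeballing the grep output, the saved log can be scanned in Python — a small sketch (the helper name is ours):

```python
import re

def worst_yielder(log_text):
    """Find the largest numYields value in saved db.currentOp() output;
    heavy yielders are good candidates for db.killOp()."""
    yields = [int(m) for m in re.findall(r'"numYields"\s*:\s*(\d+)', log_text)]
    return max(yields) if yields else None

log = '"numYields" : 0,\n"numYields" : 62247,\n"numYields" : 0,'
print(worst_yielder(log))  # 62247
```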
17. Profiling queries [2]
Using nscanned to find queries that likely aren't using indexes:
$ grep -P 'nscanned:\d\d' /var/log/mongodb.log
... or in real time:
$ tail -f /var/log/mongodb.log | grep -P 'nscanned:\d\d'
MongoDB also provides the setProfilingLevel() command, which can log all queries to the system.profile collection.
> db.system.profile.find({nscanned:{$gte:10}})
system.profile does incur some performance overhead, though.
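The same nscanned filter, applied client-side — a minimal Python sketch over made-up system.profile-style documents:

```python
def slow_table_scans(profile_docs, min_scanned=10):
    """Filter system.profile-style documents for ops that scanned many
    documents -- the query shape shown above, done client-side."""
    return [d for d in profile_docs if d.get("nscanned", 0) >= min_scanned]

docs = [{"op": "query", "nscanned": 3},
        {"op": "query", "nscanned": 5000}]
print(slow_table_scans(docs))  # [{'op': 'query', 'nscanned': 5000}]
```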
19. Ideas for the future
● Better reconnect handling in applications
● Lose the EBS? Ephemeral disk is faster; rely on replication to keep data persistent.
● Intelligent use of mongo profiling (reduce the observer effect of setProfilingLevel)
● Use more MMS alerts
● Going to 2.4.x (fast counts, hashed sharding)