mongodb @ foursquare
MongoSF - 5/24/2011
Jorge Ortiz (@jorgeortiz85)
what is foursquare?
location-based social network: “check in” at bars,
restaurants, museums, parks, etc.
  friend-finder (where are my friends right now?)
  virtual game (badges, points, mayorships)
  city guide (local, personalized recommendations)
  location diary + stats engine (where was I a year ago?)
  specials (get rewards at your favorite restaurant)
foursquare: the numbers

>9M users
~3M checkins/day
>15M venues
>300k merchants
>60 employees
foursquare: the tech

 Nginx, HAProxy
 Scala, Lift
 MongoDB, PostgreSQL (legacy)
 (Kestrel, Munin, Ganglia, Python, Memcache, ...)
 All on EC2
foursquare <3’s mongodb

fast
indexes & rich queries
sharding, auto-balancing
replication (see: http://engineering.foursquare.com/)
geo-indexes
amazing support
mongodb: our numbers

8 clusters
  some sharded, some not
  some master/slave, some replica set
~40 machines (m2.4xlarge on EC2, 68.4GB RAM each)
2.3 billion records
~15k QPS
mongodb: lessons learned

keep working set in memory
avoid long-running queries (reads or writes)
monitor everything (especially per-collection stats)
shard from day 1
beware EBS
use small field names for large collections
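Why small field names matter: BSON stores every field name inside every document, as its UTF-8 bytes plus a null terminator, so a long name is paid for once per document. The sketch below just does that arithmetic; the renames (`venuename` to `vn`, `mayorid` to `mid`) are hypothetical, and it ignores index size and any compression.

```scala
object FieldNameOverhead {
  // BSON encodes a field name as a C string: UTF-8 bytes plus a 0x00 terminator.
  def nameBytes(name: String): Int =
    name.getBytes(java.nio.charset.StandardCharsets.UTF_8).length + 1

  // Bytes saved in each document by renaming fields (long name -> short name).
  def savedPerDoc(renames: Map[String, String]): Int =
    renames.map { case (long, short) => nameBytes(long) - nameBytes(short) }.sum

  // Bytes saved across a collection holding `docCount` documents.
  def savedTotal(renames: Map[String, String], docCount: Long): Long =
    savedPerDoc(renames).toLong * docCount
}
```

With those two hypothetical renames each document shrinks by 11 bytes, so a collection of one billion such documents would hold roughly 11GB less data, which matters when the goal is keeping the working set in memory.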
mongodb: pain points


mongos -- failover and thundering herds
index creation -- production impact unclear
auto-balancing -- getting there
replication chains -- use replica sets
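One common mitigation for thundering herds after a mongos failover (a general technique, not necessarily what foursquare did) is to have clients retry with capped exponential backoff plus random "full jitter", so reconnects spread out instead of arriving in lockstep. A minimal sketch; the object name, base delay, and cap are assumptions.

```scala
import scala.util.Random

object Backoff {
  // Full jitter: a random delay in [0, min(cap, base * 2^attempt)] milliseconds.
  def delayMillis(attempt: Int, baseMillis: Long, capMillis: Long, rng: Random): Long = {
    val shift   = math.min(math.max(attempt, 0), 30) // keep the shift from overflowing
    val ceiling = math.min(capMillis, baseMillis << shift)
    (rng.nextDouble() * (ceiling + 1)).toLong        // nextDouble() < 1, so result <= ceiling
  }

  // Runs `op`, retrying failures up to maxAttempts total attempts with jittered sleeps.
  // The last failure is rethrown once attempts are exhausted.
  def retry[A](maxAttempts: Int, baseMillis: Long, capMillis: Long, rng: Random)(op: => A): A = {
    def loop(attempt: Int): A =
      try op
      catch {
        case _: Exception if attempt + 1 < maxAttempts =>
          Thread.sleep(delayMillis(attempt, baseMillis, capMillis, rng))
          loop(attempt + 1)
      }
    loop(0)
  }
}
```

Randomizing the whole interval, rather than adding a small jitter to a fixed delay, is what keeps many app servers that lost the same mongos from all retrying at the same instant.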
rogue: a scala dsl for mongo

 type-safe
 all mongo query features
 logging & validation hooks
 pagination
 index-aware

         http://github.com/foursquare/rogue
mongo-java-driver query
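import com.mongodb.{BasicDBObjectBuilder, DBObject}

// Builds { "mayorid": { "$lte": 100 }, "venuename": "Starbucks" } by hand.
// Field names are plain strings, so a typo is only caught at runtime, if ever.
val query: DBObject =
  (BasicDBObjectBuilder
    .start
    .push("mayorid")
      .add("$lte", 100)
    .pop
    .add("venuename", "Starbucks") // equality is a plain key/value; no operator needed
    .get)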
rogue: code example

val vs: List[Venue] =
  (Venue where (_.mayorid <= 100)
         and (_.venuename eqs "Starbucks")
         and (_.tags contains "wifi")
         and (_.latlng near (39.0, -74.0, Degrees(0.2)))
    orderDesc (_._id) // sort by _id, descending
    fetch (5))        // at most 5 results
rogue: schema example

class Venue extends MongoRecord[Venue] {
  object _id extends ObjectIdField(this)
  object venuename extends StringField(this)
  object mayorid extends LongField(this)
  object tags extends ListField[String](this)
  object latlng extends LatLngField(this)
}
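The type safety comes from the schema: each field object carries its Scala value type, so comparing a `LongField` to a string, or misspelling a field name, fails at compile time instead of silently matching nothing. The self-contained toy below (not Rogue's implementation, and without Lift) sketches that idea; `ToyDsl`, `Field`, `Clause`, and `VenueFields` are hypothetical names.

```scala
object ToyDsl {
  // A clause renders to one entry of a Mongo query document.
  final case class Clause(field: String, rendered: String)

  // A field knows its Mongo name and its Scala value type V.
  final class Field[V](val name: String) {
    def eqs(v: V): Clause = Clause(name, show(v))
    // Only compiles for numeric V, so `venuename <= ...` is rejected.
    def <=(v: V)(implicit num: Numeric[V]): Clause =
      Clause(name, s"""{"$$lte": ${show(v)}}""")
    private def show(v: V): String = v match {
      case s: String => "\"" + s + "\""
      case other     => other.toString
    }
  }

  // Hypothetical typed schema mirroring two fields of the Venue record.
  object VenueFields {
    val venuename = new Field[String]("venuename")
    val mayorid   = new Field[Long]("mayorid")
  }

  // Accumulates clauses; `and` appends one more.
  final case class Query(clauses: List[Clause]) {
    def and(f: VenueFields.type => Clause): Query = Query(clauses :+ f(VenueFields))
    def render: String =
      clauses.map(c => "\"" + c.field + "\": " + c.rendered).mkString("{", ", ", "}")
  }

  def where(f: VenueFields.type => Clause): Query = Query(List(f(VenueFields)))
}
```

Here `ToyDsl.where(_.mayorid <= 100).and(_.venuename eqs "Starbucks").render` yields `{"mayorid": {"$lte": 100}, "venuename": "Starbucks"}`, while `_.veneuname eqs "Starbucks"` or `_.venuename <= "x"` do not compile.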
rogue: logging & validation

 logging:
   slf4j
   Tracer
 validation:
   radius, $in size
   index checks
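Validation hooks like these can reject expensive queries before they reach the server. A minimal sketch of the radius and `$in`-size checks, assuming a hypothetical `QuerySpec` summary of a query and hypothetical limits; the slide does not give Rogue's actual hooks or thresholds.

```scala
object QueryValidation {
  // Hypothetical summary of a query: sizes of its $in lists and its geo radius, if any.
  final case class QuerySpec(inListSizes: Seq[Int], radiusDegrees: Option[Double])

  // Hypothetical limits; real values would be tuned per collection.
  final case class Limits(maxInSize: Int, maxRadiusDegrees: Double)

  // Returns human-readable violations; an empty list means the query passes.
  def validate(q: QuerySpec, limits: Limits): List[String] = {
    val inErrors = q.inListSizes.toList.collect {
      case n if n > limits.maxInSize =>
        s"$$in list has $n elements (max ${limits.maxInSize})"
    }
    val radiusErrors = q.radiusDegrees.toList.collect {
      case r if r > limits.maxRadiusDegrees =>
        s"radius $r exceeds ${limits.maxRadiusDegrees} degrees"
    }
    inErrors ++ radiusErrors
  }
}
```

Returning all violations rather than throwing on the first one makes the same check usable for both hard failures and log-only warnings.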
rogue: pagination

val query: Query[Venue] = ...

val vs: List[Venue] =
  (query
    .countPerPage(20)
    .setPage(5)
    .fetch())
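Page-based pagination reduces to Mongo's skip and limit. A sketch of that translation, assuming pages are numbered from 1 (the slide does not say whether `setPage` is 0- or 1-based); `Pagination` and `PageWindow` are hypothetical. Deep pages get slower, because the server still walks past every skipped document.

```scala
object Pagination {
  final case class PageWindow(skip: Int, limit: Int)

  // Pages are assumed 1-based: page 1 covers documents [0, countPerPage).
  def window(countPerPage: Int, page: Int): PageWindow = {
    require(countPerPage > 0, "countPerPage must be positive")
    require(page >= 1, "page numbers start at 1")
    PageWindow(skip = (page - 1) * countPerPage, limit = countPerPage)
  }

  // Number of pages needed to show `total` documents (ceiling division).
  def pageCount(total: Long, countPerPage: Int): Long =
    (total + countPerPage - 1) / countPerPage
}
```

Under that assumption, `countPerPage(20)` with `setPage(5)` corresponds to skipping 80 documents and returning the next 20.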
rogue: cursors


val query: Query[Checkin] = ...

for (checkin <- query) {
  ... f(checkin) ...
}
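Iterating a query, rather than calling `fetch()`, avoids materializing every result at once. The sketch below shows the idea as a lazy iterator that pulls one batch at a time through a hypothetical `fetchBatch(offset, limit)` function. A real driver cursor instead requests later batches from a server-side cursor (getMore) rather than re-querying with offsets; offsets are used here only to keep the example self-contained.

```scala
object BatchedCursor {
  // Lazily pulls results `batchSize` at a time; nothing is fetched until iterated.
  // `fetchBatch(offset, limit)` is a hypothetical stand-in for a server round trip.
  def iterate[A](batchSize: Int)(fetchBatch: (Int, Int) => Seq[A]): Iterator[A] = {
    require(batchSize > 0, "batchSize must be positive")
    Iterator
      .iterate(0)(_ + batchSize)                     // offsets 0, batchSize, 2*batchSize, ...
      .map(offset => fetchBatch(offset, batchSize))  // one round trip per batch
      .takeWhile(_.nonEmpty)                         // stop at the first empty batch
      .flatMap(batch => batch)                       // yield documents one by one
  }
}
```

Memory use stays bounded by one batch, which is what makes a `for` loop over a large checkin collection safe.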
rogue: index-aware


val vs: List[Checkin] =
  (Checkin
    where (_.userid eqs 646)
      and (_.venueid eqs vid)
    fetch ())
rogue: index-aware


val vs: List[Checkin] =
  (Checkin
    where (_.userid eqs 646)
      and (_.venueid eqs vid) // hidden scan!
    fetch ())
rogue: index-aware


val vs: List[Checkin] =
  (Checkin
    where (_.userid eqs 646) // known index
      scan (_.venueid eqs vid) // known scan
    fetch ())
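The index-aware check turns a hidden scan into an explicit one: clauses written with `where`/`and` are expected to be served by an index, while `scan` marks clauses the author accepts will be evaluated by scanning. A sketch of such a check, under the simplifying assumption that equality clauses are served when their fields form a prefix of some known compound index; `IndexCheck` and `QueryShape` are hypothetical names.

```scala
object IndexCheck {
  // A compound index, e.g. List("userid", "venueid").
  type IndexSpec = List[String]

  // `indexed`: fields from where/and clauses; `scanned`: fields marked with scan.
  // Scanned fields are exempt from the check by construction.
  final case class QueryShape(indexed: List[String], scanned: List[String])

  // Longest prefix of any known index made up only of the query's indexed fields.
  private def bestPrefix(q: QueryShape, indexes: List[IndexSpec]): List[String] =
    indexes
      .map(idx => idx.takeWhile(f => q.indexed.contains(f)))
      .sortBy(p => -p.size)
      .headOption
      .getOrElse(Nil)

  // Indexed-clause fields that no index prefix serves: the hidden scans.
  def hiddenScans(q: QueryShape, indexes: List[IndexSpec]): List[String] = {
    val served = bestPrefix(q, indexes)
    q.indexed.filterNot(f => served.contains(f))
  }
}
```

With only an index on `userid`, declaring both `userid` and `venueid` as indexed reports `venueid` as a hidden scan; moving `venueid` to `scan` makes the check pass.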
rogue: future directions


 iteratees for cursors
 compile-time index checking
 select partial objects
 generate full javascript for mapreduce
we’re hiring
    (nyc & sf)
http://foursquare.com/jobs
  jorge@foursquare.com
        @jorgeortiz85
