MongoDB Evening: Schema Design
- 3. MongoDB is a ___________ database
1. Document
2. Open source
3. High performance
4. Horizontally scalable
5. Full featured
- 4. 1. Document Database
• Not for .PDF & .DOC files
• A document is essentially an associative array
• Document = JSON object
• Document = PHP Array
• Document = Python Dict
• Document = Ruby Hash
• etc.
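To make "document = associative array" concrete, here is a minimal standard-library sketch: the same illustrative article exists as a Python dict and, once serialized, as a JSON object. The field names and values are made up for illustration, and MongoDB actually stores BSON rather than JSON text, so this shows the shape of a document, not its storage format.

```python
import json

# An illustrative article document as a plain Python dict (an associative array).
article = {
    "title": "Schema design in MongoDB",
    "author": "prasoonk",
    "tags": ["MongoDB", "schema"],   # arrays are ordinary values
    "stats": {"views": 45},          # so are nested sub-documents
}

# The same structure written out as a JSON object.
as_json = json.dumps(article, sort_keys=True)
print(as_json)

# Parsing the JSON gives back an equal dict, so both forms carry the same data.
assert json.loads(as_json) == article
```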
- 5. 1. NoSQL Data Model
• Key-Value Store: Riak, Memcache, Project Voldemort, Redis, BerkeleyDB
• Document Database: MongoDB, CouchDB, OrientDB
• Column-Family Stores: Amazon SimpleDB, Cassandra, HBase, Hypertable
• Graph Databases: Neo4J, FlockDB, OrientDB
- 6. 2. Open Source
• MongoDB is an open source project
• On GitHub
• Licensed under the AGPL
• Started & sponsored by MongoDB Inc. (formerly known as 10gen)
• Commercial licenses available
• Contributions welcome
- 7. 3. High Performance
• Written in C++
• Extensive use of memory-mapped files, i.e. read-through, write-through memory caching
• Runs nearly everywhere
• Data serialized as BSON (fast parsing)
• Full support for primary & secondary indexes
• Document model = less work
- 10. 4. High Availability
• Automated replication and failover
• Multi-data center support
• Improved operational simplicity (e.g., HW swaps)
• Data durability and consistency
- 12. 5. Full Featured
• Ad hoc queries
• Real-time aggregation
• Rich query capabilities
• Strongly consistent
• Geospatial features
• Support for most programming languages
• Flexible schema
- 15. $ tar -zxvf mongodb-osx-x86_64-2.6.0.tgz
$ cd mongodb-osx-x86_64-2.6.0/bin
$ mkdir -p /data/db
$ ./mongod
Running MongoDB
- 16. MacBook-Pro-:~ $ mongo
MongoDB shell version: 2.6.0
connecting to: test
> db.cms.insert({text: 'Welcome to MongoDB'})
> db.cms.find().pretty()
{
"_id" : ObjectId("51c34130fbd5d7261b4cdb55"),
"text" : "Welcome to MongoDB"
}
Mongo Shell
- 17. _id
• _id is the primary key in MongoDB
• Automatically indexed
• Automatically created as an ObjectId if not provided
• Any unique immutable value could be used
- 18. ObjectId
• ObjectId is a special 12-byte value
• Generated so that it is, in practice, unique across your cluster
• ObjectId("50804d0bd94ccab2da652599")
  = timestamp (4 bytes) | machine id (3 bytes) | process id (2 bytes) | counter (3 bytes)
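The byte layout above can be checked by hand. This is a minimal standard-library sketch that splits the slide's example ObjectId into the legacy 4/3/2/3-byte fields; `decode_object_id` is a hypothetical helper name. Note that the current ObjectId specification replaces the machine and process id with a 5-byte random value, so only the timestamp and counter keep this meaning for newer ids. With PyMongo installed, `bson.ObjectId(...).generation_time` returns the same timestamp.

```python
from datetime import datetime, timezone

def decode_object_id(hex_id: str) -> dict:
    """Split a 24-hex-character ObjectId into the legacy 4/3/2/3-byte fields."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    seconds = int.from_bytes(raw[0:4], "big")        # big-endian seconds since the Unix epoch
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),                     # 3-byte machine identifier
        "pid": raw[7:9].hex(),                         # 2-byte process id
        "counter": int.from_bytes(raw[9:12], "big"),   # 3-byte incrementing counter
    }

parts = decode_object_id("50804d0bd94ccab2da652599")
print(parts)
```

Because the timestamp leads the id, ObjectIds created later generally sort after earlier ones, which is why sorting on `_id` roughly follows insertion time.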
- 23. Entities in our Blogging System
• Users (post authors)
• Articles
• Comments
• Tags, categories
• Interactions (views, clicks)
- 26. In a MongoDB-based app
We start building our app and let the schema evolve
- 28. Seek = 5+ ms
Read = really, really fast
(Diagram: Post, Author, Comment)
Disk seeks and data locality
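A back-of-the-envelope sketch of why locality matters, assuming (as the slide does) about 5 ms per seek and treating the reads themselves as negligible. The counts of one post, one author and ten comments are illustrative, and real devices vary widely (SSDs seek far faster), so only the ratio is the point.

```python
SEEK_MS = 5  # per-seek cost taken from the slide

def estimated_seek_ms(separate_locations: int, seek_ms: float = SEEK_MS) -> float:
    """Lower-bound latency when each piece of data sits in a different disk location."""
    return separate_locations * seek_ms

# Post, author and 10 comments stored in 12 separate places: 12 seeks.
scattered = estimated_seek_ms(1 + 1 + 10)
# The same data co-located in one document: 1 seek.
co_located = estimated_seek_ms(1)
print(scattered, co_located)  # 60 5
```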
- 34. MongoDB Drivers
• Official support for 12 languages
• Community drivers for tons more
• Drivers connect to mongo servers
• Drivers translate BSON into native types
• The mongo shell is not a driver, but works like one in some ways
• Installed using typical means (maven, npm, pecl, gem, pip)
- 36. # Python dictionary (or object)
>>> from datetime import datetime
>>> from pymongo import MongoClient
>>> db = MongoClient()['cms']  # local server; the database name 'cms' is an assumption
>>> article = { 'title' : 'Schema design in MongoDB',
'author' : 'prasoonk',
'section' : 'schema',
'slug' : 'schema-design-in-mongodb',
'text' : ('Data in MongoDB has a flexible schema. '
"So, 2 documents needn't have the same structure. "
'It allows an implicit schema to evolve.'),
'date' : datetime.utcnow(),
'tags' : ['MongoDB', 'schema'] }
>>> db['articles'].insert_one(article)
Design the schema in application code
- 37. >>> from bson.binary import Binary
>>> img_data = Binary(open('article_img.jpg', 'rb').read())  # read the image as bytes
>>> article = { 'title' : 'Schema evolution in MongoDB',
'author' : 'mattbates',
'section' : 'schema',
'slug' : 'schema-evolution-in-mongodb',
'text' : ('MongoDB has a dynamic schema. For good '
'performance, you would need an implicit '
'structure and indexes'),
'date' : datetime.utcnow(),
'tags' : ['MongoDB', 'schema', 'migration'],
'headline_img' : {
'img' : img_data,
'caption' : 'A sample document at the shell'
}}
Let's add a headline image
- 38. >>> article = { 'title' : 'Favourite web application framework',
'author' : 'prasoonk',
'section' : 'web-dev',
'slug' : 'web-app-frameworks',
'gallery' : [
{ 'img_url' : 'http://x.com/45rty', 'caption' : 'Flask' },  # ...
# ...
],
'date' : datetime.utcnow(),
'tags' : ['Python', 'web'],
}
>>> db['articles'].insert_one(article)
And different types of articles
- 39. >>> user = {
'user' : 'prasoonk',
'email' : 'prasoon.kumar@mongodb.com',
'password' : 'prasoon101',  # demo only: a real app stores a password hash, never plaintext
'joined' : datetime.utcnow(),
'location' : { 'city' : 'Mumbai' },
}
>>> db['users'].insert_one(user)
Users and profiles
- 40. Retrieve using Comparison Operators
$gt, $gte, $in, $lt, $lte, $ne, $nin
• Used to query documents
• Logical: $or, $and, $not, $nor; Element: $exists, $type
• Evaluation: $mod, $regex, $where; Geospatial: $geoWithin, $geoIntersects, $near, $nearSphere
- 41. Modelling comments (1)
• Two collections: articles and comments
• Use a reference (i.e. foreign key) to link them together
• But… N+1 queries to retrieve an article and its comments
{
'_id': ObjectId(..),
'title': 'Schema design in MongoDB',
'author': 'mattbates',
'date': ISODate(..),
'tags': ['MongoDB', 'schema'],
'section': 'schema',
'slug': 'schema-design-in-mongodb',
'comments': [ObjectId(..), …]
}
{ '_id': ObjectId(..),
'article_id': 1,
'text': 'A great article, helped me understand schema design',
'date': ISODate(..),
'author': 'johnsmith'
}
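With references, a naive loop issues one query for the article and then one per comment id. A common alternative is to collect the referenced ids and fetch them with a single `$in` query, so the read costs two round trips instead of N+1. This sketch builds that filter as a plain dict; `find_in` is a hypothetical in-memory stand-in for `db['comments'].find(...)` so the example runs without a server, and integer ids stand in for ObjectIds.

```python
def comments_filter(article: dict) -> dict:
    """Build one filter that matches every comment the article references."""
    return {"_id": {"$in": article["comments"]}}

def find_in(collection: list, flt: dict) -> list:
    """Hypothetical in-memory stand-in for a single-field $in query."""
    (field, cond), = flt.items()
    wanted = set(cond["$in"])
    return [doc for doc in collection if doc[field] in wanted]

# Illustrative data only.
article = {"_id": 1, "title": "Schema design in MongoDB", "comments": [10, 12]}
comments = [
    {"_id": 10, "text": "A great article", "author": "johnsmith"},
    {"_id": 11, "text": "On another article", "author": "jane"},
    {"_id": 12, "text": "Very helpful", "author": "matt"},
]

flt = comments_filter(article)
print(flt)                                          # {'_id': {'$in': [10, 12]}}
print([c["_id"] for c in find_in(comments, flt)])  # [10, 12]
```

A real `$in` query does not promise to return documents in the order of the id list, so sort in application code if order matters.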
- 42. $gt, $gte, $in, $lt, $lte, $ne, $nin
• Used to query documents
• Logical: $or, $and, $not, $nor; Element: $exists, $type
• Evaluation: $mod, $regex, $where; Geospatial: $geoWithin, $geoIntersects, $near, $nearSphere
Comparison Operators
db.articles.find( { 'title' : 'Intro to MongoDB' } )
db.articles.find( { 'date' : { '$lt' : ISODate("2014-02-19T00:00:00.000Z") } } )
db.articles.find( { 'tags' : { '$in' : ['nosql', 'database'] } } )
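The same three shell queries, written as the filter dicts PyMongo's `find()` accepts. This is a minimal sketch that only builds the filters, so it runs without a server; the `combined` filter mixing `$or` and `$exists` is an added illustration of the logical and element operators, not a query from the slide.

```python
from datetime import datetime

# Exact match on a field
by_title = {"title": "Intro to MongoDB"}

# Comparison operator: articles dated before 19 Feb 2014
# (PyMongo treats naive datetimes as UTC)
before_cutoff = {"date": {"$lt": datetime(2014, 2, 19)}}

# $in: articles tagged with any of the listed tags
tagged = {"tags": {"$in": ["nosql", "database"]}}

# Logical + element operators: either condition matches,
# and the article must have a headline_img field at all
combined = {
    "$or": [tagged, before_cutoff],
    "headline_img": {"$exists": True},
}

# Against a live server: db['articles'].find(tagged), and so on
for flt in (by_title, before_cutoff, tagged, combined):
    print(flt)
```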
- 43. Modelling comments (2)
• Single articles collection: embed comments in article documents
• Pros
  • Single query, document designed for the access pattern
  • Locality (disk, shard)
• Cons
  • Comments array is unbounded; documents will grow in size (remember the 16MB document limit)
{
'_id': ObjectId(..),
'title': 'Schema design in MongoDB',
'author': 'mattbates',
'date': ISODate(..),
'tags': ['MongoDB', 'schema'],
…
'comments': [
{
'text': 'A great article, helped me understand schema design',
'date': ISODate(..),
'author': 'johnsmith'
},
…
]
}
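Embedding means adding a comment is a single `$push` onto the article. This sketch builds that update document and does rough arithmetic on the 16MB limit; the ~500-byte comment and ~4 KB article sizes are illustrative assumptions, and BSON overhead is ignored, so treat the result as an order of magnitude only. With PyMongo the update would be sent as `db['articles'].update_one({'_id': article_id}, update)`.

```python
from datetime import datetime

MAX_BSON_BYTES = 16 * 1024 * 1024  # the 16MB document limit from the slide

def add_comment_update(text: str, author: str, when: datetime) -> dict:
    """Update document that appends one embedded comment to an article."""
    return {"$push": {"comments": {"text": text, "date": when, "author": author}}}

def max_embedded_comments(avg_comment_bytes: int, base_doc_bytes: int) -> int:
    """Rough upper bound on how many comments fit before the document limit."""
    return (MAX_BSON_BYTES - base_doc_bytes) // avg_comment_bytes

update = add_comment_update("A great article", "johnsmith", datetime(2014, 4, 10))
print(update)
print(max_embedded_comments(500, 4096))  # 33546
```

Tens of thousands of comments is a lot, but the array still grows without bound, and every growth step enlarges the document; that growth is what the next option addresses.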
- 44. Modelling comments (3)
• Another option: a hybrid of (1) and (2) that embeds the top x comments (e.g. by date, popularity) in the article document
• Fixed-size comments array (a MongoDB 2.4 feature)
• All other comments "overflow" into a comments collection (double write) in buckets
• Pros
  • Document size is more fixed, so fewer document moves
  • Single query built
  • Full comment history with rich query/aggregation
- 45. Modelling comments (3)
{
'_id': ObjectId(..),
'title': 'Schema design in MongoDB',
'author': 'mattbates',
'date': ISODate(..),
'tags': ['MongoDB', 'schema'],
…
'comments_count': 45,
'comments_pages': 1,
'comments': [
{
'text': 'A great article, helped me understand schema design',
'date': ISODate(..),
'author': 'johnsmith'
},
…
]
}
Total number of comments
• Integer counter updated by an update operation as comments are added/removed
Number of pages
• Page is a bucket of 100 comments (see next slide)
Fixed-size comments array
• 10 most recent
• Sorted by date on insertion
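The "fixed-size, sorted on insertion" array comes from combining `$push` with the `$each`, `$sort` and `$slice` modifiers. This sketch builds that update and, so it runs without a server, models its effect in pure Python (append, sort by `date` ascending, keep the last ten). `capped_push_update` and `apply_capped_push` are hypothetical names, and the model only handles a negative `$slice`; it illustrates the semantics, not how the server implements them.

```python
from datetime import datetime, timedelta

RECENT_LIMIT = 10  # "10 most recent" from the slide

def capped_push_update(comment: dict, limit: int = RECENT_LIMIT) -> dict:
    """Update document: append, keep sorted by date, keep only the newest `limit`."""
    return {"$push": {"comments": {
        "$each": [comment],
        "$sort": {"date": 1},
        "$slice": -limit,
    }}}

def apply_capped_push(comments: list, update: dict) -> list:
    """Pure-Python model of what the server does with that $push update."""
    spec = update["$push"]["comments"]
    merged = comments + spec["$each"]                           # $each: append new elements
    (field, direction), = spec["$sort"].items()
    merged.sort(key=lambda c: c[field], reverse=direction < 0)  # $sort
    return merged[spec["$slice"]:]                              # negative $slice keeps the last n

# Illustrative data: 12 comments arriving one minute apart.
start = datetime(2014, 4, 10, 12, 0)
embedded = []
for i in range(12):
    comment = {"text": f"comment {i}", "date": start + timedelta(minutes=i)}
    embedded = apply_capped_push(embedded, capped_push_update(comment))

print(len(embedded), embedded[0]["text"], embedded[-1]["text"])  # 10 comment 2 comment 11
```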
- 46. Modelling comments (3)
{
'_id': ObjectId(..),
'article_id': ObjectId(..),
'page': 1,
'count': 42,
'comments': [
{
'text': 'A great article, helped me understand schema design',
'date': ISODate(..),
'author': 'johnsmith'
},
…
]
}
One comment bucket (page) document containing up to about 100 comments
Array of 100 comment sub-documents
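A minimal in-memory sketch of the bucketing rule, assuming comments are numbered from 0 in arrival order and each page holds 100 of them; it mirrors the integer-division arithmetic (`// 100`) used in the API code. `BUCKET_SIZE`, `page_for` and `bucket_comments` are hypothetical names, and pages are numbered from 0 here, while the slide's example page starts at 1.

```python
from collections import defaultdict

BUCKET_SIZE = 100  # comments per page/bucket document

def page_for(comment_number: int, size: int = BUCKET_SIZE) -> int:
    """Page (bucket) a comment lands in, for comments numbered from 0."""
    return comment_number // size

def bucket_comments(article_id: str, comments: list) -> list:
    """Group comments into bucket documents shaped like the page document above."""
    pages = defaultdict(list)
    for n, comment in enumerate(comments):
        pages[page_for(n)].append(comment)
    return [
        {"article_id": article_id, "page": page, "count": len(items), "comments": items}
        for page, items in sorted(pages.items())
    ]

buckets = bucket_comments("a1", [{"text": f"c{i}"} for i in range(250)])
print([(b["page"], b["count"]) for b in buckets])  # [(0, 100), (1, 100), (2, 50)]
```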
- 48. Modelling interactions
• Document per article per day: "bucketing"
• Daily counter and hourly sub-document counters for interactions
• Bounded: at most 24 hourly sub-documents
• Single query to retrieve daily article interactions; ready-made for graphing and further aggregation
{
'_id': ObjectId(..),
'article_id': ObjectId(..),
'section': 'schema',
'date': ISODate(..),
'daily': { 'views': 45, 'comments': 150 },
'hours': {
'0' : { 'views': 10 },
'1' : { 'views': 2 },
…
'23' : { 'comments': 14, 'views': 10 }
}
}
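Recording one interaction is a single upsert that increments two counters with dot notation. This sketch only builds the filter and update documents, assuming the `daily`/`hours` field names from the schema above; `interaction_update` is a hypothetical helper, and a plain string stands in for the article's ObjectId. With PyMongo they would be sent as `db['interactions'].update_one(flt, update, upsert=True)`.

```python
from datetime import datetime

def interaction_update(article_id: str, kind: str, ts: datetime) -> tuple:
    """Return (filter, update) that bumps the daily and hourly counters for one interaction."""
    day = datetime(ts.year, ts.month, ts.day)  # bucket key: midnight of the interaction's day
    flt = {"article_id": article_id, "date": day}
    update = {"$inc": {
        f"daily.{kind}": 1,               # e.g. daily.views
        f"hours.{ts.hour}.{kind}": 1,     # e.g. hours.14.views (the hour is a string key)
    }}
    return flt, update

flt, update = interaction_update("a1", "views", datetime(2014, 4, 10, 14, 35))
print(flt)
print(update)  # {'$inc': {'daily.views': 1, 'hours.14.views': 1}}
```

Because the filter matches on article and day, the first interaction of a day creates the bucket (upsert) and later ones only increment counters, so the document never grows beyond 24 hourly entries.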
- 49. JSON and RESTful API
Client-side (e.g. AngularJS) <-> JSON over HTTP(S) REST <-> Python web app <-> PyMongo driver (BSON)
Real applications are not built at a shell; let's build a RESTful API.
Examples to follow: Python RESTful API using the Flask microframework
- 50. myCMS REST endpoints
Method | URI | Action
GET | /articles | Retrieve all articles
GET | /articles-by-tag/[tag] | Retrieve all articles by tag
GET | /articles/[article_id] | Retrieve a specific article by article_id
POST | /articles | Add a new article
GET | /articles/[article_id]/comments | Retrieve all article comments by article_id
POST | /articles/[article_id]/comments | Add a new comment to an article
POST | /users | Register a user
GET | /users/[username] | Retrieve a user's profile
PUT | /users/[username] | Update a user's profile
- 51. $ git clone http://www.github.com/prasoonk/mycms_mongodb
$ cd mycms_mongodb
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ mkdir -p data/db
$ mongod --dbpath=data/db --fork --logpath=mongod.log
$ python web.py
[$ deactivate]
Getting started with the skeleton code
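- 52. import json
from bson import json_util
from flask import Flask, jsonify
from pymongo import MongoClient

app = Flask(__name__)
db = MongoClient()['cms']  # the database name 'cms' is an assumption

@app.route('/cms/api/v1.0/articles', methods=['GET'])
def get_articles():
    """Retrieves all articles in the collection
    sorted by date
    """
    # query all articles and return a cursor sorted by date
    cur = db['articles'].find().sort('date')
    # iterate the cursor and collect the docs in a list
    articles = [article for article in cur]
    # json_util.default serializes BSON types such as ObjectId and datetime
    return jsonify({'articles' : json.dumps(articles, default=json_util.default)})
RESTful API methods in Python + Flask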
- 53. from bson.objectid import ObjectId
from pymongo import ReturnDocument

@app.route('/cms/api/v1.0/articles/<string:article_id>/comments', methods=['POST'])
def add_comment(article_id):
    """Adds a comment to the specified article and a
    bucket, as well as updating a view counter
    """
    …
    page_id = article['last_comment_id'] // 100
    …
    # push the comment to the latest bucket and $inc the count
    page = db['comments'].find_one_and_update(
        {'article_id': ObjectId(article_id),
         'page': page_id},
        {'$inc': {'count': 1},
         '$push': {'comments': comment}},
        projection={'count': 1},
        upsert=True,
        return_document=ReturnDocument.AFTER)
RESTful API methods in Python + Flask
- 54. # $inc the page count if bucket size (100) is exceeded
    if page['count'] > 100:
        db['articles'].update_one(
            {'_id': ObjectId(article_id),
             'comments_pages': article['comments_pages']},
            {'$inc': {'comments_pages': 1}})
    # let's also add to the article itself
    # most recent 10 comments only
    res = db['articles'].update_one(
        {'_id': ObjectId(article_id)},
        {'$push': {'comments': {'$each': [comment],
                                '$sort': {'date': 1},
                                '$slice': -10}},
         '$inc': {'comments_count': 1}})
    …
RESTful API methods in Python + Flask
- 55. import datetime
from bson.objectid import ObjectId
from pymongo import WriteConcern

def add_interaction(article_id, type):
    """Record the interaction (view/comment) for the
    specified article into the daily bucket and
    update an hourly counter
    """
    ts = datetime.datetime.utcnow()
    # $inc daily and hourly view counters in day/article stats bucket
    # note the unacknowledged w=0 write concern for performance
    interactions = db['interactions'].with_options(write_concern=WriteConcern(w=0))
    interactions.update_one(
        {'article_id': ObjectId(article_id),
         'date': datetime.datetime(ts.year, ts.month, ts.day)},
        {'$inc': {
            'daily.{}'.format(type): 1,
            'hours.{}.{}'.format(ts.hour, type): 1
        }},
        upsert=True)
RESTful API methods in Python + Flask
- 56. $ curl -i http://localhost:5000/cms/api/v1.0/articles
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 335
Server: Werkzeug/0.9.4 Python/2.7.5
Date: Thu, 10 Apr 2014 16:00:51 GMT
{
"articles": "[{"title": "Schema design in MongoDB", "text": "Data in MongoDB
has a flexible schema..", "section": "schema", "author": "prasoonk", "date":
{"$date": 1397145312505}, "_id": {"$oid": "5346bef5f2610c064a36a793"},
"slug": "schema-design-in-mongodb", "tags": ["MongoDB", "schema"]}]"}
Testing the API: retrieve articles
- 57. $ curl -H "Content-Type: application/json" -X POST -d '{"text":"An interesting article and a great read."}' http://localhost:5000/cms/api/v1.0/articles/52ed73a30bd031362b3c6bb3/comments
{
"comment": "{"date": {"$date": 1391639269724}, "text": "An interesting article and a great read."}"
}
Testing the API: comment on an article
- 58. Schema iteration
New feature in the backlog?
Documents have a dynamic schema, so we just iterate the object schema.
>>> user = { 'username' : 'matt',
'first' : 'Matt',
'last' : 'Bates',
'preferences' : { 'opt_out' : True } }
>>> db['users'].replace_one({'username' : 'matt'}, user, upsert=True)
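Because documents saved before and after a schema change coexist in one collection, application code usually reads new fields defensively. A minimal sketch, assuming that users saved before `preferences` existed should count as "not opted out" (that default is an assumption for illustration); `opted_out` and `needs_backfill` are hypothetical names.

```python
# An older user saved before 'preferences' existed, and a newer one saved after.
old_user = {"username": "prasoonk", "email": "prasoon.kumar@mongodb.com"}
new_user = {"username": "matt", "first": "Matt", "last": "Bates",
            "preferences": {"opt_out": True}}

def opted_out(user: dict) -> bool:
    """Treat a missing 'preferences' sub-document as 'not opted out'."""
    return user.get("preferences", {}).get("opt_out", False)

print([opted_out(u) for u in (old_user, new_user)])  # [False, True]

# Filter matching documents that still predate the new field, e.g. for a backfill:
needs_backfill = {"preferences": {"$exists": False}}
```

Reading with defaults lets the feature ship immediately; a later backfill using the `$exists` filter is optional rather than a blocking migration.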
- 60. Further reading
• "myCMS" skeleton source code:
http://www.github.com/prasoonk/mycms_mongodb
• Data Models
http://docs.mongodb.org/manual/data-modeling/
• Use case: metadata and asset management
http://docs.mongodb.org/ecosystem/use-cases/metadata-and-asset-management/
• Use case: storing comments
http://docs.mongodb.org/ecosystem/use-cases/storing-comments/
- 63. For More Information
Resource | Location
MongoDB Downloads | mongodb.com/download
Free Online Training | education.mongodb.com
Webinars and Events | mongodb.com/events
White Papers | mongodb.com/white-papers
Case Studies | mongodb.com/customers
Presentations | mongodb.com/presentations
Documentation | docs.mongodb.org
Additional Info | info@mongodb.com