SlideShare a Scribd company logo
Application Development Series
Back to Basics
Reporting & Analytics
Daniel Roberts
@dmroberts
#MongoDBBasics
2
• Recap from last session
• Reporting / Analytics options
• Map Reduce
• Aggregation Framework introduction
– Aggregation explain
• mycms application reports
• Geospatial with Aggregation Framework
• Text Search with Aggregation Framework
Agenda
3
• Virtual Genius Bar
– Use the chat to post
questions
– EMEA Solution
Architecture / Support
team are on hand
– Make use of them
during the sessions!!!
Q & A
Recap from last time….
5
Indexing
• Indexes
• Multikey, compound,
„dot.notation‟
• Covered, sorting
• Text, GeoSpatial
• Btrees
>db.articles.ensureIndex( { author
: 1, tags : 1 } )
>db.user.find({user:"danr"}, {_id:0,
password:1})
>db.articles.ensureIndex( {
location: “2dsphere” } )
>>db.articles.ensureIndex(
{ "$**" : “text”,
name : “TextIndex”} )
options db.col.ensureIndex({ key : type})
6
Index performance / efficiency
• Examine index plans
• Identity slow queries
• n / nscanned ratio
• Which index used.
operators .explain() , db profiler
> db.articles.find(
{author:'Dan Roberts‟})
.sort({date:-1}
).explain()
> db.setProfilingLevel(1,
100)
{ "was" : 0, "slowms" : 100,
"ok" : 1 }
> db.system.profile.find()
.pretty()
Reporting / Analytics options
8
• Query Language
– Leverage pre aggregated documents
• Aggregation Framework
– Calculate new values from the data that we have
– For instance : Average views, comments count
• MapReduce
– Internal Javascript based implementation
– External Hadoop, using the MongoDB connector
• A combination of the above
Access data for reporting, options
9
• Immediate results
– Simple from a query
perspective.
– Interactions collection
Pre Aggregated Reports
{
„_id‟ : ObjectId(..),
„article_id‟ : ObjectId(..),
„section‟ : „schema‟,
„date‟ : ISODate(..),
„daily‟: { „views‟ : 45,
„comments‟ : 150 }
„hours‟ : {
0 : { „views‟ : 10 },
1 : { „views‟ : 2 },
…
23 : { „views‟ : 14,
„comments‟ : 10 }
}
}
> db.interactions.find(
{"article_id" : ObjectId(”…..")},
{_id:0, hourly:1}
)
10
• Use query result to display directly in application
– Create new REST API
– D3.js library or similar in UI
Pre Aggregated Reports
{
"hourly" : {
"0" : {
"view" : 1
},
"1" : {
"view" : 1
},
……
"22" : {
"view" : 5
},
"23" : {
"view" : 3
}
}
}
Map Reduce
12
• Map Reduce
– MongoDB – JavaScript
• Incremental Map Reduce
Map Reduce
//Map Reduce Example
> db.articles.mapReduce(
function() { emit(this.author, this.comment_count); },
function(key, values) { return Array.sum (values) },
{
query : {},
out: { merge: "comment_count" }
}
)
Output
{ "_id" : "Dan Roberts", "value" : 6 }
{ "_id" : "Jim Duffy", "value" : 1 }
{ "_id" : "Kunal Taneja", "value" : 2 }
{ "_id" : "Paul Done", "value" : 2 }
13
MongoDB – Hadoop Connector
Hadoop Integration
Primary
Secondary
Secondary
HDFS
Primary
Secondary
Secondary
Primary
Secondary
Secondary
Primary
Secondary
Secondary
HDFS HDFS HDFS
MapReduce MapReduce MapReduce MapReduce
MongoS MongoSMongoS
Application ApplicationApplication
Application
Dash Boards /
Reporting
1) Data Flow,
Input /
Output via
Application
Tier
Aggregation Framework
15
• Multi-stage pipeline
– Like a unix pipe –
• “ps -ef | grep mongod”
– Aggregate data, Transform
documents
– Implemented in the core server
Aggregation Framework
//Find out which are the most popular tags…
db.articles.aggregate([
{ $unwind : "$tags" },
{ $group : { _id : "$tags" , number : { $sum : 1 } } },
{ $sort : { number : -1 } }
])
Output
{ "_id" : "mongodb", "number" : 6 }
{ "_id" : "nosql", "number" : 3 }
{ "_id" : "database", "number" : 1 }
{ "_id" : "aggregation", "number" : 1 }
{ "_id" : "node", "number" : 1 }
16
In our mycms application..
//Our new python example
@app.route('/cms/api/v1.0/tag_counts', methods=['GET'])
def tag_counts():
pipeline = [ { "$unwind" : "$tags" },
{ "$group" : { "_id" : "$tags" , "number" : { "$sum" : 1 } }
},
{ "$sort" : { "number" : -1 } }]
cur = db['articles'].aggregate(pipeline, cursor={})
# Check everything ok
if not cur:
abort(400)
# iterate the cursor and add docs to a dict
tags = [tag for tag in cur]
return jsonify({'tags' : json.dumps(tags, default=json_util.default)})
17
• Pipeline and Expression operators
Aggregation operators
Pipeline
$match
$sort
$limit
$skip
$project
$unwind
$group
$geoNear
$text
$search
Tip: Other operators for date, time, boolean and string manipulation
Expression
$addToSet
$first
$last
$max
$min
$avg
$push
$sum
Arithmetic
$add
$divide
$mod
$multiply
$subtract
Conditional
$cond
$ifNull
Variables
$let
$map
18
• What reports and analytics do we need in our
application?
– Popular Tags
– Popular Articles
– Popular Locations – integration with Geo Spatial
– Average views per hour or day
Application Reports
19
• Unwind each „tags‟ array
• Group and count each one, then Sort
• Output to new collection
– Query from new collection so don‟t need to compute for
every request.
Popular Tags
db.articles.aggregate([
{ $unwind : "$tags" },
{ $group : { _id : "$tags" , number : { $sum : 1 } } },
{ $sort : { number : -1 } },
{ $out : "tags"}
])
20
• Top 5 articles by average daily views
– Use the $avg operator
– Use use $match to constrain data range
• Utilise with $gt and $lt operators
Popular Articles
db.interactions.aggregate([
{
{$match : { date :
{ $gt : ISODate("2014-02-20T00:00:00.000Z")}}},
{$group : {_id: "$article_id", a : { $avg : "$daily.view"}}},
{$sort : { a : -1}},
{$limit : 5}
]);
21
• Use Explain plan to ensure the efficient use of the
index when querying.
Aggregation Framework Explain
db.interactions.aggregate([
{$group : {_id: "$article_id", a : { $avg : "$daily.view"}}},
{$sort : { a : -1}},
{$limit : 5}
],
{explain : true}
);
22
Explain output…
{
"stages" : [
{
"$cursor" : { "query" : … }, "fields" : { … },
"plan" : {
"cursor" : "BasicCursor",
"isMultiKey" : false,
"scanAndOrder" : false,
"allPlans" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"scanAndOrder" : false
}
]
}
}
},
…
"ok" : 1
}
Geo Spatial & Text Search
Aggregation
24
• $text operator with aggregation framework
– All articles with MongoDB
– Group by author, sort by comments count
Text Search
db.articles.aggregate([
{ $match: { $text: { $search: "mongodb" } } },
{ $group: { _id: "$author", comments:
{ $sum: "$comment_count" } } }
{$sort : {comments: -1}},
])
25
• $geoNear operator with aggregation framework
– Again use geo operator in the $match statement.
– Group by author, and article count.
Utilise with Geo spatial
db.articles.aggregate([
{ $match: { location: { $geoNear :
{ $geometry :
{ type: "Point" ,coordinates : [-0.128, 51.507] } },
$maxDistance :5000}
}
},
{ $group: { _id: "$author", articleCount: { $sum: 1 } } }
])
Summary
27
• Aggregating Data…
– Map Reduce
– Hadoop
– Pre-Aggregated Reports
– Aggregation Framework
• Tune with Explain plan
• Compute on the fly or Compute and store
• Geospatial
• Text Search
Summary
28
– Operations for you application
– Scale out
– Availability
– How do we prepare of production
– Sizing
Next Session – 3th April
1403   app dev series - session 5 - analytics

More Related Content

1403 app dev series - session 5 - analytics

  • 1. Application Development Series Back to Basics Reporting & Analytics Daniel Roberts @dmroberts #MongoDBBasics
  • 2. 2 • Recap from last session • Reporting / Analytics options • Map Reduce • Aggregation Framework introduction – Aggregation explain • mycms application reports • Geospatial with Aggregation Framework • Text Search with Aggregation Framework Agenda
  • 3. 3 • Virtual Genius Bar – Use the chat to post questions – EMEA Solution Architecture / Support team are on hand – Make use of them during the sessions!!! Q & A
  • 4. Recap from last time….
  • 5. 5 Indexing • Indexes • Multikey, compound, „dot.notation‟ • Covered, sorting • Text, GeoSpatial • Btrees >db.articles.ensureIndex( { author : 1, tags : 1 } ) >db.user.find({user:"danr"}, {_id:0, password:1}) >db.articles.ensureIndex( { location: “2dsphere” } ) >>db.articles.ensureIndex( { "$**" : “text”, name : “TextIndex”} ) options db.col.ensureIndex({ key : type})
  • 6. 6 Index performance / efficiency • Examine index plans • Identity slow queries • n / nscanned ratio • Which index used. operators .explain() , db profiler > db.articles.find( {author:'Dan Roberts‟}) .sort({date:-1} ).explain() > db.setProfilingLevel(1, 100) { "was" : 0, "slowms" : 100, "ok" : 1 } > db.system.profile.find() .pretty()
  • 8. 8 • Query Language – Leverage pre aggregated documents • Aggregation Framework – Calculate new values from the data that we have – For instance : Average views, comments count • MapReduce – Internal Javascript based implementation – External Hadoop, using the MongoDB connector • A combination of the above Access data for reporting, options
  • 9. 9 • Immediate results – Simple from a query perspective. – Interactions collection Pre Aggregated Reports { „_id‟ : ObjectId(..), „article_id‟ : ObjectId(..), „section‟ : „schema‟, „date‟ : ISODate(..), „daily‟: { „views‟ : 45, „comments‟ : 150 } „hours‟ : { 0 : { „views‟ : 10 }, 1 : { „views‟ : 2 }, … 23 : { „views‟ : 14, „comments‟ : 10 } } } > db.interactions.find( {"article_id" : ObjectId(”…..")}, {_id:0, hourly:1} )
  • 10. 10 • Use query result to display directly in application – Create new REST API – D3.js library or similar in UI Pre Aggregated Reports { "hourly" : { "0" : { "view" : 1 }, "1" : { "view" : 1 }, …… "22" : { "view" : 5 }, "23" : { "view" : 3 } } }
  • 12. 12 • Map Reduce – MongoDB – JavaScript • Incremental Map Reduce Map Reduce //Map Reduce Example > db.articles.mapReduce( function() { emit(this.author, this.comment_count); }, function(key, values) { return Array.sum (values) }, { query : {}, out: { merge: "comment_count" } } ) Output { "_id" : "Dan Roberts", "value" : 6 } { "_id" : "Jim Duffy", "value" : 1 } { "_id" : "Kunal Taneja", "value" : 2 } { "_id" : "Paul Done", "value" : 2 }
  • 13. 13 MongoDB – Hadoop Connector Hadoop Integration Primary Secondary Secondary HDFS Primary Secondary Secondary Primary Secondary Secondary Primary Secondary Secondary HDFS HDFS HDFS MapReduce MapReduce MapReduce MapReduce MongoS MongoSMongoS Application ApplicationApplication Application Dash Boards / Reporting 1) Data Flow, Input / Output via Application Tier
  • 15. 15 • Multi-stage pipeline – Like a unix pipe – • “ps -ef | grep mongod” – Aggregate data, Transform documents – Implemented in the core server Aggregation Framework //Find out which are the most popular tags… db.articles.aggregate([ { $unwind : "$tags" }, { $group : { _id : "$tags" , number : { $sum : 1 } } }, { $sort : { number : -1 } } ]) Output { "_id" : "mongodb", "number" : 6 } { "_id" : "nosql", "number" : 3 } { "_id" : "database", "number" : 1 } { "_id" : "aggregation", "number" : 1 } { "_id" : "node", "number" : 1 }
  • 16. 16 In our mycms application.. //Our new python example @app.route('/cms/api/v1.0/tag_counts', methods=['GET']) def tag_counts(): pipeline = [ { "$unwind" : "$tags" }, { "$group" : { "_id" : "$tags" , "number" : { "$sum" : 1 } } }, { "$sort" : { "number" : -1 } }] cur = db['articles'].aggregate(pipeline, cursor={}) # Check everything ok if not cur: abort(400) # iterate the cursor and add docs to a dict tags = [tag for tag in cur] return jsonify({'tags' : json.dumps(tags, default=json_util.default)})
  • 17. 17 • Pipeline and Expression operators Aggregation operators Pipeline $match $sort $limit $skip $project $unwind $group $geoNear $text $search Tip: Other operators for date, time, boolean and string manipulation Expression $addToSet $first $last $max $min $avg $push $sum Arithmetic $add $divide $mod $multiply $subtract Conditional $cond $ifNull Variables $let $map
  • 18. 18 • What reports and analytics do we need in our application? – Popular Tags – Popular Articles – Popular Locations – integration with Geo Spatial – Average views per hour or day Application Reports
  • 19. 19 • Unwind each „tags‟ array • Group and count each one, then Sort • Output to new collection – Query from new collection so don‟t need to compute for every request. Popular Tags db.articles.aggregate([ { $unwind : "$tags" }, { $group : { _id : "$tags" , number : { $sum : 1 } } }, { $sort : { number : -1 } }, { $out : "tags"} ])
  • 20. 20 • Top 5 articles by average daily views – Use the $avg operator – Use use $match to constrain data range • Utilise with $gt and $lt operators Popular Articles db.interactions.aggregate([ { {$match : { date : { $gt : ISODate("2014-02-20T00:00:00.000Z")}}}, {$group : {_id: "$article_id", a : { $avg : "$daily.view"}}}, {$sort : { a : -1}}, {$limit : 5} ]);
  • 21. 21 • Use Explain plan to ensure the efficient use of the index when querying. Aggregation Framework Explain db.interactions.aggregate([ {$group : {_id: "$article_id", a : { $avg : "$daily.view"}}}, {$sort : { a : -1}}, {$limit : 5} ], {explain : true} );
  • 22. 22 Explain output… { "stages" : [ { "$cursor" : { "query" : … }, "fields" : { … }, "plan" : { "cursor" : "BasicCursor", "isMultiKey" : false, "scanAndOrder" : false, "allPlans" : [ { "cursor" : "BasicCursor", "isMultiKey" : false, "scanAndOrder" : false } ] } } }, … "ok" : 1 }
  • 23. Geo Spatial & Text Search Aggregation
  • 24. 24 • $text operator with aggregation framework – All articles with MongoDB – Group by author, sort by comments count Text Search db.articles.aggregate([ { $match: { $text: { $search: "mongodb" } } }, { $group: { _id: "$author", comments: { $sum: "$comment_count" } } } {$sort : {comments: -1}}, ])
  • 25. 25 • $geoNear operator with aggregation framework – Again use geo operator in the $match statement. – Group by author, and article count. Utilise with Geo spatial db.articles.aggregate([ { $match: { location: { $geoNear : { $geometry : { type: "Point" ,coordinates : [-0.128, 51.507] } }, $maxDistance :5000} } }, { $group: { _id: "$author", articleCount: { $sum: 1 } } } ])
  • 27. 27 • Aggregating Data… – Map Reduce – Hadoop – Pre-Aggregated Reports – Aggregation Framework • Tune with Explain plan • Compute on the fly or Compute and store • Geospatial • Text Search Summary
  • 28. 28 – Operations for you application – Scale out – Availability – How do we prepare of production – Sizing Next Session – 3th April

Editor's Notes

  1. db.interactions.find({"article_id" : ObjectId("532198379fb5ba99a6bd4063")})db.interactions.find({"article_id" : ObjectId("532198379fb5ba99a6bd4063")},{_id:0,hourly:1})