20 MARCH, 2018
#MDBlocal
THE PATH TO TRULY UNDERSTANDING YOUR MONGODB DATA
TOM HOLLANDER
PRODUCT MANAGER, MONGODB
@tomhollander
AGENDA
1. Background: Data Analytics
2. The importance of data visualisation
3. Methods for data visualisation in MongoDB
BACKGROUND
TERMINOLOGY
“Business Intelligence”
“Business Analytics”
ANALYTICS
DATA VISUALISATION
DATA GROWTH IS EXPLOSIVE
• More data has been created in the last 2 years than in the entire previous history of the human race
• By 2020:
  • 1.7 MB per person every second
THE STATE OF ANALYTICS
• Analytics is big $!
  • $150B in 2017
  • $210B+ in 2020 (forecast)
• Less than 0.5% of data is analysed and used. Imagine the potential!
Source: IDC. https://www.idc.com/getdoc.jsp?containerId=prUS42371417
EVOLUTION OF ANALYTICS
Timeline: 2012, 2014, 2016, 2018
Earlier:
• Dedicated reporting team
• Desktop access
• Hadoop
• Batch analytics
• On prem only
• Monthly reports
More recently:
• Self service
• Mobile access
• Spark
• Real time analytics
• On-prem and cloud
• On demand reporting
IMPORTANCE OF DATA VISUALISATION
Data Analytics: Understanding Your MongoDB Data
EARLY DATA VISUALISATIONS
• Charles Minard (1869)
• Napoleon's march on, and retreat from, Moscow in 1812.
Dataset      I              II             III            IV
             X      Y       X      Y       X      Y       X      Y
             10     8.04    10     9.14    10     7.46    8      6.58
             8      6.95    8      8.14    8      6.77    8      5.76
             13     7.58    13     8.74    13     12.74   8      7.71
             9      8.81    9      8.77    9      7.11    8      8.84
             11     8.33    11     9.26    11     7.81    8      8.47
             14     9.96    14     8.10    14     8.84    8      7.04
             6      7.24    6      6.13    6      6.08    8      5.25
             4      4.26    4      3.10    4      5.39    19     12.50
             12     10.84   12     9.13    12     8.15    8      5.56
             7      4.82    7      7.26    7      6.42    8      7.91
             5      5.68    5      4.74    5      5.73    8      6.89
Mean         9.00   7.50    9.00   7.50    9.00   7.50    9.00   7.50
Variance     10.00  3.75    10.00  3.75    10.00  3.75    10.00  3.75
Correlation  0.816          0.816          0.816          0.817
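The four datasets above appear to be the well-known Anscombe's quartet: near-identical summary statistics, very different shapes once plotted. As a minimal sketch, the summary rows can be recomputed with Python's standard library. The use of population variance (dividing by n) is an assumption inferred from the values on the slide; the function names are illustrative only.

```python
from math import sqrt
from statistics import mean, pvariance

# X values shared by datasets I-III, and the X values of dataset IV.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def summarise(xs, ys):
    """Return the summary statistics shown on the slide for one dataset."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": pvariance(xs),  # population variance (assumed, see lead-in)
        "var_y": pvariance(ys),
        "corr": pearson(xs, ys),
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        stats = summarise(xs, ys)
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

Because the statistics barely differ across the four datasets, only a plot reveals how different they really are.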
SO YOU WANT TO VISUALISE?
SO YOU WANT TO VISUALIZE?
EASY (ish) HARD (er?)
THINGS TO THINK ABOUT
• Use the correct architecture
• Determine what your needs are
  • Multiple data sources?
  • Huge amounts of complex data?
  • Quick self service?
• Choose the right solution for you
ARCHITECTURE: SHARED DEPLOYMENT
• Run analytics against the main deployment used by your Online Transaction Processing (OLTP) apps
• May be OK in some cases, but watch out for:
  • Poorly performing analytics queries
  • Analytics impacting OLTP workloads
[Diagram: OLTP Client and Analytics both connected to the same DB]
ARCHITECTURE: HIDDEN REPLICAS
• Hidden secondaries maintain a copy of the primary’s data set
• Hidden secondaries are used for workloads with different access patterns
• They contain identical data, but can have different indexes
• A hidden secondary cannot become primary
[Diagram: OLTP Client and Analytics; a replica set with a Primary and three Secondaries, one of them marked P=0, Hidden=true]
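The "P=0, Hidden=true" labels correspond to the `priority` and `hidden` fields of a member in the replica set configuration document. The following is a minimal sketch of preparing such a configuration as plain Python data. The helper name `hide_member`, the replica set name and the host names are hypothetical, and applying the result needs a live replica set, so that step is shown only as a comment.

```python
import copy


def hide_member(config, host):
    """Return a copy of a replica set config with `host` marked hidden.

    A hidden member must have priority 0, so it can never be elected primary.
    The config version is incremented, since a reconfiguration is expected
    to carry a higher version than the current one.
    """
    new_config = copy.deepcopy(config)
    for member in new_config["members"]:
        if member["host"] == host:
            member["priority"] = 0
            member["hidden"] = True
            break
    else:
        raise ValueError(f"no replica set member with host {host!r}")
    new_config["version"] = config["version"] + 1
    return new_config


# Hypothetical three-member replica set; host names are placeholders.
example_config = {
    "_id": "rs0",
    "version": 1,
    "members": [
        {"_id": 0, "host": "db0.example.net:27017"},
        {"_id": 1, "host": "db1.example.net:27017"},
        {"_id": 2, "host": "analytics.example.net:27017"},
    ],
}

analytics_config = hide_member(example_config, "analytics.example.net:27017")

# Against a live deployment this could be applied with PyMongo, for example:
#   client.admin.command("replSetReconfig", analytics_config)
```

Drivers do not route reads to hidden members through read preferences, so the analytics tool would normally connect to the hidden member directly.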
ARCHITECTURE: ETL TO DATA WAREHOUSE
• An Extract-Transform-Load (ETL) tool retrieves data from one or more databases, transforms the data and loads it into a data warehouse
• Minimal impact on OLTP systems; data can be highly optimised for analysis
• Expensive to set up and maintain
• Data can be stale
[Diagram: OLTP Clients use DB1, DB2 and DB3; ETL loads their data into the Data Warehouse, which serves Analytics]
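The extract, transform and load steps can be sketched in a few lines. This is an illustration only: in-memory lists stand in for DB1 to DB3 and for the warehouse table, the transform is a simple flattening of nested documents into dotted column names, and all names and sample values are hypothetical.

```python
def flatten(document, prefix=""):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    row = {}
    for key, value in document.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


def etl(sources, warehouse):
    """Extract documents from each source, flatten them, load into `warehouse`."""
    for source_name, documents in sources.items():
        for document in documents:      # extract
            row = flatten(document)     # transform
            row["source"] = source_name
            warehouse.append(row)       # load
    return warehouse


# Hypothetical in-memory stand-ins for DB1..DB3 and the warehouse table.
sources = {
    "DB1": [{"order": 1, "customer": {"name": "Ada", "city": "Sydney"}}],
    "DB2": [{"order": 2, "customer": {"name": "Lin", "city": "Perth"}}],
    "DB3": [],
}
warehouse_rows = etl(sources, [])
```

The warehouse only reflects the sources as of the last run, which is where the "data can be stale" caveat comes from.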
TOOLING OPTIONS
TOOLING
BUILD YOUR OWN
• Pros
  • Custom tailored solution: fits exactly as required!
• Cons
  • High investment
  • Maintenance
  • Deep understanding of the underlying tech and its language(s)
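As a rough idea of what building your own involves, the sketch below prepares a MongoDB aggregation pipeline and reshapes its results for a bar chart. It is not the code from the demo. The helper names, the `cuisine` field and the `restaurants` collection are hypothetical; `$group`, `$sort` and `$limit` are standard aggregation stages.

```python
def top_categories_pipeline(field, limit=10):
    """Build an aggregation pipeline counting documents per value of `field`."""
    return [
        {"$group": {"_id": f"${field}", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
        {"$limit": limit},
    ]


def to_bar_series(results):
    """Turn aggregation results into (labels, values) for a bar chart."""
    labels = [doc["_id"] for doc in results]
    values = [doc["count"] for doc in results]
    return labels, values


pipeline = top_categories_pipeline("cuisine", limit=5)

# With PyMongo and a hypothetical `restaurants` collection this could be run as:
#   results = list(db.restaurants.aggregate(pipeline))
# Here a hand-written result stands in for the server's response.
sample_results = [{"_id": "Thai", "count": 42}, {"_id": "Greek", "count": 17}]
labels, values = to_bar_series(sample_results)
```

The chart rendering itself, and every further chart type, filter and dashboard, is extra code to write and maintain, which is the cost listed under "Cons".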
BUILD YOUR OWN
DEMO
MONGODB COMPASS
• Day-to-day development/operations
• Data management and manipulation
• Adding indexes
• Viewing server stats
• Schema analysis with visualisations
MONGODB COMPASS
DEMO
MONGODB COMPASS: WHEN TO USE
• When you want to understand the range of types and values in your documents
• When you want zero-effort visualisations and don’t need the ability to customise
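To make "the range of types and values" concrete, here is a small sketch that counts which types occur for each top-level field in a sample of documents. It only illustrates the idea and makes no claim about how Compass implements schema analysis; the function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def analyse_schema(documents):
    """Count, per top-level field, how often each Python type occurs."""
    types_by_field = defaultdict(Counter)
    for document in documents:
        for field, value in document.items():
            types_by_field[field][type(value).__name__] += 1
    return {field: dict(counts) for field, counts in types_by_field.items()}


# Hypothetical sample; the `age` field is polymorphic (int and str)
# and is missing from one document.
sample = [
    {"name": "Ada", "age": 36},
    {"name": "Lin", "age": "unknown"},
    {"name": "Sam"},
]
report = analyse_schema(sample)
```

Here `report` shows `name` as a string in all three documents, while `age` appears in only two of them and with two different types.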
MONGODB BI CONNECTOR
• Visualise and explore MongoDB data in SQL-based BI tools:
  • Automatically discovers the schema
  • Translates complex SQL statements issued by the BI tool into MongoDB aggregation queries
  • Converts the results into a tabular format for rendering inside the BI tool
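To illustrate the kind of translation involved, the sketch below pairs a SQL statement with a hand-written aggregation pipeline that answers the same question. This is not output captured from the BI Connector. The `orders` table, its `status` field and the helper function are hypothetical, and the helper merely emulates the grouping in memory to show the tabular result.

```python
from collections import Counter

# A query a BI tool might issue against a hypothetical `orders` table.
sql = "SELECT status, COUNT(*) AS total FROM orders GROUP BY status"

# A hand-written pipeline answering the same question on an `orders` collection.
pipeline = [
    {"$group": {"_id": "$status", "total": {"$sum": 1}}},
    {"$project": {"_id": 0, "status": "$_id", "total": 1}},
]


def group_count_rows(documents, field):
    """Emulate the pipeline in memory and return tabular (status, total) rows."""
    counts = Counter(doc.get(field) for doc in documents)
    return sorted(counts.items(), key=lambda row: str(row[0]))


orders = [{"status": "shipped"}, {"status": "open"}, {"status": "shipped"}]
rows = group_count_rows(orders, "status")
```

Flat, consistently typed fields such as `status` map cleanly onto SQL columns; nested or polymorphic fields are where schema mapping takes more effort.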
MONGODB BI CONNECTOR
[Diagram: BI tools etc., MySQL protocol, mongosqld, DRDL, MongoDB]
MONGODB BI CONNECTOR
DEMO
BI CONNECTOR: WHEN TO USE
• You have an existing investment in BI tools (Tableau, Power BI, Qlik etc.)
• You are analysing data from multiple data sources (not just MongoDB)
• Your MongoDB datasets are highly structured
  • Consistent, minimal nesting, no polymorphism
• You have the time and patience for schema mapping
• Extremely powerful, but with a high ramp-up effort
MONGODB CHARTS
• Lightweight and intuitive
• Build visualisations on MongoDB data (nested, polymorphic)
• Share content in a dashboard
• Beta available soon!
MONGODB CHARTS
DEMO
MONGODB CHARTS: WHEN TO USE
• Your data is in MongoDB collections
• You don’t want to flatten / ETL your MongoDB data
• You want quick answers from simple but customisable visualisations
• You want self service for a semi-technical audience
DATA VISUALISATION LIFE CYCLE
1. Acquire
2. Prep
   - Calcs
   - Groups
   - Data types
3. Visualise
   - Bar
   - Pie
   - Line
4. Explore
   - Dashboards
5. Share
   - Export
   - Collaborate
   - Embed
SUMMARY
• Visualisations are incredibly powerful for understanding your data
• Use them to derive insight
• There are multiple options for visualising your MongoDB data
• Combine the tools for the most power!
Q&A
tom.hollander@mongodb.com
@tomhollander
THANK YOU!
tom.hollander@mongodb.com
@tomhollander


Editor's Notes

  1. 96 DVDs per person per day
  2. The eye can process 10 million bits per second, roughly the same as Ethernet.
  3. One of the best statistical drawings ever made. It tells of an army of 400,000 marching on Moscow and returning with 10,000, and shows time and loss of life, routes, river crossings, etc.