Elevate MongoDB with ODBC/JDBC
Sumit Sarkar, Principal Systems Engineer
DataDirect, A Progress Software Company
www.linkedin.com/in/meetsumit
@SAsInSumit
© 2013 Progress Software Corporation. All rights reserved.
Welcome MongoDB connectivity to the DataDirect family
Big Data/NoSQL
▪ Apache Hadoop Hive
▪ Cloudera
▪ Hortonworks
▪ Pivotal HD
▪ MapR
▪ EMR
▪ Pivotal HAWQ
▪ Cloudera Impala
▪ MongoDB
▪ Cassandra (preview)
▪ Spark SQL (preview)
▪ SAP HANA (ODBC preview)
Data Warehouses
▪ Amazon Redshift
▪ SAP Sybase IQ
▪ Teradata
▪ Pivotal Greenplum
Relational
▪ Oracle DB
▪ Microsoft SQL Server
▪ IBM DB2
▪ MySQL
▪ PostgreSQL
▪ IBM Informix
▪ SAP Sybase
▪ Pervasive SQL
▪ Progress OpenEdge
▪ Progress Rollbase
SaaS/Cloud
▪ Salesforce.com
▪ Database.com
▪ FinancialForce
▪ Veeva CRM
▪ ServiceMAX
▪ Any Force.com App
▪ Hubspot
▪ Marketo
▪ Microsoft Dynamics CRM
▪ Microsoft SQL Azure
▪ Oracle Eloqua
▪ Oracle Service Cloud
▪ Google Analytics
EDI/XML/Text
▪ EDIFACT
▪ EDIG@S
▪ EANCOM
▪ X12
▪ IATA
▪ Healthcare EDI: X12, HIPAA, ICD-10, HL7
▪ Custom EDI
▪ Flat files: CSV, TSV, dBase, Clipper, Foxpro, Paradox
▪ Text Files
Any
▪ SDK
▪ SequeLink Socket Server
▪ Customer Engineering
ODBC/JDBC applications with
MongoDB
Why connect to MongoDB with SQL access?
▪ Analytics and Data Visualization
▪ Reporting
▪ Data Virtualization / Federation
▪ Data Warehousing
Elevate your MongoDB data value with open database standards
[Diagram labels: Business Intelligence; Data Integration; ODBC/JDBC]
Why should MongoDB developers care about SQL?
▪ Your applications are awesome and the data deserves open access
▪ Expand the footprint of MongoDB where enterprise reporting requirements prevent adoption
▪ Share analytics, reporting, and integration work with your colleagues
Select MongoDB ODBC/JDBC use cases from our user base
▪ Network Security: Build reports using the Microsoft BI stack for all incoming network traffic stored in MongoDB
▪ Enhance Operational Systems: Store order details from an IBM order management system in a MongoDB repository that requires SAP reporting
▪ Visual Analytics: Use Tableau to gauge the success of marketing campaigns from data in MongoLabs, including clicks, videos, social shares, etc.
▪ Complex Analysis: Build cubes for intelligence using complex MongoDB documents storing clinical trial data
How SQL access to NoSQL works
MongoDB: an open-source NoSQL document (JSON) database
MongoDB Data Model
Sample JSON Document:
{ name: "sue", age: 26, status: "A", groups: ["news", "sports"] }
Relational database design focuses on data storage
NoSQL document database design focuses on data use
Introduced tools to perfect our intelligent schema
[Diagram labels: DataDirect ODBC/JDBC Driver; Metadata; Schema Tool; Business Intelligence; Data Integration]
MongoDB | Schema Design Comparison

NoSQL Document Design
Collection: users
{ user: {
    first: "Brody",
    last: "Messmer", ...
  },
  purchases: [
    { symbol: "PRGS", date: "2013-02-13", price: 23.50, qty: 100, ...},
    { symbol: "PRGS", date: "2012-06-12", price: 20.57, qty: 100, ...},
    ...
  ]
}
...

VS

Relational Design
Table: users
user_id  first  last     …
123456   Brody  Messmer  …
…
Table: purchases
user_id  symbol  date        price  qty  …
123456   PRGS    2013-02-13  23.50  100  …
123456   PRGS    2012-06-12  20.57  100  …
…
MongoDB | Progress DataDirect Approach – Normalize

Collection Name: stock
{ symbol: "PRGS",
  purchases: [
    {date: ISODate("2013-02-13T16:58:36Z"), price: 23.50, qty: 100},
    {date: ISODate("2012-06-12T08:00:01Z"), price: 20.57, qty: 100,
     sellDate: ISODate("2013-08-16T12:34:58Z"), sellPrice: 24.60}
  ]
}

Table Name: stock
_id  symbol
1    PRGS

Table Name: stock_purchases
stock_id  Date                 Price  qty  sellDate             sellPrice
1         2013-02-13 16:58:36  23.50  100  NULL                 NULL
1         2012-06-12 08:00:01  20.57  100  2013-08-16 12:34:58  24.60
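
The normalization on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the DataDirect driver's implementation: the `normalize` function, its parameters, and the use of plain `datetime` values in place of `ISODate` are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical stand-in for one document in the "stock" collection;
# plain datetime objects replace MongoDB's ISODate values.
stock_doc = {
    "_id": 1,
    "symbol": "PRGS",
    "purchases": [
        {"date": datetime(2013, 2, 13, 16, 58, 36), "price": 23.50, "qty": 100},
        {"date": datetime(2012, 6, 12, 8, 0, 1), "price": 20.57, "qty": 100,
         "sellDate": datetime(2013, 8, 16, 12, 34, 58), "sellPrice": 24.60},
    ],
}

# Columns of the child table, taken from the slide.
CHILD_COLUMNS = ["date", "price", "qty", "sellDate", "sellPrice"]


def normalize(doc, array_field, child_columns, fk_name):
    """Split one document into a parent row and a list of child rows.

    Top-level fields other than the nested array form the parent row.
    Each array element becomes a child row that carries the parent's
    _id as a foreign key; fields missing from an element become None,
    which plays the role of SQL NULL.
    """
    parent = {key: value for key, value in doc.items() if key != array_field}
    children = []
    for element in doc.get(array_field, []):
        row = {fk_name: doc["_id"]}
        for column in child_columns:
            row[column] = element.get(column)  # None when the field is absent
        children.append(row)
    return parent, children


stock_row, purchase_rows = normalize(
    stock_doc, "purchases", CHILD_COLUMNS, fk_name="stock_id"
)
```

Here `stock_row` corresponds to the single row of the stock table and `purchase_rows` to the two rows of stock_purchases, with the first purchase carrying NULL (None) for sellDate and sellPrice.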
Demo | MongoDB
Tableau and MongoDB ODBC
Demo with User Profile Data
Lessons Learned
Challenges we faced in building a SQL interface to NoSQL
▪ Mapping SQL to the MongoDB query language
  • Queries are written as BSON, but exposed in Mongo "drivers" as an API
  • No support for joins
  • Lack of implicit conversions for query filters (WHERE clause)
  • Sorting is not ANSI SQL compliant
▪ Non-relational schema
  • Document-oriented data model with complex data types (denormalized)
  • Self-describing schema: columns can only be discovered by selecting data
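
To make the first challenge concrete, the sketch below translates a few AND-ed SQL comparisons into a MongoDB filter document. The operators `$eq`, `$ne`, `$gt`, `$gte`, `$lt` and `$lte` are standard MongoDB query operators; the `where_to_filter` function and its tuple-based input are hypothetical and say nothing about how the DataDirect driver does the mapping internally.

```python
# Map SQL comparison operators to MongoDB query operators.
SQL_TO_MONGO = {
    "=": "$eq",
    "<>": "$ne",
    ">": "$gt",
    ">=": "$gte",
    "<": "$lt",
    "<=": "$lte",
}


def where_to_filter(predicates):
    """Translate AND-ed (column, operator, value) predicates into a filter.

    Conditions on the same column are merged into one sub-document,
    e.g. age >= 18 AND age < 65 becomes {"age": {"$gte": 18, "$lt": 65}}.
    Repeating the same operator on one column is not handled here.
    """
    query = {}
    for column, op, value in predicates:
        if op not in SQL_TO_MONGO:
            raise ValueError(f"unsupported operator: {op}")
        query.setdefault(column, {})[SQL_TO_MONGO[op]] = value
    return query


# WHERE status = 'A' AND age >= 18 AND age < 65
mongo_filter = where_to_filter(
    [("status", "=", "A"), ("age", ">=", 18), ("age", "<", 65)]
)
```

Note that the values pass through untouched: if the application supplied the string '18' instead of the number 18, the filter would compare against a string, which is where the lack of implicit conversions in query filters becomes visible.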
SQL to MongoDB: lessons learned with our users
▪ Managing logical schema across BI metadata modeling clients and servers/clusters
▪ Important to consider cross-collection joins in stress tests on BI servers
▪ Different BI tools require specific properties, such as max varchar sizes on string data types or field names that start with an underscore
▪ Hard to anticipate document structures and levels of nesting, so we support nesting of any depth
▪ Lack of enforced data types from document to document (row to row)
  • The driver attempts to convert to the SQL type defined. If conversion fails, NULL is returned.
  • MongoDB does not implicitly convert to/from strings when applying filters (WHERE clause)
    – The driver is capable of filtering client side via a hidden option
▪ Field names of different case can result in collisions from document to document
  • The driver will expose these fields as separate columns without "uppercaseIdentifiers"=true
▪ Must sample data to generate schema on read; the sample size needs to be configured via "ColumnDiscoverySampleSize"
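
Several of these lessons (schema on read from a sample, inconsistent types between documents, and case-sensitive field names) can be illustrated with a small Python sketch. It is a toy model under stated assumptions: `discover_columns`, `coerce` and the `sample_size` argument are hypothetical names, with `sample_size` standing in for the role that "ColumnDiscoverySampleSize" plays on the slide, and Python types standing in for SQL types.

```python
def discover_columns(documents, sample_size):
    """Infer column names and types from the first `sample_size` documents.

    The first type seen for a field wins; fields that only appear
    outside the sample are not discovered at all.
    """
    columns = {}
    for doc in documents[:sample_size]:
        for field, value in doc.items():
            columns.setdefault(field, type(value))
    return columns


def coerce(value, column_type):
    """Convert a value to the column type, returning None (NULL) on failure."""
    if value is None:
        return None
    try:
        return column_type(value)
    except (TypeError, ValueError):
        return None


# Hypothetical documents with no enforced structure.
docs = [
    {"name": "sue", "age": 26},
    {"name": "bob", "age": "unknown"},            # age is not numeric here
    {"Name": "amy", "age": "31", "status": "A"},  # "Name" differs in case only
]

# A sample of two documents misses both "Name" and "status".
schema = discover_columns(docs, sample_size=2)
rows = [
    [coerce(doc.get(column), column_type) for column, column_type in schema.items()]
    for doc in docs
]

# Sampling all three documents surfaces "name" and "Name" as separate columns.
full_schema = discover_columns(docs, sample_size=3)
```

With the small sample, the second row gets NULL for age because "unknown" cannot be converted, and the third row gets NULL for name because its field is spelled "Name". Raising the sample size trades discovery time for a more complete schema.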

Editor's Notes

  1. “Elevate MongoDB with ODBC/JDBC Adoption for MongoDB is growing across the enterprise and disrupting existing business intelligence, analytics and data integration infrastructure.  Join us to disrupt that disruption using ODBC and JDBC access to MongoDB for instant out-of-box integration with existing infrastructure to elevate and expand your organization’s MongoDB footprint.  We'll talk about common challenges and gotchas that shops face when exposing unstructured and semi-structured data using these established data connectivity standards.  Existing infrastructure requirements should not dictate developers’ freedom of choice in a database.”
  2. 350+ ISVs 10,000 DEUs
  3. Microstrategy, IBM Cognos, Crystal Reports, Tableau, Qlikview, SQLServer IntegrationServices, IBM DataStage Infosphere, SQLServer LinkedServer, SAS, Sqoop, Pentaho, SAP BusinessObjects, Informatica PowerCenter, SAP DataServices, Tibco Spotfire, OBIEE, Sybase ECDA, IBM Federation Server, SPSS, Syncsort DMExpress, Talend
  4. Not here to get into a SQL is better than NoSQL argument
  5. Our approach provides the most natural and flexible representation of nested, multi-value data to relational applications. We examine the data stored and normalize nested data/tuples into second normal form based on a fixed schema. 'Top-level' data becomes the parent table, and nested data become 'virtual' tables with a foreign-key relation to the parent table. The mapping is defined using the schema definition tool and is transparent to the application!
  6. http://www.mongodb.com/presentations/webinar-user-data-management-mongodb Flexible data model; indexes and query API; easy for developers; high performance and scalable; account info, activity streams, social networks