Why Data Virtualization? An Introduction by Denodo
- 1. What is Data Virtualization and Why It Matters to You. Alberto Pan, CTO; Justo Hidalgo, VP Product Management & Consulting. Denodo Technologies
- 3. Contents: Why Data Virtualization? Productivity; Distributed Query Optimization; Layer Independence; Governance; Data Quality; Architecture
- 6. Disjoint Views of Entities – the Elements. Customer data is spread over different, heterogeneous data sources, and locating and obtaining it takes too much effort. Data must be not only extracted but also combined across different applications, interfaces and formats. Sources: log files (.txt/.log files), CRM (MySQL), billing system (REST web service), incidences system (web application), inventory system (MS SQL Server), product catalog (SOAP web service), knowledge base (Internet), product data (CSV).
- 8. Happy Ending: Single View of Element – Virtual Integration. Homogeneous access to all data through JDBC, ODBC, WS, CSV, XML, web and flat-file interfaces. Sources: CRM (MySQL), billing system (REST web service), incidences system (web application), inventory system (MS SQL Server), product catalog (SOAP web service), knowledge base, product data (CSV), log files (.txt/.log files).
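The single view on the slide above can be illustrated with a minimal sketch. It assumes three tiny in-memory stand-ins: an SQLite table for the CRM, a JSON payload such as a REST billing service might return, and a CSV string for product data. All table names, field names and values are hypothetical, and this is not Denodo's API.

```python
import csv
import io
import json
import sqlite3

# Hypothetical stand-ins for three heterogeneous sources.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])

billing_json = '[{"customer_id": 1, "balance": 120.5}, {"customer_id": 2, "balance": 0.0}]'
product_csv = "customer_id,product\n1,Router\n1,Modem\n2,Phone\n"


def customer_view(customer_id):
    """Return one combined record, hiding where each field comes from."""
    name = crm.execute(
        "SELECT name FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()[0]
    balance = next(
        row["balance"]
        for row in json.loads(billing_json)
        if row["customer_id"] == customer_id
    )
    products = [
        row["product"]
        for row in csv.DictReader(io.StringIO(product_csv))
        if int(row["customer_id"]) == customer_id
    ]
    return {"id": customer_id, "name": name, "balance": balance, "products": products}
```

A consumer calls `customer_view(1)` and never needs to know that three different formats were involved.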
- 13. Why a Data Virtualization Layer? Productivity; Distributed Query Optimization; Physical and Logical Independence; Governance; Data Quality
- 15. Productivity… Built-in connectors for data sources. Complex data combination operations do not need to be programmed. Consumers: applications and 3rd-party tools (enterprise applications, BI, portals, dashboards, web applications…). [Figure: source views with columns (NAME, DESCRIPTION, PRICE) and (NAME, MANUFACTURER, SCORE) are combined by union and join into one view with columns (NAME, DESCRIPTION, PRICE, MANUFACTURER, SCORE).]
- 16. Productivity… Applications do not need to deal with complex data-related issues, e.g. swapping of large result sets or caching of costly result sets. Changes in the sources are managed in the DV layer, leaving the business layer unaffected. Collaboration and prototyping: virtualization allows rapid prototyping and testing.
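As an illustration of "caching of costly result sets", here is a minimal sketch of a time-to-live cache placed in front of an expensive fetch. The class name `CachedView` and its parameters are hypothetical; a real DV layer would offer far richer cache policies.

```python
import time


class CachedView:
    """Wrap a costly fetch function and reuse its result for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._value = None
        self._expires_at = None  # None means nothing is cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache miss or expired entry: hit the source and remember the result.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

The application only calls `get()`; whether the answer came from the source or from the cache stays a concern of the layer.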
- 17. … Productivity. Uniform access: developers use a single model and API instead of learning a mixture of different APIs. Learning and execution curves are lower for every additional project built on top of the DV layer. Multi-access: a Data Virtualization layer can offer the most appropriate access type for each application (JDBC, web service, SharePoint widget…).
- 19. Distributed Query Optimization… Multiple execution strategies are available. The performance of a distributed join query may vary enormously depending on the method used, e.g. hash join, merge join, nested loop join… Even if the join is among the same data views, the optimal method may differ from query to query.
- 20. The final executable plan depends on characteristics such as strategies, sources and order. [Figure: one logical plan over BOOK REVIEW, BOOKSTORE A and BOOKSTORE B, and candidate physical plans that combine BOOK REVIEW 1, BOOK REVIEW 2, BOOKSTORE A and BOOKSTORE B using hash join or nested loop (NL) join.]
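To make the difference between two of the named strategies concrete, the following sketch implements a nested loop join and a hash join over lists of dictionaries. Both return the same rows; they differ in how much work they do. The sample data is hypothetical, and this is a textbook illustration rather than the optimizer's actual implementation.

```python
def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[key] == r[key]
    ]


def hash_join(left, right, key):
    """Build a hash table on the right input, then probe it once per left row."""
    buckets = {}
    for r in right:
        buckets.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in buckets.get(l[key], [])]


# Hypothetical sample rows.
books = [{"isbn": "111", "title": "Dune"}, {"isbn": "222", "title": "Emma"}]
reviews = [{"isbn": "111", "score": 5}, {"isbn": "111", "score": 4}]
```

A nested loop join can be the better choice when one input is tiny or the source can answer keyed lookups cheaply; a hash join tends to win when both inputs are large and fit in memory.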
- 21. … Distributed Query Optimization. Source query limitations. Push processing to data sources, e.g. delegate a join into the data source. Materialization: pre-load frequently used data and exploit temporal locality.
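The idea of delegating a join to the data source can be sketched with SQLite standing in for a single source that holds both tables. The schema and data are hypothetical. Both functions return the same rows, but the second one lets the source do the join so that only matching rows are transferred.

```python
import sqlite3

# Hypothetical source holding both tables.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE review (isbn TEXT, score INTEGER);
    INSERT INTO book VALUES ('111', 'Dune'), ('222', 'Emma');
    INSERT INTO review VALUES ('111', 5), ('111', 4);
""")


def join_in_layer():
    """Fetch both tables in full, then join them outside the source."""
    books = source.execute("SELECT isbn, title FROM book").fetchall()
    reviews = source.execute("SELECT isbn, score FROM review").fetchall()
    return sorted(
        (title, score)
        for (book_isbn, title) in books
        for (review_isbn, score) in reviews
        if book_isbn == review_isbn
    )


def join_pushed_down():
    """Delegate the join to the source; only matching rows are returned."""
    rows = source.execute(
        "SELECT b.title, r.score FROM book b JOIN review r ON r.isbn = b.isbn"
    ).fetchall()
    return sorted(rows)
```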
- 23. Physical and Logical Independence… Applications are independent of changes in data source location, implementation (e.g. from a legacy to a new system) and schema. Examples: a mainframe is replaced by a new system; customer data now comes from two systems instead of one due to a merger or acquisition; two applications are reengineered into a single one; the data schema of a data source changes.
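The merger case above can be sketched as follows: consumers keep querying one stable schema while the mapping behind it grows from one source to two. The source functions, field names and records are hypothetical.

```python
def legacy_customers():
    """Original system (hypothetical schema)."""
    return [{"cust_no": 7, "full_name": "Jane Doe"}]


def acquired_customers():
    """System inherited through an acquisition (hypothetical schema)."""
    return [{"id": "A-3", "first": "John", "last": "Roe"}]


def customer_view():
    """Stable schema (id, name) for consumers; the mappings absorb source changes."""
    rows = [
        {"id": str(r["cust_no"]), "name": r["full_name"]}
        for r in legacy_customers()
    ]
    rows += [
        {"id": r["id"], "name": f'{r["first"]} {r["last"]}'}
        for r in acquired_customers()
    ]
    return rows
```

Adding the second source changed only `customer_view`; code that reads `id` and `name` is untouched.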
- 24. … Physical and Logical Independence… Let each tool do its business! An ESB is good at orchestrating business services. Data Virtualization is good at accessing information repositories, homogenizing them and turning them into services.
- 25. … Physical and Logical Independence. Changes need to be made in a single place. E.g. the way to determine whether a customer is ‘VIP’ changes: many applications use this data field, and in some applications (e.g. BRMS systems) the field can be used many times.
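A minimal sketch of the ‘VIP’ example: the rule lives in one function and is exposed as a derived field, so a change to the rule is made once. The spending threshold and field names are made-up assumptions.

```python
def is_vip(customer):
    """Single, shared definition of 'VIP' (the threshold is a made-up example)."""
    return customer["yearly_spend"] >= 10_000


def customer_view(raw_customers):
    """Expose customers with a derived 'vip' field so consumers never recompute it."""
    return [{**c, "vip": is_vip(c)} for c in raw_customers]
```

If the business later redefines VIP, only `is_vip` changes; every application reading the `vip` field picks up the new rule.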
- 27. Governance… Single entry point for data auditing: track data and metadata changes. E.g. which user was the last one to modify a certain view? Single point to introspect and query metadata. E.g. what is the schema provided by any data source?
- 28. … Governance… Change impact management. Single point to answer questions like: What are the consequences of a change in a data source? Where does the data used by applications come from? What transformations are applied to source data before it is consumed by applications?
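The first two questions on the slide above amount to walking a dependency graph in both directions. The sketch below assumes the layer's metadata can be reduced to a mapping from each view to its inputs; the view and source names are hypothetical.

```python
# Each key is derived from the items listed as its value (hypothetical names).
DEPENDS_ON = {
    "customer_view": ["crm.customer", "billing.invoice"],
    "vip_report": ["customer_view"],
    "inventory_view": ["inventory.stock"],
}


def impacted_by(changed):
    """Return every view that directly or indirectly depends on `changed`."""
    impacted = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for view, inputs in DEPENDS_ON.items():
            if current in inputs and view not in impacted:
                impacted.add(view)
                frontier.append(view)
    return impacted


def lineage_of(view):
    """Return every source or view that `view` is ultimately built from."""
    found = set()
    frontier = list(DEPENDS_ON.get(view, []))
    while frontier:
        current = frontier.pop()
        if current not in found:
            found.add(current)
            frontier.extend(DEPENDS_ON.get(current, []))
    return found
```

`impacted_by` answers "what are the consequences of a change in a data source?", and `lineage_of` answers "where does the data used by applications come from?".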
- 29. … Governance. Single entry point for data monitoring: track data source and data service usage. E.g. how does the number of concurrent connections to a data source evolve throughout the day? Send me an e-mail alert if at least 10% of the last 100 queries to a data source failed. Security: provide authentication and authorization mechanisms for data access; provide data encryption functionality. Protect data sources: limit concurrent queries to a certain data source; cache all or part of the data; limit data replication needs at the data source level.
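The alert rule quoted above ("at least 10% of the last 100 queries to a data source failed") can be sketched with a fixed-size sliding window. The class name is hypothetical, and sending the e-mail itself is left out.

```python
from collections import deque


class FailureRateMonitor:
    """Track the outcome of the most recent queries to one data source."""

    def __init__(self, window=100, threshold=0.10):
        # deque with maxlen drops the oldest outcome automatically.
        self._outcomes = deque(maxlen=window)  # True means the query failed
        self._threshold = threshold

    def record(self, failed):
        self._outcomes.append(failed)

    def should_alert(self):
        """True once the window is full and the failure share reaches the threshold."""
        if len(self._outcomes) < self._outcomes.maxlen:
            return False
        return sum(self._outcomes) / len(self._outcomes) >= self._threshold
```

Waiting for a full window before alerting is a design choice made here to avoid alarms on the first few queries; a different policy would be equally valid.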
- 31. Data Quality. Many data quality actions can be applied at this layer, which avoids duplicating them in every data source or application.
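As a small illustration, the sketch below applies a few cleansing rules once, in the layer, so that every consumer sees the same cleaned values. The specific rules and field names are assumptions chosen for the example.

```python
def clean_customer(row):
    """Apply shared rules: collapse whitespace, lowercase e-mail, default the country."""
    return {
        "name": " ".join(row["name"].split()),
        "email": row["email"].strip().lower(),
        "country": (row.get("country") or "UNKNOWN").strip().upper(),
    }
```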
- 32. … AND WHAT CAN WE DO WITH THESE PIECES?
- 35. Denodo Platform 4.6 – Virtualized Data Services in Less Time. Improved connectivity with the enterprise ecosystem: sources connectivity, middleware and DQ tools, publish level. Improved productivity and ease of use for the application developer (connectivity, web integration etc.) and the data management professional (metadata, governance etc.). Benefits to business: rapid access to real-time data from disparate sources for agile reporting and operational BI / dashboards, and for customer service operations and customer portals. Web integration becomes “mainstream”.
- 37. … but you can get very far with Data Virtualization!
Editor's Notes
- http://www.flickr.com/photos/maxbraun/98688824/
- http://dutchamericantranslations.wordpress.com/2010/01/04/matters-of-taste-acronym-or-initialism/
- http://www.flickr.com/photos/glenirah/4376553184/
- http://www.flickr.com/photos/adikos/4443291195/
- Collaboration: self-documenting model, but also actionable. Rapid prototyping platform.
- http://www.flickr.com/photos/laserstars/908946494/
- http://www.flickr.com/photos/tudor/458287668/
- http://www.flickr.com/photos/totalaldo/508664515/
- http://www.flickr.com/photos/heist_mine/4256417595/
- http://www.flickr.com/photos/oskay/2157682522/
- http://www.flickr.com/photos/stevendepolo/3703145222/
- http://www.flickr.com/photos/m-nicolson/2414298534/
- http://www.flickr.com/photos/psd/2086641/