David Rice & Tom Bruce
The Modern Data Warehouse
15th January 2019
@snapanalytics
hello@snapanalytics.co.uk
Snap-analytics
Agenda
Topic
01 Introductions
02 Evolution of the Data Warehouse
03 Problems with traditional Data Warehousing
04 Why the Modern Data Platform?
05 Three components of the Modern Data Platform
06 Demo
07 Key takeaways
Introductions
Tom Bruce (Delivery Lead and Co-founder)
Extensive experience designing and delivering enterprise data warehouse and analytics solutions.
Core functional expertise in:
• Finance
• Marketing
Tom has worked with clients including:
• Jaguar Land Rover
• Deutsche Bank
• Carlsberg
David Rice, aka ‘Data Dave’ (CEO and Co-founder)
Over 15 years’ experience in data analytics including:
• Data warehouse design
• ETL (data integration)
• Data modelling
• Delivering self-service analytics
David has worked with clients including:
• ING Bank
• Barclays Capital
• Jaguar Land Rover
Evolution of the Data Warehouse
Early 1970s: ACNielsen’s ‘Data Mart’
ACNielsen provided ‘Data Marts’ to their clients to help them understand their sales better.
Mid 1970s: Bill Inmon
Bill Inmon begins to define and discuss the term ‘Data Warehouse’.
Early 1980s: MPP Databases
Teradata create the DBC/1012 database.
Goodyear Aerospace build the ‘Goodyear MPP’ supercomputer.
Late 1980s: IBM Article on Data Warehousing
In 1988 IBM published ‘An architecture for a business and information system’ and coined the term ‘business data warehouse’.
Early 1990s: Ralph Kimball
Ralph Kimball introduces the ‘Red Brick Data Warehouse’.
Mid 1990s: TDWI
‘The Data Warehousing Institute’ is founded.
1996: The Data Warehouse Toolkit
‘The Data Warehouse Toolkit’ is published by Ralph Kimball.
Early 2000s: Data Vault
Dan Linstedt introduces Data Vault modelling.
Late 2000s: ‘Big Data’ & NoSQL
Cloud Computing
Early 2010s to late 2010s
Cloud Adoption
Cloud Data Warehousing
The benefits of data warehousing in the cloud were realised as:
• Google launched a data warehouse as a service, ‘BigQuery’, in 2011
• Amazon launched Redshift in 2013
• Snowflake Inc. was publicly launched in 2014
• Microsoft launched Azure SQL Data Warehouse in 2016
DW Automation
Connectivity
Three big problems!
• The Data Warehouse
• Data Integration (ETL)
• Data Modelling
Poor outcomes
60 percent of Big Data projects will fail (Gartner, 2017)
Problems with traditional DW solutions
Initial Set Up
Performance Tuning
Ongoing Maintenance
Scalability
Data Security & Compliance
Flexibility
High Upfront Costs
Resilience
Problems with traditional ETL solutions
Time consuming
Documentation
Inconsistent
Auditability & Lineage
Performance
Inefficient
A new way of thinking
Modern data platform
Modern data platforms like Snowflake are fast
to set up and scale up. Low cost storage and
decoupled storage and compute eliminate
resource contention. Native JSON support
and ‘time travel’ features also provide great
benefits.
Combining modern data platforms, agile data modelling principles and DW automation tools delivers highly agile, highly scalable, performant solutions that can serve the needs of your data scientists and business community alike.
Data Warehouse automation
Tools like Fivetran improve consistency, and
significantly reduce development cycles.
Agile data modelling
Data Vault 2.0 enables parallel loading, supports unstructured data, and is built with change in mind.
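One commonly cited reason Data Vault 2.0 loads can run in parallel is that keys are derived by hashing business keys rather than looked up from a sequence, so different loaders need not wait on each other. The following is a minimal sketch of that idea, not the deck's own method: it assumes MD5 over trimmed, upper-cased business keys, and the function name `hash_key`, the `||` delimiter and the sample station and trip identifiers are all hypothetical.

```python
import hashlib


def hash_key(*business_keys: str, delimiter: str = "||") -> str:
    """Derive a deterministic key from one or more business keys (assumed scheme)."""
    # Normalise so that cosmetic differences in the source do not change the key
    normalised = delimiter.join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


# Two independent loaders compute the same key without a lookup
hub_station_key = hash_key("station-72")
satellite_station_key = hash_key(" Station-72 ")

# A link key combines the business keys of the things it relates
link_trip_station_key = hash_key("trip-1001", "station-72")

print(hub_station_key == satellite_station_key)
print(link_trip_station_key)
```

Because each loader can compute the key on its own, none of them has to read the hub first to find a surrogate key, which is what allows the loads to be scheduled side by side.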
Benefits of Snowflake Data Platform
• Multi Cloud Availability
• Per Second Pricing
• Performance
• Data Sharing
• Multi Use Cases
• Zero Copy Clone
• Time Travel
• Instant Elasticity
High Level Architecture - Snowflake
Benefits of Fivetran
• Streaming Support
• ELT
• Performance
• Zero-configuration
• SQL Transforms
• Rapid Dev
• Pre-built Connectors
Fivetran – Salesforce Schema
CitiBike Demo Context
• CitiBike is a bike share program in New York (similar to Boris Bikes in London)
• Users are either annual members or buy short-term passes
• There are numerous stations across the city; users collect a bike from one station and return it to another once they are finished
• CitiBike want a data warehouse that lets them analyse all of the historical trips and join them with external data to give greater insight
• We will see how a modern data platform can be created within minutes to help them achieve this goal
Demo Architecture
Sources:
• Amazon S3: Citibike Trips (CSV)
• Amazon S3: NYC Weather Data (JSON)
• Azure Blob: Station MD (JSON)
Snowflake:
• Staging: Trips, Weather, Station MD
• Transformation: Trips & Weather
• Reporting: Trips View
Direct Load
Loading in Snowflake
• Data is loaded and queried using virtual warehouses, available in sizes from XS (1 server) to 4XL (128 servers)
• Compute and storage can be completely isolated, meaning no resource contention
• Processed using massively parallel processing (MPP) compute clusters
• Able to scale up the server with no administration needed
• Bulk data loading can be done from a range of sources, including an Amazon S3 stage (used in the demo)
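The size range can be read as a doubling ladder. Here is a minimal sketch, assuming each size step doubles the number of servers starting from one at XS, so that 128 servers is reached at the eighth size; the label list and the `servers_for` helper are illustrative assumptions, not a Snowflake API.

```python
# Assumed size labels, smallest to largest; each step doubles the server count
SIZES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL"]


def servers_for(size: str) -> int:
    """Return the server count for a size label under the doubling assumption."""
    return 2 ** SIZES.index(size)


for size in SIZES:
    print(f"{size:>3}: {servers_for(size)} server(s)")
```

Scaling up therefore means moving one or more steps along this list rather than provisioning hardware.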
SNOWFLAKE DEMO
a) Bulk loading from S3 Stage
b) Scaling up the server
02 – ELT v ETL
• Modern cloud-based solutions now mean that we can utilise ELT rather than ETL:
  • Virtually unlimited storage capacity and scalable processing power
  • Ability to store semi-structured data, meaning that it can be transformed after loading
• A big advantage of ELT is that it adds extra flexibility:
  • Data can be loaded very quickly
  • Developers can then decide to transform what is necessary, and can quickly change what needs to be transformed
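The load-first, transform-later ordering can be sketched in a few lines. This is an illustration under stated assumptions rather than how Snowflake or Fivetran work internally: the `staging` list stands in for a staging table, and the record fields (`station`, `temp_c`, `obs`) and the `transform` helper are hypothetical.

```python
import json

# Extract + Load: raw payloads land in staging untouched (no parsing yet)
raw_lines = [
    '{"station": "A", "temp_c": 3.5, "obs": {"wind_kph": 12}}',
    '{"station": "B", "temp_c": -1.0, "obs": {"wind_kph": 30, "snow": true}}',
]
staging = list(raw_lines)


def transform(rows, fields):
    """Transform after loading: project only the fields needed right now."""
    out = []
    for line in rows:
        record = json.loads(line)
        out.append({name: record.get(name) for name in fields})
    return out


# First requirement: temperature per station
temps = transform(staging, ["station", "temp_c"])

# Later requirement: a different projection over the same staged data, no reload
obs = transform(staging, ["station", "obs"])

print(temps)
print(obs)
```

Because the raw data is already loaded, changing what gets transformed only means calling the transformation again with different fields.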
FIVETRAN DEMO
a) JSON source file
b) Loaded into Azure blob storage
c) Fivetran connector
d) Load
e) Transformation
03 – Semi-structured Data
• Snowflake is able to store semi-structured data (JSON, Avro, ORC & Parquet) natively, enabling ELT
• The VARIANT data type in Snowflake stores this data, with SQL extensions to query it directly
• Transforming JSON data into structured tables in Snowflake is straightforward
• Snowflake is a combination of both a Data Warehouse and a Data Lake – a ‘Data Lakehouse’
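To make the idea of turning JSON into structured rows concrete, here is a minimal Python sketch of the flattening step. It is an analogy for what a SQL transformation over semi-structured data produces, not Snowflake syntax; the document shape and field names (`city`, `time`, `main`, `weather`) and the `flatten_weather` helper are assumptions made for illustration.

```python
import json

# A hypothetical nested weather document, as it might sit in a raw column
raw = """
{"city": "New York",
 "time": 1546300800,
 "main": {"temp": 279.2, "humidity": 81},
 "weather": [{"main": "Rain"}, {"main": "Mist"}]}
"""


def flatten_weather(document: str):
    """Turn one nested weather document into flat rows, one per weather entry."""
    doc = json.loads(document)
    rows = []
    for condition in doc["weather"]:
        rows.append(
            {
                "city": doc["city"],
                "observed_at": doc["time"],
                "temp_kelvin": doc["main"]["temp"],
                "humidity": doc["main"]["humidity"],
                "condition": condition["main"],
            }
        )
    return rows


rows = flatten_weather(raw)
for row in rows:
    print(row)
```

Nested objects become columns and the array becomes multiple rows, which is the shape a reporting tool expects.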
WEATHER DATA LOAD
a) Load Weather JSON data from stage
b) View the weather data in raw form
c) Transform the JSON into structured data
04 – Zero-copy Cloning for Dev and Test
• Data often needs to be copied for purposes such as QA and test environments
• Creating copies of the data and environments takes considerable time, and there is a cost associated with storing the data twice
• Snowflake uses cloning to create copies instantly without persisting a second copy of the data; the clone simply references the original data
  • Only new or updated records get stored in the new cloned table
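The reference-only behaviour can be modelled conceptually. This is a toy sketch, not Snowflake's implementation: it assumes a table is a list of references to immutable partitions, so a clone copies the references only and new data lands in partitions that belong to the clone. The `Table` class and its methods are hypothetical names.

```python
class Table:
    """A table as a list of references to immutable partitions (tuples of rows)."""

    def __init__(self, partitions):
        self.partitions = list(partitions)

    def clone(self):
        # Copies only the list of references; no row data is duplicated
        return Table(self.partitions)

    def append_rows(self, rows):
        # New data goes into a new partition owned by this table only
        self.partitions.append(tuple(rows))

    def rows(self):
        return [row for part in self.partitions for row in part]


prod = Table([(("trip", 1), ("trip", 2)), (("trip", 3),)])
dev = prod.clone()
dev.append_rows([("trip", 99)])

# Count the partitions in the clone that are the very same objects as in prod
shared = sum(1 for p in dev.partitions if any(p is q for q in prod.partitions))
print(len(prod.rows()), len(dev.rows()), shared)
```

The original table is unaffected by writes to the clone, and only the newly written partition takes extra space.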
CLONING DEMO
05 – Time Travel
• Frequently there are issues with tables or data being accidentally deleted
• Data may be corrupted, or changes may be implemented that adversely affect the data
• Snowflake allows access to historical data (i.e. changed or deleted) at any point within a retention period of up to 90 days
• Data can be quickly backed up as of key points in the past
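Reading a table as of an earlier moment can be illustrated with a small model. This is a conceptual sketch only, assuming a table keeps timestamped snapshots and a 90-day retention window; the `VersionedTable` class, its `write` and `as_of` methods and the sample rows are hypothetical and do not reflect Snowflake's internals or syntax.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # assumed retention window


class VersionedTable:
    """Keeps timestamped snapshots so earlier states can be read back."""

    def __init__(self):
        self._versions = []  # list of (timestamp, rows), oldest first

    def write(self, rows, at):
        self._versions.append((at, tuple(rows)))

    def as_of(self, when, now):
        if now - when > RETENTION:
            raise ValueError("requested time is outside the retention window")
        candidates = [rows for ts, rows in self._versions if ts <= when]
        if not candidates:
            raise ValueError("no data existed at the requested time")
        return list(candidates[-1])


now = datetime(2019, 1, 15, 12, 0)
table = VersionedTable()
table.write(["a", "b", "c"], at=now - timedelta(days=2))
table.write([], at=now - timedelta(hours=1))  # accidental delete of all rows

# Read the table as it was one day ago, before the accidental delete
restored = table.as_of(now - timedelta(days=1), now=now)
print(restored)
```

The current state is empty, but the state from a day earlier is still readable because it falls inside the retention window.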
TIME TRAVEL DEMO
06 – Reporting Connectivity
• Snowflake connects to many different reporting tools; we’ve selected just a few below:
POWER BI DEMO
Key takeaways
Maximise the work NOT done
Build for Change
Are you future ready?

LLM powered Contract Compliance Application.pptx
 

Modern data warehouse presentation

  • 1. The Modern Data Warehouse. David Rice & Tom Bruce, 15th January 2019. @snapanalytics, hello@snapanalytics.co.uk, Snap-analytics
  • 2. Agenda: 01 Introductions; 02 Evolution of the Data Warehouse; 03 Problems with traditional Data Warehousing; 04 Why the Modern Data Platform?; 05 Three components of the Modern Data Platform; 06 Demo; 07 Key takeaways
  • 3. Introductions. Tom Bruce (Delivery Lead and Co-founder): extensive experience designing and delivering enterprise data warehouse and analytics solutions. Core functional expertise in Finance and Marketing. Tom has worked with clients including Jaguar Land Rover, Deutsche Bank and Carlsberg. David Rice, aka ‘Data Dave’ (CEO and Co-founder): over 15 years' experience in data analytics, including data warehouse design, ETL (data integration), data modelling and delivering self-service analytics. David has worked with clients including ING Bank, Barclays Capital and Jaguar Land Rover.
  • 4. Evolution of the Data Warehouse. Early 1970s, ACNielsen's ‘Data Mart’: ACNielsen provided ‘Data Marts’ to their clients in order to help them understand their sales better. Mid 1970s, Bill Inmon: Bill Inmon begins to define and discuss the term ‘Data Warehouse’.
  • 5. Evolution of the Data Warehouse. Early 1980s, MPP Databases: Teradata create the DBC/1012 database. Goodyear Aerospace build the ‘Goodyear MPP’ supercomputer. Late 1980s, IBM article on Data Warehousing: in 1988 IBM published ‘An architecture for a business information system’ and coined the term ‘business data warehouse’.
  • 6. Evolution of the Data Warehouse. Early 1990s, Ralph Kimball: Ralph Kimball introduces the ‘Red Brick Data Warehouse’. Mid 1990s, TDWI: ‘The Data Warehousing Institute’ is founded. 1996, The Data Warehouse Toolkit: ‘The Data Warehouse Toolkit’ is published by Ralph Kimball.
  • 7. Evolution of the Data Warehouse. Early 2000s, Data Vault: Dan Linstedt introduces Data Vault modelling. Late 2000s: ‘Big Data’ & NoSQL. Cloud Computing.
  • 8. Evolution of the Data Warehouse. Early 2010s, Cloud Data Warehousing: the benefits of Data Warehousing in the cloud were realised as Google launched a Data Warehouse as a service, ‘BigQuery’, in 2011; Amazon launched Redshift in 2013; Snowflake Inc. was publicly launched in 2014; and Microsoft launched Azure SQL Data Warehouse in 2016. Late 2010s: Cloud Adoption. DW Automation. Connectivity.
  • 9. Three big problems! The Data Warehouse; Data Integration (ETL); Data Modelling
  • 10. Poor outcomes: ‘60 percent of Big Data projects will fail’ (Gartner, 2017)
  • 11. Problems with traditional DW solutions: Initial Set Up; Performance Tuning; Ongoing Maintenance; Scalability; Data Security & Compliance; Flexibility; High Upfront Costs; Resilience
  • 12. Problems with traditional ETL solutions: Time consuming; Documentation; Inconsistent; Auditability & Lineage; Performance; Inefficient
  • 13. A new way of thinking. Modern data platform: modern data platforms like Snowflake are fast to set up and scale up. Low-cost storage and decoupled storage and compute eliminate resource contention. Native JSON support and ‘time travel’ features also provide great benefits. Data Warehouse automation: tools like Fivetran improve consistency and significantly reduce development cycles. Agile data modelling: Data Vault 2.0 enables parallel loading, supports unstructured data, and is built with change in mind. Combining modern data platforms, agile data modelling principles and DW automation tools delivers highly agile, highly scalable, performant solutions. This can serve the needs of your data scientists and business community alike.
  • 15. High Level Architecture - Snowflake
  • 17. Fivetran – Salesforce Schema
  • 18. CitiBike Demo Context
      • CitiBike is a bike share program in New York (similar to Boris Bikes in London)
      • Users are either annual members or buy short-term passes
      • There are numerous stations across the city; users collect a bike from one station and return it to another once they are finished
      • CitiBike want to have a data warehouse to allow them to analyse all of the historical trips and join this with external data to give greater insight
      • We will see how a modern data platform can be created within minutes to help them achieve this goal
  • 19. Demo Architecture. Sources: Amazon S3, Citibike Trips (CSV); Amazon S3, NYC Weather Data (JSON); Azure Blob, Station MD (JSON). Snowflake layers: Staging (Trips, Weather, Station MD); Transformation (Trips & Weather); Reporting (Trips View). Direct Load.
  • 20. Loading in Snowflake
      • Data is loaded and queried using virtual warehouses, available in sizes from XS (1 server) to XXXL (128 servers)
      • Compute and storage can be completely isolated, meaning no resource contention
      • Processed using massively parallel processing (MPP) compute clusters
      • Able to scale up the server with no administration needed
      • Bulk data loading can be done from the following sources:
  • 21. SNOWFLAKE DEMO a) Bulk loading from S3 Stage b) Scaling up the server
  • 22. 02 – ELT v ETL
      • Modern cloud-based solutions now mean that we can utilise ELT rather than ETL:
          - Endless storage capabilities and scalable processing power
          - Ability to store semi-structured data, meaning that it can be transformed after loading
      • The big advantage of ELT is that it adds extra flexibility:
          - Data can be loaded very quickly
          - Developers can then decide to transform what is necessary, and can quickly change what needs to be transformed
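The load-then-transform order described on this slide can be sketched in a few lines. This is a minimal illustration only: it uses Python's built-in `sqlite3` as a stand-in for a cloud warehouse, and the table names, column names and sample rows are hypothetical, not taken from the demo.

```python
import sqlite3

# Hypothetical raw trip rows, as they might arrive from a CSV extract.
RAW_TRIPS = [
    ("2019-01-01 08:00:00", "2019-01-01 08:15:00", "72", "member"),
    ("2019-01-01 09:30:00", "2019-01-01 10:10:00", "79", "casual"),
    ("2019-01-02 17:05:00", "2019-01-02 17:20:00", "72", "member"),
]


def load_raw(conn):
    """Load step: land the data untouched, with every column kept as text."""
    conn.execute(
        "CREATE TABLE staging_trips "
        "(started TEXT, ended TEXT, station TEXT, user_type TEXT)"
    )
    conn.executemany("INSERT INTO staging_trips VALUES (?, ?, ?, ?)", RAW_TRIPS)


def transform(conn):
    """Transform step: runs after loading, inside the database, and can be re-run."""
    conn.execute("DROP TABLE IF EXISTS trips")
    conn.execute(
        """
        CREATE TABLE trips AS
        SELECT date(started) AS trip_date,
               CAST(station AS INTEGER) AS station_id,
               user_type,
               CAST(ROUND((julianday(ended) - julianday(started)) * 1440)
                    AS INTEGER) AS minutes
        FROM staging_trips
        """
    )


def trips_per_day(conn):
    """Report on the transformed table: trip count and total minutes per day."""
    return conn.execute(
        "SELECT trip_date, COUNT(*), SUM(minutes) "
        "FROM trips GROUP BY trip_date ORDER BY trip_date"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_raw(connection)
    transform(connection)
    for row in trips_per_day(connection):
        print(row)
```

Because the raw rows stay in the staging table, changing what gets transformed only means editing and re-running `transform`; nothing has to be extracted again.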
  • 23. FIVETRAN DEMO a) JSON source file b) Loaded into Azure blob storage c) Fivetran connector d) Load e) Transformation
  • 24. 03 – Semi-structured Data
      • Snowflake is able to store semi-structured data (JSON, Avro, ORC & Parquet) natively, enabling ELT
      • The Variant data type in Snowflake stores this data, with SQL extensions to query it directly
      • Transforming JSON data into structured tables in Snowflake is extremely simple
      • Snowflake is a combination of both a Data Warehouse and a Data Lake – a ‘Data Lakehouse’
  • 25. WEATHER DATA LOAD a) Load Weather JSON data from stage b) View the weather data in raw form c) Transform the JSON into structured data
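As a rough illustration of step (c), the sketch below flattens raw JSON documents into structured rows using dotted paths. It is plain Python rather than Snowflake SQL, and the JSON field names, the `COLUMNS` mapping and the `get_path` helper are all hypothetical; the real weather feed may be shaped differently.

```python
import json

# Hypothetical raw weather documents, stored exactly as they were loaded.
RAW_WEATHER = [
    '{"time": "2019-01-01T08:00:00", "city": {"name": "New York"},'
    ' "main": {"temp": 275.4}, "weather": [{"main": "Rain"}]}',
    '{"time": "2019-01-01T09:00:00", "city": {"name": "New York"},'
    ' "main": {"temp": 276.1}, "weather": [{"main": "Clouds"}, {"main": "Mist"}]}',
]

# Output column name -> dotted path into the JSON document (assumed layout).
COLUMNS = {
    "observed_at": "time",
    "city": "city.name",
    "temp_kelvin": "main.temp",
    "conditions": "weather.0.main",
}


def get_path(doc, path, default=None):
    """Walk a dotted path such as 'main.temp' or 'weather.0.main'."""
    current = doc
    for key in path.split("."):
        if isinstance(current, list):
            # List elements are addressed by a numeric path segment.
            if not key.isdigit() or int(key) >= len(current):
                return default
            current = current[int(key)]
        elif isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return default
    return current


def to_rows(raw_docs):
    """Turn each raw JSON string into one flat, structured row."""
    rows = []
    for raw in raw_docs:
        doc = json.loads(raw)
        rows.append({column: get_path(doc, path) for column, path in COLUMNS.items()})
    return rows


if __name__ == "__main__":
    for row in to_rows(RAW_WEATHER):
        print(row)
```

Missing fields come back as `None` instead of raising, which suits semi-structured data where not every document carries every attribute.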
  • 26. 04 – Zero-copy Cloning for Dev and Test
      • Data often needs to be copied for purposes such as QA and test environments
      • Creating copies of the data and environments takes considerable time, and there is a cost associated with storing the data twice
      • Snowflake uses cloning to create copies instantly; a clone does not persist a second copy of the data but simply references the original data
          - Only new or updated records get stored in the new cloned table
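The idea that a clone references the original data and stores only what changes can be shown with a toy copy-on-write model. This is a sketch under stated assumptions, not Snowflake's implementation; the `Table` class, its block ids and the sample rows are invented for illustration.

```python
class Table:
    """Toy table: a mapping from block id to an immutable tuple of rows."""

    def __init__(self, blocks=None):
        # Blocks are never edited in place; a write replaces a whole block.
        self._blocks = dict(blocks or {})

    def clone(self):
        # Copies only the small id -> block mapping. The row data is shared
        # by reference, so no second copy of the rows is stored.
        return Table(self._blocks)

    def write_block(self, block_id, rows):
        # New or updated data gets its own block, owned by this table only.
        self._blocks[block_id] = tuple(rows)

    def rows(self):
        return [row for block_id in sorted(self._blocks) for row in self._blocks[block_id]]

    def shared_blocks(self, other):
        """Count blocks that both tables reference as the very same object."""
        return sum(
            1
            for block_id, block in self._blocks.items()
            if other._blocks.get(block_id) is block
        )


if __name__ == "__main__":
    prod = Table()
    prod.write_block(1, [("trip", 1)])
    prod.write_block(2, [("trip", 2)])

    dev = prod.clone()                           # instant: no rows are copied
    dev.write_block(2, [("trip", 2, "fixed")])   # only the changed block is new

    print(prod.rows())
    print(dev.rows())
    print(dev.shared_blocks(prod))
```

Writes to the clone leave the original untouched, which is what makes a clone usable as a dev or test environment.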
  • 28. 05 – Time Travel
      • Tables or data are frequently deleted by accident
      • Data may be corrupted, or changes may be implemented that adversely affect the data
      • Snowflake allows access to historical data (i.e. data that has been changed or deleted) at any point within a retention period of up to 90 days
      • Data can be quickly restored from key times in the past
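Querying a table as it stood at an earlier moment can be sketched with a toy versioned table. This is an illustrative model only, with a hypothetical `VersionedTable` class; it keeps a full snapshot per commit, which a real platform would not do, and it takes the current time as an argument so the example stays deterministic.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


class VersionedTable:
    """Toy table that remembers its contents at every commit."""

    def __init__(self, retention=timedelta(days=90)):
        self.retention = retention
        self._times = []      # commit times, in ascending order
        self._snapshots = []  # table contents as of each commit

    def commit(self, rows, at):
        if self._times and at < self._times[-1]:
            raise ValueError("commits must be recorded in time order")
        self._times.append(at)
        self._snapshots.append(tuple(rows))

    def as_of(self, when, now):
        """Return the rows as they stood at `when`, if still retained."""
        if now - when > self.retention:
            raise LookupError("requested time is outside the retention period")
        # Find the last commit made at or before the requested time.
        index = bisect_right(self._times, when) - 1
        if index < 0:
            raise LookupError("table did not exist at the requested time")
        return list(self._snapshots[index])


if __name__ == "__main__":
    start = datetime(2019, 1, 1)
    trips = VersionedTable()
    trips.commit([1, 2, 3], at=start)
    trips.commit([], at=start + timedelta(days=1))  # accidental delete

    now = start + timedelta(days=2)
    print(trips.as_of(start + timedelta(hours=12), now))  # state before the delete
    print(trips.as_of(now, now))                          # state after the delete
```

Recovering from the accidental delete is then a matter of reading the earlier state and committing it back as the current one.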
  • 30. 06 – Reporting Connectivity
      • Snowflake connects to many different reporting tools; we've just selected a few below:
  • 32. Key takeaways: Maximise the work NOT done; Build for Change; Are you future ready?

Editor's Notes

  1. Modern Data Platform: start small and scale up quickly, minimising risk. Support for JSON and data science use cases at massive scale. Data Warehouse Automation. Note that there are other solutions that solve some of these problems too: SQL Data Warehouse, as presented by Kamil, and Databricks (based on Apache Spark), as presented by Niall. We recommend exploring multiple options, taking a fact-based view and implementing what works for you and your organisation.