Hands-On Lab: Cloud Data Warehouse
- 2. © 2018 Snowflake Computing Inc. All Rights Reserved
AGENDA
1. Architecture
2. Loading Data
3. Querying Data
4. Zero-Copy Cloning
5. Time Travel
6. Availability and Security
7. Sharing Data with External Parties
Do you have your account and the SQL for the workshop?
Get your free Snowflake trial account: https://trial.snowflake.com
Download the SQL from this folder: https://bit.ly/2RGWQF4
SNOWFLAKE ARCHITECTURE:
Storage and Compute
• Storage separated from compute: grows automatically, with unlimited storage
• Change compute power instantly: scale up or down to match business needs, and shut down when not in use
• Different workloads do not compete with each other: ETL, reporting, data science, and other applications all process data at the same time without performance degradation
Services layer: Management, Optimization, Security, Availability, Transactions, Metadata
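The separation of storage and compute described on this slide can be sketched as a toy Python model, assuming that each virtual warehouse is an independent object reading one shared copy of the data. The class names (`SharedStorage`, `Warehouse`), the size list, and the rule that each size step doubles compute are illustrative assumptions, not Snowflake's API.

```python
from dataclasses import dataclass, field

# Illustrative size ladder (assumption): each step doubles compute.
SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large"]


@dataclass
class SharedStorage:
    """A single copy of the data that every warehouse reads."""

    tables: dict[str, list[dict]] = field(default_factory=dict)


@dataclass
class Warehouse:
    """A hypothetical compute cluster; it holds no data of its own."""

    name: str
    storage: SharedStorage
    size: str = "X-Small"
    running: bool = False

    def resize(self, size: str) -> None:
        """Change compute power; the stored data is not touched."""
        if size not in SIZES:
            raise ValueError(f"unknown size: {size!r}")
        self.size = size

    def compute_units(self) -> int:
        """Compute in use: doubles per size step (assumption), 0 when suspended."""
        return 2 ** SIZES.index(self.size) if self.running else 0

    def count_rows(self, table: str) -> int:
        """Run a trivial query, resuming the warehouse on demand."""
        self.running = True
        return len(self.storage.tables.get(table, []))

    def suspend(self) -> None:
        """Stop using compute; data in shared storage is unaffected."""
        self.running = False


if __name__ == "__main__":
    storage = SharedStorage({"sales": [{"id": 1}, {"id": 2}]})
    etl = Warehouse("etl", storage, size="Large")
    bi = Warehouse("bi", storage)
    # Both warehouses see the same single copy of the data.
    print(etl.count_rows("sales"), bi.count_rows("sales"))
    bi.suspend()
    print(etl.compute_units(), bi.compute_units())
```

Because both warehouses read the same `SharedStorage`, resizing or suspending one leaves the data and the other warehouse untouched, which mirrors the slide's point that workloads do not compete.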
SNOWFLAKE ARCHITECTURE:
Global Services
• Centralized management
• Metadata separated from storage and compute
• As soon as data is ready, it is available to all clusters
• Full transactional consistency (ACID)
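A minimal sketch of a metadata layer kept apart from storage and compute, assuming a single in-process service: the hypothetical `MetadataService` records which files make up each table, and a commit either publishes the whole new file list or nothing. Any number of "clusters" calling `snapshot` see the same latest committed version. This lock-and-swap toy only illustrates atomic visibility; it is not a description of Snowflake's implementation.

```python
import threading


class MetadataService:
    """Hypothetical metadata layer: tracks which files make up each table."""

    def __init__(self) -> None:
        self._versions: dict[str, tuple[str, ...]] = {}
        self._lock = threading.Lock()

    def commit(self, table: str, new_files: list[str]) -> None:
        """Publish new files for a table: all of them become visible, or none."""
        files = tuple(new_files)
        # Validate everything before touching shared state.
        if any(not isinstance(f, str) or not f for f in files):
            raise ValueError("file names must be non-empty strings")
        with self._lock:
            current = self._versions.get(table, ())
            # A single assignment swaps in the new version.
            self._versions[table] = current + files

    def snapshot(self, table: str) -> tuple[str, ...]:
        """Latest committed file list; every cluster sees the same answer."""
        with self._lock:
            return self._versions.get(table, ())


if __name__ == "__main__":
    meta = MetadataService()
    meta.commit("orders", ["part-001", "part-002"])
    try:
        meta.commit("orders", ["part-003", ""])  # rejected as a whole
    except ValueError:
        pass
    # Two "clusters" reading the catalog get the same committed version.
    cluster_a = meta.snapshot("orders")
    cluster_b = meta.snapshot("orders")
    print(cluster_a == cluster_b, cluster_a)
```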
A Closer Look
[Figure: architecture diagram. Clients connect through JDBC/ODBC to the Cloud services layer (Authentication & Access Control, Infrastructure manager, Optimizer, Metadata manager, Security); below it, virtual warehouses, each with its own cache; at the bottom, database storage in a VPC/VNet.]
Metadata
• Information about the data stored in Snowflake
• Stored and managed in the Services layer
• Kept in a scalable, fast-access, fault-tolerant proprietary format
Data Storage
• Data stored in Snowflake databases and tables
• Kept in cloud storage managed by Snowflake
• Optimized proprietary format
• Automatically compressed and encrypted
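To illustrate data and metadata living apart, the following sketch splits rows into compressed chunks and records per-chunk statistics in a separate list. JSON plus `zlib` stands in for Snowflake's proprietary format, encryption is left out entirely, and `write_partitions`/`read_partition` are hypothetical helpers.

```python
import json
import zlib


def write_partitions(
    rows: list[dict], key: str, rows_per_part: int = 3
) -> tuple[list[bytes], list[dict]]:
    """Split rows into compressed chunks; return (blobs, metadata).

    Hypothetical sketch: JSON + zlib replace the real storage format,
    and no encryption is applied.
    """
    if rows_per_part <= 0:
        raise ValueError("rows_per_part must be positive")
    blobs: list[bytes] = []
    metadata: list[dict] = []
    for start in range(0, len(rows), rows_per_part):
        chunk = rows[start:start + rows_per_part]
        blobs.append(zlib.compress(json.dumps(chunk).encode("utf-8")))
        values = [r[key] for r in chunk]
        # Statistics are kept outside the compressed data itself.
        metadata.append({
            "part": len(blobs) - 1,
            "rows": len(chunk),
            "min": min(values),
            "max": max(values),
            "bytes": len(blobs[-1]),
        })
    return blobs, metadata


def read_partition(blobs: list[bytes], part: int) -> list[dict]:
    """Decompress one chunk back into rows."""
    return json.loads(zlib.decompress(blobs[part]).decode("utf-8"))


if __name__ == "__main__":
    rows = [{"id": i, "amount": i * 10} for i in range(7)]
    blobs, metadata = write_partitions(rows, "id")
    # Total row count comes from metadata alone, with no decompression.
    print(sum(m["rows"] for m in metadata))
    print(read_partition(blobs, 0))
```

In this toy model, questions such as the total row count can be answered from `metadata` without decompressing any chunk.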
Hands-On: Initial Setup
1. Log in to your Snowflake account
2. Open the SQL script in Snowflake
3. Run the SQL up to line 76
ENABLING KEY USE CASES
• Datamart & data silo consolidation: consolidate legacy datamarts to eliminate silos and support new projects
• Integrated data analytics: directly load structured + semi-structured data into Snowflake for reporting & analytics
• Exploratory & ad hoc analytics: direct access to data for SQL analysts to explore data, identify correlations, and build & test models
OPERATIONAL DATA LAKE
Scenario and Pain Points
• Data volumes growing at 2TB per hour
• Need to provide concurrency for 43,000 internal and external users
• Data warehouse powers Nielsen Marketing Cloud and must be always on
• Existing Netezza is not able to perform difficult queries while loading data
• Cost to maintain hardware and software becoming prohibitive
• Hadoop is not a real database; need ANSI SQL
Snowflake Value
• Grow storage independently of compute (500TB to 2PB)
• Separate warehouses for workloads: zero contention
• Manage data in one place without data marts
[Figure: before, Source → ODS → Stage → multiple data marts; after, Source → Snowflake with in-database transformations, where users, tools, and the website query the data directly]
BI AND ANALYTICS
Scenario and Pain Points
• Market research firm
• Incorporate larger sets of modern data
• ETL with Informatica
• Need performance with scalability
• Risk-averse to data loss
Snowflake Value
• Scalability with performance
• Ability to scale users separately
• Cost savings compared to the legacy solution
[Figure: before, Oracle Exadata (Profiles, Ratings, Loses); after, Snowflake serving separate user groups: Finance, Marketing, Data scientists]
LOCALYTICS
Scenario and Pain Points
• 25 trillion mobile events
• Exposed to customers as an interactive data cube
• Hitting the limits of a legacy architecture (Vertica on AWS)
• Can't support deletes or out-of-sequence updates
Snowflake Value
• Only platform able to handle this scale
• A fraction of the cost, by not having to pay for peak capacity
• Snowflake's zero-management service is cheaper and better
REAL-WORLD USE CASE
• Continuous loading (4TB/day) from S3 with a <5 min SLA: Medium virtual warehouse
• ETL & maintenance: Large virtual warehouse
• Reporting (segmented): 2X-Large virtual warehouse
• Interactive dashboard: Auto Scale virtual warehouse, X-Large x 5; 50% of queries < 1s, 85% < 2s, 95% < 5s
• Prod DB: 4 trillion rows, 3+ petabytes of raw data, 8x compression ratio, 25M micro-partitions
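The dashboard latency targets on this slide (50% < 1s, 85% < 2s, 95% < 5s) can be checked against a sample of query times with a short sketch. The function names and the sample latencies are hypothetical. For scale, the slide's 8x compression ratio means 3 PB of raw data occupies roughly 3 / 8 = 0.375 PB (about 375 TB) of storage.

```python
# (fraction of queries, latency bound in seconds), taken from the slide.
TARGETS = [(0.50, 1.0), (0.85, 2.0), (0.95, 5.0)]


def fraction_under(latencies: list[float], bound: float) -> float:
    """Return the fraction of latencies strictly below `bound`."""
    if not latencies:
        raise ValueError("need at least one latency sample")
    return sum(1 for t in latencies if t < bound) / len(latencies)


def meets_sla(
    latencies: list[float], targets: list[tuple[float, float]] = TARGETS
) -> dict[float, bool]:
    """Map each latency bound to whether its target fraction is met."""
    return {
        bound: fraction_under(latencies, bound) >= share
        for share, bound in targets
    }


if __name__ == "__main__":
    # Hypothetical sample of 20 dashboard query latencies (seconds).
    sample = [0.4, 0.6, 0.7, 0.8, 0.9, 0.5, 0.3, 0.95, 0.85, 0.6,
              1.2, 1.5, 1.8, 1.1, 1.9, 1.4, 1.7, 3.0, 4.2, 6.5]
    print(meets_sla(sample))
```

Each target is checked independently, so a report can show which percentile band is being missed rather than a single pass/fail.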
MODERNIZING DATA WAREHOUSE
Scenario and Pain Points
• Entire company on GCP, with business units using Snowflake
• 30-node Hadoop cluster and 46-node Vertica cluster on premises
• Extremely security focused, with PCI requirements
• Using Looker and Spark
• Teams wanted autonomy but didn't have the resources to support it
• Vertica requires constant maintenance and suffers from outages
• Performance suffers during periods of intense reporting
• Moving to BigQuery would create a new set of problems; Redshift had the same problems
• Getting data from partners is painful and slow
Snowflake Value
• Map their current data model easily and quickly with the same SQL
• SaaS service with zero downtime
• Separate warehouses for each team: zero contention and full independence on the same data
[Figure: before, Source → Hadoop → Vertica → multiple datamarts (REF/TXN data); after, Source → Snowflake with in-database transformations, where users, tools, and the website query the data directly]
DATA ANALYTICS AT EXTREME SCALE
Scenario and Pain Points
• Financial institution with a huge focus on security
• Overburdened staff
• Business needs to run monthly reports that span 10 years of historical data
• No way to analyze semi-structured data
• Quoted $500,000 to replace their existing hardware appliance
• 20+ hours to run reports
• Could not continue to scale
• Users unable to query while ETL was running
Snowflake Value
• 120x faster: from 20 hours to 45 minutes
• Ad hoc analytics available to all users in minutes
• Deployed in a week during the busiest time of year
[Figure: before, legacy DW (Profiles, Ratings, Loses) feeding MicroStrategy, 1-2 days; after, Snowflake, 1-2 mins]
AD-TECH ANALYTICS
Scenario and Pain Points
• Analyze and monetize a large data set of website traffic
• Continue to add data from new sites
• Large volumes of traditional and JSON data
• Separate data warehouse and Hadoop environments
• Unpredictable performance on both the data warehouse and Hadoop
Solution
• All data in Snowflake (replaced a 130-node Hadoop cluster and the DW)
• Data analysts directly explore, build, test, and deploy new algorithms using Snowflake
• Up-to-date dev/test environment in Snowflake
Snowflake Value
"Because of [Snowflake], business intelligence is moving from a cost center to a value center"
— Keith Lavery
Senior Director, BI Data and Analytics
CUSTOMER RESULTS
"We can do 100 times more queries per day, helping us give our clients richer analysis far more rapidly."
— Balaji Rao
VP Technology
"Snowflake is faster, more flexible, and more scalable than the alternatives on the market. The fact that we don't need to do any configuration or tuning is great because we can focus on analyzing data instead of on managing and tuning a data warehouse."
"With Snowflake, I'm able to spin up as many as I want on demand and to spin them down and not pay for those things that I'm not using."
"Snowflake is awesomely fast, allows us to store data at a low cost and deploy exactly the compute capacity needed, and does all of that without requiring tuning or tweaking."
— Craig Lancaster
CTO
— Matt Solnit
CTO
— Kurk Spendlove
Director Engineering