© 2018 Snowflake Computing Inc. All Rights Reserved
SNOWFLAKE
Technical Workshop
AGENDA
1. Architecture
2. Loading Data
3. Querying Data
4. Cloning Without Copying Data
5. Time Travel
6. Availability and Security
7. Sharing Data with External Parties
Do you have your account and the SQL for the workshop?
Get your free Snowflake account: https://trial.snowflake.com
Download the SQL from this folder: https://bit.ly/2RGWQF4
SNOWFLAKE - ARCHITECTURE:
Storage and Compute
Storage separated from compute
Grows automatically
Unlimited storage
Change compute power instantly
Scales up or down instantly based on business needs and shuts off when not in use
Different workloads do not compete with each other
ETL, reporting, data science, and other applications all process data at the same time with no performance degradation
[Diagram labels: Management, Optimization, Security, Availability, Transactions, Metadata]
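To make the idea of compute that shuts off when not in use more concrete, here is a minimal Python sketch. It is not part of the workshop script. It assumes the standard Snowflake credit rates (1 credit per hour for X-Small, doubling with each size) and per-second billing with a 60-second minimum each time a warehouse resumes; check current pricing before relying on these numbers. The name `credits_used` is hypothetical.

```python
# Credits per hour by warehouse size (assumed standard rates; each size doubles).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16, "2XL": 32}

MIN_BILLED_SECONDS = 60  # assumed minimum charge each time a warehouse resumes


def credits_used(size: str, running_seconds: list[int]) -> float:
    """Estimate credits consumed by one warehouse.

    running_seconds holds the length of each period the warehouse was
    running; suspended time is simply absent and costs nothing.
    """
    billed = sum(max(seconds, MIN_BILLED_SECONDS) for seconds in running_seconds)
    return billed * CREDITS_PER_HOUR[size] / 3600


# A Medium warehouse that ran 30 minutes, suspended, then ran 10 more seconds.
etl = credits_used("M", [1800, 10])
# An X-Large warehouse that ran for one full hour.
reporting = credits_used("XL", [3600])
print(round(etl, 4), reporting)
```

Because each warehouse is metered separately, the ETL and reporting workloads in this sketch are sized and billed independently while reading the same stored data.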
SNOWFLAKE - ARCHITECTURE:
Global Services
Centralized administration
Metadata separated from storage and compute
As soon as data is ready, it is available to all clusters
Full transactional consistency (ACID)
A Closer Look
[Diagram: JDBC/ODBC access to the cloud services layer (authentication & access control, infrastructure manager, optimizer, metadata manager, security); virtual warehouses with caches; database storage with blocks labeled A through X and A` through H`; all inside a VPC/VNet]
Metadata
Information about the data stored in Snowflake
Stored and managed in the services layer
Stored scalably in a proprietary, fast-access, fault-tolerant format
Data Storage
Data stored in Snowflake databases and tables
Stored in cloud storage managed by Snowflake
Optimized proprietary format
Automatically compressed and encrypted
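Keeping metadata apart from the stored data is what makes an agenda item like cloning without copying data plausible. The sketch below is a toy Python model under stated assumptions, not Snowflake's implementation: it assumes stored blocks are immutable and that a table is nothing more than a metadata list of block ids. The class `ToyStore` and all of its methods are hypothetical names.

```python
class ToyStore:
    """Toy model: immutable data blocks plus table metadata that points at them."""

    def __init__(self):
        self.blocks = {}   # block id -> rows (never modified once written)
        self.tables = {}   # table name -> list of block ids (the metadata)
        self._next_id = 0

    def _write_block(self, rows):
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = tuple(rows)
        return block_id

    def create_table(self, name, rows):
        self.tables[name] = [self._write_block(rows)]

    def clone(self, source, target):
        # Copy only the metadata (the list of block ids), not the rows.
        self.tables[target] = list(self.tables[source])

    def insert(self, name, rows):
        # New data goes to a new block; existing blocks stay untouched.
        self.tables[name].append(self._write_block(rows))

    def read(self, name):
        return [row for block_id in self.tables[name] for row in self.blocks[block_id]]


store = ToyStore()
store.create_table("sales", [("a", 1), ("b", 2)])
store.clone("sales", "sales_dev")        # no rows are copied
store.insert("sales_dev", [("c", 3)])    # only the clone sees the new block

print(len(store.blocks))        # 2
print(store.read("sales"))      # [('a', 1), ('b', 2)]
print(store.read("sales_dev"))  # [('a', 1), ('b', 2), ('c', 3)]
```

In this model the clone costs one small list copy regardless of table size, and the original table is unaffected by later writes to the clone.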
Hands-On - Initial Setup
1. Log in to your Snowflake account
2. Open the SQL script in Snowflake
3. Run the SQL up to line 76
[Diagram: workloads (ETL/ELT, Snowpipe, Sales, Data Science, AWS QuickSight, AWS Glue) shown with virtual warehouses of sizes XS, S, M, M, L, and S]
Global Services
- Logical model
- Security
- Query planning and optimization
- Transactional control
Instant Elasticity
[Diagram: workloads (ETL/ELT, Snowpipe, Sales, Data Science, AWS QuickSight, AWS Glue) shown with virtual warehouses of sizes XL, XS, M, and S, plus a multi-cluster warehouse (M…)]
Global Services
- Logical model
- Security
- Query planning and optimization
- Transactional control
Think Differently
[Diagram: workloads (ETL/ELT, Snowpipe, Sales, Data Science, QA/Dev, Finance/DBAs, External Users, AWS QuickSight, AWS Glue) shown with virtual warehouses of sizes XS, S, M, XL, XL, L, and M, plus a multi-cluster warehouse (M…); callouts: Clone, Data Sharing, structured & semi-structured data, data protection & time travel]
Global Services
- Logical model
- Security
- Query planning and optimization
- Transactional control
THANK YOU
Case Studies
ENABLING KEY USE CASES
Datamart & data silo consolidation: Consolidate legacy datamarts to eliminate silos and support new projects
Integrated data analytics: Directly load structured + semi-structured data into Snowflake for reporting & analytics
Exploratory & ad hoc analytics: Direct access to data for SQL analysts to explore data, identify correlations, build & test models
OPERATIONAL DATA LAKE
Scenario and Pain Points
• Data volumes growing at 2TB per hour
• Need to provide concurrency for 43,000 internal and external users
• Data warehouse powers Nielsen Marketing Cloud and must be always on
• Existing Netezza is not able to perform difficult queries while loading data
• Cost to maintain hardware and software becoming prohibitive
• Hadoop is not a real database; need ANSI SQL
Snowflake Value
• Grow storage independent of compute (500TB to 2PB)
• Separate warehouses for workloads, zero contention
• Manage data in one place without data marts
[Diagram labels: Source, ODS, Stage, multiple data marts; Source, in-database transformations; users, tools, and website directly query the data]
BI AND ANALYTICS
Scenario and Pain Points
• Market research firm
• Incorporate larger sets of modern data
• ETL with Informatica
• Need performance with scalability
• Risk-averse to data loss
Snowflake Value
• Scalability with performance
• Ability to separately scale users
• Cost savings compared to legacy solution
[Diagram labels: Snowflake; User Groups (Finance, Marketing, Data scientists); Oracle Exadata; Profiles; Ratings; Loses; Before]
LOCALYTICS
Scenario and Pain Points
• 25 trillion mobile events
• Exposed to customers as an interactive data cube
• Hitting limits of legacy architecture (Vertica on AWS)
• Can’t support deletes or out-of-sequence updates
Snowflake Value
• Only platform to handle this scale
• Fraction of cost by not having to pay for the peak
• Snowflake’s Zero Management service is cheaper and better
REAL-WORLD USE CASE
Continuous Loading (4TB/day), S3, <5min SLA
Virtual Warehouse Medium
ETL & Maintenance
Virtual Warehouse Large
Virtual Warehouse 2X-Large
Reporting (Segmented)
Interactive Dashboard: 50% < 1s, 85% < 2s, 95% < 5s
Virtual Warehouse Auto Scale: X-Large x 5
Prod DB: 4 trillion rows, 3+ petabyte raw data, 8x compression ratio, 25M micro partitions
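The figures on this slide can be related with simple arithmetic. The Python sketch below derives averages from the slide's numbers only. It assumes decimal units (1 TB = 10^12 bytes), treats "3+ petabyte" as exactly 3 PB (so size results are lower bounds), and assumes the 8x ratio applies uniformly. The derived averages are illustrative and are not figures reported in the deck.

```python
TB = 10**12
PB = 10**15

rows = 4 * 10**12                # 4 trillion rows
raw_bytes = 3 * PB               # "3+ petabyte raw data", taken as a lower bound
compression_ratio = 8            # 8x compression ratio
micro_partitions = 25 * 10**6    # 25M micro partitions
daily_load_bytes = 4 * TB        # continuous loading, 4TB/day

compressed_bytes = raw_bytes / compression_ratio
rows_per_partition = rows / micro_partitions
compressed_per_partition = compressed_bytes / micro_partitions
load_rate = daily_load_bytes / (24 * 3600)   # average bytes per second

print(f"compressed size: {compressed_bytes / TB:.0f} TB")
print(f"rows per micro-partition: {rows_per_partition:,.0f}")
print(f"compressed bytes per micro-partition: {compressed_per_partition / 10**6:.0f} MB")
print(f"average load rate: {load_rate / 10**6:.1f} MB/s")
```

Under these assumptions the production database holds at least 375 TB compressed, each micro-partition averages 160,000 rows in about 15 MB compressed, and the continuous load averages roughly 46 MB/s.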
MODERNIZING DATA WAREHOUSE
Scenario and Pain Points
• Entire company on GCP with business units using Snowflake
• 30-node Hadoop cluster and 46-node Vertica cluster on premise
• Extremely security focused with requirements for PCI
• Using Looker and Spark
• Teams wanted autonomy, but didn’t have resources to support
• Vertica requires constant maintenance and suffers from outages
• Performance suffers during times of intense reporting
• Moving to BigQuery would create a new set of problems; Redshift had the same problems
• Getting data from partners is painful and slow
Snowflake Value
• Map their current data model easily and quickly with the same SQL
• SaaS service with zero downtime
• Separate warehouses for each team: zero contention and full independence on the same data
[Diagram labels: Source, Hadoop, Vertica, multiple datamarts; Source, REF/TXN, in-database transformations; users, tools, website directly query data]
DATA ANALYTICS AT EXTREME SCALE
Scenario and Pain Points
• Financial institution with a huge focus on security
• Overburdened staff
• Business needs to run monthly reports that span 10 years of historical data
• No way to analyze semi-structured data
• Quoted $500,000 to replace their existing hardware appliance
• 20+ hours to run reports
• Could not continue to scale
• Users unable to query while performing ETL
Snowflake Value
• 120x faster: from 20 hours to 45 minutes
• Ad-hoc analytics available to all users in minutes
• Deployed in a week during the busiest time of year
[Diagram labels: Profiles; Ratings; Loses; Microstrategy; 1-2 days; 1-2 mins; Legacy DW]
AD-TECH ANALYTICS
Scenario and Pain Points
• Analyze and monetize large data set of website traffic
• Continue to add data from new sites
• Large data volumes of traditional and JSON data
• Separate data warehouse & Hadoop environments
• Unpredictable performance on both data warehouse & Hadoop
Solution
• All data in Snowflake (replaced 130-node Hadoop & DW)
• Data analysts directly explore, build, test, and deploy new algorithms using Snowflake
• Up-to-date dev / test environment in Snowflake
Snowflake Value
“Because of [Snowflake], business intelligence is moving from a cost center to a value center”
Keith Lavery, Senior Director, BI Data and Analytics
CUSTOMER RESULTS
We can do 100 times more queries per day, helping us give our clients richer analysis far more rapidly.
Balaji Rao, VP Technology
Snowflake is faster, more flexible, and more scalable than the alternatives on the market. The fact that we don’t need to do any configuration or tuning is great because we can focus on analyzing data instead of on managing and tuning a data warehouse.
With Snowflake, I’m able to spin up as many as I want on demand and to spin them down and not pay for those things that I’m not using.
Snowflake is awesomely fast, allows us to store data at a low cost and deploy exactly the compute capacity needed, and does all of that without requiring tuning or tweaking.
Craig Lancaster, CTO
Matt Solnit, CTO
Kurk Spendlove, Director Engineering


Hands-On Lab: Data Warehouse in the Cloud
