HOW TO REALIZE AN ADDITIONAL
270% ROI WITH SNOWFLAKE
2
Introduction
How to dramatically increase the ROI of your Snowflake investment by:
● Managing the size of your data warehouse
● Defining and setting limits on query times to prevent runaway queries
● Implementing visibility and telemetry to monitor usage
● Automating the creation, maintenance and management of data aggregates
3
Today’s Speakers

Mark Stange-Tregear
VP, Analytics, Rakuten Rewards
@rakutenrewards

Mark is VP, Analytics, at Rakuten Rewards, formerly Ebates. He’s been with the company since 2014 and was with the team that sold Ebates to Rakuten in 2015. Mark plays a double role, leading a center-of-excellence analytics group and product-managing the enterprise business intelligence stack. Prior to joining Ebates, Mark worked in the residential real estate and online tournament spaces.

Dave Mariani
Chief Strategy Officer, AtScale
@dmariani

Dave is one of the co-founders of AtScale and is currently the Chief Strategy Officer. Prior to AtScale, Dave was VP of Engineering at Klout and at Yahoo!, where he built the world's largest multi-dimensional cube for BI on Hadoop. Dave is a Big Data visionary and serial entrepreneur.
4
In Pursuit of Processing Power - A Timeline
2014 2015 2016 2017 2018 2019
SSRS
SQL Server
SSAS
M.Strategy
Hadoop (Cloudera)
@scale
Tableau
Hadoop (Cloudera)
@scale
Tableau
Hadoop (Cloudera)
@scale
Tableau
@scale
Tableau
Snowflake
5
Why Snowflake?
PROBLEM: USERS "TRIP" OVER EACH OTHER, LEADING TO FRUSTRATION AND MISSED GOALS
RESULT: SEPARATE COMPUTE INTO DISCRETE CLUSTERS
• Dozens of discrete warehouses
• Budget cost to the business unit
• Separate ETL from ad hoc workloads
• Horizontally scalable on demand
Managing Data Warehouses
MORE WAREHOUSES MORE VISIBILITY MORE CONTROL
HOW MANY?
● At least one warehouse per business team or engineering group (often several)
● Dedicated warehouses for ETL components
● Dedicated warehouses for 3rd party products
CONTROL OF WAREHOUSE SIZE?
● Constantly reviewed for potential down-sizing
● Resizing control centralized with cost management and oversight “team”
IS BIGGER BETTER?
● Not always
● IO intensive workloads can work on smaller clusters
● Aggregations and joins on bigger clusters
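The "is bigger better?" question above can be sketched numerically. Snowflake's credit rate roughly doubles with each warehouse size step, so a larger warehouse only pays off when the job speeds up close to linearly. The runtimes below are hypothetical, purely for illustration:

```python
# Sketch: when does a bigger warehouse pay off? (hypothetical runtimes)
# Snowflake credit rates double with each size step (XS=1 ... 3XL=64).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8,
                    "XL": 16, "2XL": 32, "3XL": 64}

def job_cost(size, runtime_hours):
    """Credits consumed by one job: rate for the size times hours run."""
    return CREDITS_PER_HOUR[size] * runtime_hours

# IO-intensive job: 4x the rate but only 20% faster on Large -> costs more.
io_small = job_cost("S", 1.0)    # 2.0 credits
io_large = job_cost("L", 0.8)    # 6.4 credits
# Join/aggregation-heavy job: near-linear speedup -> similar cost, much faster.
agg_small = job_cost("S", 1.0)   # 2.0 credits
agg_large = job_cost("L", 0.26)  # ~2.1 credits
```

The asymmetry is the point: downsizing an IO-bound workload saves money at little cost in latency, while join-heavy workloads can justify the bigger cluster.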
7
More Visibility … More Control
Get to know and love:
● "SNOWFLAKE"."ACCOUNT_USAGE"."WAREHOUSE_METERING_HISTORY"
● "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
and these are well worth knowing as well…
● "SNOWFLAKE"."ACCOUNT_USAGE"."STORAGE_USAGE"
● "SNOWFLAKE"."ACCOUNT_USAGE"."METERING_DAILY_HISTORY"
● "SNOWFLAKE"."READER_ACCOUNT_USAGE"."WAREHOUSE_METERING_HISTORY"
MAKE MONITORING EASY… AUTOMATED DAILY REPORTS
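As a sketch of what an automated daily report boils down to: aggregate credits per day and warehouse from rows shaped like WAREHOUSE_METERING_HISTORY (START_TIME, WAREHOUSE_NAME, CREDITS_USED). The sample rows and warehouse names below are made up for illustration; in practice you would fetch real rows from the view above.

```python
# Daily-report aggregation over rows shaped like
# "SNOWFLAKE"."ACCOUNT_USAGE"."WAREHOUSE_METERING_HISTORY".
# Sample data is invented for the sketch.
from collections import defaultdict
from datetime import date

rows = [
    (date(2019, 11, 1), "ETL_WH", 40.0),
    (date(2019, 11, 1), "ADHOC_WH", 12.5),
    (date(2019, 11, 2), "ETL_WH", 38.0),
]

def daily_credits(rows):
    """Total credits used per (day, warehouse), ready for a daily email."""
    totals = defaultdict(float)
    for day, warehouse, credits in rows:
        totals[(day, warehouse)] += credits
    return dict(totals)
```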
8
Levers to Pull
9
1. Warehouse size… typically moving down, but sometimes up
2. Horizontal scaling
3. Move jobs between warehouses
4. Caching
5. Code rewrite
6. Clustering
A note on code rewrite…
● Data modeling is still important
● Snowflake is very powerful, but joins and aggregations cost
● Repetitive joins, repetitive aggregation? Consider creating “flat” warehouse table
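The "flat table" suggestion can be illustrated with a stand-in database (sqlite3 here, since the pattern is plain SQL): pay the join and aggregation cost once with CREATE TABLE AS SELECT, then point repetitive queries at the flat result. Table and column names are invented for the example.

```python
# Materialize a repetitive join once, then query the flat table.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (item_sk INTEGER, amount REAL);
    CREATE TABLE item  (item_sk INTEGER, category TEXT);
    INSERT INTO sales VALUES (1, 10.0), (1, 5.0), (2, 7.0);
    INSERT INTO item  VALUES (1, 'Electronics'), (2, 'Books');

    -- Pay the join + aggregation cost once:
    CREATE TABLE flat_sales AS
    SELECT i.category, SUM(s.amount) AS total
    FROM sales s JOIN item i ON s.item_sk = i.item_sk
    GROUP BY i.category;
""")

# Repeated dashboard queries now scan the small flat table instead of re-joining.
total = con.execute(
    "SELECT total FROM flat_sales WHERE category = 'Electronics'"
).fetchone()[0]
```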
10
Additional Considerations When Managing Snowflake
Query Performance: How fast can the Cloud Data Warehouse answer a query for one user?
User Concurrency: How do multiple users running queries affect performance & stability?
Compute Costs: How do query workloads and configuration impact your monthly bill?
Semantic Complexity: How difficult is it to write the query to answer the business question?
The Cloud Analytics Stack
LAYER (FUNCTION): COMPONENT
CONSUMPTION (VISUALIZATION, ANALYSIS, REPORTING): BI Tools, AI/ML Tools, Applications
SEMANTIC LAYER (QUERY ACCESS, METADATA, MASKING, AUDITING): Universal Semantic Layer
PREPARED DATA (DATA PROCESSING, MODELING): Data Warehouse, File Access Engine
DATA TRANSFORMATION (ETL, MERGING, AGGREGATION): ETL Engine
RAW DATA (DATA STORAGE, ENCRYPTION): File System (Data Lake), Data Catalog
The Cloud Analytics Stack
12
LAYER (FUNCTION): COMPONENT
CONSUMPTION (VISUALIZATION, ANALYSIS, REPORTING): BI Tools, AI/ML Tools, Applications
SEMANTIC LAYER (QUERY ACCESS, FILTERING, MASKING, AUDITING): Multi-dimensional Engine, Data Governance Engine, Virtualization Engine
PREPARED DATA (DATA PROCESSING, MODELING): Data Warehouse, File Access Engine
DATA TRANSFORMATION (ETL, MERGING, AGGREGATION): ETL Engine
RAW DATA (DATA STORAGE, ENCRYPTION): File System (Data Lake), Data Catalog
A Semantic Layer is Critical to Success
13
1. Simplicity
2. Single source of truth
3. Governance for all
14
A Semantic Layer Simplifies & Normalizes Data Access
SELECT
`d_product_manufacturer_id` AS `d_product_manufacturer_id`,
SUM( `Total Ext Sales Price` ) AS `sum_total__ext_sales_price_ok`
FROM
`tpc-ds benchmark model` `TPC-DS Benchmark Model`
WHERE
`I Category` = 'Electronics'
AND `Sold Calendar Year-Week` = 1999
AND `Sold d_customer_gmt_offset` = -5.00
AND `Sold d_month_of_year` = 7
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100;
with ss as (
select
i_manufact_id,sum(ss_ext_sales_price) total_sales
from
store_sales,
date_dim,
customer_address,
item
where
i_manufact_id in (select
i_manufact_id
from
item
where i_category in ('Electronics'))
and ss_item_sk = i_item_sk
and ss_sold_date_sk = d_date_sk
and d_year = 1999
and d_moy = 7
and ss_addr_sk = ca_address_sk
and ca_gmt_offset = -5
group by i_manufact_id),
cs as (
select
i_manufact_id,sum(cs_ext_sales_price) total_sales
from
catalog_sales,
date_dim,
customer_address,
item
where
...
TPC-DS Query
#33:
What is the monthly sales
figure based on extended
price for a specific month
in a specific year, for
manufacturers in a specific
category in a given time
zone? Group sales by
manufacturer identifier
and sort output by sales
amount, by channel, and
give Total sales.
AtScale SQL: 398 bytes … TPC-DS Raw: 1,872 bytes
15
AtScale’s TPC-DS 10TB Benchmark (10,000 Scale Factor)
THE TPC-DS 10TB DATASET HAS:
1. Multiple fact tables
2. Large fact tables
3. Large dimensions
16
Benchmark Results: Query Performance
14x Faster
Delivers orders-of-magnitude query improvements that are amplified with high user concurrency
17
Benchmark Results: Concurrency
Smooths out & mitigates user concurrency challenges without requiring additional compute resources
Note: Thread group 1 is the average of 5 runs for each of the 20 queries; the 5, 25 & 50 thread groups ran each of the 20 queries once per thread.
18
Benchmark Results: Compute Cost
4x Less Cost
Allows for smaller compute resources & mitigates unpredictable & unbounded costs for on-demand pricing models
19
Same Workloads, Smaller Warehouses
Snowflake (raw): 1, 5, 25, 50 threads
Snowflake + AtScale: 1, 5, 25, 50, 100 threads
Results of TPC-DS 10TB Benchmark Test
20

Test: Improvement Factor with AtScale + Snowflake
Query Performance (1): 4x faster
User Concurrency (2): 14x faster
Compute Cost (3): 73% cheaper
Complexity (4): 76% less complex SQL queries

1. Elapsed time for executing 1 query five times
2. Elapsed time executing 1 (x5), 5, 25, 50 queries
3. Compute costs for cluster time (Redshift, Snowflake) or bytes read (BigQuery) for the user concurrency test
4. Complexity score for SQL queries based on number of functions, operations, tables, objects & subqueries (AtScale = 258, TPC-DS = 1,057)
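Footnote 4's scores can be checked directly: a complexity score of 258 with AtScale versus 1,057 for raw TPC-DS SQL is a reduction of roughly 76%.

```python
# Verifying the "76% less complex" claim from the complexity scores above.
atscale_score = 258
tpcds_score = 1057
reduction = 1 - atscale_score / tpcds_score  # ~0.756
print(f"{reduction:.0%}")  # 76%
```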
Configuration: Virtual Data Warehouse Used, Compute Cost per Hour
Snowflake: 3X-Large (64 credits/hour), $128.00
AtScale on Snowflake: 1X-Large (16 credits/hour), $32.00

AtScale customers realize an additional 270% ROI on Snowflake
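The hourly figures follow from a flat per-credit price. The $2.00/credit rate below is inferred from the table itself ($128 / 64 credits) and is an assumption; actual Snowflake rates vary by edition, region, and contract.

```python
# Cost-per-hour arithmetic behind the configuration table.
# PRICE_PER_CREDIT is inferred from the table (128 / 64), not an official rate.
PRICE_PER_CREDIT = 2.00

def hourly_cost(credits_per_hour):
    """Dollar cost of running a warehouse for one hour."""
    return credits_per_hour * PRICE_PER_CREDIT

raw = hourly_cost(64)       # 3X-Large: $128.00/hour
atscale = hourly_cost(16)   # 1X-Large: $32.00/hour
factor = raw / atscale      # 4x lower hourly compute cost
```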
21
DEMO
22
Summary: How to realize an additional 270% ROI on Snowflake
▵ Download the Snowflake benchmark report at: https://www.atscale.com/snowflake benchmark
▵ Read the Rakuten Rewards case study at: https://www.atscale.com/rakutenrewards
▵ COMING SOON! Estimate your cost savings using the AtScale calculator
Q&A!
www.atscale.com
Editor's Notes
  1. Companies of all sizes have embraced the power, scale and ease of use of Snowflake along with the promise of cost savings. But as some have learned, cloud compute usage can sneak up on you if you aren’t careful. Today, our experts will discuss how to dramatically increase the ROI of your Snowflake investment by:
  2. AtScale is built to leverage the efficiencies and performance of the cloud for the data consumer, whether you're on premise or in the cloud (or both). We connect people to data. We do that without moving data and without complexity, leveraging existing investments in big data platforms, applications and tools. We also do that consistently, securely and with one set of semantics, and without interrupting existing data usage, so that data workers no longer have to understand how or where data is stored. Performance: Optimizing performance is difficult, and that's where we focus our energies. AtScale's data warehouse virtualization can cut query times from 5 weeks to 5 seconds, automatically optimizing each time a user queries the database. Security: Because we haven't copied the data and applied new code or embedded rules, we've reduced complexity and maintain consistent data lineage throughout the data lifecycle. AtScale not only leverages existing data security and governance but applies an additional layer so that data can be ported to new data tools, applications and platforms. Agility: What's more powerful is that we create a simple interface for querying data and building models for data science and analytics workers, with deep integrations with BI and AI/ML tools. For the first time, users (and IT) have visibility into how data is being queried and used throughout the organization (no more data silos).