© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Pavan Pothukuchi, Amazon Redshift
Nam Nguyen, RetailMeNot
October 2015
DAT201
Introduction to Amazon Redshift
What to expect from the session
• Amazon Redshift – What and Why
• Benefits
• Use cases
• Amazon Redshift at RetailMeNot
• Q&A
AWS big data portfolio
• Collect: Import/Export, Direct Connect, Amazon Kinesis
• Store: S3, Amazon Glacier, DynamoDB, Amazon Aurora
• Analyze: Amazon Redshift, EMR, EC2, Data Pipeline, CloudSearch, Machine Learning
Amazon Redshift
• Relational data warehouse
• Massively parallel; petabyte scale
• Fully managed
• HDD and SSD platforms
• $1,000/TB/year; starts at $0.25/hour
A lot faster, a lot simpler, a lot cheaper
The legacy view of data warehousing ...
Global 2,000 companies
Sell to central IT
Multi-year commitment
Multi-year deployments
Multi-million dollar deals
… Leads to dark data
This is a narrow view
Small companies also have big data (mobile, social, gaming, adtech, IoT)
Long cycles, high costs, and administrative complexity all stifle innovation
[Chart: Enterprise Data vs. Data in Warehouse]
The Amazon Redshift view of data warehousing
• Enterprise: 10x cheaper; easy to provision; higher DBA productivity
• Big Data: 10x faster; no programming; easily leverage BI tools, Hadoop, Machine Learning, Streaming
• SaaS: analysis in-line with process flows; pay as you go, grow as you need; managed availability & DR
Selected Amazon Redshift customers
Amazon Redshift architecture
Leader node
• Simple SQL endpoint (JDBC/ODBC)
• Stores metadata
• Optimizes query plan
• Coordinates query execution
Compute nodes
• Local columnar storage
• Parallel/distributed execution of all queries, loads, backups, restores, and resizes
• Interconnected over 10 GigE (HPC) networking
• Ingestion, backup, and restore paths
Start at just $0.25/hour, grow to 2 PB (compressed)
• DC1: SSD; scale from 160 GB to 326 TB
• DS2: HDD; scale from 2 TB to 2 PB
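The split between the leader node and the compute nodes is a scatter/gather pattern. The leader hands the same step to every compute node, each node works only on its local data, and the leader combines the partial results. Below is a minimal Python simulation of that idea for a `COUNT(*)`. The function names and the in-memory "nodes" are hypothetical, and threads only stand in for separate machines. This is not Redshift's engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, str]

def node_partial_count(rows: List[Row], predicate: Callable[[Row], bool]) -> int:
    """Step run by one (simulated) compute node: count matching rows in its local data."""
    return sum(1 for row in rows if predicate(row))

def leader_count(nodes: List[List[Row]], predicate: Callable[[Row], bool]) -> int:
    """Simulated leader node: dispatch the same step to every compute node
    concurrently, then aggregate the partial counts into the final result."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(lambda rows: node_partial_count(rows, predicate), nodes))
    return sum(partials)

# Three hypothetical compute nodes, each holding part of a LOGS table.
nodes = [
    [{"date": "2013-06-09"}, {"date": "2013-06-10"}],
    [{"date": "2013-06-09"}],
    [{"date": "2013-06-01"}, {"date": "2013-06-09"}],
]
print(leader_count(nodes, lambda r: r["date"] == "2013-06-09"))  # prints 3
```

Adding compute nodes spreads the rows more thinly, so each node's share of the work shrinks. The final aggregation on the leader stays small.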
Benefit #1: Amazon Redshift is fast
Dramatically less I/O
Column storage
Data compression
Zone maps
Direct-attached storage
Large data block sizes
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
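Zone maps: column values are stored in blocks, and each block's min and max are recorded
• Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324 (min 10, max 324)
• Block 2: 375 | 393 | 417 | … | 512 | 549 | 623 (min 375, max 623)
• Block 3: 637 | 712 | 809 | … | 834 | 921 | 959 (min 637, max 959)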
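In the `analyze compression listing;` output above, `listid` gets `delta` and several columns get `delta32k`. Delta encoding stores the first value and then only the difference from the previous value. For ascending IDs those differences are small numbers that pack into far fewer bytes. Per the Redshift documentation, DELTA records the differences as 1-byte values and DELTA32K records them as 2-byte values. The sketch below is a simplified illustration under that assumption. `fits_width` uses plain two's-complement ranges, not Redshift's exact bounds, and does not model how out-of-range values are stored.

```python
from typing import List, Tuple

def delta_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Keep the first value, then only the difference from the previous value."""
    if not values:
        raise ValueError("delta_encode needs at least one value")
    return values[0], [cur - prev for prev, cur in zip(values, values[1:])]

def delta_decode(first: int, deltas: List[int]) -> List[int]:
    """Rebuild the column by running a cumulative sum over the deltas."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out

def fits_width(deltas: List[int], nbytes: int) -> bool:
    """True if every delta fits in a signed integer of nbytes bytes (simplified range)."""
    limit = 1 << (8 * nbytes - 1)
    return all(-limit <= d < limit for d in deltas)

values = [10, 13, 14, 26, 100, 245, 324]  # sample ascending values from the block above
first, deltas = delta_encode(values)
print(first, deltas)            # 10 [3, 1, 12, 74, 145, 79]
print(fits_width(deltas, 1))    # False: the delta 145 does not fit in one signed byte
print(fits_width(deltas, 2))    # True
```

This is why an encoding with wider deltas, such as `delta32k`, can be the better choice for columns whose consecutive values are further apart.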
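Benefit #1: Amazon Redshift is fast
Sort Keys and Zone Maps
SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2013'
Unsorted table (zone map per block):
• MIN: 01-JUNE-2013, MAX: 20-JUNE-2013
• MIN: 08-JUNE-2013, MAX: 30-JUNE-2013
• MIN: 12-JUNE-2013, MAX: 20-JUNE-2013
• MIN: 02-JUNE-2013, MAX: 25-JUNE-2013
Sorted by date (zone map per block):
• MIN: 01-JUNE-2013, MAX: 06-JUNE-2013
• MIN: 07-JUNE-2013, MAX: 12-JUNE-2013
• MIN: 13-JUNE-2013, MAX: 18-JUNE-2013
• MIN: 19-JUNE-2013, MAX: 24-JUNE-2013
In the unsorted table, three of the four blocks could contain 09-JUNE-2013 and must be read. Once the table is sorted by date, only one block qualifies.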
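Block skipping with zone maps comes down to an interval check per block. Any block whose [min, max] range cannot contain the predicate value is never read. The sketch below replays the slide's `WHERE DATE = '09-JUNE-2013'` example against both zone maps, using the min/max values shown on the slide. The helper names are hypothetical.

```python
from datetime import date
from typing import List, Sequence, Tuple

ZoneMap = List[Tuple[date, date]]  # (min, max) per block

def build_zone_map(blocks: Sequence[Sequence[date]]) -> ZoneMap:
    """Record the min and max value of each block."""
    return [(min(b), max(b)) for b in blocks]

def blocks_to_scan(zone_map: ZoneMap, target: date) -> List[int]:
    """Indexes of blocks whose [min, max] range could contain target;
    every other block is skipped without being read."""
    return [i for i, (lo, hi) in enumerate(zone_map) if lo <= target <= hi]

def d(day: int) -> date:
    """Shorthand for a day in June 2013."""
    return date(2013, 6, day)

unsorted_map = [(d(1), d(20)), (d(8), d(30)), (d(12), d(20)), (d(2), d(25))]
sorted_map = [(d(1), d(6)), (d(7), d(12)), (d(13), d(18)), (d(19), d(24))]

print(blocks_to_scan(unsorted_map, d(9)))  # [0, 1, 3]
print(blocks_to_scan(sorted_map, d(9)))    # [1]
```

Sorting does not change the check itself. It narrows each block's [min, max] range, so a point or range predicate overlaps far fewer blocks.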
Benefit #1: Amazon Redshift is fast
Parallel and distributed: query, load, export, backup, restore, resize
Benefit #1: Amazon Redshift is fast
Distribution Keys
Table (ID, Name): 1 John Smith; 2 Jane Jones; 3 Peter Black; 4 Pat Partridge; 5 Sarah Cyan; 6 Brian Snail
Rows spread across the cluster by distribution key:
• 1 John Smith, 4 Pat Partridge
• 2 Jane Jones, 5 Sarah Cyan
• 3 Peter Black, 6 Brian Snail
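The slide's layout groups IDs {1, 4}, {2, 5}, {3, 6}, which is what you get by placing each row according to its key modulo 3. The sketch below reproduces that grouping. `key % num_slices` is a deliberately simple stand-in for the hash function Redshift really applies to the distribution key. The function name and the three-slice setup are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]

def distribute(rows: List[Row], num_slices: int,
               dist_key: Callable[[Row], int]) -> Dict[int, List[Row]]:
    """Place each row on a slice chosen from its distribution key.
    key % num_slices is a simplified stand-in for a real hash function."""
    slices: Dict[int, List[Row]] = {s: [] for s in range(num_slices)}
    for row in rows:
        slices[dist_key(row) % num_slices].append(row)
    return slices

people = [(1, "John Smith"), (2, "Jane Jones"), (3, "Peter Black"),
          (4, "Pat Partridge"), (5, "Sarah Cyan"), (6, "Brian Snail")]
for slice_id, rows in distribute(people, 3, lambda r: r[0]).items():
    print(slice_id, rows)
# 0 [(3, 'Peter Black'), (6, 'Brian Snail')]
# 1 [(1, 'John Smith'), (4, 'Pat Partridge')]
# 2 [(2, 'Jane Jones'), (5, 'Sarah Cyan')]
```

Placement is a deterministic function of the key. A second table distributed on the same key, for example a hypothetical `orders` table keyed by person ID, therefore puts matching rows on the same slice, so a join on that key needs no data movement between nodes.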
Benefit #1: Amazon Redshift is fast
• Hardware optimized for I/O-intensive workloads: 4 GB/sec/node
• Enhanced networking: over 1M packets/sec/node
• Choice of storage type and instance size
• Regular cadence of auto-patched improvements
Example: our new Dense Storage (HDD) instance type
• 2x memory, 2x compute, 1.5x disk throughput
• Cost: same as our prior generation!
Benefit #2: Amazon Redshift is inexpensive
DS2 (HDD)          | Price per hour, DW1.XL single node | Effective annual price per TB compressed
-------------------+------------------------------------+-----------------------------------------
On-Demand          | $0.850                             | $3,725
1-Year Reservation | $0.500                             | $2,190
3-Year Reservation | $0.228                             | $999
DC1 (SSD)          | Price per hour, DW2.L single node  | Effective annual price per TB compressed
-------------------+------------------------------------+-----------------------------------------
On-Demand          | $0.250                             | $13,690
1-Year Reservation | $0.161                             | $8,795
3-Year Reservation | $0.100                             | $5,500
Pricing is simple
• Number of nodes x price/hour
• No charge for leader node
• No upfront costs
• Pay as you go
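The "effective annual price per TB" column can be roughly reproduced from the hourly column. Multiply by 8,760 hours and divide by the node's compressed capacity. The capacities come from elsewhere in this deck: 2 TB per DS2/DW1.XL node and 160 GB per DC1/DW2.L node. This is a back-of-the-envelope sketch, not AWS's pricing formula. It matches the on-demand and DS2 reserved rows to within a few dollars. For the DC1 reserved rows it lands about $20 to $25 off the slide, so the slide may include effects this sketch ignores, such as upfront-fee amortization.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_price_per_tb(price_per_hour: float, tb_per_node: float) -> float:
    """Hourly node price, run all year, divided by the node's compressed capacity."""
    return price_per_hour * HOURS_PER_YEAR / tb_per_node

def discount(reserved_per_hour: float, on_demand_per_hour: float) -> float:
    """Fractional saving of a reserved rate relative to on-demand."""
    return 1 - reserved_per_hour / on_demand_per_hour

# Node capacities taken from the deck: 2 TB per DS2 node, 160 GB per DC1 node.
for label, price, tb in [("DS2 on-demand", 0.850, 2.0), ("DS2 1-yr", 0.500, 2.0),
                         ("DS2 3-yr", 0.228, 2.0), ("DC1 on-demand", 0.250, 0.16)]:
    print(f"{label}: ${annual_price_per_tb(price, tb):,.0f}/TB/year")

print(f"1-yr discount: {discount(0.500, 0.850):.0%}")  # 41%
print(f"3-yr discount: {discount(0.228, 0.850):.0%}")  # 73%
```

The same arithmetic explains the "~40% discount with 1 year commitment" and "~70% discount with 3 year commitment" figures later in the deck.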
Benefit #3: Amazon Redshift is fully managed
Continuous/incremental backups
• Multiple copies within cluster
• Continuous and incremental backups to S3
• Continuous and incremental backups across regions
• Streaming restore
[Diagram: backups to Amazon S3 in Region 1, with cross-region copies to Amazon S3 in Region 2]
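"Incremental" means each backup copies only what changed since the previous one, while every snapshot can still restore the whole cluster. The toy model below shows that idea. It assumes content-addressed blocks (a SHA-256 digest per block) and a dictionary standing in for S3. Both are illustrative assumptions, not Redshift's actual backup mechanism, and the class name is hypothetical.

```python
import hashlib
from typing import Dict, Tuple

Manifest = Dict[int, str]  # block id -> content digest

class IncrementalBackupStore:
    """Toy stand-in for an S3 backup location (hypothetical, not Redshift's code)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}  # digest -> block contents

    def snapshot(self, blocks: Dict[int, bytes]) -> Tuple[Manifest, int]:
        """Upload only blocks not already stored; return the manifest and upload count."""
        manifest: Manifest = {}
        uploaded = 0
        for block_id, data in blocks.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.objects:
                self.objects[digest] = data
                uploaded += 1
            manifest[block_id] = digest
        return manifest, uploaded

    def restore(self, manifest: Manifest) -> Dict[int, bytes]:
        """Rebuild every block of a snapshot from its manifest."""
        return {block_id: self.objects[d] for block_id, d in manifest.items()}

store = IncrementalBackupStore()
cluster = {0: b"jan rows", 1: b"feb rows", 2: b"mar rows"}
snap1, n1 = store.snapshot(cluster)   # first snapshot uploads all 3 blocks
cluster[2] = b"mar rows + late arrivals"
snap2, n2 = store.snapshot(cluster)   # second snapshot uploads only the changed block
print(n1, n2)                         # 3 1
```

Each manifest still names every block, so restoring either snapshot yields a complete copy even though the second upload moved one block.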
Benefit #3: Amazon Redshift is fully managed
Fault tolerance
• Disk failures
• Node failures
• Network failures
• Availability Zone/Region level disasters
[Diagram: Amazon S3 in Region 1 and Region 2]
Benefit #4: Security is built-in
• Load encrypted from S3
• SSL to secure data in transit
• ECDHE perfect forward secrecy
• Amazon VPC for network isolation
• Encryption to secure data at rest
• All blocks on disks & in Amazon S3 encrypted
• Block key, Cluster key, Master key (AES-256)
• On-premises HSM & AWS CloudHSM support
• Audit logging and AWS CloudTrail integration
• SOC 1/2/3, PCI-DSS, FedRAMP, BAA
[Diagram: JDBC/ODBC clients in the customer VPC; cluster nodes on 10 GigE (HPC) in an internal VPC; ingestion, backup, and restore paths]
Benefit #5: We innovate quickly
• Well over 100 new features added since launch
• Release every two weeks
• Automatic patching
Release timeline (month/day):
• Service Launch (2/14)
• PDX, US West (Oregon) (4/2)
• Temp Credentials (4/11)
• DUB, EU (Ireland) (4/25)
• SOC1/2/3 (5/8)
• Unload Encrypted Files
• NRT, Tokyo (6/5)
• JDBC Fetch Size (6/27)
• Unload logs (7/5)
• SHA1 Builtin (7/15)
• 4 byte UTF-8 (7/18)
• Sharing snapshots (7/18)
• Statement Timeout (7/22)
• Timezone, Epoch, Autoformat (7/25)
• WLM Timeout/Wildcards (8/1)
• CRC32 Builtin, CSV, Restore Progress (8/9)
• Resource Level IAM (8/9)
• PCI (8/22)
• UTF-8 Substitution (8/29)
• JSON, Regex, Cursors (9/10)
• Split_part, Audit tables (10/3)
• SIN/SYD, Singapore/Sydney (10/8)
• HSM Support (11/11)
• Kinesis EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts, Cross Region Backup (11/13)
• Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13)
• EIP Support for VPC Clusters (12/28)
• New query monitoring system tables and diststyle all (1/13)
• Redshift on DW2 (SSD) Nodes (1/23)
• Compression for COPY from SSH, Fetch size support for single node clusters, new system tables with commit stats, row_number(), strtol() and query termination (2/13)
• Resize progress indicator & Cluster Version (3/21)
• Regex_Substr, COPY from JSON (3/25)
• 50 slots, COPY from EMR, ECDHE ciphers (4/22)
• 3 new regex features, Unload to single file, FedRAMP (5/6)
• Rename Cluster (6/2)
• Copy from multiple regions, percentile_cont, percentile_disc (6/30)
• Free Trial (7/1)
• pg_last_unload_count (9/15)
• AES-128 S3 encryption (9/29)
• UTF-16 support (9/29)
Benefit #6: Amazon Redshift is powerful
• Approximate functions
• User defined functions
• Machine Learning
• Data Science
Amazon ML
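On approximate functions: the Redshift documentation describes `APPROXIMATE COUNT(DISTINCT ...)` as using a HyperLogLog algorithm, which trades a small relative error for far less memory and work than an exact distinct count. Below is a compact textbook HyperLogLog in Python that shows the mechanics: hash each value, use the top bits to pick a register, and keep the longest run of leading zeros seen in each register. It is an illustration only. The hash choice (SHA-1), register count, and class name are assumptions, and this is not Redshift's implementation.

```python
import hashlib
import math

def _hash64(value: str) -> int:
    """64-bit hash taken from the first 8 bytes of SHA-1."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")

class HyperLogLog:
    """Textbook HyperLogLog distinct-count estimator (not Redshift's implementation)."""

    def __init__(self, p: int = 10) -> None:
        self.p = p                   # 2**p registers; more registers give lower error
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        h = _hash64(value)
        idx = h >> (64 - self.p)                      # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)              # bias correction for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)            # linear counting for small cardinalities
        return raw

hll = HyperLogLog(p=10)
for i in range(20_000):
    hll.add(f"user-{i % 10_000}")    # 10,000 distinct values, each seen twice
print(round(hll.estimate()))          # near 10000; standard error is about 3% for p=10
```

Duplicates never change a register, and the memory used is fixed at 2**p small counters however many rows are scanned. Per-node sketches can also be merged by taking the element-wise max of their registers.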
Benefit #7: Amazon Redshift has a large ecosystem
• Data Integration
• Systems Integrators
• Business Intelligence
Benefit #8: Service oriented architecture
Amazon Redshift works with: DynamoDB, EMR, S3, EC2/SSH, RDS/Aurora, Amazon Kinesis, Machine Learning, Data Pipeline, CloudSearch, Mobile Analytics
Use cases
Analyzing Twitter Firehose
• Amazon Redshift: starts at $0.25/hour
• EC2: starts at $0.02/hour
• S3: $0.030/GB-month
• Amazon Glacier: $0.010/GB-month
• Amazon Kinesis: $0.015/shard-hour (1 MB/s in, 2 MB/s out per shard); $0.028/million PUTs
Analyzing Twitter Firehose
• 500MM tweets/day = ~5,800 tweets/sec
• At ~2 KB/tweet, that is ~12 MB/sec (~1 TB/day)
• Amazon Kinesis at $0.015/hour per shard and $0.028/million PUTs: 12 shards plus ~20.9M PUTs/hour cost $0.765/hour
• Amazon Redshift cost is $0.850/hour (for a 2 TB node)
• S3 cost is $1.28/hour (no compression)
• Total: $2.895/hour
Data warehouses can be inexpensive and powerful
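The Kinesis line item can be re-derived. Ingest bandwidth sets the shard count at 1 MB/s in per shard, and each record is billed as a PUT. The sketch below reproduces the slide's $0.765/hour and $2.895/hour using the slide's rounded inputs. It assumes one PUT per tweet, and it takes the S3 figure as given because the deck does not show how it was derived. The function name is hypothetical.

```python
import math

def kinesis_cost_per_hour(records_per_sec: float, mb_per_sec: float,
                          shard_hour: float = 0.015,
                          per_million_puts: float = 0.028,
                          mb_in_per_shard: float = 1.0) -> float:
    """Shard-hours for the ingest bandwidth plus per-PUT charges (2015 prices from the deck).
    Assumes one PUT per record."""
    shards = math.ceil(mb_per_sec / mb_in_per_shard)
    puts_per_hour = records_per_sec * 3600
    return shards * shard_hour + puts_per_hour / 1_000_000 * per_million_puts

tweets_per_sec = 5_800                       # ~500MM tweets/day, as on the slide
mb_per_sec = tweets_per_sec * 2_000 / 1e6    # ~2 KB/tweet -> 11.6 MB/s -> 12 shards
kinesis = kinesis_cost_per_hour(tweets_per_sec, mb_per_sec)
redshift = 0.850                             # one 2 TB node, on-demand
s3 = 1.28                                    # taken as given from the slide
print(f"Kinesis ${kinesis:.3f}/h, total ${kinesis + redshift + s3:.3f}/h")
# Kinesis $0.765/h, total $2.895/h
```

PUT charges dominate the Kinesis cost here (about $0.58 of the $0.765). Batching several tweets per record would therefore matter more than trimming the shard count.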
• Use only the services you need
• Scale only the services you need
• Pay for what you use
• ~40% discount with 1-year commitment
• ~70% discount with 3-year commitment
Amazon.com – Weblog analysis
• Web log analysis for Amazon.com
• 1PB+ workload, 2TB/day, growing 67% YoY
• Largest table: 400 TB
• Want to understand customer behavior
Solution
• Legacy DW: query across 1 week of data per hour
• Hadoop: query across 1 month of data per hour
• Query 15 months of data (1PB) in 14 minutes
• Load 5B rows in 10 minutes
• 21B rows joined with 10B rows – 3 days (Hive) to 2 hours
• Load pipeline: 90 hours (Oracle) to 8 hours
• 64 clusters
• 800 total nodes
• 13PB provisioned storage
• 2 DBAs
Data warehouses can be fast and simple
NTT Docomo – Mobile usage analysis
• Petabytes of data generated by many cell phone towers
• Hard to scale, expensive
• Needed a secure, scalable system that can work with on-premises infrastructure
[Diagram: Data Source, ET, Direct Connect, Client, Forwarder, Loader, State Management, Sandbox, Redshift, S3]
• High-speed redundant Direct Connect lines
• Load billions of rows in minutes
• All data in private VPC
• All data encrypted with private on-premises hardware keys
• Encryption of data, transport, backups, partial spills
• Audit of all SQL actions
• Audit of all configuration changes
The cloud can be made more secure than on premises
Sushiro – Real-time streaming from IoT & analysis
Sushiro – Real-time streaming & analysis
• Real-time data ingested by Amazon Kinesis is analyzed in Amazon Redshift
• 380 stores stream live data from sushi plates
• Inventory information combined with consumption information in near real time
• Forecast demand by store, minimize food waste, and improve efficiencies
Big data does not mean batch
• Can be streamed in
• Can be processed in near real time
• Can be used to respond quickly to requests
You can mix and match
• On premises and cloud
• Custom development and managed services
• Infrastructure with managed scaling, security
Data warehouses can support real-time data
In sum…
Amazon Redshift: Spend time with your data, not your database
• Europe: 67.3M
• Greater China: 27.5M
• Middle East & Africa: 81.7M
• Asia-Pacific: 81.7M
• Latin America: 43.4M
Our data
• 100s of TBs in data warehouses
• >100% year-over-year data growth
[Chart: data growth, 2012 to 2015]
The legacy
[Diagram: Source DBs, 3rd Party Data, and Log Data flow into Vertica, which serves Reporting, Content Presentation, and A/B Testing]
Pain points
Fire Fights
Query Traffic Jams
Processing Windows
Scaling
Adopting cloud strategies
[Diagram: Source DBs, 3rd Party Data, and Log Data flow into Amazon Redshift instances, which serve Reporting, Content Presentation, and A/B Testing]
On-demand breakdown
Only when needed
Ephemeral Processing
Up during business hours
Always Up
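The on-demand breakdown pays off because, as the pricing slide puts it, cost is "number of nodes x price/hour". A cluster that is up for fewer hours costs proportionally less. The sketch below compares monthly cost for the three schedules. The 4-node cluster and the hours assumed for "Up during business hours" and "Only when needed" are hypothetical. Only the $0.85/node-hour on-demand rate comes from the deck.

```python
HOURS_PER_MONTH = 730  # 8,760 hours / 12 months

def monthly_cost(price_per_node_hour: float, nodes: int, hours_up: float) -> float:
    """Cost scales with nodes x price/hour x hours the cluster is up."""
    return price_per_node_hour * nodes * hours_up

# Hypothetical schedules for a hypothetical 4-node cluster at the deck's $0.85/node-hour.
schedules = {
    "Always Up": HOURS_PER_MONTH,
    "Up during business hours": 10 * 22,   # assumed 10 h/day on 22 weekdays
    "Only when needed": 2 * 30,            # assumed 2 h/day of ephemeral processing
}
for name, hours in schedules.items():
    print(f"{name}: ${monthly_cost(0.85, 4, hours):,.2f}/month")
```

Under these assumptions a business-hours cluster costs about 30% of an always-on one (220 of 730 hours). This is the kind of saving that automated cluster shutdown targets.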
Benefits to the data team
• Processing Windows
• Fire Fights
• Scaling Number of Clusters
• Scaling the Size of Clusters
DOH!
• Reserved Instances
• Automated vs. Manual Backups
• Automated Cluster Shut Down
• Sort/Distribution Keys for Joins
Benefits to the business
50% Reduced time on administration
$0 Licensing
50% cost reduction for instances
100% Growth of Internal Customers
Q&A
Thank you!
Remember to complete your evaluations!
Related Sessions
Hear from other customers discussing their Amazon Redshift use cases:
• DAT308 – How Yahoo! Analyzes Billions of Events with Amazon Redshift (Yahoo)
• ISM303 – Migrating Your Enterprise Data Warehouse to Amazon Redshift (Boingo Wireless and Edmunds)
• ARC303 – Pure Play Video OTT: A Microservices Architecture in the Cloud (Verizon)
• ARC305 – Self-Service Cloud Services: How J&J Is Managing AWS at Scale for Enterprise Workloads
• BDT306 – The Life of a Click: How Hearst Publishing Manages Clickstream Analytics with AWS
• DAT311 – Large-Scale Genomic Analysis with Amazon Redshift (Human Longevity)
• BDT314 – Running a Big Data and Analytics Application on Amazon EMR and Amazon Redshift with a Focus on Security (Nasdaq)
• BDT316 – Offloading ETL to Amazon Elastic MapReduce (Amgen)
• BDT401 – Amazon Redshift Deep Dive (TripAdvisor)
