PUBLIC SECTOR SUMMIT
WASHINGTON, DC
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Implementing a Data Warehouse on AWS in a Hybrid Environment
Session ID: 299943
Gargi Singh Chhatwal, Solutions Architect, AWS
Amy Tseng, DB Engineering Manager, Fannie Mae
Key Takeaways
Why do enterprises want hybrid cloud?
Data Warehouse on AWS
Data warehouse design considerations
Customer Story - Fannie Mae
What Do Customers Want in Hybrid?
• Run workloads on the cloud
• Tight integration
• Run workloads on-premises
• Without buying new hardware
Hybrid Cloud Use Cases
Integration of different data silos
Integrated identity and access
Integrated resources and deployment management
Integrated devices and edge systems
Cloud bursting
Data center extension
THE DATA TSUNAMI
The volume of data generated by cloud-based applications and services, together with data migrated to cloud platforms from on-premises systems, is increasing exponentially.
• There is more data than people think.
• Data grows every 5 years.
• Data platforms need to live for years.
• Data platforms need to scale.
ANALYTICS PIPELINE: AWS
Stages: COLLECT, STORE, ETL, ANALYZE, CONSUME/VISUALIZE
COLLECT
• AWS Database Migration Service
• AWS Snowball
• AWS Snowmobile
• Amazon Kinesis Data Firehose
• Amazon Kinesis Data Streams
• AWS Direct Connect
• AWS DataSync
STORE
• Amazon Simple Storage Service (Amazon S3)/Glacier
• Amazon Relational Database Service (Amazon RDS)
• Amazon Aurora: MySQL, PostgreSQL
• Amazon ElastiCache: Redis, Memcached
• Amazon Quantum Ledger Database
• Amazon DynamoDB: Key value
• Amazon Neptune: Graph
• Amazon Timestream: Time series
• Amazon RDS on VMware
• Amazon DocumentDB
• Amazon Elastic File System (Amazon EFS)
• Amazon FSx
ETL
• AWS Glue: ETL & Data Catalog
• AWS Lake Formation: Data lakes
ANALYZE
• Amazon Redshift: Data warehousing
• Amazon EMR: Hadoop + Spark
• Amazon Athena: Interactive analytics
• Amazon Kinesis Analytics: Real-time
• Amazon Elasticsearch Service: Operational analytics
CONSUME/VISUALIZE
• Amazon QuickSight
• Amazon SageMaker
• Amazon Machine Learning
• AWS Marketplace
What does data warehouse modernization mean?
• Easy to use: Don’t waste time on menial administrative tasks and maintenance
• Extends to your Data Lake: Directly analyze data stored in your data lake in open formats
• Any scale of data, workloads, and users: Dynamically scale up to guarantee performance even with unpredictable demands and data volumes
• Faster time-to-insights: Consistently fast performance, even with thousands of concurrent queries and users
Data Warehouse on AWS
DESIGNING A CLOUD DATA WAREHOUSE
• Prepare your sources: batch or real-time?
• Ingest: how are you going to get data into your data warehouse?
• ETL: is the data structured for the data warehouse, or do you need to transform it?
• Data quality: what do you do about the quality of your data? Partner solutions are available.
• Auditing
• Data governance
• Master data management
• Nightly jobs/ETL: managing your data transformation
ENTERPRISE DATA WAREHOUSE WORKFLOW
[Diagram: COLLECT, STORE, ETL, ANALYZE, VISUALIZE]
• Data sources: on-premises and in cloud; structured and unstructured data
• COLLECT: AWS Direct Connect, AWS Snowball, AWS Database Migration Service, AWS DataSync
• STORE: Amazon S3
• ETL: Amazon EMR, AWS Glue
• ANALYZE: Amazon Redshift, Amazon Redshift Spectrum
• VISUALIZE: Amazon QuickSight
AMAZON REDSHIFT
Fast, simple, cost-effective data warehouse that can extend queries to your Data Lake
• Fastest: Get faster time-to-insight for all types of analytics workloads; powered by machine learning, columnar storage, and MPP
• Unlimited scale: Dynamically scale up to guarantee performance even with unpredictable analytical demands and data volumes
• Extends your Data Lake: Analyze data in the Amazon S3 Data Lake in-place and in open formats, together with data loaded into Amazon Redshift’s high performance SSDs
• 1/10th the cost: Start at $0.25 per hour, save costs with automated administration tasks and eliminate business impact due to downtime; as low as $1,000 per terabyte per year
Analyze data in open formats such as Parquet, ORC, and JSON, using SQL tools
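To put the "$0.25 per hour" entry price in annual terms, the arithmetic can be sketched as below. This is only an illustration: it assumes a node running around the clock for a 365-day year at the quoted hourly rate, and it does not model the "$1,000 per terabyte per year" figure, which depends on pricing details the slide does not give. The function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, cluster never paused


def annual_on_demand_cost(hourly_rate_usd: float, node_count: int = 1) -> float:
    """Cost in USD of running node_count nodes continuously for one year."""
    return hourly_rate_usd * HOURS_PER_YEAR * node_count


# The slide's entry price of $0.25 per hour, for a single node.
entry_cost = annual_on_demand_cost(0.25)
print(f"${entry_cost:,.2f} per year")  # $2,190.00 per year
```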
Amazon Redshift
The 4 things that matter most
Speed, Scale, Security, Simplicity
AMAZON REDSHIFT SYSTEM ARCHITECTURE
Leader node
• SQL endpoint
• Stores metadata
• Coordinates query execution
Compute nodes
• Local, columnar storage
• Execute queries in parallel
• Load, backup, restore through Amazon S3; load from Amazon DynamoDB, Amazon EMR
Two hardware platforms
• Optimized for data processing
• DS2: HDD; scale from 2TB to 2PB
• DC1: SSD; scale from 160GB to 356TB
[Diagram labels: SQL Clients / BI Tools, JDBC/ODBC, Leader Node, Data Catalog, Compute Nodes, 10 GigE (HPC), Ingestion / Backup / Restore]
A DEEPER LOOK AT COMPUTE NODE ARCHITECTURE
• A compute node is partitioned into either 2 or 16 Slices;
a slice can be thought of as a “virtual compute node”
• Each slice is allocated a portion of the compute node's
memory and disk space, where it processes a portion of the
workload assigned to the compute node by the leader node
• The leader node manages distributing data to the slices and
apportions the workload for any queries or other database
operations to the slices
• Slices are Amazon Redshift’s symmetric multiprocessing (SMP) mechanism; they work in parallel to complete operations
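The slice model above can be sketched in a few lines. This is a simplified illustration, not Redshift's actual scheduler: the function names are hypothetical, the slices-per-node values are limited to the 2 or 16 the slide mentions, and the workload is treated as a count of equal units.

```python
from typing import List


def total_slices(node_count: int, slices_per_node: int) -> int:
    """Number of slices in a cluster, using the slide's 2-or-16 slices per node."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one compute node")
    if slices_per_node not in (2, 16):
        raise ValueError("the slide describes nodes with either 2 or 16 slices")
    return node_count * slices_per_node


def split_workload(work_units: int, slice_count: int) -> List[int]:
    """Apportion work units across slices as evenly as possible."""
    base, extra = divmod(work_units, slice_count)
    # The first `extra` slices each take one additional unit.
    return [base + 1 if index < extra else base for index in range(slice_count)]


slices = total_slices(node_count=4, slices_per_node=2)
print(slices, split_workload(10, slices))
```

Because slices work in parallel, the largest share in the list is what bounds completion time in this simplified model.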
Security is built-in
• Compliance certifications
• Network isolation
• End-to-end encryption
• Integration with AWS Key Management Service (AWS KMS)
[Diagram labels: Customer VPC, Internal VPC, JDBC/ODBC, Leader Node, Compute Nodes, 10 GigE (HPC), Amazon S3]
AMAZON REDSHIFT ANTI-PATTERNS
Amazon Redshift is not ideally suited for the following usage patterns:
Small datasets
• Built for parallel processing
• Data sets of < 100GB don’t gain the benefits of Amazon Redshift
OLTP (Online Transaction Processing)
• More appropriate for a traditional RDBMS or NoSQL database
Unstructured data
• Data must be structured by a defined schema
• Amazon Redshift Spectrum
BLOB data
• Store large binary objects in Amazon S3 and reference them in Amazon Redshift
DATA DESIGN CONSIDERATIONS
Are you using optimal data types?
• Parquet, Avro, ORC
Is your data distributed evenly?
Did you pick a good sort key?
Loading data efficiently
• Use the COPY command
• You need at least as many input files as you have slices
• With multiple input files, all slices are working, so you maximize throughput
• Scale linearly as you add nodes
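The "at least as many input files as slices" guidance can be checked with a small sketch. It assumes a simplified model in which input files are handed to slices round-robin; how Redshift actually assigns files during COPY is internal to the service, and the file names and function names here are hypothetical.

```python
from typing import List


def assign_files(file_names: List[str], slice_count: int) -> List[List[str]]:
    """Hand out input files to slices round-robin (simplified model)."""
    return [file_names[index::slice_count] for index in range(slice_count)]


def idle_slices(file_names: List[str], slice_count: int) -> int:
    """Count slices that receive no file and would sit idle during the load."""
    return sum(1 for files in assign_files(file_names, slice_count) if not files)


# Two input files on a 4-slice cluster leave half the slices with nothing to load.
print(idle_slices(["part-0.csv", "part-1.csv"], slice_count=4))
# Splitting the same data into four files keeps every slice busy.
print(idle_slices([f"part-{n}.csv" for n in range(4)], slice_count=4))
```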
Distribution styles (diagram: 2 nodes with 2 slices each)
• Key: same key to same location
• All: all data on every node
• Even: round robin distribution
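The three distribution styles in the diagram can be imitated with a toy placement function. This is a sketch under stated assumptions: `zlib.crc32` stands in for whatever hash Redshift uses internally, "bucket" means a slice for Key and Even and a node for All, and the table and column names are made up.

```python
import zlib
from typing import Dict, List, Optional

Row = Dict[str, object]


def distribute(rows: List[Row], bucket_count: int, style: str,
               key: Optional[str] = None) -> List[List[Row]]:
    """Place rows into buckets according to a distribution style."""
    buckets: List[List[Row]] = [[] for _ in range(bucket_count)]
    if style == "EVEN":
        # Round robin: row i goes to bucket i modulo the bucket count.
        for index, row in enumerate(rows):
            buckets[index % bucket_count].append(row)
    elif style == "KEY":
        if key is None:
            raise ValueError("KEY distribution needs a key column")
        # Same key value always hashes to the same bucket.
        for row in rows:
            digest = zlib.crc32(str(row[key]).encode("utf-8"))
            buckets[digest % bucket_count].append(row)
    elif style == "ALL":
        # Every bucket holds a full copy of the table.
        for bucket in buckets:
            bucket.extend(rows)
    else:
        raise ValueError(f"unknown style: {style}")
    return buckets


orders = [{"order_id": i, "customer": f"c{i % 3}"} for i in range(9)]
for style in ("EVEN", "KEY", "ALL"):
    placed = distribute(orders, 4, style, key="customer")
    print(style, [len(bucket) for bucket in placed])
```

With only three distinct customers, the Key placement can leave a bucket empty, which is the kind of skew the "Is your data distributed evenly?" question is probing.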
Recently Released Features
Dense Compute Nodes (DC2)
• 2x the performance of DC1 at the same price
• 3x more I/O
• Upgrade at no cost
• 30% better storage utilization than DC1
• NVMe SSD, DDR4 memory, Intel E5-2686 v4 (Broadwell)
“Amazon Redshift’s new DC2 node is giving us a 100 percent performance increase, allowing us to provide faster insights for our retailers, more cost effectively, to drive incremental revenue.”
Result-Set Caching
Sub-second repeat queries
• Amazon Redshift customers can now serve 35% more queries on average, using the same compute resources
• Tens of thousands of compute hours are freed up daily to serve the remaining queries and data ingestion
• Transparent: it just works!
“With Amazon Redshift result caching, 20 percent of our queries now complete in less than one second,” said Greg Rokita, Executive Director for Technology, Edmunds
Short Query Acceleration
Express Lane for Short Queries
• Machine learning predicts the
runtime of queries
• Short queries are routed to an
express queue
• Resources are dynamically
dedicated to short queries
• Enable it today from your AWS
Management Console
[Diagram, how it works; labels: Analytics and BI / Dashboard tools, Amazon Redshift, Machine Learning Classifier]
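The routing idea behind the express lane can be sketched as follows. Everything here is illustrative: the predicted runtime is supplied by hand as a stand-in for the machine learning classifier's output, and the 5-second threshold and queue names are hypothetical, not Redshift settings.

```python
from dataclasses import dataclass


@dataclass
class Query:
    sql: str
    predicted_seconds: float  # stand-in for the ML classifier's runtime prediction


def route(query: Query, short_threshold_seconds: float = 5.0) -> str:
    """Send queries predicted to be short to the express queue."""
    if query.predicted_seconds <= short_threshold_seconds:
        return "express"
    return "standard"


dashboard_lookup = Query("SELECT count(*) FROM loans WHERE state = 'VA'", 0.4)
nightly_rollup = Query("SELECT ... five-table join ...", 180.0)
print(route(dashboard_lookup), route(nightly_rollup))  # express standard
```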
Amazon Redshift Elastic Resize
Scale up and down in minutes
• Adds additional nodes to Amazon Redshift cluster
• Distributes data across new configuration
• Minimal transition time
• Quickly scale for varying workload demands
[Diagram labels: JDBC/ODBC, Leader Node, Compute nodes, Amazon Redshift Cluster, Backup, Amazon Redshift Managed Amazon S3]
Concurrency Scaling for bursts of user activity
• Creates more clusters automatically on-demand
• Consistently fast performance even with thousands of concurrent queries
• No advance hydration required
• Free for >97% of customers: for every 24 hours that your main cluster is in use, you accrue a one-hour credit for Concurrency Scaling
[Diagram labels: Caching Layer, Backup, Redshift Managed S3]
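The credit rule on this slide (one hour of Concurrency Scaling credit per 24 hours of main-cluster use) can be worked through as below. This is a sketch under assumptions: credits are counted per complete 24-hour block, and any cap, expiry, or per-second billing rules are ignored because the slide does not describe them. Function names are hypothetical.

```python
def accrued_credit_hours(main_cluster_hours: float) -> int:
    """One credit hour per complete 24 hours of main-cluster use."""
    return int(main_cluster_hours // 24)


def billable_scaling_hours(main_cluster_hours: float, scaling_hours_used: float) -> float:
    """Concurrency Scaling hours left to pay for after applying accrued credits."""
    return max(0.0, scaling_hours_used - accrued_credit_hours(main_cluster_hours))


# A cluster in use for 72 hours has earned 3 credit hours,
# so 5 hours of burst usage would leave 2 hours billable.
print(accrued_credit_hours(72), billable_scaling_hours(72, 5))  # 3 2.0
```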
Amazon Redshift Spectrum
Extend the data warehouse to exabytes of data in Amazon S3 Data Lake
• Query across Amazon Redshift and Amazon S3
• No data loading required
• Scale compute and storage separately
• Directly query data stored in Amazon S3
• Parquet, ORC, Avro, JSON, and CSV data formats
Coming soon:
• Unload to Parquet
• Spectrum Request Accelerator
[Diagram labels: Redshift Spectrum query engine, Amazon Redshift data, Amazon S3 data lake]
Amazon Redshift: New features
Grouped on the slide under Speed, Scale, Simplicity, and Security:
• Unload to Parquet
• Spectrum Request Accelerator
• Improving short query acceleration
• Elastic resize
• Concurrency Scaling
• WLM Concurrency Setting
• Auto data distribution
• Auto-vacuum
• Auto-analyze
• Deferred Maintenance
• Snapshot Scheduler
• AWS Lake Formation integration
About Fannie Mae
[Chart: home ownership in United States and homes financed by Fannie Mae; values shown: 36%, 64%]
Our Data Warehouse
New Challenges
Loan Credit
• Digital Transformation -> 3 million queries/month, 4x data growth
• Successful user adoption -> User base growth 100 to 1000 in 3 years
• Concurrency, Scalability, and time to market
Amazon Redshift Solutions
DW
On-premises to Hybrid Environment
[Diagram: on-premises Data Warehouse; Amazon Cloud with Amazon S3, Amazon Redshift, Amazon Athena, Amazon EMR]
Concurrency Scaling Performance
[Bar chart: Amazon Redshift 16 nodes vs. Amazon Redshift 8 nodes w/Concurrency Scaling. X-axis: Query 1 to Query 11; y-axis: execution time (sec.), 0 to 400; series: RS 16 nodes, RS 8 nodes Burst]
Similar or better performance is achieved with 50% of the compute resource.
Concurrency Scaling Performance
With the concurrency scaling feature, performance was flat (did not degrade) as concurrency increased.
[Chart: Amazon Redshift vs. Amazon Redshift Concurrency Scaling, Query 3 (5-table joins). X-axis: query concurrencies (1, 30, 50, 100); y-axis: execution time (sec.), 0 to 400; series: average of RS 8 nodes, average of RS 8 nodes Burst]
Lessons Learned
Thank you!
Gargi Singh Chhatwal
gcchhatw@amazon.com