Build multi-region Data Warehouse
on AWS
Mr. Vuong Tran
Cloud Solution Architect
OSAM
#1
Mr. Thai Ngo
Cloud Solution Architect
OSAM
AWS User Group Vietnam
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT
Agenda
1. What is a Data Warehouse?
2. Benefits of using AWS for Data Warehousing
3. Why Multi-region Data Warehouse?
4. AWS Architecture
What is a Data Warehouse?
What is a Data Warehouse?
• A data warehouse (DW) is a central repository of integrated data from one or
more disparate sources, so that the data can be compared and analyzed for
greater business intelligence.
• A DW is also a blend of technologies and components that allows
the strategic use of data.
What is a Data Warehouse used for?
• Reporting
• Data Analysis
• Core component of business intelligence
Benefits of using AWS for Data Warehouse
Global Infrastructure
19 regions worldwide
Still expanding: Bahrain, Cape Town, Hong Kong SAR, and Milan
What are the main components of a data warehouse on AWS?
Benefits of using AWS for Data Warehouse
• Better decision making
• Consolidates data from many sources
• Data quality, consistency, and accuracy
• Historical intelligence
• Separates analytics processing from transactional databases, improving
performance of both systems
Why Multi-region Data Warehouse?
Global business
Data Location
● Stored in different regions
● On different networks
Data Processing
Data is processed separately, and in different ways, in each region
Data Aggregation
Data is aggregated across regions for data mining and business analytics
AWS Architecture
From the beginning
Single Region
Amazon Redshift
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the
cloud. You can start with just a few hundred gigabytes of data and scale to a
petabyte or more. This enables you to use your data to acquire new insights for your
business and customers.
Amazon Redshift
• Scale from 160 GB to 2 PB online
• Automatic streaming backup/restore to S3
• Automatic failover and recovery
• ANSI SQL interface
• Load data from S3, DynamoDB, and EMR
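As an illustration of the "load data from S3" bullet, here is a minimal sketch that builds a Redshift COPY statement in Python. The table name, bucket path, and IAM role ARN are hypothetical placeholders, not values from the talk, and the helper only produces the SQL text; how it is executed (a SQL client, a Glue job, or something else) is left open.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str,
                         data_format: str = "PARQUET") -> str:
    """Return a Redshift COPY statement that loads `table` from an S3 prefix.

    All arguments are caller-supplied; nothing here is taken from the deck.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {data_format};"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_copy_statement(
        table="sales",
        s3_uri="s3://example-dw-bucket/curated/sales/",
        iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-load",
    ))
```

COPY reads every object under the given prefix, so the prefix layout in S3 determines which data a single load picks up.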
AWS Glue
Discovery: Automatically discover and categorize your data, making it
immediately searchable and queryable across data sources.
Develop: Generate code to clean, enrich, and reliably move data between
various data sources; you can also use your favorite tools to build
ETL jobs.
Deploy: Run your jobs on a serverless, fully managed, scale-out
environment. No compute resources to provision or manage.
Amazon S3
Secure, durable, highly scalable object storage
Accessible via a simple web services interface
Store and retrieve any amount of data
Integrates with many AWS services
Amazon S3
Backup & Archiving
Content Storage & Distribution
Big Data Analytics
Static Website Hosting
Cloud-native Application Data
Disaster Recovery
Then expanding
Multi-Region Architecture
- Route 53: geolocation routing
- A separate Redshift DW in each region
- Glue jobs differ between regions
- Data is stored separately in each region
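The geolocation routing item above could look roughly like the following sketch, which builds a Route 53 change batch with one record per continent plus a default record. The hostnames and the continent-to-region mapping are assumptions made for illustration; the deck does not say which endpoints sit behind Route 53. The resulting dictionary has the shape accepted by boto3's `change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)`.

```python
def geolocation_record(name: str, set_id: str, geo: dict,
                       target: str, ttl: int = 60) -> dict:
    """Build one UPSERT change for a geolocation-routed CNAME record."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "CNAME",
            "SetIdentifier": set_id,   # must be unique per record name and type
            "GeoLocation": geo,        # e.g. {"ContinentCode": "AS"}
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        },
    }


def build_change_batch(name: str, targets_by_continent: dict,
                       default_target: str) -> dict:
    """Build a change batch: one record per continent and a catch-all default."""
    changes = [
        geolocation_record(name, f"geo-{code}", {"ContinentCode": code}, target)
        for code, target in sorted(targets_by_continent.items())
    ]
    # CountryCode "*" is the default location for queries that match no other record.
    changes.append(
        geolocation_record(name, "geo-default", {"CountryCode": "*"}, default_target)
    )
    return {"Comment": "Geolocation routing sketch", "Changes": changes}


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    batch = build_change_batch(
        "dw.example.com.",
        {"AS": "dw.ap-southeast-1.example.com", "EU": "dw.eu-west-1.example.com"},
        default_target="dw.us-east-1.example.com",
    )
    for change in batch["Changes"]:
        record = change["ResourceRecordSet"]
        print(record["SetIdentifier"], "->", record["ResourceRecords"][0]["Value"])
```

Including a default record avoids returning no answer for users whose location matches none of the listed continents.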
Multi-Region Architecture
What about the data from other regions?
How can we manage and access data from other regions in a secure way?
We can do it like this
Rely on BI tool features
- Support for multiple sources
- Support for aggregation across multiple sources
Challenges
- Latency from remote sources
- Customizing data from other regions
Or like this
Glue
- Send data to multiple DW endpoints per job
Challenges
- Managing Glue jobs across regions
- Complex private networking with VPC peering when running Glue in a VPC
Better solution
Architecture Diagram
S3 with cross-region replication
Pros
- Replication is handled by AWS
- Choose a subset of the data to replicate by S3 prefix
- Simplify multi-region replication configuration by using CloudFormation
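To make the prefix-based replication concrete, the sketch below builds an S3 replication configuration with one rule per prefix. The role ARN, bucket ARN, and prefixes are hypothetical, and the slides suggest CloudFormation for this step; the dictionary here instead has the shape accepted by boto3's `put_bucket_replication(Bucket=..., ReplicationConfiguration=...)`, chosen only to keep the example short. Both the source and destination buckets need versioning enabled for replication to work.

```python
def build_replication_config(role_arn: str, dest_bucket_arn: str,
                             prefixes: list) -> dict:
    """Build a replication configuration that copies only the given prefixes."""
    rules = []
    for priority, prefix in enumerate(prefixes, start=1):
        rules.append({
            "ID": "replicate-" + prefix.strip("/").replace("/", "-"),
            "Priority": priority,                 # must be unique across rules
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},         # replicate only this prefix
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": dest_bucket_arn},
        })
    return {"Role": role_arn, "Rules": rules}


if __name__ == "__main__":
    # Hypothetical ARNs and prefixes, for illustration only.
    config = build_replication_config(
        role_arn="arn:aws:iam::123456789012:role/example-s3-replication",
        dest_bucket_arn="arn:aws:s3:::example-dw-bucket-eu",
        prefixes=["curated/sales/", "curated/customers/"],
    )
    for rule in config["Rules"]:
        print(rule["ID"], rule["Filter"]["Prefix"])
```

Anything outside the listed prefixes stays in its home region, which is what keeps the redundant copy small.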
Pros
- Maximize the ability to process and mine data from other regions
- Focus only on processing data, not on sending data to many endpoints
Pros
- No complex networking and no VPC peering
- ETL processes are isolated between regions and between teams
Pros
- Data is fully prepared in each region's Redshift cluster
- Reduced latency from BI tools to Redshift
Cons
Redundant data. Mitigations:
- Replicate only what we need
- Apply S3 object lifecycle rules
- Leverage Lambda to clean up Redshift data
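The two cleanup ideas above can be sketched as follows, under stated assumptions: the prefix, retention periods, table name, and timestamp column are all hypothetical. The first helper builds a lifecycle configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`; the second builds a DELETE statement that a scheduled Lambda function might submit to Redshift, for example through the Redshift Data API.

```python
def build_lifecycle_config(prefix: str, expire_after_days: int) -> dict:
    """Build a lifecycle rule that expires replicated objects under `prefix`."""
    if expire_after_days < 1:
        raise ValueError("expire_after_days must be at least 1")
    return {
        "Rules": [{
            "ID": "expire-" + prefix.strip("/").replace("/", "-"),
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Expiration": {"Days": expire_after_days},
            # Replicated buckets are versioned, so old versions need their own expiry.
            "NoncurrentVersionExpiration": {"NoncurrentDays": expire_after_days},
        }]
    }


def build_cleanup_sql(table: str, timestamp_column: str, keep_days: int) -> str:
    """Build a DELETE statement that removes rows older than `keep_days` days."""
    if keep_days < 1:
        raise ValueError("keep_days must be at least 1")
    return (
        f"DELETE FROM {table} "
        f"WHERE {timestamp_column} < DATEADD(day, -{keep_days}, GETDATE());"
    )


if __name__ == "__main__":
    # Hypothetical prefix, table, and retention values, for illustration only.
    print(build_lifecycle_config("replicated/sales/", 30))
    print(build_cleanup_sql("replicated_sales", "loaded_at", 90))
```

Keeping the S3 retention shorter than the Redshift retention is one possible choice: once data has been loaded into the regional cluster, the replicated files are only needed for reloads.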
NGO QUOC THAI
Cloud Solution Architect
OSAM
TRAN LE VUONG
Cloud Solution Architect
OSAM
Thank you
