Build multi region data warehouse on AWS - AWSVNUG
- 1. Build multi-region Data Warehouse
on AWS
Mr. Vuong Tran
Cloud Solution Architect
OSAM
#1
Mr. Thai Ngo
Cloud Solution Architect
OSAM
AWS User Group Vietnam
- 3. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.SUMMIT
Agenda
1. What is Data Warehouse?
2. Benefit of using AWS for Data Warehousing
3. Why Multi-region Data Warehouse?
4. AWS Architecture
- 5.
What is a Data Warehouse?
• DWs are central repositories of integrated data from one or more
disparate sources so it can be compared and analyzed for greater
business intelligence.
• A DW is also described as a blend of technologies and components that allows
the strategic use of data.
- 6.
What is a Data Warehouse used for?
• Reporting
• Data Analysis
• Core component of business intelligence
- 8.
Global Infrastructure
19 regions worldwide
More on the way:
Bahrain, Cape Town, Hong
Kong SAR, and Milan
- 9.
What are the main components of a data
warehouse on AWS?
- 10.
Benefits of using
AWS for Data
Warehousing
• Better decision making
• Consolidates data from many sources
• Data quality, consistency, and
accuracy
• Historical intelligence
• Separates analytics processing from
transactional databases, improving
performance of both systems
- 12.
Global business
- 13.
Data Location
● Stored in different
regions
● On different networks
- 14.
Data Processing
Processed separately,
in different ways, in different
regions
- 15.
Data Aggregation
Aggregated across regions
for data mining and business
analytics
- 17.
From the beginning
Single Region
- 18.
Amazon Redshift
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the
cloud. You can start with just a few hundred gigabytes of data and scale to a
petabyte or more. This enables you to use your data to acquire new insights for your
business and customers.
- 19.
Amazon Redshift
• Scale from 160GB to 2PB online
• Automatic streaming
backup/restore to S3
• Automatic failover and recovery
• ANSI SQL interface
• Load data from S3, DynamoDB
and EMR
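As an illustration of the "load data from S3" bullet, here is a minimal Python sketch that builds a Redshift COPY statement. The table name, bucket path, and IAM role ARN are hypothetical placeholders, not values from the talk, and the CSV format is an assumption.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads CSV files from an S3 prefix.

    Illustration only: the values are interpolated directly into the SQL text,
    so they must come from trusted configuration, never from user input.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


# Hypothetical names, used only for this example.
copy_sql = build_copy_statement(
    table="sales",
    s3_uri="s3://example-dw-bucket/curated/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(copy_sql)
```

The statement would then be run against the cluster with any SQL client, since Redshift exposes a SQL interface.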
- 20.
AWS Glue
Discover: Automatically discover and categorize your data, making it
immediately searchable and queryable across data sources.
Develop: Generate code to clean, enrich, and reliably move data between
various data sources; you can also use your favorite tools to build
ETL jobs.
Deploy: Run your jobs on a serverless, fully managed, scale-out
environment. No compute resources to provision or manage.
- 21.
Amazon S3
Secure, durable, highly-scalable object storage
Accessible via a simple web services interface
Store & retrieve any amount of data
Integrate with many AWS services
- 22.
Amazon S3
Backup & Archiving
Content Storage &
Distribution
Big Data Analytics
Static Website Hosting
Cloud-native Application Data
Disaster Recovery
- 23.
Then expanding
- 24.
Multi-Region Architecture
- Route 53: geolocation routing
- A separate Redshift DW in each region
- Glue jobs differ between regions
- Data is stored separately in each region
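The geolocation routing bullet can be sketched as follows. This Python example only builds the change batch that Route 53 expects for geolocation records; the record name, the per-continent endpoints, and the idea that each endpoint is a regional BI front end are assumptions made for illustration.

```python
from typing import Dict


def geolocation_change_batch(record_name: str, endpoints: Dict[str, str]) -> dict:
    """Build a Route 53 change batch with one geolocation record per key.

    Keys are continent codes (for example "AS" or "EU"); the key "*" marks the
    default record that answers queries from every other location.
    """
    changes = []
    for code, target in sorted(endpoints.items()):
        # Route 53 expresses the default location as CountryCode "*".
        geo = {"CountryCode": "*"} if code == "*" else {"ContinentCode": code}
        changes.append(
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "SetIdentifier": "geo-default" if code == "*" else f"geo-{code}",
                    "GeoLocation": geo,
                    "TTL": 60,
                    "ResourceRecords": [{"Value": target}],
                },
            }
        )
    return {"Changes": changes}


# Hypothetical endpoints, one per region that hosts a data warehouse.
batch = geolocation_change_batch(
    "bi.example.com",
    {"AS": "bi-sg.example.com", "EU": "bi-eu.example.com", "*": "bi-us.example.com"},
)
# With boto3 this could be applied as:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="<hosted zone id>", ChangeBatch=batch)
```

Including a default record is a design choice: without it, queries from locations that match no rule get no answer.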
- 25.
Multi-Region Architecture
What about the data from other regions?
How can we manage and access data from other regions in a secure way?
- 26.
We could do it like this
Rely on BI tool features
- Support for multiple sources
- Support for aggregation across
multiple sources
Challenges
- Latency from remote sources
- Customizing data from other
regions
- 27.
Or like this
Glue
- Send data to multiple DW
endpoints per job
Challenges
- Managing Glue jobs across
regions
- Complex private networking
with VPC peering when running
Glue in a VPC
- 29.
Architecture Diagram
- 30.
S3 with cross-region replication
Pros
- Replication is handled by AWS
- Choose which data to replicate by S3 prefix
- Simplify multi-region replication configuration by using
CloudFormation
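Replicating only selected prefixes could look roughly like this. The Python sketch builds an S3 replication configuration with one rule per prefix; the role ARN, bucket ARN, and prefixes are hypothetical, and the talk itself suggests CloudFormation for this step, so boto3 is swapped in here purely for illustration.

```python
from typing import List


def replication_config(role_arn: str, dest_bucket_arn: str, prefixes: List[str]) -> dict:
    """Build an S3 replication configuration with one rule per prefix."""
    rules = []
    for priority, prefix in enumerate(prefixes, start=1):
        rules.append(
            {
                "ID": "replicate-" + prefix.strip("/").replace("/", "-"),
                "Priority": priority,
                "Status": "Enabled",
                # Only objects under this prefix are copied to the other region.
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": dest_bucket_arn},
            }
        )
    return {"Role": role_arn, "Rules": rules}


# Hypothetical role, bucket, and prefixes.
config = replication_config(
    role_arn="arn:aws:iam::123456789012:role/ExampleReplicationRole",
    dest_bucket_arn="arn:aws:s3:::example-dw-bucket-eu",
    prefixes=["curated/sales/", "curated/customers/"],
)
# With boto3 this could be applied as:
#   boto3.client("s3").put_bucket_replication(
#       Bucket="example-dw-bucket-sg", ReplicationConfiguration=config)
# Versioning must be enabled on both the source and destination buckets.
```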
- 31.
Pros
- Maximize the ability to process and mine data from other regions
- Focus only on processing data, not on sending it to many endpoints
- 32.
Pros
- No complex networking, no VPC peering
- Isolate ETL processes between regions and between teams
- 33.
Pros
- Data is fully prepared in each region's Redshift cluster
- Reduced latency from BI tools to Redshift
- 34.
Cons
Redundant data
- Replicate only what we need
- Apply S3 object lifecycle rules
- Leverage Lambda to clean up Redshift data
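The two clean-up measures above might be sketched like this in Python: an S3 lifecycle rule that expires replicated objects, and a DELETE statement that a scheduled Lambda function could run against Redshift. The prefix, table, column names (`source_region`, `loaded_at`), and retention periods are all hypothetical.

```python
def lifecycle_config(prefix: str, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under a prefix."""
    return {
        "Rules": [
            {
                "ID": "expire-" + prefix.strip("/").replace("/", "-"),
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def cleanup_statement(table: str, source_region: str, keep_days: int) -> str:
    """Return a Redshift DELETE for old rows that came from another region.

    Illustration only: values are interpolated into the SQL text, so they must
    come from trusted configuration.
    """
    return (
        f"DELETE FROM {table} "
        f"WHERE source_region = '{source_region}' "
        f"AND loaded_at < DATEADD(day, -{keep_days}, GETDATE());"
    )


# Hypothetical values.
rules = lifecycle_config("replicated/eu-west-1/", expire_after_days=30)
delete_sql = cleanup_statement("sales", "eu-west-1", keep_days=90)
# With boto3 the lifecycle rules could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-dw-bucket-sg", LifecycleConfiguration=rules)
```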
- 35. NGO QUOC THAI
Cloud Solution Architect
OSAM
TRAN LE VUONG
Cloud Solution Architect
OSAM
Thank you