Building a Modern Data Architecture on AWS - Webinar
- 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Russell Nash – AWS Solutions Architect, AWS
In partnership with:
- 3. Modern Data Architecture
Ingest, Serving
Speed (Real-time), Scale (Batch)
Data analysts, Data scientists, Business users, Engagement platforms, Automation / events
Sources
AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
- 5. Database Analytics
Diagram (same Ingest / Serving frame as slide 3): Amazon Redshift, Source Database
- 8. ID Name
1 John Smith
2 Jane Jones
3 Peter Black
4 Pat Partridge
5 Sarah Cyan
6 Brian Snail
1 John Smith
4 Pat Partridge
2 Jane Jones
5 Sarah Cyan
3 Peter Black
6 Brian Snail
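The second listing above shows the six rows reordered as 1, 4, 2, 5, 3, 6, which is what a round-robin spread over three storage slices would produce. A minimal Python sketch of that idea follows. The slide does not name the distribution method or the slice count, so `distribute_round_robin` and `num_slices=3` are illustrative assumptions, not a description of Amazon Redshift internals.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]

# The six rows from the slide's ID / Name table.
ROWS: List[Row] = [
    (1, "John Smith"),
    (2, "Jane Jones"),
    (3, "Peter Black"),
    (4, "Pat Partridge"),
    (5, "Sarah Cyan"),
    (6, "Brian Snail"),
]


def distribute_round_robin(rows: List[Row], num_slices: int) -> Dict[int, List[Row]]:
    """Assign rows to slices in turn: row 0 -> slice 0, row 1 -> slice 1, ..."""
    slices: Dict[int, List[Row]] = {i: [] for i in range(num_slices)}
    for position, row in enumerate(rows):
        slices[position % num_slices].append(row)
    return slices


# Hypothetical slice count of three, chosen to match the ordering on the slide.
slices = distribute_round_robin(ROWS, num_slices=3)
for slice_id, slice_rows in slices.items():
    print(slice_id, slice_rows)
```

Reading the slices in order gives IDs 1, 4, then 2, 5, then 3, 6, the same sequence as on the slide.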
- 9. 1 John Smith
4 Pat Partridge
2 Jane Jones
5 Sarah Cyan
3 Peter Black
6 Brian Snail
SQL
SQL SQL SQL
Results Results Results
Results
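The slide suggests one SQL statement fanning out to three groups of rows, with three partial results merged into one. The sketch below imitates that scatter-gather pattern in plain Python under stated assumptions: the three groups mirror the row pairs on the slide, while `run_on_slice`, `scatter_gather`, and the name filter are hypothetical stand-ins for a real query engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Row = Tuple[int, str]

# Row groups as laid out on the slide (assumed to be three slices).
SLICES: List[List[Row]] = [
    [(1, "John Smith"), (4, "Pat Partridge")],
    [(2, "Jane Jones"), (5, "Sarah Cyan")],
    [(3, "Peter Black"), (6, "Brian Snail")],
]


def run_on_slice(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Evaluate the same filter against one slice's local rows."""
    return [row for row in rows if predicate(row)]


def scatter_gather(slices: List[List[Row]], predicate: Callable[[Row], bool]) -> List[Row]:
    """Send the filter to every slice in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(lambda rows: run_on_slice(rows, predicate), slices))
    merged = [row for partial in partials for row in partial]
    return sorted(merged)  # final ordering step, done once after the merge


# Roughly: SELECT id, name FROM people WHERE name LIKE 'P%' ORDER BY id
result = scatter_gather(SLICES, lambda row: row[1].startswith("P"))
print(result)
```

Each slice only scans its own two rows, so the work per slice shrinks as slices are added; the merge step is the only serial part of this sketch.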
- 11. Database Analytics
Diagram (same Ingest / Serving frame as slide 3): ETL, Amazon Redshift, Source Database
- 13. Database Analytics
Diagram (same Ingest / Serving frame as slide 3): ELT, Amazon Redshift (shown twice), Source Database
- 18. Batch Processing
Diagram (same Ingest / Serving frame as slide 3): Flat Files, Amazon S3, AWS Snowball, AWS CLI & SDK
- 21. In pioneer days, they used oxen for heavy pulling,
and when one ox couldn’t budge a log,
they didn’t try to grow a bigger ox.
Grace Hopper
- 28. Compute Flexibility
Compute (Machine Learning): C4 Family, C3 Family
Memory (Interactive Analysis): X1 Family, R3 Family
Storage (Large HDFS): D2 Family, I2 Family
General (Batch Process): M4 Family, M3 Family
- 29. Cost & Time
Two charts, each plotting # CPUs against Time:
Wall clock time: 1 hour
Wall clock time: 10 hours
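The arithmetic behind the two charts can be sketched as follows. The slide gives only the two wall clock times, so the CPU counts (100 and 10) and the price per CPU-hour are hypothetical, and the sketch assumes a perfectly parallel job billed by the CPU-hour.

```python
def job_cost_cents(num_cpus: int, wall_clock_hours: int, cents_per_cpu_hour: int) -> int:
    """Cost of a job billed per CPU-hour, in whole cents."""
    return num_cpus * wall_clock_hours * cents_per_cpu_hour


# Hypothetical price; real instance pricing varies by type and region.
CENTS_PER_CPU_HOUR = 10

# Same 100 CPU-hours of work, shaped two different ways (assumed CPU counts).
wide_and_short = job_cost_cents(num_cpus=100, wall_clock_hours=1, cents_per_cpu_hour=CENTS_PER_CPU_HOUR)
narrow_and_long = job_cost_cents(num_cpus=10, wall_clock_hours=10, cents_per_cpu_hour=CENTS_PER_CPU_HOUR)

print(wide_and_short, narrow_and_long)  # both are 1000 cents
```

Under these assumptions the bill is identical, but the answer arrives in 1 hour instead of 10. Jobs that do not parallelize cleanly will not scale this neatly.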
- 32. Batch Processing
Diagram (same Ingest / Serving frame as slide 3): Flat Files, Amazon S3, Amazon EMR, Amazon S3, AWS Glue, AWS Snowball, AWS CLI & SDK
- 35. Batch Processing
Diagram (same Ingest / Serving frame as slide 3): Flat Files, Amazon S3, Amazon EMR, Amazon S3, AWS Glue, Amazon Redshift, Amazon EMR, AWS Snowball, AWS CLI & SDK
- 37. Batch Processing
Diagram (same Ingest / Serving frame as slide 3): Flat Files, Amazon S3, Amazon EMR, Amazon S3, AWS Glue, Amazon Redshift, Amazon EMR, Amazon Athena, AWS Snowball, AWS CLI & SDK
- 40. Comparison of SQL Processing engines
Amazon
Redshift
Amazon
Athena
Data Structure
Languages
Semi Semi
SQL, HiveQL SQL
Full
SQL
Data Store S3/HDFS S3 Local
SQL
Semi
SQL
S3/HDFS
Performance
- 41. Comparison of SQL Processing engines
Use Case
Transformation
SQL Queries for S3/HDFS
Fully Featured SQL Database (Amazon Redshift)
Serverless SQL Queries for S3 (Amazon Athena)
- 44. Real-time Pipeline
Diagram (same Ingest / Serving frame as slide 3): Amazon Kinesis, Machines, Devices, Mobile, Clickstream
- 46. Real-time Pipeline
Diagram (same Ingest / Serving frame as slide 3): Amazon Kinesis, AWS Lambda, Application, Amazon EMR, Streaming, S3 (Log), Amazon Elasticsearch (Dashboard)
- 49. Real-time Pipeline
Diagram (same Ingest / Serving frame as slide 3): Amazon Kinesis, AWS Lambda, Application, Amazon EMR, Streaming, S3 (Logs), Amazon Elasticsearch (Dashboards), Amazon EMR (Predictions), ML, Amazon SNS (Alerts), Amazon Redshift (Analytics)
- 53. Data Lake
Diagram (same Ingest / Serving frame as slide 3): Amazon Kinesis, AWS Lambda, Application, Amazon EMR, Streaming, Amazon EMR, Amazon Redshift, ETL, Amazon Athena, EC2, AWS CLI & SDK, Amazon S3, Amazon EMR, Amazon S3
AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
- 54. New X1 Instance - Tons of Memory
• Large-scale, in-memory applications
• Intel® Xeon® E7 8880 v3 Haswell processors
• Up to 2TB of memory
• Up to 128 vCPUs per instance
- 55. Intel® Processor Technologies
• Intel® AVX: dramatically increases performance for highly parallel HPC workloads such as life science engineering, data mining, financial analysis, and media processing
• Intel® AES-NI: enhances security with new encryption instructions that reduce the performance penalty associated with encrypting and decrypting data
• Intel® Turbo Boost Technology: increases computing power with performance that adapts to spikes in workloads
• Intel® Transactional Synchronization Extensions (TSX): enables independent transactions to execute concurrently, accelerating throughput
• P-state and C-state control: provides granular performance tuning for cores and sleep states to improve overall application performance