© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Building Serverless Analytics on AWS
Ivan Cheng
Solutions Architect
AWS
Steven Hsieh
Engineer
TrendMicro
COLLECT | STORE | PROCESS/ANALYZE | CONSUME
Data Answers
Time to answer (Latency)
Throughput
Cost
Data Processing
START HERE WITH A BUSINESS CASE
To answer new questions quickly, we look to
a modern data architecture design
Massive upfront costs
Overprovisioned capacity
Long implementation times
Pay as you go, for what you use
Decoupled pipelines and engines
Experimentation platform
Ingest/Collect | Store | Process/Analyze | Consume/Visualize
Data Is Changing  Analytics Are Adopting
Capture and store
new data at PB-EB
scale
Do new types of analytics
in a cost-effective way
• Machine learning
• Big data processing
• Real-time analytics
• Full-text search
New types of
analytics
More data lakes and analytics than anywhere else
More than 10,000 data lakes on AWS
AWS Analytics Portfolio
Broadest and deepest portfolio, purpose-built for builders
Analytics: Redshift | EMR (Spark & Hadoop) | Athena | Elasticsearch Service | Kinesis Data Analytics | Glue (Spark & Python)
Data Lake Infrastructure & Management: S3/Glacier | Glue | Lake Formation
Visualization, Engagement, & Machine Learning: QuickSight | SageMaker | Comprehend | Lex | Polly | Rekognition | Translate | Transcribe | Deep Learning AMIs | + 10 more
Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka
Agility and Innovation Are Key
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Data Lake on AWS
Storage | Archival Storage | Data Catalog
Analytics | Machine learning
Real-time data movement | On-premises data movement
Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Kinesis Video Streams
S3
Redshift | EMR | Athena | Kinesis | Elasticsearch Service | AI Services | QuickSight
Durable and available; Exabyte scale
Secure, compliant, auditable
Rapid ingest and transformation
Schema on read
Decoupling of compute and storage
On-demand resources, tiering, cost choices
Robust Infrastructure
Your choice of Amazon S3 storage classes
Access frequency: Frequent to Infrequent
S3 Standard
• Active, frequently accessed data
• Milliseconds access
• ≥ 3 AZ
• $0.0210/GB
S3 INT
• Data with changing access patterns
• Milliseconds access
• ≥ 3 AZ
• $0.0210 to $0.0125/GB
• Monitoring fee per object
• Min storage duration
S3 S-IA
• Infrequently accessed data
• Milliseconds access
• ≥ 3 AZ
• $0.0125/GB
• Retrieval fee per GB
• Min storage duration
• Min object size
S3 Z-IA
• Re-creatable, less accessed data
• Milliseconds access
• 1 AZ
• $0.0100/GB
• Retrieval fee per GB
• Min storage duration
• Min object size
Amazon Glacier
• Archive data
• Select minutes or hours
• ≥ 3 AZ
• $0.0040/GB
• Retrieval fee per GB
• Min storage duration
• Min object size
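To make the price differences concrete, the per-GB figures on this slide can be turned into a rough storage-only estimate. This is an illustrative sketch, not an official calculator: it assumes the listed prices are USD per GB per month, and it ignores the retrieval fees, monitoring fees, minimum storage durations and minimum object sizes that the slide lists for the colder classes. The names `PRICE_PER_GB` and `monthly_storage_cost` are made up for this example.

```python
# Per-GB prices as listed on the slide (assumed here to be USD per GB per month).
# S3 INT is left out because the slide gives it as a range ($0.0210 to $0.0125).
PRICE_PER_GB = {
    "S3 Standard": 0.0210,
    "S3 S-IA": 0.0125,
    "S3 Z-IA": 0.0100,
    "Amazon Glacier": 0.0040,
}


def monthly_storage_cost(size_gb: float) -> dict:
    """Return the storage-only monthly cost in USD for each storage class."""
    return {name: round(size_gb * price, 2) for name, price in PRICE_PER_GB.items()}


if __name__ == "__main__":
    # Hypothetical 10 TB (10,000 GB) data set.
    for name, cost in monthly_storage_cost(10_000).items():
        print(f"{name}: ${cost:,.2f}")
```

For data that is read often, the retrieval fees can outweigh the lower storage price, so an estimate like this is only a starting point.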
Data Analytics Pipeline
Data sources: Web logs / cookies | ERP | Connected devices
Ingest: Amazon Kinesis | Database Migration Service | AWS Snowball | Amazon MSK
Store: Amazon S3
Catalog: AWS Glue
Process & Analyze: Amazon Athena | Amazon EMR | Amazon Redshift | Amazon Elasticsearch
Consume: BI Tools | Jupyter Notebooks | Amazon API Gateway | Amazon QuickSight
Cloud Services Evolution
Virtual machines | Managed services | Serverless
Serverless analytics
Deliver on-demand analytics on the data lake
Data sources: Devices | Web | Sensors | Social
Kinesis Data Firehose | S3 data lake | Glue (ETL & Data Catalog) | Athena | QuickSight | AI/ML
• Serverless. Zero infrastructure. Zero administration
• Never pay for idle resources
• Availability and fault tolerance built in
• Automatically scales resources with usage
Amazon Athena – Interactive Analysis
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Supports Multiple Data Formats – Define Schema on Demand
Fast. Really Fast.
Interactive performance even for large datasets. Athena automatically executes queries in parallel, so most results come back within seconds.
Start Querying Instantly
Athena is serverless. Just point to your data in Amazon S3, define the schema, and start querying using the built-in query editor.
Open. Powerful. Standard
Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet.
Pay Per Query
With Amazon Athena, you pay only for the queries that you run. You are charged $5 per terabyte scanned by your queries.
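The pay-per-query model can be worked through with a small calculation. The sketch below uses only the $5 per terabyte figure from the slide; treating a terabyte as 1024^4 bytes is an assumption, and any minimum charge or rounding that Athena applies per query is deliberately left out. The function name is hypothetical.

```python
PRICE_PER_TB_USD = 5.0      # from the slide: $5 per terabyte scanned
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabytes


def athena_query_cost(bytes_scanned: int) -> float:
    """Estimate the charge in USD for one query from the bytes it scanned."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # A hypothetical query that scans 200 GB of raw data.
    print(f"${athena_query_cost(200 * 1024 ** 3):.4f}")
```

Because the charge depends on bytes scanned, columnar formats such as ORC and Parquet, which let a query read only the columns it needs, usually lower the cost of the same query.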
Amazon S3 Amazon Athena
Data catalog
Data Engineer Data Consumer
AWS Tools and SDKs
AWS Management Console
Amazon QuickSight
Amazon SageMaker
User
Analyst
Data Scientist
Use PyAthena to query Athena tables directly from Amazon SageMaker notebooks
Data consumption – Automated Reporting
athena.startQueryExecution("SELECT * FROM business_view")
1. Schedule query
2. Track QueryID for status
3. Query results to Amazon S3
4. New file trigger
5. Job complete notification
Email
notification
Query_ID
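Steps 2 and 3 of this flow (start the query, then track its ID until it finishes) can be sketched in Python with boto3, whose Athena client exposes `start_query_execution` and `get_query_execution`. This is a minimal sketch under assumptions: the database name, result location and polling interval are placeholders, the client is passed in by the caller, and the scheduling, S3 trigger and email notification steps from the slide are not shown.

```python
import time


def run_report(athena, sql, database, output_location, poll_seconds=2.0, max_polls=150):
    """Start an Athena query and wait until it reaches a final state.

    `athena` is expected to be a boto3 Athena client, e.g. boto3.client("athena").
    Returns (query_execution_id, final_state).
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes the result file to this S3 location (step 3).
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]  # step 2: track the query ID

    for _ in range(max_polls):
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```

With credentials configured, a call might look like `run_report(boto3.client("athena"), "SELECT * FROM business_view", "reports", "s3://example-bucket/results/")`, where the database and bucket names are invented for illustration.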
Athena Workgroups
Athena Workgroups are used to isolate queries between different teams, workloads or applications, and to set limits on the amount of data each query or the entire workgroup can process
Workload Isolation | Query Metrics | Cost Controls
Workgroups – Cost Controls
• Per-query data scanned threshold; a query that exceeds it is cancelled
• Trigger alarms to notify of increasing usage and cost
• Disable the workgroup when the data scanned by all of its queries exceeds a maximum threshold
Any Athena metric: successful/failed & total queries, query run time, etc.
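The per-query threshold can be set when a workgroup is created. The sketch below builds the arguments for the boto3 Athena `create_work_group` call, using its `BytesScannedCutoffPerQuery` setting; the workgroup name, limit and result location are placeholders, and the helper name `workgroup_request` is hypothetical. The workgroup-wide limits and alarms from the slide are not covered here.

```python
def workgroup_request(name, per_query_limit_bytes, output_location):
    """Build keyword arguments for the boto3 Athena create_work_group call."""
    return {
        "Name": name,
        "Description": "Workgroup with a per-query scan limit",
        "Configuration": {
            "ResultConfiguration": {"OutputLocation": output_location},
            # Make the workgroup settings override client-side settings.
            "EnforceWorkGroupConfiguration": True,
            # Publish query metrics to CloudWatch so alarms can be built on them.
            "PublishCloudWatchMetricsEnabled": True,
            # Queries that scan more than this many bytes are cancelled.
            "BytesScannedCutoffPerQuery": per_query_limit_bytes,
        },
    }


if __name__ == "__main__":
    # Hypothetical team workgroup limited to 10 GiB scanned per query.
    request = workgroup_request("team-a", 10 * 1024 ** 3, "s3://example-bucket/team-a/")
    print(request)
    # With AWS credentials configured, the request could be sent with:
    #   import boto3
    #   boto3.client("athena").create_work_group(**request)
```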
Visualize your data with your favorite tools
Featured Athena Partners
Amazon QuickSight
Amazon QuickSight is a fully managed, serverless, cloud business intelligence system
Why QuickSight
Scalable
From 10 users to 10,000, QuickSight seamlessly grows
with you with no need for additional servers or
infrastructure.
No Servers to Manage
QuickSight is a fully managed cloud service. There is
no infrastructure to maintain or upgrade and no upfront
costs.
Fully integrated
QuickSight integrates with your other AWS services
and data sources giving you everything you need to
build an end-to-end cloud analytics solution.
Pay For What You Use
Instead of buying costly licenses for all of your users,
QuickSight allows you to share dashboards and reports and
only pay when users access them.
Connect to your data, wherever it is
QuickSight allows you to connect to AWS data sources, private VPC subnets, on-premises and hosted databases, and third-party business applications.
On-premises
Securely connect to on-premises databases and flat files like Excel and CSV
• Excel
• CSV
• Teradata
• MySQL
• SQL Server
• PostgreSQL
In the cloud
Connect to hosted databases, big data formats, and secure VPCs
• Redshift
• RDS
• S3
• Athena
• Aurora
• Teradata
• MySQL
• Presto
• Spark
• SQL Server
• PostgreSQL
• MariaDB
• Snowflake
• IoT Analytics
Applications
Connect directly to third-party business applications
• Salesforce
• Square
• Adobe Analytics
• Jira
• ServiceNow
• Twitter
• GitHub
Embedding Dashboards In Your Application
QuickSight allows you to seamlessly integrate interactive dashboards and analytics into your
own applications
• Enhance your applications with rich
analytics and dashboards
• Easy maintenance, no servers to
manage
• Fast! No Custom development or
domain expertise needed
• Leverage new features as we add
them
• Utilizes Pay-per-Session Pricing.
Demo Scenario
Amazon S3 (Processed Data) | Glue Data Catalog | Amazon Athena | Amazon QuickSight
Building AWS Multi Account Cost
Analytics Solution at Scale
Steven Hsieh
Engineer
TrendMicro
About Me
Steven Hsieh
Background
Pillars of
Design Principles for Cost Optimization
• Adopt a consumption model
• Measure overall efficiency
• Stop spending money on data center operations
• Analyze and attribute expenditure
• Use managed services to reduce cost of ownership
Pay as you go /
need
Challenges
Large Scale Accounts
• Almost 400 accounts
• Hard to manage via the AWS console
Multiple Data Sources
• Billing data
• Utilization data of AWS services (e.g., EC2, S3)
Challenges
Permission Management
• Multiple teams
• Authorization for different teams
Insight for Better Design
• Finding insight for
design improvement
• Providing utilization
visibility for design
change
Other solutions we have tried…
AWS Billing Console
• Hard to use at large scale
• Single data source
Amazon Redshift
• Cost Model
• ETL
3rd Party BI Tool
• Expensive license fee
• Additional operation cost
Ideas
+ +
• Data persistence in Amazon S3
• Data querying via Amazon Athena
• Dashboard / Reporting via Amazon QuickSight
Challenges
Global Accelerator
• Using SQS to trigger parallel tasks
• Lambda limitations:
  • Timeout: 15 minutes
  • /tmp: 512 MB
• Spot instance interruptions
• Fargate limitations:
  • Container storage: 10 GB
  • Run-task: 10
• Using assume role to collect data
across accounts
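The cross-account collection step can be sketched with the STS `assume_role` call in boto3. This is an assumption-laden illustration, not the speakers' implementation: the role name, session name and account ID are placeholders, and the STS client is passed in by the caller.

```python
def assume_account_role(sts, account_id, role_name, session_name="cost-collector"):
    """Return temporary credentials for a role in another account.

    `sts` is expected to be a boto3 STS client, e.g. boto3.client("sts").
    The role and session names used here are hypothetical.
    """
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    response = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    # These keys match the keyword arguments accepted by boto3.client(...).
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

The returned dictionary can be unpacked into a new client, for example `boto3.client("cloudwatch", **assume_account_role(sts, "111122223333", "CostReader"))`, so that each parallel task reads utilization data from one member account.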
• Using SNS to track data upload results
• Preprocessing data before uploading to S3
• Only the creator can modify datasets in QuickSight
• Create views in Athena
Global Accelerator
• Web application hosted on Fargate
• Lambda integration with QuickSight for embedded URL
• Using ALB to handle all HTTPS interaction
• Permission & metadata in DynamoDB
• ADFS federation using Cognito
• Performance improvement via AWS Global Accelerator
• Web security enhancement via AWS WAF
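The Lambda integration for the embedded URL might look roughly like the sketch below, which calls the QuickSight `get_dashboard_embed_url` API through boto3. The event keys, the one-hour session lifetime and the response shape are assumptions made for illustration; the talk does not show the actual handler.

```python
def embed_url_for(quicksight, account_id, dashboard_id, session_minutes=60):
    """Ask QuickSight for a one-time embed URL for a dashboard."""
    response = quicksight.get_dashboard_embed_url(
        AwsAccountId=account_id,
        DashboardId=dashboard_id,
        IdentityType="IAM",
        SessionLifetimeInMinutes=session_minutes,
    )
    return response["EmbedUrl"]


def handler(event, context, quicksight=None):
    """Lambda-style entry point; the `event` keys used here are hypothetical."""
    if quicksight is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS
        quicksight = boto3.client("quicksight")
    url = embed_url_for(quicksight, event["account_id"], event["dashboard_id"])
    return {"statusCode": 200, "body": url}
```

Passing the client as an optional argument keeps the handler easy to test with a stub, while the deployed function falls back to a real QuickSight client.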
Quick Development & Evaluation
Low Utilization & Right Sizing
• Trusted Advisor Checks
  • Low utilization EC2 instances: CPU was 10% or less and network I/O was 5 MB or less on 4 or more days during the last 14 days
• Right Sizing
  • Analyze metric data to recommend the proper instance type and size
  • Awareness of NIC driver and Linux virtualization type issues
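The low-utilization rule quoted on this slide can be expressed as a small predicate over daily metrics. The sketch below encodes only the thresholds stated above (CPU at or below 10%, network I/O at or below 5 MB, on at least 4 of the last 14 days); the input format, one `(avg_cpu_percent, network_io_mb)` pair per day, and the function name are assumptions.

```python
def is_low_utilization(daily_metrics, cpu_limit=10.0, net_limit_mb=5.0, min_days=4):
    """Apply the low-utilization rule to per-day metrics.

    daily_metrics: list of (avg_cpu_percent, network_io_mb) tuples, oldest first.
    Only the most recent 14 entries are considered.
    """
    recent = daily_metrics[-14:]
    low_days = sum(
        1 for cpu, net in recent if cpu <= cpu_limit and net <= net_limit_mb
    )
    return low_days >= min_days


if __name__ == "__main__":
    busy_then_idle = [(50.0, 100.0)] * 10 + [(5.0, 1.0)] * 4
    print(is_low_utilization(busy_then_idle))
```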
Saving Polar Bear
• Analyzing the CPU utilization pattern
• Turning off non-production instances can save almost 70% of the cost
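A figure close to 70% is what a simple office-hours schedule would give. The sketch below is a worked example under stated assumptions, not the speakers' calculation: it supposes instances are billed per running hour, ignores storage that keeps accruing while an instance is stopped, and uses a hypothetical schedule of 10 hours a day on 5 days a week.

```python
HOURS_PER_WEEK = 24 * 7


def schedule_saving(hours_on_per_day, days_on_per_week):
    """Fraction of always-on instance hours saved by a weekly on/off schedule."""
    hours_on = hours_on_per_day * days_on_per_week
    return 1 - hours_on / HOURS_PER_WEEK


if __name__ == "__main__":
    # 10 h x 5 days = 50 of 168 hours running, so roughly 70% of hours are saved.
    print(f"{schedule_saving(10, 5):.1%}")
```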
Recap
• Using a cost-effective way to build an end-to-end BI solution
• 2 power users $36 + ALB $18 = $54
• Using a flexible reporting architecture to integrate with multiple data sources
• Quick wins & timely data-driven decisions
• Validating innovative ideas (e.g., the potential saving of the polar bear project)
Summary
• More organizations are building data lakes in the cloud to stay competitive
• AWS provides the broadest and deepest portfolio of database and analytics services, including machine learning.
• Serverless analytics helps you build a modern data pipeline with increased agility and lower cost.
• Learn more at: https://aws.amazon.com/big-data/datalakes-and-analytics/
Thank you!
Ivan Cheng
Solutions Architect
AWS
Steven Hsieh
Engineer
TrendMicro

AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS


Editor's Notes

  1. As a result of the advantages shown on the previous slides, there are more people running their data lakes, and analytics on AWS than anywhere else. This includes customers like FINRA, Netflix, Nasdaq, Amazon.com, Atlassian, Sysco, Airbnb, iRobot, CrowdStrike, Viber, 21st Century Fox, Vanguard, Takeda, Movable Inc, Expedia.com, Zillow, Yelp, Amgen, JustGiving, NTT Docomo, to name a few. In later slides, we will share details on a number of customers and how they use specific services to achieve their goals.
  2. To enable this for customers, AWS provides a broader and deeper portfolio of database and analytics services than any other cloud provider, many of them named above to match the function they perform (as shown on the previous slide). AWS offers at least: 10 data movement services, 13 analytics services, 18 machine learning and AI services, and 17 security and governance services. Maybe more since this slide was created!
  3. (Timing: 2 minutes.) Backup notes: the way to deal with massive amounts of data is to use S3 to store your data. It can store exabytes of data without breaking a sweat, so you can store all your relational and non-relational data and not have to throw anything away because your database does not scale, as was the case on-premises. We call it your data lake. It has all your data. To get data into the data lake, we have data movement services and devices. These bring all the data into the data lake: from on-premises systems using Snowball devices, from databases migrated into S3 using Database Migration Service, and from IoT and real-time ingestion using Kinesis. And we have a set of purpose-built tools that work directly on data in S3. Redshift is our exabyte-scale data warehouse that can work directly on data in S3. QuickSight is our BI tool that can query the data directly in S3. Athena is our ad hoc query service for data in S3 that you can use in place of Redshift if you need to run ad hoc queries on data sitting in S3. EMR supports Spark and Hadoop to directly make sense of data in S3. And these tools are priced so inexpensively that you can make sense of all your data and not have to delete data to save money. No need to do an ROI analysis on data. Redshift costs $1,000/TB per year instead of $10k to $50k like Teradata and Oracle on-premises systems. Athena can query data at a rate of ½ cent per GB, and S3 can store a GB of data for a whole month for 2.3 cents. QuickSight can query data for 30 cents for a 30-minute session, instead of legacy tools that can cost you a lot more money per user per month. You can give all your users access to all your data without breaking the bank and truly be a data-driven organization.
  4. AWS continues to innovate and find more efficient ways for you to analyze. We have an emerging serverless analytics stack: you can put all these systems together with zero infrastructure to manage. This lets you pay per use, with close to zero costs when things are idle. It scales automatically; systems are highly available and fault tolerant by default. IoT data is a great example where an “always on” system provides continuous sensor data, but the analytics is on-demand and you pay for those services only when you use them. We also deliver our AI/ML services this way. While we offer GPUs and frameworks for experts, we also make higher-level services for image and video analysis, transcription, translation, etc., available via APIs with nothing for you to manage. Simply put, we do a bunch of work behind the scenes, so you don’t have to.
  5. As cloud service usage increases, cloud spending increases. We have to find where we can improve and optimize. Provide visibility to prove the insight and lead to
  6. Consumption model: turn resources off outside working hours -> pay according to demand. Measure: business output. Stop: ? Analyze: Managed services: operation cost
  7. More Detail
  8. More Detail
  9. 3rd Analytic Tools
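  10. Make the page look good
  11. Describe in more detail
  12. Describe in more detail
  13. Describe in more detail. Add a slide on the designer role. Watch xiangsheng (crosstalk comedy) for cadence and intonation. Three jokes.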
  14. Stop spending money on data center operations
  15. Stop spending money on data center operations