©2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
BI & Analytics - Datalakes on AWS
Johan Broman
Manager, Solutions Architecture
Today's conversation
Business drivers for a Data Lake
Designing and building
Production use cases
Data Drives Better Decision Making
Business Outcomes on a Modern Data Architecture
Outcome 1: Modernize and consolidate
• Insights to enhance business applications and create new digital services
Outcome 2: Innovate for new revenues
• Personalization, demand forecasting, risk analysis
Outcome 3: Real-time engagement
• Interactive customer experience, event-driven automation, fraud detection
Outcome 4: Automate for expansive reach
• Automation of business processes and physical infrastructure
Legacy Data Architectures Exist as Isolated Data Silos
• Hadoop Cluster
• SQL Database
• Data Warehouse Appliance
Enter Data Lake Architectures
A Data Lake is a new and increasingly popular architecture for storing and analyzing massive volumes and heterogeneous types of data.
Benefits of a Data Lake
• Store and analyse all of your data, from all of your sources, in one centralised location.
• Quickly ingest data without needing to force it into a pre-defined schema.
• Separating your storage and compute allows you to scale each component as required.
• A Data Lake enables ad-hoc analysis by applying schemas on read, not write.
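As a rough illustration of schema on read (a sketch, not an AWS API): raw records are stored exactly as they arrive, and a schema is applied only when a query reads them. The field names, the sample records, and the `apply_schema` helper are hypothetical.

```python
import json

# Raw events are stored as-is; no schema is enforced at write time.
RAW_LINES = [
    '{"user": "a1", "amount": "19.90", "country": "SE"}',
    '{"user": "b2", "amount": "5", "extra_field": true}',
]

# Hypothetical schema chosen by the analyst at query time: column -> type.
SCHEMA = {"user": str, "amount": float, "country": str}


def apply_schema(line, schema):
    """Parse one raw JSON line and coerce it to the schema; missing columns become None."""
    record = json.loads(line)
    row = {}
    for column, cast in schema.items():
        value = record.get(column)
        row[column] = cast(value) if value is not None else None
    return row


rows = [apply_schema(line, SCHEMA) for line in RAW_LINES]
total = sum(row["amount"] for row in rows)
print(rows)
print(total)
```

A different analysis could read the same raw lines with a different schema (for example, one that includes `extra_field`) without rewriting the stored data.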
Today's conversation
Business drivers for a Data Lake
Designing and building
Production use cases
Expanding access requirements
[Diagram labels: Data scientists, Automation / events, Business users, Data analysts, Engagement platforms]
1. More personas need access to data, through appropriate tools
2. More systems need to link to data for decision and process automation
3. Users need to be able to find information, and access it securely
Exponential growth of business data
[Diagram labels: Transactions, ERP, Connected devices, Social media, Web logs / cookies]
1. Data must be captured from diverse sources at speed and scale
2. Data needs to be pulled together, breaking down traditional silos
3. Benefits need to far outweigh the costs of collection and analysis
Important Components of a Data Lake
• Catalogue & Search
• Protect & Secure
• Access & User Interface
• Ingest & Store
AWS Approach to Data Lakes
Data Lakes Extend the Traditional Approach
• Relational and non-relational data
• TBs-EBs scale
• Schema defined during analysis
• Diverse analytical engines to gain insights
• Designed for low-cost storage and analytics
[Diagram labels: sources OLTP, ERP, CRM, LOB; Data warehouse; Business intelligence; sources Devices, Web, Sensors, Social; Data lake; Catalog; Machine learning, DW queries, Big data processing, Interactive, Real-time]
S3 is key in the Data Lake
Building a Data Lake on AWS
[Diagram labels: Kinesis Firehose, Athena Query Service]
Why Amazon S3 for the Data Lake?
Durable
• Designed for 11 9s of durability
Available
• Designed for 99.99% availability
High performance
• Multipart upload
• Range GET
Scalable
• Store as much as you need
• Scale storage and compute independently
• No minimum usage commitments
Integrated
• Amazon Redshift / Spectrum
• Amazon EMR
• Amazon Athena
• Amazon DynamoDB
Easy to use
• Simple REST API
• AWS SDKs
• Read-after-create consistency
• Event notification
• Lifecycle policies
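Multipart upload and Range GET both rely on splitting an object into byte ranges that can be transferred in parallel. The sketch below shows only the range arithmetic and makes no S3 calls; the `byte_ranges` and `range_headers` helpers and the 8 MiB part size are illustrative assumptions. The header value uses the standard HTTP form `bytes=start-end`, where both offsets are inclusive.

```python
def byte_ranges(object_size, part_size):
    """Yield inclusive (start, end) byte offsets covering an object of object_size bytes."""
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size must be > 0")
    start = 0
    while start < object_size:
        end = min(start + part_size, object_size) - 1  # HTTP ranges are inclusive
        yield start, end
        start = end + 1


def range_headers(object_size, part_size):
    """Build the HTTP Range header value for each part."""
    return [f"bytes={start}-{end}" for start, end in byte_ranges(object_size, part_size)]


MIB = 1024 * 1024
headers = range_headers(object_size=20 * MIB, part_size=8 * MIB)
print(headers)
```

Each header value could then be sent with a separate GET request, and the parts reassembled in offset order.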
Implement the right cloud security controls
Security
• Identity and Access Management (IAM) policies
• Bucket policies
• Access Control Lists (ACLs)
• Private VPC endpoints to Amazon S3
• Pre-signed S3 URLs
Encryption
• SSL endpoints
• Server Side Encryption (SSE-S3)
• S3 Server Side Encryption with provided keys (SSE-C, SSE-KMS)
• Client-side Encryption
Audit & Compliance
• Bucket access logs
• Lifecycle Management Policies
• Versioning & MFA Delete
• Certifications – HIPAA, PCI, SOC 1/2/3 etc.
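As one example of combining bucket policies with SSL endpoints, the sketch below builds a bucket policy document that denies any request not sent over TLS, using the `aws:SecureTransport` condition key. The bucket name and the helper function are hypothetical, and the code only constructs and prints the JSON; it does not attach the policy to a bucket.

```python
import json

BUCKET = "example-data-lake-bucket"  # hypothetical bucket name


def deny_insecure_transport_policy(bucket):
    """Return a bucket policy document that denies any request not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = deny_insecure_transport_policy(BUCKET)
print(json.dumps(policy, indent=2))
```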
Data Ingestion into S3
• AWS Direct Connect
• AWS Snowball
• ISV Connectors
• Amazon Kinesis Firehose
• AWS Storage Gateway
• S3 Transfer Acceleration
Storing is not enough. Data needs to be discoverable.
“Dark data are the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”
- Gartner
[Diagram labels: CRM, ERP, Data warehouse, Mainframe data, Web, Social, Log files, Machine data, Semi-structured, Unstructured]
AWS Glue: Data Catalog
Make data discoverable
• Automatically discovers data and stores schema
• Catalog makes data searchable and available for ETL
• Catalog contains table and job definitions
• Computes statistics to make queries efficient
[Diagram labels: Compliance; AWS Glue Data Catalog; Discover data and extract schema]
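To make "discover data and extract schema" concrete, here is a toy schema-inference sketch over a CSV sample. It is not how AWS Glue crawlers are implemented; the sample data, the type names, and the `infer_type` and `infer_schema` helpers are assumptions chosen for illustration.

```python
import csv
import io

SAMPLE_CSV = """order_id,amount,country
1001,19.90,SE
1002,5,NO
1003,7.25,DK
"""


def infer_type(values):
    """Return the narrowest of 'bigint', 'double', 'string' that fits every value."""
    for type_name, cast in (("bigint", int), ("double", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Read a CSV sample and map each column name to an inferred type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = rows[0].keys() if rows else []
    return {column: infer_type([row[column] for row in rows]) for column in columns}


schema = infer_schema(SAMPLE_CSV)
print(schema)
```

A catalog would store a mapping like this alongside the data location so that query and ETL tools can find and read the data set.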
Data preparation accounts for ~80% of the work.
[Chart segments: Building training sets; Cleaning and organizing data; Collecting data sets; Mining data for patterns; Refining algorithms; Other]
AWS Glue: ETL Service
Make ETL scripting and deployment easy
Automatically generates ETL code
Code is customizable with Python and Spark
Endpoints provided to edit, debug, & test code
Jobs are scheduled or event-based
Serverless
Amazon Athena: Interactive Analysis
Query Instantly
• Zero setup cost; just point to Amazon S3 and start querying.
Pay per query
• Pay only for queries run; save 30–90% on per-query costs through compression.
Open
• ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types.
Easy
• Serverless: zero infrastructure, zero administration
• Integrated with Amazon QuickSight.
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
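The compression saving follows from pay-per-query billing being proportional to the amount of data a query scans. A back-of-the-envelope sketch, assuming an illustrative price of $5 per TB scanned and a hypothetical 2 TB data set that compresses to 25% of its raw size (check current pricing before relying on any figure):

```python
def query_cost(tb_scanned, price_per_tb=5.0):
    """Cost of one query under a pay-per-data-scanned model (price is an assumption)."""
    return tb_scanned * price_per_tb


def compression_saving(raw_tb, compressed_fraction, price_per_tb=5.0):
    """Return (cost_raw, cost_compressed, fractional_saving) for a full scan."""
    cost_raw = query_cost(raw_tb, price_per_tb)
    cost_compressed = query_cost(raw_tb * compressed_fraction, price_per_tb)
    saving = 1 - cost_compressed / cost_raw
    return cost_raw, cost_compressed, saving


# Hypothetical numbers: a 2 TB data set that compresses to 25% of its raw size.
cost_raw, cost_compressed, saving = compression_saving(raw_tb=2.0, compressed_fraction=0.25)
print(f"raw: ${cost_raw:.2f}, compressed: ${cost_compressed:.2f}, saving: {saving:.0%}")
```

In this model the saving depends only on the compression ratio, not on the price per TB.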
QuickSight Overview
• Integrated with AWS - Redshift, RDS, Athena, S3, IAM, Roles, CloudTrail and more
• Cloud Native - Fully managed, serverless analytics at scale
• Super Fast and Easy to Use - Backed by SPICE and a beautiful UI
• Cost Effective - Starts at $9 per user per month
Putting it all together…
Summary of AWS Analytics, Database & AI Tools
• Amazon Redshift: Enterprise Data Warehouse
• Amazon EMR: Hadoop/Spark
• Amazon Athena: Clusterless SQL
• AWS Glue: Clusterless ETL
• Amazon Aurora: Managed Relational Database
• Amazon Machine Learning: Predictive Analytics
• Amazon QuickSight: Business Intelligence/Visualization
• Amazon Elasticsearch Service: Elasticsearch
• Amazon ElastiCache: Redis In-memory Datastore
• Amazon DynamoDB: Managed NoSQL Database
• Amazon Rekognition: Deep Learning-based Image Recognition
• Amazon Lex: Voice or Text Chatbots
Queries Against an Amazon S3 Data Lake
Event-driven ETL Pipelines
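The slide's diagram is not preserved in this transcript. One common way to build an event-driven pipeline is to have an S3 event notification invoke a function whenever a new object arrives. The sketch below shows only the event-parsing step, in the style of a Lambda handler; the bucket name, the `raw/` prefix convention, and the helper names are hypothetical, and no AWS calls are made.

```python
from urllib.parse import unquote_plus


def new_objects(event):
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])  # keys arrive URL-encoded
        pairs.append((bucket, key))
    return pairs


def handler(event, context):
    """Hypothetical Lambda entry point: decide which new objects need ETL."""
    to_process = [(b, k) for b, k in new_objects(event) if k.startswith("raw/")]
    # A real pipeline would start a transform job here; this sketch only reports.
    return {"to_process": to_process}


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-data-lake-bucket"},
                "object": {"key": "raw/2018/03/web+logs.json"}}},
        {"s3": {"bucket": {"name": "example-data-lake-bucket"},
                "object": {"key": "curated/report.parquet"}}},
    ]
}
result = handler(sample_event, None)
print(result)
```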
AWS Solution Builder - Data Lake on AWS
• Reference Architecture deployment via CloudFormation
• Configures core services to tag, search and catalogue datasets
• Deploys a console to search and browse available datasets
http://amzn.to/2nTVjcp
Processing & Analytics
Categories: Real-time; Batch; AI & Predictive; BI & Data Visualization; Transactional & RDBMS
[Diagram labels: AWS Lambda; Apache Storm on EMR; Apache Flink on EMR; Spark Streaming on EMR; Elasticsearch Service; Kinesis Analytics, Kinesis Streams; DynamoDB (NoSQL DB); Aurora (Relational Database); EMR (Hadoop, Spark, Presto); Redshift (Data Warehouse); Athena (Query Service); Amazon Lex (Speech recognition); Amazon Rekognition; Amazon Polly (Text to speech); Machine Learning (Predictive analytics); Kinesis Streams & Firehose]
Today's conversation
Business drivers for a Data Lake
Designing and building
Production use cases
Case Study: Re-architecting Compliance
“For our market surveillance systems, we are looking at about 40% [savings with AWS], but the real benefits are the business benefits: We can do things that we physically weren’t able to do before, and that is priceless.”
- Steve Randich, CIO
What FINRA needed
• Infrastructure for its market surveillance platform
• Support of analysis and storage of approximately 75 billion market events every day
Why they chose AWS
• Fulfillment of FINRA’s security requirements
• Ability to create a flexible platform using dynamic clusters (Hadoop, Hive, and HBase), Amazon EMR, and Amazon S3
Benefits realized
• Increased agility, speed, and cost savings
• Estimated savings of $10-20m annually by using AWS
Thank you!
