At the AWS NL user group meetup on 3 June 2019, Marek Kuczynski from AWS gave a presentation on big data solutions using serverless.
Big data and serverless - AWS UG The Netherlands
1. Big data and serverless
Marek Kuczynski
Sr. Solutions Architect – Startups
@marekq
marekku@amazon.nl
AWS User Group Netherlands Meetup
2. Various choices for compute on AWS
Amazon EC2: Virtual server instances in the cloud
Amazon ECS, EKS, and Fargate: Container management service for running Docker on a managed cluster of EC2
AWS Lambda: Serverless compute for stateless code execution in response to triggers
3. Event-based architectures
SERVICES (ANYTHING)
Changes in
data state
Requests to
endpoints
Changes in
resource state
EVENT SOURCE FUNCTION
Node.js
Python
Java
C#
Go
Ruby
PowerShell
Bring your own runtime
4. Common Lambda use cases
Web Applications
• Static websites
• Complex web apps
• Packages for Flask and Express
Data Processing
• Real time
• MapReduce
• Batch
Chatbots
• Powering chatbot logic
Backends
• Apps & services
• Mobile
• IoT
Amazon Alexa
• Powering voice-enabled apps
• Alexa Skills Kit
IT Automation
• Policy engines
• Extending AWS services
• Infrastructure management
7. Build well architected
• Scalability
• Is scalability seamless, semi-automatic or a manual process?
• Resilience
• To what degree can we (automatically) recover from issues on infrastructure?
• Cost
• Can we control cost based on pricing per operation/invocation?
• Maintenance and operations
• How much OS/software maintenance will be needed going forward?
• Security
• How do I keep infrastructure secure and handle authentication/authorization?
8. A serverless, three tier application
Data stored in DynamoDB
Dynamic content in AWS Lambda
Amazon API Gateway
Browser
Amazon CloudFront
Amazon S3
Amazon Cognito
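As a minimal sketch of the "dynamic content in AWS Lambda" tier, the function below returns a response in the shape that API Gateway's Lambda proxy integration expects. It assumes proxy integration is used; the `name` query parameter and the greeting payload are hypothetical, and the DynamoDB lookup is left out to keep the example self-contained.

```python
import json


def handler(event, context):
    """Return a JSON response in the Lambda proxy integration format."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # hypothetical parameter
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}"}),
    }
```

The body must be a string, which is why the payload is serialized with `json.dumps` rather than returned as a dict.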
13. AWS solutions to build a serverless data lake
Amazon S3 bucket(s)
Amazon ES
AWS Glue
Amazon DynamoDB
Catalog & search
AWS Key Management Service (AWS KMS)
AWS CloudTrail
IAM
Amazon Macie
Security & auditing
Amazon Cognito
Amazon API Gateway
IAM
API/UI
Amazon Athena
Amazon QuickSight
Aurora Serverless or Redshift
Analytics & processing
AWS Glue
AWS Lambda
Amazon Kinesis Data Streams
Amazon Kinesis Data Firehose
AWS Direct Connect
Ingest
14. Ingestion using Kinesis
Kinesis Agent
Record producers
Kinesis Data Firehose: Delivery stream
AWS Lambda: Transformations & enrichment
Amazon DynamoDB: Lookup tables
Amazon S3: Source record backup
Amazon S3: Buffered files
Amazon Redshift or Aurora: Table loads
Amazon Elasticsearch Service: Domain loads
Amazon CloudWatch: Delivery metrics
Flow labels: Raw records, Lookup, Transformed records
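The "transformations & enrichment" step can be sketched as a Lambda function attached to the Firehose delivery stream. This is an illustration under stated assumptions, not the presenter's code: records are assumed to be JSON objects with a `country` field, and the in-memory `LOOKUP` dict is a hypothetical stand-in for the DynamoDB lookup table.

```python
import base64
import json

# Hypothetical lookup table; on the slide this role is played by DynamoDB.
LOOKUP = {"nl": "Netherlands", "de": "Germany"}


def handler(event, context):
    """Enrich each record and return it in the Firehose transformation format."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Unparseable input is handed back unchanged and flagged as failed.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        payload["country_name"] = LOOKUP.get(payload.get("country"), "unknown")
        # A trailing newline keeps records separated in the buffered S3 files.
        data = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": data.decode("ascii")})
    return {"records": output}
```

Every incoming `recordId` has to appear in the response with a result of `Ok`, `Dropped`, or `ProcessingFailed`, so the sketch never silently skips a record.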
15. Architecture patterns to push or pull data
S3 bucket object, Lambda function
1. File put into bucket
2. Lambda invoked
SNS topic, Lambda function
1. Data published to a topic
2. Lambda invoked
Amazon SQS message, Lambda function
1. Message inserted into a queue
2. Lambda polls queue and invokes function
3. Function removes message from queue
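The three trigger patterns deliver differently shaped events to the function. The sketch below shows one way a single handler might tell them apart; `extract_payloads` is a hypothetical helper name, and only the fields needed for the illustration are read.

```python
def extract_payloads(event):
    """Return (source, payload) tuples from an S3, SNS, or SQS Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # SNS capitalises the key ("EventSource"); S3 and SQS use "eventSource".
        source = record.get("eventSource") or record.get("EventSource")
        if source == "aws:s3":
            s3 = record["s3"]
            payloads.append(("s3", f'{s3["bucket"]["name"]}/{s3["object"]["key"]}'))
        elif source == "aws:sns":
            payloads.append(("sns", record["Sns"]["Message"]))
        elif source == "aws:sqs":
            payloads.append(("sqs", record["body"]))
    return payloads


def handler(event, context):
    for source, payload in extract_payloads(event):
        print(f"{source}: {payload}")
    # For SQS, returning without raising signals success, after which the
    # polled messages are removed from the queue.
```

With S3 and SNS the event is pushed to the function, while with SQS the Lambda service polls the queue on the function's behalf.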
16. Recent launch: richer workflows using Step Functions
Simplify building workloads such as order processing, report generation, and data analysis
Add services in minutes
Write and maintain less code
AWS Step Functions
AWS Lambda
Amazon ECS
AWS Fargate
AWS Batch
Amazon SageMaker
AWS Glue
Amazon DynamoDB
Amazon SNS
Amazon SQS
17. Simpler integration, less code
With serverless polling: AWS Lambda functions
With new service integration: No Lambda functions
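To illustrate what "no Lambda functions" can look like, here is a sketch of a one-state Amazon States Language definition that publishes to SNS through a direct service integration. The topic ARN and the state name are hypothetical, and the input is assumed to carry a `message` field.

```python
import json

# Hypothetical ARN; replace with a real topic.
TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:example-topic"


def build_state_machine(topic_arn):
    """Build a state machine that publishes the input's 'message' field to SNS."""
    return {
        "Comment": "Publish to SNS directly from Step Functions",
        "StartAt": "NotifyTopic",
        "States": {
            "NotifyTopic": {
                "Type": "Task",
                # Service integration: no Lambda function sits in between.
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                    "TopicArn": topic_arn,
                    "Message.$": "$.message",  # taken from the execution input
                },
                "End": True,
            }
        },
    }


definition = json.dumps(build_state_machine(TOPIC_ARN), indent=2)
```

The resulting JSON string is what would be supplied as the state machine definition when creating it.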
19. Analytics
Various choices for analytics of your data
• S3 Select on CSV, JSON and Apache Parquet objects
• Amazon Athena
• AWS Lambda
• Predictions with Amazon SageMaker
• Amazon EMR
• AWS Glue (ETL)
Analytics & processing
20. S3 Select – selecting fields from individual files
21. S3 Select – selecting fields from individual files
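Selecting fields from a single object might look like the sketch below, which builds the arguments for boto3's `select_object_content` call. It assumes a CSV object with a header row and simple column names; the bucket, key, and column names in the usage note are hypothetical.

```python
def build_select_request(bucket, key, columns, where=None):
    """Build keyword arguments for an S3 Select query on a CSV object with a header."""
    expression = "SELECT " + ", ".join(f"s.{c}" for c in columns) + " FROM S3Object s"
    if where:
        expression += " WHERE " + where
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


def run_select(request):
    """Run the query and return the selected rows as text (needs boto3 and credentials)."""
    import boto3  # imported lazily so building a request works without boto3

    response = boto3.client("s3").select_object_content(**request)
    # The response is an event stream; only "Records" events carry row data.
    chunks = [e["Records"]["Payload"] for e in response["Payload"] if "Records" in e]
    return b"".join(chunks).decode("utf-8")
```

For example, `run_select(build_select_request("example-bucket", "sales.csv", ["custid", "year"], "s.custid = '157231'"))` would return only the two named columns for matching rows, so less data leaves S3 than with a full download.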
22. Athena – running a query on files in S3 buckets
Run time: 44.66 seconds, Data scanned: 169.53GB
Cost: $5/TB or $0.005/GB = $0.85
SELECT custid, year, sum(count) FROM sales
WHERE custid = '157231'
GROUP BY custid, year ORDER BY year ASC;
Analytics & processing
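The cost figure on this slide follows from the amount of data scanned. A small worked example, assuming the slide's price of $5 per TB and its conversion of 1 TB = 1000 GB:

```python
def athena_query_cost(gb_scanned, usd_per_tb=5.0, gb_per_tb=1000):
    """Estimate query cost in USD from the amount of data scanned."""
    return gb_scanned * usd_per_tb / gb_per_tb


cost = athena_query_cost(169.53)
print(f"${cost:.2f}")  # prints $0.85
```

Because cost scales with bytes scanned, columnar formats such as Parquet and partitioned data reduce the bill for the same query.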
24. Just released: S3 Batch Operations
[Diagram: Amazon S3 and nine Lambda functions]
This new feature can:
• Modify the ACLs or tags of objects on S3 at scale.
• Copy objects to a new bucket while preserving properties.
• Let Lambda (re)process all your files stored on S3.
AWS takes care of running the operations, even if your bucket has billions of objects.
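Letting Lambda (re)process objects through S3 Batch Operations could be sketched as follows. The handler assumes invocation schema version 1.0, and `process_object` is a hypothetical placeholder for the real per-object work.

```python
from urllib.parse import unquote_plus


def process_object(bucket_arn, key):
    """Hypothetical per-object work; replace with real (re)processing logic."""
    return f"processed {key}"


def handler(event, context):
    """Handle one S3 Batch Operations invocation and report a result per task."""
    results = []
    for task in event["tasks"]:
        key = unquote_plus(task["s3Key"])  # keys arrive URL-encoded
        try:
            message = process_object(task["s3BucketArn"], key)
            code = "Succeeded"
        except Exception as exc:
            # Report the failure for this object instead of raising.
            message, code = str(exc), "PermanentFailure"
        results.append({"taskId": task["taskId"],
                        "resultCode": code,
                        "resultString": message})
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }
```

The per-task result codes are what end up in the job's completion report, which makes failed objects easy to find afterwards.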
26. Why relational and not NoSQL?
Sometimes it’s not possible to use a NoSQL database:
• You need to integrate with other backend applications that run on a relational database (e.g. WordPress) or are hard to modify.
• You need access to complex queries that are harder to do with NoSQL (e.g. multiple joins, fuzzy searches).
• There may be other database features that your application requires (logging, ACID compliance).
27. How does Serverless Aurora work?
Availability zone 1
Region
App on EC2 or Lambda
Shared distributed storage volume
Multi-tenant proxy layer
Warm-pool of Aurora instances
Monitoring service
28. Introducing Amazon Relational Database Service Data API
• Simple web service protocol for database access
• SQL statements packaged as HTTP requests
• Access your database from Lambda and AppSync
• Access your database from the AWS SDK & CLI
Data API Service
Aurora Serverless
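A sketch of "SQL statements packaged as HTTP requests", using the boto3 `rds-data` client. The helper names are hypothetical, and the cluster ARN, secret ARN, and database name must be supplied by the caller.

```python
def to_data_api_parameters(values):
    """Convert a dict of Python values into the Data API's named-parameter format."""
    parameters = []
    for name, value in values.items():
        if value is None:
            field = {"isNull": True}
        elif isinstance(value, bool):  # check bool before int: bool subclasses int
            field = {"booleanValue": value}
        elif isinstance(value, int):
            field = {"longValue": value}
        elif isinstance(value, float):
            field = {"doubleValue": value}
        else:
            field = {"stringValue": str(value)}
        parameters.append({"name": name, "value": field})
    return parameters


def run_query(cluster_arn, secret_arn, database, sql, values):
    """Send one SQL statement over HTTPS (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("rds-data")
    return client.execute_statement(
        resourceArn=cluster_arn,
        secretArn=secret_arn,
        database=database,
        sql=sql,
        parameters=to_data_api_parameters(values),
    )
```

In the SQL text, named parameters are written as `:custid`, for example `SELECT * FROM sales WHERE custid = :custid`. No persistent database connection is held, which suits short-lived Lambda invocations.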
29. Introducing RDS Console Query Editor
• Access your database from AWS Management Console
• No database client application or terminal required
• The same requests can be made using the AWS SDK or CLI.
30. A serverless, relational three tier application
Data stored in Aurora Serverless
Dynamic content in AWS Lambda
Amazon API Gateway
Browser
Amazon CloudFront
Amazon S3
Amazon Cognito
32. Search and Data Catalog
• Use DynamoDB as a metadata repository
• Optionally use Amazon Elasticsearch Service for more complex queries
S3 Bucket
ObjectCreated
ObjectDeleted
AWS Lambda
PutItem
Metadata Index (DynamoDB)
Update Index
Search Index (Amazon ES)
https://aws.amazon.com/answers/big-data/data-lake-solution/
Catalog & Search
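The metadata index flow can be sketched as a Lambda function that turns S3 notifications into DynamoDB writes. This is an illustration under assumptions: the table name is hypothetical, the table is assumed to use `bucket` as partition key and `key` as sort key, and deletions are matched on the `ObjectRemoved` event name that S3 uses in its notifications.

```python
from urllib.parse import unquote_plus


def to_index_operations(event):
    """Map S3 notification records to ('put' | 'delete', item) operations."""
    operations = []
    for record in event.get("Records", []):
        obj = record["s3"]["object"]
        item = {"bucket": record["s3"]["bucket"]["name"],
                "key": unquote_plus(obj["key"])}  # keys arrive URL-encoded
        if record["eventName"].startswith("ObjectCreated"):
            item["size"] = obj.get("size")
            item["etag"] = obj.get("eTag")
            operations.append(("put", item))
        elif record["eventName"].startswith("ObjectRemoved"):
            operations.append(("delete", item))
    return operations


def handler(event, context, table_name="example-metadata-index"):
    """Apply the operations to DynamoDB (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the mapping above works without boto3

    table = boto3.resource("dynamodb").Table(table_name)
    for action, item in to_index_operations(event):
        if action == "put":
            table.put_item(Item=item)
        else:
            table.delete_item(Key={"bucket": item["bucket"], "key": item["key"]})
```

Keeping the event mapping separate from the DynamoDB calls makes the logic easy to check locally, and the same operations list could also feed the search index update.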
33. AWS Glue
Crawlers
AWS Glue Data Catalog
Amazon QuickSight
Amazon Redshift Spectrum
Amazon Athena
S3 Bucket(s)
Catalog & Search
Use Glue Crawlers to build a data catalogue
34. AWS Lake Formation (in preview)
Build, secure, and manage data lakes, reducing the setup time from months to days
36. AWS CodeStar
• Quickly develop, build, and deploy applications on AWS
• Start developing on AWS in minutes
• Work across your team, securely
• Manage software delivery easily
• Choose from a variety of project templates
38. Services deployed for you when using CodeStar
Source: AWS CodeCommit
Build: AWS CodeBuild
Test: AWS CodeBuild + Third Party
Deploy: AWS CodeDeploy
Monitor: AWS X-Ray, Amazon CloudWatch
AWS CodePipeline
42. Further reading and events
Well Architected Lens for serverless
https://d1.awsstatic.com/whitepapers/architecture/AWS-Serverless-Applications-Lens.pdf
Serverless Application Repository
https://serverlessrepo.aws.amazon.com/
Free developer event - AWS DevDay on June 19th in Utrecht
https://aws.amazon.com/events/Devdays-Utrecht/
43. "No server is easier to manage than no server."
Werner Vogels, Amazon CTO