Big data and serverless - AWS UG The Netherlands

Big data and serverless
Marek Kuczynski
Sr. Solutions Architect – Startups
@marekq
marekku@amazon.nl
AWS User Group Netherlands Meetup

Various choices for compute on AWS
Amazon EC2
Virtual server instances
in the cloud
Amazon ECS,
EKS, and Fargate
Container management service
for running
Docker on a managed
cluster of EC2
AWS Lambda
Serverless compute
for stateless code execution in
response to triggers

Event based architectures
SERVICES (ANYTHING)
Changes in
data state
Requests to
endpoints
Changes in
resource state
EVENT SOURCE FUNCTION
Node.js
Python
Java
C#
Go
Ruby
PowerShell
Bring your own runtime

Common Lambda use cases
Web
Applications
• Static
websites
• Complex web
apps
• Packages for
Flask and
Express
Data
Processing
• Real time
• MapReduce
• Batch
Chatbots
• Powering
chatbot logic
Backends
• Apps &
services
• Mobile
• IoT
Amazon
Alexa
• Powering
voice-enabled
apps
• Alexa Skills Kit
IT
Automation
• Policy engines
• Extending AWS
services
• Infrastructure
management

AWS serverless portfolio
COMPUTE AND DATASTORES
AWS
Lambda
AWS
Fargate
Amazon
API Gateway
Amazon
SNS
Amazon
MQ
Amazon
SQS
AWS
Step Functions
APPLICATION INTEGRATION
DEVELOPER TOOLS
SECURITY AND ADMINISTRATION
Amazon Aurora
Serverless
Amazon
S3
Amazon
DynamoDB
AWS
AppSync
AWS
IAM
Amazon
Cognito
Amazon
Inspector
Amazon
VPC
Amazon
GuardDuty
AWS
CloudFormation
AWS
Cloud9
AWS
CloudTrail
Amazon
CloudWatch
AWS
X-Ray
AWS
CodePipeline
AWS
Config
AWS
SSO
AWS
Shield
AWS
WAF
Amazon
Kinesis
AWS Serverless
Application
Repository

Serverless is a spectrum
More operations Less operations

Build well architected
• Scalability
• Is scalability seamless, semi-automatic or a manual process?
• Resilience
• To what degree can we (automatically) recover from issues on infrastructure?
• Cost
• Can we control cost based on pricing per operation/invocation?
• Maintenance and operations
• How much OS/software maintenance will be needed going forward?
• Security
• How do I keep infrastructure secure and handle authentication/authorization?

A serverless, three tier application
Data stored in
DynamoDB
Dynamic content in
AWS Lambda
Amazon API
Gateway
Browser
Amazon
CloudFront
Amazon S3
Amazon Cognito
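
As a rough illustration of the "dynamic content in AWS Lambda" tier, the sketch below shows a handler that returns a response in the shape API Gateway expects from a Lambda proxy integration. The `POSTS` dictionary is a hypothetical in-memory stand-in for the DynamoDB table, and the `slug` path parameter is an assumed route design, not something shown on the slide.

```python
import json

# Hypothetical in-memory stand-in for the DynamoDB table in the diagram.
POSTS = {"hello-world": {"title": "Hello world", "body": "First post"}}


def handler(event, context):
    """Return one post as an API Gateway proxy-integration response."""
    # With proxy integration, path parameters arrive under "pathParameters"
    # (the value is None when the route has no parameters).
    slug = (event.get("pathParameters") or {}).get("slug")
    post = POSTS.get(slug)
    if post is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(post),  # the body must be a string, not a dict
    }
```

In a real deployment the lookup would be a DynamoDB read, and CloudFront and S3 would serve the static assets as in the diagram.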

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Demo of a serverless blog – https://marek.rocks

Monthly costs of running the blog
The website has been running stably for 3+ years with a few
hundred visitors every month.
• Route 53 hosted zone $0.50
• Lambda function cost $0.30
• DynamoDB costs $0.20
• API Gateway costs $0.10
• Email costs $0.02
• Domain name $1
No maintenance (patching, scaling, backups) is required.
TCO is at least 10x cheaper than running this on EC2.
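
The line items above can be added up with a few lines of Python. This is only arithmetic on the figures from the slide; the dictionary keys are labels chosen for the example.

```python
from decimal import Decimal

# Monthly line items in USD, copied from the slide.
monthly_costs_usd = {
    "Route 53 hosted zone": Decimal("0.50"),
    "Lambda function": Decimal("0.30"),
    "DynamoDB": Decimal("0.20"),
    "API Gateway": Decimal("0.10"),
    "Email": Decimal("0.02"),
    "Domain name": Decimal("1.00"),
}

total_per_month = sum(monthly_costs_usd.values(), Decimal("0"))
total_per_year = total_per_month * 12

print(f"Total per month: ${total_per_month}")  # $2.12
print(f"Total per year:  ${total_per_year}")   # $25.44
```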

Building and orchestrating a
serverless data lake

AWS solutions to build a serverless data lake
Amazon
S3
bucket(s)
Amazon ES
AWS
Glue
Amazon
DynamoDB
Catalog & search
AWS Key
Management
Service (AWS KMS)
AWS
CloudTrail
IAM
Amazon
Macie
Security & auditing
Amazon
Cognito
Amazon
API
Gateway
IAM
API/UI
Amazon
Athena
Amazon
QuickSight
Aurora
Serverless or
Redshift
Analytics & processing
AWS
Glue
AWS
Lambda
Amazon
Kinesis
Data
Streams
Amazon
Kinesis
Data
Firehose
AWS
Direct
Connect
Ingest

Ingestion using Kinesis
Amazon CloudWatch:
Delivery metrics
Amazon S3:
Buffered files
Kinesis
Agent
Record
producers
Amazon Redshift or Aurora:
Table loads
Amazon Elasticsearch Service:
Domain loads
Amazon S3:
Source record backup
AWS Lambda:
Transformations &
enrichment
Amazon DynamoDB:
Lookup tables
Raw records
Lookup
Transformed records
Transformed records
Raw records
Kinesis Data Firehose:
Delivery stream
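
The "transformations & enrichment" step can be sketched as a Lambda function that Kinesis Data Firehose calls with a batch of base64-encoded records. The record contract (`recordId`, `result`, `data`) is the one Firehose uses for data transformation; the `LOOKUP` dictionary is a hypothetical stand-in for the DynamoDB lookup table, and the `country` field is an assumed payload shape.

```python
import base64
import json

# Hypothetical lookup table; in the diagram this role is played by DynamoDB.
LOOKUP = {"NL": "Netherlands", "DE": "Germany"}


def handler(event, context):
    """Enrich each raw record and hand it back to the delivery stream."""
    output = []
    for record in event["records"]:
        try:
            # Assumes every record is a base64-encoded JSON object.
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Unparseable input is marked as failed and returned unchanged.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        payload["country_name"] = LOOKUP.get(payload.get("country"), "unknown")
        # A trailing newline keeps records separable in the buffered S3 files.
        data = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": data.decode("ascii")})
    return {"records": output}
```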

Architecture patterns to push or pull data
S3
bucket
object
Lambda
function
1. File put into bucket
2. Lambda invoked
Lambda
function
2. Lambda invoked
SNS
topic
1. Data published to a
topic
Data
1. Message inserted
into a queue
message
Amazon
SQS
Lambda
function
3. Function
removes
message from
queue
2. Lambda polls
queue and
invokes function
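
The three patterns above deliver differently shaped events to the function. The sketch below reads the payload from each shape; the function names are made up for the example. For SQS, the Lambda service polls the queue on the function's behalf and removes the messages once the function returns without an error.

```python
def describe_record(record):
    """Return (source, payload) for one record of a Lambda event."""
    # S3 and SQS records use "eventSource"; SNS records use "EventSource".
    source = record.get("eventSource") or record.get("EventSource")
    if source == "aws:s3":
        s3 = record["s3"]
        return "s3", f'{s3["bucket"]["name"]}/{s3["object"]["key"]}'
    if source == "aws:sns":
        return "sns", record["Sns"]["Message"]
    if source == "aws:sqs":
        return "sqs", record["body"]
    raise ValueError(f"unsupported event source: {source!r}")


def handler(event, context):
    """Handle a batch of records from S3, SNS or SQS."""
    return [describe_record(record) for record in event["Records"]]
```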

Recent launch: richer workflows using Step Functions
Simplify building workloads such as
order processing, report generation,
and data analysis
Add services in minutes
Write and maintain less code
AWS
Step Functions
AWS
Lambda
Amazon
ECS
AWS
Fargate
AWS
Batch
Amazon
SageMaker
AWS
Glue
Amazon
DynamoDB
Amazon
SNS
Amazon
SQS

Simpler integration, less code
With serverless polling: AWS Lambda functions
With new service integration: no Lambda functions
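
A minimal sketch of what a service integration looks like: a state machine definition in Amazon States Language, built here as a Python dictionary, whose task publishes to SNS directly instead of calling a Lambda function. The topic ARN and the `$.message` input path are hypothetical values for illustration.

```python
import json

# Hypothetical topic ARN, used only for illustration.
TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:example-topic"

state_machine = {
    "Comment": "Publish to SNS directly, without a Lambda function in between",
    "StartAt": "NotifyTopic",
    "States": {
        "NotifyTopic": {
            "Type": "Task",
            # Service integration resource for SNS publish.
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": TOPIC_ARN,
                # The ".$" suffix takes the value from the state input.
                "Message.$": "$.message",
            },
            "End": True,
        }
    },
}

definition = json.dumps(state_machine, indent=2)
print(definition)
```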

Serverless data lakes -
Analytics

Analytics
Various choices for analytics of your data
• S3 Select on CSV, JSON and Apache Parquet objects
• Amazon Athena
• AWS Lambda
• Predictions with Amazon SageMaker
• Amazon EMR
• AWS Glue (ETL)

S3 Select – selecting fields from individual files

Athena – running a query on files in S3 buckets
44.66 seconds...Data scanned: 169.53GB
Cost: $5/TB or $0.005/GB = $0.85
SELECT custid, year, sum(count) FROM sales
WHERE custid = '157231'
GROUP BY custid, year ORDER BY year ASC;
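
The cost figure on this slide follows from the amount of data scanned. A small sketch of the arithmetic, using the slide's own price and its 1 TB = 1000 GB conversion:

```python
PRICE_PER_TB_USD = 5.0  # price per TB scanned, from the slide
GB_PER_TB = 1000        # the slide equates $5/TB with $0.005/GB


def query_cost_usd(scanned_gb):
    """Cost of one query, given the amount of data it scanned in GB."""
    return scanned_gb * PRICE_PER_TB_USD / GB_PER_TB


cost = query_cost_usd(169.53)
print(f"${cost:.2f}")  # $0.85
```

Because pricing is per byte scanned, anything that reduces the scan (columnar formats such as Parquet, partitioning, compression) lowers the cost of the same query.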

Run a data science framework in Lambda
• pandas
• SciPy
• NumPy
• matplotlib

Just released: S3 Batch Operations
[Diagram: Amazon S3 fanning out to nine Lambda functions]
This new feature can:
• Modify the ACLs or tags of
objects on S3 at scale.
• Copy objects to a new bucket
while preserving properties.
• Let Lambda (re)process all your
files stored on S3.
AWS takes care of running the
operations, even if your bucket
has billions of objects.
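
A sketch of a Lambda function that S3 Batch Operations could invoke per object. The event and response field names follow the version 1.0 invocation schema as I understand it, so check them against the current S3 documentation before relying on them. `process_object` is a hypothetical placeholder for the real (re)processing logic.

```python
from urllib.parse import unquote


def process_object(bucket_arn, key):
    """Hypothetical per-object work; replace with real (re)processing."""
    return f"processed {key}"


def handler(event, context):
    """Process the tasks in one S3 Batch Operations invocation."""
    results = []
    for task in event["tasks"]:
        # Object keys arrive URL-encoded in the batch event.
        key = unquote(task["s3Key"])
        try:
            message = process_object(task["s3BucketArn"], key)
            code = "Succeeded"
        except Exception as exc:
            message = str(exc)
            code = "PermanentFailure"
        results.append({"taskId": task["taskId"],
                        "resultCode": code,
                        "resultString": message})
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }
```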

Why relational and not NoSQL?
Sometimes it’s not possible to use a NoSQL database:
• You need to integrate with other backend applications that run on a
relational database (e.g. WordPress) or are hard to modify.
• You need access to complex queries that are harder to do with NoSQL
(e.g. multiple joins, fuzzy searches).
• There may be other database features that your application requires
(logging, ACID compliance).

How does Aurora Serverless work?
Availability zone 1
Region
App on
EC2 or
Lambda
Shared distributed storage volume
Multi-tenant proxy layer
Warm-pool of Aurora instances
Monitoring service

Introducing Amazon Relational Database Service Data API
• Simple web service protocol for database access
• SQL statements packaged as HTTP requests
• Access your database from Lambda and AppSync
• Access your database from the AWS SDK & CLI
Data API Service
Aurora Serverless
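
A hedged sketch of "SQL statements packaged as HTTP requests": the request for the Data API's `ExecuteStatement` operation is built as a plain dictionary and then passed to the `rds-data` client of the AWS SDK for Python. The ARNs, database name, table and query are hypothetical placeholders.

```python
def build_statement(resource_arn, secret_arn, database, custid):
    """Build keyword arguments for the rds-data ExecuteStatement call."""
    return {
        "resourceArn": resource_arn,  # ARN of the Aurora Serverless cluster
        "secretArn": secret_arn,      # Secrets Manager secret with DB credentials
        "database": database,
        # Named parameters keep user input out of the SQL text.
        "sql": "SELECT custid, year FROM sales WHERE custid = :custid",
        "parameters": [{"name": "custid", "value": {"stringValue": custid}}],
    }


def run(statement):
    """Send the statement over HTTPS; needs boto3 and AWS credentials."""
    # Imported lazily so the request builder can be tried without the SDK.
    import boto3
    return boto3.client("rds-data").execute_statement(**statement)
```

No database driver or persistent connection is involved, which is what makes this a good fit for short-lived Lambda invocations.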

Introducing RDS Console Query Editor
• Access your database from
AWS Management Console
• No database client application
or terminal required
• The same requests can be
made using the AWS SDK or
CLI.

A serverless, relational three tier application
Data stored in
Aurora Serverless
Dynamic content in
AWS Lambda
Amazon API
Gateway
Browser
Amazon
CloudFront
Amazon S3
Amazon Cognito

Search and Data Catalog
• Use DynamoDB as a metadata repository
• Optionally use Amazon Elasticsearch Service for
more complex queries
AWS Lambda
Metadata Index
(DynamoDB)
Search Index
(Amazon ES)
ObjectCreated
ObjectDeleted PutItem
Update Index
S3 Bucket
https://aws.amazon.com/answers/big-data/data-lake-solution/
Catalog & Search
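
The indexing flow in this diagram can be sketched as a function that applies S3 event records to a metadata index. A plain dictionary stands in for the DynamoDB table here, so the PutItem and delete calls are simulated; the item layout is an assumption. Note that S3 reports deletions with event names starting with `ObjectRemoved:`.

```python
def apply_s3_event(event, metadata_index):
    """Keep a metadata index in sync with S3 object notifications."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        obj = record["s3"]["object"]
        item_key = f'{bucket}/{obj["key"]}'
        name = record["eventName"]
        if name.startswith("ObjectCreated:"):
            # Stand-in for a DynamoDB PutItem call.
            metadata_index[item_key] = {"size": obj.get("size"),
                                        "etag": obj.get("eTag")}
        elif name.startswith("ObjectRemoved:"):
            # Stand-in for a DynamoDB DeleteItem call.
            metadata_index.pop(item_key, None)
    return metadata_index
```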

AWS Glue
Crawlers
AWS Glue
Data Catalog
Amazon
QuickSight
Amazon
Redshift
Spectrum
Amazon
Athena
S3
Bucket(s)
Catalog & Search
Use Glue Crawlers to build a data catalog

AWS Lake Formation (in preview)
Build, secure, and manage data lakes, reducing the set up time from months to days

AWS CodeStar
• Quickly develop, build, and deploy
applications on AWS
• Start developing on AWS in minutes
• Work across your team, securely
• Manage software delivery easily
• Choose from a variety of project
templates

AWS CodeStar
Project templates for EC2, AWS Lambda, and Elastic Beanstalk

Services deployed for you when using CodeStar
Source: AWS CodeCommit
Build: AWS CodeBuild
Test: AWS CodeBuild + Third Party
Deploy: AWS CodeDeploy
Monitor: AWS X-Ray, Amazon CloudWatch
Pipeline: AWS CodePipeline

<-THIS
BECOMES THIS->
SAM Template

Use AWS X-Ray to debug functions

Further reading and events
Well Architected Lens for serverless
https://d1.awsstatic.com/whitepapers/architecture/AWS-Serverless-Applications-Lens.pdf
Serverless Application Repository
https://serverlessrepo.aws.amazon.com/
Free developer event - AWS DevDay on June 19th in Utrecht
https://aws.amazon.com/events/Devdays-Utrecht/

No server is easier to manage than "no server."
Werner Vogels—Amazon CTO

Thank you!
Marek Kuczynski
Sr. Solutions Architect - startups
@marekq
marekku@amazon.nl
