Cloud native orchestrators like AWS Step Functions and Amazon SageMaker Pipelines can be used to construct scalable end-to-end deep learning pipelines in the cloud. These orchestrators provide centralized monitoring, logging, and scaling capabilities. AWS Step Functions is useful for integrating pipelines with production infrastructure, while SageMaker Pipelines is good for research workflows that require validation. Serverless architectures using services like AWS Lambda, Batch, and Fargate can build scalable and flexible pipelines at a low cost.
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the Cloud
1. Building Scalable End-to-End Deep Learning Pipelines in the Cloud
Rustem Feyzkhanov
Machine Learning Engineer @ Instrumental
AWS Machine Learning Hero
2. DataTalks.Club
Takeaways
• Cloud native orchestrators are convenient for constructing scalable end-to-end deep learning pipelines
• There are multiple services at your disposal for constructing deep learning workflows, and the right choice depends on your context
• You can deploy these kinds of workflows fairly easily, even for research projects
3. DataTalks.Club
Data science process
(from https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview)
• Business understanding - define objectives, identify data sources
• Data acquisition - ingest data, explore data, update data
• Modeling - feature selection, create model, train model
• Deployment - operationalize
• Customer acceptance - testing and validation, handoff, re-train and re-score
5. DataTalks.Club
What is serverless
[Stack diagram comparing On premise, IaaS, CaaS, PaaS, FaaS, and SaaS across the layers Functions, Application, Runtime, Container, Operating system, Virtualization, Networking, Storage, and Hardware; each model further right shifts more of the stack from you to the provider.]
6. DataTalks.Club
What is serverless
[Same stack diagram as slide 5.]
7. DataTalks.Club
Serverless
• On-demand cluster/worker/service that scales with your consumption
• No upfront costs, pay-as-you-go pricing, no payment for idle time*
• Low operational overhead
• Defined as Infrastructure as Code (IaC)
• Built-in service integrations
8. DataTalks.Club
Why serverless
• Low operational overhead + no upfront costs => easy to start
• Defined as IaC + built-in service integrations => flexible infrastructure
• Scalable + built-in service integrations => integrates with production infrastructure
10. DataTalks.Club
Data preprocessing
Task
• Getting and transforming data from multiple sources
Challenges
• Combination of multiple frameworks and libraries
• Scaling based on load
• Combination of heavy, long-running, and parallel processing
11. DataTalks.Club
ML/DL training
Tasks
• Training and publishing the model
• Checking multiple sets of hyperparameters
• Handling semi-automatic logic
Challenges
• High cost of GPU instances
• Higher level of uncertainty compared to conventional software engineering
12. DataTalks.Club
ML/DL inference
Task
• Making predictions based on new incoming data
Challenges
• Handling production requirements: latency/load/cost
• Handling multiple frameworks
• Handling model versioning
• Implementing custom logic for choosing the result
13. DataTalks.Club
Container/Function-as-a-Service
[Stack diagram from slide 5 repeated to situate CaaS and FaaS.]
14. DataTalks.Club
'Serverless' cluster
• On-demand cluster/worker that scales with your consumption
• Requires only your code and a launch configuration
• Scaling techniques:
  • scales based on a job queue (AWS Batch)
  • starts a VM per job (AWS Fargate, Amazon SageMaker)
  • starts a worker per job (AWS Lambda)
15. DataTalks.Club
AWS Batch
• On-demand cluster that scales based on a job queue
• Consists of the following components:
  • job definition
  • job queue
  • compute environment
  • scheduler
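As a rough sketch of what driving Batch from code looks like, the call below submits a containerized job to an existing queue; the queue name, job definition, and command are hypothetical placeholders.

```python
import boto3

batch = boto3.client("batch")

# Submit a containerized GPU training job to an existing job queue.
# "dl-training-queue" and "dl-training-jobdef" are hypothetical names;
# the compute environment behind the queue scales up from zero when jobs arrive.
response = batch.submit_job(
    jobName="train-model-run-001",
    jobQueue="dl-training-queue",
    jobDefinition="dl-training-jobdef",
    containerOverrides={
        "command": ["python", "train.py", "--epochs", "10"],
        "resourceRequirements": [{"type": "GPU", "value": "1"}],
    },
)
print(response["jobId"])
```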
16. DataTalks.Club
AWS Fargate
• CaaS service that starts a VM per job
• Can be used both for on-demand processing and as a scalable web server
• Supports only customizable CPU instances (no GPUs)
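A hedged sketch of launching a one-off processing task on Fargate through the ECS API; the cluster, task definition, and subnet ID below are hypothetical placeholders.

```python
import boto3

ecs = boto3.client("ecs")

# Run a single on-demand processing task on Fargate.
# Cluster, task definition, and subnet are hypothetical names.
ecs.run_task(
    cluster="preprocessing-cluster",
    launchType="FARGATE",
    taskDefinition="preprocess-task:1",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "assignPublicIp": "ENABLED",
        }
    },
)
```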
17. DataTalks.Club
Amazon SageMaker - Processing jobs
• CaaS service that starts VM(s) per job
• Can be used only for on-demand processing
• Can be used only with specific ml.* instance types
• Instances can mount S3 buckets as disks
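A minimal sketch of a Processing job via the SageMaker Python SDK; the container image, role ARN, and S3 paths are hypothetical placeholders.

```python
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

# Hypothetical image URI and role ARN.
processor = ScriptProcessor(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/preprocess:latest",
    command=["python3"],
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# S3 locations are mounted onto the instance as local paths.
processor.run(
    code="preprocess.py",
    inputs=[ProcessingInput(source="s3://my-bucket/raw",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://my-bucket/processed")],
)
```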
18. DataTalks.Club
Amazon SageMaker - Training jobs
• CaaS service that starts VM(s) per job
• Used for on-demand processing, and can also run a cluster of VMs per job for distributed training
• Can be used only with specific ml.* instance types
• Instances can mount S3 buckets as disks
• Handles monitoring, hyperparameters, data import, and model export for you
• Supports spot instances and handles checkpointing
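A minimal sketch of a training job with spot instances and checkpointing enabled; the image, role, and bucket names are hypothetical placeholders.

```python
from sagemaker.estimator import Estimator

# Managed spot training with checkpointing; URIs and role are hypothetical.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/train:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=2,                    # >1 runs a cluster for distributed training
    instance_type="ml.p3.2xlarge",
    use_spot_instances=True,             # spot support
    max_run=3600,
    max_wait=7200,                       # must be >= max_run for spot
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # survives spot interruptions
    hyperparameters={"epochs": "10", "lr": "0.001"},
)
estimator.fit({"train": "s3://my-bucket/train/"})
```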
20. DataTalks.Club
'Serverless' cluster comparison (limits and pricing updated at re:Invent 2020)
• AWS Lambda - FaaS. Pros: fast startup time (~100ms); priced per 1ms (previously per 100ms); very scalable. Cons: higher price per CPU-second; timeout limit; CPU limit of 6 vCPU (previously 2 vCPU); RAM limit of 10GB (previously 3GB). Use cases: short-term processes.
• Amazon SageMaker - pure container(s) as a service. Pros: most instance types available; built-in dashboard; spot instances available. Cons: medium startup time (~30s-1min); priced per second with a 1-minute minimum. Use cases: long-running GPU processes.
• AWS Fargate - pure container as a service. Pros: customizable instances; medium startup time (~10-20s); spot instances available. Cons: priced per second with a 1-minute minimum; CPU only. Use cases: long-running CPU processes.
• AWS Batch - service that starts a cluster and executes jobs on it. Pros: full control of the VM; spot instances available. Cons: slow startup time (~1-4min); priced per second with a 1-minute minimum. Use cases: medium-running CPU/GPU workloads with multiple tasks.
21. DataTalks.Club
CPU vs GPU for ML
• Speed of single inference/training
• Speed of batch inference
• Cost per inference/training
• Scalability
22. DataTalks.Club
Inference cost - Inception V3

| Service | Type | Inference time (s) | Cost per hour | Cost per prediction | Cost of 1M predictions | Cost per month | Lambda predictions |
|---|---|---|---|---|---|---|---|
| Lambda 3GB RAM | 2 vCPU | 0.338 | $0.18 | $0.0000179 | $17.90 | - | - |
| AWS EC2 c5a.large | on demand | 0.177 | $0.077 | $0.000003786 | $3.79 | $55.44 | 3.1M |
| AWS EC2 c5a.large | spot | 0.177 | $0.032 | $0.000001573 | $1.57 | $23.04 | 1.29M |
| AWS EC2 p2.xlarge | on demand | 0.057 | $0.90 | $0.00001425 | $14.25 | $648.00 | 36.2M |
| AWS EC2 p2.xlarge | spot | 0.057 | $0.27 | $0.000004275 | $4.28 | $194.40 | 10.86M |
| AWS EC2 p3.2xlarge | on demand | 0.027 | $3.06 | $0.00002295 | $22.95 | $2203.20 | 123.1M |
| AWS EC2 p3.2xlarge | spot | 0.027 | $0.918 | $0.000006885 | $6.89 | $660.96 | 36.93M |
| AWS EC2 inf1.large | on demand | 0.0095 | $0.368 | $0.000000971 | $0.97 | $264.96 | 14.8M |
| AWS EC2 inf1.large | spot | 0.0095 | $0.1104 | $0.000000291 | $0.29 | $79.49 | 4.44M |

(Lambda predictions: the number of Lambda predictions that would cost the same as running the instance for a month.)
23. DataTalks.Club
Inference cost - Inception V3
[Same table as slide 22.]
24. DataTalks.Club
Price comparison - CPU
• C5 Large instance - 2 vCPU, 4GB RAM
• AWS Lambda: 3GB RAM x $0.00001667 x 3600 = $0.18 per hour
• AWS Fargate: 4GB RAM x $0.0044 + 2 vCPU x $0.0404 = $0.098 per hour
• AWS Batch: C5 Large on demand = $0.085 per hour; C5 Large spot = $0.033 per hour
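The same arithmetic, reproduced as a few lines of Python so it is easy to re-check against current prices (rates as quoted on the slide):

```python
# Per-hour cost arithmetic from the slide (prices as quoted there).
LAMBDA_GB_SECOND = 0.00001667                  # $ per GB-second
lambda_per_hour = 3 * LAMBDA_GB_SECOND * 3600  # 3GB RAM for one hour
fargate_per_hour = 4 * 0.0044 + 2 * 0.0404     # 4GB RAM + 2 vCPU

print(f"Lambda:  ${lambda_per_hour:.3f}/h")    # ~$0.180
print(f"Fargate: ${fargate_per_hour:.3f}/h")   # ~$0.098
```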
25. DataTalks.Club
Price comparison - GPU
• P2 Xlarge instance - 1 NVIDIA K80 GPU, 4 vCPU
• Amazon SageMaker (recently reduced prices by up to 18%): P2 Xlarge ML instance = $1.12 per hour; P2 Xlarge ML instance spot = $0.33 per hour
• AWS Batch: P2 Xlarge on demand = $0.90 per hour; P2 Xlarge reserved = $0.42 per hour; P2 Xlarge spot = $0.27 per hour
29. DataTalks.Club
Platform-as-a-Service
[Stack diagram from slide 5 repeated to situate PaaS.]
30. DataTalks.Club
Microservice connectors
• REST API - synchronous, short-term processes; simple intermediate logic; doesn't trace the whole process; cheap
• Event queue - asynchronous, long-term processes; simple intermediate logic; doesn't trace the whole process; cheap
• Orchestrator - asynchronous, long-term processes; complex intermediate logic; traces the process; expensive
31. DataTalks.Club
Cloud native orchestrators
• Native support for FaaS and CaaS
• Central monitoring
• Central logging and tracing
• On-demand scaling*
32. DataTalks.Club
AWS Step Functions
• Graph-based workflow
• Processing nodes - support for Fargate, ECS, SageMaker, Lambda, Batch, Glue, EMR
• Logic for custom error handling
• Parallel dynamic execution
• Branching and loops
• Scheduler and waiter
• Pay-as-you-go ($0.025 per 1,000 state transitions)
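To make the pay-per-state-transition model concrete, here is a hedged sketch of a two-state machine in Amazon States Language (a Lambda preprocessing step followed by a Batch training job), deployed with boto3; all ARNs and names are hypothetical.

```python
import json
import boto3

# Minimal two-state workflow: Lambda preprocess, then a Batch training job.
# The .sync integration makes Step Functions wait for the Batch job to finish.
definition = {
    "StartAt": "Preprocess",
    "States": {
        "Preprocess": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:preprocess",
            "Next": "Train",
        },
        "Train": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": "train",
                "JobQueue": "dl-training-queue",
                "JobDefinition": "dl-training-jobdef",
            },
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="dl-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsRole",
)
```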
33. DataTalks.Club
SageMaker Pipelines
• DAG-based workflow
• Processing nodes - SageMaker Processing, Training, DataBrew
• Logic for custom error handling
• Data lineage tracking
• Human review step
• Free*
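For comparison, a sketch of the same preprocess-then-train shape with the SageMaker Pipelines SDK; it assumes the `processor` and `estimator` objects from the earlier sketches, and the pipeline name and role ARN are hypothetical.

```python
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

# Reuses the ScriptProcessor and Estimator from the earlier sketches.
preprocess_step = ProcessingStep(
    name="Preprocess", processor=processor, code="preprocess.py"
)
train_step = TrainingStep(
    name="Train", estimator=estimator,
    inputs={"train": "s3://my-bucket/processed/"},
)
train_step.add_depends_on([preprocess_step])

pipeline = Pipeline(name="dl-pipeline", steps=[preprocess_step, train_step])
pipeline.upsert(role_arn="arn:aws:iam::123456789012:role/SageMakerRole")
pipeline.start()
```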
34. DataTalks.Club
Managed Apache Airflow (MWAA)
• DAG-based workflow (Airflow)
• Processing nodes - anything Airflow can support (works with plugins)
• High flexibility
• Doesn't scale automatically
• Pay per instance-time
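And the Airflow equivalent, as a minimal DAG file; in MWAA you upload a file like this to the environment's S3 DAGs folder. The task bodies are stubs, so this is a shape sketch rather than a working pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def preprocess():
    ...  # pull and transform data

def train():
    ...  # launch or run training

# Same preprocess -> train shape as the Step Functions sketch.
with DAG(
    "dl_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="preprocess", python_callable=preprocess) >> \
        PythonOperator(task_id="train", python_callable=train)
```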
35. DataTalks.Club
Cloud native orchestrators
• AWS Step Functions - PaaS. Pros: scales automatically; integrations with many AWS services; pay-as-you-go. Cons: can't run in a local environment; pipeline must be handled manually as an artifact. Use cases: integration with production infrastructure.
• SageMaker Pipelines - PaaS. Pros: scales automatically; automatically tracks model lineage; free. Cons: can't run in a local environment; only integrates with SageMaker services. Use cases: research and semi-production workflows that require more validation.
• MWAA (Airflow) - hosted Airflow. Pros: easy to run locally; extremely flexible with plugins. Cons: scales manually; pay per instance time; pipeline must be handled manually as an artifact. Use cases: integration with complex production infrastructure.
36. DataTalks.Club
Serverless approach
• Use scalable processing nodes (AWS Lambda) for short/parallel processing
• Use a scalable container service (AWS Batch) for heavy, parallel processing and GPU training jobs
• Use Amazon SageMaker for GPU training jobs and distributed training
• Use a scalable container service (AWS Fargate) for long-running processing
• Use an orchestrator (AWS Step Functions) to organize workflows
37. DataTalks.Club
Data preprocessing
Task
• Getting and transforming data from multiple sources
Challenges
• Combination of multiple frameworks and libraries
• Scaling based on load
• Combination of heavy, long-running, and parallel processing
38. DataTalks.Club
Amazon SageMaker
• SageMaker Processing jobs for heavy processing
• Modular approach
• Parallel data download and parsing
• FaaS for parallel processing
[Diagram: data download feeding SageMaker, followed by scalable processing.]
39. DataTalks.Club
AWS Batch/Fargate
• Manual handling of input/output data
• FaaS for parallel processing
• CaaS for heavy processing
• Modular approach
• Parallel data download and parsing
[Diagram: data download feeding Batch/Fargate, with input data read from S3 and output data written back to S3, followed by scalable processing.]
40. DataTalks.Club
How do you know if this is for you
• You have peak loads and want to scale automatically
• You have custom logic (scheduler, error handling, etc.) in your business logic
• You want a customizable pipeline with multiple frameworks
41. DataTalks.Club
How do you know if this is NOT for you
• You need to run synchronous data processing workflows => in this case calling AWS Lambda or a cluster is easier
• You have a low CPU/RAM-consuming workflow and want to optimize costs => in this case SQS/Kinesis + AWS Lambda is a cheaper solution
43. DataTalks.Club
ML/DL training
Tasks
• Training and publishing the model
• Checking multiple sets of hyperparameters
• Handling semi-automatic logic
Challenges
• High cost of GPU instances
• Higher level of uncertainty compared to conventional software engineering
44. DataTalks.Club
Amazon SageMaker
• Automatic handling of hyperparameters and metrics
• Automatic handling of model and input data
• Automatic hyperparameter optimization
• Automatic checkpoint handling
• Error handling on each branch
• Distributed training
[Diagram with Preprocessor, SageMaker, Mapper, and Handler nodes.]
45. DataTalks.Club
AWS Batch
• Parallel training on multiple sets of hyperparameters
• Central gathering of the results
• Error handling on each branch
• Capability for a feedback loop
• Test after training
[Diagram: Preprocessor fanning out to multiple ML jobs on Batch, with a Mapper and Publisher gathering results; parameters, checkpoint data, and the model are stored in S3.]
46. DataTalks.Club
External infrastructure
• Integrating the production cloud environment with on-premise infrastructure
• Preparing data and providing access
• Handling publishing of the completed model
[Diagram: Preprocessor and Handler exchanging data, parameters, metrics, and the model with an external GPU through S3 and an async task.]
47. DataTalks.Club
How do you know if this is for you
• You have peak loads and want to scale automatically
• You need to run training jobs occasionally and want to minimize idle time
• You have custom logic (scheduler, error handling, etc.) in your business logic
• You want to integrate external infrastructure or multiple AWS services
48. DataTalks.Club
How do you know if this is NOT for you
• You need to run synchronous model training workflows => in this case using a cluster is easier, as Step Functions doesn't support synchronous workflows
• You need to maximize training speed => in this case using a cluster minimizes startup time
50. DataTalks.Club
ML/DL inference
Task
• Making predictions based on new incoming data
Challenges
• Handling production requirements: latency/load/cost
• Handling multiple frameworks
• Handling model versioning
• Implementing custom logic for choosing the result
55. DataTalks.Club
ML/DL inference pipeline
• A/B testing / multi-armed bandit to roll out new models
• Scalable inference that allows running batches in parallel
• Allows a modular approach (multiple frameworks)
[Diagram: preprocessor/feature extractor feeding Inference A and Inference B, a gather-data step, and a post processor.]
56. DataTalks.Club
How to import models
Import from S3:
• Keras - h5 files
• TensorFlow - pb/ckpt files
• PyTorch - pth files
Models in package:
• TensorFlow - TFLite export
• PyTorch - ONNX export
• OpenVINO export
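A sketch of the import-from-S3 route for a Keras h5 model inside a Lambda function; the bucket and key are hypothetical, and /tmp is Lambda's only writable path.

```python
import boto3
import tensorflow as tf

# Download the model artifact from S3 to Lambda's writable /tmp,
# then load it with Keras. Bucket and key are hypothetical names.
s3 = boto3.client("s3")
s3.download_file("my-model-bucket", "models/v1/model.h5", "/tmp/model.h5")
model = tf.keras.models.load_model("/tmp/model.h5")
```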
59. DataTalks.Club
Lifehacks for serverless inference
• Store the model in memory for warm invocations
• Use AWS EFS for storing the model
• Store part of the model with the libraries
• Download the model in parallel from storage
• Separate layers onto multiple Lambdas and chain them
• Batch the workload
• Balance RAM/timeout to optimize your costs
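A sketch combining two of these lifehacks, under the assumption of a hypothetical EFS mount path: the loaded model is cached at module scope so warm invocations skip the load, and the whole batch is predicted in a single call.

```python
import numpy as np
import tensorflow as tf

# Hypothetical EFS mount path configured on the Lambda function.
EFS_MODEL_PATH = "/mnt/models/inception_v3.h5"
_model = None  # module scope survives between warm invocations

def handler(event, context):
    global _model
    if _model is None:
        # Cold start only: load once from EFS; warm calls reuse the cache.
        _model = tf.keras.models.load_model(EFS_MODEL_PATH)
    # Batch the workload: predict on all items in one call.
    batch = np.array(event["batch"], dtype=np.float32)
    return _model.predict(batch).tolist()
```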
60. DataTalks.Club
How do you know if this is for you
• You want to deploy your model for a pet project
• You want to make a simple MVP for your startup/project
• You have a simple model and this architecture will reduce cost
• You have peak loads and it is hard to manage clusters
61. DataTalks.Club
How do you know if this is NOT for you
• You want a real-time response
• Your model requires a lot of data
• Your model requires a lot of processing power
• You want to handle a large number of requests (>10M per month) => in this case a cluster would be a more suitable approach
62. DataTalks.Club
Repositories to check
https://github.com/ryfeus/lambda-packs
https://github.com/ryfeus/gcf-packs
• Packages for AWS Lambda and Google Cloud Functions including:
  • TensorFlow (including 2.0), PyTorch - deep learning
  • Scikit-learn, LightGBM, H2O - machine learning
  • Scikit-image, SciPy, OpenCV, Tesseract - image processing
  • spaCy - natural language processing
63. DataTalks.Club
Summary
• Cloud native orchestrators are convenient for constructing scalable end-to-end deep learning pipelines
• There are multiple services at your disposal for constructing deep learning workflows, and the right choice depends on your context
• You can deploy these kinds of workflows fairly easily, even for research projects
64. DataTalks.Club
Thank you!
Packages for AWS Lambda and Google Cloud Functions:
https://github.com/ryfeus/lambda-packs
https://github.com/ryfeus/gcf-packs
Infrastructure configuration files for AWS Step Functions, AWS Batch, AWS Fargate, Amazon SageMaker:
https://github.com/ryfeus/stepfunctions2processing
Link to my website: https://ryfeus.io