Auto-scaling in the cloud: a case study
2013-11-29
David Veksler
Case Study: MediaHub media sharing SOA portal
Architecture Overview
Amazon Cloud
Native Client Apps
MediaHub iOS App
External AD Apps
Spotlight Classroom App
MediaHub Android App
MediaHub Windows Agent (Video)
Web Application
REST API
(staff upload / parents view media)
Media Editing/Review UI
(for DOS)
Media Upload Web UI
(for CC, Teachers)
AWS Metadata Store
Amazon RDS DB
(MS SQL)
AWS S3 File Store
CN Media Storage
(Beijing)
RU Media Storage
(Ireland)
ID Media Storage
(Singapore)
Corp DC
CRM Platform
CRM Service
CRM DB
SOA Clients
EFP Website
EFP Apps
Media Pipeline
FFmpeg Video File Encoder
Image Processing/Optimization
Media Metadata/File Persistence Services
Sample app in AWS: MediaHub
Services currently used:
• EC2 for the MediaHub website
• RDS for SQL Server DB
• S3 for Media Storage
• CloudFront for CDN
• SES for Email Notifications
• Route 53 for DNS
• Elastic Beanstalk for configuration management
Could also use:
• Elastic Transcoder for media encoding
• Lambda for web services
• SQS for media processing pipeline
• S3 for website
1: Using Amazon Elastic Beanstalk to push and update applications to the cloud
Amazon Elastic Beanstalk = PaaS (Platform as a Service)
Auto scaling websites in the cloud
Cross-server login cookies/view state
How not to do cloud
Don’t set up pre-defined servers to match
physical architecture
Don’t maintain development environments
• Traditional release management:
  • V2.0 => Development -> QA -> Staging -> Live
  • Copy/deploy code to persistent servers
• Cloud-optimized release management:
  • V2.0 => DEV.company.com -> QA.company.com -> www.company.com
  • Create a new environment for each version
  • Promote the new environment's load balancer (ELB) in DNS to launch the new version
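The DNS promotion step can be sketched as building a Route 53 change batch that repoints the public name at the new environment's load balancer. This is a minimal sketch, not the deck's actual tooling: the function name, the record name, and the ELB hostname are hypothetical placeholders, and it assumes the site is served through a CNAME record.

```python
def build_promotion_change(record_name: str, new_elb_dns: str, ttl: int = 60) -> dict:
    """Build a Route 53 change batch that repoints record_name at the new ELB."""
    return {
        "Comment": f"Promote {new_elb_dns} to {record_name}",
        "Changes": [
            {
                # UPSERT creates the record if missing, otherwise overwrites it.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,  # a short TTL lets a bad release be rolled back quickly
                    "ResourceRecords": [{"Value": new_elb_dns}],
                },
            }
        ],
    }


# Hypothetical names, for illustration only.
change = build_promotion_change(
    "www.company.com.", "mediahub-v2.example.elb.amazonaws.com"
)
print(change["Changes"][0]["ResourceRecordSet"]["ResourceRecords"][0]["Value"])
```

With boto3 installed, a dictionary like this could be passed as `ChangeBatch` to the Route 53 client's `change_resource_record_sets` call, together with the hosted zone ID. Rolling back is the same operation pointed at the previous environment.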
Don’t use EBS volumes as a permanent data
store
Don’t use EBS (VM) for storage
As an example, volumes that operate with 20
GB or less of modified data since their most
recent Amazon EBS snapshot can expect an
annual failure rate (AFR) of between 0.1% –
0.5%, where failure refers to a complete loss of
the volume. This compares with commodity
hard disks that will typically fail with an AFR of
around 4%, making EBS volumes 10 times
more reliable than typical commodity disk
drives.
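To see why a low per-volume failure rate still argues against treating EBS as the only copy of your data, the quoted AFR figures can be turned into a fleet-level probability. This is a rough sketch that assumes volume failures are independent; the fleet size of 50 volumes is a made-up example, not a number from the case study.

```python
def prob_any_loss(afr: float, volumes: int, years: float = 1.0) -> float:
    """Probability that at least one of `volumes` independent volumes is lost within `years`."""
    survive_one = (1.0 - afr) ** years      # one volume survives the whole period
    return 1.0 - survive_one ** volumes     # complement of "every volume survives"


# AFR values from the quote above: 0.1% and 0.5% for EBS, about 4% for commodity disks.
for afr in (0.001, 0.005, 0.04):
    p = prob_any_loss(afr, volumes=50)
    print(f"AFR {afr:.1%}: {p:.1%} chance of losing at least one of 50 volumes in a year")
```

Even at the EBS rates the probability grows with every volume and every year, which is the reason to keep durable data in a service such as S3 or RDS and to snapshot anything that remains on EBS.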
Don’t set up dedicated VMs if a cloud service
exists
Amazon Services Used
• Amazon EC2 virtual machines for Application Servers: at least 2 per region
• Amazon EC2 Virtual Appliance: 1 instance per region (JPEGmini service)
• Amazon RDS (SQL Server): 1 instance per region
• Amazon S3: One bucket per country
• Amazon Load Balancer: one per region
• Amazon Elastic IP: one per region
• Amazon SES: Email Service for Sharing media & weekly reports
• Amazon CloudFront CDN: TBD
• Amazon Transcoder Video Conversion: TBD
How to do cloud
Automate your infrastructure
• Create robust configurations that survive a reboot/re-start/re-deploy
• Elastic (virtual) IP
• Deploy builds to S3
• “Create a Self Healing and Self-discoverable environment which is
more resilient to hardware failure” – AWS Best Practices
Control permissions
• Create a separate IAM user for every service that needs to access AWS, and grant it only the permissions it needs
• Use SSL
• Don’t embed access keys in clients, get them dynamically over SSL
• Use X.509 certificates for authentication
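As an illustration of "only needed permissions", the sketch below builds an IAM policy document that lets an upload service read and write objects under one prefix of one bucket and nothing else. The bucket name, the prefix, and the function name are hypothetical; the policy shape (`Version`, `Statement`, `Effect`, `Action`, `Resource`) is the standard IAM JSON policy format.

```python
import json


def least_privilege_s3_policy(bucket: str, prefix: str = "") -> dict:
    """Return an IAM policy allowing object reads and writes under one bucket prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Object-level read/write only: no delete, no listing, no bucket administration.
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


# Hypothetical bucket and prefix, for illustration only.
policy = least_privilege_s3_policy("mediahub-media-example", "uploads/")
print(json.dumps(policy, indent=2))
```

Attaching one such narrowly scoped policy per service user means a leaked key exposes a single prefix rather than the whole account.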
Automate configuration
Design for redundancy and parallelism
• Multi-thread your Amazon S3 requests, as detailed in the Best Practices paper
• Multi-thread your Amazon SimpleDB GET and BATCHPUT requests
• Create a JobFlow using the Amazon Elastic MapReduce service for each of your daily batch processes (indexing, log analysis, etc.), which will compute the job in parallel and save time
• Use the Elastic Load Balancing service to spread your load across multiple web app servers dynamically
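The advice to multi-thread storage requests can be sketched with a thread pool that issues many fetches concurrently. This is a minimal sketch under stated assumptions: `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real S3 download so the example runs without AWS credentials.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_all(
    keys: Iterable[str],
    fetch_one: Callable[[str], bytes],
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Fetch every key concurrently and return a mapping of key to body."""
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs fetch_one on worker threads and yields results in input order.
        bodies = pool.map(fetch_one, keys)
        return dict(zip(keys, bodies))


def fake_fetch(key: str) -> bytes:
    """Stand-in for a network call; a real version would download the object."""
    return f"contents of {key}".encode()


results = fetch_all(["a.jpg", "b.jpg", "c.mp4"], fake_fetch)
print(sorted(results))
```

In real use, `fetch_one` might wrap a boto3 `get_object` call and read the response body. Because the work is dominated by network waits, threads overlap the latency of many requests even in Python.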
Conclusion
• Cloud is not for all applications.
• Cloud costs more if you use it like traditional physical hardware.
• Cloud architecture is a merger of infrastructure and software architecture. Applications must be designed for cloud scale.

Editor's Notes

  1. Merge code to release branch
  2. Re-publish application to cloud
  3. Sync machine keys to share login sessions between servers
  4. Applications should scale. Don’t try to fit everything on one server.
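
Note 3 above (syncing machine keys) matters because a login cookie signed by one server must validate on every other server behind the load balancer. The sketch below illustrates the idea with a generic HMAC-signed cookie. It is not the ASP.NET machine key format, and the key value and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def sign_cookie(user_id: str, shared_key: bytes) -> str:
    """Return 'user_id|signature', signed with the key shared by all servers."""
    mac = hmac.new(shared_key, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}|{mac}"


def verify_cookie(cookie: str, shared_key: bytes) -> Optional[str]:
    """Return the user id if the signature matches this server's key, else None."""
    user_id, _, mac = cookie.rpartition("|")
    expected = hmac.new(shared_key, user_id.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    return user_id if hmac.compare_digest(mac, expected) else None


SHARED_KEY = b"example-key-synced-to-every-server"  # hypothetical value
cookie = sign_cookie("parent42", SHARED_KEY)

print(verify_cookie(cookie, SHARED_KEY))                 # server holding the synced key
print(verify_cookie(cookie, b"a-different-server-key"))  # server with its own key
```

A server that generated its own key rejects cookies issued by its peers, so users appear logged out whenever the load balancer routes them to a different instance. Distributing the same key to every instance removes that dependency on sticky sessions.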