Failsafe Mechanism for
Yahoo Homepage
Using Apache Storm & Apache Traffic Server
Pushkar Sachdeva (psachdev@yahoo-inc.com)
Kit Chan (kichan@yahoo-inc.com)
05/2016
Failsafe
“A fail-safe or fail-secure device is one that, in the event of a
specific type of failure, responds in a way that will cause no
harm, or at least a minimum of harm, to other devices or to
personnel”
Overall Architecture
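[Diagram: Browser traffic reaches either the Yahoo stack (Property ATS in front of the Property Serving Stack) or the AWS stack (ELB in front of EC2 ATS, backed by S3). The Crawler on Storm, on the Yahoo side, pushes crawled content to S3 (offstage data flow). In normal operation the online request flow goes to the Yahoo property; in failsafe mode, failsafe is activated automatically and traffic is switched to AWS.]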
AWS Failsafe Stack Architecture
[Diagram: within one region, an Elastic Load Balancer accepts http and https traffic and forwards it to ATS EC2 instances (each running an ATS server) inside a VPC with a security group, spread across Availability Zone #1 and Availability Zone #2. The ATS servers read from an S3 bucket holding the data crawled from Yahoo, and CloudWatch monitors the stack. S3 replication copies the bucket across regions: US W (Oregon), US E (North Virginia), Ireland and Singapore.]
EC2 Instance - ATS
● Instance (Amazon Linux)
○ t2.large - burstable
○ 2 vCPUs / 8 GB RAM / 1 Gbps network
● Apache Traffic Server
○ For caching
■ Negative caching enabled
■ RAM disk used as cache storage
○ Health Check/S3 Authentication plugin
○ Lua plugin
■ Query Parameters Sorting
■ Simple Device Detection
■ Error handling
● CloudWatch Log Agent/Monitoring Scripts
● Autoscaling based on the number of incoming requests
● Deployment Mechanism using Terraform / Packer
[Diagram: an EC2 instance running Amazon Linux with ATS (backed by a 4 GB ramdisk cache), the CloudWatch agent and the CloudWatch monitoring scripts.]
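The deck does not show the rules behind the Lua plugin's "Simple Device Detection". The C++ sketch below is a hypothetical stand-in, assuming a plain User-Agent substring heuristic that maps each request to the three device classes the crawler stores (desktop, smartphone, tablet). The function name `detectDevice` and the tokens it checks are illustrative only.

```cpp
#include <cassert>
#include <string>

// Hypothetical heuristic (not the production Lua rules): classify a request
// into one of the three device classes used for failsafe content.
std::string detectDevice(const std::string& userAgent) {
    auto has = [&userAgent](const char* token) {
        return userAgent.find(token) != std::string::npos;
    };
    // Tablets first: iPads, and Android devices that omit the "Mobile" token.
    if (has("iPad") || (has("Android") && !has("Mobile"))) {
        return "tablet";
    }
    // Phones usually advertise "iPhone" or a "Mobile" token.
    if (has("iPhone") || has("Mobile")) {
        return "smartphone";
    }
    return "desktop";
}
```

Checking for tablets first matters because iPad User-Agents also carry a "Mobile/..." token and would otherwise be classified as phones.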
Lua script example - sorting query parameters
function do_remap()
    local query = ts.client_request.get_uri_args()
    if query ~= nil and query ~= '' then
        local params = {}
        -- Split on '&'; the '+' quantifier skips empty segments such as "a=1&&b=2"
        for param in query:gmatch('[^&]+') do
            params[#params + 1] = param
        end
        -- Sort the "key=value" pairs so URLs differing only in parameter order share one cache entry
        table.sort(params)
        ts.client_request.set_uri_args(table.concat(params, '&'))
    end
    return 0
end
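For readers less familiar with Lua, the same normalization can be sketched in C++. This is an illustrative equivalent, not code from the deployment; `sortQuery` is a hypothetical name, and the byte-wise ordering of `std::string` matches Lua's `table.sort` on strings only under the default C locale.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Mirrors the Lua remap logic: split on '&', drop empty segments,
// sort the "key=value" strings byte-wise, and join them back with '&'.
std::string sortQuery(const std::string& query) {
    std::vector<std::string> params;
    std::istringstream in(query);
    std::string param;
    while (std::getline(in, param, '&')) {
        if (!param.empty()) {
            params.push_back(param);
        }
    }
    std::sort(params.begin(), params.end());

    std::string out;
    for (const auto& p : params) {
        if (!out.empty()) out += '&';
        out += p;
    }
    return out;
}
```

For example, `q=1&a=2` and `a=2&q=1` both normalize to `a=2&q=1`, the same ordering the crawler uses when it builds failsafe paths.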
CloudWatch Logs Agent Configuration
# /etc/awslogs/awslogs.conf
# Custom ATS log enabled and in /usr/local/var/log/trafficserver/mon
[monlog]
datetime_format = %Y-%m-%d %H:%M:%S
file = /usr/local/var/log/trafficserver/mon.*
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = monlog
Perl Script calling CloudWatch Monitoring Lib
+ if ($report_chr) {
+   my $result = `/usr/local/bin/traffic_line -r proxy.node.cache_hit_ratio_avg_10s`;
+   add_metric('CacheHitRatio', 'Percent', 100 * $result);
+ }
+ if ($report_tef) {
+   # Sum the 10-second average error fractions for every error category
+   my $total = 0;
+   foreach my $error (qw(connect_failed aborts possible_aborts pre_accept_hangups
+                         early_hangups empty_hangups other)) {
+     $total += `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.$error`;
+   }
+   add_metric('TransErrorFraction', 'Percent', 100 * $total);
+ }
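The `TransErrorFraction` metric is the sum of the seven per-category error fractions, scaled to a percentage. A minimal C++ sketch of that arithmetic, assuming each `traffic_line -r` call returns a decimal fraction followed by a newline (`transErrorPercent` is a hypothetical name and the sample strings in the usage are made up):

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Sums raw traffic_line outputs (e.g. "0.010000\n") and converts to a percent,
// mirroring 100 * (connect_failed + aborts + ... + other) in the Perl patch.
double transErrorPercent(const std::vector<std::string>& rawOutputs) {
    double total = 0.0;
    for (const auto& raw : rawOutputs) {
        total += std::stod(raw);  // stod skips leading whitespace and stops at '\n'
    }
    return 100.0 * total;
}
```

Three outputs of `0.010000`, `0.005000` and `0` give a metric value of 1.5 (percent).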
CloudWatch Dashboard
AWS Autoscaling - Terraform Configuration File
resource "aws_autoscaling_group" "fsfb_base_load" {
availability_zones = ["${split(",", var.zones)}"]
name = "${var.env}_fsfb_base_load-${aws_launch_configuration.fsfb_ats.name}"
load_balancers = ["${aws_elb.fsfb_elb.name}"]
max_size = 8
min_size = 2
health_check_grace_period = 180
health_check_type = "ELB"
desired_capacity = 2
launch_configuration = "${aws_launch_configuration.fsfb_ats.name}"
force_delete = true
wait_for_elb_capacity = 2
lifecycle {
create_before_destroy = true
}
}
AWS Autoscaling - Terraform Configuration File (Cont’d)
resource "aws_autoscaling_policy" "fsfb_scale_out_med" {
name = "${var.env}_fsfb_scale_out_med"
scaling_adjustment = 8
adjustment_type = "ExactCapacity"
cooldown = 300
autoscaling_group_name = "${aws_autoscaling_group.fsfb_base_load.name}"
}
AWS Autoscaling - Terraform Configuration File (Cont’d)
resource "aws_cloudwatch_metric_alarm" "fsfb_upper_medium_rps" {
alarm_name = "${var.env}_fsfb_upper_medium_rps"
comparison_operator = "GreaterThanOrEqualToThreshold"
evaluation_periods = "1"
period = "60"
metric_name = "RequestCount"
namespace = "AWS/ELB"
statistic = "Sum"
threshold = "75000"
dimensions {
LoadBalancerName = "${aws_elb.fsfb_elb.name}"
}
alarm_description = "This metric monitors medium elb traffic"
alarm_actions = ["${aws_autoscaling_policy.fsfb_scale_out_med.arn}", "${var.sns_email_topic}"]
}
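Read together, the three resources say: when the ELB's `RequestCount` summed over one 60-second period reaches 75,000 (about 1,250 requests per second), the alarm triggers `fsfb_scale_out_med`, which, because its `adjustment_type` is `ExactCapacity`, sets the group's desired capacity to exactly 8 rather than adding 8 instances. The C++ model below is a hypothetical illustration of that arithmetic; the `Group` struct and function names are invented, and it assumes the result stays within `min_size` and `max_size`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of aws_autoscaling_group.fsfb_base_load (sizes copied from the config).
struct Group {
    int minSize = 2;
    int maxSize = 8;
    int desired = 2;
};

// fsfb_upper_medium_rps: Sum of RequestCount over one 60 s period,
// compared with GreaterThanOrEqualToThreshold against 75000.
bool alarmFires(long requestsInPeriod, long threshold = 75000) {
    return requestsInPeriod >= threshold;
}

// fsfb_scale_out_med: ExactCapacity replaces the desired count with the
// adjustment (here 8) instead of adding to it; assumed bounded by [min, max].
void applyExactCapacity(Group& group, int adjustment) {
    group.desired = std::max(group.minSize, std::min(adjustment, group.maxSize));
}
```

Because `max_size` is also 8, this one policy takes the group straight from its baseline of 2 instances to full capacity.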
Escalate Plugin in Apache Traffic Server (ATS)
● ATS is a proxy server that sits between the user and the origin server
● ‘Escalate’ is an ATS plugin that fetches content from failsafe servers when the
origin server fails to provide a ‘good’ response.
[Diagram: User → ATS → Origin Server]
Escalate Plugin in ATS (Continued)
● ‘Escalate’ is a remap plugin, enabled per mapping rule in remap.config:
map http://games.yahoo.com/ http://some_origin.yahoo.com/ @plugin=ats_escalate.so @pparam=some_label
● Loads a global configuration containing ‘label’ definitions
● Sample ‘label’ definition:
"some_label" : {
"enable" : 1,
"response" : {
"500" : {
"mode" : "url",
"url" : "http://brb.yahoo.net/$h/$d/$p$x"
}
}
}
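The label definition reads as a per-status-code lookup table: escalation happens only when the label is enabled and the origin's status code has an entry. A hypothetical C++ model of that lookup (the struct and function names are illustrative, not the plugin's internal types) might look like this:

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative mirror of one action from the JSON configuration.
struct EscalateAction {
    std::string mode;  // e.g. "url"
    std::string url;   // failsafe URL template
};

// Illustrative mirror of one label.
struct EscalateLabelConfig {
    bool enabled = false;
    std::map<int, EscalateAction> responses;  // keyed by origin status code
};

// Returns the URL template to escalate to, or an empty string when the
// label is disabled or the status code has no "url" action configured.
std::string escalationTemplate(const EscalateLabelConfig& label, int status) {
    if (!label.enabled) return "";
    auto it = label.responses.find(status);
    if (it == label.responses.end() || it->second.mode != "url") return "";
    return it->second.url;
}
```

With the sample `some_label`, a 500 from the origin yields the `brb.yahoo.net` template, while any other status is passed through untouched.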
Escalate Plugin in ATS (Continued)
● Runs in the ‘TS_HTTP_READ_RESPONSE_HDR_HOOK’
● Uses ‘TSHttpTxnRedirectUrlSet’ to fetch content from failsafe servers
if (EscalateLabel::ACTION_URL == entry->second.mode) {
std::string content;
MyExpander expander(txn, entry->second.url);
if (!expander(entry->second.url, config->get_device_type_header(), config->get_default_device_type())) {
TSError("[" PLUGIN_TAG "] invalid expansion");
TSDebug(PLUGIN_TAG, "invalid expansion");
goto finish;
}
expander.swap(content);
url_str = TSstrdup(content.c_str());
length = content.size();
if (url_str) {
TSHttpTxnRedirectUrlSet(txn, url_str, length); // Transfers ownership
}
}
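The excerpt delegates URL construction to `MyExpander`, whose source is not shown. Assuming, from the crawler's path mapping later in the deck, that `$h` is the original host, `$d` the device type, `$p` the request path and `$x` the sorted query parameters as matrix parameters, a minimal hypothetical expander could be:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical expander: replaces "$<letter>" tokens using a lookup table.
// Unknown tokens (and a trailing '$') are copied through unchanged.
std::string expandTemplate(const std::string& tmpl,
                           const std::map<char, std::string>& vars) {
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i) {
        if (tmpl[i] == '$' && i + 1 < tmpl.size()) {
            auto it = vars.find(tmpl[i + 1]);
            if (it != vars.end()) {
                out += it->second;
                ++i;  // skip the variable letter
                continue;
            }
        }
        out += tmpl[i];
    }
    return out;
}
```

Under those assumptions, `http://brb.yahoo.net/$h/$d/$p$x` expands to the same path the crawler writes to, which is what lets ATS find the failsafe copy.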
Apache Storm Crawler
● Based on scalable Apache Storm platform
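● Topology: a graph of spouts and bolts that runs until it is killed
● Spouts: sources that emit streams of tuples
● Bolts: processing steps that consume tuples and may emit new ones
[Diagram: example topology with two spouts feeding a chain of three bolts]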
Apache Storm Crawler (Continued)
Simplified Topology
[Diagram: Cron Feeder, Changelog Feeder, IndexUrl Config Fetcher, Url Fetcher, Memory Storage Writer, Response Processor, Response Uploader, Custom Event Queue Updater, Custom Event Queue Feeder]
Apache Storm Crawler (Continued)
● Crawls content for desktop, smartphone and tablet
● Supports domain-level configuration for request headers, query params and output storage.
● Failsafe URL path mapping example:
Mapping: http://{failsafe_host}/{original_domain}/{device}/{path};{sorted_query_params_as_matrix_params}
URL: https://www.yahoo.com/news/trump-unveils-foreign-policy-plan-201628138.html?q=1&a=2
S3 file path: http://brb.yahoo.net/www.yahoo.com/smartphone/news/trump-unveils-foreign-policy-plan-201628138.html;a=2;q=1
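The mapping can be expressed as a small function. The C++ sketch below is hypothetical (`failsafeUrl` is not taken from the crawler's code); it assumes the original URL has no user-info or port, drops any `#fragment`, and reproduces the example above.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper for the mapping
// http://{failsafe_host}/{original_domain}/{device}/{path};{sorted_query_params_as_matrix_params}
std::string failsafeUrl(const std::string& url, const std::string& device,
                        const std::string& failsafeHost) {
    // Drop the scheme ("https://") if present.
    std::string rest = url;
    const auto schemeEnd = rest.find("://");
    if (schemeEnd != std::string::npos) rest = rest.substr(schemeEnd + 3);

    // Ignore any fragment, then split off the query string.
    rest = rest.substr(0, rest.find('#'));
    std::string query;
    const auto q = rest.find('?');
    if (q != std::string::npos) {
        query = rest.substr(q + 1);
        rest = rest.substr(0, q);
    }

    // Host is everything before the first '/', path everything after it.
    const auto slash = rest.find('/');
    const std::string host = rest.substr(0, slash);
    const std::string path = (slash == std::string::npos) ? "" : rest.substr(slash + 1);

    // Sort the query parameters and re-encode them as ;key=value matrix params.
    std::vector<std::string> params;
    std::istringstream in(query);
    std::string p;
    while (std::getline(in, p, '&')) {
        if (!p.empty()) params.push_back(p);
    }
    std::sort(params.begin(), params.end());

    std::string out = "http://" + failsafeHost + "/" + host + "/" + device + "/" + path;
    for (const auto& param : params) out += ";" + param;
    return out;
}
```

Encoding the sorted query as matrix parameters keeps the whole request in the object path, so one S3 key exists per distinct (host, device, path, parameter set).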
High Level Architecture
[Diagram: User, Proxy Router, Proxy Cache, Origin Server, Failsafe Crawler and AWS storage, with numbered arrows for three flows: the user request flow, the offline crawler request flow (which PUTs crawled pages into AWS storage), and an optional request flow to fetch failsafe content.]
Benefits
● No manual intervention needed to serve failsafe content
● Granular control: failsafe behavior is configured per label and per response code
● More relevant content is shown to the user
● Failsafe content is cached in proxy layer
Pitfalls/Limitations
● Lagging Crawler
● Handling additional Crawler traffic
● Bucket specific experience
● Malformed Page
Future of Resiliency - multi-cloud for failsafe
● Additional Cloud Vendor
○ E.g. Google Cloud Platform
○ S3 vs Google Cloud Storage
○ EC2/ELB vs Google Compute Engine
○ CloudWatch vs Stackdriver
● Changes in Apache Storm Crawler
○ Apache jclouds can be used to create objects in either S3 or Google Cloud Storage
● Changes in deployment (Terraform) and configuration (Chef)
○ Both support GCP and AWS
● Route 53 DNS failover can be used to switch traffic to GCP
Future of Resiliency
● Speculative Retry
void SpeculativeRetryPlugin::handleInputComplete()
{
orig_url_ = transaction_.getClientRequest().getUrl().getUrlString();
//fetch original request
sendFetchRequest(orig_url_, false);
//start a one-off timer which gives a callback after time_ msecs
Async::execute<AsyncTimer>(this, new AsyncTimer(AsyncTimer::TYPE_ONE_OFF, time_), getMutex());
}
void SpeculativeRetryPlugin::handleAsyncComplete(AsyncTimer &async_timer)
{
async_timer.cancel();
//active_fetch keeps track if we have received the response of original request yet or not
//if not initiate a retry request
if(!active_fetch_) {
sendFetchRequest(orig_url_, true);
}
}
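Outside ATS, the same idea (send the request, and if no response has arrived after a deadline, send a duplicate and use whichever answer arrives first) can be sketched with standard C++ threads. This is an illustration under stated assumptions, not the plugin above: `fetchWithSpeculativeRetry` is a hypothetical name, the fetch function is assumed never to throw, and the losing request is left to finish in a detached thread rather than being cancelled.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <thread>
#include <utility>

// Hypothetical hedged-request helper; not an ATS API.
// Assumes `fetch` never throws (an exception in a detached thread terminates the process).
std::string fetchWithSpeculativeRetry(std::function<std::string()> fetch,
                                      std::chrono::milliseconds retryAfter) {
    // State shared with the worker threads; it outlives this call if a fetch is still running.
    struct Shared {
        std::promise<std::string> result;
        std::atomic<bool> done{false};
    };
    auto shared = std::make_shared<Shared>();
    std::future<std::string> first = shared->result.get_future();

    // Starts one fetch in a detached thread; only the first to finish publishes its body.
    auto launch = [shared, fetch]() {
        std::thread([shared, fetch]() {
            std::string body = fetch();
            if (!shared->done.exchange(true)) {
                shared->result.set_value(std::move(body));
            }
        }).detach();
    };

    launch();  // original request
    if (first.wait_for(retryAfter) != std::future_status::ready) {
        launch();  // speculative retry: the original is still outstanding
    }
    return first.get();  // whichever response arrived first
}
```

A promise guarded by an atomic flag is used instead of two `std::async` futures because the destructor of a `std::async` future blocks until its task completes, which would make the caller wait for the slower request anyway.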
Thank you. Questions?

ANSYS Mechanical APDL Introductory Tutorials.pdf
 
BITCOIN HEIST RANSOMEWARE ATTACK PREDICTION
BITCOIN HEIST RANSOMEWARE ATTACK PREDICTIONBITCOIN HEIST RANSOMEWARE ATTACK PREDICTION
BITCOIN HEIST RANSOMEWARE ATTACK PREDICTION
 

Failsafe Mechanism for Yahoo Homepage

  • 1. Failsafe Mechanism for Yahoo Homepage Using Apache Storm & Apache Traffic Server
      Pushkar Sachdeva (psachdev@yahoo-inc.com), Kit Chan (kichan@yahoo-inc.com), 05/2016
  • 3. Failsafe “A fail-safe or fail-secure device is one that, in the event of a specific type of failure, responds in a way that will cause no harm, or at least a minimum of harm, to other devices or to personnel”
  • 4. Overall Architecture
      [Diagram: Browser; Yahoo side: Property ATS, Property Serving Stack, Crawler on Storm; AWS side: ELB, EC2 ATS, S3. Annotations: ‘Auto activate Failsafe’, ‘Switch traffic to AWS’. Legend: Offstage Data Flow; Online Request Flow, Normal Operation; Online Request Flow, Failsafe Mode.]
  • 5. AWS Failsafe Stack Architecture
      [Diagram components: Elastic Load Balancer; Security Group; VPC with ATS EC2 Instances (ATS Server) in Availability Zone #1 and Availability Zone #2; S3 Bucket holding crawled data from Yahoo; Cloudwatch; S3 replication across regions US W (Oregon), US E (North Virginia), Ireland and Singapore; links labeled https and http.]
  • 6. EC2 Instance - ATS
      ● Instance (Amazon Linux)
          ○ t2.large - burstable
          ○ 2 vCPUs / 8 GB RAM / 1 Gbps network
      ● Apache Traffic Server
          ○ For caching
              ■ Negative caching enabled
              ■ Ramdisk used
          ○ Health Check/S3 Authentication plugin
          ○ Lua plugin
              ■ Query Parameters Sorting
              ■ Simple Device Detection
              ■ Error handling
      ● Cloudwatch Log Agent/Monitoring Scripts
      ● Autoscaling based on the number of incoming requests
      ● Deployment Mechanism using Terraform / Packer
      [Diagram: ATS with a 4 GB ramdisk cache on Amazon Linux, alongside the Cloudwatch Agent and Cloudwatch Monitoring Scripts]
  • 7. Lua script example - sorting query parameters
          function do_remap()
            -- raw query string of the client request
            local query = ts.client_request.get_uri_args()
            if (query ~= nil and query ~= '') then
              local result = {}
              local i = 1
              -- split on '&' and keep only non-empty key=value segments
              for value in query:gmatch '([^&]*)' do
                if (value ~= '') then
                  result[i] = value
                  i = i + 1
                end
              end
              -- sort so the same parameters in any order yield the same URL
              table.sort(result)
              local sorted_query = table.concat(result, '&')
              ts.client_request.set_uri_args(sorted_query)
            end
          end
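For checking the normalization outside ATS, here is a minimal C++ sketch of the same logic as the Lua remap function: split on '&', drop empty segments, sort, rejoin. The function name `sort_query` is hypothetical and not part of the ts_lua API. One difference worth knowing: `std::sort` on `std::string` compares bytes, whereas Lua compares strings with `strcoll`, so the two orderings agree under the C locale.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper mirroring the Lua do_remap() logic above.
std::string sort_query(const std::string& query) {
  std::vector<std::string> parts;
  std::istringstream in(query);
  // Split on '&' and keep only non-empty "key=value" segments.
  for (std::string part; std::getline(in, part, '&');) {
    if (!part.empty()) {
      parts.push_back(part);
    }
  }
  // Byte-wise lexicographic order of whole segments.
  std::sort(parts.begin(), parts.end());
  // Rejoin with '&', like table.concat(result, '&').
  std::string out;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) {
      out += '&';
    }
    out += parts[i];
  }
  return out;
}
```

Whole segments are sorted rather than keys alone, exactly as in the Lua version, so repeated keys keep a deterministic order by value.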
  • 8. Cloudwatch Log Agent Conf
          # /etc/awslogs/awslogs.conf
          # Custom ATS log enabled and in /usr/local/var/log/trafficserver/mon
          [monlog]
          datetime_format = %Y-%m-%d %H:%M:%S
          file = /usr/local/var/log/trafficserver/mon.*
          buffer_duration = 5000
          log_stream_name = {instance_id}
          initial_position = start_of_file
          log_group_name = monlog
  • 9. Perl Script calling Cloudwatch Monitoring Lib
      + if ($report_chr) {
      +   my $result = `/usr/local/bin/traffic_line -r proxy.node.cache_hit_ratio_avg_10s`;
      +   add_metric('CacheHitRatio', 'Percent', 100 * $result);
      + }
      + if ($report_tef) {
      +   my $connect_failed = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.connect_failed`;
      +   my $aborts = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.aborts`;
      +   my $possible_aborts = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.possible_aborts`;
      +   my $pre_accept_hangups = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.pre_accept_hangups`;
      +   my $early_hangups = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.early_hangups`;
      +   my $empty_hangups = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.empty_hangups`;
      +   my $other = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.other`;
      +
      +   add_metric('TransErrorFraction', 'Percent', 100 * ($connect_failed + $aborts + $possible_aborts + $pre_accept_hangups + $early_hangups + $empty_hangups + $other));
      + }
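Both metrics in the patch are plain arithmetic on the traffic_line values. The C++ sketch below restates that arithmetic, assuming (as the 100× scaling and the 'Percent' unit suggest) that each `transaction_frac_avg_10s.errors.*` value and the cache hit ratio are fractions between 0 and 1. The function names are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// TransErrorFraction: 100 * (sum of the per-error-type fractions),
// matching the add_metric('TransErrorFraction', ...) call above.
double trans_error_percent(const std::vector<double>& error_fractions) {
  return 100.0 * std::accumulate(error_fractions.begin(),
                                 error_fractions.end(), 0.0);
}

// CacheHitRatio: 100 * proxy.node.cache_hit_ratio_avg_10s.
double cache_hit_percent(double hit_ratio) { return 100.0 * hit_ratio; }
```

For example, connect_failed = 0.01, aborts = 0.02 and other = 0.005, with the other four at zero, would be reported as roughly 3.5 percent.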
  • 11. AWS Autoscaling - Terraform Configuration File
          resource "aws_autoscaling_group" "fsfb_base_load" {
            availability_zones = ["${split(",", var.zones)}"]
            name = "${var.env}_fsfb_base_load-${aws_launch_configuration.fsfb_ats.name}"
            load_balancers = ["${aws_elb.fsfb_elb.name}"]
            max_size = 8
            min_size = 2
            health_check_grace_period = 180
            health_check_type = "ELB"
            desired_capacity = 2
            launch_configuration = "${aws_launch_configuration.fsfb_ats.name}"
            force_delete = true
            wait_for_elb_capacity = 2
            lifecycle {
              create_before_destroy = true
            }
          }
  • 12. AWS Autoscaling - Terraform Configuration File (Cont’d)
          resource "aws_autoscaling_policy" "fsfb_scale_out_med" {
            name = "${var.env}_fsfb_scale_out_med"
            scaling_adjustment = 8
            adjustment_type = "ExactCapacity"
            cooldown = 300
            autoscaling_group_name = "${aws_autoscaling_group.fsfb_base_load.name}"
          }
  • 13. AWS Autoscaling - Terraform Configuration File (Cont’d)
          resource "aws_cloudwatch_metric_alarm" "fsfb_upper_medium_rps" {
            alarm_name = "${var.env}_fsfb_upper_medium_rps"
            comparison_operator = "GreaterThanOrEqualToThreshold"
            evaluation_periods = "1"
            period = "60"
            metric_name = "RequestCount"
            namespace = "AWS/ELB"
            statistic = "Sum"
            threshold = "75000"
            dimensions {
              LoadBalancerName = "${aws_elb.fsfb_elb.name}"
            }
            alarm_description = "This metric monitors medium elb traffic"
            alarm_actions = ["${aws_autoscaling_policy.fsfb_scale_out_med.arn}", "${var.sns_email_topic}"]
          }
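Read together, the three Terraform resources say: when the ELB's summed RequestCount over one 60-second period reaches 75,000 (1,250 requests per second on average), set the group to exactly 8 instances, then wait out a 300-second cooldown. The C++ sketch below models that decision. It is a simplified, hypothetical model (no instance warm-up, no alarm state machine, and no scale-in policy, which the slides do not show), not a description of how AWS implements Auto Scaling.

```cpp
#include <algorithm>
#include <cassert>

// Values taken from the Terraform resources on slides 11-13.
constexpr int kMinSize = 2;             // min_size
constexpr int kMaxSize = 8;             // max_size
constexpr long kThreshold = 75000;      // RequestCount Sum per 60 s period
constexpr int kScaleOutCapacity = 8;    // scaling_adjustment, ExactCapacity
constexpr long kCooldownSeconds = 300;  // cooldown

// Hypothetical, simplified view of the Auto Scaling group.
struct GroupState {
  int desired = 2;            // desired_capacity
  long last_scale_time = -1;  // seconds; -1 means the policy has not fired
};

// Evaluates one 60 s period (evaluation_periods = 1). If the summed request
// count reaches the threshold and the group is outside its cooldown, the
// desired capacity is set to exactly kScaleOutCapacity, kept within
// [kMinSize, kMaxSize].
void on_period(GroupState& g, long request_sum, long now) {
  const bool alarm = request_sum >= kThreshold;  // GreaterThanOrEqualToThreshold
  const bool cooling =
      g.last_scale_time >= 0 && now - g.last_scale_time < kCooldownSeconds;
  if (alarm && !cooling) {
    g.desired = std::min(std::max(kScaleOutCapacity, kMinSize), kMaxSize);
    g.last_scale_time = now;
  }
}
```

Because the policy uses ExactCapacity with the same value as max_size, a single alarm jumps the group straight from the base load of 2 to its ceiling of 8 rather than growing step by step.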
  • 14. Escalate Plugin in Apache Traffic Server (ATS)
      ● ATS is a proxy server that sits between the user and the origin server
      ● ‘Escalate’ is an ATS plugin that fetches content from failsafe servers when the origin server fails to provide a ‘good’ response.
      [Diagram: User, ATS, Origin Server]
  • 15. Escalate Plugin in ATS (Continued)
      ● ‘Escalate’ is a remap plugin
          map http://games.yahoo.com/ http://some_origin.yahoo.com/ @plugin=ats_escalate.so @pparam=some_label
      ● Loads global configuration with ‘label’ definitions
      ● Sample ‘label’ definition
          "some_label" : {
            "enable" : 1,
            "response" : {
              "500" : {
                "mode" : "url",
                "url" : "http://brb.yahoo.net/$h/$d/$p$x"
              }
            }
          }
  • 16. Escalate Plugin in ATS (Continued)
      ● Runs in ‘READ_RESPONSE_HDR_HOOK’
      ● Uses ‘TSHttpTxnRedirectUrlSet’ to fetch content from failsafe servers
          if (EscalateLabel::ACTION_URL == entry->second.mode) {
            std::string content;
            MyExpander expander(txn, entry->second.url);
            if (!expander(entry->second.url, config->get_device_type_header(), config->get_default_device_type())) {
              TSError("[" PLUGIN_TAG "] invalid expansion");
              TSDebug(PLUGIN_TAG, "invalid expansion");
              goto finish;
            }
            expander.swap(content);
            url_str = TSstrdup(content.c_str());
            length = content.size();
            if (url_str) {
              TSHttpTxnRedirectUrlSet(txn, url_str, length); // Transfers ownership
            }
          }
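The escalate decision on slides 15 and 16 can be sketched as: look up the origin's status code in the label's "response" map and, for a "url" entry, expand the URL template into the failsafe URL. This C++ sketch is an illustration only. `EscalateEntry`, `RequestParts`, `expand` and `escalate_target` are hypothetical stand-ins for the plugin's `EscalateLabel` and `MyExpander`, which are not shown, and the meaning of `$h`, `$d`, `$p` and `$x` (original host, device, path, sorted query as matrix parameters) is an assumption inferred from the crawler's S3 path mapping on slide 19.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical in-memory form of one "response" entry of a label.
struct EscalateEntry {
  std::string mode;  // "url" in the sample label
  std::string url;   // template, e.g. "http://brb.yahoo.net/$h/$d/$p$x"
};

// Assumed meaning of the template tokens (inferred from slide 19).
struct RequestParts {
  std::string host;    // $h: original domain, e.g. "www.yahoo.com"
  std::string device;  // $d: device type, e.g. "smartphone"
  std::string path;    // $p: request path without the leading '/'
  std::string extra;   // $x: sorted query as ";k=v" matrix parameters
};

// Replaces each $h, $d, $p and $x in the template; anything else is copied.
std::string expand(const std::string& tmpl, const RequestParts& r) {
  std::string out;
  for (std::size_t i = 0; i < tmpl.size(); ++i) {
    if (tmpl[i] == '$' && i + 1 < tmpl.size()) {
      const char t = tmpl[i + 1];
      const std::string* v = t == 'h'   ? &r.host
                             : t == 'd' ? &r.device
                             : t == 'p' ? &r.path
                             : t == 'x' ? &r.extra
                                        : nullptr;
      if (v != nullptr) {
        out += *v;
        ++i;  // skip the token letter
        continue;
      }
    }
    out += tmpl[i];
  }
  return out;
}

// Returns the failsafe URL for this status code, or "" when the label has
// no "url" entry for it (the origin response is then passed through).
std::string escalate_target(const std::map<int, EscalateEntry>& label,
                            int status, const RequestParts& r) {
  const auto it = label.find(status);
  if (it == label.end() || it->second.mode != "url") {
    return "";
  }
  return expand(it->second.url, r);
}
```

In the real plugin the resulting string is what gets handed to `TSHttpTxnRedirectUrlSet`; this sketch stops at producing the URL.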
  • 17. Apache Storm Crawler
      ● Based on scalable Apache Storm platform
      ● Topology
      ● Spouts
      ● Bolts
      [Diagram: example topology with spouts feeding bolts]
  • 18. Apache Storm Crawler (Continued)
      [Diagram, Simplified Topology. Components: Cron Feeder, Changelog Feeder, IndexUrl Config Fetcher, Url Fetcher, Memory Storage Writer, Response Processor, Response Uploader, Custom Event Queue Updater, Custom Event Queue Feeder]
  • 19. Apache Storm Crawler (Continued)
      ● Crawls content for desktop, smartphone and tablet
      ● Supports domain-level configuration for request headers, query params and output storage
      ● Failsafe URL path mapping example
          Mapping: http://{failsafe_host}/{original_domain}/{device}/{path};{sorted_query_params_as_matrix_params}
          URL: https://www.yahoo.com/news/trump-unveils-foreign-policy-plan-201628138.html?q=1&a=2
          S3 file path: http://brb.yahoo.net/www.yahoo.com/smartphone/news/trump-unveils-foreign-policy-plan-201628138.html;a=2;q=1
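A minimal C++ sketch of the failsafe path mapping on slide 19, from an original URL and device type to the S3 path. The function `failsafe_path` is hypothetical, and its URL parsing is deliberately naive: it assumes `scheme://host/path?query` with no port, user info or fragment, and sorts each `key=value` pair as an opaque string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Builds http://{failsafe_host}/{original_domain}/{device}/{path};{params}
// from an absolute URL of the assumed form scheme://host/path?query.
std::string failsafe_path(const std::string& url,
                          const std::string& failsafe_host,
                          const std::string& device) {
  // Locate the host between "://" and the first following '/'.
  std::size_t host_start = url.find("://");
  host_start = (host_start == std::string::npos) ? 0 : host_start + 3;
  std::size_t path_start = url.find('/', host_start);
  if (path_start == std::string::npos) {
    path_start = url.size();
  }
  const std::string domain = url.substr(host_start, path_start - host_start);

  // Separate the path (with leading '/') from the query string.
  const std::string rest = url.substr(path_start);
  const std::size_t q = rest.find('?');
  const std::string path = rest.substr(0, q);
  const std::string query = (q == std::string::npos) ? "" : rest.substr(q + 1);

  // Split the query on '&', drop empty pieces and sort the pairs.
  std::vector<std::string> params;
  std::istringstream in(query);
  for (std::string p; std::getline(in, p, '&');) {
    if (!p.empty()) {
      params.push_back(p);
    }
  }
  std::sort(params.begin(), params.end());

  // Emit the sorted pairs as ";k=v" matrix parameters after the path.
  std::string out = "http://" + failsafe_host + "/" + domain + "/" + device + path;
  for (const auto& p : params) {
    out += ";" + p;
  }
  return out;
}
```

Sorting the parameters here matches what the Lua remap script does on the ATS side, so a request whose parameters arrive in a different order still maps to the same stored object.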
  • 20. High Level Architecture
      [Diagram components: User, Proxy Router, Proxy Cache, Origin Server, Failsafe Crawler, AWS storage, connected by two sets of numbered steps (1-10 and 1-7) and a PUT to storage. Legend: Offline Crawler Request Flow; User Request Flow; Optional Request Flow to fetch failsafe content.]
  • 21. Benefits
      ● No manual intervention needed to serve failsafe content
      ● Granular control
      ● More relevant content is shown to user
      ● Failsafe content is cached in proxy layer
  • 22. Pitfalls/Limitations
      ● Lagging Crawler
      ● Handling additional Crawler traffic
      ● Bucket specific experience
      ● Malformed Page
  • 23. Future on Resiliency - multi-cloud for failsafe
      ● Additional Cloud Vendor
          ○ E.g. Google Cloud Platform
          ○ S3 vs Google Cloud Storage
          ○ EC2/ELB vs Google Compute Engine
          ○ Cloudwatch vs Stackdriver
      ● Changes in Apache Storm Crawler
          ○ Can use Apache jclouds to create objects in storage in S3 or Google Cloud Storage
      ● Changes in deployment using Terraform / configuration using Chef
          ○ GCP & AWS are supported
      ● Route 53 can be used to do failover to GCP
  • 24. Future on Resiliency
      ● Speculative Retry
          void SpeculativeRetryPlugin::handleInputComplete() {
            orig_url_ = transaction_.getClientRequest().getUrl().getUrlString();
            // fetch original request
            sendFetchRequest(orig_url_, false);
            // start a timer which will give a callback after 'time_' msecs
            Async::execute<AsyncTimer>(this, new AsyncTimer(AsyncTimer::TYPE_ONE_OFF, time_), getMutex());
          }

          void SpeculativeRetryPlugin::handleAsyncComplete(AsyncTimer &async_timer) {
            async_timer.cancel();
            // active_fetch_ tracks whether the response to the original request has been received yet;
            // if not, initiate a retry request
            if (!active_fetch_) {
              sendFetchRequest(orig_url_, true);
            }
          }
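The timeline the two speculative-retry callbacks set up reduces to a small formula. With original latency L, retry latency R and timer T, the response arrives at L when L is at most T; otherwise a retry is sent at T and the response arrives at min(L, T + R). The C++ model below encodes that, assuming the plugin serves whichever response arrives first (the slide does not show the response handling) and treating a response that lands exactly when the timer fires as arriving in time. `speculative_retry` and `RetryOutcome` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Outcome of one request under the speculative-retry model.
struct RetryOutcome {
  bool retried;        // true if the timer fired before the original response
  long completion_ms;  // time at which the first response arrives
};

// The original fetch starts at t = 0 and the one-off timer fires at timer_ms.
// If the original has not answered by then, a retry is sent at timer_ms and
// the earlier of the two responses is used (assumed first-response-wins).
RetryOutcome speculative_retry(long original_latency_ms, long retry_latency_ms,
                               long timer_ms) {
  if (original_latency_ms <= timer_ms) {
    return {false, original_latency_ms};  // answered before the timer fired
  }
  return {true, std::min(original_latency_ms, timer_ms + retry_latency_ms)};
}
```

For example, with T = 100 ms and a retry that takes 40 ms, an original request stuck for 500 ms is answered at 140 ms instead; the cost is one extra origin request for every response slower than T.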