Monitor All Your Things: Amazon CloudWatch in Action with BBC (DEV302) - AWS re:Invent 2018
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Brian Dennehy, Director of Engineering, AWS
Christopher Darlaston, Development Lead, BBC
Monitoring matters because …
Operational: visibility, real-time troubleshooting
Business: customer experience, applications = $$
Full stack visibility
Short-lived resources
More devices
More data
Monolithic to microservice
Faster release velocity
Cloud native defaults
Single solution for metrics and logs
Highly scalable
Monitor with automation
Logs, Metrics, Alarms, Events, Dashboards, Agent & APIs
and Log analytics
Collect Monitor Act Analyze
Christopher Darlaston, BBC
• Development lead in interactive TV
• Seven years in interactive TV on BBC iPlayer, Sport, News and Frameworks
• Previous 13 years working at Sun Microsystems in their web teams
BBC Interactive TV overview
Giving users access to additional TV programming.
Press the red button on your TV remote control to enjoy additional coverage from the big events:
• Glastonbury Festival (Music)
• Wimbledon (Tennis, Grand Slam)
• Olympic Games
Simplified architecture: Unconnected Red Button
[Architecture diagram. Labels: AWS Direct Connect, Main, Data Playout, Carousel Injection, Amazon EC2, Private, Public, Amazon EFS, Carousel Storage, Carousel Creation, Amazon DynamoDB, Amazon Kinesis, AWS Lambda, Amazon S3, Amazon CloudWatch]
Collecting metrics and logs via CloudWatch agent
{
  "metrics": {
    "aggregation_dimensions": [ ["AutoScalingGroupName", "InstanceId"], ["AutoScalingGroupName"] ],
    "append_dimensions": { "InstanceId": "${aws:InstanceId}", "AutoScalingGroupName": "${aws:AutoScalingGroupName}" },
    "metrics_collected": {
      "mem": { "measurement": ["mem_used", "mem_cached", "mem_used_percent", "mem_available_percent"] },
      "processes": { "measurement": ["running", "sleeping", "dead"] },
      "disk": { "resources": ["/"], "measurement": ["free", "used_percent"] },
      "netstat": { "measurement": ["tcp_established"] },
      "cpu": { "totalcpu": false, "resources": ["*"], "measurement": ["cpu_usage_iowait", "cpu_usage_idle", "cpu_usage_nice"] }
    },
    "namespace": "live-broadcast-red-button-linkmanager-api"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [{
          "file_path": "/var/log/broadcast-red-button-linkmanager-api/output.log",
          "log_group_name": "live-broadcast-red-button-linkmanager-api-infrastructure-ApplicationLog-J8FGOWKDFOE8",
          "log_stream_name": "{instance_id}-{ip_address}-output.log"
        }]
      }
    },
    "log_stream_name": "{instance_id}-{hostname}"
  },
  "agent": { "logfile": "/var/log/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log", "metrics_collection_interval": 60 }
}
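A configuration like this is strict JSON, so a single typographic quote or missing brace makes the file unparseable. The following is a minimal sketch, assuming Python, of checking a config before it is deployed. It parses a trimmed copy of the configuration above and summarizes what would be collected. The helper name summarize_agent_config is hypothetical and not part of the CloudWatch agent.

```python
import json

# Trimmed copy of the agent configuration from the slide (assumption: only a
# few sections are reproduced here to keep the sketch short).
AGENT_CONFIG = """
{
  "metrics": {
    "namespace": "live-broadcast-red-button-linkmanager-api",
    "metrics_collected": {
      "mem": { "measurement": ["mem_used", "mem_used_percent"] },
      "disk": { "resources": ["/"], "measurement": ["free", "used_percent"] }
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [{
          "file_path": "/var/log/broadcast-red-button-linkmanager-api/output.log",
          "log_stream_name": "{instance_id}-{ip_address}-output.log"
        }]
      }
    }
  },
  "agent": { "metrics_collection_interval": 60 }
}
"""


def summarize_agent_config(text):
    """Parse an agent config and return a small summary dict (hypothetical helper).

    json.loads raises json.JSONDecodeError on malformed input, for example
    when typographic quotes have crept into the file.
    """
    config = json.loads(text)
    metrics = config.get("metrics", {})
    collected = metrics.get("metrics_collected", {})
    files = (
        config.get("logs", {})
        .get("logs_collected", {})
        .get("files", {})
        .get("collect_list", [])
    )
    return {
        "namespace": metrics.get("namespace"),
        "measurements": {
            plugin: settings.get("measurement", [])
            for plugin, settings in collected.items()
        },
        "log_files": [entry["file_path"] for entry in files],
        "interval_seconds": config.get("agent", {}).get("metrics_collection_interval"),
    }


if __name__ == "__main__":
    print(json.dumps(summarize_agent_config(AGENT_CONFIG), indent=2))
```

Running the same check in a build pipeline would catch a broken config before it reaches an instance.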
Collecting metrics from log extraction
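The slide itself is a screenshot, so the exact setup used in the talk is not recoverable. One way CloudWatch Logs turns log lines into metrics is a metric filter. The following is a minimal sketch, assuming Python with boto3 and assuming the application log marks failures with the literal term ERROR. The filter name, metric name and filter pattern are hypothetical.

```python
def build_metric_filter_request(log_group_name, namespace):
    """Build keyword arguments for the CloudWatch Logs PutMetricFilter call."""
    return {
        "logGroupName": log_group_name,
        "filterName": "application-errors",  # hypothetical name
        "filterPattern": "ERROR",  # matches log events containing the term ERROR
        "metricTransformations": [
            {
                "metricName": "ApplicationErrorCount",  # hypothetical name
                "metricNamespace": namespace,
                "metricValue": "1",   # add 1 for every matching log event
                "defaultValue": 0.0,  # report 0 when no event matches
            }
        ],
    }


def apply_metric_filter(request):
    """Send the request to AWS. Needs boto3 and credentials, so it is not run here."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("logs").put_metric_filter(**request)


request = build_metric_filter_request(
    "live-broadcast-red-button-linkmanager-api-infrastructure-ApplicationLog-J8FGOWKDFOE8",
    "live-broadcast-red-button-linkmanager-api",
)
print(request["metricTransformations"][0]["metricName"])
```

Keeping the request builder separate from the AWS call makes the filter definition easy to review and test without touching an account.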
Monitoring view—Typical day
Alerting on issues using CloudWatch alarms
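The alarm definitions on this slide were shown as a screenshot. As an illustration only, here is a minimal sketch, assuming Python with boto3, of an alarm on the mem_used_percent metric that the agent configuration earlier in the deck publishes per Auto Scaling group. The threshold, evaluation window, alarm name and SNS topic ARN are hypothetical values, not taken from the talk.

```python
def build_alarm_request(namespace, auto_scaling_group, sns_topic_arn):
    """Build keyword arguments for the CloudWatch PutMetricAlarm call."""
    return {
        "AlarmName": f"{namespace}-high-memory",  # hypothetical naming scheme
        "Namespace": namespace,
        "MetricName": "mem_used_percent",
        "Dimensions": [
            {"Name": "AutoScalingGroupName", "Value": auto_scaling_group}
        ],
        "Statistic": "Average",
        "Period": 60,                 # seconds, matches the agent's collection interval
        "EvaluationPeriods": 5,       # hypothetical: five consecutive minutes
        "Threshold": 90.0,            # hypothetical: 90 percent memory used
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "breaching",  # no data from the agent counts as a problem
        "AlarmActions": [sns_topic_arn],
    }


def apply_alarm(request):
    """Send the request to AWS. Needs boto3 and credentials, so it is not run here."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("cloudwatch").put_metric_alarm(**request)


request = build_alarm_request(
    "live-broadcast-red-button-linkmanager-api",
    "example-asg",  # hypothetical Auto Scaling group name
    "arn:aws:sns:eu-west-1:123456789012:example-alerts",  # hypothetical topic
)
print(request["AlarmName"])
```

Treating missing data as breaching is a design choice: for a service that must always be on, an instance that stops reporting is itself worth an alert.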
Monitoring view—Day of trouble
Diagnosing: Is it downstream or on premises?
Diagnosing—Is it upstream of us?
Flexibility—Dashboard created during incident
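The dashboard from the incident was shown as a screenshot. CloudWatch dashboards are defined by a JSON body, which is what makes it practical to put one together quickly during an incident. The following is a minimal sketch, assuming Python with boto3, that builds a two-widget body from metrics named in the agent configuration earlier in the deck. The dashboard name, region, Auto Scaling group name and widget layout are hypothetical.

```python
import json


def build_dashboard_body(namespace, auto_scaling_group, region):
    """Return a CloudWatch dashboard body (JSON string) with two metric widgets."""

    def widget(title, metric_name, x):
        return {
            "type": "metric",
            "x": x,
            "y": 0,
            "width": 12,   # the dashboard grid is 24 units wide
            "height": 6,
            "properties": {
                "title": title,
                "view": "timeSeries",
                "region": region,
                "stat": "Average",
                "period": 60,
                "metrics": [
                    [namespace, metric_name, "AutoScalingGroupName", auto_scaling_group]
                ],
            },
        }

    body = {
        "widgets": [
            widget("Memory used (%)", "mem_used_percent", 0),
            widget("CPU I/O wait", "cpu_usage_iowait", 12),
        ]
    }
    return json.dumps(body)


def apply_dashboard(name, body):
    """Send the dashboard to AWS. Needs boto3 and credentials, so it is not run here."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("cloudwatch").put_dashboard(DashboardName=name, DashboardBody=body)


body = build_dashboard_body(
    "live-broadcast-red-button-linkmanager-api",
    "example-asg",  # hypothetical Auto Scaling group name
    "eu-west-1",    # hypothetical region
)
print(len(json.loads(body)["widgets"]))
```

A call such as apply_dashboard("red-button-incident", body) would publish it, with the name again being a placeholder.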
Monitoring view—Full day
What did we learn?
• Split the problem space
• Log everything
• Do you have the right dashboards?
Why do we use CloudWatch?
“Our interactive services, just like picking up your phone and making a call, needs to just work at all times. We deliver journalistic content and news, which are fundamental services that our users expect in real-time and on-demand without failure.”
1. End-to-end visibility for on-premises and cloud
   Log analytics for both on-premises and Amazon Web Services (AWS)
2. Monitoring with automation
   Resource optimization, snapshot graphs
3. Correlate and investigate issues in real time
   CloudWatch agent and dashboards
4. More time back to focus on BBC innovation
Reinvent & simplify: Lessons learned inform our future
What’s new
NEW: CloudWatch Automatic Dashboards
CloudWatch simplifies infrastructure monitoring with a default getting-started experience
Dynamic, self-updating AWS infrastructure dashboards
Building operational dashboards takes time & experience
“I just want a quick, summary view …”
“I just want some default recommendations …”
“Oh, not all statistics and visualizations are created equal …”
“I create dashboards one by one and someone always forgets …”
Automatic: Explore account & resource-based views of health and performance metrics
Smart: Browse defaults with built-in AWS best practices, including metrics, statistics, and visualizations
Dynamic: Auto-scrub metrics of resources that no longer exist to reduce stale views via resource-aware updates
Granular: Easily drill down for troubleshooting with AWS or resource group filtering
Session key takeaways
• Collect everything with ease using defaults for building operational visibility
• Automate monitoring with new CloudWatch automated operational dashboards
• Correlate metrics and logs for faster troubleshooting and understanding root cause
More sessions:
AWS booth for demos
DEV375 “Amazon CloudWatch Logs Is Making an Exciting Announcement!”
DEV311 “Breaking Observability Chaos: Best Practices to Monitor AWS Cloud Native Apps”
DEV301R “AIOPs: Find Your Needle in the Haystack”
DEV306R1 “Monitoring for Operational Outcomes and Application Insights: Best Practices Workshop”
DEV303R “Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWatch”
WIN202L “Leadership Session: Learn about 10 Years’ of Windows and .NET Innovation on AWS with 10 New Launches”
What else is new:
Metric Math alarms
CloudWatch Logs Insights
CloudWatch agent with collectd and StatsD integration
Snapshot graphs
Events support for AWS Organizations
Thank you!
Brian Dennehy
Christopher Darlaston
