Monitor All Your Things: Amazon CloudWatch in Action with BBC (DEV302) - AWS re:Invent 2018
- 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Monitor All Your Things: Amazon CloudWatch in Action with BBC
DEV302
Brian Dennehy, Director of Engineering, AWS
Christopher Darlaston, Development Lead, BBC
- 4.
Monitoring matters because …
• Visibility
• Real-time troubleshooting
• Customer experience
• Applications = $$
Business / Operational
- 5.
• Full-stack visibility
• Short-lived resources
• ↑ Devices
• ↑ Data
• Monolithic to microservice
• Faster release velocity
- 6.
• Cloud native defaults
• Single solution for metrics and logs
• Highly scalable
• Monitor with automation
Logs, Metrics, Alarms, Events, Dashboards, Agent & APIs
- 7.
and Log analytics
Collect Monitor Act Analyze
- 9.
Christopher Darlaston—BBC
• Development lead in interactive TV
• Seven years in interactive TV on BBC iPlayer, Sport, News and Frameworks
• Previous 13 years working at Sun Microsystems in their web teams
- 10.
BBC Interactive TV overview
Giving users access to additional TV programming.
Press the red button on your TV remote control to enjoy additional coverage from the big events:
• Glastonbury Festival (Music)
• Wimbledon (Tennis, Grand Slam)
• Olympic Games
- 11.
Simplified architecture: Unconnected Red Button
[Architecture diagram. Labels: AWS Direct Connect; Main; Data Playout; Carousel Injection; Carousel Storage; Carousel Creation; Private; Public. AWS services shown: Amazon EC2 (three instances), Amazon EFS, Amazon DynamoDB, Amazon Kinesis, AWS Lambda, Amazon S3, Amazon CloudWatch.]
- 12.
Collecting metrics and logs via CloudWatch agent
{
  "metrics": {
    "aggregation_dimensions": [["AutoScalingGroupName", "InstanceId"], ["AutoScalingGroupName"]],
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}",
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}"
    },
    "metrics_collected": {
      "mem": { "measurement": ["mem_used", "mem_cached", "mem_used_percent", "mem_available_percent"] },
      "processes": { "measurement": ["running", "sleeping", "dead"] },
      "disk": { "resources": ["/"], "measurement": ["free", "used_percent"] },
      "netstat": { "measurement": ["tcp_established"] },
      "cpu": { "totalcpu": false, "resources": ["*"], "measurement": ["cpu_usage_iowait", "cpu_usage_idle", "cpu_usage_nice"] }
    },
    "namespace": "live-broadcast-red-button-linkmanager-api"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [{
          "file_path": "/var/log/broadcast-red-button-linkmanager-api/output.log",
          "log_group_name": "live-broadcast-red-button-linkmanager-api-infrastructure-ApplicationLog-J8FGOWKDFOE8",
          "log_stream_name": "{instance_id}-{ip_address}-output.log"
        }]
      }
    },
    "log_stream_name": "{instance_id}-{hostname}"
  },
  "agent": {
    "logfile": "/var/log/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log",
    "metrics_collection_interval": 60
  }
}
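Hand-edited agent configuration files break easily (a single typographic quote makes the JSON invalid), so one option is to generate the file instead. The following is a minimal sketch under stated assumptions, not the BBC's tooling: the function names `build_agent_config` and `render` are hypothetical, and only a subset of the slide's settings is reproduced.

```python
import json


def build_agent_config(namespace: str, log_path: str, log_group: str) -> dict:
    """Build a CloudWatch agent config dict (hypothetical helper, subset of the slide)."""
    return {
        "agent": {"metrics_collection_interval": 60},
        "metrics": {
            "namespace": namespace,
            # Tag every metric with the instance and its Auto Scaling group.
            "append_dimensions": {
                "InstanceId": "${aws:InstanceId}",
                "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
            },
            # Roll metrics up per instance and per Auto Scaling group.
            "aggregation_dimensions": [
                ["AutoScalingGroupName", "InstanceId"],
                ["AutoScalingGroupName"],
            ],
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {"resources": ["/"], "measurement": ["used_percent"]},
            },
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_path,
                            "log_group_name": log_group,
                            "log_stream_name": "{instance_id}-{ip_address}-output.log",
                        }
                    ]
                }
            }
        },
    }


def render(config: dict) -> str:
    """Serialize the config and confirm it parses back to the same structure."""
    text = json.dumps(config, indent=2)
    assert json.loads(text) == config
    return text


config_text = render(
    build_agent_config(
        namespace="live-broadcast-red-button-linkmanager-api",
        log_path="/var/log/broadcast-red-button-linkmanager-api/output.log",
        log_group="example-application-log-group",  # placeholder, not a real group
    )
)
```

The rendered text can then be written to wherever the agent reads its configuration on the instance.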
- 13.
Collecting metrics from log extraction
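Metrics can be extracted from log lines with a CloudWatch Logs metric filter. The sketch below builds the request for boto3's `put_metric_filter`; the slide does not show the actual filter, so the filter name, the `"ERROR"` pattern, and the `ErrorCount` metric are illustrative assumptions, and the log group name is a placeholder.

```python
def metric_filter_request(log_group: str, filter_name: str, pattern: str,
                          metric_name: str, namespace: str) -> dict:
    """Build keyword arguments for the CloudWatch Logs put_metric_filter call."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": pattern,
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                "metricValue": "1",   # emit 1 for every matching log event
                "defaultValue": 0.0,  # emit 0 when nothing matches in a period
            }
        ],
    }


def create_metric_filter(request: dict) -> None:
    """Send the request to AWS (needs boto3 and credentials; not called here)."""
    import boto3  # imported lazily so the builder above has no dependencies

    boto3.client("logs").put_metric_filter(**request)


# Hypothetical values for illustration only.
request = metric_filter_request(
    log_group="example-application-log-group",
    filter_name="application-errors",
    pattern='"ERROR"',
    metric_name="ErrorCount",
    namespace="live-broadcast-red-button-linkmanager-api",
)
```

Keeping the request builder separate from the AWS call makes the filter definition easy to review and unit test without credentials.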
- 14.
Monitoring view—Typical day
- 15.
Alerting on issues using CloudWatch alarms
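As an illustration of what such an alarm might look like, here is a minimal sketch that builds the arguments for boto3's `put_metric_alarm`. The threshold, evaluation window, alarm naming scheme, Auto Scaling group name, and SNS topic ARN are all assumptions made for the example; the slide does not give the real values.

```python
def alarm_request(namespace: str, metric_name: str, asg_name: str,
                  threshold: float, topic_arn: str) -> dict:
    """Build keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": f"{namespace}-{metric_name}-high",
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": 60,                 # seconds, matching the agent interval
        "EvaluationPeriods": 3,       # alarm after three consecutive breaches
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "breaching",  # a silent instance counts as a problem
        "AlarmActions": [topic_arn],
    }


def create_alarm(request: dict) -> None:
    """Send the request to AWS (needs boto3 and credentials; not called here)."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**request)


# Hypothetical values for illustration only.
alarm = alarm_request(
    namespace="live-broadcast-red-button-linkmanager-api",
    metric_name="mem_used_percent",
    asg_name="example-asg",
    threshold=90.0,
    topic_arn="arn:aws:sns:eu-west-1:123456789012:example-topic",
)
```

Treating missing data as breaching is a design choice that suits an always-on service; other workloads may prefer `notBreaching` or `missing`.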
- 16.
Monitoring view—Day of trouble
- 17.
Diagnosing: is it downstream or on premises?
- 18.
Diagnosing: is it upstream of us?
- 19.
Flexibility—Dashboard created during incident
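A dashboard can be put together quickly during an incident because its definition is just a JSON document. The sketch below generates a dashboard body with one metric widget per metric name and shows the boto3 `put_dashboard` call that would publish it. The metric names, region, Auto Scaling group name, and two-column layout are assumptions for illustration, not the dashboard shown in the talk.

```python
import json


def dashboard_body(namespace: str, metric_names: list, asg_name: str,
                   region: str) -> str:
    """Return a CloudWatch dashboard body with widgets laid out two per row."""
    widgets = []
    for index, name in enumerate(metric_names):
        widgets.append({
            "type": "metric",
            "x": (index % 2) * 12,   # the dashboard grid is 24 units wide
            "y": (index // 2) * 6,
            "width": 12,
            "height": 6,
            "properties": {
                "title": name,
                "region": region,
                "stat": "Average",
                "period": 60,
                "metrics": [[namespace, name, "AutoScalingGroupName", asg_name]],
            },
        })
    return json.dumps({"widgets": widgets})


def publish_dashboard(name: str, body: str) -> None:
    """Create or update the dashboard (needs boto3 and credentials; not called here)."""
    import boto3

    boto3.client("cloudwatch").put_dashboard(DashboardName=name, DashboardBody=body)


# Hypothetical values for illustration only.
body = dashboard_body(
    namespace="live-broadcast-red-button-linkmanager-api",
    metric_names=["mem_used_percent", "disk_used_percent", "netstat_tcp_established"],
    asg_name="example-asg",
    region="eu-west-1",
)
```

Generating the body from a list of metric names makes it cheap to add a panel mid-incident and keep the definition in version control afterwards.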
- 20.
Monitoring view—Full day
- 21.
What did we learn?
• Split the problem space
• Log everything
• Do you have the right dashboards?
- 22.
Why do we use CloudWatch?
“Our interactive services, just like picking up your phone and making a call, needs to just work at all times. We deliver journalistic content and news, which are fundamental services that our users expect in real-time and on-demand without failure.”
1. End-to-end visibility for on-premises and cloud: log analytics for both on-premises and Amazon Web Services (AWS)
2. Monitoring with automation: resource optimization, snapshot graphs
3. Correlate & investigate issues in real time: CloudWatch agent & dashboards
4. More time back to focus on BBC innovation
- 23. Reinvent & simplify: Lessons learned inform our future
What’s new
- 24.
NEW: CloudWatch Automatic Dashboards
CloudWatch simplifies infrastructure monitoring with a default getting-started experience
Dynamic, self-updating AWS infrastructure dashboards
- 25.
Building operational dashboards takes time & experience
“I just want a quick, summary view …”
“I just want some default recommendations …”
“Oh, not all statistics and visualizations are created equal …”
“I create dashboards one by one and someone always forgets …”
- 26.
• Automatic: Explore account & resource-based views of health and performance metrics
• Smart: Browse defaults with built-in AWS best practices, including metrics, statistics, and visualizations
• Dynamic: Auto-scrub metrics of resources that no longer exist to reduce stale views via resource-aware updates
• Granular: Easily drill down for troubleshooting with AWS or resource group filtering
- 28.
Session key takeaways
• Collect everything with ease using defaults for building operational visibility
• Automate monitoring with new CloudWatch automated operational dashboards
• Correlate metrics and logs for faster troubleshooting and understanding root cause
- 29.
More sessions:
AWS booth for demos
DEV375 “Amazon CloudWatch Logs Is Making an Exciting Announcement!”
DEV311 “Breaking Observability Chaos: Best Practices to Monitor AWS Cloud Native Apps”
DEV301R “AIOPs: Find Your Needle in the Haystack”
DEV306R1 “Monitoring for Operational Outcomes and Application Insights: Best Practices Workshop”
DEV303R “Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWatch”
WIN202L “Leadership Session: Learn about 10 Years of Windows and .NET Innovation on AWS with 10 New Launches”
What else is new:
Metric Math alarms
Log insights
CloudWatch agent with collectd and StatsD integration
Snapshot graphs
Events support for AWS organizations
- 30. Thank you!
Brian Dennehy
Christopher Darlaston