It’s easy to get seduced by being able to quickly deploy and scale applications by using containers. However, when things inevitably go wrong, how do you debug your application? This session covers various pro bug hunting tips and tricks. It shows live demos of tools such as the Docker stats API, Docker exec (and top, vmstat, and netstat), and how to use the ELK stack for centralized logging. It also dives into other more sophisticated tools that operate at the application and (micro)service layer, such as Twitter’s Zipkin tracing app, Spring Boot’s Actuator, and DropWizard’s Metrics library. Keep those container-based nightmares away by ensuring that when the worst does happen, you have the tools, info, and experience to debug containerized applications.
Presented at JavaOne 2015 with Steve Poole
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
1. Debugging Java Apps in Containers:
No Heavy Welding Gear Required
Daniel Bryant
@danielbryantuk
Steve Poole
@spoole167
2. Agenda
• Assuming Java and Docker basic knowledge
• You can still use standard Java tooling
• Docker is not as ‘contained’ as we think
• Containers do change some things (a lot)
• Case studies
• Monitoring and logging FTW
• Docker debug tool
• OS debug tools
3. Who Are We?
Steve Poole
@spoole167
IBM Developer
Making Java Real Since Version 0.9
Open Source Advocate
DevOps Practitioner (whatever that means!)
Driving Change
Daniel Bryant
@danielbryantuk
Principal Consultant, OpenCredo
“Biz-dev-QA-ops”
Leading change in organisations
All over Docker, Mesos, k8s, Go, Java
InfoQ, DZone, Voxxed contributor
8. Simple
• Treat Docker image as a remote server
– Enable debugging port in Docker at launch
• -p 8000:8000
– Point debugger to image host
• 192.168.99.100
• And you’re done
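Publishing port 8000 only helps if the JVM inside the container was started with the JDWP agent, for example with `-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000`. On JDK 9 and later the agent listens on localhost only unless the address is written as `*:8000`. The sketch below is one way to confirm from inside the application that the agent is active; the class name `DebugAgentCheck` is hypothetical and not part of the talk.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports whether this JVM was started with the JDWP debug agent.
final class DebugAgentCheck {

    // Returns true if any JVM argument enables the JDWP agent (new or legacy syntax).
    static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The runtime MXBean exposes the arguments passed to the JVM itself.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent enabled: " + hasJdwpAgent(jvmArgs));
    }
}
```

If this prints `false`, a debugger pointed at the container host has nothing to attach to, regardless of the `-p` mapping.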
23. Welcome to the World of Containers
• Restrictions (typically)
– Minimal OS
– Limited resources
– JRE vs JDK (with Java 9, minimal JDK?)
• JVM doesn’t always respect resource limits
– cgroup/namespace awareness
• Clustering/autoscaling/microscaling
– Constant restarts and reallocation
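To see whether a JVM is respecting the container's limits, it can help to print what the JVM believes next to what the cgroup says. This is a minimal sketch, assuming a Linux host; the two file paths are the usual cgroup v1 and v2 locations of the memory limit and may differ or be absent on a given system, and the class name `ContainerLimits` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: compares the JVM's view of resources with the cgroup memory limit.
final class ContainerLimits {

    // Assumed locations: cgroup v1 first, then cgroup v2.
    static final String[] LIMIT_FILES = {
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",
        "/sys/fs/cgroup/memory.max"
    };

    // Returns the raw limit text from the first readable file, or "unknown".
    static String cgroupMemoryLimit() {
        for (String file : LIMIT_FILES) {
            Path path = Paths.get(file);
            if (!Files.isReadable(path)) {
                continue;
            }
            try {
                return new String(Files.readAllBytes(path), StandardCharsets.UTF_8).trim();
            } catch (IOException e) {
                // Unreadable after all; try the next candidate.
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("JVM available processors: " + rt.availableProcessors());
        System.out.println("JVM max heap (bytes):     " + rt.maxMemory());
        System.out.println("cgroup memory limit:      " + cgroupMemoryLimit());
    }
}
```

A max heap larger than the cgroup limit suggests the JVM sized itself from the host rather than the container.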
24. …and the World of ‘Cloud’
• Everything is on the network
– Seriously, everything…
• Noisy neighbours
– Containers and VMs
• No writing to ‘local’ disk
– Typically ephemeral
25. The Hardest Part of Debugging
• Finding the problem is difficult…
• A distributed system only makes this harder!
• Cloud and containers embrace transience
• …yep, we’ve got a challenge on our hands!
30. Case Study: App Crashing
• Symptom
– Spring Boot app slow to respond/crashing
• Diagnostics
– Examine host (top, vmstat, df -h)
– Examine container (docker stats)
– Docker exec (top, vmstat, df -h)
• Problem
– No disk space for docker logging
• Resolution
– Increase disk space (move logs to mount)
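The same disk check can also be made from inside the application, so that a full log partition shows up as a warning before the app slows down. A minimal sketch using only the standard `java.io.File` API; the class name `DiskSpaceCheck` and the default path are assumptions for illustration.

```java
import java.io.File;

// Hypothetical helper: reports how much of a partition is still usable.
final class DiskSpaceCheck {

    // Percentage of the partition holding 'path' that is usable; -1 if the size is unknown.
    static long usablePercent(File path) {
        long total = path.getTotalSpace();
        if (total <= 0) {
            return -1;
        }
        return (long) (100.0 * path.getUsableSpace() / total);
    }

    public static void main(String[] args) {
        // Pass the log directory as the first argument; defaults to the working directory.
        File target = new File(args.length > 0 ? args[0] : ".");
        System.out.println(target.getAbsolutePath() + " usable: " + usablePercent(target) + "%");
    }
}
```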
31. Case Study: Container Builds Failing
• Symptom
– Jenkins builds sporadically failing – insufficient space
• Diagnostics
– Examine host (df -h)
– Inodes! (df -i)
• Problem
– Old containers taking lots of inodes
• Resolution
– Clear old containers
– docker run --rm
33. Case Study: App Not Starting
• Symptom
– Spring Boot application not starting (or slow)
• Diagnostics
– The usual suspects
– jstack – blocked on SecureRandom
• Problem
– /dev/random not so good on containers
• Resolution
– -Djava.security.egd=file:/dev/urandom
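One way to check whether entropy is the bottleneck is to time the first use of `SecureRandom`, since that is where seeding happens. The sketch below is an illustration only; the class name `SeedTimer` is hypothetical, and the timing will vary with JDK version and host.

```java
import java.security.SecureRandom;

// Hypothetical helper: times the first SecureRandom use, where seeding takes place.
final class SeedTimer {

    // Milliseconds taken to create a SecureRandom and draw its first bytes.
    static long timeFirstBytesMillis() {
        long start = System.nanoTime();
        byte[] buffer = new byte[16];
        new SecureRandom().nextBytes(buffer);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Shows whether the override from the resolution above is in effect.
        System.out.println("java.security.egd = " + System.getProperty("java.security.egd"));
        System.out.println("first SecureRandom bytes took " + timeFirstBytesMillis() + " ms");
    }
}
```

Running it with and without the `-Djava.security.egd` setting inside the container gives a rough before-and-after comparison.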
34. $ jstack 2616
2014-09-04 07:33:30
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode):
"Attach Listener" #14 daemon prio=9 os_prio=0 tid=0x00007fb678001000 nid=0xaae waiting on condition
[0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"localhost-startStop-1" #13 daemon prio=5 os_prio=0 tid=0x00007fb65000d000 nid=0xa46 runnable
[0x00007fb662fec000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:246)
at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedBytes(SeedGenerator.java:539)
at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:144)
at sun.security.provider.SecureRandom$SeederHolder.<clinit>(SecureRandom.java:192)
at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:210)
- locked <0x00000000f15173a0> (a sun.security.provider.SecureRandom)
at java.security.SecureRandom.nextBytes(SecureRandom.java:457)
- locked <0x00000000f15176c0> (a java.security.SecureRandom)
at java.security.SecureRandom.next(SecureRandom.java:480)
at java.util.Random.nextInt(Random.java:329)
at org.apache.catalina.util.SessionIdGenerator.createSecureRandom(SessionIdGenerator.java:246)
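`jstack` ships with the JDK, so a JRE-only image may not have it. A similar dump can be produced from inside the process with the standard `ThreadMXBean`. This is a minimal sketch; the class name `InProcessThreadDump` is hypothetical, and the output format is simplified compared with real `jstack` output.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Hypothetical helper: a simplified, in-process alternative to jstack.
final class InProcessThreadDump {

    // Returns every live thread's name, state and stack frames as text.
    static String dump() {
        StringBuilder out = new StringBuilder();
        // false, false: skip monitor and synchronizer details to keep the call cheap.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Exposing `dump()` behind an admin-only endpoint is one option for containers where `docker exec` plus `jstack` is not available.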
35. Case Study: Rollouts Broke Comms
• Symptom
– New app deployed with new IP added to ELB
– Existing apps couldn’t talk to it
• Diagnostics
– dig service.qa.mystore.com (looks good)
– Create simple Java service that echoes IPs
• Problem
– Java’s wonky caching of DNS
– Pre Java 7 caches indefinitely
– Post Java 7 ‘implementation specific’ unless SM installed
• Resolution
– -Dsun.net.inetaddr.ttl=0
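The "simple Java service that echoes IPs" from the diagnostics step could look roughly like the sketch below. It uses the standard `networkaddress.cache.ttl` security property as an in-code alternative to the system property above; the class name `DnsTtlDemo` is hypothetical, and the hostname to resolve is passed as an argument.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

// Hypothetical helper: disables positive DNS caching and echoes what a name resolves to.
final class DnsTtlDemo {

    // Must run before the JVM performs its first lookup to take effect.
    static void disablePositiveCaching() {
        Security.setProperty("networkaddress.cache.ttl", "0");
    }

    // Returns the resolved addresses separated by spaces, or "unresolved".
    static String resolve(String host) {
        try {
            StringBuilder out = new StringBuilder();
            for (InetAddress address : InetAddress.getAllByName(host)) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(address.getHostAddress());
            }
            return out.toString();
        } catch (UnknownHostException e) {
            return "unresolved";
        }
    }

    public static void main(String[] args) {
        disablePositiveCaching();
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + resolve(host));
    }
}
```

Comparing this output with `dig` for the same name shows whether the JVM is still holding on to a stale address.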
36. Case Study: Slow Communications
• Symptom
– Some apps would drip feed data onto the wire
– Existing apps couldn’t talk to it
• Diagnostics
– vmstat
– Saw high cpu ‘wa’
• Problem
– Packed high contention containers on same host
• Resolution
– Spread containers out
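The `wa` column in vmstat comes from the kernel's CPU counters, which are also readable from `/proc/stat` when vmstat is not installed in the container. The sketch below parses the aggregate `cpu` line, assuming the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal). The class name `CpuWaitParser` is hypothetical, and the figures are averages since boot rather than per-interval values like vmstat's.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: derives iowait and steal percentages from /proc/stat.
final class CpuWaitParser {

    // Parses the aggregate "cpu" line and returns {iowait %, steal %} of all jiffies counted.
    static double[] iowaitAndStealPercent(String cpuLine) {
        String[] f = cpuLine.trim().split("\\s+");
        if (f.length < 6) {
            throw new IllegalArgumentException("unexpected /proc/stat cpu line: " + cpuLine);
        }
        // f[0] is "cpu"; fields 1..8 are user nice system idle iowait irq softirq steal.
        // Guest time (fields 9 and 10) is already included in user time, so it is skipped.
        long total = 0;
        for (int i = 1; i < f.length && i <= 8; i++) {
            total += Long.parseLong(f[i]);
        }
        long iowait = Long.parseLong(f[5]);
        long steal = f.length > 8 ? Long.parseLong(f[8]) : 0;
        return new double[] {100.0 * iowait / total, 100.0 * steal / total};
    }

    public static void main(String[] args) {
        try {
            String cpuLine = Files.readAllLines(Paths.get("/proc/stat")).get(0);
            double[] pct = iowaitAndStealPercent(cpuLine);
            System.out.printf("iowait=%.2f%% steal=%.2f%% (averaged since boot)%n", pct[0], pct[1]);
        } catch (IOException e) {
            System.err.println("Could not read /proc/stat (not Linux?): " + e.getMessage());
        }
    }
}
```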
39. Why Instrument?
• Post Mortem
– Containers are gone and so is all their state…
• Aggregation (or not)
– coherent view of your application behaviour when it’s distributed (microservices or not)
– lots of containers running
• Looking for hints and smoking guns…
41. Instrumentation
• Instrument the code/application
– Codahale Metrics, Spring Boot Actuator, etc.
• Instrument the OS
– collectd, Munin, sar
• Instrument the system
– Zipkin, AppDynamics, etc.
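To make "instrument the code" concrete, here is a standard-library stand-in for the kind of counter and timer a metrics library provides. It is not the Codahale Metrics or Spring Boot Actuator API; the class name `RequestStats` and its methods are hypothetical and only illustrate the idea of recording call counts and durations.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a metrics library's timer: counts calls and tracks mean duration.
final class RequestStats {

    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    // Records one observation of the given duration.
    void record(long nanos) {
        count.increment();
        totalNanos.add(nanos);
    }

    // Runs the work and records how long it took, even if it throws.
    void time(Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            record(System.nanoTime() - start);
        }
    }

    long count() {
        return count.sum();
    }

    // Mean duration in milliseconds, or 0 if nothing has been recorded.
    double meanMillis() {
        long n = count.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1e6 / n;
    }

    public static void main(String[] args) {
        RequestStats stats = new RequestStats();
        stats.time(() -> System.out.println("handling a request"));
        System.out.println("count=" + stats.count() + " meanMillis=" + stats.meanMillis());
    }
}
```

A real metrics library adds histograms, rates and reporters on top of this; the value in a container setting is that the numbers are shipped off the container before it disappears.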
42. The Value of System Monitoring
github.com/openzipkin/zipkin github.com/openzipkin/docker-zipkin
46. Logging
• docker logs are your friend
• Rotate app log files to mounted dir
– Otherwise they disappear!
• The Elasticsearch-Logstash-Kibana (ELK) stack
– github.com/deviantony/docker-elk
• Log like an operator
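`docker logs` shows whatever the container's main process writes to stdout and stderr, so one-line, key=value log records are easy to read there and easy for Logstash to parse. The sketch below shows one possible format with `java.util.logging`; the class name `OneLineFormatter` and the field names are assumptions, not a format the talk prescribes.

```java
import java.time.Instant;
import java.util.logging.ConsoleHandler;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical formatter: one key=value line per log record.
final class OneLineFormatter extends Formatter {

    @Override
    public String format(LogRecord record) {
        return Instant.ofEpochMilli(record.getMillis())
            + " level=" + record.getLevel()
            + " logger=" + record.getLoggerName()
            + " msg=\"" + formatMessage(record) + "\"\n";
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("demo");
        log.setUseParentHandlers(false);            // avoid the default multi-line output
        ConsoleHandler handler = new ConsoleHandler(); // writes to stderr, which docker logs captures
        handler.setFormatter(new OneLineFormatter());
        log.addHandler(handler);
        log.info("service started");
    }
}
```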
54. Summary
• Debugging is still an essential skill
– rebooting containers doesn’t cut it
• Isolating (targeting) the issue is vital
– Distributed tracing (Zipkin etc)
• Build your debugging toolbox
– Java + docker + OS debugging
• Monitoring and logging FTW
Connect to a running container from the command line
Pressing the exec button gets me a terminal connected to the running container.
http://www.thegeekstuff.com/2011/03/sar-examples/
http://www.sysdig.org/
docker ps -lq # id of last container
export CONTAINER_ID=$(docker ps -lq)
docker inspect --format='{{.NetworkSettings.IPAddress}}' $CONTAINER_ID
docker inspect --format='{{.LogPath}}' $CONTAINER_ID
docker-machine ssh dev
https://speakerdeck.com/garethr/containers-and-microservices-make-performance-worse
Setting term for top
Installing iostat/vmstat in container
Blocked on I/O
Blocked on CPU
Steal time