Jvm profiling under the hood
- 8. CPU Profiling Limitations
● Finds CPU bound bottlenecks
● Many problems not CPU Bound
○ Networking
○ Database or External Service
○ I/O
○ Garbage Collection
○ Insufficient Parallelism
○ Blocking & Queuing Effects
- 13. Periodicity Bias
● Bias from sampling at a fixed interval
● Periodic operations with the same frequency
as the samples
● Timed operations
- 15. Stack Trace Sampling
● JVMTI interface: GetCallTrace
○ Trigger a global safepoint(not on Zing)
○ Collect stack trace
● Large impact on application
● Samples only at safepoints
- 16. Example
private static void outer()
{
for (int i = 0; i < OUTER; i++)
{
hotMethod(i);
}
}
// https://github.com/RichardWarburton/profiling-samples
- 17. Example (2)
private static void hotMethod(final int i)
{
for (int k = 0; k < N; k++)
{
final int[] array = SafePointBias. array;
final int index = i % SIZE;
for (int j = index; j < SIZE; j++)
{
array[index] += array[j];
}
}
}
- 20. Whats a safepoint?
● Java threads poll global flag
○ At ‘uncounted’ loops back edge
○ At method exit/enter
● A safepoint poll can be delayed by:
○ Large methods
○ Long running ‘counted’ loops
○ BONUS: Page faults/thread suspension
- 24. Let sleeping dogs lie?
● ‘GetCallTrace’ profilers will sample ALL
threads
● Even sleeping threads...
- 30. AsyncGetCallTrace
● Used by Oracle Solaris Studio
● Adapted to open source prototype by
Google’s Jeremy Manson
● Unsupported, Undocumented …
Underestimated
- 31. SIGPROF - Interrupt Handlers
● OS Managed timing based interrupt
● Interrupts the thread and directly calls an
event handler
● Used by profilers we’ll be talking about
- 33. “You are in a maze of twisty little stack frames,
all alike”
- 34. AsyncGetCallTrace under the hood
● A Java thread is ‘possessed’
● You have the PC/FP/SP
● What is the call trace?
○ jmethodId - Java Method Identifier
○ bci - Byte Code Index -> used to find line number
- 35. Where Am I?
● Given a PC what is the current method?
● Is this a Java method?
○ Each method ‘lives’ in a range of addresses
● If not, what do we do?
- 36. Java Method? Which line?
● Given a PC, what is the current line?
○ Not all instructions map directly to a source line
● Given super-scalar CPUs what does PC
mean?
● What are the limits of PC accuracy?
- 37. “> I think Andi mentioned this to me last year --
> that instruction profiling was no longer reliable.
It never was.”
http://permalink.gmane.org/gmane.linux.kernel.perf.user/1948
Exchange between Brenden Gregg and Andi Kleen
- 38. Skid
● PC indicated will be >= to PC at sample time
● Known limitation of instruction profiling
● Leads to harder ‘blame analysis’
- 39. Limits of line number accuracy:
Line number (derived from BCI) is the closest
attributable BCI to the PC (-XX:+DebugNonSafepoint)
The PC itself is within some skid distance from
actual sampled instruction
- 40. ● Divided into frames
○ frame { sender*, stack*, pc }
● A single linked list:
root(null, s0, pc1) <- call1 (root, s1, pc2) <- call2(call1, s2, pc2)
● Convert to: (jmethodId,lineno)
The Stack
- 41. A typical stack
● JVM Thread runner infra:
○ JavaThread::run to JavaCalls::call_helper
● Interleaved Java frames:
○ Interpreted
○ Compiled
○ Java to Native and back
● Top frame may be Java or Native
- 42. Native frames
● Ignored, but need to navigate through
● Use a dedicated FP register to find sender
● But only if compiled to do so…
● Use a last remembered Java frame instead
See: http://duartes.org/gustavo/blog/post/journey-to-the-stack/
- 43. Java Compiled Frames
● C1/C2 produce native code
● No FP register: use set frame size
● Challenge: methods can move (GC)
● Challenge: methods can get recompiled
- 44. Java Interpreter frames
● Separately managed by the runtime
● Make an effort to look like normal frames
● Challenge: may be interrupted half-way
through construction...
- 45. Virtual Frames
● C1/C2 inline code (intrinsics/other methods)
● No data on stack
● Must use JVM debug info
- 46. AsyncGetCallTrace Limitations
● Only profiles running threads
● Accuracy of line info limited by reality
● Only reports Java frames/threads
● Must lookup debug info during call
- 47. Compilers: Friend or Fiend?
void safe_reset(void *start, size_t size) {
char *base = reinterpret_cast<char *>(start);
char *end = base + size;
for (char *p = base; p < end; p++) {
*p = 0;
}
}
- 48. Compilers: Friend or Fiend?
safe_reset(void*, unsigned long):
lea rdx, [rdi+rsi]
cmp rdi, rdx
jae .L3
sub rdx, rdi
xor esi, esi
jmp memset
.L3:
rep ret
- 49. Concurrency Bug
● Even simple concurrency bugs are hard to
spot
● Unspotted race condition in the ring buffer
● Spotted thanks to open source & Rajiv
Signal
- 54. Perf
● A Linux profiling tool
● Can be made to work with Java
● JMH integration
● Ongoing integration efforts
- 58. What did we cover?
● Biases in Profilers
● More accurate sampling
● Alternative Profiling Approaches