The document discusses just-in-time (JIT) compilers in the Java Virtual Machine (JVM). It describes how JIT compilers work by compiling bytecode to native machine code during execution based on profiling information. This allows for optimizations like inlining, devirtualization, loop unrolling and eliding unnecessary synchronization that improve performance. The JIT compiler uses feedback from profiling to enable more aggressive optimizations like these.
Java on Kubernetes may seem complicated, but after a bit of YAML and Dockerfiles, you will wonder what all the fuss was about. But then the performance of your app on 1 CPU and 1 GB of RAM makes you wonder. Learn how JVM ergonomics, CPU throttling, and GCs can help increase performance while reducing costs.
Jemalloc can help debug memory leaks in ATS plugins. It provides memory profiling by sampling memory allocations and dumping profiles to files. These profiles can then be rendered as call-graph images for analysis. The author provides two case studies where jemalloc helped identify leaks: a months-long leak in ATS fronting APIs, and a 12-hour leak from a bug in their Brotli plugin. Jemalloc also improved ATS scalability by addressing issues where memory operations and plugins stressed the CPU to higher utilization.
This document discusses making Linux capable of hard real-time performance. It begins by defining hard and soft real-time systems and explaining that real-time does not necessarily mean fast but rather determinism. It then covers general concepts around real-time performance in Linux like preemption, interrupts, context switching, and scheduling. Specific features in Linux like RT-Preempt, priority inheritance, and threaded interrupts that improve real-time capabilities are also summarized.
GDB can debug programs by running them under its control. It allows inspecting and modifying program state through breakpoints, watchpoints, and examining variables and memory. GDB supports debugging optimized code, multi-threaded programs, and performing tasks like stepping, continuing, and backtracing through the call stack. It can also automate debugging through commands, scripts, and breakpoint actions.
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Learn what you need to know to experience nirvana in the evaluation of G1 GC, even if you are migrating from Parallel GC to G1 GC or from CMS GC to G1 GC
You also get a walk-through of some case study data
G1 GC
Java Performance Analysis on Linux with Flame Graphs
This document discusses using Linux perf_events (perf) profiling tools to analyze Java performance on Linux. It describes how perf can provide complete visibility into Java, JVM, GC and system code but that Java profilers have limitations. It presents the solution of using perf to collect mixed-mode flame graphs that include Java method names and symbols. It also discusses fixing issues with broken Java stacks and missing symbols on x86 architectures in perf profiles.
Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of Linux performance tools, touring common problems with system tools, metrics, statistics, visualizations, measurement overhead, and benchmarks. You might discover that tools you have been using for years are, in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many talks on tools that work, including giving the Linux Performance Tools talk originally at SCALE. This is an anti-version of that talk, focusing on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice for verifying new performance tools, understanding how they work, and using them successfully.
Video: https://www.youtube.com/watch?v=FJW8nGV4jxY and https://www.youtube.com/watch?v=zrr2nUln9Kk . Tutorial slides for O'Reilly Velocity SC 2015, by Brendan Gregg.
There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This tutorial explains methodologies for using these tools, and provides a tour of four tool types: observability, benchmarking, tuning, and static tuning. Many tools will be discussed, including top, iostat, tcpdump, sar, perf_events, ftrace, SystemTap, sysdig, and others, as well as observability frameworks in the Linux kernel: PMCs, tracepoints, kprobes, and uprobes.
This tutorial updates and extends an earlier talk that summarizes the Linux performance tool landscape. The value of this tutorial is not just learning that these tools exist and what they do, but hearing when and how they are used by a performance engineer to solve real world problems — important context that is typically not included in the standard documentation.
JDK Flight Recorder was introduced in OpenJDK 11.
It offers low-overhead profiling and can be used in production environments.
A high-performance recording engine is embedded in the HotSpot VM.
Linux uses /proc/iomem as a "Rosetta Stone" to establish relationships between software and hardware. /proc/iomem maps physical memory addresses to devices, similar to how the Rosetta Stone helped map Egyptian hieroglyphs to Greek and decode ancient Egyptian texts. This virtual file allows the kernel to interface with devices by providing address translations between physical and virtual memory spaces.
This is the presentation file used by Jim Huang (jserv) at OSDC.tw 2009. New compiler technologies are invisible yet deeply integrated into the world around us, and we can enrich the experience by building on LLVM.
This document is a term paper on Just-In-Time compilers (JIT). It begins with an acknowledgements section thanking the teacher for guidance. It then provides an introduction defining JIT as improving runtime performance of bytecode programs by compiling to machine code during execution. The paper discusses time-space tradeoffs of JIT and how JIT functions by compiling sections of bytecode to native code prior to execution. It also classifies JIT compilers based on invocation, executability, and concurrency. The conclusion restates that the paper provided an overview of JIT compilers.
This document provides an overview of JVM JIT compilers, specifically focusing on the HotSpot JVM compiler. It discusses the differences between static and dynamic compilation, how just-in-time compilation works in the JVM, profiling and optimizations performed by JIT compilers like inlining and devirtualization, and how to monitor the JIT compiler through options like -XX:+PrintCompilation and -XX:+PrintInlining.
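As a small illustration of those monitoring flags (a minimal sketch: the class and method names here are made up, and note that on HotSpot -XX:+PrintInlining is a diagnostic option that also requires -XX:+UnlockDiagnosticVMOptions):

    // Hypothetical demo; run with:
    //   java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining JitDemo
    // -XX:+PrintCompilation logs each method as the JIT compiles it;
    // -XX:+PrintInlining additionally reports inlining decisions.
    public class JitDemo {
        // Small method that becomes hot after many calls and is a likely inlining candidate.
        private static int square(int x) {
            return x * x;
        }

        public static void main(String[] args) {
            long sum = 0;
            // Enough iterations to cross HotSpot's invocation thresholds and trigger compilation.
            for (int i = 0; i < 1_000_000; i++) {
                sum += square(i);
            }
            System.out.println(sum);
        }
    }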
This document discusses Java compilers and their impact on performance. It explains that Java uses a two-step compilation process to achieve both portability and speed. The first step compiles Java code to bytecode, while the second step just-in-time compiles the bytecode to native machine code. It describes how client-side compilers focus on fast startup times while server-side compilers emphasize long-term optimizations. Tiered compilation combines aspects of both. The document also introduces hotspot compilation, which optimizes frequently executed code sections.
Presto is Uber's distributed SQL query engine for their Hadoop data warehouse. Some key points:
- Presto allows interactive SQL queries directly on Uber's petabyte-scale Hadoop data lake without needing to first load the data into another database.
- It provides fast performance at scale by leveraging columnar data formats like Parquet and optimizing for distributed execution across many nodes.
- Uber deployed a 200 node Presto cluster that handles 30,000 queries per day, serving both ad hoc queries and real-time applications accessing data in Hadoop and improving on the performance of alternative solutions like Hive.
JIT compilation in the Java Virtual Machine (HighLoad++ 2013)
Delivering decent performance for a high-level, dynamically typed language is no easy task. Just-in-time (JIT) compilation, the dynamic generation of machine code using information collected while the application runs, is a key element of virtual machine performance (be it Java, .NET, or even JavaScript). The JIT compiler, in turn, needs an impressive arsenal of tricks and optimizations to compensate for the "dynamism" of the language.
The talk covers the achievements of modern JIT compilation in general and takes a closer look at the specifics of the HotSpot JVM (Oracle's free JVM).
This document provides an overview of just-in-time (JIT) compilers in the HotSpot Java Virtual Machine (JVM). It discusses the differences between static and dynamic compilation, how modern JVMs use dynamic compilers and profiling data to perform aggressive optimizations, and some of the specific optimizations used in the HotSpot JVM like inlining, devirtualization, and on-stack replacement.
This document summarizes the services and operations of a software development company with offices in Gdynia and Warsaw, Poland. The company has grown from 8 to 96 employees in 2 years. They offer dedicated software solutions, IT outsourcing, expert services, and software products. Their main technical skills include Java, JavaScript, PL/SQL, Android, C#, and C++ development. They emphasize quality assurance through practices like agile development, test automation, and transparency. The company recruits candidates through various sources and has deep engagement with the academic community through student projects, internships, and university partnerships.
The document discusses tuning garbage collection in the Java Virtual Machine. It provides recommendations for sizing generations based on an application's object longevity and size to reduce premature promotions which are a major cause of garbage collection pauses. Maintaining a low allocation rate and promotion rate can also help reduce garbage collection frequency. Plotting metrics like allocation rates, promotion rates, and heap occupancy over time can help analyze garbage collection performance.
The Performance Engineer's Guide To (OpenJDK) HotSpot Garbage Collection - Th...
This document provides an overview of garbage collection in the Java Virtual Machine. It discusses key concepts like generational collection, parallel and concurrent marking, and tuning garbage collection for throughput versus latency. Specific collectors like Parallel GC, CMS GC, and G1 GC are explained in terms of their marking and compaction algorithms. Memory tuning recommendations and analyzing garbage collection logs and heap dumps are also covered. The document concludes with a high-level explanation of the Garbage First garbage collector and how it uses region-based heap management.
JavaOne 2016 presentation slides on the Testarossa Just In Time compiler technology from the IBM J9 Java Virtual Machine, which IBM is contributing to open source (800KLOC to date on github at the Eclipse OMR project). This talk covers both the overall structure of the compiler and provides some details on the dynamic AOT technology available in Testarossa since 2006.
At the JavaOne keynote this year, Mark Reinhold talked about how Java 9 was much bigger than Jigsaw. To put that in numbers: 80+ JEPs bigger! Yes, we see more presentations on Jigsaw since it brings modularity to the once monolithic JDK. But what about those other JEPs?! One of those "other" JEPs is JEP 143, 'Improve Contended Locking'. Monica will apply her performance engineering approach and talk about JEP 143 and Oracle's Studio Analyzer Performance Tool. The crux of the presentation will entail comparing performance of contended locks in JDK 9 to JDK 8.
Managed runtime performance expert, Monica Beckwith will divulge her survival guide which is essential for any application performance engineer. Following simple rules and performance engineering patterns will make you and your stakeholders happy.
The document discusses LLVM and its use in building programming language compilers and runtimes. It provides an overview of LLVM, including its core components like its intermediate representation (IR), optimizations, and code generation capabilities. It also discusses how LLVM is used in various applications like Android, browsers, and graphics processing. Examples are given of using Clang and LLVM to compile and run a simple C program.
The document provides an overview of implementing a high-performance JavaScript engine. It discusses the key components including the parser, runtime, execution engine, garbage collector, and foreign function interface. It also covers various implementation strategies and tradeoffs for aspects like value representation, object models, execution engines, and garbage collection. The document emphasizes learning from Self VM and using techniques like hidden classes, inline caching, and tiered compilation and optimization.
This document provides an overview of just-in-time (JIT) and lean operations. It defines JIT and discusses its goals of eliminating waste and achieving smooth, rapid material flow. Key aspects covered include JIT building blocks like product design, process design and personnel elements. Benefits include reduced inventory, flexibility and increased productivity. The document also compares JIT to traditional systems and outlines steps to transition to JIT.
GNU Toolchain is the de facto standard of the IT industry and has been improved by comprehensive open source contributions. This session covers the mechanics of the compiler driver, system interaction (taking GNU/Linux as an example), the linker, the C runtime library, and the related dynamic linker. Instead of analyzing the system design, the session is use-case driven and illustrated progressively.
In Java 9, Garbage First Garbage Collector (G1 GC) will be the default GC. This presentation makes an effort to help Hotspot VM users to understand the concept of G1 GC as well as provides some tuning advice.
Introduce Brainf*ck, another Turing complete programming language. Then, try to implement the following from scratch: Interpreter, Compiler [x86_64 and ARM], and JIT Compiler.
In this presentation we will discuss the concept of the just in time (JIT) production philosophy, types and concepts of JIT, objectives of JIT manufacturing, a comparison between an ideal production system and JIT production, characteristics of a JIT system, and JIT manufacturing vs. JIT purchasing. We will also discuss the major tools and techniques of JIT manufacturing, the JIT implementation approach, problems regarding implementation of JIT, planning of a successful JIT system, obstacles faced in JIT conversion, and the operational benefits of JIT systems.
To know more about Welingkar School’s Distance Learning Program and courses offered, visit: http://www.welingkaronline.org/distance-learning/online-mba.html
Java Jit. Compilation and optimization by Andrey Kovalenko
This document discusses Java Just-In-Time (JIT) compilation. It describes JIT as compiling Java bytecode to native machine code during program execution rather than prior to execution. It outlines the main types of JIT compilers in HotSpot (client, server, tiered) and the key optimizations they perform like inlining, escape analysis, on-stack replacement, and tiered compilation. The document provides details on JIT tuning flags and how to get more profiling information from the JIT compiler logs. It emphasizes that letting the JIT do its work through warmup and avoiding microbenchmarks is important to achieving full performance.
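Since that summary stresses warmup and the pitfalls of hand-rolled microbenchmarks, here is a minimal JMH-style sketch (assuming the JMH library and its annotation processor are on the classpath; the class name and workload are made-up examples) in which the harness gives the JIT warmup iterations before anything is measured:

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.Warmup;

    @BenchmarkMode(Mode.AverageTime)
    @Warmup(iterations = 5)      // let the JIT profile and compile the hot path first
    @Measurement(iterations = 5) // only these iterations count towards the reported score
    @Fork(1)                     // fresh JVM so earlier experiments don't pollute the profile
    public class ConcatBenchmark {
        @Benchmark
        public String concat() {
            // Returning the result keeps the work from being dead-code eliminated.
            return "id-" + System.nanoTime();
        }
    }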
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
Apache Spark has rocked the big data landscape, quickly becoming the largest open source big data community with over 750 contributors from more than 200 organizations. Spark's core tenets of speed, ease of use, and its unified programming model fit neatly with the high performance, scalable, and manageable characteristics of modern Java runtimes. In this talk we introduce the Spark programming model, and describe some unique Java runtime capabilities in the JIT, fast networking, serialization techniques, and GPU off-loading that deliver the ultimate big data platform for solving business problems. We will show how solutions, previously infeasible with regular Java programming, become possible with a high performance Spark core runtime, enabling you to solve problems smarter and faster.
Apache Spark has rocked the big data landscape, becoming the largest open source big data community with over 750 contributors from more than 200 organizations. Spark's core tenets of speed, ease of use, and its unified programming model fit neatly with the high performance, scalable, and manageable characteristics of modern Java runtimes. In this talk Tim Ellison, a JVM developer at IBM, shows some of the unique Java 8 capabilities in the JIT compiler, fast networking, serialization techniques, and GPU off-loading that deliver the ultimate big data platform for solving business problems. Tim will demonstrate how solutions, previously infeasible with regular Java programming, become possible with this high performance Spark core runtime, enabling you to solve problems smarter and faster.
A Java Implementer's Guide to Better Apache Spark Performance
This document discusses techniques for improving the performance of Apache Spark applications. It describes optimizing the Java virtual machine by enhancing the just-in-time compiler, improving the object serializer, enabling faster I/O using technologies like RDMA networking and CAPI flash storage, and offloading tasks to graphics processors. The document provides examples of code style guidelines and specific Spark optimizations that further improve performance, such as leveraging hardware accelerators and tuning JVM heuristics.
Five cool ways the JVM can run Apache Spark faster
The IBM JVM runs Apache Spark fast! This talk explains some of the findings and optimizations from our experience of running Spark workloads.
The talk was originally presented at the SparkEU Summit 2015 in Amsterdam.
The document discusses the benefits of moving JIT compilation out of individual JVMs and into a shared, cloud-based compiler service. This "JIT-as-a-Service" approach improves efficiency by allowing optimization resources to be shared and elastic across JVMs. It also enables optimized code to be reused for applications that execute on multiple devices. Moving JIT compilation to the cloud reduces warmup time, memory footprint, and CPU usage for JVMs while improving the level of optimizations that can be performed.
Code examples are available here: https://github.com/ivailo-pashov/jvmmagic
How well do you know what is going inside the JVM? How about its secret backdoors and nasty hacks? Initially they appear as magic tricks but being aware what is going on behind the scenes will save your time when real issues arise.
At a time when Herb Sutter announced to everyone that the free lunch is over ("The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software"), concurrency became part of our everyday life. A big change is coming to Java: Project Loom, and with it new terms such as "virtual threads", "continuations", and "structured concurrency". If you've been wondering what they will change in our daily work,
whether it's worth rewriting your Tomcat-based application to super-efficient reactive Netty, or whether to wait for Project Loom, this presentation is for you.
I will talk about Project Loom and the new possibilities related to virtual threads and "structured concurrency". I will explain how it works, what can be achieved, and the impact on performance.
The document discusses various topics related to tuning the Java Virtual Machine (JVM) for performance, including:
1. Hotspot compiler options like method inlining that can improve performance.
2. Threading models on Solaris like M:N and 1:1 and how tuning thread-related JVM options can significantly impact throughput.
3. Memory and garbage collection tuning like selecting the right GC algorithm, tuning heap sizes, and analyzing GC logs to identify bottlenecks and optimize full GC frequency and duration.
Delivered as plenary at USENIX LISA 2013. video here: https://www.youtube.com/watch?v=nZfNehCzGdw and https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg . "How did we ever analyze performance before Flame Graphs?" This new visualization invented by Brendan can help you quickly understand application and kernel performance, especially CPU usage, where stacks (call graphs) can be sampled and then visualized as an interactive flame graph. Flame Graphs are now used for a growing variety of targets: for applications and kernels on Linux, SmartOS, Mac OS X, and Windows; for languages including C, C++, node.js, ruby, and Lua; and in WebKit Web Inspector. This talk will explain them and provide use cases and new visualizations for other event types, including I/O, memory usage, and latency.
A technical presentation on how Zing changes parts of the JVM to eliminate GC pauses, generate more heavily optimised code from the JIT and reduce the warm up time.
The Performance Engineer's Guide To HotSpot Just-in-Time Compilation, by Monica Beckwith
Adaptive compilation and runtime in the OpenJDK Hotspot VM offers significant performance enhancements for our tools and applications in Java and other JVM languages. Understanding how it works provides developers with critical information on the Java HotSpot JIT compilation and runtime techniques such as vectorization, compressed OOPs etc., to assist in understanding performance for both client and server applications. We will focus on the internals of OpenJDK 8, the reference implementation for Java SE 8.
At Criteo we use both the .NET CLR runtime and the JVM. At first glance they seem very similar: a bytecode, a JIT, a GC, … but in fact there are differences in the implementation and in the vision of the targeted applications and their requirements.
Let's dig into those differences, with their pros & cons.
The talk was given at a seminar of the InfinIT interest group on high-level languages for embedded systems, held on 18 June 2014. Read more about the interest group here: http://infinit.dk/dk/interessegrupper/hoejniveau_sprog_til_indlejrede_systemer/hoejniveau_sprog_til_indlejrede_systemer.htm
The document discusses developing Groovy scripts securely and productively in the cloud for Oracle Application Developer Framework (ADF). It outlines using Groovy AST transformations to add debugging capabilities and runtime security checks when executing scripts in the cloud. Caching is also discussed to improve performance of compiling thousands of scripts across many applications. The implementation transforms the AST to wrap method calls and inject breakpoints while limiting access to restricted APIs.
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When..., by SriSatish Ambati
Top 10 Causes for Java Issues in Production and What to Do When Things Go Wrong
JavaOne 2010.
Abstract: It's Friday evening and you hear the first rumble . . . one Java node has become slightly unresponsive. You look up the process, get a thread dump, and for good measure restart it at 8 p.m. Saturday afternoon is when you realize that other nodes have caught the flu and you get the ugly call from the customer. In a matter of hours, you're on that conference bridge with support groups of different packages and Java vendors and one of your uberarchitects. Yes, production instances are up and down, and restarting like there's no tomorrow. Here's an accumulated compendium of the top 10 things that can cause Java production heartburn and what to do when your Java production is on fire. And yes, please have your tool belt on.
Speaker(s):
Cliff Click, Azul Systems, Distinguished Engineer
SriSatish Ambati, Azul Systems, Performance Engineer
FOSDEM 2017 - Open J9: The Next Free Java VM, by Charlie Gracie
I will discuss the J9 VM technology and our plans for open sourcing it. My team has already open sourced a lot of the underlying technology as part of the Eclipse OMR project, and now we are working on open sourcing the rest of the technology.
This document discusses the Java Memory Model (JMM). It begins by introducing the goals of familiarizing the attendee with the JMM, how processors work, and how the Java compiler and JVM work. It then covers key topics like data races, synchronization, atomicity, and examples. The document provides examples of correctly synchronized programs versus programs with data races. It explains concepts like happens-before ordering, volatile variables, and atomic operations. It also discusses weaknesses in some common multi-threading constructs like double-checked locking and discusses how constructs like final fields can enable safe publication of shared objects. The document concludes by mentioning planned improvements to the JMM in JEP 188.
Similar to JVM JIT-compiler overview @ JavaOne Moscow 2013 (20)
"What's New in HotSpot JVM 8" @ JPoint 2014, Moscow, Russia Vladimir Ivanov
This document summarizes new features and improvements in HotSpot JVM 8. It discusses support for Project Lambda (lambda expressions and default methods), the Nashorn JavaScript engine, removal of PermGen space, and various performance enhancements. Notable changes include storing class metadata in native memory instead of PermGen, support for type annotations, and intrinsics for exact math operations to improve performance of dynamic languages like JavaScript.
"Formal Verification in Java" by Shura Iline, Vladimir Ivanov @ JEEConf 2013,...Vladimir Ivanov
This document discusses formal verification and testing methods for software. It defines formal verification as mathematically proving software correctness against a specification, while testing involves running software with different inputs to check for errors. The document outlines some key differences: formal verification provides a lower bound on quality by guaranteeing absence of certain failures, while testing only provides an upper bound. It also discusses techniques like deductive verification using theorem proving. Later sections cover topics like type systems, annotations, and pluggable type checking tools like the Checker Framework.
2. 2
Agenda
§ about compilers in general
– … and JIT-compilers in particular
§ about JIT-compilers in HotSpot JVM
§ monitoring JIT-compilers in HotSpot JVM
4. 4
Dynamic and Static Compilation Differences
§ Static compilation
– “ahead-of-time”(AOT) compilation
– Source code → Native executable
– Most of the compilation work happens before execution
§ Modern Java VMs use dynamic compilers (JIT)
– “just-in-time” (JIT) compilation
– Source code → Bytecode → Interpreter + JITted executable
– Most of the compilation work happens during execution
5. 5
Dynamic and Static Compilation Differences
§ Static compilation (AOT)
– can utilize complex and heavy analyses and optimizations
– … but static information sometimes isn’t enough
– … and it’s hard to rely on profiling info, if any
– moreover, how to utilize specific platform features (like SSE 4.2)?
6. 6
Dynamic and Static Compilation Differences
§ Modern Java VMs use dynamic compilers (JIT)
– aggressive optimistic optimizations
§ through extensive usage of profiling info
– … but budget is limited and shared with an application
– startup speed suffers
– peak performance may suffer as well (not necessarily)
9. 9
JVM
§ Runtime
– class loading, bytecode verification, synchronization
§ JIT
– profiling, compilation plans, OSR
– aggressive optimizations
§ GC
– different algorithms: throughput vs. response time
10. 10
JVM: Makes Bytecodes Fast
§ JVMs eventually JIT bytecodes
– To make them fast
– Some JITs are high quality optimizing compilers
§ But cannot use existing static compilers directly:
– Tracking OOPs (ptrs) for GC
– Java Memory Model (volatile reordering & fences)
– New code patterns to optimize
– Time & resource constraints (CPU, memory)
11. 11
JVM: Makes Bytecodes Fast
§ JIT'ing requires Profiling
– Because you don't want to JIT everything
§ Profiling allows focused code-gen
§ Profiling allows better code-gen
– Inline what’s hot
– Loop unrolling, range-check elimination, etc
– Branch prediction, spill-code-gen, scheduling
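A minimal sketch (hypothetical method, not from the original deck) of the kind of hot loop where this profiling-driven code generation pays off:

// Once profiling marks this loop as hot, the JIT can unroll it and,
// because i is provably within 0..data.length, drop the per-access
// array bounds check (range-check elimination).
static int sum(int[] data) {
    int total = 0;
    for (int i = 0; i < data.length; i++) {
        total += data[i];
    }
    return total;
}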
12. 12
Dynamic Compilation (JIT)
§ Knows about
– loaded classes, methods the program has executed
§ Makes optimization decisions based on code paths executed
– Code generation depends on what is observed:
§ loaded classes, code paths executed, branches taken
§ May re-optimize if assumption was wrong, or alternative code paths taken
– Instruction path length may change between invocations of methods as a result of de-optimization / re-compilation
13. 13
Dynamic Compilation (JIT)
§ Can do non-conservative optimizations dynamically
§ Separates optimization from product delivery cycle
– Update JVM, run the same application, realize improved performance!
– Can be "tuned" to the target platform
14. 14
Profiling
§ Gathers data about code during execution
– invariants
§ types, constants (e.g. null pointers)
– statistics
§ branches, calls
§ Gathered data is used during optimization
– Educated guess
– Guess can be wrong
15. 15
Profile-guided optimization (PGO)
§ Use profile for more efficient optimization
§ PGO in JVMs
– Always have it, turned on by default
– Developers (usually) not interested or concerned about it
– Profile is always consistent with the actual execution scenario
16. 16
Optimistic Compilers
§ Assume profile is accurate
– Aggressively optimize based on profile
– Bail out if we’re wrong
§ ...and hope that we’re usually right
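As a hedged illustration (hypothetical types, not from the deck), this is roughly what "aggressively optimize, bail out if wrong" looks like for untaken branch pruning:

interface Listener { void onEvent(); }

class Notifier {
    // If the profile says 'callback' has never been null here, an optimistic
    // compiler can drop the else-branch from the compiled code and replace it
    // with an "uncommon trap": if a null ever does arrive, execution bails out
    // to the interpreter (deoptimization) and the method may be recompiled.
    void fire(Listener callback) {
        if (callback != null) {                  // profile: always taken
            callback.onEvent();
        } else {
            System.out.println("no listener");   // pruned optimistically
        }
    }
}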
17. 17
Dynamic Compilation (JIT)
§ Is dynamic compilation overhead essential?
– The longer your application runs, the less the overhead
§ Trading off compilation time, not application time
– Steal some cycles very early in execution
– Done automagically and transparently to application
§ Most of the “perceived” overhead is the compiler waiting for more data
– ...thus running semi-optimal code for the time being
Overhead
21. 21
Deoptimization
§ Bail out of running native code
– stop executing native (JIT-generated) code
– start interpreting bytecode
§ It’s a complicated operation at runtime…
22. 22
OSR: On-Stack Replacement
§ Running method never exits?
§ But it’s getting really hot?
§ Generally means loops, back-branching
§ Compile and replace while running
§ Not typically useful in large systems
§ Looks great on benchmarks!
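A small, self-contained sketch of the situation OSR addresses:

// The loop below gets hot inside a single invocation of main(), which never
// exits. Compiling at the next method entry would never help, so the JIT
// compiles the loop and swaps the interpreted frame for a compiled one at a
// backward branch: on-stack replacement.
public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        for (long i = 0; i < 1_000_000_000L; i++) {
            sum += i;
        }
        System.out.println(sum);
    }
}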
24. 24
Optimizations in HotSpot JVM
§ compiler tactics
delayed compilation
tiered compilation
on-stack replacement
delayed reoptimization
program dependence graph rep.
static single assignment rep.
§ proof-based techniques
exact type inference
memory value inference
memory value tracking
constant folding
reassociation
operator strength reduction
null check elimination
type test strength reduction
type test elimination
algebraic simplification
common subexpression elimination
integer range typing
§ flow-sensitive rewrites
conditional constant propagation
dominating test detection
flow-carried type narrowing
dead code elimination
§ language-specific techniques
class hierarchy analysis
devirtualization
symbolic constant propagation
autobox elimination
escape analysis
lock elision
lock fusion
de-reflection
§ speculative (profile-based) techniques
optimistic nullness assertions
optimistic type assertions
optimistic type strengthening
optimistic array length strengthening
untaken branch pruning
optimistic N-morphic inlining
branch frequency prediction
call frequency prediction
§ memory and placement transformation
expression hoisting
expression sinking
redundant store elimination
adjacent store fusion
card-mark elimination
merge-point splitting
§ loop transformations
loop unrolling
loop peeling
safepoint elimination
iteration range splitting
range check elimination
loop vectorization
§ global code shaping
inlining (graph integration)
global code motion
heat-based code layout
switch balancing
throw inlining
§ control flow graph transformation
local code scheduling
local code bundling
delay slot filling
graph-coloring register allocation
linear scan register allocation
live range splitting
copy coalescing
constant splitting
copy removal
address mode matching
instruction peepholing
DFA-based code generator
25. 25
JVM: Makes Virtual Calls Fast
§ C++ avoids virtual calls – because they are slow
§ Java embraces them – and makes them fast
– Well, mostly fast – JITs do Class Hierarchy Analysis (CHA)
– CHA turns most virtual calls into static calls
– JVM detects new classes loaded, adjusts CHA
§ May need to re-JIT
– When CHA fails to make the call static, inline caches (ICs) are used
– When ICs fail, virtual calls are back to being slow
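A hedged sketch (hypothetical classes) of how CHA interacts with class loading:

interface Codec { byte[] encode(String s); }

class Utf8Codec implements Codec {
    public byte[] encode(String s) {
        return s.getBytes(java.nio.charset.StandardCharsets.UTF_8);
    }
}

class Pipeline {
    // While Utf8Codec is the only loaded implementation, CHA lets the JIT
    // treat codec.encode(...) as a static call and inline it. If another
    // Codec implementation is loaded later, that assumption is invalidated
    // and the compiled method must be deoptimized and re-JITted.
    static byte[] run(Codec codec, String payload) {
        return codec.encode(payload);
    }
}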
26. 26
Call Site
§ The place where you make a call
§ Monomorphic (“one shape”)
– Single target class
§ Bimorphic (“two shapes”)
§ Polymorphic (“many shapes”)
§ Megamorphic
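A minimal illustration (hypothetical types) that a call site's "shape" is about the receivers actually observed there, not the declared type:

interface Shape { double area(); }
class Circle implements Shape { public double area() { return Math.PI; } }
class Square implements Shape { public double area() { return 1.0; } }

class Report {
    // The single call site below is monomorphic if only Circle instances ever
    // reach it, bimorphic with Circle + Square, and megamorphic once many
    // distinct receiver classes have been observed at this exact location.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();    // one call site, one receiver-type profile
        }
        return sum;
    }
}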
27. 27
Inlining
§ Combine caller and callee into one unit
– e.g. based on the profile
– … or prove smth using CHA (Class Hierarchy Analysis)
– Perhaps with a guard/test
§ Optimize as a whole
– More code means better visibility
28. 28
Inlining
int addAll(int max) {
    int accum = 0;
    for (int i = 0; i < max; i++) {
        accum = add(accum, i);
    }
    return accum;
}

int add(int a, int b) { return a + b; }
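Conceptually, after inlining add() into the loop the JIT compiles something closer to the following (a sketch of the intermediate form, not literal compiler output):

int addAllInlined(int max) {
    int accum = 0;
    for (int i = 0; i < max; i++) {
        accum = accum + i;   // call to add() replaced by its body
    }
    return accum;
}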
30. 30
Inlining and devirtualization
§ Inlining is the most profitable compiler optimization
– Rather straightforward to implement
– Huge benefits: expands the scope for other optimizations
§ OOP needs polymorphism, that implies virtual calls
– Prevents naïve inlining
– Devirtualization is required
– (This does not mean you should not write OOP code)
31. 31
JVM Devirtualization
§ Developers shouldn't care
§ Analyze hierarchy of currently loaded classes
§ Efficiently devirtualize all monomorphic calls
§ Able to devirtualize polymorphic calls
§ JVM may inline dynamic methods
– Reflection calls
– Runtime-synthesized methods
– JSR 292
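A small, self-contained JSR 292 sketch (hypothetical demo class) of the kind of dynamic call the JIT can still inline:

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class Jsr292Demo {
    // Held in a static final field so the JIT can treat the handle as a
    // constant, which is what makes inlining through invokeExact possible.
    static final MethodHandle CONCAT;
    static {
        try {
            CONCAT = MethodHandles.lookup().findVirtual(
                String.class, "concat",
                MethodType.methodType(String.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println((String) CONCAT.invokeExact("JIT ", "compilers"));
    }
}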
32. 32
Feedback multiplies optimizations
§ On-line profiling and CHA produces information
– ...which lets the JIT ignore unused paths
– ...and helps the JIT sharpen types on hot paths
– ...which allows calls to be devirtualized
– ...allowing them to be inlined
– ...expanding an ever-widening optimization horizon
§ Result:
Large native methods containing tightly optimized machine code for
hundreds of inlined calls!
41. 41
Escape Analysis
public int m1() {
    Pair p = new Pair(1, 2);
    return m2(p);
}

public int m2(Pair p) {
    return p.first + m3(p);
}

public int m3(Pair p) { return p.second; }
Initial version
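Assuming m2() and m3() get inlined into m1(), escape analysis can prove that p never leaves m1(); a hedged sketch of the effective result after scalar replacement:

// After inlining, the Pair allocation is not visible outside m1(), so the
// JIT can replace the object with its two int fields (scalar replacement)
// and then constant-fold them away entirely.
public int m1() {
    int first = 1;           // p.first, no heap allocation
    int second = 2;          // p.second
    return first + second;   // effectively "return 3;"
}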
53. 53
Monitoring JIT-Compiler
§ how to print info about compiled methods?
– -XX:+PrintCompilation
§ how to print info about inlining decisions
– -XX:+PrintInlining
§ how to control compilation policy?
– -XX:CompileCommand=…
§ how to print assembly code?
– -XX:+PrintAssembly
– -XX:+PrintOptoAssembly (C2-only)
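A typical way to combine these (class and method names are placeholders; -XX:+PrintInlining and -XX:+PrintAssembly are diagnostic flags, so they require -XX:+UnlockDiagnosticVMOptions):

java -XX:+PrintCompilation \
     -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining \
     -XX:CompileCommand=dontinline,com.example.HotClass::helper \
     com.example.Main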
56. 56
Print Compilation
§ 2043 470 % ! jdk.nashorn.internal.ir.FunctionNode::accept @ 136 (265 bytes)
% == OSR compilation
! == has exception handlers (may be expensive)
s == synchronized method
§ 2028 466 n java.lang.Class::isArray (native)
n == native method
Other useful info
57. 57
Print Compilation
§ 621 160 java.lang.Object::equals (11 bytes) made not entrant
– don't allow any new calls into this compiled version
§ 1807 160 java.lang.Object::equals (11 bytes) made zombie
– can safely throw away compiled version
Not just compilation notifications
58. 58
No JIT At All?
§ Code is too large
§ Code isn’t too «hot»
– not executed often enough
61. 61
Inlining Tuning
§ -XX:MaxInlineSize=35
– Largest inlinable method (bytecode)
§ -XX:InlineSmallCode=#
– Largest inlinable compiled method
§ -XX:FreqInlineSize=#
– Largest frequently-called method…
§ -XX:MaxInlineLevel=9
– How deep does the rabbit hole go?
§ -XX:MaxRecursiveInlineLevel=#
– recursive inlining
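For example, one might experiment (cautiously; these are tuning knobs with illustrative values, not guaranteed wins) with something like:

java -XX:MaxInlineSize=50 -XX:FreqInlineSize=400 -XX:MaxInlineLevel=15 \
     -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining \
     com.example.Main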
62. 62
Machine Code
§ -XX:+PrintAssembly
§ http://wikis.sun.com/display/HotSpotInternals/PrintAssembly
§ Knowing code compiles is good
§ Knowing code inlines is better
§ Seeing the actual assembly is best!
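Note that PrintAssembly is a diagnostic flag and relies on the hsdis disassembler plugin being on the JVM's library path; a hedged example invocation (placeholder class name):

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly com.example.Main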