Understanding Memory Management
in Spark For Fun And Profit
Shivnath Babu (Duke University, Unravel Data Systems)
Mayuresh Kunjir (Duke University)
We are
• Shivnath Babu
– Associate Professor @ Duke University
– CTO, Unravel Data Systems
• Mayuresh Kunjir
– PhD Student @ Duke University
A Day in the Life of a
Spark Application Developer
spark-submit --class SortByKey --num-executors 10 --executor-memory 4G --executor-cores 16
Container [pid=28352,containerID=container_1464692140815_0006_01_000004] is running beyond physical memory limits. Current usage: 5 GB of 5 GB physical memory used; 6.8 GB of 10.5 GB virtual memory used. Killing container.
Searches on StackOverflow
Fix #1: Turn off Yarn’s Memory Policing
yarn.nodemanager.pmem-check-enabled=false
Application Succeeds!
But, wait a minute
This fix is not multi-tenant friendly!
-- Ops will not be happy
Fix #2: Use a Hint from Spark
WARN yarn.YarnAllocator: Container killed by YARN for exceeding memory limits. 5 GB of 5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
What is this Memory Overhead?
Node memory holds the Container; the Container splits into OS overhead and Executor memory.
The OS overhead portion holds shared native libs, memory-mapped files, thread stacks, and NIO buffers.
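To put rough numbers on it (a back-of-the-envelope sketch, not from the talk, assuming the default overhead of max(384 MB, 10% of executor memory) and Yarn rounding allocations up to 1 GB multiples):

  requested container = 4096 MB heap + max(384 MB, 0.10 × 4096 MB) ≈ 4506 MB
  Yarn allocation     ≈ 5120 MB (the "5 GB" limit in the error)
  room for off-heap   ≈ 5120 MB − 4096 MB ≈ 1 GB, shared by native libs, mapped files, thread stacks, and NIO buffers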
A Peek at the Memory Usage Timeline
[Timeline chart: Executor JVM max heap vs. Container memory limit vs. physical memory used by the Container as seen by the OS; the Container is killed by Yarn when physical usage reaches the limit]
After Applying Fix #2
Leaving more room for overheads
spark-submit --class SortByKey --num-executors 10 --executor-memory 4G
--executor-cores 16 --conf spark.yarn.executor.memoryOverhead=1536m
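With the explicit overhead the sizing arithmetic changes roughly as follows (same rounding assumptions as before):

  requested container = 4096 MB heap + 1536 MB overhead = 5632 MB → allocated ≈ 6144 MB
  room for off-heap   ≈ 2 GB instead of ~1 GB, which is what absorbs the per-task NIO buffers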
But, what did we do here?
We traded off memory efficiency
for reliability
What was the Root Cause?
Each task fetches shuffle files over an NIO channel. The buffers required are allocated out of the OS overhead portion of the container.
Container is
killed by Yarn
Fix #3: Reduce Executor Cores
Fewer Concurrent Tasks → Less Overhead Space
spark-submit --class SortByKey --num-executors 10 --executor-memory 4G --executor-cores 8
Application Succeeds!
But, what did we really do here?
We traded off performance and
CPU efficiency for reliability
Let’s Dig Deeper
Why is so much
memory consumed
in Executor heap?
JVM’s View of Executor Memory
Node memory holds the Container; the Container splits into OS overhead and Executor memory.
The Executor memory itself splits into off-heap (direct byte buffers, JVM internals such as the code cache and Perm Gen) and heap (Young Gen and Old Gen).
JVM’s View of Executor Memory
Fix #4: Frequent GC for Smaller Heap
spark-submit --class SortByKey --num-executors 10 --executor-memory 4G --executor-cores 16
--conf "spark.executor.extraJavaOptions=-XX:OldSize=100m -XX:MaxNewSize=100m"
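To confirm that the smaller generations really do trade GC frequency for footprint, the same command can also log collector activity (standard HotSpot flags, shown here as an illustration rather than part of the original deck):

  --conf "spark.executor.extraJavaOptions=-XX:OldSize=100m -XX:MaxNewSize=100m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"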
But, what did we do now?
Reliability is achieved at the cost
of extra CPU cycles spent in GC,
degrading performance by 15%
Can we do better?
So far, we have sacrificed either
performance or efficiency for reliability
Fix #5: Spark can Exploit Structure in Data
spark-submit --class SortByKeyDF --num-executors 10 --executor-memory 4G --executor-cores 16
Tungsten’s custom serialization
reduces memory footprint
while also reducing processing time
Application succeeds
and runs 2x faster
compared to Fix #2!
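The deck does not show the SortByKeyDF source; below is a minimal sketch of the idea under the assumption that it simply re-expresses the sort on DataFrames (Spark 2.x API; class name and paths are hypothetical):

  import org.apache.spark.sql.SparkSession

  // Minimal sketch, not the speakers' SortByKeyDF code: the same sort expressed on an
  // RDD of pairs and on a DataFrame. The DataFrame path keeps rows in Tungsten's
  // compact binary format, which is what shrinks the footprint and the runtime.
  object SortByKeySketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder.appName("SortByKeySketch").getOrCreate()
      import spark.implicits._

      // RDD version: keys and values live as Java objects on the heap.
      val pairs = spark.sparkContext
        .textFile(args(0))
        .map { line => val f = line.split("\t"); (f(0), f(1)) }
      pairs.sortByKey().saveAsTextFile(args(1) + "-rdd")

      // DataFrame version: rows stay in Tungsten's binary row format end to end.
      pairs.toDF("key", "value").sort($"key").write.csv(args(1) + "-df")

      spark.stop()
    }
  }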
Challenges in Memory Management
Objectives: Reliability, Efficiency, Performance, Predictability
Workloads: BI, ML, Graph, Streaming
Memory management options
Next
• Key insights from
experimental analysis
• Current work
Yarn-level Memory Management
Yarn-level Memory Management
• Executor memory
• OS memory overhead per executor
• Cores per executor
• Number of executors
Node memory: Container = OS overhead + Executor memory
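All four knobs appear on the spark-submit line; an illustrative combination (values are examples, not recommendations):

  spark-submit --class SortByKey --num-executors 10 --executor-cores 8 --executor-memory 4G --conf spark.yarn.executor.memoryOverhead=1536m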
Impact of Changing Executor Memory
Failed
java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:2271)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
  ...
  at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1202)
  ...
  at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:175)
Reliability  Efficiency  Performance  Predictability
Spark-level Memory Management
Spark-level Memory Management
Node memory: Container = OS overhead + Executor memory (spark.executor.memory)
Legacy: the heap splits into a Storage pool (spark.storage.memoryFraction × spark.storage.safetyFraction), an Execution pool (spark.shuffle.memoryFraction × spark.shuffle.safetyFraction), and Unmanaged space
Unified: a single Unified pool shared by storage and execution (spark.memory.fraction, split by spark.memory.storageFraction), plus Unmanaged space
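Both modes are selected and sized purely through configuration; an illustrative sketch (Spark 1.6+ keys, values are the documented defaults of that era rather than recommendations):

  # Legacy pools
  --conf spark.memory.useLegacyMode=true --conf spark.storage.memoryFraction=0.6 --conf spark.shuffle.memoryFraction=0.2

  # Unified pool
  --conf spark.memory.fraction=0.75 --conf spark.memory.storageFraction=0.5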
Spark-level Memory Management
• Legacy or unified?
– If legacy, what is size of storage pool Vs. execution pool?
• Caching
– On heap or off-heap (e.g., Tachyon)?
– Data format (deserialized or serialized)
– Provision for data unrolling
• Execution data
– Java-managed or Tungsten-managed
Comparing Legacy and Unified
[Charts: SortByKey and K-Means; legacy configurations with storage pool size increasing (and execution pool size decreasing) from left to right, with a Unified configuration shown alongside each]
Unified does as Expected, But…
Size of storage pool increases from left to right
Performance Predictability
Executors fail due to OOM errors while receiving shuffle blocks
java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:2271)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
  ...
  at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1202)
  ...
  at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)
legacy
Unified Memory Manager is:
• A step in the right direction
• Not unified enough
Spark-level Memory Management
• Legacy or unified?
– If legacy, what is size of storage pool Vs. execution pool?
• Caching
– On heap or off-heap (e.g., Tachyon)?
– Data format (deserialized or serialized)
– Provision for data unrolling
• Execution data
– Java-managed or Tungsten-managed
Deserialized Vs. Serialized cache
Size of storage pool increases from left to right
Performance Predictability
Memory footprint of data in cache goes down by ~20%, making more partitions fit in the storage pool
Efficiency
legacy
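The cache format is chosen per RDD through its storage level; a minimal sketch (rdd stands in for whatever the job caches; API as in Spark 1.6/2.x):

  import org.apache.spark.storage.StorageLevel

  // Pick one level per RDD: deserialized Java objects, serialized bytes on heap,
  // or (in Spark 1.x) off-heap blocks in Tachyon.
  rdd.persist(StorageLevel.MEMORY_ONLY)        // deserialized: fastest to read, largest footprint
  // rdd.persist(StorageLevel.MEMORY_ONLY_SER) // serialized: ~20% smaller here, extra CPU to deserialize
  // rdd.persist(StorageLevel.OFF_HEAP)        // off-heap cache (Tachyon-backed in Spark 1.x)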
Another Application, Another Story!
Size of storage pool increases from left to right
Failed
Failed
java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:2271)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
  ...
  at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1202)
  ...
  at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:175)
Predictability  Reliability
Executors fail due to OOM errors while serializing data
legacy
Spark-level Memory Management
• Legacy or unified?
– If legacy, what is size of storage pool Vs. execution pool?
• Caching
– On heap or off-heap (e.g., Tachyon)?
– Data format (deserialized or serialized)
– Provision for data unrolling
• Execution data
– Java-managed or Tungsten-managed
Execution Data Management
All objects in heap vs. up to 2 GB of objects in off-heap at any time
We have seen that Tungsten-managed heap improves performance significantly (Fix #5).
We did not notice much further improvement from pushing objects to off-heap.
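For reference, the off-heap variant in this comparison corresponds to Spark's Tungsten off-heap settings (an illustrative sketch; the size value is an example, and these keys exist from Spark 1.6 onward):

  --conf spark.memory.offHeap.enabled=true --conf spark.memory.offHeap.size=2g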
JVM-level Memory Management
JVM-level Memory Management
• Which GC algorithm? (Parallel GC, G1 GC, …)
• Size cap for a GC pool
• Frequency of collections
• Number of parallel GC threads
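These knobs are passed to the executor JVMs through spark.executor.extraJavaOptions; an illustrative combination (standard HotSpot flags, values are examples rather than recommendations):

  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:ParallelGCThreads=8 -XX:InitiatingHeapOccupancyPercent=45"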
Spark-JVM Interactions
The ratio of old generation size to the average RDD size cached per executor is varied.
Takeaway: keep the JVM OldGen size at least as big as the RDD cache.
[Charts: PageRank and K-Means; keeping the Spark storage pool size constant, the size of the OldGen pool increases from left to right]
K-Means executors display more skew in data compared to PageRank.
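A back-of-the-envelope check of that rule (illustrative numbers, not from the talk): with a 4 GB heap, spark.memory.fraction=0.6 and spark.memory.storageFraction=0.5, the unified storage pool is roughly (4096 MB − 300 MB reserved) × 0.6 × 0.5 ≈ 1.1 GB; HotSpot's default -XX:NewRatio=2 gives OldGen about two thirds of the heap (≈ 2.7 GB), comfortably above the cache, so cached blocks can be promoted and retained without triggering repeated full GCs.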
Current Work
• Automatic root-cause analysis of memory-related issues
• Auto-tuning algorithms for memory allocation in multi-tenant clusters
Get Free Trial Edition:
bit.ly/getunravel
UNCOVER ISSUES
UNLEASH RESOURCES
UNRAVEL PERFORMANCE