Understanding Memory Management
in Spark For Fun And Profit
Shivnath Babu (Duke University, Unravel Data Systems)
Mayuresh Kunjir (Duke University)
We are
• Shivnath Babu
– Associate Professor @ Duke University
– CTO, Unravel Data Systems
• Mayuresh Kunjir
– PhD Student @ Duke University
A Day in the Life of a
Spark Application Developer
spark-submit --class SortByKey --num-executors 10 --executor-memory 4G --executor-cores 16
Container [pid=28352,containerID=container_1464692140815_0006_01_000004] is running beyond physical memory limits. Current usage: 5 GB of 5 GB physical memory used; 6.8 GB of 10.5 GB virtual memory used. Killing container.
Searches on StackOverflow
Fix #1: Turn off Yarn’s Memory Policing
yarn.nodemanager.pmem-check-enabled=false
Application Succeeds!
But, wait a minute
This fix is not multi-tenant friendly!
-- Ops will not be happy
Fix #2: Use a Hint from Spark
WARN yarn.YarnAllocator: Container killed by YARN for exceeding memory limits. 5 GB of 5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
What is this Memory Overhead?
Node memory holds the Container; the Container splits into OS overhead and Executor memory.
The OS overhead portion holds shared native libs, memory-mapped files, thread stacks, and NIO buffers.
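To put rough numbers on it (a back-of-the-envelope sketch, not from the talk, assuming the default overhead of max(384 MB, 10% of executor memory) and Yarn rounding allocations up to 1 GB multiples):

  requested container = 4096 MB heap + max(384 MB, 0.10 × 4096 MB) ≈ 4506 MB
  Yarn allocation     ≈ 5120 MB (the "5 GB" limit in the error)
  room for off-heap   ≈ 5120 MB − 4096 MB ≈ 1 GB, shared by native libs, mapped files, thread stacks, and NIO buffers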
A Peek at the Memory Usage Timeline
[Timeline chart: Executor JVM max heap vs. Container memory limit vs. physical memory used by the Container as seen by the OS; the Container is killed by Yarn when physical usage reaches the limit]
After Applying Fix #2
Leaving more room for overheads
spark-submit --class SortByKey --num-executors 10 --executor-memory 4G
--executor-cores 16 --conf spark.yarn.executor.memoryOverhead=1536m
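With the explicit overhead the sizing arithmetic changes roughly as follows (same rounding assumptions as before):

  requested container = 4096 MB heap + 1536 MB overhead = 5632 MB → allocated ≈ 6144 MB
  room for off-heap   ≈ 2 GB instead of ~1 GB, which is what absorbs the per-task NIO buffers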
But, what did we do here?
We traded off memory efficiency
for reliability
What was the Root Cause?
Each task fetches shuffle files over an NIO channel. The buffers required are allocated out of the OS overhead portion of the container.
Container is
killed by Yarn
Fix #3: Reduce Executor Cores
Fewer Concurrent Tasks → Less Overhead Space
spark-submit --class SortByKey --num-executors 10 --executor-memory 4G --executor-cores 8
Application Succeeds!
But, what did we really do here?
We traded off performance and
CPU efficiency for reliability
Let’s Dig Deeper
Why is so much
memory consumed
in Executor heap?
JVM’s View of Executor Memory
Node memory holds the Container; the Container splits into OS overhead and Executor memory.
The Executor memory itself splits into off-heap (direct byte buffers, JVM internals such as the code cache and Perm Gen) and heap (Young Gen and Old Gen).
JVM’s View of Executor Memory
Fix #4: Frequent GC for Smaller Heap
spark-submit --class SortByKey --num-executors 10 --executor-memory 4G --executor-cores 16
--conf "spark.executor.extraJavaOptions=-XX:OldSize=100m -XX:MaxNewSize=100m"
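To confirm that the smaller generations really do trade GC frequency for footprint, the same command can also log collector activity (standard HotSpot flags, shown here as an illustration rather than part of the original deck):

  --conf "spark.executor.extraJavaOptions=-XX:OldSize=100m -XX:MaxNewSize=100m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"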
But, what did we do now?
Reliability is achieved at the cost
of extra CPU cycles spent in GC,
degrading performance by 15%
Can we do better?
So far, we have sacrificed either
performance or efficiency for reliability
Fix #5: Spark can Exploit Structure in Data
spark-submit --class SortByKeyDF --num-executors 10 --executor-memory 4G --executor-cores 16
Tungsten’s custom serialization
reduces memory footprint
while also reducing processing time
Application succeeds
and runs 2x faster
compared to Fix #2!
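The deck does not show the SortByKeyDF source; below is a minimal sketch of the idea under the assumption that it simply re-expresses the sort on DataFrames (Spark 2.x API; class name and paths are hypothetical):

  import org.apache.spark.sql.SparkSession

  // Minimal sketch, not the speakers' SortByKeyDF code: the same sort expressed on an
  // RDD of pairs and on a DataFrame. The DataFrame path keeps rows in Tungsten's
  // compact binary format, which is what shrinks the footprint and the runtime.
  object SortByKeySketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder.appName("SortByKeySketch").getOrCreate()
      import spark.implicits._

      // RDD version: keys and values live as Java objects on the heap.
      val pairs = spark.sparkContext
        .textFile(args(0))
        .map { line => val f = line.split("\t"); (f(0), f(1)) }
      pairs.sortByKey().saveAsTextFile(args(1) + "-rdd")

      // DataFrame version: rows stay in Tungsten's binary row format end to end.
      pairs.toDF("key", "value").sort($"key").write.csv(args(1) + "-df")

      spark.stop()
    }
  }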
Challenges in Memory Management
Objectives: Reliability, Efficiency, Performance, Predictability
Workloads: BI, ML, Graph, Streaming
Memory management options
Next
• Key insights from
experimental analysis
• Current work
Yarn-level Memory Management
Yarn-level Memory Management
• Executor memory
• OS memory overhead per executor
• Cores per executor
• Number of executors
Node memory: Container = OS overhead + Executor memory
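All four knobs appear on the spark-submit line; an illustrative combination (values are examples, not recommendations):

  spark-submit --class SortByKey --num-executors 10 --executor-cores 8 --executor-memory 4G --conf spark.yarn.executor.memoryOverhead=1536m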
Impact of Changing Executor Memory
Failed
java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:2271)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
  ...
  at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1202)
  ...
  at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:175)
Reliability  Efficiency  Performance  Predictability
Spark-level Memory Management
Spark-level Memory Management
Node memory: Container = OS overhead + Executor memory (spark.executor.memory)
Legacy: the heap splits into a Storage pool (spark.storage.memoryFraction × spark.storage.safetyFraction), an Execution pool (spark.shuffle.memoryFraction × spark.shuffle.safetyFraction), and Unmanaged space
Unified: a single Unified pool shared by storage and execution (spark.memory.fraction, split by spark.memory.storageFraction), plus Unmanaged space
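Both modes are selected and sized purely through configuration; an illustrative sketch (Spark 1.6+ keys, values are the documented defaults of that era rather than recommendations):

  # Legacy pools
  --conf spark.memory.useLegacyMode=true --conf spark.storage.memoryFraction=0.6 --conf spark.shuffle.memoryFraction=0.2

  # Unified pool
  --conf spark.memory.fraction=0.75 --conf spark.memory.storageFraction=0.5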
Spark-level Memory Management
• Legacy or unified?
– If legacy, what is size of storage pool Vs. execution pool?
• Caching
– On heap or off-heap (e.g., Tachyon)?
– Data format (deserialized or serialized)
– Provision for data unrolling
• Execution data
– Java-managed or Tungsten-managed
Comparing Legacy and Unified
[Charts: SortByKey and K-Means; legacy configurations with storage pool size increasing (and execution pool size decreasing) from left to right, with a Unified configuration shown alongside each]
Unified does as Expected, But…
Size of storage pool increases from left to right
Performance Predictability
Executors fail due to OOM errors while receiving shuffle blocks
java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:2271)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
  ...
  at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1202)
  ...
  at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)
legacy
Unified Memory Manager is:
• A step in the right direction
• Not unified enough
Spark-level Memory Management
• Legacy or unified?
– If legacy, what is size of storage pool Vs. execution pool?
• Caching
– On heap or off-heap (e.g., Tachyon)?
– Data format (deserialized or serialized)
– Provision for data unrolling
• Execution data
– Java-managed or Tungsten-managed
Deserialized Vs. Serialized cache
Size of storage pool increases from left to right
Performance Predictability
Memory footprint of data in cache goes down by ~20%, making more partitions fit in the storage pool
Efficiency
legacy
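The cache format is chosen per RDD through its storage level; a minimal sketch (rdd stands in for whatever the job caches; API as in Spark 1.6/2.x):

  import org.apache.spark.storage.StorageLevel

  // Pick one level per RDD: deserialized Java objects, serialized bytes on heap,
  // or (in Spark 1.x) off-heap blocks in Tachyon.
  rdd.persist(StorageLevel.MEMORY_ONLY)        // deserialized: fastest to read, largest footprint
  // rdd.persist(StorageLevel.MEMORY_ONLY_SER) // serialized: ~20% smaller here, extra CPU to deserialize
  // rdd.persist(StorageLevel.OFF_HEAP)        // off-heap cache (Tachyon-backed in Spark 1.x)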
Another Application, Another Story!
Size of storage pool increases from left to right
Failed
Failed
java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:2271)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
  ...
  at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1202)
  ...
  at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:175)
Predictability  Reliability
Executors fail due to OOM errors while serializing data
legacy
Spark-level Memory Management
• Legacy or unified?
– If legacy, what is size of storage pool Vs. execution pool?
• Caching
– On heap or off-heap (e.g., Tachyon)?
– Data format (deserialized or serialized)
– Provision for data unrolling
• Execution data
– Java-managed or Tungsten-managed
Execution Data Management
All objects in heap vs. up to 2 GB of objects in off-heap at any time
We have seen that Tungsten-managed heap improves performance significantly (Fix #5).
We did not notice much further improvement from pushing objects to off-heap.
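For reference, the off-heap variant in this comparison corresponds to Spark's Tungsten off-heap settings (an illustrative sketch; the size value is an example, and these keys exist from Spark 1.6 onward):

  --conf spark.memory.offHeap.enabled=true --conf spark.memory.offHeap.size=2g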
JVM-level Memory Management
JVM-level Memory Management
• Which GC algorithm? (Parallel GC, G1 GC, …)
• Size cap for a GC pool
• Frequency of collections
• Number of parallel GC threads
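These knobs are passed to the executor JVMs through spark.executor.extraJavaOptions; an illustrative combination (standard HotSpot flags, values are examples rather than recommendations):

  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:ParallelGCThreads=8 -XX:InitiatingHeapOccupancyPercent=45"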
Spark-JVM Interactions
The ratio of old generation size to the average RDD size cached per executor is varied.
Takeaway: keep the JVM OldGen size at least as big as the RDD cache.
[Charts: PageRank and K-Means; keeping the Spark storage pool size constant, the size of the OldGen pool increases from left to right]
K-Means executors display more skew in data compared to PageRank.
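A back-of-the-envelope check of that rule (illustrative numbers, not from the talk): with a 4 GB heap, spark.memory.fraction=0.6 and spark.memory.storageFraction=0.5, the unified storage pool is roughly (4096 MB − 300 MB reserved) × 0.6 × 0.5 ≈ 1.1 GB; HotSpot's default -XX:NewRatio=2 gives OldGen about two thirds of the heap (≈ 2.7 GB), comfortably above the cache, so cached blocks can be promoted and retained without triggering repeated full GCs.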
Current Work
• Automatic root-cause analysis of memory-related issues
• Auto-tuning algorithms for memory allocation in multi-tenant clusters
Get Free Trial Edition:
bit.ly/getunravel
UNCOVER ISSUES
UNLEASH RESOURCES
UNRAVEL PERFORMANCE