Unity Internals: Memory and Performance
- 3. Page
Who Am I?
• Now Field Engineer @ Unity
• Previously, Software Engineer
• Mainly worked on game engines
• Shipped several video games:
• Captain America: Super Soldier
• FIFA ’07 – FIFA ’10
• Fight Night: Round 3
6/9/14 3
- 6. Page
Memory Domains
• Native (internal)
• Asset Data: Textures, AudioClips, Meshes
• Game Objects & Components: Transform, etc.
• Engine Internals: Managers, Rendering, Physics, etc.
• Managed - Mono
• Script objects (managed DLLs)
• Wrappers for Unity objects: GameObjects, assets, components
• Native DLLs
• User’s DLLs and external DLLs (for example: DirectX)
- 7. Page
Native Memory: Internal Allocators
• Default
• GameObject
• Gfx
• Profiler
5.x: We are considering exposing an API for using a native allocator in DLLs
- 8. Page
Managed Memory
• Value types (bool, int, float, struct, ...)
• Stored on the stack when used as local variables; de-allocated when removed from the stack. No garbage.
• Reference types (classes)
• Live on the heap and are handled by the Mono/.NET GC. Removed when no longer referenced.
• Wrappers for Unity Objects:
• GameObject
• Assets: Texture2D, AudioClip, Mesh, …
• Components: MeshRenderer, Transform, MonoBehaviour
- 9. Page
Mono Memory Internals
• Allocates system heap blocks for internal allocator
• Will allocate new heap blocks when needed
• Heap blocks are kept in Mono for later use
• Memory can be given back to the system after a while
• …but it depends on the platform → don’t count on it
• Garbage collector cleans up
• Fragmentation can cause new heap blocks to be allocated even though memory is not exhausted
- 11. Page
Unity Object wrapper
• Some objects used in scripts have large native backing memory in Unity
• That memory is not freed until finalizers have run
[Figure: WWW object, managed wrapper vs. native memory (decompression buffer, compressed file, decompressed file)]
- 12. Page
Mono Garbage Collection
• GC.Collect
• Runs on the main thread when
• Mono exhausts the heap space
• Or the user calls System.GC.Collect()
• Finalizers
• Run on a separate thread
• Controlled by Mono
• Can be delayed by several seconds
• Unity native memory
• Dispose() cleans up internal memory
• Eventually called from the finalizer
• Call Dispose() manually to clean up immediately
[Figure: main thread vs. finalizer thread]
Main thread: www = null; → new SomeClass(); → no more heap → GC.Collect();
Finalizer thread (later): www.Dispose(); …
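A minimal sketch of the same pattern outside Unity, written in Java because it also has a garbage collector: `close()` plays the role of `Dispose()`, and a `java.lang.ref.Cleaner` stands in for the finalizer safety net. `FakeNativeHeap` and `NativeBackedResource` are hypothetical names; the counter only simulates a native allocation.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for Unity's native allocator: it only counts live bytes.
final class FakeNativeHeap {
    static final AtomicLong liveBytes = new AtomicLong();
}

// Managed wrapper owning a large "native" buffer, in the spirit of a WWW object.
final class NativeBackedResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Cleanup action. It must not reference the wrapper, or the wrapper
    // would stay reachable and the automatic cleanup could never run.
    private static final class Release implements Runnable {
        private final long size;
        Release(long size) { this.size = size; }
        @Override public void run() { FakeNativeHeap.liveBytes.addAndGet(-size); }
    }

    private final Cleaner.Cleanable cleanable;

    NativeBackedResource(long size) {
        FakeNativeHeap.liveBytes.addAndGet(size);              // simulated native allocation
        cleanable = CLEANER.register(this, new Release(size)); // fallback, like a finalizer
    }

    // Deterministic release, the equivalent of calling Dispose() manually.
    // Cleanable.clean() runs the action at most once, so the automatic
    // path becomes a no-op afterwards.
    @Override public void close() { cleanable.clean(); }
}

class DisposeDemo {
    public static void main(String[] args) {
        try (NativeBackedResource www = new NativeBackedResource(4L << 20)) {
            System.out.println("live native bytes: " + FakeNativeHeap.liveBytes.get());
        } // close() runs here, without waiting for the collector
        System.out.println("after close: " + FakeNativeHeap.liveBytes.get());
    }
}
```

Closing the resource frees the memory at a known point; the Cleaner path only runs whenever the collector gets around to it, which mirrors the finalizer delay.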
- 13. Page
Garbage Collection
• Objects reachable from roots are not collected by GC.Collect. Roots include:
• Thread stacks
• CPU registers
• GC handles (used by Unity to hold onto managed objects)
• Static variables!!
• Collection time scales with managed heap size
• The more you allocate, the slower it gets
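The points above can be illustrated with a toy mark phase in Java. This is a simplified model, not Mono's collector: starting from the roots, every reachable object is visited and its references are scanned, so the mark work grows with the number of reachable objects (a sweep would then walk the whole heap), and anything reachable from a static field is never reclaimed. `ToyObject`, `ToyMarker` and `MarkDemo` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy heap object: only its outgoing references matter to the collector.
final class ToyObject {
    final List<ToyObject> refs = new ArrayList<>();
}

final class ToyMarker {
    // Marks everything reachable from the roots and returns the live set.
    // Objects not in the result would be reclaimed by a sweep phase.
    static Set<ToyObject> mark(List<ToyObject> roots) {
        Set<ToyObject> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<ToyObject> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            ToyObject o = work.pop();
            if (marked.add(o)) {       // first visit: scan its references
                work.addAll(o.refs);
            }
        }
        return marked;
    }
}

class MarkDemo {
    // A static field acts as a root: whatever it references stays alive.
    static final List<ToyObject> staticRoots = new ArrayList<>();

    public static void main(String[] args) {
        ToyObject a = new ToyObject(), b = new ToyObject(), garbage = new ToyObject();
        a.refs.add(b);
        staticRoots.add(a);
        Set<ToyObject> live = ToyMarker.mark(staticRoots);
        System.out.println(live.contains(b) + " " + live.contains(garbage)); // true false
    }
}
```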
- 14. Page
GC: Does data layout matter?
struct Stuff
{
    int a;
    float b;
    bool c;
    string leString;
}
Stuff[] arrayOfStuff;  // Everything is scanned. GC takes more time.

// VS

int[] As;
float[] Bs;
bool[] Cs;
string[] leStrings;    // Only this is scanned. GC takes less time.
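Java has no user-defined structs, but the same trade-off exists there: an array of objects is made of references the collector must trace, while `int[]`, `float[]` and `boolean[]` contain no references at all. A sketch of converting the array-of-structs layout into the struct-of-arrays layout in Java; `Stuff` and `StuffColumns` mirror the slide and are illustrative only. In Java the first layout is even costlier, since each element is a separate heap object.

```java
// Array-of-structs layout: every element is an object that holds a reference.
final class Stuff {
    int a;
    float b;
    boolean c;
    String leString;

    Stuff(int a, float b, boolean c, String leString) {
        this.a = a; this.b = b; this.c = c; this.leString = leString;
    }
}

// Struct-of-arrays layout: only leStrings contains references to trace.
final class StuffColumns {
    final int[] as;
    final float[] bs;
    final boolean[] cs;
    final String[] leStrings;

    StuffColumns(int n) {
        as = new int[n];
        bs = new float[n];
        cs = new boolean[n];
        leStrings = new String[n];
    }

    // Copies each field of each row into its own column array.
    static StuffColumns from(Stuff[] rows) {
        StuffColumns cols = new StuffColumns(rows.length);
        for (int i = 0; i < rows.length; i++) {
            cols.as[i] = rows[i].a;
            cols.bs[i] = rows[i].b;
            cols.cs[i] = rows[i].c;
            cols.leStrings[i] = rows[i].leString;
        }
        return cols;
    }
}
```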
- 15. Page
GC: Best Practices
• Reuse objects → use object pools
• Prefer stack-based allocations → use struct instead of class
• System.GC.Collect can be used to trigger a collection
• Calling it 6 times returns the unused memory to the OS
• Call Dispose manually to clean up immediately
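A minimal, single-threaded object pool sketch in Java (hypothetical `ObjectPool` class, not a Unity API): objects are acquired from a free list and handed back after use, so once the pool is warm, acquiring an object does not allocate.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Single-threaded pool: reuse instances instead of allocating new ones
// every frame, so the GC has less garbage to collect.
final class ObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final Consumer<T> reset;

    ObjectPool(Supplier<T> factory, Consumer<T> reset, int prewarm) {
        this.factory = factory;
        this.reset = reset;
        for (int i = 0; i < prewarm; i++) free.push(factory.get());
    }

    T acquire() {
        T item = free.poll();                    // null when the pool is empty
        return item != null ? item : factory.get();
    }

    void release(T item) {
        reset.accept(item);                      // clear state for the next user
        free.push(item);
    }

    int available() { return free.size(); }
}
```

The pool only allocates when it runs dry, and the `reset` callback clears state on release; a production pool would typically also cap its size.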
- 16. Page
Avoid temp allocations
• Don’t use FindObjects or LINQ
• Use StringBuilder for string concatenation
• Reuse large temporary work buffers
• ToString() (allocates a new string)
• .tag (allocates a new string) → use CompareTag() instead
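A sketch of the StringBuilder and buffer-reuse advice in Java (the hypothetical `ScoreLabel` class is illustrative): one builder is kept as a field and cleared with `setLength(0)` before each use, so its backing buffer is reused instead of being reallocated on every call.

```java
// One builder, reused for every call (single-threaded use assumed).
final class ScoreLabel {
    private final StringBuilder sb = new StringBuilder(64);

    String format(String player, int score, int lives) {
        sb.setLength(0); // reset the length; the backing buffer is kept
        sb.append(player).append(": ").append(score)
          .append(" pts, ").append(lives).append(" lives");
        return sb.toString(); // the result String is still a new object per call
    }
}
```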
- 17. Page
Unity API Temporary Allocations
Some Examples:
• GetComponents<T>
• Vector3[] Mesh.vertices
• Camera[] Camera.allCameras
• foreach
• Does not allocate by definition
• However, there can be a small allocation, depending on the implementation of GetEnumerator()
5.x: We are working on new non-allocating versions
- 18. Page
Memory fragmentation
• Memory fragmentation is hard to account for
• Fully unload dynamically allocated content
• Switch to a blank scene before proceeding to next level
• This scene could have a hook where you pause the game long enough to check whether anything significant is still in memory
• Ensure you clear out variables so GC.Collect can remove as much as possible
• Avoid allocations where possible
• Reuse objects where possible within a scene play
• Clear them out before a map load to clean up memory
- 19. Page
Unloading Unused Assets
• Resources.UnloadUnusedAssets triggers asset garbage collection
• It looks for all unreferenced assets and unloads them
• It’s an async operation
• It’s called internally after loading a level
• Resources.UnloadAsset is preferable
• You need to know exactly what to unload
• Unity does not have to scan everything
• Unity 5.0: Multi-threaded asset garbage collection
- 21. Page
Mesh Read/Write Option
• It allows you to modify the mesh at run-time
• If enabled, a system-memory copy of the mesh remains in memory
• It is enabled by default
• In some cases, disabling this option does not reduce memory usage:
• Skinned meshes
• iOS
Unity 5.0: disabled by default (under consideration)
- 22. Page
Non-Uniform scaled Meshes
Vertex normals must be transformed correctly under non-uniform scale
• Unity 4.x:
• Transform the mesh on the CPU
• Create an extra copy of the data
• Unity 5.0:
• Scaling is done on the GPU
• The extra memory is no longer needed
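Why normals need special care: positions are multiplied by the scale, but normals must be multiplied by the inverse transpose, which for a pure scale is just 1/sx, 1/sy, 1/sz, followed by re-normalization. A minimal Java sketch of that math for a diagonal scale with no rotation, assuming non-zero scale factors; the `NormalScale` class is hypothetical and illustrates the problem, not Unity's implementation.

```java
// Transforms a unit normal by a non-uniform (diagonal) scale.
// Inverse transpose of diag(sx, sy, sz) is diag(1/sx, 1/sy, 1/sz).
final class NormalScale {
    static float[] scaleNormal(float[] n, float sx, float sy, float sz) {
        float x = n[0] / sx, y = n[1] / sy, z = n[2] / sz; // assumes non-zero scale
        float len = (float) Math.sqrt(x * x + y * y + z * z);
        return new float[] { x / len, y / len, z / len };  // re-normalize
    }
}
```

For the 45° normal (1, 1, 0)/√2 scaled by (2, 1, 1), naively multiplying by the scale gives (2, 1, 0), which is no longer perpendicular to the stretched surface direction (2, -1, 0); the inverse scale keeps it perpendicular.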
- 23. Page
Static Batching
What is it?
• An optimization that reduces the number of draw calls and state changes
How do I enable it?
• Enable it in the Player Settings and mark the object as Static
- 24. Page
Static Batching
How does it work internally?
• Build time: vertices are transformed to world space
• Run time: an index buffer is created with the indices of the visible objects
Unity 5.0:
• Static batching re-implemented without copying index buffers
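A toy model of the pre-5.0 scheme, in Java with translation-only transforms (the `StaticBatch` class is hypothetical; Unity's real implementation is not shown in the slides): vertices are baked into world space once, and each frame only the index ranges of visible objects are copied into a new index buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Toy static batch: one shared world-space vertex list plus per-object index ranges.
final class StaticBatch {
    final List<float[]> worldVertices = new ArrayList<>(); // built once
    final List<int[]> objectIndices = new ArrayList<>();   // already offset into worldVertices

    // "Build time": bake a mesh into world space and append it. Returns the object id.
    int add(float[][] localVerts, int[] indices, float tx, float ty, float tz) {
        int base = worldVertices.size();
        for (float[] v : localVerts) {
            worldVertices.add(new float[] { v[0] + tx, v[1] + ty, v[2] + tz });
        }
        int[] offset = new int[indices.length];
        for (int i = 0; i < indices.length; i++) offset[i] = indices[i] + base;
        objectIndices.add(offset);
        return objectIndices.size() - 1;
    }

    // "Run time": copy the indices of visible objects only; vertices are untouched.
    int[] buildIndexBuffer(boolean[] visible) {
        int count = 0;
        for (int id = 0; id < objectIndices.size(); id++) {
            if (visible[id]) count += objectIndices.get(id).length;
        }
        int[] ib = new int[count];
        int pos = 0;
        for (int id = 0; id < objectIndices.size(); id++) {
            if (!visible[id]) continue;
            int[] src = objectIndices.get(id);
            System.arraycopy(src, 0, ib, pos, src.length);
            pos += src.length;
        }
        return ib;
    }
}
```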
- 25. Page
Dynamic Batching
What is it?
• Similar to static batching, but it batches non-static objects at run time
How do I enable it?
• In the Player Settings
• No need to tag: it auto-magically works…
- 26. Page
Dynamic Batching
How does it work internally?
• Objects are transformed to world space on the CPU
• Temporary vertex and index buffers (VB & IB) are created
• Everything is rendered in one draw call
Unity 5.x: we are considering exposing per-platform parameters
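A toy model of that per-frame work, in Java (the `DynamicBatcher` class is hypothetical; transforms are translation-only): every batched object's vertices are transformed on the CPU and copied into temporary buffers, and its indices are re-based so the whole batch can be drawn with one call. Unlike the static case, this cost is paid every frame, which is why it pays off mainly for small meshes.

```java
import java.util.List;

final class DynamicBatcher {
    // Hypothetical per-object input: local positions as x,y,z triples plus a translation.
    static final class Item {
        final float[] localXyz;
        final int[] indices;
        final float tx, ty, tz;

        Item(float[] localXyz, int[] indices, float tx, float ty, float tz) {
            this.localXyz = localXyz; this.indices = indices;
            this.tx = tx; this.ty = ty; this.tz = tz;
        }
    }

    // Temporary buffers that would be submitted in a single draw call.
    static final class Batch {
        final float[] vb;
        final int[] ib;
        Batch(float[] vb, int[] ib) { this.vb = vb; this.ib = ib; }
    }

    static Batch build(List<Item> items) {
        int vCount = 0, iCount = 0;
        for (Item it : items) { vCount += it.localXyz.length; iCount += it.indices.length; }
        float[] vb = new float[vCount];
        int[] ib = new int[iCount];
        int vPos = 0, iPos = 0;
        for (Item it : items) {
            int baseVertex = vPos / 3;
            for (int k = 0; k < it.localXyz.length; k += 3) { // CPU transform to world space
                vb[vPos++] = it.localXyz[k] + it.tx;
                vb[vPos++] = it.localXyz[k + 1] + it.ty;
                vb[vPos++] = it.localXyz[k + 2] + it.tz;
            }
            for (int idx : it.indices) ib[iPos++] = idx + baseVertex; // re-base indices
        }
        return new Batch(vb, ib);
    }
}
```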
- 27. Page
Mesh Skinning
Different implementations depending on platform:
• x86: SSE
• iOS/Android/WP8: NEON optimizations
• D3D11/Xbox One/GLES 3.0: GPU
• Xbox 360, Wii U: GPU (memexport)
• PS3: SPU
• Wii U: GPU w/ stream out
Unity 5.0: Skinned meshes use less memory by sharing index buffers between instances
- 29. Page
Unity 5.0: Mono
• No upgrade
• Mainly bug fixes
• New tech in WebGL: IL2CPP
• http://blogs.unity3d.com/2014/04/29/on-the-future-of-web-publishing-in-unity/
• Stay tuned: there will be a blog post about it
- 30. Page
GetComponent<T>
It asks the GameObject for a component of the specified type:
• The GO contains a list of components
• Each component’s type is compared to T
• The first component of type T (or of a type derived from T) is returned to the caller
• Not much overhead, but it still needs to call into native code
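The lookup described above can be sketched as a linear scan. A Java sketch with hypothetical `ComponentModel` and `GameObjectModel` classes (not Unity's native code): `Class.isInstance` matches both the exact type and subclasses, so the first compatible component in the list wins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Unity's component hierarchy (illustration only).
abstract class ComponentModel {}
class TransformModel extends ComponentModel {}
class ColliderModel extends ComponentModel {}
class BoxColliderModel extends ColliderModel {}

final class GameObjectModel {
    private final List<ComponentModel> components = new ArrayList<>();

    <T extends ComponentModel> T addComponent(T c) {
        components.add(c);
        return c;
    }

    // Linear scan in insertion order: returns the first component whose type
    // is T or a subclass of T, or null if there is none.
    <T extends ComponentModel> T getComponent(Class<T> type) {
        for (ComponentModel c : components) {
            if (type.isInstance(c)) return type.cast(c);
        }
        return null;
    }
}
```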
- 31. Page
Unity 5.0: Property Accessors
• Most component property accessors (e.g. rigidbody, renderer, collider) will be removed in Unity 5.0
• The objective is to reduce dependencies and therefore improve modularization
• The transform accessor will remain
• Existing scripts will be converted. Example:
[Figure: a script before conversion and the same script in 5.0]
- 32. Page
Transform Component
• this.transform is the same as GetComponent<Transform>()
• transform.position/rotation needs to:
• Find the Transform component
• Traverse the hierarchy to calculate the absolute position
• Apply the translation/rotation
• The transform internally stores its position relative to the parent
• transform.localPosition = new Vector3(…) → simple assignment
• transform.position = new Vector3(…) → costs the same if there is no parent; otherwise it must traverse the hierarchy upward to convert the absolute position into a local one
• Finally, other components (collider, rigidbody, light, camera, etc.) are notified via messages
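A toy Java model (hypothetical `TransformNode` class, translation only, no rotation or scale) of why `localPosition` is cheap and `position` is not: the local value is a plain store, while reading or writing the world position walks the parent chain. Real transforms combine full matrices and also notify other components, which this sketch omits.

```java
// Toy transform: position stored relative to the parent, translation only.
final class TransformNode {
    final TransformNode parent;
    float lx, ly, lz; // local position

    TransformNode(TransformNode parent) { this.parent = parent; }

    // localPosition setter: a simple assignment.
    void setLocalPosition(float x, float y, float z) { lx = x; ly = y; lz = z; }

    // position getter: walks up the hierarchy accumulating offsets.
    float[] getPosition() {
        float x = 0, y = 0, z = 0;
        for (TransformNode n = this; n != null; n = n.parent) {
            x += n.lx; y += n.ly; z += n.lz;
        }
        return new float[] { x, y, z };
    }

    // position setter: same cost as local when there is no parent,
    // otherwise convert world to local using the parent's world position.
    void setPosition(float x, float y, float z) {
        if (parent == null) { setLocalPosition(x, y, z); return; }
        float[] p = parent.getPosition();
        setLocalPosition(x - p[0], y - p[1], z - p[2]);
    }
}
```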
- 34. Page
Instantiate (cont.)
• Awake can be expensive
• AwakeFromLoad (main thread)
• clear states
• internal state caching
• pre-compute
Unity 5.0:
• Allocations have been reduced
• Some inner loops for copying the data have been
optimized
- 35. Page
JIT Compilation
What is it?
• The process in which machine code is generated from CIL code at run time
Pros:
• It generates optimized code for the current platform
Cons:
• The first time each method is called, the application pays a performance penalty while the method is compiled
- 36. Page
JIT compilation spikes
What about pre-JITting?
• RuntimeHelpers.PrepareMethod does not work; it is better to use MethodHandle.GetFunctionPointer()
- 38. Page
Unity 5.0: Job System (internal)
The goals of the job system:
• Make it easy to write very efficient job-based multithreaded code
• Jobs should be able to run safely in parallel with script code
- 39. Page
Job System: Why ?
Modern architectures are multi-core:
• Xbox 360: 3 cores
• PS4/Xbox One: 8 cores
…and so are mobile devices:
• iPhone 4S: 2 cores
• Galaxy S3: 4 cores
- 40. Page
Job System: What is it ?
• It’s a framework that we are going to use in existing and new subsystems
• We want Animation, NavMesh, Occlusion, Rendering, etc. to run in parallel as much as possible
• This will ultimately lead to better performance
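Unity's internal job system API is not shown in these slides. As a generic illustration of job-based parallelism only, a Java sketch using a standard `ExecutorService` (the `ParallelJobs` class, `RangeJob` interface and `JobsDemo` are hypothetical): the work is split into chunks that touch disjoint data, so the jobs can run in parallel without locks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelJobs {
    // One job processes the half-open index range [start, end).
    interface RangeJob { void run(int start, int end); }

    // Splits [0, count) into chunks, submits each as a job, then waits for all.
    static void parallelFor(ExecutorService pool, int count, int chunk, RangeJob job) throws Exception {
        if (chunk <= 0) throw new IllegalArgumentException("chunk must be positive");
        List<Future<?>> pending = new ArrayList<>();
        for (int start = 0; start < count; start += chunk) {
            final int s = start, e = Math.min(start + chunk, count);
            pending.add(pool.submit(() -> job.run(s, e)));
        }
        for (Future<?> f : pending) f.get(); // waits, and rethrows a failed job's exception
    }
}

class JobsDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        float[] positions = new float[10_000];
        ParallelJobs.parallelFor(pool, positions.length, 1024, (s, e) -> {
            for (int i = s; i < e; i++) positions[i] += 1f; // each job owns a disjoint slice
        });
        pool.shutdown();
        System.out.println(positions[9_999]); // 1.0
    }
}
```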
- 41. Page
Unity 5.0: Profiler Timeline View
It’s a tool that lets you analyse the execution of internal (native) threads within a specific frame
- 44. Page
Budgeting Memory
How much memory is available?
• It depends…
• For example, on 512 MB devices running iOS 6.0: ~250 MB; a bit less with iOS 7.0
What’s the baseline?
• Create an empty scene and measure memory
• Don’t forget that the profiler requires some memory
• For example, on Android: 15.5 MB (+ 12 MB for the profiler)
- 45. Page
Profiling
• Don’t make assumptions
• Profile on target device
• Editor != Player
• Platform X != Platform Y
• Managed Memory is not returned to Native Land!
For best results:
• Profile early and regularly