Unity Internals:
Memory and Performance
Moscow, 16/05/2014
Marco Trivellato – Field Engineer
Page6/9/14 2
This Talk
Goals and Benefits
Who Am I?
•  Now Field Engineer @ Unity
•  Previously, Software Engineer
•  Mainly worked on game engines
•  Shipped several video games:
•  Captain America: Super Soldier
•  FIFA ’07 – FIFA ’10
•  Fight Night: Round 3
Topics
•  Memory Overview
•  Garbage Collection
•  Mesh Internals
•  Scripting
•  Job System
•  How to use the Profiler
Memory Overview
Memory Domains
•  Native (internal)
•  Asset Data: Textures, AudioClips, Meshes
•  Game Objects & Components: Transform, etc.
•  Engine Internals: Managers, Rendering, Physics, etc.
•  Managed - Mono
•  Script objects (Managed dlls)
•  Wrappers for Unity objects: Game objects, assets,
components
•  Native Dlls
•  User’s dlls and external dlls (for example: DirectX)
Native Memory: Internal Allocators
•  Default
•  GameObject
•  Gfx
•  Profiler
5.x: We are considering exposing an API for using a native allocator in DLLs
Managed Memory
•  Value types (bool, int, float, struct, ...)
•  As locals, they live in stack memory and are de-allocated when removed from the stack: no garbage. As fields of a class or as array elements, they live inside that object on the heap.
•  Reference types (classes)
•  Exist on the heap and are handled by the Mono/.NET GC. Removed once they are no longer referenced.
•  Wrappers for Unity Objects:
•  GameObject
•  Assets : Texture2D, AudioClip, Mesh, …
•  Components : MeshRenderer, Transform, MonoBehaviour
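A minimal sketch in plain C# (no Unity types; `PointStruct`, `PointClass` and `ValueVsReference` are hypothetical names) of the copy semantics behind this split: assigning a struct copies the value, while assigning a class instance copies only a reference to one heap object that the GC must later reclaim.

```csharp
// Value type: assignment copies the whole value.
public struct PointStruct
{
    public float X;
    public float Y;
}

// Reference type: assignment copies a reference to the same heap object.
public class PointClass
{
    public float X;
    public float Y;
}

public static class ValueVsReference
{
    // Modifies a copy of a struct and returns X as seen through the original.
    public static float StructCopyX()
    {
        var a = new PointStruct { X = 1f, Y = 2f }; // local struct: no GC allocation
        var b = a;                                   // copies the whole value
        b.X = 10f;                                   // changes only the copy
        return a.X;                                  // still 1
    }

    // Same steps with a class: both variables refer to one heap object.
    public static float ClassCopyX()
    {
        var a = new PointClass { X = 1f, Y = 2f };  // heap allocation, tracked by the GC
        var b = a;                                   // copies the reference only
        b.X = 10f;                                   // modifies the shared object
        return a.X;                                  // now 10
    }
}
```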
Mono Memory Internals
•  Allocates system heap blocks for internal allocator
•  Will allocate new heap blocks when needed
•  Heap blocks are kept in Mono for later use
•  Memory can be given back to the system after a while
•  …but it depends on the platform → don’t count on it
•  Garbage collector cleans up
•  Fragmentation can cause new heap blocks even
though memory is not exhausted
Garbage Collection
Unity Object wrapper
•  Some objects used in scripts have large native backing memory in Unity
•  That memory is not freed until the finalizers have run
Figure (WWW example): managed side, the WWW wrapper; native side, the compressed file, the decompression buffer and the decompressed file.
Mono Garbage Collection
•  GC.Collect
•  Runs on the main thread when
•  Mono exhausts the heap space
•  Or user calls System.GC.Collect()
•  Finalizers
•  Run on a separate thread
•  Controlled by mono
•  Can run with a delay of several seconds
•  Unity native memory
•  Dispose() cleans up internal
memory
•  Eventually called from finalizer
•  Manually call Dispose() to clean up
Figure (main thread vs. finalizer thread):
Main thread: www = null; → new(someclass); → no more heap → GC.Collect();
Finalizer thread, some time later: www.Dispose();
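The Dispose-versus-finalizer behaviour can be sketched in plain C#. `NativeBuffer` is a hypothetical class, not a Unity type; it stands in for a wrapper such as WWW and uses `Marshal.AllocHGlobal` to simulate the large native allocation. Calling `Dispose()` frees that memory immediately; otherwise it is freed only when the finalizer eventually runs on the finalizer thread.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: a small managed object that owns a large native
// allocation, standing in for Unity objects such as WWW.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr _memory;

    public int Size { get; }
    public bool IsDisposed => _memory == IntPtr.Zero;

    public NativeBuffer(int size)
    {
        Size = size;
        _memory = Marshal.AllocHGlobal(size); // native memory, outside the managed heap
    }

    // Deterministic cleanup: frees the native memory right now.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this); // the finalizer no longer needs to run
    }

    // Fallback: runs on the finalizer thread, possibly long after
    // the last reference was dropped.
    ~NativeBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (_memory != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_memory);
            _memory = IntPtr.Zero;
        }
    }
}
```

Typical usage is `using (var buffer = new NativeBuffer(16 * 1024 * 1024)) { ... }`, which calls `Dispose()` at the end of the block instead of waiting for the finalizer.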
Garbage Collection
•  Roots, and everything reachable from them, are not collected by GC.Collect:
•  Thread stacks
•  CPU Registers
•  GC Handles (used by Unity to hold onto managed
objects)
•  Static variables!!
•  Collection time scales with managed heap size
•  The more you allocate, the slower it gets
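A small plain-C# sketch of a static variable acting as a root. `RootsDemo` is hypothetical, and `WeakReference` is used only to observe whether the GC reclaimed the array. With the static field set, the array survives a full `GC.Collect()`; assigning `RootsDemo.Cache = null` removes the root. Whether an unrooted object is reclaimed by a particular collection depends on the runtime and build settings, so only the rooted case is deterministic.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RootsDemo
{
    // A static field is a GC root: whatever it references survives collections.
    public static byte[] Cache;

    // NoInlining keeps the array's only strong local reference inside this
    // method's frame, which is gone once the method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference Allocate(bool keepInStatic)
    {
        var data = new byte[1024 * 1024];
        if (keepInStatic)
        {
            Cache = data; // rooted through the static field
        }
        return new WeakReference(data); // a weak reference does not keep it alive
    }

    // Returns whether the array is still alive after a full, blocking collection.
    public static bool SurvivesCollect(bool keepInStatic)
    {
        WeakReference weak = Allocate(keepInStatic);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return weak.IsAlive;
    }
}
```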
GC: does data layout matter?
struct Stuff
{
    int a;
    float b;
    bool c;
    string leString;
}
Stuff[] arrayOfStuff;  // Each element holds a reference, so the whole array is scanned: GC takes more time
VS
int[] As;
float[] Bs;
bool[] Cs;
string[] leStrings;    // Only this array holds references, so only it is scanned: GC takes less time
GC: Best Practices
•  Reuse objects → Use object pools
•  Prefer stack-based allocations → Use struct instead of class
•  System.GC.Collect can be used to trigger
collection
•  Calling it 6 times returns the unused memory to
the OS
•  Manually call Dispose to clean up immediately
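A minimal object pool, sketched in plain C#. `SimplePool<T>` and its members are hypothetical, not a Unity or .NET API: instances are handed out from a free list and returned for reuse, so a new allocation happens only when the pool is empty.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal pool: instances are reused instead of re-allocated,
// so steady-state use of pooled types produces no garbage.
public sealed class SimplePool<T> where T : class
{
    private readonly Stack<T> _free = new Stack<T>();
    private readonly Func<T> _create;
    private readonly Action<T> _reset;

    public SimplePool(Func<T> create, Action<T> reset)
    {
        _create = create ?? throw new ArgumentNullException(nameof(create));
        _reset = reset;
    }

    // Number of instances waiting to be reused.
    public int CountInactive => _free.Count;

    // Allocates only when no pooled instance is available.
    public T Get()
    {
        return _free.Count > 0 ? _free.Pop() : _create();
    }

    // Resets state so the next user starts clean, then keeps the instance.
    public void Release(T item)
    {
        _reset?.Invoke(item);
        _free.Push(item);
    }
}
```

The reset callback is where per-use state (lists, counters, references to other objects) gets cleared; forgetting it keeps stale references alive and can itself prevent collection.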
Avoid temp allocations
•  Don’t use FindObjects or LINQ
•  Use StringBuilder for string concatenation
•  Reuse large temporary work buffers
•  ToString() (allocates a new string)
•  .tag (allocates a new string) → use CompareTag() instead
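The StringBuilder advice can be sketched in plain C#: the hypothetical `ScoreLabel` class keeps one `StringBuilder` as a reusable work buffer instead of building labels through repeated string concatenation. (`CompareTag()` is a Unity API and is not shown here.)

```csharp
using System.Text;

// Hypothetical HUD helper: keeps one StringBuilder as a reusable work buffer.
public sealed class ScoreLabel
{
    private readonly StringBuilder _builder = new StringBuilder(64);

    public string Format(string playerName, int score)
    {
        _builder.Clear();            // resets the length, keeps the allocated capacity
        _builder.Append(playerName);
        _builder.Append(": ");
        _builder.Append(score);      // Append(int) overload: no boxing
        return _builder.ToString();  // allocates the final string
    }
}
```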
Unity API Temporary Allocations
Some Examples:
•  GetComponents<T>
•  Vector3[] Mesh.vertices
•  Camera[] Camera.allCameras
•  foreach
•  does not allocate by definition
•  However, there can be a small allocation, depending on the
implementation of .GetEnumerator()
5.x: We are working on new non-allocating versions
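A plain-C# illustration of why foreach may or may not allocate (`ForeachAllocations` is a hypothetical name): the loop calls whatever `GetEnumerator()` the variable's static type exposes. `List<T>` exposes a struct enumerator, whereas the same list accessed through `IEnumerable<T>` generally hands back that enumerator boxed on the heap. Exact behaviour depends on the compiler and runtime version.

```csharp
using System.Collections.Generic;

public static class ForeachAllocations
{
    // foreach over List<int> binds to List<int>.GetEnumerator(), which returns
    // the struct List<int>.Enumerator: no heap allocation for the enumerator.
    public static int SumList(List<int> values)
    {
        int sum = 0;
        foreach (int v in values)
        {
            sum += v;
        }
        return sum;
    }

    // Through IEnumerable<int>, foreach calls the interface's GetEnumerator(),
    // which returns an IEnumerator<int>; for a List<int> this generally means
    // the struct enumerator is boxed on the heap.
    public static int SumEnumerable(IEnumerable<int> values)
    {
        int sum = 0;
        foreach (int v in values)
        {
            sum += v;
        }
        return sum;
    }
}
```

Both methods compute the same result; the difference lies only in what `GetEnumerator()` returns for the declared parameter type.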
Memory fragmentation
•  Memory fragmentation is hard to account for
•  Fully unload dynamically allocated content
•  Switch to a blank scene before proceeding to next level
•  This scene could have a hook where you may pause the game
long enough to sample if there is anything significant in
memory
•  Ensure you clear out variables so GC.Collect will
remove as much as possible
•  Avoid allocations where possible
•  Reuse objects where possible within a scene play
•  Clear them out before loading the next map, to free the memory
Unloading Unused Assets
•  Resources.UnloadUnusedAssets will trigger asset
garbage collection
•  It looks for all unreferenced assets and unloads them
•  It’s an async operation
•  It’s called internally after loading a level
•  Resources.UnloadAsset is preferable
•  You need to know exactly what to unload
•  Unity does not have to scan everything
•  Unity 5.0: Multi-threaded asset garbage collection
Mesh Internals
Memory vs. Cycles
Mesh Read/Write Option
•  It allows you to modify the mesh at run-time
•  If enabled, a copy of the mesh data remains in system memory
•  It is enabled by default
•  In some cases, disabling this option will not reduce the
memory usage
•  Skinned meshes
•  iOS
Unity 5.0: disabling it by default is under consideration
Non-Uniform scaled Meshes
We need to correctly transform vertex normals
•  Unity 4.x:
•  transform the mesh on the CPU
•  create an extra copy of the data
•  Unity 5.0
•  Scaled on GPU
•  Extra memory no longer needed
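Whether it runs on the CPU (4.x) or the GPU (5.0), the underlying rule is the same: under a non-uniform scale, normals must be transformed by the inverse-transpose of the matrix, not by the matrix itself. A sketch using `System.Numerics` (row-vector convention) rather than Unity's `Matrix4x4`; `NormalTransform` is a hypothetical helper.

```csharp
using System;
using System.Numerics;

public static class NormalTransform
{
    // Inverse-transpose rule for System.Numerics' row-vector convention (v * M):
    // normals must be multiplied by transpose(inverse(M)), then renormalized.
    public static Vector3 TransformNormal(Vector3 normal, Matrix4x4 m)
    {
        if (!Matrix4x4.Invert(m, out Matrix4x4 inverse))
        {
            throw new ArgumentException("Matrix is not invertible.", nameof(m));
        }
        Vector3 n = Vector3.TransformNormal(normal, Matrix4x4.Transpose(inverse));
        return Vector3.Normalize(n);
    }

    // Naive version: treats the normal like any other direction. Only correct
    // for rotations and uniform scale.
    public static Vector3 TransformNormalNaive(Vector3 normal, Matrix4x4 m)
    {
        return Vector3.Normalize(Vector3.TransformNormal(normal, m));
    }
}
```

For a scale of (2, 1, 1) and a surface with normal (1, 1, 0)/√2, the naive result is no longer perpendicular to the scaled surface, while the inverse-transpose result is.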
Static Batching
What is it?
•  It’s an optimization that reduces the number of draw calls and state changes
How do I enable it?
•  Enable it in the Player Settings and mark the objects as static
Static Batching
How does it work internally?
•  Build-time: Vertices are transformed to world space
•  Run-time: Index buffer is created with indices of
visible objects
Unity 5.0:
•  Re-implemented static batching without copying of
index buffers
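A hypothetical sketch of the two phases described above, in plain C# with `System.Numerics` (the class and its members are illustrative, not Unity's implementation): vertices are transformed to world space once, and at run-time only an index buffer for the visible objects is assembled. It also makes the memory cost visible: the batch keeps its own world-space copy of every vertex.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical static batch, illustrating the idea rather than Unity's code.
public sealed class StaticBatchSketch
{
    // Build time: every mesh is transformed to world space once and appended
    // to a single shared vertex buffer (an extra copy of the geometry).
    public readonly List<Vector3> WorldVertices = new List<Vector3>();

    // Per object: its triangle indices, already offset into WorldVertices.
    private readonly List<int[]> _objectIndices = new List<int[]>();

    // Adds one static object and returns its id within the batch.
    public int AddStaticObject(Vector3[] localVertices, int[] localIndices, Matrix4x4 localToWorld)
    {
        int baseVertex = WorldVertices.Count;
        foreach (Vector3 v in localVertices)
        {
            WorldVertices.Add(Vector3.Transform(v, localToWorld));
        }

        var indices = new int[localIndices.Length];
        for (int i = 0; i < localIndices.Length; i++)
        {
            indices[i] = localIndices[i] + baseVertex;
        }
        _objectIndices.Add(indices);
        return _objectIndices.Count - 1;
    }

    // Run time: only the index buffer is rebuilt, from the visible objects.
    public List<int> BuildIndexBuffer(bool[] visible)
    {
        var indexBuffer = new List<int>();
        for (int id = 0; id < _objectIndices.Count; id++)
        {
            if (visible[id])
            {
                indexBuffer.AddRange(_objectIndices[id]);
            }
        }
        return indexBuffer; // drawn against the shared vertex buffer
    }
}
```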
Dynamic Batching
What is it?
•  Similar to Static Batching, but it batches non-static objects at run-time
How do I enable it?
•  In the Player Settings
•  No need to tag objects: it works automatically
Dynamic Batching
How does it work internally?
•  Objects are transformed to world space on the CPU
•  Temporary VB & IB (vertex and index buffers) are created
•  Rendered in one draw call
Unity 5.x: we are considering exposing per-platform parameters
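For comparison with static batching, a hypothetical per-frame version in plain C# with `System.Numerics` (illustrative only, not Unity's implementation): every object's vertices are transformed on the CPU into temporary vertex and index buffers, with indices re-based so everything can be submitted in one draw call.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical per-frame batcher, illustrating the idea rather than Unity's code.
public static class DynamicBatchSketch
{
    public readonly struct BatchInput
    {
        public readonly Vector3[] Vertices;
        public readonly int[] Indices;
        public readonly Matrix4x4 LocalToWorld;

        public BatchInput(Vector3[] vertices, int[] indices, Matrix4x4 localToWorld)
        {
            Vertices = vertices;
            Indices = indices;
            LocalToWorld = localToWorld;
        }
    }

    // Called every frame: vertices are transformed on the CPU into temporary
    // vertex/index buffers; indices are re-based so one draw call covers all objects.
    public static (List<Vector3> Vertices, List<int> Indices) Build(IReadOnlyList<BatchInput> objects)
    {
        var vertexBuffer = new List<Vector3>();
        var indexBuffer = new List<int>();
        foreach (BatchInput obj in objects)
        {
            int baseVertex = vertexBuffer.Count;
            foreach (Vector3 v in obj.Vertices)
            {
                vertexBuffer.Add(Vector3.Transform(v, obj.LocalToWorld));
            }
            foreach (int index in obj.Indices)
            {
                indexBuffer.Add(index + baseVertex);
            }
        }
        return (vertexBuffer, indexBuffer);
    }
}
```

The trade-off compared with static batching: no persistent world-space copy is kept, but the transform cost is paid on the CPU every frame.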
Mesh Skinning
Different implementations depending on the platform:
•  x86: SSE
•  iOS/Android/WP8: Neon optimizations
•  D3D11/XBoxOne/GLES3.0: GPU
•  XBox360, WiiU: GPU (memexport)
•  PS3: SPU
•  WiiU: GPU w/ stream out
Unity 5.0: Skinned meshes use less memory by sharing
index buffers between instances
Scripting
Unity 5.0: Mono
•  No upgrade
•  Mainly bug fixes
•  New tech in WebGL: IL2CPP
•  http://blogs.unity3d.com/2014/04/29/on-the-future-of-web-publishing-in-unity/
•  Stay tuned: there will be a blog post about it
GetComponent<T>
It asks the GameObject for a component of the specified type:
•  The GO contains a list of Components
•  Each Component type is compared to T
•  The first Component of type T (or derived from T) is returned to the caller
•  Not much overhead, but it still needs to call into native code
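The lookup can be sketched as a linear scan in plain C#. `GameObjectSketch` and the component classes are hypothetical stand-ins (in Unity the real search happens in native code): the first component that is a `T`, or derives from `T`, wins.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a GameObject: components kept in a list.
public sealed class GameObjectSketch
{
    private readonly List<object> _components = new List<object>();

    public void AddComponent(object component)
    {
        _components.Add(component);
    }

    // Linear search: returns the first component that is a T (or derives
    // from T), or null when there is none.
    public T GetComponent<T>() where T : class
    {
        foreach (object component in _components)
        {
            if (component is T match)
            {
                return match;
            }
        }
        return null;
    }
}

// Hypothetical component types for the example.
public class ColliderSketch { }
public class BoxColliderSketch : ColliderSketch { }
public class LightSketch { }
```

Because each call repeats the search, a common pattern is to call `GetComponent<T>()` once (for example in `Awake`) and keep the result in a field.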
Unity 5.0: Property Accessors
•  Most accessors will be removed in Unity 5.0
•  The objective is to reduce dependencies and therefore improve modularization
•  Transform will remain
•  Existing scripts will be converted. Example:
in 5.0:
Transform Component
•  this.transform is the same as GetComponent<Transform>()
•  transform.position/rotation needs to:
•  find Transform component
•  Traverse hierarchy to calculate absolute position
•  Apply translation/rotation
•  transform internally stores the position relative to the parent
•  transform.localPosition = new Vector3(…) → simple assignment
•  transform.position = new Vector3(…) → costs the same if there is no parent; otherwise it needs to traverse the hierarchy upwards to convert the absolute position into a local one
•  Finally, other components (collider, rigid body, light, camera, etc.) are notified via messages
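A hypothetical, translation-only sketch in plain C# of why assigning `position` can cost more than assigning `localPosition`. `TransformSketch` is illustrative: real transforms also handle rotation and scale and notify other components. Only the local value is stored, so converting to or from a world position requires walking up the parent chain.

```csharp
using System.Numerics;

// Hypothetical transform node supporting translation only.
public sealed class TransformSketch
{
    public TransformSketch Parent;
    public Vector3 LocalPosition; // what is actually stored, relative to the parent

    public Vector3 Position
    {
        // Reading the world position walks up the hierarchy to the root.
        get
        {
            Vector3 world = LocalPosition;
            for (TransformSketch p = Parent; p != null; p = p.Parent)
            {
                world += p.LocalPosition;
            }
            return world;
        }
        // No parent: a plain assignment, same cost as LocalPosition.
        // With a parent: the parent's world position must be computed first
        // to convert the absolute value into a local one.
        set
        {
            LocalPosition = Parent == null ? value : value - Parent.Position;
        }
    }
}
```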
Instantiate
API:
•  Object Instantiate(Object, Vector3, Quaternion);
•  Object Instantiate(Object);
Implementation:
•  Clone GameObject Hierarchy and Components
•  Copy Properties
•  Awake
•  Apply new Transform (if provided)
Instantiate (cont.)
•  Awake can be expensive
•  AwakeFromLoad (main thread)
•  clear states
•  internal state caching
•  pre-compute
Unity 5.0:
•  Allocations have been reduced
•  Some inner loops for copying the data have been
optimized
JIT Compilation
What is it?
•  The process in which machine code is generated from CIL
code during the application's run-time
Pros:
•  It generates optimized code for the current platform
Cons:
•  Each time a method is called for the first time, the
application will suffer a certain performance penalty because
of the compilation
JIT compilation spikes
What about pre-JITting?
•  RuntimeHelpers.PrepareMethod does not work; it is better to use MethodHandle.GetFunctionPointer()
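A sketch of the pre-JIT approach in plain C#, assuming (as the slide states) that requesting a method's function pointer forces its compilation on the Mono runtime; on other runtimes `GetFunctionPointer()` may return a stub without compiling the method. `PreJit.PrepareAllMethods` is a hypothetical helper, intended to run during loading rather than mid-gameplay.

```csharp
using System;
using System.Reflection;

public static class PreJit
{
    // Requests a native function pointer for every concrete method declared
    // on 'type' and returns how many methods were processed.
    public static int PrepareAllMethods(Type type)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic |
                                   BindingFlags.Instance | BindingFlags.Static |
                                   BindingFlags.DeclaredOnly;
        int count = 0;
        foreach (MethodInfo method in type.GetMethods(flags))
        {
            // Abstract methods have no body; open generic methods cannot be
            // compiled without concrete type arguments.
            if (method.IsAbstract || method.ContainsGenericParameters)
            {
                continue;
            }
            method.MethodHandle.GetFunctionPointer();
            count++;
        }
        return count;
    }
}
```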
Job System
Unity 5.0: Job System (internal)
The goals of the job system:
•  Make it easy to write very efficient job-based multithreaded code
•  The jobs should be able to run safely in parallel with script code
Job System: Why?
Modern architectures are multi-core:
•  XBox 360: 3 cores
•  PS4/Xbox One: 8 cores
…which includes mobile devices:
•  iPhone 4S: 2 cores
•  Galaxy S3: 4 cores
Job System: What is it?
•  It’s a framework that we are going to use in existing and new sub-systems
•  We want to have Animation, NavMesh, Occlusion,
Rendering, etc… run as much as possible in
parallel
•  This will ultimately lead to better performance
Unity 5.0: Profiler Timeline View
It lets you analyse the execution of internal (native) threads within a specific frame
Unity 5.0: Frame Debugger
Conclusions
Budgeting Memory
How much memory is available?
•  It depends…
•  For example, on 512 MB devices running iOS 6.0: ~250 MB. A bit less with iOS 7.0
What’s the baseline?
•  Create an empty scene and measure memory
•  Don’t forget that the profiler requires some memory
•  For example: on Android 15.5 MB (+ 12 MB for the profiler)
Profiling
•  Don’t make assumptions
•  Profile on target device
•  Editor != Player
•  Platform X != Platform Y
•  Managed Memory is not returned to Native Land!
For best results:
•  Profile early and regularly
Questions?
marcot@unity3d.com - Twitter: @m_trive
