Scope Stack Allocation
Andreas Fredriksson, DICE <dep@dice.se>
Contents
  • What are Scope Stacks?
  • Background: embedded systems
  • Linear memory allocation
  • Scope Stacks
  • Bits and pieces
What are Scope Stacks?
  • A memory management tool
  • Linear memory layout of arbitrary object hierarchies
  • Support C++ object life cycle if desired
  • Destructors called in correct dependency order
  • Super-duper oiled-up fast!
  • Makes debugging easier
Background
  • Why is all this relevant?
  • Console games are embedded systems
  • Fixed (small) amount of memory
  • Can't run out of memory or you can't ship: fragmentation is a serious issue
  • Heap allocation is very expensive
      • Lots of code for complex heap manager
      • Bad cache locality
      • Allocation & deallocation speed
Embedded systems
  • With a global heap, memory fragments
  • Assume 10-15% wasted with a good allocator
  • Can easily get 25% wasted with a poor allocator
  • Caused by allocating objects with mixed lifetimes next to each other
  • Temporary stuff allocated next to semi-permanent stuff (system arrays etc.)
Heap Fragmentation
  • Each alloc has to traverse the free list of this structure!
  • Assuming “best fit” allocator for less fragmentation
  • Will likely cache miss for each probed location
  • Large blocks disappear quickly
Memory Map
  • Ideally we would like a fully deterministic memory map
  • Popular approach on console
  • Partition all memory up front
  • Load new level: rewind level part only
  • Reconfigure systems: rewind both level and systems
  • Fragmentation not possible
Linear Allocation
  • Many games use linear allocators to achieve this kind of memory map
  • Linear allocators basically sit on a pointer
  • Allocations just increment the pointer
  • To rewind, reset the pointer
  • Very fast, but only suitable for POD data
  • No finalizers/destructors called
  • Used in Frostbite's renderer for command buffers
Linear Allocator Implementation
  • Simplified C++ example
  • Real implementation needs checks, alignment
  • In retail build, allocation will be just a few cycles

    class LinearAllocator {
    public:
        // ...
        u8 *allocate(size_t size) {
            u8 *result = m_ptr;  // start of the new block
            m_ptr += size;       // bump the allocation point past it
            return result;
        }
        void rewind(u8 *ptr) {
            m_ptr = ptr;
        }
        // ...
    private:
        u8 *m_ptr;
    };
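The slide notes that a real implementation needs checks and alignment. The following is a minimal sketch of what that might look like, assuming a caller-supplied buffer. The name CheckedLinearAllocator, the mark() and used() helpers, and the choice to return nullptr on exhaustion are illustrative assumptions, not the Frostbite implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uint8_t u8;

// Hypothetical bounds-checked, alignment-aware variant of the slide's allocator.
class CheckedLinearAllocator {
public:
    CheckedLinearAllocator(u8 *buffer, std::size_t size)
        : m_begin(buffer), m_ptr(buffer), m_end(buffer + size) {}

    // Returns the start of a block of `size` bytes aligned to `align`
    // (which must be a power of two), or nullptr if the buffer is exhausted.
    u8 *allocate(std::size_t size,
                 std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(m_ptr);
        // Bytes to skip so that the returned address is a multiple of align.
        std::size_t padding = (align - (addr & (align - 1))) & (align - 1);
        std::size_t remaining = static_cast<std::size_t>(m_end - m_ptr);
        if (padding > remaining || size > remaining - padding)
            return nullptr;
        u8 *result = m_ptr + padding;
        m_ptr = result + size;
        return result;
    }

    // Current allocation point; pass it to rewind() later.
    u8 *mark() const { return m_ptr; }

    void rewind(u8 *ptr) {
        assert(ptr >= m_begin && ptr <= m_ptr);
        m_ptr = ptr;
    }

    std::size_t used() const {
        return static_cast<std::size_t>(m_ptr - m_begin);
    }

private:
    u8 *m_begin;
    u8 *m_ptr;
    u8 *m_end;
};
```

In a retail build the asserts compile away, leaving an add, a mask and a compare on the allocation path.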
Using Linear Allocation
  • We're implementing FrogSystem
  • A new system tied to the level
  • Randomly place frogs across the level as the player is moving around
  • Clearly the Next Big Thing
  • Design for linear allocation
  • Grab all memory up front
  [Image: Mr FISK (c) FLT. Used with permission.]
FrogSystem - Linear Allocation
  • Simplified C++ example

    struct FrogInfo { ... };

    struct FrogSystem {
        // ...
        int maxFrogs;
        FrogInfo *frogPool;
    };

    FrogSystem* FrogSystem_init(LinearAllocator& alloc) {
        // allocate() returns u8*, so cast to the target type.
        FrogSystem *self = reinterpret_cast<FrogSystem*>(alloc.allocate(sizeof(FrogSystem)));
        self->maxFrogs = ...;
        self->frogPool = reinterpret_cast<FrogInfo*>(alloc.allocate(sizeof(FrogInfo) * self->maxFrogs));
        return self;
    }

    void FrogSystem_update(FrogSystem *system) {
        // ...
    }
Resulting Memory Layout
  [Diagram labels: FrogSystem, Frog Pool, allocation point. Legend: POD data.]
Linear allocation limitations
  • Works well until we need resource cleanup
      • File handles, sockets, ...
      • Pool handles, other API resources
      • This is the “systems programming” aspect
  • Assume frog system needs a critical section
      • Kernel object
      • Must be released when no longer used
FrogSystem - Adding a lock

    class FrogSystem {
        CriticalSection *m_lock;

    public:
        FrogSystem(LinearAllocator& a)
        // get memory
        :   m_lock((CriticalSection*) a.allocate(sizeof(CriticalSection)))
        // ...
        {
            new (m_lock) CriticalSection; // construct object
        }

        ~FrogSystem() {
            m_lock->~CriticalSection(); // destroy object
        }
    };

    FrogSystem* FrogSystem_init(LinearAllocator& a) {
        return new (a.allocate(sizeof(FrogSystem))) FrogSystem(a);
    }

    void FrogSystem_cleanup(FrogSystem *system) {
        system->~FrogSystem();
    }
Resulting Memory Layout
  [Diagram labels: FrogSystem, Frog Pool, CriticalSect, allocation point. Legend: POD data, object with cleanup.]
Linear allocation limitations
  • Code quickly drowns in low-level details
  • Lots of boilerplate
  • We must add a cleanup function
  • Manually remember what resources to free
  • Error prone
  • In C++, we would rather rely on destructors
Scope Stacks
  • Introducing Scope Stacks
  • Sits on top of linear allocator
  • Rewinds part of underlying allocator when destroyed
  • Designed to make larger-scale system design with linear allocation possible
  • Maintain a list of finalizers to run when rewinding
  • Only worry about allocation, not cleanup
Scope Stacks, contd.
  • Type itself is a lightweight construct

    struct Finalizer {
        void (*fn)(void *ptr);
        Finalizer *chain;
    };

    class ScopeStack {
        LinearAllocator& m_alloc;
        void *m_rewindPoint;
        Finalizer *m_finalizerChain;

    public:
        explicit ScopeStack(LinearAllocator& a);
        ~ScopeStack(); // unwind

        template <typename T> T* newObject();
        template <typename T> T* newPod();
    };
Scope Stacks, contd.
  • Can create a stack of scopes on top of a single linear allocator
  • Only allocate from topmost scope
  • Can rewind scopes as desired
      • For example init/systems/level
      • Finer-grained control over nested lifetimes
  • Can also follow call stack
      • Very elegant per-thread scratch pad
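The nesting described above can be sketched with a rewind-only scope guard. This is an illustration under stated assumptions: RewindScope, loadAndUnloadLevel and the byte counts are hypothetical, alignment handling is left out, and finalizers are omitted so that only the stacking behaviour is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uint8_t u8;

// Minimal linear allocator (no alignment handling, for brevity).
struct LinearAllocator {
    u8 *m_begin;
    u8 *m_ptr;
    u8 *m_end;

    LinearAllocator(u8 *buffer, std::size_t size)
        : m_begin(buffer), m_ptr(buffer), m_end(buffer + size) {}

    u8 *allocate(std::size_t size) {
        assert(size <= static_cast<std::size_t>(m_end - m_ptr));
        u8 *result = m_ptr;
        m_ptr += size;
        return result;
    }
    void rewind(u8 *ptr) { m_ptr = ptr; }
    std::size_t used() const {
        return static_cast<std::size_t>(m_ptr - m_begin);
    }
};

// Hypothetical rewind-only scope: remembers the allocation point on entry
// and restores it on exit.
class RewindScope {
public:
    explicit RewindScope(LinearAllocator &a)
        : m_alloc(a), m_rewindPoint(a.m_ptr) {}
    ~RewindScope() { m_alloc.rewind(m_rewindPoint); }

    RewindScope(const RewindScope &) = delete;
    RewindScope &operator=(const RewindScope &) = delete;

    u8 *alloc(std::size_t size) { return m_alloc.allocate(size); }

private:
    LinearAllocator &m_alloc;
    u8 *m_rewindPoint;
};

// Opens a systems scope, then a nested level scope, and reports how many
// bytes are still in use once the level scope has been unwound.
std::size_t loadAndUnloadLevel(LinearAllocator &a) {
    RewindScope systems(a);
    systems.alloc(128);      // semi-permanent system arrays
    {
        RewindScope level(a);
        level.alloc(512);    // level data, allocated from the topmost scope
        level.alloc(64);
    }                        // level scope rewinds here
    return a.used();         // evaluated before `systems` unwinds
}
```

Because each scope only restores the pointer it saw on entry, scopes must be destroyed in the reverse order of creation, which C++ block scoping enforces automatically.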
Scope Stack Diagram
  [Diagram labels: Linear Allocator, Scope, Scope, Active Scope.]
Scope Stack API
  • Simple C++ interface
      • scope.newObject<T>(...) - allocate object with cleanup (stores finalizer)
      • scope.newPod<T>(...) - allocate object without cleanup
      • scope.alloc(...) - raw memory allocation
  • Can also implement as C interface
  • Similar ideas in APR (Apache Portable Runtime)
Scope Stack Implementation
  • newObject<T>()

    template <typename T>
    void destructorCall(void *ptr) {
        static_cast<T*>(ptr)->~T();
    }

    template <typename T>
    T* ScopeStack::newObject() {
        // Allocate memory for finalizer + object.
        Finalizer* f = allocWithFinalizer(sizeof(T));

        // Placement construct object in space after finalizer.
        T* result = new (objectFromFinalizer(f)) T;

        // Link this finalizer onto the chain.
        f->fn = &destructorCall<T>;
        f->chain = m_finalizerChain;
        m_finalizerChain = f;
        return result;
    }
FrogSystem - Scope Stacks
  • Critical Section example with Scope Stack

    class FrogSystem {
        // ...
        CriticalSection *m_lock;

    public:
        FrogSystem(ScopeStack& scope)
        :   m_lock(scope.newObject<CriticalSection>())
        // ...
        {}

        // no destructor needed!
    };

    FrogSystem* FrogSystem_init(ScopeStack& scope) {
        // Pass the scope through so the constructor can allocate the lock.
        return scope.newPod<FrogSystem>(scope);
    }
Memory Layout (with context)
  [Diagram labels: FrogSystem, Frog Pool, CriticalSect, (other stuff) ..., allocation point, scope. Legend: POD data, object with cleanup, finalizer record.]
Scope Cleanup
  • With finalizer chain in place we can unwind without manual code
  • Iterate linked list
  • Call finalizer for objects that require cleanup
  • POD data still zero overhead
  • Finalizer for C++ objects => destructor call
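The unwinding step can be sketched as follows. This is a minimal illustration rather than the talk's actual implementation: the bodies of allocWithFinalizer and objectFromFinalizer, the kAlign rounding policy, and the A/B demo types with their global log are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>

typedef std::uint8_t u8;

// Assumption: every block is rounded up to this alignment, the buffer itself
// is aligned to it, and no type used with the scope needs more.
static const std::size_t kAlign = alignof(std::max_align_t);
static std::size_t alignUp(std::size_t n) {
    return (n + kAlign - 1) & ~(kAlign - 1);
}

struct LinearAllocator {
    u8 *m_ptr;
    u8 *m_end;
    LinearAllocator(u8 *buffer, std::size_t size)
        : m_ptr(buffer), m_end(buffer + size) {}
    u8 *allocate(std::size_t size) {
        size = alignUp(size);
        assert(size <= static_cast<std::size_t>(m_end - m_ptr));
        u8 *result = m_ptr;
        m_ptr += size;
        return result;
    }
    void rewind(u8 *ptr) { m_ptr = ptr; }
};

struct Finalizer {
    void (*fn)(void *ptr);
    Finalizer *chain;
};

template <typename T>
void destructorCall(void *ptr) { static_cast<T *>(ptr)->~T(); }

class ScopeStack {
public:
    explicit ScopeStack(LinearAllocator &a)
        : m_alloc(a), m_rewindPoint(a.m_ptr), m_finalizerChain(nullptr) {}
    ScopeStack(const ScopeStack &) = delete;
    ScopeStack &operator=(const ScopeStack &) = delete;

    ~ScopeStack() {
        // The newest finalizer is at the head of the chain, so objects are
        // destroyed in reverse order of construction.
        for (Finalizer *f = m_finalizerChain; f; f = f->chain)
            f->fn(objectFromFinalizer(f));
        m_alloc.rewind(m_rewindPoint);
    }

    template <typename T>
    T *newObject() {
        Finalizer *f = allocWithFinalizer(sizeof(T));
        T *result = new (objectFromFinalizer(f)) T;
        f->fn = &destructorCall<T>;
        f->chain = m_finalizerChain;
        m_finalizerChain = f;
        return result;
    }

private:
    // One block holds the finalizer record followed by the object.
    Finalizer *allocWithFinalizer(std::size_t size) {
        void *mem = m_alloc.allocate(alignUp(sizeof(Finalizer)) + size);
        return new (mem) Finalizer;
    }
    static void *objectFromFinalizer(Finalizer *f) {
        return reinterpret_cast<u8 *>(f) + alignUp(sizeof(Finalizer));
    }

    LinearAllocator &m_alloc;
    u8 *m_rewindPoint;
    Finalizer *m_finalizerChain;
};

// Demo types (hypothetical): each appends a letter to a log when destroyed.
static std::string g_log;
struct A { ~A() { g_log += 'A'; } };
struct B { ~B() { g_log += 'B'; } };

std::string unwindOrderDemo() {
    alignas(std::max_align_t) u8 buffer[256];
    LinearAllocator alloc(buffer, sizeof(buffer));
    g_log.clear();
    {
        ScopeStack scope(alloc);
        scope.newObject<A>();
        scope.newObject<B>();
    }   // scope unwinds: B is destroyed first, then A
    return g_log;
}
```

Pushing each new finalizer onto the head of the list is what gives the correct dependency order: an object constructed later, which may refer to earlier ones, is always torn down first.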
Per-thread allocation
  • Scratch pad = Thread-local linear allocator
  • Construct nested scopes on this allocator
  • Utility functions can lay out arbitrary objects on scratch pad scope

    class File; // next slide

    const char *formatString(ScopeStack& scope, const char *fmt, ...);

    void myFunction(const char *fn) {
        ScopeStack scratch(tls_allocator);
        const char *filename = formatString(scratch, "foo/bar/%s", fn);
        File *file = scratch.newObject<File>(scratch, filename);

        file->read(...);

        // No cleanup required!
    }
Per-thread allocation, contd.
  • File object allocates buffer from designated scope
  • Doesn't care about lifetime: its buffer and itself will live for exactly the same time
  • Can live on scratch pad without knowing it

    class File {
    private:
        u8 *m_buffer;
        int m_handle;
    public:
        File(ScopeStack& scope, const char *filename)
        :   m_buffer(static_cast<u8*>(scope.alloc(8192)))
        ,   m_handle(open(filename, O_RDONLY))
        {}

        ~File() {
            close(m_handle);
        }
    };
Memory Layout: Scratch Pad
  [Diagram labels: File Buffer, Filename, File, allocation point, old allocation point, rewind point, scope, parent scope. Legend: POD data, object with cleanup, finalizer record.]
PIMPL
  • C++ addicts can enjoy free PIMPL idiom
  • Because allocations are essentially “free”, the PIMPL idiom becomes more attractive
  • Can slim down headers and hide all data members without concern for performance
Limitations
  • Must set upper bound on all pool sizes
      • Can never grow an allocation
      • This design style is classical in games industry
      • But pool sizes can vary between levels!
      • Reconfigure after rewind
  • By default API not thread safe
      • Makes sense as this is more like layout than allocation
      • Pools/other structures can still be made thread safe once memory is allocated
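A pool with a fixed upper bound might look like the sketch below. It is an assumption-laden illustration: FixedPool, its method names and the FrogInfo fields are hypothetical, and the caller is assumed to hand in suitably aligned memory obtained up front (for example from a scope's raw allocation call). It is not thread safe as written.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-capacity pool carved out of memory grabbed up front.
template <typename T>
class FixedPool {
    // A slot is either a link in the free list or storage for a live object.
    union Slot {
        Slot *next;
        alignas(T) unsigned char storage[sizeof(T)];
    };

public:
    // Bytes the caller must provide for `capacity` objects.
    static std::size_t bytesNeeded(std::size_t capacity) {
        return sizeof(Slot) * capacity;
    }

    // `memory` must be at least bytesNeeded(capacity) bytes and aligned
    // for Slot. The pool never grows beyond `capacity`.
    FixedPool(void *memory, std::size_t capacity) : m_free(nullptr) {
        Slot *slots = static_cast<Slot *>(memory);
        // Thread the slots back to front so slot 0 is handed out first.
        for (std::size_t i = capacity; i > 0; --i) {
            slots[i - 1].next = m_free;
            m_free = &slots[i - 1];
        }
    }

    // Returns a default-constructed object, or nullptr at the upper bound.
    T *acquire() {
        if (!m_free)
            return nullptr;
        Slot *s = m_free;
        m_free = s->next;
        return new (s->storage) T();
    }

    // Destroys the object and returns its slot to the free list.
    void release(T *obj) {
        obj->~T();
        Slot *s = reinterpret_cast<Slot *>(obj);
        s->next = m_free;
        m_free = s;
    }

private:
    Slot *m_free;
};

// Stand-in for the talk's FrogInfo; the fields are made up for this example.
struct FrogInfo {
    int x;
    int y;
};
```

Since the pool's memory is fixed once allocated, a lock or lock-free free list could be layered on top without touching the allocator itself.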
Limitations, contd.
  • Doesn't map 100% to C++ object modeling
      • But finalizers enable RAII-like cleanup
  • Many traditional C++ tricks become obsolete
      • Pointer swapping
      • Reference counting
  • Must always think about lifetime and ownership when allocating
      • Lifetime determined on global level
      • Can't hold on to pointers: unwind = apocalypse
      • Manage on higher level instead
Conclusion
  • Scope stacks are a system programming tool
  • Suitable for embedded systems with few variable parameters, such as games
  • Requires planning and commitment
  • Pays dividends in speed and simplicity
  • Same pointers every time, so debugging is easier
  • Out-of-memory analysis usually very simple
      • Either the level runs, or it doesn't run
      • Can never fail halfway through
Links
  • Toy implementation of scope stacks (for playing with, not industrial strength)
    http://pastebin.com/h7nU8JE2
  • “Start Pre-allocating And Stop Worrying” - Noel Llopis
    http://gamesfromwithin.com/start-pre-allocating-and-stop-worrying
  • Apache Portable Runtime
    http://apr.apache.org/
Questions
Bonus: what about...
  • ...building an array, final size unknown?
      • Standard approach in C++: STL vector push
      • Instead build linked list/deque on scratch pad
      • Allocate array in target scope stack once size is known
  • ...dynamic lifetime of individual objects?
      • Allocate object pool from scope stack
      • Requires bounding the worst case - a good idea for games anyway
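The array-building approach can be sketched like this, assuming the scratch pad is a separate linear allocator from the one backing the target scope. PodScope, Node, collectEvens and the even-number filter are hypothetical names chosen for the example; only POD data is involved, so no finalizers are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

typedef std::uint8_t u8;

// Assumption: buffers are aligned to kAlign and sizes are rounded up to it.
static const std::size_t kAlign = alignof(std::max_align_t);

struct LinearAllocator {
    u8 *m_ptr;
    u8 *m_end;
    LinearAllocator(u8 *buffer, std::size_t size)
        : m_ptr(buffer), m_end(buffer + size) {}
    u8 *allocate(std::size_t size) {
        size = (size + kAlign - 1) & ~(kAlign - 1);
        assert(size <= static_cast<std::size_t>(m_end - m_ptr));
        u8 *result = m_ptr;
        m_ptr += size;
        return result;
    }
    void rewind(u8 *ptr) { m_ptr = ptr; }
};

// Rewind-only scope for POD data.
class PodScope {
public:
    explicit PodScope(LinearAllocator &a)
        : m_alloc(a), m_rewindPoint(a.m_ptr) {}
    ~PodScope() { m_alloc.rewind(m_rewindPoint); }
    PodScope(const PodScope &) = delete;
    PodScope &operator=(const PodScope &) = delete;

    template <typename T>
    T *newPod() { return new (m_alloc.allocate(sizeof(T))) T(); }

    template <typename T>
    T *newPodArray(std::size_t count) {
        T *p = reinterpret_cast<T *>(m_alloc.allocate(sizeof(T) * count));
        for (std::size_t i = 0; i < count; ++i)
            new (p + i) T();
        return p;
    }

private:
    LinearAllocator &m_alloc;
    u8 *m_rewindPoint;
};

struct Node {
    int value;
    Node *next;
};

// Collects the even numbers below `limit` (count not known up front) in a
// linked list on the scratch pad, then copies them into an exactly sized
// array in `target`. The scratch memory is rewound on return.
int *collectEvens(PodScope &target, LinearAllocator &scratchAlloc,
                  int limit, std::size_t *outCount) {
    PodScope scratch(scratchAlloc);
    Node *head = nullptr;
    Node **tail = &head;
    std::size_t count = 0;
    for (int i = 0; i < limit; ++i) {
        if (i % 2 != 0)
            continue;
        Node *n = scratch.newPod<Node>();
        n->value = i;
        *tail = n;          // append to keep insertion order
        tail = &n->next;
        ++count;
    }
    int *result = target.newPodArray<int>(count);
    std::size_t k = 0;
    for (Node *n = head; n; n = n->next)
        result[k++] = n->value;
    *outCount = count;
    return result;
}
```

The temporary list costs nothing after the function returns, and the target scope ends up holding only the final array.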
