Scope Stack Allocation

Andreas Fredriksson, DICE <dep@dice.se>

Contents
- What are Scope Stacks?
- Background: embedded systems
- Linear memory allocation
- Scope Stacks
- Bits and pieces

What are Scope Stacks?
- A memory management tool
- Linear memory layout of arbitrary object hierarchies
- Support C++ object life cycle if desired
- Destructors called in correct dependency order
- Super-duper oiled-up fast!
- Makes debugging easier

Background
Why is all this relevant?
- Console games are embedded systems
  - Fixed (small) amount of memory
  - Can't run out of memory or you can't ship; fragmentation is a serious issue
- Heap allocation is very expensive
  - Lots of code for complex heap manager
  - Bad cache locality
  - Allocation & deallocation speed

Embedded systems
- With a global heap, memory fragments
- Assume 10-15% wasted with good allocator
- Can easily get 25% wasted with poor allocator
- Caused by allocating objects with mixed lifetimes next to each other
- Temporary stuff allocated next to semi-permanent stuff (system arrays etc.)

Heap Fragmentation
- Each alloc has to traverse the free list of this structure!
- Assuming "best fit" allocator for less fragmentation
- Will likely cache miss for each probed location
- Large blocks disappear quickly

Memory Map
- Ideally we would like a fully deterministic memory map
- Popular approach on console
  - Partition all memory up front
- Load new level
  - Rewind level part only
- Reconfigure systems
  - Rewind both level and systems
- Fragmentation not possible

Linear Allocation
- Many games use linear allocators to achieve this kind of memory map
- Linear allocators basically sit on a pointer
  - Allocations just increment the pointer
  - To rewind, reset the pointer
- Very fast, but only suitable for POD data
  - No finalizers/destructors called
- Used in Frostbite's renderer for command buffers

Linear Allocator Implementation
- Simplified C++ example
- Real implementation needs checks, alignment
- In retail build, allocation will be just a few cycles

    class LinearAllocator {
    public:
        // ...
        u8 *allocate(size_t size) {
            u8 *result = m_ptr;  // start of the new block
            m_ptr += size;       // bump the pointer past it
            return result;
        }
        void rewind(u8 *ptr) {
            m_ptr = ptr;
        }
        // ...
    private:
        u8 *m_ptr;
    };
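The slide notes that a real implementation needs checks and alignment. A minimal sketch of what that could look like follows. The constructor, the `align` parameter, `mark()` and `used()` are assumptions made for illustration; they are not taken from the talk or from Frostbite.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: a bump allocator over a caller-supplied buffer,
// with alignment padding and an out-of-memory check.
class LinearAllocator {
public:
    LinearAllocator(void* buffer, std::size_t size)
        : m_begin(static_cast<std::uint8_t*>(buffer)),
          m_ptr(m_begin),
          m_end(m_begin + size) {}

    // Returns the start of a block of `size` bytes aligned to `align`
    // (which must be a power of two), or nullptr if the buffer is exhausted.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::uintptr_t cur = reinterpret_cast<std::uintptr_t>(m_ptr);
        std::size_t padding = (align - cur % align) % align;
        std::size_t remaining = static_cast<std::size_t>(m_end - m_ptr);
        if (padding > remaining || size > remaining - padding) {
            return nullptr;
        }
        std::uint8_t* result = m_ptr + padding;
        m_ptr = result + size;
        return result;
    }

    // Current allocation point; pass it to rewind() later.
    std::uint8_t* mark() const { return m_ptr; }

    // Resets the allocation point. No destructors are run.
    void rewind(std::uint8_t* ptr) {
        assert(ptr >= m_begin && ptr <= m_ptr);
        m_ptr = ptr;
    }

    std::size_t used() const {
        return static_cast<std::size_t>(m_ptr - m_begin);
    }

private:
    std::uint8_t* m_begin;
    std::uint8_t* m_ptr;
    std::uint8_t* m_end;
};
```

In a retail build the asserts compile away, leaving the padding computation, one bounds check, and the pointer bump.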

Using Linear Allocation
- We're implementing FrogSystem
  - A new system tied to the level
  - Randomly place frogs across the level as the player is moving around
  - Clearly the Next Big Thing
- Design for linear allocation
  - Grab all memory up front
(Image: Mr FISK (c) FLT, used with permission)

FrogSystem - Linear Allocation
- Simplified C++ example

    struct FrogInfo { ... };

    struct FrogSystem {
        // ...
        int maxFrogs;
        FrogInfo *frogPool;
    };

    FrogSystem* FrogSystem_init(LinearAllocator& alloc) {
        // allocate() returns raw bytes, so cast to the target type
        FrogSystem *self = (FrogSystem*) alloc.allocate(sizeof(FrogSystem));
        self->maxFrogs = ...;
        self->frogPool = (FrogInfo*) alloc.allocate(sizeof(FrogInfo) * self->maxFrogs);
        return self;
    }

    void FrogSystem_update(FrogSystem *system) {
        // ...
    }

Resulting Memory Layout
[Diagram labels: FrogSystem, Frog Pool, Allocation Point. Legend: POD data.]

Linear allocation limitations
- Works well until we need resource cleanup
  - File handles, sockets, ...
  - Pool handles, other API resources
  - This is the "systems programming" aspect
- Assume frog system needs a critical section
  - Kernel object
  - Must be released when no longer used

FrogSystem - Adding a lock

    class FrogSystem {
        CriticalSection *m_lock;

    public:
        FrogSystem(LinearAllocator& a)
            // get memory
            : m_lock((CriticalSection*) a.allocate(sizeof(CriticalSection)))
            // ...
        {
            new (m_lock) CriticalSection; // construct object
        }

        ~FrogSystem() {
            m_lock->~CriticalSection(); // destroy object
        }
    };

    FrogSystem* FrogSystem_init(LinearAllocator& a) {
        return new (a.allocate(sizeof(FrogSystem))) FrogSystem(a);
    }

    void FrogSystem_cleanup(FrogSystem *system) {
        system->~FrogSystem();
    }

Resulting Memory Layout
[Diagram labels: FrogSystem, Frog Pool, CriticalSect, Allocation Point. Legend: POD data; object with cleanup.]

Linear allocation limitations
- Code quickly drowns in low-level details
  - Lots of boilerplate
- We must add a cleanup function
  - Manually remember what resources to free
  - Error prone
- In C++, we would rather rely on destructors

Scope Stacks
- Introducing Scope Stacks
  - Sits on top of linear allocator
  - Rewinds part of underlying allocator when destroyed
- Designed to make larger-scale system design with linear allocation possible
  - Maintain a list of finalizers to run when rewinding
  - Only worry about allocation, not cleanup

Scope Stacks, contd.
- Type itself is a lightweight construct

    struct Finalizer {
        void (*fn)(void *ptr);
        Finalizer *chain;
    };

    class ScopeStack {
        LinearAllocator& m_alloc;
        void *m_rewindPoint;
        Finalizer *m_finalizerChain;

    public:
        explicit ScopeStack(LinearAllocator& a);
        ~ScopeStack(); // unwind

        template <typename T> T* newObject();
        template <typename T> T* newPod();
    };

Scope Stacks, contd.
- Can create a stack of scopes on top of a single linear allocator
  - Only allocate from topmost scope
- Can rewind scopes as desired
  - For example init/systems/level
  - Finer-grained control over nested lifetimes
- Can also follow call stack
  - Very elegant per-thread scratch pad

Scope Stack Diagram
[Diagram labels: Linear Allocator, Scope, Scope, Active Scope.]

Scope Stack API
- Simple C++ interface
  - scope.newObject<T>(...) - allocate object with cleanup (stores finalizer)
  - scope.newPod<T>(...) - allocate object without cleanup
  - scope.alloc(...) - raw memory allocation
- Can also implement as C interface
  - Similar ideas in APR (Apache Portable Runtime)

Scope Stack Implementation
- newObject<T>()

    template <typename T>
    void destructorCall(void *ptr) {
        static_cast<T*>(ptr)->~T();
    }

    template <typename T>
    T* ScopeStack::newObject() {
        // Allocate memory for finalizer + object.
        Finalizer* f = allocWithFinalizer(sizeof(T));

        // Placement construct object in space after finalizer.
        T* result = new (objectFromFinalizer(f)) T;

        // Link this finalizer onto the chain.
        f->fn = &destructorCall<T>;
        f->chain = m_finalizerChain;
        m_finalizerChain = f;
        return result;
    }

FrogSystem - Scope Stacks
- Critical Section example with Scope Stack

    class FrogSystem {
        // ...
        CriticalSection *m_lock;

    public:
        FrogSystem(ScopeStack& scope)
            : m_lock(scope.newObject<CriticalSection>())
            // ...
        {}

        // no destructor needed!
    };

    FrogSystem* FrogSystem_init(ScopeStack& scope) {
        // FrogSystem itself needs no finalizer; its constructor takes the scope.
        return scope.newPod<FrogSystem>(scope);
    }

Memory Layout (with context)
[Diagram labels: FrogSystem, Frog Pool, CriticalSect, (other stuff) ..., Allocation Point. Legend: POD data; object with cleanup; finalizer record; scope.]

Scope Cleanup
- With finalizer chain in place we can unwind without manual code
  - Iterate linked list
  - Call finalizer for objects that require cleanup
  - POD data still zero overhead
  - Finalizer for C++ objects => destructor call
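To make the unwind concrete, here is a compact sketch of a scope stack whose destructor walks the finalizer chain and then rewinds the allocator. It is an illustration under stated assumptions, not the talk's implementation: the `object` field in `Finalizer` (the slides compute the object address from the finalizer instead), `mark()`, the variadic constructor arguments, the alignment handling, and the `Tracer` demo type are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>
#include <utility>

// Minimal bump allocator (assumed interface).
class LinearAllocator {
public:
    LinearAllocator(std::uint8_t* buf, std::size_t size)
        : m_ptr(buf), m_end(buf + size) {}
    void* allocate(std::size_t size, std::size_t align) {
        std::size_t pad =
            (align - reinterpret_cast<std::uintptr_t>(m_ptr) % align) % align;
        std::size_t left = static_cast<std::size_t>(m_end - m_ptr);
        if (pad > left || size > left - pad) return nullptr;
        std::uint8_t* result = m_ptr + pad;
        m_ptr = result + size;
        return result;
    }
    std::uint8_t* mark() const { return m_ptr; }
    void rewind(std::uint8_t* p) { m_ptr = p; }
private:
    std::uint8_t* m_ptr;
    std::uint8_t* m_end;
};

struct Finalizer {
    void (*fn)(void* ptr);
    Finalizer* chain;
    void* object;  // assumption: stored explicitly to keep the sketch simple
};

template <typename T>
void destructorCall(void* ptr) { static_cast<T*>(ptr)->~T(); }

class ScopeStack {
public:
    explicit ScopeStack(LinearAllocator& a)
        : m_alloc(a), m_rewindPoint(a.mark()), m_finalizerChain(nullptr) {}
    ScopeStack(const ScopeStack&) = delete;
    ScopeStack& operator=(const ScopeStack&) = delete;

    // Unwind: newest finalizer first, then give the memory back.
    ~ScopeStack() {
        for (Finalizer* f = m_finalizerChain; f != nullptr; f = f->chain)
            f->fn(f->object);
        m_alloc.rewind(m_rewindPoint);
    }

    void* alloc(std::size_t size,
                std::size_t align = alignof(std::max_align_t)) {
        return m_alloc.allocate(size, align);
    }

    // No finalizer record: zero cleanup overhead.
    template <typename T, typename... Args>
    T* newPod(Args&&... args) {
        void* mem = alloc(sizeof(T), alignof(T));
        return mem ? new (mem) T(std::forward<Args>(args)...) : nullptr;
    }

    // Finalizer record first, then the object; link the record onto the chain.
    template <typename T, typename... Args>
    T* newObject(Args&&... args) {
        Finalizer* f = newPod<Finalizer>();
        T* result = f ? newPod<T>(std::forward<Args>(args)...) : nullptr;
        if (!result) return nullptr;
        f->fn = &destructorCall<T>;
        f->object = result;
        f->chain = m_finalizerChain;
        m_finalizerChain = f;
        return result;
    }

private:
    LinearAllocator& m_alloc;
    std::uint8_t* m_rewindPoint;
    Finalizer* m_finalizerChain;
};

// Demo type: appends its id to a log when destroyed.
struct Tracer {
    Tracer(std::string& log, char id) : m_log(log), m_id(id) {}
    ~Tracer() { m_log += m_id; }
    std::string& m_log;
    char m_id;
};
```

Because each new finalizer is pushed onto the head of the chain, objects are destroyed in reverse order of construction, which is the dependency order the talk asks for. Nested `ScopeStack` locals on the same allocator unwind innermost first, following the C++ call stack.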

Per-thread allocation
- Scratch pad = Thread-local linear allocator
- Construct nested scopes on this allocator
- Utility functions can lay out arbitrary objects on scratch pad scope

    class File; // next slide

    const char *formatString(ScopeStack& scope, const char *fmt, ...);

    void myFunction(const char *fn) {
        ScopeStack scratch(tls_allocator);
        const char *filename = formatString(scratch, "foo/bar/%s", fn);
        File *file = scratch.newObject<File>(scratch, filename);

        file->read(...);

        // No cleanup required!
    }

Per-thread allocation, contd.
- File object allocates buffer from designated scope
- Doesn't care about lifetime: its buffer and itself will live for exactly the same time
- Can live on scratch pad without knowing it

    class File {
    private:
        u8 *m_buffer;
        int m_handle;
    public:
        File(ScopeStack& scope, const char *filename)
            : m_buffer((u8*) scope.alloc(8192))
            , m_handle(open(filename, O_RDONLY))
        {}

        ~File() {
            close(m_handle);
        }
    };

Memory Layout: Scratch Pad
[Diagram labels: File Buffer, Filename, File, Allocation Point, Old Allocation Point, Rewind Point. Legend: POD data; object with cleanup; finalizer record; scope; parent scope.]

PIMPL
- C++ addicts can enjoy free PIMPL idiom
  - Because allocations are essentially "free", the PIMPL idiom becomes more attractive
- Can slim down headers and hide all data members without concern for performance

Limitations
- Must set an upper bound on all pool sizes
  - Can never grow an allocation
  - This design style is classical in the games industry
  - But pool sizes can vary between levels!
  - Reconfigure after rewind
- By default API not thread safe
  - Makes sense as this is more like layout than allocation
  - Pools/other structures can still be made thread safe once memory is allocated
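A bounded pool of the kind described above can be laid out once in memory obtained from a scope and then recycle its slots through an intrusive free list. The sketch below assumes a hypothetical `FixedPool<T>` that is handed a raw block (for example from `scope.alloc(...)`); none of these names come from the talk, and it is single-threaded as written.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical fixed-capacity pool. The capacity is chosen up front and
// never grows; the backing memory is owned by whoever allocated it.
template <typename T>
class FixedPool {
    // A slot holds either a live T or a link to the next free slot.
    union Slot {
        Slot* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };

public:
    // Bytes the caller must provide for `capacity` objects.
    static std::size_t bytesNeeded(std::size_t capacity) {
        return capacity * sizeof(Slot);
    }

    // `memory` must be suitably aligned and at least bytesNeeded(capacity).
    FixedPool(void* memory, std::size_t capacity) : m_free(nullptr) {
        Slot* base = static_cast<Slot*>(memory);
        // Thread every slot onto the free list, lowest address first out.
        for (std::size_t i = capacity; i-- > 0;) {
            Slot* head = m_free;
            m_free = new (base + i) Slot{head};
        }
    }

    // Constructs a T in a free slot, or returns nullptr when the pool is full.
    template <typename... Args>
    T* acquire(Args&&... args) {
        if (m_free == nullptr) return nullptr;
        Slot* slot = m_free;
        m_free = slot->next;
        return new (static_cast<void*>(slot)) T(std::forward<Args>(args)...);
    }

    // Destroys the object and returns its slot to the free list.
    void release(T* object) {
        object->~T();
        Slot* head = m_free;
        m_free = new (static_cast<void*>(object)) Slot{head};
    }

private:
    Slot* m_free;
};
```

Running out of slots is reported to the caller instead of triggering growth, which keeps the worst case visible. If pool sizes differ between levels, the pool would simply be rebuilt with a new capacity after the owning scope is rewound.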

Limitations, contd.
- Doesn't map 100% to C++ object modeling
  - But finalizers enable RAII-like cleanup
- Many traditional C++ tricks become obsolete
  - Pointer swapping
  - Reference counting
- Must always think about lifetime and ownership when allocating
  - Lifetime determined on global level
  - Can't hold on to pointers: unwind = apocalypse
  - Manage on higher level instead

Conclusion
- Scope stacks are a system programming tool
  - Suitable for embedded systems with few variable parameters, such as games
  - Requires planning and commitment
- Pays dividends in speed and simplicity
  - Same pointers every time, so debugging is easier
- Out of memory analysis usually very simple
  - Either the level runs, or doesn't run
  - Can never fail halfway through

Links
- Toy implementation of scope stacks (for playing with, not industrial strength): http://pastebin.com/h7nU8JE2
- "Start Pre-allocating And Stop Worrying", Noel Llopis: http://gamesfromwithin.com/start-pre-allocating-and-stop-worrying
- Apache Portable Runtime: http://apr.apache.org/

Bonus: what about...
- ...building an array, final size unknown?
  - Standard approach in C++: STL vector push
  - Instead build linked list/deque on scratch pad
  - Allocate array in target scope stack once size is known
- ...dynamic lifetime of individual objects?
  - Allocate object pool from scope stack
  - Requires bounding the worst case, which is a good idea for games anyway
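The first answer (collect into a linked list on the scratch pad, then allocate the final array once the count is known) might look roughly like the sketch below. `Arena` is a hypothetical stand-in for a scope on a linear allocator, and `collectEvens`, `Node` and `IntArray` are made-up names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for a scope: bump allocation over a fixed range.
struct Arena {
    std::uint8_t* ptr;
    std::uint8_t* end;

    void* alloc(std::size_t size, std::size_t align) {
        std::size_t pad =
            (align - reinterpret_cast<std::uintptr_t>(ptr) % align) % align;
        std::size_t left = static_cast<std::size_t>(end - ptr);
        if (pad > left || size > left - pad) return nullptr;
        std::uint8_t* result = ptr + pad;
        ptr = result + size;
        return result;
    }
};

struct Node {
    int value;
    Node* next;
};

struct IntArray {
    int* data;
    std::size_t count;
};

// Collects the even numbers in [0, limit) without knowing the count up front.
IntArray collectEvens(Arena& target, Arena& scratch, int limit) {
    std::uint8_t* rewindPoint = scratch.ptr;

    // Pass 1: append matches to a temporary list on the scratch pad.
    Node* head = nullptr;
    Node** tail = &head;
    std::size_t count = 0;
    for (int i = 0; i < limit; ++i) {
        if (i % 2 != 0) continue;
        void* mem = scratch.alloc(sizeof(Node), alignof(Node));
        if (mem == nullptr) break;  // scratch exhausted: keep what we have
        Node* node = new (mem) Node{i, nullptr};
        *tail = node;
        tail = &node->next;
        ++count;
    }

    // Pass 2: the size is now known, so allocate exactly once in the target.
    IntArray out{nullptr, 0};
    void* mem = count ? target.alloc(sizeof(int) * count, alignof(int)) : nullptr;
    if (mem != nullptr) {
        out.data = static_cast<int*>(mem);
        for (Node* n = head; n != nullptr; n = n->next)
            new (out.data + out.count++) int(n->value);
    }

    // Discard the temporary list by rewinding the scratch pad.
    scratch.ptr = rewindPoint;
    return out;
}
```

The target memory ends up holding only the tightly sized array, while all intermediate nodes vanish with the scratch rewind.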
