Preliminary XSX fact finding
(it is not about 100% correctness,
it is about the journey of connecting the dots), so don't use this as fact!!!
plus RDNA1 slides and cache data sizes
by @blueisviolet
AMD Slide / Navi Fact
In Navi, two Compute Units (CUs) form a Workgroup Processor (WGP), and five WGPs form a
Shader Array
Important links, RDNA1 (Navi10):
https://www.amd.com/system/files/documents/rdna-whitepaper.pdf
https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf
https://gpuopen.com/rdna-shader-instruction-set-architecture-document-now-available/
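A minimal sketch of the counting on this slide, assuming the RDNA whitepaper grouping (2 CUs per WGP, 5 WGPs per shader array) and, for the total, Navi10's 4 shader arrays. The `Hierarchy` class is a hypothetical helper for illustration, not AMD code.

```python
# Hypothetical sketch of the RDNA1 (Navi10) grouping.
# Assumption: 2 CUs per WGP and 5 WGPs per shader array, per the RDNA whitepaper.
from dataclasses import dataclass


@dataclass(frozen=True)
class Hierarchy:
    cus_per_wgp: int = 2
    wgps_per_array: int = 5

    def cus_per_array(self) -> int:
        """Compute Units contained in one shader array."""
        return self.cus_per_wgp * self.wgps_per_array

    def total_cus(self, arrays: int) -> int:
        """Compute Units across `arrays` shader arrays."""
        return self.cus_per_array() * arrays


navi10 = Hierarchy()
print(navi10.cus_per_array())  # 10
print(navi10.total_cus(4))     # 40 (assuming 4 shader arrays)
```

Changing `cus_per_wgp` or `wgps_per_array` lets the same helper test other layouts, which is handy when comparing against a die shot.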
AMD Slide / Navi Fact
Some interesting numbers per SIMD32 block:
- VGPR (128KB)
- SGPR (10KB)
Also, hmm, so the ALUs provide: 32 ALU lanes + 2 DP units + 8 transcendental(?) units
32KB I$ shared by 4 SIMD32s (2 CUs, i.e. 1 WGP)
16KB K$ (scalar data cache) per WGP
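The per-SIMD32 figures above can be turned into register counts and unit ratios with simple division. This is a sketch under stated assumptions: a VGPR is taken to hold one 32-bit value for each of the 32 lanes of a wave32, an SGPR is taken to be a single 32-bit value, and the DP and transcendental unit counts are copied from the slide, which itself marks them as uncertain.

```python
# Arithmetic on the per-SIMD32 figures quoted on this slide.
# Assumption: a VGPR holds one 32-bit value per lane of a wave32 (32 lanes),
# and an SGPR is a single 32-bit value.
from fractions import Fraction

KIB = 1024
VGPR_FILE_BYTES = 128 * KIB  # per SIMD32 (slide)
SGPR_FILE_BYTES = 10 * KIB   # per SIMD32 (slide)
LANES = 32
REG_BYTES = 4                # 32-bit register width (assumption)

vgprs_per_simd = VGPR_FILE_BYTES // (LANES * REG_BYTES)
sgprs_per_simd = SGPR_FILE_BYTES // REG_BYTES

# Unit counts as read from the slide; not independently confirmed.
ALU_LANES = 32
DP_LANES = 2
TRANS_LANES = 8
dp_rate = Fraction(DP_LANES, ALU_LANES)        # DP lanes relative to ALU lanes
trans_rate = Fraction(TRANS_LANES, ALU_LANES)  # transcendental lanes relative to ALU lanes

print(vgprs_per_simd, sgprs_per_simd)  # 1024 2560
print(dp_rate, trans_rate)             # 1/16 1/4
```

If the slide's unit counts are right, DP would run at 1/16 and transcendentals at 1/4 of the plain ALU lane count per clock.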
AMD Slide / Navi Fact
In Navi, two Compute Units (CUs) form a Workgroup Processor (WGP), and five WGPs form
a Shader Array
AMD Slide / Navi Fact
1 WGP (2 CUs) = 2x L0 (2 x 16KB) + I$ (32KB) + K$ (16KB) + VGPR (512KB) + SGPR (40KB) + LDS (128KB)
5 WGPs (1 Shader Array) connect to a 128KB L1
1 CU = 256KB VGPR + 20KB SGPR
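The totals on this slide follow from multiplying the per-SIMD32 sizes up the hierarchy. A minimal sketch, using only the RDNA1 sizes quoted in this deck (in KiB); the dictionary names and helper functions are hypothetical, chosen for illustration.

```python
# Roll the per-SIMD32 / per-CU / per-WGP sizes up the RDNA1 hierarchy.
# Sizes are in KiB and taken from the Navi10 figures quoted in this deck.
SIMDS_PER_CU = 2
CUS_PER_WGP = 2
WGPS_PER_ARRAY = 5

PER_SIMD = {"VGPR": 128, "SGPR": 10}               # private to each SIMD32
PER_CU = {"L0": 16}                                # one L0 cache per CU
PER_WGP_SHARED = {"I$": 32, "K$": 16, "LDS": 128}  # shared by both CUs of a WGP
L1_PER_ARRAY = 128                                 # L1 per shader array


def per_cu() -> dict:
    """Storage inside one CU: two SIMD32s plus the CU's own L0."""
    sizes = {name: kib * SIMDS_PER_CU for name, kib in PER_SIMD.items()}
    sizes.update(PER_CU)
    return sizes


def per_wgp() -> dict:
    """Storage inside one WGP: two CUs plus the blocks they share."""
    sizes = {name: kib * CUS_PER_WGP for name, kib in per_cu().items()}
    sizes.update(PER_WGP_SHARED)
    return sizes


def per_array_total() -> int:
    """Sum of every block above for one shader array, including its L1."""
    return WGPS_PER_ARRAY * sum(per_wgp().values()) + L1_PER_ARRAY


print(per_cu())           # {'VGPR': 256, 'SGPR': 20, 'L0': 16}
print(per_wgp())          # {'VGPR': 512, 'SGPR': 40, 'L0': 32, 'I$': 32, 'K$': 16, 'LDS': 128}
print(per_array_total())  # 3928
```

The per-WGP result reproduces the slide line (512KB VGPR, 40KB SGPR, 2 x 16KB L0, 32KB I$, 16KB K$, 128KB LDS); the shader-array sum of 3928 KiB is just the arithmetic total of those blocks plus the L1.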
So why? It seems blocks E, F, and G have a similar structure.
Zoomed in: in a normal CU, blocks 4, 5, 6, and 7 are usually different.
It looks like 10 groups; instead of 4 as in Navi, it is 6. From what we know, it seems
each WGP can be customized. Also remember the X1 Hot Chips slide:
its higher-level diagram showed groups of 6 CUs.
Another thing: according to GitHub, there are 3 GDS and 6 LDS per group.
The X1 Hot Chips slide (groups of 6) is also provided.
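One way to test the "6 LDS per group" note is to ask how many CUs a group would need under different LDS layouts. This is a hypothetical sketch, not a claim about XSX: it assumes either a GCN-style layout (one 64KB LDS per CU) or the RDNA1 layout quoted earlier (one 128KB LDS per WGP of 2 CUs). The `MODELS` table and helper names are illustrative only.

```python
# Hypothetical check of the "6 LDS per group" note.
# Two assumed models, neither confirmed for XSX:
#   GCN-style:   one 64 KiB LDS per CU
#   RDNA1-style: one 128 KiB LDS per WGP (2 CUs)
MODELS = {
    "GCN-style": {"cus_per_lds": 1, "lds_kib": 64},
    "RDNA1-style": {"cus_per_lds": 2, "lds_kib": 128},
}


def cus_for_lds_count(lds_count: int, model: str) -> int:
    """How many CUs a group needs to contain `lds_count` LDS blocks."""
    return lds_count * MODELS[model]["cus_per_lds"]


def total_lds_kib(lds_count: int, model: str) -> int:
    """Total LDS capacity of a group with `lds_count` LDS blocks."""
    return lds_count * MODELS[model]["lds_kib"]


for model in MODELS:
    print(model, cus_for_lds_count(6, model), "CUs,", total_lds_kib(6, model), "KiB LDS")
# GCN-style 6 CUs, 384 KiB LDS
# RDNA1-style 12 CUs, 768 KiB LDS
```

Under these assumptions, 6 LDS blocks fit a 6-CU group only with one LDS per CU; with RDNA1-style per-WGP LDS, the same count implies 12 CUs (6 WGPs) per group.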
