Preliminary XSX die fact finding
- 1. Preliminary XSX fact finding
(it is not about 100% correctness,
it is about the journey of connecting the dots), so don't use this as fact!!!
plus RDNA1 slides and cache data sizes
by @blueisviolet
- 2. AMD Slide / Navi Fact
In Navi, two Compute Units (CUs) form a Workgroup Processor, and five of those form an
Asynchronous Compute Engine (ACE)
important links, RDNA1 (Navi10):
https://www.amd.com/system/files/documents/rdna-whitepaper.pdf
https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf
https://gpuopen.com/rdna-shader-instruction-set-architecture-document-now-available/
- 3. AMD Slide / Navi Fact
Some interesting figures per SIMD32 block:
- VGPR (128KB)
- SGPR (10KB)
also, hmmm, so the ALU provides: 32x ALU + 2x DP unit + 8x transcendental? unit
32KB I$ for 4 SPU (2 WGP)
16KB K$ (scalar data cache) per 2 WGP
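The per-SIMD32 register sizes above can be sanity-checked with quick arithmetic. A sketch; the 32-lane width and 4-byte (32-bit) register entry size are standard RDNA parameters from the public docs, not stated on the slide:

```python
# Per-SIMD32 register file sizes from the slide.
vgpr_bytes = 128 * 1024  # 128KB of vector registers per SIMD32
sgpr_bytes = 10 * 1024   # 10KB of scalar registers per SIMD32

lanes = 32               # SIMD32: 32 lanes per SIMD
entry_bytes = 4          # one 32-bit register entry

# 32-bit VGPR entries available per lane in one SIMD32.
vgprs_per_lane = vgpr_bytes // (lanes * entry_bytes)
print(vgprs_per_lane)    # 1024

# 32-bit SGPR entries in the scalar file (shared across the wave).
sgprs = sgpr_bytes // entry_bytes
print(sgprs)             # 2560
```

The 1024-deep VGPR pool per SIMD32 matches the figure in AMD's public RDNA material, which is a small cross-check on the 128KB number.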
- 4. AMD Slide / Navi Fact
In Navi, two Compute Units (CUs) form a Workgroup Processor, and five of those form
an Asynchronous Compute Engine (ACE)
- 5. AMD Slide / Navi Fact
1 WGP (2 CUs) = 2x L0 (2 x 16KB) + I$ (32KB) + K$ (16KB) + VGPR (512KB) + SGPR (40KB) + 128KB LDS
5 WGP (1 Shader Array) (1 ACE) = connected to 128KB L1
1 CU = 256KB VGPR + 20KB SGPR
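The slide-5 totals are consistent with the slide-3 per-SIMD figures. A quick consistency check, assuming 2 SIMD32 per CU and 2 CUs per WGP as in the RDNA whitepaper:

```python
# Per-SIMD32 figures from slide 3, in KB.
vgpr_per_simd_kb = 128
sgpr_per_simd_kb = 10

simds_per_cu = 2   # RDNA: two SIMD32 per CU
cus_per_wgp = 2    # two CUs per WGP

# Per-CU totals (slide 5: "1 CU = 256KB VGPR + 20KB SGPR").
vgpr_per_cu = vgpr_per_simd_kb * simds_per_cu   # 256
sgpr_per_cu = sgpr_per_simd_kb * simds_per_cu   # 20

# Per-WGP totals (slide 5: "VGPR (512KB) + SGPR (40KB)").
vgpr_per_wgp = vgpr_per_cu * cus_per_wgp        # 512
sgpr_per_wgp = sgpr_per_cu * cus_per_wgp        # 40

# Total SRAM listed for one WGP:
# 2x 16KB L0 + 32KB I$ + 16KB K$ + VGPR + SGPR + 128KB LDS.
total_wgp_kb = 2 * 16 + 32 + 16 + vgpr_per_wgp + sgpr_per_wgp + 128
print(total_wgp_kb)  # 760 KB of SRAM per WGP
```

So the register files dominate the on-WGP SRAM (552KB of the 760KB total), which is why they should stand out as the largest repeated blocks on a die shot.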
- 6. So why? It seems the E, F, and G regions have a similar structure
- 7. Zoomed in: in a normal CU, blocks 4, 5, 6, and 7 are usually different.
It looks like 10 groups; instead of 4 like Navi, it is 6. From what we know, it seems
each WGP can be customized; also remember the X1 Hot Chips talk, where
the higher-level diagram showed groups of 6 CUs
- 8. Another thing, from GitHub: it is said there are 3 GDS and 6
LDS per group. The X1 Hot Chips slide (per-6 grouping) is also provided