⋮
Architecture in Brief
Qualcomm's Adreno 6xx architecture diverges from Adreno 5xx's Radeon origins, and adds a separate low-priority compute queue to the command processor. This queue lets the driver reserve the primary ring buffer for higher-priority tasks, and is meant to allow background compute without display stutters. AMD, in contrast, jumped in at the deep end with multiple asynchronous compute queues.
Adreno also focuses on tile-based rendering, which tries to lower bandwidth requirements when rasterizing graphics. Primitives from vertex shaders get sorted into tiles (rectangular portions of the screen), which are rendered one at a time. That improves cache locality, and lets the GPU buffer the render-in-progress tile in specialized on-chip memory. Qualcomm calls that GMEM, and Adreno 690 gets 4 MB of it. More GMEM should let Adreno 690 better leverage its larger shader array by handling more pixels at a time. For comparison, Adreno 640 has 1 MB of GMEM.
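To make the tiling idea concrete, here is a minimal Python sketch, not Qualcomm's or Mesa's actual algorithm. It estimates how few tiles a frame could be split into for a given GMEM size, and bins a primitive into tiles by its bounding box. The function names, the 8 bytes per pixel figure (32-bit color plus 32-bit depth), the 1920x1080 resolution, and the 256x256 tile size are all assumptions chosen for illustration.

```python
from math import ceil

MIB = 1024 * 1024

def pixels_per_tile(gmem_bytes, bytes_per_pixel):
    # How many pixels' worth of render target data fit in GMEM at once.
    return gmem_bytes // bytes_per_pixel

def min_tiles(width, height, gmem_bytes, bytes_per_pixel):
    # Lower bound on tile count; ignores tile alignment and shape constraints.
    return ceil(width * height / pixels_per_tile(gmem_bytes, bytes_per_pixel))

def bin_by_bounding_box(vertices, tile_w, tile_h, width, height):
    # Conservative binning: list every tile the primitive's bounding box touches.
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    x0 = max(0, int(min(xs))) // tile_w
    x1 = min(width - 1, int(max(xs))) // tile_w
    y0 = max(0, int(min(ys))) // tile_h
    y1 = min(height - 1, int(max(ys))) // tile_h
    return [(tx, ty) for ty in range(y0, y1 + 1) for tx in range(x0, x1 + 1)]

if __name__ == "__main__":
    # Assumed workload: 1920x1080 target, 8 bytes per pixel (color + depth).
    for gmem_mib in (1, 2, 4):
        n = min_tiles(1920, 1080, gmem_mib * MIB, 8)
        print(f"{gmem_mib} MiB GMEM -> at least {n} tiles")

    # A triangle spanning two assumed 256x256 tiles along the top row.
    triangle = [(10, 10), (300, 20), (40, 200)]
    print(bin_by_bounding_box(triangle, 256, 256, 1920, 1080))
```

Under these assumptions, a 4 MiB buffer needs at least 4 passes over the frame while a 1 MiB buffer needs at least 16. Real drivers also have to respect tile alignment rules and multisampling, so actual tile counts will differ.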
Counting SPs. Die photo from Kurnal, labels added by Clam
Mesa source code indicates Adreno 690 has 8 "CCUs". On Adreno 730, a CCU corresponds to a pair of SPs, or Shader Processors. Adreno 690 likely has a similar design. A die shot of Qualcomm's 8cx Gen 3 also shows a similar GPU layout with SPs organized in pairs, unlike Snapdragon 821's Adreno 530.
Therefore, Adreno 690 is a large GPU by Qualcomm standards. It has twice as many SPs as their highest-end Snapdragon cell phone chips. Even though it uses a prior-generation GPU architecture, its large shader array still demands respect.
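The SP estimate is simple arithmetic. A small sketch, assuming the two-SPs-per-CCU ratio seen on Adreno 730 also holds for Adreno 690 (the constant names are illustrative, not Mesa identifiers):

```python
# CCU count that Mesa reports for Adreno 690, per the text above.
A690_CCUS = 8
# Assumption: two SPs per CCU, carried over from Adreno 730.
SPS_PER_CCU = 2

a690_sps = A690_CCUS * SPS_PER_CCU
# "Twice as many SPs" as the top phone chips implies this count for them.
implied_phone_sps = a690_sps // 2

print(f"Adreno 690 SPs (estimated): {a690_sps}")
print(f"Implied SPs on top phone GPUs: {implied_phone_sps}")
```

If the CCU-to-SP ratio differs on Adreno 690, the estimate changes accordingly; the die shot is what makes the paired layout plausible.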
Mesa code suggests Adreno 690 has 16 KB instruction caches and 64 KB register files. Each SP's scheduler probably has two 16-entry partitions, and manages 128-wide waves.