⋮
However, LPDDR5 again turns in a disappointing performance. Certainly, a variety of factors mean that getting full theoretical bandwidth out of any DRAM configuration is a pipe dream. For example, you'll lose memory controller cycles from read-to-write turnarounds and page misses. But 25 GB/s is on the wrong planet.
I expected better performance out of a 128-bit LPDDR5-5500 setup, which works out to 88 GB/s of theoretical bandwidth. The chips themselves are rated for 6400 MT/s, so there's plenty of bandwidth on paper, but it's totally wasted from the CPU side.
Nice 5500 MT/s there (in the BIOS). Shame we can't get that bandwidth from the CPU.
To put just how bad this is into perspective, Renoir's DDR4-3200 setup beats Van Gogh's by a massive margin. That applies even when I used process affinity to limit my test to a single CCX. 25 GB/s is something out of the early DDR4 days. For example, a Core i5-6600K can pull 27 GB/s from a dual-channel DDR4-2133 setup.
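For a rough sense of scale, here's a minimal sketch of the peak bandwidth arithmetic behind these comparisons. It assumes peak bandwidth is simply bus width in bytes times transfer rate, and treats dual-channel DDR4 as a 128-bit bus (two 64-bit channels). The function name and layout are mine, not taken from any tool used in the testing.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak in GB/s (10^9 bytes/s): bytes per transfer * transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# (bus width in bits, MT/s, measured GB/s quoted in the text, or None if not quoted)
# Assumption: dual-channel DDR4 is modeled as 2 x 64-bit = 128 bits.
configs = {
    "Van Gogh, 128-bit LPDDR5-5500": (128, 5500, 25.0),
    "Renoir, dual-channel DDR4-3200": (128, 3200, None),
    "Core i5-6600K, dual-channel DDR4-2133": (128, 2133, 27.0),
}

for name, (bits, mts, measured) in configs.items():
    peak = peak_bandwidth_gbs(bits, mts)
    if measured is None:
        print(f"{name}: peak {peak:.1f} GB/s")
    else:
        print(f"{name}: peak {peak:.1f} GB/s, measured {measured:.0f} GB/s "
              f"({measured / peak:.0%} of peak)")
```

By this arithmetic, Van Gogh's CPU side gets under a third of its 88 GB/s theoretical figure, while the old Skylake chip gets close to 80% of its 34.1 GB/s.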
The LPDDR5 setup therefore saddles the CPU with garbage memory latency, while providing bandwidth on par with a DDR4 setup out of late 2015. It's not a huge step up from a good DDR3 setup either. All that is made worse by the CPU's small L3 cache, which means the cores are less insulated from memory than they would be on a desktop or server Zen 2 implementation.
For even more perspective, we can look at memory bandwidth usage in Cyberpunk 2077. The game was run with raytracing off, allowing framerates to hover around 100 FPS. I’m using undocumented performance counters, but I’ve validated them by pulling a known amount of data from memory and checking that the counts are reasonable.
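The sanity check described above could look something like the sketch below. This is not the actual tooling: the helper names are hypothetical, the input numbers are made up for illustration, and it assumes each counted event corresponds to one 64-byte cacheline transfer, which may not hold for an undocumented counter.

```python
CACHELINE_BYTES = 64  # assumption: one counted event == one 64-byte cacheline


def counted_bytes(event_delta: int, bytes_per_event: int = CACHELINE_BYTES) -> int:
    """Convert a counter delta into bytes moved."""
    return event_delta * bytes_per_event


def bandwidth_gbs(event_delta: int, elapsed_s: float,
                  bytes_per_event: int = CACHELINE_BYTES) -> float:
    """Bandwidth in GB/s (10^9 bytes/s) implied by a counter delta over an interval."""
    return counted_bytes(event_delta, bytes_per_event) / elapsed_s / 1e9


def counts_look_reasonable(event_delta: int, known_bytes: int,
                           tolerance: float = 0.10) -> bool:
    """True if the counter accounts for the known traffic within the tolerance."""
    ratio = counted_bytes(event_delta) / known_bytes
    return abs(ratio - 1.0) <= tolerance


# Illustrative only: read 1 GiB from DRAM and compare against a made-up counter delta.
known = 2**30
delta = 17_000_000  # hypothetical reading, slightly above the 16,777,216 lines in 1 GiB
print(counts_look_reasonable(delta, known))
```

A small overcount like the one in this example is plausible, since background activity also generates memory traffic during the test. A count that's off by 2x or more would suggest the event means something other than what's assumed here.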
Even desktop Zen 2 with 16 MB of L3 would find itself needing more than 25 GB/s. Less L3 capacity means even higher memory bandwidth demand. Van Gogh is clearly not optimized to get the most out of its CPU cores. It's starting to feel like a smaller console APU, where engineering effort is focused on the GPU side, rather than the CPU.
⋮