Hey guys, can someone give me a more detailed explanation on why Ampere's cache is such a problem? i have been trying to look into it on my own time but I can't find anything worth reading
Basically the gist is that Ampere needs to be well fed with data, otherwise it ends up underutilized and stalls. More cache means more effective bandwidth available. But
@Alovon11 can explain this more in-depth.
Sorry for the late response.
Honestly, some others on here know more about cache interactions and whatnot than I do. But the problem I assume you're referring to, Ash, is more so bandwidth, correct?
Well, the main concern about memory bandwidth is that it's a major factor in how much data can be pushed through the GPU in a given amount of time.
More bandwidth means higher quality textures, more resolution data, more model geometry, etc.
So for a system that is allegedly pushing high resolution images with high res textures and complicated/memory-expensive effects like Ray Tracing, bandwidth does seem to be a concern, considering LPDDR5/LPDDR5X cannot break 200GB/s (which is where the Series S sits, along with midrange AMD GPUs that don't have Infinity Cache to prop them up). At least not on the assumed 128-bit memory bus that Drake (the SoC for Switch 2) will most likely use. To break past the ~100-120GB/s range we generally assume, they'd need to go 256-bit on the bus and add a lot more memory modules, which would ramp up complexity and cost, and increase cooling demand and size.
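Just to put rough numbers on that, here's a quick back-of-the-envelope in Python. The data rates are standard JEDEC LPDDR5/5X speed grades; the bus widths are the assumed configurations from the discussion, not confirmed specs:

```python
def peak_bandwidth_gbps(data_rate_mts, bus_width_bits):
    """Peak theoretical bandwidth in GB/s: transfers/s * bits per transfer / 8 bits per byte."""
    return data_rate_mts * bus_width_bits / 8 / 1000

# LPDDR5 at 6400 MT/s on an assumed 128-bit bus
print(peak_bandwidth_gbps(6400, 128))   # 102.4 GB/s
# LPDDR5X at 8533 MT/s on the same 128-bit bus
print(peak_bandwidth_gbps(8533, 128))   # ~136.5 GB/s -- still well short of 200GB/s
# Doubling the bus to 256-bit doubles the peak, at the cost of more modules/pins
print(peak_bandwidth_gbps(8533, 256))   # ~273.1 GB/s
```

Which is why even top-bin LPDDR5X on a 128-bit bus lands in that ~100-140GB/s window, and only a wider bus breaks out of it.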
Cache comes into play, like on the desktop RDNA2/RDNA3 cards, to help alleviate lower raw memory bandwidth: the cache lets the GPU move some data in and out far faster, lowering stall times, and if a workload's data can fit without completely overwhelming the cache, it can vastly improve performance. (For example, RDNA2 scales better at lower resolutions than desktop Ampere because lower-resolution workloads can fit inside the Infinity Cache itself, so data is transferred faster. Meanwhile at native 4K, etc., the difference levels back toward raw bandwidth as the Infinity Cache gets completely overwhelmed. This is part of why the higher end Ampere cards usually outpace the higher end RDNA2 cards at 4K: they have flat out higher raw bandwidth.)
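That "cache props up a narrow bus until it's overwhelmed" behavior can be sketched with a simple hit-rate model. All the numbers here are made up purely for illustration, not measured figures for any real card:

```python
def effective_bandwidth(hit_rate, cache_bw_gbps, dram_bw_gbps):
    """Blend cache and DRAM bandwidth by hit rate.

    Average time to move one byte = hit_rate / cache_bw + miss_rate / dram_bw,
    so effective bandwidth is the reciprocal (a weighted harmonic mean).
    """
    time_per_byte = hit_rate / cache_bw_gbps + (1 - hit_rate) / dram_bw_gbps
    return 1 / time_per_byte

# Hypothetical card: 1000 GB/s out of the big cache, 102.4 GB/s out of DRAM.
# At 1440p-ish, suppose most of the working set fits (60% hit rate):
print(effective_bandwidth(0.6, 1000, 102.4))  # ~222 GB/s effective
# At native 4K the working set blows past the cache (say 20% hit rate):
print(effective_bandwidth(0.2, 1000, 102.4))  # ~125 GB/s, back near raw DRAM
```

Same silicon, but the effective bandwidth collapses toward raw DRAM speed as the hit rate drops, which matches the RDNA2 resolution-scaling pattern described above.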
Drake's cache is a concern as it seemingly only has 1MB of GPU L2, which is a reduction versus even desktop Ampere parts of equivalent size.
Now, like Redd mentioned, a system-level cache can help alleviate this a lot, which is why he (and I myself) hope that Drake has one. Also, Tegra SoCs like Drake can access some of the CPU's cache, and considering Drake is most likely using A78C CPU cores (the C denoting the larger cache configuration), that does help.
I will say bandwidth and memory are tricky, as latency is also a factor: while LPDDR has lower bandwidth than the GDDR in the Series S|X/PS5 or other GPUs, it also has way lower latency.
So theoretically, while having lower bandwidth, it can service individual calls to memory far faster than the systems using GDDR. CPUs like this a lot, but GPUs can work with it to some extent depending on the architecture/optimization.
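The latency-vs-bandwidth trade-off comes down to access pattern. A toy sketch (latency and bandwidth figures are illustrative placeholders, not measured LPDDR/GDDR specs):

```python
def dependent_chain_time_us(n_accesses, latency_ns):
    """Pointer-chase pattern: each access must wait for the previous one,
    so total time is dominated by per-access latency."""
    return n_accesses * latency_ns / 1000

def streaming_time_us(bytes_moved, bandwidth_gbps):
    """Bulk streaming pattern: latency is amortized away, bandwidth rules.
    1 GB/s is exactly 1 byte per nanosecond."""
    return bytes_moved / bandwidth_gbps / 1000

# 10,000 dependent accesses: the low-latency memory wins big
print(dependent_chain_time_us(10_000, 100))  # hypothetical LPDDR-ish: 1000 us
print(dependent_chain_time_us(10_000, 250))  # hypothetical GDDR-ish:  2500 us

# Streaming 10 MB of texture data: the high-bandwidth memory wins instead
print(streaming_time_us(10_000_000, 100))    # 100 GB/s bus: 100 us
print(streaming_time_us(10_000_000, 400))    # 400 GB/s bus:  25 us
```

Latency-bound, dependent work (the kind CPUs and some RT traversal do a lot of) favors the low-latency memory, while big streaming reads favor raw bandwidth.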
An interesting observation is that Ray Tracing itself may like low latency about as much as high bandwidth, if Steam Deck Ray Tracing testing is to be believed: RT performance seems to scale pretty much with the GPU itself rather than memory bandwidth being an issue versus the Series S. (Comparing RT performance seems to indicate no major memory problems there; the raw reduction in CPU/GPU cores and clocks is the more pressing factor.)