We should stick with 1.1 GHz and 12 GB LPDDR5 as a base case for sure, but can someone extrapolate what the best-case clock speeds & performance would be given a bandwidth ceiling with 16 GB of LPDDR5X?
@Thraktor explained why we feel like we're stuck with the bandwidth limits of LPDDR5, but figured I'd step in and answer your question directly. Note - this is all
@Look over there's work, I'm just recapping.
All of the RTX 30 cards float around 30 GB/s of memory bandwidth for every TFLOP of performance. So in order to perform in the same ballpark as an RTX 30 card, the GPU needs that much bandwidth.
In a console, the CPU and the GPU share a memory bus, unlike in a PC. This has performance advantages, but it means you need extra bandwidth for the CPU. CPU performance generally needs to keep up with GPU performance, so a 20% premium is a good estimate.
That's the origin of the 3 TFLOP limit. 30 GB/s/TFLOP * 3 TFLOPS * 1.2 = 108 GB/s, just a touch higher than the 102 GB/s that would be the max for LPDDR5.
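To put numbers on it, here's the same math as a quick Python sketch, flipped around to answer the original question. The 30 GB/s/TFLOP ratio and 20% CPU premium are the assumptions from above; the LPDDR5X figure assumes a 128-bit LPDDR5X-8533 bus, which is my best-case assumption, not anything confirmed:

```python
# Bandwidth-ceiling math from the post (assumed ratios, not measurements).
GBPS_PER_TFLOP = 30   # rough RTX 30-series bandwidth-to-compute ratio
CPU_PREMIUM = 1.2     # ~20% of bandwidth reserved for the CPU

def max_tflops(bandwidth_gbps):
    """Highest GPU TFLOPS the shared memory bus can feed at the assumed ratio."""
    return bandwidth_gbps / (GBPS_PER_TFLOP * CPU_PREMIUM)

lpddr5 = 102.4   # 128-bit LPDDR5-6400: 6400 MT/s * 16 bytes/transfer
lpddr5x = 136.5  # 128-bit LPDDR5X-8533 (assumed): 8533 MT/s * 16 bytes/transfer

print(f"LPDDR5:  {max_tflops(lpddr5):.2f} TFLOPS")   # ~2.84
print(f"LPDDR5X: {max_tflops(lpddr5x):.2f} TFLOPS")  # ~3.79
```

So under the same balance assumptions, LPDDR5X would push the ceiling from roughly 3 TFLOPS to a bit under 4.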
There are some caveats.
First, obviously, not all RTX 30 cards go as high as 30 GB/s/TFLOP. The highest-end card (RTX 3090 Ti) and the lowest-end (RTX 3050) both sit around 25 GB/s/TFLOP. That may seem unintuitive, but think of it this way: the highest-end card is about pushing as much performance as possible. Extra TFLOPS past the bandwidth limit aren't useless, they're just less useful. The 3090 Ti is a waste of electricity and money, delivering only marginal improvements for its huge cost premium, but it's a card targeted at the "fuck it, I don't care, card go brrr" market.
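For the curious, you can sanity-check that ~25 GB/s/TFLOP figure yourself from published specs. These numbers are the commonly cited ones (treat them as approximate, not official-to-the-decimal):

```python
# Bandwidth-to-compute ratio for the two endpoints of the RTX 30 lineup,
# using commonly cited specs (approximate).
cards = {
    # name: (memory bandwidth in GB/s, FP32 TFLOPS)
    "RTX 3090 Ti": (1008, 40.0),
    "RTX 3050":    (224, 9.1),
}

for name, (bw, tflops) in cards.items():
    print(f"{name}: {bw / tflops:.1f} GB/s per TFLOP")
```

Both land right around 25, while the mid-range cards sit closer to 30.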
The low end card is sort of the same situation. More bandwidth costs money. The physical silicon is cut down till it hits the cost Nvidia wants, but pushing the GPU clock speed (and thus the TFLOPS) is basically free. So even if those TFLOPS aren't delivering as many frames/pixels as they should, they're still good performance for the $$$.
Nintendo is slightly more constrained. Electricity matters a lot more. Every little bit of electricity spent costs battery life. It also means a bigger heat sink, and a bigger fan, which increases the size and the cost of the device, makes it more expensive to ship, and perversely leaves less room for the battery.
We know a little bit about Nintendo's hardware development process - it's a back and forth between the hardware team and the lead software teams. Software wants more performance and more features. Hardware wants to cut costs and hit battery life targets. When Software hits a bottleneck, they ask for more power; when Hardware hits a bottleneck, they try to cut it. That's why we tend to assume that things will land in the area of best balance and peak efficiency - because that's the place where the performance Software wants is cheapest for Hardware to deliver.
But it's possible that Software just Needs More Power, and Hardware has to bite the bullet and push past the bandwidth limit (like Nvidia did on the low-end 3050). It's also possible that, while laying out the rest of the design, there are a few spare milliwatts in the budget, and the biggest bang for the buck is to push the GPU clocks a bit, rather than hoard it for an extra 10 minutes of battery life (a la the 3090 Ti).
Of course that could also be true for the CPU clocks, or the storage speeds, or the screen brightness, or the WiFi speed, or... so even if this situation occurs, that doesn't mean it's reflected in the GPU clocks. A lot of discussion in the past here has been about whether the T239 is more likely to be limited by the GPU, or the CPU, or the memory bus, or the storage speed, or even the RAM. The fact that there hasn't been a clear consensus on any of those things is a good sign - it means that T239's design seems pretty balanced.
I tend to fall in the camp of "Nintendo went with a big design. Underclocking it would be wasteful, but it's so big that pushing the clocks isn't required either." I think if there is any wiggle room in the design, Nintendo will spend it making the device smaller, and improving battery life.