I agree with all your general points here, and that the system could well become bandwidth limited, but I should point out that the "30GB/s per Tflop" calculation is based on official figures from Nvidia, which don't really reflect the actual clock speeds their GPUs run at. I was about to look up some figures on this online, when I remembered that I have an RTX 3070, and can just run some tests myself.
I ran tests of Cyberpunk 2077 (under three different settings, which I'll explain below), Metro Exodus, Baldur's Gate 3 and Total War: Warhammer 3 (under two different settings), logging data using GPU-Z.
Nvidia advertises a base clock of 1.5GHz for the RTX 3070, and a boost clock of 1.725GHz. My card is an Nvidia Founders Edition model, with no changes to clocks or voltages, so as stock as you can get. Across the 7 tests (totalling just under 13 minutes of in-game data logged), the clock had a median value of 1.83GHz and an average of 1.827GHz. The GPU never clocked down as low as 1.5GHz, and was above the 1.725GHz boost clock 92.5% of the time. The peak clock was 1.935GHz, which it hit just under 18% of the time.
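For anyone who wants to repeat this, the stats are easy to pull out of a GPU-Z log. A minimal sketch, assuming GPU-Z's "Log to file" CSV output with a "GPU Clock [MHz]" column (the exact column name can vary between GPU-Z versions, so check your log's header):

```python
import csv
import statistics

def clock_stats(path, boost_mhz=1725):
    """Summarise GPU clock behaviour from a GPU-Z CSV log."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Column name is an assumption; adjust to match your log's header.
        clocks = [float(row["GPU Clock [MHz]"]) for row in reader]
    return {
        "median_mhz": statistics.median(clocks),
        "mean_mhz": statistics.mean(clocks),
        # Fraction of samples running above the advertised boost clock.
        "pct_above_boost": 100 * sum(c > boost_mhz for c in clocks) / len(clocks),
        "peak_mhz": max(clocks),
    }
```

Call it with the path to your own log file; at one-second logging intervals a 13-minute session gives you a few hundred samples to work with.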
Taking the median clock of 1.83GHz, that puts the RTX 3070 at a theoretical 21.6Tflops, and with 448GB/s of bandwidth (memory clocks were constant during the tests), that comes to 20.74GB/s per Tflop, which is quite a bit lower than the 30GB/s you're using. By my reckoning, the limit is around 3.6Tflops (although I'm allocating a bit more to the CPU than you are).
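For reference, here's the arithmetic behind those numbers as a quick sketch. The 5888 CUDA cores and 2 FLOPs per core per clock (one FMA) are the standard figures for the 3070; depending on where you round, the ratio comes out around 20.7-20.8GB/s per Tflop:

```python
# Theoretical Tflops at the measured median clock, rather than Nvidia's
# advertised boost clock.
CUDA_CORES = 5888
FLOPS_PER_CORE_PER_CLOCK = 2   # one fused multiply-add per clock
MEDIAN_CLOCK_GHZ = 1.83        # measured median from my GPU-Z logs
BANDWIDTH_GBPS = 448           # 256-bit GDDR6 at 14Gbps

tflops = CUDA_CORES * FLOPS_PER_CORE_PER_CLOCK * MEDIAN_CLOCK_GHZ / 1000
ratio = BANDWIDTH_GBPS / tflops
print(f"{tflops:.2f} Tflops, {ratio:.2f} GB/s per Tflop")
```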
Another interesting thing from my testing is that GPU-Z logs a value for the memory controller load, which can be used as a proxy for bandwidth usage. Here are the average figures for each of the test runs:
| Game | GPU Clock (MHz) | GPU Load (%) | Memory Controller Load (%) |
| --- | --- | --- | --- |
| Cyberpunk 2077 (no RT) | 1747.2 | 98.2 | 38.1 |
| Cyberpunk 2077 (RT) | 1850.1 | 98.8 | 13.6 |
| Cyberpunk 2077 (PT) | 1837.0 | 96.3 | 45.9 |
| Metro Exodus (RT) | 1770.0 | 98.6 | 48.8 |
| Baldur's Gate 3 | 1801.0 | 98.7 | 51.9 |
| Warhammer 3 (High) | 1837.4 | 99.8 | 57.6 |
| Warhammer 3 (Low) | 1842.6 | 97.5 | 64.3 |
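If you take the memory controller load at face value as a fraction of the card's 448GB/s (a rough assumption, since GPU-Z doesn't document exactly how the counter is derived), you can sketch the implied average bandwidth for each run:

```python
# Rough conversion of memory controller load into implied bandwidth.
# Treating the load percentage as a fraction of peak bandwidth is an
# approximation, not something GPU-Z guarantees.
BANDWIDTH_GBPS = 448

runs = {
    "Cyberpunk 2077 (no RT)": 38.1,
    "Cyberpunk 2077 (RT)": 13.6,
    "Cyberpunk 2077 (PT)": 45.9,
    "Metro Exodus (RT)": 48.8,
    "Baldur's Gate 3": 51.9,
    "Warhammer 3 (High)": 57.6,
    "Warhammer 3 (Low)": 64.3,
}

for game, load_pct in runs.items():
    print(f"{game}: ~{BANDWIDTH_GBPS * load_pct / 100:.0f} GB/s")
```

On that reading, even the heaviest run (Warhammer 3 on low) averages somewhere under 290GB/s, well short of the card's peak.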
Now I should say that I don't expect the memory controller load to be close to 100% like the GPU load typically is. I'm only logging in one-second increments here, so nowhere near the resolution required to see intra-frame behaviour, but I'd expect that bandwidth demand is pretty unevenly distributed through the frame. There are likely parts of the rendering pipeline which are almost 100% bandwidth-limited, and other parts which aren't at all, so even with a well-balanced GPU you're not going to see fully saturated bandwidth usage.
That said, there are some interesting things to take from it. One is how variable bandwidth usage is between the different games (and between settings within games). The most intensive games generally use the least bandwidth, and vice versa, with dropping the settings from high to low in Warhammer 3 actually increasing the load on the memory controller.
In a sense, this isn't entirely surprising, as the less intensive games are doing less work per pixel, and are pushing more pixels as a result. They still have to create the same buffers and read from and write to them in largely the same way, but they're doing less shader work per pixel. So for Warhammer 3 on low settings, the ratio of bandwidth required per shader op is pretty high, whereas on a game that's doing a lot of shader work on each pixel, like Cyberpunk, it's pretty low.
This also explains, in part, why bandwidth hasn't needed to increase in line with flops over the years. Resolutions have been increasing much more slowly than raw shader performance has, so developers have been able to put a lot more shader work in per pixel. Some changes over the years (like deferred rendering) have increased bandwidth requirements by adding extra buffers which need to be moved back and forth to memory, but the general trend has been for shader work done relative to buffer data held in RAM to increase.
I actually tried taking an old game (Half-Life 2) and running it at a very high resolution and frame rate to see if it was more bandwidth-bound than the newer titles, but I wasn't getting very high GPU load, so it seems there was a bottleneck elsewhere, possibly the CPU. I didn't spend much time on it, but I might go back to it.
The RT result for Cyberpunk, by the way, seems to be a bit of an anomaly. It's using everything on ultra, without path tracing, but with all other ray tracing effects turned up to max. I probably should have turned on DLSS here; running at full native 4K, it was chugging along at around 4-5fps. The very definition of putting in a lot of work on not that many pixels. The PT (path tracing) run for Cyberpunk used DLSS-RR, so it was running at a somewhat more respectable 15fps or so.
When I get the time, I'll see if I can downclock the memory on the graphics card to see what the impact is on performance, which should give an idea of the point at which some of these games become bandwidth-bottlenecked. I might also have a play around with Nsight to see if I can get some more granular data on individual frames.