Alovon11
All of this data...Part 2 of Thraktor's attempts to underclock his GPU!
So, after investigating a bit more, it turns out that the command line tool nvidia-smi (which you'll probably be familiar with if you've ever used CUDA) allows you to set fixed GPU clocks, and supports clock speeds as low as 405MHz. This is exactly what I was looking for. However, after looking at the data, it's not behaving quite as I'd like when it comes to voltages, which limits its usefulness for our purposes a bit. Anyway, I figured I'd report my findings here.
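For anyone wanting to try the same thing, here's a rough sketch of how the clock locking could be scripted. The post doesn't say which nvidia-smi option was used, so the use of `--lock-gpu-clocks` and `--reset-gpu-clocks` here is an assumption, as are the helper names and the GPU index of 0. The script only builds and prints the command; it doesn't run anything.

```python
import shlex

# Assumption: clocks are pinned with nvidia-smi's --lock-gpu-clocks option
# (min and max set to the same value). The original post doesn't name the flag.

def lock_gpu_clock_cmd(mhz, gpu_index=0):
    """Build the command that pins the GPU core clock to a single frequency."""
    return ["nvidia-smi", "-i", str(gpu_index), f"--lock-gpu-clocks={mhz},{mhz}"]

def reset_gpu_clock_cmd(gpu_index=0):
    """Build the command that hands clock control back to the driver."""
    return ["nvidia-smi", "-i", str(gpu_index), "--reset-gpu-clocks"]

if __name__ == "__main__":
    # Print the commands rather than running them; locking clocks
    # usually needs administrator/root rights.
    print(shlex.join(lock_gpu_clock_cmd(405)))
    print(shlex.join(reset_gpu_clock_cmd()))
```

To actually apply a lock you'd pass the list to `subprocess.run(cmd, check=True)` from an elevated prompt, then reset once the benchmark run is done.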
For these runs, I set the fixed clock speed on my RTX 3070 using nvidia-smi, performed a benchmark run of Metro Exodus Enhanced Edition (4K ultra settings with high raytracing and balanced DLSS) and used GPU-Z to log GPU data during the run. Then I took the average GPU chip power consumption (which excludes RAM and other GPU components), and the GPU voltage. The voltage would typically vary a bit, so I took the peak sustained voltage during the run. I also tracked the FPS reported by the Metro Exodus benchmark tool, just to have it. Here's the data I got:
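| Clock (MHz) | Voltage (V) | Power Draw (W) | FPS   | W/SM  | FPS/W  |
|-------------|-------------|----------------|-------|-------|--------|
| 405         | 0.781       | 37.1           | 15.09 | 0.807 | 0.4067 |
| 495         | 0.781       | 41.6           | 18.40 | 0.905 | 0.4420 |
| 600         | 0.781       | 47.3           | 22.12 | 1.027 | 0.4681 |
| 705         | 0.781       | 52.3           | 25.73 | 1.138 | 0.4916 |
| 795         | 0.781       | 56.5           | 28.54 | 1.227 | 0.5056 |
| 900         | 0.781       | 61.5           | 32.05 | 1.336 | 0.5214 |
| 1005        | 0.781       | 66.0           | 35.14 | 1.435 | 0.5323 |
| 1095        | 0.781       | 70.5           | 38.09 | 1.532 | 0.5404 |
| 1200        | 0.781       | 76.1           | 41.35 | 1.654 | 0.5434 |
| 1305        | 0.781       | 79.7           | 44.44 | 1.733 | 0.5575 |
| 1395        | 0.793       | 87.6           | 46.68 | 1.905 | 0.5326 |
| 1500        | 0.837       | 102.5          | 49.71 | 2.227 | 0.4851 |
| 1605        | 0.868       | 120.4          | 52.30 | 2.617 | 0.4344 |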
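For reference, the two derived columns can be reproduced from the measured ones. This is a small sketch that assumes W/SM is simply chip power divided by the RTX 3070's 46 SMs (the post doesn't spell that out, but the numbers match), and that FPS/W is FPS divided by chip power. Only three rows are included to keep it short.

```python
# Assumption: W/SM = chip power / 46, i.e. the RTX 3070's SM count.
NUM_SMS = 46

# (clock MHz, voltage V, GPU chip power W, FPS) for three rows of the table
RUNS = [
    (405, 0.781, 37.1, 15.09),
    (1305, 0.781, 79.7, 44.44),
    (1605, 0.868, 120.4, 52.30),
]

def derived(run):
    """Return (watts per SM, FPS per watt) for one benchmark run."""
    _clock, _volts, watts, fps = run
    return watts / NUM_SMS, fps / watts

if __name__ == "__main__":
    for run in RUNS:
        w_per_sm, fps_per_w = derived(run)
        print(f"{run[0]:>5} MHz  {w_per_sm:.3f} W/SM  {fps_per_w:.4f} FPS/W")
```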
Now, if you look at the voltage column, you'll probably notice the issue. Below 1.3GHz, when I set a clock speed using nvidia-smi, the voltage doesn't drop lower than 781mV. This contrasts with the behaviour when I limit clock speeds using MSI Afterburner, where I managed to run at 1155MHz at 721mV. I added the higher-clocked runs to test this, and it looks like everything at and above 1.3GHz is running the same voltage as I see on the default voltage curve in Afterburner, but below this point nvidia-smi seems to set 781mV as a floor for some reason. The nvidia-smi tool itself doesn't allow you to directly control voltages, so I'm not sure if there's a way I can work around this.
As a result of this voltage limitation, every clock speed below 1.3GHz is consuming more power than it should, were it at optimal voltage. If we compare to my previous 1155MHz/721mV run which ran at about 62W, the same clock here would consume over 10W more, due to the higher voltage. So effectively this is reasonably accurate for the range of frequencies that Nintendo would never use, but inaccurate in the range of actually plausible frequencies!
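As a rough sanity check on that comparison, the power at 1155MHz and 781mV can be estimated by interpolating linearly between the 1095MHz and 1200MHz runs. The second step, scaling by voltage squared, is a first-order assumption (dynamic power roughly proportional to V² at a fixed clock, ignoring leakage) that isn't from the original post, so treat it as a ballpark only.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Neighbouring runs from the table: 1095 MHz -> 70.5 W, 1200 MHz -> 76.1 W
power_at_781mv = interpolate(1155, 1095, 70.5, 1200, 76.1)

# Extra power versus the earlier Afterburner run (about 62 W at 721 mV)
excess = power_at_781mv - 62.0

# Assumption: at a fixed clock, power scales roughly with voltage squared.
# This ignores static/leakage power, so it is only a rough estimate.
power_scaled_to_721mv = power_at_781mv * (0.721 / 0.781) ** 2

if __name__ == "__main__":
    print(f"Estimated power at 1155 MHz / 781 mV: {power_at_781mv:.1f} W")
    print(f"Excess over the 62 W run: {excess:.1f} W")
    print(f"V^2-scaled estimate at 721 mV: {power_scaled_to_721mv:.1f} W")
```

Under those assumptions the scaled estimate comes out close to the 62W measured earlier, which is consistent with the voltage floor accounting for most of the gap.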
In any case, this is actually a useful illustration of how hitting a voltage floor impacts power efficiency. The last column is FPS/W, which is a measure of efficiency. If you read this from the bottom up, you can see that as clocks drop from 1.6GHz down to 1.3GHz, efficiency improves quite a lot, as the reduced voltage means power consumption is dropping faster than performance is. However, below 1.3GHz the voltage is static, so efficiency gradually gets worse as you go below that point. You still save power by dropping clock speeds, but you're giving up more performance than you're saving in power consumption.
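To put numbers on that trade-off, here's a small sketch comparing the percentage of power saved against the percentage of FPS lost for a few steps down in clock, using four rows from the table. The choice of steps and the helper name are just for illustration.

```python
# (clock MHz, GPU chip power W, FPS) for four rows of the table
POINTS = {
    1605: (120.4, 52.30),
    1305: (79.7, 44.44),
    1005: (66.0, 35.14),
    405: (37.1, 15.09),
}

def step_down(from_mhz, to_mhz):
    """Return (% power saved, % FPS lost) when dropping between two clocks."""
    p0, f0 = POINTS[from_mhz]
    p1, f1 = POINTS[to_mhz]
    return 100 * (p0 - p1) / p0, 100 * (f0 - f1) / f0

if __name__ == "__main__":
    for a, b in [(1605, 1305), (1305, 1005), (1005, 405)]:
        saved, lost = step_down(a, b)
        print(f"{a} -> {b} MHz: {saved:.1f}% power saved, {lost:.1f}% FPS lost")
```

Above the voltage floor (1605 to 1305MHz) the power saving is larger than the FPS loss; once the voltage is pinned at 781mV, each further step gives up more FPS than it saves in power.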
This is why I suggest that Nintendo might disable some SMs in portable mode. The actual peak of performance per Watt will be at a much lower frequency than we've got here, probably in the 400-600MHz range, but there is a point where clocking down actually loses efficiency, and if they're not capable of running all 12 SMs at that peak efficiency clock in portable mode, then they'll get better returns by disabling SMs than clocking lower.
Rendered moot because of the A2000 showing far more efficiency than this point, Orin/Drake running on their own 8nm node (assuming 8nm Drake), and an adjusted uArch due to the cache and all that.