I just had a play around with the Jetson power tool myself, and I'm getting quite different estimates than were posted before.
For the GPU, my methodology was to turn off the DLA, PVA, etc., set the CPU to a fixed value, and compare the power consumption with the GPU turned off entirely to the power consumption at each GPU clock speed for 4 TPC (8 SM) and 8 TPC (16 SM) configurations. Then, for each clock, I took the mid-point of the 4 TPC and 8 TPC power consumption and subtracted the power with the GPU turned off. I set the GPU load level to high for all these tests. I feel this is a reasonable way to isolate the GPU's power consumption and estimate it for a 12 SM configuration.
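The mid-point-and-subtract step can be sketched in a few lines (Python just for illustration; the readings below are hypothetical placeholders, not actual tool output):

```python
# Estimate a 12 SM (6 TPC) configuration by averaging the 4 TPC and 8 TPC
# readings at a given clock, then subtracting the GPU-off baseline.
def estimate_12sm_power(p_gpu_off: float, p_4tpc: float, p_8tpc: float) -> float:
    return (p_4tpc + p_8tpc) / 2 - p_gpu_off

# Hypothetical readings: 10 W with the GPU off, 14.0 W at 4 TPC, 17.4 W at 8 TPC
est = estimate_12sm_power(10.0, 14.0, 17.4)  # ~5.7 W attributed to a 12 SM GPU
```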
The power consumption figures I got for the GPU with 12 SMs are:
420.75MHz - 5.70W
522.75MHz - 6.80W
624.75MHz - 8.65W
726.75MHz - 10.85W
828.75MHz - 13.20W
930.75MHz - 16.35W
1032.75MHz - 19.90W
1236.75MHz - 29.35W
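One way to see how steeply that curve rises is to compute clock-per-watt at each point in the table above; efficiency falls by over 40% across the range:

```python
# MHz per watt at each measured point, from the 12 SM figures above.
# The steady decline shows the superlinear power cost of chasing higher clocks.
measurements = {  # MHz -> W
    420.75: 5.70, 522.75: 6.80, 624.75: 8.65, 726.75: 10.85,
    828.75: 13.20, 930.75: 16.35, 1032.75: 19.90, 1236.75: 29.35,
}
for mhz, watts in measurements.items():
    print(f"{mhz:7.2f} MHz: {mhz / watts:5.1f} MHz/W")
```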
These probably overestimate the power consumption of an 8nm Drake GPU by a bit, as it uses smaller tensor cores, has fewer ROPs, and should have some small power savings by using only a single GPC rather than two. Still, I wouldn't expect the power consumption to be significantly lower than this. A few months ago I
made some rough estimates of power consumption of Ampere on 8nm by measuring my RTX 3070 and scaling from there, and I got a figure of 16.1W for 12 SMs at 1155MHz. This was a very crude estimate (just take the reported power consumption from GPU-Z and divide by the number of SMs), and seems to come in a bit lower than we see above. I'd err on the side of trusting Nvidia's official tool for estimating Orin power consumption over my rough estimate based on a GPU 4x as big, though.
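For reference, that crude per-SM scaling looks like this (the RTX 3070 power reading below is a made-up placeholder, not my original GPU-Z figure):

```python
# Per-SM scaling: divide the card's reported power by its SM count, then
# multiply by the target SM count. Very crude -- it ignores fixed costs such
# as memory and I/O, which is partly why it undershoots the tool's figures.
def scale_by_sm(reported_watts: float, sm_count: int, target_sms: int) -> float:
    return reported_watts / sm_count * target_sms

# The RTX 3070 has 46 SMs; the wattage here is a hypothetical placeholder.
est = scale_by_sm(61.7, 46, 12)  # ~16.1 W for 12 SMs
```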
I believe the original Switch's GPU consumed somewhere around 3W in portable mode, which means we're already pushing things at the tool's base clock of 420MHz. Even if Drake comes down to about 5W at 420MHz thanks to the smaller tensor cores and other optimisations, that's still a big increase. It's impossible to say from the tool how power consumption behaves at lower clocks, and maybe they could squeeze it in at the same 384MHz clock as the original Switch, but my guess is that if Nvidia doesn't clock below 420MHz in any configuration, it's because it's not efficient to do so. I'm still of the opinion that running a full 12 SM GPU on 8nm isn't feasible in handheld mode for a device the same size as the current Switch, which means either disabling SMs in handheld mode or a different manufacturing process.
On the docked side of things, they're more limited by cooling than power consumption, and a 768MHz clock seems reasonable with a similar cooling set-up to the original Switch, if running a bit hotter. They could possibly push up higher to 900MHz-1GHz or so if they
really wanted to, but it's more a question of Nintendo's tolerance for a more expensive cooling setup and/or more fan noise than a technical limitation.
For the CPU, as it's not possible to turn it off entirely, I simply took the difference between the 4-core and 12-core configurations at each clock, again with the CPU load level set to high. As a sense-check, I also recorded the reported power consumption at each clock for an 8-core configuration, and verified that the difference in power draw between 4 and 8 cores was approximately the same as between 8 and 12 cores. This held typically to within 0.1W, so I think it's reasonable to assume the power cost is linear in the number of clusters activated (ie if going from 1 to 2 clusters adds the same amount of power as going from 2 to 3, we can infer that going from 0 to 1 cluster costs a similar amount).
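That linearity sense-check can be expressed as a tiny helper (the readings here are hypothetical, not tool output):

```python
# Check that adding a cluster costs roughly the same power each time:
# (8-core minus 4-core) should be within ~0.1 W of (12-core minus 8-core).
def cluster_deltas(p_4c: float, p_8c: float, p_12c: float) -> tuple[float, float]:
    return (p_8c - p_4c, p_12c - p_8c)

d1, d2 = cluster_deltas(5.0, 7.2, 9.35)  # hypothetical readings at one clock
linear_enough = abs(d1 - d2) <= 0.1      # True here: deltas are ~2.2 and ~2.15
```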
The figures I got for an 8 core CPU are as follows:
1113.6MHz - 2.2W
1267.2MHz - 2.5W
1497.6MHz - 3.1W
1651.2MHz - 3.8W
1728.0MHz - 4.1W
1881.6MHz - 4.9W
2035.2MHz - 5.8W
2188.8MHz - 7.1W
CPU power consumption on Drake might be a bit lower due to using a single cluster and potentially less cache, but I would expect only pretty small differences here. Compared to the GPU, though, I don't think there's as big an issue. The CPU on the original Switch consumed a bit under 2W, so Nintendo could clock a bit higher than the old 1GHz. While ~1.1GHz or so might not sound like much, an 8 core A78 clocked at 1.1GHz (with 7 cores for games) is still a huge increase over 4 A57 cores (with 3 for games) at 1GHz. If they push the power draw of the CPU up a bit (which I feel is more likely than increasing handheld GPU power draw), then at 3W they could get up to around 1.5GHz, which would put them in a pretty nice spot.
The CPU side of things is a bit better than I expected, to be honest. You're obviously not going to clock 8 A78s up to 2GHz+ on 8nm in Switch's form-factor, but they don't have to clock them down to oblivion to get them to work. It would still be comfortably more capable than PS4/XBO's CPUs in a portable device while on an old manufacturing process, which isn't a bad position to be in.
So I messed around with the tool myself. I avoided all the on-and-off stuff because it wasn't what we were trying to figure out. Since we were looking for the max power draw, and given T234's configuration and architecture weirdness (A78AE across 3 clusters instead of A78C on one cluster, and a different generation of tensor cores, among other things), I just picked a CPU clock, chose the minimum CPU core count (2), clicked the estimate power button and noted the result, then turned the core count up to 10 and subtracted the noted number from the new one. This gives you a power draw greater than Drake's 8 A78C cores would have, because Orin is on 3 clusters instead of one, with A78AE cores that are bigger and have extra logic drawing power.
For the CPU, I set the clock to 1497MHz on 2 cores with a high load. The other settings don't matter, because I'm isolating the CPU; the only number that matters is the difference between this estimate and the 10-core one. Anyway, base power draw here was 9.2w with whatever other settings I had and 2 cores at 1497MHz; with 10 cores it draws 12.4w at the same clock and load. That's 3.2w for the 8 extra cores at this clock, and remember Drake will be more energy efficient, since it's 1 cluster with the smaller A78C core and without the extra logic.
Next we have the GPU, where I did the same thing: 2 TPC (4 SM) at 420MHz with a high load showed 11w, while 8 TPC at 420MHz estimated 15.7w, meaning Orin's additional 12 SMs are drawing 4.7w here.
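Both isolation steps above are just a subtraction of the tool's two estimates; as a sketch:

```python
# Difference the tool's estimate with the extra cores/SMs enabled against
# the minimum-configuration baseline, holding every other setting fixed.
def isolated_draw(with_extra: float, baseline: float) -> float:
    return with_extra - baseline

cpu_8_extra_cores = isolated_draw(12.4, 9.2)   # ~3.2 W at 1497 MHz, high load
gpu_12_extra_sms  = isolated_draw(15.7, 11.0)  # ~4.7 W at 420 MHz, high load
```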
CPU: 8 A78AE cores at 1.5GHz for 3.2w on a high load, plus GPU: 12 SM at 420MHz for 4.7w on a high load, gives 7.9w. I'd suggest Orin's extras account for maybe half a watt over Drake's configuration, so more like 7.4w on a high load for the CPU and GPU combined at those clocks.
Now it's time to look at Erista. See, we haven't been looking at high load; we've been looking at power consumption during Zelda BotW, which we know lasts around 3 hours on battery and draws 7.1w at minimum screen brightness and 9w at maximum. We know the battery is 4315mAh, which at 5V is ~21.6wh, so at a 7.1w draw you'd drain about the entire battery after 3 hours of gaming with Zelda. However, we know other games have pushed Erista much harder, with battery life of only 2 hours, or near 11w power draw (though we can suggest the screen was probably set somewhere in the middle). We know that during Zelda gameplay the SoC averages between 5.3w and 5.5w, which means the maximum power draw for Erista's SoC is over 7w.
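The battery arithmetic spelled out (keeping the 5V figure used here):

```python
# Battery energy and runtime estimates from the numbers quoted above.
battery_wh  = 4.315 * 5          # 4315 mAh at 5 V -> ~21.6 Wh
hours_botw  = battery_wh / 7.1   # ~3 h at min brightness (7.1 W system draw)
hours_heavy = battery_wh / 11.0  # ~2 h for the most demanding games (~11 W)
```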
With the numbers I've run, using the method above, and with the understanding that Orin isn't as efficient for gaming as Drake would be, these are the clocks that make sense on 8nm:
CPU: 8*A78C at 1.5GHz
GPU: 12SM at 460MHz in portable and 768MHz in docked (read ahead)
I also wanted to give Orin's medium-load numbers, as these should line up better with the Zelda gaming numbers. At the clocks above, the Nvidia power tool gives me 2.1w for the CPU and 4.2w for the GPU. (The GPU's drop being so small when load is reduced probably means it's near the low end of the power curve.) Because of the small change here, they might still look at Switch's 460MHz clock for portable, which would add around 0.5w at medium load.
This gives you 6.3w with Orin at those clocks and Drake's core count. Remember, you'd probably see a reduction of maybe half a watt moving to Drake, so ~5.8w; with Switch's 460MHz portable clock, you'd be back up to ~6.3w.
Docked with the GPU clocked at 726.75MHz, Orin's chip draws 10w on a high load, and at 828.75MHz it draws 12.5w, so at Switch's docked clock of 768MHz it should draw about 11w on a high load. At 768MHz on a medium load, we're looking at just under 10w. Again, Drake should be more efficient here, so you could see a couple of tenths of a watt shaved off, which suggests this clock is valid for Drake as well, pushing the SoC up to only ~12w in docked mode on a medium load and ~14w on a high load.
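The 768MHz figure is a linear interpolation between the two tool readings quoted above; as a sketch:

```python
# Linearly interpolate power at 768 MHz between the 726.75 MHz (10 W) and
# 828.75 MHz (12.5 W) high-load readings from the tool.
def lerp_power(clk: float, clk_lo: float, p_lo: float,
               clk_hi: float, p_hi: float) -> float:
    t = (clk - clk_lo) / (clk_hi - clk_lo)
    return p_lo + t * (p_hi - p_lo)

p_768 = lerp_power(768.0, 726.75, 10.0, 828.75, 12.5)  # ~11.0 W
```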
That's where I'm going with this. It's the simplest way to use the Nvidia power tool: all I'm doing is isolating what 8 Orin CPU cores add to the power draw, regardless of the extra architecture, and the same with the GPU by adding 12 SMs. These figures are more power draw than Drake should see, given Orin's added complexity, so it's safe to use them and reduce slightly.
The clocks I gave earlier were based on BlackTangMaster's use of Nvidia Power Tools and whatever method he mathed out, but I couldn't figure that out, and what made sense to me was to just subtract the base power draw from the CPU and GPU configurations in isolation. The earlier clocks would probably work on Samsung 5nm, which Drake could very well be; we don't know, and it's tough to guess the future. If we do see those higher clocks, I'd generally assume the chip was shrunk to a more advanced node. This is something Nintendo did with the Wii back in the day, moving from 130nm to 90nm between GameCube and Wii. 5nm has been available for years, and with Samsung losing major customers from that process node, I wouldn't write off Nvidia and Nintendo going with 5nm for Drake.