I have a sick day and I have to look at Nvidia docs for a work project. TL;DR: do we have any reason to believe that Nintendo will go with an 8-core CPU config other than "that would be nice"?
I ask because obviously the NVN2 hack has made Drake seem beefier than expected, and a lot of people started assuming that Samsung 8nm was no longer on the table. This isn't my area of expertise, but while I had the docs up I started to see how far you could pare Orin AGX down to get Drake.
Looking at the existing perf profiles for the AGX and the NX, you can actually make a pretty good estimate for the TDP cost of the DLA - about 2.5W per core (running at 614MHz/207MHz for the Falcon). Cutting the two DLA cores from the 15W profile seems to give us a "half Drake" - 4 cores instead of 8, and 3 TPCs instead of 6, with clocks in the rough ballpark of the current Switch, at a TDP in the ballpark of handheld Erista.
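To make that subtraction concrete, here is a minimal sketch. The 15W profile and the ~2.5W per DLA core are the numbers from this post; the function name is made up for illustration, and it assumes power is simply additive across blocks, which is only a rough approximation.

```python
def budget_without_dla(profile_watts, dla_cores, watts_per_dla):
    """Power left after removing the DLA cores.

    Assumes block power is simply additive, which is a rough approximation.
    """
    return profile_watts - dla_cores * watts_per_dla


# Numbers from the post: 15W profile, two DLA cores at ~2.5W each.
remaining = budget_without_dla(15.0, 2, 2.5)
print(f"{remaining:.1f} W")  # prints "10.0 W"
```

That ~10W is what the "half Drake" reading above has to fit into.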
The CPUs are AEs of course, so actually we can get an 8 Core Drake config without increasing the power draw from what's listed. The issue is the TPCs. There just doesn't seem to be enough power for 3 more of them (6 more SMs). The GPU clock speeds in this config are not far above the OG Switch's, so there isn't much room to clock them lower.
The PVA has a wattage cost, but it is itself a customized Cortex-A proc running at lowish clock speeds. I don't think it's huge. And the CPU clocks are even closer in this config to the OG Switch than the GPU clocks, so the only other place to move that I can see is the number of CPU cores themselves. Cut them down to 4 and you have significant power savings over the 8 cores present here. Possibly enough, combined with cutting the PVA, to get us to 12 SMs?
This suggests a strategy for the device - improve GPU perf with only tiny clock improvements but a huge win in parallelism, and improve CPU perf in the opposite direction: retain the same number of CPU cores, but bump up the clocks by a significant margin. This leaves us with a config something like this:
CPU: 4 Core Cortex A78 @ 1.7 GHz
GPU: 12 SM custom Ampere @ 420 MHz/1GHz
RAM: 12GB LPDDR5
(CPU speed pulled from the Orin config, picking something that seemed like a comfortable place on the power curve to bump to; docked GPU speed maintains the existing handheld/docked ratio)
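For anyone who wants to check the docked number, a quick sketch of the ratio math. The original Switch GPU clocks used here (307.2 MHz handheld, 768 MHz docked) are the commonly reported figures and are my assumption, not something from the Orin docs; some games run handheld at higher clocks, which would lower the ratio.

```python
# Commonly reported original Switch GPU clocks (assumption, not from the Orin docs).
SWITCH_HANDHELD_MHZ = 307.2
SWITCH_DOCKED_MHZ = 768.0


def docked_clock_mhz(handheld_mhz):
    """Scale a handheld GPU clock by the original Switch's docked/handheld ratio."""
    ratio = SWITCH_DOCKED_MHZ / SWITCH_HANDHELD_MHZ
    return handheld_mhz * ratio


docked = docked_clock_mhz(420.0)
print(f"{docked:.0f} MHz")  # prints "1050 MHz"
```

So a 420 MHz handheld clock lands at roughly 1 GHz docked under that assumption.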
This... is a very Nintendo machine. Compared to the PS4/Xbone it beats them handily on RAM, barely outperforms the GPUs at the margins, and skimps on the CPU. Which is exactly how you would describe the Switch relative to the PS3/360, or the Wii to the OG Xbox. I think lots of folks would look at those GPU clocks in handheld and scream "because Nintendo!"
On the other hand, the GPU performance is actually about where we expect it to be. 420MHz clocks on Drake's giant 12 SM design give you 1.3 TFLOPS, right at the Xbone, but running on Ampere's more modern arch and feature set. This neatly aligns with "Last Gen/PS4 In Handheld Mode + DLSS" which Nate and others have mentioned.
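Here is where the 1.3 TFLOPS figure comes from, as a back-of-envelope sketch. It assumes 128 FP32 CUDA cores per Ampere SM and counts a fused multiply-add as 2 FLOPs per core per clock, which is the usual way peak FP32 numbers are quoted; the function name is just for illustration.

```python
CORES_PER_SM = 128            # FP32 CUDA cores per Ampere SM (assumption stated above)
FLOPS_PER_CORE_PER_CLOCK = 2  # one fused multiply-add counted as 2 FLOPs


def fp32_tflops(sm_count, clock_mhz):
    """Theoretical peak FP32 throughput in TFLOPS."""
    flops = sm_count * CORES_PER_SM * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6
    return flops / 1e12


handheld = fp32_tflops(12, 420)   # 12 SMs at 420 MHz
docked = fp32_tflops(12, 1000)    # 12 SMs at 1 GHz
print(f"handheld: {handheld:.2f} TFLOPS, docked: {docked:.2f} TFLOPS")
# prints "handheld: 1.29 TFLOPS, docked: 3.07 TFLOPS"
```

These are peak paper numbers only; real-world performance depends on memory bandwidth and everything else.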
Are these estimates out of hand to anyone smarter than me - does quad core A78 seem reasonable, and do those cores run about a watt apiece at these clocks? If so, this seems to line up with everything we know and lots of things we expect. It doesn't require a die shrink or getting Ampere to run on a totally new process, it keeps the battery life right in line with the first Switch release, and it squares the circle of "because Nintendo" and "new Nintendo" pretty nicely.
It also puts Nintendo in a nice place for a Next NEXT Switch, especially if the 4 CPU cores are achieved via a hardware errata down from 8 - Nintendo can take the die shrink/process change that some have assumed has already happened, and ride it to the next phase of Switch evolution, rather than rebuilding on Nvidia Next and re-re-solving the BC problem.
On the other hand, if a whole ARM Cortex-A cluster is simply so efficient that it can't account for ~4W, then I think I'll have to concede that the 8nm process is dead - which opens up a lot of power options in terms of clocks and number of CPU cores.