Well, frankly all my previous assumptions are out the window now, so I can't rule anything out. My gut instinct is that a Samsung EUV process is more likely, because this seems to share a lot of DNA with Orin, and it would be easier to design both of them in parallel if they're being manufactured by the same foundry. The reason I'd suggest Samsung 5nm is that I imagine Samsung want to migrate their 7nm capacity over to 5nm (and possibly already have), because they can charge more for the latter, and both are limited by their EUV lithography equipment. It could in theory be any of Samsung's 7nm, 5nm or even 4nm processes, but my hunch is that 5nm is the most likely.
On the TSMC side, Nvidia have A100 on their DUV 7nm process, and they don't have any products on 6nm, but given it's part of the same family as 7nm (and shares design rules, as far as I'm aware), and that TSMC themselves seem to be actively pushing new chips onto 6nm rather than 7nm, it would seem the most likely TSMC process. I would say that TSMC 5nm is very, very unlikely, but given the current revelations I wouldn't say it's strictly impossible (let's just say a 1% chance). The reason I say this is that TSMC are rapidly expanding their 5nm capacity, to the point where it should actually exceed the 7nm/6nm capacity in the not-too-distant future, and Nvidia have committed to a very large allocation of this. They'll be using this for Hopper, probably Grace, Ada (kopite recently mentioned that all Ada GPUs are going to use TSMC N5, not just the high-end ones), and possibly more stuff we don't know about.
This means they'll be using a lot of TSMC N5 wafers for other stuff, but conversely, if they've made huge advance payments to TSMC for that capacity, having many different product lines use it gives them flexibility in how they use up the allocation. If Grace doesn't sell as well as expected, for example, they can divert the allocation towards Ada consumer GPUs, or Hopper HPC GPUs, or perhaps even Drake. By contrast, if they build Drake on N6, then it's the only chip they're going to be making on that process, and if TSMC also require a significant commitment to get N6 allocation in the same vein as what they're doing for N5, then Nvidia will have a lot less flexibility with it. Basically, if Nintendo's Drake-powered model sells below expectations then Nvidia will have N6 allocation they can't really do anything else with, and if it exceeds expectations then it may be more difficult to scale up production.
I'd expect Samsung to be a lot more flexible, because they aren't in the same kind of negotiating position as TSMC.
The issue is that there's a limit on how much power you can save by reducing clocks. When you look at power consumption vs clock graphs, the driving factor in terms of power consumption (and the reason you see something that looks like an exponential curve) is the voltage, as voltage typically increases alongside clock speed, and power consumption is proportional to voltage squared (which is why you typically get better perf/W at lower clocks). However, in practice there's a minimum voltage required for any chip to operate, so you can't just keep reducing voltage ever lower to keep getting better efficiency out of the chip. For any given chip there's some clock speed that can be reliably achieved at the minimum voltage, and that clock speed sits around the peak of the chip's performance-per-Watt curve. You can clock lower than it, but as you can't reduce voltage any further, your power consumption doesn't reduce much and your perf/W gets much worse.
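To make that concrete, here's a toy model of the relationship. All the constants (capacitance term, static power, voltage floor and the clock reachable at it) are made-up illustrative values, not measurements of any real chip; the 384MHz floor just borrows Mariko's reported figure for flavour. The point is the shape: perf/W peaks at the "max clock at min voltage" point and degrades on either side of it.

```python
# Toy GPU power model: P = P_static + C * f * V^2.
# Assumption: voltage scales linearly with clock above a floor V_MIN,
# and is clamped at V_MIN below it. All constants are illustrative.

C = 1.0          # lumped switching-capacitance constant (arbitrary units)
P_STATIC = 20.0  # static/leakage power (arbitrary units)
V_MIN = 0.60     # minimum operating voltage (assumed)
F_AT_VMIN = 384  # max clock (MHz) reachable at the minimum voltage (assumed)

def voltage(f_mhz):
    """Voltage needed for clock f_mhz: linear above the floor, clamped below it."""
    return max(V_MIN, V_MIN * f_mhz / F_AT_VMIN)

def power(f_mhz):
    """Total power: static term plus dynamic term proportional to f * V^2."""
    v = voltage(f_mhz)
    return P_STATIC + C * f_mhz * v * v

for f in (192, 384, 768, 1536):
    print(f"{f:>4} MHz: P = {power(f):7.1f}  perf/W = {f / power(f):.3f}")
```

Below the floor, power only falls by the dynamic term while the static term stays put, so perf/W drops; above it, the V² term takes over and perf/W drops much faster.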
As a specific datapoint, for Mariko we know that this "max clock at min voltage" GPU clock is 384MHz (at least as far as Nvidia is concerned). The Linux4Tegra DVFS tables for Mariko support clocks lower than 384MHz, but every clock 384MHz and lower uses the same voltage, so there's not much point clocking below that if you're concerned about performance per Watt. This clock for Drake will likely be different, and due to the improved manufacturing process compared to Mariko, it will probably be higher. It's quite possible that there's no benefit from clocking any lower than 500MHz or so (which is why I labelled my clock estimates conservative).
I'd caution against using MX570 as an example, as it hasn't released yet, so we don't have any confirmed details, and in particular we don't know what the steady-state clocks are with a 25W TGP (which is tricky in general because of how much GPU clocks vary, and because measures like TDP and TGP are less clear than they used to be). The laptop version of the RTX 3050 is out and about, which is very similar but at a slightly higher TGP, but I can't find any reviews which actually measure clocks for it. If I get the chance I might see if I can downclock my RTX 3070, as GPU-Z reports a GPU package power draw (i.e. excluding memory or anything else on the board), and I could divide by the relative number of SMs to get a rough idea of what Drake's GPU might consume at these kinds of clocks.
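The back-of-envelope scaling I have in mind would look something like this. The 46-SM count for the RTX 3070 is real (GA104 as shipped in that card) and 12 SMs is the leaked Drake figure, but the 40W package-power reading is a hypothetical number for illustration, and linear scaling with SM count is a big assumption (it ignores voltage, process and everything outside the SMs):

```python
# Rough scaling from a downclocked desktop GPU's measured package power
# to Drake's GPU, assuming power scales roughly linearly with SM count
# at equal clocks. Memory, board power and process differences ignored.

RTX_3070_SMS = 46  # SM count of the desktop RTX 3070 (GA104)
DRAKE_SMS = 12     # per the leaked Drake configuration

def scale_to_drake(package_power_w):
    """Scale a GPU-Z package-power reading by the SM ratio."""
    return package_power_w * DRAKE_SMS / RTX_3070_SMS

# Hypothetical reading: 40 W package power at a Drake-like clock.
print(f"{scale_to_drake(40.0):.1f} W")
```

It's crude, but as a sanity check on which ballpark a 12-SM Ampere GPU lands in at a given clock, it's probably good enough.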
That said, I don't think it's impossible that Drake could be on 8nm and use 12 SMs in docked mode, I just feel it's hitting higher power consumption territory than I'd personally expect. This, plus kopite becoming less certain on the manufacturing process, makes me think other nodes might be back on the table.
On disabling SMs, I also argued basically the same thing just a few weeks ago. However, we now have details showing:
- A much larger GPU than pretty much anyone expected.
- Confirmation that this is the only Ampere GPU to support clock gating.
The evidence seems to support the idea that they're disabling SMs in portable mode. As to why they would do that, rather than just use a smaller GPU and clock it higher, one possible explanation is that they want a close match in performance-per-pixel between both modes. With 4K output in docked mode and 1080p in portable, that's a stretch to achieve without disabling SMs, as you'd need a 4x clock difference between the two modes. Plausibly this could be done with 400MHz portable/1.6GHz docked clocks, but I don't know how feasible that is. If they're using a 720p display, then there's a 9x difference in resolution, and that just isn't feasible to achieve with clock differences alone. Disabling SMs would allow them to maintain performance-per-pixel parity across both modes without pushing docked clocks too high, which may be enough of a reason for Nintendo.