I think I have replied before, but in general I'm not going to complain if someone has a more optimistic view than me. However, given that I've specifically been asked, I may as well respond.
The claim seems to be "Orin has 8 SMs per GPC, and it would be prohibitively expensive to change that for Dane, therefore Dane must have 8 SMs". Not only do I not see any evidence to support this, I'd argue the evidence suggests the opposite: changing the number of SMs per GPC is the norm for SoCs like Dane.
Let's take a look at every Nvidia SoC since they started using the current SM/GPC hierarchy (actually an SM/TPC/GPC hierarchy now, but we can ignore the TPC level here):
Tegra X1 (Erista) - 2 SMs per GPC, desktop Maxwell had either 5 SMs per GPC or 4 SMs per GPC
Tegra X2 (Parker) - 2 SMs per GPC, desktop Pascal had either 5 SMs per GPC or 3 SMs per GPC, and HPC Pascal had 10 SMs per GPC
Xavier - 8 SMs per GPC, HPC Volta had 14 SMs per GPC (no desktop Volta chips to compare to)
Orin - 8 SMs per GPC, desktop Ampere has either 8 SMs per GPC, 10 SMs per GPC or 12 SMs per GPC, and HPC Ampere has 16 SMs per GPC
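To make the comparison easier to scan, here is a small Python sketch that tabulates the figures from the list above and checks which SoCs share an SM-per-GPC count with any same-generation desktop or HPC part. The dictionary layout and the names are purely illustrative. The numbers are only those quoted in the list.

```python
# SMs per GPC, copied from the list above.
# The structure and key names are illustrative, not anything official.
soc_configs = {
    "Tegra X1 (Erista)": {"soc": 2, "others": {5, 4}},           # desktop Maxwell
    "Tegra X2 (Parker)": {"soc": 2, "others": {5, 3, 10}},       # desktop + HPC Pascal
    "Xavier":            {"soc": 8, "others": {14}},             # HPC Volta
    "Orin":              {"soc": 8, "others": {8, 10, 12, 16}},  # desktop + HPC Ampere
}

def shares_config(entry):
    """True if the SoC's SM-per-GPC count also appears on a non-SoC part."""
    return entry["soc"] in entry["others"]

matching = [name for name, entry in soc_configs.items() if shares_config(entry)]
print(matching)
```

On these figures, only Orin matches the SM count of a non-SoC chip from its own generation.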
Every Nvidia architecture since they introduced the SM/GPC hierarchy has changed up the number of SMs per GPC depending on the requirements of the chip. Every SoC has also used a different GPC setup than most of the other chips, with not just a different number of SMs in most cases (Orin being the one exception), but architectural differences within the SMs, TPCs and GPCs themselves. Deciding the appropriate number of SMs per GPC is clearly the norm when Nvidia design a new chip, particularly so when it comes to SoCs. If anything, this has become even clearer with Ampere. There are currently six Ampere chips which Nvidia have provided architectural details on (for whatever reason they still haven't provided details on GA103):
GA100 - 16 SMs per GPC - HPC Ampere
GA102 - 12 SMs per GPC - Gaming Ampere
GA104 - 8 SMs per GPC - Gaming Ampere
GA106 - 10 SMs per GPC - Gaming Ampere
GA107 - 8 SMs per GPC - Gaming Ampere
Orin - 8 SMs per GPC - SoC Ampere
That's six different Ampere chips, and five different GPC setups. Suggesting that changing the SM count per GPC on a new chip design is somehow very difficult or prohibitively expensive just doesn't line up with the evidence.
Now, perhaps your argument is that Nintendo is somehow too small a customer to warrant such R&D expense, or that Dane just won't be made in large enough quantities to justify it. Again, I'd argue that the opposite is the case: a small, high-volume chip for Nintendo is exactly the kind of case where up-front R&D expenses to reduce manufacturing cost would be extremely cost-effective in the long run.
First, let's have a look at Orin, and the automotive market for Nvidia. Their most recent full financial year saw automotive revenue of $536 million. They're currently selling Nintendo about 23 million Mariko chips per year. Assuming an average $25 price, Nvidia's revenues from Nintendo Mariko sales alone would come to $575 million, which is more than their entire automotive segment. Even if that's only an estimate, it excludes anything Nintendo pays Nvidia for software, support, R&D towards future projects, etc., which almost certainly pushes it well above automotive even in the most pessimistic scenario. Perhaps Nvidia's auto revenue rises in the next few years with Orin, and overtakes Nintendo, but their automotive sales are for full systems, often with discrete GPUs, and a heavy emphasis on software. The actual part of that revenue that Orin accounts for would only be a fraction, and with a vastly higher sale price per chip than the smaller, lower-margin chips they sell to Nintendo, the total number of Orin chips produced will remain tiny next to what they're producing for Nintendo.
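The revenue comparison above is simple arithmetic, sketched here in Python. The $25 average price is the post's own assumption rather than a known figure, and the variable names are just for illustration.

```python
# Figures quoted in the paragraph above.
automotive_revenue_usd = 536_000_000  # most recent full-year automotive revenue
mariko_units_per_year = 23_000_000    # approximate annual Mariko sales to Nintendo
assumed_price_usd = 25                # ASSUMPTION: average price per chip

mariko_revenue_usd = mariko_units_per_year * assumed_price_usd
print(mariko_revenue_usd)  # 575000000

# Average price at which Mariko revenue would only equal automotive revenue.
break_even_price_usd = automotive_revenue_usd / mariko_units_per_year
print(round(break_even_price_usd, 2))
```

So the conclusion holds as long as the real average price is above roughly $23.30 per chip, before counting any software, support or R&D payments.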
Gaming GPUs are harder to get good numbers on, as Nvidia only provide a high-level revenue figure for the Gaming segment, which covers a wide range of products (including their revenue from Nintendo). I found a report on GPU sales from last year, which gives us a rough guide to go on. It claims that 123 million "GPUs" were sold in Q2 2021 (although it includes integrated GPUs on Intel and AMD SoCs, so that's not a number for just discrete GPUs). It does claim Nvidia accounted for 15.23% of this, and as Nvidia don't sell SoCs into the PC market, we can safely assume that's all discrete GPUs, which would come to 18.73 million for the quarter. Let's assume that this has risen since then, and take a rough figure of 80 million gaming GPUs being shipped by Nvidia per year. They're currently selling consumer GPUs based on 6 different Ampere GPU chips (we'll ignore the fact that they've brought back Turing cards for the moment), so on average they're producing about 13 million chips per year of each of their gaming GPU dies. With a typical 2 year lifespan, that would put a full life-cycle for one of their gaming GPU dies around the region of 26 million units, although there's likely quite a bit of variability between individual chips.
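The chain of estimates above can be laid out step by step. This is only a sketch of the same back-of-the-envelope reasoning: the 80 million annual figure, the 6-die count and the 2-year lifespan are the post's assumptions, not reported data.

```python
# Reported figures from the GPU sales report cited above.
total_gpus_q2_2021 = 123_000_000
nvidia_share = 0.1523

nvidia_quarterly = total_gpus_q2_2021 * nvidia_share  # about 18.73 million
annualised = nvidia_quarterly * 4                     # about 74.9 million

# ASSUMPTIONS from the post: volume has risen since Q2 2021, six gaming dies,
# two-year lifespan per die.
assumed_annual_units = 80_000_000
ampere_gaming_dies = 6
lifespan_years = 2

per_die_per_year = assumed_annual_units / ampere_gaming_dies  # about 13.3 million
per_die_lifetime = per_die_per_year * lifespan_years          # about 26.7 million

print(round(nvidia_quarterly / 1e6, 2))
print(round(per_die_per_year / 1e6, 1))
print(round(per_die_lifetime / 1e6, 1))
```

The post rounds the per-die figure down to 13 million before doubling it, which is where the 26 million lifetime estimate comes from.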
So Mariko currently accounts for almost twice the annual production of an average gaming GPU for Nvidia, and over its lifetime the X1 has exceeded a typical GPU production run four times over, with it potentially still selling for a long time to come. It's almost certainly the highest-volume chip Nvidia have ever produced by a comfortable margin at this point. Even the low end for Dane production would probably be similar to a typical gaming GPU, at around 20 million or so, and at the high end, if it's actually being used in a successor to Switch, it will once again dwarf any gaming GPU in terms of production volume.
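As a quick check on those ratios, using the rounded figures from the text (the lifetime X1 total below is only what the "four times" claim implies, not a sourced number):

```python
mariko_per_year = 23_000_000       # annual Mariko estimate from earlier in the post
avg_gpu_die_per_year = 13_000_000  # rounded per-die annual estimate
typical_gpu_lifetime = 26_000_000  # rounded per-die lifetime estimate

annual_ratio = mariko_per_year / avg_gpu_die_per_year
print(round(annual_ratio, 2))  # 1.77, i.e. "almost twice"

# Lifetime X1 volume IMPLIED by "four times a typical production run".
implied_x1_lifetime = 4 * typical_gpu_lifetime
print(implied_x1_lifetime)  # 104000000
```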
Now, of course the volume of production doesn't mean they account for nearly the same revenue or profit for Nvidia as gaming GPUs, but that's all the more reason to justify up-front R&D expenses if it allows them to hit Nintendo's goals at a lower manufacturing cost. Each dollar saved on manufacturing is worth a lot more when you're talking about a high-volume, low-margin chip like Dane than a low-volume, high-margin chip like Orin. It simply doesn't make sense to me that Nvidia would refuse the basic R&D expense of reconfiguring the GPC, something they've done on almost every Ampere chip they've produced, and design a chip that's both less profitable for Nvidia themselves, and likely wouldn't hit client requirements (portable mode power efficiency) for their largest customer, on the back of the highest-volume chip they've ever produced.
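The "each dollar saved is worth more at high volume" point can be illustrated with a simple break-even sketch. The R&D cost, the per-chip saving and the low-volume figure below are made-up placeholders chosen only to show the shape of the trade-off. Nothing here reflects actual Nvidia or Nintendo figures. Only the 20 million low-end volume comes from the post.

```python
def break_even_units(rnd_cost_usd, saving_per_chip_usd):
    """Units needed before a one-off R&D cost is repaid by per-chip savings."""
    return rnd_cost_usd / saving_per_chip_usd

def net_benefit(units, rnd_cost_usd, saving_per_chip_usd):
    """Total manufacturing savings minus the up-front R&D cost."""
    return units * saving_per_chip_usd - rnd_cost_usd

# HYPOTHETICAL placeholder values, for illustration only.
HYPOTHETICAL_RND_COST_USD = 20_000_000
HYPOTHETICAL_SAVING_PER_CHIP_USD = 2.0
HYPOTHETICAL_LOW_VOLUME_UNITS = 1_000_000

low_end_dane_units = 20_000_000  # low-end volume estimate from the post

print(break_even_units(HYPOTHETICAL_RND_COST_USD, HYPOTHETICAL_SAVING_PER_CHIP_USD))
print(net_benefit(low_end_dane_units, HYPOTHETICAL_RND_COST_USD,
                  HYPOTHETICAL_SAVING_PER_CHIP_USD))
print(net_benefit(HYPOTHETICAL_LOW_VOLUME_UNITS, HYPOTHETICAL_RND_COST_USD,
                  HYPOTHETICAL_SAVING_PER_CHIP_USD))
```

With these placeholder numbers the same R&D spend pays off at 20 million units and loses money at 1 million. Plug in different guesses to see how the break-even point moves.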