StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (New Staff Post, Please read)

RainTech · Feb 10, 2022

Alovon11 said:
Exactly.
So Thraktor's main premise is inherently flawed because of Orin's uArch.

The only way to make 4SMs save space is to go in and rework the uArch, which is likely expensive.

With the Switch's new chip likely being in demand and selling a frickin' lot more than any automotive Orin variant, it might be worth the investment. Heck, they might have something in the pipeline for Switch that's not even on the roadmap because of NDAs and stuff.

Alovon11 · Feb 10, 2022

davec00ke said:
Bigger chips cost more

Well it's going to be 8SMs of size anyway so yeah

Alovon11 · Feb 10, 2022

RainTech said:
With the Switch's new chip likely being in demand and selling a frickin' lot more than any automotive Orin variant, it might be worth the investment. Heck, they might have something in the pipeline for Switch that's not even on the roadmap because of NDAs and stuff.

Umm, you know how big the auto industry is, right?

And ah yes, the good old "Spend money to get a weaker product when we would've saved more money overall by just using Orin's uArch" plan.

Skittzo · Feb 10, 2022

ILikeFeet said:
I don't believe there's any reason to even keep the die the same size as the TX1. that was only 120mm² by "coincidence". now that Nintendo is getting a bespoke chip, it can grow as large as needed within reason. and that's really anywhere below 200mm²

Yeah I agree, I'm just stating what I remember Thraktor said about it.

That said, they will indeed likely want a smaller die size in general for the purposes of chip volume. Maybe not as small as the TX1 but if they can cut it down they probably will.

RainTech · Feb 10, 2022

Alovon11 said:
Umm, you know how big the auto industry is, right?

And ah yes, the good old "Spend money to get a weaker product when we would've saved more money overall by just using Orin's uArch" plan.

Look. Could it be that the proposed specs on here could be right on the money? Totally. I am NOT saying this can't happen, so don't @ me if it does. It can totally turn out this way, HOWEVER the possibility of this not being what we think it is is there too, whether you like it or not. No need to get all worked up over what I clearly marked as speculation. None of us are businessmen working in the field or even chip engineers. We're nerds trying to figure something out with not much to go on, so there might be factors we don't get or understand.

davec00ke · Feb 10, 2022

Are there any chips in the Switch power budget that have DLSS?

Skittzo · Feb 10, 2022

davec00ke said:
Are there any chips in the Switch power budget that have DLSS?

The Orin family of chips, yes.

Alovon11 · Feb 10, 2022

davec00ke said:
Are there any chips in the Switch power budget that have DLSS?

Orin, the uArch that has a minimum of 8SMs of die space per GPU.

Sol · Feb 10, 2022

Skittzo said:
IMO (and I believe I deviate from the mainstream with this) I don't believe they'll want their next "true" platform to simply be a Switch 2. They'll want it to revolve around a new concept, like they have been doing ever since the DS came out. I have no idea what that could be, but I don't see it as something that will need a big jump in processing capability to differentiate itself.

This is the kind of thing that worries me with Nintendo. Always trying to reinvent the wheel. I just don't know why they won't take a page out of Sony's playbook and just release more powerful conventional consoles with each new generation, and then rely on the amazing first party games to sell the hardware.

Surely they can't be under the impression that people are begging for a new gameplay gimmick, especially after the Wii U. I can't see them abandoning the hybrid console/portable concept, and I don't know what else they could do that would be wildly different functionally than the current Switch.

For a company with many peaks and valleys in terms of its success over the years, one would think they would stick to what works instead of trying to revolutionize the paradigm of gaming with each new console.

Skittzo · Feb 10, 2022

Sol said:
This is the kind of thing that worries me with Nintendo. Always trying to reinvent the wheel. I just don't know why they won't take a page out of Sony's playbook and just release more powerful conventional consoles with each new generation, and then rely on the amazing first party games to sell the hardware.

Surely they can't be under the impression that people are begging for a new gameplay gimmick, especially after the Wii U. I can't see them abandoning the hybrid console/portable concept, and I don't know what else they could do that would be wildly different functionally than the current Switch.

For a company with many peaks and valleys in terms of its success over the years, one would think they would stick to what works instead of trying to revolutionize the paradigm of gaming with each new console.

Because their DNA is still that of a toy company. They want to innovate when it comes to hardware as well as software. And in general their innovations have worked pretty damn well, more often than not.

ILikeFeet · Feb 10, 2022

davec00ke said:
Are there any chips in the Switch power budget that have DLSS?

Xavier

Alovon11 · Feb 10, 2022

ILikeFeet said:
Xavier

And Xavier is almost the same size as Orin and draws even more power

ILikeFeet · Feb 10, 2022

Alovon11 said:
And Xavier is almost the same size as Orin and draws even more power

that's the joke. though it also has a 10W and 15W mode

Vash_the_Stampede · Feb 10, 2022

NineTailSage said:
What Alovon11 is stating is that if Dane is based on Orin and they went with 4 SMs, it would still use 1 GPC to obtain those 4 SMs, which is a waste because you aren't saving die space by going with 4 SMs...

I understand that. I want to know why other intelligent guys like @Thraktor believe in a 4SM product.

I’m not a techie but I’m pretty sure I’m asking the right question

JoshuaJSlone · Feb 10, 2022

RainTech said:
Of course. 4 cores as the OG, 4SM max. Way lower clocks than anticipated. Why would they do it? Because the smaller the chip is, the more they can fit onto a wafer. If the clockspeed target is lower, there's less "garbage" (aka broken chips) on the wafer (better yields). If the OG Switch remains the base target for a few more years, there is only so much they can do to improve games. I doubt they do "pro exclusives", especially during a chip shortage when the new one might be hard to get. It will be more powerful but not like current estimations. It will render at higher res, smooth out some framerates, and might add AA + DLSS. That's all they need to improve the performance and visuals for docked mode.

If all they want is slightly better performance and no exclusives, it'd be a big waste of time and added complication to develop something new rather than just finally do a higher-clocked TX1.

Vash_the_Stampede · Feb 10, 2022

Skittzo said:
IIRC Thraktor came to the conclusion that 8SMs would result in too big of a SoC die, at least if they're trying to keep the die size similar to that of the base Switch.

But per above, the die size would not be smaller given that 8SM is one GPC.

Unless one believes that the next hardware is not based on Orin.

Skittzo · Feb 10, 2022

Vash_the_Stampede said:
But per above, the die size would not be smaller given that 8SM is one GPC.

Unless one believes that the next hardware is not based on Orin.

Nah personally (unlike Alovon) I don't think it's really all that costly or time consuming to alter the GPC design to accommodate 4 or 6 SMs without taking up any extra space if needed.

Actually I'm pretty sure the diagram shared above isn't really even close to what the GPC actually looks like. The SMs are not clustered together in one box whose shape cannot change, there are probably plenty of very trivial ways to redesign it to fit only 4 SMs and save a fair amount of die space.

Alovon11 · Feb 10, 2022

Skittzo said:
Nah personally (unlike Alovon) I don't think it's really all that costly or time consuming to alter the GPC design to accommodate 4 or 6 SMs without taking up any extra space if needed.

Actually I'm pretty sure the diagram shared above isn't really even close to what the GPC actually looks like. The SMs are not clustered together in one box whose shape cannot change, there are probably plenty of very trivial ways to redesign it to fit only 4 SMs and save a fair amount of die space.

You think Nintendo would spend more money on changing more things about Orin than they already are?

Not to mention 4SMs likely can't do 4K even through DLSS, and we know devs are targeting 4K thanks to Bloomberg.

Vash_the_Stampede · Feb 10, 2022

The cost of the SoC is going to be measured in tens of dollars. How much could they possibly save by shaving off 4SMs if we account for R&D cost and any manufacturing inefficiencies from having less leverage from the work Nvidia already did on Orin?

Doesn’t seem worth it to me as a filthy casual looking in.

Skittzo · Feb 10, 2022

Alovon11 said:
You think Nintendo would spend more money on changing more things about Orin than they already are?

Not to mention 4SMs likely can't do 4K even through DLSS, and we know devs are targeting 4K thanks to Bloomberg.

The latter I don't know and I don't think you can know at the moment, but for the former I don't really see why that would require spending any more money. They're getting their own semi custom design regardless, it'll need to be designed on its own either way. You have to make a set of masks for each individual chip design, they can't just reuse the masks that have already been used for other Orin chips.

I just don't see what's so difficult or expensive about cutting out 2-4 SMs.

Alovon11 · Feb 10, 2022

Skittzo said:
The latter I don't know and I don't think you can know at the moment, but for the former I don't really see why that would require spending any more money. They're getting their own semi custom design regardless, it'll need to be designed on its own either way. You have to make a set of masks for each individual chip design, they can't just reuse the masks that have already been used for other Orin chips.

I just don't see what's so difficult or expensive about cutting out 2-4 SMs.

Heat, margins on dead SMs, and making DLSS work with so few TOPS. We also know they are targeting 4K development thanks to Bloomberg, meaning that DLSS and native performance have to be up to par to do that. 4SMs would be at absolute best PS4 level with a borked DLSS implementation, which could require affixing a DLA to run DLSS, which would drive cost and die space right back up again.

Skittzo · Feb 10, 2022

Vash_the_Stampede said:
The cost of the SoC is going to be measured in tens of dollars. How much could they possibly save by shaving off 4SMs if we account for R&D cost and any manufacturing inefficiencies from having less leverage from the work Nvidia already did on Orin?

Doesn’t seem worth it to me as a filthy casual looking in.

It's less about material cost and more about dies per wafer. I can't search at the moment but a few months back we came up with rough estimates of SM size and theoretically cutting out a number of them would give you a smaller die, meaning more dies per wafer.

To illustrate the concept with completely fabricated numbers, say each SM is 10mm². With 8 SMs let's say the final die is 150mm². If you cut out 4 SMs you're lowering that to 110mm². If you have a wafer with 700mm² of real estate available you can get 6 dies out of it with 4 SMs versus only 4 dies with 8 SMs. That's 50% more units produced per base wafer, which is a very significant number especially considering the current semiconductor climate.
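A quick sketch of the arithmetic above, using the same completely made-up numbers (10mm² per SM, a 150mm² die with 8 SMs, 700mm² of usable wafer area). The function name is just for illustration, and plain floor division ignores edge loss and scribe lines, so treat it as a toy model rather than how dies per wafer are really estimated.

```python
# Toy dies-per-wafer estimate using the fabricated numbers from the post.
SM_AREA_MM2 = 10        # made-up area per SM
DIE_8SM_MM2 = 150       # made-up die size with 8 SMs
WAFER_AREA_MM2 = 700    # made-up usable wafer area


def dies_per_wafer(wafer_area_mm2: int, die_area_mm2: int) -> int:
    """Whole dies that fit by area alone (ignores edge loss and scribe lines)."""
    return wafer_area_mm2 // die_area_mm2


# Cutting 4 SMs out of the 8-SM die shrinks it by 4 * 10 mm².
die_4sm_mm2 = DIE_8SM_MM2 - 4 * SM_AREA_MM2
dies_8sm = dies_per_wafer(WAFER_AREA_MM2, DIE_8SM_MM2)
dies_4sm = dies_per_wafer(WAFER_AREA_MM2, die_4sm_mm2)
extra_units_pct = 100 * (dies_4sm - dies_8sm) / dies_8sm

print(f"8 SMs: {DIE_8SM_MM2} mm² die, {dies_8sm} dies per wafer")
print(f"4 SMs: {die_4sm_mm2} mm² die, {dies_4sm} dies per wafer")
print(f"Extra units per wafer with 4 SMs: {extra_units_pct:.0f}%")
```

Swap in different die or wafer areas to see how sensitive the result is to rounding down to whole dies.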

MP! · Feb 10, 2022

I hardly know anything about chip production but seems to me like there's pros and cons for both outlooks

Like, wouldn't having a lower number of SMs mean you'd need to run them at higher clocks... and couldn't that affect how many chips might actually need to be binned?

So: larger chip with 8 SMs. Pro: more powerful, runs at lower clocks. Con: bigger and therefore less cost effective.
Smaller chip with 4 SMs. Pro: more cost effective due to size. Con: fewer chips per wafer that can actually hit performance/heat/power targets?

Again... I don't know anything... just asking

Alovon11 · Feb 10, 2022

MP! said:
I hardly know anything about chip production but seems to me like there's pros and cons for both outlooks

Like, wouldn't having a lower number of SMs mean you'd need to run them at higher clocks... and couldn't that affect how many chips might actually need to be binned?

So: larger chip with 8 SMs. Pro: more powerful, runs at lower clocks. Con: bigger and therefore less cost effective.
Smaller chip with 4 SMs. Pro: more cost effective due to size. Con: fewer chips per wafer that can actually hit performance/heat/power targets?

Again... I don't know anything... just asking

Add a pro for 8SMs: it can be binned down to 6-7 SMs if 1-2 SMs are bad.
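To put a rough shape on that, here's a toy model where each SM independently has some chance of being defective. The 2% figure is pure invention for illustration, the function name is made up, and real defects also land outside the SMs, so this only shows why allowing 1-2 dead SMs helps, not what actual yields look like.

```python
from math import comb

P_SM_BAD = 0.02  # hypothetical chance that any single SM is defective


def usable_fraction(total_sms: int, max_dead_sms: int, p_bad: float) -> float:
    """Fraction of dies with at most max_dead_sms defective SMs (binomial model)."""
    return sum(
        comb(total_sms, k) * p_bad**k * (1 - p_bad) ** (total_sms - k)
        for k in range(max_dead_sms + 1)
    )


# 8-SM die that can ship with as few as 6 working SMs,
# versus dies that need every SM to work.
salvageable_8sm = usable_fraction(8, 2, P_SM_BAD)
strict_8sm = usable_fraction(8, 0, P_SM_BAD)
strict_4sm = usable_fraction(4, 0, P_SM_BAD)

print(f"8 SMs, up to 2 dead allowed: {salvageable_8sm:.4f}")
print(f"8 SMs, all must work:        {strict_8sm:.4f}")
print(f"4 SMs, all must work:        {strict_4sm:.4f}")
```

Under these assumptions the salvageable 8-SM die loses fewer units to bad SMs than a 4-SM die with no spare SMs, though it still takes up more wafer area per die.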

Skittzo · Feb 10, 2022

MP! said:
I hardly know anything about chip production but seems to me like there's pros and cons for both outlooks

Like, wouldn't having a lower number of SMs mean you'd need to run them at higher clocks... and couldn't that affect how many chips might actually need to be binned?

So: larger chip with 8 SMs. Pro: more powerful, runs at lower clocks. Con: bigger and therefore less cost effective.
Smaller chip with 4 SMs. Pro: more cost effective due to size. Con: fewer chips per wafer that can actually hit performance/heat/power targets?

Again... I don't know anything... just asking

No, you're definitely right. It's a tradeoff. Fewer SMs get you more dies per wafer but potentially a larger percentage of binned dies. I guess part of the design process is researching/testing what exactly the most efficient design would be from both ends.

Vash_the_Stampede · Feb 10, 2022

Skittzo said:
It's less about material cost and more about dies per wafer. I can't search at the moment but a few months back we came up with rough estimates of SM size and theoretically cutting out a number of them would give you a smaller die, meaning more dies per wafer.

To illustrate the concept with completely fabricated numbers, say each SM is 10mm². With 8 SMs let's say the final die is 150mm². If you cut out 4 SMs you're lowering that to 110mm². If you have a wafer with 700mm² of real estate available you can get 6 dies out of it with 4 SMs versus only 4 dies with 8 SMs. That's 50% more units produced per base wafer, which is a very significant number especially considering the current semiconductor climate.

That’s just cost tho. If the die is big, produce more wafers and spend more money.

Alovon11 · Feb 10, 2022

Vash_the_Stampede said:
That’s just cost tho. If the die is big, produce more wafers and spend more money.

But with a smaller die you could burn more dies outright because of bad SMs

Skittzo · Feb 10, 2022

Vash_the_Stampede said:
That’s just cost tho. If the die is big, produce more wafers and spend more money.

Well yeah, all of this comes down to cost. But it's different than just tens of dollars spent on the actual die itself: with the made-up numbers above, you'd need to order 50% more wafers to get the same number of dies (assuming the binning rate is the same, which is not a safe assumption), which means you'd be spending, for instance, $15M on wafers for one production run versus $10M.

Edit: which I now realize is basically the same thing as talking about a few more or less dollars per die, so never mind!

Alovon11 said:
But with a smaller die you could burn more dies outright because of bad SMs

I'm not disagreeing that that's a concern. But none of us here can know what the best bang for your buck is when it comes to the exact chip design. Maybe 4 SMs means you're binning too many, maybe the yield is actually higher than we think, we can't know.

The point is, there's no reason IMO to think it HAS to be 8 SMs.

MP! · Feb 10, 2022

Alovon11 said:
Add a pro for 8SMs: it can be binned down to 6-7 SMs if 1-2 SMs are bad.

that was sort of my thinking
I do hope we hear something through the grapevine within the next month or so
Being able to whittle down the details would be nice

ReddDreadtheLead · Feb 10, 2022

RainTech said:
Of course. 4 cores as the OG, 4SM max. Way lower clocks than anticipated. Why would they do it? Because the smaller the chip is, the more they can fit onto a wafer. If the clockspeed target is lower, there's less "garbage" (aka broken chips) on the wafer (better yields). If the OG Switch remains the base target for a few more years, there is only so much they can do to improve games. I doubt they do "pro exclusives", especially during a chip shortage when the new one might be hard to get. It will be more powerful but not like current estimations. It will render at higher res, smooth out some framerates, and might add AA + DLSS. That's all they need to improve the performance and visuals for docked mode.

Of course, this is only speculation on my part. But let's not underestimate the impact of the chip shortage.

A true successor would then surface around 25-26 IMO.

There's a degree of contradiction here. Now I'm not sure, but your post seems to be implying that they designed it after the pandemic and shortages started. The shortages would not have had much say in a chip design that is presumably derived from an existing chip, and if it had been designed during or after the start of the pandemic and shortages, you would not see said chip until 2025/2026.

If we assume a release this year (doubt) or next year (eh...), these decisions were made before the pandemic started and the chip design was likewise mostly done. Going back to cut down a near-finished design just to release that product would be a bizarre use of R&D.

Vash_the_Stampede said:
Have you and @Thraktor debated this? I’m wondering why his prediction is basically half of yours. The 8SM option seems easier from Nintendo’s perspective.

No, but I have posted about why his conclusion is wrong

They can easily go for 4, 6 or 8SMs. It's a derivative of an existing chip, so they can customize it however they choose.

Thraktor · Feb 10, 2022

Vash_the_Stampede said:
Have you and @Thraktor debated this? I’m wondering why his prediction is basically half of yours. The 8SM option seems easier from Nintendo’s perspective.

Alovon11 said:
Honestly, I don't know if he's read my posts about Orin's GPCs making 8SMs by far the more likely outcome cost/effort-wise.

But just in terms of making an SoC based on Orin, the closer the SoC is to the OG Orin, the cheaper it is. Sticking with the closest thing to the A78AE, the A78C, which comes in either 6 or 8 cores, is cheaper, and an 8SM target lets them keep clocks lower, which lets them save on cooling and power delivery for the same or better performance.

They also get more games to take a % cut on in the eShop if devs can more or less easily port their PS4/Xbox One games over to Dane, assuming it has 8 CPU cores.

I think I have replied before, but in general I'm not going to complain if someone has a more optimistic view than me. However, given that I've specifically been asked, I may as well respond.

The claim seems to be "Orin has 8SMs per GPC, and it would be prohibitively expensive to change that for Dane, therefore Dane must have 8 SMs". Not only do I not see any evidence to support this, I'd argue the evidence suggests the opposite: changing the number of SMs per GPC is the norm for SoCs like Dane.

Let's take a look at every Nvidia SoC since they started using the current SM/GPC hierarchy (actually a SM/TPC/GPC hierarchy now, but we can ignore the TPC level for now):

Tegra X1 (Erista) - 2 SMs per GPC, desktop Maxwell had either 5 SMs per GPC or 4 SMs per GPC
Tegra X2 (Parker) - 2 SMs per GPC, desktop Pascal had either 5 SMs per GPC or 3 SMs per GPC, and HPC Pascal had 10 SMs per GPC
Xavier - 8 SMs per GPC, HPC Volta had 14 SMs per GPC (no desktop Volta chips to compare to)
Orin - 8 SMs per GPC, desktop Ampere has either 8 SMs per GPC, 10 SMs per GPC or 12 SMs per GPC, and HPC Ampere has 16 SMs per GPC

Every Nvidia architecture since they introduced the SM/GPC hierarchy has changed up the number of SMs per GPC depending on the requirements of the chip. Every SoC has also used a different GPC setup than most of the other chips, with not just a different number of SMs in most cases (Orin being the one exception), but architectural differences within the SMs, TPCs and GPCs themselves. Deciding the appropriate number of SMs per GPC is clearly the norm when Nvidia design a new chip, particularly so when it comes to SoCs. If anything, this has become even clearer with Ampere. There are currently six Ampere chips which Nvidia have provided architectural details on (for whatever reason they still haven't provided details on GA103):

GA100 - 16 SMs per GPC - HPC Ampere
GA102 - 12 SMs per GPC - Gaming Ampere
GA104 - 8 SMs per GPC - Gaming Ampere
GA106 - 10 SMs per GPC - Gaming Ampere
GA107 - 8 SMs per GPC - Gaming Ampere
Orin - 8 SMs per GPC - SoC Ampere

That's six different Ampere chips, and five different GPC setups. Suggesting that changing the SM count per GPC on a new chip design is somehow very difficult or prohibitively expensive just doesn't line up with the evidence.

Now, perhaps your argument is that Nintendo is somehow too small a customer to warrant such R&D expense, or that Dane just won't be made in large enough quantities to justify it. Again, I'd argue that the opposite is the case: a small, high-volume chip for Nintendo is exactly the kind of case where up-front R&D expenses to reduce manufacturing cost would be extremely cost-effective in the long run.

First, let's have a look at Orin, and the automotive market for Nvidia. Their most recent full financial year saw automotive revenue of $536 million. They're currently selling Nintendo about 23 million Mariko chips per year. Assuming an average $25 price, Nvidia's revenues from Nintendo Mariko sales alone would come to $575 million, which is more than Nvidia's entire automotive revenue. Even if that's only an estimate, it excludes anything Nintendo pays Nvidia for software, support, R&D towards future projects, etc., which almost certainly pushes it well above their automotive revenue even in the most pessimistic scenario. Perhaps Nvidia's auto revenue rises in the next few years with Orin, and overtakes Nintendo, but their automotive sales are for full systems, often with discrete GPUs, and a heavy emphasis on software. The actual part of that revenue that Orin accounts for would only be a fraction, and with a vastly higher sale price per chip than the smaller, lower-margin chips they sell to Nintendo, the total number of Orin chips produced will remain tiny next to what they're producing for Nintendo.
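As a sanity check on that comparison, here's the arithmetic spelled out. The unit count and the $25 price are the estimates from this post, not confirmed figures, and the break-even price is just a derived illustration.

```python
# Figures as stated in the post; the price per chip is an assumption.
AUTOMOTIVE_REVENUE_USD = 536_000_000   # most recent full financial year
MARIKO_UNITS_PER_YEAR = 23_000_000     # rough annual Mariko shipments
ASSUMED_PRICE_USD = 25                 # assumed average price per chip

# Estimated yearly revenue from Mariko chips alone.
mariko_revenue = MARIKO_UNITS_PER_YEAR * ASSUMED_PRICE_USD

# Price per chip at which Mariko revenue would equal automotive revenue.
break_even_price = AUTOMOTIVE_REVENUE_USD / MARIKO_UNITS_PER_YEAR

print(f"Estimated Mariko revenue: ${mariko_revenue / 1e6:.0f}M")
print(f"Automotive revenue:       ${AUTOMOTIVE_REVENUE_USD / 1e6:.0f}M")
print(f"Break-even chip price:    ${break_even_price:.2f}")
```

So the comparison holds as long as the real average price is above roughly $23 per chip, which shows how much the conclusion leans on the price assumption.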

Gaming GPUs are harder to get good numbers on, as Nvidia only provide a high-level revenue figure for the Gaming segment, which covers a wide range of products (including their revenue from Nintendo). I found a report on GPU sales from last year, which gives us a rough guide to go on. It claims that 123 million "GPUs" were sold in Q2 2021 (although it includes integrated GPUs on Intel and AMD SoCs, so that's not a number for just discrete GPUs). It does claim Nvidia accounted for 15.23% of this, and as Nvidia don't sell SoCs into the PC market, we can safely assume that's all discrete GPUs, which would come to 18.73 million for the quarter. Let's assume that this has risen since then, and take a rough figure of 80 million gaming GPUs being shipped by Nvidia per year. They're currently selling consumer GPUs based on 6 different Ampere GPU chips (we'll ignore the fact that they've brought back Turing cards for the moment), so on average they're producing about 13 million chips per year of each of their gaming GPU dies. With a typical 2 year lifespan, that would put a full life-cycle for one of their gaming GPU dies around the region of 26 million units, although there's likely quite a bit of variability between individual chips.

So Mariko currently accounts for almost twice the annual production of an average gaming GPU for Nvidia, and over its lifetime the X1 has exceeded a typical GPU production run by 4 times over, with it potentially still selling for a long time to come. It's almost certainly the highest-volume chip Nvidia have ever produced by a comfortable margin at this point. Even the low end for Dane production would probably be similar numbers to a typical gaming GPU at around 20 million or so, and at the high end, if it's actually being used in a successor to Switch, it will once again dwarf any gaming GPU in terms of production volume.
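The volume estimates above can be reproduced in a few lines. Every input here is either quoted from the report mentioned above or a rough assumption from this post (80 million GPUs a year, 6 dies, 2-year lifespan), so the outputs are ballpark figures only.

```python
# Inputs as given in the post; the last three GPU figures are rough assumptions.
TOTAL_GPUS_Q2_2021 = 123_000_000        # includes integrated GPUs
NVIDIA_SHARE = 0.1523                   # Nvidia's share per the report
ASSUMED_ANNUAL_GAMING_GPUS = 80_000_000
AMPERE_GAMING_DIES = 6
TYPICAL_LIFESPAN_YEARS = 2
MARIKO_UNITS_PER_YEAR = 23_000_000

# Nvidia discrete GPUs shipped in the quarter.
nvidia_q2_units = TOTAL_GPUS_Q2_2021 * NVIDIA_SHARE

# Average annual and lifetime volume per gaming GPU die.
per_die_per_year = ASSUMED_ANNUAL_GAMING_GPUS / AMPERE_GAMING_DIES
per_die_lifetime = per_die_per_year * TYPICAL_LIFESPAN_YEARS

# How Mariko's annual volume compares with an average gaming die.
mariko_vs_average_die = MARIKO_UNITS_PER_YEAR / per_die_per_year

print(f"Nvidia GPUs in Q2 2021:      {nvidia_q2_units / 1e6:.2f}M")
print(f"Per die, per year:           {per_die_per_year / 1e6:.1f}M")
print(f"Per die, lifetime:           {per_die_lifetime / 1e6:.1f}M")
print(f"Mariko vs average die/year:  {mariko_vs_average_die:.2f}x")
```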

Now, of course the volume of production doesn't mean they account for nearly the same revenue or profit for Nvidia as gaming GPUs, but that's all the more reason to justify up-front R&D expenses if it allows them to hit Nintendo's goals at a lower manufacturing cost. Each dollar saved on manufacturing is worth a lot more when you're talking about a high-volume, low-margin chip like Dane than a low-volume, high-margin chip like Orin. It simply doesn't make sense to me that Nvidia would refuse the basic R&D expense of reconfiguring the GPC, something they've done on almost every Ampere chip they've produced, and design a chip that's both less profitable for Nvidia themselves, and likely wouldn't hit client requirements (portable mode power efficiency) for their largest customer, on the back of the highest-volume chip they've ever produced.

ReddDreadtheLead · Feb 10, 2022

Vash_the_Stampede said:
The cost of the SoC is going to be measured in tens of dollars. How much could they possibly save by shaving off 4SMs if we account for R&D cost and any manufacturing inefficiencies from having less leverage from the work Nvidia already did on Orin?

Doesn’t seem worth it to me as a filthy casual looking in.

It wouldn't save much, as they charge you based on other things, not solely the die size. It could cost $10-20 to make and they will charge you $50-70 for it.

That said, I don't think the number of bad dies goes up dramatically going from, say, 100mm² to, idk, 140mm². These are still super small chips.
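For a feel of the scale, here's a sketch using the simple Poisson yield model, yield = exp(-defect density × die area). The 0.1 defects/cm² figure is an assumption picked purely for illustration, not a number for any real process, and the function name is made up.

```python
from math import exp

ASSUMED_DEFECTS_PER_CM2 = 0.1  # hypothetical defect density, not a real process figure


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under the simple Poisson yield model."""
    die_area_cm2 = die_area_mm2 / 100  # 100 mm² per cm²
    return exp(-defects_per_cm2 * die_area_cm2)


yield_100 = poisson_yield(100, ASSUMED_DEFECTS_PER_CM2)
yield_140 = poisson_yield(140, ASSUMED_DEFECTS_PER_CM2)

print(f"100 mm² die: {yield_100:.1%} defect-free")
print(f"140 mm² die: {yield_140:.1%} defect-free")
print(f"Difference:  {yield_100 - yield_140:.1%}")
```

With that assumed defect density the gap is only a few percentage points. A worse process (higher defect density) would widen it, so the conclusion depends on how mature the node is.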

AMD Van Gogh/Aerith APU with RDNA2 iGPU for Steam Deck pictured up close - VideoCardz.com

Steam/AMD Van Gogh APU for next-gen portable gaming Steam has lifted the embargo on first impressions/early review videos of the Steam deck. Valve’s Steam Deck is a handheld Linux-based gaming console incorporating the latest hardware from AMD. The Deck is based on AMD Van Gogh SoC (also known...

videocardz.com

Scroll and select: the Steam Deck APU is absolutely tiny, and it's 162mm².

Dekuman · Feb 10, 2022

The thing I would say is, if they are planning ahead for a future die-shrunk version running at higher clocks, it makes more sense to go with a larger die and clock lower than to go with a smaller die with a higher clock.

I wonder if there were any lessons learned with the X1, as Nintendo's intent seems to have been to introduce a die-shrunk, overclocked version that never materialized. I don't think Nvidia's product cycle will be fast enough to accommodate another SoC a few years after Dane Switch launches, so it makes sense for them to have the option to overclock the chip when it is inevitably shrunk down and release it as a pro product if they need to, even if they don't actually plan to.

Vash_the_Stampede · Feb 10, 2022

Thraktor said:
I think I have replied before, but in general I'm not going to complain if someone has a more optimistic view than me. However, given that I've specifically been asked, I may as well respond.

The claim seems to be "Orin has 8SMs per GPC, and it would be prohibitively expensive to change that for Dane, therefore Dane must have 8 SMs". Not only do I not see any evidence to support this, I'd argue the evidence suggests the opposite: changing the number of SMs per GPC is the norm for SoCs like Dane.

Let's take a look at every Nvidia SoC since they started using the current SM/GPC hierarchy (actually a SM/TPC/GPC hierarchy now, but we can ignore the TPC level for now):

Tegra X1 (Erista) - 2 SMs per GPC, desktop Maxwell had either 5 SMs per GPC or 4 SMs per GPC
Tegra X2 (Parker) - 2 SMs per GPC, desktop Pascal had either 5 SMs per GPC or 3 SMs per GPC, and HPC Pascal had 10 SMs per GPC
Xavier - 8 SMs per GPC, HPC Volta had 14 SMs per GPC (no desktop Volta chips to compare to)
Orin - 8 SMs per GPC, desktop Ampere has either 8 SMs per GPC, 10 SMs per GPC or 12 SMs per GPC, and HPC Ampere has 16 SMs per GPC

Every Nvidia architecture since they introduced the SM/GPC hierarchy has changed up the number of SMs per GPC depending on the requirements of the chip. Every SoC has also used a different GPC setup than most of the other chips, with not just a different number of SMs in most cases (Orin being the one exception), but architectural differences within the SMs, TPCs and GPCs themselves. Deciding the appropriate number of SMs per GPC is clearly the norm when Nvidia design a new chip, particularly so when it comes to SoCs. If anything, this has become even clearer with Ampere. There are currently six Ampere chips which Nvidia have provided architectural details on (for whatever reason they still haven't provided details on GA103):

GA100 - 16 SMs per GPC - HPC Ampere
GA102 - 12 SMs per GPC - Gaming Ampere
GA104 - 8 SMs per GPC - Gaming Ampere
GA106 - 10 SMs per GPC - Gaming Ampere
GA107 - 8 SMs per GPC - Gaming Ampere
Orin - 8 SMs per GPC - SoC Ampere

That's six different Ampere chips, and five different GPC setups. Suggesting that changing the SM count per GPC on a new chip design is somehow very difficult or prohibitively expensive just doesn't line up with the evidence.

Now, perhaps your argument is that Nintendo is somehow too small a customer to warrant such R&D expense, or that Dane just won't be made in large enough quantities to justify it. Again, I'd argue that the opposite is the case: a small, high-volume chip for Nintendo is exactly the kind of case where up-front R&D expenses to reduce manufacturing cost would be extremely cost-effective in the long run.

First, let's have a look at Orin, and the automotive market for Nvidia. Their most recent full financial year saw automotive revenue of $536 million. They're currently selling Nintendo about 23 million Mariko chips per year. Assuming an average $25 price, Nvidia's revenues from Nintendo Mariko sales alone would come to $575 million, which is more than Nvidia's entire automotive revenue. Even if that's only an estimate, it excludes anything Nintendo pays Nvidia for software, support, R&D towards future projects, etc., which almost certainly pushes it well above their automotive revenue even in the most pessimistic scenario. Perhaps Nvidia's auto revenue rises in the next few years with Orin, and overtakes Nintendo, but their automotive sales are for full systems, often with discrete GPUs, and a heavy emphasis on software. The actual part of that revenue that Orin accounts for would only be a fraction, and with a vastly higher sale price per chip than the smaller, lower-margin chips they sell to Nintendo, the total number of Orin chips produced will remain tiny next to what they're producing for Nintendo.

Gaming GPUs are harder to get good numbers on, as Nvidia only provide a high-level revenue figure for the Gaming segment, which covers a wide range of products (including their revenue from Nintendo). I found a report on GPU sales from last year, which gives us a rough guide to go on. It claims that 123 million "GPUs" were sold in Q2 2021 (although it includes integrated GPUs on Intel and AMD SoCs, so that's not a number for just discrete GPUs). It does claim Nvidia accounted for 15.23% of this, and as Nvidia don't sell SoCs into the PC market, we can safely assume that's all discrete GPUs, which would come to 18.73 million for the quarter. Let's assume that this has risen since then, and take a rough figure of 80 million gaming GPUs being shipped by Nvidia per year. They're currently selling consumer GPUs based on 6 different Ampere GPU chips (we'll ignore the fact that they've brought back Turing cards for the moment), so on average they're producing about 13 million chips per year of each of their gaming GPU dies. With a typical 2 year lifespan, that would put a full life-cycle for one of their gaming GPU dies around the region of 26 million units, although there's likely quite a bit of variability between individual chips.

So Mariko currently accounts for almost twice the annual production of an average gaming GPU for Nvidia, and over its lifetime the X1 has exceeded a typical GPU production run by 4 times over, with it potentially still selling for a long time to come. It's almost certainly the highest-volume chip Nvidia have ever produced by a comfortable margin at this point. Even the low end for Dane production would probably be similar numbers to a typical gaming GPU at around 20 million or so, and at the high end, if it's actually being used in a successor to Switch, it will once again dwarf any gaming GPU in terms of production volume.
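To make the comparison concrete, a small Python sketch using only the rounded figures from the post. The implied X1 lifetime volume is simply what "4 times over" works out to under those figures, not a reported number.

```python
# Rounded figures from the post.
MARIKO_UNITS_PER_YEAR = 23_000_000
AVG_GPU_DIE_UNITS_PER_YEAR = 13_000_000
TYPICAL_GPU_DIE_LIFETIME = 26_000_000

# "Almost twice the annual production of an average gaming GPU".
annual_ratio = MARIKO_UNITS_PER_YEAR / AVG_GPU_DIE_UNITS_PER_YEAR

# Lifetime X1 volume implied by "exceeded a typical GPU production run by 4 times over".
implied_x1_lifetime = 4 * TYPICAL_GPU_DIE_LIFETIME

print(f"Mariko vs average GPU die, per year: {annual_ratio:.2f}x")
print(f"Implied X1 lifetime volume: over {implied_x1_lifetime / 1e6:.0f}M")
```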

Now, of course the volume of production doesn't mean they account for nearly the same revenue or profit for Nvidia as gaming GPUs, but that's all the more reason to justify up-front R&D expenses if it allows them to hit Nintendo's goals at a lower manufacturing cost. Each dollar saved on manufacturing is worth a lot more when you're talking about a high-volume, low-margin chip like Dane than a low-volume, high-margin chip like Orin. It simply doesn't make sense to me that Nvidia would refuse the basic R&D expense of reconfiguring the GPC, something they've done on almost every Ampere chip they've produced, and design a chip that's both less profitable for Nvidia themselves, and likely wouldn't hit client requirements (portable mode power efficiency) for their largest customer, on the back of the highest-volume chip they've ever produced.

Thanks for all this.

Is there a scenario where you think we get an 8SM Dane?

Is there a scenario where you think we don’t get a Dane/Orin SoC?

Dakhil · Feb 10, 2022

(I don't agree with Jeff Grubb that yesterday's Nintendo Direct and the DLC for Mario Kart 8 Deluxe are necessarily indicators about more powerful hardware not coming until 2024 at the earliest.)

This is off-topic, but Nvidia posted a job listing a couple of days ago for the Director of Architecture, CPU position at Nvidia's office in Yokneam Illit, Israel. Considering that Nvidia terminated the attempt to acquire Arm a couple of days ago, I wonder if Nvidia plans to continue using Arm's Neoverse designs and/or Arm's Cortex-A designs for datacentre and/or consumer products, or if Nvidia plans to return to designing custom Arm-based CPUs for datacentre and/or consumer products.

ReddDreadtheLead · Feb 10, 2022

Vash_the_Stampede said:
Thanks for all this.

Is there a scenario where you think we get an 8SM Dane?

Is there a scenario where you think we don’t get a Dane/Orin SoC?

Not Thraktor, but I can probably answer the latter one. A scenario in which they don’t get an Orin/Dane SoC would be one in which Nintendo pulls all plans for it and ultimately shelves the idea, similar to what they did with the Game Boy successor that eventually came out as the Game Boy Advance, with the Game Boy Color as the stopgap. They wouldn’t have that kind of stopgap in this case, though. They’d perhaps have the Switch Lite with an OLED display.

A reason not to use it (Dane) would be if they were unable to secure sufficient capacity for their needs. This extends beyond the SoC itself to components such as the memory, the motherboard, the storage, the sensors, etc. If one of the crucial components is in short supply, it becomes very difficult to produce in mass quantities, which is the economy of scale that Nintendo operates at, shifting 20M a year.

In the scenario where they don’t use it, Nvidia would probably rebrand it as part of the ORIN family for a niche small use case dedicated to AI, and probably disable some unnecessary features.

This would of course sour the relationship between Nintendo and Nvidia if it’s awaiting tape-out, meaning its core design is complete, but some aspects of it aren’t up to Nintendo’s requirements and they decide to just not use it. It would sour the relationship with Samsung as well, as they would also be involved with this chip, considering it’s their fab.

davec00ke · Feb 10, 2022

Maybe Dane never existed in the first place

Dekuman · Feb 10, 2022

ReddDreadtheLead said:
Not Thraktor, but I can probably answer the latter one. A scenario in which they don’t get an Orin/Dane SoC would be one in which Nintendo pulls all plans for it and ultimately shelves the idea, similar to what they did with the Game Boy successor that eventually came out as the Game Boy Advance, with the Game Boy Color as the stopgap. They wouldn’t have that kind of stopgap in this case, though. They’d perhaps have the Switch Lite with an OLED display.

A reason not to use it (Dane) would be if they were unable to secure sufficient capacity for their needs. This extends beyond the SoC itself to components such as the memory, the motherboard, the storage, the sensors, etc. If one of the crucial components is in short supply, it becomes very difficult to produce in mass quantities, which is the economy of scale that Nintendo operates at, shifting 20M a year.

In the scenario where they don’t use it, Nvidia would probably rebrand it as part of the ORIN family for a niche small use case dedicated to AI, and probably disable some unnecessary features.

This would of course sour the relationship between Nintendo and Nvidia if it’s awaiting tape-out, meaning its core design is complete, but some aspects of it aren’t up to Nintendo’s requirements and they decide to just not use it. It would sour the relationship with Samsung as well, as they would also be involved with this chip, considering it’s their fab.

I frankly don't think Nintendo has the option to shelve Dane SoC actually. They need new hardware soon because Switch won't be doing 20 million forever.

People keep bringing up Game Boy Color. That was an unexpected windfall from a declining segment for them, and it ultimately was just a double-clocked Z80 with twice the RAM, not even new hardware. At the time the N64 was their main profit centre.

Switch is all they have, and while the OG Switch will undoubtedly continue to generate a lot of sales and profits for them, they would be keenly aware of what happened with the Wii and the DS when demand dropped off a cliff for those platforms while they were struggling to ramp up the next generation.

Furukawa's comment in the recent QA about considering the large existing userbase when transitioning into the next generation is as big a tell as you can get that they've been thinking about the next gen and aren't going to let what happened with the Wii > Wii U transition happen again. And actually, further up the QA someone asked him to compare Wii vs. Switch engagement metrics and he basically said a lot has happened in 15 years and they are not comparable. Yet we still keep going back to the well of old Nintendo hardware patterns. Most of the people involved with those old systems are no longer at Nintendo.

Furukawa "...As for the comparison with Wii, Wii and Nintendo Switch are game consoles with very different features, and the development of software and the way it is played are also different. In addition, the game industry and the environment surrounding our company have changed significantly since the launch of Wii in 2006, so although the cumulative sales volume of 100 million units is the same level, we are not making a simple comparison with Wii.

...Looking ahead, it will be important to maintain and grow the (current) number of nearly 100 million "annual players", which will also be important when considering the next hardware rollout."

ReddDreadtheLead · Feb 10, 2022

Dekuman said:
People keep bringing up Game Boy Color. That was an unexpected windfall from a declining segment for them, and it ultimately was just a double-clocked Z80 with twice the RAM, not even new hardware. At the time the N64 was their main profit centre.

I only brought up the GBC because the GB2 was yet to be released and they only revamped the existing model, nothing more, and delayed the actual successor for years.

It was a contingency plan of sorts, due to renewed interest in the GB platform and due to the hardware goals they had in mind not being met.

davec00ke said:
Maybe Dane never existed in the first place

Maybe Dane itself didn’t exist, sure, but it doesn’t seem likely that no chip existed at all. Otherwise the earliest they could get a new device is around 2027, which would align best time-wise with Nvidia’s next SoC, Atlan, the successor to ORIN.

Or Dane is all the friends we made along the way.

Magic-Man · Feb 10, 2022

Dakhil said:
(I don't agree with Jeff Grubb that yesterday's Nintendo Direct and the DLC for Mario Kart 8 Deluxe are necessarily indicators about more powerful hardware not coming until 2024 at the earliest.)

This is off-topic, but Nvidia posted a job listing a couple of days ago for the Director of Architecture, CPU position at Nvidia's office in Yokneam Illit, Israel. Considering that Nvidia terminated the attempt to acquire Arm a couple of days ago, I wonder if Nvidia plans to continue using Arm's Neoverse designs and/or Arm's Cortex-A designs for datacentre and/or consumer products, or if Nvidia plans to return to designing custom Arm-based CPUs for datacentre and/or consumer products.

Switch Pro could still happen (although I doubt that personally), but the Switch 2 isn't happening for a long time.

NineTailSage · Feb 10, 2022

Thraktor said:
I think I have replied before, but in general I'm not going to complain if someone has a more optimistic view than me. However, given that I've specifically been asked, I may as well respond.

The claim seems to be "Orin has 8SMs per GPC, and it would be prohibitively expensive to change that for Dane, therefore Dane must have 8 SMs". Not only do I not see any evidence to support this, I'd argue the evidence suggests the opposite: changing the number of SMs per GPC is the norm for SoCs like Dane.

Let's take a look at every Nvidia SoC since they started using the current SM/GPC hierarchy (actually a SM/TPC/GPC hierarchy now, but we can ignore the TPC level for now):

Tegra X1 (Erista) - 2 SMs per GPC, desktop Maxwell had either 5 SMs per GPC or 4 SMs per GPC
Tegra X2 (Parker) - 2 SMs per GPC, desktop Pascal had either 5 SMs per GPC or 3 SMs per GPC, and HPC Pascal had 10 SMs per GPC
Xavier - 8 SMs per GPC, HPC Volta had 14 SMs per GPC (no desktop Volta chips to compare to)
Orin - 8 SMs per GPC, desktop Ampere has either 8 SMs per GPC, 10 SMs per GPC or 12 SMs per GPC, and HPC Ampere has 16 SMs per GPC

Every Nvidia architecture since they introduced the SM/GPC hierarchy has changed up the number of SMs per GPC depending on the requirements of the chip. Every SoC has also used a different GPC setup than most of the other chips, with not just a different number of SMs in most cases (Orin being the one exception), but architectural differences within the SMs, TPCs and GPCs themselves. Deciding the appropriate number of SMs per GPC is clearly the norm when Nvidia design a new chip, particularly so when it comes to SoCs. If anything, this has become even clearer with Ampere. There are currently six Ampere chips which Nvidia have provided architectural details on (for whatever reason they still haven't provided details on GA103):

GA100 - 16 SMs per GPC - HPC Ampere
GA102 - 12 SMs per GPC - Gaming Ampere
GA104 - 8 SMs per GPC - Gaming Ampere
GA106 - 10 SMs per GPC - Gaming Ampere
GA107 - 8 SMs per GPC - Gaming Ampere
Orin - 8 SMs per GPC - SoC Ampere

That's six different Ampere chips, and five different GPC setups. Suggesting that changing the SM count per GPC on a new chip design is somehow very difficult or prohibitively expensive just doesn't line up with the evidence.
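For reference, the list above can be tallied with a short Python sketch. The SM counts are copied from the quoted list as-is. Note that this only counts distinct SMs-per-GPC values, which come to four for this list; the "five different GPC setups" figure may be counting differences beyond raw SM count, which the list doesn't capture.

```python
from collections import Counter

# SMs per GPC, copied from the list in the quoted post.
sms_per_gpc = {
    "GA100": 16,  # HPC Ampere
    "GA102": 12,  # Gaming Ampere
    "GA104": 8,   # Gaming Ampere
    "GA106": 10,  # Gaming Ampere
    "GA107": 8,   # Gaming Ampere
    "Orin": 8,    # SoC Ampere
}

# How many chips use each SMs-per-GPC value.
counts = Counter(sms_per_gpc.values())
distinct = sorted(counts, reverse=True)

print(f"Chips listed: {len(sms_per_gpc)}")
print(f"Distinct SMs-per-GPC values: {distinct}")
for sm_count in distinct:
    chips = [name for name, n in sms_per_gpc.items() if n == sm_count]
    print(f"  {sm_count} SMs per GPC: {', '.join(chips)}")
```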

I have to push back a little on this, and I think that, given the way desktop Ampere is organized, we don't have the entire picture...
Most of the assumptions we are making are based on kopite7kimi saying Dane is based off of Orin. If the next Switch deviates heavily from Orin's established design, it's not really based on that SoC anymore.

The Tegra X1, X2 and Xavier were all the largest forms of those SoCs, but in this case Nintendo would be coming to the table with an Nvidia design that's starting much larger than what they can realistically use at the moment (on 8nm, that is).
Our main argument was that any cut-down versions of Nvidia's Tegra SoCs still have the same footprint as the full chip.

I think using some of this logic one might assume Nintendo could have halved the cores in the TX1 to achieve better TDP on 20nm in portable mode and to achieve the docked performance with higher clocks. Instead they chose to use the full TX1 design and underclock the SoC to meet whatever metrics they wanted, even if they are on the conservative side.

I don't think it's ever been a concern whether the Switch is profitable enough for Nintendo and Nvidia to warrant making something completely customized. Reading through their recent investor notes shows they are constantly concerned with the ongoing chip shortages and how this affects future profits. We can't fully expect Nintendo to adopt things like OLED displays, increased RAM, fast UFS storage and premium build materials on top of an extremely custom SoC (vs adapting from something Nvidia already has planned) and for this device not to be priced at $450-500...

PandaAndino · Feb 10, 2022

September 30, alongside Xenoblade 3. One can only hope.

karmitt · Feb 10, 2022

Dakhil said:
(I don't agree with Jeff Grubb that yesterday's Nintendo Direct and the DLC for Mario Kart 8 Deluxe are necessarily indicators about more powerful hardware not coming until 2024 at the earliest.)

It'd be good if somebody could get him to clarify if it's entirely speculation or not. I certainly hope it is.

ReddDreadtheLead · Feb 10, 2022

NineTailSage said:
Most of the assumptions we are making are based on kopite7kimi saying Dane is based off of Orin. If the next Switch deviates heavily from Orin's established design, it's not really based on that SoC anymore.

Mmmm, I’m going to have to disagree with this part. We are already assuming that the customized chip will be based off of ORIN, correct? These customizations, as said in here, in the last thread and in the thread before that, extended to removing features that are central to the ORIN SoC, such as the automotive features, and changing the CPU cores from the A78AE to the A78C. We have already, by our own admission, heavily deviated from ORIN's established design when we start proposing other ideas, so why would the SM count be off the table? It is just as much a part of this scenario as the others.

ORIN by design is for AI and automotive. As soon as you remove the AI and automotive parts in the SoC, you already aren’t following ORIN’s design. It’s an admission.

On another note: Something not discussed here is that, what if ORIN S is T239 which is the Dane Chip?

Magic-Man said:
Switch Pro could still happen (although I doubt that personally), but the Switch 2 isn't happening for a long time.

karmitt said:
It'd be good if somebody could get him to clarify if it's entirely speculation or not. certainly hope it is.

I think he’s not referring to a Lite OLED, just the Switch permadocked/TV model which Nintendo has been working on or has worked on internally, before the Aula (OLED) model was revealed. And he is speculating based on the trend of the Switch where a new model comes every 2 years (2017, 19 and 21).

And that was based on the direct info. Because it doesn’t make sense to have a product this year really with that lineup, and a year after the other model released.

Though I’m unsure if the switch would be an active platform by 2025 hardware sales wise…. because if next year it’s switch TV or Switch Lite OLED, you aren’t getting a 2024 release for a switch 2. You’re getting a 2025 at the earliest.

Alovon11 · Feb 10, 2022

NineTailSage said:
I have to push back a little on this, and I think that, given the way desktop Ampere is organized, we don't have the entire picture...
Some of those that you state as changing the number of SMs per GPC are most likely not the full chip, just as the RTX 3090 wasn't the full GA102. Most of the assumptions we are making are based on kopite7kimi saying Dane is based off of Orin. If the next Switch deviates heavily from Orin's established design, it's not really based on that SoC anymore.

The Tegra X1, X2 and Xavier were all the largest forms of those SoCs, but in this case Nintendo would be coming to the table with an Nvidia design that's starting much larger than what they can realistically use at the moment (on 8nm, that is). Our main argument is that any cut-down versions of Nvidia's Tegra SoCs still have the same footprint as the full chip.

I think using some of this logic one might assume Nintendo could have halved the cores in the TX1 to achieve better TDP in portable mode and try to achieve the docked performance with higher clocks. Instead they chose to use the full design and underclock the SoC to meet whatever metrics they wanted, even if they are a little on the conservative side.

Yeah that is what makes me feel 6+SMs is more likely (8SMs would likely be the base)

Deviates less from Orin, so saves costs there
8SMs per GPC lets them have better margins on bad silicon, letting them either just set the SoC to 6SMs for all of them to give margin for bad dies with 1 or 2 bad SMs, or save those 6+SM dies for a Switch Dane Lite with a dedicated SoC down the road.
8SM Dane would last longer without running into hardware limits and could more feasibly make it to 2027/2028 for an Atlan+ based system
6+ SMs would give enough Tensor cores to guarantee use of DLSS at its most effective at the hardware level, where 4SMs may be limited to DLSS Performance (4x) unless paired with a DLA, which would increase costs further

NineTailSage · Feb 11, 2022

ReddDreadtheLead said:
Mmmm, I’m going to have to disagree with this part. We are already assuming that the customized chip will be based off of ORIN, correct? These customizations, as said in here, in the last thread and in the thread before that, extended to removing features that are central to the ORIN SoC, such as the automotive features, and changing the CPU cores from the A78AE to the A78C. We have already, by our own admission, heavily deviated from ORIN's established design when we start proposing other ideas, so why would the SM count be off the table? It is just as much a part of this scenario as the others.

ORIN by design is for AI and automotive. As soon as you remove the AI and automotive parts in the SoC, you already aren’t following ORIN’s design. It’s an admission.

On another note: Something not discussed here is that, what if ORIN S is T239 which is the Dane Chip?

Everything we've discussed up until now is all speculation and kopite7kimi is the only figure of authority to pull from on Dane.
Even them removing all of the automotive specific hardware is just speculation, for all we know Nintendo were playing around with Xavier seeing all of the things they can do in development. I've made the argument for the possibilities of that dedicated hardware being used in a Head-mounted VR solution in the future for head and controller tracking via cameras and sensors.

On another note, after reading over the GA102 whitepaper, it seems Nvidia now ties the ROPs to the GPCs and there are 2 banks of 8 ROPs per GPC. In the case of the RTX 3080, it uses 6 GPCs but only has 34 TPCs activated (which equates to 68 SMs).

I definitely agree with Thraktor on the modifications of the SM counts from GA102-107, and I guess this was done to keep ROPs higher for better performance on cards with decent amounts of CUDA cores. Even so, we haven't seen anything below 8 SMs per GPC across their line-up for Ampere.
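The RTX 3080 figures cited above can be checked with a short Python sketch. It uses only the numbers given in the post; the 2 SMs per TPC ratio is inferred from "34 TPCs equates to 68 SMs", and the constant names are made up for illustration.

```python
# Figures from the post about the RTX 3080 (a cut-down GA102).
ACTIVE_GPCS = 6
ACTIVE_TPCS = 34
SMS_PER_TPC = 2          # inferred from 34 TPCs -> 68 SMs
ROP_BANKS_PER_GPC = 2
ROPS_PER_BANK = 8

# SM count follows from the number of active TPCs.
active_sms = ACTIVE_TPCS * SMS_PER_TPC

# With ROPs tied to GPCs, the ROP count follows from the number of active GPCs.
rops = ACTIVE_GPCS * ROP_BANKS_PER_GPC * ROPS_PER_BANK

print(f"Active SMs: {active_sms}")
print(f"ROPs: {rops}")
```

Because the ROP count scales with GPCs under this arrangement, the number of GPCs affects ROP throughput independently of how many SMs each GPC holds.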

ArchedThunder · Feb 11, 2022

Dakhil said:
(I don't agree with Jeff Grubb that yesterday's Nintendo Direct and the DLC for Mario Kart 8 Deluxe are necessarily indicators about more powerful hardware not coming until 2024 at the earliest.)

This is off-topic, but Nvidia posted a job listing a couple of days ago for the Director of Architecture, CPU position at Nvidia's office in Yokneam Illit, Israel. Considering that Nvidia terminated the attempt to acquire Arm a couple of days ago, I wonder if Nvidia plans to continue using Arm's Neoverse designs and/or Arm's Cortex-A designs for datacentre and/or consumer products, or if Nvidia plans to return to designing custom Arm-based CPUs for datacentre and/or consumer products.

If I have to deal with blurry Switch games and bad framerates for another ~3 years I'm going to pull my hair out, which will be extra painful since I shave my head.

Mildudon · Feb 11, 2022

The PS2 launched in 2000. The PS3 launched in 2006. The PS2 was discontinued in 2013.

Simba1 · Feb 11, 2022

Mario Kart 8 DLC could signify a longer wait for Nintendo Switch 2

Nintendo expects to sell around 4 million Switch systems during the three-month period ending March 31.

venturebeat.com

"Nintendo plans to roll them out in six waves comprised of eight courses each. These waves will begin hitting Switch in March and continue through the end of 2023. And that timing is key because it’s probably the earliest that fans could expect to see a full successor to the Switch".

"The dreams — or maybe they were hallucinations — of the Switch Pro are dead. Nintendo is almost certainly instead looking to launch a follow-up to the Switch — although that will likely be hardware that maintains the formfactor and momentum of what is now Nintendo’s best-selling home console of all time".

"And that probably puts a Switch 2 launch in March 2024 at the earliest and probably closer to November 2024".

I think this is very interesting, because people are saying that Jeff is one of the sources that was saying that "Pro" is coming in 2022.
This clearly indicates that not only does he not believe that any more, but it seems he thinks that "Pro" is Switch 2 now and that it will not be out before 2024.

I would love to hear if Nate has new info.

mariodk18 · Feb 11, 2022

I'm just looking forward to the next Mochizuki scoop. What's the deal with the dev kits that are out there?
