• Hey everyone, staff have documented a list of banned content and subject matter that we feel are not consistent with site values, and don't make sense to host discussion of on Famiboards. This list (and the relevant reasoning per item) is viewable here.

StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

Nvidia's CFO saying supply is good is a whole lot better than Nvidia's CFO saying supply is bad. 'Cause that would be really fucking bad.
Understood. I thought you were commenting on the Bloomberg source.

In this case Nvidia is fabless, and imho they have little incentive to lie about supply issues.
 
It's funny how they try to show DLSS in action while you can't really spot any differences in a livestream over YouTube

Seriously. It's funny people complain about screen resolution on small devices when the difference is almost impossible to notice. I stopped watching Digital Foundry on my phone (1440p screen) because I really couldn't tell the difference when watching their comparisons. Doom Eternal on Switch looked amazing when I saw the first trailer on it. I watched it on my monitor to notice the flaws.
 
Seriously. It's funny people complain about screen resolution on small devices when the difference is almost impossible to notice. I stopped watching Digital Foundry on my phone (1440p screen) because I really couldn't tell the difference when watching their comparisons. Doom Eternal on Switch looked amazing when I saw the first trailer on it. I watched it on my monitor to notice the flaws.
Digital Foundry also brings up the "400x zoom" issue, if you can call it that. When that's what's required to separate games, there's no difference beyond academics. As far as Nintendo ports go, they're fine; they just need to fix the clarity issue, which will come with Dane
 
Digital Foundry also brings up the "400x zoom" issue, if you can call it that. When that's what's required to separate games, there's no difference beyond academics. As far as Nintendo ports go, they're fine; they just need to fix the clarity issue, which will come with Dane
Even Digital Foundry has mentioned how we seem to now live in a "post-resolution" landscape after numerous assessments of DLSS 2.x.

It will now be about image quality and effects from here on out.
 
Even Digital Foundry has mentioned how we seem to now live in a "post-resolution" landscape after numerous assessments of DLSS 2.x.

It will now be about image quality and effects from here on out.

What a day it'll be when Nintendo is right there with the rest. It'll never come out on top in effects comparisons, but frankly I don't care. No longer having to worry about resolution and framerate with every new title will be so so good.
 
What a day it'll be when Nintendo is right there with the rest. It'll never come out on top in effects comparisons, but frankly I don't care. No longer having to worry about resolution and framerate with every new title will be so so good.
now you just have to worry about poor implementation of DLSS!
 
It's why I believe that, if devs target 720p at 30-60 FPS docked for the next model and ~540p at 30-60 FPS in handheld mode, then the ports should function pretty well in the long run.

Using some extra headroom in handheld mode just to get the image as close to native as possible (via DLSS). It's a small display, so even if it doesn't hit 720p it should look mostly fine to most people. Hell, it's hard to spot the difference on my phone, and it's around the same size as the OLED display.

I think that's decently realistic. It shouldn't be CPU or GPU limited so much in these scenarios, and if it has an L3 cache that the GPU can utilize, it can help it punch a bit above its weight and deal with the LPDDR5/X limitations.

I do think they'll go for something like an L3, or some exotic embedded RAM that helps improve performance while not consuming too much power, since that's what they want: rather than extra performance, they can use it for extra efficiency.


Orin has an L3 for the GPU portion, I think? Edit: it does not, but it has 4MB of system-level cache. Not sure if they can repurpose that for the GPU as well.
 
Honestly, if they can get DLSS to work, it will be a nice, cheap way to get extra performance out of the chipset, and it will likely look about as good as the native resolution of whatever it's targeting, even if it's not perfect. That's really the idea; otherwise a lot of the GPU die is going to be wasted if they are using the Dane chip we are speculating about. They'd be better off just getting a custom chip and stuffing it full of CUDA cores.
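To put rough numbers on what that targeting could mean, here's a minimal sketch using the commonly cited DLSS 2.x per-axis scale factors (the exact factors Nintendo/Nvidia would ship are obviously unknown, so treat these as illustrative only):

# Rough DLSS render-resolution math (assumed DLSS 2.x per-axis scale factors).
DLSS_MODES = {
    "Quality": 0.667,
    "Balanced": 0.58,
    "Performance": 0.5,
    "Ultra Performance": 0.333,
}

def render_res(out_w, out_h, mode):
    s = DLSS_MODES[mode]
    return round(out_w * s), round(out_h * s)

for out_w, out_h in [(1280, 720), (1920, 1080), (3840, 2160)]:
    for mode in DLSS_MODES:
        w, h = render_res(out_w, out_h, mode)
        print(f"{out_w}x{out_h} {mode}: renders internally at ~{w}x{h}")

So a 1080p docked output in Performance mode would be rendering around 960x540 internally, which is why the ~540p handheld / 720p docked targets above don't sound crazy.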
 
It's why I believe that, if devs target 720p at 30-60 FPS docked for the next model and ~540p at 30-60 FPS in handheld mode, then the ports should function pretty well in the long run.

Using some extra headroom in handheld mode just to get the image as close to native as possible (via DLSS). It's a small display, so even if it doesn't hit 720p it should look mostly fine to most people. Hell, it's hard to spot the difference on my phone, and it's around the same size as the OLED display.

I think that's decently realistic. It shouldn't be CPU or GPU limited so much in these scenarios, and if it has an L3 cache that the GPU can utilize, it can help it punch a bit above its weight and deal with the LPDDR5/X limitations.

I do think they'll go for something like an L3, or some exotic embedded RAM that helps improve performance while not consuming too much power, since that's what they want: rather than extra performance, they can use it for extra efficiency.


Orin has an L3 for the GPU portion, I think? Edit: it does not, but it has 4MB of system-level cache. Not sure if they can repurpose that for the GPU as well.

So just for reference, Orin has 192KB of L1$ per SM and 4MB of L2$ overall.
The better question is whether Nintendo and Nvidia can customize that L2$ amount even further to eliminate the need for higher memory bandwidth.

How close could they possibly get to AMD's RX 6500 XT and 6400 cards, which have 16 and 12 CUs respectively with 16MB of Infinity Cache?
The increase in die size would have to be taken into account, and whether that is the better overall solution versus just increasing RAM, which also affects the space of components and thermals.
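Just to illustrate the bandwidth side of that trade-off, here's a toy model (not how Nvidia actually sizes caches, and the traffic figure is made up): the whole point of a bigger on-die cache is that only the misses have to go out to LPDDR5/X.

# Toy model: DRAM bandwidth demand vs. on-die cache hit rate.
def dram_traffic(total_traffic_gbs, hit_rate):
    # Only cache misses go out to LPDDR5/X.
    return total_traffic_gbs * (1.0 - hit_rate)

total = 200.0  # GB/s of raw GPU memory traffic (made-up figure)
for hit_rate in (0.3, 0.5, 0.7):
    print(f"hit rate {hit_rate:.0%}: ~{dram_traffic(total, hit_rate):.0f} GB/s actually hits DRAM")

A larger L2$/Infinity-Cache-style pool mostly buys you a higher hit rate, which is why it can substitute (partly) for raw memory bandwidth.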
 
Honestly, if they can get DLSS to work, it will be a nice, cheap way to get extra performance out of the chipset, and it will likely look about as good as the native resolution of whatever it's targeting, even if it's not perfect. That's really the idea; otherwise a lot of the GPU die is going to be wasted if they are using the Dane chip we are speculating about. They'd be better off just getting a custom chip and stuffing it full of CUDA cores.

I don't think this is really the case. Yes, in comparison to high-end graphics cards Dane wouldn't measure up in RT lighting, but Nintendo would no doubt find other creative ways to use that hardware with interactive gameplay ideas. Again, I could really see them use that hardware to push 3D spatial audio as a modern next-gen feature over lighting in this next Switch revision.
 
I don't think this is really the case. Yes, in comparison to high-end graphics cards Dane wouldn't measure up in RT lighting, but Nintendo would no doubt find other creative ways to use that hardware with interactive gameplay ideas. Again, I could really see them use that hardware to push 3D spatial audio as a modern next-gen feature over lighting in this next Switch revision.
They might be able to do some reflections, like we see with Crysis Remastered. There's also RTXGI, which works on non-RT-accelerated hardware too, so that can also utilize the RT cores to some capacity. The question is how much of a speed-up we'll see.
 
So after getting my OLED Switch, I actually do think they'll up the resolution of the screen for Dane.
I was very much in the camp that they would stick with 720p, but now I think they'd probably increase it to 1600x900 or full HD.

I'd still be very fine with a 720p display of course as I think it is fine. And I guess it all boils down to what screens and resolutions are available and affordable by the time they're ready to assemble.
 
So just for reference, Orin has 192KB of L1$ per SM and 4MB of L2$ overall.
The better question is whether Nintendo and Nvidia can customize that L2$ amount even further to eliminate the need for higher memory bandwidth.

How close could they possibly get to AMD's RX 6500 XT and 6400 cards, which have 16 and 12 CUs respectively with 16MB of Infinity Cache?
The increase in die size would have to be taken into account, and whether that is the better overall solution versus just increasing RAM, which also affects the space of components and thermals.
I think 4-8MB of L3 (I know, not L2) could be enough for the whole GPU, performance-wise.


Or Nintendo can opt for embedded RAM to function as a cache that both the GPU and CPU use, like the Wii U did. With that, perhaps 16-32MB?

Though that consumes a gargantuan amount of die space lol
 
I wanted to get to this post before, but forgot and then remembered now

22.1mm = 2000 pixels

20.8mm = 1495 pixels

GPU (2048 cores) + cache memory + interconnect = ~861x1142 pixels

9.5mm x 15.88mm = ~150.86mm^2 (2048 cores)

CPU + caches = ~693x450

7.65mm x 6.26mm = ~47.89mm^2 die space (12 cores)

This does not include the DLA, MCs, PVA, etc., just the GPU + Cache and the CPU+Cache.

If shortened/ made smaller, perhaps?

i.e., 512-1024 CUDA Cores + 8 CPU cores?

9.5mm x ~7.94mm = ~75.47mm^2 (1024 CUDA Cores)

9.5mm x ~5.95mm = ~56.603mm^2 (768 CUDA Cores)

9.5mm x ~3.96mm = ~37.64mm^2 (512 CUDA Cores)

This is about how much of the die these would occupy for the shaders + the cache

+ the interconnect, if it remains at roughly 45MTr per mm^2. Though, as GA106 has shown, they can be a bit denser (48MTr per mm^2), so can't they make it even a bit denser to fit as much as possible in a small package? Unsure. It is supposed to be a derivative, correct? So it likely doesn't have to follow the specifications to a T.

~7.65mm x ~4.21mm = ~32.249mm^2 (8 CPU Cores)

~3.82mm x ~6.26mm = ~23.96mm^2 (6 CPU Cores)

~3.82mm x ~4.17mm = 15.94mm^2 (4 CPU Cores + no little cores)
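If anyone wants to sanity-check the scaling above, here's the same arithmetic as a quick script. It just amortizes the measured GPU and CPU block areas linearly per core, which ignores any fixed/shared logic, so the outputs are ballpark only:

# Sanity check of the die-area estimates above (dimensions are rough die-shot reads).
gpu_2048 = 9.5 * 15.88        # ~150.9 mm^2 for 2048 CUDA cores + cache + interconnect
cpu_12 = 7.65 * 6.26          # ~47.9 mm^2 for 12 CPU cores + caches

mm2_per_cuda = gpu_2048 / 2048   # amortized area per CUDA core
mm2_per_cpu = cpu_12 / 12        # amortized area per CPU core

for cores in (512, 768, 1024):
    print(f"{cores} CUDA cores: ~{cores * mm2_per_cuda:.1f} mm^2")
for cores in (4, 6, 8):
    print(f"{cores} CPU cores: ~{cores * mm2_per_cpu:.1f} mm^2")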

When looking at the Tegra X1 die shot, you mentioned that the SM logic extends beyond that, but that raises the question: even with the smallest configuration of, say, 4 SMs + 4 CPU cores, wouldn't this extend beyond 100mm^2 as a die size regardless, unless they go with 2 SMs again? And 4 cores? The 4 CPU cores are closest to the 4 A57 CPU cores in die space, per Locuza noting it to be around the 13mm^2 range.



And it was brought up to me: why do they have to limit it to 100mm^2 or lower? I understand having a small chip, but I don't understand why they necessarily need to in this context. If they stretch it to, say, 120? 130? 140? 150? Or hell, 160mm^2 if they need to, what would the issue really be? I understand that a smaller chip means a lower cost and more undamaged chips per wafer, since each die is separate and individual, but I don't think the penalty is that severe for a smaller chip like this compared to the PS5 or Series X APUs, which are far bigger and where a damaged die is much more detrimental. I'm not really saying that they should go for 160, mind you; I just don't see why they need to be at 100mm^2 or less here.


It’s not like there isn’t a possible benefit of using a larger chip while having it at lower clocks which we reasonably expect, in that it could be easier to cool due to the higher amount of surface area.


If it is the highest config (8 SMs + 8 CPU cores) and it uses the lowest density, I suspect it would be ~160mm^2 for the total die size with their own modifications. The PS5 and Series X APUs make good use of the area taken up by the GPU, CPU, memory controllers, media engines, etc., and don't seem to spend too much on the logic; isn't that possible here as well for Dane?

And then there is the physical space of the Switch: the OG TX1, even being 18% larger than Mariko, didn't take up that much space, and Nintendo even managed to squeeze the overall package smaller with the OLED revision, or rather, more was kept/merged together at the cost of a smaller fan (because it didn't need to be that large?).

So, again, I fail to see why they have to keep the die so small; they seem to have a bit more elbow room here and can squeeze a bit more without needing to get blood from a stone. Mind you, again, I don't think it will be that large, I just don't see why it needs to be so small. Or why it would be that small.

Would it really be that much more expensive going from a 100mm^2 die to, say, 120, 140 or even 160?

And this is assuming that it isn't denser than the desktop equivalent at all, even though the GA106 die is denser than its sister dies.
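On the 100mm^2 vs 160mm^2 cost question a couple of paragraphs up, here's a very rough back-of-envelope using the standard dies-per-wafer approximation and a simple Poisson yield model. The wafer price and defect density below are pure placeholders (I have no real 8nm figures), so only the relative trend means anything:

import math

WAFER_DIAMETER_MM = 300.0
WAFER_COST = 5000.0      # assumed placeholder price per wafer, not a real quote
DEFECT_DENSITY = 0.001   # assumed defects per mm^2 (i.e. 0.1 per cm^2)

def dies_per_wafer(area_mm2):
    d = WAFER_DIAMETER_MM
    # Standard approximation: gross dies minus edge losses.
    return math.pi * (d / 2) ** 2 / area_mm2 - math.pi * d / math.sqrt(2 * area_mm2)

def poisson_yield(area_mm2):
    return math.exp(-area_mm2 * DEFECT_DENSITY)

for area in (100, 120, 140, 160):
    good = dies_per_wafer(area) * poisson_yield(area)
    print(f"{area} mm^2: ~{good:.0f} good dies/wafer, ~${WAFER_COST / good:.2f} per good die")

Under those made-up assumptions the cost per good die rises faster than the area does, but it's still a question of a few dollars per unit at these sizes, not the brutal scaling you see on PS5/Series X-class dies.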


Firstly I wouldn't say 100mm2 is a hard limit on size, just that my expectations are for something around that size. It could be around 120mm2, maybe smaller, but I wouldn't expect anything significantly bigger.

As for the reasons to expect a die around that size, there are two. Firstly cost, which should be obvious enough, as the SoC is the most expensive component in a gaming device, and a smaller die is cheaper than a bigger die. The second is power consumption.

Now, you'll probably have heard many people (including myself) point out that larger chips can in general reduce power consumption at the same performance level by allowing for reduced clocks. This is due to the power consumption of an integrated circuit being proportional to the square of the voltage being supplied, so as you increase voltage (to achieve higher clocks) the power consumption increases faster than the performance. Hence, a (let's say) 8SM GPU running at 1GHz should consume less power than a 4SM GPU running at 2GHz, while providing effectively the same performance, all other things being equal.

The problem in this case is that this only applies to higher clock speeds. At low clock speeds, there's quite a different behaviour, as there's a minimum voltage required for the chip to actually operate, so as you reduce clocks to zero the voltage doesn't go down to zero, it goes down to this minimum voltage. In practice there is some maximum clock speed which can be attained at this minimum voltage, and there's very little reason to clock lower than this. For example, if a chip could hit 300MHz at its minimum voltage, then reducing the clock to 150MHz would consume almost as much power, but give you 50% less performance, so would be a net loss in power efficiency. This is the reason, when you check clock speeds on your PC or phone, you don't see CPU or GPU clocks dropping down to 1MHz when idle. There's some minimum clock speed which they will idle at where there are no meaningful power savings to be had to clock lower.

Bringing this back to Switch, the most important power limit for Nintendo and Nvidia when designing the new SoC isn't how much power is consumed while at high clock speeds (ie docked), it's how much power is consumed at low clock speeds (ie handheld). Nintendo will want the new model to have some basic level of battery life, which means they're going to set strict design limits on handheld-mode power consumption for the SoC. For the GPU side of things, this makes the minimum voltage a key design constraint; they can't go lower than that, so they can't use a GPU that consumes too much power at that minimum voltage.

More specifically, we can actually see from Nvidia's DVFS tables that the base 384MHz handheld clock speed used by the Switch is already running at the lowest voltage on the 16nm Mariko (although not quite on the original 20nm TX1). With the move to 8nm, it's reasonable to assume that this max-clock-on-min-voltage will increase again, let's say to around 500MHz. Let's assume that Nintendo have budgeted 3W for the GPU in handheld mode. If 4SMs at 500MHz consumes around 3W, then they simply can't use a larger GPU without breaking their power budget. Increasing to 6SMs or 8SMs, even if clocks were decreased to 300MHz or 250MHz respectively, would still consume more power, as the savings from the reduced clocks wouldn't be anywhere near enough to offset the larger GPUs.

Note that the CPU is also impacted by the same behaviour, although it's a bit different as the clocks will likely still be the same across handheld and docked modes. Ditto with basically all other hardware on the SoC, like security coprocessors, DSPs, etc.

Now, I don't know exactly the power budget Nintendo will allocate to the SoC in handheld for the new device, although it's likely to be within the ballpark of the original Switch and the 2019 Mariko revision. I also don't know the exact minimum voltages or corresponding clock speeds, or power consumption thereof of the new GPU, CPU, etc. However, it's pretty safe to say that it's roughly proportional to die size, as a large proportion of power consumption of ICs at low clocks is static power (ie leakage power), which is basically directly proportional to die area. This is why high-end smartphone SoCs are designed as small chips using high-density libraries, as opposed to low-density libraries used by desktop parts, as they spend a lot of time idling at low clocks, and high density libraries mean smaller chips, which means less static power consumption, which means better battery life.
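If it helps to visualize why wide-and-slow stops paying off at the bottom of the DVFS curve, here's a toy model of the argument above. Every number in it is invented for illustration (nothing is a real Dane/Orin/Mariko figure); it just applies P ≈ P_static + C·V²·f with a voltage floor:

# Toy GPU power model: dynamic power scales with V^2 * f, but V can't go below V_MIN.
V_MIN = 0.60         # assumed minimum operating voltage (V)
V_MAX = 1.00
F_AT_VMIN = 500e6    # assumed max clock reachable at V_MIN (Hz)
F_MAX = 1.5e9

def voltage_for_clock(f_hz):
    if f_hz <= F_AT_VMIN:
        return V_MIN   # below this clock the voltage can't drop any further
    t = (f_hz - F_AT_VMIN) / (F_MAX - F_AT_VMIN)
    return V_MIN + t * (V_MAX - V_MIN)   # crude linear V/f curve above the floor

def gpu_power_w(sms, f_hz, c_per_sm=2.5e-9, static_per_sm=0.3):
    v = voltage_for_clock(f_hz)
    dynamic = sms * c_per_sm * v ** 2 * f_hz   # P_dyn ~ C * V^2 * f
    static = sms * static_per_sm               # leakage scales roughly with die area
    return dynamic + static

for sms, f in [(4, 500e6), (6, 333e6), (8, 250e6)]:
    print(f"{sms} SMs @ {f / 1e6:.0f} MHz: ~{gpu_power_w(sms, f):.1f} W")

All three configurations deliver roughly the same SM-clock product, but because the voltage is already pinned at the floor, the wider GPUs only add static power rather than saving any dynamic power, which is exactly the handheld-mode problem described above.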
 
Thank you, @Thraktor, for joining us again.
It is a bit sad that there isn't much new to discuss. I wonder, though: what chip would be your favourite for the next Switch, and if you can't name one, what essential qualities should it have?

You just mentioned consumption at idle clock speeds and how it dictates that the die stay small. Are there any other constraints you can see limiting the design of the chip?
 
Too high to make it work properly to limit the die size so much

Yeah, I agree. It probably has to be said that the structure of the Orin architecture is 8 SMs in a GPC, at which point it would be better for Nintendo to just use an 8SM part at lower clocks vs a cut-down 4SM part (which would essentially be similar in die size to the 8SM part but with higher clocks).
 
Yeah, I agree. It probably has to be said that the structure of the Orin architecture is 8 SMs in a GPC, at which point it would be better for Nintendo to just use an 8SM part at lower clocks vs a cut-down 4SM part (which would essentially be similar in die size to the 8SM part but with higher clocks).
Yeah as cutting above or below 8SMs would be wasting die space more or less because the rest of the GPC would have to be there still.

And unless NVIDIA plans on releasing a true Smartphone SoC in 2022/2023, I doubt they would customize the GPC Layout to work with 4SMs just for Nintendo (Not to mention it would make the SoC More expensive than 8SMs still)
 
Yeah as cutting above or below 8SMs would be wasting die space more or less because the rest of the GPC would have to be there still.

And unless NVIDIA plans on releasing a true Smartphone SoC in 2022/2023, I doubt they would customize the GPC Layout to work with 4SMs just for Nintendo (Not to mention it would make the SoC More expensive than 8SMs still)

I think that's the big difference here in this case when kopite7kimi tells us that Dane is based on Orin, we have the specs for the full Orin chip and can speculate what that might look like. With the TX1 that was the full chip at that time and there was nothing else larger in the Tegra department...
 
6SM might still be a good middle ground
Then you'd have 2SMs of wasted space.

The only way it's "6SMs" is if they use Rapid Core Scaling at the kernel level of the OS to force 2SMs' worth of GPU cores off and give the cache from those cores to the remainder.

And we can't predict the benefits of that.
 
Firstly I wouldn't say 100mm2 is a hard limit on size, just that my expectations are for something around that size. It could be around 120mm2, maybe smaller, but I wouldn't expect anything significantly bigger.

As for the reasons to expect a die around that size, there are two. Firstly cost, which should be obvious enough, as the SoC is the most expensive component in a gaming device, and a smaller die is cheaper than a bigger die. The second is power consumption.

Now, you'll probably have heard many people (including myself) point out that larger chips can in general reduce power consumption at the same performance level by allowing for reduced clocks. This is due to the power consumption of an integrated circuit being proportional to the square of the voltage being supplied, so as you increase voltage (to achieve higher clocks) the power consumption increases faster than the performance. Hence, a (let's say) 8SM GPU running at 1GHz should consume less power than a 4SM GPU running at 2GHz, while providing effectively the same performance, all other things being equal.

The problem in this case is that this only applies to higher clock speeds. At low clock speeds, there's quite a different behaviour, as there's a minimum voltage required for the chip to actually operate, so as you reduce clocks to zero the voltage doesn't go down to zero, it goes down to this minimum voltage. In practice there is some maximum clock speed which can be attained at this minimum voltage, and there's very little reason to clock lower than this. For example, if a chip could hit 300MHz at its minimum voltage, then reducing the clock to 150MHz would consume almost as much power, but give you 50% less performance, so would be a net loss in power efficiency. This is the reason, when you check clock speeds on your PC or phone, you don't see CPU or GPU clocks dropping down to 1MHz when idle. There's some minimum clock speed which they will idle at where there are no meaningful power savings to be had to clock lower.

Bringing this back to Switch, the most important power limit for Nintendo and Nvidia when designing the new SoC isn't how much power is consumed while at high clock speeds (ie docked), it's how much power is consumed at low clock speeds (ie handheld). Nintendo will want the new model to have some basic level of battery life, which means they're going to set strict design limits on handheld-mode power consumption for the SoC. For the GPU side of things, this makes the minimum voltage a key design constraint; they can't go lower than that, so they can't use a GPU that consumes too much power at that minimum voltage.

More specifically, we can actually see from Nvidia's DVFS tables that the base 384MHz handheld clock speed used by the Switch is already running at the lowest voltage on the 16nm Mariko (although not quite on the original 20nm TX1). With the move to 8nm, it's reasonable to assume that this max-clock-on-min-voltage will increase again, let's say to around 500MHz. Let's assume that Nintendo have budgeted 3W for the GPU in handheld mode. If 4SMs at 500MHz consumes around 3W, then they simply can't use a larger GPU without breaking their power budget. Increasing to 6SMs or 8SMs, even if clocks were decreased to 300MHz or 250MHz respectively, would still consume more power, as the savings from the reduced clocks wouldn't be anywhere near enough to offset the larger GPUs.

Note that the CPU is also impacted by the same behaviour, although it's a bit different as the clocks will likely still be the same across handheld and docked modes. Ditto with basically all other hardware on the SoC, like security coprocessors, DSPs, etc.

Now, I don't know exactly the power budget Nintendo will allocate to the SoC in handheld for the new device, although it's likely to be within the ballpark of the original Switch and the 2019 Mariko revision. I also don't know the exact minimum voltages or corresponding clock speeds, or power consumption thereof of the new GPU, CPU, etc. However, it's pretty safe to say that it's roughly proportional to die size, as a large proportion of power consumption of ICs at low clocks is static power (ie leakage power), which is basically directly proportional to die area. This is why high-end smartphone SoCs are designed as small chips using high-density libraries, as opposed to low-density libraries used by desktop parts, as they spend a lot of time idling at low clocks, and high density libraries mean smaller chips, which means less static power consumption, which means better battery life.
So is an 8SM Orin NX with identical clock speeds to the current Switch models and the battery life and voltage of V1 possible?

I'm equally worried about whether it can even fit in 120mm^2
 
So is an 8SM Orin NX with identical clock speeds to the current Switch models and the battery life and voltage of V1 possible?

I'm equally worried about whether it can even fit..
The problem is they are bound to 8SMs of GPU size anyway due to how NVIDIA's Orin GPCs work.

So why waste the potential processing power?
 
I'm equally worried about whether it can even fit in 120mm^2
I think that depends on what motherboard Nintendo decides to use for the DLSS model*. Assuming that the DLSS model* has a very similar form factor to the OLED model, Nintendo could theoretically fit a >120 mm² SoC inside the DLSS model* if Nintendo decides to use a much smaller and more compact motherboard, relatively similar to the one on the Jetson Orin NX. (I don't expect Dane's die size to be ≥200 mm².)

Although I still believe Dane's likely to be fabricated using Samsung's 8N process node, ASML's update on the damage caused by the fire at ASML Berlin is definitely concerning. It's especially concerning for a hypothetical scenario where Nintendo plans to release a refresh of the DLSS model* in 2024 (assuming the DLSS model* does launch in holiday 2022), running on a refresh of Dane fabricated on a process node more advanced than Samsung's 8N, such as Samsung's 5LPP process node as one example.

  • The manufacturing of DUV components has been restarted. Although there was some disruption regarding components for DUV, we expect to remediate this in such a way that it will not affect our output and revenue plan for DUV
  • As to EUV, the fire affected part of the production area of the wafer clamp, a module in our EUV systems. We are still in the process of completing the recovery plan for this production area and determining how to minimize any potential impact for our EUV customers, both in our output plan and in our field service.
 
Yeah as cutting above or below 8SMs would be wasting die space more or less because the rest of the GPC would have to be there still.

And unless NVIDIA plans on releasing a true Smartphone SoC in 2022/2023, I doubt they would customize the GPC Layout to work with 4SMs just for Nintendo (Not to mention it would make the SoC More expensive than 8SMs still)
Wasting space isn't anything new. The Tegra X1 has 4 unusable A53 cores.

Thraktor's point can't just be handwaved by the engineering team; if 8SMs use too much power at the minimum voltage, then they use too much power. There's no getting around that just because getting to use all 8 would be more convenient, not even with downclocking; as he said, that only works to a certain point. In that scenario, either Nvidia does engineer a smaller GPC configuration for Nintendo (not unbelievable imo), or they fuse off a few SMs.

Of course, they may not use too much power. We can't know from here.
 
Firstly I wouldn't say 100mm2 is a hard limit on size, just that my expectations are for something around that size. It could be around 120mm2, maybe smaller, but I wouldn't expect anything significantly bigger.

As for the reasons to expect a die around that size, there are two. Firstly cost, which should be obvious enough, as the SoC is the most expensive component in a gaming device, and a smaller die is cheaper than a bigger die. The second is power consumption.

Now, you'll probably have heard many people (including myself) point out that larger chips can in general reduce power consumption at the same performance level by allowing for reduced clocks. This is due to the power consumption of an integrated circuit being proportional to the square of the voltage being supplied, so as you increase voltage (to achieve higher clocks) the power consumption increases faster than the performance. Hence, a (let's say) 8SM GPU running at 1GHz should consume less power than a 4SM GPU running at 2GHz, while providing effectively the same performance, all other things being equal.

The problem in this case is that this only applies to higher clock speeds. At low clock speeds, there's quite a different behaviour, as there's a minimum voltage required for the chip to actually operate, so as you reduce clocks to zero the voltage doesn't go down to zero, it goes down to this minimum voltage. In practice there is some maximum clock speed which can be attained at this minimum voltage, and there's very little reason to clock lower than this. For example, if a chip could hit 300MHz at its minimum voltage, then reducing the clock to 150MHz would consume almost as much power, but give you 50% less performance, so would be a net loss in power efficiency. This is the reason, when you check clock speeds on your PC or phone, you don't see CPU or GPU clocks dropping down to 1MHz when idle. There's some minimum clock speed which they will idle at where there are no meaningful power savings to be had to clock lower.

Bringing this back to Switch, the most important power limit for Nintendo and Nvidia when designing the new SoC isn't how much power is consumed while at high clock speeds (ie docked), it's how much power is consumed at low clock speeds (ie handheld). Nintendo will want the new model to have some basic level of battery life, which means they're going to set strict design limits on handheld-mode power consumption for the SoC. For the GPU side of things, this makes the minimum voltage a key design constraint; they can't go lower than that, so they can't use a GPU that consumes too much power at that minimum voltage.

More specifically, we can actually see from Nvidia's DVFS tables that the base 384MHz handheld clock speed used by the Switch is already running at the lowest voltage on the 16nm Mariko (although not quite on the original 20nm TX1). With the move to 8nm, it's reasonable to assume that this max-clock-on-min-voltage will increase again, let's say to around 500MHz. Let's assume that Nintendo have budgeted 3W for the GPU in handheld mode. If 4SMs at 500MHz consumes around 3W, then they simply can't use a larger GPU without breaking their power budget. Increasing to 6SMs or 8SMs, even if clocks were decreased to 300MHz or 250MHz respectively, would still consume more power, as the savings from the reduced clocks wouldn't be anywhere near enough to offset the larger GPUs.

Note that the CPU is also impacted by the same behaviour, although it's a bit different as the clocks will likely still be the same across handheld and docked modes. Ditto with basically all other hardware on the SoC, like security coprocessors, DSPs, etc.

Now, I don't know exactly the power budget Nintendo will allocate to the SoC in handheld for the new device, although it's likely to be within the ballpark of the original Switch and the 2019 Mariko revision. I also don't know the exact minimum voltages or corresponding clock speeds, or power consumption thereof of the new GPU, CPU, etc. However, it's pretty safe to say that it's roughly proportional to die size, as a large proportion of power consumption of ICs at low clocks is static power (ie leakage power), which is basically directly proportional to die area. This is why high-end smartphone SoCs are designed as small chips using high-density libraries, as opposed to low-density libraries used by desktop parts, as they spend a lot of time idling at low clocks, and high density libraries mean smaller chips, which means less static power consumption, which means better battery life.
Thank you for getting back to me on this. Are you ruling out the possibility of a denser battery? I'm aware that there are denser batteries these days that fit a similar space to the current Switch's, but those are in smartphones like the later Samsung Galaxy models. The likelihood of them opting for a denser battery while raising clocks to the point where it scales to the most performance per watt is unknown, and I will admit it's not something that can be depended on as a certainty. Nor would the scaling issue essentially be removed in this case, just mitigated, and I don't know if it would mitigate it enough to be a desirable outcome. It was the discovery of a 10W profile that spurred my curiosity, but that was for the whole SoC, including the other elements as well, or maybe the whole board?
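On the battery angle, a rough back-of-envelope helps frame what a denser pack actually buys you (assuming a ~16 Wh pack like the current Switch's 4310 mAh battery; the "20 Wh" case and the system-power figures are hypothetical):

# Rough battery-life estimate: hours = battery energy / total system draw.
def battery_hours(battery_wh, soc_w, rest_of_system_w=3.0):
    # rest_of_system_w (screen, RAM, wifi, etc.) is an assumed constant
    return battery_wh / (soc_w + rest_of_system_w)

for battery_wh in (16.0, 20.0):         # current-style pack vs. a hypothetical denser one
    for soc_w in (4.0, 7.0, 10.0):      # hypothetical handheld SoC power budgets
        print(f"{battery_wh:.0f} Wh pack, {soc_w:.0f} W SoC: ~{battery_hours(battery_wh, soc_w):.1f} h")

A denser battery shifts the curve a bit, but it doesn't change the basic conclusion that handheld SoC power in the 7-10W range eats battery life very quickly.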

Yeah I agree, it probably has to be said that the structure of the Orin architecture is 8SM in a GPC, which at that point would be better for Nintendo to just use an 8SM part at lower clocks vs a cut down 4SM (that would essentially be similar in die size to the 8SM part but with higher clocks).
Yeah as cutting above or below 8SMs would be wasting die space more or less because the rest of the GPC would have to be there still.

And unless NVIDIA plans on releasing a true Smartphone SoC in 2022/2023, I doubt they would customize the GPC Layout to work with 4SMs just for Nintendo (Not to mention it would make the SoC More expensive than 8SMs still)
For the record, the organization of the GPC comes down to how Nvidia customizes and organizes it. The Tegra X1 in the Switch is Maxwell-based, and the desktop variant of Maxwell has 512 shader units per GPC. The Switch's chip has a single GPC, but it doesn't contain 512 shader units, just 256.

A single Ampere GPC contains 8SMs; cutting it down to, say, 4SMs (half, just like the TX1) would follow a similar design trend to what they did with one of their previous SoCs.
So is an 8SM Orin NX with identical clock speeds to the current Switch models and the battery life and voltage of V1 possible?

I'm equally worried about whether it can even fit in 120mm^2
It can't fit in a 120mm^2 package, but the number of SMs is the less important part imo; the more important part is whether the CPU cores can fit at the appropriate power profile.

CPU isn’t really as scalable.

The problem is they are bound to 8SMs of GPU size anyway due to how NVIDIA's Orin GPCs work.

So why waste the potential processing power?
It wouldn’t really be wasting space here, it would be sized down to fit the profile needs for the device.

analyst thread has me dooming

hardware nerds, hit me with 50ccs of hopium, stat
No switch until like 2024, can’t be as powerful because of laws of physics, think weaker.

Idk if this helps though :p
 
Thank you for getting back to me on this. Are you ruling out the possibility of a denser battery? I'm aware that there are denser batteries these days that fit a similar space to the current Switch's, but those are in smartphones like the later Samsung Galaxy models. The likelihood of them opting for a denser battery while raising clocks to the point where it scales to the most performance per watt is unknown, and I will admit it's not something that can be depended on as a certainty. Nor would the scaling issue essentially be removed in this case, just mitigated, and I don't know if it would mitigate it enough to be a desirable outcome. It was the discovery of a 10W profile that spurred my curiosity, but that was for the whole SoC, including the other elements as well, or maybe the whole board?



For the record, the organization of the GPC comes down to how Nvidia customizes and organizes it. The Tegra X1 in the Switch is Maxwell-based, and the desktop variant of Maxwell has 512 shader units per GPC. The Switch's chip has a single GPC, but it doesn't contain 512 shader units, just 256.

A single Ampere GPC contains 8SMs; cutting it down to, say, 4SMs (half, just like the TX1) would follow a similar design trend to what they did with one of their previous SoCs.

It can't fit in a 120mm^2 package, but the number of SMs is the less important part imo; the more important part is whether the CPU cores can fit at the appropriate power profile.

CPU isn’t really as scalable.


It wouldn’t really be wasting space here, it would be sized down to fit the profile needs for the device.
You don't just downsize the GPC, the GPC is a uArch thing.

That would literally cost more than using 8SMs/1GPC
 
You don't just downsize the GPC, the GPC is a uArch thing.

That would literally cost more than using 8SMs/1GPC
I mentioned it, and Jersh mentioned it as well, or alluded to it. Either they re-fit the GPC to fit the profile needed for this device, or they simply cut down four of those SMs to make it work in this profile. Nintendo probably recognizes that this may be a necessary cost, as does Nvidia, so it becomes more difficult than you think to just keep a full GPC. And it isn't like they haven't customized it before.

And deactivating/fusing off SMs of course isn't really surprising, especially for yield purposes, though it should be less significant on this device than on, say, the Series X
 
I mentioned it, and Jersh mentioned it as well, or alluded to it. Either they re-fit the GPC to fit the profile needed for this device, or they simply cut down four of those SMs to make it work in this profile. Nintendo probably recognizes that this may be a necessary cost, as does Nvidia, so it becomes more difficult than you think to just keep a full GPC. And it isn't like they haven't customized it before.

And deactivating/fusing off SMs of course isn't really surprising, especially for yield purposes, though it should be less significant on this device than on, say, the Series X
Well, my point is why fuse off SMs in a GPC if they can just turn off the CUDA cores with Rapid Core Scaling and leave the Cache free for the remaining cores to use.

That would be a better Efficiency/Performance gain rather than yeeting away 4SMs.

And even then you'd still be using the space of 1 GPC. And the thing is the Orin GPC structure is already customized IIRC for 8SMs per GPC, so they would have to customize it again for Dane which increases costs further.
 
No switch until like 2024, can’t be as powerful because of laws of physics, think weaker.

Idk if this helps though :p
For the former I'm not buying it. Everything we have still points to H2 2022.

For the latter I've been saying this for a while, yeah. Some of the expectations here have gotten a bit too high. We were told ~XB1 performance docked before DLSS and that's where I'm currently still at.
 
Wasting space isn't anything new. The Tegra X1 has 4 unusable A53 cores.

Thraktor's point can't just be handwaved by the engineering team; if 8SMs use too much power at the minimum voltage, then they use too much power. There's no getting around that just because getting to use all 8 would be more convenient, not even with downclocking; as he said, that only works to a certain point. In that scenario, either Nvidia does engineer a smaller GPC configuration for Nintendo (not unbelievable imo), or they fuse off a few SMs.

Of course, they may not use too much power. We can't know from here.

We definitely aren't discounting what he's saying because it's absolutely valid!
The problem is no one has any baseline indication of what kind of TDP the uArch will even get down to for a smaller 8SM GPU fabricated on 8nm, running under 1GHz, without all of the automotive features. So currently there are a bunch of unknowns still out there...
For the record, the organization of the GPC comes down to how Nvidia customizes and organizes it. The Tegra X1 in the Switch is Maxwell-based, and the desktop variant of Maxwell has 512 shader units per GPC. The Switch's chip has a single GPC, but it doesn't contain 512 shader units, just 256.

A single Ampere GPC contains 8SMs; cutting it down to, say, 4SMs (half, just like the TX1) would follow a similar design trend to what they did with one of their previous SoCs.

It can't fit in a 120mm^2 package, but the number of SMs is the less important part imo; the more important part is whether the CPU cores can fit at the appropriate power profile.

CPU isn’t really as scalable.


It wouldn’t really be wasting space here, it would be sized down to fit the profile needs for the device.


No switch until like 2024, can’t be as powerful because of laws of physics, think weaker.

Idk if this helps though :p

We are all probably saying very similar things, but what we are currently going by is the information kopite7kimi has given us saying that Dane would be based on Orin. If Nintendo commissions Nvidia to come up with a custom variation of the architecture that is no longer based on Orin's design, why even consider it an offshoot at that point? That would be a more expensive solution on Nintendo's end, and they might as well not even stay on the 8nm process if it's that custom (which could also very well be a possibility).

The TX1 was designed from the beginning to use 256 Maxwell cores, so we are just asking at what point, if Nintendo goes that custom, it no longer bears any resemblance to the Orin design. The TM660M (Jetson Nano) is based on the TX1 but uses half the cores and is still essentially the same die size as the TX1. What Thraktor has stated absolutely still holds for Dane's overall die size, because at some point the Ampere/Lovelace uArch becomes so inefficient at higher clocks that it just makes more sense to use a wider design. Nvidia kind of tells us as much with the Orin specs, which cap the CPU cores at 2GHz and the GPU at 1GHz to achieve their desired TDP on 8nm.

Again, we just don't know enough beyond what insiders have mentioned, but I'm not expecting Nintendo to do anything extremely custom with Nvidia anytime soon, and there's the question of how realistic it would be for such a device to have competent DLSS functionality.
 
Late 2024 would be damn near a 3 year wait since 2022 just began. That would be incredibly depressing. I've really cut back on playing Switch games at this point as they just don't look or run well on a 65" 4K OLED TV. Three years from now I imagine I will have just moved on. That would put this old hardware at 7.5+ years old and we would have PS5 Pros and the Xbox equivalent by now. Switch is already around a PS3 in power. Really is time for a huge upgrade. Don't want to wait 3 more years. Ugh.
 
Late 2024 would be damn near a 3 year wait since 2022 just began. That would be incredibly depressing. I've really cut back on playing Switch games at this point as they just don't look or run well on a 65" 4K OLED TV. Three years from now I imagine I will have just moved on. That would put this old hardware at 7.5+ years old and we would have PS5 Pros and the Xbox equivalent by now. Switch is already around a PS3 in power. Really is time for a huge upgrade. Don't want to wait 3 more years. Ugh.
yeahhh if it's 2024 I think enthusiast software sales will drop a bit

will that be enough to affect Nintendo? probably not
 
Late 2024 would be damn near a 3 year wait since 2022 just began. That would be incredibly depressing. I've really cut back on playing Switch games at this point as they just don't look or run well on a 65" 4K OLED TV. Three years from now I imagine I will have just moved on. That would put this old hardware at 7.5+ years old and we would have PS5 Pros and the Xbox equivalent by now. Switch is already around a PS3 in power. Really is time for a huge upgrade. Don't want to wait 3 more years. Ugh.

Yeah the problem that occurs as more time passes is by 2024 Nvidia will already have had another architecture (Lovelace) on a much better process in TSMC's 5nm. Which also by that time will be considered a mature enough process that most of the larger demanding companies will have moved on to 4nm, 3nm and starting on 2nm designs.

Even Nvidia themselves will have had another uArch on the market in Hopper and probably gearing up for a mid-gen refresh...
Time just doesn't stand still in the tech world and one year when something in spec could look sufficient enough in design, by the next year can easily show its age and where improvements in technology evolution has taken us.
 
by 2024, Orin will be set to be replaced and Samsung will desperately want to put 8nm out to pasture

either NateDrake is horrifically wrong, or the basis for the analyst's claim is very suspect
 
analyst thread has me dooming

hardware nerds, hit me with 50ccs of hopium, stat
If the rumored devkit timeline is correct (and there were quite a few sources suggesting they were out in the wild by early 2021 at the latest), then 2022 still looks like the most likely target, with early 2023 as a plan B.
by 2024, Orin will be set to be replaced and Samsung will desperately want to put 8nm out to pasture

either NateDrake is horrifically wrong, or the basis for the analyst's claim is very suspect
I kind of doubt technical details are factoring in to that 2024 prediction much, if at all.
 
I kind of doubt technical details are factoring in to that 2024 prediction much, if at all.
That's my problem with these predictions. Just looking at "sales" and "shortages" doesn't make for a good prediction, I think. Literally no one else is letting those things stop them, so why would Nintendo?

I don't know about that. Samsung announced an 8 nm**-based 5G radio frequency (RF) technology fairly recently, which I imagine will still be used in 2024 and beyond.
I suspect a dedicated RF chip would be significantly smaller than an SoC that consumes 10s of watts.
 
I wasn't being serious with my 2024 comment, but all the doom and gloom hasn't been helping lately, so I didn't see the need to be literal, especially with an analyst take that is just an opinion, and the community take that a new device isn't needed just because "it's selling well".


Granted, Nintendo has enough IPs to survive a while longer and be "fine". Enthusiast crowds that are for or against the next whatever-it-will-be-called forget that most people aren't enthusiasts, they're casuals.

yeahhh if it's 2024 I think enthusiast software sales will drop a bit

will that be enough to affect Nintendo? probably not
It depends solely on if their software starts to tank, and it won’t anytime soon.

For the former I'm not buying it. Everything we have still points to H2 2022.

For the latter I've been saying this for a while, yeah. Some of the expectations here have gotten a bit too high. We were told ~XB1 performance docked before DLSS and that's where I'm currently still at.
Nothing is really pointing to a 2022 release though; it's a long stretch to pin comments that don't allude to anything as being about a 2022 release. They have loads of software releasing before a device comes out, only to have…. nothing release with the device? How does that make sense?

They haven't had a software peak as far as I'm aware. Shortages don't factor into this much, as the shortages were going to be there whether they release now, later in 2023 or even in 2025. They were still going to have shortages.

Only one comment has been about a 2022 release of something, and it was just for software: a game that finished development by the second half of 2022, which is not targeting the Switch but is supposedly targeting a higher-end model. Simply having a game finish development by the second half of 2022 and having a game completely finished and ready to ship by the second half of 2022 are two different things. The former can be a game that finishes development by then but ships a few months later, like in Q1 2023.

This really where we at in the thread?
Not we, just me. And the lurkers that read but don’t comment here.

I think if I rephrase my previous statements it better reflects what I actually mean. Rather than "expecting Q1 2023 for the release", I'll rephrase it to the real meaning, "I want it to be released by Q1 2023 at the latest", which gives a more realistic idea of what's going on. Later would be disappointing, but something I would have to live with.

Well, my point is why fuse off SMs in a GPC if they can just turn off the CUDA cores with Rapid Core Scaling and leave the Cache free for the remaining cores to use.

That would be a better Efficiency/Performance gain rather than yeeting away 4SMs.

And even then you'd still be using the space of 1 GPC. And the thing is the Orin GPC structure is already customized IIRC for 8SMs per GPC, so they would have to customize it again for Dane which increases costs further.
We definitely aren't discounting what he's saying because it's absolutely valid!
The problem is no one has any baseline indication of what kind of TDP the uArch will even get down to for a smaller 8SM GPU fabricated on 8nm, running under 1GHz, without all of the automotive features. So currently there are a bunch of unknowns still out there...


We are all probably saying very similar things, but what we are currently going by is the information kopite7kimi has given us saying that Dane would be based on Orin. If Nintendo commissions Nvidia to come up with a custom variation of the architecture that is no longer based on Orin's design, why even consider it an offshoot at that point? That would be a more expensive solution on Nintendo's end, and they might as well not even stay on the 8nm process if it's that custom (which could also very well be a possibility).

The TX1 was designed from the beginning to use 256 Maxwell cores, so we are just asking at what point, if Nintendo goes that custom, it no longer bears any resemblance to the Orin design. The TM660M (Jetson Nano) is based on the TX1 but uses half the cores and is still essentially the same die size as the TX1. What Thraktor has stated absolutely still holds for Dane's overall die size, because at some point the Ampere/Lovelace uArch becomes so inefficient at higher clocks that it just makes more sense to use a wider design. Nvidia kind of tells us as much with the Orin specs, which cap the CPU cores at 2GHz and the GPU at 1GHz to achieve their desired TDP on 8nm.

Again, we just don't know enough beyond what insiders have mentioned, but I'm not expecting Nintendo to do anything extremely custom with Nvidia anytime soon, and there's the question of how realistic it would be for such a device to have competent DLSS functionality.
Right, however we have to consider that this device is assumed to be backwards compatible, and if they do not figure out a software solution, then they would need a hardware solution for BC. That would mean either including the old hardware inside it or modifying the GPU to be in a similar boat to the Series X/S and PS5, which took the custom route of an RDNA1+2 feature set to varying degrees. Regardless of what they do, there must be some degree of architectural change for this chip, as not everything is needed here (as shown by the consoles' complete lack of Infinity Cache despite it being a big player in why RDNA2 has such perf gains), but it cannot be a huge modification that alters it the way Pascal differs from Turing, which has Tensor cores and ray tracing cores among other things.

If we are to believe that they split the R&D for Orin, why would a modification for them be off the table as well? I understand making a new architecture is very expensive, but making tweaks that alter it to a degree should not be impossible, if that makes sense.

Like if Nintendo adds a large embedded memory that is shared between the GPU and CPU, that is already an architectural change right there.
 
Nothing is really pointing to a 2022 release though; it's a long stretch to pin comments that don't allude to anything as being about a 2022 release. They have loads of software releasing before a device comes out, only to have…. nothing release with the device? How does that make sense?
Huh? We've had comments from multiple insiders including Nate, Mochizuki, Grubb and even DF talking about 2022 being the plan for the Dane Switch.

As for software we have no idea when anything but Arceus is releasing. It's very possible that BotW2, Xenoblade Chronicles 3 and Bayonetta 3 are all Dane launch window titles. Splatoon 3 is a Summer game for sure but it's GaaS so the exact timing it releases won't exactly matter.
They haven't had a software peak as far as I'm aware. Shortages don't factor into this much, as the shortages were going to be there whether they release now, later in 2023 or even in 2025. They were still going to have shortages.

Only one comment has been about a 2022 release of something, and it was just for software: a game that finished development by the second half of 2022, which is not targeting the Switch but is supposedly targeting a higher-end model. Simply having a game finish development by the second half of 2022 and having a game completely finished and ready to ship by the second half of 2022 are two different things. The former can be a game that finishes development by then but ships a few months later, like in Q1 2023.
You're missing a lot of comments about 2022 that happened months ago. Grubb early in 2021 said he had heard they were targeting 2022. Mochizuki clearly can get stuff wrong but he reported that the 11 devs seemed to all be targeting 2022 for their 4k enhanced games. Nate specifically said he heard the plan was to launch in either late 2021 or 2022 (this was like a year ago) and more recently heard it's expected to launch by H1 2023.

It's not just a comment about a dev having a game ready for late 2022, it's been a lot of hints.
 
So Super Switch has been delayed a few years, and we know the chipset can't be changed because there isn't anything else to choose from; what can Nintendo/Nvidia do to the SoC to make the system more powerful now that it's launching later? Can they still make improvements to it?
 
So Super Switch has been delayed a few years, and we know the chipset can't be changed because there isn't anything else to choose from; what can Nintendo/Nvidia do to the SoC to make the system more powerful now that it's launching later? Can they still make improvements to it?
There's nothing definitive to say it has or hasn't been delayed. As for making it stronger, they'd have to redesign the whole SoC.
 
So Super Switch has been delayed a few years, and we know the chipset can't be changed because there isn't anything else to choose from; what can Nintendo/Nvidia do to the SoC to make the system more powerful now that it's launching later? Can they still make improvements to it?
Huh? Literally nothing has suggested it has been delayed at all, let alone a few years.

Let's not let an analyst's prediction that is stated purely to be a prediction and in no way any inside knowledge spread a bunch of FUD in this thread.
 
Yeah the problem that occurs as more time passes is by 2024 Nvidia will already have had another architecture (Lovelace) on a much better process in TSMC's 5nm. Which also by that time will be considered a mature enough process that most of the larger demanding companies will have moved on to 4nm, 3nm and starting on 2nm designs.
I think it's too early to say, considering ASML's update on the fire incident at ASML Berlin seems to suggest that plans for 3 nm** process nodes and beyond could be delayed. (Hopefully, the damage caused by the fire incident at ASML Berlin isn't too bad.)

So Super Switch has been delayed a few years, and we know the chipset can't be changed because there isn't anything else to choose from; what can Nintendo/Nvidia do to the SoC to make the system more powerful now that it's launching later? Can they still make improvements to it?
I think it depends on whether Dane has been taped out yet or not. If it has, then outside of tweaking the CPU, GPU, and RAM frequencies, there's really not much Nintendo and Nvidia can do to improve on Dane's design.
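For a sense of how much the frequency lever alone moves things, here's a quick sketch of theoretical FP32 throughput (128 FP32 lanes per Ampere SM and 2 ops per FMA are the standard figures; the 8SM count and the clocks are hypothetical, not leaked specs):

# Theoretical FP32 throughput = SMs * 128 lanes * 2 ops (FMA) * clock.
def tflops_fp32(sms, clock_ghz):
    return sms * 128 * 2 * clock_ghz / 1000

for clock_ghz in (0.46, 0.77, 1.0, 1.2):    # hypothetical clocks
    print(f"8 SMs @ {clock_ghz:.2f} GHz: ~{tflops_fp32(8, clock_ghz):.2f} TFLOPS FP32 (theoretical)")

In other words, late clock bumps can shift the paper numbers meaningfully, but they can't change the underlying SM count or feature set once the die is fixed.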

Huh? We've had comments from multiple insiders including Nate, Mochizuki, Grubb and even DF talking about 2022 being the plan for the Dane Switch.
NateDrake also mentioned that developers had heard that Nintendo's targeting a holiday 2022 to early 2023 release window. So unless 2022 is referring to the fiscal year ending in March 2023, launching in 2022 is not 100% guaranteed.
 
So, we've finally entered the stage of hardware speculation where every market analyst has an opinion on the subject (mostly informed by current sales maximization rather than anything substantial) that throws us wildly off the scent for no good reason, huh?

Stay the course, folks.
 
Nikkei also reported on the same DLSS hardware last year. We are in an age of uncertainty and plans are more in flux than normal. It makes for a difficult reporting environment, more so when GDC/E3-type events aren't being held in person. Such gatherings are a treasure trove for this kind of information.
 
Please read this staff post before posting.

Furthermore, according to this follow-up post, all off-topic chat will be moderated.