StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

Deleted member 887 · Mar 1, 2022

Alovon11 said:
12 SMs would likely surpass the Series S before DLSS, if fed enough power.

But the file does say
__NVN_NUM_SMS_GA10F

(i.e. NVN number of SMs in GA10F)

The leak I've seen is related to a Windows based NVN implementation for a dev kit. I imagine it has defines built around the hardware in a dev kit rather than the exact power of the device itself. I'm tempted to not read too much into it, but I am very dumb?

LiC · Mar 1, 2022

Pokemaniac said:
That 48 seems... weird. If that's the number of shaders in the SM as implied, that seems weirdly low.

Yeah, it sounds like this will be the API that native Dane/Drake games will use.

Well, I can tell you the other Ampere chips (and an Ada chip) are stated as 64 warps per SM, except for GA10X (sm_86) series which also says 48.

LiC · Mar 1, 2022

oldpuck said:
The leak I've seen is related to a Windows based NVN implementation for a dev kit. I imagine it has defines built around the hardware in a dev kit rather than the exact power of the device itself. I'm tempted to not read too much into it, but I am very dumb?

A few things: (1) A dev kit for Switch isn't Windows, it's modified Switch hardware, (2) this source code could build either the Windows reference implementation, which is meant for development -- though not for a dev kit -- or it could build for the real NX hardware, (3) anywhere the source refers to a Tegra chip like GA10F/T239, that's concrete in terms of what chip they're talking about and it can only be relevant to the NX hardware since Windows PCs don't have Tegras in them.

NineTailSage · Mar 1, 2022

So the question here is whether we multiply 48 x 2 to get Ampere's FP32 count?
These notes on Ampere warps per SM say the limit is the same as Volta's...

The maximum number of concurrent warps per SM remains the same as in Volta (i.e., 64), and other factors influencing warp occupancy are:

The register file size is 64K 32-bit registers per SM.
The maximum number of registers per thread is 255.
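For what it's worth, the "warps per SM" figure is a cap on how many 32-thread warps can be resident on one SM at a time, not a count of CUDA cores, so 48 x 2 isn't really the right operation. A simplified sketch, assuming only the register-file limit quoted above matters (real occupancy also depends on shared memory, block size and allocation granularity, which this ignores):

```python
# Simplified register-limited occupancy estimate for a single SM.
# Assumptions: 32 threads per warp, 64K 32-bit registers per SM (from the
# quoted Ampere notes), no allocation granularity, no shared-memory limits.
WARP_SIZE = 32
REGISTERS_PER_SM = 64 * 1024
MAX_REGISTERS_PER_THREAD = 255


def max_resident_warps(regs_per_thread: int, max_warps_per_sm: int) -> int:
    """Warps that fit on one SM, limited by the register file and the warp cap."""
    if not 1 <= regs_per_thread <= MAX_REGISTERS_PER_THREAD:
        raise ValueError("registers per thread must be in 1..255")
    by_registers = REGISTERS_PER_SM // (regs_per_thread * WARP_SIZE)
    return min(by_registers, max_warps_per_sm)


# 64 is the limit listed for most Ampere chips; 48 is the GA10X (sm_86) figure.
for cap in (64, 48):
    print(f"cap {cap} warps -> {cap * WARP_SIZE} resident threads per SM; "
          f"at 32 regs/thread: {max_resident_warps(32, cap)} warps")
```

So a 48-warp cap works out to 1,536 resident threads per SM versus 2,048 at 64 warps; neither number is the SM's CUDA core count.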

Look over there · Mar 1, 2022

LiC said:
Well, I can tell you the other Ampere chips (and an Ada chip) are stated as 64 warps per SM, except for GA10X (sm_86) series which also says 48.

Going by Wikipedia's numbers for the 30xx series, I'm seeing 128 shaders per SM. So perhaps 1 warp = 2 shaders?

Skittzo · Mar 1, 2022

oldpuck said:
The leak I've seen is related to a Windows based NVN implementation for a dev kit. I imagine it has defines built around the hardware in a dev kit rather than the exact power of the device itself. I'm tempted to not read too much into it, but I am very dumb?

I don't think devkits typically use bigger/more performant GPUs than the target hardware has.

Alovon11 · Mar 1, 2022

Look over there said:
Going by Wikipedia's numbers for the 30xx series, I'm seeing 128 shaders per SM. So perhaps 1 warp = 2 shaders?

Maybe?
Either way, the core count for a 12 SM Drake should be 1536 CUDA cores, which is insane: that would likely leapfrog the PS4 Pro and tail the Series S before DLSS in everything but memory. Even there, the leak also seemingly exposed some info about how the cache design improved memory bandwidth efficiency by >60%, so even with 128GB/s (half Orin) of memory bandwidth, Drake will likely compete there too, apart from the amount of raw memory.
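A rough sketch of the paper-FP32 arithmetic behind that comparison, assuming 128 FP32 CUDA cores per SM (desktop-Ampere style) and the usual 2 FLOPs per core per clock (one FMA). It solves for the clock a 12 SM chip would need to match the PS4 Pro's 4.2 TFLOPS and the Series S's 4.0 TFLOPS. Cross-architecture TFLOPS comparisons are loose at best; Ampere's dual-issue FP32 tends to inflate its paper figure relative to AMD console GPUs.

```python
# Paper FP32 throughput = CUDA cores x 2 FLOPs per clock (FMA) x clock.
# Assumption: 128 FP32 CUDA cores per SM, as on desktop Ampere.
CORES_PER_SM = 128


def cuda_cores(sms: int) -> int:
    """Total FP32 CUDA cores for a given SM count."""
    return sms * CORES_PER_SM


def clock_for_tflops(target_tflops: float, sms: int) -> float:
    """GPU clock in MHz needed to reach target_tflops of paper FP32."""
    flops_per_clock = cuda_cores(sms) * 2
    return target_tflops * 1e12 / flops_per_clock / 1e6


print(cuda_cores(12))  # 1536
for name, tflops in (("PS4 Pro", 4.2), ("Series S", 4.0)):
    print(name, round(clock_for_tflops(tflops, 12)), "MHz")
```

On paper, that means roughly 1.3 GHz or more to match either console, well above what a handheld would normally run at.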

Pokemaniac · Mar 1, 2022

LiC said:
Well, I can tell you the other Ampere chips (and an Ada chip) are stated as 64 warps per SM, except for GA10X (sm_86) series which also says 48.

Okay, so there's probably something more complicated going on here, as I think Ampere is supposed to be 128 CUDA cores per SM. I assume by GA10X you're referring to the desktop Ampere chips? If that's the case, then the numbers would line up, and would imply that this is just some funky architectural detail and it's still 128.

ReddDreadtheLead · Mar 1, 2022

Skittzo said:
The caveat here is that SMs have different configurations in different architectures. So 1 Maxwell SM will have a different number of CUDA cores than 1 Ampere SM.

Ampere and Maxwell contain 128 CUDA cores per SM

Skittzo said:
I don't think devkits typically use bigger/more performant GPUs than the target hardware has.

I believe that in the devkits the GPU and CPU are clocked higher and more RAM is available for debugging purposes

So in a way they would be more performant than the retail versions

I think it’s like 20% more for the CPU?

LiC · Mar 1, 2022

Pokemaniac said:
Okay, so there's probably something more complicated going on here, as I think Ampere is supposed to be 128 CUDA cores per SM. I assume by GA10X you're referring to the desktop Ampere chips? If that's the case, then the numbers would line up, and would imply that this is just some funky architectural detail and it's still 128.

FAMILY_GA10X or sm_86 is what they're defined as. According to this list, sm_86 appears to cover the desktop Ampere chips.

Skittzo · Mar 1, 2022

ReddDreadtheLead said:
Ampere and Maxwell contain 128 CUDA cores per SM

Ah do they?

Well at least the arrangements are definitely not the same so that could be what we're looking at here with the warps.

ReddDreadtheLead said:
I believe that in the devkits the GPU and CPU are clocked higher and more RAM is available for debugging purposes

So in a way they would be more performant

True but the point still stands. They'll have the same GPU, physically speaking.

Deleted member 887 · Mar 1, 2022

LiC said:
A few things: (1) A dev kit for Switch isn't Windows, it's modified Switch hardware, (2) this source code could build either the Windows reference implementation, which is meant for development -- though not for a dev kit -- or it could build for the real NX hardware, (3) anywhere the source refers to a Tegra chip like GA10F/T239, that's concrete in terms of what chip they're talking about and it can only be relevant to the NX hardware since Windows PCs don't have Tegras in them.

Ah, yes, I mistook GA10F for GA102 - like I said, dumb. I assumed you'd want to stub out all supported hardware so that the reference implementation can scale its threadpool. Thank you!

Pokemaniac · Mar 1, 2022

LiC said:
FAMILY_GA10X or sm_86 is what they're defined as. According to this list, sm_86 appears to cover the desktop Ampere chips.

Okay, yeah, definitely sounds like the number probably isn't directly the CUDA core count, then, even if it might work out that way for TX1. This could imply that the chip is configured closer to the desktop cards than Orin, but I feel the need to emphasize that I really don't have an especially solid handle on what this number means.

LiC · Mar 1, 2022

oldpuck said:
Ah, yes, I mistook GA10F for GA102 - like I said, dumb. I assumed you'd want to stub out all supported hardware so that the reference implementation can scale its threadpool. Thank you!

There is in fact a lot of code that's only for the Windows impl to manage which Turing/Ampere GPU it's on and try to make it work properly. That's a big part of what makes it difficult to decipher what's really relevant for the Switch. On the subject of dev kits, GA10F is definitely the "main" Tegra chip NVN2 is meant to run on, but there are places that check for GA10B (Orin), so it's entirely possible that chip or some version of it was used in dev kits.

Skittzo · Mar 1, 2022

12 SMs just sounds wildly unrealistic from a size perspective. Wasn't 8 SMs even supposedly cutting it close?

Kinda feel a little sketchy on that detail...

Alovon11 · Mar 1, 2022

Skittzo said:
12 SMs just sounds wildly unrealistic from a size perspective. Wasn't 8 SMs even supposedly cutting it close?

Kinda feel a little sketchy on that detail...

Well, can't really argue with the fucking source code.

Unless Drake is "Docked Only Dane", it has to be it, as it's baked into the fragment of NVN2's source code that is seemingly in the leak.

Skittzo · Mar 1, 2022

Alovon11 said:
Well, can't really argue with the fucking source code.

Unless Drake is "Docked Only Dane", it has to be it, as it's baked into the fragment of NVN2's source code that is seemingly in the leak.

Yeah Drake being stationary is something I thought about...

This is all just very odd.

Look over there · Mar 1, 2022

Perhaps we simply underestimated how large a die (and the corresponding cost) NVIDIA/Nintendo were willing to go with?

Mr.Gamerson · Mar 1, 2022

Yeah, I'm guessing docked-only mode.

Crusters · Mar 1, 2022

LiC said:
This is pointless publicity stunting. Nvidia wouldn't buy these guys a coffee if that's what they were asking for. Paying a ransom is just an invitation for more people to try to hack and extort you in the future. Never ceases to amaze me how dumb and immature these black hat groups are despite being sophisticated enough to get the hacks done in the first place.

I am fairly sure this was done via social engineering, which I don't find particularly sophisticated.

Dark Cloud · Mar 1, 2022

ACE+ said:
markomaro also chipping in:

@Dakhil Best of luck to you!

LMAOOO

NineTailSage · Mar 1, 2022

Skittzo said:
12 SMs just sounds wildly unrealistic from a size perspective. Wasn't 8 SMs even supposedly cutting it close?

Kinda feel a little sketchy on that detail...

12 SMs seem unrealistic because we don't have anything to compare it to...
We don't even have full specs on GA107, but kopite7kimi has stated 190+ mm² (and the full GA107 is probably 24 SMs)

Dark Cloud · Mar 1, 2022

Skittzo said:
Yeah Drake being stationary is something I thought about...

This is all just very odd.

What are you guys talking about? This new system might not be hybrid?

LiC · Mar 1, 2022

Some more numbers in case everyone wasn't confused enough. GA10F has 1 GPC with 6 TPCs. GA10B has 2 GPCs with 4 TPCs each (8 total).

I've also been shown some numbers for GA10B and GA10F floating-point efficiency, but I'm scared to post them.

Alovon11 · Mar 1, 2022

NineTailSage said:
12 SMs seem unrealistic because we don't have anything to compare it to...
We don't even have full specs on GA107, but kopite7kimi has stated 190+ mm² (and the full GA107 is probably 24 SMs)

Yeah, and we have to consider that Orin is likely as big as it is primarily due to the A78AEs, DLAs, and other automotive/AI stuff in it.

Alovon11 · Mar 1, 2022

Dark Cloud said:
What are you guys talking about? This new system might not be hybrid?

12 SMs is stated in NVN2's source code, which is a way bigger GPU than any of us were really expecting.

So T239 Drake could be a Stationary variant of Dane?

And NVN2 References it as the SM count because that is the limit of the SM count of the GA10F uArch that NVN2 expects to run?

Skittzo · Mar 1, 2022

Dark Cloud said:
What are you guys talking about? This new system might not be hybrid?

There's no real indication that it won't be a hybrid, but there's new info now suggesting it will have a much bigger SoC die than we previously thought.

I'm having trouble believing they'd use something so big in a similar form factor to the original especially considering how little internal space was left in the original Switch.

LiC · Mar 1, 2022

I doubt Nintendo is planning a stationary console, but even if they were, I think it's completely impossible that it wouldn't be the complement to another new hybrid system released around the same time. And there's no evidence of two new systems. I also think anything that would make it that divergent from the original Switch already would have been reported on by now.

Skittzo · Mar 1, 2022

LiC said:
I doubt Nintendo is planning a stationary console, but even if they were, I think it's completely impossible that it wouldn't be the complement to another new hybrid system released around the same time. And there's no evidence of two new systems. I also think anything that would make it that divergent from the original Switch already would have been reported on by now.

Are you seeing any indication of docking/multiple power modes or anything like that? Considering that's the type of stuff that would be detailed in the original NVN I'd imagine.

LiC · Mar 1, 2022

Skittzo said:
Are you seeing any indication of docking/multiple power modes or anything like that? Considering that's the type of stuff that would be detailed in the original NVN I'd imagine.

I haven't seen anything like that in either NVN1 or 2, although I can't claim to have a remotely thorough view of their contents. But I'd hazard a guess that things like docking and performance profiles are in the Nintendo SDK side of NVN, not the Nvidia graphics driver side.

Pokemaniac · Mar 1, 2022

I suppose the shader count being higher than expected could be evidence of the "Drake is a newer iteration of Dane on a denser process" theory.

Skittzo · Mar 1, 2022

I'm very curious to hear what @Thraktor thinks about this seemingly confirming 12 SMs. My views on the hardware have typically aligned with his and this is throwing me for a loop.

LiC said:
I haven't seen anything like that in either NVN1 or 2, although I can't claim to have a remotely thorough view of their contents. But I'd hazard a guess that things like docking and performance profiles are in the Nintendo SDK side of NVN, not the Nvidia graphics driver side.

Makes sense, thank you. I didn't consider that there'd be two sides to it.

Pokemaniac · Mar 1, 2022

Skittzo said:
Are you seeing any indication of docking/multiple power modes or anything like that? Considering that's the type of stuff that would be detailed in the original NVN I'd imagine.

LiC said:
I haven't seen anything like that in either NVN1 or 2, although I can't claim to have a remotely thorough view of their contents. But I'd hazard a guess that things like docking and performance profiles are in the Nintendo SDK side of NVN, not the Nvidia graphics driver side.

Yeah, I imagine NVN probably wouldn't deal with this super explicitly. That stuff is probably mainly happening at abstraction levels both above and below what NVN deals with.

Deleted member 887 · Mar 1, 2022

LiC said:
I haven't seen anything like that in either NVN1 or 2, although I can't claim to have a remotely thorough view of their contents. But I'd hazard a guess that things like docking and performance profiles are in the Nintendo SDK side of NVN, not the Nvidia graphics driver side.

Docking/Undocking is indeed handled by OS Services, and the game handles changing resolution if need be when undocked (it's documented in various Switch Homebrew places)
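To illustrate that split of responsibilities, here's a minimal sketch of the game-side half: the OS reports the mode, and the game picks its own render size. Everything here (`OperationMode`, `RENDER_TARGETS`, `on_mode_changed`) is hypothetical, not a real Nintendo SDK or homebrew API, and the resolutions are just example targets each game would choose for itself.

```python
from enum import Enum


class OperationMode(Enum):
    """Hypothetical stand-in for the mode the OS reports to the game."""
    HANDHELD = "handheld"
    DOCKED = "docked"


# Hypothetical per-game targets: 720p for the built-in screen,
# 1080p when docked. Each game decides these itself.
RENDER_TARGETS = {
    OperationMode.HANDHELD: (1280, 720),
    OperationMode.DOCKED: (1920, 1080),
}


def on_mode_changed(mode: OperationMode) -> tuple[int, int]:
    """Called when the OS signals a dock/undock; returns the new render size."""
    return RENDER_TARGETS[mode]


print(on_mode_changed(OperationMode.DOCKED))
```

The graphics API just renders at whatever size it's handed; the policy lives in the game, and the mode notification lives in the OS.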

chocolate_supra · Mar 1, 2022

It's been a while since this thread was flooded with a bunch of excited technospeak that I have to use context and google to digest. It feels like I'm in a newsroom and everyone is frantically chasing leads and shouting out what they're finding while others are trying to piece it together as it comes. And I'm just the camera operator, excited to see what all the activity turns into once it all comes together.

Yall're awesome.

Alovon11 · Mar 1, 2022

chocolate_supra said:
It's been a while since this thread was flooded with a bunch of excited technospeak that I have to use context and google to digest. It feels like I'm in a newsroom and everyone is frantically chasing leads and shouting out what they're finding while others are trying to piece it together as it comes. And I'm just the camera operator, excited to see what all the activity turns into once it all comes together.

Yall're awesome.

TL;DR: hackers released DLSS's source code.

They accidentally leaked NVN2 while at it.

That code contained the SM count and name of T239 (Drake), pinning it at 12 SMs.

zumaddy · Mar 1, 2022

chocolate_supra said:
It's been a while since this thread was flooded with a bunch of excited technospeak that I have to use context and google to digest. It feels like I'm in a newsroom and everyone is frantically chasing leads and shouting out what they're finding while others are trying to piece it together as it comes. And I'm just the camera operator, excited to see what all the activity turns into once it all comes together.

Yall're awesome.

People in the thread: "the DSSLR count is 3.4x higher than the Switch which has a 14 SM count and a 2439 internal name-"

Us: Thank you Famiboards, very cool!

Dakhil · Mar 1, 2022

Alovon11 said:
Accidentally leaked NVN2 while at it.

I don't think this is accidental, since the hackers are trying to illegally blackmail Nvidia in an attempt to force Nvidia to be open source. Damaging Nvidia's relationship with Nintendo to an extent is one way to do so.

chocolate_supra · Mar 1, 2022

Alovon11 said:
pinning it at 12 SMs

And that's way bigger than expected, right?

zumaddy said:
People in the thread: "the DSSLR count is 3.4x higher than the Switch which has a 14 SM count and a 2439 internal name-"

Us: Thank you Famiboards, very cool!

So cool

Dakhil said:
Damaging Nvidia's relationship with Nintendo to an extent is one way to do so.

I'm hoping Nintendo doesn't look down on Nvidia for this leak the way it's been said they looked down on Netflix for leaking the Zelda talks. I'm sure Nintendo can't just pull the plug on the Nvidia hardware relationship the way they pulled the plug on Netflix.

Skittzo · Mar 1, 2022

chocolate_supra said:
And that's way bigger than expected, right?

Yes. Personally I was expecting 4 SMs and thought 8 was like the max possible, and extremely unlikely.

Looks like they might go for 12 which means very low clocks but much bigger chip.

Alovon11 · Mar 1, 2022

LiC said:
Some more numbers in case everyone wasn't confused enough. GA10F has 1 GPC with 6 TPCs. GA10B has 2 GPCs with 4 TPCs each (8 total).

I've also been shown some numbers for GA10B and GA10F floating-point efficiency, but I'm scared to post them.

Umm... translate?
I understand what the GPCs mean (they determine the L2 allotment) but not the TPCs.

chocolate_supra · Mar 1, 2022

Skittzo said:
Yes. Personally I was expecting 4 SMs and thought 8 was like the max possible, and extremely unlikely.

Looks like they might go for 12 which means very low clocks but much bigger chip.

Okay I'm super ignorant so forgive me if this is off base but: would a configuration that runs at lower clock speeds perhaps help with backward compatibility? I remember people talking about concerns that games programmed for TX1 might not play nice with a much faster chip, so could "bigger, but still clocked low" possibly alleviate that?

LiC · Mar 1, 2022

Alovon11 said:
Umm... translate?
I understand what the GPCs mean (they determine the L2 allotment) but not the TPCs.

I assume you're aware it stands for texture processing cluster. The extent of my knowledge is that GPCs contain TPCs which contain SMs. And actually, since the Ampere architecture evidently has two SMs per TPC, that would make the math check out with the 12 SMs as GA10F has 6 TPCs.
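The GPC/TPC/SM arithmetic, sketched out under the assumption stated above that Ampere has 2 SMs per TPC:

```python
# GPCs contain TPCs, and TPCs contain SMs.
# Assumption: 2 SMs per TPC, as stated above for Ampere.
SMS_PER_TPC = 2


def sm_count(gpcs: int, tpcs_per_gpc: int) -> int:
    """Total SMs for a given GPC/TPC layout."""
    return gpcs * tpcs_per_gpc * SMS_PER_TPC


print("GA10F:", sm_count(1, 6))  # 12 SMs
print("GA10B:", sm_count(2, 4))  # 16 SMs
```

GA10F lands on exactly the 12 SMs from the define, and GA10B lands on 16.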

TinyPikachu · Mar 1, 2022

Skittzo said:
Yes. Personally I was expecting 4 SMs and thought 8 was like the max possible, and extremely unlikely.

Looks like they might go for 12 which means very low clocks but much bigger chip.

… Which is good, right?

Skittzo · Mar 1, 2022

chocolate_supra said:
Okay I'm super ignorant so forgive me if this is off base but: would a configuration that runs at lower clock speeds perhaps help with backward compatibility? I remember people talking about concerns that games programmed for TX1 might not play nice with a much faster chip, so could "bigger, but still clocked low" possibly alleviate that?

I don't think so because the GPU configuration will still be very different. I think either way they will need a software compatibility layer.

TinyPikachu said:
… Which is good, right?

It means the power ceiling on this thing just jumped way up. Like, it could do 3-4 TFLOPS before DLSS at high enough clocks.

However it's likely that they chose to configure the chip this way so that they'll use very low clock speeds, like what the base Switch uses or even lower. This will still be a substantial jump but more in line with what we previously thought.

It also means this thing might be more expensive, since bigger chips cost more $$$.
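To put "base Switch clocks" in numbers, a quick sketch, assuming 128 FP32 cores per SM on both chips (TX1's 2 Maxwell SMs give 256 cores), 2 FLOPs per core per clock, and the original Switch's GPU clocks of 307.2 MHz handheld and 768 MHz docked. These are paper FP32 figures only, and the clocks Drake would actually run at are unknown.

```python
# Paper FP32 TFLOPS = cores x 2 (FMA) x clock.
# Assumptions: 128 FP32 cores per SM on both chips; original Switch GPU clocks.
TX1_CORES = 2 * 128      # original Switch: 2 Maxwell SMs
DRAKE_CORES = 12 * 128   # 12 SMs per the leaked define
SWITCH_CLOCKS_MHZ = {"handheld": 307.2, "docked": 768.0}


def fp32_tflops(cores: int, clock_mhz: float) -> float:
    """Paper FP32 throughput in TFLOPS."""
    return cores * 2 * clock_mhz * 1e6 / 1e12


for mode, mhz in SWITCH_CLOCKS_MHZ.items():
    print(f"{mode}: TX1 {fp32_tflops(TX1_CORES, mhz):.2f} TFLOPS, "
          f"12 SM Ampere {fp32_tflops(DRAKE_CORES, mhz):.2f} TFLOPS")
```

At the same clocks that's simply 6x the original Switch's paper figure, since the core count goes from 256 to 1536.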

Alovon11 · Mar 1, 2022

LiC said:
I assume you're aware it stands for texture processing cluster. The extent of my knowledge is that GPCs contain TPCs which contain SMs. And actually, since the Ampere architecture evidently has two SMs per TPC, that would make the math check out with the 12 SMs as GA10F has 6 TPCs.

Ah okay, so it is 12 SMs for Drake (1536 CUDA cores, 12 RT cores, 48 tensor cores?), just you have the FP32 FLOPS number (however NVIDIA defines it) for GA10B and GA10F.

Those should let us relate the two chips to each other, but not easily back to desktop Ampere without a common benchmark, as GA10B/F both have uArch differences versus desktop Ampere.
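If GA10F's SMs follow the desktop GA10x layout (an assumption; the leak doesn't confirm it), each SM has 128 CUDA cores, 4 tensor cores and 1 RT core, and the totals just multiply out. `SmLayout` and `totals` are illustrative names, nothing from the leak.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmLayout:
    """Execution units in one SM (or totals across several)."""
    cuda_cores: int
    tensor_cores: int
    rt_cores: int


# Assumption: desktop GA10x (sm_86) SM layout; GA10F's SM may differ.
GA10X_SM = SmLayout(cuda_cores=128, tensor_cores=4, rt_cores=1)


def totals(sms: int, sm: SmLayout = GA10X_SM) -> SmLayout:
    """Scale a per-SM layout up to a whole GPU."""
    return SmLayout(sms * sm.cuda_cores, sms * sm.tensor_cores, sms * sm.rt_cores)


print(totals(12))  # SmLayout(cuda_cores=1536, tensor_cores=48, rt_cores=12)
```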

Pokemaniac · Mar 1, 2022

chocolate_supra said:
Okay I'm super ignorant so forgive me if this is off base but: would a configuration that runs at lower clock speeds perhaps help with backward compatibility? I remember people talking about concerns that games programmed for TX1 might not play nice with a much faster chip, so could "bigger, but still clocked low" possibly alleviate that?

I wouldn't really expect any BC benefits. Would probably mainly be a play for lower power consumption.

ReddDreadtheLead · Mar 1, 2022

LiC said:
Some more numbers in case everyone wasn't confused enough. GA10F has 1 GPC with 6 TPCs. GA10B has 2 GPCs with 4 TPCs each (8 total).

I've also been shown some numbers for GA10B and GA10F floating-point efficiency, but I'm scared to post them.

Do you mind DMing me those?

Alovon11 · Mar 1, 2022

LiC said:
I've also been shown some numbers for GA10B and GA10F floating-point efficiency, but I'm scared to post them.

ReddDreadtheLead said:
Do you mind DMing me those?

Same (either Redd on Discord, or LiC?)

Crusters · Mar 1, 2022

Skittzo said:
I don't think so because the GPU configuration will still be very different. I think either way they will need a software compatibility layer.

It means the power ceiling on this thing just jumped way up. Like, it could do 3-4 TFLOPS before DLSS at high enough clocks.

However it's likely that they chose to configure the chip this way so that they'll use very low clock speeds, like what the base Switch uses or even lower. This will still be a substantial jump but more in line with what we previously thought.

It also means this thing might be more expensive, since bigger chips cost more $$$.

What the hell happened? I was still catching up on the ransomers, but what...
...... 12 SMs? More shaders than PS4? Twice as many shaders as the Durango Xbone? Dogs and cats sleeping together? What is going on??!!!!

Okay, how 'massive' is this chip going to be really? What kind of ballpark size reduction per transistor..... per... SM can we use, taking the old 20nm X1 as a base?

Right? That one had 2 SMs, so we get the footprint of that. How would 12 of these Ampere SMs, shrunk to 8nm or 5nm, compare to that footprint? How much bigger?
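Nobody in the thread has per-SM die-area figures, so the most that can be offered is the shape of the estimate. A minimal sketch, assuming only the SM array matters (it ignores CPU cores, memory controllers, I/O and everything else on the die) and that you plug in your own per-SM area ratio; `relative_gpu_area` is a hypothetical helper, not a real tool.

```python
def relative_gpu_area(sms_new: int, area_per_sm_new: float,
                      sms_old: int = 2, area_per_sm_old: float = 1.0) -> float:
    """SM-array area of the new chip relative to the old one.

    area_per_sm_* are relative units; by default one 20nm TX1 Maxwell SM = 1.0.
    Ignores all non-SM die area.
    """
    return (sms_new * area_per_sm_new) / (sms_old * area_per_sm_old)
```

For example, if (purely hypothetically) one Ampere SM on the new node took half the area of a 20nm Maxwell SM, `relative_gpu_area(12, 0.5)` gives 3.0, i.e. three times the TX1's SM area. The real answer hinges entirely on that per-SM ratio, which depends on both the node and how much bigger an Ampere SM is than a Maxwell one.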
