
Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

12 SMs would likely surpass the Series S before DLSS, if fed enough power.

But the file does say:
__NVN_NUM_SMS_GA10F

(NVN: NUMBER OF SMs in GA10F)
The leak I've seen is related to a Windows-based NVN implementation for a dev kit. I imagine it has defines built around the hardware in a dev kit rather than the exact power of the device itself. I'm tempted not to read too much into it, but maybe I'm just very dumb?
 
That 48 seems... weird. If that's the number of shaders per SM, as implied, it's oddly low.

Yeah, it sounds like this will be the API that native Dane/Drake games will use.
Well, I can tell you the other Ampere chips (and an Ada chip) are stated as 64 warps per SM, except for the GA10X (sm_86) series, which also says 48.
 
The leak I've seen is related to a Windows-based NVN implementation for a dev kit. I imagine it has defines built around the hardware in a dev kit rather than the exact power of the device itself. I'm tempted not to read too much into it, but maybe I'm just very dumb?
A few things: (1) A dev kit for Switch isn't Windows; it's modified Switch hardware, (2) this source code could build either the Windows reference implementation, which is meant for development -- though not for a dev kit -- or it could build for the real NX hardware, (3) anywhere the source refers to a Tegra chip like GA10F/T239, that's concrete in terms of what chip they're talking about, and it can only be relevant to the NX hardware since Windows PCs don't have Tegras in them.
 
So the question here is: do we multiply the 48 x 2 to get Ampere's FP32 count?
These notes on Ampere warps per SM state they're equivalent to Volta's...

The maximum number of concurrent warps per SM remains the same as in Volta (i.e., 64), and other factors influencing warp occupancy are:
  • The register file size is 64K 32-bit registers per SM.
  • The maximum number of registers per thread is 255.
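If the leaked 48 is the same "concurrent warps per SM" limit the excerpt above describes, it would cap how many threads can be resident on an SM rather than count shaders. The sketch below is a simplified illustration under that assumption: it uses only the quoted figures (64K registers per SM, 255 registers per thread) plus 32 threads per warp, and it ignores register allocation granularity and shared-memory limits, so a real occupancy calculator will give different numbers in some cases.

```python
# Simplified register-limited occupancy estimate. It ignores register
# allocation granularity, shared memory and per-block limits.
REGISTERS_PER_SM = 64 * 1024   # 64K 32-bit registers per SM
MAX_REGS_PER_THREAD = 255
THREADS_PER_WARP = 32


def resident_warps(regs_per_thread: int, max_warps_per_sm: int) -> int:
    """Warps that fit on one SM, limited by registers and the warp cap."""
    if not 1 <= regs_per_thread <= MAX_REGS_PER_THREAD:
        raise ValueError("regs_per_thread must be between 1 and 255")
    regs_per_warp = regs_per_thread * THREADS_PER_WARP
    return min(REGISTERS_PER_SM // regs_per_warp, max_warps_per_sm)


# 64: the cap quoted for most Ampere chips; 48: the GA10X/GA10F value in the leak.
for cap in (64, 48):
    print(f"cap {cap}: up to {cap * THREADS_PER_WARP} resident threads/SM, "
          f"{resident_warps(32, cap)} warps at 32 regs/thread, "
          f"{resident_warps(64, cap)} warps at 64 regs/thread")
```

With light register use the warp cap is the binding limit (64 vs 48 warps); at 64 registers per thread both configurations fall to 32 warps, limited by the register file.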
 
The leak I've seen is related to a Windows-based NVN implementation for a dev kit. I imagine it has defines built around the hardware in a dev kit rather than the exact power of the device itself. I'm tempted not to read too much into it, but maybe I'm just very dumb?
I don't think devkits typically use bigger/more performant GPUs than the target hardware has.
 
Going by Wikipedia's numbers for the 30xx series, I'm seeing 128 shaders per SM. So perhaps 1 warp = 2 shaders?
maybe?
Either way, the CUDA core count for a 12 SM Drake should be 1536, which is insane: before DLSS, that would likely leapfrog the PS4 Pro and trail just behind the Series S in everything but memory. Even there, the leak also seemingly exposed some info about how the cache design improved memory bandwidth efficiency by more than 60%, so even with 128GB/s of memory bandwidth (half of Orin's), Drake will likely compete there too, apart from the raw amount of memory.
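The 1536 figure is just SM count times FP32 lanes per SM. A quick sketch, assuming 128 FP32 CUDA cores per SM for both Maxwell (TX1) and desktop-style Ampere, as stated later in this thread, and 2 SMs for the original Switch's TX1:

```python
# Assumption: 128 FP32 CUDA cores per SM for both Maxwell (TX1) and
# desktop-style Ampere, as stated later in this thread.
FP32_CORES_PER_SM = 128


def cuda_cores(sm_count: int, cores_per_sm: int = FP32_CORES_PER_SM) -> int:
    """Total FP32 CUDA cores for a GPU with `sm_count` SMs."""
    return sm_count * cores_per_sm


tx1_cores = cuda_cores(2)     # Tegra X1 in the original Switch: 2 SMs
drake_cores = cuda_cores(12)  # 12 SMs, per __NVN_NUM_SMS_GA10F
print(tx1_cores, drake_cores, drake_cores / tx1_cores)  # 256 1536 6.0
```

That is six times the TX1's shader count at equal clocks; how it translates into real performance depends on clocks and memory, which this leak doesn't settle.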
 
Well, I can tell you the other Ampere chips (and an Ada chip) are stated as 64 warps per SM, except for the GA10X (sm_86) series, which also says 48.
Okay, so there's probably something more complicated going on here, as I think Ampere is supposed to be 128 CUDA cores per SM. I assume by GA10X you're referring to the desktop Ampere chips? If that's the case, then the numbers would line up, and would imply that this is just some funky architectural detail and it's still 128.
 
The caveat here is that SMs have different configurations in different architectures, so 1 Maxwell SM will have a different number of CUDA cores than 1 Ampere SM.
Ampere and Maxwell both contain 128 CUDA cores per SM.

I don't think devkits typically use bigger/more performant GPUs than the target hardware has.
I believe that in the devkits the GPU and CPU are clocked higher and more RAM is available for debugging purposes

So in a way they would be more performant than the retail versions :p

I think it’s like 20% more for the CPU?
 
Okay, so there's probably something more complicated going on here, as I think Ampere is supposed to be 128 CUDA cores per SM. I assume by GA10X you're referring to the desktop Ampere chips? If that's the case, then the numbers would line up, and would imply that this is just some funky architectural detail and it's still 128.
FAMILY_GA10X or sm_86 is what they're defined as. According to this list, sm_86 appears to cover the desktop Ampere chips.
 
Ampere and Maxwell both contain 128 CUDA cores per SM.
Ah do they?

Well at least the arrangements are definitely not the same so that could be what we're looking at here with the warps.
I believe that in the devkits the GPU and CPU are clocked higher and more RAM is available for debugging purposes

So in a way they would be more performant :p
True but the point still stands. They'll have the same GPU, physically speaking.
 
A few things: (1) A dev kit for Switch isn't Windows; it's modified Switch hardware, (2) this source code could build either the Windows reference implementation, which is meant for development -- though not for a dev kit -- or it could build for the real NX hardware, (3) anywhere the source refers to a Tegra chip like GA10F/T239, that's concrete in terms of what chip they're talking about, and it can only be relevant to the NX hardware since Windows PCs don't have Tegras in them.
Ah, yes, I mistook GA10F for GA102 - like I said, dumb. I assumed you'd want to stub out all supported hardware so that the reference implementation can scale its threadpool. Thank you!
 
FAMILY_GA10X or sm_86 is what they're defined as. According to this list, sm_86 appears to cover the desktop Ampere chips.
Okay, yeah, definitely sounds like the number probably isn't directly the CUDA core count, then, even if it might work out that way for TX1. This could imply that the chip is configured closer to the desktop cards than Orin, but I feel the need to emphasize that I really don't have an especially solid handle on what this number means.
 
Ah, yes, I mistook GA10F for GA102 - like I said, dumb. I assumed you'd want to stub out all supported hardware so that the reference implementation can scale its threadpool. Thank you!
There is in fact a lot of code that's only for the Windows impl to manage which Turing/Ampere GPU it's on and try to make it work properly. That's a big part of what makes it difficult to decipher what's really relevant for the Switch. On the subject of dev kits, GA10F is definitely the "main" Tegra chip NVN2 is meant to run on, but there are places that check for GA10B (Orin), so it's entirely possible that chip or some version of it was used in dev kits.
 
12 SMs just sounds wildly unrealistic from a size perspective. Wasn't even 8 SMs supposedly cutting it close?

Kinda feel a little sketchy on that detail...
Well, can't really argue with the fucking source code.

Unless Drake is "Docked Only Dane", it has to be 12 SMs, as that's baked into the NVN2 source code fragment that seemingly appears in the leak.
 
Well, can't really argue with the fucking source code.

Unless Drake is "Docked Only Dane", it has to be 12 SMs, as that's baked into the NVN2 source code fragment that seemingly appears in the leak.
Yeah Drake being stationary is something I thought about...

This is all just very odd.
 
This is pointless publicity stunting. Nvidia wouldn't buy these guys a coffee if that's what they were asking for. Paying a ransom is just an invitation for more people to try to hack and extort you in the future. Never ceases to amaze me how dumb and immature these black hat groups are despite being sophisticated enough to get the hacks done in the first place.

I am fairly sure this was done via social engineering, which I don't find particularly sophisticated.
 
12 SMs just sounds wildly unrealistic from a size perspective. Wasn't even 8 SMs supposedly cutting it close?

Kinda feel a little sketchy on that detail...

12 SMs seem unrealistic because we don't have anything to compare them to...
We don't even have full specs on GA107, but kopite7kimi has stated 190+ mm² (and the full GA107 is probably 24 SMs)
 
12 SMs seem unrealistic because we don't have anything to compare them to...
We don't even have full specs on GA107, but kopite7kimi has stated 190+ mm² (and the full GA107 is probably 24 SMs)
yeah, and we have to consider that Orin is likely as big as it is primarily due to the A78AEs, DLAs, and other Automotive/AI stuff in it.
 
What are you guys talking about? This new system might not be hybrid?
NVN2's source code states 12 SMs, which is a way bigger GPU than any of us were really expecting.


So could T239 Drake be a stationary variant of Dane?

And does NVN2 reference 12 as the SM count because that's the SM limit of the GA10F uArch that NVN2 expects to run on?
 
What are you guys talking about? This new system might not be hybrid?
There's no real indication that it won't be a hybrid, but there's new info now suggesting it will have a much bigger SoC die than we previously thought.

I'm having trouble believing they'd use something so big in a similar form factor to the original especially considering how little internal space was left in the original Switch.
 
I doubt Nintendo is planning a stationary console, but even if they were, I think it's completely impossible that it wouldn't be the complement to another new hybrid system released around the same time. And there's no evidence of two new systems. I also think anything that would make it that divergent from the original Switch already would have been reported on by now.
 
I doubt Nintendo is planning a stationary console, but even if they were, I think it's completely impossible that it wouldn't be the complement to another new hybrid system released around the same time. And there's no evidence of two new systems. I also think anything that would make it that divergent from the original Switch already would have been reported on by now.
Are you seeing any indication of docking/multiple power modes or anything like that? I'd imagine that's the type of stuff the original NVN would detail.
 
Are you seeing any indication of docking/multiple power modes or anything like that? I'd imagine that's the type of stuff the original NVN would detail.
I haven't seen anything like that in either NVN1 or 2, although I can't claim to have a remotely thorough view of their contents. But I'd hazard a guess that things like docking and performance profiles are in the Nintendo SDK side of NVN, not the Nvidia graphics driver side.
 
I suppose the shader count being higher than expected could be evidence of the "Drake is a newer iteration of Dane on a denser process" theory.
 
I'm very curious to hear what @Thraktor thinks about this seemingly confirming 12 SMs. My views on the hardware have typically aligned with his and this is throwing me for a loop.
I haven't seen anything like that in either NVN1 or 2, although I can't claim to have a remotely thorough view of their contents. But I'd hazard a guess that things like docking and performance profiles are in the Nintendo SDK side of NVN, not the Nvidia graphics driver side.
Makes sense, thank you. I didn't consider that there'd be two sides to it.
 
Are you seeing any indication of docking/multiple power modes or anything like that? I'd imagine that's the type of stuff the original NVN would detail.
I haven't seen anything like that in either NVN1 or 2, although I can't claim to have a remotely thorough view of their contents. But I'd hazard a guess that things like docking and performance profiles are in the Nintendo SDK side of NVN, not the Nvidia graphics driver side.
Yeah, I imagine NVN probably wouldn't deal with this super explicitly. That stuff is probably mainly happening at abstraction levels both above and below what NVN deals with.
 
I haven't seen anything like that in either NVN1 or 2, although I can't claim to have a remotely thorough view of their contents. But I'd hazard a guess that things like docking and performance profiles are in the Nintendo SDK side of NVN, not the Nvidia graphics driver side.
Docking/undocking is indeed handled by OS services, and the game handles changing resolution, if needed, when undocked (this is documented in various Switch homebrew resources).
 
It's been a while since this thread was flooded with a bunch of excited technospeak that I have to use context and google to digest. It feels like I'm in a newsroom and everyone is frantically chasing leads and shouting out what they're finding while others are trying to piece it together as it comes. And I'm just the camera operator, excited to see what all the activity turns into once it all comes together.

Yall're awesome.
 
It's been a while since this thread was flooded with a bunch of excited technospeak that I have to use context and google to digest. It feels like I'm in a newsroom and everyone is frantically chasing leads and shouting out what they're finding while others are trying to piece it together as it comes. And I'm just the camera operator, excited to see what all the activity turns into once it all comes together.

Yall're awesome.
TL;DR: Hackers released DLSS's source code.

They accidentally leaked NVN2 while they were at it.

That code contained the SM count and name of T239 (Drake), pinning it at 12 SMs.
 
It's been a while since this thread was flooded with a bunch of excited technospeak that I have to use context and google to digest. It feels like I'm in a newsroom and everyone is frantically chasing leads and shouting out what they're finding while others are trying to piece it together as it comes. And I'm just the camera operator, excited to see what all the activity turns into once it all comes together.

Yall're awesome.

People in the thread: "the DSSLR count is 3.4x higher than the Switch which has a 14 SM count and a 2439 internal name-"

Us: Thank you Famiboards, very cool!
 
pinning it at 12 SMs
And that's way bigger than expected, right?

People in the thread: "the DSSLR count is 3.4x higher than the Switch which has a 14 SM count and a 2439 internal name-"

Us: Thank you Famiboards, very cool!
So cool

Damaging Nvidia's relationship with Nintendo to an extent is one way to do so.
I'm hoping Nintendo doesn't look down on Nvidia for this leak the way it's been said they looked down on Netflix for leaking the Zelda talks. I'm sure Nintendo can't just pull the plug on the Nvidia hardware relationship the way they pulled the plug on Netflix.
 
Some more numbers in case everyone wasn't confused enough. GA10F has 1 GPC with 6 TPCs. GA10B has 2 GPCs with 4 TPCs each (8 total).

I've also been shown some numbers for GA10B and GA10F floating-point efficiency, but I'm scared to post them.
Umm... translate?
I understand what the GPCs mean (determiner of the L2 allotment), but not the TPCs.
 
Yes. Personally, I was expecting 4 SMs and thought 8 was the max possible, and extremely unlikely.

Looks like they might go for 12, which means very low clocks but a much bigger chip.
Okay I'm super ignorant so forgive me if this is off base but: would a configuration that runs at lower clock speeds perhaps help with backward compatibility? I remember people talking about concerns that games programmed for TX1 might not play nice with a much faster chip, so could "bigger, but still clocked low" possibly alleviate that?
 
Umm... translate?
I understand what the GPCs mean (determiner of the L2 allotment), but not the TPCs.
I assume you're aware it stands for texture processing cluster. The extent of my knowledge is that GPCs contain TPCs which contain SMs. And actually, since the Ampere architecture evidently has two SMs per TPC, that would make the math check out with the 12 SMs as GA10F has 6 TPCs.
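The nesting described above (GPCs contain TPCs, which contain SMs) multiplies out directly. A small sketch using the GA10F and GA10B GPC/TPC figures posted earlier and the two-SMs-per-TPC assumption; the 16-SM result for GA10B is derived here, not stated in the leak:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """GPC -> TPC -> SM hierarchy for one chip."""
    name: str
    gpcs: int
    tpcs_per_gpc: int
    sms_per_tpc: int = 2  # Assumption: two SMs per TPC, as in Ampere

    @property
    def tpcs(self) -> int:
        return self.gpcs * self.tpcs_per_gpc

    @property
    def sms(self) -> int:
        return self.tpcs * self.sms_per_tpc


GA10F = GpuLayout("GA10F", gpcs=1, tpcs_per_gpc=6)
GA10B = GpuLayout("GA10B", gpcs=2, tpcs_per_gpc=4)

for chip in (GA10F, GA10B):
    print(f"{chip.name}: {chip.tpcs} TPCs -> {chip.sms} SMs")
```

GA10F's 6 TPCs come out to 12 SMs, matching __NVN_NUM_SMS_GA10F.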
 
Okay I'm super ignorant so forgive me if this is off base but: would a configuration that runs at lower clock speeds perhaps help with backward compatibility? I remember people talking about concerns that games programmed for TX1 might not play nice with a much faster chip, so could "bigger, but still clocked low" possibly alleviate that?
I don't think so because the GPU configuration will still be very different. I think either way they will need a software compatibility layer.

… Which is good, right? 👀
It means the power ceiling on this thing just jumped way up. Like, it could do 3-4 TFLOPS before DLSS at high enough clocks.

However it's likely that they chose to configure the chip this way so that they'll use very low clock speeds, like what the base Switch uses or even lower. This will still be a substantial jump but more in line with what we previously thought.

It also means this thing might be more expensive, since bigger chips cost more $$$.
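The "3-4 TFLOPS at high enough clocks" estimate follows from the usual peak-FP32 formula: cores × 2 FLOPs per clock (one fused multiply-add) × clock speed. A sketch, assuming 128 FP32 cores per SM; the clocks are illustrative inputs, not known Drake clocks (0.768 GHz is the commonly reported docked clock of the original Switch GPU):

```python
def fp32_tflops(sm_count: int, clock_ghz: float, cores_per_sm: int = 128) -> float:
    """Theoretical peak FP32 TFLOPS: each core retires one FMA (2 FLOPs) per clock."""
    return sm_count * cores_per_sm * 2 * clock_ghz / 1000.0


def clock_for_tflops(target_tflops: float, sm_count: int, cores_per_sm: int = 128) -> float:
    """Clock in GHz needed to hit a target peak FP32 TFLOPS figure."""
    return target_tflops * 1000.0 / (sm_count * cores_per_sm * 2)


# Illustrative clocks only; none of these are confirmed for Drake.
for ghz in (0.768, 1.0):
    print(f"12 SMs @ {ghz} GHz: {fp32_tflops(12, ghz):.2f} TFLOPS")
for target in (3.0, 4.0):
    print(f"{target} TFLOPS needs ~{clock_for_tflops(target, 12):.2f} GHz on 12 SMs")
```

With 12 SMs, 3 TFLOPS needs roughly 0.98 GHz and 4 TFLOPS roughly 1.3 GHz, while the original Switch's docked clock would give about 2.36 TFLOPS. These are theoretical peaks, not measured performance.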
 
I assume you're aware it stands for texture processing cluster. The extent of my knowledge is that GPCs contain TPCs which contain SMs. And actually, since the Ampere architecture evidently has two SMs per TPC, that would make the math check out with the 12 SMs as GA10F has 6 TPCs.
Ah, okay, so it is 12 SMs for Drake (1536 CUDA cores, 6 RT cores, 44 Tensor cores?), and what you have are the FP32 FLOPS numbers (however NVIDIA defines them) for GA10B and GA10F.

Those should let us relate the two chips to each other, but not easily back to desktop Ampere without a common benchmark, since GA10B/F both have uArch differences versus desktop Ampere.
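The per-unit guesses in parentheses come from the same multiplication. A sketch assuming Drake's SMs match the desktop GA10x (sm_86) layout of 128 FP32 cores, 4 Tensor cores and 1 RT core per SM; the leak does not confirm that GA10F uses this layout:

```python
# Assumption: per-SM unit counts of a desktop GA10x (sm_86) Ampere SM.
# Whether GA10F keeps this exact layout is not stated in the leak.
DESKTOP_GA10X_SM = {"fp32_cuda_cores": 128, "tensor_cores": 4, "rt_cores": 1}


def totals(sm_count: int, per_sm: dict = DESKTOP_GA10X_SM) -> dict:
    """Scale per-SM unit counts up to a whole GPU."""
    return {unit: count * sm_count for unit, count in per_sm.items()}


print(totals(12))
```

Under that assumption, 12 SMs would give 48 Tensor cores and 12 RT cores rather than 44 and 6; a different per-SM configuration would change the totals.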
 
Okay I'm super ignorant so forgive me if this is off base but: would a configuration that runs at lower clock speeds perhaps help with backward compatibility? I remember people talking about concerns that games programmed for TX1 might not play nice with a much faster chip, so could "bigger, but still clocked low" possibly alleviate that?
I wouldn't really expect any BC benefits. Would probably mainly be a play for lower power consumption.
 
Some more numbers in case everyone wasn't confused enough. GA10F has 1 GPC with 6 TPCs. GA10B has 2 GPCs with 4 TPCs each (8 total).

I've also been shown some numbers for GA10B and GA10F floating-point efficiency, but I'm scared to post them.
Do you mind DMing me those?
 
I don't think so because the GPU configuration will still be very different. I think either way they will need a software compatibility layer.


It means the power ceiling on this thing just jumped way up. Like, it could do 3-4 TFLOPS before DLSS at high enough clocks.

However it's likely that they chose to configure the chip this way so that they'll use very low clock speeds, like what the base Switch uses or even lower. This will still be a substantial jump but more in line with what we previously thought.

It also means this thing might be more expensive, since bigger chips cost more $$$.

What the hell happened? I was still catching up on the ransomers, but what...
12 SMs? More shaders than the PS4? Twice as many shaders as the Durango Xbone? Dogs and cats sleeping together? What is going on??!!!!

Okay, how 'massive' is this chip really going to be? What kind of ballpark size reduction per transistor... per SM can we use, taking the old 20nm X1 as a base?

Right? That one had 2 SMs, so we take its footprint: how would 12 of these Ampere SMs, shrunk to 8nm or 5nm, compare to it? How much bigger?
 
Please read this staff post before posting.

Furthermore, according to this follow-up post, all off-topic chat will be moderated.

