StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

Skittzo · Nov 10, 2021

ILikeFeet said:
Some folks held out hope we'd get something better.

Sounds like wishful thinking to me. All of the evidence pointed to 8nm and there wasn't really any reason to think that changed.

Chaotic Neutral · Nov 10, 2021

RennanNT said:
Simplified version, @Mbolibombo and @Chaotic Neutral :

The next Nintendo chip (codenamed Dane) is very likely to be derived from the chip they just announced (a cut down version, maybe shrunk if we're being really optimistic about the "Orin X" chip, which has yet to be detailed).

There's not much we can take from this in terms of how powerful Dane will be besides more confidence on the previous ballpark ("it should be able to play XB1/PS4 games fine, DLSS should make games look closer to a PS4Pro version with some caveats"), since we don't know how much was cut nor how much they will lower the clocks (for sustained performance and power budget). There are many YouTube videos explaining better than words could what you can expect from DLSS if you're not familiar with it.

For tech enthusiasts, it means a more accurate idea what it can and can't do compared to extrapolating from PC/notebook graphic cards which weren't made with such a low power budget in mind and what tech improvements they made since they finished those cards. For example, I have more confidence that they will use DLSS as is rather than a lighter version and (Limited) Ray Tracing seems more likely now.

As for AAA ports, I would expect companies bringing PS360 ports now to move to PS4X1 ports and about as many PS5 "impossible ports" as we got PS4 "impossible ports" on OG. Overall, support should be better, since the publishers who took a couple years to get onboard might be there from the beginning, but don't expect anything close to parity (there are many reasons for games skipping the Switch besides power).

Thanks for this! I knew about the DLSS but this is a very helpful explanation of the other stuff

PedroNavajas · Nov 10, 2021

Hermii said:
That was always based on nothing but hopes and dreams. Every single rumour we've heard has indicated 8nm.

ILikeFeet said:
Some folks held out hope we'd get something better.

Hope Dies Last.

Ghostsonplanets · Nov 10, 2021

Yesterday was wild! Nice to see what will probably be the SoC Dane will be based off (Orin NX). The inclusion of RT Cores made me really excited. Can't wait for the reveal of Switch Next and what Nintendo will do with RT.

Dakhil · Nov 10, 2021

Here are the slides from the "NVIDIA Orin Platform: Scalable and Modular Architecture for High-Performance AI Compute [A31459]" session.

So there's no mention of which process node is being used for Jetson AGX Orin and Jetson Orin NX, nor is there any mention of Orin X.

Considering the max frequencies for the CPU and GPU for Jetson AGX Orin and Jetson Orin NX, I think Jetson AGX Orin and Jetson Orin NX are fabricated using Samsung's 8N process node. And I think Dane's likely to be fabricated using Samsung's 8N process node.

And I'm sure people will strongly disagree with me, but I think there's still the possibility that car manufacturers are simply speculating that Orin X's fabricated using a 7 nm process node, similar to how Gainward speculated that consumer Ampere GPUs are fabricated using Samsung's 7LPP process node, probably based on a report from the Korea Herald.

Anatole · Nov 10, 2021

Alovon11 said:
NVIDIA increased the speed of the Deep-Learning SDK by 2-3x (Jensen Said 3X, lowballing it to a 2x increase minimum)

It seems very likely to me that he is talking about training here, not inference on the tensor cores. That’s the audience that a deep learning SDK would be targeting, anyway. (With the caveat that I haven’t watched the presentation that fwd-bwd mentioned about chaining/tilting, which I have never heard of.)

lemonfresh · Nov 10, 2021

RennanNT said:
Simplified version, @Mbolibombo and @Chaotic Neutral :

The next Nintendo chip (codenamed Dane) is very likely to be derived from the chip they just announced (a cut down version, maybe shrunk if we're being really optimistic about the "Orin X" chip, which has yet to be detailed).

There's not much we can take from this in terms of how powerful Dane will be besides more confidence on the previous ballpark ("it should be able to play XB1/PS4 games fine, DLSS should make games look closer to a PS4Pro version with some caveats"), since we don't know how much was cut nor how much they will lower the clocks (for sustained performance and power budget). There are many YouTube videos explaining better than words could what you can expect from DLSS if you're not familiar with it.

For tech enthusiasts, it means a more accurate idea what it can and can't do compared to extrapolating from PC/notebook graphic cards which weren't made with such a low power budget in mind and what tech improvements they made since they finished those cards. For example, I have more confidence that they will use DLSS as is rather than a lighter version and (Limited) Ray Tracing seems more likely now.

As for AAA ports, I would expect companies bringing PS360 ports now to move to PS4X1 ports and about as many PS5 "impossible ports" as we got PS4 "impossible ports" on OG. Overall, support should be better, since the publishers who took a couple years to get onboard might be there from the beginning, but don't expect anything close to parity (there are many reasons for games skipping the Switch besides power).

Thanks for laying this out for the tech illiterate like myself. I'm officially hyped for Dane now.

fwd-bwd · Nov 10, 2021

Anatole said:
It seems very likely to me that he is talking about training here, not inference on the tensor cores. That’s the audience that a deep learning SDK would be targeting, anyway. (With the caveat that I haven’t watched the presentation that fwd-bwd mentioned about chaining/tilting, which I have never heard of.)

Sorry for my typo; it was tiling not tilting. I'll edit my post.

NateDrake · Nov 10, 2021

Hermii said:
Well, none of us expected hundreds of RT cores, if the RT core count is what he means by "limited".

Limited in terms of functionality. Kinda like what you see on Series S.

Thraktor · Nov 10, 2021

Just catching up on the Jetson Orin info from GTC. There's not much surprising about it, with the exception of RT, but it's nice that we've got a white paper on the architecture, as I'd imagine Dane will be very similar. On the RT cores, the fact that they're there at all is a surprise, but perhaps more interesting is the choice to include half as many of them as desktop Ampere. It almost seems as if they're there for compatibility reasons, or maybe they found some limited automotive use-cases for them and decided to keep some limited functionality there. In any case, it does increase the likelihood that we see RT cores in Dane, but it reduces the expected performance of those cores even lower (from a pretty low base), so I still have low expectations of many games making extensive use of them.

There is one thing which we can infer from the photos provided, though, which is the size (and therefore transistor density) of Orin. In particular, the Jetson AGX Orin and Jetson Orin NX pages both provide nice head-on photos of the boards, which makes calculating the die size easy. As these don't show the actual bare die (just a grey rectangle with the Nvidia logo), I also used the photo in this press release, which shows an actual bare die, but is lower resolution and at an awkward angle.

Using each of the three photos, the calculation in each case comes to a 22.1mm x 20.8mm die (+- about 0.1mm), for a die size of approx 460mm2. This tells us a few things:

The Jetson Orin NX chip is the full Orin die, just binned with parts disabled. This is as I would have expected, but good to get confirmation.
The Orin die has a density of approx 45.6 million transistors per mm2, assuming 21 billion transistors is still correct.
This is in line with the density of GA102, GA104, etc., so it's likely using an identical 8N manufacturing process, and isn't using higher-density mobile libraries.

So, if we're to assume about a 100mm2 die size for Dane, a transistor count of about 4.5 billion seems likely. This compares to 2 billion transistors for the TX1/Mariko chips used in existing Switch models.
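The arithmetic above can be spelled out as a rough sketch. The die dimensions and the 21 billion transistor count are the figures quoted in this post, and the 100 mm2 Dane die is this post's own assumption, not a confirmed spec. The function names are just for illustration.

```python
# Figures quoted in the post; none of them are confirmed by Nvidia or Nintendo.
DIE_WIDTH_MM = 22.1         # measured from the board photos
DIE_HEIGHT_MM = 20.8        # measured from the board photos
ORIN_TRANSISTORS = 21e9     # "assuming 21 billion transistors is still correct"
DANE_DIE_AREA_MM2 = 100.0   # hypothetical Dane die size assumed in the post


def die_area_mm2(width_mm, height_mm):
    """Area of a rectangular die in mm^2."""
    return width_mm * height_mm


def density_per_mm2(transistors, area_mm2):
    """Average transistor density in transistors per mm^2."""
    return transistors / area_mm2


def scaled_transistors(density, area_mm2):
    """Transistor count of a die of a given area at the same density."""
    return density * area_mm2


orin_area = die_area_mm2(DIE_WIDTH_MM, DIE_HEIGHT_MM)
orin_density = density_per_mm2(ORIN_TRANSISTORS, orin_area)
dane_transistors = scaled_transistors(orin_density, DANE_DIE_AREA_MM2)

print(f"Orin die area: {orin_area:.1f} mm^2")
print(f"Orin density: {orin_density / 1e6:.1f} million transistors per mm^2")
print(f"Dane at 100 mm^2: {dane_transistors / 1e9:.2f} billion transistors")
```

Swapping in a different transistor count for Orin changes the Dane estimate proportionally, so the result is only as good as the 21 billion figure.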

ILikeFeet · Nov 10, 2021

NateDrake said:
Limited in terms of functionality. Kinda like what you see on Series S.

I'm really curious now how 4-8 RT cores compare to a 6500XT (a hypothetical Series S gpu) given the differences in AMD's method (intersection testing) and Nvidia's method (BVH traversal and intersection testing).

ILikeFeet · Nov 10, 2021

Thraktor said:
It almost seems as if they're there for compatibility reasons, or maybe they found some limited automotive use-cases for them and decided to keep some limited functionality there.

In Nvidia's talk about medical stuff, they mention using ray tracing there, so Orin would be used there. However, this is also for the updated Clara, so there's also a separate GPU for extra performance. There's also some research into using ray tracing for depth perception and identification stuff.

Dakhil · Nov 10, 2021

Thraktor said:
Using each of the three photos, the calculation in each case comes to a 22.1mm x 20.8mm die (+- about 0.1mm), for a die size of approx 460mm2. This tells us a few things:

The Jetson Orin NX chip is the full Orin die, just binned with parts disabled. This is as I would have expected, but good to get confirmation.

The Orin die has a density of approx 45.6 million transistors per mm2, assuming 21 billion transistors is still correct.

This is in line with the density of GA102, GA104, etc., so it's likely using an identical 8N manufacturing process, and isn't using higher-density mobile libraries.

Anandtech mentioned that Jetson AGX Orin (and Jetson Orin NX by extension) has 17 billion transistors. I think Orin X could be the only chip in the Orin family that has 21 billion transistors.

Teal'c · Nov 10, 2021

Meanwhile I'm reading about new titles supporting DLSS (most recently Jurassic World Evolution 2) and I imagine Switch Dane could potentially have a devastating line-up (obviously not all at launch).

I hope by now even medium and small software houses have their own dev-kits.

Alovon11 · Nov 10, 2021

Teal'c said:
Meanwhile I'm reading about new titles supporting DLSS (most recently Jurassic World Evolution 2) and I imagine Switch Dane could potentially have a devastating line-up (obviously not all at launch).

I hope by now even medium and small software houses have their own dev-kits.

Are we sure Devastating is the right word here?

Teal'c · Nov 10, 2021

Alovon11 said:
Are we sure Devastating is the right word here?

for my wallet yes

Alovon11 · Nov 10, 2021

Teal'c said:
for my wallet yes

Fair.
Either way, the cache change alone would boost the system around 30% over Ampere.

So 2TFLOPS ampere would become 2.6TFLOPs effectively, so after DLSS Performance it would be above the GTX 1660 Super, and after DLSS Ultra Performance, it would be pushing near the RTX 2070.

And that is only the Cache improvement.

The FP16 Change if used can at least likely bring a 20% boost.

So that would make it equivalent to a GTX 1650 before DLSS, and then around a 2060 after DLSS performance.

Pretty crazy stuff.

Teal'c · Nov 10, 2021

Alovon11 said:
Fair.
Either way, the cache change alone would boost the system around 30% over Ampere.

So 2TFLOPS ampere would become 2.6TFLOPs effectively, so after DLSS Performance it would be above the GTX 1660 Super, and after DLSS Ultra Performance, it would be pushing near the RTX 2070.

And that is only the Cache improvement.

The FP16 Change if used can at least likely bring a 20% boost.

So that would make it equivalent to a GTX 1650 before DLSS, and then around a 2060 after DLSS performance.

Pretty crazy stuff.

Really a lot of stuff.
If it weren't for you experts, I'd have a really hard time imagining a small portable device capable of unleashing this much power.

I wonder if there will still be developers able to find excuses not to bring their titles to Nintendo's next console.

GravyFingerz · Nov 10, 2021

Alovon11 said:
Fair.
Either way, the cache change alone would boost the system around 30% over Ampere.

So 2TFLOPS ampere would become 2.6TFLOPs effectively, so after DLSS Performance it would be above the GTX 1660 Super, and after DLSS Ultra Performance, it would be pushing near the RTX 2070.

And that is only the Cache improvement.

The FP16 Change if used can at least likely bring a 20% boost.

So that would make it equivalent to a GTX 1650 before DLSS, and then around a 2060 after DLSS performance.

Pretty crazy stuff.

I’m excited for a handheld console with a better GPU than my PC

lemonfresh · Nov 10, 2021

Teal'c said:
Really a lot of stuff.
If it weren't for you experts, I'd have a really hard time imagining a small portable device capable of unleashing this much power.

I wonder if there will still be developers able to find excuses not to bring their titles to Nintendo's next console.

Are you forgetting the steam deck?

RennanNT · Nov 10, 2021

It's worth pointing out that the chip with 2 TFlops is the Orin NX. We don't know how many TF the chip Nintendo is going to use will have, but it's very unlikely it will have as many GPU cores as the NX and run the GPU at max clock on top of it.

PedroNavajas · Nov 10, 2021

lemonfresh said:
Are you forgetting the steam deck?

To be fair, the Deck is significantly larger than the OG Switch. I expect Switch 2 to be about the same size.

Hermii · Nov 10, 2021

lemonfresh said:
Are you forgetting the steam deck?

SD really pushes the envelope of a "small, portable" device though.

Teal'c · Nov 10, 2021

lemonfresh said:
Are you forgetting the steam deck?

well in theory yes, but Switch is smaller and really portable, Steam deck...

ILikeFeet · Nov 10, 2021

RennanNT said:
It's worth pointing out that the chip with 2 TFlops is the Orin NX. We don't know how many TF the chip Nintendo is going to use will have, but it's very unlikely it will have as many GPU cores as the NX and run the GPU at max clock on top of it.

looking at the gpu clock, I can't help but think we'll see Switch clocks again, but on the high end. so 920MHz and 460MHz

Hermii said:
SD really pushes the envelope of a "small, portable" device though.

Steam Deck feels like it's really just a laptop Valve carelessly shoved into a "handheld"

Skittzo · Nov 10, 2021

RennanNT said:
It's worth pointing out that the chip with 2 TFlops is the Orin NX. We don't know how many TF the chip Nintendo is going to use will have, but it's very unlikely it will have as many GPU cores as the NX and run the GPU at max clock on top of it.

Doesn't the NX only use half of the GPU cores of full Orin already?

Alovon11 · Nov 10, 2021

GravyFingerz said:
I’m excited for a handheld console with a better GPU than my PC

What GPU do you have?

It is more likely that the 2.6TFLOP/Stronger than 1050 Ti-Weaker than GTX 1650 number is the native processing power Dane would have (1660 Super after DLSS performance, Around 2070 after DLSS Ultra Performance) considering the full "Potential" 3TFLOP number requires high levels of optimization using the double FP16 that Orin/Dane has vs Ampere.

Hosermess · Nov 10, 2021

Alovon11 said:
It is more likely that the 2.6TFLOP/Stronger than 1050 Ti-Weaker than GTX 1650 number is the native processing power Dane would have (1660 Super after DLSS performance, Around 2070 after DLSS Ultra Performance) considering the full "Potential" 3TFLOP number requires high levels of optimization using the double FP16 that Orin/Dane has vs Ampere.

Just when I think I'm getting a handle on things in this thread, someone types a sentence like this.

No need to break it down, I can decipher it in chunks, but strung together it made my mind bottle.

Kenka · Nov 10, 2021

Alovon11 said:
Fair.
Either way, the cache change alone would boost the system around 30% over Ampere.

So 2TFLOPS ampere would become 2.6TFLOPs effectively, so after DLSS Performance it would be above the GTX 1660 Super, and after DLSS Ultra Performance, it would be pushing near the RTX 2070.

And that is only the Cache improvement.

The FP16 Change if used can at least likely bring a 20% boost.

So that would make it equivalent to a GTX 1650 before DLSS, and then around a 2060 after DLSS performance.

Pretty crazy stuff.

You are known for your enthusiasm regarding the succ specs but that is going far.

Dekuman · Nov 10, 2021

Alovon11 said:
What GPU do you have?

It is more likely that the 2.6TFLOP/Stronger than 1050 Ti-Weaker than GTX 1650 number is the native processing power Dane would have (1660 Super after DLSS performance, Around 2070 after DLSS Ultra Performance) considering the full "Potential" 3TFLOP number requires high levels of optimization using the double FP16 that Orin/Dane has vs Ampere.

Based on the recent Steam hardware survey https://store.steampowered.com/hwsurvey/videocard/
1060
1050ti
1650
and Intel integrated graphics (collectively) remain the most common GPU setup. If Switch 2 is as you describe, it would easily beat those 3 setups before DLSS.

RennanNT · Nov 10, 2021

Skittzo said:
Doesn't the NX only use half of the GPU cores of full Orin already?

Yes, but the NX is a binned Orin, so it doesn't have to worry about die size (which is 460mm2 for both). It also uses 25W at max clocks which is roughly twice what OG Switch used docked (max 18W with 4W being from charging IIRC).

Die size affects how many SoCs they can manufacture per wafer. Using this site just for a quick example (not necessarily accurate), going from 11x11mm (121mm2, roughly OG TX1) to 12x12mm (144mm2) meant a ~17% drop in production (492 -> 409). So keeping it small is part of the balancing act.
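To show why a slightly bigger die costs that much output, here is a minimal sketch using a common textbook approximation for gross dies per wafer. It assumes a 300 mm wafer and ignores scribe lines, edge exclusion and yield, so it is not the calculator from the site mentioned above and its absolute counts will differ from the 492 and 409 quoted. The relative drop is the part worth comparing.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumption: standard 300 mm wafer


def gross_dies_per_wafer(die_w_mm, die_h_mm, wafer_d_mm=WAFER_DIAMETER_MM):
    """Approximate gross die count: wafer area / die area minus an edge-loss term."""
    die_area = die_w_mm * die_h_mm
    wafer_area = math.pi * (wafer_d_mm / 2) ** 2
    edge_loss = math.pi * wafer_d_mm / math.sqrt(2 * die_area)
    return int(wafer_area / die_area - edge_loss)


def percent_drop(before, after):
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


small_die = gross_dies_per_wafer(11, 11)   # roughly OG TX1 sized
larger_die = gross_dies_per_wafer(12, 12)

print(f"11x11 mm: about {small_die} dies per wafer")
print(f"12x12 mm: about {larger_die} dies per wafer")
print(f"Drop with this approximation: {percent_drop(small_die, larger_die):.1f}%")
print(f"Drop from the quoted 492 -> 409: {percent_drop(492, 409):.1f}%")
```

Both routes land close to the same percentage, which is the point: output per wafer falls roughly in line with the growth in die area.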

Orin AGX makes me confident that 4 A78 + 4 SM fits in a chip around an OG TX1 size (a little more than 1/4 of Orin's size) before cutting unnecessary things, and Orin NX TDP makes me confident that they could use max clock for those. I don't have the knowledge about how much more they could pack within a TX1-like size and I'm not betting on Nintendo going for a significantly bigger chip nor increasing docked wattage. CPU seems like a bigger priority as well, since Nintendo is more about gameplay than pushing graphics and that will facilitate 3rd party ports more (since GPU work is much easier to scale). And thus, my previous comment.

Alovon11 · Nov 10, 2021

Kenka said:
You are known for your enthusiasm regarding the succ specs but that is going far.

Dekuman said:
Based on the recent Steam hardware survey https://store.steampowered.com/hwsurvey/videocard/
1060
1050ti
1650
and Intel integrated graphics (collectively) remain the most common GPU setup. If Switch 2 is as you describe, it would easily beat those 3 setups before DLSS.

Well, it's just the numbers.

We have generally agreed on 2TFLOPs ampere.

But then NVIDIA goes and doubles the L2 Cache number and FP16 Performance, and adds 50% more L1 Cache into the Orin Architecture.

Remember, AMD got 30-40% of the 54% boost over RDNA1 with Infinity Cache, and that was L3 Cache which is slower than L1 and L2.

So something like that but with far faster memory and Ampere being more memory-efficient than RDNA1/RDNA2 could at least hit that 30% low-end number for a boost.
Therefore, 2 Orin TFLOPs, equates to 2.6 Ampere TFLOPs.

Double that for DLSS Performance mode, you get 5.2TFLOPs, which is above the 5TFLOPs of the 1660 Super.

It's simple math.

And the FP16 thing has an unknown performance boost and can swing wildly between the 40-50% boost FP16 optimization in games for Pascal got, or a lower number, but they only would need to hit a 20% increase to take that 2.6TFLOP number and push it to 3 TFLOPs, meaning it falls into the ballpark of the GTX 1650 natively

Now, it would run more like the Laptop versions of those cards due to the power-draw and thermals.

But it's still impressive and with how DLSS has advanced, it has room to grow even further with things like DRS+DLSS and with how tight-knit ARM SoCs are on communication between components like the shared memory pool vs Desktop and Laptop GPUs.
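For anyone who wants to follow the chain of numbers, here it is as a small sketch. Every multiplier is a guess from this post (30% from cache, 20% from FP16, DLSS Performance counted as doubling effective throughput), not a measurement, and "effective TFLOPs" is only a loose comparison shorthand.

```python
# All values below are assumptions taken from the post, not benchmarks.
BASE_TFLOPS = 2.0               # the "2 TFLOPs Ampere" starting point
CACHE_UPLIFT = 0.30             # assumed gain from the larger L1/L2 cache
FP16_UPLIFT = 0.20              # assumed gain if games use the doubled FP16 rate
DLSS_PERFORMANCE_FACTOR = 2.0   # post's shorthand: DLSS Performance doubles it


def apply_uplift(tflops, uplift):
    """Scale a TFLOPs figure by a fractional uplift (0.30 means +30%)."""
    return tflops * (1.0 + uplift)


cache_adjusted = apply_uplift(BASE_TFLOPS, CACHE_UPLIFT)
with_dlss = cache_adjusted * DLSS_PERFORMANCE_FACTOR
with_fp16 = apply_uplift(cache_adjusted, FP16_UPLIFT)

print(f"After cache uplift: {cache_adjusted:.2f} effective TFLOPs")
print(f"After DLSS Performance: {with_dlss:.2f} effective TFLOPs")
print(f"Cache plus FP16, before DLSS: {with_fp16:.2f} effective TFLOPs")
```

Changing any one multiplier moves every later number, so the conclusions depend entirely on how much of each uplift shows up in real games.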

Onizuka · Nov 10, 2021

Can someone give me a summary of what all this about Dane and its speculated specs means, please? And what performance would it have compared to other consoles? (for example PS4, PS4 Pro, Xbox One, Xbox One X ...)

And would Dane be a Pro or Successor model?

Also, today I saw that Nvidia introduced new SoCs called Jetson Orin NX and Jetson AGX Orin, and that they are rumored to be connected to a Nintendo console. Which of the two would it be? According to the specifications table, Jetson AGX Orin seems much more powerful. Also, would AGX Orin/Orin NX be Dane as a Pro model, or would it be a successor? Thank you, and forgive all the questions and my infinite ignorance.

Alovon11 · Nov 10, 2021

Onizuka said:
Can someone give me a summary of what all this about Dane and its speculated specs means, please? And what performance would it have compared to other consoles? (for example PS4, PS4 Pro, Xbox One, Xbox One X ...)

And would Dane be a Pro or Successor model?

Also, today I saw that Nvidia introduced new SoCs called Jetson Orin NX and Jetson AGX Orin, and that they are rumored to be connected to a Nintendo console. Which of the two would it be? According to the specifications table, Jetson AGX Orin seems much more powerful. Also, would AGX Orin/Orin NX be Dane as a Pro model, or would it be a successor? Thank you, and forgive all the questions and my infinite ignorance.

Orin NX is pretty much Dane outside of the CPU and DLAs (config-wise at least).

Pretty much right now, when docked it would outperform the PS4 by a fair margin (20-50% faster) before DLSS.

After DLSS, it will be somewhere above the Series S on the low end of DLSS' performance boost, and below the PS5 at the high end.

Portable mode gets hard to tell outside of "Equal or faster than Xbox One" before DLSS if DLSS is enabled in Portable mode at all.

RennanNT · Nov 10, 2021

Onizuka said:
And would Dane be a Pro or Successor model?

That's up to Nintendo not the hardware. Here's the summary.

Alovon11 said:
Orin NX is pretty much Dane outside of the CPU and DLAs (config-wise at least).

How did you reach this conclusion again?

Alovon11 · Nov 10, 2021

RennanNT said:
How did you reach this conclusion again?

Because they literally halved Orin's GPU for Orin NX, and the CPU config for 8CPU cores is the most likely one for Dane (Outside of swapping it for A78Cs vs A78AEs)

Also, the memory bandwidth of Orin NX was halved, and in Orin AGX and Orin NX, the memory bandwidth looks to be scaling with the GPU (2048 CUDA cores in Orin AGX for 204.8GB/s of bandwidth, 1024 CUDA cores in Orin NX with 102.4GB/s of bandwidth). So 0.1GB/s (100MB/s) per CUDA core.

So if they are scaled like that for maximum/most efficient performance for bandwidth (or it's something to do with the extra L1 and L2 cache), then that sort of leaves them with three options: dropping to 4SMs at 102.4GB/s with 8GB-12GB of LPDDR5 if it's going to be less than 8SMs, resulting in 0.2GB/s per CUDA core; keeping it at 8SMs at 102.4GB/s; or pulling an insane move and having 204.8GB/s with 8SMs.

8SMs is just the most likely scenario now as 4SMs is a bit weak overall (and DLSS would be a bit harder to work with).

And 6SMs, if they want higher than 100GB/s of bandwidth, would require jumping to 153.6GB/s to get to a 0.2GB/s per CUDA core number for the 768 CUDA cores a 6SM config would have.

This just leaves 8SMs as the most economical option due to the memory bandwidth end of things.
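The bandwidth-per-core ratios behind this argument can be checked with a quick sketch. It assumes 128 CUDA cores per SM, which is what the 6 SM = 768 CUDA cores figure in this post implies, and it expresses the ratio in GB/s per core. The 4, 6 and 8 SM rows are hypothetical Dane configurations, not announced parts.

```python
CUDA_CORES_PER_SM = 128  # implied by the post: 6 SMs -> 768 CUDA cores


def bandwidth_per_core_gbs(bandwidth_gbs, sm_count):
    """Memory bandwidth in GB/s divided by the number of CUDA cores."""
    return bandwidth_gbs / (sm_count * CUDA_CORES_PER_SM)


# (memory bandwidth in GB/s, SM count); the last three rows are hypothetical.
configs = {
    "Orin AGX, 16 SMs": (204.8, 16),
    "Orin NX, 8 SMs": (102.4, 8),
    "hypothetical 4 SMs at 102.4 GB/s": (102.4, 4),
    "hypothetical 6 SMs at 153.6 GB/s": (153.6, 6),
    "hypothetical 8 SMs at 204.8 GB/s": (204.8, 8),
}

ratios = {name: bandwidth_per_core_gbs(bw, sms) for name, (bw, sms) in configs.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} GB/s per CUDA core")
```

Both Orin modules come out at 0.1 GB/s (100 MB/s) per CUDA core, and each hypothetical configuration listed doubles that to 0.2 GB/s.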

fwd-bwd · Nov 10, 2021

Onizuka said:
Also, today I saw that Nvidia introduced new SoCs called Jetson Orin NX and Jetson AGX Orin, and that they are rumored to be connected to a Nintendo console. Which of the two would it be? According to the specifications table, Jetson AGX Orin seems much more powerful. Also, would AGX Orin/Orin NX be Dane as a Pro model, or would it be a successor? Thank you, and forgive all the questions and my infinite ignorance.

We started discussing it from the middle of page 40. You may want to skim through them quickly. There are a few posts with good summaries.

ILikeFeet · Nov 10, 2021

Onizuka said:
Also, today I saw that Nvidia introduced new SoCs called Jetson Orin NX and Jetson AGX Orin, and that they are rumored to be connected to a Nintendo console. Which of the two would it be? According to the specifications table, Jetson AGX Orin seems much more powerful. Also, would AGX Orin/Orin NX be Dane as a Pro model, or would it be a successor? Thank you, and forgive all the questions and my infinite ignorance.

it's less Dane will be one of these, but rather Dane will be based on these designs. what we see here could trickle down to Dane. we might get something totally different (unlikely though).

RennanNT · Nov 10, 2021

Alovon11 said:
Because they literally halved Orin's GPU for Orin NX, and the CPU config for 8CPU cores is the most likely one for Dane (Outside of swapping it for A78Cs vs A78AEs)

NX is a binned chip, it's normal that about half of it is disabled.

Alovon11 said:
Also, the memory bandwidth of Orin NX was halved, and in Orin AGX and Orin NX, the memory bandwidth looks to be scaling with the GPU (2048 CUDA cores in Orin AGX for 204.8GB/s of bandwidth, 1024 CUDA cores in Orin NX with 102.4GB/s of bandwidth). So 0.1GB/s (100MB/s) per CUDA core.

So if they are scaled like that for maximum/most efficient performance for bandwidth (or it's something to do with the extra L1 and L2 cache), then that sort of leaves them dropping to 4SMs at 102.4GB/s with 8GB-12GB of LPDDR5 if it's going to be less than 8SMs, resulting in 0.2GB/s per CUDA core.

8SMs is just the most likely scenario now.

I'm not surprised with this extrapolation given you're the most optimistic in this thread by a good margin. Maybe you're even right about 2TF and I'm being too pessimistic, but please be careful about stating opinions and deductions as a fact or as a consensus in the thread.

ShadowFox08 · Nov 10, 2021

Alovon11 said:
Fair.
Either way, the cache change alone would boost the system around 30% over Ampere.

So 2TFLOPS ampere would become 2.6TFLOPs effectively, so after DLSS Performance it would be above the GTX 1660 Super, and after DLSS Ultra Performance, it would be pushing near the RTX 2070.

And that is only the Cache improvement.

The FP16 Change if used can at least likely bring a 20% boost.

So that would make it equivalent to a GTX 1650 before DLSS, and then around a 2060 after DLSS performance.

Pretty crazy stuff.

The Tegra X1/X2 and Turing GPUs have mixed precision (combining use of fp16 and fp32) enabled, which makes the Switch and Turing GPUs more performant per on-paper flop than Ampere.

I'm not sure if the reason is that Ampere apparently has half the fp16 rate of Turing (please correct me if I'm wrong) or what. But perhaps that fp16 change in the Orin architecture puts it more in line with Turing in performance.

And this of course also doesn't even count the cache increases. Which like you said would boost switch's performance considerably.

Even with DLSS, I think comparing it to an RTX 2060 is a bit reaching though.

Despite Nvidia being wizards on efficiency and power, there's only so much you can do on an 8nm node. I'm expecting 15 watts max in docked mode, but hoping they/Nintendo can push out as much performance as possible and match the battery life of the V1 Switch, while getting the battery life of the V1/OLED Switch (along with LPDDR5X) in a revision on a 5nm or smaller node. I do feel more comfortable that we match PS4 in docked without DLSS though.

RennanNT said:
It's worth pointing out that the chip with 2 TFlops is the Orin NX. We don't know how many TF the chip Nintendo is going to use will have, but it's very unlikely it will have as many GPU cores as the NX and run the GPU at max clock on top of it.

I would be surprised if we don't have all the cores. The NX is already using half of the original Orin GPU cores and 8/12 CPU cores, but at 25 watts max.

What I can see is the GPU being clocked at 80% of max speed (like the TX1 on Switch) while the CPU is maybe 1.5GHz on the 8 cores instead of the full 2. Maybe 2 CPU cores will be disabled, I dunno. Not to mention removing anything that could be irrelevant on Switch 2, and potentially swapping 12GB of RAM for 8 (I really hope not) while having the same 102 GB/s bandwidth.

So maybe an on-paper 1.6 TFLOPs GPU (0.8 GHz docked and 0.32 GHz handheld) and a 6-core 1.5 GHz A78 CPU. And hopefully retaining 12GB of LPDDR5 RAM with 102 GB/s bandwidth is feasible at 15 watts on 8nm Samsung. If they manage to keep all the cache improvements from NX and RT in the final Switch 2 product, that would be a big fat cherry on top!
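For reference, the on-paper figure comes from the usual peak FP32 formula: 2 FLOPs per CUDA core per clock (one fused multiply-add) times core count times clock. A small sketch, assuming the Orin NX's 1024 CUDA cores stay enabled and using the clocks guessed in this post:

```python
ORIN_NX_CUDA_CORES = 1024   # core count discussed in this thread
DOCKED_CLOCK_GHZ = 0.8      # guessed docked clock from this post
HANDHELD_CLOCK_GHZ = 0.32   # guessed handheld clock from this post


def fp32_tflops(cuda_cores, clock_ghz):
    """Peak FP32 throughput: 2 FLOPs (one FMA) per core per cycle."""
    gflops = 2 * cuda_cores * clock_ghz
    return gflops / 1000.0


docked = fp32_tflops(ORIN_NX_CUDA_CORES, DOCKED_CLOCK_GHZ)
handheld = fp32_tflops(ORIN_NX_CUDA_CORES, HANDHELD_CLOCK_GHZ)

print(f"Docked: {docked:.2f} TFLOPs")
print(f"Handheld: {handheld:.2f} TFLOPs")
```

This is a theoretical peak only. It says nothing about how cache, memory bandwidth or power limits affect real game performance.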

Alovon11 · Nov 10, 2021

ShadowFox08 said:
The Tegra X1/X2 and Turing GPUs have mixed precision (combining use of fp16 and fp32) enabled, which makes the Switch and Turing GPUs more performant per on-paper flop than Ampere.

I'm not sure if the reason is that Ampere apparently has half the fp16 rate of Turing (please correct me if I'm wrong) or what. But perhaps that fp16 change in the Orin architecture puts it more in line with Turing in performance.

And this of course also doesn't even count the cache increases. Which like you said would boost switch's performance considerably.

Even with DLSS, I think comparing it to an RTX 2070 is a bit reaching though, let alone 2060.

Despite Nvidia being wizards on efficiency and power, there's only so much you can do on an 8nm node. I'm expecting 15 watts max in docked mode, but hoping they/Nintendo can push out as much performance as possible and match the battery life of the V1 Switch, while getting the battery life of the V1/OLED Switch (along with LPDDR5X) in a revision on a 5nm or smaller node. I do feel more comfortable that we match PS4 in docked without DLSS though.

Again I will note, I am talking moreso about the Laptop GPUs when I refer to them.

Which the Laptop 2060 had a 20% overall cut to power vs Desktop.

In that context, I feel it becomes more realistic.

Kenka · Nov 10, 2021

Alovon11 said:
Well, it's just the numbers.

We have generally agreed on 2TFLOPs ampere.

But then NVIDIA goes and doubles the L2 Cache number and FP16 Performance, and adds 50% more L1 Cache into the Orin Architecture.

Remember, AMD got 30-40% of the 54% boost over RDNA1 with Infinity Cache, and that was L3 Cache which is slower than L1 and L2.

So something like that but with far faster memory and Ampere being more memory-efficient than RDNA1/RDNA2 could at least hit that 30% low-end number for a boost.
Therefore, 2 Orin TFLOPs, equates to 2.6 Ampere TFLOPs.

Double that for DLSS Performance mode, you get 5.2TFLOPs, which is above the 5TFLOPs of the 1660 Super.

It's simple math.

And the FP16 thing has an unknown performance boost and can swing wildly between the 40-50% boost FP16 optimization in games for Pascal got, or a lower number, but they only would need to hit a 20% increase to take that 2.6TFLOP number and push it to 3 TFLOPs, meaning it falls into the ballpark of the GTX 1650 natively

Now, it would run more like the Laptop versions of those cards due to the power-draw and thermals.

But it's still impressive and with how DLSS has advanced, it has room to grow even further with things like DRS+DLSS and with how tight-knit ARM SoCs are on communication between components like the shared memory pool vs Desktop and Laptop GPUs.

Let's recapitulate: a 1660 Super has a rated TDP of 125W, has 6.6 billion transistors occupying 284 mm2 of space and is built on the 12Nm TSMC node. The Dane chip should consume a frugal 10W, have say 4 billion transistors (if we go by Thraktor and Dakhil estimations/sources), fit inside the original Switch's board dimensions of 118 mm2 and will probably be built using a 8nm node if what Nate has heard isn't outdated.

In order to have a first estimate of the gap in performance between the two chips (before applying DLSS or DLAA), I'd suggest we determine what impact the metrics above have. For starters, if we assume that the FLOPS scale linearly with the transistor count and TDP, then we obtain that our hypothetical Dane has an output that is (6.6/4)*(125/10) times smaller than the 1660. That is a factor of about 20.6.

Of course, the premises are wrong (maybe even completely) and I haven't touched on memory cache configuration, CPU and CUDA core counts, how advanced DLSS might be by the time the unit launches, etc. But in my opinion, it is a better starting point for guesstimating what to expect from Dane. I am not saying you are wrong, though, but finding out how clock speeds, TDP, CUDA core counts, die size, cache configuration and amount, and node technology affect a chip's final TFLOPS output would help us ground our expectations in reality massively.
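For anyone who wants to play with the linear-scaling estimate above, here is a minimal sketch. It assumes, as the post does, that FLOPS scale linearly with both transistor count and TDP, which the post itself admits is a wrong premise. The function name is hypothetical, and the 5 TFLOPs reference figure for the 1660 Super is the one quoted earlier in the thread.

```python
def linear_scaling_gap(ref_transistors_bn, ref_tdp_w,
                       target_transistors_bn, target_tdp_w):
    """Return how many times smaller the target's output would be than the
    reference's, assuming output scales linearly with transistors and TDP."""
    transistor_ratio = ref_transistors_bn / target_transistors_bn
    tdp_ratio = ref_tdp_w / target_tdp_w
    return transistor_ratio * tdp_ratio


# Figures from the post: 1660 Super (6.6bn transistors, 125W)
# versus a hypothetical Dane (4bn transistors, 10W).
gap = linear_scaling_gap(6.6, 125, 4.0, 10)

# Assumed reference throughput for the 1660 Super, as quoted in the thread.
REF_TFLOPS = 5.0
implied_tflops = REF_TFLOPS / gap

print(f"gap factor: {gap:.3f}")
print(f"implied Dane output: {implied_tflops:.2f} TFLOPs")
```

With these inputs the product works out to roughly 20.6, which would put the hypothetical chip at about a quarter of a TFLOP. That is a useful sanity check on how pessimistic a purely linear model is, not a prediction.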

Alovon11 · Nov 10, 2021

Kenka said:
Let's recapitulate: a 1660 Super has a rated TDP of 125W, has 6.6 billion transistors occupying 284 mm² of space and is built on the 12nm TSMC node. The Dane chip should consume a frugal 10W, have say 4 billion transistors (if we go by Thraktor and Dakhil estimations/sources), fit inside the original Switch's board dimensions of 118 mm² and will probably be built using an 8nm node if what Nate has heard isn't outdated.

In order to have a first estimate of the gap in performance between the two chips (before applying DLSS or DLAA), I'd suggest we determine what impact the metrics above have. For starters, if we assume that the FLOPS scale linearly with the transistor count and TDP, then we obtain that our hypothetical Dane has an output that is (6.6/4)*(125/10) times smaller than the 1660. That is a factor of about 20.6.

Of course, the premises are wrong (maybe even completely) and I haven't touched on memory cache configuration, CPU and CUDA core counts, how advanced DLSS might be by the time the unit launches, etc. But in my opinion, it is a better starting point for guesstimating what to expect from Dane. I am not saying you are wrong, though, but finding out how clock speeds, TDP, CUDA core counts, die size, cache configuration and amount, and node technology affect a chip's final TFLOPS output would help us ground our expectations in reality massively.

again, I am saying that it would perform like the LAPTOP 1660 AFTER DLSS Performance mode as it can effectively double the performance.

Orin at native would likely perform around the LAPTOP 1650 if the FP16 boost is used to peak effectiveness.

Also, the Orin uArch is looking to be different than Ampere by more than we thought due to the FP16 change, so trying to compare transistor density, TDP, clock, die size is a comparison for naught really between Turing and Orin because it's Apples to Oranges.

They are quite a bit removed at this point.

ILikeFeet · Nov 10, 2021

if we're still thinking that Dane is gonna be in the range of XBO and PS4, we're talking 1050 to 1050 Ti levels of performance here. DLSS can get you higher fidelity, but you're rendering at a lower resolution, which helps everything. a docked-only switch game that renders at 480p can probably get you a lot of PS4's graphical effects

Kenka · Nov 10, 2021

Alovon11 said:
again, I am saying that it would perform like the LAPTOP 1660 AFTER DLSS Performance mode as it can effectively double the performance.
Orin at native would likely perform around the LAPTOP 1650 if the FP16 boost is used to peak effectiveness.

Also, the Orin uArch is looking to be different than Ampere by more than we thought due to the FP16 change, so trying to compare transistor density, TDP, clock, die size is a comparison for naught really between Turing and Orin because it's Apples to Oranges.

They are quite a bit removed at this point.

I get your point but why is it apples to oranges to compare two architectures? Doesn't any comparison between chips eventually boil down to comparing a set of data with one another? I am not trolling, I am genuinely asking for clarification here.

Angel Whispers · Nov 10, 2021

Kenka said:
You are known for your enthusiasm regarding the succ specs but that is going far.

Please can we not do this? Not to have a go at you specifically, but it's alright to dare to dream, however wild some might perceive said dreams to be. Let's leave "policed and curbed expectations" at the New Old Place - It's a speculation thread, so, exploring all possible avenues ought to be welcome. Nothing is so unthinkable in that regard, and if it doesn't happen, it doesn't happen. The onus should be on one's self, as an adult, to get a grip of one's emotions, rather than tell others they're too optimistic.

Alovon11 · Nov 10, 2021

Kenka said:
I get your point but why is it apples to oranges to compare two architectures? Doesn't any comparison between chips eventually boil down to comparing a set of data with one another? I am not trolling, I am genuinely asking for clarification here.

True, it's just there comes a point where trying to get pinpoint accuracy just causes more inaccuracies due to those different architectures.

EX: Zen 2 and Zen 3 are on TSMC 7nm, have the same density, have the same die size, and have the same TDP.

Yet Zen 3 runs circles around Zen 2 due to the architecture changes they made.

So going into detail on Density/TDP/Clocks would be wasted time.

at least to make a simpler example.

And NVIDIA did it as well with the GTX 700 series being on 28nm for the most part still but having notable gains over the GTX 600 series

AMD did it again as well with RDNA1 to RDNA2 being on 7nm as well.

Intoxicate · Nov 10, 2021

So why is it ruled out that Nintendo will go for a smaller node? I thought they were apparently not happy with the power efficiency of 8N and delayed the DLSS model (so we only got the OLED screen).

ILikeFeet · Nov 10, 2021

Intoxicate said:
So why is it ruled out that Nintendo will go for a smaller node? I thought they were apparently not happy with the power efficiency of 8N and delayed the DLSS model (so we only got the OLED screen).

costs. Nvidia did a lot of the work for 8nm. changing that, someone is gonna have to foot the bill. and newer nodes are SIGNIFICANTLY more expensive. not to mention all these newer nodes have practically 0 capacity at any fab

RennanNT · Nov 10, 2021

ShadowFox08 said:
I would be surprised if we don't have all the cores. The NX is already using half of the original Orin GPU cores and 8/12 CPU cores, but at 25 watts max.

Dane is derived from Orin, not Orin NX, which is a binned chip. While the NX's 25W power profile is promising... like I said earlier, Orin is a 460 mm² chip, almost 4 times bigger than OG TX1, and it will need to cut more than 1/3 of the CPU cores and half the SMs to fit in a similar-sized SoC.
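The "almost 4 times bigger" figure is easy to check with a tiny sketch. It uses the 460 mm² number from this post and the 118 mm² number quoted earlier in the thread for the original Switch SoC. It assumes nothing about how area maps to core counts, since caches, I/O and other blocks do not shrink proportionally.

```python
ORIN_DIE_MM2 = 460.0  # Orin die area as quoted in the post
TX1_DIE_MM2 = 118.0   # original Switch SoC area as quoted earlier in the thread

# How many times larger Orin is than the original SoC.
area_ratio = ORIN_DIE_MM2 / TX1_DIE_MM2

# Fraction of Orin's area that would fit in a TX1-sized die on the same node.
fraction_that_fits = TX1_DIE_MM2 / ORIN_DIE_MM2

print(f"Orin is {area_ratio:.2f}x the area of the original SoC")
print(f"Only {fraction_that_fits:.0%} of Orin's area fits in the same footprint")
```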
