
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (New Staff Post, Please read)

I would agree with this, provided that both Aonuma and Fujibayashi are no longer in charge of the Legend of Zelda follow-up.

I'd just caution that a return to a realistic style usually means they don't innovate further with their gameplay mechanics, and I'd be merciless in criticizing how much TP relies on OOT for its gameplay paths, which I consider to be nothing short of OOT 2.0.

A realistic style doesn't necessarily mean gameplay mechanics won't or can't be innovated, since one does not relate to the other. At least Twilight Princess had some interesting types of attacks you learned from Skeleton Link instead of just Hit, Hit, Bam, Bam like what we more or less have with BOTW/TOTK. Yeah, we have the Arrow Time feature, but it's more or less similar to the Rolling mechanic in Wind Waker, just tweaked.

I find nothing innovative in BOTW/TOTK's combat quite frankly, and it's actually a step back compared to Skyward Sword IMO. But in fairness, it was never the main draw for the game in the first place.

If Nintendo really wants to innovate the combat mechanics though (in a post-motion controlled world), maybe they should take some inspiration from Ninja Gaiden, Sekiro, or even go back to the PSX days with Bushido Blade, which I think was rather ahead of its time. Zelda's combat has always been secondary, or even tertiary to the rest of the game, exception being Skyward Sword.

OoT/MM's innovation was the utilization of Z-Targeting, which simply made swordplay easier, and less of a crunch to deal with. It was simple, yet it worked, and that was all that mattered. And it's been the template for all 3D Zelda titles since, even BOTW/TOTK.
 
It is an action adventure game though...
Yeah and they should improve on what they have instead of trying to be something they're not. Flurry rush needs a tweak to actually work, for example. Enemy variety would go a long way to make combat more interesting (as it was in WW and SS), but taking elements from Sekiro of all games and taking away even more from the Zelda identity is not the way to do it. They have a working framework that just needs improvement.

Edit: anyway this is off topic, sorry.
 
Zelda's combat has always been secondary, or even tertiary to the rest of the game, exception being Skyward Sword.

OoT/MM's innovation was the utilization of Z-Targeting, which simply made swordplay easier, and less of a crunch to deal with. It was simple, yet it worked, and that was all that mattered. And it's been the template for all 3D Zelda titles since, even BOTW/TOTK.
Even older entries like OoT/MM had a more complex combat system (though not totally necessary) with directional slashes, and WW/TP basically had combat arts to supplement it. BOTW/TOTK took a lot of steps back in combat, which I get isn't the main focus, but I'd like them to look back at what they had achieved before.
 
I'd like to discuss again the possibility of the T239 ending up stronger than the RTX 2050. Look at these mobile GPUs for comparison:

  • Tegra T239: 1536 CUDA cores, ?? TMUs, ?? ROPs, 12 SM, 48 Tensor cores, 12 RT cores, ?? KB of L1, 1~4 MB of L2 cache and 3.8 TFLOPS FP32.
  • RTX 2050: 2048 CUDA cores, 64 TMUs, 32 ROPs, 16 SM, 64 Tensor cores, 32 RT cores, 34 KB of L1, 2 MB of L2 and 5.1 TFLOPS FP32.
  • RTX 3070: 5120 CUDA cores, 160 TMUs, 80 ROPs, 40 SM, 160 Tensor cores, 40 RT cores, 128 KB of L1, 4 MB of L2 cache and 15.97 TFLOPS FP32.
  • RTX 4070: 4608 CUDA cores, 144 TMUs, 48 ROPs, 36 SM, 144 Tensor cores, 36 RT cores, 128 KB of L1, 32 MB of L2 cache and 15.62 TFLOPS FP32.

We know the RTX 2050 looks better than the T239, with a third more CUDA cores and higher TFLOPS, but the T239 has its own advantages. The first is the amount of memory: only 4 GB on the 2050 vs 12 GB on the T239. The second is bandwidth: the 2050 has only 96 GB/s, which is less than the T239's 120 GB/s. Plus, the T239 is a hybrid chip, made on Ampere but at 4 nm like the Adas, while the RTX 2050 is pure Ampere at 8 nm.

I included the RTX 3070 and 4070 above, both mobile parts, because if you only compared those numbers you would think the 3070 is stronger than the 4070. But in that test, you can see that most games run better on the 4070 than the 3070, even though the 4070's bandwidth is only 256 GB/s, a lot less than the 3070's 448 GB/s.

So the one thing that's clear to me is that more bandwidth alone will not make a GPU better.

But I really want to understand how the 4070 can beat the 3070, and whether the same could happen with the T239 vs the RTX 2050. Can someone explain it?
 
I'd like to discuss again the possibility of the T239 ending up stronger than the RTX 2050. Look at these mobile GPUs for comparison:

  • Tegra T239: 1536 CUDA cores, ?? TMUs, ?? ROPs, 12 SM, 48 Tensor cores, 12 RT cores, ?? KB of L1, 1~4 MB of L2 cache and 3.8 TFLOPS FP32.
  • RTX 2050: 2048 CUDA cores, 64 TMUs, 32 ROPs, 16 SM, 64 Tensor cores, 32 RT cores, 34 KB of L1, 2 MB of L2 and 5.1 TFLOPS FP32.
  • RTX 3070: 5120 CUDA cores, 160 TMUs, 80 ROPs, 40 SM, 160 Tensor cores, 40 RT cores, 128 KB of L1, 4 MB of L2 cache and 15.97 TFLOPS FP32.
  • RTX 4070: 4608 CUDA cores, 144 TMUs, 48 ROPs, 36 SM, 144 Tensor cores, 36 RT cores, 128 KB of L1, 32 MB of L2 cache and 15.62 TFLOPS FP32.

We know the RTX 2050 looks better than the T239, with a third more CUDA cores and higher TFLOPS, but the T239 has its own advantages. The first is the amount of memory: only 4 GB on the 2050 vs 12 GB on the T239. The second is bandwidth: the 2050 has only 96 GB/s, which is less than the T239's 120 GB/s. Plus, the T239 is a hybrid chip, made on Ampere but at 4 nm like the Adas, while the RTX 2050 is pure Ampere at 8 nm.

I included the RTX 3070 and 4070 above, both mobile parts, because if you only compared those numbers you would think the 3070 is stronger than the 4070. But in that test, you can see that most games run better on the 4070 than the 3070, even though the 4070's bandwidth is only 256 GB/s, a lot less than the 3070's 448 GB/s.

So the one thing that's clear to me is that more bandwidth alone will not make a GPU better.

But I really want to understand how the 4070 can beat the 3070, and whether the same could happen with the T239 vs the RTX 2050. Can someone explain it?
Just to fill in some details, based on the rest of the Ampere GPUs, the GA10F in T239 should have 16 ROPs (16 per GPC) and 48 TMUs (4 per SM). Also, I think it's a reasonable assumption that it’ll have 128 KB per SM of L1$ (1.5 MB total), given that it's the amount in all Ampere gaming GPUs (i.e. not A100 or Orin) and in all Lovelace GPUs.
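
For anyone who wants to sanity-check those ratios, here's a tiny back-of-the-envelope sketch. The per-GPC and per-SM figures are the assumptions from the post above (single GPC, 16 ROPs per GPC, 4 TMUs per SM, 128 KB of L1 per SM); none of this is confirmed for GA10F.

```python
# Rough GA10F (T239) estimates from assumed Ampere ratios.
GPCS = 1             # assumed single GPC
SMS = 12             # SM count from the T239 leak
ROPS_PER_GPC = 16    # typical Ampere: 16 ROPs per GPC
TMUS_PER_SM = 4      # typical Ampere: 4 TMUs per SM
L1_KB_PER_SM = 128   # Ampere gaming / Lovelace: 128 KB of L1 per SM

print("ROPs:", GPCS * ROPS_PER_GPC)              # 16
print("TMUs:", SMS * TMUS_PER_SM)                # 48
print("L1  :", SMS * L1_KB_PER_SM / 1024, "MB")  # 1.5 MB total
```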
 
But I really want to understand how the 4070 can beat the 3070, and whether the same could happen with the T239 vs the RTX 2050. Can someone explain it?
RTX 3070 Clocks:
Base Clock 1500 MHz
Boost Clock 1725 MHz

RTX 4070 Clocks:
Base Clock 1920 MHz
Boost Clock 2475 MHz

You're basically looking at higher frequencies achievable thanks to TSMC N4 vs Samsung 8nm. Which is one of the reasons we're all so curious about the node for T239.
 
I'd like to discuss again the possibility of the T239 ending up stronger than the RTX 2050. Look at these mobile GPUs for comparison:

  • Tegra T239: 1536 CUDA cores, ?? TMUs, ?? ROPs, 12 SM, 48 Tensor cores, 12 RT cores, ?? KB of L1, 1~4 MB of L2 cache and 3.8 TFLOPS FP32.
  • RTX 2050: 2048 CUDA cores, 64 TMUs, 32 ROPs, 16 SM, 64 Tensor cores, 32 RT cores, 34 KB of L1, 2 MB of L2 and 5.1 TFLOPS FP32.
  • RTX 3070: 5120 CUDA cores, 160 TMUs, 80 ROPs, 40 SM, 160 Tensor cores, 40 RT cores, 128 KB of L1, 4 MB of L2 cache and 15.97 TFLOPS FP32.
  • RTX 4070: 4608 CUDA cores, 144 TMUs, 48 ROPs, 36 SM, 144 Tensor cores, 36 RT cores, 128 KB of L1, 32 MB of L2 cache and 15.62 TFLOPS FP32.

We know the RTX 2050 looks better than the T239, with a third more CUDA cores and higher TFLOPS, but the T239 has its own advantages. The first is the amount of memory: only 4 GB on the 2050 vs 12 GB on the T239. The second is bandwidth: the 2050 has only 96 GB/s, which is less than the T239's 120 GB/s. Plus, the T239 is a hybrid chip, made on Ampere but at 4 nm like the Adas, while the RTX 2050 is pure Ampere at 8 nm.

I included the RTX 3070 and 4070 above, both mobile parts, because if you only compared those numbers you would think the 3070 is stronger than the 4070. But in that test, you can see that most games run better on the 4070 than the 3070, even though the 4070's bandwidth is only 256 GB/s, a lot less than the 3070's 448 GB/s.

So the one thing that's clear to me is that more bandwidth alone will not make a GPU better.

But I really want to understand how the 4070 can beat the 3070, and whether the same could happen with the T239 vs the RTX 2050. Can someone explain it?
The amount of VRAM in the 2050 severely impacts how well it can actually play games, and unless an 8 GB version is available we can't judge the 2050's true level of performance, so comparisons to Drake are only rough.
 
RTX 3070 Clocks:
Base Clock 1500 MHz
Boost Clock 1725 MHz

RTX 4070 Clocks:
Base Clock 1920 MHz
Boost Clock 2475 MHz

You're basically looking at higher frequencies achievable thanks to TSMC N4 vs Samsung 8nm. Which is one of the reasons we're all so curious about the node for T239.
I imagine the much larger L2$ (32 MB vs 4 MB) also helps.
 
RTX 3070 Clocks:
Base Clock 1500 MHz
Boost Clock 1725 MHz

RTX 4070 Clocks:
Base Clock 1920 MHz
Boost Clock 2475 MHz

You're basically looking at higher frequencies achievable thanks to TSMC N4 vs Samsung 8nm. Which is one of the reasons we're all so curious about the node for T239.
In fact, even if Drake used a TSMC 4nm node it wouldn't be able to match the 2050's high clocks, because low power consumption is a core goal for Nintendo. Docked mode also can't be too far away from portable mode, so the ratio would still be around 2x: assuming portable mode is clocked at around 550-600 MHz, docked mode would only be at 1.2 GHz (1210 MHz) or so.
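
Putting rough numbers on that guess: the usual FP32 estimate is cores × 2 ops per clock × clock speed. The clocks below are the speculative ones from this post; only the 1536 CUDA core count comes from the T239 leak.

```python
CUDA_CORES = 1536  # from the T239 leak

def fp32_tflops(cores, clock_ghz):
    # 2 FP32 ops (one FMA) per CUDA core per clock
    return cores * 2 * clock_ghz / 1000

print(fp32_tflops(CUDA_CORES, 0.60))  # ~1.84 TFLOPS portable (assumed 600 MHz)
print(fp32_tflops(CUDA_CORES, 1.21))  # ~3.72 TFLOPS docked (assumed 1210 MHz)
```

That docked figure lands close to the ~3.8 TFLOPS quoted in the spec list earlier, so the 2x clock assumption is at least self-consistent.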
 
Even older entries like OoT/MM had a more complex combat system (though not totally necessary) with directional slashes, and WW/TP basically had combat arts to supplement it. BOTW/TOTK took a lot of steps back in combat, which I get isn't the main focus, but I'd like them to look back at what they had achieved before.
BotW and TotK increase the complexity of combat in other ways, but again off-topic 😛
 
An article from a very serious source like Nikkei will probably appear a few days (at most two weeks) before the official HW announcement. Their sources are in factory/logistics channels, so they are about 95% accurate.
My understanding is Nikkei reporting is almost always 'controlled' sanctioned leaks by Nintendo. Japanese press often allow the company they are reporting on to review articles about them before going to print.

If Nikkei reports something, it's probably been vetted by Nintendo, even if the company isn't mentioned directly and they cite suppliers or something. Those suppliers probably don't want to piss off Nintendo either.
 
Just to fill in some details, based on the rest of the Ampere GPUs, the GA10F in T239 should have 16 ROPs (16 per GPC) and 48 TMUs (4 per SM). Also, I think it's a reasonable assumption that it’ll have 128 KB per SM of L1$ (1.5 MB total), given that it's the amount in all Ampere gaming GPUs (i.e. not A100 or Orin) and in all Lovelace GPUs.
So the T239 could have a big L1 cache (1.5 MB vs 32 KB) and a big L2 cache too (4 MB vs 2 MB)? That could help the whole process a lot in the end, no?

RTX 3070 Clocks:
Base Clock 1500 MHz
Boost Clock 1725 MHz

RTX 4070 Clocks:
Base Clock 1920 MHz
Boost Clock 2475 MHz

You're basically looking at higher frequencies achievable thanks to TSMC N4 vs Samsung 8nm. Which is one of the reasons we're all so curious about the node for T239.
I believe you listed the desktop GPU clocks there, not the mobile ones. The numbers for mobile are:

RTX 3070 Clocks:
Base Clock 1110 MHz
Boost Clock 1560 MHz

RTX 4070 Clocks:
Base Clock 1395 MHz
Boost Clock 1695 MHz

The difference on mobile is not that big, but it's surely one of the reasons the 4070 mobile beats the 3070 mobile.

But again, can that small clock difference alone be enough?
 
An article from a very serious source like Nikkei will probably appear a few days (at most two weeks) before the official HW announcement. Their sources are in factory/logistics channels, so they are about 95% accurate.
What if the new anti-leak strategy also includes countermeasures for these situations? Maybe I'm exaggerating, but I feel like they've taken it seriously and I can't blame them.
Even Mochizuki isn't what he used to be, and if I'm not mistaken (?) he used to have this kind of information.
 
Best case scenario is that the A78C has 8 MB and the GPU 4 MB if it's on TSMC N4.

But since Nintendo has opted for LPDDR5X, I don't think both the CPU and GPU will have that much cache.

It would really help to have that much, but I don't think Nintendo wants to splurge that much money.
 
I included the RTX 3070 and 4070 above, both mobile parts, because if you only compared those numbers you would think the 3070 is stronger than the 4070. But in that test, you can see that most games run better on the 4070 than the 3070, even though the 4070's bandwidth is only 256 GB/s, a lot less than the 3070's 448 GB/s.
This isn't a good test for comparing GPU performance. It's not two different GPUs, it's two different laptops, from different manufacturers, with different CPUs, different firmware, possibly different OS configurations and different drivers as well. The only thing you can determine from this test is which is the better gaming laptop, not what's the better mobile GPU.

There are not many changes between Ampere and Ada (at least, according to the Nvidia whitepaper), and almost none of them would affect raster performance. If there are differences between the two architectures (all else being equal) the difference is caused by
  1. Ada's different memory subsystem
  2. Ada's higher clock speeds.
But because all else is never equal, there simply aren't good benchmarks out there to determine which is which.

So the one thing that's clear to me is that more bandwidth alone will not make a GPU better.
I agree that there is a limit to how much more bandwidth can help; there is a reason that Ada has less bandwidth. It's expensive, both in terms of price and electricity, so it was inevitable that it got cut back. Nvidia's solution (similar to, but less sophisticated than, AMD's) was to add a crapload of cache. So you can't compare bandwidth 1:1 between the two architectures.

I'd like to discuss again the possibility of the T239 ending up stronger than the RTX 2050
T239 won't be stronger than RTX 2050. They're the same architecture, and RTX 2050 just has more of it. More cores, more ROPS, more TMUs. Probably faster clocks. Switch 2 will outperform a 2050 based gaming laptop, though. Not just because of More Memory (though that will help) but simply because of dedicated ports.
 
So the T239 could have a big L1 cache (1.5 MB vs 32 KB) and a big L2 cache too (4 MB vs 2 MB)? That could help the whole process a lot in the end, no?


I believe you listed the desktop GPU clocks there, not the mobile ones. The numbers for mobile are:

RTX 3070 Clocks:
Base Clock 1110 MHz
Boost Clock 1560 MHz

RTX 4070 Clocks:
Base Clock 1395 MHz
Boost Clock 1695 MHz

The difference on mobile is not that big, but it's surely one of the reasons the 4070 mobile beats the 3070 mobile.

But again, can that small clock difference alone be enough?
you’re right, I very much missed the ‘mobile’ part of the post.

That said, the 4070 has 10% fewer CUDA cores but a 25% higher base clock (and 8% higher boost), so it still probably maths out, especially with the aforementioned cache.
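
A quick sanity check with the mobile core counts and boost clocks quoted above, using the same cores × 2 × clock estimate (real-world performance also depends on cache, bandwidth, and power limits, so treat this as a paper comparison only):

```python
def fp32_tflops(cores, clock_mhz):
    # FP32 estimate: 2 ops (one FMA) per CUDA core per clock
    return cores * 2 * clock_mhz / 1e6

# Mobile parts, boost clocks from the post above
print("3070 mobile:", round(fp32_tflops(5120, 1560), 2), "TFLOPS")  # ~15.97
print("4070 mobile:", round(fp32_tflops(4608, 1695), 2), "TFLOPS")  # ~15.62
```

Those match the 15.97 and 15.62 TFLOPS figures in the spec list, so on paper the two are basically even and the real difference comes down to cache, memory, and drivers.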
 
T239 won't be stronger than RTX 2050. They're the same architecture, and RTX 2050 just has more of it. More cores, more ROPS, more TMUs. Probably faster clocks. Switch 2 will outperform a 2050 based gaming laptop, though. Not just because of More Memory (though that will help) but simply because of dedicated ports.
It’s important to note that certain RTX 2050 laptops have achieved some impressive results, without proper optimisation.
 
Best case scenario is that the A78C has 8 MB and the GPU 4 MB if it's on TSMC N4.

But since Nintendo has opted for LPDDR5X, I don't think both the CPU and GPU will have that much cache.

It would really help to have that much, but I don't think Nintendo wants to splurge that much money.

That got me thinking actually.

Nintendo, and Nvidia for that matter, would want to balance memory bandwidth, the quantity of memory, and also the amount of space the SoC takes up on the substrate (I know the memory chips themselves are not directly part of the SoC; that's not what I'm getting at). And yet we know the substrate is quite large relative to the die size of the SoC, and there's, I think, a greater than 50% chance of T239 using TSMC 4N, meaning not only would the SoC be much smaller, it might also allow them to use a larger amount of cache than if it were on the SEC8N that T234 is on.

Naturally, this would have to be decided before the SoC is taped out, but since more cache takes up more die space, I'm curious whether a smaller SoC (on, say, TSMC 4N) would allow them to design the chip with more cache as a result. Probably not, realistically, since there is also the cost of the SoC to consider, plus the question of whether there would be a real advantage in using the full amount of L3 cache on the CPU that the A78C allows, plus also the cache for the GPU, or whether it's more diminishing returns that could be better spent elsewhere.

So, for example, if more cache did reduce a bottleneck but also resulted in a more costly chip, Nintendo/Nvidia might instead opt for less cache but higher clock speeds to compensate, especially in docked mode. Or perhaps they'd decide to use two 8 GB 64-bit memory modules, but maybe that would be more costly than more cache on the SoC? Trade-offs for sure, but something I was thinking about.

Someone like oldpuck, or Dakill could probably go more into detail concerning the balancing act between more physical memory, more Cache, higher clockspeeds, general costs, etc, and their respective trade-offs from one another.
 
Yeah and they should improve on what they have instead of trying to be something they're not. Flurry rush needs a tweak to actually work, for example.
I think instead of sticking too much with Botw‘s combat, world or artstyle, they should try something new (or go back to pre-Botw combat + Botw‘s physics, which still would be better).
 
Here comes another one who doesn't know what he's talking about.
I believe they are joking. We have what appears to be very solid evidence of production going ahead... "soon". Less than a year away. That doesn't mean launch within 12 months, though. Most people here know that and predictions are being founded on that, broadly speaking.
 
with how rock-solid everybody here seems to feel about the spec leaks, i'm fairly comfortable just waiting for the launch, since i know nintendo will have plenty of games to show off the hardware

it's good to feel optimistic
 
I'd argue that will no longer be the case with the device in hand, it can handle the artstyle and innovate as the last two did just fine. They're not mutually exclusive.
If you want a game that can have realistic physics, you can do that in a realistic art style.
I go even further, we won’t see the system until October 2026, with a March launch.
We're gonna play the Switch until the end of time, there's no Switch successor.
 
with how rock-solid everybody here seems to feel about the spec leaks, i'm fairly comfortable just waiting for the launch, since i know nintendo will have plenty of games to show off the hardware

it's good to feel optimistic
It's hard not to be rock solid about them, especially regarding the memory interface and storage amount. Those were found in public shipping lists... Doesn't get any more legit than that.
 
T239 won't be stronger than RTX 2050. They're the same architecture, and RTX 2050 just has more of it. More cores, more ROPS, more TMUs. Probably faster clocks. Switch 2 will outperform a 2050 based gaming laptop, though. Not just because of More Memory (though that will help) but simply because of dedicated ports.
The T239 is not exactly the same as the RTX 2050. The 2050 is pure Ampere, while the T239 is a hybrid architecture, on 4 nm and with a crapload of cache.

I hope they optimize it well to bring out the best of what the T239 can offer.

Now, to change the subject a little: is the RTX 2050 stronger than the Series S?
 
The T239 is not exactly the same as the RTX 2050. The 2050 is pure Ampere, while the T239 is a hybrid architecture, on 4 nm and with a crapload of cache.

I hope they optimize it well to bring out the best of what the T239 can offer.

Now, to change the subject a little: is the RTX 2050 stronger than the Series S?
Drake isn't some hybrid architecture; its GPU belongs to the main body of the Ampere architecture. I don't know why there are always rumors of a so-called hybrid architecture, and at the moment there's no evidence that Drake has anything new from Ada other than a port of Ada's power-saving tech and possibly 4N.
 
It's hard not to be rock solid about them, especially regarding the memory interface and storage amount. Those were found in public shipping lists... Doesn't get any more legit than that.
hence my point! the leaked specs were all i could have asked for lol
 
The T239 is not exactly the same as the RTX 2050. The 2050 is pure Ampere, while the T239 is a hybrid architecture, on 4 nm and with a crapload of cache.

I hope they optimize it well to bring out the best of what the T239 can offer.

Now, to change the subject a little: is the RTX 2050 stronger than the Series S?
There's no concrete evidence of a crapload of cache. It's likely the same as regular Ampere.
 
The T239 is not exactly the same as the RTX 2050. The 2050 is pure Ampere, while the T239 is a hybrid architecture, on 4 nm and with a crapload of cache.

I hope they optimize it well to bring out the best of what the T239 can offer.

Now, to change the subject a little: is the RTX 2050 stronger than the Series S?
It's not hybrid per se; it was simply taped out later than a regular Ampere chip, so it's carrying some features it wouldn't otherwise have, like power gating, seemingly. It's still an Ampere GPU first and foremost, with what seems like a regular amount of cache for its architecture.
 
Someone like oldpuck, or Dakill could probably go more into detail concerning the balancing act between more physical memory, more Cache, higher clockspeeds, general costs, etc, and their respective trade-offs from one another.
I touched on this a little bit the other day, but it's an interesting topic.

In RAM you need to keep all of your assets, all of the 3D models and textures you're going to use to draw a scene. But, ideally, you actually only use all these once a frame. Instead, you take your 3D scene, and generate a bunch of flat images, called buffers, and then do your complex shading/lighting/effects work using those flat images.

Here, I did a (bad) diagram, outlining the process



What you can see here is the way that RAM, bus, and GPU performance all interact. All the arrows are copies over the memory bus. As asset quality goes up, you need more RAM to store the assets. As resolution goes up, those buffers get bigger, more time is spent on the memory bus copying them, and each shading pass takes longer. And the more shading passes and the more elaborate the effects, the more copies, more buffers, and more computation it takes.
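
As a rough illustration of how resolution feeds the memory bus: the buffer formats, buffer counts, and copies-per-frame below are made up purely for the example, not taken from any real engine.

```python
# Very rough per-frame traffic from copying a few render buffers around.
def buffer_mb(width, height, bytes_per_pixel):
    return width * height * bytes_per_pixel / 1e6

def frame_traffic_gbps(width, height, buffer_formats, copies_per_frame, fps):
    per_frame_mb = sum(buffer_mb(width, height, bpp) for bpp in buffer_formats)
    return per_frame_mb * copies_per_frame * fps / 1000

# Hypothetical buffers: color (8 B/px RGBA16F), normals (4 B/px), depth (4 B/px)
formats = [8, 4, 4]
print(frame_traffic_gbps(1280, 720, formats, copies_per_frame=3, fps=60))   # ~2.7 GB/s
print(frame_traffic_gbps(3840, 2160, formats, copies_per_frame=3, fps=60))  # ~23.9 GB/s
```

Even with made-up numbers, you can see why a jump in output resolution eats into a fixed bandwidth budget so quickly.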

In a well balanced design no single part of this is more likely to be a bottleneck over the other. In any single game, you might discover that a specific part of this process is a bottleneck, but over all the high performance games you're getting on the system, you want to see none of these stick out.

But not only do the prices of each of these individual components vary, they also have price curves - just like performance curves, doubling any section of this diagram sometimes costs more (or less) than double the money. Here are some considerations.

For the GPU there are two paths to more performance: more cores and faster clocks.

More cores:
  • Power efficient, linear power curve (that's good)
  • Heat efficient, same thing
  • Expensive at first
  • Costs rise at a steady rate
  • No real cap except $$$
  • Makes the chip bigger

Faster clocks:
  • Power inefficient, quadratic power curve (that's bad)
  • Heat inefficient, same thing
  • Cheap at first
  • Costs rise at a rapidly increasing rate
  • Hard limit before the chip just won't function
  • Chip stays the same size

For the memory bus there are two paths to faster performance: a bigger bus, and more cache.

Bigger bus:
  • Limited by memory standards
  • Makes the memory you use more expensive
  • Memory will get cheaper over time, due to node shrinks
  • Power hungry
  • Makes every copy faster
  • Bad latency, even when latency is "low"

More cache:
  • Unlimited, except by price
  • Makes the SOC more expensive
  • Node shrinks don't affect cache very much, if at all
  • Super efficient
  • Only improves some copies
  • Latency is nearly zero
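
For reference, the peak bandwidth of the bus itself is just bus width × transfer rate. The 7500 MT/s LPDDR5X figure below is an assumption, chosen because it reproduces the ~120 GB/s number that's been floated for T239 on a 128-bit bus.

```python
def peak_bandwidth_gbps(bus_width_bits, transfer_rate_mtps):
    # bytes per second = (bits / 8) * transfers per second
    return bus_width_bits / 8 * transfer_rate_mtps / 1000

print(peak_bandwidth_gbps(128, 7500))  # 120.0 GB/s (assumed 128-bit LPDDR5X)
print(peak_bandwidth_gbps(64, 7500))   # 60.0 GB/s (same memory on a 64-bit bus)
```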

For RAM, there are two ways to make RAM bigger: add more modules, or make the modules bigger.

More modules:
  • For cheap RAM, more modules are generally cheaper
  • Increases memory bandwidth, but only if you add more memory controllers to the SOC
  • Takes up lots of space

Bigger modules:
  • For expensive RAM, bigger modules are generally cheaper
  • Doesn't increase memory bandwidth
  • Takes up less space

Okay, this post is long enough. You can make similar diagrams for the CPU, but adding it into the mix here would complicate things a lot, so I skipped it for now. And I'll probably make a post about how this points to the decisions that Nintendo/Nvidia seem to have made. But that will have to wait until after my afternoon coffee break.
 
Part II: What do these constraints mean about Switch 2?

Ordinarily, it might mean a lot. Nintendo - with the help of their tech partner - has to balance all of these aspects to get a good, functional console. It needs to perform well, it needs to be cost effective, and it needs to fit into Nintendo's form factor.

But unlike the ATI designs that Nintendo used in the past, Nvidia has done most of the work. Because Nintendo has gone with a highly mature design (Ampere), Nvidia has already balanced cost and performance for them. They've even done it for low power devices - it shouldn't be surprising that Thraktor's power efficient clock numbers are in the same range as RTX 30 mobile cards.

Side note: Realized today that the Ti models of the RTX 30 mobile cards actually decrease clock speed in order to increase core count, resulting in the same total TFLOPS. Implying that, all else being equal, more cores are better than higher clock speeds for perf. Good news for us

On the GPU side, there is really only one area where Nintendo had to do this balancing act themselves, and that's the RAM. Nintendo chose expensive RAM (128-bit) likely because cheaper RAM (64-bit) would have required double the number of modules and a larger chip (with more memory controllers) to get the bandwidth they needed. That put them in a place of either custom ordering RAM or buying larger modules - because 128-bit RAM is for Premium Devices that don't ship with just 8 GB. Nintendo went with the big RAM, probably because the performance win was high (RT loves more RAM, see: all the Series S games that don't have RT) but the cost impact was small.

On the CPU side, we have less insight into Nintendo's choices, but they also need to run their own tests, because Nvidia doesn't have lots of data on AAA games running on ARM CPUs. The RAM situation is dictated by the GPU situation, so it's really "how much cache is a win?" Nintendo could keep cache low on the CPU to save money, as they've been pretty profligate elsewhere. But cache is very power efficient, because it keeps the memory bus idle. Nintendo may be willing to spend the pennies to keep "performance per watt" (and thus "performance per minute of battery life") maxed out.
 
Part II: What do these constraints mean about Switch 2?

Ordinarily, it might mean a lot. Nintendo - with the help of their tech partner - has to balance all of these aspects to get a good, functional console. It needs to perform well, it needs to be cost effective, and it needs to fit into Nintendo's form factor.

But unlike the ATI designs that Nintendo used in the past, Nvidia has done most of the work. Because Nintendo has gone with a highly mature design (Ampere), Nvidia has already balanced cost and performance for them. They've even done it for low power devices - it shouldn't be surprising that Thraktor's power efficient clock numbers are in the same range as RTX 30 mobile cards.

Side note: Realized today that the Ti models of the RTX 30 mobile cards actually decrease clock speed in order to increase core count, resulting in the same total TFLOPS. Implying that, all else being equal, more cores are better than higher clock speeds for perf. Good news for us

On the GPU side, there is really only one area where Nintendo had to do this balancing act themselves, and that's the RAM. Nintendo chose expensive RAM (128-bit) likely because cheaper RAM (64-bit) would have required double the number of modules and a larger chip (with more memory controllers) to get the bandwidth they needed. That put them in a place of either custom ordering RAM or buying larger modules - because 128-bit RAM is for Premium Devices that don't ship with just 8 GB. Nintendo went with the big RAM, probably because the performance win was high (RT loves more RAM, see: all the Series S games that don't have RT) but the cost impact was small.

On the CPU side, we have less insight into Nintendo's choices, but they also need to run their own tests, because Nvidia doesn't have lots of data on AAA games running on ARM CPUs. The RAM situation is dictated by the GPU situation, so it's really "how much cache is a win?" Nintendo could keep cache low on the CPU to save money, as they've been pretty profligate elsewhere. But cache is very power efficient, because it keeps the memory bus idle. Nintendo may be willing to spend the pennies to keep "performance per watt" (and thus "performance per minute of battery life") maxed out.
I'm of the belief that the type and quality of RAM would be the biggest indicator of whether Nintendo is cheaping out or the Switch 2 will under-deliver.

But having 12 GB of big boy RAM has made me more confident in the Switch 2's power, especially for third parties, indie devs and Nintendo's own devs.

Overall I'm quite confident we'll get TSMC 4nm.

Like, for me, all I need from the Switch 2 is a capable device for indies and first-party devs, and from all the info we've received I think we're getting that.

But I'm always going to be slightly pessimistic about certain things, mostly third party ports.
 
On the CPU side, we have less insight into Nintendo's choices, but they also need to run their own tests, because Nvidia doesn't have lots of data on AAA games running on ARM CPUs. The RAM situation is dictated by the GPU situation, so it's really "how much cache is a win?" Nintendo could keep cache low on the CPU to save money, as they've been pretty profligate elsewhere. But cache is very power efficient, because it keeps the memory bus idle. Nintendo may be willing to spend the pennies to keep "performance per watt" (and thus "performance per minute of battery life") maxed out.
you just made me realize a great test Nintendo could use with one of their games

in Tears of the Kingdom, you can come across moments where the monster control crew (a bunch of NPCs) fight against a stronghold of enemies. it's like 10 hylians vs 15 or so monsters. wonder how many actors the switch can handle before buckling, because I would be certain that Drake could handle almost 100x more
 
you just made me realize a great test Nintendo could use with one of their games

in Tears of the Kingdom, you can come across moments where the monster control crew (a bunch of NPCs) fight against a stronghold of enemies. it's like 10 hylians vs 15 or so monsters. wonder how many actors the switch can handle before buckling, because I would be certain that Drake could handle almost 100x more
This kinda shit has been part of the dream Zelda game exclusively in my head and a Google Doc that I've had for the past year or so. Instead of traditional dungeons, you'd have multi-part sieges with various NPCs. I'll probably share such ideas on another thread once I've written them all down.
 
I'm of the belief that the type and quality of RAM would be the biggest indicator of whether Nintendo is cheaping out or the Switch 2 will under-deliver.

But having 12 GB of big boy RAM has made me more confident in the Switch 2's power, especially for third parties, indie devs and Nintendo's own devs.

Overall I'm quite confident we'll get TSMC 4nm.

Like, for me, all I need from the Switch 2 is a capable device for indies and first-party devs, and from all the info we've received I think we're getting that.

But I'm always going to be slightly pessimistic about certain things, mostly third party ports.

It's funny. There's really not a whole lot more, from a power perspective, that I could ask of the Switch 2.

Sure, we can dream about a fantasy Switch 2 that's as strong as a PS5, but selling at $400-450ish in 2025? This is really shaping up to be a great console. It's fun watching the decisions Nintendo makes now that they're not in dire straits like they were in the Wii U era. They have all the money and time in the world to make the successor they want, and nothing that's been revealed has been hugely disappointing.

About the only thing I hope for is as solid an LCD screen as they can get, since they're not going for an OLED, but that's really it. I'm excited to play with this thing next year.
 
The T239 is embedded in a closed, highly optimized system, and I'm pretty sure it's 4N. A few things like RAM bandwidth are even better, developers learn to use every little % of power, and it has the whole DLSS and RT core feature set from Ampere. In the end, after a few years, I think we will be surprised by what this little chip has to offer. It looks well balanced, without a bottleneck, and it's custom developed by Nvidia and Nintendo. I'm optimistic that it could produce even better graphics than the RTX 2050 when looking at the whole system versus an RTX 2050 running Windows.
 
I'm of the belief that the type and quality of RAM would be the biggest indicator of whether Nintendo is cheaping out or the Switch 2 will under-deliver.

But having 12 GB of big boy RAM has made me more confident in the Switch 2's power, especially for third parties, indie devs and Nintendo's own devs.

Overall I'm quite confident we'll get TSMC 4nm.

Like, for me, all I need from the Switch 2 is a capable device for indies and first-party devs, and from all the info we've received I think we're getting that.

But I'm always going to be slightly pessimistic about certain things, mostly third party ports.
I missed a few pages in this thread since the direct… Did we get any new evidence that they’re going for N4?
 
I'm of the belief that the type and quality of RAM would be the biggest indicator of whether Nintendo is cheaping out or the Switch 2 will under-deliver.

But having 12 GB of big boy RAM has made me more confident in the Switch 2's power, especially for third parties, indie devs and Nintendo's own devs.
I don't believe Nintendo has cheaped out on RAM since the Nintendo 64.

So I don't think RAM is a good indicator of whether Nintendo cheaps out or not on the Nintendo Switch's successor.

If the Nintendo Switch is any indication, the display and/or the internal flash storage are probably better indicators of whether Nintendo cheaps out on the Nintendo Switch's successor.

And so far, the Nintendo Switch's successor is using 256 GB of UFS 3.1 for the internal flash storage.

I can see Nintendo cheaping out on the display of the Nintendo Switch's successor, like they did with the Nintendo Switch.
 
HDMI 2.1 for 120fps seems kinda silly the longer I think about it.
Like... how many developers would actually try hitting 120fps?

I think I have never played a 120fps game on the PS5; maybe some indies? But I mostly play those on the Switch.

Like... over time I have started to prefer stable performance and nice image quality over a high but unstable framerate.
 
HDMI 2.1 for 120fps seems kinda silly the longer I think about it.
Like... how many developers would actually try hitting 120fps?

I think I have never played a 120fps game on the PS5; maybe some indies? But I mostly play those on the Switch.

Like... over time I have started to prefer stable performance and nice image quality over a high but unstable framerate.
Games like Overwatch 2 and Fortnite run at 120fps on PS5 and Xbox Series X (even the Series S can do it). It's fairly consistent, I'd say, but I think it'd be a very tall order for Switch; not sure though.
 

