the video was made with no new inside info, and no consideration of 12GB of RAM or of a dedicated platform vs a PC. So it really is just a normal prediction, like anyone could make.
No fucking way it's going to be that on 8nm.
> It was? Might be mixing them up, I do know NVN2 pretty much stated 660MHz (while on 8nm, to note).
Of course not, lol. Just saying 660MHz did show up in there under 8nm.
> No fucking way it's going to be that on 8nm.
The leak did state 8nm (or more accurately Samsung, it didn't specify node) in one place, but that could easily have been a copy paste from Orin.
2048 cores at 750MHz is 3.07TF and 1536 cores at the speculated 1.1GHz is 3.38TF. That's just a 10% increase in performance.
> A big miss with the DF test was ignoring power draw. How many watts was the RTX 2050 pulling when clocked at 750MHz? If it's over 20 watts, that should have been a red flag that T239 cannot be on 8nm. If it's not on 8nm then it is very likely 4N, and 750MHz is closer to the portable profile rather than docked. If the docked profile does end up being 1.1GHz, that would mean those tensor cores will be clocked 32% higher than in their test.
assuming we're talking about Thraktor's 660mhz as the base
Yeah, I'm just saying if a miracle happens and Nintendo somehow managed to make 12 SM work on 8nm (which would be impressive for all the wrong reasons), I'm expecting closer to Switch 1 clocks.
> Of course not, lol. Just saying 660MHz did show up in there under 8nm.
Eurogamer article on the DF video, adding a few thoughts
Inside Nvidia's new hardware for Switch 2: what is the T239 processor?
What new hardware has Nvidia cooked up for Nintendo? Digital Foundry puts together all available info on the cutting-edge T239 processor. (www.eurogamer.net)
> Based on everything I've learned about T239 over the months, I think my biggest surprise is how ambitious the hardware is. Given the chance to re-architect a new, games-centric processor, Nvidia could have followed the route it laid down with its GTX 16-series graphics cards by stripping out machine learning and RT hardware, or paring them back to the minimum with cut-price alternatives (like the way AMD integrated hardware RT into RDNA 2 with an emphasis on the lowest silicon cost possible). Instead, the evidence suggests that Nvidia and Nintendo have agreed on a design that arguably over-delivers on next-gen features, while the File Decompression Engine (FDE) points to quality of life additions to the silicon that are firmly orientated around a great gaming experience. Could this be the most forward-looking hardware design from Nintendo since the GameCube?
I'd like to quote this part too, for those wondering how the lack of memory impacted things for this 2050...
Be still, my beating heart. Leadbetter sure knows how to hype up T239
> I couldn't get The Matrix Awakens running owing to the 4GB memory limitation on the RTX 2050 and it seems that the demo requires around 5.6GB at the minimum. However, the fully enabled Fortnite UE5 experience using Lumen, Nanite and Virtual Shadow Maps does work. Indeed, unlike the console versions, we do actually have access to hardware accelerated ray tracing in the PC version and yes, it does run on the RTX 2050 laptop. I tested hardware RT Lumen, its software variant, and I also tested at native 720p and several DLSS variants, all resolving to a 1080p output. For the record, software RT with DLSS performance mode (540p in, 1080p out) is the winner but between best and worst results here, there's just a 9.5 percent difference, and in fps terms that's just 2.82fps.
My guess is that the Matrix Awakens demo for Switch 2 based on target hardware probably is real, but Epic must be doing a lot of work in making Unreal Engine 5's key features run well on the new Nintendo machine, simply because The Matrix Awakens is a lot heavier on the GPU than Fortnite. That being the case, just running PC code on a low power Ampere GPU isn't likely to be that revealing. Still, it does run. You can use hardware RT. DLSS is helpful.
Thraktor is expecting something between 500MHz/1GHz and 550MHz/1.1GHz
Do we know how much VRAM the T239 has? If it's 8GB, that's pretty damn good overall. Running RE4R with a 4070 Laptop GPU was already pretty snazzy, even if the RT on that game is considered "Eh" in comparison to a lot of other games.
> I'd like to quote this part too, for those wondering how the lack of memory impacted things for this 2050...
It's hard to say, VRAM in consoles is always shared with system memory. I'd say the absolute max for a 12 GB Switch 2 would be 8 GB, yeah.
> Do we know how much VRAM the T239 has? If it's 8GB, that's pretty damn good overall. Running RE4R with a 4070 Laptop GPU was already pretty snazzy, even if the RT on that game is considered "Eh" in comparison to a lot of other games.
Coming from a place wholly inexperienced with building laptops/desktops, I'm guessing there was no real way to swap out the RAM from a pre-built Dell Vostro to bump it up from 4GB to 6GB, let alone 8GB?
> I'd like to quote this part too, for those wondering how the lack of memory impacted things for this 2050...
With the knowledge that financial briefings are done in February, late April/early May, late July/early August, and late October/early November...
> I have a question: have Nintendo ever done a reveal close to an investors meeting before? It would be interesting to see the history of the investor meetings and what they announced at them.
Mainly in regard to 3rd party titles. My perspective basically changed from "some Nintendo games and some third party titles should be able to hit that golden benchmark" to "probably only Nintendo games are going to be able to hit it". It's not much of a shift in perspective, but I can't really imagine we'll be seeing any games from people outside of Nintendo hitting 4K60 unless they're specifically designed for the Switch 2 itself, which I don't really think anything will be unless Capcom decides to make a Switch-centric Monster Hunter again like they did with Rise (which seems unlikely given the rumors surrounding MH6).
> Excuse me, but what were your expectations in the first place? This is a close approximation to the chip in the console, lacking memory, console optimization and using Windows... DF himself stated this is all on the lower end of the spectrum, since T239 is a customized chip for a reason; the very fact it's already performing this incredibly well does all but lower expectations. And keep in mind this is under handheld clocks, docked should perform very similar to the stock 2050 out there (if not better).
Coolio. Sounds about right.
> It's hard to say, VRAM in consoles is always shared with system memory. I'd say the absolute max for a 12 GB Switch 2 would be 8 GB, yeah.
Easier to do with desktops than laptops. I've never even tried to open up laptops except to change the battery.
> Coming from a place wholly inexperienced with building laptops/desktops, I'm guessing there was no real way to swap out the RAM from a pre-built Dell Vostro to bump it up from 4GB to 6GB, let alone 8GB?
It'll be a unified memory pool. The CPU and GPU will be sharing the entire main system RAM.
> Do we know how much VRAM the T239 has? If it's 8GB, that's pretty damn good overall. Running RE4R with a 4070 Laptop GPU was already pretty snazzy, even if the RT on that game is considered "Eh" in comparison to a lot of other games.
I agree. The lack of power data for the RTX 2050 at 750MHz was very disappointing.
> A big miss with the DF test was ignoring power draw. How many watts was the RTX 2050 pulling when clocked at 750MHz? If it's over 20 watts, that should have been a red flag that T239 cannot be on 8nm. If it's not on 8nm then it is very likely 4N, and 750MHz is closer to the portable profile rather than docked. If the docked profile does end up being 1.1GHz, that would mean those tensor cores will be clocked 32% higher than in their test.
I'm also perplexed by this, from DF's Rich:
> On the clocks question, Nvidia provides a tool for calculating power consumption on the T234 and it suggests that the GPU is most efficient at 540MHz, a good candidate for handheld clocks. Efficiency drops alarmingly the faster you clock it, and the more power required, the more heat is generated. I'd be surprised if docked clocks exceeded 1GHz and it could be a fair degree lower. 1.0GHz also looks like the sweet spot for CPU clocks too. However, these are calculations for a larger Tegra chip that's definitely on the 8nm Samsung process.
He says that 540MHz is the efficiency sweet spot for the T234 GPU and a good candidate clock for Handheld Mode. He also says 1GHz is the sweet spot for the CPU. And all of this is based on Samsung 8N. However, look at this quick power test from the Jetson power tool, made by Alovon:
To clear things up:
> It was? Might be mixing them up, I do know NVN2 pretty much stated 660MHz (while on 8nm, to note).
You can change memory on desktop or laptop GPUs. But you need significant electronics and soldering skills, and some BIOS flashing and whatnot. It's something very specialized and for people who really know what they're doing. Otherwise, no. GPU VRAM is fixed and can't be upgraded by the user, either on desktops or laptops.
> Easier to do with desktops than laptops. I've never even tried to open up laptops except to change the battery.
This crossed my mind too, wondering how much work it would be to swap the RAM to 8gb
Flops is calculated by: 2 * number of cores * clock
> Are there estimated teraflops and power consumption figures for 500 MHz, 1 GHz, 1.5 GHz, and 2 GHz?
Due to it being shared it won't be 8 GB at all times, because the game could end up demanding more. But yeah, accurate enough.
> Coolio. Sounds about right.
8GB seems to be the base standard for modern games, however you usually have to trade settings in order to get it running. I can run Deathloop at absolute cranked max settings but cannot enable RT no matter what unless I turn down other settings or suffer the consequences of insane stuttering.
Flops is calculated by: 2 * number of cores * clock
2 * 1536 cores * 0.500 GHz = 1.536 TF
The other three clocks are exactly 2x, 3x and 4x that.
And if you want a rough mental estimate for T239, just multiply the clock in GHz by 3 and you will get quite close (1.1GHz = a bit above ~3.3TF).
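For anyone who wants to plug in other clocks, here's a quick sketch of that arithmetic in Python. The 1536-core count is the rumoured T239 configuration the thread is assuming, not a confirmed spec.

```python
# Quick sketch of the arithmetic above: FP32 FLOPS = 2 (FMA) * CUDA cores * clock.
# The 1536-core count is the rumoured T239 config the thread assumes, not a confirmed spec.

def tflops(cores: int, clock_ghz: float) -> float:
    """FP32 teraflops for a given core count and clock in GHz."""
    return 2 * cores * clock_ghz / 1000

for clock_ghz in (0.5, 1.0, 1.5, 2.0, 1.1):
    print(f"{clock_ghz:.1f} GHz -> {tflops(1536, clock_ghz):.2f} TF")
# 0.5 GHz -> 1.54 TF, 1.0 -> 3.07, 1.5 -> 4.61, 2.0 -> 6.14, 1.1 -> 3.38
```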
Ah right, GPU RAM is not like the regular RAM I was thinking of; it's soldered directly onto the GPU board. Yeah, now I understand why it's not easy.
> You can change memory on desktop or laptop GPUs. But you need significant electronics and soldering skills, and some BIOS flashing and whatnot. It's something very specialized and for people who really know what they're doing. Otherwise, no. GPU VRAM is fixed and can't be upgraded by the user, either on desktops or laptops.
Checked the site for the first time today and I see chat exploded for a different reason. Can't watch the Digital Foundry video, I'm in class rn, is it looking good or bad? I saw some gif reactions and a lot of back and forth on clock speeds when scrolling back through.
Hey, this is actually a fantastic post! Thank you for digging and finding these posts by Thraktor, LiC and Oldpuck. They're insanely informative and had some discussions that I had actually missed.
Yeah it is.
> @bmfrosty it's you!
Matrix could have run on there, the issue was the lack of VRAM. Just to note.
> interesting video but clearly missing big pieces of the puzzle, given the Matrix Demo wasn't viable. and the seemingly impossible 4k60 which Nintendo has already demoed shows where things don't add up.
> - clock speeds higher if not on 8nm
> - Zelda demo upscaling from 1080 or higher to hit 4k/60
> - further optimizations
> still looking good but i think most of us expect T239 to surpass that laptop GPU significantly. the process node remains a sticking point; if it was known to be 4N, expectations would be raised across the board, i.e. with mainstream commentators, methinks.
GPU load affects power consumption, so I don't think we will ever get how much it would consume to have the CUDA cores idle and boost the clock just for DLSS.
> Yeah, that makes sense.
> Just curious how much power consumption would increase if you overclocked the tensor cores during the DLSS step, as the CUDA cores are idle during the DLSS step...
> There are 32 times as many CUDA cores as tensor cores, so if you had all of the CUDA cores idle and then cranked the tensor cores up 4 times, jacking the power consumption up to let's say 16x... you still end up consuming less power than in the non-DLSS step, it would seem, from a naive perspective?
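Just to put that quoted back-of-envelope reasoning in one place, here's a rough sketch of it in Python. Everything in it is the post's own naive assumption (power roughly proportional to active core count, 4x tensor clock costing ~16x tensor power), not a measurement.

```python
# Rough restatement of the quoted back-of-envelope estimate. Everything here is an
# assumption from the post above (power ~ active core count, 4x tensor clock ~ 16x
# tensor power), not a measurement.

CUDA_CORES = 1536                    # 12 SMs * 128 CUDA cores (rumoured T239 config)
TENSOR_CORES = CUDA_CORES // 32      # Ampere: 4 tensor cores per SM -> 48

tensor_share = TENSOR_CORES / CUDA_CORES     # tensor cores ~1/32 of the shader array
render_step = 1.0                            # full CUDA load at base clock (normalised)
dlss_step = tensor_share * 16                # tensor cores only, clocked 4x (~16x power)

print(f"render step: {render_step:.2f}x, DLSS step: {dlss_step:.2f}x")
# -> the DLSS step lands around 0.50x of the render step under these naive assumptions.
```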
Yup, it tells us T239 has more than 4GB, which lines up with what we were thinking (6 or 8GB).
> Matrix could have run on there, the issue was the lack of VRAM. Just to note.
the current thinking seems to be 12 or 16gb, with 16 being less likely. but definitely 12 if they want this thing to have legs.
> Yup, I think it tells us T239 has more than 4GB, which lines up with what we were thinking (6 or 8GB).
Now that you mention 12GB and 16GB, that seems right (I'm now recalling mentions of those two numbers). What amount did the 2050 in the DF video have again? Was it 4GB? 4GB -> 12GB is a big jump, but I could be misremembering again, was it 8GB?
> the current thinking seems to be 12 or 16gb, with 16 being less likely. but definitely 12 if they want this thing to have legs.
a 2050 has 4GB.
> What amount did the 2050 in the DF video have again? Was it 4GB?
The GPU had 4GB dedicated, but the laptop had another 16GB for the CPU.
> What amount did the 2050 in the DF video have again? Was it 4GB? 4GB -> 12GB is a big jump, but I could be misremembering again, was it 8GB?
The big problem with the DLSS testing in the vid is that it ignores that PC games often scale stuff like LOD depending on OUTPUT resolution.
Obviously 4K DLSS looks unfeasible: because they're running the 4K LODs and/or post-processing and/or textures.
Using DF's methodology on TechPowerUp's data we can say a 3060 needs 2.9ms for 4K DLSS.
Considering Nvidia's number for the 3060 Ti is 1.52ms...
I doubt the 3060 is twice as slow as a Ti.
And DF's numbers are made using depth of field, which would cripple performance if it's running at output resolution. TechPowerUp's numbers are without depth of field, and yet still seem to indicate an unrealistic runtime.
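For clarity, this is roughly how that kind of per-frame DLSS cost gets backed out of benchmark numbers: the frametime with DLSS to a 4K output minus the frametime at the native internal resolution. The fps values below are made-up placeholders, not DF's or TechPowerUp's data, and as noted above the method also sweeps in any output-resolution LOD/post-processing cost, not just the DLSS pass itself.

```python
# Sketch of the frametime-difference method being discussed: estimate the DLSS pass
# cost as (frametime with DLSS to a 4K output) minus (frametime rendering natively at
# the internal resolution). The fps numbers are placeholders, not measured data.

def dlss_cost_ms(fps_native_internal: float, fps_dlss_output: float) -> float:
    """Approximate per-frame DLSS cost in milliseconds."""
    return 1000 / fps_dlss_output - 1000 / fps_native_internal

# Hypothetical example: 1080p native at 120 fps vs DLSS Performance to 4K at 87 fps.
print(f"Estimated DLSS cost: {dlss_cost_ms(120, 87):.2f} ms")   # ~3.16 ms
```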
GPU load affects power consumption, so I don't think we will ever get how much it would consume to have the CUDA cores idle and boost the clock just for DLSS.
But I don't think we're going to see this situation. If the CUDA cores aren't used much for the DLSS, then I would expect the GPU to move to the next frame while DLSS works on the current frame. This, or games will just stick to lower output resolution and/or 30fps.
For example, BotW 4K60 could in theory be achieved by:
0 to 16.7ms: CPU work on the first frame
16.8 to 33.3ms: GPU work on the first frame, CPU works on the second frame
33.4 to 50ms: GPU is doing DLSS on the first frame and working on the second frame, CPU works on the third frame
As long as DLSS can be done within 16ms with enough room for the GPU to also render a new frame, this could be achieved. This doesn't make sense on PCs with fast DLSS because it increases latency, but it's right within the compromises/optimizations you can expect on consoles.
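Here's a tiny sketch of that overlapped schedule, just to make the timing concrete. The equal ~16.7ms per stage is the post's illustrative assumption, not a measured figure.

```python
# Minimal sketch of the overlapped schedule described above: CPU, GPU render and the
# DLSS pass each take (at most) one 16.7 ms slot and are pipelined across frames.
# The equal stage times are the post's illustrative assumption, not measured numbers.

FRAME_MS = 1000 / 60                                        # ~16.7 ms per slot at 60 fps
STAGES = {"CPU": 0, "GPU render": 1, "DLSS (tensor)": 2}    # slot offset per stage

for frame in range(3):
    for stage, offset in STAGES.items():
        start = (frame + offset) * FRAME_MS
        print(f"frame {frame}: {stage:13s} {start:5.1f} - {start + FRAME_MS:5.1f} ms")
# A new frame still completes every ~16.7 ms (60 fps), at the cost of roughly two to
# three slots of latency between input and display.
```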
If you can get DLSS for 4K to run below 16.6ms, and run DLSS on the Tensor cores in parallel with CUDA, it would be possible to obtain 60FPS (with input lag of 30Hz but still a very interesting result).
> Wheeeeeeeeeeeee, the CUDA cores are idle while the tensor cores are active.
"
To accelerate the execution of Machine Learning
applications, recent GPUs use Tensor cores to speed up the
general matrix multiplication (GEMM), which is the heart of
deep learning. The Streaming Processors in such GPUs also
contain CUDA cores to implement general computations. While
the Tensor cores can significantly improve the performance of
GEMM, the CUDA cores remain idle when Tensor cores are
running. This leads to inefficient resource utilization. In this
work, we propose to offload part of the GEMM operations from
Tensor cores to CUDA cores to fully utilize GPU resources.
We investigated the performance bottleneck in such offloading
schemes and proposed architectural optimization to maximize the
GPU throughput. Our technique is purely hardware-based and
does not require a new compiler or other software support. Our
evaluation results show that the proposed scheme can improve
performance by 19% at the maximum."
Utilizing both doesn't help as much as you would think as it's hard to offload parts of it.
This is a pretty big issue for NVIDIA to solve, we'll see.
"CPU RAM" size doesn't really help with moment-to-moment performance, though. Primarily, you just need enough to fit in all the assets that your render budget can handle.The GPU had 4gb dedicated, but the laptop had another 16gb for the CPU.
I know, I'm just noting that 4GB of dedicated RAM for the 2050 can't be compared directly to the larger quantity of shared RAM on T239.
> "CPU RAM" size doesn't really help with moment-to-moment performance, though. Primarily, you just need enough to fit in all the assets that your render budget can handle.
Can't help but think that this could mean a bunch of titles relying too much on DLSS will end up with notable input latency.
> I've watched the DF video now and I recommend everyone to watch it later!
> Rich readily concedes that it's a very imperfect comparison, and of course software will be optimized for the specific hardware of the new machine.
> Having said all that, the most important finding is that upscaling to 4K from 720p with DLSS takes about 18ms. That is obviously longer than a single frame time for 60fps and also untenably high for 30fps. He did mention though that the T234 has special ML-accelerating hardware, and if the T239 retains some of that, those figures could well be lower.
> But the TL;DR I got out of the video is that we shouldn't expect 4K output and we shouldn't expect current-generation titles to run at 60fps. It will be even harder than with Series S to eke out an acceptable performance mode.
> But again, there's lots we don't know about the hardware and lots that can be achieved with smart optimization against a specific hardware target.
for real. the switch 2's t239 system-on-a-chip having RT cores and tensor (for AI processing) cores means after multiple generations nintendo is using hardware features that are more advanced than the ps5 and xbox series x hardware features
> Be still, my beating heart. Leadbetter sure knows how to hype up T239
The RTX 2050 is Ampere-based, hence why he's using it as an approximate GPU to the T239. RTX 2050 = RTX 3050M but with half the memory bus. They use the same die, GA107.
> Are the tensor cores of an RTX 30XX faster than the ones found on an RTX 20XX, other than possible higher clocks? If they are, is the difference enough so that 48 TCs from the 30 series can match 64 TCs from the 20 series?
If DLSS scales this badly at 2 teraflops or 4 teraflops then the Switch 2's potential is massively lowered as DLSS would be unusable and FSR2 would likely be unusable as well (forcing the Switch 2 to have most of its cycles eaten up by native resolution), but I'm pretty doubtful of this.
Frankly, that DF video has me hyped even more.
It's just a ballpark estimation but they're running modern PC games with zero porting/optimization effort, and it runs pretty well at 1080p.
I don't care about 4K and never did. I want it to look good handheld primarily.
I have no doubt good port studios will do wonders with a little effort on any current-gen title, especially with Steam handhelds setting the baseline.
And regarding first-parties, when I see Metroid Prime or TotK running on a potato X1, I can barely imagine what they could do on the t239.
TBF I think it's memory/framebuffer and clock related for DLSS's cost.
> DLSS 720p > 4K is like twice as expensive in frametime as DLSS 720p > 1440p going by NVIDIA's documentation, so I would expect 720p > 1440p to be the standard.
> But 18ms sounds incredibly high compared to NVIDIA's documentation.
> Obviously these chips are all much stronger than the Switch 2, but the gap shouldn't be this large unless DLSS starts scaling really badly at low spec.
> If it does, hopefully NVIDIA can develop a pruned DLSS neural network for the Switch 2... We'll see if they're interested, as pruned neural networks are really expensive (in terms of dollars and manpower) to create and the results are less good (obviously).