The whole concurrency situation with CPU/GPU/DLSS is something even retro consoles have been doing for a long time with the CPU/PPU split. It's not about pushing a whole frame's worth of data through every phase from beginning to end before starting the next. It's about setting up the next stage's work before moving on, and not wasting cycles where possible.
Take the NES, for example. The PPU (Picture Processing Unit) handles all rendering of backgrounds, sprites, and whatnot by reading from memory-mapped registers, where each register is linked to a particular thing, like a background's origin/position, a sprite's color palette, etc., along with a small VRAM pool of graphics tiles these registers can also reference. Every 16.66ms (NTSC, forgive me for not going into the details of this vs PAL), the PPU runs through 2 phases: a VDraw phase, where it draws the frame one scanline at a time (scanlines 0-239; NTSC TVs typically crop the top and bottom 8 scanlines), and a VBlank phase, where it sits idle (scanlines 240-261), for a total of 262 scanlines. It doesn't wait for the CPU. It takes what's in its registers, renders the frame during the first phase, and sits idle for the second.
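That ~16.66ms figure falls right out of the standard NTSC NES clock numbers (these constants come from common NES hardware documentation, not anything above, so treat them as my assumption):

```python
# Standard NTSC NES timing figures (assumed from common NES documentation).
MASTER_CLOCK_HZ = 21_477_272        # NTSC master crystal
PPU_HZ = MASTER_CLOCK_HZ / 4        # PPU runs at master/4, about 5.369 MHz
CYCLES_PER_SCANLINE = 341           # PPU cycles per scanline
SCANLINES_PER_FRAME = 262           # VDraw + VBlank scanlines combined

cycles_per_frame = CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME
frame_ms = cycles_per_frame / PPU_HZ * 1000
fps = PPU_HZ / cycles_per_frame

print(f"{frame_ms:.2f} ms per frame, {fps:.2f} fps")  # 16.64 ms per frame, 60.10 fps
```

(The real hardware skips one PPU cycle on alternating frames, which nudges the rate to the oft-quoted 60.0988fps, but that detail doesn't matter here.)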
From my understanding of games back then, the NES CPU handles the game logic for a frame, then calls the routine that waits for the next VBlank (scanline 240), regardless of how much work it had to do. Once that scanline is reached, it updates the PPU's registers with whatever needs updating based on the game logic it just processed, then immediately begins work on the next game logic frame. So think of the PPU's cycle as starting with VDraw, whereas the CPU's cycle starts with VBlank. Let's assume it takes just a few scanlines' worth of time to update those PPU registers, so the workflow could be something like this....
VBlank phase
- Scanline 240 --- VBlank phase starts. PPU goes idle. CPU starts updating PPU registers from last processed game logic frame.
- Scanline 240~244 --- CPU continues updating PPU registers from the last processed game logic frame.
- Scanline 244 --- CPU finishes PPU update. Begins processing next game logic frame. PPU remains idle.
- Scanline 244~261 --- CPU continues processing next game logic frame. PPU still idle.
- Scanline 261 --- VBlank ends; the count wraps around to scanline 0 next. PPU becomes active, ready to start rendering at scanline 0. CPU still processing the next game logic frame.
VDraw phase
- Scanline 0~199 --- VDraw phase starts. PPU renders 200 scanlines worth of the frame based on prior game logic frame in registers. CPU still processing next game logic frame.
- Scanline 200 --- CPU finishes the next game logic frame. Makes the call to wait for VBlank. PPU is rendering this scanline.
- Scanline 200~239 --- CPU waits. PPU finishes rendering based on last game logic frame.
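If it helps, that loop can be sketched in Python. This is purely illustrative (real games do this in 6502 assembly, usually via the NMI interrupt at VBlank; FakePPU and game_main_loop are made-up names), but it shows the key behaviors: the CPU only touches PPU registers during VBlank, and what you see on screen always lags the game logic by one frame:

```python
# Illustrative sketch of the classic NES frame loop; all names are made up.

class FakePPU:
    """Stand-in for the PPU's memory-mapped registers."""
    def __init__(self):
        self.registers = {}     # e.g. scroll position, palettes, sprite data

    def render_frame(self):
        # The PPU draws from whatever is in its registers; it never
        # waits on the CPU.
        return dict(self.registers)

def game_main_loop(ppu, n_frames):
    shadow = {}                 # CPU-side copy of the NEXT frame's PPU state
    drawn = []
    for frame in range(n_frames):
        # --- VBlank starts (scanline 240): safe to touch PPU registers ---
        ppu.registers.update(shadow)
        # --- Rest of VBlank + all of VDraw: compute the next logic frame ---
        shadow = {"scroll_x": frame + 1}    # placeholder game logic
        # --- CPU now idles ("wait for VBlank") while the PPU draws ---
        drawn.append(ppu.render_frame())
    return drawn

frames = game_main_loop(FakePPU(), 3)
print(frames)   # [{}, {'scroll_x': 1}, {'scroll_x': 2}]
```

Note the first rendered frame uses the initial (empty) registers: every register update the CPU computes shows up one frame later, which is exactly the pipelining.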
The CPU could finish its work sooner or later than that. Ever wonder why some games slow down? It's because the workload for a single game logic frame is too much to fit into one VDraw + VBlank span of time, so by the time the call to wait for VBlank is made, the start of VBlank has already passed, meaning the CPU has to wait until the next go-around. The game logic drops to 30fps. The PPU, however, continues to operate at 60fps regardless of the CPU. It has a fixed amount of time to spend per scanline before moving on to the next, using whatever is in its registers. If the CPU only updates those registers at 30fps, the PPU just renders from the same information twice before it changes.
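That 60-to-30 drop is just the hand-off being quantized to VBlank boundaries. A tiny sketch of the arithmetic (the function name is mine):

```python
import math

FRAME_MS = 1000 / 60    # one VDraw + VBlank period, ~16.67 ms

def effective_logic_fps(logic_ms):
    # The CPU can only hand results to the PPU at a VBlank boundary,
    # so its work time is rounded UP to a whole number of frame periods.
    periods = math.ceil(logic_ms / FRAME_MS)
    return 60 / periods

print(effective_logic_fps(12.0))   # fits in one frame  -> 60.0
print(effective_logic_fps(20.0))   # misses the VBlank  -> 30.0
print(effective_logic_fps(40.0))   # needs three frames -> 20.0
```

Which is why slowdown jumps straight from 60 to 30 to 20 instead of degrading smoothly: there's no such thing as handing off mid-frame.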
For something like the Switch 2 with more modern technology and different handling of the flow, to me, it's more like how instructions go through a CPU pipeline, or like an assembly line. First stage is the CPU, second stage is the GPU, and third stage in this case is DLSS. The CPU stage handles the game logic frame, and then waits for the GPU to be ready for the next frame. When the GPU is ready, the CPU hands over the information the GPU needs to begin. The GPU takes it, and begins rendering the frame based on that, letting the CPU start the next game logic frame. The GPU continues to render the frame, and once it's finished, it waits for the DLSS phase to finish its own task. It hands over the rendered frame (and whatever else it needs) so the DLSS stage can do its thing, thereby letting the GPU stage be ready to receive from the CPU stage.
Let's say there was only the CPU and GPU. If for a single frame the CPU stage took 3.33ms and the GPU stage 16.66ms, then every frame from start to finish would take ~20ms. But with concurrency, we visually see a frame change every 16.66ms (or 60fps). That is because the bottleneck is the GPU at 16.66ms, meaning the CPU sits idle for 13.33ms per frame, waiting on the GPU.
Now let's introduce DLSS, letting the GPU spend less time rendering by working at a lower resolution. If for a single frame the CPU stage took 3.33ms, the GPU stage 8.33ms, and the DLSS stage 8.33ms, that's still 20ms total from start to finish. But with concurrency, we'd visually see a frame change every 8.33ms (or 120fps). This is because the GPU bottleneck is reduced. The CPU only has to wait 5ms instead of 13.33ms after processing, so it's kept busier.
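You can check both scenarios with a small steady-state pipeline simulation (a generic in-order pipeline model, not anything Nintendo/NVIDIA actually publish): a stage starts frame N once it has finished frame N-1 and the previous stage has delivered frame N.

```python
def completion_times(stage_ms, n_frames):
    """Completion time of each frame at the final stage of a simple
    in-order pipeline. Stage s starts frame f once it has finished
    frame f-1 and stage s-1 has delivered frame f."""
    done = [[0.0] * len(stage_ms) for _ in range(n_frames)]
    for f in range(n_frames):
        for s, cost in enumerate(stage_ms):
            prev_frame = done[f - 1][s] if f else 0.0   # stage busy until then
            prev_stage = done[f][s - 1] if s else 0.0   # input available then
            done[f][s] = max(prev_frame, prev_stage) + cost
    return [done[f][-1] for f in range(n_frames)]

# CPU + GPU (3.33 + 16.66 ms): first frame done ~20 ms in,
# then one every 16.66 ms (~60 fps)
two_stage = completion_times([3.33, 16.66], 4)
print(two_stage)

# CPU + GPU + DLSS (3.33 + 8.33 + 8.33 ms): still ~20 ms end-to-end latency,
# but one frame every 8.33 ms (~120 fps)
three_stage = completion_times([3.33, 8.33, 8.33], 4)
print(three_stage)
```

The simulation shows the general rule: latency per frame is the SUM of the stage times (~20ms in both cases), but throughput is set by the SLOWEST single stage.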
The CPU has to wait on the GPU, the GPU has to wait on both the CPU and DLSS, and DLSS has to wait on the GPU. Like others have said, DLSS isn't free, but it can increase frame rates by giving the GPU the chance to spend less time rendering at a lower resolution. So even with something like BotW demoed on Switch 2 at 4K60, as long as no stage in this pipeline (whether 3 stages or 100) goes above 16.66ms, it can hit 60fps.
I'm done rambling for a bit now.