Don't you all feel a sort of depression now that the Nintendo Switch is approaching its end? Didn't you all hope the Switch would live until 2030 and then get a successor? Everywhere I go I see people with this thought. Why is Nintendo killing the Switch now? Can't they keep it alive for 5 more years?
Eh?

It is possible that the Switch to Drake gap between releases will exceed the ~7.5 year gap between the Famicom and the Super Famicom.

13 years would be 4 more than the GB to GBC gap.
 
Is the Cyberpunk modder you posted the only source for RR being ReLAX/ReBLUR plus a neural network for temporal rejection of rays? That just doesn’t seem credible to me. The claim in all of Nvidia’s messaging was that the neural network in RR is doing the actual simultaneous denoising via reconstruction and supersampling, like in the Intel paper. Using ReLAX or ReBLUR would be in direct contradiction to that.

It's all the more suspicious given that Cyberpunk 2077 already used NRD before the 2.0 update (see here). There are also clear visual differences between the two, including artefacts like trails, which are consistent with a recurrent neural network implementation (e.g. early versions of DLSS 2) and aren't seen when using NRD denoisers.

I wonder if this modder is just reconfiguring the non-RR path tracing denoisers and is confusing that for the RR setup.
 
I think the Switch 2 specs are quite solid; being able to run games like A Plague Tale: Requiem and Cyberpunk 2077 at PS5 settings is no joke. I do wish the resolution could be a little better in docked mode, but still.

I think a Switch 2 Pro could really help, and hopefully this time around, with no COVID-style disruption (or component shortages), Nintendo will consider one.

It doesn't have "to split the userbase!"; just run the same games with a small boost in frame rate and resolution, maybe with ray tracing turned on as well. It's really not that different from the New 3DS versus the OG 3DS: there we saw Monster Hunter 4 running closer to 50 fps than 30 fps, with higher-resolution textures. That's a pretty huge boost for the New 3DS over the original 3DS.



I think high-end games likely becoming more widespread on Apple iPhone/iPad and Android devices in the next 4-5 years should push Nintendo to stay on top of performance and not rest on the status quo so much. At some point you're probably also going to see Steam Deck

That's very unlikely to happen. Especially if the Switch 2 is already fabbed on a bleeding-edge node like 4N.
 
Is there a way to mod Alan Wake 2 so that it skips the denoising step, and then compare that frametime to Alan Wake 2 with RR enabled?

If this actually is a neural network, then it's massively more than what the Switch 2 would need (as it's designed for path tracing), and it's possible NVIDIA could make a heavily pruned, shitty one that quickly works on something like RT reflections alone.
 
After Rich's tests, what can we extrapolate for handheld mode in terms of DLSS? The clock used for testing (750MHz) was standing in for a plausible docked mode at around 1GHz.

With, for example, half of that clock (375MHz), would DLSS Quality or Balanced still be OK for 1080p/60fps games in handheld mode? Or even 30fps?
The rule of thumb is - whatever you can get docked mode to do at 1440p, you can get handheld to do at 1080p. It's half as many pixels, and most of the things we're talking about scale with resolution.

Death Stranding running 1440p30fps at max settings and DLSS Performance mode, upscaling from 720p? Then in handheld expect Death Stranding at 1080p30fps max settings and DLSS performance mode, upscaling from 540p.

"But what about 60fps" - 60fps totally viable, sure. Rich's tests don't in anyway rule it out. I think I understand now why some folks are getting the wrong impression from Rich's video, so I'm gonna right something up in a minute, but yes, Rich is pretty clear that 1440p60fps is totally viable.
 
RR actually being a neural network would be exciting, so hopefully that's the case.

I do think it would take a while for NVIDIA to make a pruned version of this for RT applications other than full PT, but we’ll see.
 
The rule of thumb is - whatever you can get docked mode to do at 1440p, you can get handheld to do at 1080p. It's half as many pixels, and most of the things we're talking about scale with resolution.

Death Stranding running 1440p30fps at max settings and DLSS Performance mode, upscaling from 720p? Then in handheld expect Death Stranding at 1080p30fps max settings and DLSS performance mode, upscaling from 540p.

"But what about 60fps" - 60fps totally viable, sure. Rich's tests don't in anyway rule it out. I think I understand now why some folks are getting the wrong impression from Rich's video, so I'm gonna right something up in a minute, but yes, Rich is pretty clear that 1440p60fps is totally viable.
If it was in the video I missed it, but did Rich calculate DLSS frame time cost at 1440p?
 
If it was in the video I missed it, but did Rich calculate DLSS frame time cost at 1440p?
Yes.
(attached screenshot: Screenshot_20231104_142444_YouTube.jpg)
 
The rule of thumb is - whatever you can get docked mode to do at 1440p, you can get handheld to do at 1080p. It's half as many pixels, and most of the things we're talking about scale with resolution.

Death Stranding running 1440p30fps at max settings and DLSS Performance mode, upscaling from 720p? Then in handheld expect Death Stranding at 1080p30fps max settings and DLSS performance mode, upscaling from 540p.

"But what about 60fps" - 60fps totally viable, sure. Rich's tests don't in anyway rule it out. I think I understand now why some folks are getting the wrong impression from Rich's video, so I'm gonna right something up in a minute, but yes, Rich is pretty clear that 1440p60fps is totally viable.

Yeah, but most of Rich's tests were not trying to achieve 1440p, but 1080p. That's why I asked that. If we have A Plague Tale running at 1080p balanced, can we have it running at 1080p using performance mode (maybe even ultra performance) while having the tensor cores at half the clock? Are we going to see games in handheld not running at the native [screen] res again?

Of course, devs can always optimize the game (and downgrade things) so it can hit a good balance, but then we wouldn't see the same visual quality we saw in the test.

Well, the right thing is to wait for the real thing and see how it works. But I just got this feeling that maybe Drake would have needed more in terms of tensor cores, as it's going to be fundamental for a hybrid like the Switch 2. Maybe there's more customization than we know? Or devs will just work with what they have in hand and do the best they can. I don't know.
 
And the PS4 and Xbox One are running the game at 30fps with the help of TAA. This view of DLSS/FSR as this crutch or big asterisk confuses me when so many games that are viewed as huge graphical feats (Red Dead Redemption 2, Cyberpunk 2077, etc.) use what is - as an incredibly gross simplification - a worse looking and performing version of DLSS/FSR.
I used to get into this over turbocharging vs supercharging engines, because import cars tended to use turbos to make big power and muscle car people would call it "cheating," saying it's "not true horsepower," etc, and then they'd slap a supercharger on their car and say that one's not cheating. And when I brought up that both are literally just slightly different methods of compressing incoming air it would start a fight.

Everybody's upscaling these days, I dunno why Nintendo's the only one catching flak for it (I mean, I do, but yeah)


Don't you all feel a sort of depression now that the Nintendo Switch is approaching its end? Didn't you all hope the Switch would live until 2030 and then get a successor? Everywhere I go I see people with this thought. Why is Nintendo killing the Switch now? Can't they keep it alive for 5 more years?
*points at his own tag*
Team Decade baybeeee


I'm kinda curious as to whether Monolith Soft or the Zelda team would get RT working in their open-world/open-zone games, though. It would be awesome if they did, but it'd still be a bit weird. Idk, something I'd like to see experimented with at least.
This actually got me wondering, and this question is for everyone: is there an LOD equivalent for ray tracing? Like, in a huge open game like BotW or XB, is there a distance beyond which RT stops calculating and the renderer defaults to old-school illumination methods? Sorry for the noob question, but it strikes me as something that could help performance if the entire massive vista isn't lit with RT.
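
(For what it's worth, one way engines handle this is to cap the ray distance and fade to cheaper lighting beyond it. Purely illustrative sketch below; the helper names and the fade values are made up for the example.)

```python
# Illustrative sketch: blend ray-traced lighting into a cheaper fallback with distance.
# 'trace_lighting' and 'baked_lighting' are hypothetical stand-ins for an engine's
# RT pass and its non-RT fallback (probes / baked GI / SSAO).

MAX_RT_DISTANCE = 80.0   # beyond this, skip tracing entirely (hypothetical tuning value)
FADE_START = 60.0        # start fading RT out here to hide the transition

def lit_color(distance_to_camera, trace_lighting, baked_lighting):
    if distance_to_camera >= MAX_RT_DISTANCE:
        return baked_lighting()              # no rays fired at all for far geometry
    rt = trace_lighting()
    if distance_to_camera <= FADE_START:
        return rt
    # linear fade between FADE_START and MAX_RT_DISTANCE
    t = (distance_to_camera - FADE_START) / (MAX_RT_DISTANCE - FADE_START)
    fallback = baked_lighting()
    return tuple(a * (1.0 - t) + b * t for a, b in zip(rt, fallback))

# Example: a distant hillside gets the cheap path, nearby objects get full RT.
print(lit_color(120.0, lambda: (1.0, 0.9, 0.8), lambda: (0.6, 0.6, 0.6)))
print(lit_color(10.0,  lambda: (1.0, 0.9, 0.8), lambda: (0.6, 0.6, 0.6)))
```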
 
Does anyone have a good theory on what could account for the huge disparity between DF testing, and the numbers in the DLSS documentation?
Depends on the devs, the quality of the PC port, and how well they implement DLSS. There are tons of factors to consider, and even DF said that what they test is likely not going to reflect how the end product performs; it's a very rough estimate, since there's currently no equivalent hardware to test these what-ifs on, so they try to get the closest approximation they can.

Edit: You also have to consider what sort of changes the devs will make to Switch 2 games that might be very different from other platforms, especially PC. Like Doom 2016/Eternal on Switch, where the porting devs made lots of heavy changes that are likely impossible to replicate on PC, even if people try to use a Tegra X1 chipset and run the PC version at the lowest possible settings and resolution to try to achieve the same results.
 
The rule of thumb is - whatever you can get docked mode to do at 1440p, you can get handheld to do at 1080p. It's half as many pixels, and most of the things we're talking about scale with resolution.

Death Stranding running 1440p30fps at max settings and DLSS Performance mode, upscaling from 720p? Then in handheld expect Death Stranding at 1080p30fps max settings and DLSS performance mode, upscaling from 540p.

"But what about 60fps" - 60fps totally viable, sure. Rich's tests don't in anyway rule it out. I think I understand now why some folks are getting the wrong impression from Rich's video, so I'm gonna right something up in a minute, but yes, Rich is pretty clear that 1440p60fps is totally viable.
One thing that occurred to me: I guess it's also possible to more easily achieve 1440p60fps docked and 1080p60fps handheld by having DLSS work on the previous frame rather than the current one, so that the internal rendering gets the full 16.6ms of frametime.
 
Did you not watch the video? Rich tests both with and without DLSS.
Yes, I just noticed the native results lol. A lot of the games here run worse than a base PS4 docked, which is a load of crap lol. This "hardware" testing is based on 8GB of RAM and 8nm. The audacity of him stressing it's "confirming" last-gen performance from Bobby Kotick's mouth got on my nerves a little, because that's what he chose to believe rather than the Gamescom report.

I mean, it's possible that it really is that, but I don't think so, especially from what we've speculated here.
 
ReLAX is just very heavy, whatever, I don't want to fight about this one.
So you don't have evidence? I found some measurements of ReLAX, but I want to know what counts as "too heavy" before posting them.

My question is not about relative denoiser cost, but just total additional frametime cost over SR alone.

Because if SR is eating up 7.5 ms of frametime on the tensor cores, then RR eating up another 9.2 ms (or more) of frametime on the tensor cores would make RR completely unviable.
This video shows their frame rate dropping from 83 to 74 fps, an addition of about 1.5ms. That's with path tracing turned off and PS5 RT settings (RT lighting and RT reflections).
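
Quick sanity check on that number, from the quoted frame rates:

```python
# Frame time delta implied by the 83 -> 74 fps drop quoted above.
before = 1000.0 / 83   # ~12.05 ms per frame
after = 1000.0 / 74    # ~13.51 ms per frame
print(f"{after - before:.2f} ms added")  # ~1.47 ms, i.e. roughly the 1.5 ms figure
```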

 
My PC is not that powerful, it's over 5 years old, and I can run BotW at 4K and it runs pretty well.
Backing this up. I run Breath of the Wild at 4k/30 with some of the settings boosted (LODs, shadow draw distance, etc.), and it's rock solid on a mobile 2060. Native 4k running on an emulator. Getting BotW up to 4k on a console where they can actually optimize/change things, have access to tools like DLSS, and isn't running through a compatibility layer and on Windows wouldn't surprise me one bit.

I used to get into this over turbocharging vs supercharging engines, because import cars tended to use turbos to make big power and muscle car people would call it "cheating," saying it's "not true horsepower," etc, and then they'd slap a supercharger on their car and say that one's not cheating. And when I brought up that both are literally just slightly different methods of compressing incoming air it would start a fight.

Everybody's upscaling these days, I dunno why Nintendo's the only one catching flak for it (I mean, I do, but yeah)
I think it's more so people having a hate boner for DLSS/FSR than anything Nintendo - though it's no doubt there. DLSS/FSR isn't just cool technical wizardry; they're marketing tools to sell the RTX 7060XTX Ti Super Pro & Knuckles. These two companies have incredibly loyal fans - whether they bleed Team Green or Team Red - who do a hell of a good job at trying to convince everybody that these upscalers are the best thing since sliced bread. Some are tired of the hyperbole/cherry picking - "DLSS is better than native!" is frequently parroted, but it's conveniently left out that that's only really the case at higher output resolutions - and the fights ("My upscaler is best upscaler because it looks better!" "No, my upscaler is best upscaler because it's open source and can be used everywhere!"), so they tend to overcorrect and go the other way; the upscalers aren't good, they're a crutch, and they aren't rendering "real" frames/pixels. Compound this with the general lack of knowledge about how game development works and the current state of computer hardware design/production (Moore's Law and the whole debate about that) and you get this.
 
The whole concurrency situation with CPU/GPU/DLSS is something that even retro consoles have been doing for a long time with CPU/PPU. It's not about processing a whole frame's worth of data through all phases from beginning to end before starting the next. It's about setting up the next part before moving onto the next and not wasting cycles when possible.

Take the NES, for example. The PPU (Picture Processing Unit) handles all rendering of backgrounds, sprites, and whatnot by reading from memory-mapped registers, where each register is linked to a particular piece of state, like a background's origin/position or a sprite's color palette, along with the small VRAM pool of graphics tiles these registers can also reference. Every 16.66ms (NTSC, forgive me for not going into the details of this vs PAL), the PPU runs through 2 phases: a VDraw phase, where it draws the frame one scanline at a time (scanlines 0-239; NTSC blanks out the top and bottom 8 scanlines), and a VBlank phase, where it is idle (scanlines 240-261), for a total of 262 scanlines. It doesn't wait for the CPU. It takes what's in its registers, renders the frame during the 1st phase, and sits idle for the 2nd phase.

From my understanding of games back then, the NES CPU will handle the game logic for a frame, and then will call the command to wait for the next VBlank (scanline 240), regardless of how much work it has to do. Once that scanline is reached after waiting for it, it updates the PPU's registers with whatever needs updating based on the game logic it had just processed. Then immediately afterwards begins work on the next game logic frame. So think of the PPU phases starting with the VDraw, whereas the CPU's phase starts with the VBlank. Let's assume it takes just a few scanline's worth of time to update those PPU registers, so the workflow could be something like this....

VBlank phase
  • Scanline 240 --- VBlank phase starts. PPU goes idle. CPU starts updating PPU registers from last processed game logic frame.
  • Scanline 240~244 --- CPU continues updating PPU registers from last processed game logic frame.
  • Scanline 244 --- CPU finishes PPU update. Begins processing next game logic frame. PPU remains idle.
  • Scanline 244~261 --- CPU continues processing next game logic frame. PPU still idle.
  • Scanline 261 --- VBlank ends, wraps around to scanline 0 on next scanline. PPU becomes active, ready for scanline 0 to start rendering. CPU still processing next game logic frame.
VDraw phase
  • Scanline 0~199 --- VDraw phase starts. PPU renders 200 scanlines worth of the frame based on prior game logic frame in registers. CPU still processing next game logic frame.
  • Scanline 200 --- CPU finishes next game logic frame. Makes call to wait for VBlank. PPU renders scanline at this mark.
  • Scanline 200~239 --- CPU waits. PPU finishes rendering based on last game logic frame.
The CPU could finish its work sooner or later. Ever wonder why some games slow down? It's because the workload for the CPU to process a single game logic frame is too much to fit into a single VDraw + VBlank span of time, so by the time the call to wait for VBlank is made, the start of VBlank could have already passed, meaning it has to wait until the next go-around. It drops to 30 game-logic fps. The PPU, however, continues to operate at 60fps regardless of the CPU. It has a fixed amount of time to spend per scanline before moving onto the next, using what's in its registers. If the CPU only updates those registers at a rate of 30fps, then the PPU renders using the same information twice before it gets changed.
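
A toy version of that loop, just to illustrate the behaviour described above (illustrative numbers, not real NES timings):

```python
# Toy simulation: the PPU "draws" every 16.66 ms regardless, while game logic only
# advances when the CPU makes it back to its vblank wait in time.

FRAME_MS = 1000.0 / 60.0

def simulate(cpu_work_ms, frames=6):
    logic_frame = 0
    cpu_busy_until = 0.0
    for n in range(frames):
        vblank_start = n * FRAME_MS
        if cpu_busy_until <= vblank_start:
            # CPU reached its "wait for vblank" call in time: registers get updated
            logic_frame += 1
            cpu_busy_until = vblank_start + cpu_work_ms
        print(f"display frame {n}: PPU draws logic frame {logic_frame}")

simulate(cpu_work_ms=12.0)   # fits in a frame -> logic advances every display frame
simulate(cpu_work_ms=20.0)   # too slow -> same logic frame drawn twice (30 fps logic)
```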


For something like the Switch 2 with more modern technology and different handling of the flow, to me, it's more like how instructions go through a CPU pipeline, or like an assembly line. First stage is the CPU, second stage is the GPU, and third stage in this case is DLSS. The CPU stage handles the game logic frame, and then waits for the GPU to be ready for the next frame. When the GPU is ready, the CPU hands over the information the GPU needs to begin. The GPU takes it, and begins rendering the frame based on that, letting the CPU start the next game logic frame. The GPU continues to render the frame, and once it's finished, it waits for the DLSS phase to finish its own task. It hands over the rendered frame (and whatever else it needs) so the DLSS stage can do its thing, thereby letting the GPU stage be ready to receive from the CPU stage.

Let's say there were only the CPU and GPU. If, for a single frame, the CPU stage took 3.33ms and the GPU stage 16.66ms, then each frame from start to finish would take ~20ms. But with concurrency, we visually see a frame change every 16.66ms (or 60fps). That is because the bottleneck is the GPU at 16.66ms, meaning the CPU has to sit idle for 13.33ms, waiting for the GPU.

Now let's introduce DLSS, allowing us to spend less time rendering with the GPU by using a lower resolution. If for a single frame, it took the CPU stage 3.33ms, the GPU stage 8.33ms, and the DLSS stage 8.33ms, that's still a total of 20ms from start to finish. But with concurrency, we'd visually see a frame change every 8.33ms (or 120fps). This is because the bottleneck of the GPU is reduced. The CPU only has to wait 5ms instead of 13.33ms after processing, so it does get pushed more.

CPU has to wait on the GPU, GPU has to wait on both CPU and DLSS, and DLSS has to wait on GPU. Like others have said, DLSS isn't free, but can increase frame rates by giving the GPU the chance to spend less time rendering by using a lower resolution. So even with something like BotW demoed for Switch 2 at 4k60, so long as no stage in this pipeline (whether 3 stages or 100 stages) goes above 16.66ms, it can hit 60fps.
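
To make the arithmetic concrete, here's a toy model of that pipeline (using the illustrative stage times above and ignoring any contention between stages):

```python
# Toy model of a pipelined frame loop: steady-state frame rate is set by the
# slowest stage, while total latency per frame is the sum of all stages.
# Stage times are the illustrative numbers above, not real measurements.

def pipeline_stats(stage_ms):
    bottleneck = max(stage_ms)   # one finished frame pops out this often
    latency = sum(stage_ms)      # time for one frame to pass through all stages
    return 1000.0 / bottleneck, latency

for stages in ([3.33, 16.66], [3.33, 8.33, 8.33]):   # CPU+GPU, then CPU+GPU+DLSS
    fps, latency = pipeline_stats(stages)
    print(f"stages {stages}: ~{fps:.0f} fps, ~{latency:.1f} ms from start to finish")
```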

I'm done rambling for a bit now.
 
Does anyone have a good theory on what could account for the huge disparity between DF testing, and the numbers in the DLSS documentation?

I might have gotten Death Stranding free on EGS so maybe I can test this. If I do, I’ll try to turn off post processing effects to see if those are tied to output resolution.
 
For the doomposters, I've played Death Stranding on the deck both handheld and docked and it does not run remotely close to that. And if that's the floor of the Switch 2 performance, that means we're in for a wild ride.
It's actually impressive how big the disparity is between the Steam Deck "optimised" version and the "Default" preset used in the DF video. And that's before considering that the Switch 2's use of DLSS makes the image look better than the Steam Deck's FSR, and that the Switch 2 is running Death Stranding at 1080p compared to the Steam Deck's 800p.
 
For the doomposters, I've played Death Stranding on the deck both handheld and docked and it does not run remotely close to that. And if that's the floor of the Switch 2 performance, that means we're in for a wild ride.
Rich specifically says that it could be the floor, but it could also be that T239 is weaker. That's something to keep in mind.
 
It's actually impressive how big the disparity is between the Steam Deck "optimised" version and the "Default" preset used in the DF video. And that's before considering that the Switch 2's use of DLSS makes the image look better than the Steam Deck's FSR, and that the Switch 2 is running Death Stranding at 1080p compared to the Steam Deck's 800p.
The clocks Rich used in the video were also what he theorized docked mode would run at, meaning handheld mode would be clocked much lower.
 
Rich specifically says that it could be the floor, but it could also be that T239 is weaker. That's something to keep in mind.
You could be right... however:
  • The tests are made on pre-release information that could've changed or been upgraded in the time since the leak
  • They were done on PC without the console-specific tweaks that have, in the past, increased performance to an insane degree
  • We know that the VRAM used for the 2050 Laptop GPU isn't going to be the same for the Switch 2, meaning the general stability of framerate will likely improve

I'm not saying you're wrong to think the T239 could be weaker, but what I am saying is that I have significant doubts about that.
 
The whole concurrency situation with CPU/GPU/DLSS is something that even retro consoles have been doing for a long time with CPU/PPU. It's not about processing a whole frame's worth of data through all phases from beginning to end before starting the next. It's about setting up the next part before moving onto the next and not wasting cycles when possible.

Take the NES, for example. The PPU (Picture Processing Unit) handles all rendering of backgrounds, sprites, and whatnot by reading from memory-mapped registers, where each register is linked to a particular piece of state, like a background's origin/position or a sprite's color palette, along with the small VRAM pool of graphics tiles these registers can also reference. Every 16.66ms (NTSC, forgive me for not going into the details of this vs PAL), the PPU runs through 2 phases: a VDraw phase, where it draws the frame one scanline at a time (scanlines 0-239; NTSC blanks out the top and bottom 8 scanlines), and a VBlank phase, where it is idle (scanlines 240-261), for a total of 262 scanlines. It doesn't wait for the CPU. It takes what's in its registers, renders the frame during the 1st phase, and sits idle for the 2nd phase.

From my understanding of games back then, the NES CPU will handle the game logic for a frame, and then will call the command to wait for the next VBlank (scanline 240), regardless of how much work it has to do. Once that scanline is reached after waiting for it, it updates the PPU's registers with whatever needs updating based on the game logic it had just processed. Then immediately afterwards begins work on the next game logic frame. So think of the PPU phases starting with the VDraw, whereas the CPU's phase starts with the VBlank. Let's assume it takes just a few scanline's worth of time to update those PPU registers, so the workflow could be something like this....

VBlank phase
  • Scanline 240 --- VBlank phase starts. PPU goes idle. CPU starts updating PPU registers from last processed game logic frame.
  • Scanline 240~244 --- CPU continues updating PPU registers from last processed game logic frame.
  • Scanline 244 --- CPU finishes PPU update. Begins processing next game logic frame. PPU remains idle.
  • Scanline 244~261 --- CPU continues processing next game logic frame. PPU still idle.
  • Scanline 261 --- VBlank ends, wraps around to scanline 0 on next scanline. PPU becomes active, ready for scanline 0 to start rendering. CPU still processing next game logic frame.
VDraw phase
  • Scanline 0~199 --- VDraw phase starts. PPU renders 200 scanlines worth of the frame based on prior game logic frame in registers. CPU still processing next game logic frame.
  • Scanline 200 --- CPU finishes next game logic frame. Makes call to wait for VBlank. PPU renders scanline at this mark.
  • Scanline 200~239 --- CPU waits. PPU finishes rendering based on last game logic frame.
The CPU could finish its work sooner or later. Ever wonder why some games slow down? It's because the workload for the CPU to process a single game logic frame is too much to fit into a single VDraw + VBlank span of time, so by the time the call to wait for VBlank is made, the start of VBlank could have already passed, meaning it has to wait until the next go-around. It drops to 30 game-logic fps. The PPU, however, continues to operate at 60fps regardless of the CPU. It has a fixed amount of time to spend per scanline before moving onto the next, using what's in its registers. If the CPU only updates those registers at a rate of 30fps, then the PPU renders using the same information twice before it gets changed.


For something like the Switch 2 with more modern technology and different handling of the flow, to me, it's more like how instructions go through a CPU pipeline, or like an assembly line. First stage is the CPU, second stage is the GPU, and third stage in this case is DLSS. The CPU stage handles the game logic frame, and then waits for the GPU to be ready for the next frame. When the GPU is ready, the CPU hands over the information the GPU needs to begin. The GPU takes it, and begins rendering the frame based on that, letting the CPU start the next game logic frame. The GPU continues to render the frame, and once it's finished, it waits for the DLSS phase to finish its own task. It hands over the rendered frame (and whatever else it needs) so the DLSS stage can do its thing, thereby letting the GPU stage be ready to receive from the CPU stage.

Let's say there were only the CPU and GPU. If, for a single frame, the CPU stage took 3.33ms and the GPU stage 16.66ms, then each frame from start to finish would take ~20ms. But with concurrency, we visually see a frame change every 16.66ms (or 60fps). That is because the bottleneck is the GPU at 16.66ms, meaning the CPU has to sit idle for 13.33ms, waiting for the GPU.

Now let's introduce DLSS, allowing us to spend less time rendering with the GPU by using a lower resolution. If for a single frame, it took the CPU stage 3.33ms, the GPU stage 8.33ms, and the DLSS stage 8.33ms, that's still a total of 20ms from start to finish. But with concurrency, we'd visually see a frame change every 8.33ms (or 120fps). This is because the bottleneck of the GPU is reduced. The CPU only has to wait 5ms instead of 13.33ms after processing, so it does get pushed more.

CPU has to wait on the GPU, GPU has to wait on both CPU and DLSS, and DLSS has to wait on GPU. Like others have said, DLSS isn't free, but can increase frame rates by giving the GPU the chance to spend less time rendering by using a lower resolution. So even with something like BotW demoed for Switch 2 at 4k60, so long as no stage in this pipeline (whether 3 stages or 100 stages) goes above 16.66ms, it can hit 60fps.

I'm done rambling for a bit now.

This is a good post, and I do think concurrency between tensor cores and other GPU logic could come into play on Switch 2, but I would caution that it isn't a completely free lunch. The tensor cores sit within the SM with the regular GPU cores and RT cores, which means that if you want to run them at the same time, they'll be competing with each other for things like memory bandwidth, register files and L1/L2 caches. So a DLSS pass which takes 8.33ms (to take your example) when run on its own will probably take longer than that if you try to run it at the same time as regular GPU code.

Optimising around T239's specific hardware may allow developers to mitigate this to some effect, for example by managing register usage to minimise contention, but I'd still expect to see some degree of slowdown from running DLSS concurrently with rendering the next frame. Performance should certainly be better than running them sequentially, but I wouldn't count on being able to reduce DLSS's impact to zero just via concurrency.
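
To gesture at the scale of that effect, here's the same kind of toy arithmetic with a made-up contention penalty on the overlapped DLSS pass (the 1.25x factor is purely an assumption for illustration, not a measured number):

```python
# Toy pipeline again, but the DLSS stage is slowed by a contention factor when it
# overlaps with regular rendering. The 1.25x penalty is an assumed, illustrative
# value, not a measurement of real hardware.

CPU_MS, GPU_MS, DLSS_MS = 3.33, 8.33, 8.33
CONTENTION = 1.25

sequential_frame = GPU_MS + DLSS_MS                            # no overlap at all
concurrent_frame = max(CPU_MS, GPU_MS, DLSS_MS * CONTENTION)   # overlapped, DLSS slowed

print(f"sequential: {1000/sequential_frame:.0f} fps")                  # ~60 fps
print(f"concurrent (with contention): {1000/concurrent_frame:.0f} fps")  # ~96 fps, not 120
```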
 
This is a good post, and I do think concurrency between tensor cores and other GPU logic could come into play on Switch 2, but I would caution that it isn't a completely free lunch. The tensor cores sit within the SM with the regular GPU cores and RT cores, which means that if you want to run them at the same time, they'll be competing with each other for things like memory bandwidth, register files and L1/L2 caches. So a DLSS pass which takes 8.33ms (to take your example) when run on its own will probably take longer than that if you try to run it at the same time as regular GPU code.

Optimising around T239's specific hardware may allow developers to mitigate this to some effect, for example by managing register usage to minimise contention, but I'd still expect to see some degree of slowdown from running DLSS concurrently with rendering the next frame. Performance should certainly be better than running them sequentially, but I wouldn't count on being able to reduce DLSS's impact to zero just via concurrency.
Yeah, there are definitely things to take into account. I just wanted to show a simple scenario of what concurrency can provide, and how this isn't really something new.
 
Yeah, but most of Rich's tests were not trying to achieve 1440p, but 1080p. That's why I asked that. If we have A Plague Tale running at 1080p balanced, can we have it running at 1080p using performance mode (maybe even ultra performance) while having the tensor cores at half the clock? Are we going to see games in handheld not running at the native [screen] res again?
Yes. We will not see a console from any manufacturer hit native res on all of its games, anytime in the next decade.

The RTX 4090 is 82 TFLOPS. On a 4K screen, that's about 10 TFLOPS per million pixels. Nintendo could literally have given you a 1080p screen and attached a whole-ass PS5 to the back of it, and that would still only be about 5 TFLOPS per million pixels.

This is simple, inevitable fact.
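
The arithmetic, for anyone who wants to check it (using the commonly cited ~10.3 TFLOPS figure for PS5):

```python
# TFLOPS per million output pixels for the examples above.
four_k = 3840 * 2160 / 1e6      # ~8.29 million pixels
ten_eighty = 1920 * 1080 / 1e6  # ~2.07 million pixels

print(f"RTX 4090 at 4K: {82 / four_k:.1f} TFLOPS per million pixels")       # ~9.9
print(f"PS5 at 1080p:   {10.3 / ten_eighty:.1f} TFLOPS per million pixels")  # ~5.0
```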

Of course, devs can always optimize the game (and downgrade things) so it can hit a good balance, but then we wouldn't see the same visual quality we saw in the test.
Yes. Exactly. Maximizing for effects will not maximize for resolution, and this test is just an example.

Well, the right thing is to wait for the real thing and see how it works. But I just got this feeling that maybe Drake would have needed more in terms of tensor cores, as it's going to be fundamental for a hybrid like the Switch 2.
If tensor cores were so fast that DLSS was 100% free, you'd still not have enough power. Devs will still have to balance effects versus internal resolution, and that will never stop. Technically, this is Control running at 1440p with DLSS, but I would say it would be worth it to increase the internal res and drop some of the effects.

Maybe there's more customization than we know? Or devs will just work with what they have in hand and do the best they can. I don't know.
Yes, they will do what they have always done. There was no number that would make that stop.
 
Didn't they discount the possibility of DLSS 3 for the Switch NG not all that long ago? Not being very technical, my judgment is limited, but as we're talking about a console and not a PC, there must be a number of factors that can't be replicated in the PC space.

Not to take anything away from DF, I think they do some amazing work, and if Rich got even close to what we can expect, we're in for one hell of a ride.
T239 most likely doesn't meet the stated system requirements for DLSS frame generation. Newer versions of other DLSS tools should work fine.
 
So, doing some looking, I found how much ReLAX costs. In addition to that, Nvidia NRD has three denoisers to choose from that are intended for different use cases

(attached image: image.png)


Interestingly, throwing more hardware at ReLAX doesn't do much. If you want faster denoising, lower the resolution.
 
So, doing some looking, I found how much ReLAX costs. In addition to that, Nvidia NRD has three denoisers to choose from that are intended for different use cases

(attached image: image.png)


Interestingly, throwing more hardware at ReLAX doesn't do much. If you want faster denoising, lower the resolution.

The RTX 3090 is only slightly slower than you would expect considering the teraflop difference between the two and that it's working on an image with 16/9 as many pixels.
 
These tests do raise some significant questions about the viability of Ray Reconstruction on the Switch 2, adding to prior questions about RR.

RR seemingly works by selecting rays from frames n-1, n-2, ..., that can be reused for frame n.

Things we don't know about RR currently

-How much additional frametime does it cost relative to DLSS itself?
-Where in the pipeline does RR happen? Does it happen at the same time as Super Resolution or at a different time? What GPU work is still needed after RR happens? This may not work well with deferred frames if significant GPU work has to be done after RR.
-Is RR compatible with denoisers other than ReBLUR or ReLAX? ReLAX is 1000% not viable on Switch 2 and ReBLUR seems unlikely to work well on Switch 2.
Should be the opposite: ReBLUR is not viable on Drake, but ReLAX should work.

I think you confused the two.

Edit: No, I'm the one who's confused.
 
CPU has to wait on the GPU, GPU has to wait on both CPU and DLSS, and DLSS has to wait on GPU. Like others have said, DLSS isn't free, but can increase frame rates by giving the GPU the chance to spend less time rendering by using a lower resolution. So even with something like BotW demoed for Switch 2 at 4k60, so long as no stage in this pipeline (whether 3 stages or 100 stages) goes above 16.66ms, it can hit 60fps.

I'm done rambling for a bit now.
So in a scenario of 3 stages where each stage is close to 16.66ms of frame time but still rendering at 60fps, wouldn't that mean at least 2 frames of input lag?
 
Speaking of storage, do we know anything about the storage type of the Switch 2 yet? I was wondering last night about their expandable storage solutions because Nintendo hasn't made it more complicated than putting an SD card in their consoles before. Would they make a proprietary storage chip like Xbox for the sake of simplicity? I don't think Nintendo would be too keen on having users open their Switch 2 and screwing a new SSD into the thing.
 
Speaking of storage, do we know anything about the storage type of the Switch 2 yet? I was wondering last night about their expandable storage solutions because Nintendo hasn't made it more complicated than putting an SD card in their consoles before. Would they make a proprietary storage chip like Xbox for the sake of simplicity? I don't think Nintendo would be too keen on having users open their Switch 2 and screwing a new SSD into the thing.
1. No, but it's likely that they'll use eUFS for the internal storage. 2. I don't think they will go the proprietary route for external storage.

The discussions about storage come and go frequently here. Everyone agrees that they will very likely use eUFS for internal storage. But external storage is a big unknown. They can keep using microSD, but microSD is slow. Outside of microSD, there's SD Express, CFexpress, and UHS-II. But they all have their drawbacks, plus one glaring one: they're way more expensive than microSD.

What will Nintendo adopt or do? It's anyone's guess.
 
Speaking of storage, do we know anything about the storage type of the Switch 2 yet? I was wondering last night about their expandable storage solutions because Nintendo hasn't made it more complicated than putting an SD card in their consoles before. Would they make a proprietary storage chip like Xbox for the sake of simplicity? I don't think Nintendo would be too keen on having users open their Switch 2 and screwing a new SSD into the thing.

High-speed SSDs aren't really viable for mobile hardware. It's probably UFS memory, either UFS 2.2, UFS 3.1, or UFS 4.0 (higher is better but more expensive).

The issues are:

1. Will Nintendo allow UFS memory to reach very high speeds in game read speed?
2. Will many Switch 2 games have mandatory installs to make use of the high game read speed?

Because this leads to the major third issue of

3. No one has ever made expandable storage with UFS so the Switch 2 expandable storage using UFS would be extremely expensive for its first several years.

If they throttle the speed to like 100 MB/s, then people will probably just be able to use SD cards again. But UFS cards can hit 1 GB/s and an SD card wouldn't be able to keep up.
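
As a rough sense check of what that gap means in practice (the 4 GB figure is just an arbitrary example load, not anything about real Switch 2 games):

```python
# Time to stream an example 4 GB of game data at the two speeds mentioned above.
load_gb = 4  # arbitrary example size
for label, mb_per_s in [("throttled / microSD-class", 100), ("UFS-class", 1000)]:
    seconds = load_gb * 1024 / mb_per_s
    print(f"{label}: ~{seconds:.0f} s")   # ~41 s vs ~4 s
```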
 
Edit: I am incapable of correctly remembering names. Replace "ReLAX" with "SIGMA".

On the discussion of denoisers: although I never completed the DLSS 3.5 breakdown I promised, for personal reasons (I'm still trying to convince myself I'll finish it eventually), I have learned some interesting things.

Basically: ReBLUR, the general-purpose denoiser of NRD, is expensive as hell, and won't run on Drake, considering it doesn't even run that well on the big boxes. Ever wondered why consoles so often prioritize shadows and AO for ray tracing? One of the big reasons is that they're easier to denoise. And in NRD games, those effects are handled by ReLAX, the fast but really limited NRD denoiser.
 
So in a scenario of 3 stages where each stage is close to 16.66ms of frame time but still rendering at 60fps, wouldn't that mean at least 2 frames of input lag?

To be clear, outside of Bungie, no one hits 1 frame of input lag. Competitive fighting and FPS games aim pretty low, but a lot of games are around 5 frames of input lag and Red Dead 2 somehow is at 1/3 to 1/2... of a second of input lag.

... The issue is whether or not this DLSS step would add on another frame of input lag (or even more depending on how frame timing ends up) as that could end up feeling bad. If Nintendo takes this approach, hopefully they make sure the rest of their pipeline minimizes input lag.
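
Rough math on the pipeline side of that, assuming the three stages discussed above each occupy a full 16.6ms slot (this ignores controller polling and display latency, which add on top):

```python
# Minimum input-to-display latency of an N-stage pipelined renderer at 60 fps,
# assuming input is sampled at the start of the CPU stage and each stage takes
# a full frame slot.

FRAME_MS = 1000.0 / 60.0

for stages in (2, 3):   # CPU+GPU vs CPU+GPU+DLSS
    latency_ms = stages * FRAME_MS
    print(f"{stages} stages: ~{latency_ms:.0f} ms (~{stages} frames) before the frame is shown")
```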
 
On the discussion of denoisers: although I never completed the DLSS 3.5 breakdown I promised, for personal reasons (I'm still trying to convince myself I'll finish it eventually), I have learned some interesting things.

Basically: ReBLUR, the general-purpose denoiser of NRD, is expensive as hell, and won't run on Drake, considering it doesn't even run that well on the big boxes. Ever wondered why consoles so often prioritize shadows and AO for ray tracing? One of the big reasons is that they're easier to denoise. And in NRD games, those effects are handled by ReLAX, the fast but really limited NRD denoiser.
For shadows, NRD has the SIGMA denoiser. That said, ReBLUR is used for Watch Dogs: Legion I think, though I guess the low RT resolution allows it to overcome the cost.
 
For shadows, NRD has the SIGMA denoiser. That said, ReBLUR is used for Watch Dogs: Legion I think, though I guess the low RT resolution allows it to overcome the cost.
Oops, I mistook ReLAX for SIGMA.

I have researched the subject of denoiser costs for hours, yet I still make this mistake. My stupidity never fails to surprise me.
 