
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

A lot of times we are comparing the NG Switch with the Steam Deck, which makes sense considering they are both dedicated gaming hardware with similar form factors.

But considering they are all ARM, has anyone done any productive spec and graphical power comparisons of the NG Switch with the M1 in iPads and MacBooks, and/or the Meta Quest 2 and 3?
 
This post is a great explainer, but there are some fundamental differences between denoising and antialiasing that are worth clarifying. For example, I wouldn't say that anti-aliasing is deleting "real" detail. Both noise and aliasing are sampling artifacts, but their origin is very different. I'll elaborate on why:

Spatial frequency

Just like how signals that vary in time have a frequency, so do signals that vary in space. A really simple signal in time is something like sin(2 * pi * t). It's easy to imagine a similar signal in 2D space: sin(2 * pi * x) * sin(2 * pi * y). I'll have Wolfram Alpha plot it:

[image: Wolfram Alpha plot of sin(2 * pi * x) * sin(2 * pi * y)]


The important thing to notice here is that the frequency content in x and y is separable. You could have a function with a higher frequency in x than in y, like sin(5 * pi * x) * sin(2 * pi * y):

[image: plot of sin(5 * pi * x) * sin(2 * pi * y)]


So just like temporal frequency has dimensions 1/[Time], spatial frequency has dimensions 1/[Length], and it's defined per axis: the way the signal varies with x and the way it varies with y are independent. That's true in 1D, 2D, 3D... N-D, but we care about 2D because images are 2D signals.
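
To make that concrete, here's a small NumPy sketch (grid size and frequencies chosen arbitrarily) showing that the 2D spectrum of a separable signal factors into independent x and y parts:

import numpy as np

# Sample the separable signal sin(5*pi*x) * sin(2*pi*y) on a 256x256 grid.
n = 256
x = np.linspace(0, 1, n, endpoint=False)
y = np.linspace(0, 1, n, endpoint=False)
signal = np.outer(np.sin(5 * np.pi * x), np.sin(2 * np.pi * y))

# Because the signal is separable, its 2D spectrum is just the outer
# product of the two 1D spectra: the x and y frequency content never mix.
fx = np.fft.fft(np.sin(5 * np.pi * x))
fy = np.fft.fft(np.sin(2 * np.pi * y))
print(np.allclose(np.fft.fft2(signal), np.outer(fx, fy)))  # True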

What is aliasing, really?

Those sine functions above are continuous; you know exactly what the value is at every point you can imagine. But a digital image is discrete; it's made up of a finite number of equally spaced points. To make a discrete signal out of a continuous signal, you have to sample each point on the grid. If you want to take that discrete signal back to a continuous signal, then you have to reconstruct the original signal.

Ideally, that reconstruction would be perfect. A signal is called band-limited if its highest frequency is finite. For example, in digital music, we think of most signals as band-limited to the highest frequency that human ears can hear, which is generally accepted to be around 20,000 Hz. A very important theorem in digital signal processing, the Nyquist-Shannon sampling theorem, says that you can reconstruct a band-limited signal perfectly if you sample at more than twice the highest frequency in the signal. That's why CD-quality audio at a 44.1 kHz sampling rate is considered lossless: 44.1 kHz is more than twice the ~20 kHz limit of human hearing, so the audible signal can be perfectly reconstructed.

When you sample at less than twice the highest frequency, it's no longer possible to perfectly reconstruct the original signal. Instead, you get an aliased representation of the data. The sampling rate of a digital image is set by the resolution of the sensor in a camera, or of the render target in a computer-rendered image. This sampling rate needs to be high enough to correctly represent the information that you want to capture/display; otherwise, you will get aliasing.
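
As a toy example of what undersampling does (the numbers are arbitrary): a 7 Hz sine sampled at only 10 Hz produces exactly the same samples as a 3 Hz sine, so once you've sampled, the two are indistinguishable:

import numpy as np

fs = 10.0                                  # 10 Hz sampling rate, below the 14 Hz Nyquist requirement
t = np.arange(0, 2, 1 / fs)                # 2 seconds of sample times
high = np.sin(2 * np.pi * 7 * t)           # the "real" 7 Hz signal
alias = np.sin(2 * np.pi * (7 - fs) * t)   # a -3 Hz (i.e. 3 Hz, phase-flipped) sine
print(np.allclose(high, alias))            # True: the sampled values are identical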

By the way, this tells us why we get diminishing returns with increasing resolution. Since the x and y components of the signal are separable, quadrupling the "resolution" in the sense of the number of pixels (for example, going from 1080p to 2160p) only doubles the Nyquist frequency in x and in y.

So why does aliasing get explained as "jagged edges" so often? Well, any discontinuity, like a geometric edge, in an image is essentially an infinite frequency. With an infinite frequency, the signal is not band-limited, and there's no frequency that can satisfy the Nyquist-Shannon theorem. It's impossible to get perfect reconstruction. (https://pbr-book.org/3ed-2018/Sampling_and_Reconstruction/Sampling_Theory) But you can also have aliasing without a discontinuity, when the spatial resolution is too low to represent a signal (this is the reason why texture mipmaps exist; lower resolution mipmaps are low-pass filtered to remove high frequency content, preventing aliasing).
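
A toy sketch of the mipmap idea (a plain 2x2 box filter here for brevity; real tools often use better filters): each level is a pre-filtered, half-resolution copy of the one above it, so sampling a distant texture can read from a level whose frequency content already fits the sampling rate:

import numpy as np

def build_mipmaps(texture):
    # Each level is a 2x2-averaged, half-resolution copy of the previous one.
    levels = [texture]
    while min(levels[-1].shape) > 1:
        t = levels[-1]
        h, w = t.shape[0] // 2 * 2, t.shape[1] // 2 * 2   # drop an odd edge row/column
        t = t[:h, :w]
        levels.append(0.25 * (t[0::2, 0::2] + t[1::2, 0::2] + t[0::2, 1::2] + t[1::2, 1::2]))
    return levels

mips = build_mipmaps(np.random.rand(256, 256))
print([m.shape for m in mips])   # (256, 256), (128, 128), ..., (1, 1)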

You can even have temporal aliasing in a game, when the framerate is too low to represent something moving quickly (for example, imagine a particle oscillating between 2 positions at 30 Hz; if your game is rendering at less than 60 fps, then by the Nyquist-Shannon theorem, the motion of the particle will be temporally aliased).

So what do we do to get around aliasing?

The best solution, from an image quality perspective, is to low-pass filter the signal before sampling it. Which yes, does essentially mean blurring it. For a continuous signal, the ideal filter is the sinc function, because it acts as a perfect low-pass filter in frequency space. But the sinc function has infinite extent, so in practice the best you can do is a finite approximation. That, with some hand-waving, is what Lanczos filtering is, which (plus some extra functionality to handle contrast at the edges and the like) is how FSR handles reconstruction. Samples of the scene are collected each frame, warped by the motion vectors, then filtered to reconstruct as much of the higher frequency information as possible.
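
For illustration (this is the textbook Lanczos kernel, not the exact filter FSR ships): a windowed-sinc kernel and a naive 1D resampler built on it.

import numpy as np

def lanczos_kernel(x, a=3):
    # Lanczos-a kernel: sinc(x) * sinc(x/a) for |x| < a, zero outside.
    # np.sinc is the normalized sinc, sin(pi*x) / (pi*x).
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def lanczos_resample_1d(samples, positions, a=3):
    # Evaluate a discrete signal at arbitrary (fractional) positions by
    # weighting nearby samples with the Lanczos kernel (weights renormalized).
    idx = np.arange(len(samples))
    out = np.empty(len(positions))
    for i, p in enumerate(positions):
        w = lanczos_kernel(p - idx, a)
        out[i] = np.dot(w, samples) / w.sum()
    return out

# e.g. evaluate a coarse signal at 4x as many positions (upsampling it)
coarse = np.sin(np.linspace(0, np.pi, 16))
fine = lanczos_resample_1d(coarse, np.linspace(0, 15, 61))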

The old-school methods of anti-aliasing, like supersampling and MSAA, worked similarly. You take more samples than you need (in the case of MSAA, you do it selectively near edges), then low-pass filter them to generate a final image without aliasing. By the way, even though it seems like an intuitive choice, the averaging filter (e.g. taking 4 4K pixels and averaging them to a single 1080p pixel) is actually kind of a shitty low-pass filter, because it introduces ringing artifacts in frequency space. Lanczos is much better.
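
A rough 1D illustration of that point for a 2x downsample (measurement frequencies picked arbitrarily): compare how much content above the new Nyquist frequency leaks through a 2-pixel average versus a Lanczos-2 kernel stretched for the same downsample.

import numpy as np

def magnitude_response(kernel, n_fft=4096):
    # Zero-pad the kernel and look at |H(f)|, normalized to unit DC gain.
    kernel = np.asarray(kernel, dtype=float)
    return np.abs(np.fft.rfft(kernel / kernel.sum(), n_fft))

box = [1.0, 1.0]                      # plain 2-pixel averaging
taps = np.arange(-3, 4) / 2.0         # Lanczos-2 kernel stretched for a 2x downsample
lanczos2 = np.sinc(taps) * np.sinc(taps / 2)

# Bin k of the 4096-point rfft is k/4096 cycles/sample; after a 2x downsample
# the new Nyquist is 0.25, so anything above that should ideally be removed.
for f in (0.3, 0.35, 0.4, 0.45):
    k = int(round(f * 4096))
    print(f, magnitude_response(box)[k], magnitude_response(lanczos2)[k])
# The box filter leaks far more of these above-Nyquist frequencies through,
# and that leaked content is exactly what shows up as aliasing after downsampling.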

An alternative way to do the filtering is to use a convolutional neural network (specifically, a convolutional autoencoder). DLDSR is a low-pass filter for spatial supersampling, and of course, DLSS does reconstruction. These are preferable to Lanczos because, since the signal is discrete and not band-limited, there's no perfect analytical filter for reconstruction. Instead of doing contrast-adaptive shenanigans like FSR does, you can just train a neural network to do the work. (And, by the way, if Lanczos really were the ideal filter, the neural network would learn to reproduce Lanczos, because a neural network is a universal function approximator; with enough nodes, it can learn any function.) Internally, the convolutional neural network downsamples the image several times while learning relevant features about it, then uses those learned features to reconstruct the output image.
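
To make that "downsample, learn features, reconstruct" structure concrete, here's a minimal convolutional-autoencoder sketch in PyTorch. This is emphatically not DLSS (that network, its inputs, and its training are proprietary); it's just the general shape of the architecture, with made-up layer sizes:

import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions downsample while learning feature maps.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # H -> H/2
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # H/2 -> H/4
            nn.ReLU(),
        )
        # Decoder: transposed convolutions upsample back to the input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # H/4 -> H/2
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # H/2 -> H
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Toy usage: a random 1x3x64x64 "image" in, same shape out.
out = ConvAutoencoder()(torch.rand(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])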

What's different about ray tracing, from a signal processing perspective?

(I have no professional background in rendering. I do work that involves image processing, so I know more about that. But I have done some reading about this for fun, so let's go).

When light hits a surface, some amount of it is transmitted and some amount is scattered. To calculate the outgoing light, you have to solve what's called the light transport equation, which is essentially an integral, over all incoming directions, of the incoming light weighted by a function that describes how the material scatters it. But in most cases, this equation does not have an exact, analytic solution. Instead, you need to use a numerical approximation.

Monte Carlo algorithms numerically approximate an integral by randomly sampling over the integration domain. Path tracing is the application of a Monte Carlo algorithm to the light transport equation. Because you are randomly sampling, you get image noise, which decreases as you take more samples. But if you have a good denoising algorithm, you can reduce the number of samples needed for an acceptable image. Unsurprisingly, convolutional autoencoders are also very good at this (because again, universal function approximators). Again, I'm not in this field, but Nvidia has published on it before (https://research.nvidia.com/publica...n-monte-carlo-image-sequences-using-recurrent). It's out there!
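
A minimal example of the Monte Carlo idea (the integrand is arbitrary): the estimate is noisy, and the error only shrinks like 1/sqrt(N), which is why you either throw a lot of samples at it or denoise a cheap, low-sample estimate.

import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(f, n):
    # Estimate the integral of f over [0, 1] as the mean of f at n uniform random points.
    return f(rng.uniform(0.0, 1.0, n)).mean()

f = lambda x: np.sin(np.pi * x)   # exact integral over [0, 1] is 2/pi
exact = 2 / np.pi
for n in (16, 256, 4096, 65536):
    print(n, abs(mc_estimate(f, n) - exact))   # error drops roughly like 1/sqrt(n)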

And yes, you can have aliasing in ray-traced images. If you took all the ray samples from the same pixel grid, and you happen to come across any high-frequency information, it would be aliased. So instead, you can randomly distribute the Monte Carlo samples, using some sampling algorithm (https://www.pbr-book.org/3ed-2018/Monte_Carlo_Integration/Careful_Sample_Placement).

Once you have the samples, the pipeline looks a lot like DLSS, which was already very similar in structure to a denoising algorithm. If, for example, the Halton sampling algorithm (https://pbr-book.org/3ed-2018/Sampling_and_Reconstruction/The_Halton_Sampler) for distributing Monte Carlo samples sounds familiar, it's because it's the algorithm that Nvidia recommends for subpixel jittering in DLSS. Temporal upscalers like DLSS already exploit randomly distributed samples to reconstruct higher frequency information, so it makes sense to combine the DLSS reconstruction passes for rasterized and ray-traced samples: in many ways, the way the data are structured and processed is very similar.
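
For reference, the Halton sequence itself is tiny to implement (bases 2 and 3 for the two pixel axes; I'm not claiming this matches any particular SDK's jitter code line for line):

import numpy as np

def radical_inverse(i, base):
    # Write i in the given base, then mirror its digits across the decimal point.
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f
        i //= base
        f /= base
    return inv

def halton_2d(n):
    # First n points of the 2D Halton sequence (bases 2 and 3): well spread,
    # non-repeating offsets in [0, 1)^2, usable as per-frame sub-pixel jitter.
    return np.array([(radical_inverse(i, 2), radical_inverse(i, 3))
                     for i in range(1, n + 1)])

offsets = halton_2d(16) - 0.5   # re-center to [-0.5, 0.5) for camera jitter
print(offsets[:4])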

tl;dr

Aliasing is an artifact of undersampling a high-frequency signal. Good anti-aliasing methods filter out the high frequency information before sampling to remove aliasing from the signal. Temporal reconstruction methods, like DLSS and FSR, use randomly jittered samples collected over multiple frames to reconstruct high frequency image content.

Noise in ray tracing is an artifact of randomly sampling rays using a Monte Carlo algorithm. Instead of taking large numbers of random samples, denoising algorithms attempt to reconstruct the signal from a noisy input.
He's bringing in the 3D graphs!

And hey, Monte Carlo, finally an algorithm being discussed I actually have experience with!

Too tired to contribute, sadly, all I can say is really nice work!
 
A lot of times we are comparing the NG Switch with the Steam Deck, which makes sense considering they are both dedicated gaming hardware with similar form factors.

But considering they are all ARM, has anyone done any productive spec and graphical power comparisons of the NG Switch with the M1 in iPads and MacBooks, and/or the Meta Quest 2 and 3?
As far as I know, the Meta Quest 2 uses an SD 865. Apple is on a different level because of how they handle their software.
 
Let's just look at the PS5 and Switch 2 from the specs we have; it's simple to figure out whether this is a next-gen product or a "pro" model.

PS4 uses GCN from 2011 as its GPU architecture and has 1.84 TFLOPs available to it. (We will use GCN as the baseline.) The performance factor of GCN is 1.

Switch uses Maxwell v3 from 2015 and has 0.393 TFLOPs available to it. The performance factor of Maxwell v3 is 1.4, plus mixed precision for games that use it... This means the docked Switch is capable of 550 GFLOPs to 825 GFLOPs GCN-equivalent, still a little less than half the GPU performance of the PS4. This doesn't factor in far lower bandwidth, RAM amount or CPU performance, all of which sit around 30-33% of the PS4, with the GPU somewhere around 45% when completely optimized.

PS5 uses RDNA 1.X, customized in part by Sony and introduced with the PS5 in 2020, and has up to 10.2 TFLOPs available to it. The performance factor of RDNA 1.X is 1.2, plus mixed precision (though this is limited to AI use cases; developers just don't use mixed precision atm for console or PC gaming, though it's used heavily in mobile and in Switch development). This means the PS5's GPU is about 6.64 times as powerful as the PS4's, and around 3 times the PS4 Pro's.

Switch 2 uses Ampere, specifically GA10F, a custom GPU architecture that will be introduced with the Switch 2 in 2024 (hopefully), and has 3.456 TFLOPs available to it. The performance factor of Ampere is 1.2, plus mixed precision* (this uses the tensor cores and is independent of the shader cores). Mixed precision offers 5.2 TFLOPs to 6 TFLOPs. It also reserves a quarter of the tensor cores for DLSS according to our estimates; much like the PS5 using FSR2, this allows the device to render the scene at a quarter of the output resolution with minimal loss of image quality, greatly boosting available GPU performance and allowing the device to output a 4K image.

When comparing these numbers to PS4 GCN, Switch 2 has 4.14 TFLOPs to 7.2 TFLOPs and the PS5 12.24 TFLOPs GCN-equivalent, meaning Switch 2 will land somewhere between 34% and 40% of the PS5. It should also manage RT performance, and while the PS5 will use some of that 10.2 TFLOPs to do FSR2, Switch 2 can freely use the remaining quarter of its tensor cores to manage DLSS. Ultimately there are other bottlenecks: the CPU is only going to be about two thirds as fast as the PS5's, and bandwidth, relative to their architectures, will only be about half as much, though it could offer 10GB+ of RAM for games, which is pretty standard atm for current-gen games.

Switch 2 is going to manage current-gen games much better than the Switch managed last-gen games. The jump is bigger, the technology is a lot newer, and the addition of DLSS has leveled the playing field a lot, not to mention Nvidia's edge in RT playing a factor. I'd suggest that Switch 2 when docked, if using mixed precision, will be noticeably better than the Series S, but noticeably behind the current-gen consoles.
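
To spell out the arithmetic being used here (the per-architecture factors are this post's assumptions, not measurements):

# "GCN-equivalent" TFLOPs: raw FP32 TFLOPs scaled by an assumed
# per-architecture efficiency factor (the factors are the estimates above).
def gcn_equiv(tflops, arch_factor):
    return tflops * arch_factor

ps4 = gcn_equiv(1.84, 1.0)        # GCN baseline
switch = gcn_equiv(0.393, 1.4)    # ~0.55 docked, before mixed precision
ps5 = gcn_equiv(10.2, 1.2)        # ~12.24
switch2 = gcn_equiv(3.456, 1.2)   # ~4.15, before mixed precision

print(f"Switch / PS4:   {switch / ps4:.0%}")    # ~30% (the ~45% above assumes mixed precision)
print(f"Switch 2 / PS5: {switch2 / ps5:.0%}")   # ~34%, the low end of the range above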

Just wanted to say I’ve always loved reading your posts. Read them all. I’ve read this entire thread. I know full well what the speculated/expected hardware specs for this thing are. I get it.

I’m not arguing against the specs.

I’m arguing how new Switch hardware will be positioned by Nintendo.

Power differentials aren’t the be all, end all of how a console is positioned and treated.

The Wii U -> Switch power differential was relatively minimal...yet we knew it was being treated as a gen breaking successor and knew it was completely replacing and supplanting the Wii U. Similar to knowing the Series S was completely supplanting the Xbox One X despite not having huge differentials.

Likewise, large power differentials in spec doesn’t necessarily dictate that it must be a gen breaking successor either.

T239 seems like a fantastic mobile DLSS machine, right? It seems optimized to efficiently and competently run AI DLSS and some light RT on their respective cores at extremely minimal clocks and power draws.

This is amazing.

But I would suppose Nintendo will use it how it appears to be designed: render Switch games at their OLED TX1+ profiles and use the power of the new SoC to output them at much higher resolutions than they can now, giving their games far better performance than they can squeeze out now, with extra headroom to up the visual IQ to PS4 Pro/One X levels.

Right?

It will be like modern PC development, except Nintendo is only optimizing for two profiles. Starfield, for example, is releasing with minimum specs of a GPU from 2016 and a CPU from 2015. People will also be playing it on their RTX 4090 and 12-core CPU from 2022. It’s fundamentally the same game.

Why wouldn’t Nintendo approach their game development like this for most of their games over the next 5 years or so?

We all agree this is what will be done with the eventual Metroid Prime 4 release, yes? They are going to take the game they have been developing on the Tx1+ and use the power of t239 to make it look and run a lot better for that machine.

Now, if there is some out of the blue “gimmick” that we do not see coming that drives Nintendo to change the way we play their games and can only be done on the T239…then all bets are off.

I’m just going by what we know about the hardware and what it’s designed to do, and by what Nintendo has said concerning their devices, the future, and the Switch.


You’re wasting your time.

He still thinks this new console is no different to a Switch Lite or Switch OLED being introduced to the Switch family line.

Heck, even if it’s treated and positioned exactly like a PS5, and it’s made clear it’s a gen-breaking next-gen console successor the way the PS5 clearly was for the PS4/Pro… you will at least agree the “cross gen” support for the OLED and Lite will be much longer than it was for the PS4 and PS4 Pro, right?
 
A lot of times we are comparing the NG Switch with the Steam Deck, which makes sense considering they are both dedicated gaming hardware with similar form factors.

But considering they are all ARM, has anyone done any productive spec and graphical power comparisons of the NG Switch with the M1 in iPads and MacBooks, and/or the Meta Quest 2 and 3?
Totally different GPU architecture, and because it's also a different software stack, there is very little in the way of useful apples-to-apples benchmarks. It's especially uninformative to the general public - “plays games as well as an iPad” doesn’t help most people visualize what it can do.
 
Hey Nate, in your opinion what are the odds Nintendo is talking about the next device in private at Gamescom?
Nintendo, specifically? Hard to say. Greater than 0% but less than 100%.

Industry chatter will be taking place, for certain. I have little birds waiting to sing their song to me once Gamescom concludes. Such songs may be kept under frie-NDA.
 
Whenever people mention industry meetings, I just imagine a bunch of tired people in a boring-looking office talking about business but deviating here and there, because all they have talked about all day is business lol. That's what I would do in my research meetings sometimes.

That is something I would not want to be a fly on the wall for lol
 
This post is a great explainer, but there are some fundamental differences between denoising and antialiasing that are worth clarifying. […]
God I love this thread so much. Always learning cool shit.
 
Is it actually?
Drake is an Ampere card, so that already rules it out. Even if it were Lovelace, to make frame gen work you need a good deal of headroom. The reason you can get such high frame rates with frame gen is that you're not working your GPU as hard with DLSS, giving the GPU time to create those new intermediate frames. If you're already running at peak usage, you don't have enough spare performance to create those new frames. The problem gets worse the lower down the performance stack you go.
 
Drake is an Ampere card, so that already rules it out. Even if it were Lovelace, to make frame gen work you need a good deal of headroom. The reason you can get such high frame rates with frame gen is that you're not working your GPU as hard with DLSS, giving the GPU time to create those new intermediate frames. If you're already running at peak usage, you don't have enough spare performance to create those new frames. The problem gets worse the lower down the performance stack you go.
And Nvidia Reflex forces less-than-full GPU usage in order to keep latency down, which makes it an even poorer fit for a GPU-strapped environment. Frame gen - at least the current iteration - just isn't a good fit.
 
It’s slowing down because of hitting saturation, not because it’s losing engagement.
Yeah and?
Engagement is important sure, but not nearly as important as having good sales for the new hardware. They can't survive on software sales alone.

I think people are overestimating how much of the Switch userbase will really NEED to play Mario Kart and Animal Crossing and such in 4K with better graphics. And willing to pay another $400 any time soon to do it. So why not cater to them for another 5-6 years?
Because it would be a very bad, shortsighted move.
Even with the all-time-high engagement the Switch has, the money they would make by releasing their games on the Switch 1 is far, far less than the money they would make by releasing them only on the Switch 2 and pushing Switch 2 sales.

But why, though.

I can see them stop making TX1+ hardware by 2026, sure. But why stop putting most of their big games on them?
Because putting their games on the old console would deter people from buying the new one, and again, they NEED people to buy the new one since the old one is on its way out.
And also because the games won't be able to run on the Switch due to the huge power difference between the two machines.

Even the people who don’t like what I’m saying will agree that most Nintendo games will appear on the OLED/Lite for at least 3 years after this new model releases.
It will be no more than 3 years, and it will only be small releases and remakes/remasters. The next 3D Mario/Mario Kart/Zelda/Smash/Animal Crossing/Splatoon/Pokémon will not release on Switch 1.

The Wii U -> Switch power differential was relatively minimal...yet we knew it was being treated as a gen breaking successor and knew it was completely replacing and supplanting the Wii U. Similar to knowing the Series S was completely supplanting the Xbox One X despite not having huge differentials.
Huh... The Wii U was a home console; the Switch is a handheld.
Also, are you seriously comparing the mid-gen pro version of the Xbox One to the budget version of the Series X? Are you being serious right now?

In this particular Switch situation, we have none of that. It would be the same exact team and development making a low end profile and then using the new hardware to make a high end profile, they don’t have to have focus on two different versions from being built from the ground up separately.
But i would suppose Nintendo will use it how it appears to be designed. Render Switch games at their OLED TX1+ profiles and use the power of the new SoC to output it at much higher resolutions than they can now and giving their games far better performance than they can squeeze out now with extra headroom to up the visual IQ to ps4pro/One X levels.
It will be like modern pc development except Nintendo is only optimizing for two profiles.
And most importantly, it’s the same base architecture that’s in the new hardware, whose design is specifically geared to run low profile games and output them high profile.

These posts now make it clear why you think this is going to be a pro model.
You have pretty much zero understanding of Drake's specs.
You claim to have read all the technical analysis, but either you didn't or you did not understand it, because I don't know how you could think Drake is designed to run Switch 1 games at higher res when the very post you just quoted is telling you that it's not.
Drake's GPU doesn't have the same architecture as the Switch's; it has a much newer one (Switch 1 is Maxwell versus Drake's Ampere) that literally cannot run Switch games at all.
There is a reason people are worried about Switch 1 games running at all on Drake and have discussed different solutions to this problem (like an emulation layer for the GPU).
Drake is not designed to "render Switch games at their OLED TX1+ profiles and use the power of the new SoC to output them at much higher resolutions".
Drake's design is not "specifically geared to run low profile games and output them high profile."
That's not what Drake is, not at all, and I don't know where you got that from.

Overall, Drake is a vastly different machine from the Switch, with a huge overall jump in power (6x-8x the Switch) and many new modern features.
The jump from Switch 1 to Drake is bigger than the one from PS3 to PS4.

And if you still want to pull the "how Nintendo will position this console" argument, did you somehow miss how Furukawa has been repeating non-stop for the past year or so that they are worried about handling the transition of the Switch userbase to their next-gen machine?
 
Render Switch games at their OLED TX1+ profiles and use the power of the new SoC to output it at much higher resolutions than they can now and giving their games far better performance than they can squeeze out now with extra headroom to up the visual IQ to ps4pro/One X levels.
You’d be leaving so much performance on the table

We all agree this is what will be done with the eventual Metroid Prime 4 release, yes? They are going to take the game they have been developing on the Tx1+ and use the power of t239 to make it look and run a lot better for that machine.

So here’s the deal: if all they did was increase resolution/frame rates, then fine.
But like I said, they’d be leaving tons of performance on the table…

But if they use the extra headroom to increase texture resolution, shader complexity, lighting, shadows, draw distances, poly count, etc … or any number of those … then it’s a different branch of development targeting a different profile.

And why wouldn’t they?

If you’re a developer and you’re going to recompile your code for a new piece of hardware… then you’ll see that the software is only utilizing 20% of the total hardware config. Now, if you’re just making a simple patch, maybe that’s fine, so you up the res and framerate and call it a day.
But if you’re developing a cross-gen title, you’d likely do a lot of the stuff I mentioned above.

My point is they didn’t design T239 to run existing Switch games better, or it would have been a TX1+++, maybe with a doubled GPU like the PS4 Pro.

Unless this is what you’re saying…?
 
I had a bit of a crazy idea.
But, would it be possible to use DLSS from a CPU draw call?
I think they could make a more efficient frame generator if it were possible.
Currently DLSS 3 works by creating a frame between two already produced frames.
But what if, while the second frame is still going through rasterization and the shader core steps, the CPU sends the same draw call to the tensor cores, with the idea of providing the data needed, together with the motion vectors and the previous frame, to infer an intermediate frame before the next frame is fully generated?
 
You’d be leaving so much performance on the table
More specifically, it's adding unnecessary performance to a device; a T239 is so far outside the scope of this proposed "Switch games but at higher res" positioning that using it for that would be an active and egregious waste of money. And no one should be under the illusion that any shareholder would consider that a good thing, when it's easy to identify much cheaper and far more power-efficient options than what T239 is being designed to be capable of.
 
Nintendo, specifically? Hard to say. Greater than 0% but less than 100%.

Industry chatter will be taking place, for certain. I have little birds waiting to sing their song to me once Gamescom concludes. Such songs may be kept under frie-NDA.

I'm sure they'll sing wonderful songs 🤞
 
Yeah, but you get the mobile tflops by halving the docked tflops, in general terms. Do people really expect Switch 2 to be 3x the current Switch's docked mode in handheld form?... That line of thinking is wishful thinking imo when you consider they have shown time and time again that they value battery life, even at the cost of sometimes running games at 360p-480p in handheld mode. An issue I rarely see people talk about is active cooling and the costs involved in cooling a chip running at one third to two thirds the wattage, versus a chip like the Tegra X1, which only needs a tiny, tiny fan.

@mjayer

I absolutely adore @Z0m3le, but let's be real: if you followed him during the pre-Switch and Switch Pro speculation years, he has consistently and massively overshot how high Nintendo would clock their chipsets, leading to massively reduced performance relative to his expectations.

I 100% think Switch 2 will be massively more powerful than the current Switch, but starting to compare it to current-gen machines, with their good desktop-class CPUs, their massive fast RAM pools, their ridiculously fast SSDs and their GPUs (which are still, in a best case, going to be 3-5x the performance of a best-case Switch 2 GPU), is just setting yourself up for disappointment imo.

Switch 2 will be around top-end Switch level visuals with more complex geometry, better textures, better quality lighting and models, while running at much higher resolutions (mainly due to DLSS) and with more stable framerates, while again getting about 20-25% of the latest and greatest big AAA third party games. It's not going to be the N64-to-GameCube leap some think; it's going to be the PS4-to-PS5 leap, the kind that causes arguments and has the more core audience asking "is this it, is this next gen?" when it comes to visuals in games like Starfield or Spider-Man 2 (I think both look phenomenal personally and go for aims not necessarily to do with visuals, but either speed of traversal or scale unseen before in the AAA space).

Will this please the hardcore among us? Probably not. But is it enough to please the 100+ million consumers that Nintendo will want to capture with the device?... Of course.
This is absolutely fair, but let's look at the complete context here. The pre-Switch speculation was based on Nate's reporting of a Pascal-based 16nm Tegra; the Tegra X1 was ultimately not that chip. However, I didn't take into account mixed precision, because at the time no one really expected developers to utilize it. The Switch has 393 GFLOPs of FP32 performance, but developers utilize up to 600 GFLOPs from the Nintendo Switch thanks to mixed precision.

I think what is more important than these aspects, though, is that we are not really speculating on what Nintendo will use; we have basically all the information via a data breach 18 months ago and public Linux info. When it comes to Drake, I've been more correct than the general thread's tech heads, claiming 8 cores far before the Linux kernel confirmed it, when other tech people speculated 6 cores and 1GHz clock speeds. Meanwhile, I've said for over a year that we should be seeing 8 cores with a 1.5GHz to 2GHz CPU clock, and probably A78C cores; the Linux kernel then confirmed the 8 CPU cores, and A78C is basically the worst/oldest ARM core that could be used by Drake given that leak of 8 cores in 1 cluster.

As for the GPU, most people speculated 768 CUDA cores; I was estimating 1024 CUDA cores just based on the 8nm transistor count that would fit in a similar area to the TX1... Then the data leak happened, and we see it is actually 1536 CUDA cores; the GPU here is SIX times bigger than the Tegra X1's. You mention portable performance of Switch 2 not being 3 times faster than Switch, but realistically, to get to 3x the raw performance of a docked Switch, the GPU clock could be as low as 8nm Orin's lowest GPU clock of 422MHz, which would offer 1.3 TFLOPs for portable gaming. Ampere also has features that help that 1.3 TFLOPs match the PS4 in raw power: variable rate shading, tile-based rendering, and mesh shaders being the biggest contributors. You then throw DLSS on top, which is a drastic improvement to portable performance, and that is at 422MHz. A TSMC 4N T239 (the process node based on Thraktor's estimates) should push the portable clock up much higher. The Switch uses 460MHz for its GPU on a 20nm planar-transistor chip; here we are probably looking at ~600MHz, or 1843 GFLOPs (the same as the PS4 before architecture is taken into account), and realistically the data breach could very possibly have leaked the portable clock of 660MHz, which puts portable performance at just over 2 TFLOPs. The Tegra X1 is based on a spring 2015 chip on a bad node, and this is basically 10 years later; the ability to hit 4 to 5 times the raw performance of the Tegra X1 in portable mode without exceeding Nintendo's power requirements (which none of us know) is fairly easy to do by accident, and given the size of T239, there is no point in not drastically exceeding it.

When we compare it to the PS4's GPU, we also have to remember that the GPU will be 13 years old when Drake launches, and mobile technology started taking off after the PS4's launch, which only accelerated what could be done. I don't really think docked performance matters much in this particular conversation. I appreciate people disagreeing with each other, so I don't mean this to come off as talking down; I really am just looking at the factual data we have and coming to a conclusion. We are not trying to guess about the GPU; we have those specs. The clocks certainly have a range, but even the minimum of that range would blow away our previous expectations. I hope this helps, because these numbers really aren't about me: Thraktor has stated that portable clocks are probably 550MHz to 600MHz (1.69 TFLOPs to 1.84 TFLOPs) and that docked he expects 1.1GHz (3.38 TFLOPs), while the leak does have a clock of 1.125GHz, which I use, giving us the 3.456 TFLOPs. A week or two ago, I also brought up mixed precision via the tensor cores, and Thraktor confirmed that it is possible. I've separated those numbers into 50% and 75% of the tensor cores doing mixed precision, because Ampere's SMs have 4 tensor cores each and 1 or 2 of them will be busy with DLSS and denoising; but now that denoising is done in the same step as DLSS, it will almost certainly only require a quarter of the tensor cores, meaning mixed precision when docked offers 6 TFLOPs (10x the mixed precision of the Switch).

Will developers use any of these features? That isn't known to us. However, a GPU that is 6 times bigger, and that offers higher clocks via an advanced node simply because it doesn't scale down in clocks as well (power curve limitations), should realistically do things 8 to 10 times better than the TX1. There are bottlenecks to take into consideration here, memory bandwidth and CPU speed being the main ones, but that is the case with the Switch to an even greater degree; the Switch is far less balanced than Drake seems to be, which is very important to note. It also doesn't use the black magic of DLSS, which I really haven't added to this discussion because it blows those 8-to-10-times numbers out of the water. I mean, you can render a 1080p frame with 8 to 10 times the performance of a TX1 1080p frame (think MK8D) and pump it to 4K, which is just absolutely crazy. What is also worth thinking about is rendering a 720p image, taking that to 1440p via DLSS, and then using a basic spatial scaler like FSR1 to hit 4K; it will still look better than a raw 1440p image, but it gives you a full 4K output so your TV's bad upscaler isn't being used.
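
For anyone wanting to check these TFLOPs figures, they all come from the same simple formula (the core count and clocks are the leaked/estimated numbers discussed above, nothing official):

# FP32 throughput of a CUDA GPU: 2 operations per core per clock (one fused multiply-add).
def fp32_tflops(cuda_cores, clock_ghz):
    return 2 * cuda_cores * clock_ghz / 1000

print(fp32_tflops(256, 0.768))    # ~0.39  - original Switch docked (TX1)
print(fp32_tflops(1536, 0.660))   # ~2.03  - speculated Drake portable clock
print(fp32_tflops(1536, 1.125))   # ~3.46  - speculated Drake docked clock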

PS: sorry to make this post so long. I just don't post in here very much, so when I see my name mentioned, I figure I should give a full explanation of my thought process... Remember, I'm not a hardware engineer, I'm just a game enthusiast like the rest of you; I just happen to really love digging into the tech behind this stuff.
 
You’d be leaving so much performance on the table
Same thing with NSO games running on Switch, so much wasted power.

A backwards compatibility profile will run games the way they were intended to. It has no insight into the game's structures and configurations. Giving the game more resources than it expects may fix a few issues, may do nothing at all or may completely break it.

A next-gen patch will probably only tweak configurations (and recompile shaders) to increase framerate, resolution and/or LoD. It will also leave a lot of performance on the table.

To fully use the available power, the game logic would go from 3x A57 cores on ~3GB of RAM to 7x A78 cores on ~10GB of RAM and rendering would go to a GPU ~6x more powerful with new features. You'd have to rewrite your logic with the new specs and rework your assets. It's almost a full remake.

BC, if available, is mostly to carry the library over.
The hardware itself is about new games and experiences.
 
Just wanted to say I’ve always loved reading your posts. […] But I would suppose Nintendo will use it how it appears to be designed: render Switch games at their OLED TX1+ profiles and use the power of the new SoC to output them at much higher resolutions and with far better performance, with extra headroom to up the visual IQ to PS4 Pro/One X levels. […]
What you are saying here makes sense for a wide variety of Nintendo's software, but you have to remember that the GPU isn't just 9 years newer than the Tegra X1, with features like DLSS and ray tracing baked into the fundamental design of the chip; it's also a GPU that is 6 times bigger than the Tegra X1's, on a process node that is much, much better than the 8nm Samsung node of any current Ampere chip. DLSS alone is enough to get the results you are talking about: even if the GPU weren't more powerful, a Switch game like Metroid Prime 4, even if it runs at 720p docked on Switch, could run at 1440p on Switch 2 via DLSS. But the GPU is also at least SIX times more powerful than the Tegra X1, with features like variable rate shading (roughly a 20% increase in performance) and mesh shading (roughly a 25% increase)... The CPU is also 3 times faster per clock, with at least a 50% higher clock and 133% more cores, which means the CPU is at minimum 10 times the performance; the RAM is 3 times larger and 4 times faster... There is a lot more than XB1/PS4-gen visuals going on here.

What developers have available and what developers will use are also very, very different; not every PC game looks as good as another. Expect Capcom to push Nintendo hardware with Switch 2, but don't expect the next Kirby game to look like Ratchet & Clank: Rift Apart. However, thanks to ray tracing, Nintendo could absolutely make something that looks like that with Drake, especially given DLSS, as I don't believe Rift Apart even takes advantage of FSR2.
 
This post is a great explainer, but there are some fundamental differences between denoising and antialiasing that are worth clarifying. For example, I wouldn't say that anti-aliasing is deleting "real" detail. Both noise and aliasing are sampling artifacts, but their origin is very different. I'll elaborate on why:

Spatial frequency

Just like how signals that vary in time have a frequency, so do signals that vary in space. A really simple signal in time is something like sin(2 * pi * t). It's easy to imagine a similar signal in 2D space: sin(2 * pi * x) * sin(2 * pi * y). I'll have Wolfram Alpha plot it:

image.png


The important thing to notice here is that the frequency content in x and y separable. You could have a function that has a higher frequency in x then in y, like sin(5 * pi * x) * sin(2 * pi * y):

image.png


So just like time frequency has dimensions 1/[Time], spatial frequency is a 1-dimensional concept with dimensions 1/[Length]. The way the signal varies with x and the way it varies with y are independent. That's true in 1D, 2D, 3D... N-D, but we care about 2D because images are 2D signals.

What is aliasing, really?

Those sine functions above are continuous; you know exactly what the value is at every point you can imagine. But a digital image is discrete; it's made up of a finite number of equally spaced points. To make a discrete signal out of a continuous signal, you have to sample each point on the grid. If you want to take that discrete signal back to a continuous signal, then you have to reconstruct the original signal.

Ideally, that reconstruction would be perfect. A signal is called band-limited if the highest frequency is finite. For example, in digital music, we think of most signals as band-limited to the frequency that the human ears can hear, which is generally accepted to be around 20,000 Hz. A very important theory in digital signal processing, called the Nyquist-Shannon theorem, says that you can reconstruct a band-limited signal perfectly if you sample at more than twice the highest frequency in the signal. That's why music with a 44 kHz sampling rate is considered lossless; 44 kHz is more than twice the limit of human hearing at 20 kHz, so the audio signal can be perfectly reconstructed.

When you sample at less than twice the highest frequency, it's no longer possible to perfectly reconstruct the original signal. Instead, you get an aliased representation of the data. The sampling rate of a digital image is the resolution of the sensor, in a camera, or of the display in a computer-rendered image. This sampling rate needs to be high enough to correctly represent the information that you want to capture/display; otherwise, you will get aliasing.

By the way, this tells us why we get diminishing returns with increasing resolution. Since the x and y components of the signal are separable, quadrupling the "resolution" in the sense of the number of pixels (for example, going from 1080p to 2160p) only doubles the Nyquist frequency in x and in y.

So why does aliasing get explained as "jagged edges" so often? Well, any discontinuity, like a geometric edge, in an image is essentially an infinite frequency. With an infinite frequency, the signal is not band-limited, and there's no frequency that can satisfy the Nyquist-Shannon theorem. It's impossible to get perfect reconstruction. (https://pbr-book.org/3ed-2018/Sampling_and_Reconstruction/Sampling_Theory) But you can also have aliasing without a discontinuity, when the spatial resolution is too low to represent a signal (this is the reason why texture mipmaps exist; lower resolution mipmaps are low-pass filtered to remove high frequency content, preventing aliasing).

You can even have temporal aliasing in a game, when the framerate is too low to represent something moving quickly (for example, imagine a particle oscillating between 2 positions at 30 Hz; if your game is rendering at less than 60 fps, then by the Nyquist-Shannon theorem, the motion of the particle will be temporally aliased).

So what do we do to get around aliasing?

The best solution, from an image quality perspective, is to low-pass filter the signal before sampling it. Which yes, does essentially mean blurring it. For a continuous signal, the best function is called the sinc function, because it acts as a perfect low pass filter in frequency space. But the sinc function is infinite, so the best you can do in discrete space is to use a finite approximation. That, with some hand-waving, is what Lanczos filtering is, which (plus some extra functionality to handle contrast at the edges and the like) is how FSR handles reconstruction. Samples of the scene are collected in each frame, warped by the motion vectors, then filtered to reconstruct as much of the higher frequency information as possible.

The old-school methods of anti-aliasing, like supersampling and MSAA, worked similarly. You take more samples than you need (in the case of MSAA, you do it selectively near edges), then low-pass filter them to generate a final image without aliasing. By the way, even though it seems like an intuitive choice, the averaging filter (e.g. taking 4 4K pixels and averaging them to a single 1080p pixel) is actually kind of a shitty low-pass filter, because it introduces ringing artifacts in frequency space. Lanczos is much better.

An alternative way to do the filtering is to use a convolutional neural network (specifically, a convolutional autoencoder). DLDSR is a low-pass filter for spatial supersampling, and of course, DLSS does reconstruction. These are preferable to Lanczos because, since the signal is discrete and not band-limited, there's no perfect analytical filter for reconstruction. Instead of doing contrast-adaptive shenanigans like FSR does, you can just train a neural network to do the work. (And, by the way, if Lanczos is the ideal filter, then the neural network will learn to reproduce Lanczos, because a neural network is a universal function approximator; with enough nodes, it can learn any function.). Internally, the convolutional neural network downsamples the image several times while learning relevant features about the image, then you use the learned features to reconstruct the output image.

What's different about ray tracing, from a signal processing perspective?

(I have no professional background in rendering. I do work that involves image processing, so I know more about that. But I have done some reading about this for fun, so let's go).

When light hits a surface, some amount of it is transmitted, and some amount is scattered. To calculate the emitted light, you have to solve what's called the light transport equation, which is essentially an integral over some function that describes how the material emits light. But in most cases, this equation does not have an exact, analytic solution. Instead, you need to use a numerical approximation.

Monte Carlo algorithms numerically approximate an integral by randomly sampling over the integration domain. Path tracing is the application of a Monte Carlo algorithm to the light transport equation. Because you are randomly sampling, you get image noise, which converges with more random samples. But if you have a good denoising algorithm, you can reduce the number of samples for convergence. Unsurprisingly, convolutional autoencoders are also very good at this (because again, universal function approximators). Again, I'm not in this field, but I mean, Nvidia's published on it before (https://research.nvidia.com/publica...n-monte-carlo-image-sequences-using-recurrent). It's out there!

And yes, you can have aliasing in ray-traced images. If you took all the ray samples from the same pixel grid, and you happen to come across any high-frequency information, it would be aliased. So instead, you can randomly distribute the Monte Carlo samples, using some sampling algorithm (https://www.pbr-book.org/3ed-2018/Monte_Carlo_Integration/Careful_Sample_Placement).

Once you have the samples, DLSS was already very similar in structure to a denoising algorithm. If the Halton sampler (https://pbr-book.org/3ed-2018/Sampling_and_Reconstruction/The_Halton_Sampler) for distributing Monte Carlo samples sounds familiar, it's because it's the same algorithm Nvidia recommends for subpixel jittering in DLSS. Temporal upscalers like DLSS already exploit this kind of well-distributed sampling to reconstruct higher-frequency information, so it makes sense to combine the reconstruction passes for rasterized and ray-traced samples; in many ways, the data are structured and processed very similarly.
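For the curious, the Halton sequence itself is only a few lines; this sketch follows the radical-inverse construction described in the PBR book chapter linked above, and shifts the points to pixel-centered offsets of the kind a temporal upscaler would use as per-frame jitter.

```python
def radical_inverse(i, base):
    # Mirror the digits of i (written in the given base) around the decimal
    # point: e.g. base 2 gives 1 -> 0.5, 2 -> 0.25, 3 -> 0.75, ...
    f, result = 1.0, 0.0
    while i > 0:
        f /= base
        result += f * (i % base)
        i //= base
    return result

def halton_2d(i):
    # Bases 2 and 3 give well-distributed, non-repeating points in [0, 1)^2.
    return radical_inverse(i, 2), radical_inverse(i, 3)

# Shifted to [-0.5, 0.5), these look a lot like per-frame subpixel jitter offsets.
for frame in range(1, 9):
    x, y = halton_2d(frame)
    print(f"frame {frame}: ({x - 0.5:+.3f}, {y - 0.5:+.3f})")
```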

tl;dr

Aliasing is an artifact of undersampling a high-frequency signal. Good anti-aliasing methods filter out the high frequency information before sampling to remove aliasing from the signal. Temporal reconstruction methods, like DLSS and FSR, use randomly jittered samples collected over multiple frames to reconstruct high frequency image content.

Noise in ray tracing is an artifact of randomly sampling rays using a Monte Carlo algorithm. Instead of taking large numbers of random samples, denoising algorithms attempt to reconstruct the signal from a noisy input.
I just want to say that I do appreciate your posts.
 
Reminder that T239 and other hardware details are not "known" or confirmed, despite many posters stating as much, however accurate the leak was back then.
I’m going to be frank: I’m not sure why you keep bringing up this point about nothing being confirmed, or whatever along those lines.

They aren’t Sony:


A whole video detailing specifics about the hardware and the logic and reason behind it

Oh and Sony even has a blogpost about their console capabilities:


They aren’t Microsoft:


A whole presentation about the literal silicon, the IPs, the capabilities, etc.


Let me remind you what Nintendo are:
IMG_2650.png

This is for the switch.

There’s multiple Tegra processors.

This goes back further to the Wii U:

702F8C85-DBA0-45A1-AF8C-C30724B4AB93.jpg



The most you’d be getting is that they are working with X company, and that’s all. Just like the N64 with Silicon Graphics or IBM for the GameCube.



Waiting for a direct confirmation from Nintendo that they are using the T239 processor from Nvidia is such a futile hill to die on. Nintendo will never, ever, ever confirm the specifications of their silicon to the public. The information we get about the silicon powering Nintendo hardware comes down to third-party sources leaking it. The actual details of the Switch, i.e. it being the Tegra X1, we never got directly from Nintendo; we found out from other sources.

So while yes, we have zero confirmation from Nintendo about the T239 chip, getting that information from them is about as likely as Sony choosing Nvidia and IBM for their PS5 Pro project.

Or Microsoft making an actual handheld Series console.

But sure, it’s not impossible.


TL;DR I don’t understand the point of your comment: the only direct confirmation would come from the company selling you the product, but that company never gives it, so you’re waiting for information that will never exist.
 
I'm not sure this is saying what you are suggesting it is saying. Not that I'm calling you out, this is tricky stuff.

@Samuspet is talking about FLOPS per watt: how much power does it cost to go past 1 TFLOP? But that's not what this article is measuring - it's measuring benchmark performance per watt.

And the article isn't benchmarking architectures, it's benchmarking GPUs. That matters, because different cards on the same architecture can have radically different efficiencies. In general, wider but slower designs are more power efficient than narrower but faster designs. You can build cards both ways with either architecture.

Let's take a look at two cards that perform very similarly on the benchmark. The RX 6700 XT and the RTX 3070 have nearly identical benchmark results and power draw numbers. On paper, for this test, the cards look basically exactly the same.

But they have totally different designs. The RX 6700 XT is made up of 2560 RDNA 2 cores running at 2.6 GHz. The RTX 3070 is made up of 5888 Ampere cores running at 1.7 GHz. The RDNA 2 card is going for the traditionally inefficient narrow-but-fast design, and yet it's totally keeping up with the wide-and-slow Ampere card. If anything, that implies the opposite of what you'd expect: RDNA 2 is more efficient than Ampere here.

If you look at it per TFLOP, the way Samuspet was, it's even more stark. Every single RX 6000 series card absolutely stomps the RTX 30 equivalent in efficiency here.

Card | Watts | TFLOPS | TFLOPS/Watt
RTX 3070 | 219.3 | 20.31 | 0.0926
RTX 3060 Ti | 205.5 | 16.2 | 0.0788
RTX 3060 (12GB) | 171.8 | 12.74 | 0.0742
RTX 3080 | 333 | 29.77 | 0.0894
RTX 3090 | 361 | 35.59 | 0.0986
RX 6800 | 235.4 | 32.33 | 0.1373
RX 6700 XT | 215.5 | 26.42 | 0.1226
RX 6900 XT | 308.5 | 46.08 | 0.1494
RX 6800 XT | 303.4 | 41.47 | 0.1367

You can also see that both sets of cards have lots of variation. Details in the way those cores are arranged and clocked can have huge implications for power consumption.

If we look at frames/watt, which is what this benchmark was designed to measure, we can see that the two architectures are roughly equivalent.

Card | Watts | Frames | Frames/Watt
RTX 3070 | 219.3 | 116.6 | 0.532
RTX 3060 Ti | 205.5 | 106.3 | 0.517
RTX 3060 (12GB) | 171.8 | 83.6 | 0.487
RTX 3080 | 333 | 142.1 | 0.427
RTX 3090 | 361 | 152.7 | 0.423
RX 6800 | 235.4 | 130.8 | 0.556
RX 6700 XT | 215.5 | 112 | 0.520
RX 6900 XT | 308.5 | 148.1 | 0.480
RX 6800 XT | 303.4 | 142.8 | 0.471

But that raises an obvious question. If frames/watt are similar, but TFLOPS/watt are different, doesn't that imply there is a difference in frames/TFLOP? Yes, it does.

Card | TFLOPS | Frames | Frames/TFLOP
RTX 3070 | 20.31 | 116.6 | 5.74
RTX 3060 Ti | 16.2 | 106.3 | 6.56
RTX 3060 (12GB) | 12.74 | 83.6 | 6.56
RTX 3080 | 29.77 | 142.1 | 4.77
RTX 3090 | 35.59 | 152.7 | 4.29
RX 6800 | 32.33 | 130.8 | 4.05
RX 6700 XT | 26.42 | 112 | 4.24
RX 6900 XT | 46.08 | 148.1 | 3.21
RX 6800 XT | 41.47 | 142.8 | 3.44

The Ampere cards are stomping all over the face of the RDNA 2 cards, not in the number of TFLOPS, but in what each TFLOP actually delivers.
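If you want to reproduce the derived columns, they're just the raw measurements divided against each other; a quick sketch with a subset of the cards from the tables above:

```python
# Raw numbers copied from the tables above (a subset, for brevity).
cards = {
    # name:        (watts, tflops, frames)
    "RTX 3070":    (219.3, 20.31, 116.6),
    "RX 6700 XT":  (215.5, 26.42, 112.0),
    "RTX 3090":    (361.0, 35.59, 152.7),
    "RX 6900 XT":  (308.5, 46.08, 148.1),
}

for name, (watts, tflops, frames) in cards.items():
    print(f"{name:10s}  TFLOPS/W = {tflops / watts:.4f}   "
          f"frames/W = {frames / watts:.3f}   frames/TFLOP = {frames / tflops:.2f}")
```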

What the hell does all this mean?

The first thing to take away is that there is huge variation in these devices. There isn't one Rosetta stone that allows us to definitively determine which architecture is more efficient, or more powerful, by X%, all the time, every time.

The second is that the numbers are a bit deceiving. We get hung up on comparing TFLOPS to TFLOPS, but an RDNA 2 TFLOP isn't an Ampere TFLOP... and from this chart we can see that even a 3060 TFLOP isn't a 3090 TFLOP!
All I'm saying is that Nvidia cards will consume less power for similar performance (measured in FPS, not FLOPS). And it is well known that the higher-end models are less efficient given their higher clocks.
 
TL;DR I don’t understand the point of your comment: the only direct confirmation would come from the company selling you the product, but that company never gives it, so you’re waiting for information that will never exist.
My confirmation will be from the likes of Digital Foundry, based on decided, confirmed final hardware, not from a data leak that reflected what was planned at some point in the past.
Reason I've posted this more than once is that people keep saying "we know", "we know", "we know".
You don't know anything at this moment in time.

I think there's a good chance it will be T239-based and I'll be very happy if it is (assuming no crippled aspects) but I (and everyone or almost, almost everyone here) don't know.
 
Because the Steam Deck is 1.6 TFLOPS and the Steam Deck is so big. And because we spent so long, back in the 8nm days, talking about how much of a push even 1 TFLOP would be.

For the record, I don't think 1.5 TFLOP is where we'll land. @Thraktor is right that 1.5 TFLOP is maximum efficiency, but max efficiency isn't max battery life. And 3 TFLOPS seems like the docked max before the GPU starts to get bottlenecked across the board. So I wouldn't be surprised to find something a little shy of that in both modes.

The difference between 400MHz GPU and 500MHz GPU is yawn. It's not that it doesn't matter - it's a 25% increase in perf - but it's not fundamental. Especially with so many other aspects of the system seemingly in place.
This would make more sense if the GPU were 1280 CUDA cores, TBH. On TSMC 4N, 1280 CUDA cores @ 586MHz is enough to hit 1.5TFLOPs; with Drake, the portable clock would only have to be 489MHz to hit 1.5TFLOPs. Thraktor put the clock at 550MHz to 600MHz, which is 1.69TFLOPs to 1.84TFLOPs for Drake, and both are much more reasonable. But when we look at what Nintendo actually did with the Switch's portable clocks, we see that they pushed the TX1 all the way to 460MHz on a 2015 planar-transistor 20nm process node (with its power leakage problems). That wasn't the best for battery life; it was whatever their performance target demanded. The reason I believe the DLSS test reveals the clocks is because, well... those clocks have no reason to be there, and the power-consumption naming does seem to line up closely with estimates of what Ampere would draw with Drake's configuration on TSMC 4N.
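For reference, the FP32 figures being thrown around here follow the usual convention of 2 FLOPs per CUDA core per clock (one fused multiply-add per core per cycle); a minimal sketch, where the core counts and clocks are the speculated values from this discussion, not confirmed specs:

```python
def fp32_tflops(cuda_cores, clock_mhz):
    # 2 FP32 FLOPs per CUDA core per cycle (one FMA), expressed in TFLOPS.
    return 2 * cuda_cores * clock_mhz * 1e6 / 1e12

print(fp32_tflops(1280, 586))   # ~1.50 TFLOPS (hypothetical 10 SM part)
print(fp32_tflops(1536, 489))   # ~1.50 TFLOPS (Drake-sized 12 SM GPU)
print(fp32_tflops(1536, 550))   # ~1.69 TFLOPS
print(fp32_tflops(1536, 600))   # ~1.84 TFLOPS
print(fp32_tflops(1536, 660))   # ~2.03 TFLOPS
```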

I also just don't expect locked GPU clocks with Drake. I think Nvidia will use a variable clock to save battery life, much like the PS5's variable GPU clock reaching its 2.23GHz target, so maybe 660MHz to hit a 2TFLOPs target? That seems very Nvidia to me. Remember, the TX1 was the first "1TFLOPs" mobile GPU and Nvidia talked at length about that in their marketing; hitting "4TFLOPs" with T239 seems exactly like the type of thing Nvidia would love to boast about. It's also about 7TFLOPs docked via mixed precision with Nvidia's marketing numbers, while realistic gaming "FLOPs" (I admit it's flawed to use these numbers, as the GPU is clearly never going to be the bottleneck here) is more like 3TFLOPs mixed precision portable and 6TFLOPs mixed precision docked... This thread shouldn't look at the numbers like that, though; compare it to the SD 8 Gen 2 for instance, which is a 3.5TFLOPs GPU. Switch 2 when portable will offer just 2TFLOPs of fp32 performance, and it's only when docked that it can match the SD 8 Gen 2's GPU, though the benefit here is that T239 is actively cooled, so it will always be able to sustain those numbers.

Sorry, this thread is getting a bit more exciting now that Drake news is finally around the corner. My point is pretty simple, though: if all they wanted to do was hit below 1.5TFLOPs and 3TFLOPs in portable and docked modes, they would have used a smaller GPU. 10SM would make more sense for the performance you are expecting; going to 12SM is an extreme measure that isn't needed unless you are looking to push the mobile clocks higher, IMO. It's the same reasoning that puts Samsung 8nm off the table here, just to a lesser extent. We also know for a fact that Nintendo marketed the Wii U to developers as a "360+" device; they literally set their target to be better than the 360 in GPU performance. Them doing the same with the PS4 now makes a lot of sense, so targeting 1843GFLOPs or 2TFLOPs fits better with what Nintendo has done in the past than 1.5TFLOPs. On top of that, I think Nvidia just has an ego: they want to release something that beats the PS4 in raw performance. When they showed off the TX1, they showed it off with the same demo the PS4 used in its reveal; this is exactly the type of thing Nvidia cares about. Anyway, it's a very small difference: 1.5TFLOPs versus 1.843TFLOPs is about the same performance gap as the XB1's original 1.228TFLOPs and the XB1S's 1.4TFLOPs.
My confirmation will be from the likes of Digital Foundry, based on decided, confirmed final hardware, not from a data leak that reflected what was planned at some point in the past.
Reason I've posted this more than once is that people keep saying "we know", "we know", "we know".
You don't know anything at this moment in time.

I think there's a good chance it will be T239-based and I'll be very happy if it is (assuming no crippled aspects) but I (and everyone or almost, almost everyone here) don't know.

Let me decode this for you a bit:

Kopite7 tweeted on June 10th, 2021 that Nintendo would use a custom Tegra called T239 for its next hardware.

The Nvidia hack on March 1st, 2022 revealed NVN2, with NVN2 referencing T239 as its target hardware.

A public Linux kernel update on September 5th, 2022 added support for T239.

If Nintendo had moved to another chip, T239 work would have stopped. Nvidia already sells Orin to customers all the way down the stack, from 45 watts to 5 watts, so there is no place for another SoC on the same architecture in the public lineup; T239 is a custom chip for a private customer, and Nvidia is the supplier of Nintendo's next hardware device. We also know from the hack that there is no other Ampere SoC being made, so unless you think Nvidia is making an even more advanced SoC for Nintendo and building T239 for no one... then Switch 2 is absolutely using T239. This is deductive reasoning, sure; this is Occam's razor, sure; this is the logical conclusion, sure... Direct confirmation that Nintendo is using T239 will never come, and even teardowns will almost certainly not clearly identify the chip. What we have is as close to a confirmation as you can get that Switch 2 will indeed be powered by a 1536 CUDA core, 48 tensor core, 12 RT core Nvidia Ampere GPU with 8 A78C cores and a 128-bit memory bus.
 
Just wanted to say I’ve always loved reading your posts. Read them all. I’ve read this entire thread. I know full well what the speculated/expected hardware specs for this thing are. I get it.

I’m not arguing against the specs.

I’m arguing how new Switch hardware will be positioned by Nintendo.

Power differentials aren’t the be all, end all of how a console is positioned and treated.

The Wii U -> Switch power differential was relatively minimal...yet we knew it was being treated as a gen breaking successor and knew it was completely replacing and supplanting the Wii U. Similar to knowing the Series S was completely supplanting the Xbox One X despite not having huge differentials.

Likewise, a large power differential in spec doesn’t necessarily dictate that it must be a gen-breaking successor either.

T239 seems like a fantastic mobile DLSS machine, right? It seems optimized to efficiently and competently run AI DLSS and some light RT on their respective cores at extremely minimal clocks and power draws.

This is amazing.

But I would suppose Nintendo will use it the way it appears to be designed: render Switch games at their OLED TX1+ profiles, use the power of the new SoC to output them at much higher resolutions, give their games far better performance than they can squeeze out now, and use the extra headroom to push visual IQ up to PS4 Pro/One X levels.

Right?

It will be like modern PC development, except Nintendo is only optimizing for two profiles. Starfield, for example, is releasing with minimum specs of a GPU from 2016 and a CPU from 2015. People will also be playing it on an RTX 4090 and a 12-core CPU from 2022. It’s fundamentally the same game.

Why wouldn’t Nintendo approach their game development like this for most of their games over the next 5 years or so?

We all agree this is what will be done with the eventual Metroid Prime 4 release, yes? They are going to take the game they have been developing on the Tx1+ and use the power of t239 to make it look and run a lot better for that machine.

Now, if there is some out of the blue “gimmick” that we do not see coming that drives Nintendo to change the way we play their games and can only be done on the T239…then all bets are off.

I’m just going by what we know about the hardware and what it’s designed to do, and by what Nintendo has said concerning their devices, the future, and the Switch.




Heck, even if it’s treated and positioned exactly like a PS5, and it’s made clear it’s a gen-breaking next-gen successor the way the PS5 clearly was for the PS4/Pro…you will at least agree the “cross-gen” support for the OLED and Lite will last much longer than it did for the PS4 and PS4 Pro…right?

No. I don’t think there’s any evidence to support this as yet.

I’m not saying it won’t happen but I can see Nintendo actually wanting some next gen exclusives even early on in order to push the hardware.
 
Reminder that T239 and other hardware details are not "known" or confirmed, despite many posters stating as much, however accurate the leak was back then.
TL;DR I don’t understand the point of your comment: the only direct confirmation would come from the company selling you the product, but that company never gives it, so you’re waiting for information that will never exist.
My confirmation will be from the likes of Digital Foundry, based on decided, confirmed final hardware, not from a data leak that reflected what was planned at some point in the past.
Reason I've posted this more than once is that people keep saying "we know", "we know", "we know".
You don't know anything at this moment in time.

I think there's a good chance it will be T239-based and I'll be very happy if it is (assuming no crippled aspects) but I (and everyone or almost, almost everyone here) don't know.

I think you're both right.

We know for sure T239 exists, both from the Nvidia leak and Linux commits.
We know there was an in-flux NVN2 implementation for it at some point.
In the absence of anything pointing otherwise, thinking T239 is our chip is a good assumption.
That chip represented a pretty big investment that would otherwise be lost as Nvidia hasn't announced anything that would use it.

But it's still only an assumption.

I think Nintendo is known to work on multiple concurrent hardware.
They very probably brought multiple solutions to the prototype stage.
Only the chip of one of those solutions leaked.
Nvidia could have worked on it by itself eyeing multiple customers, both handheld and laptops, and NVN2 was just part of the pitch to Nintendo.

We'll only know for sure around announcement when Eurogamer or others will publish the exact specs.

I, for one, will keep assuming T239 is our guy.
 
I think you're both right.

We know for sure T239 exists, both from the Nvidia leak and Linux commits.
We know there was an in-flux NVN2 implementation for it at some point.
In the absence of anything pointing otherwise, thinking T239 is our chip is a good assumption.
That chip represented a pretty big investment that would otherwise be lost as Nvidia hasn't announced anything that would use it.

But it's still only an assumption.

I think Nintendo is known to work on multiple concurrent hardware.
They very probably brought multiple solutions to the prototype stage.
Only the chip of one of those solutions leaked.
Nvidia could have worked on it by itself eyeing multiple customers, both handheld and laptops, and NVN2 was just part of the pitch to Nintendo.


We'll only know for sure around announcement when Eurogamer or others will publish the exact specs.

I, for one, will keep assuming T239 is our guy.
In this case, the bolded is wrong. While Nintendo could have worked with other vendors on possible hardware solutions, Nvidia's Tegra and Ampere lines were completely exposed in that hack, and there is no other Ampere or Ada based SoC being worked on. The entire Ampere GPU stack only has 2 SoCs, the GA10B (Orin) and the GA10F (Drake). Orin already fills the entire customer stack from 45+ watts down to 5 watts, so there is no room for a second public Ampere Tegra solution, especially not one coming in 2024, and by 2025 they will have publicly moved on to Thor.

NVN2 only has T239 as its target hardware, and the dates here are also really important... The timeline is part of the evidence: the chip's existence was first known via Kopite7's June 10th, 2021 tweet, the DLSS tests on NVN2 were from summer 2021, and the hack on March 1st, 2022 showed that, as of a couple of weeks prior, T239 was the target hardware for NVN2. Linux kernel support for T239 was publicly added on September 5th, 2022, and further updates show it continued to be worked on for the rest of the year. Dev kits for Switch 2 went out early this year, and by summer all major 3rd parties had Switch 2 devkits. There just isn't time for another chip to have been designed and produced between February 2022 and early this year, for devkits not to have T239 in them, and certainly not enough time for Nintendo to feel comfortable giving all major 3rd parties those devkits as of July this year.

Switch 2 is, beyond a shadow of a doubt, either powered by T239 or not made by Nvidia at all.
 
Nor should anyone else be.
Hell, I'm happy with the current Switch for the most part, and now we get something significantly more powerful on top of that?

I don't care if the ports of some current games run at 1080p/30 as long as it's stable. The portable aspect makes up for all of it for me. The Switch easily became my platform of preference even though I have a gaming PC and a Steam Deck.
 
Dev kits for Switch 2 went out early this year, and by summer all major 3rd parties had Switch 2 devkits. There just isn't time for another chip to have been designed and produced between February 2022 and early this year, for devkits not to have T239 in them, and certainly not enough time for Nintendo to feel comfortable giving all major 3rd parties those devkits as of July this year.
I'm not claiming they could have cancelled T239 last year and invested in a new chip.
They would have invested in concurrent solutions all along since 2019. We just wouldn't know about them.

Devkits are in the wild, but I believe we have yet to see evidence linking those to T239.

Switch 2 is, beyond a shadow of a doubt, either powered by T239 or not made by Nvidia at all.
I agree.

As I said, I'm assuming T239, as most of us, and we are probably right.
But until the definitive specs are out, 5% of doubt remains.
 
Eurogamer interviewed Phil Spencer, and his answer regarding Series S may be relevant to our hardware discussion:
I don't see a world where we drop [Series] S. In terms of parity [...] I think that's more that the community is talking about it. There are features that ship on X today that do not ship on S, even from our own games, like ray-tracing that works on X, it's not on S in certain games. So for an S customer, they spent roughly half what the X customer bought, they understand that it's not going to run the same way.

I want to make sure games are available on both, that's our job as a platform holder and we're committed to that with our partners. [...] Having an entry level price point for console, sub-$300, is a good thing for the industry. I think it's important, the Switch has been able to do that, in terms of kind of the traditional plug-into-my-television consoles. I think it's important. So we're committed.
  • Spencer claimed that there’s no Series S parity requirement, and implied that it’s only a fan theory.
  • MS is committed to supporting Series S because it’s important to have a low entry price point.
The Lite model, despite its lower sales figure, plays a similar strategic role. Until Nintendo is able to release a low-cost NG Lite, it will most likely remain in the product lineup. I understand that some want Nintendo to drop support for the OG to incentivize users to move on, but personally I don’t think that’d be the company’s strategy.

Because a low entry point is crucial (not to mention 129MM install base), Nintendo probably will maintain a steady cross-gen output for the OG models until the Lite NG comes out. As Spencer suggested, certain games may have to cut features on the OG (and run like a potato), but customers understand that they have an old console. This itself is an incentive for the users to upgrade, without outright cutting off the install base.
 
If it can't do that 7+ years later, someone was sleeping at the wheel.
I meant 3x the current Switch GPU's docked performance for Switch 2's handheld GPU performance.

3x the Switch GPU's docked performance in handheld mode would also make Switch 2 roughly a 9x leap over the Switch in handheld mode, considering Switch handheld mode is 157.3 GFLOPS as standard, or 193.5 GFLOPS when special permission is granted by Nintendo (which isn't always given), and even against that figure it would still be a 7.75x leap.

So saying Switch 2's handheld GPU performance will be 3x the Switch GPU's docked mode is a very big expectation, especially considering Nintendo will be battery, size, cooling, and price sensitive with Switch 2 as a product, wanting it sleek and cool at $399, all while having a 3-hour battery life at bare minimum.

We are not seeing the usual 10x leap in power with every passing console generation anymore, like from PS1 to PS2 to PS3. PS4 to PS5 is around a 5x leap in GPU FLOPS, best case. So do we really think Nintendo is going to hit a 9x leap in GPU compute in a handheld form factor, especially when they have DLSS as a fallback once the system is docked? With DLSS available in docked mode, the handheld GPU FLOPS don't need to be as high (and then simply double when docked) as they would if DLSS weren't there.

I'm starting to sound like Scott Steiner's math now :3
 
As I said, I'm assuming T239, as most of us, and we are probably right.
But until the definitive specs are out, 5% of doubt remains.
I'm not sure I understand this point. What are the other possibilities if it isn't T239, or just not Nvidia at all? I think this is playing devil's advocate to an extreme for no reason other than to "cover all bases."
 
We know for a fact that Nintendo marketed the Wii U to developers as a "360+" device; they literally set their target to be better than the 360 in GPU performance. Them doing the same with the PS4 now makes a lot of sense, so targeting 1843GFLOPs or 2TFLOPs fits better with what Nintendo has done in the past than 1.5TFLOPs. On top of that, I think Nvidia just has an ego: they want to release something that beats the PS4 in raw performance. When they showed off the TX1, they showed it off with the same demo the PS4 used in its reveal; this is exactly the type of thing Nvidia cares about.

If I may be so bold, the PS4 is small fry for Nvidia. I can totally see them using a PC title that has been optimised to hell and back for their technology as a showcase piece on the Nintendo hardware, to show how close it gets to a PS5.

That title, IMO, would be Cyberpunk 2077. It's been getting Nvidia tech updates day and date, and I think a Drake version could be a good candidate for a Switch RTX and DLSS showcase, especially with how awful the game was on PS4. The devs seem to have a good relationship with Nintendo as well, and I can see huge sales potential for such a game as a Drake launch title.
 
If I may be so bold, the PS4 is small fry for Nvidia. I can totally see them using a PC title that has been optimised to hell and back for their technology as a showcase piece on the Nintendo hardware, to show how close it gets to a PS5.

That title, IMO, would be Cyberpunk 2077. It's been getting Nvidia tech updates day and date, and I think a Drake version could be a good candidate for a Switch RTX and DLSS showcase, especially with how awful the game was on PS4. The devs seem to have a good relationship with Nintendo as well, and I can see huge sales potential for such a game as a Drake launch title.
CP2077 IS a last gen title, though.
 
Eurogamer interviewed Phil Spencer, and his answer regarding Series S may be relevant to our hardware discussion:

  • Spencer claimed that there’s no Series S parity requirement, and implied that it’s only a fan theory.
  • MS is committed to supporting Series S because it’s important to have a low entry price point.
The Lite model, despite its lower sales figure, plays a similar strategic role. Until Nintendo is able to release a low-cost NG Lite, it will most likely remain in the product lineup. I understand that some want Nintendo to drop support for the OG to incentivize users to move on, but personally I don’t think that’d be the company’s strategy.

Because a low entry point is crucial (not to mention 129MM install base), Nintendo probably will maintain a steady cross-gen output for the OG models until the Lite NG comes out. As Spencer suggested, certain games may have to cut features on the OG (and run like a potato), but customers understand that they have an old console. This itself is an incentive for the users to upgrade, without outright cutting off the install base.
This would've had weight if Xbox hardware wasn't on the decline.

The Series S, though, is the better-selling of the two SKUs, even though for some time some regions only had the Series S option.

As for hardware... it seems like devs are making their frustrations known:



Unlike the Switch and the hybrid market that Nintendo occupies, the Series S is trying to fit into the traditional console market that AAA, tech-pushing devs want to make games for. It's a different space from the one Nintendo occupies or aims for.
 
Please read this staff post before posting.

Furthermore, according to this follow-up post, all off-topic chat will be moderated.