I've been seeing a lot of posts about the cost of DLSS lately, but the discussion is mostly based on what's been shown empirically. I wanted to write a short explainer post that covers what a convolutional autoencoder is and shows how to calculate the number of operations you would have to run.
Part I: operations
The basic idea of a convolutional autoencoder is that you have two steps: convolution and convolution transpose. Both of these steps simply slide a series of filters across an image; the difference is that one of them downsamples and the other one upsamples. The amount of downsampling/upsampling that these filters cause is controlled by the stride size, which is the number of pixels the filter moves at each step as it slides across the image. For example, with a stride size of 1, a convolution filter will move 1 pixel at a time; with a stride size of 2, it will move 2 pixels at a time.
In the convolution step, you progressively compress and downsample the image while increasing the number of learned features in each layer. Essentially, this forces the neural network to select which features are important to preserve for reconstruction.
Convolution with a stride size of 2 downsamples a 1920 x 1080 image to 960 x 540.
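Here's that downsampling arithmetic as a tiny sketch, in case you want to plug in other strides or resolutions. The 3 x 3 filter size and padding of 1 are assumptions on my part, chosen so the 2x downsampling works out exactly:

```python
def conv_output_size(size, kernel=3, stride=2, pad=1):
    """Spatial output size of a strided convolution: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

print(conv_output_size(1920), conv_output_size(1080))                      # 960 540
print(conv_output_size(1920, stride=1), conv_output_size(1080, stride=1))  # 1920 1080
```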
In the convolution transpose step (sometimes erroneously called "deconvolution"), you progressively reconstruct and upsample the output image using all of the learned features.
Convolution transpose with a stride size of 2 upsamples a 960 x 540 image to 1920 x 1080.
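And the matching formula for the upsampling direction (same assumed filter size and padding; the output_padding term is just what's needed to land exactly on 2x the input size):

```python
def conv_transpose_output_size(size, kernel=3, stride=2, pad=1, out_pad=1):
    """Spatial output size of a strided convolution transpose."""
    return (size - 1) * stride - 2 * pad + kernel + out_pad

print(conv_transpose_output_size(960), conv_transpose_output_size(540))  # 1920 1080
```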
The number of filters used in one layer corresponds to the number of learned features/channels in the next layer. We know that the output has exactly 3 channels since it's an RGB image, but we can pick any number of channels that we want for the hidden layers. A good rule of thumb with a stride size of 2 is to double the number of channels in each convolution layer and roughly halve the number of channels in each convolution transpose layer. However, this is not a strict rule (as I'll talk about in a sec).
Part II: a simple network
The simplest kind of neural network you can make has three layers: the input layer, the output layer, and a single "hidden" layer. The simplest convolutional autoencoder has a single convolution step (from the input layer to the hidden layer) and a single convolution transpose step (from the hidden layer to the output layer). Here's a diagram of our simple network:
You can see all the important features in the diagram. For the input, I am assuming 6 channels for simplicity: 3 channels from the current frame projected to the output resolution, and 3 channels from the previous composite frame. (This may not be what is used in practice.) In the convolution step, I use a stride size of 2 and double the number of channels, which gives us a 960 x 540 x 12 tensor. For the convolution transpose step, I use a stride size of 2 and 3 output channels, corresponding to a 1920 x 1080 image with 3 RGB channels.
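If it helps to see the network as code, here's a minimal PyTorch sketch of the same thing. The 6 -> 12 -> 3 channel counts, 3 x 3 filters, and stride of 2 come from the description above; the padding/output_padding values and the ReLU in the middle are my own choices to make the sketch runnable, not something I know about DLSS:

```python
import torch
import torch.nn as nn

class SimpleAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolution: 1920 x 1080 x 6  ->  960 x 540 x 12
        self.encode = nn.Conv2d(6, 12, kernel_size=3, stride=2, padding=1)
        # Convolution transpose: 960 x 540 x 12  ->  1920 x 1080 x 3
        self.decode = nn.ConvTranspose2d(12, 3, kernel_size=3, stride=2,
                                         padding=1, output_padding=1)

    def forward(self, x):
        # A ReLU between the two steps is a typical choice, not a known DLSS detail.
        return self.decode(torch.relu(self.encode(x)))

x = torch.randn(1, 6, 1080, 1920)    # batch, channels, height, width
print(SimpleAutoencoder()(x).shape)  # torch.Size([1, 3, 1080, 1920])
```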
When you run one of these networks on the tensor cores, the convolutions are implicitly cast to matrix multiplications. That's what I'm showing in the bottom half of the diagram. For example, in the convolution step, the 1920 x 1080 x 6 tensor is cast to a (960 * 540) x (3 * 3 * 6) matrix, where the 960 and 540 come from using a stride size of 2, 3 is the filter size, and 6 is the number of input channels. We have 12 filters, each of which is a 3 x 3 x 6 tensor; all of these filters together can be rewritten as a 54 x 12 matrix. The computational cost of this step is the cost of multiplying a 518400 x 54 matrix by a 54 x 12 matrix, which is roughly 336 million operations. The end result is downsampling, with an increase in the number of channels.
In the convolution transpose step, the 960 x 540 x 12 tensor is cast to a (1920 * 1080) x (3 * 3 * 12) matrix. The 1920 and 1080 come from using a stride size of 2 with the convolution transpose operation, 3 is the filter size, and 12 is the number of channels in the hidden layer. We have 3 filters this time (corresponding to the 3 output channels), each of which is a 3 x 3 x 12 tensor. We can rewrite all of these filters as a 108 x 3 matrix. The computational cost is that of multiplying a 2073600 x 108 matrix by a 108 x 3 matrix, which is roughly 672 million operations. The end result is upsampling, with a decrease in the number of channels.
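Here's the arithmetic for both steps as a quick script, in case you want to check it or plug in your own filter sizes and channel counts (I'm counting one multiply-accumulate as one operation, which is how the figures above were computed):

```python
def matmul_macs(m, k, n):
    """Multiply-accumulates needed to multiply an (m x k) matrix by a (k x n) matrix."""
    return m * k * n

conv_macs = matmul_macs(960 * 540, 3 * 3 * 6, 12)      # convolution step
deconv_macs = matmul_macs(1920 * 1080, 3 * 3 * 12, 3)  # convolution transpose step

print(f"{conv_macs:,}")    # 335,923,200  (~336 million)
print(f"{deconv_macs:,}")  # 671,846,400  (~672 million)
```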
Now, just to talk about sparsity for a second: sparsity arises when the learned filters include values that are at or near zero. This is not always guaranteed, but it tends to happen in the hidden layers when networks become very deep, since the different filters should ideally be learning different features. The convolution transpose operation also shows a lot of sparsity when written as a matrix operation (a large fraction of the values in that 2073600 x 108 matrix are zero). I don't know the low-level hardware details of how the tensor cores handle this, but the basic idea behind sparsity acceleration is that you skip these multiplications by zero, since they contribute little or nothing to the output.
Part III: takeaways and speculation
Understanding how these operations work gives us important insight into what we see empirically from DLSS. There are several key takeaways:
Filtering is independent of resolution. Once you cast a 3840 x 2160 x 6 tensor to a matrix, you can multiply it by the exact same 54 x 12 filter matrix that you used for the 1920 x 1080 x 6 tensor. This is one of the main features of convolutional neural networks.
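To make that concrete, here's a small sketch: the same 6-in/12-out, 3 x 3, stride-2 convolution (i.e., the same 54 x 12 filter matrix) applied to a 1080p-sized input and a 4K-sized input. The weights are random; the point is only that the filters don't change with resolution, just the size of the output:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(6, 12, kernel_size=3, stride=2, padding=1)

print(conv(torch.randn(1, 6, 1080, 1920)).shape)  # torch.Size([1, 12, 540, 960])
print(conv(torch.randn(1, 6, 2160, 3840)).shape)  # torch.Size([1, 12, 1080, 1920])
```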
Projection to output resolution and warping with motion vectors is likely not handled with AI. If we warped with motion vectors at input resolution, we would lose accuracy (this is shown empirically in the Facebook neural supersampling paper). One solution is to manually project the samples in the current frame to the output resolution, then warp those samples with the motion vectors. This projected image at the output resolution would be the input passed to the neural network. This effectively means that the cost of DLSS should** (more on this later) increase slightly with input resolution, because there would be more samples to handle before running the neural network.
Another solution would be to include motion vectors as separate channels passed to the neural network directly. This method is wasteful because it significantly increases the amount of data being passed to the neural network, and it's ineffective because, with 3x3 filters, it is very difficult for the network to move a sample more than a few pixels (roughly the 3x3 filter footprint) per downsampling step.
We can increase the quality of the output by adding more hidden layers. For example, we could create a network with 3 hidden layers. This network would downsample twice, i.e. from 1920 x 1080 to 960 x 540 to 480 x 270, and upsample twice to reconstruct the output. In each successive downsampling step, we would learn more features, which could then be used for reconstruction. We are only limited by two things. First, adding more hidden layers increases computational cost. Second, training becomes more difficult with more layers. With modern algorithms, it is possible to train neural networks that are quite deep, so the first issue is the more important one.
We can increase the quality of the output by adding more channels to the hidden layer. The number 12 that I picked for the hidden layer was arbitrary. We could set the number to 18, 24, or 36 and compare the results. Doubling the number of channels in a hidden layer also doubles the computational cost of that layer.
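To put rough numbers on those last two points, here's the same per-layer counting as in Part II applied to a deeper variant and a wider variant. The channel counts are illustrative (just following the doubling rule of thumb), not anything published about DLSS:

```python
def layer_macs(out_h, out_w, in_ch, out_ch, k=3):
    """MACs for one conv / conv transpose layer: output pixels x (k * k * in_ch) x out_ch."""
    return out_h * out_w * (k * k * in_ch) * out_ch

# The single-hidden-layer network above: 6 -> 12 -> 3 channels.
base = layer_macs(540, 960, 6, 12) + layer_macs(1080, 1920, 12, 3)

# A deeper variant that downsamples twice (6 -> 12 -> 24, then back up to 3).
deep = (layer_macs(540, 960, 6, 12) + layer_macs(270, 480, 12, 24)
        + layer_macs(540, 960, 24, 12) + layer_macs(1080, 1920, 12, 3))

# The original network with the hidden layer widened from 12 to 24 channels.
wide = layer_macs(540, 960, 6, 24) + layer_macs(1080, 1920, 24, 3)

print(f"{base:,}")  # 1,007,769,600
print(f"{deep:,}")  # 2,687,385,600
print(f"{wide:,}")  # 2,015,539,200  (doubling the hidden channels doubles the cost)
```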
The different modes of DLSS may use different architectures. I have seen people in this thread say that, for example, Ultra Performance mode is more expensive than Performance mode because it is 9x upscaling instead of 4x upscaling. However, as we established above, filtering is independent of resolution. We could use the exact same architecture regardless of what scaling factor we are using. In theory, the cost of the neural network (after the initial projection to the output resolution and warping with motion vectors) should be identical as long as the architectures are the same.
However, it is possible that DLSS uses different architectures (i.e., an architecture with more hidden layers or more channels in the hidden layer) for different modes. The deeper architecture necessary for Ultra Performance mode may be extraneous for Quality mode. This may explain** why the cost of DLSS is roughly constant; in Quality mode, for example, the slight increase in the cost of having to warp more samples at a higher input resolution is offset by the decrease in cost afforded by using a shallower network.