
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

DLSS 2 already makes heavy use of sparsity and the 3rd gen tensor cores are heavily optimized for sparse operations

I feel confident there are significant optimizations available for DLSS 2.x if necessary, based on how new the algorithm is and how much neural network optimization has improved over the last two years.
 
DLSS 2 already makes heavy use of sparsity and the 3rd gen tensor cores are heavily optimized for sparse operations
I’m aware of the second part, but I haven’t seen anything on DLSS actually using sparsity. In fact, that’s simply something that is speculated, I think? PC has no use for sparsity with DLSS since it’s so resource rich (read: it has more than enough hardware).

But I don’t think sparsity has been leveraged in any form of DLSS.


To expand more: sparsity as a feature is an Ampere-and-later feature, and the Ampere desktop cards are more than enough to power DLSS without it. If we take even the lowest RTX Ampere card and compare it to Turing using some maths, the worst Ampere performs similarly to one of the lower Turing cards but already has roughly equal tensor FLOPS to it, iirc; the difference in its speed-up has to come from how DLSS on Ampere overlaps with the next frame rather than waiting for it like Turing does.


I think Drake would be the first one to actually make use of Sparsity for DLSS.
 
I’m aware of the second part, but I haven’t seen anything on DLSS actually using sparsity. In fact, that’s simply something that is speculated, I think? PC has no use for sparsity with DLSS since it’s so resource rich (read: it has more than enough hardware).

But I don’t think sparsity has been leveraged in any form of DLSS.
You know, I think you are correct. I have a spate of DLSS numbers here to check, and my brain says that DLSS performance scales with sparse TOPS, not TOPS period, but I can't find any documentation that says so. So it's definitely a case of me misremembering. I'll check the DLSS numbers tomorrow to see if there is any confirmation in the numbers themselves.
 
I mean, hypothetically yes, but on the other hand, there are very few bleeding edge developers in the Nintendo ecosystem outside of EPD, Next Level, and Retro.
a lot of possible advancement would probably come from NERD.

for any other studio, it's probably a question of what they can get out of it that leverages the tensor cores well enough. a lot of hypothetical examples probably don't need too much hardware in the first place
 
I would like to make the statement that magic is just technology not yet understood 😀

DLSS has several "hidden" costs, so it's no free lunch. For example, NVIDIA had to gather masses of training data, which was probably costly. More relevant for us, however, is that DF has shown that the use of lower quality textures impacts their relative visual quality after DLSS, and further that some other aspects of the IQ aren't quite up to original native 4K. That said, it is amazing technology and completely makes the trade-offs worthwhile imo.

that's very subjective. people have different tolerances, and, as I said, the wider audience is more accepting of poor IQ through worse upscaling methods


DLSS does reduce power consumption, but the idea is that DLSS is on the costly side for Drake, computationally. Digital Foundry did a video on that some time ago. there are ways around it, like making a new model for Drake that's lower quality but faster, or finding out that DLSS's floor is lower than expected.

and some games have larger boosts depending on how resolution dependent they are. this is why RT games get higher boosts than raster games, because a big part of the cost is the number of rays shot out
The DF video looked at the problem using an estimate of a much smaller chip than what T239 is. The chip they compared against (RTX 2060) has 110 TOPS, which ran DLSS in an estimated 1.8 ms, while the Drake chip could be somewhere in the range of 50-80 TOPS -- e.g. compare here with the Jetson NX 8GB, which has 32 tensor cores at 765 MHz for a total of 70 TOPS. Dividing by 2 and multiplying by 1.5 (1536 vs. 1024 CUDA cores means a 1.5x SM advantage and half the tensor core performance compared with automotive devices), we get 52.5 TOPS, or about half of what the RTX 2060 produces. Note that the 765 MHz clock estimate might be low for Switch 2 docked, since there is a 1.125 GHz profile floating out there; that would give 77 TOPS if it is the actual frequency. Either way, with half the TOPS you would expect DLSS to be achievable in about 4 ms, or 1/4 of the frame time at 60 fps. This leaves 3/4 of the frame to render a 720p image, which sounds like a workable scenario considering the drop in required native IQ. For 30 fps games, you have 7/8 of the frame time left, which is almost 90% of the rendering time, so it's definitely very useful there.
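Here's a quick back-of-envelope version of that math in Python, assuming the 765 MHz and 1.125 GHz clocks above (neither is a confirmed spec) and simple linear scaling of DLSS cost with TOPS:

```python
# Rough DLSS frame-time estimate for Drake, scaled linearly from the
# RTX 2060 figures in the DF video. All Drake numbers are the
# speculative values from this post, not confirmed specs.

RTX2060_TOPS = 110.0
RTX2060_DLSS_MS = 1.8                             # DF's estimated 4K DLSS cost

JETSON_NX_TOPS = 70.0                             # 32 tensor cores @ 765 MHz
DRAKE_TOPS_765 = JETSON_NX_TOPS / 2 * 1.5         # half tensor rate, 1.5x the SMs
DRAKE_TOPS_1125 = DRAKE_TOPS_765 * 1.125 / 0.765  # if the 1.125 GHz profile is real

for label, tops in [("765 MHz", DRAKE_TOPS_765), ("1.125 GHz", DRAKE_TOPS_1125)]:
    dlss_ms = RTX2060_DLSS_MS * RTX2060_TOPS / tops
    left_60 = 1 - dlss_ms / (1000 / 60)
    left_30 = 1 - dlss_ms / (1000 / 30)
    print(f"{label}: {tops:.1f} TOPS, DLSS ~{dlss_ms:.1f} ms, "
          f"{left_60:.0%} of a 60 fps frame left, {left_30:.0%} at 30 fps")
```

That lands on roughly 3.8 ms at 52.5 TOPS and 2.6 ms at 77 TOPS, which is where the 1/4-of-a-frame figure above comes from.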

Another thing the 20 series does not have is overlapped ALU/tensor execution. I don't know if this is possible within the Switch form factor due to the potential extra heat, or if it is even practical anywhere else, but if it is, then the DLSS cost would be overlapped and reduced to a small overhead. But let's stick to the first discussion without going into hypotheticals too much 😉
 
DLSS is Nvidia's flagship technology, and Nintendo is the current market leader on console and Nvidia's only console client. I am confident that Nvidia is going to want to make DLSS on Switch a showcase of their technology on a low-powered device. I expect great results.
 
DLSS is Nvidia's flagship technology, and Nintendo is the current market leader on console and Nvidia's only console client. I am confident that Nvidia is going to want to make DLSS on Switch a showcase of their technology on a low-powered device. I expect great results.
I expect the same level of marketing as how Nintendo put a "Graphics by ATi" label on the GameCube.

I could see a similar thing where the box says "Powered by Nvidia DLSS" and does not elaborate more.
 
Nintendo has published a new patent for neural network training.
 
Nintendo has published a new patent for neural network training.
What does it say, in less technical/legal language?
 
the problem with comparing Drake to PS5/Series X is that it'd be similar to how Switch is to XBO/PS4. games still need downgrades in asset quality and frame rate in addition to resolution.

the reason DLSS isn't magic is because the performance gains come from lowering the resolution. the "magic" is in the upscaling quality. but FSR does a good enough job too, so there's nothing stopping the other systems from lowering their resolution for increased performance. that's already the best practice right now. that said, DLSS works better than FSR at lower resolutions, so in theory, devs could get away with a lower input res
The major advantage of DLSS, especially on a low-power device, is using dedicated hardware to reduce runtime.
Has anyone done a comparison between the ms runtime expected on Drake vs FSR2 on Series S/X/PS5?
 
What does it say, in less technical/legal language?
Only a quick read:
  • The patent claims that the training method uses L1 norm as an objective function. Mathematically, the L1 norm promotes a sparse neural network, so I guess Nintendo wants to train a sparse version of DLSS which can leverage the sparsity feature of Drake (see the sketch below this list).
  • The hardware to which the trained neural net is deployed can switch between battery and wall socket as power source. It can also disable the usage of the neural net in "battery" mode and enable it in "socket" mode. This reads like Nintendo plans to only allow DLSS to run in docked mode.
  • Motion vectors are not mentioned in the patent at all. This makes sense in the context of training to upscale still images, but less so if the input is a video game's visual output. I would be surprised if Nintendo decided to utilize DLSS 1.x, but it is also possible that the patent only covers the training and inference parts of the neural net that relate to the spatial input (images), while omitting the part that relates to the temporal input (motion vectors). Would love to have an expert opinion on this possibility and, if yes, on why Nintendo would do that.
Will update here if I find further interesting claims inside the patent.
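To make the sparsity point a bit more concrete, here is a minimal sketch of what an L1-based training objective could look like. The reconstruction term matches the "L1 norm as objective" wording; the weight-penalty term is my own addition to show the part that actually pushes weights toward zero (the patent may only describe the former):

```python
import torch
import torch.nn as nn

def l1_training_loss(pred, target, model, lam=1e-4):
    # L1 reconstruction error between the upscaled output and ground truth
    recon = torch.mean(torch.abs(pred - target))
    # L1 penalty on the weights themselves -- this is what pushes many
    # weights toward exactly zero, i.e. a sparser network
    weight_l1 = sum(p.abs().sum() for p in model.parameters())
    return recon + lam * weight_l1

# toy usage with a stand-in "upscaler"
net = nn.Linear(16, 16)
x, target = torch.rand(4, 16), torch.rand(4, 16)
loss = l1_training_loss(net(x), target, net)
```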
 
What does it say, in less technical/legal language?
As far as I can tell, there are some new examples. I don't think the previous patents mentioned fast Fourier transforms (though I could be wrong). It's probably an update to better differentiate their process. I'll have to wait until I'm on my desktop to really take a look at it.
 
How does DLSS interact with HDR, if at all?
It should work fine. I remember seeing a slide with HDR and TAA where DLSS was shown taking the place of TAA, and HDR wasn’t affected negatively by this.


I would like to make the statement that magic is just technology not yet understood 😀

DLSS has several "hidden" costs, so it's no free lunch. For example, NVIDIA had to gather masses of training data, which was probably costly. More relevant for us, however, is that DF has shown that the use of lower quality textures impacts their relative visual quality after DLSS, and further that some other aspects of the IQ aren't quite up to original native 4K. That said, it is amazing technology and completely makes the trade-offs worthwhile imo.


The DF video looked at the problem using an estimate of a much smaller chip than what T239 is. The chip they compared against (RTX 2060) has 110 TOPS, which ran DLSS in an estimated 1.8 ms, while the Drake chip could be somewhere in the range of 50-80 TOPS -- e.g. compare here with the Jetson NX 8GB, which has 32 tensor cores at 765 MHz for a total of 70 TOPS. Dividing by 2 and multiplying by 1.5 (1536 vs. 1024 CUDA cores means a 1.5x SM advantage and half the tensor cores per SM compared with automotive devices), we get 52.5 TOPS, or about half of what the RTX 2060 produces. Note that the 765 MHz clock estimate might be low for Switch 2 docked, since there is a 1.125 GHz profile floating out there; that would give 77 TOPS if it is the actual frequency. Either way, with half the TOPS you would expect DLSS to be achievable in about 4 ms, or 1/4 of the frame time at 60 fps. This leaves 3/4 of the frame to render a 720p image, which sounds like a workable scenario considering the drop in required native IQ. For 30 fps games, you have 7/8 of the frame time left, which is almost 90% of the rendering time, so it's definitely very useful there.

Another thing the 20 series does not have is overlapped ALU/tensor execution. I don't know if this is possible within the Switch form factor due to the potential extra heat, or if it is even practical anywhere else, but if it is, then the DLSS cost would be overlapped and reduced to a small overhead. But let's stick to the first discussion without going into hypotheticals too much 😉
Wait so this is including the Drake tensor cores being half as potent vs ORIN?

Also, now that I think about it, this makes it prime for sparsity to be leveraged on a device like this, if it really can hit that high.

Would mean that, in theory, the previous statement that @Kazuyamishima made is more or less accurate to the conversation at hand. Or really that it has a little more truth to it than I thought… hmm.
 
I expect the same level of marketing as how Nintendo put a "Graphics by ATi" label on the GameCube.

I could see a similar thing where the box says "Powered by Nvidia DLSS" and does not elaborate more.

Possibly yes. Though I think that Nintendo will try to leverage "AI" rather than DLSS in their marketing materials. Even my mom knows what AI is. If Nintendo can find creative ways to use the tensor cores beyond visual improvements, I could even see that as being treated as a new gimmick for the console.
 
Nintendo has published a new patent for neural network training.
This is @Anatole’s expertise
 
Possibly yes. Though I think that Nintendo will try to leverage "AI" rather than DLSS in their marketing materials. Even my mom knows what AI is. If Nintendo can find creative ways to use the tensor cores beyond visual improvements, I could even see that as being treated as a new gimmick for the console.
The only way I could see Nintendo leveraging AI as something marketable is if it had a Siri/Alexa-like assistant that is used for automation.

Outside of that particular use case, I don't think AI as "smarter enemies" is as out-of-the-box marketable, seeing how other games also have "smarter enemies" thanks to clever scripting. It's not going to be a big selling point.
 
I would like to make the statement that magic is just technology not yet understood 😀

DLSS has several "hidden" costs, so it's no free lunch. For example, NVIDIA had to gather masses of training data, which was probably costly. More relevant for us, however, is that DF has shown that the use of lower quality textures impacts their relative visual quality after DLSS, and further that some other aspects of the IQ aren't quite up to original native 4K. That said, it is amazing technology and completely makes the trade-offs worthwhile imo.


The DF video looked at the problem using an estimate of a much smaller chip than what T239 is. The chip they compared against (RTX 2060) has 110 TOPS, which ran DLSS in an estimated 1.8 ms, while the Drake chip could be somewhere in the range of 50-80 TOPS -- e.g. compare here with the Jetson NX 8GB, which has 32 tensor cores at 765 MHz for a total of 70 TOPS. Dividing by 2 and multiplying by 1.5 (1536 vs. 1024 CUDA cores means a 1.5x SM advantage and half the tensor cores per SM compared with automotive devices), we get 52.5 TOPS, or about half of what the RTX 2060 produces. Note that the 765 MHz clock estimate might be low for Switch 2 docked, since there is a 1.125 GHz profile floating out there; that would give 77 TOPS if it is the actual frequency. Either way, with half the TOPS you would expect DLSS to be achievable in about 4 ms, or 1/4 of the frame time at 60 fps. This leaves 3/4 of the frame to render a 720p image, which sounds like a workable scenario considering the drop in required native IQ. For 30 fps games, you have 7/8 of the frame time left, which is almost 90% of the rendering time, so it's definitely very useful there.

Another thing the 20 series does not have is overlapped ALU/tensor execution. I don't know if this is possible within the Switch form factor due to the potential extra heat, or if it is even practical anywhere else, but if it is, then the DLSS cost would be overlapped and reduced to a small overhead. But let's stick to the first discussion without going into hypotheticals too much 😉
4 ms is a lot of time. All the more reason to come up with a custom model if need be.
 
Wait so this is including the Drake tensor cores being half as potent vs ORIN?

Also, now that I think about it, this makes it prime for sparsity to be leveraged on a device like this, if it really can hit that high.

Would mean that, in theory, the previous statement that @Kazuyamishima made is more or less accurate to the conversation at hand. Or really that it has a little more truth to it than I thought… hmm.
Yeah, it includes that factor in the consideration. Sparsity is a good question, I don't know how that might affect the performance, nor do I know which devices do and don't use that for DLSS evaluation.
4 ms is a lot of time. All the more reason to come up with a custom model if need be.
Well, fair enough. It depends on how tight your rendering pipeline would be at 720p (plus the additional post-processing at 4K that needs to happen). Perhaps it is cheaper/better to render with the full frame budget at 1080p in certain cases, considering that the native res increase is offset both by the increased frame budget and by no longer needing to render post-processing at 4K. If the 1.125 GHz profile is true, then the frame time taken up by DLSS would be about 2.6 ms, which leaves 14 ms out of the 16.6 ms frame budget at 60 fps. That could maybe change the consideration as well.

A custom model is an interesting idea regardless. My worry there would be that it might semi-freeze the DLSS feature in time and make it take longer for NVIDIA's innovations in the core algorithm to be reflected on the Switch. But if it is a pressing requirement to be able to reduce the DLSS runtime to, for example, below 1 ms so as to make it almost invisible, then we shouldn't disregard it, of course.

Edit: Considering that the network used for DLSS is a convolutional autoencoder, there are ample opportunities to reduce the model cost, mostly by increasing the stride size or by increasing the pooling kernel size, I think. Decreasing the kernel sizes might harm the feature extraction process too much, so the first two seem like safer options. Regardless, NVIDIA would need to retrain the entire network and see how well the results stack up: I imagine they have already done many hyperparameter optimisation studies to tell how large each component of the network should be. But if they change their view on the desired quality, they might find some good compromises.
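To illustrate why stride is such an effective lever (toy numbers, nothing to do with the actual DLSS architecture): the cost of a convolution layer scales with its output resolution, so doubling the stride roughly quarters the work of that layer and everything downstream of it.

```python
def conv_macs(h, w, c_in, c_out, k=3, stride=1):
    # Approximate multiply-accumulates for one conv layer (toy model,
    # ignores padding details): output pixels x filter size
    out_h, out_w = h // stride, w // stride
    return out_h * out_w * c_out * c_in * k * k

# Hypothetical 1080p feature map going from 32 to 64 channels
base = conv_macs(1080, 1920, 32, 64, stride=1)
strided = conv_macs(1080, 1920, 32, 64, stride=2)
print(f"stride 1: {base / 1e9:.1f} GMACs, stride 2: {strided / 1e9:.1f} GMACs "
      f"({base / strided:.0f}x cheaper)")
```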
 
Yeah, it includes that factor in the consideration. Sparsity is a good question, I don't know how that might affect the performance, nor do I know which devices do and don't use that for DLSS evaluation.
Currently, from what I can tell, no devices use sparsity for DLSS, but I could be wrong on that part and the desktop does use it. Though the numbers for DLSS time don’t seem to show that it does, so I’m leaning towards no.

@ILikeFeet actually has access to what I think was UE data that showed the times for DLSS compared across different cards. I just remember that the 4090 was like 0.49 ms or something, the 3080 was 1.something ms, and the 2060S required 2.something.


Anyway, here’s what the Ampere white paper has to say about the sparsity feature:

Fine-Grained Structured Sparsity
With the A100 GPU, NVIDIA introduced Fine-Grained Structured Sparsity, a novel approach which doubles compute throughput for deep neural networks. This feature is also supported on GA10x GPUs and helps accelerate certain AI-based graphics inference workloads.
An in-depth description of the implementation of Fine-Grained Structured Sparsity in the Ampere GPU architecture and a primer on Sparsity is available in the NVIDIA A100 Tensor Core GPU whitepaper.
Sparsity is possible in deep learning because the importance of individual weights evolves during the learning process, and by the end of network training, only a subset of weights have acquired a meaningful purpose in determining the learned output. The remaining weights are no longer needed.
Fine grained structured sparsity imposes a constraint on the allowed sparsity pattern, making it more efficient for hardware to do the necessary alignment of input operands. NVIDIA engineers have found that because deep learning networks are able to adapt weights during the training process based on training feedback, in general the structure constraint does not impact the accuracy of the trained network for inferencing. This enables inferencing acceleration with sparsity.

Third-Generation Tensor Cores in GA10x GPUs

Fine-Grained Structured Sparsity prunes trained weights with a 2-out-of-4 non-zero pattern, followed by a simple and universal recipe for fine-tuning the non-zero weights. The weights are compressed for a 2x reduction in data footprint and bandwidth, and the Sparse Tensor Core operations double math throughput by skipping the zeros.

NVIDIA has developed a simple and universal recipe for sparsifying deep neural networks for inference using a 2:4 structured sparsity pattern. The network is first trained using dense weights, then fine-grained structured pruning is applied, and finally the remaining non-zero weights are fine-tuned with additional training steps. This method results in virtually no loss in inferencing accuracy based on evaluation across dozens of networks spanning vision, object detection, segmentation, natural language modeling, and translation.
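To make that 2-out-of-4 pattern concrete, here's a toy version of the pruning step described above (just an illustration, not NVIDIA's actual tooling):

```python
import numpy as np

def prune_2_of_4(weights):
    # Keep the 2 largest-magnitude weights in every group of 4 and zero
    # the rest -- the 2:4 structured sparsity pattern from the whitepaper
    w = weights.reshape(-1, 4).copy()
    drop = np.argsort(np.abs(w), axis=1)[:, :2]   # 2 smallest per group
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

dense = np.random.randn(2, 8).astype(np.float32)
print(prune_2_of_4(dense))   # every run of 4 values now has exactly 2 zeros
```

In the real recipe the remaining non-zero weights then get fine-tuned with additional training, as the quoted passage says.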

Also, I just want to say that the part about object detection is something the Optical Flow Accelerator is supposed to be good for…

hehe, maybe they’ll use Sparsity with the OFA on Drake for some DLSS3!

this is a joke, mostly. But the OFA is great at object detection and what it’s meant for, so maybe a later version of DLSS3 would include this?
 
Nintendo has published a new patent for neural network training.

This is not new if I'm not mistaken, and was possibly already brought up a couple of times in here as well.

It was filed in 2021 and published back on January 19th.

Edit: Yeah it was posted here back in January.
 
This is not new if I'm not mistaken, and was possibly already brought up a couple of times in here as well.

It was filed in 2021 and published back on January 19th.

Edit: Yeah it was posted here back in January.
I believe some parts are new though

The hardware to which the trained neural net is deployed can switch between battery and wall socket as power source. It can also disable the usage of the neural net in "battery" mode and enable it in "socket" mode. This reads like Nintendo plans to only allow DLSS to run in docked mode.
Like this, assuming this does happen, hopefully they’d allow something like a DLAA in portable mode…?
 
Uhh, late 2023 is possible because Nintendo has nothing whatsoever planned for Fall other than DLC, but 2024 is definitely more likely as Nintendo gave investors no indication whatsoever that a new console was coming soon at the latest financial briefing.

It's still early for Nintendo to announce their fall-winter lineup; this is the time when they focus on releases through September. It was very similar in previous years too.
 
I believe some parts are new though


Like this, assuming this does happen, hopefully they’d allow something like a DLAA in portable mode…?
Tbf the patent reads like Nintendo has the option to disable the neural net in portable mode, but it's not obligatory. So I would not draw conclusions about what they will do from this, but rather just speculate about what they can do with a bespoke version of DLSS for Drake.

Speaking of bespoke design for neural nets, one of my favorite references is this project and the accompanying technical paper. Apparently, the author completely avoided VRAM traffic during evaluation by carefully choosing the width of the neural net layers. Each layer is distributed into thread blocks, and each resulting block completely fits into the target GPU's cache. The only VRAM traffic is for initializing the neural net and grabbing the output. In the context of DLSS on Drake, this could be very beneficial to overall performance, since mobile SoCs are in general bandwidth constrained versus desktop components.
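A quick back-of-envelope illustration of why the layer width matters so much for that approach. The on-chip budget below is a made-up placeholder, not a known Drake (or any GPU's) figure:

```python
# fp16 weight footprint of a single width x width layer vs a hypothetical
# per-block on-chip budget. Purely illustrative numbers.
BYTES_PER_WEIGHT = 2          # fp16
ONCHIP_BUDGET_KIB = 128       # placeholder, not a real spec

for width in (64, 128, 256, 512):
    weights_kib = width * width * BYTES_PER_WEIGHT / 1024
    verdict = "fits on-chip" if weights_kib <= ONCHIP_BUDGET_KIB else "spills to VRAM"
    print(f"width {width:4d}: {weights_kib:7.1f} KiB of weights -> {verdict}")
```

Once the weights spill, every layer evaluation has to stream them from VRAM again, which is exactly the bandwidth cost the project avoids.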
 
It's still early for Nintendo to announce their fall-winter lineup; this is the time when they focus on releases through September. It was very similar in previous years too.
Not really tho. Almost all the Holiday titles of the past years would have already been announced at this point:

2017: SMO & XC2 (Jan 17)
2018: Super Smash Bros. Ultimate (Mar 18)
2019: Luigi's Mansion 3 (Sep 18), Pokémon S/S (Feb 19)
2020*:
2021: Pokémon BD/SP (Feb 21)
2022: Bayonetta 3 (Sep 21), Pokémon S/V (Feb 22)

The only time we knew as little about the Holiday line-up as we do today was 2020, whose marketing got disrupted by Covid.
 
Not really tho. Almost all the Holiday titles of the past years would have already been announced at this point:

2017: SMO & XC2 (Jan 17)
2018: Super Smash Bros. Ultimate (Mar 18)
2019: Luigi's Mansion 3 (Sep 18), Pokémon S/S (Feb 19)
2020*:
2021: Pokémon BD/SP (Feb 21)
2022: Bayonetta 3 (Sep 21), Pokémon S/V (Feb 22)

The only time we knew as little about the Holiday line-up as we do today was 2020, whose marketing got disrupted by Covid.
We didn't know about Metroid Dread and Mario Party Superstars in 2021 until E3.
 
Just to be clear: what do modern smartphones do with their neural engines or whatever they call them?
I've always thought it was just analyzing usage data to optimize when to connect to the internet to retrieve messages and updates, but there must be more to it. Is there some dynamic management of the CPU frequencies to prevent throttling?

I understand this type of processing unit can be useful to parse language (which is why Google advertised its Pixel NPU boasting particular features for system wide subtitles and Google Assistant's speed) but I still don't understand what the benefits for videogames, apart from raytracing, are. I've read that it can potentially be used to simulate fluid dynamics (which sounds like a waste, unless it's an important mechanic in your game) and compute precise collisions, but nothing sounds like an impressive selling point.
 
Just to be clear: what do modern smartphones do with their neural engines or whatever they call them?
I've always thought it was just analyzing usage data to optimize when to connect to the internet to retrieve messages and updates, but there must be more to it. Is there some dynamic management of the CPU frequencies to prevent throttling?

I understand this type of processing unit can be useful to parse language (which is why Google advertised its Pixel NPU boasting particular features for system wide subtitles and Google Assistant's speed) but I still don't understand what the benefits for videogames, apart from raytracing, are. I've read that it can potentially be used to simulate fluid dynamics (which sounds like a waste, unless it's an important mechanic in your game) and compute precise collisions, but nothing sounds like an impressive selling point.
Apple uses it for FaceID.
 
Just to be clear: what do modern smartphones do with their neural engines or whatever they call them?
I've always thought it was just analyzing usage data to optimize when to connect to the internet to retrieve messages and updates, but there must be more to it. Is there some dynamic management of the CPU frequencies to prevent throttling?

I understand this type of processing unit can be useful to parse language (which is why Google advertised its Pixel NPU boasting particular features for system wide subtitles and Google Assistant's speed) but I still don't understand what the benefits for videogames, apart from raytracing, are. I've read that it can potentially be used to simulate fluid dynamics (which sounds like a waste, unless it's an important mechanic in your game) and compute precise collisions, but nothing sounds like an impressive selling point.
largely assisting with imaging stuff. these neural engines are still matrix multiplication hardware and that's probably used in specific instances
 
* Hidden text: cannot be quoted. *
I’m intrigued at how AI could be a broad selling point, primarily because, like others, I don’t see an immediate benefit for a game.

Perhaps on-device speech interpretation or gyro interpretation could make for better control options.

If it’s anything like CoreML, developers will have some flexibility in leveraging the tools.
largely assisting with imaging stuff. these neural engines are still matrix multiplication hardware and that's probably used in specific instances
Maybe AR makes a return.
 
Yeah I'll need specific examples of anything AI related before I get excited or interested in it.

And even if they don't know the "technical and hardware aspects", do they know an approximate completion date target for whatever project they are working on?
 
* Hidden text: cannot be quoted. *

 
Doug Bowser was asked about new hardware by the Washington Post:

Q: When is the next Switch coming out? What kinds of features or new capabilities would you like to see?
A: As we enter the seventh year for the Nintendo Switch, sales are still strong. I think we still have a very very strong lineup coming. As Mr. Furukawa (Nintendo President Shuntaro Furukawa) said recently, we’re entering uncharted territory with the platform. It’s exciting to see that demand is still there. So nothing to announce on any future console or device, but we are still feeling very bullish about Nintendo Switch. I should be careful about what I personally would like to see (in a new Switch). But what I can share is that one of the reasons that even going into year seven we feel very confident that the Switch can have a strong performance over the next few years is that it is still truly that unique device that you can play in a variety of ways, at home, on the go. One of the things we look at always is how can we surprise and delight. How can we introduce new unique ways of playing. That’s always in front of our mind.

 
It's totally a voice assistant like Siri, but based on the Koopalings' Wendy.

"Hey Wendy, launch Tears of the Kingdom."

"Got it, downloading Everybody 1-2 Switch! But don't ask me for anything again."
 
* Hidden text: cannot be quoted. *



Just to be clear: what do modern smartphones do with their neural engines or whatever they call them?
I've always thought it was just analyzing usage data to optimize when to connect to the internet to retrieve messages and updates, but there must be more to it. Is there some dynamic management of the CPU frequencies to prevent throttling?

I understand this type of processing unit can be useful to parse language (which is why Google advertised its Pixel NPU boasting particular features for system wide subtitles and Google Assistant's speed) but I still don't understand what the benefits for videogames, apart from raytracing, are. I've read that it can potentially be used to simulate fluid dynamics (which sounds like a waste, unless it's an important mechanic in your game) and compute precise collisions, but nothing sounds like an impressive selling point.

At the moment they're mostly used for image processing, I believe. Things like face recognition, background blurring, low-light denoising, etc. Camera quality is a major selling point for higher-end phones, and without the ability to fit physically bigger cameras in there, it's increasingly a competition over what's known as "computational photography", which includes machine learning components.
 
Maybe if Nintendo decides to add native voice chat with the extra RAM, it could be something akin to Nvidia Broadcast’s AI to reduce background noise?
that's one thing they can do. I hypothesized that they could also use it to police hate speech and whatnot

AI being the gimmick is very smart if true and I hope Nintendo is doing some cool stuff besides dlss that will help set this system apart.
I just can't think of many marketable gimmicks for AI right now. DLSS is per-game and relies on people caring about IQ. the voice chat stuff could be handled server-side. stuff like physics is also situational
 
This is @Anatole’s expertise
This one isn’t especially interesting on the neural network side, imo. Everything described is fairly standard practice. The one thing that’s somewhat different from what I’ve seen most often is that they calculate the loss function in frequency space after taking the FFT, but that’s a lot less exotic than the block transform neural network in the other patent, which was completely foreign to me.
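For anyone curious what "calculating the loss in frequency space" looks like in practice, here is a minimal sketch of the general idea (my own toy version, not the patent's exact formulation):

```python
import torch

def frequency_space_l1(pred, target):
    # Compare two image batches by the L1 difference of their Fourier
    # spectra instead of their raw pixels
    pred_f = torch.fft.rfft2(pred)      # 2D FFT over the last two dims
    target_f = torch.fft.rfft2(target)
    return torch.mean(torch.abs(pred_f - target_f))

# e.g. a batch of 8 single-channel 64x64 images
loss = frequency_space_l1(torch.rand(8, 1, 64, 64), torch.rand(8, 1, 64, 64))
```

A frequency-space loss tends to weight fine detail differently than a plain per-pixel loss, which is presumably the point of doing it this way.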
 
I think it is good to note that neural networks such as the one underlying the DLSS software are not AI specifically, but instead they are better described more broadly as mathematical function approximators. What any neural network does is compute a mathematical function f(x) = y, where x is a set of inputs and y is the corresponding output. This is the same concept as for example the simple parabolic function f(x) = x^2 that you all probably know. Neural networks define a similar mapping from input to output, only the input can be something as complex as a full image and the mapping from this image to the desired output is very complex. The neural network learns this mapping via training on example data. A high-level definition of DLSS, for example, could be:

f (<low-res image>, <motion-vectors> ) = <high-res image>.

During the training process, the neural network is tuned to make this mapping fit as closely as possible with the example pairs of low-res and high-res images, until it can produce high-res images from low-res ones by itself. (Note: the real DLSS algorithm also uses the previous frame's image as input.)
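As a toy illustration of that idea (my own example, nothing to do with DLSS itself), here is a tiny network learning the parabola mentioned above purely from example (x, y) pairs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny network learning f(x) = x^2 from examples
model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

x = torch.linspace(-2, 2, 256).unsqueeze(1)   # training inputs
y = x ** 2                                     # the mapping we want to learn

for step in range(2000):
    opt.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    opt.step()

print(model(torch.tensor([[1.5]])))   # should be close to 2.25
```

DLSS is conceptually the same thing, just with images and motion vectors as the input and a far larger, far more carefully designed network.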

When we use neural networks for what we commonly recognise as AI (self-driving cars, distinguishing between cats and dogs, etc.), we are 'simply' learning a mathematical formula that maps images to a value. The underlying assumption is that our brains do not do anything much more sophisticated than such a (very complex and efficient!) mathematical mapping. Because of this, forms of AI emerge from the computational capacity of these neural networks.

Now, when someone wants to apply 'AI' to a rendering problem, they are usually just referring to training a neural network to find an easily evaluable mapping from the input parameters to the result of that effect, for example

f (<state-of-water-at-time-t>, <forces> ) = <state-of-water-at-time-(t+1)>.

At any time, you would need to wonder whether training and employing a neural network for a given effect is worthwhile. Some scenarios where it could be worth it are when you want to compute complex fluid dynamics (the real equations are notoriously difficult to evaluate efficiently, so a functional approximation could work wonders) or when you want to simulate realistic movement of light objects through the air (e.g. as part of a game mechanic). Neural networks can help when the mathematical underpinnings are either unknown (text generation) or too expensive to evaluate with high accuracy (fluid dynamics).

Other applications could be small versions of what ChatGPT does, which is a language model that has learned to map from any user input to a well-formulated response. This could be used in dynamic conversation generation, for example.

Anyway, if you take anything away from this post, I hope it is that AI is not much more than a mathematical formula, and that the algorithms behind what we view as AI can be used for anything that we can approximate in a mathematical way. Therefore, many rendering techniques are potential candidates for substitution, as long as the training and execution costs of the neural network do not exceed its added benefit.

Thank you for attending my mostly unprompted lecture about neural networks!
 

