
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

They could either make a custom chip with 32-bit support, build in a 32-bit-capable chip solely for BC, or create a translation layer.
Assuming Nintendo wants to support BC that far back, I would imagine a translation layer would be enough. They'd already have it for the GPU side of things, so why not the CPU? This is Switch 3 trying to run Switch games, so there would likely be plenty of power to do so.
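Purely to illustrate what "translation layer" means here: something that rewrites AArch32 instructions into AArch64 equivalents ahead of time (or just-in-time), so the CPU never needs 32-bit hardware support. This is a toy sketch with made-up mappings, not a real instruction encoder:

```python
# Toy illustration of static binary translation: AArch32 -> AArch64.
# The instruction mappings are hand-picked examples, not a real translator.
AARCH32_TO_AARCH64 = {
    "ADD r0, r0, r1": "ADD W0, W0, W1",   # 32-bit ALU ops map onto the W (32-bit) registers
    "LDR r2, [r3]":   "LDR W2, [X3]",     # 32-bit load, 64-bit address base register
}

def translate_block(aarch32_block):
    # Anything the translator doesn't recognize would fall back to an interpreter/JIT path.
    return [AARCH32_TO_AARCH64.get(insn, f"; fallback path: {insn}") for insn in aarch32_block]

print(translate_block(["ADD r0, r0, r1", "LDR r2, [r3]", "BLX r4"]))
```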
 
This is fantastic, thank you! Nvidia seems to be pushing the FP4 support specifically for generative LLMs, so perhaps there is some super narrow use case there? Regardless, they're only talking about it in relation to Blackwell, and right now all their public Blackwell info is about the datacenter chips.

It'll be interesting to see if Nvidia continues to segment out some of these features for enterprise customers, where the margins are still high, or lets them flow down into consumer products.
Happy to oblige! :)

Low precision is definitely valid for some applications; I know someone posted that 1.58-bit LLM paper in here at some point. I’m just skeptical of FP4 specifically because, on the surface level, it doesn’t seem like it offers any range advantage over INT4, which is the primary reason to use floating points. It will come down to how effective the scaling is; they call it “microtensor” scaling in the brief they’ve released. If I had to guess what that is, I suppose they are probably breaking up larger tensors into a bunch of smaller ones and individually scaling each of those tensors to fit in the FP4 dynamic range. But as far as I know, there’s no further public information yet.

Until Nvidia actually proves with independently verifiable data that the quality of standard architectures trained on or operating in FP4 exceeds INT4 or is comparable to FP8/FP16, I’m treating it all as marketing speak. And either way, I don’t think low precision will work well for DLSS, unfortunately.

EDIT: I actually found what I believe is the specification, with a lot of info! I’ll write it up sometime this week. Short version: FP4 does have some major caveats and often has significantly reduced quality. “Microtensor scaling,” it turns out, indeed involves taking one of the dimensions of your tensor (for example, the column of a matrix) and normalizing all the elements along that axis to the maximum value, instead of normalizing all the elements in the tensor to the global maximum. Anyway, more to come!
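If it helps make that concrete, here's a rough NumPy sketch of the per-axis scaling idea as I understand it. This is my own illustration, not Nvidia's implementation: the E2M1 value grid and the outlier-column setup are assumptions on my part.

```python
import numpy as np

# Assumed FP4 (E2M1) representable values -- illustrative only.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_GRID[::-1], FP4_GRID])  # include negatives

def quantize_fp4(x, scale):
    """Snap each element of x/scale to the nearest representable FP4 value, then rescale."""
    y = x / scale
    idx = np.abs(y[..., None] - FP4_GRID).argmin(axis=-1)
    return FP4_GRID[idx] * scale

def global_scaling(W):
    # One scale for the whole tensor: a single outlier wrecks the dynamic range for everything.
    scale = np.abs(W).max() / FP4_GRID.max()
    return quantize_fp4(W, scale)

def per_column_scaling(W):
    # "Microtensor"-style: one scale per column, so each column uses the full FP4 range independently.
    scale = np.abs(W).max(axis=0, keepdims=True) / FP4_GRID.max()
    return quantize_fp4(W, scale)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W[:, 3] *= 50  # one outlier column

for name, Wq in [("global", global_scaling(W)), ("per-column", per_column_scaling(W))]:
    print(f"{name:10s} mean abs error: {np.abs(W - Wq).mean():.4f}")
```

Running that, the per-column version has a much lower error than the single global scale, which is presumably the whole point of the scheme.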
 
Although I don't really see LPDDR5X-10700 being adopted by Nintendo any time soon, especially since I doubt JEDEC is going to formally approve LPDDR5X-10700, I still find this fascinating.
Micron got to 9.6 Gbps at least, but that would be too late for Switch 2.

Anyway, Samsung's version will be mass-produced in the latter half of 2024 and is said to give 25% power savings. Perhaps it's possible for a Switch 2 revision, especially if launch Switches have LPDDR5(X) from Samsung 🤔
 
Although I don't really see LPDDR5X-10700 being adopted by Nintendo any time soon, especially since I doubt JEDEC is going to formally approve LPDDR5X-10700, I still find this fascinating.
That's kind of nuts, almost twice the memory speed the Steam Deck has.
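For reference, the back-of-the-envelope math behind that comparison, assuming a 128-bit bus on both (the Steam Deck's LPDDR5-5500 config is known; the Switch 2 bus width is an assumption here):

```python
def bandwidth_gb_s(mt_per_s, bus_width_bits):
    # effective transfers per second * bits per transfer, converted to GB/s
    return mt_per_s * 1e6 * bus_width_bits / 8 / 1e9

print(f"LPDDR5-5500   @ 128-bit: {bandwidth_gb_s(5500, 128):.0f} GB/s")   # ~88 GB/s (Steam Deck)
print(f"LPDDR5X-10700 @ 128-bit: {bandwidth_gb_s(10700, 128):.0f} GB/s")  # ~171 GB/s
```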
 
Micron got to 9.6 Gbps at least, but that would be too late for Switch 2.

Anyway, Samsung's version will be mass-produced in the latter half of 2024 and is said to give 25% power savings. Perhaps it's possible for a Switch 2 revision, especially if launch Switches have LPDDR5(X) from Samsung 🤔

Or the delay was because of this and we get that nice LPDDR5X memory from Samsung.
And yeah, that is insane hopium and very likely not the case ;)
 
Happy to oblige! :)

Low precision is definitely valid for some applications; I know someone posted that 1.58-bit LLM paper in here at some point. I’m just skeptical of FP4 specifically because, on the surface level, it doesn’t seem like it offers any range advantage over INT4, which is the primary reason to use floating points. It will come down to how effective the scaling is; they call it “microtensor” scaling in the brief they’ve released. If I had to guess what that is, I suppose they are probably breaking up larger tensors into a bunch of smaller ones and individually scaling each of those tensors to fit in the FP4 dynamic range. But as far as I know, there’s no further public information yet.

Until Nvidia actually proves with independently verifiable data that the quality of standard architectures trained on or operating in FP4 exceeds INT4 or is comparable to FP8/FP16, I’m treating it all as marketing speak. And either way, I don’t think low precision will work well for DLSS, unfortunately.

EDIT: I actually found what I believe is the specification, with a lot of info! I’ll write it up sometime this week. Short version: FP4 does have some major caveats and often has significantly reduced quality. “Microtensor scaling,” it turns out, indeed involves taking one of the dimensions of your tensor (for example, the column of a matrix) and normalizing all the elements along that axis to the maximum value, instead of normalizing all the elements in the tensor to the global maximum. Anyway, more to come!
The main thing is that we are still learning how neural networks (especially the large ones) really work. I have always been in the camp that continuity (a.k.a. high-precision floats) was an artifact of the methods we were using to train neural networks, and that there was no reason why purely binary neurons would not work. This has been a bit of a fringe position for a long time; after all, the term "differentiable programming" has been used for deep learning. But it seems we will be proven right in the end. I mean, 4-bit is already much closer to discrete maths than continuous maths.

However, all of this is from a mathematical and theoretical CS point of view. Whether 1, 1.58, 2, 4, 8, ... bits would be more efficient from a hardware POV is outside my expertise.
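For anyone curious what the low-bit end of that looks like in practice, here's a toy sketch of absmean ternary quantization in the spirit of the 1.58-bit LLM paper mentioned earlier, written from memory (the paper's exact details may differ):

```python
import numpy as np

def ternary_quantize(W, eps=1e-8):
    # Scale by the mean absolute weight, then round and clip to {-1, 0, +1}.
    gamma = np.abs(W).mean()
    Wq = np.clip(np.round(W / (gamma + eps)), -1, 1)
    return Wq, gamma

rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(4, 8))
Wq, gamma = ternary_quantize(W)
print(Wq)                              # entries are only -1, 0, or +1
print(np.abs(W - Wq * gamma).mean())   # reconstruction error vs. the original weights
```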
 
Well, Nvidia just released the RTX A400, and that thing has 6 SMs (?!) and is rated for 2.7 TFLOPS of FP32 perf. Would be nice for someone to test gaming performance of that thing as a proxy for Switch 2 performance.
 
Well, Nvidia just released the RTX A400, and that thing has 6 SMs (?!) and is rated for 2.7 TFLOPS of FP32 perf. Would be nice for someone to test gaming performance of that thing as a proxy for Switch 2 performance.
Not sure if it adds anything beyond the RTX2050M comparison, especially considering the VRAM is also limited at 4GB like in the RTX2050M. A downclocked A1000, on the other hand, might give some additional insight into docked performance with its 8GB of VRAM (with the necessary caveats), but then again the faster RAM bandwidth would still obfuscate things. No such thing as a perfect point of comparison, unfortunately.
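For anyone wanting to sanity-check the A400 number: the FP32 rating falls straight out of SM count and clock. The ~1.76 GHz boost clock below is implied by the 2.7 TFLOPS rating rather than something I've seen listed, and the T239 line assumes the leaked 12-SM config with a made-up docked clock:

```python
def fp32_tflops(sms, fp32_cores_per_sm, clock_ghz):
    # 2 FLOPs per core per clock (one fused multiply-add)
    return sms * fp32_cores_per_sm * 2 * clock_ghz / 1000

# RTX A400: 6 Ampere SMs x 128 FP32 cores; ~1.76 GHz implied by the 2.7 TFLOPS rating
print(f"RTX A400:            {fp32_tflops(6, 128, 1.76):.2f} TFLOPS")
# T239 with the leaked 12 SMs at a hypothetical 1.0 GHz docked clock, for comparison only
print(f"T239 (hypothetical): {fp32_tflops(12, 128, 1.0):.2f} TFLOPS")
```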
 
Anyway, Samsung's version will be mass-produced in the latter half of 2024 and is said to give 25% power savings. Perhaps it's possible for a Switch 2 revision, especially if launch Switches have LPDDR5(X) from Samsung 🤔
Samsung mentioned that LPDDR6 is coming in 2026 at the earliest.

And assuming LPDDR6 brings actual power-efficiency improvements over LPDDR5/5X (unlike LPDDR5X, which didn't improve power efficiency over LPDDR5), then using LPDDR6 makes much more sense than using LPDDR5X-10700. JEDEC is very likely to officially validate LPDDR6, whereas I don't really see it validating LPDDR5X-10700, especially since I haven't seen it officially validate LPDDR5X-9600.
 
Outside of RAM and architecture, is there anything coming in the near (<5 years) future that would really justify a Switch 2 Pro?

Same cores/clocks/node but with faster RAM and more advanced architecture would be kind of weird for a revision. Not sure how easy it would be to program both versions.
 
Outside of RAM and architecture, is there anything coming in the near (<5 years) future that would really justify a Switch 2 Pro?

Same cores/clocks/node but with faster RAM and more advanced architecture would be kind of weird for a revision. Not sure how easy it would be to program both versions.
Not much I can see that would justify a Switch 2 Pro, other than maybe miniaturization, increased internal storage, a better display type, and increased clock speeds (which you already mentioned).

I am skeptical Nintendo would add RAM; that runs a real risk of dividing up the Switch 2 library, something I don't think Nintendo is too keen on doing.

Increasing RAM means a real possibility of games being designed mainly or solely for the higher-RAM version, which means running poorly or not at all on the lower-RAM version. The Expansion Pak for DK64 on the N64 was a less-than-ideal scenario.
 
It's pretty hard to see where mobile gaming hardware goes after TSMC 5nm+ other than just architectural improvements and maybe just flooding the chips with tensor cores. 3nm and 2nm do not seem good at all.

I think the current silicon distribution for RTX is like 90% CUDA, 10% tensor and RT cores... Could see a future where it's like 30% CUDA, 60% tensor, 10% RT and PhysX.
 
We currently have neural networks for

1. Upscaling and anti-aliasing
2. Frame generation
3. Ray tracing denoising
4. Ray tracing caching

And will probably have neural networks in the future for

5. VRAM compression/decompression
6. Temporal ghosting cleanup

I believe there are also some theoretical papers on using neural networks to generate simulated dynamic global illumination, but these don't seem like they'll continue as we'll just have full RTGI for all games that want it within like 5-10 years.

But we'll see how many gaming functions we can move to neural networks that can be sped up with tensor cores and thus justify flooding a chip with tensor cores... Already have a decent bit.
 
We currently have neural networks for

1. Upscaling and anti-aliasing
2. Frame generation
3. Ray tracing denoising
4. Ray tracing caching

And will probably have neural networks in the future for

5. VRAM compression/decompression
6. Temporal ghosting cleanup

I believe there are also some theoretical papers on using neural networks to generate simulated dynamic global illumination, but these don't seem like they'll continue as we'll just have full RTGI for all games that want it within like 5-10 years.

But we'll see how many gaming functions we can move to neural networks that can be sped up with tensor cores and thus justify flooding a chip with tensor cores... Already have a decent bit.
Given that Nvidia is finally dipping its toes into multi-die with Blackwell, maybe we could see separate dies for the shaders and tensor cores. I'm not sure if the latency hit would be too much for gaming, but it could be a good way to greatly increase neural performance without sacrificing shaders in the limited die space.
 
Does this mean that Switch 1 BC for Switch 3 is out of the question? Or would they be able to get around this issue somehow?
Just to slightly expand on other answers: yeah, translation/emulation is potentially the answer. With generational leaps getting smaller, Switch 1 -> Switch 2 probably doesn't have quite enough performance headroom to get there consistently.

Nintendo is no longer accepting 32-bit software on the eShop, as I understand it. So Switch 2 software won't use the 32-bit capabilities of the hardware, which sets them up nicely for Switch 3.
 
Given that Nvidia is finally dipping its toes into multi-die with Blackwell, maybe we could see separate dies for the shaders and tensor cores. I'm not sure if the latency hit would be too much for gaming, but it could be a good way to greatly increase neural performance without sacrificing shaders in the limited die space.

This seems like it would increase power consumption a lot, which is hard for mobile hardware.
 
* Hidden text: cannot be quoted. *
I imagine they could be used to provide a smoother and more elegant-feeling latch. Just enough that it sort of pulls itself into place before clicking.

Depends how premium Nintendo wants these devices to feel.
 
Where exactly?
Speakers, HD Rumble motors, the fan motor, at least. The problem with more magnets is also that they could interfere with these components and the motion-control elements, especially if they're strong enough to hold the controller to the console.
 


Not sure if it adds anything beyond the RTX2050M comparison, especially considering the VRAM is also limited at 4GB like in the RTX2050M. A downclocked A1000, on the other hand, might give some additional insight into docked performance with its 8GB of VRAM (with the necessary caveats), but then again the faster RAM bandwidth would still obfuscate things. No such thing as a perfect point of comparison, unfortunately.

Yup, I think the proxy T239 experiment by DF provides a sufficient overview with the lowest Ampere card.
I've played around enough with the RTX 3050 mobile to see similar results to DF's, and to test DLSS at higher resolutions you need that VRAM. Some games also don't run well enough at 1080p to estimate performance, so it's mostly "handheld"-tier resolutions that have some validity IMO.
If I took the experiments as a conclusion, I think the primary disappointment would be docked performance, going by the metrics. For a handheld, you are limited by the thermal design (e.g. it has to cool ~15W in a docked profile), so that provides the upper bound for the clock speeds you're able to configure, oversimplified of course. The lower bound would be primarily dominated by battery life for a handheld.
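To put rough numbers on the "thermals bound the clocks" point, here's a toy model using the usual dynamic-power approximation (power roughly proportional to voltage squared times frequency). The voltage/frequency pairs are invented purely for illustration, not real T239 DVFS points:

```python
def relative_dynamic_power(clock_ghz, voltage, base_clock=1.0, base_voltage=0.80):
    # Classic approximation: dynamic power ~ C * V^2 * f, shown relative to a baseline point
    return (clock_ghz / base_clock) * (voltage / base_voltage) ** 2

# Hypothetical DVFS points -- the voltages are made up for illustration
for f, v in [(0.6, 0.65), (1.0, 0.80), (1.3, 0.95)]:
    print(f"{f:.1f} GHz @ {v:.2f} V -> {relative_dynamic_power(f, v):.2f}x baseline power")
```

The point being that power grows faster than linearly with clock once voltage has to rise too, which is why a fixed ~15W budget caps docked clocks well below what the silicon could otherwise do.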
 
wtf did I just read in this thread. Feel absolutely matters, yes... for a device you'll actually be handling. A controller feeling cheap/light would be bad. A console that sits under your TV, never to be touched? As long as the build quality isn't complete shit, no one's gonna care. Nintendo is a known quantity; ain't nobody out there dismissing them because their console is lighter. The size of the Switch's contemporaries is one of the sore sticking points that people dunk on them for. Go back even just a few generations and consoles were a fraction of the weight.
I don't see how the weight of a portable has anything to do with its assumed cost.

The PSP 3000 is lightweight as heck, WAY more than a Switch. And that's why, even though it isn't ergonomic, it's not uncomfortable to hold for long periods of time like a Switch without a grip.

One of the key issues with Switch comfort is how heavy it is compared to the DS Lite and GBA. Although the DS still felt awful to hold for long periods of time due to its cramped design and lack of grip.

Also, something feeling cheap has way more to do with design and finish. The PSP at the time felt like a premium product because its glossy plastic looked shiny and pretty and because the buttons and the handheld's overall face were minimalistic and well made.
 