• Hey everyone, staff have documented a list of banned content and subject matter that we feel are not consistent with site values, and don't make sense to host discussion of on Famiboards. This list (and the relevant reasoning per item) is viewable here.

StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

Fuck, my thirst for this little beast of a machine has increased. Sweet sweet specs talk. Even the lower end of expectations is more than solid.
just a couple years ago, we were talking about a ~700GFLOPs handheld and 1.1TFLOPs docked console, now it's 3 times that lol. I feel like we were happy with that upgrade at the time too.
 
just a couple years ago, we were talking about a ~700GFLOPs handheld and 1.1TFLOPs docked console, now it's 3 times that lol. I feel like we were happy with that upgrade at the time too.
Now that I have gotten myself more into the conversation, the idea is quite crazy. I can't wait for the full reveal, and then to see everyone react to it. That said, I already love my Nintendo Switch, so anything that is even slightly better already works for me. The chances of me not buying the next console are quite small.
 
The soon to be official timeline:
[...]

June can't come soon enough. Nintendo not being at E3 gives them the freedom to reveal things following their own agenda, but I'm still convinced that June will be the time when they finally acknowledge the new console.

In two months we'll be playing Zelda.
In three months we'll know about Nintendo's new console.

What a time to be alive.
 
Yes, 12w is just the GPU's power consumption. The whole system would likely draw an additional 4 to 6 watts over the OG model if this were the case; however, the OG model drew 8w for just the GPU at 768MHz, so 9w with DLSS (this was a DLSS test after all) is in line with what we could expect.

I don't actually know if these clocks are even used, but they were within my estimations at 5nm, and the nature of where they were found makes them compelling. Ultimately we might have to wait to find out these finer details, but the chip is big enough, and those specs are known well enough, to make some pretty conclusive assumptions about its performance IMO.

The original Switch's GPU used around 4w actually, and a little under that for Zelda. The whole system used 7.1w (in BotW) with the screen at minimum brightness and everything turned off, and 9w with everything up and on.
Oop, thanks for the correction.

Given that, I find the 12W mode very odd. I think it could only realistically be a stress test. It's too high for the rest of the system to operate simultaneously unless they update the dock's hardware, which I doubt they will do, yet it's too low for a TV-only system targeting better performance to be worth building around.

12W on the GPU would leave just 3W for the CPU, fan, all support circuitry and charging the Joy-Con.

On the flipside, 4W consumption being in line with the original Switch makes a lot of sense. I think V1 power consumption is quite realistic. I only hope they give the battery a bigger capacity.
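To put rough numbers on that budget argument, here's a quick Python sketch. The 15W ceiling is just what the 12W + 3W split above implies, and every figure here is speculative:

```
# All figures speculative; 15 W total is just what the 12 W + 3 W split implies.
TOTAL_DOCKED_BUDGET_W = 15.0

def remaining(gpu_w, total_w=TOTAL_DOCKED_BUDGET_W):
    # What's left over for the CPU, RAM, fan, support circuitry and Joy-Con charging.
    return total_w - gpu_w

for gpu_w in (4.0, 9.0, 12.0):
    print(f"GPU at {gpu_w:4.1f} W -> {remaining(gpu_w):.1f} W left for everything else")
```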
 
Short tangent since I mentioned the battery:

Updating the battery and charging circuitry would allow the device to fast charge off the existing charger, which would be nice.
 
Yes, 12w is just the GPU's power consumption. The whole system would likely draw an additional 4 to 6 watts over the OG model if this were the case; however, the OG model drew 8w for just the GPU at 768MHz, so 9w with DLSS (this was a DLSS test after all) is in line with what we could expect.

I don't actually know if these clocks are even used, but they were within my estimations at 5nm, and the nature of where they were found makes them compelling. Ultimately we might have to wait to find out these finer details, but the chip is big enough, and those specs are known well enough, to make some pretty conclusive assumptions about its performance IMO.

The original Switch's GPU used around 4w actually, and a little under that for Zelda. The whole system used 7.1w (in BotW) with the screen at minimum brightness and everything turned off, and 9w with everything up and on.
I think the Switch 2 will also have a bigger battery than the Switch 1.
 
And 5nm!

I've pointed out a few times that 40 Series and T239 were sampled and developed in tandem. I don't think the Ampere/Ada difference is really all that significant performance wise, but inheriting the node from Ada would help.

DLSS 3.0... that would be madness. Brilliant, incredible madness.
I don't think they'll do it, but I'll be damn amazed if they do. 7/8ths of all pixels would be generated rather than rendered. Even at 2 or 3 teraflops that would have well-optimised games looking close to even Xbox Series X. 1080p30, perhaps even 720p30, rendering displayed in 4K60 with nothing but some noise and mild controller delay.
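(Quick math on that 7/8ths figure, assuming 4x DLSS upscaling plus frame generation; purely illustrative:)

```
# 1080p -> 4K upscaling renders 1/4 of the output pixels; frame generation then
# fully renders only every other frame, so the rendered fraction is:
rendered = (1 / 4) * (1 / 2)
print(f"rendered: {rendered}, generated: {1 - rendered}")  # 0.125 vs 0.875 = 7/8
```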

It would be a dream come true to see this thing hit 3-4TF and have DLSS 3.0.

Of course, teraflops depend on the node. I suspect 5nm more strongly than ever. But if it was made alongside Ada, tested alongside Ada, and inherited features from Ada... maybe the OFA from Ada isn't so absurd.
A Lovelace-derivative SoC would surely confirm a 5nm (4nm) process, as that's what Lovelace starts on, IIRC.

Ok I’m done. 🤭

It's already out of date by virtue of using a CPU from 2020 and a GPU from 2020. But it's modern enough to the mass consumer that it doesn't matter much in the grand scheme.
It wouldn't be any more dated, tbqh, considering that PS5/XS have parts from 201X, and those systems will still be in use. Also, the A78C has been shown to be better suited for gaming purposes vs. its successors. So "newer" doesn't necessarily equate to "better". Of course, a newer AND better solution could arrive before launch, but one suspects that's out of Nintendo's and Nvidia's control and schedule.
 
June can't come soon enough. Nintendo not being at E3 gives them the freedom to reveal things following their own agenda, but I'm still convinced that June will be the time when they finally acknowledge the new console.

In two months we'll be playing Zelda.
In three months we'll know about Nintendo's new console.

What a time to be alive.
We're almost there!
 
Is this true even considering the bare minimum clocks and low power draw?

It's been a while since I've read all the discussion pages… just curious.

So a Switch game that runs at a variable 540p-720p and struggles a bit to keep 30fps could perform at native 4K and a steady 60fps on Drake hardware with minimum clocks and at 10w? Enough headroom to even increase graphical IQ? (Even after assuming the need to push 4K textures and such.)
If we assume the same clocks as the Switch as the minimum, then it's a 6x power bump. Resolution scales very well with GPU power, but it's not perfect, so I wouldn't take 6x the number of pixels for granted, but I believe Drake should safely be able to output 4x if nothing else changes.

4x the pixels is twice the "p" number, so 720p -> 1440p and 1080p -> 2160p (4K). 900p -> 4K is under 6x, so it could happen even at the lower end of the spectrum.
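If anyone wants to check the scaling math themselves, here's the arithmetic as a tiny Python snippet (pure pixel counting, no claims about how real GPUs scale):

```
def pixels(height, aspect=16 / 9):
    # Pixel count for a 16:9 frame of the given height.
    return int(height * aspect) * height

for src, dst in [(720, 1440), (1080, 2160), (900, 2160)]:
    print(f"{src}p -> {dst}p is {pixels(dst) / pixels(src):.2f}x the pixels")
# 720p -> 1440p and 1080p -> 2160p are exactly 4x; 900p -> 2160p is 5.76x, under 6x.
```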

Since she said "Switch-tier games" and "most", this excludes PS4 ports and it has exceptions.

But I want to highlight the "if nothing else changes" again. The games are not getting double the fps on top of 4K resolution, and if they decide to tune up some graphics options instead of just maxing resolution, don't count on a huge bump in resolution either. For example, I would expect (and hope) 3rd party ports to get more on par with the PS4 versions instead of keeping all the downgrades aside from resolution.
 
just a couple years ago, we were talking about a ~700GFLOPs handheld and 1.1TFLOPs docked console, now it's 3 times that lol. I feel like we were happy with that upgrade at the time too.
As a Pro it was a good upgrade.
If that was the device Nintendo canceled, fine.

As a successor after 6 years, we definitely want something more powerful
 
Hi everyone! :)

Surely the new Switch or whatever it will be called will have DLSS, but which version? 2.2? 2.3? 2.4? 2.5? Or 3.0? If it had 2.2 would it be possible to update it to support 3.0?
 
Hi everyone! :)

Surely the new Switch or whatever it will be called will have DLSS, but which version? 2.2? 2.3? 2.4? 2.5? Or 3.0? If it had 2.2 would it be possible to update it to support 3.0?
2.x doesn't require Lovelace hardware, so it will be the latest version of 2. Since Drake is Ampere, it will not support frame generation.
 
Actually, while we're on the topic of DLSS, will the better OFA that Drake uses allow any performance or image quality improvements?

I know the OFA is only one part of the DLSS equation, and using a faster OFA doesn't really matter if the older Ampere tensor cores still take tons of clock cycles to get the information to the OFA, but this is also the first time we'd be seeing DLSS implemented in a closed environment, right? I'd imagine there must be some benefit to having a fixed spec, a better OFA compared to the Ampere cards, and using DLSS. DLSS isn't exactly plug-and-play and not all DLSS solutions are created equal (See: Red Dead Redemption 2.)

Edited: for clarity.
 


Not sure if this is on topic or not, but it seems like Microsoft and Activision included the current Switch for the 10 year Nintendo deal. Will be interesting to see how this shakes out. On one hand, Call of Duty - Warzone aside - has the benefit of smaller maps VS. Apex, Fortnite, or even Doom Eternal, so image quality and performance should hypothetically be better than those titles. But on the other, the new CoD games are a bit of a mess. I can see from-the-ground-up versions of these games being an okay experience, but not ports.
 
They are supposedly aiming at a day-and-date launch for future CODs on Switch, which means they need less than ✂️ months to port it to Switch.
I think Switch was always intended as a target though; it's the one with 100 million users, after all.
 
It says in the Nvidia leak?

Edit: also, frame generation would have been present in NVN2.

I guess you're right that we don't know 100%, but it would require some level of mental gymnastics to explain otherwise.
That's a mean way of putting it. I don't think it's likely but strictly speaking it is possible.
 


Not sure if this is on topic or not, but it seems like Microsoft and Activision included the current Switch for the 10 year Nintendo deal. Will be interesting to see how this shakes out. On one hand, Call of Duty - Warzone aside - has the benefit of smaller maps VS. Apex, Fortnite, or even Doom Eternal, so image quality and performance should hypothetically be better than those titles. But on the other, the new CoD games are a bit of a mess. I can see from-the-ground-up versions of these games being an okay experience, but not ports.

REDACTED REDACTED REDACTED


Fuel for the fire!
 
Horizon splits the driver into two parts, one which lives in the game and another that lives in the OS. I am honestly a little fuzzy on what does what there. What I am 100% certain about is that games have some privileged access that UDA drivers don't have, and that games hardcode some Maxwell microcode which uses those elevated permissions to set themselves up. That bypasses the HAL at least in that step, and won't work automagically on new hardware.

I don't think it's a particularly challenging fix, but it will require some work on the OS side to get it there.
As I understand the UDA, that should not be a big deal for backwards compatibility as long as the new Switch has its HAL. The HAL on the new Switch doesn't need a HAL to be in the old Switch. Also, because of how it all works, I don't believe you can bypass the HAL, though I could be wrong on this one. And I don't know if SciresM got that right about the HAL being absent...

SciresM replied to me on Twitter to tell me the HAL is not in the drivers. As I understand it, the HAL is not in the drivers because it's part of the hardware. He may be the Switch homebrew guy, but he's not an Nvidia engineer, and he doesn't seem to know much about the UDA specifically. I respect him and his work, but I think he hasn't researched the UDA enough to understand it and is making flawed assumptions. Also, being an authority doesn't mean he's always correct. Assuming he is correct because he's an authority is the literal definition of the Appeal to Authority Fallacy.

Because the UDA uses a unified instruction set, we're already on track for relatively easy backwards compatibility. The new hardware will still have all the old instructions, of course. It's the old hardware not having new instructions for the new software that is more of an issue. With the starting point of a unified instruction set that has almost 20 years of development behind it, I'm not worried about backwards compatibility. Even if they need to do some work, they have an excellent foundation and starting point with that unified instruction set.

I am happy to be proven wrong. But with actual evidence, not appeal to authority fallacies.

Where I am at is that unless someone can prove with facts and evidence that the UDA is absent there is no reason to believe we won't have backwards compatibility. It would be more work to remove the UDA than it's worth. If Nintendo wanted to remove it that would cost extra in terms of people hours. They would need to redesign a lot. From a business standpoint it doesn't seem to make sense to ditch it. Too much extra cost for no benefit.

Think of it this way:

If the UDA were removed from Tegra that would be a problem for their sales of Tegra to AI and vehicle manufacturers. If whatever they did is stuck on that generation of Tegra, and can't be easily ported forward and/or backwards, that's a big deal and big problem. The exact kind of problem UDA was meant to solve. If anything it makes much more sense that Nvidia would make sure that the UDA adapts to the Tegra SoC because that's a huge selling point. If you can sell your chips and tell people that they won't be outdated so quickly because the UDA allows old chips to run newer software that's huge for marketing your chips. It doesn't make any sense for the UDA to be absent from the Tegra SoC lineup. Removing the UDA would be a massive step backwards.

And if Nintendo requested the UDA be removed I can only see Nvidia giving that a hard no. It's not worth the time and effort. And also that's not in Nvidia's behavior patterns. Devs have complained about how Nvidia involvement can be frustrating because they are hard set on certain things. And that you have to do it Nvidia's way if you want their direct support. I could see Nintendo asking, then Nvidia telling them that it's a hard no.
 
There's nothing stopping them from just implementing all the missing/modified instructions from Maxwell, but I don't think there's any direct evidence they've done that. Driver compatibility is likely a non-issue, since there must be some level of that already for older games to work on the most recent firmware.

I get the impression that the homebrew/hobbyist crowd are stuck in the hobbyist emulator mindset, where shader emulation has to be quite complicated and heavy to be able to run anywhere. Nintendo just has to make shaders compiled for one ISA run on a different, but related, one. The problem is not the same, and the solution doesn't have to be either.
I think you are correct and it's a mindset thing.

They're used to older consoles that never had anything like the UDA and aren't as familiar with PC drivers and hardware.

The thing is the Tegra SoC is more like a modern PC than the older consoles. So what they're used to doesn't apply the same way, or at all.
And instead of actually finding out how the UDA works they're making flawed assumptions.

The concept that the UDA does in fact bring easy backwards compatibility is too far outside the paradigms they're familiar with, which is understandable. Older consoles could have had a form of UDA, but chose not to. They were too deep into very customized hardware that was unique to the platform/console. With the Tegra line you don't have that issue. Even when customized, the chips still share the same core/foundational architecture, which the UDA is an integral part of.

It does seem like there is a factor of them being so used to overly restrictive, one-off customized hardware that something like the Tegra SoC line throws them off by NOT being like that. The Tegra SoC line isn't like previous console hardware; it's much, much closer to PC hardware. Overall, consoles are a lot closer to customized PC hardware than ever before. The PS3/Xbox 360 era felt like a tipping point for one-off customized hardware. That shit was expensive and problematic for both companies. The emulation of those consoles is still troublesome because of how ridiculous the hardware was. We don't have a Red Dead Redemption 1 port in part because what they had to do to make it run on that generation was so hacky that it would take an insane amount of code rewriting to make it work on newer hardware.

Tegra doesn't have those issues because of stuff like the UDA. As much as Nvidia can be a crappy corpo, they do excel at innovating new tech, and the UDA is just one example of that. AMD still has driver issues to this day, and has always had a reputation for mediocre drivers. Nvidia should get credit for the innovation of the UDA and what it enables, even if they suck in other areas.

Nvidia has been doing the UDA since around 2000 or so (fun fact: CUDA is Compute Unified Device Architecture). And they did it for the express purpose of not needing a specific driver for every new piece of hardware, which was a thing and very annoying. I was a kid in those days, and I remember that you had to really make sure you had the specific driver for your specific hardware. The modern one-driver-package-for-all-hardware paradigm did NOT always exist.

They've been iterating on it for over 20 years. And none of us are Nvidia engineers either. No one on this forum (that I know of), nor MVG or SciresM, truly has the direct experience with Nvidia's UDA architecture to do more than make guesses. I at least do my best to make my guesses as educated by the available facts as I am able. I'm doing my best to read the documentation from Nvidia and understand it using my own experience and knowledge. I could definitely be wrong. But one needs evidence, not "said so," to prove me wrong.

At the end of the day it seems insane and stupid to stop using the UDA for any reason. It has too many benefits, and it's too deep a core part of Nvidia's hardware and driver architecture.
 
June can't come soon enough. Nintendo not being at E3 gives them the freedom to reveal things following their own agenda, but I'm still convinced that June will be the time when they finally acknowledge the new console.

In two months we'll be playing Zelda.
In three months we'll know about Nintendo's new console.

What a time to be alive.
That's a sure thing for Zelda; however, Nintendo could definitely reveal [REDACTED] at a much later date.
I think we'll have to be a little bit extra patient - again - for the console reveal.

Of course I'd love to be wrong!
 
just a couple years ago, we were talking about a ~700GFLOPs handheld and 1.1TFLOPs docked console, now it's 3 times that lol. I feel like we were happy with that upgrade at the time too.

If you were to remove the knowledge of T239 and simply look at other products on the market, that would still be a reasonable estimation. When I saw the Orin chip breakdowns, before I knew the lower-end models were just binned chips, I thought for sure the Orin NX would be the chip powering the next Switch. 1024 cores at 765MHz and 6 Arm A78 cores with a 128-bit memory bus seemed like a reasonable upgrade to expect from Nintendo: slightly better than PS4 performance when docked, with the added benefit of DLSS.

Then, observing Steam Deck, we have a much bulkier handheld that is pulling 25 watts total at full tilt, with an APU TDP of 15 watts, far more than the next Switch would be running on in handheld and probably higher than even Switch Redacted when docked. So it became very easy to assume that Switch Redacted would fall short of Steam Deck in terms of raw power, but likely have slightly better results because of the low-level API.

When you start to learn about Drake/T239, that's when the opinion starts to radically change. Unlike many of Nintendo's prior products, where promoting the performance of the machine was avoided for obvious reasons, a T239 on 4N clocking over 1GHz docked and 600MHz portable starts looking like a very capable machine whose graphical capabilities are worth promoting. I want Switch Redacted later this year, but even if it were to end up being a 2024 product, it's not like the performance we are talking about will suddenly look bad.
 


Not sure if this is on topic or not, but it seems like Microsoft and Activision included the current Switch for the 10 year Nintendo deal. Will be interesting to see how this shakes out. On one hand, Call of Duty - Warzone aside - has the benefit of smaller maps VS. Apex, Fortnite, or even Doom Eternal, so image quality and performance should hypothetically be better than those titles. But on the other, the new CoD games are a bit of a mess. I can see from-the-ground-up versions of these games being an okay experience, but not ports.

I should've listened to my dad and started a scissors business...
 


Not sure if this is on topic or not, but it seems like Microsoft and Activision included the current Switch for the 10 year Nintendo deal. Will be interesting to see how this shakes out. On one hand, Call of Duty - Warzone aside - has the benefit of smaller maps VS. Apex, Fortnite, or even Doom Eternal, so image quality and performance should hypothetically be better than those titles. But on the other, the new CoD games are a bit of a mess. I can see from-the-ground-up versions of these games being an okay experience, but not ports.



My guess is that it would struggle to hit 30. I don't even know what the pixel count would be during demanding situations.
 
My guess is that it would struggle to hit 30. I don't even know what the pixel count would be during demanding situations.
Why?

Switch has done just fine with super demanding games. TW3 is an enormous open world, if that can run decently then what's stopping a fairly linear FPS?

I don't believe for one second that COD is "special" in such a way that porting it would be impossible or lead to horrible results, the reason it has not yet happened is purely due to publisher politics. And file size I guess.
 
Switch has done just fine with super demanding games. TW3 is an enormous open world, if that can run decently then what's stopping a fairly linear FPS?

We go through this song and dance for every big third party game arriving to Switch.
 
If you believe Call of Duty can be losslessly compressed by 50% (very optimistic, and something that will be very hard to do while still having good framerates and load times),

you end up with a 125 GB file size, lol.

Activision also clearly believes that Call of Duty sucks at anything less than a rock-solid 60 FPS. You're going to have to make some truly heroic graphical compromises to get this file size down to 50 GB while maintaining 60 FPS.
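(Rough math, with the ~250 GB starting point simply inferred from the 50% -> 125 GB figure above:)

```
original_gb = 250  # assumed current install size, inferred from "50% -> 125 GB"
for kept in (0.5, 0.3, 0.2):
    print(f"keeping {kept:.0%} of the data -> {original_gb * kept:.0f} GB")
# Hitting ~50 GB means shipping roughly a fifth of the current data.
```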
 
If COD does come to the OG Switch it will be at 720p 30fps. This was the cutback for Doom and Wolfenstein when they were ported to Switch, so it seems fair to assume that would be the approach with COD on Switch. The drop to 30fps was made for the Wii ports of COD back in the day, and porting to a completely different architecture on Wii would have been far more challenging than getting COD onto Switch. I still have my doubts that we will see COD on Switch prior to Redacted coming out, and at that point I think they skip the OG Switch.
 
If COD does come to the OG Switch it will be at 720p 30fps. This was the cutback for Doom and Wolfenstein when they were ported to Switch, so it seems fair to assume that would be the approach with COD on Switch. The drop to 30fps was made for the Wii ports of COD back in the day, and porting to a completely different architecture on Wii would have been far more challenging than getting COD onto Switch. I still have my doubts that we will see COD on Switch prior to Redacted coming out, and at that point I think they skip the OG Switch.

Doom and Wolfenstein are single player focused games where 30 FPS is much more acceptable.
 
I wouldn't take Doom and Wolfenstein as benchmarks. Generally, the work of Panic Button was impressive because they ported games nobody would port at the time, but by today's standards there are much, much more impressive third-party ports on Switch. The Witcher of course, but also things like Nier Automata and many others I'm forgetting.
 
As I understand the UDA, that should not be a big deal for backwards compatibility as long as the new Switch has its HAL. The HAL on the new Switch doesn't need a HAL to be in the old Switch.
You are incorrect, but it's a common mistake. Let's also clarify what we mean by HAL. HAL has at least three meanings in the context of an Nvidia driver. HAL is a term of art in the Windows NT kernel and all drivers have to interact with the HAL. HAL also refers to a hardware block in a modern Nvidia GPU which facilitates the UDA. And HAL is also the term used to refer to the software layer inside the driver that communicates with the HAL block on chip.

A driver lives in either ring 0 or, in the case of microkernels, ring 1, meaning that it runs in the most privileged parts of the system and has sufficient permissions to talk directly to the underlying hardware.

A driver implements a generic software layer that applications which don't live in those privileged rings can use to talk to the hardware in a standard way regardless of which specific piece of hardware is underneath. In the case of a video driver, a huge component of this driver is effectively a complete Graphics API implementation, like DirectX, or NVN, or OpenGL.

The vast majority of this stack is generic across devices. The majority of a DirectX implementation is the same regardless of the hardware. Beneath that, most pieces of hardware from the same vendor have the same structure. An Nvidia GPU might have 20 SMs or 80 SMs, but the SMs operate the same. An Nvidia GPU might have an Optical Flow Accelerator which runs the same microcode, but lives at a different interrupt location on chip.

One way to handle drivers is to have a templated piece of software, and when a new GPU is made, all you have to do is plug in the few thousand facts about the GPU, update the various bits of microcode, and then build your new driver binary.

The innovation of the UDA is to actually take all those facts and plug them into a hardware block on the physical GPU itself, and make sure that the hardware block lives in the exact same location on every single GPU you make. When the driver loads, it goes to a known location on chip, loads up all those facts, populates them into internal tables, and then makes them available to the generic portions of the driver code.

Critically - the driver doesn't need to query the HAL after that. If the HAL were, instead, a hardware block that gated all access to the underlying hardware, it would be a huge performance bottleneck, and new drivers would have trouble adding new features. Instead, if the software already knows all of these facts about the hardware, it can skip the HAL entirely.
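A toy sketch of that flow, just to make the idea concrete (Python standing in for driver code; every name and value here is made up, not Nvidia's actual implementation):

```
FAKE_HAL_BLOCK = {                 # pretend contents of the on-chip info block
    "sm_count": 2,
    "ofa_bus_address": 0x50A00000, # hypothetical
}

class Driver:
    def __init__(self, hal_block):
        # Read the per-chip facts once at load time and populate internal tables.
        self.sm_count = hal_block["sm_count"]
        self.ofa_addr = hal_block["ofa_bus_address"]

    def dispatch(self, work):
        # Generic code path, parameterised by the tables rather than hardcoded values.
        return [f"queue {w} on SM{i}" for i, w in enumerate(work[: self.sm_count])]

drv = Driver(FAKE_HAL_BLOCK)
print(drv.dispatch(["shaderA", "shaderB", "shaderC"]))  # only 2 SMs get used
```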

It doesn't matter if the Tegra chips still have the HAL hardware block in place if the driver doesn't use it. And if a driver doesn't use it, and it gets moved to a new piece of hardware, the fact that the NEW device has a HAL is irrelevant, because the software portion will ignore it anyway.

You can run a decompiler over a Nintendo Switch game and plainly see the embedded driver level Maxwell microcode and interrupt addresses. Facts that are normally queried from the HAL are hard coded into video games.

Nintendo Switch games cannot directly initialize a non-Maxwell GPU. On a Maxwell GPU, they cannot address more than the two SMs that are in the TX1. And they cannot use NVENC, NVDEC, and OFA on Maxwell GPUs where those live at different interrupt/bus addresses than the TX1's.
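For contrast with the sketch above, this is roughly what that looks like: the TX1's facts baked in at build time instead of being read from the HAL (illustrative values only, not real addresses):

```
TX1_SM_COUNT = 2                # hardcoded; the game never asks the hardware
TX1_NVDEC_BUS_ADDRESS = 0x1234  # hypothetical, baked into the binary

def init_gpu():
    # Direct setup against TX1 constants. On a chip where these facts differ
    # (more SMs, a different bus/interrupt map), this setup simply doesn't fit.
    return {"sms_used": TX1_SM_COUNT, "nvdec_at": hex(TX1_NVDEC_BUS_ADDRESS)}

print(init_gpu())
```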

Also because of how it all works I don't believe you can bypass the HAL, I could be wrong on this one though. And I don't know if SciresM got that right about the HAL being absent...


SciresM replied to me on Twitter to tell me the HAL is not in the drivers. As I understand it the HAL is not in the drivers, it's part of the hardware.

He may be the Switch Homebrew guy. But he's not an Nvidia Engineer, and he doesn't seem to know much about the UDA specifically. I respect him and his work. I think he's not researching the UDA enough to understand it and is making flawed assumptions. Also being an authority doesn't mean he's always correct. Assuming he is correct because he's an authority is the literal definition of the Appeal to Authority Fallacy.
Well, it actually isn't, since that isn't what the word Appeal means, but I think arguing about the definition of Appeal to Authority is beyond the scope of the discussion, and will end in looking at a dictionary - in other words, an appeal to an authority :)

Fortunately, I don't need an authority on the subject, as I have confirmed this fact about the drivers myself. You are welcome to perform the same process if you doubt my authority, that's perfectly valid. Just as I doubt your authority based on having read the patent paper.

Because the UDA uses a unified instruction set we're already on track for relatively easy backwards compatibility. The new hardware will still have all the old instructions of course. It's the old hardware not having new instructions for the new software that is more of an issue.
This is factually inaccurate. The first point is, again, that you can plainly see the Maxwell microcode when you extract it from published Nintendo games. You don't need to go to the Dark Web and look up leaked internal Nvidia documentation on the Switch's shader compiler to verify that it doesn't include the UDA ISA by default, but that would be another way of doing it.

Switch games do not use the UDA's ISA; they use raw Maxwell microcode. And Ampere does remove some of those instructions. Ampere cannot run all Maxwell shaders without changes. Nvidia doesn't document their full shader ISA, but they do document the fact that there are breaking changes between the architectures.

In fact, games on PC don't use the UDA ISA. They use GLSL or an equivalent, because the UDA ISA is only supported by Nvidia, and Elden Ring would like to run on your AMD box as well. The value of the UDA is in large-scale data center applications, where the overhead of a shader compile is quickly lost in long-running GPU compute operations.


With the starting point of a unified instructions set that has almost 20 years of development I'm not worried about backwards compatibility. Even if they need to do some work, they have an excellent foundation and starting point with that unified instruction set.

I am happy to be proven wrong. But with actual evidence, not appeal to authority fallacies.
As decompilation is legally OK, but redistribution of decompiled data isn't, I think you'll probably have to do your own research to find the exact data you are looking for, if you don't trust members of the datamining/homebrewing community.

Where I am at is that unless someone can prove with facts and evidence that the UDA is absent there is no reason to believe we won't have backwards compatibility. It would be more work to remove the UDA than it's worth. If Nintendo wanted to remove it that would cost extra in terms of people hours. They would need to redesign a lot. From a business standpoint it doesn't seem to make sense to ditch it. Too much extra cost for no benefit.

The UDA isn't a thing you can remove, it is a methodology. If you are referring to the HAL on hardware, I'm not saying it's gone either. I am saying that decompilation clearly demonstrates that the HAL isn't used by Nintendo Switch games.

The Nintendo Switch uses a single flat memory pool for both the System and the GPU. For performance reasons, games on Switch have access to privileges not available to games on PC. The side effect of this model is that the game - application code - is responsible for managing a number of resources that are usually contained inside the kernel driver or an external Graphics API Library, including the memory pool.

Using the HAL and generalizing that code causes games to allocate larger pools of memory than are necessary on Switch, to invoke a CPU-intensive compile operation for shader code, and forces developers who know the exact hardware profile they are targeting to go around their ass to get to their elbow to make optimizations (like the number of work queues) that simply not using the HAL would make easy.

There is a significant performance win for building the API/driver architecture this way, and it is one of the reasons that consoles generally are able to deliver the kind of performance they do - for an example of what happens with identical hardware on a HAL-based driver stack, look no further than the Nvidia Shield, which runs Android, versus similar games running on the Switch, both using the TX1.

On PC, this performance overhead is a necessary side effect of games 1) running on Windows, 2) supporting multiple GPUs, including non-Nvidia GPUs, and 3) keeping many memory resources in system RAM where they don't compete with allocations on the GPU side. In the case of a console - especially a tiny one like the Switch with limited CPU performance, a shared memory pool, and only one GPU to support - cutting that out of the software design is a win.

That's why the UDA isn't used. I'm not here telling you BC isn't coming, or is super hard. I'm telling you - and hopefully pointing you in the direction to confirm for yourself - that the design of Nintendo Switch's driver/API model doesn't, factually, do the thing that the Windows/Linux drivers do that solves the BC problem in the way you describe.
 
@NintendoPrime @oldpuck @Alovon11 @Dakhil @LiC @Thraktor @ReddDreadtheLead (just naming people who can check my work here, I got 3 hours of sleep and am powering through work on my second cup of coffee right now)

I recommend scrolling to the bottom and reading the TL;DR first. It should give the immediate answers; if there are more questions about other aspects of the hardware, or if my explanation is long-winded/not clear, please @me.

In order to understand the upgrade "Switch 2" offers, we first need to look at Switch specs:

CPU-


TX1 ARM, 4 A57 cores @1GHz (one core reserved for the OS)
T239 ARM, 8*A78C cores ~1GHz-2GHz (one core reserved for the OS)

Upgraded result to CPU:
A78 cores are roughly 3 times faster than A57 cores per clock, giving between a 7 times and 14 times performance jump. A78 cores are also faster per clock than the Zen 2 cores used in PS5/XBS, but clocked much lower, so a 2GHz clock would result in somewhere above 50% of the CPU resources found in PS5/XBS, and far beyond last-gen consoles.
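To make the arithmetic behind that 7x-14x range explicit (3x per-clock gain, 7 vs 3 usable game cores, 1GHz vs 2GHz, all assumptions from the figures above):

```
ipc_gain  = 3.0      # A78 vs A57 per clock (assumed)
core_gain = 7 / 3    # usable game cores: 7 vs 3, one reserved for the OS on each
for clock_ghz in (1.0, 2.0):
    print(f"at {clock_ghz} GHz: ~{ipc_gain * core_gain * clock_ghz:.0f}x Switch's CPU throughput")
# ~7x at matched clocks, ~14x at 2 GHz.
```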

When compared to Steam Deck: Steam Deck has 4 cores and 8 threads, while Drake has 8 cores/threads. If Drake is clocked at 2GHz, it would offer similar CPU resources to Steam Deck. Steam Deck's CPU clocks up to 3.5GHz, but because pairs of threads share resources within each core, the overall performance lands somewhere in the neighborhood of 70-80% of a true 8-core part at that clock. Drake's 2GHz cores would offer ~70% of 8 cores at 3.5GHz, so while Steam Deck has more CPU performance, it shouldn't be by very much.


RAM-


TX1 4GB 64-bit LPDDR4/LPDDR4X, ~20GB/s in handheld mode, 25.6GB/s docked (~800MB reserved for the OS, IIRC)
T239 8GB to 16GB 128-bit LPDDR5(X?), over 60GB/s in handheld mode, up to 102GB/s (137GB/s if LPDDR5X).

Upgraded result to the RAM:
3.2GB of RAM at 20-25GB/s vs 7-15GB of RAM at 60-102GB/s: we are talking about 3 to 4 times the capacity and speed of Switch. 12GB is probably the most realistic capacity.

102GB/s would be around PS4's 176GB/s RAM speed once the architecture advantage is taken into account, as these newer architectures are far more bandwidth-efficient. This should allow third parties to bring their games to the platform without much problem. Bandwidth is less about direct comparison with other devices and more about each individual system's available bandwidth; this is about preventing bottlenecks rather than increasing performance, so it's hard to say how this compares to current-gen consoles. Steam Deck, for instance, has 88GB/s of memory bandwidth, but it's a good balance for that system.
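The ratios behind the "3 to 4 times" line, assuming a 12GB configuration with roughly 1GB reserved for the OS (both of which are assumptions, not confirmed figures):

```
switch_usable_gb, switch_bw = 3.2, 25.6    # usable GB, docked GB/s
drake_usable_gb,  drake_bw  = 11.0, 102.0  # assumed: 12 GB minus ~1 GB OS reserve, LPDDR5 docked
print(f"capacity: ~{drake_usable_gb / switch_usable_gb:.1f}x, bandwidth: ~{drake_bw / switch_bw:.1f}x")
# ~3.4x the usable RAM and ~4x the docked bandwidth.
```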


Storage-


While storage is unknown, what we do know is the range of storage that could be used:
First, Switch's internal storage is eMMC at about 100MB/s.
The eMMC Drake could use reaches 400MB/s, so if it uses this type of memory, expect a 4x increase in read speeds.

UFS is also a type of storage that could be used; here the minimum speed is twice as fast again, and it could easily match XBS internal storage if needed.


Load times-


This is a reflection of the specs above. The decompression block found in Drake would also play a part, but let's just go over the minimum gains, as that is the safest place to discuss this, and we will only be talking about Switch gen 1 titles, because we have no real idea about next-gen titles.

If you run across a Switch game (not in the cloud) that takes 30 seconds to load, Drake should load that same data in 7 seconds or less. Most Switch games load in about half that time, so there we are talking about ~3 seconds on Drake. It could be faster if it uses UFS, and there will always be rare hiccups where games just take longer to load, but the direct comparison here is over 4 times faster than Switch.
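The load-time claim is just the read-speed ratio; here it is spelled out (the eMMC/UFS speeds are the assumed figures from above, and the decompression block is ignored):

```
switch_mb_s = 100  # Switch's eMMC, roughly
for name, mb_s in [("eMMC 5.1 (assumed)", 400), ("UFS (assumed)", 800)]:
    speedup = mb_s / switch_mb_s
    print(f"{name}: {speedup:.0f}x faster -> a 30 s Switch load drops to ~{30 / speedup:.1f} s")
```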


GPU-


TX1 256 Maxwell CUDA cores @ 460MHz and 768MHz for 235GFLOPs and 393GFLOPs
T239 1536 Ampere CUDA cores @ 660MHz* and 1125MHz* for 2TFLOPs and 3.456TFLOPs, 48 Tensor cores, 12 RT cores

TX1's Maxwell is a 2015 design, the 3rd iteration of Maxwell and much closer to the Pascal architecture, most notably borrowing FP16 at 2:1 per CUDA core, or twice the flops at half the precision.

Ampere is over half a decade newer; it has mesh shaders, VRS, and a slew of other GPU features that increase raw speed beyond what paper math can tell you. I'll discuss DLSS a little later, because it's much clearer to see what it offers if we separate it from the other GPU features.

Drake's GPU is 6 times bigger than Switch's. In handheld mode, given these speculative (possibly real) clocks, it would outperform PS4 before DLSS is used. Beyond just having more raw performance than the PS4, it also has the GPU features that the early-2010s GCN architecture found in PS4 is lacking. VRS is said to offer a 20% increase in performance, and mesh/geometry shaders can offer a 25% increase as well; just these two features combined can add a ~50% performance increase to the same architecture per flop. Comparing GCN to Ampere is much less precise, but we can look at the raw performance here and conclude that Drake > PS4. "If the engine supports the features, that is, which will enable the game to make use of them. However, even if these aren't accounted for, there's been a decade of improvements between architectures of the early 2010s and architectures now. Drake should be ahead, and if all things are considered it should be more efficient at doing the job while enabling other unique features." -ReddDreadtheLead

When compared to Steam Deck's RDNA 2 GPU: it has these features too, and while it is generally clocked lower, for 1.3TFLOPs, it can reach 1.6TFLOPs, and it has a per-flop advantage over Ampere in PCs. However, in a closed environment Ampere should pick up ground. I'd put a 1.6TFLOPs Steam Deck around the same as a 660MHz (2TFLOPs) Drake GPU, before DLSS is applied. Once DLSS is applied, it can significantly drop the required rendering work and offer a higher resolution, and if Drake is capable of frame generation, it could further expand this lead - basically a PS4 Pro to XB1X in your hands at the very best. However, it's best to just think of it as a Steam Deck with DLSS on top. (Steam Deck is also a poor FSR2 system, so it really can't offer its own competitive upscaling tech.)

When docked, Drake at 1.125GHz offers 3.456TFLOPs, which should be similar to XBSS' 4TFLOPs. DLSS should help it match whatever XBSS can do with FSR2, and if it comes with 12GB or more of RAM, it might actually have less of a RAM issue than XBSS, even though its RAM is half as fast, because RAM speed is more about bottlenecks, as I discussed above.
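For anyone wanting to double-check the TFLOPs figures in the spec list above, they all come from cores x clock x 2 (one FMA counts as two floating-point ops):

```
def tflops(cuda_cores, clock_ghz):
    return cuda_cores * clock_ghz * 2 / 1000

print(f"TX1:  {tflops(256, 0.460):.3f} / {tflops(256, 0.768):.3f} TFLOPs (handheld / docked)")
print(f"T239: {tflops(1536, 0.660):.3f} / {tflops(1536, 1.125):.3f} TFLOPs (handheld / docked)")
# ~0.236 / 0.393 and ~2.03 / 3.456 TFLOPs, matching the figures quoted above after rounding.
```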


The TL;DR
Drake's CPU is somewhere around Steam Deck's: slower, but in the ballpark (more cores, same thread count, lower clock). ~85% of SD?
Drake's GPU in handheld should offer similar or better performance than Steam Deck: ~130-200%.
Drake's GPU in docked should match or exceed XBSS, thanks to DLSS being superior to FSR2: ~80-100%.
Drake's RAM is 3 to 4 times the capacity and speed of Switch's, and should fit well with current gen consoles.
Drake's Storage is at least 4 times faster than Switch's and load times should shrink in Switch gen 1 games by over 4 times.
Not much for me to check here since I'm not into extrapolation, just known or at least evidenced specs.
 
You are incorrect, but it's a common mistake. Let's also clarify what we mean by HAL. HAL has at least three meanings in the context of an Nvidia driver. HAL is a term of art in the Windows NT kernel and all drivers have to interact with the HAL. HAL also refers to a hardware block in a modern Nvidia GPU which facilitates the UDA. And HAL is also the term used to refer to the software layer inside the driver that communicates with the HAL block on chip.
i feel like i need a cigarette
 
Wait… So I haven’t been checking this forum as much lately (work, school, etc.).

I come back and now it seems like everyone is convinced that not only will the new Switch be pretty powerful (compared to the original Switch), but it will also be released this year as well???

How much have I missed?! xD
 
Wait… So I haven’t been checking this forum as much lately (work, school, etc.).

I come back and now it seems like everyone is convinced that not only will the new Switch be pretty powerful (compared to the original Switch), but it will also be released this year as well???

How much have I missed?! xD
The specs leaked over a year ago.

Release timing is still very much up in the air.
 
Wait… So I haven’t been checking this forum as much lately (work, school, etc.).

I come back and now it seems like everyone is convinced that not only will the new Switch be pretty powerful (compared to the original Switch), but it will also be released this year as well???

How much have I missed?! xD
It could be late 2023, but nothing is confirmed yet.
 
Please read this staff post before posting.

Furthermore, according to this follow-up post, all off-topic chat will be moderated.