
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

I don't think introducing variable core count into power profiles would make sense for Drake because:
  • If wattage is a concern such that reduced SM and CPU core counts in handheld mode are necessary, I'd assume that NV (and Nintendo) engineers would have already spotted it during the design phase, not close to release.
  • Prioritizing variable core count also doesn't make sense for a hybrid device imo. The Tegra line does have power profiles with such a feature, but I'm not sure if they can switch between modes on the fly. Internet searching so far points to nvpmodel, a tool used to configure Jetson/AGX power modes (a rough usage sketch follows below). However, as I understand it, these modes are set before running a workload, not during a session (training, inference, etc.). Such profiles don't really exist in desktop PCs either.
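For reference, nvpmodel is used roughly like this on a Jetson board (a minimal sketch of my own, not Nvidia documentation; mode IDs and what they enable vary per device and config file):

```python
# Minimal sketch of selecting Jetson power modes with nvpmodel.
# Power modes bundle core counts and max clocks and are normally picked
# up front for the whole board, not per-process or per-frame.
import subprocess

def query_power_mode() -> str:
    """Return the currently active nvpmodel power mode string."""
    result = subprocess.run(["nvpmodel", "-q"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

def set_power_mode(mode_id: int) -> None:
    """Select a predefined power mode by ID (requires root)."""
    subprocess.run(["sudo", "nvpmodel", "-m", str(mode_id)], check=True)

if __name__ == "__main__":
    print(query_power_mode())
```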
Yes you can disable and enable TPCs on the fly on Tegra chips, and Orin supports this. You can run work on the GPU, then disable TPCs and continue on merrily for at least some workloads, though I would imagine there is trickery in the various SDKs that support it.

Switching GPU configurations happens all the time in desktop land - users might have two GPUs and even switch between them mid work with no interruption. Hardcoding the number of SMs makes sense in a driver that is destined for exactly one piece of hardware, and just because you define the number of SMs in a constant doesn't mean you can't support using a subset of them.

However, I don't think that is how it will work. Unlike, say, a laptop with two GPUs, we're not going to switch to a more efficient piece of hardware. It's the same hardware, and clock gating is pretty much the way to go there. Unless docked mode is already running so close to the bottom of the power curve that there are no savings in coming down.
 
Yes, it's extremely unlikely it wouldn't show up in the GPU Definition section in NVN2
This statement is way too definitive to be making secondhand. But I'll try to explain a bit more what is actually going on and why I think it indicates the SM count doesn't change.

The reason why the 12 SM number is in NVN2 at all is because a developer may want to query it in order to parameterize some part of their pipeline in a way that scales with SM count. If the number could change between docked and handheld modes, then it wouldn't be appropriate to use a single constant for it, as the query results would be incorrect when running in handheld mode. Furthermore, in one or two places, the NVN2 driver itself uses that value for an optional debug validation against user-provided memory buffers. This too would become incorrect if the number of active SMs changed.

The result would essentially be an overallocation of resources if the actual number of available SMs was less than 12. Note that this does not impact the functional operation of the GPU, which is why the existence of the value in NVN2 doesn't definitively prove a different SM count can't be used. You would be wasting RAM by creating enough scratch space for 12 SMs to do their job when, with only 6 active, you could have gotten away with half as much space; the 6 SMs would still function in that scenario.
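To put rough numbers on that overallocation, here's a toy sketch; the constant name and the per-SM scratch size are made up for illustration, since the real NVN2 interfaces aren't public:

```python
# Toy model of scratch memory sized against a fixed SM constant.
# SM_COUNT_CONSTANT and the per-SM scratch size are hypothetical values,
# not real NVN2 names or numbers.
SM_COUNT_CONSTANT = 12  # what the driver reports / the app bakes in

def scratch_allocation(per_sm_scratch_bytes: int, active_sms: int) -> dict:
    """Compare what the app allocates (sized for the constant) with what
    a reduced number of active SMs would actually touch."""
    allocated = SM_COUNT_CONSTANT * per_sm_scratch_bytes
    used = active_sms * per_sm_scratch_bytes
    return {"allocated": allocated, "used": used, "wasted": allocated - used}

# Example: 256 KiB of scratch per SM, with a hypothetical 6-SM handheld mode.
# Roughly half of the 3 MiB allocation sits idle, but nothing breaks.
print(scratch_allocation(256 * 1024, active_sms=6))
```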

Trying to make an NVN pipeline (where the user is responsible for all resource allocation) react to a changing number of SMs is completely infeasible, whereas the overallocation scenario is at least technically possible. But having to always allocate the resources to back 12 SMs when you may not need them goes against dedicated game console engineering. Developers want full control over every bit of allocation and performance -- this is a core goal of NVN's design -- and that kind of guaranteed waste is just not realistic.
 
Switching GPU configurations happens all the time in desktop land - users might have two GPUs and even switch between them mid work with no interruption. Hardcoding the number of SMs makes sense in a driver that is destined for exactly one piece of hardware, and just because you define the number of SMs in a constant doesn't mean you can't support using a subset of them.
But all of that is handled by the driver, which allocates VRAM on the user's behalf in order to work with the configuration they've selected. Tegras, on the other hand, don't have dedicated VRAM, and the RAM the GPU uses has to be allocated and provided by the application, not the driver. The application also can't be on the hook to monitor for a change in GPU configuration and then spend time tearing down and recreating part of the pipeline because of it: pre-validation, repeated submission of the same data, and the performance you can squeeze out when you know exactly what your hardware config is are all key tenets of dedicated game console engineering and explicit design goals of NVN.
 
Yes you can disable and enable TPCs on the fly on Tegra chips, and Orin supports this. You can run work on the GPU, then disable TPCs and continue on merrily for at least some workloads, though I would imagine there is trickery in the various SDKs that support it.

Switching GPU configurations happens all the time in desktop land - users might have two GPUs and even switch between them mid work with no interruption. Hardcoding the number of SMs makes sense in a driver that is destined for exactly one piece of hardware, and just because you define the number of SMs in a constant doesn't mean you can't support using a subset of them.

However, I don't think that is how it will work. Unlike, say, a laptop with two GPUs, we're not going to switch to a more efficient piece of hardware. It's the same hardware, and clock gating is pretty much the way to go there. Unless docked mode is already running so close to the bottom of the power curve that there are no savings in coming down.
True, I forgot that laptops can already switch between iGPU and dGPU. I also agree on the difficulty. Using laptops again as an example, I think Windows lets users set which GPU to use per app, but there's no hot swap while an app is running, afaik. I can't imagine the complexity introduced when switching from docked to handheld, with parts moving around within the GPU and the game having to deal with that while changing its own configuration. It would be better off shutting down the GPU and then launching it again with the changed SM config.
 
It sounds like the same thing as the first to me, not sure what part makes you think frame N+2 comes into play?
The "two" comes from every other frame being generated by the frame generation network. I will doctor up some Nvidia diagrams to show what I mean.

I am using this convention to number frames, where the frame being generated is always frame N+1:

[diagram: frame numbering convention, with the generated frame always labeled N+1]


Interpolation: The super resolution network operates on frame N+2, which is the "current frame." Then, the OFA and frame generation network operate on frame N and frame N+2 to generate frame N+1.

[diagram: interpolation frame flow]


Extrapolation: The super resolution network operates on frame N, which is the "current frame." Then, the OFA and frame generation network operate on frame N and frame N-2 to generate frame N+1.***

[diagram: extrapolation frame flow]


***I am assuming that optical flow fields are only calculated from frames that are rendered by super resolution. You could also extrapolate frame N+1 by operating on frame N and frame N-1 if you wanted to.
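Here's the same convention written out as a small helper, in case the diagrams are unclear (my own sketch; the N-2/N/N+2 indices follow my numbering above, not any official Nvidia naming):

```python
# Which frames feed the OFA + frame generation network under each scheme,
# using the convention that the generated frame is always N+1.
def frame_gen_inputs(n: int, mode: str) -> dict:
    if mode == "interpolation":
        # Super resolution renders N+2 ("current frame"); N+1 is generated
        # between the two rendered frames N and N+2.
        return {"current_rendered": n + 2, "network_inputs": (n, n + 2), "generated": n + 1}
    if mode == "extrapolation":
        # Super resolution renders N ("current frame"); N+1 is predicted
        # forward from the two past rendered frames N-2 and N.
        return {"current_rendered": n, "network_inputs": (n - 2, n), "generated": n + 1}
    raise ValueError(f"unknown mode: {mode}")

print(frame_gen_inputs(10, "interpolation"))   # inputs (10, 12) -> frame 11
print(frame_gen_inputs(10, "extrapolation"))   # inputs (8, 10)  -> frame 11
```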
 
But all of that is handled by the driver, which allocates VRAM on the user's behalf in order to work with the configuration they've selected. Tegras, on the other hand, don't have dedicated VRAM, and the RAM the GPU uses has to be allocated and provided by the application, not the driver. The application also can't be on the hook to monitor for a change in GPU configuration and then spend time tearing down and recreating part of the pipeline because of it: pre-validation, repeated submission of the same data, and the performance you can squeeze out when you know exactly what your hardware config is are all key tenets of dedicated game console engineering and explicit design goals of NVN.
Yeah, you basically said everything I intended to say, better. Just because the SM count is a #define doesn't prohibit turning SMs on and off, but if you want to scale that way you need a big, thick OS/driver stack, or you have to have every developer handle it themselves. NVN is closer to the metal than Vulkan, and that's asking game devs to do a lot of work in-engine when you can just clock down.

I haven't looked at the NVN2 code, so the confirmation is nice. It just bugs me that it's considered a definitive impossibility; it just seems like an extremely bad idea.
 
I just can't wait for the Switch "Pro" to be announced. Nintendo using such tech is very exciting and I just want it announced haha
 
The "two" comes from every other frame being generated by the frame generation network. I will doctor up some Nvidia diagrams to show what I mean.
Yes, I just don't see why you infer interpolation from one paragraph and not the other, when they sound to me like rewordings of the same thing.

Illustration of why after-the-fact interpolation would be much less exciting for real-time stuff. Let's say these are a few successive frames as the game would be without any newly generated frames.
[diagram: original frame timing without frame generation]

If creating future frames from past frames, the "regular" frames can still appear when they did before, while the generated frames take up what would've been the second half of the original frame's display time.
[diagram: frame timing with extrapolated frames filling the second half of each interval]

But if it has to wait for a frame to be finished, then spend a non-zero amount of time producing an in-between frame before it can display even that, everything gets pushed back. More like
[diagram: frame timing with interpolated frames, with everything pushed back]

A pretty big cost, on top of the $1300 or whatever spent on the GPU.
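To put rough toy numbers on that push-back (purely illustrative; the 30fps base rate and the per-frame generation cost are assumptions, not measurements):

```python
# Toy timing model: rendered frames finish every `frame_time_ms`. With
# extrapolation the rendered frames keep their old display times; with
# interpolation the in-between frame can only be shown after the *next*
# rendered frame exists plus the generation cost, so everything shifts
# back by roughly half the original interval plus that cost.
def added_display_latency_ms(frame_time_ms: float, gen_cost_ms: float, mode: str) -> float:
    if mode == "extrapolation":
        return 0.0
    if mode == "interpolation":
        return frame_time_ms / 2 + gen_cost_ms
    raise ValueError(f"unknown mode: {mode}")

base = 1000 / 30  # ~33.3 ms per rendered frame at 30 fps
print(added_display_latency_ms(base, gen_cost_ms=3.0, mode="extrapolation"))  # 0.0
print(added_display_latency_ms(base, gen_cost_ms=3.0, mode="interpolation"))  # ~19.7 ms
```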
 
Would make sense to launch it with Tears of the Kingdom. However, I personally don't care what it comes out with, I just want it to come out haha
 
Yes, I just don't see why you infer interpolation from one paragraph and not the other, when they sound to me like rewordings of the same thing.
I mean, I was totally onboard with the extrapolative model for the last two days for that reason. You are right on both counts; it would be an actual improvement latency-wise and a cooler innovation. Two things started to bug me though:

First, using the words "sequential" and "intermediate" in the same sentence implies that the intermediate will be inserted between the two sequential frames. It could just be poor writing though, so that didn't bother me too much at first.

Second, and more important: those motorcycle shadows, which I didn't pay close attention to before. In the first image, notice how one of the shadows is ahead of the motorcycle and one is behind. The motorcycle and its rider have engine motion vectors, so they get warped to the correct intermediate position. But when DLSS tries to do the same for the RT shadows, which don’t have engine motion vectors, they are incorrectly warped with the motion vectors of the street geometry. One shadow (from the previous frame) is warped to a position behind the motorcycle; the other shadow (from the next frame) is warped to a position ahead of it, leading to that dual shadow artifact.

If it was an extrapolative model, wouldn't both of the shadows be warped to be behind the motorcycle? To me, this is an indicator that it might be interpolation.

Whereas the Optical Flow Accelerator accurately tracks pixel level effects such as reflections, DLSS 3 also uses game engine motion vectors to precisely track the movement of geometry in the scene. In the example below, game motion vectors accurately track the movement of the road moving past the motorcyclist, but not their shadow. Generating frames using engine motion vectors alone would result in visual anomalies like stuttering on the shadow.

[image: nvidia-dlss-3-without-optical-flow.jpg]


For each pixel, the DLSS Frame Generation AI network decides how to use information from the game motion vectors, the optical flow field, and the sequential game frames to create intermediate frames. By using both engine motion vectors and optical flow to track motion, the DLSS Frame Generation network is able to accurately reconstruct both geometry and effects, as seen in the picture below.

[image: nvidia-dlss-3-motion-optical-flow-estimation.jpg]
In summary:
[diagram: summary of the interpolation vs. extrapolation frame flows]


edited to clarify wording
 
I just had a play around with the Jetson power tool myself, and I'm getting quite different estimates than were posted before.

For the GPU, my methodology was to turn off the DVA/PVA, etc., set the CPU to a fixed value, and compare the power consumption with the GPU turned off entirely to the power consumption at each GPU clock speed for 4 TPC (8 SM) and 8 TPC (16 SM) configurations. Then for each clock, I took the mid-point of the 4 TPC and 8 TPC power consumption, and subtracted the power with the GPU turned off. I set the GPU load level to high for all these tests. I feel this is a reasonable way to isolate the GPU power consumption and estimate it in a 12 SM configuration.
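To make the arithmetic explicit, the per-clock estimate boils down to something like this quick sketch (the wattages here are placeholders, not the tool's actual output):

```python
# Estimate a 12 SM GPU by interpolating between the 8 SM (4 TPC) and
# 16 SM (8 TPC) readings, then subtracting the GPU-off baseline.
def estimate_12sm_power(p_gpu_off_w: float, p_8sm_w: float, p_16sm_w: float) -> float:
    midpoint = (p_8sm_w + p_16sm_w) / 2   # 12 SMs sit halfway between 8 and 16
    return midpoint - p_gpu_off_w         # isolate the GPU's own contribution

# Hypothetical readings at one clock: 10 W with the GPU off, 14 W at 8 SMs,
# 19.4 W at 16 SMs -> ~6.7 W attributed to a 12 SM GPU.
print(estimate_12sm_power(10.0, 14.0, 19.4))
```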

The power consumption figures I got for the GPU with 12 SMs are:

420.75MHz - 5.70W
522.75MHz - 6.80W
624.75MHz - 8.65W
726.75MHz - 10.85W
828.75MHz - 13.20W
930.75MHz - 16.35W
1032.75MHz - 19.90W
1236.75MHz - 29.35W

These probably overestimate the power consumption of an 8nm Drake GPU by a bit, as it uses smaller tensor cores, has fewer ROPs, and should have some small power savings by using only a single GPC rather than two. Still, I wouldn't expect the power consumption to be significantly lower than this. A few months ago I made some rough estimates of power consumption of Ampere on 8nm by measuring my RTX 3070 and scaling from there, and I got a figure of 16.1W for 12 SMs at 1155MHz. This was a very crude estimate (just take the reported power consumption from GPU-Z and divide by the number of SMs), and seems to come in a bit lower than we see above. I'd err on the side of trusting Nvidia's official tool for estimating Orin power consumption over my rough estimate based on a GPU 4x as big, though.

I believe the original Switch's GPU consumed somewhere around 3W in portable mode, which means we're already pushing things at the base clock of 420MHz. Even if Drake comes down to about 5W at 420MHz from the smaller tensor cores and other optimisations, that's still a big increase. It's impossible to say how power consumption behaves at lower clocks, and maybe they could squeeze it in at the same 384MHz clock as the original Switch, but my guess is if Nvidia doesn't clock below 420MHz in any configuration, it's because it's not efficient to do so. I'm still of the opinion that running a full 12 SM GPU on 8nm isn't feasible in handheld mode for a device the same size as the current Switch, which means either disabling SMs in handheld mode, or a different manufacturing process.

On the docked side of things, they're more limited by cooling than power consumption, and a 768MHz clock seems reasonable with a similar cooling set-up to the original Switch, if running a bit hotter. They could possibly push up higher to 900MHz-1GHz or so if they really wanted to, but it's more a question of Nintendo's tolerance for a more expensive cooling setup and/or more fan noise than a technical limitation.

For the CPU, as it's not possible to turn the CPU off entirely, I simply took the difference between the 4 core and 12 core CPU configuration for each clock, again with the CPU load level set to high. As a bit of a sense-check, I also checked the reported power consumption for each clock on an 8 core configuration, and validated that the difference in power draw between 4 and 8 cores was approximately the same as between 8 and 12 cores. This is true typically to within 0.1W, so I think it's reasonable to assume the power cost is linear with respect to the number of clusters activated (ie if going from 1 to 2 clusters adds the same amount of power consumption as moving from 2 to 3, then we can infer that going from 0 to 1 cluster of cores costs a similar amount).
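That sanity check amounts to the following arithmetic (again with placeholder wattages rather than actual tool output):

```python
# Check that the 4 -> 8 core step costs about the same as the 8 -> 12 core
# step; if so, the first cluster (0 -> 4 cores) can be assumed to cost a
# similar amount, and an 8 core figure is simply the 4 -> 12 core delta.
def cluster_deltas(p4_w: float, p8_w: float, p12_w: float, tol_w: float = 0.1) -> dict:
    d_4_to_8 = p8_w - p4_w
    d_8_to_12 = p12_w - p8_w
    return {
        "per_cluster_w": (d_4_to_8 + d_8_to_12) / 2,
        "roughly_linear": abs(d_4_to_8 - d_8_to_12) <= tol_w,
        "eight_core_estimate_w": p12_w - p4_w,  # power attributed to 8 cores
    }

# Hypothetical readings at one clock: 9.0 W (4 cores), 10.5 W (8), 12.1 W (12).
print(cluster_deltas(9.0, 10.5, 12.1))
```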

The figures I got for an 8 core CPU are as follows:

1113.6MHz - 2.2W
1267.2MHz - 2.5W
1497.6MHz - 3.1W
1651.2MHz - 3.8W
1728.0MHz - 4.1W
1881.6MHz - 4.9W
2035.2MHz - 5.8W
2188.8MHz - 7.1W

CPU power consumption on Drake might be a bit lower due to using a single cluster and potentially less cache, but I would expect only pretty small differences here. Compared to the GPU, though, I don't think there's as big an issue. The CPU on the original Switch consumed a bit under 2W, so Nintendo could clock a bit higher than the old 1GHz. While ~1.1GHz or so might not sound like much, an 8 core A78 clocked at 1.1GHz (with 7 cores for games) is still a huge increase over 4 A57 cores (with 3 for games) at 1GHz. If they push the power draw of the CPU up a bit (which I feel is more likely than increasing handheld GPU power draw), then at 3W they could get up to around 1.5GHz, which would put them in a pretty nice spot.

The CPU side of things is a bit better than I expected, to be honest. You're obviously not going to clock 8 A78s up to 2GHz+ on 8nm in Switch's form-factor, but they don't have to clock them down to oblivion to get them to work. It would still be comfortably more capable than PS4/XBO's CPUs in a portable device while on an old manufacturing process, which isn't a bad position to be in.
I just woke up, but I have to say that this doesn't make sense. Are blacktangmaster's numbers fake? Did you forget to turn something off? At the bottom of the console info he posted, it showed other power draws, so why is there such a big difference between your numbers and his?
 
What was said doesn't make sense anyway, there's no one point in time where "developers switch over to new hardware."
I was gonna say "tell that to PlayStation/Xbox devs" but then I thought about it for a second

..like I literally don't even know what to consider "current gen" anymore. 😅
 
must be some specific dev who got kits later than others
The actual claim is a little bonkers. Four months ago they were told by a developer with Switch dev kits that they were expecting around this time for the majority of devs to be provided with, and change over to, Switch Next kits.

So it’s not about individual developers but broad availability, and four months ago would have been May, and we know internal teams at Nvidia had hardware in April for Linux dev

Edit: also, the context in the podcast is discussing how leaks are bad, that they have no way of verifying it, they don’t want to be in the business of being gatekeepers for rumors, and devs say wild shit to them all the time they ignore, and that if they said it all there would be a rash of dumb articles written about their claims with no context (which we are in the first step of spreading around to make that happen)
 
The actual claim is a little bonkers. Four months ago they were told by a developer with Switch dev kits that they were expecting around this time for the majority of devs to be provided with, and change over to, Switch Next kits.

So it’s not about individual developers but broad availability, and four months ago would have been May, and we know internal teams at Nvidia had hardware in April for Linux dev

Edit: also, the context in the podcast is discussing how leaks are bad, that they have no way of verifying it, they don’t want to be in the business of being gatekeepers for rumors, and devs say wild shit to them all the time they ignore, and that if they said it all there would be a rash of dumb articles written about their claims with no context (which we are in the first step of spreading around to make that happen)
"leaks are bad....

but let me leak this shit"

lel
 

So you had to make an agreement a few years ago for 2022 and then here you are.

J[ensen] H[uang]: Yeah, exactly. You have to place orders a year-and-a-half in advance, and you're so short, the whole industry is so short, the whole market is so short, that you're more than delighted to do so. In our case, we are running almost a $10 billion dollar a quarter company and $10 billion a quarter, a year and a half, that's a lot of supply. I think the benefit of course, is that our company is really agile and the moment that we saw it, we have really great products and they're going to be terrific for a long time and we were fortunate to have just started a brand new product cycle. When we prepaid a whole bunch of capacity, it's for capacity of Ada and Hopper and ORIN and so on, so forth. We were fortunate in that regard, but the fundamental change is the cycle time of the world supply chain and that really has to get shortened.
 
If anyone wants a summary of what's going on: we are simply at a position where, wherever the final specs land, the Switch 2 will be a major upgrade over the original Switch. That's the bare minimum I can say at this point.

If we look at the article Dakhil posted above, with Nvidia securing capacity at least 1.5 years in advance, we can probably take a guess at when this happened for something like Drake.
 
For the GPU, my methodology was to turn off the DVA/PVA, etc., set the CPU to a fixed value, and compare the power consumption with the GPU turned off entirely to the power consumption at each GPU clock speed for 4 TPC (8 SM) and 8 TPC (16 SM) configurations. Then for each clock, I took the mid-point of the 4 TPC and 8 TPC power consumption, and subtracted the power with the GPU turned off. I set the GPU load level to high for all these tests. I feel this is a reasonable way to isolate the GPU power consumption and estimate it in a 12 SM configuration.

The power consumption figures I got for the GPU with 12 SMs are:

420.75MHz - 5.70W
522.75MHz - 6.80W
624.75MHz - 8.65W
726.75MHz - 10.85W
828.75MHz - 13.20W
930.75MHz - 16.35W
1032.75MHz - 19.90W
1236.75MHz - 29.35W

These probably overestimate the power consumption of an 8nm Drake GPU by a bit, as it uses smaller tensor cores, has fewer ROPs, and should have some small power savings by using only a single GPC rather than two. Still, I wouldn't expect the power consumption to be significantly lower than this. A few months ago I made some rough estimates of power consumption of Ampere on 8nm by measuring my RTX 3070 and scaling from there, and I got a figure of 16.1W for 12 SMs at 1155MHz. This was a very crude estimate (just take the reported power consumption from GPU-Z and divide by the number of SMs), and seems to come in a bit lower than we see above. I'd err on the side of trusting Nvidia's official tool for estimating Orin power consumption over my rough estimate based on a GPU 4x as big, though.

I believe the original Switch's GPU consumed somewhere around 3W in portable mode, which means we're already pushing things at the base clock of 420MHz. Even if Drake comes down to about 5W at 420MHz from the smaller tensor cores and other optimisations, that's still a big increase. It's impossible to say how power consumption behaves at lower clocks, and maybe they could squeeze it in at the same 384MHz clock as the original Switch, but my guess is if Nvidia doesn't clock below 420MHz in any configuration, it's because it's not efficient to do so. I'm still of the opinion that running a full 12 SM GPU on 8nm isn't feasible in handheld mode for a device the same size as the current Switch, which means either disabling SMs in handheld mode, or a different manufacturing process.

On the docked side of things, they're more limited by cooling than power consumption, and a 768MHz clock seems reasonable with a similar cooling set-up to the original Switch, if running a bit hotter. They could possibly push up higher to 900MHz-1GHz or so if they really wanted to, but it's more a question of Nintendo's tolerance for a more expensive cooling setup and/or more fan noise than a technical limitation.

For the CPU, as it's not possible to turn the CPU off entirely, I simply took the difference between the 4 core and 12 core CPU configuration for each clock, again with the CPU load level set to high. As a bit of a sense-check, I also checked the reported power consumption for each clock on an 8 core configuration, and validated that the difference in power draw between 4 and 8 cores was approximately the same as between 8 and 12 cores. This is true typically to within 0.1W, so I think it's reasonable to assume the power cost is linear with respect to the number of clusters activated (ie if going from 1 to 2 clusters adds the same amount of power consumption as moving from 2 to 3, then we can infer that going from 0 to 1 cluster of cores costs a similar amount).

The figures I got for an 8 core CPU are as follows:

1113.6MHz - 2.2W
1267.2MHz - 2.5W
1497.6MHz - 3.1W
1651.2MHz - 3.8W
1728.0MHz - 4.1W
1881.6MHz - 4.9W
2035.2MHz - 5.8W
2188.8MHz - 7.1W

CPU power consumption on Drake might be a bit lower due to using a single cluster and potentially less cache, but I would expect only pretty small differences here. Compared to the GPU, though, I don't think there's as big an issue. The CPU on the original Switch consumed a bit under 2W, so Nintendo could clock a bit higher than the old 1GHz. While ~1.1GHz or so might not sound like much, an 8 core A78 clocked at 1.1GHz (with 7 cores for games) is still a huge increase over 4 A57 cores (with 3 for games) at 1GHz. If they push the power draw of the CPU up a bit (which I feel is more likely than increasing handheld GPU power draw), then at 3W they could get up to around 1.5GHz, which would put them in a pretty nice spot.

The CPU side of things is a bit better than I expected, to be honest. You're obviously not going to clock 8 A78s up to 2GHz+ on 8nm in Switch's form-factor, but they don't have to clock them down to oblivion to get them to work. It would still be comfortably more capable than PS4/XBO's CPUs in a portable device while on an old manufacturing process, which isn't a bad position to be in.
No matter what happens.. I hope the CPU is at least 1.5GHz for 7 cores... 1GHz is just going to put it in the same power gap as Switch vs PS4, and it's gonna be last gen all over again (eventually).
Tears of the Cycle
Drake's Deception Despair: Tears of Samsung
 
Saw this on era

Post in thread 'Nintendo |OT| Sequel to the Nintendo Discussion Threads Of The Past'
https://www.resetera.com/threads/ni...sion-threads-of-the-past.570280/post-93733220



At 31:10

Something about the delivery of this rumour almost sounds like he's intentionally pulling it out of his ass out of spite, especially given the context of the video title about leakers that NEED to be STOPPED (granted, I don't have time to watch the whole video right now). Also, calling out "Can I be Nate now?" seems a little... jaded? Mean-spirited? idk, but even if it is real, it seems harmless enough that it still wouldn't affect a general announcement of the new hardware before 2023 anyway.

~~~

Just for my clarification re: the whole #nm discussion: whether it's 8nm, 7nm or 5nm, it's mostly going to affect the battery life, right? All the other components that determine how much power this device will have are established; it's just that the process node will determine the clock speeds and power use (or something like that).

Also, since others mentioned there isn't much in the way of die-shrinking an 8nm SoC, I'm wondering out loud if this is what would in particular cement it as a hardware refresh rather than a full-on successor. No need for a Switch Drake Lite, because whatever node Nintendo moves to in 2027 will be some 6nm or 5nm magic number, with a die shrink in 2030 for the Switch 2 and Switch 2 Lite.

(God I hope this makes sense; I really feel like I'm just tossing buzzwords around that I hope correctly communicate the thoughts I have lol)
 
Yuzu devs are allowed to (and do) have a buggy product. Nintendo doesn't have that luxury for a system that costs hundreds. That's all there is to it.
The Yuzu development team is also larger than NERD, isn't legally reusable by Nintendo, and based on SLOC alone is a 10+ million dollar project for the core emulator alone.
Yuzu is also supporting more hardware configs than can be reasonably counted by folks who have far less access to engineering data.

I mean, if that’s "all there is to it", then I guess we should consider all of Nintendo’s retrofits to pre-existing software to be a damned magic trick that defies this fact.

We're talking about a company that:
  • is only supporting one hardware config, and thus greatly simplifying the required work
  • has full and detailed access to the metal of that single hardware config
  • co-engineered that hardware config and its SDKs
  • has a demonstrable direct line of communication to and co-operative development relationship with Nvidia, one of the largest tech companies in the world and the engineers of some of the technology undoubtedly leveraged in part by Yuzu devs to make their solution possible in the absence of those same connections

And yet Nintendo are the ones who are woefully ill-equipped to engineer a better solution that isn't a bug-laden mess and do it with less labour based on those rather profound advantages? How does that make sense unless you believe both Nintendo and Nvidia to be bumblingly inept?
 
Yuzu is also supporting more hardware configs than can be reasonably counted by folks who have far less access to engineering data.

I mean, if that’s "all there is to it", then I guess we should consider all of Nintendo’s retrofits to pre-existing software to be a damned magic trick that defies this fact.

We're talking about a company that:
  • is only supporting one hardware config, and thus greatly simplifying the required work
  • has full and detailed access to the metal of that single hardware config
  • co-engineered that hardware config and its SDKs
  • has a demonstrable direct line of communication to and co-operative development relationship with Nvidia, one of the largest tech companies in the world and the engineers of some of the technology undoubtedly leveraged in part by Yuzu devs to make their solution possible in the absence of those same connections

And yet Nintendo are the ones who are woefully ill-equipped to engineer a better solution that isn't a bug-laden mess and do it with less labour based on those rather profound advantages? How does that make sense unless you believe both Nintendo and Nvidia to be bumblingly inept?
Nintendo's prior efforts are either:
-Hardware solutions that dodge the software problem entirely
-Emulated and tested on a per-game basis

Neither is feasible for what you're asking. There's a reason even the Wii VC was a drip-feed, back when Nintendo was still gung-ho about Virtual Console: there was a manpower constraint. Every game on every emulation service they provide is extensively tested. Yes, even N64 on NSO.

Yuzu's audience by nature is one that doesn't mind tinkering with resolution settings to dial in everything they play. It's simply not a good comparison for that reason. Raising the resolution can introduce bugs that have nothing to do with the hardware configuration and everything to do with the game logic itself. Game engines are notoriously messy and do weird shit that causes them to break for the stupidest reasons. Nintendo's audience is going to expect everything to just work. You can't do that by brute forcing higher resolutions, and especially frame rates, for everything.

The more relevant comparison than Yuzu is PS4 BC on the PS5. It's exactly what I called the best case scenario; some games (honestly, quite a few) were updated by the publishers to take advantage of the hardware, while everything else simply runs at or closer to the top end of their respective dynamic resolutions and frame rates. Even in Xbox land, where Microsoft is being rightfully lauded for their OG Xbox and 360 support, it's the same thing with Xbox One games on Series S|X. A selection runs at higher target resolutions and frame rates, and everything else just runs better within the constraints of their original targets.

Because that's how you do it. Those enhanced versions of OG and 360 games are released in batches because they're all tested with those enhancements. You're asking Nintendo to do the same thing for thousands of games at once. It's just not going to happen.

Yuzu cannot be a valid comparison because if you run into issues, it's acceptable. The difference in getting from there to a perfect solution for even one hardware configuration is utterly astronomical, assistance from the people who made it or no (it's not like AMD doesn't play nice with Microsoft and Sony either).

Edit: The missing link for you here seems to be that no amount of hardware/emulation/translation layer magic can fix faulty game code at a universal level. That is why they need testing. If, for whatever reason (usually related to crunch), a game does something that relies on the resolution not being higher than it's set to go, a perfect emulation of that at a higher resolution will result in a bug, because that is the hardware accurate thing to happen if you run the game at a higher resolution. I'm pretty confident that even Yuzu's devs would tell you what you're asking for isn't going to happen for this reason. I know the Dolphin devs would.
 
I just had a play around with the Jetson power tool myself, and I'm getting quite different estimates than were posted before.

For the GPU, my methodology was to turn off the DVA/PVA, etc., set the CPU to a fixed value, and compare the power consumption with the GPU turned off entirely to the power consumption at each GPU clock speed for 4 TPC (8 SM) and 8 TPC (16 SM) configurations. Then for each clock, I took the mid-point of the 4 TPC and 8 TPC power consumption, and subtracted the power with the GPU turned off. I set the GPU load level to high for all these tests. I feel this is a reasonable way to isolate the GPU power consumption and estimate it in a 12 SM configuration.

The power consumption figures I got for the GPU with 12 SMs are:

420.75MHz - 5.70W
522.75MHz - 6.80W
624.75MHz - 8.65W
726.75MHz - 10.85W
828.75MHz - 13.20W
930.75MHz - 16.35W
1032.75MHz - 19.90W
1236.75MHz - 29.35W

These probably overestimate the power consumption of an 8nm Drake GPU by a bit, as it uses smaller tensor cores, has fewer ROPs, and should have some small power savings by using only a single GPC rather than two. Still, I wouldn't expect the power consumption to be significantly lower than this. A few months ago I made some rough estimates of power consumption of Ampere on 8nm by measuring my RTX 3070 and scaling from there, and I got a figure of 16.1W for 12 SMs at 1155MHz. This was a very crude estimate (just take the reported power consumption from GPU-Z and divide by the number of SMs), and seems to come in a bit lower than we see above. I'd err on the side of trusting Nvidia's official tool for estimating Orin power consumption over my rough estimate based on a GPU 4x as big, though.

I believe the original Switch's GPU consumed somewhere around 3W in portable mode, which means we're already pushing things at the base clock of 420MHz. Even if Drake comes down to about 5W at 420MHz from the smaller tensor cores and other optimisations, that's still a big increase. It's impossible to say how power consumption behaves at lower clocks, and maybe they could squeeze it in at the same 384MHz clock as the original Switch, but my guess is if Nvidia doesn't clock below 420MHz in any configuration, it's because it's not efficient to do so. I'm still of the opinion that running a full 12 SM GPU on 8nm isn't feasible in handheld mode for a device the same size as the current Switch, which means either disabling SMs in handheld mode, or a different manufacturing process.

On the docked side of things, they're more limited by cooling than power consumption, and a 768MHz clock seems reasonable with a similar cooling set-up to the original Switch, if running a bit hotter. They could possibly push up higher to 900MHz-1GHz or so if they really wanted to, but it's more a question of Nintendo's tolerance for a more expensive cooling setup and/or more fan noise than a technical limitation.

For the CPU, as it's not possible to turn the CPU off entirely, I simply took the difference between the 4 core and 12 core CPU configuration for each clock, again with the CPU load level set to high. As a bit of a sense-check, I also checked the reported power consumption for each clock on an 8 core configuration, and validated that the difference in power draw between 4 and 8 cores was approximately the same as between 8 and 12 cores. This is true typically to within 0.1W, so I think it's reasonable to assume the power cost is linear with respect to the number of clusters activated (ie if going from 1 to 2 clusters adds the same amount of power consumption as moving from 2 to 3, then we can infer that going from 0 to 1 cluster of cores costs a similar amount).

The figures I got for an 8 core CPU are as follows:

1113.6MHz - 2.2W
1267.2MHz - 2.5W
1497.6MHz - 3.1W
1651.2MHz - 3.8W
1728.0MHz - 4.1W
1881.6MHz - 4.9W
2035.2MHz - 5.8W
2188.8MHz - 7.1W

CPU power consumption on Drake might be a bit lower due to using a single cluster and potentially less cache, but I would expect only pretty small differences here. Compared to the GPU, though, I don't think there's as big an issue. The CPU on the original Switch consumed a bit under 2W, so Nintendo could clock a bit higher than the old 1GHz. While ~1.1GHz or so might not sound like much, an 8 core A78 clocked at 1.1GHz (with 7 cores for games) is still a huge increase over 4 A57 cores (with 3 for games) at 1GHz. If they push the power draw of the CPU up a bit (which I feel is more likely than increasing handheld GPU power draw), then at 3W they could get up to around 1.5GHz, which would put them in a pretty nice spot.

The CPU side of things is a bit better than I expected, to be honest. You're obviously not going to clock 8 A78s up to 2GHz+ on 8nm in Switch's form-factor, but they don't have to clock them down to oblivion to get them to work. It would still be comfortably more capable than PS4/XBO's CPUs in a portable device while on an old manufacturing process, which isn't a bad position to be in.
So I messed around with the tool myself. I avoided all the on/off stuff because it wasn't what we were trying to figure out. Since we were looking for the max power draw, and given T234's configuration and architecture quirks (A78AE on 3 clusters instead of A78C on one cluster, different-generation tensor cores, among other things), I just picked a CPU clock, chose the minimum CPU core count (2), clicked the estimate power button, noted it, then turned the core count up to 10 and subtracted the noted number from the new one... This gives you a power draw greater than Drake's 8 A78C cores would have, because Orin's cores sit on 3 clusters instead of one, and the A78AE cores are bigger and have extra logic drawing power...

For the CPU, I set the clock to 1497MHz on 2 cores with a high load. The other settings don't matter because I'm isolating the CPU; the only number that matters is the difference between this estimate and the 10-core one. Base power draw here was 9.2W at whatever other settings I had with 2 cores at 1497MHz; with 10 cores it draws 12.4W at the same clock with a high load... that's 3.2W for the extra 8 cores at this clock. Remember, Drake should be more energy efficient since it's 1 cluster of smaller A78C cores without the extra logic...

Next we have the GPU. I did the same thing: 2 TPC (4 SM) at 420MHz with a high load showed 11W, and with 8 TPC at 420MHz it estimated 15.7W, meaning Orin's additional 12 SMs are drawing 4.7W here...

CPU: 8x A78AE at 1.5GHz for 3.2W on a high load, plus GPU: 12 SM at 420MHz for 4.7W on a high load, gives 7.9W... I'd suggest Orin's extras would cost maybe half a watt over Drake's configuration, so more like 7.4W on a high load for both the CPU and GPU at the above clocks.

Now it's time to look at Erista. See, we haven't been looking at high load; we've been looking at power consumption during Zelda: BotW, which we know gives around 3 hours of battery life and draws 7.1W at minimum screen brightness and 9W at maximum brightness. We know the battery is 4315mAh; at 5V that is ~21.5Wh, so at 7.1W you'd drain about the entire battery after 3 hours of playing Zelda. However, we know that other games have pushed Erista much harder, with battery life of only 2 hours, or near 11W power draw (though the screen was probably set somewhere in the middle). We know that during Zelda gameplay the SoC is between 5.3W and 5.5W on average, which means the maximum power draw for Erista's SoC is over 7W.
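(Quick sanity check of that battery math, using the figures stated above:)

```python
# Battery life = capacity (Wh) / average system draw (W), with capacity
# converted from mAh at the nominal voltage quoted above.
def runtime_hours(capacity_mah: float, nominal_v: float, draw_w: float) -> float:
    capacity_wh = capacity_mah * nominal_v / 1000.0   # 4315 mAh * 5 V ~= 21.6 Wh
    return capacity_wh / draw_w

print(round(runtime_hours(4315, 5.0, 7.1), 1))   # ~3.0 h, Zelda at min brightness
print(round(runtime_hours(4315, 5.0, 11.0), 1))  # ~2.0 h, heavier games
```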

With the numbers I've run, using the method above, and with the understanding that Orin isn't as efficient for gaming as Drake would be, these clocks are what make sense on 8nm:
CPU: 8*A78C at 1.5GHz
GPU: 12SM at 460MHz in portable and 768MHz in docked (read ahead)


I also wanted to give Orin's medium-load numbers, as these match up better with Zelda's gaming numbers. With the clocks above, Nvidia's power tool gives me 2.1W for the CPU and 4.2W for the GPU (it should be noted that the change in the GPU being so small when load is reduced probably means it's near the low end of the power curve). Because of the small change here, they might still look at Switch's 460MHz clock for portable, which would add around 0.5W at medium load.

This gives you 6.3W with Orin at those clocks and Drake's core count. Remember, you'd probably see a reduction of maybe half a watt moving to Drake, so ~5.8W, but with Switch's portable clock of 460MHz you'd be back up to ~6.3W.

Docked, with the GPU clocked at 726.75MHz, Orin's chip would draw 10W on a high load, and at 828.75MHz it draws 12.5W, so at Switch's docked 768MHz it should draw about 11W on a high load; at 768MHz on a medium load we're looking at just under 10W. Again, Drake should be more efficient here, so you could see a couple of tenths of a watt shaved off, suggesting this clock is valid for Drake as well, pushing the SoC up to only ~12W in docked mode at a medium load and ~14W at a high load.
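(For anyone checking, the ~11W figure at 768MHz is just a straight-line interpolation between the two clock points the tool exposes:)

```python
# Linear interpolation between the tool's 726.75 MHz and 828.75 MHz readings
# (high load) to approximate Switch's 768 MHz docked clock.
def lerp_power(clk, clk_lo, p_lo, clk_hi, p_hi):
    t = (clk - clk_lo) / (clk_hi - clk_lo)
    return p_lo + t * (p_hi - p_lo)

print(round(lerp_power(768, 726.75, 10.0, 828.75, 12.5), 1))  # ~11.0 W
```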

That's where I'm going with this. It's the simplest way to use the Nvidia power tool: all I'm doing is isolating what 8 Orin CPU cores add to the power draw, regardless of the extra architecture they carry, and the same for the GPU when adding 12 SMs. These figures are more power draw than Drake should see given Orin's added complexity, so it's safe to use them and reduce slightly.

The clocks I gave earlier were based on BlackTangMaster's use of Nvidia Power Tools and whatever method he mathed out, but I couldn't figure that out; what made sense to me was to just subtract the base power draw from the CPU and GPU configurations in isolation. The earlier clocks would probably work with Samsung 5nm, which Drake could very well be on; we don't know, and it's tough to guess the future. If we do see those higher clocks, I'd generally assume the chip was shrunk to a more advanced node. This is something Nintendo did with the Wii back in the day, moving from 130nm to 90nm from GameCube to Wii, and 5nm has been available for years, with Samsung losing major customers from that process node. I wouldn't write off Nvidia and Nintendo going with 5nm for Drake.
 
Nintendo's audience is going to expect everything to just work.
Just as a side note: the Nintendo user experience usually is impeccable. Apple undeservedly has the reputation that "it just works" (source: 12 years of cursing at MacBooks), but Nintendo is the company that deserves it. Every time I briefly return to PC gaming, I'm reminded of this.
 
Good morning folks, anything interesting over the night?

What is the general difference between early devkits and final ones? Memory?

It's been a long time since I was into this stuff, but I believe final devkits use finalized hardware and look "close" to the final product, compared to the early ones, which very often look like someone took random PC parts and somehow put them together.

Though there should still be differences between final devkit and retail units, most likely the amount of RAM. Devkits usually have way more of that.
 
Just as a side note: the Nintendo user experience usually is impeccable. Apple undeservedly has the reputation that "it just works" (source: 12 years of cursing at MacBooks), but Nintendo is the company that deserves it. Every time I briefly return to PC gaming, I'm reminded of this.
I think the Switch is probably the worst Nintendo's been at this, which is indeed pretty remarkable. I've certainly had more game crashes and weird finicky controller sync issues than on prior systems, but... they're still very few and very far between. It's completely smooth sailing for months (if not over a year) at a time.

Except for Joy-Con drift.
 
I just had a play around with the Jetson power tool myself, and I'm getting quite different estimates than were posted before.

For the GPU, my methodology was to turn off the DVA/PVA, etc., set the CPU to a fixed value, and compare the power consumption with the GPU turned off entirely to the power consumption at each GPU clock speed for 4 TPC (8 SM) and 8 TPC (16 SM) configurations. Then for each clock, I took the mid-point of the 4 TPC and 8 TPC power consumption, and subtracted the power with the GPU turned off. I set the GPU load level to high for all these tests. I feel this is a reasonable way to isolate the GPU power consumption and estimate it in a 12 SM configuration.

The power consumption figures I got for the GPU with 12 SMs are:

420.75MHz - 5.70W
522.75MHz - 6.80W
624.75MHz - 8.65W
726.75MHz - 10.85W
828.75MHz - 13.20W
930.75MHz - 16.35W
1032.75MHz - 19.90W
1236.75MHz - 29.35W

These probably overestimate the power consumption of an 8nm Drake GPU by a bit, as it uses smaller tensor cores, has fewer ROPs, and should have some small power savings by using only a single GPC rather than two. Still, I wouldn't expect the power consumption to be significantly lower than this. A few months ago I made some rough estimates of power consumption of Ampere on 8nm by measuring my RTX 3070 and scaling from there, and I got a figure of 16.1W for 12 SMs at 1155MHz. This was a very crude estimate (just take the reported power consumption from GPU-Z and divide by the number of SMs), and seems to come in a bit lower than we see above. I'd err on the side of trusting Nvidia's official tool for estimating Orin power consumption over my rough estimate based on a GPU 4x as big, though.

I believe the original Switch's GPU consumed somewhere around 3W in portable mode, which means we're already pushing things at the base clock of 420MHz. Even if Drake comes down to about 5W at 420MHz from the smaller tensor cores and other optimisations, that's still a big increase. It's impossible to say how power consumption behaves at lower clocks, and maybe they could squeeze it in at the same 384MHz clock as the original Switch, but my guess is if Nvidia doesn't clock below 420MHz in any configuration, it's because it's not efficient to do so. I'm still of the opinion that running a full 12 SM GPU on 8nm isn't feasible in handheld mode for a device the same size as the current Switch, which means either disabling SMs in handheld mode, or a different manufacturing process.

On the docked side of things, they're more limited by cooling than power consumption, and a 768MHz clock seems reasonable with a similar cooling set-up to the original Switch, if running a bit hotter. They could possibly push up higher to 900MHz-1GHz or so if they really wanted to, but it's more a question of Nintendo's tolerance for a more expensive cooling setup and/or more fan noise than a technical limitation.

For the CPU, as it's not possible to turn the CPU off entirely, I simply took the difference between the 4 core and 12 core CPU configuration for each clock, again with the CPU load level set to high. As a bit of a sense-check, I also checked the reported power consumption for each clock on an 8 core configuration, and validated that the difference in power draw between 4 and 8 cores was approximately the same as between 8 and 12 cores. This is true typically to within 0.1W, so I think it's reasonable to assume the power cost is linear with respect to the number of clusters activated (ie if going from 1 to 2 clusters adds the same amount of power consumption as moving from 2 to 3, then we can infer that going from 0 to 1 cluster of cores costs a similar amount).

The figures I got for an 8 core CPU are as follows:

1113.6MHz - 2.2W
1267.2MHz - 2.5W
1497.6MHz - 3.1W
1651.2MHz - 3.8W
1728.0MHz - 4.1W
1881.6MHz - 4.9W
2035.2MHz - 5.8W
2188.8MHz - 7.1W

CPU power consumption on Drake might be a bit lower due to using a single cluster and potentially less cache, but I would expect only pretty small differences here. Compared to the GPU, though, I don't think there's as big an issue. The CPU on the original Switch consumed a bit under 2W, so Nintendo could clock a bit higher than the old 1GHz. While ~1.1GHz or so might not sound like much, an 8 core A78 clocked at 1.1GHz (with 7 cores for games) is still a huge increase over 4 A57 cores (with 3 for games) at 1GHz. If they push the power draw of the CPU up a bit (which I feel is more likely than increasing handheld GPU power draw), then at 3W they could get up to around 1.5GHz, which would put them in a pretty nice spot.

The CPU side of things is a bit better than I expected, to be honest. You're obviously not going to clock 8 A78s up to 2GHz+ on 8nm in Switch's form-factor, but they don't have to clock them down to oblivion to get them to work. It would still be comfortably more capable than PS4/XBO's CPUs in a portable device while on an old manufacturing process, which isn't a bad position to be in.
It seems I made a mistake during my calculations. There definitely is a variation of at least 37.7W on the GPU side. I will have to redo my chart.
 
Just as a side note: the Nintendo user experience usually is impeccable. Apple undeservedly has the reputation that "it just works" (source: 12 years of cursing at MacBooks), but Nintendo is the company that deserves it. Every time I briefly return to PC gaming, I'm reminded of this.
I feel the same all the time when on my SteamDeck. Always having to figure out why shit isn't working and look things up online. Going back to Switch always feels like a relief: just back to immediately playing, without having to worry about all these minor issues.

Well, in the end everything has its pros and cons.
 
What is the general difference between early devkits and final ones? Memory?
Good morning folks, anything interesting over the night?



It's been a long time since I was into this stuff, but I believe final devkits use finalized hardware and look "close" to the final product, compared to the early ones, which very often look like someone took random PC parts and somehow put them together.

Though there should still be differences between final devkit and retail units, most likely the amount of RAM. Devkits usually have way more of that.
Bonejack pretty much nailed it, but I thought I'd elaborate a little more.

The time between hardware being finalized - including the chip design/testing - and release is usually under a year, perhaps under two quarters. Obviously, that's a lot shorter than it takes to develop a game. In order to have games ready for launch, early devkits exist as sort of weird proxies of the target performance level that usually utilize some sort of similar architecture. Generally, the reason these work is that they're far less efficient; whatever the new hardware is capable of was possible many years ago, just not in the Switch's form factor. You can think of them as purpose-built PCs designed to run at an estimate of the speed Nintendo is targeting.

But obviously, once the real thing is available, you need to validate that your game runs as expected on it. That's what final dev kits are for. They contain the actual Switch hardware running the actual OS, and are all that Nintendo provides once they have them ready. They can also contain a few extra bells and whistles that facilitate game testing (such as extra ports like network and wired controller ports), and across the board have extra RAM to run software debugging tools without taking away from the RAM available to the game. Switch dev kits have double the RAM, for example.
 
Thinking beyond the launch of this device, I’m very interested to see how/if the industry shifts if the power increase causes Nintendo to encroach upon the market share of the other 2 major platform holders. If this thing is able to reach near parity on major 3rd party ports like COD/Warzone, NBA2K, Madden, Assassin’s Creed, I wonder how many consumers will start to look at Drake as a truly viable Primary console.

I'm sure everyone in the industry has been keeping a close eye on it, given how the Switch moves software. And now we're seeing a bigger focus on PC handheld releases despite those existing for a while now. SteamDeck 2 is pretty much a given with how well the current iteration is doing.

It's gonna do great regardless, but it just seems like the Switch's success is a potential major plot twist for where the industry is headed. Trends that are taking over, like hardware agnosticism for software and the death of physical media, are things the Switch goes against, as it's extremely strong in those areas. Sure, technically Switch games can easily be played elsewhere, but people don't want to play them elsewhere. I don't really think the industry expected people to enjoy a hybrid experience so much, and they'll want to see what Drake does and whether they really need to make a move to serve that market as well.
 
Switch games will need a patch to take advantage of all of Drake's features, but dynamic resolutions and unstable/unlocked framerates will automatically improve if games simply have access to Drake hardware.
 