StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

That depends entirely on implementation. You could say the same thing about, say, RT cores, but in reality, engines like UE5 compile with hardware like that in mind to achieve the desired output, so I don't see DLSS implementations being much different in that respect.
Not saying it will be difficult to implement, but I think there would be a lot of disappointment if games like No More Heroes 3, Hyrule Warriors 2, or Arceus only perform slightly better because they aren't optimized at launch.

There is this notion around that a lot of games look and run badly on Switch mainly because it is weak hardware. While this is of course partially true, many games are also just badly optimized or have other issues (Pokémon, among other problems, has a really badly implemented art style), because it is more difficult and time-consuming to get ambitious games running well on less forgiving hardware.

I will already be excited if games with frame rate dips hold a steady 30fps, or if those with dynamic resolution stay at 1080p most of the time, but I could imagine some people expecting much more from unoptimized games.
 
The first time we saw the clocks (for docked mode) was when a participant in that discussion (blu?) who had a rooted Shield TV ran benchmarks and reported the sustained clocks, which were exactly the docked clocks that DF reported a few days later from their insider contacts. So this is an argument based on ignorance: we didn't know what the maximum sustainable clocks for the chip were, so we took Nvidia at its word on the advertised maximum and concluded that Nintendo was going to run those clocks for docked mode, because why wouldn't they?

Actually, some people were very into the matter (@Thraktor was very active there too). The Shield TV was running at 1.9-2GHz, and people knew it was throttling, and that we were talking about significantly more heat and power consumption at 2GHz for the CPU and 1GHz for the GPU.
So people logically assumed it would be lower than that, and since 1.5GHz is a more than sustainable clock for an ARM A57 CPU, people assumed it would be at least around 1.5GHz... and we got 1GHz in the end, much less than almost everyone assumed.
 
Watch Nintendo match 76.8 % of that. 2.3 TFLOPs "Please understand!"

5-6 TFLOPs for 39 watts for the whole system sounds way too good to be true. Especially considering the Steam Deck runs up to 30 watts for a 1.6 TFLOPs GPU max and maybe 3GHz for its 4-core CPU 🤔

It sounds too good to be true, but I can honestly see it happening. This whole thing has made me look at similar products on the market. The M1 Max MacBook hits PS5 TFLOPs at 100 watts, apparently. The Steam Deck unfortunately has a few things going against it: it could possibly be on an older node, and it has more power-hungry x86 CPU cores. If Drake actually moved to TSMC 5nm, I can see it performing that well.
 
Quick and Dirty Power Consumption Numbers:

Assuming 8nm and linear scaling from Orin numbers:

Nvidia says Orin's peak power consumption is 50W. Let's assume 1W per CPU core at 2GHz. So SoC power minus CPU power is 50 - 12 = 38W @ 1GHz GPU. This is quick and dirty, so I won't get into memory power or other contributors. So let's say the rest scales linearly with the GPU clock and SM count.

Going with Thraktor's clocks, we add 4W for 4 cores at 2GHz (or 8 at 1GHz).

Handheld mode:

Low-end: 6 SMs @ 400 MHz - 614 Gflops FP32

38W * (6/16) * (400/1000) = 5.7W GPU; +4W CPU = 9.7W total

High-end: 8 SMs @ 600 MHz - 1.2 Tflops FP32


38W * (8/16) * (600/1000) = 11.4W GPU; +4W CPU = 15.4W total

I think that power consumption is quite manageable. 9.7W is fine for the lower-end side, and 15.4W is about what the Deck consumes at recommended settings.

Docked mode:

Low-end: 10 SMs @ 768MHz - 1.97 Tflops FP32


38W * (10/16) * (768/1000) = 18.24W GPU; +4W CPU = 22.24W total

High-end: 12 SMs @ 1.2GHz - 3.68 Tflops FP32


38W * (12/16) * (1200/1000) = 34.2W GPU; +4W CPU = 38.2W total

38.2W is over what we would realistically see at the upper end, IMHO (~30W).

Now, this is a very rough estimation. I would love for someone more knowledgeable to do a better one.
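
For anyone who wants to poke at these numbers, here's a minimal sketch of the same linear scaling (the 38W remainder, the 4W CPU allowance, and the 16 SM / 1GHz Orin baseline are the assumptions from above, not measured figures):

```python
# Linear scaling of Orin's peak power figure, per the assumptions above:
# ~12W of Orin's 50W goes to 12 CPU cores at 2GHz, leaving ~38W for everything
# else at 16 SMs / 1GHz GPU. Voltage, memory and fixed platform power are ignored.

ORIN_REST_W = 38.0    # 50W peak minus ~12W of CPU
ORIN_SMS = 16
ORIN_GPU_GHZ = 1.0
CPU_BUDGET_W = 4.0    # 4 cores @ 2GHz, or 8 @ 1GHz

def estimate_system_power(sms, gpu_ghz):
    gpu_w = ORIN_REST_W * (sms / ORIN_SMS) * (gpu_ghz / ORIN_GPU_GHZ)
    return gpu_w, gpu_w + CPU_BUDGET_W

for label, sms, ghz in [("Handheld low", 6, 0.4), ("Handheld high", 8, 0.6),
                        ("Docked low", 10, 0.768), ("Docked high", 12, 1.2)]:
    gpu_w, total_w = estimate_system_power(sms, ghz)
    print(f"{label}: {gpu_w:.1f}W GPU-side, {total_w:.1f}W with CPU")
# Handheld: 5.7/9.7W and 11.4/15.4W; docked: 18.2/22.2W and 34.2/38.2W
```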
 
Quick and Dirty Power Consumption Numbers:

Assuming 8nm and linear scaling from Orin numbers:

Nvidia says Orin's peak power consumption is 50W. Let's assume 1W per CPU core at 2GHz. So SoC power minus CPU power is 50 - 12 = 38W @ 1GHz GPU. This is quick and dirty, so I won't get into memory power or other contributors. So let's say the rest scales linearly with the GPU clock and SM count.

Going with Thraktor's clocks, we add 4W for 4 cores at 2GHz (or 8 at 1GHz).

Low-end: 6 SMs @ 400 MHz - 614 Gflops FP32

38W * (6/16) * (400/1000) = 5.7W GPU; +4W CPU = 9.7W total

High-end: 8 SMs @ 600 MHz - 1.2 Tflops FP32


38W * (8/16) * (600/1000) = 11.4W GPU; +4W CPU = 15.4W total

We need to add 4W for 4 cores at 2GHz (or 8 at 1GHz).

I think that power consumption is quite manageable. 9.7W is fine for the lower-end side, and 15.4W is about what the Deck consumes at recommended settings.
The original Switch consumes what, around 9W in portable mode? I'm guessing they'll want to match that number for this, so the lower end number seems to make more sense. I can see the battery being denser but they'll probably want a minimum of 3 hours of battery life, so I don't see the power in portable mode being much higher than the original.

Drake should have a more efficient screen though, so they may have some wiggle room to increase the SoC power budget.
 
Actually, some people were very into the matter (@Thraktor was very active there too). The Shield TV was running at 1.9-2GHz, and people knew it was throttling, and that we were talking about significantly more heat and power consumption at 2GHz for the CPU and 1GHz for the GPU.
So people logically assumed it would be lower than that, and since 1.5GHz is a more than sustainable clock for an ARM A57 CPU, people assumed it would be at least around 1.5GHz... and we got 1GHz in the end, much less than almost everyone assumed.
Including Digital Foundry, who remarked that the CPU clock was shockingly lower than their expectations. They had devkit documents saying it would be between 1GHz and 2GHz, so people can't be faulted for expecting more. When it comes to the GPU clock, however, a reduction from 1GHz to 768MHz isn't bad; it's actually a realistic number we're running with right now, because the node shrink should put the power curve's sweet spot right around 800MHz.
 
I thought I'd do a quick round-up of what we know, and give some general idea of how big our margin of error is on the known and unknown variables on the new chip.

Chip

Codenamed Drake/T239. Related to Orin/T234. We don't have confirmation on the manufacturing process. The base assumption is 8nm (same as Orin), however kopite7kimi, who previously leaked info about the chip and said 8nm, is now unsure of the manufacturing process. The fact that the GPU is much larger than expected may also indicate a different manufacturing process, but we don't have any hard evidence. We also don't know the power consumption limits Nintendo have chosen for the chip in either handheld or docked mode, which will impact clock expectations.

GPU
This is what the leaks have been about so far, so we have much more detailed info here. In particular, on the die we have:

12 SMs
Ampere architecture with 128 "cores" per SM, and tensor performance comparable to desktop Ampere per SM. Some lower-level changes compared to desktop Ampere, but difficult to gauge the impact of those.
12 RT cores
No specific info on these, in theory they could have changes compared to desktop Ampere, but personally I'm not going to assume any changes until we have evidence.
4MB L2 cache
This is higher than would be expected for a GPU of this size (the most comparable part would be the laptop RTX 3050, with 2MB of L2). It's the same as the PS5 GPU's L2 and only a bit smaller than the XBSX GPU's 5MB. This should help reduce memory bandwidth requirements, but it's impossible to say exactly by how much. Note this isn't really an "infinity cache" (those range from 16MB to 128MB on AMD's 6000-series GPUs); it's just a larger-than-normal cache.

Things we don't know: how many SMs are actually enabled in either docked or handheld mode, clocks, ROPs.

Performance range in docked mode: It's possible that we could have a couple of SMs binned for yields, as this is a bigger GPU than expected. This would probably come in the form of disabling one TPC (two SMs), bringing it down to 10. Clocks depend heavily on the manufacturing process and whether Nintendo have significantly increased their docked power consumption over previous models. I'd expect clocks between 800MHz and 1GHz are most likely, but on the high end of expectations (better manufacturing process and higher docked power consumption) it could push as high as 1.2GHz. I doubt it will be clocked lower than the 768MHz docked clock of the original Switch, but that's not strictly impossible.

Low-end: 10 SMs @ 768MHz - 1.97 Tflops FP32
High-end: 12 SMs @ 1.2GHz - 3.68 Tflops FP32

Obviously there's a very big range here, as we don't know power consumption or manufacturing process. It's also important to note that you can't simply compare Tflops figures between different architectures.

Performance range in handheld mode: This gets even trickier, as Drake is reportedly the only Ampere GPU which supports a particular clock-gating mode, which could potentially be used to disable SMs in handheld mode. This makes sense, though, as peak performance per watt will probably be somewhere in the 400-600MHz range, so it's more efficient to, say, have 6 SMs running at 500MHz than all 12 running at 250MHz. Handheld power consumption limits are also going to be very tight, so performance will be very much limited by manufacturing process. I'd expect handheld clocks to range from 400MHz to 600MHz, but this is very dependent on manufacturing process and the number of enabled SMs.

One other comment to make here is that we shouldn't necessarily expect the <=2x performance difference between docked and handheld that we saw on the original Switch. That was for a system designed around 720p output in portable mode and 1080p output docked, however here we're looking at a 4K docked output, and either 720p or 1080p portable, so there's a much bigger differential in resolution, and therefore a bigger differential in performance required. It's possible that we could get as much as a 4x differential between portable and docked GPU performance.

Low-end: 6 SMs @ 400 MHz - 614 Gflops FP32
High-end: 8 SMs @ 600 MHz - 1.2 Tflops FP32

There is of course DLSS on top of this, but it's not magic, and shouldn't be taken as a simple multiplier of performance. Many other aspects like memory bandwidth can still be a bottleneck.
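
For reference, the Tflops figures above are just the standard Ampere arithmetic (128 FP32 lanes per SM, two FLOPs per lane per clock via FMA), e.g.:

```python
# FP32 throughput for an Ampere GPU: SMs x 128 FP32 lanes x 2 FLOPs (FMA) per clock.

def ampere_fp32_tflops(sms, clock_mhz):
    return sms * 128 * 2 * clock_mhz * 1e6 / 1e12

print(ampere_fp32_tflops(10, 768))   # ~1.97  (docked low-end)
print(ampere_fp32_tflops(12, 1200))  # ~3.69  (docked high-end)
print(ampere_fp32_tflops(6, 400))    # ~0.61  (handheld low-end)
print(ampere_fp32_tflops(8, 600))    # ~1.23  (handheld high-end)
```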

CPU

The assumption here is that they'll use A78 cores. That isn't strictly confirmed, but given Orin uses A78 cores, it would be a surprise if Drake used anything else. We don't know either core count or clocks, and again they will depend on the manufacturing process. The number of active cores and clocks will almost certainly remain the same between handheld and docked mode, so the power consumption in handheld mode will be the limiting factor.

For core count, 4 is the minimum for compatibility, and 8 is probably the realistic maximum. The clocks could probably range from 1GHz to 2GHz, and this will depend both on the manufacturing process and number of cores (fewer cores means they can run at higher clocks).

The performance should be a significant improvement above Switch in any case. In the lower end of the spectrum, it should be roughly in line with XBO/PS4 CPU performance, and at the high-end it would sit somewhere between PS4 and PS5 CPU performance.

RAM

Again, the assumption is that they'll use LPDDR5, based on Orin using it, and there not being any realistic alternatives (aside from maybe LPDDR5X depending on timing). The main question mark here is the bus width, which will determine the bandwidth. The lowest possible bus width is 64-bit, which would give us 51.2GB/s of bandwidth, and the highest possible would be 256-bit, which would provide 204.8GB/s bandwidth. Bandwidth in handheld mode would likely be a lot lower to reduce power consumption.
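
To make the bus-width arithmetic explicit, here's a quick sketch assuming standard LPDDR5 at 6400MT/s (LPDDR5X would land higher):

```python
# Peak LPDDR5 bandwidth = bus width in bytes x transfer rate.

def lpddr5_bandwidth_gbs(bus_width_bits, transfer_rate_mts=6400):
    return bus_width_bits / 8 * transfer_rate_mts / 1000  # GB/s

for width in (64, 128, 256):
    print(f"{width}-bit: {lpddr5_bandwidth_gbs(width):.1f} GB/s")
# 64-bit -> 51.2, 128-bit -> 102.4, 256-bit -> 204.8
```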

Quantity of RAM is also unknown. On the low end they could conceivably go with just 6GB, but realistically 8GB is more likely. On the high end, in theory they could fit much more than that, but cost is the limiting factor.

Storage

There are no hard facts here, only speculation. Most people expect 128GB of built-in storage, but in theory it could be more or less than that.

In terms of speeds, the worst case scenario is that Nintendo retain the UHS-I SD card slot, and all games have to support ~100MB/s as a baseline. The best case scenario is that they use embedded UFS for built-in storage, and support either UFS cards or SD Express cards, which means games could be built around a 800-900MB/s baseline. The potential for game card read speeds is unknown, and it's possible that some games may require mandatory installs to benefit from higher storage speeds.
Great summary of what is known/unknown and possible power ranges, thank you.
 
Very good summary, but a couple of things are off. For the portable range, you list 8 SMs as the maximum, but that assumes they disable SMs; the actual maximum would be 12 SMs with a ~500MHz clock. For all we know they went with TSMC 5nm for Drake, as the original chip was codenamed Dane, and while T239 is shared between both versions of the chip, we do not know what was changed.

Storage could also be a mix of standard microSD cards and UFS storage, which has a much higher speed.

As for RAM, there are cell phones today with 12GB-16GB, and given that the XBSS has 10GB, I do think 8GB would be enough and is the most likely result, but it shouldn't be taken for granted; just last week I think we would both have agreed that 12 SMs for the GPU seemed outlandish and above the maximum we should expect.

I think Drake has exceeded our expectations so far. We will have to wait and see what other surprises Nintendo has in store for us, but it doesn't seem like they would realistically pair a 12 SM GPU with a 4-core CPU; while I agree 4 cores is the minimum, I think 6 cores is much more likely given the cache of the A78C cores and what they might be doing to mitigate memory bandwidth.

Yeah, if it's TSMC N5, then conceivably they could run all 12 SMs in portable mode, but I didn't think that was likely enough to include.

For storage, I'm more thinking about how it would impact games, where the minimum guaranteed speeds are what developers have to build around. We could have a situation where there's 900MB/s UFS internal storage and still 100MB/s MicroSD cards, in which case we get faster loading on games installed internally, but developers who are looking at asset streaming will have to build around a 100MB/s baseline.

On RAM, 8GB was just a minimum. It's a more open one, because from a theoretical perspective, there's nothing stopping them putting 64GB in there, aside from cost. I'd say 8GB or maybe 12GB is likely, with 16GB an outside possibility.

Edit:
Do you personally believe it's impossible that all 12 SMs are active in handheld mode, assuming this is still 8nm? Would the power draw be simply unrealistically high?

There was some speculation about the clock gating being specific to this device in order to allow one TPC to be active during standby/sleep mode occasionally.

If it's still 8nm, I'd be surprised if there are more than 6 SMs active in handheld mode. However, my assumption has always been that this is a device with the same form factor as the original Switch. Maybe that's wrong, though. If this is a larger or thicker device (which would negate the "uses the same dock as the OLED model" theory), then conceivably there could be a much bigger battery and therefore higher power draw in portable mode.
 
Yeah, if it's TSMC N5, then conceivably they could run all 12 SMs in portable mode, but I didn't think that was likely enough to include.

For storage, I'm more thinking about how it would impact games, where the minimum guaranteed speeds are what developers have to build around. We could have a situation where there's 900MB/s UFS internal storage and still 100MB/s MicroSD cards, in which case we get faster loading on games installed internally, but developers who are looking at asset streaming will have to build around a 100MB/s baseline.

On RAM, 8GB was just a minimum. It's a more open one, because from a theoretical perspective, there's nothing stopping them putting 64GB in there, aside from cost. I'd say 8GB or maybe 12GB is likely, with 16GB an outside possibility.

If it's still 8nm, I'd be surprised if there are more than 6 SMs active in handheld mode. However, my assumption has always been that this is a device with the same form factor as the original Switch. Maybe that's wrong, though. If this is a larger or thicker device (which would negate the "uses the same dock as the OLED model" theory), then conceivably there could be a much bigger battery and therefore higher power draw in portable mode.
Do you have an estimate of how many watts you'd think a 12 SM 8nm GPU would draw at base Switch clocks? I'm just trying to get a sense of the power requirements here; a lot of people keep talking about what is or isn't feasible, but I'm unsure whether we have any real-world power numbers to work from or if these are assumptions.
 
And it won't use 39 Watts for the whole system docked. It will be way less. 39W is probably the worst-case scenario, where the Switch will be playing while recharging.
I think we're focusing too much on the 39 Watts thing.

Assuming the OLED dock is compatible with the Super Switch, why now? Are they going to sell a cheaper version without the dock for people who already own one? If they do, would that cause confusion for average folks who don't follow tech? Like, a parent buys a Switch for their kids, and they just bought the cheaper one thinking their dock also works, but their system is an older model.
 
I think we're focusing too much on the 39 Watts thing.

Assuming the OLED dock is compatible with the Super Switch, why now? Are they going to sell a cheaper version without the dock for people who already own one? If they do, would that cause confusion for average folks who don't follow tech? Like, a parent buys a Switch for their kids, and they just bought the cheaper one thinking their dock also works, but their system is an older model.
I think the idea is that they would build up their production lines for the OLED dock and some of the OLED components like the screen, and then reuse much of that for the Drake model, so that by the time the SoC is in mass production they'd have very few other bottlenecks in their supply chain.
 
I think we're focusing too much on the 39 Watts thing.

Assuming the OLED dock is compatible with the Super Switch, why now? Are they going to sell a cheaper version without the dock for people who already own one? If they do, would that cause confusion for average folks who don't follow tech? Like, a parent buys a Switch for their kids, and they just bought the cheaper one thinking their dock also works, but their system is an older model.
The assumption is that Nintendo might re-use some OLED components (like the screen and dock), or maybe allow people to use their OLED docks.
 
1.4 TFlops portable (same 460MHz clock as the current Switch portable) and 2.3 TFlops docked (same 768MHz clock as the current Switch docked) would still be amazing, bro. And it won't use 39 Watts for the whole system docked. It will be way less. 39W is probably the worst-case scenario, where the Switch will be playing while recharging.
I quoted someone mentioning 5-6 TFLOPs. 5-6 TFLOPs at 39 watts docked would be too good to be true. And yes, that would be while it's charging with Joy-Cons attached.

But yes, 1.4 TFLOPs at 10 watts is also pretty crazy on an 8nm Samsung node. We would really need something like 6 or 5nm. They might go lower than that, to 307MHz and around 900 GFLOPs, which I could see at 10 watts, and perhaps 1.4 TFLOPs at 15 watts (if Nintendo dares that). I dunno, maybe they won't use the 10, 15 and 25 watt performance profiles at all. But perhaps 2.3 TFLOPs is doable at 25 watts for the whole system with a 768MHz GPU.
 
I quoted someone mentioning 5-6 TFLOPs. 5-6 TFLOPs at 39 watts docked would be too good to be true. And yes, that would be while it's charging with Joy-Cons attached.

But yes, 1.4 TFLOPs at 10 watts is also pretty crazy on an 8nm Samsung node. We would really need something like 6 or 5nm. They might go lower than that, to 307MHz and around 900 GFLOPs, which I could see at 10 watts, and perhaps 1.4 TFLOPs at 15 watts (if Nintendo dares that). I dunno, maybe they won't use the 10, 15 and 25 watt performance profiles at all. But perhaps 2.3 TFLOPs is doable at 25 watts for the whole system with a 768MHz GPU.
I don't think 1.4 TFLOPs is realistic at 10W on 8nm. At 10W we're around 700-1000 GFLOPs on 8nm.
 
If portable mode's practical top end is constrained by the screen (720p, 60Hz) even more than by battery/power draw, then at some point more GPU horsepower is mostly meaningless, no? Especially with the availability of DLSS.

I imagine that theoretical threshold is quite a bit below Thraktor's "Portable Maximum" scenario, but of course this requires a heavy assumption on the screen res/refresh rate.
 
I think the idea is that they would build up their production lines for the OLED dock and some of the OLED components like the screen, and then reuse much of that for the Drake model, so that by the time the SoC is in mass production they'd have very few other bottlenecks in their supply chain.
The assumption is that Nintendo might re-use some OLED components (like the screen and dock), or maybe allow people to use their OLED docks.

Alright, fair point.

Here's the other thing: aren't we being overly optimistic about the wattage? Like, would the system really use 39 watts docked? Doesn't some of the wattage have to be reserved for charging and the USB ports? So we really wouldn't be feeding the system all 39 watts.
 
If portable mode's practical top end is constrained by the screen (720p, 60Hz) even more than by battery/power draw, then at some point more GPU horsepower is mostly meaningless, no? Especially with the availability of DLSS.

I imagine that theoretical threshold is quite a bit below Thraktor's "Portable Maximum" scenario, but of course this requires a heavy assumption on the screen res/refresh rate.

Yes... but I see DLSS being thrown around in ways that I think misunderstand what it is (not your post); a lot of people around the internet suddenly exposed to it seem to think it's just a massive general-purpose power boost.

DLSS offloads the overhead pertaining to IQ aspects of graphical fidelity, namely resolution and things like AA for the higher-res image, to the tensor cores. It's such a performance booster because increasing resolution is a demand multiplier for everything related to pixels. So you can have the same scene, with the same geometry, the same number of characters on screen, the same number of texture tiles and layers, the same per-pixel lighting, everything the same, and simply increasing the resolution from 1080p to 4K requires a lot of shader core power... even though everything else is still exactly the same. DLSS just takes that same base 1080p image and offloads getting it up to 4K to somewhere else on the die. (Except, IIRC, post-processing effects? Anyone know if I'm remembering that right?)

It won't do diddly squat for actual lighting, geometry, and effects, everything needed to actually build that virtual 3D space. You still need actual GPU horsepower for that. So I guess the 'yes' comes down to this: if you are satisfied with what your GPU can produce at N resolution, the quality and quantity of geometry, textures, lighting, effects, etc., and you feel that attains parity or acceptable levels versus your competition or whatever goal you have set, THEN yeah, there is no point in increasing GPU horsepower, as you can offload IQ/resolution to DLSS to go the rest of the way.
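
To put a rough number on that demand multiplier (just pixel counts, ignoring everything that doesn't scale with resolution):

```python
# Per-pixel work scales with pixel count; geometry and simulation do not.
pixels_1080p = 1920 * 1080       # 2,073,600
pixels_4k    = 3840 * 2160       # 8,294,400
print(pixels_4k / pixels_1080p)  # 4.0 -> native 4K is ~4x the pixel work of 1080p
```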
 
Alright, fair point.

Here's the other thing: aren't we being overly optimistic about the wattage? Like, would the system really use 39 watts docked? Doesn't some of the wattage have to be reserved for charging and the USB ports? So we really wouldn't be feeding the system all 39 watts.
Well, the main thing is that the OLED model's dock can likely feed the Drake Switch at least 25 watts, if not 30W.
 
Alright, fair point.

Here's the other thing: aren't we being overly optimistic about the wattage? Like, would the system really use 39 watts docked? Doesn't some of the wattage have to be reserved for charging and the USB ports? So we really wouldn't be feeding the system all 39 watts.
The point people are making from the increased wattage of the dock is that the next machine will use more energy while docked than the OG Switch does. The OG Switch dock allowed up to 18/9W, and the OG Switch used 11W fully charged on the dock and 15W fully discharged. If the OLED model's dock allows up to 39W, that doesn't mean the next machine will use 39W, as the dock also has to supply energy for the ports, charge the Switch, the Joy-Cons, you get it. But it means the next machine might use more energy (15-20W) while docked. That suggests Nintendo might be clocking this next machine higher while docked, to give an insane boost in docked performance.
 
The point people are making from the increased wattage of the dock is that the next machine will use more energy while docked than the OG Switch does. The OG Switch dock allowed up to 18/9W, and the OG Switch used 11W fully charged on the dock and 15W fully discharged. If the OLED model's dock allows up to 39W, that doesn't mean the next machine will use 39W, as the dock also has to supply energy for the ports, charge the Switch, the Joy-Cons, you get it. But it means the next machine might use more energy (15-20W) while docked. That suggests Nintendo might be clocking this next machine higher while docked, to give an insane boost in docked performance.

I know, but I'm looking at Pedro's numbers and his most optimistic numbers get to around 39 watts.
 
The hackers said they would dump the Nvidia data they stole today if Nvidia didn't meet their demands. Nvidia didn't, and we're waiting. Haven't seen anything so far. It's possible that they might not release anything and it was just a bluff. If nothing is released, our best bet for new info will be this year's GDC.
Thanks.
I wish we didn't have to rely on outlaw hackers 😑
 
The hackers said they would dump the Nvidia data they stole today if Nvidia didn't meet their demands. Nvidia didn't, and we're waiting. Haven't seen anything so far. It's possible that they might not release anything and it was just a bluff. If nothing is released, our best bet for new info will be this year's GDC.
Looking like a bluff....
 
Do you have an estimate of how many watts you'd think a 12 SM 8nm GPU would draw at base Switch clocks? I'm just trying to get a sense of the power requirements here; a lot of people keep talking about what is or isn't feasible, but I'm unsure whether we have any real-world power numbers to work from or if these are assumptions.
Not exactly, but my thought process is that, with 6x as many SMs as the TX1, this new chip would have to be 6x as efficient to run them at the same clocks. I would expect Ampere on 8nm to be quite a bit more efficient than Maxwell on 20nm, but not to the point of providing 6 times the performance per Watt.

I'm going to see if I can get some measurements from my RTX 3070. Obviously it's a much bigger GPU, and simply dividing by the number of SMs isn't going to translate exactly, and I don't think most of the tools like MSI Afterburner actually allow you to set clocks as low as 400MHz or 500MHz, but I'll play around with it and see if I can get any rough numbers for power consumption.
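
As a rough sketch of how that scaling-down would work, with placeholder inputs (the 120W board figure and 40W memory/board overhead below are purely illustrative, not measurements):

```python
# Naively scale a measured desktop Ampere power figure by SM count and clock.
# Real power falls faster than linearly as clocks (and voltage) drop, and the
# process node differs, so this will tend to overestimate low-clock power.

RTX_3070_SMS = 46   # GA104 SM count
DRAKE_SMS = 12      # per the leak

def scale_gpu_power(board_w, mem_and_board_w, measured_mhz, target_mhz):
    core_w = board_w - mem_and_board_w
    per_sm_per_mhz = core_w / (RTX_3070_SMS * measured_mhz)
    return per_sm_per_mhz * DRAKE_SMS * target_mhz

# Placeholder example: 120W board power at 1500MHz, of which ~40W is assumed
# to be GDDR6 + board overhead.
print(scale_gpu_power(120, 40, 1500, 460))  # ~6.4W for 12 SMs @ 460MHz (illustrative only)
```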
 
Looking like a bluff....
It is not a bluff.
They freaking released the drivers and all that earlier.

They could be waiting until tonight as a "Friday is your last day" thing and we just misinterpreted it shrug.

Can't tell with Hackers.
 
It is not a bluff.
They freaking released the drivers and all that earlier.

They could be waiting until tonight as a "Friday is your last day" thing and we just misinterpreted it shrug.

Can't tell with Hackers.
Oh I know they have the info. I meant a bluff as far as releasing it today. They may refine their demands or issue another ultimatum or something instead of giving up all their leverage today.
 
Would it require major engine work to properly utilize Ampere?

I mean, famously, Ampere is a significant downgrade over Turing in gaming performance per flop, because the architecture is more compute-heavy than it is rasterization-efficient.

Here is a Reddit post detailing the issue.

 
Would it require major engine work to properly utilize Ampere?

I mean, famously, Ampere is a significant downgrade over Turing in gaming performance per flop, because the architecture is more compute-heavy than it is rasterization-efficient.

Here is a Reddit post detailing the issue.


Yeah, it seems that the root cause of that comes from all those shader cores that Ampere introduced being starved of bandwidth.

NVIDIA seems to be addressing that with the absolutely massive L1 and L2 caches in Lovelace and in Orin/Drake, and we can see this to a lesser extent in Ampere itself, where jumping from GDDR6 to GDDR6X increases performance by a decent bit because the extra bandwidth feeds the cores better.
 
I agree it should be a very good machine in any case; I just think people should lower their expectations compared to the more optimistic ones, especially when it comes to clocks.

I would say that BC is certain in any case.




I'm talking about general expectations based on credible rumors and leakers, not about troll comments ("it will be a little stronger than a Vita", "it will sell like the Wii U"...).
Before the Switch reveal, based on rumors (on NeoGAF; plenty of people from here were also there back then), most people expected performance similar to the XB1.
For instance, I also remember that it was generally expected the Switch's ARM CPU would be clocked at 1.5GHz only a few months before the Switch launched,
and then the disappointment a few months later when it turned out to be 1GHz.




I don't think that's the reason; I mean, you could also say that Nvidia's GPU architecture is more efficient. People simply forget about memory bandwidth, or don't like to compare the next Switch's memory bandwidth with the PS4/PS4 Pro/XSS.

The new Switch will almost certainly have a stronger CPU, and probably a stronger GPU even without DLSS, compared to the PS4,
but it will most likely have weaker memory bandwidth than the PS4, not to mention the PS4 Pro or XSS.
Not at all; the memory bandwidth situation has been discussed to death numerous times before, and the conclusion everyone agrees on is that more cache can help reduce or mitigate memory bandwidth requirements. We already know Nintendo isn't a stranger to more expensive memory and has previously gone the route of mitigating the bandwidth issue by using embedded memory. It doesn't eliminate it, but it reduces it.

While it may only have 88-102GB/s from the memory modules themselves (LPDDR5) across portable and docked modes, the large L2 cache relative to the number of shaders leaves it in a much more favorable spot than the PS4, and by extension the PS4 Pro, which had very inefficient architectures in terms of memory bandwidth utilization compared to Nvidia's offerings, especially Ampere. The Series S is RDNA2, but it lacks the feature that helps greatly with memory efficiency, the Infinity Cache (an L3 cache pool); that isn't to say it isn't more efficient than the GCN architecture, but Nvidia's offerings are still more efficient with memory bandwidth requirements.

Having a large L2 relative to the number of shaders greatly helps it be less constrained by memory bandwidth, despite there most likely being only 88-102GB/s. The exact figure is unknown, but considering the closest comparison is the PS4 (the lowest common denominator), it should be able to perform equally or a bit better with the memory bandwidth it has available.


In essence, it is doing more with less, or rather doing about the same with less. And it is going a route Nintendo isn't a stranger to with respect to bandwidth and memory; in fact, their philosophy with their own architectures seems to be to take the more efficient route to a result rather than just cramming everything in and cranking it all to the max to get the same result.


Of course, others do that too, but Nintendo goes an extra level above all else.


It won't do diddly squat for actual lighting, geometry, and effects, everything needed to actually build that virtual 3D space. You still need actual GPU horsepower for that. So I guess the 'yes' comes down to this: if you are satisfied with what your GPU can produce at N resolution, the quality and quantity of geometry, textures, lighting, effects, etc., and you feel that attains parity or acceptable levels versus your competition or whatever goal you have set, THEN yeah, there is no point in increasing GPU horsepower, as you can offload IQ/resolution to DLSS to go the rest of the way.
Considering the mesh shaders, the geometry part is at least mitigated or reduced. And considering people are really happy with and impressed by PS4-tier games, which should be quite feasible on this device in theory, I think it'll be okay to really good for most people ;).

So, even if it has the fidelity of a lower-powered machine, it would still look pretty great to many. Though this is stepping into diminishing-returns territory.
 
Not at all; the memory bandwidth situation has been discussed to death numerous times before, and the conclusion everyone agrees on is that more cache can help reduce or mitigate memory bandwidth requirements. We already know Nintendo isn't a stranger to more expensive memory and has previously gone the route of mitigating the bandwidth issue by using embedded memory. It doesn't eliminate it, but it reduces it.

While it may only have 88-102GB/s from the memory modules themselves (LPDDR5) across portable and docked modes, the large L2 cache relative to the number of shaders leaves it in a much more favorable spot than the PS4, and by extension the PS4 Pro, which had very inefficient architectures in terms of memory bandwidth utilization compared to Nvidia's offerings, especially Ampere. The Series S is RDNA2, but it lacks the feature that helps greatly with memory efficiency, the Infinity Cache (an L3 cache pool); that isn't to say it isn't more efficient than the GCN architecture, but Nvidia's offerings are still more efficient with memory bandwidth requirements.

Having a large L2 relative to the number of shaders greatly helps it be less constrained by memory bandwidth, despite there most likely being only 88-102GB/s. The exact figure is unknown, but considering the closest comparison is the PS4 (the lowest common denominator), it should be able to perform equally or a bit better with the memory bandwidth it has available.

In essence, it is doing more with less, or rather doing about the same with less. And it is going a route Nintendo isn't a stranger to with respect to bandwidth and memory; in fact, their philosophy with their own architectures seems to be to take the more efficient route to a result rather than just cramming everything in and cranking it all to the max to get the same result.

Of course, others do that too, but Nintendo goes an extra level above all else.

Considering the mesh shaders, the geometry part is at least mitigated or reduced. And considering people are really happy with and impressed by PS4-tier games, which should be quite feasible on this device in theory, I think it'll be okay to really good for most people ;).

So, even if it has the fidelity of a lower-powered machine, it would still look pretty great to many. Though this is stepping into diminishing-returns territory.
Yeah, I feel pretty optimistic about 102GB/s with Ampere's/Orin's improved architectural efficiency. Not to mention:
1. Even more so with the increased cache.
2. We have DLSS to boot!

The Switch was doing surprisingly well with some ports at 25GB/s. I think I remember @Z0m3le saying (on ResetEra) that GPU cards around Xbox One tier with 50GB/s of bandwidth were holding their own in performance against the Xbox One and PS4 in ports.

I'm really excited to see Doom 2016 and Eternal performance in handheld mode on a Switch 2. We went from 600p 30fps docked performance with worse textures and models, vs 900p-1080p 60fps on the Xbox One and PS4 versions... which sounds pretty insane, but they were really pushing the Switch's bottlenecks (CPU and bandwidth), as well as the GPU being 2.5-3.5x weaker on Switch (taking mixed-precision mode into account, as a rough ballpark).

At least it shouldn't have a problem running Switch games at native 4K, and that's without DLSS, if we do get those SMs.

I'm not expecting PS4-level game assets at 4K. Maybe not even at native 2K without DLSS.

It's going to be interesting. I'll be happy with just Xbox One performance in handheld mode (720p 60fps Doom). But I hope that doesn't become outdated too soon, so 2-3 TFLOPs of docked performance without DLSS would future-proof it.


Anyway, what do you guys expect the CPU to be in relation to the current-gen PS5? We've been through this before, but I heard performance per clock is about the same on the A78 vs the PS5/XSS. Maybe a 3x gap like Switch vs PS4 if we get 8 cores clocked at 1.2GHz, with 7 available to games.
1.5GHz would put it closer to a 2x gap...
 
Not at all; the memory bandwidth situation has been discussed to death numerous times before, and the conclusion everyone agrees on is that more cache can help reduce or mitigate memory bandwidth requirements.

This isn't a "Switch" problem anyway - this is a whole industry problem. Fat buses are expensive, but the trend is toward more parallelism. Big, fat caches over slightly constrained busses are going to be the norm.
 
The potential 4MB L2 cache is only for the GPU, right? If I remember correctly, the PS4 and PS4 Pro have 2x 2MB L2 caches shared between the CPU and GPU. Hopefully the CPU has its own L2 cache, as that would be a big upgrade over the last-gen architecture.
 
The potential 4MB L2 cache is only for the GPU, right? If I remember correctly, the PS4 and PS4 Pro have 2x 2MB L2 caches shared between the CPU and GPU. Hopefully the CPU has its own L2 cache, as that would be a big upgrade over the last-gen architecture.
Well, yeah, the 4MB of L2 (sort of confirmed in the NVN2 driver leak) is just for the GPU, which is a massive boost in effective bandwidth for Drake over the PS4; the A78/A78C cores will have their own cache to themselves, so it could theoretically have double the cache of the PS4.
 
Looks like Lapsus$ is releasing Samsung info today instead of more Nvidia hardware info. IIRC Zombie said Samsung got contracts for multiple components of this new Switch?

Edit: Looking over their list, I doubt we'll see anything relating to the topic.
 
Looks like Lapsus$ is releasing Samsung info today instead of more Nvidia hardware info. IIRC Zombie said Samsung got contracts for multiple components of this new Switch?

Edit: Looking over their list, I doubt we'll see anything relating to the topic.
What do you mean their list?

They did say that they had 250GB of "Hardware Product Information, both of current and unreleased products"
 
The potential 4MB L2 cache is only for the GPU, right? If I remember correctly, the PS4 and PS4 Pro have 2x 2MB L2 caches shared between the CPU and GPU. Hopefully the CPU has its own L2 cache, as that would be a big upgrade over the last-gen architecture.
IIRC it's 2MB of L2 cache per CPU cluster, and it had two 4-core clusters; the GPU has its own L2 cache, but it's a lot smaller.


I’m gonna give an estimate on the amount they probably have based on the GCN architecture:

PS4: 288KB L1$/ 256-512KB L2$

XBox One: 192KB L1$/ 256KB L2$

PS4 Pro: 576KB L1$/ 768KB-1024KB L2$

XBox One X: 640KB L1$/ 1024KB L2$

Though if anyone knows the amount of L1/L2 cache from leaks, that would be easier.

Point is, they probably have less.
 
IIRC it's 2MB of L2 cache per CPU cluster, and it had two 4-core clusters; the GPU has its own L2 cache, but it's a lot smaller.


I’m gonna give an estimate on the amount they probably have based on the GCN architecture:

PS4: 288KB L1$/ 256-512KB L2$

XBox One: 192KB L1$/ 256KB L2$

PS4 Pro: 576KB L1$/ 768KB-1024KB L2$

XBox One X: 640KB L1$/ 1024KB L2$

Though if anyone knows the amount of L1/L2 cache from leaks, that would be easier.

Point is, they probably have less.
Drake would likely be

CPU: 2MB L2??? (1MB per cluster for 2 clusters???), plus 3-6MB L3???
GPU: 2.3MB L1 + 4MB L2

For the CPU I am extrapolating from Orin, which has 12 cores (3 clusters?).
 
Drake would likely be

CPU: 2MB L2??? (1MB per cluster for 2 clusters???), plus 3-6MB L3???
GPU: 2.3MB L1 + 4MB L2

For the CPU I am extrapolating from Orin, which has 12 cores (3 clusters?).
Around that, yeah.


Though I think A78 can have more L2 cache?
 
That whole hacking situation is absolutely terrible.
Theoretically it should be. None of us here know if the exact cooling requirements will change, but the fact that the API sees 12SMs means that's what the chip has. It's essentially confirmed.

It's highly likely this will reuse the OLED model's dock, so any increase in size will have to be small enough for it to still fit there. It can get wider by a few mm and thicker by a few mm, but not much beyond that.

39W is the theoretical maximum for playing while also charging the battery and Joy-Cons. And possibly some is reserved for the USB ports, but I'm not clear on that. The wattage supplied to the unit should be spread out enough not to make it absurdly hot.

Also IIRC Digital Foundry did some testing on a hacked Switch to see what kind of temperatures you'd get at high enough wattage and it was never really too hot to the touch.


EDIT: Also, welcome to Fami!
Thanks for the reply. It's crazy how much of a jump this seems to be over Switch. I guess that's the result of 7 years of technological advancements.
Hopefully it's still on schedule for fall 2022/spring 2023 and we won't have to wait for the announcement too long.
 
Reference A78 allows for 256 KB to 512 KB of L2 cache per core (and they should be private to each core?).
Orin opts for 256 KB L2 per core. The spec sheet adds them up to 3 MB total, but I think that they're private to each core.
Orin divides the 12 A78AE cores into 3 clusters of 4 each. Each cluster has its own 2 MB of L3 cache, then there's a 4 MB 'system cache' for the entire CPU complex. Unsure if the GPU has access to that, or if the 'system cache' is effectively an L4 cache for the entirety of the CPU complex.
Speaking of L3 cache, reference A78 allows for 512 KB to 4 MB per cluster, while A78C allows up to 8 MB per cluster. (and this increase in max L3 cache gets carried over to the new DSU with the ARM v9 updates)
 