• Hey everyone, staff have documented a list of banned content and subject matter that we feel are not consistent with site values, and don't make sense to host discussion of on Famiboards. This list (and the relevant reasoning per item) is viewable here.

StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

Do you think the rise of handheld PC gaming could (indirectly) benefit the Switch/SwitchNext (<- if it's a hybrid again) when it comes to engine optimizations related to the "special needs" of said devices?

Of course, and I'm probably answering my own question here, engines already include mobile-specific settings/optimizations that help devs scale their games (down) accordingly, so I assume this is nothing new.

At the same time, I would guess that (at least) the popular engines (Unreal, Unity) already include Switch-specific build options, so maybe there's no point in looking into this?
 


So if your game is well equipped to use NVMe speeds, there will be a good-sized difference. But slow games will be similar.

Since UFS is a good deal slower than NVMe, the difference wouldn't be nearly as large. Still, since I don't think the Switch even hits 100MB/s through its SD card slot, we might still see a big increase in speeds.

Given the limitations of SD cards, I still expect 100MB/s to be the standard, but I expect real-world performance to be generational in its improvement thanks to the vastly improved CPU (thanks for the graph, Oldpuck!) and the dedicated file decompression hardware. I don't see Nintendo allowing any developer to force an install to internal storage; however, I will concede that they did allow that on Wii and Wii U for games that didn't work straight off the disc. I think partial installs would be acceptable, ultimately, but for the most part, games will function straight from the Game Card.
 
Do you think the rise of handheld PC gaming could (indirectly) benefit the Switch/SwitchNext (<- if it's a hybrid again) when it comes to engine optimizations related to the "special needs" of said devices?

Of course, and I'm probably answering my own question here, engines already include mobile-specific settings/optimizations that help devs scale their games (down) accordingly, so I assume this is nothing new.

At the same time, I would guess that (at least) the popular engines (Unreal, Unity) already include Switch-specific build options, so maybe there's no point in looking into this?
The problem with handheld PCs is that they're still PCs. Engines are already scalable as is, so optimizations would come from the developer side. And yeah, because Switch et al aren't PCs, they have their bespoke optimizations.
 
Do you think the rise of handheld PC gaming could (indirectly) benefit the Switch/SwitchNext (<- if it's a hybrid again) when it comes to engine optimizations related to the "special needs" of said devices?

Of course, and I'm probably answering my own question here, engines already include mobile-specific settings/optimizations that help devs scale their games (down) accordingly, so I assume this is nothing new.

At the same time, I would guess that (at least) the popular engines (Unreal, Unity) already include Switch-specific build options, so maybe there's no point in looking into this?
If anything, I think the Steam Deck could be beneficial to Linux gaming. I know some devs have native SteamOS support instead of relying on Proton.
 
The problem with handheld PCs is that they're still PCs. Engines are already scalable as is, so optimizations would come from the developer side. And yeah, because Switch et al aren't PCs, they have their bespoke optimizations.

Yeah, makes sense.

If anything, I think the Steam Deck could be beneficial to Linux gaming. I know some devs have native SteamOS support instead of relying on Proton.

Never thought of the Linux angle. Interesting point.
 
Nice analysis! Though I think if the REDACTED GPU has only 16 ROPs, games will likely be limited by fillrate before being compute-bound or memory-bandwidth starved. Having a GPU with such a huge amount of shading power while having low pixel fillrate seems pretty unbalanced, too.
Yes but also no.

Overall, I think Ampere GPUs don't have enough ROPs/TMUs - Nvidia wanted that compute power for RT and DLSS, not for rasterization, and it shows in the benchmarks.

But [redacted]'s ROP situation is the same as the desktop GPUs. Every RTX 30 card has 16 ROPs per GPC, and T239 is a single GPC.

So while I think you're totally right that fillrate is probably the bottleneck, I don't think it's a [redacted] problem, just an Ampere problem, and already baked into the side by side benchmarks.
 
Could [redacted] run Starfield?
The last time I did some truly in depth prediction on [redacted] performance it was in the context of PS4 and cross-gen. Since then, the launch of truly "next-gen" games has come along, and my own understanding has grown, so I thought it might be worth returning to.

Rather than do some abstract "Redacted is 73% of Series 5, assuming Nintendo picks Zeta Megahertz on the Right Frombulator" I thought it would be nice to look in depth at Starfield, a game I'm curious about, and think about what it might look like on a theoretical [redacted]. Which, I guess, is kinda abstract since we're talking about unreleased software on unannounced hardware, but let me have this.

TL;DR: The Takeaway
If there is one thing I want folks to come away with from this exercise it's "the problems of last gen are not the problems of this gen. Same for the solutions."

I know that's not satisfying, but the PS5/Xbox Series consoles are not just bigger PS4/Xbox One, and [redacted] is not just a bigger Switch. Switch had big advantages and big disadvantages when it came to ports - [redacted] is the same but they are different advantages and disadvantages.

For the most part, the Series S doesn't "help" [redacted] ports as much as some folks think. And obviously, Starfield is going to remain console exclusive to Microsoft's machines. But yes, I believe a port of Starfield would be possible. It would also be a lot of work, and not in the ways that, say, The Witcher III was a lot of work.

Zen and the ARM of Gigacycle Maintenance
Behold, the ballgame:

[CPU performance comparison graph]

Graphs like this kill a lot of nuance, but they're also easy to understand. Last gen TV consoles went with bad laptop CPUs. Switch went with a good mobile CPU. That put them in spitting distance of each other.

[redacted] is set to make a generational leap over Switch, but PS5/Xbox Series have made an even bigger leap, simply because of how behind they were before. And, most importantly - the daylight between Series S and Series X is minimal. The existence of a Series S version doesn't help at all here.

This is especially rough with Starfield, a game that is CPU limited. With GPU limited games, you can cut the resolution, but that won't help here. Cutting the frame rate would - except it's already 30fps. There are no easy solutions here.

That doesn't mean no solutions. But this puts it solidly in "holy shit, how did they fit it onto that tiny machine" territory.

I Like It When You Call Me Big FLOPa
Good news: DLSS + The Series S graphics settings, done. Go back to worrying about the CPU, because that's the hard problem.

The tech pessimism - Ampere FLOPS and RDNA 2 FLOPS aren't the same, and it favors RDNA 2. Whatever the on-paper gap between [redacted] and Series S, the practical gap will be somewhat larger. If you want the numbers, open the spoiler. Otherwise, just trust me.

GPUs are not FLOPS alone. There are also ROPS/TMUs/memory subsystems/feature set. There are also tradeoffs for going for a wider/slower vs narrower/faster design. If we want to game out how Series S and [redacted] might perform against each other we would, ideally, want two GPUs that we could test that roughly parallel all those things.

The Series S GPU is 1280 cores, 80 TMUs, 32 ROPs, with 224 GB/s of memory bandwidth, at 4 TFLOPS.
[redacted]'s GPU is 1536 cores, ?? TMUs, 16 ROPs, with 102 GB/s of memory bandwidth, at a theoretical 3 TFLOPS.

The RX 6600 XT is 2048 cores, 128 TMUs, 64 ROPs, with 256 GB/s of memory bandwidth + 444.9 GB/s infinity cache, at 10.6 TFLOPS.
The RTX 3050 is 2560 cores, 80 TMUs, 32 ROPs, with 224 GB/s of memory bandwidth, at 9 TFLOPS.

No comparison is perfect, but from a high level, this is pretty close. The Ampere card is slightly fewer FLOPS built on 25% more cores; the RDNA 2 card supports that compute power with twice as much rasterization hardware. And the performance is within the same realm as the existing consoles, so we're not trying to fudge from something insane like a 4090.

The downside of this comparison is the memory bandwidth. The consoles and the RX 6000 series have very different memory subsystems. We're going to act like "big bandwidth" on consoles and "medium bandwidth plus infinity cache" are different paths to the same result, but it's the biggest asterisk over the whole thing.

Digital Foundry has kindly provided us with dozens of data points of these two cards running the same game in the same machine at matched settings. Here are the 1080p, rasterization-only numbers:

Game                       | Ampere FPS | RDNA 2 FPS | Percentage
Doom Eternal               | 156        | 231        | 67
Borderlands 3              | 53         | 94         | 56
Control                    | 54         | 83         | 65
Shadow of the Tomb Raider  | 90         | 132        | 68
Death Stranding            | 83         | 135        | 61
Far Cry 5                  | 95         | 139        | 68
Hitman 2                   | 96         | 146        | 65
Assassin's Creed: Odyssey  | 51         | 81         | 62
Metro Exodus               | 48         | 80         | 60
Dirt Rally 2.0             | 62         | 104        | 59
Assassin's Creed: Unity    | 100        | 157        | 63

As we can see pretty clearly, the Ampere card underperforms the RDNA 2 card by a significant margin, with only a 3.9% standard deviation. If we grade on a curve - adjusting for the differences in TFLOPS - that improves slightly. Going as the FLOPS fly, Ampere is performing at about 74% of RDNA 2.
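If you want to check that arithmetic yourself, here's the back-of-the-envelope version as a few lines of Python. The FPS numbers are straight from the table above, and the 9 and 10.6 TFLOPS figures are the 3050 and 6600 XT specs listed earlier; it's nothing fancier than an average and a curve-grading division.

```python
# Rough sketch of the "grade on a curve" math described above.
ampere_fps = [156, 53, 54, 90, 83, 95, 96, 51, 48, 62, 100]       # RTX 3050
rdna2_fps  = [231, 94, 83, 132, 135, 139, 146, 81, 80, 104, 157]  # RX 6600 XT

ratios = [a / r for a, r in zip(ampere_fps, rdna2_fps)]
mean = sum(ratios) / len(ratios)
std = (sum((x - mean) ** 2 for x in ratios) / len(ratios)) ** 0.5
print(f"Raw Ampere vs RDNA 2: {mean:.0%} (std dev {std:.1%})")

# Grade on a curve: the 3050 only has 9/10.6 of the 6600 XT's FLOPS,
# so divide that gap back out for a rough per-FLOP efficiency (~74-75%).
print(f"Per-FLOP: {mean / (9 / 10.6):.0%}")
```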

We could compare other cards, and I have, but the gap gets bigger, not smaller as you look elsewhere. Likely because where Nvidia spent silicon on tensor cores and RT units, AMD spent them on TMUs and ROPs.

If you take those numbers, an imaginary 3TFLOP [redacted] isn't 75% the performance of the Series S, but closer to 55%. We will obviously not be able to run the Series S version of the game without graphical changes. So what about DLSS? Again, technical analysis below, but the short answer is "DLSS Performance Mode should be fine".

Let's do some quick math. At 55% of the performance of Series S, if Series S can generate an image natively in 1ms, [redacted] can do it in 1.78ms. According to the DLSS programming guide, on our theoretical [redacted] we can get a 1440p image (the Series S target for Starfield) from a 720p source in 2.4ms.

Looking at those numbers it is clear that there is a point where DLSS breaks down - where the native image rendering is so fast, that the overhead of DLSS actually makes it slower. That should only happen in CPU limited games, but it just so happens, Starfield is a CPU limited game. So where is that line?

Series S GPU Time * 1.78 (the redacted performance ratio) * 0.25 (DLSS performance mode starts at 1/4 res) + 2.4ms (redacted's DLSS overhead) = Series S GPU Time

Don't worry, I've already solved it for you - it's 3.8ms. That would be truly an extremely CPU limited game. So DLSS seems extremely viable in most cases.
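To make that equation a little more concrete, here's the same frame-time arithmetic as a tiny sketch. The 1.78x ratio and the 2.4ms DLSS cost are the figures from above; the 12ms Series S GPU time is just an example input, not anything measured from Starfield.

```python
# Frame-time comparison from the equation above, for a hypothetical GPU-bound scene.
PERF_RATIO = 1.78   # [redacted] takes ~1.78x as long per pixel as Series S
DLSS_COST_MS = 2.4  # DLSS Performance mode overhead on [redacted] at 1440p output

def native_ms(series_s_ms):
    # Render the full Series S resolution natively on [redacted]
    return series_s_ms * PERF_RATIO

def dlss_ms(series_s_ms):
    # Render 1/4 of the pixels (Performance mode), then pay the DLSS cost
    return series_s_ms * PERF_RATIO * 0.25 + DLSS_COST_MS

print(native_ms(12.0))  # ~21.4 ms native
print(dlss_ms(12.0))    # ~7.7 ms with DLSS Performance mode
```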

Starfield is a specific case, however, as is the Series S generally. Starfield uses some form of reconstruction, with a 2x upscale. If Series S is struggling to get there natively, will DLSS even be enough? Or to put it another way, does FSR "kill" DLSS?

Handily, AMD also provides a programming guide with performance numbers for FSR 2, and they're much easier to interpret than the DLSS ones. We can comfortably predict that FSR 2 Balanced Mode on Series S takes 2.9ms. You'll note that DLSS on [redacted] is still faster than FSR 2 on the bigger machine. That's the win of dedicated hardware.

And because of that, we're right back where we started. For GPU limited games, if the Series S can do it natively, we can go to half resolution, and DLSS back up in the same amount of time, or less. If the Series S is doing FSR at 2x, we can do 4x. If Series S is doing 4x, by god, we go full bore Ultra Performance mode. And should someone release a FSR Ultra Performance game on Series S, well, you know what, Xbox can keep it.

Worth noting that even then, the options don't end for [redacted]. Series S tends to target 1440p because it scales nicely on a 4K display. But 1080p also scales nicely on a 4K display, giving us more options to tune.

Whether you are willing to put up with DLSS here is a subjective question, but this is a pretty straightforward DLSS upscale, nothing unusual at all. Where it might become dicey is if Imaginary Porting Studio decided to do something wild like go to Ultra Performance mode, not because of the graphics, but to free up time for the CPU to run. In CPU limited games, that rarely gives you the performance you need, but it's worth noting that [redacted] and DLSS do give us some "all hands on deck" options.

In Space, No One Can Hear You Stream
It's not just CPUs and GPUs obviously. The ninth gen machines all advertise super fast NVMe drives. Meanwhile, we have no idea what [redacted]'s storage solution will look like. But I don't want to talk too much about abstract performance, I want to talk about Starfield.

Starfield's PC requirements are informative. It requires an SSD, but doesn't specify a type, nor does it recommend NVMe. It only requires 16GB of RAM, which is pretty standard for console ports and suggests that Starfield isn't doing anything crazy like using storage as an extra RAM pool on consoles. It's pretty classic open world asset streaming.

Let's make a little table:

Storage           | Speed
Switch eMMC       | 300 MB/s
Old SATA SSD      | 300 MB/s
Modern eMMC       | 400 MB/s
SATA III SSD      | 500 MB/s
iPhone NVMe       | 1600 MB/s
Series S NVMe     | 2400 MB/s
Android UFS 4     | 3100 MB/s
UFS 4, on paper   | 5800 MB/s

Nintendo has a lot of options, and pretty much all of them cross the Starfield line - if mandatory installs are allowed by Nintendo. There is a big long conversation about expansion and GameCard speed that I think is well beyond the scope here, and starts to get very speculative about what Nintendo's goals are. But at heart, there is no question of the onboard storage of [redacted] being fast enough for this game.
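For a rough feel of what those numbers mean in practice, here's how long a hypothetical 1GB streaming burst takes at each of those sequential speeds. The 1GB figure is just an illustrative number, not anything from Starfield itself.

```python
# Seconds to read a 1 GB burst at each storage option's sequential speed (table above).
speeds_mb_s = {
    "Switch eMMC": 300, "Old SATA SSD": 300, "Modern eMMC": 400,
    "SATA III SSD": 500, "iPhone NVMe": 1600, "Series S NVMe": 2400,
    "Android UFS 4": 3100, "UFS 4, on paper": 5800,
}
for name, mb_s in speeds_mb_s.items():
    print(f"{name:16} {1024 / mb_s:5.2f} s per GB")
```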

Don't Jump on the Waterbed
When you push down on the corner of a waterbed, you don't make the waterbed smaller, you just shift the water around.

You can do that with software, too. Work can be moved from one system (like the CPU) to another (RAM) if you're very clever about it (caching, in this case). Sometimes it's faster. Sometimes it's slower. But that doesn't matter so much as whether or not you've got room to move. This is likely one of the reasons that Nintendo has historically been so generous with RAM - it's cheap and flexible.

The danger with these next-gen ports isn't any one aspect being beyond what [redacted] can do. It's about multiple aspects combining to leave no room to breathe. NVMe speed you can work around, the GPU can cut resolution, the CPU can be hyper-optimized. But all three at once makes for a tricky situation.

At this point I don't see evidence of that in Starfield - I suspect only the CPU is a serious bottleneck. But some minor things worth bringing up:

RAM - reasonable expectations are that Nintendo will go closer to 12 GB than 8 GB, so I don't see RAM as a serious issue.

Storage space - PC requirements call for a whopping 128GB of free space. That's much larger than Game Cards, and most if not all of the likely on board storage in [redacted]. There are likely a bunch of easy wins here, but it will need more than just easy wins to cross that gap.

Ray Tracing - Starfield uses no RT features on consoles, so despite the fact that [redacted] likely does pretty decent RT for its size, it's irrelevant here.

Appendix: The Name is Trace. Ray Trace
But someone will ask, so here is the quick version: [redacted]'s RT performance is likely to be right up there with the Series S. But it's not like Series S games often have RT, and RT does have a decent CPU cost, where [redacted] is already weakest. So expect RT to be a first party thing, and to be mostly ignored in ports.

Let's look at some benchmarks again. The 3050 vs the 6600 XT once more. This time we're using 1440p resolution, For Reasons.

Game          | 3050 FPS | 3050 FPS w/RT | RT Cost | 6600 XT FPS | 6600 XT FPS w/RT | RT Cost
Control       | 35       | 19            | 24.1ms  | 49          | 20               | 29.6ms
Metro Exodus  | 37       | 24            | 14.6ms  | 60          | 30               | 16.7ms
The method here is less obvious than before. We've taken the games at max settings with RT off, then turned RT on, and captured their frame rates. Then we've turned the frame rate into frame time - how long it took to draw each frame on screen. We've then subtracted the time of the pure raster frame from the RT frame.
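In code form, the whole method is just this (the FPS inputs are the table values above):

```python
# RT cost = RT-on frame time minus raster-only frame time, in milliseconds.
def rt_cost_ms(fps_raster: float, fps_rt: float) -> float:
    return 1000 / fps_rt - 1000 / fps_raster

print(rt_cost_ms(35, 19))  # Control, RTX 3050   -> ~24.1 ms
print(rt_cost_ms(49, 20))  # Control, RX 6600 XT -> ~29.6 ms
```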

This gives us the rough cost of RT in each game, for each card, lower is better. And as you can see, despite the fact that the 3050 is slower than the 6600 XT by a significant margin, in pure RT performance, it's faster. About 38% faster when you grade on the curve for the difference in TFLOPS.

There aren't a lot of games with good available data like this to explore, but there are plenty of cards, and you can see that this ratio tends to hold.

Game          | 3060 FPS | 3060 FPS w/RT | RT Cost | 6700 XT FPS | 6700 XT FPS w/RT | RT Cost
Control       | 55       | 28            | 17.5ms  | 67          | 25               | 25.1ms
Metro Exodus  | 54       | 35            | 10.1ms  | 74          | 37               | 13.5ms
This gives us a 43% improvement for Ampere, adjusted for FLOPS.

Applying this adjustment, our theoretical 3TF [redacted] outperforms the 4TF Series S by 3.5%.

It's worth noting that RDNA 2 doesn't have true RT hardware. Instead, the CPU builds the BVH structure, and then triangle intersections are tested by the existing TMUs that the GPU already has. Ampere performs both operations on dedicated hardware. This should reduce the CPU load, but also opens up the possibility of further wins when using async compute.

While it's a decent analysis of performance, I'd argue that we don't really know where the bottleneck to Starfield reaching 60fps is.

If Todd is to be believed, it may actually be a GPU bottleneck, since he said it can hit 60 at points but they lock it to 30 to maintain the fidelity they're at with a consistent framerate.

Now it's likely a bit of all 3 (CPU, GPU, Memory) but we won't be able to analyze what the bottleneck actually would be until Starfield comes out and PC users/DF can break it down.

Now, something I will say is that the CPU cost ballooning out seems to be more a symptom of a lack of optimization at this stage, at least beyond pushing 60fps.

Add on the Series S|X/PS5 pretty much being the worst-case scenario for trying to do RT (RDNA 2 RAs, having to build the BVH on the CPU, having no Infinity Cache, and high-latency GDDR vs. the low-latency DDR that RT likes, as we can see with the Steam Deck keeping up despite its bandwidth deficit),

I feel the cuts on Series S are mainly just a domino effect of other deficits, like Unreal 4 being stretched to its complexity limits. The problems in games like Jedi Survivor and Redfall (beyond the latter needing more time in the oven in general) are that engine complexity limit exposing itself, as Jedi Survivor and Hogwarts Legacy try to push UE5-level assets...in UE4, without Nanite or any sort of compute/mesh-shaded geometry. That engine just wasn't built for that type of scope, and both CPU and memory suffer hard.
 
Yes but also no.

Overall, I think Ampere GPUs don't have enough ROPs/TMUs - Nvidia wanted that compute power for RT and DLSS, not for rasterization, and it shows in the benchmarks.

But [redacted]'s ROP situation is the same as the desktop GPUs. Every RTX 30 card has 16 ROPs per GPC, and T239 is a single GPC.

So while I think you're totally right that fillrate is probably the bottleneck, I don't think it's a [redacted] problem, just an Ampere problem, and already baked into the side by side benchmarks.
Yeah, I'm aware that Nvidia leaned more towards raw shading power rather than rasterization, especially with Ampere.
That said, I still hope that in the end Nvidia and Nintendo customized the GPC to fit, say, 32 ROPs instead of desktop Ampere's 16, as it would hugely improve rasterization performance and likely improve the GPU's per-FLOP efficiency. 16 ROPs is actually the same number of ROPs as the current Switch's Maxwell GPU.
 
All this talk about whether Redacted could run Starfield has me thinking, hoping, that it can run ES6 when the time comes. Assuming MS even makes it multi-platform. 😖
By the time ES6 comes out, we'll be wondering how it would run on the Switch 3 and comparing it to the Series XXX.
 
Yes but also no.

Overall, I think Ampere GPUs don't have enough ROPs/TMUs - Nvidia wanted that compute power for RT and DLSS, not for rasterization, and it shows in the benchmarks.

But [redacted]'s ROP situation is the same as the desktop GPUs. Every RTX 30 card has 16 ROPs per GPC, and T239 is a single GPC.

So while I think you're totally right that fillrate is probably the bottleneck, I don't think it's a [redacted] problem, just an Ampere problem, and already baked into the side by side benchmarks.

I'm curious to what extent fillrate would actually be a bottleneck for a modern games console, with games increasingly using compute shaders and compute-like shaders (eg mesh and RT shaders) which bypass the ROPs. Particularly so for games developed exclusively for the console, where if you know you've got lots of compute performance but not much pixel fillrate, you can, let's say, use compute shaders rather than pixel shaders for your deferred lighting calculations to get around the bottleneck. Probably more of a pain if you're porting an existing engine that expects a high pixel fillrate, though.
 
I'm curious to what extent fillrate would actually be a bottleneck for a modern games console, with games increasingly using compute shaders and compute-like shaders (eg mesh and RT shaders) which bypass the ROPs. Particularly so for games developed exclusively for the console, where if you know you've got lots of compute performance but not much pixel fillrate, you can, let's say, use compute shaders rather than pixel shaders for your deferred lighting calculations to get around the bottleneck. Probably more of a pain if you're porting an existing engine that expects a high pixel fillrate, though.
I think it depends. Modern game consoles like the XSX and PS5 have pretty decent pixel fillrate in my opinion, though in theory the PS5 has the highest fillrate, and that might explain why some games run better or at higher resolution on PS5 compared to the XSX despite the latter's advantage in compute power. The XSS is another story, though the XSS's fillrate problem is more likely its lower memory bandwidth rather than its cut-down number of render backend units.
As for REDACTED, it's interesting to see which direction Nvidia and Nintendo have gone with the customizations they made on the GPU side. Personally, I think Nintendo will want not only a power-efficient SoC, but a balanced SoC too. Increasing the number of ROPs would alleviate the fillrate problem I mentioned above, but other stuff like a bigger cache would help, too.
 
I'm curious to what extent fillrate would actually be a bottleneck for a modern games console, with games increasingly using compute shaders and compute-like shaders (eg mesh and RT shaders) which bypass the ROPs. Particularly so for games developed exclusively for the console, where if you know you've got lots of compute performance but not much pixel fillrate, you can, let's say, use compute shaders rather than pixel shaders for your deferred lighting calculations to get around the bottleneck. Probably more of a pain if you're porting an existing engine that expects a high pixel fillrate, though.
Yeah, Nintendo first party games are going to be in great shape, considering what the module system is already doing with deferred rendering on Tears of the Kingdom. But ports, especially legacy ports which I expect a lot of, are going to have more trouble.

As for REDACTED, it's interesting to see which direction Nvidia and Nintendo have gone with the customizations they made on the GPU side. Personally, I think Nintendo will want not only a power-efficient SoC, but a balanced SoC too. Increasing the number of ROPs would alleviate the fillrate problem I mentioned above, but other stuff like a bigger cache would help, too.
Assuming we get the SoC leaked last year, it will not increase the number of ROPs. Ampere's design bakes the number of ROPs into the GPC design. If you want to increase ROPs relative to the compute power, you spread your SMs out over more GPCs, which Nvidia has done.

The design is pretty explicitly built around a single GPC. More ROPs would be a major change to the chip design. What's more likely isn't more ROPs, but fewer SMs, with some of them binned for yield reasons, and increasing clock to compensate. But increasing the raw ROP count is tantamount to imagining a new chip, in which case my whole performance analysis is moot. :)
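As a quick illustration of why the ROP count is effectively fixed: on Ampere it's just 16 ROPs per GPC, so the only lever is the GPC count. The desktop GPC counts below are from public Ampere specs; T239's single GPC is from the leaks.

```python
# Ampere ties ROPs to the GPC: 16 ROPs per GPC, so ROPs scale only with GPC count.
AMPERE_ROPS_PER_GPC = 16
for name, gpcs in [("RTX 3050", 2), ("RTX 3060", 3), ("RTX 3090", 7), ("T239", 1)]:
    print(f"{name}: {gpcs} GPC(s) -> {gpcs * AMPERE_ROPS_PER_GPC} ROPs")
```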

While it's a decent analysis of performance, I'd argue that we don't really know where the bottleneck to Starfield reaching 60fps is.

If Todd is to be believed, it may actually be a GPU bottleneck, since he said it can hit 60 at points but they lock it to 30 to maintain the fidelity they're at with a consistent framerate.
I think that quote has been floating around without enough context. It really seemed like he was talking about CPU load. Emphasis mine:
I think it'll come as no surprise, given our previous games, what we go for. Always these huge, open worlds, fully dynamic, hyper detail where anything can happen. And we do want to do that. It's 4K in the X. It's 1440 on the S. We do lock it at 30, because we want that fidelity, we want all that stuff. We don't want to sacrifice any of it.

And we don't ever want to sacrifice that experience that makes our games feel really, really special. So it feels great. We're really happy with how it feels even in the heat of battle. And we need that headroom because in our games, really anything can happen.
It would be absolutely wild if a Bethesda open world RPG - especially one that looks like Starfield - was GPU limited.

Now, something I will say is that the CPU cost ballooning out seems to be more a symptom of a lack of optimization at this stage, at least beyond pushing 60fps.

Add on the Series S|X/PS5 pretty much being the worst-case scenario for trying to do RT (RDNA 2 RAs, having to build the BVH on the CPU, having no Infinity Cache, and high-latency GDDR vs. the low-latency DDR that RT likes, as we can see with the Steam Deck keeping up despite its bandwidth deficit),

I feel the cuts on Series S are mainly just a domino effect of other deficits, like Unreal 4 being stretched to its complexity limits. The problems in games like Jedi Survivor and Redfall (beyond the latter needing more time in the oven in general) are that engine complexity limit exposing itself, as Jedi Survivor and Hogwarts Legacy try to push UE5-level assets...in UE4, without Nanite or any sort of compute/mesh-shaded geometry. That engine just wasn't built for that type of scope, and both CPU and memory suffer hard.
I have Opinions on all of this - which is not to say I think you're wrong - but this is all why I didn't want to talk about Games In General, but one game in specific. Sure, UE5 might change things considerably, and yes, RT is a little apples to oranges between the two architectures, and I disagree with you about CPU optimization, but I see where you're coming from.

But none of those things matter on a game with no ray tracing, that isn't Unreal Engine, and is simulation focused. I do think Starfield could be ported to [redacted]! But the idea that the 30fps cap is just an optimization issue, when the game hasn't released and there are clear technical reasons why it might be CPU limited, seems presumptive.
 
I know we've talked about it ad nauseam at this point, but I find it odd that there haven't really been any huge rumblings of hardware since the whole Mochizuki fiasco almost two years ago. Before then, it seemed like you would hear something every other month that the Switch Pro is coming out between insiders and YouTubers. Not that I think it means all that much, but you'd think we'd hear even fake rumblings about a successor, like some clout chaser with a somewhat decent track record, but that doesn't seem to be the case. Maybe I'm not looking toward the right places, but it's still peculiar, nonetheless. Maybe Nintendo really is cracking down on leaks, maybe insiders have learned their lesson and are keeping what they know close to their chest, maybe sources just aren't willing to talk, or maybe the time just isn't right. I know a dozen or so pages ago, we talked about how random and volatile leaks actually are, so maybe it's just that.

Either way, between the supposed tape-out, Nintendo's weirdly front loaded 1H and seemingly empty 2H, and Furukawa's vagueness regarding future hardware, 2023 is still in the cards for me, rumblings or no rumblings. This week may very well be the moment of truth.
 
Assuming we get the SoC leaked last year, it will not increase the number of ROPs. Ampere's design bakes the number of ROPs into the GPC design. If you want to increase ROPs relative to the compute power, you spread your SMs out over more GPCs, which Nvidia has done.

The design is pretty explicitly built around a single GPC. More ROPs would be a major change to the chip design. What's more likely isn't more ROPs, but fewer SMs, with some of them binned for yield reasons, and increasing clock to compensate. But increasing the raw ROP count is tantamount to imagining a new chip, in which case my whole performance analysis is moot. :)
Yeah, I do agree that if we get the SoC leaked last year, the number of ROPs will likely stay as it is at 16. What I think is that - while the chance may be low - Nintendo asked Nvidia to do more custom work to, say, put more ROPs in before the chip tape-out. That involves redesigning the entire GPC, or, as you said, spreading the SMs out over more GPCs. It's not impossible to think that Nintendo would spend extra so Nvidia would do more custom work for them, as Nintendo has historically done with other partners.
In the past, Nvidia GPUs provisioned ROPs according to how wide the memory bus was and how much L2 cache the GPU had, but in Ampere and Ada they moved them into the GPC to eliminate throughput mismatches between the scan conversion frontend and the raster operations backend. This has the benefit of more ROPs if the GPU has a lot of GPCs, something like the 3090, but on the other hand it limits the number of raster backend units if the GPU is "small" like we have here with REDACTED. ROPs are extremely important, as REDACTED will likely aim for 1080p and resolutions beyond that when docked, even with DLSS and whatnot. I still can't wrap my head around a GPU with such big compute performance relative to its small ROP count, not to mention that the current Switch already has 16 ROPs in its GPU (though utilization is another story - it's a memory bandwidth problem).
Another reason I brought this up is, well, we know the REDACTED GPU is a wide-and-slow design to save power. In handheld mode this is extremely important, but ROPs are tied to the overall GPU clock, and as the GPU will be clocked very low in handheld mode to save power, ROPs will be affected as well. Assuming a portable clock around 600MHz, the device would still have good FP32 performance, but its pixel fillrate would be even lower than the Xbox One's. This can be a major bottleneck for games ported even from the PS4/XBO generation. Compute performance is good, but pixel and texel fillrate are also very important aspects, along with memory bandwidth, cache setup, etc., that dictate the real-world performance of the REDACTED GPU.
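To put some rough numbers on that, here's the fillrate arithmetic as a sketch. The 600MHz handheld clock is just the assumption from above, not a confirmed spec, and the Xbox One figures (16 ROPs at 853MHz) are its public specs.

```python
# Pixel fillrate scales with ROPs x clock, regardless of how much FP32 compute the chip has.
def gpixels_per_s(rops, clock_ghz):
    return rops * clock_ghz

def tflops(fp32_cores, clock_ghz):
    return fp32_cores * 2 * clock_ghz / 1000  # 2 ops (FMA) per core per clock

print(gpixels_per_s(16, 0.600))  # hypothetical REDACTED handheld: ~9.6 Gpix/s
print(gpixels_per_s(16, 0.853))  # Xbox One:                       ~13.6 Gpix/s
print(tflops(1536, 0.600))       # hypothetical REDACTED handheld: ~1.8 FP32 TFLOPS
```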
 
I know we've talked about it ad nauseam at this point, but I find it odd that there haven't really been any huge rumblings of hardware since the whole Mochizuki fiasco almost two years ago. Before then, it seemed like you would hear something every other month that the Switch Pro is coming out between insiders and YouTubers. Not that I think it means all that much, but you'd think we'd hear even fake rumblings about a successor, like some clout chaser with a somewhat decent track record, but that doesn't seem to be the case. Maybe I'm not looking toward the right places, but it's still peculiar, nonetheless. Maybe Nintendo really is cracking down on leaks, maybe insiders have learned their lesson and are keeping what they know close to their chest, maybe sources just aren't willing to talk, or maybe the time just isn't right. I know a dozen or so pages ago, we talked about how random and volatile leaks actually are, so maybe it's just that.

Either way, between the supposed tape-out, Nintendo's weirdly front loaded 1H and seemingly empty 2H, and Furukawa's vagueness regarding future hardware, 2023 is still in the cards for me, rumblings or no rumblings. This week may very well be the moment of truth.
Cracking down on leaks doesn't work. Nintendo doesn't have some grand formula that makes them more efficient at preventing leaks than any other tech company.

A theory that's been thrown around is that they really did retract devkits, leading everyone but their most trusted of trusted partners to think it was cancelled. Mochi's reporting could have played a part in that.

This would ensure fewer leaks, but would also mean less software in the launch window. A worthy tradeoff? Imo no, but who knows how Nintendo thinks.
 
I know we've talked about it ad nauseam at this point, but I find it odd that there haven't really been any huge rumblings of hardware since the whole Mochizuki fiasco almost two years ago. Before then, it seemed like you would hear something every other month that the Switch Pro is coming out between insiders and YouTubers. Not that I think it means all that much, but you'd think we'd hear even fake rumblings about a successor, like some clout chaser with a somewhat decent track record, but that doesn't seem to be the case. Maybe I'm not looking toward the right places, but it's still peculiar, nonetheless. Maybe Nintendo really is cracking down on leaks, maybe insiders have learned their lesson and are keeping what they know close to their chest, maybe sources just aren't willing to talk, or maybe the time just isn't right. I know a dozen or so pages ago, we talked about how random and volatile leaks actually are, so maybe it's just that.

Either way, between the supposed tape-out, Nintendo's weirdly front loaded 1H and seemingly empty 2H, and Furukawa's vagueness regarding future hardware, 2023 is still in the cards for me, rumblings or no rumblings. This week may very well be the moment of truth.
Nintendo has been very keen on controlling the narrative, but perhaps they should've weighted in the effects of what rumor mongering does to keep them both in market and mind share. Part of me thinks the whole "Switch Pro" trend already has done this and they have all the data they need, from which they conclude that a giant information black out is needed to prevent the competition from swooping in and stealing any potential lightning.

This amount of tight-lipped control does probably hint at something big. New hardware is mostly a given, but we should discount software offerings and other left-field surprises.
 
Cracking down on leaks doesn't work. Nintendo doesn't have some grand formula that makes them more efficient at preventing leaks than any other tech company.

A theory that's been thrown around is that they really did retract devkits, leading everyone but their most trusted of trusted partners to think it was cancelled. Mochi's reporting could have played a part in that.

This would ensure fewer leaks, but would also mean less software in the launch window. A worthy tradeoff? Imo no, but who knows how Nintendo thinks.
Cracking down on leaks does work.

Who would've expected the casual announcement of Metroid Prime Remastered being shadow-dropped, with a physical release coming later? If it had been rumor mongered, it could've deflated the hype.
 
Cracking down on leaks does work.

Who would've expected the casual announcement of Metroid Prime Remastered being shadow-dropped, with a physical release coming later? If it had been rumor mongered, it could've deflated the hype.
Exactly one studio worked on the Prime remaster, and a digital-only release helps keep the number of people who have to know about it to the bare minimum.
 
...
Either way, between the supposed tape-out, Nintendo's weirdly front loaded 1H and seemingly empty 2H, and Furukawa's vagueness regarding future hardware, 2023 is still in the cards for me, rumblings or no rumblings. This week may very well be the moment of truth.

These are very strange times. 2023 is still in the cards for me as well.

But until the actual reveal, I don't think we will have one single moment of truth.

We were expecting the February Direct to shed some light on the whole year. It turned out inconclusive. Then the Mario Movie/TotK/investors meeting, it turned out inconclusive.

If the alleged Direct next week only shows Pikmin, 1-2 Switch, a few DLCs and a couple of new games for August/September (along third-parties), it will be inconclusive.

If the possible September Direct only shows DLCs and a couple of holiday/early 2024 games, it will be inconclusive.

We are just moving goal posts at this point.
 
These are very strange times. 2023 is still in the cards for me as well.

But until the actual reveal, I don't think we will have one single moment of truth.

We were expecting the February Direct to shed some light on the whole year. It turned out inconclusive. Then the Mario Movie/TotK/investors meeting, it turned out inconclusive.

If the alleged Direct next week only shows Pikmin, 1-2 Switch, a few DLCs and a couple of new games for August/September (along third-parties), it will be inconclusive.

If the possible September Direct only shows DLCs and a couple of holiday/early 2024 games, it will be inconclusive.

We are just moving goal posts at this point.
I kinda disagree, June is where we usually get the biggest picture of what the rest of the year will look like which, as has been stated time and time again, is in desperate need for new info. If the rumblings about July are to be believed, we may get something hardware related there and that will be the cutoff point for me should we not get anything. Imho, the goalpost for this year was always gonna stop at the start of summer, unless Nintendo wants to have an absurdly small turnaround period that may work in theory, as seen with the Lite and OLED, but it might not yield as favorable results as those did.
 
I kinda disagree, June is where we usually get the biggest picture of what the rest of the year will look like which, as has been stated time and time again, is in desperate need for new info. If the rumblings about July are to be believed, we may get something hardware related there and that will be the cutoff point for me should we not get anything. Imho, the goalpost for this year was always gonna stop at the start of summer, unless Nintendo wants to have an absurdly small turnaround period that may work in theory, as seen with the Lite and OLED, but it might not yield as favorable results as those did.
"rumblings about July" isn't that just random people speculating? Or is there anything more substantial that I missed?
 
"rumblings about July" isn't that just random people speculating? Or is there anything more substantial that I missed?
It started with Jeff Grubb and another insider. The former was unsure about it and the latter was somewhat confident in their findings.
 
@Quangcute03 @oldpuck @Thraktor with respect to the ROPs, the TX1 was over-provisioned with ROPs iirc, especially for a mobile device that didn't seem to need that many for what it was meant for. And I've noticed Nvidia GPUs with lower ROP counts seem to match equivalent AMD cards that have more, at least with Turing and later, even with the lower fillrate. Maybe it's different here…?
 
But none of those mentioned hardware?
No, unfortunately. We don't even know if this supposed July thing is a Direct - the smoke we have says a Direct might be this week. It could be just nothing, and those two might have gotten their data mixed up. We may have to check Grubb's Twitter and ask, just in case.
 
No, unfortunately. We don't even know if this supposed July thing is a Direct - the smoke we have says a Direct might be this week. It could be just nothing, and those two might have gotten their data mixed up. We may have to check Grubb's Twitter and ask, just in case.
Should also add that we may get a Pikmin Direct soon and that may have been what Grubb was referring to.
 
What I gather from Dakhil and Thraktor's comments is that a more powerful Switch is excessively difficult to engineer.

So maybe Nintendo will pivot to another concept altogether?
No, not really. Just that time is of the essence.

If Nintendo chases what’s out now that’s better, they’ll be delayed for a few years.

If Nintendo hopped on it before it released or was announced it wouldn’t be the same constraint.
 
Nintendo finally learning how to do obfuscation and properly clean up their firmware isn't necessarily evidence that it won't happen.
Nintendo doesn't actually seem to put a ton of effort into hiding stuff from the firmware. The TX1(+) revisions all being out in the open while the Drake stuff is limited to "shadows" of new features is all very functionally driven by how the firmware is and will likely be distributed.
 
What I gather from Dakhil and Thraktor's comments is that a more powerful Switch is excessively difficult to engineer.

So maybe Nintendo will pivot to another concept altogether?
don't know how you got that. making a more powerful switch is pretty easy for the likes of Nvidia
 
Do you think the rise of handheld PC gaming could (indirectly) benefit the Switch/SwitchNext (<- if it's a hybrid again) when it comes to engine optimizations related to the "special needs" of said devices?

Of course, and I'm probably answering my own question here, engines already include mobile-specific settings/optimizations that help devs scale their games (down) accordingly, so I assume this is nothing new.

At the same time, I would guess that (at least) the popular engines (Unreal, Unity) already include Switch-specific build options, so maybe there's no point in looking into this?
Don't overexaggerate this niche. Current estimates say the Steam Deck will reach 3M lifetime units in 2023. The Switch does that in 3 months, even with the worst predictions. And the Steam Deck is by far the most successful PC handheld.
 
I don’t have much doubt about an upcoming Direct now, hopefully this week.
I’m ready…


… To cry and probably enjoy my Switch for another year without any successor on the horizon :(
 
Add on the Series S|X/PS5 pretty much being the worst-case scenario for trying to do RT (RDNA 2 RAs, having to build the BVH on the CPU, having no Infinity Cache, and high-latency GDDR vs. the low-latency DDR that RT likes, as we can see with the Steam Deck keeping up despite its bandwidth deficit)
I know I'm replying to you twice, but I wanted to pick this one thread out. And I want to be clear, I am not disagreeing with you. I just want to take a side tangent on how I talk about this stuff.

In experimentation/benchmarking/data analysis, there is this concept called a Type I and a Type II error. When you hear "margin of error" what we're talking about is that stats nerds realized that they could measure error just like they could measure anything else, and they could bake the odds that they were wrong into the rest of their analysis.

But it's not just about the odds that you're wrong but what way are you wrong? Maybe you see something that isn't there, a false positive, a Type I error. Or maybe you fail to see a thing that is there, a false negative, a Type II error. When you do an analysis you can not only tune for the odds that you're wrong, but which kind of wrong you are.

And that comes down to which kind of wrong is better? Let's say you think that aspirin might prevent heart disease. Heart disease kills lots and lots of people, aspirin is cheap, and the side effects are pretty low, and most people already take it for headaches. If you see an effect that isn't real? Well, that's not too bad. People took a low dose of aspirin and it didn't do anything.

What if you miss a very small effect? Then lots of people die of a disease that is cheaply preventable. You optimize your experiment to really laser in on tiny changes. That would be the reasonable - and in fact ethical - thing to do.

And we prescribed aspirin to folks at risk of heart disease for years, and it wasn't until we did HUGE studies over and over again that we were pretty sure that the Aspirin didn't do shit.

Meanwhile, we did the same thing with prenatal B12. A small but tiny chance it was improving the health of babies. And after a number of massive, randomly controlled trials? The effect remains! Thousands of kids born without spina bifida, because we risked seeing something that might not be there, over missing something that was.

Obviously, console analysis in a fan thread is not life or death. But we can still do the same sorts of things. Consider all the DLSS chatter over the last year+. Whether DLSS was even possible on such a tiny machine was a real question early on, and there was some real data that suggested "barely, maybe not even". Some of the numbers were really wonky, and there were roughly two, equally reasonable interpretations for those numbers. One, simple explanation seemed to imply that DLSS started to break down at Drake-sized machines. Another, more complex analysis saw a statistical effect that meant DLSS on Drake would be pretty great actually. A Type I vs Type II error situation.

If this thread were actually designing a console, seeing an effect for DLSS that wasn't there would be really bad. It would mean that we would be adding silicon that we couldn't use to very constrained hardware. But we're not, we're trying to game out Nintendo's existing decisions. Nintendo put DLSS in their tiny machine. It's probably useful to them. So let's assume the more complex analysis is the right one.

This thread loves our Type I errors. We love risking false positives for the opportunity to be giddy about the highest possibilities. That's fun, I get that, and if you come here even in the absence of news, that's probably what you're here for. And if that is a risk you wanna take, go for it. I don't think it's wrong at all! In fact, I'm saying it has an equal chance of being right over the alternative! It's just, which kind of wrong do you prefer to be?

I think, overall, Nintendo fans have a history of stomach drops about performance after a console launches. And for me, personally, I can get hyped for new Nintendo games, even if they're running on a low power machine. So I'd prefer to make Type II errors - failing to see performance wins where performance wins might exist - because I'm just as excited for the new hardware and then I am positioned to be pleasantly surprised if it exceeds my analysis! It maximizes my personal joy.

In the specific case of RT - I think you are exactly right about the consoles creating conditions where they underperform desktop AMD. And for the most part I don't factor that in. Why?

1) Because there just isn't good data on it. There are no GPUs that look exactly like the consoles that can be plugged into a desktop and go head to head on matched settings with matched system configurations. You can't easily isolate differences in GPU bandwidth. If someone has a Steam Deck, and a Series S in developer mode, and wants to construct a reliable benchmark for RT performance between the two? Please! Give me some data!

2) Because I don't think the effect, even if it is real, is big enough to affect ports. There are a number of games that get RT on Series X but not on Series S. So the question has to be asked: how much additional RT performance by itself would be enough to enable RT on Series S? Is it closer to 20%? 50%? 100%?

20% extra RT performance might let first party games do some extra cool stuff, but it's a drop in the bucket when it comes to the gap between Series S and Series X. Even if there are games where Series S is just barely below the threshold for turning on RT, that 20% is gonna matter a hell of a lot less than the 400% perf multiplier that DLSS Performance Mode provides.
 
Sorry for the dumb question, but where did the name Drake come from? I know it’s supposed to be from super heroes but which one is supposed to be Drake?
 
Sorry for the dumb question, but where did the name Drake come from? I know it’s supposed to be from super heroes but which one is supposed to be Drake?
I think it's believed to come from Tim Drake, one of the people who has acted as Robin in the Batman comics.
 
Sorry for the dumb question, but where did the name Drake come from? I know it’s supposed to be from super heroes but which one is supposed to be Drake?
I think it was the name of the chip or SOC? You ought to check the first page, it has a collection of links and articles regarding future hardware.
 
Hearing the rumblings of something in July is giving me a little hope of Nintendo keeping up the pattern of revealing new hardware every 2 years.
 
@Quangcute03 @oldpuck @Thraktor with respect to the ROPs, the TX1 was over-provisioned with ROPs iirc, especially for a mobile device that didn't seem to need that many for what it was meant for. And I've noticed Nvidia GPUs with lower ROP counts seem to match equivalent AMD cards that have more, at least with Turing and later, even with the lower fillrate. Maybe it's different here…?
Yeah, the TX1 was a ROP-heavy design because Nvidia originally wanted the SoC to push 4K displays and such. The TX1's ROPs, however, are limited by its memory bandwidth, which limits their utilization. On the Switch, memory bandwidth has proven to be a huge bottleneck. With REDACTED, they have probably taken that lesson and improved on it, though there are other aspects of the GPU as well that I think can have a direct impact on performance, like its cache (which seems to be 1MB according to the leaks). Speaking of the cache situation, I think REDACTED might have a System Level Cache like other Orin chips, which could help with the small 1MB L2 cache that REDACTED might have on its GPU.
Also, do the leaks mention more of the GPU setup of REDACTED? I remember the leaks only defined the architecture, the SM count, the Tensor core capabilities, and maybe the amount of L2 on the GPU.
 
don't know how you got that. making a more powerful switch is pretty easy for the likes of Nvidia
What Switch successor can Nintendo bring to the world when:

  • the only cartridge solutions that could exploit the potential of UE5 are non-mass-market (and thus expensive)?
  • the usage of ray tracing is prohibitively expensive on a mobile GPU?
  • any CPU that fits in the Switch form factor will make ports from current-gen consoles potentially as challenging as they were during the Switch era?
Nintendo's engineers have a humongous task on their hands if they have to solve all of the above while keeping the Switch profitable.
 
What Switch successor can Nintendo bring to the world when:

  • the only cartridge solutions that could exploit the potential of UE5 are non-mass-market (and thus expensive)?
  • the usage of ray tracing is prohibitively expensive on a mobile GPU?
  • any CPU that fits in the Switch form factor will make ports from current-gen consoles potentially as challenging as they were during the Switch era?
Nintendo's engineers have a humongous task on their hands if they have to solve all of the above while keeping the Switch profitable.
You are probably right on all 3 counts, but these are not dealbreakers for a super successful successor, and there are possible workarounds.

1. Who says every single game has to be playable from the card? A lot of games will have no issues, but partial or full installs could solve the problem for those that do.

2. For many games, probably yeah, as it is on Series S. Games can't rely on RT for lighting anyway, especially if they are releasing on Xbox/PC. Nvidia is also much more efficient at RT than AMD, so even as a portable it would punch above its weight.

3. Yes, many games won't be ported, but that didn't prevent the Switch from being super successful.
 
@Quangcute03 @oldpuck @Thraktor with respect to the ROPs, the TX1 was over-provisioned with ROPs iirc, especially for a mobile device that didn't seem to need that many for what it was meant for. And I've noticed Nvidia GPUs with lower ROP counts seem to match equivalent AMD cards that have more, at least with Turing and later, even with the lower fillrate. Maybe it's different here…?
I don't think it is different here, from a hardware standpoint. If you're buying a GPU, you care about the performance and the dollar amount you're spending for it.

Maybe Nvidia offers less efficient FLOPS, but offers more of them per dollar. Maybe AMD offers more raster performance per FLOP, but that requires extra silicon, which drives up costs. And so maybe at different points in the stack, one strategy wins out over the other in price per dollar.

It's only when you get into the weird black boxes of the consoles where this kind of head to head makes sense, so you start uncovering these differences. And you're totally right. ROP for ROP, Ampere is more efficient. FLOP for FLOP, RDNA 2 is.

Would more ROPs increase performance? Yeah, probably. Would it be a good idea? I doubt it. Nvidia could easily have gone with the same number of SMs in two half sized GPCs. That they didn't suggests that the win probably wasn't worth the die space.

Also, do the leaks mention more of the GPU setup of REDACTED? I remember the leaks only defined the architecture, the SM count, the Tensor core capabilities, and maybe the amount of L2 on the GPU.
Depends on what you mean by "leak". Nvidia published partial Linux drivers for T239, which included lots of info, but that isn't a leak since it wasn't in any way accidental. And the hack from last year effectively included the complete design of the whole chip, but in source code form that required you to already know what you were looking for to extract any info.

In short, it's straight desktop Ampere, with a single GPC. If you read the Ampere white paper, virtually everything documented has been confirmed true of T239, with a few exceptions.
 
You folks are so active, anything new?

This is pretty interesting. My hope, if this is true, is that there is a Switch 2 reveal at the end.


If there is a Direct this week, there won't be any new hardware announcements in it. A Direct would generally not have the announcement of a new system in it.

Something like this will have its own dedicated event.
 
The way I see it from the outside looking in, Nvidia has done all the heavy lifting in giving Nintendo something that is ideal for a next-gen Switch. The fact it hasn't materialized yet is Nintendo being slow and/or 'waiting on software', which is totally unnecessary when titles like TotK are releasing on the OG Switch. Maybe they're so worried about 'getting it right' that they'd rather drop it later, but the optimal time to bring it to market has already passed. The end of this year would still be passable, but riding out this hardware until well into next year makes you wonder what is going on. From an amazing market position, where is the confidence to take things forward? Do they not think they can sell out a higher-priced Switch Pro/2 for the first couple of years, even with a percentage of cross-gen software? Because the demand for the Switch shows they most definitely can.
 
Please read this new, consolidated staff post before posting.

Furthermore, according to this follow-up post, all off-topic chat will be moderated.