StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

Depends on if there is a sufficient external storage option or not

Honest question, is there? I assume speed could be a crucial problem here, going with the rumors that the built-in one is rather fast.

So the way most semiconductor production works is that you start with a base silicon wafer (usually 300 mm, or about 12 inches, in diameter), which undergoes a whole bunch of sequential steps that build up layers of different materials and patterns, and then square/rectangular dies are cut from the wafer and packaged into the final product.

You can only get X number of usable dies from any wafer, since they are square regions on a circular wafer. And because each wafer undergoes a whole bunch of processes which all take time, equipment, and precursor materials, the number of dies you can get from a single wafer is the single most important factor in the economics of the final chip cost.

So let's say for current 16GB game cards we have a 300 mm wafer, and we can get 8 dies from each wafer. I'm just pulling this number out of my ass, I'm sure in reality it's different. When switching to 3D NAND, you still start with the same 300 mm wafer, but since you can now stack 48+ layers to increase your storage, the amount of die area needed is drastically smaller. Again, out of my ass, let's say you can now get 32 dies per wafer when you use 48 layers, getting even more storage on each die due to the increased total volume, despite the lower area.

So that's how this tech winds up being much, much cheaper. For 1,000,000 wafers you wind up with 32,000,000 dies, versus the 8,000,000 dies you'd have gotten previously. That means you need to buy ~4x fewer wafers and use your time, equipment, and precursor materials ~4x less than before.
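To make that arithmetic concrete, here's a tiny Python sketch of the same comparison. The die counts per wafer are the made-up numbers from above, not real figures, so treat the output as illustrative only.

```python
# Toy wafer-economics calculation using the made-up numbers above:
# 8 planar dies vs. 32 3D-NAND dies per wafer are purely illustrative.

def wafers_needed(total_dies: int, dies_per_wafer: int) -> int:
    """Wafers you must buy (and process) to get a given number of dies."""
    return -(-total_dies // dies_per_wafer)  # ceiling division

planar_dies_per_wafer = 8      # hypothetical planar (2D) layout
nand3d_dies_per_wafer = 32     # hypothetical 48-layer 3D NAND layout

target_dies = 32_000_000
planar_wafers = wafers_needed(target_dies, planar_dies_per_wafer)
nand3d_wafers = wafers_needed(target_dies, nand3d_dies_per_wafer)

print(f"planar: {planar_wafers:,} wafers, 3D NAND: {nand3d_wafers:,} wafers")
print(f"wafer (time/equipment/material) ratio: {planar_wafers / nand3d_wafers:.1f}x")
# planar: 4,000,000 wafers, 3D NAND: 1,000,000 wafers -> ~4x fewer wafers
```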

Sounds good.

make-it-so-picard.gif
 
Honest question, is there? I assume speed could be a crucial problem here, going with the rumors that the built-in one is rather fast.
They have a number of options, contrary to what I once thought!

Options not currently in mass production (so are less likely):
UFS Card. Very fast, power efficient, and the reader can be backwards compatible with MicroSD. (>=1000MB/s)
MicroSDe. Extremely fast, not so power efficient, reader can be backwards compatible. (>=1000MB/s)

Options in mass production, but failed or extremely niche:
SDe. Same situation as MicroSDe, except it actually got made.

Options in mass production:
CFe. Type B wouldn't fit, but Type A is smaller than a Game Card, so they could fit a reader in. (>=1000MB/s)
(micro)SD UHS-II (V90). Honestly, this seems like the most likely scenario. They come in a microSD form factor, too, so nothing stands in the way other than expense. (≈300MB/s)

Another possibility:
Proprietary format based on Game Card technology. Obvious upsides, obvious downsides. (≈300MB/s)

My bet at the moment is firmly on microSD UHS-II (V90), targeting 300MB/s reads and good compression ratios to achieve close to 1000MB/s effective. It's actually in mass production, it doesn't need an additional slot, and the speed is adequate, so it could be the sweet spot for price to performance. Furthermore, it seems Nintendo is sticking with the eMMC interface for the Game Card, which could give that format 300MB/s reads. Pair this with a low-grade internal UFS storage solution, and you get a nice flat 300MB/s minimum read speed across all three storage options, and synchronicity like that is good for Nintendo, good for game design, and good for customers who don't need to worry about what's doing what.
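For what it's worth, here's a rough Python sketch of how those raw read speeds would translate into effective throughput if you assume something like a 3:1 compression ratio. The ratio (and the assumption that the decompressor keeps up) is mine, not a known figure.

```python
# Rough effective-throughput estimate for the storage options above.
# Raw speeds are the ballpark figures from the post; the 3:1 compression
# ratio is an assumption chosen to show how 300 MB/s raw can approach
# 1000 MB/s of delivered (decompressed) data.

def effective_read_mb_s(raw_mb_s: float, compression_ratio: float) -> float:
    """Decompressed data delivered per second, assuming decompression keeps up."""
    return raw_mb_s * compression_ratio

options = {
    "microSD UHS-II (V90)": 300,
    "Game Card (eMMC interface)": 300,
    "internal UFS (low grade)": 300,
    "UFS Card / SD Express": 1000,
}

for name, raw in options.items():
    effective = effective_read_mb_s(raw, 3.0)
    print(f"{name:28s} raw {raw:4d} MB/s -> ~{effective:.0f} MB/s effective at 3:1")
```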
 
Xbox series loads pretty dang fast in bc mode. Not sure what it's doing differently.
Interesting, I didn't know that. I will look it up when I have more time, but I imagine that if they achieved instant loading across the board it would be one of:

- Sony used the best formats they could while MS assumed the formats used on the XB1 were enough.

- MS supports both the best formats and also the old formats, which would increase their costs per console.

- MS used their servers to decompress and recompress assets, then used smart delivery to deliver the assets with the appropriate compression format. This one would require a big download though.

The first one seems very unlikely to me. The other two wouldn't be that surprising given how much effort MS has put into BC so far, but I would be very (pleasantly) surprised if Nintendo took on extra costs for a small return.
 
They have a number of options, contrary to what I once thought!


Anything but a proprietary format.

Vita nightmares are still very real.
 
The bottleneck for the faster loading on PS5 in BC is most likely the CPU (running in BC mode) and not the storage speed. Microsoft's BC solution is different and probably allows the CPU to be "fully" utilised even in BC mode. (There is a reason they can do things like FPS Boost.)
 
It just has a fast SSD, no magic tricks about it. If NG Switch has faster storage, BC games will load faster.
BC and a dedicated file decompression engine, with the latter also being on NG. Afaik PS5's BC has no access to / isn't made to leverage the I/O engine, so the CPU needs to brute-force through all the requests like on the PS4 Pro.
 
Anyway, although I don't think Nintendo and/or Nvidia plan to use nanoimprint lithography (NIL) technology anytime in the future, this still fascinates me.
 
Fourth option: even a weak Ps4 can run Zelda at 4K 60 fps.
I 100% doubt that this is true.

People are underestimating how taxing 4K is. PS4 couldn't even run a lot of PS3 remasters at 4K 60fps on its best day.

Not a chance in hell it's running an open-world game that's physics-based at 4k 60fps without some major drawbacks.
 
BC and dedicated file decompression engine, with the later also being on NG. Afaik PS5's BC has no access/aren't made to leverage the I/O engine so the CPU needs to bruteforce through all the requests like the PS4 Pro.

With the difference in raw power (before DLSS), i'm sure ReDraketed should be able to brute force some unlocked framerate games at least too.

To be honest, while i do expect BC to be there, it's imo more likely it'll be like Sony's approach and not like MS'.
Made my peace with that, i'm fine with 95%+ of the games running fine and having some "4K patches" for at least the Switch evergreens.

I work in the semiconductor industry and never expected to see a paper from SPIE outside of my work environment lol

Feels right at home, eh? ;D
 
Options not currently in mass production (so are less likely):
UFS Card. Very fast, power efficient, and the reader can be backwards compatible with MicroSD. (>=1000MB/s)

I really hope they end up going with this - compatibility with a console set to sell tens of millions is all you'd need to make mass production worth it.
 
BC and dedicated file decompression engine, with the later also being on NG. Afaik PS5's BC has no access/aren't made to leverage the I/O engine so the CPU needs to bruteforce through all the requests like the PS4 Pro.
Yeah, as I said earlier, the PS5 CPU gives up to 50% faster loading speeds in the tests done in that link. Assuming you double the CPU clocks, you're theoretically halving the load time again, so a 1-minute loading time on PS4 would become 30s in BC mode and probably 12~18s natively if the CPU roughly doubles the clocks. A huge improvement, just not instantaneous without the decompressor hardware.
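Just to spell out that back-of-envelope in code (all of these numbers are the illustrative assumptions above, not measurements):

```python
# Back-of-envelope for the loading-time estimate above. The 60 s baseline, the
# clock-scaling assumption, and the extra native-mode factor are illustrative.

ps4_load_s = 60.0

# BC mode: assume load time scales inversely with CPU clock, and clocks roughly double.
bc_load_s = ps4_load_s / 2.0                                   # ~30 s

# Native mode: assume a further ~1.7x-2.5x from the new I/O path and faster storage.
native_low, native_high = bc_load_s / 2.5, bc_load_s / 1.67    # ~12-18 s

print(f"BC mode: ~{bc_load_s:.0f} s, native: ~{native_low:.0f}-{native_high:.0f} s")
```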

But according to Hermii, Xbox BC is instantaneous, which made me wonder in this post what they're doing. But now that I think about it, it's also possible the PS5 file decompressor actually supports the format used on PS4, but the hardware simply can't be accessed in BC mode, unlike on Xbox. And in that situation, Nintendo doing something like Xbox would be more plausible.
 
I appreciate the effort to make me feel like I am working while looking at famiboards during work :p

"Hey Mr./Mrs. Superizer, are you working? You've been staring at your monitor for a while now."

"Technically, sir/ma'am, i indeed am!"
 
@Thraktor explained why we feel like we're stuck with the bandwidth limits of LPDDR5, but figured I'd step in and answer your question directly. Note - this is all @Look over there's work, I'm just recapping.

All of the RTX 30 cards float around 30 GB/s of memory bandwidth for every TFLOP of performance. So in order to perform in the same ballpark as an RTX 30 card, the GPU needs that much bandwidth.

In a console, the CPU and the GPU share a memory bus, unlike PCs. This has performance advantages, but means you will need extra bandwidth for the CPU. CPU performance will need to keep up with GPU performance, generally, so a 20% premium is a good estimate.

That's the origin of the 3 TFLOP limit. 30 GB/s * 3 TFLOPS * 1.2 = 108GB/s, just a touch higher than the 102 GB/s that would be the max for LPDDR5.
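If anyone wants to play with the numbers, here's that same bandwidth-ceiling arithmetic as a small Python sketch. The 30 GB/s per TFLOP ratio and the 20% CPU share are the assumptions above, and 102.4 GB/s is LPDDR5 on a 128-bit bus at 6400 MT/s.

```python
# The bandwidth-ceiling arithmetic above, spelled out.

GB_PER_TFLOP = 30.0       # approximate RTX 30-series bandwidth-per-TFLOP ratio
CPU_OVERHEAD = 1.2        # ~20% of bandwidth reserved for the CPU (assumption)
LPDDR5_BW_GBS = 102.4     # 128-bit bus * 6400 MT/s / 8 bits per byte

def bandwidth_needed(tflops: float) -> float:
    return tflops * GB_PER_TFLOP * CPU_OVERHEAD

def tflop_ceiling(bandwidth_gbs: float) -> float:
    return bandwidth_gbs / (GB_PER_TFLOP * CPU_OVERHEAD)

print(f"3 TFLOPS needs ~{bandwidth_needed(3):.0f} GB/s")               # ~108 GB/s
print(f"LPDDR5 ceiling: ~{tflop_ceiling(LPDDR5_BW_GBS):.2f} TFLOPS")   # ~2.84 TFLOPS
```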

There are some caveats.

First, obviously, not all RTX 30 cards go as high as 30GB/s/TFLOP. The highest end (RTX 3090 Ti) and the lowest end (RTX 3050) both sit around 25 GB/s. That may seem unintuitive, but think of it this way: the highest end card is about pushing as much performance as possible. Extra TFLOPS past the bandwidth limit aren't useless, they're just less useful. The 3090 Ti is a waste of electricity and money, delivering only marginal improvements for its huge cost premium, but it's a card targeted at the "fuck it, I don't care, card go brrr" market.

The low end card is sort of the same situation. More bandwidth costs money. The physical silicon is cut down till it hits the cost Nvidia wants, but pushing the GPU clock speed (and thus the TFLOPS) is basically free. So even if those TFLOPS aren't delivering as many frames/pixels as they should, they're still good performance for the $$$.

Nintendo is slightly more constrained. Electricity matters a lot more. Every little bit of electricity spent costs battery life. It also means a bigger heat sink, and a bigger fan, which increases the size and the cost of the device, makes it more expensive to ship, and perversely leaves less room for the battery.

We know a little bit about Nintendo's hardware development process - it's a back and forth between the hardware team and the lead software teams. Software wants more performance, more features. Hardware wants to cut costs and hit battery life targets. When software hits a bottleneck, they ask for more power; when hardware hits a bottleneck, they try to cut it. That's why we tend to assume that things will land in the area of best balance and peak efficiency. Because that's the place where the performance that Software wants is the cheapest for Hardware to deliver.

But it's possible that software just Needs More Power, and Hardware has to bite the bullet and push past the bandwidth limit (like Nvidia did on the low end 3050). It's also possible that, while laying out the rest of the design, there are a few spare milliwatts in the budget, and the biggest bang for the buck is to push the GPU clocks a bit, rather than hoard it for an extra 10 minutes of battery life (à la the 3090 Ti).

Of course that could also be true for the CPU clocks, or the storage speeds, or the screen brightness, or the WiFi speed, or... so even if this situation occurs, that doesn't mean it's reflected in the GPU clocks. A lot of discussion in the past here has been about whether the T239 is more likely to be limited by the GPU, or the CPU, or the memory bus, or the storage speed, or even the RAM. The fact that there hasn't been a clear consensus on any of those things is a good sign - it means that T239's design seems pretty balanced.

I tend to fall in the camp of "Nintendo went with a big design. Underclocking it would be wasteful, but it's so big that pushing the clocks isn't required either." I think if there is any wiggle room in the design, Nintendo will spend it making the device smaller, and improving battery life.

I agree with all your general points here, and that the system could well become bandwidth limited, but I should point out that the "30GB/s per Tflop" calculation is based on official figures from Nvidia, which don't really reflect the actual clock speeds their GPUs run at. I was about to look up some figures on this online, when I remembered that I have an RTX 3070, and can just run some tests myself.

I ran tests of Cyberpunk 2077 (under three different settings, which I'll explain below), Metro Exodus, Baldur's Gate 3 and Warhammer Total War 3 (under two different settings), logging data using GPU-Z.

Nvidia advertises a base clock of 1.5GHz for the RTX 3070, and a boost clock of 1.725GHz. My card is a Nvidia Founder's Edition model, with no changes to clocks or voltages, so as stock as you can get. Across the 7 tests (totalling just under 13 minutes of in-game data logged), the clock had a median value of 1.83GHz, and an average of 1.827GHz. The GPU never clocked down as low as 1.5GHz, and was above the 1.725GHz boost clock for 92.5% of the time. The peak clock was 1.935GHz, which it hit just under 18% of the time.
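For anyone who wants to reproduce those statistics from a GPU-Z sensor log, a minimal sketch might look like this. The file name and the "GPU Clock [MHz]" column label are assumptions; adjust them to whatever the actual export uses.

```python
# Minimal sketch for reproducing the clock statistics above from a GPU-Z log.
import csv
from statistics import mean, median

BOOST_MHZ = 1725.0  # advertised RTX 3070 boost clock

clocks = []
with open("gpuz_sensor_log.csv", newline="") as f:        # hypothetical log file
    for row in csv.DictReader(f):
        clocks.append(float(row["GPU Clock [MHz]"]))       # assumed column name

share_above_boost = sum(c > BOOST_MHZ for c in clocks) / len(clocks)
print(f"median {median(clocks):.0f} MHz, mean {mean(clocks):.1f} MHz, peak {max(clocks):.0f} MHz")
print(f"above advertised boost for {share_above_boost:.1%} of samples")
```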

Taking the median clock of 1.83GHz, that puts the RTX 3070 at a theoretical 21.6Tflops, and with 448GB/s of bandwidth (RAM clocks were constant during the tests), it would come to 20.74 GB/s per Tflop, which is quite a bit lower than the 30GB/s you're using. By my reckoning, the limit is around 3.6Tflops (although I'm allocating a bit more to the CPU than you).
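Spelling that calculation out (the core count and bandwidth are the card's published specs, the clock is the logged median from above, and the CPU share at the end is just an illustrative assumption):

```python
# Reproducing the RTX 3070 numbers above from core count and measured median clock.

cuda_cores = 5888
median_clock_ghz = 1.83
bandwidth_gbs = 448.0

tflops = cuda_cores * 2 * median_clock_ghz / 1000   # 2 FMA FLOPs per core per cycle
gbs_per_tflop = bandwidth_gbs / tflops

print(f"{tflops:.1f} TFLOPS at {median_clock_ghz} GHz")   # ~21.6 TFLOPS
print(f"{gbs_per_tflop:.2f} GB/s per TFLOP")              # ~20.8 GB/s/TFLOP

# Applying the same logic as the earlier post: LPDDR5 bandwidth / (ratio * CPU share).
lpddr5_gbs = 102.4
cpu_overhead = 1.35   # illustrative CPU share, a bit more than the 20% used above
print(f"implied ceiling: ~{lpddr5_gbs / (gbs_per_tflop * cpu_overhead):.1f} TFLOPS")
```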

Another interesting thing from my testing is that GPU-Z outputs a value for the memory controller load when logging data, which can be used as a proxy for bandwidth usage. Here are the average figures for each of the test runs:

Game                      GPU Clock (MHz)   GPU Load (%)   Memory Controller Load (%)
Cyberpunk 2077 (no RT)    1747.2            98.2           38.1
Cyberpunk 2077 (RT)       1850.1            98.8           13.6
Cyberpunk 2077 (PT)       1837.0            96.3           45.9
Metro Exodus (RT)         1770.0            98.6           48.8
Baldur's Gate 3           1801.0            98.7           51.9
Warhammer 3 (High)        1837.4            99.8           57.6
Warhammer 3 (Low)         1842.6            97.5           64.3

Now I should say that I don't expect the memory controller load to be close to 100% like the GPU load typically is. I'm only logging in one-second increments here, so nowhere near the resolution required to see intra-frame behaviour, but I'd expect that bandwidth demand is pretty unevenly distributed through the frame. There are likely parts of the rendering pipeline which are almost 100% bandwidth-limited, and other parts which aren't at all, so even with a balanced GPU you're not going to see fully saturated bandwidth usage.
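As a very crude way of turning those percentages into numbers, you can multiply the average memory controller load by the card's peak bandwidth. This ignores the intra-frame burstiness mentioned above, so treat it as a rough proxy at best:

```python
# Treating GPU-Z's "memory controller load" as a rough bandwidth proxy:
# effective traffic ~= load% x the RTX 3070's 448 GB/s peak.

peak_bandwidth_gbs = 448.0

runs = {   # average memory controller load (%) from the table above
    "Cyberpunk 2077 (no RT)": 38.1,
    "Cyberpunk 2077 (RT)": 13.6,
    "Cyberpunk 2077 (PT)": 45.9,
    "Metro Exodus (RT)": 48.8,
    "Baldur's Gate 3": 51.9,
    "Warhammer 3 (High)": 57.6,
    "Warhammer 3 (Low)": 64.3,
}

for game, load_pct in runs.items():
    print(f"{game:24s} ~{peak_bandwidth_gbs * load_pct / 100:5.0f} GB/s average traffic")
```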

That said, there are some interesting things to take from it. One is how variable bandwidth usage is between the different games (or settings within games). The most intensive games are generally using the least bandwidth, and vice versa, with the reduction in settings from high to low on Warhammer Total War 3 causing an increase in load on the memory controller.

In a sense, this isn't entirely surprising, as the less intensive games are doing less work per pixel, and are pushing more pixels as a result. They still have to create the same buffers and read from and write to them in largely the same way, but they're doing less shader work per pixel. So for Warhammer 3 on low settings, the ratio of bandwidth required per shader op is pretty high, whereas on a game that's doing a lot of shader work on each pixel, like Cyberpunk, it's pretty low.

This also explains, in part, why bandwidth hasn't needed to increase in line with Gflops over the years. Resolutions have been increasing much more slowly than raw shader performance has, so developers have been able to put a lot more shader work in per pixel. Some changes over the years (like deferred rendering) have increased bandwidth requirements by adding extra buffers which need to be moved back and forth to memory, but generally the trend of work done relative to buffer data held in RAM has been increasing.

I actually tried to take an old game (Half Life 2) and run it at very high resolution and frame rate to see if it was more bandwidth-bound than the newer titles, but I wasn't getting very high GPU load, so it seems like there was a bottleneck elsewhere, possibly the CPU. I didn't spend much time on it, but I might go back to it.

The RT result for Cyberpunk, by the way, seems to be a bit of an anomaly. It's using everything on ultra, without path tracing, but with all other ray tracing effects turned up to max. I probably should have turned on DLSS here, but it's running full native 4K, so it was chugging along at around 4-5fps. The very definition of putting in a lot of work on not that many pixels. The PT (path tracing) run for Cyberpunk used DLSS-RR, so it was running at a somewhat more respectable 15fps or so.

When I get the time I'll see if I can down clock the memory on the graphics card to see what the impact is on performance, which may be able to give an idea of the point at which some of the games become bandwidth bottlenecked. I might also have a play around with Nsight to see if I can get some more granular data on individual frames.
 
With the difference in raw power (before DLSS), i'm sure ReDraketed should be able to brute force some unlocked framerate games at least too.

To be honest, while i do expect BC to be there, it's imo more likely it'll be like Sony's approach and not like MS'.
Made my peace with that, i'm fine with 95%+ of the games running fine and having some "4K patches" for at least the Switch evergreens.
I feel like it will be more similar to MS when it comes to hardware utilization (loading times etc), but without FPS Boost / Auto HDR. A good second place.

Why? Because Nvidia has some of the best software engineers in the business, and this seems to be an all software implementation.
 
I heard Nintendo was just shoving an RTX 3050 into the Switch OLED form factor and calling it a day. No concessions. Nintendo Switch Advance go brrrr.
I agree with all your general points here, and that the system could well become bandwidth limited, but I should point out that the "30GB/s per Tflop" calculation is based on official figures from Nvidia, which don't really reflect the actual clock speeds their GPUs run at. I was about to look up some figures on this online, when I remembered that I have an RTX 3070, and can just run some tests myself.

 
Based on Thraktor's findings, the test chips back in 2017 achieved around 133MB/s.
Huh, hopefully they've sped things up since then, as an idea I had would not work with speeds that slow.

But that was 2017 and this is now. And in a post-1GB/s-request era, I assume Nintendo/Macronix would find some solution to that developer request.
 
I agree with all your general points here, and that the system could well become bandwidth limited, but I should point out that the "30GB/s per Tflop" calculation is based on official figures from Nvidia, which don't really reflect the actual clock speeds their GPUs run at. I was about to look up some figures on this online, when I remembered that I have an RTX 3070, and can just run some tests myself.

Not to mention this is all completely ignoring how latency-limited Turing and Ampere are as architectures.

Both have massive amounts of stall time when they fill their registers/cache due to how they are designed; AMD offsets this via larger register files and then Infinity Cache afterwards (Nvidia just riffing on the latter with Lovelace).

T239, though, will have low-latency memory access, and therefore less stalling.

We already know that RT as a task is very latency-sensitive, but if Ampere really is that latency-starved even in pure rasterized workloads (as suggested by some Chips and Cheese breakdowns - the forum, not the Discord), then T239 using LPDDR may trade raw bandwidth for far less stalling in the GPU itself.

Heck, we can sort of see this latency-oriented mindset elsewhere: in Lovelace, with features like SER letting the GPU reduce stall time by reordering instructions (primarily targeted at ray tracing, which is a very latency-sensitive task), or in Hopper, with them making it fully asynchronous.

EDIT: Referencing this review mainly as per Dakhil's reply. Editing for visibility mainly, but among the ones I have looked at, latency has been a notable sore spot for Turing/Ampere.
 
With the difference in raw power (before DLSS), i'm sure ReDraketed should be able to brute force some unlocked framerate games at least too.

To be honest, while i do expect BC to be there, it's imo more likely it'll be like Sony's approach and not like MS'.
Made my peace with that, i'm fine with 95%+ of the games running fine and having some "4K patches" for at least the Switch evergreens.
True, I'm just addressing the concerns about its BC being slower to load than native PS5 games; it just doesn't have access to any of this new hardware, let alone SMT, to help things a little in that department. Ryzen was a massive jump over Jaguar, but it's still not enough for literally instantaneous times without the help of the new stuff put in specifically for this purpose.
Yeah, as I said earlier, PS5 CPU has a up to 50% faster loading speeds in the tests done in that link. Assuming you double the CPU clocks, you're theoretically halving it again, so a 1 minute loading time on PS4 would become 30s on BC mode and probably 12~18s on native if the CPU roughly doubles the clocks. A huge improvement, just not instantaneous.

But according to Hermii, Xbox BC is instantaneous. Which made me wonder in this post what they're doing. But now that I think about it, it's also possible the PS5 file decompressor actually supports the format used on PS4, but the hardware simply can't be accessed in BC mode, unlike Xbox. And in that situation, Nintendo doing like Xbox would be more plausible.
Well, what Xbox is doing with BC is less traditional and more like virtualizing the entirety of the new hardware for the old games, so they can utilize it exactly like a native game would, automatically unlocking the original framerate and resolution targets of those games. Neither Sony nor Nintendo has supported this approach, since it's bound to have compatibility issues, especially as consoles get increasingly more complex and atypical ways to leverage them are figured out by developers.
 
That... is not what this video shows at all? The game runs at 20fps, and it's running on a GPU that is 60% more powerful than the PS4? On a more modern architecture?
They adjust the GPU to run like a PS4 and run some tests that show a result close to what a PS4 can do.

The 20fps is more a CPU problem on that emulator. In that video they run Zelda at 720p and it runs at 20fps. They upscale to 1080p and it runs at the same 20fps. Then they go up to 4K and the result is the same.

And on the FPS part, Zelda: Tears of the Kingdom can run at 60fps on Switch with a mod. Nintendo probably chose to lock it at 30 for consistency, but it is easy to see a better CPU running that game at 60 all the time.

And, with what we know about the T239, the minimum power it can be is 2.3 TFLOPS. If a Wii U can run BotW at 720p with 260 GFLOPS, I can easily see a Switch 2 doing the same but at 4K native.

To tell the truth, I'm hoping for 3.4 TFLOPS right now. If, at 4 TFLOPS, a PS4 Pro can run some PS4 games at 4K, a Switch 2 with a better architecture and a CPU that's miles better will easily do the same for any Switch game, no?
 

Nintendo recently filed a patent for a Switch Lite-like system with fewer buttons and a pressure-sensitive touch screen. Have we ever covered that before?

Is this where "the Nintendo patent rule" applies?

Learned that one in this thread. It says something like: if you can look up a Nintendo patent and it's something Nintendo hasn't announced, it likely won't be used in the future, or something to that effect.
 
This patent was already discussed. The whole point of it is supposed space saving or screen expansion from moving a couple of buttons to the bottom of the console. The rest of it is just a Switch Lite as an example device. The concept itself is dumb and doesn't actually save any space, so yeah, we can safely expect to never see this utilized in the future.
 
The 20fps is more a CPU problem on that emulator. In that video they run Zelda at 720p and it runs at 20fps. They upscale to 1080p and it runs at the same 20fps. Then they go up to 4K and the result is the same.
I don't think there is anything to learn here about how much performance Breath of the Wild needs to run at 4k60 on an ARM CPU and a PS4 GPU by looking at emulated performance on an x86-64 CPU and GCN 4.0 GPU.

"It's CPU limited on the frame rate" is not "it runs at 4k60fps on a PS4." "It's CPU limited in emulation, so resolution doesn't matter there" is not the same as "without the CPU limit it would hit 4k60fps easily" is not the same as "Switch NG will have 4k60fps levels of CPU/GPU performance."

And on the FPS part, Zelda: Tears of the Kingdom can run at 60fps on Switch with a mod. Nintendo probably chose to lock it at 30 for consistency,
It can run at 120fps with even more instability! I'm not sure how unstable frame rates matter here.

but it is easy to see a better CPU running that game at 60 all the time.
Overclocking the CPU alone in TotK results in no performance uplift. Because the game isn't CPU limited. You can't map emulation performance back to real world performance like that.

And, with what we know about the T239, the minimum power it can be is 2.3 TFLOPS. If a Wii U can run BotW at 720p with 260 GFLOPS, I can easily see a Switch 2 doing the same but at 4K native.
I never said anything about the resolution. It was resolution + frame rate.

To tell the truth, I'm hoping for 3.4 TFLOPS right now. If, at 4 TFLOPS, a PS4 Pro can run some PS4 games at 4K, a Switch 2 with a better architecture and a CPU that's miles better will easily do the same for any Switch game, no?
No, unfortunately. I'm not trying to burst your bubble here, just trying to clarify. TFLOPS can't really be directly compared across architectures, but for the sake of argument, let's pretend they can.

Plenty of Switch games are running sub-720p when docked. 4k is almost exactly a 9x increase in pixel count from 720p. 3.4 TFLOPS is an 8.8x increase. For a GPU limited game (not all of them are), anything running sub-720p can't be brute forced up to 4k. And this is all assuming that the backwards compatibility layer is completely free, which is unlikely.
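To put numbers on that scaling argument (taking the 8.8x FLOPS figure above as given; the resolutions are exact pixel counts):

```python
# Pixel-count scaling vs. the available FLOPS headroom.

def pixels(w: int, h: int) -> int:
    return w * h

flops_ratio = 8.8                                      # 3.4 TFLOPS vs. docked Switch, per the post
ratio_720p = pixels(3840, 2160) / pixels(1280, 720)    # 9.0x
ratio_540p = pixels(3840, 2160) / pixels(960, 540)     # 16.0x

print(f"720p -> 4K needs {ratio_720p:.1f}x the pixels (FLOPS budget: {flops_ratio}x)")
print(f"540p -> 4K needs {ratio_540p:.1f}x the pixels, well past the {flops_ratio}x budget")
```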
 
You can get a native port of TotK to 4K/60 pretty easily on the Switch 2 hardware most likely.

The question is whether you can get TotK to 4K/60 while dramatically improving asset quality and LOD, lol. The textures are very low res to save on RAM.
 
Yeah, as I said earlier, PS5 CPU has a up to 50% faster loading speeds in the tests done in that link. Assuming you double the CPU clocks, you're theoretically halving it again, so a 1 minute loading time on PS4 would become 30s on BC mode and probably 12~18s on native if the CPU roughly doubles the clocks. A huge improvement, just not instantaneous without the decompressor hardware.

But according to Hermii, Xbox BC is instantaneous. Which made me wonder in this post what they're doing. But now that I think about it, it's also possible the PS5 file decompressor actually supports the format used on PS4, but the hardware simply can't be accessed in BC mode, unlike Xbox. And in that situation, Nintendo doing like Xbox would be more plausible.
It's not really a useful comparison without looking at specific games. There's going to be a fair amount of game to game variance in how loading is impacted due to a variety of factors, like how CPU heavy the loading workload is, how the game batches its IO, or even basic stuff like how the loading animations are designed. To establish that there is a difference in the first place, you should compare specific titles across the two platforms.
 
I would expect the next Zelda game to dramatically change its combat system as the game really tries to have fights be 1-2 enemies at a time to conserve cycles. Probably can scale up to 4 to 8 enemies at a time with new hardware and make a new combat system that takes advantage of that.
 
I would expect the next Zelda game to dramatically change its combat system as the game really tries to have fights be 1-2 enemies at a time to conserve cycles. Probably can scale up to 4 to 8 enemies at a time with new hardware and make a new combat system that takes advantage of that.
Tbh, the sky is the limit with a lot of next-gen Nintendo games. While we can safely expect series like Xenoblade or Fire Emblem to mostly remain the same in terms of tone, setting and gameplay, series like Zelda or Mario can justify going in a weird direction and evolving with 8th/9th gen hardware.

As to what they'll actually do? Idk. I'd like to see a more combat-focused Zelda game that uses items in fights, I'd like to see a return to proper dungeons, but I'm happy to see where the series can go regardless of what Nintendo wants to do. I just want it to be good, and I'll be happy.
 
Yeah, Xenoblade is going to be massively changed gameplay-wise, lol.

I would expect Smash, Mario Kart, and Splatoon to stay largely the same but with more modes next-gen. I expect Zelda, Xenoblade, and Mario to change dramatically. Probably Metroid Prime as well because... There hasn't been a game in the franchise in 16 years?
 
Yeah, Xenoblade is going to be massively changed gameplay-wise, lol.

I would expect Smash, Mario Kart, and Splatoon to stay largely the same but with more modes next-gen. I expect Zelda, Xenoblade, and Mario to change dramatically. Probably Metroid Prime as well because... There hasn't been a game in the franchise in 16 years?
Xenoblade is a weird one, but developers at Monolith Soft have basically gone on record saying that they might shake it up. Considering the tease after Future Redeemed, I'm just going to shrug and say "I trust them".

Regardless, the main thing I want to see Nintendo work with is funky technology. I'm not really impressed by graphics and "size" of games anymore, but I'd adore to see games that use hardware in weirder ways. Tears of the Kingdom was a key instance of this, and (if possible) I'd like to see more of that.
 
Tbh, the sky is the limit with a lot of next-gen Nintendo games. While we can safely expect series like Xenoblade or Fire Emblem to mostly remain the same in terms of tone, setting and gameplay, series like Zelda or Mario can justify going in a weird direction and evolving with 8th/9th gen hardware.

As to what they'll actually do? Idk. I'd like to see a more combat-focused Zelda game that uses items in fights, I'd like to see a return to proper dungeons, but I'm happy to see where the series can go regardless of what Nintendo wants to do. I just want it to be good, and I'll be happy.
I feel like what's next for Zelda is adding layers to the combat system. We already have a phenomenal physics system. What more can they improve? Combat. If the sneaking Shrines in TotK are anything to go by, then there seems to be openness and ideas to diversify the combat systems.

And about dungeons: I'm expecting more of the same, but with more resemblance to older games' dungeons, like the Thunder Temple.
 
Regardless, the main thing I want to see Nintendo work with is funky technology. I'm not really impressed by graphics and "size" of games anymore, but I'd adore to see games that use hardware in weirder ways. Tears of the Kingdom was a key instance of this, and (if possible) I'd like to see more of that.
If it isn't fun, why bother?
 
As you said, it's a handheld; it's power-limited. There are going to be limits to how fast the read speeds can get. Given you mention 1/5th of 2.4GB/s, UHS-II is just under that, so I don't see it being too much of a problem.
Ideally we don't have to play Tetris moving games to and from internal storage and expandable storage, but at the end of the day that is better than slow internal storage. I am encouraged by the fast internal storage rumors, as I think the impact of PS5/XSX is that an SSD is becoming a requirement for console and PC games, whether or not it is fully utilized at this point. Given the planned lifespan for this device, I think you need UFS 3.1 at a bare minimum and hopefully UFS 4.0 for the internal storage. I would much rather have, say, 128 GB of UFS 4.0 than more space at a slower speed.
 
I don't think there is anything to learn here about how much performance Breath of the Wild needs to run at 4k60 on an ARM CPU and a PS4 GPU by looking at emulated performance on an x86-64 CPU and GCN 4.0 GPU.

"It's CPU limited on the frame rate" is not "it runs at 4k60fps on a PS4." "It's CPU limited in emulation, so resolution doesn't matter there" is not the same as "without the CPU limit it would hit 4k60fps easily" is not the same as "Switch NG will have 4k60fps levels of CPU/GPU performance."


It can run 120fps with even more instability! I'm not sure how unstable frame rates matters?


Overclocking the CPU alone in TotK results in no performance uplift. Because the game isn't CPU limited. You can't map emulation performance back to real world performance like that.


I never said anything about the resolution. It was resolution + frame rate.


No, unfortunately. I'm not trying to burst your bubble here, just trying to clarify. TFLOPS can't really be directly compared across architectures, but for the sake of argument, let's pretend they can.

Plenty of Switch games are running sub-720p when docked. 4k is almost exactly a 9x increase in pixel count from 720p. 3.4 TFLOPS is an 8.8x increase. For a GPU limited game (not all of them are), anything running sub-720p can't be brute forced up to 4k. And this is all assuming that the backwards compatibility layer is completely free, which is unlikely.
I appreciate your insights, as my naive take was that this was a helpful experiment and, although far from conclusive, seemed to add weight to the idea that our base case could handle the BotW 4K/60 native rumors.

I am pretty sure even 1440p/60 native for Switch games upscaled by nicer 4k HDTVs (Sony, LG, Samsung, etc) to full 4k will make for an awesome upgrade.
 
If they do, I wonder if Nintendo will allow developers to port games to NVN2 for free, because I don't see them not monetizing full 4K60 DLSS NVN2 ports of their own titles somehow, and if 3rd party devs do it for free it would make Nintendo look bad by comparison.
Ehh, that's just the nature of competition, and Nintendo doesn't seem to be a dick about it. Like, they don't force third parties to maintain high prices on their software so BOTW being $60 six years later looks better.
I really wish Samsung UFS 3.0 cards were still a thing, because what you are suggesting is not remotely future-proof enough imo. Perhaps it's fine for the expandable storage, if the idea is we have to manage it alongside a smaller, fast internal storage, but for example the Xbox Series S has 2.4GB/s uncompressed and 4.8GB/s compressed read speeds.

I totally get we are talking about a handheld vs a console, but phones have been using UFS 4.0 internal storage as of early this year. If this is a device which potentially launches in late 2024, is expected to last 6+ years, and is hoping to have 1/5 the raw read speed of the lowest common denominator from 2020, this could become a huge issue for the NG ports in time.
I feel like it'd have to be some pretty specific circumstances for it to be a development headache that something that can load in 1 second on Series S could take 5 seconds on Switch 2.

Personally I think I'll be pretty happy as long as read speeds grow faster than RAM.
 
The textures are very low res to save on RAM.
They could have chosen a certain art style to help save on RAM usage, but I'm not sure that's a bad thing. I would have preferred this art style choice over "hyperrealistic" textures (hypothetically, if RAM usage was non-issue for the latter).

It's kind of like saying Wind Waker, back then, was using low res to save on RAM. Same thing, might have been true, but I loved the art style in that game.
 
They could have chosen a certain art style to help save on RAM usage, but I'm not sure that's a bad thing. I would have preferred this art style choice over "hyperrealistic" textures (if RAM usage is non-issue for the latter).

It's kind of like saying Wind Waker, back then, was using low res to save on RAM. Same thing, might have been true, but I loved the art style in that game.

?????????????????

High resolution textures... Do not require a realistic art style?

Do you think Encanto (which uses as high of quality of assets as possible at all times because it's a film) has a realistic art style?
 
Ideally we don't have to play tetris moving games to and from internal storage and expandable storage but at the end of the day that is better than slow internal storage. I am encouraged by the fast internal storage rumors as I think the impact from PS5/XSX is that SSD is becoming a requirement for consoles and PC games whether or not it is fully utilized at this point. Given the planned lifespan for this device, I think you need UFS 3.1 at a bare minimum and hopefully UFS 4.0 for the internal storage. I would much rather have say 128 GB of UFS 4.0 than more space but slower speed
Personally, I think cold storage is a worse option, especially when paired with a slow medium you're moving to/from. Having to move a 60GB+ CoD Warzone from your insufficiently fast storage whenever you want to play a variety of games gets tiring quickly.
 
Thought I would share this link for a crowd funded Android device here as the specs / price are pretty interesting context. Obviously a crowd funded device manufactured in China will likely have quality issues, inferior parts and potentially a somewhat subsidized price but it should make for good discussion.

Odin 2

I provided the link in case anyone wanted the source or to check it out. There are 3 tiers separated by storage and RAM upgrades but the top tier is below.

Odin Max ($449):
Hall joysticks
6" 1080p IPS LCD
Qualcomm Kryo 64-bit CPU (1 GoldPlus @ 3.2 Ghz, 4 Gold @ 2.8 Ghz & 3 Silver @ 2 Ghz)
Adreno 740 GPU
16GB LPDDR5X RAM
512 GB UFS 4.0
USB 3.1, Bluetooth 5.3, WiFi7
8000 mAh battery
65W quick charge
420g weight

They are expecting to release this in Dec 2023
 
Thought I would share this link for a crowd funded Android device here as the specs / price are pretty interesting context. Obviously a crowd funded device manufactured in China will likely have quality issues, inferior parts and potentially a somewhat subsidized price but it should make for good discussion.

Link doesn't work (edit: nvm, fixed now)
 
Thought I would share this link for a crowd funded Android device here as the specs / price are pretty interesting context. Obviously a crowd funded device manufactured in China will likely have quality issues, inferior parts and potentially a somewhat subsidized price but it should make for good discussion.

The price is definitely subsidized. The crowdfunding prices are usually limited-time, while the post-crowdfunding price goes up by a couple hundred dollars.
 