• Hey everyone, staff have documented a list of banned content and subject matter that we feel are not consistent with site values, and don't make sense to host discussion of on Famiboards. This list (and the relevant reasoning per item) is viewable here.

StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

We had no leaks indicating anything about the CPU, other than that it's likely to be A78. For all we know, it's quad core.

Nothing about RAM either, for that matter. The only solid info we have is about the GPU, and what we can extrapolate from Orin.

Nintendo opting for a quad-core CPU for Drake would be odd; either it has SMT (hyper-threading) or it's clocked very high.
 
Actually, LiC (and Thraktor) found from the illegal Nvidia leaks that, with respect to the RAM, Drake has a max bus width of 128-bit, compared to Orin's max of 256-bit.
Yes, the AGX Orin is equipped with a 256-bit data bus running at 204.8 GB/s (3200 MHz frequency). The Orin NX, on the other hand, is 128-bit at 102.4 GB/s (same 3200 MHz).
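For reference, those bandwidth figures fall straight out of bus width times transfer rate. A minimal sketch, assuming the standard LPDDR5 double data rate (a 3200 MHz clock implies 6400 MT/s):

Code:
# Peak memory bandwidth = (bus width in bytes) x (transfers per second).
# LPDDR5 at a 3200 MHz clock is double data rate, i.e. 6400 MT/s.
def peak_bandwidth_gbps(bus_width_bits: int, transfers_per_sec: float) -> float:
    return bus_width_bits / 8 * transfers_per_sec / 1e9

MTS = 6400e6  # LPDDR5-6400 transfer rate

print(peak_bandwidth_gbps(256, MTS))  # AGX Orin: 204.8 GB/s
print(peak_bandwidth_gbps(128, MTS))  # Orin NX (and Drake's max): 102.4 GB/s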

Since we are discussing the process nodes again, I wonder if the Orin NX is a better reference point than the AGX Orin, which has a lot more fat and might give us a wrong impression regarding the Samsung 8nm node's feasibility for Drake. According to the official data sheet, there are two versions of Orin NX:
  • Jetson Orin NX 16 GB
    • 100 (Sparse) INT8 TOPS and 50 (Dense) INT8 TOPS
    • 10W | 15W | 25W modes
  • Jetson Orin NX 8 GB
    • 70 (Sparse) INT8 TOPS and 35 (Dense) INT8 TOPS
    • 10W | 15W | 20W modes
At their max performances, there's only a 5W delta between these two versions. So I looked through the data sheet to identify the differences in specs:
  • Jetson Orin NX 16 GB
    • GPU: 918 MHz (1024 CUDA cores, 32 tensor cores)
    • CPU: 8 cores, 2 MB L2 (2 GHz)
    • DLA: 2 accelerators at 614 MHz (20 TOPS * 2)
    • Memory: 16 GB (128-bit LPDDR5 102.4 GB/s)
  • Jetson Orin NX 8 GB
    • GPU: 765 MHz (same core numbers)
    • CPU: 6 cores, 1.5 MB L2 (same frequency)
    • DLA: 1 accelerator at 610 MHz (20 TOPS * 1)
    • Memory: 8 GB (same specs)
These were all the differences that I could find; otherwise the two Orin NX modules are identical. Unless I'm missing something, the following components account for 5W using the Samsung 8nm node:
  • 918 MHz vs. 765 MHz GPU frequency
  • Two A78AE CPU cores, including 0.5 MB L2 difference
  • One Deep Learning Accelerator (DLA); negligible frequency difference
  • 8 GB DRAM
I'm not knowledgeable enough to estimate the power consumption of each individual component, but my guess is that the majority of that 5W goes to the DLA, which Drake does not need. There are other components on the Orin NX that Drake doesn't need either, such as the Programmable Vision Accelerator (PVA) and Image Signal Processor (ISP), but I couldn't find any info suggestive of their power consumption.
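To make the accounting concrete, here's the kind of back-of-the-envelope split I have in mind, as a minimal sketch. Every per-component wattage below is an illustrative assumption (none of them come from the data sheet); the only hard constraint is that the pieces should sum to roughly the observed 5W delta:

Code:
# Hypothetical split of the ~5W delta between the two Orin NX modules.
# All per-component estimates are assumptions for illustration only.
delta_watts = {
    "GPU clock (918 vs 765 MHz)":  1.0,  # assumed
    "2x A78AE cores + 0.5 MB L2":  1.0,  # assumed
    "1x DLA":                      2.0,  # assumed (my guess: biggest chunk)
    "8 GB extra LPDDR5":           1.0,  # assumed
}
total = sum(delta_watts.values())
for part, watts in delta_watts.items():
    print(f"{part}: ~{watts:.1f} W ({watts / total:.0%})")
print(f"total: ~{total:.1f} W vs. the observed 5 W delta")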
 
So I ended up buying my mate's Steam Deck, as mentioned earlier in the thread.

Some thoughts, if anyone cares, since it's an up-to-date handheld device, from the view of someone who's waited on the next Switch for 3 years and decided to break and buy a Steam Deck, lol:

CPU -

For third-party games, 60fps is extremely taxing on the CPU: it ramps up to 100% utilisation and produces a lot of heat, then, as you'd expect, the fan ramps up to 100%, meaning significant noise even after the latest patch to make it quieter. You also get an hour less battery life playing at 60fps versus using the 30fps cap. Keep in mind this CPU is basically the PS5/Series console CPU cut in half, with 4 cores / 8 threads, and it can boost up to 3.5GHz.

GPU -

Image-quality-wise, the colours obviously aren't as good as the OLED screen, but even lowering the resolution to 720p and then using the AMD trick, it still looks crisp. DLSS is going to be great, as it's much better by all accounts.

Memory -

16GB at 88GB/s seems phenomenal and plenty for most games. This could also be why games load so fast? For future-proofing I'd love to see this exact RAM from Nintendo, considering some will have to be used for the OS and background recording. 12GB dedicated to games would be great!

HDD -

Although I only have the standard HDD and not an SSD, it loads really, really fast compared to a PS4 Pro, and even faster than the PS5 in Path of Exile, for instance. I'm also using a 128GB SD card, so maybe that makes it much faster than the PS4 Pro? Overall, after using what is essentially a PC handheld with nowhere near as much per-game optimisation, I'm no longer too worried if the next Switch doesn't use an SSD.

Build -

Although quite a bit bigger than the OLED Switch it's much, much more comfortable to hold as someone with large hands. Feels comparable to the OLED with the Hori Split Pad Pro Joycons. It was a LOT lighter than I anticipated.

Personal Conclusion -

Overall it's an extremely impressive piece of kit, especially as I was lucky enough to bag it for the same price I paid for my OLED Switch (£330).

It has worried me slightly, though, for future third-party games, because I don't feel Nintendo will want to push RAM and clock speeds to this extent, to avoid the heat, fan noise, and reduced battery life, even if it's on a smaller die. It's more than enough for a lot of PS360 + PS4/XBO games at 720p/60fps.

If Nintendo can hit this CPU + GPU + RAM performance with DLSS on top, then I shall be delighted!
Would be interesting to see the performance and power draw on 5nm.
(quoting the AGX Orin / Orin NX memory and power comparison above)
But Drake is supposed to be 1536 CUDA cores (12 SMs), right? That's in the middle of AGX and NX.
I think 16GB is overkill for Drake as well, so 12GB is more likely (Nvidia/Nintendo customized the RAM for the Switch). But it's truly mind-boggling to think Nvidia would really make a third Orin model just for the Switch 2. They could just overclock the NX (half the cores of the AGX) at 5nm, or just use an NX on Samsung 8nm. This would hypothetically be based on the 16GB model with the 918MHz GPU and 8 CPU cores, of course.

They have the GPU for that.

Nvidia GPUs aren’t a slouch when it comes to physics simulations, and the RT cores are essentially physics accelerators (I’m overly simplifying this).

It does require more work if they have the CPU doing this though, but it isn’t so far out of the realm.
The AyaNeo 2 also has twice the CPU cores and threads, and at a higher clock too. It's like a portable Xbox Series S. Crazy.

The Series S draws 80+ watts max in certain games.
 
But it's truly mind-boggling to think Nvidia would really make a third Orin model just for the Switch 2. They could just overclock the NX (half the cores of the AGX) at 5nm, or just use an NX on Samsung 8nm. This would be based on the 16GB model with the 918MHz GPU and 8 CPU cores.
Orin wasn't designed for gaming. With the various aspects where GA10F actually hews closer to desktop Ampere than GA10B does, it seems plausible that the decision to make a custom chip was motivated by a need to design it for its proper use case. Power draw is also a big consideration, with GA10F supporting the fine-grained FLCG level of power control that GA10B doesn't. And while we don't know anything about the CPU used in T239, the one used in T234 is also not what you would choose for a gaming design.

It's also less Nvidia deciding to do something and more what Nintendo is willing to pay them for. Clearly Nintendo thought it made more business sense to pay for a custom chip than to wrangle an existing design to fit their needs. And why shouldn't they? Using an off-the-shelf TX1 was a good fit for getting back on track after a tough few years; now the Switch is hugely successful and they can get back to what they've pretty much always done with new hardware.
 
I'm sure you know, but I just need to point out that it doesn't actually need specs as good for similar real-world performance, as natively optimized games run a lot more efficiently than the Windows versions do through Proton.
Yeah, some games definitely run better than others. A PC platform with optimised settings and API tweaks is cool to have, as awful console ports were always the reason I could never stick with PC as my main multiplatform machine.
 
(quoting the Orin NX 16 GB vs. 8 GB power-delta breakdown above)
Ignoring TDP, I think the Jetson AGX Orin (32 GB) is probably the closest reference to Drake in terms of the GPU. The max number of TPCs the Jetson AGX Orin (32 GB) has available is 7 TPCs, which translates to 14 SMs, or 1792 CUDA cores. And Drake's known to have 12 SMs, or 1536 CUDA cores, which translates to 6 TPCs.
 
(quoting the Steam Deck impressions post above)
Thanks for the review.
Just want to clean up terminology a bit regarding storage, as I've bolded.
So what the base model of the Steam Deck has for storage is eMMC, like the Switch. eMMC is not an HDD. HDD stands for Hard Disk Drive; those literally have disks spinning around inside of them (i.e., those 5400/7200 RPM numbers literally are rotations per minute) and are a form of magnetic storage.
I get that when you say SSD, what you have in mind is more specifically the NVMe solid state drives used in PCs; a very common misunderstanding among general audiences. eMMC uses NAND flash (like other solid state drives) and is classified as solid state storage too. If you're wondering about the speed difference between eMMC and NVMe, well, it's age. The MMC in eMMC (embedded Multi Media Card) was a standard unveiled in 1997. Version 1.0 of NVMe was released in 2011. Btw, SD (as in SD cards) was released as a successor to MMC in 1999.

If you can feel a difference in loading speeds between eMMC/SD card and the HDD in a PS4, the explanation is probably in the difference in random read speeds, or random IOPS (Input/Output Operations Per Second). You see, HDDs actually aren't that bad at sequential reads, that is, when the data you need to read is all in a row, or whatever the best-case scenario is for the reader heads to pick things up. They'd probably be neck and neck with UHS-I SD cards, and maybe 1/3 to 1/2 of modern eMMC's sequential read.
Random reads and IOPS are a different story, though. Spinning disks probably have random IOPS in the 2-digit range and random reads in the low 2-digit MB/s. SD cards have random read IOPS in the low 4 digits (random write IOPS in the 3 to low 4 digits). eMMC's random IOPS are in the high 4 to low 5 digit range.
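To put those orders of magnitude into MB/s terms, a minimal sketch; the IOPS figures are assumed rough midpoints of the ranges above, and the 4 KiB block size is the usual convention for quoting random I/O (both are assumptions, not measurements):

Code:
# Effective random-read throughput = IOPS x block size.
# 4 KiB blocks are the conventional unit for random I/O benchmarks.
BLOCK_KIB = 4

rough_random_iops = {  # assumed midpoints of the ranges discussed above
    "HDD (5400/7200 RPM)": 80,
    "UHS-I SD card": 2_000,
    "modern eMMC": 30_000,
}
for device, iops in rough_random_iops.items():
    mb_per_s = iops * BLOCK_KIB / 1024
    print(f"{device}: ~{mb_per_s:.1f} MB/s random read")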
 
(quoting the storage terminology explanation above)
Thanks for the explanation. Very interesting.

Steam Deck is definitely faster, especially in the usual long initial loads. It's more on par with using a PS4 app on PS5 rather than a PS4 app on PS4, I'd estimate. PC games in general have always loaded way faster than PS4/XBO due to the much better CPUs, so that's probably the main explanation.

Switch is indeed no slouch either, but with far fewer third-party comparisons available, obviously. The loading in Nintendo games is minimal.
 
I don't think the next Switch will close the gap; if anything, I think it will get bigger, as the PS5 and Xbox Series are not bottlenecked by Jaguar CPUs.

Well, the current Switch is bottlenecked by CPU and RAM bandwidth. Also, most of the PS5/Xbox Series X's power goes toward achieving native 4K resolution (at least that's the goal), and with DLSS Nintendo doesn't need native resolution; actually, achieving native 1080p would be enough.

In any case, IQ should be much better on the Drake Switch compared to current Switch units, definitely a bigger power jump than going from Wii U to Switch.
 
(quoting the post above on Switch bottlenecks and DLSS)
There may be a chance they use it in portable mode too, simply to squeeze out every bit of extra battery life. Someone did a test comparing FSR and DLSS, and the results are unsurprising: even though the other consoles will be utilizing FSR (probably not Sony, but who knows?), dedicated hardware > generalized software.


Though, Series X and S do have dedicated hardware that is supposed to do FP16 acceleration I think, maybe for MS consoles they’ll offload it to that.
 
Though, Series X and S do have dedicated hardware that is supposed to do FP16 acceleration I think, maybe for MS consoles they’ll offload it to that.
They do not have dedicated hardware. They have INT8/INT4 instructions available, but they are still executed on the vector ALUs, so it’s not really comparable to the tensor cores. The advantage of integer quantization in this case is that you can execute more operations in the same time; you get 12 TFLOPS for FP32, 24 TFLOPS for FP16, 49 TOPS for INT8, and 97 TOPS for INT4.
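Those four figures are just one doubling per halving of precision from the Series X's published 12.15 TFLOPS FP32 baseline. A minimal sketch of that relationship:

Code:
# Same vector ALUs, narrower datatypes: each halving of precision
# doubles the ops per clock, so throughput scales 1x/2x/4x/8x from FP32.
FP32_TFLOPS = 12.15  # Series X's published FP32 throughput

for dtype, multiplier in [("FP32", 1), ("FP16", 2), ("INT8", 4), ("INT4", 8)]:
    unit = "TFLOPS" if dtype.startswith("FP") else "TOPS"
    print(f"{dtype}: {FP32_TFLOPS * multiplier:.0f} {unit}")
# prints 12 TFLOPS, 24 TFLOPS, 49 TOPS, 97 TOPS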
 
They do not have dedicated hardware. They have INT8/INT4 instructions available, but they are still executed on the vector ALUs, so it’s not really comparable to the tensor cores. The advantage of integer quantization in this case is that you can execute more operations in the same time; you get 12 TFLOPS for FP32, 24 TFLOPS for FP16, 49 TOPS for INT8, and 97 TOPS for INT4.
Oh, I thought this meant that they had some sort of “hardware block” (not necessarily tensor cores in my post):

"We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms," says Andrew Goossen. "So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations. Note that the weights are integers, so those are TOPS and not TFLOPs. The net result is that Series X offers unparalleled intelligence for machine learning."

There was also the HotChips presentation where they label the "Machine Learning" part as having due area cost and describe it for "resolution" among other things that it can do:

[HotChips slide image]
I assumed this was just some extra logic alongside the ALUs that helps them perform ML tasks more efficiently.
 
(quoting the post above on the Series X's ML hardware)
I think they are just talking about the INT8/INT4 support in the ALUs. I found the video of the presentation here:

[video embed]
The slide from that article is at 24:40, where the speaker says, "We added a small amount of extra logic to the compute units to get us a 10X improvement in these tasks versus using standard shader ops." I believe this is just referring to the integer support, which afaik the PS5 doesn't have.

As far as super-resolution goes, you could run the matrix math on the normal ALUs regardless of precision, but it would be in lieu of other operations, just like running FSR 2.0 on the ALUs. I don't know if it's ever actually been confirmed whether DLSS runs in floating point or quantized to integers. Nvidia's tensor cores support both floating-point and integer calculations, and you can see the performance comparison in Table 3 of the Ampere white paper. (It's a 2x and 4x multiplier over FP16 performance.)
 
All this stuff with the latest AMD AI upscaling version - does this have an impact on current Switch hardware at all? Not so much patches for old games, but going forwards we see some titles getting a little lift as a result?
 
All this stuff with the latest AMD AI upscaling version - does this have an impact on current Switch hardware at all? Not so much patches for old games, but going forwards we see some titles getting a little lift as a result?
This is hard to answer right now. There are a few unknowns when it comes to the performance of FSR at lower resolutions and its impact on framerate on lower-performance GPUs.

With DLSS, especially on Ampere, the performance hit for using it is minimal as the process is offloaded to the tensor cores; with FSR 2.0 the compute is done on the regular shader cores, so it has to run as part of the rendering pipeline rather than in parallel with it.

If FSR 2.0 is being implemented in a Switch game, the dev would have to weigh up the cost of using it to try and clean up an already low resolution vs. just spending that GPU compute elsewhere. With how small the GPU is in the regular Switch, I am not sure how feasible it is.

For a practical example, let's take XBC2: it can hit as low as 378p in handheld and really only runs around 720p docked. To implement FSR 2 there would be a frame-time cost, which may take the internal render resolution down even further; is this going to make the image even worse despite the higher output resolution? With it being a much simpler scaling solution than DLSS, I would say it's likely. Whereas with DLSS that frame-time cost is much lower on dedicated hardware, so the required drop in resolution to implement it is much lower, and with DLSS being an AI solution it is much better at resolving those low-resolution images, so the output will be better.
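A minimal sketch of that frame-time trade-off. Both upscaler costs below are assumptions for illustration (neither FSR 2 nor DLSS cost on a Switch-class GPU is publicly known), and the model assumes render time scales linearly with pixel count:

Code:
# If rendering at the old internal resolution already filled the frame
# budget, adding an upscaler pass means the internal resolution must
# shrink so render time + upscaler time still fits the budget.
def required_pixel_fraction(fps: float, upscaler_ms: float) -> float:
    budget_ms = 1000.0 / fps
    return max(0.0, (budget_ms - upscaler_ms) / budget_ms)

for name, cost_ms in [("FSR 2 on shader cores (assumed)", 3.0),
                      ("DLSS on tensor cores (assumed)", 1.0)]:
    frac = required_pixel_fraction(30, cost_ms)
    print(f"{name}: render at {frac:.0%} of the old pixel count")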

Personally, if Nintendo is launching a DLSS-capable device soon, I don't see them giving much attention to FSR 2, given the current hardware and given that DLSS is a superior solution.

Even with regard to third parties, as FSR 2.0 and DLSS 2+ require roughly the same input data, I see most games that get a temporal upscaling solution getting DLSS on Switch. If you are porting a game from PS5 or Xbox etc. and it already has FSR 2.0, it makes sense they would swap it out for DLSS on Switch to get the best performance possible, given I imagine both Nintendo and Nvidia will be pushing it; I wouldn't be surprised if the dev tools get some functionality to help swap them out in the future.

There are politics that will get in the way. I'll be curious to see what the upcoming next-gen patches for the RE2 and RE3 remakes on PC look like. Capcom has traditionally favoured Sony for RE titles, at least in recent years, so I wouldn't be surprised to see only FSR 2 supported and no RTX optimisations for Nvidia. If we do see both FSR 2 and DLSS implemented, plus RTX optimisations, I think it will be a hint towards Switch 2 ports.
 
The power profile, battery tech and laws of physics are still the same.
Their point is valid. Most of the places you see ARM chipsets are mobile phones where big cores are used in burst and not for sustained workloads.

When designing a chip specifically for a game console there is no reason to use little cores and the use case of the CPU is completely different.

The Switch will need to be able to use the CPU at a constant level of sustained performance, so the really high boost clocks you see on mobile phones aren't going to cut it. We will probably see much lower clock speeds.

As the power requirement and heat generated by a CPU core do not scale linearly with clock speed, the power saving from clocking lower will be significant. As such, I'd argue it's more likely they will have more than four big cores, as it's more efficient to have more cores clocked lower to meet a given performance target than fewer clocked higher.
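A minimal sketch of why that's true, under the textbook assumption that dynamic power goes as P ∝ f·V² and that voltage scales roughly linearly with frequency in the DVFS range (so per-core power ∝ f³; leakage and voltage floors are ignored):

Code:
# Compare aggregate throughput (cores x clock, a crude proxy) against
# dynamic power (cores x f^3, from P ~ f*V^2 with V ~ f). Illustrative only.
def profile(cores: int, ghz: float) -> tuple[float, float]:
    return cores * ghz, cores * ghz**3

for cores, ghz in [(4, 2.0), (8, 1.4)]:
    perf, power = profile(cores, ghz)
    print(f"{cores} x {ghz} GHz -> perf {perf:.1f}, relative power {power:.1f}")
# Under this model, 8 x 1.4 GHz gives ~40% more aggregate throughput
# than 4 x 2.0 GHz for roughly a third less dynamic power.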

I mean we also have the fact Orin largely comes in 6-12 core variants.
 
The power profile, battery tech and laws of physics are still the same.
And there are still ways to mitigate those: lower clocks, more advanced nodes like TSMC's 4nm, etc. Point is, what a phone is doing doesn't have much bearing on what a gaming device is doing, because their purposes are quite different.
 
And there are still ways to mitigate those: lower clocks, more advanced nodes like TSMC's 4nm, etc. Point is, what a phone is doing doesn't have much bearing on what a gaming device is doing, because their purposes are quite different.
I said it before, but I would take 4 cores at 2GHz over 8 cores at 1GHz. To be clear, 4 cores at 2GHz wouldn't be a compromise. For comparison, the Steam Deck is 4x3.5GHz; 4x2GHz A78AE wouldn't be far behind, they would be comparable in power.

I believe the core count has become another arbitrary metric for many, without much regard for whether that is better or not. I expect any device Nintendo can realistically make will be slower than Apple's A15 CPU, and that has 2 performance cores. The industry-leading, desktop-tier M1 has four performance cores.
 
The power profile, battery tech and laws of physics are still the same.
I'd rather have eight cores @ 1.4GHz than four cores @ 2GHz, and those two profiles use roughly similar levels of power consumption.

Does anyone know if the A78C supports clocking cores independently? If so, I can imagine Nintendo having one or two cores clocked higher than the rest for games that rely heavily on single-threaded performance, maybe 2GHz or even higher, then having the core reserved for the OS running at 1GHz.
 
(quoting the eight cores @ 1.4GHz post above)
Price and size are also major factors here.

A bigger CPU inevitably means less space and monetary budget to spend elsewhere.
 
(quoting the eight cores @ 1.4GHz post above)
I will admit that I do not know enough to say whether 4x2GHz would be better or worse than eight cores at 1.4GHz. For my work, 4x2GHz would be significantly better, but I'm no game dev. I dare say the distinction is not nearly as clear as you make it, but I would love to hear the opinion of actual game devs on this.
 
Not many mobile/handheld devices ship with more than 4 (full-sized) cores, so I wouldn't find it odd. I think that 4x2GHz is the most likely configuration.
The other mobile devices (like the Steam Deck) that do have 4 cores have insanely high clocks, in the 3-3.5GHz range. It's worth noting that the upcoming AyaNeo 2, aka Steam Deck killer, will have 8 cores and go from 2.7GHz up to 4.75GHz (boost).

If we are talking about mobile phones, a lot of Android-based phones actually do have 8 cores and run in the 2GHz range, with a few of the 8 cores running at 3GHz. But as someone else stated, a lot of that can be for bursts.

There might not even be an Orin model that has only 4 cores; they seem to be in the 6-12 range. So unless Drake's CPU clocks are ridiculously high, I expect more than 4 cores. I don't think we will get a repeat of the TX1. The TX1 does have A53 cores to bring the total to 8, but they're deactivated in the Switch because the A53s can't be used at the same time as the A57s. Not to mention, with 4 cores the Switch successor would be even further behind current gen in CPU power than the Switch was behind last gen.

The 6-8 cores could be at 2GHz, or maybe lower at 1.5 or something. If we do get a 5nm Drake, there's no reason why it can't run at 2GHz per core, unless Nintendo really wanted to save on power draw.
 
I would caution against using what Aya is doing as an example of anything. Those niche Chinese OEMs have no regard for battery life, battery longevity, or heat dissipation. I'd say even the Steam Deck is pushing it.
 
I will admit that I do not know enough to say if 4x2Ghz would be better or worse than eight at 1.4Ghz. For my work, 4x2Ghz would be significantly better, but I'm no game dev. But I dare to say that the distinction is not nearly as clear as you say, but I would love to know the opinion of actual game devs on this.
If we are talking about the same performance per GHz, 8 x 1.4 = 11.2 vs. 4 x 2 = 8. So it's a no-brainer.

I would caution against using what Aya's is doing as an example of anything. Those niche Chinese OEM have no regard over battery life, battery longevity or heat dissipation. I say that the Steam Deck is pushing it.

What's there to caution about?

There's no reason for Nintendo to stick to 4 cores, because mobile phones have been using 8 cores in the 2GHz range for at least 5 years.

The AyaNeo and Steam Deck are essentially outliers, but mainly because they are handheld PC consoles and use Intel or AMD to get to 3GHz. Whether the AyaNeo 2 goes up to 4GHz is irrelevant. The AyaNeo 2 is actually using 8 cores instead of the 4 the Steam Deck is using, while still being in the 3GHz range. It's most likely overkill, but it's basically a handheld XSS.


It's really the TX1 that is outdated. The same year the Switch was released, we were getting the Pixel 2 using 8-core CPUs.
And four 2GHz cores would put the successor even further behind current gen than the Switch was behind last gen. 4 is not out of the question, but I think it's less likely than, say, 6 or 8 cores. I do think Nintendo wants to close the gap, if not match last gen as well. But sticking with Orin should likely keep us in the 6-8 core range. We will see. I'm not saying Nintendo will be using 2GHz for each of its 6-8 cores, but it's not improbable if the Switch 2 is on TSMC 5nm vs. Samsung 8nm.
 
It's really the TX1 that is outdated. 4 2GHz cores would put the successor even further behind current gen than the Switch was behind last gen.
It's not out of the question, but I think it's less likely than, say, 6 or 8 cores.
8x1GHz A78 would be nowhere close to the 8x3.6GHz Zen 2 CPUs of the current gen, and 8x2GHz A78 is plain unrealistic for a portable device. Moreover, 4x2GHz is no slouch and would put the Switch in the top tier of mobile CPU performance, not much slower than the Steam Deck and more powerful than almost all non-Apple mobile devices. Also, I do not believe that 4x2GHz would fare worse against the 8x3.6GHz Zen 2 than the 4x1GHz A57 did against the 8x1.6GHz Jaguars.

I would not be surprised if we get a 4x1.5GHz A78 CPU, and it would be a much stronger CPU for its time than what the Switch launched with. The difference is that the PS4/XB1 launched with crap CPUs for their time, so a 4x1GHz A57 could still get some ports, while the PS5 & XBSX are the most powerful consoles for their time that we have ever gotten.
 
8x1Ghz A72 would be nowhere closer to the 8x3.6Ghz Zen 2 CPU of current gen, and the 8x2Ghz A72 is plain unrealistic for a portable device. Moreover, 4x2Ghz is not slouch and would put the Switch on the top tier CPU mobile performance, being not much slower than the Steam deck and more powerful than almost all non-Apple mobile devices. Also, I do not believe that 4x2Ghz would fare worse against 8x3.6Ghz Zen 2 than the 4x1Ghz A57 CPU did against the 8x1.6Ghz Jaguars.

I would not be surprised if we get 4x1.5Ghz A72 CPUs, and it would be a much more stronger CPU for its time than what the Switch launched with. The difference is that the PS4/XB1 launched with crap CPUs for their time, so 4x1Ghz A57 could still get some ports, while the PS5&XBSX are the most powerful consoles for their time we have ever gotten.
I believe what you actually mean is A78 :p
 
I think the chances of 4 cores are astronomically low. We have the specs of the Orin NX: the 16GB model has a 1024-core Ampere GPU, an 8-core A78 at 2GHz, and a whole bunch of DLA and camera-module hardware the Switch would not need, and it operates at 10-25W.

That's on Samsung 8nm.

We know from the Nvidia leak that Nintendo has opted for a GPU that's 50% larger than the Orin NX's, so for me Samsung 8nm is unlikely and 5nm is on the table. If they are making the GPU 50% larger than the Orin NX's, I don't see them making the CPU half as big. It doesn't make sense, as it would introduce a performance bottleneck whilst simultaneously increasing cost by increasing the GPU size.
 
To muddy the core question considerably: for game purposes it won't be 4x anything, it will be 3x (or 7x), as one core will almost certainly be dedicated to the OS.

If so, I can imagine Nintendo having one or two cores clocked higher than the rest for games that rely heavily on single-thread performance, maybe 2Ghz or even higher, then having the core reserved for the OS running at 1Ghz.
One of the big wins of the Switch was its flat architecture. I suspect they won't want to make it very difficult to optimize for Drake.
 
8x1Ghz A72 would be nowhere closer to the 8x3.6Ghz Zen 2 CPU of current gen, and the 8x2Ghz A72 is plain unrealistic for a portable device. Moreover, 4x2Ghz is not slouch and would put the Switch on the top tier CPU mobile performance, being not much slower than the Steam deck and more powerful than almost all non-Apple mobile devices. Also, I do not believe that 4x2Ghz would fare worse against 8x3.6Ghz Zen 2 than the 4x1Ghz A57 CPU did against the 8x1.6Ghz Jaguars.

I would not be surprised if we get 4x1.5Ghz A72 CPUs, and it would be a much more stronger CPU for its time than what the Switch launched with. The difference is that the PS4/XB1 launched with crap CPUs for their time, so 4x1Ghz A57 could still get some ports, while the PS5&XBSX are the most powerful consoles for their time we have ever gotten.
It would certainly be an interesting move to use A72s in a product releasing in or after 2022, haha. I'm guessing that's a mistake?

My understanding is that the A78 is similar in performance per GHz to Zen 2, just as the A57 was similar to Jaguar. So that makes the comparison math easier: the Switch with TX1 is roughly one third the CPU strength of a PS4, while 4 A78 cores @ 2GHz would be a few points over a quarter the CPU strength of a PS5. So the gap would be similar, but grow slightly rather than shrink. Obvious caveats with this sort of simple comparison exist, but it's enough to give us an overview.

One thing I've forgotten to mention is that we can expect Nintendo to continue reserving one core for the OS. With four cores that would be a quarter of the CPU; with eight cores it would be one eighth. So the difference in power available to games is slightly wider than one might initially think.
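A minimal sketch of that comparison, using cores x clock as a crude proxy for CPU strength, treating A57~Jaguar and A78~Zen 2 as equal per-GHz (the simplification above), and reserving one core for the OS on the Nintendo side, which is why the results land below the raw one-third / one-quarter figures. The 2GHz Drake clocks are the hypothetical values being discussed, not known specs:

Code:
# Crude proxy: CPU strength = game-available cores x clock (GHz),
# assuming equal per-GHz performance within each comparison pair.
def game_cpu(cores: int, ghz: float, reserved: int = 0) -> float:
    return (cores - reserved) * ghz

switch = game_cpu(4, 1.02, reserved=1)  # TX1: 3 cores for games
ps4    = game_cpu(8, 1.6)               # Jaguar
drake4 = game_cpu(4, 2.0, reserved=1)   # hypothetical quad-core Drake
drake8 = game_cpu(8, 2.0, reserved=1)   # hypothetical octa-core Drake
ps5    = game_cpu(8, 3.5)               # Zen 2

print(f"Switch vs PS4:   {switch / ps4:.0%}")
print(f"Drake 4c vs PS5: {drake4 / ps5:.0%}")
print(f"Drake 8c vs PS5: {drake8 / ps5:.0%}")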

To half-answer my own question from earlier, there are a number of mobile SoCs that clock one core higher than the rest, though I haven't been able to figure out whether there's a downside to it (for example, can they still share cache with the other cores?). So if anyone believes that 4 cores at 2GHz is possible, I think it should be made clear that for the same battery life you could get six cores at 1.4GHz, plus one core at 1GHz and one core at 2GHz, for almost the same total power budget; I think it'd be six percent higher or something minor like that. The only downsides are die size and code that benefits from just a handful of heavy threads.

If a game is engineered to run several threads really fast, then 4@2GHz might be preferable. But if it relies on just one thread going fast, or if its processes are modular enough to run in parallel (which seems to be growing more and more common, but I admit I'm on the outside looking in), then the system I laid out above would be more effective.

As for die size: According to this post from last November
So I was bored and was playing around with a new app I got on my iPad...

[annotated die shot: qOZyxEr.jpg]
Basically I just scaled the die shot to come out to ~460mm2, then got the rest from there: about ~4mm2 for the A78AEs (so of course ~48mm2 for the x12 cluster of them). Of course it's all an inexact science based off a JPEG; depending on how it was cropped and how they're counting the size, all those measurements could be off.
A78AE cores are around 4mm2 on Samsung 8N. I don't know how accurate that is, but I think it's safe to say it's in the ballpark. So at worst, 8 cores would add 16mm2 over 4 cores. But I've read that the A78AE is larger than the A78C, and 8N is the worst-case node for Drake, so I would not be surprised if the actual difference between four and eight cores ends up being ten square millimeters or under, maybe as little as 6mm2 added to the die. Which isn't nothing, but it isn't much either. If Nintendo okayed 12 SMs, I can't imagine this would be too much for them.

EDIT: @oldpuck "One of the big wins of the Switch was it’s flat architecture. I suspect they won’t want to make it very difficult to optimize for Drake"

That's a good point I hadn't considered. I can imagine the whole industry learned from the PS3 that power is useless if there's too much complexity in the way of using it. Granted this wouldn't be anywhere near as complex as that, but I think simplicity winning out is far more likely than my post implies.
 
To muddy the core question considerably: for game purposes it won't be 4x anything, it will be 3x (or 7x), as one core will almost certainly be dedicated to the OS.

One of the big wins of the Switch was its flat architecture. I suspect they won't want to make it very difficult to optimize for Drake.
If the OS core is clocked lower, it won't affect devs at all, as they don't even have access to that core.
 
Theoretically speaking, yes, since the Cortex-A78C supports DynamIQ. I don't know about practically speaking.
That article doesn't explicitly mention one little detail that Anandtech's version does on this topic:
"Within a cluster CPUs are divided into voltage/frequency domains, and within a domain each core is inside its own power domain. This allows each CPU to be individually powered down, although all CPUs in the same domain must operate at the same frequency, which is no different from bL; however, with DynamIQ each cluster can support up to 8 voltage/frequency domains, providing greater flexibility than bL’s single voltage/frequency domain per cluster. So, what does this mean? It means that, in theory, an SoC vendor could place each CPU into its own voltage domain so that voltage/frequency could be set independently for each of the 8 CPUs in the cluster. Each voltage/frequency domain requires its own voltage regulator, which adds cost and complexity, so we’ll most likely continue to see 2-4 CPUs per domain."

So given the need for separate voltage regulators, it's a question of foresight at the time of designing the chip.
It's highly likely that the core reserved for the OS gets its own domain. Beyond that, I think it's likely for 3 cores to get a domain, to match the base Switch. And then the simplest/cheapest option would be to lump all the remaining cores (be it 2 or 4 or whatever) into another domain.
Not gonna lie though, my wishful thinking is that it'd be neat if out of 8 cores, we get a 1/3/2/2 arrangement, so that devs get the option of utilizing 3/5/7 cores.

Edit: Wait, I'm being silly. A 1/1/2/4 arrangement allows for everything from 1 to 7 cores using the same number of domains.
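That's easy to check by brute force. A minimal sketch, assuming a domain is either wholly active or wholly powered down, with one single-core domain always reserved for the OS:

Code:
from itertools import combinations

# Game-core counts reachable from a domain layout: any subset of the
# non-OS domains can be active, so the options are all subset sums.
def reachable(game_domains: list[int]) -> set[int]:
    sums = set()
    for r in range(len(game_domains) + 1):
        for combo in combinations(game_domains, r):
            sums.add(sum(combo))
    return sums - {0}

print(reachable([3, 2, 2]))  # 1/3/2/2 layout -> {2, 3, 4, 5, 7}
print(reachable([1, 2, 4]))  # 1/1/2/4 layout -> {1, 2, 3, 4, 5, 6, 7}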
 
So given the need for separate voltage regulators, it's a question of foresight at the time of designing the chip.
It's highly likely that the core reserved for the OS gets its own domain. Beyond that, I think it's likely for 3 cores to get a domain, to match the base Switch.
I know we're all tired of the "successor or revision" discussion but this seems like a place where it matters.

Running a 3-core domain and isolating "classic" Switch games to it is exactly the sort of BC solution the Wii and Wii U offered, but those systems were not being sold on the ability to improve your existing library. It seems like an odd decision to sell a revision that has 4 whole extra, higher-clocked cores but only allows performance improvements for exclusives and patched games. That seems like a sizable jump in hardware cost for a very low amount of benefit.

I've been running on the assumption (and it is just an assumption) that the design is built around improving existing dynamic res/framerate games and future-proofing Nintendo's architecture. For that I would expect ~4x1.7GHz, a configuration that the existing Switch already gets into briefly for "boost mode" and (AIUI) regularly found by the mod community to be stable and not game-breaking for a wide variety of games.

While it wouldn't surprise me to find that games could request more clock if built for Drake, it would surprise me if there is a huge chunk of silicon in the device that almost never gets used.

Perhaps I am in the minority here, but I keep looking at Drake's potential and think "what is the version of this hardware that will make existing Switch games sing, and then allow some cool patches/exclusives after that".
 
(quoting the "successor or revision" post above)
I think all games will require patching.
 
I would caution against using what Aya is doing as an example of anything. Those niche Chinese OEMs have no regard for battery life, battery longevity, or heat dissipation. I'd say even the Steam Deck is pushing it.

My point still stands. There's no reason for Nintendo to stick to 4 cores. I'm just saying that the AyaNeo and Steam Deck are essentially handheld PC consoles and use Intel or AMD to get to 3GHz.



On the other hand, mobile phones and other ARM devices usually only go up to 2GHz, and many have been using 8-core CPUs for 5-6 years. It's the TX1 that is outdated.



4 2GHz A78s would put Nintendo even further behind current gen than the 4 A57s were behind last gen as well.

8x1Ghz A78 would be nowhere closer to the 8x3.6Ghz Zen 2 CPU of current gen, and the 8x2Ghz A72 is plain unrealistic for a portable device. Moreover, 4x2Ghz is not slouch and would put the Switch on the top tier CPU mobile performance, being not much slower than the Steam deck and more powerful than almost all non-Apple mobile devices. Also, I do not believe that 4x2Ghz would fare worse against 8x3.6Ghz Zen 2 than the 4x1Ghz A57 CPU did against the 8x1.6Ghz Jaguars.

I would not be surprised if we get 4x1.5Ghz A78 CPUs, and it would be a much more stronger CPU for its time than what the Switch launched with. The difference is that the PS4/XB1 launched with crap CPUs for their time, so 4x1Ghz A57 could still get some ports, while the PS5&XBSX are the most powerful consoles for their time we have ever gotten.
I'm not expecting 1GHz, more like 1.5. The TX1 on 20nm was not very efficient and had a lot of leakage. Though 1GHz on A78s would match the last-gen gap of Switch vs. PS4 (Switch 2 vs. XSS in this case).
Also, I said nothing about A72s, so I am assuming that's a typo?

2GHz A78s could be doable on TSMC 5nm with a power draw of 25W or less when combined with 12 SMs at 928MHz or whatever, given that the power-draw profiles we have for Orin reflect Samsung 8nm, and TSMC 5nm is way more efficient.

Also, A78s are much more efficient than A57s per clock. I don't remember the power draw, but it's lower per GHz per core.

But again, I'm not expecting 2GHz, just saying it's doable if we get the best-case scenario with TSMC 5nm. And Nintendo doesn't have to keep the same 15-watt max draw of the OG Switch; they could go higher.
 
(quoting the "successor or revision" post above)

Fair point; I didn't think that far.

Assuming we're going for as much of a 'hands free/automatic for the devs' boost to classic Switch games thanks to Nvidia/Nintendo's work on backwards compatibility, can the OS on its own recognize the number of threads, and then automatically utilize the appropriate number of cores?
 
Sorry but this is fucking ridiculous imo. In practice, that would mean most games won’t be bc.
From the context of that reply I assume he's referring to games that will see improvements needing patches to get that improvement.

Which is correct as far as I understand.
 
From the context of that reply I assume he's referring to games that will see improvements needing patches to get that improvement.

Which is correct as far as I understand.
I misunderstood the post then. I still believe it will have improved bc even without a patch.

MS set the gold standard for BC, with Sony not too far behind. Nintendo/Nvidia can't offer just plain BC.

That, and I doubt the A78/Ampere combo is even capable of not exceeding OG Switch performance.
 
(quoting the post above on improved BC without patches)
Oh, I'm sure games will be pushed to their peak resolution/framerate, but I'd be hesitant to expect anything more than that. Theoretically Nintendo could allow games to be forced into docked-mode settings while in portable mode, but I feel like they'd be hesitant to do that if there's even the slightest chance of breaking compatibility in some way.
 
Fair point; I didn't think that far.

Assuming we're going for as much of a 'hands free/automatic for the devs' boost to classic Switch games thanks to Nvidia/Nintendo's work on backwards compatibility, can the OS on its own recognize the number of threads, and then automatically utilize the appropriate number of cores?
That's what most general-purpose OSes do, and I assume Horizon (the Switch OS) does the same. This is one of the reasons the Switch was so nice to port to: the CPUs all behave the same, so devs don't have to do things like "this CPU runs a faster clock, but this one has a larger cache, so I need to put physics on this CPU and then optimize for cache misses..."

When you're expecting to run on 3 cores, you probably set your thread pool to a small number of threads. It can be much faster to manually have your threads jump between multiple tasks than to make a thread for each task and have the OS try to schedule them for you. So running Switch games on lots of cores is less likely to give you the perf boost you'd get from the same number of cores at a better per-core clock speed. Though that is not a universal truth, and once you're talking about both clock AND core increases, the trade-offs become less obvious.

Also, I'm not actually sure how devs request performance profiles from the hardware. @davec00ke might be right that accessing better clock speeds might always require a patch, though potentially a trivial one.
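For anyone unfamiliar, here's a minimal sketch of the fixed-pool pattern described above: a small, fixed number of worker threads pulling tasks off a shared queue, rather than one OS thread per task. Purely illustrative; not how any particular Switch game is actually written:

Code:
import queue, threading

# A pool sized for the cores you expect (e.g. 3 game cores on Switch):
# workers "jump between tasks" by draining a shared queue, instead of
# the OS juggling one thread per task.
def run_tasks(tasks, workers=3):
    q = queue.Queue()
    for t in tasks:
        q.put(t)

    def worker():
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return
            task()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for th in threads: th.start()
    for th in threads: th.join()

run_tasks([lambda i=i: print(f"task {i} done") for i in range(8)])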
 
(quoting the question above about the OS recognizing thread counts)
A game would only recognize the number of threads it was coded for, afaik.

So BoTW would only see 3 threads/cores

Not 5 or 7.

Xbox games (and presumably PS5 too) also disable their newer features and set a fixed clock for BC titles until they get patched to make use of the newer features; 3.8GHz in the Series X's case.


I wonder if Nintendo will give devs options for 3, 5 and 7 cores though, with their own set speeds.

Would be interesting to see what devs opt for…
 
Is FSR 2.0 coming to Steam Deck? The standard one does a decent job of hiding some jaggies, but it's not great. If Nintendo manages to get DLSS working in handheld mode, the IQ would be insane!
 