
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

Yes, more cores at lower clocks would be better (if necessary), because we've heard from a few developers working on "impossible" Switch ports that core parity with the other systems would have made porting much less of a challenge.
Of course, we would all much prefer that Nvidia and Nintendo just shoot for the better node and achieve a best-of-both-worlds scenario...
I think the main reason they said it that way is that core counts are much easier to explain to laymen than the intricacies of CPU design.

The main issue for those ports was the total available CPU performance.
 
Those PC Ryzens are probably not 100% equivalent to their console counterparts, though. Pretty sure the consoles have less cache, among other things.
Yes, less cache, and they use GDDR memory, which has pretty high latency; CPUs are more sensitive to latency, so their performance ends up a bit worse.

It's just that the 6-core Ryzen is the closest you'll get to the PS5's console CPU.

Speaking of Alex's vid on Gotham Knights and the PS5 using 6.5 cores, I'm still a bit flabbergasted at a whole Zen 2 core and a half being reserved. What's going on in the background that seems to need so much more than, say, a Windows install?
The PS5 does use a very heavy OS, and I remember that early in the PS4 generation games used 6 cores, and later on a 7th core was unlocked, but only partly. I suspect that as time goes on the 7th core will be made more available, but at the moment games currently in development are working within the 6.5-core limitation.

Remember that the PS5 is also a VR device.
According to Alex from DF, the PS5 uses 6.5 CPU cores for gaming, based on their analysis of Gotham Knights. I wonder if this is something Sony will free up down the line (but I'm guessing not, since these newer consoles have a lot of system-wide functions running in the background).
So if Drake can dedicate a full 7 cores to gaming, that would put the device comfortably over the PS4 and Xbox One consoles, which are probably Nintendo's internal performance targets to beat.
Having the full 7th core just helps it a bit, in this case.
It's almost January; I hope we get some juicy news and/or leaks in the coming month.

 
The first Surface Book. The PS3?

None of those were probably customized to the same level as Drake, though.
The original Xbox also had something from Nvidia; I can't remember if that was custom or not.

Also funny how Xbox/PlayStation each only tried Nvidia once and never again.
 
The original Xbox also had something from Nvidia; I can't remember if that was custom or not.

Also funny how Xbox/PlayStation each only tried Nvidia once and never again.
The advantages of getting everything from one company are just too big to ignore in a console.

Nvidia doesn’t offer a product like that for x86. AMD does.
 
The advantages of getting everything from one company are just too big to ignore in a console.

Nvidia doesn’t offer a product like that for x86. AMD does.
Nvidia seems to just offer an existing product and tweak it slightly. No wonder everyone steered clear of them.
 
Maybe they'll do something crazy and put Grace Hopper into a desktop chassis.
As the Nintendo Switch family of systems marches towards the future, I fully expect it to expand beyond hybrids and handhelds. I don't see them dropping the Nintendo Switch branding, perhaps ever; I could even see their first dedicated AR/VR devices being Switch devices that replace handheld mode with VR mode.

With all that said, while I think it's utterly unrealistic RIGHT NOW, the possibility of an Nvidia-powered Nintendo home console that plays Switch games with absurd graphics isn't impossible in the future. A Nintendo Switch Cube: a GameCube-sized console for playing Nintendo games at 8K, 16K, whatever it is.


That is the only scenario where I see Grace Hopper entering the consumer video game console business. I doubt Nintendo wants them working with anyone else, and I doubt Nvidia wants to risk their partnership with Nintendo to do that, even if they aren't contractually obliged to refuse. Plus, the other big players in the industry are on AMD, and moving to Nvidia would be a huge development hurdle. Nvidia would have to push hard and offer huge savings and support to get anyone on board, which they have no incentive to do when Nintendo is so successful for them.
 
As the Nintendo Switch family of systems marches towards the future, I fully expect it to expand beyond hybrids and handhelds. I don't see them dropping the Nintendo Switch branding, perhaps ever; I could even see their first dedicated AR/VR devices being Switch devices that replace handheld mode with VR mode.

With all that said, while I think it's utterly unrealistic RIGHT NOW, the possibility of an Nvidia-powered Nintendo home console that plays Switch games with absurd graphics isn't impossible in the future. A Nintendo Switch Cube: a GameCube-sized console for playing Nintendo games at 8K, 16K, whatever it is.


That is the only scenario where I see Grace Hopper entering the consumer video game console business. I doubt Nintendo wants them working with anyone else, and I doubt Nvidia wants to risk their partnership with Nintendo to do that, even if they aren't contractually obliged to refuse. Plus, the other big players in the industry are on AMD, and moving to Nvidia would be a huge development hurdle. Nvidia would have to push hard and offer huge savings and support to get anyone on board, which they have no incentive to do when Nintendo is so successful for them.
just so you know, Grace Hopper is an absurd chip designed for HPC. I don't think there's a way to scale it to fit in a tablet
 
It uses ARM v9 so that makes it irrelevant.
It's still fully compatible with ARMv8 (at least for 64 bit code, I believe the Neoverse core they're using has dropped 32 bit compatibility).

The one place where Grace Hopper (or something like it) could come into play is if Nintendo starts a streaming service. Hopper wouldn't work here, as it's missing some of the graphics-specific hardware, but Grace+AD102 could probably virtualise 15-20 Switch games running at Drake-level performance.
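Rough sanity check on that 15-20 figure, assuming the full AD102 die (144 SMs) at roughly a 2.5 GHz boost against Drake's 12 SMs at roughly 1.1 GHz docked; both clocks are assumptions layered on the leak, not confirmed numbers:

# How many Drake-sized GPU slices fit in one AD102, by raw FP32 throughput.
ad102_sms, ad102_clock_ghz = 144, 2.5       # full die, approximate boost clock
drake_sms, drake_clock_ghz = 12, 1.1        # assumed docked clock

raw_ratio = (ad102_sms * ad102_clock_ghz) / (drake_sms * drake_clock_ghz)
print(f"~{raw_ratio:.0f} Drake-equivalents of raw FP32")    # ~27
# Lose a chunk to virtualisation overhead, scheduling, and bandwidth sharing
# and the 15-20 instance estimate above looks about right.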
 
Depends on the positioning of Drake.
Guess it depends on the ratings group. The ESRB separates exclusive platforms, for example.
 
It's still fully compatible with ARMv8 (at least for 64 bit code, I believe the Neoverse core they're using has dropped 32 bit compatibility).

The one place where Grace Hopper (or something like it) could come into play is if Nintendo starts a streaming service. Hopper wouldn't work here, as it's missing some of the graphics-specific hardware, but Grace+AD102 could probably virtualise 15-20 Switch games running at Drake-level performance.
I meant with respect to scalability: ARMv9 seems to aim to be more of a scalable architecture that can go both ways, something that was difficult with ARMv8, which scaled up to the data center but couldn't scale down.

ARMv9 aims to be scalable in both directions, or at least to do it better and more easily than the ARMv8 CPUs did. Maybe not perfectly, but better. Though there shouldn't be much of a difference between the data-center cores and, say, a mobile SoC core…
 
What? No, this is wrong.


And here is that CPU compared to Orin at comparable clocks.

https://browser.geekbench.com/v5/cpu/compare/18647313?baseline=9535279

Orin core perf (only roughly comparable, since the cluster sizes are not the same) is 2.6x Jaguar. The CPU in the Pro consoles was clocked at 2.3 GHz. Roughly comparable. Unless I am doing something deeply stupid?


Where are you getting 20% at 1GHz? Did you mean at 720MHz?
Neither, I was bouncing back and forth between comparing the Series S and the PS4 and swapped numbers. :oops:

PS4 is 18 CUs @ 811 MHz. Drake is 12 SMs @ ??? MHz. Raster perf scales linearly in both arches with clock and CU/SM count, and Ampere is 1.8x as efficient as GCN4, controlling for clock and CU/SM count. At the same clocks, Drake is 20% more powerful than the PS4 in raster perf.
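A rough back-of-the-envelope of that scaling argument in Python; the 1.8x efficiency factor and the linear-scaling assumption are taken straight from the post above, everything else is just arithmetic:

# Treat raster perf as CU/SM count * clock * per-unit efficiency (assumption from the post).
ps4_cus, ps4_clock_ghz = 18, 0.811
drake_sms = 12
ampere_vs_gcn = 1.8   # assumed per-unit efficiency advantage of Ampere over GCN4

def drake_vs_ps4(drake_clock_ghz):
    """Drake raster perf relative to PS4 under the linear-scaling assumption."""
    ps4 = ps4_cus * ps4_clock_ghz
    drake = drake_sms * drake_clock_ghz * ampere_vs_gcn
    return drake / ps4

print(drake_vs_ps4(0.811))   # same clock as PS4 -> ~1.20, i.e. the ~20% figure
print(drake_vs_ps4(1.0))     # ~1.48 at 1 GHz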

Did you account for Infinity Cache offering a 25% performance uplift, as per AMD? The thing the consoles lack.
The short version is "I considered Infinity Cache in my analysis, and that is one of the reasons the 135% number is a floor." :)

The longer version is I have reason to believe that Series X and PS5 are not memory bandwidth constrained. RDNA2 GPUs have unusually low memory bandwidth, and Infinity Cache is their solution. Ampere GPUs just have a big fat bus.

Series X and PS5... also just have a big fat bus. PS5 has a 256-bit bus, with 448 GB/s peak bandwidth. The RX 6700 has the same clocks and number of CUs, but a 160-bit bus, with a 320 GB/s peak. Additionally, while PS5 and Series X don't have Infinity Cache, they use a shared L3 cache between the CPU and the GPU. Where a desktop card would need to decompress a texture to RAM, copy it to VRAM (over the bus), then read from VRAM to render, the consoles can decompress, skip the copy (because it's already in VRAM), and often skip the read because the CPU has primed the L3 cache for the GPU.

@Look over there has determined that Drake is likely a "well fed" system from a memory bandwidth perspective, i.e. that the bandwidth available tracks with similarly sized Ampere GPUs. Looking at PS5/Series X by comparison to their GPU counterparts, they are similarly well fed.

Ampere GPUs have fairly consistent bandwidth/SM ratios. RDNA2 is consistent at the high end and consistent at the low end, but the two bands don't match at all. So I decided to use a card in the higher RDNA2 band and compare it to a comparably specced Ampere card. The RX 6900 XT is in that upper band, and the 3080 Ti has the same number of compute units, and its clock/memory bandwidth/VRAM ratios are in line with the rest of the 3000 series and the general ratio between Ampere and RDNA2.
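For reference, a quick spec-sheet check on those two cards (public numbers; the boost clocks are approximate):

# Same compute-unit count, very different clocks and bus setups.
cards = {
    "RX 6900 XT (RDNA2)":   {"units": 80, "boost_ghz": 2.25, "bw_gbs": 512.0},
    "RTX 3080 Ti (Ampere)": {"units": 80, "boost_ghz": 1.67, "bw_gbs": 912.0},
}
for name, c in cards.items():
    print(f"{name}: {c['bw_gbs'] / c['units']:.1f} GB/s per SM/CU at {c['boost_ghz']} GHz")
# RDNA2 leans on clocks plus Infinity Cache; Ampere leans on a fat bus.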

Their performance ratios are very close and very consistent, but RDNA2 is clocked significantly faster. This is not surprising: Ampere SMs pack more "cores" than RDNA2 CUs, so the extra clock speed helps the CUs turn over faster. Internal comparisons show that performance does tend to scale linearly with the number of compute units in both designs, unsurprisingly, as well as with clocks, though there are ceiling effects where high clocks don't help much.

TL;DR: at the scales we are talking about, we really can trade clock speed for compute units, and vice versa. Ampere's arch is generally better optimized for pure performance, RDNA2 for a cost/perf ratio that really suits a mid-to-high-range card but absolutely falls apart at the bottom and stops being cost effective at the top. A 135% perf improvement, measured either as clock speed or as CU/SM count, is a data-based floor for Drake's performance relative to the other current-gen consoles.
 
Addendum: Series S is a mess. I see what Microsoft was trying to do, but the memory situation over there is bad. While the Series S GPU is going to be superior in raster perf to Drake's, the Series S is not a well-balanced console on this front. When I say "Drake can't meet the Series S," I'm speaking narrowly about the Series S GPU's power. The Series S as a console is a whole 'nuther thing.

Series S has a memory-bandwidth-to-CU ratio that is lower than any GPU AMD ships, unlike its bigger brother, which has bandwidth on the high end of what AMD ships. It's also heavily memory constrained: the memory cut looks reasonable on paper, but very similar OS overhead eats into that number. On the other hand, its CPU is basically the same as the Series X's (again, probably for porting simplicity). This creates situations where the Xbox One X and Series S have similar framerates, for opposite reasons: the One X is CPU constrained and needs a half frame rate to compute AI/collision logic/etc., while the Series S CPU can churn through all that in 60fps time, and the GPU could keep up, but memory thrashing slows the whole thing down.

Drake is outperformed by the Series S on both the GPU and CPU fronts, but the extremely limited RT power of the Series S and its really unfortunate memory situation mean that it's tricky for games to take full advantage of what the Series S can do. But CPU-constrained and GPU-constrained games on Series S won't have an easier time of it on Drake.
 
Speaking of Alex's vid on Gotham Knights and the PS5 using 6.5 cores, I'm still a bit flabbergasted at a whole Zen 2 core and a half being reserved. What's going on in the background that seems to need so much more than, say, a Windows install?
It's not fatter than a Windows install. It can actually leave the reserved threads unusually idle.

They have a normal, non-real-time scheduler in their kernel, and then use reservations and processor affinity to make quasi-real-time guarantees for the OS. This leaves the game in its own sandbox. Devs get a desktop-ish scheduler that gives high throughput by default, and while it's entirely possible for thread A to starve out thread B on the same core, the devs control both thread A and thread B, so it's their problem and under their control.

But if you want all your PS5 system services to be instantly responsive in game, you have to overprovision your OS reservation or be immaculate in designing your shell. Especially with Sony wanting to ship new features over the lifetime of the device, immaculate probably wasn't an option.

IIRC (please someone correct me if I'm wrong), it uses a strategy where a single core is reserved and the UI lives there, but low-priority OS tasks can get scheduled on the "game" cores.
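For anyone curious what "reservations and processor affinity" look like in practice, here's a minimal Linux-flavoured sketch in Python (os.sched_setaffinity is a real Linux call; the core split itself is purely illustrative, not how Sony's kernel actually partitions things):

import os

total_cores = os.cpu_count() or 8
reserved_for_os = {total_cores - 1}                 # pretend the last core is the "system" core
game_cores = set(range(total_cores)) - reserved_for_os

# Pin this process (pid 0 = self) to the game cores only; a console shell would
# pin its own services to the reserved core(s) in the same way.
os.sched_setaffinity(0, game_cores)
print("allowed cores:", sorted(os.sched_getaffinity(0)))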
 
This is the way:


I'm serious. This is the future of NPCs.

That being said, there are a couple of implementation issues with this when it comes to real-time interactive AI in video games. For one, OpenAI is extremely picky when it comes to commercial licensing, so its utility for commercial use is limited right now. More importantly, it is not that easy to implement models like these in certain development environments that impose restrictions that are in conflict with the means through which OpenAI APIs function.

Also, it would be nice to have improved GPU acceleration to better assist with GPT local tuning of hyperparameters, TTS latency reduction, etc. so I hope we will see more promising advancements from GPU manufacturers in these areas, which would benefit future console hardware development as well.
 
This is the way:


I'm serious. This is the future of NPCs.

That being said, there are a couple of implementation issues with this when it comes to real-time interactive AI in video games. For one, OpenAI is extremely picky when it comes to commercial licensing, so its utility for commercial use is limited right now. More importantly, it is not that easy to implement models like these in certain development environments that impose restrictions that are in conflict with the means through which OpenAI APIs function.

Also, it would be nice to have improved GPU acceleration to better assist with GPT local tuning of hyperparameters, TTS latency reduction, etc. so I hope we will see more promising advancements from GPU manufacturers in these areas, which would benefit future console hardware development as well.
Literally the best/worst thing Nintendo could ever possibly do is install these separate learning AI personalities into Animal Crossing villagers.

Could you imagine how devastated they would be if you just up and disappeared for two years? I don't think I could handle that level of responsibility.
 
This is the way:


I'm serious. This is the future of NPCs.
It's pretty interesting. Thinking about most games, free text interaction is rare, but a system where minor NPCs don't have to be written by hand, and even major ones can modify responses to follow the flow of conversation is a tasty concept.

That being said, there are a couple of implementation issues with this when it comes to real-time interactive AI in video games. For one, OpenAI is extremely picky when it comes to commercial licensing, so its utility for commercial use is limited right now. More importantly, it is not that easy to implement models like these in certain development environments that impose restrictions that are in conflict with the means through which OpenAI APIs function.
What sort of restrictions are you referring to? Just access to the remote OpenAI endpoints?
 
It's pretty interesting. Thinking about most games, free text interaction is rare, but a system where minor NPCs don't have to be written by hand, and even major ones can modify responses to follow the flow of conversation is a tasty concept.


What sort of restrictions are you referring to? Just access to the remote OpenAI endpoints?
Yeah, there is a game out there called AI Dungeon, which is a text adventure where an AI responds to your prompts. It works... decently. A lot of the time the flow of the game makes it feel much more like a fever dream than an actual text adventure, but advances in AI text response could really make something like that a lot more palatable.
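A toy sketch of what an LLM-driven NPC loop could look like; generate_reply() is a hypothetical stand-in for whatever local or hosted model you would actually call, and the persona/memory handling is purely illustrative:

def generate_reply(prompt: str) -> str:
    # Hypothetical stand-in for a call to a local or hosted language model.
    return "(model output would go here)"

class NPC:
    def __init__(self, name: str, persona: str):
        self.name = name
        self.persona = persona
        self.memory = []                        # rolling conversation history

    def talk(self, player_line: str) -> str:
        self.memory.append(f"Player: {player_line}")
        prompt = self.persona + "\n" + "\n".join(self.memory[-8:]) + f"\n{self.name}:"
        reply = generate_reply(prompt)
        self.memory.append(f"{self.name}: {reply}")
        return reply

villager = NPC("Tom Nook", "A cheerful raccoon shopkeeper who remembers the player.")
print(villager.talk("Sorry I disappeared for two years..."))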
 
Literally the best/worst thing Nintendo could ever possibly do is install these separate learning AI personalities into Animal Crossing villagers.

Could you imagine how devastated they would be if you just up and disappeared for two years? I don't think I could handle that level of responsibility.
Transformer-based Animal Crossing NPCs (fine-tuned to Nintendo's liking) would be incredible!
It's pretty interesting. Thinking about most games, free text interaction is rare, but a system where minor NPCs don't have to be written by hand, and even major ones can modify responses to follow the flow of conversation is a tasty concept.
Yes, and that's the primary focus of my project, which is one of the reasons it has taken so long to make significant progress outside of the development of the game world itself. It is not yet commercially viable, though it is already working phenomenally. Still, there's too much of a delay between when you audibly ask the NPCs questions and when they audibly answer.
What sort of restrictions are you referring to? Just access to the remote OpenAI endpoints?
Can't really dig into the details of that without breaking NDA, so my statement was intentionally vague, sorry.
 
As a counterpart to the split comparison of 720p games scaled up to 1080p, I made some images with 1080p originals split with 720p downscales. Of course, unless viewed on a 7" 4K screen it's not going to help settle the matter of how much detail most people can appreciate at that size.
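For anyone who wants to make their own, here's roughly how a split like that can be generated with Pillow; the file name and the bilinear filter are just placeholder choices:

from PIL import Image

# Load a 1080p capture (placeholder path), round-trip it through 720p, and paste
# the result over the right half so both versions share one canvas.
original = Image.open("capture_1080p.png")                      # assumed 1920x1080
rescaled = original.resize((1280, 720), Image.BILINEAR).resize(original.size, Image.BILINEAR)

split = original.copy()
half = original.width // 2
split.paste(rescaled.crop((half, 0, original.width, original.height)), (half, 0))
split.save("split_comparison.png")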

 
It's not fatter than a Windows install. It can actually leave the reserved threads unusually idle.

They have a normal, non-real-time scheduler in their kernel, and then use reservations and processor affinity to make quasi-real-time guarantees for the OS. This leaves the game in its own sandbox. Devs get a desktop-ish scheduler that gives high throughput by default, and while it's entirely possible for thread A to starve out thread B on the same core, the devs control both thread A and thread B, so it's their problem and under their control.

But if you want all your PS5 system services to be instantly responsive in game, you have to overprovision your OS reservation or be immaculate in designing your shell. Especially with Sony wanting to ship new features over the lifetime of the device, immaculate probably wasn't an option.

IIRC (please someone correct me if I'm wrong), it uses a strategy where a single core is reserved and the UI lives there, but low-priority OS tasks can get scheduled on the "game" cores.
If you want to get pedantic, the OS is always running on all of the cores*, it's mostly just a matter of what work is scheduled where. The OS kernel is always what runs on the bare metal, which presents an abstraction of the system that all of the game and system processes run in. Even a full Game Core™ is going to end up doing some non-zero amount of OS work, albeit probably primarily in response to calls from the game itself. The OS reserved cores are not necessarily for making the OS go (though it's definitely possible some core tasks have been shifted to them), but really more for taking care of any background stuff that needs to be able to happen without directly interfering with the game.

That said, I'm not really familiar with the specifics of what Sony has done on PS5 beyond it probably still being a BSD.

Modern Xboxes actually run like 3 operating systems at once, with two of them being in VMs. It is probable some of the resource partitioning happens at the VM level.
 
pokemon is actually my guess

I expect mvg to explain what a memory leak actually is
When a game needs to use something, that something is loaded into memory. When it's not being used anymore, it's removed from memory. A memory leak is when it doesn't get removed, and the problem repeats for other things that are requested, leaving less and less free memory.

It could be a problem for pokemon, but it probably wouldn't explain all of pokemon's problems.
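A contrived Python illustration of the pattern; real engine leaks are usually unreleased native resources or handles rather than a growing dict, but the effect is the same:

_loaded_assets = {}                     # cache that only ever grows

def load_asset(name):
    # Simulate loading something big into memory.
    _loaded_assets[name] = bytes(10_000_000)
    return _loaded_assets[name]

def unload_asset(name):
    # Forgetting to call this (or never writing it) is the "leak":
    # the asset stays referenced forever and can't be reclaimed.
    _loaded_assets.pop(name, None)

for frame in range(1000):
    load_asset(f"chunk_{frame}")        # loaded every frame...
    # unload_asset(f"chunk_{frame}")    # ...but never freed, so memory keeps climbing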
 
When a game needs to use something, that something is loaded into memory. When it's not being used anymore, it's removed from memory. A memory leak is when it doesn't get removed, and the problem repeats for other things that are requested, leaving less and less free memory.

It could be a problem for pokemon, but it probably wouldn't explain all of pokemon's problems.
I'm kind of stupid but I have a bachelor's degree in computer science, I know what a memory leak is
 
If you want to get pedantic, the OS is always running on all of the cores*,
"System" tasks is probably a more reasonable term. But userland-vs-kernel gets semantic, fast.

I'm actually less familiar with the Xbox strategy - what are the other two OSes, and does it use a hypervisor model underneath?
 
And here is that CPU compared to Orin at comparable clocks.

https://browser.geekbench.com/v5/cpu/compare/18647313?baseline=9535279

Orin core perf (only roughly comparable, since the cluster sizes are not the same) is 2.6x Jaguar. The CPU in the Pro consoles was clocked at 2.3 GHz. Roughly comparable. Unless I am doing something deeply stupid?
*2.84x :p
If normalized to the same frequency.
Neither, I was bouncing back and forth between comparing the Series S and the PS4 and swapped numbers. :oops:

PS4 is 18 CUs @ 811 MHz. Drake is 12 SMs @ ??? MHz. Raster perf scales linearly in both arches with clock and CU/SM count, and Ampere is 1.8x as efficient as GCN4, controlling for clock and CU/SM count. At the same clocks, Drake is 20% more powerful than the PS4 in raster perf.
I'm not sure that's an apt comparison; the 28-SM 3060 outperforms the 64-CU Vega 64 by 20% as per TechPowerUp, while also having less memory bandwidth and roughly the same TFLOP count.




If we were to normalize it, CU to SM, that's a 2.28:1 ratio just using that kind of metric.

Add the 20% at equal TFLOPs, and you get that an SM performs like 2.736 CUs of GCN 5.0 using rapid packed math.


Or let’s compare the RX Vega 56 to the 3060 mobile.

56 CUs vs 30 SMs, both around 10 TF (the latter is a bit higher); the 3060 mobile outperforms it by 11%.

So the performance would be 2.07 CU : 1 SM in this case. Without accounting for the TF, it's 1.866 CU : 1 SM for roughly similar performance.




So, if we were to take it strictly by the CU-to-SM method:

Using the latter example, to have the same performance, Drake would be 24.4% more performant/efficient than the PS4. If both were at the same 1.843 TF, then Drake would be 38% more performant.


If using the former example, then it's 52% more performant/efficient than the PS4 while having less. And if they are at the same TF, then 82.4% more performant/efficient than the PS4… which I think might be unrealistic?


Unless my math doesn’t check out here….


Let's use the GCN 3.0 Radeon R9 Fury, a 56-CU graphics card that has the same shader count as the 28-SM 3060; the latter outperforms it by 1.78x… but also has ~1.78x the TF, or close enough.

7.168TF vs 12.74TF


So I'm not really sure using CUs and SMs is really a good idea in this case, or Drake would be more than 20% ahead of the PS4.
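Putting those numbers in one place (Python); every input is a figure quoted above, nothing new:

ps4_cus, drake_sms = 18, 12

# Vega 64 (64 CU) vs 3060 (28 SM), the 3060 ahead by 20%:
former = 64 / 28                   # ~2.29 CU per SM, raw
former_tf = former * 1.20          # ~2.74 CU per SM at equal TFLOPs

# Vega 56 (56 CU) vs 3060 mobile (30 SM), the latter ahead by 11%:
latter = 56 / 30                   # ~1.87 CU per SM, raw
latter_tf = latter * 1.11          # ~2.07 CU per SM at equal TFLOPs

for label, ratio in [("latter, raw", latter), ("latter, equal TF", latter_tf),
                     ("former, raw", former), ("former, equal TF", former_tf)]:
    gain = drake_sms * ratio / ps4_cus - 1
    print(f"{label}: Drake ~{gain:.1%} over PS4")   # ~24%, ~38%, ~52%, ~82%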


Additionally, while PS5 and Series X don't have Infinity Cache, they use a shared L3 cache between the CPU and the GPU.
They do not. AMD APUs don't do that.

Intel, Apple Silicon, Snapdragons (via the SLC, which acts as an L4 cache to the CPU) and Nvidia SoCs do that. Mediatek does too, via the SLC (which acts as an L4 cache to the CPU).

And AMD does that by design. The L3 is exclusive to the CPU; the GPU gets zero access to it. It goes from the L2 to the VRAM and that's about it.

None of their APUs give the GPU access to the CPU's L3. They go straight to RAM.

@Look over there has determined that Drake is likely a "well fed" system from a memory bandwidth perspective, i.e. that the bandwidth available tracks with similarly sized Ampere GPUs. Looking at PS5/Series X by comparison to their GPU counterparts, they are similarly well fed.
That is only true if Drake doesn't exceed 1.043 GHz. Past that, it has a worse memory bandwidth allocation per TF than desktop Ampere.


Basically, that's why I said before that the ceiling for Drake with respect to TF is 3.2; past that it doesn't scale in its favor and you get a more imbalanced system, like the Tegra X1.

And it's also why I think that, with the GPU clocked to half in portable mode, a bandwidth of 68.2 GB/s is "preferred" to balance the scaling from portable to docked as well as possible. If they used LPDDR5X it would make that easier, giving 2x the bandwidth scaling to go with the 2x FP32 increase.

And yes it does account for the CPU.
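For reference, the 3.2 TF ceiling follows directly from the standard Ampere FP32 math under the assumed 12 SM configuration (the bandwidth-per-TF side depends on what you assume about the bus and the CPU's share of it, so I've left that out):

def drake_tflops(clock_ghz, sms=12, fp32_per_sm=128):
    # FP32 TFLOPs = SMs * FP32 units per SM * 2 ops per clock (FMA) * clock
    return sms * fp32_per_sm * 2 * clock_ghz / 1000

print(drake_tflops(1.043))         # ~3.20 TF, the ceiling mentioned above
print(drake_tflops(1.043 / 2))     # ~1.60 TF with the GPU clocked to half in portable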
 
"System" tasks is probably a more reasonable term. But userland-vs-kernel gets semantic, fast.

I'm actually less familiar with the Xbox strategy - what are the other two OSes, and does it use a hypervisor model underneath?
I don't think they've talked about this a whole lot since the initial Xbox One reveal, so I might get some details wrong, but basically there's a manager OS that then launches (I think basically Hyper-V) VMs containing a "Game" OS (which is what Xbox games run on) and an "App" OS (which runs the Windows Store apps). I'm not entirely sure where the system UI itself lives, but it's probably either on the host or the App OS.
 