
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

We heard recently about the RX 6500 XT suffering from memory bandwidth issues; even though it has 16 MB of Infinity Cache, 4 GB of VRAM just isn't enough for what that card wants to be...

Thank you for providing a nice example.
I read the Eurogamer article about this card, and they showed that its bandwidth of 144 GB/s over 4 lanes saturates the PCIe 3.0 standard, or at least that's my interpretation of the situation.

However, when I look for information about PCIe 3.0, I get that the max bandwidth over a single lane is... 1 GB/s.
PCIe 3.0 x4 is 4 GB/s.
PCIe 4.0 x4 is 8 GB/s.

Isn't that weird? Both 4 and 8 are way less than 144. Thus, any modern GPU would be bottlenecked by the PCIe standard (whether 3.0 or 4.0, and no matter how many lanes are used). What am I missing here?
 
I do not think Nintendo necessarily follows "conventional wisdom", much less your definition of "logic".
I do not see them attempting to imitate Sony and Microsoft in branding/marketing. That is something most "gamers" (of the Sony/Microsoft variety) have been wanting Nintendo to do, to "get with the times" so to speak, even though their mindset is in the wrong place.

You are also assuming this next iteration or successor, should it retain the Switch branding, will not have any brand-new hooks or "gimmicks". I would suspect that if it does, the name will center around this differentiating factor.
Note that "Nintendo Switch OLED" pretty much highlighted the biggest change/upgrade in the iteration, with the other features (new kickstand, better speakers, new dock with built-in ethernet) being secondary.

I really doubt they will call it "Nintendo Switch 2" and cause more confusion, since the "Nintendo Switch OLED" and "Nintendo Switch Lite" might also be perceived as "Nintendo Switch 3 and 2", respectively.

It's not about following "conventional wisdom", and even less about imitating Sony and Microsoft in branding/marketing (consoles aren't the only tech products that use numbers in their names); it's about what makes the most sense for Nintendo with this new hardware right now. The fact that Sony uses numbers in its naming doesn't, and shouldn't, mean anything for Nintendo.

I'm saying that if this really is a next-gen Switch, as the current information suggests, the most logical thing would be to name it Switch 2.
The thing is that the Switch branding and the hybrid concept are here to stay. With the hybrid concept set to be the main concept of future Nintendo hardware releases, it makes the most sense for every next-gen system to get a number in its name, because then the naming of revisions within each generation stays simple too. For instance: this next-gen Switch 2, then a Switch 2 Lite, then a Switch 2 "something", then Switch 3, Switch 3 Lite, Switch 3 "something"... By not naming it Switch 2, you basically end up in another 3DS situation with bad naming.

The OLED is basically the regular Switch with an OLED screen (and double the internal storage), so it's a revision with minimal differences; it made sense to call it OLED,
because the "Switch OLED" name says the difference is the OLED screen.
On the other hand, we're talking here about next-gen hardware (a stronger, newer CPU and GPU, more and newer RAM, more internal storage, new hardware and software features... and who knows what else), and the name "Switch 2" says exactly that: the next generation of Switch hardware, very clear and simple.


Saying that next-gen hardware named Switch 2 would cause confusion is not a good argument in a world where we've had much worse naming, for instance Wii U after Wii, or 3DS after DS, followed by 3DS XL, 2DS, New 3DS, New 3DS XL and New 2DS XL.

Even the Switch Lite is a good example that a not-so-great name doesn't necessarily cause confusion. The "Switch Lite" name tells you it's a smaller or somehow less capable Switch, but it doesn't tell you that you can't dock it and play it on a TV.
So before launch you could have argued that a name like that would cause big confusion, because the main point of the Switch is playing in handheld or TV mode, and "Lite" doesn't say it can't be played in TV mode (a huge difference compared to the regular Switch). In the end there was no confusion, because Nintendo explained at the reveal and through marketing what the Switch Lite really is.
"Switch 2" would cause even less confusion (counting on reveal information and marketing) than the Switch Lite did.


Switch 2 will mostly be compared to PS2.

Nintendo is not Playstation. And that's a good thing.

Yeah, because everyone will say, "Sony has Playstation 5 and Nintendo has Switch 2, I better buy Playstation 5!" :p

Of course Nintendo is not Sony, and calling it Switch 2 would hardly turn Nintendo into Sony (lol),
but the reality is that companies in this industry often replicate good things from the competition (for instance, Sony replicated Nintendo Directs with its State of Play) or from other tech companies. Also keep in mind that numbered names for new hardware aren't a Sony thing; plenty of companies do the same in other parts of the tech hardware market, for instance phones or TVs.
 
Thank you for providing a nice example.
I read the Eurogamer article about this card, and they showed that its bandwidth of 144 GB/s over 4 lanes saturates the PCIe 3.0 standard, or at least that's my interpretation of the situation.

However, when I look for information about PCIe 3.0, I get that the max bandwidth over a single lane is... 1 GB/s.
PCIe 3.0 x4 is 4 GB/s.
PCIe 4.0 x4 is 8 GB/s.

Isn't that weird? Both 4 and 8 are way less than 144. Thus, any modern GPU would be bottlenecked by the PCIe standard (whether 3.0 or 4.0, and no matter how many lanes are used). What am I missing here?
Those are different things. 144 GB/s is the speed of the video memory; this is the memory pool that the GPU has dedicated access to. The second is PCIe speed, a general-purpose communication channel that the GPU uses to talk to the CPU and system RAM.

One big advantage of SoCs is that the GPU has direct communication with the system RAM and CPU. One disadvantage is that the CPU and GPU compete for memory bandwidth.
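To put rough numbers on the two different channels, here's a back-of-the-envelope sketch (the ~1 GB/s and ~2 GB/s per-lane figures are the usual PCIe 3.0/4.0 rates, and 144 GB/s is the RX 6500 XT's advertised VRAM bandwidth; none of this is specific to any Nintendo hardware):

```python
# Rough comparison of the two buses discussed above (illustrative numbers only).
PCIE_PER_LANE_GBPS = {"3.0": 0.985, "4.0": 1.969}  # approx. GB/s per lane, per direction

def pcie_link_bandwidth(gen: str, lanes: int) -> float:
    """Approximate bandwidth of a PCIe link in GB/s."""
    return PCIE_PER_LANE_GBPS[gen] * lanes

VRAM_BANDWIDTH = 144.0  # GB/s: the RX 6500 XT's dedicated GDDR6 pool

for gen in ("3.0", "4.0"):
    link = pcie_link_bandwidth(gen, lanes=4)
    print(f"PCIe {gen} x4: ~{link:.1f} GB/s to system RAM/CPU "
          f"vs ~{VRAM_BANDWIDTH:.0f} GB/s to local VRAM")
```

The takeaway is that the 144 GB/s figure describes the GPU's private VRAM pool, while the PCIe link is the much narrower pipe back to the CPU and system RAM.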
 
PlayStation's approach to hardware is "what you loved, but better", targeting the same audience, which fits perfectly with numbering the consoles.

Nintendo's approach is always to strive to deliver new experiences, to keep things fresh and expand their audience, and that's reflected in their naming. Nintendo Switch was a far better name than Nintendo 11, but only because it had a big differentiator from past consoles. And Furukawa doesn't seem at all like he wants to change their approach to hardware.

And there's no reason for them to stick with the Switch brand forever either. The real brand is Nintendo, and "Switch" is what differentiates it from other consoles right now. Once a hybrid is taken for granted and the new console has a big USP compared to other Nintendo hybrids, they will change it again.

To be clear, I have no problem if they name it Switch 2 or anything not stupid, I just find it extremely unlikely, especially when a big USP and the branding around it gave them their three biggest successes so far.
 
PlayStation's approach to hardware is "what you loved, but better", targeting the same audience, which fits perfectly with numbering the consoles.

Nintendo's approach is always to strive to deliver new experiences, to keep things fresh and expand their audience, and that's reflected in their naming. Nintendo Switch was a far better name than Nintendo 11, but only because it had a big differentiator from past consoles. And Furukawa doesn't seem at all like he wants to change their approach to hardware.

And there's no reason for them to stick with the Switch brand forever either. The real brand is Nintendo, and "Switch" is what differentiates it from other consoles right now. Once a hybrid is taken for granted and the new console has a big USP compared to other Nintendo hybrids, they will change it again.

To be clear, I have no problem if they name it Switch 2 or anything not stupid, I just find it extremely unlikely, especially when a big USP and the branding around it gave them their three biggest successes so far.

The thing is that before, Nintendo named its consoles (at least the last few generations) based on the console's concept.
Now, with the handheld and home console divisions unified and with the hybrid concept, Nintendo has a winning combination even for future generations,
and that without breakthrough new things like dual screens or motion controls, which each time could mean either a huge success or a failure, or just a one-generation success. The hybrid concept, on the other hand, is a winning combination that will not weaken over time the way dual screens or motion controls did.

Of course, Nintendo will try to add something new and fresh to the next-gen Switch (new controls, AR/VR support), but those things will not be the main features; they would be more of a nice addition. The main feature will again be the Switch concept itself.

If you have simple yet great and popular branding that's very easy to market, then you don't really need to change it, especially when that branding tells people what kind of product you offer.
I mean, no one knows what things will look like in around 10 years (and not only regarding branding), but it's at least a safe bet that this next-gen Nintendo hardware will have Switch in its name, and IMO naming it Switch 2 makes the most sense (at least more sense than naming it Switch 4K or Switch Pro).
 
This applies to big Orin. In post #6,662 Thraktor finds that

"The interesting thing here is that FP16 non-tensor performance is the same as FP32 performance. This lines up with desktop Ampere, but is different than Orin, which has double the FP16 performance to the FP32 performance. This indicates (as we will see further below), that Drake's tensor cores are the same "width" as desktop Ampere, not the double-rate tensor cores we see in Orin."
I was simply responding to the comment that "T239 doesn't have double-speed FP16 processing", which isn't true. Technically all of them have a doubled FP16 rate, as per the Ampere architecture whitepaper, but not all of them execute two FP16 ops for every FP32 op: Ampere does 2 FP32 and 2 FP16, unlike its predecessor Turing, which does 1 FP32 and 2 FP16.

It just handles it differently, and then there are the tensor cores, which also do FP16.
I'll add that if GA10F is 1 GPC, then it is one GPC of GA102.
EDIT: The problem might not be the die size if the chip is clocked low; if the cost per transistor is worth it, they might go for it. 8nm will be available after Nvidia moves its GPUs to 5nm. Low yields would be dealt with by releasing a Nano Next and maybe a binned SoC for a Lite model.
Even if the chip is clocked low, it would still be less efficient than the same chip on 7/6/5nm, while costing about the same or close to it due to being such a large die.
 
Those are different things. 144 GB/s is the speed of the video memory; this is the memory pool that the GPU has dedicated access to. The second is PCIe speed, a general-purpose communication channel that the GPU uses to talk to the CPU and system RAM.

One big advantage of SoCs is that the GPU has direct communication with the system RAM and CPU. One disadvantage is that the CPU and GPU compete for memory bandwidth.
I get it. A single PCIe 3.0 lane gives me access to 1 GB of system memory per second (which would be DDR4 nowadays).
But then, I don't get where the bottleneck occurs when using the RX 6500 XT over PCIe 3.0; I guess the interface doesn't only grant access to the system RAM but also to... something else? And the access to that something else is somewhat limited by the card's 64-bit bus width?
 
Not to mention calling it "Switch 2" can lead to a lot of confusion, like when being asked if a game is compatible with the "Switch 2", leading people to believe it requires, quite literally, 2 Switches.
Come on now, give the general public at least a little bit of credit lol. It would make for a good Abbott and Costello routine though: "Have you seen the Switch 2?" "The switch to what?"... OK, maybe not that good.
 
I get it. A single PCIe 3.0 lane gives me access to 1 GB of system memory per second (which would be DDR4 nowadays).
But then, I don't get where the bottleneck occurs when using the RX 6500 XT over PCIe 3.0; I guess the interface doesn't only grant access to the system RAM but also to... something else? And the access to that something else is somewhat limited by the card's 64-bit bus width?
The GPU still needs to read information from main memory and get instructions from the CPU from time to time. A PCIe bottleneck occurs when the GPU processes data faster than the PCIe interface can feed it. Why, when and how that happens can be very complicated and varies from game to game.
 
The GPU still needs to read information from main memory and get instructions from the CPU from time to time. A PCIe bottleneck occurs when the GPU processes data faster than the PCIe interface can feed it. Why, when and how that happens can be very complicated and varies from game to game.
May you be blessed in your honorable quest of teaching the laymen.
Thank you :)

I guess that is also where all the dev work shines actually, right? In the way they plan the data transfers between the GPU, CPU and RAM over the bus.
 
I get it. A single PCIe 3.0 lane gives me access to 1 GB of system memory per second (which would be DDR4 nowadays).
But then, I don't get where the bottleneck occurs when using the RX 6500 XT over PCIe 3.0; I guess the interface doesn't only grant access to the system RAM but also to... something else? And the access to that something else is somewhat limited by the card's 64-bit bus width?
The GPU's memory is a different thing here; the PCIe speed is what determines how fast the CPU can communicate with the GPU.
 
Not to mention calling it "Switch 2" can lead to a lot of confusion, like when being asked if a game is compatible with the "Switch 2", leading people to believe it requires, quite literally, 2 Switches.

Sorry, but literally no one is going to think that. Nobody thought PlayStation 2 games required two PlayStations.

Switch 2 is a boring name but it's probably the most clear and unambiguous name for a successor system. People understand it means Not The Switch 1.
 
Nothing I saw on this board.

Now that I have a better grasp on the basic concepts of GPU computing and its metrics, I am now interested in bottlenecks. In particular, I want to know more about the interplay between memory bus width, memory clock and memory capacity. From what I understand:

  • a low memory clock can be balanced out with a large bus, regardless of the memory capacity (unless we are talking about extreme cases)
  • a portion of the memory capacity can be left unused if the bus is too small and the memory clock not high enough

Are there notorious examples of design imbalances that led to some GPUs being crippled by one or the other of these elements?

The GTX 970 was crippled by having 0.5 GB of its 4 GB of RAM be really slow, thanks to some shenanigans on Nvidia's side.
I think it was the SM setup being quite different from the 980, resulting in a big memory penalty when using more than 3.5 GB of RAM.
 
If your first thought when seeing “Switch 2” is “I need 2 Switch units to play this?”, I'm gonna have to ask what dimension you spawned from, and what does this mean:

2


In that dimension.
 
I get it. A single PCIe 3.0 lane gives me access to 1 GB of system memory per second (which would be DDR4 nowadays).
But then, I don't get where the bottleneck occurs when using the RX 6500 XT over PCIe 3.0; I guess the interface doesn't only grant access to the system RAM but also to... something else? And the access to that something else is somewhat limited by the card's 64-bit bus width?
In actuality, that 64-bit bus is probably the biggest hindrance of that card. Compare it to a highway where the theoretical speed limit is 120 mph but there are only 2 lanes of traffic; it would be better served by more lanes running at slightly slower speeds, getting more traffic through without the potential for congestion.

I'm sure whatever solution Nvidia and Nintendo come up with won't leave this new device bottlenecked in the way the Switch currently is.
They have a very real idea of what's needed (memory bandwidth-wise) in order to render natively at 720p @ 60fps in handheld mode and 1080p @ 60fps while docked.
 
Who are all these people who can’t justify a $299 gaming console?

The massive number of Switch hybrids and Series S machines sold doesn't suggest this is a huge problem.

The $199 Switch Lite exists.

I think you are overestimating the number of gamers who refuse to buy gaming hardware until it hits $199.

And if they do, it's not because they can't afford it… it's because they don't find value in spending a lot of money on a dedicated video game machine, period. They don't care that much about console gaming, tbh. Certainly they don't need to be lamented and championed as if this ISN'T a luxury good (it is).

“oh will anyone think of the poor people who will never afford a ps5???” No, I don’t think i will…

Yikes!
 
The hybrid concept, on the other hand, is a winning combination that will not weaken over time the way dual screens or motion controls did.

Of course, Nintendo will try to add something new and fresh to the next-gen Switch (new controls, AR/VR support), but those things will not be the main features; they would be more of a nice addition. The main feature will again be the Switch concept itself.
The main feature is playing games; switching is the Switch's unique selling point, until it's not unique anymore. Just like being a handheld, having 3D games, using motion controls, screens having touch input and all the other features which were once USPs and didn't go away, it eventually becomes something you already have and already expect rather than a selling point. The next Switch will no longer have the "wow" it had in 2017; nobody will be impressed if the advertising is about docking and undocking it.
 
The main feature is playing games; switching is the Switch's unique selling point, until it's not unique anymore. Just like being a handheld, having 3D games, using motion controls, screens having touch input and all the other features which were once USPs and didn't go away, it eventually becomes something you already have and already expect rather than a selling point. The next Switch will no longer have the "wow" it had in 2017; nobody will be impressed if the advertising is about docking and undocking it.

Agreed, but the point is that with the Switch hybrid concept (because it's so good and strong), Nintendo hardware doesn't need a "wow" effect any more; they don't need something huge and new again, especially compared to Sony/MS, which still make just home consoles.
But like I wrote, you will probably get some other unique things that the previous console didn't have, for instance AR or VR, but not a huge new thing like dual screens, motion controls, or now the hybrid concept, that the whole console is built around.

There are two huge changes that Nintendo offers with the Switch compared to any previous Nintendo hardware: first, they sell hardware to the home console and handheld markets at the same time, and second, you have all Nintendo IPs and games on a single Nintendo device. That will continue to be the main selling point for at least this next-gen Switch hardware.
 
The name of the next Nintendo hardware is more a marketing question than anything. The name will give direction to how they want to sell it to the public. They can name it Switch 4K or Switch 2 or Switch Advance or Super Switch, but they still have to quickly explain what is different about this hardware. They will drop a short promotional video doing just that. And that video will feature 4K talk when docked. It will mention upgraded graphics. If there are some new hardware gimmicks, they will be showcased. The name won't matter too much in the long run as long as the major bullet points are apparent to the buyer. That is to say, much like everyone understands that the (OLED model) is largely an iterative upgrade to the handheld experience, not that it's only compatible with OLED TVs, they will understand this next hardware is an upgrade for games across the board, but especially when you hook it up to your 4K TV. Image quality and game performance are going to be the major selling points for what's next (unless they introduce some gimmick).
 
I'm currently playing Far Cry 6 on Xbox One S. It's dynamic 900p with TAA. Switch first-party games look much cleaner. I'm glad that Nintendo chose custom AA for their games instead of TAA. TAA needs at least 1080p for a reasonably sharp image; 1440p+ recommended.
 
It does.

All Ampere products technically do; however, T239 does it differently than the other Ampere products.

It can do 4096 GFLOPS of FP32 and 8192 GFLOPS of FP16.

I believe ILikeFeet shared info about that, and it listed that Orin does it differently.


Edit: never mind, it was FWD-BWD that found it

The "double speed" thing is marketing relative to Volta. Chips starting with Turing doubled the FP16 performance compared to Volta. Ampere then doubled the FP32 performance, making them 1:1 again. Orin is unique among the Ampere chips because it again doubles the FP16 performance, making it 2:1 with FP32. Drake doesn't seem to have this feature.
 
Anyone who can't fathom paying hundreds of dollars for a device that ONLY plays video games… just never buys one.
This is a strawman and an obvious statement. I'm not talking about people who can't fathom buying a gaming console, I'm talking about enthusiasts who can't afford to buy one. There is a difference.
The people you are talking about…who love video gaming but haven’t bought any gaming hardware for years cause they can’t justify spending that kind of money on that hobby…isn’t reading forums like this.
I love how you just pulled this out of thin air. Fun fact, when I was younger, I was exactly the kind of person who both frequented video game forums and was too poor to afford to buy gaming consoles. I know plenty of other people on forums who are currently in such a situation. This comment is completely out of touch with reality.
I love the pure communist ideal, but unfortunately it is a utopia.
The reality is that the world today is based on consumerism, and companies adjust the prices of their products accordingly.
This is not the place to decide whether it is right or wrong, but we are forced to acknowledge it and reason accordingly.
This has nothing to do with communism. I'm just talking about acknowledging the experiences of people within certain demographics. I'm not saying they're entitled to gaming consoles.
I don't see how discussions can be construed as a lack of empathy (do people even know what that word means?) or compassion.
When a person explicitly states that they don't care how one feels about something, then yes, such apathy quite literally indicates a lack of empathy.
 
There's no GI or AO. It's only reflections, and only on the cars. Also, it's important to remember that the RT reflections are full resolution (2160p).

If the resolution of the RT was ¼ of the input resolution (before DLSS), then Drake could definitely do RT.
Ah thanks, yeah, I noticed in the rain-soaked courses (in the replays) that nothing reflected in the track, which created this weird disconnect, making the car feel like it wasn't completely part of the scene...
Thanks for the correction.
 
If your first thought when seeing “Switch 2” is “I need 2 Switch units to play this?”, I'm gonna have to ask what dimension you spawned from, and what does this mean:

2


In that dimension.
Wait, Switch is a games console?
I thought I was playing on a device for making and breaking the connection in an electric circuit.
Man I'm so confused.
 
The GPU's memory is a different thing here; the PCIe speed is what determines how fast the CPU can communicate with the GPU.
The GTX 970 was crippled by having 0.5 GB of its 4 GB of RAM be really slow, thanks to some shenanigans on Nvidia's side.
I think it was the SM setup being quite different from the 980, resulting in a big memory penalty when using more than 3.5 GB of RAM.
In actuality, that 64-bit bus is probably the biggest hindrance of that card. Compare it to a highway where the theoretical speed limit is 120 mph but there are only 2 lanes of traffic; it would be better served by more lanes running at slightly slower speeds, getting more traffic through without the potential for congestion.

I'm sure whatever solution Nvidia and Nintendo come up with won't leave this new device bottlenecked in the way the Switch currently is.
They have a very real idea of what's needed (memory bandwidth-wise) in order to render natively at 720p @ 60fps in handheld mode and 1080p @ 60fps while docked.
I think I get it: it's not so much about what comes out of the GPU, it's more about what gets in. Over PCIe 4.0, the RAM and the CPU can push twice as much information as they do over PCIe 3.0. And in some scenarios with 3.0, there is simply too much being pushed to the GPU memory for it to process stuff in a timely fashion (there is congestion over the paltry 4 lanes of traffic of its bus). Had it had more lanes, it could have processed whatever is thrown at it on time, assuming the buffer (memory capacity) of the GPU is large enough to accommodate the flow.

So, if we follow this logic, developers will be interested in first knowing what CPU and RAM speeds the succ has, and from there give suggestions about what memory capacity, memory bandwidth and bus the GPU should have in their opinion?

Was this Capcom's train of thought when they asked for 4 GB of RAM for the original Switch? They probably deduced that the bus was wide enough and the bandwidth high enough to saturate 2 GB of RAM at the speeds Nintendo targeted, so they asked for more?

Did I get it right?
 
I think I get it: it's not so much about what comes out of the GPU, it's more about what gets in. Over PCIe 4.0, the RAM and the CPU can push twice as much information as they do over PCIe 3.0. And in some scenarios with 3.0, there is simply too much being pushed to the GPU memory for it to process stuff in a timely fashion (there is congestion over the paltry 4 lanes of traffic of its bus). Had it had more lanes, it could have processed whatever is thrown at it on time, assuming the buffer (memory capacity) of the GPU is large enough to accommodate the flow.

So, if we follow this logic, developers will be interested in first knowing what CPU and RAM speeds the succ has, and from there give suggestions about what memory capacity, memory bandwidth and bus the GPU should have in their opinion?

Was this Capcom's train of thought when they asked for 4 GB of RAM for the original Switch? They probably deduced that the bus was wide enough and the bandwidth high enough to saturate 2 GB of RAM at the speeds Nintendo targeted, so they asked for more?

Did I get it right?
It can be more complicated than that. Even if the memory bus is slow, if you have a large amount of memory you can push a lot of data into it during the loading screen and leave it there until it's no longer needed, thus mitigating the slow transfer speed to a degree. More memory is always useful.

Conversely, one of the reasons many argue that the PS5 and XBSX are fine without as much of a RAM jump (2x over the previous gen, rather than the 16x from PS3 to PS4) is that the storage is much faster, so not as many assets need to be pre-loaded into RAM as before.
 
I'm currently playing Far Cry 6 on Xbox One S. It's dynamic 900p with TAA. Switch first-party games look much cleaner. I'm glad that Nintendo chose custom AA for their games instead of TAA. TAA needs at least 1080p for a reasonably sharp image; 1440p+ recommended.
Doesn't Xenoblade 2 use TAA? I like how Xenoblade 2 did it, but not XBDE.
 
FSR 2.0 looking promising 👀

[Comparison screenshots: Deathloop native vs FSR 2.0 vs FSR 1.0, plus AMD FSR 2.0 slides]


What's most impressive is the fact that it looks comparable to DLSS 2.0, yet doesn't need ML to train a model AND it stays open source. I wonder how it'll stack up against DLSS 2.0 when it releases and people start comparing the two. Also, would the performance hit be greater than DLSS?
I assume implementing FSR 2.0 will be similar to DLSS 2.0 because of the need for depth, motion vector, and color data?

Source pictures: https://videocardz.com/newz/amd-teases-fidelityfx-super-resolution-2-0-coming-q2-2022
 
What's most impressive is the fact that it looks comparable to DLSS 2.0, yet doesn't need ML to train a model AND it stays open source. I wonder how it'll stack up against DLSS 2.0 when it releases and people start comparing the two. Also, would the performance hit be greater than DLSS?
I assume implementing FSR 2.0 will be similar to DLSS 2.0 because of the need for depth, motion vector, and color data?
It'll probably be inferior to DLSS in like-for-like comparisons. I don't think there will be a noticeable performance hit; in fact, it should run better.

If a dev implements one, there's no reason they can't implement the other. Hence I foresee all solutions being implemented, unless devs come up with their own TAAU or use an engine's built-in TAAU (like UE5's).
 
It can be more complicated than that. Even if the memory bus is slow, if you have a large amount of memory you can push a lot of data into it during the loading screen and leave it there until it's no longer needed, thus mitigating the slow transfer speed to a degree. More memory is always useful.

Conversely, one of the reasons many argue that the PS5 and XBSX are fine without as much of a RAM jump (2x over the previous gen, rather than the 16x from PS3 to PS4) is that the storage is much faster, so not as many assets need to be pre-loaded into RAM as before.
Again a straightforward and interesting answer. It only makes me want to delve deeper into the subject. Thanks!
 
When a person explicitly states that they don't care how one feels about something, then yes, such apathy quite literally indicates a lack of empathy.
I don't know how this discussion suddenly became all about people's feelings over price, when I would expect them to know better than to get emotional over estimates that are just pure speculation at this point...

Besides, I would advise against trying to spend on something outside one's financial means. This is the reason why I've held off on building a new PC, or buying every game that comes out. I simply lack the time and resources to enjoy everything nowadays.

Especially people who had just bought a Switch: Enjoy what you have right now and don't fuss over what comes next.
Come on now, give the general public at least a little bit of credit lol. It would make for a good Abbott and Costello routine though: "Have you seen the Switch 2?" "The switch to what?"... OK, maybe not that good.
Well that's the thing: People don't seem to know any better.
 
So, if we follow this logic, developers will be interested in first knowing what CPU and RAM speeds the succ has, and from there give suggestions about what memory capacity, memory bandwidth and bus the GPU should have in their opinion?

Was this Capcom's train of thought when they asked for 4 GB of RAM for the original Switch? They probably deduced that the bus was wide enough and the bandwidth high enough to saturate 2 GB of RAM at the speeds Nintendo targeted, so they asked for more?

Did I get it right?

Probably depends on how early/late into the design process.
Bus width would have to be pretty early, as that's related to the physical design. Like the actual number of physical pins used, IIRC.
Bandwidth.... well, bandwidth is memory frequency x 2 (because the DDR in DDR/LPDDR/GDDR stands for Double Data Rate) x bus width (usually in bits) / 8 (to convert to bytes). That's in MB/s, then divide by another ~1000 to get GB/s. So current Switch when docked is 1600 MHz * 2 = 3200 MT/s (MT = MegaTransfers), then x 64 / 8 = 25,600 MB/s, then divide by 1000 to get 25.6 GB/s. So, bus width as noted above is related to physical design, so that needs to be decided earlier. (Officially supported) maximum frequency/clock rate is decided by the particular memory standard/generation (LPDDR4 vs 4X vs 5 vs 5X, etc.). Which standard/generation is supported is in turn decided by the memory controller; presumably that too is figured out earlier in the process. The actual frequency/clocks at runtime should be decided on the software side, so that can come later.
Capacity... is indirectly constrained by bus width. Memory manufacturers produce RAM in chips or modules that are X bits wide with Y capacity. So when you decide on bus width, you effectively decide the minimum and maximum capacity as well, depending on what options RAM makers offer. But that's the range; your actual selection from the given options can come later, like with the Capcom example. Changing from 2 or 3 GB to 4 GB is a matter of changing from a pair of 32-bit 1 or 1.5 GB modules to a pair of 32-bit 2 GB modules. I say pair in this case as we've seen from teardowns that the Switch uses two modules, right? Anyway, that should be a relatively simple change that shouldn't require modifying the design.
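As a quick sanity check of the bandwidth formula above, here's a minimal sketch; the 1600 MHz / 64-bit numbers are the docked-Switch example from the post, while the 128-bit LPDDR5 line is just a hypothetical successor configuration, not a known spec:

```python
def memory_bandwidth_gbps(freq_mhz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: frequency x 2 (Double Data Rate) x bus width / 8 (bits -> bytes),
    which gives MB/s, then /1000 for GB/s."""
    transfers_per_sec = freq_mhz * 2                  # MT/s
    mb_per_sec = transfers_per_sec * bus_width_bits / 8
    return mb_per_sec / 1000

print(memory_bandwidth_gbps(1600, 64))    # docked Switch example above -> 25.6
print(memory_bandwidth_gbps(3200, 128))   # hypothetical LPDDR5-6400 on a 128-bit bus -> 102.4
```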
 
Capacity... is indirectly constrained by bus width. Memory manufacturers produce RAM in chips or modules that are X bits wide with Y capacity. So when you decide on bus width, you effectively decide the minimum and maximum capacity as well, depending on what options RAM makers offer. But that's the range; your actual selection from the given options can come later, like with the Capcom example. Changing from 2 or 3 GB to 4 GB is a matter of changing from a pair of 32-bit 1 or 1.5 GB modules to a pair of 32-bit 2 GB modules. I say pair in this case as we've seen from teardowns that the Switch uses two modules, right? Anyway, that should be a relatively simple change that shouldn't require modifying the design.
So, going with a wider bus has this advantage then. Well, if that is the case, then I hope the succ will come with a 128-bit bus.

Last point: how does cache influence all this? Where does it intervene? What flows through it? Why is it there in the first place? Does its introduction into the discussion invalidate some of the facts mentioned so far?
 
Ah, cache...
First, the Why.
In order to do work, CPU/GPU cores need to pull instructions/data from somewhere, right?
There's storage (be it magnetic tape, Hard Disk Drive, Solid State storage, or whatever). But accessing storage is freaking slow!
And lo, we have RAM. And accessing RAM is a lot faster than storage!... but it's still too slow for us!
And lo, we come up with cache. Chunks of SRAM physically located even closer to the cores, for even better latency (definitely) and bandwidth (...depending on what we're comparing it to).

What it's doing/where's it placed?
It goes back to "In order to do work, CPU/GPU cores need to pull instructions/data from somewhere, right?"
To pull something from somewhere, you need to search for that something. When you search, you look through locations A, B, C, etc. in some sort of order, until you find what it is you're looking for.
Typically, cache is looked through first. If there are multiple levels of cache, you look through L1 first, then L2, and so on. (L1/L2/L3 are named so for that reason). If you can't find what you want in cache, you look in RAM. If RAM doesn't have what you want, you look in storage.

What's the influence?
Any time you can successfully pull from cache instead of RAM is a net improvement to performance, because it means the core(s) spent less time idling/waiting around for the next instruction/data to work on.

The facts already mentioned so far should not be invalidated. The introduction of cache should just add more pieces to the puzzle... and further complicate things.
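Purely as an illustration of that lookup order, here's a toy model in Python (made-up latency numbers; real hardware does this with dedicated circuitry and fixed-size cache lines, not dictionaries):

```python
# Toy memory hierarchy: check L1, then L2, then RAM, then storage.
# Latencies are made-up round numbers (nanoseconds) just to show the ordering.
HIERARCHY = [
    ("L1 cache", {}, 1),
    ("L2 cache", {}, 10),
    ("RAM",      {}, 100),
    ("storage",  {}, 100_000),
]

def load(address, levels=HIERARCHY):
    """Search each level in order; on a hit, copy the value into the faster levels
    above it so the next access to the same address is served from cache."""
    total_ns = 0
    for i, (name, store, latency_ns) in enumerate(levels):
        total_ns += latency_ns
        if address in store:
            for _, upper_store, _ in levels[:i]:
                upper_store[address] = store[address]
            return store[address], total_ns
    raise KeyError(address)

HIERARCHY[3][1][0x1000] = "texture data"  # the data initially lives only in storage
print(load(0x1000))  # first access walks all the way down (~100,111 ns)
print(load(0x1000))  # second access hits L1 (~1 ns)
```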
 
Ah, cache...
First, the Why.
In order to do work, CPU/GPU cores need to pull instructions/data from somewhere, right?
There's storage (be it magnetic tape, Hard Disk Drive, Solid State storage, or whatever). But accessing storage is freaking slow!
And lo, we have RAM. And accessing RAM is a lot faster than storage!... but it's still too slow for us!
And lo, we come up with cache. Chunks of SRAM physically located even closer to the cores, for even better latency (definitely) and bandwidth (...depending on what we're comparing it to).

What it's doing/where's it placed?
It goes back to "In order to do work, CPU/GPU cores need to pull instructions/data from somewhere, right?"
To pull something from somewhere, you need to search for that something. When you search, you look through locations A, B, C, etc. in some sort of order, until you find what it is you're looking for.
Typically, cache is looked through first. If there are multiple levels of cache, you look through L1 first, then L2, and so on. (L1/L2/L3 are named so for that reason). If you can't find what you want in cache, you look in RAM. If RAM doesn't have what you want, you look in storage.

What's the influence?
Any time you can successfully pull from cache instead of RAM is a net improvement to performance, because it means the core(s) spent less time idling/waiting around for the next instruction/data to work on.

The facts already mentioned so far should not be invalidated. The introduction of cache should just add more pieces to the puzzle... and further complicate things.
Yeah, although I'd say the easy-to-understand takeaway is: more cache at the L1, L2, and L3 levels, 9 times out of 10, means more performance in video games, to a certain extent.

You can't give a GT1030 a gigabyte of L1 Cache and expect it to perform like an RTX 3050.
Cache's effectiveness heavily hinges on how memory-constrained the card is at the core level.

Ampere and RDNA1 (and therefore RDNA2) happen to be memory-starved uArchs where brute-forcing higher core counts and clocks hits diminishing returns (at the expense of higher power draw).

Infinity Cache in RDNA2 helps alleviate this greatly, allowing higher clocks and better bins to give a large boost in performance in the case of the 6900 XT. It also shows how the lack of that cache in console RDNA2 makes their GPUs perform weaker than their core counts would suggest relative to RDNA2's desktop stack.

And luckily, Orin, Lovelace, and Drake all seemingly address the memory starvation problem Ampere had by upping the L1 and L2 cache, rather than adding a large L3 as AMD did with Infinity Cache.

(Now, Nintendo could add an L3 to Drake and have it shared between the CPU and GPU, but we don't know that yet)

Either way, for an easy-to-understand answer to what "Fixing Ampere's Memory Starvation" could result in, look at Turing.

Turing has 64 CUDA cores per SM and was the most performance-dense uArch NVIDIA has produced in modern memory, and it shows relative to Ampere.

The 2080 Ti and the 3070, despite performing similarly in games, have very different TFLOP numbers, the 3070's number being "inflated" versus the 2080 Ti's, indicating that it is less efficient per FLOP.

The cache that Lovelace, Drake, and Orin add to Ampere could bring that FLOP-efficiency back more in line with Turing (Don't know if they would reach that level exactly, but that's the general gist)
 
Ah, cache...
First, the Why.
In order to do work, CPU/GPU cores need to pull instructions/data from somewhere, right?
There's storage (be it magnetic tape, Hard Disk Drive, Solid State storage, or whatever). But accessing storage is freaking slow!
And lo, we have RAM. And accessing RAM is a lot faster than storage!... but it's still too slow for us!
And lo, we come up with cache. Chunks of SRAM physically located even closer to the cores, for even better latency (definitely) and bandwidth (...depending on what we're comparing it to).

What it's doing/where's it placed?
It goes back to "In order to do work, CPU/GPU cores need to pull instructions/data from somewhere, right?"
To pull something from somewhere, you need to search for that something. When you search, you look through locations A, B, C, etc. in some sort of order, until you find what it is you're looking for.
Typically, cache is looked through first. If there are multiple levels of cache, you look through L1 first, then L2, and so on. (L1/L2/L3 are named so for that reason). If you can't find what you want in cache, you look in RAM. If RAM doesn't have what you want, you look in storage.

What's the influence?
Any time you can successfully pull from cache instead of RAM is a net improvement to performance, because it means the core(s) spent less time idling/waiting around for the next instruction/data to work on.

The facts already mentioned so far should not be invalidated. The introduction of cache should just add more pieces to the puzzle... and further complicate things.
I see. So the cache minimizes the 'backtracking' of data between the CPU and GPU and the RAM/storage. So, the more often a certain piece of data is used, the closer it should be kept, to make the cache as efficient as possible.

OK then, cache is there to boost performance. And a difference in cache can explain why Nvidia flops are not equal to AMD flops? If yes, how did that change with the introduction of Infinity Cache? And is this cache an L3, so it is inherently slower than any L1 and L2 cache around?

Sorry for asking so many questions but this board has a very pedagogic approach to explaining this stuff and I can't stop learning. It's a great feeling.
 
I see. So the cache minimizes the 'backtracking' of data between the CPU and GPU and the RAM/storage. So, the more often a certain piece of data is used, the closer it should be kept, to make the cache as efficient as possible.

OK then, cache is there to boost performance. And a difference in cache can explain why Nvidia flops are not equal to AMD flops? If yes, how did that change with the introduction of Infinity Cache? And is this cache an L3, so it is inherently slower than any L1 and L2 cache around?

Sorry for asking so many questions but this board has a very pedagogic approach to explaining this stuff and I can't stop learning. It's a great feeling.
The funny thing is TFLOPs are not comparable across generations even within the same vendor, unless the core structure is VERY similar (Kepler 600 vs Kepler 700, Maxwell vs Pascal, etc., and all the GCN-generation cards more or less).


It's more a quirk of how they calculate the TFLOP value and how that applies in real-world scenarios.


As for RDNA2, Infinity cache managed to bring the gaming FLOP efficiency up to a degree where it overtakes Ampere's FLOP efficiency.

RDNA2 without Infinity cache is either slightly less efficient, or marginally more efficient than Ampere per-FLOP.

As for cache speed, yeah, L1 > L2 > L3 in speed, but the cost usually increases greatly as you go up in speed, while the size decreases greatly as well.
 
So, going with a wider bus has this advantage then. Well, if that is the case, then I hope the succ will come with a 128-bit bus.

Last point: how does cache influence all this? Where does it intervene? What flows through it? Why is it there in the first place? Does its introduction into the discussion invalidate some of the facts mentioned so far?
You're asking a bunch of foundational questions about computing, which is rad. This is starting to get way off topic, but I'm gonna try and generalize a bit for you.

The basic loop that any computational system has is Take Some Data In -> Transform It -> Output. This structure is everywhere. You see it at the high level - how do I take the game data off the cartridge, turn it into a big open world, and shove it out to the screen - but it repeats at the low level of every individual component. I have to feed data from the cartridge and decompress it into memory, I have to get data out of memory and into the processor, or the GPU. Every output turns into an input for the next stage, until it finally turns into something that the player experiences... and then even the player experience turns into a button press, which becomes input again.

Every time we reproduce this pipeline, we ask the same kind of performance questions about it.
  • How do we make the loop faster, faster being measured by the number of times we can read/transform/output data in a second
    • When you see things like a clock speed or FLOPS (floating point operations per second), that's what we're measuring
  • How do we make this loop more performant, which is harder to have an objective measure of, but basically it's how many times do we have to run this loop before we get the final result
    • When we talk about one microarchitecture being more performant per FLOP that is what we mean. If we can do more work per cycle, then we don't always need as many cycles-per-second
  • How do we make the cycles fatter or have higher bandwidth - how much data can we move in and out per second.
    • This is the bus speed we've been discussing
  • How do we make this loop shallower or less latent. So you might have a process that runs very fast and is ultra performant but has a long startup time.
    • Happens all the time in controllers. You want the response to be instant
  • How do we keep this loop fed or free from bubbles.
    • If one part of the system is waiting on another part, it doesn't matter how fast/efficient/latent/fat your pipe is
    • Making this work is often the job of the software devs
    • But they only have the tools available that the hardware gives them
    • Which is why some games can seem smaller than Doom/Witcher III to port to Switch, but devs can't make it work.
You can almost always make some part of the system BIGGER - more clock, more ram, more bus - but all of them cost money. And generate heat. And use battery. And take up space. So a hardware designer's job is often to figure out the biggest bang for the buck - which means looking at where software is hitting roadblocks now but also guessing what future roadblocks might be.

They've got a lot of tools to try, too. They can do things like:
  • JUST MAKE IT BIGGER. Valid, when you can do it, but you're quickly going to hit diminishing returns unless you make everything bigger - the bus, the RAM, the GPU. That's why you can't just compare clocks between two systems and know which is "better"
    • Also, eventually you're going to be limited by the speed of light
  • Just add another one. 2 cores instead of 1, 4 cores instead of 2. This means you can do two things at the same time, but it only works if the kind of things you are doing can be split into independent tasks.
    • Like doing sound on one core, and physics on another.
    • Going to max out when you can't subdivide tasks anymore
    • Same goes for busses - two slightly slower busses might be better than one really fast bus, but only as long as you can split your data into two channels
  • Make it more complicated by specializing. For example: we used to do graphics on the CPU before the invention of the GPU! Then we invented this specialized piece of hardware to make just graphics faster.
    • But now we've added a second, inner cycle, where instead of just feeding the CPU, which goes to the screen, the CPU feeds the GPU, and sometimes it even comes back to the CPU. New spot for bubbles to occur...
      • And then a new spot to optimize this cycle again...
    • Only accelerates one part of the pathway. So if your CPU is crap, you may be pushing gorgeous 3D worlds, but your physics and enemy AI are stuck in the N64 era.
  • Make it more complicated by caching. Caching essentially lets you eliminate or greatly speed up one of the steps of the process. You have a tiny, expensive, blisteringly fast piece of storage and you stick it inside one of the steps
    • For data input, you store the last couple things you looked at, so if you're still working on the same data, you don't actually have to read it again
    • For transformation you store the results of some instructions, and instead of doing the work again, when you see an identical instruction you just send out the old result
    • For output, you just cache the last thing you saw and use it again
      • Practically speaking what really happens isn't that you cache your output, it's that your output is someone else's input, so they do the caching
Okay, at this point I've written the intro to a hardware 101 textbook.
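To make the "store the results of some instructions" idea concrete in software terms, here's a tiny sketch using memoization (just an analogy for the principle; hardware caches do the equivalent with tags and cache lines rather than Python dicts, and the function here is a made-up stand-in):

```python
from functools import lru_cache

@lru_cache(maxsize=1024)                # keep the most recent 1024 distinct results
def transform(vertex):
    """Stand-in for some expensive per-item work; identical inputs reuse the stored result."""
    x, y, z = vertex
    return (x * 0.5, y * 0.5, z * 0.5)

# Repeated inputs are answered from the cache instead of being recomputed.
for v in [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (1.0, 2.0, 3.0)]:
    transform(v)
print(transform.cache_info())           # -> hits=1, misses=2 for the sequence above
```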
 
What is this thing I hear about increasing L2 cache also increasing latency? Are there downsides to having such enormous caches?
Cache gets slightly slower the bigger it gets. If your system has a working set small enough that it fits in the smaller cache, the newer, bigger cache can cause problems.

In the Big Data server workloads I am used to, that is almost never a problem. I suspect that video games have similar bigger-cache-is-better-cache workloads, but are also very sensitive to latency problems.
 
I see. So the cache minimizes the 'backtracking' of data between the CPU and GPU and the RAM/storage. So, the more often a certain piece of data is used, the closer it should be kept, to make the cache as efficient as possible.

OK then, cache is there to boost performance. And a difference in cache can explain why Nvidia flops are not equal to AMD flops? If yes, how did that change with the introduction of Infinity Cache? And is this cache an L3, so it is inherently slower than any L1 and L2 cache around?

Sorry for asking so many questions but this board has a very pedagogic approach to explaining this stuff and I can't stop learning. It's a great feeling.
Functionally speaking, yea, Infinity Cache is just a fancy name for L3. It's a level of cache placed after L2, ergo it's L3 (because the naming is descriptive).
It would be slower relative to L1 or L2, because I think that it'd be physically located further away than the L1 or L2 are.

What is this thing I hear about increasing L2 cache also increasing latency? Are there downsides to having such enormous caches?
Cache gets slightly slower the bigger it gets. If your system has a working set small enough that it fits in the smaller cache, the newer, bigger cache can cause problems.

In the Big Data server workloads I am used to, that is almost never a problem. I suspect that video games have similar bigger-cache-is-better-cache workloads, but are also very sensitive to latency problems.
I'm actually a bit unclear on why cache size correlates to slowdown. Is it because of the time necessary to search through the increased cache? Is it because the increase in physical size increases the average distance from CPU to the cache (and we're still bound by the speed of light)? Is it a combination of both?

And to answer 'downsides of such enormous caches': die size. Cache/SRAM is the opposite of dense.
Although honestly, I'm a bit unclear myself on how the gap in density between SRAM and DRAM is so big. I get that a SRAM cell needs 6 transistors while a DRAM cell needs 1 transistor + 1 capacitor. I'm... not sure how we eventually get to 'cache is measured in KB/MB, RAM modules are in GB'.
I do get on the surface level that in manufacturing, logic has been shrinking faster than SRAM, but I'm unclear on specifics.
 
Yeah, and honestly part of that is why I feel Nintendo can't let the base price of Drake creep up too high, because it would make the model that "everyone in the family" can get unattainable.

Nintendo won’t be positioning Drake as the Switch everyone needs to get.

I’m sure they will position it as the premium high end model for enthusiasts.
 
Nintendo won’t be positioning Drake as the Switch everyone needs to get.

I’m sure they will position it as the premium high end model for enthusiasts.
I am talking about the future.
Like, after launch, for future models.

EX: Drake Lite

Drake is a next-generation model, and it will have to get treated as one in full once Mariko can no longer be produced.

So Drake's base cost can't be too high, otherwise the Drake Lite will be too expensive, or the gap between the Drake Lite and the Drake hybrid will be too big.
 
Switch 2 is the most logical and most practical name if we're really talking about next-gen hardware; all those other names are a thing of the past.
I don't see any problem with the Switch 2 name while continuing to sell Switch and Switch 2 hardware and games side by side for around 2-3 years after the Switch 2 launch.

There is absolutely no way…zero, none…that Nintendo would release an upgrade model at the halfway point of the Switch’s lifecycle, during its continued and upward growth, and position it as a next gen successor.

No way.

Now, you can argue with me all you want that Nintendo will eventually treat the Drake Switch as a successor and stop selling and supporting the OLED/Lite/older hybrid Switches in two years… go ahead and do that. But they absolutely won't release it as such.

So suggesting they will call it Switch 2…or that they need to…is just silly. No offense :p
 
FSR 2.0 looking promising 👀

[Comparison screenshots: Deathloop native vs FSR 2.0 vs FSR 1.0, plus AMD FSR 2.0 slides]


What's most impressive is the fact that it looks comparable to DLSS 2.0, yet doesn't need ML to train a model AND it stays open source. I wonder how it'll stack up against DLSS 2.0 when it releases and people start comparing the two. Also, would the performance hit be greater than DLSS?
I assume implementing FSR 2.0 will be similar to DLSS 2.0 because of the need for depth, motion vector, and color data?

Source pictures: https://videocardz.com/newz/amd-teases-fidelityfx-super-resolution-2-0-coming-q2-2022

I’m on mobile and there is no difference between any of the pics 😂
 