
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

A small question.

Do you play with your Nintendo Switch, or do you just enjoy talking about it? I mean... are you guys even going to play with your Switch Pro/2/Super/Ultra?

😅
Depends on what released recently. Once I finally got my hands on Arceus, I barely put it down. If Xenoblade and Zelda are out around that time, I'm definitely going to play it a ton.
 
A small question.

Do you play with your Nintendo Switch, or do you just enjoy talking about it? I mean... are you guys even going to play with your Switch Pro/2/Super/Ultra?

😅
I definitely spend more time talking about hardware and the possibilities than playing lol. But that's because I have tons of things to manage. I do play from time to time. Though recently I've been playing more 3DS titles due to my big backlog of titles that I haven't finished.
 
I bought an mClassic for my Wii and tried it out on Switch, which I didn't really think it needed at first, but I ended up liking the effect on my LG CX. So now I'm playing more and more Switch in docked mode as a pseudo-sneak peek at what a Switch with better IQ feels like. :cool:
 
I actually started playing multiple Switch games:
Northgard
Majora's Mask
Stranger's Wrath
Thronebreaker
Dang
I still need to play Mario Sunshine
I'm having a hard time sticking to just one game; I wonder which one I'll end up landing on
 
I actually started playing multiple Switch games:
Northgard
Majora's Mask
Stranger's Wrath
Thronebreaker
Dang
I still need to play Mario Sunshine
I'm having a hard time sticking to just one game; I wonder which one I'll end up landing on
The Switch curse 😂
 
A small question.

Do you play with your Nintendo Switch, or do you just enjoy talking about it? I mean... are you guys even going to play with your Switch Pro/2/Super/Ultra?

😅
Iโ€™ve put tons of hours into my Switch and Iโ€™d have put even more into it if not for so many games running so poorly or being so blurry, which this will help a ton with.
 
NVN2 is a graphics API, so it doesn't have things like that. The API lets you configure the GPU and tell it what shaders, models, textures, etc. to use when rendering frames. There is some level of integration with the OS and therefore the Nintendo Switch SDK, but only for graphics purposes, and that integration is not Nvidia's property and wasn't part of the leak.
I see. So the API only deals with the renderer. I'm guessing most of the input stuff is handled by the OS.
 
A small question.

Do you play with your Nintendo Switch, or do you just enjoy talking about it? I mean... are you guys even going to play with your Switch Pro/2/Super/Ultra?

😅
Not a top poster by any means, but my Switch-to-forum ratio is quite healthy nowadays xD. Despite being bummed out by the SwOLED's announcement, the actual thing itself is such a nice improvement that I'm playing a lot - and that's just a Switch with its rough edges softened and a really nice screen.

So Switch 4K with BotW2 alone would consume me. Add to that the Mario Kart DLC, potential Doom 2016 / Eternal patches and whatnot and yeah xD .

(That said if all goes well I also have to make time to gush about it here...)
 
Regarding the usage of mobile phones, the info from that @Hotgirlvideos69 account is recycled from something Emily said regarding the Stinky Horse Everybody's Switch game, which might imply it's possible to play (in some way) with your phone.

So, if that game ever releases including some kind of mobile phone interaction, don't think that account has any more credibility because of it.

She even mentioned at the time that she was sure these kinds of insider accounts were going to run with the info...

As I said months ago -- and even last month -- the game doesn't sound too hot. But hey, I really shouldn't make assumptions about what critics will enjoy. Maybe critics will be pleasantly surprised and delighted by this sequel?

For reference, the original 1-2-Switch has a 58 on Metacritic. Critical receptions don't get more mixed than that. 1-2-Switch was a cute idea back when the system launched, but how will critics react to a shallow sequel to a shallow game when Switch is almost 5 years old?

Surprisingly, somehow, this sequel seems even more gimmicky than the first game.

Why play with only Joy-Cons when you can also use mobile devices (smartphones)? (Now let's wait 24 hours so SamusHunters can post this on their Twitter account and claim they heard it from their sources.)
 
I spend way more time on Famiboards and Install Base than on my Switch. Let's see if Kirby and Triangle Strategy change that.
 
Can Samsung's 5nm yields be improved or is that something that is set in stone?
They can be improved in the sense that TSMC can achieve chips with better specs without yield issues, so it must be possible. However, I'm sure that Samsung is willing to cut you a check with many zeroes if you can tell them how.
 
A small question.

Do you play with your Nintendo Switch, or do you just enjoy talking about it? I mean... are you guys even going to play with your Switch Pro/2/Super/Ultra?

😅

I play on my Switch nearly every day, between 1-3 hours: SW KotOR, XCOM 2, or Arceus.
 
A small question.

Do you play with your Nintendo Switch, or do you just enjoy talking about it? I mean... are you guys even going to play with your Switch Pro/2/Super/Ultra?

😅

I play on it a lot and I'll play even more with the Switch Ultra.

So it's in Nintendo's best interest to release Switch Ultra later this year. :p
 
DRIVE Orin is expected to be upgradable via software to Level 5 autonomous driving (meaning no attendant human required), according to Nvidia. So that's what I mean by precision: enough AI strength to make that claim (which Nvidia hadn't made before Orin), and that kind of machine learning capacity would be derived (at least in VERY large part) from these new and improved tensor cores you mentioned and their deep learning capabilities.

But Nintendo makes video games; the accuracy required for Level 5 autonomous driving just isn't necessary for them, and their machine learning use case can absolutely afford to make a mistake. And if Orin's better tensor cores aren't efficient for Nintendo's needs on a performance-per-watt level and do far more than what they need, while the tensor cores in desktop Ampere chips are enough and better on power use... it's a no-brainer, right?

But the double-wide Orin tensor cores would be the more efficient option in performance per Watt if Nintendo's looking purely for DLSS performance. Let's say that you had a 6 SM part based on desktop Ampere and wanted to double the tensor core performance. If you switched to Orin's double-wide tensor cores, then you're approximately doubling the power consumption from tensor cores*, and that's about it. However, if you kept the standard tensor cores, but doubled up on the number of SMs, you're doubling the power consumption for everything. Tensor core power consumption doubles, because you've got two of them, but you're also adding extra standard CUDA cores, extra RT cores, extra texture units, extra control logic, and all the additional wiring, logic and associated power consumption from moving data and instructions to and between these units.

* I'd actually assume the power consumption of Orin's tensor cores is less than double the power consumption of standard Ampere tensor cores, as while you're doubling the ALU width, there'll be a certain proportion of instruction decode and control logic which won't be doubled.
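
(For illustration, here's a rough Python sketch of that trade-off. The per-SM power split is entirely made up for the example; only the shape of the comparison - tensor power doubling in one case, everything doubling in the other - reflects the argument above.)

```python
# Rough sketch of the scaling argument above. The per-SM power figures are
# invented placeholders, NOT measurements; only the ratios matter here.

BASE_SMS = 6

# Hypothetical power split (watts) for one desktop-Ampere-style SM.
POWER_PER_SM = {
    "tensor_cores": 0.30,   # standard (non-double-wide) tensor cores
    "cuda_cores":   0.60,
    "rt_core":      0.10,
    "tex_and_ctrl": 0.25,   # texture units, control logic, data movement
}

def total_power(sm_count: int, tensor_scale: float = 1.0) -> float:
    """GPU power for sm_count SMs, with tensor core power scaled by tensor_scale."""
    per_sm = (POWER_PER_SM["tensor_cores"] * tensor_scale
              + POWER_PER_SM["cuda_cores"]
              + POWER_PER_SM["rt_core"]
              + POWER_PER_SM["tex_and_ctrl"])
    return sm_count * per_sm

print(f"6 SM, standard tensor cores:    {total_power(BASE_SMS):.1f} W")
print(f"6 SM, double-wide tensor cores: {total_power(BASE_SMS, 2.0):.1f} W")  # only tensor power doubles
print(f"12 SM, standard tensor cores:   {total_power(BASE_SMS * 2):.1f} W")   # everything doubles
```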

I remember we were discussing this back and forth with A100 and the performance of its tensor cores over the GA10x series of cards, and the only difference-maker seemed to be A100's larger cache memory.
So I do wonder if we'll see the same uplift in performance with Lovelace and its massive cache increases over Ampere...

A100's tensor cores do operate at twice the operations per clock of the GA10x series. It's detailed in Nvidia's GA102 white paper (page 25). These are purely theoretical op/clock figures, so the cache wouldn't come into it (although it surely does play a part in real-world performance differences).
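
(To put numbers on "twice the operations per clock", here's a quick sketch using the commonly cited per-tensor-core figures - 128 dense FP16 FMA/clock for GA10x, 256 for A100, four tensor cores per SM - applied to a hypothetical 12 SM part at 1 GHz. Treat the inputs as assumptions, not anything from the leak.)

```python
# Theoretical dense FP16 tensor throughput from per-clock figures alone.
# Assumed values: 4 tensor cores per SM, 128 dense FP16 FMA/clock per GA10x
# tensor core vs 256 for A100. One FMA counts as 2 FLOPs.

def tensor_tflops(sms: int, clock_ghz: float, fma_per_clock: int, cores_per_sm: int = 4) -> float:
    flops_per_clock = sms * cores_per_sm * fma_per_clock * 2
    return flops_per_clock * clock_ghz / 1e3   # FLOP/clock * GHz -> GFLOPS -> TFLOPS

# Hypothetical 12 SM configuration at 1.0 GHz; only the tensor core type differs.
print(f"GA10x-style tensor cores: {tensor_tflops(12, 1.0, 128):.1f} TFLOPS dense FP16")
print(f"A100-style tensor cores:  {tensor_tflops(12, 1.0, 256):.1f} TFLOPS dense FP16")
```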

Again, I feel the idea of turning off SMs is inherently flawed, as there is zero evidence indicating it outside of the clock-stepping mentioned in the NVN2 driver.
Disabling SMs outright does far more than what a clock-stepper would do, and would more or less need a completely different profile in the driver, as I doubt they can hot-swap NVN2 drivers between portable and docked.

And the NVN2 driver has zero mentions of any SM, RT core, or tensor core count below what 12 SMs would bring.

If you expect them to disable SMs in portable mode, that is more or less having to program for a completely different system in regard to CUDA core count, RT core count, tensor core count (and therefore DLSS's power), and bandwidth, because the cache itself would get disabled at the L1 level.

EDIT: Also, Nvidia did mention keeping Ampere around for a bit longer, so Samsung 8nm may not be cleared out.
So at best it would likely be GA107-GA103 staying active with GA102 being deprecated, with the GA GPUs deprecated as their equivalent AD GPUs come out for both desktop and laptops.
Not to mention we do know the main Orin family is on 8nm as well, so that means even less supply to fit Drake into.

My "evidence" is simply that, if it's manufactured on Samsung N8, I personally find it extremely unlikely that they would be able to run all 12 SMs in portable mode and manage an acceptable battery life. You're welcome to disagree with that. If it's on a better manufacturing process, or is just a physically larger device than Switch, and therefore able to fit a much larger battery, then potentially they could run all 12 SMs in portable mode, but on N8 and with a ~5000mAh battery I don't see all 12 SMs being viable in portable mode.

Incidentally, the things you mention aren't really all that different to the current Switch, as developers already have to perform a mode change when docking or undocking, changing resolutions, graphical effects, managing changes to available GPU resources and bandwidth, etc. The only issue I could see is if developers on Switch are able to assign warps to individual SMs. This isn't a thing on PC, for obvious reasons, but might be the case in the console space. This might require developers to have separate sets of SM affinity mappings for docked and handheld, although if it were a clean cut from 12 SMs to 6, then Nintendo could just implement a system level modulo 6 operation on SM affinity in portable mode (eg if a warp was assigned to SM 9, it would go to SM 3 in portable mode), which would keep warps together and evenly distributed without additional developer effort.
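
(A minimal sketch of that modulo-6 remapping idea, purely hypothetical and not from any driver code:)

```python
# Hypothetical system-level SM affinity remap for a 12 SM -> 6 SM portable mode:
# a warp pinned to SM n in docked mode lands on SM n % 6 when portable.

DOCKED_SMS = 12
PORTABLE_SMS = 6

def portable_sm(docked_sm: int) -> int:
    """Remap a docked-mode SM index to its portable-mode SM."""
    return docked_sm % PORTABLE_SMS

# Each pair (n, n + 6) collapses onto the same SM, keeping warps evenly spread
# across the remaining six without any per-game affinity tables.
for sm in range(DOCKED_SMS):
    print(f"docked SM {sm:2d} -> portable SM {portable_sm(sm)}")
```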

I want to clarify something I said a while back about clock gating that led to the speculation around disabling SMs in portable mode. Clock gating seems to be a very broad area of the drivers that can be used for many different components, and it's a feature common to all Nvidia SoCs/GPUs, not unique to T239/GA10F. What is unique to GA10F is that it's the only Ampere GPU to support FLCG, which stands for first-level clock gating, differentiated from second-level clock gating or SLCG. And I think the reason for this is simply because it's a new feature, with wide support only added in Ada. (While there are older references to FLCG, including one for T210, aka TX1, at least the way it's being exposed is new). There was already some hay made about how T239 would borrow Ada features, so maybe this is one of them.

It's not clear to me which is the higher/broader level of control, first or second. First-level would seem to imply the broadest layer, but there are open-source drivers from Nvidia that refer to SLCG as "master clock gating," making it sound like the higher level. In any event, it seems to me that clock gating is used more as a dynamic system to minimize power consumption, rather than a high-level feature switch you would use to turn off SMs for the duration the system was in portable mode. Even if SMs are being disabled -- which is still a possibility, as I really have no idea one way or the other if that could be done or needs to be done for Drake -- I don't think the one mention of FLCG in GA10F I brought up before is a strong reason to start thinking along those lines.

Thanks for the additional context on this. Out of interest, does it explicitly state that FLCG stands for first-level clock gating, or are you implying it? It just strikes me as a bit odd that they would add something called second-level clock gating and only years later add first-level clock gating. Unless, as you say FLCG was something that already existed in some form but was never really utilised or exposed.

Looking into this, there's actually an additional clock gating mode, BLCG (block level clock gating). The info I can find on this seems to suggest it was the first type of clock gating implemented by Nvidia, and seems to operate at a very high level. The next level is then SLCG (second level clock gating), which is lower level. Incidentally, I found this commit for T210 (Tegra X1) which adds support for SLCG on T210 within Linux, and in the case of this particular commit for T210, it looks like SLCG just covers interfaces, codec blocks, and so forth, rather than core GPU logic. The code you link, by the way, isn't actually driver source code, it's a verilog file for Nvidia's DLA hardware. It's likely Nvidia use the same conventions for what counts as SLCG there, though. My guess is that FLCG would then probably operate at a lower level again.

In any case, it's interesting that there is at least one Ada GPU feature which is supported in Drake but didn't make it into Orin. So if Orin's GPU is a half-step between Ampere and Ada, Drake's is maybe a two-thirds step.

What's the minimum number of CUDA cores running at 1 GHz with 25W needed to get... say... Death Stranding running at 4K/DLSS in performance mode? I genuinely don't know.

As I mentioned above, the RTX 2060 barely does this.

Honestly I don't know. But there are two separate questions there. The first is what performance is required to get Death Stranding itself to render the image data that gets fed into DLSS, and the second is what performance is required to get DLSS to bring that up to 4K. Either of these could be the bottleneck. On the first question, that's entirely on a game-by-game basis. The second question is much more predictable, and Digital Foundry has a good video looking into it for a potential Switch Pro/2. However, there's also the possibility that the DLSS used on the new Switch won't be identical to the PC version, and it may have been optimised to perform better at the cost of a bit of image quality. So in that case the answer would change again.
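
(To make the "two separate questions" point concrete, here's a toy frame-budget split. The millisecond figures are placeholders, not measurements of Death Stranding or of DLSS on any real chip; the point is only that either term can blow the budget.)

```python
# Toy split of a frame budget between base-resolution rendering and the DLSS
# upscale pass. Both inputs are made-up placeholders for illustration.

def fits_budget(base_render_ms: float, dlss_ms: float, target_fps: int) -> bool:
    """True if base rendering plus the DLSS pass fit within the frame budget."""
    budget_ms = 1000.0 / target_fps
    return base_render_ms + dlss_ms <= budget_ms

# e.g. a hypothetical game needing 22 ms for its base-resolution frame, plus a
# fixed 8 ms DLSS-to-4K pass:
print(fits_budget(22.0, 8.0, target_fps=30))   # True  (30.0 ms <= 33.3 ms)
print(fits_budget(22.0, 8.0, target_fps=60))   # False (30.0 ms >  16.7 ms)
```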
 
I have no idea how to glean any useful information from this but I just want to thank you for the hard work and let you know it's not being ignored.

Thank you - we are all in this together 🤘

I'm not sure what I was expecting to get out of this, but I find it interesting data nevertheless.

It's fun seeing how business decisions get translated/reflected to actual shipments.

For example, it was reported back in December of 2019 that MinebeaMitsumi Inc (a Nintendo supplier) was shifting production from China to Cambodia, and this is clearly visible in the shipment data:

[Image: minebeacambodiad3knc.png]


Trying to make sense of shipment spikes/patterns seems like a futile effort though.

For example, the below graph shows that there was a spike of shipments from Nintendo of Japan back in the summer of 2019.
Could this mean something "interesting" or is it maybe just E3-related (i.e. marketing material, etc)? We'll never know.

[Image: nintendojapanoskff.png]
 
Thanks for the additional context on this. Out of interest, does it explicitly state that FLCG stands for first-level clock gating, or are you implying it? It just strikes me as a bit odd that they would add something called second-level clock gating and only years later add first-level clock gating. Unless, as you say FLCG was something that already existed in some form but was never really utilised or exposed.

Looking into this, there's actually an additional clock gating mode, BLCG (block level clock gating). The info I can find on this seems to suggest it was the first type of clock gating implemented by Nvidia, and seems to operate at a very high level. The next level is then SLCG (second level clock gating), which is lower level. Incidentally, I found this commit for T210 (Tegra X1) which adds support for SLCG on T210 within Linux, and in the case of this particular commit for T210, it looks like SLCG just covers interfaces, codec blocks, and so forth, rather than core GPU logic. The code you link, by the way, isn't actually driver source code, it's a verilog file for Nvidia's DLA hardware. It's likely Nvidia use the same conventions for what counts as SLCG there, though. My guess is that FLCG would then probably operate at a lower level again.

In any case, it's interesting that there is at least one Ada GPU feature which is supported in Drake but didn't make it into Orin. So if Orin's GPU is a half-step between Ampere and Ada, Drake's is maybe a two-thirds step.
I'm partially inferring the meaning of FLCG, and partially basing it off this open-source Tegra code from Android. The latter explicitly calls it first-level clock gating. The leaked source doesn't expand on it as far as I can tell. It does explicitly name all of the following:
  • ELCG: engine-level clock gating
  • BLCG: block-level clock gating
  • SLCG: second-level clock gating
And then there are references to MSCG, which also isn't spelled out anywhere.

...While I was writing this post, I found this Nvidia patent that describes a hierarchy going from MSCG (master system clock gating) to ELCG (engine level clock gating) to SLCG (sub-engine level clock gating). So the name for SLCG is different, MSCG is clearly the master one even though SLCG is called that in the aforementioned Verilog file, and it doesn't have any references to BLCG or FLCG. The patent seems mostly to be concerned with memory modules, presumably VRAM and cache. Memory was certainly one of the things that came up in the source with respect to clock gating, but with these differing names, this might just add to the confusion.
 
I say it all the time, you guys are the real mvps! Keep the discussion going. Hopefully more info becomes available this year to further the discussion.
 
I'm partially inferring the meaning of FLCG, and partially basing it off this open-source Tegra code from Android. The latter explicitly calls it first-level clock gating. The leaked source doesn't expand on it as far as I can tell. It does explicitly name all of the following:
  • ELCG: engine-level clock gating
  • BLCG: block-level clock gating
  • SLCG: second-level clock gating
And then there are references to MSCG, which also isn't spelled out anywhere.

...While I was writing this post, I found this Nvidia patent that describes a hierarchy going from MSCG (master system clock gating) to ELCG (engine level clock gating) to SLCG (sub-engine level clock gating). So the name for SLCG is different, MSCG is clearly the master one even though SLCG is called that in the aforementioned Verilog file, and it doesn't have any references to BLCG or FLCG. The patent seems mostly to be concerned with memory modules, presumably VRAM and cache. Memory was certainly one of the things that came up in the source with respect to clock gating, but with these differing names, this might just add to the confusion.

Yeah, the naming scheme does seem confusing, alright. AMD seem to have a much simpler naming scheme:
  • Fine Grain Clock Gating
  • Medium Grain Clock Gating
  • Coarse Grain Clock Gating
Basically we can say that Nvidia GPUs support several kinds of clock gating, and some kind of clock gating functionality that's being added in Ada is also present in Drake.

One interesting (although completely unrelated) thing from that patent is figure 10, which lists the lineup of Pascal GPU chips, along with manufacturing process and the configuration of GPCs, TPCs and framebuffer partitions (ie memory channels). This all seems to be accurate, although not too revealing given the Pascal lineup had been on sale for a while when this patent was applied for in early 2017. However, it also lists an entire lineup of what appear to be consumer Volta GPUs:

Chip     Process    GPC    TPC per GPC    FBP
GV202    tsmc10ff   6      6              6
GV204    tsmc10ff   4      6              4
GV206    tsmc10ff   2      6              3
GV207    tsmc10ff   2      4              2
GV208    tsmc10ff   1      4              1

Obviously these never released, but it seems like they had considered releasing a consumer Volta GPU line on TSMC's 10FF process. Instead we got Turing (which is an evolution of the Volta architecture) on 12nm. One thing that's interesting is that the config for GV202, GV204, GV206 and GV207 line up exactly with what we got for TU102, TU104, TU116 and TU117 respectively. So perhaps the initial plans were to release a line of consumer Volta GPUs on 10nm, but then they decided to use 12nm instead, and rebranded to Turing. Completely unrelated to any future Nintendo hardware, but interesting nonetheless.

Edit: Actually, LiC, here's a question: is there any info in the leak on the number of framebuffer partitions for either Orin or Drake? In the Hopper and Ada leaks, this seems to have been revealed under the variable numFbps. I'm assuming it's probably not there for GA10B or GA10F, as they share memory with the CPU, so don't necessarily have the memory controller treated as part of the GPU, but I figure it's worth asking.
 
Hmm, the consumer Volta cancellation sounds like it was due to TSMC's 10FF ending up having a short life.
Looking at the quarterly reports on TSMC's website, 10FF went from being 13% of revenue in 2Q2018 to 6% in 3Q2018, with N7 making its debut at 11%. And as we know, since then 10FF has continued to shrink while the N7 family has grown.
So Nvidia initially planned for consumer Volta, then at some point maybe heard about the short-life plans for 10FF, and then changed course to Turing?
 
Edit: Actually, LiC, here's a question: is there any info in the leak on the number of framebuffer partitions for either Orin or Drake? In the Hopper and Ada leaks, this seems to have been revealed under the variable numFbps. I'm assuming it's probably not there for GA10B or GA10F, as they share memory with the CPU, so don't necessarily have the memory controller treated as part of the GPU, but I figure it's worth asking.
It seems to be 2 for GA10B and 1 for GA10F.
 
Thanks. Seems like a 128-bit bus for 102GB/s of bandwidth in that case, just in comparison to Orin. Or conceivably up to 136GB/s if they move to LPDDR5X.
Yeah, around what I think is expected.
102.4GB/s like Orin NX for the lowest.

Likely an effective bandwidth upwards of 200GB/s because of how memory-efficient Ampere is (Orin allegedly being even more efficient) and the L1 and L2 cache increases; maybe add in the idea of the CPU and GPU sharing 8MB of L3, as supported by the A78/A78C.
 
Thanks. Seems like a 128-bit bus for 102GB/s of bandwidth in that case, just in comparison to Orin. Or conceivably up to 136GB/s if they move to LPDDR5X.
Now, just to have a better grasp of things: how much power would be needed for 8GB of LPDDR5 to achieve this speed?
 
Can Samsung's 5nm yields be improved or is that something that is set in stone?
Generally speaking, yes to the former, especially as more companies use said process node and it becomes more mature as the years go by.
 
Thanks. Seems like a 128-bit bus for 102GB/s of bandwidth in that case, just in comparison to Orin. Or conceivably up to 136GB/s if they move to LPDDR5X.
There's a section on fb configs that lists supported memory types. It doesn't include LPDDR5X and states that LPDDR5 is used on "T23X and later" which covers T234 and T239. I suppose if T239 is unique and uses LPDDR5X, it wouldn't be terribly surprising for that not to be updated here yet, but there's no sign of it currently.

Come to think of it, it doesn't include LPDDR4X either, even though that's (supposedly) what Mariko uses. So maybe it just doesn't differentiate. Then again, T214/GM20B_B are barely present anywhere, presumably since that was largely a manufacturing change and there were very few, if any, changes to software and firmware.
 
You can't know whether Nintendo expects growth in sales this year, because they will only talk about 2022 sales at the full-year results (at the end of April).
I don't think that. The real question is why you think this new Switch hardware would be sold at a loss at a $399 price point.
My bet is a $399 price point, maybe $449 in the worst case, but there is no way it will be $499.


Switch has been selling at a profit from day one. The fact that Switch is still $299 and Switch OLED is $349 doesn't mean they have a small profit on them; I'm pretty sure they could easily cut the price of both by $50 and still sell them at a profit.
Going from $199 to $149 is a big price cut, we're talking about a 25% cut, so I don't see a cut that big within a year, especially because Switch Lite is still selling without a bundled game.
The first thing they will do is make a bundle with one game for $199.


I don't see how what you wrote relates to my points.
Like I wrote, this will not be a simple revision or upgrade; it's a full next-gen Switch (based on current rumors) in every way. Not to mention that at launch you will have some exclusive games (Nate said some 3rd party games), and over time more and more games will be exclusive, until games stop being developed for the current models.


I wrote that the biggest difference compared to PS4/PS5 will be that Nintendo will continue making the OLED model for around 2 years after the new Switch launches.
I don't see why they couldn't stop OLED production around 2 years after this new model launches, or stop releasing games (or at least most of their games) for current Switch models around 2 years after this new model launches.


Well, one thing is sure: if these leaks and rumors are true, there is no way they will market it as a simple revision, because it will obviously be much more than a PS4 Pro or New 3DS.
Actually, Nintendo is always making money on hardware.
Again, you keep ignoring the point that this is not a simple revision (at least based on leaks and rumors). This is not a 3DS XL, 2DS, New 3DS... Switch Lite, Switch OLED... type of revision; it's much more than that, it's full next-gen hardware with some next-gen features.
So no, it's not just another one of those; this hardware will carry the Switch platform forward, it will have some exclusive games right from launch, and at some point games will be released only for this next-gen Switch model and its revisions.
Bearing in mind that some of the current models will keep selling and that supply will be limited, Nintendo can't have something like 20m units ready for the first year, but 10-15m is maybe possible.
The point is that you don't make that type of investment just to make another simple revision; we're talking about full next-gen hardware in every sense.


IMO sales numbers will mostly depend on supply, but 15m per year for the first two years is definitely possible.
Why not? Switch launched in 2017, so around 2 years after this next-gen Switch model would mean around 8 years of support for Tegra X1-based Switch models.
Around 2 years after the next-gen hardware is not "as soon as possible"; it's a very reasonable time frame. I could see 2-3 years, but there's no way they will support it 4-5 years after the new Switch launches.
Hmm... Of course it depends on the actual hardware, but my guess is something like:
60% probability for $499
30% probability for $449
10% probability for $549

Personally I would like a maxed-out design in the $699 range. But that's not gonna happen. These are all such cheap prices compared to the $600-$1200 most people pay for their phones.
 
Yeah, around what I think is expected.
102.4GB/s like Orin NX for the lowest.

Likely an effective bandwidth upwards of 200GB/s because of how memory-efficient Ampere is (Orin allegedly being even more efficient) and the L1 and L2 cache increases; maybe add in the idea of the CPU and GPU sharing 8MB of L3, as supported by the A78/A78C.

I was thinking about this, and I'm actually not sure if Orin's system level cache (which sits above both the CPU and GPU) is really necessary for Drake. Generally you would want a SLC like that if there's a lot of data going back and forth between the CPU and GPU, which would be the case for Orin, but on a games console SoC, the really heavy bandwidth uses are framebuffer objects and to a lesser extent textures, which are only touched by the GPU. So it may be both more effective and simpler to just increase the size of the GPU L2, rather than adding an SLC on top of that.

Can you lay out how you arrived at this conclusion? I have some vague memories of this, but I couldn't follow it directly.

Is this bit I have around still up to date/useful? 'Pascal and later' was... some time ago.


Nvidia's framebuffer partition sizes vary by the memory type. For LPDDR5 it's different than GDDR6 or HBM2E, but we know from Nvidia's specs that Orin has a 256 bit memory interface, and Drake has half the framebuffer partitions, therefore Drake should have a 128 bit memory interface.
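
(As a worked version of that arithmetic: the 256-bit / 2-FBP figures for Orin come from Nvidia's public specs plus the FBP counts mentioned a few posts up; the per-module data rates are the standard LPDDR5/LPDDR5X speed grades, not anything from the leak.)

```python
# Bus width inferred from framebuffer partition counts, then peak bandwidth.

ORIN_BUS_BITS = 256   # Orin's published memory interface width
ORIN_FBPS = 2         # framebuffer partitions reported for GA10B earlier in the thread
DRAKE_FBPS = 1        # framebuffer partitions reported for GA10F

bits_per_fbp = ORIN_BUS_BITS // ORIN_FBPS        # 128 bits per partition
drake_bus_bits = bits_per_fbp * DRAKE_FBPS       # -> 128-bit interface

def peak_bandwidth_gb_s(bus_bits: int, rate_mtps: int) -> float:
    """Peak bandwidth in GB/s for a bus width (bits) and transfer rate (MT/s)."""
    return bus_bits / 8 * rate_mtps / 1000

print(drake_bus_bits)                                  # 128
print(peak_bandwidth_gb_s(drake_bus_bits, 6400))       # 102.4   (LPDDR5-6400)
print(peak_bandwidth_gb_s(drake_bus_bits, 8533))       # ~136.5  (LPDDR5X-8533)
```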

Now, just to have a better grasp of things: how much power would be needed for 8GB of LPDDR5 to achieve this speed?

The capacity of the memory shouldn't impact the performance, just the speed, width and type of the interface. I calculated a little while back that LPDDR5 probably has around 4 picoJoules per bit (pJ/b) of energy consumption, based on claims from manufacturers on LPDDR4X consumption and efficiency improvements for LPDDR5, but it's only a rough estimate, and likely to vary from manufacturer to manufacturer. If it is 4 pJ/b, though, then the power consumption for 102.4GB/s would be a bit over 3.3W. For LPDDR5X, Samsung claims 20% less power consumption than LPDDR5, so we could expect about 3.2 pJ/b, which would result in a bit over 3.5W to hit the maximum 136GB/s. For comparison, I'd say the LPDDR4 in the original Switch consumed between 1.5W to 2W for 25.6GB/s of bandwidth, so it's an increase over the original model either way. My guess is that the memory will be clocked quite a bit lower in portable mode to accommodate for this.
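
(The same estimate in code form; the pJ/bit values are the rough figures from the paragraph above, and the ~8 pJ/b used for the original Switch's LPDDR4 is a back-fit to the 1.5-2W range quoted, not a manufacturer number.)

```python
# DRAM power from an energy-per-bit estimate and a bandwidth target.

def dram_power_w(bandwidth_gb_s: float, energy_pj_per_bit: float) -> float:
    bits_per_second = bandwidth_gb_s * 1e9 * 8
    return bits_per_second * energy_pj_per_bit * 1e-12

print(f"{dram_power_w(102.4, 4.0):.2f} W")   # LPDDR5  @ 102.4 GB/s -> ~3.3 W
print(f"{dram_power_w(136.5, 3.2):.2f} W")   # LPDDR5X @ 136.5 GB/s -> ~3.5 W
print(f"{dram_power_w(25.6, 8.0):.2f} W")    # LPDDR4  @ 25.6 GB/s  -> ~1.6 W (assumed ~8 pJ/b)
```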

There's a section on fb configs that lists supported memory types. It doesn't include LPDDR5X and states that LPDDR5 is used on "T23X and later" which covers T234 and T239. I suppose if T239 is unique and uses LPDDR5X, it wouldn't be terribly surprising for that not to be updated here yet, but there's no sign of it currently.

Come to think of it, it doesn't include LPDDR4X either, even though that's (supposedly) what Mariko uses. So maybe it just doesn't differentiate. Then again, T214/GM20B_B are barely present anywhere, presumably since that was largely a manufacturing change and there were very few, if any, changes to software and firmware.

My default assumption is definitely LPDDR5, the only reason I'm considering LPDDR5X is that Nintendo used 4X on the Mariko Switch models when they could have just as well stuck with LPDDR4, so they presumably value the extra bit of power efficiency. I do wonder if there would even need to be any changes on the software side between LPDDR5 and 5X (or 4 and 4X), though, as it's largely just an increased clock, and things like channel size, bank grouping, etc. stay the same. The hardware memory controller would definitely have to be updated, but I suspect it may be relatively invisible on the software side, aside from the appearance of additional clock speeds.

Incidentally, T214 seems to have been more commonly referred to as T210B01 inside Nvidia, so you may have more luck searching for it with that code.
 
I was thinking about this, and I'm actually not sure if Orin's system level cache (which sits above both the CPU and GPU) is really necessary for Drake. Generally you would want a SLC like that if there's a lot of data going back and forth between the CPU and GPU, which would be the case for Orin, but on a games console SoC, the really heavy bandwidth uses are framebuffer objects and to a lesser extent textures, which are only touched by the GPU. So it may be both more effective and simpler to just increase the size of the GPU L2, rather than adding an SLC on top of that.



Nvidia's framebuffer partition sizes vary by the memory type. For LPDDR5 it's different than GDDR6 or HBM2E, but we know from Nvidia's specs that Orin has a 256 bit memory interface, and Drake has half the framebuffer partitions, therefore Drake should have a 128 bit memory interface.



The capacity of the memory shouldn't impact the performance, just the speed, width and type of the interface. I calculated a little while back that LPDDR5 probably has around 4 picoJoules per bit (pJ/b) of energy consumption, based on claims from manufacturers on LPDDR4X consumption and efficiency improvements for LPDDR5, but it's only a rough estimate, and likely to vary from manufacturer to manufacturer. If it is 4 pJ/b, though, then the power consumption for 102.4GB/s would be a bit over 3.3W. For LPDDR5X, Samsung claims 20% less power consumption than LPDDR5, so we could expect about 3.2 pJ/b, which would result in a bit over 3.5W to hit the maximum 136GB/s. For comparison, I'd say the LPDDR4 in the original Switch consumed between 1.5W to 2W for 25.6GB/s of bandwidth, so it's an increase over the original model either way. My guess is that the memory will be clocked quite a bit lower in portable mode to accommodate for this.



My default assumption is definitely LPDDR5, the only reason I'm considering LPDDR5X is that Nintendo used 4X on the Mariko Switch models when they could have just as well stuck with LPDDR4, so they presumably value the extra bit of power efficiency. I do wonder if there would even need to be any changes on the software side between LPDDR5 and 5X (or 4 and 4X), though, as it's largely just an increased clock, and things like channel size, bank grouping, etc. stay the same. The hardware memory controller would definitely have to be updated, but I suspect it may be relatively invisible on the software side, aside from the appearance of additional clock speeds.

Incidentally, T214 seems to have been more commonly referred to as T210B01 inside Nvidia, so you may have more luck searching for it with that code.

Ha, I was shooting the moon. Thanks!

I'm thinking 2 4Gb blocks?
 
Hmm... Of course it depends on the actual hardware, but my guess is something like:
60% probability for $499
30% probability for $449
10% probability for $549

Personally I would like a maxed-out design in the $699 range. But that's not gonna happen. These are all such cheap prices compared to the $600-$1200 most people pay for their phones.
How likely is a digital-only SKU like the PS5? $350 digital only vs $450 regular (yes I know that would overlap with the SwOLED price) or $400 digital vs. $500 regular. I feel like it being a handheld with smaller filesizes than usual would make the digital SKU easier to stomach, but maybe it just isn't worth it.
 
How likely is a digital-only SKU like the PS5? $350 digital only vs $450 regular (yes I know that would overlap with the SwOLED price) or $400 digital vs. $500 regular. I feel like it being a handheld with smaller filesizes than usual would make the digital SKU easier to stomach, but maybe it just isn't worth it.
I don't think the game card slot costs much so removing it won't really save them much money. On the contrary it would pressure them to include more storage with the digital only model which would be a lot pricier.

I don't see it happening for a hybrid.
 
Ha, I was shooting the moon. Thanks!

I'm thinking 2 4Gb blocks?
4GB* not 4Gb ;)


I think 8-12GB personally. Though I think 4GB 64-bit modules are only being sampled this year, so if it releases this year it perhaps wouldn't have those.

So maybe 12GB? As in two 6GB 64-bit modules.

The capacity of the memory shouldn't impact the performance, just the speed, width and type of the interface. I calculated a little while back that LPDDR5 probably has around 4 picoJoules per bit (pJ/b) of energy consumption, based on claims from manufacturers on LPDDR4X consumption and efficiency improvements for LPDDR5, but it's only a rough estimate, and likely to vary from manufacturer to manufacturer. If it is 4 pJ/b, though, then the power consumption for 102.4GB/s would be a bit over 3.3W. For LPDDR5X, Samsung claims 20% less power consumption than LPDDR5, so we could expect about 3.2 pJ/b, which would result in a bit over 3.5W to hit the maximum 136GB/s. For comparison, I'd say the LPDDR4 in the original Switch consumed between 1.5W to 2W for 25.6GB/s of bandwidth, so it's an increase over the original model either way. My guess is that the memory will be clocked quite a bit lower in portable mode to accommodate for this.
I thought we all agreed on 88GB/s for portable mode? At least that's the common expectation I kept reading here and in the last thread.

Though even if they went with LPDDR5X, I think keeping 88GB/s in portable and 136GB/s docked could work better in their favor; the scaling to docked mode is much better that way.
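
(For what it's worth, here's what those bandwidth targets imply in transfer rates on the 128-bit bus discussed above; the 88GB/s portable figure is thread speculation, not a confirmed spec.)

```python
# What the often-mentioned 88GB/s portable target implies on a 128-bit bus.

BUS_BITS = 128
BYTES_PER_TRANSFER = BUS_BITS // 8   # 16 bytes moved per transfer

def required_rate_mtps(bandwidth_gb_s: float) -> float:
    """Transfer rate (MT/s) needed to hit a bandwidth target on this bus."""
    return bandwidth_gb_s * 1000 / BYTES_PER_TRANSFER

print(required_rate_mtps(88.0))    # 5500 MT/s for 88 GB/s portable
print(required_rate_mtps(102.4))   # 6400 MT/s for full LPDDR5 docked
print(required_rate_mtps(136.5))   # ~8530 MT/s, i.e. LPDDR5X territory
```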
 
I don't think the game card slot costs much so removing it won't really save them much money. On the contrary it would pressure them to include more storage with the digital only model which would be a lot pricier.

I don't see it happening for a hybrid.

I'm under the assumption that the disc drive in the PlayStation 5 also doesn't cost Sony anywhere near the $100 difference, and that the motivation is to make up the loss through digital sales.

You're right about the storage; I'm forgetting both PS5 models have nearly a TB. My personal assumption is 128 GB for the Switch+, which is definitely not enough lol
 
I don't think the game card slot costs much so removing it won't really save them much money. On the contrary it would pressure them to include more storage with the digital only model which would be a lot pricier.

I don't see it happening for a hybrid.
They wouldn't save much on the lack of a card slot, but they'd have much higher profit margins on software purchased by owners of a digital device since the cards are expensive.
 
4GB* not 4Gb ;)


I think 8-12GB personally. Though I think 4GB 64-bit modules are only being sampled this year, so if it releases this year it perhaps wouldn't have those.

So maybe 12GB? As in two 6GB 64-bit modules.


I thought we all agreed on 88GB/s for portable mode? At least that's the common expectation I kept reading here and in the last thread.

Though even if they went with LPDDR5X, I think keeping 88GB/s in portable and 136GB/s docked could work better in their favor; the scaling to docked mode is much better that way.
Doh, I done did the Gigabits.
 
Hmm... Of course it depends on the actual hardware, but my guess is something like:
60% probability for $499
30% probability for $449
10% probability for $549

Personally I would like a maxed-out design in the $699 range. But that's not gonna happen. These are all such cheap prices compared to the $600-$1200 most people pay for their phones.
$399
100%
Final Answer Regis
 
I'm under the assumption that the disc drive in the PlayStation 5 also doesn't cost Sony anywhere near the $100 difference, and that the motivation is to make up the loss through digital sales.

You're right about the storage; I'm forgetting both PS5 models have nearly a TB. My personal assumption is 128 GB for the Switch+, which is definitely not enough lol
It might be less the physical drive itself and more all the assorted royalties associated with Blu-ray propping the price up? Standalone UHD Blu-ray players are all above $100 as far as I'm aware.
 
The capacity of the memory shouldn't impact the performance, just the speed, width and type of the interface. I calculated a little while back that LPDDR5 probably has around 4 picoJoules per bit (pJ/b) of energy consumption, based on claims from manufacturers on LPDDR4X consumption and efficiency improvements for LPDDR5, but it's only a rough estimate, and likely to vary from manufacturer to manufacturer. If it is 4 pJ/b, though, then the power consumption for 102.4GB/s would be a bit over 3.3W. For LPDDR5X, Samsung claims 20% less power consumption than LPDDR5, so we could expect about 3.2 pJ/b, which would result in a bit over 3.5W to hit the maximum 136GB/s. For comparison, I'd say the LPDDR4 in the original Switch consumed between 1.5W to 2W for 25.6GB/s of bandwidth, so it's an increase over the original model either way. My guess is that the memory will be clocked quite a bit lower in portable mode to accommodate for this.
:-(
 
Hmm... Of course it depends on the actual hardware, but my guess is something like:
60% probability for $499
30% probability for $449
10% probability for $549

Personally I would like a maxed-out design in the $699 range. But that's not gonna happen. These are all such cheap prices compared to the $600-$1200 most people pay for their phones.

So you're saying there is no chance of a price point below $499?

Console prices should be compared to console prices, not to phone prices, and Nintendo usually looks for a more affordable price point than Sony/MS.
 
Not too big of a concern though; bandwidth needs will be lowered accordingly with the less strenuous demands of handheld mode.
Just like the Switch as it is now; according to this, the memory controller clocks down by ~1/6th (from 1600 MHz/3200 MT/s, or 25.6 GB/s of bandwidth, to 1331 MHz/2662 MT/s, or ~21.3 GB/s) when portable.
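
(Those current-Switch numbers check out against its 64-bit LPDDR4 bus, for anyone who wants to verify:)

```python
# Checking the current Switch figures quoted above: a 64-bit LPDDR4 bus,
# 3200 MT/s docked vs 2662 MT/s portable.

BUS_BYTES = 64 // 8   # 8 bytes per transfer on the Switch's 64-bit bus

def bandwidth_gb_s(rate_mtps: int) -> float:
    return BUS_BYTES * rate_mtps / 1000

docked = bandwidth_gb_s(3200)     # 25.6 GB/s
portable = bandwidth_gb_s(2662)   # ~21.3 GB/s
print(f"{docked} GB/s docked, {portable:.1f} GB/s portable, "
      f"{1 - portable / docked:.0%} reduction")   # ~17%, i.e. roughly 1/6
```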
 
Switch memory already alternates clocks, but I believe there could be more power savings if they went with LPDDR5X instead of LPDDR5 while keeping the same target of around 88GB/s in portable mode.

LPDDR5X at 88GB/s > LPDDR5 at 88GB/s in terms of energy savings.

And docked, LPDDR5X at 136GB/s > LPDDR5 at 102GB/s in terms of speed.
 
If Drake does go with 102GB/s of bandwidth, I would presume undocked mode would be around 75-80GB/s. That is more than enough for Xbone/PS4 ports, and together with better compression thanks to the modern uArch, it will be used more effectively. It should be better than what the Xbone has.
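
(As a rough yardstick for that comparison: Xbox One's main memory pool is commonly cited as DDR3-2133 on a 256-bit bus, with the 32MB ESRAM scratchpad on top, so a 75-80GB/s guess would clear the DDR3 pool on raw numbers alone.)

```python
# Quick comparison against the commonly cited Xbox One main-memory figure
# (DDR3-2133 on a 256-bit bus; the 32MB ESRAM scratchpad is ignored here).

def bandwidth_gb_s(bus_bits: int, rate_mtps: int) -> float:
    return bus_bits / 8 * rate_mtps / 1000

xbone_ddr3 = bandwidth_gb_s(256, 2133)    # ~68.3 GB/s
drake_portable_guess = 78.0               # midpoint of the 75-80 GB/s guess above
print(f"{xbone_ddr3:.1f} GB/s vs ~{drake_portable_guess:.0f} GB/s")
```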
 
But the double-wide Orin tensor cores would be the more efficient option in performance per Watt if Nintendo's looking purely for DLSS performance. Let's say that you had a 6 SM part based on desktop Ampere and wanted to double the tensor core performance. If you switched to Orin's double-wide tensor cores, then you're approximately doubling the power consumption from tensor cores*, and that's about it. However, if you kept the standard tensor cores, but doubled up on the number of SMs, you're doubling the power consumption for everything. Tensor core power consumption doubles, because you've got two of them, but you're also adding extra standard CUDA cores, extra RT cores, extra texture units, extra control logic, and all the additional wiring, logic and associated power consumption from moving data and instructions to and between these units.

* I'd actually assume the power consumption of Orin's tensor cores is less than double the power consumption of standard Ampere tensor cores, as while you're doubling the ALU width, there'll be a certain proportion of instruction decode and control logic which won't be doubled.



A100's tensor cores do operate at twice the operations per clock of the GA10x series. It's detailed in Nvidia's GA102 white paper (page 25). These are purely theoretical op/clock figures, so the cache wouldn't come into it (although it surely does play a part in real-world performance differences).



My "evidence" is simply that, if it's manufactured on Samsung N8, I personally find it extremely unlikely that they would be able to run all 12 SMs in portable mode and manage an acceptable battery life. You're welcome to disagree with that. If it's on a better manufacturing process, or is just a physically larger device than Switch, and therefore able to fit a much larger battery, then potentially they could run all 12 SMs in portable mode, but on N8 and with a ~5000mAh battery I don't see all 12 SMs being viable in portable mode.

Incidentally, the things you mention aren't really all that different to the current Switch, as developers already have to perform a mode change when docking or undocking, changing resolutions, graphical effects, managing changes to available GPU resources and bandwidth, etc. The only issue I could see is if developers on Switch are able to assign warps to individual SMs. This isn't a thing on PC, for obvious reasons, but might be the case in the console space. This might require developers to have separate sets of SM affinity mappings for docked and handheld, although if it were a clean cut from 12 SMs to 6, then Nintendo could just implement a system level modulo 6 operation on SM affinity in portable mode (eg if a warp was assigned to SM 9, it would go to SM 3 in portable mode), which would keep warps together and evenly distributed without additional developer effort.



Thanks for the additional context on this. Out of interest, does it explicitly state that FLCG stands for first-level clock gating, or are you implying it? It just strikes me as a bit odd that they would add something called second-level clock gating and only years later add first-level clock gating. Unless, as you say FLCG was something that already existed in some form but was never really utilised or exposed.

Looking into this, there's actually an additional clock gating mode, BLCG (block level clock gating). The info I can find on this seems to suggest it was the first type of clock gating implemented by Nvidia, and seems to operate at a very high level. The next level is then SLCG (second level clock gating), which is lower level. Incidentally, I found this commit for T210 (Tegra X1) which adds support for SLCG on T210 within Linux, and in the case of this particular commit for T210, it looks like SLCG just covers interfaces, codec blocks, and so forth, rather than core GPU logic. The code you link, by the way, isn't actually driver source code, it's a verilog file for Nvidia's DLA hardware. It's likely Nvidia use the same conventions for what counts as SLCG there, though. My guess is that FLCG would then probably operate at a lower level again.

In any case, it's interesting that there is at least one Ada GPU feature which is supported in Drake but didn't make it into Orin. So if Orin's GPU is a half-step between Ampere and Ada, Drake's is maybe a two-thirds step.



Honestly I don't know. But there are two separate questions there. The first is what performance is required to get Death Stranding itself to render the image data that gets fed into DLSS, and the second is what performance is required to get DLSS to bring that up to 4K. Either of these could be the bottleneck. On the first question, that's entirely on a game-by-game basis. The second question is much more predictable, and Digital Foundry has a good video looking into it for a potential Switch Pro/2. However, there's also the possibility that the DLSS used on the new Switch won't be identical to the PC version, and it may have been optimised to perform better at the cost of a bit of image quality. So in that case the answer would change again.
Some DLSS results I have seen with the current (PC) DLSS version were decent even going from 720p/30fps rendering to 4K. So if they fine-tune DLSS (3.0?) and target 60 fps (I really hope so), 720p rendering will be fine for running heavy games.

Of course, I would prefer 1080p to 1440p rendering in 1st party games, because even DLSS can't do miracles.
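
(For a sense of how much reconstruction work each of those input resolutions asks of DLSS, the raw pixel-count ratios to 4K:)

```python
# Pixel-count scale factors for the DLSS input resolutions mentioned above.

RESOLUTIONS = {
    "720p":  (1280, 720),
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K":    (3840, 2160),
}

target_px = RESOLUTIONS["4K"][0] * RESOLUTIONS["4K"][1]

for name in ("720p", "1080p", "1440p"):
    w, h = RESOLUTIONS[name]
    print(f"{name} -> 4K: {target_px / (w * h):.2f}x the pixels")
```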
 