
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

Is my understanding correct that DLAA is simply DLSS from 720p -> 720p (i.e. it uses the exact same autoencoder design, but with an input layer size equal to the output layer size)? If so, then what it does is draw on the higher-resolution reconstruction information in its training dataset and use that to predict how pixels should be coloured, preventing aliasing while retaining a high-res-like level of detail.

DLSS does the same, but it also raises the output resolution. Now imagine your screen is 1080p and you render natively at 720p. DLAA and DLSS apply the same technique, but DLAA outputs a 720p image with the aforementioned reconstructed detail, while DLSS outputs a 1080p image with that detail. Now it comes time to display the image on the screen. The DLSS picture is a perfect match, so no upscaling happens. For the DLAA image, on the other hand, you are at the mercy of a simple upscaler to fit the image to the 1080p screen. My handwaving intuition would be that the DLAA picture ends up with lower quality.

Of course, it could be that DLSS is slightly more expensive than DLAA (the network layers are necessarily larger for DLSS, after all), and on top of that, post-processing a 1080p image costs a bit more than post-processing at 720p. So DLAA could be somewhat cheaper, though my personal intuition is that DLSS is the bigger bang for your buck. As such, DLAA is probably only useful for games that have a rendering budget surplus and can afford to splurge on great anti-aliasing by applying the DLAA algorithm after rendering a native (1080p) image or - god forbid - after rendering a native 4K image (lol).

BTW: this discussion works equally well if you substitute a 720p screen for the 1080p screen and 480p/540p native rendering for the 720p native rendering before DLSS. So please don't reply to tell me a 720p screen is more likely :)

Edit: Hypothetically, one way in which DLSS may supersede DLAA, even if we have already reached the native screen resolution, is by rendering a super-resolution image from which higher-quality motion vectors can be derived for the next step of the algorithm. I don't know if this confers any tangible advantage, and DLSS up to a super-native resolution is more expensive than DLSS to native (plus the post-processing gets more expensive too), but you could likely get good results. In fact, now that I think about it (I'm thinking it through while writing this edit, lol), this is how DLSS was first conceived (as a super-sampling AA solution), so there's good reason to think it conveys noticeable improvements. Whether it's worth the cost remains to be seen, of course, and devs will be the ones who need to test it for their specific game with its specific render budget.
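
To make the comparison concrete, here's a rough Python sketch of the two paths I'm describing, under the assumption (which the DLSS cost data later in the thread roughly supports) that the tensor pass cost scales with output pixel count. The per-megapixel cost constant is purely hypothetical; only the relative comparison matters.

Code:
# DLAA path: reconstruct at native 720p, then hand off to a simple scaler.
# DLSS path: reconstruct directly to the 1080p screen resolution.
# COST_PER_MPX_MS is a made-up number, used only for the relative comparison.

NATIVE = (1280, 720)      # internal render resolution
SCREEN = (1920, 1080)     # display resolution
COST_PER_MPX_MS = 0.25    # hypothetical tensor-pass cost, ms per megapixel

def mpx(res):
    w, h = res
    return w * h / 1e6

dlaa_pass_ms = mpx(NATIVE) * COST_PER_MPX_MS   # output stays 720p
dlss_pass_ms = mpx(SCREEN) * COST_PER_MPX_MS   # output is already 1080p

print(f"DLAA: {dlaa_pass_ms:.2f} ms, then a separate (dumb) upscale to 1080p")
print(f"DLSS: {dlss_pass_ms:.2f} ms, output already matches the screen")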
 
Not a bad thought. My gut says that TFLOPS is still the better comparison - the idea being that, as the power of the GPU goes up, so does the amount of data it needs to touch over the course of a frame. On the other hand, the SMs are the thing you're actually feeding, so there is more of a 1:1 relationship there.
True, and doing the estimation with the Switch points to it having a decent amount of bandwidth per SM, when we know the Switch is bandwidth starved, so it can't be right, or at least not a perfect reflection.

Like, the Switch has 25.6GB/s docked, right? And 21.3GB/s portable. If the CPU uses about 6-8GB/s of that bandwidth for its needs (this is just an assumption; it could be lower than this and gradually increase per gen), then that leaves about 17.6-19.6GB/s docked and 13.3-15.3GB/s portable for the GPU.

Split between two SMs, you get 8.8-9.8GB/s docked, and 6.65-7.65GB/s portable, right? All seems good?

Well… here’s where it shows how… off this method is.

PS4 has 18 CUs/9 WGPs and 176GB/s, which comes out to 9.77GB/s per CU (or 19.5GB/s per WGP if you will).

But we know the Switch is very bandwidth starved.

And then there's the One/One S, which has 12 CUs/6 WGPs and 68.2GB/s, plus on-die RAM (the ESRAM) that, with both directions combined, gives 204/218GB/s (so 102/109GB/s in either direction).

And that gives 5.6GB/s + 17GB/s per CU (or 11.36GB/s + 34GB/s per WGP) for the One, and 5.6GB/s + 18.25GB/s per CU (or 11.36GB/s + 36.5GB/s per WGP) for the One S.


PS4 Pro would be ~6GB/s per CU, or 12GB/s per WGP.

One X would be 8.16GB/s per CU or 16.32GB/s per WGP.

So this is why I don't think this is an apt metric to use, and I haven't even accounted for the CPU in the other console comparisons.

FLOPS per GB/s seems better.
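
For anyone who wants to check the arithmetic behind the per-CU numbers above, here's a quick Python sketch. The ESRAM figures for the One/One S are the combined read+write numbers used in the post; the PS4 Pro and One X CU counts (36 and 40) and the One X's 326.4GB/s are the commonly cited specs rather than figures stated here.

Code:
consoles = {
    # name: (CUs, main RAM bandwidth GB/s, on-die ESRAM GB/s combined, or 0)
    "PS4":        (18, 176.0,   0.0),
    "Xbox One":   (12,  68.2, 204.0),
    "Xbox One S": (12,  68.2, 218.0),
    "PS4 Pro":    (36, 218.0,   0.0),
    "Xbox One X": (40, 326.4,   0.0),
}

for name, (cus, main_bw, esram_bw) in consoles.items():
    per_cu = main_bw / cus          # GB/s per CU from main RAM
    line = f"{name:10s}: {per_cu:5.2f} GB/s per CU ({per_cu * 2:5.2f} per WGP)"
    if esram_bw:
        line += f" + {esram_bw / cus:.1f} GB/s per CU of ESRAM"
    print(line)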
 
The logic for deciding which compute unit/stream multi-processor the RAM memory access should go to should be doable with a handful of logic gates - a couple of nanoseconds. Pushing all the required data to that SM requires synchronised pushes of coalesced data: several clock cycles per 128 bits of coalesced data. Multiply that by thousands when a new object (e.g. a geometry object or a texture) is loaded into cache, and you can see that the number of SMs is mostly irrelevant to bandwidth utilisation. On the other hand, the ratio of FLOPS to bandwidth will determine how many operations are wasted if the GPU has to wait for RAM to finish loading in a new object: higher FLOPS means more wasted cycles.

So I would say that only FLOPS is a relevant comparison to bandwidth. That said, caches are extremely important as well: the fastest RAM access is the one you avoid entirely by already having the data in cache.
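
As a toy illustration of that FLOPS-to-bandwidth point in Python: the work thrown away during a memory stall scales with the GPU's arithmetic throughput, not with how many SMs the data is split across. All the numbers here are hypothetical.

Code:
def wasted_flops(stall_us: float, tflops: float) -> float:
    """Operations the GPU could have executed during a stall of stall_us microseconds."""
    return stall_us * 1e-6 * tflops * 1e12

# Hypothetical GPU throughputs; a 10 microsecond stall wastes proportionally
# more potential work on the faster parts.
for tflops in (0.4, 1.0, 3.0):
    print(f"{tflops:.1f} TFLOPS GPU wastes {wasted_flops(10, tflops):,.0f} FLOPs in a 10 us stall")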
 
Much cheaper than I thought; I routinely buy equipment well above that. And being lower than 5000 means I wouldn't have to justify the purchase.
I don't have a practical use for it in my job though, at least not yet.
I am sure the internet fame you would get from doing this would help your job somehow.
 
What is your take on it? A typo, or is the software just not included in the retail version?
My assumption is that this was an understandable mistake by a technical writer, whose job is to produce readable documentation for consumers from internal docs, often for products that they themselves don't understand deeply. For whatever reason, Nvidia is building their Linux stack alongside their Nintendo stack; the writer saw the files/docs and included the reference.

I don't believe the emulation strategy mentioned would work for T239, anyway. All the Orin modules are the same silicon, just with different hardware blocks enabled/disabled. The devkit can emulate them by loading different firmware that switches the appropriate blocks on or off. But T239 has its own custom blocks (the FDE), a different CPU, and a different GPU layout.
 
I'm not sure you can definitively say that for all people. It would be one thing if you were losing half the detail, but you're not.


I think that DLSS Estimator is based on the best data we have, but I think it's also pretty clear that the amount of data isn't very good. I wouldn't make any confident statements based on it.

I'd argue the data we have is pretty good, relatively speaking. We don't have direct measurements running on T239 via NVN2, which we're unlikely to ever have, but the DLSS Programming Guide gives us 34 data points across Turing, Ampere and Ada GPUs at a variety of resolutions, including 1080p and 4K.

If we want to know whether DLSS cost is directly proportional to resolution (which is what I'd expect intuitively), then one thing we can do is take the programming guide data and divide each figure by its resolution, which I'm measuring in megapixels (Mpx) for convenience. If the resulting ms/Mpx is consistent across multiple resolutions for each GPU, then the data supports the hypothesis.

Here are the numbers I've got:

Code:
ms/Mpx               1080p    1440p    2160p    4320p
RTX 2060 S           0.294    0.274    0.263    0.304
RTX 2080 Ti          0.178    0.157    0.152    0.166
RTX 2080 (laptop)    0.270    0.247    0.239    0.274
RTX 3060 TI          0.217    0.198    0.183    0.211
RTX 3070             0.198    0.179    0.164    0.174
RTX 3080             0.154    0.127    0.113    0.128
RTX 3090             0.135    0.114    0.095    0.104
RTX 4080             0.096    0.100    0.088    0.090
RTX 4090                               0.061    0.059

While not exactly consistent, it's pretty close. The biggest deviations seem to be on the largest GPUs. Another way of looking at it would be to simply take the ratio of the 4K time to the 1080p time, where values close to 4 would indicate direct proportionality between resolution and performance. Here are the ratios for the Ampere cards:

Code:
RTX 3060 TI    3.38
RTX 3070       3.32
RTX 3080       2.94
RTX 3090       2.82
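
For anyone who wants to check the arithmetic, these ratios follow directly from the ms/Mpx table by converting back to absolute frame times (1080p is ~2.07 Mpx, 4K is ~8.29 Mpx); small rounding differences aside, a quick Python sketch reproduces them:

Code:
MPX_1080P = 1920 * 1080 / 1e6   # ~2.07 megapixels
MPX_2160P = 3840 * 2160 / 1e6   # ~8.29 megapixels

ms_per_mpx = {
    # GPU: (ms/Mpx at 1080p, ms/Mpx at 2160p), from the table above
    "RTX 3060 Ti": (0.217, 0.183),
    "RTX 3070":    (0.198, 0.164),
    "RTX 3080":    (0.154, 0.113),
    "RTX 3090":    (0.135, 0.095),
}

for gpu, (at_1080p, at_2160p) in ms_per_mpx.items():
    ratio = (at_2160p * MPX_2160P) / (at_1080p * MPX_1080P)
    print(f"{gpu}: {ratio:.2f}x")   # 4.00x would be perfectly proportional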

Again, not exactly proportional, but reasonably proportional for the smaller cards, getting less proportional as the GPUs get bigger. If the concern is that T239's GPU, being smaller than those tested by Nvidia, would have some extra overhead that we're not taking into account, the evidence suggests the opposite; that the smaller the GPU is, the closer it is to providing DLSS performance that's proportional to resolution.

Off the top of my head I can think of two possible explanations for this behaviour. The first is that there are some fixed costs involved in DLSS that don't scale with GPU performance and/or don't scale linearly with resolution. These would cause a relatively bigger impact on more powerful GPUs, as a fixed cost of (say) 0.05ms is going to matter more if the total time is 0.2ms than if it's 0.5ms. The second is that DLSS from 540p to 1080p is simply too small a task to fully utilise the computational resources of a large GPU like an RTX 3090.

Even if T239 did end up having performance scaling in line with the worst of the cards tested by Nvidia, it would still require a large performance gap of 3x or more between docked and portable modes for 1080p DLSS to be more expensive in portable mode than 4K is in docked. I'd find this pretty unlikely, as it would require Nintendo to be both very aggressive with docked clocks while also being very conservative with portable clocks.
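
As a rough sketch of that break-even point: if DLSS cost is roughly proportional to output resolution and throughput scales with clock, then portable 1080p DLSS only takes longer than docked 4K DLSS when the docked-to-portable performance ratio exceeds the 4K-vs-1080p scaling factor. The clock ratios below are hypothetical; the 2.82x scaling is the worst case from the table above.

Code:
SCALING_4K_VS_1080P = 2.82   # worst observed 4K/1080p cost ratio (RTX 3090)

# Hypothetical docked/portable GPU performance ratios
for perf_ratio in (1.5, 2.0, 2.82, 3.5):
    # time(portable 1080p) / time(docked 4K), with cost proportional to resolution
    rel = perf_ratio / SCALING_4K_VS_1080P
    verdict = "more expensive" if rel > 1 else "cheaper or equal"
    print(f"docked/portable ratio {perf_ratio:.2f}: portable 1080p DLSS is "
          f"{rel:.2f}x the docked 4K time ({verdict})")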
 
To complement @Thraktor's post, here's an estimate of how the 1080p/4K ratio evolves with performance (the yellow curve):



[Chart: estimated 1080p/4K DLSS cost ratio vs. relative GPU performance, where a performance of 1 = RTX 3080 performance]
These are estimated curves, but they show that the ratio does seem to trend toward 4x at lower performance levels, which would make total sense.
 
In theory, the next switch with DLSS could be the first console for Nintendo with a graphics/tech feature that the other consoles don’t have. Devs are loving DLSS and what it provides them. The next switch could really leverage this and excite a lot of devs. A lot of PC games could make their way to switch. Software is the key. Very excited for this console and I hope Nintendo doesn’t mess up.
 
In theory, the next switch with DLSS could be the first console for Nintendo with a graphics/tech feature that the other consoles don’t have. Devs are loving DLSS and what it provides them. The next switch could really leverage this and excite a lot of devs. A lot of PC games could make their way to switch. Software is the key. Very excited for this console and I hope Nintendo doesn’t mess up.
I like your theory, as I've been thinking about that recently as well. It would give Nintendo quite a bit of clout. The question is how well it will be implemented. Is it reasonable to fear that Nintendo could somehow gimp DLSS on this new hardware for some reason? Or is that just not possible?
 
In theory, the next switch with DLSS could be the first console for Nintendo with a graphics/tech feature that the other consoles don’t have. Devs are loving DLSS and what it provides them. The next switch could really leverage this and excite a lot of devs. A lot of PC games could make their way to switch. Software is the key. Very excited for this console and I hope Nintendo doesn’t mess up.
First recent console
 


So apparently we just need to put together a fund to buy an ORIN kit and the software for developing for T239 is there?

Not clicking, but I assume this is at least partially based on the mention of T239 in this page of documentation, which Doctre81 also alluded to in a recent tweet? As @oldpuck mentioned, no partition files for T239 are actually in the SDK, and even if they were, it would be extremely uninteresting data, basically just a metadata definition for system storage regions, and probably almost identical to the ones that do exist for T234.
 
That Orin stuff is like a year old now.



And attempting to moderate whatever youtube/twitter content generators do is a lost cause.
Can always post fake info behind Hidden tags and see who takes the bait. The only issue is that it requires being dishonest to the members here, which is unfair.

The T239 documentation mention is nothing. You cannot buy the Jetson and use it as a devkit.
 
Can always post fake info behind Hidden tags and see who takes the bait. The only issue is that it requires being dishonest to the members here, which is unfair.
omg you're right we could totally StarFoxGP some youtubers 🤣
 
GDC is next week, so hopefully we start to hear rumblings from developers. Between Jeff Grubb and @NateDrake, we should get some idea of what is going on in the Nintendo camp. Even if it ends up being 2024 hardware, it's getting to the point where at least some developers should have access to the hardware.
 
In theory, the next switch with DLSS could be the first console for Nintendo with a graphics/tech feature that the other consoles don’t have. Devs are loving DLSS and what it provides them. The next switch could really leverage this and excite a lot of devs. A lot of PC games could make their way to switch. Software is the key. Very excited for this console and I hope Nintendo doesn’t mess up.
as great as DLSS is, the performance improvements mostly come from lowering resolution. the magic is IQ. I don't really consider that something other systems can't do.

hell, I wonder if the premium systems offer enough room for FSR3
 
at best, as in most hopeful, DLSS could help Nintendo close the gap to great effect

"wow, it really is a home console I can take with me on the go!" people will say

thinking about it as a secret weapon outside of that doesn't seem appropriate to me. as feet puts it, the competition has resolution lol
 
GDC is next week, so hopefully we start to hear rumblings from developers. Between Jeff Grubb and @NateDrake, we should get some idea of what is going on in the Nintendo camp. Even if it ends up being 2024 hardware, it's getting to the point where at least some developers should have access to the hardware.
I’m so afraid that the rumblings will be that Drake/T239 is cancelled. I’m a pessimist/cynic by nature, so this thought is nothing new, but I get more uneasy with every day that passes with no new info. Then again, maybe it could be replaced with something more powerful? Something derived from Thor?

Oh well. I hope to be proven wrong. “Expect the worst, hope for the best.”
 
I’m so afraid that the rumblings will be that Drake/T239 is cancelled. I’m a pessimist/cynic by nature, so this thought is nothing new, but I get more uneasy with every day that passes with no new info. Then again, maybe it could be replaced with something more powerful? Something derived from Thor?

Oh well. I hope to be proven wrong. “Expect the worst, hope for the best.”
It's natural for us who are looking at this everyday to get anxious and even pessimistic because the lack of info feels like it points in that direction, but honestly expecting new info regularly for an unannounced console is probably unreasonable. Even after Switch was shown in the teaser trailer back in 2016, many questions remained until the January reveal, and some things weren't fully known until launch.

I always come back to the thought: why cancel hardware that far along in development? I have yet to hear a good theory from anyone on why Nintendo would spend all that money, be in the final stages of development, and at the last moment decide to scrap it and go back to the drawing board, essentially starting over. Even if Nintendo's plan was for a Pro model, why not release it rather than just scrap it? I'm in the boat that Nintendo at some point pivoted away from a Pro model and will instead push it as a successor. Perhaps delays played a factor, but I find it hard to believe Drake made it all the way to tape-out only to be scrapped entirely.

One more bit of speculation: let's say Nintendo sent out preliminary 4K development kits in 2021. What if the number of developers was indeed eleven? At that point Nintendo basically knew which developers were in possession when the NDA was broken, so Nintendo pulls the kits with no further explanation.
 
I’m sorry to be the “doom and gloom” post of the week. I’ll move on. Nvidia’s GTC is next week as well. Might we hear something there, at least in terms of a hint? If nothing else, I’d like to hear something more on Thor. That new chip sounds like a beast.
 