
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

There's so much to unpack here, is there a good post that covers the gist?
Not yet; Dakhil will probably make one, or add it as an update to the main post.

But here's a rundown:
  • Jetson Orin has been revealed, and we now know a lot more about what Dane's floor effectively is; thanks to the Orin NX SKU, we have pretty much homed in on what it will be hardware-wise.
    • 8 CPU cores, 8 SMs, 8-12 GB of LPDDR5 RAM.
  • Orin does indeed have RT cores, but without customizations they are half the per-SM count of desktop Ampere (1 RT core per 2 SMs vs. 1 per SM).
    • Even then, 8 SMs means 4 RT cores, which is enough to accelerate RTXGI and Lumen for RTGI, and likely RT shadows too, since those are mostly rasterized; it just may not be enough for RT reflections.
      • Although if Nintendo's customizations to Orin NX to make T239/Dane include adding back those extra 4 RT cores (which shouldn't be any harder than adding or removing RT cores in the first place), that would put its RT acceleration around the laptop RTX 2060, which could run Control at RT High with 540p-1080p DLSS, reflections included.
  • Orin does have UFS support, but only one lane.
    • So UFS may be off the cards unless part of the customizations is adding that second lane back in somehow.
  • NVIDIA increased the speed of the deep-learning SDK by 2-3x (Jensen said 3x; lowballing it to a minimum 2x increase).
    • This pretty much means that even if Orin S is only 20 TOPS when docked, DLSS Ultra Performance 4K at 60 fps is back on the table: at 20 TOPS, upscaling to 4K takes 10 ms in DLSS 2.2. With this SDK change, it would take only 3.33-5 ms, which, out of a ~16.7 ms frame, is likely enough.
      • And there is room to make that smaller, since those numbers are for DLSS 2.2, and DLSS 2.3 sped it up even more.
      • For reference, DLSS Ultra Performance upscales 720p to 4K.
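The frame-budget arithmetic above can be sketched out; a minimal back-of-the-envelope check, assuming the thread's figures (10 ms baseline at 20 TOPS, 2-3x SDK speedup), which are estimates rather than measurements:

```python
# Back-of-the-envelope check of the DLSS frame-budget claim above.
# The 10 ms baseline at 20 TOPS and the 2-3x SDK speedup are the
# thread's quoted figures, not measured values.

def dlss_cost_ms(baseline_ms: float, speedup: float) -> float:
    """Time for the DLSS pass after an SDK-level speedup."""
    return baseline_ms / speedup

FRAME_BUDGET_60FPS_MS = 1000 / 60  # ~16.67 ms per frame at 60 fps

baseline = 10.0  # ms to upscale to 4K in DLSS 2.2 at 20 TOPS (per the post)
for speedup in (2.0, 3.0):
    cost = dlss_cost_ms(baseline, speedup)
    remaining = FRAME_BUDGET_60FPS_MS - cost
    print(f"{speedup:.0f}x speedup: DLSS takes {cost:.2f} ms, "
          f"leaving {remaining:.2f} ms for rendering")
```

Even at the lowballed 2x, that leaves roughly 11.7 ms of the 60 fps frame for everything else, which is why the claim seems plausible.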
 

The discussion about Orin only having 1 RT core per 2 SMs makes me wonder if this is an improvement we will see in Lovelace.
Maybe, just like Ampere has fewer Tensor cores than Turing but each is more performant, Nvidia has found ways to make 1 RT core perform better than 2...

Orin NX specs are definitely some of the most interesting pieces of information from this conference so far...
That, and if Omniverse can be fully realized: just hearing about the hassle most developers have endured working from home during this pandemic, a fleshed-out network providing solutions in real time would be pretty epic.
 
Now that we know Orin does have (some amount of) RT, it calls to mind the early insider news that there was no RT in the dev kits. Outside of "the early kits just don't have RT-enabled hardware", either there is still no RT planned, or Nintendo is going to surprise them with RT in near-final kits next year.

There's still the question of just how much acceleration up to 8 RT cores can bring. Could we see Metro Exodus Enhanced Edition on Dane?
As others have said, the kits in dev hands were likely preliminary approximations to begin working on games with a target hardware and will be optimized as revisions that fit closer with the end product get rolled out.
Anyway, in terms of video encode streams, AGX Orin supports H.264, H.265 (HEVC), and AV1.

HELL YEAH! I'm telling you all right now, this is a big deal. Might help them keep on with game cards no larger than 64GB, at least for a while.
I can see Nintendo going with VP9 for video encoding.
AV1 is (in a sense) the evolution of VP9 and royalty-free. No sense in a step backwards if it's not required.
Yeah, it could be that Nano Next is a binned Orin S.

I'm guessing it's cost effective to make a separate chip that has similar performance to Orin NX but without all of the additional hardware that is disabled already.
It absolutely is, if only because it reduces overall die size.
I think they'll stick with eMMC and microSD cards. There's still some speed left in those formats.
Not that much, really. There's a reason several smartphone manufacturers moved to eUFS as quickly as they did, and why Apple specifically went with a highly proprietary NVMe setup. We must also recognize that PS5/XBS games are taking advantage of fast data access times to improve overall performance. Modern games may demand more than eMMC and SD cards can provide to stay performant at the level we'd expect.
Well, at minimum it would easily hit the 250 MB/s cap if they stuck with eMMC.

And if the Steam Deck can use NVMe, I think Dane can use a lower-speed NVMe drive or a "proprietary" NVMe solution like Microsoft did with the Series X|S.
Well, Steam Deck is priced at $529 for the NVMe version (an SSD on the M.2 interface), so component cost (and size; remember the Steam Deck is about three times as thick as the Switch) might be a more limiting factor than power draw.
The aforementioned technical briefing for the Jetson AGX Orin mentions single-lane UFS support.
For a single lane, eUFS 2.x offers 600 MB/s while eUFS 3.x offers 1450 MB/s, or so says Wikipedia.
For UFS Cards, 1.x and 2.0 are 600 MB/s per lane, while 3.0 is 1200 MB/s per lane.

Looking at Samsung's catalog for eUFS... they're all listed for a 2 lane interface. Would they work fine in the Jetson AGX Orin (but at single lane speeds, of course)? Or not?
This is maybe misunderstanding terminology, but IIRC the primary benefit of 2 lanes is the ability to pull data from 2 different UFS data sources simultaneously (eUFS and UFS cards) and being able to use both lanes when only one read/write source is in use. Since patches are usually stored with the original game package, I don't think Nintendo would require 2-lane support.
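The per-lane figures discussed above can be tabulated in a small sketch; the numbers are the ones quoted in the thread (check the JEDEC UFS/eUFS specs before relying on them):

```python
# Per-lane UFS throughput as quoted in the thread (verify against the
# JEDEC UFS/eUFS specs before relying on these numbers).
UFS_MBPS_PER_LANE = {
    "eUFS 2.x": 600,
    "eUFS 3.x": 1450,
    "UFS Card 1.x/2.0": 600,
    "UFS Card 3.0": 1200,
}

def peak_throughput_mbps(variant: str, lanes: int) -> int:
    """Peak interface throughput in MB/s for a given lane count."""
    return UFS_MBPS_PER_LANE[variant] * lanes

print(peak_throughput_mbps("eUFS 3.x", 1))  # 1450 -- single lane, as on Orin
print(peak_throughput_mbps("eUFS 3.x", 2))  # 2900 -- dual lane, as in Samsung's catalog
```

Even the single-lane eUFS 3.x figure is well above the ~250 MB/s Switch game card cap mentioned earlier in the thread.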
 
  • Orin does have UFS support, but only one lane.
Since AGX Orin uses "UPHY lanes for USB 3.2, UFS, PCIe, and MGBE, and some of the UPHY lanes are shared between these interfaces" (see table below), I wonder how difficult it'd be to move some lanes to UFS in order to support dual lane eUFS and/or UFS Card. I'm not an expert though.
[Image: zDt6l5C.png]


This diagram also seems to suggest that there are two (single lane?) UFS interfaces.
[Image: o10R22o.png]


And even with just one lane, eUFS 2.0 and UFS Card 1.0 can already reach 600MB/s duplex.

  • NVIDIA increased the speed of the deep-learning SDK by 2-3x (Jensen said 3x; lowballing it to a minimum 2x increase).
This GTC session introduces the Orin SoC's optimization strategies for deep neural networks (DNNs): sparsity, tiling, and chaining. Worth a watch if you're interested. It also includes the following slide showing the hardware features that improve DNN optimization.
[Image: zwtp0CE.png]

Edit: typo
 
Why?

It has half the SMs and 3/4 of the A78 cores. What else is deactivated on the NX compared to the AGX?

Edit: Oh, you mean the deactivated area we can cut off immediately is closer to 1/4, right?
It was just a general "half the core" estimate. If the chip is half the size in the length and width directions it's a quarter of the area.
 
Well, at minimum it would easily hit the 250 MB/s cap if they stuck with eMMC.

And if the Steam Deck can use NVMe, I think Dane can use a lower-speed NVMe drive or a "proprietary" NVMe solution like Microsoft did with the Series X|S.
For the Steam Deck, I think that Valve prioritizes storage being user replaceable over maximizing power efficiency (also, handheld Steam Deck targets a higher power consumption than handheld Switch, so it can spare a watt or two).

NVMe drives at lower speeds still draw too much power, I think. Their active states are at least 1 W or more, so even if you slow things down to within reach of UFS 2.x or 3.x, you're still 2 or 3 orders of magnitude above what eUFS supposedly draws*

*white paper for UFS claims that 2.0 uses ~1 mW when active (same as eMMC 4.51). For comparison, SATA and NVMe drives in their idle states use single to double digit mW.
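A quick check of the "orders of magnitude" comparison above, using the post's own quoted figures (white-paper numbers, not measurements):

```python
import math

# Rough check of the "2 or 3 orders of magnitude" claim above, using the
# post's own figures: ~1 mW active for eUFS 2.0 vs >= 1 W for an NVMe
# drive's active state. Quoted/white-paper numbers, not measurements.
def orders_of_magnitude(a_mw: float, b_mw: float) -> float:
    return math.log10(a_mw / b_mw)

nvme_active_mw = 1000  # lower bound quoted in the post (1 W)
eufs_active_mw = 1     # ~1 mW per the UFS white paper cited above

print(orders_of_magnitude(nvme_active_mw, eufs_active_mw))  # 3.0
```

So with the post's lower-bound NVMe figure, the gap is a full three orders of magnitude; a slower NVMe drive would narrow that only slightly.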

HELL YEAH! I'm telling you all right now, this is a big deal. Might help them keep on with game cards no larger than 64GB, at least for a while.


This is maybe misunderstanding terminology, but IIRC the primary benefit of 2 lanes is the ability to pull data from 2 different UFS data sources simultaneously (eUFS and UFS cards) and being able to use both lanes when only one read/write source is in use. Since patches are usually stored with the original game package, I don't think Nintendo would require 2-lane support.

Strictly speaking, I think the game-size benefit was achievable with Ampere already, via NVDEC's ability to decode AV1.
The interesting thing about Orin updating NVENC for AV1 encode support would be user-side recording. Which leads to a chicken-or-egg situation: if a user encodes in AV1, how large an audience can they share it with? I mean, I get that the world will move on from H.264 to AV1 sooner or later, but I'm just not sure of the time scale, since it requires the proliferation of devices that can play it back.
Unless you're referring to a possibly easier time on the developer side of things, where they could maybe use something Orin-based for AV1 encoding instead of leaving it running on beefy desktops overnight?

Regarding the 1/2 lanes thing: hmm, makes sense.
 
It would be easier if the PS5 and Xbox Series also supported AV1 decode in hardware; it only makes sense to encode AV1 video files if multiple systems support it. Supposedly AV1 decode is part of RDNA2.
 
Because you have to scale in two directions rather than one.
It was just a general "half the core" estimate. If the chip is half the size in the length and width directions it's a quarter of the area.
If you go from 16 SMs (a 4x4 grid, for example) to 8 SMs (4x2 or 2x4), you're only downscaling in one direction.

If we halved the SM count in both width and height, we would end up with a 2x2 grid: 4 SMs.

Another way to put it: each core/SM has a fixed area, so if you cut the count in half, you reduce the total area by half as well.
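The area argument above can be illustrated with a toy calculation (arbitrary units; each SM is treated as a fixed-size tile):

```python
# Toy illustration of the point above: die area scales with SM count,
# not with "halving in both directions". Each SM is a fixed-size tile.
SM_AREA = 1.0  # arbitrary units per SM

def gpu_sm_area(sm_count: int) -> float:
    return sm_count * SM_AREA

full = gpu_sm_area(16)     # e.g. a 4x4 grid of SMs
half = gpu_sm_area(8)      # a 4x2 grid: halved in one direction only
quarter = gpu_sm_area(4)   # a 2x2 grid: halved in both directions

print(half / full)     # 0.5 -- half the SMs = half the SM area
print(quarter / full)  # 0.25 -- a quarter needs halving both dimensions
```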
 
Good point. I think it was just a theoretical "half" idea, rather than specifically looking at what would be removed.
 
Depends on whether DLSS is applicable in handheld mode or not; RT makes sense where there's more power to use. Though I'm not expecting a crazy amount of RT in the first place; a lot of it could probably fall back to screen space in handheld mode.
 
Well, if they use RTXGI, NVIDIA has said that can be scaled and used without RT hardware on a system with the power of an Xbox One.

And guess where our current portable-mode predictions land (around Xbox One level).

So: use RTXGI with RT acceleration when docked, and fall back to software + screen space when portable.
 
Something I just realized will be important if there's RT in Dane: RT is very memory intensive, and slow memory can really hobble RT performance.
 
Strictly speaking, I think the game-size benefit was achievable with Ampere already, via NVDEC's ability to decode AV1.
The interesting thing about Orin updating NVENC for AV1 encode support would be user-side recording. Which leads to a chicken-or-egg situation: if a user encodes in AV1, how large an audience can they share it with? I mean, I get that the world will move on from H.264 to AV1 sooner or later, but I'm just not sure of the time scale, since it requires the proliferation of devices that can play it back.
Unless you're referring to a possibly easier time on the developer side of things, where they could maybe use something Orin-based for AV1 encoding instead of leaving it running on beefy desktops overnight?
It also means that, should devs be working on hardware without AV1 encode (quite possible for the next few years), they could in theory drop in raw video footage and use the SDK for encoding, which could allow universal adoption of AV1 across all games using pre-rendered video.
 
Ooh, yeah, that should also be good for standardizing.
...and I guess it saves some time/energy that would've been spent choosing one of the current encoders and learning its optimization details.
 
Rereading some of kopite7kimi's previous tweets, I see they said that the next Switch is using a Lovelace-based GPU.

How does that square with the Orin reveals today?
 

We have no idea. kopite mentioned that Lovelace is similar to Ampere, so a Maxwell-to-Pascal-like jump would mean that Ampere and Lovelace are functionally similar with minor tweaks. We already see that Orin supports AV1 encode, which means there's a newer NVENC, possibly from Lovelace.
 
Rereading some of kopite7kimi's previous tweets, I see they said that the next Switch is using a Lovelace-based GPU.

How does that square with the Orin reveals today?

Well, so far the GPU on Orin seems to borrow features from the Lovelace architecture, such as AV1 encoding. (The consumer Ampere GPUs only supported AV1 decoding.)

~

Anyway, NateDrake also mentioned that RTX support on the late 2020 devkits has been adjusted due to power consumption. RTX support has been tested for handheld mode, but battery life was impacted more than desired.
 
Can someone with access to the full data sheet check on AV1 decode to see if there's a NVDEC update too?
Ampere's NVDEC supports Main profile, so that's 8 or 10-bit depth 4:0:0 and 4:2:0 chroma sampling.
High profile would add 4:4:4 chroma sampling. Professional would have 8/10/12-bit depth 4:0:0/4:2:0/4:2:2/4:4:4 chroma sampling.
 
Nvidia is ramping up hiring of Game Console Developer Tools Engineers. About 10 days ago they posted 4(!) open positions (CA, NC, TX, and remote), looking for engineers to "pioneer the next generation of Graphics Developer Tools for Game Consoles". Unless Nvidia has a secret console client that we are unaware of, these can only be for the "next generation" Switch.

NVIDIA is looking to hire a deeply technical, creative and hands-on software engineer to pioneer the next generation of Graphics Developer Tools for Game Consoles. You will work with the NVIDIA Graphics Tools team to build tools that enable developers worldwide to harness the full power of NVIDIA GPUs. We are looking for a low-level programmer to help us provide the best possible experience for game developers seeking to debug and optimize their games.
 
Can someone with access to the full data sheet check on AV1 decode to see if there's a NVDEC update too?
Ampere's NVDEC supports Main profile, so that's 8 or 10-bit depth 4:0:0 and 4:2:0 chroma sampling.
High profile would add 4:4:4 chroma sampling. Professional would have 8/10/12-bit depth 4:0:0/4:2:0/4:2:2/4:4:4 chroma sampling.
[Image: aV7MOQU.png]

[Image: q9aKQkW.png]
 
Can someone with access to the full data sheet check on AV1 decode to see if there's a NVDEC update too?
Ampere's NVDEC supports Main profile, so that's 8 or 10-bit depth 4:0:0 and 4:2:0 chroma sampling.
High profile would add 4:4:4 chroma sampling. Professional would have 8/10/12-bit depth 4:0:0/4:2:0/4:2:2/4:4:4 chroma sampling.
If no one looks it up, I can check when I get back to my desktop.
 
Can someone with access to the full data sheet check on AV1 decode to see if there's a NVDEC update too?
Ampere's NVDEC supports Main profile, so that's 8 or 10-bit depth 4:0:0 and 4:2:0 chroma sampling.
High profile would add 4:4:4 chroma sampling. Professional would have 8/10/12-bit depth 4:0:0/4:2:0/4:2:2/4:4:4 chroma sampling.
For the use-case here (pre-rendered video for video games) 4:4:4 chroma sampling is overkill. As long as video doesn't feature a lot of text or super-crisp vector images, 4:2:0 is more than sufficient, especially in 4K, where subsampling artifacts become incredibly difficult to spot due to higher resolution and pixel density.
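The raw-data gap between the subsampling schemes above can be made concrete; a small sketch of uncompressed frame sizes (before any entropy coding, so real encoded file sizes will differ):

```python
# Samples per pixel (luma + chroma) for common subsampling schemes.
# Raw sizes only; encoded sizes differ, but the ratio shows why 4:2:0
# is so much cheaper than 4:4:4.
SAMPLES_PER_PIXEL = {
    "4:4:4": 3.0,   # full-resolution Cb and Cr
    "4:2:2": 2.0,   # chroma halved horizontally
    "4:2:0": 1.5,   # chroma halved in both directions
    "4:0:0": 1.0,   # luma only (monochrome)
}

def raw_frame_bytes(width: int, height: int, scheme: str, bit_depth: int = 8) -> int:
    """Uncompressed frame size in bytes for a given subsampling scheme."""
    return int(width * height * SAMPLES_PER_PIXEL[scheme] * bit_depth / 8)

uhd_420 = raw_frame_bytes(3840, 2160, "4:2:0")
uhd_444 = raw_frame_bytes(3840, 2160, "4:4:4")
print(uhd_444 / uhd_420)  # 2.0 -- 4:4:4 carries twice the raw data of 4:2:0
```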
 
Nvidia is ramping up hiring of Game Console Developer Tools Engineers. About 10 days ago they posted 4(!) open positions (CA, NC, TX, and remote), looking for engineers to "pioneer the next generation of Graphics Developer Tools for Game Consoles". Unless Nvidia has a secret console client that we are unaware of, these can only be for the "next generation" Switch.
Similar description to this one by NTD
 


Thanks! So on the surface, I don't notice any new capabilities, profile-wise*. I wonder if there are any improvements throughput-wise, but I can't seem to find Ampere's numbers to compare against.

*...kinda, sorta. I'm not 100% on this, but what's getting my attention is the HEVC 8K@30 fps stream. For Ampere, the documentation I can find mentions 8192x8192 resolution support for HEVC, but profile-wise only up to Level 5.1.
Then, going by the table I'm looking at for HEVC levels, 5.1 covers up to 4K@60 fps; one would need to go to Level 6 for 8K@30 fps.
But honestly, I'm getting beyond the depth of my understanding of this subject at this point. For all I know, maybe Ampere's NVDEC was already capable of HEVC 8K@30 fps.
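The level reasoning above can be checked numerically; the limits here are from the HEVC level table as I understand it, so verify against the spec (ITU-T H.265, Annex A) before relying on them:

```python
# Hedged sanity check of the HEVC level reasoning above.
# (level, max luma picture size in samples, max luma sample rate per second);
# figures from the HEVC level table, to be verified against the spec.
HEVC_LEVELS = [
    ("5.1", 8_912_896, 534_773_760),
    ("5.2", 8_912_896, 1_069_547_520),
    ("6",   35_651_584, 1_069_547_520),
]

def min_level(width: int, height: int, fps: int) -> str:
    """Lowest listed level whose picture-size and sample-rate caps fit."""
    pixels = width * height
    rate = pixels * fps
    for level, max_ps, max_sr in HEVC_LEVELS:
        if pixels <= max_ps and rate <= max_sr:
            return level
    return ">6"

print(min_level(3840, 2160, 60))  # 4K@60 fits in Level 5.1
print(min_level(7680, 4320, 30))  # 8K@30 needs Level 6 (picture size, not rate)
```

Interestingly, it's the 8K picture size rather than the sample rate that pushes 8K@30 out of the 5.x levels.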

For the use-case here (pre-rendered video for video games) 4:4:4 chroma sampling is overkill. As long as video doesn't feature a lot of text or super-crisp vector images, 4:2:0 is more than sufficient, especially in 4K, where subsampling artifacts become incredibly difficult to spot due to higher resolution and pixel density.
Oh, yeah, no disagreement from me. I was just wondering whether or not Nvidia touched NVDEC too, potentially for Lovelace.
 
Considering that Orin's confirmed to have AV1 encoding, does that mean Netflix will finally come to the DLSS model*?!
If it wasn't obvious, I'm joking here.

~

Although this rumour may or may not be dubious, and is not related to the DLSS model*, I think it'd be interesting to see how Dane compares to the Exynos 2200 (?).
 
We have no idea. kopite mentioned that Lovelace is similar to Ampere, so a Maxwell-to-Pascal-like jump would mean that Ampere and Lovelace are functionally similar with minor tweaks. We already see that Orin supports AV1 encode, which means there's a newer NVENC, possibly from Lovelace.
Lovelace is 5nm... which would conflict with the process I've heard concerning the Switch 4K. The info could be outdated, admittedly, but the last I heard, the Switch 4K was looking to use 8nm. That's not to say it couldn't still be Ampere-based with some Lovelace features added.
 
I think the GPU on Dane is likely still based on Ampere, but borrows features from Lovelace, similar to how the GPU on the Tegra X1 is based on Maxwell, but borrows features from Pascal. So far, the GPU on Orin supports AV1 encoding, which I assume is a Lovelace feature, considering the consumer Ampere GPUs only supported AV1 decoding.
 
Lovelace is 5nm... which would conflict with the process I've heard concerning the Switch 4K. The info could be outdated, admittedly, but the last I heard, the Switch 4K was looking to use 8nm. That's not to say it couldn't still be Ampere-based with some Lovelace features added.

That's pretty much what kopite7kimi attested Lovelace to be: just a tweaked, updated version of Ampere.
(kopite has repeatedly stated that Orin will be on Samsung's 8nm process.)

What will be interesting is whether desktop Lovelace uses a similar 1-RT-core-per-2-SMs arrangement like Orin, or whether that choice was made to alleviate TDP and memory bandwidth constraints on the mobile chip.
 
I think the GPU on Dane is likely still based on Ampere, but borrows features from Lovelace, similar to how the GPU on the Tegra X1 is based on Maxwell, but borrows features from Pascal. So far, the GPU on Orin supports AV1 encoding, which I assume is a Lovelace feature, considering the consumer Ampere GPUs only supported AV1 decoding.
That is my expectation & would better fit the info I had heard earlier in the year. As mentioned elsewhere, the info today meshes well with devkit info gathered months ago. A better picture is forming.
 
Thanks! So on the surface, I don't notice any new capabilities, profile-wise*. I wonder if there are any improvements throughput-wise, but I can't seem to find Ampere's numbers to compare against.
I can't tell either. The comparison of video capabilities between Orin NX and AGX Orin is interesting, though (see below). Despite having 4 more A78AE cores and 1024 more CUDA cores, AGX Orin doesn't perform that much better than Orin NX aside from 4K encoding.

Left: Orin NX | Right: AGX Orin | Top: video encode | Bottom: video decode
[Image: NI1wwgs.png]
 
So, definitely some interesting improvements in Orin's Ampere: it seems they have increased L1$ from 128 KB to 192 KB per SM, and for 2 GPCs it has 4 MB of L2$. For comparison, the whole GA102 chip has 6 MB of L2$ for 6 GPCs, so proportionally they have doubled the amount of L2$.

These improvements must be Nvidia's answer to AMD's Infinity Cache setup, which we are sure to see replicated in desktop Lovelace.
[Image: 930-sm-diagram.jpg]

[Image: nvidia_orinsoc_ampere_block.jpg]
 
Doubling of L2 and double-rate FP16 on the CUDA cores (and 50% more L1, Christ).
Geez, Dane is actually going to be ~50% more powerful than we previously thought, isn't it?

(Previous thoughts were ~2 TFLOPs of Ampere, which would be roughly equal to or 20% better than a PS4, but the FP16 thing is huge, as it can bring a 40-50% boost to games and also somewhat help with TX1 backwards compatibility at the same time, and that is before the L2 doubling.)

(AKA at 8 SMs it likely falls on the PS4 Pro side of the PS4-PS4 Pro gap now when docked.)

ILikeFeet

Does that resolve the memory bandwidth question? XD
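For context on where the "~2 TFLOPs" ballpark above comes from, a hedged sketch; 128 FP32 CUDA cores per SM is standard Ampere, while the clock speed is purely an illustrative assumption, not a leaked figure:

```python
# Hedged sketch of where the "~2 TFLOPs Ampere" ballpark comes from.
# 128 FP32 CUDA cores per SM is standard Ampere; the 1.0 GHz clock below
# is a hypothetical placeholder, not a leaked figure.
CUDA_CORES_PER_SM = 128
FLOPS_PER_CORE_PER_CLOCK = 2  # one fused multiply-add = 2 FLOPs

def fp32_tflops(sm_count: int, clock_ghz: float) -> float:
    """Peak FP32 throughput in TFLOPs."""
    return sm_count * CUDA_CORES_PER_SM * FLOPS_PER_CORE_PER_CLOCK * clock_ghz / 1000

print(fp32_tflops(8, 1.0))  # 2.048 -- roughly the "2 TFLOPs" figure
# Double-rate FP16 would nominally double this for FP16 workloads.
```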
 

It will definitely be interesting to see what changes actually make it into Lovelace and how they impact performance over Ampere.
The early rumors are that it's definitely more than 2x over Ampere, so maybe the changes plus the 5nm node account for all of the improvements.

I'm also interested to see if RT and Tensor performance ends up on another level because of these memory changes, and whether GPU performance is more streamlined vs. the Ampere cards...
 
Well, the FP16 changes surely help streamline the GPU end of things, with Ampere seemingly not having enough work to keep its CUDA cores busy.

And the memory changes would scale up to be massive on a full desktop GPU.

The RTX 3080 has about 8.5 MB of L1 cache (128 KB of L1 per SM × 68 SMs = 8704 KB) and 6 MB of L2.

A Lovelace GPU with the same SM count would have ~13 MB of L1 cache and 12 MB of L2.

And that means 8-SM Dane would have 1.5 MB of L1 and 2-4 MB of L2, which is small compared to the other Lovelace GPUs that would come with that cache change, but to put it in context:

That means Dane could have more L2 cache than the RTX 3060 and the same L2 cache amount as the 3070 Ti.
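The cache totals above can be reproduced with a few lines; the per-SM figures are from the posts (desktop Ampere vs. Orin's variant), and the Dane line is speculation:

```python
# Reproducing the cache arithmetic above. Per-SM L1 figures are from the
# posts (desktop GA10x vs. Orin's Ampere variant); the Dane line is
# speculation about an unannounced chip.
L1_PER_SM_KB_DESKTOP_AMPERE = 128
L1_PER_SM_KB_ORIN = 192

def total_l1_mb(sm_count: int, l1_per_sm_kb: int) -> float:
    """Total L1 cache in MB (1 MB = 1024 KB here)."""
    return sm_count * l1_per_sm_kb / 1024

print(total_l1_mb(68, L1_PER_SM_KB_DESKTOP_AMPERE))  # RTX 3080 (68 SMs): 8.5
print(total_l1_mb(68, L1_PER_SM_KB_ORIN))            # same SM count, Orin-style L1: 12.75
print(total_l1_mb(8, L1_PER_SM_KB_ORIN))             # hypothetical 8-SM Dane: 1.5
```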
 

Thank you. And in layman's terms, this would mean that Nintendo games should run well at 4K resolution, with even increased third-party support from AAA devs?
 
I can't tell either. The comparison of video capabilities between Orin NX and AGX Orin is interesting, though (see below). Despite having 4 more A78AE cores and 1024 more CUDA cores, AGX Orin doesn't perform that much better than Orin NX aside from 4K encoding.

Left: Orin NX | Right: AGX Orin | Top: video encode | Bottom: video decode
[Image: NI1wwgs.png]
That is actually expected, since video decoding/encoding on NV GPUs is accelerated by fixed-function hardware afaik. So depending on the number of NVDEC/NVENC units on each device, the difference in video processing capabilities between Orin NX and AGX may differ from what their horsepower delta suggests.
 
Ah yes, they are in the functional block diagram and I totally forgot about them. Thank you.
 
2K is actually 2048x1080, not 2000x1080, and Nintendo makes consoles for consumer-grade TVs, so I doubt they'd make a console that outputs QHD, which is tied to PC monitors; I doubt all consumer-grade TVs accept QHD, and TVs are what Nintendo will chase.
But if we're talking 16:9, then it's 1920x1080.
 
I really wish I understood more of the tech talk. I'm super excited to see how this all unfolds, and I try to check in on this thread every once in a while to keep up, but I usually feel like I need a "for dummies" version of it. :p
 
Simplified version, @Mbolibombo and @Chaotic Neutral :

The next Nintendo chip (codenamed Dane) is very likely to be derived from the chip they just announced (a cut-down version, maybe shrunk if we're being really optimistic about the "Orin X" chip, which has yet to be detailed).

There's not much we can take from this in terms of how powerful Dane will be besides more confidence in the previous ballpark ("it should be able to play XB1/PS4 games fine, and DLSS should make games look closer to a PS4 Pro version, with some caveats"), since we don't know how much was cut nor how much they will lower the clocks (for sustained performance and power budget). There are many YouTube videos explaining better than words could what to expect from DLSS if you're not familiar with it.

For tech enthusiasts, it means a more accurate idea of what it can and can't do, compared to extrapolating from PC/notebook graphics cards, which weren't made with such a low power budget in mind, plus whatever tech improvements Nvidia has made since finishing those cards. For example, I have more confidence that they will use DLSS as-is rather than a lighter version, and (limited) ray tracing seems more likely now.

As for AAA ports, I would expect companies bringing PS360 ports now to move to PS4/XB1 ports, and about as many PS5 "impossible ports" as we got PS4 "impossible ports" on the OG Switch. Overall, support should be better, since publishers who took a couple of years to get on board might be there from the beginning, but don't expect anything close to parity (there are many reasons for games skipping the Switch besides power).
 