StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

Another interesting find on GA107 performance in comparison to some older Nvidia cards

"As for performance, in OpenCL, the NVIDIA GeForce RTX 3050 Ti is about as fast as the Radeon RX 5600 XT and the GeForce GTX 1080 Ti. The GeForce RTX 3050 is about as fast as the GTX 1080. The Non-Ti variant is also slightly faster than the GeForce GTX 1650 SUPER which is what it replaces by the time of launch. Also, this is just OpenCL performance and the Ampere architecture offers a huge increase in performance in OpenCL & Vulkan APIs so average performance in DirectX titles would be slightly lower."

I find this interesting because it would probably equate to an 8SM part performing pretty well against, say, a GTX 1060, and they also list the tested GPU clocks for these benchmarks at 1GHz (the Orin AGX and NX max clocks).
These are the smallest Ampere GPUs that also have RT capabilities and they don't get much public fanfare when it comes to specs, but all of this information together gives us better insight into how the architecture performs in a mobile space.


Edit: So just for fun, doing some rough math, a theoretical 8SM GPU @ 1GHz should probably score in the mid-to-high 36,000 range (36,643 was the number I came up with, but it may not be 100% accurate).
This would put this part above a GTX 1060 and 1650 Max-Q (35,897 and 36,045 respectively) but right below the GTX 1650 desktop @ 38,361.

 
That 4GB framebuffer is murdering those cards' potential on laptops, it's so sad XD

NVIDIA why
 
And considering OpenGL and Vulkan (And likely NVN) are closer to OpenCL than DirectX, that does paint positive prospects for Dane at 1GHz GPU clocks.

My question is what would a 768MHz 8SM GPU perform like numbers-wise, as some people have mentioned it as a "Lower case" GPU clock when docked.

EDIT: Also, maybe tack on a 30% increase to performance, as Orin would be stronger per-FLOP/IPC than Ampere due to the 50% L1 and doubled L2 cache increases.
 
When you think about it, it's really weird to have the same exact clock speed profiles as the switch TX1, outside of sheer coincidence. There's no reason why it should be at 768, 307 and whatever the highest handheld GPU clock speed was. It's not going to help with backwards compatibility. Ampere/Lovelace is a completely different architecture than Maxwell after all.

That being said I would like to see a similar 2-2.5x difference in GPU power between handheld and docked... so more 720p native handheld and 1080p docked, for DLSS and non DLSS games. I know that people have brought up even higher speeds for handheld (like 60-80%), because they don't think DLSS would be used in handheld mode. I wouldn't mind that, but I wonder what the power draw and thermals will be.

It will be interesting to see realistically how long the Steam Deck can play at 1 and 1.6 TFLOPs over time before the battery drains. One thing the Switch would have that the Steam Deck doesn't is DLSS hardware and potentially RT cores, so Nintendo could keep handheld power draw to like 7 watts max, with GPU speeds at 40-50% in handheld to save battery life, assuming that fully docked mode with DLSS and RT cores enabled comes to 15 watts. But who knows how it will play out. 1 TFLOP of performance in handheld mode would be amazing. Basically 720p native support for almost all PS4 games, easily.

But I hope we get as close to 1GHz as we can with all the GPU cores intact, assuming Dane will be based on the Orin NX. 920MHz-1GHz docked would be nice. If we can get handheld mode running at 40-60% of 920MHz with battery life similar to the v1 Switch (3hrs), the more future-proof it will be, and the inevitable revision on a 5nm node or smaller will have the same battery life as the v2 and OLED Switch models, or better.
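For anyone who wants to see where the "2-2.5x" docked/handheld gap and the "40-60% of 920MHz" window land in actual numbers, here is a quick sketch. The 768MHz docked and 307/460MHz handheld clocks are the ones mentioned in this thread; the 384MHz profile and the function names are my own additions for illustration, so treat them as assumptions.

```python
# Switch (Tegra X1) GPU clock profiles in MHz. 768 docked and 307/460 handheld
# are the figures cited in this thread; 384 is added here as an assumption.
DOCKED_MHZ = 768.0
HANDHELD_MHZ = [307.2, 384.0, 460.8]


def docked_ratio(docked_mhz, handheld_mhz):
    """How many times faster the docked clock is than a handheld clock."""
    return docked_mhz / handheld_mhz


def handheld_window(docked_mhz, low=0.4, high=0.6):
    """Handheld clock range when handheld runs at low..high of the docked clock."""
    return (docked_mhz * low, docked_mhz * high)


ratios = {h: round(docked_ratio(DOCKED_MHZ, h), 2) for h in HANDHELD_MHZ}
window_920 = handheld_window(920.0)

for handheld, ratio in ratios.items():
    print(f"{handheld} MHz handheld -> docked is {ratio}x")
print(f"40-60% of 920 MHz -> {window_920[0]:.0f}-{window_920[1]:.0f} MHz")
```

The two lowest handheld profiles are the ones that give the 2.5x and 2x gaps; the 460MHz profile narrows it to well under 2x.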
 
the highest handheld clock was 460MHz, conveniently half of 920MHz. it'd be weird to have those clocks, but given Nintendo's aims at efficiency, it's what I've been expecting. if anything, it's more weird Nvidia maxes out the gpu at 1GHz and the cpu at 2GHz. I wonder if there's an unlocked mode that has an unlimited clock
 
Agreed. I don't think Nintendo will max out the CPU at 2GHz. And I'm not expecting the GPU to be 1GHz, mainly because these are supposedly the specs at 25 watts power draw on the Orin NX. Granted, there's some AI stuff in there that Nintendo doesn't need as well.

If we do get an 8 core A78, I can see it clocked anywhere between 1-1.5Ghz, and hopefully the GPU as close to 1 GHz as possible... Like 90% max, but maybe 75-80% is more realistic.

BTW, does anyone know if the Xavier/Xavier NX was ever able to hit its rated clock speeds consistently, especially for the GPU? Tegra chips have a history of not sustaining their rated GPU clocks, but that's because Nvidia programmed them not to, for the GPU at least. I heard even the TX2 didn't reach its full GPU specs.
 
I don't know if it could, but looking it up, the AGX does have an unlimited wattage mode. the cpu peaks at 2265.6MHz while the gpu peaks at 1377MHz. sustaining speeds is probably a function of cooling, like most devices. so capping the board with an overkill cooler would definitely do it.

EDIT: Industrial Xavier has no CPU peak. the gpu peak is 1211.3MHz
 
We don't know if the next Switch has DLSS or RT cores
Unless you're willing to discount the only reliable rumors we actually have, DLSS is by far one of the most consistently reported things about the next Switch. RT cores are more up in the air, but so far we have reason to believe there will be a limited number of those too.
 
It's a mistake to take it for granted that the next Switch will have DLSS
Nintendo's not going to custom engineer a Xavier chip with Nvidia in 2021. Period. And that's now the only possible way it wouldn't include RT cores and DLSS at this point. The only question is how many RT cores it will have.
 
It's a mistake to take it for granted that the next Switch will have DLSS
Well, technically anything can happen, but the fact that tensor cores found their way to Orin boards lends credence to the existence of tensor cores in Dane as well. A further step would be tensor cores in the next Jetson/Tegra Nano Next dev kit, which is as close to Dane as possible, but we will see. I hope it gets revealed soon.
 
I personally don’t think going for a higher GPU clock is worth it for a device like this. I am of the opinion that going for a higher than expected CPU clock and a relatively low handheld GPU clock is the right way to go for longevity reasons.

We still have to account for the fact that this device will have a screen, memory, sound, wifi, Bluetooth, etc.


The recent discussion only covers the SoC's potential, which is a conversation in a vacuum that leaves out these other factors. It also leans too much on the higher end; I think expectations should be lowered, and we should start thinking about the handheld aspect of this device, since it still has that to contend with.

The OG Switch was around 7-9W in handheld mode, right? With the Mariko unit consuming 5-7W in handheld mode and ~10W in docked mode, if I’m not mistaken (and the OG at 13W in most cases).


I think for the CPU we have a few configs that can be considered:

IF 4+4: 2.35GHz for the big cores and 1.1-1.3GHz for the little cores; you should get around SD865-level CPU performance with this.

IF 6C: 1.7GHz for the big cores; slightly lower perf vs the previous configuration, but with a bit lower CPU power consumption, I think.

IF 8C: 1.3-1.4GHz for the big cores; you get a bit better performance than the other two setups (should be around 5% higher perf than the first setup).


The three configs should consume between 2 and 4W of power in the portable device.

This isn’t what they will necessarily use, simply what I think the conversation should

GPU has to consider a few possible configurations:

4SMs, 6SMs, 8SMs.

The lower the SM count, the higher it can be clocked, especially in docked mode.


I’m not sure how the conversation got to expecting 2TFLOPs in docked mode when that’s still consuming around, what, 20W with the other stuff removed? What are the chances of the Switch Dane unit even reaching this? 2TFLOPs is more of a ceiling, not the average.

1.4-1.6TFLOPs at best is a reasonable assumption for an 8SM configuration (possibly still too optimistic). But in portable mode it has to consume a lot less to deliver adequate battery life for a Switch. Again, let’s assume it is aiming for an OK battery life, like, say, the Lite's battery life; it has to be within the 3-7 hour range.

4SMs can be clocked higher for a performance target of, say, 1.1TFLOPs docked and ~500-600GFLOPs portable (so ~1.1GHz and ~500MHz respectively, for example). This edges slightly above the Orin and Orin NX GPU cap of 1GHz, but those are also chips that are several times bigger than this SoC.


If you have 6SMs, which is what I expect personally, you can have above the Switch docked experience in handheld mode and something that reaches the PS4 experience in docked mode (but more modern), in real-world performance. I’m thinking that using the current Switch's highest handheld preset (460MHz) and the TX1's highest preset (920MHz) for docked is pretty reasonable.

You get ~700GFLOPs and 1.4TFLOPs respectively. This should be within 14-16W.


Believe it or not, that’s about the SD845 in terms of paper specs for the handheld GFLOPs, but Ampere is several times better than the Adreno GPU. The SD845 does 737GFLOPs, while this would be doing ~700GFLOPs in handheld mode, and both are on SEC nodes of the 10nm family.


For the 8SM configuration, I do not think they will aim higher than 16W in docked mode (and I'm being generous with this higher-than-expected TDP). I think it would be clocked lower, closer to the current Switch clock speeds: 307MHz for handheld and 768MHz for docked. That gives us handheld perf similar to the 6SM config (slightly lower) and a bit above it in docked mode. So, 786GFLOPs (the lowest clock of 307MHz would be slightly lower than this) and 1.572TFLOPs in docked mode.
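The GFLOPs figures for the 4SM, 6SM and 8SM configs can be sanity-checked with the usual peak FP32 formula: SMs × 128 FP32 cores per SM (Ampere) × 2 FLOPs per clock × clock speed. A minimal sketch follows; the function name is just for illustration, and the 384MHz entry is my assumption for the clock that the 786GFLOPs figure corresponds to, since it is exactly half of 768MHz.

```python
CUDA_CORES_PER_SM = 128       # FP32 lanes per Ampere SM
FLOPS_PER_CORE_PER_CLOCK = 2  # a fused multiply-add counts as two FLOPs


def fp32_gflops(sm_count: int, clock_mhz: float) -> float:
    """Peak FP32 throughput in GFLOPs for a given SM count and GPU clock."""
    return sm_count * CUDA_CORES_PER_SM * FLOPS_PER_CORE_PER_CLOCK * clock_mhz / 1000.0


# (SM count, clock in MHz) pairs from the configurations discussed above.
# The (8, 384) entry is an assumption: it is the clock that yields 786 GFLOPs.
CONFIGS = [(4, 500), (4, 1100), (6, 460), (6, 920), (8, 307), (8, 384), (8, 768)]

for sms, mhz in CONFIGS:
    print(f"{sms} SMs @ {mhz} MHz -> {fp32_gflops(sms, mhz):.0f} GFLOPs")
```

These are paper numbers only; they say nothing about power draw or how close real games get to peak throughput.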


RAM bandwidth will be clocked lower in portable mode for obvious reasons; reaching 88GB/s is very possible in the handheld form factor, with it clocking higher when docked for 102GB/s (a paltry increase). 8-12GB of RAM is very possible IMO, but I think 12GB is more likely than 8GB (even if they don’t need that much). I do not see 16GB.
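The 88GB/s and 102GB/s figures line up with a simple bus-width × transfer-rate calculation. The sketch below assumes a 128-bit LPDDR5 interface running at 5500MT/s and 6400MT/s; neither the bus width nor those data rates are stated in the post, so treat them as assumptions that happen to reproduce the numbers.

```python
def bandwidth_gb_s(bus_width_bits: int, transfer_rate_mtps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mtps / 1000.0


# Assumed configuration (not from the post): 128-bit LPDDR5.
BUS_WIDTH_BITS = 128
portable = bandwidth_gb_s(BUS_WIDTH_BITS, 5500)  # assumed handheld data rate
docked = bandwidth_gb_s(BUS_WIDTH_BITS, 6400)    # assumed docked data rate

print(f"portable: {portable:.1f} GB/s, docked: {docked:.1f} GB/s")
print(f"docked uplift: {(docked / portable - 1) * 100:.0f}%")
```

Under those assumptions the docked uplift is only about 16%, which fits the "paltry increase" description.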

The Screen, the storage, the SD card slot, the sound, the Bluetooth, the wifi, the fan, etc., all have to be accounted for with this device. If you have the idea that it will consume 10W in portable mode, lower that to include the other features this device has to contend with and you get more or less the actual power draw range to expect (I think 6-10W is a very possible range) on an 8N node. We don’t have any better node and have no confirmation of it being on a different node, so efficiency concerns should be considered here.


As for the node, the convo has centered on the 8N process. But good news: 8N is better than the regular 8LPx at doing the job, so being on that node is not the end of the world. Also, the 8nm process does NOT suffer from the same inefficiencies as the 20nm process, and the A78 does not suffer from the same flaws as the A57-A73, which were noticeably inefficient designs compared to the A78.



Still, I think these are more in line with what to expect for CPU and GPU clock speeds in this device. I’m sorry if this is being a Debbie Downer, but a lot of the recent discussion has focused only on the docked mode experience outside of the realm and not so much on the handheld experience, which is the entire basis for this device; it is the limiter of what it can do.

You can mix and match between the different configs and perhaps find any of them to be likely in this device IMO.



What size do I expect the die to be? 100mm^2 or if above between 100-120mm^2.
 
We don't know if the next Switch has DLSS or RT cores
I'm personally not betting on or expecting RT cores being used on Switch 2, or perhaps only in a very limited capacity. There is an insider here who says Nintendo had to dial back the RT cores a bit because it was affecting battery life.

As far as DLSS goes, I think we'll have it, based on Bloomberg and the insider I was talking about (who also confirms DLSS), and the Orin NX having DLSS hardware. Not to mention the DLSS-like patent that NERD is/was working on. It's all leading to Switch 2 with DLSS likely happening.

@ReddDreadtheLead What you're saying here is pretty much the consensus here actually in terms of conservative estimates. ~1.5 TFLOPs GPU, 6-8 core A78 CPU, DLSS support.
 
And considering OpenGL and Vulkan (And likely NVN) are closer to OpenCL than DirectX, that does paint positive prospects for Dane at 1GHz GPU clocks.

My question is what would a 768MHz 8SM GPU perform like numbers-wise, as some people have mentioned it as a "Lower case" GPU clock when docked.

EDIT: Also, maybe tack on a 30% increase to performance, as Orin would be stronger per-FLOP/IPC than Ampere due to the 50% L1 and doubled L2 cache increases.

The 1650 Max-q is a 2->2.5Tflop gpu and the 1650 desktop is 2.6->2.98Tflops, so even if the target was 1.6Tflops 8SM@ 800Mhz I would guesstimate raw performance somewhere in between those two cards (factoring in architecture advancements, DLSS, +cache increases and such).
Image quality on the other hand should be much closer to the 1650 desktop if not better as a good measurement using DLSS in docked mode...


It's a pretty long video, but the image quality shown across a multitude of games on the 1650 desktop probably puts most games at or slightly above PS4 Pro level of fidelity. Plus I agree that both Vulkan and NVN should allow developers to put out great-looking games for the spec level of the new Switch's hardware.

 
I'm personally not betting on or expecting RT cores being used on Switch 2, or perhaps only in a very limited capacity. There is an insider here who says Nintendo had to dial back the RT cores a bit because it was affecting battery life.
the RT Cores are tied to the gpu clock. so, if RT isn't working in handheld mode, they have to flat out turn it off if they can't lower fidelity even further. I don't think Nintendo will want devs to have to light a game twice, so either they get more performance in handheld mode, or they might scrap RT for games. they could just turn off RT, I guess, and deal with the gulf in visual fidelity.

and this is assuming they already can't go any lower. for a 720p panel, will 360p reflections, GI, shadows, etc look good? will they even perform well? 180p resolution RT is just out of the question. might as well go with screen space effects
 
Well, there are things like RTXGI which can be scaled to software level enough to run on Xbox One.

So why not just have the Portable Profile be built first, see if RT can work in portable for that, then if not, just have it be in software, and then when docked, up the sample count, and turn RT On.

We already have Switch games where things like SSR are turned off in portable mode, I don't see much of the problem here about a Visual Quality difference.
 
Of course this is all just for speculation purposes, but the reason we try hard to nail down a span of potential is that we don't have any current Nvidia hardware in SoC form that's dedicated to gaming. Even in comparison to what the TX1 is on 20nm (which was not a great node for mobile at all), 8nm is and has always been a great base for a mobile chip design...

The docked mode has been the main topic of discussion because Nintendo has shown us they are more willing to play with the handheld clocks over time, while docked performance has been consistent. Also, there are just so many more hardware advancements that have happened since the TX1 first came on the scene that the performance and efficiency of the new chip should easily be night and day.
 
the RT Cores are tied to the gpu clock. so, if RT isn't working in handheld mode, they have to flat out turn it off if they can't lower fidelity even further. I don't think Nintendo will want devs to have to light a game twice, so either they get more performance in handheld mode, or they might scrap RT for games. they could just turn off RT, I guess, and deal with the gulf in visual fidelity.

and this is assuming they already can't go any lower. for a 720p panel, will 360p reflections, GI, shadows, etc look good? will they even perform well? 180p resolution RT is just out of the question. might as well go with screen space effects
if they enable DLSS in handheld mode, they can certainly maintain a decent performance with RT on and with 720p resolution (upscaling from 360 or 540 to 720p)🤔.
 
I just ran the Boundary RT demo in 720p ultra performance. man.... the artifacting was something fierce. but it was an older version of DLSS (2.1 to be exact). so it's doable

over 200fps tho
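For context on how few pixels "720p ultra performance" actually renders, here is a rough sketch of the internal render resolution per DLSS 2.x preset. The per-axis scale factors are the commonly documented ones as far as I know (Quality 2/3, Balanced 0.58, Performance 1/2, Ultra Performance 1/3); the dictionary and function names are made up for this example.

```python
# Per-axis render scale for each DLSS 2.x preset (assumed from commonly
# documented values; names here are illustrative only).
DLSS_SCALE = {
    "quality": 2 / 3,
    "balanced": 0.58,
    "performance": 1 / 2,
    "ultra_performance": 1 / 3,
}


def internal_resolution(out_width: int, out_height: int, mode: str) -> tuple:
    """Approximate internal render resolution for a given output size and preset."""
    scale = DLSS_SCALE[mode]
    return (round(out_width * scale), round(out_height * scale))


for mode in DLSS_SCALE:
    w, h = internal_resolution(1280, 720, mode)
    share = (w * h) / (1280 * 720) * 100
    print(f"{mode}: {w}x{h} ({share:.0f}% of the output pixels)")
```

So at a 720p output, Performance mode is the 360p case mentioned above, and Ultra Performance drops all the way to roughly 240p, which would go some way toward explaining the artifacting.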

 
I think outside of Orin having one RT core per two SMs, Nvidia hasn't really disclosed anything about Orin's ray tracing performance.

Is there a possibility that Orin's ray tracing performance is similar to Ampere's ray tracing performance? And the reason why Orin has one RT core per two SMs instead of one RT core per SM is because Orin's designed for autonomous purposes, not gaming purposes? And there's a possibility that Dane can have one RT core per SM due to being customised specifically for gaming purposes? I think that's certainly a possibility.

But assuming that kopite7kimi is correct about the GPU on Dane being based on Lovelace, at least in some capacity, I think there's also a possibility that Lovelace can achieve comparable ray tracing performance as Ampere with fewer RT cores compared to Ampere, which could potentially result in having a lower overall power consumption, potentially killing two birds with one stone, at least to a degree. And I believe RedGamingTech has heard that there's a big focus on ray tracing with DLSS 3.0, which could debut on the launch of the consumer Lovelace GPUs. (Keep in mind RedGamingTech isn't considered the most reliable source. So RedGamingTech's words should be taken with a healthy grain of salt.)

My point is that there's still at least a good amount of information that Nvidia hasn't disclosed about Orin. And there's a possibility that some information about Orin can only be unveiled through specialised methods, with one example being analysing a very high resolution die shot photo of Jetson AGX Orin, Jetson Orin NX, or Dane.
 
They bothered to change the setup and keep RT cores, so they definitely found a use for them. While Orin is automotive first, they probably kept in mind the future of RT. Apple and ARM have both mentioned RT, and Samsung is using RDNA2 with RT acceleration. And Nvidia ain't gonna be the last in this.

Their internal demo must look quite good. Maybe they'll show it off in the future at some point
 
I'm not saying there's no use for ray tracing in autonomous applications. I'm saying that ray tracing usage for autonomous applications is probably not as extensive as ray tracing usage for games, which is potentially the reason why Nvidia decided to have one RT core per two SMs for Orin as opposed to one RT core per SM for Ampere.

I've mentioned there's a possibility that Dane could have one RT core per SM, like with Ampere.
But I've also mentioned there's a possibility that Lovelace could achieve ray tracing performance comparable to Ampere's, but with fewer RT cores. And fewer RT cores could result in considerably lower overall power consumption, which is definitely beneficial for a hybrid console in handheld mode. Therefore, Dane could get away with having one RT core per two SMs. After all, kopite7kimi did comment on the GPU on Dane being based on Lovelace, at least in some capacity. And although RedGamingTech isn't considered the most reliable source, and what RedGamingTech said should be taken with a healthy grain of salt, RedGamingTech did hear about ray tracing being the big focus with DLSS 3.0, which could debut during the launch of the consumer Lovelace GPUs.

And I've mentioned there's still at least a good amount of information that Nvidia hasn't disclosed about Orin, so who knows?
 
I feel confident Nintendo will again lower the clocks on Dane. They will be ultra cautious as they always are. Hopefully people are taking this into account with their speculations.
 
Of course, it has to be lowered to meet a portable battery life, anyone expecting the device to suddenly not clock lower in this unit isn’t looking at the whole picture, only one part: docked perf

Of course this is all just for speculation purposes but the reasons we try hard to nail down a span of potential is because we don't have any current Nvidia hardware in an SoC form that's dedicated to gaming. Even in comparison to what the TX1 is on 20nm(which was not a great node for mobile at all) 8nm is and has always been a great base for a mobile chip design...

The docked mode has been the main topic of discussion because Nintendo has shown us they are more willing to play with the handheld clocks over time but docked performance has been consistent. Also there's just so many more advancements to hardware that have happened since the TX1 first came on the scene that performance and efficiency of the new chip should easily be night and day.
I simply doubt they will be more flexible with the docked perf and on top of that for such a small device. Or that it will be around +2TFLOPs for the GPU.

@ReddDreadtheLead What you're saying here is pretty much the consensus here actually in terms of conservative estimates. ~1.5 TFLOPs GPU, 6-8 core A78 CPU, DLSS support
It seems like it, though the convo has shifted to the upper end (a tad beyond that, really), which is what I meant by being a Debbie Downer in this.
 
Of course, it has to be lowered to meet a portable battery life, anyone expecting the device to suddenly not clock lower in this unit isn’t looking at the whole picture, only one part: docked perf

I'm talking about whatever it ends up being: Nintendo will be extra cautious. They will probably set clocks lower than needed. I've been impressed with what the Switch has accomplished with its setup and clock speeds. I just hope Nintendo designs it well so the clocks can run close to the maximum in its setup.
 
It would be really useful if we could get a specs list of the current Switch, maybe in the OP or maybe as a thread mark. All this talk about how many flops and hertz and bytes would have a lot more context.
 
How about Nintendo targets 18w for docked if 20w is a bit too much (cooling problems) and that 15w is a bit too low performance wise?
 
What is better for heat and power draw 8SM’s clocked “low” or 4SM’s clocked high?

The heat/power consumption increases exponentially at some point, so it depends on where they want to land with the performance. With unambiguous goals, 4SM might be the simpler way.
 
What is better for heat and power draw 8SM’s clocked “low” or 4SM’s clocked high?
Rule of thumb: more SMs at lower clocks, spread out, can produce less heat than fewer SMs packed closer together at higher clocks.

Even more SMs at lower clocks packed closer together can produce less heat than fewer SMs spread farther apart at higher clocks.

This has more to do with the physics of it, really.

Energy consumption can depend, but the higher SM count clocked lower can be more performant at a similar power draw than the lower SM count clocked higher.
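To put rough numbers on that rule of thumb, here's a minimal sketch using the textbook dynamic power relation (power proportional to SM count x voltage squared x clock). The voltage curve and its constants (`v_min`, `v_per_ghz`) are made up for illustration and are not measurements of any Nvidia chip; leakage and die cost are ignored.

```python
# Toy model of dynamic power: P is proportional to N * V^2 * f.
# v_min and v_per_ghz are made-up illustrative constants, not measured values.

def relative_power(sm_count, clock_ghz, v_min=0.6, v_per_ghz=0.3):
    """Relative dynamic power for sm_count SMs at clock_ghz (arbitrary units)."""
    voltage = v_min + v_per_ghz * clock_ghz  # assumed linear voltage/frequency curve
    return sm_count * voltage ** 2 * clock_ghz


def relative_throughput(sm_count, clock_ghz):
    """Relative throughput, assuming perfect scaling with SM count and clock."""
    return sm_count * clock_ghz


wide = (8, 0.6)    # 8 SMs clocked "low"
narrow = (4, 1.2)  # 4 SMs clocked high

for label, (sms, clock) in (("8 SM @ 0.6 GHz", wide), ("4 SM @ 1.2 GHz", narrow)):
    print(label,
          "throughput:", round(relative_throughput(sms, clock), 2),
          "power:", round(relative_power(sms, clock), 2))
```

Under these assumptions both configs have the same theoretical throughput, but the wide and slow one draws less dynamic power because the higher clock needs a higher voltage, and voltage counts squared. The trade-off is the larger die.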
 
My only concern with relying completely on AGX Orin's manufacturing process as a guide to what Nintendo and Nvidia might use for Dane is that Samsung are most likely using an automotive-grade 8nm process that is designed to work within extreme temperatures (which in and of itself might deliberately limit transistor density for the sake of reliability).

So yes, all of this is giving us a great blueprint of possibilities, but the end products will be for different purposes. Nintendo needs a chip that's not only performant in handheld mode but can scale up enough while docked, and it also needs to be as efficient as possible.
A 4SM part would need to be clocked at 1.2 - 1.4Ghz to be in the XboxOne range of raw theoretical numbers and this would be much higher than even where Nvidia have capped the clocks on the Orin SoC...

I also believe that Nvidia knows the Ampere architecture becomes very inefficient at higher clocks which is why it's capped the way it is in Orin, something they also address in the mobile RTX 30 variants of having lower base clocks to achieve a much better TDP range.
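The 1.2 - 1.4GHz figure for a 4SM part can be sanity-checked with the usual theoretical FP32 formula. This is a rough sketch that assumes 128 FP32 cores per Ampere SM and roughly 1.31 TFLOPs for the original Xbox One; neither number is stated in the post itself.

```python
# Theoretical FP32 throughput: SMs * cores per SM * 2 ops (FMA) * clock.
CORES_PER_SM = 128        # assumption: FP32 cores per Ampere SM
XBOX_ONE_TFLOPS = 1.31    # assumption: commonly quoted figure for the original Xbox One


def tflops(sm_count, clock_ghz):
    """Theoretical FP32 TFLOPs for a GPU with sm_count SMs at clock_ghz."""
    return sm_count * CORES_PER_SM * 2 * clock_ghz / 1000


def clock_for_target(target_tflops, sm_count):
    """Clock in GHz needed to hit target_tflops with sm_count SMs."""
    return target_tflops * 1000 / (sm_count * CORES_PER_SM * 2)


for sms in (4, 6, 8):
    print(sms, "SMs need about", round(clock_for_target(XBOX_ONE_TFLOPS, sms), 2), "GHz")
```

With these assumptions a 4SM part lands at about 1.28GHz, inside the quoted range, while an 8SM part would only need about 0.64GHz for the same raw number.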

That's a fair point regarding manufacturing process, and looking into it Samsung did announce they were working on an automotive-grade 8nm process a couple of years ago. I'd still say with density pretty much identical to their 8N process, it's probably very close to standard 8N.

Regarding clock speeds, I don't think that's a process issue, I think it's just the reality of using a really big chip within the power/thermal limitations of the Jetson AGX form factor they've chosen (which is carried over from Xavier). They're limited to 50W for the full board, which means they're simply not going to be able to run a 460mm2 chip at full clocks. Consider that the GA104, which is a smaller die, consumes well over 200W when running at full clocks. The Drive AGX Orin doesn't seem to have the same limitation, as it advertises 254 TOPS (rather than 200), which would suggest a GPU clock as high as 1.5GHz (assuming the DLA clocks don't change much).

I actually wouldn't be surprised to see a Dane in the next Switch running higher clocks than Jetson Orin, perhaps up to about 1.2GHz docked, simply because with a smaller die and smaller GPU it'll have the capacity to.
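For anyone wanting to see how a TOPS figure turns into a clock estimate, here's a small sketch. It assumes the 200 TOPS figure corresponds to a 1GHz GPU clock, that GPU TOPS scale linearly with clock, and that the DLA contribution stays fixed. The GPU's share of the total (`gpu_share`) is not given in the post, so it's treated as a hypothetical parameter here.

```python
def implied_gpu_clock(base_clock_ghz, base_tops, new_tops, gpu_share):
    """GPU clock needed to reach new_tops if only the GPU part scales with clock.

    gpu_share is the (hypothetical) fraction of base_tops that comes from the GPU.
    """
    gpu_tops = base_tops * gpu_share
    fixed_tops = base_tops - gpu_tops  # DLA and anything else assumed unchanged
    return base_clock_ghz * (new_tops - fixed_tops) / gpu_tops


def implied_gpu_share(base_clock_ghz, base_tops, new_tops, new_clock_ghz):
    """GPU share of base_tops that would make new_clock_ghz produce new_tops."""
    return (new_tops - base_tops) / (base_tops * (new_clock_ghz / base_clock_ghz - 1))


# If the whole 200 TOPS came from the GPU, 254 TOPS would only need about 1.27GHz.
print(round(implied_gpu_clock(1.0, 200, 254, gpu_share=1.0), 2))
# A 1.5GHz estimate corresponds to the GPU supplying a bit over half of the 200 TOPS.
print(round(implied_gpu_share(1.0, 200, 254, 1.5), 2))
```

So the "as high as 1.5GHz" reading depends heavily on how much of the TOPS number you attribute to the GPU versus the DLA.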

Why are you using unknown numbers for TX1, which has 2 unique SMs and a 7-year-old design, instead of Xavier, an 8SM Volta GPU which we have exact numbers for? The Volta GPU, which includes 64 tensor cores, is 89mm² on 12nm. That is the official size of the GPU, and even if you assume Ampere has larger GPU logic, you are looking at a process node twice as dense...

Orin NX's architecture for 8SM is a single GPC too, so you are required to include some logic from 1SM all the way to 8SM, while a whole new GPC logic would be needed to increase the size further. Also just looking at official thermal numbers and your own expectations for the CPU clock, Orin NX's configuration without the extra logic makes a lot of sense.

We have seen numbers for Volta, I doubt pretty heavily that we are looking at a GPU in Dane with this configuration, anywhere near as large as Orin NX.

I don't know what you mean by "unknown numbers" or what Xavier has to do with it, but here's my point: The TX1 was used in the original Switch and is about 121mm2 in size. I don't think it's reasonable to expect that their new chip for the next Switch (Dane) will be substantially bigger than that. We know from die shots of TX1 that Nvidia could spend 12mm2 of that 121mm2 on SMs. We also have a reliable source telling us that Dane is very similar in architecture to Orin, and we now know that an Orin SM takes up about 4.4mm2 of die space each (and that's the old Orin, they may be larger on the new 21b transistor Orin).

I'm saying that Nvidia might be able to squeeze 4 Orin SMs onto Dane, which would take up 17.6mm2, almost a 50% increase over TX1, and even then they're going to have to find that space somewhere else on the die. When people are saying they expect something like the Orin NX GPU for Dane, they're talking about tripling the die area of SMs over TX1, which I think is unrealistic unless you expect a significantly larger die, which is just not something I expect.
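The die area argument above is easy to reproduce. A quick sketch using only the figures from the post (121mm2 TX1 die, 12mm2 of SMs on TX1, about 4.4mm2 per Orin SM); the helper names are just for illustration.

```python
TX1_DIE_MM2 = 121.0      # TX1 die size quoted in the post
TX1_SM_AREA_MM2 = 12.0   # area spent on SMs in TX1, per the post
ORIN_SM_MM2 = 4.4        # approximate area of one Orin SM, per the post


def sm_area_mm2(sm_count):
    """Total die area taken by sm_count Orin-style SMs."""
    return sm_count * ORIN_SM_MM2


for sms in (4, 6, 8):
    area = sm_area_mm2(sms)
    print(sms, "SMs:",
          round(area, 1), "mm2,",
          round(area / TX1_SM_AREA_MM2, 2), "x the TX1 SM area,",
          round(100 * area / TX1_DIE_MM2, 1), "% of a 121mm2 die")
```

4 SMs come to 17.6mm2 (about 1.47x the TX1 SM area, the "almost 50% increase"), and 8 SMs to 35.2mm2 (about 2.9x, the "tripling").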

Thraktor


His main argument is that anything bigger than 4 A78 + 4 A55s with 4SMs would be "Very Unlikely" due to size restrictions.

(Although as Z0m3le pointed out, the numbers he is running are likely far off on the size argument because he is comparing the TX1 to Orin, when he should be comparing Xavier to Orin, as we have a proper GPU measurement for Xavier, Xavier is 8SMs too, and Orin is on a node twice as dense.)

TBH, I feel Thraktor is sort of ignoring the reality of the situation here: the sort of config he is suggesting could cost as much as, or more than, the 8 A78C + 8SM config, but perform far worse.

It would cost that much due to R&D costs piling up from diverging so much from Orin
(A55s, non-cache-based A78s, a redesigned cooler and power delivery to push 4SMs to the clocks needed to make everything work, upfront and accelerated development costs for a DLSS replacement for launch versus being able to use DLSS at launch and develop the in-house version for longer, etc.)


Like, TBH, 6 A78Cs with 6SMs is realistically the bottom line now, and even then that would only be marginally cheaper than 8 A78Cs and 8SMs, since the GPU die size wouldn't go down much because it needs that full GPC.

Nintendo account for more of Nvidia's revenue than the entire automotive industry. The amount they'd lose by producing tens of millions of dies at twice the size they need to be would absolutely dwarf the R&D costs of using a different GPU configuration (which hey, they're going to have to do anyway, because they're clearly not using full Orin).

My point about 4xA78+4xA55 is that it would require new R&D costs for Nintendo that A78C cores wouldn't, and that we know a 4xA55 cluster is about the same size as an A78 core, and the performance per clock would also be the same... So 6xA78C cores make more sense than 4xA55 cores, as you get an extra A78 core and 1 A78 core can make up for the 4 A55 cores. DynamIQ also allows that single core to be clocked separately from the gaming cores if needed.

The transistor cost for 6xA78C, not including the extra cache, would only be ~20% larger than Thraktor's CPU configuration, would not have a separate R&D cost, and would offer over 30% more performance. Depending on how much the cache helps, it could be 50% more performance.

Note I said maybe some A55s. I'm not ruling out a full A78 setup, but I'm also not ruling out a couple of A55s. It could be two A55s, which should be sufficient to run Switch's OS, would take up less die space and less power than an A78 core, and would allow the system to run much more efficiently in low-power states (eg sleep mode).

No, Orin NX is Orin S, look at the picture above. Dane is what Nintendo will use, and that isn't Orin NX, it removes those AI components, but it still will have a 10-15 watt TDP and use the same architecture, core counts don't really matter when talking about Dane, it will be the same performance per watt as Orin NX outside of the automated engine stuff.

Orin NX isn't Orin S. You can see from Nvidia's photos that the die size of the Orin AGX and Orin NX chips is identical, so it's clearly a binned full Orin die.
 
Nintendo account for more of Nvidia's revenue than the entire automotive industry. The amount they'd lose by producing tens of millions of dies at twice the size they need to be would absolutely dwarf the R&D costs of using a different GPU configuration (which hey, they're going to have to do anyway, because they're clearly not using full Orin).
Is your note about Nvidia losing money tied to a larger die size while charging Nintendo the same amount per chip? Isn't an increase in price for the next SoC very likely?

Also, given TX1 was on a 20nm process and we are expecting Dane to be on an 8nm process, and assuming they drop the unused A53 clusters in the new custom design for Nintendo, how much extra die space will be available to them at 8nm on the same physical chip size as the OG TX1?
 
Also, given TX1 was on a 20nm process and we are expecting Dane to be on an 8nm process, and assuming they drop the unused A53 clusters in the new custom design for Nintendo, how much extra die space will be available to them at 8nm on the same physical chip size as the OG TX1?
Considering that Orin's using the Cortex-A78AE, and Dane's likely to use the Cortex-A78 (4 Cortex-A78 cores & 4 Cortex-A55 cores or the hexa-core (6) configuration of the Cortex-A78C or the octa-core (8) configuration of the Cortex-A78C) by extension, using the Cortex-A53 is definitely not an option.
 
I feel people are ignoring just how small Orin NX is despite it having the same die as Orin?

It's 70x45mm.

That is smaller than the Jetson TX1's board.


And Dane would be able to fit, with the same SoC-to-PCB proportions as Orin NX, within a board of 45x22.5mm (1012.5mm2)

For reference, here is what a 40x30mm (1200mm2) board would look like on top of the RAM and SoC section of the OG Switch's motherboard

Overlay_Mobo.png


It fits, I'd say, considering Orin NX has the SoC and RAM only, that part of the Switch motherboard is in charge of the same thing, and that square is about 200mm2 larger than the rough size half of Orin NX would be.
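For anyone checking the board sizes, a tiny sketch of the arithmetic, taking the smaller board as 45 x 22.5mm (which matches the quoted 1012.5mm2). The function name is just for illustration.

```python
def board_area_mm2(width_mm, height_mm):
    """Area of a rectangular board in mm2."""
    return width_mm * height_mm


orin_nx_board = board_area_mm2(70, 45)   # Orin NX module size from the post
small_board = board_area_mm2(45, 22.5)   # the proposed smaller board
overlay = board_area_mm2(40, 30)         # the rectangle drawn over the Switch board

print("Orin NX board:", orin_nx_board, "mm2")
print("Smaller board:", small_board, "mm2")
print("Overlay rectangle:", overlay, "mm2")
print("Overlay minus smaller board:", overlay - small_board, "mm2")
```

The overlay rectangle is 187.5mm2 bigger than the 45 x 22.5mm board, which is the "200mm2 over" mentioned in the post, rounded.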
 
Of course, it has to be lowered to meet a portable battery life, anyone expecting the device to suddenly not clock lower in this unit isn’t looking at the whole picture, only one part: docked perf
It's important to recognize, however, that batteries have improved a fair amount in 6 years since Switch was likely entered into final engineering. On average, Li-ion battery performance improves at a rate of roughly 5% per year, so Nintendo will be able to squeeze more performance out of a new battery to the tune of ~30% or so, which gives a lot more flexibility on this front. It's not as though they're going to drastically improve hours of battery performance, customers seem quite happy (or at least content) with the battery performance, especially since the Mariko update.
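The ~30% figure follows from the 5% per year rate over 6 years. A quick sketch of both the simple and the compounded version; the 5% rate is the post's own rule of thumb, not a measured value.

```python
def simple_gain(rate_per_year, years):
    """Total improvement if each year's gain is just added up."""
    return rate_per_year * years


def compounded_gain(rate_per_year, years):
    """Total improvement if each year's gain builds on the previous year."""
    return (1 + rate_per_year) ** years - 1


RATE = 0.05   # ~5% per year, as stated in the post
YEARS = 6

print("simple:", round(100 * simple_gain(RATE, YEARS), 1), "%")
print("compounded:", round(100 * compounded_gain(RATE, YEARS), 1), "%")
```

Adding up gives 30%, compounding gives about 34%, so "~30% or so" is a fair (slightly conservative) summary.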
 
Alright, I'm gonna entertain this thought experiment: What if Nintendo also releases a docked variant of Switch 2 that is basically the full AGX Orin chip but:
-auto stuff disabled
-4 CPU cores disabled, so only 8 are running, with CPU clocks capable of running higher than on the Switch 2 hybrid.
-GPU running a full 1GHz (5 TFLOPs?) with all the DLSS and RT hardware.
-The RAM is also only 16GB instead of 32, but its 256-bit bus width with 200+ GB/s of bandwidth is still there.

Why would we get this? Nintendo is interested in making a family of Switch consoles, and this could have its niche just like a handheld. Having higher performance than the hybrid would be the draw, all while being the same price as the regular Switch 2 ($400). No exclusive games either...

This console would take advantage of the full GPU power and RT cores and blow the Series S out of the water without DLSS even factored in, and with DLSS enabled it could perform fairly impressively against PS5.

This machine would run Switch 2 games at native 2K-4K without DLSS, but would also have better performance for the same games, particularly for PS5/Series X ports.

Now the only thing is the CPU though. Could 8 A78s at a full 2GHz each get pretty close to the Ryzen CPU of the Series S and PS5? And if so, how close? And then there's the timing of release. Same time as the Switch 2, or 2-3 years after? 🤔
 
How do I put this... if they're going to do a TV-only non-hybrid variant, it'll be for the original Switch, to bring the console to a hyper-low price (thanks to no battery, no screen and no dock, maybe offering a Pro controller instead of 2 Joycons), something in a potential $100 range, and sell to demographics with tighter entertainment budgets or to emerging markets with low median incomes, as a means to extend the sales longevity of the original Switch.

A more powerful TV-only variant of Dane doesn't seem like it would be in the cards.
 
Alright, I'm gonna entertain this thought experiment: What if Nintendo also releases a docked variant of Switch 2 that is basically the full AGX Orin chip but:
snipped

I... I don't think Nintendo's gonna do a dock-only model for various reasons. I feel if they had, they would've already done so with the current Nintendo Switch. The idea is either having a portable or hybrid device that can be brought on the go. Docked only model would defeat the entire purpose of why Nintendo created the system and development environment to begin with.

Again, anything can change in the future, but I just don't see this happening if it hasn't already.
 
I feel docked only doesn't offer the same niche as handheld only.

The Lite was a cheaper version, but its small size and integrated structure made it more optimised as a form factor. The games still played the same.

A Switch Heavy wouldn't be able to offer any advantage over a normal Switch unless it was performance, but that makes the standard Switch the sub-optimal version.
 
Now the only thing is the CPU though. Could 8 A78s at a full 2GHz each get pretty close to the Ryzen CPU of the Series S and PS5? And if so, how close?
This assumes that the successor to the Snapdragon 8cx Gen 2 is using the octa-core (8) configuration of the Cortex-A78C. That seems to be the case going by how the CPU cores are described in the rumour from Roland Quandt, which seems to be vindicated by the information written under "CPU Information" (here and here) in the Geekbench 5 benchmarks. Comparing against the Geekbench 5 benchmarks for the AMD 4700S, which is based on the PlayStation 5's APU, the octa-core configuration of the Cortex-A78C is overall theoretically very close to the Zen 2 CPU in the PlayStation 5 in terms of single-core performance, but it can theoretically range from being ~43.63% to ~64.36% slower than the Zen 2 CPU in the PlayStation 5 in terms of multi-core performance.
 
I feel docked only doesn't offer the same niche as handheld only.

The Lite was a cheaper version, but its small size and integrated structure made it more optimised as a form factor. The games still played the same.

A Switch Heavy wouldn't be able to offer any advantage over a normal Switch unless it was performance, but that makes the standard Switch the sub-optimal version.
I dunno, the Lite model isn't even that good. Battery life is only barely better than the V1 Switch. If the battery was similar to V2, I would only maybe recommend it.

With my hypothetical scenario, I'm suggesting a reason to buy a docked-only model for those on the fence, and that's more performance in docked. Higher resolution, framerates, etc. Essentially a Switch Pro with performance for 3rd party games that more closely matches that of the Series S and PS5.

Or they could make it $50-100 cheaper with the same specs as the hybrid.
 
There are several points that need to be clarified.

- Orin (NX) will never be used in a home console device. Its massive >350mm2 die size (actually closer to 400~450mm2) would make it more expensive than Series X's chipset.

- It seems that the node used for this chip doesn't rely on HD libraries, while 8LPA has been advertised with a power and density upgrade over 8LPU (which was the last upgrade announced before the Ampere line announcement). It's supposed to have a density between 48MTr/mm2 in the best case scenario and 37MTr/mm2 for a 450mm2 die size, which is lower than the E9820/9825's 67MTr/mm2. While T239 could reuse some if not most of the T234 design and architecture, it will probably use a different process, whether that is another version of 8N (SEC8N+/8LPA?) or 7LPP, which uses some EUV layers in order to lower the power consumption. We will have to wait for the GA103 release to see if there has been any improvement from Samsung on their 8 nm process, especially in the density area.

- Packing more than 4 SMs in a 120mm2 SoC for Switch 2 will be more or less impossible with Orin's density on the exact same process. That means they will probably use a different GPU design (one that supports uHD libraries) than the one used on the Orin SoC. That said, I would expect Samsung Foundry in 2022 to be able to pack 10BTr into 120mm2 chipsets on their 8 nm DUV node, the same way TSMC did with the Apple A12X/Z in 2018. Or at least be able to propose the same price with a newer node.
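The density figures in these points can be cross-checked with simple transistors-per-area arithmetic. A rough sketch using only numbers from the post (37 and 48MTr/mm2, 350 and 450mm2 dies, 67MTr/mm2 for the E9820/9825, 10BTr in 120mm2); the function names are just for illustration.

```python
def density_mtr_per_mm2(transistors_billion, area_mm2):
    """Transistor density in millions of transistors per mm2."""
    return transistors_billion * 1000 / area_mm2


def transistors_billion(density_mtr, area_mm2):
    """Transistor count in billions implied by a density and a die area."""
    return density_mtr * area_mm2 / 1000


# Both ends of the quoted Orin range imply roughly the same transistor count.
print("37 MTr/mm2 x 450 mm2:", round(transistors_billion(37, 450), 2), "B")
print("48 MTr/mm2 x 350 mm2:", round(transistors_billion(48, 350), 2), "B")

# Density needed for the hoped-for 10BTr in a 120mm2 chip.
needed = density_mtr_per_mm2(10, 120)
print("10BTr in 120 mm2 needs:", round(needed, 1), "MTr/mm2")
print("vs E9820/9825 at 67 MTr/mm2:", round(needed / 67, 2), "x")
```

Both ends of the quoted range work out to roughly 17BTr, while 10BTr in 120mm2 would need about 83MTr/mm2, higher than even the 67MTr/mm2 quoted for the E9820/9825. That lines up with the point that a denser library or a newer node would be needed.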
 
I dunno, the Lite model isn't even that good. Battery life is only barely better than the V1 Switch. If the battery was similar to V2, I would only maybe recommend it.

With my hypothetical scenario, I'm suggesting a reason to buy a docked-only model for those on the fence, and that's more performance in docked. Higher resolution, framerates, etc. Essentially a Switch Pro with performance for 3rd party games that more closely matches that of the Series S and PS5.

Or they could make it $50-100 cheaper with the same specs as the hybrid.
Aside from a super-cheap EoL revision like I suggested for the OG Switch, the only reason I can think of to see a TV-only variant would be if Nintendo has usage statistics that suggest there's a substantive customer base that uses their Switch near-exclusively while docked, or has enough feedback from people to suggest they want one that way. And if there were such statistics, I feel like we would have seen them by now.
 
Please read this staff post before posting.

Furthermore, according to this follow-up post, all off-topic chat will be moderated.

