
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

Most importantly, there are some key points that are still totally unknown about Orin, like frequency of the Cortex-A78s they're using with it, the core config, what type of RAM it will be configured with by default (understanding that might change depending on how custom Nintendo goes)... and we know next to NOTHING about what the Ampere GPU in this SoC configuration will actually be capable of or what fab process will be used, which would be the most telling bit of all. By knowing the TDP of each chip in the Orin lineup, we can make reasonable and more precise estimations of what performance is actually achievable when seeing a 15W TDP Orin chip compared to the 15W TDP Tegra X1 used by Switch at launch.

In other words, seeing specs of actual Orin hardware for the first time gives us an opportunity to determine the absolute minimum performance floor.
I would really only call it an approximation of the SoC that could be used; it’s unknown whether the actual SoC is even more custom than we imagine or closer to a stock SoC.
 
And although this is not relevant to the DLSS model*, I think this is relevant to a potential DLSS model* refresh.
 
We're likely so close to Orin news (less than 24 hours, in fact), it's kinda exciting to maybe get a legitimately better picture of what the future of Nintendo hardware will look like when details drop.
The keynote is in 8 hours, but the Orin and Jetson stuff might be later.
A friendly reminder for the thread:

Jensen Huang keynote
Tue Nov 9, 12am PST / 9am CET

DLSS in Frostbite
Tue Nov 9, 5am PST / 2pm CET

"Jetson product roadmap"
Tue Nov 9, 10am PST / 7pm CET

"Orin product line up"
Wed Nov 10, 7am PST / 4pm CET

You must register, which is free, to view.
 
And although this is not relevant to the DLSS model*, I think this is relevant to a potential DLSS model* refresh.
Yeah, although RAM can be swapped out pretty quickly so it likely depends on the deal Nintendo can get with Samsung on it.

Unlikely, but not impossible as RAM is the most easily tweakable thing about Dane that can change closer to launch (like the OG Switch changed from 3GB to 4GB).

It would be nice to see them use LPDDR5X

It would help close the gap with the Series S: if a normal config of LPDDR5 would be around 102GB/s and LPDDR5X is 30% faster, that would be 133GB/s at 20% less power. If they add that 20% power back (matching what LPDDR5 would consume at the same config), they would hit nearly 160GB/s.

EDIT:
NVM, apparently it's not in mass production yet.
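The percentage chain in this post can be checked with a few lines of Python. This is only a sketch of the poster's own arithmetic: the 102GB/s baseline, the 30% speed gain and the 20% power figure are the round numbers from the post, and the idea that saved power converts one-to-one into extra bandwidth is the post's assumption, not something a spec guarantees.

```python
# Reproduces the percentage chain from the post above.
# Every input is the poster's own round number, not a measured or spec value.

LPDDR5_BANDWIDTH_GB_S = 102.0  # "normal config" LPDDR5 bandwidth quoted in the post
LPDDR5X_SPEED_GAIN = 0.30      # claimed LPDDR5X speed-up over LPDDR5
LPDDR5X_POWER_SAVING = 0.20    # claimed LPDDR5X power saving over LPDDR5


def scale(value: float, gain: float) -> float:
    """Return value increased by a fractional gain (0.30 means +30%)."""
    return value * (1.0 + gain)


# Step 1: same configuration, 30% faster memory.
lpddr5x_same_config = scale(LPDDR5_BANDWIDTH_GB_S, LPDDR5X_SPEED_GAIN)

# Step 2 (ASSUMPTION made by the post, not by any spec): the 20% power
# saving can be traded back for 20% more bandwidth.
lpddr5x_power_matched = scale(lpddr5x_same_config, LPDDR5X_POWER_SAVING)

print(f"LPDDR5X, same config:   {lpddr5x_same_config:.1f} GB/s")
print(f"LPDDR5X, power matched: {lpddr5x_power_matched:.1f} GB/s")
```

The two results land at roughly 133GB/s and 159GB/s, which matches the "133GB/s" and "nearly 160GB/s" figures in the post.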
 
Samsung made no mention of LPDDR5X DRAM being in mass production, though. So by the time Samsung starts mass production of LPDDR5X DRAM, the DLSS model* would probably already be released. (Samsung announced the development of LPDDR5 DRAM on 17 July 2018 and the start of LPDDR5 mass production on 25 February 2020.)
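To put a number on that LPDDR5 precedent, here is a small date calculation. The two LPDDR5 dates are the ones given in the post; the LPDDR5X announcement date used in the example (9 November 2021) is an assumption for illustration, and nothing says LPDDR5X has to follow the same schedule.

```python
from datetime import date, timedelta

# Dates quoted in the post above.
LPDDR5_DEV_ANNOUNCED = date(2018, 7, 17)
LPDDR5_MASS_PRODUCTION = date(2020, 2, 25)

# Lag between the development announcement and mass production for LPDDR5.
lpddr5_lag: timedelta = LPDDR5_MASS_PRODUCTION - LPDDR5_DEV_ANNOUNCED


def project_mass_production(dev_announced: date, lag: timedelta = lpddr5_lag) -> date:
    """Naively project mass production by reusing the LPDDR5 lag."""
    return dev_announced + lag


# ASSUMPTION: hypothetical LPDDR5X development announcement date.
ASSUMED_LPDDR5X_DEV_ANNOUNCED = date(2021, 11, 9)

print(f"LPDDR5 lag: {lpddr5_lag.days} days")
print("Naive LPDDR5X projection:",
      project_mass_production(ASSUMED_LPDDR5X_DEV_ANNOUNCED).isoformat())
```

The LPDDR5 lag works out to 588 days, a bit over 19 months, which is why a similar lag would push LPDDR5X mass production well past a near-term hardware launch.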
 
Do we have expected release times for the new Jetson/Orin chips, or can we expect that at the press conference? Also, what is the difference between the two?
That's a possibility.

Jetson is a series of embedded Arm based SoCs from Nvidia designed for machine learning applications. Orin is the upcoming Arm based automotive SoC from Nvidia that's scheduled to be released in 2022.

So Jetson Orin is a pared-down variant of Orin, configured for machine learning applications instead of automotive applications.
 
Do we have expected release times for the new Jetson/Orin chips, or can we expect that at the press conference? Also, what is the difference between the two?
Well, Xavier was officially showcased (as in, not just hinted/alluded/vaguely referred to) at Hot Chips in August 2018 and came out in March 2019.

Considering we have already had the "Vague Phase" of the Orin SoC since its announcement, we can expect Jetson Orin's turnaround to be similar or shorter. So it will likely be out alongside the other Orin chips in Q1 2022.
 
I would really only call it an approximation of the SoC that could be used; it’s unknown whether the actual SoC is even more custom than we imagine or closer to a stock SoC.
Most of the configurability is going to be in cache, RAM, etc. Maybe a swap of the CPU to a more standard Cortex-A78 (A78AE is highly likely to be unnecessary for Nintendo's specific application of the CPU). I don't see much that would be reconfigured in the Ampere GPU except the number of cores and clock speeds, but... well, if Nintendo's going to customize it, it'll be to squeeze more performance out of it, not less. Hence why I said "performance floor", because what Orin can do in the 10-20W TDP range in the SoCs they announce is going to tell us the barest minimum of what to expect before customizations. And properly-set expectations are always better.
 
Well, more accurately, it helps us extrapolate performance at that level more properly (as there is a likelihood that Jetson Orin would be derived from Big Orin rather than a cut-down Dane).

Although it would be impressive if the majority of that giant SoC is actually the A78AEs and the automotive elements, and Jetson Orin is far smaller than Big Orin because of the swap to A78Cs and the removal of the automotive elements. (And maybe a shrink to Samsung 7nm.)

Either way, we will get some indication of what Dane will be like from this, even if Jetson Orin is derived from Big Orin GPU-wise (the GPU is the main size factor here for performance comparisons, since going past 8 cores won't get you much extra performance anyway).
 
2K is actually 2048x1080, not 2000x1080, and Nintendo makes consoles for consumer-grade TVs, so I doubt they'd make a console that outputs QHD, which is tied to PC monitors. I doubt all consumer-grade TVs accept QHD, and consumer TVs are what Nintendo will chase.
1440p can't be displayed on 4K TVs? I mean, we have PS4 Pro, Xbox One X, and current-gen Xbox Series S/X and PS5 games that run at QHD/1440p resolution. It's very likely we'll see resolutions between Full HD and 4K.

And although this is not relevant to the DLSS model*, I think this is relevant to a potential DLSS model* refresh.
Yeah, probably for power savings only. Sigh. I'm guessing LPDDR5X is a Q4 2022 release then. Hard to believe LPDDR5 will be 2 years old next quarter.
 
Q4:
They say semiconductor supplies will be very tight during 2022 and into 2023, so how do you see this affecting Nintendo Switch production quantities? Would it be possible to boost production through adjusting hardware design? Also, how is the hardware supply shortage affecting software development?
A4:
Furukawa: Since the start of this fiscal year, we have continually said that semiconductor components are in short supply and the future is uncertain. There has not been a major improvement in the situation so far, which is why we do not think we can produce hardware in the quantities we initially expected and have revised our unit sales forecast for the second half. However, even though hardware supplies might be limited, there has not been any impact on software sales. We will continue to share the appeal of new and evergreen titles with consumers through the end of this year and into next year. By maximizing our software sales as much as possible, we aim to maintain the momentum of the Nintendo Switch business. We are reviewing our unit sales forecast for the next fiscal year and beyond, so we would like to revisit that topic once we reach the point where we can make specific announcements.

Ko Shiota (Director, Senior Executive Officer): To mitigate the effects of the tight supply situation for semiconductor components, those working on Nintendo Switch hardware development are also continuing to take measures such as evaluating alternative components and reviewing our designs.

Miyamoto: With so many consumers worldwide having already bought a Nintendo Switch, the situation has not had any major effects on software development. In addition, hardware is not an issue for our mobile business or our visual content business.
Q5:
I'd like to hear your thoughts about the next generation of hardware, both its concept and launch timing.
A5:
Furukawa: We are not able to comment about the next game system at this time. It is now the fifth year since the launch of Nintendo Switch, and the total hardware sell-through has exceeded 90 million units. We recognize that the system is at the mid-point of its lifecycle. The launch of Nintendo Switch – OLED Model has also been contributing to continued sales momentum and we are now offering consumers three Nintendo Switch models to match their play styles and lifestyles, as well as a wide range of software. With this, we believe a foundation for growth has been laid that exceeds what we previously considered to be a conventional hardware lifecycle. With regards to the next game system, we are considering many different things, but as far as the concept and launch timing are concerned, there is nothing we can share at this time
 
Crazy how much the official phrasing on that last sentence changes the message. It’s pretty much a non statement.
I mean, what do you expect? "Yeah, we have new hardware around the corner and it will have so-and-so features." You could see their competitors already taking notes.
 
The 15W Orin S chip, as I understand, is running the full Orin chip which is advertised at 15-50W. That would mean that Dane is probably less powerful than this 15W version of Orin IMHO.

Also, there's no info concerning the node used or the GPU configuration/core clocks. We will have to wait for the dedicated Orin/Jetson presentations for more detailed information.
 
I mean, what do you expect? "Yeah, we have new hardware around the corner and it will have so-and-so features." You could see their competitors already taking notes.
No I know that, I'm just comparing to the unofficial translations from before where it made it seem like they were still actively in the concept phase of designing the next gen. The official translation merely says that they have nothing to share with regards to the concept itself. It went from "oh we're still in the early days of designing the thing" to "we can't even talk about what it might be OR its release date".

Again, I never paid attention to even the initial translations and at most believed Nintendo was just playing with words. I'm just pointing out how tricky Japanese to English translation can be and how we should all wait for official and/or professional translations to pass judgement.
 
Crazy how much the official phrasing on that last sentence changes the message. It’s pretty much a non statement.
"We think we got this as is, please pay no attention to the SoC behind the curtain."
The 15W Orin S chip, as I understand, is running the full Orin chip which is advertised at 15-50W. That would mean that Dane is probably less powerful than this 15W version of Orin IMHO.

Also, there's no info concerning the node used or the GPU configuration/core clocks. We will have to wait for the dedicated Orin/Jetson presentations for more detailed information.
15-50W is the advertised TDP range, based on the multiple different designs/configurations. If Orin S is 15W TDP, well... the original Tegra X1 was advertised at a 15W TDP, so... unless scaling back the CPU from A78AE makes a difference in TDP (?), that'll give us some good info to start from.

I also remember Orin S was tentatively being advertised at 10W TDP, so that'd be a bump up from where it was originally.
 
Jensen Huang announced that the Nvidia Drive Hyperion 8, which contains 1-2 Orin SoCs, is "available today". Not sure if this actually indicates that the Orin SoC is shipping now. Here are the Orin SoC specs:

[Image: Orin SoC specs (4jgncyq.png)]


The "A78A" in the specs is probably a typo of A78AE. It's a bit surprising that Clara Holoscan uses the stock Orin, despite A78AE being intended for automotive applications.
 
"We think we got this as is, please pay no attention to the SoC behind the curtain."

15-50W is the advertised TDP range, based on the multiple different designs/configurations. If Orin S is 15W TDP, well... the original Tegra X1 was advertised at a 15W TDP, so... unless scaling back the CPU from A78AE makes a difference in TDP (?), that'll give us some good info to start from.

I also remember Orin S was tentatively being advertised at 10W TDP, so that'd be a bump up from where it was originally.
I believe that the full Orin chip can scale down to 15W, whether by deactivating CPU clusters/GPU TPCs and/or by downclocking the CPU/GPU/DLA. Orin S would be the equivalent of the (Xavier) NX version of the full Xavier chip.
 
Most of the configurability is going to be in cache, RAM, etc. Maybe a swap of the CPU to a more standard Cortex-A78 (A78AE is highly likely to be unnecessary for Nintendo's specific application of the CPU). I don't see much that would be reconfigured in the Ampere GPU except the number of cores and clock speeds, but... well, if Nintendo's going to customize it, it'll be to squeeze more performance out of it, not less. Hence why I said "performance floor", because what Orin can do in the 10-20W TDP range in the SoCs they announce is going to tell us the barest minimum of what to expect before customizations. And properly-set expectations are always better.
I suppose my issue is with the phrasing: you don’t necessarily “swap” one CPU core for another like a jigsaw puzzle piece. The A78AE would be more integral to the design of the chip and its cache setup.

It’s semantics, but I feel like using “replace” sounds better. Swapping seems to make it appear as though it can flip back and forth no biggie. Replace on the other hand is unidirectional for this.

Again, the semantics :p
 
Jensen Huang announced that the Nvidia Drive Hyperion 8, which contains 1-2 Orin SoCs, is "available today". Not sure if this actually indicates that the Orin SoC is shipping now. Here are the Orin SoC specs:

[Image: Orin SoC specs (4jgncyq.png)]


The "A78A" in the specs is probably a typo of A78AE. It's a bit surprising that Clara Holoscan uses the stock Orin, despite A78AE being intended for automotive applications.
The GPU in the Orin SoC is surprisingly beefy in terms of integer math. In the GA10x whitepaper (page 12), RTX 3080 is listed with 238 TOPS for INT8 data type with tensor structure and 29.8 TFLOPS for FP32 data type with or without tensor structure. So compared to RTX 3080, this SoC has 70% of the throughput in integer math but only around 17% of the throughput in FP math.

Also, it's a bit confusing why TOPS is the unit of measurement for floating point math, but I guess it's just semantics to unify the performance metric.
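For reference, the percentages in this post can be turned back into absolute figures. The sketch below uses only the RTX 3080 numbers and the 70% / 17% ratios quoted in the post, so the results are the throughput those ratios imply, not values read from the Orin spec slide. On the units: TOPS counts trillions of operations per second of any kind, while TFLOPS counts trillions of floating point operations per second, so both share the same 10^12 scale.

```python
# RTX 3080 figures as quoted in the post (from the GA10x whitepaper).
RTX_3080_INT8_TOPS = 238.0
RTX_3080_FP32_TFLOPS = 29.8

# Ratios stated in the post; the Orin values below are derived from them,
# NOT taken from the Orin spec slide itself.
ORIN_INT8_FRACTION = 0.70
ORIN_FP32_FRACTION = 0.17


def implied_throughput(reference: float, fraction: float) -> float:
    """Throughput implied by a fraction of a reference part's throughput."""
    return reference * fraction


implied_orin_int8_tops = implied_throughput(RTX_3080_INT8_TOPS, ORIN_INT8_FRACTION)
implied_orin_fp32_tflops = implied_throughput(RTX_3080_FP32_TFLOPS, ORIN_FP32_FRACTION)

print(f"Implied Orin INT8: {implied_orin_int8_tops:.1f} TOPS")
print(f"Implied Orin FP32: {implied_orin_fp32_tflops:.2f} TFLOPS")
```

That gives roughly 167 TOPS of INT8 and about 5.1 TFLOPS of FP32, which makes the gap between the integer and floating point ratios easy to see.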
 
And although this is not relevant to the DLSS model*, I think this is relevant to a potential DLSS model* refresh.
Yeah, probably for power savings only. Sigh. I'm guessing LPDDR5X is a Q4 release then. Hard to believe LPDDR5 is 3 years old now. There is some speculation it could be available as soon as the Samsung Galaxy S22 phone models though, which will be out in February I believe.
Yeah, although RAM can be swapped out pretty quickly so it likely depends on the deal Nintendo can get with Samsung on it.

Unlikely, but not impossible as RAM is the most easily tweakable thing about Dane that can change closer to launch (like the OG Switch changed from 3GB to 4GB).

It would be nice to see them use LPDDR5X

It would help close the gap with the Series S: if a normal config of LPDDR5 would be around 102GB/s and LPDDR5X is 30% faster, that would be 133GB/s at 20% less power. If they add that 20% power back (matching what LPDDR5 would consume at the same config), they would hit nearly 160GB/s.

EDIT:
NVM, apparently it's not in mass production yet.
But I wonder if LPDDR5X allows 20% lower power consumption and a 30% boost at the same time at max speed (8.5Gbps). I don't think LPDDR4X did, so I'm assuming no. It probably runs at the same voltage/power draw as LPDDR5 when it's at max speed (8.5Gbps vs LPDDR5's 6.4Gbps).
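The per-pin rates mentioned here map to peak bandwidth through the usual relation: bandwidth in GB/s equals per-pin rate in Gbps times bus width in bits, divided by 8. The sketch below applies that to the 6.4Gbps and 8.5Gbps figures from the post. The 128-bit and 256-bit bus widths are assumptions borrowed from the configurations speculated about in this thread, not confirmed specs.

```python
def peak_bandwidth_gb_s(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth: per-pin rate times bus width, in bytes."""
    return pin_rate_gbps * bus_width_bits / 8.0


LPDDR5_PIN_RATE_GBPS = 6.4   # max speed quoted in the post
LPDDR5X_PIN_RATE_GBPS = 8.5  # max speed quoted in the post

# ASSUMPTION: bus widths speculated about in the thread, not confirmed specs.
BUS_WIDTHS_BITS = (128, 256)

for width in BUS_WIDTHS_BITS:
    lpddr5 = peak_bandwidth_gb_s(LPDDR5_PIN_RATE_GBPS, width)
    lpddr5x = peak_bandwidth_gb_s(LPDDR5X_PIN_RATE_GBPS, width)
    print(f"{width}-bit bus: LPDDR5 {lpddr5:.1f} GB/s, LPDDR5X {lpddr5x:.1f} GB/s")

# Relative gain from the pin rate alone (about 33%).
pin_rate_gain = LPDDR5X_PIN_RATE_GBPS / LPDDR5_PIN_RATE_GBPS - 1.0
print(f"Pin-rate gain: {pin_rate_gain:.1%}")
```

On a 128-bit bus this gives 102.4GB/s for LPDDR5 and 136GB/s for LPDDR5X, so the roughly 30% gain already corresponds to the maximum pin rate.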
 
Nice find!

Hmm... imagine if Nintendo releases Switch 2 with two different models (not necessarily at the same time): a hybrid Switch using Orin S, and a docked-only Switch using a full Orin X.

Both models would use 8 A78AE cores (remaining 4 disabled), with higher clocks on the home console variant, but 128-bit RAM in the hybrid and 256-bit in the docked model (the latter offering twice as much bandwidth and 12GB of RAM vs 8). In the hybrid, GPU cores are essentially halved and run at lower clock speeds: 5.2 TFLOPS vs 2.0 TFLOPS and 0.9 TFLOPS. The home variant competes directly with the Xbox Series S and offers superior GPU power, ray tracing and DLSS, essentially close in performance to an RTX 2060. Hell, the home variant could use LPDDR5X in my thought experiment and at least match the RAM bandwidth of the Series S.

The CPU is the only issue I have, because we want smooth compatibility with all Switch 2 games. Previous pro models used the same CPU as the base models but at higher clocks. I wonder if activating all 12 CPU cores and increasing the clock speed for the Orin X Switch 2 home variant would put performance in line with the Xbox Series S 🤔
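The TFLOPS figures in this thought experiment can be sanity-checked with the standard peak FP32 estimate: 2 operations (one fused multiply-add) per CUDA core per clock. The sketch below works backwards from each TFLOPS figure to the GPU clock it would need. The 2048-core count is the figure cited elsewhere in this thread for the full chip, the halved count follows the post, and the "docked" and "handheld" labels for 2.0 and 0.9 TFLOPS are an assumed reading of the post.

```python
def fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Peak FP32 estimate: 2 FLOPs (one FMA) per CUDA core per clock."""
    return 2 * cuda_cores * clock_ghz / 1000.0


def implied_clock_ghz(tflops: float, cuda_cores: int) -> float:
    """GPU clock needed to hit a given peak FP32 figure."""
    return tflops * 1000.0 / (2 * cuda_cores)


FULL_CUDA_CORES = 2048                    # full-chip figure cited in the thread
HALVED_CUDA_CORES = FULL_CUDA_CORES // 2  # "GPU cores are essentially halved"

# ASSUMPTION: the mode labels are one reading of the post's three figures.
scenarios = [
    ("home variant", 5.2, FULL_CUDA_CORES),
    ("hybrid, docked", 2.0, HALVED_CUDA_CORES),
    ("hybrid, handheld", 0.9, HALVED_CUDA_CORES),
]

for name, tflops, cores in scenarios:
    clock = implied_clock_ghz(tflops, cores)
    print(f"{name}: {tflops} TFLOPS on {cores} cores needs about {clock:.2f} GHz")
```

Under these assumptions the home variant would need about 1.27 GHz, while the hybrid figures correspond to roughly 0.98 GHz and 0.44 GHz.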
 
Curiously, the CPU clock tops out at 2GHz and the GPU tops out at 1GHz. Probably due to the number of cores this chip pushes.
 
Nice find!

Hmm... imagine if Nintendo releases Switch 2 with two different models (not necessarily at the same time): a hybrid Switch using Orin S, and a docked-only Switch using a full Orin X.

Both models would use 8 A78AE cores (remaining 4 disabled), with higher clocks on the home console variant, but 128-bit RAM in the hybrid and 256-bit in the docked model (the latter offering twice as much bandwidth and 12GB of RAM vs 8). In the hybrid, GPU cores are essentially halved and run at lower clock speeds: 5.2 TFLOPS vs 2.0 TFLOPS and 0.9 TFLOPS. The home variant competes directly with the Xbox Series S and offers superior GPU power, ray tracing and DLSS, essentially close in performance to an RTX 2060. Hell, the home variant could use LPDDR5X in my thought experiment and at least match the RAM bandwidth of the Series S.

The CPU is the only issue I have, because we want smooth compatibility with all Switch 2 games. Previous pro models used the same CPU as the base models but at higher clocks. I wonder if activating all 12 CPU cores and increasing the clock speed for the Orin X Switch 2 home variant would put performance in line with the Xbox Series S 🤔
There is no point in using a binned chip for a console. A78AE is less powerful than A78A and A78C. Finally, you don't need a 17-billion-transistor chip for a 5 TFLOPS device (Series S = 8 billion transistors / Series X = 15 billion).
 
A78AE is less powerful than A78A and A78C
How do we know that? It’s still an A78 at its core (pun not intended, maybe), but with extra features suited to its specific use case. And why would they use weaker CPU cores for a device that is also supposed to be used in the automotive and healthcare industries? Having less or slower performance is not exactly a good thing to put in these devices; the opposite, really. You'd want the most performance at low power consumption, with pretty speedy work across 12 cores (2x 6-core A78AE clusters).

And finally, I’m pretty sure that it’s a typo and it’s meant to be A78AE not A78A. ARM doesn’t have a thing called A78A afaik.
 
A78AE uses extra features (error correction for sensitive and complex critical tasks) at the cost of performance compared to A78/A78C (I thought the A78A naming existed).
 
There's no A78A cpu, it's just a typo.

In any case, the usage of the AE over the C doesn't really change anything. I wonder what kopite7kimi thinks of this since there are no indications of the GPU being more than Ampere
 
A78A is a typo in the Orin presentation. I wrote A78A as a reference for the mobile A78 variant used in mainstream smartphone SoCs but it seems that the original model is just called A78 without the A. I made a mistake here.
 
Curiously, Orin has 3 clusters of 4 A78s rather than 2 clusters of 6 A78s. Maybe we won't get the A78C and just get 8 cores via 2 groups of 4 cores 🤔
 
In any case, the usage of the AE over the C doesn't really change anything. I wonder what kopite7kimi thinks of this since there are no indications of the GPU being more than Ampere
I think it could be the case that Nvidia officially said that the GPU on AGX Orin is based on Ampere, but the GPU in AGX Orin borrows features from Lovelace, similar to how Nvidia officially said that the GPU on the Tegra X1 is based on Maxwell, but the GPU on the Tegra X1 borrows features from Pascal. kopite7kimi did mention that Lovelace is roughly similar to Ampere after all.

Anyway, I think a die shot of AGX Orin could offer more insight on the GPU and possibly which process node is being used. Also, some people will be disappointed there's no mention of RT cores in the AGX Orin technical brief. And considering Nvidia so far has made no mention of which process node is being used to fabricate AGX Orin, I'm inclined to believe Samsung's 8N process node is being used.
 
Knowing the GPU SM count/core clock configuration at 15W would definitely tell us what to expect from a dedicated 15W chip made for Nintendo with A78/8nm/Ampere technologies.

I would expect them to halve the SM and CPU counts with a significant downclock in order to reach a tablet-like TDP for Switch 2. Maybe the 15W power configuration is already using the Switch 2 SM/CPU count.
 
I don't think it would give us too much. If Xavier is anything to go by, the 15W mode just disables one or two of the CPU clusters and then downclocks everything. You still have access to all 2048 GPU cores and whatnot.
 
Switch 2 obviously won't use 12 CPU cores and 2048 CUDA cores. What I am interested in is the number of CPU cores, the SM count and the core clocks needed to reach the Orin 15W power profile.

A similar chip would easily use less power to reach the same performance, due to the smaller die size and fewer hardware-accelerated parts on the chip. It would give us the most realistic estimate for an 8nm chip running 4 to 8 A78 cores (I would expect the 15W power profile to deactivate one or two CPU clusters; maybe they will keep them all with a heavy downclock, but I wouldn't expect a 6-core profile, as it would mean deactivating 2 cores in one cluster) and 8 SMs, as they are using 2 GPCs with 8 SMs.
 
Where did you manage to find this?
From the Jetson AGX Orin data sheet (v0.2). Regarding its GPU Architecture:
There are multiple texture processing clusters (TPC) units within a graphics processing cluster (GPC), each TPC includes two SMs, a Polymorph Engine, two Texture Units, and a Ray Tracing core (RTcore). Each GPC includes a Raster Engine (ROP), which can access all of memory. Each SM is partitioned into four separate processing blocks, each with its own instruction buffer, scheduler and 128 CUDA cores.

The GPC is a dedicated hardware block for rasterization, shading, texturing, and compute. The GPU's core graphics functions are performed inside the GPC. Inside the GPC, the SM CUDA cores perform pixel/vertex/geometry shading and physics/compute calculations. Texture units perform texture filtering and load/store units fetch and save data to memory. Special Function Units (SFUs) handle transcendental and graphics interpolation instructions. Tensor cores perform matrix multiplies to greatly accelerate DL inferencing. The RTcore unit assists ray- tracing by accelerating Bounding Volume Hierarchy (BVH) traversal and intersection of scene geometry during ray tracing.

Finally, the PolyMorph engine handles vertex fetch, tessellation, viewport transform, attribute setup, and stream output. The SM geometry and pixel processing performance make it highly suitable for rendering advanced user interfaces and complex gaming applications. The power efficiency of the Ampere GPU enables this performance on devices with power-limited environments.
Edit: I scanned through the data sheet quickly and couldn't find any info on how to drop it to 15W.
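The block hierarchy described in the data sheet excerpt can be turned into unit counts with simple division. This is a rough sketch, assuming 128 CUDA cores per SM (the figure in the excerpt, read as a per-SM total), two SMs per TPC as stated in the excerpt, and the 2048 CUDA core total mentioned earlier in the thread. The function name and defaults are illustrative only.

```python
def gpu_layout(cuda_cores: int, cores_per_sm: int = 128, sms_per_tpc: int = 2) -> dict:
    """Derive SM and TPC counts from a CUDA core total.

    ASSUMPTIONS: 128 CUDA cores per SM and 2 SMs per TPC, taken from the
    data sheet excerpt; the caller supplies the core total.
    """
    if cuda_cores % cores_per_sm != 0:
        raise ValueError("CUDA core count is not a whole number of SMs")
    sms = cuda_cores // cores_per_sm
    if sms % sms_per_tpc != 0:
        raise ValueError("SM count is not a whole number of TPCs")
    return {"cuda_cores": cuda_cores, "sms": sms, "tpcs": sms // sms_per_tpc}


# Full chip as discussed in the thread, and a hypothetical halved configuration.
print(gpu_layout(2048))
print(gpu_layout(1024))
```

With these assumptions the full chip comes out at 16 SMs in 8 TPCs, and a halved configuration at 8 SMs in 4 TPCs, which lines up with the 8 SM guess made earlier in the thread.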
 