Sounds like wishful thinking to me. All of the evidence pointed to 8nm, and there wasn't really any reason to think that had changed.
Some folks held out hope we'd get something better.
Thanks for this! I knew about the DLSS, but this is a very helpful explanation of the other stuff.
Simplified version, @Mbolibombo and @Chaotic Neutral:
The next Nintendo chip (codenamed Dane) is very likely to be derived from the chip they just announced (a cut-down version, maybe shrunk if we're being really optimistic about the "Orin X" chip, which has yet to be detailed).
There's not much we can take from this in terms of how powerful Dane will be besides more confidence on the previous ballpark ("it should be able to play XB1/PS4 games fine, DLSS should make games look closer to a PS4Pro version with some caveats"), since we don't know how much was cut nor how much they will lower the clocks (for sustained performance and power budget). There are many YouTube videos explaining better than words could what you can expect from DLSS if you're not familiar with it.
For tech enthusiasts, it means a more accurate idea of what it can and can't do, compared to extrapolating from PC/notebook graphics cards (which weren't made with such a low power budget in mind) and guessing at the tech improvements made since those cards were finished. For example, I have more confidence that they will use DLSS as is rather than a lighter version, and (limited) ray tracing seems more likely now.
As for AAA ports, I would expect companies bringing PS360 ports now to move to PS4X1 ports and about as many PS5 "impossible ports" as we got PS4 "impossible ports" on OG. Overall, support should be better, since the publishers who took a couple years to get onboard might be there from the beginning, but don't expect anything close to parity (there are many reasons for games skipping the Switch besides power).
That was always based on nothing but hopes and dreams. Every single rumour we've heard has indicated 8nm.
Some folks held out hope we'd get something better.
It seems very likely to me that he is talking about training here, not inference on the tensor cores. That’s the audience that a deep learning SDK would be targeting, anyway. (With the caveat that I haven’t watched the presentation that fwd-bwd mentioned about chaining/tilting, which I have never heard of.)
NVIDIA increased the speed of the Deep-Learning SDK by 2-3x (Jensen said 3x; lowballing it gives a 2x increase minimum)
Thanks for laying this out for the tech-illiterate like myself. I am officially hyped for Dane now.
Sorry for my typo; it was tiling, not tilting. I'll edit my post.
It seems very likely to me that he is talking about training here, not inference on the tensor cores. That’s the audience that a deep learning SDK would be targeting, anyway. (With the caveat that I haven’t watched the presentation that fwd-bwd mentioned about chaining/tilting, which I have never heard of.)
Limited in terms of functionality. Kinda like what you see on Series S.
Well, none of us expected hundreds of RT cores, if the RT core count is what he means by "limited".
I'm really curious now how 4-8 RT cores compare to a 6500XT (a hypothetical Series S GPU), given the differences between AMD's method (intersection testing) and Nvidia's method (BVH traversal and intersection testing).
Limited in terms of functionality. Kinda like what you see on Series S.
In Nvidia's talk about medical stuff, they mention using ray tracing there, so Orin would be used there. However, this is also for the updated Clara, so there's also a separate GPU for extra performance. There's also some research into using ray tracing for depth perception and identification.
It almost seems as if they're there for compatibility reasons, or maybe they found some limited automotive use-cases for them and decided to keep some limited functionality there.
Anandtech mentioned that Jetson AGX Orin (and Jetson Orin NX by extension) has 17 billion transistors. I think Orin X could be the only chip in the Orin family that has 21 billion transistors.
Using each of the three photos, the calculation in each case comes to a 22.1mm x 20.8mm die (± about 0.1mm), for a die size of approx 460mm2. This tells us a few things:
- The Jetson Orin NX chip is the full Orin die, just binned with parts disabled. This is as I would have expected, but good to get confirmation.
- The Orin die has a density of approx 45.6 million transistors per mm2, assuming 21 billion transistors is still correct.
- This is in line with the density of GA102, GA104, etc., so it's likely using an identical 8N manufacturing process, and isn't using higher-density mobile libraries.
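The die-size arithmetic above can be sketched in a few lines. This is only an illustration using the numbers quoted in the thread (22.1mm x 20.8mm, and the 21 billion vs 17 billion transistor counts); the function names are made up for the example.

```python
# Die dimensions as measured from the photos (figures quoted in the thread).
WIDTH_MM = 22.1
HEIGHT_MM = 20.8


def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Area of a rectangular die in mm^2."""
    return width_mm * height_mm


def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


area = die_area_mm2(WIDTH_MM, HEIGHT_MM)
print(f"die area: {area:.1f} mm^2")

# Both transistor counts that have been floated for Orin.
for label, count in (("21 billion", 21e9), ("17 billion", 17e9)):
    print(f"{label}: {density_mtr_per_mm2(count, area):.1f} MTr/mm^2")
```

If the 17 billion figure is the right one for this die, the density drops to roughly 37 million transistors per mm2, so the comparison with GA102/GA104 depends on which count is correct.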
Are we sure Devastating is the right word here?
Meanwhile I'm reading about new titles supporting DLSS (most recently Jurassic World Evolution 2) and I imagine Switch Dane could potentially have a devastating line-up (obviously not all at launch).
I hope by now even medium and small software houses have their own dev-kits.
Are we sure Devastating is the right word here?
Fair.
for my wallet yes
Really a lot of stuff.
Fair.
Either way, the cache change alone would boost the system around 30% over Ampere.
So 2 TFLOPS of Ampere would effectively become 2.6 TFLOPS, so after DLSS Performance it would be above the GTX 1660 Super, and after DLSS Ultra Performance it would be pushing near the RTX 2070.
And that is only the cache improvement.
The FP16 change, if used, can likely bring at least a 20% boost.
So that would make it equivalent to a GTX 1650 before DLSS, and then around a 2060 after DLSS Performance.
Pretty crazy stuff.
Are you forgetting the Steam Deck?
Really a lot of stuff.
If it weren't for you experts, I'd have a really hard time imagining a small portable device capable of unleashing this much power.
I wonder if there will still be developers able to find excuses not to bring their titles to Nintendo's next console.
To be fair, the Deck is significantly larger than the OG Switch. I expect Switch 2 to be about the same size.
Are you forgetting the steam deck?
SD really pushes the envelope of a "small, portable" device though.
Are you forgetting the steam deck?
Well, in theory yes, but Switch is smaller and really portable. Steam Deck...
Are you forgetting the steam deck?
Looking at the GPU clock, I can't help but think we'll see Switch clocks again, but on the high end. So 920MHz and 460MHz.
It's worth pointing out that the chip with 2 TFLOPS is the Orin NX. We don't know how many TFLOPS the chip Nintendo is going to use will have, but it's very unlikely it will have as many GPU cores as the NX and run the GPU at max clock on top of it.
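For a rough feel of what those clocks would mean, here is a small sketch of the usual peak-FP32 estimate (2 FLOPs per CUDA core per clock, from one fused multiply-add). The 1024-core count is Orin NX's figure as quoted elsewhere in this thread; whether Dane keeps that many cores, and at which clocks, is pure assumption.

```python
def fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Peak FP32 throughput: each core retires one FMA (2 FLOPs) per clock."""
    return 2 * cuda_cores * clock_mhz * 1e6 / 1e12


# Assumption: Dane keeps Orin NX's 1024 CUDA cores (not confirmed anywhere).
ORIN_NX_CUDA_CORES = 1024

# Hypothetical "Switch-like" clocks from the post above.
for mode, mhz in (("docked", 920), ("handheld", 460)):
    print(f"{mode} @ {mhz} MHz: {fp32_tflops(ORIN_NX_CUDA_CORES, mhz):.2f} TFLOPS")
```

Halving the core count halves both numbers, which is why the core count matters as much as the clock in these guesses.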
Steam Deck feels like it's really just a laptop Valve carelessly shoved into a "handheld".
SD really pushes the envelope of a "small, portable" device though.
Doesn't the NX only use half of the GPU cores of full Orin already?
It's worth pointing out that the chip with 2 TFLOPS is the Orin NX. We don't know how many TFLOPS the chip Nintendo is going to use will have, but it's very unlikely it will have as many GPU cores as the NX and run the GPU at max clock on top of it.
What GPU do you have?
I’m excited for a handheld console with a better GPU than my PC
It is more likely that the 2.6 TFLOPS figure (stronger than a 1050 Ti, weaker than a GTX 1650) is the native processing power Dane would have (1660 Super after DLSS Performance, around a 2070 after DLSS Ultra Performance), considering that the full "potential" 3 TFLOPS number requires a high level of optimization around the doubled FP16 rate that Orin/Dane has versus Ampere.
You are known for your enthusiasm regarding the succ specs, but that is going too far.
Based on the recent Steam hardware survey https://store.steampowered.com/hwsurvey/videocard/
What GPU do you have?
Yes, but the NX is a binned Orin, so it doesn't have to worry about die size (which is 460mm2 for both). It also uses 25W at max clocks, which is roughly twice what the OG Switch used docked (max 18W, with 4W being from charging, IIRC).
Doesn't the NX only use half of the GPU cores of full Orin already?
You are known for your enthusiasm regarding the succ specs but that is going far.
Based on the recent Steam hardware survey https://store.steampowered.com/hwsurvey/videocard/
1060
1050ti
1650
and Intel integrated graphics (collectively) remain the most common GPU setup. If Switch 2 is as you describe, it would easily beat those 3 setups before DLSS.
Orin NX is pretty much Dane outside of the CPU and DLAs (config-wise at least).
Can someone give me a summary of what all this about Dane and its speculated specs is, please? And how would its performance compare to other consoles (for example PS4, PS4 Pro, Xbox One, Xbox One X...)?
And would Dane be a Pro or Successor model?
Also, today I saw that Nvidia introduced new SoCs called Jetson Orin NX and Jetson AGX Orin, and that they are rumored to be used in a Nintendo console. Which of the two would it be? According to the specifications table, Jetson AGX Orin seems much more powerful. Also, would AGX Orin/Orin NX be Dane as a Pro model, or would it be a successor? Thank you, and forgive all the questions and my infinite ignorance.
That's up to Nintendo, not the hardware. Here's the summary.
And would Dane be a Pro or Successor model?
How did you reach this conclusion again?
Orin NX is pretty much Dane outside of the CPU and DLAs (config-wise at least)
Because they literally halved Orin's GPU for Orin NX, and the 8-core CPU config is the most likely one for Dane (outside of swapping the A78AEs for A78Cs).
How did you reach this conclusion again?
We started discussing it from the middle of page 40. You may want to skim through those pages quickly. There are a few posts with good summaries.
It's less that Dane will be one of these, and more that Dane will be based on these designs. What we see here could trickle down to Dane. We might get something totally different (unlikely though).
NX is a binned chip; it's normal that about half of it is disabled.
Because they literally halved Orin's GPU for Orin NX, and the CPU config for 8CPU cores is the most likely one for Dane (Outside of swapping it for A78Cs vs A78AEs)
I'm not surprised by this extrapolation, given you're the most optimistic in this thread by a good margin. Maybe you're even right about 2TF and I'm being too pessimistic, but please be careful about stating opinions and deductions as fact or as a consensus in the thread.
Also, the memory bandwidth of Orin NX was halved, and in Orin AGX and Orin NX the memory bandwidth looks to be scaling with the GPU (2048 CUDA cores in Orin AGX for 204.8GB/s of bandwidth, 1024 CUDA cores in Orin NX for 102.4GB/s of bandwidth). So 100MB/s per CUDA core.
So if they are scaled like that for the most efficient use of bandwidth (or it has something to do with the extra L1 and L2 cache), then going below 8 SMs sort of leaves them dropping to 4 SMs at 102.4GB/s with 8GB-12GB of LPDDR5, resulting in 200MB/s per CUDA core.
8 SMs is just the most likely scenario now.
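The bandwidth-per-core ratio is easy to check. A small sketch using the figures quoted above; the 128 CUDA cores per SM value is an assumption (it matches 1024 cores for 8 SMs), and the 4-SM configuration is purely hypothetical.

```python
# Assumption: 128 CUDA cores per SM (consistent with 1024 cores = 8 SMs).
CUDA_CORES_PER_SM = 128


def bandwidth_per_core_mb_s(bandwidth_gb_s: float, cuda_cores: int) -> float:
    """Memory bandwidth available per CUDA core, in MB/s (1 GB = 1000 MB)."""
    return bandwidth_gb_s * 1000 / cuda_cores


configs = {
    "Orin AGX (2048 cores, 204.8 GB/s)": (204.8, 2048),
    "Orin NX (1024 cores, 102.4 GB/s)": (102.4, 1024),
    # Hypothetical cut-down part: 4 SMs keeping the NX memory bus.
    "4 SMs (512 cores, 102.4 GB/s)": (102.4, 4 * CUDA_CORES_PER_SM),
}

for name, (bandwidth, cores) in configs.items():
    print(f"{name}: {bandwidth_per_core_mb_s(bandwidth, cores):.0f} MB/s per core")
```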
The Tegra X1/X2 and Turing GPUs have mixed precision (combined use of FP16 and FP32) enabled, which makes the Switch and Turing GPUs more performant per on-paper FLOP than Ampere.
I would be surprised if we don't have all the cores. The NX is already using half of the original Orin GPU cores and 8/12 CPU cores, but at 25 watts max.
It's worth pointing out that the chip with 2 TFLOPS is the Orin NX. We don't know how many TFLOPS the chip Nintendo is going to use will have, but it's very unlikely it will have as many GPU cores as the NX and run the GPU at max clock on top of it.
Again, I will note that I am talking more about the laptop GPUs when I refer to them.
The Tegra X1/X2 and Turing GPUs have mixed precision (combined use of FP16 and FP32) enabled, which makes the Switch and Turing GPUs more performant per on-paper FLOP than Ampere.
I'm not sure if the reason is that Ampere apparently has half the FP16 rate of Turing (please correct me if I'm wrong) or what. But perhaps that FP16 change in the Orin architecture puts it more in line with Turing in performance.
And this of course also doesn't even count the cache increases. Which like you said would boost switch's performance considerably.
Even with DLSS, I think comparing it to an RTX 2070 is a bit reaching though, let alone 2060.
Despite Nvidia being wizards at efficiency and power, there's only so much you can do on an 8nm node. I'm expecting 15 watts max in docked mode, but hoping they/Nintendo can push out as much performance as possible and match the battery life of the V1 Switch, while having the battery life of the V1/OLED Switch (along with LPDDR5X) in a revision on a 5nm or smaller node. I do feel more comfortable that we match PS4 in docked mode without DLSS, though.
Let's recapitulate: a 1660 Super has a rated TDP of 125W, has 6.6 billion transistors occupying 284 mm2 of space, and is built on the 12nm TSMC node. The Dane chip should consume a frugal 10W, have say 4 billion transistors (if we go by Thraktor's and Dakhil's estimations/sources), fit inside the original Switch's board dimensions of 118 mm2, and will probably be built on an 8nm node, if what Nate has heard isn't outdated.
Well, it's just the numbers.
We have generally agreed on 2TFLOPs ampere.
But then NVIDIA goes and doubles the L2 Cache number and FP16 Performance, and adds 50% more L1 Cache into the Orin Architecture.
Remember, AMD got 30-40% of the 54% boost over RDNA1 with Infinity Cache, and that was L3 Cache which is slower than L1 and L2.
So something like that but with far faster memory and Ampere being more memory-efficient than RDNA1/RDNA2 could at least hit that 30% low-end number for a boost.
Therefore, 2 Orin TFLOPs, equates to 2.6 Ampere TFLOPs.
Double that for DLSS Performance mode, you get 5.2TFLOPs, which is above the 5TFLOPs of the 1660 Super.
It's simple math.
And the FP16 thing has an unknown performance boost that can swing wildly between the 40-50% boost that FP16 optimization in games got on Pascal and a lower number, but it would only need to hit a 20% increase to take that 2.6 TFLOPs number and push it to 3 TFLOPs, meaning it falls into the ballpark of the GTX 1650 natively.
Now, it would run more like the Laptop versions of those cards due to the power-draw and thermals.
But it's still impressive and with how DLSS has advanced, it has room to grow even further with things like DRS+DLSS and with how tight-knit ARM SoCs are on communication between components like the shared memory pool vs Desktop and Laptop GPUs.
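The "simple math" in this post can be written out as a quick sketch. Every multiplier here is the poster's guess rather than a measurement: the 30% cache uplift, the 20% FP16 uplift, and the idea that DLSS Performance mode "doubles" effective throughput. The function and constant names are made up for the example.

```python
def effective_tflops(base_tflops: float,
                     cache_uplift: float = 0.30,
                     fp16_uplift: float = 0.0) -> float:
    """'Ampere-equivalent' TFLOPs after applying the guessed uplifts."""
    return base_tflops * (1 + cache_uplift) * (1 + fp16_uplift)


DLSS_PERFORMANCE_FACTOR = 2.0   # assumption from the post: DLSS Performance ~ 2x
GTX_1660_SUPER_TFLOPS = 5.0     # figure used in the post

native = effective_tflops(2.0)                       # cache uplift only
with_fp16 = effective_tflops(2.0, fp16_uplift=0.20)  # cache + FP16 guess
after_dlss = native * DLSS_PERFORMANCE_FACTOR

print(f"cache only:        {native:.2f} TFLOPs")
print(f"cache + FP16:      {with_fp16:.2f} TFLOPs")
print(f"after DLSS Perf.:  {after_dlss:.2f} TFLOPs "
      f"(1660 Super: {GTX_1660_SUPER_TFLOPS:.1f})")

# FP16 uplift actually needed to go from the cache-only figure to 3 TFLOPs.
needed = 3.0 / native - 1
print(f"FP16 uplift needed for 3 TFLOPs: {needed:.1%}")
```

Since the result is just a product of guesses, changing any one multiplier moves the final comparison a lot, which is worth keeping in mind when reading the GPU equivalences above.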
Again, I am saying that it would perform like the LAPTOP 1660 AFTER DLSS Performance mode, as it can effectively double the performance.
Let's recapitulate: a 1660 Super has a rated TDP of 125W, has 6.6 billion transistors occupying 284 mm2 of space and is built on the 12nm TSMC node. The Dane chip should consume a frugal 10W, have say 4 billion transistors (if we go by Thraktor and Dakhil estimations/sources), fit inside the original Switch's board dimensions of 118 mm2 and will probably be built using an 8nm node if what Nate has heard isn't outdated.
In order to have a first estimate of the gap in performance between the two chips (before applying DLSS or DLAA), I'd suggest we determine what impact the metrics above have. For starters, if we assume that the FLOPS scale linearly with the transistor count and TDP, then our hypothetical Dane has an output that is (6.6/4)*(125/10) times smaller than the 1660. That is a factor of about 20.6.
Of course, the premises are wrong (maybe even completely) and I haven't touched on memory and cache configuration, CPU and CUDA core counts, how advanced DLSS might be by the time the unit launches, etc. But in my opinion, it is a better starting point for guesstimating what to expect from Dane. I am not saying you are wrong, though, but finding out how clock speeds, TDP, CUDA core counts, die size, cache configuration and amount, and node technology affect a chip's final TFLOPS output would help us ground our expectations in reality massively.
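The naive scaling estimate from this post, written out as a sketch. It deliberately keeps the post's own (admittedly wrong) premise that FLOPS scale linearly with both transistor count and TDP; the 4 billion transistor and 10W figures for Dane are the thread's guesses, and the 5 TFLOPs figure for the 1660 Super is the one used earlier in the thread.

```python
def linear_scaling_gap(ref_transistors_bn: float, ref_tdp_w: float,
                       target_transistors_bn: float, target_tdp_w: float) -> float:
    """How many times smaller the target's output is than the reference's,
    assuming (unrealistically) linear scaling with transistors and TDP."""
    return (ref_transistors_bn / target_transistors_bn) * (ref_tdp_w / target_tdp_w)


# GTX 1660 Super as the reference; hypothetical Dane as the target.
gap = linear_scaling_gap(ref_transistors_bn=6.6, ref_tdp_w=125,
                         target_transistors_bn=4.0, target_tdp_w=10)
print(f"gap factor: {gap:.1f}x")

# What that would imply if the 1660 Super is taken as ~5 TFLOPs.
implied_tflops = 5.0 / gap
print(f"implied Dane output: {implied_tflops:.2f} TFLOPs")
```

The implied figure lands far below every other estimate in the thread, which mostly shows how much the linear-scaling premise ignores (node, architecture, clocks), rather than giving a usable floor.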
I get your point, but why is it apples to oranges to compare two architectures? Doesn't any comparison between chips eventually boil down to comparing one set of data with another? I am not trolling, I am genuinely asking for clarification here.
Again, I am saying that it would perform like the LAPTOP 1660 AFTER DLSS Performance mode, as it can effectively double the performance.
Orin at native would likely perform around the LAPTOP 1650 if the FP16 boost is used to peak effectiveness.
Also, the Orin uArch is looking to be different than Ampere by more than we thought due to the FP16 change, so trying to compare transistor density, TDP, clock, die size is a comparison for naught really between Turing and Orin because it's Apples to Oranges.
They are quite a bit removed at this point.
Please can we not do this? Not to have a go at you specifically, but it's alright to dare to dream, however wild some might perceive said dreams to be. Let's leave "policed and curbed expectations" at the New Old Place. It's a speculation thread, so exploring all possible avenues ought to be welcome. Nothing is so unthinkable in that regard, and if it doesn't happen, it doesn't happen. The onus should be on one's self, as an adult, to get a grip on one's emotions, rather than to tell others they're too optimistic.
You are known for your enthusiasm regarding the succ specs but that is going far.
True, it's just that there comes a point where trying to get pinpoint accuracy just causes more inaccuracies, due to those different architectures.
I get your point but why is it apples to oranges to compare two architectures? Doesn't any comparison between chips eventually boil down to comparing a set of data with one another? I am not trolling, I am genuinely asking for clarification here.
Costs. Nvidia did a lot of the work for 8nm. If that changes, someone is going to have to foot the bill, and newer nodes are SIGNIFICANTLY more expensive. Not to mention all these newer nodes have practically zero capacity at any fab.
So why is it ruled out that Nintendo will go for a smaller node? I thought they were apparently not happy with the power efficiency of 8N and delayed the DLSS model (so we only got the OLED screen).
Dane is derived from Orin, not Orin NX, which is a binned chip. While the NX's 25W power profile is promising, like I said earlier, Orin is a 460mm2 chip, almost 4 times bigger than the OG TX1, and it will need to cut more than 1/3 of the CPU cores and half the SMs to fit in a similarly sized SoC.
I would be surprised if we don't have all the cores. The NX is already using half of the original Orin GPU cores and 8/12 CPU cores, but at 25 watts max.