the video was made with no new inside info, and no consideration of 12GB of RAM or of a dedicated platform vs a PC. So it really is just a normal prediction, like anyone could make.
No fucking way it's going to be that on 8nm.
> It was? Might be mixing them up, I do know NVN2 pretty much stated 660MHz (while on 8nm, to note).
Of course not, lol. Just saying 660MHz did show up in there under 8nm.
> No fucking way it's going to be that on 8nm.
The leak did state 8nm (or more accurately Samsung, it didn't specify node) in one place, but that could easily have been a copy paste from Orin.
2048 cores at 750MHz is 3.07TF and 1536 cores at the speculated 1.1GHz is 3.38TF. That's just a 10% increase in performance.
> A big miss with the DF test was ignoring power draw. How many watts was the RTX 2050 pulling when clocked at 750MHz? If it's over 20 watts, that should have been a red flag that T239 cannot be on 8nm. If it's not on 8nm then it is very likely 4N, and 750MHz is closer to the portable profile rather than docked. If the docked profile does end up being 1.1GHz, that would mean those tensor cores will be clocked 32% higher than in their test.
assuming we're talking about Thraktor's 660mhz as the base
Yeah, I'm just saying if a miracle happens and Nintendo somehow managed to make 12 SM work on 8nm (which would be impressive for all the wrong reasons), I'm expecting closer to Switch 1 clocks.
> Of course not, lol. Just saying 660MHz did show up in there under 8nm.
Eurogamer article on the DF video, adding a few thoughts
Inside Nvidia's new hardware for Switch 2: what is the T239 processor?
What new hardware has Nvidia cooked up for Nintendo? Digital Foundry puts together all available info on the cutting-edge T239 processor. (www.eurogamer.net)
> Based on everything I've learned about T239 over the months, I think my biggest surprise is how ambitious the hardware is. Given the chance to re-architect a new, games-centric processor, Nvidia could have followed the route it laid down with its GTX 16-series graphics cards by stripping out machine learning and RT hardware, or paring them back to the minimum with cut-price alternatives (like the way AMD integrated hardware RT into RDNA 2 with an emphasis on the lowest silicon cost possible). Instead, the evidence suggests that Nvidia and Nintendo have agreed on a design that arguably over-delivers on next-gen features, while the File Decompression Engine (FDE) points to quality of life additions to the silicon that are firmly orientated around a great gaming experience. Could this be the most forward-looking hardware design from Nintendo since the GameCube?
I'd like to quote this part too, for those wondering how the lack of memory impacted things for this 2050...
Be still, my beating heart. Leadbetter sure knows how to hype up T239
> I couldn't get The Matrix Awakens running owing to the 4GB memory limitation on the RTX 2050 and it seems that the demo requires around 5.6GB at the minimum. However, the fully enabled Fortnite UE5 experience using Lumen, Nanite and Virtual Shadow Maps does work. Indeed, unlike the console versions, we do actually have access to hardware accelerated ray tracing in the PC version and yes, it does run on the RTX 2050 laptop. I tested hardware RT Lumen, its software variant, and I also tested at native 720p and several DLSS variants, all resolving to a 1080p output. For the record, software RT with DLSS performance mode (540p in, 1080p out) is the winner but between best and worst results here, there's just a 9.5 percent difference, and in fps terms that's just 2.82fps.
My guess is that the Matrix Awakens demo for Switch 2 based on target hardware probably is real, but Epic must be doing a lot of work in making Unreal Engine 5's key features run well on the new Nintendo machine, simply because The Matrix Awakens is a lot heavier on the GPU than Fortnite. That being the case, just running PC code on a low power Ampere GPU isn't likely to be that revealing. Still, it does run. You can use hardware RT. DLSS is helpful.
Thraktor is expecting something between 500MHz/1GHz and 550MHz/1.1GHz
Do we know how much VRAM the T239 has? If it's 8GB, that's pretty damn good overall. Running RE4R with a 4070 Laptop GPU was already pretty snazzy, even if the RT on that game is considered "Eh" in comparison to a lot of other games.
> I'd like to quote this part too, for those wondering how the lack of memory impacted things for this 2050...
It's hard to say, VRAM in consoles is always shared with system memory. I'd say the absolute max for a 12 GB Switch 2 would be 8 GB, yeah.
> Do we know how much VRAM the T239 has? If it's 8GB, that's pretty damn good overall. Running RE4R with a 4070 Laptop GPU was already pretty snazzy, even if the RT on that game is considered "Eh" in comparison to a lot of other games.
Coming from a place wholly inexperienced with building laptops/desktops, I'm guessing there was no real way to swap out the RAM from a pre-built Dell Vostro to bump it up from 4GB to 6GB, let alone 8GB?
> I'd like to quote this part too, for those wondering how the lack of memory impacted things for this 2050...
With the knowledge that financial briefings are done in February, late April/early May, late July/early August, and late October/early November...
> I have a question: have Nintendo ever done a reveal close to an investors meeting before? It would be interesting to see the history of the investor meetings and what they announced at them.
Mainly in regard to 3rd party titles. My perspective basically changed from "some Nintendo games and some third party titles should be able to hit that golden benchmark" to "probably only Nintendo games are going to be able to hit it". It's not much of a shift in perspective, but I can't really imagine we'll be seeing any games from people outside of Nintendo hitting 4K60 unless they're specifically designed for the Switch 2 itself, which I don't really think anything will be unless Capcom decides to make a Switch-centric Monster Hunter again like they did with Rise (which seems unlikely given the rumors surrounding MH6).
> Excuse me, but what were your expectations in the first place? This is a close approximation to the chip in the console, lacking memory, console optimization and using Windows... DF himself stated this is all on the lower end of the spectrum, since T239 is a customized chip for a reason; the very fact it's already performing this incredibly well does all but lower expectations. And keep in mind this is under handheld clocks, docked should perform very similar to the stock 2050 out there (if not better).
Coolio. Sounds about right.
> It's hard to say, VRAM in consoles is always shared with system memory. I'd say the absolute max for a 12 GB Switch 2 would be 8 GB, yeah.
Easier to do with desktops than laptops. I've never even tried to open up laptops except to change the battery.
> Coming from a place wholly inexperienced with building laptops/desktops, I'm guessing there was no real way to swap out the RAM from a pre-built Dell Vostro to bump it up from 4GB to 6GB, let alone 8GB?
It'll be a unified memory pool. The CPU and GPU will be sharing the entire main system RAM.
> Do we know how much VRAM the T239 has? If it's 8GB, that's pretty damn good overall. Running RE4R with a 4070 Laptop GPU was already pretty snazzy, even if the RT on that game is considered "Eh" in comparison to a lot of other games.
I agree. The lack of power data for the RTX 2050 at 750MHz was very disappointing.
> A big miss with the DF test was ignoring power draw. How many watts was the RTX 2050 pulling when clocked at 750MHz? If it's over 20 watts, that should have been a red flag that T239 cannot be on 8nm. If it's not on 8nm then it is very likely 4N, and 750MHz is closer to the portable profile rather than docked. If the docked profile does end up being 1.1GHz, that would mean those tensor cores will be clocked 32% higher than in their test.
I'm also perplexed by this, from DF's Rich:
> On the clocks question, Nvidia provides a tool for calculating power consumption on the T234 and it suggests that the GPU is most efficient at 540MHz, a good candidate for handheld clocks. Efficiency drops alarmingly the faster you clock it, and the more power required, the more heat is generated. I'd be surprised if docked clocks exceeded 1GHz and it could be a fair degree lower. 1.0GHz also looks like the sweet spot for CPU clocks too. However, these are calculations for a larger Tegra chip that's definitely on the 8nm Samsung process.
He says that 540MHz is the efficiency sweet spot for the T234 GPU and a good candidate clock for Handheld Mode. He also says 1GHz is the sweet spot for the CPU. And all of this is based on Samsung 8N. However, look at this quick power test from the Jetson power tool, made by Alovon:
To clear things up:
> It was? Might be mixing them up, I do know NVN2 pretty much stated 660MHz (while on 8nm, to note).
You can change memory on desktop or laptop GPUs. But you need significant electronics and soldering skills, and some BIOS flashing and whatnot. It's something very specialized and for people who really know what they're doing. Otherwise, no. GPU VRAM is fixed and can't be upgraded by the user, either on desktops or laptops.
> Easier to do with desktops than laptops. I've never even tried to open up laptops except to change the battery.
This crossed my mind too, wondering how much work it would be to swap the RAM to 8gb
Flops is calculated by: 2 * number of cores * clock
> Are there estimated teraflops and power consumption figures for 500 MHz, 1 GHz, 1.5 GHz, and 2 GHz?
Due to it being shared it won't be 8 GB at all times, because the game could end up demanding more. But yeah, accurate enough.
> Coolio. Sounds about right.
8GB seems to be the base standard for modern games, however you usually have to trade settings in order to get it running. I can run Deathloop at absolute cranked max settings but cannot enable RT no matter what unless I turn down other settings or suffer the consequences of insane stuttering.
Flops is calculated by: 2 * number of cores * clock
2 * 1536 cores * 0.500 GHz = 1.536 TF
The other three clocks are exactly 2x, 3x and 4x that.
And if you want a rough mental estimate for T239, just multiply the clock in GHz by 3 and you will get quite close (1.1GHz = a bit above ~3.3TF).
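For anyone who wants to plug in other clocks, here's a quick sketch of that arithmetic in Python. The 1536-core count is the rumoured T239 configuration the thread is assuming, not a confirmed spec.

```python
# Quick sketch of the arithmetic above: FP32 FLOPS = 2 (FMA) * CUDA cores * clock.
# The 1536-core count is the rumoured T239 config the thread assumes, not a confirmed spec.

def tflops(cores: int, clock_ghz: float) -> float:
    """FP32 teraflops for a given core count and clock in GHz."""
    return 2 * cores * clock_ghz / 1000

for clock_ghz in (0.5, 1.0, 1.5, 2.0, 1.1):
    print(f"{clock_ghz:.1f} GHz -> {tflops(1536, clock_ghz):.2f} TF")
# 0.5 GHz -> 1.54 TF, 1.0 -> 3.07, 1.5 -> 4.61, 2.0 -> 6.14, 1.1 -> 3.38
```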
Ah right, GPU RAM is not like the regular RAM I was thinking of; it's soldered directly onto the GPU board. Yeah, now I understand why it's not easy.
> You can change memory on desktop or laptop GPUs. But you need significant electronics and soldering skills, and some BIOS flashing and whatnot. It's something very specialized and for people who really know what they're doing. Otherwise, no. GPU VRAM is fixed and can't be upgraded by the user, either on desktops or laptops.
Checked the site for the first time today and I see chat exploded for a different reason. Can't watch the Digital Foundry video, I'm in class rn, is it looking good or bad? I saw some gif reactions and a lot of back and forth on clock speeds when scrolling back through.
Hey, this is actually a fantastic post! Thank you for digging and finding these posts by Thraktor, LiC and Oldpuck. They're insanely informative and had some discussions that I had actually missed.
Yeah it is.
> @bmfrosty it's you!
Matrix could have run on there, the issue was the lack of VRAM. Just to note.
> interesting video but clearly missing big pieces of the puzzle, given the Matrix Demo wasn't viable. and the seemingly impossible 4k60 which Nintendo has already demoed shows where things don't add up.
> - clock speeds higher if not on 8nm
> - Zelda demo upscaling from 1080 or higher to hit 4k/60
> - further optimizations
> still looking good but i think most of us expect T239 to surpass that laptop GPU significantly. the process node remains a sticking point; if it was known to be 4N, expectations would be raised across the board, i.e. with mainstream commentators, methinks.
GPU load affects power consumption, so I don't think we will ever get how much it would consume to have the CUDA cores idle and boost the clock just for DLSS.
> Yeah, that makes sense.
> Just curious how much power consumption would increase if you overclocked the tensor cores during the DLSS step, as the CUDA cores are idle during the DLSS step...
> There are 32 times as many CUDA cores as tensor cores, so if you had all of the CUDA cores idle and then cranked the tensor cores up 4 times, jacking the power consumption up to let's say 16x... you still end up consuming less power than in the non-DLSS step, it would seem, from a naive perspective?
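Just to put that quoted back-of-envelope reasoning in one place, here's a rough sketch of it in Python. Everything in it is the post's own naive assumption (power roughly proportional to active core count, 4x tensor clock costing ~16x tensor power), not a measurement.

```python
# Rough restatement of the quoted back-of-envelope estimate. Everything here is an
# assumption from the post above (power ~ active core count, 4x tensor clock ~ 16x
# tensor power), not a measurement.

CUDA_CORES = 1536                    # 12 SMs * 128 CUDA cores (rumoured T239 config)
TENSOR_CORES = CUDA_CORES // 32      # Ampere: 4 tensor cores per SM -> 48

tensor_share = TENSOR_CORES / CUDA_CORES     # tensor cores ~1/32 of the shader array
render_step = 1.0                            # full CUDA load at base clock (normalised)
dlss_step = tensor_share * 16                # tensor cores only, clocked 4x (~16x power)

print(f"render step: {render_step:.2f}x, DLSS step: {dlss_step:.2f}x")
# -> the DLSS step lands around 0.50x of the render step under these naive assumptions.
```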
Yup, it tells us T239 has more than 4GB, which lines up with what we were thinking (6 or 8GB).
> Matrix could have run on there, the issue was the lack of VRAM. Just to note.
the current thinking seems to be 12 or 16gb, with 16 being less likely. but definitely 12 if they want this thing to have legs.
> Yup, I think it tells us T239 has more than 4GB, which lines up with what we were thinking (6 or 8GB).
Now that you mention 12GB and 16GB, that seems right (I'm now recalling mentions of those two numbers). What amount did the 2050 in the DF video have again? Was it 4GB? 4GB -> 12GB is a big jump, but I could be misremembering again, was it 8GB?
> the current thinking seems to be 12 or 16gb, with 16 being less likely. but definitely 12 if they want this thing to have legs.
a 2050 has 4GB.
> What amount did the 2050 in the DF video have again? Was it 4GB?
The GPU had 4GB dedicated, but the laptop had another 16GB for the CPU.
> What amount did the 2050 in the DF video have again? Was it 4GB? 4GB -> 12GB is a big jump, but I could be misremembering again, was it 8GB?
The big problem with the DLSS testing in the vid is that it ignores that PC games often scale stuff like LOD depending on OUTPUT resolution.
Obviously 4K DLSS looks unfeasible: because they're running the 4K LODs and/or post-processing and/or textures.
Using DF's methodology on TechPowerUp's data we can say a 3060 needs 2.9ms for 4K DLSS.
Considering Nvidia's number for the 3060 Ti is 1.52ms...
I doubt the 3060 is twice as slow as a Ti.
And DF's numbers are made using depth of field, which would cripple performance if it's running at output resolution. TechPowerUp's numbers are without depth of field, and yet still seem to indicate an unrealistic runtime.
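For clarity, this is roughly how that kind of per-frame DLSS cost gets backed out of benchmark numbers: the frametime with DLSS to a 4K output minus the frametime at the native internal resolution. The fps values below are made-up placeholders, not DF's or TechPowerUp's data, and as noted above the method also sweeps in any output-resolution LOD/post-processing cost, not just the DLSS pass itself.

```python
# Sketch of the frametime-difference method being discussed: estimate the DLSS pass
# cost as (frametime with DLSS to a 4K output) minus (frametime rendering natively at
# the internal resolution). The fps numbers are placeholders, not measured data.

def dlss_cost_ms(fps_native_internal: float, fps_dlss_output: float) -> float:
    """Approximate per-frame DLSS cost in milliseconds."""
    return 1000 / fps_dlss_output - 1000 / fps_native_internal

# Hypothetical example: 1080p native at 120 fps vs DLSS Performance to 4K at 87 fps.
print(f"Estimated DLSS cost: {dlss_cost_ms(120, 87):.2f} ms")   # ~3.16 ms
```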
GPU load affects power consumption, so I don't think we will ever get how much it would consume to have the CUDA cores idle and boost the clock just for DLSS.
But I don't think we're going to see this situation. If the CUDA cores aren't used much for the DLSS, then I would expect the GPU to move to the next frame while DLSS works on the current frame. This, or games will just stick to lower output resolution and/or 30fps.
For example, BotW 4K60 could in theory be achieved by:
0 to 16.7ms: CPU work on the first frame
16.8 to 33.3ms: GPU work on the first frame, CPU works on the second frame
33.4 to 50ms: GPU is doing DLSS on the first frame and working on the second frame, CPU works on the third frame
As long as DLSS can be done within 16ms with enough room for the GPU to also render a new frame, this could be achieved. This doesn't make sense on PCs with fast DLSS because it increases latency, but it's right within the compromises/optimizations you can expect on consoles.
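Here's a tiny sketch of that overlapped schedule, just to make the timing concrete. The equal ~16.7ms per stage is the post's illustrative assumption, not a measured figure.

```python
# Minimal sketch of the overlapped schedule described above: CPU, GPU render and the
# DLSS pass each take (at most) one 16.7 ms slot and are pipelined across frames.
# The equal stage times are the post's illustrative assumption, not measured numbers.

FRAME_MS = 1000 / 60                                        # ~16.7 ms per slot at 60 fps
STAGES = {"CPU": 0, "GPU render": 1, "DLSS (tensor)": 2}    # slot offset per stage

for frame in range(3):
    for stage, offset in STAGES.items():
        start = (frame + offset) * FRAME_MS
        print(f"frame {frame}: {stage:13s} {start:5.1f} - {start + FRAME_MS:5.1f} ms")
# A new frame still completes every ~16.7 ms (60 fps), at the cost of roughly two to
# three slots of latency between input and display.
```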
If you can get DLSS for 4K to run below 16.6ms, and run DLSS on the Tensor cores in parallel with CUDA, it would be possible to obtain 60FPS (with input lag of 30Hz but still a very interesting result).
> Wheeeeeeeeeeeee, the CUDA cores are idle while the tensor cores are active.
"
To accelerate the execution of Machine Learning
applications, recent GPUs use Tensor cores to speed up the
general matrix multiplication (GEMM), which is the heart of
deep learning. The Streaming Processors in such GPUs also
contain CUDA cores to implement general computations. While
the Tensor cores can significantly improve the performance of
GEMM, the CUDA cores remain idle when Tensor cores are
running. This leads to inefficient resource utilization. In this
work, we propose to offload part of the GEMM operations from
Tensor cores to CUDA cores to fully utilize GPU resources.
We investigated the performance bottleneck in such offloading
schemes and proposed architectural optimization to maximize the
GPU throughput. Our technique is purely hardware-based and
does not require a new compiler or other software support. Our
evaluation results show that the proposed scheme can improve
performance by 19% at the maximum."
Utilizing both doesn't help as much as you would think as it's hard to offload parts of it.
This is a pretty big issue for NVIDIA to solve, we'll see.
"CPU RAM" size doesn't really help with moment-to-moment performance, though. Primarily, you just need enough to fit in all the assets that your render budget can handle.The GPU had 4gb dedicated, but the laptop had another 16gb for the CPU.
I know, I'm just noting that 4GB of dedicated RAM for the 2050 can't be compared directly to the larger quantity of shared RAM on T239.
> "CPU RAM" size doesn't really help with moment-to-moment performance, though. Primarily, you just need enough to fit in all the assets that your render budget can handle.
Can't help but think that this could mean a bunch of titles relying too much on DLSS will end up with notable input latency.
> I've watched the DF video now and I recommend everyone to watch it later!
> Rich readily concedes that it's a very imperfect comparison, and of course software will be optimized for the specific hardware of the new machine.
> Having said all that, the most important finding is that upscaling to 4K from 720p with DLSS takes about 18ms. That is obviously longer than a single frame time for 60fps and also untenably high for 30fps. He did mention though that the T234 has special ML-accelerating hardware, and if the T239 retains some of that, those figures could well be lower.
> But the TL;DR I got out of the video is that we shouldn't expect 4K output and we shouldn't expect current-generation titles to run at 60fps. It will be even harder than with Series S to eke out an acceptable performance mode.
> But again, there's lots we don't know about the hardware and lots that can be achieved with smart optimization against a specific hardware target.
for real. the switch 2's t239 system-on-a-chip having RT cores and tensor (for AI processing) cores means after multiple generations nintendo is using hardware features that are more advanced than the ps5 and xbox series x hardware features
> Be still, my beating heart. Leadbetter sure knows how to hype up T239
The RTX 2050 is Ampere-based, hence why he's using it as an approximate GPU to the T239. RTX 2050 = RTX 3050M but with half the memory bus. They use the same die, GA107.
> Are the tensor cores of an RTX 30XX faster than the ones found on an RTX 20XX, other than possible higher clocks? If they are, is the difference enough so that 48 TCs from the 30 series can match 64 TCs from the 20 series?
If DLSS scales this badly at 2 teraflops or 4 teraflops then the Switch 2's potential is massively lowered as DLSS would be unusable and FSR2 would likely be unusable as well (forcing the Switch 2 to have most of its cycles eaten up by native resolution), but I'm pretty doubtful of this.
Frankly, that DF video has me hyped even more.
It's just a ballpark estimation but they're running modern PC games with zero porting/optimization effort, and it runs pretty well at 1080p.
I don't care about 4K and never did. I want it to look good handheld primarily.
I have no doubt good port studios will do wonders with a little effort on any current-gen title, especially with Steam handhelds setting the baseline.
And regarding first-parties, when I see Metroid Prime or TotK running on a potato X1, I can barely imagine what they could do on the t239.
TBF I think it's memory/framebuffer and clock related for DLSS's cost.
> DLSS 720p > 4K is like twice as expensive in frametime as DLSS 720p > 1440p going by NVIDIA's documentation, so I would expect 720p > 1440p to be the standard.
> But 18ms sounds incredibly high compared to NVIDIA's documentation.
> Obviously these chips are all much stronger than the Switch 2, but the gap shouldn't be this large unless DLSS starts scaling really badly at low spec.
> If it does, hopefully NVIDIA can develop a pruned DLSS neural network for the Switch 2... We'll see if they're interested, as pruned neural networks are really expensive (in terms of dollars and manpower) to create and the results are less good (obviously).