Let's quickly go over the main changes before getting into the details.
Patch notes:
-Complete replacement of all DF data with official Nvidia data
-Fewer variables, which means a smaller margin of error
-Introduced 1080p, removed 1440p
-The linear function used previously replaced by a 3rd degree polynomial (most important)
-Making it look slightly cooler
First thing I want to talk about is a correction: I claimed in a "technical stuff" section in a previous post that the cost of DLSS was not just linked to output resolution; I was wrong. This assumption came from the DF data in their second video, but the data in that video turned out to be... meh. The main issue is that the methodology worked perfectly for the purpose of the video, which was comparing FSR to DLSS, but the data is just not good for our use case.
The old calculator was entirely based on that DF data, and that data has now been completely phased out of the calculator. The calculator now uses the data from the document I linked and called "The One document to rule them all". The first advantage is, as said previously, that fewer variables are involved, reducing the number of ways this can go wrong. But that's not really what's important. That document is from Nvidia, and has data for a lot more GPUs, although only Ampere has been used here for reasons I'll detail later. This means the data is more accurate, but most importantly, the larger number of GPUs tested allows us to verify whether the cost of DLSS scales linearly with performance, as had been assumed until now.
It does not.
It is hard to come to a firm conclusion as there is still less data than I'd like, but using Excel, we can see the curves are akin to a polynomial. I have therefore asked Excel to fit polynomials to the curves. They ended up being of the third degree, simply because there are only 4 GPUs whose data I can use.
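If you'd rather not deal with Excel, here is essentially what that step does, in Python form. The four data points below are placeholders I made up for the example, not the actual Nvidia figures:

```python
import numpy as np

# Tensor performance, normalised so the RTX 3080 = 1.0 (same scale as the graphs).
tensor_perf = np.array([0.55, 0.80, 1.00, 1.20])   # placeholder values
# DLSS "speed" at one output resolution: higher = faster = fewer ms spent on DLSS.
dlss_speed  = np.array([0.50, 0.75, 1.00, 1.15])   # placeholder values

# With 4 points, degree 3 is the highest degree that is fully pinned down (4 coefficients).
coeffs = np.polyfit(tensor_perf, dlss_speed, deg=3)
curve = np.poly1d(coeffs)

print(curve)          # the same polynomial Excel would show as the trendline equation
print(curve(1.00))    # evaluating it at a given tensor performance gives the predicted speed
```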
Here are the curves, with the X axis being tensor performance and the Y axis the speed of the DLSS calculations (not how long it takes, but the speed; a higher value means faster, so a lower ms count).
That change in the way scalability is calculated makes a MASSIVE difference to the results, and is the main source of the change compared to the old calculator.
Here is an illustration: the intersection is the 3080; the X coordinate is tensor performance, the Y coordinate is the speed of the DLSS calculations.
The orange curve is how I previously thought it would scale, the blue curve is how it actually scales.
To give you an idea, Drake would be around 0.07 on the X axis. We see that this new method of calculation heavily benefits Drake.
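To make that concrete, here is the comparison in code. The coefficients of the "actual" curve below are invented purely to mimic the shape in my graph (through the origin, crossing the straight line at the 3080); they are not the real fitted values, so the exact gap at 0.07 will differ:

```python
# Old assumption: speed scales 1:1 with tensor performance (normalised so 3080 = 1.0).
linear = lambda x: x
# Illustrative 3rd-degree curve with made-up coefficients; cubic(0) == 0 and cubic(1.0) == 1.0.
cubic = lambda x: -0.25 * x**3 + 0.05 * x**2 + 1.2 * x

drake = 0.07   # Drake's rough position on the tensor-performance axis
print(linear(drake))   # 0.07
print(cubic(drake))    # ~0.084, about 20% faster than the linear model predicts at this point
```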
Using this data, I discovered a lot of things. First of all, the scaling of DLSS speed depending on resolution. It scales kind of linearly, but very loosely. I'll start with the 1080p to 4K difference. I noticed that the more powerful a GPU is, the smaller the difference between 4K and 1080p speed is; and the less powerful a GPU is, the closer that difference gets to the ratio of pixel counts (so 4x). For example, for Drake, the difference between 1080p speed and 4K speed is very close to 4x because Drake is not very powerful, but for a 3090 it's more like 2.9x.
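For reference, here are the raw pixel counts and the ratios those speed differences get compared against (including the 8K and 1440p cases I'll get to in a moment):

```python
# Pixel counts of the resolutions involved.
res = {
    "1080p": 1920 * 1080,
    "1440p": 2560 * 1440,
    "4K":    3840 * 2160,
    "8K":    7680 * 4320,
}

print(res["4K"] / res["1080p"])     # 4.0   -> what Drake's 1080p-to-4K difference sits close to
print(res["8K"] / res["4K"])        # 4.0
print(res["1440p"] / res["1080p"])  # ~1.78
```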
4K to 8K is significantly weirder. It starts at around 4.3x, then goes up, then down. At first I thought this was some proof the Excel predictions were just not working, but after looking at the Nvidia data, we do see 4K to 8K going up then down.
It isn't shown in this graph, but I also made a 1080p to 1440p curve, which also looks very weird: it starts at a 2.2x difference, goes down to 1.5x, then starts going up again. So 1080p to 4K looks normal and makes sense, but 4K to 8K and 1080p to 1440p are much weirder for no apparent reason. Btw, talking about 1440p…
I removed it because I'm not sure how accurate it is. Even just looking at the raw data through a graph, the 1440p speed as a function of tensor performance looks... wacky (cf. the first graph, the orange curve). It looks like the other curves but on steroids. And the fact that the 1080p to 1440p difference curve also looks wacky does not convince me of the validity of my results. I do think I might add 1440p back in the future, but I want to be more sure of what's going on first, and I want to put in place the enhancements I mention towards the end.
Now is the time to explain why I didn't use Turing to improve the prediction. I mean, I say now's the time, but any point could be the time; it's not linked to any of the other points. In the document, 3 Turing GPUs are benched: the 2060 Super, the 2080 laptop and the 2080 Ti. First problem: laptop GPUs have special power-limiting software that makes it probable the laptop GPU was not running at full speed. And we don't know what speed it was running at. This is also reflected in the graphs below, where we see the 2080 laptop is way closer to the 2060 Super than it should be.
But that means for Turing we only have 2 GPUs left. How do you infer a curve out of that? You don't, really. That's not enough. 4 is already limiting, so 2? Don't even think about it. And that is why I didn't take Turing GPUs into account in the calculator.
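To see why 2 points are hopeless for this: a 3rd degree polynomial has 4 unknown coefficients, so 2 points leave it completely underdetermined. Quick illustration with made-up numbers; both curves below go through the exact same two points and still disagree wildly everywhere else:

```python
import numpy as np

x_pts = np.array([0.4, 0.9])   # two hypothetical tensor-performance values

line = np.poly1d([1.0, -0.1])                            # goes through (0.4, 0.3) and (0.9, 0.8)
bump = np.poly1d([1.0, -0.4]) * np.poly1d([1.0, -0.9])   # zero at both of those x values
cubic = line + 2.0 * np.poly1d([1.0, 0.0]) * bump        # a cubic through the exact same points

print(line(x_pts), cubic(x_pts))   # identical at the two data points
print(line(1.5), cubic(1.5))       # 1.4 vs 3.38: completely different away from them
```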
I'll now talk about the sparsity improvement. You may recall that earlier, we used DF's data with a 2060 to estimate the improvement from sparsity. Turns out that improvement massively depends on the card: the faster the card, the smaller the gain. The resolution also plays a role, but a minor one. As such, at the 2060 performance level, we see a 25% improvement from sparsity. 25% is also what was found with the DF data on a 2060, so that was close. On the other hand, at the 2080 Ti performance level, the improvement is merely 9%.
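To give a feel for how a gain like this can be folded into a calculation, here's a simplified sketch of the idea; the tensor-performance values are placeholders, and only the 25% and 9% gains come from the data above:

```python
import numpy as np

perf_points = [0.35, 0.75]   # placeholder tensor performance: ~2060 level, ~2080 Ti level (3080 = 1.0)
gain_points = [1.25, 1.09]   # measured speedup from enabling sparsity at those levels

def sparsity_gain(tensor_perf):
    # simple linear interpolation, clamped at the ends since we have no data beyond them
    return float(np.interp(tensor_perf, perf_points, gain_points))

print(sparsity_gain(0.07))   # Drake-level performance: clamps to the 25% figure with this crude model
```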
Now let's talk about the limitations of my simulations.
The first comes from the data itself. We are talking about official data from Nvidia, but they said they measured it experimentally by running a tool from a command prompt. And anyone who has ever done physics knows that experimental measurements are always imperfect. Who knows if the GPUs were boosting as they should. Who knows if an antivirus was running in the background. We can never rule out the possibility that the data is not perfect, even if it comes from Nvidia.
This problem may be the reason why some of the behaviors in the graphs are really weird. Or maybe it isn't; I do not know. Maybe the wackiness of the 1440p results comes from there. Maybe the weird 4K to 8K speed difference is also caused by that. We've seen that the weird 4K to 8K curve isn't some product of Excel hallucination but is actually in accordance with what the data shows, and maybe that's how it behaves in real life. Or maybe there is some slight error in the data that snowballed into those curves. That's ultimately something we can't really know unless we get extensive testing from several sources.
The second problem is simply that the Excel predictions take only 4 GPUs into account, limiting the precision. We can already see it in the first picture: the predictions don't completely line up. This is even more apparent if I show you what the predicted curves look like when extended further:
As you see, it goes to shit very, VERY quickly. At around 1.6 it starts becoming unrealistic; and to give you an idea, on this scale 1 = RTX 3080 tensor performance, and the highest actual data point we have from Nvidia is the 3090 at around 1.2. So it goes to shit real quick. I don't think that means that what comes before in the curve is useless; the curve still has to pass through the origin. 0 tensor performance means 0 speed. And there aren't infinite ways to get there (I mean, yes there are, but shut up, you got the point); meanwhile, on the right side of the curve, Excel doesn't have any indication of where to go, and that explains how the curve can become so unrealistic.
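Here is what that origin constraint looks like in practice (this isn't exactly what Excel's trendline does, but it shows the idea). Same placeholder points as in the earlier snippet, not the real Nvidia numbers:

```python
import numpy as np

x = np.array([0.55, 0.80, 1.00, 1.20])   # tensor perf, 3080 = 1.0 (placeholders)
y = np.array([0.50, 0.75, 1.00, 1.15])   # DLSS speed (placeholders)

# Design matrix [x^3, x^2, x]: no constant column, so the fitted curve must pass through (0, 0).
A = np.column_stack([x**3, x**2, x])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
cubic = lambda t: coeffs[0] * t**3 + coeffs[1] * t**2 + coeffs[2] * t

print(cubic(0.07))                         # small and positive: the left side is pinned by the origin
print(cubic(1.6), cubic(2.0), cubic(2.5))  # past the data the curve peaks, turns back down, then goes negative
```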
Can this be improved? Yes.
What we need is more GPUs, ideally both less powerful and more powerful than what we have now. I have no clue where to find data on slower GPUs, but I do know where to start with higher-end GPUs: I know TechPowerUp has done some DLSS testing on a 4080. This would give us an actual data point at around 1.6, which would go a long way towards improving the predictions. An important thing to note is that we can't rely on Turing to extend the graph, because Turing doesn't benefit from sparsity, and as we've seen, the gains from sparsity are unpredictable; on the other hand, Lovelace tensor cores are basically Ampere tensor cores with FP8 added. Considering DLSS most likely does not use FP8, we can use Lovelace cards to extend the graph. Although what would be best would be a lower-end card since, you know, we're trying to predict the performance of Drake. 4080 results may help with that, but 3050 results would be significantly more appreciated, as they would be much more relevant.
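To show why one data point out on the right matters so much, here is what adding a hypothetical 4080 measurement does to the fit. The (1.6, 1.55) point is a pure placeholder, not TechPowerUp's actual numbers converted to this scale:

```python
import numpy as np

x = np.array([0.55, 0.80, 1.00, 1.20])   # current Ampere placeholders (3080 = 1.0)
y = np.array([0.50, 0.75, 1.00, 1.15])

old_fit = np.poly1d(np.polyfit(x, y, 3))
new_fit = np.poly1d(np.polyfit(np.append(x, 1.6), np.append(y, 1.55), 3))

# With only the 4 original points, the curve has already gone off the rails by 1.6;
# with the extra point, the right-hand side gets anchored to an actual measurement.
print(old_fit(1.6), new_fit(1.6))
```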
In the next version of the calculator, I will try adding 4080 data to make the predictions more accurate, and I will keep looking for more data that could help us in our quest to predict Drake's DLSS performance.
You have finally reached the end of the technical stuff. Hope you enjoyed it, and if you have any questions, don't hesitate to ask. There are many things I glossed over, and I'd be happy to give you more details if you want.