I wanted to give a detailed explanation of the whitepapers and the source code, but
@Anatole did it far better than I ever could. So go read his posts first if you want to understand what follows. I just want to add a few points:
1) Why is full-rate FP32 accumulation disabled on desktop Ampere, when it could easily have been enabled? The answer: artificial segmentation.
Nvidia doesn't want AI researchers to use the gaming cards.
@oldpuck made the assumption that FP32 accumulation is only useful for training, and it's highly possible that this assumption is 100% true. Looking around the Internet, model training is the only use case I see mentioned. It would make sense for Nvidia, which wants to sell AI researchers the more expensive stuff, to cripple the gaming cards at that specific task: that makes them much less useful for AI researchers, with no impact on gamers, since FP32 accumulation only matters for training, not inference.
But that leads us to point
2) The full-rate accumulation in Drake will probably be useless. It's there because Nvidia doesn't see the point in intentionally crippling Drake, since it's not like it's going to be used for AI research, but that doesn't mean full rate will bring any actual improvement to DLSS performance.
3) (minor point) DLSS probably uses FP16 and not INT8, because in all the marketing material where they talk about DLSS, they talk about TFLOPS and not TOPS. The diagrams they use to illustrate the AI operations for DLSS are also the same as the ones for FP16.
Now, regarding the calculator:
I used two videos from Digital Foundry: the one about a possible Switch Pro, and the one reviewing DLSS 2.3 vs FSR 2.0. I ended up using the second video almost exclusively, because the frame-time costs were very different between the two (probably because of the different DLSS versions), and also because in the first video they say some flat-out wrong stuff, like DLSS cost being related only to output resolution (it's not) (Edit: actually, it may be true, which would indicate the second video is the bogus one. Oops).
To approximate the ms cost, I based the calculator on the RTX 3080 data. The RTX 3090 data doesn't change much, and the only other data point is the 2060, which doesn't benefit from sparsity; since we don't know how big that improvement is, it's a pretty bad basis for estimating Drake performance. I did, however, use the 2060 data, combined with the 3080 data as the Ampere reference, to estimate that sparsity improvement: I calculated it at 1.25x, which tracks with what
@Anatole said about the performance boost likely being far from the theoretical max of 2x because of the real-time nature of DLSS (if I understood correctly).
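To make the arithmetic above more concrete, here is a rough Python sketch of how this kind of estimate could be computed. It assumes DLSS cost scales inversely with dense tensor throughput, which is a simplification and not necessarily what the calculator does exactly. The function names are made up for illustration, and every number in the example is a round placeholder, not a Digital Foundry measurement or a real spec.

```python
def sparsity_gain(turing_ms, turing_tflops, ampere_ms, ampere_dense_tflops):
    """Estimate the extra speedup Ampere shows beyond its dense throughput.

    Assumption: without sparsity, cost would scale inversely with dense
    tensor TFLOPS. Whatever speedup is left over after that scaling is
    attributed to sparsity.
    """
    # Cost the Ampere card would have if only dense throughput mattered.
    predicted_ampere_ms = turing_ms * turing_tflops / ampere_dense_tflops
    # Ratio of predicted to measured cost is the leftover gain.
    return predicted_ampere_ms / ampere_ms


def estimate_cost_ms(ref_ms, ref_dense_tflops, target_dense_tflops):
    """Scale a reference Ampere card's DLSS cost to another Ampere-like GPU.

    Both GPUs are assumed to benefit from sparsity equally, so the gain
    cancels out and only the dense throughput ratio remains.
    """
    return ref_ms * ref_dense_tflops / target_dense_tflops


if __name__ == "__main__":
    # Placeholder inputs only: NOT measured frame times or real specs.
    gain = sparsity_gain(turing_ms=2.0, turing_tflops=50.0,
                         ampere_ms=0.8, ampere_dense_tflops=100.0)
    cost = estimate_cost_ms(ref_ms=0.8, ref_dense_tflops=100.0,
                            target_dense_tflops=25.0)
    print(f"estimated sparsity gain: {gain:.2f}x")
    print(f"estimated target cost: {cost:.2f} ms")
```

With real frame-time costs and tensor throughput figures plugged in, the same two functions would give the sparsity factor and the scaled ms cost. The margin of error is only as good as the input data.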
I do think this calculator would really benefit from more data, as what's in the DF videos is really limited and leaves a huge margin of error. If you have any idea where I can find more data, I'd love to hear it.