DLSS is a neural network designed for one piece of hardware: the tensor core. For the most part, you don't tune neural networks for different hardware; you build a new neural network if you want different performance characteristics. And the one in DLSS represents hundreds of thousands of hours of compute time. Not only would a customized DLSS cost as much to build as DLSS did in the first place, but there's also no reason to believe there are any optimizations to be made for Drake specifically.
The value of DLSS is that it shares one model trained on truly massive quantities of data. Forking DLSS would effectively lock Nintendo out of DLSS development going forward.
I don't think the bolded is true. The cost of developing DLSS (aside from the general R&D of investigating how best to apply neural networks to the problem) would have been largely in building the training set, which is something they've already done and can reuse. DLSS itself is by necessity a very small model, as it needs to handle hundreds of millions of pixels a second on consumer hardware, so if the training data already exists, the actual computational cost of training the model would be relatively low. I did a back-of-the-envelope calculation a while back and came up with a parameter count in the tens of thousands for DLSS, which is why Nvidia can crank out new versions of the network on a pretty regular basis.
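For anyone curious what that kind of back-of-the-envelope calculation looks like, here's a sketch in Python. The layer count and channel widths below are my own guesses, not Nvidia's actual architecture; the point is just that a small convolutional network in this ballpark lands in the tens of thousands of parameters, and even at 4K its raw compute is modest next to tensor-core throughput.

```python
# Back-of-the-envelope sketch. All numbers are assumptions for illustration,
# not Nvidia's actual DLSS architecture.

def conv_params(in_ch, out_ch, k=3):
    """Parameter count of a k x k convolution layer (weights + biases)."""
    return in_ch * out_ch * k * k + out_ch

# Hypothetical 4-layer network with narrow channel counts:
# 4 input channels -> 32 -> 32 -> 32 -> 3 output channels.
layers = [(4, 32), (32, 32), (32, 32), (32, 3)]
params = sum(conv_params(i, o) for i, o in layers)
print(f"parameters: {params:,}")  # → parameters: 20,547

# Per-frame cost at 4K output: roughly 2 FLOPs (multiply + add) per
# parameter per output pixel, ignoring activations and memory traffic.
pixels_4k = 3840 * 2160
flops_per_frame = 2 * params * pixels_4k
print(f"~{flops_per_frame / 1e9:.0f} GFLOPs per 4K frame")
```

A few hundred GFLOPs per frame is well within reach of hardware quoted in the tens of tensor TFLOPS, which is consistent with the model having to stay tiny to run in a sliver of the frame time.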
I should emphasise that I don't think Nvidia would
fork DLSS for Nintendo, but I do think it's possible that, alongside access to regular DLSS, Nvidia could provide a DLSS-lite to Switch 2 developers which trades off some image quality for increased performance.
I think it’s time to treat the hypothetical Nintendo-customized DLSS as a myth.
The customizations that you could feasibly make, like reducing the number of channels in each layer of the architecture, would only make a marginal performance difference and would always penalize image quality. (And for anyone who's read my older posts, I no longer believe that decreasing the total number of layers in the network is a good way to decrease the cost, for reasons I may get into some other time.)
The “optimized hardware” customizations that people keep dreaming up simply don’t exist; the tensor cores are the hardware optimization, and we know those are just Ampere tensor cores in T239. So I can only conclude:
Custom DLSS is dead, and we have killed him.
You're correct that any attempt to reduce the performance cost of DLSS would impact image quality, but I don't think that's necessarily always a bad thing. When developing DLSS, Nvidia would have had to strike a balance between image quality and speed. You can always use a bigger, more complex network (so long as you have sufficient training data) to get better quality*, or a smaller, simpler one to get better performance, and we can assume that DLSS currently represents what Nvidia believes to be the sweet spot, where moving in either direction wouldn't be a worthwhile trade-off.
However, the sweet spot between speed and quality for desktop GPUs isn't necessarily the same as the sweet spot for portable devices with a fraction of the performance. Different trade-offs apply, and what might be considered cheap on a desktop GPU might take an unreasonable portion of the frame time on a low-power console. Even the quality trade-offs may differ, as IQ issues that may be noticeable to someone sitting right in front of a computer monitor may not be as noticeable on a TV screen further away, or a much smaller handheld screen.
I'm sure Nvidia is providing, and will continue to provide, the standard versions of DLSS for Switch developers to use in their games, and I don't think there's any free lunch where Nintendo gets a DLSS implementation that's magically faster without any trade-offs. But I do think there's potential value in providing, alongside regular DLSS, a more lightweight version of the model as an option for developers who are comfortable sacrificing a bit of image quality for performance. That might be because they're stretching to squeeze in their chosen lighting model and feel it's worth cutting down DLSS time to do so, or because they're targeting 60fps and would rather use DLSS-lite to hit 4K than settle for the 1440p output of regular DLSS, or because the limitations of DLSS-lite simply aren't apparent in their game (say it artifacts more around certain high-frequency detail patterns, but those patterns don't appear in their game).
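To put rough numbers on that 4K-versus-1440p trade-off: the cost of a per-pixel upscaling pass scales roughly with output resolution, so the same network is noticeably more expensive at 4K. A sketch with made-up timings (the 3 ms figure for regular DLSS at 1440p is purely illustrative, not a measured number):

```python
# Rough frame-budget arithmetic; all timings are illustrative assumptions.

FRAME_BUDGET_MS = 1000 / 60  # ~16.7 ms per frame at 60 fps

pixels = {
    "1440p": 2560 * 1440,
    "4K": 3840 * 2160,
}

# Assume regular DLSS costs a hypothetical 3 ms at 1440p on this hardware,
# and that cost scales linearly with output pixel count.
per_pixel_cost_ms = 3.0 / pixels["1440p"]

for name, px in pixels.items():
    cost = per_pixel_cost_ms * px
    print(f"{name}: {cost:.2f} ms of a {FRAME_BUDGET_MS:.1f} ms budget")
# 4K has 2.25x the pixels of 1440p, so the same pass costs 2.25x as long.
```

Under those assumptions, jumping from 1440p to 4K more than doubles the upscaling cost, which is exactly the gap a cheaper DLSS-lite could close for a 60fps title.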
* To a certain point. I assume you'll asymptotically approach "ideal" IQ for the amount of input data you have, and adding excess complexity for this particular task may end up overfitting or hallucinating, which wouldn't be desirable.