Great post all round. This absolutely isn't my field, so it's great to hear you break it down.
I know the "double upscale" gets pooh-poohed a lot but I don't think it's that weird. Games at non-native resolution have been an issue for decades, and are still being produced on the current consoles. But DLSS isn't just an upscaler - it's finding detail that doesn't exist in the native frame. It's more like DLSS is a cheaper path to a reasonable 1440p (or 4k, or 1080p) image, and if the resolution is sufficient, then why not enable more features in that base image?
All that said, I've been messing with a theory that I wanna throw out. Credit to Richard Leadbetter for putting this worm in my head.
Let’s go back to February of last year. Prior to the leak, we were trying to game out what a cut-down, Nvidia-specific Orin would look like and what hardware could be cut. One of the obvious candidates was the Deep Learning Accelerator (DLA), machine learning hardware that seemed useless in a console.
The following month, with the leak in full analysis mode, we discovered Orin’s “double rate” tensor cores. The obvious question was why Drake doesn’t have them. The obvious answer, at the time, was that Nintendo decided the extra DLSS performance wasn’t necessary.
Now, much further out, with additional information, I doubt both of these conclusions.
In the case of the tensor cores, they’re not truly “double rate.” Only a small number of instructions run at double the clock speed. I suspect that’s because those instructions are useful for model training, not model execution. That would mean they don’t accelerate DLSS at all, and Nvidia reserves them for their ML products purely for product differentiation.
The DLA is actually well documented, and functions much like the tensor cores. It accelerates the same matrix operations, and in fact, Nvidia’s machine learning tools let you run workloads that combine the two into a single compute pool.
I think you’re seeing where this is going. Speculating: DLSS is tensor core limited. Speculating: DLSS can take advantage of the DLA. If so, Nvidia/Nintendo have an option for accelerating DLSS without making the GPU larger than its already substantial size.
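To put a rough shape on that, here's a toy back-of-envelope in Python. Every number in it is made up for illustration; none of it is a measured Drake, Orin, or DLSS figure. It also assumes the upscale pass is purely throughput-bound and that the tensor cores and DLA pool perfectly with no transfer or sync cost, which is optimistic.

```python
def upscale_time_ms(work, tensor_rate, dla_rate=0.0):
    """Time for a fixed chunk of matrix work, if throughput is the only limit.

    work        -- hypothetical amount of matrix work per frame (arbitrary units)
    tensor_rate -- hypothetical tensor core throughput (units per ms)
    dla_rate    -- hypothetical DLA throughput (units per ms), 0 if unused
    Assumes both units pool perfectly, with no transfer or sync overhead.
    """
    total_rate = tensor_rate + dla_rate
    if total_rate <= 0:
        raise ValueError("combined throughput must be positive")
    return work / total_rate


# Placeholder values only, chosen to make the arithmetic easy to follow.
WORK = 6.0
TENSOR_RATE = 2.0
DLA_RATE = 1.0

gpu_only = upscale_time_ms(WORK, TENSOR_RATE)
with_dla = upscale_time_ms(WORK, TENSOR_RATE, DLA_RATE)

print(f"tensor cores only: {gpu_only:.1f} ms")
print(f"tensor cores + DLA: {with_dla:.1f} ms")
```

The point is just that if the pass really is tensor limited, any extra matrix throughput from a separate block shaves time off it without touching the GPU itself. How much, and whether the pooling overhead eats the gain, is exactly the part I can't answer.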
Initially I thought the DLA was ruled out because it’s not in the list of blocks in the Nvidia driver. But upon review, the block isn’t listed in that place for Orin either. It does show up in other files for Orin, but Drake doesn’t have those files at all - not because the functionality is disabled but because the work was in progress when Nvidia made it private.
Totally speculative, but y’all are smarter than me so I thought I’d throw it out.