This is roughly what I expected, though it's interesting to see it borne out. The Ada Lovelace white paper just doesn't suggest there is much going on beyond a pure die shrink. No mention is made of changes to the SM design, and the main bullet points for RT, Tensor Core, and OFA improvements all parrot the same "over 2x performance" increase, which happens to match the performance increase that Nvidia attributes to the node shrink.
Nvidia claims that in "Cyberpunk 2077 running in RT: Overdrive Mode, we’ve measured overall performance gains of up to 44% from SER." But SER requires a custom API, and I have reason to believe this context-free benchmark compares Ada-with-SER to Ada-without-SER, rather than Ada-with-SER to Ampere-without-SER, because the Ampere comparison would be less flattering.
The remaining two RT-related architectural improvements are the Opacity Micromap Engine and the Displaced Micro-mesh Engine. Both require you to reauthor assets for them to work, and both are wins for cases that I don't think would affect Cyberpunk much. The OME is designed to make ray tracing of noisy geometry, like foliage, more efficient, and the DMME is targeted at small objects with lots of detail.
The larger L2 cache is Nvidia gesturing vaguely in the direction of Infinity Cache. I've said before that while Infinity Cache is a good idea, it's not a pure performance win but a trade-off: AMD sacrifices performance on "touch once" data paths in order to hugely accelerate "touch multiple times" data paths, and the net result is usually a wash. RT workloads are highly divergent and cache antagonistic.
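To make the trade-off concrete, here's a toy model of my own (not anything from AMD or Nvidia): every byte is served either from cache or from DRAM, so effective bandwidth is a hit-rate-weighted harmonic mix of the two. All the bandwidth figures and hit rates below are made-up illustrative numbers in arbitrary units, not real GPU specs, and the baseline "wide bus" design simply ignores its own (smaller) cache.

```python
def effective_bandwidth(hit_rate: float, cache_bw: float, dram_bw: float) -> float:
    """Blend cache and DRAM bandwidth by hit rate.

    Each byte costs 1/cache_bw time units if it hits in cache and 1/dram_bw
    if it misses, so the effective rate is the harmonic mix of the two.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    time_per_byte = hit_rate / cache_bw + (1.0 - hit_rate) / dram_bw
    return 1.0 / time_per_byte


# Hypothetical designs, bandwidth in arbitrary units (NOT real GPU specs):
WIDE_BUS_DRAM = 1000.0   # baseline: wide memory bus, cache ignored
NARROW_BUS_DRAM = 600.0  # cache-heavy design: narrower, cheaper bus
BIG_CACHE_BW = 2000.0    # large on-die cache

# Assumed hit rates for three workload shapes.
WORKLOADS = {
    "touch once (streaming)": 0.0,
    "touch many times (high reuse)": 0.8,
    "divergent RT (poor reuse)": 0.3,
}

if __name__ == "__main__":
    for name, hit_rate in WORKLOADS.items():
        cached = effective_bandwidth(hit_rate, BIG_CACHE_BW, NARROW_BUS_DRAM)
        print(f"{name:32s} wide bus: {WIDE_BUS_DRAM:6.0f}  big cache: {cached:6.0f}")
```

With these assumed numbers the cache-heavy design only wins on the high-reuse workload (about 1364 vs 1000) and loses on streaming (600) and divergent RT (about 759), which is the "wash" I'm describing. Change the assumed hit rates and the conclusion moves with them.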
This is where I think SER comes in. SER lets developers give the driver more information about RT shaders so that they can be scheduled to increase thread/cache locality. I suspect that SER is simply reclaiming performance lost in moving to a "cache over bandwidth" design.
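Here's a rough sketch of the idea behind reordering, assuming a simplified model where each ray's "key" is just the ID of the hit shader it will run. The real feature is hardware plus a vendor shader extension; the plain CPU `sorted()` here is only a conceptual stand-in, and the function names are hypothetical. GPU threads run in lockstep warps of 32, so every distinct shader inside a warp has to be run in turn for its subset of threads. Grouping rays by shader before shading cuts that divergence, and rays that run the same shader tend to touch the same material data, which is where the cache locality comes from.

```python
import random
from typing import List

WARP_SIZE = 32  # NVIDIA GPUs execute threads in lockstep groups of 32


def shaders_per_warp(shader_ids: List[int], warp_size: int = WARP_SIZE) -> float:
    """Average number of distinct hit shaders per warp (1.0 means fully coherent)."""
    warps = [shader_ids[i:i + warp_size] for i in range(0, len(shader_ids), warp_size)]
    if not warps:
        return 0.0
    return sum(len(set(warp)) for warp in warps) / len(warps)


def reorder_by_shader(shader_ids: List[int]) -> List[int]:
    """Conceptual stand-in for SER: group rays by the shader their hit will run."""
    return sorted(shader_ids)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical frame: 4096 incoherent bounce rays, each hitting one of 8 materials.
    rays = [rng.randrange(8) for _ in range(4096)]
    print(f"before reorder: {shaders_per_warp(rays):.2f} shaders per warp")
    print(f"after reorder:  {shaders_per_warp(reorder_by_shader(rays)):.2f} shaders per warp")
```

Note that the reordering itself isn't free, and in this framing it only pays off because divergent RT was losing so much to poor locality in the first place.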
If you look carefully at Nvidia's white paper on Ada, and keep an eye out for contextless benchmarks, you can see that Nvidia claims a 2x performance increase from moving to 5nm. If you go through and eliminate every mention of a 2x increase that comes without a clear reason why, you're left with very little. The full list of documented changes:
- Bigger Cache: it's entirely possible that Drake has a larger cache, like Orin did, but for Ada this amounts to standing in place, since memory bandwidth is not substantially increased relative to performance
- Shader Execution Reordering: a change to the SM scheduler that seems to exist mostly to help counter the performance loss of not scaling bandwidth with performance
- Opacity Micromap Engine: In rasterization, high-density foliage and fire can be approximated by textures with alpha, rather than a lot of geometry. If you wanted the shadows of that foliage to look super detailed, you'd have to write a custom (any-hit) shader for that. The OME lets you supplement the texture with more complex faux-geometry that the RT engine uses to test triangle intersection, which lets your custom shader run less often. This might be a real win, but probably not in Cyberpunk
- MicroMesh Engine: This is roughly the opposite case. Really detailed small objects are rendered with lots of geometry, which is the worst-case scenario for our Fully Path Traced future. This lets you once again substitute simpler faux-geometry for triangle intersection testing, which is post-processed back to the more detailed version. This looks like a pretty big win for fully path traced games, but I doubt we're going to see any of those, and in the case of CDPR, they are not listed among the early partners; Adobe is. I suspect Nvidia needs to get asset authoring tools in place before they can even look at engine support.
- FP8 Transformer Engine: the only documented update to the tensor cores, and it's a backport from Hopper. It doesn't accelerate FP8 operations; it accelerates FP8 scaling, which is only used during model training, not during model execution.
- NVENC updates: Neat. And backported to Drake.
- Improved Clock Gating: This doesn't show up in the white paper, but does get talked about a bit when Nvidia discusses Ada's power efficiency. It's poorly documented, but it appears that Ada supports fine-grained control over the memory clock to save power. Considering Ada moves to a more cache-centric design, this is probably smart, but it's purely a power win, and the Lapsus$ hack indicates it was already backported to Drake (FLCG).
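One way to picture what the Opacity Micromap bullet above is buying you: split each alpha-textured triangle into micro-triangles and bake each one as fully opaque, fully transparent, or "unknown" (straddling the alpha edge). Only hits on "unknown" micro-triangles still need the custom alpha-test shader. This is a simplified sketch under my own assumptions; the three-state enum, the 0.5 alpha cutoff, and all the names are hypothetical, and the real micromap format and API differ.

```python
from enum import Enum
from typing import Optional, Sequence


class MicroState(Enum):
    """Simplified per-micro-triangle opacity states (the real format differs)."""
    TRANSPARENT = 0
    OPAQUE = 1
    UNKNOWN = 2  # straddles the alpha edge, so the full alpha test is still needed


def classify(alpha_samples: Sequence[float], cutoff: float = 0.5) -> MicroState:
    """Bake one micro-triangle's state from the alpha texels it covers."""
    if all(a >= cutoff for a in alpha_samples):
        return MicroState.OPAQUE
    if all(a < cutoff for a in alpha_samples):
        return MicroState.TRANSPARENT
    return MicroState.UNKNOWN


def alpha_test_invocations(hits: Sequence[int],
                           micromap: Optional[Sequence[MicroState]]) -> int:
    """Count how often the custom alpha-test shader runs for a list of hit micro-triangles."""
    if micromap is None:
        return len(hits)  # no micromap: every candidate hit runs the shader
    # With a micromap, OPAQUE/TRANSPARENT hits are resolved without the shader.
    return sum(1 for i in hits if micromap[i] is MicroState.UNKNOWN)


if __name__ == "__main__":
    # Hypothetical leaf quad split into four micro-triangles.
    leaf = [[1.0, 1.0, 0.9], [0.0, 0.1, 0.0], [0.2, 0.8, 0.6], [0.7, 0.9, 1.0]]
    micromap = [classify(samples) for samples in leaf]
    hits = [0, 1, 2, 3, 0, 2, 1, 3]  # micro-triangle index struck by each ray
    print("without micromap:", alpha_test_invocations(hits, None))
    print("with micromap:   ", alpha_test_invocations(hits, micromap))
```

The win scales with how much of the texture is cleanly opaque or cleanly transparent, and it requires baking the micromap ahead of time, which is the asset reauthoring cost mentioned earlier.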
You'll note the OFA isn't on the list. Nvidia's description of Ada's OFA doesn't document any core improvements over Ampere, just a 2x increase in performance that seems to match the increase in TFLOPS for the GPU overall. This is why I think "does Drake have the OFA from Ada?" is an irrelevant question. The OFA is a customized ARM core, and there don't appear to be any upgrades to it that would enable RTX 40 perf without drawing RTX 40 watts. DLSS 3 on Drake will be limited by the power budget, and no other magic exists to work around that.
Of these 7 Ada improvements, 2 are backported to Drake (NVENC/FLCG), 1 is a trade-off (bigger cache), 1 is a workaround for said trade-off (SER), and 1 is not usable in video games (FP8 Transformer).
That just leaves the Opacity Micromap and MicroMesh engines. One of these is laying the groundwork for a fully path traced future that no 9th gen console is going to be able to reach, and the other accelerates specific edge cases in RT workloads that I suspect are out of REDACTED's reach 99.9% of the time anyway.
Which brings us back to folks saying that REDACTED is "out of date" already. In terms of design, Nvidia has nothing in its pocket that could deliver a more powerful mobile device than T239. It remains to be seen what Blackwell will look like, whether Nvidia will fundamentally shake up the design it's been maturing since 2006 or ride a node shrink One Last Time, but whichever path they take will not be available to Nintendo until 2026 at the earliest.
At the end of the day, it's the Process Node, Stupid, and the driver there is going to be "how much power can Nintendo give folks at a price point they'll accept."