So can we talk about Ampere itself? Like, what benefits will it have for developers to use? I'm talking about the developers who will strictly take full advantage of the architecture.
We've talked about this a lot, but I know you're newish. I'll compare it to other hardware; I think that's the simplest way to look at it.
BIG FAT NOTE HERE, DON'T @ ME WITHOUT READING: I'm not talking about T239, or the Switch 2. I'm talking about Ampere, the architecture. I'm not trying to say anything about how powerful these tools are in one machine versus another. I'm just talking about what extra tools are in the toolbox, period. I'm counting the number of widgets on the Swiss Army knife, not comparing whose toothpick is bigger.
Against RDNA 2, the core of modern consoles like PS5/Xbox Series/Steam Deck
DLSS: We talk about this a lot. The short version is that it's a technology that lets you keep most of the detail of a high resolution and most of the frame rate of a low one. It has multiple modes, which give you higher and higher frame rates for lower and lower levels of detail. There is similar technology on the other consoles, but DLSS looks much better at each mode, allowing developers to push it much further.
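If the modes feel abstract, the arithmetic behind them is simple. The scale factors below are the commonly cited per-axis values for DLSS 2's modes; games can and do override them, so treat the exact numbers as illustrative:

```cpp
#include <cstdio>

// Commonly cited per-axis render scales for DLSS 2's modes (illustrative;
// games can override these).
struct Mode { const char* name; double scale; };

int main() {
    const int outW = 3840, outH = 2160;   // 4K output target
    const Mode modes[] = {
        { "Quality",           0.667 },   // ~1440p internal
        { "Balanced",          0.580 },
        { "Performance",       0.500 },   // 1080p internal
        { "Ultra Performance", 0.333 },
    };
    for (const Mode& m : modes) {
        // The GPU only shades the smaller frame; DLSS reconstructs the rest.
        printf("%-17s renders %dx%d, outputs %dx%d\n", m.name,
               int(outW * m.scale), int(outH * m.scale), outW, outH);
    }
}
```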
RT Parallelism: Nvidia's RT cores can run simultaneously with shading; AMD's cannot. Even in a case where an Nvidia GPU and an AMD GPU are equally fast at every single step of rendering, a clever programmer (and it does require cleverness to utilize fully) can get better performance out of Nvidia by doing more things at once.
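To see why that matters, here's the shape of the win in toy numbers (entirely made up for illustration):

```cpp
#include <algorithm>
#include <cstdio>

int main() {
    // Hypothetical per-frame costs, in milliseconds, purely illustrative.
    const double shadingMs    = 3.0;
    const double rayTracingMs = 2.0;

    // If RT and shading have to take turns, the frame pays for both.
    const double serial = shadingMs + rayTracingMs;

    // If RT cores run alongside the shader cores, and the programmer keeps
    // both fed with independent work (the "cleverness" part), the frame
    // costs roughly the longer of the two, not the sum.
    const double overlapped = std::max(shadingMs, rayTracingMs);

    printf("serial: %.1f ms, overlapped: %.1f ms\n", serial, overlapped);
}
```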
BVH traversal acceleration: Nvidia's RT cores are roughly 1.7-2.0x as powerful as AMD's, mostly because they accelerate this part of the RT pipeline. (The BVH is the tree structure a ray walks through to find which triangles it might hit.) Note, I'm not comparing Switch 2 to PS5. I'm comparing Ampere to RDNA2. The number of cores, and their clock speed, matters. But all else being equal, complex RT effects are cheaper on Ampere hardware.
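If "BVH traversal" sounds like jargon, here's the loop in stripped-down software form. This is a sketch of the general technique, not Nvidia's actual hardware; real traversal units work on compressed, wider trees with all sorts of tricks.

```cpp
#include <algorithm>
#include <cfloat>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, invDir; };   // invDir = 1/direction, precomputed
struct AABB { Vec3 lo, hi; };
struct Node {                           // one node of the bounding volume hierarchy
    AABB bounds;
    int  left = -1, right = -1;         // child node indices, -1 = none
    int  firstTri = -1, triCount = 0;   // leaf payload: a run of triangles
};

// Classic "slab test": does the ray pass through this box at all?
static bool hitAABB(const Ray& r, const AABB& b) {
    float tmin = 0.0f, tmax = FLT_MAX;
    const float o[3]  = { r.origin.x, r.origin.y, r.origin.z };
    const float d[3]  = { r.invDir.x, r.invDir.y, r.invDir.z };
    const float lo[3] = { b.lo.x, b.lo.y, b.lo.z };
    const float hi[3] = { b.hi.x, b.hi.y, b.hi.z };
    for (int i = 0; i < 3; ++i) {
        float t0 = (lo[i] - o[i]) * d[i];
        float t1 = (hi[i] - o[i]) * d[i];
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
    }
    return tmin <= tmax;
}

// Walk the tree, pruning whole subtrees whose boxes the ray misses, and
// collect the handful of triangles actually worth testing. RT cores do this
// box testing and tree walking in fixed-function hardware; on RDNA 2 the
// traversal loop itself runs as shader instructions, which is a big chunk
// of the performance gap described above.
void traverse(const std::vector<Node>& bvh, const Ray& ray,
              std::vector<int>& candidateTris) {
    std::vector<int> stack = { 0 };              // start at the root
    while (!stack.empty()) {
        const Node& n = bvh[stack.back()];
        stack.pop_back();
        if (!hitAABB(ray, n.bounds)) continue;   // prune the whole subtree
        if (n.triCount > 0) {                    // leaf: emit its triangles
            for (int i = 0; i < n.triCount; ++i)
                candidateTris.push_back(n.firstTri + i);
        } else {
            if (n.left  >= 0) stack.push_back(n.left);
            if (n.right >= 0) stack.push_back(n.right);
        }
    }
}
```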
Against PS5 specifically
Variable Rate Shading: This is a DirectX 12 feature that didn't make it into the PS5, but did make it into the Xbox Series. (Or, more honestly, it's an Xbox feature, which is why Microsoft put it in DirectX 12.) DLSS and other upscalers have made this seem less useful, though some smart developers are starting to do cool things with it. It allows parts of the screen to render at a lower resolution than the rest. It can be useful in areas where there isn't a lot of detail, or where the player is unlikely to be looking, saving performance.
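For the curious, per-draw VRS is only a couple of lines on the API side. A minimal D3D12 sketch, assuming a command list that supports ID3D12GraphicsCommandList5 and hardware reporting at least Tier 1 VRS:

```cpp
#include <d3d12.h>

// Shade once per 2x2 pixel block for a draw where nobody will notice,
// e.g. a distant background or a heavily motion-blurred pass. Depth and
// coverage still resolve at full resolution; only the pixel shader runs
// less often, which is where the savings come from.
void drawBackground(ID3D12GraphicsCommandList5* cmdList) {
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_2X2, nullptr); // go coarse
    // ... issue the low-detail draw calls here ...
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_1X1, nullptr); // back to full rate
}
```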
ML acceleration: The tensor cores. They mostly exist to support DLSS, but technically they can support any machine-learning-based operation. Xbox Series consoles have custom machine learning hardware to handle this, but PS5 doesn't. (The rumored PS5 Pro does.)
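What tensor cores actually do is narrow: matrix multiply-accumulate on small tiles. Here's a plain scalar stand-in for that one primitive, just to show the shape of the operation (tile sizes and input formats vary by architecture and instruction, so treat the 4x4 as illustrative):

```cpp
#include <cstddef>

// Scalar stand-in for the tensor core primitive: D = A*B + C on a small
// matrix tile, with low-precision inputs and wider accumulation. A tensor
// core retires a whole tile of this per instruction, where ordinary shader
// ALUs would need one multiply-add per element.
constexpr size_t N = 4;

void tileMultiplyAccumulate(const float A[N][N], const float B[N][N],
                            const float C[N][N], float D[N][N]) {
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j) {
            float acc = C[i][j];              // accumulate at higher precision
            for (size_t k = 0; k < N; ++k)
                acc += A[i][k] * B[k][j];     // inputs would be FP16/INT8 on hardware
            D[i][j] = acc;
        }
}
```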
Against Maxwell, the core of the original Switch
Everything above, plus:
Raytracing, period: Above, I talked about Nvidia ray tracing hardware versus AMD ray tracing hardware. But Switch doesn't have RT hardware at all. RT hardware can be used to do clever things that aren't graphics, either, like 3D sound. There are also some obscure features in the GPU that really exist to support RT (like Conservative Rasterization) which I won't get into here.
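The 3D sound idea is worth a quick sketch, because it shows how general the hardware is. Everything below is hypothetical; traceRay stands in for whatever ray-query API the engine actually exposes (DXR ray queries, Vulkan ray queries, etc.):

```cpp
#include <cstddef>

struct Vec3 { float x, y, z; };

// Hypothetical stand-in for a hardware ray query: returns true if any
// geometry blocks the segment from one point to another.
using RayOccluded = bool (*)(Vec3 from, Vec3 to);

// Estimate how muffled a sound source should be by firing a few rays from
// the source toward (jittered points around) the listener and counting how
// many get blocked. Same scene structures, same RT hardware as graphics,
// just pointed at an audio problem.
float occlusionFactor(Vec3 source, Vec3 listener,
                      const Vec3* jitter, size_t rayCount,
                      RayOccluded traceRay) {
    size_t blocked = 0;
    for (size_t i = 0; i < rayCount; ++i) {
        const Vec3 target = { listener.x + jitter[i].x,
                              listener.y + jitter[i].y,
                              listener.z + jitter[i].z };
        if (traceRay(source, target)) ++blocked;
    }
    return float(blocked) / float(rayCount); // 0 = clear path, 1 = fully blocked
}
```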
Mesh Shaders: A more modern way of handling geometry in the GPU. It is designed to allow devs to push much higher polygon counts, but some clever developers are using it to do GPU-accelerated animation as well.
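On the API side the shift is visible in the draw itself: no vertex/index buffer plumbing, just a compute-style launch. A minimal D3D12 sketch (pipeline state with the mesh shader is assumed to be set up and bound already):

```cpp
#include <d3d12.h>

// Mesh shader path: each threadgroup runs like compute, culls or animates
// its own chunk of geometry (a "meshlet"), and emits vertices and triangles
// directly. That compute-style launch is what lets clever devs repurpose
// the stage for GPU-driven animation.
void drawWithMeshShaders(ID3D12GraphicsCommandList6* cmdList, UINT meshletCount) {
    cmdList->DispatchMesh(meshletCount, 1, 1); // one threadgroup per meshlet
}
```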
Sampler Feedback: A nice but small optimization. It lets the game know more about how textures are actually being drawn to the screen, allowing developers to optimize away some high resolution texture work that wouldn't be visible.
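What you'd do with that information looks roughly like this. Conceptual sketch only; readFeedback is a hypothetical stand-in for resolving the feedback map the API writes for you:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: the finest mip level the GPU actually sampled in a
// given region of a texture last frame, resolved from the feedback map.
using ReadFeedbackMinMip = uint8_t (*)(size_t regionIndex);

// Keep resident only the mips the renderer provably used. Without feedback,
// engines guess from distance and screen size, and often keep the sharpest
// mips around "just in case": memory and bandwidth spent on texels nobody saw.
void updateResidency(size_t regionCount, uint8_t* residentMip,
                     ReadFeedbackMinMip readFeedback) {
    for (size_t r = 0; r < regionCount; ++r) {
        const uint8_t needed = readFeedback(r);
        if (needed < residentMip[r]) {
            // stream in the finer mips this region actually asked for ...
        } else if (needed > residentMip[r]) {
            // ... or evict detail that nothing sampled
        }
        residentMip[r] = needed;
    }
}
```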
Higher quality video: Nvidia has built-in hardware for encoding/decoding video. Ampere supports higher quality versions of the existing formats, and adds AV1 support.
Hardware accelerated JPEG: This is probably the single most useless thing on the list, as it's really for things like Photoshop, and game textures will almost always be in a special format, but Nvidia has added dedicated JPEG support in hardware.
Against GCN, the core of PS4/PS4 Pro/Xbox One/One S/One X
Everything from before, obviously, but also
16-bit precision: Most graphics operations use 32-bit numbers, because they're very precise, and you don't want pixels shaking around the screen because of rounding errors. But there are some operations where a full 32 bits of precision isn't necessary. Ampere can pack two 16-bit operations into a 32-bit window, and get double performance there. This absolutely takes some cleverness to take advantage of.
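Here's the tradeoff in actual numbers: the spacing between adjacent FP16 values tells you where 16 bits is plenty (color math) and where pixels start shaking (positions far from the origin):

```cpp
#include <cmath>
#include <cstdio>

// Gap between adjacent FP16 values near x (FP16 has 10 explicit mantissa
// bits). This gap is exactly the rounding "shake" described above.
double fp16Spacing(double x) {
    return std::ldexp(1.0, int(std::floor(std::log2(std::fabs(x)))) - 10);
}

int main() {
    // Fine for FP16: a color channel around 0.5; the error is invisible.
    printf("spacing near 0.5:  %.6f\n", fp16Spacing(0.5));     // ~0.000488
    // Bad for FP16: a world position 2 km out shakes by whole units.
    printf("spacing near 2048: %.1f\n",  fp16Spacing(2048.0)); // 2.0
    // The payoff for tolerating this: two packed FP16 operations execute
    // where one FP32 operation would, doubling throughput on that math.
}
```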
Tiled rendering/caching: Old architectures drew the whole screen at once, which was heavy on memory bandwidth and cache hostile. Ampere (and Maxwell, for that matter, which is why it's down here) slices the frame up into tiles that fit in the cache and draws them piecemeal. This reduces memory load, because it's far more cache friendly, and spreads memory usage out, so that memory is working longer at lower speeds instead of briefly at high speeds. This is basically free; developers don't have to do anything to take advantage of it, and it's why last gen having "higher bandwidth" isn't really a problem.
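Napkin math for why tiles help (sizes here are illustrative, not any particular chip's specs):

```cpp
#include <cstdio>

int main() {
    // Illustrative numbers only, not any specific console's specs.
    const int frameW = 1920, frameH = 1080;
    const int bytesPerPixel = 8;                // e.g. color + depth targets
    const int tileW = 256, tileH = 128;

    const double frameMB = double(frameW) * frameH * bytesPerPixel / (1024.0 * 1024.0);
    const double tileKB  = double(tileW) * tileH * bytesPerPixel / 1024.0;

    // A whole frame's working set dwarfs any on-chip cache, so a GPU drawing
    // the full screen at once keeps round-tripping pixels through DRAM.
    printf("whole frame: %.1f MB\n", frameMB);  // ~15.8 MB
    // One tile fits comfortably in a few-MB L2, so overdraw and blending hit
    // cache instead of DRAM. That's where the free bandwidth savings live.
    printf("one tile:    %.0f KB\n", tileKB);   // 256 KB
}
```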