So, a few general comments on the updates:
Firstly, and most importantly, now that "around March" is on the table, I'd like to officially announce the re-launch of #TeamLeapDay. We'll be accepting membership applications by messenger pigeon, which must be accompanied by a one thousand word essay on why Wave Race should be a launch title.
Secondly, it looks like I wasn't completely wrong about BoTW, with both DLSS and fast loading confirmed to be in the demo. No mention of ray tracing, but two out of three ain't bad.
Actually being able to run DLSS at 4K/60fps is a bit better than I'd expected. My personal expectations were 4K/30fps, or around 1440p/60fps, just due to the cost of running DLSS itself, but it looks like it's either cheaper than I expected, or the hardware is more capable than I expected, or a bit of both. Good news either way.
The talk of "instant" load times in BoTW is interesting. Of course nothing is truly instant (even Ratchet & Clank uses a short portal sequence to hide loading, a bit like the old door-opening trick on Metroid Prime back in the day), but we're probably talking around a second or so for it to feel instant. Against loads of roughly 30 seconds on the Switch, that's about a 30x speedup.
It's important to note that there are multiple factors which impact loading speeds, not just the speed of the storage itself (although as I've explained before, I was already expecting that to be much quicker). Roughly speaking, I'd categorise them into three different bottlenecks which could factor into how fast loading is:
- The speed of the storage medium itself. Obviously this can be a big bottleneck, as if you want to pull 1GB of data off a storage device which maxes out at 100MB/s, you can't do that any quicker than 10 seconds.
- The CPU overhead associated with pulling that data off the storage medium. The main part of this nowadays is decompression. If your CPU can't decompress data fast enough, then even if you've got super fast storage you're not going to be leveraging it (unless you want to ship purely uncompressed assets and balloon the game size). Aside from decompression, though, there's still some additional overhead of just communicating with the storage medium and transferring the data, which can be non-trivial.
- Non-storage related CPU work. This is doing everything you need to do to actually set up the game state. Allocate memory for all the different things you need to keep track of during gameplay, initialise all the different systems which are going to be running, and sync the game state with the save file. This is also where you're doing any kind of procedural generation, which is often overlooked when thinking about load times. People playing No Man's Sky on the PS5 for the first time were really puzzled why it wasn't loading much quicker than the PS4, but the game is fully procedural, so it was never really bottlenecked by storage in the first place.
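To put those three bottlenecks into rough numbers, here's a minimal sketch of how they might combine. It assumes reading and decompression are pipelined, so whichever is slower sets the pace, and that the game-state setup happens afterwards. Real games overlap these stages in messier ways, and the function and parameter names are purely illustrative.

```python
def estimate_load_seconds(compressed_mb, storage_mb_s, decompress_mb_s, setup_seconds):
    """Rough load-time estimate from the three bottlenecks.

    Assumption: storage reads and decompression run in parallel (pipelined),
    so the slower of the two dominates; setup work is added on top.
    """
    read_seconds = compressed_mb / storage_mb_s            # bottleneck 1
    decompress_seconds = compressed_mb / decompress_mb_s   # bottleneck 2
    return max(read_seconds, decompress_seconds) + setup_seconds  # plus bottleneck 3


# 1GB (taken as 1000MB) off a 100MB/s device, with decompression and setup
# treated as free: no quicker than 10 seconds.
print(estimate_load_seconds(1000, 100, float("inf"), 0))  # 10.0
```

The No Man's Sky case is the one where `setup_seconds` dominates, so speeding up the first two terms barely moves the total.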
On Switch, it seems most games are bottlenecked by the second of these, CPU-side decompression. We know that T239 contains a dedicated File Decompression Engine, so ideally we're not bottlenecked here anymore, but if we were, it would suggest that the FDE is around 30x as fast at decompressing data as the Switch CPU. I don't know if we can say for sure what data rates BoTW was operating at while loading, but I don't believe it's noticeably faster on internal storage or microSD than game cards. Game cards top out at 50MB/s, so if the Switch CPU was able to keep up with that, then we'd be looking at 30x as much, or about 1.5GB/s of compressed data coming into the FDE. It could certainly be less than that, with the CPU perhaps bottlenecking things even lower on the Switch, but honestly 1.5GB/s or so wouldn't be crazy. Both MS and Sony designed much faster decompression hardware 3 years previously, and Nvidia isn't exactly a slouch when it comes to designing fast coprocessors.
If we were bottlenecked on storage, then again comparing to the original Switch game card as the baseline we'd once more be looking at around 1.5GB/s. For a while I've been considering UFS 3.1 to be reasonably likely, which ranges from 1.7GB/s to 2GB/s read speeds, and is pretty much in line with this. In fact, if Nintendo actually is considering 512GB of storage for the console, then 1.7GB/s UFS 3.1 is the slowest option they would have. Nobody makes 512GB modules for either eMMC or UFS 2.
The third bottleneck, the non-storage CPU work, probably isn't trivial for BoTW on the original Switch, although it's likely a lighter workload than it is for ToTK with its increased number of gameplay systems. I definitely don't expect a 30x speedup here, but between taking the decompression work off the CPU, and just having more, faster CPU cores to do the work, it should be much faster either way.
Of course this is all based on a very specific reading of "instant" loading. If 2 seconds counts as instant, then halve every number. If it's 0.5 seconds, double them, etc. However, given expectations of internal storage close to 2GB/s, and dedicated decompression hardware to remove the main CPU bottleneck, dropping load times from 30s to around 1s seems quite reasonable.
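For anyone who wants to plug in their own definition of "instant", the scaling is just a ratio. This is a back-of-envelope sketch only: it assumes a 30 second load on the Switch and that the Switch was effectively pulling data at the 50MB/s game card ceiling, both of which are rough figures from this post rather than measurements. The names are made up for illustration.

```python
SWITCH_LOAD_SECONDS = 30   # rough BoTW load time on Switch (assumed)
SWITCH_RATE_MB_S = 50      # game card ceiling, assumed to be the effective rate


def required_rate_mb_s(target_seconds):
    """Data rate needed to hit the target load time, scaled from the Switch baseline."""
    speedup = SWITCH_LOAD_SECONDS / target_seconds
    return speedup * SWITCH_RATE_MB_S


for target in (2, 1, 0.5):
    print(f"{target}s load -> {required_rate_mb_s(target):.0f} MB/s")
# 2s load -> 750 MB/s
# 1s load -> 1500 MB/s
# 0.5s load -> 3000 MB/s
```

Only the 0.5 second case pushes past what 1.7GB/s to 2GB/s UFS 3.1 could deliver on raw read speed alone.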