I'm not really a hardware nerd, so could someone please explain the benefits of cache?
Like, what's the difference between a 2MB cache and a 4MB cache, and why is it so small yet apparently makes a big difference?
Please explain it like I'm 5, on the level of an
@oldpuck post
The reason cache exists is that RAM, like the DDR4 in your PC or the LPDDR4 in the Switch, is slow. Not slow in human terms, but slow from the point of view of the CPU. When a CPU core requests data from RAM, it takes something like 50 nanoseconds to arrive (each nanosecond is a
billionth of a second). For us, that's an incomprehensibly short amount of time, but for a CPU it's an eternity. If you've got a CPU clocked at 2GHz, then it's capable of running an instruction once every 0.5 nanoseconds. So if the CPU has to wait 50ns for data to come back, it has to wait 100 times as long as it takes to run an instruction. If every instruction needs data (which they all do), then the CPU would spend 99% of its time waiting for data to arrive from RAM, and only 1% of its time actually calculating anything.
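The arithmetic above can be sketched in a couple of lines. The 2GHz clock and 50ns latency are the illustrative numbers from the explanation, not measurements of any particular chip:

```python
# Back-of-the-envelope numbers from the explanation above:
# a 2GHz core and ~50ns RAM latency (illustrative, not measured).
CLOCK_HZ = 2e9
RAM_LATENCY_NS = 50

cycle_time_ns = 1e9 / CLOCK_HZ                   # 0.5 ns per cycle
cycles_stalled = RAM_LATENCY_NS / cycle_time_ns  # how many instructions' worth of waiting

print(cycle_time_ns)   # 0.5
print(cycles_stalled)  # 100.0
```

So each trip to RAM costs roughly 100 instruction slots, which is where the "99% waiting" figure comes from.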
So, instead of having to get data all the way from RAM, CPU designers added what's called a cache next to the CPU. This is a much smaller pool of embedded memory right next to the CPU that's designed to return data much more quickly than RAM, potentially as quickly as one clock cycle (0.5ns in our example above). Every time the CPU accesses any data from RAM, it's held in the cache, until it's either used again, or it's not accessed for a while and gets booted out of the cache back to the main RAM. This means that if the CPU is accessing the same data (or instructions, which are also data) over and over again, so long as it's small enough to fit in cache, it will be stored there and can be accessed very quickly. Typically most code does behave like this (access small amounts of data repeatedly), so caches work very well to keep CPUs fed, and they're a very important part of CPU design.
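The "kept until it hasn't been used for a while" behaviour can be modelled as a least-recently-used (LRU) cache. This is only a toy sketch of the idea; real CPU caches are organised into sets and ways rather than one LRU list, and the capacity here is counted in entries, not bytes:

```python
from collections import OrderedDict

# Toy LRU cache: recently used entries stay, stale ones get evicted.
class ToyCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)      # mark as recently used
        else:
            self.misses += 1                     # slow trip out to RAM
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)   # evict least recently used
            self.lines[address] = "data"

cache = ToyCache(capacity=4)
for _ in range(10):              # a small working set, accessed over and over...
    for addr in (0, 1, 2, 3):
        cache.access(addr)
print(cache.hits, cache.misses)  # 36 4 -- misses only on the first pass
```

Because the working set fits in the cache, only the very first pass goes to "RAM"; every access after that is a hit, which is exactly why repeated access to small data is so fast.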
When it comes to the size of a cache, it's a bit more complicated, because the bigger a cache is, the slower it is, which means bigger isn't always better in cache design. In general you want as much data as possible close to the CPU so that it can be accessed quickly, but just bolting on an absurdly large cache won't do much good if it's almost as slow as waiting for data from the RAM itself.
This is the reason you'll see multiple levels of cache on a CPU, typically L1, L2 and L3. L1 is a very small pool of cache, usually around 64KB (with a separate L1 for instructions as well), with the goal of being as fast as possible, usually returning data within a single clock cycle. Then after that, there's the L2 cache, which is a bit bigger and a bit slower, typically anywhere from 256KB to 1MB per core on modern CPUs. Finally, there's usually an L3 cache, which is shared between the cores and is even bigger and even slower, ranging from around 2MB on a lower-end phone CPU to as much as 1GB on high-end server CPUs. This cache hierarchy, as it's called, is an attempt to get the best of both worlds: a very quick response from the L1 if the data's available there, but also larger pools that are at least reasonably quick if it's not available from the closest cache.
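The trade-off the hierarchy buys you can be shown with a rough average-memory-access-time calculation. The latencies and hit rates below are made-up but plausible numbers, purely to illustrate why fast-but-small plus big-but-slow beats either one alone:

```python
# Rough average-access-time model for a three-level cache hierarchy.
# All numbers are illustrative assumptions, not measurements.
L1_NS, L2_NS, L3_NS, RAM_NS = 0.5, 2.0, 10.0, 50.0  # latency at each level
L1_HIT, L2_HIT, L3_HIT = 0.90, 0.70, 0.50           # hit rate at each level

# Work outward: each miss falls through to the next, slower level.
avg_ns = (L1_HIT * L1_NS
          + (1 - L1_HIT) * (L2_HIT * L2_NS
          + (1 - L2_HIT) * (L3_HIT * L3_NS
          + (1 - L3_HIT) * RAM_NS)))
print(round(avg_ns, 2))  # 1.49
```

Even with fairly modest hit rates, the average access lands around 1.5ns instead of 50ns, because the vast majority of requests never make it all the way out to RAM.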
Regarding how much of an improvement cache makes, it all depends on the software. Software which only works on a few hundred KB of data won't see much benefit moving from 4MB of L3 to 8MB, as the dataset will fit in cache either way, but if your software is working with datasets of hundreds of MBs, then more cache is pretty much always better. On the PC front, games tend to benefit from increased cache sizes more than other software, but it still varies quite a bit from game to game, and may be a little different on a console, where developers can optimise their code around datasets that fit nicely in the cache they have available to them.
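That last point can be sketched with a deliberately crude model: extra L3 only helps to the extent that it lets more of your working set fit. Treating the cached fraction as simply cache size over working-set size is an oversimplification (real hit rates depend on access patterns), but it shows the shape of the argument:

```python
# Crude model: what fraction of a working set fits in a given cache?
# Sizes in KB. A deliberate oversimplification of real hit-rate behaviour.
def fraction_cached(working_set_kb, cache_kb):
    return min(1.0, cache_kb / working_set_kb)

small_set = 300          # a few hundred KB of hot data
large_set = 200 * 1024   # a couple hundred MB

# Going from 4MB to 8MB of L3:
print(fraction_cached(small_set, 4 * 1024), fraction_cached(small_set, 8 * 1024))  # 1.0 1.0
print(fraction_cached(large_set, 4 * 1024), fraction_cached(large_set, 8 * 1024))  # 0.02 0.04
```

The small dataset is fully cached either way, so doubling the L3 changes nothing, while the huge dataset caches twice as much of itself and keeps benefiting.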