Looking at the Nintendo Switch, and then looking at these leaked specifications, I wanted to compare the pixel fill rate and the texel rate between the two. I should preface this by saying that these are assumptions, not 100% based on actual information from the data breach, but simply based on other Nvidia cards (the Lovelace, Ampere and Maxwell architectures, Maxwell being the one in the Switch), plus the PS4, Xbox One, PS4 Pro and Xbox One X. Granted, these numbers don't really mean that much across different architectures, buuuut it's interesting to know, and the focus in the end will be on the Drake model in a speculative manner.
With that out of the way, let me begin.
The Switch contains 2 Streaming Multiprocessors with 256 CUDA cores, 16 Texture Mapping Units (TMUs) and 16 Render Output Units, also known as Raster Operations Pipelines (ROPs).
Now, what do a TMU and a ROP do? I think Tom's Hardware does a nice job of putting it succinctly:
Texture Mapping Units (TMUs): Textures need to be addressed and filtered. This job is done by TMUs that work in conjunction with pixel and vertex shader units. It is the TMU's job to apply texture operations to pixels. The number of texture units in a graphics processor is used when comparing two different cards for texturing performance. It is reasonable to assume that the card with more TMUs will be faster at processing texture information.
Raster Operator Units (a.k.a. ROPs): The raster operation processors are responsible for writing pixel data to memory. The speed at which this is done is known as the fill rate. ROPs and fill rates used to be a much more important metric in the early days of 3D graphics cards. While the job of the ROPs is important, it is not really a performance bottleneck as much as it once was, and is not used as a relative performance indicator to good effect at this time.
So, how would we figure out the pixel fill rate and the texel rate of the Switch? Well, it is (ROPs * clock speed) for pixels/s and (TMUs * clock speed) for texels/s.
With the Switch having multiple GPU clock speeds, it comes down to this for the ones that I know of:
ROP affects this:
@ 307.2 MHz: (16*307.2)/1000= 4.915GPixel/s
@ 384 MHz (Xenoblade resolution): (16*384)/1000= 6.144GPixel/s
@ 460 MHz: (16*460)/1000= 7.36GPixel/s
@ 768 MHz: (16*768)/1000= 12.29GPixel/s
TMU affects this:
@ 307.2 MHz: (16*307.2)/1000= 4.915GTexel/s
@ 384 MHz: (16*384)/1000= 6.144GTexel/s
@ 460 MHz: (16*460)/1000= 7.36GTexel/s
@ 768 MHz: (16*768)/1000= 12.29GTexel/s
Note: I divided by 1000 to convert MPixel/s and MTexel/s into GPixel/s and GTexel/s, which are easier to read.
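The arithmetic above can be sketched in a few lines of Python (the function name is just mine, for illustration):

```python
def fill_rates(rops: int, tmus: int, clock_mhz: float) -> tuple[float, float]:
    """Return (GPixel/s, GTexel/s) for a GPU.

    Pixel fill rate = ROPs * clock, texel rate = TMUs * clock;
    dividing by 1000 converts the MHz-based (mega) result into giga units.
    """
    return rops * clock_mhz / 1000, tmus * clock_mhz / 1000

# Switch (Maxwell): 16 ROPs and 16 TMUs, so both rates come out identical
for mhz in (307.2, 384, 460, 768):
    gpix, gtex = fill_rates(16, 16, mhz)
    print(f"@ {mhz} MHz: {gpix:.3f} GPixel/s, {gtex:.3f} GTexel/s")
```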
Looks the same, right? Well, the Switch has both 16 TMUs and 16 ROPs. And quite frankly, these were overkill for the device. That's a lot for such a tiny package.
Now here is where the assumption starts: Ampere and Lovelace both have 16 ROPs per GPC for all their cards except the data center cards, which have 24 ROPs per GPC.
Unless Nvidia changed it in the Ampere/Lovelace implementation for Drake, the ROP count stays the same, giving the same pixel fill rates at these clock frequencies:
@ 307.2 MHz: (16*307.2)/1000= 4.915GPixel/s
@ 384 MHz: (16*384)/1000= 6.144GPixel/s
@ 460 MHz: (16*460)/1000= 7.36GPixel/s
@ 768 MHz: (16*768)/1000= 12.29GPixel/s
and if we entertain higher clock speeds, then it is this:
@ 921 MHz: 14.736GPixel/s
@ 1267 MHz: 20.272GPixel/s
(Note: more does not necessarily mean better; there is such a thing as "waste" when it is not necessary to have so much.)
For the TMUs it's not as simple. Maxwell in the Switch has 16 TMUs for its one GPC, but in Ampere the count seems to range across the stack from 32 TMUs per GPC at the lower end to 48 TMUs per GPC at the higher end. So, considering that Drake is more likely akin to the lower end of the Ampere stack, I will assume it has 32 TMUs, unless it was changed for reasons unbeknownst to us.
So it would look like this for the TMUs:
@ 307.2 MHz: (32*307.2)/1000= 9.83GTexel/s
@ 384 MHz: (32*384)/1000= 12.29GTexel/s
@ 460 MHz: (32*460)/1000= 14.72GTexel/s
@ 768 MHz: (32*768)/1000= 24.576GTexel/s
For those curious about higher clocks:
@ 921 MHz: 29.472GTexel/s
@ 1267 MHz: 40.544GTexel/s
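Putting the speculative Drake numbers together in the same fashion (16 ROPs and 32 TMUs are my assumptions from above, not confirmed figures):

```python
# Assumed Drake configuration: 16 ROPs, 32 TMUs (speculation, not leaked data)
ROPS, TMUS = 16, 32

# The four known Switch clocks plus the two hypothetical higher ones
for mhz in (307.2, 384, 460, 768, 921, 1267):
    gpix = ROPS * mhz / 1000  # GPixel/s
    gtex = TMUS * mhz / 1000  # GTexel/s
    print(f"@ {mhz} MHz: {gpix:.2f} GPixel/s, {gtex:.2f} GTexel/s")
```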
What can I conclude from this? Well, the pixel fill rate will likely stay the same, but the texel rate will double going from the Switch to the Drake model in both portable and docked mode. A healthy level of growth there, if I do say so myself.
Here's just a brief comparison to the other consoles:
Xbox One: 48 TMUs / 16 ROPs @ 853 MHz gives us 13.65 GPixel/s and 40.94 GTexel/s
Xbox One S: 48 TMUs / 16 ROPs @ 914 MHz gives us 14.62 GPixel/s and 43.87 GTexel/s
Xbox One X: 160 TMUs / 32 ROPs @ 1172 MHz gives us 37.50 GPixel/s and 187.5 GTexel/s
Xbox Series S: 80 TMUs / 32 ROPs @ 1565 MHz gives us 50.08 GPixel/s and 125.2 GTexel/s
Xbox Series X: 208 TMUs / 64 ROPs @ 1825 MHz gives us 116.8 GPixel/s and 379.6 GTexel/s
PlayStation 4: 72 TMUs / 32 ROPs @ 800 MHz gives us 25.60 GPixel/s and 57.60 GTexel/s
PlayStation 4 Pro: 144 TMUs / 32 ROPs @ 911 MHz gives us 29.15 GPixel/s and 131.2 GTexel/s
PlayStation 5: 144 TMUs / 64 ROPs @ 2233 MHz gives us 142.9 GPixel/s and 321.6 GTexel/s
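The console comparison can be reproduced the same way; the ROP/TMU/clock figures below are the ones listed above:

```python
# (ROPs, TMUs, clock in MHz) per console, taken from the list above
consoles = {
    "Xbox One":          (16, 48, 853),
    "Xbox One S":        (16, 48, 914),
    "Xbox One X":        (32, 160, 1172),
    "Xbox Series S":     (32, 80, 1565),
    "Xbox Series X":     (64, 208, 1825),
    "PlayStation 4":     (32, 72, 800),
    "PlayStation 4 Pro": (32, 144, 911),
    "PlayStation 5":     (64, 144, 2233),
}

for name, (rops, tmus, mhz) in consoles.items():
    print(f"{name}: {rops * mhz / 1000:.2f} GPixel/s, {tmus * mhz / 1000:.1f} GTexel/s")
```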
The XB1, PS4, XB1S, PS4 Pro and XB1X can be compared with each other, as they all use the same core GCN architecture. Likewise, the XBSS, XBSX and PS5 can be compared with each other, as they use RDNA2 (or a derivative of it).
Drake is honestly a tricky comparison next to the AMD-based consoles, as Nvidia doesn't need as many units to perform equally to or better than the AMD equivalent.
Discuss.
The overall wattage for the system matters more. Perhaps for the OLED model, it ended up having the same power draw as the V2 because of the larger screen; there is a difference in battery life at max settings, though.
The fan is more active on the OLED, I presume, since it is smaller and needs to work harder, but only in instances where that is required.