I have done it
I HAVE FOUND A WAY TO CONVERT TFLOPS ACROSS uARCHS
Code:
---GPU FLOP Comparison Method---
Ampere: 2(SM Count * (128 * Clock Speed))
Turing/Vega/GCN: 2(SM/CU Count * (64 * Clock Speed))
RDNA1/IC-Less RDNA2 converted to GCN TFLOPs: 2(CU Count * (64 * Clock Speed)) + 25%
Calculate the FP32/Cycle of a GPU with the equation (with the clock in GHz, the result comes out in GFLOPs), then compare it with the TFLOP value of that GPU as rated by the manufacturer. The "efficiency" of the TFLOPs is exposed by the difference between the two.
Factor in % additions to extrapolate back to a weaker/older uArch if the % difference is properly known.
(EX: RDNA1/IC-Less RDNA2 is 25% better in IPC than GCN, so it would be the GCN equation + 25%)
|
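For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the method above. The function names (peak_gflops, scaled_gflops) and the lookup table are made up for illustration; the lane counts and the 25% figure are just the assumptions stated in the post.

```python
# FP32 lanes per SM/CU, as assumed in the formulas above.
LANES_PER_UNIT = {
    "ampere": 128,
    "turing": 64,
    "vega": 64,
    "gcn": 64,
    "rdna": 64,  # RDNA1 / IC-less RDNA2
}

def peak_gflops(units, clock_ghz, arch):
    """2 * (SM/CU count) * (lanes per unit) * clock in GHz -> GFLOPs."""
    return 2 * units * LANES_PER_UNIT[arch] * clock_ghz

def scaled_gflops(units, clock_ghz, arch, uplift=0.0):
    """Apply an assumed IPC uplift (0.25 = +25%) to express the result
    in the TFLOPs of an older/weaker uArch."""
    return peak_gflops(units, clock_ghz, arch) * (1 + uplift)

if __name__ == "__main__":
    # PS4 Pro (GCN): 36 CUs at 0.911 GHz
    print(peak_gflops(36, 0.911, "gcn"))
    # Series S (IC-less RDNA2) expressed in GCN terms: 20 CUs at 1.565 GHz, +25%
    print(scaled_gflops(20, 1.565, "rdna", uplift=0.25))
```

The uplift is kept as an explicit argument rather than baked into the table, since the percentage is the part of the method most open to debate.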
And with this and some extrapolation, Z0m3le and I have more or less determined Desktop Ampere to be equivalent to Polaris, and therefore not so far off from the rest of GCN in regards to FLOP efficiency.
And that is not considering all the features that Ampere has over GCN, like Tile-Based Rasterization, mixed-precision FP, the Tensor and RT cores, Primitive and Mesh Shaders, and Variable Rate Shading (which can boost effective GPU perf by up to 20% in some reports).
Drake is running pretty much "Ampere+" with the extra L2 cache, and RDNA2 reports massive IPC uplifts with Infinity Cache on top of the 25% from GCN to RDNA1. Since we can convert IC-less RDNA2 to GCN, and Ampere and GCN are similar FLOP-for-FLOP, this can likely tell us what clocks Drake needs to hit to match the PS4 Pro or even the Series S under different assumed IPC uplifts over Ampere.
(Note: this is actually lowballing Ampere, as modern games that take advantage of Ampere's features will outperform GCN, but it gives us a more level comparison between Ampere, GCN (PS4 Pro/One X), and IC-less RDNA2 (Series S).)
So for example
Code:
-PS4 Pro (GCN): 2(36(64*0.911)) = 4197.888 FP32/Cycle, rated at 4.2 GCN TFLOPs
-Series S (IC-Less RDNA2): 2(20(64*1.565)) = 4006.4 FP32/Cycle, rated at 4 RDNA2 TFLOPs; + 25% = 5008 FP32/Cycle or about 5 GCN TFLOPs
---Drake based on Default Ampere for Reference---
-Drake (Pure Ampere, OG Switch Docked Clocks): 2(12(128*0.768)) = 2359.296 FP32/Cycle or 2.36 Ampere TFLOPs
-Drake (Pure Ampere, 1GHz): 2(12(128*1)) = 3072 FP32/Cycle or 3.07 Ampere TFLOPs
-Drake (Pure Ampere, 1.5GHz): 2(12(128*1.5)) = 4608 FP32/Cycle or 4.61 Ampere TFLOPs
-Drake (Pure Ampere, 1.63GHz aka Matching Series S): 2(12(128*1.63)) = 5007.36 FP32/Cycle or 5.01 Ampere TFLOPs
----------------------------------------------------------
Now, assuming even just a marginal 10% increase in IPC over Ampere, those values effectively become:
-Drake (768MHz): 2.60 Ampere TFLOPs
-Drake (1GHz): 3.38 Ampere TFLOPs
-Drake (1.5GHz): 5.07 Ampere TFLOPs
-Drake (1.63GHz): 5.51 Ampere TFLOPs
|
Even a 10% increase over Ampere due to that cache is enough to make 1.5GHz Drake match the Series S's GCN equivalent!
Now, assuming Drake gets the same 25% boost AMD got just from going GCN to RDNA1:
Code:
--Drake: 25% better than Ampere calculation--
-Drake (768MHz): 2.95 Ampere TFLOPs
-Drake (1GHz): 3.84 Ampere TFLOPs
-Drake (1.5GHz): 5.76 Ampere TFLOPs
-Drake (1.63GHz): 6.26 Ampere TFLOPs
|
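The scaled Drake tables can be reproduced with a few lines of Python. This is only a sketch under the post's assumptions (12 SMs, 128 FP32 lanes per SM, and a hypothetical IPC uplift of 0%, 10%, or 25% from the extra cache); drake_tflops is a made-up helper name.

```python
SMS = 12                              # Drake SM count used in the post
LANES = 128                           # FP32 lanes per Ampere SM
CLOCKS_GHZ = [0.768, 1.0, 1.5, 1.63]  # clocks considered in the post

def drake_tflops(clock_ghz, uplift=0.0):
    """Effective Ampere TFLOPs at a given clock with an assumed IPC uplift."""
    return 2 * SMS * LANES * clock_ghz * (1 + uplift) / 1000

if __name__ == "__main__":
    for uplift in (0.0, 0.10, 0.25):
        print(f"IPC uplift {uplift:.0%}")
        for clock in CLOCKS_GHZ:
            print(f"  {clock:.3f} GHz: {drake_tflops(clock, uplift):.2f} TFLOPs")
```

At 1.5GHz with a 10% uplift this works out to roughly 5.07, which is where the "matches the Series S's GCN equivalent" claim comes from.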
You'd only need to hit 1.1GHz to match the PS4 Pro if they pull out a 25% IPC increase through the cache. (Considering reports that the 4070, with the same core count as the 3090 or fewer and the cache seemingly being the only major difference in raster perf, shows up to a 30% increase over the 3090, that may very well be the case.)
And only 1.3GHz to match the Series S!
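
Those clock targets come from running the same formula backwards: divide the target's GCN-equivalent figure by what Drake would produce per GHz. A rough sketch, assuming 12 SMs, 128 lanes per SM, and the speculative 25% uplift (required_clock_ghz is a made-up helper name):

```python
def required_clock_ghz(target_gflops, sms=12, lanes=128, uplift=0.25):
    """Clock (GHz) at which Drake would reach target_gflops,
    given an assumed IPC uplift over plain Ampere."""
    return target_gflops / (2 * sms * lanes * (1 + uplift))

# Targets in GCN-equivalent GFLOPs, using the figures from the post.
PS4_PRO = 2 * 36 * 64 * 0.911            # GCN, 36 CUs at 0.911 GHz
SERIES_S = 2 * 20 * 64 * 1.565 * 1.25    # IC-less RDNA2, +25% to convert to GCN

if __name__ == "__main__":
    print(f"PS4 Pro:  {required_clock_ghz(PS4_PRO):.2f} GHz")
    print(f"Series S: {required_clock_ghz(SERIES_S):.2f} GHz")
```

Setting uplift=0.10 or uplift=0.0 shows how much the required clock moves if the cache turns out to help less than hoped.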