
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

I’m not sure that’s an apt comparison; the 28-SM 3060 outperforms the 64-CU Vega 64 by 20% as per TechPowerUp, while also having less memory bandwidth and roughly the same TFLOP count.


If we were to normalize it, CU to SM, that’s roughly a 2.28:1 ratio just using that metric.


Add the 20% at equal TFLOPs, and you get that an SM performs like 2.736 CUs of GCN 5.0 using rapid packed math.


Or let’s compare the RX Vega 56 to the 3060 mobile.

56 CUs vs 30 SMs, both around 10 TF (the latter a bit higher); the 3060 mobile outperforms it by 11%.

So performance works out to about 2.07 CU : 1 SM in this case, or 1.866 CU : 1 SM for roughly similar performance if you don’t account for the TF difference.
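If it helps, here’s the same arithmetic as a quick Python sketch; the only inputs are the CU/SM counts and the ~20%/~11% TechPowerUp deltas quoted above, so it’s only as good as those figures:

# Normalizing GCN5 CUs against Ampere SMs using the quoted deltas.
vega64_cus, rtx3060_sms = 64, 28
raw_ratio = vega64_cus / rtx3060_sms        # ~2.29 CUs per SM at equal card counts
perf_adjusted = raw_ratio * 1.20            # 3060 is ~20% faster -> ~2.74 CUs per SM

vega56_cus, rtx3060m_sms = 56, 30
raw_ratio_56 = vega56_cus / rtx3060m_sms    # ~1.87 CUs per SM
perf_adjusted_56 = raw_ratio_56 * 1.11      # 3060 mobile ~11% faster -> ~2.07 CUs per SM

print(round(perf_adjusted, 2), round(perf_adjusted_56, 2))  # 2.74 2.07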


Unfortunately, the techpowerup numbers are useless.

The "performance comparison number" isn't even compressing all their benchmarking data down to a single figure, it's selecting a single figure from the benchmark as indicative. Worse, it doesn't select the same number for each card. It uses a summary of the 1080p data for older cards, and the 4k data for cards "faster than the 2080 ti". They appear to be using a fudge factor to account for that, but they don't document it. Which number is the Vega 64 using? Who knows!

Which would be bad enough, except the benchmarks aren't the same. There are 20 games in the benchmark used for the Vega 64, 23 for the 3060, and only two games are in both lists. In one of them (The Witcher III) the Vega 64 gets within 14% of the 3060, which was released years later. In the other, Civ VI, the 3060 outperforms the Vega 64 by well over 200%. That's not only a wild variation across the two benchmarks; the Vega 64 breaks down on Civ VI, performing much worse than it did on The Witcher III, while the 3060 performs much, much better.

Why? Well, maybe Ampere is really good at Civ VI. Or GCN5 is really good at The Witcher III. But it's probably that Civ VI is a heavily CPU-bound game and the 3060 test rig uses a 16-thread Core i9, while the Vega 64 test rig was using just an 8-thread Core i7.

Further, the two Vega cards you mention are GCN5 GPUs. GCN2 was used in the last gen consoles, upgraded to GCN4 in the midgen consoles. While its performance characteristics are likely very similar, it isn't directly relevant.

So I’m not really sure using CUs and SMs is really a good idea in this case, or Drake would be >20% the PS4.
I'm perfectly willing to believe that my numbers underestimate Ampere and Drake. I would love to have more data.


They do not. AMD APUs don’t do that.

My understanding was the Infinity Fabric 3.0 introduced cache coherence between GPU and CPU via a shared L3 cache - that's certainly what the AMD reps said when they tried to sell it to me. My understanding is also that the PS5 was confirmed to have this via die shots which matched the AMD patent application. But I admit that this is not my area of expertise.

That is only if Drake doesn’t exceed 1.043GHz. After that it has worse memory bandwidth allocation per TF than the desktop Ampere.

Basically, that’s why I said before that the ceiling for Drake with respect to TF is 3.2; after that it doesn’t scale in its favor and you get a more imbalanced system, like the Tegra X1.

It’s also why I think that, with the GPU clocked at half in portable mode, a bandwidth of 68.2GB/s is “preferred” to balance the scaling from portable to docked as best as possible. If they used LPDDR5X it would make that easier, giving 2x the bandwidth scaling to match the 2x FP32 increase.
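To make that scaling concrete, here’s a rough back-of-the-envelope sketch. The 12 SMs x 128 = 1536 FP32 lanes, the 128-bit bus, and the specific LPDDR5/LPDDR5X data rates (4266/6400/8533 MT/s) are my assumptions for illustration, not confirmed specs:

# FP32 throughput vs memory bandwidth scaling, portable vs docked.
def tflops(fp32_lanes, clock_ghz):
    return fp32_lanes * 2 * clock_ghz / 1000   # 2 FLOPs per lane per clock (FMA)

def bandwidth_gbs(mt_per_s, bus_bits=128):
    return mt_per_s * (bus_bits / 8) / 1000    # GB/s on an assumed 128-bit bus

docked_tf   = tflops(1536, 1.0417)   # ~3.2 TF
portable_tf = tflops(1536, 0.5208)   # ~1.6 TF at half clock

portable_bw       = bandwidth_gbs(4266)  # ~68.3 GB/s
docked_bw_lpddr5  = bandwidth_gbs(6400)  # ~102.4 GB/s -> only 1.5x bandwidth for 2x TF
docked_bw_lpddr5x = bandwidth_gbs(8533)  # ~136.5 GB/s -> ~2x bandwidth for 2x TF

print(docked_tf, portable_tf, portable_bw, docked_bw_lpddr5, docked_bw_lpddr5x)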
Yes, I agree on all counts
 
So, in some places I've seen some comments around a video yesterday from SuperMetalDave64 offering some kind of indication/confirmation of a next-gen device. I don't think that's the case, so let me summarize the content of the video for you:

— Retro Studios hired a VFX artist with an impressive portfolio (who has previously worked on the game Remnant: From the Ashes).
— Another LinkedIn finding of a UI Technical Designer at Retro Studios showed that they are developing Metroid Prime 4 on an in-house engine.


That's all. I guess it bodes well for MP4, but obviously there's no link to any new hardware development. In case you want to see the video, you have it here:

 
My understanding was the Infinity Fabric 3.0 introduced cache coherence between GPU and CPU via a shared L3 cache - that's certainly what the AMD reps said when they tried to sell it to me. My understanding is also that the PS5 was confirmed to have this via die shots which matched the AMD patent application. But I admit that this is not my area of expertise.
Do you mean the Scalable Data Fabric? TMK, it’s a branded interconnect IP that allows the GPU and CPU to communicate without requiring PCIe lanes like on PC. It also allows the two CCXs in the PS5 and Xbox Series to communicate with one another, something they wouldn’t otherwise be able to do without incurring a performance penalty.




Infinity Fabric isn’t that different from NVLink iirc, a similar concept with different branding. It’s a very powerful interconnect, and part of why a game like Spider-Man is much more demanding on PC; the console’s interconnect is that strong. SoC interconnects >>>> PCIe communication.


I’ve been digging for almost 2 hours and I can’t really find anything definitive about the PS5 using the L3 for the GPU (this extends to the XB too). Just speculation about it and fanboy drivel 🤭

The way I interpreted it, the Scalable Data Fabric is the agent through which everything connects to the GDDR6 memory and other IO on the SoC; it also connects one CCX to the other, but it’s a HyperTransport of sorts.

This doesn’t mean that the PS5 and Series are weaker; they’re just designed very well. But they aren’t doing what ARM SoCs using CoreLink IP do (which have an SLC for coherency), or Intel SoCs (where the iGPU accesses the CPU L3 and it functions as an L4 for the GPU), or Apple silicon, or whatever else.



For Drake, I think it will have an SLC due to the CoreLink IP; it’s required for cache coherency on ARM-based products. Unless they outright have their own interconnect IP for that, it will have an SLC for the interconnect to communicate with the other parts of the silicon, otherwise there is no coherency. And while the TX1 did that, it did it very poorly; Xavier and later are the ones that did proper coherency. The thing to keep in mind is that the SLC houses communication with the other elements of the SoC, like the GPU and anything else, and it is called the Point of Coherency and Point of Serialization.

Some designs can have a cache do double duty there, but those aren’t housing a GPU, only a CPU cluster, like the Neoverse CPUs.


This is how it was explained to me anyway with respect to ARM-based products. So… Drake has an SLC for reasons.


Here’s something about the Xavier:



I’ll get to the other part of your post later 😛, I’ll see if I can find comparisons between the 3060 and the Vega 64. I used GCN 5.0 as it is the most advanced form of GCN that came out and the most refined, meaning that the earlier ones should be technically worse than the GCN 5.0 version.


I’ll also see about comparing GCN 2 to 5.0; the former is what the consoles were based on…. a weird amalgamation that is.
 
So, in some places I've seen some comments around a video yesterday from SuperMetalDave64 offering some kind of indication/confirmation of a next-gen device. I don't think that's the case… [snip]



Tier A source tbqh.
 
They went through all that two weeks ago. There won't be anything Switch Pro related (apart from the repetitive questions at the end).
Started doubting myself when I remembered they did two episodes about boycotting a week apart. It is more Pokemon. But apparently they got the "REAL answer" lol

 
Anyway…. @oldpuck is this video good enough for your curiosity on GCN5.0 vs Ampere?

Here’s one of Vega 56:


Same shader core count; the CU count is twice the SM count. It does 1080p tests as well.



Maybe this is ok for Vega 56 and R9 290X?
 
Started doubting myself when I remembered they did two episodes about boycotting a week apart. It is more Pokemon. But apparently they got the "REAL answer" lol


So again it boils down to Game Freak setting too early a release date, not accounting for delays, and then with the merchandising machine in action they're stuck with the release date they set three years ago.

I mean, I appreciate the explanation of how Game Freak, Creatures, Nintendo and TPCi work together, but all I can do is sigh and shake my head yet again at how this franchise is currently managed, despite a rabid fanbase that will throw all their money at it regardless of game performance (kinda like Toronto Maple Leafs fans).
 
So, in some places I've seen some comments around a video yesterday from SuperMetalDave64 offering some kind of indication/confirmation of a next-gen device. I don't think that's the case… [snip]


SMD has a long history of reporting bad hardware information around Nintendo and the Switch, specifically. He had a legitimate source exactly once, and burned it in a rather spectacular fashion.
 
So again it boils down to Game Freak setting too early a release date, not accounting for delays, and then with the merchandising machine in action they're stuck with the release date they set three years ago.

I mean, I appreciate the explanation of how Game Freak, Creatures, Nintendo and TPCi work together, but all I can do is sigh and shake my head yet again at how this franchise is currently managed, despite a rabid fanbase that will throw all their money at it regardless of game performance (kinda like Toronto Maple Leafs fans).
It's especially bad this time because everyone and their mother knows that AAA (or even AA) open world games specifically will undergo delays. It's a fact of life. No matter what you set as the release date it will be delayed.

Game Freak should have accounted for that and given themselves more wiggle room. But like basically every other developer they did not. The issue is Pokemon is uniquely impossible to delay on relatively short notice.
 
The next Switch after Drake will be 1000 teraflops.


This was a very enlightening watch, thanks for sharing.

Makes me wonder if the future of not only gaming, but computers in general, is focusing on energy efficiency and form factors rather than just raw power, since we'll get to a point where we simply can't "use" all of a machine's available power in consumer-grade applications (i.e. non-scientific, non-quantum stuff).
 
I’ll also see about comparing GCN 2 to 5.0; the former is what the consoles were based on…. a weird amalgamation that is.
Fortunately, we are quite helped in this matter by Richard Leadbetter, who did a like-for-like comparison across AMD generations, for exactly the same reason we want to here: to spec out an unreleased console.



Richard is doing the exact same thing I'm doing, only he has the advantage of an elaborate testing rig. He compares graphics cards across generations by looking at cards with identical CUs. He, of course, actually controls the hardware and can force clocks so he doesn't have to do any math, but because CU-comparable cards don't exist at every stage he does some math there, along with adding a card in the middle to get extra data points.

Richard's conclusions are complex, namely that many of the "improvements" of the arches are really defined by better support for evolving graphics APIs, and that older cards are held back as much by drivers and very narrow feature support, but he settles on a 30-60% performance uplift from GCN1->RDNA1. This is what gen-on-gen performance looks like without the advantage of higher clocks or node shrinks for increased density.

Unfortunately, that only gets us up to RDNA1. And while DF has done extensive reviews of RDNA1 and RDNA2 cards, they are not the same sort of forced like-for-like comparison. On the other hand, others have done so.

https://www.extremetech.com/gaming/321068-amd-radeon-6700-xt-vs-5700-xt-rdna2-comparison

Extremetech takes the same basic route, comparing the CU-for-CU 5700 XT to the 6700 XT in a clock-to-clock matchup, and discovers that RDNA2 is not vastly more performant than RDNA1, only about 4% faster. It is vastly more power efficient, which is where the real advantage comes from. Infinity Cache - and a multitude of other changes - allow RDNA2 to pursue higher clocks.

RDNA2 is obviously relevant to consoles in terms of feature set - RT, variable rate shading, etc - and these features affect performance, as does being able to hit reasonably high clock speeds at solid thermals. These are important. But in terms of trying to compare Drake, with its known arch, and the PS5, with its known clock speed, I think we can leave it. At least at this level, where we don't have devices to physically benchmark :)

So, if we can just get a similar sort of Ampere-to-RDNA comparison, we can make reasonable estimates of Drake's level of raster performance relative to the PS5/Series X, the last-gen consoles, and their mid-gen upgrades. Unfortunately, this gets squishier.

If you were going to repeat these kinds of experiments, you'd probably want to compare either the RX 6600 and the RTX 3060, as 28 CU/SM cards, and/or the RX 6900 XT and the RTX 3080 Ti, as 80 CU/SM cards. I don't have a testing rig myself, so I can't do a comparison that controls for clock speed, and the two vendors have made fundamentally different decisions on handling memory bandwidth, so forcing the cards to share that value would produce results that are really just testing the cards' memory systems. So the closest I can get is to look at an extensive, identical benchmark for these pairs of cards, and scale for clock speed.

DF provides those benchmarks. The existing reviews for these 4 cards all use the same testing rig and the same sets of games, with matched settings. Because I'm measuring raster performance, I've removed the RT driven benchmarks. Similarly, I've removed some CPU bound benchmarks, though in effect, these did nothing significant to the scores.

RX 6900 XT vs RTX 3080 Ti: Ampere performs at anywhere from 94% of RDNA2 to 120%, depending on the benchmark. The average is a mere 105%, inside the standard deviation of 8%. However, if we adjust for clock, Nvidia pulls ahead: 142% if you use the "official" clock numbers. DF notes that the 3080 Ti's power solution is not the base 3080's but the 3090's, and that the clocks tend to pull outside their official ranges. Fair enough, 135% it is. That is my "reasonable" floor.

RX 6600 vs RTX 3060: This is where it gets more interesting. Ampere sits in a much wider range, from 91% of RDNA2 to a whopping 136%. The average is a somewhat less brutal 113%, with a stddev of 11%. The RX 6600 is clearly bandwidth starved. Unlike the Ampere cards, which have a fairly consistent bandwidth-to-teraflop ratio, the RDNA2 cards are all over the place, while also trying to balance it with Infinity Cache size at various tiers. The RX 6600 is just vastly underfed. Adjusted for clocks, this gives Ampere 159% of RDNA2.
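For anyone who wants to reproduce the clock adjustment, this is roughly how those percentages fall out. The boost clocks below are the approximate official figures, and the ~1750MHz "observed" 3080 Ti clock is my own assumption based on DF's note about it running above spec:

# Raw per-SM/CU performance ratio, scaled by the clock difference.
def clock_adjusted(raw_ratio, rdna2_clock_mhz, ampere_clock_mhz):
    # If Ampere matches RDNA2 while running at a lower clock, it does more per clock.
    return raw_ratio * rdna2_clock_mhz / ampere_clock_mhz

# 80 CU/SM pair: RX 6900 XT (~2250 MHz boost) vs RTX 3080 Ti (1665 MHz official boost)
print(clock_adjusted(1.05, 2250, 1665))   # ~1.42
print(clock_adjusted(1.05, 2250, 1750))   # ~1.35 with the assumed observed clock

# 28 CU/SM pair: RX 6600 (~2491 MHz boost) vs RTX 3060 (~1777 MHz boost)
print(clock_adjusted(1.13, 2491, 1777))   # ~1.58-1.59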

Ampere is clearly amazing. The gap between Ampere and RDNA2 is effectively generational, in roughly the same 30-60% range that AMD achieved from GCN1->RDNA1. There are open questions about which end of that range is fair, and how much of the performance variation is due to memory-challenged low-end cards. There are similarly open questions about whether some of RDNA's perceived improvements over its predecessors are due to evolving driver support for DX12 and Vulkan that left older cards behind.

We can take this daisy chain of head-to-head tests (and a grain of salt) and come up with Ampere performing 1.8x above and beyond the original GCN as a reasonable rate. This assumes that the RX 6600 is memory starved above and beyond the rest of the RDNA2 line, and that the PS4's active software library is continuing to extract performance out of the aging hardware while the GCN PC cards are left behind.

Or we can assume the absolute max gap, and get something like a 2.5x uplift. At this level, Drake would blast absolutely past the PS4, but not match the PS4 Pro (on the GPU level alone) with its 36 CUs on GCN4. DLSS + the vastly superior CPU are obviously major factors in the console's actual performance, as is the cross-gen period.

Similarly, we can use the RDNA2 numbers for comparing to current gen. 1.35x means that Drake does remarkably well relative to Series S for being a portable console, but the Series S's higher clocks and shader numbers still leave it in a different league. The larger 1.6x ratio makes that gap much smaller, but the Series S still has the edge, and its faster CPU comes along for the ride.
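For transparency, here is one plausible way that daisy chain multiplies out; I'm just combining the ranges quoted above, so treat it as a sketch of the reasoning rather than an exact reconstruction:

# Multiplying the per-step uplifts end to end.
gcn1_to_rdna1   = (1.30, 1.60)   # Leadbetter's 30-60% GCN1 -> RDNA1 range
rdna1_to_rdna2  = 1.04           # ExtremeTech's ~4% clock-for-clock gain
rdna2_to_ampere = (1.35, 1.60)   # the clock-adjusted head-to-heads above

low  = gcn1_to_rdna1[0] * rdna1_to_rdna2 * rdna2_to_ampere[0]
high = gcn1_to_rdna1[1] * rdna1_to_rdna2 * rdna2_to_ampere[1]
print(round(low, 2), round(high, 2))   # ~1.83 and ~2.66, i.e. the ~1.8x floor and the ~2.5x+ ceiling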

Oh wow: Just saw @ReddDreadtheLead’s additional benchmarks. I will take a look but at the moment I haven’t had a chance
 
So again it boils down to Game Freak setting too early a release date, not accounting for delays, and then with the merchandising machine in action they're stuck with the release date they set three years ago.
This is why Nintendo haven’t announced a 3D Mario game to go with the opening of the movie or the theme park.
 
Similarly, we can use the RDNA2 numbers for comparing to current gen. 1.35x means that Drake does remarkably well relative to Series S for being a portable console, but the Series S's higher clocks and shader numbers still leave it in a different league. The larger 1.6x ratio makes that gap much smaller, but the Series S still has the edge, and its faster CPU comes along for the ride.
It has fewer shaders than Drake.

1536 vs 1280.
 
Anyway…. @oldpuck is this video good enough for your curiosity on GCN5.0 vs Ampere?

Yeah, this is useful data, as are DF's head-to-heads between the Vega 64 and the R9 Fury X. Specifically, it says that GCN5 was godawful. The Vega 64 card barely inches above the R9 Fury X despite having 1.5x the clock speed. This trends across anything you can normalize to: Vega 64 performance plummets relative to its predecessors and obviously its successors. The 3060's vast over-performance relative to compute units is matched by other AMD GPUs.

Ampere performs 2.5x better than Vega if we divide per compute unit. And Vega is outperformed by GCN3 by like 25%. If we use Vega to guess at Ampere's efficiency over GCN3, we get 2x, which is suspiciously close to my 1.8x.

But GCN3 never shipped in a console, nor did GCN5; its only use in this kind of comparison would be as a stepping-stone chip in the absence of other data. With the existing GCN1:GCN4:RDNA1 numbers, the RDNA1:RDNA2 numbers, and the RDNA2:Ampere numbers, as long as that daisy chain is, it seems much more reliable than using Vega in any way.

I'm not saying my analysis is perfect by any means, but the Vega data seems consistent with it
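Spelling out the Vega stepping-stone arithmetic, using the two per-compute-unit ratios above:

ampere_vs_vega = 2.5    # Ampere ~2.5x GCN5 per SM/CU
gcn3_vs_vega   = 1.25   # GCN3 ~25% ahead of GCN5
print(ampere_vs_vega / gcn3_vs_vega)   # 2.0, in the same ballpark as the 1.8x daisy chain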

It has fewer shaders than Drake.

1536 vs 1280.
Yes, I misspoke, I meant to refer to CUs. The additional shaders per SM compared to RDNA's CU is a significant part of why Ampere gets ahead.
 
So again it boils down to Game Freak setting too early a release date, not accounting for delays, and then with the merchandising machine in action they're stuck with the release date they set three years ago.

I mean, I appreciate the explanation of how Game Freak, Creatures, Nintendo and TPCi work together, but all I can do is sigh and shake my head yet again at how this franchise is currently managed, despite a rabid fanbase that will throw all their money at it regardless of game performance (kinda like Toronto Maple Leafs fans).


Man said the leafs lmaoooo
 
Yeah, this is useful data, as are DF's head-to-heads between the Vega 64 and the R9 Fury X… [snip]
To be fair, it should be noted that DirectX 12 is on PC, and the newer cards are better at it than the older cards, while NVN2 will be lower level and target a single platform rather than being general across multiple platforms (though DX12 is low level as is, just not console level afaik). GCN 1.0 GPUs are less suited for DX12U than, say, GCN 5.0 or even RDNA-based GPUs.

At that point you basically just have to wonder how good Nvidia's implementation of the NVN2 API for Drake is versus Sony's GNM API for the PlayStation 4, and which one developers can squeeze the most out of. The comparison really just comes down to which platform does the better job in the long run. You know what I mean?


In reality, I do think that Drake in portable mode is what would be comparable to the Xbox One/PlayStation 4 consoles. If it lands between the PS4 and Xbox One in TF count (e.g. 1.5-1.6TF), it should be better at handling the same task than the PS4, i.e. more efficient at doing the same job, while also targeting a lower resolution because of the screen.

That is assuming it is a 720p display, and I'm going to assume it is until I see otherwise noted that it's a higher-resolution display. Docked mode is a completely different scenario entirely; there it would be able to compare to the PS4 Pro despite a lower TF number, by virtue of being more efficient (not accounting for anything DLSS does to clean up the image).
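A quick sketch of that compute-per-pixel point; the ~1.6TF portable figure and the 720p screen are assumptions, while the PS4's 1.84TF at a typical 1080p target is the known reference:

px_720p  = 1280 * 720      # ~0.92 MP
px_1080p = 1920 * 1080     # ~2.07 MP

ps4_tf_per_mp   = 1.84 / (px_1080p / 1e6)   # ~0.89 TF per megapixel
drake_tf_per_mp = 1.6  / (px_720p  / 1e6)   # ~1.74 TF per megapixel

print(round(ps4_tf_per_mp, 2), round(drake_tf_per_mp, 2))

Roughly twice the compute per pixel before you even account for architectural efficiency or DLSS.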


It’s really difficult to do a proper comparison of architectures across very different hardware, and we just have to wait and see how it pans out.




I will say this, however:

1) These consoles lack the Infinity Cache, and therefore they do not get the ~25% uplift in raster performance that the Infinity Cache provides.

2) Their GPUs cannot access the CPU cache, and it cannot act as a pseudo-Infinity Cache; they simply fetch out to the VRAM and that’s pretty much it. They do have Infinity Fabric, and it does connect all the SoC’s elements to one another so they can communicate better, but that’s about it.

3) The RAM situation in these consoles is very interesting and worth studying. Because they are RDNA-based, they are less efficient with memory and with memory bandwidth. Mind you, RDNA is better than the GCN-based architecture here, but it is not that great; still good, but not excellent. Nvidia products have been able to manage more with less, or rather do about the same with fewer resources.


4) The PS5 doesn’t have the full RDNA2 implementation; it has a few of the features but not all of them. The most noteworthy RDNA2 feature besides the large L3 is the ray tracing, and that is about the only one the PS5 has. The other, which is more of an efficiency thing, is that it is able to clock higher. It doesn’t have hardware-based VRS, doesn’t have mesh shaders and uses a legacy geometry path that is the same as the PS4 Pro’s (that’s how you got BC!), doesn’t have sampler feedback, and doesn’t have the “AI acceleration hardware” (a meme to me, because the Series doesn’t have it either).

The Series may have the “full” RDNA2 implementation (more like 1.6-1.8), but from the current looks of it devs aren’t keen on using it and prefer their own software solutions (VRS, for example, with CoD).

It remains unclear how much better or worse Ampere and Turing are with the same features. Both have mesh shaders, but again it’s hard to say which does it better, and whether it’s even enough for devs to rely on; it would make the development process much easier.


But in any case, will the PS5 not having the full implementation matter in the long term? That remains to be seen; developers will just fall back on a software solution that emulates it, and the PS5 is plenty powerful. It is also the target platform, so it doesn’t lose out; the others having more features that work reasonably well doesn’t mean they’ll get fully used. They’ll just exist and probably be underutilized.

For some of the features that aren’t there, Sony simply offered their own hardware-based solution elsewhere. The fast SSD isn’t just for fun; well, it is, but really they don’t use DX12U, so they found their own implementation that works around it. It’s a blisteringly fast SSD.

6) The GDDR. Funnily enough, GDDR memory has a much higher latency than DDR, and LPDDR memory has a higher latency than DDR but significantly lower than GDDR; it sits between the two, closer to DDR than to GDDR. The higher the latency, the more negatively it affects game performance (mainly the CPU). And speaking of which, x86_64 should have a higher memory bandwidth requirement than the ARM-based SoCs. I remember Thraktor using a 3700X as a point of comparison in earlier speculation, and that has 23.84GB/s of bandwidth with a max of 47.68GB/s. But the consoles are UMA, so it’s hard to say how much it really matters. We will see with the games.

Or maybe the 4700U, which is the closest to the consoles listed here: single-channel is up to 17.07GB/s, but quad-channel is up to 68.28GB/s.



There’s more but I’m blanking at the moment on how to properly articulate it. 😅

But to get at the idea of what I want to say: Drake is such an enigma with respect to what it can actually do and how it would actually perform that all of the speculation could be right on the money and we get exactly what we’re saying; or we are greatly overestimating it, and it turns out so bad that it makes the PS4 look like a technical marvel in some way or form I can’t quantify; or we are greatly underestimating Drake, and how it will be implemented into the next Nintendo system, and it actually performs better, and I mean noticeably better, than what we are giving it credit for. Maybe it’ll be 40-50% better in actual real-world practice than what the numbers imply.


Hence, my lowest expectation is XB1/PS4 in portable mode. There’s only a 30% difference between these two.
 
To be fair, it should be noted that DirectX 12 is on PC, and the newer cards are better at it than the older cards… [snip a lot of smart stuff]

Hence, my lowest expectation is XB1/PS4 in portable mode. There’s only a 30% difference between these two.
One thing I'd like to note about that is:

That's still 30% more powerful and likely targeting 720p (compared to 1080p), with RT and DLSS support, and with a CPU generations ahead of it. I'd say with confidence that there will be plenty of games that run on Drake which would be infeasible on Xbox One and PS4 due to CPU, memory, or ray tracing limitations. In theory, if it has a 720p screen, it could render games at (as low as) 240-360p and use DLSS to get an "OK" 720p image. While power doesn't scale perfectly with resolution, if 360p is the internal target for a Drake game in handheld and tabletop mode, that's about 1/9th the pixels of 1080p, freeing up even more power and memory that the PS4 and Xbox One couldn't spare.
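For reference, the raw pixel math behind that:

# Pixel counts at the resolutions mentioned above.
resolutions = {"360p": (640, 360), "720p": (1280, 720), "1080p": (1920, 1080)}
pixels = {name: w * h for name, (w, h) in resolutions.items()}

print(pixels["1080p"] / pixels["360p"])   # 9.0 -> 360p is ~1/9th the pixels of 1080p
print(pixels["720p"]  / pixels["360p"])   # 4.0 -> a 2x-per-axis DLSS upscale to 720p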
 
For the 4700U, 17.07/34.13/68.27 GB/s lines up with 32-bit/64-bit/128-bit LPDDR4X-4266, so that's speaking more to the RAM support itself.
Interesting, the 3700X's numbers are like 6-7% off of what I'd expect from DDR4-3200 (but still ever so slightly above DDR4-2933, so I'm ruling that out). Is there really that much of a hit from the IO being on a separate die?
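For anyone following along, the arithmetic is just transfer rate times bus width in bytes:

def peak_gbs(mt_s, bus_bits):
    return mt_s * (bus_bits / 8) / 1000

# 4700U numbers line up with LPDDR4X-4266 at 32/64/128 bits:
print(peak_gbs(4266, 32), peak_gbs(4266, 64), peak_gbs(4266, 128))   # ~17.1, ~34.1, ~68.3

# 3700X: dual-channel (128-bit) DDR4-3200 would be 51.2 GB/s and DDR4-2933 ~46.9 GB/s,
# which is why the quoted 47.68 GB/s figure looks a few percent low for DDR4-3200.
print(peak_gbs(3200, 128), peak_gbs(2933, 128))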
 
[snip a lot of smart stuff]
Hence, my lowest expectation is XB1/PS4 in portable mode. There’s only a 30% difference between these two.
Short: totally agree.

The reason that I did this analysis was to set a confident floor. We can be confident that Ampere is outpacing RDNA* by a generational margin. Are there subtleties? Yes. Are there ranges in which it could operate? Of course. But if you're accounting for native resolution, XB1/PS4 level power is basically a slam dunk, and PS4 Pro power is sorta past the optimistic line - but DLSS means that line means something totally different. Once you get enough of a gap for DLSS, additional power can go to actual in-game features, where the Pro consoles were totally obligated to spend it on jacking up internal res for checkerboarding.

At that point it's really about Nintendo balancing how much power they want for their first party games versus battery life. I feel pretty good that ~400MHz is a comfortable perf range, and very close to the best battery life, and anything in the 800MHz range in handheld mode is going to make docked mode perfectly scalable. It's a very good place to be.

How compromised Series S is on the memory front, and how compromised the RDNA implementation is in current gen is just more room for impossible ports.

I have a dumb fan theory that Nintendo will land on 852Mhz/1.7Ghz for the GPU/CPU clocks, just because with 12 SMs and an octo-core CPU it would (on paper) look exactly like an Xbone but in portable form, which just amuses me to no end.
 
So, in some places I've seen some comments around a video yesterday from SuperMetalDave64 offering some kind of indication/confirmation of a next-gen device. I don't think that's the case… [snip]


I think visuals will be the last thing we'll have to worry about with MP4 (something common with Retro, but especially true now). Not that I don't think it will be a fantastic game, but I was never worried that Retro was going to disappoint.

As for still using RUDE/another version of RUDE/a new in-house engine: that was expected, it's common inside Nintendo, and from what the kiwitalkz interviews tell us, Retro is quite good at building them.
 
Short: totally agree… [snip]

I have a dumb fan theory that Nintendo will land on 852Mhz/1.7Ghz for the GPU/CPU clocks, just because with 12 SMs and an octo-core CPU it would (on paper) look exactly like an Xbone but in portable form, which just amuses me to no end.
I only want 520.8333MHz portable and 1,041.6667MHz docked… with the CPU at 2.13GHz if at 5nm….
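For reference, those oddly specific clocks are just the ones that give round TF numbers, assuming the rumored 12 SM / 1536 FP32 lane configuration:

# Clock needed for exactly 3.2 TF docked (and half that portable).
fp32_lanes = 1536
docked_hz    = 3.2e12 / (fp32_lanes * 2)   # TF target / (2 FLOPs per lane per clock)
docked_mhz   = docked_hz / 1e6             # ~1041.67 MHz
portable_mhz = docked_mhz / 2              # ~520.83 MHz
print(docked_mhz, portable_mhz)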

Why? Because it wouldn’t get that hot, warm but never hot. And I want to go above the actual expectation for just a sec.

And just a slightly bigger battery. It could manage a minimum of 3-3.5 hours and around 7-7.5 hours at best if using the same current battery and the OG Switch cooling solution.


I’m already expecting comments that it’s too high, but I consider 2.5-2.8GHz too high… and this is only hypothetical. The cores themselves aren’t maxed out at that; they are rated for higher.



The situation of the original TX1 is no more! Plus, 5nm is scary efficient… have any of you seen the 4090? Rated for 450W, yet 90% of the time it hovers in the 350-370W range.


I think that, within that range for portable, it can land in the 5-10W range pretty comfortably, unlike the TX1, which operated close to its maximum and was rated for 15W.





Off topic: I was looking at the Exynos 850, which has 8 A55 cores clocked at 2GHz on 8nm; it supposedly draws 7W for the SoC. But I’m not 100% sure on that.

And the 865 with those Perf cores does 10W max.
 
lol, that's about what I expected

it'll really be about native nintendo games for me
In some scenarios with DLSS involved, absolutely.

Edit: I got the wrong quote and for some reason I thought 1060.

As for your actual quote:

potentially amusing question for some, would this machine likely outperform my gtx 1080?


The 1060 is 4.3 teraflops of Pascal hardware. If the hardware were Turing, I couldn't say that with confidence, since Turing is supposedly more efficient per flop than Ampere. This is assuming Drake is 2.5-3 TF.

But not the 1080. The 1080 is 8.9. My bad.
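Rough math behind those figures, using approximate boost clocks; the ~920MHz Drake clock here is purely hypothetical, just to show where a 2.5-3TF figure would land:

def tflops(cuda_cores, boost_ghz):
    return cuda_cores * 2 * boost_ghz / 1000

print(tflops(1280, 1.708))   # GTX 1060: ~4.4 TF
print(tflops(2560, 1.733))   # GTX 1080: ~8.9 TF
print(tflops(1536, 0.92))    # Drake at a hypothetical ~920 MHz: ~2.8 TF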
 
All Pro related stuff I have needs a great deal of confirmation before I can even begin to consider reporting it. As said earlier this year, I need strong confirmation if I am to report/talk on what I have because the info is heavy.

Are you able to say if you are leaning toward the news being positive or negative? It's fine if you can't say.
 
All Pro related stuff I have needs a great deal of confirmation before I can even begin to consider reporting it. As said earlier this year, I need strong confirmation if I am to report/talk on what I have because the info is heavy.

jordan_peele_sweating.gif

good heavy or bad heavy? you probably won’t answer that
 
HugeVioletCaterpillar-max-1mb.gif
 
Now is the time for optimism! A reveal around the Game Awards is still possible! Don't let your dreams be memes! Meme your dreams!

#SwitchSweep
#Switch2in2022

But seriously, as absurd as some would expect it to be, I could see a Game Awards announcement. If it happens I'd expect it to be more akin to the original announcement of the Nintendo Switch: tease its capabilities, stay schtum on what games are coming out and when, no pricing, just a release window.

If it doesn't happen, well, sue me! 😆
 
Now is the time for optimism! A reveal around the Game Awards is still possible!… [snip]
Did they ever do any meaningful announcement/reveal during the Game Awards?

I mean, at all, not just Nintendo
 
I know looking at a few numbers doesn't give the whole story, but for a quick idea of how various NVIDIA GPUs compare I like this Wikipedia page. The 1080 is around 8-9 teraflops, the notebook 1070 about 6. A newer design like Drake would have advantages in DLSS and ray tracing, but those older cards aren't helpless at decent upscaling, since FSR2 is still an option.
 

