
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

You're still using Orin as representative of what Nintendo would use, when it isn't. Orin isn't made for gaming, and Nvidia wouldn't sell it to Nintendo for that.
IIRC, the Drake SoC is also based on the same Ampere architecture as Tegra Orin. So the CUDA core count per SM is still applicable, and I believe the same can be said about GPU clock speeds. The fact that the TDP targets at the lower end also match what a device like the Switch would typically use only supports my point.

Also, why do you care about Orin not being targeted at gaming? Like I said, the only important thing on that dev kit is the chip itself.
The original purpose of the hardware is meaningless if it can be repurposed for another platform.
 
IIRC, the Drake SoC is also based on the same Ampere architecture as Tegra Orin. So the CUDA core count per SM is still applicable, and I believe the same can be said about GPU clock speeds. The fact that the TDP targets at the lower end also match what a device like the Switch would typically use only supports my point.

Also, why do you care about Orin not being targeted at gaming? Like I said, the only important thing on that dev kit is the chip itself.
The original purpose of the hardware is meaningless if it can be repurposed for another platform.

The SoC of an automotive design has a LOT of its footprint dedicated to things that just waste space and power in a product that only needs to run video games.
 
IIRC, the Drake SoC is also based on the same Ampere architecture as Tegra Orin. So the CUDA core count per SM is still applicable, and I believe the same can be said about GPU clock speeds. The fact that the TDP targets at the lower end also match what a device like the Switch would typically use only supports my point.

Also, why do you care about Orin not being targeted at gaming? Like I said, the only important thing on that dev kit is the chip itself.
The original purpose of the hardware is meaningless if it can be repurposed for another platform.
Lots of power is being used by other parts. If you look back at the specs of Orin, the listed clocks for the GPU and CPU are low for the TDP because power is used elsewhere. Given the purpose of the SoC, it'd be an esoteric case to turn off those bits for the sake of more CPU/GPU power, which is why we don't actually know just how much power is devoted to the PVA/DLA/etc.: the general use case is with them being utilized.
 
The technical side of this discussion is interesting, to say the least. I must admit you all are very knowledgeable on this subject. It was fascinating all the way back in the days of the Wii U, when everyone was tossing their hat into the ring trying to figure out what it was capable of and where it stood compared to its peers.

I am a bit more partial to the grander scope of the device: for example, what it can do over the original, what makes the software impressive from a gameplay standpoint, any innovative gimmicks they come up with, and something along the lines of the DualSense controller. It's exciting, and I hope it's not just a spec bump.

Moreover, from my limited knowledge it appears as though "Drake" is within spitting distance of the Xbox Series S? I would hope to enjoy games such as Elden Ring and REmake 4 on this thing. It would be lovely to go Nintendo-only, truth be told.
 
the Series S is the best thing to happen to Drake
Seriously.

Also, looking at the cross-gen period we're in, none of the current crop of PS5/XSX games seem to push game design boundaries much. It's mostly PS4/XBO level games spruced up. That's also a blessing for Drake's image as a revitalised hybrid console.
 
Small summary:
  • Because games will be written specifically for the platform with a low-level API it will likely outperform Steam Deck due to the latter's OS overhead and unoptimised nature.
  • On par with Series S is not gonna happen due to battery life, but with strategic nips and tucks (and DLSS + Ampere being more efficient compared to RDNA2) some games might be comparable while docked.
Very good. These have been my expectations since Drake was leaked. Since DF reaches a wider audience and folks take them almost as gospel (what with the whole discussion about BotW2 showing Switch Pro footage...), I hope this can level-set expectations. I'm a little tired of hearing about lowballing specs 'because Nintendo' in a world after the Nvidia leak. And by lowballing I mean the unreasonable kind.
 
Seriously.

Also, looking at the cross-gen period we're in, none of the current crop of PS5/XSX games seem to push game design boundaries much. It's mostly PS4/XBO level games spruced up. That's also a blessing for Drake's image as a revitalised hybrid console.

The most design-boundary-pushing game I'm aware of right now is Astroneer, a voxel-based seamless open-world game where the world is an entire solar system. Every planet is entirely modifiable voxel geometry you can carve out of and extrude from, all the way to the core of the planet, with the path there full of subterranean caves to dig through and explore. The changes are permanent... and, if large enough, visible from space/other planets. As was the case when I made a massive scaffolding tower for my multi-tiered base, only to find later that my planet basically had a gigantic erection, until I was finished and removed the scaffolding.

Also, the game has automated equipment, which you can set up and even primitively program by creating logic gates; it continues to run autonomously even when you are on another planet, even across multiple planets.

And it's running surprisingly well on switch.
 
IIRC, the Drake SoC is also based on the same Ampere architecture as Tegra Orin. So the CUDA core count per SM is still applicable, and I believe the same can be said about GPU clock speeds. The fact that the TDP targets at the lower end also match what a device like the Switch would typically use only supports my point.

Also, why do you care about Orin not being targeted at gaming? Like I said, the only important thing on that dev kit is the chip itself.
The original purpose of the hardware is meaningless if it can be repurposed for another platform.
Orin is an unusually bad Tegra SoC for a new Switch. Orin is a useful data point for a lot of things, but we know Drake is a custom chip, so even where we don't have hard info on Drake, we can make assumptions about changes. This makes performance-per-watt comparisons using Orin pretty difficult.

Orin's GPU has a large amount of fixed-function hardware for real-time computer vision which, if you're going to build a custom chip, you want to cut out. On the CPU side, the Hercules-AE class ARM CPUs are specifically designed for self-driving hardware and have to be ASIL-B compliant. That means, effectively, that the CPU has roughly 2x the power draw/transistor count per cycle. On top of that, we know that the Drake GPU supports power-management features that no other Ampere GPU has.

There has been extensive coverage in the thread of the differences between Drake and Orin that were discovered in the NVN2 hack, plus the assumed differences that come along with customizing a chip rather than repurposing Orin. If you want to dig in, I believe most of them are bookmarked in the OP.
 
Despite you realizing that Nintendo doesn't go for even mid-range hardware but really low-end, there doesn't seem to be any actual documentation or spec info available online that would reinforce your claims of a 4-core, 4-SM Orin SoC.
No one is claiming the existence of a 4-core, 4-SM Orin SoC. We have hard data on the design of Drake; we know it's a custom chip. If Nintendo had wanted a custom chip with a far less ambitious goal, that would have been possible, and we know Drake isn't that.

Drake is not a customized or cut-down Orin. They are likely architecturally related, but you're presuming an origin for the chip that many of us believed prior to the Nvidia hack but that we now know is false.

Also, dev kits are not final hardware. Confusing the power of whatever similar off-the-shelf components were shoved into a Pro dev kit, or the components in a Jetson AGX dev kit, with Drake will leave you far from the actual configuration of the device.

I think your analysis is excellent, but your assumptions don't jibe with the Nvidia leak. If you want to assume that Drake != the new device, then you need to lay out those assumptions.

Edit: Sorry to double reply to you @niconiconick64, got twisted up following the thread.
 
  • On par with Series S is not gonna happen due to battery life, but with strategic nips and tucks (and DLSS + Ampere being more efficient compared to RDNA2) some games might be comparable while docked.
Some games "might be comparable" to the Series S (when accounting for DLSS, architecture, etc.)? Wow. Just how weak are they expecting this thing to be? The Series S is 4 TFLOPS. If you assume the Switch 2 is 2 TFLOPS, then with a better architecture and DLSS it should absolutely outperform the Series S in many key areas. So I guess they think it's gonna be like 1 TFLOP docked?
 
Some games "might be comparable" to the Series S (when accounting for DLSS, architecture, etc.)? Wow. Just how weak are they expecting this thing to be? The Series S is 4 TFLOPS. If you assume the Switch 2 is 2 TFLOPS, then with a better architecture and DLSS it should absolutely outperform the Series S in many key areas. So I guess they think it's gonna be like 1 TFLOP docked?

It’s a portable system. Expecting better than Series S shows a complete lack of understanding.
 
They're talking about docked mode, not handheld.
Docked… but always keep your expectations in check.
Nothing is confirmed yet; we need to wait a little.
Data is missing; we have rumors, leaks, and speculation based on solid research (tech reports, business talks, etc.).
But we'll surely get a good piece of hardware anyway.
 
Codename Drake will have an A78C-class CPU, which is quite disruptive - the A78C can clock up to 3.3GHz, but I suspect it will run around 2-2.3GHz, and I believe the sub-2GHz consensus estimates on here are woefully conservative.
The performance CPU cores on smartphone SoCs only run at high frequencies in short bursts, which helps prevent smartphone SoCs from getting too hot. Therefore, the performance CPU cores on smartphone SoCs are advertised as running at high frequencies, usually within the range of around 2-3 GHz (e.g. 1 Cortex-A78 core at 3 GHz and 3 Cortex-A78 cores at 2.6 GHz for the Dimensity 1300, 4 Cortex-A78 cores at 2.85 GHz for the Dimensity 8100, 4 Cortex-A78 cores at 2.75 GHz for the Dimensity 8000, 2 Cortex-A78 cores at 2.5 GHz for the Dimensity 920, 2 Cortex-A78 cores at 2.4 GHz for the Dimensity 900, 2 Cortex-A78 cores at 2.2 GHz for the Dimensity 930).
The CPU cores on Drake, which as mentioned is possibly the Cortex-A78C, on the other hand, have to run for sustained periods of time at the same frequency for TV mode and handheld mode. And nobody knows for certain which process node Nintendo and Nvidia decide to use to fabricate Drake. If Nintendo and Nvidia decide to fabricate Drake using Samsung's 8N process node, which is still a possibility, especially with Orin likely being fabricated using Samsung's 8N process node, running the CPU cores at 2-2.3 GHz could cause Drake to potentially run too hot, if the varying CPU frequencies for the various AGX Orin and Orin NX modules are any indication. And although Nintendo's new hardware is certainly going to have access to adequate cooling, unlike smartphones, generally speaking, the higher the frequencies, the lower the yield rate for chips. And Drake is a high volume chip, so ensuring high yields for Drake is in Nintendo's and Nvidia's best interest.

Although not related to Nintendo, Nvidia is added as a board member of the UCIe (Universal Chiplet Interconnect Express) Consortium.

(Try removing the "?utm_source=Twitter&utm_medium=Social&utm_content=MediaTek+Delivers+Efficient+Cortex-X2&utm_campaign=2022+-+FY+-+Linley+Microprocessor+Report" portion after clicking on the link. But if that doesn't work, here's the archive of the TechInsights blog post.)
[Die shots from the TechInsights blog post: A-93-Title.jpg, A-93-Title2.jpg]

Although the die shot of the Cortex-A710 is from the Dimensity 9000, which is fabricated using TSMC's N5 process node, I think the Cortex-A710 could give a very rough idea of how large the Cortex-A78 is. Considering that Arm advertises the Cortex-A710 as having a 10% IPC increase over the Cortex-A78, and all of the Cortex-A710's major front-end structures are exactly the same as the Cortex-A78's, I presume the Cortex-A78 is ~10% larger than the Cortex-A710, assuming TSMC's N5 process node is used for the comparison.
 
Some games "might be comparable" to the Series S (when accounting for DLSS, architecture, etc.)? Wow. Just how weak are they expecting this thing to be? The Series S is 4 TFLOPS. If you assume the Switch 2 is 2 TFLOPS, then with a better architecture and DLSS it should absolutely outperform the Series S in many key areas. So I guess they think it's gonna be like 1 TFLOP docked?
I'm gonna go crazy and say Drake could run Metro Exodus Enhanced at the same resolution (but lower non-RT settings) as the Series S at 30fps
 
I know we've been over this multiple times, but assuming 8nm, is running all SMs in portable mode even feasible? If it is, wouldn't it be running below the lowest Switch clock?
If Drake is manufactured on 8nm, I would be amazed if they were running all 12 SMs in portable mode at any clock. All we have to do is look at the supported power states for the Jetson Orin variants (which can be accessed on Nvidia's developer site) to see how unlikely that is.

The full-fat Jetson AGX Orin 64GB version (which has 16 SMs and 12 A78 CPU cores) has power profiles going down to 15W. To get there, Nvidia have to disable most of the CPU and GPU cores, leaving them with 4 CPU cores at 1.1GHz and 6 SMs at 420MHz. Memory is also clocked down by half. Now, there are PVA and DLA components on there (also clocked down), and we know Drake's tensor cores are less powerful, but to assume that Nvidia and Nintendo would somehow manage to get twice as many CPU and GPU cores running at half the power consumption of Orin, using the same architecture on the same manufacturing process, is extremely wishful thinking.

Even the cut-down Jetson Orin NX 16GB, which is most similar to Drake in memory width, (assumed) CPU cores and GPU SMs, has to drop down to just 4 SMs at 625MHz to hit its 15W power mode.
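For reference, the peak FP32 throughput of those quoted power modes works out as follows (a rough sketch, assuming the standard Ampere layout of 128 FP32 CUDA cores per SM and 2 FLOPs per core per cycle, i.e. one FMA):

```python
# Peak FP32 throughput for the Jetson Orin 15W power modes quoted above.
def ampere_tflops(sms: int, clock_ghz: float, cores_per_sm: int = 128) -> float:
    """Peak FP32 TFLOPS = SMs * cores/SM * 2 FLOPs/cycle * clock (GHz) / 1000."""
    return sms * cores_per_sm * 2 * clock_ghz / 1000

# Jetson AGX Orin 64GB at 15W: 6 SMs @ 420MHz
print(round(ampere_tflops(6, 0.42), 3))   # 0.645 TFLOPS
# Jetson Orin NX 16GB at 15W: 4 SMs @ 625MHz
print(round(ampere_tflops(4, 0.625), 3))  # 0.64 TFLOPS
```

Interestingly, both 15W profiles land at roughly the same ~0.64 TFLOPS of GPU throughput, just split differently between SM count and clock.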

Given what the new model is trying to do, though, dropping down to (say) 6 SMs in portable mode really isn't that bad of an idea. The original Switch had a 720p output resolution in portable mode and a 1080p output in docked. That's a 2.25x difference in the number of pixels being output between the modes, and originally the Switch had a 2x difference in GPU clocks, and therefore performance, between the two modes (I believe 384MHz was the original portable clock, and 460MHz was added later; happy to be corrected on that, though). What this meant is that the GPU performance per pixel was within the same ballpark in both modes, which makes it easier for developers to support both. Obviously there are other aspects like memory bandwidth which come into play, but the whole idea of docked vs portable modes is to balance performance, resolution and power consumption across the two use-cases.

Coming to the new Drake-powered model, pretty much the most-repeated fact about it is that it outputs at 4K resolution. Most people in this thread seem to think it will have a 720p screen, which would mean that instead of the 2.25x difference in output resolution on the original Switch, this would have a 9x difference in output resolution. This is the main reason I personally believe it's more likely to have a 1080p screen, but even in that case there's a 4x difference in resolution between handheld and docked. If we keep the number of SMs the same between both modes, then we're unlikely to come close to the 4x difference in GPU performance, let alone anywhere near the 9x difference a 720p screen would bring. Realistically, we would probably have something like a 2x difference in clocks between the two modes, which would leave the performance per pixel either 2x higher in portable mode (with a 1080p screen) or 4.5x higher (with a 720p screen).
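Those ratios are just pixel-count arithmetic; a quick sketch, with the 2x docked clock advantage from above as the assumption:

```python
# Pixel counts per output resolution, and the per-pixel performance ratios
# discussed above, assuming docked clocks are 2x portable clocks.
p720  = 1280 * 720    # 720p portable screen
p1080 = 1920 * 1080   # 1080p (original Switch docked output / rumored screen)
p2160 = 3840 * 2160   # 4K docked output

print(p1080 / p720)   # 2.25x: original Switch docked vs portable
print(p2160 / p720)   # 9.0x:  4K docked vs a 720p screen
print(p2160 / p1080)  # 4.0x:  4K docked vs a 1080p screen

# Portable per-pixel performance relative to docked
# = (docked pixels / portable pixels) / (docked clock multiplier)
print((p2160 / p1080) / 2)  # 2.0x with a 1080p screen
print((p2160 / p720) / 2)   # 4.5x with a 720p screen
```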

If they're using an 8nm chip, then it really should be thought of as adding extra SMs that are only going to be used in docked mode, rather than disabling SMs in portable mode. Running 12 SMs in portable mode would never have been on the table for 8nm, so their options were either to run, say, 6 SMs in both modes and have perhaps 420MHz portable/1GHz docked clocks, or add 6 more SMs, only run them in docked mode, and have say 420MHz portable/840MHz docked. The latter would provide a lot more performance in docked mode, and if they use a 1080p screen for portable mode, would give developers a 1:1 ratio of performance per pixel between the two. Of course it would also require higher power draw in docked mode (which is more achievable than in portable, as it's only a question of cooling, not battery life), and some additional cost in terms of increased die area, but Nintendo may have found those trade-offs worthwhile.
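Putting rough numbers on those two options (a sketch; the clocks are the hypothetical ones from the paragraph above, and 128 FP32 CUDA cores per SM at 2 FLOPs/cycle is the standard Ampere assumption):

```python
# Peak FP32 TFLOPS for the two hypothetical 8nm configurations above.
def tflops(sms: int, ghz: float) -> float:
    return sms * 128 * 2 * ghz / 1000  # SMs * cores/SM * FMA * clock

# Option A: 6 SMs in both modes, 420MHz portable / 1GHz docked
print(round(tflops(6, 0.42), 2), round(tflops(6, 1.0), 2))    # 0.65 1.54
# Option B: 6 extra SMs used only when docked, 420MHz / 840MHz
print(round(tflops(6, 0.42), 2), round(tflops(12, 0.84), 2))  # 0.65 2.58
```

Same portable performance either way, but option B gets roughly 68% more docked throughput in exchange for the extra die area.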

The other possibility is that it's just not on 8nm. On a process in TSMC's 5nm family, like Nvidia are using for Hopper and Ada, it should probably be fine running all 12 SMs in portable mode. On either TSMC 7nm/6nm or Samsung 5nm/4nm it might be able to squeeze it, but it's hard to say without data like we have for Jetson Orin on 8nm.
 
The performance CPU cores on smartphone SoCs only run at high frequencies in short bursts, which helps prevent smartphone SoCs from getting too hot. Therefore, the performance CPU cores on smartphone SoCs are advertised as running at high frequencies, usually within the range of around 2-3 GHz (e.g. 1 Cortex-A78 core at 3 GHz and 3 Cortex-A78 cores at 2.6 GHz for the Dimensity 1300, 4 Cortex-A78 cores at 2.85 GHz for the Dimensity 8100, 4 Cortex-A78 cores at 2.75 GHz for the Dimensity 8000, 2 Cortex-A78 cores at 2.5 GHz for the Dimensity 920, 2 Cortex-A78 cores at 2.4 GHz for the Dimensity 900, 2 Cortex-A78 cores at 2.2 GHz for the Dimensity 930).
The CPU cores on Drake, which as mentioned is possibly the Cortex-A78C, on the other hand, have to run for sustained periods of time at the same frequency for TV mode and handheld mode. And nobody knows for certain which process node Nintendo and Nvidia decide to use to fabricate Drake. If Nintendo and Nvidia decide to fabricate Drake using Samsung's 8N process node, which is still a possibility, especially with Orin likely being fabricated using Samsung's 8N process node, running the CPU cores at 2-2.3 GHz could cause Drake to potentially run too hot, if the varying CPU frequencies for the various AGX Orin and Orin NX modules are any indication. And although Nintendo's new hardware is certainly going to have access to adequate cooling, unlike smartphones, generally speaking, the higher the frequencies, the lower the yield rate for chips. And Drake is a high volume chip, so ensuring high yields for Drake is in Nintendo's and Nvidia's best interest.

Although not related to Nintendo, Nvidia is added as a board member of the UCIe (Universal Chiplet Interconnect Express) Consortium.

(Try removing the "?utm_source=Twitter&utm_medium=Social&utm_content=MediaTek+Delivers+Efficient+Cortex-X2&utm_campaign=2022+-+FY+-+Linley+Microprocessor+Report" portion after clicking on the link. But if that doesn't work, here's the archive of the TechInsights blog post.)
[Die shots from the TechInsights blog post: A-93-Title.jpg, A-93-Title2.jpg]

Although the die shot of the Cortex-A710 is from the Dimensity 9000, which is fabricated using TSMC's N5 process node, I think the Cortex-A710 could give a very rough idea of how large the Cortex-A78 is. Considering that Arm advertises the Cortex-A710 as having a 10% IPC increase over the Cortex-A78, and all of the Cortex-A710's major front-end structures are exactly the same as the Cortex-A78's, I presume the Cortex-A78 is ~10% larger than the Cortex-A710, assuming TSMC's N5 process node is used for the comparison.

I see you drawing a direct correlation between IPC and percentage size increase. Is the basis for your presumption that the IPC increase is directly related to something physically added to the footprint, like transistors/cache?
 
I see you drawing a direct correlation between IPC and percentage size increase. Is the basis for your presumption that the IPC increase is directly related to something physically added to the footprint, like transistors/cache?
I think so, considering that Arm has reduced the macro-OP (MOP) cache from 6 MOPs on the Cortex-A78 to 5 MOPs on the Cortex-A710, which does decrease the area and power consumption, but with a performance penalty, for the Cortex-A710.

But there's always a possibility my presumption is incorrect.
 
If Drake is manufactured on 8nm, I would be amazed if they were running all 12 SMs in portable mode at any clock. All we have to do is look at the supported power states for the Jetson Orin variants (which can be accessed on Nvidia's developer site) to see how unlikely that is.

The full-fat Jetson AGX Orin 64GB version (which has 16 SMs and 12 A78 CPU cores) has power profiles going down to 15W. To get there, Nvidia have to disable most of the CPU and GPU cores, leaving them with 4 CPU cores at 1.1GHz and 6 SMs at 420MHz. Memory is also clocked down by half. Now, there are PVA and DLA components on there (also clocked down), and we know Drake's tensor cores are less powerful, but to assume that Nvidia and Nintendo would somehow manage to get twice as many CPU and GPU cores running at half the power consumption of Orin, using the same architecture on the same manufacturing process, is extremely wishful thinking.

Even the cut-down Jetson Orin NX 16GB, which is most similar to Drake in memory width, (assumed) CPU cores and GPU SMs, has to drop down to just 4 SMs at 625MHz to hit its 15W power mode.

Given what the new model is trying to do, though, dropping down to (say) 6 SMs in portable mode really isn't that bad of an idea. The original Switch had a 720p output resolution in portable mode and a 1080p output in docked. That's a 2.25x difference in the number of pixels being output between the modes, and originally the Switch had a 2x difference in GPU clocks, and therefore performance, between the two modes (I believe 384MHz was the original portable clock, and 460MHz was added later; happy to be corrected on that, though). What this meant is that the GPU performance per pixel was within the same ballpark in both modes, which makes it easier for developers to support both. Obviously there are other aspects like memory bandwidth which come into play, but the whole idea of docked vs portable modes is to balance performance, resolution and power consumption across the two use-cases.

Coming to the new Drake-powered model, pretty much the most-repeated fact about it is that it outputs at 4K resolution. Most people in this thread seem to think it will have a 720p screen, which would mean that instead of the 2.25x difference in output resolution on the original Switch, this would have a 9x difference in output resolution. This is the main reason I personally believe it's more likely to have a 1080p screen, but even in that case there's a 4x difference in resolution between handheld and docked. If we keep the number of SMs the same between both modes, then we're unlikely to come close to the 4x difference in GPU performance, let alone anywhere near the 9x difference a 720p screen would bring. Realistically, we would probably have something like a 2x difference in clocks between the two modes, which would leave the performance per pixel either 2x higher in portable mode (with a 1080p screen) or 4.5x higher (with a 720p screen).

If they're using an 8nm chip, then it really should be thought of as adding extra SMs that are only going to be used in docked mode, rather than disabling SMs in portable mode. Running 12 SMs in portable mode would never have been on the table for 8nm, so their options were either to run, say, 6 SMs in both modes and have perhaps 420MHz portable/1GHz docked clocks, or add 6 more SMs, only run them in docked mode, and have say 420MHz portable/840MHz docked. The latter would provide a lot more performance in docked mode, and if they use a 1080p screen for portable mode, would give developers a 1:1 ratio of performance per pixel between the two. Of course it would also require higher power draw in docked mode (which is more achievable than in portable, as it's only a question of cooling, not battery life), and some additional cost in terms of increased die area, but Nintendo may have found those trade-offs worthwhile.

The other possibility is that it's just not on 8nm. On a process in TSMC's 5nm family, like Nvidia are using for Hopper and Ada, it should probably be fine running all 12 SMs in portable mode. On either TSMC 7nm/6nm or Samsung 5nm/4nm it might be able to squeeze it, but it's hard to say without data like we have for Jetson Orin on 8nm.


The counter-argument is that the 1536 CC / 12 SM figure is hardcoded in several places in NVN2, which does seem to indicate it's a constant.

But NVN2, in the state it was in at the time of the theft, was definitely still a work in progress, so we probably can't draw conclusions like that from it.
 
The counter-argument is that the 1536 CC / 12 SM figure is hardcoded in several places in NVN2, which does seem to indicate it's a constant.

But NVN2, in the state it was in at the time of the theft, was definitely still a work in progress, so we probably can't draw conclusions like that from it.

As far as I'm aware, the same places that specify 12 SMs for Drake also hardcode 16 SMs for Orin, 144 SMs for AD102, etc. I don't think there are any references to the reduced-SM modes for Orin, the binned versions of desktop GPUs, etc., so there wouldn't necessarily have to be any reference to different modes for Drake in the stolen code; it may be handled somewhere else (perhaps on Nintendo's side).

I see you drawing a direct correlation between IPC and percentage size increase. Is the basis for your presumption that the IPC increase is directly related to something physically added to the footprint, like transistors/cache?
There's a rule of thumb (called Pollack's rule) that CPU core performance increases with the square root of the increase in complexity, whereas power draw generally increases linearly with complexity. So if, on the same manufacturing process, a core is 4x bigger than another, we should expect it to be about 2x as powerful. ARM's X1/X2 cores are a good example of this, as they're basically just bigger versions of the A78/A710. With the X1 (according to Anandtech) the die area and power consumption should both be about 1.5x higher than the A78, and ARM claimed a 22% performance increase, which is exactly the square root of the die area increase.
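A quick sanity check of that rule of thumb in code (the 1.5x area figure is Anandtech's X1 estimate cited above):

```python
import math

# Pollack's rule: performance scales roughly with the square root of core
# complexity (die area), while power scales roughly linearly with it.
def pollack_perf_gain(area_ratio: float) -> float:
    return math.sqrt(area_ratio)

print(round(pollack_perf_gain(1.5), 2))  # 1.22 -> matches ARM's claimed ~22% X1 uplift
print(pollack_perf_gain(4.0))            # 2.0  -> a 4x bigger core is ~2x as fast
```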
 
Oof, if Drake indeed disables 6 SMs and runs the GPU at an even lower clock in handheld mode, it will be far weaker than the Xbone. A bit of a disappointment if rumored games like RDR2 would need to sacrifice graphics just to run at native 720p.
 
As far as I'm aware, the same places that specify 12 SMs for Drake also hardcode 16 SMs for Orin, 144 SMs for AD102, etc. I don't think there are any references to the reduced-SM modes for Orin, the binned versions of desktop GPUs, etc., so there wouldn't necessarily have to be any reference to different modes for Drake in the stolen code; it may be handled somewhere else (perhaps on Nintendo's side).
I believe the API calls for 12 SMs, but other places just have information on what the hardware specs of Orin (and others) are and their differences, like AD102 having 144 SMs.

Though @LiC would be more appropriate to answer this.
 
They're talking about docked mode, not handheld.
They're not talking about either, they're talking about Switch in general.

This is one of the things that lots of discussions gloss over. Even if Drake matches series S pound for pound in docked mode, games still need to be built against the handheld power at some point - just as games on Series X are somewhat defined by the Series S's power.

Secondly, raw GPU performance isn't the only metric here. CPU performance absolutely matters, as do storage, memory bandwidth, etc. Even if we assume that the docked-mode GPU can go toe-to-toe with the Series S, I don't expect comparable experiences. If nothing else, the size of an Xbox game on disc is much, much larger than a Switch cart, while a less powerful CPU means that lossless decompression of assets is significantly slower.

That means even if you assume parity on the GPU you will still need "strategic nips and tucks" to get a Series S game cut down for Drake. Similarly, while Nvidia's RT hardware outperforms AMDs, the sheer quantity of RT hardware in Drake seems minimal (and there have been rumors of the RT solution underperforming in Devkits), so I would expect downgrades in the lighting solutions as well.

It's been said multiple times here: the direction Drake seems to be taking is the raw power of previous-generation hardware with a feature set that exceeds current generation. That combination makes a whole new class of miracle ports possible, but it doesn't mean "multiplats that run on Series S can run on Switch" out of the box (which is the question that was being asked).
 
If people are expecting better than Series S even in docked then I think they’re going to be disappointed.
I don’t think anybody is expecting overall more power.

Certain well-optimized games with DLSS might be comparable or better strictly from an image-quality perspective, like how a lot of games have better image quality on One X than Series S.

But it will still be weaker overall.
 
ARM didn't mention anything about size for the A710, so yeah, the A710 is at minimum equal in transistor count to the A78, if not bigger. If there were any area savings, marketing would have mentioned it (like the A78 having -5% area compared to the A77).

I don't particularly care for going by flop comparisons, but IIRC, going off what Alovon posted some time ago, RDNA2 flops without Infinity Cache ended up comparable to RDNA1 flops, right? And RDNA1 flops were comparable to 1.25x GCN flops? And Ampere flops were comparable to GCN? So as far as raw rasterization power goes, the Series S's 4 TFLOPS convert to something like 5 TFLOPS in Ampere terms?
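Taking those rough equivalences at face value (they're the thread's rules of thumb, not measured data), the conversion is simple:

```python
# Rough "effective TFLOPS" conversion chain quoted above:
# RDNA2 without Infinity Cache ~ RDNA1, RDNA1 ~ 1.25x GCN, Ampere ~ GCN.
RDNA1_PER_GCN = 1.25   # 1 RDNA1 TFLOP does the work of ~1.25 GCN TFLOPs
AMPERE_PER_GCN = 1.0   # Ampere ~ GCN, per the post

def rdna2_to_ampere_equiv(rdna2_tflops: float) -> float:
    gcn_equiv = rdna2_tflops * RDNA1_PER_GCN  # treat RDNA2 (no cache) as RDNA1
    return gcn_equiv / AMPERE_PER_GCN

print(rdna2_to_ampere_equiv(4.0))  # Series S's 4 TFLOPS ~ 5 "Ampere TFLOPS"
```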

Hmm, looks like there's been a slight update to Samsung's eUFS catalog. It used to be the case that 3.0 and 3.1 were lumped together on the same page, and sometime earlier this year I commented that the only 3.0 part remaining is the 1 TB part. That 1 TB 3.0 part can still be found on the website, but 3.0 got split off to its own page. And if you go up a level to Samsung's UFS page in general, they link only to 3.1 and 4.0. Annoyingly, for 3.1, the parts-finder section only returns 128/256/512 GB parts, despite the images on the website clearly advertising a 1 TB version.

...oh, Micron's LPDDR5 catalog, since when did you grow from 3 pages to 4?
 
If Drake is manufactured on 8nm, I would be amazed if they were running all 12 SMs in portable mode at any clock. All we have to do is look at the supported power states for the Jetson Orin variants (which can be accessed on Nvidia's developer site) to see how unlikely that is.

The full-fat Jetson Orin 64GB version (which has 16 SMs and 12 A78 CPU cores) has power profiles going down to 15W. To get there, Nvidia have to disable most of the CPU and GPU cores, leaving them with 4 CPU cores at 1.1GHz and 6 SMs at 420MHz. Memory is also clocked down by half. Now, there are PVA and DVA components on there (also clocked down), and we know Drake's tensor cores are less powerful, but to assume that Nvidia and Nintendo would somehow manage to get twice as many CPU and GPU cores running at half the power consumption of Orin using the same architecture on the same manufacturing process is extremely wishful thinking.

Even the cut-down Jetson Orin NX 16GB, which is most similar to Drake in memory width, (assumed) CPU cores and GPU SMs, has to drop down to just 4 SMs at 625MHz to hit its 15W power mode.
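Using SM count times clock as a crude throughput proxy, the two 15W profiles quoted above land at almost exactly the same budget, which illustrates why 12 active SMs in portable looks implausible on 8nm. This is only a proxy: it ignores memory clocks and the PVA/DLA blocks also on Orin.

```python
# SMs x clock as a crude GPU-throughput proxy at a fixed 15W budget,
# using the Jetson Orin power profiles quoted above.

def proxy_throughput(sms: int, clock_mhz: int) -> int:
    return sms * clock_mhz

agx_orin_15w = proxy_throughput(6, 420)   # AGX Orin 64GB, 15W profile
orin_nx_15w  = proxy_throughput(4, 625)   # Orin NX 16GB, 15W profile
print(agx_orin_15w, orin_nx_15w)          # 2520 2500 -- nearly identical

# If that ~2500 SM*MHz budget held for a hypothetical 12-SM config:
print(2500 // 12)  # ~208 MHz per SM -- impractically low
```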

Given what the new model is trying to do, though, dropping down to (say) 6 SMs in portable mode really isn't that bad of an idea. The original Switch had a 720p output resolution in portable mode, and a 1080p output in docked. That's a 2.25x difference in the number of pixels being output between the modes, and originally the Switch had a 2x difference in GPU clocks and therefore performance between the two modes (I believe 384MHz was the original portable clock, and the 460MHz option was added later; happy to be corrected on that, though.) What this meant is that the GPU performance per pixel was within the same ballpark in both modes, which makes it easier for developers to support both modes. Obviously there are other aspects like memory bandwidth which come into play, but the whole idea of docked vs portable modes is to balance the performance, resolution and power consumption across the two use-cases.

Coming to the new Drake-powered model, pretty much the most-repeated fact about it is that it outputs in 4K resolution. Most people in this thread seem to think it will have a 720p screen, which would mean that instead of a 2.25x difference in output resolution on the original Switch, this would have a 9x difference in output resolution. This is the main reason I personally believe it's more likely to have a 1080p screen, but even in that case there's a 4x difference in resolution between handheld and docked. If we keep the number of SMs the same between both modes, then we're unlikely to come close to the 4x difference in GPU performance, let alone anywhere near the 9x difference a 720p screen would bring. Realistically, we would probably have something like a 2x difference in clocks between the two modes, which would leave the performance per pixel either 2x higher in portable mode (with a 1080p screen) or 4.5x higher (with a 720p screen).
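The pixel-ratio arithmetic behind that paragraph can be checked directly. A quick sketch (the 2x clock ratio is the post's assumption, not a known spec):

```python
# Output-pixel ratios behind the per-pixel performance argument above.

def pixels(w, h):
    return w * h

ratio_4k_720p  = pixels(3840, 2160) / pixels(1280, 720)    # 9.0
ratio_4k_1080p = pixels(3840, 2160) / pixels(1920, 1080)   # 4.0
ratio_og       = pixels(1920, 1080) / pixels(1280, 720)    # 2.25 (OG Switch)

clock_ratio = 2.0  # assumed docked/portable clock ratio with the same SM count
print(ratio_4k_720p / clock_ratio, ratio_4k_1080p / clock_ratio)  # 4.5 2.0
```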

If they're using an 8nm chip, then it really should be thought of as adding extra SMs that are only going to be used in docked mode, rather than disabling SMs in portable mode. Running 12 SMs in portable mode would never have been on the table for 8nm, so their options were either to run say 6 SMs in both modes and have perhaps 420MHz portable/1GHz docked clocks, or add 6 more SMs, only run them in docked mode, and have say 420MHz portable/840MHz docked. The latter would provide a lot more performance in docked mode, and if they use a 1080p screen for portable mode, would give developers a 1:1 ratio of performance per pixel between the two. Of course it would also require higher power draw in docked mode (which is more achievable than in portable, as it's only a question of cooling, not battery life), and some additional cost in terms of increased die area, but Nintendo may have found those trade-offs worthwhile.
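Those two 8nm options can be compared with the same SMs-times-clock proxy; all clocks here are the post's illustrative numbers, not known specs.

```python
# The two hypothetical 8nm options sketched above, using SMs x clock as
# a raw performance proxy (illustrative clocks from the post).

def perf(sms, clock_mhz):
    return sms * clock_mhz

# Option 1: 6 SMs in both modes
opt1_portable, opt1_docked = perf(6, 420), perf(6, 1000)
# Option 2: 6 SMs portable, all 12 in docked
opt2_portable, opt2_docked = perf(6, 420), perf(12, 840)

print(opt1_docked / opt1_portable)  # ~2.38x docked uplift
print(opt2_docked / opt2_portable)  # 4.0x -- matches the 4x pixel gap
                                    # between a 1080p screen and 4K output
```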

The other possibility is that it's just not on 8nm. On a process in TSMC's 5nm family, like Nvidia are using for Hopper and Ada, it should probably be fine running all 12 SMs in portable mode. On either TSMC 7nm/6nm or Samsung 5nm/4nm it might be able to squeeze it, but it's hard to say without data like we have for Jetson Orin on 8nm.
This is very good analysis, however, I think the power-per-pixel ratios don't take DLSS properly into account. The power-per-pixel number needs to be factored based on the targeted internal resolution, not the targeted output, with accommodations made for additional frame budget for DLSS.

One example scenario is that 4K output is achieved via something like DLSS Performance mode, with the tensor cores disabled in handheld mode. That would mean the target internal resolution in handheld is 720p60, and the target internal resolution in docked mode is 1080p70 - the same gap as current, res-wise, but the extra frame time is needed in docked mode to have time to run DLSS to get to 4K.

Another scenario is that handheld targets 1080p70, with DLDSR running to downsample supported games to 720p60, and docked mode targeting an internal res of 1440p70, again using DLSS to take that higher-quality image to the same 4K.
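For reference, the internal resolutions implied by the standard DLSS quality modes at a 4K output, using the per-axis scale factors Nvidia documents for DLSS 2.x (whether a hypothetical "Switch DLSS" uses the same factors is an open question):

```python
# Internal render resolutions implied by standard DLSS quality modes at a
# 4K output (per-axis scale factors as documented by Nvidia for DLSS 2.x).

DLSS_SCALE = {
    "Quality": 2 / 3,
    "Balanced": 0.58,
    "Performance": 1 / 2,
    "Ultra Performance": 1 / 3,
}

def internal_res(out_w, out_h, mode):
    s = DLSS_SCALE[mode]
    return round(out_w * s), round(out_h * s)

print(internal_res(3840, 2160, "Ultra Performance"))  # (1280, 720)
print(internal_res(3840, 2160, "Performance"))        # (1920, 1080)
print(internal_res(3840, 2160, "Quality"))            # (2560, 1440)
```

So 1080p-internal docked maps to Performance mode, and 720p-internal to Ultra Performance.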

My assumption is that, just like on the current Switch, Nintendo will offer an array of performance profiles for each mode, and a game will be able to trade off various clocks across the CPU, GPU, tensor cores (if independently scalable?), and memory to maintain power draw. My suspicion, too, is that migrating state away from fully shut-down cores when transitioning between docked and undocked modes - or spinning up cores and expanding the thread pool - requires too thick an abstraction layer or too tricky a dev environment, and that they'll go with ridiculously low clocks instead.
 
I expect better IQ than some Series S games because of DLSS (if a game is using it in the first place), not necessarily comparable fidelity. Skyward Sword HD has better IQ than Xenoblade 2 but it's obvious the latter is pushing more detail and effects. Drake and Series S comparisons are going to be really interesting.
 
ARM didn't mention anything about size for the A710, so yea, A710 at the minimum is equal in transistor count to the A78, if not bigger. If there are any area savings, marketing would mention it (like the A78 having -5% area compared to the A77).

I don't particularly care for going by flop comparisons, but IIRC, going off of what Alovon posted some time ago, RDNA2 flops without Infinite Cache ended up comparable to RDNA1 flops, right? And RDNA1 flops were comparable to 1.25x GCN flops? And Ampere flops were comparable to GCN? So as far as raw rasterization power goes, Series S's 4 tflops convert to something like 5 tflops in Ampere?

Hmm, looks like there's been a slight update to Samsung's eUFS catalog. So, it used to be the case that 3.0 and 3.1 were lumped together in the same page, and sometime earlier this year I've commented that the only 3.0 part remaining is the 1 TB part. That 1 TB 3.0 part can still be found on the website, but 3.0 got split off to its own page. And if you go up a level to Samsung's UFS page in general, they link only to 3.1 and 4.0. Annoyingly, for 3.1, the parts finder section only returns 128/256/512 GB parts, despite the images on the website clearly advertising a 1 TB version.

...oh, Micron's LPDDR5 catalog, since when did you grow from 3 pages to 4?
It’s interesting that it matches, but Ampere does it at a much lower memory bandwidth than GCN.

(Even than RDNA, too)
 
It would really be frustrating if the next Switch can't get to 1440p
without resolution scaling dipping a lot.
We're too far removed from the original Switch for a 1080p console.

@people worried about fewer cores in mobile: with a 720p screen that really shouldn't be too big of a problem.

And I really hope it's better than the SD.
I know some argue that since the Steam Deck is not that old, it won't be much better.
But the Steam Deck is not a mass-market product the way the Switch is, and games are not made natively for it. I would be shocked if games don't run much better on Switch even with the same power.
 
It would really be frustrating if the next Switch can't get to 1440p
without resolution scaling dipping a lot.
We're too far removed from the original Switch for a 1080p console.

@people worried about fewer cores in mobile: with a 720p screen that really shouldn't be too big of a problem.

And I really hope it's better than the SD.
I know some argue that since the Steam Deck is not that old, it won't be much better.
But the Steam Deck is not a mass-market product the way the Switch is, and games are not made natively for it. I would be shocked if games don't run much better on Switch even with the same power.
The Steam Deck is a more expensive product sold at a loss; Nintendo will not sell hardware at a loss*

*There is the chance they do in individual regions due to currency exchange reasons, but it will not be a global thing and especially will not happen in NA
 
How far away from the PS4 will Drake be in portable mode???

Any advantages???

Please don’t laugh, I play a ton in that mode and was just curious.

Why would anybody laugh? Portable will always remain a major selling point of the console. If a game has a great looking portable mode and a mediocre or bad docked mode I mind far less than the inverse.
 
The Steam Deck is a more expensive product sold at a loss; Nintendo will not sell hardware at a loss*

*There is the chance they do in individual regions due to currency exchange reasons, but it will not be a global thing and especially will not happen in NA
Economy of scale is the thing I hope for. I am hoping for roughly brute-force performance comparable to the SD, maybe slightly better, with optimization and low-level access punching above the SD, and a price of roughly €400-450.
 
How far away from the PS4 will Drake be in portable mode???

I think we roughly expect them to be in the ballpark of each other but we’re not sure.
Any advantages???

This we know better. There will be some (possibly very minimal) ray tracing support, and likely some form of DLSS
Please don’t laugh, I play a ton in that mode and was just curious.
I play exclusively in handheld.
 
If all 12 SMs are active in portable mode, even with copy/pasted clocks from OG Switch, it should be ahead of the PS4 in raw graphical power without taking into account the modern feature set. CPU is at minimum equal to, but likely stronger. If what you're hoping for is a PS4 or a PS4+ in portable form, I wouldn't be worried yet.
Edit: wait, I'm striking that out because I was thinking of docked clocks, hold on for edit 2.
Edit 2: 12 SMs + copy/pasted clocks is below PS4 in raw flops, but feature set should bring it up to the same ballpark. Plus, if it's still a 720p display, it should be plenty fine from a raw grunt per pixel perspective?
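The flops math behind edit 2 checks out. An Ampere SM has 128 FP32 CUDA cores, each doing 2 flops per cycle via FMA; Drake's real clocks are unknown, so these are the OG Switch's GPU clocks copy/pasted as the post describes:

```python
# Rough FP32 throughput for a 12-SM Ampere GPU at OG Switch clocks,
# vs the PS4's ~1.84 tflops. Ampere SM = 128 FP32 cores, FMA = 2
# flops/cycle. Drake's actual clocks are unknown; these are assumptions.

def ampere_tflops(sms, clock_mhz):
    return sms * 128 * 2 * clock_mhz / 1e6

print(ampere_tflops(12, 460))  # portable clock: ~1.41 tf (below PS4)
print(ampere_tflops(12, 768))  # docked clock:   ~2.36 tf (above PS4)
```

So at the portable clock it sits below the PS4 in raw flops, and the modern feature set would have to make up the gap.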

Speaking of profiles, I was wondering if those frequency presets are more like what the cores/SMs/RAM are locked to (i.e., the profile says X clock for the CPU, and the CPU runs at X as long as that profile is active) or more like caps/ceilings that let DVFS do its thing for power conservation (i.e., opportunistically downclock)?
The reason this question comes to mind is that I'm trying to figure out how LPDDR5X saves energy over base LPDDR5 (with the context being, I'm assuming either Drake or a future revised-Drake will use 5X). I don't think that the voltages change. Micron doesn't really explain yet as far as I know. Samsung's page for 5X makes it sound like improved DVFS is the main driver for power consumption improvements.
 
And 900p is not a TV standard, yet BotW renders at 900p docked and upscales it to 1080p as the output.

When I'm talking 1440p, I'm talking render resolution, not output resolution.
Given DLSS, would there be any advantage to rendering at 1440p?
 
Given DLSS, would there be any advantage to rendering at 1440p?
Honest answer? As long as it looks about right (comparable), I'm fine with however they accomplish that.

But my problem with DLSS: it is a specific implementation. It is not just there.
I could see many ports just not bothering. I could see some indies not having the resources to implement it just for the Switch port. If it's strong enough to brute force some games to 1440p, that would be great.
And then there's the fact that we don't know how effective DLSS on the Switch would be.
Or if it will be there at all, actually, since the console is not announced. (I know, Nvidia leak, but we had a ton of leaks of stuff Nintendo did not release, see NSO emulators... (for now))
 
Honest answer? As long as it looks about right (comparable), I'm fine with however they accomplish that.

But my problem with DLSS: it is a specific implementation. It is not just there.
I could see many ports just not bothering. I could see some indies not having the resources to implement it just for the Switch port. If it's strong enough to brute force some games to 1440p, that would be great.
And then there's the fact that we don't know how effective DLSS on the Switch would be.
Or if it will be there at all, actually, since the console is not announced. (I know, Nvidia leak, but we had a ton of leaks of stuff Nintendo did not release, see NSO emulators... (for now))

I recognize these concerns. I feel both Nintendo and Nvidia assuredly do too. And if Nvidia and Nintendo DO have a 'Switch DLSS' that uses that fixed-function tensor core hardware, I would imagine a massive part of that undertaking would be making sure Switch DLSS as a product is, of course, functional, and as conveniently packaged as possible, all the way down to the smallest indie studio (well, I mean, 2D is kinda non-applicable, but we all get the gist, yeah?)
 
It’s not like Nintendo can’t support 1440p.

(nor does it really hurt anyway).

Expecting them to do so is just luck of the draw. They don't gain nor lose from supporting it.
 
Honest answer? As long as it looks about right (comparable), I'm fine with however they accomplish that.

But my problem with DLSS: it is a specific implementation. It is not just there.
I could see many ports just not bothering. I could see some indies not having the resources to implement it just for the Switch port. If it's strong enough to brute force some games to 1440p, that would be great.
And then there's the fact that we don't know how effective DLSS on the Switch would be.
Or if it will be there at all, actually, since the console is not announced. (I know, Nvidia leak, but we had a ton of leaks of stuff Nintendo did not release, see NSO emulators... (for now))

I don’t want to trivialize things but is it so complicated or difficult that teams couldn’t update games to use it without any changes to assets? And would it not be in Nintendo’s and Nvidia’s best interests to make the inclusion as easy as possible?

Indie devs also have to deal with making changes for performance improvements, and DLSS I’d hope would be near first cab off the rank when weighing cost/benefits.
 
Correct me if I'm wrong, but can't DLSS be used as a drop in for TAA in game engines? Surely that must reduce the work needed.
 
Honest answer? As long as it looks about right (comparable), I'm fine with however they accomplish that.

But my problem with DLSS: it is a specific implementation. It is not just there.
I could see many ports just not bothering. I could see some indies not having the resources to implement it just for the Switch port. If it's strong enough to brute force some games to 1440p, that would be great.
And then there's the fact that we don't know how effective DLSS on the Switch would be.
Or if it will be there at all, actually, since the console is not announced. (I know, Nvidia leak, but we had a ton of leaks of stuff Nintendo did not release, see NSO emulators... (for now))
What do you mean it's not there? If the porting team doesn't bother, it's more likely because the game won't benefit (it's a 2D game, for example), or because the game reaches its performance target without it. As for effectiveness, we know TAAU works well on Switch, and the tensor cores would still offload work. The Steam Deck and FSR2 are enough of a showcase of that.
 
Correct me if I'm wrong, but can't DLSS be used as a drop in for TAA in game engines? Surely that must reduce the work needed.
If it has TAA it reduces the work by a lot, but it still needs anywhere from a few days to a few weeks of work to make sure it actually looks good.

Not all games would be designed for it.
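The reason a TAA pipeline makes DLSS integration cheaper is that both consume the same per-frame inputs: sub-pixel camera jitter plus motion vectors. As a small illustration, here's the low-discrepancy Halton(2,3) jitter sequence commonly used for TAA and recommended for DLSS; an engine with TAA already generates something like this each frame:

```python
# TAA and DLSS share per-frame inputs: sub-pixel camera jitter + motion
# vectors. A Halton(2,3) sequence is a common choice for that jitter.

def halton(index: int, base: int) -> float:
    """Radical-inverse Halton value in [0, 1)."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def jitter_offsets(n):
    """Per-frame sub-pixel offsets in [-0.5, 0.5), applied to the projection."""
    return [(halton(i, 2) - 0.5, halton(i, 3) - 0.5) for i in range(1, n + 1)]

for dx, dy in jitter_offsets(4):
    print(f"{dx:+.3f}, {dy:+.3f}")
```

A game without TAA has to add the jitter and motion-vector plumbing from scratch, which is where the extra days-to-weeks of work comes from.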
 
Please read this staff post before posting.

Furthermore, according to this follow-up post, all off-topic chat will be moderated.