
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (New Staff Post, Please read)

Sure, I'll just do some quick clocks.

Portable
400 MHz -> 1.228 TFLOPs
500 MHz -> 1.536 TFLOPs
600 MHz -> 1.843 TFLOPs

Docked
768 MHz (Switch docked clock) -> 2.36 TFLOPs
1 GHz -> 3.07 TFLOPs
1.152 GHz -> 3.539 TFLOPs
1.3 GHz -> 3.993 TFLOPs
1.5 GHz -> 4.6 TFLOPs
1.8 GHz -> 5.53 TFLOPs
2 GHz -> 6.14 TFLOPs

Realistically, I think only the first four docked clocks would be possible with such a big GPU. With DLSS on top, it would be a current-gen console that matches fidelity expectations, sitting right between the XBSS and XBSX.
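For anyone who wants to check the arithmetic, these numbers all fall out of the leaked 12 SMs at 128 CUDA cores per SM, counting an FP32 FMA as two ops per core per clock; a minimal sketch:

```python
# Reproduce the clock table above from the leaked GPU config.
# Assumptions: 12 SMs x 128 CUDA cores (per the leak), 2 FP32 ops
# per core per clock (one fused multiply-add).
CORES = 12 * 128  # 1536

def tflops(clock_mhz: float) -> float:
    return CORES * 2 * clock_mhz * 1e6 / 1e12

for mhz in (400, 500, 600, 768, 1000, 1152, 1300, 1500, 1800, 2000):
    print(f"{mhz:>4} MHz -> {tflops(mhz):.3f} TFLOPs")
# 400 MHz -> 1.229 (the table truncates to 1.228), 768 -> 2.359,
# 1000 -> 3.072, 2000 -> 6.144
```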

Yeah, it said 12 RT cores for Drake and 16 for Orin. I'm just trying to preserve the known specs we have without factoring in theories; maybe Dakhil will be able to fill out a list of known specs for the OP. Also, that data sheet with benchmarks might actually be tensor core figures, giving 16.6 TOPS, I think, for Drake.
Any idea how many watts it would pull at a 1.8 GHz clock?
 
I'm leaning towards 500 MHz portable and 1 GHz docked; it just feels right.
 
Also, that data sheet with benchmarks might actually be tensor core figures, giving 16.6 TOPS, I think, for Drake.
Wouldn’t that be a really small amount, especially with 12 SMs?
I'm leaning towards 500 MHz portable and 1 GHz docked; it just feels right.
I’m betting we get multiple clock options in handheld for devs, like Switch currently has.
 
I'm so glad we finally have something new to talk about this week! My guess for the GPU clocks is 400 MHz portable / 1 GHz docked. Could be higher, of course, but with 12 SMs to feed I'm erring on the side of caution.

As for clock gating, though, isn't the simplest answer that it's being used for backwards compatibility? There's no need to utilize 12 SMs when Mariko only has two. Even if it's 'enhanced' BC, I doubt they would go above four. The main benefit of a wide GPU like Drake in Nintendo's circumstances would seem to be that you get more tensor and RT cores, like @Skittzo said.

And on that note, the battery life when playing Switch games should be amazing, right? If it's only using half the CPU cores and 1/3 - 1/6 of the GPU, that's gotta be some huge power savings. Maybe eight hours on BotW? Ten if we're lucky? Or am I underestimating how much of the power budget is spent on the game card drive, screen, speakers, et al?
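As a back-of-the-envelope check on that hope (every input here is an assumption: a ~16 Wh pack like the original Switch's, ~7 W total portable draw, a couple of watts saved by gating SMs and CPU cores in BC):

```python
# Rough battery-life sketch; all numbers are guesses, not known specs.
BATTERY_WH = 16.0   # original Switch pack is ~4310 mAh / ~16 Wh

def hours(draw_w: float) -> float:
    return BATTERY_WH / draw_w

new_game = 7.0        # assumed total portable draw, all 12 SMs active
bc_title = 7.0 - 2.0  # assume SM/CPU gating in BC saves ~2 W of SoC power
print(f"new game: ~{hours(new_game):.1f} h, BC title: ~{hours(bc_title):.1f} h")
# Screen, speakers, game card and Wi-Fi draw power regardless, so the
# savings flatten out; 8-10 h would need a much bigger battery or a far
# more efficient node.
```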
 
A surprising amount of info out of this. I mean I feel bad for Nvidia, but at the same time it's hard not to be excited about the potential for more leaks in the days ahead.

12 SMs is certainly an unexpected possibility. It seems like it would necessitate a larger form factor and/or a smaller node than expected.
 
So, I forget what this person's name was, but there was a developer who posted here. At some stage they said they could no longer comment on things due to NDA, and at some later stage they said they probably wouldn't make their game for the Switch 2 if certain things were missing (and they seemed to think those things might be missing). Anyone know who I'm talking about? I thought it would be interesting if, based solely on the leak (so no NDA-breaking whatsoever), they would comment here on whether they would still not make their game for Switch 2.

Then again maybe there is still insufficient information from the leak.
The technology brainchild referred to was Nanite.

To quote him:
Even an RTX 3050 would be fine. That isn't the issue. Based on current profiling, however, I need a data transfer rate of at least around 500 MB/s for loading new Nanite assets, which would happen often due to how navigation in the game works.
Unless proven otherwise, an RTX 3050 is way above what we can realistically expect from a Switch successor, if it retains the same form factor. Sure, the RTX 3050 might not be the absolute lowest hardware that would satisfy the requirements, but I won't hold my breath for Nanite on the succ.
 
Unless proven otherwise, an RTX 3050 is way above what we can realistically expect from a Switch successor...
Brainchild specifically stated the GPU wasn't the issue in that quote. If Nanite doesn't make it to the next Switch, it wouldn't be because of the GPU; it would be the storage speed, which is still a huge unknown at this point in time. If I remember correctly, he stated 300 MB/s would be low but somewhat workable, and the 500 MB/s in that quote would be ideal.
 
Unless proven otherwise, an RTX 3050 is way above what we can realistically expect from a Switch successor...
Nanite can run on GPUs far weaker than the 3050.
 
If Nanite doesn't make it to the next Switch, it wouldn't be because of the GPU; it would be the storage speed, which is still a huge unknown at this point in time.
The Switch's current storage speed is around 104 MB/s if my Google search is correct, yeah? What kind of storage standard would Nintendo have to pursue to hit that 300 MB/s minimum? Would that also require SD cards that are at least UHS-II or UHS-III, if NVMe is out of the question?
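For reference, the bus-level sequential maximums of the standards that keep coming up (spec-sheet ceilings; real cards land lower), checked against brainchild's 300 MB/s floor; a quick sketch:

```python
# Spec-sheet sequential maximums, MB/s, for the storage options discussed.
MBPS = {
    "microSD UHS-I":            104,   # what Switch supports today
    "microSD UHS-II":           312,
    "microSD UHS-III":          624,
    "SD Express (PCIe 3.0 x1)": 985,
    "eMMC 5.1 (HS400)":         400,
    "UFS Card 1.0":             530,
    "NVMe (PCIe 3.0 x4)":      3940,
}
NEED = 300  # brainchild's stated minimum for his Nanite streaming case
for std, speed in MBPS.items():
    print(f"{std:<26} {speed:>5} MB/s  {'ok' if speed >= NEED else 'too slow'}")
# Only UHS-I misses the floor; UHS-II upward would technically clear 300.
```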
 
The Switch's current storage speed is around 104 MB/s if my Google search is correct, yeah? What kind of storage standard would Nintendo have to pursue to hit that 300 MB/s minimum? Would that also require SD cards that are at least UHS-II or UHS-III, if NVMe is out of the question?
I think the greater issue than internal storage would be the speed of game cards. Fast internal storage won’t mean much if the game cards are too slow.
 
The Switch's current storage speed is around 104 MB/s if my Google search is correct, yeah? What kind of storage standard would Nintendo have to pursue to hit that 300 MB/s minimum? Would that also require SD cards that are at least UHS-II or UHS-III, if NVMe is out of the question?
I don't remember if we are getting those speeds or not... I seem to remember hearing that they are throttled to match game card speeds.
Personally I'm hoping for UFS as a solution, as it performs at the required speeds... but we just don't know, and we haven't had any indication of what they might use.
I think the greater issue than internal storage would be the speed of game cards. Fast internal storage won’t mean much if the game cards are too slow.
Game installs could be a thing, but yeah.
 
Nanite can run on GPUs far weaker than the 3050.
It's not a question of Nanite specifically. Brainchild is working on a game that combines Nanite with some (undescribed) form of asset streaming. Let's not confuse Brainchild (a specific developer who has made his needs for a specific project clear) with generic features of a third-party engine.
 
Nanite can run on GPUs far weaker than the 3050.
See, this is one piece of information that I haven't found even after checking the official Unreal presentation about Nanite and the official Unreal documentation:

General Advice on Data Size and a Look to the Future

Nanite and Virtual Texturing systems, coupled with fast SSDs, have lessened concern over runtime budgets of geometry and textures. The biggest bottleneck now is how to deliver this data to the user.

Data size on disk is an important factor when considering how content is delivered — on physical media or downloaded over the internet — and compression technology can only do so much. Average end user's internet bandwidth, optical media sizes, and hard drive sizes have not scaled at the same rate as hard drive bandwidth and access latency, GPU compute power, and software technology like Nanite. Pushing that data to users is proving challenging.

Rendering highly detailed meshes efficiently is less of a concern with Nanite, but storage of its data on disk is now the key area that must be kept in check. Outside of compression, future releases of Unreal Engine should see tools to support more aggressive reuse of repeated detail, and tools to enable trimming data late in production to get package size in line, allowing art to safely overshoot their quality bar instead of undershoot it.
@brainchild: do you have a better estimate for the minimum hardware requirements to run Nanite, GPU-wise?
 
FYI some guy back in June 2021 seemed to accurately guess the number of GPU shaders (assuming Drake still has 128 CUDA cores per SM):

 
If I remember correctly, he stated 300 MB/s would be low but somewhat workable, and the 500 MB/s in that quote would be ideal.
IIRC, for his project, 300 MB/s was the minimum, 1000 MB/s was the worry-free ideal, and 500 MB/s was the 'gotta keep some restraints in mind, but still quite workable' middle.
 
The recent Nvidia leak actually makes the storage issue more complicated, in my opinion.

Now that we know it's likely to be more performant than we previously thought, higher-quality assets will need to be introduced into games. For new Switch games this means bigger cartridges and downloads; for old games getting patches, I imagine the patches will be pretty large.

With those bigger assets comes a need to load them faster, otherwise load times on Switch could quickly become unbearable, though some of the current load-time issues are certainly due to the CPU.

I guess the question is: how much read and write speed could a theoretical 8-core ARM A78C setup utilise? And then, what solution could satisfy this need?

On the original Switch, I have seen it mentioned here and elsewhere that it's limited to around 25 MB/s to match cartridge read speed and due to the CPU. So does the new Switch really need anything close to 1 GB/s from NVMe? Probably not.

For internal storage, eMMC at 350 MB/s operating at full speed is probably enough as long as the CPU makes use of it. For cartridges I am not sure what Nintendo will do, but for external storage I am completely stumped.

There's nothing widely adopted outside of NVMe that's fast enough, and NVMe likely draws too much power and takes up too much space for the form factor. I know there are smaller drives, but again, not widely adopted.

Maybe Nintendo brings back the old kickstand design, and to install external storage you replace the actual kickstand with an NVMe drive and use the drive as your kickstand. /s
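To put rough numbers on the load-time point above (the payload size is a made-up example):

```python
# Time to read the same hypothetical asset payload at Switch-era vs.
# plausible next-gen speeds.
PAYLOAD_GB = 4.0  # invented example: one level/region worth of assets

for label, mbps in [("Switch game card", 25),
                    ("eMMC-class",      350),
                    ("UFS/SD Express",  500)]:
    seconds = PAYLOAD_GB * 1000 / mbps
    print(f"{label:<16} {mbps:>4} MB/s -> {seconds:6.1f} s")
# 25 MB/s -> 160 s, 350 MB/s -> ~11 s, 500 MB/s -> 8 s: the first jump
# matters far more than whether Nintendo chases full NVMe speeds.
```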
 
For new Switch games this means bigger cartridges and downloads; for old games getting patches, I imagine the patches will be pretty large.

Don't expect old games to receive new assets in patches; they will just adjust settings like resolution and draw distance, as well as add DLSS in games where it can be done quickly and cheaply.
 
I guess the question is: how much read and write speed could a theoretical 8-core ARM A78C setup utilise? And then, what solution could satisfy this need?
This is why RTX IO has come up, and whether or not something similar would appear in a future Switch.
 
Don't expect old games to receive new assets in patches; they will just adjust settings like resolution and draw distance, as well as add DLSS in games where it can be done quickly and cheaply.
I think you are probably right for the most part. I can see some studios doing work on evergreen titles, though, like CD Projekt Red touching up The Witcher 3, and maybe Bethesda upping asset quality in the Doom games. I don't think Nintendo will drop 4K assets for Mario Kart or Zelda because their art style really doesn't need it; the extra resolution will be enough.
 
There are a few ways Nintendo can tackle this. The cheapest is to make GameCards install-only media, like physical games are on PS5/XSeries, with external storage serving as cold storage.
Or they can do a deal with Samsung and use eUFS for external storage. And for GameCards, there's a rumor (I think it's in the OP of this thread) about Nintendo having a deal with Macronix regarding the use of 3D NAND for GameCards.
 
The cheapest way is to make GameCards install-only media, like physical games are on PS5/XSeries.
The problem with required installs is that it would mean Nintendo would have to invest in a large amount of internal storage, which would be costly.
 
UFS cards are really the only thing with high speeds in that form factor, aren't they?
More specifically, the combination of high speed while keeping power consumption (and hopefully heat) down in that form factor. At least, according to the specs.

https://www.anandtech.com/show/16938/silicon-motion-sm2708-sd-express-review-nvme-ssd-served-hot/5
SD Express does offer speed in that form, but this particular example draws on average a little over 3 watts.
...and its surface temperature can reach as high as 96C, ouch.
 
SD Express does offer speed in that form, but this particular example draws on average a little over 3 watts... and its surface temperature can reach as high as 96C, ouch.
So it's basically UFS cards, or the SD cards we are already using in the current Switch.

I have a feeling it's just going to use SD cards.
 
Another problem: part of the appeal of the Switch is the inherent simplicity of the platform. Just slap the game card into the slot and you're good to go.
Ngl, this is one of my favorite things about the system. Every time I get a new game for my Xbone it's a colossal headache and sometimes several days before I can play the thing, even if I buy it on disc. It drives me nuts. But the Switch? Pop in the game, play the game. Makes sense.
 
For cartridges I am not sure what Nintendo will do, but for external storage I am completely stumped... There's nothing widely adopted outside of NVMe that's fast enough.
Smartphones use eUFS, which can operate at fast speeds.
 
That’s only enough for one or two big PS4 games. Even 500 gigs is ridiculously small for modern games.
Yeah, that's true. But 256GB still costs a lot, given only flagship smartphones have it.
I hope Nintendo takes the eUFS/faster-GameCards route. But that also has its set of problems: faster speeds = higher energy consumption and more heat to dissipate.
 
One thing I am realising is that whether it comes end of 22 or start of 23, this console seems to be more substantial than what people have imagined it would be.

It’s quite exciting really.
 
I hope Nintendo takes the eUFS/faster-GameCards route. But that also has its set of problems: faster speeds = higher energy consumption and more heat to dissipate.
Also, there's the external storage issue: UFS cards aren't as ubiquitous as microSDs, and I'm guessing there will be particular microSD requirements if one wants to use them.
 
I hope Nintendo takes the eUFS/faster-GameCards route. But that also has its set of problems: faster speeds = higher energy consumption and more heat to dissipate.
Cost is why I think storage will be the thing Nintendo cuts corners on. It'll certainly be faster than on Switch, but we're probably not going to get anywhere near SSD speeds. If they can manage a 4x increase on game cards and internal storage compared to what Switch is capable of (Switch is limited to 25 MB/s, right?), then I'll consider that a win, even if that will limit some of the ports we'll get later in the generation.
 
Also, there's the external storage issue: UFS cards aren't as ubiquitous as microSDs, and I'm guessing there will be particular microSD requirements if one wants to use them.
Yeah. It's a hard problem for Nintendo. They will either have to do what MS did (custom proprietary expansion memory) or spearhead the eUFS or SD Express formats, which aren't as ubiquitous as microSDs.
The cheapest route for Nintendo would be to keep using eMMC (the maximum speed for eMMC is 400-500 MB/s, iirc) and microSD UHS-III. But these speeds would be a far cry from Xbox Series/PS5.
 
Cost is why I think storage will be the thing Nintendo cuts corners on. It'll certainly be faster than on Switch, but we're probably not going to get anywhere near SSD speeds. If they can manage a 4x increase on game cards and internal storage compared to what Switch is capable of, then I'll consider that a win, even if that will limit some of the ports we'll get later in the generation.
That's a problem for UE5 Nanite, though. And we know how huge a deal getting UE4 support was for Switch. The minimum for Nanite was 300 MB/s, iirc?
 
Wouldn't that have been baked into NVN2, and therefore likely somewhere in this leak?
The RTX IO strategy is to make compressed data look like normal reads and writes, with the GPU handling decompression. That would be in Horizon, not in NVN
 
Regarding the dwindling supply of UFS Card 1.0, Terrell did bring up the possibility that Samsung is winding down production of 1.0 to switch over to 3.0 (the spec was finalized towards the end of 2020).
That's old enough, I think.

The RTX IO strategy is to make compressed data look like normal reads and writes, with the GPU handling decompression. That would be in Horizon, not in NVN
I don't think RTX IO would really be used here; it's meant for PCs, which lack the dedicated DMA controller that a console typically has.
 
I think TSMC's N6 process node being used for the fabrication of Drake is probably the best case scenario, considering that Nvidia probably paid a premium to secure enough capacity for TSMC's N5 process node for datacentre chips (e.g. Hopper, Grace), and at the very least, high-end Lovelace GPUs.

Well, frankly all my previous assumptions are out the window now, so I can't rule anything out. My gut instinct is that a Samsung EUV process is more likely because this seems to share a lot of DNA with Orin, and it would be easier to design both of them in parallel if they're being manufactured by the same foundry. The reason I'd suggest Samsung 5nm is that I imagine Samsung want to migrate their 7nm capacity over to 5nm (and possibly already have), because they can charge more for the latter, and both are limited by their EUV lithography equipment. It could in theory be any of Samsung's 7nm, 5nm or even 4nm processes, but I'm just guessing 5nm being most likely on a hunch.

On the TSMC side, Nvidia have A100 on their DUV 7nm process, and they don't have any products on 6nm, but given it's part of the same family as 7nm (and shares design rules, as far as I'm aware), and that TSMC themselves seem to be actively pushing new chips onto 6nm rather than 7nm, it would seem the most likely TSMC process. I would say that TSMC 5nm is very, very unlikely, but given the current revelations I wouldn't say it's strictly impossible (let's just say a 1% chance). The reason I say this is that TSMC are rapidly expanding their 5nm capacity, to the point where it should actually exceed the 7nm/6nm capacity in the not-too-distant future, and Nvidia have committed to a very large allocation of this. They'll be using this for Hopper, probably Grace, Ada (kopite recently mentioned that all Ada GPUs are going to use TSMC N5, not just the high-end ones), and possibly more stuff we don't know about.

This means they'll be using a lot of TSMC N5 wafers for other stuff, but conversely if they have made huge advance payments to TSMC for that capacity, having many different product lines use it gives them flexibility in how they use up that allocation. If Grace doesn't sell as well as expected, for example, they can divert the allocation towards Ada consumer GPUs, or Hopper HPC GPUs, or perhaps even Drake. Conversely if they build Drake on N6, then it's the only chip they're going to be making on it, and if TSMC also require a significant commitment to get N6 allocation in the same vein as what they're doing for N5, then Nvidia will have a lot less flexibility with that allocation. Basically if Nintendo's Drake-powered model sells below expectations then Nvidia will have N6 allocation they can't really do anything else with, and if it exceeds expectations then it may be more difficult to scale up production.

I'd expect Samsung to be a lot more flexible, because they aren't in the same kind of negotiating position as TSMC.

Isn't it a possibility that all SMs are active in portable, and they're just clocked absurdly low?

The issue is that there's a limit on how much power you can save by reducing clocks. When you see power consumption vs clock graphs, the driving factor in terms of power consumption (and the reason you see something that looks like an exponential curve) is the voltage, as voltage typically increases alongside clock speed, and power consumption is proportional to voltage squared (which is why you typically get better perf/W at lower clocks). However in practice there's a minimum voltage required for any chip to operate, so you can't just keep reducing voltage ever further to keep getting better efficiency from the chip. For any given chip there's some clock speed that can be reliably achieved on the minimum voltage, and that clock speed is around the peak of the performance per Watt for the chip. You can clock lower than it, but as you can't reduce voltage any lower, your power consumption doesn't reduce much and your perf/W gets much worse.

As a specific datapoint, for Mariko we know that this "max clock at min voltage" GPU clock is 384MHz (at least as far as Nvidia is concerned). The Linux4Tegra DVFS tables for Mariko support clocks lower than 384MHz, but every clock 384MHz and lower uses the same voltage, so there's not much point clocking below that if you're concerned about performance per Watt. This clock for Drake will likely be different, and due to the improved manufacturing process compared to Mariko, it will probably be higher. It's quite possible that there's no benefit from clocking any lower than 500MHz or so (which is why I labelled my clock estimates conservative).
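A toy model of that voltage floor, with made-up numbers (only the 384 MHz figure comes from the L4T tables; the voltage and leakage values are invented for illustration):

```python
# Dynamic power ~ C * V^2 * f, plus a static/leakage term that tracks
# voltage. Below the "max clock at min voltage" point, V can't drop
# further, so power only falls linearly while performance falls equally
# fast -- and leakage makes perf/W actively worse.
V_MIN = 0.60      # volts; illustrative, not a real Mariko/Drake value
F_FLOOR = 384e6   # Hz; Mariko's min-voltage GPU clock per the L4T tables
LEAK = 0.25       # static power at the floor, as a fraction of dynamic

def voltage(f):
    return max(V_MIN, V_MIN * f / F_FLOOR)  # assume V scales with clock

def power(f):
    v = voltage(f)
    return v * v * f + LEAK * (V_MIN ** 2 * F_FLOOR) * (v / V_MIN)

base = power(F_FLOOR)
for mhz in (192, 384, 768):
    f = mhz * 1e6
    perf_per_watt = (f / F_FLOOR) / (power(f) / base)
    print(f"{mhz:>3} MHz: {power(f)/base:.2f}x power, {perf_per_watt:.2f}x perf/W")
# 192 MHz: 0.60x power, 0.83x perf/W (slower AND less efficient)
# 768 MHz: 6.80x power, 0.29x perf/W
```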

We should look at the GA107S to get some grounds for this discussion. Here we see a 16-SM GPU clocked at 1.2GHz with 4GB of GDDR6 memory and a TDP of 25 watts. The memory takes up ~8 watts here, so the TDP of the GPU is about ~17 watts; scaling by 12/16 SMs, let's say it's 13 watts on the same 8nm node for the same 1.2GHz clock on Drake. However, we can reduce the clock to 1GHz and see a huge reduction in power consumption; let's err on the side of caution and say 30% for a 17% clock reduction. Now we are looking at 10 watts for Drake's GPU when docked.

I think docked, Drake could be anywhere from 768MHz to 1GHz, offering 2.36TFLOPs to 3TFLOPs. When portable, a 400MHz clock should drastically reduce the GPU power consumption further; let's just go with under 5 watts for the GPU. The big question is: will Nintendo allow Drake to use more energy than the original Switch? I think it is very possible. Battery technology has gotten much better, so there should be room to pull 10 watts (the original Switch was 9 watts with full brightness on a bad screen).

It's also worth noting that it could be 8ULA and not Ampere's current process node; it could even be Samsung's 7nm process node, as we have seen Nvidia shrink an 8nm SoC to 7nm recently. Since the leaker is unsure about the process node, he is the only source for 8nm, and GA10F has very likely seen a few changes over the past 3 years, we can no longer assume it is still 8nm. I also think it is a bit reckless to solidify the idea that they would disable two thirds of the GPU for portable mode; if that were the case, they would just shrink the GPU and run it at a higher clock instead, as it would save money.

I'd caution about using MX570 as an example, as it hasn't released yet, so we don't have any confirmed details, and in particular we don't know what they steady-state clocks are with a 25W TGP (which is just tricky in general because of how much GPU clocks vary and how measures like TDP and TGP are less clear than they used to be). The laptop version of the RTX 3050 is out and about, which is very similar but at a slightly higher TGP, but I can't find any reviews which actually measure clocks for this. If I get the chance I might see if I can downclock my RTX 3070, as GPU-Z reports a GPU package power draw (ie excluding memory or anything else on the board), and I could divide by the relative number of SMs to get a rough idea of what Drake's GPU might consume at these kind of clocks.

That said, I don't think it's impossible that Drake could be on 8nm and use 12 SMs in docked mode, I just feel it's just hitting into higher power consumption territory than I'd personally expect. This plus kopite becoming less certain on the manufacturing process makes me think other nodes might be back on the table.

On disabling SMs, I also argued basically the same thing just a few weeks ago. However, we now have details showing:
  1. A much larger GPU than pretty much anyone expected.
  2. Confirmation that this is the only Ampere GPU to support clock gating.
The evidence seems to support the idea that they're disabling SMs in portable mode. As to why they would do that, rather than just use a smaller GPU and clock it higher, one possible explanation is that they want a close match in terms of performance-per-pixel in both modes. With 4K output in docked mode and 1080p in portable that's a stretch to do without disabling SMs, as you'd need a 4x clock difference between the two modes. Plausibly this could be done with a 400MHz portable/1.6GHz docked, but I don't know how feasible that is. If they're using a 720p display, then there's a 9x difference in resolution, and that just isn't feasible to achieve with clock differences alone. Disabling SMs would allow them to maintain performance-per-pixel parity across both modes without pushing docked clocks too high, which may be enough of a reason for Nintendo.
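The resolution-parity arithmetic behind that last paragraph, as a sketch (the 12/4 SM split is the speculation being discussed, not a confirmed spec):

```python
# How big a throughput gap each display scenario implies, and the docked
# clock multiplier needed with and without disabling SMs in portable.
PIX_4K, PIX_1080, PIX_720 = 3840 * 2160, 1920 * 1080, 1280 * 720
SM_DOCKED, SM_PORTABLE = 12, 4  # speculative split discussed above

for name, gap in [("4K vs 1080p", PIX_4K / PIX_1080),
                  ("4K vs 720p",  PIX_4K / PIX_720)]:
    clocks_only = gap                           # same SM count both modes
    with_gating = gap * SM_PORTABLE / SM_DOCKED
    print(f"{name}: {gap:.0f}x pixels -> {clocks_only:.0f}x clock alone, "
          f"or {with_gating:.2f}x clock with 4/12 SMs portable")
# 4x gap: 1.33x clock with gating; 9x gap: 3x clock with gating
# (e.g. 400 MHz -> 1.2 GHz), versus an infeasible 9x on clocks alone.
```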
 
The evidence seems to support the idea that they're disabling SMs in portable mode... Disabling SMs would allow them to maintain performance-per-pixel parity across both modes without pushing docked clocks too high, which may be enough of a reason for Nintendo.
Well, the reason for wanting a large die with 12 SMs may have to do with the fact that the number of RT cores and tensor cores is tied to SM count. If they want this thing to have significant ray-tracing or AI/DLSS capabilities, they might need more SMs to get more of those dedicated cores.

As for disabling SMs in portable mode, this would necessitate disabling the corresponding RT and tensor cores too, right? Couldn't that cause logic issues if they're using tensor cores for anything else besides DLSS, which is something you yourself have speculated on?
 
Regarding storage speeds:
After publication, Epic asked for a correction here, the original piece quoted Michal Valient as saying the data throughput is 10MB/s - it's 10MB per frame. At 30fps, this would be 300MB per second.
So, for a 60 FPS UE5 title, minimum would be 600MB/s.
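That correction generalizes to a one-liner: the required sustained read speed is just the per-frame streaming budget times the frame rate.

```python
# Epic's corrected figure, generalized.
def required_mb_per_s(mb_per_frame: float, fps: int) -> float:
    return mb_per_frame * fps

print(required_mb_per_s(10, 30))  # 300.0 -> the quoted 30 fps case
print(required_mb_per_s(10, 60))  # 600.0 -> a 60 fps UE5 title
```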
 
With 4K output in docked mode and 1080p in portable, that's a stretch to do without disabling SMs, as you'd need a 4x clock difference between the two modes... Disabling SMs would allow them to maintain performance-per-pixel parity across both modes without pushing docked clocks too high.
But then there's DLSS, which throws a whole wrench in those calculations.

Not to mention RT cores and tensor cores. We don't know whether Nvidia's Rapid Core Scaling tech (part of fourth-generation Max-Q) turns off CUDA cores only; if so, no problem, the RT and tensor acceleration should be intact outside of the clock speed drops.

But if it turns off the full SM, then that is a bit of a knock against the idea.

Now, TBH, if they are adding Rapid Core Scaling and it only turns off the CUDA cores and doesn't touch cache, RT, or tensor cores, then I could even see them expose DynamIQ and Dynamic Boost to developers (of course keeping default docked and portable profiles, and likely a BC profile for both).

That way, devs who want to can tweak the clocks within a per-game budget and move resources between the CPU and GPU far more granularly while optimizing; see the sketch below.

For example: if a game is highly CPU-bound, they can turn off some SMs and pour that power into the CPU, raising clocks up to whatever limit Nvidia/Nintendo set. If a game is highly GPU-bound and more single-threaded, they can draw power away from the less important CPU cores and put it back into the GPU.
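A purely hypothetical sketch of that dev-facing budget idea; nothing like this API is confirmed, and every name and number here is invented for illustration:

```python
# Imaginary per-game power-budget split between CPU and GPU under a
# fixed SoC budget, in the spirit of Dynamic Boost. Not a real API.
SOC_BUDGET_W = 10.0  # invented docked-mode SoC budget

def split_budget(cpu_share: float) -> tuple[float, float]:
    """Return (cpu_watts, gpu_watts) for a chosen CPU share of the budget."""
    cpu_w = SOC_BUDGET_W * cpu_share
    return cpu_w, SOC_BUDGET_W - cpu_w

print(split_budget(0.50))  # balanced default profile
print(split_budget(0.65))  # CPU-bound game: gate some SMs, boost CPU clocks
print(split_budget(0.30))  # GPU-bound game: park spare CPU cores instead
```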
 

