@miamoto - are you okay? You seem extremely angry? I don't know your post history - have you caught a lot of backlash for your opinions in the past? Like, I think some of your analysis doesn't hold up, partially because it seems based on misunderstanding some facts, but I don't think your position is unreasonable. I don't think it's even extreme for the thread, excepting some extreme optimists.
I'm gonna touch on some things where I think you are factually inaccurate, and how I think that puts your analysis on the wrong path, but I don't want to get into a pileup here. You seem upset, and I don't want to add to that.
This isn't true. Memory usage is almost entirely dictated by geometry and texture size, not architecture. Nvidia doesn't have a substantial advantage here.
The RAM differences between AMD and Nvidia cards are a side effect of how the two companies have chosen to bin their dies. AMD's biggest 6000 series card only has 16 GB of memory in it; Nvidia's has much more, at 24 GB.
However, Nvidia continues to use a floorswept version of that largest die for many more cards than AMD does. As they drop GPCs from the die, they also lose a certain number of memory controllers. And GDDR chips are only produced in certain densities, short of custom orders. With only so many controllers and so many chip densities available, Nvidia is placed in a position where they either have to way over-provision RAM or way under-provision RAM, especially on their midrange cards.
This is why the 4060 Ti was such a disappointment. The 4060 started out with 8 GB of RAM, but that's not enough for the level of power the card offers. The Ti offered 16 GB of RAM at 100 dollars more - which is too goddamn much. That's because the hardware for a 12 GB card simply doesn't exist.
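To put napkin math on that memory controller point, here's a quick sketch in Python. The 2 GB-per-chip GDDR6 density and the clamshell trick are general industry facts, not anything from a leak, and the function is just mine for illustration:

```python
# Each 32-bit memory controller hosts one GDDR chip (or two in
# "clamshell" mode, which is how the 16 GB 4060 Ti exists).
GDDR6_DENSITY_GB = 2  # 16 Gbit chips, the common density at the time

def capacity_options(bus_width_bits: int) -> list[int]:
    chips = bus_width_bits // 32           # one chip per 32-bit controller
    return [chips * GDDR6_DENSITY_GB,      # normal: one chip per controller
            chips * GDDR6_DENSITY_GB * 2]  # clamshell: two per controller

print(capacity_options(128))  # [8, 16] -- the 4060 Ti's only choices
print(capacity_options(192))  # [12, 24] -- a 12 GB card needs this wider bus
```

A 12 GB 4060 Ti would have needed a 192-bit bus, i.e. a physically different die with more memory controllers. That's the whole bind.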
This is not true either. Mid-range cards are often floorswept versions of the high end cards, yes, but low-end cards are not. The AD107 die, the basis for the bottom of the RTX 40 range, is sold as a full die; it's not intentionally overdesigned and then cut down. Initially it was the same story for GA107, the RTX 3050 die. Years later, Nvidia would start selling GA107 dies with cores disabled, but it seems unlikely that yields suddenly dropped. It seems much more likely that Nvidia needed an even cheaper low-end card, and had to manually disable some working cores to create product differentiation.
This is actually quite important to understand. It makes sense to overdesign big chips, but not to overdesign small chips.
Chunks of a wafer have errors. Those errors will result in dies that don't work fully. The bigger a chip, the more likely it is to have errors. Some parts of a chip can be duplicated, mainly cores. Some parts cannot, like L2 cache, uncore, IO interconnects, etc.
If you have a larger chip (like an Xbox Series X APU, for example), you can make the GPU 30% bigger while only making the whole chip, say, 15% bigger. That means you get a 15% decrease in the number of dies per wafer, but you gain a 30% buffer on your GPU where you can work around errors. The net result is more working chips per wafer.
But that breaks down for smaller chips. As a chip gets smaller, it becomes less and less likely that it can survive an error, because a same-size error on a same-size wafer hits a larger percentage of the chip. A 30% increase in the GPU size of a small chip means you only get something like a 10% increase in the chip size (because uncore is a much larger percentage of the chip), but only a 0-5% increase in the number of errors you can survive (because you can never survive errors in the uncore, and the chip is so small that you can only survive tiny errors in the cores). That just nets out to fewer working chips per wafer.
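To make that asymmetry concrete, here's a toy Poisson yield model. Every number in it (wafer area aside) is made up purely for illustration: the defect density, the die areas, and how many errors the redundancy can absorb are not real figures for any of these chips.

```python
import math

WAFER_AREA_CM2 = 706.9  # 300 mm wafer, ignoring edge losses
DEFECT_DENSITY = 0.2    # defects per cm^2 -- illustrative only

def good_dies_per_wafer(area_cm2: float, survivable_defects: int) -> float:
    """Expected sellable dies: a die survives if it catches no more
    defects than its core redundancy can absorb (Poisson model)."""
    lam = DEFECT_DENSITY * area_cm2
    p_good = sum(lam**k * math.exp(-lam) / math.factorial(k)
                 for k in range(survivable_defects + 1))
    return (WAFER_AREA_CM2 / area_cm2) * p_good

# Big chip: +30% GPU = +15% die, and the extra cores absorb ~2 defects.
print(good_dies_per_wafer(3.0, 0))         # ~129 good dies per wafer
print(good_dies_per_wafer(3.0 * 1.15, 2))  # ~198 -- overdesign pays off

# Small chip: +30% GPU = +10% die, but it still can't survive a defect.
print(good_dies_per_wafer(0.8, 0))         # ~753 good dies per wafer
print(good_dies_per_wafer(0.8 * 1.10, 0))  # ~674 -- strictly worse
```

Same trade, opposite outcomes, which is the whole reason floorsweeping is a big-die strategy.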
...but this is a highly custom chip with no market for binned/floorswept units. Being able to use binning to create multiple product lines doesn't actually work here. The most cost effective strategy is not to maximize the number of chips that work, period, and then sell them off at different price/performance tiers (like Orin, or a GPU), but to maximize the number of chips that hit the highest-end performance target.
No, that is not what the leak said. There are no chip design leaks that I am aware of. There are Nintendo software leaks, and that software says that production chips have 12 SMs.
Yeah, I buy this.
I think you are roughly correct on power draw. However, to get down to 3-5 W on Orin, you actually need to spec it down below the current Switch. So I suspect the Orin power profiles don't actually apply.
Yeah, I don't think you're in an extreme position there
This is an extreme position. Remember what I said about memory controllers dictating the RAM options available? Well, obviously, T239 is going to use a different memory controller, LPDDR-based instead of GDDR, which changes the economics of what's available.
No one manufactures 4 GB 64-bit LPDDR5 modules. They're useless in the market. Unless Nintendo and Nvidia want to pay extra money for custom RAM, this isn't an option. And as we saw with the 4060 Ti above, it's probably not cost effective to pay extra money for custom RAM.
From a technological point of view, Nintendo will need some RAM for, you know, games. It's not just GPU RAM. Nvidia cards are notorious for being underspecced on RAM, and Nintendo needs system memory on top of whatever the GPU uses. 8 GB seems extremely unlikely.
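Same napkin math as the GDDR case, but with LPDDR. Assuming the commonly-reported 128-bit LPDDR5 bus on T239, and the package densities the thread keeps coming back to (both assumptions, hedged accordingly), the menu looks like this:

```python
# 64-bit LPDDR5 packages that actually ship in volume, in GB.
# There's no mass-market 4 GB part, which is the whole problem.
SHIPPING_DENSITIES_GB = [6, 8]

def lpddr_options(bus_width_bits: int) -> list[int]:
    packages = bus_width_bits // 64  # one 64-bit package per channel pair
    return [packages * d for d in SHIPPING_DENSITIES_GB]

print(lpddr_options(128))  # [12, 16] -- 8 GB just isn't on the menu
```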
I would agree with you, if the NVN2 leak hadn't happened. I understand that many people misunderstand the leak to think that it shows the number of SMs on die, which they can then assume might be floorswept off. But that's not what the leak says. The leak says that Nintendo's operating system and games expect 12 SMs to be there, in production.
I'm not disputing your handheld clocks, but your docked clocks seem like you're just trying to create controversy? Like what is your rationale here?
You're driving a much larger screen with 4x the pixels, but you're allowing as little as a 30% increase in GPU performance? Why? You're not power constrained when docked; you're only constrained by the fan's ability to dissipate heat. The original Switch clocked over double in docked mode, and that docked clock was still on the lower end of what other Maxwell GPUs ran (like the 800M mobile chips).
You think that the RTX design is so power inefficient at clocks this low that you can't double the clock speeds even at 15 W, up from 3 W?
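For a sanity check on that, the usual first-order model is that dynamic power scales with frequency times voltage squared. The operating points below are hypothetical numbers I picked to illustrate the shape of the curve, not leaked figures:

```python
# First-order dynamic power model: P ~ C * V^2 * f.
# At the bottom of the voltage/frequency curve, doubling the clock
# only needs a modest voltage bump, so power rises ~3x, not ~5x.

def relative_power(freq_ghz: float, volts: float) -> float:
    return freq_ghz * volts ** 2  # capacitance C cancels in the ratio

handheld = relative_power(0.5, 0.62)  # hypothetical low-clock point
docked = relative_power(1.0, 0.75)    # doubled clock, small voltage bump
print(f"docked ~= {docked / handheld:.1f}x handheld power")  # ~2.9x
```

A roughly 3x jump in GPU power fits comfortably inside a 3 W to 15 W budget, which is exactly why a 30% docked bump reads as weirdly conservative.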
I mean, again, this isn't even the leak. Nvidia's Linux driver says it will be 8 cores. But yeah, that's about where I expect the clocks to be. Not crazy.
Yeah, that tracks.
Not lashing out, it just seems like you're making up some limits that don't have anything to do with TDP and production processes, like RAM and storage.
I don't think 8nm is an unreasonable node, and I think your analysis on the power limits is spot-fucking-on. But that's actually the reason some folks tend not to think 8nm is the node: your power analysis is exactly right, and it does come from Nvidia, but the chip size (SMs, CPU cores) also comes from Nvidia, and those are production numbers. You seem not to have known that the numbers came not from a spec sheet or a chip design but from software, and assumed that Nvidia would need to floorsweep the chip to hit the power targets.
That makes sense, except that it takes Orin, which is a different design, as an Absolute Accurate Reference for the power consumption of T239, while treating leaks about T239 itself as Squishy Malleable References for T239's chip size. There are three major possibilities.
The power numbers apply to T239, and the chip size is right, so it's not 8nm. This is maybe the loudest contingent of the thread. If you put a gun to my head, this is where I fall, but please don't put a gun to my head.
The power numbers apply to T239, and it's on 8nm, so the chip size is wrong. This is your position, and a pretty common position in other parts of the internet. As I have tried to show here, that doesn't actually jibe with what leaked, just with what people assume leaked, and it doesn't really jibe with how Nvidia handles their small dies. That's why this is not my position, and I find it the least likely.
It's on 8nm for all the reasons you describe, and the chip size is right for all the reasons I describe, so Orin's power limits must not apply. There is substantial evidence for this as well: in both the Nvidia hack and in the Linux drivers there is evidence of power saving tech that Orin doesn't have, and AMD was able to increase the Steam Deck's power efficiency just by tuning power curves for the chip. This has been my position historically, and something I still think has lots of merit. It's just that the amount of savings required seems pretty high.
By way of comparison, I'll put down my current predictions as well.
12 SM GPU
Just straight out of the leak. If this is floorswept, then I'll count this as a serious failure of analysis on my part. I'll take the L.
500 MHz handheld, 1 GHz docked.
Not much higher than your numbers, and not much higher than the current Switch. Thraktor has made a really good argument about likely clock speeds, but I still tend toward some slight pessimism here.
12 GB of RAM.
Rumors have pushed this up to 16 GB, but I'm still holding to the older 12 GB prediction. We've gone round and round 1000 times about the availability of 64-bit 4 GB modules; they just aren't out there. Nintendo is not historically conservative about RAM - but RAM isn't getting cheap as fast as it once did. 12 GB is still really nice compared to last gen, and I think it's a good bang-for-the-buck cost savings.
8 A78C cores
Straight out of Nvidia's Linux drivers. I think it's funny that no one suggests floorsweeping CPU cores, even though the logic would be the same.
1.5 GHz
I think this is one of the more flexible specs here, and I wouldn't be surprised by something wildly different. CPU performance is less of a premium, but also, the ARM cores are super efficient at these clocks. So it really depends on how much room there is in the power budget. I wouldn't be completely surprised by 2.0 GHz, or by something as low as 1.2 GHz.
You'll note I don't list the node. That's because I don't much give a shit, and because, as I've said, I'm really flaky on that part. I'm more confident about the performance target based on the hardware leaks than I am in how they achieve it. I can give more analysis if you like, but I feel like I've gone on too long already. I just want it to boil down to - I actually agree with most of what you're saying, but a small change in data (the nature of the leak, the odds of binning/floorsweeping) takes me in a substantially different direction.