I've been doing a bit of research into this, and I'm actually not sure that the bolded is correct. Samsung 8N is surely the cheapest plausible node per wafer, but once you take into account density and yields, it's entirely possible that a more advanced node like TSMC N5 is actually cheaper per chip. In fact, my back-of-a-paper-envelope maths suggests that a Samsung 8N Drake could cost
70% more than a TSMC N5 Drake.
I should emphasise that I have no expertise in this field, my analysis contains a lot of assumptions and estimations which may deviate significantly from reality, and you shouldn't take what I'm about to write any more seriously than any other random person on the internet. That said, I can run through the maths of it.
A few pages back, I posted
an estimate of Drake's die size on various manufacturing processes. I've revised those figures in two ways since then. The first is that I'm now estimating Drake's transistor count at around 8 billion. This is based partly on Nvidia's Orin die photo actually showing an older 17 billion transistor configuration of that chip, and partly on the fact that Xbox Series S's "Lockhart" SoC
reportedly comes in at 8 billion transistors itself. Lockhart has the same number of CPU cores (8) and GPU shader "cores" (1536) on the silicon as Drake, but we know that the Zen 2 CPU is larger and uses more transistors than the A78, and RDNA2 similarly is larger and uses more transistors per "core" than Ampere. There are some differences between Drake and base Ampere, though: the 4MB of L2 cache will add considerably to the total (based on the GA102 die, it looks like it could be around 1.3 billion transistors for that alone), and there might be some additional components on there courtesy of Nvidia that Nintendo don't really need, but which might be useful for Nvidia's other customers (eg an 8K codec block). I'm just going with 8 billion as a round figure, but again there's a large margin of error.
The second change is that I'm changing my estimate for TSMC N7->N6 density improvement from 18% (TSMC's claim) to 8.1% (actual measured improvement from Navi 23 to Navi 24). That being the case, my new estimates are as follows:
| Process | Density (MTr/mm²) | Drake size (mm²) |
| --- | --- | --- |
| Samsung 8nm | 45.6 | 175.4 |
| Samsung 7nm | 59.2 | 135.1 |
| Samsung 5nm | 83.4 | 95.9 |
| Samsung 4nm | 109.7 | 72.9 |
| TSMC N7 | 65.6 | 122.0 |
| TSMC N6 | 70.9 | 112.8 |
| TSMC N5 | 106.1 | 75.4 |
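The die sizes in the table are just the transistor estimate divided by density, so if you want to play with the numbers yourself (all density figures are my own estimates, as above):

```python
# Die-size estimate: assumed ~8 billion transistors divided by the
# estimated density of each process (MTr/mm^2 = millions of transistors
# per mm^2). All figures here are my own rough estimates.
transistors_mt = 8000  # ~8 billion transistors, in millions

density_mt_per_mm2 = {
    "Samsung 8nm": 45.6, "Samsung 7nm": 59.2, "Samsung 5nm": 83.4,
    "Samsung 4nm": 109.7, "TSMC N7": 65.6, "TSMC N6": 70.9, "TSMC N5": 106.1,
}

die_size_mm2 = {p: transistors_mt / d for p, d in density_mt_per_mm2.items()}
for process, size in die_size_mm2.items():
    print(f"{process}: {size:.1f} mm^2")  # e.g. TSMC N5: 75.4 mm^2
```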
In terms of cost per wafer, my starting point was the figures shown in
Ian Cutress's video on wafer prices (which, incidentally, is very informative if you're curious about how this kind of stuff works). This contains wafer cost figures for many of TSMC's nodes. It's important to note that these numbers are a few years old at this point, and the exact prices per wafer have surely changed (in fact, they've probably gone down and come back up again since then). However, I'm not really interested in the absolute numbers, but rather the relative costs across different processes. The cost Nintendo pay for a Drake chip has a lot of other factors involved (packaging, testing, and obviously Nvidia's margins) which are difficult to estimate, so it's simpler to think about costs in relative terms.
The costs per wafer (in USD) quoted in that video for more recent nodes are:
| 28nm | 20nm | 16nm | 10nm | 7nm |
| --- | --- | --- | --- | --- |
| 2,361.84 | 2,981.75 | 4,081.22 | 5,126.35 | 5,859.28 |
These are just TSMC nodes, and this predated their 5nm processes. To estimate the 5nm wafer costs, I'm relying on
this chart which TSMC released in mid-2021, showing the relative wafer manufacturing capacity of 16nm, 7nm and 5nm process families. This shows that the capacity of 7nm relative to 5nm in 2020 was 3.87:1, and the estimated capacity ratio in 2021 is shown as 1.76:1. We also know from
TSMC's 2021 Q4 financials that 5nm accounted for 19% of revenue in 2021, compared to 31% for 7nm. The capacity figure from the chart doesn't reflect actual output, and it seems to reflect installed capacity at year-end, which obviously wouldn't have been in operation over the entire year they're reporting revenue for. Therefore, if we assume that 5nm capacity was added uniformly over the year, the average 5nm capacity in operation during 2021 sits halfway between the 2020 and 2021 year-end figures, which works out to TSMC producing about 2.4x as many 7nm wafers as 5nm wafers over the course of 2021. With a 1.63x ratio of revenue between the two nodes, we can estimate that the revenue per wafer was approximately 47% higher for 5nm than for 7nm. This would put a 5nm wafer at $8,622.76. Again, this may not be the correct absolute figure, but I'm mostly interested in whether the relative prices are accurate.
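For anyone who wants to check my working, the 5nm wafer price estimate falls out of the capacity and revenue ratios like this (the small difference from $8,622.76 in my figure above is just down to rounding intermediate values to 2.4x and 1.63x):

```python
# Estimate a TSMC 5nm wafer price from capacity and revenue ratios.
# Inputs are from TSMC's capacity chart and 2021 Q4 financials (see text);
# everything else is back-of-envelope.
ratio_7_to_5_end_2020 = 3.87   # 7nm:5nm installed capacity, year-end 2020
ratio_7_to_5_end_2021 = 1.76   # 7nm:5nm installed capacity, year-end 2021

# Assume 5nm capacity was added uniformly through 2021, so the average
# 5nm capacity in operation (relative to a fixed 7nm) is the mean of the
# two year-end figures.
avg_5nm_capacity = (1 / ratio_7_to_5_end_2020 + 1 / ratio_7_to_5_end_2021) / 2
wafer_ratio = 1 / avg_5nm_capacity   # ~2.4x as many 7nm wafers as 5nm

revenue_ratio = 31 / 19              # 7nm vs 5nm share of 2021 revenue, ~1.63x
price_premium = wafer_ratio / revenue_ratio  # revenue per wafer, 5nm vs 7nm

wafer_cost_7nm = 5859.28             # from the wafer price table above
wafer_cost_5nm = wafer_cost_7nm * price_premium
print(f"7nm:5nm wafers {wafer_ratio:.2f}, 5nm wafer ${wafer_cost_5nm:,.2f}")
```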
So, onto the cost per die. To do this we have to estimate the number of dies per wafer, for which I use
this yield calculator. I take the die sizes above and assume all dies are square. For the defect density, I'm using a figure of 0.1 defect/cm2, which is based on
this Anandtech article. It's likely yields are actually a bit better than this by now, but it won't make a huge difference to the analysis.
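The basic shape of such a calculator is straightforward, if you want a rough feel for it. This is a simplified sketch using a standard gross-die approximation and a Poisson yield model; the linked calculator uses a more detailed model (edge exclusion, scribe lines, and so on), so its outputs won't match this exactly:

```python
import math

# Rough sketch of a die-yield calculation: gross dies that fit on a
# 300mm wafer (minus an edge-loss term), multiplied by the fraction of
# dies expected to be defect-free under a Poisson model.
def good_dies_per_wafer(die_area_mm2, defect_density_cm2, wafer_diameter_mm=300):
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Classic gross-die approximation: area term minus edge losses.
    gross = (wafer_area / die_area_mm2
             - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))
    # Poisson yield: probability a die of this area has zero defects.
    # (die area converted from mm^2 to cm^2)
    yield_fraction = math.exp(-defect_density_cm2 * die_area_mm2 / 100)
    return int(gross * yield_fraction)

# Smaller dies win twice over: more candidates per wafer AND better yield.
print(good_dies_per_wafer(75.4, 0.1))   # ~75mm^2 N5 Drake estimate
print(good_dies_per_wafer(122.0, 0.1))  # ~122mm^2 N7 Drake estimate
```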
| Process | Die area (mm²) | Dies per wafer | Cost per wafer ($) | Cost per die ($) | Cost per die ratio |
| --- | --- | --- | --- | --- | --- |
| TSMC N7 | 122.0 | 427 | 5,859.28 | 13.72 | 1.15 |
| TSMC N6 | 112.8 | 462 | 5,859.28 | 12.68 | 1.06 |
| TSMC N5 | 75.4 | 723 | 8,622.76 | 11.93 | 1.00 |
For N6, TSMC are probably charging a bit more per wafer than N7, but as I have no way of estimating this, I'm just leaving the price per wafer the same. The actual cost per die here won't be even close to what Nintendo will have to pay, both because of the old wafer prices being used and because packaging, testing and Nvidia's margins are added on top. However, the cost per die ratio in the last column is independent of those things. I've chosen TSMC N5 as the baseline, and you can see that N7 and N6 actually come out as more expensive per die than N5. The dies-per-wafer column gives you a clue as to why: the substantial density increase of N5 (plus the smaller die resulting in a better yield rate) means that even a significantly more expensive wafer doesn't necessarily mean more expensive chips.
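The last two columns of the table are then just division, using the dies-per-wafer counts from the yield calculator and the wafer prices estimated above:

```python
# Cost per die and ratio relative to TSMC N5, using the dies-per-wafer
# counts from the yield calculator and my estimated wafer prices.
wafer_cost = {"TSMC N7": 5859.28, "TSMC N6": 5859.28, "TSMC N5": 8622.76}
dies_per_wafer = {"TSMC N7": 427, "TSMC N6": 462, "TSMC N5": 723}

cost_per_die = {p: wafer_cost[p] / dies_per_wafer[p] for p in wafer_cost}
baseline = cost_per_die["TSMC N5"]  # ~$11.93
ratio = {p: c / baseline for p, c in cost_per_die.items()}

for p in cost_per_die:
    print(f"{p}: ${cost_per_die[p]:.2f} per die, ratio {ratio[p]:.2f}")
```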
For the Samsung manufacturing processes, I haven't been able to find any information (even rough estimates) on wafer costs, or wafer output and revenue splits that might be used to estimate revenue per wafer. However, we can look at the cost per wafer required to hit a cost per die ratio of 1.0 (ie the same cost per die as TSMC N5) and evaluate whether that's feasible. For defect density on 5nm I'm going to use 0.5, as it was rumoured to be producing 50% yields for mobile SoCs that should be roughly 100mm2 in size. For 8nm defect density it's a bit trickier, but I'm estimating 0.3 defects per square cm, based on the product distribution of Nvidia's desktop GPUs (if it were lower, they wouldn't have to bin quite so heavily; if higher, they wouldn't be able to sell full-die chips like the 3090 Ti at all). These are only very rough estimates, so I'll also look at a range of values for both.
| Process | Defect density (per cm²) | Dies per wafer | Cost per wafer ($) at 1.00 ratio |
| --- | --- | --- | --- |
| Samsung 5nm | 0.5 | 383 | 4,569.19 |
| Samsung 5nm | 0.3 | 459 | 5,475.87 |
| Samsung 8nm | 0.5 | 148 | 1,765.64 |
| Samsung 8nm | 0.3 | 201 | 2,397.93 |
| Samsung 8nm | 0.1 | 280 | 3,340.40 |
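The break-even wafer prices in the last column are just the N5 baseline cost per die multiplied by the dies per wafer at each assumed defect density (any tiny differences from the table are rounding):

```python
# Wafer price Samsung would need to charge to match TSMC N5's cost per
# die, given dies-per-wafer at each assumed defect density.
target_cost_per_die = 8622.76 / 723  # TSMC N5 baseline, ~$11.93

dies_at_density = {                  # (process, defects/cm^2): dies per wafer
    ("Samsung 5nm", 0.5): 383, ("Samsung 5nm", 0.3): 459,
    ("Samsung 8nm", 0.5): 148, ("Samsung 8nm", 0.3): 201,
    ("Samsung 8nm", 0.1): 280,
}

breakeven_wafer = {k: target_cost_per_die * n for k, n in dies_at_density.items()}
for (process, density), price in breakeven_wafer.items():
    print(f"{process} at {density} defects/cm^2: ${price:,.2f} per wafer")
```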
Samsung's 5nm processes are a bit more realistically priced here. They're most comparable to TSMC's 7nm family in terms of performance and efficiency, and if they've got the defect density down to 0.3 then they could charge a similar amount per wafer to TSMC N7 and be competitive on a per-chip cost. If the defect density is actually 0.5, then they'd have to be much more aggressive on price per wafer, coming in below TSMC 10nm, and not that far off TSMC's 16nm family. Note that the manufacturing costs on Samsung's side are likely quite a bit higher for their 5nm processes than even TSMC's N7, as Samsung are using EUV extensively in their 5nm process, so there's only a limited extent to which they can be aggressive on price.
On the 8nm side, wafer costs get a lot more unrealistic if we're trying to assume that they can be competitive on a cost per die basis with N5. If we use the 0.3 defect density estimate, then they'd have to charge about $2,400 per wafer for N8, which is basically the same as TSMC's 28nm process. Keep in mind that Samsung have their own 28nm and 14nm processes that are pretty competitive with TSMC's 28nm and 16nm families, which means Samsung would either have to be charging a similar amount for an 8nm wafer as they charge for a 28nm wafer, or they are massively undercharging for their 28nm and 14nm processes if they're proportionally cheaper than 8nm. Both of these seem very unlikely. Even with only a 0.1 defect density (similar to TSMC's processes), they would have to charge $3,340 per wafer, which is quite a bit less than TSMC 16nm.
If we assume the cheapest Samsung could charge for an 8nm wafer is the same as a TSMC 16nm wafer (which would make it very aggressively priced), and the defect density is 0.3, the cost per die would be $20.30, which gives a cost per die ratio of 1.70, or 70% more expensive than the same die on TSMC N5. This is even ignoring the significant performance and efficiency benefits of going with TSMC's N5 process over Samsung's 8nm process.
We can also plug Mariko into these numbers to figure out a relative cost. For the Mariko die size, I measured some photos I found online against the original TX1, and it looks to be approximately 10.1mm by 10.2mm. With an assumed 0.1 defect density on 16nm, this would give 507 dies per wafer, and therefore $8.05 per die. Again, this doesn't represent the actual price Nintendo pay, but it means a TSMC N5 Drake (with about 4x the transistor count) would cost about 50% more than Mariko does.
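The Mariko comparison works out like this (the 507 dies-per-wafer figure comes from the same yield calculator, with my measured ~103mm² die):

```python
# Rough Mariko (TSMC 16nm) cost per die vs an N5 Drake, using the
# 16nm wafer price from the table above and the yield calculator's
# dies-per-wafer count for a ~10.1 x 10.2mm die at 0.1 defects/cm^2.
mariko_wafer_cost = 4081.22
mariko_dies_per_wafer = 507
mariko_cost_per_die = mariko_wafer_cost / mariko_dies_per_wafer  # ~$8.05

drake_n5_cost_per_die = 8622.76 / 723                            # ~$11.93

premium = drake_n5_cost_per_die / mariko_cost_per_die - 1
print(f"N5 Drake vs Mariko: {premium:.0%} more per die")
```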
This might explain why Nvidia is moving so aggressively onto TSMC's 5nm process. I had assumed that they would keep lower-end Ada chips on Samsung 8nm, or maybe Samsung 5nm, but this would suggest that it's actually cheaper per chip to use TSMC 5nm, even before the clock speed and efficiency benefits of the better node. It also, from my perspective, makes Drake's 12 SM GPU a lot more reasonable. For an 8nm chip in a Switch form factor, 12 SMs is much more than any of us expected, but if you were to design a TSMC N5 chip for a Switch-like device, 12 SMs is actually not excessive at all. It's a small ~75mm2 die, and there shouldn't be any issue running all 12 SMs at reasonable clocks in both handheld and docked modes. Yields would be extremely high, and as TSMC N5 will be a very long-lived node, there would be no pressure to do a node shrink any time soon.
Now, to caveat all of this again: I'm just a random person on the internet with no relevant expertise or insight, so it's entirely possible (probable?) that there are inaccurate assumptions and estimates above, or just a straightforward misunderstanding of how these things work. So take it all with a huge grain of salt. Personally, I still think 8nm is very likely, possibly even more so than TSMC N5, but I think it's nonetheless interesting to run through the numbers to test my assumptions.