Just catching up on the Jetson Orin info from GTC. There's not much surprising about it, with the exception of RT, but it's nice that we've got a white paper on the architecture, as I'd imagine Dane will be very similar. On the RT cores, the fact that they're there at all is a surprise, but perhaps more interesting is the choice to include half as many of them as desktop Ampere. It almost seems as if they're there for compatibility reasons, or maybe they found some limited automotive use-cases for them and decided to keep some limited functionality there. In any case, it does increase the likelihood that we see RT cores in Dane, but it lowers the expected performance of those cores even further (from a pretty low base), so I still have low expectations of many games making extensive use of them.
There is one thing which we can infer from the photos provided, though, which is the size (and therefore transistor density) of Orin. In particular, the Jetson AGX Orin and Jetson Orin NX pages both provide nice head-on photos of the boards, which makes calculating the die size easy. As these don't show the actual bare die (just a grey rectangle with the Nvidia logo), I also used the photo in this press release, which shows an actual bare die, but is lower resolution and at an awkward angle.
Using each of the three photos, the calculation in each case comes to a 22.1mm x 20.8mm die (± about 0.1mm), for a die size of approx 460mm2. This tells us a few things:
- The Jetson Orin NX chip is the full Orin die, just binned with parts disabled. This is as I would have expected, but good to get confirmation.
- The Orin die has a density of approx 45.6 million transistors per mm2, assuming 21 billion transistors is still correct.
- This is in line with the density of GA102, GA104, etc., so it's likely using an identical 8N manufacturing process, and isn't using higher-density mobile libraries.
So, if we're to assume about a 100mm2 die size for Dane, a transistor count of about 4.5 billion seems likely. This compares to 2 billion transistors for the TX1/Mariko chips used in existing Switch models.
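For anyone who wants to redo the arithmetic above, here is a minimal Python sketch. The measured dimensions, the 21 billion transistor count and the 100mm2 Dane figure are taken straight from the post; the function and variable names are just made up for illustration, and the Dane estimate assumes the same density as Orin.

```python
# Die dimensions measured from the photos (mm) and the quoted transistor count.
ORIN_WIDTH_MM = 22.1
ORIN_HEIGHT_MM = 20.8
ORIN_TRANSISTORS = 21e9   # "assuming 21 billion transistors is still correct"
DANE_AREA_MM2 = 100.0     # assumed Dane die size, as in the post


def die_area(width_mm, height_mm):
    """Area of a rectangular die in mm^2."""
    return width_mm * height_mm


def density_mtr_per_mm2(transistors, area_mm2):
    """Transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


orin_area = die_area(ORIN_WIDTH_MM, ORIN_HEIGHT_MM)
orin_density = density_mtr_per_mm2(ORIN_TRANSISTORS, orin_area)

# Assumption: Dane is built at the same density as Orin.
dane_transistors = orin_density * 1e6 * DANE_AREA_MM2

print(f"Orin area:    {orin_area:.1f} mm^2")
print(f"Orin density: {orin_density:.1f} MTr/mm^2")
print(f"Dane at {DANE_AREA_MM2:.0f} mm^2: {dane_transistors / 1e9:.2f} billion transistors")
```

The result lands within rounding of the approx 460mm2, 45.6 million per mm2 and 4.5 billion figures quoted above.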
I meant to reply to this post earlier, but forgot and only remembered now.
22.1mm=2000 pixels
20.8mm=1495 pixels
GPU(2048 cores) + cache memory + interconnect=~861x1142 pixels
9.5mm x 15.88mm = ~150.86mm^2 (2048 cores)
CPU + Caches= ~693x450
7.65mm x 6.26mm = ~47.89mm^2 die space (12 cores)
This does not include the DLA, MCs, PVA, etc., just the GPU + Cache and the CPU+Cache.
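The pixel-to-mm conversion above can be sketched in Python as follows. Note that the two reference measurements imply a different mm-per-pixel scale on each axis, so this sketch scales each axis independently; that is my reading of the numbers, and the helper names are hypothetical. The results differ slightly from the figures above because those use rounded dimensions.

```python
# Reference measurements from the post: full die size in mm and in pixels.
DIE_MM = (22.1, 20.8)
DIE_PX = (2000, 1495)


def px_to_mm(size_px, die_px=DIE_PX, die_mm=DIE_MM):
    """Convert a (width, height) in pixels to mm, scaling each axis separately."""
    return tuple(p * mm / ref for p, mm, ref in zip(size_px, die_mm, die_px))


def block_area_mm2(size_px):
    """Area in mm^2 of a rectangular block measured in pixels."""
    width_mm, height_mm = px_to_mm(size_px)
    return width_mm * height_mm


# Block sizes in pixels as measured in the post.
blocks = {
    "GPU + cache + interconnect (2048 cores)": (861, 1142),
    "CPU + caches (12 cores)": (693, 450),
}

for name, size_px in blocks.items():
    width_mm, height_mm = px_to_mm(size_px)
    print(f"{name}: {width_mm:.2f}mm x {height_mm:.2f}mm = {block_area_mm2(size_px):.2f}mm^2")
```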
What if it were cut down, perhaps?
i.e., 512-1024 CUDA cores + 8 CPU cores?
9.5mm x ~7.94mm = ~75.47mm^2 (1024 CUDA Cores)
9.5mm x ~5.95mm = ~56.603mm^2 (768 CUDA Cores)
9.5mm x ~3.96mm = ~37.64mm^2 (512 CUDA Cores)
This is about how much of the die these would occupy for the shaders + the cache + the interconnect, if it remains at roughly 45MTr per mm^2. Though, as GA106 has shown they can go a bit denser, with it being 48MTr per mm^2, can't they make it a bit denser still to fit as much as possible in a small package? Unsure. It is supposed to be a derivative, correct? So it likely doesn't have to follow the Orin specifications to a T.
~7.65mm x ~4.21mm = ~32.249mm^2 (8 CPU Cores)
~3.82mm x ~6.26mm = ~23.96mm^2 (6 CPU Cores)
~3.82mm x ~4.17mm = 15.94mm^2 (4 CPU Cores + no little cores)
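The cut-down figures above boil down to scaling each block's area with its core count. A rough Python sketch of that, assuming area scales linearly with cores (which ignores shared logic that would not shrink); the names are made up for illustration, and the outputs differ slightly from the numbers above since those were worked from rounded side lengths:

```python
# Full-size block areas estimated from the Orin die photo (from the post).
GPU_AREA_MM2 = 150.86   # 2048 CUDA cores + cache + interconnect
GPU_CORES = 2048
CPU_AREA_MM2 = 47.89    # 12 CPU cores + caches
CPU_CORES = 12


def scaled_area(full_area_mm2, full_cores, cores):
    """Area of a cut-down block, assuming area is proportional to core count."""
    return full_area_mm2 * cores / full_cores


for cuda_cores in (1024, 768, 512):
    area = scaled_area(GPU_AREA_MM2, GPU_CORES, cuda_cores)
    print(f"{cuda_cores} CUDA cores: ~{area:.2f}mm^2")

for cpu_cores in (8, 6, 4):
    area = scaled_area(CPU_AREA_MM2, CPU_CORES, cpu_cores)
    print(f"{cpu_cores} CPU cores: ~{area:.2f}mm^2")

# Example combination: GPU and CPU blocks only, nothing else on the die.
combo = scaled_area(GPU_AREA_MM2, GPU_CORES, 1024) + scaled_area(CPU_AREA_MM2, CPU_CORES, 8)
print(f"1024 CUDA cores + 8 CPU cores: ~{combo:.2f}mm^2")
```

As with the measurements, this covers only the GPU and CPU blocks with their caches, not the memory controllers or anything else.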
When looking at the Tegra X1 die shot, you mentioned that the SM logic extends beyond that, but that raises the question: even with the smallest configuration of, say, 4 SMs + 4 CPU cores, wouldn't this extend beyond 100mm^2 as a die size regardless, unless they go with 2 SMs again? And 4 cores? The 4 CPU cores are the closest match to the 4 A57 CPU cores in die space, which Locuza noted to be in the 13mm^2 range.
And it was brought up to me, but why do they have to limit it to 100mm^2 or lower? I understand having a small chip, but don't understand why they necessarily need to in this context. If they stretch it to, say, 120? 130? 140? 150? Or hell, 160mm^2 if they need to, what would be the issue here really? I understand that a smaller chip means a lower cost and more undamaged chips per wafer, since each die is separate and individual, but I don't think the penalty is that severe for a small chip like this compared to the PS5 or Series X APUs, which are far bigger and where a defective die is more costly. I'm not really saying that they should go for 160, mind you, I just don't see why they need to be at 100mm^2 or less here.
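To put a rough number on that cost question, here is a back-of-the-envelope Python sketch using the common dies-per-wafer approximation and a simple Poisson yield model. Everything here is an assumption for illustration: the 300mm wafer, and especially the defect density of 0.1 per cm^2, which is a placeholder and not a real figure for Samsung 8N. It also ignores packaging, testing and salvaging partially defective dies.

```python
import math

WAFER_DIAMETER_MM = 300.0        # assumption: standard 300mm wafer
DEFECT_DENSITY_PER_CM2 = 0.1     # assumption: placeholder, not a real 8N figure


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(gross - edge_loss)


def poisson_yield(die_area_mm2, defects_per_cm2=DEFECT_DENSITY_PER_CM2):
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


def good_dies_per_wafer(die_area_mm2):
    return dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2)


def relative_cost(die_area_mm2, baseline_mm2=100.0):
    """Cost per good die relative to the baseline, for a fixed wafer price."""
    return good_dies_per_wafer(baseline_mm2) / good_dies_per_wafer(die_area_mm2)


for area in (100, 120, 140, 160):
    print(f"{area}mm^2: {dies_per_wafer(area)} dies/wafer, "
          f"yield {poisson_yield(area):.1%}, "
          f"relative cost per good die {relative_cost(area):.2f}x")
```

Under these made-up inputs the cost per good die grows somewhat faster than the area does, since a bigger die both fits fewer times on the wafer and is more likely to catch a defect. How much that matters depends on the real defect density and wafer price, which I don't know.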
It’s not like there isn’t a possible benefit of using a larger chip while having it at lower clocks which we reasonably expect, in that it could be easier to cool due to the higher amount of surface area.
If it is the highest config (8 SMs + 8 CPU cores) and it uses the lowest density, I suspect it would come to ~160mm^2 for the total die size with their own modifications to it. The PS5 and Series X APUs make good use of the area that is taken up by the GPU, CPU, memory controllers, media engines, etc., and don't seem to take up too much with respect to the logic. Isn't that possible here as well for Dane?
And then there is the physical space in the Switch. The original TX1, even being 18% larger than Mariko, didn't take up that much space, and Nintendo even managed to squeeze the overall package smaller with the OLED revision, or rather, kept things more merged together at the cost of a smaller fan (because it didn't need to be that large?).
So, again, I fail to see why they have to keep the die so small. They seem to have a bit more elbow room for this and can squeeze in a bit more without having to get blood from a stone. Mind you, again, I don't think it will be that large, I just don't see why it needs to be so small. Or why it would be that small.
Would it really be that much more expensive going from a 100mm^2 die to, say, 120, 140 or even 160?
And this is assuming that it isn't denser than the desktop equivalent at all, though the GA106 die is denser than its sister dies.