• Hey everyone, staff have documented a list of banned content and subject matter that we feel are not consistent with site values, and don't make sense to host discussion of on Famiboards. This list (and the relevant reasoning per item) is viewable here.

StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

DLSS 3.0 sounds like that old frame generation thing that was shown off for Arkham Asylum or one of those games.
Does anyone remember what I mean? Because I barely do, but it was something about using motion vectors to generate an in-between frame.
 
Thanks..........I think I understand. Kind of. Maybe? Haha.

I guess I'm still kinda stuck on why emails from an @nvidia address have to be public. I assume this was a mistake, obviously, since it gave us info about T239; I wonder if there was some other channel that NVIDIA individual (or whoever) should have been using so it wasn't added to the archive. Unless the archive is a rule if you're dealing with Linux.
This patch is intended for the mainline kernel, and the public mailing lists are the correct path for that, so this isn’t an accidental leak. This is totally normal for kernel development. The only unusual thing here is the fan scrutiny on this particular product.
 
I look forward to speculating about Switch 2 Thor starting in 2025. I just need to know if I have to use "historical" Thor or Marvel Thor.
Given Nvidia’s SoC naming conventions (Parker, Xavier) it’s 100% Marvel Thor.

They should demo Drive Thor in a van with Thor airbrushed on the side
 
Thanks..........I think I understand. Kind of. Maybe? Haha.

I guess I'm still kinda stuck on why emails from an @nvidia address have to be public. I assume this was a mistake, obviously, since it gave us info about T239; I wonder if there was some other channel that NVIDIA individual (or whoever) should have been using so it wasn't added to the archive. Unless the archive is a rule if you're dealing with Linux.
Because the emails were sent to a public mailing list. That's just how Linux kernel development works.

Even if the mailing list itself wasn't public, we'd still have most/all of the same information from version control, anyway.
Does the T239 Linux mention mean the chip has taped out?
Probably, I wouldn't expect them to upstream it unless the hardware was finalized.
 
Even if Switch Drake can't do DLSS 3.0, could it still get updates from the DLSS 2.x build? I think 2.4 was the most recent release this past August, but could it still get updates where it's DLSS 2.5, 2.6, etc. that offer additional features?
 
Drake's potential power draw is probably what's limiting the frame interpolation aspect of DLSS 3.0
Also lacks Ada's Optical Flow Accelerator and much faster Tensor cores. The OFA specifically is what powers the frame interpolation, from what I'm reading. My 3080 doesn't share the T239's power limitations and I'm not getting 3.0 either. :cry:
 
the price of these new GPUs, yikeeesss. Thankfully I still have my 3080 in working condition
 
I guess it’s not really accurate to call it interpolation, since it’s not drawing information from future frames. It’s more like extrapolation.
Isn't that interpolation? You have point A and point E and you use those to generate points B, C and D?

edit: Nevermind, I think I was misunderstanding the sequence of events in DLSS 3.0.
 
Are we sure T239 will use A78Cs? Does the A710/A715 not support an 8-core cluster?

Also, I remember someone saying that Drake may have features from Ada, just like Erista has features from Pascal even though it is technically Maxwell. My question is, could these features include support for DLSS 3.0?
 
yall need jesus



Bandai Namco won't port it.
 
Wasn't there a rumour that Drake was based on an updated Ampere with some backported Lovelace features? Maybe some DLSS 3 enhancements could be used for Drake's version of DLSS
I mean, in theory. People have mostly assumed they were referring to AV1 encode.
 
Also lacks Ada's Optical Flow Accelerator and much faster Tensor cores. The OFA specifically is what powers the frame interpolation, from what I'm reading. My 3080 doesn't share the T239's power limitations and I'm not getting 3.0 either. :cry:
Yeah, the power draw is why I think it's not there. It's impossible to know just how much more power the tensor cores need now, but it's definitely more.

Are we sure T239 will use A78Cs? Does the A710/A715 not support an 8-core cluster?

Also, I remember someone saying that Drake may have features from Ada, just like Erista has features from Pascal even though it is technically Maxwell. My question is, could these features include support for DLSS 3.0?

they do support an 8-core cluster, but they're too new for when Drake started production
 
So what the hell is lore.kernel.org? The Google search preview says "This is a listing of public inboxes, see the `mirror' link of each inbox for instructions on how to mirror all the data and code on this site."

Why are these inboxes public and on that site!? I don't get it.

This is how Linux kernel development (and that of some other OSS projects) works. What you are seeing there is a mailing list being published on a traditional, HTTP-accessible website as a way of preserving and accessing it.

This has been the way the kernel has been developed since it was born. It could be seen as cumbersome by today's internet collaboration standards, but it's something that has worked from the 90s up until now. The archives exist because without them it would be impossible to follow the discussion unless you were subscribed to the mailing list, and to the specific discussions that arise there. There are some other archives out there. For example, the discussion that brought us the T239 support into the kernel can also be accessed here:


LKML is just another archive of the Linux Kernel Mailing List.


And to expand on this, the specific file in the Linux kernel source code that covers the Tegra cores, CPU frequencies, etc. is here:


Nvidia is a Linux kernel contributor. So an engineer from them makes the necessary changes in the code to add support for T239 and writes to the mailing list, proposing some changes to that file:

C:
> diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c
> index 1216046cf4c2..f38a760da61b 100644
> --- a/drivers/cpufreq/tegra194-cpufreq.c
> +++ b/drivers/cpufreq/tegra194-cpufreq.c
> @@ -38,14 +38,6 @@
>  /* cpufreq transisition latency */
>  #define TEGRA_CPUFREQ_TRANSITION_LATENCY (300 * 1000) /* unit in nanoseconds */
>
> -enum cluster {
> -    CLUSTER0,
> -    CLUSTER1,
> -    CLUSTER2,
> -    CLUSTER3,
> -    MAX_CLUSTERS,
> -};
> -
>  struct tegra_cpu_ctr {
>      u32 cpu;
>      u32 coreclk_cnt, last_coreclk_cnt;
> @@ -67,12 +59,12 @@ struct tegra_cpufreq_ops {
>  struct tegra_cpufreq_soc {
>      struct tegra_cpufreq_ops *ops;
>      int maxcpus_per_cluster;
> +    size_t num_clusters;
>      phys_addr_t actmon_cntr_base;
>  };
>
>  struct tegra194_cpufreq_data {
>      void __iomem *regs;
> -    size_t num_clusters;
>      struct cpufreq_frequency_table **tables;
>      const struct tegra_cpufreq_soc *soc;
>  };
> @@ -166,6 +158,14 @@ static const struct tegra_cpufreq_soc tegra234_cpufreq_soc = {
>      .ops = &tegra234_cpufreq_ops,
>      .actmon_cntr_base = 0x9000,
>      .maxcpus_per_cluster = 4,
> +    .num_clusters = 3,
> +};
> +
> +const struct tegra_cpufreq_soc tegra239_cpufreq_soc = {
> +    .ops = &tegra234_cpufreq_ops,
> +    .actmon_cntr_base = 0x4000,
> +    .maxcpus_per_cluster = 8,
> +    .num_clusters = 1,
>  };
>
>  static void tegra194_get_cpu_cluster_id(u32 cpu, u32 *cpuid, u32 *clusterid)
> @@ -382,7 +382,7 @@ static int tegra194_cpufreq_init(struct cpufreq_policy *policy)
>
>      data->soc->ops->get_cpu_cluster_id(policy->cpu, NULL, &clusterid);
>
> -    if (clusterid >= data->num_clusters || !data->tables[clusterid])
> +    if (clusterid >= data->soc->num_clusters || !data->tables[clusterid])
>          return -EINVAL;
>
>      start_cpu = rounddown(policy->cpu, maxcpus_per_cluster);
> @@ -433,6 +433,7 @@ static struct tegra_cpufreq_ops tegra194_cpufreq_ops = {
>  static const struct tegra_cpufreq_soc tegra194_cpufreq_soc = {
>      .ops = &tegra194_cpufreq_ops,
>      .maxcpus_per_cluster = 2,
> +    .num_clusters = 4,
>  };
>
>  static void tegra194_cpufreq_free_resources(void)
> @@ -525,15 +526,14 @@ static int tegra194_cpufreq_probe(struct platform_device *pdev)
>
>      soc = of_device_get_match_data(&pdev->dev);
>
> -    if (soc->ops && soc->maxcpus_per_cluster) {
> +    if (soc->ops && soc->maxcpus_per_cluster && soc->num_clusters) {
>          data->soc = soc;
>      } else {
>          dev_err(&pdev->dev, "soc data missing\n");
>          return -EINVAL;
>      }
>
> -    data->num_clusters = MAX_CLUSTERS;
> -    data->tables = devm_kcalloc(&pdev->dev, data->num_clusters,
> +    data->tables = devm_kcalloc(&pdev->dev, data->soc->num_clusters,
>                      sizeof(*data->tables), GFP_KERNEL);
>      if (!data->tables)
>          return -ENOMEM;
> @@ -558,7 +558,7 @@ static int tegra194_cpufreq_probe(struct platform_device *pdev)
>          goto put_bpmp;
>      }
>
> -    for (i = 0; i < data->num_clusters; i++) {
> +    for (i = 0; i < data->soc->num_clusters; i++) {
>          data->tables[i] = init_freq_table(pdev, bpmp, i);
>          if (IS_ERR(data->tables[i])) {
>              err = PTR_ERR(data->tables[i]);
> @@ -590,6 +590,7 @@ static int tegra194_cpufreq_remove(struct platform_device *pdev)
>  static const struct of_device_id tegra194_cpufreq_of_match[] = {
>      { .compatible = "nvidia,tegra194-ccplex", .data = &tegra194_cpufreq_soc },
>      { .compatible = "nvidia,tegra234-ccplex-cluster", .data = &tegra234_cpufreq_soc },
> +    { .compatible = "nvidia,tegra239-ccplex-cluster", .data = &tegra239_cpufreq_soc },
>      { /* sentinel */ }
>  };
>
> --
> 2.17.1
>


And then another engineer reviews them and makes some comments (in this case it seems all of them are about consistency in variable naming and the like). When they agree on the best course of action, the changes will be merged into the code, and the next Linux release will then include support for that CPU.

The engineer submitting the changes wrote on the mailing list today, saying that a v2 of the changes addressing the comments will be coming soon:
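For anyone who doesn't want to parse the diff, here's a tiny, self-contained userspace toy of what that of_match table change is doing: it maps a device tree "compatible" string to a per-SoC description, and the values in the patch describe T239 as a single cluster of up to eight CPUs reusing the Tegra234 cpufreq ops. This is my own illustration, not kernel code; the type and field names below are simplified, only the numbers are copied from the patch.

C:
/* Toy userspace illustration of the of_match lookup in the patch above.
 * Not kernel code; the numbers are copied straight from the diff. */
#include <stdio.h>
#include <string.h>

struct soc_info {
    const char *compatible;
    int max_cpus_per_cluster;
    int num_clusters;
};

static const struct soc_info match_table[] = {
    { "nvidia,tegra194-ccplex",         2, 4 },
    { "nvidia,tegra234-ccplex-cluster", 4, 3 },
    { "nvidia,tegra239-ccplex-cluster", 8, 1 },   /* the new T239 entry */
};

static const struct soc_info *lookup(const char *compatible)
{
    for (size_t i = 0; i < sizeof(match_table) / sizeof(match_table[0]); i++)
        if (strcmp(match_table[i].compatible, compatible) == 0)
            return &match_table[i];
    return NULL;
}

int main(void)
{
    const struct soc_info *soc = lookup("nvidia,tegra239-ccplex-cluster");
    if (soc)
        printf("%d cluster(s) of up to %d CPUs\n",
               soc->num_clusters, soc->max_cpus_per_cluster);  /* 1 x 8 */
    return 0;
}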

 

Bandai Namco won't port it.
Harada is standing in the way of the second coming.
 
You know what, you're right. I saw this, thought "man, this confirms A78C" then immediately read Thraktor's comment and confused myself. This essentially confirms the C variant, yes, as base A78 has a max of 4 CPUs per cluster?

Yeah, A78C is the only variant of the A78 that supports 8 cores in a cluster. The newer ARMv9 cores seem to be a bit more flexible in terms of cluster configuration, allowing eight X2 or X3 cores in a cluster, but those make little sense for a device like the Switch. However, none of Arm's examples show 8-core configurations for either the A710 or A715, so I presume they're not supported. That puts A78C as the only big core outside the X series that Arm supports in 8-core configurations, so it's a pretty safe guess.

Jetson Orin Nano 4GB vs. Jetson Orin Nano 8GB
AI Performance: 20 Sparse TOPs | 10 Dense TOPs (4GB); 40 Sparse TOPs | 20 Dense TOPs (8GB)
GPU: 512-core NVIDIA Ampere Architecture GPU with 16 Tensor Cores (4GB); 1024-core NVIDIA Ampere Architecture GPU with 32 Tensor Cores (8GB)
GPU Max Frequency: 625 MHz
CPU: 6-core Arm Cortex-A78AE v8.2 64-bit CPU, 1.5 MB L2 + 4 MB L3
CPU Max Frequency: 1.5 GHz
Memory: 4GB 64-bit LPDDR5, 34 GB/s (4GB); 8GB 128-bit LPDDR5, 68 GB/s (8GB)
Storage: (Supports external NVMe)
Video Encode: 1080p30, supported by 1-2 CPU cores
Video Decode: 1x 4K60 (H.265) | 2x 4K30 (H.265) | 5x 1080p60 (H.265) | 11x 1080p30 (H.265)
Camera: Up to 4 cameras (8 through virtual channels*), 8 lanes MIPI CSI-2 D-PHY 2.1 (up to 20 Gbps)
PCIe: 1 x4 + 3 x1 (PCIe Gen3, Root Port, & Endpoint)
USB: 3x USB 3.2 Gen2 (10 Gbps), 3x USB 2.0
Networking: 1x GbE
Display: 1x 4K30 multimode DisplayPort 1.2 (+MST)/eDP 1.4/HDMI 1.4*
Other I/O: 3x UART, 2x SPI, 2x I2S, 4x I2C, 1x CAN, DMIC and DSPK, PWM, GPIOs
Power: 5W – 10W (4GB); 7W – 15W (8GB)
Mechanical: 69.6 mm x 45 mm, 260-pin SO-DIMM connector
Price: $199† (4GB); $299† (8GB)



Yeah…. I’m not seeing 8N

This is almost certainly a further cut-down Orin. The CPU is split into two clusters, one with 4 cores and one with 2, and they specify that it's an A78AE, which wouldn't make any sense to use for Drake. The chip also looks identical to the Orin NX.

I think T239 being used for a Jetson devkit is unlikely, considering Nvidia recently announced Jetson Orin Nano.
Yeah, it seems like Nvidia disproved me on that one pretty quick! It makes more sense to use Orin for their Jetson line anyway, as it has the double-rate tensor cores and they advertise these based on ML performance.

A new Shield TV seems plausible, and for that reason I wouldn't be surprised to see codec support beyond Nintendo's requirements (i.e. 8K decode) on the chip. I was thinking a Chromebook could be one potential target for this, but I'm not sure what manufacturer would make it. Google could have used it in a Pixelbook, but they just scrapped that entire division, so it's not them.
 
Isn't that interpolation? You have point A and point E and you use those to generate points B, C and D?
Yes, that’s true for interpolation, but if I understand correctly, with this technique they aren’t rendering point E ahead of time. They are relying entirely on past frames and their motion vector fields to predict what the content of the next frame will be.
 
Yes, that’s true for interpolation, but if I understand correctly, with this technique they aren’t rendering point E ahead of time. They are relying entirely on past frames and their motion vector fields to predict what the content of the next frame will be.
It almost sounds like it's alternating generated frames with game engine frames, unless it's doing 2:2:2:2.
Pairs of super-resolution frames from the game, along with both engine and optical flow motion vectors, are then fed into a convolutional neural network that analyzes the data and automatically generates an additional frame for each game-rendered frame — a first for real-time game rendering.
I guess the language could be interpreted several ways. I'm excited to see something like a Digital Foundry breakdown of the technology, either way.
 
You’re comparing apples to oranges here. You’re comparing Arm's first 64-bit design, the A57, to the eighth generation of Arm's 64-bit CPUs, and the latter is on a much superior node. On top of that, you’re comparing a CPU that is designed not to clock beyond 2.1 GHz to a CPU that is designed to clock no higher than 3.3 GHz. The comparison was flawed to begin with. Yes, they are both Arm-based cores, but the A57 is ancient at this point, and as soon as you start clocking it at or near its max, you start wasting a lot of energy and introducing throttling, a lot of heat, etc.

If Drake had a CPU clocked to 2GHz (not saying it will), it would not be comparable to the situation of the A57 at 2GHz.

To make it even clearer, a single A57 core clocked to 1.9 GHz consumes about 2W. A single A78 core clocked to 1.9 GHz consumes less than a quarter of that.
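Just to put cluster-level numbers on that, here's a back-of-the-envelope sketch using only the per-core figures quoted above; real power obviously depends on node, voltage, workload, and the rest of the SoC.

C:
/* Back-of-the-envelope cluster power using only the per-core figures
 * quoted above; purely illustrative. */
#include <stdio.h>

int main(void)
{
    const double a57_core_w = 2.0;  /* ~2 W per A57 core at 1.9 GHz   */
    const double a78_core_w = 0.5;  /* "less than a quarter of that"  */

    printf("4x A57 @ 1.9 GHz: ~%.0f W\n", 4 * a57_core_w);  /* ~8 W */
    printf("8x A78 @ 1.9 GHz: <%.0f W\n", 8 * a78_core_w);  /* <4 W */
    return 0;
}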

You made quite an assumption here and attributed things to me that I didn't write.

I'm not actually comparing the A57 in the current Switch to the A78 in Drake. I know all that, and I also know the A57 throttles around 2GHz. I also wasn't the first to mention 2GHz for the A57; the person who replied to me said that Nintendo considered 2GHz for the current Switch, and I was just responding to that. My question was simple: why did Nintendo use 1GHz, and not at least 1.2-1.3GHz, for the A57 if they reportedly wanted 2GHz from it?
My point is that Nintendo usually goes with lower clocks than the device could run at. The fact that Drake could run an 8-core CPU at 2GHz doesn't automatically mean Nintendo will do that, just like the A57 in the current Switch could easily run above 1GHz but Nintendo didn't do that.

Of course "If Drake had a CPU clocked to 2GHz, it would not be comparable to the situation of the A57 at 2GHz"; no one said otherwise, so I don't know where you got that.

I already know that too.
 
However, none of Arm's examples show 8-core configurations for either the A710 or A715, so I presume they're not supported. That puts A78C as the only big core outside the X series that Arm supports in 8-core configurations, so it's a pretty safe guess.
But Arm does mention the Cortex-A710 supporting up to 8 CPU cores per cluster and the Cortex-A715 supporting up to 12 CPU cores per cluster.

The Cortex-A715 is probably a no go for Nintendo since the Cortex-A715 doesn't have 32-bit support, which I think can be problematic in terms of backwards compatibility with the Nintendo Switch.
 
You made quite an assumption here and attributed things to me that I didn't write.

I'm not actually comparing the A57 in the current Switch to the A78 in Drake. I know all that, and I also know the A57 throttles around 2GHz. I also wasn't the first to mention 2GHz for the A57; the person who replied to me said that Nintendo considered 2GHz for the current Switch, and I was just responding to that. My question was simple: why did Nintendo use 1GHz, and not at least 1.2-1.3GHz, for the A57 if they reportedly wanted 2GHz from it?
My point is that Nintendo usually goes with lower clocks than the device could run at. The fact that Drake could run an 8-core CPU at 2GHz doesn't automatically mean Nintendo will do that, just like the A57 in the current Switch could easily run above 1GHz but Nintendo didn't do that.

Of course "If Drake had a CPU clocked to 2GHz, it would not be comparable to the situation of the A57 at 2GHz"; no one said otherwise, so I don't know where you got that.

I already know that too.
I think the details of your framing argument distracted a bit from what you were trying to say. But I get it now.

Basically: Just because the hardware is capable of something does not mean Nintendo will push it to its limit. They will always balance against battery life, heat, and other considerations, and without having a full understanding of the hardware, the node, the cooling solution, the chassis, etc. that play into their decision making, we can't take anything for granted.
 
Also lacks Ada's Optical Flow Accelerator and much faster Tensor cores. The OFA specifically is what powers the frame interpolation,
Yeah, it seems like they use the OFA to guess the next frame's motion vectors.

Yeah, A78C is the only variant of the A78 that supports 8 cores in a cluster. The newer ARMv9 cores seem to be a bit more flexible in terms of cluster configuration, allowing eight X2 or X3 cores in a cluster, but those make little sense for a device like the Switch. However, none of Arm's examples show 8-core configurations for either the A710 or A715, so I presume they're not supported. That puts A78C as the only big core outside the X series that Arm supports in 8-core configurations, so it's a pretty safe guess.
The docs say 8 cores are supported, but A710 seems too new to have been used, and A715 definitely is.
 
I think the details of your framing argument distracted a bit from what you were trying to say. But I get it now.

Basically: Just because the hardware is capable of something does not mean Nintendo will push it to its limit. They will always balance against battery life, heat, and other considerations, and without having a full understanding of the hardware, the node, the cooling solution, the chassis, etc. that play into their decision making, we can't take anything for granted.

I wrote that, but you made it a little cleaner. :)
 
yall need jesus

you added absolutely zero to this discussion, thanks for that.
Even if Switch Drake can't do DLSS 3.0, could it still get updates from the DLSS 2.x build? I think 2.4 was the most recent release this past August, but could it still get updates where it's DLSS 2.5, 2.6, etc. that offer additional features?
Yes, since it is still using tensor cores. It would have to be updated in the software suite for devs though.

I guess it’s not really accurate to call it interpolation, since it’s not drawing information from future frames. It’s more like extrapolation.
So would it be from the past?
Finally, some gigantic news for our beloved upcoming Switch Drake 🧡
This wouldn’t really come to Drake I’m afraid.

Are we sure T239 will use A78Cs? Does the A710/A715 not support an 8-core cluster?

Also, I remember someone saying that Drake may have features from Ada, just like Erista has features from Pascal even though it is technically Maxwell. My question is, could these features include support for DLSS 3.0?
They do support an 8-core cluster, but the A715 does not support 32-bit, and a few Switch games are 32-bit.

And since it’s a derivative of Orin, it’s more likely A78 than A710.


As for Ada Lovelace, we don’t know but it doesn’t seem like it.
Wasn't there a rumour that Drake was based on an updated Ampere with some backported Lovelace features? Maybe some DLSS 3 enhancements could be used for Drake's version of DLSS
We don’t really know tbh.


But I doubt it has Lovelace features or can use DLSS 3.

You made quite an assumption here and attributed things to me that I didn't write.

I'm not actually comparing the A57 in the current Switch to the A78 in Drake. I know all that, and I also know the A57 throttles around 2GHz. I also wasn't the first to mention 2GHz for the A57; the person who replied to me said that Nintendo considered 2GHz for the current Switch, and I was just responding to that. My question was simple: why did Nintendo use 1GHz, and not at least 1.2-1.3GHz, for the A57 if they reportedly wanted 2GHz from it?
My point is that Nintendo usually goes with lower clocks than the device could run at. The fact that Drake could run an 8-core CPU at 2GHz doesn't automatically mean Nintendo will do that, just like the A57 in the current Switch could easily run above 1GHz but Nintendo didn't do that.

Of course "If Drake had a CPU clocked to 2GHz, it would not be comparable to the situation of the A57 at 2GHz"; no one said otherwise, so I don't know where you got that.

I already know that too.
I mean, if that’s what you meant, then you should’ve phrased it that way, because the way you were responding to the others heavily implied you were saying exactly what I wrote. I apologize if I misunderstood, but I can’t really read anything else from what’s in front of me; it just looks like you’re painting a different picture.
 
But Arm does mention the Cortex-A710 supporting up to 8 CPU cores per cluster and the Cortex-A715 supporting up to 12 CPU cores per cluster.

The Cortex-A715 is probably a no go for Nintendo since the Cortex-A715 doesn't have 32-bit support, which I think can be problematic in terms of backwards compatibility with the Nintendo Switch.
Thanks, I missed that. Yeah, apparently some Wii U ports use 32-bit binaries, which would make the A715 tricky (although this is something they're going to have to deal with at some point). In theory the A710 would work, but I'd still say the A78C is by far the most likely unless we get any evidence otherwise.

Any idea why there's such a large range for power consumption for these? I guess the amount they throttle?

They have different power profiles. As a developer, you can select, say the 15W profile, and it will lower clocks and/or disable cores to bring power consumption down to that level. There may be some throttling too.

It almost sounds like it's alternating generated frames with game engine frames, unless it's doing 2:2:2:2.

I guess the language could be interpreted several ways. I'm excited to see something like a Digital Foundry breakdown of the technology, either way.

The fact that it's taking in two frames makes me think this might actually be interpolation, not extrapolation. ie the two frames are the frames before and after the generated frame. I can't see why an extrapolation scheme would require two frames if it's got motion vectors (unless it's for the optical flow?). This should provide better results in terms of image quality over extrapolation, but would add latency, as you effectively have to render a frame in advance before you can generate the intermediate frame.
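To make the interpolation vs. extrapolation distinction concrete, here's a toy single-value sketch in C. This is purely my own illustration and has nothing to do with how DLSS 3 is actually implemented: interpolation needs the frame on each side of the generated one, so the next real frame must already exist (hence the latency), while extrapolation only projects past frames forward.

C:
/* Toy illustration of interpolation vs extrapolation for one value
 * (think: a pixel position along a motion vector). Purely conceptual,
 * not how DLSS 3 works internally. */
#include <stdio.h>

/* Interpolation: needs the frame BEFORE and AFTER the generated one,
 * so the next real frame must already be rendered (extra latency). */
static float interpolate(float frame_n, float frame_n_plus_1, float t)
{
    return frame_n + (frame_n_plus_1 - frame_n) * t;   /* t = 0.5 -> halfway */
}

/* Extrapolation: needs only PAST frames; projects the last observed
 * change forward, so no added latency but more room for error. */
static float extrapolate(float frame_n_minus_1, float frame_n)
{
    float velocity = frame_n - frame_n_minus_1;         /* change per frame  */
    return frame_n + 0.5f * velocity;                   /* half a frame ahead */
}

int main(void)
{
    /* A value moving 10 -> 20 -> 30 across rendered frames: both methods
     * agree under linear motion, but diverge when motion changes. */
    printf("interpolated: %.1f\n", interpolate(20.0f, 30.0f, 0.5f));  /* 25.0 */
    printf("extrapolated: %.1f\n", extrapolate(10.0f, 20.0f));        /* 25.0 */
    return 0;
}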
 
Very different type of tech speculation: In just a few years the standard for VR went from using external cameras for tracking headsets, to doing inside-out tracking using cameras on the headsets themselves. I'm not sure what kind of camera complexity/CPU cost is involved in that, but I wonder if some future Joy-Con successor might use similar tech to bring back better-than-Wii style pointer controls without the need for a sensor bar.
 
The fact that it's taking in two frames makes me think this might actually be interpolation, not extrapolation. ie the two frames are the frames before and after the generated frame. I can't see why an extrapolation scheme would require two frames if it's got motion vectors (unless it's for the optical flow?). This should provide better results in terms of image quality over extrapolation, but would add latency, as you effectively have to render a frame in advance before you can generate the intermediate frame.
Yeah, the more I think about it the more interpolation makes sense, except that they specifically say DLSS is generating one frame for each game-rendered frame, which makes it sound like you get four frames out for every two frames in. Unless it just means that 'after the first two frames, DLSS alternates generated frames with game-rendered frames.'
 
Video Encode: 1080p30, supported by 1-2 CPU cores
Video Decode: 1x 4K60 (H.265) | 2x 4K30 (H.265) | 5x 1080p60 (H.265) | 11x 1080p30 (H.265)
Wait a minute, only 30FPS for video encode? That's kinda disappointing :(
I mean, my cope is that it's at least capable of 720p60. Shouldn't be hard, right? I mean, most cheapo Chinese capture cards that can do 1080p30 also support 720p60 on the flip side. Well, actually MJPEG and YUY2 (at 480p), so not really comparable... but you get the point.

Currently, iirc, the Switch in both handheld and docked mode captures video (up to 30 seconds) at 720p30 encoded in H.264. But many games don't even let you use that feature because of "performance concerns". So I'm thinking this could be a huge upgrade, since they could offload it to a dedicated core.

At least 4K60 H.265 video decode should make it easier to bring more streaming platforms to the console. I wish NVIDIA had said anything about VP9, since AV1 is still too new to even be relevant to this discussion.
 
It almost sounds like it's alternating generated frames with game engine frames, unless it's doing 2:2:2:2.

I guess the language could be interpreted several ways. I'm excited to see something like a Digital Foundry breakdown of the technology, either way.
They are definitely alternating - there’s a diagram on this page that shows it:
[image: nvidia-dlss-3-reconstructs-pixels.png]

I believe the pairs of images part is talking about the optical flow estimation:
Ada’s Optical Flow Accelerator analyzes two sequential in-game frames and calculates an optical flow field. The optical flow field captures the direction and speed at which pixels are moving from frame 1 to frame 2. The Optical Flow Accelerator is able to capture pixel-level information such as particles, reflections, shadows, and lighting, which are not included in game engine motion vector calculations. In the motorcycle example below, the motion flow of the motorcyclist accurately represents that the shadow stays in roughly the same place on the screen with respect to their bike.
The description of the inputs on that page also doesn't mention having to draw from future frames:
The DLSS Frame Generation convolutional autoencoder takes 4 inputs – current and prior game frames, an optical flow field generated by Ada’s Optical Flow Accelerator, and game engine data such as motion vectors and depth.
Still not 100% certain, but I feel more confident that they really are relying completely on inference for the generated frames.
So would it be from the past?
Yeah, if I understand it correctly anyway.
 


Preview of a preview, but some early DLSS 3.0 numbers
The fact that it's taking in two frames makes me think this might actually be interpolation, not extrapolation. ie the two frames are the frames before and after the generated frame. I can't see why an extrapolation scheme would require two frames if it's got motion vectors (unless it's for the optical flow?). This should provide better results in terms of image quality over extrapolation, but would add latency, as you effectively have to render a frame in advance before you can generate the intermediate frame.
It's definitely extrapolation; Nvidia specifically mentions generating frames even when the CPU is bottlenecking the frame rate. It needs a buffer of input frame data in order to infer motion vectors in the OFA, but the DLSS 3.0 use case also involves extrapolating frames while supersampling.
 
They are definitely alternating - there’s a diagram on this page that shows it:
[image: nvidia-dlss-3-reconstructs-pixels.png]

I believe the pairs of images part is talking about the optical flow estimation:

The description of the inputs on that page also doesn't mention having to draw from future frames:

Still not 100% certain, but I feel more confident that they really are relying completely on inference for the generated frames.

Yeah, if I understand it correctly anyway.
Very cool, thanks for pointing this out.
 
If there is any Ada feature that comes to Drake, the only one I can see coming (and even then it's a low chance) is the SER feature:

  • Shader Execution Reordering (SER) that improves execution efficiency by rescheduling shading workloads on the fly to better utilize the GPU’s resources. As significant an innovation as out-of-order execution was for CPUs, SER improves ray-tracing performance up to 3x and in-game frame rates by up to 25%.




This would make it more efficient at doing the same job with RT or otherwise.


I don’t see 4th-gen Tensor Cores or 3rd-gen Ray Tracing Cores coming. But something akin to out-of-order execution for the GPU that's useful for RT would be really nice.
 
I mean, if that’s what you meant, then you should’ve phrased it that way, because the way you were responding to the others heavily implied you were saying exactly what I wrote. I apologize if I misunderstood, but I can’t really read anything else from what’s in front of me; it just looks like you’re painting a different picture.

I mean, I wrote a couple of times that Nintendo usually goes with lower clocks than the device could run at. I even made the point that Drake being able to run an 8-core CPU at 2GHz doesn't automatically mean Nintendo will do that, and wrote that the A57 in the current Switch could easily run above 1GHz but Nintendo didn't do that,
while I didn't compare 2GHz on the A57 to 2GHz on the A78 (Drake) at all.

It seems that some of you are replying to me about things I didn't write or points I didn't make, while at the same time ignoring my main points.
Maybe my wording is bad, but maybe some of you aren't reading the whole discussion and the posts carefully.


In any case, anyone want to bet that the Drake CPU won't run at 2GHz or higher!? :)
 
Just thinking about where Nvidia can take DLSS from here. A recent addition to DLSS was DRS: DLSS targets, say, 4K output and just changes the input resolution to meet a given performance target.

As DLSS 3.0 gets better, what's stopping them from adding even more inferred frames? Say two frames inferred for each one rendered? They could then fold it into a DRS-type solution where you specify a target resolution and frame rate, and it just modifies the input resolution and the number of inferred frames between real frames to meet that performance target.

Crazy potential could be unlocked by this technology, and it's great for low-power-consumption gaming as well, since inferring frames is less demanding than rendering them. Part of the reason I love my Switch is that it draws barely anything from the wall, which is great for both my wallet and the environment.
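Something like the controller below is what I imagine. This is a completely hypothetical sketch, not a real DLSS interface: measure how long rendered frames take, then trade internal render resolution and, eventually, the number of generated frames against the output target.

C:
/* Hypothetical dynamic-resolution + frame-generation controller sketch.
 * Not a real DLSS API; just the general idea of trading input resolution
 * and inferred frames against a frame-time target. */
struct frame_pacing {
    float target_ms;        /* e.g. 16.7 for a 60 fps output target   */
    float render_scale;     /* fraction of output resolution rendered */
    int   generated_frames; /* inferred frames per rendered frame     */
};

static void update_pacing(struct frame_pacing *p, float last_render_ms)
{
    /* A rendered frame only has to arrive every (generated_frames + 1)
     * output intervals; the frames in between are inferred. */
    float budget_ms = p->target_ms * (float)(p->generated_frames + 1);

    if (last_render_ms > budget_ms && p->render_scale > 0.5f)
        p->render_scale -= 0.05f;             /* render fewer pixels       */
    else if (last_render_ms > budget_ms && p->generated_frames < 2)
        p->generated_frames += 1;             /* lean harder on inference  */
    else if (last_render_ms < 0.8f * budget_ms && p->render_scale < 1.0f)
        p->render_scale += 0.05f;             /* headroom: restore quality */
}

int main(void)
{
    struct frame_pacing p = { .target_ms = 16.7f, .render_scale = 1.0f,
                              .generated_frames = 1 };
    update_pacing(&p, 40.0f);   /* a slow rendered frame: scale drops to 0.95 */
    return 0;
}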
 
Very different type of tech speculation: In just a few years the standard for VR went from using external cameras for tracking headsets, to doing inside-out tracking using cameras on the headsets themselves. I'm not sure what kind of camera complexity/CPU cost is involved in that, but I wonder if some future Joy-Con successor might use similar tech to bring back better-than-Wii style pointer controls without the need for a sensor bar.
From a technical point of view I'm sure they could, but from a practical point of view the small size of joy-cons would probably be a hindrance. VR controllers tend to be quite large, usually with a halo ring of tracking points on top, to make them visible to the tracking cameras, and to provide enough distance between tracking points to properly triangulate the position. The joy-cons have to be small for the Switch to be portable, and even if you added a bunch of tracking points onto them they'd pretty much all be covered by people's hands while in use. I wonder if something along the lines of the ultra-wideband RF-based location tracking Apple use in their U1 chip could provide accurate enough tracking without the need for cameras.
 
Just thinking about where Nvidia can take DLSS from here. A recent addition to DLSS was DRS: DLSS targets, say, 4K output and just changes the input resolution to meet a given performance target.

As DLSS 3.0 gets better, what's stopping them from adding even more inferred frames? Say two frames inferred for each one rendered? They could then fold it into a DRS-type solution where you specify a target resolution and frame rate, and it just modifies the input resolution and the number of inferred frames between real frames to meet that performance target.
I think that's the eventual target. You set an output resolution and frame rate and DLSS - or whatever the end upscaler is - guarantees it, upscaling and extrapolating as it goes. Ultimately, the only barrier is actually rendering frames starving out the GPU before it can generate pixel/frame data.

It'll be interesting to see if we start to see art styles converge because of it, the same way UE has a certain "look" that its various tools seem to push devs toward. If DLSS/FSR/XeSS become essential to delivering a triple-A experience (and they will), will their various weaknesses/strengths be designed around, and how much will that bubble up into sameness in the visual design?
 
From a technical point of view I'm sure they could, but from a practical point of view the small size of joy-cons would probably be a hindrance. VR controllers tend to be quite large, usually with a halo ring of tracking points on top, to make them visible to the tracking cameras, and to provide enough distance between tracking points to properly triangulate the position. The joy-cons have to be small for the Switch to be portable, and even if you added a bunch of tracking points onto them they'd pretty much all be covered by people's hands while in use. I wonder if something along the lines of the ultra-wideband RF-based location tracking Apple use in their U1 chip could provide accurate enough tracking without the need for cameras.
My hope is that something like this simply allows for Wii level motion controls, outside of VR - using the dock, for example, as a reference point, and the tensor cores for inference.
 
What is happening today? What new information do we have? I'm trying to keep up with today's information, but it's getting out of hand.
I don't fully understand what is happening, and my first language is not English (in fact, I usually use translators to be able to communicate).
The reason I'm asking what's going on today is that I'm seeing a lot of movement and comments on social media about the T239 and other clues that seem to point to the Switch revision. Could someone explain what is happening? Thanks.
 
Please read this staff post before posting.

Furthermore, according to this follow-up post, all off-topic chat will be moderated.

