• Hey everyone, staff have documented a list of banned content and subject matter that we feel are not consistent with site values, and don't make sense to host discussion of on Famiboards. This list (and the relevant reasoning per item) is viewable here.

StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (Read the staff posts before commenting!)

DLSS 3.0 sounds like that old frame generation thing that was shown off for Arkham Asylum or one of those games.
Does anyone remember what I mean? Because I barely do, but it was something about using motion vectors to generate an in-between frame.
 
Thanks..........I think I understand. Kind of. Maybe? Haha.

I guess I'm still kinda stuck on why emails from an @nvidia address have to be public. I assume this was a mistake, obviously, since it gave us info about T239; I wonder if there was some other channel that NVIDIA individual (or whoever) should have been using so it wasn't added to the archive. Unless the archive is a rule if you're dealing with Linux.
This patch is intended for the mainline kernel, and the public mailing lists are the correct path for that, so this isn’t an accidental leak. This is totally normal for kernel development. The only unusual thing here is the fan scrutiny on this particular product.
 
I look forward to speculating about Switch 2 Thor starting in 2025. I just need to know if I have to use "historical" Thor or Marvel Thor.
Given Nvidia’s SoC naming conventions (Parker, Xavier) it’s 100% Marvel Thor.

They should demo Drive Thor in a van with Thor airbrushed on the side
 
Thanks..........I think I understand. Kind of. Maybe? Haha.

I guess I'm still kinda stuck on why emails from an @nvidia address have to be public. I assume this was a mistake, obviously, since it gave us info about T239; I wonder if there was some other channel that NVIDIA individual (or whoever) should have been using so it wasn't added to the archive. Unless the archive is a rule if you're dealing with Linux.
Because the emails were sent to a public mailing list. That's just how Linux kernel development works.

Even if the mailing list itself wasn't public, we'd still have most/all of the same information from version control, anyway.
Does the T239 Linux mention mean the chip has taped out?
Probably, I wouldn't expect them to upstream it unless the hardware was finalized.
 
Even if Switch Drake can't do DLSS 3.0, could it still get updates from the DLSS 2.x build? I think 2.4 was the most recent release this past August, but could it still get updates where it's DLSS 2.5, 2.6, etc. that offer additional features?
 
Drake's potential power draw is probably what's limiting the frame interpolation aspect of DLSS 3.0
Also lacks Ada's Optical Flow Accelerator and much faster Tensor cores. The OFA specifically is what powers the frame interpolation, from what I'm reading. My 3080 doesn't share the T239's power limitations and I'm not getting 3.0 either. :cry:
 
the price of these new GPUs, yikeeesss. Thankfully I still have my 3080 in working condition
 
I guess it’s not really accurate to call it interpolation, since it’s not drawing information from future frames. It’s more like extrapolation.
Isn't that interpolation? You have point A and point E and you use those to generate points B, C and D?

edit: Nevermind, I think I was misunderstanding the sequence of events in DLSS 3.0.
 
Are we sure T239 will use A78Cs? Does the A710/A715 not support an 8-core cluster?

Also, I remember someone saying that Drake may have features from Ada, just like Erista has features from Pascal even though it is technically Maxwell. My question is, could these features include support for DLSS 3.0?
 
yall need jesus



Bandai Namco won't port it.
 
Wasn't there a rumour that Drake was based on an updated Ampere with some backported Lovelace features? Maybe some DLSS 3 enhancements could be used for Drake's version of DLSS
I mean, in theory. People have mostly assumed they were referring to AV1 encode.
 
Also lacks Ada's Optical Flow Accelerator and much faster Tensor cores. The OFA specifically is what powers the frame interpolation, from what I'm reading. My 3080 doesn't share the T239's power limitations and I'm not getting 3.0 either. :cry:
Yeah, the power draw is why I think it's not there. It's impossible to know just how much more power the tensor cores need now, but it's definitely more.

Are we sure T239 will use A78Cs? Does the A710/A715 not support an 8-core cluster?

Also, I remember someone saying that Drake may have features from Ada, just like Erista has features from Pascal even though it is technically Maxwell. My question is, could these features include support for DLSS 3.0?

they do support an 8-core cluster, but they're too new for when Drake started production
 
So what the hell is lore.kernel.org? The Google search preview says "This is a listing of public inboxes, see the `mirror' link of each inbox for instructions on how to mirror all the data and code on this site."

Why are these inboxes public and on that site!? I don't get it.

This is how Linux kernel development (and that of some other OSS projects) works. What you are seeing there is a mailing list being published on a traditional, HTTP-accessible website as a way of preserving and accessing it.

This has been the way the kernel has been developed since it was born. It could be seen as cumbersome by today's internet collaboration standards, but it's something that has worked from the 90s up until now. The archives exist because without them it would be impossible to follow the discussion unless you were subscribed to the mailing list, and to the specific discussions that arise there. There are some other archives out there. For example, the discussion that brought us the T239 support into the kernel can also be accessed here:


LKML is just another archive of the Linux Kernel Mailing List.


And to expand on this, the specific file in the Linux kernel source code that covers the Tegra cores, CPU frequencies, etc. is here:


Nvidia is a Linux kernel contributor. So an engineer from them makes the necessary changes in the code to add support for T239 and writes to the mailing list, proposing some changes to that file:

C:
> diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c
> index 1216046cf4c2..f38a760da61b 100644
> --- a/drivers/cpufreq/tegra194-cpufreq.c
> +++ b/drivers/cpufreq/tegra194-cpufreq.c
> @@ -38,14 +38,6 @@
>  /* cpufreq transisition latency */
>  #define TEGRA_CPUFREQ_TRANSITION_LATENCY (300 * 1000) /* unit in nanoseconds */
>
> -enum cluster {
> -    CLUSTER0,
> -    CLUSTER1,
> -    CLUSTER2,
> -    CLUSTER3,
> -    MAX_CLUSTERS,
> -};
> -
>  struct tegra_cpu_ctr {
>      u32 cpu;
>      u32 coreclk_cnt, last_coreclk_cnt;
> @@ -67,12 +59,12 @@ struct tegra_cpufreq_ops {
>  struct tegra_cpufreq_soc {
>      struct tegra_cpufreq_ops *ops;
>      int maxcpus_per_cluster;
> +    size_t num_clusters;
>      phys_addr_t actmon_cntr_base;
>  };
>
>  struct tegra194_cpufreq_data {
>      void __iomem *regs;
> -    size_t num_clusters;
>      struct cpufreq_frequency_table **tables;
>      const struct tegra_cpufreq_soc *soc;
>  };
> @@ -166,6 +158,14 @@ static const struct tegra_cpufreq_soc tegra234_cpufreq_soc = {
>      .ops = &tegra234_cpufreq_ops,
>      .actmon_cntr_base = 0x9000,
>      .maxcpus_per_cluster = 4,
> +    .num_clusters = 3,
> +};
> +
> +const struct tegra_cpufreq_soc tegra239_cpufreq_soc = {
> +    .ops = &tegra234_cpufreq_ops,
> +    .actmon_cntr_base = 0x4000,
> +    .maxcpus_per_cluster = 8,
> +    .num_clusters = 1,
>  };
>
>  static void tegra194_get_cpu_cluster_id(u32 cpu, u32 *cpuid, u32 *clusterid)
> @@ -382,7 +382,7 @@ static int tegra194_cpufreq_init(struct cpufreq_policy *policy)
>
>      data->soc->ops->get_cpu_cluster_id(policy->cpu, NULL, &clusterid);
>
> -    if (clusterid >= data->num_clusters || !data->tables[clusterid])
> +    if (clusterid >= data->soc->num_clusters || !data->tables[clusterid])
>          return -EINVAL;
>
>      start_cpu = rounddown(policy->cpu, maxcpus_per_cluster);
> @@ -433,6 +433,7 @@ static struct tegra_cpufreq_ops tegra194_cpufreq_ops = {
>  static const struct tegra_cpufreq_soc tegra194_cpufreq_soc = {
>      .ops = &tegra194_cpufreq_ops,
>      .maxcpus_per_cluster = 2,
> +    .num_clusters = 4,
>  };
>
>  static void tegra194_cpufreq_free_resources(void)
> @@ -525,15 +526,14 @@ static int tegra194_cpufreq_probe(struct platform_device *pdev)
>
>      soc = of_device_get_match_data(&pdev->dev);
>
> -    if (soc->ops && soc->maxcpus_per_cluster) {
> +    if (soc->ops && soc->maxcpus_per_cluster && soc->num_clusters) {
>          data->soc = soc;
>      } else {
>          dev_err(&pdev->dev, "soc data missing\n");
>          return -EINVAL;
>      }
>
> -    data->num_clusters = MAX_CLUSTERS;
> -    data->tables = devm_kcalloc(&pdev->dev, data->num_clusters,
> +    data->tables = devm_kcalloc(&pdev->dev, data->soc->num_clusters,
>                      sizeof(*data->tables), GFP_KERNEL);
>      if (!data->tables)
>          return -ENOMEM;
> @@ -558,7 +558,7 @@ static int tegra194_cpufreq_probe(struct platform_device *pdev)
>          goto put_bpmp;
>      }
>
> -    for (i = 0; i < data->num_clusters; i++) {
> +    for (i = 0; i < data->soc->num_clusters; i++) {
>          data->tables[i] = init_freq_table(pdev, bpmp, i);
>          if (IS_ERR(data->tables[i])) {
>              err = PTR_ERR(data->tables[i]);
> @@ -590,6 +590,7 @@ static int tegra194_cpufreq_remove(struct platform_device *pdev)
>  static const struct of_device_id tegra194_cpufreq_of_match[] = {
>      { .compatible = "nvidia,tegra194-ccplex", .data = &tegra194_cpufreq_soc },
>      { .compatible = "nvidia,tegra234-ccplex-cluster", .data = &tegra234_cpufreq_soc },
> +    { .compatible = "nvidia,tegra239-ccplex-cluster", .data = &tegra239_cpufreq_soc },
>      { /* sentinel */ }
>  };
>
> --
> 2.17.1
>


And then another engineer reviews them and makes some comments (in this case it seems all of them are about consistency in variable naming and the like). When they agree on the best course of action, the changes will be merged into the code, and the next Linux release will then include support for that CPU.

The engineer submitting the changes wrote on the mailing list today, saying that a v2 of the changes addressing the comments will be coming soon:
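For anyone who doesn't want to parse the diff, here's a tiny, self-contained userspace toy of what that of_match table change is doing: it maps a device tree "compatible" string to a per-SoC description, and the values in the patch describe T239 as a single cluster of up to eight CPUs reusing the Tegra234 cpufreq ops. This is my own illustration, not kernel code; the type and field names below are simplified, only the numbers are copied from the patch.

C:
/* Toy userspace illustration of the of_match lookup in the patch above.
 * Not kernel code; the numbers are copied straight from the diff. */
#include <stdio.h>
#include <string.h>

struct soc_info {
    const char *compatible;
    int max_cpus_per_cluster;
    int num_clusters;
};

static const struct soc_info match_table[] = {
    { "nvidia,tegra194-ccplex",         2, 4 },
    { "nvidia,tegra234-ccplex-cluster", 4, 3 },
    { "nvidia,tegra239-ccplex-cluster", 8, 1 },   /* the new T239 entry */
};

static const struct soc_info *lookup(const char *compatible)
{
    for (size_t i = 0; i < sizeof(match_table) / sizeof(match_table[0]); i++)
        if (strcmp(match_table[i].compatible, compatible) == 0)
            return &match_table[i];
    return NULL;
}

int main(void)
{
    const struct soc_info *soc = lookup("nvidia,tegra239-ccplex-cluster");
    if (soc)
        printf("%d cluster(s) of up to %d CPUs\n",
               soc->num_clusters, soc->max_cpus_per_cluster);  /* 1 x 8 */
    return 0;
}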

 

Bandai Namco won't port it.
Harada is standing in the way of the second coming.
 
You know what, you're right. I saw this, thought "man, this confirms A78C" then immediately read Thraktor's comment and confused myself. This essentially confirms the C variant, yes, as base A78 has a max of 4 CPUs per cluster?

Yeah, A78C is the only variant of the A78 that supports 8 cores in a cluster. The newer ARMv9 cores seem to be a bit more flexible in terms of cluster configuration, allowing eight X2 or X3 cores in a cluster, but those make little sense for a device like the Switch. However, none of Arm's examples show 8-core configurations for either the A710 or A715, so I presume they're not supported. That puts A78C as the only big core outside the X series that Arm supports in 8-core configurations, so it's a pretty safe guess.

Jetson Orin Nano 4GB vs. Jetson Orin Nano 8GB
AI Performance: 20 Sparse TOPs | 10 Dense TOPs (4GB); 40 Sparse TOPs | 20 Dense TOPs (8GB)
GPU: 512-core NVIDIA Ampere Architecture GPU with 16 Tensor Cores (4GB); 1024-core NVIDIA Ampere Architecture GPU with 32 Tensor Cores (8GB)
GPU Max Frequency: 625 MHz
CPU: 6-core Arm Cortex-A78AE v8.2 64-bit CPU, 1.5 MB L2 + 4 MB L3
CPU Max Frequency: 1.5 GHz
Memory: 4GB 64-bit LPDDR5, 34 GB/s (4GB); 8GB 128-bit LPDDR5, 68 GB/s (8GB)
Storage: (Supports external NVMe)
Video Encode: 1080p30, supported by 1-2 CPU cores
Video Decode: 1x 4K60 (H.265) | 2x 4K30 (H.265) | 5x 1080p60 (H.265) | 11x 1080p30 (H.265)
Camera: Up to 4 cameras (8 through virtual channels*), 8 lanes MIPI CSI-2 D-PHY 2.1 (up to 20 Gbps)
PCIe: 1 x4 + 3 x1 (PCIe Gen3, Root Port, & Endpoint)
USB: 3x USB 3.2 Gen2 (10 Gbps), 3x USB 2.0
Networking: 1x GbE
Display: 1x 4K30 multimode DisplayPort 1.2 (+MST)/eDP 1.4/HDMI 1.4*
Other I/O: 3x UART, 2x SPI, 2x I2S, 4x I2C, 1x CAN, DMIC and DSPK, PWM, GPIOs
Power: 5W – 10W (4GB); 7W – 15W (8GB)
Mechanical: 69.6 mm x 45 mm, 260-pin SO-DIMM connector
Price: $199† (4GB); $299† (8GB)



Yeah…. I’m not seeing 8N

This is almost certainly a further cut-down Orin. The CPU is split into two clusters, one with 4 cores and one with 2, and they specify that it's an A78AE, which wouldn't make any sense to use for Drake. The chip also looks identical to the Orin NX.

I think T239 being used for a Jetson devkit is unlikely, considering Nvidia recently announced Jetson Orin Nano.
Yeah, it seems like Nvidia disproved me on that one pretty quick! It makes more sense to use Orin for their Jetson line anyway, as it has the double-rate tensor cores and they advertise these based on ML performance.

A new Shield TV seems plausible, and for that reason I wouldn't be surprised to see codec support beyond Nintendo's requirements (i.e. 8K decode) on the chip. I was thinking a Chromebook could be one potential target for this, but I'm not sure what manufacturer would make it. Google could have used it in a Pixelbook, but they just scrapped that entire division, so it's not them.
 
Isn't that interpolation? You have point A and point E and you use those to generate points B, C and D?
Yes, that’s true for interpolation, but if I understand correctly, with this technique they aren’t rendering point E ahead of time. They are relying entirely on past frames and their motion vector fields to predict what the content of the next frame will be.
 
Yes, that’s true for interpolation, but if I understand correctly, with this technique they aren’t rendering point E ahead of time. They are relying entirely on past frames and their motion vector fields to predict what the content of the next frame will be.
It almost sounds like it's alternating generated frames with game engine frames, unless it's doing 2:2:2:2.
Pairs of super-resolution frames from the game, along with both engine and optical flow motion vectors, are then fed into a convolutional neural network that analyzes the data and automatically generates an additional frame for each game-rendered frame — a first for real-time game rendering.
I guess the language could be interpreted several ways. I'm excited to see something like a Digital Foundry breakdown of the technology, either way.
 
You’re comparing apples to oranges here. You’re comparing Arm's first 64-bit design, the A57, to the eighth generation of Arm's 64-bit CPUs, and the latter is on a much superior node. On top of that, you’re comparing a CPU that is designed not to clock beyond 2.1 GHz to a CPU that is designed to clock no higher than 3.3 GHz. The comparison was flawed to begin with. Yes, they are both Arm-based cores, but the A57 is ancient at this point, and as soon as you start clocking it at or near its max, you start wasting a lot of energy and introducing throttling, a lot of heat, etc.

If Drake had a CPU clocked to 2GHz (not saying it will), it would not be comparable to the situation of the A57 at 2GHz.

To make it even clearer, a single A57 core clocked to 1.9 GHz consumes about 2W. A single A78 core clocked to 1.9 GHz consumes less than a quarter of that.
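Just to put cluster-level numbers on that, here's a back-of-the-envelope sketch using only the per-core figures quoted above; real power obviously depends on node, voltage, workload, and the rest of the SoC.

C:
/* Back-of-the-envelope cluster power using only the per-core figures
 * quoted above; purely illustrative. */
#include <stdio.h>

int main(void)
{
    const double a57_core_w = 2.0;  /* ~2 W per A57 core at 1.9 GHz   */
    const double a78_core_w = 0.5;  /* "less than a quarter of that"  */

    printf("4x A57 @ 1.9 GHz: ~%.0f W\n", 4 * a57_core_w);  /* ~8 W */
    printf("8x A78 @ 1.9 GHz: <%.0f W\n", 8 * a78_core_w);  /* <4 W */
    return 0;
}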

You made quite an assumption here and attributed things to me that I didn't write.

I'm not actually comparing the A57 in the current Switch to the A78 in Drake. I know all that, and I also know the A57 throttles around 2GHz. I also wasn't the first to mention 2GHz for the A57; the person who replied to me said that Nintendo considered 2GHz for the current Switch, and I was just responding to that. My question was simple: why did Nintendo use 1GHz, and not at least 1.2-1.3GHz, for the A57 if they reportedly wanted 2GHz from it?
My point is that Nintendo usually goes with lower clocks than the device could run at. The fact that Drake could run an 8-core CPU at 2GHz doesn't automatically mean Nintendo will do that, just like the A57 in the current Switch could easily run above 1GHz but Nintendo didn't do that.

Of course "If Drake had a CPU clocked to 2GHz, it would not be comparable to the situation of the A57 at 2GHz"; no one said otherwise, so I don't know where you got that.

I already know that too.
 
However, none of Arm's examples show 8-core configurations for either the A710 or A715, so I presume they're not supported. That puts A78C as the only big core outside the X series that Arm supports in 8-core configurations, so it's a pretty safe guess.
But Arm does mention the Cortex-A710 supporting up to 8 CPU cores per cluster and the Cortex-A715 supporting up to 12 CPU cores per cluster.

The Cortex-A715 is probably a no go for Nintendo since the Cortex-A715 doesn't have 32-bit support, which I think can be problematic in terms of backwards compatibility with the Nintendo Switch.
 
You made quite an assumption here and attributed things to me that I didn't write.

I'm not actually comparing the A57 in the current Switch to the A78 in Drake. I know all that, and I also know the A57 throttles around 2GHz. I also wasn't the first to mention 2GHz for the A57; the person who replied to me said that Nintendo considered 2GHz for the current Switch, and I was just responding to that. My question was simple: why did Nintendo use 1GHz, and not at least 1.2-1.3GHz, for the A57 if they reportedly wanted 2GHz from it?
My point is that Nintendo usually goes with lower clocks than the device could run at. The fact that Drake could run an 8-core CPU at 2GHz doesn't automatically mean Nintendo will do that, just like the A57 in the current Switch could easily run above 1GHz but Nintendo didn't do that.

Of course "If Drake had a CPU clocked to 2GHz, it would not be comparable to the situation of the A57 at 2GHz"; no one said otherwise, so I don't know where you got that.

I already know that too.
I think the details of your framing argument distracted a bit from what you were trying to say. But I get it now.

Basically: Just because the hardware is capable of something does not mean Nintendo will push it to its limit. They will always balance against battery life, heat, and other considerations, and without having a full understanding of the hardware, the node, the cooling solution, the chassis, etc. that play into their decision making, we can't take anything for granted.
 
Also lacks Ada's Optical Flow Accelerator and much faster Tensor cores. The OFA specifically is what powers the frame interpolation,
Yeah, it seems like they use the OFA to guess the next frame's motion vectors.

Yeah, A78C is the only variant of the A78 that supports 8 cores in a cluster. The newer ARMv9 cores seem to be a bit more flexible in terms of cluster configuration, allowing eight X2 or X3 cores in a cluster, but those make little sense for a device like the Switch. However, none of Arm's examples show 8-core configurations for either the A710 or A715, so I presume they're not supported. That puts A78C as the only big core outside the X series that Arm supports in 8-core configurations, so it's a pretty safe guess.
The docs say 8 cores are supported, but A710 seems too new to have been used, and A715 definitely is.
 
I think the details of your framing argument distracted a bit from what you were trying to say. But I get it now.

Basically: Just because the hardware is capable of something does not mean Nintendo will push it to its limit. They will always balance against battery life, heat, and other considerations, and without having a full understanding of the hardware, the node, the cooling solution, the chassis, etc. that play into their decision making, we can't take anything for granted.

I wrote that, but you made it a little cleaner. :)
 
yall need jesus

you added absolutely zero to this discussion, thanks for that.
Even if Switch Drake can't do DLSS 3.0, could it still get updates from the DLSS 2.x build? I think 2.4 was the most recent release this past August, but could it still get updates where it's DLSS 2.5, 2.6, etc. that offer additional features?
Yes, since it is still using tensor cores. It would have to be updated in the software suite for devs though.

I guess it’s not really accurate to call it interpolation, since it’s not drawing information from future frames. It’s more like extrapolation.
So would it be from the past?
Finally, some gigantic news for our beloved upcoming Switch Drake 🧡
This wouldn’t really come to Drake I’m afraid.

Are we sure T239 will use A78Cs? Does the A710/A715 not support an 8-core cluster?

Also, I remember someone saying that Drake may have features from Ada, just like Erista has features from Pascal even though it is technically Maxwell. My question is, could these features include support for DLSS 3.0?
They do support an 8-core cluster, but the A715 does not support 32-bit, and a few Switch games are 32-bit.

And since it’s a derivative of Orin, it’s more likely A78 than A710.


As for Ada Lovelace, we don’t know but it doesn’t seem like it.
Wasn't there a rumour that Drake was based on an updated Ampere with some backported Lovelace features? Maybe some DLSS 3 enhancements could be used for Drake's version of DLSS
We don’t really know tbh.


But I doubt it has Lovelace features or can use DLSS 3.

You made quite an assumption here and attributed things to me that I didn't write.

I'm not actually comparing the A57 in the current Switch to the A78 in Drake. I know all that, and I also know the A57 throttles around 2GHz. I also wasn't the first to mention 2GHz for the A57; the person who replied to me said that Nintendo considered 2GHz for the current Switch, and I was just responding to that. My question was simple: why did Nintendo use 1GHz, and not at least 1.2-1.3GHz, for the A57 if they reportedly wanted 2GHz from it?
My point is that Nintendo usually goes with lower clocks than the device could run at. The fact that Drake could run an 8-core CPU at 2GHz doesn't automatically mean Nintendo will do that, just like the A57 in the current Switch could easily run above 1GHz but Nintendo didn't do that.

Of course "If Drake had a CPU clocked to 2GHz, it would not be comparable to the situation of the A57 at 2GHz"; no one said otherwise, so I don't know where you got that.

I already know that too.
I mean, if that’s what you meant, then you should’ve phrased it that way, because the way you were responding to the others heavily implied you were saying exactly what I wrote. I apologize if I misunderstood, but I can’t really read anything else from what’s in front of me; it just looks like you’re painting a different picture.
 
But Arm does mention the Cortex-A710 supporting up to 8 CPU cores per cluster and the Cortex-A715 supporting up to 12 CPU cores per cluster.

The Cortex-A715 is probably a no go for Nintendo since the Cortex-A715 doesn't have 32-bit support, which I think can be problematic in terms of backwards compatibility with the Nintendo Switch.
Thanks, I missed that. Yeah, apparently some Wii U ports use 32-bit binaries, which would make the A715 tricky (although this is something they're going to have to deal with at some point). In theory the A710 would work, but I'd still say the A78C is by far the most likely unless we get any evidence otherwise.

Any idea why there's such a large range for power consumption for these? I guess the amount they throttle?

They have different power profiles. As a developer, you can select, say the 15W profile, and it will lower clocks and/or disable cores to bring power consumption down to that level. There may be some throttling too.

It almost sounds like it's alternating generated frames with game engine frames, unless it's doing 2:2:2:2.

I guess the language could be interpreted several ways. I'm excited to see something like a Digital Foundry breakdown of the technology, either way.

The fact that it's taking in two frames makes me think this might actually be interpolation, not extrapolation. ie the two frames are the frames before and after the generated frame. I can't see why an extrapolation scheme would require two frames if it's got motion vectors (unless it's for the optical flow?). This should provide better results in terms of image quality over extrapolation, but would add latency, as you effectively have to render a frame in advance before you can generate the intermediate frame.
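To make the interpolation vs. extrapolation distinction concrete, here's a toy single-value sketch in C. This is purely my own illustration and has nothing to do with how DLSS 3 is actually implemented: interpolation needs the frame on each side of the generated one, so the next real frame must already exist (hence the latency), while extrapolation only projects past frames forward.

C:
/* Toy illustration of interpolation vs extrapolation for one value
 * (think: a pixel position along a motion vector). Purely conceptual,
 * not how DLSS 3 works internally. */
#include <stdio.h>

/* Interpolation: needs the frame BEFORE and AFTER the generated one,
 * so the next real frame must already be rendered (extra latency). */
static float interpolate(float frame_n, float frame_n_plus_1, float t)
{
    return frame_n + (frame_n_plus_1 - frame_n) * t;   /* t = 0.5 -> halfway */
}

/* Extrapolation: needs only PAST frames; projects the last observed
 * change forward, so no added latency but more room for error. */
static float extrapolate(float frame_n_minus_1, float frame_n)
{
    float velocity = frame_n - frame_n_minus_1;         /* change per frame  */
    return frame_n + 0.5f * velocity;                   /* half a frame ahead */
}

int main(void)
{
    /* A value moving 10 -> 20 -> 30 across rendered frames: both methods
     * agree under linear motion, but diverge when motion changes. */
    printf("interpolated: %.1f\n", interpolate(20.0f, 30.0f, 0.5f));  /* 25.0 */
    printf("extrapolated: %.1f\n", extrapolate(10.0f, 20.0f));        /* 25.0 */
    return 0;
}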
 
Very different type of tech speculation: In just a few years the standard for VR went from using external cameras for tracking headsets, to doing inside-out tracking using cameras on the headsets themselves. I'm not sure what kind of camera complexity/CPU cost is involved in that, but I wonder if some future Joy-Con successor might use similar tech to bring back better-than-Wii style pointer controls without the need for a sensor bar.
 
The fact that it's taking in two frames makes me think this might actually be interpolation, not extrapolation. ie the two frames are the frames before and after the generated frame. I can't see why an extrapolation scheme would require two frames if it's got motion vectors (unless it's for the optical flow?). This should provide better results in terms of image quality over extrapolation, but would add latency, as you effectively have to render a frame in advance before you can generate the intermediate frame.
Yeah, the more I think about it the more interpolation makes sense, except that they specifically say DLSS is generating one frame for each game-rendered frame, which makes it sound like you get four frames out for every two frames in. Unless it just means that 'after the first two frames, DLSS alternates generated frames with game-rendered frames.'
 
Video Encode: 1080p30, supported by 1-2 CPU cores
Video Decode: 1x 4K60 (H.265) | 2x 4K30 (H.265) | 5x 1080p60 (H.265) | 11x 1080p30 (H.265)
Wait a minute, only 30FPS for video encode? That's kinda disappointing :(
I mean, my cope is that it's at least capable of 720p60. Shouldn't be hard, right? I mean, most cheapo Chinese capture cards that can do 1080p30 also support 720p60 on the flip side. Well, actually MJPEG and YUY2 (at 480p), so not really comparable... but you get the point.

Currently, iirc, the Switch in both handheld and docked mode captures video (up to 30 seconds) at 720p30 encoded in H.264. But many games don't even let you use that feature because of "performance concerns". So I'm thinking this could be a huge upgrade, since they could offload it to a dedicated core.

At least 4K60 H.265 video decode should make it easier to bring more streaming platforms to the console. I wish NVIDIA had said anything about VP9, since AV1 is still too new to even be relevant to this discussion.
 
It almost sounds like it's alternating generated frames with game engine frames, unless it's doing 2:2:2:2.

I guess the language could be interpreted several ways. I'm excited to see something like a Digital Foundry breakdown of the technology, either way.
They are definitely alternating - there’s a diagram on this page that shows it:
[image: nvidia-dlss-3-reconstructs-pixels.png]

I believe the pairs of images part is talking about the optical flow estimation:
Ada’s Optical Flow Accelerator analyzes two sequential in-game frames and calculates an optical flow field. The optical flow field captures the direction and speed at which pixels are moving from frame 1 to frame 2. The Optical Flow Accelerator is able to capture pixel-level information such as particles, reflections, shadows, and lighting, which are not included in game engine motion vector calculations. In the motorcycle example below, the motion flow of the motorcyclist accurately represents that the shadow stays in roughly the same place on the screen with respect to their bike.
The description of the inputs on that page also doesn't mention having to draw from future frames:
The DLSS Frame Generation convolutional autoencoder takes 4 inputs – current and prior game frames, an optical flow field generated by Ada’s Optical Flow Accelerator, and game engine data such as motion vectors and depth.
Still not 100% certain, but I feel more confident that they really are relying completely on inference for the generated frames.
So would it be from the past?
Yeah, if I understand it correctly anyway.
 


Preview of a preview, but some early DLSS 3.0 numbers
The fact that it's taking in two frames makes me think this might actually be interpolation, not extrapolation. ie the two frames are the frames before and after the generated frame. I can't see why an extrapolation scheme would require two frames if it's got motion vectors (unless it's for the optical flow?). This should provide better results in terms of image quality over extrapolation, but would add latency, as you effectively have to render a frame in advance before you can generate the intermediate frame.
It's definitely extrapolation; Nvidia specifically mentions generating frames even when the CPU is bottlenecking the frame rate. It needs a buffer of input frame data in order to infer motion vectors in the OFA, but the DLSS 3.0 use case also involves extrapolating frames while supersampling.
 
They are definitely alternating - there’s a diagram on this page that shows it:
[image: nvidia-dlss-3-reconstructs-pixels.png]

I believe the pairs of images part is talking about the optical flow estimation:

The description of the inputs on that page also doesn't mention having to draw from future frames:

Still not 100% certain, but I feel more confident that they really are relying completely on inference for the generated frames.

Yeah, if I understand it correctly anyway.
Very cool, thanks for pointing this out.
 
If there is any Ada feature that comes to Drake, the only one I can see coming (and even then it's a low chance) is the SER feature:

  • Shader Execution Reordering (SER) that improves execution efficiency by rescheduling shading workloads on the fly to better utilize the GPU’s resources. As significant an innovation as out-of-order execution was for CPUs, SER improves ray-tracing performance up to 3x and in-game frame rates by up to 25%.




This would make it more efficient at doing the same job with RT or otherwise.


I don’t see 4th-gen Tensor Cores or 3rd-gen Ray Tracing Cores coming. But something akin to out-of-order execution for the GPU that's useful for RT would be really nice.
 
I mean, if that’s what you meant, then you should’ve phrased it that way, because the way you were responding to the others heavily implied you were saying exactly what I wrote. I apologize if I misunderstood, but I can’t really read anything else from what’s in front of me; it just looks like you’re painting a different picture.

I mean, I wrote a couple of times that Nintendo usually goes with lower clocks than the device could run at. I even made the point that Drake being able to run an 8-core CPU at 2GHz doesn't automatically mean Nintendo will do that, and wrote that the A57 in the current Switch could easily run above 1GHz but Nintendo didn't do that,
while I didn't compare 2GHz on the A57 to 2GHz on the A78 (Drake) at all.

It seems that some of you are replying to me about things I didn't write or points I didn't make, while at the same time ignoring my main points.
Maybe my wording is bad, but maybe some of you aren't reading the whole discussion and the posts carefully.


In any case, anyone want to bet that the Drake CPU won't run at 2GHz or higher!? :)
 
Just thinking about where Nvidia can take DLSS from here. A recent addition to DLSS was DRS: DLSS targets, say, 4K output and just changes the input resolution to meet a given performance target.

As DLSS 3.0 gets better, what's stopping them from adding even more inferred frames? Say two frames inferred for each one rendered? They could then fold it into a DRS-type solution where you specify a target resolution and frame rate, and it just modifies the input resolution and the number of inferred frames between real frames to meet that performance target.

Crazy potential could be unlocked by this technology, and it's great for low-power-consumption gaming as well, since inferring frames is less demanding than rendering them. Part of the reason I love my Switch is that it draws barely anything from the wall, which is great for both my wallet and the environment.
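Something like the controller below is what I imagine. This is a completely hypothetical sketch, not a real DLSS interface: measure how long rendered frames take, then trade internal render resolution and, eventually, the number of generated frames against the output target.

C:
/* Hypothetical dynamic-resolution + frame-generation controller sketch.
 * Not a real DLSS API; just the general idea of trading input resolution
 * and inferred frames against a frame-time target. */
struct frame_pacing {
    float target_ms;        /* e.g. 16.7 for a 60 fps output target   */
    float render_scale;     /* fraction of output resolution rendered */
    int   generated_frames; /* inferred frames per rendered frame     */
};

static void update_pacing(struct frame_pacing *p, float last_render_ms)
{
    /* A rendered frame only has to arrive every (generated_frames + 1)
     * output intervals; the frames in between are inferred. */
    float budget_ms = p->target_ms * (float)(p->generated_frames + 1);

    if (last_render_ms > budget_ms && p->render_scale > 0.5f)
        p->render_scale -= 0.05f;             /* render fewer pixels       */
    else if (last_render_ms > budget_ms && p->generated_frames < 2)
        p->generated_frames += 1;             /* lean harder on inference  */
    else if (last_render_ms < 0.8f * budget_ms && p->render_scale < 1.0f)
        p->render_scale += 0.05f;             /* headroom: restore quality */
}

int main(void)
{
    struct frame_pacing p = { .target_ms = 16.7f, .render_scale = 1.0f,
                              .generated_frames = 1 };
    update_pacing(&p, 40.0f);   /* a slow rendered frame: scale drops to 0.95 */
    return 0;
}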
 
Very different type of tech speculation: In just a few years the standard for VR went from using external cameras for tracking headsets, to doing inside-out tracking using cameras on the headsets themselves. I'm not sure what kind of camera complexity/CPU cost is involved in that, but I wonder if some future Joy-Con successor might use similar tech to bring back better-than-Wii style pointer controls without the need for a sensor bar.
From a technical point of view I'm sure they could, but from a practical point of view the small size of joy-cons would probably be a hindrance. VR controllers tend to be quite large, usually with a halo ring of tracking points on top, to make them visible to the tracking cameras, and to provide enough distance between tracking points to properly triangulate the position. The joy-cons have to be small for the Switch to be portable, and even if you added a bunch of tracking points onto them they'd pretty much all be covered by people's hands while in use. I wonder if something along the lines of the ultra-wideband RF-based location tracking Apple use in their U1 chip could provide accurate enough tracking without the need for cameras.
 
Just thinking about where Nvidia can take DLSS from here. A recent addition to DLSS was DRS: DLSS targets, say, 4K output and just changes the input resolution to meet a given performance target.

As DLSS 3.0 gets better, what's stopping them from adding even more inferred frames? Say two frames inferred for each one rendered? They could then fold it into a DRS-type solution where you specify a target resolution and frame rate, and it just modifies the input resolution and the number of inferred frames between real frames to meet that performance target.
I think that's the eventual target. You set an output resolution and frame rate and DLSS - or whatever the end upscaler is - guarantees it, upscaling and extrapolating as it goes. Ultimately, the only barrier is actually rendering frames starving out the GPU before it can generate pixel/frame data.

It'll be interesting to see if we start to see art styles converge because of it, the same way UE has a certain "look" that its various tools seem to push devs toward. If DLSS/FSR/XeSS become essential to delivering a triple-A experience (and they will), will their various weaknesses/strengths be designed around, and how much will that bubble up into sameness in the visual design?
 
From a technical point of view I'm sure they could, but from a practical point of view the small size of joy-cons would probably be a hindrance. VR controllers tend to be quite large, usually with a halo ring of tracking points on top, to make them visible to the tracking cameras, and to provide enough distance between tracking points to properly triangulate the position. The joy-cons have to be small for the Switch to be portable, and even if you added a bunch of tracking points onto them they'd pretty much all be covered by people's hands while in use. I wonder if something along the lines of the ultra-wideband RF-based location tracking Apple use in their U1 chip could provide accurate enough tracking without the need for cameras.
My hope is that something like this simply allows for Wii level motion controls, outside of VR - using the dock, for example, as a reference point, and the tensor cores for inference.
 
What is happening today? What new information do we have? I'm trying to keep up with today's information, but it's getting out of hand.
I don't fully understand what is happening, and my first language is not English (in fact, I usually use translators to be able to communicate).
The reason I'm asking what's going on today is that I'm seeing a lot of movement and comments on social media about the T239 and other clues that seem to point to the Switch revision. Could someone explain what is happening? Thanks.
 
Please read this staff post before posting.

Furthermore, according to this follow-up post, all off-topic chat will be moderated.

