In fact, RT uses so few rays that the raw image generated looks worse than a 2001 flip phone camera taking a picture at night. It doesn't even have connected lines, just semi-random dots. The job of the denoiser is to go and connect those dots into a coherent image. You can think of a denoiser kinda like anti-aliasing on steroids - antialiasing takes jagged lines, figures out what the artistic intent of those lines was supposed to be, and smoothes it out.
The problem with anti-aliasing is that it can be blurry - you're deleting "real" pixels, and replacing them with higher res guesses. It's cleaner, but deletes real detail to get a smoother output. That's why DLSS 2 is also an anti-aliaser - DLSS 2 needs that raw, "real" pixel data to feed its AI model, with the goal of producing an image that keeps all the detail and smoothes the output.
And that's why DLSS 2 has not always interacted well with RT effects. The RT denoiser runs before DLSS 2 does, deleting useful information that DLSS would normally use to show you a higher resolution image. DLSS 3.5 now replaces the denoiser with its own upscaler, just as it replaced anti-aliasing tech before it. This has four big advantages.
This post is a great explainer, but there are some fundamental differences between denoising and antialiasing that are worth clarifying. For example, I wouldn't say that anti-aliasing is deleting "real" detail. Both noise and aliasing are sampling artifacts, but their origin is very different. I'll elaborate on why:
Spatial frequency
Just like how signals that vary in time have a frequency, so do signals that vary in space. A really simple signal in time is something like sin(2 * pi * t). It's easy to imagine a similar signal in 2D space: sin(2 * pi * x) * sin(2 * pi * y). I'll have Wolfram Alpha plot it:
The important thing to notice here is that the frequency content in x and y is separable. You could have a function that has a higher frequency in x than in y, like sin(5 * pi * x) * sin(2 * pi * y):
So just like temporal frequency has dimensions of 1/[Time], spatial frequency has dimensions of 1/[Length], and it's defined separately along each axis. The way the signal varies with x and the way it varies with y are independent. That's true in 1D, 2D, 3D... N-D, but we care about 2D because images are 2D signals.
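To make that concrete, here's a quick numpy sketch (my own illustration, not from any renderer) that samples sin(2 * pi * x) * sin(2 * pi * y) on a discrete grid and looks at its 2D spectrum; the energy lands at one cycle in x and one cycle in y, independently:

```python
import numpy as np

# Sample sin(2*pi*x) * sin(2*pi*y) on a discrete grid.
# The 2D signal is a product of two 1D signals, so its
# frequency content in x and y is independent (separable).
n = 64
x = np.linspace(0, 1, n, endpoint=False)
y = np.linspace(0, 1, n, endpoint=False)
X, Y = np.meshgrid(x, y)
signal = np.sin(2 * np.pi * X) * np.sin(2 * np.pi * Y)

# The 2D FFT of a separable signal is the outer product of the
# 1D FFTs, so all the energy sits at +/-1 cycles on each axis.
spectrum = np.abs(np.fft.fft2(signal))
ky, kx = np.unravel_index(np.argmax(spectrum), spectrum.shape)
print(kx, ky)  # peak at 1 cycle in x and 1 cycle in y (or the mirrored bin)
```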
What is aliasing, really?
Those sine functions above are continuous; you know exactly what the value is at every point you can imagine. But a digital image is discrete; it's made up of a finite number of equally spaced points. To make a discrete signal out of a continuous signal, you have to sample each point on the grid. If you want to take that discrete signal back to a continuous signal, then you have to reconstruct the original signal.
Ideally, that reconstruction would be perfect. A signal is called band-limited if its highest frequency is finite. For example, in digital music, we treat most signals as band-limited to the range of human hearing, which is generally accepted to top out around 20,000 Hz. A very important theorem in digital signal processing, the Nyquist-Shannon sampling theorem, says that you can reconstruct a band-limited signal perfectly if you sample at more than twice the highest frequency in the signal. That's why CD audio uses a 44.1 kHz sampling rate: it's more than twice the 20 kHz limit of human hearing, so the audible signal can be perfectly reconstructed.
When you sample at less than twice the highest frequency, it's no longer possible to perfectly reconstruct the original signal. Instead, you get an aliased representation of the data. For a digital image, the sampling rate is the resolution: of the sensor in a camera, or of the render target in a computer-rendered image. This sampling rate needs to be high enough to correctly represent the information that you want to capture/display; otherwise, you will get aliasing.
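You can watch aliasing happen in a few lines of numpy (a toy illustration with made-up frequencies, not anything from a real pipeline). A 9 Hz sine sampled at 12 Hz, where the Nyquist limit is only 6 Hz, produces exactly the same samples as a 3 Hz sine:

```python
import numpy as np

# Sample a 9 Hz sine at fs = 12 Hz. The Nyquist limit is fs/2 = 6 Hz,
# so 9 Hz cannot be represented: it aliases down to |9 - 12| = 3 Hz.
fs = 12.0
t = np.arange(48) / fs

high = np.sin(2 * np.pi * 9 * t)        # the "real" 9 Hz signal
alias = np.sin(2 * np.pi * (-3) * t)    # a 3 Hz sine, phase-flipped

# The two sets of samples are identical: once sampled, the 9 Hz
# signal is indistinguishable from the aliased 3 Hz one.
print(np.allclose(high, alias))  # True
```

No amount of post-processing can tell those two signals apart after sampling, which is why the filtering has to happen before the samples are taken.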
By the way, this tells us why we get diminishing returns with increasing resolution. Since the x and y components of the signal are separable, quadrupling the "resolution" in the sense of the number of pixels (for example, going from 1080p to 2160p) only doubles the Nyquist frequency in x and in y.
So why does aliasing get explained as "jagged edges" so often? Well, any discontinuity in an image, like a geometric edge, contains arbitrarily high frequencies. The signal is not band-limited, so no sampling rate can satisfy the Nyquist-Shannon theorem, and perfect reconstruction is impossible (https://pbr-book.org/3ed-2018/Sampling_and_Reconstruction/Sampling_Theory). But you can also have aliasing without a discontinuity, when the spatial resolution is too low to represent a signal (this is the reason texture mipmaps exist; lower-resolution mipmaps are low-pass filtered to remove high-frequency content, preventing aliasing).
You can even have temporal aliasing in a game, when the framerate is too low to represent something moving quickly (for example, imagine a particle oscillating between 2 positions at 30 Hz; if your game is rendering at less than 60 fps, then by the Nyquist-Shannon theorem, the motion of the particle will be temporally aliased).
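The same toy-numpy approach shows temporal aliasing (the numbers here are hypothetical, just to make the arithmetic clean). A 30 Hz oscillation rendered at 24 fps produces exactly the same frames as a 6 Hz one:

```python
import numpy as np

# A particle oscillates at 30 Hz, but the game renders at 24 fps.
# Nyquist says we'd need more than 60 fps; at 24 fps the motion
# aliases down to |30 - 24| = 6 Hz.
fps = 24.0
frames = np.arange(96) / fps

true_motion = np.cos(2 * np.pi * 30 * frames)  # actual 30 Hz oscillation
seen_motion = np.cos(2 * np.pi * 6 * frames)   # what the rendered frames show

print(np.allclose(true_motion, seen_motion))  # True
```

This is the same effect as wagon wheels appearing to spin backwards on film.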
So what do we do to get around aliasing?
The best solution, from an image quality perspective, is to low-pass filter the signal before sampling it. Which, yes, does essentially mean blurring it. For a continuous signal, the ideal filter is the sinc function, because it acts as a perfect low-pass filter in frequency space. But the sinc function has infinite extent, so the best you can do in discrete space is a finite approximation. That, with some hand-waving, is what Lanczos filtering is, which (plus some extra functionality to handle contrast at edges and the like) is how FSR handles reconstruction. Samples of the scene are collected each frame, warped by the motion vectors, then filtered to reconstruct as much of the higher-frequency information as possible.
The old-school methods of anti-aliasing, like supersampling and MSAA, worked similarly. You take more samples than you need (in the case of MSAA, you do it selectively near edges), then low-pass filter them to generate a final image without aliasing. By the way, even though it seems like an intuitive choice, the averaging filter (e.g. taking 4 4K pixels and averaging them to a single 1080p pixel) is actually kind of a shitty low-pass filter, because it introduces ringing artifacts in frequency space. Lanczos is much better.
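A small numpy sketch makes that comparison concrete (the kernels here are my own toy constructions sized for 2x downsampling, not anyone's shipping filter). The averaging filter leaks far more energy in the stopband than a Lanczos-2 windowed sinc:

```python
import numpy as np

# Compare the frequency response of a 2-tap averaging (box) filter
# against a Lanczos-2 windowed-sinc kernel, both for 2x downsampling.
def response(h, freqs):
    """Magnitude response of a centered FIR kernel at the given frequencies."""
    n = np.arange(len(h)) - (len(h) - 1) / 2.0
    return np.abs(np.array([(h * np.exp(-2j * np.pi * f * n)).sum() for f in freqs]))

box = np.array([0.5, 0.5])                 # average two neighboring pixels

n = np.arange(-3, 4)
lanczos = np.sinc(n / 2) * np.sinc(n / 4)  # Lanczos-2 taps for 2x decimation
lanczos /= lanczos.sum()                   # normalize to unit DC gain

# Frequencies above the output Nyquist rate, which the filter should kill.
stopband = np.linspace(0.35, 0.5, 200)
box_leak = response(box, stopband).max()
lanczos_leak = response(lanczos, stopband).max()
print(box_leak > lanczos_leak)  # True: the box filter leaks much more
```

The leaked high-frequency energy is exactly what folds back into the image as aliasing after downsampling.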
An alternative way to do the filtering is to use a convolutional neural network (specifically, a convolutional autoencoder). DLDSR is a low-pass filter for spatial supersampling, and of course, DLSS does reconstruction. These are preferable to Lanczos because, since the signal is discrete and not band-limited, there's no perfect analytical filter for reconstruction. Instead of doing contrast-adaptive shenanigans like FSR does, you can just train a neural network to do the work. (And, by the way, if Lanczos were the ideal filter, the neural network would learn to reproduce Lanczos, because a neural network is a universal function approximator; with enough nodes, it can learn any function.) Internally, the convolutional neural network downsamples the image several times while learning relevant features, then uses those learned features to reconstruct the output image.
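To show just that data flow and nothing else, here's a toy numpy sketch: the weights are random and untrained, and the nearest-neighbor upsample is a crude stand-in for a learned decoder, so treat this as a shape-level illustration of the downsample/upsample structure, not a real DLSS-style network:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, kernel, stride=1):
    """Minimal 'valid' 2D convolution; stride > 1 downsamples."""
    kh, kw = kernel.shape
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = img[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = (patch * kernel).sum()
    return out

def upsample2x(img):
    """Nearest-neighbor upsample, standing in for a learned transposed conv."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

image = rng.standard_normal((64, 64))

# Encoder: strided convs shrink the image while (in a trained network)
# extracting features. The kernels here are random, untrained stand-ins.
h1 = conv2d(image, rng.standard_normal((4, 4)), stride=2)  # 64x64 -> 31x31
h2 = conv2d(h1, rng.standard_normal((3, 3)), stride=2)     # 31x31 -> 15x15

# Decoder: upsample back toward the input resolution.
out = upsample2x(upsample2x(h2))                           # 15x15 -> 60x60
print(h1.shape, h2.shape, out.shape)
```

In a real network the encoder would have many learned channels per layer and the decoder would have learned upsampling weights, but the bottleneck-then-reconstruct shape is the same.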
What's different about ray tracing, from a signal processing perspective?
(I have no professional background in rendering. I do work that involves image processing, so I know more about that. But I have done some reading about this for fun, so let's go).
When light hits a surface, some of it is transmitted, and some is scattered. To calculate the outgoing light, you have to solve what's called the light transport equation: essentially an integral, over all incoming directions, of the incoming light weighted by a function that describes how the material scatters it. But in most cases, this equation has no exact, analytic solution. Instead, you need a numerical approximation.
Monte Carlo algorithms numerically approximate an integral by randomly sampling over the integration domain. Path tracing is the application of a Monte Carlo algorithm to the light transport equation. Because you are randomly sampling, you get image noise, which averages out as you take more samples and the estimate converges. But if you have a good denoising algorithm, you can reduce the number of samples needed for convergence. Unsurprisingly, convolutional autoencoders are also very good at this (because, again, universal function approximators). Again, I'm not in this field, but I mean, Nvidia's published on it before (https://research.nvidia.com/publica...n-monte-carlo-image-sequences-using-recurrent). It's out there!
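The noise-versus-samples tradeoff is easy to demonstrate with a toy Monte Carlo integral (a made-up 1D stand-in for the light transport integral, not actual rendering code):

```python
import numpy as np

# Monte Carlo estimate of a simple 1D integral: I = ∫₀¹ sin(pi*x) dx = 2/pi.
# The estimator is the mean of the integrand at random points; its noise
# shrinks like 1/sqrt(N), which is why path tracing needs so many rays
# (or a good denoiser) before the image stops looking like static.
rng = np.random.default_rng(42)
truth = 2 / np.pi

def estimate(n_samples):
    x = rng.uniform(0, 1, n_samples)
    return np.sin(np.pi * x).mean()

def rms_error(n_samples, trials=50):
    """Average error over repeated runs, so one lucky run doesn't mislead."""
    errs = [estimate(n_samples) - truth for _ in range(trials)]
    return float(np.sqrt(np.mean(np.square(errs))))

print(rms_error(100) > rms_error(10_000))  # True: more samples, less noise
```

Note the brutal scaling: 100x more samples only buys you 10x less noise, which is exactly the gap a denoiser is trying to close.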
And yes, you can have aliasing in ray-traced images. If you took all the ray samples on the same pixel grid, any high-frequency information you came across would be aliased. So instead, you can randomly distribute the Monte Carlo samples using some sampling algorithm (https://www.pbr-book.org/3ed-2018/Monte_Carlo_Integration/Careful_Sample_Placement).
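Here's a toy numpy comparison (again, my own illustration, not a renderer's actual sampler) of why careful sample placement helps: putting one random sample in each stratum, i.e. jittering a grid, gives a much lower-variance estimate than throwing the same number of samples anywhere, while staying random enough to turn aliasing into noise:

```python
import numpy as np

# Estimate ∫₀¹ sin(pi*x) dx = 2/pi with 64 samples, two ways:
# purely uniform random samples vs. one jittered sample per stratum.
rng = np.random.default_rng(7)

def uniform_estimate(n):
    return np.sin(np.pi * rng.uniform(0, 1, n)).mean()

def jittered_estimate(n):
    # One uniform random sample inside each of n equal strata.
    x = (np.arange(n) + rng.uniform(0, 1, n)) / n
    return np.sin(np.pi * x).mean()

trials = 200
var_uniform = np.var([uniform_estimate(64) for _ in range(trials)])
var_jitter = np.var([jittered_estimate(64) for _ in range(trials)])
print(var_uniform > var_jitter)  # True: stratification cuts the variance
```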
Once you have the samples, DLSS was already very similar in structure to a denoising algorithm. If, for example, the Halton sampling algorithm (https://pbr-book.org/3ed-2018/Sampling_and_Reconstruction/The_Halton_Sampler) for distributing Monte Carlo samples sounds familiar, it's because it's the same algorithm Nvidia recommends for subpixel jittering in DLSS. Temporal upscalers like DLSS already exploit randomly distributed samples to reconstruct higher-frequency information. So it makes sense to combine the DLSS reconstruction passes for rasterized and ray-traced samples because, in many ways, the way the data are structured and processed is very similar.
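For reference, the Halton sequence is simple enough to sketch in a few lines of Python (a toy version of the radical-inverse construction described in the PBR book link above):

```python
def radical_inverse(i, base):
    """Mirror the base-b digits of i around the radix point: 6 = 110₂ -> 0.011₂."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        inv += digit * f
        f /= base
    return inv

def halton(i):
    """2D Halton point: base 2 in x, base 3 in y (coprime bases per axis)."""
    return radical_inverse(i, 2), radical_inverse(i, 3)

points = [halton(i) for i in range(1, 9)]
print(points[0])  # (0.5, 0.3333333333333333)
```

The appeal for jittering is that the points are deterministic and well spread out, yet never fall on a regular grid, so successive frames probe different subpixel positions.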
tl;dr
Aliasing is an artifact of undersampling a high-frequency signal. Good anti-aliasing methods filter out the high frequency information before sampling to remove aliasing from the signal. Temporal reconstruction methods, like DLSS and FSR, use randomly jittered samples collected over multiple frames to reconstruct high frequency image content.
Noise in ray tracing is an artifact of randomly sampling rays using a Monte Carlo algorithm. Instead of taking large numbers of random samples, denoising algorithms attempt to reconstruct the signal from a noisy input.