Thraktor
Preview of a preview, but some early DLSS 3.0 numbers
It's definitely extrapolation; Nvidia specifically mentions generating frames when the CPU would otherwise bottleneck and drop a frame. It needs a buffer of input-frame data in order to infer motion vectors in the OFA, but the DLSS 3.0 use case also involves extrapolating frames while supersampling.
Thanks. I suppose my confusion stemmed from the fact that (as far as I'm aware) DLSS 2 only requires a single input frame to perform temporal supersampling, as the temporal aspect is handled implicitly by virtue of it being implemented as a recurrent neural network. The article also specifies that the two frames are required in addition to the already generated optical flow field, per this quote:
The DLSS Frame Generation convolutional autoencoder takes 4 inputs – current and prior game frames, an optical flow field generated by Ada’s Optical Flow Accelerator, and game engine data such as motion vectors and depth.
Perhaps the frame generation portion of DLSS 3 isn't a recurrent network, hence requiring two frames rather than one.
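The distinction can be sketched in code. This is purely a toy illustration of the input shapes, not Nvidia's actual network: the quoted description implies the frame-generation autoencoder sees both frames (plus flow and engine data) explicitly, whereas a DLSS 2-style recurrent network would take only the current frame, with temporal history carried in a hidden state. All dimensions and channel counts here are assumptions for illustration.

```python
import numpy as np

H, W = 4, 4  # tiny "frame" for illustration

def frame_gen_input(curr, prev, flow, engine_data):
    """Stack the 4 described inputs channel-wise, as an autoencoder might."""
    return np.concatenate([curr, prev, flow, engine_data], axis=0)

def recurrent_step(curr, hidden):
    """A recurrent model sees only the current frame; history lives in `hidden`."""
    return np.concatenate([curr, hidden], axis=0)

curr = np.zeros((3, H, W))   # current RGB frame
prev = np.zeros((3, H, W))   # prior RGB frame
flow = np.zeros((2, H, W))   # optical flow field (x, y) from the OFA
aux  = np.zeros((3, H, W))   # motion vectors + depth from the game engine

x = frame_gen_input(curr, prev, flow, aux)
print(x.shape)  # (11, 4, 4): all four inputs present explicitly

h = np.zeros((8, H, W))      # hidden state carries the temporal context instead
y = recurrent_step(curr, h)
print(y.shape)  # (11, 4, 4): one frame plus carried state
```

Under this reading, needing two explicit frames is exactly what you'd expect if the frame-generation model is feed-forward rather than recurrent.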
My hope is that something like this simply allows for Wii-level motion controls outside of VR, using the dock, for example, as a reference point, and the tensor cores for inference.
Yeah, the dock (or just the Switch tablet itself, given the Joy-Cons would be detached) could serve as a reference point for tracking. I'm not sure how accurate a UWB solution could be made, though. A quick search indicates accuracy in the range of 10cm to 50cm, which is fine for finding a lost set of keys but not great for controller tracking. I'm curious whether that's a technological limit, or just a limit of current implementations based on their less strict requirements.
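To get a feel for how ranging error turns into position error, here's a minimal trilateration sketch. The anchor layout, the ~20cm ranging noise, and the linearised least-squares solve are all my own assumptions for illustration, roughly matching the 10cm-50cm figure above; it's not how any real UWB stack is specified to work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: four fixed UWB anchors (e.g. dock + room corners), metres.
anchors = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 2.0], [3.0, 2.0]])
true_pos = np.array([1.2, 0.8])  # controller position

ranges = np.linalg.norm(anchors - true_pos, axis=1)
noisy = ranges + rng.normal(0.0, 0.2, size=ranges.shape)  # ~20 cm ranging noise

# Linearise by subtracting the first anchor's range equation, which cancels
# the quadratic |p|^2 term and leaves a linear system in (x, y):
#   2 (a_i - a_0) . p = r_0^2 - r_i^2 + |a_i|^2 - |a_0|^2
a0, r0 = anchors[0], noisy[0]
A = 2.0 * (anchors[1:] - a0)
b = r0**2 - noisy[1:]**2 + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2)

est, *_ = np.linalg.lstsq(A, b, rcond=None)
err = np.linalg.norm(est - true_pos)
print(f"position error: {err:.2f} m")
```

With decimetre-level ranging noise, the solved position lands in the same error ballpark, which is why that accuracy range seems fine for key-finding but marginal for controller tracking.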