How Nvidia DLSS 3 works, and why FSR can’t catch up for now


Nvidia’s RTX 40-series graphics cards are due out in weeks, but beneath all the hardware improvements lies what could be Nvidia’s golden egg: DLSS 3. It’s much more than just an update to Nvidia’s popular Deep Learning Super Sampling (DLSS) feature, and it could end up defining the next generation of Nvidia much more than the graphics cards themselves.

AMD has been working hard to bring FidelityFX Super Resolution (FSR) on par with DLSS, and over the past few months it has been successful. DLSS 3 looks set to change that dynamic – and this time around, FSR may not be able to catch up any time soon.

How DLSS 3 works (and how it doesn’t)

A diagram showing how Nvidia's DLSS 3 technology works.
NVIDIA

You’d be forgiven for thinking that DLSS 3 is an entirely new version of DLSS, but it’s not. Or at least it’s not entirely new. The backbone of DLSS 3 is the same Super Resolution technology available in DLSS titles today, and Nvidia will likely continue to improve it with new releases. Nvidia says you’ll now see the Super Resolution portion of DLSS 3 as a separate option in graphics settings.

The new part is the frame generation. DLSS 3 generates a completely unique image every other frame, essentially generating seven out of every eight pixels you see. See the flow chart below for an illustration. In the case of 4K, your GPU only renders the pixels for 1080p and uses that information not only for the current frame but also for the next frame.

A diagram showing how DLSS 3 reconstructs frames.
NVIDIA
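
The seven-out-of-eight figure follows from simple pixel counting. Here is a minimal sketch, assuming a 4K output (3840×2160), a 1080p internal render (1920×1080), and one generated frame for every rendered frame. The function name is purely illustrative.

```python
from fractions import Fraction

def generated_pixel_fraction(render_res, output_res, frames_shown_per_rendered=2):
    """Fraction of displayed pixels that were not rendered natively."""
    rendered = render_res[0] * render_res[1]
    displayed = output_res[0] * output_res[1] * frames_shown_per_rendered
    return 1 - Fraction(rendered, displayed)

# 1080p rendered, 4K displayed, every rendered frame followed by a generated one.
fraction = generated_pixel_fraction((1920, 1080), (3840, 2160))
print(fraction)  # 7/8
```

With Super Resolution alone (one displayed frame per rendered frame), the same count gives three out of every four pixels.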

Frame generation will be a separate toggle from Super Resolution, according to Nvidia. That’s because frame generation only works on RTX 40-series GPUs for now, while Super Resolution will continue to work on all RTX graphics cards, including in games updated to DLSS 3. It should go without saying, but if half of your frames are fully generated, that will boost your performance by a lot.

Frame generation isn’t just some AI secret sauce, though. In DLSS 2 and tools like FSR, motion vectors are an important input for upscaling. They describe where objects move from one frame to the next, but motion vectors only apply to the geometry in a scene. Elements without 3D geometry, such as shadows, reflections, and particles, have traditionally been masked out of the upscaling process to avoid visual artifacts.

A chart showing motion through Nvidia's DLSS 3.
NVIDIA

Masking isn’t an option when an AI is generating a completely unique frame, and that’s where the Optical Flow Accelerator in the RTX 40-series GPUs comes into play. It’s like a motion vector, except the graphics card tracks the movement of individual pixels from one frame to the next. This optical flow field feeds into the AI-generated frame along with motion vectors, depth, and color.
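
To make the idea of per-pixel motion concrete, here is a toy block-matching search in plain Python. It is not Nvidia's method (DLSS 3 relies on dedicated hardware and a trained model); it only illustrates what an optical flow vector is. The frames, function names, and parameters are all invented for this example.

```python
def estimate_flow(prev, curr, y, x, block=1, search=2):
    """Find the (dy, dx) shift that best maps the block around (y, x)
    in `prev` onto `curr`, using the sum of absolute differences."""
    h, w = len(prev), len(prev[0])
    best_key, best_shift = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cost = 0
            for by in range(-block, block + 1):
                for bx in range(-block, block + 1):
                    py, px = y + by, x + bx
                    cy, cx = py + dy, px + dx
                    if 0 <= py < h and 0 <= px < w and 0 <= cy < h and 0 <= cx < w:
                        cost += abs(prev[py][px] - curr[cy][cx])
                    else:
                        cost += 255  # penalise comparisons that leave the frame
            key = (cost, abs(dy) + abs(dx))  # prefer the smaller motion on ties
            if best_key is None or key < best_key:
                best_key, best_shift = key, (dy, dx)
    return best_shift


def make_frame(top, left, size=8):
    """Greyscale frame with a bright 2x2 square whose corner is (top, left)."""
    frame = [[0] * size for _ in range(size)]
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            frame[r][c] = 200
    return frame


prev_frame = make_frame(2, 2)
curr_frame = make_frame(2, 4)  # the square moved two pixels to the right

dy, dx = estimate_flow(prev_frame, curr_frame, 2, 2)
print((dy, dx))  # (0, 2)

# A generated in-between frame would place this pixel halfway along the vector.
halfway = (2 + dy / 2, 2 + dx / 2)
print(halfway)  # (2.0, 3.0)
```

Even this tiny search compares every candidate shift for every pixel in the block, which hints at why doing it for a full frame without dedicated hardware gets expensive.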

It all sounds like upside, but there’s a big problem with frames generated by the AI: they increase latency. The AI-generated frame never passes through your PC’s usual rendering pipeline. It’s a “fake” frame, so you won’t see it on traditional FPS readouts in games or tools like FRAPS. So latency doesn’t go down despite all the extra frames, and because of the computational overhead of optical flow, latency actually goes up. Because of this, DLSS 3 requires Nvidia Reflex to offset the higher latency.

Normally, your CPU stores a render queue for your graphics card to ensure your GPU is never waiting for work (which would cause stuttering and framerate drops). Reflex removes the render queue and synchronizes your GPU and CPU, so the GPU starts processing as soon as your CPU is able to send instructions. When applied over DLSS 3, Reflex can sometimes even result in a reduction in latency, according to Nvidia.
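
A rough way to see why the render queue matters: in a GPU-bound game, each frame waiting in the queue adds roughly one GPU frame time between your input and the result on screen. The toy model below is a sketch of that reasoning under simplified assumptions, not a description of how Reflex is implemented. The function, its parameters, and the timings are hypothetical.

```python
def input_latency_ms(cpu_ms, gpu_ms, queued_frames):
    """Toy estimate for a GPU-bound game: CPU work, then waiting behind
    the queued frames, then the GPU rendering this frame."""
    return cpu_ms + queued_frames * gpu_ms + gpu_ms

# Made-up timings: 5 ms of CPU work and 10 ms of GPU work per frame.
with_queue = input_latency_ms(cpu_ms=5, gpu_ms=10, queued_frames=2)
without_queue = input_latency_ms(cpu_ms=5, gpu_ms=10, queued_frames=0)
print(with_queue, without_queue)  # 35 15
```

In this simplified picture, emptying the queue saves more time than a few milliseconds of frame generation overhead would add, which is consistent with Nvidia's claim that latency can sometimes drop.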

Where AI makes a difference

AMD’s FSR 2.0 doesn’t use AI, and as I wrote a while ago, it proves that you can achieve the same quality as DLSS using algorithms instead of machine learning. DLSS 3 changes that with its unique frame generation capabilities and the introduction of Optical Flow.

Optical flow isn’t a new idea—it’s been around for decades and has applications in everything from video editing applications to self-driving cars. However, computing optical flow with machine learning is relatively new as more and more datasets are used to train AI models. The reason you want to use AI is simple: it produces fewer visual bugs with enough training and doesn’t incur as much overhead at runtime.

DLSS runs at runtime. It’s possible to develop an algorithm without machine learning to estimate how each pixel moves from one frame to the next, but it’s computationally intensive, which defeats the very purpose of supersampling. With a lightweight AI model and enough training data (and rest assured, Nvidia has plenty of training data to work with), you can achieve optical flow that is high quality and can run at runtime.

This leads to an improvement in the frame rate even in CPU-limited games. Supersampling only applies to your resolution, which depends almost entirely on your GPU. With a new frame that bypasses CPU processing, DLSS 3 can double frame rates in games even when you’re completely CPU bottlenecked. This is impressive and currently only possible with AI.
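
The CPU-bound argument can be sketched with a toy model: the slower of the CPU and GPU sets the rendered frame rate, upscaling only speeds up the GPU side, and frame generation adds one generated frame per rendered frame. The numbers and the 1.5x upscaling speedup below are assumptions chosen for illustration, not measurements.

```python
def displayed_fps(cpu_fps, gpu_fps, upscale_speedup=1.0, frame_generation=False):
    """Toy model: the slower of CPU and GPU sets the rendered frame rate.
    Upscaling only helps the GPU; frame generation doubles what is shown."""
    rendered = min(cpu_fps, gpu_fps * upscale_speedup)
    return rendered * 2 if frame_generation else rendered

# Hypothetical CPU-bound game: the CPU caps out at 60 fps, the GPU could do 90.
native = displayed_fps(60, 90)
upscaled = displayed_fps(60, 90, upscale_speedup=1.5)
generated = displayed_fps(60, 90, upscale_speedup=1.5, frame_generation=True)
print(native, upscaled, generated)  # 60 60 120
```

Upscaling alone changes nothing here because the CPU is still the limit, while the generated frames lift the displayed rate anyway.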

Why FSR 2.0 (for now) can’t catch up

FSR and DLSS image quality comparison in God of War.

AMD really did the impossible with FSR 2.0. It looks amazing, and the fact that it’s brand-agnostic is even better. I’ve been willing to give up DLSS for FSR 2.0 since I first saw it in Deathloop. But as much as I enjoy FSR 2.0 and think it’s a great piece of kit from AMD, it’s not going to catch up with DLSS 3 any time soon.


For starters, developing an algorithm that can track every pixel between frames free of artifacts is difficult enough, especially in a 3D environment with dense fine detail (Cyberpunk 2077 is a prime example). It’s possible, but hard. The bigger issue, however, is how bloated that algorithm would need to be. Tracing each pixel through 3D space, calculating the optical flow, generating a frame and cleaning up any glitches that happen along the way – that’s asking a lot.

Getting this to work while a game is running, and still offering a frame rate improvement on par with FSR 2.0 or DLSS, is asking even more. Even with dedicated processors and a trained model, Nvidia still has to use Reflex to offset the higher latency caused by optical flow. Without that hardware or software, FSR would likely sacrifice too much latency to generate frames.

I have no doubt that AMD and other developers will eventually get there, or find another way around the problem, but that could take a couple of years. It’s hard to say right now.

Coming Soon – GeForce RTX 4090 DLSS 3 First Look Teaser Trailer

What is easy to say is that DLSS 3 looks very exciting. Of course, we’ll have to wait until it’s here to validate Nvidia’s performance claims and see how image quality holds up. So far we only have a short video from Digital Foundry showing DLSS 3 footage (above) which I would highly recommend watching until we see more third party testing. From today’s perspective, however, DLSS 3 looks quite promising.

This article is part of ReSpec – an ongoing bi-weekly column featuring discussion, advice and in-depth coverage of the technology behind PC gaming.
