
Assessing Video Quality in Real-time Computer Graphics

Anton_Sochenov
Employee

This technical deep dive was written by Akshay Jindal and Anton Sochenov as part of their research at the Visual Compute and Graphics Lab within Intel Labs, in collaboration with Nabil Sadaka.

 Highlights

  • In this blog post, we introduce our new Computer Graphics Video Quality Metric (CGVQM), designed to capture complex artifacts introduced by modern rendering techniques in computer graphics.
  • CGVQM is a step toward a smarter, more perceptually grounded way of building and evaluating the future of graphics.
  • When evaluated on CG-VQD and other datasets, CGVQM outperforms all state-of-the-art metrics in predicting human ratings.
  • Ongoing research aims to expand CGVQM’s reach by incorporating saliency, motion coherence, and semantic awareness.


Introduction

In today’s visually rich digital world, assessing the quality of videos isn’t just a matter of taste; it is a technical necessity. Whether it’s fine-tuning a game engine, streaming a high-performance game over the cloud, or evaluating a new rendering technique, video quality assessment plays a crucial role across a range of scenarios.

Take cloud gaming, for instance. Platforms must compress and stream gameplay in real time, often under varying network conditions. If the compression is too aggressive, visual fidelity suffers; if it's too lenient, latency spikes become an issue. Similarly, in real-time rendering, developers constantly juggle performance and quality, deciding whether to allocate GPU resources toward smoother shadows or higher frame rates. Many game engines rely on quality benchmarks to compare competing algorithms or rendering pipelines.

The most accurate way to measure visual quality is through user studies, where human participants rate different videos. These studies help quantify how noticeable or annoying specific distortions are, depending on rendering settings or compression levels. However, they come with a major drawback: they’re time-consuming, costly, and hard to scale, especially when testing hundreds of video samples across different devices and content types.

This is where objective video quality metrics come in. These automated tools act as proxies for human perception, predicting how good (or bad) a video looks compared to a pristine reference. The goal is to replicate what a human would rate without needing a lab full of participants every time a new rendering tweak is tested.

In this blog post, we introduce our new Computer Graphics Video Quality Metric (CGVQM), designed to capture complex artifacts introduced by modern rendering techniques in computer graphics.

The Challenges of Video Quality Assessment in Modern Games

Most of the video quality metrics we use today were originally designed to detect compression artifacts, the kinds of visual glitches that appear when a video is heavily compressed to save bandwidth. These include blockiness, blurring, and color banding, and are common in streaming platforms like YouTube or Netflix. Metrics such as PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index) are widely used to measure how closely a compressed video resembles its original version.

But when it comes to modern game rendering, these traditional metrics often fall short. In recent years, rendering techniques have advanced dramatically. Tools like neural supersampling, path tracing, novel-view synthesis, and variable rate shading are now being used in cutting-edge games to deliver more immersive visuals. However, these methods come with their own unique visual quirks, artifacts such as ghosting, temporal flicker, shimmering noise, and even hallucinated textures introduced by neural networks. These distortions are often spatio-temporal in nature, meaning they change over both space (image regions) and time (across frames).

To make matters more complicated, real-time graphics content in games behaves very differently from natural videos. Think of how a game camera quickly pans across a battlefield or how lighting dynamically changes in response to player actions. These fast-changing, synthetic visuals don't always trigger the same perceptual cues as a real-world video would, making it harder to judge quality using existing models trained on natural content.

What we really need is a new kind of video quality metric — one that:

  • Can detect and rate these modern distortions in a way that reflects human perception.
  • Is able to output results on an interpretable scale (e.g., “annoying” vs. “imperceptible”).
  • Is transparent, showing not just a single quality score but also where and why it thinks errors occur.
  • Works robustly across diverse content, from stylized fantasy environments to photorealistic open worlds, and across a wide range of rendering techniques.

Figure 1. TAA ghosting artifacts from a video game. Notice how video game scene statistics can differ from the real world.

A Novel Video Quality Metric and Dataset for Computer Graphics

To bridge this gap in video quality assessment, we created a specialized dataset and metric designed specifically for the needs of real-time computer graphics. This effort resulted in the Computer Graphics Video Quality Dataset (CG-VQD), a first-of-its-kind collection focused on spatio-temporal distortions introduced by modern rendering techniques.

The dataset includes 80 short video sequences depicting a wide range of visual artifacts commonly seen in games. These were generated using six popular rendering methods:

  • Neural supersampling (e.g., DLSS, XeSS)
  • Novel-view synthesis (e.g., from 3D Gaussian splatting)
  • Path tracing
  • Neural denoising
  • Frame interpolation
  • Variable rate shading

To understand how these artifacts affect human perception, we conducted a subjective study involving 20 participants, each of whom rated the perceived quality of the distorted videos compared to their reference versions. These ratings produced Difference Mean Opinion Scores (DMOS), a standard way to quantify how much a distortion deviates from the ideal, based on human opinion.
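For readers unfamiliar with the term, one common formulation of DMOS (an illustration of the general idea, not necessarily the exact processing applied to the CG-VQD ratings) averages per-participant difference scores between the reference and the distorted clip:

    \mathrm{DMOS}_j = \frac{1}{N} \sum_{i=1}^{N} \left( r_{i,\mathrm{ref}(j)} - r_{i,j} \right)

where r_{i,j} is participant i's rating of distorted video j, ref(j) indexes its reference, and N is the number of participants; a higher DMOS therefore indicates a more objectionable distortion.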

The dataset captures a variety of distortions, including:

  • Spatio-temporal aliasing: jagged or unstable lines in motion
  • Flicker and ghosting: fast-moving shadows or light trails that feel “off”
  • Moire and fireflies: patterned noise or bright specks from rendering errors
  • Blur, tiling, and hallucinations: loss of detail or incorrect reconstruction by neural models

Figure 2. Examples of distortions from CGVQD. Parts of videos have been magnified for illustration. Notice the aliasing, ghosting, flickering, and reconstruction artifacts resulting from rendering errors.

Using this dataset, we developed a new objective video quality metric called CGVQM, built on top of a 3D ResNet-18 architecture. The core idea is simple but powerful: 3D convolutional neural networks (CNNs), originally trained to recognize actions in videos, also learn feature spaces that strongly align with human perception of visual quality.

Instead of retraining the network from scratch, we reused the internal activations (deep features) of the 3D CNN and calibrated them to predict video quality by comparing differences between reference and distorted videos; a minimal sketch of this recipe follows the list below. This approach not only improves accuracy but also produces:

  • A global quality score, summarizing overall perceptual degradation
  • Per-pixel error maps, which visually highlight where and how distortions occur
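To make that recipe concrete, here is a minimal, hedged sketch built on torchvision's pretrained 3D ResNet-18 (r3d_18, trained on Kinetics-400 action recognition). This is not the released CGVQM implementation: the function names deep_features and cgvqm_like are ours, the uniform stage weights and squared-difference pooling are placeholders (in CGVQM the feature differences are calibrated against the human ratings in CG-VQD), and the Kinetics input normalization is omitted for brevity.

    import torch
    import torch.nn.functional as F
    from torchvision.models.video import r3d_18, R3D_18_Weights

    # Pretrained on Kinetics-400 action recognition; used here purely as a feature extractor.
    backbone = r3d_18(weights=R3D_18_Weights.KINETICS400_V1).eval()
    for p in backbone.parameters():
        p.requires_grad_(False)

    # Stages whose activations are compared (stem plus the four residual blocks).
    stages = [backbone.stem, backbone.layer1, backbone.layer2,
              backbone.layer3, backbone.layer4]

    def deep_features(video):
        """video: (N, 3, T, H, W) float tensor in [0, 1]; returns one activation per stage.
        (Kinetics mean/std input normalization is omitted here for brevity.)"""
        feats, x = [], video
        for stage in stages:
            x = stage(x)
            feats.append(x)
        return feats

    def cgvqm_like(reference, distorted, stage_weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
        """Returns (global_score, per_pixel_error_map); lower means better quality.
        Wrap calls in torch.no_grad() when only evaluating."""
        per_stage = []
        for w, fr, fd in zip(stage_weights, deep_features(reference), deep_features(distorted)):
            # Unit-normalize channels (LPIPS-style), then take a weighted squared difference.
            diff = w * (F.normalize(fr, dim=1) - F.normalize(fd, dim=1)).pow(2).mean(dim=1, keepdim=True)
            # Upsample each stage's error back to the input resolution so the maps can be summed.
            per_stage.append(F.interpolate(diff, size=reference.shape[-3:],
                                           mode="trilinear", align_corners=False))
        error_map = torch.stack(per_stage).sum(dim=0)   # (N, 1, T, H, W) spatio-temporal error map
        return error_map.mean(), error_map

In this simplified sketch the global score is simply the mean of the error map; per-pixel maps analogous to the one shown later in Figure 4 fall out of the same computation before pooling.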

When evaluated on CG-VQD and other datasets, CGVQM outperforms all state-of-the-art metrics in predicting human ratings. Figure 3 below shows how well each metric correlates with human opinion, where CGVQM achieves the highest score, indicating its strong perceptual alignment.

Figure 3. A bar plot showing PLCC (Pearson Linear Correlation Coefficient) values – how well each metric correlates with human opinion – with CGVQM achieving the highest score.

In addition to accuracy, CGVQM offers interpretability. Here's an example of its error maps, where the metric visually identifies and weighs distortions, highlighting noise in diffuse areas as more noticeable (e.g., tree shadows), and de-emphasizing noise in less visible areas (e.g., textured leaves).

Figure 4. Reference (left), Noisy path traced video (middle), CGVQM error map (right)

By combining high accuracy, interpretability, and generalizability, CGVQM provides a powerful new tool for graphics researchers and game developers to evaluate rendering quality as perceived by an end user without the need for costly human studies.

Applications of CGVQM

Video quality metrics aren’t just for academic benchmarking; they can directly impact the way developers, researchers, and studios build, optimize, and evaluate graphics systems. As a full-reference metric designed specifically for computer graphics, CGVQM opens up new possibilities for improving visual quality, reducing development time, and making smarter trade-offs in real-time applications.

Here are a couple of examples of how CGVQM can be applied:

  1. Smarter Reference Generation for Denoising

    Training modern denoising algorithms – especially neural ones – requires a large dataset of video pairs: one with noisy low-sample-per-pixel (spp) renders and the other with artifact-free high-spp ground truth. Generating such high-quality reference videos is computationally expensive. In some scenes, producing a perceptually "perfect" reference might require more than 100,000 spp, which can take hours or days to render.

    CGVQM can make this training data generation process more efficient by identifying the minimum spp level at which the output becomes perceptually indistinguishable from ultra-high-quality references. For example, instead of rendering a 16K-spp reference video, you might discover that 256 spp is already “perceptually indistinguishable,” saving hours of computation per video (a sketch of this kind of search follows this list).

    Figure 5. Path traced videos at different samples-per-pixel counts and their corresponding CGVQM error maps.

    Beyond just defining upper bounds on reference quality, CGVQM is also fully differentiable, making it a promising candidate for use as a loss function during training. This opens the door to training denoisers that optimize not just for pixel accuracy but for visual quality, effectively answering the question: "Is the network output perceptually good enough?" (A minimal training-loss sketch also follows this list.)

  2. Optimizing Quality-Performance Trade-offs in Upscaling

    AI-based upscaling or frame generation technologies improve frame rates but can introduce visible artifacts – such as ghosting, flickering, or hallucinated details – especially in fast-moving scenes.

    CGVQM provides a quantitative way to evaluate these trade-offs. By analyzing how perceptual quality degrades across different input resolutions and upscaling configurations, developers can determine the sweet spot, where performance gains are maximized with minimal visual compromise.

    For instance, suppose you’re deciding between rendering at 1/3, 1/2, or 2/3 resolution. A simple FPS counter might suggest the lowest setting, but CGVQM might reveal that quality sharply degrades below 1/2 resolution, giving you a more informed and perceptually aware decision; the sweep sketch after this list applies here as well.
    Figure 6. CGVQM predictions closely match human ratings and can be used to select optimal quality settings in a video game.
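Both scenarios above boil down to the same search: sweep a cost knob (samples per pixel, input resolution scale), score each candidate against the best rendering you can afford, and keep the cheapest setting whose predicted impairment stays acceptable. Here is a hedged sketch of that sweep, reusing the cgvqm_like function from the earlier snippet; render_sequence is a hypothetical stand-in for your renderer, and the 0.1 threshold is illustrative rather than a calibrated visibility limit.

    import torch

    # Hypothetical sweep: `render_sequence(scene, setting)` stands in for your renderer
    # (setting = spp count or resolution scale) and must return an (N, 3, T, H, W) tensor;
    # `cgvqm_like` is the sketch from the earlier snippet.
    def cheapest_acceptable_setting(scene, settings, threshold=0.1):
        """`settings` is ordered from cheapest to most expensive, e.g. [16, 64, 256, 1024] spp
        or [1/3, 1/2, 2/3] resolution scales; the last entry serves as the reference."""
        with torch.no_grad():
            reference = render_sequence(scene, settings[-1])
            for setting in settings[:-1]:
                candidate = render_sequence(scene, setting)
                score, _error_map = cgvqm_like(reference, candidate)
                if score.item() <= threshold:       # predicted impairment is small enough
                    return setting, score.item()
        return settings[-1], 0.0                    # nothing cheaper passed; keep the reference setting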
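The differentiability mentioned in the first scenario can be sketched in the same spirit. In the snippet below, denoiser, dataloader, and the 0.1 weighting between the pixel and perceptual terms are hypothetical placeholders; the only point being made is that the feature-space error from cgvqm_like can back-propagate into the network being trained, since the backbone itself is frozen.

    import torch
    import torch.nn.functional as F

    # Hedged sketch of a perceptual training loss. `denoiser` and `dataloader` are hypothetical:
    # a video denoising network and batches of (noisy, clean) clips shaped (N, 3, T, H, W).
    optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

    for noisy, clean in dataloader:
        denoised = denoiser(noisy)
        pixel_loss = F.l1_loss(denoised, clean)
        perceptual_loss, _ = cgvqm_like(clean, denoised)   # differentiable feature-space error
        loss = pixel_loss + 0.1 * perceptual_loss          # illustrative weighting
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()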

These are just a couple of scenarios where CGVQM can help. Whether you’re training neural renderers, evaluating engine updates, or testing new upscaling techniques, having a perceptual metric that aligns with human judgment is a huge advantage.

Got your own ideas? Tell us in the comments how you might use CGVQM in your development or research pipeline!

What’s Next?

While CGVQM offers a powerful toolkit for evaluating rendering quality during technology development, it's not without limitations. One key constraint is that CGVQM is a full-reference metric, which means it requires access to a perfect, undistorted version of the video for comparison. In controlled experiments or offline training, this is feasible. However, in many practical scenarios, such as testing in-game settings, graphics optimizations, or user experience evaluations, a pixel-perfect reference simply isn’t available. Beyond this, perceptual quality in games involves more than just detecting pixel differences:

  • Some visual areas matter more than others (this is called saliency).
  • Smooth, coherent motion is sometimes more important than per-frame sharpness.
  • In AI-generated content, semantic preservation – making sure things look “correct” even if they’re not identical – can be more meaningful than exact reproduction.

The next steps in our research aim to address these challenges. We are actively working to extend CGVQM with capabilities that:

  • Relax the need for pixel-perfect references, enabling no-reference or reduced-reference assessments.
  • Account for visual saliency to weigh distortions by how noticeable they are.
  • Include temporal smoothness and motion consistency as part of the quality evaluation.
  • Evaluate semantic correctness, especially in AI-generated pixels, where realism matters more than accuracy.

These improvements will make CGVQM more practical for real-time quality assessment in production environments, opening the door to smarter optimization decisions across gaming, streaming, and even creative AI workflows.

Summary

As real-time graphics continue to evolve – driven by neural rendering, AI upscaling, and photorealistic techniques – the demand for accurate, perceptual video quality assessment is more critical than ever. Traditional metrics built for compression artifacts in natural videos simply don’t cut it in the dynamic, artifact-prone world of modern games.

CGVQM, introduced alongside the CG-VQD dataset, marks an important step forward. It offers a perception-aligned, explainable, and scalable way to evaluate rendering quality, one that not only correlates well with human judgment but also delivers actionable insights through per-pixel error maps. Whether it’s speeding up training dataset creation for denoisers or fine-tuning quality-performance trade-offs in AI-based rendering pipelines, CGVQM brings real utility to developers and researchers alike.

While its current reliance on reference videos limits some applications, ongoing work aims to expand CGVQM’s reach by incorporating saliency, motion coherence, and semantic awareness, making it even more robust for real-world scenarios.

In short, CGVQM is not just a new metric; it is a step toward a smarter, more perceptually grounded way of building and evaluating the future of graphics.

Check back soon for the links to our paper and source code.

Series

    1. Path Tracing a Trillion Triangles
    2. Neural Image Reconstruction for Real-Time Path Tracing
    3. Jungle Ruins Scene: Technical Art Meets Real-Time Path-Tracing Research
    4. Path Tracing Massive Dynamic Geometry in Jungle Ruins
    5. Assessing Video Quality in Real-time Computer Graphics - this post

 

About the Author
I support the Real-Time Graphics Research team at Intel, focusing on a combination of classical and novel neural technologies that push the boundaries of industrial research. Previously, I led the software engineering team within Meta Reality Labs' Graphics Research Team, working on a graphics stack for socially acceptable augmented reality (AR) glasses. Before that, I worked on bringing telepresence technology to the Microsoft HoloLens AR headset.
1 Comment
Vipitis
Beginner

I loved reading this series of blog posts. I suspected it might lead up to publications at HPG'25, and I was hoping for potentially new products too.

 

To me the final section is the most interesting: I wrote my thesis on evaluating language-model-generated shader code, and finding a good metric was a real struggle. As stated here, the literature mostly provides reconstruction losses from video compression research. I was looking for something more semantic, as the functions I generate and compare can influence the scene very drastically (like camera transformations) and still be semantically correct within the limited context. In my research I did not get to a semantic loss and instead just have a semantic match, without consideration of footprint and performance. This remains the biggest open question in my results: is this a good or bad variation?

 

Looking forward to the full paper, as I want to read up on how your metric does on more rendering-specific errors such as missing triangles, Z-fighting, or even math errors due to undefined behavior (UB) where bloom blows out the screen.