Video super-resolution explained

Video super-resolution (VSR) is the process of generating high-resolution video frames from given low-resolution video frames. Unlike single-image super-resolution (SISR), the goal is not only to restore fine details while preserving coarse ones, but also to maintain motion consistency.

Many approaches have been proposed for this task, but the problem remains both popular and challenging.

Mathematical explanation

Most research considers the degradation process of frames as

\{y\} = (\{x\} * k)\downarrow{s} + \{n\}

where:

\{x\} is the original high-resolution frame sequence,
k is the blur kernel,
* is the convolution operation,
\downarrow{s} is the downscaling operation with scale factor s,
\{n\} is additive noise,
\{y\} is the low-resolution frame sequence.

Super-resolution is the inverse operation: the problem is to estimate a frame sequence \{\overline{x}\} from the low-resolution frame sequence \{y\} so that \{\overline{x}\} is close to the original \{x\}. The blur kernel, downscaling operation, and additive noise should be estimated for the given input to achieve better results.
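A minimal numpy sketch of this degradation model (the box-blur kernel, scale factor, and noise level are illustrative assumptions, not values from any cited work):

```python
import numpy as np

def degrade(x, kernel, scale, noise_sigma, rng):
    """Apply the degradation model: blur with k, downscale by s, add noise n."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)), mode="edge")
    blurred = np.zeros_like(x)
    for i in range(kh):                      # direct 2-D convolution with k
        for j in range(kw):
            blurred += kernel[i, j] * padded[i:i + x.shape[0], j:j + x.shape[1]]
    downscaled = blurred[::scale, ::scale]   # the downscaling operator
    return downscaled + rng.normal(0.0, noise_sigma, downscaled.shape)  # + n

rng = np.random.default_rng(0)
hr = rng.random((32, 32))                    # one high-resolution frame
k = np.ones((3, 3)) / 9.0                    # illustrative box-blur kernel
lr = degrade(hr, k, scale=4, noise_sigma=0.01, rng=rng)
print(lr.shape)  # (8, 8)
```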

Video super-resolution approaches tend to have more components than their image counterparts, as they need to exploit the additional temporal dimension, and complex designs are not uncommon. The most essential components for VSR are guided by four basic functionalities: Propagation, Alignment, Aggregation, and Upsampling.[1]

Methods

When working with video, temporal information can be used to improve upscaling quality. Single-image super-resolution methods can also be applied, generating high-resolution frames independently from their neighbours, but this is less effective and introduces temporal instability. A few traditional methods treat the video super-resolution task as an optimization problem; in recent years, deep-learning-based methods for video upscaling have outperformed traditional ones.

Traditional methods

There are several traditional methods for video upscaling. These methods exploit natural image priors and estimate motion between frames; the high-resolution frame is reconstructed from both.

Frequency domain

First, the low-resolution frame is transformed to the frequency domain; the high-resolution frame is estimated in this domain and finally transformed back to the spatial domain. Some methods use the Fourier transform, which helps to extend the spectrum of the captured signal and thereby increase resolution. Approaches differ in the filtering theory used: weighted least squares,[2] the total least squares (TLS) algorithm,[3] and space-varying[4] or spatio-temporally[5] varying filtering. Other methods use the wavelet transform, which helps to find similarities in neighboring local areas.[6] Later, the second-generation wavelet transform was used for video super-resolution.[7]
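As a toy illustration of the frequency-domain idea (a generic sketch, not one of the cited algorithms), a frame can be upsampled by zero-padding its centered Fourier spectrum, which interpolates the signal without adding new frequency content:

```python
import numpy as np

def fft_upsample(img, scale):
    """Upsample by zero-padding the centered Fourier spectrum."""
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    H, W = h * scale, w * scale
    padded = np.zeros((H, W), dtype=complex)
    top, left = (H - h) // 2, (W - w) // 2
    padded[top:top + h, left:left + w] = spectrum
    # the scale**2 factor keeps the mean intensity unchanged after the inverse FFT
    return np.real(np.fft.ifft2(np.fft.ifftshift(padded))) * scale ** 2

small = np.outer(np.hanning(8), np.hanning(8))   # smooth 8x8 test frame
big = fft_upsample(small, 2)
print(big.shape)  # (16, 16)
```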

Spatial domain

Iterative back-projection methods assume a function between the low-resolution and high-resolution frames and improve the guessed function at each step of an iterative process.[8] Projections onto convex sets (POCS), which define a specific cost function, can also be used in iterative methods.[9]
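A minimal numpy sketch of iterative back-projection, using plain decimation as the assumed forward operator and pixel replication as the back-projector (both are simplifying assumptions for illustration):

```python
import numpy as np

def downsample(x, s):
    return x[::s, ::s]                                    # assumed forward operator

def upsample(x, s):
    return np.repeat(np.repeat(x, s, axis=0), s, axis=1)  # crude back-projector

def iterative_back_projection(lr, scale, steps=10):
    """Refine an HR guess so its simulated LR version matches the input."""
    hr = upsample(lr, scale)                              # initial guess
    for _ in range(steps):
        error = lr - downsample(hr, scale)                # residual in LR space
        hr = hr + upsample(error, scale)                  # project the error back
    return hr

lr = np.random.default_rng(1).random((8, 8))
hr = iterative_back_projection(lr, 2)
# at convergence, re-degrading the estimate reproduces the LR input
print(np.abs(downsample(hr, 2) - lr).max())
```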

Iterative adaptive filtering algorithms use the Kalman filter to estimate the transformation from the low-resolution frame to the high-resolution one.[10] To improve the final result, these methods consider the temporal correlation among low-resolution sequences; some approaches also consider the temporal correlation among the high-resolution sequence.[11] A common way to approximate the Kalman filter is least mean squares (LMS).[12] One can also use steepest descent,[13] least squares (LS),[14] or recursive least squares (RLS).[14]

Direct methods estimate the motion between frames, upscale a reference frame, and warp neighboring frames to the high-resolution reference. To construct the result, these upscaled frames are fused by a median filter,[15] weighted median filter,[16] adaptive normalized averaging, AdaBoost classifier,[17] or SVD-based filters.[18]
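The fusion step of direct methods can be sketched with a temporal median over frames that are assumed to be already upscaled and warped to the reference (the noise model here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
clean = rng.random((16, 16))                 # unknown true frame
# neighboring frames, assumed already upscaled and warped to the reference,
# each corrupted by independent noise
warped = np.stack([clean + rng.normal(0.0, 0.05, clean.shape) for _ in range(5)])

fused = np.median(warped, axis=0)            # median filter across the stack

# fusion suppresses per-frame noise relative to any single observation
print(np.abs(fused - clean).mean() < np.abs(warped[0] - clean).mean())  # True
```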

Non-parametric algorithms join motion estimation and frame fusion into one step, performed by considering patch similarities. Weights for fusion can be calculated by nonlocal-means filters.[19] To strengthen the search for similar patches, one can use a rotation-invariant similarity measure[20] or an adaptive patch size.[21] Calculating intra-frame similarity helps to preserve small details and edges.[22] Parameters for fusion can also be calculated by kernel regression.[23]

Probabilistic methods use statistical theory to solve the task. Maximum likelihood (ML) methods estimate the most probable image.[24] [25] Another group of methods uses maximum a posteriori (MAP) estimation. The regularization parameter for MAP can be estimated by Tikhonov regularization.[26] Markov random fields (MRF) are often used along with MAP and help to preserve similarity in neighboring patches.[27] Huber MRFs are used to preserve sharp edges,[28] while a Gaussian MRF removes noise but can smooth some edges.[29]

Deep learning based methods

Aligned by motion estimation and motion compensation

In approaches with alignment, neighboring frames are first aligned with the target one. Frames can be aligned by performing motion estimation and motion compensation (MEMC) or by using deformable convolution (DC). Motion estimation gives information about the motion of pixels between frames; motion compensation is a warping operation that aligns one frame to another based on this motion information. Examples of such methods:
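The motion-compensation step can be sketched as follows: given a dense flow field (here hand-set rather than estimated, and sampled with nearest-neighbor rather than bilinear interpolation for brevity), a neighboring frame is warped onto the target:

```python
import numpy as np

def warp(frame, flow):
    """Motion compensation: each target pixel samples its estimated source
    position in the neighboring frame (nearest-neighbor for brevity)."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

rng = np.random.default_rng(3)
ref = rng.random((8, 8))
neighbor = np.roll(ref, 2, axis=1)   # neighbor frame shifted 2 px to the right
flow = np.zeros((8, 8, 2))
flow[..., 0] = 2.0                   # hand-set flow: source lies 2 px to the right
aligned = warp(neighbor, flow)
print(np.allclose(aligned[:, :6], ref[:, :6]))  # True away from the border
```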

Aligned by deformable convolution

Another way to align neighboring frames with the target one is deformable convolution. While an ordinary convolution has a fixed kernel grid, a deformable convolution first estimates offsets for the kernel sampling positions and then performs the convolution. Examples of such methods:
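The sampling idea behind deformable convolution can be sketched for a single output position of a 3×3 kernel; in a real network the offsets are predicted by a convolutional layer, whereas here they are hand-set for illustration:

```python
import numpy as np

def deformable_tap_sum(img, weight, y, x, offsets):
    """One output position of a 3x3 deformable convolution: the fixed
    sampling grid is displaced by per-tap offsets before sampling."""
    h, w = img.shape
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i, j]                       # learned shift for this tap
            sy = int(np.clip(round(y + i - 1 + dy), 0, h - 1))
            sx = int(np.clip(round(x + j - 1 + dx), 0, w - 1))
            out += weight[i, j] * img[sy, sx]
    return out

img = np.arange(49, dtype=float).reshape(7, 7)
weight = np.full((3, 3), 1 / 9)                          # averaging kernel
zero = np.zeros((3, 3, 2))                               # ordinary convolution
shift = np.full((3, 3, 2), 1.0)                          # every tap moved by (+1, +1)
print(round(deformable_tap_sum(img, weight, 3, 3, zero), 6))   # 24.0
print(round(deformable_tap_sum(img, weight, 3, 3, shift), 6))  # 32.0
```

With zero offsets the result equals an ordinary 3×3 average around (3, 3); shifting every tap moves the receptive field, which is what lets the network sample along object motion.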

Aligned by homography

Some methods align frames using a homography calculated between them.

Spatial non-aligned

Methods without alignment do not align frames as a first step and simply process the input frames.

3D convolutions

While 2D convolutions operate in the spatial domain, 3D convolutions use both spatial and temporal information. They perform motion compensation and maintain temporal consistency.
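A minimal numpy sketch of a "valid" 3D convolution over a (time, height, width) volume, showing that each output value mixes spatial and temporal neighbors:

```python
import numpy as np

def conv3d_valid(video, kernel):
    """'Valid' 3-D convolution over (time, height, width): each output
    voxel mixes spatial AND temporal neighbors, unlike a 2-D convolution."""
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i + t, j:j + h, k:k + w] * kernel)
    return out

video = np.random.default_rng(4).random((5, 8, 8))   # 5 frames of 8x8
kernel = np.full((3, 3, 3), 1 / 27)                  # spatio-temporal average
features = conv3d_valid(video, kernel)
print(features.shape)  # (3, 6, 6)
```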

Recurrent neural networks

Recurrent convolutional neural networks perform video super-resolution by storing temporal dependencies.
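The propagation idea can be sketched with a simple linear recurrence over a hidden state (a stand-in for the learned recurrent cell of a real network; the mixing coefficient is illustrative):

```python
import numpy as np

def propagate(frames, alpha=0.7):
    """Carry a hidden state through the sequence so the features for each
    frame accumulate information from all previous frames."""
    hidden = np.zeros_like(frames[0])
    outputs = []
    for frame in frames:
        hidden = alpha * hidden + (1 - alpha) * frame  # stand-in recurrent cell
        outputs.append(hidden.copy())
    return outputs

rng = np.random.default_rng(6)
frames = [rng.random((4, 4)) for _ in range(3)]
feats = propagate(frames)
print(len(feats), feats[0].shape)  # 3 (4, 4)
```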

Non-local methods

Non-local methods extract both spatial and temporal information. The key idea is to compute each output position as a weighted sum over all possible positions, a strategy that can be more effective than local approaches. For example, the progressive fusion non-local method extracts spatio-temporal features with non-local residual blocks and then fuses them with a progressive fusion residual block (PFRB). The result of these blocks is a residual image; the final result is obtained by adding the bicubically upsampled input frame.

Metrics

The common way to estimate the performance of video super-resolution algorithms is to use a few metrics:

Currently, there are few objective metrics that verify a video super-resolution method's ability to restore real details; research in this area is ongoing.

Another way to assess the performance of a video super-resolution algorithm is subjective evaluation: people are asked to compare the corresponding frames, and the final mean opinion score (MOS) is calculated as the arithmetic mean over all ratings.
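PSNR, one of the most widely used objective metrics for this task, and the MOS averaging rule can both be computed in a few lines (the rating values below are made up for illustration):

```python
import numpy as np

def psnr(reference, restored, peak=1.0):
    """Peak signal-to-noise ratio in dB between two frames."""
    mse = np.mean((reference - restored) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

ref = np.zeros((4, 4))
out = np.full((4, 4), 0.1)          # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(ref, out), 1))     # 20.0

# mean opinion score: arithmetic mean over all subjective ratings
ratings = [4, 5, 3, 4, 4]
print(sum(ratings) / len(ratings))  # 4.0
```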

Datasets

Since deep learning approaches to video super-resolution outperform traditional ones, it's crucial to form a high-quality dataset for evaluation: one that verifies a model's ability to restore small details, text, and objects with complicated structure, and to cope with large motion and noise.

Comparison of datasets
Dataset | Videos | Mean video length | Ground-truth resolution | Motion in frames | Fine details
Vid4 | 4 | 43 frames | 720×480 | Without fast motion | Some small details, without text
SPMCS | 30 | 31 frames | 960×540 | Slow motion | A lot of small details
Vimeo-90K (test SR set) | 7824 | 7 frames | 448×256 | A lot of fast, difficult, diverse motion | Few details, text in a few sequences
Xiph HD (complete sets) | 70 | 2 seconds | from 640×360 to 4096×2160 | A lot of fast, difficult, diverse motion | Few details, text in a few sequences
Ultra Video Dataset 4K | 16 | 10 seconds | 4096×2160 | Diverse motion | Few details, without text
REDS (test SR) | 30 | 100 frames | 1280×720 | A lot of fast, difficult, diverse motion | Few details, without text
Space-Time SR | 5 | 100 frames | 1280×720 | Diverse motion | Without small details and text
Harmonic | | | 4096×2160 | |
CDVL | | | 1920×1080 | |

Benchmarks

A few benchmarks in video super-resolution have been organized by companies and conferences. The purpose of such challenges is to compare diverse algorithms and to find the state of the art for the task.

Comparison of benchmarks
Benchmark | Organizer | Dataset | Upscale factor | Metrics
NTIRE 2019 Challenge | CVPR (Conference on Computer Vision and Pattern Recognition) | REDS | 4 | PSNR, SSIM
Youku-VESR Challenge 2019 | Youku | Youku-VESR | 4 | PSNR, VMAF
AIM 2019 Challenge | ECCV (European Conference on Computer Vision) | Vid3oC | 16 | PSNR, SSIM, MOS
AIM 2020 Challenge | ECCV (European Conference on Computer Vision) | Vid3oC | 16 | PSNR, SSIM, LPIPS
Mobile Video Restoration Challenge | | | | PSNR, SSIM, MOS
MSU Video Super-Resolution Benchmark 2021 | MSU (Moscow State University) | | 4 | ERQAv1.0, PSNR and SSIM with shift compensation, QRCRv1.0, CRRMv1.0
MSU Super-Resolution for Video Compression Benchmark 2022 | MSU (Moscow State University) | | 4 | ERQAv2.0, PSNR, MS-SSIM, VMAF, LPIPS

NTIRE 2019 Challenge

The NTIRE 2019 Challenge was organized at CVPR and proposed two tracks for video super-resolution: clean (only bicubic degradation) and blur (blur added first). Each track had more than 100 participants, and 14 final results were submitted.
The REDS dataset was collected for this challenge. It consists of 30 videos of 100 frames each; the resolution of ground-truth frames is 1280×720, and the tested scale factor is 4. PSNR and SSIM were used to evaluate model performance. The best participants' results are presented in the table:

Top teams
Team | Model name | PSNR (clean) | SSIM (clean) | PSNR (blur) | SSIM (blur) | Runtime per image, s (clean) | Runtime per image, s (blur) | Platform | GPU | Open source
HelloVSR | EDVR | 31.79 | 0.8962 | 30.17 | 0.8647 | 2.788 | 3.562 | PyTorch | TITAN Xp | YES
UIUC-IFP | WDVR | 30.81 | 0.8748 | 29.46 | 0.8430 | 0.980 | 0.980 | PyTorch | Tesla V100 | YES
SuperRior | ensemble of RDN, RCAN, DUF | 31.13 | 0.8811 | | | 120.000 | | PyTorch | Tesla V100 | NO
CyberverseSanDiego | RecNet | 31.00 | 0.8822 | 27.71 | 0.8067 | 3.000 | 3.000 | TensorFlow | RTX 2080 Ti | YES
TTI | RBPN | 30.97 | 0.8804 | 28.92 | 0.8333 | 1.390 | 1.390 | PyTorch | TITAN X | YES
NERCMS | PFNL | 30.91 | 0.8782 | 28.98 | 0.8307 | 6.020 | 6.020 | PyTorch | GTX 1080 Ti | YES
XJTU-IAIR | FSTDN | 28.86 | 0.8301 | | | 13.000 | | PyTorch | GTX 1080 Ti | NO

Youku-VESR Challenge 2019

The Youku-VESR Challenge was organized to check models' ability to cope with the degradation and noise found in the Youku online video-watching application. The proposed dataset consists of 1000 videos, each 4–6 seconds long. The resolution of ground-truth frames is 1920×1080, and the tested scale factor is 4. PSNR and VMAF were used for performance evaluation. Top methods are presented in the table:

Top teams
Team PSNR VMAF
Avengers Assemble 37.851 41.617
NJU_L1 37.681 41.227
ALONG_NTES 37.632 40.405

AIM 2019 Challenge

The challenge was held at ECCV and had two tracks on video extreme super-resolution: the first track checks fidelity to the reference frame (measured by PSNR and SSIM), and the second checks the perceptual quality of the videos (MOS). The dataset consists of 328 video sequences of 120 frames each. The resolution of ground-truth frames is 1920×1080, and the tested scale factor is 16. Top methods are presented in the table:

Top teams
Team | Model name | PSNR | SSIM | MOS | Runtime per image, s | Platform | GPU/CPU | Open source
fenglinglwb | based on EDVR | 22.53 | 0.64 | first result | 0.35 | PyTorch | 4× Titan X | NO
NERCMS | PFNL | 22.35 | 0.63 | | 0.51 | PyTorch | 2× 1080 Ti | NO
baseline | RLSP | 21.75 | 0.60 | | 0.09 | TensorFlow | Titan Xp | NO
HIT-XLab | based on EDSR | 21.45 | 0.60 | second result | 60.00 | PyTorch | V100 | NO

AIM 2020 Challenge

The challenge's conditions are the same as in the AIM 2019 Challenge. Top methods are presented in the table:

Top teams
Team | Model name | Params | PSNR | SSIM | Runtime per image, s | GPU/CPU | Open source
KirinUK | EVESRNet | 45.29M | 22.83 | 0.6450 | 6.1 | 1 × 2080 Ti | NO
Team-WVU | | 29.51M | 22.48 | 0.6378 | 4.9 | 1 × Titan Xp | NO
BOE-IOT-AIBD | 3D-MGBP | 53M | 22.48 | 0.6304 | 4.83 | 1 × 1080 | NO
sr xxx | based on EDVR | | 22.43 | 0.6353 | 4 | 1 × V100 | NO
ZZX | MAHA | 31.14M | 22.28 | 0.6321 | 4 | 1 × 1080 Ti | NO
lyl | FineNet | | 22.08 | 0.6256 | 13 | | NO
TTI | based on STARnet | | 21.91 | 0.6165 | 0.249 | | NO
CET CVLab | | | 21.77 | 0.6112 | 0.04 | 1 × P100 | NO

MSU Video Super-Resolution Benchmark

The MSU Video Super-Resolution Benchmark was organized by MSU and proposed three types of motion, two ways of lowering resolution, and eight types of content in the dataset. The resolution of ground-truth frames is 1920×1280, and the tested scale factor is 4. 14 models were tested. PSNR and SSIM with shift compensation were used to evaluate model performance, along with a few newly proposed metrics: ERQAv1.0, QRCRv1.0, and CRRMv1.0.[72] Top methods are presented in the table:

Top methods
Model name | Multi-frame | Subjective | ERQAv1.0 | PSNR | SSIM | QRCRv1.0 | CRRMv1.0 | Runtime per image, s | Open source
DBVSR | YES | 5.561 | 0.737 | 31.071 | 0.894 | 0.629 | 0.992 | | YES
LGFN | YES | 5.040 | 0.740 | 31.291 | 0.898 | 0.629 | 0.996 | 1.499 | YES
DynaVSR-R | YES | 4.751 | 0.709 | 28.377 | 0.865 | 0.557 | 0.997 | 5.664 | YES
TDAN | YES | 4.036 | 0.706 | 30.244 | 0.883 | 0.557 | 0.994 | | YES
DUF-28L | YES | 3.910 | 0.645 | 25.852 | 0.830 | 0.549 | 0.993 | 2.392 | YES
RRN-10L | YES | 3.887 | 0.627 | 24.252 | 0.790 | 0.557 | 0.989 | 0.390 | YES
RealSR | NO | 3.749 | 0.690 | 25.989 | 0.767 | 0.000 | 0.886 | | YES

MSU Super-Resolution for Video Compression Benchmark

The MSU Super-Resolution for Video Compression Benchmark was organized by MSU. This benchmark tests models' ability to work with compressed videos. The dataset consists of 9 videos compressed with different video codec standards at different bitrates. Models are ranked by BSQ-rate[73] over subjective score. The resolution of ground-truth frames is 1920×1080, and the tested scale factor is 4. 17 models were tested, and 5 video codecs were used to compress the ground-truth videos. Top combinations of super-resolution methods and video codecs are presented in the table:

Top methods
Model name | BSQ-rate (subjective score) | BSQ-rate (ERQAv2.0) | BSQ-rate (VMAF) | BSQ-rate (PSNR) | BSQ-rate (MS-SSIM) | BSQ-rate (LPIPS) | Open source
RealSR + x264 | 0.196 | 0.770 | 0.775 | 0.675 | 0.487 | 0.591 | YES
ahq-11 + x264 | 0.271 | 0.883 | 0.753 | 0.873 | 0.719 | 0.656 | NO
SwinIR + x264 | 0.304 | 0.760 | 0.642 | 6.268 | 0.736 | 0.559 | YES
Real-ESRGAN + x264 | 0.335 | 5.580 | 0.698 | 7.874 | 0.881 | 0.733 | YES
SwinIR + x265 | 0.346 | 1.575 | 1.304 | 8.130 | 4.641 | 1.474 | YES
COMISR + x264 | 0.367 | 0.969 | 1.302 | 6.081 | 0.672 | 1.118 | YES
RealSR + x265 | 0.502 | 1.622 | 1.617 | 1.064 | 1.033 | 1.206 | YES

Application

In many areas of work with video, we deal with different types of degradation, including downscaling. Video resolution can be degraded by imperfections in measuring devices, such as optical degradation and the limited size of camera sensors. Bad light and weather conditions add noise to video, and object and camera motion also decrease video quality. Super-resolution techniques help to restore the original video. It's useful in a wide range of applications, such as

Super-resolution also helps to solve the tasks of object detection and face and character recognition (as a preprocessing step). Interest in super-resolution is growing with the development of high-definition computer displays and TVs.

Video super-resolution finds its practical use in some modern smartphones and cameras, where it is used to reconstruct digital photographs.

Reconstructing details in digital photographs is difficult because the photographs are already incomplete: the camera sensor elements measure only the intensity of the light, not its color directly. A process called demosaicing is used to reconstruct photos from this partial color information. A single frame doesn't provide enough data to fill in the missing colors, but some of the missing information can be recovered from multiple images taken one after another. This process, known as burst photography, can be used to restore a single image of good quality from multiple sequential frames.

When we capture many sequential photos with a smartphone or handheld camera, there is always some movement between the frames because of hand motion. We can take advantage of this hand tremor by combining the information in those images: we choose a single image as the "base" or reference frame and align every other frame relative to it.
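Aligning burst frames to the base frame requires estimating the shift between them. One classic approach (an illustration of the alignment step, not the specific method used by any phone) is phase correlation:

```python
import numpy as np

def phase_correlation_shift(base, frame):
    """Estimate the integer (dy, dx) translation of `frame` relative to
    `base` from the phase of their normalized cross-power spectrum."""
    F1, F2 = np.fft.fft2(base), np.fft.fft2(frame)
    cross = F2 * np.conj(F1)
    corr = np.real(np.fft.ifft2(cross / (np.abs(cross) + 1e-12)))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = base.shape
    dy = int(dy) - h if dy > h // 2 else int(dy)   # unwrap circular shifts
    dx = int(dx) - w if dx > w // 2 else int(dx)
    return dy, dx

rng = np.random.default_rng(5)
base = rng.random((32, 32))
moved = np.roll(base, (3, -2), axis=(0, 1))        # simulated hand shake
print(phase_correlation_shift(base, moved))        # (3, -2)
```

In practice, burst pipelines estimate sub-pixel, locally varying motion, but the same correlation idea underlies the coarse alignment.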

There are situations where hand motion is simply not present because the device is stabilized (e.g. placed on a tripod). There is a way to simulate natural hand motion by intentionally moving the camera slightly. The movements are extremely small, so they don't interfere with regular photos. You can observe these motions on the Google Pixel 3[74] by holding it perfectly still (e.g. pressing it against a window) and maximally pinch-zooming the viewfinder.

See also

Notes and References

  1. Chan, Kelvin CK, et al. "BasicVSR: The search for essential components in video super-resolution and beyond." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
  2. Kim . S. P. . Bose . N. K. . Valenzuela . H. M. . Lecture Notes in Control and Information Sciences . Reconstruction of high resolution image from noise undersampled frames . 1989 . 129 . Springer-Verlag . Berlin/Heidelberg . 3-540-51424-4 . 10.1007/bfb0042742 . 315–326.
  3. Bose . N.K. . Kim . H.C. . Zhou . B. . Proceedings of 1st International Conference on Image Processing . Performance analysis of the TLS algorithm for image reconstruction from a sequence of undersampled noisy and blurred frames . 1994 . 3 . 571–574 . IEEE Comput. Soc. Press . 0-8186-6952-7 . 10.1109/icip.1994.413741 .
  4. Tekalp . A.M. . Ozkan . M.K. . Sezan . M.I. . [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing . High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration . IEEE . 1992 . 169–172 vol.3 . 0-7803-0532-9 . 10.1109/icassp.1992.226249 .
  5. Goldberg . N. . Feuer . A. . Goodwin . G.C. . Super-resolution reconstruction using spatio-temporal filtering . Journal of Visual Communication and Image Representation . Elsevier BV . 14 . 4 . 2003 . 1047-3203 . 10.1016/s1047-3203(03)00042-7 . 508–525.
  6. Mallat . S . Super-Resolution With Sparse Mixing Estimators . IEEE Transactions on Image Processing . Institute of Electrical and Electronics Engineers (IEEE) . 19 . 11 . 2010 . 1057-7149 . 10.1109/tip.2010.2049927 . 2889–2900. 20457549 . 2010ITIP...19.2889M . 856101 .
  7. Bose . N.K. . Lertrattanapanich . S. . Chappalli . M.B. . Superresolution with second generation wavelets . Signal Processing: Image Communication . Elsevier BV . 19 . 5 . 2004 . 0923-5965 . 10.1016/j.image.2004.02.001 . 387–391.
  8. Cohen . B. . Avrin . V. . Dinstein . I. . 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100) . Polyphase back-projection filtering for resolution enhancement of image sequences . 2000 . 4 . 2171–2174 . IEEE . 0-7803-6293-4 . 10.1109/icassp.2000.859267 .
  9. Katsaggelos . A.K. . Proceedings of International Conference on Image Processing . An iterative weighted regularized algorithm for improving the resolution of video sequences . 1997 . 474–477 . IEEE Comput. Soc . 0-8186-8183-7 . 10.1109/icip.1997.638811 .
  10. Farsiu . Sina . Elad . Michael . Milanfar . Peyman . Visual Communications and Image Processing 2006 . Apostolopoulos . John G. . Said . Amir . A practical approach to superresolution . SPIE . 2006-01-15 . 6077 . 10.1117/12.644391 . 607703.
  11. A new state-space approach for super-resolution image sequence reconstruction . IEEE . 2005 . 0-7803-9134-9 . 10.1109/icip.2005.1529892 . IEEE International Conference on Image Processing 2005 . Jing Tian . Kai-Kuang Ma . I-881 .
  12. Costa . Guilherme Holsbach . Bermudez . Jos Carlos Moreira . Statistical Analysis of the LMS Algorithm Applied to Super-Resolution Image Reconstruction . IEEE Transactions on Signal Processing . Institute of Electrical and Electronics Engineers (IEEE) . 55 . 5 . 2007 . 1053-587X . 10.1109/tsp.2007.892704 . 2084–2095. 2007ITSP...55.2084C . 52857681 .
  13. Elad . M. . Feuer . A. . Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348) . Super-resolution reconstruction of continuous image sequences . 1999 . 3 . 459–463 . IEEE . 0-7803-5467-2 . 10.1109/icip.1999.817156 .
  14. Elad . M. . Feuer . A. . Superresolution restoration of an image sequence: adaptive filtering approach . IEEE Transactions on Image Processing . Institute of Electrical and Electronics Engineers (IEEE) . 8 . 3 . 1999 . 1057-7149 . 10.1109/83.748893 . 387–395. 18262881 . 1999ITIP....8..387E .
  15. Pickering . M. . Frater . M. . Arnold . J. . IEEE International Conference on Image Processing 2005 . A robust approach to super-resolution sprite generation . IEEE . 2005 . I-897 . 0-7803-9134-9 . 10.1109/icip.2005.1529896 .
  16. Nasonov . Andrey V. . Krylov . Andrey S. . 2010 20th International Conference on Pattern Recognition . Fast Super-Resolution Using Weighted Median Filtering . IEEE . 2010 . 2230–2233 . 978-1-4244-7542-1 . 10.1109/icpr.2010.546 .
  17. Simonyan . K. . Grishin . S. . Vatolin . D. . Popov . D. . 2008 15th IEEE International Conference on Image Processing . Fast video super-resolution via classification . IEEE . 2008 . 349–352 . 978-1-4244-1765-0 . 10.1109/icip.2008.4711763 .
  18. Nasir . Haidawati . Stankovic . Vladimir . Marshall . Stephen . 2011 IEEE International Conference on Signal and Image Processing Applications (ICSIPA) . Singular value decomposition based fusion for super-resolution image reconstruction . IEEE . 2011 . 393–398 . 978-1-4577-0242-6 . 10.1109/icsipa.2011.6144138 .
  19. Protter . M. . Elad . M. . Takeda . H. . Milanfar . P. . Generalizing the Nonlocal-Means to Super-Resolution Reconstruction . IEEE Transactions on Image Processing . Institute of Electrical and Electronics Engineers (IEEE) . 18 . 1 . 2009 . 1057-7149 . 10.1109/tip.2008.2008067 . 36–51. 19095517 . 2009ITIP...18...36P . 2142115 .
  20. Zhuo . Yue . Liu . Jiaying . Ren . Jie . Guo . Zongming . 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . Nonlocal based Super Resolution with rotation invariance and search window relocation . IEEE . 2012 . 853–856 . 978-1-4673-0046-9 . 10.1109/icassp.2012.6288018 .
  21. Cheng . Ming-Hui . Chen . Hsuan-Ying . Leou . Jin-Jang . Video super-resolution reconstruction using a mobile search strategy and adaptive patch size . Signal Processing . Elsevier BV . 91 . 5 . 2011 . 0165-1684 . 10.1016/j.sigpro.2010.12.016 . 1284–1297. 17920263 .
  22. Huhle . Benjamin . Schairer . Timo . Jenke . Philipp . Straßer . Wolfgang . Fusion of range and color images for denoising and resolution enhancement with a non-local filter . Computer Vision and Image Understanding . Elsevier BV . 114 . 12 . 2010 . 1077-3142 . 10.1016/j.cviu.2009.11.004 . 1336–1345.
  23. Takeda . Hiroyuki . Farsiu . Sina . Milanfar . Peyman . Kernel Regression for Image Processing and Reconstruction . IEEE Transactions on Image Processing . Institute of Electrical and Electronics Engineers (IEEE) . 16 . 2 . 2007 . 1057-7149 . 10.1109/tip.2006.888330 . 349–366. 17269630 . 2007ITIP...16..349T . 12116009 .
  24. Elad . M. . Feuer . A. . Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images . IEEE Transactions on Image Processing . Institute of Electrical and Electronics Engineers (IEEE) . 6 . 12 . 1997 . 1057-7149 . 10.1109/83.650118 . 1646–1658. 18285235 . 1997ITIP....6.1646E .
  25. Farsiu . Sina . Robinson . Dirk . Elad . Michael . Milanfar . Peyman . Applications of Digital Image Processing XXVI . Tescher . Andrew G. . Robust shift and add approach to superresolution . SPIE . 2003-11-20 . 5203 . 10.1117/12.507194 . 121.
  26. Chantas . G.K. . Galatsanos . N.P. . Woods . N.A. . Super-Resolution Based on Fast Registration and Maximum a Posteriori Reconstruction . IEEE Transactions on Image Processing . Institute of Electrical and Electronics Engineers (IEEE) . 16 . 7 . 2007 . 1057-7149 . 10.1109/tip.2007.896664 . 1821–1830. 17605380 . 2007ITIP...16.1821C . 1811280 .
  27. Rajan . D. . Chaudhuri . S. . 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221) . Generation of super-resolution images from blurred observations using Markov random fields . 2001 . 3 . 1837–1840 . IEEE . 0-7803-7041-4 . 10.1109/icassp.2001.941300 .
  28. Zibetti . Marcelo Victor Wust . Mayer . Joceli . 2006 International Conference on Image Processing . Outlier Robust and Edge-Preserving Simultaneous Super-Resolution . IEEE . 2006 . 1741–1744 . 1-4244-0480-0 . 10.1109/icip.2006.312718 .
  29. Joshi . M.V. . Chaudhuri . S. . Panuganti . R. . A Learning-Based Method for Image Super-Resolution From Zoomed Observations . IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics . Institute of Electrical and Electronics Engineers (IEEE) . 35 . 3 . 2005 . 1083-4419 . 10.1109/tsmcb.2005.846647 . 527–537. 15971920 . 3162908 .
  30. Liao . Renjie . Tao . Xin . Li . Ruiyu . Ma . Ziyang . Jia . Jiaya . 2015 IEEE International Conference on Computer Vision (ICCV) . Video Super-Resolution via Deep Draft-Ensemble Learning . IEEE . 2015 . 531–539 . 978-1-4673-8391-2 . 10.1109/iccv.2015.68 .
  31. Kappeler . Armin . Yoo . Seunghwan . Dai . Qiqin . Katsaggelos . Aggelos K. . Video Super-Resolution With Convolutional Neural Networks . IEEE Transactions on Computational Imaging . Institute of Electrical and Electronics Engineers (IEEE) . 2 . 2 . 2016 . 2333-9403 . 10.1109/tci.2016.2532323 . 109–122. 9356783 .
  32. Caballero . Jose . Ledig . Christian . Aitken . Andrew . Acosta . Alejandro . Totz . Johannes . Wang . Zehan . Shi . Wenzhe . Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation . 2016-11-16 . cs.CV . 1611.05250v2.
  33. Tao . Xin . Gao . Hongyun . Liao . Renjie . Wang . Jue . Jia . Jiaya . 2017 IEEE International Conference on Computer Vision (ICCV) . Detail-Revealing Deep Video Super-Resolution . IEEE . 2017 . 4482–4490 . 978-1-5386-1032-9 . 10.1109/iccv.2017.479 . 1704.02738 .
  34. Liu . Ding . Wang . Zhaowen . Fan . Yuchen . Liu . Xianming . Wang . Zhangyang . Chang . Shiyu . Huang . Thomas . 2017 IEEE International Conference on Computer Vision (ICCV) . Robust Video Super-Resolution with Learned Temporal Dynamics . IEEE . 2017 . 2526–2534 . 978-1-5386-1032-9 . 10.1109/iccv.2017.274 .
  35. Sajjadi . Mehdi S. M. . Vemulapalli . Raviteja . Brown . Matthew . 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Frame-Recurrent Video Super-Resolution . IEEE . 2018 . 6626–6634 . 978-1-5386-6420-9 . 10.1109/cvpr.2018.00693 . 1801.04590 .
  36. Kim . Tae Hyun . Sajjadi . Mehdi S. M. . Hirsch . Michael . Schölkopf . Bernhard . Lecture Notes in Computer Science . Computer Vision – ECCV 2018 . Spatio-Temporal Transformer Network for Video Restoration . Springer International Publishing . Cham . 2018 . 11207 . 978-3-030-01218-2 . 0302-9743 . 10.1007/978-3-030-01219-9_7 . 111–127.
  37. Wang . Longguang . Guo . Yulan . Liu . Li . Lin . Zaiping . Deng . Xinpu . An . Wei . Deep Video Super-Resolution Using HR Optical Flow Estimation . IEEE Transactions on Image Processing . Institute of Electrical and Electronics Engineers (IEEE) . 29 . 2020 . 1057-7149 . 10.1109/tip.2020.2967596 . 4323–4336. 31995491 . 2001.02129 . 2020ITIP...29.4323W . 210023539 .
  38. Chu . Mengyu . Xie . You . Mayer . Jonas . Leal-Taixé . Laura . Thuerey . Nils . Learning temporal coherence via self-supervision for GAN-based video generation . ACM Transactions on Graphics . Association for Computing Machinery (ACM) . 39 . 4 . 2020-07-08 . 0730-0301 . 10.1145/3386569.3392457 . 1811.09393 . 209460786 .
  39. Xue . Tianfan . Chen . Baian . Wu . Jiajun . Wei . Donglai . Freeman . William T. . Video Enhancement with Task-Oriented Flow . International Journal of Computer Vision . Springer Science and Business Media LLC . 127 . 8 . 2019-02-12 . 0920-5691 . 10.1007/s11263-018-01144-2 . 1106–1125. 1711.09078 . 40412298 .
  40. Wang . Zhongyuan . Yi . Peng . Jiang . Kui . Jiang . Junjun . Han . Zhen . Lu . Tao . Ma . Jiayi . Multi-Memory Convolutional Neural Network for Video Super-Resolution . IEEE Transactions on Image Processing . Institute of Electrical and Electronics Engineers (IEEE) . 28 . 5 . 2019 . 1057-7149 . 10.1109/tip.2018.2887017 . 2530–2544. 30571634 . 2019ITIP...28.2530W . 58595890 .
  41. Haris . Muhammad . Shakhnarovich . Gregory . Ukita . Norimichi . 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Recurrent Back-Projection Network for Video Super-Resolution . IEEE . 2019 . 3892–3901 . 978-1-7281-3293-8 . 10.1109/cvpr.2019.00402 . 1903.10128 .
  42. Bao . Wenbo . Lai . Wei-Sheng . Zhang . Xiaoyun . Gao . Zhiyong . Yang . Ming-Hsuan . MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement . IEEE Transactions on Pattern Analysis and Machine Intelligence . Institute of Electrical and Electronics Engineers (IEEE) . 43 . 3 . 2021-03-01 . 0162-8828 . 10.1109/tpami.2019.2941941 . 933–948. 31722471 . 1810.08768 . 53046739 .
  43. Bare . Bahetiyaer . Yan . Bo . Ma . Chenxi . Li . Ke . Real-time video super-resolution via motion convolution kernel estimation . Neurocomputing . Elsevier BV . 367 . 2019 . 0925-2312 . 10.1016/j.neucom.2019.07.089 . 236–245. 201264266 .
  44. Kalarot . Ratheesh . Porikli . Fatih . 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) . MultiBoot Vsr: Multi-Stage Multi-Reference Bootstrapping for Video Super-Resolution . IEEE . 2019 . 2060–2069 . 978-1-7281-2506-0 . 10.1109/cvprw.2019.00258 .
  45. Chan . Kelvin C. K. . Wang . Xintao . Yu . Ke . Dong . Chao . Loy . Chen Change . BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond . 2020-12-03 . cs.CV . 2012.02181v1.
  46. Naoto Chiche . Benjamin . Frontera-Pons . Joana . Woiselle . Arnaud . Starck . Jean-Luc . 2020 Tenth International Conference on Image Processing Theory, Tools and Applications (IPTA) . Deep Unrolled Network for Video Super-Resolution . IEEE . 2020-11-09 . 1–6 . 978-1-7281-8750-1 . 10.1109/ipta50016.2020.9286636 . 2102.11720 .
  47. Wang . Xintao . Chan . Kelvin C. K. . Yu . Ke . Dong . Chao . Loy . Chen Change . EDVR: Video Restoration with Enhanced Deformable Convolutional Networks . 2019-05-07 . cs.CV . 1905.02716v1.
  48. Wang . Hua . Su . Dewei . Liu . Chuangchuang . Jin . Longcun . Sun . Xianfang . Peng . Xinyi . Deformable Non-Local Network for Video Super-Resolution . IEEE Access . Institute of Electrical and Electronics Engineers (IEEE) . 7 . 2019 . 2169-3536 . 10.1109/access.2019.2958030 . 177734–177744. 1909.10692 . 2019IEEEA...7q7734W . free .
  49. Tian . Yapeng . Zhang . Yulun . Fu . Yun . Xu . Chenliang . 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution . IEEE . 2020 . 3357–3366 . 978-1-7281-7168-5 . 10.1109/cvpr42600.2020.00342 . 1812.02898 .
  50. Song . Huihui . Xu . Wenjie . Liu . Dong . Liua . Bo . Liub . Qingshan . Metaxas . Dimitris N. . Multi-Stage Feature Fusion Network for Video Super-Resolution . IEEE Transactions on Image Processing . Institute of Electrical and Electronics Engineers (IEEE) . 2021 . 30 . 1057-7149 . 10.1109/tip.2021.3056868 . 2923–2934. 33560986 . 2021ITIP...30.2923S . 231864067 .
  51. Isobe . Takashi . Li . Songjiang . Jia . Xu . Yuan . Shanxin . Slabaugh . Gregory . Xu . Chunjing . Li . Ya-Li . Wang . Shengjin . Tian . Qi . 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Video Super-Resolution With Temporal Group Attention . IEEE . 2020 . 8005–8014 . 978-1-7281-7168-5 . 10.1109/cvpr42600.2020.00803 . 2007.10595 .
  52. Lucas, Alice; Lopez-Tapia, Santiago; Molina, Rafael; Katsaggelos, Aggelos K. (2019). "Generative Adversarial Networks and Perceptual Losses for Video Super-Resolution". IEEE Transactions on Image Processing. 28 (7): 3312–3327. doi:10.1109/tip.2019.2895768. PMID 30714918. arXiv:1806.05764.
  53. Yan, Bo; Lin, Chuming; Tan, Weimin (2019). "Frame and Feature-Context Video Super-Resolution". arXiv:1909.13057v1 [cs.CV].
  54. Tian, Zhiqiang; Wang, Yudiao; Du, Shaoyi; Lan, Xuguang; Yang, You (2020). "A multiresolution mixture generative adversarial network for video super-resolution". PLOS ONE. 15 (7): e0235352. doi:10.1371/journal.pone.0235352. PMID 32649694.
  55. Zhu, Xiaobin; Li, Zhuangzi; Lou, Jungang; Shen, Qing (2021). "Video super-resolution based on a spatio-temporal matching network". Pattern Recognition. 110: 107619. doi:10.1016/j.patcog.2020.107619.
  56. Li, Wenbo; Tao, Xin; Guo, Taian; Qi, Lu; Lu, Jiangbo; Jia, Jiaya (2020). "MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution". arXiv:2007.11803v1 [cs.CV].
  57. Jo, Younghyun; Oh, Seoung Wug; Kang, Jaeyeon; Kim, Seon Joo (2018). "Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation". 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3224–3232. doi:10.1109/cvpr.2018.00340. ISBN 978-1-5386-6420-9.
  58. Li, Sheng; He, Fengxiang; Du, Bo; Zhang, Lefei; Xu, Yonghao; Tao, Dacheng (2019). "Fast Spatio-Temporal Residual Network for Video Super-Resolution". arXiv:1904.02870v1 [cs.CV].
  59. Kim, Soo Ye; Lim, Jeongyeon; Na, Taeyoung; Kim, Munchurl (2019). "Video Super-Resolution Based on 3D-CNNs with Consideration of Scene Change". 2019 IEEE International Conference on Image Processing (ICIP). pp. 2831–2835. doi:10.1109/ICIP.2019.8803297. ISBN 978-1-5386-6249-6.
  60. Luo, Jianping; Huang, Shaofei; Yuan, Yuan (2020). "Video Super-Resolution using Multi-scale Pyramid 3D Convolutional Networks". Proceedings of the 28th ACM International Conference on Multimedia. pp. 1882–1890. doi:10.1145/3394171.3413587. ISBN 978-1-4503-7988-5.
  61. Zhang, Dongyang; Shao, Jie; Liang, Zhenwen; Liu, Xueliang; Shen, Heng Tao (2020). "Multi-branch Networks for Video Super-Resolution with Dynamic Reconstruction Strategy". IEEE Transactions on Circuits and Systems for Video Technology. 31 (10): 3954–3966. doi:10.1109/TCSVT.2020.3044451.
  62. Aksan, Emre; Hilliges, Otmar (2019). "STCN: Stochastic Temporal Convolutional Networks". arXiv:1902.06568v1 [cs.LG].
  63. Huang, Yan; Wang, Wei; Wang, Liang (2018). "Video Super-Resolution via Bidirectional Recurrent Convolutional Networks". IEEE Transactions on Pattern Analysis and Machine Intelligence. 40 (4): 1015–1028. doi:10.1109/TPAMI.2017.2701380. PMID 28489532.
  64. Zhu, Xiaobin; Li, Zhuangzi; Zhang, Xiao-Yu; Li, Changsheng; Liu, Yaqi; Xue, Ziyu (2019). "Residual Invertible Spatio-Temporal Network for Video Super-Resolution". Proceedings of the AAAI Conference on Artificial Intelligence. 33: 5981–5988. doi:10.1609/aaai.v33i01.33015981.
  65. Li, Dingyi; Liu, Yu; Wang, Zengfu (2019). "Video Super-Resolution Using Non-Simultaneous Fully Recurrent Convolutional Network". IEEE Transactions on Image Processing. 28 (3): 1342–1355. doi:10.1109/TIP.2018.2877334. PMID 30346282.
  66. Isobe, Takashi; Zhu, Fang; Jia, Xu; Wang, Shengjin (2020). "Revisiting Temporal Modeling for Video Super-resolution". arXiv:2008.05765v2 [eess.IV].
  67. Han, Lei; Fan, Cien; Yang, Ye; Zou, Lian (2020). "Bidirectional Temporal-Recurrent Propagation Networks for Video Super-Resolution". Electronics. 9 (12): 2085. doi:10.3390/electronics9122085.
  68. Fuoli, Dario; Gu, Shuhang; Timofte, Radu (2019). "Efficient Video Super-Resolution through Recurrent Latent Space Propagation". arXiv:1909.08080 [eess.IV].
  69. Isobe, Takashi; Jia, Xu; Gu, Shuhang; Li, Songjiang; Wang, Shengjin; Tian, Qi (2020). "Video Super-Resolution with Recurrent Structure-Detail Network". arXiv:2008.00455v1 [cs.CV].
  70. Zhou, Chao; Chen, Can; Ding, Fei; Zhang, Dengyin (2021). "Video super-resolution with non-local alignment network". IET Image Processing. 15 (8): 1655–1667. doi:10.1049/ipr2.12134.
  71. Yi, Peng; Wang, Zhongyuan; Jiang, Kui; Jiang, Junjun; Lu, Tao; Ma, Jiayi (2020). "A Progressive Fusion Generative Adversarial Network for Realistic and Consistent Video Super-Resolution". IEEE Transactions on Pattern Analysis and Machine Intelligence. 44 (5): 2264–2280. doi:10.1109/TPAMI.2020.3042298. PMID 33270559.
  72. "MSU VSR Benchmark Methodology". Video Processing. 2021-04-26. Retrieved 2021-05-12.
  73. Zvezdakova, A. V.; Kulikov, D. L.; Zvezdakov, S. V.; Vatolin, D. S. (2020). "BSQ-rate: a new approach for video-codec performance comparison and drawbacks of current solutions". Programming and Computer Software. 46 (3): 183–194. doi:10.1134/S0361768820030111.
  74. "See Better and Further with Super Res Zoom on the Pixel 3". Google AI Blog. 2018-10-15.