Video matting is a technique for separating a video into two or more layers, usually foreground and background, and generating alpha mattes that determine how the layers are blended. The technique is widely used in video editing because it allows the background to be replaced or the layers to be processed individually.
When two images are combined, an alpha matte, also known as a transparency map, is used. In the case of digital video, the alpha matte is a sequence of images. The matte can serve as a binary mask that defines which parts of the image are visible; in the more general case it enables smooth blending of the images, with the alpha matte acting as the transparency map of the top image. Matting has been used in film production since the very beginning of filmmaking, when mattes were drawn by hand. Nowadays the process can be automated with computer algorithms.
The basic matting problem is defined as follows: given an image I, assumed to be a composition of a foreground image F and a background image B blended by an alpha matte A, recover F, B and A such that

I = AF + (1 − A)B.

The problem is underconstrained: in fully opaque regions, where A = 1 and F = I, the background B is unobserved, so additional constraints or user input are required.
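The compositing equation translates directly into code. The following sketch, assuming NumPy arrays with illustrative names, applies a given alpha matte to blend a foreground frame over a new background:

```python
import numpy as np

def composite(foreground, background, alpha):
    """Blend a foreground over a background with an alpha matte.

    foreground, background: float images in [0, 1] of shape (H, W, 3)
    alpha: float matte in [0, 1] of shape (H, W); 1 means fully foreground
    """
    a = alpha[..., None]                 # add a channel axis for broadcasting
    return a * foreground + (1.0 - a) * background

# Example: put a frame over a solid green background using a crude box matte
h, w = 240, 320
frame = np.random.rand(h, w, 3)          # stand-in for a video frame
matte = np.zeros((h, w)); matte[60:180, 80:240] = 1.0
green = np.zeros((h, w, 3)); green[:, :, 1] = 1.0
result = composite(frame, green, matte)
```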
The main criteria for video matting methods, from a user perspective, are the following:
The first known video matting method [1] was developed in 2001. It uses optical flow for trimap propagation and a Bayesian image matting technique applied to each frame separately.
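The exact propagation scheme of [1] is more elaborate; as a rough sketch of the idea, a trimap can be warped from one frame to the next with dense optical flow (here OpenCV's Farnebäck flow; function and variable names are illustrative):

```python
import cv2
import numpy as np

def propagate_trimap(frame_next, frame_prev, trimap_prev):
    """Warp a trimap from the previous frame to the next using dense optical flow.

    frame_prev, frame_next: uint8 BGR frames
    trimap_prev: uint8 map with 0 = background, 128 = unknown, 255 = foreground
    Returns an estimated trimap for frame_next.
    """
    g_next = cv2.cvtColor(frame_next, cv2.COLOR_BGR2GRAY)
    g_prev = cv2.cvtColor(frame_prev, cv2.COLOR_BGR2GRAY)
    # Backward flow: for every pixel of the next frame, where it came from
    # in the previous frame
    flow = cv2.calcOpticalFlowFarneback(g_next, g_prev, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g_next.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    # Nearest-neighbour sampling keeps the three trimap labels intact
    return cv2.remap(trimap_prev, map_x, map_y, cv2.INTER_NEAREST,
                     borderMode=cv2.BORDER_REPLICATE)
```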
Video SnapCut,[2] later incorporated into Adobe After Effects as the Roto Brush tool, was developed in 2009. The method uses local classifiers to perform binary segmentation near the target object's boundary; the segmentation results are propagated to the next frame using optical flow, and an image matting algorithm [3] is then applied.
A method [4] from 2011 was also included in Adobe After Effects, as the Refine Edge tool. It enhances trimap propagation with optical flow by adding control points along the object's edge. The method performs matting per frame, but temporal coherence is improved with a temporal filter.
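The temporal filter in [4] is method-specific; as a simple illustration of the idea, an exponential moving average over consecutive per-frame mattes already reduces flicker (a sketch, not the published filter):

```python
import numpy as np

def temporally_smooth(alphas, strength=0.5):
    """Exponential moving average over per-frame alpha mattes to reduce flicker.

    alphas: list of float arrays in [0, 1], one matte per frame
    strength: 0 = no smoothing; values near 1 follow the previous frame closely
    """
    smoothed = [alphas[0].astype(np.float64)]
    for a in alphas[1:]:
        smoothed.append(strength * smoothed[-1] + (1.0 - strength) * a)
    return smoothed
```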
Finally, a deep learning method [5] was developed for image matting in 2017. It outperforms most traditional methods.
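Deep matting networks of this kind typically take the image concatenated with a trimap as input and regress the alpha matte directly. The following toy PyTorch model illustrates that input/output convention only; it is far smaller than, and not equivalent to, the published architecture:

```python
import torch
import torch.nn as nn

class TinyMattingNet(nn.Module):
    """Toy encoder-decoder: the image and trimap are concatenated into a
    4-channel input, and the network outputs an alpha matte in [0, 1]."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, image, trimap):
        x = torch.cat([image, trimap], dim=1)   # (N, 3 + 1, H, W)
        return self.decoder(self.encoder(x))    # predicted alpha matte

# image: (1, 3, 256, 256), trimap: (1, 1, 256, 256) -> alpha: (1, 1, 256, 256)
alpha = TinyMattingNet()(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
```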
Video matting is a rapidly evolving field with many practical applications. To compare the quality of different methods, however, they must be tested on a benchmark, which consists of a dataset of test sequences and a methodology for comparing results. Currently there is one major online video matting benchmark, which uses chroma keying and stop motion to estimate ground truth. After a method is submitted, its rating is derived from objective metrics. Since objective metrics do not fully represent human perception of quality, a subjective survey is necessary for an adequate comparison.
Method | Year of development | Rank
---|---|---
Deep Image Matting | 2016 | 1
Self-Adaptive [7] | 2016 | 2
Learning Based [8] | 2009 | 3
Sparse Sampling [9] | 2016 | 4
Closed Form | 2008 | 5
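The metrics used by such benchmarks vary; two commonly reported objective measures, shown here as a sketch with illustrative names, are the sum of absolute differences (SAD) and the mean squared error (MSE) between a predicted matte and the ground truth, often restricted to the unknown trimap region:

```python
import numpy as np

def sad(alpha_pred, alpha_gt):
    """Sum of absolute differences between predicted and ground-truth mattes."""
    return np.abs(alpha_pred - alpha_gt).sum()

def mse(alpha_pred, alpha_gt, trimap=None):
    """Mean squared error, optionally restricted to the unknown trimap region."""
    if trimap is not None:
        mask = trimap == 128          # evaluate only where the trimap is unknown
        return np.mean((alpha_pred[mask] - alpha_gt[mask]) ** 2)
    return np.mean((alpha_pred - alpha_gt) ** 2)
```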
Video matting methods are required in video editing software. The most common application is cutting an object out and transferring it into another scene. Such tools allow users to cut out a moving object by interactively painting areas that must or must not belong to the object, or by specifying complete trimaps as input. There are several software implementations:
To improve the speed and quality of matting, some methods use additional data. For example, time-of-flight cameras have been explored in real-time matting systems.[12]
Another application of video matting is background matting, which is very popular in online video calls. A Zoom plugin has been developed,[13] and Skype announced Background Replace in June 2020.[14] Video matting methods also make it possible to apply video effects only to the background or the foreground.
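Once a matte has been estimated, such background effects reduce to per-pixel blending. The sketch below, which assumes an already-computed matte and uses OpenCV only for the blur, shows a background-blur effect of the kind used in video calls:

```python
import cv2
import numpy as np

def blur_background(frame, alpha, ksize=31):
    """Blur everything behind the subject while keeping the foreground sharp.

    frame: uint8 BGR image; alpha: float matte in [0, 1], 1 = foreground.
    """
    blurred = cv2.GaussianBlur(frame, (ksize, ksize), 0)
    a = alpha[..., None]
    out = a * frame.astype(np.float64) + (1.0 - a) * blurred
    return out.astype(np.uint8)
```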
Video matting is crucial in 2D to 3D conversion, where the alpha matte is used to correctly process transparent objects. It is also employed in stereo to multiview conversion.
Closely related to matting is video completion,[15] which fills in the video after an object has been removed. While matting separates a video into several layers, completion fills the resulting gaps with plausible content from the video once one of the layers has been removed.