Fréchet inception distance
The Fréchet inception distance (FID) is a metric used to assess the quality of images created by a generative model, such as a generative adversarial network (GAN).[1] Unlike the earlier inception score (IS), which evaluates only the distribution of generated images, the FID compares the distribution of generated images with the distribution of a set of real images ("ground truth"). The FID metric does not completely replace the IS metric: models that achieve the best (lowest) FID score tend to have greater sample variety, while models achieving the best (highest) IS score tend to have better quality within individual images.[2]
The FID metric was introduced in 2017 and, as of 2024, is the standard metric for assessing the quality of models that generate synthetic images. It has been used to measure the quality of many recent models, including the high-resolution StyleGAN[3] and StyleGAN2[4] networks and classifier-free guided diffusion models.[2]
Definition
For any two probability distributions \(\mu, \nu\) over \(\mathbb{R}^n\) having finite mean and variances, their Fréchet distance is

    d_F(\mu, \nu) := \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{\mathbb{R}^n \times \mathbb{R}^n} \|x - y\|^2 \, \mathrm{d}\gamma(x, y) \right)^{1/2},[5]

where \(\Gamma(\mu, \nu)\) is the set of all measures on \(\mathbb{R}^n \times \mathbb{R}^n\) with marginals \(\mu\) and \(\nu\) on the first and second factors respectively. (The set \(\Gamma(\mu, \nu)\) is also called the set of all couplings of \(\mu\) and \(\nu\).) In other words, it is the 2-Wasserstein distance on \(\mathbb{R}^n\).
For two Gaussian distributions \(\mathcal{N}(\mu, \Sigma)\) and \(\mathcal{N}(\mu', \Sigma')\), it is explicitly solvable as

    d_F\big(\mathcal{N}(\mu, \Sigma), \mathcal{N}(\mu', \Sigma')\big)^2 = \|\mu - \mu'\|_2^2 + \operatorname{tr}\left(\Sigma + \Sigma' - 2\big(\Sigma \Sigma'\big)^{1/2}\right).[6]

This allows us to define the FID in pseudocode form:

INPUT a function \(f : \Omega \to \mathbb{R}^n\).
INPUT two datasets \(S, S' \subset \Omega\).
Compute \(f(S), f(S') \subset \mathbb{R}^n\).
Fit two Gaussian distributions \(\mathcal{N}(\mu, \Sigma), \mathcal{N}(\mu', \Sigma')\), respectively for \(f(S)\) and \(f(S')\).
RETURN \(d_F\big(\mathcal{N}(\mu, \Sigma), \mathcal{N}(\mu', \Sigma')\big)^2\).
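A minimal sketch of this pseudocode in Python, assuming the feature vectors have already been computed by \(f\) (NumPy and SciPy supply the sample statistics and the matrix square root; the function name is illustrative, not from the original publication):

import numpy as np
from scipy import linalg

def frechet_distance(features_1, features_2):
    # Squared Fréchet distance between Gaussians fitted to two feature sets,
    # each given as an array of shape (num_samples, feature_dim).
    mu1, sigma1 = features_1.mean(axis=0), np.cov(features_1, rowvar=False)
    mu2, sigma2 = features_2.mean(axis=0), np.cov(features_2, rowvar=False)

    # Matrix square root of the product of the two covariance matrices.
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    # Floating-point error can leave a small imaginary component; drop it.
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    # ||mu - mu'||^2 + tr(Sigma + Sigma' - 2 (Sigma Sigma')^(1/2))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))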
In most practical uses of the FID, \(\Omega\) is the space of images, and \(f\) is an Inception v3 model trained on ImageNet, but without its final classification layer; technically, the feature used is the 2048-dimensional activation vector of its last pooling layer. Of the two datasets \(S, S'\), one is a reference dataset, which could be ImageNet itself, and the other is a set of images generated by a generative model, such as a GAN or a diffusion model.
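As an illustration only, not the reference implementation (official FID code relies on a specific pre-trained Inception network and preprocessing pipeline; the torchvision weights, image size and normalization constants below are stand-ins), the 2048-dimensional feature vectors could be extracted roughly as follows:

import torch
from torchvision import models, transforms

# Load Inception v3 pre-trained on ImageNet and replace the final
# classification layer with the identity, so the forward pass returns the
# 2048-dimensional activations of the last pooling layer.
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Identity()
model.eval()

# Inception v3 expects 299x299 inputs normalized with ImageNet statistics.
preprocess = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(images):
    # Map a list of PIL images to an (N, 2048) array of activations.
    batch = torch.stack([preprocess(img) for img in images])
    return model(batch).numpy()

The FID of a generated set against a reference set would then be frechet_distance(extract_features(reference_images), extract_features(generated_images)), with the feature arrays typically built in batches rather than in a single forward pass.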
Interpretation
Rather than directly comparing images pixel by pixel (for example, as done by the L2 norm), the FID compares the mean and covariance of activations in the deepest layer of Inception v3. These layers are closer to output nodes that correspond to real-world objects such as a specific breed of dog or an airplane, and further from the shallow layers near the input image.
Variants
Specialized variants of FID have been suggested as evaluation metrics for other domains: the Fréchet Audio Distance (FAD) for music enhancement algorithms,[7] the Fréchet Video Distance (FVD) for generative models of video,[8] and the Fréchet ChemNet Distance (FCD) for AI-generated molecules.[9]
Limitations
Chong and Forsyth[10] showed FID to be statistically biased, in the sense that its expected value over a finite sample is not its true value. Also, because FID measures the Wasserstein distance towards the ground-truth distribution, it is inadequate for evaluating the quality of generators in domain adaptation setups, or in zero-shot generation. Finally, while FID is more consistent with human judgement than the previously used inception score, there are cases where FID is inconsistent with human judgment (e.g. Figures 3 and 5 in Liu et al.).[11]
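The bias can be illustrated with a toy experiment that reuses the frechet_distance sketch above: both samples are drawn from the same distribution, so the true Fréchet distance is zero, yet the finite-sample estimate is strictly positive and shrinks only as the number of samples grows.

import numpy as np

rng = np.random.default_rng(0)
dim = 64  # a small dimension keeps the demo fast; Inception features use 2048
for n in (500, 2_000, 10_000):
    # Two independent samples from the *same* standard Gaussian.
    a = rng.standard_normal((n, dim))
    b = rng.standard_normal((n, dim))
    # The true Fréchet distance is 0, but the estimate is biased upwards.
    print(f"n={n}: estimated FID = {frechet_distance(a, b):.4f}")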
Notes and References
- Heusel, Martin; Ramsauer, Hubert; Unterthiner, Thomas; Nessler, Bernhard; Hochreiter, Sepp (2017). "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium". Advances in Neural Information Processing Systems. 30. arXiv:1706.08500.
- Ho, Jonathan; Salimans, Tim (2022). "Classifier-Free Diffusion Guidance". arXiv:2207.12598 [cs.LG].
- Karras, Tero; Laine, Samuli; Aila, Timo (2020). "A Style-Based Generator Architecture for Generative Adversarial Networks". IEEE Transactions on Pattern Analysis and Machine Intelligence. PP (12): 4217–4228. arXiv:1812.04948. doi:10.1109/TPAMI.2020.2970919.
- Karras, Tero; Laine, Samuli; Aittala, Miika; Hellsten, Janne; Lehtinen, Jaakko; Aila, Timo (23 March 2020). "Analyzing and Improving the Image Quality of StyleGAN". arXiv:1912.04958 [cs.CV].
- Fréchet, M. (1957). "Sur la distance de deux lois de probabilité". C. R. Acad. Sci. Paris. 244: 689–692.
- Dowson, D. C.; Landau, B. V. (1 September 1982). "The Fréchet distance between multivariate normal distributions". Journal of Multivariate Analysis. 12 (3): 450–455. doi:10.1016/0047-259X(82)90077-X.
- Kilgour, Kevin; Zuluaga, Mauricio; Roblek, Dominik; Sharifi, Matthew (2019). "Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms". Interspeech 2019: 2350–2354. doi:10.21437/Interspeech.2019-2219.
- Unterthiner, Thomas; van Steenkiste, Sjoerd; Kurach, Karol; Marinier, Raphaël; Michalski, Marcin; Gelly, Sylvain (2019). "FVD: A new Metric for Video Generation". OpenReview.
- Preuer, Kristina; Renz, Philipp; Unterthiner, Thomas; Hochreiter, Sepp; Klambauer, Günter (2018). "Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery". Journal of Chemical Information and Modeling. 58 (9): 1736–1741. arXiv:1803.09518. doi:10.1021/acs.jcim.8b00234.
- Chong, Min Jin; Forsyth, David (2020). "Effectively Unbiased FID and Inception Score and where to find them". arXiv:1911.07023 [cs.CV].
- Liu, Shaohui; Wei, Yi; Lu, Jiwen; Zhou, Jie (2018). "An Improved Evaluation Framework for Generative Adversarial Networks". arXiv:1803.07474 [cs.CV].