Filters, random fields, and maximum entropy model explained

In the domain of physics and probability, the filters, random fields, and maximum entropy (FRAME) model[1][2] is a Markov random field model (or a Gibbs distribution) of stationary spatial processes, in which the energy function is a sum of translation-invariant potential functions that are one-dimensional non-linear transformations of linear filter responses. The FRAME model was originally developed by Song-Chun Zhu, Ying Nian Wu, and David Mumford for modeling stochastic texture patterns, such as grasses, tree leaves, brick walls, and water waves. The model is the maximum entropy distribution that reproduces the observed marginal histograms of responses from a bank of filters (such as Gabor filters or Gabor wavelets), where for each filter, tuned to a specific scale and orientation, the marginal histogram is pooled over all pixels in the image domain. The FRAME model has also been proved equivalent to a micro-canonical ensemble,[3] named the Julesz ensemble. A Gibbs sampler is used to synthesize texture images by drawing samples from the FRAME model.
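The energy function and the Gibbs sampling step described above can be sketched in a few lines. The following is a minimal illustration, not the original implementation: a hypothetical two-filter bank (simple difference filters standing in for Gabor filters), piecewise-constant potentials encoded as one weight per histogram bin (the vector `lam`), and an exhaustive single-site Gibbs update over a small set of quantized gray levels. All names, filters, and parameter choices are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-filter bank: simple difference filters standing in for
# the Gabor filters of the original model.
FILTERS = [
    np.array([[0., 0., 0.], [-1., 0., 1.], [0., 0., 0.]]),  # horizontal gradient
    np.array([[0., -1., 0.], [0., 0., 0.], [0., 1., 0.]]),  # vertical gradient
]
BIN_EDGES = np.linspace(-8.0, 8.0, 17)  # 16 histogram bins per filter

def wrap_convolve(img, f):
    """2-D convolution with periodic (translation-invariant) boundaries."""
    r = f.shape[0] // 2
    out = np.zeros_like(img)
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            out += f[di + r, dj + r] * np.roll(np.roll(img, di, axis=0), dj, axis=1)
    return out

def filter_histograms(img):
    """Marginal histograms of filter responses, pooled over all pixels."""
    hists = []
    for f in FILTERS:
        resp = wrap_convolve(img, f)
        h, _ = np.histogram(resp, bins=BIN_EDGES)
        hists.append(h / resp.size)  # normalize to per-pixel frequencies
    return np.concatenate(hists)

def energy(img, lam):
    """FRAME energy: inner product of potentials with pooled histograms."""
    return float(np.dot(lam, filter_histograms(img)))

def gibbs_sweep(img, lam, levels, T=1.0):
    """One sweep of single-site Gibbs sampling from p(I) proportional to exp(-E(I)/T)."""
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            e = np.empty(len(levels))
            for k, v in enumerate(levels):
                img[i, j] = v
                e[k] = energy(img, lam) / T
            p = np.exp(-(e - e.min()))  # conditional distribution at this pixel
            p /= p.sum()
            img[i, j] = levels[rng.choice(len(levels), p=p)]
    return img
```

In the learning algorithm, the potentials are themselves updated until the synthesized histograms match the observed ones; this sketch treats them as given. It also recomputes the full energy for every candidate gray level, which a real implementation would avoid by updating only the locally affected filter responses.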

The original FRAME model is homogeneous and was designed for texture modeling. Xie et al. proposed the sparse FRAME model,[4][5] an inhomogeneous generalization of the original FRAME model, for modeling object patterns such as animal bodies and faces. It is a non-stationary Markov random field model that reproduces the observed statistical properties of filter responses at a subset of selected locations, scales, and orientations. The sparse FRAME model can be considered a deformable template.
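The contrast with the stationary model can be sketched as follows: the sparse FRAME energy sums nonlinearly transformed filter responses only at a small set of selected (filter, location) pairs, instead of pooling histograms over every pixel. The filters, the selected sites, and the absolute-value nonlinearity below are all illustrative assumptions; in the actual model the filters are Gabor wavelets and the pairs are chosen by a learning procedure.

```python
import numpy as np

# Illustrative filter bank and selected sites; both are assumptions for
# the sketch, not the learned quantities of the actual model.
FILTERS = [
    np.array([[0., 0., 0.], [-1., 0., 1.], [0., 0., 0.]]),
    np.array([[0., -1., 0.], [0., 0., 0.], [0., 1., 0.]]),
]
SELECTED = [(0, 2, 3), (1, 5, 5), (0, 7, 1)]  # (filter index, row, col)

def response_at(img, f, i, j):
    """Filter response at a single pixel, with wrap-around boundaries."""
    H, W = img.shape
    r = f.shape[0] // 2
    s = 0.0
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            s += f[di + r, dj + r] * img[(i - di) % H, (j - dj) % W]
    return s

def sparse_energy(img, weights, h=np.abs):
    """Inhomogeneous energy: weighted nonlinear responses at the selected
    sites only, rather than histograms pooled over the whole image."""
    return float(sum(w * h(response_at(img, FILTERS[k], i, j))
                     for w, (k, i, j) in zip(weights, SELECTED)))
```

Because the energy depends on responses at fixed image positions, the model is non-stationary, which is what lets it act as a template for structured objects rather than homogeneous textures.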

The deep FRAME model[6][7] is a deep generalization of the original FRAME model. Instead of using linear filters as in the original FRAME model, Lu et al. used the filters at a certain convolutional layer of a pre-learned ConvNet. Instead of relying on pre-trained filters from an existing ConvNet, Xie et al. parameterized the energy function of the FRAME model by a ConvNet structure and learned all parameters from scratch. The deep FRAME model is the first framework to integrate the modern deep neural networks of deep learning with the Gibbs distributions of statistical physics. Deep FRAME models have been further generalized to model video patterns[8][9] and 3D volumetric shape patterns.[10]
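The structural change can be sketched with a toy stand-in for the ConvNet: the energy becomes the negative scalar output of a multi-layer scoring function rather than a sum of one-dimensional transforms of linear filter responses. The one-layer network, its filters, and the read-out weights below are all assumptions for the sketch; in the actual model these are either taken from a pre-learned ConvNet or learned from scratch.

```python
import numpy as np

# Toy stand-in for a ConvNet scoring function f_theta: one convolutional
# layer with ReLU, global average pooling, and a linear read-out. All
# weights here are illustrative, not learned.
FILTERS = [
    np.array([[0., 0., 0.], [-1., 0., 1.], [0., 0., 0.]]),
    np.array([[0., -1., 0.], [0., 0., 0.], [0., 1., 0.]]),
]
TOP_W = np.array([1.0, -0.5])  # illustrative read-out weights

def wrap_convolve(img, f):
    """2-D convolution with periodic boundaries."""
    r = f.shape[0] // 2
    out = np.zeros_like(img)
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            out += f[di + r, dj + r] * np.roll(np.roll(img, di, axis=0), dj, axis=1)
    return out

def convnet_score(img):
    """f_theta(I): conv -> ReLU -> global average pool -> linear read-out."""
    feats = [np.maximum(wrap_convolve(img, f), 0.0) for f in FILTERS]
    pooled = np.array([f.mean() for f in feats])
    return float(TOP_W @ pooled)

def deep_frame_energy(img):
    """Deep FRAME energy E(I) = -f_theta(I), so p(I) is proportional to exp(f_theta(I))."""
    return -convnet_score(img)
```

The point of the sketch is only the interface: sampling and learning proceed exactly as in the original FRAME model, with the ConvNet output replacing the hand-designed potential functions.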

Notes and References

  1. Zhu, Song-Chun; Wu, Ying Nian; Mumford, David (1998). "Filters, Random Fields and Maximum Entropy (FRAME): Towards a Unified Theory for Texture Modeling". International Journal of Computer Vision.
  2. Zhu, Song Chun; Wu, Ying Nian; Mumford, David (November 1997). "Minimax Entropy Principle and Its Application to Texture Modeling". Neural Computation. 9 (8): 1627–1660. doi:10.1162/neco.1997.9.8.1627. S2CID 15926. ISSN 0899-7667.
  3. Wu, Ying Nian; Zhu, Song Chun; Liu, Xiuwen (1999). "Equivalence of Julesz and Gibbs texture ensembles". Proceedings of the Seventh IEEE International Conference on Computer Vision. IEEE. pp. 1025–1032 vol. 2. doi:10.1109/iccv.1999.790382. ISBN 0-7695-0164-8. S2CID 7550898.
  4. Xie, Jianwen; Hu, Wenze; Zhu, Song-Chun; Wu, Ying Nian (2014). "Learning Sparse FRAME Models for Natural Image Patterns". International Journal of Computer Vision. 114 (2–3): 91–112. doi:10.1007/s11263-014-0757-x. CiteSeerX 10.1.1.434.7360. S2CID 8742525. ISSN 0920-5691.
  5. Xie, Jianwen; Lu, Yang; Zhu, Song-Chun; Wu, Ying Nian (July 2016). "Inducing wavelets into random fields via generative boosting". Applied and Computational Harmonic Analysis. 41 (1): 4–25. doi:10.1016/j.acha.2015.08.004. S2CID 521731. ISSN 1063-5203.
  6. Lu, Yang; Zhu, Song-Chun; Wu, Ying Nian (2016). "Learning FRAME Models Using CNN Filters". Proceedings of the AAAI Conference on Artificial Intelligence. 30. doi:10.1609/aaai.v30i1.10238. arXiv:1509.08379. S2CID 2387309.
  7. Xie, Jianwen; Lu, Yang; Zhu, Song-Chun; Wu, Ying Nian (2016). "A theory of generative ConvNet". International Conference on Machine Learning. arXiv:1602.03264.
  8. Xie, Jianwen; Zhu, Song-Chun; Wu, Ying Nian (July 2017). "Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet". 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 1061–1069. doi:10.1109/cvpr.2017.119. ISBN 978-1-5386-0457-1. arXiv:1606.00972. S2CID 763074.
  9. Xie, Jianwen; Zhu, Song-Chun; Wu, Ying Nian (2019). "Learning energy-based spatial-temporal generative ConvNet for dynamic patterns". IEEE Transactions on Pattern Analysis and Machine Intelligence. 43 (2): 516–531. doi:10.1109/TPAMI.2019.2934852. PMID 31425020. arXiv:1909.11975. S2CID 201098397.
  10. Xie, Jianwen; Zheng, Zilong; Gao, Ruiqi; Wang, Wenguan; Zhu, Song-Chun; Wu, Ying Nian (June 2018). "Learning Descriptor Networks for 3D Shape Synthesis and Analysis". 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE. pp. 8629–8638. doi:10.1109/cvpr.2018.00900. ISBN 978-1-5386-6420-9. arXiv:1804.00586. S2CID 4564025.