Bernhard Schölkopf Explained

Bernhard Schölkopf
Work institution: Max Planck Institute for Intelligent Systems
Thesis title: Support Vector Learning
Thesis year: 1997
Doctoral advisors: Stefan Jähnichen, Vladimir Vapnik

Bernhard Schölkopf (born 20 February 1968) is a German computer scientist known for his work in machine learning, especially on kernel methods and causality. He is a director at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he heads the Department of Empirical Inference. He is also an affiliated professor at ETH Zürich, honorary professor at the University of Tübingen and Technische Universität Berlin, and chairman of the European Laboratory for Learning and Intelligent Systems (ELLIS).

Research

Kernel methods

Schölkopf developed SVM methods that achieved world-record performance on the MNIST pattern recognition benchmark at the time.[1] With the introduction of kernel PCA, Schölkopf and coauthors argued that SVMs are a special case of a much larger class of methods: any algorithm that can be expressed in terms of dot products can be generalized to a nonlinear setting by means of reproducing kernels.[2] [3] [4] Another significant observation was that the data on which the kernel is defined need not be vectorial, as long as the kernel Gram matrix is positive semidefinite. Together, these insights founded the field of kernel methods, which encompasses SVMs and many other algorithms. Kernel methods are now textbook knowledge and one of the major machine learning paradigms in research and applications.
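The kernel PCA idea described above, performing PCA in a nonlinear feature space using only the Gram matrix of pairwise kernel evaluations, can be sketched in a few lines. This is a minimal NumPy illustration of the general technique, not code from the cited papers; the RBF kernel and the `gamma` parameter are conventional choices, not specifics from the sources.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    # PCA in feature space via the kernel trick: only dot products
    # (kernel evaluations) between data points are ever computed.
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Center the (implicit) feature vectors by double-centering K.
    J = np.ones((n, n)) / n
    Kc = K - J @ K - K @ J + J @ K @ J
    # Eigendecomposition of the centered Gram matrix (eigh: ascending order).
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Project training points onto the normalized feature-space eigenvectors.
    return Kc @ (vecs / np.sqrt(np.clip(vals, 1e-12, None)))
```

Because the algorithm touches the data only through the Gram matrix, the same code works for any positive semidefinite kernel, including kernels on non-vectorial data such as strings or graphs.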

Building on kernel PCA, Schölkopf extended it to extract invariant features and to design invariant kernels[5] [6] and showed how to view other major dimensionality reduction methods, such as LLE and Isomap, as special cases. In further work with Alex Smola and others, he extended the SVM method to regression and classification with pre-specified sparsity[7] and to quantile/support estimation.[8] He proved a representer theorem implying that SVMs, kernel PCA, and most other kernel algorithms regularized by a norm in a reproducing kernel Hilbert space have solutions that take the form of kernel expansions over the training data, thus reducing an infinite-dimensional optimization problem to a finite-dimensional one. He co-developed kernel embeddings of distributions, methods to represent probability distributions in Hilbert spaces,[9] [10] [11] [12] with links to Fraunhofer diffraction[13] as well as applications to independence testing.[14] [15] [16]
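Kernel embeddings of distributions lead naturally to the maximum mean discrepancy (MMD), the distance between the Hilbert-space mean embeddings of two samples, used as a two-sample test statistic in the work cited above. The following is an illustrative NumPy sketch of the (biased) MMD² estimate, not the reference implementation; kernel choice and `gamma` are assumptions.

```python
import numpy as np

def mmd2_biased(X, Y, gamma=1.0):
    # Biased estimate of squared Maximum Mean Discrepancy:
    # || mean embedding of X - mean embedding of Y ||^2 in the RKHS,
    # computed entirely from pairwise RBF kernel evaluations.
    def k(A, B):
        d2 = (np.sum(A ** 2, 1)[:, None]
              + np.sum(B ** 2, 1)[None, :]
              - 2 * A @ B.T)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()
```

Samples drawn from the same distribution yield values near zero, while samples from different distributions yield larger values, which is what makes the statistic usable for two-sample testing.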

Causality

Starting in 2005, Schölkopf turned his attention to causal inference. Causal mechanisms in the world give rise to statistical dependencies as epiphenomena, but only the latter are exploited by popular machine learning algorithms. Knowledge of causal structures and mechanisms is useful because it lets us predict not only future data from the same source but also the effects of interventions in a system, and because it facilitates the transfer of detected regularities to new situations.[17]

Schölkopf and co-workers addressed (and in certain settings solved) the problem of causal discovery for the two-variable setting[18] [19] [20] [21] [22] and connected causality to Kolmogorov complexity.[23]
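One of the two-variable approaches cited above, the additive noise model of Hoyer et al., infers direction by regressing each variable on the other and preferring the direction in which the residuals are independent of the input. The sketch below illustrates that idea with a biased HSIC dependence estimate and polynomial regression; both are simplifications chosen for brevity, not the nonparametric methods of the papers.

```python
import numpy as np

def hsic(a, b, gamma=0.5):
    # Biased estimator of the Hilbert-Schmidt Independence Criterion:
    # near zero when a and b are (nearly) independent, larger otherwise.
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    def gram(v):
        return np.exp(-gamma * (v[:, None] - v[None, :]) ** 2)
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(a) @ H @ gram(b) @ H) / n ** 2

def anm_direction(x, y, degree=3):
    # Additive-noise-model causal discovery, illustrative version:
    # fit y = f(x) + residual and x = g(y) + residual, then prefer the
    # direction whose residuals are less dependent on the input.
    res_fwd = y - np.polyval(np.polyfit(x, y, degree), x)
    res_bwd = x - np.polyval(np.polyfit(y, x, degree), y)
    return "x->y" if hsic(x, res_fwd) < hsic(y, res_bwd) else "y->x"
```

On data generated as y = x³ + noise, the backward regression cannot make its residuals independent of the input, so the forward direction is identified; for linear-Gaussian data both directions fit equally well, which is why nonlinearity (or non-Gaussianity) is essential to the identifiability results.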

Around 2010, Schölkopf began to explore how to use causality for machine learning, exploiting assumptions of independence of mechanisms and invariance.[24] His early work on causal learning reached a wider machine learning audience through his Posner lecture[25] at NeurIPS 2011 and a keynote talk at ICML 2017.[26] He investigated how to exploit underlying causal structures to make machine learning methods more robust with respect to distribution shifts[17] [27] [28] and systematic errors,[29] the latter leading to the discovery of a number of new exoplanets,[30] including K2-18b, which was subsequently found to contain water vapour in its atmosphere, a first for an exoplanet in the habitable zone.

Education and employment

Schölkopf studied mathematics, physics, and philosophy in Tübingen and London. He was supported by the Studienstiftung and won the Lionel Cooper Memorial Prize for the best M.Sc. in Mathematics at the University of London.[31] He completed a Diplom in Physics, and then moved to Bell Labs in New Jersey, where he worked with Vladimir Vapnik, who became co-adviser of his PhD thesis at TU Berlin (with Stefan Jähnichen). His thesis, defended in 1997, won the annual award of the German Informatics Association.[32] In 2001, following positions in Berlin, Cambridge and New York, he founded the Department for Empirical Inference at the Max Planck Institute for Biological Cybernetics, which grew into a leading center for research in machine learning. In 2011, he became founding director at the Max Planck Institute for Intelligent Systems.[33] [34]

With Alex Smola, Schölkopf co-founded the series of Machine Learning Summer Schools.[35] He also co-founded a Cambridge-Tübingen PhD Programme[36] and the Max Planck-ETH Center for Learning Systems.[37] In 2016, he co-founded the Cyber Valley research consortium.[38] He participated in the IEEE Global Initiative on "Ethically Aligned Design".[39]

Schölkopf is co-editor-in-chief of the Journal of Machine Learning Research, a journal he helped found after taking part in a mass resignation of the editorial board of the journal Machine Learning. He is among the world's most cited computer scientists.[40] Alumni of his lab include Ulrike von Luxburg, Carl Rasmussen, Matthias Hein, Arthur Gretton, Gunnar Rätsch, Matthias Bethge, Stefanie Jegelka, Jason Weston, Olivier Bousquet, Olivier Chapelle, Joaquin Quinonero-Candela, and Sebastian Nowozin.[41]

As of late 2023, Schölkopf is also a scientific advisor to French research group Kyutai which is being funded by Xavier Niel, Rodolphe Saadé, Eric Schmidt, and others.[42]

Awards

Schölkopf’s awards include the Royal Society Milner Award and, shared with Isabelle Guyon and Vladimir Vapnik, the BBVA Foundation Frontiers of Knowledge Award in the Information and Communication Technologies category. He was the first scientist working in Europe to receive this award.[43]

Notes and References

  1. D. Decoste and B. Schölkopf. Training invariant support vector machines. Machine Learning, 46(1):161–190, 2002
  2. B. Schölkopf. Support Vector Learning. Oldenbourg, München, 1997. ISBN 978-3-486-24632-2
  3. B. Schölkopf, A. J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998
  4. C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998
  5. B. Schölkopf, P. Simard, A. J. Smola, and V. Vapnik. Prior knowledge in support vector kernels. In M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural Information Processing Systems 10, pages 640–646, Cambridge, MA, USA, 1998. MIT Press
  6. O. Chapelle and B. Schölkopf. Incorporating invariances in nonlinear SVMs. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 609–616, Cambridge, MA, USA, 2002. MIT Press
  7. B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12(5):1207–1245, 2000
  8. B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001
  9. A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola. A kernel method for the two-sample-problem. Advances in Neural Information Processing Systems 19, pages 513–520, 2007
  10. A. J. Smola, A. Gretton, L. Song, and B. Schölkopf. A Hilbert space embedding for distributions. Algorithmic Learning Theory: 18th International Conference, pages 13–31, 2007
  11. B. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517–1561, 2010
  12. A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723–773, 2012
  13. S. Harmeling, M. Hirsch, and B. Schölkopf. On a link between kernel mean maps and Fraunhofer diffraction, with an application to super-resolution beyond the diffraction limit. In Computer Vision and Pattern Recognition (CVPR), pages 1083–1090. IEEE, 2013
  14. A. Gretton, R. Herbrich, A. J. Smola, O. Bousquet, and B. Schölkopf. Kernel methods for measuring independence. Journal of Machine Learning Research, 6:2075–2129, 2005
  15. A. Gretton, O. Bousquet, A. J. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. Algorithmic Learning Theory: 16th International Conference, 2005
  16. A. Gretton, K. Fukumizu, C.H. Teo, L. Song, B. Schölkopf and A. J. Smola. A Kernel Statistical Test of Independence. Advances in Neural Information Processing Systems 20, 2007
  17. B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal learning. In J. Langford and J. Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255–1262, New York, NY, USA, 2012. Omnipress
  18. P. O. Hoyer, D. Janzing, J. M. Mooij, J. Peters, and B. Schölkopf. Nonlinear causal discovery with additive noise models. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 689–696, Red Hook, NY, USA, 2009. Curran
  19. D. Janzing, P. Hoyer, and B. Schölkopf. Telling cause from effect based on high-dimensional observations. In J. Fürnkranz and T. Joachims, editors, Proceedings of the 27th International Conference on Machine Learning, pages 479–486, Madison, WI, USA, 2010. International Machine Learning Society
  20. J.M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Schölkopf. Distinguishing cause from effect using observational data: methods and benchmarks. Journal of Machine Learning Research, 17(32):1–102, 2016
  21. J. Peters, JM. Mooij, D. Janzing, and B. Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15:2009–2053, 2014
  22. P. Daniusis, D. Janzing, J. Mooij, J. Zscheischler, B. Steudel, K. Zhang, and B. Schölkopf. Inferring deterministic causal relations. In P. Grünwald and P. Spirtes, editors, 26th Conference on Uncertainty in Artificial Intelligence, pages 143–150, Corvallis, OR, 2010. AUAI Press. Best student paper award
  23. D. Janzing and B. Schölkopf. Causal inference using the algorithmic Markov condition. IEEE Transactions on Information Theory, 56(10):5168–5194, 2010
  24. B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, and K. Zhang. On causal and anticausal learning. International Conference on Machine Learning, 2012
  25. From kernels to causal inference. videolectures.net
  26. Causal learning – Bernhard Schölkopf. Vimeo, 15 October 2017
  27. K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional shift. In S. Dasgupta and D. McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of JMLR Workshop and Conference Proceedings, pages 819–827, 2013
  28. B. Schölkopf. Learning to see and act. Nature, 518(7540):486–487, 2015
  29. B. Schölkopf, D. W. Hogg, D. Wang, D. Foreman-Mackey, D. Janzing, C.-J. Simon-Gabriel, and J. Peters. Modeling confounding by half-sibling regression. Proceedings of the National Academy of Sciences, 113(27):7391–7398, 2016
  30. D. Foreman-Mackey, B. T. Montet, D. W. Hogg, T. D. Morton, D. Wang, and B. Schölkopf. A systematic search for transiting planets in the K2 data. The Astrophysical Journal, 806(2), 2015
  31. Curriculum vitae Prof. Dr. Bernhard Schölkopf. Leopoldina (in German)
  32. TU Berlin – Medieninformation Nr. 209, 17 September 1998. archiv.pressestelle.tu-berlin.de
  33. History of the Institute. www.kyb.tuebingen.mpg.de
  34. Prescriptions for the medicine of tomorrow. The Science Magazine of the Max Planck Society, 2011
  35. Machine Learning Summer Schools – MLSS. mlss.cc
  36. Cambridge Machine Learning Group
  37. Max Planck ETH Center for Learning Systems
  38. Service. Baden-Württemberg.de, 15 December 2016
  39. Ethically Aligned Design. IEEE, 13 December 2016
  40. World's Top Computer Scientists: H-Index Computer Science Ranking. www.guide2research.com
  41. Alumni. people.tuebingen.mpg.de
  42. R. Dillet. Kyutai is a French AI research lab with a $330 million budget that will make everything open source. TechCrunch, 17 November 2023
  43. Bernhard Schölkopf receives Frontiers of Knowledge Award. Max Planck Institute for Intelligent Systems