Philipp Koehn Explained

Philipp Koehn
Birth Date:1 August 1971
Birth Place:Erlangen, Bavaria, West Germany
Citizenship:Germany
Field:computer science, natural language processing, machine translation, cross-language information retrieval
Work Institution:University of Edinburgh, Johns Hopkins University
Alma Mater:Albert Schweitzer High School (Erlangen), University of Erlangen-Nuremberg, University of Tennessee, University of Southern California
Doctoral Advisor:Kevin Knight
Known For:Europarl corpus, Moses

Philipp Koehn (born 1 August 1971 in Erlangen, West Germany) is a computer scientist and researcher in the field of machine translation.[1] [2] His primary research interest is statistical machine translation, and he is one of the inventors of phrase-based machine translation. This branch of statistical machine translation uses sequences of words (so-called "phrases") as the basic units of translation, extending the earlier word-based approaches. His 2003 paper with Franz Josef Och and Daniel Marcu, "Statistical Phrase-Based Translation", attracted wide attention in the machine translation community and has been cited over a thousand times.[3] Phrase-based methods are widely used in industrial machine translation applications.
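The idea of translating phrases rather than single words can be illustrated with a small sketch. The phrase table below is invented for illustration, and the greedy longest-match lookup is a deliberate simplification: it is not the model from the 2003 paper, which scores competing segmentations probabilistically and allows phrases to be reordered. The function name `translate` and the parameter `max_len` are hypothetical.

```python
# Toy phrase table: source phrase (tuple of words) -> target phrase.
# All entries are invented for illustration.
PHRASE_TABLE = {
    ("das", "haus"): "the house",
    ("das",): "the",
    ("haus",): "house",
    ("ist",): "is",
    ("klein",): "small",
    ("natürlich",): "of course",
}


def translate(sentence, table=PHRASE_TABLE, max_len=3):
    """Translate left to right, always taking the longest known phrase."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in table:
                output.append(table[phrase])
                i += n
                break
        else:
            # No phrase matched: pass the unknown word through unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate("das Haus ist klein"))
```

Here "das Haus" is matched as a single two-word unit, and "natürlich" maps to the two-word phrase "of course", a correspondence that a strictly word-for-word table cannot express directly.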

Philipp Koehn received his PhD in computer science in 2003 from the University of Southern California, where he worked at the Information Sciences Institute advised by Kevin Knight. After a year as a postdoctoral fellow under Michael Collins at the Massachusetts Institute of Technology, he joined the University of Edinburgh as a lecturer in the School of Informatics in 2005. He was appointed reader in 2010 and professor in 2012. In 2014, he was appointed professor at the computer science department of The Johns Hopkins University, where he is affiliated with the Center for Language and Speech Processing.

Moses statistical machine translation decoder

The Moses machine translation decoder is an open-source project that was created by, and is maintained under the guidance of, Philipp Koehn.[4] It is a platform for developing statistical machine translation systems for any language pair, given a parallel corpus.[5] The decoder was developed mainly by Hieu Hoang and Philipp Koehn at the University of Edinburgh, extended during a Johns Hopkins University Summer Workshop, and developed further under Euromatrix and GALE project funding. The decoder, which is part of a complete statistical machine translation toolkit, is the de facto benchmark for research in the field.

Koehn continues to play a major role in the development of Moses. Development of the decoder was also supported by the European Framework 6 projects Euromatrix and TC-Star, the European Framework 7 projects EuroMatrixPlus, Let's MT, META-NET and MosesCore, and the DARPA GALE project, as well as by several institutions, including the University of Edinburgh, the University of Maryland, ITC-irst, and the Massachusetts Institute of Technology. Other substantial contributors to the Moses decoder include Hieu Hoang, Chris Dyer, Josh Schroeder, Marcello Federico, Richard Zens, and Wade Shen.

Europarl corpus

The Europarl corpus is a collection of the proceedings of the European Parliament from 1996 to the present. It has been compiled and expanded by a group of researchers led by Philipp Koehn at the University of Edinburgh. The data was extracted from the website of the European Parliament and then prepared for linguistic research. The latest release (2012) comprised up to 60 million words per language,[6] with 21 European languages represented: Romance (French, Italian, Spanish, Portuguese, Romanian), Germanic (English, Dutch, German, Danish, Swedish), Slavic (Bulgarian, Czech, Polish, Slovak, Slovene), Finno-Ugric (Finnish, Hungarian, Estonian), Baltic (Latvian, Lithuanian), and Greek.
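Parallel corpora of this kind are often distributed as pairs of plain-text files in which line n of one file is the translation of line n of the other. The sketch below reads such a pair, assuming that line-aligned layout and UTF-8 encoding; the function name `read_parallel` and the file names in the usage comment are hypothetical and do not refer to actual release files.

```python
from itertools import islice


def read_parallel(src_path, tgt_path, limit=None):
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumes one sentence per line, with line n of src_path aligned to
    line n of tgt_path. At most `limit` line pairs are read if given.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(tgt_path, encoding="utf-8") as tgt:
        for s, t in islice(zip(src, tgt), limit):
            s, t = s.strip(), t.strip()
            if s and t:  # skip pairs where either side is empty
                yield s, t


# Hypothetical usage:
# for de, en in read_parallel("corpus.de", "corpus.en", limit=5):
#     print(de, "|||", en)
```

Because `zip` stops at the shorter input, files of unequal length are silently truncated; a stricter reader would check that both files have the same number of lines.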

Other interests and activities in chronological order

Awards and recognition

Notes and References

  1. http://www.translationautomation.com/technology/interview-with-philipp-koehn.html Interview with Philipp Koehn | Technology | TAUS – Enabling better translation
  2. http://www.tausdata.org/index.php/about-tda/technology-advisory Technology Advisory
  3. https://scholar.google.com.tr/scholar?q=%22philipp+koehn%22&hl=tr&btnG=Ara&lr= "philipp koehn" – Google Scholar
  4. http://www.statmt.org/moses/manual/manual.pdf Moses Manual
  5. http://mloss.org/software/author/philipp-koehn/ mloss | Projects authored by philipp koehn
  6. http://www.statmt.org/europarl/ Europarl Home Page
  7. http://www.statmt.org/ued/ SMT Group Edinburgh – Main/HomePage
  8. http://homepages.inf.ed.ac.uk/pkoehn/resume.html Philipp Koehn's online resume
  9. http://www.systransoft.com/download/press-releases/systran-pr-csli-acquisition-20140425.pdf Press Release – CLSI acquires a controlling stake in SYSTRAN
  10. https://web.archive.org/web/20150204230237/http://www.clsi.co.kr/ CLSI website (archived 4 February 2015; retrieved 6 May 2020)
  11. http://homepages.inf.ed.ac.uk/pkoehn/resume.html Philipp Koehn's online resume
  12. https://omniscien.com/about-us/company/#PhilippKoehn Omniscien Technologies – About Us/Company
  13. https://web.archive.org/web/20110526002550/http://applij.oxfordjournals.org/content/early/2011/04/27/applin.amr017.extract Philipp Koehn: Statistical Machine Translation
  14. doi:10.1007/s10590-010-9083-4 Felipe Sánchez-Martínez and Juan Antonio Pérez-Ortiz (2010), "Philipp Koehn, Statistical machine translation", Machine Translation 24 (3–4): 273–278
  15. http://www.mt-archive.info/Koehn-2009-TOC.htm Statistical machine translation – contents
  16. http://www.statmt.org/nmt-book/ Philipp Koehn: Neural Machine Translation
  17. http://www.epo.org/learning-events/european-inventor/finalists/2013/koehn.html EPO: Found in translation: a present-day Rosetta Stone
  18. https://web.archive.org/web/20100624162302/http://www.eamt.org/iamt.php International Association for Machine Translation (archived 24 June 2010; retrieved 8 September 2016)