CMU Pronouncing Dictionary explained

CMU Pronouncing Dictionary
Developer: Carnegie Mellon University
Latest release version: 0.7b
Language: English
License: BSD (2-clause)

The CMU Pronouncing Dictionary (also known as CMUdict) is an open-source pronouncing dictionary originally created by the Speech Group at Carnegie Mellon University (CMU) for use in speech recognition research.

CMUdict provides an orthographic-to-phonetic mapping for English words in their North American pronunciations. It is commonly used to generate pronunciation representations for automatic speech recognition (ASR), e.g. the CMU Sphinx system, and for speech synthesis (TTS), e.g. the Festival system. CMUdict can also serve as a training corpus for statistical grapheme-to-phoneme (g2p) models[1], which generate pronunciations for words not yet included in the dictionary.

The most recent numbered release is 0.7b (2014), which contains over 134,000 entries; later revisions have been published unversioned on GitHub. An interactive lookup version is available.[2]

Database format

The database is distributed as a plain text file with one entry per line, in the format "WORD  <pronunciation>", with a two-space separator between the word and its pronunciation. If a word has multiple pronunciations, the variants are distinguished by numbered suffixes (e.g. WORD(1)). Pronunciations are encoded in a modified form of the ARPABET system, with a stress mark (0, 1, or 2) appended to each vowel. A line beginning with ;;; is a comment. A derived format, directly suitable for speech recognition engines, is also included in the distribution; it collapses stress distinctions, which are typically not used in ASR.
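
A minimal sketch of reading this format in Python, assuming the 0.7b layout described above (uppercase headwords, a two-space separator, ;;; comment lines and "(n)" variant suffixes). The function names are hypothetical, the sample lines are illustrative rather than quoted from the dictionary, and opening the file itself (including choosing its text encoding) is left to the caller. strip_stress only imitates the stress-collapsing of the derived ASR format; it is not the distribution's own tooling.

```python
import re

# A variant suffix such as "(1)" at the end of a headword.
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def parse_line(line):
    """Return (word, phones) for an entry line, or None for comments/blanks."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith(";;;"):
        return None
    # Word and pronunciation are separated by two spaces.
    word, pronunciation = line.split("  ", 1)
    word = VARIANT_SUFFIX.sub("", word)  # "WORD(1)" -> "WORD"
    return word, pronunciation.split()


def strip_stress(phones):
    """Drop vowel stress digits (e.g. "EY1" -> "EY"); consonants are unchanged."""
    return [phone.rstrip("012") for phone in phones]


def load(lines):
    """Map each headword to its pronunciation variants, in file order."""
    entries = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            word, phones = parsed
            entries.setdefault(word, []).append(phones)
    return entries


# Illustrative lines in the documented format (not quoted from CMUdict).
sample = [
    ";;; a comment line",
    "TOMATO  T AH0 M EY1 T OW2",
    "TOMATO(1)  T AH0 M AA1 T OW2",
]
entries = load(sample)
print(entries["TOMATO"])
print(strip_stress(entries["TOMATO"][0]))  # ['T', 'AH', 'M', 'EY', 'T', 'OW']
```

Keeping every variant in a per-word list preserves the order in which the variants appear in the file, so the unsuffixed entry comes first when it precedes its numbered alternates.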

The following tables list the phonemes used by the CMU Pronouncing Dictionary.[2]

Vowels

ARPABET  Rspl.    IPA       Example
AA       ah       /ɑ/       odd
AE       a        /æ/       at
AH0      ə        /ə/       about
AH       uh       /ʌ/       hut
AO       aw       /ɔ/       ought, story
AW       ow       /aʊ/      cow
AY       eye      /aɪ/      hide
EH       eh       /ɛ/       Ed
ER       ur, ər   /ɝ/, /ɚ/  hurt
EY       ay       /eɪ/      ate
IH       i, ih    /ɪ/       it
IY       ee       /i/       eat
OW       oh       /oʊ/      oat
OY       oy       /ɔɪ/      toy
UH       uu       /ʊ/       hood
UW       oo       /u/       two

Stress

ARPABET  Description
0        No stress
1        Primary stress
2        Secondary stress

Consonants

ARPABET  Rspl.    IPA    Example
B        b        /b/    be
CH       ch, tch  /tʃ/   cheese
D        d        /d/    dee
DH       dh       /ð/    thee
F        f        /f/    fee
G        g        /ɡ/    green
HH       h        /h/    he
JH       j        /dʒ/   gee
K        k        /k/    key
L        l        /l/    lee
M        m        /m/    me
N        n        /n/    knee
NG       ng       /ŋ/    ping
P        p        /p/    pee
R        r        /r/    read
S        s, ss    /s/    sea
SH       sh       /ʃ/    she
T        t        /t/    tea
TH       th       /θ/    theta
V        v        /v/    vee
W        w, wh    /w/    we
Y        y        /j/    yield
Z        z        /z/    zee
ZH       zh       /ʒ/    seizure
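
To show how the symbol and stress tables fit together, the following Python sketch converts an ARPABET pronunciation into a broad IPA transcription. The mapping is transcribed from the phoneme tables; the function names are hypothetical, no IPA stress marks are emitted, and treating ER0 as /ɚ/ (rather than /ɝ/) is an assumption, since the table lists both without tying them to stress levels. It is a simplification, not an official CMUdict conversion.

```python
# ARPABET -> IPA, following the phoneme tables. Stress digits are read only
# to pick the unstressed AH/ER variants; no IPA stress marks are emitted.
ARPABET_TO_IPA = {
    # Vowels (any stress level, except the unstressed forms below).
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "EH": "ɛ", "ER": "ɝ", "EY": "eɪ", "IH": "ɪ", "IY": "i", "OW": "oʊ",
    "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
    # Consonants.
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "ɡ",
    "HH": "h", "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "r", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}

# Unstressed forms. AH0 -> /ə/ is in the table; ER0 -> /ɚ/ is an assumption
# based on the /ɝ/, /ɚ/ pair listed for ER.
UNSTRESSED = {"AH": "ə", "ER": "ɚ"}


def split_stress(phone):
    """Split "EY1" into ("EY", "1"); consonants get an empty stress string."""
    base = phone.rstrip("012")
    return base, phone[len(base):]


def to_ipa(phones):
    """Convert a list of ARPABET symbols into a broad IPA string."""
    symbols = []
    for phone in phones:
        base, stress = split_stress(phone)
        if stress == "0" and base in UNSTRESSED:
            symbols.append(UNSTRESSED[base])
        else:
            symbols.append(ARPABET_TO_IPA[base])  # KeyError on unknown symbols
    return "".join(symbols)


print(to_ipa(["T", "AH0", "M", "EY1", "T", "OW2"]))  # təmeɪtoʊ
print(to_ipa(["HH", "ER1", "T"]))  # hɝt
```

Unknown symbols raise KeyError instead of passing through silently, which makes malformed input easy to spot.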

History

Version               Release date[3]       License
0.1                   16 September 1993     Public domain
0.2                   10 March 1994         Public domain
0.3                   28 September 1994     Public domain
0.4                   8 November 1995       Public domain
0.5                   No public release     Public domain
0.6                   11 August 1998        Public domain
0.7                   No public release     Public domain
0.7a                  18 February 2008      2-clause BSD
0.7b                  19 November 2014[4]   2-clause BSD
GitHub (unversioned)  26 May 2021           2-clause BSD

Applications

References

  1. Web site: Sequitur G2P - A trainable Grapheme-to-Phoneme converter.
  2. Web site: The CMU Pronouncing Dictionary. CMU Pronouncing Dictionary. 2015-07-16. Retrieved 2022-06-04. Archived 2022-06-03 at https://web.archive.org/web/20220603181334/http://www.speech.cs.cmu.edu/cgi-bin/cmudict.
  3. ftp://ftp.cs.cmu.edu/project/speech/dict/
  4. Web site: CMUdict. svn.code.sf.net.
  5. Web site: Cmusphinx - Revision 10973: /Trunk/Logios. 2009-12-19. Archived 2011-05-20 at https://web.archive.org/web/20110520085139/https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/logios/ (original link dead).

External links