ARPABET explained

ARPABET (also spelled ARPAbet) is a set of phonetic transcription codes developed by the Advanced Research Projects Agency (ARPA) as part of its Speech Understanding Research project in the 1970s. It represents phonemes and allophones of General American English with distinct sequences of ASCII characters. Two systems were devised: one represents each segment with a single character (alternating upper- and lower-case letters), the other with one or two case-insensitive characters. The latter was far more widely adopted.[1]

ARPABET has been used in several speech synthesizers, including Computalker for the S-100 system, SAM for the Commodore 64, SAY for the Amiga, TextAssist for the PC, and Speakeasy from Intelligent Artefacts, which used the Votrax SC-01 speech synthesizer IC. It is also used in the CMU Pronouncing Dictionary. A revised version of ARPABET is used in the TIMIT corpus.

Symbols

Stress is indicated by a digit immediately following a vowel. Auxiliary symbols are identical in 1- and 2-letter codes. In 2-letter notation, segments are separated by a space.
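
The 2-letter conventions just described (space-separated segments, an optional stress digit directly after a vowel, case-insensitive codes) can be sketched as a small parser. This is a minimal illustration, not a reference implementation: the function name `parse_arpabet` and the sample string are assumptions made for this example, and the sketch does not check that a code is a valid ARPABET symbol or that a digit follows a vowel rather than a consonant.

```python
import re

# A segment is a run of letters, optionally followed by one stress digit.
_SEGMENT = re.compile(r"^([A-Za-z]+)(\d)?$")


def parse_arpabet(transcription):
    """Split a space-separated 2-letter ARPABET string into (code, stress) pairs.

    Stress is None when the segment carries no digit.
    """
    segments = []
    for token in transcription.split():
        match = _SEGMENT.match(token)
        if match is None:
            raise ValueError(f"not an ARPABET segment: {token!r}")
        code, digit = match.groups()
        # 2-letter codes are case-insensitive, so normalize to upper case.
        segments.append((code.upper(), int(digit) if digit else None))
    return segments


print(parse_arpabet("HH AH0 L OW1"))
# [('HH', None), ('AH', 0), ('L', None), ('OW', 1)]
```

Keeping the stress digit as a separate field makes it easy to compare segments while ignoring stress, or to find the primary-stressed vowel of a word.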

| 1-letter | 2-letter | IPA | Example(s) |
|---|---|---|---|
| @ | AA | ɑ~ɒ | balm, bot (with father–bother merger) |
| a | AE | æ | bat |
| A | AH | ʌ | butt |
| c | AO | ɔ | caught, story |
| W | AW | aʊ | bout |
| x | AX | ə | comma |
| | AXR[2] | ɚ | letter, forward |
| Y | AY | aɪ | bite |
| E | EH | ɛ | bet |
| R | ER | ɝ | bird, foreword |
| e | EY | eɪ | bait |
| I | IH | ɪ | bit |
| X | IX | ɨ | roses, rabbit |
| i | IY | i | beat |
| o | OW | oʊ | boat |
| O | OY | ɔɪ | boy |
| U | UH | ʊ | book |
| u | UW | u | boot |
| | UX | ʉ | dude |

| 1-letter | 2-letter | IPA | Example |
|---|---|---|---|
| b | B | b | buy |
| C | CH | tʃ | China |
| d | D | d | die |
| D | DH | ð | thy |
| F | DX | ɾ | butter |
| L | EL | l̩ | bottle |
| M | EM | m̩ | rhythm |
| N | EN | n̩ | button |
| f | F | f | fight |
| g | G | ɡ | guy |
| h | HH or H | h | high |
| J | JH | dʒ | jive |
| k | K | k | kite |
| l | L | l | lie |
| m | M | m | my |
| n | N | n | nigh |
| G | NX or NG | ŋ | sing |
| | NX | ɾ̃ | winner |
| p | P | p | pie |
| Q | Q | ʔ | uh-oh |
| r | R | ɹ | rye |
| s | S | s | sigh |
| S | SH | ʃ | shy |
| t | T | t | tie |
| T | TH | θ | thigh |
| v | V | v | vie |
| w | W | w | wise |
| H | WH | ʍ | why (without wine–whine merger) |
| y | Y | j | yacht |
| z | Z | z | zoo |
| Z | ZH | ʒ | pleasure |

Stress and auxiliary symbols

| Symbol | Description |
|---|---|
| 0 | No stress |
| 1 | Primary stress |
| 2 | Secondary stress |
| 3... | Tertiary and further stress |
| - | Silence |
| ! | Non-speech segment |
| + | Morpheme boundary |
| / | Word boundary |
| # | Utterance boundary |
| : | Tone group boundary |
| :1 or . | Falling or declining juncture |
| :2 or ? | Rising or internal juncture |
| :3 or . | Fall-rise or non-terminal juncture |
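
Because every 1-letter code has a 2-letter counterpart, converting between the two notations is a table lookup. The sketch below copies the pairs from the symbol tables in this section. The names `ONE_TO_TWO` and `to_two_letter` are hypothetical, and it assumes that 1-letter strings are written without separators and that a stress digit attaches to the segment before it. Where the tables give two 2-letter spellings, the sketch picks one: `HH` for `h`, and `NG` for `G` so that it cannot be confused with the `NX` of "winner".

```python
# 1-letter -> 2-letter codes, copied from the symbol tables in this section.
ONE_TO_TWO = {
    # Vowels
    "@": "AA", "a": "AE", "A": "AH", "c": "AO", "W": "AW", "x": "AX",
    "Y": "AY", "E": "EH", "R": "ER", "e": "EY", "I": "IH", "X": "IX",
    "i": "IY", "o": "OW", "O": "OY", "U": "UH", "u": "UW",
    # Consonants
    "b": "B", "C": "CH", "d": "D", "D": "DH", "F": "DX", "L": "EL",
    "M": "EM", "N": "EN", "f": "F", "g": "G", "h": "HH", "J": "JH",
    "k": "K", "l": "L", "m": "M", "n": "N", "G": "NG", "p": "P",
    "Q": "Q", "r": "R", "s": "S", "S": "SH", "t": "T", "T": "TH",
    "v": "V", "w": "W", "H": "WH", "y": "Y", "z": "Z", "Z": "ZH",
}


def to_two_letter(one_letter):
    """Convert a 1-letter ARPABET string to space-separated 2-letter codes."""
    segments = []
    for ch in one_letter:
        if ch in "0123456789":
            # Assumption: a stress digit belongs to the preceding segment.
            if not segments:
                raise ValueError("stress digit with no preceding segment")
            segments[-1] += ch
        elif ch in ONE_TO_TWO:
            segments.append(ONE_TO_TWO[ch])
        else:
            raise ValueError(f"unknown 1-letter code: {ch!r}")
    return " ".join(segments)


print(to_two_letter("ba1t"))  # B AE1 T
print(to_two_letter("SY"))    # SH AY
```

The lookup is case-sensitive on purpose: in the 1-letter system `s` and `S`, or `a` and `A`, are different segments, which is the reason the 2-letter system could afford to be case-insensitive.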

TIMIT

In TIMIT, the following symbols are used in addition to the ones listed above:[3]

| Symbol | IPA | Example | Description |
|---|---|---|---|
| AX-H | ə̥ | suspect | Devoiced /ə/ |
| BCL | b̚ | obtain | [b] closure |
| DCL | d̚ | width | [d] closure |
| ENG | ŋ̍ | Washington | Syllabic [ŋ] |
| GCL | ɡ̚ | dogtooth | [ɡ] closure |
| HV | ɦ | ahead | Voiced /h/ |
| KCL | k̚ | doctor | [k] closure |
| PCL | p̚ | accept | [p] closure |
| TCL | t̚ | catnip | [t] closure |
| PAU | | | Pause |
| EPI | | | Epenthetic silence |
| H# | | | Begin/end marker |
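
The TIMIT additions fall into a few groups: stop closures (the symbols ending in CL), non-speech markers (PAU, EPI, H#), and extra allophones. As a rough illustration of how those groups might be handled in code, the sketch below maps each closure to its stop and filters closures and markers out of a label sequence. The function names and the sample sequence are hypothetical, and dropping these labels is only one possible simplification, not a procedure defined by TIMIT.

```python
# Closure symbol -> the stop it belongs to, from the TIMIT table above.
TIMIT_CLOSURES = {
    "BCL": "B", "DCL": "D", "GCL": "G",
    "KCL": "K", "PCL": "P", "TCL": "T",
}

# Labels that mark pauses or boundaries rather than speech sounds.
TIMIT_MARKERS = {"PAU", "EPI", "H#"}


def closure_stop(symbol):
    """Return the stop a closure symbol belongs to, or None if it is not a closure."""
    return TIMIT_CLOSURES.get(symbol.upper())


def drop_closures_and_markers(labels):
    """Keep only labels that are neither stop closures nor non-speech markers."""
    kept = []
    for label in labels:
        symbol = label.upper()
        if symbol in TIMIT_CLOSURES or symbol in TIMIT_MARKERS:
            continue
        kept.append(symbol)
    return kept


# Hypothetical label sequence, made up for this example.
print(drop_closures_and_markers(["H#", "KCL", "K", "AE", "TCL", "T", "H#"]))
# ['K', 'AE', 'T']
```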

Notes and References

  1. Klautau, Aldebaro (2001). ARPABET and the TIMIT alphabet. Archived June 3, 2016 at https://web.archive.org/web/20160603180727/http://www.laps.ufpa.br/aldebaro/papers/ak_arpabet01.pdf. Retrieved September 8, 2017.
  2. Jurafsky, Daniel; Martin, James H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall. pp. 94–5. ISBN 0-1309-5069-6.
  3. Linguistic Data Consortium (October 12, 1990). Table of all the phonemic and phonetic symbols used in the TIMIT lexicon. Retrieved September 8, 2017.