Arabic Speech Corpus

The Arabic Speech Corpus is a Modern Standard Arabic (MSA) speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions of more than 3.7 hours of MSA speech, aligned with the recorded speech at the phoneme level. The annotations include word stress marks on the individual phonemes.[1]
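To make the idea of phoneme-level alignment with stress marks concrete, the following is a minimal sketch of how one aligned utterance might be represented in memory. The `PhonemeSegment` class, its field names, the helper functions, and the sample values are all hypothetical illustrations; they are not the corpus's actual file format or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonemeSegment:
    """One aligned phoneme (hypothetical structure, not the corpus's file format)."""
    label: str       # phoneme symbol; the symbols used below are made up
    start: float     # segment start time in seconds
    end: float       # segment end time in seconds
    stressed: bool   # True if the phoneme carries a word stress mark

    @property
    def duration(self) -> float:
        """Length of the segment in seconds."""
        return self.end - self.start


def stressed_labels(segments):
    """Return the labels of all phonemes marked as stressed."""
    return [s.label for s in segments if s.stressed]


def total_duration(segments):
    """Sum the durations of all segments, in seconds."""
    return sum(s.duration for s in segments)


# Illustrative data only; these values are not taken from the corpus.
example = [
    PhonemeSegment("k", 0.00, 0.08, False),
    PhonemeSegment("a", 0.08, 0.20, True),
    PhonemeSegment("t", 0.20, 0.27, False),
]
```

Under these assumptions, `stressed_labels(example)` picks out the phonemes that carry a stress mark, and `total_duration(example)` gives the aligned speech time covered by the segments.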

The Arabic Speech Corpus was built as part of a doctoral project by Nawar Halabi at the University of Southampton. The project was funded by MicroLinkPC, which owns an exclusive license to commercialise the corpus. The corpus is nevertheless available for strictly non-commercial purposes through the official Arabic Speech Corpus website, where it is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Purpose

The corpus was built mainly for speech synthesis, and it has also been used to build HMM-based voices in Arabic. In addition, it has been used to automatically align other speech corpora with their phonetic transcripts, and it could form part of a larger corpus for training speech recognition systems.[2]

Contents

The package contains the following:

The corpus was also used to prove that using automatically extracted, orthography-based stress marks[3] improves the quality of speech synthesis in MSA.

Notes and References

  1. Halabi, Nawar (2016). Modern Standard Arabic Phonetics for Speech Synthesis (PhD thesis). University of Southampton, School of Electronics and Computer Science.
  2. Halabi, Nawar (2016). Modern Standard Arabic Phonetics for Speech Synthesis (PhD thesis). University of Southampton, School of Electronics and Computer Science.
  3. Halpern, Jack (2009). Word Stress and Vowel Neutralization in Modern Standard Arabic. 2nd International Conference on Arabic Language Resources and Tools. Cairo.