Pronunciation assessment explained

Automatic pronunciation assessment is the use of speech recognition to verify the correctness of pronounced speech,[1] as distinguished from manual assessment by an instructor or proctor.[2] It is also called speech verification, pronunciation evaluation, and pronunciation scoring. Its main application is computer-aided pronunciation teaching (CAPT), combined with computer-aided instruction for computer-assisted language learning (CALL), speech remediation, or accent reduction. Pronunciation assessment does not determine unknown speech (as in dictation or automatic transcription). Instead, knowing the expected word(s) in advance, it attempts to verify the correctness of the learner's pronunciation and ideally their intelligibility to listeners,[3] sometimes along with prosodic features, which are often inconsequential, such as intonation, pitch, tempo, rhythm, and syllable and word stress.[4] Pronunciation assessment is also used in reading tutoring, for example in products such as Microsoft Teams[5] and those from Amira Learning.[6] Automatic pronunciation assessment can also be used to help diagnose and treat speech disorders such as apraxia.[7]
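The distinction between verification and transcription can be illustrated with a minimal sketch. It assumes that a phoneme recognizer (not shown) has already produced a sequence of phoneme labels for the learner's utterance, and it compares that sequence with the pronunciation expected for the known target word using edit distance. The function names, the ARPAbet-style labels, and the scoring formula are illustrative assumptions, not the method of any product or paper cited here, and a score of this kind measures formal correctness rather than listener intelligibility.

```python
from typing import Sequence


def edit_distance(expected: Sequence[str], observed: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(observed) + 1))
    for i, exp_ph in enumerate(expected, start=1):
        curr = [i]
        for j, obs_ph in enumerate(observed, start=1):
            cost = 0 if exp_ph == obs_ph else 1
            curr.append(min(prev[j] + 1,          # expected phoneme dropped
                            curr[j - 1] + 1,      # extra phoneme inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def pronunciation_score(expected: Sequence[str], observed: Sequence[str]) -> float:
    """Hypothetical score in [0, 1]: 1.0 means every expected phoneme matched."""
    if not expected:
        raise ValueError("expected pronunciation must not be empty")
    errors = edit_distance(expected, observed)
    return max(0.0, 1.0 - errors / len(expected))


if __name__ == "__main__":
    # The target word is known in advance; only the learner's phonemes vary.
    expected = ["F", "AO", "R"]   # assumed dictionary entry for "four"
    observed = ["F", "OW", "R"]   # assumed recognizer output for the learner
    print(pronunciation_score(expected, observed))
```

Because the expected sequence is fixed beforehand, the comparison only has to decide how far the learner's production deviates from it, which is a much narrower task than open-vocabulary transcription.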

The earliest work on pronunciation assessment avoided measuring genuine listener intelligibility,[8] a shortcoming corrected in 2011 at the Toyohashi University of Technology. Intelligibility measurement has since been included in the Versant high-stakes English fluency assessment from Pearson[9] and in mobile apps from 17zuoye Education & Technology, but as of 2023 it was still missing from products from Google Search,[10] Microsoft,[11] Educational Testing Service,[12] Speechace, and ELSA.[13] Assessing authentic listener intelligibility is essential for avoiding inaccuracies from accent bias, especially in high-stakes assessments;[14][15][16] from words with multiple correct pronunciations;[17] and from phoneme coding errors in machine-readable pronunciation dictionaries.[18] In 2022, researchers found that some newer speech-to-text systems, based on end-to-end reinforcement learning to map audio signals directly into words, produce word and phrase confidence scores closely correlated with genuine listener intelligibility.[19] In the Common European Framework of Reference for Languages (CEFR) assessment criteria for "overall phonological control", intelligibility outweighs formally correct pronunciation at all levels.[20]

Although there are as yet no industry-standard benchmarks for evaluating pronunciation assessment accuracy, researchers occasionally release evaluation speech corpora for others to use in improving assessment quality.[21][22] Such evaluation databases often emphasize formally unaccented pronunciation to the exclusion of genuine intelligibility as evidenced by blinded listener transcriptions. Promising areas of improvement under development in 2024 include articulatory feature extraction[23][24][25] and transfer learning to suppress unnecessary corrections.[26] Other advances under development include "augmented reality" interfaces for mobile devices that use optical character recognition to provide pronunciation training on text found in the user's environment.[27][28]
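One simple way to quantify intelligibility from blinded listener transcriptions is the fraction of listeners who, without knowing the intended word, write down that word after hearing the recording. The sketch below assumes this definition; the function name and the normalization (trimming whitespace and ignoring case) are illustrative choices and are not taken from any of the corpora or products cited here.

```python
from typing import Sequence


def word_intelligibility(intended: str, transcriptions: Sequence[str]) -> float:
    """Fraction of blinded listeners whose transcription matches the intended word."""
    if not transcriptions:
        raise ValueError("at least one listener transcription is required")
    target = intended.strip().lower()
    hits = sum(1 for heard in transcriptions if heard.strip().lower() == target)
    return hits / len(transcriptions)


if __name__ == "__main__":
    # Four hypothetical listeners transcribe one learner's recording of "four".
    heard = ["four", "Four", "for", "floor"]
    print(word_intelligibility("four", heard))
```

An exact string match counts homophones such as "for" as misses, so a practical scorer would likely need a policy for homophones and spelling variants.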

Notes and References

  1. Ehsani . Farzad . Knodt . Eva . Speech technology in computer-aided language learning: Strengths and limitations of a new CALL paradigm . Language Learning & Technology . July 1998 . 2 . 1 . 54–73 . 11 February 2023 . University of Hawaii National Foreign Language Resource Center; Michigan State University Center for Language Education and Research . en.
  2. Isaacs . Talia . Harding . Luke . Pronunciation assessment . Language Teaching . July 2017 . 50 . 3 . 347–366 . 10.1017/S0261444817000118 . 209353525 . en . 0261-4448. free .
  3. O’Brien . Mary Grantham . Derwing . Tracey M. . 1 . Directions for the future of technology in pronunciation research and teaching . Journal of Second Language Pronunciation . 31 December 2018 . 4 . 2 . 182–207 . 10.1075/jslp.17001.obr . 86440885 . en . 2215-1931 . pronunciation researchers are primarily interested in improving L2 learners’ intelligibility and comprehensibility, but they have not yet collected sufficient amounts of representative and reliable data (speech recordings with corresponding annotations and judgments) indicating which errors affect these speech dimensions and which do not. These data are essential to train ASR algorithms to assess L2 learners’ intelligibility.. free . 2066/199273 . free .
  4. Eskenazi . Maxine . Using automatic speech processing for foreign language pronunciation tutoring: Some issues and a prototype . Language Learning & Technology . January 1999 . 2 . 2 . 62–76 . 11 February 2023 . en.
  5. News: Tholfsen . Mike . Reading Coach in Immersive Reader plus new features coming to Reading Progress in Microsoft Teams . 12 February 2023 . Techcommunity Education Blog . Microsoft . 9 February 2023 . en.
  6. News: Banerji . Olina . Schools Are Using Voice Technology to Teach Reading. Is It Helping? . 7 March 2023 . EdSurge News . 7 March 2023 . en.
  7. Book: Hair . Adam . Monroe . Penelope . Proceedings of the 17th ACM Conference on Interaction Design and Children . Apraxia world: A speech therapy game for children with speech sound disorders . 1 . 19 June 2018 . 119–131 . 10.1145/3202185.3202733 . 9781450351522 . 13790002 .
  8. (Section 2.2.2.)
  9. Web site: Bonk . Bill . New innovations in assessment: Versant's Intelligibility Index score . Resources for English Language Learners and Teachers . Pearson English . 11 February 2023 . 25 August 2020 . https://web.archive.org/web/20230127122339/https://www.english.com/blog/intelligibility-index-score-versant/ . 2023-01-27 . you don’t need a perfect accent, grammar, or vocabulary to be understandable. In reality, you just need to be understandable with little effort by listeners..
  10. Web site: Snir . Tal . How do you pronounce quokka? Practice with Search . The Keyword . Google . 11 February 2023 . en-us . 14 November 2019.
  11. Web site: Pronunciation assessment tool . Azure Cognitive Services Speech Studio . Microsoft . 11 February 2023.
  12. Book: Chen . Lei . Zechner . Klaus . 1 . Automated Scoring of Nonnative Speech: Using the SpeechRater v. 5.0 Engine . December 2018 . Educational Testing Service . Princeton, NJ . 10.1002/ets2.12198 . ETS Research Report Series . 2018 . 1 . 1–31 . 69925114 . 2330-8516 . 11 February 2023 . en.
  13. Gorham . Jon . Raine . Paul . 1 . Speech Recognition for English Language Learning . March 10, 2022 . Education Solutions . Technology in Language Teaching and Learning . video . en . 2023-02-14.
  14. News: Computer says no: Irish vet fails oral English test needed to stay in Australia . 12 February 2023 . Australian Associated Press . The Guardian . 8 August 2017.
  15. News: Ferrier . Tracey . Australian ex-news reader with English degree fails robot's English test . 12 February 2023 . The Sydney Morning Herald . 9 August 2017 . en.
  16. News: Main . Ed . Watson . Richard . The English test that ruined thousands of lives . 12 February 2023 . BBC News . 9 February 2022.
  17. Web site: Joyce . Katy Spratte . 13 Words That Can Be Pronounced Two Ways . Reader's Digest . 23 February 2023 . January 24, 2023.
  18. E.g., CMUDICT, Web site: The CMU Pronouncing Dictionary . www.speech.cs.cmu.edu . 15 February 2023. Compare "four" given as "F AO R" with the vowel AO as in "caught," to "row" given as "R OW" with the vowel OW as in "oat."
  19. Tu . Zehai . Ma . Ning . Barker . Jon . Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction . 2022 . Proc. Interspeech 2022 . INTERSPEECH 2022 . 3493–3497 . 10.21437/Interspeech.2022-10408 . 17 December 2023 . ISCA.
  20. Book: Common European framework of reference for languages learning, teaching, assessment: Companion volume with new descriptors . February 2018 . . 1090351600 . 136.
  21. database .zip file.
  22. GitHub corpus repository.
  23. Wu . Peter . Chen . Li-Wei . 1 . Speaker-Independent Acoustic-to-Articulatory Speech Inversion . 14 February 2023 . eess.AS . 2302.06774 .
  24. Cho . Cheol Jun . Mohamed . Abdelrahman . Black . Alan W. . Anumanchipalli . Gopala K. . Self-Supervised Models of Speech Infer Universal Articulatory Kinematics . 16 January 2024 . eess.AS . 2310.10788 . en.
  25. Mallela . Jhansi . Aluru . Sai Harshitha . Yarra . Chiranjeevi . Exploring the Use of Self-Supervised Representations for Automatic Syllable Stress Detection . 28 February 2024 . 1–6 . 10.1109/NCC60321.2024.10486028 . 10 June 2024 . National Conference on Communications . Chennai, India .
  26. Book: Sancinetti . Marcelo . Vidal . Jazmin . ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . A Transfer Learning Approach for Pronunciation Scoring . 1 . 23 May 2022 . 6812–6816 . 10.1109/ICASSP43922.2022.9747727 . 2111.00976 . 978-1-6654-0540-9 . 249437375 .
  27. Che Dalim . Che Samihah . Sunar . Mohd Shahrizal . 1 . Using augmented reality with speech input for non-native children's language learning . International Journal of Human-Computer Studies . February 2020 . 134 . 44–64 . 10.1016/j.ijhcs.2019.10.002 . 208098513 . 28 February 2023.
  28. Tolba . Rahma M. . Elarif . Taha . 1 . Mobile Augmented Reality for Learning Phonetics: A Review (2012–2022) . Extended Reality and Metaverse . Springer Proceedings in Business and Economics . 2023 . 87–98 . 10.1007/978-3-031-25390-4_7 . 28 February 2023 . Springer International Publishing . 978-3-031-25389-8 . en.
  29. Book: Mathad . Vikram C. . Mahr . Tristan J. . 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH 2021) . The Impact of Forced-Alignment Errors on Automatic Pronunciation Evaluation . 1 . 2021 . 176–180 . https://www.isca-speech.org/archive/pdfs/interspeech_2021/mathad21_interspeech.pdf . 10.21437/interspeech.2021-1403 . 10 March 2023 . International Speech Communication Association . 9781713836902. 239694157 .