A child speech corpus is a speech corpus documenting first-language acquisition. Such databases are used in the development of computer-assisted language learning systems and in the characterization of children's speech at different ages.[1] Children's speech varies not only by language, but also by region within a language. It can also differ for specific groups such as autistic children, especially when emotion is considered. Thus, different databases are needed for different populations. Corpora are available for American and British English as well as for many other European languages.[2] [3]
In the table below, the age range may be described in terms of school grades. "K" denotes "kindergarten" while "G" denotes "grade". For example, an age range of "K - G10" refers to speakers ranging from kindergarten age to grade 10.
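As an illustration of the notation, the following minimal sketch converts an age-range string from the table into approximate ages in years. The mapping from grades to ages is an assumption that is not stated in the source (US-style schooling, with kindergarten starting at about age 5 and grade n at about age n + 5), and the helper names `grade_to_age` and `parse_age_range` are hypothetical.

```python
import re

# Assumption (not from the source): kindergarten starts at about age 5,
# and grade n starts at about age n + 5 (US-style schooling).
KINDERGARTEN_AGE = 5


def grade_to_age(label: str) -> int:
    """Convert 'K' or 'G<n>' to an approximate starting age in years."""
    label = label.strip().upper()
    if label == "K":
        return KINDERGARTEN_AGE
    match = re.fullmatch(r"G(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized grade label: {label!r}")
    return KINDERGARTEN_AGE + int(match.group(1))


def parse_age_range(text: str) -> tuple[int, int]:
    """Parse a range such as 'K - G10' or '6 - 11' into (low, high) ages."""
    parts = [part.strip() for part in text.split("-")]
    if len(parts) != 2:
        raise ValueError(f"expected '<low> - <high>', got {text!r}")
    # Plain numbers are already ages; anything else is treated as a grade label.
    low, high = (int(p) if p.isdigit() else grade_to_age(p) for p in parts)
    return low, high
```

Under these assumptions, `parse_age_range("K - G10")` yields approximate ages of 5 to 15, which makes grade-based and age-based rows roughly comparable. The upper bound is the age at the start of the final grade, so the oldest speakers may be about a year older.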
This table is based on a paper from the Interspeech conference, 2016.[4] This online article is intended to provide an interactive table for readers and a place where information about children's speech corpora can be updated continuously by the speech research community.
Corpus | Author | Languages | Speakers | Utterances | Duration | Age Range | Date | Remarks
---|---|---|---|---|---|---|---|---
Boulder Learning—MyST Corpus (v0.4.0) [5] | Cole et al.[6] | English | 1371 | 228,874 | ~393h | G3 - G5 | 2019 | dialog interaction between a student and a virtual tutor on science topics; sessions typically last 20-40 minutes (wall clock); roughly 49% of the utterances have been transcribed, with more in progress (volunteers encouraged); free for research, flat $10K fee for commercial use
CMU Kids Corpus [7] | Eskenazi | English | 24M, 52F | 5180 | | 6 - 11 | 1997 |
CSLU Kids' Speech Corpus [8] | Shobaki | English | 1100 | 1017 | | K - G10 | 2007 |
PF-STAR Children's Speech Corpus [9] [10] | Russell | English, | 158 | | ~14.5h | 4 - 14 | 2006 | word-level transcriptions
CALL-SLT [11] | Rayner | German | | 5000 | | | 2014 |
TBALL [12] | Kazemzadeh | English | 256 | 5000 | 40h | K - G4 | 2005 | partially non-native speech
CASS_CHILD [13] | Gao | Mandarin | 23 | | | 1 - 4 | 2012 | phonetic transcriptions
CU Children's Read and Prompted Speech Corpus [14] | Hagen | English | 663 | ~100 | | K - G5 | 2001 | consists of isolated words, sentences and short spontaneous storytelling; word-level transcriptions
CU Story Corpus | Hagen | English | 106 | 5000 | 40h | G3 - G5 | 2003 | consists of story prompts and spontaneous spoken summaries of the material; word-level transcriptions
Providence Corpus [15] | Demuth | English | 6 | | 363h | 1 - 3 | 2006 | mother-child spontaneous speech interactions; broad phonetic transcription
Lyon Corpus [16] | Demuth | French | 4 | | 185h | 1 - 3 | 2007 | mother-child spontaneous speech interactions; broad phonetic transcription
Demuth Sesotho Corpus [17] | Demuth | Sesotho | 4 | ~13250 | 98h | 2 - 4 | 1992 | family/peer spontaneous speech interactions; morphologically tagged
CHIEDE [18] | Garrote | Spanish | 59 | 15444 | ~8h | | 2008 | spontaneous conversation, personal interviews, adult-child interaction; orthographic transcriptions; automatic phonological transcription
TIDIGITS [19] | Leonard | English | 326 (101 children) | | | 6 - 15 | 1993 | mix of adult and child speakers
FAU Aibo Emotion Corpus | Steidl | German | 51 | | 9h | 10 - 13 | | human-annotated with 11 emotion categories
Swedish NICE Corpus [20] | Bell | | | 5580 | | 8 - 15 | 2005 | consists of child-machine and adult-child interactions; orthographic transcriptions
SingaKids-Mandarin | Chen | Mandarin | 255 | 79,843 | 125h | 7 - 12 | 2016 | word- and phone-level transcriptions; human-annotated proficiency ratings
CFSC [21] | Pascual | Filipino | 57 | | ~8h | 6 - 11 | 2012 | consists of children's read speech; contains both good pronunciations and reading miscues; partially transcribed at word and phoneme levels