Like most other Niger–Congo languages, Sesotho is a tonal language, spoken with two basic tones, high (H) and low (L). The Sesotho grammatical tone system (unlike the lexical tone system used in Mandarin, for example) is rather complex and uses a large number of "sandhi" rules.
However, the Sesotho system is by no means the most complicated, nor even one of the more complicated. For example, there exist African grammatical tone languages with many more than two tonemes, and the existence of breathy voiced consonants in the Nguni and other languages greatly complicates their tonology. (In Sesotho there is no interaction between the tonemes and the phones of the syllables.) There are also very few instances of "floating" tones, and few grammatical constructs indicated purely by a change in tone. (The most common instances are rule 1 of the plain copulative and the formation of many positive participial sub-mood clauses.) The rules are generally not very dramatic either, and there is a strong tendency to preserve underlying high tones. (By contrast, in the Nguni languages the underlying high tone of verb stems, subjectival concords, the noun pre-prefix, and/or objectival concords often shifts several syllables to the right, to the antepenultimate or penultimate syllable.)
The tone of a syllable is carried by the vowel, or by the nasal if the nasal is syllabic.[1] The tone carried by syllabic /l̩/ (and, in Northern Sotho and Setswana, syllabic r) is left over from the elided vowel.
Underlyingly, each syllable of every morpheme may be described as having one of two tone types:[2] high (H [''' ¯ ''']) and null (ø). On the surface, all remaining null tones default to low (the LTA rule below) and the language is therefore spoken with two contrasting tonemes (H and L).
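The default-low step can be illustrated with a minimal sketch. It assumes a word is represented as a list with one entry per syllable, where "H" marks a high tone and None marks a null tone; this representation and the function name are illustrative choices, not part of any published analysis.

```python
from typing import Optional


def low_tone_assignment(tones: list[Optional[str]]) -> list[str]:
    """Give every syllable that still has a null tone a default low tone."""
    return ["L" if tone is None else tone for tone in tones]


# A hypothetical three-syllable word with one high tone on its second syllable.
underlying = [None, "H", None]
print(low_tone_assignment(underlying))  # ['L', 'H', 'L']
```

Because the step only fills gaps, it is naturally applied last, after any rules that add or remove high tones.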
A classic example of a nasal carrying a tone:
To form a locative from a noun, one of the possible procedures involves simply suffixing a low-toned [ŋ̩] to the noun. To form the locative meaning "on the grass", one suffixes -ng to the word jwang‡ [''' _ ¯ '''], pronounced [ʒʷɑŋ̩], giving jwanng‡ [''' _ ¯ _ '''], pronounced [ʒʷɑŋ̩ŋ̩], with the last two syllabic nasals having contrasting tones.
Names, though they are nouns, frequently have a tonal pattern distinct from that of the corresponding common noun:
The Sesotho word for "mother" is mme‡ [''' _ ¯ '''], pronounced [m̩mɛ], but a child would call their own mother mme‡ [''' ¯ _ '''], using it as a first name. Likewise, ntate‡ [''' _ _ ¯ '''], pronounced [n̩tʼɑtʼe], means "father", while ntate‡ [''' _ ¯ ¯ '''] might be used by a small child to address their father.
In speech, the two surface tonemes may be pronounced as one of several allotones due to the influence of surrounding tones and the length of the syllable. These changes naturally occur due to the way the language is spoken, including the effect of the penultimate lengthening, but ultimately each syllable of every morpheme may be completely described as having only high and low tones.[3]
In this and related articles, the tonemes of a word are delimited with square brackets and the specific (approximate) spoken allotones are between curly braces.
pronounced as /[lɪpʼɑtʼɑ]/ lepata ('euphemism'); tonemes: [''' _ ¯ _ '''] (L, H, L); allotones: {low, high-falling, low}
Toneme | Allotone | Where found | Example
---|---|---|---
H [''' ¯ '''] | extra-high | After another H, not penult | pronounced as /[mʊʀiʀiɲɑnɑ]/ moririnyana ('small hair')
H [''' ¯ '''] | high | In the bodies of words | pronounced as /[hʊlekʼʊlɑ]/ ho lekola ('to investigate')
H [''' ¯ '''] | mid | Finally in a phrase | pronounced as /[mʊpʼedi]/ Mopedi ('Mopedi person')
H [''' ¯ '''] | high-falling | Penultimate syllable of phrase before L | pronounced as /[m̩mu'ɔ]/ mmuo ('dialect')
H [''' ¯ '''] | high-mid | Penult before H | pronounced as /[mʊsɑdi]/ mosadi ('woman')
L [''' _ '''] | extra-low | Finally after another L | pronounced as /[tʼɪmɔ]/ temo ('agriculture')
L [''' _ '''] | low | Noun class prefixes, in the bodies of words, and finally after H | pronounced as /[hʊsɛbɑ]/ ho seba ('to perform mischief')
L [''' _ '''] | low-falling | Penultimate | pronounced as /[tʼʷɛbɑ]/ tweba ('mouse')
Thus in all there are, at least in our analysis, eight allotones.[4]
Most of these allotones only appear on the final word in the phrase in moderately slow or emphasised speech. When not phrase-final, the mid, high-falling, high-mid, low-falling, and extra-low allotones are normally not heard. Bear in mind that the falling tones only occur on lengthened syllables, and if a word has irregular stress then the falling tones will not appear on the penult (for example, the second form of the first demonstrative pronoun has tonemic pattern [''' ¯ ¯ '''], and its stressed final syllable means the penult is not pronounced with a falling tone).
There are no rising tones. For example, in [''' _ ¯ '''] (where the L is penultimate) the lengthened L is not pronounced with a rise towards the following H, though one might have expected it to be. This is a general trend among almost all Bantu languages with (contrastive or stressed) lengthened vowels, though languages with depressor consonants do have audible upward "swoops" on depressing syllable onsets which may be interpreted as rising allotones.
There are several cases of seemingly tonemic instances of some of these allotones. As expected, some ideophones and radical interjectives have strange tones, but the relative concord also has an irregular extra-high tone (except when used to form demonstrative pronouns). The difference in relative pitch between the high tone and its extra-high allotone is smaller than that between the low and high tones.
The purpose of the tones can fall into at least one of the following categories:
Each complete Sesotho word has an inherent tone for each of its syllables. Getting these tones right is not essential to being understood, but getting them wrong will betray a foreign accent:
pronounced as /[mʊtʰʊ]/ motho‡ [''' _ _ '''] ('human being')
pronounced as /[ɲ̩t͡ʃʼɑ]/ ntja‡ [''' _ ¯ '''] ('dog')
pronounced as /[mʊsʊtʰʊ]/ Mosotho‡ [''' _ ¯ _ '''] ('singular of Basotho')
pronounced as /[lɪʀɑtʼɑ]/ lerata‡ [''' _ _ ¯ '''] ('noise')
Various factors mean that the tones of a word may change, but the characteristic tone of a Sesotho word is found when the word is the last in a question that does not employ the interrogative adverb na? [nɑ]. In this situation, downdrift is greatly attenuated, the penultimate syllable of the sentence is short (although the vowel of the last syllable may be completely cut), and the tone of the last word is largely preserved (though a final H tone may fall to L).
pronounced as /[ʊbɑt͡ɬʼɑ hʊ'ɪbɑ sɪt͡sʼiˌbi]/ O batla ho eba setsebi‡ ('you want to be a scientist')
pronounced as /[nɑ ʊbɑt͡ɬʼɑ hʊ'ɪbɑ sɪt͡sʼiˌbi]/ Na o batla ho eba setsebi?‡ ('do you want to be a scientist?')
pronounced as /[ʊbɑt͡ɬʼɑ hʊ'ɪbɑ sɪt͡sʼiˌbi]/ O batla ho eba setsebi?‡ ('do you want to be a scientist?')
The most important property of tonal languages which distinguishes them from languages that merely use pitch as part of intonation (such as English) is the existence of numerous tonal minimal pairs. Often, a few words may be composed of exactly the same syllables/phonemes, yet have different characteristic tones (the example H verbs have low final tone due to the Finality Restriction):
pronounced as /[hʊ'ɑkʼɑ]/ ho aka‡ [''' _ ¯ _ '''] ('to kiss')
pronounced as /[hʊ'ɑkʼɑ]/ ho aka‡ [''' _ _ _ '''] ('to tell lies')
pronounced as /[ʒʷɑŋ̩]/ jwang‡ [''' _ ¯ '''] ('grass')
pronounced as /[ʒʷɑŋ̩]/ jwang‡ [''' ¯ _ '''] ('how?')
pronounced as /[hʊtʼɪnɑ]/ ho tena‡ [''' _ ¯ _ '''] ('to wear')
pronounced as /[hʊtʼɪnɑ]/ ho tena‡ [''' _ _ _ '''] ('to disgust')
There are, however, several basic homophones pronounced with exactly the same tonal patterns. In these cases only the context may be used to distinguish between the different meanings.
pronounced as /[lɑ'ʊlɑ]/ -laola L verb (i) 'rule'; (ii) 'divine'
pronounced as /[ʀʊlɑ]/ -rola H verb (i) 'to forge metal', 'to hammer'; (ii) 'to undress'
pronounced as /[mʊɬʷɑ]/ mohlwa [''' _ ¯ '''] (i) 'termite'; (ii) 'lawn grass (of the graminaceae family)'
There are instances of words being changed either through inflexion or derivation and as a result ending up sounding exactly like other words.
pronounced as /[ɬɔlɔ]/ hlolo [''' _ _ '''] (i) 'hare', (ii) 'creation' (from the L verb pronounced as /[ɬɔlɑ]/ -hlola)
It regularly occurs that two otherwise similar sounding phrases may have two very different meanings mainly due to a difference in tone of one or more words or concords.
pronounced as /[kʼɪŋʷɑnɑ wɑhɑ'ʊ]/ Ke ngwana wa hao‡ [''' _ _ ¯ ¯ ¯ _ '''] ('I am your child')
pronounced as /[kʼɪŋʷɑnɑ wɑhɑ'ʊ]/ Ke ngwana wa hao‡ [''' ¯ _ ¯ ¯ ¯ _ '''] ('he/she/it is your child')
pronounced as /[ʊmʊbɪ]/ O mobe‡ [''' _ _ ¯ '''] ('you are ugly')
pronounced as /[ʊmʊbɪ]/ O mobe‡ [''' ¯ _ ¯ '''] ('he/she is ugly')
pronounced as /[kʼɪbɑt͡ɬʼɑnɑ lɪbɔnɑ]/ Ke batlana le bona‡ [''' _ _ _ _ ¯ _ _ '''] ('I am looking for them'; present indicative mood)
pronounced as /[kʼɪbɑt͡ɬʼɑnɑ lɪbɔnɑ]/ Ke batlana le bona‡ [''' ¯ ¯ _ _ ¯ _ _ '''] ('as I was looking for them'; participial sub-mood; this is not a complete sentence but part of a longer sentence)
Note that when grammatical tone is used the tone of the significant word may influence the relative pitch of the rest of the phrase, although the tones of other words tend to remain intact.
Downdrift, where the absolute pitch (not the tones) of the speaker's voice gradually decreases as the sentence continues (often resulting in initial low tones being pronounced at a higher pitch than final high tones), is a feature of natural speech. Basically, a high tone immediately following a low tone is pronounced at a slightly lower frequency than the previous high tone.
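The effect can be made concrete with a toy numeric sketch. It assumes that every high tone directly following a low tone lowers the speaker's register by a fixed step, and that a low tone sits a fixed interval below the current high. The starting pitch, interval, and step are hypothetical values in arbitrary units, chosen only to make the pattern visible; they are not measurements of Sesotho speech.

```python
def downdrift(tones, top=10.0, interval=3.0, step=1.0):
    """Return one illustrative pitch value per syllable (arbitrary units).

    top, interval and step are hypothetical parameters, not measured values.
    """
    pitches = []
    register = top  # pitch at which a high tone is currently realised
    previous = None
    for tone in tones:
        # A high tone right after a low tone is realised slightly lower
        # than the previous high tone.
        if tone == "H" and previous == "L":
            register -= step
        pitches.append(register if tone == "H" else register - interval)
        previous = tone
    return pitches


pitches = downdrift(list("LHLHLHLHLH"))
print(pitches[0], pitches[-1])  # 7.0 5.0
```

With these values the first low tone (7.0) ends up higher in absolute pitch than the last high tone (5.0), even though every high tone remains above its neighbouring low tones.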
Additionally, a slightly more dramatic lowering of pitch (a downstep) may occur between certain syllables. In Sesotho, the downstep (indicated with a !) naturally occurs between words (being less noticeable if the first word has no low tones), though there is at least one instance (in rule 1 of the plain copulative) where the lack of downstep (as well as other tonal factors) changes the utterance's meaning. In the following example, a grave accent (à) indicates a low tone and an acute accent (á) indicates a high tone.
This downdrift is greatly attenuated when the sentence is a question not using the interrogative adverb na? [nɑ].
Sesotho verb stems fall into two categories: H stems and L stems. The difference lies in the "underlying tone" of the stem's first syllable (or the stem's "basic tone") being either high or null. When used with an object in the indicative remote future tense (the simple -tla- tense, with -tla- pronounced [t͡ɬʼɑ]), the verb's stem is monotonal (all syllables high-toned or all low-toned), with the underlying tone of the first syllable spread to all the following syllables.
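As a small illustration of this monotonal pattern, the following sketch builds the tones of a stem from its category and syllable count. It covers the stem only (not the concords or the -tla- marker), and the function name is a hypothetical one introduced here.

```python
def remote_future_stem_tones(n_syllables: int, stem_is_high: bool) -> list[str]:
    """Spread the stem's basic tone over every syllable of the stem."""
    tone = "H" if stem_is_high else "L"
    return [tone] * n_syllables


# -ahlola is an H stem and -paqama an L stem; both have three syllables.
print(remote_future_stem_tones(3, stem_is_high=True))   # ['H', 'H', 'H']
print(remote_future_stem_tones(3, stem_is_high=False))  # ['L', 'L', 'L']
```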
Nouns derived from the verb stem are fossilised with the tones of the simple class 15 infinitive as appears in medial positions without a subject or object. The procedure for creating this tonal pattern is intricate and involves several tonal rules.
These factors may also apply in normal verbal conjugations. Adding a verbal suffix (through derivation, not inflexion) creates a new verb stem which falls in the same tone category as the original, and is subject to the same rules.
pronounced as /[pʼɑǃɑmɑ]/ -paqama (L verb stem) ('lie face downwards') ⇒ pronounced as /[hʊpʼɑǃɑmɑ]/ ho paqama [''' _ _ _ _ '''] ('to lie') ⇒ pronounced as /[hʊpʼɑǃɑmisɑ]/ ho paqamisa [''' _ _ _ _ _ '''] ('to cause to lie') ⇒ pronounced as /[hʊpʼɑǃɑmisuwɑ]/ ho paqamisuwa [''' _ _ _ _ _ _ '''] ('to be caused to lie'), etc.
pronounced as /[ɑɬʊlɑ]/ -ahlola (H verb stem) ('judge') ⇒ pronounced as /[hʊ'ɑɬʊlɑ]/ ho ahlola [''' _ ¯ ¯ _ '''] ('to judge') ⇒ pronounced as /[kʼɑɬʊlɔ]/ kahlolo [''' ¯ ¯ _ '''] ('judgement'), pronounced as /[mʊ'ɑɬuˌdi]/ moahlodi [''' _ ¯ ¯ _ '''] ('judge'), pronounced as /[bʊ'ɑɬuˌdi]/ boahlodi [''' _ ¯ ¯ _ '''] ('state of being a judge')
The tones of the noun prefixes of nouns derived from verbs are independent of the tones of the stem.
Some nouns derived from verbs have idiomatic tonal patterns independent of the original verb stem's tones.
pronounced as /[lʊkʼɑ]/ -loka (L verb stem) ('be sufficient') ⇒ pronounced as /[lʊkʼɛlɑ]/ -lokela ('be sufficient for') ⇒ pronounced as /[tʼʊkʼɛlɔ]/ tokelo ('human right'; irregular tone [''' _ ¯ _ '''] instead of the expected [''' _ _ _ '''])
Several "tonal melodies" may be assigned to certain verbal conjugations based on the desired tense, aspect, and mood (for example, with many verb conjugations the only difference between the indicative mood and the participial sub-mood is one of tone). These are applied before most other rules and may be indicated by a code including the symbols H (high tone), L (low tone), B (verb stem's basic tone), and * (iteratively applying the preceding tone).
For example, applying the (present) "Subjunctive Melody" (HL*H) to the H verb stem -bona [bɔnɑ] ('see') and the L verb stem -sheba [ʃɛbɑ] ('look at') results in both ke shebe tau [kʼɪʃɛbɛ tʼɑ'u] ('so I may look at the lion') and ke bone tau [kʼɪbɔnɛ tʼɑ'u] ('so I may see the lion') being pronounced with exactly the same tone pattern [''' ¯ ¯ ¯ _ ¯ '''].
Another way to designate the melodies is to use a standard template of the tense in question and indicate the melody by assigning tones to specific syllables in the resultant word (for example, the final syllable, the subjectival concord, etc.). So for the above example the Subjunctive Melody (actually, present-future subjunctive) may be specified by putting H tones on the first syllable (the subjectival concord's basic tone is ignored), the second syllable, and the final syllable of the word, and putting an explicit L tone on the fourth syllable, thus preventing HTD (unless the verb is disyllabic, in which case the fourth syllable is the final syllable and has an H tone).
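The melody notation itself can be sketched as a small expander. This is only an illustration of the code symbols: it assumes that * means "repeat the preceding tone as often as needed, possibly zero times" (in the manner of a regular expression), that at most one * occurs, and that the caller already knows how many tone-bearing positions to fill. How those positions line up with the syllables of a real word, and which later rules then alter the result, is not modelled. The function name is hypothetical.

```python
def expand_melody(melody: str, n_positions: int, basic: str = "L") -> list[str]:
    """Expand a melody code such as 'HL*H' over n_positions tone slots.

    H and L are literal tones, B stands for the stem's basic tone, and *
    repeats the preceding tone to fill the remaining slots (assumed here
    to allow zero repetitions).
    """
    units = []  # pairs of (tone, is_repeated)
    for symbol in melody:
        if symbol == "*":
            if not units or units[-1][1]:
                raise ValueError("'*' must follow a single tone symbol")
            units[-1] = (units[-1][0], True)
        elif symbol in "HLB":
            units.append((basic if symbol == "B" else symbol, False))
        else:
            raise ValueError(f"unknown melody symbol: {symbol!r}")

    repeated = sum(1 for _, is_repeated in units if is_repeated)
    if repeated > 1:
        raise ValueError("this sketch supports at most one '*'")
    fixed = len(units) - repeated
    if n_positions < fixed or (repeated == 0 and n_positions != fixed):
        raise ValueError("melody does not fit the number of positions")

    tones = []
    for tone, is_repeated in units:
        tones.extend([tone] * (n_positions - fixed if is_repeated else 1))
    return tones


print(expand_melody("HL*H", 5))  # ['H', 'L', 'L', 'L', 'H']
print(expand_melody("HL*H", 2))  # ['H', 'H']
```

The second call shows the repeated L disappearing entirely when only two positions are available, which leaves just the two H tones.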
Sesotho is a grammatical tone language; this means that words may be pronounced with varying tonal patterns depending on their particular function in a sentence. Another interpretation is that the tones of the language interact in their own intricate "tonal grammar."
In order to create certain grammatical constructs, certain tonal rules may be used to modify the underlying tones of the word to create their surface tones. The words are then spoken using the surface tones.
This system is naturally somewhat complex. Indeed, the development of autosegmental phonology was largely motivated by the need for a satisfactory theoretical framework to deal with the tonal grammars of Niger–Congo languages. This article attempts to explain certain aspects of Sesotho tonology in a rule-based autosegmental framework.
The rules presented below are almost exclusively used in constructing the verbal complex as this is the part of speech most radically affected by the tonal grammar.
Autosegmental phonology was motivated by the need to represent properties which seem to span several "segments" (in our case, syllables) and seem to be somewhat independent of them. Underlyingly (that is, in the speaker's lexicon), some, but not necessarily all, of the segments of morphemes are associated with one or more properties. The segments are on one "tier" and their properties are on another, and the relationships between the two are indicated by joining them with association lines as follows:
Each of the rules changes the associations in some way. For example, High Tone Doubling (HTD) causes the underlying H tone on the first syllable of the verb to also be linked to the syllable immediately to the right:
In this article, the application of several rules in succession will be indicated with the following abbreviation:
The fact that the line emanating from the second syllable is only linked on the HTD line means that this is the first time that syllable is associated with that property.
One popular classification of tonal Bantu languages broadly separates them into two groups: shifting languages and spreading languages. The Sotho–Tswana languages are bounded spreading languages, as they have primitive rules which directly cause underlying high tones to be associated with (spread to) syllables to the right. The closely related Nguni languages, on the other hand, are unbounded shifting languages, as they have primitive rules which directly cause underlying high tones to be moved (shifted) to syllables to the right. The following table presents an informal comparison between the tonal processes found in Sesotho and isiZulu:
Process | Bounded | Unbounded
---|---|---
Spread | Sesotho, isiZulu | Sesotho, isiZulu
Shift | Sesotho | isiZulu
In the table, a process is unbounded if there is no set limit on the number of syllables over which it may occur. Sesotho has basic bounded spread (High Tone Doubling) and isiZulu has basic unbounded shift. Bounded shift in Sesotho occurs as the cumulative effect of bounded right tone spread (High Tone Doubling) and Left Branch Delinking, while various forms of spreading may occur in isiZulu if the word is very short or has two or more underlying highs.
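The cumulative effect described for Sesotho can be sketched as follows. The sketch represents a high tone by the span of syllables it is linked to, and it applies the two rules unconditionally; the conditions under which each rule actually applies are not modelled, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HighTone:
    left: int   # index of the leftmost syllable linked to this H
    right: int  # index of the rightmost syllable linked to this H


def high_tone_doubling(h: HighTone, n_syllables: int) -> HighTone:
    """Bounded spread: link a singly linked H to one more syllable rightwards."""
    if h.left == h.right and h.right + 1 < n_syllables:
        return HighTone(h.left, h.right + 1)
    return h


def left_branch_delinking(h: HighTone) -> HighTone:
    """Remove the leftmost association line of a multiply linked H."""
    if h.right > h.left:
        return HighTone(h.left + 1, h.right)
    return h


def surface(h: HighTone, n_syllables: int) -> list[str]:
    """Read off H for linked syllables and default L elsewhere."""
    return ["H" if h.left <= i <= h.right else "L" for i in range(n_syllables)]


n = 4
h = HighTone(1, 1)                  # hypothetical word: H on the second syllable
print(surface(h, n))                # ['L', 'H', 'L', 'L']
h = high_tone_doubling(h, n)
print(surface(h, n))                # ['L', 'H', 'H', 'L']
h = left_branch_delinking(h)
print(surface(h, n))                # ['L', 'L', 'H', 'L']
```

The net result is that the high tone has moved exactly one syllable to the right, which is why the combination behaves like a bounded shift.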
In dealing with verbs, the following rules may be applied at various times:
HTD (High Tone Doubling), ITS, RBD, LBD (Left Branch Delinking), FR (Finality Restriction), and LTA.
To construct many verb forms, including many positive indicative tenses without direct objects as well as infinitives, the following rules are applied in order:
Note that the three main levels are always applied in this order, though the actual rules contained in the levels will change depending on the parts of speech, verb moods, etc. For the word pronounced as /[ʊ'ɑbinɑ]/ o a bina ('she is singing') the application of the rules is as follows:
The word appears on the surface with tonal pattern [''' ¯ _ ¯ _ '''].
Furthermore, the second last syllable of the word is lengthened (or "stressed"), and the interaction of the tones with the penultimate lengthening means that the penultimate H, standing before a final L, is pronounced with the high-falling allotone.
Extending the word by one syllable (pronounced as /[ʊ'ɑbin̩t͡sʰɑ]/ o a bintsha 'She is conducting'):
The word appears on the surface with tonal pattern [''' ¯ _ ¯ ¯ _ '''] (the high beneath the third syllable is associated with two syllables).
The second last syllable of the word is lengthened, and the interaction of the tones with the penultimate lengthening means that the penultimate H, standing before a final L, is again pronounced with the high-falling allotone.
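The stem portions of the two words above can be reproduced with a short sketch. It rests on several assumptions that are not confirmed here: that the relevant order is HTD, then FR, then LTA; that FR can be modelled as "a doubled H may not sit on the word-final syllable"; and that the prefix syllables (o and a) can be left out. The function name is hypothetical.

```python
from typing import Optional


def derive_stem(n_syllables: int, stem_is_high: bool) -> list[str]:
    """Derive surface tones for a verb stem via HTD, then FR, then LTA."""
    tones: list[Optional[str]] = [None] * n_syllables
    doubled: Optional[int] = None  # index of a syllable that received H via HTD

    if stem_is_high and n_syllables > 0:
        tones[0] = "H"  # underlying H on the first stem syllable
        # HTD: also link the H to the syllable immediately to the right.
        if n_syllables > 1:
            tones[1] = "H"
            doubled = 1

    # FR (as assumed here): undo the doubling if it landed on the final syllable.
    if doubled is not None and doubled == n_syllables - 1:
        tones[doubled] = None

    # LTA: every syllable still without a tone surfaces as low.
    return ["L" if tone is None else tone for tone in tones]


print(derive_stem(2, stem_is_high=True))  # -bina:    ['H', 'L']
print(derive_stem(3, stem_is_high=True))  # -bintsha: ['H', 'H', 'L']
```

These outputs match the last two and the last three tones of the surface patterns given for o a bina and o a bintsha respectively; the behaviour of the subjectival concord's high tone is outside the scope of this sketch.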