Sotho tonology explained

Like most other Niger–Congo languages, Sesotho is a tonal language, spoken with two basic tones, high (H) and low (L). The Sesotho grammatical tone system (unlike the lexical tone system used in Mandarin, for example) is rather complex and uses a large number of "sandhi" rules.

However, the Sesotho system is by no means the most complicated, nor even one of the more complicated. For example, there exist African grammatical tone languages with many more than two tonemes, and the existence of breathy voiced consonants in the Nguni and other languages greatly complicates their tonology. (In Sesotho there is no interaction between the tonemes and the phones of the syllables.) There are also very few instances of "floating" tones, and even fewer grammatical constructs indicated purely by a change in tone. (The most common instances of this are rule 1 of the plain copulative and the formation of many positive participial sub-mood clauses.) The rules are generally not very dramatic either, and there is a very strong tendency to preserve underlying high tones. (By contrast, in the Nguni languages the underlying high tone of verb stems, subjectival concords, the noun pre-prefix, and/or objectival concords often shifts several syllables to the right, to the antepenultimate or penultimate syllable.)

The tone of a syllable is carried by the vowel, or by the nasal if the nasal is syllabic.^[1] The tone carried by syllabic /l̩/ (and, in Northern Sotho and Setswana, syllabic r) is left over from the elided vowel.


Tone types

Underlyingly, each syllable of every morpheme may be described as having one of two tone types:^[2] high (H [''' ¯ ''']) and null (ø). On the surface, all remaining null tones default to low (the LTA rule below) and the language is therefore spoken with two contrasting tonemes (H and L).

A classic example of a nasal carrying a tone:

To form a locative from a noun, one of the possible procedures involves simply suffixing a low tone pronounced as /[ŋ̩]/ to the noun. To form the locative meaning "on the grass" one suffixes -ng to the word pronounced as /[ʒʷɑŋ̩]/ jwang^{‡ [''' _ ¯ '''], giving jwanng^{‡ [''' _ ¯ _ '''] pronounced as /[ʒʷɑŋ̩ŋ̩]/, with the two last syllabic nasals having contrasting tones.}}

Names, being nouns, frequently have a tonal pattern distinct from that of the common noun they are derived from:

The Sesotho word for "mother" is pronounced as /[m̩mɛ]/ mme^{‡ [''' _ ¯ '''], but a child would call their own mother pronounced as /[m̩mɛ]/ mme^{‡ [''' ¯ _ '''], using it as a first name. Also, pronounced as /[n̩tʼɑtʼe]/ ntate^{‡ [''' _ _ ¯ '''] means "father", while pronounced as /[n̩tʼɑtʼe]/ ntate^{‡ [''' _ ¯ ¯ '''] might be used by a small child to address their father.}}}}

Allotones

In speech, the two surface tonemes may be pronounced as one of several allotones due to the influence of surrounding tones and the length of the syllable. These changes naturally occur due to the way the language is spoken, including the effect of the penultimate lengthening, but ultimately each syllable of every morpheme may be completely described as having only high and low tones.^[3]

In this and related articles, the tonemes of a word are delimited with square brackets and the specific (approximate) spoken allotones are between curly braces.

pronounced as /[lɪpʼɑtʼɑ]/ lepata ('euphemism'); tonemes: [''' _ ¯ _ '''] (L, H, L); allotones: {low, high-falling, low}


Toneme	Allotone	Where found	Example
H [''' ¯ ''']	extra-high	After another H, not penult	pronounced as /[mʊʀiʀiɲɑnɑ]/ moririnyana ('small hair')
	high	In the bodies of words	pronounced as /[hʊlekʼʊlɑ]/ ho lekola ('to investigate')
	mid	Finally in a phrase	pronounced as /[mʊpʼedi]/ Mopedi ('a Mopedi person')
	high-falling	Penultimate syllable of phrase before L	pronounced as /[m̩mu'ɔ]/ mmuo ('dialect')
	high-mid	Penult before H	pronounced as /[mʊsɑdi]/ mosadi ('woman')
L [''' _ ''']	extra-low	Finally after another L	pronounced as /[tʼɪmɔ]/ temo ('agriculture')
	low	Noun class prefixes, in the bodies of words, and finally after H	pronounced as /[hʊsɛbɑ]/ ho seba ('to perform mischief')
	low-falling	Penultimate	pronounced as /[tʼʷɛbɑ]/ tweba ('mouse')

Thus in all there are, at least in our analysis, eight allotones.^[4]

Most of these allotones only appear on the final word in the phrase in moderately slow or emphasised speech. When not phrase-final, the mid, high-falling, high-mid, low-falling, and extra-low allotones are normally not heard. Bear in mind that the falling tones only occur on lengthened syllables, and if a word has irregular stress then the falling tones will not appear on the penult (for example, the second form of the first demonstrative pronoun has tonemic pattern [''' ¯ ¯ '''] which is pronounced due to the stressed final syllable).
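
The allotone table above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading for a single phrase-final word in slow speech: the function name `allotones` and the order in which the conditions are tested (final position first, then penultimate, then the preceding tone) are assumptions made here, not part of the analysis itself, and irregular stress is ignored.

```python
from typing import List


def allotones(tones: List[str]) -> List[str]:
    """Map the tonemes ("H" or "L") of a phrase-final word to allotone names."""
    n = len(tones)
    result = []
    for i, tone in enumerate(tones):
        final = i == n - 1
        penult = i == n - 2
        prev = tones[i - 1] if i > 0 else None
        nxt = tones[i + 1] if not final else None
        if tone == "H":
            if final:
                name = "mid"                      # H finally in a phrase
            elif penult:
                # lengthened penult: falling before L, high-mid before H
                name = "high-falling" if nxt == "L" else "high-mid"
            elif prev == "H":
                name = "extra-high"               # after another H, not penult
            else:
                name = "high"
        else:
            if penult:
                name = "low-falling"
            elif final and prev == "L":
                name = "extra-low"                # finally after another L
            else:
                name = "low"
        result.append(name)
    return result


print(allotones(["L", "H", "L"]))  # ['low', 'high-falling', 'low']
```

With the tonemes given above for lepata, [''' _ ¯ _ '''], the sketch returns the same low, high-falling, low sequence listed for that word.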

There are no rising tones. For example, [''' _ ¯ '''] (where the L is penultimate) is pronounced though one might have expected *. This is a general trend among almost all Bantu languages with (contrastive or stressed) lengthened vowels, though languages with depressor consonants do have audible upward "swoops" on depressing syllable onsets which may be interpreted as rising allotones.

There are several cases of seemingly tonemic instances of some of these allotones. As expected, some ideophones and radical interjectives have strange tones, but relative concord has an irregular extra-high tone (except when used to form demonstrative pronouns). The difference in relative pitch between the high tone and its extra-high allotone is less than that between the low and high tones.

Tone usage

The purpose of the tones can fall into at least one of the following categories:

Characteristic tone

Each complete Sesotho word has an inherent tone pattern across its syllables. Although reproducing it is not essential to forming correct speech, getting it wrong will betray a foreign accent:

pronounced as /[mʊtʰʊ]/ motho^{‡ [''' _ _ '''] ('human being')}

pronounced as /[ɲ̩t͡ʃʼɑ]/ ntja^{‡ [''' _ ¯ '''] ('dog')}

pronounced as /[mʊsʊtʰʊ]/ Mosotho^{‡ [''' _ ¯ _ '''] ('singular of Basotho')}

pronounced as /[lɪʀɑtʼɑ]/ lerata^{‡ [''' _ _ ¯ '''] ('noise')}

Various factors mean that the tones of a word may change, but the characteristic tone of a Sesotho word is found when the word is the last in a question sentence not employing the interrogative adverb pronounced as /[nɑ]/ na?. In this situation, downdrift is greatly attenuated, the penultimate syllable of the sentence is short (although the vowel of the last syllable may be completely cut), and the tone of the last word is largely preserved (though a final H tone may fall to L).

pronounced as /[ʊbɑt͡ɬʼɑ hʊ'ɪbɑ sɪt͡sʼiˌbi]/ O batla ho eba setsebi^{‡ ('you want to be a scientist')}

pronounced as /[nɑ ʊbɑt͡ɬʼɑ hʊ'ɪbɑ sɪt͡sʼiˌbi]/ Na o batla ho eba setsebi?^{‡ ('do you want to be a scientist?')}

pronounced as /[ʊbɑt͡ɬʼɑ hʊ'ɪbɑ sɪt͡sʼiˌbi]/ O batla ho eba setsebi?^{‡ ('do you want to be a scientist?')}

Distinguishing/semantic tone

The most important property of tonal languages which distinguishes them from languages that merely use pitch as part of intonation (such as English) is the existence of numerous tonal minimal pairs. Often, a few words may be composed of exactly the same syllables/phonemes, yet have different characteristic tones (the example H verbs have low final tone due to the Finality Restriction):

pronounced as /[hʊ'ɑkʼɑ]/ ho aka^{‡ [''' _ ¯ _ '''] ('to kiss')}

pronounced as /[hʊ'ɑkʼɑ]/ ho aka^{‡ [''' _ _ _ '''] ('to tell lies')}

pronounced as /[ʒʷɑŋ̩]/ jwang^{‡ [''' _ ¯ '''] ('grass')}

pronounced as /[ʒʷɑŋ̩]/ jwang^{‡ [''' ¯ _ '''] ('how?')}

pronounced as /[hʊtʼɪnɑ]/ ho tena^{‡ [''' _ ¯ _ '''] ('to wear')}

pronounced as /[hʊtʼɪnɑ]/ ho tena^{‡ [''' _ _ _ '''] ('to disgust')}

There are, however, several basic homophones pronounced with exactly the same tonal patterns. In these cases only the context may be used to distinguish between the different meanings.

pronounced as /[lɑ'ʊlɑ]/ -laola L verb (i) 'rule'; (ii) 'divine'

pronounced as /[ʀʊlɑ]/ -rola H verb (i) 'to forge metal', 'to hammer'; (ii) 'to undress'

pronounced as /[mʊɬʷɑ]/ mohlwa [''' _ ¯ '''] (i) 'termite'; (ii) 'lawn grass (of the graminaceae family)'

There are instances of words being changed either through inflexion or derivation and as a result ending up sounding exactly like other words.

pronounced as /[ɬɔlɔ]/ hlolo [''' _ _ '''] (i) 'hare', (ii) 'creation' (from the L verb pronounced as /[ɬɔlɑ]/ -hlola)

Grammatical tone

It regularly occurs that two otherwise similar sounding phrases may have two very different meanings mainly due to a difference in tone of one or more words or concords.

pronounced as /[kʼɪŋʷɑnɑ wɑhɑ'ʊ]/ Ke ngwana wa hao^{‡ [''' _ _ ¯ ¯ ¯ _ '''] ('I am your child')}

pronounced as /[kʼɪŋʷɑnɑ wɑhɑ'ʊ]/ Ke ngwana wa hao^{‡ [''' ¯ _ ¯ ¯ ¯ _ '''] ('he/she/it is your child')}

pronounced as /[ʊmʊbɪ]/ O mobe^{‡ [''' _ _ ¯ '''] ('you are ugly')}

pronounced as /[ʊmʊbɪ]/ O mobe^{‡ [''' ¯ _ ¯ '''] ('he/she is ugly')}

pronounced as /[kʼɪbɑt͡ɬʼɑnɑ lɪbɔnɑ]/ Ke batlana le bona^{‡ [''' _ _ _ _ ¯ _ _ '''] ('I am looking for them' present indicative mood)}

pronounced as /[kʼɪbɑt͡ɬʼɑnɑ lɪbɔnɑ]/ Ke batlana le bona^{‡ [''' ¯ ¯ _ _ ¯ _ _ '''] ('as I was looking for them' participial sub-mood; this is not a complete sentence but part of a longer sentence)}

Note that when grammatical tone is used the tone of the significant word may influence the relative pitch of the rest of the phrase, although the tones of other words tend to remain intact.

Downdrift

Downdrift, where the absolute pitch (not the tones) of the speaker's voice gradually decreases as the sentence continues (often resulting in initial low tones being pronounced at a higher pitch than final high tones), is a feature of natural speech. Basically, a high tone immediately following a low tone is pronounced at a slightly lower frequency than the previous high tone.

Additionally, a slightly more dramatic lowering of pitch (a downstep) may occur between certain syllables. In Sesotho, the downstep (indicated with a !) naturally occurs between words (being less noticeable if the first word has no low tones) though there is at least one instance (in rule 1 of the plain copulative) where the lack of downstep (as well as other tonal factors) changes the utterance's meaning. In the following example, a grave accent (à) indicates a low tone and an acute accent (á) indicates a high tone.

This downdrift is greatly attenuated when the sentence is a question not using the interrogative adverb pronounced as /[nɑ]/ na?.
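
The effect of downdrift on absolute pitch can be pictured with a toy model. The Python sketch below is only an illustration: the function name `pitch_track`, the starting pitch, the size of the H/L interval, the size of each drift step, and the `attenuation` factor for questions are all invented values chosen for readability, not measurements of Sesotho speech. Downstep between words is not modelled.

```python
from typing import List


def pitch_track(tones: List[str],
                start_high: float = 200.0,
                interval: float = 40.0,
                drift: float = 10.0,
                attenuation: float = 1.0) -> List[float]:
    """Return an illustrative pitch (in Hz) for each toneme of a phrase.

    Every H that immediately follows an L lowers the register by
    drift * attenuation, so later tones sit lower than earlier ones.
    """
    high = start_high
    previous = None
    pitches = []
    for tone in tones:
        if tone == "H" and previous == "L":
            high -= drift * attenuation      # the new H is lower than the last H
        pitches.append(high if tone == "H" else high - interval)
        previous = tone
    return pitches


print(pitch_track(list("HLHLH")))                   # [200.0, 160.0, 190.0, 150.0, 180.0]
print(pitch_track(list("HLHLH"), attenuation=0.0))  # [200.0, 160.0, 200.0, 160.0, 200.0]
```

In a long enough phrase the register falls far enough that a late high tone is lower than the first low tone, which is the situation described above. Setting `attenuation` close to zero mimics the flattened contour of a question without na.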

Verb tone

Sesotho verb stems fall into two categories: H stems and L stems. The difference lies in the "underlying tone" of the stem's first syllable (or the stem's "basic tone") being either high or null. When used with an object in the indicative remote future tense (the simple pronounced as /[t͡ɬʼɑ]/ -tla- tense) the verb's stem is monotonous (all syllables high toned or all low toned) with the underlying tone of the first syllable spread to all the following syllables.

Nouns derived from the verb stem are fossilised with the tones of the simple class 15 infinitive as appears in medial positions without a subject or object. The procedure for creating this tonal pattern is intricate and involves several tonal rules.

These factors may also apply in normal verbal conjugations. Adding a verbal suffix (through derivation, not inflexion) creates a new verb stem which falls in the same tone category as the original, and is subject to the same rules.

pronounced as /[pʼɑǃɑmɑ]/ -paqama (L verb stem) ('lie face downwards') ⇒ pronounced as /[hʊpʼɑǃɑmɑ]/ ho paqama [''' _ _ _ _ '''] ('to lie') ⇒ pronounced as /[hʊpʼɑǃɑmisɑ]/ ho paqamisa [''' _ _ _ _ _ '''] ('to cause to lie') ⇒ pronounced as /[hʊpʼɑǃɑmisuwɑ]/ ho paqamisuwa [''' _ _ _ _ _ _ '''] ('to be caused to lie'), etc.

pronounced as /[ɑɬʊlɑ]/ -ahlola (H verb stem) ('judge') ⇒ pronounced as /[hʊ'ɑɬʊlɑ]/ ho ahlola [''' _ ¯ ¯ _ '''] ('to judge') ⇒ pronounced as /[kʼɑɬʊlɔ]/ kahlolo [''' ¯ ¯ _ '''] ('judgement'), pronounced as /[mʊ'ɑɬuˌdi]/ moahlodi [''' _ ¯ ¯ _ '''] ('judge'), pronounced as /[bʊ'ɑɬuˌdi]/ boahlodi [''' _ ¯ ¯ _ '''] ('state of being a judge')

The tones of the noun prefixes of nouns derived from verbs are independent of the tones of the stem.

Some nouns derived from verbs have idiomatic tonal patterns independent of the original verb stem's tones.

pronounced as /[lʊkʼɑ]/ -loka (L verb stem) ('be sufficient') ⇒ pronounced as /[lʊkʼɛlɑ]/ -lokela ('be sufficient for') ⇒ pronounced as /[tʼʊkʼɛlɔ]/ tokelo ('human right'; irregular tone [''' _ ¯ _ '''] instead of the expected [''' _ _ _ '''])

Several "tonal melodies" may be assigned to certain verbal conjugations based on the desired tense, aspect, and mood (for example, with many verb conjugations the only difference between the indicative mood and the participial sub-mood is one of tone). These are applied before most other rules and may be indicated by a code including the symbols H (high tone), L (low tone), B (verb stem's basic tone), and * (iteratively applying the preceding tone).

For example, applying the (present) "Subjunctive Melody" (HL*H) to the H verb stem pronounced as /[bɔnɑ]/ -bona ('see') and the L verb stem pronounced as /[ʃɛbɑ]/ -sheba ('look at') results in both pronounced as /[kʼɪʃɛbɛ tʼɑ'u]/ ke shebe tau ('so I may look at the lion') and pronounced as /[kʼɪbɔnɛ tʼɑ'u]/ ke bone tau ('so I may see the lion') being pronounced with exactly the same tone pattern [''' ¯ ¯ ¯ _ ¯ '''].

Another way to designate the melodies is to use a standard template of the tense in question and indicate the melody by assigning tones to specific syllables in the resultant word (for example, the final syllable, the subjectival concord, etc.). So for the above example the Subjunctive Melody (actually, present-future subjunctive) may be specified by putting H tones on the first syllable (the subjectival concord's basic tone is ignored), the second syllable, and the final syllable of the word, and putting an explicit L tone on the third syllable (unless the verb is disyllabic, in which case the third syllable is the final syllable and has an H tone), thus preventing HTD.
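
The melody codes can be expanded mechanically. The following Python sketch shows one possible reading of the notation; the function name `expand_melody` is hypothetical, and the sketch assumes that `*` repeats the preceding tone zero or more times (which is what the disyllabic case requires) and that a code contains at most one `*`. Which syllables the melody is laid over (the stem alone, or the whole word) is left to the caller.

```python
from typing import List, Tuple


def expand_melody(code: str, n: int, basic: str) -> List[str]:
    """Expand a melody code such as "HL*H" over n syllables.

    H and L are fixed tones, B stands for the stem's basic tone, and
    * repeats the preceding tone as often as needed to fill n syllables.
    """
    tokens: List[Tuple[str, bool]] = []          # (tone, is_repeated)
    for symbol in code:
        if symbol == "*":
            if not tokens:
                raise ValueError("'*' must follow a tone")
            tokens[-1] = (tokens[-1][0], True)
        elif symbol in "HLB":
            tokens.append((basic if symbol == "B" else symbol, False))
        else:
            raise ValueError(f"unknown symbol {symbol!r}")

    fixed = sum(1 for _, repeated in tokens if not repeated)
    stars = len(tokens) - fixed
    if stars > 1:
        raise ValueError("this sketch handles at most one '*'")
    if n < fixed or (stars == 0 and n != fixed):
        raise ValueError("melody does not fit the syllable count")

    fill = n - fixed                              # syllables left for the repeated tone
    tones: List[str] = []
    for tone, repeated in tokens:
        tones.extend([tone] * fill if repeated else [tone])
    return tones


print(expand_melody("HL*H", 2, "L"))  # ['H', 'H']
print(expand_melody("HL*H", 4, "H"))  # ['H', 'L', 'L', 'H']
```

For a disyllabic stem HL*H yields two high tones regardless of the stem's basic tone, which agrees with ke shebe and ke bone surfacing with the same pattern.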

Tonal rules

Sesotho is a grammatical tone language; this means that words may be pronounced with varying tonal patterns depending on their particular function in a sentence. Another interpretation is that the tones of the language interact in their own intricate "tonal grammar."

In order to create certain grammatical constructs, certain tonal rules may be used to modify the underlying tones of the word to create their surface tones. The words are then spoken using the surface tones.

This system is naturally somewhat complex. Indeed, the development of autosegmental phonology was largely motivated by the need for a satisfactory theoretical framework to deal with the tonal grammars of Niger–Congo languages. This article attempts to explain certain aspects of Sesotho tonology in a rule-based autosegmental framework.

The rules presented below are almost exclusively used in constructing the verbal complex as this is the part of speech most radically affected by the tonal grammar.

About autosegmental phonology

Autosegmental phonology was motivated by the need to represent properties which seem to span several "segments" (in our case, syllables) and seem to be somewhat independent of them. Underlyingly (that is, in the speaker's lexicon), some, but not necessarily all, of the segments of morphemes are associated with one or more properties. The segments are on one "tier" and their properties are on another, and the relationships between the two are indicated by joining them with association lines as follows:

Each of the rules changes the associations in some way. For example, High Tone Doubling (HTD) causes the underlying H tone on the first syllable of the verb to also be linked to the syllable immediately to the right:

In this article, the application of several rules in succession will be indicated with the following abbreviation:

The fact that the line emanating from the second syllable is only linked on the HTD line means that this is the first time that syllable is associated with that property.

Typology

One popular classification of tonal Bantu languages broadly separates them into two groups: shifting languages and spreading languages. The Sotho–Tswana languages are bounded spreading languages as they have primitive rules which directly cause underlying high tones to be associated with (spread to) syllables to the right. The closely related Nguni languages, on the other hand, are unbounded shifting languages as they have primitive rules which directly cause underlying high tones to be moved (shifted to) syllables to the right. The following table presents an informal comparison between the tonal processes found in Sesotho and isiZulu:

[Table: Sesotho and isiZulu tonal effects. Rows: Spread, Shift. Columns: Bounded, Unbounded.]

In the table, a process is unbounded if there is no set limit on the number of syllables over which it may occur. Sesotho has basic bounded spread (High Tone Doubling) and isiZulu has basic unbounded shift. Bounded shift in Sesotho occurs as the cumulative effect of bounded right tone spread (High Tone Doubling) and Left Branch Delinking, while various forms of spreading may occur in isiZulu if the word is very short or has two or more underlying highs.

Some tonal rules

In dealing with verbs, the following rules may be applied at various times:

High Tone Doubling (HTD) causes the H tone found on the first syllable of the verb stem, or on an H toned subjectival concord (whether it is used as part of a verb or a copulative), to be spread to (associated with) the syllable immediately to the right. For example, ("They see" with no direct object; the bullets • are used here to join the parts of single words which would have been written separately in the current disjunctive orthography):

││ HH

HTD

├├ HH
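
As a rough illustration of HTD, the Python sketch below represents a verbal complex as a list with one entry per syllable, `"H"` for a linked high tone and `None` for a null tone. This representation, the function name `high_tone_doubling` and the `sources` parameter (the positions of the subjectival concord and of the stem's first syllable) are assumptions made for the example only.

```python
from typing import List, Optional

Tone = Optional[str]   # "H" for a linked high tone, None for null (ø)


def high_tone_doubling(tones: List[Tone], sources: List[int]) -> List[Tone]:
    """Link each H found at an index in `sources` to the next syllable too.

    Only the original tones are consulted, so a newly spread H does not
    spread again: the rule is bounded to one syllable.
    """
    result = list(tones)
    for i in sources:
        if tones[i] == "H" and i + 1 < len(tones):
            result[i + 1] = "H"
    return result


# H concord, null syllable, H stem syllable, null syllable
print(high_tone_doubling(["H", None, "H", None], sources=[0, 2]))
# ['H', 'H', 'H', 'H']
```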

Iterative Tone Spread (ITS) causes the H tone found on the first syllable of the verb stem to be spread repeatedly to the right until the end of the verb complex. This rule is only applied in certain situations (such as when forming the perfect). For example, ("I have bought for..." with two direct objects):

│ H

ITS

├ H
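
ITS differs from HTD only in being unbounded. A minimal Python sketch, using the same assumed list representation (`"H"` or `None` per syllable) and a hypothetical `stem_start` index marking the first syllable of the verb stem:

```python
from typing import List, Optional

Tone = Optional[str]   # "H" for a linked high tone, None for null (ø)


def iterative_tone_spread(tones: List[Tone], stem_start: int) -> List[Tone]:
    """Spread the H on the stem's first syllable to the end of the complex."""
    result = list(tones)
    if tones[stem_start] == "H":
        for i in range(stem_start + 1, len(tones)):
            result[i] = "H"
    return result


# null concord followed by a four-syllable H stem
print(iterative_tone_spread([None, "H", None, None, None], stem_start=1))
# [None, 'H', 'H', 'H', 'H']
```

An L stem has no H on its first syllable, so the sketch leaves it unchanged.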

Right Branch Delinking (RBD) is an application of the obligatory contour principle which causes an H tone spread from a subjectival concord to a verbal auxiliary infix or objectival concord immediately to the left of the verb stem to be removed (delinked) if the verb stem is an H stem. For example, ("They see"):

├├─┘ HH

RBD

│├─┘ HH

Left Branch Delinking (LBD) is an application of the "obligatory" contour principle which causes the H tone on the first syllable of an H verb stem to be delinked if the stem immediately follows an H toned subjectival concord, resulting in tonal pattern (HøH). This rule is idiolectal and is not applied by all Sesotho speakers.^[5] For example, ("They see..." when used with a direct object):

│─┘ HH

LBD

│┌─┘ HH
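
The combined effect of HTD followed by LBD can be sketched in Python as below. The list representation (`"H"` or `None` per syllable), the function name `left_branch_delinking` and the `stem_start` index are illustrative assumptions, and the caller is assumed to have checked that the syllable before the stem really is a subjectival concord.

```python
from typing import List, Optional

Tone = Optional[str]   # "H" for a linked high tone, None for null (ø)


def left_branch_delinking(tones: List[Tone], stem_start: int) -> List[Tone]:
    """Delink the H on the stem's first syllable when an H syllable precedes it."""
    result = list(tones)
    if (stem_start >= 1
            and tones[stem_start - 1] == "H"
            and tones[stem_start] == "H"):
        result[stem_start] = None
    return result


# H concord + disyllabic H stem whose H has already been doubled by HTD
print(left_branch_delinking(["H", "H", "H"], stem_start=1))
# ['H', None, 'H']
```

The stem's high tone now surfaces only on the stem's second syllable, one position to the right of where it started, which is the bounded shift that arises from HTD plus LBD.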

The Finality Restriction (FR) causes any H tones spread to the final syllable of the verb complex to be removed. This rule is not applied under all circumstances, and is never applied if the verb's stem is monosyllabic (that is, it never delinks the H tone on the verb stem's first syllable). It is also never applied when the verb is immediately followed by a direct object (therefore it doesn't undo ITS, or the high tone copied to a disyllabic H verb's last syllable if it is immediately followed by an object).^[6] For example, ("I love" with no direct object):

├ H

│ H

Low Tone Assignment (LTA) is the very last rule applied and is always applied in all circumstances (not just when dealing with verbs). It simply assigns an L tone to all unlinked segments (that is, segments with null tone). For example, ("She is looking on behalf of" with two direct objects):

├──┘ H

LTA

├──┘ HLL
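
In code, LTA amounts to replacing every remaining null tone with L. A minimal Python sketch, assuming the same list representation used for illustration in this article's other examples (`"H"` for a linked high tone, `None` for a null tone); the function name is hypothetical:

```python
from typing import List, Optional


def low_tone_assignment(tones: List[Optional[str]]) -> List[str]:
    """Give every syllable that is still unlinked (None) a low tone."""
    return ["L" if tone is None else tone for tone in tones]


print(low_tone_assignment(["H", None, None]))  # ['H', 'L', 'L']
```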

Some examples

To construct many verb forms, including many positive indicative tenses without direct objects as well as infinitives, the following rules are applied in order:

Note that the three main levels are always applied in this order, though the actual rules contained in the levels will change depending on the parts of speech, verb moods, etc. For the word pronounced as /[ʊ'ɑbinɑ]/ o a bina ('she is singing') the application of the rules is as follows:

The word appears on the surface with tonal pattern [''' ¯ _ ¯ _ '''].

Furthermore, the second last syllable of the word is lengthened (or "stressed"), and the interaction of the tones as well as the penultimate lengthening results in the word being pronounced with pitch levels .

Extending the word by one syllable (pronounced as /[ʊ'ɑbin̩t͡sʰɑ]/ o a bintsha 'She is conducting'):

The word appears on the surface with tonal pattern [''' ¯ _ ¯ ¯ _ '''] (the high beneath the third syllable is associated with two syllables).

The second last syllable of the word is lengthened and the interaction of the tones as well as the penultimate lengthening results in the word being pronounced with pitch levels .
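
The two surface patterns above can be reproduced with a short Python sketch. Several things in it are assumptions rather than statements from this article: the `Syllable` class and all function names are hypothetical, the rule order HTD, RBD, FR, LTA is inferred from the rule descriptions, and the underlying tones (an H subjectival concord o, a null infix a, and H verb stems -bina and -bintsha) are chosen because they yield the patterns given above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Syllable:
    text: str
    role: str                     # "sc" (subjectival concord), "infix" or "stem"
    tone: Optional[str] = None    # "H" or None (null, ø)
    spread: bool = False          # True when the H was acquired by spreading


def stem_start(word: List[Syllable]) -> int:
    """Index of the first syllable of the verb stem."""
    return next(i for i, s in enumerate(word) if s.role == "stem")


def htd(word: List[Syllable]) -> None:
    """High Tone Doubling: copy an underlying H on the concord or on the
    stem's first syllable onto the next syllable if that one is null."""
    first = stem_start(word)
    sources = [i for i, s in enumerate(word)
               if s.tone == "H" and not s.spread
               and (s.role == "sc" or i == first)]
    for i in sources:
        if i + 1 < len(word) and word[i + 1].tone is None:
            word[i + 1].tone, word[i + 1].spread = "H", True


def rbd(word: List[Syllable]) -> None:
    """Right Branch Delinking: before an H stem, remove an H that was
    spread onto the syllable immediately to the left of the stem."""
    first = stem_start(word)
    h_stem = word[first].tone == "H" and not word[first].spread
    if first >= 2 and h_stem and word[first - 1].spread:
        word[first - 1].tone, word[first - 1].spread = None, False


def fr(word: List[Syllable]) -> None:
    """Finality Restriction: remove an H spread onto the final syllable
    (never applied to monosyllabic stems)."""
    stem_length = sum(1 for s in word if s.role == "stem")
    if stem_length > 1 and word[-1].spread:
        word[-1].tone, word[-1].spread = None, False


def derive(word: List[Syllable]) -> List[str]:
    """Apply HTD, RBD and FR in turn, then Low Tone Assignment."""
    for rule in (htd, rbd, fr):
        rule(word)
    return ["H" if s.tone == "H" else "L" for s in word]


o_a_bina = [Syllable("o", "sc", "H"), Syllable("a", "infix"),
            Syllable("bi", "stem", "H"), Syllable("na", "stem")]
o_a_bintsha = [Syllable("o", "sc", "H"), Syllable("a", "infix"),
               Syllable("bi", "stem", "H"), Syllable("n", "stem"),
               Syllable("tsha", "stem")]

print(derive(o_a_bina))      # ['H', 'L', 'H', 'L']
print(derive(o_a_bintsha))   # ['H', 'L', 'H', 'H', 'L']
```

In the longer word the doubled high tone lands on the syllabic nasal rather than on the final syllable, so the Finality Restriction has nothing to remove and the stem's high tone stays linked to two syllables.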

Notes

^[1] That is, the tone-bearing unit (TBU) is basically the syllable. In general, to include languages with long vowels, one may say that the TBU of Bantu languages is the mora, and indeed, when dealing with stressed syllables, many descriptions of Sesotho tonology treat the TBU as the mora (that is, a long stressed syllable is analysed as two moras with different tones), but this is really unnecessary.
^[2] One could just as easily say that there are three underlying tone types, high (H [''' ¯ ''']), low (L [''' _ ''']), and null (ø), and indeed many authors and researchers do. However, all tonal rules work by manipulating only high tones, so each syllable may be either attached to a high tone (H) or not attached at all (ø). A three-tone model would at least require a rule that works exclusively on the L tones.
^[3] Doke & Vilakazi cite nine pitch levels (not counting contours) for isiZulu, while admitting that they may have overlooked some factors which could have superficially increased the actual number. Subsequent work on isiZulu tonology and depressor (breathy voiced) consonants suggests that the language, like Sesotho, may be fully described with only two or three basic tonemes.
^[4] The number may increase or decrease depending on how one counts, but there are only two contrastive tonemes in the language. The enumeration may be further complicated by considering the effects of downdrift and downstep.
^[5] There are numerous examples in rule-based linguistic models (such as autosegmental phonology) where the OCP is broken or only applied under some circumstances. For example, the fact that HTD causes the first two syllables of an H verb stem to be high is yet another "violation" of the OCP. Some Bantu languages also have a "Plateau rule" which changes tone pattern (HøH) to (HHH), a process which actively creates a sequence that "violates" the OCP.
^[6] In a nutshell (under syntactic and/or Optimality Domains Theory) the Finality Restriction prevents a high tone from being spread to the last syllable of the "Prosodic phrase" (though an underlying phrase-final high tone will be left alone). See Sesotho deficient verbs for a fuller explanation.

References

Cassimjee, F. and Kisseberth, C. W. 1998. Optimality Domains Theory and Bantu tonology: a case study from isiXhosa and Shingazidja. In Hyman L. M. & Kisseberth, C. W. (eds.), Theoretical Aspects of Bantu Tone, pp.33–132. Stanford, Calif.: CSLI.
Demuth, K. 1992. Acquisition of Sesotho. In D. Slobin (ed.), The Cross-Linguistic Study of Language Acquisition.
Demuth, K. 1995. Problems in the acquisition of tonal systems. In J. Archibald (ed.), The Acquisition of Non-Linear Phonology, pp.111–134. Hillsdale, N. J.: Lawrence Erlbaum Associates.
Demuth, K. In press. Sesotho speech acquisition. In S. McLeod (ed), The international guide to speech acquisition, pp.526–538. Clifton Park, NY: Thomas Delmar Learning.
Doke, C. M., and Mofokeng, S. M. 1974. Textbook of Southern Sotho Grammar. Cape Town: Longman Southern Africa, 3rd impression.
Doke, C.M., and Vilakazi, B.W. 1948. Zulu-English Dictionary. Johannesburg: Witwatersrand University Press. As cited in Schadeberg 1981.
Tucker, A. N. 1949. Sotho-Nguni orthography and tone marking. Bulletin of the School of Oriental and African Studies, University of London, Vol. 13, No. 1, pp.200–224.
Schadeberg, T.C. 1981. Tone in South African Bantu Dictionaries. In Journal of African Languages and Linguistics 3, pp.175–180.
Zerbian, S. 2006. High Tone Spread in the Sotho Verb. In J. Mugane et al. (eds.), Selected proceedings of the 35th Annual Conference on African Linguistics, pp.147–157. Somerville, MA: Cascadilla Proceedings Project.