Hindustani orthography explained

Hindustani (standardized Hindi and standardized Urdu) has been written in several scripts. Most Hindi texts are written in the Devanagari script, which is derived from the Brāhmī script of ancient India. Most Urdu texts are written in the Urdu alphabet, which is derived from the Persian alphabet. In recent years, the Latin script has also been used for both languages for technological or internationalization reasons. Historically, the Kaithi script has also been used.

Devanagari script

Consonants

क	ख	ग	घ	ङ
/k/	/kʰ/	/ɡ/	/ɡʱ/	/ŋ/
च	छ	ज	झ	ञ
/t͡ʃ/	/t͡ʃʰ/	/d͡ʒ/	/d͡ʒʱ/	/ɲ/
ट	ठ	ड	ढ	ण
/ʈ/	/ʈʰ/	/ɖ/	/ɖʱ/	/ɳ/
त	थ	द	ध	न
/t̪/	/t̪ʰ/	/d̪/	/d̪ʱ/	/n/
प	फ	ब	भ	म
/p/	/pʰ/	/b/	/bʱ/	/m/
य	र	ल	व
/j/	/ɾ/	/l/	/ʋ/
श	ष	स	ह
/ʃ/	/ʂ/	/s/	/ɦ/

Modified consonants for non-native phonemes

Irregular ligatures:

क्ष (क and ष) kṣ is pronounced /kʂ/ or /x/ (irregular shape, regular pronunciation)
त्र (त and र) tr is pronounced /t̪ɾ/ (irregular shape, regular pronunciation)
ज्ञ (ज and ञ) jñ is pronounced /ɡj/ (irregular shape and pronunciation)
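
In Unicode text, the irregular shape of these ligatures comes from font rendering rather than from a dedicated character: each one is stored as the first consonant, the virama sign (U+094D), and the second consonant. The short Python sketch below lists the code points behind each ligature. The helper name `describe` is hypothetical and chosen only for this illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode character name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The three ligatures listed above, written as explicit code point sequences
# (consonant + virama + consonant) so the composition is visible in the source.
LIGATURES = {
    "kṣ": "\u0915\u094d\u0937",  # क + ् + ष
    "tr": "\u0924\u094d\u0930",  # त + ् + र
    "jñ": "\u091c\u094d\u091e",  # ज + ् + ञ
}

for roman, conjunct in LIGATURES.items():
    print(roman, conjunct)
    for code_point, name in describe(conjunct):
        print("   ", code_point, name)
```

Each ligature prints as three code points, with DEVANAGARI SIGN VIRAMA in the middle.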

Schwa deletion

The schwa (अ or 'ə', sometimes written 'a') implicit in each consonant of the Devanagari script is "obligatorily deleted" in Hindi at the end of words and in certain other contexts. This phenomenon has been termed the "schwa syncope rule" or the "schwa deletion rule" of Hindi. One formalization of this rule has been summarized as ə -> ø | VC_CV. In other words, when a vowel-preceded consonant is followed by a vowel-succeeded consonant, the schwa inherent in the first consonant is deleted. However, this formalization is inexact and incomplete: it sometimes deletes a schwa that should be kept and sometimes fails to delete one that should be dropped. Schwa deletion is computationally important because handling it correctly is essential to building text-to-speech software for Hindi.

As a result of schwa syncope, the correct Hindi pronunciation of many words differs from that expected from a literal rendering of Devanagari. For instance, राम is Rām (incorrect: Rāma), रचना is Rachnā (incorrect: Rachanā), वेद is Véd (incorrect: Véda) and नमकीन is Namkeen (incorrect: Namakeena).
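
The VC_CV rule and word-final deletion can be sketched in a few lines of Python. This is a minimal illustration, not a complete schwa-deletion algorithm. It assumes the word has already been split into romanized tokens in which "a" stands for the inherent schwa, and it scans from right to left so that an earlier deletion can block a later one. The token spelling, the vowel inventory, the scan direction and the function name `delete_schwas` are all assumptions of this sketch. As noted above, the rule itself is inexact, so the output will be wrong for some words.

```python
# Ad hoc vowel inventory for the romanized tokens used in this sketch.
VOWELS = {"a", "ā", "i", "ee", "u", "ū", "e", "ai", "o", "au"}
SCHWA = "a"

def delete_schwas(tokens):
    """Apply word-final schwa deletion, then the VC_CV rule, to a token list."""
    out = list(tokens)

    # Word-final schwa is dropped (e.g. r-ā-m-a becomes r-ā-m).
    if len(out) > 1 and out[-1] == SCHWA:
        out.pop()

    # Scan right to left; position i needs two tokens on each side.
    i = len(out) - 3
    while i >= 2:
        if (out[i] == SCHWA
                and out[i - 2] in VOWELS          # V
                and out[i - 1] not in VOWELS      # C
                and out[i + 1] not in VOWELS      # C
                and out[i + 2] in VOWELS):        # V
            del out[i]
        i -= 1
    return out

# Literal (schwa-full) renderings of the four example words.
examples = {
    "राम": ["r", "ā", "m", "a"],
    "रचना": ["r", "a", "ch", "a", "n", "ā"],
    "वेद": ["v", "e", "d", "a"],
    "नमकीन": ["n", "a", "m", "a", "k", "ee", "n", "a"],
}

for word, tokens in examples.items():
    print(word, "".join(tokens), "->", "".join(delete_schwas(tokens)))
# rāma -> rām, rachanā -> rachnā, veda -> ved, namakeena -> namkeen
```

The first schwa of a word is never deleted here because it has no vowel-consonant context to its left, which is why rachnā and namkeen keep their first "a".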

Persian script

See main article: Urdu alphabet.

See also: Nastaʿlīq script.

The Urdu alphabet is based on the Persian alphabet, which is itself derived from the Arabic alphabet. Urdu is written from right to left, and most letters link together, so the form of a letter varies with its position in a word. Most vowels are omitted in ordinary texts, although they may be written for disambiguation or for pedagogical purposes. Urdu is primarily written in a calligraphic style of the script called Nasta'liq.

Letter	Name of letter	Transcription	IPA
ا	alif	a, i, u	/ə/, /ɪ/, /ʊ/
آ	alif madda	ā	/ɑː/
ب	be	b	/b/
پ	pe	p	/p/
ت	te	t	/t̪/
ٹ	ṭe	ṭ	/ʈ/
ث	se	s	/s/
ج	jīm	j	/d͡ʒ/
چ	che	ch	/t͡ʃ/
ح	baṛī he	h	/h/
خ	khe	kh	/x/
د	dāl	d	/d̪/
ڈ	ḍāl	ḍ	/ɖ/
ذ	zāl	dh	/z/
ر	re	r	/r/
ڑ	ṛe	ṛ	/ɽ/
ز	ze	z	/z/
ژ	zhe	zh	/ʒ/
س	sīn	s	/s/
ش	shīn	sh	/ʃ/
ص	su'ād	ṣ	/s/
ض	zu'ād	z̤	/z/
ط	to'e	t	/t/
ظ	zo'e	ẓ	/z/
ع	‘ain	'	/ʔ/ ^[1]
غ	ghain	gh	/ɣ/
ف	fe	f	/f/
ق	qāf	q	/q/
ک	kāf	k	/k/
گ	gāf	g	/ɡ/
ل	lām	l	/l/
م	mīm	m	/m/
ن	nūn	n	/n/
ں	nūn ghunna	n	/~/
و	vā'o	v, o, or ū	/ʋ/, /oː/, /ɔ/ or /uː/
ہ	choṭī he	h	/h/
ھ	do chashmī he	h	/ʰ/
ء	hamza	'	/ʔ/
ی	ye	y, i	/j/ or /iː/
ے	baṛī ye	ai or e	/ɛː/ or /eː/

Latin script

See main article: Roman Urdu and Devanagari transliteration.

The Latin script has been used to write Hindustani for technological or internationalization reasons. Roman Hindi and Roman Urdu use the basic Latin alphabet. They are most commonly used by young native speakers in technological applications such as chat, email and SMS.

The ITRANS, ISCII, IAST (and the near-identical ISO 15919), and Harvard-Kyoto romanization schemes are used primarily by non-native speakers who are more familiar with the Latin alphabet.

Transliteration

Transliteration between the three scripts can be complicated, particularly between the Devanagari and Persian scripts.^[2] One obstacle is that several different letters in one script often correspond to a single letter in the other.^[2] As a result, simple letter-by-letter substitution often does not produce the correct spelling.

Urdu letters with similar sounds

Some sets of Urdu letters have matching sounds.^[2] ^[3]

4 letters, ز ذ ض ظ, are all ≈ Z ^[2] ^[3]
3 letters, س ص ث, are all ≈ S ^[2] ^[3]
2 letters, ت ط, are both ≈ T ^[2] ^[3] (a third letter, ٹ, is also often shown as English T but differs from the other two; see retroflex consonants below)
2 letters, ہ ح, are both ≈ H ^[3] but are sometimes regarded as distinct
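
The letter groups above show why simple substitution fails in the reverse direction. The Python sketch below encodes only those four groups as a many-to-one table, inverts it, and counts how many Urdu spellings a single Latin string could stand for. The table covers just the letters listed here, and the names `URDU_TO_LATIN`, `invert` and `count_candidate_spellings` are hypothetical. A practical converter would need further information, such as a word list, to pick the right candidate.

```python
from collections import defaultdict
from math import prod

# Many-to-one table built only from the four letter groups listed above.
URDU_TO_LATIN = {
    "\u0632": "z",  # ze
    "\u0630": "z",  # zāl
    "\u0636": "z",  # zu'ād
    "\u0638": "z",  # zo'e
    "\u0633": "s",  # sīn
    "\u0635": "s",  # su'ād
    "\u062b": "s",  # se
    "\u062a": "t",  # te
    "\u0637": "t",  # to'e
    "\u06c1": "h",  # choṭī he
    "\u062d": "h",  # baṛī he
}

def invert(mapping):
    """Turn a many-to-one mapping into a one-to-many mapping."""
    inverse = defaultdict(list)
    for urdu, latin in mapping.items():
        inverse[latin].append(urdu)
    return dict(inverse)

LATIN_TO_URDU = invert(URDU_TO_LATIN)

def count_candidate_spellings(latin_word):
    """Count the Urdu spellings that substitution alone cannot tell apart.

    Letters outside the table are treated as having exactly one spelling.
    """
    return prod(len(LATIN_TO_URDU.get(ch, [ch])) for ch in latin_word)

print({latin: len(letters) for latin, letters in LATIN_TO_URDU.items()})
# {'z': 4, 's': 3, 't': 2, 'h': 2}
print(count_candidate_spellings("zst"))  # 4 * 3 * 2 = 24
```

Going from Urdu to Latin is a plain dictionary lookup, but going back multiplies the options at every ambiguous letter.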

Braille script

See main article: Hindi Braille and Urdu Braille.

Three braille alphabets are used: Hindi Braille and Urdu Braille in India, both based on Bharati braille conventions, and Urdu Braille in Pakistan, based on Persian Braille conventions. Hindi Braille is an alphabet in which the vowel a is not written in some environments, while in the Urdu Braille of Pakistan it seems that vowels may be optional, as they are in print.

Notes and References

1. "Urdu Phonetic Inventory". www.cle.org.pk. Center for Language Engineering. Retrieved 19 May 2020.
2. Jawaid, Bushra; Ahmed, Tafseer (2009). "Hindi to Urdu Conversion: Beyond Simple Transliteration". Proceedings of the Conference on Language & Technology 2009. Retrieved 19 May 2020.
3. Bhatia, Tej K.; Koul, Ashok (27 August 2015). Colloquial Urdu: The Complete Course for Beginners. Routledge. Retrieved 19 May 2020.