ISO basic Latin alphabet

The ISO basic Latin alphabet is an international standard (beginning with ISO/IEC 646) for a Latin-script alphabet consisting of two sets of 26 letters, one uppercase and one lowercase. It is codified in various national and international standards and widely used in international communication. Its letters are the same as those of the current English alphabet and, since medieval times, of the Latin alphabet in its modern form. Their order also matters, since it defines alphabetical order when sorting words.

The two sets contain the following 26 letters each:^[1]

ISO basic Latin alphabet
Uppercase letter set	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O	P	Q	R	S	T	U	V	W	X	Y	Z
Lowercase letter set	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o	p	q	r	s	t	u	v	w	x	y	z

History

By the 1960s it had become apparent to the computer and telecommunications industries in the First World that a non-proprietary method of encoding characters was needed. The International Organization for Standardization (ISO) encapsulated the Latin script in its 7-bit character-encoding standard, ISO/IEC 646. To gain widespread acceptance, the standard was based on popular usage: it followed the already published American Standard Code for Information Interchange, better known as ASCII, whose character set included the 26 × 2 letters of the English alphabet. Later ISO standards, such as ISO/IEC 8859 (8-bit character encoding) and ISO/IEC 10646 (Unicode Latin), have continued to define these 26 × 2 letters as the basic Latin script, with extensions for letters used in other languages.^[1]

Terminology

See main article: Basic Latin (Unicode block).

In Unicode, the alphabet is in the Basic Latin block, whose code chart is titled "C0 Controls and Basic Latin". The chart lists the letters under two subheadings:^[2]

"Uppercase Latin alphabet": the letters occupy U+0041 to U+005A, and their character names begin with LATIN CAPITAL LETTER
"Lowercase Latin alphabet": the letters occupy U+0061 to U+007A, and their character names begin with LATIN SMALL LETTER

The Halfwidth and Fullwidth Forms block contains two further sets:^[3]

Uppercase: the letters occupy U+FF21 to U+FF3A, and their character names begin with FULLWIDTH LATIN CAPITAL LETTER
Lowercase: the letters occupy U+FF41 to U+FF5A, and their character names begin with FULLWIDTH LATIN SMALL LETTER
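The naming pattern and the fixed distance between the two sets of letters can be checked with Python's standard `unicodedata` module. The sketch below is illustrative only: the helper names (`is_basic_latin`, `is_fullwidth_latin`, `to_fullwidth`, `from_fullwidth`) and the constant `FULLWIDTH_OFFSET` are hypothetical, and the offset 0xFEE0 is simply U+FF21 minus U+0041 (equivalently U+FF41 minus U+0061).

```python
import unicodedata

# Hypothetical constant: distance from a basic Latin letter to its fullwidth form.
# 0xFF21 - 0x41 == 0xFF41 - 0x61 == 0xFEE0
FULLWIDTH_OFFSET = 0xFF21 - 0x41


def is_basic_latin(ch: str) -> bool:
    """True for the 52 letters U+0041-U+005A and U+0061-U+007A."""
    return "A" <= ch <= "Z" or "a" <= ch <= "z"


def is_fullwidth_latin(ch: str) -> bool:
    """True for the 52 letters U+FF21-U+FF3A and U+FF41-U+FF5A."""
    return "\uff21" <= ch <= "\uff3a" or "\uff41" <= ch <= "\uff5a"


def to_fullwidth(text: str) -> str:
    """Shift basic Latin letters to their fullwidth forms; other characters pass through."""
    return "".join(chr(ord(c) + FULLWIDTH_OFFSET) if is_basic_latin(c) else c for c in text)


def from_fullwidth(text: str) -> str:
    """Shift fullwidth Latin letters back to basic Latin; other characters pass through."""
    return "".join(chr(ord(c) - FULLWIDTH_OFFSET) if is_fullwidth_latin(c) else c for c in text)


for ch in ("A", "z", to_fullwidth("A"), to_fullwidth("z")):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+0041  LATIN CAPITAL LETTER A
# U+007A  LATIN SMALL LETTER Z
# U+FF21  FULLWIDTH LATIN CAPITAL LETTER A
# U+FF5A  FULLWIDTH LATIN SMALL LETTER Z
```

Unicode compatibility normalization (`unicodedata.normalize("NFKC", ...)`) also folds fullwidth letters back to basic Latin; a hand-written mapping like this one is mainly useful when only the letters, and nothing else, should be converted.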

Timeline for encoding standards

1865: International Morse Code was standardized at the International Telegraphy Congress in Paris, and was later adopted as the standard of the International Telecommunication Union (ITU)
1950s: Radiotelephony Spelling Alphabet by the International Civil Aviation Organization (ICAO)^[4]

Timeline for widely used computer codes supporting the alphabet

1963: ASCII (7-bit character-encoding standard from the American Standards Association, which became the American National Standards Institute in 1969)
1963/1964: EBCDIC (developed by IBM and supporting the same alphabetic characters as ASCII, but with different code values)
1965-04-30: ECMA-6 (7-bit coded character set) adopted by the ECMA General Assembly, based on work that ECMA's Technical Committee TC1 had carried out since December 1960.^[5]
1972: ISO 646 (ISO 7-bit character-encoding standard, using the same alphabetic code values as ASCII, revised in second edition ISO 646:1983 and third edition ISO/IEC 646:1991 as a joint ISO/IEC standard)
1983: ITU-T Rec. T.51 | ISO/IEC 6937 (a multi-byte extension of ASCII)
1987: ISO/IEC 8859-1:1987 (8-bit character encoding)
- Subsequently, other versions and parts of ISO/IEC 8859 have been published.
Mid-to-late 1980s: Windows-1250, Windows-1252, and other encodings used in Microsoft Windows (some roughly similar to ISO/IEC 8859-1; Windows-1252, for example, differs from it only in the range 80 to 9F hexadecimal)
1990: Unicode 1.0 (developed by the Unicode Consortium),^[6] ^[7] with the letters placed in the block "C0 Controls and Basic Latin" at the same code values as in ASCII and ISO/IEC 646
- Subsequent versions of Unicode have been published, and its character set was also published as a joint ISO/IEC standard, listed below.
1993: ISO/IEC 10646-1:1993, the ISO/IEC standard corresponding to Unicode 1.1
- Subsequently, other versions of ISO/IEC 10646-1 and one of ISO/IEC 10646-2 have been published. Since 2003, the standards have been published under the name "ISO/IEC 10646" without the separation into two parts.
1997: Windows Glyph List 4
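The difference between ASCII and EBCDIC code values can be seen with the EBCDIC codec that ships with Python. EBCDIC exists in many code pages; this sketch assumes code page 037 (`cp037`, IBM's US/Canada variant) as a representative, and the helper names `code_values` and `is_contiguous` are hypothetical. Besides using different values, EBCDIC places the letters in three separate runs instead of one contiguous range.

```python
import string


def code_values(letters: str, encoding: str) -> list[int]:
    """Return the single-byte code value of each letter in the given encoding."""
    return [letter.encode(encoding)[0] for letter in letters]


def is_contiguous(codes: list[int]) -> bool:
    """True if each code is exactly one more than the previous one."""
    return all(b - a == 1 for a, b in zip(codes, codes[1:]))


ascii_upper = code_values(string.ascii_uppercase, "ascii")
ebcdic_upper = code_values(string.ascii_uppercase, "cp037")  # assumed: EBCDIC code page 037

print(is_contiguous(ascii_upper))   # True: one run, 41 to 5A
print(is_contiguous(ebcdic_upper))  # False: runs A-I, J-R, S-Z
print(" ".join(f"{c:02X}" for c in ebcdic_upper))
# C1 C2 C3 C4 C5 C6 C7 C8 C9 D1 D2 D3 D4 D5 D6 D7 D8 D9 E2 E3 E4 E5 E6 E7 E8 E9
```

The gaps after I and R are why code written for EBCDIC cannot rely on simple range arithmetic the way ASCII-based code can.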

Representation

In ASCII the letters are among the printable characters, and in Unicode (since version 1.0) they belong to the block "C0 Controls and Basic Latin". In both, as well as in ISO/IEC 646, ISO/IEC 8859 and ISO/IEC 10646, they occupy the hexadecimal positions 41 to 5A for uppercase and 61 to 7A for lowercase, so each lowercase letter lies exactly 20 hexadecimal (32 decimal) positions after its uppercase counterpart.
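Because the letters occupy two fixed ranges, a membership test and a case conversion need only integer operations. A minimal Python sketch relying only on the hexadecimal positions above (the function names are hypothetical): the difference of 20 hexadecimal between the cases is a single bit, so toggling that bit swaps case.

```python
def is_basic_latin_letter(ch: str) -> bool:
    """True if ch is one of the 52 letters at hex 41-5A or 61-7A."""
    code = ord(ch)
    return 0x41 <= code <= 0x5A or 0x61 <= code <= 0x7A


def swap_case(ch: str) -> str:
    """Swap the case of a basic Latin letter by toggling bit 0x20; leave anything else unchanged."""
    if is_basic_latin_letter(ch):
        return chr(ord(ch) ^ 0x20)
    return ch


print("".join(swap_case(c) for c in "ISO basic Latin"))  # iso BASIC lATIN
```

Python's built-in `str.isalpha()` is broader than this test: it also accepts letters such as ⟨é⟩ or ⟨ß⟩, which lie outside the ISO basic Latin alphabet.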

Since neither system distinguishes case, every letter, uppercase or lowercase, has a code word in the ICAO spelling alphabet and can be represented in Morse code.
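The correspondence can be written as two lookup tables keyed by uppercase letter, with lowercase input folded to uppercase first. This is a Python sketch: the table names and the `spell` helper are hypothetical, and the tables simply list the ICAO code words and the International Morse Code signs for A to Z in alphabetical order.

```python
import string

# ICAO radiotelephony code words for A-Z, in alphabetical order.
ICAO_WORDS = [
    "Alfa", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf", "Hotel",
    "India", "Juliett", "Kilo", "Lima", "Mike", "November", "Oscar", "Papa",
    "Quebec", "Romeo", "Sierra", "Tango", "Uniform", "Victor", "Whiskey",
    "X-ray", "Yankee", "Zulu",
]

# International Morse Code signs for A-Z, in alphabetical order.
MORSE_SIGNS = [
    ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..", ".---",
    "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-",
    "..-", "...-", ".--", "-..-", "-.--", "--..",
]

ICAO = dict(zip(string.ascii_uppercase, ICAO_WORDS))
MORSE = dict(zip(string.ascii_uppercase, MORSE_SIGNS))


def spell(text: str) -> list[tuple[str, str, str]]:
    """Return (letter, ICAO word, Morse sign) for each basic Latin letter in text.

    Lowercase input is folded to uppercase, since neither system has case.
    Characters outside A-Z/a-z (digits, spaces, accented letters) are skipped.
    """
    return [
        (c.upper(), ICAO[c.upper()], MORSE[c.upper()])
        for c in text
        if c in string.ascii_letters
    ]


for letter, word, sign in spell("Sos"):
    print(letter, word, sign)
# S Sierra ...
# O Oscar ---
# S Sierra ...
```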

Usage

All of the lowercase letters are used in the International Phonetic Alphabet (IPA). In X-SAMPA and SAMPA these letters have the same sound value as in IPA.

Alphabets containing the same set of letters

See also: List of Latin-script alphabets.

The list below includes only alphabets that use all 26 letters and have none of the following:

letters whose diacritical marks make them distinct letters
multigraphs that constitute distinct letters
ligatures that are distinct letters

Notable omissions due to these rules include Spanish, Esperanto, Filipino and German. The German alphabet is by tradition sometimes considered to contain only 26 letters (with ⟨ä⟩, ⟨ö⟩ and ⟨ü⟩ considered variants, and ⟨ß⟩ considered a ligature of ⟨ſ⟩ (long s) and ⟨s⟩), but current German orthographic rules include ⟨ä⟩, ⟨ö⟩, ⟨ü⟩ and ⟨ß⟩ in the alphabet, placed after ⟨Z⟩. In Spanish orthography, ⟨n⟩ and ⟨ñ⟩ are distinct letters; the tilde is not considered a diacritic in this case.

Alphabet	Letters with diacritics (not constituting distinct letters)	Multigraphs (not constituting distinct letters)	Ligatures
Afrikaans alphabet	á, ä, é, è, ê, ë, í, î, ï, ó, ô, ö, ú, û, ü, ý	Digraphs: ⟨aa⟩, ⟨ai⟩, ⟨ch⟩, ⟨ee⟩, ⟨ei⟩, ⟨eu⟩, ⟨gh⟩, ⟨ie⟩, ⟨nj⟩, ⟨ng⟩, ⟨oe⟩, ⟨oi⟩, ⟨oo⟩, ⟨ou⟩, ⟨sj⟩, ⟨tj⟩, ⟨ts⟩, ⟨ui⟩, ⟨uu⟩; trigraphs: ⟨aai⟩, ⟨eeu⟩, ⟨oei⟩, ⟨ooi⟩	ŉ (n preceded by an apostrophe)
Aragonese alphabet (Academia de l'Aragonés orthography)	á, é, í, ó, ú, ü, lꞏl	⟨ch⟩, ⟨gu⟩, ⟨ll⟩, ⟨ny⟩, ⟨qu⟩, ⟨rr⟩, ⟨tz⟩
Catalan alphabet	à, é, è, í, ï, ó, ò, ú, ü, ç, lꞏl	⟨gu⟩, ⟨ig⟩, ⟨ix⟩, ⟨ll⟩, ⟨ny⟩, ⟨qu⟩, ⟨rr⟩, ⟨ss⟩
Dutch alphabet	ä, é, è, ë, ï, ö, ü	The digraph ⟨ij⟩ is sometimes considered to be a separate letter. When that is the case, it usually replaces or is intermixed with ⟨y⟩. Other digraphs: ⟨aa⟩, ⟨ae⟩, ⟨ai⟩, ⟨au⟩, ⟨ch⟩, ⟨ee⟩, ⟨ei⟩, ⟨eu⟩, ⟨ie⟩, ⟨oe⟩, ⟨oi⟩, ⟨oo⟩, ⟨ou⟩, ⟨ui⟩, ⟨uu⟩
English alphabet		⟨sh⟩, ⟨ch⟩, ⟨ea⟩, ⟨ou⟩, ⟨th⟩, ⟨ph⟩, ⟨ng⟩	æ, œ (both archaic)
French alphabet	à, â, ç, é, è, ê, ë, î, ï, ô, ù, û, ü, ÿ	⟨ai⟩, ⟨au⟩, ⟨ei⟩, ⟨eu⟩, ⟨oi⟩, ⟨ou⟩, ⟨eau⟩, ⟨ch⟩, ⟨ph⟩, ⟨gn⟩, ⟨an⟩, ⟨am⟩, ⟨en⟩, ⟨em⟩, ⟨in⟩, ⟨im⟩, ⟨on⟩, ⟨om⟩, ⟨un⟩, ⟨um⟩, ⟨yn⟩, ⟨ym⟩, ⟨ain⟩, ⟨aim⟩, ⟨ein⟩, ⟨oin⟩, ⟨aî⟩, ⟨eî⟩	æ (rare), œ (mandatory)
Hmong Latin alphabet		⟨bh⟩, ⟨bl⟩, ⟨ch⟩, ⟨dh⟩, ⟨dl⟩, ⟨gh⟩, ⟨hl⟩, ⟨hm⟩, ⟨hn⟩, ⟨jh⟩, ⟨kh⟩, ⟨ml⟩, ⟨nc⟩, ⟨nq⟩, ⟨nr⟩, ⟨nt⟩, ⟨nx⟩, ⟨ny⟩, ⟨ph⟩, ⟨pl⟩, ⟨qh⟩, ⟨rh⟩, ⟨th⟩, ⟨ts⟩, ⟨tx⟩, ⟨xy⟩, ⟨bhl⟩, ⟨dhl⟩, ⟨hml⟩, ⟨hny⟩, ⟨nch⟩, ⟨ndl⟩, ⟨ngh⟩, ⟨nrh⟩, ⟨nth⟩, ⟨nxh⟩, ⟨phl⟩, ⟨tsh⟩, ⟨txh⟩, ⟨ndhl⟩
Italian alphabet (extended)	à, è, é, ì, î (formal), ò, ó, ù	⟨ch⟩, ⟨ci⟩, ⟨gh⟩, ⟨gi⟩, ⟨gl⟩, ⟨gli⟩, ⟨gn⟩, ⟨sc⟩, ⟨sci⟩
Ido alphabet		⟨qu⟩, ⟨ch⟩, ⟨sh⟩
Indonesian alphabet		⟨kh⟩, ⟨ng⟩, ⟨ny⟩, ⟨sy⟩, diphthongs: ⟨ai⟩, ⟨au⟩, ⟨ei⟩, ⟨oi⟩
Interlingua alphabet		⟨ch⟩, ⟨ph⟩, ⟨qu⟩, ⟨rh⟩, ⟨sh⟩
Javanese Latin alphabet	é, è	⟨dh⟩, ⟨kh⟩, ⟨ng⟩, ⟨ny⟩, ⟨sy⟩, ⟨th⟩
Latino sine flexione alphabet*		⟨ae⟩, ⟨ch⟩, ⟨oe⟩, ⟨ph⟩, ⟨qu⟩, ⟨rh⟩, ⟨th⟩ ^[8]
Luxembourgish alphabet	ä, é, ë	⟨aa⟩, ⟨ch⟩, ⟨ck⟩, ⟨ee⟩, ⟨ei⟩, ⟨ie⟩, ⟨ii⟩, ⟨ng⟩, ⟨oo⟩, ⟨ou⟩, ⟨qu⟩, ⟨ue⟩, ⟨uu⟩, ⟨sch⟩
Malay alphabet		⟨gh⟩, ⟨kh⟩, ⟨ng⟩, ⟨ny⟩, ⟨sy⟩
Portuguese alphabet		⟨ch⟩, ⟨lh⟩, ⟨nh⟩, ⟨rr⟩, ⟨ss⟩, ⟨am⟩, ⟨em⟩, ⟨im⟩, ⟨om⟩, ⟨um⟩, ⟨ãe⟩, ⟨ão⟩, ⟨õe⟩
Sundanese Latin alphabet	é	⟨eu⟩, ⟨ng⟩, ⟨ny⟩
Xhosa alphabet		⟨bh⟩, ⟨ch⟩, ⟨dl⟩, ⟨dy⟩, ⟨dz⟩, ⟨gc⟩, ⟨gq⟩, ⟨gr⟩, ⟨gx⟩, ⟨hh⟩, ⟨hl⟩, ⟨kh⟩, ⟨kr⟩, ⟨krh⟩, ⟨lh⟩, ⟨mh⟩, ⟨nc⟩, ⟨ng⟩, ⟨ngʼ⟩, ⟨ngc⟩, ⟨ngh⟩, ⟨ngq⟩, ⟨ngx⟩, ⟨nh⟩, ⟨nkc⟩, ⟨nkq⟩, ⟨nkx⟩, ⟨nq⟩, ⟨nx⟩, ⟨ny⟩, ⟨nyh⟩, ⟨ph⟩, ⟨qh⟩, ⟨rh⟩, ⟨sh⟩, ⟨th⟩, ⟨ths⟩, ⟨thsh⟩, ⟨ts⟩, ⟨tsh⟩, ⟨ty⟩, ⟨tyh⟩, ⟨wh⟩, ⟨xh⟩, ⟨yh⟩, ⟨zh⟩
Zulu alphabet		⟨bh⟩, ⟨ch⟩, ⟨dl⟩, ⟨dy⟩, ⟨gc⟩, ⟨gq⟩, ⟨gx⟩, ⟨hh⟩, ⟨hl⟩, ⟨kh⟩, ⟨kl⟩, ⟨mb⟩, ⟨nc⟩, ⟨ng⟩, ⟨ngc⟩, ⟨ngq⟩, ⟨ngx⟩, ⟨nj⟩, ⟨nk⟩, ⟨nq⟩, ⟨ntsh⟩, ⟨nx⟩, ⟨ny⟩, ⟨ph⟩, ⟨qh⟩, ⟨sh⟩, ⟨th⟩, ⟨ts⟩, ⟨tsh⟩, ⟨xh⟩

* Constructed languages

English is one of the few modern European languages requiring no diacritics for native words (although a diaeresis is used by some American publishers in words such as "coöperation").^[9]
Interlingua, a constructed language, uses diacritics only in unassimilated loanwords, and even there they can be dropped when they do not modify the vowel (e.g. cafe, from French café).^[10]
Latino sine flexione, also known as "Peano's Interlingua", allows, but does not require, an accent to mark unusual stress. (It predates the other "Interlingua" by nearly five decades.)
Malay and Indonesian (which is based on Malay) use all 26 letters and require neither diacritics nor ligatures. However, Malay and Indonesian learning materials may use ⟨é⟩ (E with acute) to clarify the pronunciation of the letter E: in that case, ⟨e⟩ is pronounced /ə/, ⟨é⟩ /e/, and ⟨è⟩ /ɛ/. Many of the more than 700 languages of Indonesia are also written with the Indonesian alphabet. Some of these, such as Javanese, add the diacritics ⟨é⟩ and ⟨è⟩; others omit ⟨q⟩, ⟨x⟩ and ⟨z⟩.
Xhosa is usually written without diacritics, but may optionally mark tone with diacritics over the vowels ⟨a, e, i, o, u⟩, e.g. ⟨à, á, â, ä⟩.
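Dropping diacritics, as in the Interlingua example of cafe from café, can be approximated mechanically with Unicode canonical decomposition: NFD splits a precomposed letter such as ⟨é⟩ into a base letter plus a combining mark, which can then be discarded. This is a sketch with a hypothetical helper name. It removes every combining mark, so it does not apply the Interlingua condition about whether the mark modifies the vowel, and it leaves ligatures such as ⟨æ⟩ and ⟨ß⟩ untouched because they have no canonical decomposition.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose text canonically (NFD), then drop combining marks such as U+0301."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(strip_diacritics("café"))         # cafe
print(strip_diacritics("coöperation"))  # cooperation
print(strip_diacritics("æ, ß"))         # æ, ß  (no canonical decomposition)
```

Applied to Spanish, the same function would turn ⟨ñ⟩ into ⟨n⟩ and thereby change the letter, which is why alphabets in which marked letters are distinct letters fall outside the table above.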

Column numbering

The Roman (Latin) alphabet is commonly used to label the columns of a table or chart, which avoids confusion with the rows, numbered with Arabic numerals. For example, a 3-by-3 table would have columns A, B and C set against rows 1, 2 and 3. If more columns are needed beyond Z (the final letter of the alphabet), the column immediately after Z is AA, followed by AB, and so on^[11] (see bijective base-26 system). This can be seen by scrolling far to the right in a spreadsheet program such as Microsoft Excel or LibreOffice Calc.
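The column scheme is a bijective base-26 numeral system: A to Z stand for the digit values 1 to 26, and there is no digit for zero. Converting a column number to a label therefore subtracts 1 before each division by 26. A Python sketch of both directions, with hypothetical function names and columns assumed to be numbered from 1:

```python
def column_label(n: int) -> str:
    """Convert a 1-based column number to a spreadsheet-style label (bijective base-26)."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = []
    while n > 0:
        # Shift to 0-based before dividing, because there is no zero digit.
        n, remainder = divmod(n - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def column_number(label: str) -> int:
    """Inverse of column_label: 'A' -> 1, 'Z' -> 26, 'AA' -> 27."""
    n = 0
    for ch in label.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a basic Latin letter: {ch!r}")
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n
```

For example, `column_label(28)` returns "AB" and `column_number("AZ")` returns 52.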

Letters are also often used to label nested list items. In that case, after the 26th item it is more common to continue with AA, BB, CC, ... than with the bijective base-26 sequence AA, AB, AC, ...
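The repeated-letter style for list items cycles through A to Z and adds one more copy of the letter on each pass. A short Python sketch, with a hypothetical function name and items assumed to be numbered from 1:

```python
def list_label(n: int) -> str:
    """Label for the n-th list item: A..Z, then AA, BB, ..., ZZ, then AAA, BBB, ..."""
    if n < 1:
        raise ValueError("item numbers start at 1")
    # repeats counts completed passes through the alphabet; index picks the letter.
    repeats, index = divmod(n - 1, 26)
    return chr(ord("A") + index) * (repeats + 1)
```

Item 28 is labeled "BB" in this style, whereas the bijective column scheme would label the 28th column "AB".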

Notes and References

Web site: Internationalisation standardization of 7-bit codes, ISO 646. Trans-European Research and Education Networking Association (TERENA). 2010-10-03.
Web site: C0 Controls and Basic Latin. Unicode.org. 2016-08-08.
Web site: Halfwidth and Fullwidth Forms. Unicode.org. 2016-08-08.
Web site: The Postal History of ICAO. www.icao.int. 2019-02-17. Archived 2019-02-12 at https://web.archive.org/web/20190212211147/https://www.icao.int/secretariat/PostalHistory/annex_10_aeronautical_telecommunications.htm.
Book: Standard ECMA-6: 7-Bit Coded Character Set. 5th edition. March 1985. European Computer Manufacturers Association (Ecma). Geneva, Switzerland. Archived 2016-05-29 at https://web.archive.org/web/20160529230908/http://www.ecma-international.org/publications/files/ECMA-ST-WITHDRAWN/ECMA-6%2C%205th%20Edition%2C%20March%201985.pdf. Quote: "The Technical Committee TC1 of ECMA met for the first time in December 1960 to prepare standard codes for Input/Output purposes. On April 30, 1965, Standard ECMA-6 was adopted by the General Assembly of ECMA."
Web site: Unicode character database. The Unicode Standard. 2013-03-22.
Book: The Unicode Standard Version 1.0, Volume 1. 1990. Addison-Wesley Publishing Company, Inc. ISBN 0-201-56788-1.
Not "letters", per: Web site: Ager, Simon. Latino sine Flexione alphabet. Latino sine Flexione. 2023-04-14.
Web site: The New Yorker's odd mark — the diaeresis. 2010-12-16. Archived 2010-12-16 at https://web.archive.org/web/20101216160024/http://dscriber.com/news/121-the-new-yorkers-odd-mark-the-diaeresis.
Web site: Introduction al IED (in anglese). www.interlingua.com. 2020-09-21.
Web site: How To Switch From Letters to Numbers for Columns in Excel. Indeed. 2024-11-21.