ISO/IEC 646

ISO/IEC 646 Information technology — ISO 7-bit coded character set for information interchange, is an ISO/IEC standard in the field of character encoding. It is equivalent to the ECMA standard ECMA-6 and developed in cooperation with ASCII at least since 1964.^[1]^[2] The first version of ECMA-6 had been published in 1965,^[3] based on work the ECMA's Technical Committee TC1 had carried out since December 1960.^[3] The first edition of ISO/IEC 646 was published in 1973, and the most recent, third, edition in 1991.


ISO/IEC 646 encoding family
ISO/IEC 646 Invariant. Red looped squares (⌘) denote national code points. Other red characters are changed in noteworthy minor modifications.
Standard	ISO/IEC 646, ITU T.50
Classification	7-bit Basic Latin encoding
Preceded by	US-ASCII
Succeeded by	ISO/IEC 8859, ISO/IEC 10646
Other related encodings	DEC NRCS, World System Teletext
Adaptations to other alphabets	ELOT 927, Symbol, KOI-7, SRPSCII and MAKSCII, ASMO 449, SI 960

ISO/IEC 646 specifies a 7-bit character code from which several national standards are derived. It allocates a set of 82 unique graphic characters to 7-bit code points, known as the invariant^[4] (INV) or basic character set,^[5] including letters of the ISO basic Latin alphabet, digits, and some common English punctuation. It leaves 12 code points to be allocated by conforming national standards for additional letters of Latin-based alphabets or other symbols.

It also defines the International Reference Version (IRV), including a full allocation of 94 graphic characters, to be used when a specific national version is not required. As of the 1991 edition of ISO/IEC 646, the IRV and ASCII are identical. Previous editions differed in only one or two code points.
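The split between invariant and national code points can be illustrated with a short Python sketch. The list of 12 national positions is read off the blank cells of the INV row in the variant comparison chart; the names `NATIONAL_CODE_POINTS`, `INVARIANT_CODE_POINTS` and `is_invariant_text` are illustrative and not taken from the standard.

```python
# The 12 positions left open for national standards (assumption: read off
# the INV row of the variant comparison chart in this article).
NATIONAL_CODE_POINTS = frozenset(
    [0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E, 0x60, 0x7B, 0x7C, 0x7D, 0x7E]
)

# The 94 graphic positions are 0x21-0x7E (0x20 is SPACE, 0x7F is DEL).
GRAPHIC_CODE_POINTS = frozenset(range(0x21, 0x7F))

# Everything that is graphic but not national is invariant.
INVARIANT_CODE_POINTS = GRAPHIC_CODE_POINTS - NATIONAL_CODE_POINTS


def is_invariant_text(data: bytes) -> bool:
    """Return True if no byte falls on a national code point or above 0x7F."""
    return all(b < 0x80 and b not in NATIONAL_CODE_POINTS for b in data)


print(len(GRAPHIC_CODE_POINTS), len(INVARIANT_CODE_POINTS))
```

Text that passes `is_invariant_text` should display the same graphic characters under any conforming national version, which is why some interchange formats restrict themselves to this subset.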


History


Early ASCII (ASA X3.4:1963)

ISO/IEC 646 and its predecessor ASCII (ASA X3.4) largely endorsed existing practice regarding character encodings in the telecommunications industry.

US-ASCII, or ISO/IEC 646:US

As ASCII did not provide a number of characters needed for languages other than English, a number of national variants were made that substituted some less-used characters with needed ones. Because the various national variants were incompatible with one another, an International Reference Version (IRV) of ISO/IEC 646 was introduced, in an attempt to at least restrict the replaced set to the same characters in all variants. The original version (ISO 646 IRV) differed from ASCII only in that code point 0x24, ASCII's dollar sign ($), was replaced by the international currency symbol (¤). The final 1991 version of the code, ISO/IEC 646:1991, is also known as ITU T.50, International Reference Alphabet or IRA, formerly International Alphabet No. 5 (IA5). This standard lets users choose the assignments of the 12 variable characters (i.e., two alternative graphic characters and 10 nationally defined characters). Among these possible choices, the ISO 646:1991 IRV (International Reference Version) is explicitly defined and is identical to ASCII.^[6]

The ISO/IEC 8859 series of standards governing 8-bit character encodings supersedes the ISO/IEC 646 international standard and its national variants by providing 96 additional characters with the additional bit, thus avoiding any substitution of ASCII codes. The ISO/IEC 10646 standard, directly related to Unicode, supersedes all of the ISO/IEC 646 and ISO/IEC 8859 sets with one unified set of character encodings using a larger 21-bit value.

A legacy of ISO/IEC 646 is visible on Windows, where in many East Asian locales the backslash character used in filenames is rendered as ¥ or other characters such as ₩. A different code for ¥ was available even on the original IBM PC's code page 437, and a separate double-byte code for ¥ is available in Shift JIS (although this often uses alternative mapping). Nevertheless, so much text was created with the backslash code used for ¥ (because Shift_JIS is officially based on ISO 646:JP, although Microsoft maps it as ASCII) that even modern Windows fonts have found it necessary to render the code that way. A similar situation exists with ₩ and EUC-KR. Another legacy is the existence of trigraphs in the C programming language.
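To illustrate the trigraph mechanism, the following Python sketch replaces the nine C trigraphs with the characters they stand for. Each trigraph spells a character that sits on a national code point and so might be missing from a programmer's keyboard or display. The function name `replace_trigraphs` is illustrative; a real C preprocessor performs this substitution as part of its earliest translation phase.

```python
# The nine trigraph sequences defined by the C language and their replacements.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}


def replace_trigraphs(source: str) -> str:
    """Scan left to right, replacing each trigraph with its single character."""
    out = []
    i = 0
    while i < len(source):
        candidate = source[i:i + 3]
        if candidate in TRIGRAPHS:
            out.append(TRIGRAPHS[candidate])
            i += 3
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


print(replace_trigraphs("??=define FIRST(a) a??(0??)"))
```

All nine replacement characters fall on national code points of ISO/IEC 646, so source written with trigraphs uses only the invariant set.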

Published standards

ECMA-6 (1965-04-30), first edition (withdrawn)^[3]
ISO/R646-1967 (withdrawn),^[7] or ECMA-6 (1967-06), second edition (withdrawn)^[7]^[3]
ECMA-6 (1970-07), third edition (withdrawn)^[3]^[8]
ISO 646:1972 (withdrawn), or ECMA-6 (1973-08), fourth edition (withdrawn)^[3]^[8]
ISO 646:1983 (withdrawn),^[9] or ECMA-6 (1984-12, 1985-03), fifth edition (withdrawn)^[3]
ITU-T Recommendation T.50 IA5 (1988-11-25) (withdrawn),^[10]^[11] or ISO/IEC 646:1991 (in force),^[12]^[13] or ECMA-6 (1991-12, 1997-08), sixth edition (in force)^[12]
ITU-T Recommendation T.50 IRA (1992-09-18) (in force)^[10]^[14]


Code page layout


The following table shows the ISO/IEC 646:1991 International Reference Version character set. Each character is shown with its Unicode equivalent. Code points open for substitution in national variants are shown with a grey background. Yellow background indicates a character that, in some variants, could be combined with a previous character as a diacritic using the backspace character, which may affect glyph choice.

In addition to the invariant set restrictions, 0x23 is restricted to be either # or £ and 0x24 is restricted to be either $ or ¤.^[12] However, these restrictions are not followed by all national variants.^[15]^[16]

ISO/IEC 646:1991 IRV
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0x	NUL	SOH	STX	ETX	EOT	ENQ	ACK	BEL	BS	HT	LF	VT	FF	CR	SO	SI
1x	DLE	DC1	DC2	DC3	DC4	NAK	SYN	ETB	CAN	EM	SUB	ESC	FS	GS	RS	US
2x	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~	DEL


Composite Graphic Characters


According to ISO/IEC 646, every graphic character must be a spacing character; that is, it must advance the character position forward. As a result, non-spacing combining characters are not permitted in any national version. This is in contrast to later standards such as ISO/IEC 2022 and ISO/IEC 10646 which permit or include combining characters.

Several spacing characters can be used as diacritical marks, when preceded or followed with a backspace C0 control to create accented letters, referred to as composite graphic characters in the standard. For example, the sequence E <BS> ' may be used to image the character É. This encoding method originated in the typewriter/teletype era when use of backspace would overstrike a glyph, and may be considered deprecated.

This method is attested in the code charts for the IRV, as well as the GB, FR1, CA, and CA2 national versions, which note that ", ', ,, and ^ may behave as the diaeresis, acute accent, cedilla, and circumflex (rather than quotation marks, a comma, and an upward arrowhead), respectively, when preceded or followed by a backspace. The current PL-2002 standard explicitly directs the use of the backspace and apostrophe to form Polish letters with an acute accent. Some editions of ISO/IEC 646 also suggest that the solidus / may be used with the equal sign = to compose the not equal sign, ≠, and that the underscore _ may be used to effect underlined text. The tilde character ~ was similarly introduced as a diacritic ˜, although the standard is silent about its use.
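A modern decoder might turn such overstrike sequences into precomposed Unicode letters. The following Python sketch shows one way to do so. The mapping from spacing characters to Unicode combining marks and the name `compose_overstrikes` are assumptions made for illustration; the standard itself defines no such conversion.

```python
import unicodedata

BS = "\x08"  # the backspace C0 control

# Spacing characters that may act as diacritics when overstruck, mapped to
# Unicode combining marks (this mapping is an illustrative assumption).
OVERSTRIKE_MARKS = {
    '"': "\u0308",  # diaeresis
    "'": "\u0301",  # acute accent
    ",": "\u0327",  # cedilla
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
}


def compose_overstrikes(text: str) -> str:
    """Replace 'letter BS mark' or 'mark BS letter' with an accented letter."""
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == BS:
            first, second = text[i], text[i + 2]
            if first.isalpha() and second in OVERSTRIKE_MARKS:
                out.append(first + OVERSTRIKE_MARKS[second])
                i += 3
                continue
            if first in OVERSTRIKE_MARKS and second.isalpha():
                out.append(second + OVERSTRIKE_MARKS[first])
                i += 3
                continue
        out.append(text[i])
        i += 1
    # NFC folds base letter plus combining mark into one code point where possible.
    return unicodedata.normalize("NFC", "".join(out))
```

The sketch handles only a single mark per letter and leaves any other use of backspace untouched, including the solidus and underscore combinations mentioned above.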

Later, when wider character sets gained more acceptance, ISO/IEC 8859, vendor-specific character sets and eventually Unicode became the preferred methods of coding accented letters.

Variant codes and descriptions


ISO/IEC 646 national variants

Some national variants of ISO/IEC 646 are as follows:


Version Code^[a]	ISO-IR	Registered Escape Sequence	Standard	Description
CA	121	ESC 2/8 7/7	CSA Z243.4-1985-1	Canada (No. 1 alternative, with "î") (French, classical) (Code page 1020^[17])
CA2	122	ESC 2/8 7/8	CSA Z243.4-1985-2	Canada (No. 2 alternative, with "É") (French, reformed orthography)
CN	57^[18]	ESC 2/8 5/4	GB/T 1988-80	People's Republic of China (Basic Latin)
CU	151	ESC 2/8 2/1 4/1	NC 99-10:81 / NC NC00-10:81	Cuba (Spanish)
DANO	9-1^[19]	ESC 2/8 4/5^[19]	NATS-DANO (SIS)	Norway and Denmark (journalistic texts). Invariant code point 0x22 is displayed as `«`, (compare `"` in the IRV). It is, however, still considered a double quotation mark.^[20] Accompanies SEFI (NATS-SEFI).
DE	21^[19]^[18]	ESC 2/8 4/11^[19]	DIN 66003	Germany (German) (Code page 1011,^[21] 20106^[22]^[23]^[24])
DK	—	—	DS 2089^[25]^[26]	Denmark (Danish) (Code page 1017^[27])
ES	17^[19]	ESC 2/8 5/10^[19]	Olivetti	Spanish (international) (Code page 1023^[28])
ES2	85^[18]	ESC 2/8 6/8	IBM	Spain (Basque, Castilian, Catalan, Galician) (Code page 1014^[29])
FI	10^[18]		SFS 4017	Finland (basic version) (Code page 1018^[30])
FR	69^[18]	ESC 2/8 6/6	AFNOR NF Z 62010-1982	France (French) (Code page 1010^[31])
FR1	25^[19]^[18]	ESC 2/8 5/2^[19]	AFNOR NF Z 62010-1973	France (obsolete since April 1985) (Code page 1104^[32])
GB	4^[19]^[18]	ESC 2/8 4/1^[19]	BS 4730	United Kingdom (English) (Code page 1013^[33])
HU	86	ESC 2/8 6/9	MSZ 7795-3:1984	Hungary (Hungarian)
IE	207	ESC 2/8 2/1 4/3	I.S. 433:1996	Ireland (Irish)
INV	170	ESC 2/8 2/1 4/2	ISO 646:1983	Invariant subset
(IRV)	2^[19]^[18]	ESC 2/8 4/0^[19]	ISO 646:1973	International Reference Version. 0x7E as an overline (ISO-IR-002).^[34]
(IRV)	—	—	ISO 646:1983	International Reference Version. 0x7E as a tilde (Code page 1009,^[35] 20105^[22]^[23]^[36]).
ISO 646:1991 International Reference Version matches the US variant (see below).
IS	—	—	—	Iceland (Icelandic) De facto standard, proposed in 1978 but never formally approved.
IT	15^[19]^[18]	ESC 2/8 5/9^[19]	UNI 0204-70 / Olivetti?	Italian (Code page 1012^[37])
JP	14^[19]^[18]	ESC 2/8 4/10^[19]	JIS C 6220:1969-ro	Japan (Romaji) (Code page 895^[38]). Also used as an 8-bit code with the corresponding Katakana supplementary set.
JP-OCR-B	92	ESC 2/8 6/14	JIS C 6229-1984-b	Japan (OCR-B)
KR	—	—	KS C 5636-1989	South Korea
MT	—	—	?	Malta (Maltese, English)
NL	—	—	IBM	Netherlands (Dutch) (Code page 1019^[39])
NO	60^[18]	ESC 2/8 6/0	NS 4551 version 1^[18]	Norway (Code page 1016^[40])
NO2	61^[18]	ESC 2/8 6/1	NS 4551 version 2^[18]	Norway (obsolete since June 1987) (Code page 20108^[22]^[23]^[41])
PL-2002	—	—	PN-I-10050:2002^[42]	Poland (current as of 2025) Set for writing Polish. Includes the Euro sign.
PL-ZU0	—	—	PN-T-42109-02:1984^[43]	Poland (withdrawn in 2000) Set named "ZU0" for writing Polish.
PT	16^[18]	ESC 2/8 4/12	Olivetti	Portuguese (international)
PT2	84^[18]	ESC 2/8 6/7	IBM	Portugal (Portuguese, Spanish) (Code page 1015^[44])
SE	10^[19]^[18]	ESC 2/8 4/7^[19]	SEN 850200 Annex B, SIS 63 61 27	Sweden (basic Swedish) (Code page 1018,^[30] D47)
SE2	11^[19]^[18]	ESC 2/8 4/8^[19]	SEN 850200 Annex C, SIS 63 61 27	Sweden (extended Swedish for names) (Code page 20107,^[22]^[23]^[45] E47)
SEFI	8-1^[19]	ESC 2/8 4/3^[19]	NATS-SEFI (SIS)	Sweden and Finland (journalistic texts). Accompanies DANO (NATS-DANO).
T.61-7bit	102	ESC 2/8 7/5	ITU/CCITT T.61 Recommendation	International (Teletex). Also used with the corresponding supplementary set as an 8-bit code.
TW	—		CNS 5205-1996	Republic of China (Taiwan)
US / (IRV)	6^[19]^[18]	ESC 2/8 4/2^[19]	ANSI X3.4-1968 and ISO 646:1983 (also IRV in ISO/IEC 646:1991)	United States (ASCII, Code page 367,^[46] 20127^[22]^[23]^[47])
YU	141	ESC 2/8 7/10	JUS I.B1.002 (YUSCII)	former Yugoslavia (Croatian, Slovene, Serbian, Bosnian)
INIS	49	ESC 2/8 5/7	INIS (IAEA)	ISO 646 IRV subset
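The escape sequences in the table are written in column/row notation, where `x/y` denotes the byte in column x and row y of the code table, that is, the value 16x + y. A minimal Python sketch of the conversion to raw bytes follows; the function name `escape_sequence_bytes` is illustrative.

```python
def escape_sequence_bytes(notation: str) -> bytes:
    """Convert a sequence such as 'ESC 2/8 4/2' into the bytes it denotes."""
    out = bytearray()
    for token in notation.split():
        if token == "ESC":
            out.append(0x1B)  # ESCAPE control character
        else:
            column, row = token.split("/")
            out.append(int(column) * 16 + int(row))
    return bytes(out)


# The US / IRV entry: ESC 2/8 4/2 is ESC followed by "(" and "B".
print(escape_sequence_bytes("ESC 2/8 4/2"))
```

For example, the DE entry `ESC 2/8 4/11` yields ESC followed by `(` (0x28) and `K` (0x4B).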

National derivatives

Some national character sets also exist which are based on ISO/IEC 646 but do not strictly follow its invariant set (see also § Derivatives for other alphabets):


Character set	ISO-IR	Registered Escape Sequence	Standard	Description
BS_viewdata	47	ESC 2/8 5/6	British Post Office	Viewdata and Teletext. Viewdata square (⌗) substituted for normally invariant underscore (_) which cannot be displayed on the target hardware.^[48] This is actually the encoding of Microsoft's WST_Engl.
GR / greek7	88	ESC 2/8 6/10	HOS ELOT 927	Greece (withdrawn in November 1986). Uses Greek letters in place of Roman ones^[49] and hence is not strictly speaking an ISO 646 variant.
greek7-old	18	ESC 2/8 5/11	?	Greek graphic set. Similar in concept to greek7, but uses a different mapping of letters. Also, the upper case follows the lower case.
Latin-Greek	19	ESC 2/8 5/12	?	Latin-Greek combined graphics (capitals only). Follows greek7-old, but includes Latin capitals without modification, and Greek capitals over the Latin lower case.
Latin-Greek-1	27^[19]	ESC 2/8 5/5^[19]	Honeywell-Bull	Latin-Greek mixed graphics (Greek capitals only).^[19] Visually unifies Greek capitals with Latin capitals where possible, and adds the remaining Greek capitals. Unlike the other Greek versions, all Basic Latin letters remain intact. Replaces invariant punctuation as well as national characters, however,^[50] and hence is still not strictly speaking an ISO 646 variant.
CH7DEC	—	—	DEC	Switzerland (French, German) (Code page 1021^[51]) Invariant code point 0x5F is changed from `_` to `è`. Is a DEC NRCS variant, closely related to ISO 646, but lacks a fully ISO 646 compliant equivalent.
PL-ZU1	—	—	PN-⁠T⁠-⁠42109-02^[43]	Poland (withdrawn in 2000) Set named "ZU1" intended for use with ODRA 1300 mainframes. These use the same character set as ICT 1900 mainframes, which was based on a 1963 proposed version of ASCII prior to its standardization.
TR7DEC	—	—	DEC	A 7-bit set for writing Turkish, available on some DEC terminals and printing equipment.^[52] It is not referred to as a NRCS in DEC's documentation, but is mentioned separately. Invariant code point 0x21 is changed from `!` to `ı`, and 0x26 is changed from `&` to `ğ`.

Control characters

All the variants listed above are solely graphical character sets, and are to be used with a C0 control character set such as listed in the following table:


ISO-IR	ISO ESC	Description
1^[19]	ESC 2/1 4/0^[19]	ISO 646 controls^[19] ("ASCII controls")
7^[19]	ESC 2/1 4/1^[19]	Scandinavian newspaper (NATS) controls^[19]
26^[19]	ESC 2/1 4/3^[19]	IPTC controls^[19]

Associated supplementary character sets

The following table lists supplementary graphical character sets defined by the same standard as specific ISO/IEC 646 variants. These would be selected by using a mechanism such as shift out or the NATS super shift (single shift),^[53] or by setting the eighth bit in environments where one was available:


ISO-IR	ISO/IEC ESC	National Standard	Description
8-2^[19]	ESC 2/8 4/4^[19]	NATS-SEFI-ADD	Supplementary code used with NATS-SEFI.
9-2^[19]	ESC 2/8 4/6^[19]	NATS-DANO-ADD	Supplementary code used with NATS-DANO.
13^[19]^[18]	ESC 2/8 4/9^[19]	JIS C 6220:1969-jp	Katakana, used as a supplementary code with ISO-646-JP.
103	ESC 2/8 7/6	ITU/CCITT T.61 Recommendation, Supplementary Set	Supplementary code used with T.61.
—	—	PN⁠-⁠T⁠-⁠42109-03:1986^[54]	(withdrawn in 2000) Set named "ZU2" for writing Polish. Contains all letters used in Polish, including the uppercase letters missing from ZU0. Intended to be used as a supplementary set with either the IRV, ZU0, or ZU1 as the primary set.


Variant comparison chart


The specifics of the changes for some of these variants are given in the following table. Character assignments unchanged across all listed variants (i.e. which remain the same as ASCII) are not shown.

For ease of comparison, variants detailed include national variants of ISO/IEC 646, DEC's closely related National Replacement Character Set (NRCS) series used on VT200 terminals, the related European World System Teletext encoding series defined in ETS 300 706, and a few other closely related encodings based on ISO/IEC 646. Individual code charts are linked from the second column. The cells with non-white background emphasize the differences from US-ASCII (also the Basic Latin subset of ISO/IEC 10646 and Unicode).


Version Code^[a]	Code Chart	Characters for each ISO 646 / NRCS compatible or derived charset
US / IRV (1991)	ISO-IR-006^[55]	!	"	#	$	&	:	?	@	[	\	]	^	_	`	{	|	}	~
Older International Reference Versions
IRV (1973)	ISO-IR-002^[34]	!	"	#	¤	&	:	?	@	[	\	]	^	_	`	{	|	}	‾
IRV (1983)	CP01009^[56]	!	"	#	¤	&	:	?	@	[	\	]	^	_	`	{	|	}	~
Invariant and other IRV subsets
INV	ISO-IR-170^[5]	!	"			&	:	?						_
INV (NRCS)^[b]	---	!	"		$	&	:	?
INV (Teletext)^[b]	ETS WST^[57]	!	"			&	:	?
INIS Subset^[b]	ISO-IR-049^[58]				$		:			[		]					|
T.61	ISO-IR-102^[59]	!	"	#	¤	&	:	?	@	[		]		_			|
East Asian
JP	ISO-IR-014^[60]	!	"	#	$	&	:	?	@	[	¥	]	^	_	`	{	|	}	‾
JP-OCR-B	ISO-IR-092^[61]	!	"	#	$	&	:	?	@	[	¥	]	^	_		{	|	}
KR	(KS X 1003)^[62]	!	"	#	$	&	:	?	@	[	₩	]	^	_	`	{	|	}	‾
CN	ISO-IR-057^[16]	!	"	#	¥	&	:	?	@	[	\	]	^	_	`	{	|	}	‾
TW	(CNS 5205)^[62]	!	"	#	$	&	:	?	@	[	\	]	^	_	`	{	|	}	‾
British and Irish
GB	ISO-IR-004^[63]	!	"	£	$	&	:	?	@	[	\	]	^	_	`	{	|	}	‾
GB (NRCS)	CP01101^[64]	!	"	£	$	&	:	?	@	[	\	]	^	_	`	{	|	}	~
Viewdata^[c]^[d]	ISO-IR-047^[48]	!	"	£	$	&	:	?	@	←	½	→	↑	⌗	―	¼	‖	¾	÷
IE	ISO-IR-207^[65]	!	"	£	$	&	:	?	Ó	É	Í	Ú	Á	_	ó	é	í	ú	á
Italophone or Francophone
IT^[e]	ISO-IR-015^[66]	!	"	£	$	&	:	?	§	°	ç	é	^	_	ù	à	ò	è	ì
IT (Teletext)^[d]	ETS WST^[67]	!	"	£	$	&	:	?	é	°	ç	→	↑	⌗	ù	à	ò	è	ì
FR	ISO-IR-069^[68]	!	"	£	$	&	:	?	à	°	ç	§	^	_	µ	é	ù	è	¨
FR1 ^[e]	ISO-IR-025^[69]	!	"	£	$	&	:	?	à	°	ç	§	^	_	`	é	ù	è	¨
FR Teletext^[d]	ETS WST^[67]	!	"	é	ï	&	:	?	à	ë	ê	ù	î	⌗	è	â	ô	û	ç
CA^[e]	ISO-IR-121^[70]	!	"	#	$	&	:	?	à	â	ç	ê	î	_	ô	é	ù	è	û
CA2	ISO-IR-122^[71]	!	"	#	$	&	:	?	à	â	ç	ê	É	_	ô	é	ù	è	û
Francophone-Germanophone
CH (NRCS)^[d]	CP01021^[72]	!	"	ù	$	&	:	?	à	é	ç	ê	î	è	ô	ä	ö	ü	û
Germanophone
DE^[e]^[f]	ISO-IR-021^[73]	!	"	#	$	&	:	?	§	Ä	Ö	Ü	^	_	`	ä	ö	ü	ß
Nordic (Eastern) and Baltic
FI / SE	ISO-IR-010^[74]	!	"	#	¤	&	:	?	@	Ä	Ö	Å	^	_	`	ä	ö	å	‾
SE2^[f]	ISO-IR-011^[75]	!	"	#	¤	&	:	?	É	Ä	Ö	Å	Ü	_	é	ä	ö	å	ü
SE (NRCS)	CP01106^[76]	!	"	#	$	&	:	?	É	Ä	Ö	Å	Ü	_	é	ä	ö	å	ü
FI (NRCS)	CP01103^[77]	!	"	#	$	&	:	?	@	Ä	Ö	Å	Ü	_	é	ä	ö	å	ü
SEFI (NATS)^[g]	ISO-IR-008-1^[78]	!	"	#	$	&	:	?		Ä	Ö	Å	■	_		ä	ö	å	–
EE (Teletext)^[d]	ETS WST^[67]	!	"	#	õ	&	:	?	Š	Ä	Ö	Ž	Ü	Õ	š	ä	ö	ž	ü
LV / LT (Teletext)^[d]	ETS WST^[67]	!	"	#	$	&	:	?	Š	ė	ę	Ž	č	ū	š	ą	ų	ž	į
Nordic (Western)
DK	CP01017^[79]	!	"	#	¤	&	:	?	@	Æ	Ø	Å	Ü	_	`	æ	ø	å	ü
DK/NO (NRCS)	CP01105^[80]	!	"	#	$	&	:	?	Ä	Æ	Ø	Å	Ü	_	ä	æ	ø	å	ü
DK/NO-alt (NRCS)	CP01107^[81]	!	"	#	$	&	:	?	@	Æ	Ø	Å	^	_	`	æ	ø	å	~
NO	ISO-IR-060^[82]	!	"	#	$	&	:	?	@	Æ	Ø	Å	^	_	`	æ	ø	å	‾
NO2	ISO-IR-061^[15]	!	"	§	$	&	:	?	@	Æ	Ø	Å	^	_	`	æ	ø	å	|
DANO (NATS)^[g]^[h]	ISO-IR-009-1^[20]	!	«	»	$	&	:	?		Æ	Ø	Å	■	_		æ	ø	å	–
IS	[proposed]^[83]	!	"	#	¤	&	:	?	Ð	Þ	´ ^[i]	Æ	Ö	_	ð	þ	´ ^[i]	æ	ö
Hispanophone
ES^[e]	ISO-IR-017^[84]	!	"	£	$	&	:	?	§	¡	Ñ	¿	^	_	`	°	ñ	ç	~
ES2	ISO-IR-085^[85]	!	"	#	$	&	:	?	·	¡	Ñ	Ç	¿	_	`	´	ñ	ç	¨
CU	ISO-IR-151^[86]	!	"	#	¤	&	:	?	@	¡	Ñ	]	¿	_	`	´	ñ	[	¨
Hispanophone-Lusophone
ES/PT Teletext^[d]	ETS WST^[67]	!	"	ç	$	&	:	?	¡	á	é	í	ó	ú	¿	ü	ñ	è	à
Lusophone
PT	ISO-IR-016^[87]	!	"	#	$	&	:	?	§	Ã	Ç	Õ	^	_	`	ã	ç	õ	°
PT2	ISO-IR-084^[88]	!	"	#	$	&	:	?	´	Ã	Ç	Õ	^	_	`	ã	ç	õ	~
PT (NRCS)	---	!	"	#	$	&	:	?	@	Ã	Ç	Õ	^	_	`	ã	ç	õ	~
Greek
Latin-GR mixed^[d]	ISO-IR-027^[50]	Ξ	"	Γ	¤	&	Ψ	Π	Δ	Ω	Θ	Φ	Λ	Σ	`	{	|	}	‾
ISO-IR-088 (GR / ELOT 927), ISO-IR-018 and ISO-IR-019 replace Roman letters with Greek letters and are detailed in a separate chart.
Slavic (Latin script)
YU	ISO-IR-141^[89]	!	"	#	$	&	:	?	Ž	Š	Đ	Ć	Č	_	ž	š	đ	ć	č
YU Teletext^[d]	ETS WST^[67]	!	"	#	Ë	&	:	?	Č	Ć	Ž	Đ	Š	ë	č	ć	ž	đ	š
YU-alt Teletext^[d]	ETS WST^[67]	!	"	#	$	&	:	?	Č	Ć	Ž	Đ	Š	ë	č	ć	ž	đ	š
CS/CZ/SK (Teletext)^[d]	ETS WST^[67]	!	"	#	ů	&	:	?	č	ť	ž	ý	í	ř	é	á	ě	ú	š
PL-2002	PN⁠-⁠I-⁠10050^[42]	!	"	#	$	&	:	?	@	Ą	Ę	Ł	Ż	_	€	ą	ę	ł	ż
PL-ZU0	PN⁠-⁠T⁠-⁠42109-02^[43]	!	"	#	¤	&	:	?	ę	ź	Ł	ń	ś	_	ą	ó	ł	ż	ć
PL-ZU1^[d]	PN⁠-⁠T⁠-⁠42109-02^[43]	!	"	#	£	&	:	?	@	[	$	]	↑	←	_
PL-ZU2^[j]^[d]	PN⁠-⁠T⁠-⁠42109-03^[54]	!	"	Ę	Ć	Ż	:	?	ę	ź	Ł	ń	ś	_	ą	ó	ł	ż	ć
PL Teletext^[d]	ETS WST^[67]	!	"	#	ń	&	:	?	ą	Ƶ	Ś	Ł	ć	ó	ę	ż	ś	ł	ź
Adaptations for the Cyrillic script replace Roman letters and are detailed in a separate chart
Other
NL	CP01019^[90]	!	"	#	$	&	:	?	@	[	\	]	^	_	`	{	|	}	‾
NL NRCS	CP01102^[91]	!	"	£	$	&	:	?	¾	ĳ	½	|	^	_	`	¨	ƒ	¼	´
HU	ISO-IR-086^[92]	!	"	#	¤	&	:	?	Á	É	Ö	Ü	^	_	á	é	ö	ü	˝
MT	CP03041^[93]	!	"	#	$	&	:	?	@	ġ	ż	ħ	^	_	ċ	Ġ	Ż	Ħ	Ċ
RO (Teletext)^[d]	ETS WST^[67]	!	"	#	¤	&	:	?	Ţ	Â	Ş	Ă	Î	ı	ţ	â	ş	ă	î
TR (DEC)^[d]	DEC^[52]	ı	"	#	$	ğ	:	?	İ	Ş	Ö	Ç	Ü	_	Ğ	ş	ö	ç	ü
TR (Teletext)^[d]	ETS WST^[67]	!	"	TL	ğ	&	:	?	İ	Ş	Ö	Ç	Ü	Ğ	ı	ş	ö	ç	ü

[a]
The short code used in these tables is taken from the last part of the "ISO646-xx" alias in the IANA charset registry, if one exists. For charsets not registered with IANA, reasonable short names have been chosen for the purposes of this article. There is no standardized naming scheme that encompasses all ISO/IEC 646-related charsets.
[b]
Is a subset of one of the International Reference Versions of ISO 646, but does not include all characters which are present in the invariant set. Included for comparison.
[c]
Also UK Teletext.
[d]
Does not completely conform to ISO/IEC 646, but is a closely related derivative. Included here for comparison.
[e]
Corresponding DEC NRC set also exists and is identical to the ISO/IEC 646 national version.
[f]
Corresponding WST national option also exists and is identical to the ISO/IEC 646 national version.
[g]
The NATS charsets replace @ (0x40) and ` (0x60) with "Unit space A" (UA) and "Unit space B" (UB). The plain space (0x20) expands on justification. UA and UB are for fixed widths, UA must be at least as wide as UB. RFC 1345 maps UA and UB to ISO 10646 (UCS) code points U+E002 and U+E003, both in the Private Use Area, respectively (although it also lists PUA mappings for several other characters which now have UCS code points). Unicode contains a number of space characters which might approximately correspond.
[h]
Conformance to the ISO 646 invariant set is questionable, but it is a closely related derivative of ISO 646. Included here for comparison.
[i]
The characters at 0x5C and 0x7C in the Icelandic set are both the acute accent. The first is intended for use with uppercase letters, the second with lowercase letters.
[j]
In addition to the replacements shown here, ZU2 also replaces ' (0x27) with Ó, * (0x2A) with Ź, ; (0x3B) with Ą, < (0x3C) with Ń, and > (0x3E) with Ś. This completes coverage of the Polish alphabet in uppercase and lowercase.
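Because a national variant differs from ASCII only at a handful of positions, decoding one to Unicode amounts to a small substitution table. The Python sketch below builds tables for the DE and GB rows of the comparison chart; the names `ISO646_DE`, `ISO646_GB` and `decode_iso646` are illustrative.

```python
# Replacements relative to ASCII, read off the DE and GB rows of the chart.
ISO646_DE = str.maketrans("@[\\]{|}~", "§ÄÖÜäöüß")
ISO646_GB = str.maketrans("#~", "£‾")


def decode_iso646(data: bytes, table: dict) -> str:
    """Decode 7-bit data as ASCII, then apply the variant's substitutions."""
    return data.decode("ascii").translate(table)


# The same bytes read as "Gr}~e aus K|ln" when interpreted as ASCII.
print(decode_iso646(b"Gr}~e aus K|ln", ISO646_DE))
```

The data carries no indication of which variant was used, so the table has to be selected out of band or through an escape sequence. This ambiguity is one reason these encodings gave way to 8-bit sets and Unicode.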


Related encoding families


National Replacement Character Set

The National Replacement Character Set (NRCS) is a family of 7-bit encodings introduced in 1983 by DEC with the VT200 series of computer terminals. It is closely related to ISO/IEC 646, being based on a similar invariant subset of ASCII, differing in retaining $ as invariant but not _. All NRCS variants except Swiss retain _ in its ASCII position, and are therefore in conformance with ISO/IEC 646. Several NRCS variants are identical to ISO/IEC 646 variants, and others are very similar, with the exception of the Dutch variant.

World System Teletext

The European telecommunications standard ETS 300 706, "Enhanced Teletext specification", defines Latin, Greek, Cyrillic, Arabic, and Hebrew code sets with several national variants for both Latin and Cyrillic.^[67] Like NRCS and ISO/IEC 646, within the Latin variants, the family of encodings known as the G0 set is based on a similar invariant subset of ASCII, but retains neither $ nor _ as invariant. Unlike NRCS, variants often differ considerably from corresponding national ISO/IEC 646 variants.

HP

HP has code page 1054, which adds the medium shade (▒, U+2592) at 0x7F.^[94] Code page 1052 replaces a few ASCII characters from code page 1054.^[95]

Code page 1052
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
2x	SP	!	″	#	$	%	&	′	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	‗	=	¢	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	®	]	©	_
6x	°	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	§	¶	†	™	▒



Derivatives for other alphabets


Some 7-bit character sets for non-Latin alphabets are derived from the ISO/IEC 646 standard. These do not themselves constitute ISO/IEC 646 variants, because they do not follow its invariant code points (often replacing the letters of at least one case): the alphabets they support need more encoding space than the set of national code points provides. Examples include:

7-bit Turkmen (ISO-IR-230).^[96]
7-bit Greek.
- In ELOT 927 (ISO-IR-088),^[49] the Greek alphabet is mapped in alphabetical order (except for the final-sigma) to positions 0x61–0x71 and 0x73–0x79, on top of the Latin lowercase letters.
- ISO-IR-018^[97] maps the Greek alphabet over both letter cases using a different scheme (not in alphabetical order, but trying where possible to match Greek letters over Roman letters which correspond in some sense), and ISO-IR-019^[98] maps the Greek uppercase alphabet over the Latin lowercase letters using the same scheme as ISO-IR-018.
- The lower half of the Symbol font character encoding^[99] uses its own scheme for mapping Greek letters of both cases over the ASCII Roman letters, also trying to map Greek letters over Roman letters which correspond in some sense, but making different decisions in this regard (see chart below). It also replaces invariant code points 0x22 and 0x27 and five national code points with mathematical symbols. Although not intended for use in typesetting Greek prose, it is sometimes used for that purpose.
- ISO-IR-027^[50] (detailed in the chart above rather than below) includes the Latin alphabet unchanged, but adds some Greek capital letters which cannot be represented with Latin-script homoglyphs; while it is explicitly based on ISO/IEC 646, some of these are mapped to code points which are invariant in ISO/IEC 646 (0x21, 0x3A, and 0x3F), and it is therefore not a true ISO/IEC 646 variant.
- The World System Teletext encoding for Greek uses yet another scheme of mapping Greek letters in alphabetical order over the ASCII letters of both cases, notably including several letters with diacritics.^[100]
7-bit Cyrillic
- KOI-7 or Short KOI, used for Russian. The Cyrillic characters are mapped to positions 0x60–0x7E, on top of the Latin lowercase letters, matching homologous letters where possible (where в is mapped to w, not v). Superseded by the KOI-8 variants.
- SRPSCII and MAKSCII, Cyrillic variants of YUSCII (the Latin variant is YU/ISO-IR-141 in the chart above), used for Serbian and Macedonian respectively. Largely homologous to the Latin variant of YUSCII (following Serbian digraphia rules), except for Љ (lj), Њ (nj), Џ (dž), and ѕ (dz), which correspond to digraphs in Latin-script orthography, and are mapped over letters which are not used in Serbian or Macedonian (q, w, x, y).
- The G0 sets for the World System Teletext encodings for Russian/Bulgarian^[101] and Ukrainian^[102] use G0 sets similar to KOI-7 with some modifications. The corresponding G0 set for Serbian Cyrillic^[a]^[103] uses a scheme based on the Teletext encoding for Latin-script Serbo-Croatian and Slovene, as opposed to the significantly different YUSCII.
7-bit Hebrew, SI 960. The Hebrew alphabet is mapped to positions 0x60–0x7A, on top of the lowercase Latin letters (and grave accent for aleph). 7-bit Hebrew was always stored in visual order. This mapping with the high bit set, i.e. with the Hebrew letters in 0xE0–0xFA, is ISO/IEC 8859-8. The World System Teletext encoding for Hebrew uses the same letter mappings, but uses BS_Viewdata as its base encoding (whereas SI 960 uses US-ASCII) and includes a shekel sign at 0x7B.
7-bit Arabic, ASMO 449 (ISO-IR-089).^[104] The Arabic alphabet is mapped to positions 0x41–0x5A and 0x60–0x6A, on top of both uppercase and lowercase Latin letters.
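The relationship between SI 960 and ISO/IEC 8859-8 described in the list above suggests a simple way to decode 7-bit Hebrew: set the high bit on positions 0x60–0x7A and decode the result as ISO/IEC 8859-8. The Python sketch below does this with the standard `iso8859_8` codec; the function name `decode_si960` is illustrative, and the input is assumed to be pure 7-bit data.

```python
def decode_si960(data: bytes) -> str:
    """Decode 7-bit Hebrew by shifting 0x60-0x7A up to 0xE0-0xFA."""
    shifted = bytes(b | 0x80 if 0x60 <= b <= 0x7A else b for b in data)
    # ISO/IEC 8859-8 places the Hebrew letters at 0xE0-0xFA.
    return shifted.decode("iso8859_8")


# 0x60 (grave accent in ASCII) is aleph; 0x7A ("z" in ASCII) is tav.
print(decode_si960(b"`z"))
```

The function does not reorder the characters. Since 7-bit Hebrew was stored in visual order, the output may still need to be reversed per line before display by software that applies the Unicode bidirectional algorithm.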

A comparison of some of these encodings is below. Only one case is shown, except in instances where the cases are mapped to different letters. In such instances, the mapping with the smallest code is shown first. Possible transcriptions are given for some letters; where this is omitted, the letter can be considered to correspond to the Roman one which it is mapped over.


English (ASCII)	Cyrillic alphabets						Greek alphabet				Hebrew
	Semi-transliterative								Naturally ordered
	Russian (KOI-7)	Russian, Bulgarian (WST RU/BG)	Ukrainian (WST UKR)	Serbian (SRPSCII)	Macedonian (MAKSCII)	Serbian, Macedonian^[a] (WST SRP)	Greek (Symbol)	Greek (IR-18^[97])	Greek (ELOT 927)	Greek (WST EL)	Hebrew (SI 960)
@ `	Ю (ju/yu)	Ю (ju/yu)	Ю (ju/yu)	Ж (ž)	Ж (ž)	Ч (č)	≅ ‾	´ `	@ `	ΐ ΰ	א (ʾ/ʔ)
A	А	А (a/á)	А	А	А	А	Α	Α	Α	Α	ב (b)
B	Б	Б	Б	Б	Б	Б	Β	Β	Β	Β	ג (g)
C	Ц (c/ts)	Ц (c/ts)	Ц (c/ts)	Ц (c/ts)	Ц (c/ts)	Ц (c/ts)	Χ (ch/kh)	Ψ (ps)	Γ (g)	Γ (g)	ד (d)
D	Д	Д	Д	Д	Д	Д	Δ	Δ	Δ	Δ	ה (h)
E	Е (je/ye)	Е (je/ye)	Е (e)	Е (e)	Е (e)	Е (e)	Ε	Ε	Ε	Ε	ו‬ (w)
F	Ф	Ф	Ф	Ф	Ф	Ф	Φ (ph/f)	Φ (ph/f)	Ζ (z)	Ζ (z)	ז (z)
G	Г	Г	Г	Г	Г	Γ	Γ	Γ	Η (ē)	Η (ē)	ח (ch/kh)
H	Х (h/kh/ch)	Х (h/kh/ch)	Х (h/kh/ch)	Х (h/kh/ch)	Х (h/kh/ch)	Х (h/kh/ch)	Η (ē)	Η (ē)	Θ (th)	Θ (th)	ט (tt)
I	И	И	И (y)	И	И	И	Ι	Ι	Ι	Ι	י (j/y)
J	Й (j/y)	Й (j/y)	Й (j/y)	Ј (j/y)	Ј (j/y)	Ј (j/y)	ϑ (th) ϕ (ph/f)	Ξ (x/ks)		Κ (k)	ך (k final)
K	К	К	К	К	К	К	Κ	Κ	Κ	Λ (l)	כ
L	Л	Л	Л	Л	Л	Л	Λ	Λ	Λ	Μ (m)	ל
M	М	М	М	М	М	М	Μ	Μ	Μ	Ν (n)	ם (m final)
N	Н	Н	Н	Н	Н	Н	Ν	Ν	Ν	Ξ (x/ks)	מ (m)
O	О	О	О	О	О	О	Ο	Ο	Ξ (x/ks)	Ο	ן (n final)
P	П	П	П	П	П	П	Π	Π	Ο (o)	Π	נ (n)
Q	Я (ja/ya)	Я (ja/ya)	Я (ja/ya)	Љ (lj/ly)	Љ (lj/ly)	Ќ (Ḱ/kj)	Θ (th)	ͺ (i)	Π (p)	Ρ (r)	ס (s)
R	Р	Р	Р	Р	Р	Р	Ρ	Ρ	Ρ	ʹ ς (s final)	ע (ʽ/ŋ)
S	С	С	С	С	С	С	Σ	Σ	Σ	Σ	ף (p final)
T	Т	Т	Т	Т	Т	Т	Τ	Τ	Τ	Τ	פ (p)
U	У	У	У	У	У	У	Υ	Θ (th)	Υ	Υ	ץ (ṣ/ts final)
V	Ж (ž)	Ж (ž)	Ж (ž)	В	В	В	ς (s final) ϖ (p)	Ω (ō)	Φ (f/ph)	Φ (f/ph)	צ (ṣ/ts)
W	В (v)	В (v)	В (v)	Њ (nj/ny/ñ)	Њ (nj/ny/ñ)	Ѓ (ǵ/gj)	Ω (ō)	ς (s final)	ς (s final)	Χ (ch/kh)	ק (q)
X	Ь (’)	Ь (’)	Ь (’)	Џ (dž)	Џ (dž)	Љ (lj/ly)	Ξ	Χ (ch/kh)	Χ (ch/kh)	Ψ (ps)	ר (r)
Y	Ы (y/ı)	Ъ (″/ǎ/ŭ)	І (i)	Ѕ (dz)	Ѕ (dz)	Њ (nj/ny/ñ)	Ψ (ps)	Υ (u)	Ψ (ps)	Ω (ō)	ש (š/sh)
Z	З	З	З	З	З	З	Ζ	Ζ	Ω (ō)	Ϊ	ת (t)
[ {	Ш (š/sh)	Ш (š/sh)	Ш (š/sh)	Ш (š/sh)	Ш (š/sh)	Ћ (ć)	[ {	῏ ῟	[ {	Ϋ	[ {
\ |	Э (e)	Э (e)	Є (je/ye)	Ђ (đ/dj)	Ѓ (ǵ/gj)	Ж (ž)	∴ |	᾿ ῾ (h)	\ |	ά ό	\ |
] }	Щ (šč)	Щ (šč)	Щ (šč)	Ћ (ć)	Ќ (Ḱ/kj)	Ђ (đ/dj)	] }	῎ ῞	] }	έ ύ	] }
^ ~	Ч (č)	Ч (č)	Ч (č)	Ч (č)	Ч (č)	Ш (š/sh)	⊥ ~	˜ ¨	^ ‾	ή ώ	^ ‾
_	Ъ (″)	Ы (y/ı)	Ї (ji/yi)	_	_	Џ (dž)	_	_	_	ί	_


Footnotes

[a]
Labelled "Cyrillic G0 Primary Set - Option 1 - Serbian/Croatian", but includes Macedonian letters Ќ and Ѓ (but not Ѕ). A subset of Roman letters, mostly those without homoglyphs in the G0 set, are included in the G1 set (15.6.7 Table 41), including S/s at 0x6B/7B. Croatian is written in Latin script.
