Character Set Tables

Frank da Cruz
10 March 2011
Updated 1 August 2021:
FTP links converted to HTTP; HTML4 to HTML5; W3C validation; minor content improvements.

The following links are to character set tables in a uniform format, in which each character is included literally, its code is shown in four ways (decimal, row/column, octal, hexadecimal), and its name is given from the corresponding standard (if any), or else its Unicode name, or failing that a short-form name. "C1 Safe" tells whether the character set conforms to international standards and reserves the area 0x80-0x9F for control characters. Character sets that are not C1 safe are not suitable for cross-platform data interchange.
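Both the code notations and the "C1 Safe" property can be checked mechanically. The sketch below is in Python (our choice, not the language of the tables themselves), and `code_forms` and `is_c1_safe` are hypothetical helper names. It assumes the row/column form follows the ISO 2022 convention xx/yy, with the high four bits first and the low four bits second, so "A" (0x41) is 04/01. It also assumes a character set is C1 safe only if Python's codec for it decodes every byte 0x80-0x9F to a C1 control (U+0080-U+009F).

```python
def code_forms(byte: int) -> tuple[str, str, str, str]:
    """Return (decimal, column/row, octal, hex) for one 8-bit code."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit code")
    decimal = str(byte)
    # Assumed ISO 2022 style: high four bits, then low four bits, in decimal
    col_row = f"{byte >> 4:02d}/{byte & 0x0F:02d}"
    octal = f"{byte:03o}"
    hexadecimal = f"{byte:02X}"
    return decimal, col_row, octal, hexadecimal


def is_c1_safe(codec: str) -> bool:
    """True if every byte 0x80-0x9F decodes to a C1 control (U+0080-U+009F)."""
    for byte in range(0x80, 0xA0):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            return False  # undefined, or part of a multibyte sequence
        if len(ch) != 1 or not 0x80 <= ord(ch) <= 0x9F:
            return False  # a graphic character occupies the C1 area
    return True
```

For example, `code_forms(0x41)` gives `("65", "04/01", "101", "41")`. `is_c1_safe("iso-8859-1")` is true, while `is_c1_safe("windows-1252")` is false because 0x80 decodes to the Euro sign. Python has no DEC-MCS or KOI-7 codec, so those rows cannot be checked this way.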

Each table is an HTML file that announces its character set, so the characters should appear correctly in your Web browser if it supports HTML character-set declarations of the following form:

<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

in which "charset" names are from the IANA / MIME registry. In HTML5, this would be:

<META charset="iso-8859-1">

or (preferably, since all pages are now supposed to be encoded in UTF-8):

<META charset="utf-8">
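To see which of these declarations a saved table carries, a few lines of Python are enough. This is a simplified sketch, not the full HTML5 encoding-sniffing algorithm, which also considers byte-order marks and HTTP headers. The names `sniff_charset` and `decode_page` are hypothetical, and only the first 1024 bytes are searched, roughly mirroring the HTML5 prescan limit.

```python
import re

# Matches charset=NAME in both the HTML4 http-equiv form and the HTML5 form
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE
)


def sniff_charset(html: bytes, default: str = "utf-8") -> str:
    """Return the charset named by the first <meta> declaration, or default."""
    match = _META_CHARSET.search(html[:1024])  # simplified prescan window
    return match.group(1).decode("ascii").lower() if match else default


def decode_page(html: bytes) -> str:
    """Decode page bytes with the declared charset (LookupError if unknown)."""
    return html.decode(sniff_charset(html))
```

Both forms shown above yield the same result: `sniff_charset(b'<META charset="iso-8859-1">')` returns `"iso-8859-1"`.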

If the characters do not display correctly in your browser, it means your browser does not understand the declaration, or it does not support that character set, or you don't have an appropriate font. However, you can still save the file and use it locally.

If you save a table, you can use it (you might want to keep only the part between <pre> and </pre>) to test character-set-aware software. For example, if you save it on a host, then make a terminal connection (ssh, telnet, dialup, whatever) from your desktop computer to the host, you can see if your character-set definitions are working, and/or if you are using an appropriate font.
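Keeping only the <pre> part can be automated. The following Python sketch (the helper name `extract_pre` is hypothetical) works on raw bytes, so the table's characters are never transcoded. It undoes only the three escapes HTML needs for literal <, > and &, and it assumes an ASCII-compatible encoding, which covers every table listed here.

```python
import re

_PRE = re.compile(rb"<pre[^>]*>(.*?)</pre>", re.IGNORECASE | re.DOTALL)


def extract_pre(html: bytes) -> bytes:
    """Return the bytes inside the first <pre> element, with &lt; &gt; &amp; undone."""
    match = _PRE.search(html)
    if match is None:
        raise ValueError("no <pre> element found")
    body = match.group(1)
    # &amp; must be handled last so that "&amp;lt;" becomes "&lt;", not "<"
    for entity, literal in ((b"&lt;", b"<"), (b"&gt;", b">"), (b"&amp;", b"&")):
        body = body.replace(entity, literal)
    return body
```

Read the saved page and write the result in binary mode ("rb" and "wb") so that nothing converts the bytes on the way to the host.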

Note that nonprintable characters such as Soft Hyphen are likely to occupy no space in the display. Even though the brackets appear to be empty, there really is a character between them.
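To find out in advance which positions of a table may look empty, you can ask Python's Unicode database which graphic-area characters are really control or format characters. This is a sketch; `invisible_positions` is a hypothetical name, and "invisible" is approximated here as Unicode category Cc or Cf.

```python
import unicodedata


def invisible_positions(codec: str) -> list[int]:
    """Byte values in 0x20-0x7E and 0xA0-0xFF whose character is Cc or Cf."""
    found = []
    for byte in [*range(0x20, 0x7F), *range(0xA0, 0x100)]:
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # position undefined in this character set
        if unicodedata.category(ch) in ("Cc", "Cf"):
            found.append(byte)
    return found
```

For ISO 8859-1 this reports only 0xAD, whose Unicode name is SOFT HYPHEN.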

If you need a table that's not here, let me know and I'll add it.

Table	IANA/MIME	Script	C1 Safe	Remarks
US ASCII / ISO 646 IRV	US-ASCII	Latin	N/A	USA
KOI-7 / Short KOI		Cyrillic	N/A	USSR
ISO 8859-1 Latin Alphabet 1	ISO-8859-1	Latin	Yes	West Europe
ISO 8859-2 Latin Alphabet 2	ISO-8859-2	Latin	Yes	East Europe
ISO 8859-3 Latin Alphabet 3	ISO-8859-3	Latin	Yes	West Europe / Turkey
ISO 8859-4 Latin Alphabet 4	ISO-8859-4	Latin	Yes	North & West Europe
ISO 8859-5 Latin/Cyrillic Alphabet	ISO-8859-5	Cyrillic	Yes
ISO 8859-6 Latin/Arabic Alphabet	ISO-8859-6	Arabic	Yes
ISO 8859-7 Latin/Greek Alphabet	ISO-8859-7	Greek	Yes
ISO 8859-8 Latin/Hebrew Alphabet	ISO-8859-8	Hebrew	Yes
ISO 8859-15 Latin Alphabet 9	ISO-8859-15	Latin	Yes	West Europe
DEC Multinational (MCS)	DEC-MCS	Latin	Yes	West Europe
PC Code Page 437	IBM437	Latin	No	West Europe
PC Code Page 850	IBM850	Latin	No	West Europe
PC Code Page 852	IBM852	Latin	No	East Europe
PC Code Page 856	(none)	Cyrillic	No
PC Code Page 861	IBM861	Latin	No	Iceland
PC Code Page 862	IBM862	Hebrew	No
PC Code Page 866	IBM866	Cyrillic	No
Microsoft Windows Code Page 1250	windows-1250	Latin	No	East Europe
Microsoft Windows Code Page 1251	windows-1251	Cyrillic	No
Microsoft Windows Code Page 1252	windows-1252	Latin	No	West Europe
Microsoft Windows Code Page 1254	windows-1254	Latin	No	Turkey
Unicode UTF-8 U+0020-28FF	UTF-8	(many)	No	(All but CJK) (BIG!)
Unicode Gothic U+10330-1034F	UTF-8	Gothic	No	Unicode 3.1 Plane 1
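The IANA/MIME column gives names that software can look up directly. As one illustration, Python's codec registry accepts most of them as aliases. This is a sketch; `python_codec` is a hypothetical helper, and rows without an IANA name are left out.

```python
import codecs
from typing import Optional

# IANA/MIME names from the table above
IANA_NAMES = [
    "US-ASCII", "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4",
    "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-15",
    "DEC-MCS", "IBM437", "IBM850", "IBM852", "IBM861", "IBM862", "IBM866",
    "windows-1250", "windows-1251", "windows-1252", "windows-1254", "UTF-8",
]


def python_codec(iana_name: str) -> Optional[str]:
    """Canonical Python codec name for an IANA charset name, or None."""
    try:
        return codecs.lookup(iana_name).name
    except LookupError:
        return None  # e.g. DEC-MCS has no codec in the standard library


if __name__ == "__main__":
    for name in IANA_NAMES:
        print(f"{name:14} -> {python_codec(name) or '(no Python codec)'}")
```

For instance, `IBM437` resolves to `cp437` and `windows-1252` to `cp1252`, while `DEC-MCS` gives `None`.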

You can find plain-text (not embedded in HTML) versions of these tables (and many more) in the Kermit Project archive: http://www.columbia.edu/kermit/archivefiles/charsets.html; transfer them in BINARY mode only, so that no line-ending or character-set conversion is applied. For any pair of files xxx.c and xxx.txt, the first is a C program to generate the table, the second is the table itself. NOTE: these tables won't display correctly in your browser because they are plain text, which cannot announce its character encoding. To download any of these tables, right-click on its name (rightmost column) and choose "Save" or "Download".
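The archive's generators are C programs; the Python sketch below is not one of them. It only illustrates the same idea for a single-byte character set: emit each graphic character literally, followed by its four code forms and its Unicode name. `make_table` is a hypothetical name, and the column layout is an assumption. Control characters and undefined positions are skipped. All text other than the character itself is ASCII, so the output is in the target encoding.

```python
import unicodedata


def make_table(codec: str) -> bytes:
    """Plain-text table for a single-byte codec: [char] dec col/row oct hex name."""
    lines = []
    for byte in range(0x20, 0x100):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # undefined in this character set
        if unicodedata.category(ch) == "Cc":
            continue  # DEL and C1 controls are not shown
        name = unicodedata.name(ch, "")
        fields = f"{byte:3d}  {byte >> 4:02d}/{byte & 0x0F:02d}  {byte:03o}  {byte:02X}  {name}"
        # The character stays a raw byte; everything else on the line is ASCII
        lines.append(b"[" + bytes([byte]) + b"]  " + fields.encode("ascii"))
    return b"\n".join(lines) + b"\n"
```

Writing `make_table("iso-8859-1")` to a file opened with "wb" gives 191 lines, one per graphic character (including the soft hyphen and no-break space).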