Windows-1252

Windows character set for Latin alphabet

Windows-1252 or CP-1252 (Windows code page 1252) is a legacy single-byte character encoding[2] that is used by default (as the "ANSI code page") in Microsoft Windows throughout the Americas, Western Europe, Oceania, and much of Africa.[3]

MIME / IANA: windows-1252[1]
Alias(es): cp1252 (code page 1252)
Language(s): All supported by ISO/IEC 8859-1 plus full support for French[a] and Finnish and ligature forms for English; e.g. Danish (except for a rare exceptional letter), Irish, Italian, Norwegian, Portuguese, Spanish, Swedish, German (missing uppercase ẞ[b]), Icelandic, Faroese, Luxembourgish, Albanian, Estonian, Swahili, Tswana, Catalan, Basque, Occitan, Rotokas, Toki Pona, Lojban, Romansh, Dutch (except the Ĳ/ĳ character, substituted by IJ/ij or ÿ), and Slovene (except the č character, substituted by ç). Some languages lack their standard quotation marks (such as German „quotes“).
Created by: Microsoft
Standard: WHATWG Encoding Standard
Classification: extended ASCII, Windows-125x
Extends: ISO 8859-1 (excluding C1 controls)
Transforms / Encodes: ISO 8859-15
Succeeded by: Unicode (UTF-8, UTF-16)

Initially the same as ISO 8859-1, it began to diverge starting in Windows 2.0 by adding additional characters in the 0x80 to 0x9F (hex) range (the ISO standards reserve this range for C1 control codes). Notable additional characters include curly quotation marks and all printable characters from ISO 8859-15.
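
The divergence can be seen by decoding the same bytes under both encodings. The following is a minimal sketch that relies on Python's built-in `cp1252`, `latin-1`, and `iso8859_15` codecs; the sample bytes and variable names are illustrative assumptions rather than anything defined by the code page itself.

```python
# Sample bytes: 0x93 and 0x94 are curly quotes and 0x80 is the euro sign
# in Windows-1252, while ISO 8859-1 treats all three as C1 control codes.
raw = bytes([0x93, 0x48, 0x69, 0x94, 0x20, 0x80, 0x35])

as_cp1252 = raw.decode("cp1252")   # curly-quoted "Hi", a space, euro sign, "5"
as_latin1 = raw.decode("latin-1")  # same letters, but with C1 controls

print(ascii(as_cp1252))
print(ascii(as_latin1))

# Outside the 0x80 to 0x9F range the two encodings agree byte for byte.
shared = bytes(range(0x00, 0x80)) + bytes(range(0xA0, 0x100))
same_outside_c1 = shared.decode("cp1252") == shared.decode("latin-1")

# Every printable character of ISO 8859-15 can be encoded in Windows-1252
# (encode() would raise UnicodeEncodeError if one were missing).
latin9_printables = bytes(range(0xA0, 0x100)).decode("iso8859_15")
reencoded = latin9_printables.encode("cp1252")
```

The ISO 8859-15 characters land at different byte values in Windows-1252, so `reencoded` is not the same byte sequence as the ISO 8859-15 input even though no character is lost.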

It is the most-used single-byte character encoding in the world. Although almost all websites now use the multi-byte character encoding UTF-8, as of April 2025 1.1%[4] of websites declared ISO 8859-1, which all modern browsers treat as Windows-1252 (as required by the HTML5 standard[5]), and a further 0.3% declared Windows-1252 directly,[4][6] for a total of 1.4%. Some countries show higher usage than the global average: in 2025, 2.9% of websites in Brazil[7] and 2.4% in Germany[8][9] used it (these figures are the sums of ISO-8859-1 and CP-1252 declarations).

Name

It is known to Windows by the code page number 1252, and by the IANA-approved name "windows-1252".

Historically, the phrase "ANSI Code Page" was used in Windows to refer to non-DOS encodings; the intention was that most of these would be ANSI standards such as ISO-8859-1. Even though Windows-1252 was the first and by far most popular code page named so in Microsoft Windows parlance, the code page has never been an ANSI standard. Microsoft explains, "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."[10]

LaTeX can input Windows-1252 by using inputenc.sty with parameter ansinew (and more recently cp1252).[11][12]

IBM uses code page 1252 (CCSID 1252 and euro sign extended CCSID 5348) for Windows-1252.[13][14][15]

It is called "WE8MSWIN1252" by Oracle Database.[16]

History

  • The first version of the codepage was used in Microsoft Windows 1.0. It matched the ISO-8859-1 standard (including leaving code points 0xD7 and 0xF7 undefined, as they were not in the standard at that time).
  • The second version of the codepage was introduced in Microsoft Windows 2.0. In this version, code points 0xD7, 0xF7, 0x91, and 0x92 are defined.
  • The third version of the codepage was introduced in Microsoft Windows 3.1. It defined all code points used in the final version except the euro sign and the Z with caron character pair.
  • The final version (shown below) was introduced in Microsoft Windows 98.

Starting in the 1990s, many Microsoft products that could produce HTML included Windows-1252-exclusive characters, but marked the encoding as ISO-8859-1, ASCII, or undeclared.[citation needed] Characters exclusive to Windows-1252 would render incorrectly on non-Windows operating systems (often as question marks).[17][18] In particular, typographers' quotes—curly variants of the standard straight apostrophes and quotation marks in US-ASCII—were commonly used in files produced in Windows applications such as Microsoft Word due to the smart quotes feature, which can automatically convert straight apostrophes and quotation marks to the curly variants.[19] To fix this, by 2000 most web browsers and e-mail clients treated the charsets ISO-8859-1 and US-ASCII as Windows-1252[citation needed]—this behavior is now required by the HTML5 specification.[5] Undeclared charsets in HTML are also assumed to be Windows-1252.[20][21]
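
The label handling described above can be sketched as follows. This is a simplified illustration, not the full WHATWG label-resolution algorithm: the names `LEGACY_LABELS`, `effective_codec`, and `decode_page` are hypothetical, and the label set is limited to the ones mentioned in the text.

```python
from typing import Optional

# Labels that, per the text above, are decoded as Windows-1252.
# Illustrative subset only; the real label list is longer.
LEGACY_LABELS = {"iso-8859-1", "us-ascii", "windows-1252"}


def effective_codec(declared: Optional[str]) -> str:
    """Return the Python codec name to use for a declared charset label."""
    if declared is None:
        # Undeclared charset: assume Windows-1252, as described above.
        return "cp1252"
    label = declared.strip().lower()
    if label in LEGACY_LABELS:
        return "cp1252"
    return label


def decode_page(body: bytes, declared: Optional[str]) -> str:
    """Decode a page body, replacing bytes the chosen codec cannot map."""
    return body.decode(effective_codec(declared), errors="replace")


# Bytes as a word processor with smart quotes might emit them,
# mislabeled as ISO-8859-1.
word_output = b"It\x92s a \x93smart quote\x94"
text = decode_page(word_output, "ISO-8859-1")
print(text)
```

Decoding the same bytes strictly as ISO 8859-1 would instead produce invisible C1 control characters where the quotation marks belong.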

Although Windows NT supported Unicode and attempted to encourage programs to use it, it only provided the 16-bit code units of UCS-2/UTF-16, despite the existing support for other multibyte character encodings such as Shift-JIS. As many applications preferred to use 8-bit strings, Windows-1252 remained the most popular encoding on Windows.[citation needed] UTF-8 has been supported since Windows 10, so this is gradually changing.[citation needed]
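
As a rough illustration of the difference between 8-bit strings and 16-bit code units, the sketch below compares the encoded size of one short string. The sample text and the `sizes` dictionary are assumptions chosen for the example.

```python
# Sample text: "naive cafe" with accents, then a euro sign and "5".
# Escapes are used so the code points are unambiguous.
text = "na\u00efve caf\u00e9 \u20ac5"

# Encoded length in bytes under each encoding.
sizes = {
    codec: len(text.encode(codec))
    for codec in ("cp1252", "utf-16-le", "utf-8")
}
print(sizes)
```

Windows-1252 uses one byte per character, UTF-16 uses two bytes for each of these characters, and UTF-8 uses one to three bytes depending on the character.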

Codepage layout

The following table shows Windows-1252. Characters that differ from ISO-8859-1 are shown with their Unicode code point number, based on the Unicode.org mapping of Windows-1252 with "best fit".

Windows-1252 (CP1252)[22][23][24][25][26]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0_ NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
1_ DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
2_  SP  ! " # $ % & ' ( ) * + , - . /
3_ 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4_ @ A B C D E F G H I J K L M N O
5_ P Q R S T U V W X Y Z [ \ ] ^ _
6_ ` a b c d e f g h i j k l m n o
7_ p q r s t u v w x y z { | } ~ DEL
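8_ €(20AC) [unused] ‚(201A) ƒ(0192) „(201E) …(2026) †(2020) ‡(2021) ˆ(02C6) ‰(2030) Š(0160) ‹(2039) Œ(0152) [unused] Ž(017D) [unused]
9_ [unused] ‘(2018) ’(2019) “(201C) ”(201D) •(2022) –(2013) —(2014) ˜(02DC) ™(2122) š(0161) ›(203A) œ(0153) [unused] ž(017E) Ÿ(0178)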
A_ NBSP ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ SHY ® ¯
B_ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
C_ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
D_ Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
E_ à á â ã ä å æ ç è é ê ë ì í î ï
F_ ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ

  According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused; however, the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes. The "best fit" mapping documents this behavior, too.[22]
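
The treatment of the five unused positions can be sketched in Python, whose strict `cp1252` codec leaves them undefined. The helper name `decode_best_fit` is hypothetical; it imitates the "best fit" behavior described above by mapping each unused byte to the C1 control code with the same number, and is not a binding to the Windows API.

```python
# The five byte values that the published Windows-1252 mapping leaves unused.
UNUSED = (0x81, 0x8D, 0x8F, 0x90, 0x9D)


def decode_best_fit(data: bytes) -> str:
    """Decode Windows-1252, mapping unused bytes to same-numbered C1 controls."""
    out = []
    for b in data:
        if b in UNUSED:
            out.append(chr(b))  # e.g. 0x81 becomes U+0081
        else:
            out.append(bytes([b]).decode("cp1252"))
    return "".join(out)


# Python's strict codec raises an error for each unused byte.
strict_failures = []
for b in UNUSED:
    try:
        bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        strict_failures.append(b)

# Euro sign, then two unused positions.
mapped = decode_best_fit(bytes([0x80, 0x81, 0x9D]))
print(ascii(mapped))
```

A decoder written this way accepts every possible byte value, which is convenient when the input may come from software that relied on the Windows behavior.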

OS/2 extensions

The OS/2 operating system supports an encoding by the name of Code page 1004 (CCSID 1004) or "Windows Extended".[27][28] This mostly matches code page 1252, except that certain C0 control characters are replaced by diacritic characters.

Code page 1004 (differing rows only)[29][30][31][32]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0_ NUL SOH STX ETX ˉ(02C9) ˘(02D8) ˙(02D9) BEL ˚(02DA) HT ˝(02DD) ˛(02DB) ˇ(02C7) CR SO SI
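
Because code page 1004 differs from code page 1252 only in the seven positions listed in the table above, a decoder can be sketched as a small override table on top of Python's `cp1252` codec. The names `CP1004_OVERRIDES` and `decode_cp1004` are hypothetical; Python's standard library is not assumed to ship a codec for this code page.

```python
# Positions where code page 1004 replaces a C0 control with a diacritic,
# taken from the "differing rows" table above.
CP1004_OVERRIDES = {
    0x04: "\u02C9",  # modifier letter macron
    0x05: "\u02D8",  # breve
    0x06: "\u02D9",  # dot above
    0x08: "\u02DA",  # ring above
    0x0A: "\u02DD",  # double acute accent
    0x0B: "\u02DB",  # ogonek
    0x0C: "\u02C7",  # caron
}


def decode_cp1004(data: bytes) -> str:
    """Decode code page 1004 by overriding seven positions of cp1252."""
    out = []
    for b in data:
        if b in CP1004_OVERRIDES:
            out.append(CP1004_OVERRIDES[b])
        else:
            # Bytes unused in cp1252 become U+FFFD in this sketch.
            out.append(bytes([b]).decode("cp1252", errors="replace"))
    return "".join(out)


# "A", caron, BEL (unchanged control), euro sign.
sample = decode_cp1004(bytes([0x41, 0x0C, 0x07, 0x80]))
print(ascii(sample))
```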

MS-DOS extensions (rare)

There is a rarely used, but useful, graphics-extended variant of code page 1252 in which codes 0x00 to 0x1F allow for box drawing, as used in applications such as MS-DOS Edit and CodeView. One application that used this code page was an Intel Corporation install/recovery disk image utility from mid-to-late 1995. These programs were written for Intel's P6 User Test Program machines (US example[33]). The code page was used exclusively in the company's then EMEA region (Europe, Middle East & Africa). In time the programs were changed to use code page 850.

Graphics Extended Code Page 1252[citation needed]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0_
1_

See also

Notes

  a. Excluding the narrow non-breaking space, which is preferred to the regular non-breaking space when spacing certain kinds of punctuation.
  b. Uppercase ẞ was not officially adopted until 2017.

References
