UTF-16

	The first 2¹⁶ Unicode code points. The white stripe near the bottom consists of the surrogate halves used by UTF-16.
Language(s)	International
Standard	Unicode Standard
Classification	Unicode Transformation Format, variable-width encoding
Extends	UCS-2
Transforms / Encodes	ISO/IEC 10646 (Unicode)

UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (in fact, this number of code points is dictated by the design of UTF-16). The encoding is variable-length: each code point is encoded as one or two 16-bit code units. UTF-16 arose from an earlier, now-obsolete fixed-width 16-bit encoding known as "UCS-2" (for 2-byte Universal Character Set),^[1]^[2] once it became clear that more than 2¹⁶ (65,536) code points were needed,^[3] including most emoji and important CJK characters, such as those used in personal and place names.^[4]
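To illustrate the one-or-two-code-unit scheme, the following Python sketch encodes a single code point by hand. The function name `encode_utf16` is hypothetical; the constants it uses (the offset 0x10000, high surrogates 0xD800 to 0xDBFF, low surrogates 0xDC00 to 0xDFFF) are those of the standard UTF-16 algorithm. The sketch also checks that removing the 2,048 surrogate code points from the 1,114,112 code points U+0000 to U+10FFFF leaves the figure of 1,112,064 valid code points.

```python
# Minimal sketch of UTF-16 encoding for one code point.
# encode_utf16 is a hypothetical name chosen for illustration.

def encode_utf16(cp: int) -> list[int]:
    """Return the UTF-16 code units (as 16-bit ints) for one Unicode scalar value."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if cp < 0x10000:
        # Basic Multilingual Plane: a single code unit holds the value directly.
        return [cp]
    # Supplementary planes: subtract 0x10000 to get a 20-bit value,
    # then split it into two 10-bit halves.
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)   # high (lead) surrogate carries the top 10 bits
    low = 0xDC00 + (v & 0x3FF)  # low (trail) surrogate carries the bottom 10 bits
    return [high, low]

# 17 planes of 65,536 code points, minus the surrogate range U+D800..U+DFFF.
TOTAL = 0x110000
SURROGATES = 0xE000 - 0xD800
assert TOTAL - SURROGATES == 1_112_064

print([hex(u) for u in encode_utf16(0x41)])     # ['0x41']
print([hex(u) for u in encode_utf16(0x1F600)])  # ['0xd83d', '0xde00']
```

Code points in the Basic Multilingual Plane pass through as one unit, while U+1F600 (an emoji) becomes the surrogate pair D83D DE00.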

UTF-16 is used by systems such as the Microsoft Windows API, the Java programming language and JavaScript/ECMAScript. It is also sometimes used for plain text and word-processing data files on Microsoft Windows. It is used by more modern implementations of SMS.^[5]
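Because these platforms store strings as UTF-16, their built-in length operations count code units rather than code points: Java's `String.length()` and JavaScript's `length` property both return the number of UTF-16 code units, so a character outside the Basic Multilingual Plane counts as 2. Python's `len()` counts code points instead, so the sketch below emulates the UTF-16 count by encoding to UTF-16LE, which writes no byte order mark. The helper name `utf16_length` is hypothetical, and the sketch assumes well-formed text (a lone surrogate in a Python string would make the encode call fail).

```python
# Sketch: count UTF-16 code units the way Java's String.length()
# or JavaScript's .length would. utf16_length is a hypothetical helper.

def utf16_length(s: str) -> int:
    """Number of 16-bit code units needed to store s in UTF-16."""
    # "utf-16-le" emits no byte order mark, so every 2 bytes is one code unit.
    return len(s.encode("utf-16-le")) // 2

print(utf16_length("A"))           # 1
print(utf16_length("\u4e2d"))      # 1 (CJK ideograph in the BMP)
print(utf16_length("\U0001F600"))  # 2 (emoji, stored as a surrogate pair)
print(len("\U0001F600"))           # 1 (Python counts code points)
```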

UTF-16 is the only encoding still allowed on the web that is incompatible with ASCII,^[6]^[nb 1] and it never gained popularity there: it is declared by under 0.004% of web pages.^[8] UTF-8, by comparison, accounts for over 98% of all web pages.^[9] The Web Hypertext Application Technology Working Group (WHATWG) considers UTF-8 "the mandatory encoding for all [text]" and states that, for security reasons, browser applications should not use UTF-16.^[10]
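The ASCII incompatibility is easy to see at the byte level. In the Python sketch below, ASCII text encoded as UTF-8 is byte-for-byte identical to its ASCII form, whereas UTF-16 (here little-endian, without a byte order mark) pairs every ASCII character with a zero byte. Byte-oriented tools that expect ASCII, such as C routines that treat a zero byte as a string terminator, may therefore misread UTF-16 data.

```python
# Compare how ASCII text is stored in UTF-8 and in UTF-16 (little-endian, no BOM).
text = "Hi"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(utf8)   # b'Hi'
print(utf16)  # b'H\x00i\x00'

# UTF-8 bytes for ASCII text are the ASCII bytes themselves; UTF-16 bytes are not.
assert utf8 == text.encode("ascii")
assert utf16 != text.encode("ascii")
# Each ASCII character in UTF-16 carries a zero byte.
assert b"\x00" in utf16
```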