Chinese character description languages

Chapter 18 of The Unicode Standard (version 15.0) defines the "Ideographic Description Sequences" (IDS) syntax, which describes a character structurally as an arrangement of components that have their own code points. Sixteen special characters in the range U+2FF0..U+2FFF, the last four of which were added in Unicode 15.1, act as prefix operators: each one precedes the characters or nested sequences that it combines into a larger character.


Ideographic Description Characters in Unicode
Character	Unicode Character Number	Full Unicode Name
⿰	U+2FF0	Ideographic description character left to right
⿱	U+2FF1	Ideographic description character above to below
⿲	U+2FF2	Ideographic description character left to middle and right
⿳	U+2FF3	Ideographic description character above to middle and below
⿴	U+2FF4	Ideographic description character full surround
⿵	U+2FF5	Ideographic description character surround from above
⿶	U+2FF6	Ideographic description character surround from below
⿷	U+2FF7	Ideographic description character surround from left
⿼	U+2FFC	Ideographic description character surround from right
⿸	U+2FF8	Ideographic description character surround from upper left
⿹	U+2FF9	Ideographic description character surround from upper right
⿺	U+2FFA	Ideographic description character surround from lower left
⿽	U+2FFD	Ideographic description character surround from lower right
⿻	U+2FFB	Ideographic description character overlaid
⿾	U+2FFE	Ideographic description character horizontal reflection
⿿	U+2FFF	Ideographic description character rotation
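
Because the operators are prefixes, a sequence can be read with a small recursive parser. The following Python sketch is illustrative only: the names `ARITY`, `Node` and `parse_ids` are hypothetical, and the operand counts (three for U+2FF2 and U+2FF3, one for U+2FFE and U+2FFF, two for the rest, including U+31EF from the CJK Strokes block) are assumptions based on the operator names and should be checked against the standard. It treats every other code point, including U+303E, as a plain component, which is a simplification.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Assumed operand count per operator; verify against the Unicode Standard.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFE)}  # U+2FF0..U+2FFD
ARITY["\u2ff2"] = 3  # left to middle and right
ARITY["\u2ff3"] = 3  # above to middle and below
ARITY["\u2ffe"] = 1  # horizontal reflection
ARITY["\u2fff"] = 1  # rotation
ARITY["\u31ef"] = 2  # subtraction (CJK Strokes block)


@dataclass
class Node:
    """One operator together with its parsed operands."""
    op: str
    args: List[Union["Node", str]]


def _parse(text: str, pos: int) -> Tuple[Union[Node, str], int]:
    """Parse one component starting at pos; return it and the next position."""
    if pos >= len(text):
        raise ValueError("sequence ended while an operand was expected")
    ch = text[pos]
    pos += 1
    if ch not in ARITY:
        return ch, pos  # an ordinary character is a leaf
    args = []
    for _ in range(ARITY[ch]):
        arg, pos = _parse(text, pos)
        args.append(arg)
    return Node(ch, args), pos


def parse_ids(text: str) -> Union[Node, str]:
    """Parse a whole sequence, rejecting leftover characters."""
    node, pos = _parse(text, 0)
    if pos != len(text):
        raise ValueError("trailing characters after a complete sequence")
    return node


if __name__ == "__main__":
    print(parse_ids("⿰書史"))
    print(parse_ids("⿱木⿰木木"))  # nested sequence
```

Python strings are indexed by code point, so components outside the Basic Multilingual Plane need no special handling in this sketch.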

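Two further characters used in ideographic description sequences are encoded in other Unicode blocks. U+31EF ㇯ IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION is in the CJK Strokes block. U+303E 〾 IDEOGRAPHIC VARIATION INDICATOR, in the CJK Symbols and Punctuation block, is not officially an ideographic description character, but is sometimes used in ideographic description sequences.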


Other Ideographic Description Characters in Unicode
Character	Code point	Block	Name
〾	U+303E	CJK Symbols and Punctuation	Ideographic variation indicator
㇯	U+31EF	CJK Strokes	Ideographic description character subtraction

These sequences are useful for describing to the reader a character that cannot be printed directly, either because it is missing from a given font or because it is absent from the Unicode standard altogether. For example, the Sawndip character encoded in CJK Unified Ideographs Extension F as U+2DA21 𭨡 can be described as ⿰書史. Another use is dictionary lookup, where a sequence serves as a rough input method for queries.

These sequences can be rendered either by displaying the individual characters one after another or by parsing the Ideographic Description Sequence and drawing the ideograph it describes. They do not, by themselves, provide unambiguous rendering for all characters. For instance, the sequence ⿱十一 can describe either ⼟ 'EARTH', with the middle bar being narrower, or ⼠ 'SCHOLAR', with the middle bar being wider.
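
The lookup use and the ambiguity described above can be illustrated together with a small index that maps each sequence to a list of characters rather than to a single one. This is a minimal Python sketch: `ENTRIES`, `build_index` and `lookup` are hypothetical names, and the entries are hand-written for illustration, not taken from any real dictionary database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hand-written illustrative entries (sequence, character).
ENTRIES = [
    ("⿰木木", "林"),
    ("⿰日月", "明"),
    ("⿱十一", "\u571f"),  # unified ideograph for 'earth'
    ("⿱十一", "\u58eb"),  # unified ideograph for 'scholar'
]


def build_index(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group characters by their description sequence."""
    index = defaultdict(list)
    for ids, char in entries:
        index[ids].append(char)
    return dict(index)


def lookup(index: Dict[str, List[str]], ids: str) -> List[str]:
    """Return every character recorded for a sequence (possibly none)."""
    return index.get(ids, [])


if __name__ == "__main__":
    index = build_index(ENTRIES)
    print(lookup(index, "⿰日月"))
    print(lookup(index, "⿱十一"))  # two candidates: the sequence is ambiguous
```

Returning a list keeps the ambiguity visible to the caller, who can then choose among the candidates by other means.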

Unicode's specification for these sequences is based on the characters and syntax of the earlier GBK encoding. Additional symbols were later encoded to fill in the missing combinations.

The IDSgrep free software package by Matthew Skala[2][3] extends Unicode's IDS syntax with additional features for dictionary lookup. It can convert KanjiVG's database to its own extended IDS (EIDS) format, and it can search EIDS files generated by the related Tsukurimashou font family.

Citations

[1] Bishop & Cook (2003c), pp. 2, 9.
[2] "IDSgrep", Tsukurimashou Project, 2024, archived from the original on 2024-02-07
[3] Skala, Matthew (2015), "A Structural Query System for Han Characters" (PDF), International Journal of Asian Language Processing, vol. 23, no. 2, pp. 127–159, arXiv:1404.5585, archived from the original (PDF) on 2016-03-04, retrieved 2016-01-13

Works cited

Wenlin User's Guide, Wenlin Institute, 2015
Bishop, Tom; Cook, Richard, CDL specification
———; Cook, Richard (2003), Character Description Language (CDL): The Set of Basic CJK Unified Stroke Types (PDF)
———; Cook, Richard (2003), A Specification for CDL Character Description Language (PDF)
———; Cook, Richard (2003), Specification for CDL (PDF), archived from the original (PDF) on 2016-04-05, retrieved 2018-01-17
——— (2007), A character description language for CJK (PDF), Multilingual, #91, vol. 18, pp. 62–68
Cook, Richard (2003), Chinese Character Description Languages (PDF)