CJK Characters

In internationalization, CJK characters is a
collective term for the Chinese, Japanese,
and Korean languages, all of which include
Chinese characters and derivatives in their
writing systems, sometimes paired with
other scripts. Occasionally, Vietnamese is
included, making the abbreviation CJKV,
since Vietnamese historically used
Chinese characters as well. Collectively,
the CJKV characters often include hànzì in
Chinese, kanji and kana in Japanese, hanja
and hangul in Korean, and hán tự or chữ nôm
in Vietnamese.

Figure: CJKV characters derived from Ancient
Chinese characters. Left to right: Japanese,
Vietnamese, Korean, Simplified Chinese,
Taiwanese Traditional Chinese.

Character repertoire
Standard Mandarin Chinese and Standard
Cantonese are written almost exclusively
in Chinese characters. General literacy
requires over 3,000 characters, while
reasonably complete coverage requires up
to 40,000. Japanese uses fewer characters:
general literacy in Japanese can be
expected with 2,136 characters. The use of
Chinese characters in Korea is becoming
increasingly rare, although idiosyncratic
use of Chinese characters in proper names
requires knowledge (and therefore
availability) of many more characters.
Even today, however, students in South
Korea are taught 1,800 characters.

Other scripts used for these languages,
such as bopomofo and the Latin-based
pinyin for Chinese, hiragana and katakana
for Japanese, and hangul for Korean, are
not strictly "CJK characters", although CJK
character sets almost invariably include
them as necessary for full coverage of the
target languages.

Until the early 20th century, Classical
Chinese was the written language of
government and scholarship in Vietnam.
Popular literature in Vietnamese was
written in the chữ Nôm script, consisting of
borrowed Chinese characters together
with many characters created locally. By
the end of the 1920s, both scripts had
been replaced by writing in Vietnamese
using the Latin-based Vietnamese
alphabet.[1][2]

The sinologist Carl Leban (1971) produced
an early survey of CJK encoding systems.

Encoding
The number of characters required for
complete coverage of all these languages'
needs cannot fit in the 256-character code
space of 8-bit character encodings,
requiring at least a 16-bit fixed-width
encoding or a multi-byte variable-length
encoding. The 16-bit fixed-width
encodings, such as those from Unicode up
to and including version 2.0, are now
deprecated for two reasons: more
characters must be encoded than a 16-bit
encoding can accommodate (Unicode 5.0
has some 70,000 Han characters), and the
Chinese government requires that software
in China support the GB 18030 character
set.
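
The size constraints described above can be illustrated with a minimal
Python sketch. It assumes only codecs that ship with CPython (latin-1,
utf-8, utf-16-be, gb18030); the helper names encoded_lengths and
fits_in_8_bit and the two sample characters are illustrative choices,
not part of any standard.

```python
# Illustrative sketch: bytes needed for one Han character in several encodings.
# Helper names and sample characters are assumptions chosen for this example.

def encoded_lengths(ch, codecs=("utf-8", "utf-16-be", "gb18030")):
    """Return {codec: number of bytes used to encode ch}."""
    return {codec: len(ch.encode(codec)) for codec in codecs}

def fits_in_8_bit(ch):
    """True if ch can be encoded in the single-byte Latin-1 encoding."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

bmp_han = "\u6f22"        # 漢, inside the 16-bit Basic Multilingual Plane
ext_b_han = "\U00020000"  # first code point of CJK Unified Ideographs Extension B

for ch in (bmp_han, ext_b_han):
    print(f"U+{ord(ch):04X}", fits_in_8_bit(ch), encoded_lengths(ch))
```

A character beyond U+FFFF, such as the second sample, needs two 16-bit
units in UTF-16, which is why a fixed 16-bit encoding can no longer
cover the full Han repertoire.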

Although CJK encodings have common
character sets, the encodings often used
to represent them have been developed
separately by different East Asian
governments and software companies,
and are mutually incompatible. Unicode
has attempted, with some controversy, to
unify the character sets in a process
known as Han unification.
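
The incompatibility between the legacy encodings can be seen in a short
Python sketch. The codec names are those registered in CPython's
standard library; the helper names encode_everywhere and misdecode and
the sample character are assumptions made for illustration.

```python
# Illustrative sketch: one character, several legacy CJK encodings.
# Helper names and the sample character are assumptions for this example.

LEGACY_CODECS = ("big5", "gb2312", "shift_jis", "euc_jp", "euc_kr")

def encode_everywhere(ch, codecs=LEGACY_CODECS):
    """Return {codec: bytes} for a character that every listed codec can encode."""
    return {codec: ch.encode(codec) for codec in codecs}

def misdecode(data, wrong_codec):
    """Decode bytes with a different codec, replacing undecodable bytes with U+FFFD."""
    return data.decode(wrong_codec, errors="replace")

ch = "\u4e2d"  # 中
table = encode_everywhere(ch)
for codec, data in table.items():
    print(f"{codec:10} {data.hex()}")

# Reading Big5 bytes as GB2312 does not give back the original character.
print(misdecode(table["big5"], "gb2312") == ch)
```

Each codec assigns the same character a different byte sequence, so
text must be decoded with the encoding it was written in. Unicode
avoids the problem by giving the character a single code point.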

CJK character encodings should consist
minimally of Han characters plus
language-specific phonetic scripts such as
pinyin, bopomofo, hiragana, katakana and
hangul.
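
As a rough check of this coverage requirement, the following Python
sketch tests whether some legacy codecs bundled with CPython can encode
both a Han character and a character from the matching phonetic script.
The helper name can_encode and the sample strings are assumptions made
for this example.

```python
# Illustrative sketch: do legacy CJK codecs cover the phonetic scripts too?
# The helper name and sample strings are assumptions made for this example.

def can_encode(text, codec):
    """True if every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

SAMPLES = {
    "shift_jis": "\u6f22\u3042\u30a2",  # Han 漢, hiragana あ, katakana ア
    "euc_kr": "\u6f22\ud55c",           # Han 漢, hangul 한
    "big5": "\u6f22\u3105",             # Han 漢, bopomofo ㄅ
    "gb2312": "\u6c49\u0101",           # Han 汉, pinyin letter ā
}

for codec, text in SAMPLES.items():
    print(codec, can_encode(text, codec))
```

The same helper can be pointed at any other codec and sample string to
see which scripts a given character set leaves out.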

CJK character encodings include:

Big5 (the most prevalent Traditional Chinese encoding before Unicode was widely adopted)
CCCII
CNS 11643 (official standard of the Republic of China)
EUC-JP
EUC-KR
GB2312 (subset and predecessor of GB18030)
GB18030 (mandated standard in the People's Republic of China)
Giga Character Set (GCS)
ISO 2022-JP
KS C 5861
Shift-JIS
TRON
Unicode

The CJK character sets take up the bulk of
the assigned Unicode code space. There is
much controversy among Japanese
experts on Chinese characters about the
desirability and technical merit of the Han
unification process used to map multiple
Chinese and Japanese character sets into
a single set of unified characters.
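
The share of the code space taken by Han ideographs can be estimated
with Python's unicodedata module. This is a sketch under stated
assumptions: the counts depend on the Unicode version bundled with the
interpreter, "assigned" is taken here to exclude unassigned,
private-use, and surrogate code points, and the function name han_share
is illustrative.

```python
# Illustrative sketch: share of assigned code points that are Han ideographs.
# Counts depend on the Unicode version bundled with the Python interpreter.
import sys
import unicodedata

def han_share():
    """Return (han_count, assigned_count) over the whole Unicode code space."""
    han = assigned = 0
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # Skip unassigned (Cn), private-use (Co) and surrogate (Cs) code points.
        if unicodedata.category(ch) in ("Cn", "Co", "Cs"):
            continue
        assigned += 1
        # Unified ideographs have names of the form "CJK UNIFIED IDEOGRAPH-XXXX".
        if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH"):
            han += 1
    return han, assigned

han, assigned = han_share()
print(unicodedata.unidata_version, han, assigned, round(han / assigned, 2))
```

The count covers unified Han ideographs only; adding kana, hangul, and
the compatibility ideographs would raise the CJK share further.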

All three languages can be written both
left-to-right and top-to-bottom (right-to-left
and top-to-bottom in ancient documents),
but are usually considered left-to-right
scripts when discussing encoding issues.

Legal status
Libraries cooperated on encoding
standards for JACKPHY (Japanese, Arabic,
Chinese, Korean, Persian, Hebrew, Yiddish)
characters in the early 1980s. According to
Ken Lunde, the
abbreviation "CJK" was a registered
trademark of Research Libraries Group[3]
(which merged with OCLC in 2006). The
trademark owned by OCLC between 1987
and 2009 has now expired.[4]

See also
Chinese character description languages
Chinese character encoding
Chinese input methods for computers
CJK Compatibility Ideographs
CJK strokes
CJK Unified Ideographs
Complex Text Layout languages (CTL)
Input method editor
Japanese language and computers
Korean language and computers
List of CJK fonts
Sinoxenic
Variable-width encoding

References
1. Coulmas (1991), pp. 113–115.
2. DeFrancis (1997).
3. Ken Lunde, 1996
4. Justia listing

This article is based on material taken
from the Free On-line Dictionary of
Computing prior to 1 November 2008 and
incorporated under the "relicensing" terms
of the GFDL, version 1.3 or later.

DeFrancis, John. The Chinese Language:
Fact and Fantasy. Honolulu: University of
Hawaii Press, 1990. ISBN 0-8248-1068-6.
Hannas, William C. Asia's Orthographic
Dilemma. Honolulu: University of Hawaii
Press, 1997. ISBN 0-8248-1892-X
(paperback); ISBN 0-8248-1842-3
(hardcover).
Lemberg, Werner. The CJK package for
LaTeX2ε: Multilingual support beyond
babel. TUGboat, Volume 18 (1997), No. 3,
Proceedings of the 1997 Annual Meeting.
Leban, Carl. Automated Orthographic
Systems for East Asian Languages
(Chinese, Japanese, Korean),
State-of-the-art Report, Prepared for the
Board of Directors, Association for Asian
Studies. 1971.
Lunde, Ken. CJKV Information
Processing. Sebastopol, Calif.: O'Reilly &
Associates, 1998. ISBN 1-56592-224-7.

External links
CJKV: A Brief Introduction
Lemberg CJK article from above, TUGboat 18-3
On “CJK Unified Ideograph”, from Wenlin.com
FGA: Unicode CJKV character set rationalization

Retrieved from
"https://en.wikipedia.org/w/index.php?title=CJK_characters&oldid=965868111"

Content is available under CC BY-SA 3.0 unless
otherwise noted.
