Appendix C: The Unicode Character Set

The Java programming language uses the Unicode character set for managing
text. A character set is simply an ordered list of characters, each corresponding to
a particular numeric value. Unicode is an international character set that contains
letters, symbols, and ideograms for languages all over the world. In Java, each
char is represented as a 16-bit unsigned numeric value, so a single char can take
on 65,536 distinct values. The Unicode character set continues to be refined as
characters from various languages are included, and it has since grown beyond
that range; Java represents the additional characters as pairs of char values.
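As a small illustration of the 16-bit representation, the following sketch prints the size and value range of a Java char and converts a character to its numeric value. The class name CharWidthDemo and the helper numericValue are hypothetical names chosen for this example.

```java
class CharWidthDemo {
    // Returns the numeric (Unicode) value stored in a char.
    static int numericValue(char ch) {
        return (int) ch; // the cast is optional; char widens to int automatically
    }

    public static void main(String[] args) {
        // Number of bits used to store one char
        System.out.println("Bits per char: " + Character.SIZE);

        // Smallest and largest values a single char can hold
        System.out.println("Minimum value: " + (int) Character.MIN_VALUE);
        System.out.println("Maximum value: " + (int) Character.MAX_VALUE);

        // Numeric value of a familiar character
        System.out.println("Value of 'A': " + numericValue('A'));
    }
}
```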
Many programming languages still use the ASCII character set. ASCII stands
for the American Standard Code for Information Interchange. The 8-bit extended
ASCII set is quite small, so the developers of Java opted to use Unicode in order
to support international users. However, ASCII is essentially a subset of Unicode,
including corresponding numeric values, so programmers used to ASCII should
have no problems with Unicode.
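One way to check the subset relationship is to encode a string as ASCII bytes and compare each byte with the corresponding char value. This is only a sketch: the class AsciiSubsetDemo and the method sameAsAscii are hypothetical names, and the check relies on String.getBytes substituting a replacement byte for characters that ASCII cannot represent.

```java
import java.nio.charset.StandardCharsets;

class AsciiSubsetDemo {
    // Returns true if every character in the text has a Unicode value
    // equal to the byte produced by encoding it as ASCII.
    static boolean sameAsAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        if (ascii.length != text.length()) {
            return false;
        }
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != ascii[i]) {
                return false; // the character has no matching ASCII code
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameAsAscii("Hello, Java!")); // only ASCII characters
        System.out.println(sameAsAscii("caf\u00e9"));    // last character is outside ASCII
    }
}
```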
Figure C.1 shows a list of commonly used characters and their Unicode
numeric values. These characters also happen to be ASCII characters. All of the
characters in Fig. C.1 are called printable characters because they have a symbolic
representation that can be displayed on a monitor or printed by a printer. Other
characters are called nonprintable characters because they have no such symbolic
representation. Note that the space character (numeric value 32) is considered a
printable character, even though no symbol is printed when it is displayed.
Nonprintable characters are sometimes called control characters because many of
them can be generated by holding down the control key on a keyboard and
pressing another key.
The Unicode characters with numeric values 0 through 31 are nonprintable
characters. Also, the delete character, with numeric value 127, is a nonprintable
character. All of these characters are ASCII characters as well. Many of them have
fairly common and well-defined uses, while others are more general. The table in
Fig. C.2 lists a small sample of the nonprintable characters.
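The ranges described above can be checked with the standard method Character.isISOControl. The sketch below counts the control characters among the first 128 values; the class ControlCharDemo and the method countControlChars are hypothetical names. Note that isISOControl also reports the values 128 through 159 as control characters, which is why the count here is limited to the ASCII range.

```java
class ControlCharDemo {
    // Counts the values in [low, high] that Java classifies as control characters.
    static int countControlChars(int low, int high) {
        int count = 0;
        for (int value = low; value <= high; value++) {
            if (Character.isISOControl((char) value)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Values 0 through 31 plus 127 (delete)
        System.out.println(countControlChars(0, 127));

        System.out.println(Character.isISOControl(' '));        // space is printable
        System.out.println(Character.isISOControl((char) 127)); // delete is not
    }
}
```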
Nonprintable characters are used in many situations to represent special
conditions. For example, certain nonprintable characters can be stored in a text
document to indicate, among other things, the beginning of a new line. An editor
will process these characters by starting the text that follows them on a new line,
instead of printing a symbol to the screen. Various types of computer systems use
different nonprintable characters to represent particular conditions.
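A brief sketch of how a program might handle line-ending characters follows. It assumes the three common conventions (line feed, carriage return followed by line feed, and a lone carriage return); the class NewLineDemo and the method lines are hypothetical names.

```java
class NewLineDemo {
    // Splits text into lines, accepting "\r\n", a lone "\r", or "\n" as a line ending.
    static String[] lines(String text) {
        return text.split("\r\n|\r|\n");
    }

    public static void main(String[] args) {
        // The same text could come from systems with different line endings.
        String text = "first line\nsecond line\r\nthird line";
        for (String line : lines(text)) {
            System.out.println(line);
        }

        // Numeric values of the line separator used by the current system
        for (char ch : System.lineSeparator().toCharArray()) {
            System.out.print((int) ch + " ");
        }
        System.out.println();
    }
}
```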
Except for having no visible representation, nonprintable characters are
essentially equivalent to printable characters. They can be stored in a Java
character variable and be part of a character string. They are stored using 16 bits,
can be converted to their numeric value, and can be compared using relational
operators.
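The following sketch shows a nonprintable character (tab) being stored in a char variable, converted to its numeric value, compared with a printable character, and placed in a string. The class NonprintableCharDemo and the method comesBefore are hypothetical names.

```java
class NonprintableCharDemo {
    // Compares two characters by their numeric values.
    static boolean comesBefore(char first, char second) {
        return first < second;
    }

    public static void main(String[] args) {
        char tab = '\t';     // a nonprintable character stored in a char variable
        char letter = 'A';

        int value = tab;     // converted to its numeric value
        System.out.println(value);

        // Relational operators compare the numeric values
        System.out.println(comesBefore(tab, letter));

        // A nonprintable character can be part of a string like any other
        String text = "a" + tab + "b";
        System.out.println(text.length());
        System.out.println(text.indexOf('\t'));
    }
}
```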

Value  Char     Value  Char     Value  Char     Value  Char     Value  Char

   32  space       51  3           70  F           89  Y          108  l
   33  !           52  4           71  G           90  Z          109  m
   34  "           53  5           72  H           91  [          110  n
   35  #           54  6           73  I           92  \          111  o
   36  $           55  7           74  J           93  ]          112  p
   37  %           56  8           75  K           94  ^          113  q
   38  &           57  9           76  L           95  _          114  r
   39  '           58  :           77  M           96  `          115  s
   40  (           59  ;           78  N           97  a          116  t
   41  )           60  <           79  O           98  b          117  u
   42  *           61  =           80  P           99  c          118  v
   43  +           62  >           81  Q          100  d          119  w
   44  ,           63  ?           82  R          101  e          120  x
   45  -           64  @           83  S          102  f          121  y
   46  .           65  A           84  T          103  g          122  z
   47  /           66  B           85  U          104  h          123  {
   48  0           67  C           86  V          105  i          124  |
   49  1           68  D           87  W          106  j          125  }
   50  2           69  E           88  X          107  k          126  ~

Figure C.1 A small portion of the Unicode character set
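The values in Figure C.1 can be reproduced by casting each number from 32 through 126 to a char. This is a minimal sketch; the class PrintableTableDemo and the method row are hypothetical names, and the output is a single column rather than the five-column layout of the figure.

```java
class PrintableTableDemo {
    // Formats one table entry: the numeric value followed by its character.
    static String row(int value) {
        return value + " " + (char) value;
    }

    public static void main(String[] args) {
        // 32 (space) through 126 (~) are the printable ASCII characters
        for (int value = 32; value <= 126; value++) {
            System.out.println(row(value));
        }
    }
}
```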

The first 128 characters of the Unicode character set correspond to the common
ASCII character set. The first 256 characters correspond to the ISO-Latin-1
extended ASCII character set. Many operating systems and Web browsers will
handle these characters, but they may not be able to print the other Unicode
characters.
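The two boundaries mentioned above (128 and 256) can be expressed as a simple range check on a character's numeric value. The class CharRangeDemo, the method range, and the label strings are hypothetical choices made for this sketch.

```java
class CharRangeDemo {
    // Classifies a character by the range its numeric value falls into.
    static String range(char ch) {
        if (ch < 128) {
            return "ASCII";
        }
        if (ch < 256) {
            return "ISO-Latin-1";
        }
        return "other Unicode";
    }

    public static void main(String[] args) {
        System.out.println(range('A'));      // value 65
        System.out.println(range('\u00e9')); // value 233, e with acute accent
        System.out.println(range('\u03a9')); // value 937, Greek capital omega
    }
}
```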

Value  Character

    0  null
    7  bell
    8  backspace
    9  tab
   10  line feed
   12  form feed
   13  carriage return
   27  escape
  127  delete

Figure C.2 Some nonprintable characters in the Unicode character set
