Lecture 4 Comp SC and Engineering+

Lecture
in Computer Science and Engineering
Unit 6. Coding Text Information

Coding Text Information
1. Introduction.
2. ASCII encoding.
3. Unicode encoding.
4. Conclusions.
John Gammack, Val Hobbs, Diarmuid Pigott. The Book of Informatics,
Cengage Learning, 2011, 548 p.
1. Introduction
If you assign a number to each symbol of an alphabet, text can also
be encoded in binary. With 8 binary digits, $2^{8} = 256$ different
characters can be encoded. This is enough to express, as different
combinations of eight bits, all the symbols of the English alphabet
and of one national alphabet (lowercase and uppercase letters),
digits, punctuation marks and service symbols such as §.
The main difficulty in encoding text data is caused by the limited
set of codes (256). If you increase the number of bits allocated to
each code, the number of available codes becomes much larger. A
system originally based on 16-bit encoding is called the universal
encoding system, Unicode. Sixteen bits provide unique codes for
65,536 different characters, which is enough to place the symbols of
most of the planet's languages in one table.
The main disadvantage of 16-bit Unicode is that text which fits an
8-bit table doubles in size. With modern computing resources,
however, this is not a problem.
2. ASCII encoding
Thus, in an 8-bit encoding, 1 character is 1 byte. A table that
determines the correspondence between a byte value and a symbol is
called an encoding table.
However, in the early years of computer technology there was no
single standard for the code table, so many conflicting standards
were in use around the world at the same time. This is the basic
difficulty in encoding textual information: the symbols of national
alphabets are encoded differently by different tables.
For the English language, the standards institute ANSI (American
National Standards Institute) introduced the ASCII coding system
(American Standard Code for Information Interchange). Two coding
tables are distinguished: basic and extended.
The base table fixes code values from 0 to 127.
The extended table fixes code values from 128 to 255.
The first 32 codes of the base table, starting with zero, are
reserved for hardware manufacturers. This area holds control codes,
which do not correspond to any language symbols. Accordingly, these
codes are not displayed on the screen or on print devices, but they
can be used to control how other data is output.
Codes 32 to 127 hold the letters of the English alphabet,
punctuation marks, digits, arithmetic operators and some auxiliary
symbols. The basic ASCII encoding table is shown in Table 1.
Table 1. Original ASCII character set (American Standard Code for
Information Interchange, since 1963). Control characters (0-31),
standard character set (32-127)
0 NUL 16 DLE 32 Space 48 0 64 @ 80 P 96 ` 112 p
1 SOH 17 DC1 33 ! 49 1 65 A 81 Q 97 a 113 q
2 STX 18 DC2 34 " 50 2 66 B 82 R 98 b 114 r
3 ETX 19 DC3 35 # 51 3 67 C 83 S 99 c 115 s
4 EOT 20 DC4 36 $ 52 4 68 D 84 T 100 d 116 t
5 ENQ 21 NAK 37 % 53 5 69 E 85 U 101 e 117 u
6 ACK 22 SYN 38 & 54 6 70 F 86 V 102 f 118 v
7 BEL 23 ETB 39 ' 55 7 71 G 87 W 103 g 119 w
8 BS 24 CAN 40 ( 56 8 72 H 88 X 104 h 120 x
9 HT 25 EM 41 ) 57 9 73 I 89 Y 105 i 121 y
10 LF 26 SUB 42 * 58 : 74 J 90 Z 106 j 122 z
11 VT 27 ESC 43 + 59 ; 75 K 91 [ 107 k 123 {
12 FF 28 FS 44 , 60 < 76 L 92 \ 108 l 124 |
13 CR 29 GS 45 - 61 = 77 M 93 ] 109 m 125 }
14 SO 30 RS 46 . 62 > 78 N 94 ^ 110 n 126 ~
15 SI 31 US 47 / 63 ? 79 O 95 _ 111 o 127 DEL
The names of ASCII symbols

№ Control № Control
0 NUL null character 16 ctrl-P (DLE) data link escape
1 ctrl-A (SOH) start of header 17 ctrl-Q (START, DC1) device control 1
2 ctrl-B (STX) start of text 18 ctrl-R (DC2) device control 2
3 ctrl-C (INTR, ETX) end of text 19 ctrl-S (STOP, DC3) device control 3
4 ctrl-D (EOF, EOT) end of transmission 20 ctrl-T (DC4) device control 4
5 ctrl-E (ENQ) enquiry 21 ctrl-U (NAK) negative acknowledge
6 ctrl-F (ACK) acknowledge 22 ctrl-V (SYN) synchronize
7 ctrl-G (BEL) bell (ring) 23 ctrl-W (ETB) end transmission block
8 ctrl-H (BKSP, BS) backspace 24 ctrl-X (CAN) cancel
9 ctrl-I (TAB, HT) horizontal tab 25 ctrl-Y (EM) end of medium
10 ctrl-J (LF) line feed 26 ctrl-Z (SUB) substitute
11 ctrl-K (VT) vertical tab 27 (ESC) escape
12 ctrl-L (FF) form feed 28 (FS) file separator
13 ctrl-M (CR) carriage return 29 (GS) group separator
14 ctrl-N (SO) shift out 30 (RS) record separator
15 ctrl-O (SI) shift in 31 (US) unit separator
32 Space 56 Eight 80 Upper case P 104 Lower case h
33 Exclamation mark 57 Nine 81 Upper case Q 105 Lower case i
34 Quotation mark 58 Colon 82 Upper case R 106 Lower case j
35 Hash 59 Semicolon 83 Upper case S 107 Lower case k
36 Dollar 60 Less than 84 Upper case T 108 Lower case l
37 Percent 61 Equals sign 85 Upper case U 109 Lower case m
38 Ampersand 62 Greater than 86 Upper case V 110 Lower case n
39 Apostrophe 63 Question mark 87 Upper case W 111 Lower case o
40 Open bracket 64 At 88 Upper case X 112 Lower case p
41 Close bracket 65 Upper case A 89 Upper case Y 113 Lower case q
42 Asterisk 66 Upper case B 90 Upper case Z 114 Lower case r
43 Plus 67 Upper case C 91 Open square bracket 115 Lower case s
44 Comma 68 Upper case D 92 Backslash 116 Lower case t
45 Dash 69 Upper case E 93 Close square bracket 117 Lower case u
46 Full stop 70 Upper case F 94 Caret 118 Lower case v
47 Slash 71 Upper case G 95 Underscore 119 Lower case w
48 Zero 72 Upper case H 96 Grave accent 120 Lower case x
49 One 73 Upper case I 97 Lower case a 121 Lower case y
50 Two 74 Upper case J 98 Lower case b 122 Lower case z
51 Three 75 Upper case K 99 Lower case c 123 Open brace
52 Four 76 Upper case L 100 Lower case d 124 Pipe
53 Five 77 Upper case M 101 Lower case e 125 Close brace
54 Six 78 Upper case N 102 Lower case f 126 Tilde
55 Seven 79 Upper case O 103 Lower case g 127 Delete
The extended coding table is occupied by national alphabets.
Three standards are currently in use on the territory of Ukraine:
1. Windows-1251 encoding.
2. IBM encoding (cp-437).
3. GOST-alternative encoding (cp-866).
The Windows-1251 encoding (Table 2) was developed by Microsoft and,
judging by the name, is intended for local computers running the
Windows operating system.
Extended character sets (Tables 2, 3, 4)
Table 2. Windows encoding (cp-1251)
128 Ђ 144 ђ 160 NBSP 176 ° 192 А 208 Р 224 а 240 р
129 Ѓ 145 ‘ 161 Ў 177 ± 193 Б 209 С 225 б 241 с
130 ‚ 146 ’ 162 ў 178 І 194 В 210 Т 226 в 242 т
131 ѓ 147 “ 163 Ј 179 і 195 Г 211 У 227 г 243 у
132 „ 148 ” 164 ¤ 180 ґ 196 Д 212 Ф 228 д 244 ф
133 … 149 • 165 Ґ 181 µ 197 Е 213 Х 229 е 245 х
134 † 150 – 166 ¦ 182 ¶ 198 Ж 214 Ц 230 ж 246 ц
135 ‡ 151 — 167 § 183 · 199 З 215 Ч 231 з 247 ч
136 € 152 (unused) 168 Ё 184 ё 200 И 216 Ш 232 и 248 ш
137 ‰ 153 ™ 169 © 185 № 201 Й 217 Щ 233 й 249 щ
138 Љ 154 љ 170 Є 186 є 202 К 218 Ъ 234 к 250 ъ
139 ‹ 155 › 171 « 187 » 203 Л 219 Ы 235 л 251 ы
140 Њ 156 њ 172 ¬ 188 ј 204 М 220 Ь 236 м 252 ь
141 Ќ 157 ќ 173 SHY 189 Ѕ 205 Н 221 Э 237 н 253 э
142 Ћ 158 ћ 174 ® 190 ѕ 206 О 222 Ю 238 о 254 ю
143 Џ 159 џ 175 Ї 191 ї 207 П 223 Я 239 п 255 я
Code page 437 is the original IBM PC encoding (Table 3).
Table 3. Code page 437, the original code page of the IBM PC
128 Ç 144 É 160 á 176 ░ 192 └ 208 ╨ 224 α 240 ≡
129 ü 145 æ 161 í 177 ▒ 193 ┴ 209 ╤ 225 ß 241 ±
130 é 146 Æ 162 ó 178 ▓ 194 ┬ 210 ╥ 226 Γ 242 ≥
131 â 147 ô 163 ú 179 │ 195 ├ 211 ╙ 227 π 243 ≤
132 ä 148 ö 164 ñ 180 ┤ 196 ─ 212 ╘ 228 Σ 244 ⌠
133 à 149 ò 165 Ñ 181 ╡ 197 ┼ 213 ╒ 229 σ 245 ⌡
134 å 150 û 166 ª 182 ╢ 198 ╞ 214 ╓ 230 µ 246 ÷
135 ç 151 ù 167 º 183 ╖ 199 ╟ 215 ╫ 231 τ 247 ≈
136 ê 152 ÿ 168 ¿ 184 ╕ 200 ╚ 216 ╪ 232 Φ 248 °
137 ë 153 Ö 169 ⌐ 185 ╣ 201 ╔ 217 ┘ 233 Θ 249 ∙
138 è 154 Ü 170 ¬ 186 ║ 202 ╩ 218 ┌ 234 Ω 250 ·
139 ï 155 ¢ 171 ½ 187 ╗ 203 ╦ 219 █ 235 δ 251 √
140 î 156 £ 172 ¼ 188 ╝ 204 ╠ 220 ▄ 236 ∞ 252 ⁿ
141 ì 157 ¥ 173 ¡ 189 ╜ 205 ═ 221 ▌ 237 φ 253 ²
142 Ä 158 ₧ 174 « 190 ╛ 206 ╬ 222 ▐ 238 ε 254 ■
143 Å 159 ƒ 175 » 191 ┐ 207 ╧ 223 ▀ 239 ∩ 255 NBSP
The names of characters in code page 437
128 Upper case C with cedilla 192 Box drawings light up and right
129 Lower case u with diaeresis 193 Box drawings light up and horizontal
130 Lower case e with acute 194 Box drawings light down and horizontal
131 Lower case a with circumflex 195 Box drawings light vertical and right
132 Lower case a with diaeresis 196 Box drawings light horizontal
133 Lower case a with grave 197 Box drawings light vertical and horizontal
134 Lower case a with ring above 198 Box drawings vertical single and right double
135 Lower case c with cedilla 199 Box drawings vertical double and right single
136 Lower case e with circumflex 200 Box drawings double up and right
137 Lower case e with diaeresis 201 Box drawings double down and right
138 Lower case e with grave 202 Box drawings double up and horizontal
139 Lower case i with diaeresis 203 Box drawings double down and horizontal
140 Lower case i with circumflex 204 Box drawings double vertical and right
141 Lower case i with grave 205 Box drawings double horizontal
142 Upper case A with diaeresis 206 Box drawings double vertical and horizontal
143 Upper case A with ring above 207 Box drawings up single and horizontal double
144 Upper case E with acute 208 Box drawings up double and horizontal single
145 Lower case ae 209 Box drawings down single and horizontal double
146 Upper case AE 210 Box drawings down double and horizontal single
147 Lower case o with circumflex 211 Box drawings up double and right single
148 Lower case o with diaeresis 212 Box drawings up single and right double
149 Lower case o with grave 213 Box drawings down single and right double
150 Lower case u with circumflex 214 Box drawings down double and right single
151 Lower case u with grave 215 Box drawings vertical double and horizontal single
152 Lower case y with diaeresis 216 Box drawings vertical single and horizontal double
153 Upper case O with diaeresis 217 Box drawings light up and left
154 Upper case U with diaeresis 218 Box drawings light down and right
155 Cent sign 219 Full block
156 Pound sign 220 Lower half block
157 Yen sign 221 Left half block
158 Peseta sign 222 Right half block
159 Lower case f with hook 223 Upper half block
160 Lower case a with acute 224 Greek lower case alpha
161 Lower case i with acute 225 Lower case sharp s
162 Lower case o with acute 226 Greek upper case letter gamma
163 Lower case u with acute 227 Greek lower case pi
164 Lower case n with tilde 228 Greek upper case letter sigma
165 Upper case N with tilde 229 Greek lower case sigma
166 Feminine ordinal indicator 230 Micro sign
167 Masculine ordinal indicator 231 Greek lower case tau
168 Inverted question mark 232 Greek upper case letter phi
169 Reversed not sign 233 Greek upper case letter theta
170 Not sign 234 Greek upper case letter omega
171 Vulgar fraction one half 235 Greek lower case delta
172 Vulgar fraction one quarter 236 Infinity
173 Inverted exclamation mark 237 Greek lower case phi
174 Left-pointing double angle quotation mark 238 Greek lower case epsilon
175 Right-pointing double angle quotation mark 239 Intersection
176 Light shade 240 Identical to
177 Medium shade 241 Plus-minus sign
178 Dark shade 242 Greater-than or equal to
179 Box drawings light vertical 243 Less-than or equal to
180 Box drawings light vertical and left 244 Top half integral
181 Box drawings vertical single and left double 245 Bottom half integral
182 Box drawings vertical double and left single 246 Division sign
183 Box drawings down double and left single 247 Almost equal to
184 Box drawings down single and left double 248 Degree sign
185 Box drawings double vertical and left 249 Bullet operator
186 Box drawings double vertical 250 Middle dot
187 Box drawings double down and left 251 Square root
188 Box drawings up single and left double 252 Superscript lower case n
189 Box drawings double up and left 253 Superscript two
190 Box drawings up double and left single 254 Black square
191 Box drawings light down and left 255 No-break space
On computers running the MS-DOS operating system, the
GOST-alternative encoding (cp-866) is used (Table 4). Although the
operating system is considered obsolete, the encoding is still used
to this day.
Table 4. GOST-alternative encoding (cp-866)
128 А 144 Р 160 а 176 ░ 192 └ 208 ╨ 224 р 240 Ё
129 Б 145 С 161 б 177 ▒ 193 ┴ 209 ╤ 225 с 241 ё
130 В 146 Т 162 в 178 ▓ 194 ┬ 210 ╥ 226 т 242 Є
131 Г 147 У 163 г 179 │ 195 ├ 211 ╙ 227 у 243 є
132 Д 148 Ф 164 д 180 ┤ 196 ─ 212 ╘ 228 ф 244 Ї
133 Е 149 Х 165 е 181 ╡ 197 ┼ 213 ╒ 229 х 245 ї
134 Ж 150 Ц 166 ж 182 ╢ 198 ╞ 214 ╓ 230 ц 246 Ў
135 З 151 Ч 167 з 183 ╖ 199 ╟ 215 ╫ 231 ч 247 ў
136 И 152 Ш 168 и 184 ╕ 200 ╚ 216 ╪ 232 ш 248 °
137 Й 153 Щ 169 й 185 ╣ 201 ╔ 217 ┘ 233 щ 249 ∙
138 К 154 Ъ 170 к 186 ║ 202 ╩ 218 ┌ 234 ъ 250 ·
139 Л 155 Ы 171 л 187 ╗ 203 ╦ 219 █ 235 ы 251 √
140 М 156 Ь 172 м 188 ╝ 204 ╠ 220 ▄ 236 ь 252 №
141 Н 157 Э 173 н 189 ╜ 205 ═ 221 ▌ 237 э 253 ¤
142 О 158 Ю 174 о 190 ╛ 206 ╬ 222 ▐ 238 ю 254 ■
143 П 159 Я 175 п 191 ┐ 207 ╧ 223 ▀ 239 я 255 NBSP
3. Unicode encoding
Source: "Unicode: the necessary minimum for each developer",
https://habrahabr.ru/post/312642/
Unicode code space
The Unicode code space consists of 1,114,112 code positions in the
range from 0 to 10FFFF (hexadecimal). Of these, only 128,237 had
been assigned values as of the ninth version of the standard. Part
of the space is reserved for private use, and the Unicode Consortium
promises never to assign values to positions in these special areas.
For convenience, the entire space is divided into 17 planes (six of
them are currently in use). Until recently, it was commonly said
that you would most likely only ever encounter the Basic
Multilingual Plane (BMP), which includes Unicode characters from
U+0000 to U+FFFF. (Looking ahead a little: characters from the BMP
are represented in UTF-16 by two bytes, not four.) As of 2016, this
claim is already in doubt. For example, popular emoji characters may
well appear in a user message, and you need to be able to process
them correctly.
Of the Unicode encodings, the most widespread on the Internet is
UTF-8 (it took the lead in 2008), mainly due to its economy and
transparent compatibility with seven-bit ASCII. Latin letters and
service symbols, basic punctuation and digits, i.e. all seven-bit
ASCII characters, are encoded in UTF-8 with one byte, the same as in
ASCII. The symbols of many major scripts, apart from some rarer
hieroglyphic signs, are represented in it by two or three bytes. The
largest code position defined by the standard, 10FFFF, is encoded
with four bytes.
Note that UTF-8 is a variable-length encoding. Each Unicode
character in it is represented by a sequence of code units, with a
minimum length of one code unit. The number 8 denotes the bit length
of the code unit: 8 bits. For the UTF-16 encoding family the size of
the code unit is 16 bits, and for UTF-32 it is 32 bits.
To store string information in applications, 16-bit Unicode
encodings are often used because of their simplicity, and because
the symbols of the world's major writing systems are encoded with a
single 16-bit code unit. For example, Java uses UTF-16 for its
internal string representation. The Windows operating system also
uses UTF-16 internally.
The term "coding" can cause some confusion, because Unicode encodes
twice. First, the set of Unicode characters (the character set) is
encoded in the sense that each character is assigned a code
position. As part of this process, the Unicode character set becomes
a coded character set. Second, a sequence of Unicode characters is
converted to a string of bytes, and this process is also called
coding.
The Basic Multilingual Plane includes Unicode characters from U+0000
to U+FFFF, which are encoded in UTF-16 by two bytes.
The encodings UTF-8 and UTF-16 have a variable code length. In
UTF-8, each Unicode character is encoded with one, two, three, or
four bytes. In UTF-16, with two or four bytes.
4. Conclusions
ASCII defines 128 characters that correspond to the numbers 0–127.
Unicode defines fewer than $2^{21}$ characters, which are similarly
mapped to numbers from 0 up to, but not including, $2^{21}$
(although not all numbers are currently assigned, and some are
reserved).
Unicode is a superset of ASCII, and the numbers 0–127 have the same
meaning in ASCII as in Unicode. For example, the number 65 means
"Latin capital letter A".
Because Unicode characters generally do not fit in a single 8-bit
byte, there are several ways to store Unicode characters in byte
sequences, such as UTF-32 and UTF-8.
We continue the topic of the previous practical lesson.
We settled on the type char (character).
Here it is treated as a type for storing signed numbers. (In C,
whether plain char is signed depends on the compiler; signed char is
always signed.)
Its length is 1 byte, i.e. 8 bits.
Let's see how many and which numbers it can store.
In past classes, we saw that in the bit grid non-negative numbers
start with zero and negative numbers start with one.
We are already familiar with the table of codes from lecture 3.
Besides the direct code, it contains the complement and inverse
codes, which are used to replace subtraction by addition.
To count the values of a signed number, the simplest model is
enough: the direct (sign-and-magnitude) code, i.e. only the first
column of that table. Real compilers store negative values in the
complement (two's complement) code, so the bit patterns of negative
numbers differ from this model, but the number of values is the
same.
The first column shows that all non-negative numbers start with 0
and all negative numbers start with 1. Thus, the leftmost (high) bit
of a number is the sign.
The remaining 7 bits are left for the magnitude.
That is, in the char type the most significant bit is the sign, and
only 7 bits remain for the number itself.
Zero in the char type is represented as
00000000
Let's see which number corresponds to
01111111
We use the scheme of powers of 2:
position #  8     7         6         5         4        3        2        1
weight      Sign  $2^6$=64  $2^5$=32  $2^4$=16  $2^3$=8  $2^2$=4  $2^1$=2  $2^0$=1
bit         0     1         1         1         1        1        1        1
That is, 64+32+16+8+4+2+1 = 127.
That is, using the char type you can represent non-negative integers
from 0 to 127, a total of 128 non-negative integers.
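Now find out how many and which negative numbers can be represented
using the char type.
First, the pattern
11111111
We use the same scheme:
position #  8     7         6         5         4        3        2        1
weight      Sign  $2^6$=64  $2^5$=32  $2^4$=16  $2^3$=8  $2^2$=4  $2^1$=2  $2^0$=1
bit         1     1         1         1         1        1        1        1
That is, the magnitude is 64+32+16+8+4+2+1 = 127. Taking into
account the sign bit, which is 1, the direct (sign-and-magnitude)
reading gives -127. (In two's complement, which real compilers use,
this same pattern is -1.)
And now the pattern
10000000
In the direct code this would be -0. A second zero is not needed, so
in the char type 10000000 corresponds to -128, which is also its
value in two's complement.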
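Thus, the range of type char is from -128 to 127. In the direct-code
model used here:
01111111 = 127
...
00000001 = 1
00000000 = 0
10000001 = -1
...
11111111 = -127
10000000 = -128
A total of 256 numbers. (In two's complement the negative patterns
run from 11111111 = -1 to 10000000 = -128; the range and the total
are the same.)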
In the range from 0 to 127, i.e. from 00000000 to 01111111, you can
place integers from 0 to 127 or the characters of the ASCII table,
which we studied in the lecture "Coding Text Information".
But it is no longer possible to place the characters of the
Windows-1251 table, or of tables like code page 437 (IBM PC) or
GOST-alternative, because they use positive integers from 128 to
255.
To do this, a conversion is needed. For example, on a
two's-complement machine the code 128 (10000000) is read as -128 and
the code 255 (11111111) is read as -1.
To use the range from 128 to 255 without conversion, there is an
unsigned type, unsigned char.
It also takes 1 byte, but now the high bit is free: there is no
sign. The unsigned char type stores non-negative integers from 0 to
255, so a leading sign bit is not required.
That is,
from 00000000 = 0
up to 11111111 = 255
This type holds both the ASCII table (from 0 to 127), the basic
part, and any additional national table such as Windows-1251 (from
128 to 255), the variable part, which depends on the country or
region in which the computer is used.
This type can also be used simply for storing any integers from 0 to
255.
Thus, data of type char sometimes has to be converted to type
unsigned char and vice versa. Without such a conversion, the program
may produce a wrong answer. But this error will not lead to a
program crash, because the char type and the unsigned char type have
the same width: 1 byte (8 bits).
It is much worse when data types of different widths are mixed, for
example char (8 bits) and int (4 times 8 bits, i.e. 32 bits or 4
bytes, on most current systems). In this case, if the types get
confused, data can be lost or neighbouring memory overwritten, and
the program may crash. An int has room for 4 values of type char,
but a value of type int does not, in general, fit into a char.
What type is int? The name int comes from the word integer. It is
the main type for storing integers: positive numbers, negative
numbers and 0.
8 bits (1 byte) end with the highest power of two, $2^{7}$:
bit     8 (high)   7         6         5         4        3        2        1 (low)
weight  $2^7$=128  $2^6$=64  $2^5$=32  $2^4$=16  $2^3$=8  $2^2$=4  $2^1$=2  $2^0$=1
16 bits (2 bytes) end with the high-order power $2^{15}$:
bit     16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
weight  $2^{15}$ $2^{14}$ $2^{13}$ $2^{12}$ $2^{11}$ $2^{10}$ $2^{9}$ $2^{8}$ $2^{7}$ $2^{6}$ $2^{5}$ $2^{4}$ $2^{3}$ $2^{2}$ $2^{1}$ $2^{0}$
Determine how much $2^{15}$ is.
24 bits (3 bytes) end with $2^{23}$: bits 24, 23, ..., 2, 1 have the
weights $2^{23}, 2^{22}, \dots, 2^{1}, 2^{0}$.
Determine how much $2^{23}$ is.
Finally, 32 bits (4 bytes) end with $2^{31}$: bits 32, 31, ..., 2, 1
have the weights $2^{31}, 2^{30}, \dots, 2^{1}, 2^{0}$.
Also calculate what $2^{31}$ equals.
But, as was said, the int type is used to store both positive (more
precisely, non-negative) numbers and negative ones. That is, it is a
signed type, just like char.
Thus, the high, leftmost bit is used for the sign.
That is, the magnitude starts at $2^{30}$.
For example, 0
00000000 00000000 00000000 00000000
We deliberately separate the bytes (8 bits each) for clarity.
Or 1
00000000 00000000 00000000 00000001
Now -1 (in the direct-code model; in two's complement, which real
compilers use, -1 is 11111111 11111111 11111111 11111111)
10000000 00000000 00000000 00000001
Largest positive number
01111111 11111111 11111111 11111111
Task: using powers of 2, determine which decimal number it
corresponds to, i.e. add up
$2^{30}+2^{29}+2^{28}+\dots+2^{3}+2^{2}+2^{1}+2^{0}$
In fact, this can be done more simply.
For example, if you limit yourself to 1 byte, i.e.
$2^{7}+2^{6}+2^{5}+2^{4}+2^{3}+2^{2}+2^{1}+2^{0}$
you get 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255.
But this result can be obtained without addition. After all, the
next power is $2^{8} = 256$, and 255 is 256 - 1, i.e.
$2^{7}+2^{6}+2^{5}+2^{4}+2^{3}+2^{2}+2^{1}+2^{0} = 2^{8}-1$
The same holds for the sum
$2^{30}+2^{29}+2^{28}+\dots+2^{3}+2^{2}+2^{1}+2^{0} = 2^{31}-1$
Calculate it.
So, the non-negative range of type int is from 0 to $2^{31}-1$.
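The shortcut used above is an instance of a general identity for sums of powers of two. A short derivation, where $S_n$ is a symbol introduced here only for the sum of the first $n$ powers:

```latex
S_n = 2^{0} + 2^{1} + \dots + 2^{n-1}, \qquad
2S_n = 2^{1} + 2^{2} + \dots + 2^{n}, \qquad
S_n = 2S_n - S_n = 2^{n} - 2^{0} = 2^{n} - 1 .
```

For $n = 8$ this gives $2^{8}-1 = 255$, and for $n = 31$ it gives the upper bound $2^{31}-1$ of the int type.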
It is clear that, in the direct-code model, the smallest negative
number is built the same way:
11111111 11111111 11111111 11111111
that is, $-(2^{31}-1)$.
Calculate it too.
However, -0 remains:
10000000 00000000 00000000 00000000
The situation is the same as in the char type, where -0, i.e.
10000000, corresponds to -128, i.e. -127-1 or -(127+1).
So in int there is an even smaller number:
$-(2^{31}-1)-1 = -(2^{31}-1+1) = -2^{31}$
So the range of a 4-byte int is
from $-2^{31}$ to $2^{31}-1$.
Or, in the direct-code model,
01111111 11111111 11111111 11111111 = $2^{31}-1$
...
00000000 00000000 00000000 00000001 = 1
00000000 00000000 00000000 00000000 = 0
10000000 00000000 00000000 00000001 = -1
...
11111111 11111111 11111111 11111111 = $-(2^{31}-1)$
10000000 00000000 00000000 00000000 = $-2^{31}$
Just as there is an unsigned type unsigned char, from 0 to 255,
there is an unsigned type unsigned int, from 0 to $2^{32}-1$.
Summary task
1) Calculate the boundary of the 2nd byte, $2^{15}$
2) Calculate the boundary of the 3rd byte, $2^{23}$
3) Calculate the boundary of the 4th byte, $2^{31}$
4) Calculate the upper bound of type int, $2^{31}-1$
5) Calculate $-(2^{31}-1)$
6) Calculate the lower bound of type int, $-2^{31}$
7) Write the range of type int.
8) Write the range of type unsigned int.
9) How many numbers in total can be encoded using 4 bytes?
Formulate the answers in writing in a notebook.
Make a summary. On each page, put your last name, first name and
group number. Take pictures and send them to
nikolaiulyanoff1968@gmail.com
The type "char" (character).
The program listing:
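#include <stdio.h>
#include <string.h>

#define N 100

char st[N];

int main(void)
{
    size_t i;

    // The way #1: fill the string one character at a time
    st[0] = 'N';
    st[1] = 'i';
    st[2] = 'k';
    st[3] = 'o';
    st[4] = 'l';
    st[5] = 'a';
    st[6] = 'i';
    st[7] = '\0';   // terminating null character
    printf("String:\n");
    for (i = 0; i < strlen(st); i++)
        printf("%c", st[i]);

    // The way #2: copy a string literal
    strcpy(st, "Nikolai");
    printf("\nString #2:\n");
    printf("%s", st);

    // The ASCII way #3: fill the string with character codes
    st[0] = 65;     // 'A'
    st[1] = 107;    // 'k'
    st[2] = 104;    // 'h'
    st[3] = 109;    // 'm'
    st[4] = 101;    // 'e'
    st[5] = 100;    // 'd'
    st[6] = 32;     // space
    st[7] = 73;     // 'I'
    st[8] = 46;     // '.'
    st[9] = '\0';
    printf("\nString #3:\n");
    for (i = 0; i < strlen(st); i++)
        printf("%c", st[i]);
    printf("\n");
    // conio.h and getch() are DOS-specific and are not available in
    // online GCC-based compilers, so the final key wait is omitted.
    return 0;
}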
Homework.
Compile a program that prints your own name using an online
compiler, for example
https://www.onlinegdb.com/online_c_compiler
Take a screenshot of the resulting online page and send it to me by
e-mail: nikolaiulyanoff1968@gmail.com
Developed by Associate Professor of the Department № 903
N.V. Ulyanov
Protocol № ___ ____________
