C for Everyone
The JumpStart Guide to C
by
Richard Man & C.J. Willrich
Copyright © 2015 ImageCraft Creations Inc.
All Rights Reserved
http://c4everyone.com
info@imagecraft.com
DEDICATION
int main(void) {
    printf("Thanks to %s, %s, %s, and %s\n",
           "David W. Krumme", "Dennis M. Ritchie",
           "Brian W. Kernighan", "Phillip J. Plauger");
    return 0;
}
TABLE OF CONTENTS
SECTION I – TUTORIAL INTRODUCTION
1 – INTRODUCTION
SECTION II – THE C PROGRAMMING LANGUAGE
2 – BASIC ELEMENTS OF C
3 – EXPRESSIONS AND OPERATORS
4 – STATEMENTS
5 – VARIABLES
6 – TYPES AND DECLARATIONS
7 – FUNCTIONS
8 – THE C PREPROCESSOR
9 – THE STANDARD C LIBRARY
SECTION III – ADVANCED TOPICS IN C
10 – EFFECTIVE POINTER AND ARRAY USAGE
11 – DYNAMIC DATA STRUCTURES
SECTION IV – APPENDICES
A – INTRODUCTION TO COMPUTER ARITHMETIC
B – A BRIEF HISTORY OF C
C – THE C STANDARDS
D – C COMPILERS AND THE RUNTIME ENVIRONMENT
SECTION I
TUTORIAL INTRODUCTION
Using example programs, we will examine the basic structures and features of C
programs.
1. INTRODUCTION
This book, C for Everyone - The JumpStart Guide to C, is part of the JumpStart
MicroBox education kit package. The purpose of this book is to teach the
programming language C, using program examples running on the JumpStart
MicroBox hardware.
In addition, the document “JumpStart MicroBox Hardware” focuses on the
hardware aspect of the kit and the document “JumpStart API” describes our API,
which makes getting started with Cortex-M much easier, eliminating much of the
tedium and potential mistakes in the low-level peripheral setup procedures.
If you are already proficient in C and want to get started with embedded
programming, you may skim this book and concentrate on the example section
that is focused on embedded programming. Be warned that unless you are a “C
Wizard”, chances are that there is information in this book that may be useful but
unknown to you. Therefore, we recommend at a minimum that you skim the
content.
A college level course syllabus can be constructed around this book and its
examples, suitable for both hardware and software engineers.
Finally, “can you learn C from this book without the MicroBox hardware?” The
answer is undoubtedly Yes! Section II, The C Programming Language, is a
concise practical introduction to the C language. Reading through this chapter and
Section II alone will give you a good understanding of the C language. So
whatever platform you are using C on, look at the examples and text, and “type
away”.
JumpStart MicroBox Preparation
Before you start, since we will make use of the JumpStart MicroBox to run the
example projects, please follow the Quick Start Guide document and do the
following:
1. Install the JumpStart C for Cortex compiler [1]
2. (Optional) Obtain a license for the JumpStart C for Cortex compiler
3. Install the PuTTY terminal program
4. Install the ST USB driver for the ST Nucleo
5. (Do not attach the ACE Shield to the ST Nucleo yet!)
6. Connect the ST Nucleo to the PC and make sure that the driver is working
properly
7. Follow the instructions on how to invoke the IDE, build, and run the
Blink-Nucleo-LED program
8. In the IDE, activate the Hello World project, build, and run the project
Focus of This Chapter
Our aim in this chapter is to show the essential elements of C by presenting
examples that you can run on the JumpStart MicroBox hardware using the
JumpStart C for Cortex-M compiler tools.
This chapter will only use the most basic features of C, as this is a quick
introduction chapter. More advanced topics such as pointers and structures -
which are key to effective C programming - will be presented later, along with
more comprehensive descriptions of some of the topics covered here.
The book assumes that you have some basic understanding of computer
arithmetic. If the terms bits and bytes, or CPU do not mean anything to you, then
you should read Appendix A <Introduction to Computer Arithmetic>.
What Is C?
C is a programming language. A programming language is a created language for
communicating instructions to the Central Processing Unit (CPU), the “heart” of a
computer or microcontroller system. Programming languages are described by
formal rules and definitions. The syntax of a programming language refers to what
lexical characters (letters, symbols, etc.) can be used in a program, and where
they may occur. The semantics of a programming language refer to the meanings
of the program elements.
C belongs to a class of programming languages called procedural languages. [2]
Most algorithms can be expressed in C easily. Since its introduction in the early
1970s, followed by an explosive rise in popularity in the late 1970s and the 1980s,
C has become the primary programming language for any new CPU (except for
extremely limited or specialized ones).
C is especially suited for low level programming since much of the low level
access code can be written in C, avoiding the need to deal with machine
languages, which are difficult to use. C does have many modern programming
language features, striking a good balance between power, ease of use, and
efficiency.
The First C Program
If you have followed any programming language tutorials, you may have
encountered “Hello World”, which is a typical first test program that prints out that
eponymous phrase. This practice was in fact popularized by the “C Bible”: The C
Programming Language, by Brian W. Kernighan and Dennis M. Ritchie. The
objective of the program is to print the words
hello, world
[3]
Life is simple on a Unix machine, where you use a terminal and create a text
file containing the following text:
#include <stdio.h>
int main(void)
{
    printf("hello, world\n");
    return 0;
}
On a Unix machine, assuming you have saved the program in a file named hello.c,
you type the following at the shell prompt (‘$’ is the “shell prompt”):
$ cc hello.c
$ ./a.out
hello, world
The first line is the command to run the C compiler (named cc, clever eh?). The C
compiler generates an output file called a.out. The second line runs the program
a.out and the third line shows the output as the program is run.
However, most people do not use a shell on Unix. For Windows, it’s a lot more
complicated because it requires a lot of support code to create a “window” etc. For
embedded systems, it is complicated because there is no standard method of
“writing to a terminal”. Fortunately, we have made the process easy with the
JumpStart MicroBox.
“Hello World”
With the IDE, open the file main.c for the “Hello World” project:
Compared to the hello.c shown earlier, this file looks a bit more
complicated. For now, you can ignore all the elements that are not in the original
program (they appear after line 17, omitted here), as they are for setting up the
microcontroller environment.
What do we have here? All C programs make use of functions and variables. A
function contains statements specifying the computing operations. The operations
may use variables to store values. For example, starting at line 10 [4], there is a
function called main.
The function main is special, as it is the function that will be run first in a C
program, after the C environment is set up.
When run, its output can be viewed using a terminal emulator program such as
PuTTY [5]:
The first line “ImageCraft JumpStart…” is produced in the Setup function.
The second line “hello, world” is the output we are interested in.
Comments
Examine the first 3 lines of the file main.c:
/*
* Hello World example
*/
This is called a comment block, and is ignored by the compiler. A comment block
is any text enclosed by the /* */ pair. Comment blocks are often used to describe
what the program does or what a piece of code does.
BEST PRACTICE: do not state the obvious. For example:
/* assign 5 to "i" */
i = 5;
merely repeats in prose what the code does, which is rarely useful. However:
/* use 5 as the starting seed to get a good random value */
i = 5;
is more useful, as it explains why this is being done.
A line comment is a comment that starts with //. Any characters after // are
ignored by the compiler until the end of the line. For example:
i = 5; // use 5 as the starting seed to get a good random value
#include
After the comment block, we have two lines, starting with the # symbol:
#include <stdio.h>
#include <jsapi.h>
These are include file directives. When compiling, the compiler inserts the
contents of these files (stdio.h and jsapi.h) in place of these lines.
#include and all lines starting with # are preprocessor directives and are
described in the chapter <C Preprocessor>.
For now, it is sufficient to know that stdio.h contains information about the
standard input / output library, and jsapi.h contains information about the
JumpStart API.
Semicolons
You may notice that there are semicolons “;” in a few places in the example.
Semicolons are statement terminators, informing the compiler that the end of a
statement has been reached.
Unlike some languages, indentations and white space have no effect on the
meaning of a C program. Carriage returns (or, properly speaking: end-of-line
markers) affect the C preprocessor and terminate a single-line comment
beginning with //, but otherwise are not a part of the C syntax.
Function Definition
The basic form of a function definition, e.g. main, looks as follows:
int main(void)
{
// function body
}
int is the data type of the return value of the function main. A more formal
skeleton of a function definition looks like this:
return-type function-name ( parameter-list )
{
<list-of-statements-and-declarations>
}
We will discuss data types further in the chapters <Variables> and <Types and
Declarations>. Following the data type, you write the name of the function,
followed by a list of arguments between a set of parentheses ( ) [6]. In this
example, main takes no arguments, which is denoted by the keyword void
in the argument list.
The function main contains these statements:
Setup();
printf("hello, world\n");
return 0;
The first two lines are function calls and the last line is a return statement.
An example of a function that takes arguments:
char *strcpy(char *dst, const char *src)
{
// body
}
The initial char * is the function return-type, strcpy is the name of the
function, and “char *dst, const char *src” is the argument list. There are
two arguments:
1. the first argument is named dst, and has the data type char *
2. the second argument is named src, and has the data type const char *
These data types will be explained later. The statements of a function are
enclosed in a set of braces { } .
Calling a Function
Invoking a function is colloquially known as “calling” a function. The preferred
method of communicating information when calling a function is to provide a list of
values, called arguments.
A function call is written as the name of the function, followed by a list of
arguments enclosed by a pair of parentheses ( ):
Setup();
printf("hello, world\n");
In this example, there are two function calls. The first one is Setup and it is called
without any arguments, hence the empty parentheses. Setup initializes the
microcontroller environment using the JumpStart API and will be discussed later.
The second function call is printf, which is a library function that prints out its
argument. The argument in this case is a literal string, which is a sequence of
characters enclosed in a pair of double quotes. In C, a literal string is also known
as a string, or a string constant.
In this example, the string constant is “hello, world\n”. Note that this string
constant has the characters \n. This is an escape sequence.
Escape Sequence
There are characters that cannot be typed directly inside a string constant. For
example, the newline character (which corresponds to hitting the ENTER key on the
keyboard and moves the output to the beginning of the next line) cannot be input
directly as part of a string constant. If you write:
printf("hello, world
\n");
Compiling this piece of code will result in many error diagnostic messages. For
example, JumpStart C produces:
!E x.c(7): syntax error; found `world’ expecting `)’
!E x.c(7): syntax error; found `world’ expecting `;’
!E x.c(7): missing “
!E x.c(7): undeclared identifier `world’
!W x.c(7):[warning] expression with no effect elided
!E x.c(7): syntax error; found “); … expecting `;’
To get around this limitation, escape sequences are used. The sequence \n in
the string is C notation for the newline character. When printed, it advances the
output to column one on the next line.
All escape sequences start with the backslash character \. The most common
escape sequences are:
A Program to Print Miles to Kilometers Conversion
The next program prints a table of conversion from miles to kilometers. With the
IDE, open the “Miles to Kilometers” project. When run, the output should look like
this:
main looks like this:
Again, we will ignore the call to Setup for now.
Variables are for storing values used in a program. In this example, miles, end,
increment, and kilometers are variables used in main.
Variable Declaration
Before using a variable, you must write a declaration for it:
int miles = 20;
This declares a variable with the name miles. The declaration must appear
before any reference to the name miles.
int is the data type of miles. In the sample code, we have a separate
declaration for each variable, but they can be written in a single declaration
statement:
int miles = 20, end = 90, increment = 5;
The expression “= 20” after miles is called an initializer. The = is the
assignment operator and the value of the right hand side (20) is assigned to the
variable on the left (miles). You can write a declaration without an initializer, and
use the assignment statement separately:
int miles;
miles = 20;
The general form of a variable declaration is:
<data-type> <variable-name> <optional-initializer> ;
Expression Statement
The key computation happens on line 21:
int kilometers = miles * 1.60934;
or written as separate declaration and assignment:
int kilometers;
kilometers = miles * 1.60934;
This is the mathematical formula which converts miles to kilometers, written in C.
The symbol * is the multiplication operator.
While Loop
The expression kilometers = miles * 1.60934 computes one value of
kilometers. To print out the table, you can write a series of statements, each
one computing a new value of kilometers based on the current value of miles.
kilometers = 20 * 1.60934;
// print it
kilometers = 25 * 1.60934;
// print it
kilometers = 30 * 1.60934;
// print it
…
Or you can use a while loop:
Lines 14 to 16 declare three variables and give them initial values. Line 19
is the while statement. The expression inside the parentheses following the
while keyword is called the test conditional.
To use a loop, first we initialize miles with the value 20. We want the conversion
to end when miles reaches 90, so the variable end contains this final value.
The body of the loop is a compound statement - from line 20 to line 24 - which is a
series of statements surrounded by a set of { } . You might notice that the
body of a function is in fact a compound statement.
With a while loop, the body of the loop is run again and again as long as the
test conditional (on line 19) is true:
while (miles <= end)
The conditional tests whether miles is less than or equal to end, and if true,
the loop will run again. <= is the “less than or equal to” relational operator.
Starting with a set of initial values, and given the end condition, all we need to add
to make our loop work is to update the miles variable in the loop body so the
loop will eventually terminate. This is done in line 23:
miles += increment;
The += is the addition-assignment operator. It adds the right hand side expression
to the variable on the left hand side, and is equivalent to writing:
miles = miles + increment;
Note that instead of using the end variable, we could have also used a numeric
constant on line 19:
while (miles <= 90)
Similarly, the variable increment does not change, so the expression can be
rewritten as
miles += 5;
Variables and #define Constant
We have seen two instances where the variables do not change values and the
identical program can be written using the numeric constants. When should a
variable be used and when should a symbolic constant be used? A property of
using a variable is that its use would be self-documenting if a good name is
chosen. Consider
miles += increment;
versus
miles += 5;
The constant 5 seems more arbitrary whereas using the name increment is
more deliberate. Moreover, if the value is used in more than one place, using a
variable means that if a change is necessary, you only need to change it in one
place. For example, imagine the value is used in more than one place, and you
want to change the value from 5 to 6. If you have used the constant 5, you will
need to find all occurrences of 5 and check if it is referring to the value in question
or some other use of the number 5, then change it to 6. This could be tedious and
error-prone. However, if you have the variable increment, then you only need
to change the initial assignment code.
Nevertheless, using a constant might have a slight runtime speed advantage. A
compromise is to use the #define C preprocessor directive:
#define INCREMENT (5)
Any reference to INCREMENT will be replaced by its definition (5). There are
good reasons why 5 is inside a set of parentheses, which will be explained later. C
is case-sensitive, therefore words like Increment, increment, etc. do not match the
word INCREMENT exactly and will not be replaced.
The same program fragment, then, can be written as:
int miles = 20;
#define END 90
#define INCREMENT 5
while (miles <= END)
{
int kilometers = miles * 1.60934;
// print
miles += INCREMENT;
}
A common convention is to use UPPERCASE for #define names, but this is not a
requirement.
EXERCISE: rewrite the program using the things that you have learned so far.
For Loop
A while loop is not the only looping construct in C. You can also write a for
loop:
#define END 90
#define INCREMENT 5
for (int miles = 20; miles <= END; miles += INCREMENT)
{
int kilometers = miles * 1.60934;
// print
}
One thing to notice is that C encourages writing succinctly. This is partly because
C was designed in an era when slow 300 baud teletypes were the primary
interface, so the fewer characters used, the better. However, terseness does not
have to mean unreadable; a best practice is to make your code succinct but clear.
A for loop combines several elements of a loop in the for expression:
for ( <init-expression> ; <test-conditional> ; <post-expression> )
After the keyword for, a set of ( ) surrounds the for expression, which in the
case above is a list of 3 expressions separated by two semicolons.
The init-expression is run once, before the test conditional and the for loop body.
Usually you write variable initialization(s) in the init-expression. You may also
optionally declare the variable here, if it has not been declared before.
The test conditional serves the same function as the test conditional in a while
loop: the loop body will run as long as the test conditional is true.
The post-expression is run after the loop body is run, but before the next test
conditional check.
With a for loop, all the mechanisms related to the looping construct are collected
in a single for expression. While the same code can be done using a while
loop, it is more readable to see the initial condition, the test condition, and the
loop-increment in the same place.
By the way, the init-expression may contain multiple initialization expressions,
separated by commas:
for (a = 0, b = 1, c = 2; …
EXERCISE: rewrite the example project with a for loop and #define values.
printf Format Code
Astute readers may notice that something is going on with the call to printf in
the “miles to kilometers” conversion program:
printf("%d\t%d\n", miles, kilometers);
The first argument to printf is a string constant and specifies a format string. A
format string may contain format codes. A format code starts with the character %
followed by format specifiers. The specifiers can get quite involved, with many
options, as explained in the chapter <The Standard C Library>. For now, it is
sufficient to know these common codes:
Codes Descriptions
%d print the argument as a signed decimal number
%x print the argument as a hexadecimal number
%u print the argument as an unsigned decimal number
%f print the argument as a floating point number
%s print the argument as a string
%c print the argument as a character
printf processes the format string one character at a time. If it sees a %<code>
format code, it fetches the next argument and prints it out according to the format
specifier. Otherwise, it prints out the character it sees.
VARIADIC FUNCTIONS: Most functions are defined to have a fixed number of
arguments, including “no argument”. You can write functions that take a variable
number of arguments; they are known as variadic functions. printf is such a
function. Variadic functions will be discussed in depth later.
Integer Constants
Integer constants are numbers, e.g. 42. A negative constant has a - prefix, e.g.
-42 [7]. For symmetry, you may also write a positive number with a + prefix, e.g. +42.
In normal writing, numbers are written in base 10. That is, each digit in a number
is multiplied by a power of 10:
123 = 1*100 + 2*10 + 3*1
    = 1*10^2 + 2*10^1 + 3*10^0
Other number bases are possible; hexadecimal is base 16. In C, hexadecimal
constants are written with a 0x or 0X prefix, and the letters ‘A’ to ‘F’ and ‘a’ to ‘f’
are used to represent the numbers 10 to 15. For example, 0xA is 10, 0x1A is 26
etc.
This will be explained further later.
Character Constants
Enclosing a character inside a pair of single quotes ' ' is the C method of writing a
character constant. You may write:
int c = 'C';
The value of a character constant is its numeric value in the compiler’s
environment character set. In English speaking countries, the ASCII code is
almost always used.
ASCII Code
ASCII (American Standard Code for Information Interchange) is a standard for
encoding characters. There are plenty of ASCII tables on the web, but we can
even write a C program to print out the values. The important portion is:
#include <ctype.h>
…
printf("dec\thex\tcharacter\n");
for (int i = 1; i <= 127; i++)
{
    printf("%d\t0x%x\t", i, i);
    if (isprint(i))
        printf("%c", i);
    printf("\n");
}
The line before the for loop prints out the table header. Notice the use of the
escape code \t to print out tabs. ASCII is a 7-bit code [8], with values from 0
to 127. The for loop runs through the values 1 to 127, skipping only the null
character, whose value is 0.
The first printf inside the loop prints out the decimal value and the
hexadecimal value of the loop variable i.
The if statement executes the if-body if the test conditional is “true” [9]:
if (isprint(i))
    printf("%c", i);
The function isprint is a function in the C Standard Library. The #include
<ctype.h> directive provides information to the compiler about this function.
isprint returns a nonzero value if the input argument is a printable character.
printf is then called with the format code %c to print i as a character.
EXERCISE: Modify one of the existing projects, or create a new one, and print out
the ASCII codes as above.
Floating Point Data Type
The “miles to kilometers” conversion program prints the converted values as
whole integers, even though the conversion factor 1.60934 is a floating point
number. Open the “Miles to Kilometers - FP” project, and when run:
The “Kilometers” are now output as floating point numbers.
The changes in the program are minor:
float kilometers = miles * 1.60934;
printf("%d\t%f\n", miles, kilometers);
The data type for kilometers is now float instead of int. A more subtle
change is that inside the string constant argument to printf, it now reads
"%d\t%f\n" instead of "%d\t%d\n". The %f format code prints out a floating
point argument.
Floating point is almost an “advanced topic” for embedded programming, since
floating point operations result in much longer sequences of machine instructions,
and the resulting code is larger and slower than code not using floating
point. However, in this introductory chapter, we want to introduce the concept that
C contains other data types besides the basic integer types, and the floating point
type is a natural follow-on.
Integer and Floating Point Conversion
C allows you to intermix integer and floating point expressions:
float kilometers = miles * 1.60934;
miles is an int. When it is multiplied with a floating point number, 1.60934, the
compiler converts miles into a floating point number and a floating point
multiplication is performed. The floating point result is assigned to kilometers.
In the earlier example:
int kilometers = miles * 1.60934;
kilometers is an int in this case. For this example, just like the previous case,
miles is converted into a floating point number and a floating point multiplication
is performed. However, since the target of the assignment is an int, the
multiplication result is converted into an int, and then assigned to kilometers.
C has precise (but sometimes misunderstood) rules on what happens when you
mix expressions with different data types, as we will see later.
IMPORTANT: be sure to get a good understanding of C’s promotion and
balancing rules in the chapter <Expressions and Operators>. The rules are simple
but may be non-intuitive. Not fully knowing these rules often results in subtle
defects in a C program.
Fixed Point Data Types
A good alternative to using floating point computations, especially for embedded
systems, is to use fixed point computations. A fixed point number consists of an
integer, usually represented in two’s complement, and a scale factor. There is no
single “standard” scale factor, and each unique scale factor is considered as a
separate fixed point type. Obviously in a given program, only a limited number of
scale factors - maybe even just one - will be used. The choice of the scale factor
depends on the expected value range of the data being processed by the
program.
Arithmetic operations with fixed point objects are well-defined, and run faster than
floating point operations, if the target device does not have native floating point
instructions, which is the case with most microcontrollers. The downside of using
fixed point is that the value range of a particular fixed point type is limited
compared to the floating point type. Thus, it is a trade-off between the flexibility of
floating point versus the speed of fixed point.
Standard C does not provide fixed point data types or operations. Therefore often
there is no choice but to use floating point. However, an extension called
Embedded C does define these. It is expected that JumpStart C will implement
fixed point support in 2016, and this book will be updated at that time.
Arrays
An integer or floating point variable is called a scalar variable since it can hold
only one value at a time. C also has compound or aggregate variables that can
hold multiple values. The simplest aggregate variable type is an array.
Open the project “Miles to Kilometers - FPArray”. This is exactly the same as the
previous “Miles to Kilometers - FP” project, except that the results of the
conversions are also stored in an array:
int miles = 20;
int end = 90;
int increment = 5;
#define NUM_OF_ELEMENTS ((90-20)/5 + 1)
float kilos_array[NUM_OF_ELEMENTS];
printf("Miles to Kilometers conversion\n");
for (int i = 0; miles <= end; i++)
{
    float kilometers = miles * 1.60934;
    printf("%d\t%f\n", miles, kilometers);
    miles += increment;
    kilos_array[i] = kilometers;
}
An array declaration looks like this:
<data-type> <variable-name> [ <number-of-elements> ];
Just like a scalar variable declaration, it starts with the data type of the variable,
followed by the variable name, then followed by the number of array elements
surrounded by a pair of [ ]s. All array elements must have the same type.
In this example, kilos_array is an array of float items. When you declare an
array, you must specify its dimension (the number of array elements). In C, array
dimensions must be a constant value, therefore we use the constant expression
(90-20)/5+1 which evaluates to the value 15. As in real math, C expressions
use precedence rules to determine the order of evaluation of the subexpressions,
hence the need to use parentheses to force the subtraction to be performed first,
but there is no need to put parentheses around the division since it will be
evaluated before the +1 addition.
The +1 in the dimension is needed to account for the last element to be stored.
Without it, the array will be one element too small.
The loop has now changed from a while loop to a for loop using the variable i
as an index to store into the array kilos_array. An array index starts with 0 and
should not exceed dimension-1, or NUM_OF_ELEMENTS-1.
Array Indexing
To access an array element, you write the array variable name, followed by an
index enclosed in [ ]. An array index must be an integer or an integer
expression:
kilos_array[i] = kilometers;
BUG ALERT: C does not check if the indexing is beyond the range of the array.
For example, a C compiler will typically accept kilos_array[-1] or
kilos_array[NUM_OF_ELEMENTS] without complaint, with the latter being an easy
mistake to make. For example, this is a wrongly-written terminating condition:
#define NUM_OF_ELEMENTS ((90-20)/5 + 1)
float kilos_array[NUM_OF_ELEMENTS];
for (int i = 0; i <= NUM_OF_ELEMENTS; i++)
    kilos_array[i] = 0;
By incorrectly using the less-than-or-equal <= comparison operator, the last
element accessed would be one beyond the dimension of the array.
Reading an out-of-bound array element returns a random value, depending on
how the variables are laid out in memory, but writing to an out-of-bound array
element would almost certainly cause a problem. Unfortunately, this bug may not
show up until later, and bad memory writes such as this are a major source of
bugs in C.
Despite the major problems, there are good reasons why C does not perform
index bound checking, as we will see later in the chapter <Pointers and Arrays>.
Nevertheless, be careful with array indexing!
Character Arrays
Arrays are commonly used to store characters:
char spaceship[] = { "NCC-1701" };
Unlike some programming languages, C does not have a string data type. Arrays
of char are used instead. In the above declaration, the array variable spaceship
is initialized with the string constant "NCC-1701". Notice that the number of
elements is omitted in the declaration, leaving an empty []. Since this is a
declaration with an initializer, the compiler determines the number of elements
needed and allocates enough space for spaceship.
A char array holding a string needs a way to determine the end of the string. This
is done by storing a numeric value of 0, also known as the null value, after the last
character of the string. For example, spaceship looks like this in memory:
spaceship:
| ‘N’ | ‘C’ | ‘C’ | ‘-’ | ‘1’ | ‘7’ | ‘0’ | ‘1’ | 0 |
With each | | denoting an 8-bit memory cell (a byte). The integer value 0 ends
the string.
The size of a char array holding a string includes the terminating 0: spaceship
has 9 elements, although the string itself is 8 characters long.
Chapter Review
Through example programs:
There are special words in C (called keywords), for example, int, while,
return.
We have seen the basic structure of a C program.
There are good coding practices such as writing clear and useful comments.
Functions contain code that performs computations.
There are different kind of statements in a function, such as for, while,
return, and if.
Functions are called with arguments, containing information you wish to
communicate to the function.
printf uses format codes to output arguments in different forms.
printf is a variadic function, as it can take a variable number of arguments.
There are different kinds of operators, such as + - * / ++
Variables are objects that hold values.
A variable declaration informs the compiler of a variable’s attributes such as its
name and data type.
C has integer and floating point scalar data types.
C has aggregate data types such as arrays.
Out-of-bound array access can cause runtime problems and is difficult to
detect.
Character arrays are used to store strings, and are terminated by a null
element.
SECTION II
THE C PROGRAMMING LANGUAGE
This section explains the elements of the C programming language.
2. BASIC ELEMENTS OF C
This chapter presents the basic building blocks of C.
Keywords
Some names in C have special meanings. They are called keywords:
In C, bit numberings are read from right to left. Programmers count starting from
zero, so the bit on the far right, being considered the initial bit, is normally referred
to as “bit 0”. Bit 0 is also called the least significant bit (LSB), and the “zeroth bit”.
Bit 7, the far left bit, is likewise referred to as the most significant bit (MSB) and the
“seventh bit”.
In a 32-bit word, the leftmost bit is the MSB, and it is bit 31. The most common
number representation used by modern CPUs is the 2’s complement form. In this
representation, one bit is the sign bit and usually the MSB is designated as such.
A sign bit of value 0 is a positive number and a sign bit of 1 is a negative number.
NOTE #1: One may argue that the rightmost bit is the “first bit”, the MSB would be
the 8th bit, or 32nd bit etc. However, most programmers use the 0th bit and 7th
bit/31st bit nomenclature.
NOTE #2: More information can be found in the Appendix <Introduction to
Computer Arithmetic>.
Integer Constants
Integer constants are numbers, e.g. 42. A negative constant has a - prefix, e.g.
-42. For symmetry, you may also write a positive number with + prefix, e.g. +42.
In normal writing, numbers are written in base 10. That is, each digit in a number
is a power of 10:
123 = 1*100 + 2*10 + 3*1
= 1*10^2 + 2*10^1 + 3*10^0
Other number bases are possible; hexadecimal is base 16. In C, hexadecimal
constants are written with a 0x or 0X prefix, and the letters A to F and a to f are
used to represent the numbers 10 to 15. An example of converting a hexadecimal
number to the decimal equivalent:
0xC0DE = C*16^3 + 0*16^2 + D*16^1 + E*16^0
= 12*4096 + 0 + 13*16 + 14
= 49152 + 208 + 14
= 49374
Hexadecimals are useful since a byte is 8 bits and half of a byte is a nibble, which
is 4 bits. A hexadecimal digit fits in a nibble exactly, and is particularly useful when
used for low level bit patterns.
Octal is base 8. In C, octal constants are written with a 0 prefix. An octal digit
fits in 3 bits exactly and thus is favored by some programmers. The only valid
digits are 0..7.
Binary is base 2. The C Standard does not define a binary notation, but most
compilers support the extension of using 0b and 0B as the prefix for a binary
number. 0 and 1 are the only valid digits.
SUFFIXES: by default, the data type of an integer constant is int (or a wider type
if the value does not fit in an int). The following suffixes are available to change
the data type of the constant. Note that you may use uppercase or lowercase
letters for the suffixes:
BUG ALERT: it is easy to forget that a 0 prefix means an octal number, and not a
regular base 10 number!
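The prefixes and the positional arithmetic above can be verified with a few assertions. This is only a sketch: hex_digit_value and hex_to_value are hypothetical helper names, and the letter ranges assume a character set such as ASCII in which a..f and A..F are contiguous.

```c
#include <assert.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Convert a string of hex digits (no 0x prefix) to its value.
   Each step multiplies by the base and adds the next digit. */
unsigned long hex_to_value(const char *digits)
{
    unsigned long value = 0;

    while (*digits != 0 && hex_digit_value(*digits) >= 0) {
        value = value * 16 + (unsigned long)hex_digit_value(*digits);
        digits++;
    }
    return value;
}

void check_integer_constants(void)
{
    assert(0xC0DE == 49374);    /* hexadecimal, as worked out above */
    assert(0xc0de == 0XC0DE);   /* prefix and digits ignore case */
    assert(010 == 8);           /* leading 0 means octal, not ten */
    assert(0777 == 511);        /* 7*64 + 7*8 + 7 */
    assert(hex_to_value("C0DE") == 0xC0DE);
}
```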
Character Constants
Enclosing a character inside a pair of single quotes (' ') is the C method of writing a
character constant. You may write:
char c = 'C';
Character constants are integers and can be assigned to an int variable as well:
int ch = 'C';
The value of a character constant is its numeric value in the compiler’s
environment character set. In English speaking countries, the ASCII code is
almost always used.
You can also use the escape sequence (see string constants below) in a
character constant.
String Constants
A literal string, or string constant, is a sequence of characters enclosed in a pair of
double quotes.
"hello, world"
"I am \"alive\"!"
To write a double quote character " inside a string, you precede it with the
backslash character \, as shown in the second string above.
You can also put an arbitrary numeric value into a string by using numeric escape
sequences:
"hello,\x20world"
Numeric Escape Sequences
An escape sequence can be used in a character or string constant to insert a non-
printable character such as a tabstop, a new line, or a backspace.
octal escape sequence - you write: \d or \dd or \ddd where each d is an
octal digit (i.e. 0..7)
hexadecimal escape sequence - you write \xh or \xhh or \xhhh … where x
is the letter x, and each h is a hexadecimal digit (i.e. 0..9, A..F or a..f). The
earlier example, \x20, is hexadecimal 20, which is the space
character in the ASCII set.
A numeric sequence terminates with the first invalid character for the sequence. A
common numeric escape sequence is ‘\0’, the null character.
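A few assertions make the termination rule concrete. This sketch assumes an ASCII execution character set; check_escape_sequences is a name chosen for illustration only.

```c
#include <assert.h>
#include <string.h>

void check_escape_sequences(void)
{
    /* \x20 is hexadecimal 20, the ASCII space character. */
    assert(strcmp("hello,\x20world", "hello, world") == 0);

    /* \101 is octal 101 = 65, the ASCII code for 'A'. */
    assert('\101' == 'A');

    /* An octal escape takes at most three digits:
       "\1011" is 'A' followed by the character '1'. */
    assert(strcmp("\1011", "A1") == 0);

    /* A hex escape ends at the first non-hex character, here 'G'. */
    assert(strlen("\x41G") == 2);

    /* The null character has the numeric value 0. */
    assert('\0' == 0);
}
```

Because a hexadecimal escape keeps consuming hex digits, a string such as "\x20abc" would not mean "space, a, b, c"; splitting the literal ("\x20" "abc") avoids the problem.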
Escape Sequences
The complete set of non-numeric escape sequences is:
As you can see, a number of the escape sequences are for formatting output on
old style CRT terminals and line printers.
CR and NL require a bit more explanation: CR moves the cursor to the beginning
of the line and NL moves the cursor one line down. A CR-NL combination does
what most people expect the keyboard key “ENTER” or “RETURN” to do [11]. In a
standard C-conforming compiler, the escape sequence \n generates the right
sequence for the target environment. In microcontroller targets, it only matters
when communicating with the terminal emulator running on the host through the
UART; the low level character output function must then map '\n' into the
sequence the host requires.
Integer Data Types
C has multiple integer data types, providing a choice of how much space a
variable may take and the range of the values it can hold.
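SIGNED DATA TYPE: A signed data type holds negative and positive values.
UNSIGNED DATA TYPE: An unsigned data type holds non-negative values only.
Any signed integer type, e.g. int, can be written with the signed prefix, e.g.
signed int, but the prefix is not needed. The exception is char: whether a plain
char is signed is up to the compiler, so write signed char when it matters.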
Signed and unsigned integers
The size or width of a data type or variable is expressed in number of bits.
The range of a data type depends on its width and whether it is signed or
unsigned. Let n be the number of bits in a data type. A signed type uses n-1 bits
to store the magnitude and 1 bit to store the sign. An unsigned type uses all n
bits to store the magnitude.
Signed range = -2^(n-1)..2^(n-1)-1
Unsigned range = 0..2^n-1
NOTATION: The .. between two integers denotes a range, e.g. -32768..32767
means from -32768 to 32767 inclusively.
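The range formulas can be written as small functions and compared against the limits the compiler publishes in <limits.h>. This is a sketch for widths up to 32 bits on a 2's complement machine; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Largest value of an n-bit unsigned type: 2^n - 1 (for 1 <= n <= 32). */
unsigned long unsigned_max_for_bits(unsigned n)
{
    /* Build n one-bits without shifting by the full width of the type. */
    return ((1UL << (n - 1)) - 1) * 2 + 1;
}

/* Largest value of an n-bit signed type: 2^(n-1) - 1 (for 1 <= n <= 32). */
long signed_max_for_bits(unsigned n)
{
    return (long)((1UL << (n - 1)) - 1);
}

/* Smallest value of an n-bit 2's complement type: -2^(n-1). */
long signed_min_for_bits(unsigned n)
{
    return -signed_max_for_bits(n) - 1;
}

void check_ranges(void)
{
    assert(unsigned_max_for_bits(8) == 255UL);
    assert(signed_min_for_bits(16) == -32768L);
    assert(signed_max_for_bits(16) == 32767L);
    /* CHAR_BIT is the width of a char on this compiler. */
    assert(unsigned_max_for_bits(CHAR_BIT) == UCHAR_MAX);
    assert(signed_max_for_bits(CHAR_BIT) == SCHAR_MAX);
}
```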
C does not dictate the widths of the basic integer data types. However, the
following relationships and conditions must be observed by a conforming C
compiler:
1. The width of a char must be at least 8.
2. The width of a short must be at least 16, and a short must be at least as
wide as a char.
3. An int must be at least as wide as a short.
4. A long must be at least 32 bits wide and at least as wide as an int.
The beauty of C is that by not enforcing fixed size data types, C can be compiled
to efficient code on most architectures. The resulting programs will maintain a
large degree of portability if the programmers are careful with their work. This
combination of efficiency and portability is one of the reasons that makes C the
most widely available language across all architectures.
Choosing an Integer Data Type: Using the Native C Type
With choices for integer data types, the question becomes which type to use. A
set of simple rules for the Cortex-M or most modern 32-bit CPUs is:
1. If space usage is a consideration, and you know that the range of the values
will not exceed 8 bits, then use signed char or unsigned char.
2. If space usage is a consideration, and you know that the range of the values
will not exceed 16 bits, then use short and unsigned short.
3. Otherwise, just use int or unsigned int.
Floating Point Constants
A floating point number can be written in different ways:
Decimal: a decimal point is used, e.g. 3.14159267 or 1000.0023 etc.
Decimal notation is the “normal” way of writing a floating point number.
Scientific Notation: using the e-notation, e.g. 1.0000023e3. The number to
the left of the letter e (uppercase E can also be used) is called the digit term,
and the number to the right of the letter e/E is called the 10’s exponent.
0.0000314159267 is 3.14159267e-5
3141592.67 is 314.159267e4
A normalized floating point number in scientific notation is one where the digit
term always has one and only one digit to the left of the decimal point. For
example:
3.14159267e-5
3.14159267e6
10f
10lf
3.14159267f
3.14159267e6lf
Floating Point Representations
At the lowest level, the “brain” of a computer, the Central Processing Unit (CPU)
operates on bits, with each bit having a value of 0 or 1. The CPU has basic
arithmetic operations (such as add, subtract, multiply) for signed and unsigned
integer types.
Most CPUs do not support floating point number operations directly. Most C
compilers use the IEEE floating point formats to store a floating point number
internally. The internal representation is similar to using scientific notation, except
that instead of using 10’s exponent, 2’s exponent is used due to the binary nature
of computer arithmetic.
There are two common IEEE floating point formats, corresponding to C’s floating
point data types float and double:
Note that storing a written decimal floating point number, such as 3.14159267, in
the internal FP format might not be exact since the internal format is in binary.
That is:
float pi = 3.14159267;
if (pi == 3.14159267)
…
might not compare as expected. This will be discussed further later.
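The effect is easy to reproduce. In the sketch below, the float variable is rounded to 32-bit precision while the constant it is compared with is a double, so the two values differ slightly. The helper nearly_equal and the tolerance 1e-6 are illustrative choices, not a rule from the C Standard; the example assumes IEEE float and double formats.

```c
#include <assert.h>

/* Returns 1 if a and b differ by no more than tolerance. */
int nearly_equal(double a, double b, double tolerance)
{
    double diff = a - b;

    if (diff < 0)
        diff = -diff;
    return diff <= tolerance;
}

void check_float_compare(void)
{
    float pi = 3.14159267f;

    /* pi was rounded to float; the constant on the right is a double. */
    assert(pi != 3.14159267);

    /* Comparing with a tolerance is the more robust habit. */
    assert(nearly_equal(pi, 3.14159267, 1e-6));

    /* The classic case: 0.1 has no exact binary representation. */
    assert(0.1f != 0.1);
}
```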
3. EXPRESSIONS AND OPERATORS
Algorithms
Algorithms are step-by-step instructions on how to solve a problem. For example,
given the question “how do you convert miles into kilometers”, the answer is to
use a mathematical formula:
kilometers = miles * 1.60934
Indeed, programming in a procedural language such as C can be said to be an
exercise in finding the correct algorithms and data structures to solve the
particular problem.
Overflow
In pure mathematics, there is no limit to how large an integer can get, and a floating
point operation will always produce a floating point result. This is not the case with
computer arithmetic. For example, with a 32-bit int, C does not say what will
happen if the result of an operation exceeds 32 bits. This is called overflow.
Overflow can also occur when converting a floating point value to an integer
value.
Most CPUs include a method to determine whether an overflow has occurred, but
there is no portable C method of accomplishing this. Most CPUs simply discard
the overflowed bits and truncate the result. For mission-critical programming, this
is a condition that you need to be aware of.
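Although C has no portable way to ask the CPU whether an overflow occurred, an addition can be tested before it is performed. The following is one possible sketch using the limits from <limits.h>; checked_add is a hypothetical name, not a standard library function.

```c
#include <assert.h>
#include <limits.h>

/* Adds a and b if the sum fits in an int.
   Returns 1 and stores the sum on success, returns 0 on overflow. */
int checked_add(int a, int b, int *sum)
{
    /* Compare against the limits BEFORE adding, so the addition
       itself never overflows. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}

void check_checked_add(void)
{
    int sum = 0;

    assert(checked_add(2, 3, &sum) && sum == 5);
    assert(!checked_add(INT_MAX, 1, &sum));    /* too large */
    assert(!checked_add(INT_MIN, -1, &sum));   /* too small */
}
```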
The sensor data was collected in floating point format. To make it easier and
faster to manipulate, the data was then converted into integer format.
16-bit integers were chosen to store the converted data. Unlike other places in
the program where similar conversions were done, there was no overflow
check for this particular code fragment because the rocket would “never” fly
fast enough to cause an overflow to happen. (Never say “never”…)
The same code had been running successfully on the previous Ariane IV
rocket launches.
This piece of code should only have been run at the start of the launch and
then been disabled afterward, but the engineers left it running during the first
40 seconds of the launch in order to make it easier to restart the rocket in case
there was a countdown hold.
The Ariane V was much more powerful than the Ariane IV rockets, with much
faster acceleration and speed.
The failure happened when the horizontal velocity overflowed the 16-bit
integer at 36.7 seconds into the flight. With no overflow checking, the CPU
determined that something must have gone wrong.
The failure was confirmed when a redundant CPU executing the identical
software experienced the same conditions.
The rocket was in fact operating correctly, but the control system initiated a
self-destruct.
As software gets more complex, more reuse of existing code inevitably occurs. It
is important to make embedded code as robust as possible. While there is no
foolproof way to write 100% robust embedded system software, in hindsight,
clearly a few major mistakes were made in the software implementation in this
case.
Sometimes “over-spec”-ing a system is far more important than worrying about
performance issues. In the particular example of the Ariane rocket, they could
have:
1. used a 32-bit integer to store the result of the floating point conversion, and
2. checked for an overflow condition regardless of what size integer variables
were being used, and
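As an illustration of the second point, a floating point value can be range-checked before it is narrowed to a 16-bit integer. This is a generic sketch, not the actual flight software; to_int16_checked is a hypothetical name, and clamping to the nearest limit is just one possible policy.

```c
#include <assert.h>
#include <stdint.h>

/* Converts value to a 16-bit integer.
   Sets *overflowed to 1 and clamps the result if value is out of range. */
int16_t to_int16_checked(double value, int *overflowed)
{
    *overflowed = 0;
    if (value != value) {            /* NaN compares unequal to itself */
        *overflowed = 1;
        return 0;
    }
    if (value > INT16_MAX) {
        *overflowed = 1;
        return INT16_MAX;
    }
    if (value < INT16_MIN) {
        *overflowed = 1;
        return INT16_MIN;
    }
    return (int16_t)value;           /* in range: fraction is discarded */
}

void check_to_int16(void)
{
    int overflowed;

    assert(to_int16_checked(1234.9, &overflowed) == 1234);
    assert(overflowed == 0);
    assert(to_int16_checked(40000.0, &overflowed) == INT16_MAX);
    assert(overflowed == 1);
}
```

The caller decides what to do with the overflowed flag; the point is that the condition is detected instead of silently producing a wrong value.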
Type Promotion
Unless it is being used as the operand of the sizeof operator, an integer
expression has one of four types: int, unsigned int, long, and unsigned
long. If the expression does not have one of these types, the compiler promotes
its type. The table below summarizes promotions for integer and floating point
operands.
Integer Value Promotion
When a type is promoted, the value also changes, by using the following rules:
Type Balancing
When you write a binary operator with arithmetic operands, the compiler balances
the operand types using these rules:
Both operands are converted to the balanced type, and the operation is done
using the balanced type. If the operands do not have unsigned int and long
type, then select the wider operand type and use the section 2 entries.
Assignment Operators
The assignment operator assigns the value of the right operand to the left
operand.
The data types of the operands must be “similar” [12]. The left operand can be one
of the following (this list corresponds with the example list in the table above). The
meaning of the operators mentioned in the list will be explained in a later section:
1. A variable name
2. Dereference of a pointer expression or pointer variable
3. An array element
4. A structure member reference
5. A pointer to structure member reference
In C, the arithmetic, bitwise, and shift operators have assignment forms,
e.g. += is “add and assign”, -= is “subtract and assign”, etc.
i += j;
is equivalent to writing
i = i + j;
The only difference between the two forms is that in the assignment-form, the left
operand is evaluated only once. The assignment-forms are listed in their
respective operator sections.
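The "evaluated only once" rule matters when the left operand has a side effect. The sketch below uses a hypothetical function next_index that counts its own calls, so the difference between the two forms becomes visible.

```c
#include <assert.h>

static int calls;           /* how many times next_index() ran */
static int counters[4];

/* A function with a side effect: it counts its own calls. */
int next_index(void)
{
    calls++;
    return 2;
}

void check_assignment_form(void)
{
    counters[2] = 0;

    calls = 0;
    counters[next_index()] += 5;     /* left operand evaluated once */
    assert(calls == 1);
    assert(counters[2] == 5);

    calls = 0;
    counters[next_index()] = counters[next_index()] + 5;  /* twice */
    assert(calls == 2);
    assert(counters[2] == 10);
}
```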
Arithmetic Operators
Arithmetic Operators mainly correspond to their mathematical counterparts.
Arithmetic operands must have integer, floating point, or pointer data types.
Pointers are one of the most important features in C, and will be explained later.
BUG ALERT #1: You cannot cascade comparison operators. That is, the
following piece of code probably does not do what you meant:
if (0 < i < 10)
…
you must write the following instead:
if (0 < i && i < 10)
…
EXERCISE: Nevertheless, if (0 < i < 10) is legal C. What does it mean?
BUG ALERT #2: Sometimes it is easy to mistakenly write the assignment
operator = instead of == inside a conditional check. Unfortunately, it is legal C and
may even be useful for advanced developers to write such an expression, but the
compiler does not necessarily warn beginners against such usage.
Logical Operators
Logical Operators evaluate multiple conditional expressions.
Shift Operators
Shift operators shift an integer by a specified number of bits. Shifts can be used
for:
multiplying by a power of two,
an unsigned divide by a power of two,
extracting a value packed inside a word
The ARM Cortex-M3 and above (ARM architecture V7m and above) includes
optional shift as part of the addressing mode of load and store instructions. This is
particularly useful for accessing array elements.
Notice that shift operators have assignment-forms. Only integer operands are allowed.
Left shift moves all the bits of the left operand X positions to the left, where X is
the value of the right operand, and fills the vacated low order bits with zeroes. For
example:
unsigned char uc = 0b01110101;
uc <<= 3;
After the operation, uc will have the value 0b10101000. The upper 3 bits are
shifted off to the “bit bucket” and are lost. Left shifting by X bits is the same as
multiplying the operand by 2^X (as long as no bits are lost):
var << 1 → var * 2^1 → var * 2
var << 2 → var * 2^2 → var * 4
var << 3 → var * 2^3 → var * 8
… etc.
Right shift moves all the bits of the left operand X positions to the right, where X is
the value of the right operand. For a signed integer type, most compilers replicate
the sign bit / MSB from the left (the C Standard leaves the result
implementation-defined for negative values). For an unsigned integer type, the
vacant bits are filled with zeroes. For example:
unsigned char uc = 0b11110101;
uc >>= 3;
// uc is now 0b00011110
signed char sc = 0b11110101;
sc >>= 3;
// sc is now 0b11111110
Right shifting an unsigned operand by X bits is the same as dividing the
operand by 2^X. However, right shifting a SIGNED operand is not equivalent to
division:
unsigned-var >> 1 → unsigned-var / 2^1 → unsigned-var / 2
unsigned-var >> 2 → unsigned-var / 2^2 → unsigned-var / 4
unsigned-var >> 3 → unsigned-var / 2^3 → unsigned-var / 8
… etc.
NOTE: the result of shifting by a negative count, or by a count greater than or
equal to the number of bits in the (promoted) left operand, is undefined. For
example, shifting a 32-bit int by 32 or more bits on a Cortex-M is undefined.
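The shift examples can be checked with a short program fragment. Hexadecimal constants are used here instead of the 0b extension so the sketch compiles with any C99 compiler; extract_field is a hypothetical helper showing the "value packed inside a word" use.

```c
#include <assert.h>

/* Extract the field of `width` bits that starts at bit `pos` (bit 0 = LSB).
   Assumes width is between 1 and 31 and pos + width <= 32. */
unsigned long extract_field(unsigned long word, unsigned pos, unsigned width)
{
    return (word >> pos) & ((1UL << width) - 1);
}

void check_shifts(void)
{
    unsigned char uc = 0x75;          /* 0b01110101 */
    unsigned var = 13;

    uc <<= 3;
    assert(uc == 0xA8);               /* 0b10101000: upper 3 bits lost */

    uc = 0xF5;                        /* 0b11110101 */
    uc >>= 3;
    assert(uc == 0x1E);               /* 0b00011110: zero-filled */

    assert((var << 3) == var * 8);    /* left shift multiplies by 2^3 */
    assert((var >> 2) == var / 4);    /* unsigned right shift divides */

    /* Signed division truncates toward zero (-11 / 8 is -1), whereas an
       arithmetic right shift of -11 by 3 typically yields -2. */
    assert(-11 / 8 == -1);

    assert(extract_field(0xC0DEUL, 4, 4) == 0xD);
}
```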
Bitwise Operators
Bitwise operators operate on all the bits of the operands. There is no difference
between signed or unsigned operands.
Notice that bitwise operators have assignment-forms. Only integer operands are
allowed.
The bitwise operators apply the bit operations, as described in the
“Implementations” column above, to each bit of the operand(s). For example,
bitwise AND-ing two 8-bit operands:
operand 1: 10101101
operand 2: 01101110
— bitwise AND –––––
result: 00101100
Bitwise operations are commonly used in low level code such as accessing
microcontroller’s I/O registers, writing device drivers etc. Sometimes they are
used because the low level access (e.g. I/O registers) requires it, and sometimes
they are used where size and speed are at a premium.
TIPS #1: Turning on all bits in an unsigned variable: to turn on all bits in an
unsigned variable, assign ~0u to the variable. You will need to cast the
expression if the unsigned result is to be of a narrower type than unsigned
int:
unsigned char uc;
unsigned short us;
unsigned int ui;
ui = ~0u;
us = (unsigned short)~0u;
uc = (unsigned char)~0u;
TIPS #2: Bit toggle: to toggle a bit, exclusive OR it with the value 1. The
following toggles bit 0 of i:
i ^= 1; // toggle bit 0 of i
ADVANCED TIPS #1: to swap two same size variables without using a
temporary:
x = x ^ y;
y = x ^ y;
x = x ^ y;
ADVANCED TIPS #2: to isolate the rightmost bit that is ON of an unsigned
variable:
x = x & ~(x - 1);
or
x = x & (~x + 1);
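The tips can be packaged as small functions and tested. This is a sketch with hypothetical names. Isolating the lowest ON bit is written here as x & (~x + 1), and the XOR swap carries a guard because swapping an object with itself this way would set it to zero.

```c
#include <assert.h>

/* Toggle bit number `bit` of x (bit 0 is the LSB). */
unsigned toggle_bit(unsigned x, unsigned bit)
{
    return x ^ (1u << bit);
}

/* Keep only the rightmost bit that is ON; returns 0 if x is 0. */
unsigned lowest_set_bit(unsigned x)
{
    return x & (~x + 1u);
}

/* Swap two unsigned values without a temporary. */
void xor_swap(unsigned *x, unsigned *y)
{
    if (x == y)          /* same object: nothing to do */
        return;
    *x ^= *y;
    *y ^= *x;
    *x ^= *y;
}

void check_bit_tips(void)
{
    unsigned a = 3, b = 9;

    assert((unsigned char)~0u == 0xFF);    /* all bits on, narrowed */
    assert(toggle_bit(0x5, 0) == 0x4);
    assert(toggle_bit(0x4, 0) == 0x5);
    assert(lowest_set_bit(0x58) == 0x08);  /* 0101 1000 -> 0000 1000 */

    xor_swap(&a, &b);
    assert(a == 9 && b == 3);
}
```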
Given a variable with data type TypeX, the address of the variable has the data type
“pointer to TypeX”. To declare a variable of the data type “pointer to TypeX”, you
write a * in front of the variable name, to mirror the indirect or dereference
operator:
// "pui" is a "pointer to unsigned"
unsigned ui, *pui;
ui = 42;
// take the address of "ui" and assign it to "pui"
pui = &ui;
// %u matches an unsigned value; %p expects a void *
printf("ui is %u, the address is %p\n", *pui, (void *)pui);
After the assignment “pui = &ui”, *pui is an alias to ui. The printf call
prints out the value of *pui, which is 42, and the value of pui, which is the
address of ui. The format code %p in printf specifies the argument is a
pointer variable.
You can also modify ui indirectly through pui:
pui = &ui;
*pui = 5;
printf("ui is %u\n", ui);
prints out that “ui is 5”.
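Put together as a compilable fragment, the pointer examples look like the sketch below. The function store_through is a hypothetical addition showing why indirect modification is useful: a function can change a variable that belongs to its caller.

```c
#include <assert.h>
#include <stdio.h>

/* Stores value into the object that target points to. */
void store_through(unsigned *target, unsigned value)
{
    *target = value;
}

void pointer_demo(void)
{
    unsigned ui, *pui;

    ui = 42;
    pui = &ui;                 /* pui now holds the address of ui */
    assert(*pui == 42);        /* *pui is an alias for ui */

    /* %u matches unsigned; %p expects a void *, hence the cast. */
    printf("ui is %u, the address is %p\n", *pui, (void *)pui);

    store_through(pui, 5);     /* modifies ui indirectly */
    assert(ui == 5);
    printf("ui is %u\n", ui);
}
```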
BUG ALERT: Illegal Pointer Access
Dereferencing a pointer variable will cause issues if the pointer variable does not
contain the address of a valid object. Accessing through uninitialized pointers or
pointer values that are beyond the bounds of the allocated objects is the biggest
source of program errors in C.
To access an element of an array, you write the name of the array variable
followed by the index enclosed in [ ]. The index must be an integer type.
As we will later see, an array subscript is semantically equivalent to *, the pointer
dereferencing operator. In other words:
char str[] = { "hello, world" };
printf("%c %c\n", str[0], *str);
produces the output
h h
(but let’s discuss this more in the chapter <Pointers and Arrays>)
BEST PRACTICE: While not strictly required by the C Standard, you should
always provide a function declaration prior to calling it. Otherwise, the return-type
or the argument types in the function may not match the types you called it with,
which may lead to runtime errors. A function declaration is sometimes known as a
function prototype or a function signature.
A function call may return a value. This is indicated by the function prototype. If a
function does not return a value, its return-type is void. A function call that
returns a value may be used anywhere that a value of that type is allowed.
Function calls can be nested (a function call can be used as an argument to
another function).
// a couple of function prototypes
extern int foo(int, int);
extern int bar(int);
// foo() is a "nested" function call
int i = bar(foo(1, 2) * 2) + 5;
ADVANCED TOPIC – “C With Classes”: JumpStart API uses C With Classes to
make the API functions easy to use. “C With Classes” is a feature borrowed from
C++, and in JumpStart API, allows you to write function calls such as this:
porta.MakeOutput(5, OSPEED_LOW);
The syntax is <struct var>.<member function> ( argument list ). C
With Classes is fully described later.
Conditional Operator
The Conditional Operator is the only ternary operator (i.e. with 3 operands) in C.
The first operand of a conditional operator must be of the arithmetic type, and is
evaluated as to whether it is nonzero (true) or zero (false). If it is nonzero, the
second operand is evaluated and the result of the expression is the result of the
second operand. The third operand is ignored. However, if the first operand is
zero, the third operand is evaluated and the result of the expression is the result
of the third operand. The second operand is ignored.
The second and third operand must be of compatible types.
A conditional operator can replace an if-else statement [16]. For example, the
code in the table above “int i = isdigit(a) ? 'Y' : 'N';” can be written
in this much longer form:
int i;
if (isdigit(a))
    i = 'Y';
else
    i = 'N';
Besides saving keystrokes, conditional expressions can be used as part of larger
expressions that would otherwise require more complicated control structures
and variables to hold temporary results. For example:
printf("'a' is a digit: %c\n", isdigit(a) ? 'Y' : 'N');
which is easier to read than the alternatives using if-else.
Cast Operator
A Cast operator casts the operand to a result with the casted-to type. It is written
as a type declaration inside a set of ( ) in front of an expression.
A Cast operator can be used for the following purposes. In the below examples,
assume the following variable declarations
signed char sc;
unsigned char uc;
int i;
unsigned ui;
float f;
double d;
char *pc;
int *pi;
void *pv;
Conversions marked with (*) can be done without writing the cast operator
explicitly, as the compiler deduces the intended operation by applying the type
promotion rules. This is known as Free Cast.
Converting between signed and unsigned integer types of the same size:
casting between signed and unsigned integer of the same size produces a
value with the same bit-pattern, but with the casted-to type. Mainly useful to
bypass compiler type checking rules.
i = (int)ui; // no code generated, type change only
(*) Converting a smaller integer type to a larger integer type: converting a
smaller signed integer to a larger integer type involves sign-extending of the
operand. Sign extension of a positive number fills the upper bits with zeroes and
sign extension of a negative number fills the upper bits with ones.
Converting a smaller unsigned integer to a larger integer type uses zero-
extension where the upper bits are filled with zeroes.
i = (int)sc; // sign extension
i = (int)uc; // zero extension
Converting a larger integer type to a smaller integer type: the excess bits are
discarded.
sc = (signed char)i; // truncate
(*) Converting a floating point type to a different floating point type: a
compatible value is produced. When converting from a 64-bit double to a 32-bit
float, some precision may be lost.
f = d; // precision may be lost; value may overflow
d = f;
(*) Converting an integer to a floating point type: a 32-bit floating point can
only hold about 7 decimal significant digits [17]. An integer with more than 7
decimal significant digits may not be representable exactly in a 32-bit floating point
value.
A 64-bit floating point has about 16 decimal significant digits.
f = (float)i; // value may not be exact
d = (double)i; // exact when int is 32 bits wide
(*) Converting a floating point to an integer: the fractional portion is discarded,
and then the integral portion is truncated if needed.
i = (int)f; // value may truncate
Converting a pointer type to a compatible pointer type: produces a
value with the same bit-pattern, but with the casted-to type. Mainly useful to
bypass compiler type checking rules. void * is compatible with any data pointer
type.
pc = (char *)pi;
pv = (void *)pi;
(*) Converting integer zero to a pointer type: a pointer value of zero is the null
pointer, signifying that it does not point to a valid location. However, in a
microcontroller environment, it is possible that zero may be a valid address; some
microcontrollers have memory at location 0.
pc = 0;
Converting between a pointer type and an integer type: except for the integer
0, this is an unsafe practice and should be avoided.
ui = (unsigned)pc;
pc = (char *)ui;
Bit patterns may change and casting back-and-forth may produce a different
result than the original value. With the above code fragment, there is no
guarantee that the pc has the original value.
Casting any expression to a “void” type: this discards the result of the
expression.
(void)ui;
(void)function_call();
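Several of the conversions above can be observed directly. The sketch below assumes an 8-bit char and IEEE 32-bit float, which holds for the Cortex-M and most desktop compilers; the specific test values are chosen for illustration.

```c
#include <assert.h>

void check_conversions(void)
{
    signed char sc = -1;
    unsigned char uc = 255;
    int i;
    float f;

    i = sc;
    assert(i == -1);               /* sign extension keeps the value */
    i = uc;
    assert(i == 255);              /* zero extension keeps the value */

    uc = (unsigned char)0x1234;    /* excess upper bits are discarded */
    assert(uc == 0x34);

    f = (float)16777217;           /* 2^24 + 1 needs 25 significant bits */
    assert(f == 16777216.0f);      /* stored as the nearest float value */

    i = (int)3.99f;
    assert(i == 3);                /* fraction discarded */
    i = (int)-3.99f;
    assert(i == -3);               /* toward zero, not rounded */
}
```

Note that 16777217 has only 8 decimal digits, yet it already cannot be stored exactly in a 32-bit float.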
Sizeof Operator
The sizeof operator is the only operator that uses letters and not symbols. It
returns the size of a data object in number of bytes.
The following are valid operands to the sizeof operator (see examples in table
above):
1. a constant (integer, floating point, character, string)
2. a variable name
3. an array element
4. a struct/union member reference
5. dereferencing of a pointer variable (e.g. *ptr)
6. a type declaration
7. a typedef’ed name (see <Variables And Type Declarations>)
8. a function name (some compilers accept this as an extension, but it is not
valid in standard C, so do not rely on the result)
A set of enclosing ( ) is optional for the operand unless the operand is a “type
declaration”. In that case, the ( ) is required. However, for consistency, you
should just enclose all sizeof-operands with a set of ( ).
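A few of the operand kinds in the list can be exercised as follows. This is a sketch; struct reading and the variable names are hypothetical. Only relationships that hold on every conforming compiler are asserted, since the actual sizes vary by target.

```c
#include <assert.h>
#include <stddef.h>

struct reading {
    short channel;
    int value;
};

void check_sizeof(void)
{
    int samples[16];
    struct reading r;
    int *pi = samples;

    assert(sizeof(char) == 1);                           /* by definition */
    assert(sizeof(samples) == 16 * sizeof(int));         /* whole array */
    assert(sizeof(samples) / sizeof(samples[0]) == 16);  /* element count */
    assert(sizeof(*pi) == sizeof(int));                  /* dereference */
    assert(sizeof(r.value) == sizeof(int));              /* member */
    assert(sizeof("NCC-1701") == 9);                     /* includes null */

    /* A struct may contain padding, so it can be larger than the sum. */
    assert(sizeof(struct reading) >= sizeof(short) + sizeof(int));
}
```

The idiom sizeof(array) / sizeof(array[0]) is a common way to get the number of elements without repeating the constant 16.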
Comma Operator
The comma operator is just a comma (,), but it is different from the comma used to
separate arguments in a function call or a function prototype, or the comma used
to separate a list of variables in a variable declaration.
The operands of a comma operator are evaluated from left to right. The value of
the left operand is discarded and the result of the comma operator is the value of
the right operand.
A comma operator is most frequently used in the “initial expression” of a for
statement: as we will see in the chapter <Statements>, the general form of a for
statement is:
for (<init expr> ; <test condition>; <post expr> )
…
The <init expr> is typically for initializing loop variables. If there is more than
one variable you wish to initialize, then you can use the comma operator to achieve
that goal:
for (i = 0, j = 0; …
A comma operator is also useful when you need to perform an operation, possibly
a function call, in a nested expression. For example:
int i = isdigit(a) ? (foo(), 'Y') : 'N';
Using the comma operator, you can call the function foo without affecting the
value of the subexpression.
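Both uses of the comma operator can be seen in the following sketch. The function note_digit stands in for foo in the text; it and the other names are hypothetical.

```c
#include <assert.h>

static int log_count;   /* counts calls to note_digit() */

/* A side-effect function standing in for foo() in the text. */
void note_digit(void)
{
    log_count++;
}

/* Returns 'Y' for a decimal digit (and records it), otherwise 'N'. */
int classify(int ch)
{
    /* The comma operator calls note_digit(), discards its (void)
       result, and yields 'Y' as the value of the subexpression. */
    return (ch >= '0' && ch <= '9') ? (note_digit(), 'Y') : 'N';
}

/* Sums array elements from both ends, using two loop variables. */
int sum_pairs(const int *a, int n)
{
    int i, j, total = 0;

    for (i = 0, j = n - 1; i < j; i++, j--)
        total += a[i] + a[j];
    if (i == j)
        total += a[i];      /* middle element of an odd-length array */
    return total;
}

void check_comma(void)
{
    int data[5] = { 1, 2, 3, 4, 5 };

    log_count = 0;
    assert(classify('7') == 'Y' && log_count == 1);
    assert(classify('x') == 'N' && log_count == 1);
    assert(sum_pairs(data, 5) == 15);
}
```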
4. STATEMENTS
C Statements
Most C statements provide control flow mechanisms, allowing the program to
execute pieces of code conditionally or repeatedly. Statements cannot be
embedded inside an expression. Some statements, including break, continue,
and case labels, can only be used inside other statements.
Statement Label
Any statement may be preceded with zero or more labels, in the form of
label:
For example:
// “top” is a label, as indicated by the colon ‘:’
top:
foo = a + b;
Any valid C identifier may be used as a label name as long as the name does not
conflict with another label name within the same function. A label is typically used
as target of a goto statement within the same function body. A label may appear
before or after the goto statement that references it.
Expression Statement
An expression statement is simply an expression. It’s called a statement just so
that we do not need to write things like “the body of a while loop is a statement or
an expression”.
Statements are at the “top level” (i.e. there is no such thing as a “sub-statement”,
unlike subexpressions) and it makes no sense to write an expression without side
effects:
int i, j;
i + j; // and then…?
Therefore, an expression statement usually has an operator with side effects [18]
at the top level. These include assignment operators, increment / decrement
(which is a form of assignment) operators, and function calls.
k = i + j;
--i;
foo();
Compound Statement
A Compound statement is a list of statements and declarations enclosed in a pair
of { } , usually for the purpose of grouping them together as a single statement
to the body of an if-statement or a loop etc.
The body of a function definition is in fact a compound statement. Note that a
compound statement does not have or need a terminating semicolon.
Null Statement
A Null Statement is just a semicolon. This is useful if the syntax requires a
statement but the program has nothing to perform.
// copy src to dst until a null is encountered
char *strcpy(char *dst, const char *src)
{
char *s = dst;
while ((*dst++ = *src++) != 0)
;
return s;
}
All the work is done in the while condition itself, therefore the while body-
statement is a null statement.
Notice the use of the idiom (*dst++ = *src++) != 0. This single expression
assigns the value pointed-to by src to the address contained in dst, then
increments both pointers, and then the copied value is checked against 0.
Do-while Loop
Syntax Form
do <statement>
while (<expr>);
A do-while loop is the same as a while loop except that the test expression is
tested at the bottom. Therefore, a do-while loop is executed at least once,
whereas a while loop may never execute if the test expression is zero the first
time it is run.
Notice there is a semicolon after the closing parenthesis of the while expression,
as a terminator for the do-while statement.
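A typical case where the body must run at least once is counting decimal digits, since even the number 0 has one digit. The following is a minimal sketch with a hypothetical function name:

```c
#include <assert.h>

/* Counts the decimal digits of n. */
int count_digits(unsigned n)
{
    int digits = 0;

    do {
        digits++;        /* runs once even when n is 0 */
        n /= 10;
    } while (n != 0);
    return digits;
}

void check_count_digits(void)
{
    assert(count_digits(0) == 1);
    assert(count_digits(9) == 1);
    assert(count_digits(10) == 2);
    assert(count_digits(12345) == 5);
}
```

Written with a plain while loop, the same function would return 0 for an input of 0 unless that case were handled separately.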
For Loop
Syntax Form
for (<init-expr>; <expr>; <post-expr>)
<statement>
A for loop is a shorthand for writing a while loop, combining an initial-
expression and a post-expression in a single syntactic element.
The semantics for the test expression <expr> and the use of break and
continue statements inside the loop body are the same as in the while
loop, with one exception: a continue statement jumps to the post-expression
before proceeding to the test expression check.
Example:
int n = 0;
for (int i = 0; i < 16; i++)
{
channel_reading[i] = readADC(i);
n += channel_reading[i];
}
<init-expr> is typically used for loop variable initialization, and you may
declare the variable in-place, as shown in the above example. You may even
declare multiple variables, separating them by commas:
for (int i = 0, j = 0; …
However, as <init-expr> must be an expression, you cannot write multiple
declaration statements. The following is incorrect:
for (int i = 0; int j = 0; …
When you declare a variable as part of the <init-expr>, the “scope” of the
variable ends after the for statement body. Scope is discussed in the chapter
<Variables>, but this just means that you cannot access the variable beyond the
for statement body.
If you do not use a declaration syntax, e.g.
for (i = 0, j = 0; …
then it is a list of expressions separated by the comma operators. <post-expr>
is typically used for loop variable increment.
Note that all of the expressions, including the test expression, are optional, and
can be omitted. This is different from the do and while loops, where the test
expression is not optional.
A forever loop can be written as:
for (;;)
…
as an alternative to
while (1)
…
Return Statement
Syntax Form
return;
return <expr>;
A return statement transfers control back to the calling function. If the function
has a return type, then an expression compatible with the return type must be
specified. If the function has no return type, i.e. a return type of void, then the
return statement must not have a return expression:
return 1; // returning a value
return; // function return with no return value
If the last statement of a function is not a return statement, then the compiler
inserts one so that execution will resume properly.
When a return statement executes, storage for local variables is reclaimed and
the C environment is restored to the calling function.
Example:
int foo(void)
{
return 42;
}
void bar(void)
{
return;
}
Note that the return expression does not need to be enclosed inside a set of ( ).
Some people always write a return statement that way, but it’s optional.
Switch Statement
Syntax Form
switch (<expr>)
{
case <const1>:
break;
default:
}
A switch statement evaluates an integer expression and compares the value to
the case label values in the body statement. If there is a match, then control is
transferred to the statement following the case label. If there is no match, and if
there is a default label within the body, then control is transferred to the
default label. If there is no default label, then execution proceeds to the
statement following the switch statement.
Example:
// check if a character is a hexadecimal or decimal digit
switch (ch)
{
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
is_hex = 1;
// FALL THROUGH
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
is_digit = 1;
break;
default:
is_hex = is_digit = 0;
break;
}
A switch body is usually a block statement. All case labels must have unique
integer constant values within this switch body. Once execution starts at a case
label, it continues until either a break or return statement is executed or the
rest of the switch body is executed, ignoring any intervening case and
default labels.
In some programming languages, execution of a case body terminates when
another case label is encountered. Since this C behavior is potentially an error
case (e.g. the programmer accidentally forgot to put in a break statement), some
source code analysis programs would flag a case body without an unconditional
break statement. The comment
// FALL THROUGH
case …
is often used to inform these tools that this is intentional.
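As a further illustration, here is a minimal sketch of a switch statement that groups case labels and falls through on purpose. The function hex_value is hypothetical, and the sketch assumes that the letters a to f (and A to F) have consecutive character codes, as they do in ASCII.

```c
/* Hypothetical helper: returns the value of a hexadecimal digit
   character, or -1 if ch is not a hexadecimal digit. */
int hex_value(char ch)
{
    switch (ch)
    {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        return ch - '0';
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
        ch = ch - 'A' + 'a';    /* convert to lowercase */
        // FALL THROUGH
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
        return ch - 'a' + 10;
    default:
        return -1;
    }
}
```

The uppercase case deliberately has no break, so execution continues into the lowercase case body; every other path leaves the switch through a return statement.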
A straightforward implementation of a switch statement is the equivalent of
performing a series of if-else comparisons until a matching value is found or
until all tests are exhausted. A compiler may optimize the generated code by
using a jump table, or by using a binary search over the case values. JumpStart C for Cortex-M
optimizes to use jump tables whenever possible.
5. VARIABLES
We have seen examples of variables and their declarations already. This chapter
discusses variables in detail, and the next will discuss types and declarations.
For the most basic understanding of variables, you only need to know a few
things:
1. Variable types: the difference between local, global, and static variables
2. Initializers: how to write initializers
Once you know these, you can “start writing code”, but we will also discuss
additional information important to mastering C.
Variable Names
A C variable name, also called an identifier, has the following syntactic
restrictions:
1. A variable name cannot be a C keyword.
5. Standard C advises not to exceed certain length for a name for portability
reasons, but most modern compilers support names in the excess of at least
100 characters, so this is not a concern in most cases.
Here are the “rules of thumb” on which variable type you need:
1. (This is usually the case) If the variable is only needed in a function, e.g. a
variable to hold temporary values, loop counters etc., and the values are
transient, then declare it as a local variable.
2. If you need to retain global information accessible from multiple functions and
it is not feasible to pass the information as function arguments, then declare it
as a global variable.
3. If the variable is only needed in a function, but the value must be kept across
invocations (e.g. an initialized array of constant content, a counter that does
not reset etc.), then declare it as a static variable inside a function.
4. As with all the reasons to use a global variable, but in addition, if all the
functions that need to access the variable are in a single file, then declare it
as a static variable at the file level of that file.
These rules keep the variable declaration close to where it is used, which is
important in program maintenance.
The next few pages elaborate on these.
Local Variables
If you declare a variable inside a function (without the static or extern
storage class, which will be described later), then it is a local variable. Local
variables do not have fixed locations in memory, and are created on demand only
when the enclosing function is running. New copies are created each time the
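function runs, and the copies are destroyed when the function returns. Unless the
declaration has an initializer, the initial value of a local variable is indeterminate
(effectively random), so assign a value before reading it.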
Prior to C99, local variables could only be declared after the { of a compound
statement, before any statements within the block. C99 relaxed that, and allows
variable declarations anywhere a statement is allowed.
You may use the auto or register storage class in the variable declaration,
although their uses are archaic and should be avoided. Since you might
encounter them in sample code (not from us :-) ), a brief explanation is in order.
When used, the storage class is specified before the type. For example:
void foo(void)
{
int i; // local variable with no explicit storage class
register char c; // register storage class
auto unsigned u; // auto storage class
// stuff
…
}
Notice the placement of auto or register. Originally, register was used as
a hint to the compiler that it should allocate the variable to a machine register if
possible. Since machine registers are faster to access, this is a desirable trait for
frequently used variables. However, most modern compilers use advanced
heuristics for register allocation, and this keyword is no longer needed. auto is
the original way to specify that the variable is probably not important enough to be
allocated in a register, and this is also no longer necessary.
There is one leftover effect of the register keyword: a variable declared with
this storage class cannot be used in conjunction with the address-of & operator,
but that is minor fallout from the definition and is not of much use either.
Function Arguments
Function arguments are special cases of local variables. They act exactly like
local variables, except that their declaration form is different:
They are declared inside the set of ( ) after the function name
The argument list is separated by commas, and not semicolons
You cannot write initializers after their names, nor can you use the obsolete
auto storage class
Only one argument variable is allowed per type declaration
For example:
char *strcpy(char *dst, const char *src);
The last bullet means that if there are three int arguments, they have to be
written as follows:
int AddMult(int a, int b, int c);
and not:
// bad syntax
int AddMult(int a, b, c);
Global Variables
If you declare a variable outside a function, and without the static storage
class, then it is a global variable. Global variables have fixed locations in memory.
If a global variable declaration has no initializer, then the variable is initialized to
zero per Standard C requirements.
A global variable declaration may have the extern storage class, but it has two
meanings, depending on whether there is an initializer:
extern int i = 42;
With an initializer, as above, the extern keyword has no effect whatsoever;
it’s exactly the same without it. The second case is explained under External Declarations.
There are three primary uses for global variables:
1. To represent a persistent global state or a global object. For example, in the
JumpStart API, global variables are used to represent the state of hardware
setup, such as GPIO PORTA, or the I2C etc.
External Declarations
An external declaration is a declaration with the extern storage class and
without an initializer:
extern int i;
Its purpose is to declare that there is a global (and not a static) variable named i
defined elsewhere (may even be in the same file). This allows a global variable to
be accessed in other files besides where it is defined. It does not need to be
placed at file level either: if an external global variable only needs to be accessed
in a function, then the external declaration can be placed inside the function:
void foo(void)
{
extern int i;
…
}
Some people prefer this style to limit where the variable is accessed. You may
write a global variable declaration and an external declaration for the same
variable in the same file.
Definition vs. Declaration: In this book, we use the terms declaration and
external declaration. Some writers also use the term “definition” to refer to the
former. However, even using that terminology, a definition also serves as a
declaration, so we feel it is best to use the terminology we are using.
Static Variables
A static variable is similar to a global variable except that it is only visible in the
place it is declared:
If declared inside a function, then it can only be accessed within that function
If declared at the file level, then it can only be accessed in the file where it is
declared
A static variable has the static storage class:
static int i;
Like global variables, if a static variable declaration has no initializer, then the
variable is initialized to zero per Standard C requirements.
A function-level static variable could be used to keep track of persistent data that
is only of interest within the function.
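A minimal sketch of such a function-level static variable follows. The function next_id is hypothetical; it hands out 1, 2, 3 and so on, on successive calls.

```c
/* Hypothetical helper: returns 1, 2, 3, ... on successive calls. */
unsigned next_id(void)
{
    static unsigned last_id;    /* no initializer: starts at zero */

    last_id++;                  /* the value survives between calls */
    return last_id;
}
```

If last_id were an ordinary local variable, it would be created afresh on every call and the function could not remember the previous value.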
Initializers
A variable declaration may be followed by an initializer:
int i = 5;
It is the same as if you have assigned the value before its use:
int i;
…
// before “i” is used
i = 5;
Using an initializer is more convenient and makes clearer code.
Initializers for aggregate types (array, struct/union) and pointers are explained
in chapter <Types and Declarations>.
Visually:
Block scope needs further explanation: each set of compound statement { }
introduces a new (nested) scope. A variable declared in a block scope is visible
only within that scope and the visibility ends when the matching ending } is
encountered.
The scope rules also dictate when the same name can be reused:
All global variables must have unique names.
File level static variables within a single file must have unique names.
Within a block scope, all static and local variables must have unique names
from each other, but may have the same names as variables declared in an
outer block scope, or at the file scope.
The last bullet means that you can write:
int counter; // global variable
void foo(void)
{
… = counter; // use of global var
float counter; // new local variable with same name
if (counter / 2 > 0)
{
char *counter = { "Counter" }; // new local variable
…
}
// "float counter" is in scope again
… = counter; // use of "float counter"
}
There are three (non-conflicting) declarations of the name “counter”. The
example is for demonstration only, as there is not much point in purposefully
reusing the same name within a set of nested scopes as shown. Reusing a
variable name is most useful for letting counter variables in different places share
a name, or for having a local variable that just so happens to have the same name as
a global variable without creating a conflict.
Variable Alignment
Some CPUs, such as the Cortex-M0, have strict alignment requirements:
1. 16-bit access must be on a 16-bit (i.e. 2-byte) boundary.
2. 32-bit access must be on a 32-bit (i.e. 4-byte) boundary.
The Cortex-M3 and above have support for unaligned access, but even for those
processors, 16-bit and 32-bit accesses on their natural boundaries are preferred because
access time is shorter. The JumpStart Cortex-M compiler aligns variables at their
natural boundary.
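Natural alignment can be observed with the standard offsetof macro. The sketch below is an illustration only: the struct sample and the function name are hypothetical, and it assumes a compiler that aligns members at their natural boundaries, as described above. The C Standard itself does not guarantee a particular layout.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint8_t, uint16_t, uint32_t */

/* Hypothetical record mixing 8-, 16- and 32-bit members. */
struct sample {
    uint8_t  flag;
    uint32_t count;   /* typically preceded by 3 padding bytes */
    uint8_t  mode;
    uint16_t level;   /* typically preceded by 1 padding byte */
};

/* Returns 1 if every multi-byte member starts on a multiple of its size. */
int sample_is_naturally_aligned(void)
{
    return offsetof(struct sample, count) % sizeof(uint32_t) == 0
        && offsetof(struct sample, level) % sizeof(uint16_t) == 0;
}
```

The padding bytes are the price of alignment; ordering members from largest to smallest usually reduces it.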
Derived Types
We have already seen simple declarations such as for int and array variables.
C also allows you to create derived types:
Pointer Types
When you take the address of an object and store that address in another object,
the second object is said to contain a pointer to the first object. For example,
given the following:
int i = 42;
int *p = &i;
p has the type “pointer to int”. After the second declaration, p contains the
address of i and is said to contain a pointer to i. If you look at the memory cells
containing the variables, they may look like this:
Memory Address    Variable Name    Content
0x1000            i                0x002A (42)
0x1004            p                0x1000
A pointer may contain the address of a data object or the address of a function. If
you are familiar with machine code programming, this is not a surprising feature,
but this might be a new concept for users familiar with only other high level
programming languages.
Pointers are one of the most powerful features in C, allowing C to be used for
embedded systems or to be used in writing system programs such as Linux, OS X
and Windows. Unfortunately, misuse of pointers also causes the majority of the
bugs in C programs.
Since pointers contain addresses, a proper pointer value must be a valid address.
However, a pointer object may contain an invalid address, as long as the program
does not access the object through the pointer (a process known as
“dereferencing”). This is why using pointers can be dangerous: a pointer may pick
up an incorrect value at some point during program execution, but the pointer may
not be dereferenced until a different section of the program code runs. This may
cause the program to crash even though the code otherwise looks correct.
Pointers can be applied to any type, not just basic types such as int, but also
other pointers, or to a struct etc. For a simple pointer type such as “pointer to
int”, the declaration syntax is
<type> *<name>;
// example
int *p;
More complicated pointer declarations will be described in the section on
“Reading a C Declaration”.
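Dereferencing, mentioned above, is written with the * operator. The following minimal sketch uses a hypothetical function swap_ints that reads and writes two int objects through pointers.

```c
/* Hypothetical helper: exchanges the two ints that a and b point to. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;   /* read the int that a points to */
    *a = *b;        /* write through a */
    *b = tmp;       /* write through b */
}
```

A caller passes addresses, for example swap_ints(&x, &y), and both pointers must hold valid addresses of int objects for the call to be safe.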
Pointer Initializations
In a pointer variable declaration, you can follow it with an initializer:
int i = 42;
int *p = &i;
int *q = 0;
For a global or static variable, the initializer must be the address of a variable
with the points-to type of the pointer variable, or 0, the null pointer. For example, p
is a pointer to int, and i is an int, therefore, &i has the same type as p.
For a local variable, the initializer may be an address of a variable with the points-
to type, a 0, or any type-compatible expression.
char * As “Strings”
C does not have a string data type. A literal string such as “hello, world” is
stored as an array of char ending with a \0 character. In an expression it yields
a char * pointing to its first character, so you may initialize a char * variable
with a literal string:
char *hello = "hello, world";
The compiler places the literal string in the target memory and the address of the
allocation is assigned to the variable hello.
Array Types
An array type is indicated by putting the array symbols [ ] after the name:
unsigned char array[2];
unsigned char _2D_array[4][2];
unsigned char _3D_array[6][4][2];
array[0] = 1;
_2D_array[0][0] = 1;
_3D_array[0][0][0] = 1;
As shown in the example, multiple dimensions are supported. For each
dimension, the array indexing goes from 0 to the size-1 of the dimension.
Accessing an element of an array is called indexing the array. Array indexes must
have integer types.
The memory is laid out from the “left” dimensions to the “right”. For example, with
the _3D_array, assuming the starting address is 0x1000, the layout looks like
this:
Address Array element
0x1000 _3D_array[0][0][0]
0x1001 _3D_array[0][0][1]
0x1002 _3D_array[0][1][0]
0x1003 _3D_array[0][1][1]
0x1004 _3D_array[0][2][0]
0x1005 _3D_array[0][2][1]
0x1006 _3D_array[0][3][0]
0x1007 _3D_array[0][3][1]
0x1008 _3D_array[1][0][0]
0x1009 _3D_array[1][0][1]
0x100A _3D_array[1][1][0]
0x100B _3D_array[1][1][1]
0x100C _3D_array[1][2][0]
0x100D _3D_array[1][2][1]
0x100E _3D_array[1][3][0]
0x100F _3D_array[1][3][1]
0x1010 _3D_array[2][0][0]
0x1011 _3D_array[2][0][1]
0x1012 _3D_array[2][1][0]
…
Indeed, for any dimensions of the form [x][y][z], the layout is the same as if the
array were declared as [x*y*z], for any dimensions [20].
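The layout rule can be checked by comparing a measured byte offset with a computed one. In this sketch the array name cube and both function names are hypothetical; the array has the same dimensions as _3D_array.

```c
#include <stddef.h>   /* ptrdiff_t */

/* Same dimensions as _3D_array in the text; the name cube is hypothetical. */
static unsigned char cube[6][4][2];

/* Byte offset of cube[i][j][k] from the start of the array,
   measured from the actual addresses. */
ptrdiff_t measured_offset(int i, int j, int k)
{
    return (unsigned char *)&cube[i][j][k] - (unsigned char *)cube;
}

/* Offset predicted by the layout rule: the rightmost index varies fastest. */
ptrdiff_t computed_offset(int i, int j, int k)
{
    return ((ptrdiff_t)i * 4 + j) * 2 + k;
}
```

For the element [2][1][0] both functions give 18, which is 0x12 and matches the address 0x1012 in the table.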
Array Initializations
For a one-dimensional array, you enclose the initializers with a set of { } . The
initialized values must be compile-time-expressions: constants (numeric, literal
strings, etc.) and addresses of other global and static variables. You may use
simple arithmetic operators but not function calls, nor a reference to another
variable.
char hello[] = { “Hello” };
int small_table[] = { 1 };
int table[3] = { 1, 2, 3};
int big_table[1000] = { 1 };
As can be seen here:
A char array (e.g. hello) can be initialized by a literal string. The initialization
includes the terminating \0.
An array with a specified dimension (e.g. table) can have the exact number
of initialized values
An array with a specified dimension (e.g. big_table) can also have fewer
initialized values. In this case, the compiler fills the rest of the array
variable with zeros
For a multi-dimensional array, you enclose each set of the initializers with a set of
{ }:
int array[4][2] = { {0, 1}, {2, 3}, {4, 5}, {6, 7} };
The initializers are laid out in memory like this:
0, 1, 2, 3, 4, 5, 6, 7
Which is the same as writing
int array2[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
As with a one-dimensional array, you may omit the leftmost dimension if there is an
initializer:
int array3[][2] = { {0, 1}, {2, 3}, {4, 5}, {6, 7} };
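When a dimension is left out, the compiler works it out from the initializer, and sizeof can recover it. The following sketch reuses three declarations from above; the macro ELEMENT_COUNT and the two functions are hypothetical names introduced for illustration.

```c
#include <stddef.h>   /* size_t */

int small_table[] = { 1 };      /* size taken from the initializer: 1 element */
int big_table[1000] = { 1 };    /* the remaining 999 elements are zero */
int array3[][2] = { {0, 1}, {2, 3}, {4, 5}, {6, 7} };   /* 4 rows deduced */

/* Hypothetical macro: number of elements in an array (not for pointers). */
#define ELEMENT_COUNT(a) (sizeof(a) / sizeof((a)[0]))

size_t small_table_count(void) { return ELEMENT_COUNT(small_table); }
size_t array3_row_count(void)  { return ELEMENT_COUNT(array3); }
```

The macro only works where the array declaration itself is visible; applied to a pointer it would divide the size of the pointer instead.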
struct Type
A struct is a collection of members or elements, collected in a single data
structure. The member list looks like a list of variable declarations. Instead of
names of variables, the names are called member, or field, names:
enum color { RED = 0, GREEN, BLUE };
struct id_record {
char name[20];
unsigned id;
enum color eye_color;
};
struct id_record employee1, employee2;
After declaring the type struct id_record, we declare two variables of that
type employee1 and employee2. You access a struct member by using
the . dot notation:
strcpy(employee1.name, "James T. Kirk");
employee1.id = 1;
employee1.eye_color = RED;
Members of a struct must have unique names within that struct, but
otherwise there is no further restriction as long as the name follows the same
rules as naming a variable.
You can also assign a struct to another struct variable of the same type:
employee2 = employee1;
You may also pass a struct type as a function argument, and a function may
return a struct type as its return type:
extern struct id_record AddRecord(char *name, unsigned id);
extern void ProcessRecord(struct id_record);
employee2 = AddRecord("Spock", 2);
ProcessRecord(employee2);
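Struct assignment copies every member, including array members, so the two variables are independent afterwards. The sketch below repeats the declarations from above so that it is self-contained; the function copy_is_independent is hypothetical.

```c
#include <string.h>   /* strcpy, strcmp */

enum color { RED = 0, GREEN, BLUE };
struct id_record {
    char name[20];
    unsigned id;
    enum color eye_color;
};

/* Hypothetical check: returns 1 if changing the copy leaves the original alone. */
int copy_is_independent(void)
{
    struct id_record employee1, employee2;

    strcpy(employee1.name, "James T. Kirk");
    employee1.id = 1;
    employee1.eye_color = RED;

    employee2 = employee1;              /* copies all members, including name[] */
    strcpy(employee2.name, "Spock");    /* changes only the copy */
    employee2.id = 2;

    return strcmp(employee1.name, "James T. Kirk") == 0 && employee1.id == 1
        && strcmp(employee2.name, "Spock") == 0 && employee2.id == 2;
}
```

The same copying happens when a struct is passed as a function argument or returned from a function, which can be costly for large structs.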
Bitfield Members
Bitfields are members of a struct that occupy a number of contiguous bits. The
type of a bitfield is either int or unsigned. You declare a struct member
as a bitfield by specifying a size after its name:
struct {
unsigned : 28,
V : 1, C : 1, N : 1, Z : 1;
unsigned PC;
} PSR;
The number of bits can range from zero up to the number of bits in the int
type.
Bitfields may be unnamed (: 28 above); in which case, they exist mainly for
padding purposes, or to match unused bits in a hardware IO register field.
An int bitfield of size one is of little use, as the single bit is
used to store the sign of the bitfield, and so there is no space left to store a
magnitude. Declare single-bit flags as unsigned instead.
Bitfields can be accessed and operated on like other struct/union members,
except that you cannot take the address of a bitfield member using the C address
operator &.
Bitfields can be used to map to the layout of a hardware IO register. The example
above shows a possible mapping of a “PSR” which contains a processor’s status
flags and the PC register. Be aware that the allocation order of bitfields is not
defined by the C Standard. All JumpStart C compilers allocate bitfields from right
to left. If you use bitfields to map to an IO register, make sure that the allocation
orders match up.
Bitfields can also be used to minimize the use of space, for example, if there are
many one-bit status flags, they can be declared as bitfields. However, it takes
more code to access bitfields, and as modern microcontrollers have a sizeable
amount of SRAM, this practice is discouraged.
Unlike other members of a struct, or indeed other C variables, you cannot take
the address of a bitfield using the & operator, and the following constraints exist:
A pointer cannot point to a bitfield.
As with all struct declarations, the inner struct’s tag name is optional. In
addition, if the nested struct’s member names do not conflict with the names of
other members of the outer struct, then the member name for the nested
struct (e.g. i on the left hand side) can be omitted as well. Again, see the
example on the right hand side. This is called an anonymous struct.
Union Type
Everything that has been written about struct types is applicable to union
types. The only difference between a struct and a union is that a struct is
laid out with the members in ascending memory address order, whereas a union
is laid out with the members at the same starting address.
A union is useful to examine the bit patterns of underlying data types:
union u {
float f;
unsigned char a[4];
} u;
u.f = 3.14159267;
printf("%f dump: %02X%02X%02X%02X\n", u.f, u.a[0], u.a[1], u.a[2],
u.a[3]);
The 4-byte array u.a occupies the same memory as the floating point member
u.f. The printf call prints out the 4 individual bytes of the internal
representation for 3.14159267.
Struct Initializations
You write an initializer for a struct by enclosing a list of initialized values in a set
of { }. The list of values matches the list of members of the structure. The
initialized values must be compile-time expressions, as with initializers for array
declarations and global / static variables.
enum color { RED = 0, GREEN, BLUE };
struct id_record {
char name[20];
unsigned id;
enum color eye_color;
};
struct id_record employee1 = {
"James T. Kirk",
1,
RED
};
Bitfields are treated exactly the same as other struct members. If the initialized
value is too large to fit, the excess upper bits are discarded:
struct {
unsigned a : 2, b : 3;
unsigned c;
} x = { 0xC, 2, 3 };
…
printf("%d %d %u\n", x.a, x.b, x.c);
would output
0 2 3
As a is only two bits, the value 0xC overflows, and only the bottom 2 bits (i.e. 0) are
kept.
Union Initializations
You can only initialize the first member of a union variable, and even though
there is only one value, you still need to enclose it with a set of { } , just like a
struct initializer:
union u {
float f;
unsigned char a[4];
} u = { 3.14159267 };
Global struct/union Declarations
To declare a global variable of a struct/union type, typically you write the
type declaration in a header file, and #include the header file in all the C
source files that access the global variable:
enum color { RED = 0, GREEN, BLUE };
struct id_record {
char name[20];
unsigned id;
enum color eye_color;
};
extern struct id_record employee1, employee2;
extern struct rgb_color {
unsigned char R, G, B;
} mycolor;
It makes no sense to prefix pure type declarations such as enum color and
struct id_record with the extern keyword [23]. On the other hand, the
external variables employee1 and employee2 must have the extern keyword for a
proper external declaration.
You can combine an external declaration of a struct variable and the type
declaration of the struct such as the case for mycolor above.
C GEEKERY TOPIC: There is a subtle C interpretation here: within a single file, a
tag name can only be used once to declare an enum, or struct/union. Even
two identical declarations are considered an error:
// within the same file…
struct color {
unsigned char R, G, B;
};
// An error, even if the declaration is exactly the same
struct color {
unsigned char R, G, B;
};
However, this rule is relaxed and in fact turned opposite when it comes to multiple
files: a type declaration across multiple files is assumed to be referring to the
same type if it has the same tag. This allows a type declaration to be put in a
header file, as explained earlier.
Type Qualifiers
A type qualifier specifies additional attributes of a data type.
A type qualifier may appear to the left or to the right of a * in a declaration [24]. The
rules are simple:
1. If the type qualifier is in the initial part of the declaration and precedes any *,
then it is modifying the base type of the declaration.
2. Otherwise, the type qualifier must appear after a *, and it is modifying the *
pointer to its immediate left.
For example,
int const i = 0;
int *__flash cp = (int * __flash)0x1000;
int __flash * xp = (int __flash *)0x1000;
[25]
i is a “const qualified int”
cp is a “pointer in flash space pointing to an int”
xp is a “pointer (in normal data space) pointing to an int in flash space”
A const-qualified or __flash-qualified variable must have an initializer in the
declaration since it cannot be assigned to due to the const attribute.
C allows you to freely assign a non-qualified pointer type to a qualified pointer:
int i;
int *pi = &i;
int *const cpi = pi;
cpi is “const pointer to int” and therefore the pointer value cannot change further,
but the points-to value (the integer) may be modified.
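The two placement rules can be compared side by side using the standard const qualifier. This is a minimal sketch; the function qualifier_demo is hypothetical.

```c
/* Hypothetical demonstration of the two qualifier positions. */
int qualifier_demo(void)
{
    int a = 1, b = 2;

    const int *p = &a;   /* pointer to const int: p may change, *p may not */
    int *const q = &a;   /* const pointer to int: *q may change, q may not */

    p = &b;              /* allowed: p itself is not const */
    *q = 10;             /* allowed: the int that q points to is not const */

    /* *p = 5;   would not compile: *p is const-qualified */
    /* q = &b;   would not compile: q is const-qualified  */

    return *p + *q;      /* b + a, i.e. 2 + 10 */
}
```

Note that p may point to a, which is not itself const. The qualifier only restricts what can be done through p.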
Compatible Types
The compiler tests whether two types are compatible in many contexts. Two types
are compatible if:
They are the same type.
They are pointer types with the same type qualifiers, and they point to
compatible types.
They are array types with compatible element types, and either at
least one array does not have an element count, or both counts are equal.
They are functions whose return types are compatible, and
if parameters are specified on both functions, then the parameter types must
match for all parameters, or
if parameters are specified on one function only, then the parameter types
must not be float or an integer type that changes when promoted (i.e. one of
the smallest integer types such as char or short), or
if neither function specifies its parameters.
They are struct/union or enum types declared in different source
files with exactly the same member names. In addition, for struct/union
the member types must be compatible, and for enum the values of the enumeration
members must be the same.
In practice, this means that struct/union and enum types are best
declared in a common header file that is #include’d by all source files that use
those types.
2. Move right, reading off any array [ ] and function ( ) specifiers, until you
hit a terminator: ‘)’, a comma ‘,’ or a semicolon ‘;’.
3. Now move back to the left again, reading off any * symbol(s), along with
any type qualifier(s).
4. If you encounter a ‘)’ in step 2, then stop when you hit the corresponding ‘(‘.
Now repeat from step 2 starting at the rightmost place where you were before
after step 2.
5. If you do not encounter a ‘)’ in step 2, then you will be reading off the base
type.
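Applying these steps, starting from the declared name, to a few declarations gives the readings in the comments below. The variable names are hypothetical and the declarations are for illustration only.

```c
int *ap[3];             /* ap is an array of 3 pointers to int            */
int (*pa)[3];           /* pa is a pointer to an array of 3 int           */
int (*fp)(int);         /* fp is a pointer to a function taking an int
                           and returning int                              */
int *(*fpa[2])(void);   /* fpa is an array of 2 pointers to functions
                           taking no arguments and returning a pointer
                           to int                                         */
```

The parentheses around *pa and *fp force the * to be read before the [ ] or ( ) to its right, which is what step 4 describes.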
float CelsiusToFahrenheit(float C)
{
return (C * 9.0/5) + 32;
}
Modular coding - break up your code into manageable chunks. For example,
the sequence of instructions to set up a microcontroller is long and pretty
boring: “turn on these bits”, “enable that other bit”, etc. By putting this code
fragment into its own function, it makes the calling function less cluttered and
easier to manage.
API (Application Programming Interface)
Following on from above, a widely used programming practice today is to write
programming “components”, each described by its API (Application
Programming Interface), allowing other programmers to use the components
without knowing how they are implemented or how they work. A software program
can then be built using these components.
An API typically consists of function prototypes and communication protocols of
the component. For example, JumpStart API is an interface to the low level
Cortex-M microcontrollers’ features.
Function Prototype / Function Declaration
While the C Standard allows you to call a function without specifying the
prototype, runtime errors may result. Therefore, before you call a function, you
should declare its return value and its parameter types. The declaration must
appear lexically prior to the function call, and is sometimes placed in a header file.
C allows you to omit some parts of a function declaration, e.g. you may declare the
name and the return type, but not the argument types. This practice is, again, not
recommended. A full function declaration is called a function prototype:
// prototype
char *strcpy(char *dst, const char *src);
extern char *strcpy(char *dst, const char *src);
The general form of a function prototype is:
<return-type> <function-name> ( <parameter-list> );
The storage class extern is optional and does not provide additional meaning.
If the function does not accept any arguments, then the keyword void is used for
the parameter-list. Otherwise, the parameter-list is a list of parameter declarations
separated by commas. Each parameter declaration is in the form of a variable
declaration except that:
1. The variable name is optional
Example:
#include <stdarg.h>
#include <stdio.h>
void PrintInts(int nargs, ...)
{
va_list ap;
va_start(ap, nargs); // initialize ap with the last named argument
for (int i = 0; i < nargs; i++)
{
int n = va_arg(ap, int);
printf("next arg: %d\n", n);
}
va_end(ap);
}
int main(void)
{
PrintInts(4, 0, 1, 2, 3);
return 0;
}
would output:
next arg: 0
next arg: 1
next arg: 2
next arg: 3
Remember that for a variadic function, the arguments are promoted according to
the Type Promotion rules described earlier.
8. THE C PREPROCESSOR
The C preprocessor takes a source file and produces a set of C tokens, which
looks similar to the original source file. Given a source file, the C Preprocessor
performs the following tasks:
A line ending with a \ (backslash) is concatenated with the following line. This
is most useful with the #define directive, see below.
Comment text block sections enclosed in /* */ are replaced by a single space
(i.e.: they are “stripped” out of the code, as they have nothing to do with
running the program).
A single-line comment starting with // is replaced by a single space.
A line that starts with # is checked to see if it is a C preprocessing directive. If
so, the directive will be carried out.
A legal C token is checked to see if it is a macro invocation, AKA a macro
expansion. If so, the token is replaced by its expansion.
Each of these steps is described below. The C Preprocessor operations are
relatively simple, but misunderstanding its features can lead to errors. In
particular, an important point is that the C Preprocessor is a textual processor, and
it does not know the C language per se. We will see how that could give rise to
subtle program bugs.
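One classic example of such a bug comes from a macro whose definition lacks parentheses. The macro and function names in this sketch are hypothetical.

```c
#define DOUBLE_BAD(x)  x + x          /* no parentheses */
#define DOUBLE_GOOD(x) ((x) + (x))    /* fully parenthesized */

/* expands to: 3 * 5 + 5, which is 20 */
int double_bad_times_three(void)  { return 3 * DOUBLE_BAD(5); }

/* expands to: 3 * ((5) + (5)), which is 30 */
int double_good_times_three(void) { return 3 * DOUBLE_GOOD(5); }
```

The preprocessor pastes the text in without regard to operator precedence, so the multiplication in the first function binds to the first 5 only.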
Backslash \ Concatenation, or Line Continuation
C Preprocessor commands are known as directives. Directives start with the ‘#’
symbol and must be contained within a single line. However, sometimes it is more
convenient and less cluttered if the content of the directive can be placed on
separate lines.
To do this, if the last character in a line is a backslash \, it specifies that the line is
to be concatenated with the following line. For example, you can define a multiple-
line macro:
#define ADDMULT(x, a, b, c) \
(x) = (((a) + (b)) * (c))
With multiple lines, the definition visually looks quite similar to normal C code
instead of a single, possibly overly-long line.
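A short usage sketch follows, assuming the macro is meant to store (a + b) * c in x. The function addmult_demo is hypothetical.

```c
#define ADDMULT(x, a, b, c) \
    (x) = (((a) + (b)) * (c))

/* Hypothetical helper that uses the macro with non-trivial arguments. */
int addmult_demo(void)
{
    int result;

    /* expands to: (result) = (((1 + 2) + (3)) * (2 + 2)) */
    ADDMULT(result, 1 + 2, 3, 2 + 2);
    return result;
}
```

The backslash must be the very last character on the line; even a trailing space after it stops the lines from being joined.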
Comments
For readability and maintainability, it is useful to add comments to source code
explaining what a particular section of code does. A single line of
commentary in a source code file is prefaced by //, and does not require any
terminating character other than a newline; a large block of comment text that
spans multiple lines is enclosed in the symbol pair /* */.
Examples:
int i; // A single line of commentary in a C source code file.
/* I am multiple lines of commentary that go into depth to
explain something in the code so that it will be clearer
when another person wants to figure out what this section
of code is doing */
After seeing the initial /*, the C preprocessor considers the next */ it sees as the
ending delimiter for the comment block. In other words, nested comment blocks
are not supported.
List of Standard C Preprocessor Directives
[31]
Lines starting with the # character are interpreted as C preprocessor
directives. Compiler-specific extensions are usually specified using #pragma
directives.
Predefined Macros
2. Search for the file in the directory where the enclosing file is located as the
starting directory, and apply any relative path specification (e.g. “..” to reach
the parent directory). If the file is still not found, then
3. If the compiler allows the users to add additional include file paths, search
them in the order specified.
For JumpStart C, include path(s) are specified through the Project->Build Options
dialog box.
#define - Simple Macro Definitions
Use #define to define a simple substitution macro:
#define <name> <definition>
Examples:
#define PI 3.1415926
#define TABLE_LENGTH 10
Any whitespace between the macro name and the definition is ignored and is not
part of the definition, as is any whitespace after the last non-whitespace character
in the definition.
A macro definition is terminated by a newline. In particular, do not put a C
terminator semicolon ‘;’ at the end of a macro definition: a preprocessor macro
does not require it, and the semicolon would become part of the definition text.
A macro definition can be removed by using the #undef directive. The scope of
the macro runs from the place where it is defined until the end of the current
source file being processed, or until the Preprocessor encounters a #undef for
the named macro.
Unless placed in a header file that is included by multiple source files, a macro
definition in one source file is not visible to any other source files.
Simple Macro Invocation
Whenever a macro name is seen in the source file after the #define statement,
then its occurrence is replaced by the definition text. This is called a macro
invocation. A macro invocation may result in another macro name, which will then
itself be expanded. However, the C Preprocessor remembers the chain of macro
names being expanded, and will terminate the expansion if the same macro name
is invoked more than once (recursive definition).
Simple macro definition is useful to replace frequently used sequences. For
example, if you declare an array of 10 elements, it is best to use a macro name to
represent the array size, as it might appear multiple times in the code:
#define TABLE_LENGTH 10
int table[TABLE_LENGTH];
for (int i = 0; i < TABLE_LENGTH; i++)
…
Using a macro name allows the array size to be changed if specifications change,
without modifying all the places where the array size appears.
CAUTION: Macro Abuse!
BAD PROGRAMMING PRACTICES: Some C programmers like to define “more
user friendly” names to make it “easier to remember” some of the C operators; for
example, some common macros are:
#define AND &&
#define OR ||
//now they can write
if (a == b AND c != d OR a < d)
…
In our opinion, these macros do not serve any useful function: they make it harder
for someone else to understand your “dialect” of C, and they make it harder for you
to understand other people’s code.
Another abuse is to change a definition without changing the macro name. Most
of the time, doing this is not necessarily a bad practice. However, something like
this was actually found in production code:
#define FIVE 4
Using a macro like “FIVE” instead of simply writing “5” was probably a Bad Idea™
to begin with. It’s NOT the same as “#define NAME_LEN 5” where the name
provides some sort of clue on how it might be used. FIVE is just not informative.
The worst though is that the constant was obviously changed, perhaps to work
around An Issue[33], but the macro name remained unchanged. This
demonstrates a profound lack of consideration for code maintenance on the part
of the programmer.
#define: Function-Like Macros
Before the introduction of inline functions as a feature of the C language, using a
function-like macro was a way to simulate this feature:
#define <name>(<optional-args>) <definition>
Example:
#define max(a, b) (((a) > (b)) ? (a) : (b))
A function-like macro differs from a simple macro in that a list of macro arguments
separated by commas and enclosed in a set of parentheses ( ) follows the
macro name. Unlike a C function declaration, the argument list is just a list of
names, without any type declarations.
Function-Like Macro Invocation
After the function-like macro #define statement, if the macro name is seen in the
source file, and if it is followed by a set of ( ), then its occurrence including the
arguments (at the invocation point, they are known as actual arguments) is
replaced by the definition text. The number of actual arguments must match the
number of arguments in the macro definition.
During the substitution phase, any occurrence of an argument in the definition text
is replaced by the actual argument. For example, given the max(a, b) macro
defined in the previous example:
int x, y, z;
x = max(y, z);
is expanded as:
x = (((y) > (z)) ? (y) : (z));
BEST PRACTICE: It must be emphasized, again[34], that the preprocessor is a
textual processor, i.e. the definition text replaces the original text without
considering the validity of the definition text as C code. This is why in the definition
text, you should always enclose any reference to an argument in a set of ( ), as
shown in the example above. It is also common to enclose the entire definition (if
it is an expression) in a set of ( ). This will prevent any unintended effects after
expansion.
For example, given a definition of the “mult” macro as below:
#define mult(a, b) a * b
int x, y, z;
x = mult(y + z, y);
the macro expands to:
x = y + z * y;
According to C precedence rules, this expression is the same as:
x = y + (z * y);
even though the programmer probably meant the expression to be interpreted as:
x = (y + z) * y;
To fix this, simply enclose arguments in the macro definition with sets of
parentheses, and the desired effect will result:
#define mult(a, b) ((a) * (b))
Function-Like Macros vs. Inline Functions
With the introduction of inline functions, function-like macros should only be used
if special features (e.g. tokenization or stringizing, explained in later sections) are
needed, or if the macro definition is short. Otherwise, inline functions should be
used.
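One practical difference can be sketched as follows. A function-like macro pastes the argument text wherever the parameter appears, so an argument with a side effect may be evaluated more than once, while an inline function evaluates each argument exactly once. This is a minimal sketch for a C99 (or later) compiler; the names MAX_MACRO, max_inline, next_value, calls_via_macro and calls_via_inline are made up for this illustration.

```c
#define MAX_MACRO(a, b) (((a) > (b)) ? (a) : (b))

/* Inline-function counterpart: each argument is evaluated once. */
static inline int max_inline(int a, int b) {
    return (a > b) ? a : b;
}

static int call_count = 0;

/* Hypothetical helper with a side effect: counts how often it is called. */
static int next_value(void) {
    call_count++;
    return 10;
}

/* Returns how many times next_value() ran when passed through the macro. */
static int calls_via_macro(void) {
    call_count = 0;
    /* Expands to (((next_value()) > (5)) ? (next_value()) : (5)) */
    (void)MAX_MACRO(next_value(), 5);
    return call_count;
}

/* Returns how many times next_value() ran when passed to the function. */
static int calls_via_inline(void) {
    call_count = 0;
    (void)max_inline(next_value(), 5);
    return call_count;
}
```

Here calls_via_macro() reports two calls and calls_via_inline() reports one, which is one more reason to prefer inline functions when the arguments may have side effects.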
Variadic Macros
Just like a C function, a C macro may specify a variadic argument list:
#define mac1(a, ...) foo(a+1, __VA_ARGS__)
#define mac2(...) bar(__VA_ARGS__)
mac1(1, 2, 3) // expands to foo(1+1, 2, 3)
mac1(a, b)    // expands to foo(a+1, b)
mac2(1, 2, 3) // expands to bar(1, 2, 3)
mac2(1)       // expands to bar(1)
Variadic macros allow a variable number of actual arguments to appear when the
macro argument is specified as “...” (three dots). In the macro definition, the
special word __VA_ARGS__ is replaced with the variadic actual arguments.
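As a small illustration, a variadic macro is often used to wrap a printf-style function. The sketch below assumes a hosted C99 compiler with snprintf available; FORMAT_INTO and format_demo are hypothetical names, not part of any library.

```c
#include <stdio.h>

/* Hypothetical macro: forwards a format string plus any further arguments.
   buf must be an array (not a pointer) so that sizeof(buf) is its capacity. */
#define FORMAT_INTO(buf, ...) snprintf((buf), sizeof(buf), __VA_ARGS__)

/* Formats "count=3" into a local array, copies it to out, and returns the
   number of characters snprintf reported for the formatted text. */
static int format_demo(char *out, size_t out_size) {
    char line[32];
    int n = FORMAT_INTO(line, "%s=%d", "count", 3);
    snprintf(out, out_size, "%s", line);
    return n;
}
```

The invocation FORMAT_INTO(line, "%s=%d", "count", 3) expands to snprintf((line), sizeof(line), "%s=%d", "count", 3): everything after the first argument takes the place of __VA_ARGS__.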
ADVANCED TOPIC: Tokenization
In a macro definition, a new token is created if two tokens are separated by the
character sequence ##. For example, writing a##b is the same as if you have
written the token ab. This is called tokenization or token pasting:
#define make_add(type) \
type add_##type(type a, type b) { \
return (a) + (b); \
}
make_add(float)
make_add(int)
The macro invocations expand to two function definitions:
float add_float(float a, float b) { return (a) + (b); }
int add_int(int a, int b) { return (a) + (b); }
Tokenization is useful in creating multiple functions from a basic template, as
shown above.
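The same technique can be sketched with a different template. The macro name MAKE_MAX and the generated functions max_int and max_double are illustrative only. Note that the type must be a single token for the pasting to work, so a type such as unsigned long would first need a typedef name.

```c
/* Generates a function named max_<type> for the given single-token type. */
#define MAKE_MAX(type)                          \
    static type max_##type(type a, type b) {    \
        return (a > b) ? a : b;                 \
    }

MAKE_MAX(int)      /* defines: static int max_int(int a, int b)             */
MAKE_MAX(double)   /* defines: static double max_double(double a, double b) */

/* Example use of the generated functions. */
static int larger_int(void)       { return max_int(2, 7); }
static double larger_double(void) { return max_double(1.5, -2.0); }
```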
ADVANCED TOPIC: Stringizing
In the definition of a function-like macro, if a macro argument is prefixed with a
single # character, then a literal string is created with the actual argument at
invocation:
#define make_string(a) #a
#define concat(a, b) #a " " #b
make_string(hello world) // expands to "hello world"
concat(hello, world)     // expands to "hello" " " "world",
                         // which the compiler merges into "hello world"
In the “concat” macro, the definition also takes advantage of the C feature that if
two literal strings appear adjacent to each other, they will then be merged into a
single string.
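A common use of stringizing is to print an expression together with its value. The following is a minimal sketch, assuming a hosted compiler with snprintf; STRINGIZE, DESCRIBE_INT and describe_demo are hypothetical names chosen for this example.

```c
#include <stdio.h>

#define STRINGIZE(x) #x

/* Writes "<expression text> = <value>" into the array buf.
   #expr becomes a string literal; (expr) is evaluated as an int. */
#define DESCRIBE_INT(buf, expr) \
    snprintf((buf), sizeof(buf), "%s = %d", #expr, (expr))

/* Describes the expression a + b and copies the text to out. */
static void describe_demo(char *out, size_t out_size) {
    char line[32];
    int a = 2, b = 3;
    DESCRIBE_INT(line, a + b);       /* line now holds "a + b = 5" */
    snprintf(out, out_size, "%s", line);
}
```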
#undef - Undefine a Macro Name
#undef directs the C Preprocessor to “forget” the macro definition of the macro
name. It is not considered an error to #undef a name that has not yet been
defined. #undef un-defines both simple and function-like macros.
#undef <name>
Example:
#undef max
Conditional Processing
These directives allow conditional processing of the source file. This must start
with an “if test directive”, and the conditional group of lines is terminated by
#endif at the other end. Before the ending #endif, there might optionally be an
“else directive” line group.
An “if test directive” is one of #if, #ifdef (“if defined”), or #ifndef (“if not
defined”). For readability, on this page, #if is used to represent any of these
directives.
An “else directive” is either #else or #elif (“else if”). For readability, on this
page, #else is used to mean either of these directives.
A conditional processing group is as follows:
#if <condition>
<a line group that will only be processed by the
Preprocessor if the condition passes>
#else
<an (optional!) line group that will only be processed
if the condition fails>
#endif
Conditional processing can be nested, and each #endif is always paired with the
nearest preceding unmatched #if directive. Indentation does not matter in the
actual source code; the indentation in the examples only shows the groupings:
#if <condition>
    <stuff>
    #if <condition>
        <…stuff…>
    #else
        <…some different other stuff…>
    #endif
#else
    <stuff>
#endif
Conditional Test with #if <expr>
If <expr> is evaluated to nonzero, then the line group following until the
corresponding #else or #endif is processed by the Preprocessor. Otherwise,
the entire line group, including any nested conditional group, is skipped.
When processing the <expr>, the following actions are performed:
Given a defined(<name>) or defined <name>:
If there is a macro definition for “<name>”, then it is replaced with the value 1.
Otherwise, it is replaced with the value 0.
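Given a <name>:
If <name> is a macro, it is replaced by its expansion, and the result is
expanded again if needed. For example:
#define A B
#define B 1
// So:
#if A → expands to #if B
// And then:
#if B → expands to #if 1
Any name that is still not a macro after expansion is replaced with the value 0.
If there are any C-style operators, they are evaluated using C rules.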
For example:
// These two are usually defined externally in the IDE, or as
// predefined macros
#define __ICC_VERSION 81020
#define DEBUG 1
// In the source file:
#if __ICC_VERSION >= 81000 && defined(DEBUG)
// expands to:
#if 81020 >= 81000 && 1
// which evaluates to:
#if 1 && 1
// which evaluates to:
#if 1
Notes:
The sizeof operator, type casts, and floating-point constants are not allowed in
the <expr>; e.g. you cannot write:
#define PI 3.1415926
#if (int)PI >= 3
Conditional Test with #ifdef / #ifndef <name>
#ifdef <name> is the same as writing #if defined(<name>)
#ifndef <name> is the same as writing #if !defined(<name>)
Conditional Test with #else / #elif <expr>
#else starts the alternate line group of a conditional group. #elif <expr> is
shorthand for writing #else followed by a nested #if/#endif directive pair:
#elif <expression>
    <stuff>
#endif
This is the same as writing
#else
    #if <expression>
        <stuff>
    #endif
#endif
Conditional Test End Marker #endif
#endif ends the conditional group.
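The conditional directives described in this section can be combined as in the following sketch. All macro names here (TARGET_BOARD, DEMO_DEBUG, QUIET, LED_COUNT, LOG_LEVEL, BUFFER_SIZE) are hypothetical; in a real project the first two would typically be set in the IDE or on the compiler command line rather than in the source file.

```c
/* Hypothetical configuration macros, defined here only for the example. */
#define TARGET_BOARD 2
#define DEMO_DEBUG 1

/* #if / #elif / #else: exactly one line group is processed. */
#if TARGET_BOARD == 1
    #define LED_COUNT 4
#elif TARGET_BOARD == 2
    #define LED_COUNT 8
#else
    #error "Unknown TARGET_BOARD"
#endif

/* defined() combined with C-style operators. QUIET is not defined here,
   so !defined(QUIET) evaluates to 1. */
#if defined(DEMO_DEBUG) && !defined(QUIET)
    #define LOG_LEVEL 2
#else
    #define LOG_LEVEL 0
#endif

/* #ifndef: supply a default only if no value was given elsewhere. */
#ifndef BUFFER_SIZE
    #define BUFFER_SIZE 64
#endif

static char buffer[BUFFER_SIZE];

/* Returns the capacity of the buffer selected at preprocessing time. */
static unsigned buffer_capacity(void) { return (unsigned)sizeof buffer; }
```

With the definitions shown, LED_COUNT becomes 8, LOG_LEVEL becomes 2, and BUFFER_SIZE takes its default of 64.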
#warning <message> - Output a Warning Message
#warning <message> causes the C Preprocessor to output the <message> as
a warning. For example:
#if __ICC_VERSION < 81000
#warning “Out-of-date compiler is being used”
#endif
When processing this code fragment, if the conditional succeeds (evaluates to 1),
then the C Preprocessor outputs the message.
#error <message> - Output an Error Message
#error <message> causes the C Preprocessor to output the <message> as an
error diagnostic. For example:
#if __ICC_VERSION < 81000
#error “Out-of-date compiler is being used”
#endif
When processing this code fragment, if the conditional succeeds (evaluates to 1),
then the C Preprocessor outputs the message.
Unlike with a “mere warning”, the C Preprocessor may then terminate and return
with an error status.
#once - Include a File Once Only
#once, a compiler-specific directive that is not part of Standard C, directs the
C Preprocessor not to process this file again. This can be
placed anywhere in the included file. This is usually used in a header file so it will
not be processed more than once, which could cause problems, depending on the
content of the include file. Without this directive, the portable idiom used to
prevent this behavior (found in much existing code) is:
// assuming the include file is named this_header.h
#ifndef THIS_HEADER_H
#define THIS_HEADER_H
…
// actual content of the file
#endif
The macro name “THIS_HEADER_H” should reflect the actual filename of the file,
to prevent name collision with other header files.
Using #once simplifies the process.
ADVANCED TOPIC: #pragma - Compiler Specific Extensions
#pragma (derived from the word “pragmatic”) provides a method for a C compiler
to define product-specific extensions. As such, there are no generic #pragma
descriptions. If you are not using JumpStart C, please refer to your compiler
manual for the specific list of supported pragmas.
List of (Incomplete) JumpStart C Pragmas
#pragma ignore_unused_var <name1> <name2> …
Normally, the JumpStart C compilers issue a warning if a local variable is not
referenced. This pragma prevents the compiler from issuing such diagnostics
on the listed name(s).
#pragma warn <message>
Same as #warning <message>
#pragma once
Same as #once
#pragma interrupt_handler <func1>:<vec1> <func2>:<vec2> …
Specific to JumpStart C for Atmel AVR, this directs the compiler to associate
function names as interrupt handlers, since interrupt handlers require different
entry and exit code than normal C functions. This pragma MUST precede the
definitions of those functions.
setjmp.h transfer program control bypassing normal function calls and returns
The printing is done using stdio output mechanisms (see stdio in a later
section.) The message is generally in the following form (there might be
variations depending on the compiler):
An equivalent set exists for float data type:
limits.h - Properties of Integer Type Representations
The following macros are defined. Note that there is no <unsigned type>_MIN, as
that is always zero.
math.h - Floating-Point Math Functions
The following floating-point math routines are supported in this header file.
float asinf(float x)
returns the arcsine of x, in radians.
float acosf(float x)
returns the arccosine of x, in radians.
float atanf(float x)
returns the arctangent of x, in radians.
float atan2f(float y, float x)
returns the angle whose tangent is y/x, in the range [-pi, +pi] radians.
float ceilf(float x)
returns the smallest integer not less than x.
float cosf(float x)
returns the cosine of x for x in radians.
float coshf(float x)
returns the hyperbolic cosine of x.
float expf(float x)
returns e to the x power.
float exp10f(float x)
returns 10 to the x power.
float fabsf(float x)
returns the absolute value of x.
float floorf(float x)
returns the largest integer not greater than x.
float fmodf(float x, float y)
returns the remainder of x/y.
float frexpf(float x, int *pexp)
splits x into a fraction f and a base-2 exponent. It returns f and stores the
exponent into *pexp, so that x equals f * 2**(*pexp). The magnitude of f is in
the interval [1/2, 1), or f is zero if x is zero.
float froundf(float x)
rounds x to the nearest integer.
float ldexpf(float x, int exp)
returns x * 2**exp.
float logf(float x)
returns the natural logarithm of x.
float log10f(float x)
returns the base-10 logarithm of x.
float modff(float x, float *pint)
returns a fraction f and stores an integer into *pint such that f + (*pint)
equals x. abs(f) is in the interval [0, 1) and both f and *pint have the same sign
as x.
float powf(float x, float y)
returns x raised to the power y.
float sqrtf(float x)
returns the square root of x.
float sinf(float x)
returns the sine of x for x in radians.
float sinhf(float x)
returns the hyperbolic sine of x.
float tanf(float x)
returns the tangent of x for x in radians.
float tanhf(float x)
returns the hyperbolic tangent of x.
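The decomposition functions are easier to follow with concrete numbers. The sketch below uses only frexpf, ldexpf and modff as declared in a C99 math.h; the wrapper names split_and_rebuild and fractional_part are made up for this example, and the values were chosen because they are exactly representable in binary floating point.

```c
#include <math.h>

/* Splits x into a fraction and a base-2 exponent with frexpf, then
   rebuilds the original value with ldexpf. */
static float split_and_rebuild(float x, float *fraction, int *exponent) {
    *fraction = frexpf(x, exponent);       /* x == fraction * 2**exponent */
    return ldexpf(*fraction, *exponent);   /* fraction * 2**exponent      */
}

/* Splits x into an integer part (stored via pointer) and a fractional
   part (returned) with modff. */
static float fractional_part(float x, float *integer_part) {
    return modff(x, integer_part);
}
```

For example, split_and_rebuild(8.0f, &f, &e) sets f to 0.5 and e to 4 (since 8 = 0.5 * 2**4) and returns 8.0, while fractional_part(3.75f, &ip) returns 0.75 and sets ip to 3.0.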
setjmp.h - Transfer Program Control
These functions transfer program control bypassing the normal function calls and
returns.
jmp_buf
a typedef type, for declaring a variable to be used for storing the execution
context. The actual type is compiler and target device dependent. For
example:
jmp_buf env;
int setjmp(jmp_buf env)
stores the current execution context into “env”, and immediately returns zero.
This function may “return” again via a longjmp call, but in that case the return
value is guaranteed to be nonzero. setjmp’s function is basically just to “set up”
a location in the code for a later longjmp (described below) to be able to
“return” to during runtime.
Setjmp example:
if (setjmp(env) == 0)
<normal processing>
void longjmp(jmp_buf env, int val)
causes execution to “jump” back to the location of the setjmp call, as if the
setjmp call had just returned (but this time with a non-zero return value).
longjmp should never be called before the initial setjmp call has actually been
made, because this will almost certainly result in disaster. The longjmp call
must be made by a function in a runtime call chain originating inside the
function which called setjmp.
If “val” is nonzero, it is used as the return value from setjmp. Otherwise, the
value 1 is used as the return value.
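The two functions are typically used together as a simple error-recovery mechanism. The following is a minimal sketch, not a complete error-handling scheme; recovery_point, last_error, process and run are hypothetical names. The setjmp call is used directly as the condition of the if statement, as in the example above, and the error code is passed back through a separate variable.

```c
#include <setjmp.h>

static jmp_buf recovery_point;   /* execution context saved by setjmp   */
static int last_error;           /* error code recorded before the jump */

/* Hypothetical worker: abandons processing via longjmp on an error. */
static void process(int error_code) {
    if (error_code != 0) {
        last_error = error_code;
        longjmp(recovery_point, 1);   /* setjmp "returns" again, with 1 */
    }
    /* normal processing would continue here */
}

/* Returns 0 if process() completed, or the recorded error code if
   process() bailed out through longjmp. */
static int run(int error_code) {
    if (setjmp(recovery_point) == 0) {
        process(error_code);          /* may not return normally */
        return 0;
    }
    return last_error;                /* reached only via longjmp */
}
```

Note that longjmp is called from process(), which is called from run(), the function that called setjmp, so the call chain requirement described above is met.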
struct s {
int x;
int y;
int z;
};
… = offsetof(struct s, y);
ptrdiff_t
a signed integer type that can hold the difference between two pointer values.
Example:
ptrdiff_t diff;
int *x, *y;
…
diff = x - y;
size_t
an unsigned integer type that can hold the result of a sizeof operator. It is
also the return type of certain string.h functions, e.g. strlen. Example:
size_t size;
…
size = sizeof (int);
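The stddef.h types and the offsetof macro can be exercised together as in this sketch. The helper names elements_between and offset_of_y are illustrative only. The exact value of offsetof(struct s, y) depends on the compiler and target, so the sketch relies only on the first member being at offset zero.

```c
#include <stddef.h>

struct s {
    int x;
    int y;
    int z;
};

/* Number of elements between two pointers into the same array.
   The difference of two pointers has type ptrdiff_t. */
static ptrdiff_t elements_between(const int *first, const int *last) {
    return last - first;
}

/* Byte offset of member y inside struct s; offsetof yields a size_t. */
static size_t offset_of_y(void) {
    return offsetof(struct s, y);
}
```

Given int a[5], elements_between(&a[1], &a[4]) is 3: the result counts elements, not bytes.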
stdio.h - Standard Input / Output (I/O) Functions
Standard I/O functions read and write to “standard Input Output” channels. On a
Unix/Linux or a Windows target machine, these I/O channels might be the
keyboard input and the terminal output of the command prompt or shell window.
These machines would also support file systems, and these functions allow a
portable method to read and write to files in the file systems.
For embedded system targets, there is typically no file system, but nevertheless,
some I/O functions are useful and are usually supported.
This section describes the subset of Standard C Library functions supported by
the JumpStart C compilers. Other embedded system compilers might support
different functions or different options.
NOTE: You will need to initialize the input and output ports. The lowest level of I/O
routines consists of the single-character input (getchar) and output (putchar)
routines, which must be implemented specific to each target device. JumpStart C
provides example implementations for various devices, and for most cases you
can just copy the correct example file to your project.
Once you have implemented the low-level functions, you do not need to make
modifications to the high-level standard I/O functions such as printf, sprintf, scanf,
etc.
int getchar(void)
returns a character from the input channel. You must implement this function
(or copy example code to implement it), as it is device-specific.
char *gets(char *buf)
reads a line of input. You must pass a buffer “buf” into the function. The
function uses the getchar() function to store input characters into “buf” until a
newline character is read and copied. “gets” then stores a NUL character to
terminate the string and returns the address that was originally passed.
The width is either a decimal integer or a ‘*’ (star), indicating that the value is
taken from the next argument. The width specifies the minimal number of
characters that will be printed, which will be right or left aligned, and padded
with either spaces or zeros depending on the flag characters.
The precision is preceded by a ‘.’ and is either a decimal integer or ‘*’, denoting
that the value is to be taken from the next argument. The precision specifies
the minimal number of digits for an integer conversion, the maximum number
of characters for the s-string conversion, or the number of digits after the
decimal point for the floating-point conversions.
The next page contains examples of printf and its formatting code.
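The effect of the width and precision fields can be seen in the short sketch below. It assumes a hosted compiler whose snprintf supports the standard conversions, including floating point (an embedded library may need an option enabled for that). The structure format_samples and the function fill_samples are hypothetical names; brackets are added to the output so the padding is visible.

```c
#include <stdio.h>

struct format_samples {
    char right[16], left[16], zero[16], star[16];
    char prec_int[16], prec_str[16], prec_flt[16];
};

/* Fills each buffer with one formatted sample. */
static void fill_samples(struct format_samples *s) {
    /* Width 5: padded with spaces, right aligned by default. */
    snprintf(s->right, sizeof s->right, "[%5d]", 42);        /* "[   42]"  */
    /* The '-' flag left-aligns within the width. */
    snprintf(s->left, sizeof s->left, "[%-5d]", 42);         /* "[42   ]"  */
    /* The '0' flag pads with zeros instead of spaces. */
    snprintf(s->zero, sizeof s->zero, "[%05d]", 42);         /* "[00042]"  */
    /* '*' takes the width (6) from the next argument. */
    snprintf(s->star, sizeof s->star, "[%*d]", 6, 42);       /* "[    42]" */
    /* Precision on an integer: minimal number of digits. */
    snprintf(s->prec_int, sizeof s->prec_int, "[%.3d]", 7);  /* "[007]"    */
    /* Precision on a string: maximum number of characters. */
    snprintf(s->prec_str, sizeof s->prec_str, "[%.3s]", "abcdef"); /* "[abc]" */
    /* Precision on floating point: digits after the decimal point. */
    snprintf(s->prec_flt, sizeof s->prec_flt, "[%.2f]", 3.14159);  /* "[3.14]" */
}
```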
int putchar(char c)
prints out a single character. You must implement this function (or copy
example code to implement it), as it is target-specific.
int puts(char *s)
prints out a string followed by a newline.
int scanf(char *fmt, …)
reads the input according to the format string “fmt”. The function getchar() is
used to read the input. Therefore, if you override the function getchar(), you
can use this function to read from any device you choose.
Non-whitespace characters in the format string must match exactly with
the input, and whitespace characters are matched with the longest sequence
(including an empty sequence) of whitespace characters in the input. A %
character in the format string introduces a conversion specifier.
[l] long modifier. This optional modifier specifies that the matching argument
is of the type pointer-to-long.
Both ftoa and dtoa perform the same task, except that ftoa works with 32-bit
floating-point values, while dtoa works with 64-bit floating-point values.
Both “ftoa” and “dtoa” have two versions, selected by using an option in the
CodeBlocks IDE options dialog box. The default version of each function is
smaller and faster, but does not support the full range of the floating point
inputs (numbers that are too small or too large).
In the default version of these functions, if the input is out of range, “*status” is
set to the constant _FTOA_TOO_LARGE or _FTOA_TOO_SMALL (defined in
stdlib.h) and zero is returned. Otherwise, “*status” is set to zero and the buffer
is returned.
If you encounter these error results, you can enable the larger (and slower)
version which can handle all valid ranges by enabling the option in the dialog
box.
As with most other C functions with similar prototypes, “*status” means that
you must pass the address of a variable to this function. Do not declare a
pointer variable and pass it without initializing its pointer value.
void itoa(int value, char *buf, int base)
converts a signed integer value to an ASCII string, using “base” as the radix.
“base” can be an integer from 2 to 36.
long labs(long i)
returns the absolute value of “i”.
void ltoa(long value, char *buf, int base)
converts a long value to an ASCII string, using “base” as the radix.
void *malloc(size_t size)
allocates a memory chunk of “size” bytes from the heap. It returns a null
pointer (zero) if it cannot honor the request.
void _NewHeap(void *start, void *end)
This is a JumpStart C specific function that initializes the heap for memory
allocation routines. A typical call uses the address of the symbol _bss_end+1
as the “start” value. The symbol _bss_end defines the end of the bss segment
(see Appendix D, C Compilers and the Runtime Environment). Example:
The content of the previous allocated memory is copied to the new space.
void srand(unsigned seed)
initializes the seed value for subsequent rand() calls.
long strtol(char *s, char **endptr, int base)
converts the initial characters in “s” to a long integer according to the base. If
“base” is 0, then strtol chooses the base depending on the initial characters
(after the optional minus sign, if any) in “s”: 0x or 0X indicates a hexadecimal
integer, 0 indicates an octal integer, with a decimal integer assumed otherwise.
If “endptr” is not NULL, then “*endptr” will be set to where the conversion ends
in “s”.
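The base detection and the “endptr” argument can be sketched as follows, using the standard strtol from stdlib.h. The wrapper name parse_auto and its “consumed” parameter are made up for this example.

```c
#include <stdlib.h>

/* Parses text with automatic base detection (base 0) and stores the
   number of characters that were consumed in *consumed. */
static long parse_auto(const char *text, int *consumed) {
    char *end;
    long value = strtol(text, &end, 0);   /* end: where conversion stopped */
    *consumed = (int)(end - text);
    return value;
}
```

With this wrapper, "0x1F" yields 31 (hexadecimal, 4 characters consumed), "017" yields 15 (octal), and "42abc" yields 42 with only 2 characters consumed, because the conversion stops at the first character that is not part of the number. If no digits can be converted at all, zero is returned and nothing is consumed.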
unsigned long strtoul(char *s, char **endptr, int base)
is similar to “strtol” except that the return type is unsigned long.
void utoa(unsigned value, char *buf, int base)
same as itoa except that the argument is taken as unsigned int.
void ultoa(unsigned long value, char *buf, int base)
same as “ltoa” except that the argument is taken as unsigned long.
string.h - String Functions
The following string functions and macros are declared in string.h:
(There is a related function, “strncpy” (see below), which is used to copy only a
portion of the “s2” string.)
size_t strcspn(const char *s1, const char *s2)
searches for the first element in “s1” that matches any of the elements in “s2”.
The terminating nulls are considered part of the strings. It returns the index in
“s1” where the match is found.
size_t strlen(const char *s)
returns the length of “s”. The terminating null is not counted.
char *strncat(char *s1, const char *s2, size_t n)
concatenates up to n elements, not including the terminating null, of “s2” into
“s1”. It then copies a null character onto the end of “s1”. It returns “s1”.
int strncmp(const char *s1, const char *s2, size_t n)
is the same as the “strcmp” function except it compares at most “n” characters.
char *strncpy(char *s1, const char *s2, size_t n)
is the same as the strcpy function except it copies at most “n” characters. If
“s2” has “n” or more characters, no terminating null is written to “s1”.
char *strpbrk(const char *s1, const char *s2)
does the same search as the “strcspn” function but returns the pointer to the
matching element in “s1” if the element is not the terminating null. Otherwise, it
returns a null pointer.
char *strrchr(const char *s, int c)
searches for the last occurrence of “c” in “s” and returns a pointer to it. It
returns a null pointer if no match is found.
size_t strspn(const char *s1, const char *s2)
searches for the first element in “s1” that does not match any of the elements
in “s2”. The terminating null of “s2” is considered part of “s2”. It returns the
index where the condition is true.
char *strstr(const char *s1, const char *s2)
finds the substring of “s1” that matches “s2”. It returns the address of the
substring in “s1” if found and a null pointer otherwise.
char *strtok(char *s1, const char *delim)
splits “s1” into tokens. Each token is separated by any of the characters in
“delim”. You specify the source string “s1” in the first call to “strtok”.
Subsequent calls to “strtok” with “s1” set to NULL will return the next token
until no more tokens are found, and “strtok” then returns NULL.
p = strtok(str, delim);
// p now points to “Hello”
printf(“%d: %s\n”, i++, p);
Prints out:
1: Hello
2: world
3: I
4: am
5: Alive!
6: May
7: be
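The calling pattern (the string on the first call, NULL on later calls) can be sketched as a complete loop. The function name split_tokens and the sample text used below are assumptions for this illustration, not the original example. Note that strtok modifies the source string, so it must be a writable array and not a string literal.

```c
#include <string.h>

/* Splits str in place and stores up to max_tokens token pointers in
   tokens[]. Returns the number of tokens stored. */
static int split_tokens(char *str, const char *delim,
                        char *tokens[], int max_tokens) {
    int count = 0;
    char *p = strtok(str, delim);        /* first call: pass the string */
    while (p != NULL && count < max_tokens) {
        tokens[count++] = p;
        p = strtok(NULL, delim);         /* later calls: pass NULL */
    }
    return count;
}
```

For example, with char text[] = "Hello world, I am Alive!" and the delimiters " ," (space and comma), the function finds five tokens: Hello, world, I, am and Alive!.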
time.h - Time Manipulation Functions
As embedded systems do not generally have an underlying OS that provides the
system time, low-level functions must be written in order to access the target
system’s RTC (Real Time Clock) or MCU timers to acquire the rudimentary time
data.
There are two sets of functions: Clock Functions and Time Functions.
time.h - Clock Functions
clock_t
a typedef type. An arithmetic type that can hold the values returned by the
function “clock”. Typically an int.
clock_t clock(void)
this returns the number of clock ticks that have occurred since the program
started, or -1 if the information is not available.
CLOCKS_PER_SEC
this is a macro defining the number of clock ticks per second.
time.h - Time Functions
The time functions operate on two defined types:
time_t
a typedef type. This is an arithmetic type that can hold the values returned by
the function “time”. Typically int or long.
struct tm
a structure that holds the time information. Standard C names the fields as
follows, although they are not necessarily declared in the order shown:
struct tm {
    int tm_sec;   // seconds after the minute
    int tm_min;   // minutes after the hour
    int tm_hour;  // hour of the day (from 0)
    int tm_mday;  // day of the month (from 1)
    int tm_mon;   // month of the year (from 0)
    int tm_year;  // years since 1900
    int tm_wday;  // days since Sunday (from 0)
    int tm_yday;  // day of the year (from 0)
    int tm_isdst; // is Daylight Saving Time in effect
};
The lowest level function is:
time_t time(time_t *tod)
returns the current calendar time. If “tod” (time of day) is not NULL, then the
current time is written to “*tod” as well.
For an embedded target, you might have to implement this function to read
from a RTC or the target device timer, and convert the native value into the
time_t format.
time_t Functions:
char *ctime(time_t *cal)
converts the calendar time in “*cal” to an ASCII representation. It returns a
static buffer and is equivalent to calling “asctime(localtime(cal))” (see below).
double difftime(time_t t1, time_t t2)
returns the difference of two time values in number of seconds.
struct tm *gmtime(const time_t *tod)
converts a calendar time to a “struct tm“ variable, in GMT (Greenwich Mean
Time, now more commonly known as UTC, Coordinated Universal Time). It
returns the address of a static struct holding the converted values.
struct tm *localtime(const time_t *tod)
converts a calendar time to a “struct tm“ variable, in local time. It returns the
address of a static struct holding the converted values.
Note: both gmtime and localtime use the same static “struct tm” variable.
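Because gmtime and localtime share one static “struct tm”, a common precaution is to copy the result before making another call. The sketch below makes two assumptions that hold on typical hosted systems but are not guaranteed by Standard C: that time_t counts seconds, and that the value 0 corresponds to 1 January 1970 UTC. It uses the Standard C field names with the tm_ prefix (tm_year, tm_mon, and so on). The function names seconds_between and utc_fields are hypothetical.

```c
#include <stddef.h>
#include <time.h>

/* Seconds elapsed from start to end. */
static double seconds_between(time_t start, time_t end) {
    return difftime(end, start);
}

/* Returns a private copy of the broken-down UTC time, so that a later
   gmtime or localtime call cannot overwrite it. All fields are zero if
   the conversion fails. */
static struct tm utc_fields(time_t t) {
    struct tm copy = {0};
    struct tm *shared = gmtime(&t);   /* points at the shared static struct */
    if (shared != NULL)
        copy = *shared;
    return copy;
}
```

Under the stated assumptions, utc_fields((time_t)0) yields tm_year 70 (years since 1900), tm_mon 0 (January) and tm_mday 1.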
“struct tm“ Functions:
char *asctime(struct tm *tptr)
converts the time value to an ASCII representation of 26 characters. The
format is
where:
www 3 character weekday, e.g. Mon
mmm 3 character month, e.g. Jan
dd 2 character day, e.g. 02
hh 2 character hour
mm 2 character minutes
ss 2 character seconds
yyyy 4 character year
char hello[6];
char *ptr;
ptr = hello; // OK
An array object has allocated storage. The name of the array object is the
starting address of the array storage.
char hello[6];
char *qptr; // uninitialized variable
qptr = hello;
*qptr = 1; // OK
C does not check the validity of an address. Reading and writing from an
invalid address is not checked by the compiler, and may cause runtime
problems at execution points far away from where the invalid address is used.
These are known as memory overwrite problems, or the “C gives you enough rope
to hang yourself” axiom.
Even if an address is valid, C does not check the size of the storage object it is
accessing. If you use strcpy or other functions to write 12 characters into a 10
character array, C is OK with that! Your program, however, won’t be.
char hello[6];
void *, or a “pointer to void”, points to an incomplete type and is the only
pointer type that can be converted to and from any other object pointer type.
Memory Aliasing: Pointer Access
The simplest use of a pointer variable is to assign the address of another variable
to it, so that the value of the pointed-to variable can be accessed through the
pointer variable. Being able to access the same memory location using different
objects is known as memory aliasing:
int i, *p;
p = &i;
*p = …; // write to “i”
… = *p; // read “i”
Pointer Arithmetic
One of the unique innovations of the C pointer, as compared to pointer
implementations in other programming languages at the time (early 1970s) is that
when you add an integer to a pointer, the integer is scaled by the number of bytes
of the pointed-to type. For example,
short *p;
int n;
Assuming “p” contains the value (presumably a valid address) 0x1000, and a
short is 2 bytes wide, and “n” has the value 3, then the expression “p + n” has the
value:
p + n = 0x1000 + (n * 2)
      = 0x1000 + (3 * 2)
      = 0x1000 + 6
      = 0x1006
This is also the address needed to access an “array of shorts”. For example:
short array[4];
Assume “array” starts at 0x1000. The addresses of the elements of “array” are:
address of array[0] 0x1000
address of array[1] 0x1002
address of array[2] 0x1004
address of array[3] 0x1006
As you can see, &array[3] (address of array[3]) is the same as (p + 3). Therefore,
by assigning the address of an array of type X to a pointer to type X, any array
element can be accessed by treating the pointer as if it is an array. Given:
p = &array[0];
// or alternately written as
p = array;
Then these are equivalent ways to access element i:
array[i]
p[i]
*(p + i)
*(array + i)
Using Pointers to Access Arrays
As mentioned, the type “array of type X” is type-compatible with “pointer to type
X”. A concise method to traverse an array can be written using pointers. For
example, the following sums (adds up the values of) the contents of an array
using array indexing:
#define SIZE 10
int table[SIZE];
int sum = 0;
for (int i = 0; i < SIZE; i++)
sum += table[i];
Here is the same behavior written using pointers with the same variable
declarations as above:
int *p = table;
for (int i = 0; i < SIZE; i++)
sum += *p++;
The primary advantage of using pointers in this example is the efficiency of using
the expression *p++, which can be compiled to a single instruction depending on
the target machine, whereas an indexing operation table[i] would take two to
four instructions. (See next section.)
Efficiency of Using Pointers
One idiom frequently used in writing pointer code is this expression:
*p++;
This evaluates to the value pointed to by “p”, and then increments “p” to point to
the next element. For example, the core part of the standard library function
strcpy() can be written as:
while (*dst++ = *src++)
;
This single line copies a byte from the memory whose address is stored in “src” to
the memory cell whose address is stored in “dst”, then increments both pointers to
point to their next locations. Finally, the copied value is tested against zero, and if
it is the terminating nul in the string, then the loop terminates.
On the Digital Equipment PDP-11, one of the earliest machines on which the C
programming language was created, the copy instruction with pointer increments
compiled down to a single instruction (!):
movb (r1)+,(r2)+
It was rumored that C’s ++ and -- operators were created to match the PDP-11
addressing modes, but Dennis Ritchie, the creator of C, has said that this is not
true. Nevertheless, C’s pointer access and increment/decrement operators
continue to influence the design of Instruction Set Architecture. For example, the
Cortex-M does the copy and increments in two instructions:
ldrb R3,[R0],#+1
strb R3,[R1],#+1
Even if the target architecture does not have the increment or decrement
addressing modes, accessing an array via a pointer is usually more efficient, as a
pointer variable can be allocated in a fast CPU register, whereas an array name
reference is always a (slower) memory access.
Out of Bound Array Accesses
A pointer may contain the address of any array element, not just the first element.
Using this feature allows a pointer to alias to the address of any array element as
the start of a “virtual array”:
p = &array[2];
then
p[1] is the same as array[3]
ADVANCED TOPIC: it is legal to write a negative index to a pointer or array
reference. An obvious use is:
p = &array[2];
assert(&p[-1] == &array[1]);
assert(p - 1 == &array[1]);
Even though “p” above is not set to the first element of “array”, it can still access
earlier elements of “array” by using a negative index.
POTENTIAL BUG TRAPS: it is this lack of index checking that can cause
program crashes. For example:
short array[4];
short *p;
p = &array[2];
// The following examples cause runtime errors
p[-3] = … ;
p[4] = … ;
Both array references through “p” exceed the boundary of “array”, and the writes
would be writing to random other objects depending on where “array” is allocated
– it may affect other global variables, or local variables, or even the function’s
return address if it is stored in the stack.
Indeed, there is no index bound checking even for a real array:
short array[4];
array[-1] = … ;
array[4] = … ;
Both array references above are also out of bounds and will cause runtime errors,
but they are perfectly valid C code.
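Since the language does no checking, any bounds check has to be written by hand. The following is a minimal sketch of such a check; the function name checked_store and its return convention are assumptions made for this illustration:

```c
#include <stddef.h>

/* Hypothetical helper: store value at array[index] only if the index
   is within the n elements of the array.
   Returns 0 on success, -1 if the index is out of bounds. */
int checked_store(short *array, size_t n, long index, short value)
{
    if (index < 0 || (unsigned long)index >= n)
        return -1;            /* reject instead of corrupting memory */
    array[index] = value;
    return 0;
}
```

The check costs a comparison on every access, which is why C leaves the decision to the programmer.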
Writing Generic Functions to Process Arrays
Since a “pointer to type X” is type-compatible to the type “array of type X”
regardless of the number of elements in the array, it is simple to write functions in
C that operate on input arrays of any size.
For example, “strings” in C are arrays of characters terminated in a nul character
(\0). Without needing to know the actual length of this array, the standard C library
function strcpy copies from a source string into a destination array. The function
walks through the source input and copies the input to the output, and only needs
to stop when it reaches the terminating nul:
char *strcpy(char *dst, const char *src)
{
char *val = dst;
while (*dst++ = *src++)
;
return val;
}
Indeed, it is acceptable to declare the argument types as incomplete array types:
char *strcpy(char dst[], const char src[]);
By contrast, in a programming language such as Pascal where an array of X
elements is considered to be a separate and distinct type from an array of Y
elements, it is impossible to write such generic functions.
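Most arrays, unlike strings, carry no terminator, so a generic function usually takes the element count as a second argument. A minimal sketch (the name sum_ints is ours):

```c
#include <stddef.h>

/* Sketch: sums n ints. The parameter is declared with array syntax
   but is really a pointer, so an array of any length can be passed. */
long sum_ints(const int values[], size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}
```

The same function works for a 3-element array and a 5-element array; only the count passed by the caller differs.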
Returning a Pointer from a Function
Sometimes a function needs to return a pointer type. For example, the strcpy
function in the Standard C Library is declared as:
char *strcpy(char *dst, const char *src);
dst is either an array object, or a pointer to a dynamically allocated object. In the
case of strcpy, the function returns the value of dst that was passed to the
function.
When a function returns a pointer, the returned pointer must point to a valid
address; for example, the address of a global or static variable, or a dynamically
allocated object.
Returning a pointer to a local variable would capture the address of a stack
location that is no longer valid once the function returns, and will cause
problems when you attempt to access the object later.
For example, the following function “ReverseString” reverses an input string of up
to 99 characters. The returned pointer is the address of an element in the static
array dst. Since it has a static lifetime, even after the function ReverseString
returns, the array object persists and therefore can still be accessed.
#define SIZE 100 // 99 characters + 1 nul
char *ReverseString(char *src)
{
static char dst[SIZE];
dst[SIZE-1] = 0;
char *pdst = &dst[SIZE-1];
for (int i = 0; i < SIZE-1 && *src != 0; i++)
*--pdst = *src++;
return pdst;
}
The “for” loop terminates when either the limit of the destination array is reached
or when the terminating nul of the source is reached. Characters are copied from
the end of the destination to the beginning using the *--pdst idiom.
PITFALL #1: the function above returns the same storage object every time it is
called. Therefore, if multiple calls are made to the function, you may need to make
copies of the reversed strings elsewhere.
PITFALL #2: the function only works with an input array of up to 99 characters. A
limit must be set, since you must give a size when you declare the array. You can
eliminate this limitation and also solve PITFALL #1 by using a dynamically
allocated storage space for the reversed string. The tradeoff is the use of heap
memory which incurs runtime overhead:
char *ReverseString(char *src)
{
    size_t len = strlen(src);
    char *dst = (char *)malloc(len+1);
    if (dst == 0) // out of memory
        return 0;
    dst[len] = 0;
    char *pdst = &dst[len];
    while (*src != 0)
        *--pdst = *src++;
    return dst; // the caller must free() the result
}
strlen is used to compute the actual length of the input string. Space is
allocated by using malloc, after which the code copies the bytes in reverse
order, using a similar algorithm as before.
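A third design, not shown in the text above, lets the caller own the destination buffer, which avoids both the shared static array and the heap. The following is a sketch under that assumption; the name ReverseInto and its error convention are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: reverse src into a buffer owned by the caller.
   Returns dst on success, or NULL if dst_size is too small to hold
   the reversed string plus the terminating nul. */
char *ReverseInto(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (len + 1 > dst_size)
        return NULL;
    for (size_t i = 0; i < len; i++)
        dst[i] = src[len - 1 - i];
    dst[len] = 0;
    return dst;
}
```

The tradeoff is that the caller must know (or guess) a sufficient buffer size, so the size is passed in and checked.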
Using Pointers to Access Arbitrary Memory Locations
Embedded System programming frequently uses pointers to reference I/O
registers or arbitrary memory locations. The power of C and pointers is that you
can declare a structure and cast an arbitrary address to be of that type, and the C
compiler computes the correct offset to access the elements, so the programmers
do not need to remember constant offsets and addresses.
For example,
#define PERIPH_BASE ((uint32_t)0x40000000)
#define APB2PERIPH_BASE (PERIPH_BASE + 0x10000)
#define TIM1_BASE (APB2PERIPH_BASE + 0x2C00)
#define TIM1 ((TIM_TypeDef *) TIM1_BASE)
typedef struct
{
uint16_t CR1;
uint16_t RESERVED0;
uint16_t CR2;
uint16_t RESERVED1;
uint16_t SMCR;
uint16_t RESERVED2;
uint16_t DIER;
uint16_t RESERVED3;
uint16_t SR;
… // rest of the struct elided
} TIM_TypeDef;
TIM1->CR2 = 0;
While we have cautioned against type casting integer values to pointer values in
general, this is indeed the most efficient way for embedded system work to
access the I/O registers. As these values are provided by the silicon vendors or
the compiler vendors, and hidden behind macro names, the potential for misuse is
much reduced.
The cast operator is needed to satisfy the Standard C typing requirements. In the
end, TIM1->CR2 just means to pretend that address 0x40012C00 is the address of
a structure defined by the TIM_TypeDef declaration, and then access the CR2
element of that structure.
In the above structure typedef definition, CR2 is the third element of the structure,
preceded by two uint16_t members (which we can assume are 16-bit each in
size), therefore the final address location we are accessing is 0x40012C00 + 0x4,
or 0x40012C04.
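The offset arithmetic can be confirmed on any host compiler with the standard offsetof macro, without touching real hardware. This sketch uses a shortened copy of the layout; the type name TIM_Layout and the function CR2Offset are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Shortened copy of the register layout from the example above. */
typedef struct
{
    uint16_t CR1;
    uint16_t RESERVED0;
    uint16_t CR2;
    uint16_t RESERVED1;
} TIM_Layout;   /* hypothetical name, to avoid clashing with TIM_TypeDef */

/* Byte offset of CR2 from the start of the structure. */
size_t CR2Offset(void)
{
    return offsetof(TIM_Layout, CR2);
}
```

CR2Offset() returns 4, which added to the base address 0x40012C00 gives 0x40012C04.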
Volatile I/O Register Access
In C, reading memory data is normally useful only if the value of the data is to be
used in some manner, e.g. being assigned to a variable, or used in a
computation. However, sometimes an MCU peripheral subsystem requires the
firmware to read an I/O register to effect some changes in the hardware state:
This is an excerpt from the I2C section of the reference manual of ST’s
STM32F411 devices. In here, the “master” (i.e. the MCU) waits for the SR1
(Status Register 1) I/O register to be read.
In C, you would typically write
unsigned tmp = I2C1->SR1;
I2C1 is a macro, defined similarly to TIM1 in the earlier example, that expands to
a pointer to the I2C1 I/O registers of the MCU. This statement reads the register,
satisfying the hardware requirement. However, the variable is never used, and this
may trigger warnings from the compiler or a lint-style code checking tool.
One method to avoid warnings is to declare the I/O register field as volatile,
similar to a previous page’s example:
…
// inside the typedef for I2C struct
volatile unsigned SR1;
…
In this case, then, a read of the I/O register informs the compiler that the read
must be performed, even though the value is not explicitly used in the program
code:
I2C1->SR1; // a read with a side effect on hardware
ADVANCED TOPIC – Accessing the Stack Address
In embedded programming, sometimes you need to find out the location of the
stack pointer, for example, to write a task scheduler, you need to capture the CPU
context, which includes the stack pointer. While this is normally done using
assembly code routines, you can sometimes replace some or all of the assembly
code using C.
This function returns the address of a local variable, which on typical
implementations lies at or near the current top of the stack:
void *StackPointerAddress(void)
{
unsigned x;
return (void *)&x;
}
ADVANCED TOPIC: Writing a Simple Function Dispatcher
You can store a set of function addresses in an array or other data structure and
then invoke them indirectly. You can use this function dispatching to implement a
Finite State Machine (FSM) [35], a simple task scheduler, or even to implement a
rudimentary version of C++’s virtual function feature. It’s beyond the scope of this
section to explain these in detail. Nevertheless, a FSM is particularly suitable for
certain types of embedded programming, and it would be worthwhile for you to
follow up on the subject via web searches for tutorials on the subject.
Here we will just use a simple example of function dispatching based on an
integer argument:
typedef void (*FUNCPTR)(void);
void f1(void), f2(void), f3(void), f4(void);
FUNCPTR ftable[] = {
f1, f2, f3, f4
};
// Dispatch on 0..3
void FunctionDispatch(int i)
{
if (i >= 0 && i <= 3)
(*ftable[i])();
}
ftable is an array of function addresses that is initialized with the addresses of 4
functions. In FunctionDispatch, if the input argument i is between 0 and 3, then
it is used as an index to indirectly call the respective function.
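A variation on the same idea is sketched below, with handlers that take an argument and return a value, and with the table size computed by the compiler instead of being hard-coded. All names here (HANDLER, handlers, Dispatch and the three small functions) are made up for the illustration:

```c
#include <stddef.h>

typedef int (*HANDLER)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }
static int negate(int x)  { return -x; }

static const HANDLER handlers[] = { add_one, twice, negate };

/* Number of entries, computed so it cannot drift from the table. */
#define NUM_HANDLERS (sizeof handlers / sizeof handlers[0])

/* Calls handlers[i](arg) and stores the result in *result.
   Returns 0 on success, -1 if i is not a valid table index. */
int Dispatch(size_t i, int arg, int *result)
{
    if (i >= NUM_HANDLERS)
        return -1;
    *result = handlers[i](arg);
    return 0;
}
```

Because the index is unsigned, a single comparison covers both ends of the valid range.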
ADVANCED TOPIC: Using Pointers to Pointers
A pointer may contain the address of any other data type, including another
pointer type. Please see chapter <Advanced Topic: Dynamic Data structures> to
see how to use pointers to pointers to simplify code for dynamic data structure
allocation.
11. DYNAMIC DATA STRUCTURES
A linked list is made up of a list of “nodes”, which is represented in C as a
struct. Each node has a link, i.e. a pointer, to another node. The rest of the
node holds data necessary for the computation. A root pointer contains the
address of the first node in the chain. The end of the chain is signified by the link
address value of 0, the null pointer (usually drawn as the electrical symbol for
GROUND).
Example of Linked List
For example, this code fragment creates a linked list of input lines (for example,
typed on a terminal keyboard):
#include <stdio.h>
#include <stdlib.h> // malloc
#include <string.h> // strlen, strcpy
struct input_str;
typedef struct input_str {
struct input_str *next;
char *str;
} INPUT_STR;
INPUT_STR *root;
void ReadName(void)
{
char *s;
char buf[1024];
while ((s = gets(buf)) != NULL)
{
INPUT_STR *p = (INPUT_STR *)malloc(sizeof (INPUT_STR));
// out of memory
if (p == 0)
break;
p->next = root;
root = p;
p->str = malloc(strlen(s) + 1);
if (p->str)
strcpy(p->str, s);
else
break;
}
}
Each node of the linked list is of type struct input_str, which consists of a
pointer to another node and a pointer to the string. The loop in the function
ReadName reads a line, then allocates a new node and links it to the existing
chain rooted at root. Root points to the last structure created. The list can be
traversed through the next pointer of the chain.
Two calls to malloc are used: one to create a new node, and the other one to
create an array in which to store the input string.
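PITFALL: The Standard C Library function “gets()” takes an input buffer. However,
gets might overrun the buffer, as there is no method to communicate the size of
this buffer to gets. For this reason gets was removed from the C11 standard;
fgets, which takes the buffer size as an argument, is the usual replacement.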
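One way to avoid the gets problem is the standard function fgets, which is told the buffer size. The following is a minimal sketch; the wrapper name ReadLine is hypothetical, and it strips the trailing newline so that the result resembles what gets would have produced:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from in into buf (at most
   size - 1 characters) and strip the trailing newline, if any.
   Returns buf, or NULL at end of input or on a read error. */
char *ReadLine(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = 0;   /* fgets keeps the newline; gets did not */
    return buf;
}
```

In the ReadName loop, the call gets(buf) could then be written as ReadLine(stdin, buf, sizeof buf). A line longer than the buffer is returned in pieces rather than overrunning it.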
EXERCISE: Note that new nodes are created at the beginning of the list. How
would you append the new node to the end of the list? Also see the section
“Pointers to Pointers” in the later section of this chapter.
Traversing a Linked List
C pointers make it easy to access and manipulate a linked list. Using the data
structures from previous page:
for (INPUT_STR *p = root; p; p = p->next)
printf("input was '%s'\n", p->str);
The for loop “walks” from the start of the list using the address stored in the
global variable root, and traverses each element of the list.
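The same walking pattern serves other list operations. The sketch below counts the nodes and releases a whole list; the function names ListLength and ListFree are ours, and the structure is repeated so the fragment stands alone:

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct input_str {
    struct input_str *next;
    char *str;
} INPUT_STR;

/* Count the nodes by walking the next pointers. */
size_t ListLength(const INPUT_STR *root)
{
    size_t n = 0;
    for (const INPUT_STR *p = root; p; p = p->next)
        n++;
    return n;
}

/* Release every node and its string (both assumed to come from malloc).
   The next pointer is saved before the node is freed. */
void ListFree(INPUT_STR *root)
{
    while (root) {
        INPUT_STR *next = root->next;
        free(root->str);
        free(root);
        root = next;
    }
}
```

Note that ListFree reads root->next before calling free(root); reading it afterwards would access memory that has already been released.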
ADVANCED TOPIC: Allocating a Dynamic Size Structure
Notice that in the previous example, there are two calls to malloc, one to allocate
space for the struct input_str structure, and one for the character array
holding the content of the input string. Besides function call overhead, each
allocated object requires some space overhead for allocation management, and
more allocation calls can potentially cause heap fragmentation.
There is a method to eliminate one malloc call in this example; since C does not
perform object size checking, you might use this to your advantage and eliminate
the second call to malloc to avoid some space overhead. To do this, you “lie” to
the compiler by telling it that the end of the structure is a single character array
instead of a pointer to a char, and then in the allocation call to malloc, you
specify enough space sufficient for both the structure and the input string.
#include <stdio.h>
#include <stdlib.h> // malloc
#include <string.h> // strlen, strcpy
struct input_str;
typedef struct input_str {
    struct input_str *next;
    char str[1]; // extensible array
} INPUT_STR;
INPUT_STR *root;
void ReadName(void)
{
    char *s;
    char buf[1024];
    while ((s = gets(buf)) != NULL)
    {
        size_t len = strlen(s);
        INPUT_STR *p =
            (INPUT_STR *)malloc(sizeof (INPUT_STR) + len);
        if (p == 0)
            break;
        p->next = root;
        root = p;
        strcpy(p->str, s);
    }
}
strlen returns the length of a string not counting the terminating nul. However,
as the structure has a one-byte array as a member, there is no need to account
for the space for the nul in the call to malloc.
The same call to strcpy is made as in the previous example. In the original
case, p->str is a separate heap object pointer, whereas in this case, p->str
has the type “array of one char”, but we have allocated extra space for the input
string.
This optimization only works because C allocates a struct member in declaration
order, so str will be at the end of the structure.
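The C99 standard added an official form of this technique, the flexible array member, in which the last member is declared with empty brackets. The following is a minimal sketch of the same node written that way; the names LINE_NODE and NewLineNode are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

typedef struct line_node {
    struct line_node *next;
    char str[];              /* C99 flexible array member, must be last */
} LINE_NODE;                 /* hypothetical name */

/* Allocate a node holding a copy of s, or return NULL if out of memory.
   sizeof (LINE_NODE) does not include str, so the nul needs its own byte. */
LINE_NODE *NewLineNode(const char *s)
{
    size_t len = strlen(s);
    LINE_NODE *p = malloc(sizeof (LINE_NODE) + len + 1);
    if (p == NULL)
        return NULL;
    p->next = NULL;
    memcpy(p->str, s, len + 1);   /* copies the terminating nul too */
    return p;
}
```

The one difference to watch is the size calculation: with char str[1] the structure already contains one byte for the nul, whereas with char str[] it does not.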
Appending Node at the End of the List
In the code example from the previous section, the function ReadName inserts
the new node at the beginning of the list. What if you wish to append the new
node to the end of the list instead? An obvious solution would be to use
another variable to keep track of the “current” node. Looking at just the while
loop where the nodes are created, we can write
INPUT_STR *current = 0;
while ((s = gets(buf)) != NULL)
{
int len = strlen(s);
INPUT_STR *p =
(INPUT_STR *)malloc(sizeof (INPUT_STR) + len);
if (p == 0)
break;
p->next = 0;
strcpy(p->str, s);
if (current == 0)
root = p;
else
current->next = p;
current = p;
}
Inside the loop, either “current” is zero, denoting an empty list, or it has at least
one element. The if statement checks for either of those two conditions and acts
accordingly: the code should hopefully be obvious by now.
This code is easy to understand; however, there is an alternative.
ADVANCED TOPIC: Using Pointers to Pointers
The power of using a pointer is that it can contain the address of ANY
object, including another pointer object! For example:
int i;
int *pi = &i;
int **ppi = &pi;
**ppi = 4;
// now *pi and i are both equal to 4
While a pointer to a pointer to an integer may not be very useful generally,
consider the requirement to append a node at the end of a list. We have seen an
example where it was done by using another variable to keep track of the
“current” node. The following demonstrates a more compact coding style using
another variable nextp, a pointer to a pointer, to accomplish the same task:
INPUT_STR **nextp = &root;
while ((s = gets(buf)) != NULL)
{
int len = strlen(s);
INPUT_STR *p =
(INPUT_STR *)malloc(sizeof (INPUT_STR) + len);
if (p == 0)
break;
p->next = 0;
strcpy(p->str, s);
*nextp = p;
nextp = &p->next;
}
By using a pointer to another pointer, this version eliminates the if statement and
therefore is visually much less cluttered. The initial assignment of nextp =
&root sets up the initial condition, and nextp is the address where the new
node should be attached.
Which version you prefer to write is up to you. There is no significant difference in
code size or speed. Nevertheless, while coding using pointers to pointers may
look more daunting, especially when you are not familiar with it, once you gain
proficiency you might find that it can make programs both visually shorter and
easier to understand and maintain.
Dangling Pointers
The function free is used to deallocate heap memory objects, so that future
objects can use that space. However, as C argument passing is call-by-value, the
pointer to the object is not changed. Therefore, one might still access that memory
space through the pointer. This is known as a “dangling pointer” problem.
// (using the data structure from the previous pages)
INPUT_STR *p = (INPUT_STR *)malloc(sizeof (INPUT_STR));
…
free(p); // the object is released, but p still holds its old address
printf("input is \"%s\"\n", p->str);
In this example, p->str is being accessed, even though the memory has been
released back to the heap. The call to printf may print the correct value (by
“luck”), print garbage, or even cause the program to crash.
The more insidious problem is when a pointer is free’d in one section of the code
but the “dangling pointer” is used in another section of code, potentially causing
random crashes.
BEST PRACTICE: always assign zero to a free’d pointer variable and check for
non-zero pointers before using them:
INPUT_STR *p = (INPUT_STR *)malloc(sizeof (INPUT_STR));
…
free(p);
p = 0;
…
// always check for valid pointer before accessing
if (p)
printf("input is \"%s\"\n", p->str);
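To make the free-then-clear habit harder to forget, the two steps can be combined. The macro below is a sketch, and the name FREE_AND_CLEAR is hypothetical. A macro is used rather than a function because, with call-by-value, a function cannot change the caller’s pointer variable unless it is given the address of that variable:

```c
#include <stdlib.h>

/* Hypothetical macro: free the object and clear the pointer variable
   in one step. p must be a modifiable pointer variable (an lvalue). */
#define FREE_AND_CLEAR(p) \
    do {                  \
        free(p);          \
        (p) = NULL;       \
    } while (0)
```

This only clears the one variable named in the call. Any other copies of the pointer elsewhere in the program are still dangling, so the macro reduces the problem rather than eliminating it.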
Advantages and Pitfalls of Using Dynamic Memory
PITFALL #1: Out of (heap) memory error. The heap memory is limited in size.
Malloc and calloc return 0 if memory cannot be allocated. Your program must
account for the possibility that malloc/calloc cannot allocate storage, and so
should not use the returned pointer if it is a null pointer.
Unfortunately, error recovery in an embedded system is always tricky: that is,
there is no general mechanism to handle failure, as an embedded system should
never fail or crash! Imagine a self-driving car literally crashing due to a software
crash in the embedded control system! In the given examples above, the returned
value from malloc is checked against zero before it is used. However, how to
deal with the failure case is left as an exercise for the actual embedded system
firmware implementation.
It is also important to call free to release the memory back when the storage is
no longer needed.
PITFALL #2: Heap fragmentation. There are different algorithms to implement
heap memory routines – some favor speed of allocation and deallocation, and
others favor optimal space usage. The heap may become fragmented, which
refers to chunks of memory being allocated with unused space left between the
chunks. This condition happens more frequently when there are a lot of random
allocation and deallocation calls.
If the heap memory is highly fragmented, malloc/calloc might not be able to
honor an allocation request even if the total amount of free space is large enough.
Some programming languages provide a feature called garbage collection that
frees up unused heap objects automatically and merges the free heap memory
together as needed. Unfortunately, the pervasive unrestricted use of pointers in C
makes garbage collection for C programs not generally possible.
ADVANTAGE: For the reasons mentioned, some people recommend against ever
using dynamic memory. This is certainly in line with the spirit of “defensive
programming”. Nevertheless, while there are pitfalls, using dynamic memory can
be a very useful feature. Primarily, it can adapt to changing memory needs of your
programs. Imagine you have a number of data structures defined in your
program; by using dynamic memory, you do not need to predetermine how much
memory is allocated to each data structure set.
You can program defensively to account for out-of-memory errors. For example,
you might write your own functions that call malloc/calloc and take a
consistent approach when an out-of-memory error occurs. Moreover, if a program
can run out of heap memory, then it will most likely run out of memory even when
using statically allocated memory, so not using heap memory just avoids the
issue, and is not dealing with it per se.
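A wrapper of the kind described might look like the following sketch. The names CheckedMalloc and AllocFailures are hypothetical, and simply counting failures stands in for whatever policy a real project would adopt (logging, entering a safe state, resetting):

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts failed allocations; a real system might log, reset, or
   enter a safe state here instead. */
static unsigned long alloc_failures;

/* Hypothetical wrapper: every allocation in the program goes through
   here, so the out-of-memory policy lives in one place. */
void *CheckedMalloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        alloc_failures++;     /* placeholder for the project's policy */
    return p;
}

/* Number of allocation requests that could not be honored so far. */
unsigned long AllocFailures(void)
{
    return alloc_failures;
}
```

Callers still have to check the returned pointer; the wrapper only guarantees that every failure is noticed and handled the same way.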
WAR STORY: One of the authors worked at Whitesmiths Ltd in the 1980s, with
Whitesmiths being the first company to produce a commercial C compiler outside
of Bell Labs. Whitesmiths also produced a Unix Edition 6 API compatible system
called Idris. Today’s Cortex-M microcontrollers are much more powerful and have
more memory than the machine targets of that time. One story told by PJ Plauger,
Whitesmiths’ founder and president, was that in the early Unix and C compilers
they were careful to avoid using dynamic memory, for fear of memory
fragmentation and other issues. Mr. Plauger wrote the Whitesmiths compilers and
Idris from scratch, and he did not set such restrictions. The resulting programs did
not suffer any performance or quality issues from use of dynamic memory.
SECTION IV
APPENDICES
A. INTRODUCTION TO COMPUTER
ARITHMETIC
Representing Numbers
At the heart of a digital computer is the Central Processing Unit (CPU). A CPU
operates purely on bit patterns, which are most convenient to notate as numbers.
Normally, numbers are written in base 10, or decimal notation: the digits are 0..9
(using the notation .. to mean from one end to another), and the number
sequence goes from 0..9, then 10..19, and so on.
However, since digital CPUs work in a binary digital domain, where a bit is either
in a state of “on” (1) or “off” (0), it is often more convenient to refer to these bit
patterns in other (non-decimal) numbering systems.
Binary Notation
Binary notation (also called “base 2”) uses only the digits 0 and 1 to write
numbers. The sequence of all the numbers that can be represented in a single
byte is:
Decimal Binary
0 00000000
1 00000001
2 00000010
…
10 00001010
11 00001011
12 00001100
…
252 11111100
253 11111101
254 11111110
255 11111111
As the table shows, starting with all zeros (0 in decimal), the highest number that
can be stored in a byte is 11111111 in binary, or 255 in decimal (assuming that we
are interpreting the byte as unsigned, but we will get into signed vs. unsigned
later).
Hexadecimal Notation
Writing in binary is very cumbersome; so one common notation used in the
computer field is hexadecimal notation, (also known as “base 16”) where the digits
go from 0 to 9, then A, B, C, D, E, F (or: a, b, c, d, e, f), corresponding to 10 to 15
in decimal.
Hexadecimal 0 to F (decimal 0 to 15) can be represented in exactly 4 bits, and 2
hexadecimal digits fit into an 8-bit byte exactly. This is why hexadecimal notation
is the preferred method of writing numbers “for computers”, not decimal notation.
Octal Notation
Another common notation is octal notation (“base 8”) which uses the digits 0 to 7.
Octal is useful because it fits into 3 bits, but of course it does not fit into a byte as
well as hexadecimal numbers.
Numbering Prefixes
To disambiguate the base of a number in this book, we will adopt the C notation: a
decimal number is given no special prefix, a binary number is prefixed with the
sequence 0b [36], a hexadecimal number is prefixed with the sequence 0x, and an
octal number is prefixed with 0. Occasionally, some examples in this book will
show a series of bit patterns without the 0b prefix, but it should be clear from the
context, and those bit patterns will usually be broken down into nibbles of 4 bits,
e.g. 0100 1001.
The number “42” as written in different bases:
Base Number
2 0b101010
8 052
10 42
16 0x2A
Interpreting a Number
The place value of the initial non-zero digit (1) in each number system is
equivalent to the next power of the base.
A few observations:
1. Any whole numbering system is possible, e.g. base 3, base 4 etc. However,
to programmers, bases 2, 8, 16 and 10 are the most common and useful ones.
Prefixes: Kilo, Mega, and Giga etc.
In describing the size of computer memory, the convention is to use the nearest
power-of-two, e.g. a “kilobyte” is actually 1024 bytes, and not 1000 bytes.
However, for hard drive memory sizes [37], or in general usage as prefixes,
power-of-ten is used, e.g. kilohertz is a thousand Hertz, or a thousand cycles
per second.
Fraction prefixes, e.g. “milli”, are always a (negative) power of ten.
The exponent number of the ten’s power can be viewed as the number of the
zeros following the initial one, e.g. 10^6 is one followed by 6 zeros, or 1,000,000.
ASCII Characters
ASCII (American Standard Code for Information Interchange) is a
character-encoding scheme for 128 specific characters based on the English
alphabet. It has the following properties:
1. 7 bits are needed for the character encoding; the 8th bit is normally left as 0 /
zero / “off”. On an 8-bit byte, in a context where one is expecting an ASCII
character, any byte found that has the MSB [38] turned on indicates that it is
an “escape code”.
In the old days of terminals and line printers, “escape codes” might contain
graphics or symbols (e.g. the copyright © symbol), or something totally non-
printable. Escape codes now can even be two or more bytes depending on
the encoding.
2. The digits ‘0’ to ‘9’ are consecutive.
3. The alphabetic characters ‘a’ to ‘z’ are consecutive, as are ‘A’ to ‘Z’, and the
uppercase letters have lesser values than their lowercase counterparts (‘A’ is
65 and ‘a’ is 97). This means that the delta (difference) in numeric value
between a lowercase letter and its corresponding uppercase letter is
constant (32) for all letters.
4. Not all characters are printable. Some are non-printable graphics and some
are control character codes for a terminal, printer, or other older devices.
http://commons.wikimedia.org/wiki/File:Ascii-codes-table.gif
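Properties 2 and 3 are what make simple character arithmetic possible. The helpers below are a sketch that assumes the ASCII encoding; the names DigitValue and ToUpperAscii are ours:

```c
/* These helpers assume the ASCII encoding described above. */

/* Numeric value of a digit character, or -1 if c is not '0'..'9'. */
int DigitValue(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Convert a lowercase ASCII letter to uppercase; others are unchanged. */
char ToUpperAscii(char c)
{
    return (c >= 'a' && c <= 'z') ? (char)(c - ('a' - 'A')) : c;
}
```

In production code the Standard C Library functions isdigit and toupper from <ctype.h> are the usual choice, since they do not depend on the character set.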
Carry, Borrow
When you add one number to another, the operation starts on the right and moves
from there to the next set of digits to the left (or from LSB to MSB), just as in
decimal calculations.
Similar to decimal operations, when you add two binary digits together, the result
may carry a 1 to the next digit to the left. If the final result has an extra 1 carried
from the MSB, then it’s called a carry-out condition.
Carry:
  00000001
+ 00000001
  --------
  00000010
Borrow from “outside”:
  00000000
- 00000001
  --------
  11111111
Shift Operations
Shifts, which have many uses (such as extracting certain bits from a word), are
also important for implementing arithmetic operations such as multiplication and
can also be used for unsigned division. “Left shift by one bit” moves all the bits
one position to the left, and fills the lowest order bit (bit 0) with a 0:
Left shift one bit
01110101 ←
11101010
Likewise, “right shift by one bit” moves all the bits one position to the right. While
there is only one type of left shift, there are, however, two types of right shift:
arithmetic and logical.
An arithmetic right shift (typically used for signed integers [39], to preserve the
number’s positive or negative number status) retains the value of the sign bit
(MSB) as the original sign bit is shifted to the right. A logical right shift fills the
MSB with a 0. (Of course, if the most significant bit was already a 0, then the
apparent result of an arithmetic right shift and a logical right shift are the same.)
Arithmetic right shift one bit
10101110 →
11010111
Logical right shift one bit
10101110 →
01010111
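The three shifts can be written in C on 8-bit values as in the following sketch (the function names are ours). The arithmetic shift is built from unsigned operations because the result of applying >> to a negative signed value is implementation-defined in C:

```c
#include <stdint.h>

/* Left shift: bit 0 is filled with 0, the old MSB is discarded. */
uint8_t ShiftLeft8(uint8_t x)
{
    return (uint8_t)(x << 1);
}

/* Logical right shift: the MSB is filled with 0. */
uint8_t LogicalShiftRight8(uint8_t x)
{
    return (uint8_t)(x >> 1);
}

/* Arithmetic right shift: the MSB (sign bit) keeps its old value. */
uint8_t ArithmeticShiftRight8(uint8_t x)
{
    return (uint8_t)((x >> 1) | (x & 0x80));
}
```

Applied to the bit patterns shown above, ShiftLeft8(0x75) gives 0xEA, LogicalShiftRight8(0xAE) gives 0x57, and ArithmeticShiftRight8(0xAE) gives 0xD7.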
Signed Overflow
Regardless of how many bits a CPU uses to store numbers (8 bits, 32 bits, etc.),
overflow may happen. In two’s complement, this means, for example, that adding
two negative numbers results in a positive number; or the opposite: adding two
positive numbers results in a negative number. This is known as overflow.
Note that overflow is NOT the same as carry-out. As was shown before, a carry-
out occurs when a “1” is “pushed out” of the MSB to the left. An overflow occurs
when the sign of both operands are the same but the result has a different sign.
(By definition, overflow will not occur if the operands have different signs.)
  Signed      Decimal
  01111111       127
+ 00000010    +    2
  --------    ------
  10000001      -127
Overflowed: MSB changed from 0 to 1, the number has become negative.
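In C, signed overflow of int arithmetic is undefined behavior, so a program that cares must detect the condition itself. One approach, sketched here for 8-bit values with a hypothetical function name, is to compute the sum in a wider type and test whether it still fits:

```c
#include <stdint.h>

/* Returns 1 if a + b does not fit in a signed 8-bit value, else 0.
   The sum is computed in int, which is wide enough to hold it. */
int AddOverflows8(int8_t a, int8_t b)
{
    int sum = a + b;
    return sum > INT8_MAX || sum < INT8_MIN;
}
```

For the example above, AddOverflows8(127, 2) reports an overflow, while operands of different signs such as 100 and -100 never do.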
Overflow: Unsigned Wraparound
If you use unsigned representation, e.g. the numbers 0…255 represented in 8
bits, adding two unsigned values may result in a number smaller than either
operand. This is called wraparound. In fact, unsigned arithmetic is sometimes
known as modular arithmetic, because the results are as if you applied the
modulus to the intermediate result (the result you would have if you are not limited
by the number of bits being used) to get the final result.
For example, in using 8 bits to store a number, you would use modulus 256 (2^8):
  Unsigned         Decimal
  10001010           138
+ 10001010         + 138
  --------         -----
  00010100 = 20    276 % 256 = 20
1 ← carry-out
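The same calculation can be reproduced in C. This is a minimal sketch with a hypothetical function name; the addition is done in a wider unsigned type so that the carry-out can be observed before the result is cut back to 8 bits:

```c
#include <stdint.h>

/* Adds two 8-bit values. The return value is the sum modulo 256;
   *carry_out is set to 1 if the true sum did not fit in 8 bits. */
uint8_t AddWrap8(uint8_t a, uint8_t b, int *carry_out)
{
    unsigned wide = (unsigned)a + b;   /* intermediate result, up to 510 */
    *carry_out = wide > 0xFF;
    return (uint8_t)wide;              /* conversion keeps the low 8 bits */
}
```

AddWrap8(138, 138, &c) returns 20 and sets c to 1, matching the table above.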
Status Flags / Status Bits
A CPU normally maintains a set of “status flags” (which are typically just a set of
dedicated bits). Status flags allow various types of useful information resulting
from hardware operations to be accessed.
The Carry (and sometimes Borrow) Bit
When you look at a CPU reference manual, there is usually a carry status bit in
the processor status register. The use of the carry status bit is well defined for
addition: it is set to “true” (1) when a hardware operation has just resulted in a
carry-out from the MSB. Otherwise, it is set to “false” (0).
For subtraction, however, there are two interpretations for use of the carry status
bit. Let’s say that a given operation is “A - B”. One subtraction interpretation treats
the carry status as a borrow flag, so it will be set to true whenever A is less than
B. The other interpretation uses the two’s complement arithmetic model, where
subtraction is done via adding the two’s complement of the subtrahend (the
number we are subtracting). In that case, the carry status reflects the normal
addition case: it is set to “true” whenever there is a carry-out:
Carry being used as a borrow:
  00001010
- 00010101
----------
  11110101
A borrow was needed, so the carry flag is set to “true” (1).
Carry when subtraction is being performed via two’s complement addition:
  00001010
- 00010101
First find the two’s complement of 00010101, which is 11101011; then the
problem becomes
  00001010
+ 11101011
----------
  11110101
There is no carry-out, so the carry flag is set to “false” (0).
Among CPUs, the 8080, Z80, x86 and 68K families use the borrow flag
subtraction interpretation, while ARM and PowerPC use the carry flag subtraction
interpretation. There is no particular advantage to one choice over the other, and if
you are writing in C, this “under the hood” implementation choice is irrelevant to
you.
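The two interpretations can be compared with a small simulation. This is a sketch only; the function names are made up, and the second function assumes the common hardware approach of adding the inverted subtrahend plus one in a single step, which gives the same flag as the example above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Borrow interpretation: the flag is set when A is less than B. */
bool sub8_carry_as_borrow(uint8_t a, uint8_t b)
{
    return a < b;
}

/* Two's complement interpretation: compute A + (~B) + 1 in more than
   8 bits and report whether a 1 was pushed out of bit 7. */
bool sub8_carry_as_carry_out(uint8_t a, uint8_t b)
{
    unsigned wide = (unsigned)a + (uint8_t)~b + 1u;
    return (wide & 0x100u) != 0;
}
```

For A = 00001010 and B = 00010101, sub8_carry_as_borrow returns true and sub8_carry_as_carry_out returns false. For any pair of operands the two functions return opposite values.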
Compare Operations
Compare operations are typically done by using subtract operations, with the
calculation results being ignored while the side effects of the operation (the states
of various CPU status flags) are noted.
The 4 most commonly found CPU status flags are:
C - Carry (see above): did the operation produce a carry-out?
V - Overflow: did the operation cause a signed overflow?
Z - Zero: was the result of the operation zero?
N - Negative: was the result a negative number?
Note that signed and unsigned compare are considered two different
operations. For example:
a: 0b11111111
being compared to
b: 0b00000001
If they are unsigned bytes, a is greater than b; you are comparing 255 to 1.
However, if they are being interpreted as signed bytes, then a is less than b since
it is comparing -1 to 1. Using combinations of flag statuses, you can deduce the
comparison result for both signed and unsigned operands.
To compare two operands, first subtract the second operand (b) from the first, and
check the flag status to determine whether the first operand (a) was equal to,
greater than, or less than b. (In the following examples, the carry-out and not the
borrow interpretation is used to determine the status of status flag C.)
Nomenclature used below:
“==” means “is equal to”, “!=” means “is not equal to”. This notation comes from C
programming language grammar. E.g.: “Z == 1” means the Z status flag’s value is
“true” (the status bit is 1 because the operation’s result was zero).
Compare Operations: Equal comparison (unsigned or signed)
‘a’ is equal to ‘b’: flag status result: Z == 1
not equal to: flag status result: Z == 0
Compare Operations: unsigned comparison
greater than (sometimes also called higher than): C == 1, Z == 0
less than (sometimes also called lower than): C == 0, Z == 0
Compare Operations: Signed comparison
greater than: N == V, Z == 0
less than: N != V, Z == 0
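The flag rules above can be tried out with a small simulation of an 8-bit compare. This is a sketch under stated assumptions: the type flags8 and all function names are made up for this example, and the C flag uses the carry-out interpretation, as in the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four status flags after an 8-bit compare (hypothetical type). */
typedef struct {
    bool c;  /* carry-out (not borrow) */
    bool v;  /* signed overflow */
    bool z;  /* result was zero */
    bool n;  /* result was negative (MSB set) */
} flags8;

/* Compare a to b by computing a - b and keeping only the flags. */
flags8 cmp8(uint8_t a, uint8_t b)
{
    unsigned wide = (unsigned)a + (uint8_t)~b + 1u;  /* a + two's complement of b */
    uint8_t  r    = (uint8_t)wide;                   /* 8-bit result, discarded later */
    flags8 f;
    f.c = (wide & 0x100u) != 0;
    f.z = (r == 0);
    f.n = (r & 0x80u) != 0;
    /* A subtraction overflows when the operands have different signs
       and the sign of the result differs from the sign of a. */
    f.v = (((a ^ b) & (a ^ r)) & 0x80u) != 0;
    return f;
}

bool is_equal(flags8 f)         { return f.z; }
bool unsigned_greater(flags8 f) { return f.c && !f.z; }
bool unsigned_less(flags8 f)    { return !f.c && !f.z; }
bool signed_greater(flags8 f)   { return f.n == f.v && !f.z; }
bool signed_less(flags8 f)      { return f.n != f.v && !f.z; }
```

With a = 0b11111111 and b = 0b00000001, the flags from cmp8 make unsigned_greater true (255 is greater than 1) and signed_less true (-1 is less than 1).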
EXERCISE: Work out examples and show how signed greater than and signed
less than use the flag statuses as described.
ADVANCED EXERCISE: Although modern CPU designs normally have the
overflow flag V, which allows for more efficient software implementation, one
popular 8-bit CPU has a design based on an older convention which does not
provide this flag. How would you find the result of a signed comparison using only
the C, Z, and N flags?
Bitwise Operations
Bitwise operations include AND, OR, Exclusive OR, and Complement. Note that
there is no overflow or carry in bitwise operations.
Bitwise AND (&)
0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1
The result of an AND operation is ‘1’ if and only if both inputs are ‘1’s.
Bitwise OR (|)
0 | 0 = 0
0 | 1 = 1
1 | 0 = 1
1 | 1 = 1
The result of an OR operation is ‘1’ if either of its inputs is ‘1’.
Bitwise Exclusive OR (^)
0 ^ 0 = 0
0 ^ 1 = 1
1 ^ 0 = 1
1 ^ 1 = 0
The result of an “exclusive OR” operation is ‘1’ if (and only if!) exactly one of its
inputs is ‘1’.
Exclusive OR performed with “1” as the second operand is sometimes known as
the “toggle operation”, as the result is the toggled value of the first operand.
Bitwise Complement (also known as Bitwise Inverse, Toggle)
~0 = 1
~1 = 0
Bitwise complement is the one’s complement of the bit.
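In C, these operators work on all the bits of an integer at once, applying the single-bit tables above to each bit position. A minimal sketch follows; the helper names are made up for this example, and a “mask” here simply means a value with 1s in the bit positions of interest.

```c
#include <stdint.h>

/* OR with a mask forces the masked bits to 1. */
uint8_t set_bits(uint8_t value, uint8_t mask)
{
    return (uint8_t)(value | mask);
}

/* AND with the complement of a mask forces the masked bits to 0. */
uint8_t clear_bits(uint8_t value, uint8_t mask)
{
    return (uint8_t)(value & (uint8_t)~mask);
}

/* Exclusive OR with a mask toggles the masked bits. */
uint8_t toggle_bits(uint8_t value, uint8_t mask)
{
    return (uint8_t)(value ^ mask);
}

/* Complement inverts every bit. */
uint8_t invert_bits(uint8_t value)
{
    return (uint8_t)~value;
}
```

For example, set_bits(0x0A, 0x01) gives 0x0B, clear_bits(0x0F, 0x05) gives 0x0A, and toggle_bits(0xAA, 0xFF) gives 0x55, the same value as invert_bits(0xAA).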
B. A BRIEF HISTORY OF C
In the 1960s, Bell Labs, the research arm of American Telephone and
Telegraph[40], was engaged in some of the most important computer science
research of the time. After working on a mainframe operating system project
called “Multics” (which Bell Labs finally pulled out of), Ken Thompson, a
researcher at Bell Labs, decided to implement the best ideas of Multics using
Assembly Language for a Digital Equipment Corp.’s PDP-7 minicomputer.
At that time, paper tape - not even punch cards! - was the standard type of
program storage unit. The code was first developed on a GE-635, then the output
paper tapes were carried over to the PDP-7 for processing, until enough of the
system was working on the PDP-7 to enable native development (although still in
Assembly).
Ken Thompson then began working on a compiler for a language he called B, the
name being attributed either to the language BCPL (to which B bears some
similarities) or Bon, an earlier language which Thompson had written for
Multics[41]. He then rewrote part of the as-yet unnamed new OS in B.
In 1970, the Bell Labs researchers managed to convince Bell Labs management
to procure a PDP-11, a much more powerful machine, in order to implement a
“text processing system.” Along the way, the name “Unix” was adopted for the
new operating system by Brian W. Kernighan, attributed either to being a play-on-
words of “Multics”, or even possibly a tongue-in-cheek reference to “eunuchs”.
Meanwhile, Dennis M. Ritchie, while working with B on the PDP-11 Unix, decided
to add some needed features to B, and picking the next letter from BCPL, he
called this new language C. Using a procedure called “bootstrapping”, he first
wrote a prototype C compiler in B, after which he rewrote the compiler in C itself,
adding more features at each iteration as needed.
By 1973 and 1974, C was mature enough that Unix and its utilities
were entirely rewritten in C. Remarkably, this version of C still bears a great
resemblance to even the latest Standard C, a testament to how well the original C
language was designed. During the same period, C was retargeted to the
Honeywell 635 and IBM 360/370, and through that experience, C adopted
features and encouraged practices that have improved the portability of C
programs, which in turn, contributes to the success of the language today.
Kernighan and Ritchie published the book The C Programming Language in 1978,
arguably considered “the C Bible” even to this day (a “version 2” was published in
1988). During the same time period, Steve Johnson wrote a reference C compiler
called the Portable C Compiler. Unix and C were ported to the Interdata 8/32 and
then to the VAX-11 in the late 70s. Due to an anti-monopoly agreement with the
US government, the Unix source was given to the University of California
Berkeley, which they (mainly a programmer named Bill Joy, who went on to co-
found the legendary company Sun Microsystems) modified to become BSD Unix.
The various AT&T Unix and BSD Unix and C versions influenced an MIT hacker
named Richard Stallman (around 1984) to work on “free” versions of the compiler
and operating system, which eventually grew to become the set of GNU software,
including the GCC suite of (non-standard C) compilers. The GNU efforts in turn
influenced a young Finnish student named Linus Torvalds (in 1991) to write a “toy”
operating system that eventually became Linux.
Meanwhile, back in the early 1980s, commercial C compilers started to appear for
chip targets such as Motorola 68K, Intel 8086, and even small microcontrollers
such as the Motorola 68HC11 and Zilog Z80. P.J. Plauger, a researcher at Bell
Labs, left to found Whitesmiths Ltd. and produced one of the first commercial C
compilers outside of Bell Labs with wholly independently-developed compilers.
In the 1980s, C compiler companies practically blossomed like daisies after a
spring rain. Borland’s Turbo C blew the market open with its low price, but
eventually the Windows compiler market coalesced to mainly just Microsoft Visual
C (later Visual Studio). The Mac market was dominated by Think C, then
CodeWarrior, until the Mac changed to Intel x86 processors in the mid-2000s.
C became the “lingua franca” programming language for machines from small
embedded systems to mainframes. The Bell Labs researchers eventually followed
Unix/C with Plan 9 and then the commercial Inferno OS and the Limbo language,
but they did not achieve the earlier successes of Unix and C.
The popularity of computer languages has often been fickle. After the
computer industry collectively embraced C/C++ in the 90s, its attention now
seems poised to fragment again, into Go (Google), Swift (Apple), C# (Microsoft),
Java (Sun/Oracle, Android), Objective C (Mac OS X and iOS, although with Apple
working on Swift, the writing is on the wall for Objective C), and a smattering of
scripting / interpreting languages for the web such as PHP, Python, Ruby, and
Perl.
For embedded systems, though, C is STILL the most popular high-level
programming language, even for the 32-bit segment. As long as price and
performance are the prime driving factors in the embedded space, C’s future is
still assured.
C. THE C STANDARDS
The definitive technical description of this language is the “ISO C Standard.” You
can find that on the web by using search terms, but in order to see the full content
of the released Standard documents, you must pay a fee to the Standards
Organization[42]. Luckily, there are also various drafts, FAQs, summaries, and
plenty of helpful information on the web and in printed literature that you can
peruse for free. Most of the time, the draft versions are nearly identical to the
release versions.
There have been three major releases of the C Standard. Summarizing, with
emphasis on language differences (standard library differences are omitted):
ANSI C89 / ISO C90[43] - the first C Standard; 1989. This mostly codified
existing practices and resolved some of the differences between compiler
implementations up until that time. In particular, the original compilers were based
on the AT&T compilers, but by 1989, many commercial compilers written by other
vendors had sprung up, sometimes disagreeing with the AT&T compilers on
language syntax and semantics.
C99 (ratified in 1999) - This release added inline functions, interspersed
declarations and statements, variable-length arrays, and // as a single line
comment delimiter.
C11 (2011) - This release added new keywords and macros for alignment,
thread-local-storage, and anonymous struct / union.
Here are some drafts of the standards, with their availability subject to change
from the hosting organizations. Later drafts might also exist.
C90 Draft:
http://web.archive.org/web/20050207005628/http://dev.unicals.com/papers/c89-
draft.html
C99 Draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
C11 Draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1494.pdf
Most embedded C compilers implement C89 or a subset of the ISO C99. This
book describes C as implemented by the ImageCraft JumpStart C compilers,
which implement C90 with some C99/C11 features, including // comment,
anonymous struct / union, interspersed declarations and statements, and for loop
variable declarations. Additional C99 and C11 features are also scheduled to be
added.[44]
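The newer language features named above can be seen together in a short sketch. The function and type names here are made up for this example; a compiler that implements only C90 without these extensions would reject it.

```c
// A single-line comment, as added in C99.

/* Sum the integers 1..n. */
int sum_to(int n)
{
    if (n < 0)
        return 0;
    int total = 0;                  /* declaration after a statement (C99) */
    for (int i = 1; i <= n; i++)    /* loop variable declared in the for (C99) */
        total += i;
    return total;
}

/* A tagged value using an anonymous union (C11): the members i and f
   are accessed as if they were direct members of struct value. */
struct value {
    int is_float;
    union {
        int   i;
        float f;
    };
};

int value_as_int(struct value v)
{
    return v.is_float ? (int)v.f : v.i;
}
```

Without the anonymous union, the union would need a member name, and each access would have to be written with that extra name, for example v.u.i instead of v.i.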
D. C COMPILERS AND THE RUNTIME ENVIRONMENT
Regarding Compilers
A compiler is a software program that translates source files written in a high level
programming language, e.g. C, to a set of machine instructions for a target
machine.
There are C compilers available for various targets, ranging from large
mainframes to the smallest 8-bit MCUs. ImageCraft produces two popular
embedded system C compilers: JumpStart C for the Atmel AVR, and JumpStart C
for the Cortex-M. When implementation-specific topics are covered, this book
uses either of these compilers as examples.
While a program written in C and compiled to assembly might be slower than a
program written in assembly (from 1.5x to possibly 5x as slow, depending on the
compiler and the target machine), this overhead is usually well worth the tradeoff,
due to the numerous advantages of writing and modifying programs in a high level
language.
Building a C Program
The source of a C program is typically separated into multiple source files so that
it is easier to write and maintain. The source files are processed by multiple
programs, eventually resulting in a “program image” that can be downloaded into the
microcontroller. The term “C Compiler” is commonly used to refer to the entire
chain of tools, even though properly speaking, a C compiler is only one part out of
many. In the narrow context, a C compiler translates a C source file into assembly
code for the target CPU.
The C source file with .c extension is processed by the C preprocessor, then by
the compiler proper and the assembler; and finally the linker combines all the .o
files and .a library files into a set of output files. The .bin and .hex output files are
the output image in binary and hexadecimal format.
A top level program called a compiler driver hides the compilation process from
the users, and the Integrated Development Environment (IDE) hides everything in
an easy to use GUI.
Compiler Passes
A “compiler” is typically made up of a few different programs, as shown in the
diagram on the previous page. Each stage of conversion between the original C
source file to the final executable file is called a “compiler pass”.
Compiler Driver
Not shown in the table above, the compiler driver is responsible for the top level
user interaction, and it processes the input files by calling the different compiler
passes in sequence.
C Preprocessor
A C source line that starts with the # symbol is a preprocessor directive: a specific
instruction to the preprocessor to perform some behavior (see <Chapter 8. The C
Preprocessor>). Certain text in the source file might be textually replaced by other
content if it is affected by such directives. For example:
#include <stdio.h>
replaces the line with the contents of the file stdio.h. By convention, files that
are to be #include’d by C source files use a .h extension.
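As a further illustration of textual replacement, the sketch below combines #include with the #define directive. The macro and function names are made up for this example; CHAR_BIT is the standard macro from limits.h giving the number of bits in a byte.

```c
#include <limits.h>          /* this line is replaced by the contents of limits.h */

#define BUFFER_SIZE 16       /* every later BUFFER_SIZE becomes the text 16 */
#define BIT(n) (1u << (n))   /* BIT(3) becomes the text (1u << (3)) */

/* Number of bits in a buffer of BUFFER_SIZE bytes. */
unsigned buffer_bits(void)
{
    return BUFFER_SIZE * CHAR_BIT;
}

/* A mask with bits 3 and 0 set. */
unsigned example_mask(void)
{
    return BIT(3) | BIT(0);
}
```

By the time the compiler proper sees this file, the names BUFFER_SIZE and BIT no longer appear; only their replacement text does. Here example_mask() returns 9.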
C Compiler Proper
The C compiler proper translates C language input into assembly language.
Assembler
The assembler translates an assembly language input into object code.
Linker
The linker combines any number of object files with the required library files to
form the final output executable files. The executable images are “burned” or
“downloaded” onto the target machine, e.g. the JumpStart MicroBox kit, and run.
The C Abstract Machine
A programming language can be described by its syntax and the semantics of the
language elements. A subtle implication is that the language also defines an
“abstract machine” that the generated code depends on, including the sizes,
behaviors, and representations of data types, and the memory model expected by
a running program.
The C programming language is a relatively simple language. Most C operators
and variable accesses compile to a few machine instructions. More involved
operations, for example floating point or 64-bit integer operations, might compile
to calls to internal library functions provided by the compilers.
Therefore, in most cases, a C Abstract Machine can be implemented with just
tens of instructions in the startup code, which takes initial control when a C
program is run. After the startup code sets up the C environment, it turns control
over to the user function “main”.
C Startup
The C compiler inserts startup code in the program image (the loaded executable
version of the compiled program) that takes control as the CPU resets. The
startup code sets up a C abstract machine by:
Initializing the stack pointer
Zeroing the bss segment (see later section) - global and static variables (see
<Chapter 5. Variables>) that do not have explicit initialized values.
Copying the initialized values of the global and static variables that do have
explicit initialized values from the program image to the data segment (see
later section).
After that, the startup code transfers control to the function main, which enters the
user code. main is not expected to return, and if it somehow does, typically the
startup code will then just loop forever.
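The zeroing and copying steps listed above can be simulated on any machine with ordinary arrays standing in for the memory regions. This is a sketch only, not the actual JumpStart C startup code: all names are made up, real startup code gets the region addresses from the linker, and initializing the stack pointer requires assembly or compiler-specific code, so it is left out.

```c
#include <string.h>

/* Stand-ins for the memory regions (hypothetical names and sizes). */
const unsigned char flash_data_image[4] = { 1, 2, 3, 4 };  /* initial values stored in the program image */
unsigned char ram_data[4];                                 /* data segment in RAM */
unsigned char ram_bss[8];                                  /* bss segment in RAM */

/* Simulate what the startup code does before calling main. */
void sim_startup(void)
{
    /* Zero the bss segment. */
    memset(ram_bss, 0, sizeof ram_bss);

    /* Copy the initialized values from the program image to the data segment. */
    memcpy(ram_data, flash_data_image, sizeof ram_data);

    /* Real startup code would now call main, and loop forever if main returns. */
}
```

After sim_startup() runs, every byte of ram_bss is zero and ram_data holds the values 1, 2, 3, 4, regardless of what the arrays contained beforehand.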
The Code Segment
Executable instructions are placed in the machine’s code segment, which usually
is located in read-only memory. In most MCUs, this corresponds to the flash
memory, often built into the MCU itself. The size of the code segment is known to
the compiler tools. A typical compiler may allocate the space in the code segment
as follows:
Low Memory Address
…
Startup code
Code for user function 1
Code for user function 2
…
Code for user function n
Code for C library function 1
Code for C library function 2
…
Code for C library function n
[ Data Segment ]
[ BSS Segment ]
…
…
(stack frames grow toward low memory addresses)
stack frame for function b ← current stack pointer
stack frame for function a
stack frame for function main ← top of stack (TOS)
High Memory Address
main is the root of the program (the first user function in a program that initiates
the other function calls), so its stack frame is located at the highest address. As each
function call is made, a new stack frame for that particular function is allocated.
When a function call exits, its stack frame is deallocated (freed up for other use).
Heap Memory
C allows allocation of dynamic data structures at runtime through the Standard C
Library functions malloc, calloc and free. These objects are allocated in the
heap memory, also located in RAM. Heap objects are accessed through pointers.
Heap objects are managed entirely by the user using the library functions. These
objects are created and destroyed as needed, depending on the program’s
runtime requirements.
To minimize the possibility of memory corruption, heap memory is placed to start
after the BSS segment, and memory allocation for the heap grows toward higher
memory addresses. Memory for dynamic objects may be reclaimed by calling the
function free; the object is then destroyed (deallocated). The heap space is
managed by library functions which compact and merge free heap space as
needed.
Low Memory Address
[ Data Segment ]
[ BSS Segment ]
heap object 1 ← start of the heap objects
heap object 2
heap object 3
(heap objects allocated toward high memory address)
…
…
(stack frames grow toward low memory address)
[ Stack Segment ]
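A minimal sketch of heap usage with the Standard C Library functions follows. The function names are made up for this example. Note that malloc and calloc return a null pointer when no memory is available, which is a real possibility on a small microcontroller, so the result should always be checked.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate an array of count ints on the heap, all set to zero.
   The caller must eventually pass the returned pointer to free. */
int *make_zeroed_counters(size_t count)
{
    return calloc(count, sizeof(int));
}

/* Use a temporary heap object: fill it with 1*1, 2*2, ..., n*n and
   return the sum. Returns -1 if the allocation fails. */
int sum_of_squares(size_t n)
{
    if (n == 0)
        return 0;

    int *buf = malloc(n * sizeof *buf);   /* create the heap object */
    if (buf == NULL)
        return -1;

    int total = 0;
    for (size_t i = 0; i < n; i++) {
        buf[i] = (int)((i + 1) * (i + 1));
        total += buf[i];
    }

    free(buf);                            /* destroy the heap object */
    return total;
}
```

For example, sum_of_squares(3) returns 14. After the call to free, the pointer buf must not be used again, because the memory may be reused for another heap object.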
[1]
If you do not obtain a license, the program runs in demo mode and is fully functional for 45 days. This will
be sufficient for the simpler programs.
[2]
Step by step set of instructions on how to do something.
[3]
Teletype and paper tape, when you go far enough back in the history of C.
[4]
The actual example you see might be slightly different from the excerpts in this book, as things might have
been changed in minor ways since publication.
[5]
See <JumpStart MicroBox Preparation> section earlier.
[6]
“Arguments” are values being passed to a function that you are invoking.
[7]
A VERY trivial fact is that there is actually no such thing as a “negative constant” in C, but rather it is a
negation operator applied to a positive constant. However, there is absolutely no difference between that and
a negative constant, except to score points as a C geek.
[8]
This is the original definition for ASCII. There are now extended ASCII codes.
[9]
A nonzero value is considered as “true” in C.
[10]
Seriously.
[11]
Traditionally, Unix uses just the code NL for this, Windows/DOS needs CR followed by NL, and
Mac OS (the pre-OS X versions) uses CR alone.
[12]
The C technical term is compatible type, which will be explained in chapter <Variables and Type
Declarations>.
[13]
But see _Bool Data Type in chapter <Types and Declarations>
[14]
Most people do not bother to #define other operators. Perhaps the fact that the symbols && and || look
so different from operators in other languages drives people to do this.
[15]
Except bitfields, but they are not standalone objects.
[16]
The if-else statement will be described in the chapter <Statements>.
[17]
That is, find the first nonzero digit from the right and discard all the zeroes to the right of it. The number of
digits left is known as significant digits. The number of decimal significant digits is not exact, because floating
point uses binary representation.
[18]
One exception is when a read access to a volatile I/O register effects a change in the I/O state. This is
explained later in <I/O Register Access>
[19]
An earlier attempt of this book tried to separate Types and Declarations into their own chapters, and
the results weren’t pretty.
[20]
Although using more than 3 dimensions is very rare.
[21]
There is no enum type in the original C definition, hence earlier programs made use of #define. Since
its introduction in C90, enum is the preferred mechanism.
[22]
This is a typical way to construct a linked list. See chapter <Advanced Data Structures>.
[23]
It’s not incorrect; it just has no effect whatsoever.
[24]
JumpStart C for AVR only.
[25]
C actually allows you to also write “const int”; it means the same thing, as C is flexible in type
keyword placements.
[26]
These symbols “decorate” other parts of the declaration.
[27]
It must be preceded by at least one named argument. Variadic functions are described in more detail
later.
[28]
Which you really should not be doing. Yes, this book stresses this point any chance it gets.
[29]
Some languages allow an argument to be “call by reference”, allowing modifications to a passed
argument to be reflected in the calling function. Under the hood, they basically implement the pointer passing
and modification through the passed pointer.
[30]
Unless you really know what you are doing. See “Accessing the Stack Address” in the chapter
<Advanced Topic: Effective Pointer and Array Usage>
[31]
Whitespace in front of the # character is ignored.
[32]
Essentially, “building a project”, a term used by the IDE (Integrated Development Environment), basically
just means compiling all of the source files to object files and then linking them together into one executable.
[33]
Unfortunately, off-by-one errors and “hacking until it works” are commonly encountered phenomena in
programming. :(
[34]
Ad nauseam? :p
[35]
Not to be confused with a “Flying Spaghetti Monster”.
[36]
The official C Standard does not include binary notation, but 0b is a common extension accepted by a
number of C compilers, including the JumpStart C Compilers.
[37]
By referring to power-of-ten instead of the typical power-of-two, hard drive manufacturers are “cheating”
you on expected memory bytes
[38]
MSB = “most significant bit”
[39]
As opposed to signed char, or signed short, etc.
[40]
The original AT&T, with a monopoly on phone service, is the foremother (“Ma Bell”) of the current 2000s
AT&T cell phone carrier
[41]
B and BCPL have similar operators to C, but they have no concept of data types.
[42]
Almost as if they want to encourage people to use a non-standard C instead. Short-sighted, much?
[43]
Originally it was ANSI C89, an American standard. When it was adopted by the International Standards
Organization (ISO), it became ISO C90, but it’s the same standard.
[44]
Other implementation choices that are not applicable will not be discussed here; for example, ImageCraft
JumpStart compilers do not support multibyte characters or non-US locales.
[45]
Although it is possible to estimate the stack segment usage by performing program analysis.