
C for Everyone
The JumpStart Guide to C
by
Richard Man & C.J. Willrich

Copyright © 2015 ImageCraft Creations Inc.
All Rights Reserved


http://c4everyone.com
info@imagecraft.com


DEDICATION
int main(void) {
    printf("Thanks to %s, %s, %s, and %s\n",
           "David W. Krumme", "Dennis M. Ritchie",
           "Brian W. Kernighan", "Phillip J. Plauger");
    return 0;
}

To cjw for all your love and patience


TABLE OF CONTENTS
SECTION I – TUTORIAL INTRODUCTION
1 – INTRODUCTION
SECTION II – THE C PROGRAMMING LANGUAGE
2 – BASIC ELEMENTS OF C
3 – EXPRESSIONS AND OPERATORS
4 – STATEMENTS
5 – VARIABLES
6 – TYPES AND DECLARATIONS
7 – FUNCTIONS
8 – THE C PREPROCESSOR
9 – THE STANDARD C LIBRARY
SECTION III – ADVANCED TOPICS IN C
10 – EFFECTIVE POINTER AND ARRAY USAGE
11 – DYNAMIC DATA STRUCTURES
SECTION IV – APPENDICES
A – INTRODUCTION TO COMPUTER ARITHMETIC
B – A BRIEF HISTORY OF C
C – THE C STANDARDS
D – C COMPILERS AND THE RUNTIME ENVIRONMENT


SECTION I
TUTORIAL INTRODUCTION

Using example programs, we will examine the basic structures and features of C
programs.

1. INTRODUCTION


This book, C for Everyone - The JumpStart Guide to C, is part of the JumpStart
MicroBox education kit package. The purpose of this book is to teach the
programming language C, using program examples running on the JumpStart
MicroBox hardware.

In addition, the document “JumpStart MicroBox Hardware” focuses on the
hardware aspect of the kit and the document “JumpStart API” describes our API,
which makes getting started with Cortex-M much easier, eliminating much of the
tedium and potential mistakes in the low-level peripheral setup procedures.

If you are already proficient in C and want to get started with embedded
programming, you may skim this book and concentrate on the example section
that is focused on embedded programming. Be warned that unless you are a “C
Wizard”, chances are that there is information in this book that may be useful but
unknown to you. Therefore, we recommend at a minimum that you skim the
content.

A college level course syllabus can be constructed around this book and its
examples, suitable for both hardware and software engineers.

Finally, “can you learn C from this book without the MicroBox hardware?” The
answer is undoubtedly Yes! Section II, The C Programming Language, is a
concise practical introduction to the C language. Reading through this chapter and
Section II alone will give you a good understanding of the C language. So
whatever platform you are using C on, look at the examples and text, and “type
away”.

JumpStart MicroBox Preparation
Before you start, since we will make use of the JumpStart MicroBox to run the
example projects, please follow the Quick Start Guide document and do the
following:

1. Install the JumpStart C for Cortex compiler[1]
2. (Optional) Obtain a license for the JumpStart C for Cortex compiler
3. Install the PuTTY terminal program
4. Install the ST USB driver for the ST Nucleo
5. (Do not attach the ACE Shield to the ST Nucleo yet!)
6. Connect the ST Nucleo to the PC and make sure that the driver is working
properly
7. Follow the instructions on how to invoke the IDE, build, and run the Blink-
Nucleo-LED program
8. In the IDE, activate the Hello World project, build, and run the project
Focus of This Chapter
Our aim in this chapter is to show the essential elements of C by presenting
examples that you can run on the JumpStart MicroBox hardware using the
JumpStart C for Cortex-M compiler tools.

This chapter will only use the most basic features of C, as this is a quick
introduction chapter. More advanced topics such as pointers and structures -
which are key to effective C programming - will be presented later, along with
more comprehensive descriptions of some of the topics covered here.

The book assumes that you have some basic understanding of computer
arithmetic. If the terms bits and bytes, or CPU do not mean anything to you, then
you should read Appendix A <Introduction to Computer Arithmetic>.

What Is C?
C is a programming language. A programming language is a created language for
communicating instructions to the Central Processing Unit (CPU), the “heart” of a
computer or microcontroller system. Programming languages are described by
formal rules and definitions. The syntax of a programming language refers to what
lexical characters (letters, symbols, etc.) can be used in a program, and where
they may occur. The semantics of a programming language refer to the meanings
of the program elements.

C belongs to a class of programming languages called procedural languages.[2]
Most algorithms can be expressed in C easily. Since its introduction in the early
1970s, followed by an explosive rise in popularity in the late 1970s and the 1980s,
C has become the primary programming language of any new CPU (except for
extremely limited or specialized ones).

C is especially suited for low level programming since much of the low level
access code can be written in C, avoiding the need to deal with machine
languages, which are difficult to use. C does have many modern programming
language features, striking a good balance between power, ease of use, and
efficiency.

The First C Program
If you have followed any programming language tutorials, you may have
encountered “Hello World”, which is a typical first test program that prints out that
eponymous phrase. This practice was in fact popularized by the “C Bible”: The C
Programming Language, by Brian W. Kernighan and Dennis M. Ritchie. The
objective of the program is to print the words

hello, world

[3]
Life is simple on a Unix machine, where you use a terminal and create a text
file containing the following text:

#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}

On a Unix machine, assuming you save the program in a file named hello.c, using
a shell prompt, you type (‘$’ is the “shell prompt”):

$ cc hello.c
$ ./a.out
hello, world

The first line is the command to run the C compiler (named cc, clever eh?). The C
compiler generates an output file called a.out. The second line runs the program
a.out and the third line shows the output as the program is run.

However, most people do not use a shell on Unix. For Windows, it’s a lot more
complicated because it requires a lot of support code to create a “window” etc. For
embedded systems, it is complicated because there is no standard method of
“writing to a terminal”. Fortunately, we have made the process easy with the
JumpStart MicroBox.

“Hello World”
With the IDE, open the file main.c for the “Hello World” project:


Compared to the hello.c on the previous page, this file looks a bit more
complicated. For now, you can ignore all the elements that are not in the original
program (they appear after line 17, omitted here), as they are for setting up the
microcontroller environment.

What do we have here? All C programs make use of functions and variables. A
function contains statements specifying the computing operations. The operations
may use variables to store values. For example, starting at line 10[4], there is a
function called main.

The function main is special, as it is the function that will be run first in a C
program, after the C environment is set up.

When run, its output can be viewed using a terminal emulator program such as
PuTTY[5]:


The first line “ImageCraft JumpStart…” is produced in the Setup function.
The second line “hello, world” is the output we are interested in.

Comments
Examine the first 3 lines of the file main.c:

/*
* Hello World example
*/

This is called a comment block, and is ignored by the compiler. A comment block
is any text enclosed by the /* */ pair. They are often used to describe what the
program does or what a piece of code does.

BEST PRACTICE: do not state the obvious, for example:

/* assign 5 to “i” */
i = 5;

repeats what the code is in prose form, which is rarely useful. However:

/* use 5 as the starting seed to get good random value */
i = 5;

is more useful, as it explains why this is being done.

A line comment is a comment that starts with //. Any characters after // are
ignored by the compiler until the end of the line. For example:

i = 5; // use 5 as the starting seed to get good random value

#include
After the comment block, we have two lines, starting with the # symbol:

#include <stdio.h>
#include <jsapi.h>

These are include file directives. When compiling, the compiler inserts the
contents of these files (stdio.h and jsapi.h) in place of these lines.

#include and all lines starting with # are preprocessor directives and are
described in the chapter <C Preprocessor>.

For now, it is sufficient to know that stdio.h contains information about the
standard input / output library, and jsapi.h contains information about the
JumpStart API.

Semicolons
You may notice that there are semicolons “;” in a few places in the example.
Semicolons are statement terminators, informing the compiler that the end of a
statement has been reached.

Unlike some languages, indentations and white space have no effect on the
meaning of a C program. Carriage returns (or, properly speaking: end-of-line
markers) affect the C preprocessor and terminate a single-line comment
beginning with //, but otherwise are not a part of the C syntax.

Function Definition
The basic form of a function definition, e.g. main, looks as follows:

int main(void)
{
// function body
}

int is the data type of the return value of the function main. A more formal
skeleton of a function definition looks like this:

return-type function-name ( parameter-list )
{
<list-of-statements-and-declarations>
}

We will discuss data types further in the chapters <Variables> and <Types and
Declarations>. Following the data type, you write the name of the function,
followed by a list of arguments[6] between a pair of parentheses ( ). In this
example, main takes no arguments, which is denoted by the keyword void
in the argument list.

The function main contains these statements:

Setup();
printf("hello, world\n");

return 0;

The first two lines are function calls and the last line is a return statement.

An example of a function that takes arguments:

char *strcpy(char *dst, const char *src)
{
// body
}

The initial char * is the function return-type, strcpy is the name of the
function, and “char *dst, const char *src” is the argument list. There are
two arguments:

1. the first argument is named dst, and has the data type char *
2. the second argument is named src, and has the data type const char *

These data types will be explained later. The statements of a function are
enclosed in a set of braces { } .

Calling a Function
Invoking a function is colloquially known as “calling” a function. The preferred
method of communicating information when calling a function is to provide a list of
values, called arguments.

A function call is written as the name of the function, followed by a list of
arguments enclosed by a pair of parentheses ( ):

Setup();
printf("hello, world\n");

In this example, there are two function calls. The first one is Setup and it is called
without any arguments, hence the empty parentheses. Setup initializes the
microcontroller environment using the JumpStart API and will be discussed later.

The second function call is printf, which is a library function that prints out its
argument. The argument in this case is a literal string, which is a sequence of
characters enclosed in a pair of double quotes. In C, a literal string is also known
as a string, or a string constant.

In this example, the string constant is “hello, world\n”. Note that this string
constant has the characters \n. This is an escape sequence.

Escape Sequence
There are characters that cannot be typed inside a string constant. For example,
the newline character, which corresponds to hitting the ENTER key on the keyboard
and moves the output to the beginning of the next line, cannot be input directly as
part of a string constant. If you write:

printf(“hello, world
\n”);

Compiling this piece of code will result in many error diagnostic messages. For
example, JumpStart C produces:

!E x.c(7): syntax error; found `world’ expecting `)’
!E x.c(7): syntax error; found `world’ expecting `;’
!E x.c(7): missing “
!E x.c(7): undeclared identifier `world’
!W x.c(7):[warning] expression with no effect elided
!E x.c(7): syntax error; found “); … expecting `;’

To get over these limitations, escape sequences are used. The sequence \n in
the string is C notation for the newline character. When printed, it advances the
output to column one on the next line.

All escape sequences start with the backslash character \. The most common
escape sequences are:

A Program to Print Miles to Kilometers Conversion
The next program prints a table of conversion from miles to kilometers. With the
IDE, open the “Miles to Kilometers” project. When run, the output should look like
this:


main looks like this:


Again, we will ignore the call to Setup for now.

Variables are for storing values used in a program. In this example, miles, end,
increment, and kilometers are variables used in main.

Variable Declaration
Before using a variable, you must write a declaration for it:

int miles = 20;

This declares a variable with the name miles. The declaration must appear
before any reference to the name miles.

int is the data type of miles. In the sample code, we have a separate
declaration for each variable, but they can be written in a single declaration
statement:

int miles = 20, end = 90, increment = 5;

The expression “= 20” after miles is called an initializer. The = is the
assignment operator and the value of the right hand side (20) is assigned to the
variable on the left (miles). You can write a declaration without an initializer, and
use the assignment statement separately:

int miles;
miles = 20;

The general form of a variable declaration is:

<data-type> <variable-name> <optional-initializer> ;

Expression Statement
The key computation happens on line 21:

int kilometers = miles * 1.60934;

or written as separate declaration and assignment:

int kilometers;
kilometers = miles * 1.60934;

This is the mathematical formula which converts miles to kilometers, written in C.
The symbol * is the multiplication operator.

While Loop
The expression kilometers = miles * 1.60934 computes one value of
kilometers. To print out the table, you can write a series of statements, each
one computing a new value of kilometers based on the current value of miles.

kilometers = 20 * 1.60934;
// print it
kilometers = 25 * 1.60934;
// print it
kilometers = 30 * 1.60934;
// print it


Or you can use a while loop:


Lines 14 to 16 declare three variables and assign them initial values. Line 19
is the while statement. The expression inside the parentheses following the
while keyword is called the test conditional.

To use a loop, first we initialize miles with the value 20. We want the conversion
to end when miles reaches 90, so the variable end contains this final value.

The body of the loop is a compound statement - from line 20 to line 24 - which is a
series of statements surrounded by a set of { } . You might notice that the
body of a function is in fact a compound statement.

With a while loop, the body of the loop is run again and again as long as the
test conditional (on line 19) is true:

while (miles <= end)

The conditional tests whether miles is less than or equal to end, and if true,
the loop will run again. <= is the “less than or equal to” relational operator.

Starting with a set of initial values, and given the end condition, all we need to add
to make our loop work is to update the miles variable in the loop body so the
loop will eventually terminate. This is done in line 23:

miles += increment;

The += is the addition-assignment operator. It adds the right hand side expression
to the variable on the left hand side, and is equivalent to writing:

miles = miles + increment;

Note that instead of using the end variable, we could have also used a numeric
constant on line 19:

while (miles <= 90)

Similarly, the variable increment does not change, so the expression can be
rewritten as

miles += 5;

Variables and #define Constant
We have seen two instances where the variables do not change values and the
identical program can be written using the numeric constants. When should a
variable be used and when should a symbolic constant be used? A property of
using a variable is that its use would be self-documenting if a good name is
chosen. Consider

miles += increment;

versus

miles += 5;

The constant 5 seems more arbitrary whereas using the name increment is
more deliberate. Moreover, if the value is used in more than one place, using a
variable means that if a change is necessary, you only need to change it in one
place. For example, imagine the value is used in more than one place, and you
want to change the value from 5 to 6. If you have used the constant 5, you will
need to find all occurrences of 5 and check if it is referring to the value in question
or some other use of the number 5, then change it to 6. This could be tedious and
error-prone. However, if you have the variable increment, then you only need
to change the initial assignment code.

Nevertheless, using a constant might have a slight runtime speed advantage. A
compromise is to use the #define C preprocessor directive:

#define INCREMENT (5)

Any reference to INCREMENT will be replaced by its definition (5). There are
good reasons why 5 is inside a pair of parentheses, which will be explained later. C
is case-sensitive; therefore, words like Increment, increment, etc. do not match the
word INCREMENT exactly and will not be replaced.

The same program fragment, then, can be written as:

int miles = 20;

#define END 90
#define INCREMENT 5
while (miles <= END)
{
int kilometers = miles * 1.60934;
// print
miles += INCREMENT;
}

A common convention is to use UPPERCASE for #define names, but this is not a
requirement.

EXERCISE: rewrite the program using the things that you have learned so far.

For Loop
A while loop is not the only looping construct in C. You can also write a for
loop:

#define END 90
#define INCREMENT 5
for (int miles = 20; miles <= END; miles += INCREMENT)
{
int kilometers = miles * 1.60934;
// print
}

One thing to notice is that C encourages writing succinctly. Partly this is because
C was designed in an era when slow 300 baud teletypes were the primary
interface, so the fewer characters used, the better. However, terseness does not
have to mean unreadability; a best practice is to make your code succinct but clear.

A for loop combines several elements of a loop in the for expression:

for ( init-expression ; test-conditional ; post-expression )

After the keyword for, a pair of ( ) surrounds the for expression, which in the
case above is a list of 3 expressions separated by two semicolons.

The init-expression is run once, before the test conditional and the for loop body.
Usually you write variable initialization(s) in the init-expression. You may also
optionally declare the variable here, if it has not been declared before.

The test conditional serves the same function as the test conditional in a while
loop: the loop body will run as long as the test conditional is true.

The post-expression is run after the loop body is run, but before the next test
conditional check.

With a for loop, all the mechanisms related to the looping construct are collected
in a single for expression. While the same code can be done using a while
loop, it is more readable to see the initial condition, the test condition, and the
loop-increment in the same place.

By the way, the init-expression may contain multiple initialization expressions,
separated by commas:

for (a = 0, b = 1, c = 2; …

EXERCISE: rewrite the example project with a for loop and #define values.

printf Format Code
Astute readers may notice that something is going on with the call to printf in
the “miles to kilometers” conversion program:

printf("%d\t%d\n", miles, kilometers);

The first argument to printf is a string constant and specifies a format string. A
format string may contain format codes. A format code starts with the character %
followed by format specifiers. The specifiers can get quite involved, with many
options, as explained in the chapter <The Standard C Library>. For now, it is
sufficient to know these common codes:

Codes Descriptions
%d print the argument as a signed decimal number
%x print the argument as a hexadecimal number
%u print the argument as an unsigned decimal number
%f print the argument as a floating point number
%s print the argument as a string
%c print the argument as a character

printf processes the format string one character at a time. If it sees a %<code>
format code, it fetches the next argument and prints it out according to the format
specifier. Otherwise, it prints out the character it sees.

VARIADIC FUNCTIONS: Most functions are defined to have a fixed number of
arguments, including “no argument”. You can write functions that take a variable
number of arguments; they are known as variadic functions. printf is such a
function. Variadic functions will be discussed in depth later.

Integer Constants
Integer constants are numbers, e.g. 42. A negative constant has a - prefix, e.g.
-42[7]. For symmetry, you may also write a positive number with a + prefix, e.g. +42.

In normal writing, numbers are written in base 10. That is, each digit in a number
is multiplied by a power of 10:

123 = 1*100 + 2*10 + 3*1
    = 1*10^2 + 2*10^1 + 3*10^0

Other number bases are possible; hexadecimal is base 16. In C, hexadecimal
constants are written with a 0x or 0X prefix, and the letters ‘A’ to ‘F’ and ‘a’ to ‘f’
are used to represent the numbers 10 to 15. For example, 0xA is 10, 0x1A is 26
etc.

This will be explained further later.

Character Constants
Enclosing a character inside a pair of single quotes ' ' is the C method of writing a
character constant. You may write:

int c = 'C';

The value of a character constant is its numeric value in the compiler’s
environment character set. In English speaking countries, the ASCII code is
almost always used.

ASCII Code
ASCII (American Standard Code for Information Interchange) is a standard for
encoding characters. There are plenty of ASCII tables on the web, but we can
even write a C program to print out the values. The important portion is:

#include <ctype.h>

printf("dec\thex\tcharacter\n");
for (int i = 1; i <= 127; i++)
{
    printf("%d\t0x%x", i, i);
    if (isprint(i))
        printf("\t%c", i);  // tab over to the character column
    printf("\n");
}

The line before the for loop prints out the table header. Notice the use of the
escape code \t to print out tabstops. ASCII is a 7-bit code[8], with values from 0
to 127. The for loop skips the null character (value 0) and runs through the
remaining values, 1 to 127.

The first printf inside the loop prints out the decimal value and the
hexadecimal value of the loop variable i.

The if statement executes the if-body if the test conditional is “true”[9]:

if (isprint(i))
printf(“%c”, i);

The function isprint is a function in the C Standard Library. The #include
<ctype.h> statement provides information to the compiler about this function.
isprint returns a nonzero value if the input argument is a printable character.

printf is called to print out i as a character by using the format code %c.

EXERCISE: Modify one of the existing projects, or create a new one, and print out
the ASCII codes as above.

Floating Point Data Type
The “miles to kilometers” conversion program prints the converted values as
whole integers, even though the conversion factor 1.60934 is a floating point
number. Open the “Miles to Kilometers - FP” project, and when run:


The “Kilometers” are now output as floating point numbers.

The changes in the program are minor:

float kilometers = miles * 1.60934;
printf("%d\t%f\n", miles, kilometers);

The data type for kilometers is now float instead of int. A more subtle
change is that inside the string constant argument to printf, it now reads
“%d\t%f\n” instead of “%d\t%d\n”. The %f format code prints out a floating
point argument.

Floating point is almost an “advanced topic” for embedded programming, since
floating point operations result in much longer sequences of machine instructions,
and the resulting code would be larger and slower than code not using floating
point. However, in this introductory chapter, we want to introduce the concept that
C contains other data types besides the basic integer types, and the floating point
type is a natural follow-on.

Integer and Floating Point Conversion
C allows you to intermix integer and floating point expressions:

float kilometers = miles * 1.60934;

miles is an int. When it is multiplied by a floating point number, 1.60934, the
compiler converts miles into a floating point number and a floating point
multiplication is performed. The floating point result is assigned to kilometers.
In the earlier example:

int kilometers = miles * 1.60934;

kilometers is an int in this case. For this example, just like the previous case,
miles is converted into a floating point number and a floating point multiplication
is performed. However, since the target of the assignment is an int, the
multiplication result is converted into an int, and then assigned to kilometers.

C has precise (but sometimes misunderstood) rules on what happens when you
mix expressions with different data types, as we will see later.

IMPORTANT: be sure to get a good understanding of C’s promotion and
balancing rules in the chapter <Expressions and Operators>. The rules are simple
but may be non-intuitive. Not fully knowing these rules often results in subtle
defects in a C program.

Fixed Point Data Types
A good alternative to using floating point computations, especially for embedded
systems, is to use fixed point computations. A fixed point number consists of an
integer, usually represented in two’s complement, and a scale factor. There is no
single “standard” scale factor, and each unique scale factor is considered as a
separate fixed point type. Obviously in a given program, only a limited number of
scale factors - maybe even just one - will be used. The choice of the scale factor
depends on the expected value range of the data being processed by the
program.

Arithmetic operations with fixed point objects are well-defined, and run faster than
floating point operations, if the target device does not have native floating point
instructions, which is the case with most microcontrollers. The downside of using
fixed point is that the value range of a particular fixed point type is limited
compared to the floating point type. Thus, it is a trade-off between the flexibility of
floating point versus the speed of fixed point.

Standard C does not provide fixed point data types or operations. Therefore often
there is no choice but to use floating point. However, an extension called
Embedded C does define these. It is expected that JumpStart C will implement
fixed point support in 2016, and this book will be updated at that time.

Arrays
An integer or floating point variable is called a scalar variable since it can hold
only one value at a time. C also has compound or aggregate variables that can
hold multiple values. The simplest aggregate variable type is an array.

Open the project “Miles to Kilometers - FPArray”. This is exactly the same as the
previous “Miles to Kilometers - FP” project, except that the results of the
conversions are also stored in an array:

int miles = 20;
int end = 90;
int increment = 5;
#define NUM_OF_ELEMENTS ((90-20)/5 + 1)
float kilos_array[NUM_OF_ELEMENTS];

printf("Miles to Kilometers conversion\n");
for (int i = 0; miles <= end; i++)
{
    float kilometers = miles * 1.60934;
    printf("%d\t%f\n", miles, kilometers);
    miles += increment;
    kilos_array[i] = kilometers;
}

An array declaration looks like this:

<data-type> <variable-name> [ <number-of-elements> ] ;

Just like a scalar variable declaration, it starts with the data type of the variable,
followed by the variable name, then followed by the number of array elements
surrounded by a pair of [ ]s. All array elements must have the same type.

In this example, kilos_array is an array of float items. When you declare an
array, you must specify its dimension (the number of array elements). In C, array
dimensions must be a constant value, therefore we use the constant expression
(90-20)/5+1 which evaluates to the value 15. As in real math, C expressions
use precedence rules to determine the order of evaluation of the subexpressions,
hence the need to use parentheses to force the subtraction to be performed first,
but there is no need to put parentheses around the division since it will be
evaluated before the +1 addition.

The +1 in the dimension is needed to account for the last element to be stored.
Without it, the array will be one element too small.

The loop has now changed from a while loop to a for loop using the variable i
as an index to store into the array kilos_array. An array index starts with 0 and
should not exceed dimension-1, or NUM_OF_ELEMENTS-1.

Array Indexing
To access an array element, you write the array variable name, followed by an
index enclosed in [ ]. An array index must be an integer or an integer
expression:

kilos_array[i] = kilometers;


BUG ALERT: C does not check whether an index is within the range of the array.
For example, a C compiler will accept an access to kilos_array[-1] or
kilos_array[NUM_OF_ELEMENTS], with the latter being an easy mistake to
make. For example, this is a wrongly-written terminating condition:

#define NUM_OF_ELEMENTS ((90-20)/5 + 1)
float kilos_array[NUM_OF_ELEMENTS];
for (int i = 0; i <= NUM_OF_ELEMENTS; i++)
    kilos_array[i] = 0;

By incorrectly using the less-than-or-equal <= comparison operator, the last
element accessed would be one beyond the dimension of the array.

Reading an out-of-bound array element returns a random value, depending on
how the variables are laid out in memory, but writing to an out-of-bound array
element would almost certainly cause a problem. Unfortunately, this bug may not
show up until later, and bad memory writes such as this are a major source of
bugs in C.

Despite the major problems, there are good reasons why C does not perform
index bound checking, as we will see later in the chapter <Pointers and Arrays>.
Nevertheless, be careful with array indexing!

Character Arrays
Arrays are commonly used to store characters:

char spaceship[] = { "NCC-1701" };

Unlike some programming languages, C does not have a string data type. Arrays
of char are used instead. In the above declaration, the array variable spaceship
is initialized with the string constant "NCC-1701". Notice that the number of
elements is omitted in the declaration, leaving an empty []. Since this is a
declaration with an initializer, the compiler determines the number of elements
needed and allocates enough space for spaceship.

A char array holding a string needs a way to determine the end of the string. This
is done by storing a numeric value of 0, also known as the null value, after the last
character of the string. For example, spaceship looks like this in memory:

spaceship:
| ‘N’ | ‘C’ | ‘C’ | ‘-’ | ‘1’ | ‘7’ | ‘0’ | ‘1’ | 0 |

Each | | pair denotes an 8-bit memory cell (a byte). The integer value 0 ends
the string.

The size of a char array holding a string includes the terminating 0, so
spaceship has 9 elements for its 8 visible characters.

Chapter Review
Through example programs:

There are special words in C (called keywords), for example, int, while,
return.
We have seen the basic structure of a C program.
There are good coding practices such as writing clear and useful comments.
Functions contain code that performs computations.
There are different kind of statements in a function, such as for, while,
return, and if.
Functions are called with arguments, containing information you wish to
communicate to the function.
printf uses format codes to output arguments in different forms.
printf is a variadic function, as it can take a variable number of arguments.
There are different kinds of operators, such as + - * / ++
Variables are objects that hold values.
A variable declaration informs the compiler of a variable’s attributes such as its
name and data type.
C has integer and floating point scalar data types.
C has aggregate data types such as arrays.
Out-of-bound array access can cause runtime problems and is difficult to
detect.
Character arrays are used to store strings, and are terminated by a null
element.
SECTION II
THE C PROGRAMMING LANGUAGE

This section explains the elements of the C programming language.

2. BASIC ELEMENTS OF C
This chapter presents the basic building blocks of C.
Keywords
Some names in C have special meanings. They are called keywords:

JumpStart C also adds the following extended keywords:



__flash
__firstarg
__typecode

Keywords are case sensitive.

Bits, Bytes, Words…
A bit is either on or off, represented by the values 1 and 0 respectively. In vastly
simplified terms: in electrical circuits, an “on” bit means that current is flowing
through, and an “off” bit means that current is not flowing.

Computer memory and other storage units group bits into addressable units, most
commonly 8 bits into a byte. Units of 16 bits and higher have different names,
depending on the native word size:

Footnote
[10]
for Nibble

In C, bit numberings are read from right to left. Programmers count starting from
zero, so the bit on the far right, being considered the initial bit, is normally referred
to as “bit 0”. Bit 0 is also called the least significant bit (LSB), and the “zeroth bit”.
In a byte, bit 7, the far left bit, is likewise referred to as the most significant
bit (MSB) and the “seventh bit”.

In a 32-bit word, the leftmost bit is the MSB, and it is bit 31. The most common
number representation used by modern CPUs is the 2’s complement form. In this
representation, one bit is the sign bit and usually the MSB is designated as such.
A sign bit of 0 denotes a non-negative number and a sign bit of 1 denotes a
negative number.

NOTE #1: One may argue that the rightmost bit is the “first bit”, the MSB would be
the 8th bit, or 32nd bit etc. However, most programmers use the 0th bit and 7th
bit/31st bit nomenclature.

NOTE #2: More information can be found in the Appendix <Introduction to
Computer Arithmetic>.

Integer Constants
Integer constants are numbers, e.g. 42. A negative constant has a - prefix, e.g.
-42. For symmetry, you may also write a positive number with + prefix, e.g. +42.

In normal writing, numbers are written in base 10. That is, each digit in a number
is a power of 10:

123 = 1*100 + 2*10 + 3*1
    = 1*10² + 2*10¹ + 3*10⁰

Other number bases are possible; hexadecimal is base 16. In C, hexadecimal
constants are written with a 0x or 0X prefix, and the letters A to F and a to f are
used to represent the numbers 10 to 15. An example of converting a hexadecimal
number to the decimal equivalent:

0xC0DE = C*16³ + 0*16² + D*16¹ + E*16⁰
       = 12*4096 + 0 + 13*16 + 14
       = 49152 + 208 + 14
       = 49374

Hexadecimals are useful since a byte is 8 bits and half of a byte is a nibble, which
is 4 bits. A hexadecimal digit fits in a nibble exactly, and is particularly useful
when used for low level bit patterns.

Octal is base 8. In C, octal constants are written with a 0 prefix. An octal digit
fits in 3 bits exactly and thus is favored by some programmers. The only valid
digits are 0..7.

Binary is base 2. Before the C23 revision, the C Standard did not define a binary
notation, but most compilers support the extension of using 0b and 0B as the
prefix for a binary number. 0 and 1 are the only valid digits.

SUFFIXES: by default, the data type of an integer constant is int (or a wider type
if the value does not fit in an int). The following suffixes are available to change
the data type of the constant. Note that you may use the uppercase or lowercase
letters for the suffixes:


BUG ALERT: it is easy to forget that a 0 prefix means an octal number, and not a
regular base 10 number!
Character Constants
Enclosing a character inside a pair of single quotes ' ' is the C method of writing
a character constant. You may write:

char c = 'C';

Character constants are integers and can be assigned to an int variable as well:

int ch = 'C';

The value of a character constant is its numeric value in the compiler’s
environment character set. In English speaking countries, the ASCII code is
almost always used.

You can also use escape sequences (see String Constants below) in a
character constant.

String Constants
A literal string, or string constant, is a sequence of characters enclosed in a pair of
double quotes.

"hello, world"
"I am \"alive\"!"

To write a double quote character " inside a string, you precede it with the
backslash character \, as shown in the second string above.

You can also put an arbitrary numeric value into a string by using numeric escape
sequences:

"hello,\x20world"

Numeric Escape Sequences
An escape sequence can be used in a character or string constant to insert a non-
printable character such as a tabstop, a new line, or a backspace.

octal escape sequence - you write: \d or \dd or \ddd where each d is an
octal digit (i.e. 0..7)

hexadecimal escape sequence - you write \xh or \xhh or \xhhh … where x
is the letter x, and each h is a hexadecimal digit (i.e. 0..9, A..F or a..f). The
earlier example, \x20, is hexadecimal 20, which is the space character in the
ASCII set.

A numeric sequence terminates with the first invalid character for the sequence
(an octal sequence also ends after at most three digits). A common numeric
escape sequence is ‘\0’, the null character.

Escape Sequences
The complete set of non-numeric escape sequences is:

As you can see, a number of the escape sequences are for formatting output on
old style CRT terminals and line printers.

CR and NL require a bit more explanation: CR moves the cursor to the beginning
of the line and NL moves the cursor one line down. A CR-NL combination does
what most people expect the keyboard key “ENTER” or “RETURN” to do[11]. In a
standard-conforming C compiler, the escape sequence \n generates the right
sequence for the target environment. In microcontroller targets, it only matters
when communicating with the terminal emulator running on the host through the
UART: the low level character output function must map ‘\n’ into the sequence
that the host requires.

Integer Data Types
C has multiple integer data types, providing a choice of how much space a
variable may take and the range of the values it can hold.

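SIGNED DATA TYPE: A signed data type holds negative and positive values.

UNSIGNED DATA TYPE: An unsigned data type holds non-negative values only
(zero and positive values).

Any signed integer type, e.g. int, can be written with the signed prefix, e.g.
signed int, but the prefix is not needed. The exception is char: whether a plain
char is signed or unsigned is up to the compiler, so write signed char or
unsigned char when it matters.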

Signed and unsigned integers
The size or width of a data type or variable is expressed in number of bits.

The range of a data type depends on its width and whether it is signed or
unsigned. Let n be the number of bits in a data type. A signed type uses n-1 bits
to store the magnitude and 1 bit to store the sign. An unsigned type uses all n
bits to store the magnitude.

Signed range = -2ⁿ⁻¹..(2ⁿ⁻¹ - 1)
Unsigned range = 0..(2ⁿ - 1)

NOTATION: The .. between two integers denotes a range, e.g. -32768..32767
means from -32768 to 32767 inclusively.

C does not dictate the widths of the basic integer data types. However, the
following relationships and conditions must be observed by a conforming C
compiler:

1. The width of a char must be at least 8.
2. The width of a short must be at least 16 and must be at least as wide as
a char.
3. The width of an int must be at least 16 and must be at least as wide as a short.
4. The width of a long must be at least 32 and must be at least as wide as an int.

The beauty of C is that by not enforcing fixed size data types, C can be compiled
to efficient code on most architectures. The resulting programs will maintain a
large degree of portability if the programmers are careful with their work. This
combination of efficiency and portability is one of the reasons that makes C the
most widely available language across all architectures.

Choosing an Integer Data Type: Using the Native C Type
With choices for integer data types, the question becomes which type to use. A
set of simple rules for the Cortex-M or most modern 32-bit CPU is:

1. If space usage is a consideration, and you know that the range of the values
will not exceed 8 bits, then use signed char or unsigned char.

2. If space usage is a consideration, and you know that the range of the values
will not exceed 16 bits, then use short and unsigned short.

3. Otherwise, just use int or unsigned int.

Floating Point Constants
A floating point number can be written in different ways:

Decimal: a decimal point is used, e.g. 3.14159267 or 1000.0023 etc.
Decimal notation is the “normal” way of writing a floating point number.

Scientific Notation: using the e-notation, e.g. 1.0000023e3. The number to
the left of the letter e (uppercase E can also be used) is called the digit term,
and the number to the right of the letter e/E is called the 10’s exponent.

Scientific notation is a preferred method to write very large or very small
floating point numbers. For example:

0.0000314159267 is 3.14159267e-5
3141592.67 is 314.159267e4

A normalized floating point number in scientific notation is one where the digit
term always has one and only one digit to the left of the decimal point. For
example:

3.14159267e-5
3.14159267e6

are normalized numbers. To convert a number in scientific notation to decimal
notation, if the 10’s exponent is positive, then move the decimal point of the
digit term to the right by the 10’s exponent number, adding zeroes as needed.
If the 10’s exponent is negative, then move the decimal point of the digit term
to the left by the exponent number, adding zeroes as needed. A leading 0.
is then written in front (e.g. 3.14159267e-5 is 0.0000314159267).

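f or F suffix: a floating point constant without a suffix has the type double.
Adding an f or F suffix makes it a float, and an l or L suffix makes it a long
double. In standard C, the suffix can only be attached to a constant already
written in decimal or scientific notation; an integer constant such as 10 must be
written as 10.0f, since 10f is not accepted by a standard C compiler.

10.0f
10.0L
3.14159267f
3.14159267e6L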

Floating Point Representations
At the lowest level, the “brain” of a computer, the Central Processing Unit (CPU)
operates on bits, with each bit having a value of 0 or 1. The CPU has basic
arithmetic operations (such as add, subtract, multiply) for signed and unsigned
integer types.

Most CPUs do not support floating point number operations directly. Most C
compilers use the IEEE floating point formats to store a floating point number
internally. The internal representation is similar to using scientific notation, except
that instead of using 10’s exponent, 2’s exponent is used due to the binary nature
of computer arithmetic.

There are two common IEEE floating point formats, corresponding to C’s floating
point data types float and double:

Note that storing a written decimal floating point number, such as 3.14159267, in
the internal FP format might not be exact since the internal format is in binary.
That is:

float pi = 3.14159267;

if (pi == 3.14159267)


might not compare as expected. This will be discussed further later.

3. EXPRESSIONS AND OPERATORS

Algorithms
Algorithms are step-by-step instructions on how to solve a problem. For example,
given the question “how do you convert miles into kilometers”, the answer is to
use a mathematical formula

kilometers = miles * 1.60934

Indeed, programming in a procedural language such as C can be said to be an
exercise in finding the correct algorithms and data structures to solve the
particular problem.

Expressions and Operators


Operators are special symbols that are shorthand for operations, e.g. + and -
denote addition and subtraction respectively. C has a rich set of operators,
which contributes greatly to the expressive power of the language.

A C expression is simply a group of names and operators that produce a value. It
is defined from “bottom-up”:

1. A term is a name, a numeric or a string constant
2. A term is an expression
3. The result of an operation is an expression
4. Operand(s) of an operation are expression(s).

With these rules, you can build arbitrarily long expressions. We have already seen
examples of expressions:

kilometers = miles * 1.60934

Or even

printf("hello, world\n");

is an expression consisting of a single function call operator.

Operator Precedence Levels and Associativity


Given an expression with two different operators <op1> and <op2> with 3
operands:

<expr1> <op1> <expr2> <op2> <expr3>

For example:

c = i + j * k;

The operands of the addition operator + are either i and j, or i and the result of
j * k. Which operands are grouped with which operator depends on the relative
precedence levels of the two operators: operands group first with the operator
that has the higher precedence level.

In this example, since multiply * has higher precedence than +, the operands of
* are j and k, and operands for + are i and the result of j * k.

An operator may have the same precedence level as another operator. Let’s say
both <op1> and <op2> below have the same precedence level:

<expr1> <op1> <expr2> <op2> <expr3>

The term grouping or associativity refers to the grouping of the operands if the
operations have the same precedence level. There are two possibilities:

Left-to-right associativity means that the grouping is from left to right in order.

Right-to-left associativity means that the grouping is from right to left.

For example:

c = i + j - k;

Plus + and minus - have the same precedence level. Since their associativity is
from left to right, the operands of + are i and j, and the operands of - are the
result of i + j and k.

A set of parentheses ( ) can be used to override grouping and precedence level:

<expr1> <op1> (<expr2> <op2> <expr3>)

With the parentheses, <op2> is always done with <expr2> and <expr3> as
operands.

Precedence Levels And Associativity Table


The following table summarizes the precedence levels (from highest to lowest)
and the associativity of all the C operators.

ADVANCED TOPIC: Order of Evaluation
Note that the precedence level and associativity do not dictate the order of
evaluation of the subexpressions. For example:

c = (i * j) + l / m * 4;

To make the C interpretation of grouping clearer, let’s rewrite this as:

c = (i * j) + ((l / m) * 4);

While the groupings of the operands are clear, the order of evaluation of the
subexpressions is not defined. That is, (i * j), (l / m), and ((l / m)
* 4) may be evaluated in this order:

1. (l / m)
2. (i * j)
3. (<result of step 1> * 4)
4. <result of step 2> + <result of step 3>

Or it could be

1. (i * j)
2. (l / m)
3. (<result of step 2> * 4)
4. <result of step 1> + <result of step 3>

A C compiler may generate either set of code. This is important since some C
operators have side effects (e.g. the increment operator ++) and different
evaluation orders may produce different results. For example, the fragment

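int i = 5;

i = ++i + i++;   // 13?

printf("i is %d\n", i);

produces a few different answers, depending on the evaluation order. Strictly
speaking, because i is modified more than once in the same expression, the C
Standard leaves the behavior of this fragment undefined.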

ADVANCED EXERCISE: Read the description of the increment ++ operator in a
later section, then determine all the possible values printed for i. Explain the
reasons.

Overflow
In pure mathematics, there is no limit to how large an integer can get, and a floating
point operation will always produce a floating point result. This is not the case with
computer arithmetic. For example, with a 32-bit int, C does not say what will
happen if the result of a signed operation exceeds 32 bits. This is called overflow.
(Unsigned arithmetic is the exception: the result is defined to wrap around.)
Overflow can also occur when converting a floating point value to an integer
value.

Most CPUs include a method to determine whether an overflow has occurred, but
there is no portable C method of accomplishing this. Most CPUs simply discard
the overflowed bits and truncate the result. For mission-critical programming, this
is a condition that you need to be aware of.

Case Study: When Overflow Matters, the Case of the Ariane V
In 1996, on the maiden voyage of the European Space Agency’s Ariane V, the
rocket self-destructed because the engineers used an integer variable that was
“too small” and caused a catastrophic failure. In summary, a series of events led
to this failure:

The code checked the horizontal velocity of the rocket to ensure that the
rocket was in the correct orientation.

The sensor data was collected in floating point format. To make it easier and
faster to manipulate, the data was then converted into integer format.

16-bit integers were chosen to store the converted data. Unlike other places in
the program where similar conversions were done, there was no overflow
check for this particular code fragment because the rocket would “never” fly
fast enough to cause an overflow to happen. (Never say “never”…)

The same code had been running successfully on the previous Ariane IV
rocket launches.

This piece of code should only have been run at the start of the launch and
then been disabled afterward, but the engineers left it running during the first
40 seconds of the launch in order to make it easier to restart the rocket in case
there was a countdown hold.

The Ariane V was much more powerful than the Ariane IV rockets, with much
faster acceleration and speed.

The failure happened when the horizontal velocity overflowed the 16-bit
integer at 36.7 seconds into the flight. With no overflow checking, the CPU
determined that something must have gone wrong.

The failure was confirmed when a redundant CPU executing the identical
software experienced the same conditions.

The rocket was in fact operating correctly, but the control system initiated a
self-destruct.

As software gets more complex, more reuse of existing code inevitably occurs. It
is important to make embedded code as robust as possible. While there is no
foolproof way to write 100% robust embedded system software, in hindsight,
clearly a few major mistakes were made in the software implementation in this
case.

Sometimes “over-spec”-ing a system is far more important than worrying about
performance issues. In the particular example of the Ariane rocket, they could
have:

1. used a 32-bit integer to store the result of the floating point conversion, and

2. checked for an overflow condition regardless of what size integer variables
were being used, and

Type Promotion
Unless it is being used as the operand of the sizeof operator, an integer
expression has one of four types: int, unsigned int, long, and unsigned
long. If the expression does not have one of these types, the compiler promotes
its type. The table below summarizes promotions for integer and floating point
operands.

Integer Value Promotion
When a type is promoted, the value also changes, by using the following rules:

Type Balancing
When you write a binary operator with arithmetic operands, the compiler balances
the operand types using these rules:

Both operands are converted to the balanced type, and the operation is done
using the balanced type. If the operands do not have unsigned int and long
type, then select the wider operand type and use the section 2 entries.

BUG TRAP: Type Does Not Propagate Downward


A common misconception is that type requirements are propagated downward in
an expression. That is:

// assuming int is 16 bits and long is 32 bits
long l;
int x, y;

x = 0x7FFF;
y = 0x7FFF;
l = x * y;

// expected result:
// l == 0x3FFF0001
// actual result
// l == 0x0001

The expectation is that since the left hand side has the type long, that the right
hand side will use long multiply. In fact, this is not true: type promotion and
balancing work from bottom-up and not top-down: “x * y” above is performed
using int multiply and not long multiply since both operands are of type int.
Multiplying 0x7FFF with 0x7FFF overflows 16 bits and the excess bits are
dropped.

For JumpStart C for Cortex-M, this is not an issue, since int and long are both
32 bits. Nevertheless, this is a gotcha that C programmers should be aware of.

BUG TRAP: Unexpected Byte Promotion Effect


Given this example, does the if test succeed?

unsigned char uc = 0xFE;

if (~uc == 0x1)


~ is the bitwise inverse operator: any bit that is one is flipped to zero and any bit
that is zero is flipped to one. Therefore, it would seem logical that for an 8-bit
value of 0xFE (binary 0b11111110), flipping the bits would result in 0b00000001, or
0x1.

Reasonable logic, but unfortunately incorrect in C. As the promotion rules dictate,
before applying the bitwise inverse operator ~, uc is zero-extended into an
int, e.g. to 0x000000FE. After applying the ~ operator, the result is 0xFFFFFF01,
and thus the comparison “~uc == 0x1” will fail.

This is an issue with any type narrower than int, not just the char type.

Assignment Operators
The assignment operator assigns the value of the right operand to the left
operand.

The data type of the operands must be “similar”[12]. The left operand can be one
of the following (this list corresponds with the example list in the table above). The
meaning of the operators mentioned in the list will be explained in a later section:

1. A variable name
2. Dereference of a pointer expression or pointer variable
3. An array element
4. A structure member reference
5. A pointer to structure member reference

In C, the arithmetic, the bitwise, and the shift operators have assignment-forms,
e.g. += is “add and assign”, -= is “subtract and assign”, etc.

i += j;

is equivalent to writing

i = i + j;

The only difference between the two forms is that in the assignment-form, the left
operand is evaluated only once. The assignment-forms are listed in their
respective operator sections.

Arithmetic Operators
Arithmetic Operators mainly correspond to their mathematical counterparts.
Arithmetic operands must have integer, floating point, or pointer data types.
Pointers are one of the most important features in C, and will be explained later.

Notice that arithmetic operators have assignment-forms.



INCREMENT AND DECREMENT OPERATORS: Increment operators add one to
the operands. Decrement operators subtract one from the operands. There are
two flavors: when written in front of the operand, they are called preincrement /
predecrement. Preincrement / predecrement operators perform the operations,
and then the results of the operations are used as the results of the expression.

When written after the operands, they are called postincrement / postdecrement.
The operand values are used as result of the expressions, and then the
operations are performed.

In standalone expressions, there is no difference between pre- and post-
operations:

int i = 5;

++i;

is the same as writing

i++;

However, if used in a larger expression, there will be a difference:

int i, j, k;

i = 5;
j = ++i;
i = 5;
k = i++;

At the end of the code fragment, j has the value of 6, but k has the value of 5.

The increment and decrement operators are particularly useful when used in
combination with the pointer indirection * operator, explained later.

Comparison / Relational Operators


A comparison operator compares two operands, and then produces either 1 (used
for “true”) or 0 (used to say “false”) as the result. A comparison operator is usually
used in the conditional test of an if, while, or for statement, but it can also be
used in other contexts, e.g. as operand of an arithmetic operation.

BUG ALERT #1: You cannot cascade comparison operators. That is, the
following piece of code probably does not do what you meant:

if (0 < i < 10)


you must write the following instead:

if (0 < i && i < 10)


EXERCISE: Nevertheless, if (0 < i < 10) is legal C. What does it mean?

BUG ALERT #2: Sometimes it is easy to mistakenly write the assignment
operator = instead of == inside a conditional check. Unfortunately, it is legal C and
may even be useful for advanced developers to write such an expression, but the
compiler does not necessarily warn beginners against such usage.

Test Conditional Context


if, while, do-while, and for statements take a test-expression and make a
decision based on whether the expression is true or not. For example, in the
“miles to kilometers” conversion program, we have the while loop:

while (miles <= end)


the while loop continues to run as long as the conditional test is satisfied. Unlike
some other languages, C does not have a boolean data type[13], i.e. one that is
either true or false. In C, if a test expression is needed and the expression is not a
comparison operation, then any nonzero result is considered as true and only a
zero value is considered as false. In other words:

while (miles != 0)


is the same as writing

while (miles)


Since C allows nested expressions, a sequence of statements such as this:

int i = afunction();

if (i != 0)


can be reduced to:

if ((i = afunction()) != 0)


and even further reduced to:

if (i = afunction())


However, this last reduction is not recommended, as it could be a typo for
if (i == afunction()). To eliminate such ambiguity, the earlier forms are
preferred.

Logical Operators
Logical Operators evaluate multiple conditional expressions.

In C, the logical AND (written as &&) and logical OR (written as ||) combine
comparison expressions. For example, to see if a character array is a
hexadecimal number:

#include <ctype.h>   // declares isdigit

// return 1 if "str" contains a hexadecimal number.
// Only the uppercase characters X A..F are checked
int isHexadecimal(char str[])
{
    if (str[0] == '0' && str[1] == 'X' &&
        (isdigit(str[2]) || 'A' <= str[2] && str[2] <= 'F'))
        return 1;
    return 0;
}

The basic algorithm is that a string is a hexadecimal if it starts with the
hexadecimal prefix, then followed by a hexadecimal digit. The if statement
implements this algorithm.

For brevity, only upper case letters are checked. The if test-expression consists
of three primary tests:

if (<test for first character> && <test for second character>
    && <test for third character>)


The first two tests are translated into C in the obvious way. The third test checks
whether

1. the 3rd character is a digit, or
2. the 3rd character is between ‘A’ and ‘F’.

The logical AND && has higher precedence than logical OR ||. Therefore, the
“test for third character” is implemented as:

(isdigit(str[2]) || 'A' <= str[2] && str[2] <= 'F')

Logical operators perform short-circuited evaluation. For logical AND, if the left
operand is zero, then it will not evaluate the right operand, since the result is
going to be zero regardless of the result of the right operand. Not only does this
save execution time, but there are instances where without short-circuited
evaluation, the program would run incorrectly:

if (LOW < x && x < HIGH && EmitLaser(x))


The intention is to check the range of a control variable x, and if it is in range, then
call the function EmitLaser—which as the name suggests, may emit a laser
light.

Without short-circuited evaluation, the code has to be written as follows to avoid
calling EmitLaser with out-of-range data:

if (LOW < x && x < HIGH)
{
    if (EmitLaser(x))

}

For Logical OR, short-circuited evaluation means that if the left operand is
nonzero, then it will not evaluate the right operand, since the result is going to be
non-zero regardless.

BEST PRACTICE: avoid “macro abuse”, using #define to give these operators
“better” names[14], e.g. some people write

#define AND &&
#define OR ||

This is not recommended. Doing this makes your programs much less readable
to experienced C programmers.

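BUG ALERT: Logical operators have lower precedence than bitwise operators, so
an expression like the one below groups the same way with or without the inner
parentheses. However, bitwise operators have lower precedence than the
comparison operators (status_flag & 1 == 0 means status_flag & (1 == 0)), so it
is a good habit to always parenthesize a bitwise subexpression to get the likely
intended result:

if ((status_flag & 1) && other_condition)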

Shift Operators
Shift operators shift an integer by a specified number of bits. Shifts can be used
for:

multiplying by a power of two,
an unsigned divide by a power of two,
extracting a value packed inside a word

The ARM Cortex-M3 and above (ARM architecture V7m and above) includes
optional shift as part of the addressing mode of load and store instructions. This is
particularly useful for accessing array elements.

Notice that shift operators have assignment-forms. Only integer operands are allowed.

Left shift moves all the bits of the left operand X positions to the left, where X is
the value of the right operand, and fills the lowest order bits with zeroes. For
example:

unsigned char uc = 0b01110101;
uc <<= 3;

after the operation, uc will have the value 0b10101000. The upper 3 bits are
shifted off to the “bit bucket” and are lost. Left shifting by X bits is the same as
multiplying the operand by 2 raised to the power X:

var << 1 → var * 2¹ → var * 2
var << 2 → var * 2² → var * 4
var << 3 → var * 2³ → var * 8
… etc.

Right shift moves all the bits of the left operand X positions to the right, where X
is the value of the right operand. For a signed integer type, the sign bit / MSB is
typically replicated from the left (the C Standard leaves the result for negative
values up to the compiler). For an unsigned integer type, the vacant bits are filled
with zeroes. For example:

unsigned char uc = 0b11110101;
uc >>= 3;
// uc is now 0b00011110

signed char sc = 0b11110101;
sc >>= 3;
// sc is now 0b11111110

Right shifting an unsigned operand by X bits is the same as dividing the
operand by 2 raised to the power X. However, right shifting a SIGNED operand is
not equivalent to division when the value is negative, since the two round in
different directions:

unsigned-var >> 1 → unsigned-var / 2¹ → unsigned-var / 2
unsigned-var >> 2 → unsigned-var / 2² → unsigned-var / 4
unsigned-var >> 3 → unsigned-var / 2³ → unsigned-var / 8
… etc.

NOTE: the result of shifting by a negative count, or by a count that is equal to or
greater than the number of bits in the (promoted) left operand, is undefined. For
example, shifting an int by 32 or more bits in Cortex-M is undefined.

Bitwise Operators
Bitwise operators operate on all the bits of the operands. There is no difference
between signed or unsigned operands.

Notice that bitwise operators have assignment-forms. Only integer operands are
allowed.

The bitwise operators apply the bit operations, as described in the
“Implementations” column above, to each bit of the operand(s). For example,
bitwise AND-ing two 8-bit operands:

operand 1: 10101101
operand 2: 01101110
— bitwise AND –––––
result: 00101100

Bitwise operations are commonly used in low level code such as accessing
microcontroller’s I/O registers, writing device drivers etc. Sometimes they are
used because the low level access (e.g. I/O registers) requires it, and sometimes
they are used where size and speed are at a premium.

TIPS #1: Turning on all bits in an unsigned variable: to turn on all bits in an
unsigned variable, assign ~0u to the variable. You will need to cast the
expression if the unsigned result is to be of a narrower type than unsigned
int:

unsigned char uc;
unsigned short us;
unsigned int ui;

ui = ~0u;
us = (unsigned short)~0u;
uc = (unsigned char)~0u;

TIPS #2: Bit toggle: to toggle a bit, exclusive OR it with the value 1.

i ^= 1;   // toggle bit 0 of i

ADVANCED TIPS #1: to swap two same size variables without using a
temporary:

x = x ^ y;
y = x ^ y;
x = x ^ y;

ADVANCED TIPS #2: to isolate the rightmost bit that is ON of an unsigned
variable:

x = x & ~(x - 1);

or

x = x & (~x + 1);

Address-Of and Indirection Operators, and Pointer Variables
All C objects (e.g. variables and functions) are placed in memory. The size and
location of the object is known as its allocation.

Unlike most other programming languages, C allows you to obtain the address of
any object[15]. This is known as taking the address of a variable. There are many
uses for this feature, and they give C much of its usefulness. For now, consider that
with the address of a variable, you can read or write this variable indirectly without
using its name. This is known as aliasing.

Given a variable with data type TypeX, the address of the variable has the data type
“pointer to TypeX”. To declare a variable of the data type “pointer to TypeX”, you
write a * in front of the variable name, to mirror the indirect or dereference
operator:

// “pui” is a “pointer to unsigned”
unsigned ui, *pui;
ui = 42;

// take the address of “ui” and assign it to “pui”
pui = &ui;

printf("ui is %u, the address is %p\n", *pui, (void *)pui);

After the assignment “pui = &ui”, *pui is an alias to ui. The printf call
prints out the value of *pui, which is 42, and the value of pui, which is the
address of ui. The format code %p in printf specifies the argument is a
pointer variable.

You can also modify ui indirectly through pui:

pui = &ui;
*pui = 5;
printf("ui is %u\n", ui);

prints out that “ui is 5”.

BUG ALERT: Illegal Pointer Access
Dereferencing a pointer variable causes undefined behavior (typically a crash or
silent memory corruption) if the pointer variable does not contain the address of
a valid object. Accessing through uninitialized pointers or pointer values that
are beyond the bounds of the allocated objects is one of the biggest sources of
program errors in C.

Array Subscript Operator


To access an element of an array, you use the subscript operator: you write the
name of the array variable followed by the index enclosed in [ ]. The index must
be an integer type.

As we will later see, an array subscript is semantically equivalent to pointer
arithmetic followed by *, the pointer dereferencing operator: str[i] means
*(str + i), so str[0] is the same as *str. In other words:

char str[] = { "hello, world" };

printf("%c %c\n", str[0], *str);

produces the output

h h

(but let’s discuss this more in the chapter <Pointers and Arrays>)

Structure and union Member Access


Structure and union (struct/union for shorthand from now on) are
aggregate types. Unlike arrays, where all elements are of the same data type,
members (also called elements) of a struct/union can be of any data type.
Each member has a name, just like a variable name. struct/union will be
described fully in the chapter <Types and Declarations>.

Without in-depth explanation, here is an example of how the member access
operators . and -> are used:

struct name_record {
struct name_record *next;
char name[20];
unsigned id;
} astruct, *astruct_ptr;

astruct_ptr = &astruct;
strcpy(astruct.name, "John Doe");
printf("name is %s\n", astruct_ptr->name);

produces the output

name is John Doe

Function Call Operator


You call a function by writing the name of a function followed by the list of
arguments enclosed in a set of parentheses ( ). If the function does not accept
any arguments, then the argument list is empty. If there is more than one
argument, they are separated by commas.

BEST PRACTICE: While not strictly required by the C Standard, you should
always provide a function declaration prior to calling it. Otherwise, the return-type
or the argument types in the function may not match the types you called it with,
which may lead to runtime errors. A function declaration is sometimes known as a
function prototype or a function signature.

A function call may return a value. This is indicated by the function prototype. If a
function does not return a value, its return-type is void. A function call that
returns a value may be used anywhere that a value of that type is allowed.
Function calls can be nested (a function call can be used as an argument to
another function).

// a couple of function prototypes
extern int foo(int, int);
extern int bar(int);

// foo() is a "nested" function call
int i = bar(foo(1, 2) * 2) + 5;

ADVANCED TOPIC – “C With Classes”: JumpStart API uses C With Classes to
make the API functions easy to use. “C With Classes” is a feature borrowed from
C++, and in JumpStart API, allows you to write function calls such as this:

porta.MakeOutput(5, OSPEED_LOW);

The syntax is <struct var>.<member function> ( argument list ). C
With Classes is fully described later.

Conditional Operator
The Conditional Operator is the only ternary operator (i.e. with 3 operands) in C.

The first operand of a conditional operator must be of a scalar (arithmetic or
pointer) type, and is evaluated as to whether it is nonzero (true) or zero
(false). If it is nonzero, the second operand is evaluated and the result of the
expression is the result of the second operand. The third operand is not
evaluated. However, if the first operand is zero, the third operand is evaluated
and the result of the expression is the result of the third operand. The second
operand is not evaluated.

The second and third operand must be of compatible types.

A conditional operator can replace an if-else statement[16]. For example, the
code in the table above “int i = isdigit(a) ? 'Y' : 'N';” can be written
in this much longer form:

int i;
if (isdigit(a))
    i = 'Y';
else
    i = 'N';

Besides saving keystrokes, conditional expressions can be used as part of larger
expressions that would otherwise require more complicated control structures
and variables to hold temporary results. For example:

printf("'a' is a digit: %c\n", isdigit(a) ? 'Y' : 'N');

which is easier to read than the alternatives using if-else.

Cast Operator
A Cast operator casts the operand to a result with the casted-to type. It is written
as a type declaration inside a set of ( ) in front of an expression.

A Cast operator can be used for the following purposes. In the examples below,
assume the following variable declarations:

signed char sc;
unsigned char uc;
int i;
unsigned ui;
float f;
double d;
char *pc;
int *pi;
void *pv;

Conversions marked with (*) can be done without writing the cast operator
explicitly, as the compiler may deduce the intended operation by applying the type
promotion rules. This is known as Free Cast.

Converting between signed and unsigned integer types of the same size:
casting between signed and unsigned integer of the same size produces a
value with the same bit-pattern, but with the casted-to type. Mainly useful to
bypass compiler type checking rules.

i = (int)ui; // no code generated, type change only

(*) Converting a smaller integer type to a larger integer type: converting a
smaller signed integer to a larger integer type involves sign-extending of the
operand. Sign extension of a positive number fills the upper bits with zeroes and
sign extension of a negative number fills the upper bits with ones.

Converting a smaller unsigned integer to a larger integer type uses zero-
extension where the upper bits are filled with zeroes.

i = (int)sc; // sign extension
i = (int)uc; // zero extension

Converting a larger integer type to a smaller integer type: the excess bits are
discarded.

sc = (signed char)i; // truncate

(*) Converting a floating point type to a different floating point type: a
compatible value is produced. When converting from a 64-bit double to a 32-bit
float, some precision may be lost.

f = d; // value may overflow
d = f;

(*) Converting an integer to a floating point type: a 32-bit floating point can
only hold about 7 decimal significant digits[17]. An integer with more than 7
decimal significant digits may not be represented exactly in a 32-bit floating
point value.

A 64-bit floating point has about 16 decimal significant digits.

f = (float)i; // value may not be exact
d = (double)i; // exact when int is 32 bits

(*) Converting a floating point to an integer: the fractional portion is discarded,
and then the integral portion is truncated if needed.

i = (int)f; // value may truncate

Converting a pointer type to a compatible pointer type: produces a
value with the same bit-pattern, but with the casted-to type. Mainly useful to
bypass compiler type checking rules. void * is compatible with any data pointer
type.

pc = (char *)pi;
pv = (void *)pi;

(*) Converting integer zero to a pointer type: a pointer value of zero is the null
pointer, signifying that it does not point to a valid location. However, in a
microcontroller environment, it is possible that zero may be a valid address; some
microcontrollers have memory at location 0.

pc = 0;

Converting between a pointer type and an integer type: except for the integer
0, this is an unsafe practice and should be avoided.

ui = (unsigned)pc;
pc = (char *)ui;

Bit patterns may change and casting back-and-forth may produce a different
result than the original value. With the above code fragment, there is no
guarantee that pc still has its original value.

Casting any expression to a “void” type: this discards the result of the
expression.

(void)ui;
(void)function_call();

Sizeof Operator
The sizeof operator is the only operator that uses letters and not symbols. It
returns the size of a data object in number of bytes.

The following are valid operands to the sizeof operator (see examples in table
above):

1. a constant (integer, floating point, character, string)
2. a variable name
3. an array element
4. a struct/union member reference
5. dereferencing of a pointer variable (e.g. *ptr)
6. a type declaration
7. a typedef’ed name (see <Variables And Type Declarations>)
8. a function name (not portable: Standard C does not allow sizeof on a
function, and a compiler that accepts it does not return the size of the
function’s code)

A set of enclosing ( ) is optional for the operand unless the operand is a “type
declaration”. In that case, the ( ) is required. However, for consistency, you
should just enclose all sizeof-operands with a set of ( ).

Comma Operator
The comma operator is just a comma (,), but it is different from the comma used to
separate arguments in a function call or a function prototype, or the comma used
to separate a list of variables in a variable declaration.

The operands of a comma operator are evaluated from left to right. The value of
the left operand is discarded and the result of the comma operator is the value of
the right operand.

A comma operator is most frequently used in the “initial expression” of a for
statement: as we will see in the chapter <Statements>, the general form of a for
statement is:

for (<init expr> ; <test condition>; <post expr> )


The <init expr> is typically for initializing loop variables. If there is more than
one variable you wish to initialize, then you can use the comma operator to
achieve that goal:

for (i = 0, j = 0; …

A comma operator is also useful when you need to perform an operation, possibly
a function call, in a nested expression. For example:

int i = isdigit(a) ? (foo(), 'Y') : 'N';

Using the comma operator, you can call the function foo without affecting the
value of the subexpression.

4. STATEMENTS

C Statements
Most C statements provide control flow mechanisms, allowing the program to
execute pieces of code conditionally or repeatedly. Statements cannot be
embedded inside an expression. Some statements, including break, continue,
and case labels, can only be used inside other statements.

Statement Label
Any statement may be preceded with zero or more labels, in the form of

label:

For example:
// “top” is a label, as indicated by the colon ‘:’
top:
foo = a + b;

Any valid C identifier may be used as a label name as long as the name does not
conflict with another label name within the same function. A label is typically used
as target of a goto statement within the same function body. A label may appear
before or after the goto statement that references it.

Expression Statement
An expression statement is simply an expression. It’s called a statement just so
that we do not need to write things like “the body of a while loop is a statement or
an expression”.

Statements are at the “top level” (i.e. there is no such thing as a “sub-statement”,
unlike subexpressions) and it makes no sense to write an expression without side
effects:

int i, j;

i + j; // and then…?

Therefore, an expression statement usually has an operator with side effects[18]
at the top level. These include assignment operators, increment / decrement
(which is a form of assignment) operators, and function calls.

k = i + j;
--i;
foo();

Compound Statement
A Compound statement is a list of statements and declarations enclosed in a pair
of { }, usually for the purpose of grouping them together as a single statement
that serves as the body of an if-statement, a loop, etc.

The body of a function definition is in fact a compound statement. Note that a
compound statement does not have or need a terminating semicolon.

Null Statement
A Null Statement is just a semicolon. This is useful if the syntax requires a
statement but the program has nothing to perform.

// copy src to dst until a null is encountered
char *strcpy(char *dst, const char *src)
{
char *s = dst;

while ((*dst++ = *src++) != 0)
;
return s;
}

All the work is done in the while condition itself, therefore the while body-
statement is a null statement.

Notice the use of the idiom (*dst++ = *src++) != 0. This single expression
assigns the value pointed-to by src to the address contained in dst, then
increments both pointers, and then the copied value is checked against 0.

If and if-else Statement


Use an if or if-else statement to make a decision in your program.

Syntax Form
if (<if-expr>)
<if-body statement>

if (<if-expr>)
<if-body statement>
else
<else-body statement>

If the <if-expr> is nonzero, then <if-body statement> will be executed.
Otherwise, if there is an else statement, the <else-body statement> will be
executed.

An else statement is always associated with the closest if statement. That is:

if (x > 0)
if (ADC() < 1.0)
{
LED(1);
DelayMS(10);
}
else
LED(0);

The else is interpreted as the else portion of the closest if. If you want to
associate the else to the outer if, you write:

if (x > 0)
{
if (ADC() < 1.0)
{
LED(1);
DelayMS(10);
}
}
else
LED(0);

Enclosing the inner if with the { } forces the else to be interpreted as part of
the first if. (The indentations in the example are for readability purposes only and
have no effect on the interpretation.)

while Loop
Syntax Form
while (<expr>)
<statement>

In a while loop, if the value of <expr> is nonzero, execution continues to
<statement> and the process repeats until the value of <expr> is zero. Within
the loop body, a break statement may be used to break out of the loop. A break
statement is useful if the exit condition is not convenient to be checked at the top
of the loop, see example below.

A continue statement jumps to the test expression immediately, skipping the
rest of the loop body. Example:

char buffer[128];
int len = 0;
int c; // must be int, not char, so that EOF can be detected

while (len < (int)sizeof(buffer) - 1 && (c = getchar()) != EOF)
{
    if (c == '\n') // end of processing
        break;
    if (c == '\b') // backspace
    {
        if (len != 0)
            --len;
        continue;
    }

    buffer[len++] = (char)c;
}
buffer[len] = 0;

The example code fragment stores a line of keyboard input into a character array
named buffer. The test expression ensures that the buffer has enough space
and that the input has not terminated ( != EOF). The loop terminates if the input
is a newline or ‘\n’. If the input is a backspace or ‘\b’, then the last character
written to the buffer is discarded.

A “forever” loop can be written as:

while (1)


A forever loop is useful in main, as it typically serves no purpose for embedded
system firmware to “return”.

Do-while Loop

Syntax Form
do <statement>
while (<expr>);

A do-while loop is the same as a while loop except that the test expression is
tested at the bottom. Therefore, a do-while loop is executed at least once,
whereas a while loop may never execute if the test expression is zero the first
time it is run.

Notice there is a semicolon after the while keyword, as a terminator for the do-
while statement.

For Loop

Syntax Form
for (<init-expr>; <expr>; <post-expr>)
<statement>

A for loop is a shorthand for writing a while loop, but combines an
initial-expression and a post-expression in a single syntactic element.

The semantics for the test expression <expr> and the use of break and
continue statements inside the loop body are exactly the same as in the while
loop. A continue statement will jump to the post-expression, before proceeding
to the test expression check.

Example:
int n = 0;
for (int i = 0; i < 16; i++)
{
channel_reading[i] = readADC(i);
n += channel_reading[i];
}

<init-expr> is typically used for loop variable initialization, and you may
declare the variable in-place, as shown in the above example. You may even
declare multiple variables, separating them by commas:

for (int i = 0, j = 0; …

However, as <init-expr> holds either a single declaration or an expression, you
cannot write multiple declaration statements. The following is incorrect:

for (int i = 0; int j = 0; …

When you declare a variable as part of the <init-expr>, the “scope” of the
variable ends after the for statement body. Scope is discussed in the chapter
<Variables>, but this just means that you cannot access the variable beyond the
for statement body.

If you do not use the declaration syntax, e.g.

for (i = 0, j = 0; …

then it is a list of expressions separated by the comma operators. <post-expr>
is typically used for loop variable increment.

Note that all of the expressions, including the test expression, are optional, and
can be omitted. This is different from the do and while loops, where the test
expression is not optional.

A forever loop can be written as:

for (;;)


as an alternative to

while (1)

break and continue Statements


break and continue statements can be used only in the body of a while,
do-while, or for statement; a break statement can also be used inside a
switch statement body.

Syntax Form
break;

continue;

A break statement exits the loop or the switch statement.

A continue statement skips the rest of the statement body and jumps to the test
expression portion of the while and do-while statements, and to the <post-
expr> portion of the for statement.

break and continue Inside a Multi-Level Loop or a Switch
A break or a continue statement is associated with the closest do-while,
for, or while statement, and in the case of the break statement, also the
closest switch statement. If you have a multi-level loop, e.g. a for loop nested
inside another loop, and you need to break out of both loops at once, you can add
a flag variable or use a goto statement (see <“Dijkstra’s Bastard”: The goto
Statement>, below).

For example,
int done = 0;

for (i = 0; !done && i < 16; i++)
{
while (1)
{
c = getADC();
if (c == 0)
break;
if (c > 10)
{
done = 1;
break;
}
}
}

In this example, the outer for loop normally executes 16 times, and the inner
while loop exits when it receives either a value of 0 or a value greater than 10
from the getADC function. In the latter case, the done variable terminates the
for loop as well.

Return Statement

Syntax Form
return;

return <expr>;

A return statement transfers control back to the calling function. If the function
has a return type, then an expression compatible with the return type must be
specified. If the function has no return type, i.e. a return type of void, then the
return statement must not have a return expression:

return 1; // returning a value

return; // function return with no return value

If the last statement of a function is not a return statement, then the compiler
inserts one so that execution will resume properly.

When a return statement executes, storage for local variables is reclaimed and
the C environment is restored to the calling function.

Example:

int foo(void)
{
return 42;
}

void bar(void)
{
return;
}

Note that the return expression does not need to be enclosed inside a set of ( ).
Some people always write a return statement that way, but it’s optional.

Switch Statement

Syntax Form
switch (<expr>)
{
case <const1>:
break;
default:
}

A switch statement evaluates an integer expression and compares the value to
the case label values in the body statement. If there is a match, then control is
transferred to the statement following the case label. If there is no match, and if
there is a default label within the body, then control is transferred to the
default label. If there is no default label, then execution proceeds to the
statement following the switch statement.

Example:

// check if a character is a hexadecimal or decimal digit
switch (ch)
{
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
is_hex = 1;
// FALL THROUGH
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
is_digit = 1;
break;
default:
is_hex = is_digit = 0;
break;
}

A switch body is usually a block statement. All case labels must have unique
integer constant values within this switch body. Once execution starts at a case
label, it continues until either a break or return statement is executed or the
rest of the switch body is executed, ignoring any intervening case and
default labels.

In some programming languages, execution of a case body terminates when
another case label is encountered. Since this C behavior is potentially an error
case (e.g. the programmer accidentally forgot to put in a break statement), some
source code analysis programs would flag a case body without an unconditional
break statement. The comment

// FALL THROUGH
case …

is often used to inform these tools that this is intentional.

A straightforward implementation of a switch statement is the equivalent of
performing a series of if-else comparisons until a matching value is found or
until all tests are exhausted. A compiler may optimize the generated code by
using a jump table, or by using a binary search over the case values. JumpStart C
for Cortex-M optimizes to use jump tables whenever possible.

“Dijkstra’s Bastard”: The goto Statement


Published in the 1968 Communications of the ACM (CACM), Edsger Dijkstra’s
letter “Go To Statement Considered Harmful” was a seminal and controversial
document admonishing against the overuse of the goto construct. Undisciplined
use of goto makes the program structure look like “spaghetti code”, making it
hard to follow the flow of control and understand the logic of the program.
Dijkstra recommended replacing gotos with structured programming using looping
constructs and modularization using functions.

In C, a goto statement transfers control to the named label, which may appear
anywhere within the same function body. Unlike the use of other names, no
forward reference or declaration is needed for a goto label name.

While goto has a bad reputation and can be abused, it solves several problems
that are difficult to solve otherwise. For example, one of its most common uses is
to break out of multi-level loops. Recall that in <break and continue Inside a
Multi-Level Loop or a Switch>, an extra variable had to be created and checked to
break out of a nested loop to the outer level. Using goto, the example can be
written as:

for (i = 0; i < 16; i++)
{
while (1)
{
c = getADC();
if (c == 0)
break;
if (c > 10)
goto done_label;
}

}
done_label:


Whether you prefer to use goto, or a created status variable, or even to rewrite
the logic of the code to avoid using either method is up to you.

Another use of goto is to utilize a segment of common (frequently used) code
inside a function where modularizing by using a call to another function would
not be feasible, perhaps due to the large amount of context that needs to be
passed to the function.

For example, in JumpStart C’s assembler for Cortex-M, the encodings for the PUSH
and POP instructions are the same as for the LDSTM instructions if certain
conditions are met. The assembler carries a lot of context (e.g. local variables),
and making the common code into a function would mean either changing local
variables into global variables, or passing a large number of variables
(individually or packaged as a struct) to the function. This could have been
solved if C supported nested functions (i.e. a function definition inside another
function), but since it does not, the goto statement is one of the most elegant
solutions:

case S_POP:
if ((reglist & ~0x80FF) == 0)
{
opcode = mp->opcode[0] | (reglist & 0xFF);
if ((reglist & 0x8000) != 0)
opcode |= 0x100;
outaw(opcode);
break;
}
goto _LDSTM;

case S_PUSH:
if ((reglist & ~0x40FF) == 0)
{
opcode = mp->opcode[0] | (reglist & 0xFF);
if ((reglist & 0x4000) != 0)
opcode |= 0x100;
outaw(opcode);
break;
}
goto _LDSTM;

In this code fragment, the special cases of the POP and PUSH instructions are
handled first. If the instructions do not fit into those constraints, then they use
goto to jump into the generic case that also handles the LDSTM instruction.
Without using a goto statement, this code sharing task would have been more
difficult.

5. VARIABLES


We have seen examples of variables and their declarations already. This chapter
discusses variables in detail, and the next will discuss types and declarations.

For the most basic understanding of variables, you only need to know a few
things:

1. Variable types: the difference between local, global, and static variables
2. Initializers: how to write initializers

Once you know these, you can “start writing code”, but we will also discuss
additional information important to mastering C.

Variable Names
A C variable name, also called an identifier, has the following syntactic
restrictions:

1. A variable name cannot be a C keyword.

2. The first character must be either an alphabetic character or an underscore _.

3. Subsequent characters can be alphabetic, numeric, or an underscore.

4. Symbols other than underscores, e.g. - $ % @ etc. cannot be used.

5. Standard C advises not to exceed a certain length for a name for portability
reasons, but most modern compilers support names in excess of 100 characters,
so this is not a concern in most cases.

6. Variable names are case sensitive, e.g. “numberof” is different from
“NumberOf”.

Summary of Variable Types


Here are the “rules of thumb” on which variable type you need:

1. (This is usually the case) If the variable is only needed in a function, e.g. a
variable to hold temporary values, loop counters etc., and the values are
transient, then declare it as a local variable.

2. If you need to retain global information accessible from multiple functions and
it is not feasible to pass the information as function arguments, then declare it
as a global variable.

3. If the variable is only needed in a function, but the value must be kept across
invocations (e.g. an initialized array of constant content, a counter that does
not reset etc.), then declare it as a static variable inside a function.

4. As with all the reasons to use a global variable, but in addition, if all the
functions that need to access the variable are in a single file, then declare it
as a static variable at the file level of that file.

These rules keep the variable declaration close to where it is used, which is
important in program maintenance.

The next few pages elaborate on these.

Local Variables
If you declare a variable inside a function (without the static or extern
storage class, which will be described later), then it is a local variable. Local
variables do not have fixed locations in memory, and are created on demand only
when the enclosing function is running. New copies are created each time the
function runs, and the copies are destroyed when the function returns. Unless it
has an initializer, the initial value of a local variable is indeterminate
(effectively random).

Prior to C99, local variables could only be declared after the { of a compound
statement, before any statements within the block. C99 relaxed that, and allows
variable declarations anywhere a statement is allowed.

You may use the auto or register storage class in the variable declaration,
although their uses are archaic and should be avoided. Since you might
encounter them in sample code (not from us :-) ), a brief explanation is in order.
When used, the storage class is specified before the type. For example:

void foo(void)
{
int i; // local variable with no explicit storage class
register char c; // register storage class
auto unsigned u; // auto storage class

// stuff

}

Notice the placement of auto or register. Originally, register was used as
a hint to the compiler that it should allocate the variable to a machine register if
possible. Since machine registers are faster to access, this is a desirable trait for
frequently used variables. However, most modern compilers use advanced
heuristics for register allocation, and this keyword is no longer needed. auto
explicitly requests the default storage for a local variable (as opposed to
register), so it is never necessary either.

There is one leftover use for the register keyword: any variable declared with
this storage class cannot be used in conjunction with the address-of & operator,
but that’s really minor fallout from the definition and is not of much use either.

for Statement Variable


Another C99/C++ enhancement is allowing variable declaration inside a for
statement:

for (int i = 0; …

When you declare a variable as part of the for statement, the “scope” (described
further in a later section) of the variable ends after the for statement body. This just
means that you cannot access the variable beyond the closing } of the for
statement body.

Function Arguments
Function arguments are special cases of local variables. They act exactly like
local variables, except that their declaration form is different:

• They are declared inside the set of ( ) after the function name
• The argument list is separated by commas, and not semicolons
• You cannot write initializers after their names, nor can you use the obsolete
  auto storage class
• Only one argument variable is allowed per type declaration

For example:

char *strcpy(char *dst, const char *src);

The last bullet means that if there are three int arguments, they have to be
written as follows:

int AddMult(int a, int b, int c);

and not:

// bad syntax
int AddMult(int a, b, c);

Global Variables
If you declare a variable outside a function, and without the static storage
class, then it is a global variable. Global variables have fixed locations in memory.

If a global variable declaration has no initializer, then the variable is initialized to
zero per Standard C requirements.

A global variable declaration may have the extern storage class, but it has two
meanings, depending on whether there is an initializer:

extern int i = 42;

With an initializer as above, the extern keyword has no effect whatsoever;
it’s exactly the same without it. The second case, without an initializer, is
explained in <External Declarations> below.

There are three primary uses for global variables:

1. To represent a persistent global state or a global object. For example, in the
JumpStart API, global variables are used to represent the state of hardware
setup, such as GPIO PORTA, or the I2C etc.

2. To exchange information between functions, as an alternative to using
function arguments.

3. To exchange information between main program code and interrupt handlers.

Since a global variable can be accessed and be modified by any function, its use
should be limited. Consider this scenario: in a function, the code checks whether
a global variable has the value X. In terms of program maintainability, it’s
impossible
to know where this global variable might have received this value without
analyzing the full program and ascertaining where the global variable is modified.
Another example:

int global_var;

…
if (global_var == 1)
…
foo();
if (global_var == 1)
…

In the code fragment, after the call to foo, global_var may or may not have the
same value as before the call. There is no way to determine this except to
examine the code for foo, and all the functions foo calls. The code fragment is
of course contrived, but the principles remain.

External Declarations
An external declaration is a declaration with the extern storage class and
without an initializer:

extern int i;

Its purpose is to declare that there is a global (and not a static) variable named i
defined elsewhere (may even be in the same file). This allows a global variable to
be accessed in other files besides where it is defined. It does not need to be
placed at file level either: if an external global variable only needs to be accessed
in a function, then the external declaration can be placed inside the function:

void foo(void)
{
extern int i;


}

Some people prefer this style to limit where the variable is accessed. You may
write a global variable declaration and an external declaration for the same
variable in the same file.

Definition vs. Declaration: In this book, we use the terms declaration and
external declaration. Some writers also use the term “definition” to refer to the
former. However, even using that terminology, a definition also serves as a
declaration, so we feel it is best to use the terminology we are using.

Managing Global Variable Declarations with Header Files
A common practice is to put all global variables in a single header file that is
#included by all the project source files, written in this form:

(in a header file):

#ifndef EXTERN
#define EXTERN extern
#endif

EXTERN int a_variable;
EXTERN int another_variable;


Let’s say this file is named header.h. Then, in one and only one file that includes
header.h, write:

#define EXTERN
#include "header.h"

This has the effect that this file defines all the global variables mentioned in
header.h, and all the other files declare them as external.

The use of #ifndef etc. will be explained in the chapter <The C Preprocessor>.

Static Variables
A static variable is similar to a global variable except that it is only visible in the
place it is declared:

If declared inside a function, then it can only be accessed within that function
If declared at the file level, then it can only be accessed in the file where it is
declared

A static variable has the static storage class:

static int i;

Like global variables, if a static variable declaration has no initializer, then the
variable is initialized to zero per Standard C requirements.

A function-level static variable could be used to keep track of persistent data that
is only of interest within the function.
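
A minimal sketch of such persistent data, using a hypothetical function next_ticket that counts its own calls:

```c
/* Hypothetical function: returns how many times it has been called so far. */
unsigned next_ticket(void)
{
    static unsigned calls;   /* no initializer: starts at zero,
                                and keeps its value between calls */
    calls++;
    return calls;
}
```

The variable calls cannot be touched from outside next_ticket, yet it is not reset on each call the way a local variable would be.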

Initializers
A variable declaration may be followed by an initializer:

int i = 5;

It is the same as if you had assigned the value before its use:

int i;


// before “i” is used
i = 5;

Using an initializer is more convenient and makes the code clearer.

Initializers for aggregate types (array, struct/union) and pointers are explained
in chapter <Types and Declarations>.

Global and Static Variable Initialization


Initialization for a global or static variable must only contain an expression that the
compiler can evaluate at compile time, known as a compile-time expression:
constants (numeric, literal strings, etc.) and addresses of other global and static
variables. You may use simple arithmetic operators but not function calls, nor a
reference to the value of another variable.

Local Variable Initialization


A non-aggregate local variable initialization may contain any C expression,
including a reference to another variable, or a function call etc. However, an
aggregate local variable must only have constant initializers—the same limitations
as initializers for global and static variables.
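
The difference between the two sets of rules can be sketched as follows. All names here (base, limit, twice, local_demo) are hypothetical, and the rejected forms are left as comments because they would not compile:

```c
int base = 100;
int *base_ptr = &base;          /* address of a global: allowed */
int limit = 4 * 25 + 1;         /* simple arithmetic on constants: allowed */
/* int copy = base;                not allowed: value of another variable */
/* int n = twice(2);               not allowed: function call */

int twice(int x) { return 2 * x; }

int local_demo(void)
{
    int a = base;               /* local: may reference another variable */
    int b = twice(a) + limit;   /* local: may call a function */
    return b;
}
```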

Variable Name Visibility and Scoping


The scope, also called name visibility, of a variable name refers to where that
name can be used in a program:

Visually:


Block scope needs further explanation: each compound statement { }
introduces a new (nested) scope. A variable declared in a block scope is visible
only within that scope and the visibility ends when the matching ending } is
encountered.

The scope rules also dictate when the same name can be reused:

All global variables must have unique names.
File level static variables within a single file must have unique names.
Within a block scope, all static and local variables must have unique names
from each other, but may have the same names as variables declared in an
outer block scope, or at the file scope.

The last bullet means that you can write:

int counter; // global variable

void foo(void)
{
    … = counter; // use of global var
    float counter; // new local variable with same name

    if (counter / 2 > 0)
    {
        char *counter = { "Counter" }; // new local variable

    }
    // "float counter" is in scope again
    … = counter; // use of "float counter"
}

There are three (non-conflicting) declarations of the name “counter”. The
example is for demonstration only, as there is not much point in purposefully
reusing the same name within a set of nested scopes as shown. Reusing a
variable name is most useful for being able to have the same name for counter
variables, or to have a local variable that just so happens to have the same name as
a global variable without creating a conflict.

Variable Alignment
Some CPUs, such as the Cortex-M0, have strict alignment requirements:

1. 16-bit access must be on a 16-bit (i.e. 2-byte) boundary.
2. 32-bit access must be on a 32-bit (i.e. 4-byte) boundary.

The Cortex-M3 and above have support for unaligned access, but even for those
processors, aligned 16-bit and 32-bit accesses are preferred because
access time is shorter. The JumpStart Cortex-M compiler aligns variables at their
natural boundary.


6. TYPES AND DECLARATIONS




This chapter discusses types and declarations. These concepts are intertwined,
and one cannot talk about types without declarations [19].

The _Bool Data Type


Before we move on, we need to introduce one more basic data type: the boolean
data type, which has the values true or false. Some languages have a boolean
data type for the results of comparison and logical operations. Traditionally, C did
not have a boolean type. Instead the integer values nonzero and zero were used
for true and false conditions respectively. The C99 C Standard introduced the
_Bool data type:

_Bool status;

if (status == 1)


A variable of type _Bool can only have the value 1 or 0, but there is no addition
to the C keywords to include true or false. int expressions can freely intermix
with _Bool variables. However, the data type of the result of a comparison
operator is still an int, and not a _Bool.

Most C compilers provide definitions of true and false in the Standard C
header file stdbool.h. In addition, compilers that only conform to the C90
Standard and not the C99 Standard often would provide an alias for _Bool so
that a program could use _Bool and still be compatible with different compilers.
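
As a brief sketch of how this looks in practice on a C99 compiler, the following uses the bool, true and false names from stdbool.h alongside the underlying _Bool type. The helpers in_range and normalize are hypothetical:

```c
#include <stdbool.h>   /* C99: defines bool, true, and false */

/* Hypothetical helper: reports whether a value lies in [lo, hi]. */
bool in_range(int value, int lo, int hi)
{
    return value >= lo && value <= hi;
}

/* Any nonzero value converted to _Bool becomes 1; zero stays 0. */
int normalize(int value)
{
    _Bool flag = value;
    return flag;
}
```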

Derived Types
We have already seen simple declarations such as for int and array variables.
C also allows you to create derived types:

They can be strung together, forming complex declarations if needed:



// “foo” is an array of 10 elements of pointers to function
// returning int
int (*foo[10])();

There is no limit on how you may string derived types together. For example, you
could write a declaration of “a pointer to an array of a structure containing
pointers to functions returning a structure of arrays of unions” and beyond, but
such mind-twisters are probably never required. Indeed, most programming tasks
probably would not go beyond “an array of pointers”.

Pointer Types
When you take the address of an object and store that address in another object,
the second object is said to contain a pointer to the first object. For example,
given the following:

int i = 42;
int *p = &i;

p has the type “pointer to int”. After the second declaration, p contains the
address of i and is said to contain a pointer to i. If you look at the memory cells
containing the variables, they may look like this:

Memory Address    Variable Name    Content
0x1000            i                0x002A (42)
0x1004            p                0x1000

A pointer may contain the address of a data object or the address of a function. If
you are familiar with machine code programming, this is not a surprising feature,
but this might be a new concept for users familiar with only other high level
programming languages.

Pointers are one of the most powerful features in C, allowing C to be used for
embedded systems or to be used in writing system programs such as Linux, OS X
and Windows. Unfortunately, misuse of pointers also causes the majority of the
bugs in C programs.

Since pointers contain addresses, a proper pointer value must be a valid address.
However, a pointer object may contain an invalid address, as long as the program
is not accessing the object through the pointer (a process known as
“dereferencing”). This is why using pointers can be dangerous: a pointer may pick
up an incorrect value at some point during program execution, but the pointer may
not be dereferenced until a different section of the program code. This may
cause the program to crash even though the code otherwise looks correct.

Pointers can be applied to any type, not just basic types such as int, but also
other pointers, or to a struct etc. For a simple pointer type such as “pointer to
int”, the declaration syntax is

<type> *<name>;

// example
int *p;

More complicated pointer declarations will be described in the section on
“Reading a C Declaration”.
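
A minimal sketch of declaring, assigning and dereferencing a simple pointer. The function names are hypothetical, and the null check is one common defensive habit rather than a language requirement:

```c
/* Writes through the pointer: the caller's variable is modified. */
void set_to_zero(int *p)
{
    if (p != 0)      /* never dereference a null pointer */
        *p = 0;
}

int pointer_demo(void)
{
    int i = 42;
    int *p = &i;     /* p contains the address of i */

    *p = *p + 1;     /* dereference: i is now 43 */
    return i;
}
```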

Pointer Initializations
In a pointer variable declaration, you can follow it with an initializer:

int i = 42;
int *p = &i;
int *q = 0;

For a global or static variable, the initializer must be the address of a variable
with the points-to type of the pointer variable, or 0, the null pointer. For example, p
is a pointer to int, and i is an int, therefore, &i has the same type as p.

For a local variable, the initializer may be an address of a variable with the points-
to type, a 0, or any type-compatible expression.

Pointers and Arrays


A pointer variable of “pointer to type X” can hold the address of a variable of “type
X”, or the address of the first element of an array of “Type X”. Its uses depend
entirely on the programmer. There is no difference in the declaration of the pointer
variable.

Pointers and arrays are intimately tied but they are not the same. Beginner C
programmers often have misconceptions about them. Thus we devote a full
chapter <Advanced Topic: Effective Pointer and Array Usage> to them.
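
To illustrate that the same pointer declaration serves both purposes, here is a small sketch. The function sum and the variables single and table are hypothetical:

```c
/* Sums "count" ints starting at the address in p. */
int sum(int *p, int count)
{
    int total = 0;
    for (int k = 0; k < count; k++)
        total += p[k];          /* indexing works on a pointer too */
    return total;
}

int single = 5;
int table[3] = { 1, 2, 3 };

int sum_single(void) { return sum(&single, 1); }   /* address of one variable */
int sum_table(void)  { return sum(table, 3); }     /* address of first element */
```

The function cannot tell which kind of address it was given; passing the correct count is entirely the caller's responsibility.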

char * As “Strings”
C does not have a string data type. Literal strings such as “hello world” have
the char * data type and you may initialize a char * variable with a literal
string:

char *hello = "hello, world";

The compiler places the literal string in the target memory and the address of the
allocation is assigned to the variable hello.

Array Types
An array type is indicated by putting the array symbols [ ] after the name:

unsigned char array[2];
unsigned char _2D_array[4][2];
unsigned char _3D_array[6][4][2];

array[0] = 1;
_2D_array[0][0] = 1;
_3D_array[0][0][0] = 1;

As shown in the example, multiple dimensions are supported. For each
dimension, the array indexing goes from 0 to the size-1 of the dimension.
Accessing an element of an array is called indexing the array. Array indexes must
have integer types.

The memory is laid out from the “left” dimensions to the “right”. For example, with
the _3D_array, assuming the starting address is 0x1000, the layout looks like
this:

Address Array element
0x1000 _3D_array[0][0][0]
0x1001 _3D_array[0][0][1]
0x1002 _3D_array[0][1][0]
0x1003 _3D_array[0][1][1]
0x1004 _3D_array[0][2][0]
0x1005 _3D_array[0][2][1]
0x1006 _3D_array[0][3][0]
0x1007 _3D_array[0][3][1]
0x1008 _3D_array[1][0][0]
0x1009 _3D_array[1][0][1]
0x100A _3D_array[1][1][0]
0x100B _3D_array[1][1][1]
0x100C _3D_array[1][2][0]
0x100D _3D_array[1][2][1]
0x100E _3D_array[1][3][0]
0x100F _3D_array[1][3][1]
0x1010 _3D_array[2][0][0]
0x1011 _3D_array[2][0][1]
0x1012 _3D_array[2][1][0]


Indeed, for any dimensions of the form [x][y][z], the layout is the same as if the
array were declared as [x*y*z] [20].
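
The layout table can be checked with a short sketch. The helper names offset_of and flat_index are hypothetical; the first asks the compiler where an element actually lives, and the second computes the same position by hand for a [6][4][2] array:

```c
#include <stddef.h>

unsigned char _3D_array[6][4][2];

/* Byte offset of element [i][j][k] from the start of the array,
   taken from the addresses the compiler actually uses. */
size_t offset_of(int i, int j, int k)
{
    return (size_t)((unsigned char *)&_3D_array[i][j][k]
                    - (unsigned char *)_3D_array);
}

/* The same offset computed by hand: the rightmost index varies fastest. */
size_t flat_index(int i, int j, int k)
{
    return (size_t)((i * 4 + j) * 2 + k);
}
```

For example, element [2][1][0] is at offset 18 (0x12), matching the address 0x1012 in the table.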

Array Initializations
For a one-dimensional array, you enclose the initializers with a set of { } . The
initialized values must be compile-time-expressions: constants (numeric, literal
strings, etc.) and addresses of other global and static variables. You may use
simple arithmetic operators but not function calls, nor a reference to another
variable.

char hello[] = { "Hello" };
int small_table[] = { 1 };
int table[3] = { 1, 2, 3};
int big_table[1000] = { 1 };

As can be seen here:

A char array (e.g. hello) can be initialized by a literal string. The initialization
includes the terminating \0.

An array with no dimension (e.g. small_table and hello) must have an
initializer, and in that case, the compiler computes the array size for the
variable.

An array with a specified dimension (e.g. table) can have the exact number
of initialized values.

An array with a specified dimension (e.g. big_table) can also have fewer
initialized values. In this case, the compiler fills the rest of the array
variable with zeros.

For a multi-dimensional array, you enclose each set of initializers with a set of
{ }:

int array[4][2] = { {0, 1}, {2, 3}, {4, 5}, {6, 7} };

The initializers are laid out in memory like this:

0, 1, 2, 3, 4, 5, 6, 7

Which is the same as writing

int array2[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };

As with a one-dimensional array, you may skip the leftmost dimension if there is an
initializer:

int array3[][2] = { {0, 1}, {2, 3}, {4, 5}, {6, 7} };
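
The sizes the compiler computes can be observed with sizeof. This is a small sketch reusing the arrays from the examples above; the helper functions hello_count and array3_rows are hypothetical:

```c
char hello[] = { "Hello" };        /* 5 characters plus the terminating \0 */
int big_table[1000] = { 1 };       /* remaining 999 elements are zero */
int array3[][2] = { {0, 1}, {2, 3}, {4, 5}, {6, 7} };

/* Number of elements in each array, as computed by the compiler. */
unsigned hello_count(void) { return sizeof hello / sizeof hello[0]; }
unsigned array3_rows(void) { return sizeof array3 / sizeof array3[0]; }
```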

Array: A Second Class Citizen


There are operations which are allowed for the other aggregate data types
(structure and union), that are not allowed for arrays:

You cannot assign an array to another array, whereas you can assign a
structure or union to another one of the same type.

A function cannot return an array of type X, although that behavior can be
approximated with the function returning a pointer to type X.

You may not pass an array of type X to a function, although you can pass a
pointer to type X to the function.

Hence, the array type is sometimes called a “second class citizen” in C. This
situation is mostly historical, and it is too late to fix without changing some
fundamental rules regarding array and pointer type conversion, which would break
existing code.
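
A short sketch of the usual workarounds: copying an array with the Standard C library function memcpy, and wrapping an array in a struct so that it can be assigned and returned by value. The names block, copy_array and make_block are hypothetical:

```c
#include <string.h>

struct block { int data[4]; };

int src[4] = { 1, 2, 3, 4 };
int dst[4];

void copy_array(void)
{
    /* dst = src;  would not compile: arrays cannot be assigned */
    memcpy(dst, src, sizeof dst);
}

/* An array wrapped in a struct can be assigned and returned by value. */
struct block make_block(void)
{
    struct block b = { { 10, 20, 30, 40 } };
    struct block copy;

    copy = b;
    return copy;
}
```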

Pointer-to-Function Derived Type


The last derived type is pointer-to-function-returning. The code for functions is
placed in “code memory” (flash memory for most modern microcontrollers) and
thus each function has a unique address. A pointer to a function is simply the
address of a function object. For example,


extern unsigned char foo(int, int); // external declaration
unsigned char (*pfunc)(int, int); // declaring a variable


// assign the address of function “foo” to “pfunc”
pfunc = foo;

pfunc is a pointer to a function that takes two int arguments and returns an
unsigned char value. To specify the address of a function, simply write the
name of a function, but omit the function call operator ( ) that would normally
follow the name, such as foo above in the assignment statement pfunc =
foo;. Thus, the assignment in the example assigns the address of foo to the
variable pfunc.

Note that the return types and the argument types must match. Otherwise, the
types are not compatible, and you will receive an error diagnostic from the
compiler.

Initializing a pointer-to-function variable is no different from initializing variables of
other types, and their rules and restrictions apply.
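
A minimal sketch of initializing, reassigning and calling through a pointer to function. The two target functions and apply are hypothetical:

```c
unsigned char low_byte_sum(int a, int b)  { return (unsigned char)(a + b); }
unsigned char low_byte_diff(int a, int b) { return (unsigned char)(a - b); }

/* A pointer to a function taking two ints and returning unsigned char,
   initialized with the address of low_byte_sum. */
unsigned char (*pfunc)(int, int) = low_byte_sum;

/* Calls whichever function pfunc currently points to. */
unsigned char apply(int a, int b)
{
    return pfunc(a, b);
}
```

Assigning pfunc = low_byte_diff; changes what apply does without changing apply itself, which is the basis of callback and dispatch-table techniques.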

enum and Aggregate Types


You can create new types in C using these keywords: enum to create name
aliases to integer values and struct/union to create aggregate data types.

Type declarations using these keywords share similar syntax:



<keyword> <optional-tag> { <member-list> };

The <member-list> format is different between enum and struct/union,
and will be described later. If you use an optional tag, then after the type
declaration, you can declare variables of this new type by writing

<keyword> <tag> <variable-name-1>, <variable-name-2>, … ;

That is, you use “<keyword> <tag>” in place of basic types such as int,
unsigned, etc. For example:

enum color { RED = 0, GREEN, BLUE };
struct id_record {
char name[20];
unsigned id;
enum color eye_color;
};

enum color my_eye_color;
struct id_record employee;

The first declaration declares a new enum type called enum color. The
second declaration declares a struct type named struct id_record. After
the declarations, you can use them to declare variables, such as my_eye_color
and employee.

You can skip the tag name in a type declaration, but then you must either use a
typedef (see later section) or declare all variables of that type in place, as there
is no way to refer to those types afterward:

enum { M0, M3, M4 } stm32f030 = M0, stm32f103;

stm32f103 = M3;

Created types behave the same as built-in types: you can assign a variable to
another variable of the same type, you can pass the variable as an argument to a
function, and a function may return a value of a created type.

The scope of a tag name is the same as with a variable: all the tag names at the
file level must be unique, and a compound statement introduces a new scope for
the tag names.

enum and struct/union tags share the same name space, that is, all tag names
within a scope must be unique, regardless of whether it is an enum or
struct/union tag.

enum Enumeration Type


Enumeration types, created by using the keyword enum, are integers that you
give names to; not dissimilar to the effect of using #define (which is part of the
C preprocessor), but with a feature provided by the C language itself [21].

You write an enumeration starting with the keyword enum, optionally followed by a
tag name, then a list of enumeration members enclosed in a pair of { } . The
member list is a comma-separated list of names. Each member name is
synonymous to an integer value:

// declare an enum type
// Cortex_Model is a tag name
// M0, M1, M3, and M4 are member names
enum Cortex_Model { M0 = 0, M1, M3, M4};

// declare a variable of the type “enum Cortex_Model”
enum Cortex_Model our_model = M3;


void foo(void)
{

if (our_model == M3)


You may use the enum member names (e.g. “M0”) in your program code
anywhere that an integer constant is legal. An enum member name must be
unique among all enum member names and other “ordinary identifiers” visible in
the same source file.

In the declaration, an enum member may optionally be followed by an initialized
value, e.g. “= 0” in the above example. This assigns the value to the name. If
there is no explicit value given, then the member’s value is one plus the value of
the previous member. More than one enum member may have the same value.
The default value for the first member is zero.

In the example above, the values given to each enum member are:

enum member value
M0 0
M1 1
M3 2
M4 3
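
The value rules can be seen in a slightly larger sketch. The enum Speed and the function model_name are hypothetical additions; Cortex_Model is the type from the example above:

```c
enum Cortex_Model { M0 = 0, M1, M3, M4 };          /* 0, 1, 2, 3 */

/* Explicit values restart the count; two members may share a value. */
enum Speed { SLOW = 10, MEDIUM, FAST = 50, TURBO, DEFAULT = MEDIUM };

const char *model_name(enum Cortex_Model m)
{
    switch (m) {
    case M0: return "M0";
    case M1: return "M1";
    case M3: return "M3";
    case M4: return "M4";
    }
    return "unknown";
}
```

Here MEDIUM is 11, TURBO is 51, and DEFAULT has the same value as MEDIUM.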

struct Type
A struct is a collection of members or elements, collected in a single data
structure. The member list looks like a list of variable declarations. Instead of
names of variables, the names are called member, or field, names:

enum color { RED = 0, GREEN, BLUE };
struct id_record {
char name[20];
unsigned id;
enum color eye_color;
};

struct id_record employee1, employee2;

After declaring the type struct id_record, we declare two variables of that
type employee1 and employee2. You access a struct member by using
the . dot notation:

strcpy(employee1.name, "James T. Kirk");
employee1.id = 1;
employee1.eye_color = RED;

Members of a struct must have unique names within that struct, but
otherwise there is no further restriction as long as the name follows the same
rules as naming a variable.

You can also assign a struct to another struct variable of the same type:

employee2 = employee1;

You may also pass a struct type as a function argument, and a function may
return a struct type as its return type:

extern struct id_record AddRecord(char *name, unsigned id);
extern void ProcessRecord(struct id_record);

employee2 = AddRecord("Spock", 2);
ProcessRecord(employee2);

Combining Type and Variable Declarations


You may combine a type declaration and a variable declaration together:

enum color { RED = 0, GREEN, BLUE };
struct id_record {
char name[20];
unsigned id;
enum color eye_color;
} employee1, employee2;

Here, the variables employee1 and employee2 are declared along with the
type struct id_record.

Pointer-To struct Member Access


If you have a variable of type “pointer to a struct”, you access the members by
using the -> operator:

struct id_record *pemployee;

pemployee = &employee1;

printf("Employee name '%s', id %u\n", pemployee->name,
       pemployee->id);

Note that there is no printf format code to print a struct (how could it
possibly?). Therefore, you can only print individual members.

Pointers to struct are important when we discuss dynamic data structures.
The topic of data structures is expanded further in the chapter <Dynamic Data
structures>.

Bitfield Members
Bitfields are members of a struct that occupy a number of contiguous bits. The
type of a bitfield is either int or unsigned. You declare a struct member
as a bitfield by specifying a size after its name:

struct {
unsigned : 28,
V : 1, C : 1, N : 1, Z : 1;
unsigned PC;
} PSR;

The number of bits can range from zero up to the number of bits in the int
type.

Bitfields may be unnamed (: 28 above); in which case, they exist mainly for
padding purposes, or to match unused bits in a hardware IO register field.

The semantics of an int bitfield of size one are problematic: if the compiler
treats the bitfield as signed, the single bit is used to store the sign, and
so there is no space left to store a magnitude. Declare single-bit bitfields as
unsigned.

Bitfields can be accessed and operated on like other struct/union members,
except that you cannot take the address of a bitfield member using the C address
operator &.

Bitfields can be used to map to the layout of a hardware IO register. The example
above shows a possible mapping of a “PSR” which contains a processor’s status
flags and the PC register. Be aware that the allocation order of bitfields is not
defined by the C Standard. All JumpStart C compilers allocate bitfields from right
to left. If you use bitfields to map to an IO register, make sure that the allocation
orders match up.

Bitfields can also be used to minimize the use of space, for example, if there are
many one-bit status flags, they can be declared as bitfields. However, it takes
more code to access bitfields, and as modern microcontrollers have a sizeable
amount of SRAM, this practice is discouraged.

Unlike other members of a struct, or indeed other C variables, you cannot take
the address of a bitfield using the & operator, and the following constraints exist:

A pointer cannot point to a bitfield.

You cannot declare an array of bitfields.

You cannot declare a function as returning a bitfield.
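
A small sketch of declaring and using bitfields. The struct flags and the function make_flags are hypothetical, and all fields are unsigned so that out-of-range values simply lose their upper bits:

```c
struct flags {
    unsigned ready : 1;
    unsigned mode  : 3;    /* holds 0..7 */
    unsigned       : 4;    /* unnamed padding bits */
    unsigned count : 8;    /* holds 0..255 */
};

struct flags make_flags(unsigned mode, unsigned count)
{
    struct flags f;

    f.ready = 1;
    f.mode = mode;      /* only the low 3 bits are kept */
    f.count = count;    /* only the low 8 bits are kept */
    /* unsigned *p = &f.mode;  would not compile: no address of a bitfield */
    return f;
}
```

Note that a struct containing bitfields can still be returned from a function; the restriction applies only to the bitfield members themselves.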


Nested Structure Members


A member of a struct can be of any type, including another struct:

struct inner {
int a, b, c;
};

struct outer {
int x, y, z;
struct inner i;
} outer;


outer.i.a = 1;

You use the dot notation to access a nested member, e.g. outer.i.a.

You may also declare a nested struct inside a struct:


As with all struct declarations, the inner struct’s tag name is optional. In
addition, if the nested struct’s member names do not conflict with the names of
other members of the outer struct, then the member name for the nested
struct (e.g. i on the left hand side) can be omitted as well. Again, see the
example on the right hand side. This is called an anonymous struct.
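
A sketch of both forms, with hypothetical type names outer_named and outer_anon. Anonymous structs were standardized in C11; older compilers often accept them as an extension, so check your compiler's documentation before relying on them:

```c
struct outer_named {
    int x, y, z;
    struct inner_tag {      /* nested declaration with a tag and a member name */
        int a, b, c;
    } i;
};

struct outer_anon {
    int x, y, z;
    struct {                /* no tag, no member name: anonymous struct */
        int a, b, c;
    };
};

int nested_demo(void)
{
    struct outer_named n;
    struct outer_anon  m;

    n.i.a = 1;              /* reach the nested member through i */
    m.a = 2;                /* anonymous struct members are accessed directly */
    return n.i.a + m.a;
}
```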

Union Type
Everything that has been written about struct types is applicable to union
types. The only difference between a struct and a union is that a struct is
laid out with the members in ascending memory address order, whereas a union
is laid out with the members at the same starting address.

A union is useful to examine the bit patterns of underlying data types:

union u {
float f;
unsigned char a[4];
} u;

u.f = 3.14159267;
printf("%f dump: %02X%02X%02X%02X\n", u.f, u.a[0], u.a[1], u.a[2],
       u.a[3]);

The 4-byte array u.a occupies the same memory as the floating point member
u.f. The printf call prints out the 4 individual bytes of the internal
representation for 3.14159267.

Incomplete struct/union Declaration


There are a couple of situations where you cannot specify the full declaration of a
nested struct/union member; the chief one being that the member may be a
pointer to itself [22].

To solve this, C allows you to write an incomplete declaration, which is just a
struct <tag>:

enum color { RED = 0, GREEN, BLUE };
struct id_record; // incomplete declaration

struct id_record {
struct id_record *next;
char name[20];
unsigned id;
enum color eye_color;
};

In this example, struct id_record contains a member next that is a pointer
to itself, and you must use the incomplete declaration to declare the partial type.
You can declare a variable of pointer to an incomplete type, but not a variable of
the incomplete type, as the compiler needs to know the size and member names
of the type.
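
Another situation where an incomplete declaration helps is two types that point at each other. This is a minimal sketch with hypothetical types car and engine:

```c
struct engine;                  /* incomplete declaration */

struct car {
    struct engine *motor;       /* pointer to an incomplete type: allowed */
    int wheels;
};

struct engine {                 /* the type is now completed */
    struct car *installed_in;
    int cylinders;
};

int link_demo(void)
{
    struct car c;
    struct engine e;

    c.wheels = 4;
    c.motor = &e;
    e.cylinders = 6;
    e.installed_in = &c;
    return c.motor->cylinders + e.installed_in->wheels;
}
```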

Padding and Alignment


Some CPUs, such as the Cortex-M0, have strict alignment requirements:

1. 16-bit access must be on a 16-bit (i.e. 2-byte) boundary.

2. 32-bit access must be on a 32-bit (i.e. 4-byte) boundary.

The Cortex-M3 and above have support for unaligned access, but even for those
processors, aligned 16-bit and 32-bit accesses are preferred, as access
time is shorter. The JumpStart Cortex-M compilers align variables and structure
members at their natural boundary. For a struct, padding may be introduced
between members in order to align the members properly.

Sometimes a struct is used for mapping data exchanged with an external source that
has a predefined protocol element layout. In that case, padded elements may cause
problems when exchanging data. You can use the __packed attribute to tell the
compiler to leave the member allocation alone. You will need to make sure you
access the members with the alignment requirements in mind.

struct {                  __packed struct {
    char a;                   char a;
    unsigned b;               unsigned b;
} x = { 1, 2 };           } y = { 1, 2 };

Memory (Byte view)        Memory (Byte view)
0x1                       0x1
X                         0x2
X                         0x0
X                         0x0
0x2                       0x0
0x0
0x0
0x0

With packed structures, you must be careful when accessing unaligned members,
as the JumpStart compiler does not generate unaligned access code.
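
To see how much padding a particular compiler inserts, the Standard C offsetof macro and the sizeof operator can be used. This sketch uses only Standard C (no __packed); the struct sample and the two helper functions are hypothetical, and the actual numbers depend on the target and compiler:

```c
#include <stddef.h>   /* offsetof, size_t */

struct sample {
    char a;
    unsigned b;
};

/* Number of padding bytes the compiler inserted between a and b. */
size_t padding_before_b(void)
{
    return offsetof(struct sample, b) - sizeof(char);
}

/* Total padding anywhere in the struct. */
size_t total_padding(void)
{
    return sizeof(struct sample) - (sizeof(char) + sizeof(unsigned));
}
```

On a target with a 4-byte unsigned aligned at its natural boundary, one would expect three padding bytes after a, as in the left-hand memory view above.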

Struct Initializations
You write an initializer for a struct by enclosing a list of initialized values in a set
of { }. The list of values matches the list of members of the structure. The
initialized values must be compile-time-expressions, as with initializers for array
declarations and global / static variables.

enum color { RED = 0, GREEN, BLUE };
struct id_record {
char name[20];
unsigned id;
enum color eye_color;
};

struct id_record employee1 = {
"James T. Kirk",
1,
RED
};

Bitfields are treated exactly the same as other struct members. If the initialized
value is too large to fit, the upper bits that do not fit are discarded:

struct {
unsigned a : 2, b : 3;
unsigned c;
} x = { 0xC, 2, 3 };


printf("%d %d %u\n", x.a, x.b, x.c);

would output

0 2 3

As a is only two bits, the value 0xC overflows, and only the bottom 2 bits (i.e. 0) are
kept.

Union Initializations
You can only initialize the first member of a union variable, and even though
there is only one value, you still need to enclose it with a set of { } , just like a
struct initializer:

union u {
float f;
unsigned char a[4];
} u = { 3.14159267 };

Global struct/union Declarations
To declare a global variable of a struct/union type, typically you write the
type declaration in a header file, and #include the header file in all the C
source files that access the global variable:

enum color { RED = 0, GREEN, BLUE };
struct id_record {
char name[20];
unsigned id;
enum color eye_color;
};

extern struct id_record employee1, employee2;

extern struct rgb_color {
unsigned char R, G, B;
} mycolor;

It makes no sense to prefix pure type declarations such as enum color and
struct id_record with the extern keyword [23]. On the other hand, the
external variables employee1 and employee2 must have the extern keyword for a
proper external declaration.

You can combine an external declaration of a struct variable and the type
declaration of the struct, as is the case for mycolor above.

C GEEKERY TOPIC: There is a subtle C interpretation here: within a single file, a
tag name can only be used once to declare an enum, or struct/union. Even
two identical declarations are considered an error:

// within the same file…
struct color {
unsigned char R, G, B;
};

// An error, even if the declaration is exactly the same
struct color {
unsigned char R, G, B;
};

However, this rule is relaxed, and in fact reversed, when it comes to multiple
files: a type declaration across multiple files is assumed to be referring to the
same type if it has the same tag. This allows a type declaration to be put in a
header file, as explained earlier.

Type Qualifiers
A type qualifier specifies additional attributes of a data type.

[24]


A type qualifier may appear to the left or to the right of a * in a declaration. The
rules are simple:

1. If the type qualifier is in the initial part of the declaration and precedes any *,
then it is modifying the base type of the declaration.

2. Otherwise, the type qualifier must appear after a *, and it is modifying the *
pointer to its immediate left.

For example,

int const i = 0;
int *__flash cp = (int * __flash)0x1000;
int __flash * xp = (int __flash *)0x1000;


[25]
i is a “const qualified int”
cp is a “pointer in flash space pointing to an int”
xp is a “pointer (in normal data space) pointing to an int in flash space”

A const-qualified or __flash-qualified variable must have an initializer in the
declaration since it cannot be assigned to due to the const attribute.

C allows you to freely assign a non-qualified pointer type to a qualified pointer:

int i;
int *pi = &i;
int *const cpi = pi;

cpi is “const pointer to int” and therefore the pointer value cannot change further,
but the points-to value (the integer) may be modified.
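
The two placements of const around a * can be compared in a short sketch that uses only the Standard C const qualifier. The variable and function names are hypothetical, and the rejected statements are left as comments because they would not compile:

```c
int first = 1, second = 2;

int const *ptr_to_const = &first;   /* the int cannot be changed through this pointer */
int *const const_ptr = &first;      /* the pointer itself cannot be changed */

int qualifier_demo(void)
{
    ptr_to_const = &second;     /* allowed: the pointer is not const */
    /* *ptr_to_const = 5;          would not compile: points to const int */

    *const_ptr = 10;            /* allowed: the int is not const */
    /* const_ptr = &second;        would not compile: const pointer */

    return *ptr_to_const + *const_ptr;
}
```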

Compatible Types
The compiler tests whether two types are compatible in many contexts. Two types
are compatible if:

They are the same type.

They are pointer types with the same type qualifiers, and they point to
compatible types.

They are array types with compatible element types, and either at
least one array does not have an element count, or both counts are equal.

They are functions whose return types are compatible, and
if parameters are specified on both functions, then the parameter types must
match for all parameters, or
if parameters are specified on one function only, then the parameter types
must not be float or an integer type that changes when promoted (i.e. one of
the smallest integer types such as char or short), or
if parameters are specified on neither function (both parameter lists are empty).

They are struct/union or enum types declared in different source
files with exactly the same member names; in addition, for struct/union
the corresponding member types must be compatible, and for enum the values of the
enumeration members must be the same.

In practice, this means that struct/union and enum types are best
declared in a common header file that is #include’d by all source files that use
those types.

Note that this rule applies to struct/union/enum types declared in different
source files only. If you declare the same struct/union/enum type multiple
times in the same source file, the compiler will flag it as a multiple declarations
error.

It is still possible to mix variables of incompatible types in an expression, but it
may involve using the typecast ( ) operators. See typecast operator in
chapter <Expressions and Operators>.

Typedef
A typedef does not declare or define an object, but provides an alias for a type
declaration. You write a typedef declaration by starting with the keyword typedef
and follow it with a declaration that looks like a variable declaration. The name in
the declaration is the alias name for the type.

You use typedef so that a single name can replace a complex set of type
decorations, including not having to write struct <tag> to refer to a structure
type.

// string is not a built-in type in C
typedef char *String;
String str1, str2;

// creates an alias for a struct decl
typedef struct {
unsigned year, month, day;
} CALENDAR;

CALENDAR cal;
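
Typedefs are especially helpful for pointer-to-function types. In this sketch the alias Handler and the functions add_low and days_field are hypothetical; CALENDAR is the type from the example above:

```c
/* Alias for "pointer to function taking two ints, returning unsigned char". */
typedef unsigned char (*Handler)(int, int);

typedef struct {
    unsigned year, month, day;
} CALENDAR;

unsigned char add_low(int a, int b) { return (unsigned char)(a + b); }

/* Without the typedef: unsigned char (*handlers[2])(int, int) */
Handler handlers[2] = { add_low, add_low };

unsigned days_field(CALENDAR c) { return c.day; }
```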

Type Name Declaration
A Type Name Declaration is used inside a type cast operator or a sizeof
operator. It looks like a normal variable declaration, but no storage class specifier
or a name is allowed.

// simple typecast
unsigned char uc;
int i;

uc = (unsigned char)i;

// sizeof operator can take a type name declaration
i = sizeof (int (*)());

Reading a Declaration Using Right-Left-Right Rules
[26]
A C declaration with pointer, function, and array decorations mirrors the syntax
of applying the corresponding pointer dereferencing operator(s), function calling,
and array index operator(s) to an expression. While this has a level of
consistency, it is daunting to a neophyte C user to encounter something like this:

struct x * const (*APFPS[10])(int);

Fear not; some simple rules make it easy to decode (and write) complex C
declarations! First, recall the derived types:

* is a pointer of
[] is an array of
() is a function returning

Next, the type qualifiers: const, volatile, or __flash which either
modify the base type, or
appear to the right of a * pointer symbol.

Another basic rule for reading a declaration: when reading a function type
specifier ( ), skip over the parameter list enclosed by the ( ), as it is
unimportant at this stage.

Here then, are the right-left-right rules:

1. Locate the name of a declaration.

2. Move right, reading off any array [ ] and function ( ) specifiers, until you
hit a terminator: ‘)’, a comma ‘,’ or a semicolon ‘;’.

3. Now move back to the left again, reading off any * symbol(s), along with
any type qualifier(s).

4. If you encountered a ‘)’ in step 2, then stop when you hit the corresponding ‘(’.
Now repeat from step 2, starting just to the right of that ‘)’.

5. If you do not encounter a ‘)’ in step 2, then you will be reading off the base
type.

6. You are done!



Let’s look at an example, using the declaration shown earlier:

struct x * const (*APFPS[10])(int);

The declaration is then “APFPS is an array of 10 pointers to functions returning
const pointers to struct x”.

Nothing to it!
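
One way to check a reading like this is to build the same type in small steps with typedef and let the compiler confirm that the two forms agree. This is only a sketch: the struct body and the names ConstPtrToX, PFPS, table_by_steps, GetShared, and CheckDeclaration are hypothetical additions for illustration.

```c
struct x { int value; };

// Built up in small steps with typedef
typedef struct x *const ConstPtrToX;   // const pointer to struct x
typedef ConstPtrToX (*PFPS)(int);      // pointer to function returning one
PFPS table_by_steps[10];               // array of 10 of those pointers

// The one-line declaration from the text has the same type
struct x *const (*APFPS[10])(int);

static struct x shared = { 7 };

ConstPtrToX GetShared(int unused)
{
    (void)unused;
    return &shared;
}

int CheckDeclaration(void)
{
    APFPS[0] = GetShared;            // a function name converts to a pointer
    table_by_steps[0] = APFPS[0];    // accepted because the types match
    return table_by_steps[0](0)->value;
}
```

If the step-by-step typedefs described a different type, the second assignment would draw a diagnostic from the compiler.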
7. FUNCTIONS

Functions contain the executable code of a program. We have discussed some
aspects of functions already: their declarations, definitions, and how to make
function calls. This chapter gathers all the relevant information related to
functions, so some material is repeated from earlier chapters.
In C, a function may return a value and may accept arguments. If the execution
“falls out” of a function without encountering a return statement, the compiler
will insert code such that the function will return to the caller, albeit with an
unpredictable return value.

A function is useful for:

Reusing code - the same function can be called with different arguments. For
example, you can write a Celsius to Fahrenheit conversion function, and it can
be called with different arguments to perform the conversion:

float CelsiusToFahrenheit(float C)
{
    return (C * 9.0/5) + 32;
}

The code is simply the mathematical formula written in C.

Information hiding (or function abstraction) - closely related to code reuse; a
function hides its implementation from its callers. A well written function, e.g.
one that does not use global variables and has a well-defined argument
interface, means that its implementation can change without impacting the
code that uses the function.

Modular coding - break up your code into manageable chunks. For example,
the sequence of instructions to set up a microcontroller is long and pretty
boring: “turn on these bits”, “enable that other bit”, etc. By putting this code
fragment into its own function, it makes the calling function less cluttered and
easier to manage.
API (Application Programming Interface)
Following on from above, a widely used programming practice today is to write
programming “components”, each described by its API (Application
Programming Interface), allowing other programmers to use the components
without knowing how they are implemented or how they work. A software program
can then be built using these components.
An API typically consists of the function prototypes and communication protocols of
the component. For example, the JumpStart API is an interface to the low-level
features of Cortex-M microcontrollers.
Function Prototype / Function Declaration
While the C Standard allows you to call a function without specifying the
prototype, runtime errors may result. Therefore, before you call a function, you
should declare its return type and its parameter types. The declaration must
appear lexically prior to the function call, and is sometimes placed in a header file.
C allows you to omit some parts of a function declaration, e.g. you may declare the
name and the return type, but not the argument types. This practice is, again, not
recommended. A full function declaration is called a function prototype:

// prototype
char *strcpy(char *dst, const char *src);
extern char *strcpy(char *dst, const char *src);

The general form of a function prototype is:

<return-type> <function-name> ( <parameter-list> );

The storage class extern is optional and does not provide additional meaning.

If the function does not accept any arguments, then the keyword void is used for
the parameter-list. Otherwise, the parameter-list is a list of parameter declarations
separated by commas. Each parameter declaration is in the form of a variable
declaration except that:

1. The variable name is optional

2. You cannot write an initializer

3. Only one argument is allowed per type declaration

4. No storage class except register is allowed (which has no real effect
anyway)

5. The last argument can be three dots “…” used to specify a variadic
function[27]

Prototype vs. Declaration: Function prototype is a term introduced in C90. The
older term function declaration can mean the same thing, although in the older
usage, a function declaration does not necessarily include the data types of the
parameter list.

Parameters vs. Arguments: Some writers, including the authors of this book,
use these terms interchangeably. Some people use the terms parameters or
formal parameters when talking about a function declaration or the actual function
body, and the terms arguments or actual arguments when referring to the function
call expressions.

Function Definition and Function Prototype: If you write the parameter
declarations with argument names, then the function prototype can be copied and
pasted and made into a function definition by removing the semicolon terminator
and replacing it with the function body. (Or vice versa):

// prototype
char *strcpy(char *dst, const char *src);

// definition
char *strcpy(char *dst, const char *src)
{
    char *val = dst;   // remember the start; dst is advanced by the loop

    while ((*dst++ = *src++) != '\0')
        ;
    return val;
}
Function main
// traditional main() prototype
// int main(int argc, char *argv[]);
//

// embedded system’s main() prototype
// void main(void);
int main(void)
{
while (1)
printf(“hello world\n”);
}

You must provide a function named main in your program. main is the first
function that will be called after the C environment is set up. Traditionally, the
function main has the prototype listed in the comment fragment above.
As C programs originally ran on the command line shells of an operating system
(OS) such as early Unix, DOS or Windows, command line arguments are passed
to the program’s main function using argv and argc. argv is an array of
character pointers, each one pointing to a command line argument (with the first
argument being the name of the program itself), with argc specifying the number
of arguments. main then can process the arguments and act accordingly. When
main exits or returns, the program returns control to the OS with the return value
of main as the exit code.
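
As a rough illustration of how argc and argv fit together, the sketch below walks the argument array. ListArgs is a hypothetical helper, not a standard function; a hosted main(int argc, char *argv[]) could simply pass its own parameters to it.

```c
#include <stdio.h>

// Hypothetical helper: prints every entry of argv and returns the
// number of arguments that follow the program name.
int ListArgs(int argc, char *argv[])
{
    for (int i = 0; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);

    return argc - 1;    // argv[0] is the program name itself
}
```

For a command line such as "prog -v file.c", argc would be 3 and the helper would report two arguments after the program name.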
For an embedded system, there is typically no OS, and there is no place for main
to return to. Therefore main’s embedded prototype differs somewhat. A common
skeletal structure for an embedded C program (or any embedded program, really)
looks like this:

int main(void)
{
Setup(); // hardware and other setup
while (1)
{ // loop forever
// do stuff

}
return 0;
}
Function Calls
To make a function call, you write a function designator, i.e. either a name or an
expression that has the type pointer-to-function, and follow it with a list of
parameters (if any) enclosed by ( ):

extern int func(int, int, int);
extern int (*pf)(int, int, int);

func(1, 2, 3);

(*pf)(1, 2, 3);
// same as above. Alternative syntax in C
pf(1, 2, 3);

In the code fragment above with the pointer-to-function expressions, the
dereference operator * and the associated ( ) for grouping purposes are
optional, as indicated in the second example.
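
Calling through a pointer-to-function is what makes dispatch tables possible. The following is a minimal sketch; Add, Sub, operations, and Dispatch are hypothetical names used only for this illustration, and the caller is assumed to pass a valid index.

```c
int Add(int a, int b) { return a + b; }
int Sub(int a, int b) { return a - b; }

// A table of pointers to functions that share one signature
int (*const operations[])(int, int) = { Add, Sub };

// Assumes index is 0 or 1
int Dispatch(unsigned index, int a, int b)
{
    // operations[index](a, b) and (*operations[index])(a, b) are equivalent
    return operations[index](a, b);
}
```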
Implicit Function Declaration
If there is no prior declaration for a function name, then the compiler will assume
that the function returns an int. If there is no prior function declaration, or if the
declaration omits the parameter declarations, then the compiler assumes the
parameters to have the promoted types of the arguments listed in the call. This is
not a recommended practice; you really should provide the function prototype
before calling it. If the compiler sees a function declaration afterward in the same
source file, then the implicit declaration and the explicit declaration must match.
Note that this rule comes from C90; C99 removed implicit function declarations
from the language.
Inline Functions
C99 and C++ introduced inline functions. Standard C spells the keyword inline;
the examples here use the spelling __inline:

static __inline int add(int a, int b)
{
return a + b;
}

void foo(void)
{
    int y = 1, z = 2;
    volatile int x;

x = add(y, z);
}

Inline functions are particularly useful for small, short functions. An inline function
is a request to the compiler to expand the body of the function “inline” where the
calls are made. This could result in faster and smaller code. For example, for the
assignment in the function “foo” above, the compiler may generate code as efficient
as:

x = 3;

__inline is only a request, and a compiler might not necessarily honor the
request. For example, if the function to be inlined is very complex, the compiler
may decide that it is not worthwhile to perform the inline optimization.
The storage class static can be used as a hint to the compiler that if all the
calls to the function are expanded, then there is no need to generate the function
body, as shown above. If the storage class is not static, then the compiler will
still have to generate code for the function so that any external reference to the
function can be resolved.
Preserving the Function Context
When generating code for the function body, the compiler inserts code to save
and restore the execution context so that the caller can resume without changes
in its environment. The context typically includes some CPU registers and the
content of the stack. The code that saves the context at function entry is called
the function prologue, and the code that restores it at exit is called the epilogue.
Function Argument Promotions
In the absence of a function prototype[28], or when calling a variadic function
(see later section), integer and floating point arguments are promoted using the
same type promotion rules as used for expression evaluation in chapter
<Expressions and Operators>.

Call by Value
Function arguments are expressions and can be evaluated in any order. As each
argument is evaluated, a copy of the argument is made and the copy is passed to
the function being called. This is known as call by value. In the called function,
any changes to an argument are not reflected back to the actual argument
passed:

void foo(int a, int b)
{
a = 42;
b = 0;
}

int main(void)
{
int i = 5, j = 6;

foo(i, j);
printf(“i %d j %d\n”, i, j);

return 0;
}

Even though foo changes the arguments inside the function, they have no effect
on the calling arguments i and j. The output is

i 5 j 6
Passing the Address of a Variable
Sometimes it is useful for a function to be able to modify the values of the
variables passed to it. In C[29], you can pass the address of a variable to a
function, and any modification done through pointer dereferencing inside the
function will be reflected in the variable.

void foo(int *pa, int *pb)
{
*pa = 42;
*pb = 0;
}

int main(void)
{
int i = 5, j = 6;

foo(&i, &j);
printf(“i %d j %d\n”, i, j);

return 0;
}

The output is

i 42 j 0
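
A common use of this technique is a function that needs to hand back more than one result. The sketch below is illustrative only: DivMod is a hypothetical helper, and it assumes the divisor is not zero.

```c
// Hypothetical helper: returns two results through pointer parameters.
// Assumes divisor is not zero.
void DivMod(int dividend, int divisor, int *quotient, int *remainder)
{
    *quotient = dividend / divisor;     // written into the caller's variable
    *remainder = dividend % divisor;    // likewise
}
```

A caller would declare two int variables and pass their addresses, e.g. DivMod(17, 5, &q, &r).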
Returning the Address of a Variable
A function may return the address of a variable:

char *foo(void)
{
static char c;

return &c;
}

char *bar(void)
{
static char hello[] = “hello, world”;

return hello;
}

Regardless of whether you are returning the address of a variable or a pointer to
an array, it is important not to return the address of a local variable[30].
Otherwise, if you dereference the returned pointer value, you will be accessing
memory that is no longer valid.
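
When a function has to produce a string or an array, one widely used alternative to static storage is to let the caller supply the buffer. The sketch below assumes a C99 library that provides snprintf; MakeLabel is a hypothetical name.

```c
#include <stdio.h>

// WRONG (shown only as a comment): buf would cease to exist on return
//   char *MakeLabel(int n) { char buf[16]; /* fill buf */ return buf; }

// Safer: the caller owns the storage and passes its address and size
char *MakeLabel(char *buf, size_t size, int n)
{
    snprintf(buf, size, "item-%d", n);   // never writes more than size bytes
    return buf;                          // still valid: it is the caller's buffer
}
```

The returned pointer stays valid for as long as the caller's buffer does, which is under the caller's control.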
Using the const Type Qualifier for an Argument
If a function has a pointer type parameter, but it will not modify the contents of the
pointed-to object, then it should use the const type qualifier in the parameter
declaration. This distinguishes it from the case where an argument may be modified
through its address, as explained in the previous section. Using the const type
qualifier serves as documentation and also allows the compiler to
perform better optimization.

extern char *strcpy(char *dst, const char *src);

char buffer[10];
strcpy(buffer, “Hello”);
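
As a small sketch of a function that only reads through its pointer parameter, consider the hypothetical CountChar below. The const qualifier lets the compiler reject any accidental write through s.

```c
#include <stddef.h>

// The const promises that CountChar only reads the string
size_t CountChar(const char *s, char c)
{
    size_t count = 0;

    while (*s != '\0')
    {
        if (*s == c)
            count++;
        s++;            // changing the pointer itself is allowed
        // *s = 'x';    // would be rejected by the compiler
    }
    return count;
}
```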
Passing and Returning Structures
You can pass a struct type to a function. Since arguments are passed by
value, there is some overhead in making a copy of the struct when calling the
function. If space usage or execution time is a concern, it is more efficient to pass
a pointer instead:

enum color { RED = 0, GREEN, BLUE };
struct id_record {
char name[20];
unsigned id;
enum color eye_color;
};

extern void foo(struct id_record x);
extern void bar(const struct id_record *px);

Calling foo would require more instructions and thus take longer. With
modern fast microcontrollers, it should not matter much, but passing a pointer is a
simple optimization and a lot of programmers still use it. The only possible problem
is that the called function may modify the content of the struct through the
pointer, but a well-defined interface using the const qualifier eliminates the problem.
The same discussion applies to returning a struct from a function as well: there
are some space and time advantages to returning a pointer to a struct rather than
a struct. The choice is up to you. Do remember that if you return a pointer to a
struct, the struct should not be a local variable.
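
The two styles can be compared side by side in a short sketch. The struct point type and the functions Shifted and SumCoords are hypothetical, introduced only for this illustration.

```c
struct point { int x, y; };

// By value: the function works on its own copy of the argument
struct point Shifted(struct point p, int dx, int dy)
{
    p.x += dx;      // modifies the copy only
    p.y += dy;
    return p;       // the result is handed back by value
}

// By pointer to const: no copy is made and the original cannot be modified
int SumCoords(const struct point *p)
{
    return p->x + p->y;
}
```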

EXERCISE: why is it okay to return a local struct variable, but not the address
of a local struct variable?

C With Classes
JumpStart C implements C++’s member function features. We do not expect
users to define their own functions using “C with Classes”, as this feature is
primarily intended to support the JumpStart API and is not a complete C++
member function implementation.
Member function declarations are written inside a struct. The struct must
have an alias name using typedef. To define a member function, the C++ syntax
is used: the name of the function is preceded by the struct’s alias name and
the :: operator. Function calls are done using the normal struct
member reference syntax.

typedef struct {
void *vp;

void SetPins(JSAPI_GPIO *port, unsigned pin_number);
int ReadBytes(unsigned char* data, int len);
} JSAPI_I2C;

A sample definition; notice the use of the struct alias name as prefix to the names:

void JSAPI_I2C::SetPins(JSAPI_GPIO *port, unsigned pin_number)
{
// function body here …
}

Some sample calls:

extern JSAPI_I2C i2c1;

i2c1.SetPins(&porta, 8);
unsigned char buffer[10];
i2c1.ReadBytes(buffer, 10);
Recursion
Recursion is the term for when a function calls itself. Many mathematical formulas
are written in recursive form. For example, the Fibonacci sequence starts with the
sequence {0, 1}, and each following number in the sequence is produced by
adding the previous two numbers together:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …

The Fibonacci sequence can be implemented in C as a recursive function:

int Fibonacci(int n)
{
if (n <= 1)
return n;
return Fibonacci(n-1) + Fibonacci(n-2);
}

Most recursive functions can be written in non-recursive form as well, by using a
loop:

int Fibonacci(int n)
{
    int last1 = 1,      // Fibonacci(1)
        last2 = 0;      // Fibonacci(0)
    int val = n;
    int i;

    if (n <= 1)
        return val;

    for (i = 2; i <= n; i++)
    {
        val = last1 + last2;
        last2 = last1;
        last1 = val;
    }
    return val;
}

Note the loop index in the loop version goes to “<= n” and not “< n”.

Generally speaking, recursive functions may run slower than non-recursive
versions, due to the overhead of function calls. Recursive functions are also
discouraged or banned by some embedded system programming guidelines, such
as the MISRA rules. The rationale is that recursive functions may exceed the
available stack space and cause programs to crash. However, often the recursive
version is easier to read and maintain, without the introduction of extra variables
as shown in the example. If you are not bound by MISRA or other guidelines
forbidding the use of recursion, then it may still be a good choice, depending on
the programming requirements.
Interrupt Handlers
An embedded system frequently has to deal with interrupts. Indeed, some
embedded system firmware is written as nothing but interrupt handlers!
Depending on the target devices, most compilers allow interrupt handlers to be
written as C functions.
In some devices such as the Cortex-M, interrupt handlers are just normal
functions and do not require any special function entry or exit code. In this case,
you only need to associate the address of the handler function with the interrupt
vector table, either through the NVIC (Nested Vectored Interrupt Controller) or the
hardware vector table. Please refer to your device and compiler manuals for
details.
In other devices, such as the Atmel AVR, interrupt handlers require special entry
and exit instructions. To inform the compiler that a function is an interrupt handler
and thus needs special code generation, the compiler usually provides special
#pragma or extended keywords for this purpose. For example, in JumpStart C for
AVR, you may write:

#pragma interrupt_handler timer_isr:20

void timer_isr(void)
{

}

Some attributes of well written interrupt handlers are:

An interrupt handler should be reentrant. (See next section.)

An interrupt handler should limit resource usage. For example, it should not
use too much stack space.

An interrupt handler should finish as quickly as possible, so as not to pause
the main application for too long.
For example, instead of doing processing in an interrupt handler, the handler
can set a flag which would cause the main application to eventually perform
the job.

An interrupt handler should be careful in calling other functions, as they may
use too many resources (stack space) or take a long time to run. Also, the
compiler might have to save and restore more CPU registers if a handler calls
other functions.
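
The "set a flag and let the main application do the work" advice can be sketched as follows. This is an illustration under stated assumptions: TimerHandler and PollTimer are hypothetical names, and how the handler is attached to an interrupt vector depends on your device and compiler.

```c
#include <stdbool.h>

// volatile: the flag changes outside the normal program flow
static volatile bool tick_pending = false;
static unsigned tick_count = 0;

// Hypothetical interrupt handler: kept short, it only records the event
void TimerHandler(void)
{
    tick_pending = true;
}

// Called repeatedly from the main loop; returns the ticks processed so far
unsigned PollTimer(void)
{
    if (tick_pending)
    {
        tick_pending = false;
        tick_count++;       // the longer processing would happen here
    }
    return tick_count;
}
```

The handler itself uses almost no stack and calls no other functions, so the main application is paused only briefly.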
Reentrant Functions
A reentrant function is a function that can safely be called again while a prior
call to it is still running: the overlapping calls do not affect each other’s
operations.
A properly operating recursive function is an example of a reentrant function. In an
embedded system, an interrupt handler may also call a function while it is already
running in the main program context. Therefore, an interrupt handler should only
invoke reentrant functions. In an RTOS (Real Time Operating System)
environment, task functions should also be reentrant.
The main attribute of a reentrant function is that it must be careful in accessing –
and especially writing to – global or static variables. For a contrived example:

void swap(int *p, int *q)
{
    static int tmp;   // shared by every call: this is the problem

    tmp = *p;
    *p = *q;
    *q = tmp;
}

“swap” is not reentrant because it uses a static variable as temporary storage. If
it is interrupted and called again, then the variable tmp might pick up an incorrect
value:

(Primary program execution):
1. swap() is called
2. “tmp = *p” is executed
(Interrupt occurs, primary execution paused, interrupt
handler running)
3. interrupt handler calls swap()
4. “tmp = *p” is executed
5. rest of swap() runs
(Interrupt handler returns, primary execution resumes)
6. rest of swap() runs in primary execution
7. “*q = tmp” executed

The value of tmp at step 7 is no longer the same as the one which the primary
execution context saved in step 2, and thus an incorrect result occurs. A simple fix
for this case would be to declare “tmp” as an ordinary (non-static) local variable
instead of as a static variable.
Also, to be legitimately reentrant, a function should not call other functions that
might not be reentrant.
Lastly, a reentrant function cannot use a locking mechanism to prevent access to
global resources. For example, a function that disables and re-enables an
interrupt is not reentrant.
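
A reentrant version of the swap routine can be sketched like this. The name SwapReentrant is hypothetical; the only change from the contrived example is where the temporary lives.

```c
// tmp is an automatic variable, so every call gets its own private copy
void SwapReentrant(int *p, int *q)
{
    int tmp = *p;

    *p = *q;
    *q = tmp;
}
```

Because nothing is shared between calls, an interrupted call and the interrupting call cannot disturb each other's temporary value.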
Variadic Functions
A variadic function is a function that takes a variable number of arguments. In C, a
variadic function must declare at least one argument. printf is an example of a
variadic function:

extern int printf(const char *fmt, ...);

unsigned char uc;
unsigned short us;
unsigned int ui;

printf("Hello World %c %d %d %u\n", uc, uc, us, ui);

// is exactly the same as if you have written:
printf("Hello World %c %d %d %u\n", (int)uc, (int)uc, (int)us, ui);

The “…” tells the compiler that zero or more arguments may follow at that
argument location. The last argument before the “…” must have a promoted
data type. That is, if it is an arithmetic type, then it must be one of
int, unsigned int, long, unsigned long, double or long double.
When making a function call, the arguments that are in the variadic locations are
promoted to their natural types.
When you make a call to a variadic function, the variadic arguments are usually
pushed onto the stack in declaration order.
Variadic Functions: Accessing the Arguments
In the function definition, you use macros defined in the C standard library header
file stdarg.h to access the variadic parameters:

Example:

#include <stdarg.h>
#include <stdio.h>

void PrintInts(int nargs, ...)
{
    va_list ap;

    va_start(ap, nargs);    // initialize ap with the last
                            // named argument

    for (int i = 0; i < nargs; i++)
    {
        int n = va_arg(ap, int);
        printf("next arg: %d\n", n);
    }
    va_end(ap);
}

int main(void)
{
    PrintInts(4, 0, 1, 2, 3);
    return 0;
}

would output:

next arg: 0
next arg: 1
next arg: 2
next arg: 3

Remember that for a variadic function, the arguments are promoted according to
the Type Promotion rules described earlier.
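
The promotion rule matters most for floating point arguments: a float passed in the variadic part arrives as a double, so va_arg must ask for double. The sketch below shows this with a hypothetical Average function.

```c
#include <stdarg.h>

// Hypothetical example: averages `count` floating point arguments.
// float arguments are promoted to double, so va_arg requests double.
double Average(int count, ...)
{
    va_list ap;
    double sum = 0.0;

    if (count <= 0)
        return 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        sum += va_arg(ap, double);
    va_end(ap);

    return sum / count;
}
```

Note that the function has no way to verify the count or the types of what was actually passed; the caller is responsible for getting both right.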
8. THE C PREPROCESSOR

The C preprocessor takes a source file and produces a set of C tokens, which
looks similar to the original source file. Given a source file, the C Preprocessor
performs the following tasks:

A line ending with a \ (backslash) is concatenated with the following line. This
is most useful with the #define directive, see below.

Comment text block sections enclosed in /* */ are replaced by a single space
(i.e.: they are “stripped” out of the code, as they have nothing to do with
running the program).

A single-line comment starting with // is replaced by a single space.

A line that starts with # is checked to see if it is a C preprocessing directive. If
so, the directive will be carried out.

A legal C token is checked to see if it is a macro invocation, AKA a macro
expansion. If so, the token is replaced by its expansion.

Each of these steps is described below. The C Preprocessor’s operations are
relatively simple, but misunderstanding them can lead to errors. In
particular, an important point is that the C Preprocessor is a textual processor, and
it does not know the C language per se. We will see how that could give rise to
subtle program bugs.
Backslash \ Concatenation, or Line Continuation
C Preprocessor commands are known as directives. Directives start with the ‘#’
symbol and must be contained within a single line. However, sometimes it is more
convenient and less cluttered if the content of the directive can be placed on
separate lines.
To do this, if the last character in a line is a backslash \, it specifies that the line is
to be concatenated with the following line. For example, you can define a multiple-
line macro:

#define ADDMULT(x, a, b, c) \
    (x) = (((a) + (b)) * (c))

With multiple lines, the definition visually looks quite similar to normal C code
instead of a single, possibly overly-long line.

Comments
For readability and maintainability, it is useful to add comments to source code
describing what a particular section of code does. A single line of
commentary in a source code file is prefaced by //, and does not require any
terminating character other than a newline; a large block of comment text that
spans multiple lines is enclosed in the symbol pair /* */.

Examples:

int i; // A single line of commentary in a C source code file.

/* I am multiple lines of commentary that go into depth to
explain something in the code so that it will be clearer
when another person wants to figure out what this section
of code is doing */

After seeing the initial /*, the C preprocessor considers the next */ it sees as the
ending delimiter for the comment block. In other words, nested comment blocks
are not supported.
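
Because comment blocks do not nest, a region of code that already contains /* */ comments cannot be disabled by wrapping it in another comment. One common workaround, sketched below, uses the standard #if 0 ... #endif conditional directives instead. The function Scale is hypothetical.

```c
int Scale(int n)
{
#if 0
    /* old version, kept for reference */
    n = n * 10;     /* this comment does not end the disabled region */
#endif
    return n * 2;
}
```

Everything between #if 0 and #endif is skipped by the preprocessor, regardless of how many comments it contains.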
List of Standard C Preprocessor Directives
[31]
Lines starting with the # character are interpreted as C preprocessor
directives. Compiler-specific extensions are usually specified using #pragma
directives.

Predefined Macros

Standard C specifies that all the following macros be defined in a C standard
conforming compiler before the C Preprocessor processes a source file. (Macro
invocation or expansion is described later.) “Current” refers to “at the time of
compilation”:

__DATE__   expands to a string literal of the current date.
__FILE__   expands to a string literal of the current filename
           (not including the file path prefix).
__LINE__   expands to an integer matching the current line number
           in the file (line numbering starts with 1).
__STDC__   expands to the constant 1.
__TIME__   expands to a string literal of the current time in the form
           “hh:mm:ss”.
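
These macros are often combined into a small diagnostic helper. The sketch below is one possible arrangement; the TRACE macro and the CurrentLine function are hypothetical names, not part of Standard C or of JumpStart C.

```c
#include <stdio.h>

// Hypothetical trace macro built on the predefined macros
#define TRACE(msg) \
    printf("%s:%d: %s (built %s %s)\n", \
           __FILE__, __LINE__, (msg), __DATE__, __TIME__)

int CurrentLine(void)
{
    return __LINE__;    // expands to the number of this source line
}
```

Because __LINE__ is expanded where the macro is invoked, each TRACE call reports the line of the call site rather than the line of the macro definition.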

Additional JumpStart C Predefined Macros
A compiler may provide additional predefined macros. JumpStart C provides the
following:

__IMAGECRAFT__
expands to the constant 1.

__ICC_LICENSE
expands to the constant 0, 1 or 2, for a demo or non-commercial compiler
license, a standard license, or a professional license respectively.

__ICC_VERSION
expands into an integer constant of the form 8xxyy, where xxyy is the 4-digit
minor version number of the compiler, e.g. 81600

__BUILD
[32]
expands into an integer constant representing the build number. This is
defined by the IDE. The build number starts with one, and increments each
time a build is performed. The build number is also written to the .mp map file.

The IDE predefines the identifier used in the Device list in the “Build Options -
Target” dialog box. For example, STM32F030R8 is predefined when you select
that device as your target. This allows you to write conditional code based on the
device.

Finally, a product-specific macro is defined:

__AVR expands to 1 by JumpStart C for AVR
__Cortex expands to 1 by JumpStart C for Cortex-M

#include - File Inclusion
You use the #include directive to insert the contents of the included file in the
location of the #include directive:

#include <stdio.h> // example of a system include file
#include "myheader.h" // example of a user include file
#include filename // example of a macro

Files enclosed between < > are system include files. System include files are
located in the standard system directory. For example, in Unix and Linux systems,
system files such as stdio.h and stdlib.h are placed in the
/usr/include directory.

For JumpStart C compilers on a Windows platform, the “standard system
directory” is located at c:\<installation root>\include\

Files that are enclosed between ” ” (double quotes) are user include files. They
can be located in any directory.

If the filename is not enclosed in a < > or ” ” pair, then the C Preprocessor will
expand it if it is a macro name (see the later section on macro processing):

#define file "myheader.h" // defining the macro
#include file // macro for Preprocessor to expand

The macro in the second line above expands to:

#include "myheader.h"

after which the Preprocessor inserts the contents of the file “myheader.h” at that
location.
While it is not a formal requirement, header files are usually given the .h
extension. A header file does not normally contain executable code (i.e. function
definitions), and is usually #include’d by multiple source files. It may contain
things like:

#includes of other header files

#defines of commonly used macros

definitions for shared structures and unions

shared typedef definitions

extern declarations of global variables

extern declarations of functions



Filename specifications are dependent on the underlying OS. For Windows, the
formats are:

Absolute path - <drive>:\path\to\file

Relative path - ..\path\to\file
                .\path\to\file
                path\to\file
                file

In a relative path, one dot “.” refers to the current directory and two dots “..” refer
to the “parent” directory.

#include: Order of File Search
To locate a #include header file, the C Preprocessor uses the following
rules. First, if the file name is an absolute path, then the specified path is used.
Otherwise, for a relative path file name:

1. If the < > is used in the #include directive, then search for the file in the
standard system include directory. If the file is not found, then

2. Search for the file starting from the directory where the enclosing file is
located, applying any relative path specification (e.g. “..” to reach
the parent directory). If the file is still not found, then

3. If the compiler allows the users to add additional include file paths, search
them in the order specified.

For JumpStart C, include path(s) are specified through the Project->Build Options
dialog box.
#define - Simple Macro Definitions
Use #define to define a simple substitution macro:

#define <name> <definition>

Examples:
#define PI 3.1415926
#define TABLE_LENGTH 10

Any whitespace between the macro name and the definition is ignored and is not
part of the definition, as is any whitespace after the last non-whitespace character
in the definition.
A macro definition is terminated by a newline. In particular, do not put a C
terminator semicolon ‘;’ at the end of a macro definition, as a preprocessor macro
does not require it.
A macro definition can be removed by using the #undef directive. The scope of
the macro runs from the place where it is defined until the end of the current
source file being processed, or until the Preprocessor encounters a #undef for
the named macro.
Unless placed in a header file that is included by multiple source files, a macro
definition in one source file is not visible to any other source files.
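
The scope rule can be seen in a short sketch, where the same macro name is defined, removed, and defined again. The names BUFFER_SIZE, small_buffer, and large_buffer are hypothetical.

```c
#define BUFFER_SIZE 16
char small_buffer[BUFFER_SIZE];     // expands to 16

#undef BUFFER_SIZE                  // the name is undefined from here on
#define BUFFER_SIZE 64
char large_buffer[BUFFER_SIZE];     // expands to 64
```

Each array picks up whichever definition is in effect at the point where the macro name appears; the earlier array is not affected by the later redefinition.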
Simple Macro Invocation
Whenever a macro name is seen in the source file after the #define statement,
then its occurrence is replaced by the definition text. This is called a macro
invocation. A macro invocation may result in another macro name, which will then
itself be expanded. However, the C Preprocessor remembers the chain of macro
names being expanded, and will terminate the expansion if the same macro name
is invoked more than once (recursive definition).
Simple macro definition is useful to replace frequently used sequences. For
example, if you declare an array of 10 elements, it is best to use a macro name to
represent the array size, as it might appear multiple times in the code:

#define TABLE_LENGTH 10
int table[TABLE_LENGTH];

for (int i = 0; i < TABLE_LENGTH; i++)
    table[i] = 0;


Using a macro name allows the array size to be changed if specifications change,
without modifying all the places where the array size appears.
CAUTION: Macro Abuse!
BAD PROGRAMMING PRACTICES: Some C programmers like to define “more
user friendly” names to make it “easier to remember” some of the C operators; for
example, some common macros are:

#define AND &&
#define OR ||

//now they can write
if (a == b AND c != d OR a < d)


In our opinion, these macros do not serve any useful function: they make it harder
for someone else to understand your “dialect” of C, and they make it harder for you
to understand other people’s code.

Another abuse is to change a definition without changing the macro name. Most
of the time, doing this is not necessarily a bad practice. However, something like
this was actually found in production code:

#define FIVE 4

Using a macro like “FIVE” instead of simply writing “5” was probably a Bad Idea™
to begin with. It’s NOT the same as “#define NAME_LEN 5” where the name
provides some sort of clue on how it might be used. FIVE is just not informative.

The worst though is that the constant was obviously changed, perhaps to work
around An Issue[33], but the macro name remained unchanged. This
demonstrates a profound lack of consideration for code maintenance on the part
of the programmer.
#define: Function-Like Macros
Before the introduction of inline functions as a feature of the C language, using a
function-like macro was a way to simulate this feature:

#define <name>(<optional-args>) <definition>

Example:

#define max(a, b) (((a) > (b)) ? (a) : (b))

A function-like macro differs from a simple macro in that a list of macro arguments,
separated by commas and enclosed by a set of parentheses ( ), follows the
macro name. Unlike a C function declaration, the argument list is just a list of
names, without any type declarations.
Function-Like Macro Invocation
After the function-like macro #define statement, if the macro name is seen in the
source file, and if it is followed by a set of ( ), then its occurrence including the
arguments (at the invocation point, they are known as actual arguments) is
replaced by the definition text. The number of actual arguments must match the
number of arguments in the macro definition.

During the substitution phase, any occurrence of an argument in the definition text
is replaced by the actual argument. For example, using the max(a, b) macro
defined in the previous example:

int x, y, z;
x = max(y, z);

is expanded as:

x = (((y) > (z)) ? (y) : (z));

BEST PRACTICE: It must be emphasized, again[34], that the preprocessor is a
textual processor, i.e. the definition text replaces the original text without
considering the validity of the definition text as C code. This is why in the definition
text, you should always enclose any reference to an argument in a set of ( ), as
shown in the example above. It is also common to enclose the entire definition (if
it is an expression) in a set of ( ). This will prevent any unintended effects after
expansion.

For example, given the definition of a “mult” macro as below:
#define mult(a, b) a * b

int x, y, z;
x = mult(y + z, y);

the macro expands to:
x = y + z * y;

According to C precedence rules, this expression is the same as:
x = y + (z * y);

even though the programmer probably meant the expression to be interpreted as:
x = (y + z) * y;

To fix this, simply enclose arguments in the macro definition with sets of
parentheses, and the desired effect will result:
#define mult(a, b) ((a) * (b))
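
The difference between the two definitions can be checked with a short sketch. The names mult_bad, mult_ok, product_bad, and product_ok are illustrative only and do not appear elsewhere in this book:

```c
/* The same macro, without and with protective parentheses. */
#define mult_bad(a, b) a * b
#define mult_ok(a, b)  ((a) * (b))

/* Expands to y + z * y, which C evaluates as y + (z * y). */
int product_bad(int y, int z) {
    return mult_bad(y + z, y);
}

/* Expands to ((y + z) * (y)), which is what the programmer meant. */
int product_ok(int y, int z) {
    return mult_ok(y + z, y);
}
```

With y = 2 and z = 3, product_bad yields 8 while product_ok yields 10.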
Function-Like Macros vs. Inline Functions
With the introduction of inline functions, function-like macros should only be used
if special features (e.g. tokenization or stringizing, explained in later sections) are
needed, or if the macro definition is short. Otherwise, inline functions should be
used.
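
One practical reason to prefer an inline function is that a macro argument is substituted as text, so it is evaluated every time it appears in the definition. The following is a minimal sketch of that effect; MAX_MACRO, max_int, and the two counting functions are hypothetical names chosen for this illustration:

```c
#define MAX_MACRO(a, b) (((a) > (b)) ? (a) : (b))

/* An inline function evaluates each argument exactly once. */
static inline int max_int(int a, int b) {
    return (a > b) ? a : b;
}

/* Returns how many times i was incremented by the macro version. */
int macro_increments(void) {
    int i = 5;
    int m = MAX_MACRO(i++, 3);  /* i++ appears twice in the expansion */
    (void)m;
    return i - 5;
}

/* Returns how many times i was incremented by the inline version. */
int inline_increments(void) {
    int i = 5;
    int m = max_int(i++, 3);    /* i++ is evaluated once, before the call */
    (void)m;
    return i - 5;
}
```

Here macro_increments returns 2, because the argument text i++ is evaluated once in the comparison and once more when it is selected as the result, whereas inline_increments returns 1.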
Variadic Macros
Just like a C function, a C macro may specify a variadic argument list:

#define mac1(a, ...) foo(a+1, __VA_ARGS__)
#define mac2(...) bar(__VA_ARGS__)

mac1(1, 2, 3) → foo(1+1, 2, 3)
mac1(a, b) → foo(a+1, b)

mac2(1, 2, 3) → bar(1, 2, 3)
mac2(1) → bar(1)

Variadic macros allow a variable number of actual arguments to appear when the
macro argument is specified as “...” (three dots). In the macro definition, the
special word __VA_ARGS__ is replaced with the variadic actual arguments.
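
As a compilable sketch of the mac1 pattern, the fragment below substitutes a hypothetical three-argument function sum3 for the placeholder foo; neither sum3 nor expand_demo is part of any library:

```c
/* Hypothetical stand-in for the placeholder function foo. */
int sum3(int a, int b, int c) {
    return a + b + c;
}

/* The first argument is named; the rest are passed through unchanged. */
#define mac1(a, ...) sum3((a) + 1, __VA_ARGS__)

int expand_demo(void) {
    return mac1(1, 2, 3);   /* expands to sum3((1) + 1, 2, 3) */
}
```

Calling expand_demo() returns 7, the sum of 2, 2, and 3.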
ADVANCED TOPIC: Tokenization
In a macro definition, a new token is created if two tokens are separated by the
character sequence ##. For example, writing a##b is the same as if you have
written the token ab. This is called tokenization or token pasting:

#define make_add(type) \
type add_##type(type a, type b) { \
return (a) + (b); \
}

make_add(float)
make_add(int)

The macro invocations expand to two function definitions:

float add_float(float a, float b) { return (a) + (b); }
int add_int(int a, int b) { return (a) + (b); }

Tokenization is useful in creating multiple functions from a basic template, as
shown above.
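
The following sketch shows the generated functions being called. One limitation worth noting is that the argument must be a single token, so a type such as unsigned long cannot be pasted directly; the typedef name ulong below is an assumption introduced to work around that:

```c
#define make_add(type) \
    type add_##type(type a, type b) { \
        return a + b; \
    }

/* A single-token alias, since "unsigned long" is two tokens. */
typedef unsigned long ulong;

make_add(int)     /* defines add_int   */
make_add(float)   /* defines add_float */
make_add(ulong)   /* defines add_ulong */

int add_demo(void) {
    return add_int(2, 3);
}
```

Each invocation of make_add produces an independent function, so add_int, add_float, and add_ulong can all be used in the same source file.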

ADVANCED TOPIC: Stringizing
In the definition of a function-like macro, if a macro argument is prefixed with a
single # character, then a literal string is created with the actual argument at
invocation:

#define make_string(a) #a
#define concat(a, b) #a " " #b

make_string(hello world) → "hello world"
concat(hello, world) → "hello" " " "world"
→ "hello world"

In the “concat” macro, the definition also takes advantage of the C feature that if
two literal strings appear adjacent to each other, they will then be merged into a
single string.
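
A common use of stringizing is to print an expression together with its value. The sketch below assumes a hosted compiler that provides the C99 function snprintf; SHOW_INT and show_sum are hypothetical names:

```c
#include <stdio.h>

/* Writes "<expression text> = <value>" into buf. */
#define SHOW_INT(buf, size, expr) \
    snprintf((buf), (size), "%s = %d", #expr, (expr))

int show_sum(char *buf, size_t size, int a, int b) {
    /* #expr becomes the literal string "a + b" */
    return SHOW_INT(buf, size, a + b);
}
```

With a = 2 and b = 3, the buffer receives the text a + b = 5. The expression text comes from the stringizing operator, and the value comes from evaluating the same argument as ordinary C code.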
#undef - Undefine a Macro Name

#undef directs the C Preprocessor to “forget” the macro definition of the macro
name. It is not considered an error to #undef a name that has not yet been
defined. #undef un-defines both simple and function-like macros.

#undef <name>

Example:
#undef max

Conditional Processing
These directives allow conditional processing of the source file. This must start
with an “if test directive”, and the conditional group of lines is terminated by
#endif at the other end. Before the ending #endif, there might optionally be an
“else directive” line group.

An “if test directive” is one of #if, #ifdef (“if defined”), or #ifndef (“if not
defined”). For readability, on this page, #if is used to represent any of these
directives.

An “else directive” is either #else or #elif (“else if”). For readability, on this
page, #else is used to mean either of these directives.

A conditional processing group is as follows:

#if <condition>
<a line group that will only be processed by the
Preprocessor if the condition passes>

#else
<an (optional!) line group that will only be processed
if the condition fails>
#endif

Conditional processing can be nested, and each #endif is always paired with the
immediately preceding unmatched #if directive. Indentation does not matter in the
actual source code; the indentation in the example below only shows the groupings:

#if <condition>
    <stuff>
    #if <condition>
        <…stuff…>
    #else
        <…some different other stuff…>
    #endif
#else
    <stuff>
#endif
Conditional Test with #if <expr>
If <expr> is evaluated to nonzero, then the line group following until the
corresponding #else or #endif is processed by the Preprocessor. Otherwise,
the entire line group, including any nested conditional group, is skipped.

When processing the <expr>, the following actions are performed:

Given a defined(<name>) or defined <name>:

If there is a macro definition for “<name>”, then it is replaced with the value 1.
Otherwise, it is replaced with the value 0.

Given a <name>:

If <name> has a macro definition, it is expanded. Otherwise, it is replaced with
the value 0. After a macro expansion, the expanded text is processed again.
For example:

#define A B
#define B 1

//So:
#if A → expands to #if B

// And then:
#if B → expands to #if 1

If there are any C-style operators, they are evaluated using C rules.


For example:
// These two are usually defined externally in the IDE, or as
// predefined macros
#define __ICC_VERSION 81020
#define DEBUG 1

// In the source file:
#if __ICC_VERSION >= 81000 && defined(DEBUG)
→ expands to #if 81020 >= 81000 && 1

// And then:
#if 81020 >= 81000 && 1
→ evaluates to #if 1 && 1

// And then:
#if 1 && 1
→ evaluates to #if 1

Notes:
The sizeof operator is not allowed in the <expr>; e.g. you cannot write:

// could be useful, but sadly it is invalid
#if sizeof(unsigned int) >= sizeof(char *)

Likewise, a typecast operator is also not valid; e.g. you cannot write:

#define PI 3.1415926
#if (int)PI >= 3
Conditional Test with #ifdef / #ifndef <name>
#ifdef <name> is the same as writing #if defined(<name>)

#ifndef <name> is the same as writing #if !defined(<name>)
Conditional Test with #else / #elif <expr>
#else starts the alternate line group of a conditional group. #elif <expr> is
shorthand for writing #else followed by a #if/#endif directive pair:

#elif <expression>
<stuff>
#endif

This is the same as writing

#else
#if <expression>
<stuff>
#endif
#endif
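
As an illustration of a complete conditional group, the sketch below selects a string at compile time. TARGET_BITS, TARGET_NAME, and target_name are hypothetical names invented for this example:

```c
/* Hypothetical configuration macro; normally set in the IDE
   or on the compiler command line. */
#ifndef TARGET_BITS
#define TARGET_BITS 32
#endif

#if TARGET_BITS == 8
#define TARGET_NAME "8-bit"
#elif TARGET_BITS == 16
#define TARGET_NAME "16-bit"
#elif TARGET_BITS == 32
#define TARGET_NAME "32-bit"
#else
#define TARGET_NAME "unknown"
#endif

/* Only one of the #define lines above is ever seen by the compiler. */
const char *target_name(void) {
    return TARGET_NAME;
}
```

Because only the first line group whose condition passes is processed, a chain of #elif directives avoids the deep nesting that repeated #else / #if pairs would require.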
Conditional Test End Marker #endif
#endif ends the conditional group.
#warning <message> - Output a Warning Message
#warning <message> causes the C Preprocessor to output the <message> as
a warning. For example:

#if __ICC_VERSION < 81000
#warning “Out-of-date compiler is being used”
#endif

When processing this code fragment, if the conditional succeeds (evaluates to 1),
then the C Preprocessor outputs the message.
#error <message> - Output an Error Message
#error <message> causes the C Preprocessor to output the <message> as an
error diagnostic. For example:

#if __ICC_VERSION < 81000
#error “Out-of-date compiler is being used”
#endif

When processing this code fragment, if the conditional succeeds (evaluates to 1),
then the C Preprocessor outputs the message.

Unlike with a “mere warning”, the C Preprocessor may then terminate and return
with an error status.
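
#error is also useful for stating an assumption that the code relies on, so that a violation is caught at compile time rather than at runtime. The following is a minimal sketch using the standard macro CHAR_BIT from limits.h; the function low_byte is illustrative only:

```c
#include <limits.h>

/* Stop the build early if the target does not have 8-bit bytes. */
#if CHAR_BIT != 8
#error "This code assumes 8-bit bytes"
#endif

/* Returns the least significant 8 bits of v. */
unsigned low_byte(unsigned v) {
    return v & 0xFFu;
}
```

On a target where the condition fails, the build stops at the #error line and the message explains why.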
#once - Include a File Once Only
#once directs the C Preprocessor not to process this file again. This can be
placed anywhere in the included file. This is usually used in a header file so it will
not be processed more than once, which could cause problems, depending on the
content of the include file. Without this directive, a previously common idiom
(sometimes still found in older code) that was used to prevent this behavior is:

// assuming the include file is named this_header.h
#ifndef THIS_HEADER_H
#define THIS_HEADER_H

// actual content of the file
#endif

The macro name “THIS_HEADER_H” should reflect the actual filename of the file,
to prevent name collision with other header files.

Using #once simplifies the process.
ADVANCED TOPIC: #pragma - Compiler Specific
Extensions
#pragma (derived from the word “pragmatic”) provides a method for a C compiler
to define product-specific extensions. As such, there are no generic #pragma
descriptions. If you are not using JumpStart C, please refer to your compiler
manual for the specific list of supported pragmas.

(Incomplete) List of JumpStart C Pragmas

#pragma ignore_unused_var <name1> <name2> …
Normally, the JumpStart C compilers issue a warning if a local variable is not
referenced. This pragma prevents the compiler from issuing such diagnostics
on the listed name(s).

#pragma warn <message>
Same as #warning <message>

#pragma once
Same as #once

#pragma interrupt_handler <func1>:<vec1> <func2>:<vec2> …

Specific to JumpStart C for Atmel AVR, this directs the compiler to associate
function names as interrupt handlers, since interrupt handlers require different
entry and exit code than normal C functions. This pragma MUST precede the
definitions of those functions.

#pragma ctask <func1> <func2> …

Directs the compiler to treat the named functions as “C tasks” for a
multitasking kernel, which requires different entry and exit code than normal C
functions. This pragma MUST precede the function definitions.
_Pragma(<string>) - Compiler Specific Extension

_Pragma(<string>) is the same as writing #pragma <string>:

// The following are equivalent
_Pragma("ctask foo bar")
#pragma ctask foo bar
ADVANCED TOPIC: #line - Specify Line Number
This is normally not used by most programmers. It is included here for reference
only.

#line <line>
#line <line> <filename>
change the line number and the file name used for compiler diagnostic messages.

The C Preprocessor itself emits these directives when processing include files so
that the C compiler proper can reference the correct line numbers and filename
when issuing diagnostic error messages related to the C code itself.

9. THE STANDARD C LIBRARY

The Standard C Library is a major part of the C programming environment. The C
language itself is relatively simple, but its usefulness is greatly enhanced by the
functions in the Standard C Library. Functions exist to manipulate strings and
dynamic memory, and to provide floating point functions, among others.
The C compiler automatically links a user-written application with the Standard C
Library. The Standard C Library function prototypes are declared in a set of
header files, located in the system include directory, which by default for
JumpStart C is the location c:\<installation root>\include. Note that
you might add header files to the JumpStart C IDE project file list; however, this is
for documentary purposes only, as this does not affect your project source code.
You still must use the C Preprocessor directive to include header files in your
code.
Header Files
The following Standard C header files are supported. Per C rules, you will get a
warning from the compiler if you use a library function but do not #include the
header file (which contains the function prototype). If you fail to #include the
header, your program may fail at runtime since the compiler must know about the
function prototype in order to generate correct code.

The following are the specific header files and functions provided by the
JumpStart C compilers. Standard C further defines other header files that are not
particularly important to embedded system programming and so are not currently
included in JumpStart C.

assert.h assert(), the assertion macros

ctype.h character type functions

float.h properties of floating point representations

limits.h properties of integer type representations

math.h floating-point math functions

setjmp.h transfer program control bypassing normal function calls and returns

stdarg.h support for variable argument functions

stddef.h standard defines

stdio.h standard I/O (input/output) functions

stdlib.h standard library including memory allocation functions

string.h string manipulation functions

time.h time manipulation functions



Remember to use the line #include <header.h> in your source file for
each header file you need before referencing the functions within them.

The following sections describe the Standard C Library functions within each
header file.

assert.h - Assertion Macros and Functions
assert.h contains the assert() macro.

assert(expr)
Test “expr” against zero. If expr is zero (false), this macro prints out a
diagnostic message indicating the location in the code where the assert()
macro was placed, and then exits.

This is defined as a C Preprocessor macro, as it expands to a library function
call passing the filename and line number using the C Preprocessor built-in
macros __FILE__ and __LINE__.

The printing is done using stdio output mechanisms (see stdio in a later
section.) The message is generally in the following form (there might be
variations depending on the compiler):

assertion Error: “<expr>” in file <file> and line <line>
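
A typical use of assert() is to document and check the conditions a function expects from its caller. The sketch below uses a hypothetical function named average:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the integer average of count values.
   The caller must pass a valid array and a positive count. */
int average(const int *values, int count) {
    assert(values != NULL);
    assert(count > 0);

    long sum = 0;
    for (int i = 0; i < count; i++)
        sum += values[i];
    return (int)(sum / count);
}
```

If either condition is violated, the program stops at the failing assert() and reports its location, instead of continuing with bad data.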


ctype.h - Character Type Functions
The following functions categorize input according to the ASCII character set.

int isalnum(int c)
returns non-zero if c is a digit or alphabetic character.

int isalpha(int c)
returns non-zero if c is an alphabetic character.

int iscntrl(int c)
returns non-zero if c is a control character (for example, FF, BELL, LF).

int isdigit(int c)
returns non-zero if c is a digit.

int isgraph(int c)
returns non-zero if c is a printable character and not a space.

int islower(int c)
returns non-zero if c is a lowercase alphabetic character.

int isprint(int c)
returns non-zero if c is a printable character.

int ispunct(int c)
returns non-zero if c is a printable character and is not a space or a digit or an
alphabetic character.

int isspace(int c)
returns non-zero if c is a space character, including space, CR, FF, HT, NL,
and VT.

int isupper(int c)
returns non-zero if c is an upper-case alphabetic character.

int isxdigit(int c)
returns non-zero if c is a hexadecimal digit.

int tolower(int c)
returns the lower-case version of c if c is an upper-case character. Otherwise it
returns c.

int toupper(int c)
returns the upper-case version of c if c is a lowercase character. Otherwise it
returns c.
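
The following sketch shows two of these functions in use. The helper names upcase and count_digits are illustrative only. The cast to unsigned char is a common precaution on compilers where plain char is signed, since these functions expect a value in the range of unsigned char:

```c
#include <ctype.h>

/* Converts every lowercase letter in s to uppercase, in place. */
void upcase(char *s) {
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* Counts the decimal digits in s. */
int count_digits(const char *s) {
    int n = 0;
    for (; *s != '\0'; s++)
        if (isdigit((unsigned char)*s))
            n++;
    return n;
}
```

Characters that are not lowercase letters, such as spaces and digits, pass through upcase unchanged.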

float.h - Properties of Floating Point Representations
float.h defines various properties of the floating point representations specific to
the compiler and the target machine.

FLT_RADIX
this is the radix of the floating-point representations. JumpStart C defines this as 2,
as would most compilers for targets that use binary floating-point formats.

FLT_ROUNDS
this describes the rounding mode being used: it is -1 if the mode is
indeterminate, 0 if rounding is toward zero, 1 if rounding to nearest
representable value, 2 if rounding toward +infinity, and 3 if rounding toward -
infinity.

JumpStart C defines this as -1.

Double specific macros:


An equivalent set exists for float data type:
limits.h - Properties of Integer Type Representations

The following macros are defined. Note that there is no <unsigned type>_MIN, as
that is always zero.


math.h - Floating-Point Math Functions
The following floating-point math routines are supported in this header file.

float asinf(float x)
returns the arcsine of x, in radians.

float acosf(float x)
returns the arccosine of x, in radians.

float atanf(float x)
returns the arctangent of x, in radians.

float atan2f(float y, float x)
returns the angle whose tangent is y/x, in the range [-pi, +pi] radians.

float ceilf(float x)
returns the smallest integer not less than x.

float cosf(float x)
returns the cosine of x for x in radians.

float coshf(float x)
returns the hyperbolic cosine of x.

float expf(float x)
returns e to the x power.

float exp10f(float x)
returns 10 to the x power.

float fabsf(float x)
returns the absolute value of x.

float floorf(float x)
returns the largest integer not greater than x.

float fmodf(float x, float y)
returns the remainder of x/y.

float frexpf(float x, int *pexp)
splits x into a fraction f, which is returned, and a base-2 exponent, which is
stored into *pexp, such that x equals f * 2**(*pexp). The magnitude of f is in
the interval [1/2, 1), or f is zero if x is zero.

float froundf(float x)
rounds x to the nearest integer.

float ldexpf(float x, int exp)
returns x * 2**exp.

float logf(float x)
returns the natural logarithm of x.

float log10f(float x)
returns the base-10 logarithm of x.

float modff(float x, float *pint)
splits x into a fractional part f, which is returned, and an integer part, which
is stored into *pint, such that f + (*pint) equals x. abs(f) is in the interval
[0, 1) and both f and *pint have the same sign as x.

float powf(float x, float y)
returns x raised to the power y.

float sqrtf(float x)
returns the square root of x.

float sinf(float x)
returns the sine of x for x in radians.

float sinhf(float x)
returns the hyperbolic sine of x.

float tanf(float x)
returns the tangent of x for x in radians.

float tanhf(float x)
returns the hyperbolic tangent of x.
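
The decomposition functions frexpf, ldexpf, and modff are easier to understand with concrete values. The sketch below uses only those three functions; the wrapper names rebuild, exponent_of, and fractional_part are hypothetical:

```c
#include <math.h>

/* Splits x with frexpf, then reassembles it with ldexpf. */
float rebuild(float x) {
    int e;
    float f = frexpf(x, &e);   /* x == f * 2**e */
    return ldexpf(f, e);
}

/* Returns the base-2 exponent that frexpf reports for x. */
int exponent_of(float x) {
    int e;
    (void)frexpf(x, &e);
    return e;
}

/* Returns the fractional part of x and stores the integer part. */
float fractional_part(float x, float *whole) {
    return modff(x, whole);
}
```

For example, 8.0 is 0.5 * 2**4, so exponent_of(8.0f) returns 4, and fractional_part(3.25f, &w) returns 0.25 while storing 3.0 into w.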
setjmp.h - Transfer Program Control
These functions transfer program control bypassing the normal function calls and
returns.

jmp_buf
a typedef type, for declaring a variable to be used for storing the execution
context. The actual type is compiler and target device dependent. For
example:

jmp_buf env;

int setjmp(jmp_buf env)
stores the current execution context into “env”, and immediately returns zero.
This function may “return” again via a longjmp call, but in that case the return
value is guaranteed to be nonzero. setjmp’s function is basically just to “set up”
a location in the code for a later longjmp (described below) to be able to
“return” to during runtime.

Setjmp example:

if (setjmp(env) == 0)
<normal processing>

void longjmp(jmp_buf env, int val)
causes execution to “jump” back to the location of the setjmp call, as if the
setjmp call had just returned (but this time with a non-zero return value).

longjmp should never be called before the initial setjmp call has actually been
made, because this will almost certainly result in disaster. The longjmp call
must be made by a function in a runtime call chain originating inside the
function which called setjmp.

If “val” is nonzero, it is used as the return value from setjmp. Otherwise, the
value 1 is used as the return value.

Obviously, longjmp never returns to its own location, as program control
resumes at the setjmp call site.

setjmp.h can be used for error recovery without going through a long chain of
error condition checking; see next section.

setjmp.h - Error Recovery
setjmp and longjmp can be used for more efficient (or at least less verbose) error
recovery.
It is of course possible to perform error recovery WITHOUT the use of setjmp and
longjmp, as follows:
In a deeply nested function call chain, if the nested function detects an error, it can
return to the function that called it, and so on, until a top level function resumes
control. At each call and return, an error condition needs to be checked and the
error status needs to be propagated up.

Example:

// top level function
if (func1() == ERROR)
<error handling>


// func1
if (func2() == ERROR)
return ERROR;


// func2
if (func3() == ERROR)
return ERROR;


// func3
if (<bad condition>)
return ERROR;


However, using the setjmp.h facility, func3 can use the longjmp function to
jump back to the top level function.

Example:

jmp_buf env;

// top level function
if (setjmp(env) == 0)
func1();
else
<error handling>

// func1
func2();


// func2
func3();


// func3
if (<bad condition>)
longjmp(env, -1);


Notice that in this example, func1 and func2 do not need to perform any error
checking themselves, simplifying the coding.
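
The outline above can be turned into a compilable sketch as follows. The "bad" parameter is an assumption added here so that the error path can be triggered on demand, and run is a hypothetical name for the top level function:

```c
#include <setjmp.h>

static jmp_buf env;

/* Deepest function: jumps straight back to the top level on error. */
static void func3(int bad) {
    if (bad)
        longjmp(env, -1);
}

/* The intermediate functions perform no error checking at all. */
static void func2(int bad) { func3(bad); }
static void func1(int bad) { func2(bad); }

/* Top level function: returns 0 on success and 1 if an error occurred. */
int run(int bad) {
    if (setjmp(env) == 0) {
        func1(bad);     /* normal processing */
        return 0;
    }
    return 1;           /* reached only through longjmp */
}
```

Note that setjmp appears directly in the controlling expression of the if statement, which is one of the contexts in which Standard C permits it to be called.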

stdarg.h - Variable Argument Functions
stdarg.h provides support for variable argument processing.
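
As a brief sketch of how the stdarg.h macros fit together, the hypothetical function sum_ints below adds up a variable number of int arguments. The first, named argument tells the function how many values follow:

```c
#include <stdarg.h>

/* Returns the sum of "count" int arguments that follow count. */
int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;

    va_start(ap, count);            /* begin after the last named argument */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next argument as an int */
    va_end(ap);                     /* always pair va_start with va_end */

    return total;
}
```

For example, sum_ints(3, 1, 2, 3) returns 6. The function has no way to discover the number or types of the variadic arguments by itself, which is why printf relies on its format string for the same purpose.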

stddef.h - Standard Defines
Some standard macros are defined in multiple header files, e.g. NULL. This
particular header file contains all the standard macros and typedef types.

NULL
the null pointer value. It has the value 0.

offsetof(struct_type, member)
returns the byte offset of “member” in a structure. “struct_type” must be a
structure type. For example:

struct s {
int x;
int y;
int z;
};

… = offsetof(struct s, y);

ptrdiff_t
a signed integer type that can hold the difference between two pointer values.
Example:

ptrdiff_t diff;
int *x, *y;

diff = x - y;

size_t
an unsigned integer type that can hold the result of a sizeof operator. It is
also the return type of certain string.h functions, e.g. strlen. Example:

size_t size;

size = sizeof (int);
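
The following sketch combines the definitions from this header in compilable form. The function names offset_of_y and elements_between are illustrative only:

```c
#include <stddef.h>

struct s {
    int x;
    int y;
    int z;
};

/* Byte offset of member y from the start of struct s. */
size_t offset_of_y(void) {
    return offsetof(struct s, y);
}

/* Number of int elements between two pointers into the same array. */
ptrdiff_t elements_between(const int *first, const int *last) {
    return last - first;
}
```

Note that subtracting two pointers yields a count of elements, not bytes, whereas offsetof always yields a count of bytes.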
stdio.h - Standard Input / Output (I/O) Functions
Standard I/O functions read and write to “standard Input Output” channels. On a
Unix/Linux or a Windows target machine, these I/O channels might be the
keyboard input and the terminal output of the command prompt or shell window.
These machines would also support file systems, and these functions allow a
portable method to read and write to files in the file systems.
For embedded system targets, there is typically no file system, but nevertheless,
some I/O functions are useful and are usually supported.
This section describes the subset of Standard C Library functions supported by
the JumpStart C compilers. Other embedded system compilers might support
different functions or different options.
NOTE: You will need to initialize the input and output ports. The lowest level of I/O
routines consists of the single-character input (getchar) and output (putchar)
routines, which must be implemented specific to each target device. JumpStart C
provides example implementations for various devices, and for most cases you
can just copy the correct example file to your project.

Once you have implemented the low-level functions, you do not need to make
modifications to the high-level standard I/O functions such as printf, sprintf, scanf,
etc.

int getchar(void)
returns a character from the input channel. You must implement this function
(or copy example code to implement it), as it is device-specific.

char *gets(char *buf)
reads a line of input. You must pass a buffer “buf” into the function. The
function uses the getchar() function to store input characters into “buf” until a
newline character is read and copied. “gets” then stores a NUL character to
terminate the string and returns the address that was originally passed.

In a host environment such as Windows, if the input channel encounters an
“end-of-file” condition, then “gets” returns a null pointer. However, this condition
is not supported by JumpStart C, as the “end-of-file” condition does not make
sense in embedded systems.

int printf(char *fmt, ...)
printf prints out formatted text according to the format specifiers in the “fmt”
string:

The format specifiers are a subset of the standard formats:


%[flags]*[width][.precision][l]<conversion character>

The available flags are:


# - use alternate form. For the x or X conversion (see “conversion
characters” below), a 0x or 0X is generated. For the floating-point conversions,
a decimal point is generated, even if the floating-point value is exactly an
integer.

- (minus) - left-align the output (the default is right-aligned)

+ (plus) - add a ‘+’ character for positive integer output

‘ ‘ (space) - use space as the sign character for a positive integer

0 - pad with zeros instead of spaces

The width is either a decimal integer or a ‘*’ (star), indicating that the value is
taken from the next argument. The width specifies the minimal number of
characters that will be printed, which will be right or left aligned, and padded
with either spaces or zeros depending on the flag characters.

The precision is preceded by a ‘.’ and is either a decimal integer or ‘*’, denoting
that the value is to be taken from the next argument. The precision specifies
the minimal number of digits for an integer conversion, the maximum number
of characters for the s-string conversion, or the number of digits after the
decimal point for the floating-point conversions.

The conversion characters are as follows. NOTE: If an l (letter ‘el’) appears
before an integer conversion character, then the argument is taken as a long
integer.

d - prints the next argument as a decimal integer


o - prints the next argument as an unsigned octal integer
x - prints the next argument as an unsigned hexadecimal integer using a..f
for values 10..15
X - the same as %x except that A..F is used for values 10..15
u - prints the next argument as an unsigned decimal integer
p - prints the next argument as a void pointer. The value printed is the
unsigned hexadecimal address, prefixed by 0x.
s - prints the next argument as a C null-terminated string
S - (NOTE: JumpStart C for AVR only) prints the next argument as a C
null-terminated string in flash (“const”) memory
c - prints the next argument as an ASCII character
e - prints the next argument as a floating-point number in scientific notation
f - prints the next argument as a floating-point number in decimal notation
g - prints the next argument as a floating-point number in either decimal or
scientific notation, whichever is more convenient.

The next page contains examples of printf and its formatting code.

int putchar(char c)
prints out a single character. You must implement this function (or copy
example code to implement it), as it is target-specific.

int puts(char *s)
prints out a string followed by a newline.

int scanf(char *fmt, ...)
reads the input according to the format string “fmt”. The function getchar() is
used to read the input. Therefore, if you override the function getchar(), you
can use this function to read from any device you choose.

Non-whitespace characters in the format string must match exactly with
the input, and whitespace characters are matched with the longest sequence
(including null size) of whitespace characters in the input. A % character in the
format string introduces a conversion specifier.

The format specifiers are a subset of the standard C formats.


%[l]<conversion character>

[l] long modifier. This optional modifier specifies that the matching argument
is of the type pointer-to-long.

d - the input is a decimal number. The argument must be a pointer to a
(long) int.
x/X - the input is a hexadecimal number, possibly beginning with 0x or 0X.
The argument must be a pointer to an unsigned (long) int.
p - the input is a hexadecimal number, possibly beginning with 0x or 0X.
The argument must be cast to a pointer to a “void pointer,” e.g., void **.
u - the input is a decimal number. The argument must be a pointer to an
unsigned (long) int.
o - the input is an octal number. The argument must be a pointer to an
unsigned (long) int.
c - the input is a character. The argument must be a pointer to a character.

int sprintf(char *buf, char *fmt, ...)
prints formatted text into “buf” according to the format specifiers in “fmt”. The
format specifiers are the same as in printf().

int sscanf(char *buf, char *fmt, ...)
same as scanf, except that the input is taken from the buffer “buf”.

int vprintf(char *fmt, va_list va)
same as printf, except that the arguments after the format string are specified
using the stdarg mechanism. [See stdarg.h in an earlier section.]
printf Examples

Recall printf format code is in the form:
%[flags]*[width][.precision][l]<conversion character>

In the examples below, assume the following variable declarations:

float pi = 3.1415926f;
float z = 0.0f;
int i = 0x4A;
char *h = "hello";

flags:
printf format → Output
printf("%x", i); → 4a
printf("%X", i); → 4A
printf("%d", i); → 74
printf("%#x", i); → 0x4a
printf("%#X", i); → 0X4A
printf("%+x", i); → +4a
printf("% x", i); → 4a (has one space in front)
printf("%f", z); → 0
printf("%#f", z); → 0.

width and flags:
printf("%6x", i); → 4a (has 4 spaces in front)
printf("%06x", i); → 00004a
printf("%+6x", i); → +4a

precision:
printf("%f", pi); → 3.14159
printf("%.2f", pi); → 3.14
printf("%s", h); → hello
printf("%.3s", h); → hel
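
The ‘*’ form of width and precision, where the value is taken from the next argument, is not covered by the examples above. The following sketch assumes a hosted compiler that provides the C99 function snprintf; format_row is a hypothetical helper name:

```c
#include <stdio.h>

/* Formats value right-aligned, then left-aligned, in a field of
   "width" characters, followed by at most max_label characters of label. */
int format_row(char *buf, size_t size, int width, int value,
               const char *label, int max_label) {
    return snprintf(buf, size, "%*d|%-*d|%.*s",
                    width, value,       /* width for the first %d  */
                    width, value,       /* width for the second %d */
                    max_label, label);  /* precision for %s        */
}
```

With a width of 5, a value of 42, the label "hello", and a maximum label length of 3, the buffer receives three spaces and 42, a bar, 42 and three spaces, another bar, and then hel.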
stdio.h - Using printf on Multiple Output Devices
NOTE: This is JumpStart C specific information.

JumpStart C makes it very simple to use printf on multiple devices. Your
putchar() function can write to different devices depending on a global variable.
You change the global variable, either directly or through a function, whenever you
want to use a different device.
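
The approach can be sketched as follows. Everything in this fragment is an assumption made for illustration: the device names, the select_output function, and the two buffers that stand in for real hardware. On an actual target, the dispatching function would be your own putchar() and the write functions would access the peripheral registers:

```c
enum output_device { DEVICE_UART, DEVICE_LCD };

/* The global variable that selects the active output device. */
static enum output_device current_device = DEVICE_UART;

/* Buffers standing in for real hardware in this sketch. */
static char uart_log[64];
static char lcd_log[64];
static int uart_len, lcd_len;

static void uart_write(char c) {
    if (uart_len < (int)sizeof uart_log - 1)
        uart_log[uart_len++] = c;
}

static void lcd_write(char c) {
    if (lcd_len < (int)sizeof lcd_log - 1)
        lcd_log[lcd_len++] = c;
}

/* Changes the device used by all subsequent output. */
void select_output(enum output_device device) {
    current_device = device;
}

/* Plays the role of putchar(): sends c to the selected device. */
int device_putchar(char c) {
    if (current_device == DEVICE_LCD)
        lcd_write(c);
    else
        uart_write(c);
    return (unsigned char)c;
}
```

Since printf produces all of its output through the single-character output routine, switching the global variable is enough to redirect every later printf call.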
stdlib.h - Standard Library and Memory Allocation
Functions
The Standard Library header file stdlib.h defines typedef size_t and the macros
NULL and RAND_MAX, and declares the following functions.

Depending on the compiler, you might have to initialize the heap with a system-
specific call before using any of the memory allocation routines (calloc, malloc,
and realloc).

int abs(int i)
returns the absolute value of “i”.

int atoi(char *s)
converts the initial characters in “s” into an integer, or returns 0 if an error
occurs.

double atof(const char *s)
converts the initial characters in “s” into a double and returns it.

long atol(char *s)
converts the initial characters in “s” into a long integer, or returns 0 if an error
occurs.

void *calloc(size_t nelem, size_t size)
returns a memory chunk large enough to hold nelem number of objects, each
of size size. The memory is initialized to zeros. It returns 0 if it cannot honor
the request.

void exit(int status)
terminates the program. In an embedded environment, it typically simply loops
forever, and its main use is to act as the return point for the user’s main
function.

void free(void *ptr)
frees a previously allocated heap memory from a malloc/calloc/realloc call.

char *ftoa(float f, int *status)
and
char *dtoa(double f, int *status)

These are JumpStart C specific functions for converting a floating-point
number to its ASCII representation. They return a static buffer containing the
result.

Both ftoa and dtoa perform the same task, except that ftoa works with 32-
bit floating points, while dtoa works with 64-bit floating points.

Both “ftoa” and “dtoa” have two versions, selected by using an option in the
CodeBlocks IDE options dialog box. The default version of each function is
smaller and faster, but does not support the full range of the floating point
inputs (numbers that are too small or too large).

In the default version of these functions, if the input is out of range, “*status” is
set to the constant _FTOA_TOO_LARGE or _FTOA_TOO_SMALL (defined in
stdlib.h) and zero is returned. Otherwise, “*status” is set to zero and the buffer
is returned.

If you encounter these error results, you can enable the larger (and slower)
version which can handle all valid ranges by enabling the option in the dialog
box.

As with most other C functions with similar prototypes, “*status” means that
you must pass the address of a variable to this function. Do not declare a
pointer variable and pass it without initializing its pointer value.

void itoa(int value, char *buf, int base)
converts a signed integer value to an ASCII string, using “base” as the radix.
“base” can be an integer from 2 to 36.

long labs(long i)
returns the absolute value of “i”.

void ltoa(long value, char *buf, int base)
converts a long value to an ASCII string, using “base” as the radix.

void *malloc(size_t size)
allocates a memory chunk of size “size” from the heap. It returns zero if it
cannot honor the request.

void _NewHeap(void *start, void *end)
This is a JumpStart C specific function that initializes the heap for memory
allocation routines. A typical call uses the address of the symbol _bss_end+1
as the “start” value. The symbol _bss_end defines the end of the bss segment
(see Appendix D, C Compilers and the Runtime Environment). Example:

extern char _bss_end;

// round up to the next 4-byte boundary past _bss_end
unsigned start = ((unsigned)&_bss_end + 4) & ~3u;
_NewHeap((void *)start, (void *)(start + 200)); // 200 bytes heap

int rand(void)
returns a pseudo-random number between 0 and RAND_MAX.

void *realloc(void *ptr, size_t size)
re-allocates a previously allocated memory chunk with a new size from the
heap. The new size may be smaller or larger than the previously allocated
memory chunk.

The content of the previously allocated memory is copied to the new space.
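
A common way to call realloc is sketched below. It assumes the standard behavior that a failed realloc returns a null pointer and leaves the original chunk allocated, so the result is first stored in a temporary. The helper name resize_ints is hypothetical, not part of any library.

```c
#include <stdlib.h>

/* Resize the int buffer that *pbuf points to so it can hold new_count
   elements (new_count must be non-zero).
   Returns 1 on success and updates *pbuf.
   Returns 0 on failure; *pbuf is untouched and still valid. */
int resize_ints(int **pbuf, size_t new_count)
{
    int *tmp = realloc(*pbuf, new_count * sizeof **pbuf);

    if (tmp == NULL)
        return 0;       /* keep using the old buffer */
    *pbuf = tmp;        /* the chunk may have moved */
    return 1;
}
```

Assigning the result of realloc directly to the original pointer would lose the only reference to the old chunk if the call fails.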

void srand(unsigned seed)
initializes the seed value for subsequent rand() calls.

long strtol(char *s, char **endptr, int base)
converts the initial characters in “s” to a long integer according to the base. If
“base” is 0, then strtol chooses the base depending on the initial characters
(after the optional minus sign, if any) in “s”: 0x or 0X indicates a hexadecimal
integer, 0 indicates an octal integer, with a decimal integer assumed otherwise.
If “endptr” is not NULL, then “*endptr” will be set to where the conversion ends
in “s”.
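
As a minimal sketch of how “endptr” is typically used, the hypothetical helper parse_long below accepts a string only if all of it is a number. It assumes a hosted Standard C library in which strtol sets errno to ERANGE on overflow; a small embedded library may not do this.

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 1 and stores the value in *out if all of "s" is a valid number.
   Returns 0 otherwise and leaves *out unchanged. */
int parse_long(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 0);   /* base 0: decimal, 0x hex, or 0 octal */
    if (end == s)                 /* no digits were converted */
        return 0;
    if (*end != '\0')             /* trailing characters after the number */
        return 0;
    if (errno == ERANGE)          /* value does not fit in a long */
        return 0;
    *out = value;
    return 1;
}
```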

unsigned long strtoul(char *s, char **endptr, int base)
is similar to “strtol” except that the return type is unsigned long.

void utoa(unsigned value, char *buf, int base)
same as itoa except that the argument is taken as unsigned int.

void ultoa(unsigned long value, char *buf, int base)
same as “ltoa” except that the argument is taken as unsigned long.

string.h - String Functions
The following string functions and macros are declared in string.h:

Macros and Types

NULL
the null pointer, defined as value 0.

size_t
a typedef name: the unsigned type that can hold the result of a sizeof operator.

Functions
void *memchr(void *s, int c, size_t n)
searches for the first occurrence of “c” in the array “s” of size “n”. It returns the
address of the matching element or the null pointer if no match is found.

int memcmp(void *s1, void *s2, size_t n)
compares two arrays, each of size “n”. It returns 0 if the arrays are equal and
greater than 0 if the first different element in “s1” is greater than the
corresponding element in “s2”. Otherwise, it returns a number less than 0.

void *memcpy(void *s1, const void *s2, size_t n)
copies “n” bytes starting from “s2” into “s1”.

void *memmove(void *s1, const void *s2, size_t n)
copies “s2” into “s1”, each of size “n”. This routine works correctly even if the
inputs overlap. It returns “s1”.
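
A typical case of overlapping inputs is closing a gap inside a single array. The sketch below uses a hypothetical helper, remove_int_at, to show why memmove rather than memcpy is the right call there.

```c
#include <string.h>

/* Remove the element at index "pos" from an array of "count" ints by
   sliding the tail down one slot. The source and destination regions
   overlap, so memmove is used instead of memcpy. */
void remove_int_at(int *a, size_t count, size_t pos)
{
    if (pos + 1 < count)
        memmove(&a[pos], &a[pos + 1], (count - pos - 1) * sizeof a[0]);
}
```

After the call, the first count-1 elements hold the remaining values; the last slot keeps its old content.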

void *memset(void *s, int c, size_t n)
stores “c” in all elements of the array “s” of size “n”. It returns “s”.

char *strcat(char *s1, const char *s2)
concatenates “s2” onto “s1” . It returns “s1”.

char *strchr(const char *s, int c)
searches for the first occurrence of “c” in “s”, including its terminating null
character. It returns the address of the matching element or the null pointer if
no match is found.

int strcmp(const char *s1, const char *s2)
compares two strings. It returns 0 if the strings are equal, and greater than 0 if
the first different element in “s1” is greater than the corresponding element in
“s2”. Otherwise, it returns a number less than 0.

(There is a related function, “strncmp” (see below), which is used to compare
only a portion of the strings.)

int strcoll(const char *s1, const char *s2)
compares two strings using locale information. Under our compilers, this is
exactly the same as the strcmp function.

char *strcpy(char *s1, const char *s2)
copies “s2” into “s1”. It returns “s1”.

(There is a related function, “strncpy” (see below), which is used to copy only a
portion of the “s2” string.)

size_t strcspn(const char *s1, const char *s2)
searches for the first element in “s1” that matches any of the elements in “s2”.
The terminating nulls are considered part of the strings. It returns the index in
“s1” where the match is found.

size_t strlen(const char *s)
returns the length of “s”. The terminating null is not counted.

char *strncat(char *s1, const char *s2, size_t n)
concatenates up to n elements, not including the terminating null, of “s2” into
“s1”. It then copies a null character onto the end of “s1”. It returns “s1”.

int strncmp(const char *s1, const char *s2, size_t n)
is the same as the “strcmp” function except it compares at most “n” characters.

char *strncpy(char *s1, const char *s2, size_t n)
is the same as the strcpy function except it copies at most “n” characters.
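
One detail worth knowing: in Standard C, strncpy does not write a terminating null when “s2” has “n” or more characters. A minimal sketch of a wrapper that always terminates the destination follows; the name copy_string is hypothetical.

```c
#include <string.h>

/* Copy "src" into "dst", which holds dst_size bytes. The result is
   truncated if necessary and is always null terminated. */
void copy_string(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';   /* strncpy alone may leave this unset */
}
```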

char *strpbrk(const char *s1, const char *s2)
does the same search as the “strcspn” function but returns the pointer to the
matching element in “s1” if the element is not the terminating null. Otherwise, it
returns a null pointer.

char *strrchr(const char *s, int c)
searches for the last occurrence of “c” in “s” and returns a pointer to it. It
returns a null pointer if no match is found.

size_t strspn(const char *s1, const char *s2)
searches for the first element in “s1” that does not match any of the elements
in “s2”. The terminating null of “s2” is considered part of “s2”. It returns the
index where the condition is true.
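
“strspn” and “strcspn” are often used as a pair: the first skips a run of unwanted characters, the second measures the run that follows. The sketch below, with the hypothetical helper first_word, finds the first blank-delimited word without modifying the string.

```c
#include <string.h>

/* Stores the start of the first word of "s" in *word and returns the
   word's length. The length is 0 if "s" contains only blanks. */
size_t first_word(const char *s, const char **word)
{
    const char *blanks = " \t\n";

    s += strspn(s, blanks);        /* skip leading blanks */
    *word = s;
    return strcspn(s, blanks);     /* characters up to the next blank */
}
```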

char *strstr(const char *s1, const char *s2)
finds the substring of “s1” that matches “s2”. It returns the address of the
substring in “s1” if found and a null pointer otherwise.

char *strtok(char *s1, const char *delim)
splits “s1” into tokens. Each token is separated by any of the characters in
“delim”. You specify the source string “s1” in the first call to “strtok”.
Subsequent calls to “strtok” with “s1” set to NULL will return the next token
until no more tokens are found, and “strtok” then returns NULL.

“strtok” modifies the content of “s1”. For example:

// str must be a writable array, since strtok writes into it
char str[] = "Hello world\tI am Alive!\nMay be\n";
char *delim = " \t\n";
char *p;
int i = 1;

p = strtok(str, delim);
// p now points to "Hello"
printf("%d: %s\n", i++, p);

while ((p = strtok(0, delim)) != 0)
    printf("%d: %s\n", i++, p);

Prints out:
1: Hello
2: world
3: I
4: am
5: Alive!
6: May
7: be
time.h - Time Manipulation Functions
As embedded systems do not generally have an underlying OS that provides the
system time, low-level functions must be written in order to access the target
system’s RTC (Real Time Clock) or MCU timers to acquire the rudimentary time
data.

There are two sets of functions: Clock Functions and Time Functions.
time.h - Clock Functions
clock_t
a typedef type. An arithmetic type that can hold the values returned by the
function “clock”. Typically an int.

clock_t clock(void)
this returns the number of clock ticks that have occurred since the program
started, or -1 if the information is not available.

CLOCKS_PER_SEC
this is a macro defining the number of clock ticks per second.
time.h - Time Functions
The time functions operate on two defined types:

time_t
a typedef type. This is an arithmetic type that can hold the values returned by
the function “time”. Typically int or long.

struct tm
a structure that holds the time information. Typically it has the following fields,
but they may not be necessarily in the order shown, or have identical field
names:

struct tm {
int sec; // number of seconds after min
int min; // minutes after the hour
int hour; // hour of the day (from 0)
int mday; // day of the month (from 1)
int month; // month of the year (from 0)
int year; // year since 1900
int wday; // days since Sunday (from 0)
int yday; // day of the year (from 0)
int is_dst; // is Daylight Saving Time in effect
};

The lowest level function is:
time_t time(time_t *tod)
returns the current calendar time. If “tod” (time of day) is not NULL, then the
current time is written to “*tod” as well.

For an embedded target, you might have to implement this function to read
from a RTC or the target device timer, and convert the native value into the
time_t format.

time_t Functions:
char *ctime(time_t *cal)
converts the calendar time in “*cal” to an ASCII representation. It returns a
static buffer and is equivalent to calling “asctime(localtime(cal))” (see below).

double difftime(time_t t1, time_t t2)
returns the difference of two time values in number of seconds.

struct tm *gmtime(const time_t *tod)
converts a calendar time to a “struct tm“ variable, in GMT (Greenwich Mean
Time, now more commonly known as UTC, Coordinated Universal Time). It
returns the address of a static struct holding the converted values.

struct tm *localtime(const time_t *tod)
converts a calendar time to a “struct tm“ variable, in local time. It returns the
address of a static struct holding the converted values.

Note: both gmtime and localtime use the same static “struct tm” variable.

“struct tm“ Functions:
char *asctime(struct tm *tptr)
converts the time value to an ASCII representation of 26 characters. The
format is

www mmm dd hh:mm:ss yyyy\n\0

where:
www 3 character weekday, e.g. Mon
mmm 3 character month, e.g. Jan
dd 2 character day, e.g. 02
hh 2 character hour
mm 2 character minutes
ss 2 character seconds
yyyy 4 character year

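time_t mktime(struct tm *tptr)
converts the local time held in “*tptr” to a calendar time and returns it. It
returns -1 if the time cannot be represented.
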
size_t strftime(char *s, size_t n, const char *fmt, const struct tm *tptr)
converts the time value to an ASCII representation using the format specified
in “fmt”. You must pass a character array object “s” whose length is “n”.
Characters in “fmt” are copied verbatim into the buffer in “s” except the
conversion characters of format %<x>, which are interpreted as:

%a 3 character weekday name, e.g. Mon
%A full weekday name, e.g. Monday
%b 3 character month name, e.g. Jan
%B full month name, e.g. January
%c date and time, e.g. Jan 01 01:02:30 2015
%d 2 character day of the month, e.g. 01
%H 2 character hour of the 24-hour day
%I 2 character hour of the 12-hour day
%j 3 character day of the year, starting from 001
%m 2 character month of the year
%M 2 character minutes after the hour
%p AM/PM indicator, e.g. PM
%S 2 character seconds after the minute
%U 2 character Sunday week of the year, from 00
%w one character weekday number, from 0 for Sunday
%W 2 character Monday week of the year, from 00
%x the date, in the form of “mmm dd yyyy”
%X the time, in the form of “hh:mm:ss”
%y 2 character year of the century
%Y 4 character year
%Z time zone name, if any, e.g. PST
%<x> any character not in the above list is printed verbatim
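
As a minimal sketch of strftime in use, the hypothetical helper below formats a time as “yyyy-mm-dd hh:mm:ss”. It assumes a hosted Standard C library, where the fields of struct tm are named tm_year, tm_mon, tm_mday and so on; as noted earlier, field names may differ on other compilers.

```c
#include <time.h>

/* Format "*t" as "yyyy-mm-dd hh:mm:ss" into "buf" of "size" bytes.
   Returns the number of characters written, not counting the
   terminating null, or 0 if the buffer is too small. */
size_t format_timestamp(char *buf, size_t size, const struct tm *t)
{
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", t);
}
```

The result needs 19 characters plus the terminating null, so a buffer of at least 20 bytes must be supplied.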
SECTION III
ADVANCED TOPICS IN C

This section explores some of the advanced topics in C.
10. EFFECTIVE POINTER AND ARRAY USAGE

This chapter presents some examples on proper pointer and array usage.
“Pointer” is a fundamental construct built into the C programming language: it
allows efficient addressing of array elements and other aggregate data
structures, it allows a function to modify the value of an argument through its
address, and it allows use of dynamic objects in Heap Memory.

Pointers and Arrays: “Here Be Dragons”


To become a C guru, remember the following:

Under many contexts, arrays and pointers are interchangeable. In particular,
an array object of data type X is treated as having a data type of pointer to the
type X. This is explained further in a later section of this chapter:

char hello[6];
char *ptr;

ptr = hello; // OK

An array object has allocated storage. The name of the array object is the
starting address of the array storage.

(using declarations as above)

// sizeof is a C operator that returns the size of the object
// in bytes
sizeof (hello) → 6 bytes
sizeof (ptr)   → 4 bytes for JumpStart C for Cortex
               → 2 bytes for JumpStart C for AVR

hello is the same as &hello[0]



A pointer object holds an address, which must be assigned to an address of
valid storage objects before it can be used.

char hello[6];
char *qptr; // uninitialized variable

*qptr = 1; // RUNTIME ERROR! qptr does not contain a valid
           // address

qptr = hello;
*qptr = 1; // OK

C does not check the validity of an address. Reading and writing from an
invalid address is not checked by the compiler, and may cause runtime
problems at execution points far away from where the invalid address is used.
This is known as memory overwrite problems, or the “C gives you enough rope
to hang yourself” axiom.

Even if an address is valid, C does not check the size of the storage object it is
accessing. If you use strcpy or other functions to write 12 characters into a 10
character array, C is OK with that! Your program, however, won’t be.

char hello[6];

strcpy(hello, "Hello World"); // RUNTIME ERROR! buffer overwrite

Array and Pointer Types
In most contexts, when a compiler sees the name of an array variable with the
type “array of type X”, the compiler treats it as if the expression of address of the
first element has been written, and uses “pointer to the base type X” as the type
instead:

int table[];
… &table[0] … // address of the first element
… table … // SAME as above

This allows assigning an array variable name to a pointer variable, as long as the
“points-to” type of the pointer variable and the “array-of” type of the array variable
are the same:

int *p, a[10];

p = a; // OK, the types are compatible
// this is the same as if you have written
p = &a[0];

This also allows a function parameter of type “pointer to type X” to accept an array
argument of any size, as long as the converted type is compatible.

extern void func(char *);
char hello[] = "Hello World";

func(hello);

The main context where this implicit type rewriting does not happen is when an
array is used as the operand of the sizeof operator:

char table[100];
printf("%d\n", (int)sizeof (table));

correctly outputs:

100

The worst confusion occurs when some people believe that “pointers and arrays
are the same”. In fact, they are not: they may have compatible types, but an array
variable object has space allocated to the entire array object, whereas a pointer
variable has space only large enough to contain an address; and again, a valid
object address must be assigned to a pointer variable before its use.

Multi-Dimensional Array versus Array of Pointers
Another example to demonstrate the difference between pointers and arrays:

Each element of array_of_ptrs is (for Cortex-M) 4 bytes, containing the
address of a literal string that is allocated elsewhere. On the other hand, each
element of array_of_arrays is a 12-byte array that contains the actual
characters. Even though the two arrays behave similarly under most
circumstances, they have vastly different memory content.
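
The two kinds of declaration being compared can be sketched as follows. The array names come from the text; the initializer strings and the two helper functions are hypothetical and only illustrate the element sizes.

```c
#include <stddef.h>

/* An array of 3 pointers: each element holds only an address. */
const char *array_of_ptrs[3] = { "red", "green", "blue" };

/* An array of 3 arrays: each element holds 12 characters of storage. */
char array_of_arrays[3][12] = { "red", "green", "blue" };

/* Size of one element of each array, in bytes. */
size_t ptr_element_size(void)   { return sizeof array_of_ptrs[0]; }
size_t array_element_size(void) { return sizeof array_of_arrays[0]; }
```

Both array_of_ptrs[1][0] and array_of_arrays[1][0] evaluate to 'g', which is why the two forms look alike in use even though their storage differs.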

Null Pointer
The null pointer value compares equal to integer zero and does not match the
address of any function or data object in the program. Typically, it is defined
like this:

#define NULL ((void *)0)

Zero is the only integer constant that can be portably converted to a pointer.
The results of other conversions between integer and pointer values are
implementation-defined.
Pointer Type Conversions
The following conversion rules are defined.

void *, or “pointer to void”, points to the incomplete type void and is the only
pointer type that can be converted to and from any other object pointer type
without a cast.

Memory Aliasing: Pointer Access
The simplest use of a pointer variable is to assign the address of another variable
to it, so that the value of the pointed-to variable can be accessed through the
pointer variable. Being able to access the same memory location using different
objects is known as memory aliasing:

int i, *p;

p = &i;

*p = …; // write to “i”
… = *p; // read “i”

Pointer Arithmetic
One of the unique innovations of the C pointer, as compared to pointer
implementations in other programming languages at the time (early 1970s) is that
when you add an integer to a pointer, the integer is scaled by the number of bytes
of the pointed-to type. For example,

short *p;
int n;

Assuming “p” contains the value (presumably a valid address) 0x1000, and a
short is 2 bytes wide, and “n” has the value 3, then the expression “p + n” has the
value:

p + n = 0x1000 + (n * 2)
      = 0x1000 + (3 * 2)
      = 0x1000 + 6
      = 0x1006

This is also the address needed to access an “array of shorts”. For example:

short array[4];

Assume “array” starts at 0x1000. The addresses of the elements of “array” are:

address of array[0] 0x1000
address of array[1] 0x1002
address of array[2] 0x1004
address of array[3] 0x1006

As you can see, &array[3] (address of array[3]) is the same as (p + 3). Therefore,
by assigning the address of an array of type X to a pointer to type X, any array
element can be accessed by treating the pointer as if it is an array. Given:

p = &array[0];
// or alternately written as
p = array;


Then these are equivalent:
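
As a minimal sketch, the function below checks the usual equivalences between indexing and pointer arithmetic for every element: p[i], *(p + i), array[i] and *(array + i) all designate the same object. The function name and the element values are hypothetical.

```c
/* Returns 1 if the index and pointer forms agree for every element. */
int forms_are_equivalent(void)
{
    short array[4] = { 10, 20, 30, 40 };
    short *p = array;
    int i;

    for (i = 0; i < 4; i++) {
        if (p[i] != array[i])        return 0;  /* index through a pointer */
        if (*(p + i) != array[i])    return 0;  /* explicit pointer arithmetic */
        if (*(array + i) != p[i])    return 0;  /* array name used as a pointer */
        if (&p[i] != array + i)      return 0;  /* same address either way */
    }
    return 1;
}
```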
Using Pointers to Access Arrays
As mentioned, the type “array of type X” is type-compatible with “pointer to type
X”. A concise method to traverse an array can be written using pointers. For
example, the following sums (adds up the values of) the contents of an array
using array indexing:

#define SIZE 10
int table[SIZE];
int sum = 0;

for (int i = 0; i < SIZE; i++)
    sum += table[i];

Here is the same behavior written using pointers with the same variable
declarations as above:

int *p = table;

for (int i = 0; i < SIZE; i++)
    sum += *p++;

The primary advantage of using pointers in this example is the efficiency of using
the expression *p++, which can be compiled to a single instruction depending on
the target machine, whereas an indexing operation table[i] would take two to
four instructions. (See next section.)
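
A variation on the pointer loop drops the index variable entirely and compares against an end pointer instead. This is a sketch with a hypothetical function name; whether it compiles to fewer instructions depends on the compiler and target.

```c
#include <stddef.h>

/* Sum "count" ints starting at "p", walking the array with a pointer. */
long sum_ints(const int *p, size_t count)
{
    const int *end = p + count;   /* one past the last element */
    long sum = 0;

    while (p < end)
        sum += *p++;
    return sum;
}
```

Forming the address one past the last element is allowed in C as long as it is only compared, never dereferenced.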
Efficiency of Using Pointers
One idiom frequently used in writing pointer code is this expression:

*p++;

This evaluates to the value pointed to by “p”, and then increments “p” to point to
the next element. For example, the core part of the standard library function
strcpy() can be written as:

while (*dst++ = *src++)
;

This single line copies a byte from the memory whose address is stored in “src” to
the memory cell whose address is stored in “dst”, then increments both pointers to
point to their next locations. Finally, the copied value is tested against zero, and if
it is the terminating nul in the string, then the loop terminates.

On the Digital Equipment PDP-11, one of the earliest machines on which the C
programming language was created, the copy instruction with pointer increments
compiled down to a single instruction (!):

movb (r1)+,(r2)+

It was rumored that the C ++ and -- operators were created to match the PDP-11
addressing modes, but Dennis Ritchie, the creator of C, has said that this is not
true. Nevertheless, C’s pointer access and increment/decrement operators
continue to influence the design of Instruction Set Architecture. For example, the
Cortex-M does the copy and increments in two instructions:

ldrb R3,[R0],#+1
strb R3,[R1],#+1

Even if the target architecture does not have the increment or decrement
addressing modes, accessing an array via a pointer is usually more efficient, as a
pointer variable can be allocated in a fast CPU register, whereas an array name
reference is always a (slower) memory access.
Out of Bound Array Accesses
A pointer may contain the address of any array element, not just the first element.
Using this feature allows a pointer to alias to the address of any array element as
the start of a “virtual array”:

p = &array[2];
then
p[1] is the same as array[3]

ADVANCED TOPIC: it is legal to write a negative index to a pointer or array
reference. An obvious use is:

p = &array[2];
assert(&p[-1] == &array[1]);
assert(p - 1 == &array[1]);

Even though “p” above is not set to the first element of “array”, it can still access
earlier elements of “array” by using a negative index.

POTENTIAL BUG TRAPS: it is this lack of index checking that can cause
program crashes. For example:

short array[4];
short *p;

p = &array[2];

// The following examples cause runtime errors
p[-3] = … ;
p[4] = … ;

Both array references through “p” exceed the boundary of “array”, and the writes
would be writing to random other objects depending on where “array” is allocated
– it may affect other global variables, or local variables, or even the function’s
return address if it is stored in the stack.

Indeed, there is no index bound checking even for a real array:

short array[4];

array[-1] = … ;
array[4] = … ;

Both array references above are also out of bounds and will cause runtime errors,
but the compiler accepts them as valid C code without any check.
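
Since C performs no index checking, a program that needs it has to add the check itself. A minimal sketch follows; the macro COUNT_OF and the function store_checked are hypothetical names.

```c
#include <stddef.h>

/* Number of elements in a real array (not valid for a pointer). */
#define COUNT_OF(a) (sizeof (a) / sizeof (a)[0])

/* Store "value" at a[index] only if the index is within bounds.
   Returns 1 if the store was done, 0 if the index was rejected. */
int store_checked(short *a, size_t count, long index, short value)
{
    if (index < 0 || (size_t)index >= count)
        return 0;
    a[index] = value;
    return 1;
}
```

The element count must be passed in separately because the function only receives a pointer and cannot recover the size of the original array.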
Writing Generic Functions to Process Arrays
Since a “pointer to type X” is type-compatible to the type “array of type X”
regardless of the number of elements in the array, it is simple to write functions in
C that operate on input arrays of any size.

For example, “strings” in C are arrays of characters terminated in a nul character
(\0). Without needing to know the actual length of this array, the standard C library
function strcpy copies from a source string into a destination array. The function
walks through the source input and copies the input to the output, and only needs
to stop when it reaches the terminating nul:

char *strcpy(char *dst, const char *src)
{
char *val = dst;

while (*dst++ = *src++)
;
return val;
}

Indeed, it is acceptable to declare the argument types as incomplete array types:

char *strcpy(char dst[], const char src[]);

By contrast, in a programming language such as Pascal where an array of X
elements is considered to be a separate and distinct type from an array of Y
elements, it is impossible to write such generic functions.
Returning a Pointer from a Function
Sometimes a function needs to return a pointer type. For example, the strcpy
function in the Standard C Library is declared as:

char *strcpy(char *dst, const char *src);

dst is either an array object, or a pointer to a dynamically allocated object. In the
case of strcpy, the function returns the value of dst that was passed to the
function.

When a function returns a pointer, the returned pointer must point to a valid
address; for example, the address of a global or static variable, or a dynamically
allocated object.

Returning a pointer to a local variable would capture the address of a stack
location that becomes no longer valid once the function returns, and will cause
problems when you attempt to access the array later.

For example, the following function “ReverseString” reverses an input string of up
to 99 characters. The returned pointer is the address of an element in the static
array dst. Since it has a static lifetime, even after the function ReverseString
returns, the array object persists and therefore can still be accessed.

#define SIZE 100 // 99 characters + 1 nul
char *ReverseString(char *src)
{
static char dst[SIZE];

dst[SIZE-1] = 0;
char *pdst = &dst[SIZE-1];

for (int i = 0; i < SIZE-1 && *src != 0; i++)
    *--pdst = *src++;
return pdst;
}

The “for” loop terminates when either the limit of the destination array is reached
or when the terminating nul of the source is reached. Characters are copied from
the end of the destination to the beginning using the *--pdst idiom.

PITFALL #1: the function above returns the same storage object every time it is
called. Therefore, if multiple calls are made to the function, you may need to make
copies of the reversed strings elsewhere.

PITFALL #2: the function only works with an input array of up to 99 characters. A
limit must be set, since you must give a size when you declare the array. You can
eliminate this limitation and also solve PITFALL #1 by using a dynamically
allocated storage space for the reversed string. The tradeoff is the use of heap
memory which incurs runtime overhead:

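#include <stdlib.h> // malloc
#include <string.h> // strlen

char *ReverseString(char *src)
{
    size_t len = strlen(src);
    char *dst = (char *)malloc(len+1);

    if (dst == 0) // out of memory
        return 0;

    dst[len] = 0;
    char *pdst = &dst[len];

    // pdst walks backward and ends up equal to dst
    while (*src != 0)
        *--pdst = *src++;
    return dst;
}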

strlen is used to compute the actual length of the input string. Space is
allocated by using malloc, after which the code copies the bytes in reverse
order, using a similar algorithm as before.
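
A function that returns heap memory hands ownership of that memory to its caller, who must eventually call free. The sketch below shows this from the caller's side. It is self-contained, so it uses its own hypothetical reversal helper, reverse_copy, rather than the listing above.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated reversed copy of "src", or NULL if out of
   memory. The caller owns the result and must free it. */
char *reverse_copy(const char *src)
{
    size_t len = strlen(src);
    char *dst = malloc(len + 1);
    size_t i;

    if (dst == NULL)
        return NULL;
    for (i = 0; i < len; i++)
        dst[i] = src[len - 1 - i];
    dst[len] = '\0';
    return dst;
}

/* Returns 1 if "s" reads the same in both directions, 0 if it does
   not, and -1 if memory could not be allocated. */
int is_palindrome(const char *s)
{
    char *r = reverse_copy(s);
    int same;

    if (r == NULL)
        return -1;
    same = (strcmp(s, r) == 0);
    free(r);                      /* release the buffer once done with it */
    return same;
}
```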
Using Pointers to Access Arbitrary Memory Locations
Embedded System programming frequently uses pointers to reference I/O
registers or arbitrary memory locations. The power of C and pointers is that you
can declare a structure and cast an arbitrary address to be of that type, and the C
compiler computes the correct offset to access the elements, so the programmers
do not need to remember constant offsets and addresses.
For example,

#define PERIPH_BASE ((uint32_t)0x40000000)
#define APB2PERIPH_BASE (PERIPH_BASE + 0x10000)
#define TIM1_BASE (APB2PERIPH_BASE + 0x2C00)
#define TIM1 ((TIM_TypeDef *) TIM1_BASE)

typedef struct
{
uint16_t CR1;
uint16_t RESERVED0;
uint16_t CR2;
uint16_t RESERVED1;
uint16_t SMCR;
uint16_t RESERVED2;
uint16_t DIER;
uint16_t RESERVED3;
uint16_t SR;
… // rest of the struct elided
} TIM_TypeDef;

TIM1->CR2 = 0;

While we have cautioned against type casting integer values to pointer values in
general, this is indeed the most efficient way for embedded system work to
access the I/O registers. As these values are provided by the silicon vendors or
the compiler vendors, and hidden behind macro names, the potential for misuse is
much reduced.

The cast operator is needed to satisfy the Standard C typing requirements. In the
end, TIM1->CR2 just means to pretend that address 0x40012C00 is the address of
a structure defined by the TIM_TypeDef declaration, and then access the CR2
element of that structure.

In the above structure typedef definition, CR2 is the third element of the structure,
preceded by two uint16_t members (which we can assume are 16-bit each in
size), therefore the final address location we are accessing is 0x40012C00 + 0x4,
or 0x40012C04.
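
The offset arithmetic can be checked with the standard offsetof macro. The sketch below uses a shortened, hypothetical copy of the register structure (TimerRegsSketch) so that it is self-contained; marking the register fields volatile is an assumption that matches common vendor headers, not something shown in the listing above.

```c
#include <stddef.h>
#include <stdint.h>

/* First few registers only; the layout mirrors the start of the
   timer structure discussed above. */
typedef struct {
    volatile uint16_t CR1;        /* offset 0 */
    uint16_t          RESERVED0;  /* offset 2 */
    volatile uint16_t CR2;        /* offset 4 */
    uint16_t          RESERVED1;  /* offset 6 */
    volatile uint16_t SMCR;       /* offset 8 */
} TimerRegsSketch;

/* Address of the CR2 register for a peripheral located at "base". */
uintptr_t cr2_address(uintptr_t base)
{
    return base + offsetof(TimerRegsSketch, CR2);
}
```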
Volatile I/O Register Access
In C, reading memory data is normally useful only if the value of the data is to be
used in some manner, e.g. being assigned to a variable, or used in a
computation. However, sometimes an MCU peripheral subsystem requires the
firmware to read an I/O register to effect some changes in the hardware state:

This is an excerpt from the I2C section of the reference manual of ST’s
STM32F411 devices. Here, the “master” (i.e. the MCU) waits for the SR1
(Status Register 1) I/O register to be read.
In C, you would typically write
unsigned tmp = I2C1->SR1;

I2C1 is a macro, defined similarly to TIM1 in the previous section, that
maps to the I2C1 I/O registers of the MCU. This statement reads the register,
satisfying the hardware requirement. However, the variable is never used, and this
may trigger warnings from the compiler or a lint-style code checking tool.
One method to avoid warnings is to declare the I/O register field as volatile,
in a structure like the one in the previous section:

// inside the typedef for I2C struct
volatile unsigned SR1;

In this case, the volatile qualifier informs the compiler that a read of the I/O
register must be performed, even though the value is not explicitly used in the
program code:

I2C1->SR1; // a read with a side effect on hardware
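
Some compilers and checking tools still flag a bare expression statement. A common way to state the intent is to cast the read to void, as in this minimal sketch. The function name is hypothetical, and an ordinary variable stands in for the hardware register in any test.

```c
#include <stdint.h>

/* Read a register purely for its hardware side effect and discard the
   value. Because *reg is volatile, the compiler must perform the read. */
void read_and_discard(volatile uint32_t *reg)
{
    (void)*reg;
}
```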
ADVANCED TOPIC: Accessing the Stack Address
In embedded programming, sometimes you need to find out the location of the
stack pointer, for example, to write a task scheduler, you need to capture the CPU
context, which includes the stack pointer. While this is normally done using
assembly code routines, you can sometimes replace some or all of the assembly
code using C.

This function returns an address close to the current value of the stack pointer,
namely the address of one of its own local variables:

void *StackPointerAddress(void)
{
unsigned x;

return (void *)&x;
}
ADVANCED TOPIC: Writing a Simple Function Dispatcher
You can store a set of function addresses in an array or other data structure and
then invoke them indirectly. You can use this function dispatching to implement a
Finite State Machine (FSM), a simple task scheduler, or even to implement a
rudimentary version of C++’s virtual function feature. It’s beyond the scope of this
section to explain these in detail. Nevertheless, a FSM is particularly suitable for
certain types of embedded programming, and it would be worthwhile for you to
follow up on the subject via web searches for tutorials on the subject.

Here we will just use a simple example of function dispatching based on an
integer argument:

typedef void (*FUNCPTR)(void);

void f1(void), f2(void), f3(void), f4(void);

FUNCPTR ftable[] = {
f1, f2, f3, f4
};

// Dispatch on 0..3
void FunctionDispatch(int i)
{
    if (i >= 0 && i <= 3)
        (*ftable[i])();
}

ftable is an array of function addresses that is initialized with the addresses of 4
functions. In FunctionDispatch, if the input argument i is between 0 and 3, then
it is used as an index to indirectly call the respective function.
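
The same table technique extends naturally to a small Finite State Machine: the current state selects a handler, and the handler returns the next state. The sketch below is only an illustration; the states, the handlers and the meaning of “event” are all hypothetical.

```c
typedef enum { STATE_IDLE, STATE_RUNNING, STATE_DONE, STATE_COUNT } State;

/* Each handler receives an event and returns the next state. */
typedef State (*StateHandler)(int event);

static State on_idle(int event)    { return event ? STATE_RUNNING : STATE_IDLE; }
static State on_running(int event) { return event ? STATE_DONE : STATE_RUNNING; }
static State on_done(int event)    { (void)event; return STATE_DONE; }

/* One handler per state, indexed by the state value. */
static const StateHandler handlers[STATE_COUNT] = {
    on_idle, on_running, on_done
};

/* Run one step of the machine; an invalid state resets to STATE_IDLE. */
State fsm_step(State current, int event)
{
    if ((unsigned)current >= STATE_COUNT)
        return STATE_IDLE;
    return handlers[current](event);
}
```

Adding a state then means adding one enum value, one handler and one table entry, with no change to fsm_step.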
ADVANCED TOPIC: Using Pointers to Pointers
A pointer may contain the address of any other data type, including another
pointer type. Please see Chapter 11, Dynamic Data Structures, to
see how to use pointers to pointers to simplify code for dynamic data structure
allocation.
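
One everyday use of a pointer to a pointer is letting a function set a pointer variable that belongs to its caller. In this minimal sketch, the hypothetical function duplicate_string allocates a copy of a string and stores its address through “out”.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate a copy of "src" and store its address in *out.
   Returns 1 on success, or 0 if out of memory (*out is unchanged). */
int duplicate_string(char **out, const char *src)
{
    char *copy = malloc(strlen(src) + 1);

    if (copy == NULL)
        return 0;
    strcpy(copy, src);
    *out = copy;      /* writes into the caller's pointer variable */
    return 1;
}
```

The caller passes the address of its own pointer, for example duplicate_string(&name, "text"), and later releases the copy with free(name).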
11. DYNAMIC DATA STRUCTURES

Dynamic / Heap Objects


So far we have only been discussing objects that are declared at compile time. C
also allows dynamic object allocation and deallocation from heap memory using
the library functions malloc, calloc, realloc and free. These are declared
in the header file stdlib.h:

// allocate an object of ‘size’ bytes
void *malloc(size_t size);

// allocate ‘nelem’ objects of ‘size’ bytes each and
// zero the content
void *calloc(size_t nelem, size_t size);

// reallocate a previously allocated object to a different
// size
void *realloc(void *old_object, size_t size);

// deallocate and free the object
void free(void *obj);

The allocation functions return a pointer to an object in the heap memory of the
requested size. These heap objects can only be referenced through pointers, as
they have no compile time names.

Most heap allocated objects will be some data structures that you need in your
programs. A linked list is one of the most useful data structures and we will use
that as an example in this chapter. FIFO (first in, first out), Queue, Trees etc. are
also very important. Once you move away from the low-level bits and bytes and
I/O registers, a good embedded engineer should have a good grasp of these
subjects and we recommend that you pursue this subject further.

A future version of JumpStart API may include these standard data structure APIs.
Linked Lists
After an array, a linked list is one of the most basic data structures.


A linked list is made up of a list of “nodes”, which is represented in C as a
struct. Each node has a link, i.e. a pointer, to another node. The rest of the
node holds data necessary for the computation. A root pointer contains the
address of the first node in the chain. The end of the chain is signified by the link
address value of 0, the null pointer (usually drawn as the electrical symbol for
GROUND).
Example of Linked List
For example, this code fragment creates a linked list of input lines (for example,
typed on a terminal keyboard):

#include <stdio.h>
#include <stdlib.h> // malloc, free
#include <string.h> // strlen, strcpy

struct input_str;
typedef struct input_str {
    struct input_str *next;
    char *str;
} INPUT_STR;
INPUT_STR *root;

void ReadName(void)
{
    char *s;
    char buf[1024];

    while ((s = gets(buf)) != NULL)
    {
        INPUT_STR *p = (INPUT_STR *)malloc(sizeof (INPUT_STR));
        // out of memory
        if (p == 0)
            break;

        p->str = (char *)malloc(strlen(s) + 1);
        if (p->str == 0)
        {
            free(p); // do not link a node that has no string
            break;
        }
        strcpy(p->str, s);

        // link the completed node at the head of the list
        p->next = root;
        root = p;
    }
}

Each node of the linked list is of type struct input_str, which consists of a
pointer to another node and a pointer to the string. The loop in the function
ReadName reads a line, then allocates a new node and links it to the existing
chain rooted at root. The variable root always points to the most recently
created node. The list can be traversed through the next pointer of each node.

Two calls to malloc are used: one to create a new node, and the other one to
create an array in which to store the input string.

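PITFALL: The Standard C Library function “gets()” takes an input buffer but not
its size, so gets will overrun the buffer whenever an input line is longer than
the buffer. For this reason gets was removed from the language in the C11
standard; fgets, which takes the buffer size as an argument, should be used instead.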
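
A bounded alternative can be sketched with the standard function fgets, which is told how large the buffer is. ReadLine is a hypothetical helper name used only for this illustration; it also removes the trailing newline that fgets keeps and gets discards.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from 'in' into buf (at most size - 1 characters) and
 * strip the trailing newline, if any. Returns buf, or NULL at end of
 * input. fgets never writes past the buffer; an over-long line is
 * simply split across several calls. */
char *ReadLine(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

In the ReadName loop, the call gets(buf) could then be written as ReadLine(buf, sizeof buf, stdin).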

EXERCISE: Note that new nodes are created at the beginning of the list. How
would you append the new node to the end of the list? Also see the section
“ADVANCED TOPIC: Using Pointers to Pointers” later in this chapter.
Traversing a Linked List
C pointers make it easy to access and manipulate a linked list. Using the data
structures from the previous example:

for (INPUT_STR *p = root; p; p = p->next)
    printf("input was '%s'\n", p->str);

The for loop “walks” from the start of the list using the address stored in the
global variable root, and traverses each element of the list.
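
The same walk can be used to count the nodes or to release the whole list. The sketch below assumes the two-allocation node layout from the first example; ListLength and FreeList are hypothetical helper names, not functions from the original listing.

```c
#include <stdlib.h>

typedef struct input_str {
    struct input_str *next;
    char *str;              /* separately allocated string */
} INPUT_STR;

/* Count the nodes by walking the chain until the null link. */
int ListLength(const INPUT_STR *root)
{
    int n = 0;
    for (const INPUT_STR *p = root; p; p = p->next)
        n++;
    return n;
}

/* Free every node and its string. The next pointer is saved before the
 * node is freed, because a freed node must not be read again. */
void FreeList(INPUT_STR **rootp)
{
    INPUT_STR *p = *rootp;
    while (p)
    {
        INPUT_STR *next = p->next;
        free(p->str);
        free(p);
        p = next;
    }
    *rootp = 0;
}
```

FreeList takes the address of the root pointer so that it can reset the caller's variable to 0 once the list is gone.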
ADVANCED TOPIC: Allocating a Dynamic Size Structure
Notice that in the previous example, there are two calls to malloc, one to allocate
space for the struct input_str structure, and one for the character array
holding the content of the input string. Besides function call overhead, each
allocated object requires some space overhead for allocation management, and
more allocation calls can potentially cause heap fragmentation.

There is a method to eliminate one malloc call in this example. Since C does not
perform object size checking, you can use this to your advantage and eliminate
the second call to malloc, avoiding some space overhead. To do this, you “lie” to
the compiler by telling it that the end of the structure is a one-character array
instead of a pointer to a char, and then, in the call to malloc, you request
enough space for both the structure and the input string.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct input_str;
typedef struct input_str {
    struct input_str *next;
    char str[1];    // extensible array
} INPUT_STR;
INPUT_STR *root;

void ReadName(void)
{
    char *s;
    char buf[1024];

    while ((s = gets(buf)) != NULL)
    {
        int len = strlen(s);
        INPUT_STR *p =
            (INPUT_STR *)malloc(sizeof (INPUT_STR) + len);
        if (p == 0)
            break;

        p->next = root;
        root = p;
        strcpy(p->str, s);
    }
}

strlen returns the length of a string not counting the terminating nul. However,
as the structure has a one-byte array as a member, there is no need to account
for the space for the nul in the call to malloc.

The same call to strcpy is made as in the previous example. In the original
case, p->str is a separate heap object pointer, whereas in this case, p->str
has the type “array of one char”, but we have allocated extra space for the input
string.

This optimization only works because C allocates struct members in declaration
order, so str will be at the end of the structure.
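
Since the C99 standard, the language also offers a sanctioned form of this technique, the flexible array member, in which the last member is declared with empty brackets. The following is a sketch only; FAM_STR and NewNode are hypothetical names for this illustration, and it assumes a C99 or later compiler.

```c
#include <stdlib.h>
#include <string.h>

typedef struct fam_str {
    struct fam_str *next;
    char str[];             /* C99 flexible array member: must be last */
} FAM_STR;

/* Allocate a node and its string in one heap object. sizeof (FAM_STR)
 * does not include any bytes for str, so the terminating nul must be
 * requested explicitly. Returns a null pointer if out of memory. */
FAM_STR *NewNode(const char *s)
{
    size_t len = strlen(s);
    FAM_STR *p = (FAM_STR *)malloc(sizeof (FAM_STR) + len + 1);
    if (p == 0)
        return 0;
    p->next = 0;
    memcpy(p->str, s, len + 1);     /* copies the nul as well */
    return p;
}
```

The visible difference from the one-byte array version is the "+ 1" in the size calculation.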
Appending Node at the End of the List
In the code example from the previous section, the function ReadName inserts
the new node at the beginning of the list. What if you wish to append the new
node to the end of the list instead? An obvious solution would be to use
another variable to keep track of the “current” (last) node. Looking at just the
while loop where the nodes are created, we can write

// s and buf are declared as in ReadName
INPUT_STR *current = 0;

while ((s = gets(buf)) != NULL)
{
    int len = strlen(s);
    INPUT_STR *p =
        (INPUT_STR *)malloc(sizeof (INPUT_STR) + len);

    if (p == 0)
        break;

    p->next = 0;
    strcpy(p->str, s);
    if (current == 0)
        root = p;
    else
        current->next = p;

    current = p;
}

Inside the loop, either “current” is zero, denoting an empty list, or it points to
the last node of a list that has at least one element. The if statement checks
which of those two conditions holds and acts accordingly; the code should
hopefully be obvious by now.

This code is easy to understand. However, there is an alternative.
ADVANCED TOPIC: Using Pointers to Pointers
The power of a pointer is that it can contain the address of ANY
object, including another pointer object! For example:

int i;
int *pi = &i;
int **ppi = &pi;

**ppi = 4;
// now *pi and i are also equal to 4

While a pointer to a pointer to an integer may not be very useful generally,
consider the requirement to append a node at the end of a list. We have seen an
example where it was done by using another variable to keep track of the
“current” node. The following demonstrates a more compact coding style using
another variable nextp, a pointer to a pointer, to accomplish the same task:

// s and buf are declared as in ReadName
INPUT_STR **nextp = &root;

while ((s = gets(buf)) != NULL)
{
    int len = strlen(s);
    INPUT_STR *p =
        (INPUT_STR *)malloc(sizeof (INPUT_STR) + len);

    if (p == 0)
        break;

    p->next = 0;
    strcpy(p->str, s);
    *nextp = p;
    nextp = &p->next;
}

By using a pointer to another pointer, this version eliminates the if statement and
therefore is visually much less cluttered. The initial assignment nextp = &root
sets up the initial condition; from then on, nextp always holds the address of
the link where the next new node should be attached.

Which version you prefer to write is up to you. There is no significant difference in
code size or speed. Nevertheless, while coding using pointers to pointers may
look more daunting, especially when you are not familiar with it, once you gain
proficiency you might find that it can make programs both visually shorter and
easier to understand and maintain.
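
The same idea pays off when removing a node, where the single-pointer version would need a special case for the first node. The sketch below assumes the two-allocation node layout from the first example; RemoveNode is a hypothetical helper name used only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct input_str {
    struct input_str *next;
    char *str;
} INPUT_STR;

/* Unlink and free the first node whose string equals s.
 * Returns 1 if a node was removed, 0 if none matched. */
int RemoveNode(INPUT_STR **rootp, const char *s)
{
    /* pp points at the link that leads to the node being examined:
     * first the root pointer itself, then each node's next member. */
    for (INPUT_STR **pp = rootp; *pp; pp = &(*pp)->next)
    {
        if (strcmp((*pp)->str, s) == 0)
        {
            INPUT_STR *victim = *pp;
            *pp = victim->next;     /* same code for head and middle */
            free(victim->str);
            free(victim);
            return 1;
        }
    }
    return 0;
}
```

Because pp starts at the address of the root pointer, removing the first node and removing any later node are handled by the same assignment.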
Dangling Pointers
The function free is used to deallocate heap memory objects, so that future
objects can use that space. However, as C argument passing is call-by-value, the
pointer to the object is not changed. Therefore, one might still access that memory
space through the pointer. This is known as a “dangling pointer” problem.

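// (using the data structure from the previous pages)

INPUT_STR *p = (INPUT_STR *)malloc(sizeof (INPUT_STR));
// ... fill in and use *p ...

free(p);    // the object is released, but p still holds its old address
printf("input is \"%s\"\n", p->str);    // WRONG: dangling pointer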

In this example, p->str is being accessed, even though the memory has been
released back to the heap. The call to printf may print the correct value (by
“luck”), print garbage, or even cause the program to crash.

The more insidious problem is when a pointer is free’d in one section of the code
but the “dangling pointer” is used in another section of code, potentially causing
random crashes.

BEST PRACTICE: always assign zero to a free’d pointer variable and check for
non-zero pointers before using them:

INPUT_STR *p = (INPUT_STR *)malloc(sizeof (INPUT_STR));

free(p);
p = 0;

// always check for valid pointer before accessing
if (p)
    printf("input is \"%s\"\n", p->str);
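
One way to make this habit harder to forget is to combine the two steps. FREE_AND_NULL below is a hypothetical macro invented for this sketch, not a Standard C Library facility.

```c
#include <stdlib.h>

/* Hypothetical helper macro: free the object and clear the pointer
 * variable in one step. It has to be a macro (or take the address of
 * the pointer), because a function receiving p by value could only
 * change its own copy. Note that the argument is evaluated twice. */
#define FREE_AND_NULL(p)  do { free(p); (p) = 0; } while (0)
```

This clears only the variable named in the call; any other copies of the same address elsewhere in the program are still dangling and must be managed separately.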
Advantages and Pitfalls of Using Dynamic Memory
PITFALL #1: Out of (heap) memory error. The heap memory is limited in size.
malloc and calloc return 0 if memory cannot be allocated. Your program must
account for the possibility that malloc/calloc cannot allocate storage, and so
must not use the returned pointer if it is a null pointer.

Unfortunately, error recovery in an embedded system is always tricky: that is,
there is no general mechanism to handle failure, as an embedded system should
never fail or crash! Imagine a self-driving car literally crashing due to a software
crash in the embedded control system! In the given examples above, the returned
value from malloc is checked against zero before it is used. However, how to
deal with the failure case is left as an exercise for the actual embedded system
firmware implementation.

It is also important to call free to release the memory back when the storage is
no longer needed.

PITFALL #2: Heap fragmentation. There are different algorithms to implement
heap memory routines – some favor speed of allocation and deallocation, and
others favor optimal space usage. The heap may become fragmented, which
refers to chunks of memory being allocated with unused space left between the
chunks. This condition happens more frequently when there are a lot of random
allocation and deallocation calls.

If the heap memory is highly fragmented, malloc/calloc might not be able to
honor an allocation request even if the total amount of free space is large enough.

Some programming languages provide a feature called garbage collection that
frees up unused heap objects automatically and merges the free heap memory
together as needed. Unfortunately, the pervasive unrestricted use of pointers in C
makes garbage collection for C programs not generally possible.

ADVANTAGE: For the reasons mentioned, some people recommend against ever
using dynamic memory. This is certainly in line with the spirit of “defensive
programming”. Nevertheless, while there are pitfalls, using dynamic memory can
be a very useful feature. Primarily, it can adapt to changing memory needs of your
programs. Imagine you have a number of data structures defined in your
program; by using dynamic memory, you do not need to predetermine how much
memory is allocated to each data structure set.

You can program defensively to account for out-of-memory errors. For example,
you might write your own functions that call malloc/calloc and take a
consistent approach when an out-of-memory error occurs. Moreover, if a program
can run out of heap memory, then it will most likely run out of memory even when
using statically allocated memory, so not using heap memory just avoids the
issue rather than dealing with it.
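
Such a wrapper might look like the following sketch. CheckedMalloc and OutOfMemoryCount are hypothetical names, and merely counting failures stands in for whatever policy a real system would choose (logging, raising an alarm, or entering a safe state).

```c
#include <stdlib.h>

static unsigned long oom_count;     /* allocation failures seen so far */

/* Wrapper that funnels every allocation failure through one place.
 * It still returns a null pointer on failure, so callers must check. */
void *CheckedMalloc(size_t size)
{
    void *p = malloc(size);
    if (p == 0 && size != 0)
    {
        /* the single place to apply the out-of-memory policy */
        oom_count++;
    }
    return p;
}

/* Let the rest of the firmware ask whether the heap has ever run out. */
unsigned long OutOfMemoryCount(void)
{
    return oom_count;
}
```

The benefit is consistency: every allocation in the program reports failure the same way, and the policy can be changed in one function.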

WAR STORY: One of the authors worked at Whitesmiths Ltd in the 1980s, with
Whitesmiths being the first company to produce a commercial C compiler outside
of Bell Labs. Whitesmiths also produced a Unix Edition 6 API compatible system
called Idris. Today’s Cortex-M microcontrollers are much more powerful and have
more memory than the machine targets of that time. One story told by PJ Plauger,
Whitesmiths’ founder and president, was that in the early Unix and C compilers
they were careful to avoid using dynamic memory, for fear of memory
fragmentation and other issues. Mr. Plauger wrote the Whitesmiths compilers and
Idris from scratch, and he did not set such restrictions. The resulting programs did
not suffer any performance or quality issues from use of dynamic memory.
SECTION IV
APPENDICES
A. INTRODUCTION TO COMPUTER ARITHMETIC

Representing Numbers
At the heart of a digital computer is the Central Processing Unit (CPU). A CPU
operates purely on bit patterns, which are most convenient to notate as numbers.
Normally, numbers are written in base 10, or decimal notation: the digits are 0..9
(using the notation .. to mean from one end to another), and the number
sequence goes from 0..9, then 10..19, and so on.

However, since digital CPUs work in a binary digital domain, where a bit is either
in a state of “on” (1) or “off” (0), it is often more convenient to refer to these bit
patterns in other (non-decimal) numbering systems.
Binary Notation
Binary notation (also called “base 2”) uses only the digits 0 and 1 to write
numbers. The sequence of all the numbers that can be represented in a single
byte is:

Decimal Binary
0 00000000
1 00000001
2 00000010

10 00001010
11 00001011
12 00001100

252 11111100
253 11111101
254 11111110
255 11111111

As the table shows, starting with all zeros (0 in decimal), the highest number that
can be stored in a byte is 11111111 in binary, or 255 in decimal (assuming that we
are interpreting the byte as unsigned, but we will get into signed vs. unsigned
later).
Hexadecimal Notation
Writing in binary is very cumbersome, so one common notation used in the
computer field is hexadecimal notation (also known as “base 16”), where the digits
go from 0 to 9, then A, B, C, D, E, F (or a, b, c, d, e, f), corresponding to 10 to 15
in decimal.
A single hexadecimal digit, 0 to F (decimal 0 to 15), can be represented in exactly
4 bits, so 2 hexadecimal digits fit into an 8-bit byte exactly. This is why
hexadecimal notation, not decimal notation, is the preferred method of writing
numbers “for computers”.
Octal Notation
Another common notation is octal notation (“base 8”), which uses the digits 0 to 7.
Octal is useful because each octal digit fits into exactly 3 bits, but of course
octal digits do not divide an 8-bit byte as evenly as hexadecimal digits do.
Numbering Prefixes
To disambiguate the base of a number in this book, we will adopt the C notation: a
decimal number is given no special prefix, a binary number is prefixed with the
sequence 0b[36], a hexadecimal number is prefixed with the sequence 0x, and an
octal number is prefixed with 0. Occasionally, some examples in this book will
show a series of bit patterns without the 0b prefix, but it should be clear from the
context, and those bit patterns will usually be broken down into nibbles of 4 bits,
e.g. 0100 1001.

The number “42” as written in different bases:
Base Number
2 0b101010
8 052
10 42
16 0x2A
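
The table can be reproduced in C with a short sketch. Octal, decimal and hexadecimal output use standard printf conversions; Standard C has no conversion for binary, so the digits are produced by hand. ToBinary and ShowBases are hypothetical helper names, and the buffer size assumes 8-bit bytes.

```c
#include <stdio.h>

/* Write the binary digits of value into out, without leading zeros.
 * out must have room for every bit of an unsigned plus the nul. */
void ToBinary(unsigned value, char *out)
{
    char tmp[sizeof (unsigned) * 8];
    int n = 0;

    do {
        tmp[n++] = (char)('0' + (value & 1));   /* lowest bit first */
        value >>= 1;
    } while (value != 0);

    while (n > 0)
        *out++ = tmp[--n];                      /* reverse into out */
    *out = '\0';
}

/* Format the same value in octal, decimal and hexadecimal. The '#'
 * flag asks printf-style functions to add the 0 and 0X prefixes. */
void ShowBases(unsigned value, char *out, size_t size)
{
    snprintf(out, size, "%#o %u %#X", value, value, value);
}
```

In source code the constants 052, 42 and 0x2A are simply three spellings of the same value.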
Interpreting a Number
Each digit position has a place value that is a power of the base: the rightmost
position is worth base⁰ (that is, 1), the next position base¹, then base², and so
on. The value of a number is the sum of each digit multiplied by its place value.

A few observations:
1. Any whole-number base is possible, e.g. base 3, base 4 etc. However,
to programmers, bases 2, 8, 10 and 16 are the most common and useful ones.

2. Unsurprisingly, place values go up much faster as the base increases. For
example, the place value of the sixth digit position (the base raised to the
5th power) is:

Base    5th Power
2       32
8       32768
10      100000
16      1048576

For example:
Decimal: 123
    1 * 10² = 100
    2 * 10¹ =  20
  + 3 * 10⁰ =   3

  result: 123

Hexadecimal: 0x2AB converted to decimal
    2 * 16² = 512
   10 * 16¹ = 160
 + 11 * 16⁰ =  11

  result: 683
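
The place-value rule translates directly into code: multiply each digit by its place value and add up the products. DigitValue and DigitsToValue are hypothetical names for this sketch, which handles bases up to 16 and relies on the digit and letter ordering of ASCII.

```c
#include <string.h>

/* Convert one digit character to its value, or -1 if it is not a
 * digit in any base up to 16. */
static int DigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Sum digit * place value, working from the rightmost digit, whose
 * place value is 1. Returns 0 if a digit is not valid in the base. */
unsigned long DigitsToValue(const char *digits, unsigned base)
{
    unsigned long place = 1;
    unsigned long value = 0;

    for (size_t i = strlen(digits); i > 0; i--)
    {
        int d = DigitValue(digits[i - 1]);
        if (d < 0 || (unsigned)d >= base)
            return 0;
        value += (unsigned long)d * place;
        place *= base;          /* next position to the left */
    }
    return value;
}
```

For real programs, the standard library function strtoul performs this conversion and also reports errors.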
Prefixes: Kilo, Mega, and Giga etc.
In describing the size of computer memory, the convention is to use the nearest
power of two, e.g. a “kilobyte” is actually 1024 bytes, and not 1000 bytes.
However, for hard drive memory sizes[37], or in general usage as prefixes, powers
of ten are used, e.g. a kilohertz is a thousand Hertz, or a thousand cycles per
second. Fraction prefixes, e.g. “milli”, are always a (negative) power of ten.

The exponent of a power of ten can be viewed as the number of zeros following
the initial one, e.g. 10⁶ is one followed by 6 zeros, or 1,000,000.

ASCII Characters
ASCII (American Standard Code for Information Interchange) is a character-
encoding scheme for 128 specific characters based on the English alphabet. It
has the following properties:

1. 7 bits are needed for the character encoding; the 8th bit is normally left as 0 /
zero / “off”. On an 8-bit byte, in a context where one is expecting an ASCII
character, any byte found that has the MSB[38] turned on indicates that it is
an “escape code”.

In the old days of terminals and line printers, “escape codes” might contain
graphics or symbols (e.g. the copyright © symbol), or something totally non-
printable. Escape codes now can even be two or more bytes depending on
the encoding.

2. The digits ‘0’ to ‘9’ are consecutive.

3. The alphabetic characters ‘a’ to ‘z’ are consecutive, as are ‘A’ to ‘Z’, and the
uppercase letters have lesser values than their lowercase counterparts. The
delta (difference) in numeric value between a lowercase letter and its
corresponding uppercase letter is constant (32) for all letters.

4. Not all characters are printable. Some are non-printable graphics and some
are control character codes for a terminal, printer, or other older devices.

http://commons.wikimedia.org/wiki/File:Ascii-codes-table.gif
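
These properties are what make simple character arithmetic work in C. The functions below are a sketch with hypothetical names; they assume an ASCII execution character set, and production code should normally prefer the <ctype.h> functions such as toupper and isdigit.

```c
/* Digits are consecutive, so subtracting '0' yields the digit value. */
int DigitToInt(char c)
{
    return c - '0';
}

char IntToDigit(int n)
{
    return (char)('0' + n);
}

/* 'a' - 'A' is the constant difference between the two cases. */
char ToUpperAscii(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)(c - ('a' - 'A'));
    return c;
}

/* A 7-bit ASCII character always has the MSB of its byte clear. */
int IsAsciiByte(unsigned char c)
{
    return (c & 0x80) == 0;
}
```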
Carry, Borrow
When you add one number to another, the operation starts on the right and moves
from there to the next set of digits to the left (or from LSB to MSB), just as in
decimal calculations.
Similar to decimal operations, when you add two binary digits together, the result
may carry a 1 to the next digit to the left. If the final result has an extra 1 carried
from the MSB, then it’s called a carry-out condition.
Carry:
  00000001
+ 00000001
----------
  00000010    a “1” is carried to the left

Carry-Out:
  11111111
+ 00000001
----------
1 00000000    the final carried “1” exceeds the number of bits in the number

Likewise, when you subtract a binary digit from another, just like a decimal
operation, you may need to borrow a 1 from the left. Unlike carry-out, there is no
given term for having to borrow from the hypothetical digit beyond the MSB.
Borrow:
  00000010
- 00000001    a “1” is borrowed from the left
----------
  00000001

Borrow from “outside”:
  00000000
- 00000001
----------
  11111111    with “1” borrowed from the left


(Later in this chapter, we will discuss how a CPU represents the carry-out or
borrow condition in the status flag register.)
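
C has no direct access to the carry or borrow condition, but both can be observed by doing the arithmetic in a wider type. The following is a sketch with hypothetical function names, using the fixed-width types from <stdint.h>.

```c
#include <stdint.h>

/* Add two bytes. The sum is computed in a wider type so that the ninth
 * bit (the carry-out) is not lost; *carry receives 0 or 1. */
uint8_t Add8(uint8_t a, uint8_t b, int *carry)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    *carry = (sum >> 8) & 1;
    return (uint8_t)sum;
}

/* Subtract two bytes; *borrow is 1 when a borrow from beyond the MSB
 * was needed, which is exactly when a is less than b. */
uint8_t Sub8(uint8_t a, uint8_t b, int *borrow)
{
    *borrow = a < b;
    return (uint8_t)(a - b);
}
```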
One’s Complement
To design a CPU, one must decide how it will represent negative numbers. One
possible way is called the One’s Complement format. As demonstrated below,
one’s complement has a couple of issues, so most systems actually use “two’s
complement” representation. However, since two’s complement is derived from
the one’s complement system, let’s look at one’s complement representation first.
To begin with, one bit is designated the “sign bit”. Usually the MSB is used for this
purpose: if the MSB is 0, the number is positive; if it is 1, it is negative. That
leaves 7 bits in which to store the magnitude. (If the magnitude were stored
unchanged in those 7 bits, the scheme would be called a sign-magnitude
representation; one’s complement encodes negative numbers differently.)
One’s complement is created by inverting the bits: i.e. changing the 1s to 0s and
the 0s to 1s. In a one’s complement system, a negative number is created by
simply taking the one’s complement of all the bits of the corresponding positive
number.
One’s Complement
Decimal Binary
0 00000000
1 00000001
2 00000010

126 01111110
127 01111111
-127 10000000
-126 10000001
-125 10000010

-2 11111101
-1 11111110
-0 11111111

One’s complement has the following issues:
1. There are two zeros: both 0b00000000 and 0b11111111 are zeros. The zeros
do behave as expected, but it is confusing to have two bit patterns that represent
the same number.
2. When you add two one’s complement numbers, if there is a carry-out with an
extra 1 coming out of the MSB, then this 1 must be added back to the LSB of
the intermediate result to obtain the correct final result.
3. Likewise, when subtracting two one’s complement numbers, if there is an
underflow with an extra 1 needed for the MSB, then this 1 must be subtracted
from the LSB of the intermediate result to obtain the correct final result.

ONE’S COMPLEMENT, ADDITION
Adding Without Carry-Out
                 (decimal)
    0000 0010        2
  + 0000 0010      + 2
  -----------      ---
    0000 0100        4  (result)

Adding With Carry-Out
    0111 1111      127
  + 1000 0001     -126
  -----------
  1 0000 0000   ← carry-out
  + 0000 0001   (adding the carried-out 1 back to the LSB)
  -----------
    0000 0001        1  (result)

EXERCISE: write an example with one’s complement underflow to show that a 1
needs to be subtracted in that condition to get the correct result.
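
The addition rule, often called an end-around carry, can be sketched in C as follows. OnesAdd8 and OnesNegate8 are hypothetical names; the bytes are treated as raw 8-bit one’s complement patterns.

```c
#include <stdint.h>

/* Add two 8-bit one's complement numbers. If the addition carries out
 * of the MSB, the carried 1 is added back into the LSB. */
uint8_t OnesAdd8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    if (sum > 0xFF)
        sum = (sum & 0xFF) + 1;     /* end-around carry */
    return (uint8_t)sum;
}

/* Negation in one's complement is just inverting every bit. */
uint8_t OnesNegate8(uint8_t a)
{
    return (uint8_t)~a;
}
```

Adding a number to its own negation yields 0b11111111, the “negative zero” described above.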
Two’s Complement
Two’s Complement fixes the problems inherent in one’s complement. To obtain
the two’s complement of a number (i.e. negating the number), first all the bits are
inverted (one’s complement), after which you add one. In two’s complement, there
is only a single representation for the number zero, and no additional adjustment
is needed for addition or subtraction:

The two’s complement of a number ‘A’ = the one’s complement
of ‘A’ + 1

Decimal Binary
0 00000000
1 00000001
2 00000010

126 01111110
127 01111111
-128 10000000
-127 10000001
-126 10000010

-3 11111101
-2 11111110
-1 11111111

Two’s complement representation does have one quirk: the most negative number
in a byte is 0b10000000 or -128, but the most positive number in a byte is
0b01111111 or 127, i.e. there is one more negative number than there are positive
numbers. This makes more sense if one considers 0 as the first “positive” number
in two’s complement, whereas -1 is the first negative number.
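
The “invert, then add one” rule and the quirk at -128 can both be checked with a small sketch. TwosNegate8 and ToSigned8 are hypothetical names; the arithmetic is done on unsigned bytes so that the behavior is fully defined by the C language.

```c
#include <stdint.h>

/* Two's complement negation: invert all the bits, then add one. */
uint8_t TwosNegate8(uint8_t a)
{
    return (uint8_t)(~a + 1u);
}

/* Interpret a byte as a two's complement value in the range -128..127. */
int ToSigned8(uint8_t a)
{
    return (a & 0x80) ? (int)a - 256 : (int)a;
}
```

Negating 0b10000000 (-128) gives 0b10000000 again, because +128 cannot be represented in 8 bits.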
Integer Arithmetic Operations
The hardware operations that a CPU can perform are very simple; usually just the
basic integer arithmetic operations such as addition and subtraction. Multiplication
and division are typically more involved, and most low end (8-bit) CPUs do not
have machine instructions to perform these operations. 32-bit CPUs usually have
multiply instructions, but divide and remainder operations may or may not exist.
Non-supported instructions can, however, be implemented in software by using
sequences of basic hardware operations.
The actual implementation of basic operations in hardware circuitry is beyond the
scope of this book. However, most CPU operations can be implemented using a
few hundred transistors, which are the fundamental building blocks of hardware
logic design.
Note that on many microcontrollers floating point arithmetic is not implemented
in hardware. In that case it is time-consuming to perform, and it is typically done
via library function calls provided by the compiler. In-depth explanation of
floating point arithmetic is beyond the scope of this book.
Integer Addition
We have already seen addition, but just to be clear: here are all the possible
combinations for an addition operation taking two single-bit binary input operands
and giving a 2-bit result:
0 + 0 = 00
0 + 1 = 01
1 + 0 = 01
1 + 1 = 10
For example, here is what occurs when adding 0b11111111 and 0b00000010:
    two’s complement    decimal
      11111111             -1
    + 00000010            + 2
    ----------            ---
    1 00000001              1
    ← the leading 1 is the carry-out

From right to left, the bits are added together with the carry propagating to the left
as the additions are performed. The end result leaves a bit which has been
carried-out to the left, and is lost forever (unless the carry-out condition is
captured by the CPU’s internal state). In decimal, the two’s complement binary
numbers above are -1 and 2 respectively, so adding them results in 1. The carry-
out bit in this case can be ignored.
Integer Subtraction
Subtraction is done using two’s complement. That is, if you convert the second
operand (the number being subtracted, AKA the subtrahend) to its two’s
complement form, and then add the result to the first operand, the answer is the
result of the intended subtraction operation.
                  Decimal
  10101010        170 - 95 = 75
- 01011111
----------

First, form the two’s complement of the subtrahend:
  01011111 → 10100000 + 1 = 10100001

Then add:
  10101010
+ 10100001
----------
1 01001011        75 in decimal (the carry-out is ignored)

Subtraction is easily done using two’s complement because, due to the design of
logic circuits, the one’s complement (flipped state) of a number is always
available. So, instead of needing to create separate “subtractor” circuits, the
“adder” circuits can be used to perform subtractions as well as additions.
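
The following sketch mimics in C what the hardware does: it never subtracts, it only inverts, increments and adds. SubViaAdd8 is a hypothetical name used for this illustration.

```c
#include <stdint.h>

/* Compute a - b using only inversion and addition, as an adder circuit
 * would. The carry-out of the final addition is simply discarded. */
uint8_t SubViaAdd8(uint8_t a, uint8_t b)
{
    uint8_t neg_b = (uint8_t)(~b + 1u);     /* two's complement of b */
    return (uint8_t)(a + neg_b);
}
```

For every pair of 8-bit operands the result matches the ordinary C expression (uint8_t)(a - b).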

EXERCISE: work out a few addition and subtraction examples in two’s
complement to show that no adjustments (such as adding or subtracting a 1) are
needed for carry-out or borrow cases.
Integer Multiplication
There are several methods of implementing integer multiplication using computer
circuitry. Most use some variation of the “shift while adding” algorithm derived
from the long multiplication which most of us learned in elementary school. We
will not explore this topic in detail here, since the internet has plenty of
information on the subject (search for “Booth’s algorithm” and “binary multiplier”).
Older CPU designs often did not include multiply instructions because of the
amount of logic circuitry required. Most current high performance CPUs include
a single cycle multiply instruction, as the partial products can be computed
quickly and added in a single cycle. This is one of the methods of implementing
CPU multiply operations.
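
As an illustration of the basic “shift while adding” idea (not of Booth’s algorithm or of any particular CPU’s circuitry), here is a minimal software sketch. ShiftAddMultiply is a hypothetical name.

```c
#include <stdint.h>

/* Multiply two 16-bit unsigned numbers using only shifts and adds,
 * like binary long multiplication done one bit of b at a time. */
uint32_t ShiftAddMultiply(uint16_t a, uint16_t b)
{
    uint32_t product = 0;
    uint32_t addend = a;        /* a shifted left by the current bit number */

    while (b != 0)
    {
        if (b & 1)
            product += addend;  /* this bit of b selects a partial product */
        addend <<= 1;
        b >>= 1;
    }
    return product;
}
```

This is essentially what a compiler’s runtime library does on a CPU that has no multiply instruction.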
Integer Division
Integer division is typically only provided in the highest performance 32-bit CPU
designs, as it takes a lot of circuitry to implement. Most integer division
implementations use some variation of repeated subtraction. There are other
high performance division algorithms, but those are even more complex to
implement.
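
One common variation of repeated subtraction is binary long division, which brings down one bit of the dividend at a time and subtracts the divisor whenever it fits. The sketch below is a software illustration under that assumption, not a description of any specific CPU; ShiftSubtractDivide is a hypothetical name.

```c
#include <stdint.h>

/* Unsigned 16-bit division by shifting and subtracting. Returns the
 * quotient and stores the remainder in *rem. The divisor must not be
 * zero. */
uint16_t ShiftSubtractDivide(uint16_t dividend, uint16_t divisor,
                             uint16_t *rem)
{
    uint32_t r = 0;     /* running remainder */
    uint16_t q = 0;

    for (int bit = 15; bit >= 0; bit--)
    {
        r = (r << 1) | ((dividend >> bit) & 1u);    /* bring down next bit */
        if (r >= divisor)
        {
            r -= divisor;
            q |= (uint16_t)(1u << bit);             /* quotient bit is 1 */
        }
    }
    *rem = (uint16_t)r;
    return q;
}
```

The loop always runs 16 times, once per bit, regardless of the operand values.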
Integer division by a constant can be replaced by multiplication by a scaled
(fixed-point) approximation of the constant’s reciprocal (1/constant) followed by
a shift. A compiler may compute that reciprocal at compile time so that a fast
multiply can be done at run time.
In addition, unsigned division by a power of 2 constant (1, 2, 4, 8, 16… etc.) is
just a logical right shift by the log2 of the constant. For example (>> is the C
notation for right shift):

167 / 8 = ?

log2(8) = 3 (therefore, shift right by 3 bits)

So: 167 / 8 = 167 >> 3 or
0b10100111 >> 3 =
0b00010100 (AKA 20 decimal)

Shift Operations
Shifts, which have many uses (such as extracting certain bits from a word), are
also important for implementing arithmetic operations such as multiplication and
can also be used for unsigned division. “Left shift by one bit” moves all the bits
one position to the left, and fills the lowest order bit (bit 0) with a 0:

Left shift one bit
01110101 ←
11101010

Likewise, “right shift by one bit” moves all the bits one position to the right. While
there is only one type of left shift, there are, however, two types of right shift:
arithmetic and logical.

An arithmetic right shift (typically used for signed integers[39], to preserve the
number’s positive or negative number status) retains the value of the sign bit
(MSB) as the original sign bit is shifted to the right. A logical right shift fills the
MSB with a 0. (Of course, if the most significant bit was already a 0, then the
apparent result of an arithmetic right shift and a logical right shift are the same.)

Arithmetic right shift one bit
10101110 →
11010111

Logical right shift one bit
10101110 →
01010111
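
In C, a right shift of an unsigned value is always a logical shift, while the result of right-shifting a negative signed value is implementation-defined. The sketch below therefore works on unsigned bytes and builds the arithmetic shift explicitly; the function names are hypothetical.

```c
#include <stdint.h>

/* Left shift by one bit: bit 0 is filled with 0. */
uint8_t ShiftLeft8(uint8_t x)
{
    return (uint8_t)(x << 1);
}

/* Logical right shift by one bit: the MSB is filled with 0. */
uint8_t LogicalShiftRight8(uint8_t x)
{
    return (uint8_t)(x >> 1);
}

/* Arithmetic right shift by one bit: the MSB keeps its old value. */
uint8_t ArithmeticShiftRight8(uint8_t x)
{
    return (uint8_t)((x >> 1) | (x & 0x80));
}
```

Using unsigned types for bit manipulation avoids depending on how a particular compiler treats signed shifts.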
Signed Overflow
Regardless of how many bits a CPU uses to store numbers (8 bits, 32 bits, etc.),
overflow may happen. In two’s complement, this means, for example, that adding
two negative numbers results in a positive number, or the opposite: adding two
positive numbers results in a negative number.

Note that overflow is NOT the same as carry-out. As was shown before, a
carry-out occurs when a “1” is “pushed out” of the MSB to the left. An overflow
occurs when the signs of both operands are the same but the result has a
different sign. (In an addition, overflow cannot occur if the operands have
different signs.)

  Signed          Decimal
  01111111         127
+ 00000010        +  2
----------        ----
  10000001        -127

Overflowed:
MSB changed from 0 to 1, the number has
become negative.
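
The sign rule for addition can be expressed with a few bitwise operations. AddSigned8 is a hypothetical name; the operands are passed as raw bytes and interpreted as two’s complement numbers.

```c
#include <stdint.h>

/* Add two 8-bit two's complement numbers held in bytes. *overflow is
 * set to 1 when both operands have the same sign bit and the sign bit
 * of the result differs from it, and to 0 otherwise. */
uint8_t AddSigned8(uint8_t a, uint8_t b, int *overflow)
{
    uint8_t r = (uint8_t)(a + b);
    *overflow = ((a ^ r) & (b ^ r) & 0x80) != 0;
    return r;
}
```

Adding 0b11111111 (-1) and 0b00000010 (2) produces a carry-out but no overflow, which shows that the two conditions are independent.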
Overflow: Unsigned Wraparound
If you use unsigned representation, e.g. the numbers 0…255 represented in 8
bits, adding two unsigned values may result in a number smaller than either
operand. This is called wraparound. In fact, unsigned arithmetic is sometimes
known as modular arithmetic, because the results are as if you applied the
modulus to the intermediate result (the result you would have if you were not
limited by the number of bits being used) to get the final result.

For example, in using 8 bits to store a number, you would use modulus 256 (2⁸):

  Unsigned          Decimal
  10001010           138
+ 10001010          +138
----------          ----
1 00010100           276 % 256 = 20
← the leading 1 is the carry-out; the remaining 8 bits are 00010100 = 20
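
C guarantees this modular behavior for unsigned types, so the example can be checked directly. AddWrap8 and Wrapped8 are hypothetical names for this sketch.

```c
#include <stdint.h>

/* 8-bit unsigned addition; the result is the true sum modulo 256. */
uint8_t AddWrap8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Wraparound happened exactly when the result is smaller than an
 * operand. */
int Wrapped8(uint8_t a, uint8_t b)
{
    return AddWrap8(a, b) < a;
}
```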
Status Flags / Status Bits
A CPU normally maintains a set of “status flags” (which are typically just a set of
dedicated bits). Status flags allow various types of useful information resulting
from hardware operations to be accessed.
The Carry (and sometimes Borrow) Bit
When you look at a CPU reference manual, there is usually a carry status bit in
the processor status register. The use of the carry status bit is well defined for
addition: it is set to “true” (1) when a hardware operation has just resulted in a
carry-out from the MSB. Otherwise, it is set to “false” (0).
For subtraction, however, there are two interpretations for use of the carry status
bit. Let’s say that a given operation is “A - B”. One subtraction interpretation treats
the carry status as a borrow flag, so it will be set to true whenever A is less than
B. The other interpretation uses the two’s complement arithmetic model, where
subtraction is done via adding the two’s complement of the subtrahend (the
number we are subtracting). In that case, the carry status reflects the normal
addition case: it is set to “true” whenever there is a carry-out:

Carry being used as a Borrow:
  00001010
- 00010101
----------
  11110101
1 → borrowed, the carry flag is set to “true” (1)

Carry when subtraction is being performed via two’s
complement addition:
  00001010
- 00010101

First find the two’s complement of 00010101 → 11101011, then
the problem becomes
  00001010
+ 11101011
----------
  11110101
The carry flag is set to “false” (0)

Among CPUs, the 8080, Z80, x86 and 68K families use the borrow flag
subtraction interpretation, while ARM and PowerPC use the carry flag subtraction
interpretation. There is no particular advantage to one choice over the other, and if
you are writing in C, this “under the hood” implementation choice is irrelevant to
you.
Compare Operations
Compare operations are typically done by using subtract operations, with the
calculation results being ignored while the side effects of the operation (the states
of various CPU status flags) are noted.
The 4 most commonly found CPU status flags are:
C - Carry (see above): did the operation produce a carry-out?
V - Overflow: did the operation cause a signed overflow?
Z - Zero: was the result of the operation zero?
N - Negative: was the result a negative number?

Note that signed and unsigned compare are considered two different
operations. For example:
a: 0b11111111
being compared to
b: 0b00000001
If they are unsigned bytes, a is greater than b; you are comparing 255 to 1.
However, if they are being interpreted as signed bytes, then a is less than b since
it is comparing -1 to 1. Using combinations of flag statuses, you can deduce the
comparison result for both signed and unsigned operands.
To compare two operands, first subtract the second operand (b) from the first, and
check the flag status to determine whether the first operand (a) was equal to,
greater than, or less than b. (In the following examples, the carry-out and not the
borrow interpretation is used to determine the status of status flag C.)

Nomenclature used below:
“==” means “is equal to”, “!=” means “is not equal to”. This notation comes from C
programming language grammar. E.g.: “Z == 1” means the Z status flag’s value is
“true” (the status bit is 1 because the operation’s result was zero).

Compare Operations: Equal comparison (unsigned or signed)
‘a’ is equal to ‘b’: flag status result: Z == 1
not equal to: flag status result: Z == 0

Compare Operations: unsigned comparison
greater than (sometimes also called higher than): C == 1, Z == 0
less than (sometimes also called lower than): C == 0, Z == 0

Compare Operations: Signed comparison
greater than: N == V, Z == 0
less than: N != V, Z == 0
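
The flag rules can be modeled in C by performing the subtraction the way the hardware does and then recording the four flags. This is a sketch of an 8-bit model using the carry-out interpretation of C; the FLAGS type and all function names are hypothetical.

```c
#include <stdint.h>

typedef struct { int n, z, c, v; } FLAGS;

/* Flags produced by the 8-bit operation a - b, computed as
 * a + ~b + 1 so that C is a true carry-out (not a borrow). */
FLAGS Compare8(uint8_t a, uint8_t b)
{
    FLAGS f;
    unsigned sum = (unsigned)a + (uint8_t)~b + 1u;
    uint8_t r = (uint8_t)sum;

    f.c = (sum >> 8) & 1;                   /* carry-out of the MSB */
    f.z = (r == 0);                         /* result is zero */
    f.n = (r >> 7) & 1;                     /* result sign bit */
    /* signed overflow: operand signs differ and the result's sign
     * differs from the sign of a */
    f.v = (((a ^ b) & (a ^ r)) >> 7) & 1;
    return f;
}

int UnsignedGreater(FLAGS f) { return f.c == 1 && f.z == 0; }
int UnsignedLess(FLAGS f)    { return f.c == 0 && f.z == 0; }
int SignedGreater(FLAGS f)   { return f.n == f.v && f.z == 0; }
int SignedLess(FLAGS f)      { return f.n != f.v && f.z == 0; }
```

With a = 0b11111111 and b = 0b00000001, UnsignedGreater and SignedLess are both true, matching the 255 versus -1 interpretations.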

EXERCISE: Work out examples and show how signed greater than and signed
less than use the flag statuses as described.

ADVANCED EXERCISE: Although modern CPU designs normally have the
overflow flag V, which allows for more efficient software implementation, one
popular 8-bit CPU has a design based on an older convention which does not
provide this flag. How would you find the result of a signed comparison using only
the C, Z, and N flags?
Bitwise Operations
Bitwise operations include AND, OR, Exclusive OR, and Complement. Note that
there is no overflow or carry in bitwise operations.
Bitwise AND (&)
0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1

The result of an AND operation is ‘1’ if and only if both inputs are ‘1’s.

Bitwise OR (|)
0 | 0 = 0
0 | 1 = 1
1 | 0 = 1
1 | 1 = 1

The result of an OR operation is ‘1’ if either of its inputs is ‘1’.

Bitwise Exclusive OR (^)
0 ^ 0 = 0
0 ^ 1 = 1
1 ^ 0 = 1
1 ^ 1 = 0

The result of an “exclusive OR” operation is ‘1’ if (and only if!) exactly one of
its inputs is ‘1’.
Exclusive OR performed with “1” as the second operand is sometimes known as
the “toggle operation”, as the result is the toggled value of the first operand.

Bitwise Complement (also known as Bitwise Inverse, Toggle)
~0 = 1
~1 = 0
Bitwise complement is the one’s complement of the bit.
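
In embedded code these operators are most often used to set, clear, toggle and test individual bits of a byte or register. The helpers below are a sketch with hypothetical names; bit numbers run from 0 (LSB) to 7 (MSB).

```c
#include <stdint.h>

/* OR with a mask forces the selected bit to 1. */
uint8_t SetBit(uint8_t v, int bit)
{
    return (uint8_t)(v | (1u << bit));
}

/* AND with the complemented mask forces the selected bit to 0. */
uint8_t ClearBit(uint8_t v, int bit)
{
    return (uint8_t)(v & ~(1u << bit));
}

/* Exclusive OR with a mask flips the selected bit. */
uint8_t ToggleBit(uint8_t v, int bit)
{
    return (uint8_t)(v ^ (1u << bit));
}

/* Shift the selected bit down to bit 0 and isolate it. */
int TestBit(uint8_t v, int bit)
{
    return (v >> bit) & 1;
}
```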
B. A BRIEF HISTORY OF C

In the 1960s, Bell Labs, the research arm of American Telephone and
Telegraph [40], was engaged in some of the most important computer science
research of the time. After working on a mainframe operating system project
called “Multics” (which Bell Labs finally pulled out of), Ken Thompson, a
researcher at Bell Labs, decided to implement the best ideas of Multics using
Assembly Language on a Digital Equipment Corp. PDP-7 minicomputer.

At that time, paper tape - not even punch cards! - was the standard type of
program storage unit. The code was first developed on a GE-635, then the output
paper tapes were carried over to the PDP-7 for processing, until enough of the
system was working on the PDP-7 to enable native development (although still in
Assembly).

Ken Thompson then began working on a compiler for a language he called B, the
name being attributed either to the language BCPL (to which B bears some
similarities) or Bon, an earlier language which Thompson had written for
Multics [41]. He then rewrote part of the as-yet unnamed new OS in B.

In 1970, the Bell Labs researchers managed to convince Bell Labs management
to procure a PDP-11, a much more powerful machine, in order to implement a
“text processing system.” Along the way, Brian W. Kernighan adopted the name
“Unix” for the new operating system, either as a play on words on “Multics” or
possibly as a tongue-in-cheek reference to “eunuchs”.
Meanwhile, Dennis M. Ritchie, while working with B on the PDP-11 Unix, decided
to add some needed features to B, and picking the next letter from BCPL, he
called this new language C. Using a procedure called “bootstrapping”, he first
wrote a prototype C compiler in B, after which he rewrote the compiler in C itself,
adding more features at each iteration as needed.

By 1973 and 1974, C was sufficiently mature that Unix and its utilities
were entirely rewritten in C. Remarkably, this version of C still bears a great
resemblance to even the latest Standard C, a testament to how well the original C
language was designed. During the same period, C was retargeted to the
Honeywell 635 and IBM 360/370, and through that experience, C adopted
features and encouraged practices that have improved the portability of C
programs, which in turn contributes to the success of the language today.

Kernighan and Ritchie published the book The C Programming Language in 1978,
arguably considered “the C Bible” even to this day (a “version 2” was published in
1988). During the same time period, Steve Johnson wrote a reference C compiler
called the Portable C Compiler. Unix and C were ported to the Interdata 8/32 and
then to the VAX-11 in the late 70s. Due to some anti-monopoly agreement with the
US government, the Unix source was given to the University of California
Berkeley, which they (mainly a programmer named Bill Joy, who went on to co-
found the legendary company Sun Microsystems) modified to become BSD Unix.
The various AT&T Unix and BSD Unix and C versions influenced an MIT hacker
named Richard Stallman (around 1984) to work on “free” versions of the compiler
and operating system, which eventually grew to become the set of GNU software,
including the GCC suite of (non-standard C) compilers. The GNU efforts in turn
influenced a young Finnish student named Linus Torvalds (in 1991) to write a “toy”
operating system that eventually became Linux.

Meanwhile, back in the early 1980s, commercial C compilers started to appear for
chip targets such as Motorola 68K, Intel 8086, and even small microcontrollers
such as the Motorola 68HC11 and Zilog Z80. P.J. Plauger, a researcher at Bell
Labs, left to found Whitesmiths Ltd. and produced one of the first commercial C
compilers outside of Bell Labs with wholly independently-developed compilers.

In the 1980s, C compiler companies practically blossomed like daisies after a
spring rain. Borland’s Turbo C blew the market open with its low price, but
eventually the Windows compiler market coalesced to mainly just Microsoft Visual
C (later Visual Studio). The Mac market was dominated by Think C, then
CodeWarrior, until the Mac changed to Intel x86 processors in the mid-2000s.

C became the “lingua franca” programming language for machines from small
embedded systems to mainframes. The Bell Labs researchers eventually followed
Unix/C with Plan 9 and then the commercial Inferno OS and the Limbo language,
but they did not achieve the earlier successes of Unix and C.

The popularity of computer languages has often been fickle. After the
computer industry collectively embraced C/C++ in the 90s, its attention now
seems poised to fragment again, into Go (Google), Swift (Apple), C# (Microsoft),
Java (Sun/Oracle, Android), Objective C (Mac OS X and iOS, although with Apple
working on Swift, the writing is on the wall for Objective C), and a smattering of
scripting / interpreting languages for the web such as PHP, Python, Ruby, and
Perl.

For embedded systems, though, C is STILL the most popular high-level
programming language, even for the 32-bit segment. As long as price and
performance are the prime driving factors in the embedded space, C’s future is
still assured.
C. THE C STANDARDS

The definitive technical description of this language is the “ISO C Standard.” You
can find that on the web by using search terms, but in order to see the full content
of the released Standard documents, you must pay a fee to the Standards
Organization [42]. Luckily, there are also various drafts, FAQs, summaries, and
plenty of helpful information on the web and in printed literature that you can
peruse for free. Most of the time, the draft versions are nearly identical to the
release versions.

There have been three major releases of the C Standard. Summarizing, with
emphasis on language differences (standard library differences are omitted):

ANSI C89 / ISO C90 [43] - the first C Standard; 1989. This mostly codified
existing practices and resolved some of the differences between compiler
implementations up until that time. In particular, the original compilers were based
on the AT&T compilers, but by 1989, many commercial compilers written by other
vendors had sprung up, sometimes disagreeing with the AT&T compilers on
language syntax and semantics.

C99 (ratified in 1999) - This release added inline functions, interspersed
declarations and statements, variable-length arrays, and // as a single line
comment delimiter.

C11 (2011) - This release added new keywords and macros for alignment,
thread-local-storage, and anonymous struct / union.

Here are some drafts of the standards, with their availability subject to change
from the hosting organizations. Later drafts might also exist.

C90 Draft:
http://web.archive.org/web/20050207005628/http://dev.unicals.com/papers/c89-draft.html

C99 Draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf

C11 Draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1494.pdf

Most embedded C compilers implement C89 or a subset of the ISO C99. This
book describes C as implemented by the ImageCraft JumpStart C compilers,
which implement C90 with some C99/C11 features, including // comment,
anonymous struct / union, interspersed declarations and statements, and for loop
variable declarations. Additional C99 and C11 features are also scheduled to be
added. [44]
D. C COMPILERS AND THE RUNTIME ENVIRONMENT

Regarding Compilers
A compiler is a software program that translates source files written in a high level
programming language, e.g. C, to a set of machine instructions for a target
machine.

There are C compilers available for various targets, ranging from large
mainframes to the smallest 8-bit MCUs. ImageCraft produces two popular
embedded system C compilers: JumpStart C for the Atmel AVR, and JumpStart C
for the Cortex-M. When implementation-specific topics are covered, this book
uses either of these compilers as examples.

While a program written in C and compiled to assembly might be slower than a
program written in assembly (from 1.5x to possibly 5x as slow, depending on the
compiler and the target machine), this overhead is usually well worth the tradeoff,
due to the numerous advantages of writing and modifying programs in a high level
language.
Building a C Program
The source of a C program is typically separated into multiple source files so that
it is easier to write and maintain. The source files are processed by multiple
programs, eventually resulting in a “program image” that can be downloaded into the
microcontroller. The term “C Compiler” is commonly used to refer to the entire
chain of tools, even though properly speaking, a C compiler is only one part out of
many. In the narrow context, a C compiler translates a C source file into assembly
code of the target CPU.


The C source file with .c extension is processed by the C preprocessor, then by
the compiler proper and the assembler; and finally the linker combines all the .o
files and .a library files into a set of output files. The .bin and .hex output files are
the output image in binary and hexadecimal format.

A top level program called a compiler driver hides the compilation process from
the users, and the Integrated Development Environment (IDE) hides everything in
an easy to use GUI.

Compiler Passes
A “compiler” is typically made up of a few different programs, as shown in the
diagram on the previous page. Each stage of conversion between the original C
source file to the final executable file is called a “compiler pass”.

Compiler Driver
Not shown in the diagram above, the compiler driver is responsible for the top level
user interaction, and it processes the input files by calling the different compiler
passes in sequence.

C Preprocessor
A C source line that starts with the # symbol is a preprocessor directive, a specific
instruction to the preprocessor to perform some behavior (see <Chapter 8. The C
Preprocessor>). Certain text in the source file might be textually replaced by other
content if it is affected by such directives. For example:

#include <stdio.h>

replaces the line with the contents of the file stdio.h. By convention, files that
are to be #include’d by C source files use a .h extension.

C Compiler Proper
The C compiler proper translates C language input into assembly language.

Assembler
The assembler translates an assembly language input into object code.

Linker
The linker combines any number of object files with the required library files to
form the final output executable files. The executable images are “burned” or
“downloaded” onto the target machine, e.g. the JumpStart MicroBox kit, and run.
The C Abstract Machine
A programming language can be described by its syntax and the semantics of the
language elements. A subtle implication is that the language also defines an
“abstract machine” that the generated code depends on, including the sizes,
behaviors, and representations of data types, and the memory model expected by
a running program.

The C programming language is a relatively simple language. Most C operators
and variable accesses compile to a few machine instructions. More involved
operations, for example floating point or 64-bit integer operations, might compile
to calls to internal library functions provided by the compilers.

Therefore, in most cases, a C Abstract Machine can be implemented with just
tens of instructions in the startup code, which takes initial control when a C
program is run. After the startup code sets up the C environment, it turns control
over to the user function “main”.
C Startup
The C compiler inserts startup code in the program image (the loaded executable
version of the compiled program) that takes control as the CPU resets. The
startup code sets up a C abstract machine by:

Initializing the stack pointer

Zeroing the bss segment (see later section) - global and static variables (see
<Chapter 5. Variables>) that do not have explicit initialized values.

Copying the initialized values of the global and static variables that do have
explicit initialized values from the program image to the data segment (see
later section).

After that, the startup code transfers control to the function main, which enters the
user code. main is not expected to return, and if it somehow does, typically the
startup code will then just loop forever.
The Code Segment
Executable instructions are placed in the machine’s code segment, which usually
is located in read-only memory. In most MCUs, this corresponds to the flash
memory, often built into the MCU itself. The size of the code segment is known to
the compiler tools. A typical compiler may allocate the space in the code segment
as follows:

Low Memory Address


Startup code
Code for user function 1
Code for user function 2

Code for user function n
Code for C library function 1
Code for C library function 2

Code for C library function n

High Memory Address



Compilers may put the user functions and the C library functions in any order.
They may also put the startup code elsewhere (not at the beginning), as long as
the startup code is still invoked first.

The size, format and representations of the executable instructions are defined by
the target ISA (Instruction Set Architecture).
Literal Segments
A program might have literal content, for example, literal character strings such as
“Hello World”, or floating point constants such as 3.1415926. Conceptually, they
are gathered in a literal segment, which is also located in read-only memory.

Most of the time, a compiler will place the literal segment adjacent to the code
segment. However, in some CPU Architectures, the literal segment is split into
multiple literal pools, with each pool of literals placed close to the function in which
they are accessed. This is necessary if the CPU architecture has limited
addressing modes and it might be either inefficient or impossible to fetch the literal
data if they are placed too far away in memory. The ARM Cortex-M is in fact one
such architecture; JumpStart C for Cortex-M puts each literal pool right after the
function where those literals are used.

In the illustration examples in the following sections, the literal segment will be
omitted, as it can be viewed as part of the code segment although it does not itself
contain executable instructions.
Global Data Segments
Global and static variables are allocated in data memory. This corresponds to the
RAM memory in the target machines. Global and static variables might have
initialized values as part of their declarations. C specifies that uninitialized global
and static variables are given the value zero by default.

A common terminology is to call the region of initialized variables the data
segment and the region of uninitialized variables the bss segment, with “bss”
standing for “Block Started by Symbol”, a legacy term used by an early assembler.

The sizes of the data and bss segments are known by the compiler tools. A typical
compiler allocates global data memory as follows:

Low Memory Address

initialized variable 1 ← start of the data segment


initialized variable 2

initialized variable n
uninitialized variable 1 ← start of the bss segment
uninitialized variable 2

uninitialized variable n

High Memory Address



In this example, the data segment precedes the bss segment, but the order may
be reversed in other implementations.
The Stack Segment
The stack is an area of memory used by program functions for variable storage.
Functions allocate local variables in either CPU registers or in the stack segment.
The stack segment is also located in RAM.

A stack is a FILO (“First In Last Out”) structure; items that are pushed onto the
stack must be popped off in the reverse order in which they were pushed.

A function’s local variables are allocated on the stack when the function is called,
and the memory is reclaimed (“freed”, or deallocated) when the function exits. As
the size of the stack typically varies quite a bit during program execution, the
stack size is NOT known to the compiler tools [45].

Since the size of the stack segment is unknown at compile time, a common
strategy is to put the top of the stack at the highest possible RAM location and let
the stack “grow” toward the lower memory address space. Most MCUs do not
have memory protection, and if the stack segment accidentally collides with the
data segments as it grows, then memory corruption will result and Bad Things
Can Happen. By placing the top of stack at the highest memory address, this at
least minimizes the chance of segment collision.

Low Memory Address

[ Data Segment ]
[ BSS Segment ]




(stack frames grow toward low memory addresses)
stack frame for function b ← current stack pointer
stack frame for function a
stack frame for function main ← top of stack (TOS)
High Memory Address

main is the root of the program (the first user function in a program that initiates
the other function calls), so its stack is located at the highest address. As each
function call is made, a new stack frame for that particular function is allocated.
When a function call exits, its stack frame is deallocated (freed up for other use).
Heap Memory
C allows allocation of dynamic data structures at runtime through the Standard C
Library functions malloc, calloc and free. These objects are allocated in the
heap memory, also located in RAM. Heap objects are accessed through pointers.

Heap objects are managed entirely by the user using the library functions. These
objects are created and destroyed as needed, depending on the program’s
runtime requirements.

To minimize the possibility of memory corruption, heap memory is placed to start
after the BSS segment, and memory allocation for the heap grows toward higher
memory address. Memory for dynamic objects may be reclaimed by calling the
function free; the object is then destroyed (deallocated). The heap space is
managed by library functions which compact and merge free heap space as
needed.

Low Memory Address

[ Data Segment ]
[ BSS Segment ]
heap object 1 ← start of the heap objects
heap object 2
heap object 3
(heap objects allocated toward high memory address)




(stack frames grow toward low memory address)
[ Stack Segment ]

High Memory Address


AVR Specific: RAM Literal Strings vs. Flash Literal Strings
In C, literal strings such as “Hello World” have the data type of “array of char”.
Among other reasons, this allows a function which takes “char *” as a function
argument to take any of the following as arguments:

1. a variable with the type “char *”
2. a variable with the type “array of char”
3. a literal string

For example:

// prototype for library function strcpy
extern char *strcpy(char *dst, const char *src);

char buffer[100];
char hello_array[] = { "Hello" };
char *a_ptr = "Hello";

void copy_examples(void) {
    // all of these are valid
    strcpy(buffer, &hello_array[0]);
    strcpy(buffer, a_ptr);
    strcpy(buffer, "Hello World");
}

Since literal strings should not be modified, they are normally placed into the
literal segment, as mentioned in a previous section.

For a CPU architecture that has separate instructions for RAM access and flash
access, literal strings may be placed in either RAM space, or in read-only space.
The Atmel AVR is just such a machine. Either decision has its advantages and
disadvantages.

If literal strings are placed in RAM space, then there is no additional work
necessary for the programmer. The compiler must make sure that the RAM space
is initialized with the correct content at program startup, but the programmers do
not need to be concerned about those details. However, as the AVR has limited
RAM, a better choice in terms of space usage is to allocate literal strings in read-
only memory instead.

If literal strings are placed in the read-only memory (i.e. flash memory), then the
compiler must generate different instructions to access a literal string. Flash-
based literals have the data type of “array of char in flash”, or in C: “__flash
char *”.

NOTE: This has the unfortunate consequence that a standard function such as
“strcpy” will no longer work with literal strings - an equivalent function that accepts
“__flash char *” must be provided:

extern char *cstrcpy(char *dst, __flash char *src);

JumpStart C for AVR places literal strings in RAM by default, so that regular
Standard C library functions can be used without source modifications. To reduce
RAM consumption, JumpStart C for AVR provides an option to place literal strings
in flash, if desired. When a user selects this option, then they must use the flash-
equivalent versions of the Standard C library functions. Please refer to the
JumpStart C for AVR documentation for more details.

[1]
If you do not obtain a license, the program runs in demo mode and is fully functional for 45 days. This will
be sufficient for the simpler programs.
[2]
Step by step set of instructions on how to do something.
[3]
Teletype and paper tape, when you go far enough back in the history of C.

[4]
The actual example you see might be slightly different from the excerpts in this book, as things might have
been changed in minor ways since publication.
[5]
See <JumpStart MicroBox Preparation> section earlier.
[6]
“Arguments” are values being passed to a function that you are invoking.
[7]
A VERY trivial fact is that there is actually no such thing as a “negative constant” in C, but rather it is a
negation operator applied to a positive constant. However, there is absolutely no difference between that and
a negative constant, except to score points as a C geek.
[8]
This is the original definition for ASCII. There are now extended ASCII codes.

[9]
A nonzero value is considered as “true” in C.
[10]
Seriously.
[11]
Traditionally, Unix accepts just the code NL for this, Windows/DOS needs CR followed by NL, and
Mac OS (the pre-OS X versions) uses CR alone.
[12]
The C technical term is compatible type, which will be explained in chapter <Variables and Type
Declarations>.
[13]
But see _Bool Data Type in chapter <Types and Declarations>

[14]
Most people do not bother to #define other operators. Perhaps the fact that the symbols && and || look
so different from operators in other languages drives people to do this.
[15]
Except bitfields, but they are not standalone objects.
[16]
The if-else statement will be described in the chapter <Statements>.

[17]
That is, find the first nonzero digit from the right and discard all the zeroes to the right of it. The number of
digits left is known as significant digits. The number of decimal significant digits is not exact, because floating
point uses binary representation.
[18]
One exception is when a read access to a volatile I/O register effects a change in the I/O state. This is
explained later in <I/O Register Access>
[19]
An earlier attempt of this book tried to separate Types and Declarations into their own chapters, and
the results weren’t pretty.
[20]
Although using more than 3 dimensions is very rare.
[21]
There is no enum type in the original C definition, hence earlier programs made use of #define. Since
its introduction in C90, enum is the preferred mechanism.

[22]
This is a typical way to construct a linked list. See chapter <Advanced Data Structures>.
[23]
It’s not incorrect; it just has no effect whatsoever.
[24]
JumpStart C for AVR only.
[25]
C actually allows you to also write “const int”; it means the same thing, as C is flexible in type
keyword placements.
[26]
These symbols “decorate” other parts of the declaration.
[27]
It must be preceded by at least one named argument. Variadic functions are described in more detail
later.
[28]
Which you really should not be doing. Yes, this book stresses this point any chance it gets.
[29]
Some languages allow an argument to be “call by reference”, allowing modifications to a passed
argument to be reflected in the calling function. Under the hood, they basically implement the pointer passing
and modification through the passed pointer.
[30]
Unless you really know what you are doing. See “Accessing the Stack Address” in the chapter
<Advanced Topic: Effective Pointer and Array Usage>
[31]
Whitespace in front of the # character is ignored.
[32]
“Building a project”, a term used by the IDE (Integrated Development Environment), basically
just means compiling all of the source files to object files and then linking them together into one executable.
[33]
Unfortunately, off-by-one errors and “hacking until it works” are commonly encountered phenomena in
programming. :(
[34]
Ad nauseam? :p
[35]
Not to be confused with a “Flying Spaghetti Monster”.
[36]
The official C Standard does not include binary notation, but 0b is a common extension accepted by a
number of C compilers, including the JumpStart C Compilers.
[37]
By referring to power-of-ten instead of the typical power-of-two, hard drive manufacturers are “cheating”
you on expected memory bytes
[38]
MSB = “most significant bit”
[39]
As opposed to signed char, or signed short, etc.
[40]
The original AT&T, with a monopoly on phone service, is the foremother (“Ma Bell”) of the current 2000s
AT&T cell phone carrier
[41]
B and BCPL have similar operators to C, but they have no concept of data types.
[42]
Almost as if they want to encourage people to use a non-standard C instead. Short-sighted, much?
[43]
Originally it was ANSI C89, an American standard. When it was adopted by the International Standards
Organization (ISO), it became ISO C90, but it’s the same standard.
[44]
Other implementation choices that are not applicable will not be discussed here; for example, ImageCraft
JumpStart compilers do not support multibyte characters or non-US locales.
[45]
Although it is possible to estimate the stack segment usage by performing program analysis.
