1 Unix Processes: Null Segfault Main
A process is an executing instance of a program. We have been discussing processes in one form
or another throughout the class, but in this lesson we dive into the lifecycle of a process. How does
a process get created? How is it managed? How does it die?
For starters, let's return to the definition of a process: an executing instance of a program.
A program is the set of instructions for how a process operates when run, while a process is a
current instance of that program as it executes. There can be multiple instances of the same
program running; for example, multiple users can be logged into a computer at the same time,
each running a shell, which is the same program with multiple executing instances.
A process is also an Operating System abstraction. It's a way to manage varied programs and
contain them within individual units. It is via this abstraction that the O.S. can provide isolation,
ensuring that one process cannot interfere with another. A process is also the core unit of the O.S.
resource of Process Management, whose main goal is to determine which process has access to the
CPU and at what time.
In this lesson, we are going to trace the life and death of a Unix process, from the first nascent
invocation to the final mortal termination. We'll begin at the end, with death, and work backwards
towards birth. Throughout this lesson, we'll use the diagram below to explain these concepts. You
can find a version of this diagram in APUE on page 201, Section 7.3.
terminate from another point in the program, somewhere other than main()? We need a separate
mechanism to do that, and the solution is an exit call, like we did in bash programming.
2.1 exit() and _exit() and _Exit()
The printing of "Hello World!" includes a newline character, and thus the buffer is flushed and "Hello
World!" is printed to the terminal. However, consider this version of the hello-world program:
#include <stdio.h>
int main(){
  printf("Hello World!"); //no trailing newline
}
This time there is no newline, but "Hello World" is still printed to the terminal. How?
When the main() function returns, it actually returns to another function within the C startup
routines, which calls exit(). Then exit() will perform a cleanup of standard I/O, flushing all the
buffered writes.
However, when you call _exit(), the buffers are not flushed. The process exits immediately, and any
data still sitting in a buffer is lost. You can see the difference between these two procedures in
these two programs:
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
int main(){
  printf("Will print");   //buffered: no newline
  exit(0);                //flushes stdout before exiting
}
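#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
int main(){
  printf("Will not print"); //buffered: no newline
  _exit(0);                 //exits immediately, buffer is never flushed
}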
You do not need to rely on the exit procedures to flush the I/O buffers. Instead, you can directly
flush the buffers with the fflush() library function. For example:
int main(){
The fflush() function takes a FILE * and writes out all data currently buffered for that output
stream. There is also an analogous function, fpurge(), which discards all data in the buffers; it is a
BSD extension rather than standard C (glibc provides it as __fpurge()).
int main(){
In some ways, this is good policy. You want errors to be reported immediately, not when it is most
convenient for the buffer. There is a side effect, however: writes to stderr are more expensive,
since each one requires an immediate system call and, with it, a context switch.
We can change the policy for how an input/output stream is buffered. By default, stdout and
stdin are line buffered when connected to a terminal, which means that input and output are
buffered until a newline is read/written. stderr is unbuffered, as described above. There is also a
third choice, fully buffered, which means writes/reads occur once the buffer is full.
The library call to change the buffering policy is setvbuf(), which has the following function
declaration:
int setvbuf(FILE *stream, char *buf, int mode, size_t size);
You select which input/output stream is affected, such as stderr or stdout, and you can also
provide memory for the buffer along with its size. The mode option can have the following choices:
_IONBF unbuffered : data is written immediately to the device via the system call write()
_IOLBF line buffered : data is written to the device using write() once a newline is found or the buffer is full
_IOFBF fully buffered : data is only written to the device using write() once the buffer is full
In general, you do not need to specify a new buffer; you just want to change the mode. For
example, if we want to set stderr to be line buffered, we can alter the program from above like
so, with the result that it would no longer print:
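#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
int main(){
  setvbuf(stderr, NULL, _IOLBF, 0); //stderr is now line buffered
  fprintf(stderr, "no newline here"); //stays in the buffer
  _exit(0); //exits without flushing, so nothing is printed
}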
2.4 atexit() Exit Handlers
So far, we've seen, within the context of I/O buffering, that the exit() procedure will perform
actions before executing the final exit with _exit(). What if you wanted to add additional exit
processing? You do that with exit handlers.
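#include <stdio.h>
#include <stdlib.h>
void my_exit1(){
  printf("FIRST Exit Handler\n");
}
void my_exit2(){
  printf("SECOND Exit Handler\n");
}
int main(){
  //handlers run in reverse order of registration: my_exit2, then my_exit1
  atexit(my_exit1);
  atexit(my_exit2);
  return 0; //returning from main() leads to exit(), which runs the handlers
}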
Another thing to note here is that the argument to atexit() is a function. This is the
first time we are using a function pointer, or a reference to a function. We will do something similar
later in the class when setting up signal handlers, again registering a function to be called when an
event occurs.
void exit(int status);
void _Exit(int status);
Additionally, the return value of main() implicitly sets the exit status. By convention, exit
statuses have the following meanings:
0 : success
1 : failure
2 : error
We've seen this before when programming with bash scripting. Recall that an if statement in bash
executes a program:
if cmd
then
  echo "cmd succeeded"   # exit status 0
else
  echo "cmd failed"      # non-zero exit status
fi
When cmd succeeds, i.e., returns with exit status 0, then the then block is executed. If
the cmd fails or there was an error, the else block is executed. The special variable, $?, also
stores the exit status of the last executed program.
There is actually more to how a process ends than the argument to _exit(). The kernel also
prepares a termination status for the process, one part of which is the exit status. The termination
status also contains information about how the program terminated, and we will concern ourselves
more with termination status when we discuss process management.
The exec family of system calls loads a program from the file system into memory and executes it,
replacing the current program of the process. Once the loading of instructions is completed, exec
will start execution through the startup procedures. We'll just discuss the execv() system call,
which has the following function declaration. Read the manual page to find the other forms of exec,
including execve() and execvp().
int execv(const char *path, char *const argv[]);
execv() takes a path to a program, as it lives within the file system, as well as the arguments to
that program. It's easier to understand by looking at an example. The program below will execute
ls.
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
int main(){
  //arguments for ls
  char * ls_args[2] = { "/bin/ls", NULL} ;
  //execute ls
  execv( ls_args[0], ls_args);
  //only reached if execv() fails
  perror("execv");
  return 1;
}
The execv() system call takes a path to the program, in this case "/bin/ls", as well as
an argument array. For this exec, the only argument is the name of the program, ls_args[0]. You
might notice that the arguments to execv() match the arguments to main() with respect to
the argv array, and that's intentional. In some ways, you can think of an exec as calling
the main() function of the program directly with those arguments. The last element of the array
must be NULL so that exec can count the arguments and set argc.
For example, we can extend this program to take an argument for ls where it will long list the
contents of the directory /bin. If we were to call ls from the shell to perform this task, we would
do so like this:
ls -l /bin
We can translate that into an argv array with the following values:
//arguments for ls, will run: ls -l /bin
char * ls_args[4] = { "/bin/ls", "-l", "/bin", NULL} ;
In this case, ls_args has 3 fields set, not including NULL. We can now pass this through to exec:
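#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
int main(){
  //arguments for ls, will run: ls -l /bin
  char * ls_args[4] = { "/bin/ls", "-l", "/bin", NULL} ;
  //execute ls
  execv( ls_args[0], ls_args);
  //only reached if execv() fails
  perror("execv");
  return 1;
}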
Another thing to note is that upon success, an exec does not return. Instead, the whole program
is replaced with the exec'ed program, and when the exec'ed program returns, that's the final return.
To check if an exec fails, you don't need an if statement: any code after the exec call runs only if
the exec failed.