
e-PG Pathshala

Subject: Computer Science


Paper: Operating Systems
Module 38: Process Management in Linux
Module No: CS/OS/38
Quadrant 1 — e-text

38.1 Introduction
In this module, we learn how process management is handled in the Linux operating
system. We learn various aspects related to processes such as process identity, process
environment and process context. We also learn the usage of the fork and clone system calls in
Linux.

38.2 Process
A process is a program in execution. When a program is executed, a process is formed. In
addition to the code that we see as a program, there are many other components for a process.
A process comprises text, data and stack regions. The text corresponds to the instructions.
The data can be initialized or uninitialized global data. The stack is used when there is a
function call. The parameters passed to the function and the local variables of the function are
kept in the stack. When a process runs, a few registers are used. Some of them are the
program counter (PC) and the stack pointer (SP). The program counter stores the address of
the next instruction to be executed for that process. The stack pointer points to the top of the
stack for that process. A number of kernel data structures, such as the process table, file table
and inode table, are also used during the execution of a process. Together, these components
make up a process. Linux uses a process model similar to other versions of UNIX. We first
review the traditional UNIX process model and then learn the aspects of the Linux threading
model.
38.3 Process Management
UNIX process management separates the creation of processes and the running of a new
program into two distinct operations. The creation and running are implemented using two
different system calls. The fork system call creates a new process. When a new process is
created, a new program need not be run. The new process that is created can execute the
same program as the parent. The newly created process continues to execute the next
instruction from the point where the parent was executing at the time of creation of this new
process.
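The behaviour of fork described above can be tried out with a short sketch. It is written in Python only for brevity and assumes a Linux system with Python 3, where os.fork() is a thin wrapper around the fork system call. The function name fork_and_report and the value 7 are made up for this illustration.

```python
import os


def fork_and_report():
    """Fork once and return (child_pid, child_exit_code) as seen by the parent.

    fork_and_report is an illustrative name, not part of any library.
    """
    marker = 7  # set before the fork, so the child starts with its own copy
    pid = os.fork()  # returns 0 in the child and the child's pid in the parent
    if pid == 0:
        # Child: runs the same program, continuing just after the fork.
        os._exit(marker)  # exit at once, reporting the inherited value
    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, code = fork_and_report()
    print(f"parent {os.getpid()} created child {child_pid}; child exit code {code}")
```

No new program is loaded here. The child reports a value that was set before the fork, which shows that it carries on from the parent's state at the time of creation.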
The exec system call is used to run a new program. There are different variants of exec
such as execvp, execve, execlp and execle. Executing a new program does not require a
process to be created just before the program is run. Any process can run a program at any
point of time. To execute a new program, the name of the object file is given as an argument to
the execve call. The binary object file is loaded into the process’s address space, replacing the
program that was running there. The new executable starts executing in the context of the
existing process.
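The two operations are often used together: fork creates the process and exec runs the new program in it. The following is a minimal sketch of that pattern, assuming Linux and Python 3. The helper name run_program and the exit code 127 for a failed exec are choices made for this example, not something the module prescribes.

```python
import os


def run_program(argv):
    """Run the program named by argv[0] in a child process; return its exit code.

    run_program is an illustrative helper, not a library function.
    """
    pid = os.fork()
    if pid == 0:
        try:
            # Replace the child's program image. On success this never returns.
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed, for example when the file is missing.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_program(["sh", "-c", "exit 3"]))
```

The parent keeps running its own program throughout. Only the child's address space is replaced by the new executable.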
Under UNIX, a process encompasses all the information that the operating system must
maintain to track the context of a single execution of a single program. Under Linux, process
properties fall into three groups: the process’s identity, environment, and context. We learn the
various aspects of these three groups in the following sections.
38.4 Process Identity
Process id:
There are different ways in which a process is identified. One of them is the process id.
When a process is created, it is assigned a unique identifier called the process id. This process
id is useful when the process needs to be referred to by a user or by a program. The process id
is used to specify processes to the operating system. When an application makes a system call
to signal, modify, or wait for another process, the process id is given as argument to the system
calls to identify the process.
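As an illustration of the process id being passed to system calls, the sketch below creates a child and then uses its pid both to signal it and to wait for it. It assumes Linux and Python 3. The function name kill_child_by_pid is hypothetical, and SIGKILL is chosen only because it cannot be caught, which keeps the outcome predictable.

```python
import os
import signal
import time


def kill_child_by_pid():
    """Create a child, then signal it and wait for it using its process id."""
    pid = os.fork()
    if pid == 0:
        try:
            time.sleep(60)  # child: idle until the signal arrives
        finally:
            os._exit(0)
    os.kill(pid, signal.SIGKILL)             # the pid says which process to signal
    waited_pid, status = os.waitpid(pid, 0)  # the pid says which process to wait for
    return pid, waited_pid, os.WTERMSIG(status)


if __name__ == "__main__":
    print(kill_child_by_pid())
```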
Credentials:
Each process must have an associated user ID and one or more group IDs that determine
the process’s rights to access system resources and files.
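A process can inspect its own credentials. The short sketch below, which assumes Linux and Python 3, collects them into a dictionary. The function name credentials and the dictionary keys are made up for this example.

```python
import os


def credentials():
    """Return the calling process's user and group IDs as a dictionary."""
    return {
        "uid": os.getuid(),        # real user ID
        "euid": os.geteuid(),      # effective user ID, used for most permission checks
        "gid": os.getgid(),        # real group ID
        "groups": os.getgroups(),  # supplementary group IDs
    }


if __name__ == "__main__":
    print(credentials())
```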
Personality:
Personality is not traditionally found on UNIX systems, but is present in Linux. The
personality sets the process execution domain. Under Linux, each process has an associated
personality identifier that can slightly modify the semantics of certain system calls. That is, a
process with a particular personality can behave in a particular manner for certain system calls.
Namespace:
Each process has a specific view of the file-system, called its namespace. For example,
each process has a specific root directory. That is, the current root of one process can be
different from the current root of another process. Similarly, the current directory of one process
can be different from the current directory of another process. Each process can view a different
set of mounted file systems. Most processes can share a common namespace and can operate
on a shared file-system hierarchy (root directory, set of mounted file systems). When a parent
process creates a child process, the child inherits the namespace from the parent process. But
the child process can change its namespace. Therefore, processes and their children can have
different namespaces.
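The point that the current directory of one process can differ from that of another can be checked with a small sketch. It assumes Linux and Python 3 and covers only the current directory, not the set of mounted file systems. The function name child_cwd_is_private is hypothetical.

```python
import os


def child_cwd_is_private(new_dir="/"):
    """Let a child change its current directory; report what the parent sees.

    Returns (parent_cwd_before, parent_cwd_after, child_changed_ok).
    """
    before = os.getcwd()
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            os.chdir(new_dir)  # affects the child's current directory only
            code = 0 if os.getcwd() == os.path.realpath(new_dir) else 1
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return before, os.getcwd(), os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print(child_cwd_is_private("/"))
```

The child starts in the directory it inherited from the parent, moves elsewhere, and the parent's current directory stays as it was.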
38.5 Process Environment
The process’s environment is inherited from its parent when the process is created. The
process environment is composed of two null-terminated vectors. One is the argument vector
that lists the command-line arguments used to invoke the currently running program. When an
object file is executed, the name of the object file followed by the arguments given to the
executing program is given in the command line. This list of arguments along with the name of
the object file forms one part of the environment of the process.
The second vector is the environment vector. The environment vector is a list of
“NAME=VALUE” pairs that associates named environment variables with arbitrary textual
values. For example, TERM is an environment variable which is used to name the type of
terminal connected to a user’s login session.
Neither the argument vector nor the environment vector is altered when a new process is
created. The created child inherits the environment of the parent. When a new program is
invoked, a new environment can be set up. On calling execle() or execve(), a process can
supply the environment for the new program as an argument to the system call. The kernel
passes these environment variables to the next program, replacing the process’s current
environment. The environment-variable mechanism custom-tailors the operating system on a
per-process basis.
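The two vectors can be seen in action with the sketch below, which passes an argument vector and a replacement environment to execve. It assumes Linux and Python 3 with a shell at /bin/sh. The helper name run_shell_with_env is made up for this illustration.

```python
import os


def run_shell_with_env(script, env):
    """Run `sh -c script` with env as its entire environment; return its stdout.

    run_shell_with_env is an illustrative helper; /bin/sh is assumed to exist.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)             # child's standard output goes to the pipe
            argv = ["sh", "-c", script]      # argument vector for the new program
            os.execve("/bin/sh", argv, env)  # env replaces the inherited environment
        finally:
            os._exit(127)                    # reached only if exec failed
    os.close(write_fd)
    chunks = []
    while True:
        data = os.read(read_fd, 4096)
        if not data:                         # end of file: the child closed its end
            break
        chunks.append(data)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode()


if __name__ == "__main__":
    print(run_shell_with_env('echo "$TERM"', {"TERM": "vt100"}))
```

Variables that exist only in the parent's environment are not visible to the new program, because the supplied environment replaces the inherited one rather than adding to it.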
38.6 Process Context
The state of a running program at any point in time is called the context of a process. This
state of the process keeps on changing and hence, the context of the process also keeps on
changing. The context of the process includes the scheduling context, accounting information, file table,
file-system context, signal-handler table, virtual memory context and so on. We now see what
each of the above refers to.

Scheduling context:
This is the information that the scheduler needs to suspend and restart a process. We
know that when a scheduler chooses a new process to run and a context switch is made, the
context of the old process is saved and the context of the new process is loaded. The
scheduling context includes the saved copies of all of the process’s registers, information about the
scheduling priority, information about any outstanding signals waiting to be delivered to the
process and the kernel stack used by the process while executing kernel code.

Accounting information:
This is the information about the resources currently being consumed by the process and the
total resources consumed by the process in its lifetime so far. Examples include the CPU time
used, the amount of time the process spent in kernel mode and the amount of time it spent in
user mode.
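Some of this accounting information can be read back by the process itself. The sketch below assumes Linux and Python 3 and uses the standard resource module, which wraps getrusage. The function names cpu_times and burn_cpu are hypothetical.

```python
import resource


def cpu_times():
    """Return (user_seconds, kernel_seconds) consumed by this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime, usage.ru_stime  # time in user mode, time in kernel mode


def burn_cpu(n=200_000):
    """Do some arithmetic so that user-mode CPU time is consumed."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print("before:", cpu_times())
    burn_cpu()
    print("after: ", cpu_times())
```

The figures are totals for the lifetime of the process so far, so they can only stay the same or grow between two readings.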

File Table:
The file table is an array of pointers to kernel file structures. Whenever a file is created
or opened, a file descriptor is returned. When making file I/O system calls (like read,
write and so on), processes refer to files using this file descriptor. The kernel uses the file
descriptor as an index into the file table. The file table has entries that point to other data
structures that help in accessing file contents.
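The role of the file descriptor can be illustrated with a short sketch that creates a file and then performs all I/O through the returned descriptor. It assumes Linux and Python 3. The function name write_then_read and the file name demo.txt are made up for this example.

```python
import os
import tempfile


def write_then_read(text):
    """Create a file, write text through its descriptor, and read it back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.txt")
        # Creating or opening a file returns a small integer: the file descriptor.
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            payload = text.encode()
            os.write(fd, payload)             # write through the descriptor
            os.lseek(fd, 0, os.SEEK_SET)      # move back to the start of the file
            data = os.read(fd, len(payload))  # read through the same descriptor
        finally:
            os.close(fd)                      # release the descriptor
    return fd, data.decode()


if __name__ == "__main__":
    print(write_then_read("hello"))
```

After the open call, the process never refers to the file by name again. Every later call names it by descriptor alone.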

File-system context:
The file table lists the existing open files, whereas the file-system context applies to
requests to open new files. The process’s root directory, current working directory and namespace
(default directories to be used for new file searches) are stored in the file-system context.

Signal-handler table:
Signals are used to notify processes of events. Signals may be sent from one process to
another or from the kernel to a process. When a signal is sent to a process, the process can
choose to ignore the signal, invoke a routine in the process’s address space, or let the
default action take place. The default action depends on the signal; for many signals it is to
terminate the process. The signal-handler table defines the action to take in response to a
specific signal.
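Two of these choices, invoking a routine and ignoring the signal, can be tried with the sketch below. It assumes Linux and Python 3, and that the function is called from the main thread, which Python requires when installing handlers. The names count_usr1_deliveries and on_usr1 are hypothetical, and SIGUSR1 is used only because it has no predefined meaning.

```python
import os
import signal
import time


def count_usr1_deliveries():
    """Send SIGUSR1 to this process under two different table entries.

    Returns (handled, ignored): how many times the handler ran while it was
    installed, and how many times it ran while the signal was being ignored.
    """
    received = []

    def on_usr1(signum, frame):
        received.append(signum)  # the routine invoked in our address space

    previous = signal.signal(signal.SIGUSR1, on_usr1)  # entry: call on_usr1
    try:
        os.kill(os.getpid(), signal.SIGUSR1)
        for _ in range(100):  # give the interpreter a moment to run the handler
            if received:
                break
            time.sleep(0.01)
        handled = len(received)

        signal.signal(signal.SIGUSR1, signal.SIG_IGN)  # entry: ignore the signal
        os.kill(os.getpid(), signal.SIGUSR1)
        time.sleep(0.05)
        ignored = len(received) - handled
    finally:
        # Put back whatever entry was there before (SIG_DFL if unknown).
        signal.signal(signal.SIGUSR1,
                      previous if previous is not None else signal.SIG_DFL)
    return handled, ignored


if __name__ == "__main__":
    print(count_usr1_deliveries())
```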

Virtual-memory context:
The virtual-memory context describes the full contents of the process’s private address
space. The address space of a process comprises the text, data and stack regions of the
process.

38.7 Processes and Threads


We now look at the difference between processes and threads as seen by Linux. Linux does
not distinguish between a thread and a process. Linux uses the term task to refer to a flow of
control within a program. The fork call duplicates a process without loading a new executable
image. There is another system call called ‘clone’ which behaves similarly to fork except that it
accepts as arguments a set of flags that dictate what resources are shared between the parent
and the child. The flags that can be given as argument to clone include CLONE_FS,
CLONE_FILES, CLONE_SIGHAND and CLONE_VM. These arguments decide which
resources are shared between the parent and the child.
For example, if CLONE_FS is set, then it means that file-system information (current
working directory) is shared between the parent and the child. If CLONE_VM is set, the same
memory space is shared by the parent and the child. If CLONE_SIGHAND is set, signal
handlers are shared. If CLONE_FILES is set, the set of open files is shared between the parent
and the child. Thus, if no flag is set for the clone system call, then it behaves similarly to fork.
The lack of distinction between a process and a thread is because Linux does not hold a
process’s entire context within the main process data structure. In the case of UNIX, there is a
process data structure that holds all the details related to a process. But in the case of Linux,
the operating system holds the context within independent subcontexts. A process’s file-system
context, file-descriptor table, signal-handler table and virtual-memory context are all held in
separate data structures. There is one process data structure (struct task_struct) that
holds pointers to these structures. There is a reference count associated with each subcontext.
Any number of processes can easily share a subcontext by pointing to the subcontext and
incrementing a reference count. The arguments to the clone() call tell it which subcontexts to
copy and which to share. If the argument flags are set, then the corresponding subcontexts are
shared and if the argument flags are not set, the corresponding subcontexts are copied. Any
new process is always given a new identity and a new scheduling context. According to the
arguments passed, the kernel may share the subcontext data structures or make a copy of the
subcontext data structures. The fork() system call is a special case of clone() that copies
all subcontexts and shares none.
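The difference between sharing and copying the virtual-memory context can be observed without calling clone() directly. The sketch below is an analogy under stated assumptions rather than a use of clone() itself: it assumes Linux and Python 3, where threads are normally created by the C library through clone() with sharing flags such as CLONE_VM, while os.fork() gives the child its own copy. The names bump and compare_sharing are made up for this example.

```python
import os
import threading


def bump(counter):
    """Add one to the counter held in the caller's address space."""
    counter["n"] += 1


def compare_sharing():
    """Return the parent's counter after a thread, then after a forked child."""
    counter = {"n": 0}

    # A thread shares the address space, so its update is visible here.
    worker = threading.Thread(target=bump, args=(counter,))
    worker.start()
    worker.join()
    after_thread = counter["n"]

    # A forked child works on a copy, so its update is not visible here.
    pid = os.fork()
    if pid == 0:
        try:
            bump(counter)  # changes the child's private copy only
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    after_fork = counter["n"]
    return after_thread, after_fork


if __name__ == "__main__":
    print(compare_sharing())
```

The thread's increment shows up in the parent, whereas the forked child's increment does not. This matches the behaviour described for a shared and a copied subcontext respectively.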

38.8 Summary

In this module, we learnt different properties of a process such as the process’s identity,
environment and context. We also learnt the difference between the fork and clone system calls
used in Linux.

References
1. Abraham Silberschatz, Peter B. Galvin, Greg Gagne, “Operating System Concepts”,
Sixth Edition, John Wiley & Sons Inc., 2003.
