
Introduction to Linux 2020-2021

Module 4
Topics to be covered:
 The Structure of Processes: Process States and Transitions
 Layout of system memory
 Context of a process.
 Process Control: Process Creation
 Signals – Process Termination
 Invoking other programs
 PID & PPID – Shell on a Shell.

What is a Process?
 A process is the execution of a program and consists of a sequence of bytes that the
CPU interprets as machine instructions (called "text"), data, and stack.

 Program vs process
o Program: a passive collection of instructions (on disk)
o Process: an instance of a running program (in memory)

 Many processes may execute simultaneously as the kernel schedules them for
execution, and several processes may be instances of one program.

 A process reads and writes its data and stack sections, but it cannot read or write the
data and stack of other processes.

 Processes communicate with other processes and with the environment via system
calls.

 In a UNIX system, a process is created by the fork system call. The process that
invokes the fork system call is called the parent process and the newly created process is
called the child process. A process can have only one parent process, but it can have
many child processes.

 The kernel identifies each process by its process ID (PID). Process 0 is a special
process that is created internally by the kernel when the system boots.

 When a process is created, the kernel allocates it a u area (u stands for "user") in
memory, which contains private data manipulated only by the kernel. The u area
contains additional information that controls the operation of the process.

 The kernel maintains a process table for all processes and makes an entry in the process
table when a process is created.

 During its execution, a process passes through several states. The kernel stores the state of
the process and other information about the process in the process table and the u area.

Jain School of CS and IT



 The process table entry and the u area of a process together form a central part of
the context of the process; the full context also includes the user address space and the
hardware registers of the process.
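The parent/child relationship and the PIDs described above can be observed from user space. The following is a minimal sketch in Python, assuming a Linux or other UNIX-like system where os.fork is available; the function name child_reports_parent_pid is made up for this illustration.

```python
import os

def child_reports_parent_pid():
    """Fork once; the child sends its own PID and its PPID back through a pipe."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: report (PID, PPID) and leave immediately.
        os.close(read_end)
        os.write(write_end, f"{os.getpid()} {os.getppid()}".encode())
        os._exit(0)
    # Parent process: read the report, then reap the child.
    os.close(write_end)
    data = os.read(read_end, 64).decode()
    os.close(read_end)
    os.waitpid(pid, 0)
    child_pid, child_ppid = map(int, data.split())
    return pid, child_pid, child_ppid

if __name__ == "__main__":
    fork_result, child_pid, child_ppid = child_reports_parent_pid()
    print("parent PID:", os.getpid())
    print("child PID:", child_pid, "child PPID:", child_ppid)
```

The value returned by fork in the parent is the PID of the child, and the PPID seen by the child is the PID of the parent.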

Execution Modes

 A process in a UNIX system can execute in two CPU execution modes: user mode and
kernel mode. Execution modes place restrictions on the operations that can be
performed by the running process.

 User mode
o When the CPU is executing a user application, it is in user mode. It is a non-privileged
mode in which the process is forbidden to access those portions of
memory that have been allocated to the kernel or to other programs.

 Kernel mode
o When the CPU is executing kernel code, it is in kernel mode. It is a privileged
mode, in which the CPU can execute any instruction, reference any memory
address (i.e., location in memory), and access any resource.

Process States & Transition Diagram


 The process enters the created state when the parent process executes the fork system
call and eventually moves into a state where it is ready to run (state 3, ready to run in
memory, or state 5, ready to run swapped).
 The scheduler will eventually pick the process, and the process enters the state kernel
running, where it completes its part of the fork system call.
 After the system call completes, the process may move to the state user running. When an
interrupt occurs or the process makes a system call, it again enters the state kernel running.


 After the servicing of the interrupt, the kernel may decide to schedule another process
to execute, so the first process enters the state preempted.
 The state preempted is really the same as the state ready to run in memory, but they are
depicted separately to stress that a process executing in kernel mode can be preempted
only when it is about to return to user mode. Consequently, the kernel could swap a
process from the state preempted if necessary. Eventually, the process will return to the
state user running.
 When the process executes a system call, it leaves the state user running and enters the
state kernel running. If, in kernel mode, the process needs to sleep for some reason (such
as waiting for I/O), it enters the state asleep in memory. When the interrupt handler
awakens the process, it enters the state ready to run in memory.
 Suppose the system is executing many processes that do not fit simultaneously into
main memory. The swapper (process 0) then swaps out a process to make room for
another process that is in the state ready to run swapped. When evicted from main
memory, the process enters the state ready to run swapped. Eventually, the swapper
chooses the process as the most eligible to swap in, and it re-enters the state ready to run
in memory. When it is scheduled, it enters the state kernel running. When a
process completes, it invokes the exit system call, thus entering the state kernel
running and, finally, the zombie state.
 Some state transitions can be controlled by the user, but not all. A user can create a
process, but has no control over when a process moves from asleep in memory to asleep,
swapped, or from ready to run in memory to ready to run swapped. A process can make a
system call to move itself to the state kernel running, but it has no control over when it
will return from kernel mode. Finally, a process can exit whenever it wants, but that is not
the only reason for exit to be called.
 Two kernel data structures describe the state of a process: the process table entry and
the u-area. The process table contains information that must always be accessible to the
kernel, whereas the u-area contains information that needs to be accessible only while the
process is running. The kernel therefore allocates space for the u-area only when creating a
process.
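The state transitions described above can be written down as a small table. The sketch below is only an illustration in Python, not kernel code. The text gives the numbers 3 and 5 for the two ready-to-run states; the remaining numbers follow the conventional numbering of this state diagram and are an assumption here, as is the transition from asleep, swapped to ready to run swapped, which the text does not spell out.

```python
from enum import Enum

class State(Enum):
    USER_RUNNING = 1
    KERNEL_RUNNING = 2
    READY_IN_MEMORY = 3
    ASLEEP_IN_MEMORY = 4
    READY_SWAPPED = 5
    ASLEEP_SWAPPED = 6
    PREEMPTED = 7
    CREATED = 8
    ZOMBIE = 9

# Allowed transitions, following the description in the notes.
TRANSITIONS = {
    State.CREATED: {State.READY_IN_MEMORY, State.READY_SWAPPED},
    State.READY_IN_MEMORY: {State.KERNEL_RUNNING, State.READY_SWAPPED},
    State.KERNEL_RUNNING: {State.USER_RUNNING, State.PREEMPTED,
                           State.ASLEEP_IN_MEMORY, State.ZOMBIE},
    State.USER_RUNNING: {State.KERNEL_RUNNING},
    State.PREEMPTED: {State.USER_RUNNING},
    State.ASLEEP_IN_MEMORY: {State.READY_IN_MEMORY, State.ASLEEP_SWAPPED},
    State.ASLEEP_SWAPPED: {State.READY_SWAPPED},  # assumption: woken while swapped out
    State.READY_SWAPPED: {State.READY_IN_MEMORY},
    State.ZOMBIE: set(),                          # final state, no way out
}

def is_valid_path(path):
    """Return True if every consecutive pair of states is an allowed transition."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))

if __name__ == "__main__":
    lifetime = [State.CREATED, State.READY_IN_MEMORY, State.KERNEL_RUNNING,
                State.USER_RUNNING, State.KERNEL_RUNNING, State.ZOMBIE]
    print(is_valid_path(lifetime))
```

Note that the zombie state can be reached only from kernel running, which matches the remark that a process exits through the exit system call.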

Fields in the Process Table


 State of the process
 Pointers that allow the kernel to locate the process and its u-area in memory.
o This information is used to do a context switch to the process when the process
moves from state ready to run in memory to the state kernel running or from the
state preempted to the state user running or when swapping the process.
 Several user identifiers (user IDs) determine various process privileges and are used
when processes send signals to each other.
 Process identifiers (process IDs or PIDs) specify the relationship of processes to each
other.
o These ID fields are set up when the process enters the state created in
the fork system call.
 An event descriptor identifies the event the process is waiting for when it is in the asleep state.
 Scheduling parameters allow the kernel to determine the order in which processes move
to the states kernel running and user running.
 A signal field enumerates the signals sent to a process but not yet handled.
 Various timers give process execution time and kernel resource utilization.
o These are used for calculation of process scheduling priority.

Fields in the u-area


 A pointer to the process table identifies the entry that corresponds to the u-area.
 The real and effective user IDs determine various privileges allowed to the process, such
as file access rights.
 Timer fields record the time the process spent executing in user mode and in kernel
mode.
 An array indicates how the process wishes to react to signals.
 The control terminal field identifies the "login terminal" associated with the process, if
one exists.
 An error field records errors encountered during a system call.
 A return value field contains the result of system calls.
 I/O parameters describe the amount of data to transfer, the address of the source (or
target) data array in user space, file offsets for I/O, and so on.
 The current directory and current root describe the file system environment of the
process.
 The user file descriptor table records the files the process has open.

 Limit fields restrict the size of a process and the size of a file it can write.
 A permission modes field masks the mode settings on files the process creates.
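A process cannot read its u-area directly, but several of the fields listed above have counterparts that a process can query through system calls. The sketch below, assuming a Linux or other UNIX-like system, collects a few of them with the Python os and resource modules; the function name and the dictionary keys are invented for this illustration.

```python
import os
import resource

def u_area_like_snapshot():
    """Collect per-process values that correspond to some u-area fields."""
    times = os.times()
    # The file creation mask can only be read by setting it, so restore it at once.
    mask = os.umask(0)
    os.umask(mask)
    return {
        "real_uid": os.getuid(),            # real user ID
        "effective_uid": os.geteuid(),      # effective user ID
        "user_time": times.user,            # seconds spent in user mode
        "system_time": times.system,        # seconds spent in kernel mode
        "current_directory": os.getcwd(),   # current directory
        "file_size_limit": resource.getrlimit(resource.RLIMIT_FSIZE),
        "permission_mask": mask,            # mask applied to modes of new files
    }

if __name__ == "__main__":
    for name, value in u_area_like_snapshot().items():
        print(name, "=", value)
```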

Layout of System Memory: Physical and Virtual Address Space
 The physical memory of a computer is addressable, starting at byte offset 0 and going
up to a byte offset equal to the amount of memory on the machine.

 A process on the UNIX system consists of three logical sections: text, data, and stack.

 During compilation of a program, the compiler generates memory addresses for these
sections that the process occupies during its execution: text addresses (for access to
machine instructions including subroutine calls); data addresses (for access to data
variables), and stack addresses (for access to data structures local to a subroutine).

 If the computer were to treat the generated addresses as address locations in physical
memory, it would be impossible for two processes to execute concurrently when their
set of generated addresses overlap.

 It is impractical for the compiler to generate addresses that do not overlap between
programs, because the amount of memory on a machine is finite and the set of all
programs that could be compiled is infinite.

 Therefore, the compiler generates addresses for a virtual address space with a given
address range, and the machine's memory management unit (MMU), using mappings set up by
the kernel, translates the virtual addresses generated by the compiler into address locations
in physical memory. The compiler does not have to know where in memory the kernel will
later load the program for execution.

 In fact, several copies of a program can coexist in memory: All execute using the same
virtual addresses but reference different physical addresses.


Regions

 The UNIX system divides the virtual address space of a process into logical regions. A
region is a contiguous area of the virtual address space, treated as a distinct object that
can be shared. The text, data, and stack usually form separate regions.
 The kernel maintains a region table and allocates an entry from the table for each active
region in the system. A region table entry contains the information needed to find where
the contents of the region are located in physical memory.
 Each process has a private per process region table, whose entries are called pregions. A
pregion entry contains a pointer to an entry in the region table and the starting virtual
address of the region in the process.
 Pregion entries are specific to a process: the pregion table is private to that process.

 Regions can be shared among processes. It is common to share the text region among
processes that are executing the same program.


 The figure depicts two processes, A and B, showing their regions, pregions, and the virtual
addresses where the regions are connected.
 The processes share text region 'a' at virtual addresses 8K and 4K, respectively.
 If process A reads memory location 8K and process B reads memory location 4K, they
read the identical memory location in region 'a'.
 The data regions and stack regions of the two processes are private ('b', 'c', 'e', 'd').
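The sharing of region 'a' between processes A and B can be modelled with a few lines of Python. This is only a sketch: the classes Region and Process, the 4K region sizes, and the virtual addresses of the private data and stack regions are assumptions made for the illustration. Only the text addresses 8K and 4K and the region names come from the description above.

```python
K = 1024

class Region:
    """A contiguous area of memory that several processes may attach to."""
    def __init__(self, name, size):
        self.name = name
        self.memory = bytearray(size)
        self.ref_count = 0              # number of pregions pointing here

class Process:
    def __init__(self, name):
        self.name = name
        self.pregions = []              # private list of (start address, region)

    def attach(self, region, start):
        region.ref_count += 1
        self.pregions.append((start, region))

    def locate(self, vaddr):
        """Map a virtual address to (region, offset inside the region)."""
        for start, region in self.pregions:
            if start <= vaddr < start + len(region.memory):
                return region, vaddr - start
        raise ValueError(f"address {vaddr} is outside the address space")

    def read(self, vaddr):
        region, offset = self.locate(vaddr)
        return region.memory[offset]

    def write(self, vaddr, value):
        region, offset = self.locate(vaddr)
        region.memory[offset] = value

text_a = Region("a", 4 * K)             # shared text region
process_a, process_b = Process("A"), Process("B")
process_a.attach(text_a, 8 * K)         # A sees region 'a' at 8K
process_b.attach(text_a, 4 * K)         # B sees the same region at 4K
process_a.attach(Region("b", 4 * K), 16 * K)   # private data (address assumed)
process_a.attach(Region("c", 4 * K), 32 * K)   # private stack (address assumed)
process_b.attach(Region("e", 4 * K), 8 * K)    # private data (address assumed)
process_b.attach(Region("d", 4 * K), 32 * K)   # private stack (address assumed)
```

Reading address 8K in process A and address 4K in process B reaches the same byte of region 'a', while a write to the data region of A is not visible to B.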

Pages and Page Tables


 The physical memory is divided into equal sized blocks called pages. Typical page sizes
range from 512 bytes to 4K bytes and are defined by the memory management
hardware.
 Every memory location is contained in a page and can be addressed by a "page number"
and "byte offset" in the page.
 When the kernel assigns physical pages of memory to a region, it need not assign the pages
contiguously or in any particular order, just as disk blocks need not be assigned
contiguously to a file, which reduces the space wasted through fragmentation.
 The kernel correlates the virtual addresses of a region to their physical machine
addresses by mapping the logical page numbers in the region to physical page numbers
on the machine.
 Since a region is a contiguous range of virtual addresses in a program, the logical page
number is the index into an array of physical page numbers. The region table entry
contains a pointer to a table of physical page numbers called a page table.
 The kernel maintains the mapping of logical to physical page numbers in page tables.
 For example, a 4K-byte range of virtual addresses with 1K-byte pages spans 4 logical pages,
each of which may be mapped to a different 1K-byte physical page.

 Page table entries may also contain machine-dependent information such as permission
bits to allow reading or writing of the page.
 The kernel stores page tables in memory and accesses them like all other kernel data
structures.
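The translation from a virtual address to a physical address through a page table can be sketched as follows. The page size of 1K matches the example above; the physical page numbers in the table are hypothetical, and the virtual address is taken relative to the start of the region.

```python
PAGE_SIZE = 1024                      # 1K-byte pages, as in the example above

# Index = logical page number, value = physical page number (hypothetical values).
page_table = [177, 54, 209, 17]

def translate(vaddr, table=page_table, page_size=PAGE_SIZE):
    """Translate a region-relative virtual address into a physical address."""
    logical_page, offset = divmod(vaddr, page_size)
    if not 0 <= logical_page < len(table):
        raise IndexError(f"address {vaddr} is outside the region")
    return table[logical_page] * page_size + offset

if __name__ == "__main__":
    for vaddr in (0, 1030, 4095):
        print(vaddr, "->", translate(vaddr))
```

Address 1030 lies in logical page 1 at byte offset 6, so with the table above it maps to byte 6 of physical page 54.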


Context of a Process
 Each time a process is removed from the CPU, sufficient information on its current
execution state must be stored such that when it is again scheduled to run on the CPU,
it can resume its execution from where it stopped. This execution state data is known
as its context.

 When executing a process, the system is said to be executing in the context of the
process. When the kernel decides that it should execute another process, it does a
context switch, so that the system executes in the context of the other process.

 The context of a process consists of:


o Contents of its (user) address space, called the user-level context
o Contents of hardware registers, called the register context
o Kernel data structures that relate to the process, called the system-level context


 User level context consists of the process text, data, stack, and shared memory that is
in the virtual address space of the process. The part which resides on swap space is also
part of the user level context.

 The register context consists of the following components:


o Program counter specifies the next instruction to be executed. It is an address in
kernel or in user address space.
o The processor status register (PS) specifies the hardware status relating to the process.
Its subfields indicate, for arithmetic operations, whether the last instruction overflowed
or produced a zero, positive, or negative result. It also specifies the current processor
execution level and the current and most recent modes of execution (such as kernel or user).
o The stack pointer points to the current address of the next entry in the kernel or
user stack.
o The general purpose registers contain data generated by the process during its
execution.

 The system level context has a "static part" and a "dynamic part". A process has one
static part throughout its lifetime. But it can have a variable number of dynamic parts.

 The static part consists of the following components:


o The process table entry
o The u-area
o Pregion entries, region tables and page tables.

 The dynamic part consists of the following components:


o The kernel stack, which holds the stack frames of kernel procedures while the process
executes in kernel mode. Each process needs its own kernel stack because every process
might be in a different state depending on the system calls it executes. The
kernel stack is empty when the process executes in user mode.
o A set of system-level context layers, each containing the information necessary to recover
the previous layer.
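The dynamic part of the system-level context behaves like a last-in, first-out stack: a layer is pushed when the process enters the kernel (system call or interrupt) and popped on return. The following Python sketch models only that push and pop behaviour. The class ProcessContext, the register names, and the handler addresses are all hypothetical; a real kernel saves far more state.

```python
class ProcessContext:
    """Toy model of the register context and the stack of saved context layers."""

    def __init__(self, registers):
        self.registers = dict(registers)   # current register context
        self.layers = []                   # saved layers, most recent last

    def enter_kernel(self, handler_pc):
        """Push the current registers as a new layer, then run kernel code."""
        self.layers.append(dict(self.registers))
        self.registers["pc"] = handler_pc
        self.registers["mode"] = "kernel"

    def return_from_kernel(self):
        """Pop the most recent layer and restore the registers saved in it."""
        self.registers = self.layers.pop()

if __name__ == "__main__":
    ctx = ProcessContext({"pc": 100, "sp": 9000, "mode": "user"})
    ctx.enter_kernel(5000)       # system call
    ctx.enter_kernel(6000)       # interrupt while handling the system call
    ctx.return_from_kernel()     # back in the system call
    ctx.return_from_kernel()     # back in user mode
    print(ctx.registers)
```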

Process Creation
 The only way to create a new process in UNIX is to use the fork system call. The
process which calls fork is called the parent process and the newly created process is
called the child process.
Syntax: pid = fork()


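 On return from the fork system call, the two processes (parent and child) have
identical copies of their user-level context except for the return value pid: in the parent,
pid is the process ID of the child; in the child, pid is 0. The child can obtain the PID of
its parent (its PPID) with the getppid system call.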

 The kernel does the following sequence of operations for fork:


o It creates a new entry in the process table.
o It assigns a unique ID to the newly created process.
o It makes a logical copy of the regions of the parent process. If a region can be
shared, only its reference count is incremented instead of making a physical
copy of the region.
o It increments file and inode table reference counts for the files associated with
the process.
o It returns the ID number of the child to the parent process, and 0 to the child process.
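The effect of fork on the user-level context can be seen with a short experiment. The sketch below uses Python's os.fork, which wraps the fork system call, and assumes a UNIX-like system; the function name fork_demo and the variable counter are chosen only for this illustration.

```python
import os

def fork_demo():
    """Show that the child works on its own copy of the parent's data."""
    counter = 10
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. Changing counter affects only the child's copy.
        os.close(read_end)
        counter += 1
        os.write(write_end, str(counter).encode())
        os._exit(0)
    # Parent: fork returned the PID of the child.
    os.close(write_end)
    child_counter = int(os.read(read_end, 32).decode())
    os.close(read_end)
    os.waitpid(pid, 0)            # reap the child
    return pid, counter, child_counter

if __name__ == "__main__":
    pid, parent_counter, child_counter = fork_demo()
    print("child PID:", pid)
    print("counter in parent:", parent_counter)
    print("counter in child:", child_counter)
```

The parent still sees the original value of counter after the child has changed its own copy, because the data region is logically copied rather than shared.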

Signals
 Signals inform processes of the occurrence of events. Processes may send each
other signals with the kill system call, or the kernel may send signals internally. There
are 19 signals in the System V (Release 2) UNIX system that can be classified as
follows:
o Signals having to do with the termination of a process, sent when a
process exits.
o Signals having to do with process induced exceptions such as when a process
accesses an address outside its virtual address space, when it attempts to write
memory that is read-only (such as program text), or when it executes a
privileged instruction or for various hardware errors.
o Signals having to do with the unrecoverable conditions during a system call,
such as running out of system resources.
o Signals caused by an unexpected error condition during a system call, such as
making a non-existent system call, or using an illegal "reference" value for
the lseek system call.
o Signals originating from a process in user mode, such as when a process wishes
to receive an alarm signal after a period of time, or when processes send
arbitrary signals to each other.
o Signals related to terminal interaction such as when a terminal hangs up, or
when a user presses the "break" or "delete" keys on a terminal keyboard.
o Signals for tracing execution of a process.
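A process can choose how it reacts to many signals by installing a handler. The following sketch, assuming a UNIX-like system and code running in the main thread, installs a handler for SIGUSR1 with Python's signal module and then sends that signal to itself. The names on_usr1 and send_self_signal are made up for this example.

```python
import os
import signal
import time

received = []

def on_usr1(signum, frame):
    """Handler: just record which signal arrived."""
    received.append(signum)

def send_self_signal(timeout=1.0):
    received.clear()
    previous = signal.signal(signal.SIGUSR1, on_usr1)   # install the handler
    try:
        os.kill(os.getpid(), signal.SIGUSR1)            # signal this process
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)                            # wait for the handler to run
    finally:
        signal.signal(signal.SIGUSR1, previous)         # restore the old reaction
    return list(received)

if __name__ == "__main__":
    print(send_self_signal())
```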


Sending Signals from Processes


Processes use the kill system call to send signals.
syntax: kill (pid, signum)
where pid identifies the set of processes to receive the signal, and signum is the signal
number being sent.
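As a hedged illustration of the kill system call, the sketch below forks a child that idles, sends it SIGTERM through Python's os.kill, and inspects how the child ended. It assumes a UNIX-like system; the function name terminate_child is hypothetical.

```python
import os
import signal
import time

def terminate_child():
    """Fork an idle child, send it SIGTERM, and report how it terminated."""
    pid = os.fork()
    if pid == 0:
        # Child: keep the default reaction to SIGTERM and wait for a signal.
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        while True:
            time.sleep(1)
    os.kill(pid, signal.SIGTERM)          # kill(pid, signum)
    _, status = os.waitpid(pid, 0)        # collect the termination status
    return os.WIFSIGNALED(status), os.WTERMSIG(status)

if __name__ == "__main__":
    signaled, signum = terminate_child()
    print("terminated by signal:", signaled, "signal number:", signum)
```

Despite its name, kill only sends a signal; whether the receiver terminates depends on how it reacts to that signal.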

Process Termination
 Processes on a UNIX system terminate by executing the exit system call. When a
process exits, it enters the zombie state, relinquishes all of its resources, and dismantles
its context except for its process table entry.
syntax: exit (status)
where the value of status is returned to the parent process for its examination.
 Processes may call exit explicitly or implicitly at the end of a program.
 When a child process exits, a death of child signal is sent to the parent process. If the
parent is sleeping in the wait system call, it wakes up and returns to the running state.
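The way the exit status travels from the child to the parent can be sketched as follows, assuming a UNIX-like system. The function name run_child_and_collect is invented for the example. Between the child's exit and the parent's call to waitpid, the child remains in the zombie state.

```python
import os

def run_child_and_collect(status):
    """Fork a child that exits with the given status; return what the parent sees."""
    pid = os.fork()
    if pid == 0:
        os._exit(status)                   # child: exit(status)
    reaped_pid, raw = os.waitpid(pid, 0)   # parent: wait for the child
    return reaped_pid == pid, os.WIFEXITED(raw), os.WEXITSTATUS(raw)

if __name__ == "__main__":
    print(run_child_and_collect(3))
```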

Invoking Other Programs


 The exec system call invokes another program, overlaying the address space
of a process with the contents of an executable file.

syntax: execve(filename, argv, envp)

where filename is the name of the executable file being invoked, argv is a pointer to an
array of parameters to the executable program, and envp is a pointer to the environment
of the executed program.
 There are several library functions that call exec, like execl, execv, execle, and so on.
All call execve eventually.
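A minimal sketch of execve in use, assuming a UNIX-like system: the child process replaces itself with a new Python interpreter that runs a short snippet, and the parent collects the exit status. The helper name exec_python_snippet is hypothetical, and the Python interpreter is used as the target program only because its path is known through sys.executable.

```python
import os
import sys

def exec_python_snippet(code):
    """Fork, then overlay the child with a fresh interpreter running `code`."""
    pid = os.fork()
    if pid == 0:
        try:
            # execve(filename, argv, envp): does not return on success.
            os.execve(sys.executable, [sys.executable, "-c", code], dict(os.environ))
        finally:
            os._exit(127)                  # reached only if execve failed
    _, raw = os.waitpid(pid, 0)
    return os.WEXITSTATUS(raw)

if __name__ == "__main__":
    print(exec_python_snippet("import sys; sys.exit(7)"))
```

Nothing after a successful execve call runs in the child, because the calling program's text, data, and stack have been replaced.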

The Shell
 The shell is a very complex program. The shell reads a command line from its standard
input and interprets it. The shell keeps looping and reading the commands.

 The standard input and standard output file for the login shell are usually the terminal
on which the user logged in.

 The simplest command lines contain a program name and some parameters, for
example: ls -l


 The shell forks and creates a child process, which invokes the program that the user
specified on the command line and begins executing.

 The parent process (i.e. the shell) waits until the child process exits and then loops back
to read the next command.
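The main loop of a shell, as described above, can be sketched in a few lines of Python on a UNIX-like system. This is a toy illustration under simplifying assumptions (no pipes, redirection, or built-in commands other than exit); the names run_command and shell_loop are made up for it.

```python
import os
import shlex
import sys

def run_command(line):
    """Fork a child, exec the named program in it, and wait for it to exit."""
    argv = shlex.split(line)
    if not argv:
        return 0                            # empty command line
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)        # search PATH and overlay the child
        except OSError as err:
            os.write(2, f"{argv[0]}: {err}\n".encode())
        finally:
            os._exit(127)                   # reached only if exec failed
    _, raw = os.waitpid(pid, 0)             # the shell waits for the child
    if os.WIFEXITED(raw):
        return os.WEXITSTATUS(raw)
    return 128 + os.WTERMSIG(raw)           # child was terminated by a signal

def shell_loop(stream=sys.stdin):
    """Read command lines until end of input or the word 'exit'."""
    for line in stream:
        if line.strip() == "exit":
            break
        run_command(line)

if __name__ == "__main__":
    shell_loop()
```

Typing a line such as ls -l into this loop makes the child run ls while the parent waits, and then the loop reads the next line.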
