Operating Systems Project: Device Drivers: Jordi Garcia and Yolanda Becerra


Operating Systems Project: Device Drivers

Jordi Garcia and Yolanda Becerra¹

Department of Computer Architecture


Universitat Politècnica de Catalunya

September 2012

1. Introduction

The main aim of this project is to study the internal workings of an operating system in
depth. You will learn how to modify basic data structures of an OS and improve its
functionality.

In this second project, a generic Linux distribution (specifically, with kernel version 2.6) will be
used, and several kernel modules that add new functionality will be implemented.

When you start your PC in the laboratory, you must boot Ubuntu and the image
labeled “proso” as usual.

In [1] you can find all the documentation about Linux Kernel Modules (LKM). Modules
allow parts of the kernel to be added or modified dynamically while Linux is running,
without having to recompile or relink it as you had to do in the previous project.
You will thus learn another way of modifying system code.

Obviously, some functions have restricted access and some functions cannot be
inserted into the kernel in this way. The most common changes made this way are to
device drivers. However, making them requires being a privileged user, as not
everybody is allowed to make changes to a system. The printer driver is a typical
example. Imagine a laptop that is used at home and at work: it will have several
printer drivers installed, yet people rarely have the same printer at home as at work.
Therefore, even though the drivers are always installed, the physical devices (the
printers) will not always be available.

¹ This document was drawn up with the support of professors from previous courses: Julita
Corbalán, Juan José Costa, Marisa Gil, Jordi Guitart, Amador Millan, Gemma Reig, Silvia
Llorente, Pablo Chacín and Rubén González.

Specifically, in this project you will add a monitoring mechanism for some Linux system
calls. This monitoring will be added dynamically using a module, without the need to
recompile the Linux kernel. Once the calls are monitored, a new device will be added to
allow users to access the statistics they wish to consult. It will therefore be necessary to
create a driver for this device, which will be implemented as another module, again to
avoid recompiling the kernel.

Below is a summary of the essential concepts and of the basic code for creating
modules, devices and drivers.

2. Previous concepts

2.1. Linux Kernel Modules (LKM)


LKM is a Linux mechanism for dynamically adding a set of routines and data structures
to a system. Each module is an object file that can be dynamically loaded (inserted)
into the running system using the insmod program and unloaded (removed) using the
rmmod program.

2.1.1 Module definition


In general, a module only needs to define an initialization function and an exit
function, as can be seen in Figure 1, which shows the functions Mymodule_init and
Mymodule_exit.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

MODULE_LICENSE("GPL");

/*
 * Initialize the module.
 */
static int __init Mymodule_init(void)
{
    /* Initialization code */
    printk(KERN_DEBUG "Mymodule successfully loaded\n");
    /* Return 0 if everything is OK and < 0 in case of error */
    return 0;
}

/*
 * Unload the module.
 */
static void __exit Mymodule_exit(void)
{
    /* Finalization code */
}

module_init(Mymodule_init);
module_exit(Mymodule_exit);

Figure 1. Basic code of a module (mymodule.c)

The optional keywords __init and __exit tell the kernel that these functions are only
used while the module is being initialized or removed, respectively.
The routines registered with the macros module_init and module_exit are executed
automatically when the module is loaded and unloaded, respectively. These macros
are mandatory.

2.1.2 Defining module parameters at load time


Linux version 2.4 and later versions allow programmers to define parameters whose
values are set at load time. The interface is quite easy:
• module_param: defines the parameter, its type and the access rights of the
corresponding sysfs² file that is created for the module so that users can access the
parameter (in our case 0, which means that no file is created).
• MODULE_PARM_DESC: adds a short description to the parameter (which can be
consulted later with the modinfo command).
• MODULE_AUTHOR: includes the author's name in the module.
• MODULE_DESCRIPTION: includes a description of the module.
• MODULE_LICENSE: states the module's license (GPL, BSD, etc.).

Below is a small example. An integer parameter (the PID) whose value can be set at
load time is defined in the module's source code:
#include <linux/moduleparam.h>
...
int pid = 1;
module_param (pid, int, 0);
MODULE_PARM_DESC (pid, "Process ID to monitor (default 1)");
...
MODULE_AUTHOR("Joe Bloggs <joe.bloggs@somewhere>");
MODULE_LICENSE ("GPL");
MODULE_DESCRIPTION("ProSO driver");
...

2.1.3 Compiling a module


To compile the code in Figure 1 (saved in a file named mymodule.c), create a Makefile
like the one below (each command line under a target must begin with a tab character):

obj-m += mymodule.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Running make produces an ELF object file named mymodule.ko (ko = kernel object).

² sysfs is a file system, generally mounted at /sys, that the kernel uses to export information
about devices, modules, etc. You can find further information in Chapter 2 of Linux Device
Drivers, listed in the bibliography.

2.1.4 Utilities that Linux has for managing modules


Linux has the insmod, lsmod and rmmod commands, which install, list and remove
modules from the system, respectively. Additionally, modprobe loads a module by name
together with the modules it depends on (see Section 2.1.7), and modinfo shows
information about a module.
To load a module, use the following command:
#insmod mymodule.ko

A value for a defined parameter can also be given at load time:


#insmod mymodule.ko pid=1

Also try executing the command:


#modinfo mymodule.ko

2.1.5 What does insmod do internally?


The program loads the given file into the address space of the operating system and
resolves the file's remaining undefined symbols against the system symbol table.
It also makes it possible to change the values of some integer or string variables, so
that the module (and the driver it contains) can be configured at load time.

2.1.6 When can a module be unloaded?

A module can only be unloaded when no one is using it. To know whether or not it is
in use, the kernel maintains a reference counter that must be kept up to date. For
instance, all the functions in a module that can be accessed from other modules must
increment this counter when they are called and decrement it when they return. To
maintain this counter, the programmer can use the following macros:

• try_module_get(THIS_MODULE): increments the counter
• module_put(THIS_MODULE): decrements the counter

These counters can be checked in the file /proc/modules. If the counter is not 0, the
module cannot be unloaded. It is therefore important to keep the number of gets and
puts balanced.
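The get/put rule above can be illustrated with a small user-space simulation. This is a hypothetical model, not kernel code: struct fake_module, fake_get, fake_put and fake_try_unload are invented names that only mimic the behavior of try_module_get, module_put and the check made before unloading. Like try_module_get, fake_get can fail once removal has started.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a module's reference counter. */
struct fake_module {
    int refcnt;      /* number of active users */
    bool unloading;  /* set once removal has started */
};

/* Mimics try_module_get(): fails if the module is going away. */
static bool fake_get(struct fake_module *m)
{
    if (m->unloading)
        return false;
    m->refcnt++;
    return true;
}

/* Mimics module_put(): each successful get needs exactly one put. */
static void fake_put(struct fake_module *m)
{
    if (m->refcnt > 0)
        m->refcnt--;
}

/* Mimics the check made before unloading: only allowed at zero. */
static bool fake_try_unload(struct fake_module *m)
{
    if (m->refcnt != 0)
        return false;
    m->unloading = true;
    return true;
}
```

A single missing fake_put leaves the counter above zero, and the module can never be unloaded; this is exactly the inconsistency the text warns about.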

2.1.7 Expressing dependencies between modules


In some cases, a module needs the functionality of another module, so the first module
cannot be installed until the second one has been installed. Linux allows these
dependencies to be expressed in the file /lib/modules/<kernel version>/modules.dep
(for instance, /lib/modules/2.6.27-proso/modules.dep). For example, if module
moduleA requires module moduleB, this can be expressed as:

/absolute_path/moduleA.ko: /absolute_path/moduleB.ko
/absolute_path/moduleB.ko:

Note that each path must be the absolute path to the module's object file.
The modprobe command then makes loading the modules easier. When the following
command is executed:

#modprobe moduleA

the modules are loaded in the proper order (moduleB first, then moduleA).

2.2. Devices
A device is a real or virtual peripheral that users can use to perform input/output
operations or to interact with the OS kernel.

2.2.1. What is a device driver?


It is the set of routines and variables that implements the operations of a device (open,
release, read, write, etc.), as shown in Figure 2.

Figure 2. Data structures for device management in Linux


Usually, the routines that control a device need to execute instructions (in/out) or
access addresses that are not allowed for an ordinary user. To be able to use these
instructions and/or addresses, the code must be executed in system mode and,
therefore, the driver is included in the OS code.

2.2.2. How to install a device driver in the system
There are two possible mechanisms:
• Statically, by recompiling the whole system, including the new driver routines.
• Dynamically, by using system calls or software that make it possible to
dynamically include object files in the kernel of the OS (for example, a
module). You can see how a module is compiled and installed in Sections 2.1.3
and 2.1.4.

2.2.3. Defining device operations


To define the driver, only the group of valid operations for the device has to be
defined. The possible operations to be defined are found in the header file
<linux/fs.h>, in the file_operations structure. Its format is:

struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*aio_read) (struct kiocb *, char __user *, size_t, loff_t);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*aio_write) (struct kiocb *, const char __user *, size_t,
                          loff_t);
    int (*readdir) (struct file *, void *, filldir_t);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    int (*ioctl) (struct inode *, struct file *, unsigned int,
                  unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, struct dentry *, int datasync);
    int (*aio_fsync) (struct kiocb *, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
    ssize_t (*readv) (struct file *, const struct iovec *, unsigned long,
                      loff_t *);
    ssize_t (*writev) (struct file *, const struct iovec *, unsigned long,
                       loff_t *);
    ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t,
                         void __user *);
    ssize_t (*sendpage) (struct file *, struct page *, int, size_t,
                         loff_t *, int);
    unsigned long (*get_unmapped_area) (struct file *, unsigned long,
                                        unsigned long, unsigned long,
                                        unsigned long);
    ...
};

You will use the open, release (which corresponds to close), read and ioctl
operations. The open function makes the device available to the program and the
release function ends access to it. Both return 0 if everything is correct and a negative
value in case of error. The arguments of the read function are the following: the user's
buffer where the characters read are stored, the number of characters to read (size),
and an in/out offset parameter that holds the current position of the read/write
pointer before the read and must be updated with the position after it. The call returns
the number of bytes read, 0 if it has reached the end of the file, or a negative value if an
error has occurred. The ioctl function returns 0 if everything is correct or a negative
value if an error has occurred.
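The read contract just described (copy at most size bytes, advance the offset, return 0 at end of file and a negative code on error) can be sketched as a user-space model. This is a hypothetical helper, model_read, not the project's driver: memcpy stands in for copy_to_user and long long stands in for loff_t. In a real driver, a non-zero result from copy_to_user (the number of bytes that could not be copied) is usually turned into -EFAULT.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical model of a driver's read operation: serves the bytes of
 * `data` starting at *offset. memcpy stands in for copy_to_user() and
 * long long stands in for loff_t. */
static ssize_t model_read(const char *data, size_t data_len,
                          char *buf, size_t size, long long *offset)
{
    size_t remaining;

    if (*offset < 0)
        return -EINVAL;                 /* negative value: error */
    if ((unsigned long long)*offset >= data_len)
        return 0;                       /* 0: end of file */

    remaining = data_len - (size_t)*offset;
    if (size > remaining)
        size = remaining;               /* never read past the end */

    memcpy(buf, data + *offset, size);  /* kernel: copy_to_user() */
    *offset += (long long)size;         /* position after the read */
    return (ssize_t)size;               /* number of bytes read */
}
```

Successive calls with the same offset variable walk through the data and finally return 0, which is what lets a user program read the device with a simple loop until end of file.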
The first field in the file_operations structure, named owner, is used when the
driver is installed as a module. When this field is set to the macro THIS_MODULE, the
kernel automatically maintains the reference counter of the driver's module (see
Section 2.1.6), so the programmer does not need to call try_module_get and
module_put explicitly³.
There are also definitions for the structures struct inode and struct file in
this header file.
To make the code easier to read, only the required fields of this structure need to be
initialized, each one tagged with its name, as shown below:
struct file_operations mymod_fops = {
    owner: THIS_MODULE,
    read: mymod_read,
    ioctl: mymod_ioctl,
    open: mymod_open,
    release: mymod_release,
};

Note that this syntax is not standard C, but an extension of the GNU compiler. If this
compiler is not available, our old friend NULL will have to be used in the fields that
are not to be initialized.
Finally, note that only the operations that the driver actually provides need to be
defined.
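Standard C (since C99) also has designated initializers, written .read = mymod_read instead of read: mymod_read; GCC accepts them, and every member that is not named is initialized to zero, which means NULL for function pointers. The sketch below uses a hypothetical, cut-down struct demo_fops instead of the real struct file_operations (which is only available inside the kernel) to show that the operations left out really end up NULL.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct file_operations. */
struct demo_fops {
    int  (*open)(void);
    long (*read)(char *buf, unsigned long size);
    long (*write)(const char *buf, unsigned long size);
    int  (*release)(void);
};

static int demo_open(void) { return 0; }

static long demo_read(char *buf, unsigned long size)
{
    (void)buf;
    return (long)size;      /* pretend `size` bytes were read */
}

static int demo_release(void) { return 0; }

/* C99 designated initializers: order-independent, and every member
 * not named here (write) is initialized to NULL. */
static const struct demo_fops demo = {
    .open    = demo_open,
    .read    = demo_read,
    .release = demo_release,
};
```

In the driver itself the same style would read .owner = THIS_MODULE, .read = mymod_read, and so on.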

2.2.4. Device identification


How does the system know that a specific system call references a specific driver?
By default, it does not know.
When the driver is installed, an identifier must be explicitly specified. This identifier is
unique and it is formed by two integers: the major and the minor (traditionally the
major was used to identify the device type and the minor to identify the subtype).
This operation is called “registering the driver” and is performed in two steps: first, the
identifier is reserved, and then the operations are associated with this identifier.
To generate the identifier of a driver, use the MKDEV macro, which, given a major and
a minor, returns the corresponding identifier as a value of type dev_t.
dev_t MKDEV(unsigned int major, unsigned int minor);

The type dev_t is defined in the header file <linux/types.h>.

³ Keep in mind that a driver is considered to be in use from the moment the device is opened
until it is released.
Furthermore, when a new device is added to the system, it is also necessary to specify
the major and minor of the corresponding device driver that manages it. Thus, each
time an operation is performed on a device, the system uses this identifier to find the
operations of its driver.

2.2.5. How to register a device driver


The first step in registering a device driver is to reserve its identifier. The
register_chrdev_region function, declared in the header file <linux/fs.h>, reserves a
range of device driver identifiers:
int register_chrdev_region(dev_t first, unsigned int count, const char *name);

The arguments are the first identifier of the region to be reserved (first), which
must be previously generated from a major and a minor using the MKDEV macro; the
number of identifiers to be reserved (count); and the name of the device (name),
which will be shown in /proc/devices. A negative return value means an error has
occurred.

This function reserves count identifiers, all of which have the same major (from the
parameter first) and consecutive minors (starting with the minor from the
parameter first)⁴.

To release the driver’s identifiers and allow them to be used in the future, the
unregister_chrdev_region function can be used.
void unregister_chrdev_region(dev_t first, unsigned int count);

The arguments are the region’s first identifier (first) and the number of identifiers in
the region (count).

Once the identifiers have been reserved, they must be associated with the driver's
specific operations. To do so, a structure of type struct cdev is used. It is defined
in the header file <linux/cdev.h>. First, it is necessary to declare a pointer to a new structure:

struct cdev *my_cdev;

Memory for the structure must then be allocated using the following function⁵:
struct cdev *cdev_alloc(void);

⁴ It is also possible to let the system assign all the identifiers for a driver, without passing the
first identifier in the range, but this requires using the alloc_chrdev_region function
instead of register_chrdev_region. For further details about this alternative function, see
Chapter 3 of Linux Device Drivers, cited in the bibliography.

Two of its fields must then be initialized: the owner field, which the system uses to
maintain a counter of references to the structure and which must be set to the macro
THIS_MODULE; and the ops field, which must point to the file_operations structure
that contains the driver's specific operations.
Finally, the structure must be added to the devices registered in the system using the
following function:
int cdev_add(struct cdev *dev, dev_t num, unsigned int count);

The parameters of this function are the structure that contains the driver's operations
(dev), the first identifier of the region (num) and the number of identifiers in the region
to be associated with these operations (count). The function returns a negative value if
an error occurs. Until it has been executed successfully, the driver is not visible to the
system and, therefore, its functions cannot be used.

Below is a short example in which a new cdev is declared and initialized:

struct cdev *my_cdev;
int ret;

my_cdev = cdev_alloc();
if (my_cdev == NULL)
    return -ENOMEM;              /* allocation failed */
my_cdev->owner = THIS_MODULE;
my_cdev->ops = &my_fops;         /* my_fops is a static structure of type
                                    file_operations previously initialized
                                    with the operations of the driver */
ret = cdev_add(my_cdev, dev, ndev);
if (ret < 0)
    return ret;                  /* undo the previous steps before returning */

When the driver is no longer in use, its cdev structure must be removed from the
system:
void cdev_del(struct cdev *dev);

2.2.6. How to select the major and the minor


The device driver identifier is derived from the major and the minor, so a
combination that no other device is using is required. The file /proc/devices lists all the
installed device drivers with their major⁶. Although Linux version 2.6 allows different
drivers to share the same major, a major that is not currently assigned can be selected
to obtain a new major-minor combination. Then any minor can be used with
confidence that the combination is not already in use.

⁵ If the variable of type struct cdev is defined statically rather than as a pointer, the cdev_init
function must be used instead of cdev_alloc. You will find the definition of this function
in Linux Device Drivers, cited in the bibliography.
⁶ In previous versions of Linux, the major was used to identify the device driver and the
minor was only used internally by the driver to distinguish between the different devices
that it could manage. In version 2.6 and later versions, both numbers (major and minor)
are needed to identify the operations associated with a device. However, /proc/devices still
only shows the major of each driver.

There is an option that frees the programmer from selecting these numbers: the
system can be told to reserve a range of driver identifiers dynamically (which implicitly
selects the major and the minors of the region⁷). Note that in this case the driver
identifiers can change each time the driver is installed, and this must be taken into
account when the devices are added.

2.2.7. How the major and the minor are recognized inside the driver
Using the following macros:
int MAJOR(dev_t dev);
int MINOR(dev_t dev);

The value of the parameter dev is obtained from the inode (a parameter received by
driver operations such as open, release and ioctl; see Section 2.2.3), where it is stored
in the i_rdev field.
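The sketch below mirrors the 2.6 definitions in <linux/kdev_t.h>, where the in-kernel dev_t is 32 bits wide with a 12-bit major and a 20-bit minor (MINORBITS = 20). The DEMO_ names and demo_region_nth are hypothetical user-space copies, used so the arithmetic can be run outside the kernel; the helper also shows the region reserved by register_chrdev_region (same major, consecutive minors). User-space tools may see a different encoding, so driver code should always use MKDEV, MAJOR and MINOR (for example, MINOR(inode->i_rdev)) rather than shifting bits by hand.

```c
#include <stdint.h>

/* Hypothetical user-space copy of the 2.6 in-kernel encoding. */
typedef uint32_t demo_dev_t;

#define DEMO_MINORBITS 20
#define DEMO_MINORMASK ((1U << DEMO_MINORBITS) - 1)
#define DEMO_MKDEV(ma, mi) \
    ((demo_dev_t)(((demo_dev_t)(ma) << DEMO_MINORBITS) | (demo_dev_t)(mi)))
#define DEMO_MAJOR(dev) ((unsigned int)((dev) >> DEMO_MINORBITS))
#define DEMO_MINOR(dev) ((unsigned int)((dev) & DEMO_MINORMASK))

/* i-th identifier of a region reserved from `first`:
 * same major, consecutive minors. */
static demo_dev_t demo_region_nth(demo_dev_t first, unsigned int i)
{
    return DEMO_MKDEV(DEMO_MAJOR(first), DEMO_MINOR(first) + i);
}
```

For example, DEMO_MKDEV(250, 3) packs major 250 into the top 12 bits and minor 3 into the low 20 bits, and DEMO_MAJOR/DEMO_MINOR recover both numbers.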

2.2.8. How a file is associated with a device


The devices are visible from the file system (by default, the files in /dev/* are devices).
They can be created with the mknod command, which uses the system call of the same name:

#mknod file type major minor

The arguments are: file, the name of the file that will be used as a device; type (a ‘c’ to
create a character device); and major and minor, the integers that identify the device in
the system (see Section 2.2.6 for further details; any minor can be used to begin with).
To see the various devices already existing in the system, check the file
/proc/devices, in which the available devices (major and registered names) are
grouped by device type.
Once the device file is created, the functionalities (i.e. which operations the system
allows for the device) of this “file” must be defined. This is done by using the device
driver.

2.2.9. How users access the device


Users can access the device using the usual file system calls: open, close, read,
write and ioctl.

The only one of these system calls that really depends on the peripheral is ioctl,
which allows users to perform operations specific to the peripheral, selected through
its last two arguments (the request code and its argument).

⁷ For further details, see Chapter 3 of Linux Device Drivers, cited in the bibliography.
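From the user's side, a device file is read exactly like a regular file. The hypothetical helper read_device below opens a path, reads until read returns 0 (end of file) or the buffer is full, and closes it. In the project the path would be the device created with mknod (its name is up to you); the check below uses the standard devices /dev/null and /dev/zero. An ioctl call would follow the same pattern through <sys/ioctl.h>, but its request codes are defined by each driver, so none is shown here.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/* Reads up to size-1 bytes from the file at `path` into buf and
 * NUL-terminates it. Works the same for a regular file or a device
 * file created with mknod. Returns the bytes read, or -1 on error. */
static ssize_t read_device(const char *path, char *buf, size_t size)
{
    int fd;
    ssize_t total = 0, n;

    if (size == 0)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* Loop until read() returns 0 (end of file) or the buffer is full. */
    while ((size_t)total < size - 1) {
        n = read(fd, buf + total, size - 1 - (size_t)total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;
        total += n;
    }
    buf[total] = '\0';
    close(fd);
    return total;
}
```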

2.3. Linux
2.3.1. How to find the Linux source code
The directory /usr/src/linux contains the system’s source code. The headers
related to the system version are in /usr/src/linux/include. The various
routines and structures used by Linux to manage processes can be found in Chapter 3
of [2]: for_each_process, find_task_by_pid, etc. (see http://lxr.linux.no).

2.3.2. Symbols
By symbols, we mean variable names and routine names. Symbols from an object
file can be consulted using the nm command.

Another kind of symbol is one defined using #define, such as the macro current,
which returns a pointer to the struct task_struct that holds the control data of the
running process. This structure is known as the PCB (Process Control Block); see
http://lxr.linux.no/source/include/asm-generic/current.h.

2.3.3. The Linux symbol table


Linux exports a set of symbols so that they can be used and referenced from modules.
For each exported symbol (variable, routine, etc.) the system keeps the name and
memory address where it is allocated (system logical address) in a table.

This symbol table is created at compile time, since both the name of the symbol and its
address are needed. To export a symbol, the macro EXPORT_SYMBOL(symbol_name)
must be used and the kernel recompiled (an example can be seen at
http://lxr.linux.no). As modules are part of the kernel, they can also export symbols
using this macro.

In 2.4 kernels, the system's architecture-independent symbol table is defined in the file
kernel/ksyms.c; in 2.6, most symbols are exported with EXPORT_SYMBOL next to their
definitions. The Intel architecture-dependent exported symbols can be found in the file
arch/i386/kernel/i386_ksyms.c.

2.3.4. How to ascertain the contents of the system’s symbol table


There are two different ways:
• Reading the file /proc/kallsyms (called /proc/ksyms in 2.4 kernels)
• Using the command ksyms -a (available with the modutils tools of 2.4 kernels)

2.3.5. What must internal system routines return? Who receives this
information?
The Linux convention states that a negative value (< 0) is returned when an error
occurs; otherwise, a non-negative value (>= 0) is returned. The type of error is the
absolute value of the returned code, and its meaning can be found in the header file
<linux/errno.h> (<sys/errno.h> in user space).

2.3.6. What action should be taken when an internal system routine returns an error?
What type of error should be returned?
If there is no specific way to handle the error, return the same error code that the
system routine returned.
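The two conventions above (negative error codes, and passing the callee's code up unchanged when there is nothing better to do) can be sketched as follows. lookup_slot and use_slot are hypothetical functions invented for the illustration; only the EINVAL constant comes from <errno.h>.

```c
#include <errno.h>

/* Hypothetical internal routine following the kernel convention:
 * a non-negative result on success, -ERRNO on failure. */
static long lookup_slot(int slot)
{
    if (slot < 0 || slot >= 8)
        return -EINVAL;
    return slot * 16;          /* some non-negative result */
}

/* Hypothetical caller with no specific treatment for the error:
 * it returns the callee's code unchanged instead of inventing one. */
static long use_slot(int slot)
{
    long ret = lookup_slot(slot);

    if (ret < 0)
        return ret;            /* same error, propagated */
    return ret + 1;
}
```

The absolute value of the negative result (here EINVAL) identifies the error. At the user level, the C library wrapper of a system call stores that value in errno and returns -1.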

2.3.7. User space and system space


The system uses two different address translation functions while it is running, and
which one is used depends on the processor execution mode.

Usually, there is a single translation function for the system code and a specific
translation function for each running process. This mechanism guarantees system
security, since users cannot change system data from their applications, and
applications cannot access each other's address spaces.
Likewise, the memory access mechanism depends on the execution mode. If the user
address space must be accessed while running in system mode (to pass parameters for
system calls, for example), special instructions are needed to tell the processor to use
the user address space even though system mode is on.
Remember that you will have to check in each case whether the user address space
can be accessed to copy this information, as you did in Project 1.

2.3.8. Operations to copy data between address spaces


Basically, there are two operations, which are declared in <asm-i386/uaccess.h>:
unsigned long copy_from_user(void *to, const void __user *from, unsigned long count);

Copies count bytes from user space to system space.

unsigned long copy_to_user(void __user *to, const void *from, unsigned long count);

Copies count bytes from system space to user space.

Both return the number of bytes that could not be copied, so 0 means success; a driver
usually turns a non-zero result into -EFAULT. See the parameters and return values at
http://lxr.linux.no.

2.3.9. How printk works


printk is the routine that can be used inside the kernel to write information to the
computer console, although its excessive use is discouraged.
The format and parameters that it uses are a limited version of those of the printf
routine from the C library. It is a line-buffered writing function: it does not write data
until it finds a newline character.

It must be highlighted that printk has a special feature: the first characters of the string
are interpreted as the priority of the message. The format of this information is:
printk ("<N>Goodbye cruel world\n");
printk (KERN_EMERG "Goodbye cruel world\n");

where N is a number between 0 (highest priority) and 7 (lowest priority). Depending on
the priority level, the message appears in a different place: the computer console, a log
file (for example, /var/log/messages or /var/log/kern.log), etc. The log file name
depends on the system configuration (/etc/syslog.conf).
Macros such as KERN_EMERG, which define the different priorities, can be found in the
file <linux/kernel.h>. If the priority value is lower than console_loglevel, the
message is printed to the console. If syslogd and klogd are running, the message is
also written to the log file, regardless of whether or not it is printed to the console.
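The priority prefix and the console rule can be modeled in a few lines. This is a hypothetical user-space sketch: DEMO_KERN_EMERG and DEMO_KERN_DEBUG copy the 2.6 definitions ("<0>" and "<7>"), which are plain string literals, so KERN_EMERG "text" is just string concatenation. DEMO_DEFAULT_LEVEL is an assumption standing in for the level applied to messages without a prefix (default_message_loglevel, 4 on a stock 2.6 kernel).

```c
/* 2.6-style priority prefixes, as defined in <linux/kernel.h>. */
#define DEMO_KERN_EMERG "<0>"
#define DEMO_KERN_DEBUG "<7>"
/* Assumption: level used for messages without a prefix. */
#define DEMO_DEFAULT_LEVEL 4

/* Parses a "<N>" prefix; returns N (0-7) or the default level. */
static int demo_msg_level(const char *msg)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return DEMO_DEFAULT_LEVEL;
}

/* Console rule: printed only if the level is lower than console_loglevel. */
static int demo_goes_to_console(const char *msg, int console_loglevel)
{
    return demo_msg_level(msg) < console_loglevel;
}
```

With console_loglevel set to 7, an emergency message (level 0) reaches the console while a debug message (level 7) does not, although both still end up in the kernel ring buffer.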

All these kernel messages are kept on a structure called the “kernel ring buffer”, which
can be accessed using the dmesg command. The size of this buffer is limited, so old
messages are removed to make room for new ones (a kind of circular queue). More
information can be found in the appendix of this document or by using the man
command:
man dmesg
man syslogd
man syslog.conf

3. Description of work

In this project, you have to modify the system to collect usage statistics. Current OSes
keep various kinds of statistics about themselves so that problems can be easily
identified and the appropriate actions taken. In our particular case, the task will be to
find out the system's mean response time by focusing on system calls.
To do so, the entry point of each system call must be modified by introducing new code
(instrumentation). Information will be kept for the following calls: open, write,
clone, close and lseek. The information needed for each of them can be summarized
as:
• Number of times the call is initiated
• Number of times it ends correctly
• Number of times it ends with an error
• The time spent executing the call
Adding this instrumentation to each system call may make the system a little slower.
You are therefore to implement it dynamically so that it can be enabled or disabled.
To do this, the Linux system call table will be intercepted and each function to be
measured will be replaced by a local function. This local function measures the time
the original function takes to execute, and it must have the same interface as the
corresponding system call (you can find this interface in the Linux source code).

To sum up, you will have to implement two modules:
• Module 1 to intercept system calls and measure the time spent.
• Module 2 to access system statistics.

3.1. Module 1: Intercepting and measuring

This module will intercept system calls by modifying the system call table: it will insert
the instrumentation functions when the module is loaded into the system (enabling
instrumentation) and remove them when it is unloaded.
The instrumentation functions will be in the module and they will update the system
call counters of the current process. To measure times, you can use the mechanism
explained in Section 3.1.6 (How to measure time).
In the system call table, the original functions to be monitored will be replaced by the
monitoring routines that you create. These monitoring routines must (see Figure 3):
1. Mark the beginning of the system call
2. Execute the original system call
3. Calculate the total running time and obtain the call result

Figure 3. Diagram of system call interceptions (running program → system call → your routine is executed: time check → original system call → the original system call ends: time check and total duration calculated → system call return)

3.1.1. Intercepting the system call table


The system call table is called sys_call_table and it is a table of function pointers. Thus,
if the module declares:
extern void * sys_call_table[];

a symbol will be obtained that references the system call table (see
http://lxr.linux.no/).

3.1.2. Intercepting system calls


After the system call table has been obtained, the original entries of the functions to
be monitored must be saved. For example, the entry for open is stored in a variable as follows:
sys_open_old = sys_call_table[POS_SYSCALL_OPEN];

From this time on, the variable sys_open_old will contain a pointer to the original
open system call.

3.1.3. Modifying the system call table


The table is then modified in the same way. If a function called sys_open_local has
been defined to monitor open, it is only necessary to execute the following:
sys_call_table[POS_SYSCALL_OPEN] = sys_open_local;
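The save, replace and restore steps of Sections 3.1.2 and 3.1.3, together with the three steps of the monitoring routine, can be tried out in a user-space model. Everything here is hypothetical: demo_table stands in for sys_call_table, DEMO_NR_OPEN for POS_SYSCALL_OPEN, demo_open_orig for the original call, and demo_get_cycles is a fake clock standing in for proso_get_cycles (the fake original call "takes" 10 cycles, so the totals are predictable). The comments mark where the module would call try_module_get and module_put (Section 3.1.7).

```c
#include <errno.h>

typedef long (*demo_call_t)(long arg);

#define DEMO_NR_OPEN    0   /* hypothetical index, like POS_SYSCALL_OPEN */
#define DEMO_TABLE_SIZE 1

struct demo_stats {
    unsigned long calls, ok, errors;
    unsigned long long total_cycles;
};

static demo_call_t demo_table[DEMO_TABLE_SIZE]; /* stands in for sys_call_table */
static demo_call_t demo_open_old;               /* saved original entry */
static struct demo_stats demo_open_stats;
static unsigned long long demo_clock;           /* fake cycle counter */

/* Hypothetical stand-in for proso_get_cycles(). */
static unsigned long long demo_get_cycles(void)
{
    return demo_clock;
}

/* Stand-in for the original call: "takes" 10 cycles, fails on arg < 0. */
static long demo_open_orig(long arg)
{
    demo_clock += 10;
    return arg < 0 ? -ENOENT : arg;
}

/* Monitoring routine: same interface as the original call. */
static long demo_open_local(long arg)
{
    unsigned long long start;
    long ret;

    /* In the module: try_module_get(THIS_MODULE) here ... */
    demo_open_stats.calls++;                   /* call initiated */
    start = demo_get_cycles();                 /* 1. mark the beginning */
    ret = demo_open_old(arg);                  /* 2. run the original */
    demo_open_stats.total_cycles += demo_get_cycles() - start; /* 3. duration */
    if (ret < 0)
        demo_open_stats.errors++;
    else
        demo_open_stats.ok++;
    /* ... and module_put(THIS_MODULE) before returning. */
    return ret;                                /* caller sees the same result */
}

static void demo_intercept(void)   /* done when the module is loaded */
{
    demo_open_old = demo_table[DEMO_NR_OPEN];
    demo_table[DEMO_NR_OPEN] = demo_open_local;
}

static void demo_restore(void)     /* done when the module is unloaded */
{
    demo_table[DEMO_NR_OPEN] = demo_open_old;
}
```

Because the replacement returns exactly what the original returned, programs cannot tell that the call is being measured; they only pay the small extra cost of the bookkeeping.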

3.1.4. Headers of the system call to trace


As an example, the headers of the system calls to trace are listed below. Notice that the
kernel headers differ from the user-level (system call wrapper) headers:

long sys_open(const char __user * filename, int flags, int mode);

long sys_close(unsigned int fd);

ssize_t sys_write(unsigned int fd, const char __user * buf, size_t count) ;

off_t sys_lseek(unsigned int fd, off_t offset, unsigned int origin) ;

int sys_clone(struct pt_regs regs) ;

To obtain the other system call headers you can see:


http://www.jollen.org/blog/2006/10/linux_2611_system_calls_table.html

3.1.5. Where to maintain statistics


In kernel 2.6, the PCB is managed differently from version 2.4 and, therefore, differently
from ZeOS. In version 2.6 and later versions, the PCB is split into two components:
task_struct and thread_info.

Figure 4. Sharing of two pages by the kernel stack and the thread_union (labels: current_thread_info(), current)

The task_struct contains the process information, such as the open files and a pointer
to the thread_info.
The thread_info is the structure that shares the memory space with the kernel stack. It
contains the thread’s execution state and a pointer to the task_struct. A definition can
be seen at http://lxr.linux.no/#linux+v2.6.34.1/arch/x86/include/asm/thread_info.h. A
definition of the thread_union (the convergence of the thread_info and the stack) is at
http://lxr.linux.no/#linux+v2.6.34.1/include/linux/sched.h#L1939. Also see Figure 4.

The macro current must be used to obtain the address of the task_struct. The
routine current_thread_info() must be used to obtain the base address of the
thread_info.

As the statistics require little space, they will be stored immediately after the
thread_info structure, so that they can easily be associated with the process. To do so,
you must create a new structure called my_thread that contains the thread_info
structure followed by the statistics of the process (see Figure 5).

Figure 5. Where statistics are stored
For each process, you must determine whether or not its statistics have been
initialized. When a new process is created, the kernel may reuse memory previously
associated with a process that has died, so this area can still hold the statistics of the
previous process.
To detect this, an additional field can be added to the data structure to store the PID of
the process that owns the statistics currently stored. If this PID does not match the PID
of the current process, the statistics have not been initialized yet: they must be reset
and the stored PID updated.
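The layout of Figure 5 and the PID check can be sketched as follows. Because the thread_info is the first member of my_thread, the pointer returned by current_thread_info() can be cast to struct my_thread *. This is a user-space model: struct thread_info_model is a hypothetical stand-in for the kernel's thread_info, and in the module the two arguments of the hypothetical get_stats() would be current_thread_info() and current->pid. The per-call fields follow struct t_info from Module 2.

```c
#include <assert.h>
#include <string.h>

#define NUM_TRACED 5            /* open, write, lseek, close, clone */

/* Per-call statistics, with the fields of struct t_info (Section 3.2). */
struct t_info {
    int num_entries;
    int num_exits_ok;
    int num_exits_error;
    unsigned long long total_time;
};

/* Hypothetical stand-in for the kernel's struct thread_info. */
struct thread_info_model {
    int cpu;
};

/* Figure 5: the statistics are stored right after the thread_info. */
struct my_thread {
    struct thread_info_model info;   /* must stay the first member */
    int pid;                         /* process that owns stats[] */
    struct t_info stats[NUM_TRACED];
};

/* Return the statistics of process current_pid, resetting them first
 * if they were left behind by a previous owner of this memory. */
struct t_info *get_stats(struct thread_info_model *ti, int current_pid)
{
    struct my_thread *mt = (struct my_thread *)ti;

    assert(ti != NULL);
    if (mt->pid != current_pid) {
        memset(mt->stats, 0, sizeof mt->stats);
        mt->pid = current_pid;
    }
    return mt->stats;
}
```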

3.1.6. How to measure time


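You can use the proso_get_cycles function to measure time. It reads the processor's
time-stamp counter with the rdtsc instruction, so it returns a count of CPU cycles rather
than seconds. It is implemented as follows: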
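/* Read the 64-bit time-stamp counter: rdtsc returns its low 32 bits
   in EAX and its high 32 bits in EDX. */
#define proso_rdtsc(low,high) \
    __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high))

/* Return the current time-stamp counter value, in CPU cycles. */
static inline unsigned long long proso_get_cycles (void) {
    unsigned long eax, edx;

    proso_rdtsc(eax, edx);
    return ((unsigned long long) edx << 32) + eax;
}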
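A wrapper can use proso_get_cycles() to fill in the statistics of one call: count the entry, time the original handler, and classify the result, relying on the kernel convention that system calls report errors as negative error codes. The sketch runs in user space. fake_sys_close is a hypothetical stand-in for the saved sys_close, and the clock_gettime fallback exists only so that the example also builds on non-x86 machines (it counts nanoseconds, not cycles).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>

#if defined(__i386__) || defined(__x86_64__)
#define proso_rdtsc(low, high) \
    __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high))

static inline unsigned long long proso_get_cycles(void)
{
    unsigned long eax, edx;

    proso_rdtsc(eax, edx);
    return ((unsigned long long)edx << 32) + eax;
}
#else
#include <time.h>
/* Fallback for this sketch only: nanoseconds from a monotonic clock. */
static inline unsigned long long proso_get_cycles(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (unsigned long long)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}
#endif

struct t_info {
    int num_entries;
    int num_exits_ok;
    int num_exits_error;
    unsigned long long total_time;
};

static struct t_info close_stats;

/* Hypothetical stand-in for the original sys_close. */
static long fake_sys_close(unsigned int fd)
{
    return fd <= 2 ? 0 : -EBADF;
}

/* Instrumented replacement in the style of a sys_close_local wrapper. */
long sys_close_local(unsigned int fd)
{
    unsigned long long start;
    long ret;

    close_stats.num_entries++;
    start = proso_get_cycles();
    ret = fake_sys_close(fd);
    close_stats.total_time += proso_get_cycles() - start;
    if (ret < 0)
        close_stats.num_exits_error++;
    else
        close_stats.num_exits_ok++;
    return ret;
}
```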

3.1.7. Other restrictions

Each process must have its own counters for the system calls, so they must be reset to
zero for each new process.
As a final requirement, you must check that everything works properly. To do so, when
the module is removed from the system, it must print on the screen the statistics of the
process whose PID was passed as an argument when the module was inserted.
The module must not be unloaded while a process is executing an intercepted call (you
can use try_module_get and module_put, as explained in Section 2.1.6).

3.1.8. Tests
To check that the module works properly, you must run a number of tests. Below is a
skeleton of such a test:
• Begin the test. Print a start message.
• Print the PID of the current process and block the process until a key is pressed.
• (Load the module with the PID of the test.)
• Press a key and continue with the test.
• Check that all the system calls have been monitored.
• Print an end message to finish the test. Block the process until a key is pressed.
• (Unload the module, which prints the process' statistics.) Check that the results
correspond to the calls made by the test.
• End the test.
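A user-level test following this skeleton could be built from helpers like the ones below; a main() would print the start message and getpid(), call wait_for_key(), run exercise_monitored_calls(), and call wait_for_key() again before exiting. This is a sketch: the helper names and the file path are arbitrary choices, the second close() is deliberately invalid so that the error counter also changes, and clone is exercised through fork(), which current glibc versions on Linux implement with the clone system call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Print a message and block until Enter is pressed (or stdin reaches EOF). */
void wait_for_key(const char *msg)
{
    int c;

    printf("%s (press Enter to continue)\n", msg);
    fflush(stdout);
    while ((c = getchar()) != '\n' && c != EOF)
        ;
}

/* Call each of the five monitored system calls at least once.
 * Returns how many calls behaved as expected (6 when all did). */
int exercise_monitored_calls(const char *path)
{
    int ok = 0;
    int fd;
    pid_t child;

    assert(path != NULL);
    fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);   /* open */
    if (fd >= 0)
        ok++;
    if (write(fd, "proso\n", 6) == 6)                     /* write */
        ok++;
    if (lseek(fd, 0, SEEK_SET) == 0)                      /* lseek */
        ok++;
    if (close(fd) == 0)                                   /* close */
        ok++;
    if (close(fd) == -1)            /* close again: expected to fail */
        ok++;

    child = fork();                                       /* clone */
    if (child == 0)
        _exit(0);                   /* child: exit without flushing stdio */
    if (child > 0 && waitpid(child, NULL, 0) == child)
        ok++;

    unlink(path);
    return ok;
}
```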

3.2. Module 2: Accessing system information


The aim of this module is to make it possible to access the information gathered by the
previous module. The previous module monitors all the processes created in the system;
this module creates a new device through which that information can be accessed. To
create this device, you must select a major and a minor number to identify the driver;
they are used to register the driver and to create the file that represents the device in
the file system.
This module makes it possible to select the process and the system call whose
information is to be obtained (while all the processes and the five system calls continue
to be monitored).
A new device must be created that will make it possible to perform the following
operations:
ssize_t read (struct file *f, char __user *buffer, size_t s, loff_t *off);
int ioctl (struct inode *i, struct file *f, unsigned int arg1, unsigned long arg2);
int open (struct inode *i, struct file *f);
int release (struct inode *i, struct file *f);

• open. The device can be opened by only one process at a time, and only by the user
root (uid == 0).
• read. A read on this device returns to user space (buffer) a structure with the
statistics of the currently selected system call for the process currently being
monitored. Users must allocate this structure before executing the read system call.
The number of bytes read will be the minimum of the s parameter and
sizeof(struct t_info).
• ioctl. Users will be able to modify the device's settings using this call (selected
process, selected system call, etc.).
• release. This call releases the device so that it can be opened again.
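The open, release and read rules above can be modelled in user space as follows. This is a sketch under assumptions: the uid is passed in explicitly (the module would take it from the current process), -EPERM and -EBUSY are our choice of error codes, and memcpy stands in for copy_to_user, which in the real driver returns the number of bytes it could not copy (a non-zero result would normally become -EFAULT). A real driver should also test and set the busy flag atomically, since two processes may call open at the same time.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

struct t_info {
    int num_entries;
    int num_exits_ok;
    int num_exits_error;
    unsigned long long total_time;
};

static int device_busy;         /* 1 while some process has the device open */

/* open: only root (uid 0), and only one process at a time. */
int stats_open(unsigned int uid)
{
    if (uid != 0)
        return -EPERM;
    if (device_busy)
        return -EBUSY;
    device_busy = 1;
    return 0;
}

/* release: make the device available again. */
int stats_release(void)
{
    device_busy = 0;
    return 0;
}

/* read: copy min(s, sizeof(struct t_info)) bytes of the selected statistics. */
ssize_t stats_read(const struct t_info *selected, char *buffer, size_t s)
{
    size_t n = s < sizeof(struct t_info) ? s : sizeof(struct t_info);

    memcpy(buffer, selected, n);   /* driver: copy_to_user(buffer, selected, n) */
    return (ssize_t)n;
}
```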

By default, the read system call obtains the statistics of the open system call for the
process that opened the device. The structure that will be returned to the user is of the
type shown below:
struct t_info {
    int num_entries;
    int num_exits_ok;
    int num_exits_error;
    unsigned long long total_time;
};
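From user space, a root process can obtain the default statistics with the ordinary open, read and close calls. A minimal sketch, assuming a device file has already been created for the driver's major and minor numbers (the path is up to you, for example a hypothetical /dev/proso_stats); the function name is hypothetical, and treating a short read as an error is our choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

struct t_info {
    int num_entries;
    int num_exits_ok;
    int num_exits_error;
    unsigned long long total_time;
};

/* Read the currently selected statistics from the device at dev_path.
 * Returns 0 on success or a negative errno value. */
int read_default_stats(const char *dev_path, struct t_info *out)
{
    int fd = open(dev_path, O_RDONLY);
    ssize_t n;
    int err;

    assert(out != NULL);
    if (fd < 0)
        return -errno;
    n = read(fd, out, sizeof *out);
    err = (n < 0) ? -errno : 0;
    if (n >= 0 && (size_t)n != sizeof *out)
        err = -EIO;             /* short read: treated as an error */
    close(fd);
    return err;
}
```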

In order to control the behavior of this new device using the ioctl call, the following
parameters must be defined (the values in parentheses are constant values):

CHANGE_PROCESS (arg1 = 0). The third parameter (arg2) is a pointer to the
identifier (PID) of the process to be analyzed. If the pointer is NULL (zero), the
target process for the read system call becomes, once again, the process that
opened the device. If the requested process does not exist, the call must return
an error.

CHANGE_SYSCALL (arg1 = 1). This changes the target system call for the read
system call. The meaning of the third parameter (arg2) is:
• OPEN (0)
• WRITE (1)
• LSEEK (2)
• CLOSE (3)
• CLONE (4)

RESET_VALUES (arg1 = 2). This resets the statistics of the process currently
being analyzed.

RESET_VALUES_ALL_PROCESSES (arg1 = 3). This resets the statistics of all
processes.

The call returns zero if everything worked properly, and a negative value (the
corresponding error code) if an error occurred.
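The four operations can be dispatched with a switch on arg1. The sketch below is a user-space model: struct dev_state, process_exists() and the reset counters are hypothetical stand-ins for the module's own data, arg2 is dereferenced directly where the driver would use copy_from_user, and returning -ENOTTY for an unknown command follows the usual ioctl convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* arg1 values */
#define CHANGE_PROCESS             0
#define CHANGE_SYSCALL             1
#define RESET_VALUES               2
#define RESET_VALUES_ALL_PROCESSES 3

/* arg2 values for CHANGE_SYSCALL */
enum { OPEN = 0, WRITE = 1, LSEEK = 2, CLOSE = 3, CLONE = 4, NUM_TRACED = 5 };

/* Device settings used by read (hypothetical layout). */
struct dev_state {
    int opener_pid;   /* process that opened the device */
    int target_pid;   /* process whose statistics read returns */
    int target_call;  /* system call whose statistics read returns */
};

/* Hypothetical stand-ins for the rest of the module. */
static const int live_pids[] = { 1, 100, 200 };
static int resets_one, resets_all;

static int process_exists(int pid)
{
    for (size_t i = 0; i < sizeof live_pids / sizeof live_pids[0]; i++)
        if (live_pids[i] == pid)
            return 1;
    return 0;
}

static void reset_stats(int pid) { (void)pid; resets_one++; }
static void reset_all_stats(void) { resets_all++; }

long stats_ioctl(struct dev_state *st, unsigned int arg1, unsigned long arg2)
{
    switch (arg1) {
    case CHANGE_PROCESS: {
        const int *pidp = (const int *)arg2;   /* driver: copy_from_user */
        if (pidp == NULL) {                    /* back to the opener */
            st->target_pid = st->opener_pid;
            return 0;
        }
        if (!process_exists(*pidp))
            return -ESRCH;
        st->target_pid = *pidp;
        return 0;
    }
    case CHANGE_SYSCALL:
        if (arg2 >= NUM_TRACED)
            return -EINVAL;
        st->target_call = (int)arg2;
        return 0;
    case RESET_VALUES:
        reset_stats(st->target_pid);
        return 0;
    case RESET_VALUES_ALL_PROCESSES:
        reset_all_stats();
        return 0;
    default:
        return -ENOTTY;
    }
}
```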

4. Dynamic monitoring

The aim of this stage of the project is to make the instrumentation mechanisms
explained so far more dynamic. To do so, the modules created previously must be
modified.

4.1. Changes in Module 1
It must be possible to activate and deactivate the monitoring of system calls
dynamically. The new behavior will be as follows:
• By default, all five system calls will be monitored, as before.
• Two new functions will be added to activate and deactivate the monitoring of
system calls. Module 2 will call these two functions.
• The addresses of the system calls should be kept in a table (penalties will be
imposed if such a table is not implemented).
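One way to organise the required table is one row per traced call, holding its position in sys_call_table, the original handler saved at load time and the instrumented wrapper, so that activating or deactivating monitoring is a single assignment. The sketch models this in user space: the handlers are dummies, the names activate_monitoring and deactivate_monitoring are hypothetical, and the positions are the i386 numbers of open, write, lseek, close and clone (5, 4, 19, 6, 120). For Module 2 to call the two functions, Module 1 would export them with EXPORT_SYMBOL.

```c
#include <assert.h>
#include <errno.h>

typedef long (*syscall_fn)(long arg);   /* simplified handler type */

#define NR_CALLS 256
enum { OPEN = 0, WRITE, LSEEK, CLOSE, CLONE, NUM_TRACED };

static syscall_fn sys_call_table[NR_CALLS];   /* model of the kernel table */

struct traced_call {
    int nr;               /* position in sys_call_table */
    syscall_fn original;  /* handler saved when the module was loaded */
    syscall_fn wrapper;   /* instrumented replacement */
};

static struct traced_call traced[NUM_TRACED];

/* Dummy handlers so that the effect of each switch is observable. */
static long orig_handler(long a) { return a; }
static long wrap_handler(long a) { return a + 1000; }

/* Load time: save the originals and monitor every call by default. */
void init_traced(void)
{
    static const int numbers[NUM_TRACED] = { 5, 4, 19, 6, 120 };

    for (int i = 0; i < NUM_TRACED; i++) {
        sys_call_table[numbers[i]] = orig_handler;   /* pretend kernel state */
        traced[i].nr = numbers[i];
        traced[i].original = sys_call_table[numbers[i]];
        traced[i].wrapper = wrap_handler;
        sys_call_table[numbers[i]] = traced[i].wrapper;
    }
}

int activate_monitoring(int call)
{
    if (call < 0 || call >= NUM_TRACED)
        return -EINVAL;
    sys_call_table[traced[call].nr] = traced[call].wrapper;
    return 0;
}

int deactivate_monitoring(int call)
{
    if (call < 0 || call >= NUM_TRACED)
        return -EINVAL;
    sys_call_table[traced[call].nr] = traced[call].original;
    return 0;
}
```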

4.2. Changes in Module 2


To allow users to activate/deactivate the monitoring of a system call, the functionality
of the ioctl function for the device must be extended using the new functions added
to Module 1.
• Selectively enable and disable the instrumentation of system calls. The ioctl system
call must be extended with two new operations:
o ENABLE_SYS_CALL (arg1 = 4 and arg2 = num). Enables the instrumentation of
the system call num (or of all of them if num is a negative number).
o DISABLE_SYS_CALL (arg1 = 5 and arg2 = num). Disables the instrumentation of
the system call num (or of all of them if num is a negative number).
• Users must be able to specify the system call to be instrumented easily, for
example by using constants.
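On the Module 2 side, the two new operations reduce to choosing a function and applying it either to one call or, for a negative num, to all five. Because ioctl delivers arg2 as an unsigned long, the sketch reinterprets it as a signed long before testing for a negative value. The constants could live in a header shared with user programs, which is one way to satisfy the last requirement (a user program would then write, for instance, ioctl(fd, ENABLE_SYS_CALL, WRITE)). Here activate_monitoring, deactivate_monitoring and set_monitoring are hypothetical, reduced to flags so that the example runs in user space.

```c
#include <assert.h>
#include <errno.h>

/* Shared with user programs, e.g. through a common header. */
#define ENABLE_SYS_CALL  4
#define DISABLE_SYS_CALL 5
enum { OPEN = 0, WRITE, LSEEK, CLOSE, CLONE, NUM_TRACED };

/* Stand-ins for the functions exported by Module 1. */
static int monitored[NUM_TRACED];

int activate_monitoring(int call)
{
    if (call < 0 || call >= NUM_TRACED)
        return -EINVAL;
    monitored[call] = 1;
    return 0;
}

int deactivate_monitoring(int call)
{
    if (call < 0 || call >= NUM_TRACED)
        return -EINVAL;
    monitored[call] = 0;
    return 0;
}

/* Handle ENABLE_SYS_CALL / DISABLE_SYS_CALL; a negative num means all calls. */
long set_monitoring(unsigned int arg1, unsigned long arg2)
{
    long num = (long)arg2;
    int (*op)(int);

    if (arg1 == ENABLE_SYS_CALL)
        op = activate_monitoring;
    else if (arg1 == DISABLE_SYS_CALL)
        op = deactivate_monitoring;
    else
        return -ENOTTY;

    if (num < 0) {
        for (int i = 0; i < NUM_TRACED; i++)
            op(i);
        return 0;
    }
    if (num >= NUM_TRACED)
        return -EINVAL;
    return op((int)num);
}
```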

5. Deliverables
You should deliver all of the source files (including Makefiles) you have created and the
test suite you used to test the modules. Additionally, you must submit a README file
describing your test suite and the instructions to execute it.

6. References

[1] Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman: Linux Device Drivers,
Third Edition. O’Reilly, February 2005. http://lwn.net/Kernel/LDD3
[2] Daniel P. Bovet, Marco Cesati: Understanding the Linux Kernel. O’Reilly,
November 2005.
[3] The Linux Kernel Module Programming Guide (2.6).
http://tldp.org/LDP/lkmpg/2.6/html/lkmpg.html#AEN245

7. Appendix: how to export the sys_call_table symbol


In the kernel used in the laboratory, the symbol sys_call_table is already exported.
However, if you want to test your project at home, follow the steps below to generate a
new kernel image (named linux.2.6.xx-proso, where xx is the subversion you have
installed).
1. Go to the kernel source code directory (assumed to be /usr/src/linux).
# cd /usr/src/linux
2. Modify the file arch/i386/kernel/i386_ksyms.c by adding the following lines:
extern void * sys_call_table[];
EXPORT_SYMBOL(sys_call_table);

3. Edit the Makefile and set the variable EXTRAVERSION, which determines the image
name:
# vi Makefile

...
EXTRAVERSION=-proso
...

4. Copy the configuration file of the current kernel from the /boot directory.
# cp /boot/config-2.6.XXXX .config

5. Prepare the environment to compile the kernel.


# make oldconfig

6. Recompile the kernel.


# make

7. Install the modules.
# make modules_install

8. Install the kernel image (vmlinuz-2.6.XXX-proso) and the symbols
(System.map-2.6.xx-proso).
# make install

9. Generate an initial RAM disk (initrd) containing the required modules (otherwise, the
system will not boot).
# mkinitramfs -o /boot/initrd.img-2.6.XXX-proso 2.6.XXX-proso

10. Edit the GRUB boot menu file, /boot/grub/menu.lst, to add the new image.
# vi /boot/grub/menu.lst

11. Modify the following fields to point to the new image and the new initrd file:
title
kernel
initrd
