

Reading an ext2 disk image

PA3.1. Reading an ext2 disk image: basic information and blocks


This assignment is all about file systems. While completing this assignment, you will gain
a better understanding of the implementation of the Linux file system, in particular the
ext2 version of this file system.

The assignment is divided into five parts. The division is meant to help you with time
management while working on the assignment, and to ensure you make steady
progress on it instead of leaving it to the last minute.

• Unless otherwise specified, the course's regular policies and procedures, as listed
on the syllabus, apply to this assignment.
• To be marked, the program must compile and run on a Linux environment. In
particular, note that parts of the provided code may not work at all in other
environments, including Mac OS X. If you are testing your code in your own
Linux environment, you may need to install package libfuse-dev (or its
equivalent in your distribution).

Part 1: File System Information


In this assignment you will access information from a file containing an image (i.e., the
binary representation of the contents of a storage device) of an ext2 (Linux) file system.
Your task is to write code to determine some basic information about this file system.
The following is a list of some of the information to be determined about the file system:

• its size, in bytes.
• its block size, in sectors.
• its volume name.
• the total number of blocks and inodes.
• the number of reserved blocks.
• the number of blocks and inodes in each block group.
• the number of block groups.
• for each block group, the number of free blocks and inodes.
• other relevant information stored in the superblock.
• other relevant information stored in each block group.

Most of the information above can be read, or computed, from the information stored
in the superblock (located 1024 bytes from the start of the image) or the group
descriptor table (located in the block that follows the superblock) of the file system
volume file. Relevant details about the ext2 file system can be found here:
• https://wiki.osdev.org/Ext2
• https://www.nongnu.org/ext2-doc/ext2.html
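
As a rough illustration of the paragraph above, the following sketch decodes a few superblock fields from the 1024 bytes found at image offset 1024. The struct sb_summary, the helper le32 and the function parse_superblock are hypothetical names invented for this example; they are not part of the provided skeleton. The byte offsets follow the superblock layout given in the references above, and the upper bound on the block size exponent is a sanity limit chosen only for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical summary struct for this sketch only; the struct in the
 * assignment's ext2.h is more complete and uses its own field names. */
struct sb_summary {
    uint32_t inodes_count;
    uint32_t blocks_count;
    uint32_t reserved_blocks;
    uint32_t block_size;        /* in bytes */
    uint32_t blocks_per_group;
    uint32_t inodes_per_group;
    uint32_t group_count;
    uint32_t gdt_block;         /* block holding the group descriptor table */
    uint64_t size_bytes;        /* blocks_count * block_size */
    char     volume_name[17];   /* 16 bytes on disk plus a terminator */
};

/* Decode a little-endian 32-bit value byte by byte. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 1024 bytes stored at image offset 1024.
 * Returns 0 on success, -1 if they do not look like an ext2 superblock. */
int parse_superblock(const uint8_t *sb, struct sb_summary *out)
{
    uint32_t magic = (uint32_t)sb[56] | ((uint32_t)sb[57] << 8);
    uint32_t first_data_block = le32(sb + 20);
    uint32_t log_block_size = le32(sb + 24);

    memset(out, 0, sizeof *out);
    out->inodes_count = le32(sb + 0);
    out->blocks_count = le32(sb + 4);
    out->reserved_blocks = le32(sb + 8);
    out->blocks_per_group = le32(sb + 32);
    out->inodes_per_group = le32(sb + 40);

    /* 0xEF53 is the ext2 magic number; the limit of 6 is this sketch's
     * own sanity check, not a rule taken from the assignment. */
    if (magic != 0xEF53 || log_block_size > 6 ||
        out->blocks_per_group == 0 || out->blocks_count <= first_data_block)
        return -1;

    out->block_size = 1024u << log_block_size;
    out->size_bytes = (uint64_t)out->blocks_count * out->block_size;
    /* Round up: a final, partially filled group still counts. */
    out->group_count = (uint32_t)(((uint64_t)out->blocks_count - first_data_block +
                                   out->blocks_per_group - 1) / out->blocks_per_group);
    /* The group descriptor table starts in the block after the superblock. */
    out->gdt_block = first_data_block + 1;
    memcpy(out->volume_name, sb + 120, 16);  /* terminator comes from memset */
    return 0;
}
```

A caller would read 1024 bytes at offset 1024 of the image into a buffer and pass that buffer to parse_superblock. Decoding byte by byte is only one option; under the little-endian assumption allowed in this assignment, reading directly into the struct from ext2.h is simpler.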

Skeleton code is provided in ext2.zip. Note that this file contains starting code for all
parts of this assignment, as well as a testing file.

In this part of the assignment you will need to implement the following functions
in ext2.c: open_volume_file, close_volume_file, and read_block (as well as any
additional helper functions you may see fit).

Note that the function prototypes in the provided code are built in a manner that is
expected to be helpful for other parts of the assignment, as well as to simplify
automated testing, so you must not make any changes in function prototypes and
expected behaviour. Note in particular that your code will be tested with a different
main function (not provided) that includes additional tests beyond those that are listed
in the provided test files, so you should ensure that your code works for generic cases,
not only for those listed in the test file.

You are allowed to make some assumptions: you may assume that your code is running
on a little-endian CPU, which allows you to read the integer values in the image file in their
raw format. You are also allowed to assume that the file system uses version 1.0 of the
ext2 specification, and that it follows the specific format described for the Linux
environment, in particular regarding the use of 32-bit values for UID and GID. Note that
these assumptions, combined with the fact that some structs in ext2.h follow the ext2
specification exactly, should allow you to read the full superblock (as well as other data
structures like group descriptor tables and inodes) in a single read operation.
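
To illustrate the single-read idea, the sketch below mirrors the 32-byte block group descriptor with a struct and copies one descriptor out of a raw buffer. The names gd_sketch and load_group_descriptor are hypothetical; the structs supplied in ext2.h may use different field names. As stated above, the copy is only valid on a little-endian CPU.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the 32-byte ext2 block group descriptor. */
struct gd_sketch {
    uint32_t block_bitmap;       /* block number of the block bitmap */
    uint32_t inode_bitmap;       /* block number of the inode bitmap */
    uint32_t inode_table;        /* first block of the inode table */
    uint16_t free_blocks_count;
    uint16_t free_inodes_count;
    uint16_t used_dirs_count;
    uint16_t pad;
    uint8_t  reserved[12];
};

/* The raw copy below only works if the struct has no hidden padding. */
_Static_assert(sizeof(struct gd_sketch) == 32,
               "group descriptor must be exactly 32 bytes");

/* Copy descriptor number `group` out of a buffer that holds the raw group
 * descriptor table. Assumes a little-endian host, as the assignment allows. */
void load_group_descriptor(const uint8_t *table, uint32_t group,
                           struct gd_sketch *out)
{
    memcpy(out, table + (size_t)group * sizeof *out, sizeof *out);
}
```

The static assertion is a cheap way to catch a struct whose layout drifts from the on-disk format before any data is misread.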

One topic that the reference links above do not describe clearly is sparse files.
In ext2 (and other similar file systems), if the block number associated with a
particular location in the file is 0 (zero), then no block is allocated for the region in
question, and all data bytes for that block are considered to be zero. This allows the file
system to store very large files with sparse content (e.g., a 13 GB file with only a handful
of non-zero blocks) by only using blocks for regions of the file that actually contain non-
zero data. You should ensure your implementation applies this information
appropriately, i.e., if the block number being read is zero, then all data returned for that
block is zero.

Additional testing
The file ext2test.c contains some preliminary tests that should allow you to verify the
basic functionality of your code, particularly the first four parts of the assignment. Some
of the tests are based on files and resources that are available in some of the sample
files below, but not in others, so you should not presume that an error in these tests
necessarily means that your code is wrong. You are strongly encouraged to add further
tests to ensure your code works as expected.

The test program takes as an argument the name of the volume file containing the file
system. This functionality is provided, so make sure you don't break it. If you would like
to implement additional arguments for testing purposes, you must make sure that
calling the program with a single argument still works as expected.

Sample ext2 image files are provided below for testing. Different files have different
characteristics (different files, sizes, block sizes, etc.).

• ext2_R1_1024.img
• Sparse.img
• LargeUID_file.img

Submitting your file


You may save your submission as many times as you wish before the deadline. Your
latest submission will be considered for grades.

Files required: ext2.c


PA3.2. Reading an ext2 disk image: inodes and file content
This part of the assignment builds on your work from the first part of the assignment
(blocks).

Part 2: Inodes and File Content


The goal of this part is to be able to deal with inodes, and extract information about the
associated files. More specifically, you must implement the
functions read_inode, get_inode_block_no and read_file_block in ext2file.c, so that
they can obtain both the metadata and content of an inode.
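
The arithmetic behind these functions can be sketched as below, based on the layout described in the references. All names (inode_loc, locate_inode, block_path, classify_block_index) are hypothetical and are not the prototypes in ext2file.c. The first function finds the block group of an inode and its byte offset inside that group's inode table. The second works out whether a file block index is reached through the 12 direct pointers or through the single, double or triple indirect block, and which 4-byte slot to follow at each level.

```c
#include <stdint.h>

/* Where an inode lives: its block group, and its byte offset inside that
 * group's inode table. Inode numbers start at 1. */
struct inode_loc {
    uint32_t group;
    uint64_t offset_in_table;
};

struct inode_loc locate_inode(uint32_t ino, uint32_t inodes_per_group,
                              uint32_t inode_size)
{
    struct inode_loc loc;
    loc.group = (ino - 1) / inodes_per_group;
    loc.offset_in_table = (uint64_t)((ino - 1) % inodes_per_group) * inode_size;
    return loc;
}

enum map_level { MAP_DIRECT, MAP_SINGLE, MAP_DOUBLE, MAP_TRIPLE, MAP_NONE };

/* slot[i] is the pointer index to follow at level i of the lookup. */
struct block_path {
    enum map_level level;
    uint32_t slot[3];
};

struct block_path classify_block_index(uint64_t index, uint32_t block_size)
{
    struct block_path bp = { MAP_NONE, { 0, 0, 0 } };
    uint64_t p = block_size / 4;   /* block numbers per indirect block */

    if (index < 12) {              /* one of the 12 direct pointers */
        bp.level = MAP_DIRECT;
        bp.slot[0] = (uint32_t)index;
        return bp;
    }
    index -= 12;
    if (index < p) {               /* single indirect block */
        bp.level = MAP_SINGLE;
        bp.slot[0] = (uint32_t)index;
        return bp;
    }
    index -= p;
    if (index < p * p) {           /* double indirect block */
        bp.level = MAP_DOUBLE;
        bp.slot[0] = (uint32_t)(index / p);
        bp.slot[1] = (uint32_t)(index % p);
        return bp;
    }
    index -= p * p;
    if (index < p * p * p) {       /* triple indirect block */
        bp.level = MAP_TRIPLE;
        bp.slot[0] = (uint32_t)(index / (p * p));
        bp.slot[1] = (uint32_t)((index / p) % p);
        bp.slot[2] = (uint32_t)(index % p);
    }
    return bp;                     /* MAP_NONE if the index is too large */
}
```

With the group known, the inode's byte offset in the image is the group's inode table block number times the block size, plus offset_in_table.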

As with the previous part, relevant details about the ext2 file system can be found here:

• https://wiki.osdev.org/Ext2
• https://www.nongnu.org/ext2-doc/ext2.html

Note that the file ext2test.c, described in the previous part, already has some
preliminary tests for the functions above.

Files required: ext2file.c


PA3.3. Reading an ext2 disk image: directories and path resolution
This part of the assignment builds on your work from the first two parts of the
assignment (blocks and inodes).

Part 3: Directories and Path Resolution


The goal of this part is to be able to deal with a special type of file: directories. More
specifically, you must implement the
functions next_directory_entry, find_file_in_directory,
and find_file_from_path in ext2dir.c.

The function next_directory_entry is intended to be called in a loop, allowing the caller
to navigate through all files of a directory. This function will return all entries in the
directory, one at a time. In each call, the offset argument is modified so that, when the
next call is received, the function is able to proceed from where it ended before. Two
examples of how this function is expected to be called are found in helper functions
in ext2test.c.
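
The offset-based iteration described above can be sketched on a single directory block held in memory. The names dirent_sketch and sketch_next_entry are hypothetical, and the signature differs from the next_directory_entry prototype in ext2dir.c. The sketch assumes the directory entry layout with a one-byte name length and a one-byte file type, as described in the references.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded entry; the on-disk layout is inode (4 bytes),
 * record length (2), name length (1), file type (1), then the name. */
struct dirent_sketch {
    uint32_t inode;
    uint8_t  file_type;
    char     name[256];   /* names are at most 255 bytes, plus terminator */
};

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the entry at *offset inside `block` and advance *offset past it.
 * Returns 1 if an entry was stored in `out`, 0 when no entries remain,
 * and -1 if the data is malformed. Unused entries (inode 0) are skipped. */
int sketch_next_entry(const uint8_t *block, uint32_t block_len,
                      uint32_t *offset, struct dirent_sketch *out)
{
    while (*offset <= block_len && block_len - *offset >= 8) {
        const uint8_t *e = block + *offset;
        uint32_t inode = le32(e);
        uint32_t rec_len = (uint32_t)e[4] | ((uint32_t)e[5] << 8);
        uint8_t name_len = e[6];

        /* The record must hold its header and name, and fit in the block. */
        if (rec_len < 8u + name_len || rec_len > block_len - *offset)
            return -1;

        *offset += rec_len;        /* the next call resumes after this entry */
        if (inode == 0)
            continue;              /* unused slot: keep looking */

        out->inode = inode;
        out->file_type = e[7];
        memcpy(out->name, e + 8, name_len);
        out->name[name_len] = '\0';
        return 1;
    }
    return 0;
}
```

A caller would start with an offset of 0 and call the function repeatedly until it stops returning 1. A full implementation also has to move from one block of the directory file to the next, which this sketch leaves out.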

The function find_file_from_path performs the path resolution process within the file
system. In particular, this function will split a path into components (file names and
directory names); for each component the function will find the directory entry for the
individual name (you may use find_file_in_directory for that purpose), then find the
inode associated to that entry. For example, if the path is /FOO/BAR/BUZZ.TXT, the
function will:

• read the inode for the root directory (see read_inode; the inode number for this
directory is represented by the constant EXT2_ROOT_INO);
• visit the entries in the root directory until it finds one whose name is FOO,
keeping track of the inode number (see find_file_in_directory);
• find and read the inode for the directory FOO;
• assuming FOO is a directory, visit the entries in the content of this directory until
it finds one whose name is BAR;
• find and read the inode for the directory BAR;
• assuming BAR is a directory, visit the entries in the content of this directory until
it finds one whose name is BUZZ.TXT;
• find and read the inode for the file, returning this inode.
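
The splitting step of the process above can be sketched on its own, without touching the file system. The function next_path_component is a hypothetical helper, not part of the provided code; a real find_file_from_path would look up each component it returns in the current directory.

```c
#include <stddef.h>
#include <string.h>

/* Copy the next component of *path into `out` and advance *path past it.
 * Returns 1 if a component was produced, 0 if none remain, and -1 if the
 * component does not fit in `out`. Repeated slashes are skipped. */
int next_path_component(const char **path, char *out, size_t out_size)
{
    const char *p = *path;

    while (*p == '/')              /* skip leading separators */
        p++;
    if (*p == '\0') {
        *path = p;
        return 0;
    }

    size_t len = strcspn(p, "/");  /* length up to the next '/' or the end */
    if (len >= out_size)
        return -1;

    memcpy(out, p, len);
    out[len] = '\0';
    *path = p + len;
    return 1;
}
```

Called in a loop on /FOO/BAR/BUZZ.TXT, this yields FOO, then BAR, then BUZZ.TXT, and then reports that no components remain.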

Note that the file ext2test.c, described in the previous part, already has some
preliminary tests for the functions above.

Files required: ext2dir.c

PA3.4. Reading an ext2 disk image: symbolic link

This part of the assignment builds on your work from the first two parts of the
assignment (blocks and inodes). It is not directly related to part 3 (directories), though
completing part 3 before this part may help with the testing process.

Part 4: Symbolic Links


The goal of this part is to be able to deal with another special type of file: symbolic links.
More specifically, you must implement the
function read_symlink_target in ext2symlink.c.

Symbolic links store, as their only information, a string corresponding to the target path
of the link. Given that this path is typically short, and using a data block for this kind of
storage is often wasteful, these files often have a special type of storage, as described
in https://www.nongnu.org/ext2-doc/ext2.html#def-symbolic-links. In essence, if the
target is shorter than 60 bytes, the target is stored in the inode itself, in the
space typically reserved for data block numbers. Note that the structs in ext2.h use
a union to allow the same set of bytes to be used for more than one purpose.
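
A minimal sketch of the inline case follows. The struct inode_sketch is a hypothetical, cut-down stand-in for the inode struct in ext2.h, keeping only the file size and the 60 bytes that hold either block numbers or the link target; sketch_inline_symlink is likewise an invented name and not the read_symlink_target prototype.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced inode: only the fields this sketch needs. */
struct inode_sketch {
    uint32_t size;                 /* length of the link target in bytes */
    union {
        uint32_t block[15];        /* data block numbers (15 * 4 = 60 bytes) */
        char symlink_target[60];   /* or the target itself, if short enough */
    } u;
};

/* Returns 1 if the target is stored in the inode and was copied to `out`
 * with a terminator, 0 if it lives in data blocks (the caller must read
 * them instead), and -1 if `out` is too small. */
int sketch_inline_symlink(const struct inode_sketch *ino, char *out,
                          size_t out_size)
{
    if (ino->size >= 60)
        return 0;                  /* stored in regular data blocks */
    if (out_size <= ino->size)
        return -1;                 /* no room for target plus terminator */

    memcpy(out, ino->u.symlink_target, ino->size);
    out[ino->size] = '\0';         /* the on-disk target is not terminated */
    return 1;
}
```

When the function returns 0, the target would be read like the content of any other file, one data block at a time.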

Note that the file ext2test.c, described in the previous parts, already has some
preliminary tests for the functions above.

Files required: ext2symlink.c

Implementing an ext2 file system

PA3.5. Implementing an ext2 file system in userspace

This part of the assignment builds on your work from the first four parts of the
assignment.

Part 5: A Virtual File System


This part of the assignment aims to give you a better understanding of how a file system
actually works. You will use the FUSE (Filesystem in Userspace) library to create a
functional file system that will allow a user to read the data in the file system using tools
provided by the operating system.

This part of this assignment may not work at all in non-Linux environments, including
Mac OS X. If you are testing your code in your own Linux environment, you may need to
install package libfuse-dev (or its equivalent in your distribution).

FUSE allows a particular user-level application to act as a real file system mounted on
the system. When a FUSE-based file system is mounted, any system call received by the
kernel related to actions in files inside the mounted file system will trigger calls to
predetermined functions associated to those actions in your application. So, for
example, when a user lists the contents of a directory inside the mounted file system,
the kernel will call the function associated to the readdir operation. When a user reads
data from an open file, the kernel will call the function associated to the read operation,
and will use its result as the result of the read operation.

The provided file ext2fs.c contains the skeleton of a FUSE-based file
system that uses an ext2 volume file as the underlying data structure. This skeleton
provides support for the following operations:

• init: this operation is called automatically when the file system is first mounted.
• destroy: this operation is called automatically just before the file system is un-mounted.
• getattr: read the metadata of a file (similar to the stat command, or the stat system
call).
• readdir: lists the entries of a directory.
• read: reads the content at a specific position of a file (similar to an fseek followed
by fread, or a call to pread).
• readlink: reads the target of a symbolic (soft) link.

In this part of the assignment you are responsible for implementing the operations
above (except for init and destroy, which are already implemented). The comments
provided in the file should give you enough information to implement the operations
successfully, but if you require additional information, you can find it
at http://libfuse.github.io/doxygen/. The page on fuse operations may be
particularly useful.
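
As one possible way to think about the read operation, the sketch below shows only the arithmetic: how many bytes a request can return given the file size, and how a request that starts at an arbitrary position breaks into per-block pieces. The names clamp_read, chunk and next_chunk are hypothetical and not part of the skeleton or of FUSE; fetching the data for each piece is left to the functions from the earlier parts.

```c
#include <stdint.h>

/* Number of bytes a read of `size` bytes at `offset` can return from a
 * file of `file_size` bytes: nothing at or past the end of the file, and
 * never more than what remains. */
uint64_t clamp_read(uint64_t file_size, uint64_t offset, uint64_t size)
{
    if (offset >= file_size)
        return 0;
    uint64_t remaining = file_size - offset;
    return size < remaining ? size : remaining;
}

/* The piece of a read that falls inside a single file block. */
struct chunk {
    uint64_t block_index;   /* which block of the file */
    uint32_t start;         /* first byte to copy within that block */
    uint32_t len;           /* number of bytes to copy from that block */
};

/* Describe the first piece of a read of `remaining` bytes that starts at
 * file position `offset`. The caller then advances `offset` by the
 * returned len, reduces `remaining` by it, and repeats until done. */
struct chunk next_chunk(uint64_t offset, uint64_t remaining,
                        uint32_t block_size)
{
    struct chunk c;
    c.block_index = offset / block_size;
    c.start = (uint32_t)(offset % block_size);
    uint32_t room = block_size - c.start;   /* bytes left in this block */
    c.len = remaining < room ? (uint32_t)remaining : room;
    return c;
}
```

For example, a 4096-byte request at position 1000 of a 3000-byte file with 1024-byte blocks is clamped to 2000 bytes, which split into 24 bytes from block 0, 1024 bytes from block 1 and 952 bytes from block 2.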

Testing your file system


To run the file system above, you must use the following command:

./ext2fs -f -s -o use_ino mountpoint file.img


where mountpoint is the directory where you will mount your file system (details below),
and file.img is the file containing the ext2 volume you are mounting. The command-
line option -f indicates that the file system should run in the foreground (details below).
The command-line option -s is used to indicate your file system will run single-
threaded, i.e., only one file operation will be called at a time. You can also use the
option -d for additional debugging information.

The command-line option -o is used to set additional options. In the example above, it
is used to add the use_ino option (see below). You are welcome to explore further
options (such as allow_other, direct_io, max_read or nonempty; see man 8 mount.fuse for
more details), but you are not required to support them, and your final implementation
should not rely on these options being set.

By default, FUSE will generate inode numbers for any new file it is made aware of. If,
however, you use the use_ino option, FUSE will use the inode number provided by
the getattr function instead of the one generated by FUSE. Your code must support
the use_ino option, i.e., it must set the st_ino field for the struct stat buffer
in ext2_getattr to the inode number used in the file system.
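
The st_ino requirement can be sketched as follows. The function fill_stat_sketch and its parameter list are hypothetical; the real ext2_getattr in the skeleton has its own signature and obtains these values from the inode it reads. The sketch assumes a Linux host, where the file type and permission bits stored in an ext2 inode use the same encoding as st_mode.

```c
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: copy a few inode fields into a struct stat. */
void fill_stat_sketch(struct stat *st, uint32_t ino, uint16_t mode,
                      uint32_t uid, uint32_t gid, uint32_t size,
                      uint16_t links)
{
    memset(st, 0, sizeof *st);   /* fields not set below stay zero */
    st->st_ino = ino;            /* what the use_ino option reports */
    st->st_mode = mode;          /* type and permission bits, as stored */
    st->st_nlink = links;
    st->st_uid = uid;
    st->st_gid = gid;
    st->st_size = size;
}
```

With use_ino set, the inode column of ls -i on the mountpoint should then show the numbers stored in the image rather than numbers generated by FUSE.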

The mountpoint is the directory where you will mount your file system. Once you mount
it, any access to a file or directory within the mountpoint will automatically be redirected
to the file system. So, for example, if you mount a file system in /tmp/myfs, and run the
command:

ls -ali /tmp/myfs
the command will request the directory listing from the kernel, and the kernel will
request the directory listings (readdir operation) from the file system, with the path /. If
you run the command:

cat /tmp/myfs/sp.c
the kernel will call operations getattr (to confirm the file exists and is a regular file)
and read in the file system, passing the path /sp.c as an argument to each operation.

Choosing your mountpoint

You can choose almost any empty directory to be your mountpoint. However, there are
some restrictions. First of all, you can only mount one file system in each directory. If
you choose a mountpoint inside your home directory on the department's Linux
computers, this directory must be readable/executable by everybody, and every
directory in its path must be executable. So, for example, if you choose the
mountpoint /cs/home/jonatan/eecs3221/pa3/mp, you can run the following commands to
provide access to the directory:

chmod a+x /cs/home/jonatan /cs/home/jonatan/eecs3221 /cs/home/jonatan/eecs3221/pa3
chmod a+rx /cs/home/jonatan/eecs3221/pa3/mp

Alternatively, you can choose a mountpoint in a folder outside of your home directory,
like a folder in /tmp (not /tmp itself!!!). In this case, if using a shared computer like the
remote labs or the red servers in the department, make sure to choose a unique name
that won't conflict with other students using the same computer. A directory with your
account may be a suitable option (e.g., /tmp/jonatan).

You can run your file system in the foreground (with the -f option) or in the background
(without it). If you run it in the background, the file system will be created, and the
command shell will be restored for you, so you can use the same terminal for testing.
The disadvantage, though, is that you won't have access to the standard output and
debugging messages. If you run your file system in the foreground, the program will
stay running in the terminal, and will wait there for operations to happen in other
terminals (it won't accept any input, so you should not type anything in that terminal).
You will need to open a new terminal to test your file system. It is recommended that
you run your system in the foreground, as this provides more flexibility to debug your
code and see it running. As mentioned above, using the -d option can also help with
debugging.

After testing your program, to unmount and close the file system, you have two options.
One is to hit Ctrl-C in your file system terminal, if you are running it in the foreground.
The other is to run:

fusermount -u mountpoint
replacing mountpoint with the mountpoint of your file system. If your file system crashed
because of a bug (e.g., segmentation fault), you must still unmount the file system using
the command above before you are able to reuse the same mountpoint.

To test your file system, you can use commands like:

stat FILE
ls FILE
cat FILE
readlink FILE
or any other command you use to read files, or get information about them.

Note that mountpoints are computer-specific, so they only work if you access the file
from the same computer where you mounted the file system. This means that, for example, if
you mount a file system on red.eecs.yorku.ca, you will only have access to the file
on red.eecs.yorku.ca, even if red2.eecs.yorku.ca has the same directory. If you open a
separate terminal to test your code, make sure you run it on the same server.

If you change into the mountpoint directory using a command like cd, note that you will
not be able to cleanly unmount the file system until you leave that directory.

Additional testing
The file extList.sh is a BASH script that:

• does a recursive directory listing of the file system with the command ls -lanRi. This
command essentially checks the basic functionality of your code with respect to the
location of inodes and directories.
• runs the command sha256sum on each regular file. This command essentially checks the
functionality of your code with respect to retrieving the data blocks of a file.
• runs the stat command on every file, regardless of type, in the file system.

In part 1 you received a set of image files for testing. These files are repeated below. For
each provided disk image, though, there is an equivalent listing file, which contains the
output of extList.sh when run on the corresponding disk image. Consequently, you
can check your program by running the extList.sh script on the FUSE-mounted file
system served by your solution. The steps you need to follow are:

1. Run your solution as follows:

./ext2fs -f -s -o use_ino mountpoint XXX.img

where mountpoint is a mountpoint directory and XXX.img is an ext2 file system image;

2. Run the shell script:

./extList.sh mountpoint > listing.txt

3. Compare the listing file against that provided:

diff -b listing.txt XXX_Listing.txt

If your program is working properly, the only difference you should see is in the ".." entry
for the root of the mounted file system. As long as the numbers on that line are the only
difference, things are working as expected.

• ext2_R1_1024.img (listing)
• Sparse.img (listing)
• LargeUID_file.img (listing)

Files required: ext2fs.c
