Part 1: File System Information: PA3.1. Reading An Ext2 Disk Image: Basic Information and Blocks
The assignment is divided into five parts. The division is meant to help you with time
management while working on the assignment, and to ensure you make steady
progress on it instead of leaving it to the last minute.
• Unless otherwise specified, the course's regular policies and procedures, as listed
on the syllabus, apply to this assignment.
• To be marked, the program must compile and run on a Linux environment. In
particular, note that parts of the provided code may not work at all in other
environments, including Mac OS X. If you are testing your code in your own
Linux environment, you may need to install package libfuse-dev (or its
equivalent in your distribution).
Most of the information above can be read, or computed, from the information stored
in the superblock (located 1024 bytes from the start of the image) or the group
descriptor table (located in the block that follows the superblock) of the file system
volume file. Relevant details about the ext2 file system can be found here:
• https://wiki.osdev.org/Ext2
• https://www.nongnu.org/ext2-doc/ext2.html
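The two locations mentioned above can be turned into byte offsets with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical and are not part of the provided ext2.c. It assumes the usual ext2 rules that the block size is 1024 shifted left by s_log_block_size, and that the group descriptor table starts in the block after the one numbered s_first_data_block.

```c
#include <assert.h>
#include <stdint.h>

/* The superblock always starts 1024 bytes into the image. */
#define SKETCH_SUPERBLOCK_OFFSET 1024u

/* Block size in bytes, given the s_log_block_size field of the superblock. */
uint32_t sketch_block_size(uint32_t s_log_block_size) {
    assert(s_log_block_size < 22);   /* keeps the shift within 32 bits */
    return 1024u << s_log_block_size;
}

/* Byte offset of the group descriptor table: it starts in the block right
 * after the block that holds the superblock (s_first_data_block). */
uint64_t sketch_gdt_offset(uint32_t s_first_data_block, uint32_t block_size) {
    return ((uint64_t)s_first_data_block + 1) * block_size;
}
```

With 1024-byte blocks, s_first_data_block is 1, so the table starts at byte 2048. With 4096-byte blocks, s_first_data_block is 0 and the table starts at byte 4096.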
Skeleton code is provided in ext2.zip. Note that this file contains starting code for all
parts of this assignment, as well as a testing file.
In this part of the assignment you will need to implement the following functions
in ext2.c: open_volume_file, close_volume_file, and read_block (as well as any
additional helper functions you may see fit).
Note that the function prototypes in the provided code are built in a manner that is
expected to be helpful for other parts of the assignment, as well as to simplify
automated testing, so you must not make any changes in function prototypes and
expected behaviour. Note in particular that your code will be tested with a different
main function (not provided) that includes additional tests beyond those that are listed
in the provided test files, so you should ensure that your code works for generic cases,
not only for those listed in the test file.
You are allowed to make some assumptions: you may assume that your code is running
on a little-endian CPU, which allows you to read the integer values in the image file in their
raw format. You are also allowed to assume that the file system uses version 1.0 of the
ext2 specification, and that it follows the specific format described for the Linux
environment, in particular regarding the use of 32-bit values for UID and GID. Note that
these assumptions, combined with the fact that some structs in ext2.h follow the ext2
specification exactly, should allow you to read the full superblock (as well as other data
structures like group descriptor tables and inodes) in a single read operation.
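As a rough illustration of reading a structure in a single operation, the sketch below reads the first 60 bytes of the superblock straight into a struct. The struct and function names are hypothetical stand-ins, not the definitions from ext2.h; the sketch assumes a little-endian host and uses only the leading superblock fields listed in the ext2 specification.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the first 60 bytes of the superblock struct in
 * ext2.h. Field order follows the ext2 specification; the signedness of a
 * few fields is simplified here. */
struct sketch_sb_prefix {
    uint32_t s_inodes_count;
    uint32_t s_blocks_count;
    uint32_t s_r_blocks_count;
    uint32_t s_free_blocks_count;
    uint32_t s_free_inodes_count;
    uint32_t s_first_data_block;
    uint32_t s_log_block_size;
    uint32_t s_log_frag_size;
    uint32_t s_blocks_per_group;
    uint32_t s_frags_per_group;
    uint32_t s_inodes_per_group;
    uint32_t s_mtime;
    uint32_t s_wtime;
    uint16_t s_mnt_count;
    uint16_t s_max_mnt_count;
    uint16_t s_magic;
    uint16_t s_state;
};

/* The struct must match the on-disk layout byte for byte. */
_Static_assert(sizeof(struct sketch_sb_prefix) == 60, "unexpected padding");

/* Reads the superblock prefix with one fread; returns 0 on success, -1 on error. */
int sketch_read_sb_prefix(FILE *image, struct sketch_sb_prefix *out) {
    assert(image != NULL && out != NULL);
    if (fseek(image, 1024L, SEEK_SET) != 0)
        return -1;
    if (fread(out, sizeof *out, 1, image) != 1)
        return -1;
    return 0;
}
```

On a valid ext2 image, the s_magic field read this way should hold the value 0xEF53.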
One topic that is not clearly described in the reference links above is sparse files.
In ext2 (and other similar file systems), if the block number associated with a
particular location in the file is 0 (zero), then no block is allocated for the region in
question, and all data bytes for that block are considered to be zero. This allows the file
system to store very large files with sparse content (e.g., a 13 GB file with only a handful
of non-zero blocks) by only using blocks for regions of the file that actually contain non-
zero data. You should ensure your implementation applies this information
appropriately, i.e., if the block number being read is zero, then all data returned for that
block is zero.
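The rule for block number zero can be sketched as follows. This is a minimal illustration only: the function name and parameters are hypothetical and do not match the read_block prototype in the provided ext2.c, which you must not change.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the read_block prototype from ext2.c.
 * Fills buf with block_size bytes; returns 0 on success, -1 on error. */
int sketch_read_block(FILE *image, uint32_t block_no, uint32_t block_size,
                      void *buf) {
    assert(image != NULL && buf != NULL && block_size > 0);
    if (block_no == 0) {
        /* Hole in a sparse file: no block is allocated, so the data is all zeros. */
        memset(buf, 0, block_size);
        return 0;
    }
    /* long is 64 bits on typical 64-bit Linux; fseeko would be safer elsewhere. */
    if (fseek(image, (long)block_no * (long)block_size, SEEK_SET) != 0)
        return -1;
    if (fread(buf, 1, block_size, image) != block_size)
        return -1;
    return 0;
}
```

Note that the zero case never touches the image file, so it returns zeros regardless of what is stored in the first block of the volume.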
Additional testing
The file ext2test.c contains some preliminary tests that should allow you to verify the
basic functionality of your code, particularly the first four parts of the assignment. Some
of the tests are based on files and resources that are available in some of the sample
files below, but not in others, so you should not presume that an error in these tests
necessarily means that your code is wrong. You are strongly encouraged to add further
tests to ensure your code works as expected.
The test program takes as an argument the name of the volume file containing the file
system. This functionality is provided, so make sure you don't break it. If you would like
to implement additional arguments for testing purposes, you must make sure that
calling the program with a single argument still works as expected.
A sample of ext2 image files is provided below for testing. Different files have different
characteristics (different files, sizes, block sizes, etc.).
• ext2_R1_1024.img
• Sparse.img
• LargeUID_file.img
As with the previous part, relevant details about the ext2 file systems can be found here:
• https://wiki.osdev.org/Ext2
• https://www.nongnu.org/ext2-doc/ext2.html
Note that the file ext2test.c, described in the previous parts, already has some
preliminary tests for the functions above.
The function find_file_from_path performs the path resolution process within the file
system. In particular, this function will split a path into components (file names and
directory names); for each component the function will find the directory entry for the
individual name (you may use find_file_in_directory for that purpose), then find the
inode associated with that entry. For example, if the path is /FOO/BAR/BUZZ.TXT, the
function will:
• read the inode for the root directory (see read_inode; the inode number for this
directory is represented by the constant EXT2_ROOT_INO);
• visit the entries in the root directory until it finds one whose name is FOO,
keeping track of the inode number (see find_file_in_directory);
• find and read the inode for the directory FOO;
• assuming FOO is a directory, visit the entries in the content of this directory until
it finds one whose name is BAR;
• find and read the inode for the directory BAR;
• assuming BAR is a directory, visit the entries in the content of this directory until
it finds one whose name is BUZZ.TXT;
• find and read the inode for the file, returning this inode.
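The steps above can be sketched as a loop over path components. The example below is only an illustration under stated assumptions: it works on a small in-memory table instead of a real volume, toy_lookup is a hypothetical stand-in for find_file_in_directory, the inode numbers 12 to 14 are made up, and it does not check that intermediate entries are directories. The only value taken from ext2 is the root inode number, 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ROOT_INO 2u   /* EXT2_ROOT_INO is 2 in the ext2 specification */

/* Toy stand-in for on-disk directories: (parent inode, name) -> child inode. */
struct toy_entry { uint32_t dir_ino; const char *name; uint32_t ino; };
static const struct toy_entry toy_entries[] = {
    { 2,  "FOO",      12 },
    { 12, "BAR",      13 },
    { 13, "BUZZ.TXT", 14 },
};

/* Hypothetical stand-in for find_file_in_directory; returns 0 if not found. */
static uint32_t toy_lookup(uint32_t dir_ino, const char *name, size_t len) {
    for (size_t i = 0; i < sizeof toy_entries / sizeof toy_entries[0]; i++) {
        const struct toy_entry *e = &toy_entries[i];
        if (e->dir_ino == dir_ino && strlen(e->name) == len &&
            memcmp(e->name, name, len) == 0)
            return e->ino;
    }
    return 0;
}

/* Walks the path one component at a time, starting at the root directory.
 * Returns the inode number of the last component, or 0 if one is missing. */
uint32_t sketch_resolve(const char *path) {
    assert(path != NULL);
    uint32_t ino = SKETCH_ROOT_INO;
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/') p++;            /* skip separators */
        if (*p == '\0') break;
        size_t len = strcspn(p, "/");     /* length of this component */
        ino = toy_lookup(ino, p, len);
        if (ino == 0) return 0;           /* component not found */
        p += len;
    }
    return ino;
}
```

In this toy table, resolving /FOO/BAR/BUZZ.TXT visits inodes 2, 12 and 13 before returning 14, and resolving / returns the root inode itself.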
Note that the file ext2test.c, described in the previous part, already has some
preliminary tests for the functions above.
This part of the assignment builds on your work from the first two parts of the
assignment (blocks and inodes). It is not directly related to part 3 (directories), though
completing part 3 before this part may help with the testing process.
Symbolic links store, as their only information, a string corresponding to the target path
of the link. Given that this path is typically short, and using a data block for this kind of
storage is often wasteful, these files often have a special type of storage, as described
in https://www.nongnu.org/ext2-doc/ext2.html#def-symbolic-links. In essence, if the
target is shorter than 60 bytes, the target is stored in the inode itself, in the
space typically reserved for data block numbers. Note that the structs in ext2.h use
a union to allow the same set of bytes to be used for more than one purpose.
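The idea of sharing the same bytes between block numbers and a short link target can be sketched as follows. The struct, its field names and the function are hypothetical and much smaller than the inode struct in ext2.h; the sketch only covers the case where the target is stored in the inode, and reports an error otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced inode: only the size and the 60 bytes that hold
 * either 15 block numbers or a short symbolic link target. */
struct sketch_inode {
    uint32_t i_size;
    union {
        uint32_t i_block[15];
        char     i_symlink_target[60];
    };
};

/* Copies an in-inode link target to out and adds a terminating NUL.
 * Returns the target length, or -1 if the target is not stored in the
 * inode (60 bytes or more) or does not fit in out. */
int sketch_read_fast_symlink(const struct sketch_inode *ino, char *out,
                             size_t cap) {
    assert(ino != NULL && out != NULL);
    if (ino->i_size >= 60)
        return -1;                      /* target lives in a data block */
    if (cap <= ino->i_size)
        return -1;                      /* no room for the NUL terminator */
    memcpy(out, ino->i_symlink_target, ino->i_size);
    out[ino->i_size] = '\0';
    return (int)ino->i_size;
}
```

The target stored in the inode is not necessarily NUL-terminated, which is why the sketch relies on i_size and adds the terminator itself.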
Note that the file ext2test.c, described in the previous parts, already has some
preliminary tests for the functions above.
This part of the assignment builds on your work from the first four parts of the
assignment.
This part of this assignment may not work at all in non-Linux environments, including
Mac OS X. If you are testing your code in your own Linux environment, you may need to
install package libfuse-dev (or its equivalent in your distribution).
FUSE allows a particular user-level application to act as a real file system mounted on
the system. When a FUSE-based file system is mounted, any system call received by the
kernel related to actions in files inside the mounted file system will trigger calls to
predetermined functions associated to those actions in your application. So, for
example, when a user lists the contents of a directory inside the mounted file system,
the kernel will call the function associated to the readdir operation. When a user reads
data from an open file, the kernel will call the function associated to the read operation,
and will use its result as the result of the read operation.
In the provided file ext2fs.c, you are provided with the skeleton of a FUSE-based file
system that uses an ext2 volume file as the underlying data structure. This skeleton
provides support for the following operations:
• init: this operation is called automatically when the file system is first mounted.
• destroy: this operation is called automatically just before the file system is un-mounted.
• getattr: read the metadata of a file (similar to the stat command, or the stat system
call).
• readdir: lists the entries of a directory.
• read: reads the content at a specific position of a file (similar to an fseek followed
by fread, or a call to pread).
• readlink: reads the target of a symbolic (soft) link.
In this part of the assignment you are responsible for implementing the operations
above (except for init and destroy, which are already implemented). The comments
provided in the file should give you enough information to implement the operations
successfully, but if you require additional information, you can find it
at http://libfuse.github.io/doxygen/. In particular, the page on fuse operations may be
particularly useful.
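For the read operation, a request for a position and a length has to be mapped to blocks of the file and limited to the file size. The sketch below shows only that arithmetic. The helper names are hypothetical, and it assumes the same behaviour as pread: fewer bytes are returned near the end of the file, and zero bytes at or past it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes a read of `want` bytes at `offset` can return for a file
 * of `file_size` bytes: 0 at or past end of file, else at most the remainder. */
size_t sketch_read_length(uint64_t file_size, uint64_t offset, size_t want) {
    if (offset >= file_size)
        return 0;
    uint64_t remaining = file_size - offset;
    return remaining < want ? (size_t)remaining : want;
}

/* Index, within the file, of the block that contains byte `offset`. */
uint64_t sketch_block_index(uint64_t offset, uint32_t block_size) {
    assert(block_size > 0);
    return offset / block_size;
}

/* Position of byte `offset` inside its block. */
uint32_t sketch_block_offset(uint64_t offset, uint32_t block_size) {
    assert(block_size > 0);
    return (uint32_t)(offset % block_size);
}
```

For example, with 1024-byte blocks, byte 2500 of a file lies in the block with index 2 of that file, at position 452 within the block.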
The command-line option -o is used to set additional options. In the example above, it
is used to add the use_ino option (see below). You are welcome to explore further
options (such as allow_other, direct_io, max_read or nonempty; see man 8 mount.fuse for
more details), but you are not required to support them, and your final implementation
should not rely on these options being set.
By default, FUSE will generate inode numbers for any new file it is made aware of. If,
however, you use the use_ino option, FUSE will use the inode number provided by
the getattr function instead of the one generated by FUSE. Your code must support
the use_ino option, i.e., it must set the st_ino field for the struct stat buffer
in ext2_getattr to the inode number used in the file system.
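A minimal sketch of filling the struct stat buffer is shown below. It is not the ext2_getattr function from ext2fs.c: the helper name and its parameters are hypothetical and stand in for values you would read from the ext2 inode. It assumes a Linux host, where the file type and permission bits of the ext2 i_mode field have the same encoding as st_mode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: copies a few inode fields into a struct stat.
 * The parameters stand in for values read from the ext2 inode. */
void sketch_fill_stat(struct stat *st, uint32_t inode_number, uint16_t i_mode,
                      uint16_t i_links_count, uint32_t uid, uint32_t gid,
                      uint64_t size) {
    assert(st != NULL);
    memset(st, 0, sizeof *st);
    st->st_ino   = inode_number;   /* reported to the user when use_ino is set */
    st->st_mode  = i_mode;         /* file type and permission bits */
    st->st_nlink = i_links_count;
    st->st_uid   = uid;
    st->st_gid   = gid;
    st->st_size  = (off_t)size;
}
```

A complete implementation would also fill in the remaining fields, such as timestamps and block counts, which are left at zero here.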
The mountpoint is the directory where you will mount your file system. Once you mount
it, any access to a file or directory within the mountpoint will automatically be redirected
to the file system. So, for example, if you mount a file system in /tmp/myfs, and run the
command:
ls -ali /tmp/myfs
the command will request the directory listing from the kernel, and the kernel will
request the directory listings (readdir operation) from the file system, with the path /. If
you run the command:
cat /tmp/myfs/sp.c
the kernel will call operations getattr (to confirm the file exists and is a regular file)
and read in the file system, passing the path /sp.c as an argument to each operation.
Choosing your mountpoint
You can choose almost any empty directory to be your mountpoint. However, there are
some restrictions. First of all, you can only mount one file system in each directory. If
you choose a mountpoint inside your home directory on the department Linux
computers, this directory must be readable/executable by everybody, and every
directory in its path must be executable. So, for example, if you choose the
mountpoint /cs/home/jonatan/eecs3221/pa3/mp, you can run the following commands to
provide access to the directory:
Alternatively, you can choose a mountpoint in a folder outside of your home directory,
like a folder in /tmp (not /tmp itself!!!). In this case, if using a shared computer like the
remote labs or the red servers in the department, make sure to choose a unique name
that won't conflict with other students using the same computer. A directory with your
account may be a suitable option (e.g., /tmp/jonatan).
You can run your file system in the foreground (with the -f option) or in the background
(without it). If you run it in the background, the file system will be created, and the
command shell will be restored for you, so you can use the same terminal for testing.
The disadvantage, though, is that you won't have access to the standard output and
debugging messages. If you run your file system in the foreground, the program will
stay running in the terminal, and will wait there for operations to happen in other
terminals (it won't accept any input, so you should not type anything in that terminal).
You will need to open a new terminal to test your file system. It is recommended that
you run your system in the foreground, as this provides more flexibility to debug your
code and see it running. As mentioned above, using the -d option can also help with
debugging.
After testing your program, to unmount and close the file system, you have two options.
One is to hit Ctrl-C in your file system terminal, if you are running it in the foreground.
The other is to run:
fusermount -u mountpoint
replacing mountpoint with the mountpoint of your file system. If your file system crashed
because of a bug (e.g., segmentation fault), you must still unmount the file system using
the command above before you are able to reuse the same mountpoint.
stat FILE
ls FILE
cat FILE
readlink FILE
or any other command you use to read files, or get information about them.
Note that mountpoints are computer-specific, so they only work if you access the files
from the same computer where you mounted the file system. This means that, for example, if
you mount a file system on red.eecs.yorku.ca, you will only have access to it
on red.eecs.yorku.ca, even if red2.eecs.yorku.ca has the same directory. If you open a
separate terminal to test your code, make sure you run it on the same server.
If you change into the mountpoint directory using a command like cd, note that you will
not be able to cleanly unmount the file system until you leave that directory.
Additional testing
The file extList.sh is a BASH script that:
• does a recursive directory listing of the file system with the command ls -lanRi. This
command essentially checks the basic functionality of your code with respect to the
location of inodes and directories.
• runs the command sha256sum on each regular file. This command essentially checks the
functionality of your code with respect to retrieving the data blocks of a file.
• runs the stat command on every file, regardless of type, in the file system.
In part 1 you received a set of image files for testing. These files are repeated below. For
each provided disk image, though, there is an equivalent listing file, which contains the
output of extList.sh when run on the corresponding disk image. Consequently, you
can check your program by running the extList.sh script on your solution's FUSE-mounted
file system. The steps you need to follow are:
where mountpoint is a mountpoint directory and XXX.img is an ext2 file system image;
• ext2_R1_1024.img (listing)
• Sparse.img (listing)
• LargeUID_file.img (listing)