OS - Unhsgeit IV

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

FILE MANAGEMENT

A file is a collection of related information that is recorded on secondary storage. Or file is a collection of
logically related entities. File management is one of the basic and important features of operating system.
Operating system is used to manage files of computer system. All the files with different extensions are
managed by operating system. File management is defined as the process of manipulating files in computer
system, it management includes the process of creating, modifying and deleting the files.
The following are some of the tasks performed by file management of operating system of any computer
system:
1. It helps to create new files in computer system and placing them at the specific locations.
2. It helps in easily and quickly locating these files in computer system.
3. It makes the process of sharing of the files among different users very easy and user friendly.
4. It helps to stores the files in separate folders known as directories. These directories help users to
search file quickly or to manage the files according to their types or uses.
5. It helps the user to modify the data of files or to modify the name of the file in the directories.
Properties of a File System
• Files are stored on disk or other storage and do not disappear when a user logs off.
• Files have names and are associated with access permission that permits controlled sharing.
• Files could be arranged or more complex structures to reflect the relationship between them.
File Attributes
A file has a name and data. Moreover, it also stores meta information like file creation date and time, current
size, last modified date, etc. All this information is called the attributes of a file system.
Here, are some important File attributes used in OS:
• Name: It is the only information stored in a human-readable form.
• Identifier: Every file is identified by a unique tag number within a file system known as an identifier.
• Location: Points to file location on device.
• Type: This attribute is required for systems that support various types of files.
• Size. Attribute used to display the current file size.
• Protection. This attribute assigns and controls the access rights of reading, writing, and executing the
file.
• Time, date and security: It is used for protection, security, and also used for monitoring
File Type
It refers to the ability of the operating system to differentiate various types of files like text files, binary, and
source files. However, Operating systems like MS_DOS and UNIX has the following type of files:
Character Special File
It is a hardware file that reads or writes data character by character, like mouse, printer, and more.
Ordinary files
• These types of files stores user information.
• It may be text, executable programs, and databases.
• It allows the user to perform operations like add, delete, and modify.
Directory Files
• Directory contains files and other related information about those files. Its basically a folder to hold
and organize multiple files. Special Files
• These files are also called device files. It represents physical devices like printers, disks, networks,
flash drive, etc.

Various Types of Files:


FILE TYPE USUAL EXTENSION FUNCTION
Read to run machine language
Executable exe, com, bin program

Compiled, machine language not


Object obj, o linked

Source Code C, java, pas, asm, a Source code in various languages

Commands to the command


Batch bat, sh interpreter

Text txt, doc Textual data, documents

Word
Processor wp, tex, rrf, doc Various word processor formats

Related files grouped into one


Archive arc, zip, tar compressed file

For containing audio/video


Multimedia mpeg, mov, rm information

Operations of File : Create, Open, Read, Write, Append, Truncate, Delete, Close
• Create file, find space on disk, and make an entry in the directory.
• Write to file, requires positioning within the file • Read from file involves positioning within the file
• Delete directory entry, regain disk space.
• Reposition: move read/write position.

File Access Methods


File access is a process that determines the way that files are accessed and read into memory. Generally, a
single access method is always supported by operating systems. Though there are some operating system
which also supports multiple access methods.
Three file access methods are:
• Sequential access
• Direct random access
• Index sequential access Sequential Access
In this type of file access method, records are accessed in a certain pre-defined sequence. In the sequential
access method, information stored in the file is also processed one by one. Most compilers access files using
this access method. Random Access
The random access method is also called direct random access. This method allow accessing the record
directly. Each record has its own address on which can be directly accessed for reading and writing.
Indexed Sequential Access
This type of accessing method is based on simple sequential access. In this access method, an index is built
for every file, with a direct pointer to different memory blocks. In this method, the Index is searched
sequentially, and its pointer can access the file directly. Multiple levels of indexing can be used to offer
greater efficiency in access. It also reduces the time needed to access a single record.
FILE ALLOCATION METHODS
In the Operating system, files are always allocated disk spaces.
Three types of space allocation methods are:
• Contiguous Allocation
• Linked Allocation
• Indexed Allocation
1. Contiguous Allocation: A single continuous set of blocks is allocated to a file at the time of file creation.
Thus, this is a pre-allocation strategy, using variable size portions. The file allocation table needs just
a single entry for each file, showing the starting block and the length of the file. This method is best
from the point of view of the individual sequential file. Multiple blocks can be read in at a time to
improve I/O performance for sequential processing. It is also easy to retrieve a single block. For
example, if a file starts at block b, and the ith block of the file is wanted, its location on secondary
storage is simply b+i-1.

Disadvantage
• External fragmentation will occur, making
it difficult to find contiguous blocks of space of
sufficient length. Compaction algorithm will be
necessary to free up additional space on disk.
• Also, with pre-allocation, it is necessary to
declare the size of the file at the time of creation.

Linked Allocation(Non-contiguous allocation) :


Allocation is on an individual block basis. Each block contains a pointer to the next block in the chain. Again
the file table needs just a single entry for each file, showing the starting block and the length of the file.
Although pre-allocation is possible, it is more common
simply to allocate blocks as needed. Any free block can be
added to the chain. The blocks need not be continuous.
Increase in file size is always possible if free disk block is
available. There is no external fragmentation because only
one block at a time is needed but there can be internal
fragmentation but it exists only in the last
disk block of file.
Disadvantage:
• Internal fragmentation exists in last disk block of file.
• There is an overhead of maintaining the pointer in every
disk block.
• If the pointer of any disk block is lost, the file will be truncated.
• It supports only the sequencial access of files.

3. Indexed Allocation:
It addresses many of the problems of contiguous and
chained allocation. In this case, the file allocation table
contains a separate one-level index for each file: The index
has one entry for each block allocated to the file. Allocation
may be on the basis of fixed-size blocks or variable-sized
blocks. Allocation by blocks eliminates external
fragmentation, whereas allocation by variable-size blocks
improves locality. This allocation technique supports both
sequential and direct access to the file and thus is the most
popular form of file allocation.

Disadvantages:
• The pointer overhead for indexed allocation is greater than linked allocation.
• For very small files, say files that expand only 2-3 blocks, the indexed allocation would keep one
entire block (index block) for the pointers which is inefficient in terms of memory utilization.
However, in linked allocation we lose the space of only 1 pointer per block.

File-System Layout
File Systems are stored on disks. The above figure depicts
a possible File-System Layout.
• MBR: Master Boot Record is used to boot the computer
• Partition Table: Partition table is present at the end of
MBR. This table gives the starting and ending addresses of
each partition.
• Boot Block: When the computer is booted, the BIOS
reads in and executes the MBR. The first thing the MBR
program does is locate the active partition, read in its first
block, which is called the boot block, and execute it. The
program in the boot block loads the operating system
contained in that partition. Every partition contains a boot
block at the beginning though it does not contain a
bootable operating system.
• Super Block: It contains all the key parameters about the file system
and is read into memory when the computer is booted or the file
system is first touched.
• Directory structure organizes the files
• Per-file File Control Block (FCB) contains many details about the file.
(Ref. Figure)
LAYERED FILE STRUCTURE
File systems organize storage on disk drives, and can be viewed as a layered design: o At the lowest
layer are the physical devices, consisting of the magnetic media, motors &
controls, and the electronics connected to them and controlling them. Modern
disk put more and more of the electronic controls directly on the disk drive itself,
leaving relatively little work for the disk controller card to perform. o I/O Control
consists of device drivers, special software programs (often written in assembly
) which communicate with the devices by reading and writing special codes
directly to and from memory addresses corresponding to the controller card's
registers. Each controller card ( device ) on a system has a different set of
addresses ( registers, a.k.a. ports ) that it listens to, and a unique set of command
codes and results codes that it understands. o The basic file system level works
directly with the device drivers in terms of retrieving and storing raw blocks of
data, without any consideration for what is in each block. Depending on the
system, blocks may be referred to with a single block number, ( e.g. block #
234234 ), or with head-sector-cylinder combinations. o The file organization
module knows about files and their logical blocks, and
how they map to physical blocks on the disk. In addition to translating from logical to physical
blocks, the file organization module also maintains the list of free blocks, and allocates free
blocks to files as needed.
o The logical file system deals with all of the meta data associated with a file ( UID, GID, mode,
dates, etc ), i.e. everything about the file except the data itself. This level manages the
directory structure and the mapping of file names to file control blocks, FCBs, which contain
all of the meta data as well as block number information for finding the data on the disk.
• The layered approach to file systems means that much of the code can be used uniformly for a wide
variety of different file systems, and only certain layers need to be filesystem specific.

File Directories
Collection of files is a file directory. A single directory may or may not contain multiple files. The directory
contains information about the files, including attributes, location and ownership. Much of this information,
especially that is concerned with storage, is managed by the operating system. The directory is itself a file,
accessible by various file management routines. It can also have sub-directories inside the main directory. In
Windows OS, it is called folders.

Information maintained in a directory:


• Name The name which is displayed to the user.
• Type: Type of the directory.
• Position: Current next-read/write pointers.
• Location: Location on the device where the file header is stored.
• Size: Number of bytes, block, and words in the file.
• Protection: Access control on read/write/execute/delete.
• Usage: Time of creation, access, modification

Operation performed on directory:


• Search for a file
• Create a file
• Delete a file
• List a directory
• Rename a file
• Traverse the file system
Advantages of maintaining directories are:
• Efficiency: A file can be located more quickly.
• Naming: It becomes convenient for users as two users can have same name for different files or may
have different name for same file.
• Grouping: Logical grouping of files can be done by properties e.g. all java programs, all games etc.

VARIOUS DIRECTORY STRUCTURE (List Advantages and Disadvantages also)

Single-Level Directory

In this a single directory is maintained for all the


users. In it all files are contained in same directory
which make it easy to support and understand. A
single level directory has a significant limitation,
however, when the number of files increases or
when the system has more than one user. Since all
the files are in the same directory, they must have the unique name.

Two-Level Directory

In this, separate directories for each user is


maintained. In the two-level directory
structure, each user has there own user files
directory (UFD). The UFDs has similar
structures, but each lists only the files
of a single user. system’s
master file directory (MFD) is searches
whenever a new user id=s logged in. The MFD is indexed by username or account number, and each entry
points to the UFD for that user.

Tree-Structured Directory :

Directory is maintained in the form of a tree. Searching is


efficient and also there is grouping capability. We have
absolute or relative path name for a file. Once we have seen
a two-level directory as a tree of height 2, the natural
generalization is to extend the directory structure to a tree
of arbitrary height.
This generalization allows the user to create there own
subdirectories and to organize on their files accordingly. A
tree structure is the most common directory structure. The
tree has a root directory, and every file in the system have a
unique path.

Acyclic Graph Directory


An acyclic graph is a graph with no cycle and
allows to share subdirectories and files. The
same file or subdirectories may be in two
different directories. It is a natural
generalization of the tree-structured directory.
It is used in the situation like when two
programmers are working on a joint project and
they need to access files. The associated files are
stored in a subdirectory, separating them from
other projects and files of other programmers,
since they are working on a joint project so they
want the subdirectories to be into their own
directories. The common subdirectories should be shared. So here we use Acyclic directories.
It is the point to note that shared file is not the same as copy file . If any programmer makes some changes
in the subdirectory it will reflect in both subdirectories.

Disk partitioning:
It is a process of dividing a hard drive into several partitions. It is one step of disk formatting. It is the process
of dividing a disk into one or more regions, the
so called partitions. If a partition is created, the
disk will store the information about the
location and size of partitions in partition table
that is usually located in the first sector of a
disk. With the partition table, each partition
can appear to the operating system as a logical
disk and users can read and write data on disk.
And each partition can be managed separately.
Partitions are categorized as primary partition
and extended partition. A MBR disk can be at
most divided into 4 primary partitions or 3 primary partitions plus 1 extended partition. And the extended
partition can be divided into a number of logical partitions.
Purpose of Partitioning a Hard Drive
1. Use of New Hard Disk
2. Ease of Windows Reinstallation
3. Better Organization
4. Easier Backup
5. Simpler Data Recovery Disk formatting
It is the process of preparing a hard disk drive or flexible disk medium for data storage. In some cases, the
formatting operation may also create one or more new file systems. The formatting process that performs
basic medium preparation is often referred to as “low-level formatting.” The term “high-level formatting”
most often refers to the process of generating a new file system. In certain operation systems (e.g., Microsoft
Windows), the two processes are combined and the term “format” is understood to mean an operation in
which a new disk medium is fully prepared to store files. Formatting a disk for use by an operating system
and its applications involves three different steps.
• Partitioning creates data structures needed by the operating system. This level of formatting often
includes checking for defective tracks or defective sectors.
• Low-level formatting (i.e., closest to the hardware) marks the surfaces of the disks with markers indicating
the start of a recording block (typically today called sector markers) and other information like block CRC
to be used later, in normal operations, by the disk controller to read or write data. This is intended to be
the permanent foundation of the disk, and is often completed at the factory.
• High-level formatting creates the file system format within the structure of the intermediate-level
formatting. This formatting includes the data structures used by the OS to identify the logical drive or
partition’s contents). This may occur during operating system installation, or when adding a new disk. Disk
and distributed file system may specify an optional boot block, and/or various volume and directory
information for the operating system.
Disk Free Space Management
Just as the space that is allocated to files must be managed ,so the space that is not currently allocated to
any file must be managed. To perform any of the file allocation techniques,it is necessary to know what
blocks on the disk are available. Thus we need a disk allocation table in addition to a file allocation table.The
following are the approaches used for free space management.
1. Bitmap
This technique is used to implement the free space management. When the free space is implemented as
the bitmap or bit vector then each block of the disk is represented by a bit. When the block is free its bit is
set to 1 and when the block is allocated the
bit is set to 0. The main advantage of the
bitmap is it is relatively simple and efficient
in finding the first free block and also the
consecutive free block in the disk. Many
computers provide the bit manipulation
instruction which is used by the users.
The calculation of the block number is done by the formula:
(number of bits per words) X (number of 0-value word) + Offset of first 1 bit Advantages:
• This technique is relatively simple.
• This technique is very efficient to find the free space on the disk.
Disadvantages:
• This technique requires a special hardware support to find the first 1 in a word it is not 0.
• This technique is not useful for the larger disks.
For example: Consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25,26, and 27 are free and
the rest of the blocks are allocated. The free-space bitmap would be:
001111001111110001100000011100000
2. Linked list
This is another technique for free space management. In this linked list of all the free block is maintained. In
this, there is a head pointer
which points the first free block
of the list which is kept in a
special location on the disk. This
block contains the pointer to the next block and the next block contain the pointer of another next and this
process is repeated. By using this disk it is not easy to search the free list. This technique is not sufficient to
traverse the list because we have to read each disk block that requires I/O time. So traversing in the free
list is not a frequent action.
In our earlier example, we see that keep block 2 is the first free block which points to another block which
contains the pointer of the 3 blocks and 3 blocks contain the pointer to the 4 blocks and this contains the
pointer to the 5 block then 5 block contains the pointer to the next block and this process is repeated at the
last .
3. Grouping
This is also the technique of free space management. In this,
there is a modification of the free-list approach which stores
the address of the n free blocks. In this the first n-1 blocks are
free but the last block contains the address of the n blocks.
When we use the standard linked list approach the addresses
of a large number of blocks can be found very quickly. In this
approach, we cannot keep a list of n free disk addresses but
we keep the address of the first free block.

4. Counting
Counting is another approach for free space management. Generally, some contiguous blocks are allocated
but some are free simultaneously. When the free space is allocated to a process according to the contiguous
allocation algorithm or clustering. So we cannot keep the list of n free block address but we can keep the
address of the first free block and then the numbers of n free contiguous block which follows the first block.
When there is an entry in the free space list it consists the address of the disk and a count variable. This
method of free space management is similar to the method of allocating blocks. We can store these entries
in the B-tree in place of the linked list. So the operations like lookup, deletion, insertion are efficient.

DISK STRUCTURE:
A hard disk is a memory storage device which looks like
this:
The disk is divided into tracks. Each track is further
divided into sectors. The point to be noted here is that
outer tracks are bigger in size than the inner tracks but
they contain the same number of sectors and have
equal storage capacity. This is because the storage
density is high in sectors of the inner tracks where as
the bits are sparsely arranged in sectors of the outer
tracks. Some space of every sector is used for
formatting. So, the actual capacity of a sector is less
than the given capacity.

Read-Write(R-W) head moves over the rotating hard disk. It is this Read-Write head that performs all the
read and write operations on the disk and hence, position of the R-W head is a major concern. To perform a
read or write operation on a memory location, we need to place the R-W head over that position. Some
important terms must be noted here:
• Seek time – The time taken by the R-W head to reach the desired track from it’s current position. Seek
time is the time taken to locate the disk arm to a specified track where the data is to be read or write. So
the disk scheduling algorithm that gives minimum average seek time is better.
• Rotational latency – Rotational Latency is the time taken by the desired sector of disk to rotate into a
position so that it can access the read/write heads. So the disk scheduling algorithm that gives minimum
rotational latency is better. Hence it is Time taken by the sector to come under the R-W head.
• Data transfer time – It is the Time taken to transfer the required amount of data. It depends upon the
rotational speed.
• Disk Access Time: Disk Access Time is:
Disk Access Time = Seek Time + Rotational Latency + Transfer Time
• Disk Response Time: Response Time is the average of time spent by a request waiting to perform its I/O
operation. Average Response time is the response time of the all requests. Variance Response Time is
measure of how individual request are serviced with respect to average response time. So the disk
scheduling algorithm that gives minimum variance response time is better.

Disk Scheduling Algorithms


Disk scheduling is done by operating systems to schedule I/O requests arriving for the disk. Disk scheduling
is also known as I/O scheduling. Disk scheduling is important because:
• Multiple I/O requests may arrive by different processes and only one I/O request can be served at a
time by the disk controller. Thus other I/O requests need to wait in the waiting queue and need to be
scheduled.
• Two or more request may be far from each other so can result in greater disk arm movement.
• Hard drives are one of the slowest parts of the computer system and thus need to be accessed in an
efficient manner.
Each algorithm is unique in its own way. Overall Performance depends on the number and type of requests.

Various Disk Scheduling Algorithms


1. FCFS 2. SSTF 3. SCAN 4. C-SCAN 6. LOOK 7. CLOOK

You might also like