UNIT - 2 Introduction to File System and File Management

2.1. File Concept Operations on File
2.2. Introduction to File System and File Management
2.3. File Access Methods (Sequential Access and Direct Access)
2.4. Directory Systems File Management Functions
2.5. File System and Directory Structure organization
2.6. File Protection

File Concept

What is File?

A file is a container in a computer system that stores data, information, settings, or commands. A file is a named collection of related information that is recorded on secondary storage such as magnetic or optical disks. Files are mapped by the operating system onto physical devices. These devices are usually non-volatile. A file is a logical storage unit of secondary storage. In general, a file is a sequence of bits or bytes. A file may exist on a disk or in the main memory. A collection of files is known as a directory. The collection of directories at the different levels is known as the file system.

File Attributes

A file is named for the convenience of its human users and is referred to by its name. A name is usually a string of characters, such as example.c. Some systems differentiate between uppercase and lowercase characters in names, whereas other systems do not. When a file is named, it becomes independent of the process, the user, and even the system that created it. For instance, one user might create the file example.c, and another user might edit that file by specifying its name. The file's owner might write the file to a floppy disk, send it in an e-mail, or copy it across a network, and it could still be called example.c on the destination system.

A file's attributes vary from one operating system to another but typically consist of these:

Name: The symbolic file name is the only information kept in human-readable form.
Identifier: This unique tag, usually a number, identifies the file within the file system; it is the non-human-readable name for the file.
Type: This information is needed for systems that support different types of files.
Location: This information is a pointer to a device and to the location of the file on that device.
Size: The current size of the file (in bytes, words, or blocks) and possibly the maximum allowed size are included in this attribute.
Protection: Access-control information determines who can do reading, writing, executing, and so on.
Time, date, and user identification: This information may be kept for creation, last modification, and last use. These data can be useful for protection, security, and usage monitoring.

The information about all files is kept in the directory structure, which also resides on secondary storage. Typically, a directory entry consists of the file's name and its unique identifier. The identifier in turn locates the other file attributes.

Creating File

Two steps are necessary to create a file. First, space must be found for the file. Second, an entry for the new file must be made in the directory. The directory entry records the name of the file and its location in the file system.

Writing File

To write a file, a system call is made specifying the name of the file and the information to be written to the file. Given the name of the file, the system searches the directory to find the location of the file. The directory entry needs to store a pointer to the current block of the file (usually the beginning of the file). Using this pointer, the address of the next block, where the information will be written, can be computed. The write pointer must then be updated; in this way, successive writes can be used to write a sequence of blocks to the file.

Reading File

To read a file, a system call is made specifying the name of the file and where (in memory) the next block of the file should be put. Again, the directory is searched for the associated directory entry, and the directory will need a pointer to the next block to read.
Once the block is read, the pointer is updated.

Truncating File

The user may want to erase the contents of a file but keep its attributes. Rather than forcing the user to delete the file and then recreate it, this function leaves all attributes unchanged except the file length: the file is reset to length zero and the space allocated to it is released.

File Types

The file name is split into two parts, a name and an extension, usually separated by a period (dot) character. In this way, the user and the operating system can tell from the name alone what the type of a file is. For example, most operating systems allow users to specify a file name as a sequence of characters followed by a period and terminated by an extension of additional characters.

File name examples:
text file: sequence of characters formed as lines. Extension: .txt, .doc
source file: sequence of subroutines and functions. Extension: .c, .cpp

Sequential access: The write operation ("write next") appends to the end of the file and advances the pointer to the end of the newly written material. The file can also be reset to the beginning.

Figure 2.3: sequential access method

Here the records keep their sequence, so processing them in order is simple.
Disadvantage: It is time consuming, because to find a specific record the whole file has to be scanned from the beginning until the matching record or the end of the file is reached.

Direct access or random access: A file is made up of fixed-length records. It is based on the disk model of a file. Each record has its own address for reading and writing. The file is viewed as a numbered sequence of blocks or records, so a program may read block 14, then block 53, then block 7, and so on. There is no restriction on the order of reading or writing. Direct access files are used in databases. It is easy to read, write and delete a record. The file operations must be modified to include the block number as a parameter. If L is the record length, record N (counting from 1) starts at offset L*(N-1) in the file.
Disadvantage: A record can be reached directly only when its record number is known; finding a record by its contents still requires a search.
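The record address rule for direct access, L*(N-1), can be sketched in Python. This is only a minimal sketch under stated assumptions: record numbers start at 1, every record has the same length, and the names RECORD_LEN, record_offset, read_record and write_record are illustrative and do not come from the text.

```python
import io

RECORD_LEN = 16  # assumed fixed record length L, in bytes


def record_offset(n, record_len=RECORD_LEN):
    """Byte offset of record n (1-based), computed as L * (N - 1)."""
    if n < 1:
        raise ValueError("record numbers start at 1")
    return record_len * (n - 1)


def read_record(f, n, record_len=RECORD_LEN):
    """Jump straight to record n and read it, without scanning earlier records."""
    f.seek(record_offset(n, record_len))
    return f.read(record_len)


def write_record(f, n, data, record_len=RECORD_LEN):
    """Overwrite record n in place; data is padded with spaces to one record."""
    if len(data) > record_len:
        raise ValueError("data longer than one record")
    f.seek(record_offset(n, record_len))
    f.write(data.ljust(record_len, b" "))


if __name__ == "__main__":
    disk_file = io.BytesIO()  # stands in for a file on disk
    for n in range(1, 6):
        write_record(disk_file, n, f"rec{n}".encode())
    # Records can be read in any order, for example 4, then 1, then 3.
    for n in (4, 1, 3):
        print(n, read_record(disk_file, n).rstrip())
```

With 16-byte records, for instance, record 14 starts at offset 16*13 = 208. The record number is passed to every read and write, which is the "block number as a parameter" change mentioned above.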
Index sequential access: This method generally involves the construction of an index for the file. The index contains pointers to the various blocks. It is a combination of sequential and direct access. With large files the index may become too large to keep in memory. In that case a master index is kept in which each entry is a pointer to a block of a secondary index.
Advantage: The index leads directly to the block that holds the desired record, so the whole file need not be scanned.
Disadvantage: Finding a particular item takes several steps. We first make a binary search of the master index, which provides the block number of the secondary index. That block is read, and a binary search is again used to find the block containing the desired record. Finally this block is searched sequentially.

2.4 Directory System File Management Functions

Information about files is maintained by directories. A directory can contain multiple files. It can even have directories inside it. These directories are also called folders. The directory can be viewed as a symbol table that translates file names into their directory entries. The following operations are performed on a directory:

Search for a file: We need to be able to search a directory structure to find the entry for a particular file.
Create a file: New files need to be created and added to the directory.
Delete a file: When a file is no longer needed, we want to be able to remove it from the directory.
List a directory: We need to be able to list the files in a directory and the contents of the directory entry for each file in the list.
Rename a file: Because the name of a file represents its content to its users, the name must be changeable when the content or use of the file changes. Renaming a file may also allow its position within the directory structure to be changed.
Traverse file system: It is useful to be able to access every directory and every file within a directory structure.

Files are saved on secondary storage, and to manage all this data we need to organize it. This organization is done by directories.

Single level directory: This is very common on single user operating systems. It is simple.
All files are contained in the same directory. Problems occur when there is more than one user. All files must have unique names, and it is quite possible that different users give the same file name, which causes a name collision. We cannot give descriptive names to files because the length of identifiers is limited.

Two level directory:
Advantages: It solves the problem of name collision at user level. It isolates one user from another when users create or name files.

Tree level directory: The most common example of a tree level directory is MS-DOS (Microsoft Disk Operating System). A directory can contain subdirectories and files. It solves the problem of name collision.
Disadvantages: If we delete a directory then that directory should be empty. We cannot share files and directories.

Acyclic graph directory structure: It is a generalization of the tree structure. It allows the same file or subdirectory to exist in more than one directory, which a tree structure cannot do because it does not provide sharing.

Figure 2.9: Acyclic graph directory.

A shared file is not the same as two copies of a file. With two copies, the same file appears in two different directories, but each user views only a copy, not the original. If a user changes the file, the changes will not appear to other users. With a shared file, only one actual file exists. If one user changes the file, another user can immediately see the changes.

A common way to create a shared file is to use links. A link is effectively a pointer to another file or subdirectory. To search the directory, the directory entry is examined to determine the type of entry. If it is a link, the real file must be located. All links are easily identified in the directory entry and are effectively named indirect pointers.
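The idea of a link as a named indirect pointer can be sketched with symbolic links in Python. This is a minimal sketch, assuming a POSIX-like system where os.symlink is permitted; the directory names user1 and user2 and the helper names entry_type and share_demo are hypothetical.

```python
import os
import tempfile


def entry_type(path):
    """Examine a directory entry: is it a link or a real file?"""
    return "link" if os.path.islink(path) else "file"


def share_demo(root):
    """Create one real file for user1 and a link to it in user2's directory."""
    user1 = os.path.join(root, "user1")
    user2 = os.path.join(root, "user2")
    os.mkdir(user1)
    os.mkdir(user2)

    original = os.path.join(user1, "example.c")
    shared = os.path.join(user2, "example.c")
    with open(original, "w") as f:
        f.write("int main(void) { return 0; }\n")

    # The link stores the path of the real file; no data is copied.
    os.symlink(original, shared)

    # A change made through the link goes to the one real file.
    with open(shared, "a") as f:
        f.write("/* edited by user2 */\n")

    with open(original) as f:
        text = f.read()
    same_file = os.path.realpath(shared) == os.path.realpath(original)
    return entry_type(original), entry_type(shared), same_file, text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(share_demo(root))
```

Because only one actual file exists, the text appended through user2's link is visible when user1 reads the original, which is the behaviour described for shared files.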
Another common approach to implementing a shared file is to duplicate all the information about the file in both sharing directories, so that both entries are identical and equal. A link, by contrast, is clearly different from the original directory entry, so the two are not equal. A major problem with duplicate entries is maintaining consistency when the file is modified.

When a shared file is deleted, the situation is easier to handle with symbolic links: we can just remove the link and leave the file itself intact. Alternatively, a reference count can be kept, and the file is freed only when the count drops to zero. A relative path name is used. A further problem with an acyclic graph structure is how to ensure that there are no cycles.

Figure 2.10: General graph directory.

One serious problem with using a general graph structure is how to avoid searching any component twice, for reasons of efficiency and performance. A poorly designed algorithm might loop through a cycle and never terminate. One solution is to limit the number of directories that will be accessed during a search. A general graph is also more costly because it needs garbage collection, and garbage collection involves examining the entire file system.

2.6 File Protection

When information is kept in a computer system, the major concerns are keeping it safe from physical damage and from improper access. In multiuser systems, there are several ways of providing protection. The need to protect a file is a direct result of the ability to access files. Protection mechanisms provide controlled access by limiting the types of file access that can be made. Access is permitted or denied depending on several factors. Several different types of operations may be controlled:

Read: Read from the file.
Write: Write or rewrite the file.
Execute: Load the file into memory and execute it.
Append: Write new information at the end of the file.
Delete: Delete the file and free its space for reuse.
List: List the name and attributes of the file.

Other operations, such as renaming, copying, and editing the file, may also be controlled. Where these operations are built from lower-level system calls, protection is provided only at the lower level (system level).
Access List and Groups: The most common approach is to make access dependent on the identity of the user. Various users may need different types of access to a file or directory. The most general way to implement this is to associate with each file and directory an access list specifying the user name and the types of access allowed for each user. When a user requests access to a particular file, the operating system checks the access list associated with that file. The main problem with access lists is their length: if everyone is allowed to read a file, we must list all the users with read access. To solve the length of access list problem, many systems recognize three classifications of users:

Owner: The user who creates a file is the owner.
Group: A set of users who are sharing the file.
Other: All other users.

Other Protection Approach: Another approach to the protection problem is to associate a password with each file. Just as access to a computer system itself is often controlled by a password, access to each file can be controlled by a password. There are several disadvantages to this scheme. If we associate a separate password with each file, the number of passwords that a user needs to remember may become large. If only one password is used for all files, then once it is discovered all files are accessible. One solution is to associate a password with a subdirectory rather than with an individual file.
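The owner, group and other classification described above can be sketched as a small permission check. This is a simplified model under stated assumptions, not how any particular operating system stores its access-control information; FileEntry, user_class, is_allowed and the user names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    name: str
    owner: str                                  # user who created the file
    group: set = field(default_factory=set)     # users sharing the file
    perms: dict = field(default_factory=dict)   # class -> set of allowed operations


def user_class(entry, user):
    """Classify the requesting user as owner, group or other."""
    if user == entry.owner:
        return "owner"
    if user in entry.group:
        return "group"
    return "other"


def is_allowed(entry, user, operation):
    """Permit the operation only if the user's class has been granted it."""
    return operation in entry.perms.get(user_class(entry, user), set())


if __name__ == "__main__":
    entry = FileEntry(
        name="example.c",
        owner="alice",
        group={"bob"},
        perms={"owner": {"read", "write", "execute"},
               "group": {"read"},
               "other": set()},
    )
    for user in ("alice", "bob", "carol"):
        print(user, is_allowed(entry, user, "read"), is_allowed(entry, user, "write"))
```

Only three permission sets are stored per file regardless of how many users exist, which is how this scheme avoids the long access list.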
Sequential access compared with index sequential access:

1. Access
Sequential access: Begin at the beginning and access each element in order, one after the other.
Index sequential access: To access any record, first search for the record in the index and then use the pointer to access the record directly.

2. Duplicate data
Sequential access: Duplicate data is allowed.
Index sequential access: Duplicate data is not allowed in the index.

3. Record order
Sequential access: Records are not in sorted order.
Index sequential access: Records are in sorted order.

4. Speed
Sequential access: Access is slow.
Index sequential access: (not legible in source)

5. Use
Sequential access: This method is used to access data from tape, CD and DVD.
Index sequential access: This method is used to access data from a DBMS.

Differentiate between Allocation Methods:

1. Layout
Contiguous: All blocks of the file are consecutive on disk.
Linked: Linked list of file blocks.
Indexed: File block addresses are stored in an array which is stored in a disk block.

2. Access methods
Contiguous: Sequential and direct access are supported.
Linked: Effective for sequential access but problematic with random access.
Indexed: Sequential and direct access methods are well supported, but it is much more effective for direct access.

3. External fragmentation
Contiguous: External fragmentation problem occurs.
Linked: External fragmentation problem is resolved.
Indexed: External fragmentation problem is resolved.

4. Directory entry and expansion
Contiguous: (not legible in source)
Linked: Start and end fields are stored for each file entry. The file can be expanded easily.
Indexed: File name and index block fields are stored for each file entry. The file can be expanded easily.

5. Major problem
Contiguous: When creating a file, how much space to allocate to the file.
Linked: No need to declare the size of the file when the file is created, but each block holds a pointer reference to the next block.
Indexed: How long the index block should be when the file is created.

6. Efficiency by file size
The source lists "inefficient for small file", "efficient for small file" and "efficient for any size of file", but the assignment to the three methods is not legible.
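The difference between contiguous, linked and indexed allocation shows up in how the disk block that holds logical block i of a file is found. The following is a minimal sketch over a toy in-memory model; the function names and the block numbers used are illustrative assumptions, not taken from the text.

```python
def contiguous_block(start, length, i):
    """Blocks are consecutive on disk, so block i is simply start + i."""
    if not 0 <= i < length:
        raise IndexError("block out of range")
    return start + i


def linked_block(start, next_ptr, i):
    """Each block stores a pointer to the next one: follow i pointers."""
    block = start
    for _ in range(i):
        block = next_ptr[block]
        if block is None:
            raise IndexError("block out of range")
    return block


def indexed_block(index_block, i):
    """Block addresses are kept in an array (the index block): one lookup."""
    return index_block[i]


if __name__ == "__main__":
    # Contiguous file: 3 blocks starting at disk block 14.
    print(contiguous_block(14, 3, 2))
    # Linked file: 9 -> 16 -> 1 -> 10 -> 25 (None marks the end of the chain).
    next_ptr = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}
    print(linked_block(9, next_ptr, 3))
    # Indexed file: the same blocks listed in one index block.
    print(indexed_block([9, 16, 1, 10, 25], 3))
```

The loop in linked_block is why linked allocation suits sequential access but is problematic for random access, while contiguous and indexed allocation reach any block in a single step.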

