
2010 3rd International Conference on Advanced Computer Theory and Engineering(ICACTE)

The Undelete Technology Research for UNIX-like

Fei Zhao, Jingsheng Zhang

Institute of Data Recovery, Beijing Information Science and Technology University
Beijing, China
zhaofei198681O@yahoo.com.cn, zjs@bistu.edu.cn

978-1-4244-6542-2/$26.00 © 2010 IEEE

Abstract-Most UNIX-like file systems do not support an undelete operation, which makes recovering deleted files difficult. By studying the bitmaps and the physical structure of UNIX-like file systems (taking the ext3 file system as an example), this paper presents an improved recovery method for UNIX-like file systems that do not support undelete. The results show that the method is helpful in practice.

Keywords- undelete; bitmap; the physical structure of files; data recovery

I. INTRODUCTION

A safety assurance system is important and indispensable in every file system, but no security measure can fully protect a file system from deliberate attack and destruction. It is also difficult to prevent operating errors such as accidental file deletion. Data corruption or loss is therefore inevitable.

The file system is the core of the operating system. Once it is damaged, the entire computer system is paralyzed and the user data stored on the computer can no longer be read, which may have disastrous consequences. Undelete techniques already exist for specific file systems, and they fall into two main categories: (1) recovery based on the file system's own mechanisms, for instance on NTFS, where the name, length, number, clusters and other important information of a deleted file can still be found [1]; (2) recovery based on the physical storage structure of files, where a file that has header and footer signatures may be recovered by copying everything between its header and footer, although this cannot restore a file whose blocks are not physically contiguous. In a UNIX-like file system that does not support undelete, all information about a file is stored in its inode, and the inode is cleared when the file is deleted, so the first method is not applicable. The data blocks of larger files are rarely contiguous, so the second method works poorly. It is therefore necessary to analyze UNIX-like file systems in depth and find a more effective recovery method.

II. ANALYSIS OF UNIX-FILE STORAGE

UNIX-like file systems come in many variants. We take the EXT3 file system as an example; the others are similar. The basic structure of the EXT3 file system is shown in Fig. 1.

Superblock
Group Descriptor
Block Bitmap
Inode Bitmap
Inode Table
Data

Figure 1 Basic Structure of EXT3 File System

Fig. 1 shows the structure of the first block group. Other groups may not contain a superblock and group descriptors, depending on the version. The group descriptors record the partition information for all groups. The structure of a group descriptor is as follows:

struct ext3_group_desc
{
    __le32 bg_block_bitmap;       /* Blocks bitmap block */
    __le32 bg_inode_bitmap;       /* Inodes bitmap block */
    __le32 bg_inode_table;        /* Inodes table block */
    __le16 bg_free_blocks_count;  /* Free blocks count */
    __le16 bg_free_inodes_count;  /* Free inodes count */
    __le16 bg_used_dirs_count;    /* Directories count */
    __u16  bg_pad;
    __le32 bg_reserved[3];
};

In EXT3, the block is the smallest unit of file storage. The block bitmap records which blocks are in use (an unused block is recorded as 0; a used block is recorded as 1). A file's inode records the size of the file, the pointers to its data blocks, and other important information.

When a file on an EXT3 file system is deleted, its inode is cleared, so the file
system no longer knows the addresses of the file's data blocks. Deleted files are therefore difficult to restore on this file system. A deleted file's inode may occasionally still be recorded in the journal (log), but the probability is so low that this approach is not worth pursuing.

The block size of an EXT3 file system must be 2^x KB, where x is an integer; in general x = 2 and the block size is 4 KB. The block bitmap of a group generally occupies one block (on a small file system it may not be completely used), so a group can hold 128 MB. This includes the space occupied by the file system's own structures, so the room for actual data in each group is limited. An inode contains 12 direct pointers, one indirect pointer, one double indirect pointer and one triple indirect pointer. A direct pointer points directly to a data block. The indirect pointer points to a block whose pointers point to data blocks (below, a block that stores pointers is called a pointer block). The double indirect pointer points to a pointer block each of whose pointers points to another pointer block, and the triple indirect pointer extends this by one more level. In this way EXT3 can store files at the TB scale. Pointer blocks are stored in the data area together with ordinary file blocks and are indistinguishable from them. For these two reasons (the limited size of each group and the pointer blocks mixed into the data), a file's data blocks tend to be discontinuous, so directly applying recovery based on the physical storage structure gives poor results.

Based on the above analysis, files on EXT3 cannot be recovered through the file system's own mechanisms, and the method based on the physical storage structure of files cannot be applied directly either. We therefore propose an improved method based on the physical storage structure of files.

III. UNIX-FILE SYSTEM UNDELETE METHOD

To improve the method based on the physical structure of files, we must first make the target file's data blocks contiguous again, which means removing the interference of other blocks. The main interference comes from blocks occupied by the file system itself, data blocks of files that have not been deleted, and pointer blocks. Because the blocks of a deleted file have been released, we only need to scan the free blocks, which raises the contiguity of the data blocks.

The most difficult part is removing the pointer blocks. When a file is deleted, its data blocks and pointer blocks are released together. A pointer block is indistinguishable from a data block, and both are stored mixed together in the data area, so pointer blocks are generally very difficult to remove. Removing them requires further analysis of the UNIX-like file system. Taking EXT3 as an example, the distribution of pointer blocks follows a certain regularity. We therefore first ignore the pointer blocks (treating them as data blocks), restore the files, and then remove the pointer blocks from the restored files.

As described here, the undelete method can only recover data blocks that are contiguous, or contiguous within the free space. Pointer blocks are the special data blocks that EXT3 uses to manage a file's blocks. If a file has been recovered correctly, its pointer blocks appear at fixed locations, and their locations and number depend on the file's size. Let S be the file's size and BS the file system's block size (for example 4 KB). The positions of the pointer blocks are calculated as follows:

• 0 KB ≤ S ≤ 48 KB. The ext3 inode has 12 direct pointers, so the file does not use the indirect pointer and no pointer block needs to be removed;
• 48 KB < S ≤ 4144 KB ≈ 4 MB. The file uses the indirect pointer and one pointer block. Each pointer occupies 4 B, so with a 4 KB block a pointer block holds 1024 pointers to 1024 blocks; together with the 12 direct pointers this covers 1036 blocks. A file in this range therefore has only one pointer block, which is block No. 13 of the file and starts at offset 48 KB;
• 4144 KB < S < 4 GB. The file uses the indirect pointer and the double indirect pointer. The indirect pointer block is located as in the previous case. The double indirect pointer accounts for at most 1025 pointer blocks: one double indirect pointer block and 1024 indirect pointer blocks (not all 1024 need be in use). The double indirect pointer block is block No. 1038 of the file: the indirect pointer block points to 1024 data blocks, which are laid out right after it, so the double indirect pointer block is the 1025th block after the indirect pointer block, and adding the 13 blocks up to and including the indirect pointer block gives 1025 + 13 = 1038. The first indirect pointer block referenced by the double indirect pointer block immediately follows it, and each subsequent one lies 1025 blocks further on. The indirect pointer blocks subordinate to the double indirect pointer are therefore blocks No.

\[ 1039 + 1025k, \quad k = 0, 1, 2, \ldots, n, \quad n = \left\lceil \frac{S - 4144\,\mathrm{KB}}{1024 \cdot BS} \right\rceil - 1 \]

where block No. N starts at byte offset (N - 1)·BS, just as block No. 13 starts at 12·BS = 48 KB;
• 4 GB ≤ S ≤ 4 TB. The file has used up the indirect and double indirect pointers and starts to use the triple indirect pointer; the positions follow from the previous case in the same way.

In addition, once the target area has been scanned completely, the files found are restored and their data blocks are removed from the free space; the remaining free blocks are then scanned again. This excludes interference from the blocks of files that have already been restored, to


improve the recovery rate and efficiency.

Take image files (JPG files) as an example. A JPG file has unique header and footer signatures: it begins with the fixed value 0xFFD8FFE0 in its first 4 bytes, bytes 7-11 are 0x4A46494600, and it ends with the fixed value 0xFFD9 [2]. We search for a header and a footer; if the data blocks between them are contiguous, what lies between them is a JPG file. The header is easy to find: a file is always stored from the start of a block, so we only need to check the first 11 bytes of each block against the header signature. Finding the footer is more complicated, because the header appears at a fixed position in a block but the footer does not. The KMP string matching algorithm [3] can be used here, but plain KMP is not very efficient, so we use an improved KMP algorithm [4]:

Let A be the text and B the pattern [5]. Two pointers are set on A, and one pointer each on patterns B and B'. Pattern B is aligned with the beginning of text A, and pattern B' is aligned with the end of text A. The pointer of B starts at the first character of B, and the pointer of B' starts at the last character of B'. The two pointers of text A start at the head and the tail of the text. During matching, B and B' move toward the middle of the text in turn. When a mismatch occurs, the text pointers do not need to backtrack: using the partial match already obtained, B and B' each slide as far as possible toward the middle, and matching continues. In general n ≥ 2m, and in practice n >> 2m, where n and m denote the lengths of the text and the pattern.

The function Q of the pattern string is defined as:

\[ Q(j) = \begin{cases} 0, & j = 1 \\ \max\{k_1 \mid 1 < k_1 < j,\ b_1 \cdots b_{k_1-1} = b_{j-k_1+1} \cdots b_{j-1}\}, & \text{if this set is not empty} \\ 1, & \text{otherwise} \end{cases} \tag{1} \]

and its counterpart, defined in terms of B(0):

\[ \begin{cases} 0, & j = B(0) \\ \min\{k_2 \mid 1 < k_2 < B(0),\ b_{B(0)} \cdots b_{k_2+1} = b_{B(0)-k_2+j} \cdots b_{j+1}\}, & \text{if this set is not empty} \\ B(0), & \text{otherwise} \end{cases} \]

Reference [6] gives a similar method for evaluating k. With the above algorithm the end of the file can be found efficiently. The processing of the main program is shown in Fig. 2.

Finally, the pointer blocks are removed from the recovered files, which completes the restoration.
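The pointer-block positions described in Section III can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names is_pointer_block and strip_pointer_blocks are hypothetical, block pointers are assumed to be 4 bytes, the recovered stream is assumed to start at the file's first block with the pointer blocks still in place, and only files up to the double indirect range are handled. Block indices are 0-based, so block No. 13 of the text is index 12.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRECT_BLOCKS 12u /* direct pointers in an ext3 inode */

/* Returns 1 if the block at 0-based index idx of the raw recovered
 * stream is expected to be a pointer block, 0 otherwise.
 * Assumption: 4-byte pointers, so block_size / 4 pointers per block.
 * The triple indirect range is not handled and always yields 0. */
int is_pointer_block(uint64_t idx, uint32_t block_size)
{
    uint64_t ppb = block_size / 4u;        /* pointers per block        */
    uint64_t ind = DIRECT_BLOCKS;          /* indirect block (No. 13)   */
    uint64_t dind = ind + 1 + ppb;         /* double indirect (No. 1038
                                              for 4 KB blocks)          */
    uint64_t first_sub = dind + 1;         /* first subordinate block   */
    uint64_t tind = first_sub + ppb * (ppb + 1); /* end of handled range */

    if (idx == ind || idx == dind)
        return 1;
    /* Each subordinate indirect block is followed by ppb data blocks. */
    if (idx >= first_sub && idx < tind)
        return (idx - first_sub) % (ppb + 1) == 0;
    return 0;
}

/* Copies the raw stream to out while skipping the expected pointer
 * blocks. out must hold at least raw_len bytes.
 * Returns the number of bytes written. */
size_t strip_pointer_blocks(const uint8_t *raw, size_t raw_len,
                            uint32_t block_size, uint8_t *out)
{
    size_t written = 0;

    if (block_size == 0)
        return 0;
    for (size_t off = 0; off < raw_len; off += block_size) {
        size_t n = raw_len - off < block_size ? raw_len - off : block_size;
        if (is_pointer_block(off / block_size, block_size))
            continue;
        memcpy(out + written, raw + off, n);
        written += n;
    }
    return written;
}
```

With 4 KB blocks, the indices flagged first are 12, 1037, 1038 and 2063, which correspond to blocks No. 13, 1038, 1039 and 2064 of the file.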

Figure 2 Main Program's Processing (flowchart steps: read all the block bitmaps to build a simulated bitmap and set the current position to the first free block; scan free blocks from the current location; log the location of a file header and continue scanning; restore files and mark the restored blocks as non-free; clear the record of the original file's header; end)
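The scanning steps of the main program can be sketched in C as follows. This is an illustrative sketch under stated assumptions rather than the program used in the paper: all function names are hypothetical, the block bitmaps and data area are assumed to be already loaded into memory, bit i of the bitmap is assumed to sit in byte i / 8 at bit position i % 8, and the footer search is a plain linear scan for 0xFFD9 standing in for the improved KMP algorithm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1 if block i is marked as used in the bitmap, 0 if it is free. */
int block_in_use(const uint8_t *bitmap, uint32_t i)
{
    return (bitmap[i / 8] >> (i % 8)) & 1;
}

/* Marks block i as used, e.g. after the file it belongs to is restored. */
void mark_block_used(uint8_t *bitmap, uint32_t i)
{
    bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Checks the first 11 bytes of a block against the JPG header
 * signature: FF D8 FF E0 in bytes 1-4 and 4A 46 49 46 00 in bytes 7-11. */
int is_jpeg_header(const uint8_t *block, size_t len)
{
    static const uint8_t start[4] = {0xFF, 0xD8, 0xFF, 0xE0};
    static const uint8_t jfif[5] = {0x4A, 0x46, 0x49, 0x46, 0x00};

    return len >= 11 &&
           memcmp(block, start, 4) == 0 &&
           memcmp(block + 6, jfif, 5) == 0;
}

/* Returns the index of the first free block at or after start whose
 * data begins with a JPG header, or nblocks if there is none. */
uint32_t next_free_jpeg_header(const uint8_t *bitmap, const uint8_t *data,
                               uint32_t nblocks, uint32_t block_size,
                               uint32_t start)
{
    for (uint32_t i = start; i < nblocks; i++) {
        if (block_in_use(bitmap, i))
            continue; /* only released blocks can hold deleted files */
        if (is_jpeg_header(data + (size_t)i * block_size, block_size))
            return i;
    }
    return nblocks;
}

/* Returns the offset just past the first FF D9 in buf, or 0 if none.
 * Plain linear scan, used here in place of the improved KMP search. */
size_t find_jpeg_footer(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0xFF && buf[i + 1] == 0xD9)
            return i + 2;
    }
    return 0;
}
```

After a file is restored, calling mark_block_used on each of its blocks and scanning again from the first free block reproduces the re-scan step, so that blocks of restored files no longer interfere with later matches.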


Figure 5 This Method of Recovery Plans


The results of the recovery experiments depend on how the file system has been used. In general, recovery on a newly created file system is better than on one that has been in use for a long time. On a new (just formatted) file system, file data is likely to be stored in consecutive blocks, so recovery is considerably more effective. On a file system that has been used for a long time, file data blocks are rarely consecutive, and the results are poorer.

Fig. 4 and Fig. 5 show the results of recovering a JPG file on the EXT3 file system with two existing foreign software products and with the method of this paper, respectively. Fig. 3 is the original image. To facilitate comparison and save space, we only show the part with a significant difference. Fig. 4 shows the images restored by the foreign software. The result on the left is better than the one on the right, but it has obvious stripes because the pointer blocks were not removed. The software on the right mistook a pointer block for the end of the file, so part of the image could not be restored. Fig. 5 shows the image restored with the method of this paper, which achieves a satisfactory result.


shows similar reference [7].


IV. CONCLUSIONS

This paper provides an improved method that allows deleted files to be restored as far as possible on UNIX-like file systems that do not support undelete.

The advantage is that, after interference is removed repeatedly, a file's data blocks become more contiguous, so recovery based on the physical structure of files can restore large files whose original blocks were not contiguous on UNIX-like file systems. In addition, introducing the improved KMP algorithm raises the efficiency of data recovery. The paper takes the recovery of JPG files on the EXT3 file system as an example; recovering other file types on other UNIX-like file systems is similar.

The disadvantage is that files whose header and footer have no signatures, and files whose data blocks are interleaved in more complex ways (for instance, files A and B are of the same type, the header of file B lies in the middle of file A, and the footer of file B comes after the footer of file A), cannot be restored. Understanding the logical structure of files can solve these problems, and we will continue to study this as a direction for future research.

ACKNOWLEDGMENT

The research work is supported by the Beijing Education Commission Science and Technology Development Program, project No. KM200910772021.

REFERENCES

[1] Zhongxia Wang, Wei Liu. Advanced Technology Data Recovery[M]. Beijing: Electronic Industry Press, 2006: 192-194
[2] Yizhen Zhang. Visual C++ Realization of MPEG / JPEG Codec Technology[M]. Beijing: Posts and Telecom Press, 2002: 40-62
[3] Qun Li. An Improvement of the KMP Algorithm[J]. School Paper of China University of Mining, 1999, 28(2): 198-200
[4] Song Yu, Jun Zheng, Wenxin Wu. Improved KMP Algorithm[J]. School Paper of East China Normal University, 2009, 7(4): 92-97
[5] Kaicheng Lu. Guidance Computer Algorithms - Design and Analysis[M]. Beijing: Qinghua University Press, 1996: 222-224
[6] Weimin Yan, Weimin Wu. Data structure[M]. Beijing: Qinghua University Press, 1996: 80-84
[7] Zhongxia Wang, Jingsheng Zhang. SQL Server Database Reverse reconstruction[J]. Beijing: School Paper of Beijing Information Science and Technology University, 2009, 12(4): 39-42
