File Organization to Improve Performance
Objectives
• Study different methods for data compression.
• Study file compaction as a simple way of reusing space in a file.
• Study procedures for deleting fixed-length records that allow the available space to be
allocated dynamically.
• Use linked lists and stacks to manage lists of available space in files.
• Study different methods for solving the problem of deleting variable-length records
in a file.
Contents
6.1 Data compression
6.2 Reclaiming space in files
6.3 Finding things quickly: An Introduction to internal sorting and binary searching
6.4 Keysorting
Data Compression(1)
• Reasons for data compression
– less storage
– faster transmission, decreasing access time
– faster sequential processing
Data Compression(2)
Using a different notation
• Fixed-Length fields are good candidates
• Cons.
– unreadable by humans
– cost in encoding time
– decoding modules increase the complexity of the software
=> used only for particular applications
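As a rough illustration of "using a different notation", the sketch below packs a fixed-length two-character field into one byte through a lookup table. The table `STATE_CODES` and the function names are hypothetical and not taken from the slides; a real table would list every allowed value.

```python
# Hypothetical lookup table (assumption): a real one would hold every allowed value.
STATE_CODES = ["AK", "AL", "AR", "AZ", "CA", "OK", "TX"]
CODE_OF = {name: i for i, name in enumerate(STATE_CODES)}

def encode_state(abbrev: str) -> bytes:
    """Pack a two-character abbreviation into a single byte."""
    return bytes([CODE_OF[abbrev]])

def decode_state(packed: bytes) -> str:
    """Recover the readable abbreviation from its one-byte code."""
    return STATE_CODES[packed[0]]

packed = encode_state("OK")
print(len("OK"), "bytes ->", len(packed), "byte")
print(decode_state(packed))
```

The packed byte is not readable without the table, and every program that touches the field needs the decoding step, which is the cost listed under "Cons."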
Data Compression(3)
:Suppressing repeating sequences
• Run-length encoding algorithm
– read through the pixels, copying pixel values to the file in sequence, except where the
same pixel value occurs more than once in succession
– when the same value occurs more than once in succession, substitute the following
three bytes
special run-length code indicator (e.g., ff)
pixel value that is repeated
the number of times that value is repeated
• ex) 22 23 24 24 24 24 24 24 24 25 26 26 26 26 26 26 25 24
is encoded as 22 23 ff 24 07 25 ff 26 06 25 24
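The run-length algorithm above can be sketched as follows. This is a minimal version under two assumptions that the slide does not state: the indicator value ff never occurs as a real pixel value, and a run is capped at 255 so the count fits in one byte. The names `rle_encode` and `rle_decode` are illustrative.

```python
RUN_CODE = 0xFF  # run-length code indicator used in the slide's example

def rle_encode(pixels):
    """Copy values in sequence; replace each run of 2+ equal values
    with three bytes: indicator, repeated value, repeat count."""
    out = []
    i = 0
    while i < len(pixels):
        value = pixels[i]
        run = 1
        # Assumption: cap runs at 255 so the count fits in a single byte.
        while i + run < len(pixels) and pixels[i + run] == value and run < 255:
            run += 1
        if run > 1:
            out += [RUN_CODE, value, run]
        else:
            out.append(value)
        i += run
    return out

def rle_decode(codes):
    """Expand every (indicator, value, count) triple back into a run."""
    out = []
    i = 0
    while i < len(codes):
        if codes[i] == RUN_CODE:
            out += [codes[i + 1]] * codes[i + 2]
            i += 3
        else:
            out.append(codes[i])
            i += 1
    return out

pixels = [0x22, 0x23] + [0x24] * 7 + [0x25] + [0x26] * 6 + [0x25, 0x24]
encoded = rle_encode(pixels)
print(" ".join(f"{b:02x}" for b in encoded))  # 22 23 ff 24 07 25 ff 26 06 25 24
```

Following the slide literally, a run of only two values is also replaced by three bytes, so this scheme can expand data that has many short runs.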
Data Compression(3)
:Suppressing repeating sequences
• Run-length encoding (cont’d)
[Figure: binary code tree; surviving labels: 00, 01, 000, 001, b(010), c(011)]
Deleting Fixed-length Records for Reclaiming Space Dynamically(3)
[Figure: avail list (a) before removal and (b) after removal; slots labeled Size 47, Size 38, Size 72 and Size 68, list terminated by -1; the removed record is the slot of size 72]
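For the fixed-length case named in the slide title, an avail list handled as a stack might look like the sketch below. It models the file as an in-memory list instead of a disk file, and the class name, the `("*", link)` deletion marker and the `first_avail` header field are assumptions made for illustration.

```python
class FixedLengthFile:
    """In-memory model of a fixed-length record file whose free slots
    form a stack (avail list) linked through the deleted records."""

    def __init__(self):
        self.slots = []        # slot i holds the record with RRN i
        self.first_avail = -1  # header field: RRN on top of the stack, -1 = empty

    def delete(self, rrn):
        # Push: the freed slot stores a link to the previous top of the stack.
        self.slots[rrn] = ("*", self.first_avail)
        self.first_avail = rrn

    def add(self, record):
        if self.first_avail == -1:
            self.slots.append(record)      # no free slot: append at end of file
            return len(self.slots) - 1
        rrn = self.first_avail             # pop the most recently freed slot
        self.first_avail = self.slots[rrn][1]
        self.slots[rrn] = record
        return rrn

f = FixedLengthFile()
for name in ["Ames", "Morrison", "Brown", "Smith"]:
    f.add(name)
f.delete(1)
f.delete(2)
print(f.add("Jones"))  # reuses RRN 2, the slot freed last
```

Because every slot has the same size, any free slot fits any new record, so taking the top of the stack is enough and no search of the list is needed.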
Storage Fragmentation
• Internal fragmentation (in fixed-length records)
– wasted space within a record
– variable-length records minimize wasted space by doing away with internal
fragmentation
• External fragmentation (in variable-length records)
– unused space outside or between individual records
– three possible solutions
storage compaction
coalescing the holes: combining adjacent holes into a single, larger record slot
minimizing fragmentation by adopting a placement strategy
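Coalescing the holes can be sketched as merging free slots that are physically adjacent in the file. The representation of a hole as a `(byte_offset, size)` pair and the function name `coalesce` are assumptions for illustration.

```python
def coalesce(holes):
    """Merge free slots that touch each other in the file.

    holes: list of (byte_offset, size) pairs, in any order.
    Returns a list sorted by offset with adjacent holes combined.
    """
    merged = []
    for offset, size in sorted(holes):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # The previous hole ends exactly where this one starts: join them.
            prev_offset, prev_size = merged.pop()
            merged.append((prev_offset, prev_size + size))
        else:
            merged.append((offset, size))
    return merged

# Holes at 40..49 and 50..79 are adjacent; the hole at 100 is not.
print(coalesce([(100, 20), (40, 10), (50, 30)]))  # [(40, 40), (100, 20)]
```

An avail list is usually not kept in physical order, so finding adjacent holes needs a sort or a search like the one above.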
Internal Fragmentation
in Fixed-length Records
(each record is preceded by its record length)
Record[1]: 40 Ames | Jone | 123 Maple | Stillwater | OK | 74075 |
Record[2]: 64 Morrison | Sebastian | 9035 South Hillcrest | Forest Village | OK | 74820 |
Record[3]: 45 Brown | Martha | 625 Kimb
ex) Delete Record[2] and insert new Record[i]: 12-byte unused space
Placement Strategies
• First-fit
– select the first available record slot
– suitable when lost space is due to internal fragmentation
• Best-fit
– select the available record slot closest in size
– avail list in ascending order
– suitable when lost space is due to internal fragmentation
• Worst-fit
– select the largest record slot
– avail list in descending order
– suitable when lost space is due to external fragmentation
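The three strategies can be compared with a small sketch. It scans an unsorted list of slot sizes instead of keeping the avail list in ascending or descending order as the slides suggest, so it shows the choice each strategy makes, not the list maintenance. The function names are illustrative; the sizes are the ones that appear in the avail-list figure.

```python
def first_fit(avail, need):
    """Index of the first slot that is large enough, or None."""
    for i, size in enumerate(avail):
        if size >= need:
            return i
    return None

def best_fit(avail, need):
    """Index of the smallest slot that is still large enough, or None."""
    fits = [i for i, size in enumerate(avail) if size >= need]
    return min(fits, key=lambda i: avail[i]) if fits else None

def worst_fit(avail, need):
    """Index of the largest slot, provided it is large enough, or None."""
    if not avail:
        return None
    i = max(range(len(avail)), key=lambda j: avail[j])
    return i if avail[i] >= need else None

avail = [47, 38, 72, 68]  # slot sizes in bytes
need = 55
for strategy in (first_fit, best_fit, worst_fit):
    i = strategy(avail, need)
    print(strategy.__name__, "-> slot of size", avail[i], "leaving", avail[i] - need)
```

With the list sorted as the slides describe, best-fit and worst-fit both reduce to taking the first suitable entry, at the price of keeping the list ordered on every deletion.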
Finding Things Quickly(1)
– binary search
• O(log n)
• requires the list to be sorted by key
– sequential search
• O(n)
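The difference between the two searches can be made concrete by counting comparisons. The sketch below works on an in-memory list of keys whose position stands in for the RRN; the function names and the sample keys are assumptions for illustration.

```python
def sequential_search(keys, target):
    """Scan from the start; returns (rrn, comparisons), rrn = -1 if absent."""
    comparisons = 0
    for rrn, key in enumerate(keys):
        comparisons += 1
        if key == target:
            return rrn, comparisons
    return -1, comparisons

def binary_search(keys, target):
    """Halve the search range each step; keys must be sorted."""
    low, high = 0, len(keys) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        if keys[mid] == target:
            return mid, comparisons
        if keys[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, comparisons

keys = list(range(0, 2000, 2))        # 1000 sorted keys
print(sequential_search(keys, 1998))  # (999, 1000)
print(binary_search(keys, 1998))      # (999, 10)
```

On a disk file each comparison costs one record access, so for 1000 records the worst case drops from 1000 accesses to about 10.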
Finding Things Quickly(3)
• Sorting a disk file in RAM
– read the entire file from disk to memory
– use internal sort (=sort in memory)
• UNIX sort utility uses internal sort
• Limitations of binary search & internal sort
– binary search requires more than one or two accesses (cf. a single access by RRN)
– keeping a file sorted is very expensive
– an internal sort works only on small files
Internal Sort
[Figure: sort in memory; labels: disk, memory]
Key Sorting & Its Limitations
/* read in records according to sorted order, and write them out in this order */
for i := 1 to REC_COUNT
    seek in IN_FILE to record with RRN of KEYNODES[i].RRN
    read the record into BUFFER from IN_FILE
    write BUFFER contents to OUT_FILE
close IN_FILE and OUT_FILE
end PROGRAM
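A runnable sketch of the whole keysort idea, under stated assumptions: records are fixed-length, the key is the first `KEY_SIZE` bytes of each record, and both sizes are hypothetical values chosen for the example. Only (key, RRN) pairs are held in memory; the records themselves stay on disk until the final loop.

```python
import io

RECORD_SIZE = 16  # hypothetical fixed record length in bytes
KEY_SIZE = 6      # hypothetical: the key is the first 6 bytes of a record

def keysort(in_file, out_file):
    """Sort in_file into out_file while keeping only keys and RRNs in memory."""
    in_file.seek(0)
    keynodes = []
    rrn = 0
    while True:                            # sequential pass: collect (key, RRN)
        record = in_file.read(RECORD_SIZE)
        if len(record) < RECORD_SIZE:
            break
        keynodes.append((record[:KEY_SIZE], rrn))
        rrn += 1
    keynodes.sort()                        # internal sort of the small keynodes
    for _key, rrn in keynodes:             # one seek per record: the costly part
        in_file.seek(rrn * RECORD_SIZE)
        out_file.write(in_file.read(RECORD_SIZE))
    return keynodes

names = ["MORRISON", "AMES", "BROWN"]
src = io.BytesIO(b"".join(n.ljust(RECORD_SIZE).encode() for n in names))
dst = io.BytesIO()
print([rrn for _key, rrn in keysort(src, dst)])  # [1, 2, 0]
```

The second loop reads the input in key order, not in physical order, so it performs one random seek per record. That is the limitation the slide title refers to.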
Two Solutions
:why bother to write the file back?
• Write out sorted KEYNODES[] array without writing records back in sorted order
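Writing out only the sorted KEYNODES[] array might look like the sketch below. The on-disk layout (a fixed-size key followed by a 4-byte little-endian RRN), the key size and the function names are assumptions for illustration; the original data file is left untouched.

```python
import bisect
import io
import struct

KEY_SIZE = 6                             # hypothetical key length in bytes
ENTRY = struct.Struct(f"<{KEY_SIZE}sI")  # assumed layout: key + 4-byte RRN

def write_index(keynodes, index_file):
    """Write (key, RRN) pairs in key order; the records are not moved."""
    for key, rrn in sorted(keynodes):
        index_file.write(ENTRY.pack(key, rrn))

def load_index(index_file):
    """Read the whole index back as a list of (key, RRN) tuples."""
    data = index_file.read()
    return [ENTRY.unpack_from(data, pos) for pos in range(0, len(data), ENTRY.size)]

def lookup(index, key):
    """Binary search of the sorted index; returns the RRN or -1."""
    keys = [k for k, _rrn in index]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return -1

keynodes = [(b"MORRIS", 0), (b"AMES  ", 1), (b"BROWN ", 2)]
buf = io.BytesIO()
write_index(keynodes, buf)
buf.seek(0)
index = load_index(buf)
print(lookup(index, b"BROWN "))  # 2
```

A lookup then costs a binary search of the small index plus a single direct access by RRN into the unsorted data file.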