Unit 4 DATA PLACEMENT ON DISKS

DATA PLACEMENT ON DISKS
Data Placement on Disks
STATISTICAL PLACEMENT ON DISKS:
In this method Multimedia objects are placed based on the characteristics of their access
pattern.
Placement Methods:
1.Frequency based placement –
place objects based on the access frequencies
2. Bandwidth based placement.
places objects based on their datarate.
Frequency based placement
MM stream access objects display them to the users. Some users access same objects
frequently. This gives pattern of accessing the data. This is called as Popularity of the
object. Access frequency depend on the object,
When an object is popular, it is accessed frequently, The time interval decreases. Such
objects are called as hot or high temperature objects
The hot objects are placed in the convenient place so that it can be retrieved easily.
When an object is not popular, it is accessed less frequently, The time interval increases.
Such objects are called as cold or low temperature objects
The cold objects are removed from the convenient place and placed in least convenient
location
Some objects are neither popular nor unpopular, time interval is medium. They are called
as warm objects. These objects are placed in medium convenient location
Example
V1 is the hottest object and V12 is the coolest object.
When the object is in the Inner zone it takes more time transfer and when the object is in
the outer zone it takes less time to transfer.
Zone 5: Minimum access time from all positions of the disk. Therefore V1,V2 and V3 are
placed in zone 5
Zone 4: Second Minimum access time from all positions of the disk. Therefore V4 and V5
are placed in zone 4
Zone 6: Third Minimum access time from all positions of the disk. Therefore V6,V7 and V8
are placed in zone 6
Zone 3: Next Minimum access time from all positions of the disk. Therefore V10 and V9 are
placed in zone 3
Zone 2: Next Minimum access time from all positions of the disk. Therefore V12 is placed
in zone 2
Zone 1: Highest access time from all positions of the disk. Therefore V11is placed in zone 1
In frequency based placement extra storage is required for storing access history
Bandwidth based placement:
Different multimedia pbjects require different access bandwidth.
For example video requires high BW and high data rate whereas audio requires low BW
and lesser data rate
Besides MM objects files and programs are also stored in disks
Inner zone takes more time to transfer (low data rate)
Outer zone takes less time to transfer (high data rate)
Steps in Bandwidth based placement

Step1: MM objects are grouped based on the BW. (Low and High). Others (files,etc) are
grouped as arbitrary BW group.
Step2: MM objects are sorted from highest to lowest BW.
Step3: Disk is divided into number of zones. No of Zones are >= No of BW groups.
Step4: Placing of Objects
 Since the outer zone has less access time and high transfer rate , Objects group with
highest BW is placed in the outermost zone (Zone 6) . In the Example V1,V2 and V3
have higher BW, hence placed in outer zone.
 Similarly Objects group with next highest BW is placed in the next outermost zone.
In the Example V4,V5 and V6 have next higher BW, hence placed in zone 5.
 Similarly Objects group with next highest BW is placed in the next outermost zone.
In the Example V7 and V8 have next higher BW, hence placed in zone4
 Similarly Arbitrary Objects group with next highest BW is placed in the next
outermost zone. In the Example V9 and V10 have next higher BW, hence placed in
zone 3
 Similarly V11 is placed in Zone 2 and V12 is placed in the Zone 1
---------------------------------------------------------------------------------------------------------------------
STRIPING OF DISKS OR DATA STRIPING
It is the process of dividing a body of data into blocks and spreading the data blocks across
multiple storage devices
Types:
1. Simple Striping
2. Staggered Striping
Simple Striping
An object is divided into multiple data stripes of equal size. Stripes are placed in
multiple disks on a array. Each data strip is placed on one disk in the round robin manner.
If No of Disks is N
First Strip is Placed in Disk1
Second strip is placed in Disk 2
Nth strip is placed in Disk N
N+1th strip is placed in Disk 1 and so on

When the data stripes are accessed from the disk array, one request is sent to every
disk in the array at the same time.
While the first disk is repositioning its read/write heads to the desired location, the
second to the last disks are also repositioning their read/write heads to the desired
locations.
One data stripe is then transferred back from each disk to the memory buffers.
Hence, the time required to retrieve N data stripes from N identical disks takes about the
same amount of time as retrieving one data stripe from only one disk.
The throughputs of all N disks are summed up to provide high data bandwidth.
Example: (From the diagram)
No of Disk – 4
Stripes of Object X- X1,X2,X3,X4,X5,X6,X7,X8
Stripes of objects Y – Y1,Y2,Y3,Y4,Y5,Y6
First Object Stripes X1 to X4 are placed, then X5-X8 are placed
Then Y1 to Y4 and Y5 and Y6 are placed
While retrieving X1 to X4 are retrieved simultaneously. They are filled in buffers.
Once the initial buffers are filled stream may begin to display. Once displayed the buffer is
freed.
Then X4 to X8 are are retrieved simultaneously. They are filled in buffers and displayed and
then the buffer is freed. In this way data streams are retrieved.
If the No of disk is N, then No of Buffers is also N
Throughput of Disk 1 is β
Throughput of Disk N is Nβ, Thus throughput increases
Disadvantages:
Some Disks serve more and some less therefore throughput is affected.
Busy disks are unavailable for long period of time
Staggered Striping
In Staggered Striping multimedia objects are divided into subobjects.
Each sub-object is placed on a group of disks as shown in the diagram.. The number of disks
in a group is chosen based on the required bandwidth of the object. The next sub-object is
then placed in the next disk group
Example:
Stream S is divided in to Objects : X1,X2,X3
Object X1 is divided into sub objects :X11,X12,X13
There are 4 disks – Disk 1,2,3 and 4
Sub objects( Data Stripes ) of X (X11,X12,X13) are placed in 3 disks – Disk 1,2,and 3
Since Stride is 1 Sub objects of Object X2 (X21,X22,X23) is shifted right by 1 disk and
placed in Disk 2,3 and 4
Similarly sub objects of X3(X31,X32,X33) is shifted and placed in Disk3,4, and 1
Likewise sub objects of Y1(Y11,Y12) are placed in Disk 1 and Disk 4 , Y2(Y21,Y22) in Disk
1 and Disk2 and Y3(Y31,Y32) are placed in Disk 2 and Disk 3 respectively
On retrieving Example Stream X1, Disks 1, 2 and 3 work simultaneously while Disk 4 is left
free. Thus each disk becomes free periodically.
Within the time gap another stream can be retrieved. Thus multiple streams can be
accessed in tandom.
Disadvantage:
Suffers from disk BW fragmentation problem

New streams are rejected
Pseudorandom Placement
In the data striping methods, the data stripes are stored onto a number of disks. The choice
of which disk to store a data stripe depends on the number of disks. The striping methods
assign disk numbers sequentially to data stripes in cycles
When new disks are added to the disk farm, the data stripes become incorrectly placed
according to the new number of disks. Thus, all the disks need to be reorganized, and the
data stripes of all the objects are moved to new locations. This reorganization of all the
objects on the disks incurs heavy workload on the storage system
Apart from disk additions, disk reorganization is required when disks are removed. When
one of the disks is removed, the number of disks decreases. The data stripes on all disks
should be moved to other disks
Disk addition algorithm uses RF(Xj-1) to generate a new random number so that the data
stripes can be evenly distributed to Nj disks
In the above example Newly added disk is D3. Using Disk addition algorithm placing of
objects are changed.
Y3 is moved from Disk 1 to Disk3

Similarly Y4 from Disk 2 to 1 , Y5 from Disk 1 to 2 and Y6 from Disk 2 to 3.
It uses the pseudorandom number function to generate new disk numbers that are
independent
of the disk number of other data stripes. The pseudorandom placement reduces the
workload on data reorganization when disks are added or removed.
---------------------------------------------------------------------------------------------------------------------
REPLICATION PLACEMENT ON DISKS

Extra copies of objects may be created and stored on the storage system to increase
the storage system performance. The presence of replicas on light loading disks may be
able to reduce the period of time that an object is inaccessible. Thus, the replication
strategy increases the availability of the stored objects by applying redundancy on the
stored objects.
Advantages:
 the replica on idle disks can increase the availability of data on corrupted and busy
disks.
 the replica on local server can reduce the network load to access objects from
remote servers.
 the replica on local server can also reduce the need to wait for the filling of initial
buffer prior to consumption.
 replica can avoid disk multitasking by avoiding the need to serve multiple streams
from the same disk head
Replication to Increase Availability
Multimedia data are large, and each stream accesses data of an object for a long
time. Thus, the disk containing the accessed object will become busy for a long period of
time.
When other streams try to access other objects residing on the same disk, the disk
becomes too busy to serve them. As a result, new request streams will not be served until
the disk becomes free.
This disk outage problem limits the storage system’s ability to serve multimedia
streams without degrading their quality.
To avoid this Streaming RAID is method 0s used.
In the streaming RAID method, every data object is divided into a number of blocks of fixed
size.
The data blocks are stored on multiple disks. One block from each disk forms a group. A
parity bit is formed by XOR of one bit from every block in the group. The parity information
is created as the redundant information. This parity information is then stored onto a
separate parity disk
When data are accessed, each disk is issued a request.
All the requests are then served simultaneously. Each request retrieves a data block from
the disk. If one of the disks is not available, or some data on the disk are corrupted, the
redundant information on the parity disk is accessed.
The unavailable or corrupted block is then reconstructed from the data from other data
disks and the parity disk.
In the above example each bit from a block is treated as a parity it. When corrupted
this parity block is used to recover the object.
Replication to Reduce Network Load
Multimedia objects are large in size. The workload to deliver multimedia objects across the
networks is heavy.
It would be nice if some objects can be fetched from neighboring storage servers that have
stored the object.
Thus, data replication is one of the approaches to distribute objects across the network in
order to reduce the network load
Replication to Reduce Start-Up Latency
When multimedia streams are initiated, the storage server delivers the first part of the
object to the clients.
The client program uses these initial data to fill up the memory buffer. Before enough data
are received to fill the buffers, the client program cannot start to display the stream. Thus,
the client program needs to wait for the delivery of the initial part of the objects. This
waiting time is the start-up latency.
When multimedia objects are being accessed over a high delay network, the start-up
latency may be significant. Data replication is reduces the start-up latency.
Replication to Avoid Disk Multitasking

In magnetic hard disks, the disk heads are connected together like a hair comb. All the disk
heads move at the same time to the accessed tracks and cylinders. Each object may be
stored contiguously onto a few cylinders.
When one of these objects is accessed, the disk heads only need to move one long seek
action. Subsequent seek actions are very short if there is not any other concurrent streams.
Thus, the aggregate seek overhead is very low.
However, if the disk heads need to serve multiple concurrent streams, the disk heads move
up and down across the disk surface to serve one request for each stream. The disk head is
then multitasked among multiple streams.
This disk multitasking involves significant seek overheads that decrease the disk
performance.
Separate replica on multiple disks may avoid disk multitasking by avoiding the need to
serve multiple streams from the same disk head
Replication to Maintain Balance of Space and Load
Replication of objects consumes both storage space and access bandwidth. When a replica
of an object is stored on a storage device, the replica may be accessed by request streams.
The replica cannot be accessed if the storage device does not have enough bandwidth to
deliver it.
Thus, it is more appropriate to place the replica on other storage devices with sufficient
bandwidth
UNDESIRABLE CONDITIONS ON A STORAGE SYSTEM
 The hot objects should be replicated to high bandwidth disks

 The cold objects should be replicated to low bandwidth disks
 high bandwidth disk needs to have sufficient storage capacity to store the hot
objects
 high bandwidth disk needs to have sufficient storage capacity to store the hot
objects
Actual BSR = Access Bandwidth / Storage Capacity
Allocated BSR =Allocated Access Bandwidth / Allocated Storage Capacity,
BSR Deviation.=.Allocated BSR.– .Actual. BSR.
Example, a storage system has five disks, D1 to D5. Each disk has a storage capacity of
600MB and bandwidth 60 MB/s. After some objects are stored on the disks, the disks now
have 100MB, 300MB, 300MB, 500MB,500MB of free space and 10MB/s, 35MB/s, 30MB/s,
45MB/s, 50MB/s of free bandwidth, respectively.
ACTUAL BSR=BANDWIDTH/STORAGE
CAPACITY
ALLOCATED BANDWIDTH=BANDWIDTH-FREE
BANDWIDTH
ALLOCATED STORAGE SPACE=STORAGE
CAPACITY-FREE SPACE
ALLOCATED BSR=ALLOCATED
BANDWIDTH/ALLOCATED STORAGE SPACE
BSR DEVIATIONS=ALLOCATED BSR - ACTUAL
BSR
Alloca
Stora Allocat Obje
Free Free Actu ted BSR
Dis ge Bandw ed Allocat ct
Spac Bandw al Stora Deviati
ks Capa idth Bandw ed BSR Typ
e idth BSR ge ons
city idth e
Space
600M 60 100 10MB/ 0.1 / 50MB/ 500M 0.1/ WA
D1 B MB/s MB s SEC s B SEC 0/SEC RM
-
600M 60 300 35MB/ 0.1 / 25MB/ 300M 0.083/ 0.17/ COL
D2 B MB/s MB s SEC s B SEC SEC D
600M 60 300 30MB/ 0.1 / 30MB/ 300M 0.1/ WA
D3 B MB/s MB s SEC s B SEC 0/SEC RM
600M 60 500 45MB/ 0.1 / 15MB/ 100M 0.15/ 0.05/
D4 B MB/s MB s SEC s B SEC SEC HOT
D5 600M 60 500 50MB/ 0.1 / 10MB/ 100M 0.1/ 0/SEC WA
B MB/s MB s SEC s B SEC RM
Hot objects D4 is stored in high bandwidth storage devices and cold object D2 is stored in
low bandwidth storage devices
---------------------------------------------------------------------------------------------------------------------
CONSTRAINT ALLOCATION ON DISKS
Most existing storage servers store data stripes on magnetic hard disks. These
magnetic hard disks are accessed by moving the disk heads to random disk tracks. A
significant amount of overhead is spent in moving the disk heads across the disk tracks.
The access time of a request would be significantly reduced if the seek time is reduced
Constraint allocation methods limit the available locations to store the data stripes.
This helps to reduce the access time
Two constraint allocation methods that are
1. Phase Based Constraint Allocation
2. Region Based Constraint Allocation
PHASE BASED CONSTRAINT ALLOCATION

Multimedia objects are stored on and accessed from storage systems. The
concurrent streams send requests to access data stripes. The disk heads serve all the
requests of one stream before another. Hence the latter stream waits for a long time before
it can start.
If the disk heads serve requests of streams in an interleaving manner, the disk heads
move across the disk heads many times.
The overheads in serving concurrent streams are heavy, and the storage system
cannot retrieve the objects efficiently.
To avoid it in Phase based Constraint allocation, data stripes of an object are stored
in nearby locations.
First homogeneous streams( accessed at the same time) are chosen. Therefore all can
access in the same period.
Secondly streams are scheduled before storing.
In the phase based constraint allocation method, all the objects are interleaved
together to form a super object. The super object is viewed by users starting at fixed and
regular intervals called phases.
The super object is then placed on the storage system.

Let m – number of disks in the storage system
p - number of phases.
The super object is organized as an (n x (m x p)) matrix of data stripes.
For example, the storage system delivers six streams, and each stream is separated from
the previous stream with a phase time T.
The data stripes of the super object are placed on three disks as shown
The data stripes X11 to X61 are stored on the same track of the disk 1.
The data stripes X11 to X61 are accessed by only one disk seeking action.
The six streams can be displayed in T/3.
Afterwards, the data stripes X12 to X62 are accessed. The six streams can be displayed in
T/3.
Similarly, the data stripes X13 to X63 are accessed and the six streams can be displayed in
T/3.
The seek time is shared among the number of phases.
REGION BASED CONSTRAINT ALLOCATION
The region based constraint allocation method partitions each disk into many regions. Each
region consists of a number of neighboring tracks. The number of regions and the size of
each region may vary from disk to disk.
When many small regions are created, the maximum seek distance is short, but the
maximum start up latency is high.
When many larger regions are created, the maximum seek distance is long, but the
maximum start up latency is low
The disk heads would traverse track by track to access data.
The data stripes of objects are stored into the regions one by one as shown in the figure.
Diagram shows data stripes of 2 objects X and Y
Disk is split into 6 regions

X is split into X1,X2,-------X16
Y is split into Y1,Y2,-------Y16
X1 is stored in region 1
X11 is stored in region 1 and so on.
Data is stored in regions from Left to Right. On reaching the Right most region data storing
direction changes to Right to Left
Similarly on reaching the Left most region the direction again changes. Similarly Y is
stored. This limits the separation distance among the data stripes belonging to the same
object
---------------------------------------------------------------------------------------------------------------------
TERTIARY STORAGE DEVICES
The main objective of the tertiary storage level is to provide huge storage capacity at low
cost
Some are
 Magnetic tapes
 Optical disks
 Optical tapes
The storage devices may be fixed or removable.
1.MAGNETIC TAPES
Magnetic tapes are used to store Multimedia objects that are large and rarely
accessed. They are cheap. A multimedia stream accessed a data object sequentially. When a
multimedia object is sequentially accessed from the tapes, the tape storage format allows
the storage system to deliver data at high throughput.
Magnetic tape drives access data tapes in two forms.
 the tape reels
 the tape cartridges.
Tape Reels:
Tapes are wound on reels. It has 2 reels supply reels and takeup reels. The magnetic tape is
unwound from the supply reel. It passes through several tape guides to the read-write
heads. It then passes through more tape guides to the take-up reel to be wound as
illustrated in Figure. It usually needs human intervention to lead the tape through all the
guides and wind it at the take-up reel. It is difficult to perform the tape loading operation
automatically.
Tape cartridges:
When tapes are kept on cartridges, the supply reel, the tape guides, and the take-up
reel are all included in one cartridge. To load a tape, the drive extracts some tape between
the supply reel and the take-up reel and winds it around the read-write heads. It is easier to
perform the tape loading operation automatically.
Magnetic tapes record data in three formats. They are
• Linear tape
• Longitudinal tape
• Helican scan
Linear tape:
Tapes move horizontally. The read/write heads assembly is fixed inside the tape
drive. Computer data are recorded horizontally along with the tape moving direction.
Several read/write heads are mounted along perpendicularly to the tape moving direction.
Thus, several bits are recorded by the read/write heads on the tape at the same time.
Longitudinal tape:
It is similar to linear tap format. But after moving in the Left right direction and
reaching the end the tape moves in the opposite direction (Backwards-Right to left)
Therefore the second data track is in the opposite direction,
Helican scan:
R/W heads are mounted on a cylindrical drum at an angle. The heads rotate while
the cylinder is rotated, They form short tracks on the tape
OPTICAL DISKS:
In the optical disk reading and writing of data s done by light (optical) beams.
A thin layer of Aluminium (recording material) is coated on the substrate. The AL layer is
coated with clear polycarbonate. Data are recorded on the recording material under the
disk surface. High intensity light rays record (write) data. The optical disks are circular in
shape. Data are recorded on a spiral track. Low intensity light rays are used to read data
Types:
Read only disk
Write once disk
Rewritable disk
Read-write disk
OPTICAL TAPES:
Optical tapes are designed to maximize the storage capacity of a media unit. Optical
tapes record data on tapes using laser beams to maximize the recording density. Most of
the recording surface is wound and hidden using a tape form.
The tape moves horizontally. The optical read/write head moves at the vertical
direction to the tape moving direction as shown
The tape stops while the optical head records a track of data. The head records data from
one edge to another edge along the width of the tape. After a track of data is recorded, the
tape moves a step and the head springs back. The tape stops again, and the optical head
writes another data track. These data tracks are perpendicular to the tape moving
direction. The length of the data track is shorter than the width of the tape.
Robotic Tape Library

Large hierarchical storage systems need to store many objects. These HSS need to
exchange tapes quickly so that they can serve many requests. Robotic tape libraries
perform the exchange operation automatically so that manual operations are avoided.
As shown in Figure, the robotic tape library consists of the:
1. tapes,
2. tape drive,
3. robotic arm, and
4. tape cells.
As in manual tape drives, the objects are stored on the tapes, and the tapes are loaded to
the tape drives for accessing.
The number of tape drives in the robotic tape library determines the access bandwidth of
the library.
The tapes are usually kept in the tape cells. The number of tape cells in the robotic tape
library determines the total storage capacity of the library.
The robotic arm performs the exchange of tapes automatically.
When a tape is required, the robotic tape library performs the following
steps:
1. The robotic arm moves to the tape drive.
2. The tape drive ejects the original tape.
3. The robotic arm fetches the original tape.
4. The robotic arm moves the original tape back to its own cell or a vacant cell.
5. The robotic arm moves to the cell containing the new tape.
6. The robotic arm fetches the new tape.
7. The robotic arm moves the tape to the tape drive.
8. The robotic arm puts the tape inside the tape drive.
9. The tape drive loads the tape.
The above exchange operation uses only one robotic arm. If the tape library
has two robotic arms that are mounted together, it can exchange tapes using
fewer steps as follows.
1. The robotic arm moves to the tape cell.
2. The robotic arm fetches the tape from the cell.
3. The robotic arm moves the tape to the tape drive.
4. The tape drive ejects the original tape.
5. The robotic arm exchanges the tape with the original tape.
6. The tape drive loads the new tape.
7. The robotic arm moves the original tape back to its cell or a vacant cell.
From the above steps, the tape library performs fewer steps when two robotic arms are
available. In addition, the tape drive can start its next operation to reposition the tape after
the new tape is loaded. The tape drive receives the new tape in fewer steps. Thus, the
robotic tape library can thus exchange tapes more quickly.
-------------------------------------------------------------------------------------------------------------------
CONTIGUOUS (CONTINUOUS) PLACEMENT ON HIERARCHICAL STORAGE SYSTEMS
The contiguous placement is the most common method to place traditional data files
on tertiary storage devices.
The storage space in the media units is checked.
The data file is stored on a media unit with enough space to store the data file.
Types of placement
Contiguous placement
Log Structured Placement
Contiguous Placement
Multimedia objects can be stored in a contiguous manner as the media units. Each
media unit stores the whole object as a file. When a media unit is partially occupied and the
available storage space is not enough to store a second object a separate media unit is used.
And If the object is larger than the storage capacity of a media unit, and part of the object
can be stored in another media unit. The object is partly stored on each media unit.
Each file occupies a contiguous set of blocks on the disk. For example, if a file requires
n blocks and is given a block 3 as the starting location, then the blocks assigned to the file
will be: 3,4,5,…….3+(n-1). cks occupied by the file.
The directory entry for a file with contiguous allocation contains
 Address of starting block – gives the fragment location from where objects are placed
continuously
 Length of the allocated portion. – the number of fragments needed for an object to be
placed.
In the above example we have 3 objects A, B and C.

Object A is placed in 0 to2, since the starting location of A is 0 and the length is 3.
Object B is placed in 4 to6, since the starting location of B is 4 and the length is 3.
Object C is placed in 7 to10, since the starting location of C is 7 and the length is 4.
Entire object is accessed consecutively as a whole.
Log Structured Placement

In the back up and data archival applications, object files are backed up to the media units
so that they can be accessed when the original data are corrupted or lost.
As modified objects are also backed up to the media units, an object may be overwritten
many times before it is retrieved again. Thus, the data objects are written more often than
they are read in these applications.
A simple log structured placement treats the entire storage space as a log. All the object
files are written by appending to the storage space only the access ranges within an object
are tracked to determine whether the object is often accessed sequentially or randomly.
Random accessed objects may be stored to many locations on the media units. Sequential
object files are stored contiguously.
When new objects are created, they are appended to the end of the media units. When an
object is modified, the modified data blocks are appended to the end of the media unit and
the data blocks being modified are deleted. When the entire object is overwritten, the
entire object is appended to the end of the media unit.
The append-only operation improves the efficiency in serving consecutive writing requests
This simplifies the storage management and retrieval.
Advantages:
Sequential write increases efficiency
Reduced fragmentation
---------------------------------------------------------------------------------------------------------------------
STATISTICAL PLACEMENT ON HIERARCHICAL STORAGE SYSTEMS
Frequency Based Placement [in the topic 1]
---------------------------------------------------------------------------------------
CONSTRAINT ALLOCATION ON HIERARCHICAL STORAGE SYSTEMS
Multimedia objects are stored on hierarchical storage systems (HSS). The objects are large
in size but the access latency of HSS is high throughput is less. But the throughput should
be high. addition to the statistical placement and striping methods in the two previous
chapters, constraint allocation can also improve the throughput of HSS.
Multimedia streams should be displayed with continuity. Constraint allocation method

limits the placement of data on MM objects. They decrease access time.
Interleaved Contiguous Placement
Some multimedia streams are accessed concurrently. Ex Audio and Video of a movie.
The interleaved contiguous placement method merges the data stripes of the objects that
are likely to be accessed concurrently. It interleaves the data stripes on the same optical
disk by maintaining the distance in separation between consecutive data stripes. Thus, the
optical disk moves only the distance between consecutive data stripes to serve a request on
the object.
Each object is characterized by the

Each stored object is characterized by
M - the number of data blocks of each data stripe,
G - the number of gap blocks between two consecutive data stripes of the stream.
Figure shows two homogeneous streams, stream X and stream Y.

Stream X is divided into data stripes, X1, X2, X3, X4, and so on.
GX- gap blocks of X.
The stream Y is divided into data stripes Y1, Y2, Y3, Y4, and so on.
Gy-gap blocks of Y
When the two streams are merged, the media blocks of stream X are placed in the gap
blocks of stream Y, and the media blocks of stream Y are placed in the gap blocks of stream
X.
The gap block becomes smaller GXY.
Corollary1:
Two homogeneous streams can be merged if and only if the number of media blocks of the
second stream is not more than the number of gap blocks of the first stream.
EXAMPLE:
No of data stripes of Y < No of gap blocks of X [Gx]
No of data stripes of X < No of gap blocks of X [Gy]
Corollary.2:
A number of homogeneous streams can be merged if and only if the total number of media
blocks of the streams within a period of time is not more than the number of blocks that are
retrieved within the same period of time.
The interleaved contiguous placement uses two policies to merge streams depending on
whether the storage pattern of streams remains the same or not.
Two policies:
Storage Pattern Preserving Policy
Storage Pattern Altering Policy
STORAGE PATTERN PRESERVING POLICY
In the storage pattern preserving (SPP) policy, two streams are merged. The streams
maintain their storage patterns before and after the merging
The storage pattern preserving policy states that two media streams can be merged if and
only if their greatest common divisor satisfies the feasibility condition,
M1 + M2 ≤ G.C.D. (M1 + G1, M2 + G2),

where G.C.D.( ) is the greatest common divisor function
S1,S2 – two streams

S1 storage pattern – (1,3) ie 1- storage starting block and 3- gap of block between 2 stripes
M1=1, G1=3, M2=1,G2=5
M1+M2=2, M1+G1= 4, M2+G2=6
The condition for feasibility of merging is M1 + M2 ≤ G.C.D. (M1 + G1, M2 + G2),
Substituting we get 2 ≤ GCD(4,6) ie 2 ≤ 2

This satisfies the condition Therefore S1 and S2 can be merged
In the figure first row shows the stream S1 and second row shows the stream S2
From the figure we can say that I and III blocks of S2 lies under the blocks of S1. Therefore
it cannot be merged.
Hence Shift S2 by 1 block (III row). No block of S2 comes under S1. Therefore it can be
merged.
Again Shift S2 by 1 block (IV row). Second block of S2 comes under S1. Therefore it cannot
be merged.
Again Shift S2 by 1 block (V row). No block of S2 comes under S1. Therefore it can be
merged.
The merged block is shown in the VI row.

The SPP doesn’t change the storage pattern of individual streams. Therefore continuous
display is preserved.
STORAGE PATTERN ALTERING POLICY

It changes the storage pattern of the participating stream. While the average no of
gap blocks per media blocks remains the same.
A number of multimedia data streams whose storage patterns are characterized
by (M1, G1), (M2, G2), …, (Mk, Gk) can be merged if and only if
Example:
Consider 2 streams S1 and S3
S1,S3– two streams
M1=1, G1=3, M2=1,G2=2
Substituting in the above equation we get 7/12 <1
Thus the equation is satisfied. Therefore S1 and S3 can be merged.
In the figure first row shows the stream S1 and second row shows the stream S3
Storage pattern S1 need not be altered.
Storage pattern of S3 need to be altered
In the third row S3 is moved by 1 cell left (altered) and the problem of blocks remaining in
the same column is solved
But extra buffer is needed to temporarily store the block while accessing
CONCURRENT STRIPING
The principle aim of the concurrent striping method is to increase the throughput of the
hierarchical storage system the concurrent striping method uses the following ideas:
1. It desynchronizes all tape exchanges.
2. It shares exchange overheads among concurrent streams.
3. It supports efficient accesses for multiple concurrent streams.
The concurrent striping method desynchronizes the parallel I/O operation to avoid
exchange contentions. That is, each individual I/O operation such as exchange, reposition,
and transfer, on different drives does not need to be completed at the same time. Each
device performs the I/O operations independently
When multiple streams access objects concurrently, the overheads of switching tapes
among the streams are heavy. The concurrent striping method places the data stripes of the
concurrent streams on the same tape to avoid exchange overheads between services of
concurrent streams. Thus, the exchange overheads are shared among concurrent
streams.
.
The system that uses concurrent striping method is a combination of five different
components.
1. Divide object into logical segment
2. Distribute segments across all tapes
3. Store in fixed order within tape
4. Parallel stream controller
5. Request scheduling
1.Divide object into logical segment
Each object is divided into a number of logical segments such that each logical segment is a
logical starting point for consumption.
For example, objects X, Y, and Z are divided into different number of logical segments as
illustrated in Figure Large objects are divided into many segments and small objects are
divided into few segments
Each logical segment is optionally be subdivided into slices or fixed length data
stripes.
In the concurrent striping method controls the delivery of the objects is done using
parallel stream controller. The parallel stream controller accepts requests for new streams.
It creates a new “stream object” for each data object being accessed. The stream object is a
software object in the storage system and it is different from the data objects.
The stream object initially sends two requests to the service queue of each tertiary
drive. The tertiary drive accesses segments for the requests on the tapes to the memory for
display. The tertiary drives serve all requests in currently loaded tapes before they serve
the requests on other tapes that require switching.
After all requests on the currently loaded tapes are served, the drive sends an
exchange request to the robotic arm to exchange the next tape. After each request of a
stream is served, the stream sends a new request that accesses the next segment from the
tapes. After all the segments of an object are retrieved, the stream object sends a finish
notification to the parallel stream controller before it destroys itself. The parallel stream
controller can thus accept another new stream from its waiting queue.
Example
Two tertiary drives Drive 1 and Drive 2 retrieve segments of three objects on six tapes
T1,T2----T6 as shown in Figure.
Each drive exchanges a tape and retrieves one segment for each stream in each cycle.
Drive 1 exchanges M1 and retrieves segments X13, Y9, and Z4 on M1.
Then, it exchanges M3 and retrieves X15, Y11, and Z6 on M3.
Afterwards, it exchanges M5 and retrieves X17, Y13, and Z8.
It then exchanges M1 again, but stream Y has finished. So, it retrieves X19 and Z10 from M1
and so on.
Drive 2 exchanges M2 and retrieves segments X14, Y10, and Z5 on M2.
Then, it exchanges M4 and retrieves X16, Y12, and Z7 on M4.
After that, M6 is exchanged and X18, Y14, and Z9 are retrieved.
Drive 2 exchanges M2 again, but stream Y has finished and stream Z is aborted.
Thus, it retrieves X20 from M2 only. Similarly, it exchanges M4 and retrieves X22 and so on.
-------------------------------------------------------------------------------------------------------------------

Unit 4 DATA PLACEMENT ON DISKS

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Unit 4 DATA PLACEMENT ON DISKS

Uploaded by

Copyright:

Available Formats

DATA PLACEMENT ON DISKS

Data Placement on Disks

STATISTICAL PLACEMENT ON DISKS:

1.Frequency based placement –

place objects based on the access frequencies

2. Bandwidth based placement.

places objects based on their datarate.

Frequency based placement

V1 is the hottest object and V12 is the coolest object.

Bandwidth based placement:

Different multimedia pbjects require different access bandwidth.

Besides MM objects files and programs are also stored in disks

Inner zone takes more time to transfer (low data rate)

Outer zone takes less time to transfer (high data rate)

Steps in Bandwidth based placement

Step2: MM objects are sorted from highest to lowest BW.

Step4: Placing of Objects

STRIPING OF DISKS OR DATA STRIPING

First Strip is Placed in Disk1

Second strip is placed in Disk 2

Nth strip is placed in Disk N

N+1th strip is placed in Disk 1 and so on

Stripes of Object X- X1,X2,X3,X4,X5,X6,X7,X8

Stripes of objects Y – Y1,Y2,Y3,Y4,Y5,Y6

First Object Stripes X1 to X4 are placed, then X5-X8 are placed

Then Y1 to Y4 and Y5 and Y6 are placed

While retrieving X1 to X4 are retrieved simultaneously. They are filled in buffers.

If the No of disk is N, then No of Buffers is also N

Throughput of Disk N is Nβ, Thus throughput increases

Busy disks are unavailable for long period of time

In Staggered Striping multimedia objects are divided into subobjects.

Stream S is divided in to Objects : X1,X2,X3

Object X1 is divided into sub objects :X11,X12,X13

Object X2 is divided into sub objects :X21,X22,X23

Object X3 is divided into sub objects :X31,X32,X33

There are 4 disks – Disk 1,2,3 and 4

Similarly sub objects of X3(X31,X32,X33) is shifted and placed in Disk3,4, and 1

Suffers from disk BW fragmentation problem

Y3 is moved from Disk 1 to Disk3

REPLICATION PLACEMENT ON DISKS

Replication to Reduce Start-Up Latency

Replication to Avoid Disk Multitasking

Replication to Maintain Balance of Space and Load

 The hot objects should be replicated to high bandwidth disks

Allocated BSR =Allocated Access Bandwidth / Allocated Storage Capacity,

BSR Deviation.=.Allocated BSR.– .Actual. BSR.

CONSTRAINT ALLOCATION ON DISKS

PHASE BASED CONSTRAINT ALLOCATION

The super object is then placed on the storage system.

REGION BASED CONSTRAINT ALLOCATION

Disk is split into 6 regions

TERTIARY STORAGE DEVICES

The storage devices may be fixed or removable.

Robotic Tape Library

CONTIGUOUS (CONTINUOUS) PLACEMENT ON HIERARCHICAL STORAGE SYSTEMS

In the above example we have 3 objects A, B and C.

Log Structured Placement

Frequency Based Placement [in the topic 1]

CONSTRAINT ALLOCATION ON HIERARCHICAL STORAGE SYSTEMS

Multimedia streams should be displayed with continuity. Constraint allocation method

Interleaved Contiguous Placement

Each object is characterized by the

Figure shows two homogeneous streams, stream X and stream Y.