CC Unit5

8. Expiain about the evolution of storage technology. Ans: A cloud application demands hugs amount of storage es well as computing cycles inorder to storage and process cloud data. The evaluation of storage technology begin inthe yeu (586 n this yea, 2.6EB (A CD-ROM less than or equal 0730 MB) vas used for storing data per computer, However i the year 1993, 15.8EB of storage (whose capacity was equiva to 4C1D« ROMS) was provided to the users, Later a storage of about $4.5EB (Whose capacity equal 1o 12 CD-ROMS) was provided to users in the year 2000, Whereas inthe year 2007 a siorage of 29SEB (whose capacity was equal to 61 CD-ROMS) aa ited tothe users. . ‘Arecent study made th year2003 reveals that during the year 19802003 storage with respect to procéor technology ‘has increased the capacity of Hard Disk Drives (HDD) density by magnitude of four i.e,, from 0.01 GB to GBiin’, Later in ‘the year 2011 the HDD density was increased to 744 GB/ie? and in the year 2016 it reached to |, in’. This lead to.an increase ie. Low-power, double-data- ations where encounter between of centralized storage system for functions like data replication, backup {nthe year2003, Dynanyje Random Access Memory (ORAM) ineteased front GBVin'to ‘noveral storage cost Laterin the year 2010 the Samsung introduce the 4 gigabit LPDDR2_ route DRAM which uses 30 nm prdcess, Due to the rapid advaiteemeit in ‘technology, initial investment cost and the system management cost. Thus this leads to the des cloud. Eventhough the centralized storage have automated various storage man ize the cost of storage rianagement it BGiypitomarce degradation. recovery et. to substantially min Hence to overcome the above issues, large seimenatcaae stems like network file system, Andrew file system, sprite file systems. Google file system and megastore where evolved i 0. This storage system uses off-the shell components and allow, . AY 1. Storage of huge amount of data on cloud, : 2. Process or manage large volume of wood data wD ion of huge sua, ~ igh performance (at any cost) as well as high reliability at low cost. 3, Perform intensive comput + The torage system was designed 1 gnoy 9, AAxplain the various storage models, ~ . Mol Pepact, aT} Storage Models: Surg ds data structure in physical storage like loa disks, rmovable media ad storage, Whereas data model deseribéthe logical aspects of data structure in a database. The storage models areas follows, 1 el storage ‘ ass . 2 Joumal storage. \. Call Storage Cell Srage define storage of ix szd cells, ach cellontsns atleast one object int. deserts the ohysical organization of primary and secondary Sorage media. I eomputer systems primary memory is aanged as aay of remory ‘ells whereas secondary nity i ergo in the form of block sectrs. Read/rie operations performed on each ook is refeted as a unit The two desirable properties of storage model especialy of cell storage areas follows, (i) Read/write coherence *() Read/Write Coherence: ReadAwrite coherence arises wien th notch whi the see write operation on merry cl ervey eal doe Tae] BRT Olnceantcivl | © TemP read J from memory ® cell M. > Tine » Fen tain cba 8 (i) Hetoreonater Atomic: Bsfroraer smi oar when he rt Faure opnton doesn cis withthe revs Ox eal operon ‘ ~~ Sy2 Journal Storage: Journal storage stores eae vec eit ‘These objects contain multiple field. This type Gi) Journal Manager: A journal mafiager/sllow users, to interact with the cell stofage. To perform this rust send request oa journal mariage inorder 10 (a) Initiate a new actio (6) Perform read ion oni cell value (©) Perform writ tion on cell value (@) Perforh gommit operation of storage includes a journal manager and cell (Cell Storage: It is used for storing al of variable instead of storing the current value. e) Stop a ation. Upon a 2 request, the journal manager, uses the following command inorder to allow usérs to interact with ce storage. a % Read? Itallow users to perform ‘Read’ operation on a cell : % Write: It allow users to perform ‘write’ operation on cell. 4% Allocate: It allow users to perform allocation on a cell. ‘+ Deallocation: It allow users to perform deallocation, In storage system, the cell storage maintains a ‘log’ for storing the history of variables. This log is basically present thé non volatile storage media of journal storage. This Tog contain a record which include information about updated data it appended at the end of it The information contained inthe log can later be recovered/reused atthe time of system failure, 11 all the actions present in the log follow All-or-Nothing action. ‘That is using this action helps in maintaining a.m ensures tQ10__Wfite short hotes on file system and database. NS . An File System: File system serves as a source of operating, sen in identifying the data located on disk. It specify the way how data canbe saved, stored and retrieved ona disk iS Considered a a data struetie/method used by operating system. A file systom is basically sed for, F : 1. Keeping records of files on disks‘patitions, ~ 2 Organizing the files on disks. XY 3. Partitioning thé ilés stored on disk. se 4. Accessing data present on disk AN file sysicm consist of collecting irectorics, Where each directory contains information about a group of files/multiple files, Modein high performance systems opt the following tree classes of file system. (“Network File ne s(NFS) , (i) Storage Area Networks (SAN) (Gi) Paratel tems (PFS) . 5 @ Network stem: It is the most popularly used distributed file system developed by SUN Microsystem, This file ssystem,iges the basic client/server model inorder to share applications and also to shire file systems among multiple oli cted via LAN, ° However, thi filesystem deals with various isues lke scalability reliability and system failure, : i) Storage Area Networks: Itis a networking technology that connects storage systems withthe computation servers. This storage System is said tobe centralized storage as it maintains a storage pool which allow storage allocation based on the servers need, [A SAN-based file system is considered as expensive as it require addtional hardware and software suppor for pooling, il) Parallel Fite Systems: Iti a distribute filesystem that alow multipleclient to agcess same fle. It make use’ of global “ame spacing mectanism to distibute a file across multiple UO nodes. Here VO nodes are used for forwarding data to all computation nodes. A parallel file system also maintains a metadata server inorder to maiatain information about data nodes. Database: A datatase is an integrated, shared collection of data, It consist of both “End-users” data, (which includes raw facts of interest) and Mefa-data (which includes data about data). It is usually so large that it has been stored on storage devices like diskstapes ‘ADatabase Management System (DBMS) isa softwate that defines « database; stores the dat, support a query language, produce reports and manages the progess made to data in database. Cloud applications used database as an interface to interact with file system. « “The data-intensive cloud application supports database models like navigationaf model, relational mode! abject oriented model, But it does not support SQL query language. The cloud database support NOSQL model. It does not ensures ACID properties, .a 6.3. DISTRIBUTED FILE SYSTEMS. So ystome and UNL to sysiom, peer) me - Network File System (NFS) It is the most popularly used distributed file system developed by Sun. sm. This fle system. used basic, clisnyserver model in order to develop applications and also to'share file system am@me multiple clients connected vis Lac! Ars Network {LAN}. ; N ib S)en UF. AN [Network File System (NFS) Was designed based on the : designing philosophy of UNIX file system offers the following benefits as 1. _“Teedefines sematies similar to UNIX file system, This provides compatibility ithe file system. 2. Itallows easy integration with existing file system, OQ 3. __lteeses mui-lens gvironment. This garanies the wide wage ofa Wp Bete that cofnesied toa neork. ding eae aces eros tHe network conning high 4, It accepts optimal level of performance degradation while provi bandit GS Unix Bile System: Unix file structure (or Unix file system) plays sean «standing most of the other Unix commands. Unis treats all of it’s utilities as a ‘file’. This file structure wher ally represented, resembles, an upside down tee, with~ root directory claiming the peak position and various other subdiétoresspeaded Beloit. A snapshot of such representation is as follows. Sub dzecloties Sub-sub ecto > Figure: Unix Director} File Structure (A Snapshot of Upside Down Tree) a collection of eeated files hence a directory canbe called asa file iri behind auch repressntatin isto placeall he relative fs ino directory hence cilsing case in manager. Features of Unix File System ~ (Supports hierarchical file structure The dynamic grow of files . (iii)- File access permissions (iv) All devices are treated as filesCharacteristics of UNIX File System |. -The layered design of UNIX file systsm provides a flexibl fle system. The concept of layering minimizes the interaction treweea those modUles that ae necessary for implementing the system. Besides this, the use of unode layer in UNEX file system grants uniform excess right on both loeal and remote files, ign of UNIX file system facilitates a scable file system. The hierarchical design group multiple files 2, The hierarchical de have multiple (or) collection of files or directories. into a single file called director. A hierarchal filesystem can 3. The metadata contained in the UNIX file system describes the aystematic view of the file system_ metadata contains information about the » (i; Owner of te file i) Access rights granted to the user (il). Time of file creation (iv) Last modifications made on the file : Size of the file : Structure of the file (vii) Persistent storage device cells. ‘The inode layer present on the persistent socage media includes informe snd directories 12. Explain the layered design of UNIK.le system. ww: ‘Ans: OY oda’ Paera 7) Layered Desig of UNIX Fle System: The yer desion of UNIX fe yg shown in gre below, mbar pat ame layer N =~» . ns a — | name layer ° S Path naiie leg layer < w gical lest Lonel rod : amor) Psi fe art lock Bea] r 7 Th frie ter gt lock layer Figura: Layered Dosign of UNIX Fil Systom In layered design of UNIX filesystem the piysical file structure is separated from thet of thie logicil file structure Physical File Structure: The physical file stcucture of UNIX fil z c sical file structure of le systém, represent the storage model. This model provi description regarding the storage of fle ina storage media. physica i maya seepinaee Be physieal files strut stores files on physical device called physicaLogical File Structure: The logical fle structure Of UNIX file system represent the data model, This riodel [provides abstrac logical file i Qn Anst Neiwork file sysiem uses cfes Clients connected by A le clients connected by Local Area Network (LAN). fa’ " cat host whercas aserver runs on remote filesystem, This filesystem uses local filesystems Application Programing Inerface (API) inorder o diffecetite fle operation of local file from remote file and later invokes the Remote Pro Hl(RPC) lsat esponse.AnAPI calls are provided by uses on remote file system and Network File System =e ponse to RPC cals, Figure below shows the NES client. = cay t Tene Hat . em AA ie Fie sysem Arrieta] | y te Yoke Layer Yrode Liver Jepsen ‘ A's NFS Client NES Server apn finrssu fiNFSsnn & ‘Communication Network <\ . I Figure : NFS Chi ~ Server Intersction In ihe above architecture, Vnode layer is responsible for iniplementng file operations on local o remote files an also." for differentiating local ies from remot ile: A Toca fle system includes al the operations of local file and remote Me System includes all the operations of remote files and NFS. ‘A File Handle (FH) is used for uniquély identifying remote files. It contains a intemal naine of 32-bytes.file system ‘identification, inode number and a generation number. I allow system to identify reméte filesystems and the corresponding file on the system, The generation number ofthe file handle facilitate the system in reusing the inode numbers. Apart from this, the generation also guarantees to provide accurate semaniies when many clients simultaneously operates on a single remote fie,; e) Characierstis: Gensel parallel File S}sters passes be following ebarectersies, no) 1 3 WERAL! PARALLEL FILE SYSTEMS. 16. Explain about general parallel file systems. Ans: Model Paper-tl, Q7(a) Parallcl file systcm substantiate concurrent access to single + file. That is, parallel file system allow multiple clients t6 perform read/write operation on a single file. simultaneously. This file system uses Same Program Multiple Data (SPDM) paradigm to allow concurrent.access to the single files. Concurrency control * is considered as one of the critical issue of parallel file system General parallel file system was designed based on the concept of parallel file system. This file system employs the * characteristics of general purpose POSIX system which runs we ene ERA ‘This filesystem was developed for optimizing the performance OF larg clusters. 1s designed in sach away that t can support age file system of 4PB having 4096; sof Ltrabyte and maximum filesize of about (2 —1) bytes, In GeFS, a file is basically divided into fixed sized block which ranges between IGKB and IMB, These i ‘0 multiple disks, The configuration of general parallel file-system is shown below, Qe ’ aw Gait ba >
‘chunk index 7 ‘chunk location ‘ ° = NO lena ice Ae data + ehintehandle See || ae nS ‘Communication Network . (Come (Chank Server] [Chak Sever Finax File Syste [Linux File System] [Linux File System] : Siu: OES ChuteGoogle File System (GFS) les includes aset of fixed sized segment of equal sized. These segments are: referred to as chunk, Each chunk contains a 64 KB block where ‘each block ‘comprises a 32-bit checksum. Each chunk is associated with a ‘unique chunk handler and are stored on Linux file system. In GFS, files are replicated and stored on ‘multiple site. GFS support a chunk size: ofabout64MB. A largechunk size tend, : 1 Optimize the performance of large files. Minimizes the storage of metadata on to the file system. « 3. Maintains persistent network connéetion with chunk Minimizes the requests made for locating the chunk. The Google File System cluster make use of the fellowisig two'components, @ Master. * (i) _Chuik Server. Master: ‘A master is responsible for. maintaining state information of all the system components. This information includes metadata about, 1. Filename 2. Location of the replicated files on each chunk. >. 3. Access control information 4. ‘State information of each chunk server Besides this, it also manages multiple chunk servers. A. master Contain the location of each chunk a control structure its memory. Due to this, a master can have updated information regarding the chunk location. This updation are done each time the system gets statted or whin a new chunk server participates the cluster. 7updated to metadata information which are later masters in case of system failure. In Addition to thisy'ft also stores the replicated copies of metadata on persisieatstorage. To To maintain system reliability, an operations I ee) maintained by'the file system. This logs keeps a ma recover from fault tolerance and system failure, the operation logs are replayed. However, the recovery reduce when the raster repeatedly check points it state. And once the last check point is made the recovery time replays the log records 1. (Chunk Server: A Chunk: Singhiors on Linux File System. Maccepts the metadata informltion provided by the master to irecily communicate-orinierdéct with the applications ihorder to perform desired file Operation, It uses the status information to respond the = to a file is provided in.the following. Initially a filename, chunk index and the offset of the file for performing read/write operation is send by client application to the master, »: Upon receiving the request, the master responds to the pplication by specifying the chunk handle and location - of the chunk, 3. Using the above information provided by’ the master, the * application can directly interact with the chunk server to perform the desired filé operation,“The GFS cluster architecture, is considered as ascable and effective consistency model. In this model, a master is responsible for carrying out atomic operations like file creation. This file system is said be scable, to make this architecture scalable, a master willcootribut less in file mutation and also frequently occurring operations like write/append, In such situations, a particular chunk is provided to primary chunk server. This server specifies the serial-order for the updated chunk. . In this architecture data flow and control flow are de ‘coupled. This helps in achieving'efficiency while carrying.out ‘rious file operations especiaily the write operation. low steps illustrate how write request is carried out by fing the control flow from data flow. : 1 Initially the client communicate. ee by allocating a'lease to single ‘chink for a specific ‘chunk when there is no lease, particular chunk. ‘Upon assigning, the master generates response with ID of both primary’ ‘chunk server which has the duplicate chunks, Thus thé received information is cached by client") > 2 The clic smite sta ‘as’ well the replicated copy of chu server: Each chunk server uses intel ‘buffer to store:the. data and send an ackno\ nt to the client. 3. ‘Thewrte equest is forwarded from client to the primary +, "€hiink server after receiving the information from each wos unk server that holds replicated copies of chunk. The _ primary chunk server finds the file mutation using the Gonsecutive sequence number: .@ di) 4. The write request is now forwarded for primary chunk server to secondary chunk servers. 5.° ‘The secondary ‘chunk servers assign mutations to the ‘file based on the Sequence ‘iumber and then sends an acknowledgment to primary chunk server. 6. Upon receiving an acknowledgment from secondary * chunk server the piitnary chunk server responds to the *selint : "The GFS provides efficient: check pointing-pfocedure depending on the copy-on-virté thechanism. If make’ wwe of lazy garbage collectionin orderto free the memory space after deletion: + ne . Besides this, the'maaster perform periodic scanning on “th name spats preset onthe filesystem and then deletes the metadata files of the less recently used/nidden older file ‘name, IW addition to, this, a chunk server also maintains a-etecksum of locally store chunks. This ensure data integrity at the time of data loss and system future, Google File System can beimplemented on cloyd store. | Itisan open-source software written using C++ programming *tariguage. Ttalso supports to Java and Python édpFamming | Janguage, Ae : It uses distributed locking mechanism inode eae dita consistency and-data synchronization. ‘This mechanista employs a central Jock manager which ar taken to every local manage unning on theirespectve /O node. I offers lock granularity by applying thegollawing two echnigues. Byte-range Tokens: It is used Ks ing read/write opérations on datafile. This operation is casied out in the following manner. Initially the first * siamo token (between 0 tooo] andattemptto write toa file This node can simultaneously perform alts subsequent read/write operations until the attempt to perform read/write operation is done by another node, (On same file if such situation can restrict the: token range provided to the first node. That i, if offset of first node for performing write oper cs P, and if offset of the second node is, then range ofthe token willbe [0, P,] and [, ] respectively: Hence the fwo nodes are now allow fo simultaneously access the file. Data Shipp! thod: This method offer file-grained data sharing among multiple nodes to obtain similar files. This method und robin algorithm to.control the equal sized file blocks. Read/write operations are controlled by the target leek me HerepioRen manager is responsible for (2) Maintaining token states (b) Creating a token , . “ . (©) Distributing.a token. (@) Colleétinga token after the file gets closed. (©) Upgrading a token when addition requests are made by node for file access. It synchronizes the metadata access, It employs disk mapping mechanism to manage the disk space. This mechanism partitions the disk maps into m regions and stores them on distinct I/O nodes. Thereby allowing several node to use the disk space at same time.

CC Unit5

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CC Unit5

Uploaded by

Copyright:

Available Formats

You might also like