Networked Storage


UNIT-3

NETWORKED STORAGE
• The following is a list of some of the factors that have contributed to the
growth of digital data:
• Increase in data-processing capabilities: Modern computers provide a significant
increase in processing and storage capabilities. This enables the conversion of
various types of content and media from conventional forms to digital formats.

• Lower cost of digital storage: Technological advances and the decrease in the cost
of storage devices have provided low-cost storage solutions. This cost benefit has
increased the rate at which digital data is generated and stored.

• Affordable and faster communication technology: The rate of sharing digital data is now much faster than traditional approaches. A handwritten letter might take a week to reach its destination, whereas it typically takes only a few seconds for an e-mail message to reach its recipient.

• Proliferation of applications and smart devices: Smartphones, tablets, and newer digital devices, along with smart applications, have significantly contributed to the generation of digital content.
• Structured data is organized in rows and columns in a rigidly defined format so
that applications can retrieve and process it efficiently. Structured data is typically
stored using a database management system (DBMS).
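As an illustration of the row-and-column model, the following sketch stores structured data in an in-memory SQLite database (the table and its contents are invented for the example):

```python
import sqlite3

# Structured data lives in rigidly defined rows and columns inside a DBMS.
# The "customers" table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Alice", "Pune"), ("Bob", "Delhi")],
)

# Because the format is rigid, applications can retrieve data efficiently.
rows = conn.execute("SELECT name FROM customers WHERE city = 'Pune'").fetchall()
print(rows)  # [('Alice',)]
```

Unstructured data such as e-mail bodies or .pdf files has no such fixed schema, which is why it is harder to query.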

• Data is unstructured if its elements cannot be stored in rows and columns, which makes it difficult for applications to query and retrieve. Examples include customer contacts stored in various forms such as sticky notes, e-mail messages, business cards, or digital files such as .doc, .txt, and .pdf.
• Big data includes both structured and unstructured data generated by a variety of sources, including business application transactions, web pages, videos, images, e-mails, social media, and so on.

• Analyzing big data in real time requires new techniques, architectures, and tools
that provide high performance, massively parallel processing (MPP) data platforms,
and advanced analytics on the data sets.
Storage
• Data is stored on hard disk drives (HDDs), from which it can be read and to which it can be written. Depending on the methods used to run those tasks, and the technology on which the HDDs were built, reads and writes can be faster or slower.

• We can store hundreds of gigabytes on a single HDD, which allows us to keep enormous amounts of data on one device.

• But what happens if, for any reason, we are unable to access the HDD? A first solution might be a secondary HDD to which we manually copy the contents of the primary. Immediately, questions arise: is our data now safe? Must we copy everything each time, or only what changes?

• Fortunately, technology exists that can help us: the Redundant Array of Independent Disks (RAID) concept, which presents a possible solution to our problem. It is clear that data needs to be copied every time it changes to provide a reliable, fault-tolerant system, and that this cannot be done manually. A RAID controller keeps the disks in synchronization and also manages all of the writes and reads (input/output (I/O)) to and from the disks.
• Organizations maintain data centers to provide centralized data-processing
capabilities across the enterprise. Data centers house and manage large amounts of
data. The data center infrastructure includes hardware components, such as
computers, storage systems, network devices, and power backups; and software.
Large organizations often maintain more than one data center to distribute data
processing workloads and provide backup if a disaster occurs.

• Five core elements are essential for the functionality of a data center:

• Application: A computer program that provides the logic for computing operations
• Database management system (DBMS): Provides a structured way to store data
in logically organized tables that are interrelated
• Host or compute: A computing platform (hardware, firmware and software) that
runs applications and databases
• Network: A data path that facilitates communication among various networked
devices
• Storage: A device that persistently stores data, such as application data, operating systems, and management software, for subsequent use.
• Availability: A data center should ensure the availability of information when
required. Unavailability of information could cost millions of dollars per hour to
businesses, such as financial services, telecommunications, and e-commerce.

• Security: Data centers must establish policies, procedures, and core element
integration to prevent unauthorized access to information.

• Scalability: Business growth often requires deploying more servers, new applications, and additional databases. Data center resources should scale based on requirements, without interrupting business operations.

• Performance: All the elements of the data center should provide optimal
performance based on the required service levels.

• Data integrity: Data integrity refers to mechanisms, such as error correction codes
or parity bits, which ensure that data is stored and retrieved exactly as it was
received.
• Capacity: Data center operations require adequate resources to store and process
large amounts of data, efficiently. When capacity requirements increase, the data
center must provide additional capacity without interrupting availability or with
minimal disruption. Capacity may be managed by reallocating the existing
resources or by adding new resources.

• Manageability: A data center should provide easy and integrated management of all its elements. Manageability can be achieved through automation and reduction of human (manual) intervention in common tasks.
• Managing a data center involves many tasks. The key management activities
include the following:

• Monitoring: A continuous process of gathering information on various elements and services running in a data center. The aspects of a data center that are monitored include security, performance, availability, and capacity.

• Reporting: Done periodically on resource performance, capacity, and utilization. Reporting tasks help to establish business justifications and chargeback of costs associated with data center operations.

• Provisioning: The process of providing the hardware, software, and other resources required to run a data center. Provisioning activities primarily include resource management to meet capacity, availability, performance, and security requirements.
• In connecting processors to storage, there are just three key concepts to be understood:
• Connectivity: how processors and storage are physically connected.

• Media: the type of cabling and associated protocol that provides the
connection.

• I/O protocol: how I/O requests are communicated over the media.
• Connectivity:

• Ethernet: Ethernet began as a media for building LANs in the 1980s. Typical bandwidths are 10 Mbps, 100 Mbps, and 1 Gbps. Ethernet defines both the media and its protocol; IP-based protocols such as TCP/IP generally run on top of Ethernet.

• Fibre Channel: Fibre Channel is a technology developed in the 1990s that has become increasingly popular as a storage-to-processor media (for both SANs and DAS). Bandwidth is generally 100 MBps, with 200 MBps versions available.

• Parallel SCSI (Small Computer Systems Interface): Typical bandwidths are 40 MBps (also called UltraSCSI), 80 MBps (also called Ultra2 SCSI), and 160 MBps (also called Ultra160 SCSI). Parallel SCSI is limited to relatively short distances (25 meters or less, maximum) and so is appropriate for direct attach, especially when storage and processors are in the same cabinet, but is not well-suited for networking.

• SSA (Serial Storage Architecture): Serial Storage Architecture (SSA) is an open protocol
used to facilitate high-speed data transfer between disks, clusters, and servers. SSA is an
industry and user supported storage interface technology.

SSA products include disk enclosures, storage servers, and host bus adapters. These products are based on the Small Computer System Interface (SCSI) standard.
• I/O Protocols:

• The following are the most common I/O protocols supported on midrange platforms.

• SCSI (Small Computer Systems Interface): A standard electronic interface that allows personal computers (PCs) to communicate with peripheral hardware such as disk drives, tape drives, CD-ROM drives, printers, and scanners faster and more flexibly than earlier parallel data transfer interfaces.

• NFS (Network File System): NFS is a file access protocol introduced by Sun Microsystems and used by Unix and Linux based operating systems. It gives applications remote access capabilities: a user can view or edit files on one computer from another computer, and devices can modify data over the network.

• CIFS (Common Internet File System, often pronounced “siffs”): CIFS is a Windows-based file sharing protocol used by devices that run the Windows OS. It enables devices to share resources such as printers and allows a computer to request access to files on another computer connected to the server. CIFS helps large companies ensure that their data can be used by employees at multiple locations.
JBOD( Just a Bunch of Disks or Just a Bunch of Drives)

• JBOD generally refers to a collection of hard disks that have not been configured as a redundant array of independent disks (RAID).

• When functioning as one unit, JBOD uses a process called spanning. When one
disk drive in the enclosure reaches its capacity, data is stored on the next drive in
the enclosure, and so on throughout the entire unit. Data is not fragmented,
duplicated or combined as with RAID.
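The spanning process can be sketched as a simple model (not a real JBOD controller; the two drive capacities, in GB, are illustrative):

```python
# A minimal sketch of JBOD spanning: data fills the first drive, then spills
# onto the next. Nothing is striped, mirrored, or duplicated.
drives = [{"capacity": 10, "used": 0}, {"capacity": 30, "used": 0}]

def span_write(drives, size_gb):
    """Write to the first drive with free space, spilling onto later drives."""
    for d in drives:
        free = d["capacity"] - d["used"]
        if free <= 0:
            continue  # this drive is full; move to the next
        chunk = min(free, size_gb)
        d["used"] += chunk
        size_gb -= chunk
        if size_gb == 0:
            return True
    return False  # the whole spanned volume is full

span_write(drives, 25)              # 10 GB fills drive 0, 15 GB spills to drive 1
print([d["used"] for d in drives])  # [10, 15]
```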

• The various RAID levels use a variety of storage processes to achieve data
redundancy and fault tolerance, including striping, mirroring, a combination of
striping and mirroring, parity and double parity.
• For example, RAID 0 uses striping only, which fragments data onto the drives in
the array and offers no data redundancy, while RAID 1 uses mirroring only, which
duplicates data onto the drives and offers data redundancy.
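The two layouts can be sketched roughly as follows, treating each byte as a block (real controllers stripe fixed-size blocks, not bytes):

```python
# Contrast of RAID 0 striping with RAID 1 mirroring, in miniature.
def raid0_stripe(data, num_disks):
    """Fragment data round-robin across disks: more capacity, no redundancy."""
    disks = [bytearray() for _ in range(num_disks)]
    for i, b in enumerate(data):
        disks[i % num_disks].append(b)
    return disks

def raid1_mirror(data, num_disks):
    """Duplicate the full data onto every disk: full redundancy."""
    return [bytearray(data) for _ in range(num_disks)]

striped = raid0_stripe(b"ABCDEF", 2)
mirrored = raid1_mirror(b"ABCDEF", 2)
print(striped)   # [bytearray(b'ACE'), bytearray(b'BDF')]
print(mirrored)  # [bytearray(b'ABCDEF'), bytearray(b'ABCDEF')]
```

Losing one striped disk destroys the data; losing one mirrored disk leaves a complete copy.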
• Advantages JBOD offers over RAID 0 (or any other RAID configuration) are:

• Avoiding Drive Waste: If you have a number of odd-sized drives, JBOD will let you combine them into a single unit without loss of any capacity; a 10 GB drive and a 30 GB drive would combine to make a 40 GB JBOD volume but only a 20 GB RAID 0 array. This may be an issue for those expanding an existing system, though with drives so cheap these days it is a relatively small advantage.
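The capacity arithmetic in this example is simply:

```python
# Usable capacity (GB) of two mixed-size drives, per the example above.
drives = [10, 30]

jbod_capacity = sum(drives)                 # spanning uses every drive fully: 40 GB
raid0_capacity = min(drives) * len(drives)  # striping is limited by the smallest drive: 20 GB

print(jbod_capacity, raid0_capacity)  # 40 20
```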

• Easier Disaster Recovery: If a disk in a RAID 0 volume dies, the data on every disk in the array is essentially destroyed because all the files are striped; if a drive in a JBOD set dies, then it may be easier to recover the files on the other drives (but then again, it might not, depending on how the operating system manages the disks). Considering that you should be doing regular backups regardless, and that even under JBOD recovery can be difficult, this too is a minor advantage.
DAS(Direct-Attached Storage)
• Direct-Attached Storage (DAS) refers to a digital storage system directly
attached to a server or workstation, without a network in between.
• A typical DAS system is made of a data storage device connected directly to a computer through a host bus adapter.
• The most important differentiation between DAS and NAS is that between the
computer and DAS there is no network device (like a hub, switch, or router).
• The media could be any (e.g., Fibre Channel, SCSI, SSA, Ethernet). The I/O protocol is SCSI.
• DAS is optimized for single, isolated processors and low initial cost.
• If we use the storage disks’ location as a distinguishing point, we can have two
types of DAS: internal and external DAS.

• 1. Internal DAS

• With internal DAS, the storage disk/disks are directly located inside of a
hosting server.
• The common interface connection is through an HBA (Host Bus Adapter). The main function of an HBA is to provide high-speed bus connectivity or communication channels between a host server and storage devices or a storage network.

• Internal DAS allocates one or two disk drives for system boot. With an internal
DAS, the physical space is limited. If a business application needs further
expansion for larger storage capacity and the physical space within the host server
is not available, then we have to locate a disk array externally.
• 2.  External DAS
• For an external DAS arrangement, the storage disk array is still directly
attached to a server without any network device.

• However, it interfaces with the host server with different protocols. The popular
interface protocols are SCSI and Fibre Channel (FC).

• In comparison with internal DAS, external DAS overcomes the issues of physical
space and distance between the host server and storage disk array. In addition, the
storage array can be shared by more than one host server 
• The size of the DAS enclosure also restricts storage capacity.
• These drawbacks continue to limit this type of storage's appeal. Sharing
with DAS, for example, is typically limited to a small number of ports or host
connections.
Network Attached Storage (NAS)
• Network Attached Storage (NAS) refers to storage devices that connect to a network and provide file access services to computer systems. These devices generally consist of an engine that implements the file services, and one or more devices on which data is stored. NAS uses file access protocols such as NFS or CIFS.

• NAS systems are popular with enterprises and small businesses in many industries as effective, scalable, and low-cost storage solutions. They can be used to support email systems, accounting databases, payroll, video recording and editing, data logging, business analytics, and more.
• NAS Protocols:
• Common Internet File Services / Server Message Block (CIFS/SMB). This is
the protocol that Windows usually uses.

• Network File System (NFS). NFS was first developed for use with UNIX servers
and is also a common Linux protocol.

• A NAS device is attached to a TCP/IP-based network (LAN or WAN), and accessed using CIFS and NFS, specialized I/O protocols for file access and file sharing.

• The weaknesses of a NAS are related to scale and performance. As more users need access, the server might not be able to keep up and could require the addition of more server horsepower. The other weakness stems from the nature of Ethernet itself, which was designed as a shared, general-purpose network rather than a dedicated storage channel.
• NAS Gateways.
• A NAS gateway provides the function of a conventional NAS appliance but without
integrated disk storage.

• The disk storage is attached externally to the gateway, possibly sold separately,
and may also be a standalone offering for direct or SAN attachment.

• The gateway accepts a file I/O request (e.g., using the NFS or CIFS protocols)
and translates that to a SCSI block-I/O request to access the external attached disk
storage.
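The translation step can be sketched roughly as follows; the 4 KiB block size, the extent map, and the path are invented assumptions, since a real gateway runs a full file system on top of the block device:

```python
# A hedged sketch of the NAS gateway idea: a file-level request
# (path + offset + length) becomes a range of SCSI block numbers
# on the externally attached disk storage.
BLOCK_SIZE = 4096

# Hypothetical allocation table: file path -> starting block on the array.
extent_map = {"/exports/report.doc": 1200}

def file_read_to_blocks(path, offset, length):
    """Translate a file I/O request into the SCSI block range to read."""
    start_block = extent_map[path] + offset // BLOCK_SIZE
    end_block = extent_map[path] + (offset + length - 1) // BLOCK_SIZE
    return list(range(start_block, end_block + 1))

# An NFS/CIFS read of 10000 bytes at offset 5000 touches blocks 1201-1203.
print(file_read_to_blocks("/exports/report.doc", 5000, 10000))
```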
• The gateway approach to file sharing offers the benefits of a conventional
NAS appliance, with additional potential advantages:

• increased choice of disk types.


• increased capability (such as a large read/write cache or remote copy functions).
• increased disk capacity scalability (compared to the capacity limits of an
integrated NAS appliance).
• ability to preserve and enhance the value of selected installed disk systems
by adding file sharing.
• ability to offer file sharing and block-I/O on the same disk system.
• Benefits of NAS include:
• Simple to operate; a dedicated IT professional is generally not required

• NAS is optimized for ease-of-management and file sharing using lower-cost Ethernet-based networks

• Easy data backup and recovery, with granular security features

• Centralization of data storage in a safe, reliable way for authorised network users
and clients

• Installation is relatively quick, and storage capacity is automatically assigned to users on demand.

• Permits data access across the network, including cloud-based applications and data
Storage Area Network (SAN)
• A SAN (storage area network) is a network of storage devices that can be accessed by
multiple servers or computers, providing a shared pool of storage space.

• Each computer on the network can access storage on the SAN as though it were a local disk connected directly to the computer.

• SAN is typically assembled with cabling, host bus adapters, and SAN switches attached to
storage arrays and servers.

• A SAN switch is hardware that connects servers to shared pools of storage devices. It is
dedicated to moving storage traffic in a SAN.
• A SAN and network-attached storage (NAS) are two different types of shared networked
storage solutions. While a SAN is a local network composed of multiple devices, NAS is a
single storage device that connects to a local area network (LAN). 

• The most common media is Fibre Channel, but Ethernet-based SANs are emerging. The
I/O protocol is SCSI.
• The main benefit of using a SAN is that raw storage is treated as a pool of
resources that IT can centrally manage and allocate on an as-needed basis. SANs
are also highly scalable because capacity can be added as required. The
main disadvantages of SANs are cost and complexity.

• SAN Protocols:
• Fibre Channel Protocol (FCP)
• Internet Small Computer System Interface (iSCSI)
• Fibre Channel over Ethernet (FCoE)
• Non-Volatile Memory Express over Fibre Channel (FC-NVMe)
Content addressable storage (CAS)
• Content addressable storage (CAS) is a storage mechanism in which fixed data
is assigned a permanent location on a hard disk and addressed with a
unique content name, identifier or address.

• CAS is also known as associative storage, content aware storage or Fixed Content
Storage (FCS).

• CAS is designed to facilitate more efficient storage and access of fixed data that
does not generally change over time. It allows organizations to archive and retrieve
large amounts of data for longer retention periods, specifically to comply with
regulatory requirements.

• CAS works by storing each data object on a hard disk and assigning it a
unique content address/identifier. Once the data object is stored, it cannot be
duplicated, modified or deleted. To access the data, a user or application must
specify the data's content address or identifier.
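The mechanism can be sketched with a hash-based store (assuming SHA-256 as the function that derives the content address; actual CAS products differ in detail):

```python
import hashlib

# A minimal sketch of content-addressed storage: each object is stored
# under an address derived from its own content.
store = {}

def cas_put(data: bytes) -> str:
    """Store an object under its content address; duplicates are kept once."""
    address = hashlib.sha256(data).hexdigest()
    store.setdefault(address, data)  # single-instance storage
    return address

def cas_get(address: str) -> bytes:
    """Retrieve an object by specifying its content address."""
    return store[address]

addr1 = cas_put(b"archived invoice #42")
addr2 = cas_put(b"archived invoice #42")  # identical content, same address
assert addr1 == addr2 and len(store) == 1
print(cas_get(addr1))  # b'archived invoice #42'
```

Because the address is derived from the content itself, storing the same object twice yields the same address, which is how single-instance storage falls out naturally.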
• It is designed for secure online storage and recovery of fixed content.

• Unlike file-level and block-level data access, which use file names and the physical location of data for storage and recovery, content addressed storage stores user data and its information as separate objects. The stored object is assigned a globally unique address known as a content address (CA).

• Examples of fixed content data include archived e-mail messages, scanned documents, check images, and medical images such as X-rays.
• Types of Archives:
• An electronic data archive is a repository for data that has fewer access requirements. It can be implemented as online, nearline, or offline based on the means of access:
• Online archive: The storage device is directly connected to the host to make the data immediately available. This is best suited for active archives.
• Nearline archive: The storage device is connected to the host and information is local, but the device must be mounted or loaded to access the information.
• Offline archive: The storage device is not directly connected, mounted, or loaded. Manual intervention is required to provide this service before information can be accessed.
• The features and benefits of CAS include the following:
• Content authenticity: It assures the genuineness of stored content. This is
achieved by generating a unique content address and automating the process of
continuously checking and recalculating the content address for stored objects.
• Content integrity: Refers to the assurance that the stored content has not been
altered.
• Location independence: CAS uses a unique identifier that applications can leverage to retrieve data rather than a centralized directory, path names, or URLs.
• Single-instance storage (SiS): The unique signature is used to guarantee the storage of only a single instance of an object.
• Record-level protection and disposition: All fixed content is stored in CAS once
and is backed up with a protection scheme.
• Fast record retrieval: CAS maintains all content on disks that provide subsecond
“time to first byte” (200 ms–400 ms) in a single cluster. Random disk access in
CAS enables fast record retrieval.
• Load balancing: Distributes data objects on multiple nodes to provide maximum throughput, availability, and capacity utilization.
• Scalability: Adding more nodes to the cluster without any interruption to data
access and with minimum administrative overhead.
