NETWORKED STORAGE
• The following is a list of some of the factors that have contributed to the
growth of digital data:
• Increase in data-processing capabilities: Modern computers provide a significant
increase in processing and storage capabilities. This enables the conversion of
various types of content and media from conventional forms to digital formats.
• Lower cost of digital storage: Technological advances and the decrease in the cost
of storage devices have provided low-cost storage solutions. This cost benefit has
increased the rate at which digital data is generated and stored.
• Analyzing big data in real time requires new techniques, architectures, and tools
that provide high performance, massively parallel processing (MPP) data platforms,
and advanced analytics on the data sets.
Storage
• Data is stored on hard disk drives (HDDs), which support both reads and writes.
How fast those operations run depends on the access method that is used and on
the technology on which the HDD was built.
• We can store hundreds of gigabytes on a single HDD, which is enough to keep a
very large amount of data in one place.
• What happens if, for any reason, we are unable to access the HDD? A first
solution might be to add a secondary HDD and manually copy our primary HDD
to it. At first glance our data is now safe, but how often must we repeat the copy,
and must we copy everything or only what changes?
• Fortunately, technology exists that can help us: the Redundant Array of
Independent Disks (RAID) concept presents a possible solution to our problem.
To give us a reliable, fault-tolerant system, data must be copied every time that it
changes, which clearly cannot be done manually. A RAID controller can keep the
disks synchronized and can also manage all of the writes and reads
(input/output (I/O)) to and from the disks.
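The controller's job of keeping two disks in step can be sketched in a few lines of Python. This is only a toy in-memory model of mirroring, not how any real RAID controller is implemented; the class `MirroredPair` and its methods are hypothetical names chosen for illustration.

```python
class MirroredPair:
    """Toy model of mirroring: every write goes to both disks."""

    def __init__(self, num_blocks):
        # Each "disk" is modeled as a list of blocks; None marks a failed disk.
        self.disks = [[b""] * num_blocks, [b""] * num_blocks]

    def write(self, block, data):
        # The controller duplicates the write so the copies stay in sync.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def fail(self, index):
        # Simulate losing one disk.
        self.disks[index] = None

    def read(self, block):
        # Serve the read from the first disk that is still available.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("both disks have failed")


pair = MirroredPair(num_blocks=4)
pair.write(0, b"payroll")
pair.fail(0)               # the primary disk becomes unreachable
print(pair.read(0))        # the data is still served from the mirror
```

Because the copy happens on every write, no manual copying step is needed and the surviving disk is always current.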
• Organizations maintain data centers to provide centralized data-processing
capabilities across the enterprise. Data centers house and manage large amounts of
data. The data center infrastructure includes hardware components, such as
computers, storage systems, network devices, and power backups, as well as software.
Large organizations often maintain more than one data center to distribute data
processing workloads and provide backup if a disaster occurs.
• Five core elements are essential for the functionality of a data center:
• Application: A computer program that provides the logic for computing operations
• Database management system (DBMS): Provides a structured way to store data
in logically organized tables that are interrelated
• Host or compute: A computing platform (hardware, firmware and software) that
runs applications and databases
• Network: A data path that facilitates communication among various networked
devices
• Storage: A device that stores data persistently for subsequent use by
applications, operating systems, and management software.
• Beyond these core elements, a data center must meet the following requirements:
• Availability: A data center should ensure the availability of information when
required. Unavailability of information could cost millions of dollars per hour to
businesses, such as financial services, telecommunications, and e-commerce.
• Security: Data centers must establish policies, procedures, and core element
integration to prevent unauthorized access to information.
• Performance: All the elements of the data center should provide optimal
performance based on the required service levels.
• Data integrity: Data integrity refers to mechanisms, such as error correction codes
or parity bits, which ensure that data is stored and retrieved exactly as it was
received.
• Capacity: Data center operations require adequate resources to store and process
large amounts of data efficiently. When capacity requirements increase, the data
center must provide additional capacity without interrupting availability or with
minimal disruption. Capacity may be managed by reallocating the existing
resources or by adding new resources.
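The parity bits mentioned under data integrity can be illustrated with a short Python sketch. It assumes even parity over a whole byte string, which is a simplification; the function names are hypothetical. A single parity bit can only detect an odd number of flipped bits and cannot correct them, which is why real systems also use stronger error correction codes.

```python
def even_parity_bit(data: bytes) -> int:
    """Return the bit that makes the total number of 1 bits even."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def is_intact(data: bytes, stored_parity: int) -> bool:
    """Check retrieved data against the parity bit saved at write time."""
    return even_parity_bit(data) == stored_parity


original = b"\x0f"                     # 0000 1111, four 1 bits
parity = even_parity_bit(original)     # 0, because the count is already even
corrupted = b"\x0e"                    # 0000 1110, one bit flipped in storage

print(is_intact(original, parity))     # True
print(is_intact(corrupted, parity))    # False
```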
• Storage is connected to hosts through two layers:
• Media: the type of cabling and associated protocol that provides the
connection.
• I/O protocol: how I/O requests are communicated over the media.
• Connectivity:
• Ethernet: Ethernet began as a medium for building LANs in the 1980s. Typical bandwidths are
10Mbps, 100Mbps, and 1Gbps. The name covers both the medium and its protocol. IP-based
protocols such as TCP/IP generally run on top of Ethernet.
• Fibre Channel: Fibre Channel is a technology developed in the 1990s that has become
increasingly popular as a storage-to-processor medium (for both SANs and DAS). Bandwidth is
generally 100MBps (1 Gbps links), with 200MBps on 2 Gbps links.
• Parallel SCSI (Small Computer System Interface): Typical bandwidths are 40MBps (also
called UltraSCSI), 80MBps (also called Ultra2 SCSI), and 160MBps (also called Ultra160 SCSI).
Parallel SCSI is limited to relatively short distances (25 meters or less) and so is
appropriate for direct attachment, especially when storage and processors are in the same
cabinet, but is not well suited to networking.
• SSA (Serial Storage Architecture): SSA is an open protocol
used to facilitate high-speed data transfer between disks, clusters, and servers. SSA is an
industry- and user-supported storage interface technology.
SSA products include disk enclosures, storage servers, and host bus adapters. SSA
products are based on the Small Computer System Interface (SCSI) standard.
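The figures above mix megabits per second (Mbps, used for Ethernet) and megabytes per second (MBps, used for Fibre Channel and SCSI), which differ by a factor of eight. The following Python sketch converts between them and estimates transfer times. The helper names are hypothetical, and the results are raw line-rate estimates that ignore protocol overhead.

```python
def megabits_to_megabytes_per_s(mbps: float) -> float:
    """Convert a rate in megabits per second to megabytes per second."""
    return mbps / 8


def transfer_seconds(size_megabytes: float, rate_megabytes_per_s: float) -> float:
    """Time to move a payload at a given raw rate, ignoring overhead."""
    return size_megabytes / rate_megabytes_per_s


gigabit_ethernet = megabits_to_megabytes_per_s(1000)
print(gigabit_ethernet)                  # 125.0 MBps for a 1Gbps link

# Moving 1000 MB over each medium at its nominal rate:
print(transfer_seconds(1000, 100))       # 10.0 s over 100MBps Fibre Channel
print(transfer_seconds(1000, 160))       # 6.25 s over Ultra160 SCSI
```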
• I/O Protocols:
• The following are the most common I/O protocols supported on midrange platforms.
• SCSI (Small Computer System Interface): SCSI is a set of standard electronic
interfaces that allow personal computers (PCs) to communicate with peripheral
hardware such as disk drives, tape drives, CD-ROM drives, printers, and scanners
faster and more flexibly than previous parallel data transfer interfaces.
• NFS (Network File System): NFS is a file-sharing protocol that was introduced by Sun
Microsystems and is used mainly by Unix- and Linux-based operating systems. It gives
applications remote access to files: a user on one computer can view and edit files
that are stored on another computer as though they were local. In short, the protocol
lets devices read and modify data over a network.
• CIFS (Common Internet File System, often pronounced “siffs”): CIFS is a Windows-based
network file-sharing protocol used by devices that run Windows. It lets devices share
files, printers, and even ports among users and administrators. A client can also use
CIFS to request access to files on another computer that is connected to the server.
CIFS helps companies with large amounts of data make that data available to employees
at multiple locations.
JBOD (Just a Bunch of Disks or Just a Bunch of Drives)
• When functioning as one unit, JBOD uses a process called spanning. When one
disk drive in the enclosure reaches its capacity, data is stored on the next drive in
the enclosure, and so on throughout the entire unit. Data is not fragmented,
duplicated or combined as with RAID.
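Spanning amounts to a simple address calculation: fill the first drive, then continue on the next. The Python sketch below maps a logical offset on the spanned volume to a drive and an offset on that drive. The function name `locate` and the drive sizes are hypothetical, and real volume managers track this in on-disk metadata rather than a list.

```python
def locate(offset, disk_sizes):
    """Map a logical offset on a spanned volume to (disk index, offset on disk)."""
    for index, size in enumerate(disk_sizes):
        if offset < size:
            return index, offset
        offset -= size          # skip past this drive and keep looking
    raise ValueError("offset is beyond the end of the volume")


sizes = [10, 30]                # two drives of different sizes (units are arbitrary)

print(locate(5, sizes))         # (0, 5)  still on the first drive
print(locate(10, sizes))        # (1, 0)  first drive is full, data continues on the second
print(locate(39, sizes))        # (1, 29) last addressable unit of the volume
```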
• The various RAID levels use a variety of storage processes to achieve data
redundancy and fault tolerance, including striping, mirroring, a combination of
striping and mirroring, parity and double parity.
• For example, RAID 0 uses striping only, which fragments data onto the drives in
the array and offers no data redundancy, while RAID 1 uses mirroring only, which
duplicates data onto the drives and offers data redundancy.
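The difference between striping and mirroring can be made concrete with a small Python sketch. It is a simplified illustration that operates on byte strings in memory; the function names and the chunk size are assumptions, and real arrays stripe at a much larger granularity.

```python
def stripe(data: bytes, num_disks: int, chunk: int):
    """RAID 0 style: deal fixed-size chunks round-robin across the disks."""
    disks = [bytearray() for _ in range(num_disks)]
    for start in range(0, len(data), chunk):
        disks[(start // chunk) % num_disks] += data[start:start + chunk]
    return [bytes(d) for d in disks]


def mirror(data: bytes, num_disks: int):
    """RAID 1 style: every disk holds a full copy of the data."""
    return [data] * num_disks


print(stripe(b"ABCDEFGH", num_disks=2, chunk=2))   # [b'ABEF', b'CDGH']
print(mirror(b"ABCDEFGH", num_disks=2))            # [b'ABCDEFGH', b'ABCDEFGH']
```

With striping, each disk holds only fragments, so losing either disk leaves the data incomplete. With mirroring, either disk alone still holds everything.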
• Advantages JBOD offers over RAID 0 (or any other RAID configuration) are:
• Avoiding Drive Waste: If you have a number of odd-sized drives, JBOD will let
you combine them into a single unit without loss of any capacity; a 10 GB drive
and a 30 GB drive would combine to make a 40 GB JBOD volume but only a 20 GB
RAID 0 array. This may be an issue for those expanding an existing system, though
with drives so cheap these days it’s a relatively small advantage.
• Easier Disaster Recovery: If a disk in a RAID 0 volume dies, the data on every
disk in the array is essentially destroyed because all the files are striped; if a drive
in a JBOD set dies, it may be easier to recover the files on the other drives (but
then again, it might not, depending on how the operating system manages the
disks). Considering that you should be doing regular backups regardless, and that
even under JBOD recovery can be difficult, this too is a minor advantage.
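The drive-waste arithmetic can be checked with a short Python sketch. It assumes that a RAID 0 array uses only as much of each drive as the smallest member provides, which is what produces the 20 GB figure for a 10 GB and a 30 GB drive; the function names are hypothetical.

```python
def jbod_capacity(sizes_gb):
    """Spanning uses every drive in full."""
    return sum(sizes_gb)


def raid0_capacity(sizes_gb):
    """Striping uses only as much of each drive as the smallest one offers."""
    return len(sizes_gb) * min(sizes_gb)


drives = [10, 30]
print(jbod_capacity(drives))                            # 40
print(raid0_capacity(drives))                           # 20
print(jbod_capacity(drives) - raid0_capacity(drives))   # 20 GB left unused under RAID 0
```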
DAS (Direct-Attached Storage)
• Direct-Attached Storage (DAS) refers to a digital storage system directly
attached to a server or workstation, without a network in between.
• A typical DAS system is made of a data storage device connected directly to a
computer through a host bus adapter.
• The most important difference between DAS and NAS is that there is no network
device (like a hub, switch, or router) between the computer and the DAS.
• The media can be of any type (for example, Fibre Channel, SCSI, SSA, or Ethernet).
The I/O protocol is SCSI.
• DAS is optimized for single, isolated processors and low initial cost.
• If we use the storage disks’ location as a distinguishing point, we can have two
types of DAS: internal and external DAS.
• 1. Internal DAS
• With internal DAS, the storage disk/disks are directly located inside of a
hosting server.
• The common interface connection is through an HBA (Host Bus Adapter). The main
function of an HBA is to provide high-speed bus connectivity or communication
channels between a host server and storage devices or a storage network.
• Internal DAS allocates one or two disk drives for system boot. With an internal
DAS, the physical space is limited. If a business application needs further
expansion for larger storage capacity and the physical space within the host server
is not available, then we have to locate a disk array externally.
• 2. External DAS
• For an external DAS arrangement, the storage disk array is still directly
attached to a server without any network device.
• However, it interfaces with the host server with different protocols. The popular
interface protocols are SCSI and Fibre Channel (FC).
• In comparison with internal DAS, external DAS overcomes the issues of physical
space and distance between the host server and storage disk array. In addition, the
storage array can be shared by more than one host server.
• DAS still has drawbacks. The size of the DAS enclosure restricts storage capacity.
• Sharing with DAS, for example, is typically limited to a small number of ports
or host connections. These drawbacks continue to limit this type of storage's
appeal.
Network Attached Storage (NAS)
• NAS is a term for storage devices that connect to a network and provide
file access services to computer systems. These devices generally consist of an
engine that implements the file services, and one or more devices, on which data
is stored. NAS uses file access protocols such as NFS or CIFS.
• NAS systems are popular with enterprises and small businesses in many industries
as effective, scalable and low-cost storage solutions. They can be used to support
email systems, accounting databases, payroll, video recording and editing, data
logging, business analytics and more.
• NAS Protocols:
• Common Internet File System / Server Message Block (CIFS/SMB). This is
the protocol that Windows usually uses.
• Network File System (NFS). NFS was first developed for use with UNIX servers
and is also a common Linux protocol.
• In a NAS gateway, the disk storage is attached externally to the gateway, possibly
sold separately, and may also be a standalone offering for direct or SAN attachment.
• The gateway accepts a file I/O request (e.g., using the NFS or CIFS protocols)
and translates that to a SCSI block-I/O request to access the external attached disk
storage.
• The gateway approach to file sharing offers the benefits of a conventional
NAS appliance, with additional potential advantages:
• Centralization of data storage in a safe, reliable way for authorised network users
and clients
• Permits data access across the network, including cloud-based applications and data.
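The gateway's translation from a file I/O request to block I/O requests can be sketched as follows. This is a deliberately simplified Python model: the block size, the allocation table, the file path, and the function name are all hypothetical, and a real gateway relies on a full file system and issues SCSI commands rather than returning a list.

```python
BLOCK_SIZE = 512   # bytes per block (an assumed value)

# Hypothetical allocation table: file path -> block numbers holding the file, in order.
ALLOCATION = {"/share/report.txt": [40, 41, 97]}


def file_request_to_blocks(path, offset, length):
    """Translate a byte-range file read into the block numbers to fetch."""
    if length <= 0:
        return []
    blocks = ALLOCATION[path]
    first = offset // BLOCK_SIZE                   # block holding the first byte
    last = (offset + length - 1) // BLOCK_SIZE     # block holding the last byte
    return blocks[first:last + 1]


# A client asks for 100 bytes starting at byte 500; the range crosses a block boundary.
print(file_request_to_blocks("/share/report.txt", 500, 100))    # [40, 41]
print(file_request_to_blocks("/share/report.txt", 1024, 10))    # [97]
```

The client only ever names a file and a byte range; the block numbers stay hidden behind the gateway.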
Storage Area Network (SAN)
• A SAN (storage area network) is a network of storage devices that can be accessed by
multiple servers or computers, providing a shared pool of storage space.
• Each computer on the network can access storage on the SAN as though it were a local
disk connected directly to the computer.
• SAN is typically assembled with cabling, host bus adapters, and SAN switches attached to
storage arrays and servers.
• A SAN switch is hardware that connects servers to shared pools of storage devices. It is
dedicated to moving storage traffic in a SAN.
• A SAN and network-attached storage (NAS) are two different types of shared networked
storage solutions. While a SAN is a local network composed of multiple devices, NAS is a
single storage device that connects to a local area network (LAN).
• The most common medium is Fibre Channel, but Ethernet-based SANs are emerging. The
I/O protocol is SCSI.
• The main benefit of using a SAN is that raw storage is treated as a pool of
resources that IT can centrally manage and allocate on an as-needed basis. SANs
are also highly scalable because capacity can be added as required. The
main disadvantages of SANs are cost and complexity.
• SAN Protocols:
• Fibre Channel Protocol (FCP)
• Internet Small Computer System Interface (iSCSI)
• Fibre Channel over Ethernet (FCoE)
• Non-Volatile Memory Express over Fibre Channel (FC-NVMe)
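The idea of treating raw SAN storage as a centrally managed pool, allocated as needed and expanded when required, can be sketched in Python. The class `StoragePool`, its methods, and the server names are hypothetical and do not correspond to any vendor's management interface.

```python
class StoragePool:
    """Toy model of a centrally managed pool of SAN capacity."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.allocations = {}            # server name -> GB allocated

    def free_gb(self):
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, server, size_gb):
        # Hand out capacity only while the pool can cover the request.
        if size_gb > self.free_gb():
            raise ValueError("not enough free capacity in the pool")
        self.allocations[server] = self.allocations.get(server, 0) + size_gb

    def expand(self, extra_gb):
        # Adding a storage array grows the pool for every server.
        self.capacity_gb += extra_gb


pool = StoragePool(capacity_gb=1000)
pool.allocate("db01", 600)
pool.allocate("mail01", 300)
print(pool.free_gb())      # 100
pool.expand(500)
print(pool.free_gb())      # 600
```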
Content addressable storage (CAS)
• Content addressable storage (CAS) is a storage mechanism in which fixed data
is assigned a permanent location on a hard disk and addressed with a
unique content name, identifier or address.
• CAS is also known as associative storage, content aware storage or Fixed Content
Storage (FCS).
• CAS is designed to facilitate more efficient storage and access of fixed data that
does not generally change over time. It allows organizations to archive and retrieve
large amounts of data for longer retention periods, specifically to comply with
regulatory requirements.
• CAS works by storing each data object on a hard disk and assigning it a
unique content address/identifier. Once the data object is stored, it cannot be
duplicated, modified or deleted. To access the data, a user or application must
specify the data's content address or identifier.
• It is designed for secure online storage and recovery of fixed content.
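Content addressing can be illustrated with a minimal Python sketch. It assumes that the content address is a SHA-256 hash of the object, which is one common way to derive a unique identifier but is not stated in the text above; the class `ContentStore` is hypothetical and keeps objects in memory rather than on disk.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: objects are found by a hash of their content."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Assumption: the address is the SHA-256 digest of the content.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so no second copy is stored,
        # and an existing object is never overwritten.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]


store = ContentStore()
first = store.put(b"invoice 2021-04")
second = store.put(b"invoice 2021-04")     # same content stored again

print(first == second)                     # True, both calls return one address
print(store.get(first))                    # b'invoice 2021-04'
```

Because the address depends only on the content, any change to an object would produce a different address, which fits the fixed, unmodifiable nature of the data that CAS is meant to hold.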