SCloud - Storage Virtualization


Storage Virtualization

The cloud computing platform maximizes the efficiency of resource utilization and of business operation and maintenance through hardware-assisted virtualization, reduces the total cost of ownership of the IT infrastructure, and effectively improves the availability, reliability, and stability of business services. In addition to computing resources, enterprises also need to consider data storage suitable for virtualized computing platforms, including storage security, reliability, scalability, ease of use, performance, and cost.
The KVM-based virtualization platform can connect to various types of storage systems, such as local disks, commercial SAN storage devices, NFS, and distributed storage systems, to meet the data storage requirements of virtualized computing in different application scenarios.
• Local disk: a disk on the server itself, usually configured with RAID to protect disk data. High performance, but poor scalability and difficult migration in virtualized environments; suitable for high-performance business scenarios with low data-security requirements.

• Commercial storage: disk arrays, usually a single storage appliance that integrates software and hardware, using RAID to ensure data security. High performance and high cost, with virtual machine migration supported through shared file systems; suitable for large-scale application data storage scenarios such as Oracle Database.

• NFS system: a shared file system with low performance and good ease of use, but no guarantee of data security; suitable for scenarios where multiple virtual machines share read and write access.

• Distributed storage system: software-defined storage that follows the standards of general-purpose distributed storage systems, aggregating the disk resources of a large number of inexpensive general-purpose x86 servers to provide a unified storage service. It ensures data security through multiple replicas and is highly reliable, high-performance, highly secure, easy to scale, easy to migrate, and low-cost; suitable for most storage scenarios in virtualized computing.

Each type of storage system has advantages and disadvantages in different storage scenarios. The virtualization platform needs to select the appropriate storage system according to business characteristics to provide storage virtualization functions, and in some specific business modes it may be necessary to provide several storage systems for different application services.

This document is strictly private, confidential and personal to its recipients and should not be copied,
distributed or reproduced in whole or in part, nor passed to any third party.
In a traditional storage architecture, clients communicate with centralized storage components through a single entry point, which can limit the performance and scalability of the storage system and introduce a single point of failure. The SCloudStack platform uses a distributed storage system as the virtualized storage that interconnects with the KVM virtualized computing service: it eliminates the centralized gateway, lets clients interact directly with the storage system, and writes data with a multi-replica mechanism to ensure data security and availability.

Distributed Storage
Through adaptation and optimization of a distributed storage system, the SCloudStack cloud platform provides a purely software-defined, high-performance, highly reliable, highly scalable, highly secure, easy-to-manage, and low-cost virtualized storage solution that can be deployed on general-purpose x86 servers and offers excellent scalability. As the core component of the cloud platform, it provides users with a variety of storage services and petabyte-level data storage capabilities, is suitable for application scenarios such as virtual machines and databases, meets the storage requirements of key services, and ensures efficient, stable, and reliable business operation.
The distributed storage service integrates the disk storage resources of a large number of general-purpose x86 servers into an elastically scalable unified distributed storage cluster, realizing unified management and scheduling of all storage resources in the data center and providing a block storage interface to the virtualization computing layer, so that cloud platform virtual machines and virtual resources can freely allocate and use space in the storage resource pool according to their own needs. Storage management also supports connecting storage devices such as IP-SAN, FC-SAN, and NAS, attaching them through different adapters and exposing unified block storage space through the platform's storage virtualization technology.
Storage functions are provided in a what-you-see-is-what-you-get manner: users do not need to know the type or capabilities of the underlying storage devices and can directly use virtualized storage services on the cloud platform, such as virtual disk attachment, expansion, incremental snapshots, and monitoring. Cloud platform users operate virtual disks in the same way as the local hard disk of an x86 server, for example formatting, installing operating systems, and reading and writing data. Cloud platform administrators can configure and manage the platform's overall virtualized storage resources, such as QoS limits, storage pool expansion, storage specifications, and storage policy configuration.
The distributed storage system can provide block storage, file storage, and object storage
services, which are suitable for various data storage application scenarios, while ensuring data
security and cluster service reliability. In terms of deployment, it is generally recommended to build each storage cluster from the same type of disk: for example, the SSD disks in hyperconverged compute nodes and independent storage nodes form a high-performance storage cluster, while the SATA/SAS disks in hyperconverged compute nodes and independent storage nodes form a normal-performance storage cluster. The distributed storage system combines the disk devices in the cluster into OSDs as a built-in elastic block storage service that virtual machines can mount and use directly, and measures such as three replicas, a write-confirmation mechanism, and a replica distribution policy ensure data security and availability to the greatest extent when data is written. The logical schema is as follows:

The SCloudStack distributed storage system is an indispensable core component of the overall cloud platform architecture. It provides basic storage resources through the distributed storage cluster architecture, supports online horizontal expansion, and integrates intelligent storage clustering, a multi-replica mechanism, data striping, data rebalancing, fault data reconstruction, data scrubbing, thin provisioning, and snapshots to provide virtualized storage with high performance, high reliability, high scalability, easy management, and data security protection, improving the service quality of storage virtualization and the cloud platform in all respects.
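The three-replica write-confirmation idea described above can be sketched as follows. This is a minimal illustration, not SCloudStack's actual implementation: the `OSD` class and object names are invented, and real OSDs persist to disk rather than a dictionary. The point is that the client write is acknowledged only after the primary and both replica OSDs have confirmed the copy.

```python
# Minimal sketch (not the platform's real code) of a three-copy write
# with a write-confirmation mechanism: the client ack is returned only
# after every replica has persisted the object.

class OSD:
    def __init__(self, osd_id):
        self.osd_id = osd_id
        self.store = {}          # object name -> bytes (stand-in for a disk)

    def write(self, obj, data):
        self.store[obj] = data   # "persist" the copy locally
        return True              # confirmation back to the primary


def replicated_write(obj, data, primary, replicas):
    """Primary persists first, then fans out to the replica OSDs;
    success is reported only when every copy has confirmed."""
    acks = [primary.write(obj, data)]
    acks += [r.write(obj, data) for r in replicas]
    return all(acks)             # True == safe to acknowledge the client


osds = [OSD(i) for i in range(3)]                      # three-replica policy
ok = replicated_write("vol1/obj-0001", b"block data", osds[0], osds[1:])
print(ok)                                              # True: all copies confirmed
```

In a real cluster the replica writes happen in parallel and a failed confirmation triggers recovery; the sketch keeps only the acknowledgment ordering.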

Storage clusters
A distributed storage cluster can contain thousands of storage nodes and typically requires at least one monitor and two OSD daemons for proper operation and data replication. The basic concepts are as follows:
• OSD: Usually an OSD corresponds to a physical machine, a disk, a RAID group, or another physical storage device. It is mainly responsible for storing data; handling data replication, recovery, backfilling, and rebalancing; and reporting heartbeat information to the monitor. A single cluster requires at least two OSDs, and the physical architecture can be divided into multiple fault domains (rooms, racks, servers), with replicas configured to reside in different fault domains.

• Monitor: Monitors the status of the storage cluster, maintains the maps between Objects, PGs, and OSDs, provides strongly consistent decisions for data storage, and supplies clients with the mapping relationships for data placement.

• Client: Deployed on the server, it implements data sharding, locates objects through the CRUSH algorithm, and reads and writes object data. Clients typically include block devices, object storage, file systems, and so on, with read and write operations handled by the OSD daemons.

• CRUSH algorithm: A pseudo-random algorithm used to ensure uniform data distribution. Both OSDs and clients use CRUSH to calculate the location of objects on demand, which supports dynamic scaling, rebalancing, and self-healing of the storage cluster.
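The fault-domain rule from the OSD bullet above can be sketched like this. The rack topology, OSD numbering, and selection rule are all invented for illustration; the real replica distribution policy is driven by CRUSH rules, not a simple modulo.

```python
# Illustrative sketch of failure-domain-aware replica placement: the
# three copies of a PG are drawn from OSDs in *different* racks, so the
# loss of one rack cannot destroy all replicas. Topology is hypothetical.

TOPOLOGY = {                     # rack -> OSDs located in that rack
    "rack-a": [0, 1, 2, 3],
    "rack-b": [4, 5, 6, 7],
    "rack-c": [8, 9, 10, 11],
}

def place_replicas(pg: int, replicas: int = 3):
    """Pick one OSD per rack, deterministically from the PG id."""
    chosen = []
    for rack, osds in sorted(TOPOLOGY.items())[:replicas]:
        chosen.append(osds[pg % len(osds)])   # one OSD from each rack
    return chosen

print(place_replicas(pg=5))      # [1, 5, 9] -- one OSD per rack
```

Because the choice is a pure function of the PG id and the topology, every client and OSD computes the same placement without consulting a central gateway.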

The data storage process is shown in the following figure. The storage cluster receives data from the client and stores the data shards as objects in a storage pool. The client program obtains the mapping data by interacting with the OSDs or the monitor, calculates the object storage location locally through the CRUSH algorithm, and communicates directly with the corresponding OSD to complete read and write operations.

Data can be sharded into 2MB, 4MB, or custom-sized objects, each of which is a file in the file system.
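The sharding step can be sketched as below, assuming the 4 MB default object size mentioned above; the `"<name>.<index>"` object-naming scheme is a hypothetical stand-in for whatever naming the platform actually uses.

```python
# Illustrative sketch of data striping: a file's bytes are cut into
# fixed-size objects (4 MB here; 2 MB or custom sizes work the same way).
# Object names follow a hypothetical "<name>.<index>" scheme.

def stripe(data: bytes, object_size: int = 4 * 1024 * 1024, name: str = "file"):
    """Return a mapping of object name -> object bytes."""
    return {
        f"{name}.{i // object_size:08d}": data[i:i + object_size]
        for i in range(0, len(data), object_size)
    }

objects = stripe(b"x" * (9 * 1024 * 1024), name="vm-disk")  # a 9 MB file
print(len(objects))              # 3 objects: 4 MB + 4 MB + 1 MB
print(sorted(objects)[0])        # vm-disk.00000000
```

Each resulting object is then stored as an independent file-like unit in the cluster, which is what lets the objects of one file land on many OSDs in parallel.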

The distributed storage cluster involves a variety of logical concepts in the data storage process,
and the logical architecture diagram is as follows:

• A cluster can be logically divided into multiple pools. A Pool is a namespace, and clients must specify a Pool when storing data.

• A Pool contains several logical PGs (Placement Groups); the number of PGs and object replicas can be defined per pool.

• A PG is an intermediate logical layer between objects and OSDs; when object data is written, the PG that will store it is calculated first.

• A physical file is divided into multiple objects, each object is mapped to a PG, and a PG
contains multiple objects.

• A PG is mapped to a set of OSDs, where the first OSD is the primary and the other OSDs hold replicas; Objects are evenly distributed across the set of OSDs for storage.

• OSDs hosting the same PG monitor each other's liveness, and multiple PGs can be mapped to the same OSD at the same time.

In the storage cluster mechanism, the primary and replica OSDs hosting the same PG exchange messages with each other to confirm each other's liveness. On first access, the client obtains the map data from the monitor, and when storing data it compares its map version with the OSD's. As the diagram above shows, an OSD can host multiple PGs at the same time, and each PG usually spans three OSDs; the number of messages exchanged at any one time can reach the thousands and is determined by the number of OSDs and PGs in the cluster. As shown in the preceding figure, the data addressing process is divided into three mapping phases:
1. Map the files the user operates on into objects that the storage cluster can process; that is, use data striping to shard files according to the object size. This striping is consistent with the concept of striping in RAID.

2. Map the Object of all file shards to the PG.

3. Map the PG into the OSD where the data is actually stored.
The pool size, the number of replicas, the CRUSH algorithm rules, and the number of PGs of the distributed storage cluster together determine how the storage cluster stores data.
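The three mapping phases can be sketched end to end as follows. This is a hedged simplification: a plain MD5 hash stands in for the real CRUSH algorithm, and the PG count, OSD count, and object name are invented for the example. What it preserves is the key property of the architecture: every client computes the same object location locally, so no central lookup gateway is needed.

```python
# Sketch of the three-phase addressing: object -> PG (hash mod pg_num),
# PG -> an ordered set of OSDs whose first member is the primary.
# MD5 stands in for CRUSH; all sizes and names are illustrative.
import hashlib

PG_NUM = 128                     # PGs defined for the pool
REPLICAS = 3                     # three-copy policy
OSDS = list(range(12))           # 12 OSDs in the cluster


def obj_to_pg(obj_name: str) -> int:
    """Phase 2: map a striped object to a PG."""
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    return h % PG_NUM


def pg_to_osds(pg: int) -> list:
    """Phase 3: deterministic, CRUSH-like choice of REPLICAS distinct
    OSDs from the PG id alone; ranked[0] acts as the primary."""
    ranked = sorted(
        OSDS,
        key=lambda osd: hashlib.md5(f"{pg}:{osd}".encode()).hexdigest(),
    )
    return ranked[:REPLICAS]


# Phase 1 (striping) yields object names such as "vm-disk.00000002":
pg = obj_to_pg("vm-disk.00000002")
osd_set = pg_to_osds(pg)
print(pg, osd_set)               # identical result on every client
```

Because both mappings are pure functions of their inputs, adding OSDs changes only which PGs move (rebalancing), not how clients address data.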

