PowerScale+Concepts SSP+ +Participant+Guide
CONCEPTS-SSP
PARTICIPANT GUIDE
Internal Use - Confidential
PowerScale Concepts-SSP
Course Objectives
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 1
Prerequisite Skills
To understand the content and successfully complete this course, a student must
have a suitable knowledge base or skill set. The student must have an
understanding of:
• Networking fundamentals such as TCP/IP, DNS and routing
• An introduction to storage such as NAS and SAN differences and basic storage
principles and features
• Installation process of a PowerScale cluster
Rebranding - Isilon is now PowerScale
The graphic shows the PowerScale Solutions Expert certification track. You can
leverage the Dell Technologies Proven Professional program to realize your full
potential. The program combines technology-focused and role-based training and
exams that cover concepts and principles as well as the full range of Dell
Technologies' hardware, software, and solutions. You can accelerate your career
and your organization’s capabilities.
[Figure: PowerScale Solutions certification track. (C) denotes classroom training.]
Course Objectives
Data Storage Overview
Module Objectives
Storage Evolution
During the data storage evolution, two types of data developed: structured data and
unstructured data. PowerScale specializes in storing unstructured data.
• Structured data: resides in fixed fields of records or files
• Unstructured data: does not reside in a fixed model
Block-based data
• Sequence of bytes at fixed length
• Single piece of file or whole file
File-based data
• Discrete unit of information defined by application or created by user
Digital Transformation
IDC projects that, through 2022, 75% of successful digital strategies will be built by
a transformed IT organization, with modernized and rationalized infrastructure,
applications, and data architectures.
[...] within the next four years, the global economy will finally reach digital
supremacy, with more than half of gross domestic product (GDP) [...] - IDC FutureScape [1]
Gross domestic product (GDP) is a monetary measure of the market value of all the
final goods and services produced in a specific time period.
At the same time, many organizations still struggle to tactically apply DX learnings
to their own business.
With unstructured data being the majority of data storage growth, a solution was
needed. An International Data Corporation (IDC) study published in 2018 showed
that the amount of digital data created, captured, and replicated worldwide grew
exponentially. This finding was based on the proliferation of then-new technologies
such as Voice over IP, RFID, smartphones, and consumer use of GPS, and on the
continued growth of data generators such as digital cameras, HD TV broadcasts,
digital games, ATMs, email, videoconferencing, and medical imaging.
PowerScale clusters are a NAS solution. There are two types of NAS architectures:
scale-up and scale-out.
Scale-Up
[3] The two controllers can run active-active or active-passive. For more capacity,
add another disk array. Each of these components is added individually. As more
systems are added, NAS sprawl becomes an issue.
[Figure: scale-up NAS. Clients connect to controllers with disk shelves holding structured or unstructured storage. The independent systems on the network are separate points of management.]
Scale-Out
• With a clustered NAS solution, or scale-out architecture, all the NAS boxes, or
PowerScale nodes, belong to a unified cluster with a single point of
management.
• In a scale-out solution [4], the computational throughput, disks, disk protection,
and management are combined and exist for a single cluster.
[4] Not all clustered NAS solutions are the same. Some vendors overlay a
management interface across multiple independent NAS boxes. This gives a
unified management interface, but does not unify the file system. While this
approach does ease the management overhead of traditional NAS, it still does not
scale well.
[Figure: scale-out NAS. Clients connect to a single cluster of unstructured storage, 1000+ PBs.]
Scale-Out NAS
Scale-out NAS [5] is now a mainstay in most data center environments. The next
wave of scale-out NAS innovation has enterprises embracing the value [6] of NAS
and adopting it as the core of their infrastructure.
DAS
SAN
As applications proliferated, soon there were many servers, each with its own DAS.
This worked fine, with some drawbacks. If one server’s DAS was full while another
server’s DAS was half empty, the empty DAS couldn’t share its space with the full
DAS. Because of this limitation with DAS, SAN was introduced, which made
effective use of a volume manager and RAID.
NAS
SAN was set up for servers, not personal computers (PCs). PCs worked differently
from the storage file server, and network communication in PCs only takes place
from one file system to another file system. The breakthrough came when
corporations put employee computers on the network and added a file system to
the storage to communicate with users. From this, Network Attached Storage
(NAS) was born.
NAS works pretty well, but there is room for improvement. For example, the server
is spending as much time servicing employee requests as it is doing the application
work it was meant for. The file system doesn’t know where data is supposed to go,
because that’s the volume manager’s job. The volume manager doesn’t know how
the data is protected; that’s RAID’s job. If high-value data needs more protection
than other data, you need to migrate the data to a different volume that has the
protection level that data needs. So there is opportunity to improve NAS.
With traditional NAS systems the file system [7], volume manager [8], and the
implementation of RAID [9] are all separate entities.
OneFS is the operating system and the underlying file system that drives and
stores data. OneFS is a single file system that performs the duties of the volume
manager and applies protection.
• Creates a single file system for the cluster. [10]
• Volume manager and protection. [11]
[7] The file system is responsible for the higher-level functions of authentication and
authorization.
[10] As nodes are added, the file system grows dynamically and content is
redistributed.
[12] Because all information is shared among nodes, the entire file system is
accessible by clients connecting to any node in the cluster.
[13] Each PowerScale storage node contains globally coherent RAM, meaning that,
as a cluster becomes larger, it also becomes faster. When a node is added, the
performance scales linearly.
PowerScale Physical Architecture
Module Objectives
[14] The Gen 6 platform reduces the data center rack footprint with support for four
nodes in a single 4U chassis. It enables enterprises to take on new and more
demanding unstructured data applications. The Gen 6 can store, manage, and
protect massively large datasets with ease. With the Gen 6, enterprises can gain
new levels of efficiency and achieve faster business outcomes.
[15] The ideal use cases for Gen 6.5 (F200 and F600) are remote office/back office,
factory floors, IoT, and retail. Gen 6.5 also targets smaller companies in the core
verticals, and partner solutions, including OEM. The key advantages are low entry
price points and the flexibility to add nodes individually, as opposed to the
chassis/two-node minimum for Gen 6.
Network: There are two types of networks that are associated with a cluster:
internal and external.
Ethernet
Clients connect to the cluster using Ethernet connections [17] that are available on all
nodes.
[16] In general, keeping the network configuration simple provides the best results
with the lowest amount of administrative overhead. OneFS offers network
provisioning rules to automate the configuration of additional nodes as clusters
grow.
[17] Because each node provides its own Ethernet ports, the amount of network
bandwidth available to the cluster scales linearly.
OneFS supports a single cluster [18] on the internal network. This back-end network,
which is configured with redundant switches for high availability, acts as the
backplane for the cluster. [19]
[19] This enables each node to act as a contributor in the cluster and isolates
node-to-node communication to a private, high-speed, low-latency network. This
back-end network uses Internet Protocol (IP) for node-to-node communication.
The external network provides connectivity for clients over standard file-based
protocols. It supports link aggregation, and network scalability is provided through
software in OneFS. A Gen 6 node has 2 front-end ports (10 GigE, 25 GigE, or
40 GigE) and one 1 GigE port for management. Gen 6.5 nodes have 2 front-end
ports (10 GigE, 25 GigE, or 100 GigE). In the event of a Network Interface
Controller (NIC) or connection failure, clients do not lose their connection to the
cluster. For stateful protocols, such as SMB and NFSv4, this prevents client-side
timeouts and unintended reconnection to another node in the cluster. Instead,
clients maintain their connection to the logical interface and continue operating
normally. Support for Continuous Availability (CA) for stateful protocols like SMB
and NFSv4 is available with OneFS 8.0.
Back-end Network
InfiniBand
Ethernet
The clients can access the cluster using DNS, and the enhanced functionality [20]
provides connection distribution policies as shown in the graphic. It also provides
continuous availability [21] (CA) capabilities.
2: Determines the average CPU utilization on each available network interface and
selects the network interface with lightest processor usage.
3: Selects the next available network interface on a rotating basis. This selection is
the default method. Without a SmartConnect license for advanced settings, this is
the only method available for load balancing.
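The two policies described above can be sketched in a few lines. This is a minimal illustration, not the SmartConnect implementation: the function names, interface names, and CPU figures are all hypothetical.

```python
from itertools import cycle


def round_robin(interfaces):
    """Return an endless iterator that hands out interfaces in rotating order."""
    return cycle(interfaces)


def lightest_cpu(cpu_by_interface):
    """Return the interface with the lowest average CPU utilization."""
    return min(cpu_by_interface, key=cpu_by_interface.get)


# Hypothetical interface names and CPU percentages, for illustration only.
interfaces = ["node1:ext-1", "node2:ext-1", "node3:ext-1"]
rotation = round_robin(interfaces)
first_four = [next(rotation) for _ in range(4)]

cpu = {"node1:ext-1": 62.0, "node2:ext-1": 18.5, "node3:ext-1": 40.0}
chosen = lightest_cpu(cpu)
```

In this sketch the fourth connection wraps back to the first interface, while the CPU-based policy always picks the least busy interface at the moment of the request.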
N + M Data Protection
OneFS sets parity bits, also called FEC protection. In the example below, using the
parity bit (green), OneFS determines the missing pieces.
Here, if blue + yellow = green, the missing pieces are identified using the parity
bits.
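The "blue + yellow = green" idea can be illustrated with byte-wise XOR, the simplest single-parity scheme. This is only a stand-in under that assumption: it does not claim to be how OneFS computes FEC, and tolerating more than one missing piece needs a stronger erasure code than XOR. The function names are hypothetical.

```python
def xor_parity(units):
    """Return the byte-wise XOR of equal-length data units."""
    parity = bytearray(len(units[0]))
    for unit in units:
        for i, byte in enumerate(unit):
            parity[i] ^= byte
    return bytes(parity)


def recover(surviving_units, parity):
    """Rebuild the single missing unit from the survivors plus the parity."""
    return xor_parity(list(surviving_units) + [parity])


blue = b"\x0f\xf0\xaa"
yellow = b"\x01\x02\x03"
green = xor_parity([blue, yellow])    # the parity unit

rebuilt_yellow = recover([blue], green)   # yellow is lost
rebuilt_blue = recover([yellow], green)   # blue is lost
```

Because XOR is its own inverse, combining the parity with every surviving piece yields the missing one.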
FEC enables the customer to choose the number of bits of parity to implement.
One bit of parity for many disks is known as N+1; two bits of parity for many disks
is known as N+2, and so on.
With N+1 protection, data is 100% available even if a drive or a node fails.
With N+2, N+3, and N+4 protection, data is 100% available if multiple drives or
nodes fail.
During the write operation, with OneFS, the file from the client is striped across the
nodes. The system breaks the file-based data into smaller logical sections called
stripe units. The smallest element in a stripe unit is an 8-kilobyte block, and each
stripe unit is 128 kilobytes, or sixteen 8-kilobyte blocks. If the data file is larger than
128 kilobytes, the next part of the file is written to a second node. If the file is larger
than 256 kilobytes, the third part is written to a third node, and so on. The graphic
illustrates a 384-kilobyte file with 3 stripe units and 1 FEC unit.
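The arithmetic in this paragraph can be checked with a short sketch. It assumes one unit per node and a single stripe, so it ignores what happens when a file has more stripe units than the cluster has nodes. The function name and the sequential node numbering are hypothetical simplifications.

```python
BLOCK_SIZE = 8 * 1024                 # 8 KB block, per the text
STRIPE_UNIT_SIZE = 16 * BLOCK_SIZE    # sixteen blocks = 128 KB


def stripe_layout(file_size, fec_units=1):
    """Return (node_number, kind) pairs, placing one unit on each node."""
    data_units = -(-file_size // STRIPE_UNIT_SIZE)   # ceiling division
    layout = [(n + 1, "data") for n in range(data_units)]
    layout += [(data_units + m + 1, "FEC") for m in range(fec_units)]
    return layout


layout = stripe_layout(384 * 1024)
```

For the 384 KB file this gives three data stripe units on nodes 1 through 3 and one FEC unit on node 4, matching the example in the text.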
[Figure: a file divided into stripe units and an FEC unit, written across Node 1 through Node 4.]
[Figure: leaf and spine switches.]
PowerScale Nodes
Module Objectives
The design goal for the PowerScale nodes is to keep the simple ideology of NAS,
provide the agility of the cloud, and offer the cost of commodity hardware.
The Gen 6x family has different offerings that are based on the need for
performance and capacity. As Gen 6 is a modular architecture, you can scale out
compute and capacity separately. All the nodes are powered by OneFS.
PowerScale Family
F-Series
• F800 [22]
• F810 [23]
• F600 [24]
• F200 [25]
[22] The F800 is suitable for workflows that require extreme performance and
efficiency.
[23] The F810 is suitable for workflows that require extreme performance and
efficiency. The F810 also provides high-speed inline data deduplication and inline
data compression. It delivers up to 3:1 efficiency, depending on your specific
dataset and workload.
[24] Ideal for small, remote clusters with exceptional system performance for small
office/remote office technical workloads.
H-Series
After F-series nodes, next in terms of computing power are the H-series nodes.
These are hybrid storage platforms that are highly flexible and strike a balance
between large capacity and high-performance storage to provide support for a
broad range of enterprise file workloads.
• H400 [26]
• H500 [27]
• H5600 [28]
• H600 [29]
[25] Ideal as a low-cost all-flash node pool for existing Gen 6 clusters. Ideal for small,
remote clusters.
[27] The H500 is a versatile hybrid platform that delivers up to 5 GB/s bandwidth per
chassis with a capacity ranging from 120 TB to 720 TB per chassis. It is an ideal
choice for organizations looking to consolidate and support a broad range of file
workloads on a single platform.
[28] The H5600 combines massive scalability (960 TB per chassis and up to 8 GB/s
bandwidth) in an efficient, highly dense, deep 4U chassis. The H5600 delivers inline
data compression and deduplication. It is designed to support a wide range of
demanding, large-scale file applications and workloads.
A-Series
The A-series nodes have less compute power than other nodes
and are designed for data archival purposes. The archive platforms can be
combined with new or existing all-flash and hybrid storage systems into a single
cluster that provides an efficient tiered storage solution.
• A200 [30]
• A2000 [31]
[30] The A200 is an ideal active archive storage solution that combines near-primary
accessibility, value, and ease of use.
[31] The A2000 is an ideal solution for high density, deep archive storage that
safeguards data efficiently for long-term retention.
Gen 6 requires a minimum of four nodes to form a cluster. You must add nodes to
the cluster in pairs.
The chassis holds four compute nodes and 20 drive sled slots.
Both compute modules in a node pair power-on immediately when one of the
nodes is connected to a power source.
[Figure: Gen 6 chassis with numbered callouts, described below.]
1: The compute module bays of two nodes make up one node pair. Scaling out a
cluster with Gen 6 nodes is done by adding more node pairs.
2: Each Gen 6 node provides two ports for front-end connectivity. The connectivity
options for clients and applications are 10 GbE and 40 GbE.
3: Each node can have 1 or 2 SSDs that are used as L3 cache, global namespace
acceleration (GNA), or other SSD strategies.
4: Each Gen 6 node provides two ports for back-end connectivity. A Gen 6 node
supports 10 GbE, 40 GbE, and InfiniBand.
5: Power supply unit - Peer node redundancy: When a compute module power
supply failure takes place, the power supply from the peer compute module in the
node pair will temporarily provide power to both nodes.
6: Each node has five drive sleds. Depending on the length of the chassis and type
of the drive, each node can handle up to 30 drives or as few as 15.
8: The sled can be either a short sled or a long sled. The types are:
9: The chassis comes in two different depths: the normal depth is about 37 inches
and the deep chassis is about 40 inches.
10: Large journals offer flexibility in determining when data should be moved to the
disk. Each node has a dedicated M.2 vault drive for the journal. A node mirrors
its journal to its peer node. The node writes the journal contents to the vault when
a power loss occurs. A backup battery helps maintain power while data is stored in
the vault.
Gen 6.5 requires a minimum of three nodes to form a cluster. You can add single
nodes to the cluster. The F600 and F200 have a 1U form factor and are based on the
R640 architecture.
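The sizing rules for the two generations (Gen 6: at least four nodes, added in pairs; Gen 6.5: at least three nodes, added singly) can be expressed as a small validity check. This is a sketch that assumes a cluster built from one generation only. The table and function names are hypothetical.

```python
# Minimum node count and growth increment per generation, taken from the text.
RULES = {
    "Gen 6": {"min_nodes": 4, "add_step": 2},
    "Gen 6.5": {"min_nodes": 3, "add_step": 1},
}


def is_valid_node_count(generation, nodes):
    """Return True if a single-generation cluster can have this many nodes."""
    rule = RULES[generation]
    if nodes < rule["min_nodes"]:
        return False
    return (nodes - rule["min_nodes"]) % rule["add_step"] == 0


gen6_five = is_valid_node_count("Gen 6", 5)      # odd count, not reachable in pairs
gen65_four = is_valid_node_count("Gen 6.5", 4)   # three nodes plus one
```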
1: Scaling out an F200 or an F600 node pool only requires adding one node.
3: Each F200 node has four SAS SSDs. Each F600 node has 8 NVMe SSDs.
4: Each F200 and F600 node provides two ports for backend connectivity. The
PCIe slot 1 is used.
5: Redundant power supply units - When a power supply fails, the secondary
power supply in the node provides power. Power is supplied to the system equally
from both PSUs when the Hot Spare feature is disabled. Hot Spare is configured
using the iDRAC settings.
7: The nodes come in two different 1U models. The graphic shows the F200.
8: The F200 frontend connectivity uses the rack network daughter card (rNDC).
Important: The F600 nodes have a 4-port 1 GbE NIC in the rNDC slot.
The NIC is not allocated to any OneFS function.
Node Interconnectivity
1: Backend ports int-a and int-b. The int-b port is the upper port. Gen 6 backend
ports are identical for InfiniBand and Ethernet, and cannot be identified by looking
at the node. If Gen 6 nodes are integrated in a Gen 5 or earlier cluster, the backend
will use InfiniBand. Note that there is a procedure to convert an InfiniBand backend
to Ethernet if the cluster no longer has pre-Gen 6 nodes.
2: PowerScale nodes with different backend speeds can connect to the same
backend switch and not see any performance issues. For example, an environment
has a mixed cluster where A200 nodes have 10 GbE backend ports and H600
nodes have 40 GbE backend ports. Both node types can connect to a 40 GbE
switch without affecting the performance of other nodes on the switch. The 40 GbE
switch provides 40 GbE to the H600 nodes and 10 GbE to the A200 nodes.
4: There are two speeds for the backend Ethernet switches, 10 GbE and 40 GbE.
Some nodes, such as archival nodes, might not need to use all of a 10 GbE port
bandwidth while other workflows might need the full utilization of the 40 GbE port
bandwidth. The Ethernet performance is comparable to InfiniBand so there should
be no performance bottlenecks with mixed performance nodes in a single cluster.
Administrators should not see any performance differences if moving from
InfiniBand to Ethernet.
Gen 6 nodes can use either an InfiniBand or Ethernet switch on the backend.
InfiniBand was designed as a high-speed interconnect for high-performance
computing, and Ethernet provides the flexibility and high speeds that sufficiently
support the PowerScale internal communications.
Gen 6.5 only supports Ethernet. All new PowerScale clusters support Ethernet
only.
Quick Scalability
Ready to write
PowerScale OneFS Operating System
Module Objectives
[33] When nodes are added, OneFS redistributes the content to use the resources of
the entire cluster.
• FlexProtect [35].
• Runs on all nodes.
• Each node is a peer [36].
• Prevents bottlenecking [37].
• A copy of OneFS is on every cluster node.
• 10 GbE and 40 GbE (Gen 6); 10 GbE, 25 GbE, and 100 GbE (Gen 6.5); and
InfiniBand handle all intracluster communications.
[35] Creates an n-way, redundant fabric that scales as nodes are added to the
cluster, providing 100% data availability even with four simultaneous node failures.
[36] Each node shares the management workload and acts independently as a point
of access for incoming data requests.
Benefits of OneFS
[38] When a node is added to the cluster, it adds computing power, storage, caching,
and networking resources.
OneFS supports access to the same file using different protocols and
authentication methods simultaneously. SMB clients that authenticate using Active
Directory (AD), and NFS clients that authenticate using LDAP, can access the
same file with their appropriate permissions applied.
• OneFS translates Windows Security Identifiers (SIDs) and UNIX User Identities
(UIDs) into a common identity format.
• Different authentication sources.
• Permissions activities are transparent to the client.
• Authenticate against correct source.
• File access behavior as protocol expects.
• Correct permissions applied - stores the appropriate permissions for each
identity or group.
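The multiprotocol idea in the list above, where one user reaches the same file through SMB with a SID or through NFS with a UID, can be sketched as a lookup into a shared identity record. The class names, the SID, the UID, and the permission set are all hypothetical. This is not the identity format OneFS uses internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    """A common identity record that ties a SID and a UID to one user."""
    name: str
    sid: Optional[str] = None
    uid: Optional[int] = None


class IdentityMap:
    def __init__(self):
        self._by_sid = {}
        self._by_uid = {}

    def register(self, identity):
        if identity.sid is not None:
            self._by_sid[identity.sid] = identity
        if identity.uid is not None:
            self._by_uid[identity.uid] = identity

    def from_smb(self, sid):
        return self._by_sid[sid]

    def from_nfs(self, uid):
        return self._by_uid[uid]


# Hypothetical user: the SID and UID values are made up for illustration.
EXAMPLE_SID = "S-1-5-21-1004336348-1177238915-682003330-1104"
idmap = IdentityMap()
idmap.register(Identity("jsmith", sid=EXAMPLE_SID, uid=2001))

file_permissions = {"jsmith": {"read", "write"}}   # stored once per identity

smb_user = idmap.from_smb(EXAMPLE_SID)
nfs_user = idmap.from_nfs(2001)
```

Both lookups resolve to the same record, so the same stored permissions apply regardless of the protocol the client used.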
Authentication
1: Active Directory (AD): The primary reason for joining the cluster to an AD
domain is to let the AD domain controller perform user and group authentication.
4: Local or File Provider: OneFS supports local user and group authentication
using the web administration interface.
Policy-Based Automation
• Includes the way data is distributed across the cluster and on each node.
• Includes how client connections get distributed among the nodes, when and
how maintenance tasks are run.
Management Interfaces
• Serial Console [39]
• Web Administration Interface (WebUI) [40]
• Command Line Interface (CLI) [41]
[39] The serial console is used for initial cluster configurations by establishing serial
access to the node designated as node 1.
[42] The PAPI is divided into two functional areas: one area enables cluster
configuration, management, and monitoring functionality, and the other area
enables operations on files and directories on the cluster.
[43] The Front Panel Display is located on the physical node or chassis. It is used to
perform basic administrative tasks onsite.
So who is allowed to access and make configuration changes using the cluster
management tools? In addition to the integrated root and admin users, OneFS
provides role-based access control (RBAC). With RBAC, you can define privileges
to customize access to administration features in the OneFS WebUI and CLI, and for
PAPI management.
• Grant or deny access to management features.
Configured user with restricted privileges
• RBAC
• Set of global admin privileges
• Five preconfigured admin roles
• Zone RBAC (ZRBAC)
• Set of admin privileges specific to an access zone
• Two preconfigured admin roles
• Can create custom roles.
• Assign users to one or more roles.
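A role-based check of this kind can be sketched as follows: privileges are grouped into roles, users are assigned one or more roles, and access is granted if any assigned role carries the privilege. The role names, privilege names, and users below are hypothetical and are not the preconfigured OneFS roles.

```python
# Hypothetical roles and privileges, for illustration only.
ROLES = {
    "AuditViewer": {"view_settings"},
    "QuotaManager": {"view_settings", "edit_quotas"},
}

# A user can hold one or more roles.
USER_ROLES = {
    "alice": ["AuditViewer", "QuotaManager"],
    "bob": ["AuditViewer"],
}


def has_privilege(user, privilege):
    """Grant access if any of the user's roles includes the privilege."""
    return any(privilege in ROLES[role] for role in USER_ROLES.get(user, []))


alice_can_edit = has_privilege("alice", "edit_quotas")
bob_can_edit = has_privilege("bob", "edit_quotas")
```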
[Figure: cluster location connected to the Dell Technologies support location.]
If there is an issue with your cluster, there are two types of support available. You
can manually upload logfiles to the Dell Technologies support FTP site, or use
Secure Remote Services.
• Manual FTP upload of logfiles
• As needed.
• Support requests logfiles.
• Secure Remote Services
• Broader product support.
• Manual logfile uploads.
• 24x7 remote monitoring - monitors on a node-by-node basis and sends alerts
regarding the health of devices.
• Allows remote cluster access - requires permission.
• Secure authentication with AES 256-bit encryption and RSA digital
certificates.
• Log files provide detailed information about the cluster activities.
• Remote session that is established through SSH or the WebUI - support
personnel can run scripts that gather diagnostic data about cluster settings and
operations. Data is sent to a secure FTP site where service professionals can
open support cases and troubleshoot on the cluster.
Data Management and Security
Module Objectives
Node Pool
Data distribution is how OneFS spreads data across the cluster. Various models of
PowerScale nodes, or node types, can be present in a cluster. Nodes are assigned
to node pools based on the model type, number of drives, and the size of the
drives. The cluster can have multiple node pools, and groups of node pools can be
combined to form tiers of storage. Data is distributed among the node pools
based on the highest percentage of available space. This means that the
data target can be a pool or a tier anywhere on the cluster.
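Choosing a target by highest percentage of available space, rather than by most free capacity in absolute terms, can be illustrated with a short sketch. The pool names and capacities are hypothetical, and real placement also depends on policies that are not modeled here.

```python
def pick_target_pool(pools):
    """Return the pool with the highest fraction of free space.

    pools maps a pool name to a (used_tb, total_tb) tuple.
    """
    def free_fraction(name):
        used, total = pools[name]
        return (total - used) / total

    return max(pools, key=free_fraction)


# Hypothetical node pools: (used TB, total TB).
pools = {
    "f600_pool": (40, 100),    # 60% free
    "h500_pool": (300, 400),   # 25% free
    "a200_pool": (100, 500),   # 80% free
}
target = pick_target_pool(pools)
```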
Data IO Optimization
[Figure: data access patterns (random, concurrent, sequential), managed cluster-wide by default or by directories or files.]
You can optimize data input and output to match the workflows for your business.
By default, optimization is managed cluster-wide, but you can manage individual
directories or individual files. The data access pattern can be optimized for random
access, sequential access, or concurrent access. For example, sequential
optimization has aggressive prefetching. The prefetch, or read ahead, is an
optimization algorithm that attempts to predict what data is needed next, before the
request is made. When clients open larger files, especially streaming formats like
video and audio, OneFS assumes that you will watch minute four of the video after
minute three. Prefetch proactively loads minutes four, five, and sometimes even six
into memory before it is requested. Prefetch delivers those minutes faster than
returning to the hard drive for each request. With OneFS, you can configure the
prefetch cache characteristics to work best with the selected access pattern.
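The read-ahead behavior described above can be sketched with a toy cache that loads the next few chunks whenever one is requested. This only illustrates the general technique. It is not the OneFS prefetch algorithm, and the class name and `depth` parameter are hypothetical.

```python
class ReadAheadCache:
    """Toy sequential prefetcher: reading chunk i also loads the next `depth` chunks."""

    def __init__(self, backing, depth=2):
        self.backing = backing    # list standing in for chunks on disk
        self.depth = depth
        self.cache = {}
        self.disk_reads = 0

    def _load(self, index):
        if index not in self.cache and 0 <= index < len(self.backing):
            self.cache[index] = self.backing[index]
            self.disk_reads += 1

    def read(self, index):
        """Return (chunk, was_cache_hit) and prefetch the chunks that follow."""
        hit = index in self.cache
        self._load(index)
        for ahead in range(index + 1, index + 1 + self.depth):
            self._load(ahead)
        return self.cache[index], hit


video = ["minute-%d" % m for m in range(1, 7)]
cache = ReadAheadCache(video, depth=2)
first = cache.read(2)     # minute three: a miss that also prefetches minutes four and five
second = cache.read(3)    # minute four: already in memory
```

A larger `depth` suits streaming reads, while random access gains little from reading ahead, which is why the access pattern and the prefetch settings are chosen together.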
Performance optimization is the first thing a customer notices about their cluster in
day-to-day operations. But what does the average administrator notice second?
After noticing how well a cluster works, they notice when it has issues. They
want it fast, and they want it to work. That is one reason why data protection is
essential.
Data protection level refers to how many components in a cluster can malfunction
without loss of data.
• Flexible and configurable.
• Virtual hot spare - allocate disk space to hold data as it is rebuilt when a disk
drive fails.
• Select FEC protection by node pool, directory, or file.
• Extra protection creates more FEC stripes, increasing overhead.
• Standard functionality is available in the unlicensed version of SmartPools.
You can subdivide capacity usage by assigning storage quotas to users, groups,
and directories.
• Policy-based quota management.
• Nesting - place a quota on a department, and then a smaller quota on each
department user, and a different quota on the department file share.
• Thin provisioning - shows available storage even if capacity is not available.
• Quota types
• Accounting - informational only, can exceed quota.
• Enforcement soft limit - notification sent when exceeded
• Enforcement hard limit - deny writes.
• Customizable quota notifications.
• Requires SmartQuotas license.
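The three quota types listed above differ only in what happens when usage passes a limit. A minimal sketch, assuming simple byte counts and ignoring any grace period or notification rules a real quota system may apply. The function name and parameters are hypothetical.

```python
def check_write(usage, size, soft_limit=None, hard_limit=None):
    """Return (allowed, notify) for a write of `size` on top of `usage`.

    With no limits set, the quota is accounting only: writes always succeed.
    """
    new_usage = usage + size
    if hard_limit is not None and new_usage > hard_limit:
        return False, True                  # hard limit: deny the write
    notify = soft_limit is not None and new_usage > soft_limit
    return True, notify                     # soft limit: allow but notify


accounting = check_write(10, 1000)
over_soft = check_write(90, 5, soft_limit=80, hard_limit=100)
over_hard = check_write(90, 20, soft_limit=80, hard_limit=100)
```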
InsightIQ is a powerful tool that monitors one or more clusters and then presents
data in a robust graphical interface with reports you can export. You can examine
the information and break out specific information you want, and even take
advantage of usage growth and prediction features. InsightIQ offers:
• Monitor system usage - performance and file system analytics.
• Requires a server or VMware system external to cluster.
• Free InsightIQ license.
Each stripe is protected separately with forward error correction (FEC) protection
blocks, or parity. Shown is a 1-megabyte file that is divided into two stripe units with
N+2 protection.
• Protected at data stripe - one or two data or protection stripe units are contained
on a single node for any given data stripe.
• Striped across nodes.
• Variable protection levels - set separately for node pools, directories, or even
individual files.
• Set at node pool, directory, or file.
• High availability is integrated - data is spread onto many drives and multiple
nodes, all ready to help reassemble the data when a component fails.
Data resiliency is the ability to recover past versions of a file that has changed over
time. Sooner or later, every storage admin gets asked to roll back to a previous
“known good” version of a file. OneFS provides this capability using snapshots.
• File change rollback technology - called snapshots.
• Copy-on-write (CoW) - writes the original blocks to the snapshot version first,
and then writes the data to the file system. This incurs a double write penalty but
less fragmentation.
• Redirect-on-write (RoW) - writes changes into available file system space and
then updates pointers to look at the new changes. There is no double write
penalty but more fragmentation.
• Policy-based
• Scheduled snapshots
• Policies determine the snapshot schedule, path to the snapshot location,
and snapshot retention periods.
• Deletions happen as part of a scheduled job, or are deleted manually.
• Out of order deletion allowed, but not recommended.
• Some system processes use snapshots with no license required.
• Full capability requires SnapshotIQ license.
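The difference between copy-on-write and redirect-on-write described above can be sketched with a toy in-memory model. The class names, the single-snapshot assumption, and the write counter are all hypothetical and exist only to show the trade-off; this is not how OneFS stores snapshot data.

```python
class CowVolume:
    """Copy-on-write: preserve the original block, then overwrite in place."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data, overwritten in place
        self.saved = {}             # block index -> original data kept for the snapshot
        self.writes = 0             # number of block writes performed

    def write(self, index, data):
        if index not in self.saved:
            # First change since the snapshot: copy the original out first.
            self.saved[index] = self.blocks[index]
            self.writes += 1
        self.blocks[index] = data
        self.writes += 1

    def read_snapshot(self, index):
        return self.saved.get(index, self.blocks[index])


class RowVolume:
    """Redirect-on-write: write new data elsewhere, then update a pointer."""

    def __init__(self, blocks):
        self.store = list(blocks)                  # block store, append only
        self.live = list(range(len(self.store)))   # pointers for the live file
        self.snap = list(self.live)                # pointers frozen at snapshot time
        self.writes = 0

    def write(self, index, data):
        self.store.append(data)                    # new data goes to free space
        self.live[index] = len(self.store) - 1     # redirect the live pointer
        self.writes += 1

    def read_live(self, index):
        return self.store[self.live[index]]

    def read_snapshot(self, index):
        return self.store[self.snap[index]]


cow = CowVolume(["a", "b", "c"])
cow.write(1, "B")

row = RowVolume(["a", "b", "c"])
row.write(1, "B")
```

After one overwrite, `cow.writes` is 2 (the double write) while `row.writes` is 1. In the RoW model the live pointers become `[0, 3, 2]`, so the live file is no longer laid out contiguously, which is a simple picture of the fragmentation cost.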
Replication keeps a copy of data from one cluster on another cluster. OneFS
replicates during normal operations, from one PowerScale cluster to another.
Replication may be from one to one, or from one to many PowerScale clusters.
Cluster-to-cluster synchronization
• Copy - new files on the source are copied to the target, while files deleted on
the source remain unchanged on the target.
• Synchronization - only works in one direction and both the source and target
clusters maintain identical file sets, except that files on the target are read-only.
Replication can be set per directory or for specific types of data, and you can
set exceptions to include or exclude specific files.
• Manual start
• On schedule
• When changes made
Bandwidth throttling
Bandwidth throttling - used on replication jobs to optimize resources for high priority
workflows.
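The copy and synchronization behaviors described above differ mainly in how they treat files that were deleted on the source. The sketch below models each cluster as a dictionary of file names to contents; the function names and sample files are hypothetical, and the code only illustrates the resulting file sets, not how replication is performed.

```python
def replicate_copy(source, target):
    """Copy: source files are copied to the target; files that were
    deleted on the source stay untouched on the target."""
    result = dict(target)
    result.update(source)
    return result


def replicate_sync(source, target):
    """Synchronization: the target ends up with the same file set as the
    source, so files deleted on the source also disappear from the target."""
    return dict(source)


# b.txt was deleted on the source after an earlier replication run.
source = {"a.txt": "v2", "c.txt": "v1"}
target = {"a.txt": "v1", "b.txt": "v1"}

after_copy = replicate_copy(source, target)
after_sync = replicate_sync(source, target)
```

Here `after_copy` still contains `b.txt`, while `after_sync` matches the source exactly.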
Data Retention
Data retention is the ability to prevent data from being deleted or modified before
some future date. In OneFS, you can configure data retention at the directory level,
so that different directories can have different retention policies. You can also use
policies to automatically commit certain types of files for retention.
• Two modes of retention
• Enterprise (more flexible) - enables privileged deletes by an administrator.
• Compliance (more secure) - designed to meet SEC regulatory requirements.
Once data is committed to disk, individuals cannot change or delete the data
until the retention clock expires - OneFS prohibits clock changes.
• Compatible with SyncIQ replication.
• Requires SmartLock license.
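The two retention modes above can be summarized as a small decision rule. The sketch below is a simplified model with hypothetical names (`Mode`, `may_delete`, `privileged_delete`); it reflects only the behavior stated in the text and is not the SmartLock interface.

```python
from datetime import datetime
from enum import Enum


class Mode(Enum):
    ENTERPRISE = "enterprise"
    COMPLIANCE = "compliance"


def may_delete(mode, retain_until, now, privileged_delete=False):
    """Return True if a committed file may be deleted at time `now`."""
    if now >= retain_until:
        # The retention period has expired in either mode.
        return True
    # Before expiry, only enterprise mode allows an administrator's
    # privileged delete; compliance mode never does.
    return mode is Mode.ENTERPRISE and privileged_delete


retain_until = datetime(2030, 1, 1)
today = datetime(2025, 6, 1)

enterprise_admin = may_delete(Mode.ENTERPRISE, retain_until, today, privileged_delete=True)
compliance_admin = may_delete(Mode.COMPLIANCE, retain_until, today, privileged_delete=True)
```

With these sample dates, the enterprise privileged delete is allowed and the compliance delete is refused until the retention date passes.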
PowerScale and Big Data
Module Objectives
The “three V's” – volume, velocity, and variety – often arrive together. When they
combine, administrators truly feel the need for higher performance, higher capacity
storage. The three V's generate the challenges of managing Big Data.
The growing amount of maintained data has also forced storage architecture to
evolve over the years. PowerScale is a Big Data solution because it can handle the
volume, velocity, and variety that define the fundamentals of Big Data.
1: Challenge: Nonflexible data protection. When you have Big Data volumes of
information to store, it had better be there, dependably. If an organization relies on
RAID to protect against data loss or corruption, the failure of a single disk drive
causes a disproportionate inconvenience. The most popular RAID implementation
scheme allows the failure of only two drives before data loss. (A sizable Big Data
installation easily has more than 1000 individual hard drives, so odds are at least
one drive is down at any time.) The simpler answer is to protect data using a
different scheme.
What is meant by volume? Consider any global website that works at scale. One
example of Big Data volume is the YouTube press page that says YouTube ingests
100 hours of video every minute.
graphics chip prototype, sensors generate many terabytes of data per second.
Storing terabytes of data in seconds is an example of Big Data velocity.
3: Perhaps the best example of variety is the migration of the world to social media.
On a platform such as Facebook, people post all kinds of file formats: text, photos,
video, polls, and more. Many kinds of data at that scale represent Big Data variety.
45: Challenge: SAN and scale-up NAS architectures must reserve much of the raw
capacity of the system for management and administrative overhead. Overhead
includes RAID parity disks, metadata for all the LUNs and mega LUNs, duplicate
copies of the file system, and so on. As a result, conventional SAN and NAS
architectures often use only half of the raw capacity available, because of the
headroom for each separate stack of storage. Suppose that you have seven different
silos of data. When you put them in one large volume, you immediately get back the
headroom from six of the seven stacks. In that way, PowerScale offers high
utilization. PowerScale customers routinely use 80% or more of raw disk capacity.
46: Some data storage architectures use two controllers, sometimes called servers
or filers, to run a stack of many hard drives. You can scale capacity by adding more
hard drives, but it is difficult to scale performance. In a given storage stack, the
hard drives offer nothing but capacity. All the intelligence of the system, including
computer processing and RAM, must come from the two filers. If the horsepower of
the two filers becomes insufficient, the architecture does not enable you to pile on
more filers. You start over with another stack and two more filers. In contrast, every
node in a PowerScale cluster contains capacity plus computing power plus
memory. The nodes can work in parallel, so each node you add scales the cluster
out linearly. In other words, all aspects of the cluster scale together, including
capacity and performance.
47: Due to the architectural restrictions, SAN and scale-up NAS end up with several
isolated stacks of storage. Many sites have a different storage stack for each
application or department. A backup storage stack is an example. Data does not
move between these stacks automatically. Instead, an administrator has to
manually arrange a data migration. If the R&D stack performs
product testing that generates results at Big Data velocity, the company may
establish an HPC stack, which could reach capacity rapidly. Other departments or
workflows may have independent storage stacks with a lot of capacity remaining, but
there is no automated way for R&D to offload their HPC overflow. In contrast, a
PowerScale cluster distributes data across all its nodes to keep them all at equal
capacity. You do not have one node that is overworked while other nodes sit idle.
There are no hot spots, and thus, no manual data migrations. If the goal is to keep
pace with Big Data velocity, automated balancing makes more sense.
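The idea of keeping all nodes at roughly equal capacity can be pictured with a toy placement rule: always write the next block to the least-full node. This is only a sketch under that assumption, with hypothetical node names and block sizes; it does not describe the actual OneFS layout or balancing logic.

```python
import heapq


def place_blocks(node_usage, block_sizes):
    """Assign each block to the currently least-used node.

    node_usage maps a node name to the capacity already used.
    Returns a new mapping with the usage after all blocks are placed.
    """
    heap = [(used, name) for name, used in node_usage.items()]
    heapq.heapify(heap)
    for size in block_sizes:
        used, name = heapq.heappop(heap)            # least-full node
        heapq.heappush(heap, (used + size, name))   # record the new usage
    return {name: used for used, name in heap}


# Nine equally sized blocks written to three empty nodes.
balanced = place_blocks({"node1": 0, "node2": 0, "node3": 0}, [1] * 9)

# One node starts out fuller, so new blocks land on the other two.
caught_up = place_blocks({"node1": 6, "node2": 0, "node3": 0}, [1] * 6)
```

In the first case every node ends with three blocks. In the second, the two emptier nodes absorb all new writes and no manual migration step is involved.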
48: In conventional storage, a file is typically confined to a RAID stripe. That means
that the maximum throughput of reading that file is limited to how fast those drives
can deliver the file. In modern workflows where a hundred engineers or a thousand
digital artists access a file, the RAID drives cannot keep up. Perhaps the two filers
on that stack cannot process that many requests efficiently. With PowerScale,
every node has at least a dozen drives, plus more RAM and more computer
processing, for more caching and better concurrent access. When there is heavy
demand for a file, several nodes can deliver it.
49: Besides manual data migrations, conventional storage has many more manual
processes. A SAN or a scale-up NAS administrator spends a significant amount of
time creating and managing LUNs, partitioning storage, establishing mounts,
launching jobs, and so on. In contrast, PowerScale is policy-driven. Once you
define your policies, the cluster does the rest automatically.
A scale-out Data Lake is a large storage solution where vast amounts of data from
other solutions or locations are combined into a single store. Elements of a data
lake are:
• Digital repository to store massive data.
• Variety of formats.
• Can do computations and analytics on original data.
• Helps address the variety issue with Big Data.
• Data can be secured, analyzed, and actions taken based on insights.
• Enterprises can eliminate the cost of having silos of information.
• Provides scaling capabilities in terms of capacity, performance, security, and
protection.
[Figure: scale-out Data Lake characteristics - Unmatched Efficiency, Easy Growth, Cloud Tiering Ready, Hadoop Enabled]
A Data Lake is a central data repository that stores data from various sources, such
as file shares, web apps, and the cloud. It enables businesses to access the same
data for various uses and enables the manipulation of data using various clients,
analyzers, and applications. The data is real-time production data with no need to
copy or move it from an external source, like another Hadoop cluster, into the Data
Lake. The Data Lake provides tiers that are based on data usage, and the ability to
instantly increase the storage capacity when needed. This slide identifies the key
characteristics of a scale-out Data Lake.
PowerScale CloudPools
The PowerScale CloudPools software enables you to select from various public
cloud services or use a private cloud. CloudPools offers the flexibility of another tier
of storage that is off-premises and off-cluster. Essentially, CloudPools provides a
lower total cost of ownership (TCO) for archival-type data.
• Treat cloud storage as another cluster-connected tier.
• Policy-based automated tiering
• Address rapid data growth and optimize data center storage resources - use
valuable on-site storage resources for active data.
• Send rarely used or accessed data to cloud.
• Seamless integration with data – retrieve at any time.
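Policy-based tiering of rarely accessed data, as listed above, can be sketched as a simple rule on last access time. The 180-day threshold, the tier labels, and the function name are assumptions made for illustration; they are not CloudPools settings or defaults.

```python
from datetime import datetime, timedelta


def choose_tier(last_accessed, now, max_idle=timedelta(days=180)):
    """Return "cloud" for data idle longer than max_idle, else "on-cluster"."""
    return "cloud" if now - last_accessed > max_idle else "on-cluster"


now = datetime(2025, 6, 1)

# Hypothetical files with their last access times.
files = {
    "active_project.dat": datetime(2025, 5, 20),
    "archive_2019.tar": datetime(2020, 1, 15),
}

placement = {name: choose_tier(accessed, now) for name, accessed in files.items()}
```

With these sample dates the recently used file stays on the cluster and the old archive is marked for the cloud tier, which keeps on-site storage available for active data.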
Edge locations are often inefficient islands of storage, running with limited IT
resources, and inconsistent data protection practices. Data at the edge generally
lives outside of the Data Lake, making it difficult to incorporate into data analytics
projects. The edge-to-core-to-cloud approach extends the Data Lake to edge
locations and out into the cloud. It enables consolidation, protection, management,
and backups of remote edge location data.
Course Summary
Appendix
PowerScale Nodes
Individual PowerScale nodes provide the data storage capacity and processing
power of the PowerScale scale-out NAS platform. All of the nodes are peers to
each other and so there is no single 'master' node and no single 'administrative
node'.
• No single master
• No single point of administration
Administration can be done from any node in the cluster as each node provides
network connectivity, storage, memory, non-volatile RAM (NVDIMM) and
processing power found in the Central Processing Units (CPUs). Node
configurations also differ in compute and capacity. These varied configurations
can be mixed and matched to meet specific business needs.
Each node contains:
• Disks
• Processor
• Cache
Tip: Gen 5 and Gen 6 nodes can exist within the same cluster. Every
PowerScale node is equal to every other PowerScale node of the
same type in a cluster. No one specific node is a controller or filer.