POWERSCALE

CONCEPTS-SSP

PARTICIPANT GUIDE

Internal Use - Confidential
PowerScale Concepts-SSP

© Copyright 2020 Dell Inc.
Table of Contents

Prerequisite Skills
Rebranding - Isilon is now PowerScale
PowerScale Solutions - Internal
Course Objectives

Data Storage Overview
    Module Objectives
    Storage Evolution
    Types of Data Storage
    Block-Based Data and File-Based Data
    Digital Transformation
    Data Storage: Ever Changing and Ever Growing
    Two Types of NAS: Scale-Up and Scale-Out
    Scale-Out NAS
    From DAS to NAS
    OneFS Operating System

PowerScale Physical Architecture
    Module Objectives
    PowerScale Hardware Overview
    PowerScale Networking Architecture
    PowerScale Architecture - External Network
    PowerScale Architecture - Interconnect
    Enhanced Connection Management
    N + M Data Protection
    FEC Instead of RAID
    File Striping Example

PowerScale Nodes
    Module Objectives
    PowerScale Nodes Overview
    PowerScale Family
    Gen 6 Hardware Components
    Gen 6.5 Hardware Components
    Node Interconnectivity
    Quick Scalability
    Additional Features: Self-Encrypting Drives (SEDs)

PowerScale OneFS Operating System
    Module Objectives
    OneFS - Distributed Clustered File System
    Benefits of OneFS
    Multiprotocol File Access
    Authentication
    Policy-Based Automation
    Management Interfaces
    Built-In Administration Roles
    Secure Remote Services

Data Management and Security
    Module Objectives
    Data Distribution Across Cluster
    Data IO Optimization
    Data Protection for Simultaneous Failures
    User Quotas for Capacity Management
    Deduplication for Data Efficiency
    Data Visibility and Analytics
    Data Integrity - FEC Protection
    Data Resiliency - Snapshots
    Data Recovery - Backup
    Data Recovery - Replication
    Data Retention

PowerScale and Big Data
    Module Objectives
    What Is Big Data?
    Big Data - Volume, Velocity, Variety
    Big Data Challenges: Volume
    Big Data Challenges: Velocity
    Big Data Challenges: Variety
    Big Data Positioning of PowerScale
    PowerScale OneFS: Scale-Out Data Lake
    PowerScale CloudPools
    PowerScale and Edge-to-Core-to-Cloud

Course Summary

Appendix



Prerequisite Skills

To understand the content and successfully complete this course, a student must
have a suitable knowledge base or skill set. The student must have an
understanding of:
• Networking fundamentals such as TCP/IP, DNS, and routing
• Storage fundamentals, such as the differences between NAS and SAN and basic
storage principles and features
• The installation process of a PowerScale cluster


Rebranding - Isilon is now PowerScale

Important: In mid-2020, Isilon launched a new hardware platform, the
F200 and F600, branded as PowerScale. Over time, the Isilon brand
will convert to the new platform's PowerScale branding. In the
meantime, you will continue to see Isilon and PowerScale used
interchangeably, including within this course and any lab activities.
OneFS CLI isi commands, command syntax, and man pages may
have instances of "Isilon".
Videos associated with the course may still use the "Isilon" brand.
Resources such as white papers, troubleshooting guides, other
technical documentation, community pages, blog posts, and others
will continue to use the "Isilon" brand.
The rebranding initiative is an iterative process, and rebranding all
instances of "Isilon" to "PowerScale" may take some time.


PowerScale Solutions - Internal

The graphic shows the PowerScale Solutions Expert certification track. You can
leverage the Dell Technologies Proven Professional program to realize your full
potential. The program combines technology-focused and role-based training and
exams that cover concepts and principles as well as the full range of Dell
Technologies' hardware, software, and solutions. You can accelerate your career
and your organization's capabilities.

[Graphic: PowerScale Solutions certification track]

PowerScale Solutions (Knowledge and Experience based Exam):
• PowerScale Advanced Administration (VC, C)
• PowerScale Advanced Disaster Recovery (VC, C)

Tracks: Implementation Engineer, Technology Architect, Platform Engineer

Courses shown across the tracks:
• PowerScale Concepts (ODC), listed in all three tracks
• PowerScale Administration (ODC, VC, C)
• PowerScale Solutions Design (ODC)
• PowerScale Hardware Concepts (ODC)
• PowerScale Hardware Installation (ODC)
• PowerScale Hardware Maintenance (ODC)
• PowerScale Implementation (ODC)

Information Storage and Management (ODC, VC, C)

(C) - Classroom

(VC) - Virtual Classroom

(ODC) - On Demand Course

For more information, visit: http://dell.com/certification


Course Objectives

After completion of this course, you will be able to:


→ Compare structured and unstructured data.
→ Describe the PowerScale physical architecture.
→ Discuss node workflow applications, node model details, and adding new
nodes to the cluster.
→ Describe the PowerScale OneFS operating system.
→ Explain data management and security in PowerScale.
→ Discuss PowerScale with a Big Data solution.

Data Storage Overview

Module Objectives

After completing this lesson, you will be able to:


• Explain data storage evolution.
• Discuss the two types of data storage.
• Explain the journey of DAS to NAS.


Storage Evolution

The web version of this content contains an interactive activity.


Types of Data Storage

During the data storage evolution, two types of data developed: structured data and
unstructured data. PowerScale specializes in storing unstructured data.

Structured Data
• Resides in fixed field of records or files
• Requires defined data types, access, and processes
• Most often in relational database
• Example - records or files, census records, economic catalog(s), phone
director(ies), customer contact records, spreadsheets etc.

Unstructured Data
• Does not reside in fixed model
• Does not exist in typical row/column format
• Example - photos, documents and presentations

Note: 80 – 90% of digital data is unstructured.


Block-Based Data and File-Based Data

Block-based data
• Sequence of bytes at fixed length
• Single piece of file or whole file
• Best for high input/output and low latency
• Associated with structured data

File-based data
• Discrete unit of information defined by application or created by user
• Only useful as complete file
• Too large for database apps and high I/O
• Associated with unstructured data
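
As a rough illustration of the two views above, the sketch below treats the same bytes first as one named file and then as a sequence of fixed-length blocks. The 4-byte block size, the `split_into_blocks` helper, and the sample file name are assumptions chosen for readability; they are not PowerScale or OneFS parameters.

```python
BLOCK_SIZE = 4  # assumed tiny block size so the result stays readable


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return fixed-length blocks; the last block is zero-padded."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append(chunk.ljust(block_size, b"\x00"))
    return blocks


# File-based view: one discrete, named unit that is only useful when complete.
file_view = {"name": "report.txt", "content": b"unstructured"}

# Block-based view: the same bytes as addressable fixed-length pieces.
block_view = split_into_blocks(file_view["content"])

# Any single block can be read or rewritten by its index.
third_block = block_view[2]
```

In the block view, a consumer can address one piece without knowing what file it belongs to, which is why block access suits high-I/O, low-latency workloads.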


Digital Transformation

Digital Transformation (DX) has become a ubiquitous component of nearly every
organization's strategic plan over the last few years. DX-related emerging
technologies will have profound effects on the means of production and will
transform the way consumers interact with every organization in the future.

IDC projects that, through 2022, 75% of successful digital strategies will be built by
a transformed IT organization, with modernized and rationalized infrastructure,
applications, and data architectures.

"[...] within the next four years, the global economy will finally reach digital
supremacy, with more than half of Gross domestic product (GDP)" - IDC FutureScape [1]

Gross domestic product (GDP) is a monetary measure of the market value of all the
final goods and services produced in a specific time period.

At the same time, many organizations still struggle to tactically apply DX learnings
to their own business.

[1] IDC FutureScape: Worldwide IT Industry 2020 Predictions, October 2019, IDC
#US45599219


Data Storage: Ever Changing and Ever Growing

Unstructured data accounts for the majority of data storage growth, so a solution
was needed to store it. An International Data Corporation (IDC) study published in
2018 showed that the amount of digital data created, captured, and replicated
worldwide grew exponentially. This finding was based on the proliferation of
then-new technologies such as Voice over IP, RFID, smartphones, and consumer use
of GPS. It was also based on the continued use of data generators such as digital
cameras, HD TV broadcasts, digital games, ATMs, email, videoconferencing, medical
imaging, and so on.


Two Types of NAS: Scale-Up and Scale-Out

PowerScale clusters are a NAS solution. There are two types of NAS architectures:
scale-up and scale-out.

Scale-Up

• With a scale-up [2] platform, if more storage is needed, another independent NAS
system is added to the network.
• A scale-up solution [3] has controllers that connect to trays of disks and provide
the computational throughput.
• Traditional NAS is great for specific types of workflows, especially those
applications that require block-level access.

[2] Scale-up storage is the traditional architecture that is dominant in the enterprise
space. High-performance, high-availability single systems that have a fixed capacity
ceiling characterize scale-up.

[3] The two controllers can run active/active or active/passive. For more capacity,
add another disk array. Each of these components is added individually. As more
systems are added, NAS sprawl becomes an issue.


[Figure: Scale-up NAS. Labels: Clients; Controller with disk shelves; Independent
systems on network - separate points of management; Structured or unstructured
storage; Additional storage - usually restricted to tens or hundreds of TBs.]

Scale-Out

• With a clustered NAS solution, or scale-out architecture, all the NAS boxes, or
PowerScale nodes, belong to a unified cluster with a single point of
management.
• In a scale-out solution [4], the computational throughput, disks, disk protection,
and management are combined and exist for a single cluster.

[4] Not all clustered NAS solutions are the same. Some vendors overlay a
management interface across multiple independent NAS boxes. This gives a
unified management interface, but does not unify the file system. While this
approach does ease the management overhead of traditional NAS, it still does not
scale well.
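
The difference between the two architectures can be sketched with a toy model. Every number below is hypothetical, and the `Node`, `scale_out_totals`, and `scale_up_totals` names are invented for illustration; the model only captures the idea that a cluster pools resources behind one management point while independent systems do not.

```python
from dataclasses import dataclass


@dataclass
class Node:
    capacity_tb: float      # hypothetical usable capacity per node or system
    bandwidth_gbps: float   # hypothetical front-end bandwidth per node or system


def scale_out_totals(nodes: list[Node]) -> dict:
    """One cluster: resources add up, management stays a single point."""
    return {
        "capacity_tb": sum(n.capacity_tb for n in nodes),
        "bandwidth_gbps": sum(n.bandwidth_gbps for n in nodes),
        "management_points": 1 if nodes else 0,
    }


def scale_up_totals(systems: list[Node]) -> dict:
    """Independent systems: each one is managed and filled separately."""
    return {
        "capacity_tb": sum(s.capacity_tb for s in systems),
        "largest_single_system_tb": max((s.capacity_tb for s in systems), default=0),
        "management_points": len(systems),
    }


nodes = [Node(capacity_tb=100, bandwidth_gbps=20) for _ in range(4)]
cluster = scale_out_totals(nodes)
sprawl = scale_up_totals(nodes)
```

With the same four units, the scale-out model reports one management point and one 400 TB pool, while the scale-up model reports four management points and no single system larger than 100 TB.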


[Figure: Scale-out NAS. Labels: Clients; Unstructured storage; 1000+ PBs; Adding
storage adds compute and bandwidth.]


Scale-Out NAS

Scale-out NAS [5] is now a mainstay in most data center environments. The next
wave of scale-out NAS innovation has enterprises embracing the value [6] of NAS
and adopting it as the core of their infrastructure.

[5] The PowerScale scale-out NAS storage platform combines modular hardware
with unified software to harness unstructured data. Powered by the OneFS
operating system, a PowerScale cluster delivers a scalable pool of storage with a
global namespace.

[6] Enterprises want to raise the standard on enterprise-grade resilience, with a
no-tolerance attitude toward data loss and data unavailability, and they want
support for features that simplify management. Organizations see massive scale and
performance with the smaller data center rack footprints that performance-centric
workloads drive.


1: The unified software of the platform provides centralized web-based and
command-line administration to manage the following features:

• A cluster that runs a distributed file system.
• Scale-out nodes that add capacity and performance.
• Storage options that manage files and tiering.
• Flexible data protection and high availability.
• Software modules that control costs and optimize resources.


From DAS to NAS

DAS

[Figure: DAS storage stack. Label: RAID.]

In the early days of computer data, corporations stored data on hard drives in a
server. The intellectual property of the company depended entirely on the
continuous functionality of the hard drive. Thus, to minimize risk, corporations
mirrored the data on a Redundant Array of Independent Disks (RAID). RAID disks
were directly attached to a server so that the server thought the hard drives were
part of it. This technique is called Direct Attached Storage (DAS).

SAN

[Figure: SAN storage stack. Labels: Volume Manager, RAID.]

As applications proliferated, soon there were many servers, each with its own DAS.
This worked fine, with some drawbacks. If one server's DAS was full while another
server's DAS was half empty, the empty DAS couldn't share its space with the full
DAS. Because of this limitation, the storage area network (SAN) was introduced,
which effectively utilized a volume manager and RAID.
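
The stranded-capacity problem described above can be shown with a few lines of arithmetic. This is a simplified sketch: the server names, capacities, and the two helper functions are hypothetical, and a real SAN allocates volumes rather than raw free space.

```python
def das_can_store(servers: dict[str, dict], server: str, size_tb: float) -> bool:
    """DAS: a server can only use the free space on its own disks."""
    s = servers[server]
    return s["capacity_tb"] - s["used_tb"] >= size_tb


def pooled_can_store(servers: dict[str, dict], size_tb: float) -> bool:
    """Shared pool (SAN-style): free space anywhere in the pool can be allocated."""
    free = sum(s["capacity_tb"] - s["used_tb"] for s in servers.values())
    return free >= size_tb


servers = {
    "app1": {"capacity_tb": 10, "used_tb": 10},  # full DAS
    "app2": {"capacity_tb": 10, "used_tb": 5},   # half-empty DAS
}

# app1 needs 2 TB more: impossible with DAS, possible once capacity is pooled.
das_ok = das_can_store(servers, "app1", 2)
pooled_ok = pooled_can_store(servers, 2)
```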

NAS

[Figure: NAS storage stack. Labels: File System, Volume Manager, RAID.]

SAN was set up for servers, not personal computers (PCs). PCs worked differently
from the storage file server: network communication in PCs only goes from one
file system to another file system. The breakthrough came when corporations put
employee computers on the network and added a file system to the storage to
communicate with users. From this, Network Attached Storage (NAS) was born.

NAS works pretty well, but there is room for improvement. For example, the server
is spending as much time servicing employee requests as it is doing the application
work it was meant for. The file system doesn’t know where data is supposed to go,
because that’s the volume manager’s job. The volume manager doesn’t know how
the data is protected; that’s RAID’s job. If high-value data needs more protection
than other data, you need to migrate the data to a different volume that has the
protection level that data needs. So there is opportunity to improve NAS.


OneFS Operating System

With traditional NAS systems, the file system [7], volume manager [8], and the
implementation of RAID [9] are all separate entities.

OneFS is the operating system and the underlying file system that drives and
stores data. OneFS is a single file system that performs the duties of the volume
manager and applies protection.
• Creates a single file system for the cluster. [10]
• Volume manager and protection. [11]
• Data shared across cluster. [12]
• Scale resources. [13]

[7] The file system is responsible for the higher-level functions of authentication and
authorization.

[8] The volume manager controls the layout of the data.

[9] RAID controls the protection of the data.

[10] As nodes are added, the file system grows dynamically and content is
redistributed.

[11] The PowerScale scale-out NAS storage platform combines modular hardware
with unified software to harness unstructured data. A PowerScale cluster delivers a
scalable pool of storage with a global namespace.

[12] Because all information is shared among nodes, the entire file system is
accessible by clients connecting to any node in the cluster.

[13] Each PowerScale storage node contains globally coherent RAM, meaning that,
as a cluster becomes larger, it also becomes faster. When a node is added, the
performance scales linearly.

PowerScale Physical Architecture

Module Objectives

After completing this lesson, you will be able to:


• Explain PowerScale data storage solution.
• Discuss PowerScale physical architecture.


PowerScale Hardware Overview

Nodes combine to create a cluster. Each cluster behaves as a single, central
storage system. PowerScale is designed for large volumes of unstructured data.
PowerScale has multiple servers that are called nodes.

PowerScale includes all-flash, hybrid, and archive storage systems.

[Figure: Dual chassis, 8 node Generation 6 (or Gen 6) cluster]

Gen 6 highlights. [14]
Gen 6.5 highlights. [15]

[14] The Gen 6 platform reduces the data center rack footprint with support for four
nodes in a single 4U chassis. It enables enterprises to take on new and more
demanding unstructured data applications. The Gen 6 can store, manage, and
protect massively large datasets with ease. With the Gen 6, enterprises can gain
new levels of efficiency and achieve faster business outcomes.

[15] The ideal use cases for Gen 6.5 (F200 and F600) are remote office/back office,
factory floors, IoT, and retail. Gen 6.5 also targets smaller companies in the core
verticals, and partner solutions, including OEM. The key advantages are low entry
price points and the flexibility to add nodes individually, as opposed to a chassis/2
node minimum for Gen 6.


PowerScale Networking Architecture

OneFS supports standard network communication protocols IPv4 and IPv6.
PowerScale nodes include several external Ethernet connection options, providing
flexibility for a wide variety of network configurations [16].

Network: There are two types of networks that are associated with a cluster:
internal and external.

Front-end, External Network

[Figure: F200 cluster showing supported frontend protocols. Labels:
Client/Application Layer; Ethernet; Protocols: NFS, SMB, S3, HTTP, FTP, HDFS,
SWIFT; PowerScale Storage Layer; Ethernet Layer; Backend communication
(PowerScale internal).]

Clients connect to the cluster using Ethernet connections [17] that are available on all
nodes.

[16] In general, keeping the network configuration simple provides the best results
with the lowest amount of administrative overhead. OneFS offers network
provisioning rules to automate the configuration of additional nodes as clusters
grow.

[17] Because each node provides its own Ethernet ports, the amount of network
bandwidth available to the cluster scales linearly.


The following view shows the complete cluster, combining hardware, software, and
networks:

Back-end, Internal Network

OneFS supports a single cluster [18] on the internal network. This back-end network,
which is configured with redundant switches for high availability, acts as the
backplane for the cluster. [19]

[18] All node-to-node communication in a cluster is performed across a dedicated
back-end network, comprising either 10 or 40 Gb Ethernet, or low-latency QDR
InfiniBand (IB).

[19] This enables each node to act as a contributor in the cluster and isolates
node-to-node communication on a private, high-speed, low-latency network. This
back-end network utilizes Internet Protocol (IP) for node-to-node communication.


PowerScale Architecture - External Network

The external network provides connectivity for clients over standard file-based
protocols. It supports link aggregation, and network scalability is provided through
software in OneFS. A Gen 6 node has 2 front-end ports (10 GigE, 25 GigE, or
40 GigE) and one 1 GigE port for management. Gen 6.5 nodes have 2 front-end
ports (10 GigE, 25 GigE, or 100 GigE). In the event of a Network Interface
Controller (NIC) or connection failure, clients do not lose their connection to the
cluster. For stateful protocols, such as SMB and NFSv4, this prevents client-side
timeouts and unintended reconnection to another node in the cluster. Instead,
clients maintain their connection to the logical interface and continue operating
normally. Support for Continuous Availability (CA) for stateful protocols like SMB
and NFSv4 is available with OneFS 8.0.
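
One way to picture clients keeping their connection to a logical interface is to treat that interface as an address that can be reassigned when its node fails. The sketch below is a conceptual model under that assumption only; the `fail_over` helper, the node names, and the documentation-range IP addresses are hypothetical and do not represent OneFS internals.

```python
def fail_over(assignments: dict[str, str], failed_node: str,
              healthy_nodes: list[str]) -> dict[str, str]:
    """Reassign every address owned by failed_node to the healthy nodes in turn.

    assignments maps a client-facing IP address to the node currently serving it.
    """
    if not healthy_nodes:
        raise ValueError("no healthy nodes left to take over addresses")
    updated = dict(assignments)
    orphaned = sorted(ip for ip, node in assignments.items() if node == failed_node)
    for index, ip in enumerate(orphaned):
        updated[ip] = healthy_nodes[index % len(healthy_nodes)]
    return updated


assignments = {
    "192.0.2.11": "node1",
    "192.0.2.12": "node2",
    "192.0.2.13": "node3",
    "192.0.2.14": "node1",
}
after = fail_over(assignments, failed_node="node1", healthy_nodes=["node2", "node3"])
```

In this model the set of addresses that clients use never changes; only the node behind each address does.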


PowerScale Architecture - Interconnect

Back-end Network

• The back-end network is a private PowerScale network that is used for
intracluster (node-to-node) communication.
• It provides distributed connectivity.
• The back-end network supports redundancy for high availability.
• With OneFS 8.2 and later versions, the back-end network may have a leaf-
spine network. Leaf-spine is a two-level hierarchy where nodes connect to leaf
switches, and leaf switches connect to spine switches.

InfiniBand

• InfiniBand is a high-speed unmanaged fabric. It supports both Gen 5 and Gen 6
nodes.
• InfiniBand with Gen 6 nodes is only used when Gen 6 nodes are added to a
cluster that has, or had, older generation nodes.
• The InfiniBand switches are provided with PowerScale, and they come in a
range of sizes.

Ethernet

• An Ethernet back-end is a high-speed managed fabric with limited monitoring
capability.
• Ethernet switches only support Gen 6 nodes.
• The minimum size of the switch is 24 ports.


Leaf and Spine

• Two-level hierarchy.
• Cluster nodes connect to leaf switches, which communicate with each other via
the spine switches.
• Switches are not interconnected - switches of the same type (leaf or spine) do
not connect to one another.
• Each leaf switch connects with all spine switches.
• All leaf switches have the same number of uplinks to the spine switches.
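
The leaf and spine rules listed above can be checked mechanically. The following is a minimal Python sketch, not a OneFS or switch tool: the function name `validate_leaf_spine` and the switch labels are hypothetical, and the code encodes only the rules stated in this list.

```python
from collections import Counter

def validate_leaf_spine(leaves, spines, links):
    """Return a list of rule violations for a leaf-spine backend.

    leaves, spines: sets of switch names (hypothetical labels).
    links: list of (switch_a, switch_b) pairs, one entry per cable.
    """
    problems = []
    uplinks = Counter()  # number of uplinks per leaf switch
    for a, b in links:
        a_leaf, b_leaf = a in leaves, b in leaves
        if a_leaf == b_leaf:
            # Leaf-to-leaf or spine-to-spine links are not allowed.
            problems.append(f"same-tier link: {a} <-> {b}")
            continue
        uplinks[a if a_leaf else b] += 1
    for leaf in sorted(leaves):
        # Every leaf must reach every spine.
        connected = {b if a == leaf else a for a, b in links if leaf in (a, b)}
        missing = spines - connected
        if missing:
            problems.append(f"{leaf} is not connected to {sorted(missing)}")
    if len({uplinks[leaf] for leaf in leaves}) > 1:
        problems.append("leaf switches have unequal uplink counts")
    return problems
```

An empty result means the described topology satisfies all of the rules; each string in a non-empty result names one violated rule.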


Enhanced Connection Management

Clients can access the cluster using DNS. The enhanced functionality20 provides
the connection distribution policies shown in the graphic, and also provides
continuous availability21 (CA) capabilities.


1: Determines the average throughput on each available network interface and
selects the network interface with the lowest network interface load.

20 The enhanced functionality includes continuous availability for SMBv3. With
this feature, SMBv3 and NFSv4 clients that use CA can dynamically move to
another node if the node they are connected to goes down.

21 The continuous availability feature applies to Microsoft Windows 8, Windows 10,
and Windows Server 2012 R2 clients. This feature is part of the PowerScale
nondisruptive operation initiative to give customers more options for continuous
work and less downtime. The CA option enables seamless movement from one
node to another with no manual intervention on the client side.


2: Determines the average CPU utilization on each available network interface and
selects the network interface with the lightest processor usage.

3: Selects the next available network interface on a rotating basis. This selection is
the default method. Without a SmartConnect license for advanced settings, this is
the only method available for load balancing.

4: Determines the number of open TCP connections on each available network
interface and selects the network interface with the fewest client connections.
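
The four connection distribution policies described above can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the SmartConnect implementation: the `Interface` and `Balancer` names, the policy strings, and the sample metrics are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Interface:
    name: str
    throughput: float   # average throughput (assumed unit: MB/s)
    cpu: float          # average CPU utilization, 0-100
    connections: int    # open TCP connections

class Balancer:
    def __init__(self, interfaces):
        self.interfaces = list(interfaces)
        self._turn = count()  # rotating index used by round robin

    def pick(self, policy="round_robin"):
        if policy == "round_robin":
            # Policy 3 (default): next available interface on a rotating basis.
            return self.interfaces[next(self._turn) % len(self.interfaces)]
        keys = {
            "throughput": lambda i: i.throughput,         # policy 1
            "cpu": lambda i: i.cpu,                       # policy 2
            "connection_count": lambda i: i.connections,  # policy 4
        }
        # The other policies pick the interface with the lowest metric.
        return min(self.interfaces, key=keys[policy])
```

Round robin needs no measurements at all, which is consistent with it being the default method; the other three policies each minimize one measured value.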


N + M Data Protection

OneFS sets parity bits, also called FEC protection. In the example below, using the
parity bit (green), OneFS determines the missing pieces.

Here, if blue + yellow = green, the missing pieces are identified using the parity
bits.

Graphic shows: if blue + yellow = green (the parity bit), then a missing blue or
yellow piece can be determined from the remaining pieces.
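
The "blue + yellow = green" idea can be tried out with a small Python sketch. As an assumption for illustration, the sketch uses XOR as the "+" operation, which is the simplest single-parity scheme; it is a stand-in for, not a copy of, the FEC encoding that OneFS uses. The function names and sample data are hypothetical.

```python
def parity(blue: bytes, yellow: bytes) -> bytes:
    """Compute a parity unit (green) from two equal-length data units."""
    if len(blue) != len(yellow):
        raise ValueError("data units must be the same length")
    # XOR each pair of bytes; this plays the role of "blue + yellow".
    return bytes(b ^ y for b, y in zip(blue, yellow))

def recover(surviving: bytes, green: bytes) -> bytes:
    """Rebuild the missing data unit from the survivor and the parity unit."""
    # XOR is its own inverse, so the same operation recovers the lost piece.
    return parity(surviving, green)

blue, yellow = b"DATA-ONE", b"DATA-TWO"
green = parity(blue, yellow)
```

If the blue piece is lost, `recover(yellow, green)` returns it, and the same call with `blue` returns the yellow piece.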


FEC Instead of RAID

FEC enables the customer to choose the number of bits of parity to implement.
One bit of parity for many disks is known as N+1; two bits of parity for many
disks is known as N+2, and so on.

FEC with N+1 Protection

With the N+1 protection, data is 100% available even if a drive or a node fails.

Graphic shows four nodes with one failure.

FEC with N+2, N+3, and N+4 Protection

With N+2, N+3, and N+4 protection, data is 100% available if multiple drives or
nodes fail.

Graphic shows eight nodes with two failures.


File Striping Example

During the write operation, with OneFS, the file from the client is striped across the
nodes. The system breaks the file-based data into smaller logical sections called
stripe units. The smallest element in a stripe unit is 8 kilobytes and each stripe unit
is 128 kilobytes, or sixteen 8-kilobyte blocks. If the data file is larger than 128
kilobytes, the next part of the file is written to a second node. If the file is larger than
256 kilobytes, the third part is written to a third node, and so on. The graphic
illustrates a 384-kilobyte file with 3 stripe units and 1 FEC unit.
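
The arithmetic in this example can be sketched in Python. This is a simplified illustration, not the OneFS layout algorithm: the function name `stripe_layout` is hypothetical, and the sketch assumes the whole file fits in one stripe with each unit placed on the next node.

```python
BLOCK_SIZE = 8 * 1024            # smallest element: one 8 KB block
STRIPE_UNIT = 16 * BLOCK_SIZE    # sixteen blocks = one 128 KB stripe unit

def stripe_layout(file_size, fec_units=1):
    """Return (node_number, unit_type) pairs for one file.

    Each data stripe unit goes to the next node, and the FEC units
    follow on further nodes.
    """
    data_units = max(1, -(-file_size // STRIPE_UNIT))  # ceiling division
    layout = [(n + 1, "data") for n in range(data_units)]
    layout += [(data_units + n + 1, "FEC") for n in range(fec_units)]
    return layout
```

For the 384-kilobyte file in the graphic, `stripe_layout(384 * 1024)` yields three data stripe units on nodes 1 to 3 and one FEC unit on node 4.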

Graphic shows a file divided into stripe units and an FEC unit, written across
Node 1 through Node 4, which connect through leaf and spine switches.


PowerScale Nodes


Module Objectives

After completing this lesson, you will be able to:

• Explain PowerScale nodes.
• Discuss the Gen 6 and Gen 6.5 hardware design, PowerScale product families,
and additional features.


PowerScale Nodes Overview

Generation 6 (or Gen 6) chassis and Generation 6.5 nodes

The design goal for the PowerScale nodes is to keep the simple ideology of NAS,
provide the agility of the cloud, and offer the cost of commodity.

Storage nodes are peers.

The Gen 6x family has different offerings that are based on the need for
performance and capacity. As Gen 6 is a modular architecture, you can scale out
compute and capacity separately. All the nodes are powered by OneFS.


PowerScale Family

The Gen 6 family provides the following offerings.

F-Series

The F-series nodes sit at the top of both performance and capacity, with
all-flash arrays for ultra compute and high capacity. The all-flash platforms can
accomplish 250-300k protocol operations per chassis, and get 15 GB/s aggregate
read throughput from the chassis. Even when the cluster scales, the latency
remains predictable.

• F80022
• F81023
• F60024
• F20025

22 The F800 is suitable for workflows that require extreme performance and
efficiency.

23 The F810 is suitable for workflows that require extreme performance and
efficiency. The F810 also provides high-speed inline data deduplication and inline
data compression. It delivers up to 3:1 efficiency, depending on your specific
dataset and workload.

24 The F600 is ideal for small, remote clusters that need exceptional system
performance for small office/remote office technical workloads.


H-Series

After F-series nodes, next in terms of computing power are the H-series nodes.
These are hybrid storage platforms that are highly flexible and strike a balance
between large capacity and high-performance storage to provide support for a
broad range of enterprise file workloads.

• H40026
• H50027
• H560028
• H60029

25 The F200 is ideal as a low-cost all-flash node pool for existing Gen 6 clusters,
and for small, remote clusters.

26 The H400 provides a balance of performance, capacity, and value to support a
wide range of file workloads. It delivers up to 3 GB/s bandwidth per chassis and
provides capacity options ranging from 120 TB to 720 TB per chassis.

27The H500 is a versatile hybrid platform that delivers up to 5 GB/s bandwidth per
chassis with a capacity ranging from 120 TB to 720 TB per chassis. It is an ideal
choice for organizations looking to consolidate and support a broad range of file
workloads on a single platform.

28 The H5600 combines massive scalability (960 TB per chassis and up to 8 GB/s
bandwidth) in an efficient, highly dense, deep 4U chassis. The H5600 delivers inline
data compression and deduplication. It is designed to support a wide range of
demanding, large-scale file applications and workloads.

29 The H600 is designed to provide high performance at value. It delivers up to
120,000 IOPS and up to 12 GB/s bandwidth per chassis. It is ideal for high
performance computing (HPC) workloads that don’t require the extreme
performance of all-flash.

A-Series

The A-series nodes have less compute power than other nodes and are designed
for data archival purposes. The archive platforms can be combined with new or
existing all-flash and hybrid storage systems into a single cluster that provides
an efficient tiered storage solution.

• A20030
• A200031

30The A200 is an ideal active archive storage solution that combines near-primary
accessibility, value and ease of use.

31The A2000 is an ideal solution for high density, deep archive storage that
safeguards data efficiently for long-term retention.


Gen 6 Hardware Components

Gen 6 requires a minimum of four nodes to form a cluster. You must add nodes to
the cluster in pairs.

The chassis holds four compute nodes and 20 drive sled slots.

Both compute modules in a node pair power on immediately when one of the
nodes is connected to a power source.

Gen 6 chassis


1: The compute module bays of two nodes make up one node pair. Scaling out a
cluster with Gen 6 nodes is done by adding more node pairs.

2: Each Gen 6 node provides two ports for front-end connectivity. The connectivity
options for clients and applications are 10 GbE and 40 GbE.

3: Each node can have 1 or 2 SSDs that are used as L3 cache, global namespace
acceleration (GNA), or other SSD strategies.

4: Each Gen 6 node provides two ports for back-end connectivity. A Gen 6 node
supports 10 GbE, 40 GbE, and InfiniBand.

5: Power supply unit - Peer node redundancy: When a compute module power
supply failure takes place, the power supply from the peer compute module in the
node pair will temporarily provide power to both nodes.


6: Each node has five drive sleds. Depending on the length of the chassis and type
of the drive, each node can handle up to 30 drives or as few as 15.

7: Disks in a sled are all the same type.

8: The sled can be either a short sled or a long sled. The types are:

• Long Sled - 4 drives of size 3.5"
• Short Sled - 3 drives of size 3.5"
• Short Sled - 3 or 6 drives of size 2.5"

9: The chassis comes in two different depths: the normal depth is about 37 inches
and the deep chassis is about 40 inches.

10: Large journals offer flexibility in determining when data should be moved to the
disk. Each node has a dedicated M.2 vault drive for the journal. A node mirrors
its journal to its peer node. The node writes the journal contents to the vault when
a power loss occurs. A backup battery helps maintain power while data is stored in
the vault.


Gen 6.5 Hardware Components

Gen 6.5 requires a minimum of three nodes to form a cluster. You can add single
nodes to the cluster. The F600 and F200 have a 1U form factor and are based on
the R640 architecture.

Graphic shows F200 or F600 node pool.


1: Scaling out an F200 or an F600 node pool only requires adding one node.

2: For frontend connectivity, the F600 uses PCIe slot 3.

3: Each F200 node has four SAS SSDs. Each F600 node has 8 NVMe SSDs.

4: Each F200 and F600 node provides two ports for backend connectivity. PCIe
slot 1 is used.

5: Redundant power supply units - When a power supply fails, the secondary
power supply in the node provides power. Power is supplied to the system equally
from both PSUs when the Hot Spare feature is disabled. Hot Spare is configured
using the iDRAC settings.

6: Disks in a node are all the same type.

7: The nodes come in two different 1U models. The graphic shows the F200.

8: The F200 frontend connectivity uses the rack network daughter card (rNDC).


Important: The F600 nodes have a 4-port 1 GbE NIC in the rNDC slot.
The NIC is not allocated to any OneFS function.


Node Interconnectivity

1: Backend ports int-a and int-b. The int-b port is the upper port. Gen 6 backend
ports are identical for InfiniBand and Ethernet, and cannot be identified by looking
at the node. If Gen 6 nodes are integrated in a Gen 5 or earlier cluster, the backend
will use InfiniBand. Note that there is a procedure to convert an InfiniBand backend
to Ethernet if the cluster no longer has pre-Gen 6 nodes.

2: PowerScale nodes with different backend speeds can connect to the same
backend switch and not see any performance issues. For example, an environment
has a mixed cluster where A200 nodes have 10 GbE backend ports and H600
nodes have 40 GbE backend ports. Both node types can connect to a 40 GbE
switch without affecting the performance of other nodes on the switch. The 40 GbE
switch provides 40 GbE to the H600 nodes and 10 GbE to the A200 nodes.

3: Gen 6.5 backend ports use the PCIe slot.

4: There are two speeds for the backend Ethernet switches, 10 GbE and 40 GbE.
Some nodes, such as archival nodes, might not need to use all of a 10 GbE port
bandwidth while other workflows might need the full utilization of the 40 GbE port
bandwidth. The Ethernet performance is comparable to InfiniBand so there should
be no performance bottlenecks with mixed performance nodes in a single cluster.
Administrators should not see any performance differences if moving from
InfiniBand to Ethernet.

Gen 6 nodes can use either an InfiniBand or Ethernet switch on the backend.
InfiniBand was designed as a high-speed interconnect for high-performance
computing, and Ethernet provides the flexibility and high speeds that sufficiently
support the PowerScale internal communications.

Gen 6.5 only supports Ethernet. All new PowerScale clusters support Ethernet
only.

Warning: With Gen 6, do not plug a backend Ethernet topology into a
backend InfiniBand NIC. If you plug Ethernet into the InfiniBand NIC,
it switches the backend NIC from one mode to the other and will not
come back to the same state.


Quick Scalability

A PowerScale cluster expansion takes 60 seconds. The primary purpose of the
scale-out NAS approach is the “scale-out” part. An administrator can expand the
storage by adding a new node. In PowerScale, once the node is racked and cabled,
adding it to the cluster takes just a few minutes. That is because OneFS policies
automatically discover the node, set up addresses for the node, incorporate the
node into the cluster, and begin rebalancing capacity on all nodes to take
advantage of the new space. The node fully configures itself, is ready for new data
writes, and begins taking on data from the other nodes to Auto-Balance the entire
cluster.

Graphic shows the space available before scaling and after scaling. The new node
has joined the cluster, is fully configured, and is ready to write.


Additional Features: Self-Encrypting Drives (SEDs)

• Data At Rest Encryption (DARE)
• The term “data at rest” refers to any data sitting on your drives. DARE is
used for confidential or sensitive information.
• Self-encrypting drives securely store confidential data over the lifetime of
the drive.
• It is used in regulated verticals.
• Federal governments
• Financial services
• Healthcare (HIPAA)
• Self-encrypting drives enable retirement of hardware without data compromise.
• PowerScale implements DARE using SEDs.


PowerScale OneFS Operating System


Module Objectives

After completing this lesson, you will be able to:

• Explain OneFS.
• Discuss authentication and access.


OneFS - Distributed Clustered File System

The key to PowerScale scale-out NAS is the architecture of OneFS. Shown is a
Gen 6 cluster that can scale out to 66 nodes with a single spine switch for each
backend network.
• Spans nodes and runs on all nodes32.
• Grows dynamically33.
• Supports variable fault tolerance levels.
• Reed-Solomon FEC34.

32 No master node that controls the cluster.

33When nodes are added, OneFS redistributes the content to use the resources of
the entire cluster.

34 As the system writes the data, it also protects the data.


• FlexProtect35.
• Runs on all nodes
• Each node is a peer36.
• Prevents bottlenecking37.
• A copy of OneFS is on every cluster node.
• 10 GbE and 40 GbE (Gen 6); 10 GbE, 25 GbE, and 100 GbE (Gen 6.5); and
InfiniBand handle all intracluster communications.

35 Creates an n-way, redundant fabric that scales as nodes are added to the
cluster, providing 100% data availability even with four simultaneous node failures.

36 Each node shares the management workload and acts independently as a point
of access for incoming data requests.

37 When there is a large influx of simultaneous requests.


Benefits of OneFS

The OneFS architecture is designed to optimize processes and applications across
the cluster.
• Concurrency38.
• Shared infrastructure
• Access to resources on any node in the cluster from any other node in the
cluster.
• Performance benefits of parallel processing.
• Improved utilization of resources - compute, disk, memory, networking.
• Because all nodes work together, the more nodes, the more powerful the
cluster gets.

38When a node is added to the cluster, it adds computing power, storage, caching,
and networking resources.


Multiprotocol File Access

OneFS supports access to the same file using different protocols and
authentication methods simultaneously. SMB clients that authenticate using Active
Directory (AD), and NFS clients that authenticate using LDAP, can access the
same file with their appropriate permissions applied.
• OneFS translates Windows Security Identifiers (SIDs) and UNIX User Identifiers
(UIDs) into a common identity format.
• Different authentication sources.
• Permissions activities are transparent to client.
• Authenticate against correct source.
• File access behavior as protocol expects.
• Correct permissions applied - stores the appropriate permissions for each
identity or group.

• File and directory permissions
• User and group identities


Authentication

Authentication services offer a layer of security by verifying user credentials before
allowing access to the files. Authentication answers the question, “Are you really
who you say you are?”

Ensure that interactions between authentication types are understood before
enabling multiple methods on the cluster.


1: Active Directory (AD): The primary reason for joining the cluster to an AD
domain is to let the AD domain controller perform user and group authentication.

2: Lightweight Directory Access Protocol (LDAP): An advantage of LDAP is the
open nature of its directory services and the ability to use LDAP across many
platforms.

3: Network Information Service (NIS): A directory access protocol from Sun
Microsystems.

4: Local or File Provider: OneFS supports local user and group authentication
using the web administration interface.

5: SSH authentication: SSH multifactor authentication is supported.


Policy-Based Automation

Automatically move data between different tiers of storage

To run functions, OneFS creates automated policies.

• Repeatable - automated policies make processes repeatable, decreasing the
time spent manually managing the cluster.
• Policies managed throughout the cluster - a change to the configuration is a
change to the configuration on every node in the cluster.
• Executes policies as a cohesive system.
• Policies drive every process.

• Includes the way data is distributed across the cluster and on each node.
• Includes how client connections get distributed among the nodes, when and
how maintenance tasks are run.


Management Interfaces

The OneFS management interface is used to perform various administrative and
management tasks on the PowerScale cluster and nodes. Management capabilities
vary based on which interface is used. The different types of management
interfaces in OneFS are:

• Serial Console39
• Web Administration Interface (WebUI)40
• Command Line Interface (CLI)41
• Platform Application Programming Interface (PAPI)42
• Front Panel Display43

39 The serial console is used for initial cluster configurations by establishing serial
access to the node designated as node 1.

40 The browser-based OneFS web administration interface provides secure access
with OneFS-supported browsers. This interface is used to view robust graphical
monitoring displays and to perform cluster-management tasks.

41 The command-line interface runs isi commands to configure, monitor, and
manage the cluster. Access to the command-line interface is through a secure shell
(SSH) connection to any node in the cluster.

42 The PAPI is divided into two functional areas: one area enables cluster
configuration, management, and monitoring functionality, and the other area
enables operations on files and directories on the cluster.

43 The Front Panel Display is located on the physical node or chassis. It is used to
perform basic administrative tasks onsite.


Built-In Administration Roles

So who is allowed to access and make configuration changes using the cluster
management tools? In addition to the integrated root and admin users, OneFS
provides role-based access control (RBAC). With RBAC, you can define privileges
to customize access to administration features in the OneFS WebUI, CLI, and for
PAPI management.
• Grant or deny access to management features.

Graphic compares root user privileges with a configured user with restricted
privileges. Restricted options are not displayed.


• RBAC
• Set of global admin privileges
• Five preconfigured admin roles
• Zone RBAC (ZRBAC)
• Set of admin privileges specific to an access zone
• Two preconfigured admin roles
• Can create custom roles.
• Assign users to one or more roles.


Secure Remote Services

Graphic shows the cluster location and the Dell Technologies support location.

If there is an issue with your cluster, there are two types of support available. You
can manually upload logfiles to the Dell Technologies support FTP site, or use
Secure Remote Services.
• Manual FTP upload of logfiles
• As needed.
• Support requests logfiles.
• Secure Remote Services
• Broader product support.
• Manual logfile uploads.
• 24x7 remote monitoring - monitors on a node-by-node basis and sends alerts
regarding the health of devices.
• Allows remote cluster access - requires permission.
• Secure authentication with AES 256-bit encryption and RSA digital
certificates.
• Log files provide detailed information about the cluster activities.
• Remote session that is established through SSH or the WebUI - support
personnel can run scripts that gather diagnostic data about cluster settings and
operations. Data is sent to a secure FTP site where service professionals can
open support cases and troubleshoot on the cluster.


Data Management and Security


Module Objectives

After completing this lesson, you will be able to:

• Explain data distribution, I/O optimization, and protection.
• Discuss quotas and deduplication.
• Define data resiliency, data recovery, and retention.


Data Distribution Across Cluster

Node Pool

- Nodes assigned based on type
- Function as single data target location

Tiers

- Groups of node pools
- Can write data to tier - if SmartPools is licensed, data written to specific
node pool

Policy Options

- Default policy: write anywhere
- Data goes to node pools having the most available space

Data distribution is how OneFS spreads data across the cluster. Various models of
PowerScale nodes, or node types, can be present in a cluster. Nodes are assigned
to node pools based on the model type, number of drives, and the size of the
drives. The cluster can have multiple node pools, and groups of node pools can be
combined to form tiers of storage. Data is distributed among the different node
pools based on the highest percentage of available space. This means that the
data target can be a pool or a tier anywhere on the cluster.
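
The default placement rule described here can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the SmartPools implementation: the function name `pick_node_pool`, the pool names, and the tier names are hypothetical.

```python
def pick_node_pool(pools, tier=None):
    """Choose the write target with the highest percentage of free space.

    pools: dict of pool name -> (tier name, used bytes, total bytes).
    tier: optional tier name; restricts the choice to pools in that tier.
    """
    candidates = {
        name: 1 - used / total   # fraction of the pool that is free
        for name, (pool_tier, used, total) in pools.items()
        if tier is None or pool_tier == tier
    }
    if not candidates:
        raise ValueError("no node pool matches the requested tier")
    return max(candidates, key=candidates.get)
```

Calling the function without a tier models the "write anywhere" default, while passing a tier models directing data to a specific group of node pools.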


Data IO Optimization

Graphic shows the random, concurrent, and sequential access patterns, managed
cluster-wide by default or by directories or files, with a configurable prefetch
cache.

You can optimize data input and output to match the workflows for your business.
By default, optimization is managed cluster-wide, but you can manage individual
directories or individual files. The data access pattern can be optimized for random
access, sequential access, or concurrent access. For example, sequential
optimization has aggressive prefetching. The prefetch, or read ahead, is an
optimization algorithm that attempts to predict what data is needed next, before the
request is made. When clients open larger files, especially streaming formats like
video and audio, OneFS assumes that you will watch minute four of the video after
minute three. Prefetch proactively loads minutes four, five, and sometimes even six
into memory before it is requested. Prefetch delivers those minutes faster than
returning to the hard drive for each request. With OneFS, you can configure the
prefetch cache characteristics to work best with the selected access pattern.


Data Protection for Simultaneous Failures

Performance optimization is the first thing a customer notices about their cluster in
day-to-day operations. But what does the average administrator notice second?
After noticing how well a cluster works, they notice when it has issues. They
want it fast, and they want it to work. That is a reason why data protection is
essential.

Data protection level refers to how many components in a cluster can malfunction
without loss of data.
• Flexible and configurable.
• Virtual hot spare - allocate disk space to hold data as it is rebuilt when a disk
drive fails.
• Select FEC protection by node pool, directory, or file.
• Extra protection creates more FEC stripes, increasing overhead.
• Standard functionality is available in the unlicensed version of SmartPools.

As an example, a research and development department has a node pool that is
dedicated to testing. Because the test data is not production data, the minimal N+1
protection is set. The customer database, however, is a valuable asset. Customer
data is written to a different node pool and set to a higher level of protection,
such as N+4.


User Quotas for Capacity Management

You can subdivide capacity usage by assigning storage quotas to users, groups,
and directories.
• Policy-based quota management.
• Nesting - place a quota on a department, and then a smaller quota on each
department user, and a different quota on the department file share.
• Thin provisioning - shows available storage even if capacity is not available.
• Quota types
• Accounting - informational only, can exceed quota.
• Enforcement soft limit - notification sent when exceeded
• Enforcement hard limit - deny writes.
• Customizable quota notifications.
• Requires SmartQuotas license.
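
The three quota types listed above differ only in what happens when a limit is crossed. The following Python sketch models that behavior; it is an illustration, not the SmartQuotas implementation, and the `Quota` class, its fields, and the sample paths are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Quota:
    path: str
    soft_limit: Optional[int] = None   # bytes; notify when exceeded
    hard_limit: Optional[int] = None   # bytes; deny writes beyond this
    usage: int = 0

    def write(self, size: int) -> str:
        """Apply a write of `size` bytes and report the outcome."""
        new_usage = self.usage + size
        if self.hard_limit is not None and new_usage > self.hard_limit:
            return "denied"                      # enforcement: hard limit
        self.usage = new_usage
        if self.soft_limit is not None and new_usage > self.soft_limit:
            return "allowed, notification sent"  # enforcement: soft limit
        return "allowed"                         # accounting, or under limits
```

A quota with no limits set behaves as an accounting quota: it tracks usage but never blocks a write.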


Deduplication for Data Efficiency

Deduplication provides an automated way to increase storage efficiency. OneFS
finds duplicate sets of data blocks, and then stores only a single copy of any data
block that is duplicated.
• Consolidates duplicate data blocks.
• Post process - analyzes data that is already stored.
• Block-level deduplication at the 8 KB block level on files over 32 KB.
• Directory level granularity.
• Dry-run assessment tool - test drive.
• Requires SmartDedupe license.
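
Block-level deduplication can be illustrated with a short Python sketch that uses the 8 KB block size and 32 KB file threshold stated above. This is a simplified model, not the SmartDedupe implementation: matching blocks by SHA-256 digest is an assumption made for the example, and the function name `dedupe` is hypothetical.

```python
import hashlib

BLOCK_SIZE = 8 * 1024        # deduplication granularity: 8 KB blocks
MIN_FILE_SIZE = 32 * 1024    # only files over 32 KB are considered

def dedupe(files):
    """Return (block_store, file_maps) for a dict of name -> bytes.

    block_store keeps one copy of each unique block, keyed by digest.
    file_maps lists the digests that make up each eligible file.
    """
    block_store, file_maps = {}, {}
    for name, data in files.items():
        if len(data) <= MIN_FILE_SIZE:
            continue  # too small to deduplicate; left as-is
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            block_store.setdefault(digest, block)  # keep the first copy only
            digests.append(digest)
        file_maps[name] = digests
    return block_store, file_maps
```

Because the sketch runs over data that already exists, it mirrors the post-process approach: files are written normally and duplicate blocks are consolidated afterward.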


Data Visibility and Analytics

InsightIQ is a powerful tool that monitors one or more clusters and then presents
data in a robust graphical interface with reports you can export. You can examine
the information and break out specific information you want, and even take
advantage of usage growth and prediction features. InsightIQ offers:
• Monitor system usage - performance and file system analytics.
• Requires a server or VMware system external to cluster.
• Free InsightIQ license.

Powerful multi-cluster monitoring tool. Graphical presentation with reporting data.
Elaborate drill-down and breakout capabilities.


Data Integrity - FEC Protection

Each stripe is protected separately with forward error correction (FEC) protection
blocks, or parity. Shown is a 1-megabyte file that is divided into two stripe units with
N+2 protection.
• Protected at data stripe - one or two data or protection stripe units are contained
on a single node for any given data stripe.
• Striped across nodes.
• Variable protection levels - set separately for node pools, directories, or even
individual files.
• Set at node pool, directory, or file.
• High availability is integrated - data is spread onto many drives and multiple
nodes, all ready to help reassemble the data when a component fails.


Data Resiliency - Snapshots

Data resiliency is the ability to recover past versions of a file that has changed over
time. Sooner or later, every storage admin gets asked to roll back to a previous
“known good” version of a file. OneFS provides this capability using snapshots.
• File change rollback technology - called snapshots.
• Copy-on-write (CoW) - writes the original blocks to the snapshot version first,
and then writes the data to the file system. It incurs a double write penalty but
causes less fragmentation.
• Redirect-on-write (RoW) - writes changes into available file system space and
then updates pointers to look at the new changes. There is no double write
penalty, but there is more fragmentation.
• Policy-based
• Scheduled snapshots
• Policies determine the snapshot schedule, path to the snapshot location,
and snapshot retention periods.
• Deletions happen as part of a scheduled job, or are deleted manually.
• Out of order deletion allowed, but not recommended.
• Some system processes use snapshots with no license required.
• Full capability requires SnapshotIQ license.
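
The difference between the two snapshot write strategies can be shown with a toy model. The sketch below is an illustration only and assumes a simplified block store built from Python dictionaries and lists; the function names are hypothetical and nothing here reflects how OneFS lays out blocks internally.

```python
def copy_on_write(live, snapshot, block_id, new_data):
    """Overwrite one block under CoW; return the number of physical writes."""
    writes = 0
    if block_id not in snapshot:
        snapshot[block_id] = live[block_id]  # first write: preserve the original
        writes += 1
    live[block_id] = new_data                # second write: update in place
    writes += 1
    return writes


def redirect_on_write(store, live_ptrs, block_id, new_data):
    """Overwrite one block under RoW; return the number of physical writes."""
    new_location = len(store)
    store.append(new_data)              # single write into free space
    live_ptrs[block_id] = new_location  # pointer update; the old block stays put
    return 1


# CoW: the snapshot starts empty and fills as blocks change.
live = {0: "a0", 1: "b0"}
snap = {}
cow_writes = copy_on_write(live, snap, 0, "a1")

# RoW: the snapshot is just a frozen copy of the pointer table.
store = ["a0", "b0"]
live_ptrs = {0: 0, 1: 1}
snap_ptrs = dict(live_ptrs)
row_writes = redirect_on_write(store, live_ptrs, 0, "a1")
```

In the CoW case the live file keeps its blocks in place at the cost of two writes. In the RoW case there is one write, but block 0 of the live file now sits at the end of the store, away from its neighbor, which is the fragmentation trade-off noted in the list above.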


Data Recovery - Backup

Gen 6 with the fiber channel combo card.

PowerScale supports NDMP for integration with backup applications such as
Symantec, EMC, CommVault, and IBM. A backup application external to the cluster
manages the backup process.
• Backup is managed over the external network in one of two ways.
• Direct to backup device over LAN - slower performance.
• Through Gen 5 Backup Accelerators or the Gen 6 Fibre Channel combo card.
• NDMP support comes standard.


Data Recovery - Replication

[Figure: Source cluster and Target cluster]

Replication keeps a copy of data from one cluster on another cluster. OneFS
replicates during normal operations, from one PowerScale cluster to another.
Replication may be from one to one, or from one to many PowerScale clusters.

Cluster-to-cluster synchronization


• Scheduled replication over LAN or WAN.
• PowerScale to PowerScale only.
• One-way replication.

Two replication types

The two types of replication are:

• Copy - new files on the source are copied to the target, while files deleted on
the source remain unchanged on the target.
• Synchronization - only works in one direction and both the source and target
clusters maintain identical file sets, except that files on the target are read-only.
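
The practical difference between the two replication types is what happens to files that were deleted on the source. The following sketch models one replication pass over plain dictionaries. It is a simplified illustration, not SyncIQ's logic; the function name, the mode strings, and the example paths are all assumptions.

```python
def replicate(source, target, mode):
    """Return the target's file set after one one-way replication pass.

    source and target map file paths to contents. mode is "copy" or "sync".
    """
    result = dict(target)
    result.update(source)  # new and changed files always flow source -> target
    if mode == "sync":
        # Synchronization also removes files that no longer exist on the source.
        result = {path: data for path, data in result.items() if path in source}
    elif mode != "copy":
        raise ValueError(f"unknown mode: {mode}")
    return result


# Hypothetical file sets: b.txt was deleted on the source after an earlier pass.
source = {"/ifs/data/a.txt": "v2", "/ifs/data/c.txt": "new"}
target = {"/ifs/data/a.txt": "v1", "/ifs/data/b.txt": "deleted on source"}

after_copy = replicate(source, target, "copy")
after_sync = replicate(source, target, "sync")
```

After a copy pass the target still holds `b.txt`; after a synchronization pass the target's file set is identical to the source's.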

Policy-based synchronization jobs

Policies apply per directory or to specific types of data, and can set exceptions
that include or exclude specific files.

• Manual start

• On schedule
• When changes made

Bandwidth throttling

Bandwidth throttling is applied to replication jobs to preserve resources for
high-priority workflows.


Data Retention

Data retention is the ability to prevent data from being deleted or modified before
some future date. In OneFS, you can configure data retention at the directory level,
so that different directories can have different retention policies. You can also use
policies to automatically commit certain types of files for retention.
• Two modes of retention
• Enterprise (more flexible) - enable privileged deletes by an administrator.
• Compliance (more secure) - designed to meet SEC regulatory requirements.
Once data is committed to disk, individuals cannot change or delete the data
until the retention clock expires - OneFS prohibits clock changes.
• Compatible with SyncIQ replication.
• Requires SmartLock license.
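
The two retention modes differ only in what is allowed before the retention date passes. The sketch below is a simplified model of that rule, written under the assumption that a privileged delete is the single exception in enterprise mode; it is not SmartLock's actual logic, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timezone


def can_delete(retain_until, now, mode, privileged=False):
    """Decide whether a committed file may be deleted.

    mode is "enterprise" or "compliance".
    """
    if mode not in ("enterprise", "compliance"):
        raise ValueError(f"unknown mode: {mode}")
    if now >= retain_until:
        return True  # retention period has expired
    # Before expiry, only an enterprise-mode privileged delete is allowed.
    return mode == "enterprise" and privileged


# Hypothetical dates chosen for illustration.
retain_until = datetime(2030, 1, 1, tzinfo=timezone.utc)
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
```

For example, `can_delete(retain_until, now, "compliance", privileged=True)` is false, while the same call in enterprise mode is true. The model takes `now` as an argument, which is also why the text stresses that the clock itself must not be changeable in compliance mode.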

PowerScale and Big Data


Module Objectives

After completing this lesson, you will be able to:
• Identify PowerScale’s Big Data position.
• Define Edge-to-Core-to-Cloud.


What Is Big Data?

• Big Data is a collection of data so large, diverse, and fast-changing that it is
difficult for traditional technology to efficiently process and manage.
• Big Data has too much volume, velocity, and variety.
• It is difficult to process and store using traditional means.


Big Data - Volume, Velocity, Variety

The “three V's” (volume, velocity, and variety) often arrive together. When they
combine, administrators truly feel the need for higher performance, higher capacity
storage. The three V's generate the challenges of managing Big Data.

Growing data has also forced an evolution in storage architecture over the years
due to the amount of maintained data. PowerScale is a Big Data solution because
it can handle the volume, velocity, and variety that define the fundamentals of Big
Data.

1: What is meant by volume? Consider any global website that works at scale. One
example of Big Data volume is the YouTube press page, which says that YouTube
ingests 100 hours of video every minute.

Challenge: Nonflexible data protection. When you have Big Data volumes of
information to store, it had better be there, dependably. If an organization relies on
RAID to protect against data loss or corruption, the failure of a single disk drive
causes a disproportionate inconvenience. The most popular RAID implementation
scheme allows the failure of only two drives before data loss. (A sizable Big Data
installation easily has more than 1000 individual hard drives, so odds are at least
one drive is down at any time.) The simpler answer is to protect data using a
different scheme.
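
The parenthetical estimate above, that an installation with more than 1000 drives nearly always has at least one drive down, can be checked with a short calculation. Assuming drives fail independently and each is unavailable with probability p at a given moment, the chance that at least one of n drives is down is 1 - (1 - p)^n. The per-drive figure used below is a made-up value for illustration, not a measured one.

```python
def prob_at_least_one_down(n_drives, p_down):
    """Probability that at least one of n independent drives is down."""
    return 1.0 - (1.0 - p_down) ** n_drives


# Hypothetical: each drive is unavailable 0.5% of the time.
p = prob_at_least_one_down(n_drives=1000, p_down=0.005)
print(f"{p:.3f}")
```

Even with a small per-drive probability, the result is very close to 1 at this drive count, which is why a protection scheme that tolerates only a fixed, small number of drive failures becomes uncomfortable at scale.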

2: What is an example of velocity? Machine-generated workflows produce massive
volumes of data. For example, the longest stage of designing a computer chip is
physical verification, where the chip design is tested in every way to see not only if
it works, but also if it works fast enough. Each time researchers fire up a test on a
graphics chip prototype, sensors generate many terabytes of data per second.
Storing terabytes of data in seconds is an example of Big Data velocity.

3: Perhaps the best example of variety is the migration of the world to social media.
On a platform such as Facebook, people post all kinds of file formats: text, photos,
video, polls, and more. Many kinds of data at that scale represent Big Data variety.


Big Data Challenges: Volume

Conventional Challenge                   PowerScale's Answer
Complex data architecture [44]           Single volume/single LUN
Low utilization of raw capacity [45]     High (80%+) utilization
Non-flexible data protection             Scalable resiliency

44 Challenge: SAN and scale-up NAS data storage architectures encounter a
logical limit at 16 TB. This means that no matter what volume of data arrives, a
storage administrator has to subdivide it into partitions smaller than 16 terabytes.
The smaller partitions cause silos of data. To simplify this challenge, scale-out NAS
such as a PowerScale cluster holds everything in one single volume with one
LUN. PowerScale can scale seamlessly without architectural hard stops forcing
subdivisions on the data.

45 Challenge: SAN and scale-up NAS architectures must reserve much of the raw
capacity of the system for management and administrative overhead. Overhead
includes RAID parity disks, metadata for all the LUNs and mega LUNs, duplicate
copies of the file system, and so on. As a result, conventional SAN and NAS
architectures often use half of the raw capacity available, because of the headroom
for each separate stack of storage. Suppose that you have seven different silos of
data. When you put them in one large volume, you immediately get back the
headroom from six of the seven stacks. In that way, PowerScale offers high
utilization. PowerScale customers routinely use 80% or more of raw disk capacity.
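
The seven-silo example above can be worked through with concrete numbers. The sketch below assumes that every separate volume must reserve a fixed amount of headroom and ignores all other overhead such as protection data; the 100 TB stack size and 20 TB headroom are hypothetical values chosen only to make the arithmetic visible.

```python
def usable_capacity(raw_per_stack_tb, headroom_tb, stacks, consolidated):
    """Usable TB when each volume must reserve a fixed headroom."""
    total_raw = raw_per_stack_tb * stacks
    volumes = 1 if consolidated else stacks
    return total_raw - headroom_tb * volumes


siloed = usable_capacity(100, 20, stacks=7, consolidated=False)
single = usable_capacity(100, 20, stacks=7, consolidated=True)
reclaimed = single - siloed  # headroom recovered from six of the seven stacks
```

With these assumed figures the siloed layout leaves 560 TB usable and the single volume leaves 680 TB, so the 120 TB difference is exactly six stacks' worth of headroom.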


Big Data Challenges: Velocity

Conventional Challenge                   PowerScale's Answer
Difficult to scale performance [46]      Linear scalability
Silos of data [47]                       No hot spots

46 Some data storage architectures use two controllers, sometimes called servers
or filers, to run a stack of many hard drives. You can scale capacity by adding more
hard drives, but it is difficult to scale performance. In a given storage stack, the
hard drives offer nothing but capacity. All the intelligence of the system, including
computer processing and RAM, must come from the two filers. If the horsepower of
the two filers becomes insufficient, the architecture does not enable you to pile on
more filers. You start over with another stack and two more filers. In contrast, every
node in a PowerScale cluster contains capacity plus computing power plus
memory. The nodes can work in parallel, so each node you add scales the cluster
out linearly. In other words, all aspects of the cluster grow together, including
capacity and performance.

47 Due to the architectural restrictions, SAN and scale-up NAS end up with several
isolated stacks of storage. Many sites have a different storage stack for each
application or department; a backup storage stack is one example. Data does not
move between stacks automatically. Instead, an administrator has to manually
arrange a data migration. If the R&D stack performs product testing that generates
results at Big Data velocity, the company may establish an HPC stack, which could
reach capacity rapidly. Other departments or workflows may have independent
storage stacks with a lot of capacity remaining, but there is no automated way for
R&D to offload their HPC overflow. In contrast, a PowerScale cluster distributes
data across all its nodes to keep them all at equal capacity. You do not have one
node that is overworked while other nodes sit idle.


Challenge [48]                           Parallel processing
Many manual processes [49]               Policy-driven

There are no hot spots, and thus, no manual data migrations. If the goal is to keep
pace with Big Data velocity, automated balancing makes more sense.

48 In conventional storage, a file is typically confined to a RAID stripe. That means
that the maximum throughput of reading that file is limited to how fast those drives
can deliver the file. In modern workflows where a hundred engineers or a thousand
digital artists access a file, the RAID drives cannot keep up. Perhaps the two filers
on that stack cannot process that many requests efficiently. With PowerScale,
every node has at least a dozen drives, plus more RAM and more computer
processing, for more caching and better concurrent access. When there is heavy
demand for a file, several nodes can deliver it.

49 Besides manual data migrations, conventional storage has many more manual
processes. A SAN or a scale-up NAS administrator spends a significant amount of
time creating and managing LUNs, partitioning storage, establishing mounts,
launching jobs, and so on. In contrast, PowerScale is policy-driven. Once you
define your policies, the cluster does the rest automatically.


Big Data Challenges: Variety

A scale-out Data Lake is a large storage solution where vast amounts of data from
other solutions or locations are combined into a single store. Elements of a data
lake are:
• Digital repository to store massive data.
• Variety of formats.
• Can do computations and analytics on original data.
• Helps address the variety issue with Big Data.
• Data can be secured, analyzed, and actions taken based on insights.
• Enterprises can eliminate the cost of having silos of information.
• Provides scaling capabilities in terms of capacity, performance, security, and
protection.


Big Data Positioning of PowerScale

PowerScale scale-out NAS architecture simplifies managing Big Data.
• Scale-out NAS - simplifies Big Data management.
• 1,000's of PB of file-based data - one volume, one namespace, one file system.
• Purpose-built to simplify Big Data challenges.
• Multiprotocol capable.


PowerScale OneFS: Scale-Out Data Lake

• Single Volume / File System
• Simplicity & Ease of Use
• High Performance
• Linear Scalability
• Unmatched Efficiency
• Easy Growth
• Cloud Tiering Ready
• Hadoop Enabled

PowerScale is the industry-leading scale-out clustered storage solution. It provides
a single volume of data storage at a massive scale that is easy to use and manage.
It offers linear scalability and readiness for performance applications, Hadoop
analytics, and other workflows.

A Data Lake is a central data repository that stores data from various sources, such
as file shares, web apps, and the cloud. It enables businesses to access the same
data for various uses and enables the manipulation of data using various clients,
analyzers, and applications. The data is real-time production data with no need to
copy or move it from an external source, like another Hadoop cluster, into the Data
Lake. The Data Lake provides tiers that are based on data usage, and the ability to
instantly increase the storage capacity when needed. This slide identifies the key
characteristics of a scale-out Data Lake.


PowerScale CloudPools

The PowerScale CloudPools software enables you to select from various public
cloud services or use a private cloud. CloudPools offers the flexibility of another tier
of storage that is off-premises and off-cluster. Essentially, CloudPools provides
a lower TCO [50] for archival-type data.
• Treat cloud storage as another cluster-connected tier.
• Policy-based automated tiering
• Address rapid data growth and optimize data center storage resources - use
valuable on-site storage resources for active data.
• Send rarely used or accessed data to cloud.
• Seamless integration with data – retrieve at any time.

50 CloudPools optimizes primary storage with intelligent data placement. CloudPools
eliminates management complexity and enables a flexible choice of cloud
providers.


• Data remains encrypted in cloud until retrieval.
• Connect to ECS, another PowerScale cluster, Amazon S3, Virtustream,
Microsoft Azure, Google Cloud, and Alibaba.
• Policies automatically move specified files to cloud.
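
Policy-based tiering of rarely accessed data can be pictured as a filter over file metadata. The sketch below is a generic illustration that assumes a single rule based on last access time; it does not use or represent the CloudPools policy engine, and the `FileInfo` class, function name, paths, and 180-day threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    path: str
    last_accessed: datetime


def select_for_cloud(files, now, max_idle_days):
    """Return paths of files not accessed within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return [f.path for f in files if f.last_accessed < cutoff]


# Hypothetical file metadata for illustration.
now = datetime(2025, 6, 1)
files = [
    FileInfo("/ifs/projects/active.dat", datetime(2025, 5, 30)),
    FileInfo("/ifs/projects/archive.dat", datetime(2024, 1, 15)),
]
to_tier = select_for_cloud(files, now, max_idle_days=180)
```

Only the archive file is selected, so active data stays on the on-site storage while idle data becomes a candidate for the cloud tier.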


PowerScale and Edge-to-Core-to-Cloud

PowerScale can consolidate file-based, unstructured data into a Data Lake. It
eliminates costly storage silos, simplifies management, increases data protection,
and helps you gain more value from your data assets. With integrated multiprotocol
capabilities, PowerScale can support a wide range of traditional and
next-generation applications on a single platform. Support includes powerful Big Data
analytics that provide you with better insight and use of your stored information.

Edge locations are often inefficient islands of storage, running with limited IT
resources, and inconsistent data protection practices. Data at the edge generally
lives outside of the Data Lake, making it difficult to incorporate into data analytics
projects. The edge-to-core-to-cloud approach extends the Data Lake to edge
locations and out into the cloud. It enables consolidation, protection, management,
and backups of remote edge location data.

Course Summary


Now that you have completed this course, you can:
→ Compare structured and unstructured data.
→ Identify the PowerScale physical architecture.
→ Discuss node workflow applications, node model details, and adding new
nodes to the cluster.
→ Describe the PowerScale OneFS operating system.
→ Explain data management and security in PowerScale.
→ Discuss PowerScale as a Big Data solution.

Appendix


PowerScale Nodes
Individual PowerScale nodes provide the data storage capacity and processing
power of the PowerScale scale-out NAS platform. All of the nodes are peers to
each other and so there is no single 'master' node and no single 'administrative
node'.

• No single master
• No single point of administration

Administration can be done from any node in the cluster, because each node provides
network connectivity, storage, memory, non-volatile RAM (NVDIMM), and
processing power in its Central Processing Units (CPUs). Node configurations
also differ in compute and capacity. These varied configurations
can be mixed and matched to meet specific business needs.

Each node contains:
• Disks
• Processor
• Cache
• Front-end network connectivity


Tip: Gen 5 and Gen 6 nodes can exist within the same cluster. Every
PowerScale node is equal to every other PowerScale node of the
same type in a cluster. No one specific node is a controller or filer.
