GPFS Easy
This section describes how to set up an initial GPFS cluster on an AIX system. The outline is: installation, setting up SSH, creating the cluster with mmcrcluster, and verification. Before the cluster can be built, GPFS has to be installed on the required nodes, and passwordless login must be enabled from each node to every other node.

Step 1: Install GPFS. Installing the GPFS filesets is straightforward for an AIX administrator: use the smitty installp tool and select the filesets, or use the installp command directly, as you prefer. The required filesets are:

gpfs.base
gpfs.msg.en_US
gpfs.docs.data

Step 2: Enable passwordless login. Either rsh or ssh can be used; ssh is the preferred method as it offers more security. On one of the nodes (preferably the node you plan to use as the primary node), generate an ssh key pair and copy the private and public keys to the /root/.ssh directory on all nodes that are part of the GPFS cluster.
Generate the key pair:

# ssh-keygen -t dsa

Verify that the key pair was generated: there should be two files in /root/.ssh, id_dsa and id_dsa.pub. Before copying the ssh keys to the other nodes, make sure the PermitRootLogin parameter in /etc/ssh/sshd_config is set to yes on all nodes. If it is not, change it to yes and then refresh the ssh daemon.
Copy the key pair id_dsa and id_dsa.pub to the same location, /root/.ssh, on all other nodes. Then append the ssh public key to /root/.ssh/authorized_keys on all nodes, including the primary GPFS node:

# cat /root/.ssh/id_dsa.pub >> /root/.ssh/authorized_keys
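The copy-and-append steps above can be sketched as a loop. This is a dry run: the node names are examples, and the generated commands are written to a review script instead of being executed.

```shell
# Dry-run sketch of distributing root's keys to the other cluster nodes.
# NODES is a hypothetical list; adjust to your cluster.
NODES="node2 node3"
PLAN=/tmp/keydist.sh
: > "$PLAN"
for n in $NODES; do
  # copy the key pair to each node, then append the public key there
  echo "scp /root/.ssh/id_dsa /root/.ssh/id_dsa.pub root@$n:/root/.ssh/" >> "$PLAN"
  echo "ssh root@$n 'cat /root/.ssh/id_dsa.pub >> /root/.ssh/authorized_keys'" >> "$PLAN"
done
cat "$PLAN"
```

Review /tmp/keydist.sh and run it (or run the commands by hand) once PermitRootLogin is enabled everywhere.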
Step 3: Create a nodelist. Create a file /root/nodelist in the root home directory containing the names (FQDNs) of all nodes that will be part of the GPFS cluster. For example, if the node names are node1.test.com (node1), node2.test.com (node2), and so on, add the FQDNs or short names of these nodes to the file one after the other (remember to put these entries in /etc/hosts as well).
/root/nodelist is an input file with a list of node name:designation pairs. Designations are manager or client, and quorum or nonquorum. (Tip: to make a node a quorum node, specify quorum alone; to make it a client quorum node, specify quorum-client.)

node1:quorum-manager
node2:quorum-manager
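The node file above can be generated and sanity-checked with a short sketch. The file is written to /tmp here for illustration; on a real system use /root/nodelist. Node names are the ones from the example.

```shell
# Build the node file used by mmcrcluster -N.
NODEFILE=/tmp/nodelist          # /root/nodelist on a real system
cat > "$NODEFILE" <<'EOF'
node1.test.com:quorum-manager
node2.test.com:quorum-manager
EOF
# Sanity check: every line must be NodeName:Designation (exactly one colon)
awk -F: 'NF != 2 { exit 1 }' "$NODEFILE" && echo "nodelist OK"
```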
Syntax: mmcrcluster -N {NodeDesc[,NodeDesc...] | NodeFile} -p PrimaryServer [-s SecondaryServer] [-r RemoteShellCommand] [-R RemoteFileCopyCommand] [-C ClusterName] [-U DomainName] [-A] [-c ConfigFile]
Eg:

# mmcrcluster -N /root/nodelist -p node1 -s node2 -r /usr/bin/ssh -R /usr/bin/scp -C testcluster -A
where /root/nodelist contains the list of nodes; node1 is the primary configuration server, specified by the -p option; node2 is the secondary configuration server, specified by the -s option; ssh is the shell used for GPFS command execution, specified by the -r option; scp is the copy command GPFS uses to copy between nodes, specified by the -R option; testcluster is the cluster name, specified by the -C option; and finally the -A option makes GPFS start automatically on reboot.
Step 4: Verify the status of the cluster using mmlscluster:

# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         testcluster
  GPFS cluster id:           12399838388936568191
  GPFS UID domain:           testcluster
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
GPFS cluster configuration servers:
-----------------------------------
  Primary server:    node1
  Secondary server:  node2
Creating the Network Shared Disk (NSD)

Create a disk descriptor file (in our case /root/disklist) with one line per disk:

hdisk1:node1:node2:dataAndMetadata:0:test_nsd

This means hdisk1 is the LUN to be used for the NSD, node1 is the primary server for this NSD, node2 is the backup server, dataAndMetadata indicates it can contain data as well as metadata, 0 is the failure group, and test_nsd is the name of the NSD. After the descriptor file is created, use the mmcrnsd command to create the NSD.

Usage: mmcrnsd -F DescFile [-v {yes | no}]
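The descriptor line above can be generated and checked with a quick sketch. The file is written to /tmp here for illustration; on a real system use /root/disklist.

```shell
# Build and sanity-check the disk descriptor file for mmcrnsd.
DESCFILE=/tmp/disklist          # /root/disklist on a real system
# Format: DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName
echo 'hdisk1:node1:node2:dataAndMetadata:0:test_nsd' > "$DESCFILE"
# Each descriptor line must have exactly six colon-separated fields
awk -F: 'NF != 6 { exit 1 }' "$DESCFILE" && echo "descriptor OK"
```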
Eg:

# mmcrnsd -F /root/disklist
mmcrnsd: Processing disk hdisk1
mmcrnsd: 6027-1371 Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
Verify that the NSDs were properly created using mmlsnsd:
# mmlsnsd
 File system   Disk name   NSD servers
---------------------------------------------------------------------------
After the NSDs have been created, the disk descriptor file (/root/disklist in our case) will have been rewritten and now contains the NSD disk names. This newly written disk descriptor file is then used as input to the mmcrfs command.
Creating the GPFS Filesystem

Before creating the GPFS filesystem, create the mount point. For example, if /gpfsFS1 is the filesystem to be used, create the mount point with mkdir -p /gpfsFS1. Then create the GPFS filesystem using the command below:

# mmcrfs /gpfsFS1 /dev/gpfsFS1 -F /root/disklist -B 64K -m 2 -M 2

This creates a filesystem gpfsFS1 with device /dev/gpfsFS1 (whose underlying raw devices will be the disks listed in /root/disklist), a block size of 64K, and the default (-m) and maximum (-M) number of metadata replicas set to 2. If you add an extra NSD as above and give it failure group 1, you can also specify -r 2 and -R 2, which sets the default and maximum number of data replicas to 2, so that two replicas of data and metadata are created (like mirroring). The NSDs can now be viewed with the mmlsnsd command:

# mmlsnsd

 File system   Disk name   NSD servers
Summary information
---------------------
Number of nodes defined in the cluster:          2
Number of local nodes active in the cluster:     2
Number of remote nodes joined in this cluster:   0
Number of quorum nodes defined in the cluster:   2
Number of quorum nodes active in the cluster:    2
Quorum = 2, Quorum achieved
To shut down GPFS on a node:

# mmshutdown -N <nodename>

# mmshutdown -N node1
Wed Dec 3 00:11:44 CDT 2010: 6027-1341 mmshutdown: Starting force unmount of GPFS file systems
Wed Dec 3 00:11:49 CDT 2010: 6027-1344 mmshutdown: Shutting down GPFS daemons
Shutting down!
'shutdown' command about to kill process 5701702
Wed Dec 3 00:11:56 CDT 2010: 6027-1345 mmshutdown: Finished
To shut down GPFS on all nodes:

# mmshutdown -a
# mmgetstate -a
 Node number   Node name   GPFS state
This shows that node2 is either down or GPFS is not started on it, and hence node1 is arbitrating to find quorum. The solution is to start GPFS on node2, as described below.
# mmstartup -N <nodename>
# mmstartup -N node2 Wed Nov 3 00:14:09 CDT 2010: 6027-1642 mmstartup: Starting GPFS ...
mmaddnode -N {NodeDesc[,NodeDesc...] | NodeFile}

You must have root authority. The command may be run from any node in the GPFS cluster. Ensure proper authentication (.rhosts or ssh key exchanges), install GPFS onto the new node, and decide the designation(s) for the new node, for example manager | quorum.
# mmaddnode -N node3:quorum-manager
Wed Nov 3 01:20:35 CDT 2010: 6027-1664 mmaddnode: Processing node node3
mmaddnode: Command successfully completed mmaddnode: 6027-1371 Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         testcluster
  GPFS cluster id:           12399838388936568191
  GPFS UID domain:           testcluster
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    node1
  Secondary server:  node2

 Node number  Node name  IP address     Admin node name  Designation
---------------------------------------------------------------------
    1         node1      10.10.19.81    node1            quorum-manager
    2         node2      10.10.19.82    node2            quorum
    3         node3      10.10.19.83    node3            quorum-manager
To change the designation of a node, use mmchnode. In our case node3 was a quorum-manager node; to change node3's designation to client:
# mmchnode --client -N node3 Wed Nov 3 00:29:01 CDT 2010: 6027-1664 mmchnode: Processing node node3
mmchnode: 6027-1371 Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
# mmlscluster
GPFS cluster information
========================
  GPFS cluster name:         testcluster
  GPFS cluster id:           12399838388936568191
  GPFS UID domain:           testcluster
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    node1
  Secondary server:  node2

 Node number  Node name  IP address     Admin node name  Designation
---------------------------------------------------------------------
    1         node1      10.10.19.81    node1            quorum-manager
    2         node2      10.10.19.82    node2            quorum
    3         node3      10.10.19.83    node3            quorum
This section covers the following tasks:

To remove the NSD
To replace the disk
To add a new disk to the GPFS filesystem
To suspend a disk
To resume a disk

Steps:

To list the characteristics of a GPFS filesystem
# mmlsfs <GPFS filesystem>

flag  value        description
----  -----------  -----------------------------------------------------
 -f   2048         Minimum fragment size in bytes
 -i   512          Inode size in bytes
 -I   8192         Indirect block size in bytes
 -m   1            Default number of metadata replicas
 -M   2            Maximum number of metadata replicas
 -r   1            Default number of data replicas
 -R   2            Maximum number of data replicas
 -j   cluster      Block allocation type
 -D   nfs4         File locking semantics in effect
 -k   all          ACL semantics in effect
 -a   1048576      Estimated average file size
 -n                Estimated number of nodes that will mount file system
 -B                Block size
 -Q                Quotas enforced
                   Default quotas enabled
 -F                Maximum number of inodes
 -V                File system version
                   Support for large LUNs?
 -z                Is DMAPI enabled?
 -L                Logfile size
 -E                Exact mtime mount option
 -S                Suppress atime mount option
 -K                Strict replica allocation option
 -P                Disk storage pools in file system
 -d                Disks in file system
 -A                Automatic mount option
 -o                Additional mount options
 -T                Default mount point
mmchconfig

Prerequisite: a LUN obtained from the SAN should first be added as an NSD, as described in the Creating the Network Shared Disk (NSD) section, before proceeding.
Eg: To use the NSD test_nsd as a tiebreaker disk:

# mmchconfig tiebreakerDisks="test_nsd"

Eg: To remove a tiebreaker disk:

# mmchconfig tiebreakerDisks=no
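A quick way to confirm the setting afterwards is to filter the cluster configuration for the tiebreaker entry. The heredoc below simulates sample mmlsconfig output; on a live cluster run `mmlsconfig | grep -i tiebreaker` instead.

```shell
# Filter tiebreaker settings out of (simulated) mmlsconfig output.
# The two sample lines are illustrative, not real cluster output.
grep -i tiebreaker > /tmp/tiebreaker.out <<'EOF'
tiebreakerDisks test_nsd
minReleaseLevel 3.2.1.5
EOF
cat /tmp/tiebreaker.out
```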
To mount all GPFS filesystems on all nodes:

# mmmount all -a
To list all the physical disks which are part of a GPFS filesystem
mmlsnsd
To show the node names, use the -f and -m options:

# mmlsnsd -f gpfs -m
 Disk name   NSD volume ID      Device        Node name
---------------------------------------------------------------
 nsd1        AC1513514CD152BF   /dev/hdisk1   node1
 nsd2        AC1513514CD152C0   /dev/hdisk2   node2
 nsd3        AC1513514CD152C1   /dev/hdisk3   node3
 nsd4        AC1513514CD15352   /dev/hdisk4   node4
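The NSD-to-device mapping in `mmlsnsd -m` output can be extracted with a short awk sketch. The heredoc mirrors the table above; on a live cluster pipe the output of `mmlsnsd -m` in instead.

```shell
# Map each NSD to its backing device from mmlsnsd -m style output:
# keep rows whose third column is a /dev/ path, print "nsd->device".
awk '$3 ~ /^\/dev\// { print $1 "->" $3 }' > /tmp/nsd_map <<'EOF'
 nsd1  AC1513514CD152BF  /dev/hdisk1  node1
 nsd2  AC1513514CD152C0  /dev/hdisk2  node2
EOF
cat /tmp/nsd_map
```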
To show the failure group and storage pool info:

# mmlsdisk gpfs

disk         driver  sector  failure  holds     holds                        storage
name         type    size    group    metadata  data   status  availability  pool
------------ ------  ------  -------  --------  -----  ------  ------------  -------
nsd1         nsd     512     -1       yes       yes    ready   up            system
nsd2         nsd     512     -1       yes       yes    ready   up            system
nsd3         nsd     512     -1       no        yes    ready   up            pool1
nsd4         nsd     512     -1       no        yes    ready   up            pool1
# mmdf gpfs

disk          disk size  failure  holds     holds  free KB            free KB
name          in KB      group    metadata  data   in full blocks     in fragments
------------  ---------  -------  --------  -----  -----------------  ------------
Disks in storage pool: system (Maximum disk size allowed is 97 GB)
nsd1          10485760   -1       yes       yes    10403328 ( 99%)    960 ( 0%)
nsd2          10485760   -1       yes       yes    10402304 ( 99%)    960 ( 0%)
To unmount the GPFS filesystem from a particular node:

# mmumount gpfs_fs1 -N node1

where gpfs_fs1 is the GPFS filesystem and node1 is the name of the node from which it is to be unmounted. To unmount from all nodes:

# mmumount gpfs_fs1 -a
1. # mmumount <GPFS filesystem> -a   (unmount the GPFS filesystem from all nodes)
2. # mmdelfs gpfs_fs1 -p             (remove the filesystem)
To remove the filesystem gpfs_fs1
# mmdf gpfs_fs1

disk          disk size  failure  holds     holds  free KB            free KB
name          in KB      group    metadata  data   in full blocks     in fragments
------------  ---------  -------  --------  -----  -----------------  ------------
Disks in storage pool: system (Maximum disk size allowed is 104 GB)
test_nsd      10485760   0        yes       yes    10359360 ( 99%)    152 ( 0%)
test_nsd2     10485760   0        yes       yes    10483648 (100%)    62 ( 0%)
test_nsd1     10485760   1        yes       yes    10359360 ( 99%)    160 ( 0%)
Inode Information
-----------------
Number of used inodes:         4042
Number of free inodes:         29494
Number of allocated inodes:    33536
Maximum number of inodes:      33536
1. # mmumount gpfs_fs1 -a
2. # mmdelfs gpfs_fs1 -p
GPFS: 6027-573 All data on following disks of gpfs_fs1 will be destroyed:
    test_nsd
    test_nsd1
    test_nsd2
GPFS: 6027-574 Completed deletion of file system /dev/gpfs_fs1.
mmlsnsd output after the filesystem was removed, showing all NSDs as free disks:
# mmlsnsd
 File system   Disk name   NSD servers
---------------------------------------------------------------------------
 (free disk)   test_nsd1   directly attached
 (free disk)   test_nsd2   directly attached
 (free disk)   test_nsd    directly attached
Remember that you should remove a disk only after confirming that adequate space is left on the other disks in the filesystem (check this with mmdf <GPFS filesystem>), so that when the disk is removed the data can be restriped across the remaining disks, OR after confirming that two data replicas exist (check this with mmlsfs <GPFS filesystem>).

Syntax: mmdeldisk <GPFS filesystem> <NSD name> -r

The -r option is very important: it restripes and rebalances the data across the other available disks in this filesystem.
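The free-space check before deletion can be scripted: pull the free-KB-in-full-blocks column for one NSD out of the mmdf output. The heredoc mirrors the mmdf layout used in this document; on a live cluster pipe `mmdf gpfs_fs1` into awk instead.

```shell
# Extract the "free KB in full blocks" column (field 6) for one NSD
# from mmdf-style output. Sample lines are illustrative.
awk '$1 == "test_nsd" { print $6 }' > /tmp/free_kb <<'EOF'
test_nsd      10485760   0 yes yes   10359360 ( 99%)   152 ( 0%)
test_nsd1     10485760   1 yes yes   10359360 ( 99%)   160 ( 0%)
EOF
cat /tmp/free_kb
```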
# mmdeldisk gpfs_fs1 test_nsd2 -r Deleting disks ... Scanning system storage pool GPFS: 6027-589 Scanning file system metadata, phase 1 ... GPFS: 6027-552 Scan completed successfully. GPFS: 6027-589 Scanning file system metadata, phase 2 ... GPFS: 6027-552 Scan completed successfully. GPFS: 6027-589 Scanning file system metadata, phase 3 ... GPFS: 6027-552 Scan completed successfully. GPFS: 6027-589 Scanning file system metadata, phase 4 ... GPFS: 6027-552 Scan completed successfully. GPFS: 6027-565 Scanning user file metadata ... GPFS: 6027-552 Scan completed successfully. Checking Allocation Map for storage pool 'system' GPFS: 6027-370 tsdeldisk64 completed. mmdeldisk: 6027-1371 Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
# mmlsnsd
 File system   Disk name   NSD servers
---------------------------------------------------------------------------
 gpfs_fs1      test_nsd    directly attached
 gpfs_fs1      test_nsd1   directly attached
 (free disk)   test_nsd2   directly attached
Remember that you should only remove NSDs that are free. To remove an NSD:

# mmdelnsd <NSD name>
# mmlsnsd
 File system   Disk name   NSD servers
---------------------------------------------------------------------------
 (free disk)   test_nsd1   directly attached
 gpfs_fs1      test_nsd2   directly attached
 gpfs_fs1      test_nsd    directly attached
# mmdelnsd test_nsd1 mmdelnsd: Processing disk test_nsd1 mmdelnsd: 6027-1371 Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
# mmlsnsd
 File system   Disk name   NSD servers
---------------------------------------------------------------------------
 gpfs_fs1      test_nsd2   directly attached
 gpfs_fs1      test_nsd    directly attached
Prerequisites for replacing a disk: the physical disk/LUN should be added as an NSD, as described in the Creating the Network Shared Disk (NSD) section.
Syntax: mmrpldisk <GPFS filesystem name> <NSD to be replaced> <new NSD> -v {yes|no}
-v yes: verify that the new NSD does not already contain data
-v no:  skip the check for existing data on the new NSD
In this example there are three existing NSDs (nsd2, nsd3, and nsd4) and a newly added NSD, nsd1, which is not part of the GPFS filesystem. This procedure explains how to replace nsd4 with nsd1.
# mmlsnsd
 File system   Disk name   NSD servers
---------------------------------------------------------------------------
 gpfs_fs1      nsd2        (directly attached)
 gpfs_fs1      nsd3        (directly attached)
 gpfs_fs1      nsd4        (directly attached)
 (free disk)   nsd1        (directly attached)
# mmrpldisk gpfs_fs1 nsd4 nsd1 -v no
Verifying file system configuration information ...
Replacing nsd4 ...
GPFS: 6027-531 The following disks of gpfs_fs1 will be formatted on node trlpar06_21:
    nsd1: size 10485760 KB
Extending Allocation Map
Checking Allocation Map for storage pool 'system'
GPFS: 6027-1503 Completed adding disks to file system gpfs_fs1
Scanning system storage pool
GPFS: 6027-589 Scanning file system metadata, phase 1 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 2 ...
Scanning file system metadata for pool1 storage pool
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 3 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 4 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-565 Scanning user file metadata ...
 100 % complete on Thu Nov 4 02:14:14 2010
GPFS: 6027-552 Scan completed successfully. Checking Allocation Map for storage pool 'system' Done
Check the mmlsnsd output after the activity. Notice that nsd1 is now part of the gpfs_fs1 filesystem and nsd4 has become free.
# mmlsnsd
 File system   Disk name   NSD servers
---------------------------------------------------------------------------
 gpfs_fs1      nsd1        (directly attached)
 gpfs_fs1      nsd2        (directly attached)
 gpfs_fs1      nsd3        (directly attached)
 (free disk)   nsd4        (directly attached)
Prerequisites for adding a new disk: the physical disk/LUN should be added as an NSD, as described in the Creating the Network Shared Disk (NSD) section, and the same disk descriptor file used for creating the NSD has to be used with the -F option.
Step 1: Create a disk descriptor file with any name. Here we use the file name disks, and the disk used is hdisk10.
# /usr/lpp/mmfs/bin/mmcrnsd -F disks
mmcrnsd: Processing disk hdisk10
mmcrnsd: 6027-1371 Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
#
# mmlsnsd
 File system   Disk name    NSD servers
---------------------------------------------------------------------------
 gpfs_fs1      test_nsd2    directly attached
 gpfs_fs1      test_nsd     directly attached
 (free disk)   test_nsd10   directly attached
Step 2: Add the new NSD to the filesystem using mmadddisk with the same descriptor file:

# mmadddisk gpfs_fs1 -F disks

GPFS: 6027-531 The following disks of gpfs_fs1 will be formatted
    test_nsd10: size 10485760 KB
Extending Allocation Map
Checking Allocation Map for storage pool 'system'
GPFS: 6027-1503 Completed adding disks to file system gpfs_fs1.
mmadddisk: 6027-1371 Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
To suspend a disk
This is useful if you suspect a problem with an existing disk and want to stop further writes to it. Suspend the disk with mmchdisk:

# mmchdisk /dev/gpfs_fs1 suspend -d nsd4
# mmlsdisk /dev/gpfs_fs1

disk         driver  sector  failure  holds     holds                           storage
name         type    size    group    metadata  data   status     availability  pool
------------ ------  ------  -------  --------  -----  ---------  ------------  -------
nsd4         nsd     512     0        yes       yes    suspended  up            system
nsd2         nsd     512     1        yes       yes    ready      up            system
nsd3         nsd     512     0        no        yes    ready      up            pool1
To resume a disk
The disk currently shows as suspended:

# mmlsdisk /dev/gpfs_fs1

disk         driver  sector  failure  holds     holds                           storage
name         type    size    group    metadata  data   status     availability  pool
------------ ------  ------  -------  --------  -----  ---------  ------------  -------
nsd4         nsd     512     0        yes       yes    suspended  up            system
nsd2         nsd     512     1        yes       yes    ready      up            system
nsd3         nsd     512     0        no        yes    ready      up            pool1

Resume the disk with mmchdisk:

# mmchdisk /dev/gpfs_fs1 resume -d nsd4
# mmlsdisk /dev/gpfs_fs1

disk         driver  sector  failure  holds     holds                        storage
name         type    size    group    metadata  data   status  availability  pool
------------ ------  ------  -------  --------  -----  ------  ------------  -------
nsd4         nsd     512     0        yes       yes    ready   up            system
nsd2         nsd     512     1        yes       yes    ready   up            system
nsd3         nsd     512     0        no        yes    ready   up            pool1