
NETAPP UNIVERSITY

ONTAP Troubleshooting

Instructor Exercise Guide

Course ID: CP-ILT-CATSP
Catalog Number: CP-ILT-CATSP-IEG
Content Version 2



ATTENTION
The information contained in this course is intended only for training. This course contains information and activities that,
while beneficial for the purposes of training in a closed, non-production environment, can result in downtime or other
severe consequences in a production environment. This course material is not a technical reference and should not,
under any circumstances, be used in production environments. To obtain reference materials, refer to the NetApp product
documentation that is located at http://now.netapp.com/.

COPYRIGHT
© 2017 NetApp, Inc. All rights reserved. Printed in the U.S.A. Specifications subject to change without notice.
No part of this document covered by copyright may be reproduced in any form or by any means—graphic, electronic, or
mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system—without prior written
permission of NetApp, Inc.

U.S. GOVERNMENT RIGHTS


Commercial Computer Software. Government users are subject to the NetApp, Inc. standard license agreement and
applicable provisions of the FAR and its supplements.

TRADEMARK INFORMATION
NETAPP, the NETAPP logo, and the marks listed at http://www.netapp.com/TM are trademarks of NetApp, Inc. Other
company and product names may be trademarks of their respective owners.



TABLE OF CONTENTS
INTRODUCTION
MODULE 0: WELCOME
MODULE 1: TROUBLESHOOTING THE MANAGEMENT COMPONENT
MODULE 2: TROUBLESHOOTING SCALE-OUT NETWORKING
MODULE 3: TROUBLESHOOTING NETWORK COMPONENT AND SECURITY SERVICES
MODULE 4: TROUBLESHOOTING NFS
MODULE 5: TROUBLESHOOTING SMB
MODULE 6: TROUBLESHOOTING SCALABLE SAN
MODULE 7: TROUBLESHOOT MULTIPLE PROBLEMS



Getting Started
Study Aid Icons
In your exercises, you might see one or more of the following icons.
Warning
If you misconfigure a step marked with this icon, later steps might not work properly.
Check the step carefully before you move forward.
Attention
Review this step or comment carefully to save time, learn a best practice, or avoid errors.
Information
A comment labeled with this icon provides information about the topic or procedure.
References
A comment labeled with this icon identifies reference material that provides additional
information.

Exercise Equipment
The student lab environment consists of one vApp for each student.
The vApp is labeled OTS_X0Y, where X is the set number and Y is the student vApp number.

Remote Desktop Protocol Access


Use a VPN to access advtraining.netapp.com with the VPN credentials that are assigned to you. Use
Remote Desktop Protocol (RDP) to connect to the access host for the lab that you want to access. The
access host is aptly named cats-access. You should have received its IP address; you can also find the IP
address on the Virtual Machines tab in the vApp interface.
Console Access
Use a VPN to access advtraining.netapp.com with the VPN credentials that are assigned to you. Log in to
https://adv-vcloud.gsedu.ngslabs.netapp.com/cloud/org/prod/#/vAppListPage? with the credentials that
are assigned to you. This page gives you access to the vApps for labs. Double-click any host to access the
console.



Module 0: Welcome
In this exercise, you check the health of the NetApp ONTAP software cluster that you use for all the
subsequent exercises.

Objectives
This exercise focuses on enabling you to do the following:
- Conduct a full and comprehensive health check of an ONTAP cluster
- Access the cluster using OnCommand System Manager (System Manager)
- Access cluster and node log files via HTTPS
This is an optional module. The instructor should decide whether any of the tasks here are
useful, depending on the expertise of the students in the class.

Task 1: Complete a General Health Check


Use the lab information that is provided to you by your instructor (credentials and the IP addresses) to
complete the following steps.
Step Action
1-1. Log in to the following VPN: advtraining.netapp.com.

1-2. Open the Remote Desktop Connection (RDC) application and connect to your access host.

1-3. Launch PuTTY from the desktop of the access host.

1-4. Connect to the cluster1 management interface.

1-5. Run the following commands to do a complete health check of the cluster (see the note on the
privilege level after this list):
- cluster1::*> network interface show
- cluster1::*> cluster show
- cluster1::*> cluster ring show
- cluster1::*> storage failover show
- cluster1::*> event log show -severity WARNING
- cluster1::*> event log show -severity EMERGENCY
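NOTE: The cluster1::*> prompt in these examples indicates the advanced or diagnostic privilege
level. If your session shows the standard cluster1::> prompt, raise the privilege level first, as
later exercises in this guide do:
cluster1::> set diag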



Task 2: Complete an iSCSI General Health Check
Step Action
2-1. Run the following commands:
- cluster1::*> vserver show -vserver sansvm1
- cluster1::*> vserver show -vserver sansvm2
- cluster1::*> iscsi show -instance
- cluster1::*> igroup show -instance
- cluster1::*> lun show -instance
- cluster1::*> lun mapping show
- cluster1::*> network interface show -vserver sansvm1 -instance
- cluster1::*> network interface show -vserver sansvm2 -instance
- cluster1::*> volume show -vserver sansvm1 -fields junction-path, junction-parent, user, group, policy, unix-permissions, security-style
- cluster1::*> volume show -vserver sansvm2 -fields junction-path, junction-parent, user, group, policy, unix-permissions, security-style



Task 3: Complete a CIFS Health Check
Step Action
3-1. Run the following commands:
- cluster1::*> vserver show -vserver nassvm1
- cluster1::*> vserver show -vserver nassvm2
- cluster1::*> cifs options show
- cluster1::*> cifs options show -fields is-exportpolicy-enabled
- cluster1::*> cifs security show
- cluster1::*> cifs share show -fields vserver, share-name, cifs-server, path, share-properties, volume
- cluster1::*> volume show -fields junction-path, junction-parent, user, group, policy, unix-permissions, security-style
- cluster1::*> vserver services dns show
- cluster1::*> network interface show -vserver nassvm1
- cluster1::*> network interface show -vserver nassvm2
- cluster1::*> network interface show -vserver nassvm1 -instance
- cluster1::*> network interface show -vserver nassvm2 -instance
- cluster1::*> cifs domain discovered-servers show -vserver nassvm1
- cluster1::*> cifs domain discovered-servers show -vserver nassvm2
- cluster1::*> unix-user show -vserver nassvm1
- cluster1::*> unix-user show -vserver nassvm2
- cluster1::*> unix-group show -vserver nassvm1
- cluster1::*> unix-group show -vserver nassvm2
- cluster1::*> vserver name-mapping show -vserver nassvm1
- cluster1::*> vserver name-mapping show -vserver nassvm2



Task 4: Complete an NFS Health Check
Step Action
4-1. Run the following commands:
- cluster1::*> vserver show -vserver nassvm1
- cluster1::*> vserver show -vserver nassvm2
- cluster1::*> nfs status -vserver nassvm1
- cluster1::*> nfs status -vserver nassvm2
- cluster1::*> nfs show
- cluster1::*> nfs show -instance
- cluster1::*> network interface show -vserver nassvm1
- cluster1::*> network interface show -vserver nassvm2
- cluster1::*> network ping -node node3 -destination 192.168.6.20
- cluster1::*> network ping -node node4 -destination 192.168.6.20
- cluster1::*> volume show -fields junction-path, junction-parent, user, group, policy, unix-permissions, security-style -vserver nassvm1
- cluster1::*> volume show -fields junction-path, junction-parent, user, group, policy, unix-permissions, security-style -vserver nassvm2
- cluster1::*> export-policy rule show -policyname default
- cluster1::*> export-policy rule show -policyname default -instance
- cluster1::*> export-policy rule show -policyname policy1
- cluster1::*> export-policy rule show -policyname policy1 -instance

Task 5: Access the Cluster Using OnCommand System Manager


Step Action
5-1. Access System Manager using the cluster management LIF: https://<cluster-mgmt-ip>/

5-2. Log in using the cluster credentials.

5-3. Verify that System Manager is launched.

5-4. Review System Manager.
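If you do not have the cluster management IP address at hand, one way to find it from the
clustershell (a quick check, assuming the default lab setup) is:
cluster1::> network interface show -role cluster-mgmt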

Task 6: View Log Files Using HTTPS


Step Action
6-1. Use one of the following knowledge base articles to complete this exercise:
- https://kb.netapp.com/support/index?page=content&id=1013814
- https://kb.netapp.com/support/s/article/ka31A0000000uYnQAI/how-to-enable-remote-access-to-a-nodes-root-volume-in-a-cluster?language=en_US



6-2. NOTE: In ONTAP 8.3 software, you do not need to enable the web services and HTTP
(for earlier releases, see the sketch below).
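For releases earlier than ONTAP 8.3, the web services typically must be enabled first. A minimal
sketch, assuming the default spi service name and the admin role (follow the KB articles in Step
6-1 for the exact procedure on your release):
cluster1::> vserver services web show
cluster1::> vserver services web modify -vserver cluster1 -name spi -enabled true
cluster1::> vserver services web access create -vserver cluster1 -name spi -role admin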

6-3. Connect to the cluster1 management interface using the administrator account from the access
host. You might want to enable logging and save all your session output.

6-4. Identify the names of the nodes in the cluster.


cluster1::> node show -fields node
node
-----------
node1
node2
node3
node4
4 entries were displayed.

6-5. Access the URLs to view the log directory on each node. You must log in using the cluster
administration credentials.
https://<cluster-mgmt-ip>/spi/<node_name>/etc/log/
For example:
https://<cluster-mgmt-ip>/spi/node1/etc/log/
https://<cluster-mgmt-ip>/spi/node2/etc/log/

6-6. Access the URLs to view the directory where the core files are saved on each node.
https://<cluster-mgmt-ip>/spi/<node_name>/etc/crash/
For example,
https://<cluster-mgmt-ip>/spi/node1/etc/crash/
https://<cluster-mgmt-ip>/spi/node2/etc/crash/
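You can also fetch these directory listings from a shell instead of a browser. A minimal sketch
using curl from the Linux client (the -k flag skips certificate validation, which is acceptable
only in this closed lab; curl prompts for the admin password):
[root@catsp-cent ~]# curl -k -u admin https://<cluster-mgmt-ip>/spi/node1/etc/log/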

End of Exercise



Module 1: Troubleshooting the Management Component
In this exercise, you create a cluster backup and solve some issues that prevent mgwd from
performing its function.

Objectives
This exercise focuses on enabling you to do the following:
- Recover a replicated database (RDB) configuration
- Resolve RDB replication problems
- Perform cluster and node backups
- Resolve an issue with /mroot

Task 1: Backup and Recovery of an RDB Configuration


Step Action
1-1. List two methods (or types) of recovery for a system configuration.
1. Node-level recovery restores node-specific information, such as the CDB, bootargs, and varfs.tgz.
2. Cluster-level recovery restores cluster-specific information, such as the RDB, by rejoining,
synchronizing, or re-creating the cluster from a backup.

1-2. List two frequent reasons that could cause a cluster configuration backup to fail.
Common causes include:
- Lack of space in /mroot
- Missing or misnamed files in /mroot for the RDB and in /var for the CDB
- Failures in the job manager

1-3. A customer sees the following error message on the console:

cluster_backup_job::check_for_node_backups - Node: %s Backup Errored.

Explain what this error message tells you.
It shows that the backup did not work; it does not explain why. Although the customer's story
might point to the cause, it is best to continue to investigate to ensure that the customer
correctly diagnosed the problem.

1-4. Identify a knowledge base article to resolve the error message in Item 3.
Troubleshooting Workflow: Cluster Config Backup/Restore: Backup failure
KB Article Number: 000014430 (Former KB ID: 2017186)
https://kb.netapp.com/support/s/article/ka11A00000015ga/troubleshooting-workflow-cluster-config-backup-restore-backup-failure



1-5. List the command to display the default backup schedule.
cluster1::*> system configuration backup settings show -instance
Backup Destination URL: -
Username for Destination: -
Schedule 1: 8hour
Number of Backups to Keep for Schedule 1: 2
Schedule 2: daily
Number of Backups to Keep for Schedule 2: 2
Schedule 3: weekly
Number of Backups to Keep for Schedule 3: 2

List the command that verifies that the scheduled backups were created and distributed within
the cluster.
cluster1::*> system configuration backup show

List the command that you can use to recover a node's configuration.
cluster1::*> system configuration recovery
cluster  node

cluster1::*> system configuration recovery node restore -backup <cluster backup name or node backup name> -nodename-in-backup <node name>

After the node restore, sync the node so that it gets the RDB configuration data:
cluster1::*> system configuration recovery cluster
modify  recreate  rejoin  show  sync

cluster1::*> system configuration recovery cluster sync -node <node name>

Task 2: Create a Cluster Backup


Step Action
2-1. List the current backups on node1 and the command that you run.
cluster1::*> system configuration backup show -node node1
Node Backup Name Time Size
--------- ----------------------------------------- ------------------ -----
node1
cluster1.8hour.2015-10-20.17_56_36.7z 10/20 17:56:36 18.61MB
node1
cluster1.8hour.2015-11-03.02_15_00.7z 11/03 02:15:00 55.44MB
node1
cluster1.daily.2015-10-16.00_10_00.7z 10/16 00:10:00 62.45MB
node1
cluster1.daily.2015-10-20.17_56_35.7z 10/20 17:56:35 18.61MB
node1
cluster1.daily.2015-11-03.00_10_04.7z 11/03 00:10:04 55.24MB
node1
cluster1.weekly.2015-08-02.00_15_04.7z 08/02 00:15:04 18.59MB

6 entries were displayed.



2-2. Specify the location of the backup files.
The files are in /mroot/etc/backups/config

2-3. On node1, start a job to create a system configuration backup of the entire cluster and note the
job ID number.
cluster1::*> system configuration backup create -node node1 -backup-type cluster -backup-name node1.cluster

[Job 495] Job is queued: Cluster Backup OnDemand Job.

2-4. Before the job finishes, review the job that you have created:
cluster1::*> job show
cluster1::*> job show -id <ID#>
(Use the job ID from the backup create command.)
cluster1::*> job show -id <ID#> -fields uuid
cluster1::*> job show -uuid <UUID from the previous command>
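If the customer needs a copy of a backup off the cluster, the backup package can also be uploaded
to an external server. A minimal sketch, assuming an FTP server is reachable from the node (the
destination URL here is hypothetical):
cluster1::*> system configuration backup upload -node node1 -backup node1.cluster.7z -destination ftp://192.168.6.10/backups/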

Task 3: Resolve an Unhealthy Node


Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Run script Mod1_Task3_Resolve_unhealthy_node.pl to break the lab.

Run script M01_Task3_Resolve_unhealthy_node_fix.pl to fix the lab.

Manual Break:

cluster1::> cluster modify -eligibility false -node node2

Manual Fix:

cluster1::> system configuration recovery cluster sync -node node2

Scenario: A customer reports an unhealthy node.


Step Action
3-1. Answer the following questions:
- What is the current state of the cluster?
- Is the cluster healthy?
- Which command did you use to check the health of the cluster?
Node 2 should show up as unhealthy.

cluster1::> cluster show


Node Health Eligibility
--------------------- ------- ------------
node1 true true
node2 false false
node3 true true
node4 true true
4 entries were displayed.



3-2. List the status of the RDB.
You should see that node2's RDB units are offline and have stopped receiving updates: their
epoch is 0, and the mgmt unit's transaction count lags behind the other nodes.

cluster1::*> cluster ring show

Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
--------- -------- -------- -------- -------- --------- ---------
node1 mgmt 48 48 15379 node1 master
node1 vldb 38 38 1 node1 master
node1 vifmgr 1567 1567 51 node1 master
node1 bcomd 2027 2027 11 node1 master
node1 crs 58 58 1 node1 master
node2 mgmt 0 48 15375 - offline
node2 vldb 0 38 1 - offline
node2 vifmgr 0 1567 51 - offline
node2 bcomd 0 2027 11 - offline
node2 crs 58 58 1 node1 secondary
node3 mgmt 48 48 15379 node1 secondary
node3 vldb 38 38 1 node1 secondary
node3 vifmgr 1567 1567 51 node1 secondary
node3 bcomd 2027 2027 11 node1 secondary
node3 crs 58 58 1 node1 secondary
node4 mgmt 48 48 15379 node1 secondary
node4 vldb 38 38 1 node1 secondary
node4 vifmgr 1567 1567 51 node1 secondary
node4 bcomd 2027 2027 11 node1 secondary
node4 crs 58 58 1 node1 secondary

3-3. Answer this question:


Does a reboot of the unhealthy node fix the issue?
cluster1::*> node reboot -node node2 -skip-lif-migration-before-reboot true
cluster1::*> cluster show -node node2
Nothing changes. The node is ineligible to receive updates.

3-4. Recover the node.


cluster1::> system configuration recovery cluster sync -node node2

Task 4: Resolve an Issue with /mroot


Step Action
4-1. Log in to the node management interface for node2, and then enter the systemshell.
cluster1::> set diag
cluster1::*> systemshell -node local
  (system node systemshell)

Data ONTAP/amd64 (node2) (ttyp1)
login: diag
Password: P@ssw0rd



4-2. Identify the process ID of the management gateway process.
node2% ps -A | grep mgwd

 PID TTY  TIME    CMD
 915 ??  Ss  5:38.77 mgwd -z
8907  2  DL+ 0:00.00 grep mgwd
NOTE: In this example, the process ID of the running instance of the management gateway on
this node is 915.

4-3. Stop the management gateway process.


node2% sudo kill <process_ID_of_the_management_gateway>

4-4. Explain what happened and the reason.
Your session to node2 drops: mgwd handles the management sessions on the node, so killing it
terminates your login.

4-5. Log in to the node management interface for node2 again, and answer these questions:
- Are you able to log in?
- Why or why not?
We are not able to log in because mgwd was not restarted; spmctl is no longer monitoring mgwd.
You can verify that from the systemshell as follows:

node2% sudo spmctl -l | grep mgwd
node2%
node2% ps -A | grep mgwd
9626  1  S+  0:00.00 grep mgwd

4-6. Log back in to the systemshell of node2.


cluster1::> set diag
cluster1::*> systemshell -node local

4-7. From node2's systemshell, unmount /mroot using the following commands:
node2% cd /etc
node2% sudo ./netapp_mroot_unmount
node2% exit
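Before you type exit, you can confirm that /mroot is really unmounted; the same df check that the
fix in Step 4-12 uses should now return nothing:
node2% df | grep mroot
node2%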
4-8. Log in to the cluster management session, and check the cluster health.
cluster1::*> cluster show
Node Health Eligibility Epsilon
-------------------- ------- ------------ ------------
node1 true true true
node2 true true false
node3 true true false
node4 true true false

4-9. Attempt to modify the volume nassvm1_nfs:

cluster1::*> vol modify -vserver nassvm1 -volume nassvm1_nfs -size 2G
Volume modify successful on volume nassvm1_nfs of Vserver nassvm1.



4-10. Attempt to modify the volume nassvm2_nfs:
cluster1::*> vol modify -vserver nassvm2 -volume nassvm2_nfs -size 2G
Volume modify successful on volume nassvm2_nfs of Vserver nassvm2.

Info: Node node2 that hosts aggregate aggrnas2 is offline

4-11. Check the cluster health again, and answer these questions:
- Do you see a difference?
- If so, why?
- What is nonoperational?
cluster1::*> cluster show
Node Health Eligibility Epsilon
-------- ----------- ------- ------------
node1 true true true
node2 false true false
node3 true true false
node4 true true false

4-12. Fix this problem, and answer this question:

How did you verify that /mroot is mounted?
Reboot the node on which mgwd has a problem (node2).

NOTE: Restarting mgwd on this node does not remount /mroot, as it did in previous versions of ONTAP.
The following was the solution in the previous versions:

Remount /mroot by restarting the management gateway from the systemshell of node2. When mgwd
restarts, it mounts /mroot if it is not already mounted.

node2% ps -A | grep mgwd
12290 ??  Ss  0:26.71 /sbin/mgwd -z
12984  2  S+  0:00.00 grep mgwd

node2% sudo kill 12290
(Use the PID of mgwd from your system.)

node2% ps -A | grep mgwd
(Verifies that mgwd has been restarted.)

Verify that /mroot is mounted:
node2% df | grep mroot
localhost:0x80000000,0xc3ffb35a   1881376 1601136 280240 85% /mroot
/mroot/etc/cluster_config/vserver 1881376 1601136 280240 85% /mroot/vserver_fs

End of Exercise



Module 2: Troubleshooting Scale-Out Networking
In this exercise, you resolve customer issues that relate to networking problems.

Objectives
This exercise focuses on enabling you to do the following:
- Identify the network component and the data component interaction
- Outline the networking implications of upgrading to ONTAP 8.3 software
- Use network triage tools
- Describe the implications of vifmgr going Out of Quorum (OOQ)

Task 1: Given a Scenario, Explain the Implications of Vifmgr Going Out of Quorum (Paper Lab)
Step Action
1-1. A customer called in and provided the following output:
cluster::*> cluster ring show
Node       UnitName Epoch    DB Epoch DB Trnxs Master     Online
---------- -------- -------- -------- -------- ---------- ---------
cluster-01 mgmt     4        4        136154   cluster-02 secondary
cluster-01 vldb     4        4        76       cluster-02 secondary

Error: rdb_ring_info: RDB ring state query of 127.0.0.1 for vifmgr failed on RPC connect: clnttcp_create: RPC: Remote system error - Connection refused

cluster-01 vifmgr   -        -        -        -          -
cluster-01 bcomd    4        4        14       cluster-02 secondary
cluster-01 crs      1        1        79       cluster-02 secondary
cluster-02 mgmt     4        4        136154   cluster-02 master
cluster-02 vldb     4        4        76       cluster-02 master
cluster-02 vifmgr   4        4        13220    cluster-02 master
cluster-02 bcomd    4        4        14       cluster-02 master
cluster-02 crs      1        1        79       cluster-02 master
cluster-03 mgmt     4        4        136154   cluster-02 secondary
cluster-03 vldb     4        4        76       cluster-02 secondary
cluster-03 vifmgr   4        4        13220    cluster-02 secondary
cluster-03 bcomd    4        4        14       cluster-02 secondary
cluster-03 crs      1        1        79       cluster-02 secondary
cluster-04 mgmt     4        4        136154   cluster-02 secondary
cluster-04 vldb     4        4        76       cluster-02 secondary
cluster-04 vifmgr   4        4        13220    cluster-02 secondary
cluster-04 bcomd    4        4        14       cluster-02 secondary
cluster-04 crs      1        1        79       cluster-02 secondary
20 entries were displayed.

cluster::*> net int show -role data
  (network interface show)
        Logical    Status     Network          Current    Current Is
Vserver Interface  Admin/Oper Address/Mask     Node       Port    Home
------- ---------- ---------- ---------------- ---------- ------- ----
vs      lif1       up/up      10.61.83.215/24  cluster-03 e0a     false
vs1     nfs_lif1   up/up      10.61.83.200/24  cluster-03 e0a     false
2 entries were displayed.

Vifmgr is down on node cluster-01.

1-2. The customer says that these LIFs are normally home on the node cluster-01. Explain
which vifmgr behavior might explain why these LIFs are now on node cluster-03.
When vifmgr stops on a node, the LIFs that the node hosted fail over to ports on other nodes,
which is why both LIFs now show a current node of cluster-03 with Is Home false.
1-3. A customer calls in and provides the following output:
cluster::*> cluster ring show
Node       UnitName Epoch    DB Epoch DB Trnxs Master     Online
---------- -------- -------- -------- -------- ---------- ---------

Error: rdb_ring_info: RDB ring state query of 127.0.0.1 for vifmgr failed on RPC connect: clnttcp_create: RPC: Remote system error - Connection refused

cluster-01 mgmt     4        4        136197   cluster-02 secondary
cluster-01 vldb     4        4        76       cluster-02 secondary
cluster-01 vifmgr   -        -        -        -          -
cluster-01 bcomd    4        4        15       cluster-02 secondary
cluster-01 crs      1        1        79       cluster-02 secondary
cluster-02 mgmt     4        4        136197   cluster-02 master
cluster-02 vldb     4        4        76       cluster-02 master
cluster-02 vifmgr   0        4        13256    -          offline
cluster-02 bcomd    4        4        15       cluster-02 master
cluster-02 crs      1        1        79       cluster-02 master
cluster-03 mgmt     4        4        136197   cluster-02 secondary
cluster-03 vldb     4        4        76       cluster-02 secondary
cluster-03 vifmgr   -        -        -        -          -
cluster-03 bcomd    4        4        15       cluster-02 secondary
cluster-03 crs      1        1        79       cluster-02 secondary
cluster-04 mgmt     4        4        136197   cluster-02 secondary
cluster-04 vldb     4        4        76       cluster-02 secondary
cluster-04 vifmgr   -        -        -        -          -
cluster-04 bcomd    4        4        15       cluster-02 secondary
cluster-04 crs      1        1        79       cluster-02 secondary
20 entries were displayed.

cluster1::*> net int show -role data
         Logical    Status     Network          Current    Current Is
Vserver  Interface  Admin/Oper Address/Mask     Node       Port    Home
-------- ---------- ---------- ---------------- ---------- ------- ----
clintons
         lif1       up/-       10.61.83.215/24  cluster-01 e0a     true
         lif2       up/-       192.168.3.2/24   cluster-02 e0a     true
         lif3       up/-       192.168.3.3/24   cluster-03 e0a     true
         lif4       up/-       192.168.3.4/24   cluster-04 e0a     true
primary
         nfs_lif1   up/-       10.61.83.200/24  cluster-01 e0a     true
1-4. The customer says that the entire cluster is not serving data. The customer wants an
explanation as to why the LIFs are home but not serving data.

1-5. Identify the vifmgr behavior that explains this situation.
The VIF manager (vifmgr) is out of quorum clusterwide: although the LIFs are configured on their
home ports, their operational state cannot be brought up, so no data is served.
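When triaging cases like these, you can focus the ring query on just the vifmgr unit:
cluster1::*> cluster ring show -unitname vifmgr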

Task 2: Given a Networking Configuration, Associate the FreeBSD Ports and the Ports That Are Defined in the RDB (ifconfig -a Compared to net interface show)
Step Action
2-1. Display the current network interface configuration for the entire cluster using the following
command:
cluster1::> network interface show

2-2. View the current networking interface configuration for node4 by entering the following
command:
cluster1::> net int show -curr-node node4
Logical Status Network Current Current Is
Vserver Interface Admin/Oper Address/Mask Node Port Home
----------- ---------- ---------- ------------------ ------------- ------- ----
Cluster
cluster1-04_clus1
up/up 169.254.33.29/16 node4 e0a true
cluster1-04_clus2
up/up 169.254.33.30/16 node4 e0b true
cluster1
cluster1-04_mgmt1
up/up 192.168.6.34/24 node4 e0c true
nassvm1
nassvm1_data4
up/up 192.168.6.118/24 node4 e0d true
nassvm2
nassvm2_data4
up/up 192.168.6.128/24 node4 e0d true
sansvm1
sansvm1_data2
up/up 192.168.6.132/24 node4 e0d true
sansvm2
sansvm2_data2
up/up 192.168.6.136/24 node4 e0d true
7 entries were displayed.

2-3. Log in to the systemshell of node4 by running the following command:
cluster1::*> systemshell -node node4
  (system node systemshell)
diag@169.254.33.29's password: (use P@ssw0rd)

2-4. From the systemshell prompt of node4, view the status of the network ports on the node
by running the following command:
node4% ifconfig -a

node4% ifconfig -a
e0c: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500

options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTA
TE>
ether 00:50:56:01:21:1f
inet 192.168.6.34 netmask 0xffffff00 broadcast 192.168.6.255 NODEMGMTLIF Vserver ID: -1
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
e0d: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500

options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTA
TE>
ether 00:50:56:01:21:20
inet 192.168.6.118 netmask 0xffffff00 broadcast 192.168.6.255 DATALIF Vserver ID: 5
inet 192.168.6.128 netmask 0xffffff00 broadcast 192.168.6.255 DATALIF Vserver ID: 6
inet 192.168.6.132 netmask 0xffffff00 broadcast 192.168.6.255 DATALIF Vserver ID: 7
inet 192.168.6.136 netmask 0xffffff00 broadcast 192.168.6.255 DATALIF Vserver ID: 8
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
e0e: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500

options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTA
TE>
ether 00:50:56:01:21:21
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
e0f: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500

options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTA
TE>
ether 00:50:56:01:21:22
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
lo0: flags=80c9<UP,LOOPBACK,RUNNING,NOARP,MULTICAST> metric 0 mtu 8232
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet 127.0.10.1 netmask 0xff000000 LOOPBACKLIF Vserver ID: -1
inet 127.0.20.1 netmask 0xff000000 LOOPBACKLIF Vserver ID: -1
inet 127.0.0.1 netmask 0xff000000 LOOPBACKLIF Vserver ID: -1

crtr0: flags=1<UP> metric 0 mtu 65536

2-5. Correlate the output from Step 2-2 and Step 2-4 to determine whether the interface configuration, as
reported by the management component, agrees with the interface configuration of the FreeBSD
networking layer.

2-6. Exit the systemshell on node4 by typing exit.

2-7. Administratively bring down an interface hosted on a port on node4.

cluster1::> net int modify -vserver nassvm2 -lif nassvm2_data4 -home-node node4 -status-admin down

2-8. Repeat Step 2-2 through Step 2-5 to observe that the action taken in Step 2-7 was correctly passed
on to the FreeBSD networking layer of node4.
cluster1::> net int show -curr-node node4
(network interface show)
Logical Status Network Current Current Is
Vserver Interface Admin/Oper Address/Mask Node Port Home
----------- ---------- ---------- ------------------ ------------- ------- ----
Cluster
cluster1-04_clus1
up/up 169.254.33.29/16 node4 e0a true
cluster1-04_clus2
up/up 169.254.33.30/16 node4 e0b true
cluster1
cluster1-04_mgmt1
up/up 192.168.6.34/24 node4 e0c true
nassvm1
nassvm1_data4
up/up 192.168.6.118/24 node4 e0d true
nassvm2
nassvm2_data4
down/down 192.168.6.128/24 node4 e0d true
sansvm1
sansvm1_data2
up/up 192.168.6.132/24 node4 e0d true
sansvm2
sansvm2_data2
up/up 192.168.6.136/24 node4 e0d true
7 entries were displayed.

node4% ifconfig -a
e0c: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500

options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LIN
KSTATE>
ether 00:50:56:01:21:1f
inet 192.168.6.34 netmask 0xffffff00 broadcast 192.168.6.255 NODEMGMTLIF Vserver
ID: -1
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
e0d: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500

options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LIN
KSTATE>
ether 00:50:56:01:21:20
inet 192.168.6.118 netmask 0xffffff00 broadcast 192.168.6.255 DATALIF Vserver ID: 5
inet 192.168.6.132 netmask 0xffffff00 broadcast 192.168.6.255 DATALIF Vserver ID: 7
inet 192.168.6.136 netmask 0xffffff00 broadcast 192.168.6.255 DATALIF Vserver ID: 8
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
e0e: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500

options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LIN
KSTATE>
ether 00:50:56:01:21:21
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
e0f: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500

options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LIN
KSTATE>
ether 00:50:56:01:21:22
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
lo0: flags=80c9<UP,LOOPBACK,RUNNING,NOARP,MULTICAST> metric 0 mtu 8232
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet 127.0.10.1 netmask 0xff000000 LOOPBACKLIF Vserver ID: -1
inet 127.0.20.1 netmask 0xff000000 LOOPBACKLIF Vserver ID: -1
inet 127.0.0.1 netmask 0xff000000 LOOPBACKLIF Vserver ID: -1
crtr0: flags=1<UP> metric 0 mtu 65536
The entry "inet 192.168.6.128 netmask 0xffffff00 broadcast 192.168.6.255 DATALIF Vserver ID: 6"
is missing from e0d.
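If you only need a quick look at the FreeBSD view of a port, you can also run the check without an
interactive systemshell session, using the same -command form that the break scripts in the next
task use:
cluster1::*> systemshell -node node4 -command "ifconfig e0d"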

Task 3: Identify and Resolve Failures That Occur When You Create New LIFs
Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Run script Mod2_Task5_stop_vifmgr.pl to break the lab.

Run script Mod2_Task5_stop_vifmgr_fix.pl to fix the lab.

Manual Break:
cluster1::*> systemshell -node node2 -command "sudo spmctl -s -h vifmgr "

Manual Fix:
cluster1::*> systemshell -node node2 -command "sudo spmctl -e -h vifmgr "

Scenario: A customer has called to report that the command to create LIFs fails.



Step Action
3-1. Log in to the cluster management interface, try to create a data LIF, and then answer the
questions:
cluster1::*> net int create -vserver nassvm1 -lif task5 -role data -data-protocol nfs,cifs,fcache -home-node node2 -home-port e0d -address 192.168.81.150 -netmask 255.255.255.0
- What error message do you see?
cluster1::*> net int create -vserver nassvm1 -lif task6 -role data -data-protocol nfs,cifs,fcache -home-node node2 -home-port e0d -address 192.168.81.150 -netmask 255.255.255.0

Error: command failed: e0d is an invalid port on node node2

- Is the error message valid?
- What command would you use to check?

cluster1::*> net port show -node node2
  (network port show)
There are no entries matching your query.
Warning: Unable to list entries for vifmgr on node "node2": RPC: Remote system error - Connection refused.
3-2. Check the cluster connectivity from node2 to all the nodes in the entire cluster, and then answer
the following questions:
- What command do you use?
- What do you see?
cluster1::*> cluster ping-cluster -node node2 -use-sitelist true
Host is node2
Getting addresses from sitelist...
Local = 169.254.32.255 169.254.33.0
Remote = 169.254.32.249 169.254.32.250 169.254.33.23 169.254.33.24 169.254.33.29
169.254.33.30
Cluster Vserver Id = 4294967293
Ping status:
............
Basic connectivity succeeds on 12 path(s)
Basic connectivity fails on 0 path(s)
................................................
Detected 1500 byte MTU on 12 path(s):
Local 169.254.32.255 to Remote 169.254.32.249
Local 169.254.32.255 to Remote 169.254.32.250
Local 169.254.32.255 to Remote 169.254.33.23
Local 169.254.32.255 to Remote 169.254.33.24
Local 169.254.32.255 to Remote 169.254.33.29
Local 169.254.32.255 to Remote 169.254.33.30
Local 169.254.33.0 to Remote 169.254.32.249
Local 169.254.33.0 to Remote 169.254.32.250
Local 169.254.33.0 to Remote 169.254.33.23
Local 169.254.33.0 to Remote 169.254.33.24
Local 169.254.33.0 to Remote 169.254.33.29
Local 169.254.33.0 to Remote 169.254.33.30
Larger than PMTU communication succeeds on 12 path(s)
RPC status:
6 paths up, 0 paths down (tcp check)
6 paths up, 0 paths down (udp check)

3-3. Check the interfaces and the ports on the problem node, node2, and list the command that you
use.
The students should use any of the commands below and identify vifmgr as the culprit.
cluster1::*> net int show
cluster1::*> net port show
cluster1::*> cluster ring show

1) On node2, the net int show command errors out. It succeeds on all the other nodes.
cluster1::> net int show
  (network interface show)
Error: show failed: RPC: Remote system error - Connection refused

2) cluster ring show should show the error:
cluster1::> cluster ring show
Error: rdb_ring_info: RDB ring state query of 127.0.0.1 for vifmgr failed on RPC connect: clnttcp_create: RPC: Remote system error - Connection refused

3) net port show shows a consistent warning on all four nodes:
cluster1::> net port show
Warning: Unable to list entries for vifmgr on node "node2": RPC: Remote system error - Connection refused.

3-4. Attempt the same command from another node, and then answer the following questions:
- What do you see?
- Is there any warning or error?
- What might be wrong?

3-5. Verify your hypothesis in the systemshell using rdb_dump and using ps to check the running
processes, and check the logs from the clustershell.

3-6. You might need to include vifmgr and mgwd by using the following command:
cluster1::*> debug log files modify -incl-files vifmgr, mgwd, messages
cluster1::*> debug log show -node node2 -timestamp "Mon Oct 10*"

Time                     Node  Log
------------------------ ----- -----------------------
Mon Sep 24 00:24:32 2012 node2 [kern_vifmgr:info:1972] A [quorum/inquorumstate.cc 414]: Leaving Quorum at Mon Sep 24 00:24:32 2012; membership expired at Mon Sep 24 00:24:32 2012 - not yet avail.
Mon Sep 24 00:24:32 2012 node2 [kern_vifmgr:info:1972] A [quorum/quorumimpl.cc 1609]: local_offlineUpcall QM Upcall status: Secondary ==> Offline Epoch: 5 => 5 not-stopping
Mon Sep 24 00:24:32 2012 node2 [kern_vifmgr:info:1972] A [TM.cc 1229]: TM 1002: Report UNIT_IS_OFFLINE (epoch 0, master 0).
Mon Sep 24 00:24:32 2012 node2 [kern_vifmgr:info:1972] A [cluster_events.cc 70]: Cluster event: node-event, epoch 0, site 1002 [local node offline].
Mon Sep 24 00:24:32 2012 node2 [user_vifmgr:notice] FAILOVER rdb: Local unit VifMgr offline
Mon Sep 24 00:24:32 2012 node2 [kern_vifmgr:info:1972] Notice: online_status_callback: RDB unit is offline

3-7. The logs might be verbose, so you might need to use debug log show and parse a timestamp.

3-8. The knowledge base article https://kb.netapp.com/support/index?page=content&id=1015631
shows an example of how to parse the timestamp.

3-9. Correct the problem using information that you learned in this module.
The students should log in to the systemshell on node2 and run the ps command to see whether
vifmgr is running:

node2% ps -A | grep vifmgr
10337  0  S+  0:00.00 grep vifmgr

Checking spmctl, we see that vifmgr isn't managed:

node2% sudo spmctl | grep vifmgr

Re-enable vifmgr under spmctl monitoring to fix the issue:

node2% sudo spmctl -e -h vifmgr

3-10. Log in to the cluster management interface, and again try to create the data LIF.
There should be no error messages, and the LIF should be created successfully.

cluster1::*> net int create -vserver nassvm1 -lif task6 -role data -data-protocol nfs,cifs,fcache -home-node node2 -home-port e0d -address 192.168.81.150 -netmask 255.255.255.0
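To leave the lab clean for later modules, you might want to remove the test LIF afterward
(assuming it was created with the name used above; bring it down first):
cluster1::*> net int modify -vserver nassvm1 -lif task6 -status-admin down
cluster1::*> net int delete -vserver nassvm1 -lif task6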

End of Exercise



Module 3: Troubleshooting Network Component and Security Services
In this exercise, you practice using the secd tool to exercise secd operations from the CLI.

Objectives
This exercise focuses on enabling you to troubleshoot using the diag secd commands.

Task 1: Given a Scenario, Diagram and Identify Network Component and Data Component Interaction (Includes Finding Which Node Has the Problem)
When you diagram and document I/O through an ONTAP system, the questions that you generally
need to answer first are the following (commands that help answer them are sketched after the list):
- Which storage virtual machine (SVM) is being used?
- Which client is being used?
- Which LIF is being used by the client to connect to the storage that the SVM provides, and on which node is it?
- Which volume is the client attempting to access, and on which node is it?
- Is multiprotocol involved? If the data protocol is NFS protocol, do the volumes being accessed have NTFS security style? If the data protocol is CIFS protocol, do the volumes being accessed have UNIX or NTFS security style?
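A few commands typically answer these questions. The following is a sketch using the SVM, LIF, and
volume names from the first scenario below; substitute your own (the last command shows which node
owns the hosting aggregate):
cluster1::> network interface show -vserver vs3 -lif LIF_3 -fields curr-node, home-node
cluster1::> volume show -vserver vs3 -volume vol_cifs_homes -fields aggregate, security-style
cluster1::> storage aggregate show -aggregate <aggr_name>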
Step Action
1-1. A customer calls in with a problem on its 4-node cluster. The customer states that the SVM vs3
is not serving data. The customer indicates that it is connecting to LIF_3, which is on node-3. The
customer is trying to access the volume vol_cifs_homes using CIFS. The customer thinks that the
volume is on node-3, which is NTFS security style.

1-2. Answer the following questions:
- Is this scenario an example of local or remote I/O?
Local I/O
- Which node is doing the actual protocol work?
Node-3
- Which node is doing the NetApp WAFL and storage work?
Node-3
- Is multiprotocol processing involved?
No

1-3. A customer calls in with a problem on its 8-node cluster. The customer states that the SVM vs1
is not serving data. The customer indicates that it is connecting to LIF_5, which is on node-4. The
customer is trying to access the volume vol_nfs_homes using NFS. The customer thinks that the
volume is on an aggregate on node-3, which is UNIX security style.

1-4. Answer the following questions:
- Is this scenario an example of local or remote I/O?
Remote I/O
- Which node is doing the actual protocol work?
Node-4
- Which node is doing the WAFL and storage work?
Node-3
- Is multiprotocol processing involved?
No

Task 2: Use the network connections active Command to View Active Connections
Step Action
2-1. On your lab gear, connect to the cluster management interface, and log in as administrator.

2-2. Run the following command, and observe the output:
cluster1::> network connections active show

2-3. Check for specific protocol connections by running the following commands:
cluster1::> network connections active show -service nfs
cluster1::> network connections active show -service cifs-srv

2-4. NOTE: In your lab environment, there might be no data protocol connections. In this case, the
table is empty.

2-5. To list all possible values for the -service argument, run the following command:
cluster1::> network connections active show -service ?

2-6. NOTE: The -service iscsi argument always returns empty results because the iSCSI
service is not tracked here.
2-7. Run the following command, and observe the output:
cluster1::> network connections active show -fields cid, local-address, remote-ip, service

node  cid        vserver local-address  remote-ip       service
----- ---------- ------- -------------- --------------- -------
node1 1656328788 Cluster 169.254.32.253 169.254.142.199 ctlopcp
node1 1656328789 Cluster 169.254.32.253 169.254.142.199 ctlopcp
node1 1656328790 Cluster 169.254.32.253 169.254.142.199 ctlopcp
node1 1656328791 Cluster 169.254.32.253 169.254.142.199 ctlopcp

2-8. Enter diag mode using the following command:


cluster1::> set diag

2-9. Select a connection ID (CID) from the output in Step 2-7.

2-10. Display the properties of the selected CID by running the following command:
cluster1::*> network connections active show -cid <CID #> -instance
cluster1::*> network connections active show -cid 1656328807 -instance
Node: node1
Connection ID: 1656328807
Vserver: Cluster
Logical Interface Name: node1_clus2
Local IP address: 169.254.150.160
Local Port: 5007
Remote IP Address: 169.254.31.178
Remote Host: 169.254.31.178
Remote Port: 7700
Protocol: TCP
Logical Interface ID: 1023
Protocol Service: ctlopcp
Least Recently Used: no
Connection Blocks Load Balance Migrate: false
Context Id: 3

2-11. Terminate this connection by entering the following command:

cluster1::> network connections active delete -node <node> -cid <CID #> -vserver <Vserver name>

2-12. Observe the status of this CID by running the following command:
cluster1::> network connections active show -cid <CID #> -instance

Task 3: Using diag secd


Step Action
3-1. Log in to the node management interface of node4.

3-2. Type the following command, and then answer the questions:
cluster1::> diag secd
- Why does it fail?
- What do you need to do to use the diag secd command for troubleshooting?
You need to be in diag privilege mode.

3-3. Identify the UNIX user that the Windows user student1 maps to, and use diag secd to
find this mapping.
cluster1::*> diag secd name-mapping show -node node4 -vserver nassvm1 -direction win-unix -name student1
'student1' maps to 'pcuser'

3-4. Explain how you query for a Windows security identifier (SID) of student1 using diag secd.
cluster1::*> diag secd authentication translate -node node4 -vserver nassvm1 -win-name student1
S-1-5-21-2002460515-4267185084-3612797530-1104

3-5. Explain how you can test a CIFS login for the student1 user in diag secd.
cluster1::*> diag secd authentication login-cifs -node node4 -vserver nassvm1 student1

Enter the password:

UNIX UID: pcuser <> Windows User: CATS\student1 (Windows Domain User)
GID: pcuser
Supplementary GIDs:
    pcuser
Windows Membership:
    CATS\Domain Users (Windows Domain group)
    BUILTIN\Users (Windows Alias)
User is also a member of Everyone, Authenticated Users, and Network Users
Privileges (0x2080):
    SeChangeNotifyPrivilege
Authentication Succeeded.

If this does not work, clear all caches and reset server discovery as shown in Step 3-6 and Step
3-7. Also set the NTP server as follows:
::> cluster time-service ntp server create -server 192.168.6.10

3-6. Answer the following questions:
- How can you clear caches using diag secd?
- How can you clear more than one at a time?
To clear a given cache, use:
cluster1::*> diag secd cache clear -node node4 -vserver nassvm1 -cache-name <cache>

To clear all the caches, use:
cluster1::*> diag secd restart -node local

You are attempting to restart a process in charge of security services. Do not restart this
process unless the system has generated a "secd.config.updateFail" event or you have
been instructed to restart this process by support personnel.
This command can take up to 2 minutes to complete.
Are you sure you want to proceed? {y|n}: y
Restart successful! Security services are operating correctly.

3-7. List the equivalents of the Data ONTAP 7G operating system's cifs resetdc and cifs testdc.
cluster1::*> diag secd server-discovery reset -node node4 -vserver nassvm1
Discovery Reset succeeded for Vserver:
3-8. Explain how you show and set the current logging level in secd.
cluster1::*> diag secd log show -node local
Log Options
----------------------------------
Log level: Debug
Function enter/exit logging: OFF

cluster1::diag secd log*> set -node local -level Debug -enter-exit on


Setting log level to "Debug"
Setting enter/exit to ON

cluster1::diag secd log*> diag secd log show -node local


Log Options
----------------------------------
Log level: Debug
Function enter/exit logging: ON

3-9. Explain how you enable tracing in secd to capture the logging level that is specified.
cluster1::*> diag secd trace set -node local
3-10. Explain how you check the secd configuration for comparison with what is in the RDB.
cluster1::*> diag secd configuration query -node node4 -source-name secd-cache-config
3-11. Explain how you can view and clear active CIFS connections in secd.
cluster1::*> diag secd connections show -node node4 -vserver nassvm2
No cached connections found matching the query provided.
Debrief: Ask the students why the node name is a required parameter for every command.


Secd runs on each node, and its configuration information is updated and managed by mgwd. Secd
requests can come in from any node in the cluster, but secd processes each request on a specific
node, and it has to talk to mgwd on that node to get information each time.

End of Exercise



Module 4: Troubleshooting NFS
In this exercise, you fix various issues during NFS access.

Objectives
This exercise focuses on enabling you to do the following:
- Resolve frequently seen mount issues
- Resolve access issues

Task 1: Mount Issues


Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Run Mod4_task1_disable_NFS.pl to break the lab.
Run Mod4_task1_disable_NFS_fix.pl to fix the lab.
Manual Break:
cluster1::> nfs server modify -vserver nassvm1 -udp disabled -tcp enabled -v3 disabled -v4.0 disabled

Manual Fix:
cluster1::> nfs server modify -vserver nassvm1 -udp enabled -tcp enabled -v3 enabled -v4.0 enabled

Scenario: A customer cannot mount an NFS share.


Step Action
1-1. Log in to the Linux client using the following credentials:
- Username: root
- Password: P@ssw0rd

1-2. Issue the following commands, and then answer the question.
[root@catsp-cent ~]# mkdir /nassvm1
[root@catsp-cent ~]# mount -o nfsvers=3 192.168.6.115:/nassvm1_nfs /nassvm1
Does the command succeed?
mount.nfs: requested NFS version or transport protocol is not supported

1-3. Identify the node to which the mount request is going.

1-4. From that node, capture a packet trace while repeating the previous mount command, and then
answer the following questions:
- Are you able to troubleshoot the issue using the packet trace?
- What is the issue?
(A sample packet trace is on the desktop of the access host. The Wireshark program is
pinned to the taskbar of the access host.)



cluster1::*> tcpdump start -node node1 -port e0d -address 192.168.6.115
  (network tcpdump start)

Info: Started tcpdump packet trace on interface "e0d"

Run the mount command again on the client:

[root@catsp-cent ~]# mount -o nfsvers=3 192.168.6.115:/nassvm1_nfs /nassvm1
mount.nfs: requested NFS version or transport protocol is not supported

cluster1::*> tcpdump stop -node node1 -port e0d
  (network tcpdump stop)

cluster1::*> tcpdump trace show
  (network tcpdump trace show)
Node            Trace File
--------------- --------------------
node1           e0d_20170602_195418.trc0

The trace files are located in /mroot/etc/log/packet_traces.

To download and read a trace using Wireshark, do the following:
Download Wireshark onto the jump host and install it, accepting all the default selections.

Download the trace file as follows:
In a browser, open https://192.168.6.30/spi/node1/etc/log/packet_traces/
You will find your trace file there. Click it to download it, and save it to the jump host desktop.
Open the file in Wireshark.
You will see the error PROGRAM_NOT_AVAILABLE in the decoded packet trace.



(A screen capture from Wireshark showing the PROGRAM_NOT_AVAILABLE response is omitted here.)

This error is seen because the corresponding RPC program ports are not registered while those
NFS versions are disabled on the server.
1-5. Describe how to fix the issue.
cluster1::*> vserver nfs show
Virtual      General
Server       Access  v3       v4.0     v4.1     UDP      TCP
------------ ------- -------- -------- -------- -------- --------
nassvm1      true    disabled disabled disabled disabled enabled
nassvm2      true    enabled  disabled disabled enabled  enabled
2 entries were displayed.

Enable NFSv3 (UDP or TCP must also be enabled; here TCP already is):

cluster1::*> vserver nfs modify -vserver nassvm1 -v3 enabled

The mount command should work now.
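You can also confirm from the client which RPC programs the server is advertising, both before and
after the fix. A quick check, assuming the standard rpcbind utilities are installed on the CentOS
client:
[root@catsp-cent ~]# rpcinfo -p 192.168.6.115
After NFSv3 is re-enabled, the mountd and nfs programs should appear in the listing.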

Task 2: Mount and Access Issues


Your instructor prepares your lab environment for this exercise and notifies you when it is ready.

Run Mod4_Task2_mount_access_issues.pl to break the lab.

Run Mod4_Task2_mount_access_issues_fix.pl to fix the lab.
Manual Break:
1) Modify the data volumes of vserver nassvm1 to use the export policy policy1.

cluster1::*> vol modify -vserver nassvm1 -volume nassvm1_nfs -policy policy1

2) Modify the SVM root volume permissions to 700, change the volume owner to cmodeuser, and apply policy1.
cluster1::*> vol modify -vserver nassvm1 -volume nassvm1_root -security-style unix -unix-permissions 700 -user cmodeuser -group cmode -policy policy1

3) Unmount the volume from its junction path.
cluster1::> vol unmount -vserver nassvm1 -volume nassvm1_nfs

4) Modify the export policy as follows:
cluster1::*> export-policy rule modify -vserver nassvm1 -policyname policy1 -ruleindex 1 -protocol cifs -clientmatch 0.0.0.0 -rorule none -rwrule none -anon 65535

Manual Fix:
1) Modify the root volume's attributes.
cluster1::> vol modify -vserver nassvm1 -volume nassvm1_root -security-style unix -unix-permissions 755 -user root -group root -policy default

2) Modify the NFS volume's attributes.
cluster1::> vol modify -vserver nassvm1 -volume nassvm1_nfs -security-style unix -unix-permissions 755 -user root -group root -policy default

3) Mount the NFS volume under its junction path.
cluster1::> vol mount -vserver nassvm1 -volume nassvm1_nfs -junction-path /nassvm1_nfs

Scenario: The customer cannot mount due to access issues.


Step Action
2-1.
The customer receives an error when attempting to mount volume nassvm_nfs.
[root@catsp-cent ~]# mount -o nfsvers=3
192.168.6.115:/nassvm1_nfs /nassvm1
mount.nfs: access denied by server while mounting (null)

2-2. Explain why the customer is denied access, and then fix the problem.
The following issues exist here:
1. The volume is not mounted in the SVM namespace.
2. The export-policy rules are not set properly.
3. The permissions on the volumes are too restrictive.

cluster1::*> volume show -vserver nassvm1 -fields junction-path


vserver volume junction-path
------- ------------ -------------
nassvm1 nassvm1_cifs /nassvm1_cifs
nassvm1 nassvm1_nfs -
nassvm1 nassvm1_root /
nassvm1 test1 -
4 entries were displayed.

cluster1::*> volume mount -vserver nassvm1 -volume nassvm1_nfs -junction-path
/nassvm1_nfs

The client still cannot mount the export.


Check the export policy on the root volume and on the volume nassvm1_nfs.
The policy is policy1.

cluster1::*> volume show -vserver nassvm1 -fields junction-path, policy , security-style ,


unix-permissions
vserver volume policy security-style unix-permissions junction-path
------- ------------ ------- -------------- ---------------- -------------
nassvm1 nassvm1_cifs policy1 ntfs ------------ /nassvm1_cifs
nassvm1 nassvm1_nfs policy1 unix ---rwx------ /nassvm1_nfs
nassvm1 nassvm1_root policy1 unix ---rwx------ /
nassvm1 test1 default unix ---rwxr-xr-x -
4 entries were displayed.

cluster1::*> volume modify -vserver nassvm1 -volume nassvm1_root -unix-permissions


755
Volume modify successful on volume nassvm1_root of Vserver nassvm1.
cluster1::*> volume modify -vserver nassvm1 -volume nassvm1_nfs -unix-permissions
755
Volume modify successful on volume nassvm1_nfs of Vserver nassvm1.
Look at the rules in policy1. The access protocol and the client match need to be changed.
cluster1::*> export-policy rule show -policyname policy1 -ruleindex 1 -fields protocol ,
clientmatch , rorule ,rwrule , superuser , anon
vserver policyname ruleindex protocol clientmatch rorule rwrule anon superuser
------- ---------- --------- -------- ----------- ------ ------ ----- ---------
nassvm1 policy1 1 cifs 0.0.0.0/0 none none 65535 none
nassvm2 policy1 1 cifs 0.0.0.0/0 none none 65535 none
2 entries were displayed.
Fix the issue using the following command
cluster1::*> export-policy rule modify -policyname policy1 -vserver nassvm1 -ruleindex 1
-protocol nfs -clientmatch 0.0.0.0/0 -rorule any -rwrule any -superuser any

Now the mount command should work:


[root@catsp-cent ~]# mount -o nfsvers=3 192.168.6.115:/nassvm1_nfs /nassvm1

Note: If you get this error:


[root@catsp-cent ~]# mount.nfs: /nassvm1 is busy or already mounted

Then, do the following:
[root@catsp-cent ~]# umount /nassvm1
[root@catsp-cent ~]# mount -o nfsvers=3 192.168.6.115:/nassvm1_nfs /nassvm1

2-3.
If you can mount now, cd into the mount point, and then answer the following questions:
 Can you cd into the mount point?
 If not, how do you resolve the issue?
 If you unmount and remount, does it still work?
[root@catsp-cent ~]# cd /nassvm1
[root@catsp-cent nassvm1]#

If the student has not changed anything other than the protocol and client match on the
export policy, they should get permission denied. To resolve, change the anon user to the
volume owner, open up the volume permissions to allow access for all users, or change the
owner of the volume.
If you unmount and remount, it still works.
[root@catsp-cent nassvm1]# cd ..
[root@catsp-cent /]# umount /nassvm1
[root@catsp-cent /]# mount -o nfsvers=3 192.168.6.115:/nassvm1_nfs /nassvm1
[root@catsp-cent /]#

2-4.
Try to write a file into the /nassvm1 directory, and then answer this question:
Are you able to write the file?

2-5.
After the write succeeds, view the permissions using ls –la, and then answer the following
questions:
 What are the file permissions on the file that you wrote?
 Why are the permissions and owner set the way that they are?
Permissions will be 744.
The owner depends on the superuser and anon settings:
 Superuser any / anon 0 – file is owned root:root
 Superuser any / anon = any value – file is owned root:root
 Superuser none / anon 0 – file is owned root:bin
 Superuser none / anon 65534 – permission denied on the write
 Superuser none / anon 65535 – cd /nassvm1 fails with permission denied

[root@catsp-cent nassvm1]# ll
total 0
-rw-r--r-- 1 root root 0 Jun 2 17:22 f1
-rw-r--r-- 1 root root 0 Jun 2 17:22 f2
-rw-r--r-- 1 root bin 0 Jun 2 17:23 f3

2-6.
Change the export policy rule for the volume so that superuser and anon differ from their
current values, write another file, check its permissions, and then answer this question:
What do these actions do?
They change the owner/group of newly written files.

2-7.
Open a new Secure Shell (SSH) session to your Linux computer, log in as the user “cmodeuser”
with the password “passwd,” and then answer these questions:
 Can you cd to the mount directory?
 If successful, can you write files to the mount?
 If you notice an issue, what is the reason?
 How do you resolve this issue?
You may not be able to cd into the mount because the permissions are 700 (unless the
volume permissions have been changed); only the owner/creator can cd into, list, or write
files to this mount.
cluster1::> vol modify -vserver nassvm1 -volume nassvm1_nfs -unix-permissions 755
From the cluster, change the volume permissions to a level that allows non-owners to
write, or change the volume owner to the UID of cmodeuser on the client.
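A hedged sketch of the owner-change alternative (assumes cmodeuser exists as a UNIX user on the SVM with the same UID as on the client; -user is the same volume parameter used elsewhere in this guide):

cluster1::> vol modify -vserver nassvm1 -volume nassvm1_nfs -user cmodeuser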

End of Exercise



Module 5: Troubleshooting SMB
In this exercise, you fix various issues during CIFS access.

Objectives
This exercise focuses on enabling you to do the following:
 Identify LIFs that are involved in CIFS access
 Troubleshoot using the diag secd commands
 Troubleshoot domain controller login issues
 Troubleshoot SMB user-authentication issues
 Troubleshoot the export policy issues

Task 1: Identify LIFs


Run Mod1_Task5_resolve_replicationFailures_fix.pl to fix the previous lab and, most importantly, to
fix any CIFS time-synchronization issues.

Step Action
1-1. Try to access the share vol1 by using SMB and by mapping a network drive to the path
\\nassvm1\vol1 from the Windows host.

1-2. Explain whether you can access the share.

1-3. If there are issues, fix the issues.

You should be able to access the share.


The default local Administrator account could be disabled. It can be enabled by following the steps below:

cluster1::> vserver cifs users-and-groups local-user modify -user-name Administrator -is-


account-disabled false -vserver *

Set the password for Administrator using

cluster1::> vserver cifs user local-user set-password -vserver nassvm1 -user-name


Administrator

Enter the new password:


Confirm the new password:

(You can set it to P@ssw0rd. You must do this from the ONTAP side by using the command
above; it cannot be done from the client side when the client prompts you to change the password.)

If you get an error while connecting to the share by hostname (not IP address) from the
Windows client, check the event logs. You are most likely to see an error message similar to
"secd.kerberos.tktnyv: Kerberos client ticket not yet valid (-1765328351) for vserver (nassvm1)",
and the client will report "A device attached to the system is not functioning". If this happens,
check and correct the time on both the Windows client and the cluster.
https://technet.microsoft.com/en-us/library/cc780011(v=ws.10).aspx
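A hedged first check for the skew (cluster date show is standard; w32tm is the standard Windows time tool, used here as an assumption about the client tooling):

cluster1::> cluster date show
C:\> w32tm /query /status

Compare the two outputs and confirm that the times are within 5 minutes of each other.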

1-4. Answer the following questions:
 Which node is serving the data?
 Which data LIF is the client using to access the share?
 What is the junction path that is represented by the CIFS share?
 Which storage virtual machine (SVM), volume, and aggregate are involved?

C:\Users\Administrator>netstat -an
Active Connections

TCP 192.168.6.11:139 0.0.0.0:0 LISTENING


TCP 192.168.6.11:3389 192.168.6.254:49441 ESTABLISHED
TCP 192.168.6.11:52870 192.168.6.117:445 ESTABLISHED

cluster1::> net int show -address 192.168.6.117


(network interface show)
Logical Status Network Current Current Is
Vserver Interface Admin/Oper Address/Mask Node Port Home
----------- ---------- ---------- ------------------ ------------- ------- ----
nassvm1
nassvm1_data3
up/up 192.168.6.117/24 node3 e0d true

cluster1::>

cluster1::> vserver cifs share show -share-name vol1


Vserver Share Path Properties Comment ACL
-------------- ------------- ----------------- ---------- -------- -----------
nassvm1 vol1 /nassvm1_cifs oplocks - Everyone / Full Control
browsable
changenotify

cluster1::> volume show -junction-path /nassvm1_cifs -fields aggregate,node


vserver volume aggregate node
-------- ------------- --------- -----------
nassvm1 nassvm1_cifs nassvm1 node1

1-5. Check which data LIFs have active sessions.
cluster1::> net connections active show-lifs
(network connections active show-lifs)

Node Vserver Name Interface Name Count

-------------- -------------- ---------------- ------

node1

Cluster cluster1-01_clus1 102

Cluster cluster1-01_clus2 102

node2

Cluster cluster1-02_clus1 72

Cluster cluster1-02_clus2 72

node3

nassvm1 nassvm1_data3 1

Cluster cluster1-03_clus1 108

Cluster cluster1-03_clus2 108

node4

Cluster cluster1-04_clus1 114

Cluster cluster1-04_clus2 114


9 entries were displayed.

1-6. Explain whether the most efficient network path to the volume is being used.
No.
The client is connected via the nassvm1_data3 LIF hosted on e0d on node3, but the CIFS
share lives on volume nassvm1_cifs, in aggregate nassvm1, on node1.
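If the customer wants the most efficient path, one hedged option is to move the data LIF to the node that owns the aggregate (a sketch; assumes e0d on node1 is in the same broadcast domain):

cluster1::> network interface migrate -vserver nassvm1 -lif nassvm1_data3 -destination-node node1 -destination-port e0d

For a permanent change, also update the LIF home (network interface modify -home-node node1 -home-port e0d) and then revert the LIF.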

Task 2: Domain Controller Login Issues


Your instructor prepares your lab environment for this exercise and notifies you when it is ready.

Run Mod5_Task2_cifs_change_time.pl to break the lab.


Run Mod5_Task2_cifs_change_time_fix.pl to fix the lab.
Manual Break:


Record the domain controller's time and time zone. Set the time zone on the ONTAP
cluster to a different zone (1 hour behind), and then set the cluster time to the time
recorded earlier.

Scenario: The customer cannot access a CIFS share.


Step Action
2-1.
Close all SMB connections, and from the Windows host, attempt to access the SMB share
\\nassvm1\vol1 using the following credentials:
Username: student1
Password: P@ssw0rd

2-2. Explain the errors that you received.


Cannot access

2-3.
Instead of using the host name, use the LIF IP to access the CIFS share, and answer this
question:
Can you access the share?
Yes.

2-4.
Analyze the issues, and use related commands to troubleshoot and fix the issues.
Hint: Use the command cifs session show -instance when you map using the
vserver name and when you map using the IP address, and check the protocol that
is being used for authentication.
When you use the vserver name for the mapping:
cluster1::*> cifs session show -instance

Vserver: nassvm1

Node: node4
Session ID: 3439342740427505666
Connection ID: 3950412208
Incoming Data LIF IP Address: 192.168.6.118
Workstation IP Address: 192.168.6.11
Authentication Mechanism: Kerberos
User Authenticated as: domain-user
Windows User: CATSP\student1
When you use the data LIF to map:
cluster1::*> cifs session show -instance

Vserver: nassvm1


Node: node2
Session ID: 15892640135037059074
Connection ID: 684290916
Incoming Data LIF IP Address: 192.168.6.115
Workstation IP Address: 192.168.6.11
Authentication Mechanism: NTLMv2
User Authenticated as: domain-user
Windows User: CATSP\student1
UNIX User: pcuser

The root cause of this issue is that when Kerberos is used, the time skew
between the domain controller and the cluster (the Kerberos client here) cannot be more than 5 minutes.
The cause of this issue can be found in one of the three ways below:
- Run the "diag secd" command to test the login; authentication fails with the Kerberos error
when the time lag exceeds 5 minutes.
- The /mroot/etc/mlog/secd.log file shows the Kerberos error for a time lag of more than 5 minutes.
- A packet trace collected on the corresponding node contains a packet showing
the same error.

cluster1::*> diag secd authentication login-cifs -node node3 -vserver nassvm1 -user
catsp\student1
Enter the password:
Vserver: nassvm1 (internal ID: 5)
Error: User authentication procedure failed
[ 0 ms] Login attempt by domain user 'catsp\student1' using
NTLMv2 style security
[ 1] Successfully connected to ip 192.168.6.10, port 445 using
TCP
[ 9] Encountered NT error (NT_STATUS_MORE_PROCESSING_REQUIRED)
for SMB command SessionSetup
[ 11] Cluster and Domain Controller times differ by more than
the configured clock skew (KRB5KRB_AP_ERR_SKEW)
[ 11] Kerberos authentication failed with result: 7537.
[ 13] Unable to connect to NetLogon service on
catsp-win-1.catsp.csslp.netapp.com (Error:
RESULT_ERROR_SECD_NO_CONNECTIONS_AVAILABLE)
[ 14] No servers available for MS_NETLOGON, vserver: 5, domain:
catsp.csslp.netapp.com.
**[ 14] FAILURE: Unable to make a connection
** (NetLogon:CATSP.CSSLP.NETAPP.COM), result: 6940
[ 14] CIFS authentication failed

Error: command failed: Failed to authenticate user. Reason: "SecD Error: no server
available".

The error also shows up in the secd.log on node3 after enabling diag secd tracing:

diag secd trace set -trace-all yes -node node3

00000006.0001f48f 00b0d80b Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180] |


[000.000.090] debug: [SECD SERVER THREAD] SecD RPC Server received RPC from
NBLADE_CIFS. RPC 151: secd_rpc_auth_extended { in secd_prog_1() at
src/server/secd_rpc_server.cpp:1524 }
00000006.0001f490 00b0d80b Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180] |
[000.000.204] debug: [SECD MASTER THREAD] SecD RPC 151:secd_rpc_auth_extended
added to the Generic RPC task queue with Request ID:1581. { in pushRpcTask() at
src/server/secd_rpc_server.cpp:1336 }
00000006.0001f491 00b0d80b Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180]
00000006.0001f492 00b0d80b Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180] .-------
-----------------------------------------------------------------------.
00000006.0001f493 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180]
| TRACE MATCH |
00000006.0001f494 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180]
| RPC is being dumped because of a tracing match on: |
00000006.0001f495 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180]
| All |
00000006.0001f496 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180] .-------
-----------------------------------------------------------------------.
00000006.0001f497 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180]
| RPC FAILURE: |
00000006.0001f498 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180]
| secd_rpc_auth_extended has failed |
00000006.0001f499 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180]
| Result = 0, RPC Result = 4 |
00000006.0001f49a 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180]
| RPC received at Wed Jun 14 03:51:12 2017 |
00000006.0001f49b 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180] |-------
-----------------------------------------------------------------------'
00000006.0001f49c 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180]
Failure Summary:
00000006.0001f49d 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180] Error:
User authentication procedure failed
00000006.0001f49e 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180] CIFS
SMB2 Share mapping - Client Ip = 192.168.6.11
00000006.0001f49f 00b0d80c Wed Jun 14 2017 03:51:12 +02:00 [kern_secd:info:71180] [ 3
ms] Error accepting security context for Vserver identifier (5). Cluster and Domain Controller
times differ by more than the configured clock skew (KRB5KRB_AP_ERR_TKT_NYV).

Fix the time skew.
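A hedged sketch of the fix (cluster date show is standard; the modify parameters below are assumptions and may vary by release):

cluster1::> cluster date show
cluster1::> cluster date modify -timezone <same zone as the DC>
cluster1::> cluster date modify -dateandtime <time matching the DC>

To keep the skew from recurring, point the cluster at the DC for NTP:

cluster1::> cluster time-service ntp server create -server 192.168.6.10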

Task 3: Authentication Issues


Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Run Mod5_Task3_cifs_remove_name_Mapping.pl to break the lab.
Run Mod5_Task3_cifs_remove_name_Mapping_fix.pl to fix the lab.
Manual Break:
Name mappings shouldn't exist, but students may have created some while experimenting with
the systems. Clear them by running:

cluster1::> vserver name-mapping delete -direction win-unix -position *



cluster1::> vserver services unix-user delete -user pcuser -vserver nassvm*

1) Create a new UNIX user:


cluster1::> unix-user create -vserver nassvm1 -user test1 -id 3500 -primary-gid
3501

2) Modify the CIFS volume


cluster1::> vol modify -vserver nassvm1 -volume nassvm1_cifs -
security-style unix -user test1 -group 3501 -unix-permissions 700

3) Remove the default UNIX user pcuser from the CIFS options


cluster1::> cifs options modify -vserver nassvm1 -default-unix-user ""
Manual Fix
1) Create a mapping between root and Administrator

cluster1::> vserver name-mapping create -vserver nassvm1 -direction unix-win -


position 1 -pattern root -replacement Administrator
2) Recreate the default PCUSER account

cluster1::*> cifs options modify -vserver nassvm1 -default-unix-user pcuser


3) Delete the test user account

cluster1::*> unix-user delete -vserver nassvm1 -user test1


4) Modify volume's security style

cluster1::*> vol modify -vserver nassvm1 -volume nassvm1_cifs -security-style


ntfs

Scenario: The customer still cannot access the share.


Step Action
3-1. From the Windows client, log in with the following credentials:
Username: student1
Password: P@ssw0rd

3-2. Access Start > Run > \\nassvm1, and then describe the error message that you see.
Windows cannot access \\nassvm1

3-3.
Is the user name valid? Use diag secd commands to check this.

cluster1::*> set diag


cluster1::*> diag secd authentication translate -node local -vserver nassvm1 -
win-name student1
cluster1::*> diag secd authentication sid-to-uid -node local -vserver
nassvm1 –sid <sid from previous command>

Error: Lookup of CIFS account SID procedure failed


[ 76] Retrieved CIFS credentials via S4U2Self for full Windows
user name 'student1@CATSP.CSSLP.NETAPP.COM'
[ 76] Trying to map 'CATSP\student1' to UNIX user 'student1'
using implicit mapping
[ 80] Entry for user-name: student1 not found in the current
source: FILES. Entry for user-name: student1 not found in
any of the available sources
[ 80] Unable to map 'CATSP\student1'. No default UNIX user
defined.
**[ 80] FAILURE: Name mapping for Windows user 'CATSP\student1'
** failed. No mapping found
[ 80] SID lookup failed

Error: command failed: Failed to convert Windows SID to a Unix ID. Reason:
"SecD Error: Name mapping does not exist".

cluster1::*> diag secd authentication show-creds -node node4 -vserver nassvm1 -win-
name student1

Vserver: nassvm1 (internal ID: 5)

Error: Get user credentials procedure failed


[ 10] Retrieved CIFS credentials via S4U2Self for full Windows
user name 'student1@CATSP.CSSLP.NETAPP.COM'
[ 10] Trying to map 'CATSP\student1' to UNIX user 'student1'
using implicit mapping
[ 12] Entry for user-name: student1 not found in the current
source: FILES. Entry for user-name: student1 not found in
any of the available sources
[ 12] Unable to map 'CATSP\student1'. No default UNIX user
defined.
**[ 12] FAILURE: Name mapping for Windows user 'CATSP\student1'
** failed. No mapping found

Error: command failed: Failed to get user credentials. Reason: "SecD Error:
Name mapping does not exist".

3-4. Run the following command to view the logs, and then answer the question.
cluster1::> event log show
What do the logs show?
Time Node Severity Event
------------------- ---------------- ------------- ---------------------------
6/8/2017 13:50:02 node1 ERROR secd.nfsAuth.noNameMap: vserver (nassvm1)
Cannot map UNIX name to CIFS name. Error: Get user credentials procedure failed
[ 11] Retrieved CIFS credentials via S4U2Self for full Windows user name
'student1@CATSP.CSSLP.NETAPP.COM'
[ 11] Trying to map 'CATSP\student1' to UNIX user 'student1' using implicit mapping
[ 13] Entry for user-name: student1 not found in the current source: FILES. Entry for user-
name: student1 not found in any of the available sources
[ 13] Unable to map 'CATSP\student1'. No default UNIX user defined.
**[ 13] FAILURE: Name mapping for Windows user 'CATSP\student1' failed. No mapping
found

3-5.
Use the diag secd command to confirm the issue. What other commands can you run to view
configurations that confirm the issue?

cluster1::*> diag secd name-mapping show -node local -vserver nassvm1 -direction win-
unix -name student1

ATTENTION: Mapping of Data ONTAP "admin" users to UNIX user "root" is enabled, but
the following information does not reflect this mapping.

Vserver: nassvm1 (internal ID: 5)

Error: RPC map name request procedure failed


[ 1 ms] Trying to map 'student1' to UNIX user 'student1' using
implicit mapping
[ 3] Entry for user-name: student1 not found in the current
source: FILES. Entry for user-name: student1 not found in
any of the available sources
[ 3] Unable to map 'student1'. No default UNIX user defined.
**[ 3] FAILURE: Name mapping for Windows user 'student1' failed.
** No mapping found

Error: command failed: Failed to find mapping for the user. Reason: "SecD
Error: Name mapping does not exist".

cluster1::*> cifs options show -fields default-unix-user


vserver default-unix-user
------- -----------------
nassvm1 -
nassvm2 -
2 entries were displayed.

If you set the default-unix-user option, you still need to create that default UNIX user on the
SVM. This is similar to 7-Mode, where you needed an entry for the user in the
/etc/passwd file.

3-6. Go to the systemshell of the node that reported the error, look at the appropriate log file,
and then answer these questions:
 Do you see that the issue that is logged is there?
 Can you identify the root cause?
 How do you fix it?
 Are you able to access the share through SMB now?
The secd.log file, which resides in /mroot/etc/mlog on the node that reported the error in the
output of the event log show command, contains the entries confirming the issue. To view the
file, use the following command from the systemshell:

node4% tail -50 /mroot/etc/mlog/secd.log

Windows user student1 should be mapped to unix user pcuser.


pcuser is missing from the unix-users list, and the CIFS options do not specify a default UNIX
user name to use when name mapping cannot be done.

There are two ways to fix it:

1) Add pcuser back to the list of UNIX users and make it the default UNIX user for CIFS.
2) Point the default user at another valid UNIX user.

cluster1::> services unix-user create -vserver nassvm1 -user pcuser -id 65534
-primary-gid 65534 -full-name pcuser

cluster1::*> cifs options modify -vserver nassvm1 -default-unix-user pcuser -read-grants-


exec enabled

3-7. If you still cannot access the share through SMB, check whether the user mapping is still a
problem.
No, the share is still not accessible. Do the following:
Do the following:

- Enable debug logging for secd on the node that owns your data lifs

- Enter systemshell and cd to /mroot/etc/mlog

- Type tail -f secd.log

- Close the CIFS session on the Windows host, run net use * /d from cmd to clear
cached sessions, and then retry the connection

User mapping now succeeds.

3-8. If the user mapping problem still exists, fix it.

3-9. Given that the user mapping succeeds but you are still unable to access the share, explain
what the issue could be.
Because the user mapping works, permissions are the likely issue.
Looking at the UNIX permissions, you see that they do not grant access.
cluster1::*> vol show -vserver nassvm1 -fields security-style , unix-permissions
vserver volume security-style unix-permissions
------- ------------ -------------- ----------------
nassvm1 nassvm1_cifs unix ---rwx------
nassvm1 nassvm1_nfs unix ---rwxr-xr-x
nassvm1 nassvm1_root unix ---rwxr-xr-x
nassvm1 test1 unix ---rwxr-xr-x
4 entries were displayed.
To fix access without becoming root (su) and without changing the owner, you
can use vserver name mapping.

Other ways to fix the problem are to change the security style, change the owner, or
change the permissions. The best way depends on the needs of the customer.
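A hedged sketch of the name-mapping approach (the pattern quoting is an assumption: ONTAP treats the pattern as a regular expression, so the backslash in DOMAIN\user must be escaped):

cluster1::> vserver name-mapping create -vserver nassvm1 -direction win-unix -position 1 -pattern CATSP\\\\student1 -replacement test1

This maps CATSP\student1 to the UNIX user test1, the owner of the volume, so the 700 mode bits no longer block access.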

3-10. List the commands that are available to review security settings, such as permissions and
security style, on volumes, shares, and so on.
cluster1::>vol show –instance
cluster1::>vserver security file-directory show

3-11. From the cluster shell, use vserver security file-directory show to view
permissions on the volumes you’re trying to access. Should the user have access to
these volumes?

cluster1::*> vol show -vserver nassvm1 -fields security-style , unix-permissions , junction-


path, user
vserver volume user security-style unix-permissions junction-path
------- ------------ ----- -------------- ---------------- -------------
nassvm1 nassvm1_cifs test1 unix ---rwx------ /nassvm1_cifs
nassvm1 nassvm1_nfs 0 unix ---rwxr-xr-x /nassvm1_nfs
nassvm1 nassvm1_root 0 unix ---rwxr-xr-x /
nassvm1 test1 0 unix ---rwxr-xr-x -
4 entries were displayed.

cluster1::*> cifs share show -vserver nassvm1


Vserver Share Path Properties Comment ACL
-------------- ------------- ----------------- ---------- -------- -----------
nassvm1 admin$ / browsable - -
nassvm1 c$ / oplocks - BUILTIN\Administrators / Full Control


browsable
changenotify
show-previous-versions
nassvm1 ipc$ / browsable - -
nassvm1 vol1 /nassvm1_cifs oplocks - Everyone / Full Control
browsable
changenotify
4 entries were displayed.
cluster1::*> vserver security file-directory show -vserver nassvm1 -path
/nassvm1_cifs

Vserver: nassvm1
File Path: /nassvm1_cifs
File Inode Number: 64
Security Style: unix
Effective Style: unix
DOS Attributes: 10
DOS Attributes in Text: ----D---
Expanded Dos Attributes: -
UNIX User Id: 3500
UNIX Group Id: 3501
UNIX Mode Bits: 700
UNIX Mode Bits in Text: rwx------
ACLs: -

User student1 (the user that is logged in on the Windows client) has access to the root
volume because the permissions on nassvm1_root are 755, but the user does not have
access to the nassvm1_cifs volume because its UNIX permissions are set to 700.

3-12. Explain how you resolve this issue.


Change the permissions on the nassvm1_cifs volume to allow at least read access, change
the owner to the user that the Windows user maps to, or flip the security style to NTFS.

3-13. Change the security style of the volume to NTFS, and see whether you can access the volume
now.
Yes



Task 4: Export Policies
Your instructor prepares your lab environment for this exercise and notifies you when it is ready.

The scripts to be run are as follows:


Run Mod5_Task4_cifs_change_policy.pl to break the lab.
Run Mod5_Task4_cifs_change_policy_fix.pl to fix the lab.
Manual Break:
1) Modify the nassvm*_cifs volumes to use the policy1 policy

cluster1::> vol modify -vserver nassvm* -volume nassvm*_cifs -policy policy1


2) Modify the policy1 policy to the following

cluster1::*> export-policy rule modify -vserver nassvm* -policyname policy1 -


ruleindex * -protocol nfs -clientmatch 0.0.0.0/0 -rorule sys -rwrule sys
Manual Fix:
1) cluster1::> export-policy create -vserver nassvm1 -policyname wideopen

2) cluster1::> export-policy rule create -vserver nassvm1 -policyname wideopen -


clientmatch 0.0.0.0/0 -rorule any -rwrule any -allow-suid true -allow-dev true -
ruleindex 1 -protocol any -anon 0 -superuser any

3) cluster1::> vol modify -vserver nassvm* -volume nassvm*_cifs -policy wideopen

4) cluster1::> vol modify -vserver nassvm* -volume nassvm*_root -policy wideopen

Scenario: The customer still cannot access the share.


Step Action
4-1. Log in to the Windows client using the following credentials:
Username: student1
Password: P@ssw0rd

4-2. Try to access \\nassvm1\vol1, and describe the error that you see.
Login prompt pop-up appears and access is denied.

4-3. Answer the following questions:


 Do the event logs show any errors?
 What about the secd log?
Event log shows no errors; secd shows valid user mappings
node4% tail -f /mroot/etc/mlog/secd.log

4-4. Run the following command to see whether it shows that the permissions on the volume
should enable access.
cluster1::*> vserver security file-directory show -vserver
nassvm1 -path /nassvm1_cifs
Yes
cluster1::*> vserver security file-directory show -vserver nassvm1 -path
/nassvm1_cifs

Vserver: nassvm1
File Path: /nassvm1_cifs
File Inode Number: 64
Security Style: ntfs
Effective Style: ntfs
DOS Attributes: 10
DOS Attributes in Text: ----D---
Expanded Dos Attributes: -
UNIX User Id: 0
UNIX Group Id: 0
UNIX Mode Bits: 777
UNIX Mode Bits in Text: rwxrwxrwx
ACLs: NTFS Security Descriptor
Control:0x8004
Owner:BUILTIN\Administrators
Group:BUILTIN\Administrators
DACL - ACEs
ALLOW-Everyone-0x1f01ff
ALLOW-Everyone-0x10000000-OI|CI|IO

4-5. Answer the following questions:
 What do you think could be the issue?
 How do you fix the issue?
::*> cifs options show -vserver nassvm1 -fields is-exportpolicy-enabled
vserver is-exportpolicy-enabled
------- -----------------------
nassvm1 true
cluster1::*> volume show -volume nassvm* -fields policy
vserver volume policy
------- ------------ -------
nassvm1 nassvm1_cifs policy1
nassvm1 nassvm1_nfs default

cluster1::*> export-policy rule show -policy default -vserver nassvm1 -fields


ruleindex, clientmatch , rorule , rwrule, superuser , anon

vserver policyname ruleindex clientmatch rorule rwrule anon superuser


------- ---------- --------- ----------- ------ ------ ----- ---------
nassvm1 default 1 0.0.0.0/0 any any 65534 any

cluster1::*> export-policy rule show -policy policy1 -vserver nassvm1 -fields


ruleindex, clientmatch , rorule , rwrule, superuser , anon

vserver policyname ruleindex clientmatch rorule rwrule anon superuser


------- ---------- --------- ----------- ------ ------ ----- ---------
nassvm1 policy1 1 0.0.0.0/0 sys sys 65534 none

Modify the export policy rule of policy1 to allow read-write and read-only access, and change
the protocol to allow CIFS:
cluster1::*> export-policy rule modify -vserver nassvm1 -policyname policy1 -
ruleindex 1 -protocol any -clientmatch 0.0.0.0/0 -rorule any -rwrule any

If it still fails, do not forget to clear the SecD cache on all nodes by restarting SecD:


cluster1::*> secd restart -node node1
cluster1::*> secd restart -node node2
cluster1::*> secd restart -node node3
cluster1::*> secd restart -node node4
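A hedged alternative to restarting SecD on recent releases is to flush only the export-policy caches (a sketch; assumes advanced privilege):

cluster1::*> vserver export-policy cache flush -vserver nassvm1 -cache all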

End of Exercise



Module 6: Troubleshooting Scalable SAN
In this exercise, you fix various issues during SAN access.

Objectives
This exercise focuses on enabling you to do the following:
 Use standard Linux commands to evaluate a Linux host in a NetApp scalable SAN environment
 Use standard Linux commands to identify SAN disks in a NetApp scalable SAN environment
 Use standard Linux commands to verify connectivity in a NetApp scalable SAN environment
 Use standard Linux log files to evaluate the iSCSI subsystem in a NetApp scalable SAN environment
 Troubleshoot a Linux host in a NetApp scalable SAN environment
 Troubleshoot a Windows host in a NetApp scalable SAN environment
 Restore LUN connectivity

Task 1: Evaluate a Linux Host in a Scalable SAN Environment


Verify your environment to make sure that it is stable enough for the remaining labs.

This exercise is designed as a tutorial for students who have little or no Linux experience.
It is important that the students verify that the disks are alive and are not stale.

Step Action
1-1. Log in to the Linux system ots-cent as root, run the following commands to evaluate a Linux
host, and record the results in the space provided.
 Determine the IP address of the host: #ifconfig eth0
 Verify that the iSCSI initiator is installed: #rpm –qa | grep iscsi
 Verify that the host is logged in to the iSCSI array (target): #iscsiadm –m session
 The IP addresses and iSCSI Qualified Names (IQNs) that are listed belong to the targets.
tcp: [10] 192.168.6.131:3260,1037 iqn.1992-
08.com.netapp:sn.140668517d5511e5ac18005056bf03f8:vs.16
 List the IQN and IP addresses of the targets that are shown in the output of the previous
command from ONTAP:
::> net int show -vserver sansvm*
::>iscsi show -instance
[root@catsp-cent ~]# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:50:56:BF:5B:64
inet addr:192.168.6.20 Bcast:192.168.6.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:febf:5b64/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:84324 errors:0 dropped:0 overruns:0 frame:0
TX packets:7448 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:7582697 (7.2 MiB) TX bytes:1140153 (1.0 MiB)


[root@catsp-cent ~]# rpm -qa | grep iscsi


iscsi-initiator-utils-6.2.0.872-41.el6.x86_64

[root@catsp-cent ~]# iscsiadm -m session


tcp: [3] 192.168.6.131:3260,1034 iqn.1992-
08.com.netapp:sn.a2aa340f479411e7b0040050560120f9:vs.7
tcp: [4] 192.168.6.132:3260,1035 iqn.1992-
08.com.netapp:sn.a2aa340f479411e7b0040050560120f9:vs.7
tcp: [7] 192.168.6.136:3260,1037 iqn.1992-
08.com.netapp:sn.b106e9bc479411e7b0040050560120f9:vs.8
tcp: [8] 192.168.6.135:3260,1036 iqn.1992-
08.com.netapp:sn.b106e9bc479411e7b0040050560120f9:vs.8

cluster1::*> net int show -vserver sansvm*


(network interface show)
Logical Status Network Current Current Is
Vserver Interface Admin/Oper Address/Mask Node Port Home
----------- ---------- ---------- ------------------ ------------- ------- ----
sansvm1
sansvm1_data1
up/up 192.168.6.131/24 node3 e0d true
sansvm1_data2
up/up 192.168.6.132/24 node4 e0d true
sansvm2
sansvm2_data1
up/up 192.168.6.135/24 node3 e0d true
sansvm2_data2
up/up 192.168.6.136/24 node4 e0d true
4 entries were displayed.

cluster1::*> iscsi show -instance

Vserver: sansvm1
Target Name: iqn.1992-
08.com.netapp:sn.a2aa340f479411e7b0040050560120f9:vs.7

Target Alias: sansvm1
Administrative Status: up
Max Error Recovery Level: 0
RFC3720 DefaultTime2Retain Value (in sec): 20
Login Phase Duration (in sec): 15
Max Connections per Session: 4
Max Commands per Session: 128
TCP Receive Window Size (in bytes): 131400

Vserver: sansvm2
Target Name: iqn.1992-
08.com.netapp:sn.b106e9bc479411e7b0040050560120f9:vs.8
Target Alias: sansvm2
Administrative Status: up
Max Error Recovery Level: 0
RFC3720 DefaultTime2Retain Value (in sec): 20
Login Phase Duration (in sec): 15
Max Connections per Session: 4
Max Commands per Session: 128
TCP Receive Window Size (in bytes): 131400
2 entries were displayed.

1-2. Type the following command to identify the SAN disks that are attached to a Linux host.
[root@ots-cent ~]# fdisk -l
Disk /dev/sdb: 209 MB, 209715200 bytes
7 heads, 58 sectors/track, 1008 cylinders
Units = cylinders of 406 * 512 = 207872 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000

Disk /dev/sdd: 104 MB, 104857600 bytes


4 heads, 50 sectors/track, 1024 cylinders
Units = cylinders of 200 * 512 = 102400 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000

Disk /dev/sdc: 209 MB, 209715200 bytes


7 heads, 58 sectors/track, 1008 cylinders
Units = cylinders of 406 * 512 = 207872 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000

Disk /dev/sde: 104 MB, 104857600 bytes


4 heads, 50 sectors/track, 1024 cylinders
Units = cylinders of 200 * 512 = 102400 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000

1-3. Type the following command to determine the state of the Linux iSCSI service, and start the
service if it is not already started.
[root@ots-cent ~]# service iscsi status
iSCSI Transport Class version 2.0-870
version 2.0-872.41.el6
Target: iqn.1992-
08.com.netapp:sn.140668517d5511e5ac18005056bf03f8:vs.16
Current Portal: 192.168.6.131:3260,1037
Persistent Portal: 192.168.6.131:3260,1037
**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1994-05.com.redhat:ots-
cent
Iface IPaddress: 192.168.6.20
Iface HWaddress: <empty>
Iface Netdev: <empty>
SID: 10
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
************************
Attached SCSI devices:
************************
Host Number: 12 State: running
scsi12 Channel 00 Id 0 Lun: 0
Attached scsi disk sdb State:
running
scsi12 Channel 00 Id 0 Lun: 1
Attached scsi disk sdd State:
running

1-4. Use the output of the service iscsi status command that is displayed in Step 3 to
answer the following questions:
 List the Iface initiatorname: ____________________________________
 List the iSCSI connection state: ________________________________
 List the disks that are attached to SCSI12 Channel 00: ______________________
 List the state of each disk: ____________________________________
 List the current portal: ________________________________________
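From the output of step 1-3:
 Iface Initiatorname: iqn.1994-05.com.redhat:ots-cent
 iSCSI connection state: LOGGED IN
 Disks attached to scsi12 Channel 00: sdb (LUN 0) and sdd (LUN 1)
 State of each disk: running
 Current portal: 192.168.6.131:3260,1037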

1-5. Type the following command to verify connectivity between the host and target:
[root@cats-cent ~]# netstat -pant | grep iscsi

Active Internet connections (servers and established)


Proto Recv-Q Send-Q Local Address Foreign Address
State PID/Program name
tcp 0 0 192.168.6.20:40412
192.168.6.131:3260 ESTABLISHED 1382/iscsid
tcp 0 0 192.168.6.20:39372
192.168.6.135:3260 ESTABLISHED 1382/iscsid
tcp 0 0 192.168.6.20:52005
192.168.6.136:3260 ESTABLISHED 1382/iscsid
tcp 0 0 192.168.6.20:47742
192.168.6.132:3260 ESTABLISHED 1382/iscsid

1-6. The state of the active internet connection between the host (local address) and target (foreign
address) is ESTABLISHED.

1-7. The Linux host records events about the iSCSI subsystem in the system messages file,
/var/log/messages.
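A hedged convenience for isolating those entries (standard grep and tail usage):

[root@catsp-cent ~]# grep -i iscsi /var/log/messages | tail -20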

1-8. Run the following commands to view log entries that are related to the iSCSI subsystem.
 Observe the current date and time:
 [root@cats-cent ~]# date
Thu Oct 29 04:14:47 EDT 2015
 Stop the iSCSI service and observe the event that is recorded in the /var/log/messages file:
[root@cats-cent ~]# service iscsi stop && tail -f
/var/log/messages
 Press Ctrl+C to exit the log file view.
 Observe that the service shutdown event for each connection has been recorded in the log:
Oct 29 04:14:55 cats-cent iscsid: Connection21:0 to [target:
iqn.1992-
08.com.netapp:sn.140668517d5511e5ac18005056bf03f8:vs.16,
portal: 192.168.6.132,3260] through [iface: default] is
shutdown.
 Correlate the date and time from Step 1 to the date and time of the log entries.
 Start the iSCSI service and observe the event that is recorded in the /var/log/messages file:
 [root@cats-cent ~]# service iscsi start && tail -f
/var/log/messages
 Correlate the date and time from Step 1 to the date and time of the log entries.
 Observe that the log entries record the start-up event. Each disk is enumerated (sdb, sdc,
sdd, sde) and attached:
 Oct 29 04:15:03 cats-cent kernel: sd 28:0:0:0: [sdb] Attached
SCSI disk
 Observe that each connection to the targets is enumerated and listed as operational:
Oct 29 04:15:04 cats-cent iscsid: Connection28:0 to [target:
iqn.1992-
08.com.netapp:sn.1dd67cf37d5511e5ac18005056bf03f8:vs.17,
portal: 192.168.6.136,3260] through [iface: default] is
operational now
 Press Ctrl+C to exit the log file view.
1-9.

NOTE: The "&&" operator runs the second command only if the first command succeeds.

Task 2: The Linux Host Has Lost iSCSI Connections


Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
(This lab requires students to know how to configure iptables. If most of the students in the
class are not familiar with iptables, move on to the next lab and leave this one until last, for
students who finish the other labs quickly and still have time.)

Run Mod6_Task2_block3260_linux.pl to break the lab.

This script adds a rule to the Linux iptables firewall to block all outgoing packets to TCP port
3260 on SANSVM1. Note that packets to SANSVM2 are allowed. Students may delete this
rule, or disable the firewall completely, to get rid of this issue.



The student is expected to identify a connection issue by:
1) Evaluating the storage environment to ensure that all elements are available.
2) Evaluating the Linux host using the commands and log files introduced in Task 1. While there is
more than one method to identify the issue, the student should find the problem using the netstat
command.
There is no fix script.
Manual Break
[root@cats-cent ~]# /sbin/iptables -F
[root@cats-cent ~]# /sbin/iptables -t filter -A OUTPUT -d 192.168.6.131 -m tcp -p tcp --dport 3260 -j
DROP
[root@cats-cent ~]# /sbin/iptables -t filter -A OUTPUT -d 192.168.6.132 -m tcp -p tcp --dport 3260 -j
DROP
[root@cats-cent ~]# /sbin/iptables -t filter -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
[root@cats-cent ~]# /sbin/iptables -t filter -A INPUT -p icmp -j ACCEPT
[root@cats-cent ~]# /sbin/iptables -t filter -A INPUT -i lo -j ACCEPT
[root@cats-cent ~]# /sbin/iptables -t filter -A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j
ACCEPT
[root@cats-cent ~]# /sbin/iptables -t filter -A INPUT -j REJECT --reject-with icmp-host-prohibited
[root@cats-cent ~]# /sbin/iptables -t filter -A FORWARD -j REJECT --reject-with icmp-host-prohibited
[root@cats-cent ~]# /sbin/service iptables save

Manual fix:
Clear (flush) the Linux Firewall rules: iptables -F
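A more surgical alternative than flushing everything is to delete just the two DROP rules (standard iptables -D syntax; the specification must match the rule as it was added):

[root@cats-cent ~]# iptables -D OUTPUT -d 192.168.6.131 -p tcp --dport 3260 -j DROP
[root@cats-cent ~]# iptables -D OUTPUT -d 192.168.6.132 -p tcp --dport 3260 -j DROP
[root@cats-cent ~]# service iptables save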

Scenario: A customer reports that all connections to SANSVM1 have been lost.


The instructor breaks the lab. If the problem is not visible right away, restart the iSCSI service.
Step Action
2-1. Log into the NetApp storage environment as an administrator.

2-2. Evaluate the storage environment.

2-3. Log in to the Linux host, cats-cent, as root.

2-4. Type the following command to verify connectivity between the host and target:
[root@cats-cent ~]# netstat -pant | grep iscsi
Answer the following questions:
 Do you see 4 connections in ESTABLISHED state?
 If not, what could be the issue?
 Fix the issue.

[root@cats-cent ~]# netstat -pant |grep iscsi


Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program
name
tcp 0 1 192.168.6.20:47782 192.168.6.132:3260 SYN_SENT 1382/iscsid

tcp 0 0 192.168.6.20:39396 192.168.6.135:3260 ESTABLISHED 1382/iscsid
tcp 0 0 192.168.6.20:52029 192.168.6.136:3260 ESTABLISHED 1382/iscsid
tcp 0 1 192.168.6.20:40452 192.168.6.131:3260 SYN_SENT 1382/iscsid

The connections to the sansvm1 LIFs (192.168.6.131 and 192.168.6.132) in this example are in a
SYN_SENT state. After some time passes, those connections are no longer displayed in the output:

[root@cats-cent ~]# netstat -pant |grep iscsi


tcp 0 0 192.168.6.20:39396 192.168.6.135:3260 ESTABLISHED 1382/iscsid
tcp 0 0 192.168.6.20:52029 192.168.6.136:3260 ESTABLISHED 1382/iscsid

Compare these netstat outputs to the netstat output of a fully functional Linux host in step 1-5 of
Task 1.

The root cause is the local firewall rules of the host. This can be seen by inspecting the iptables
rules (the native Linux firewall configuration).

[root@cats-cent ~]# iptables -L


Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT icmp -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:ssh
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)


target prot opt source destination
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)

target prot opt source destination


DROP tcp -- anywhere 192.168.6.131 tcp dpt:iscsi-target
DROP tcp -- anywhere 192.168.6.132 tcp dpt:iscsi-target

Observe the DROP entries: any packets to iscsi-target (port 3260) are dropped by the local
firewall.
[root@catsp-cent ~]# iptables -L

Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT icmp -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:ssh
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)


target prot opt source destination
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)


target prot opt source destination
DROP tcp -- anywhere sansvm1.catsp.csslp.netapp.com tcp dpt:iscsi-target
DROP tcp -- anywhere sansvm1.catsp.csslp.netapp.com tcp dpt:iscsi-target

Flush the iptables using:


iptables -F
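If the iSCSI sessions do not re-establish on their own after the flush, a hedged nudge (standard service and iscsiadm usage):

[root@cats-cent ~]# service iscsi restart
or
[root@cats-cent ~]# iscsiadm -m node -L all      (log in to all discovered targets)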

Task 3: All LUNs Are Missing From the Windows Host


Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Run Mod6_Task3_change_iqn.pl to break the lab.
Run Mod6_Task3_change_iqn_fix.pl to fix the lab; if the fix script is not available, use the
manual fix below.
Manual Break:
cluster1::> igroup remove -vserver sansvm* -igroup iscsi_group_win -initiator
iqn.1991-05.com.microsoft:cats-win-2.cats.csslp.netapp.com -force true
cluster1::> igroup add -vserver sansvm* -igroup iscsi_group_win -initiator iqn.1991-
05.com.microsoft:catps-win-2.cats.csslp.netapp.com

Manual Fix:

cluster1::> igroup remove -vserver sansvm* -igroup iscsi_group_win -initiator


iqn.1991-05.com.microsoft:catps-win-2.cats.csslp.netapp.com -force true
cluster1::> igroup add -vserver sansvm* -igroup iscsi_group_win -initiator iqn.1991-
05.com.microsoft:cats-win-2.cats.csslp.netapp.com



The student is expected to identify a connection issue by:
1) Evaluating the storage environment to ensure that all elements are available.
2) Evaluating and verifying the host configuration.
To resolve: Check the IQN in the igroup and in the LUN mapping.

Scenario: A customer reports that there are no visible SAN disks attached to the Windows host. Evaluate the
NetApp scalable SAN environment. Restore LUN connectivity.
Step Action
3-1. Log in to the Windows host, and check firewall configuration.
3-2. If the firewall is enabled, disable it to see whether the LUN connectivity can be restored.
3-3. Log in to the NetApp cluster as an administrator.
3-4. Verify the configuration of the NetApp cluster.
3-5. Log in to the Windows host as an administrator.
3-6. Verify the configuration of the Windows host.
3-7. Verify that the Windows host's IQN is used in the SAN configurations of the cluster.
3-8. Restore the SAN disks.
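A hedged sketch of the cluster-side checks for step 3-7 (igroup show, iscsi initiator show, and lun mapping show are standard commands; exact output layout may vary):

cluster1::> igroup show -vserver sansvm* -instance
cluster1::> iscsi initiator show -vserver sansvm1
cluster1::> lun mapping show -vserver sansvm1

Compare the initiator IQN in the igroup with the IQN reported by the Windows iSCSI initiator; the break script swaps in a misspelled IQN (catps- instead of cats-), so the host logs in but matches no igroup and therefore sees no LUNs.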

Task 4: The LUNs Are Not Visible Through All LIFs of an SVM
Lab Scenario: In your lab, MPIO is not configured on the Windows host.
Step Action
4-1. Log in to the Windows host. Click Start-> Administrative Tools and open iSCSI Initiator.

4-2. Disconnect from the vserver sansvm2 if it is connected. (This is the connection to the target
whose IQN ends with vs.8.)
4-3. Select the target for vserver sansvm1 (the IQN ends with vs.7).
4-4. Click on Properties.
4-5. Note the target portal group of the session you see in the Properties window. Find the
corresponding LIF by using the following cluster shell command:
cluster1::> iscsi portal show
        Logical          Status                         Curr        Curr
Vserver Interface   TPGT Admin/Oper IP Address          Node        Port Enabled
---------- ---------- ---- ---------- --------------- ----------- ---- -------
sansvm1 sansvm1_data1
1034 up/up 192.168.6.131 node3 e0d true
sansvm1 sansvm1_data2
1035 up/up 192.168.6.132 node4 e0d true
sansvm2 sansvm2_data1
1036 up/up 192.168.6.135 node3 e0d true
sansvm2 sansvm2_data2
1037 up/up 192.168.6.136 node4 e0d true

4-6. Now, add a second session. From the iSCSI Initiator Properties window, click Properties
while sansvm1 is still selected. Click Add Session. In the Connect To Target pop-up
window, select the Enable multi-path check box. Click Advanced. Use the pull-down menus
to set the following values:
Local adapter: Microsoft iSCSI Initiator
Initiator IP: 192.168.6.11
Target Portal IP: Choose the IP of the 2nd LIF of sansvm1.
4-7. How many disks do you see? Use Disk Management to check this.
(Remember that MPIO is not set for the iSCSI initiator.)
4-8. To identify the session through which you see the disk, disconnect from each session and note
the session and the target portal tag through which you see the disk.
4-9. Explain the behavior.

This is because of the Selective LUN Map (SLM) feature.


Take a look at the output of the following command:
cluster1::*> lun mapping show -fields reporting-nodes
vserver path igroup reporting-nodes
------- --------------------- --------------- ---------------
sansvm1 /vol/sansvm_data/lun1 iscsi_group_lin node3
sansvm1 /vol/sansvm_data/lun2 iscsi_group_lin node3
sansvm1 /vol/sansvm_data/lun3 iscsi_group_win node3
sansvm2 /vol/sansvm_data/lun1 iscsi_group_lin node4
sansvm2 /vol/sansvm_data/lun2 iscsi_group_lin node4
sansvm2 /vol/sansvm_data/lun3 iscsi_group_win node4

For sansvm1, the LUN is reported only through node3, so the LUN is visible only through the
LIF that is local to node3.
For sansvm2, the LUN is reported only through node4, so the LUN is visible only through the
LIF that is local to node4.

Check where the LIFs for vserver sansvm1 exist: on node3 and node4. Delete the LIF
sansvm1_data2 from node4 and re-create it on node3. When both LIFs are on the node where the
LUN is local, the LUN is visible as a disk through both LIFs.

cluster1::*> net int show -vserver sansvm1


(network interface show)
Logical Status Network Current Current Is
Vserver Interface Admin/Oper Address/Mask Node Port Home
----------- ---------- ---------- ------------------ ------------- ------- ----
sansvm1
sansvm1_data1
up/up 192.168.6.131/24 node3 e0d true
sansvm1_data2
up/up 192.168.6.132/24 node4 e0d true
2 entries were displayed.
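Two hedged sketches of the fix. Re-homing the LIF (SAN LIFs cannot be migrated while up, so this assumes taking the LIF down first):

cluster1::> net int modify -vserver sansvm1 -lif sansvm1_data2 -status-admin down
cluster1::> net int modify -vserver sansvm1 -lif sansvm1_data2 -home-node node3 -home-port e0d
cluster1::> net int modify -vserver sansvm1 -lif sansvm1_data2 -status-admin up

Alternatively, widen the SLM reporting nodes instead of moving the LIF:

cluster1::> lun mapping add-reporting-nodes -vserver sansvm1 -path /vol/sansvm_data/lun3 -igroup iscsi_group_win -nodes node4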

End of Exercise



Module 7: Troubleshoot Multiple Problems

TO DO in the scripts:
Convert the initial set of commands in SuperLabSetup1 to just use ssh login and execute the command on
the cluster shell instead of using the NMSDK API.

In the SuperLabSetup2, will have to change the domain name for the CIFS server creation command.
Need to fix the command that accepts the username and password to log into the Domain controller while
creating a machine account.

The command doesn’t work:


Error: Machine account creation procedure failed
[ 0 ms] Trying to create machine account 'HAPPY' in
'CATS.CSSLP.NETAPP.COM' for Vserver 'happy'
[ 10] No servers found in DNS lookup for
_ldap._tcp.CATS.CSSLP.NETAPP.COM.
[ 10] No servers available for MS_LDAP_AD, vserver: 9, domain:
CATS.CSSLP.NETAPP.COM.
[ 10] Cannot find any domain controllers; verify the domain
name and the node's DNS configuration
**[ 10] FAILURE: Unable to connect to any (0) domain controllers.
[ 10] 'NisDomain' configuration not available
[ 10] NIS configuration not found for Vserver 9
[ 15] No servers found in DNS lookup for
_ldap._tcp.dc._msdcs.CATS.CSSLP.NETAPP.COM.
[ 20] No servers found in DNS lookup for
_ldap._tcp.CATS.CSSLP.NETAPP.COM.
[ 24] No servers found in DNS lookup for
_kerberos._tcp.CATS.CSSLP.NETAPP.COM.
[ 24] No servers available for MS_LDAP_AD, vserver: 9, domain:
CATS.CSSLP.NETAPP.COM.
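All of the failures above point to name resolution for the SVM. A quick check is to review the Vserver DNS configuration and confirm that the name server is reachable, for example:

cluster1::> vserver services dns show -vserver happy
cluster1::> network ping -node node4 -destination 192.168.6.10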

By the end of this exercise, you should be able to:

- Recover a broken environment by using the skills that you have learned in this course.

Scenario:

A customer calls to report that they cannot write to some of the mounts and shares.



SETUP:

Scripted in file SuperLabSetup1

cluster1::> aggr create -aggregate aggrsuper -diskcount 5 -nodes node4

cluster1::> vserver create -vserver happy -rootvolume happy_root -aggregate aggrsuper -ns-switch file -nm-switch file -rootvolume-security-style unix

cluster1::> vserver create -vserver grumpy -rootvolume grumpy_root -aggregate aggrsuper -ns-switch file -nm-switch file -rootvolume-security-style unix

cluster1::> vserver services dns create -vserver happy -domains cats.csslp.netapp.com -name-servers 192.168.6.10 -state enabled

cluster1::> vserver services dns create -vserver grumpy -domains cats.csslp.netapp.com -name-servers 192.168.6.10 -state enabled

cluster1::> network interface create -vserver happy -lif happy_data1 -role data -data-protocol nfs,cifs,fcache -home-node node4 -home-port e0d -address 192.168.6.160 -netmask 255.255.255.0

cluster1::> network interface create -vserver grumpy -lif grumpy_data1 -role data -data-protocol nfs,cifs,fcache -home-node node4 -home-port e0d -address 192.168.6.161 -netmask 255.255.255.0

cluster1::> export-policy create -vserver happy -policyname happy_policy

cluster1::> export-policy create -vserver grumpy -policyname grumpy_policy

cluster1::> export-policy create -vserver grumpy -policyname grumpy

cluster1::> export-policy rule create -vserver happy -policyname happy_policy -clientmatch 0.0.0.0/0 -rorule any -rwrule none -ruleindex 1 -protocol any -anon 65534 -superuser any

cluster1::> export-policy rule create -vserver grumpy -policyname grumpy_policy -clientmatch 0.0.0.0/0 -rorule any -rwrule none -ruleindex 1 -protocol any -anon 65534 -superuser any

cluster1::> export-policy rule create -vserver grumpy -policyname grumpy -clientmatch 0.0.0.0/0 -rorule any -rwrule none -ruleindex 1 -protocol any -anon 65534 -superuser any

cluster1::> export-policy rule create -vserver grumpy -policyname default -clientmatch 0.0.0.0/0 -rorule any -rwrule any -ruleindex 1 -protocol any -anon 65534 -superuser any

cluster1::> export-policy rule create -vserver happy -policyname default -clientmatch 0.0.0.0/0 -rorule any -rwrule any -ruleindex 1 -protocol any -anon 65534 -superuser any

cluster1::> volume create -vserver grumpy -volume grumpy_cifs -aggregate aggrsuper -size 200M -state online -type RW -unix-permissions 777 -junction-path /grumpy_cifs -policy default

cluster1::> volume create -vserver grumpy -volume grumpy_nfs -aggregate aggrsuper -size 200M -state online -type RW -unix-permissions 777 -junction-path /grumpy_nfs -policy default

cluster1::> volume create -vserver happy -volume happy_cifs -aggregate aggrsuper -size 200M -state online -type RW -unix-permissions 777 -junction-path /happy_cifs -policy default

cluster1::> volume create -vserver happy -volume happy_nfs -aggregate aggrsuper -size 200M -state online -type RW -unix-permissions 777 -junction-path /happy_nfs -policy default

Scripted in file SuperLabSetup2




cluster1::> nfs server create -access true -v3 enabled -vserver grumpy

cluster1::> nfs server create -access true -v3 enabled -vserver happy

cluster1::> nfs server start -vserver happy

cluster1::> nfs server start -vserver grumpy

cluster1::> cluster time-service ntp server create -server 192.168.6.10

cluster1::> cifs create -vserver happy -cifs-server happy -domain cats.csslp.netapp.com

cluster1::> cifs create -vserver grumpy -cifs-server grumpy -domain cats.csslp.netapp.com

cluster1::> cifs share create -vserver happy -share-name happy -path /happy_cifs -share-properties oplocks,browsable,changenotify

cluster1::> cifs share create -vserver grumpy -share-name grumpy -path /grumpy_cifs -share-properties oplocks,browsable,changenotify

Scripted in file SuperLabSetup3

cluster1::> network interface create -vserver happy -lif happy_iscsi -role data -data-protocol iscsi -home-node node4 -home-port e0d -address 192.168.6.164 -netmask 255.255.255.0

cluster1::> network interface create -vserver grumpy -lif grumpy_iscsi -role data -data-protocol iscsi -home-node node4 -home-port e0d -address 192.168.6.165 -netmask 255.255.255.0

cluster1::> vol create -vserver grumpy -volume grumpy_iscsi -aggregate aggrsuper -size 600m -state online -type RW -policy default -unix-permissions ---rwxr-xr-x

cluster1::> vol create -vserver happy -volume happy_iscsi -aggregate aggrsuper -size 600m -state online -type RW -policy default -unix-permissions ---rwxr-xr-x

cluster1::> lun create -vserver grumpy -path /vol/grumpy_iscsi/grumpy_win_lun -size 200m -ostype windows -space-reserve enabled

cluster1::> lun create -vserver happy -path /vol/happy_iscsi/happy_win_lun -size 200m -ostype windows -space-reserve enabled

cluster1::> lun create -vserver grumpy -path /vol/grumpy_iscsi/grumpy_lun -size 200m -ostype linux -space-reserve enabled

cluster1::> lun create -vserver happy -path /vol/happy_iscsi/happy_lun -size 200m -ostype linux -space-reserve enabled



- Create DNS A records for happy, grumpy, happy_iscsi, and grumpy_iscsi (to be done manually on the domain controller):

happy 192.168.6.160
grumpy 192.168.6.161
happy_iscsi 192.168.6.164
grumpy_iscsi 192.168.6.165
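One way to create the records, assuming the DnsServer PowerShell module is available on the domain controller (a sketch; repeat for each host name):

PS C:\> Add-DnsServerResourceRecordA -ZoneName cats.csslp.netapp.com -Name happy -IPv4Address 192.168.6.160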

- Linux NFS mounts


[root@cats-cent ~]# mkdir /happy_nfs_mount
[root@cats-cent ~]# mkdir /happy_cifs_mount
[root@cats-cent ~]# mkdir /grumpy_nfs_mount
[root@cats-cent ~]# mkdir /grumpy_cifs_mount
[root@cats-cent ~]# mount 192.168.6.160:/happy_cifs /happy_cifs_mount/
[root@cats-cent ~]# mount 192.168.6.160:/happy_nfs /happy_nfs_mount/
[root@cats-cent ~]# mount 192.168.6.161:/grumpy_nfs /grumpy_nfs_mount/
[root@cats-cent ~]# mount 192.168.6.161:/grumpy_cifs /grumpy_cifs_mount/

- iSCSI configuration
[root@cats-cent ~]# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:cats-cent

cluster1::> iscsi create -vserver grumpy


cluster1::> iscsi create -vserver happy

[root@capt-cent grumpy]# iscsiadm --mode discovery --op update --type sendtargets --portal 192.168.6.164
Starting iscsid: [ OK ]
192.168.6.164:3260,1044 iqn.1992-08.com.netapp:sn.952455a27c9711e5ab27005056bf11fa:vs.16
[root@capt-cent grumpy]# iscsiadm --mode discovery --op update --type sendtargets --portal 192.168.6.165
192.168.6.165:3260,1045 iqn.1992-08.com.netapp:sn.b2bc17da7c9711e5ab27005056bf11fa:vs.17

[root@capt-cent grumpy]# iscsiadm --mode node -l all


Logging in to [iface: default, target: iqn.1992-08.com.netapp:sn.952455a27c9711e5ab27005056bf11fa:vs.16, portal:
192.168.6.164,3260] (multiple)
Logging in to [iface: default, target: iqn.1992-08.com.netapp:sn.b2bc17da7c9711e5ab27005056bf11fa:vs.17, portal:
192.168.6.165,3260] (multiple)
Login to [iface: default, target: iqn.1992-08.com.netapp:sn.952455a27c9711e5ab27005056bf11fa:vs.16, portal:
192.168.6.164,3260] successful.
Login to [iface: default, target: iqn.1992-08.com.netapp:sn.b2bc17da7c9711e5ab27005056bf11fa:vs.17, portal:
192.168.6.165,3260] successful.

cluster1::> iscsi show


Target Target Status
Vserver Name Alias Admin
---------- -------------------------------- ---------------------------- ------
grumpy iqn.1992-08.com.netapp:sn.b2bc17da7c9711e5ab27005056bf11fa:vs.17
grumpy up
happy iqn.1992-08.com.netapp:sn.952455a27c9711e5ab27005056bf11fa:vs.16
happy up
student1 iqn.1992-08.com.netapp:sn.32962fef36a011e5bd4a005056af0bb1:vs.5
student1 up
student2 iqn.1992-08.com.netapp:sn.3905090b36a011e5bd4a005056af0bb1:vs.6
student2 up

cluster1::> iscsi initiator show


Tpgroup Initiator



Vserver Name TSIH Name ISID Igroup Name
------- -------- ---- --------------------- ----------------- -----------------
grumpy grumpy_iscsi
4 iqn.1994-05.com.redhat:d3f86c215967
00:02:3d:02:00:00 -
happy happy_iscsi 3 iqn.1994-05.com.redhat:d3f86c215967
00:02:3d:01:00:00 -

cluster1::> igroup create -vserver grumpy -igroup linux_group -protocol iscsi -ostype linux -initiator iqn.1994-05.com.redhat:cats-cent

cluster1::> igroup create -vserver happy -igroup linux_group -protocol iscsi -ostype linux -initiator iqn.1994-05.com.redhat:cats-cent

cluster1::> lun map -vserver grumpy -path /vol/grumpy_iscsi/grumpy_lun -igroup linux_group

cluster1::> lun map -vserver happy -path /vol/happy_iscsi/happy_lun -igroup linux_group

[root@cats-cent ~]# cd /happy_cifs_mount/


[root@cats-cent ~]# rpm -Uvh netapp_linux_unified_host_utilities-7-0.x86_64.rpm
[root@cats-cent ~]# sanlun lun show
[root@capt-cent /]# rescan-scsi-bus.sh
[root@capt-cent dev]# fdisk /dev/sdf
[root@capt-cent dev]# mkfs.ext4 /dev/sdf
[root@capt-cent dev]# mkdir /happy_iscsi
[root@capt-cent dev]# fdisk /dev/sdg
[root@capt-cent dev]# mkfs.ext4 /dev/sdg
[root@capt-cent dev]# mkdir /grumpy_iscsi
[root@capt-cent /]# service multipathd reload
[root@capt-cent /]# multipath -ll
[root@capt-cent /]# cp /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf /etc/multipath.conf
[root@capt-cent /]# mount /dev/dm-6 /happy_iscsi/
[root@capt-cent /]# mount /dev/dm-7 /grumpy_iscsi/



Configure NTP:
w32tm /config /manualpeerlist:pool.ntp.org /syncfromflags:MANUAL
Stop-Service w32time
Start-Service w32time

Windows iSCSI configuration

sc \\localhost config msiscsi start= auto


sc \\localhost start msiscsi
sc \\localhost query msiscsi
C:\>iscsicli.exe QAddTargetPortal 192.168.6.164
C:\>iscsicli.exe QAddTargetPortal 192.168.6.165
C:\>iscsicli.exe ListTargets
Microsoft iSCSI Initiator Version 6.2 Build 9200
Targets List:
iqn.1992-08.com.netapp:sn.c69fc7288fbf11e5aefc005056bf00ea:vs.18
iqn.1992-08.com.netapp:sn.cb322f8f8fbf11e5aefc005056bf00ea:vs.19
C:\>iscsicli.exe QLoginTarget iqn.1992-08.com.netapp:sn.c69fc7288fbf11e5aefc005056bf00ea:vs.18
C:\>iscsicli.exe QLoginTarget iqn.1992-08.com.netapp:sn.cb322f8f8fbf11e5aefc005056bf00ea:vs.19
C:\>iscsicli.exe iqn.1991-05.com.microsoft:cats-win-2.cats.csslp.netapp.com

cluster1::> igroup create -vserver happy -igroup win_group -protocol iscsi -ostype windows -initiator iqn.1991-05.com.microsoft:cats-win-2.cats.csslp.netapp.com
cluster1::> igroup create -vserver grumpy -igroup win_group -protocol iscsi -ostype windows -initiator iqn.1991-05.com.microsoft:cats-win-2.cats.csslp.netapp.com
cluster1::> lun map -vserver happy -path /vol/happy_iscsi/happy_win_lun -igroup win_group
cluster1::> lun map -vserver grumpy -path /vol/grumpy_iscsi/grumpy_win_lun -igroup win_group

C:\Windows> compmgmt.msc
Disk Management > Rescan Disks > Online disks > Initialize disks > Create volumes on the disks
Disks F and G are created.



Super Lab verification

[root@cats-cent ~]# mount


/dev/mapper/vg_captcent-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/mapper/mpathe on /happy_iscsi type ext4 (rw)
/dev/mapper/mpathf on /grumpy_iscsi type ext4 (rw)
happy:/happy_cifs on /happy_cifs_mount type nfs (rw,addr=192.168.6.160)
happy:/happy_nfs on /happy_nfs_mount type nfs (rw,addr=192.168.6.160)
grumpy:/grumpy_nfs on /grumpy_nfs_mount type nfs (rw,addr=192.168.6.161)
grumpy:/grumpy_cifs on /grumpy_cifs_mount type nfs (rw,addr=192.168.6.161)



Super Lab Manual Break
- Change the export policy of grumpy_cifs and grumpy_nfs to grumpy
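For example (a sketch using volume modify):

cluster1::> volume modify -vserver grumpy -volume grumpy_cifs -policy grumpy
cluster1::> volume modify -vserver grumpy -volume grumpy_nfs -policy grumpy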

- Unmount grumpy_cifs and grumpy_nfs
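On the Linux host, for example:

[root@cats-cent ~]# umount /grumpy_cifs_mount
[root@cats-cent ~]# umount /grumpy_nfs_mount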

- Change the grumpy export policy to deny CIFS and NFS access and to disallow superuser access
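A sketch of one way to do this, by modifying the existing rule:

cluster1::> export-policy rule modify -vserver grumpy -policyname grumpy -ruleindex 1 -rorule never -rwrule never -superuser none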

- Disable and delete the grumpy_data1 LIF
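For example:

cluster1::> network interface modify -vserver grumpy -lif grumpy_data1 -status-admin down
cluster1::> network interface delete -vserver grumpy -lif grumpy_data1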


- Recreate grumpy_data1 and assign it to nassvm1

cluster1::> network interface create -vserver nassvm1 -lif grumpy_data1 -role data -data-protocol nfs,cifs,fcache -home-node node4 -home-port e0d -address 192.168.6.161 -netmask 255.255.255.0

- Disable NFS completely


cluster1::> nfs server stop -vserver grumpy

- Delete cluster time services


cluster1::*> cluster time-service ntp server delete -server 192.168.6.10

- Edit the initiator names as follows:

iqn.1994-05.com.redhat:castp-cent (from iqn.1994-05.com.redhat:cats-cent)
iqn.1991-05.com.microsoft:castp-win-2.cats.csslp.netapp.com (from iqn.1991-05.com.microsoft:cats-win-2.cats.csslp.netapp.com)
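On the Linux host, a minimal sketch (any active iSCSI sessions should be logged out first):

[root@cats-cent ~]# sed -i 's/cats-cent/castp-cent/' /etc/iscsi/initiatorname.iscsi
[root@cats-cent ~]# service iscsid restart

On the Windows host, the initiator name can be changed on the Configuration tab of the iSCSI Initiator control panel.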

- Change the igroup names from linux_group to linux_groups and from win_group to win_groups
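For example (repeat for vserver happy):

cluster1::> igroup rename -vserver grumpy -igroup linux_group -new-name linux_groups
cluster1::> igroup rename -vserver grumpy -igroup win_group -new-name win_groups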

- Change the grumpy and grumpy_iscsi A records to 192.168.6.171 and 192.168.6.174, respectively
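One way on the domain controller, assuming the DnsServer PowerShell module (repeat for grumpy_iscsi with 192.168.6.174):

PS C:\> Remove-DnsServerResourceRecord -ZoneName cats.csslp.netapp.com -RRType A -Name grumpy -Force
PS C:\> Add-DnsServerResourceRecordA -ZoneName cats.csslp.netapp.com -Name grumpy -IPv4Address 192.168.6.171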

- Expire the student1 password in the AD
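For example, assuming the ActiveDirectory PowerShell module:

PS C:\> Set-ADUser -Identity student1 -ChangePasswordAtLogon $true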

- Disable the account, remove it from Administrators, and make sure that it belongs only to the
Domain Users group.
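For example:

PS C:\> Disable-ADAccount -Identity student1
PS C:\> Remove-ADGroupMember -Identity Administrators -Members student1 -Confirm:$false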

- Disable SMB1 on the DC (see https://support.microsoft.com/en-us/kb/2696547):


PS C:\Users\Administrator> Set-SmbServerConfiguration -EnableSMB1Protocol $false
Confirm
Are you sure you want to perform this action?
Performing operation 'Modify' on Target 'SMB Server Configuration'.
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help (default is "Y"): Y

PS C:\Users\Administrator> Get-SmbServerConfiguration | Select-Object EnableSMB1Protocol

EnableSMB1Protocol : False



1. TASK 1: RECOVER THE BROKEN ENVIRONMENT BY USING THE SKILLS YOU HAVE
LEARNED

STEP ACTION

1. At the end of this lab, you should have the following mounted on your Linux host.

[root@cats-cent ~]# mount

/dev/mapper/vg_captcent-lv_root on / type ext4 (rw)

proc on /proc type proc (rw)

sysfs on /sys type sysfs (rw)

devpts on /dev/pts type devpts (rw,gid=5,mode=620)

tmpfs on /dev/shm type tmpfs (rw)

sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

/dev/mapper/mpathe on /happy_iscsi type ext4 (rw)

/dev/mapper/mpathf on /grumpy_iscsi type ext4 (rw)

happy:/happy_cifs on /happy_cifs_mount type nfs (rw,addr=192.168.6.160)

happy:/happy_nfs on /happy_nfs_mount type nfs (rw,addr=192.168.6.160)

grumpy:/grumpy_nfs on /grumpy_nfs_mount type nfs (rw,addr=192.168.6.161)

grumpy:/grumpy_cifs on /grumpy_cifs_mount type nfs (rw,addr=192.168.6.161)

2. On the Windows host, you should be able to map the grumpy and happy shares (\\grumpy\grumpy and \\happy\happy).

3. You should be able to write to all the mounts and the mapped drives you recovered above.



Issues:
1. When running the following command, no discovered servers are shown for nassvm1:

cluster1::*> cifs domain discovered-servers show -vserver nassvm2

Node: node1
Vserver: nassvm2

Domain Name Type Preference DC-Name DC-Address Status


--------------- -------- ---------- --------------- --------------- ---------
catsp.csslp.netapp.com
KERBEROS adequate catsp-win-1 192.168.6.10 undetermined
catsp.csslp.netapp.com
MS-LDAP adequate catsp-win-1 192.168.6.10 undetermined
catsp.csslp.netapp.com
MS-DC adequate catsp-win-1 192.168.6.10 OK
3 entries were displayed.

cluster1::*> cifs domain discovered-servers show -vserver nassvm1


There are no entries matching your query.

2. No name mapping in the configuration

3. Cannot run the scripts to break the labs. The cluster management IP can be pinged from the RDP
machine, but the script fails because it reports that the target denied access.

4. Check this command in ONTAP 8.3:


cluster1::*> node reboot -node node2

In ONTAP 9.2, it fails with the following error:


cluster1::*> node reboot -node node2

Warning: Are you sure you want to reboot node "node2"? {y|n}: y

Error: command failed: Could not migrate LIFs away from node: Failed to migrate
one or more LIFs away from node "node2". Use the "network interface show
-curr-node node2" command to review the status of any remaining LIFs on
that node.

Reissue the command with "-skip-lif-migration-before-reboot" to skip the
migration and continue with takeover.

