
ONTAP Troubleshooting

Exercise Guide
Content Version 2
NETAPP UNIVERSITY

Course ID: CP-ILT-CATSP
Catalog Number: CP-ILT-CATSP-EG

NetApp University - Do Not Distribute


ATTENTION
The information contained in this course is intended only for training. This course contains information and activities that,
while beneficial for the purposes of training in a closed, non-production environment, can result in downtime or other
severe consequences in a production environment. This course material is not a technical reference and should not,
under any circumstances, be used in production environments. To obtain reference materials, refer to the NetApp product
documentation that is located at http://now.netapp.com/.

COPYRIGHT
© 2017 NetApp, Inc. All rights reserved. Printed in the U.S.A. Specifications subject to change without notice.
No part of this document covered by copyright may be reproduced in any form or by any means—graphic, electronic, or
mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system—without prior written
permission of NetApp, Inc.

U.S. GOVERNMENT RIGHTS


Commercial Computer Software. Government users are subject to the NetApp, Inc. standard license agreement and
applicable provisions of the FAR and its supplements.

TRADEMARK INFORMATION
NETAPP, the NETAPP logo, and the marks listed at http://www.netapp.com/TM are trademarks of NetApp, Inc. Other
company and product names may be trademarks of their respective owners.

© 2017 NetApp, Inc. This material is intended only for training. Reproduction is not authorized.


TABLE OF CONTENTS
INTRODUCTION
MODULE 0: WELCOME
MODULE 1: TROUBLESHOOTING THE MANAGEMENT COMPONENT
MODULE 2: TROUBLESHOOTING SCALE-OUT NETWORKING
MODULE 3: TROUBLESHOOTING NETWORK COMPONENT AND SECURITY SERVICES
MODULE 4: TROUBLESHOOTING NFS
MODULE 5: TROUBLESHOOTING SMB
MODULE 6: TROUBLESHOOTING SCALABLE SAN



Getting Started
Study Aid Icons
In your exercises, you might see one or more of the following icons.
Warning
If you misconfigure a step marked with this icon, later steps might not work properly.
Check the step carefully before you move forward.
Attention
Review this step or comment carefully to save time, learn a best practice, or avoid errors.
Information
A comment labeled with this icon provides information about the topic or procedure.
References
A comment labeled with this icon identifies reference material that provides additional
information.

Exercise Equipment
The student lab environment consists of one vApp for each student.
The vApp is labeled OTS_X0Y, where X is the set number and Y is the student vApp number.

Remote Desktop Protocol Access


Use a VPN to access advtraining.netapp.com with the VPN credentials that are assigned to you. Use
Remote Desktop Protocol (RDP) to connect to the access host for the lab that you want to access. The
access host is aptly named cats-access. You should have received the IP address. You can also find the IP
address by accessing the Virtual Machines tab in the vApp interface.
Console Access
Use a VPN to access advtraining.netapp.com with the VPN credentials that are assigned to you. Log in to
https://adv-vcloud.gsedu.ngslabs.netapp.com/cloud/org/prod/#/vAppListPage? with the credentials that
are assigned to you. This page gives you access to the vApps for labs. Double-click any host to access the
console.



Module 0: Welcome
In this exercise, you check the health of the NetApp ONTAP software cluster that you use for all the
subsequent exercises.

Objectives
This exercise focuses on enabling you to do the following:
• Conduct a full and comprehensive health check of an ONTAP cluster
• Access the cluster using OnCommand System Manager (System Manager)
• Access cluster and node log files through HTTPS

Task 1: Complete a General Health Check


Use the lab information that is provided to you by your instructor (credentials and the IP addresses) to
complete the following steps.
Step Action
1-1. Log in to the following VPN: advtraining.netapp.com.

1-2. Open the Remote Desktop Connection (RDC) application and connect to your access host.

1-3. Launch PuTTY from the desktop of the access host.

1-4. Connect to the cluster1 management interface.

1-5. Run the following commands to do a complete health check of the cluster.
• cluster1::*> network interface show
• cluster1::*> cluster show
• cluster1::*> cluster ring show
• cluster1::*> storage failover show
• cluster1::*> event log show -severity WARNING
• cluster1::*> event log show -severity EMERGENCY



Task 2: Complete an iSCSI General Health Check
Step Action
2-1. Run the following commands:
• cluster1::*> vserver show -vserver sansvm1
• cluster1::*> vserver show -vserver sansvm2
• cluster1::*> iscsi show -instance
• cluster1::*> igroup show -instance
• cluster1::*> lun show -instance
• cluster1::*> lun mapping show
• cluster1::*> network interface show -vserver sansvm1 -instance
• cluster1::*> network interface show -vserver sansvm2 -instance
• cluster1::*> volume show -vserver sansvm1 -fields junction-path, junction-parent, user, group, policy, unix-permissions, security-style
• cluster1::*> volume show -vserver sansvm2 -fields junction-path, junction-parent, user, group, policy, unix-permissions, security-style



Task 3: Complete a CIFS Health Check
Step Action
3-1. Run the following commands:
• cluster1::*> vserver show -vserver nassvm1
• cluster1::*> vserver show -vserver nassvm2
• cluster1::*> cifs options show
• cluster1::*> cifs options show -fields is-exportpolicy-enabled
• cluster1::*> cifs security show
• cluster1::*> cifs share show -fields vserver, share-name, cifs-server, path, share-properties, volume
• cluster1::*> volume show -fields junction-path, junction-parent, user, group, policy, unix-permissions, security-style
• cluster1::*> vserver services dns show
• cluster1::*> network interface show -vserver nassvm1
• cluster1::*> network interface show -vserver nassvm2
• cluster1::*> network interface show -vserver nassvm1 -instance
• cluster1::*> network interface show -vserver nassvm2 -instance
• cluster1::*> cifs domain discovered-servers show -vserver nassvm1
• cluster1::*> cifs domain discovered-servers show -vserver nassvm2
• cluster1::*> unix-user show -vserver nassvm1
• cluster1::*> unix-user show -vserver nassvm2
• cluster1::*> unix-group show -vserver nassvm1
• cluster1::*> unix-group show -vserver nassvm2
• cluster1::*> vserver name-mapping show -vserver nassvm1
• cluster1::*> vserver name-mapping show -vserver nassvm2
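Many of the commands in this task are the same command repeated once per SVM. The following Python sketch shows one way to generate that per-SVM list so that it can be pasted into a session. The helper name commands_for and the template list are illustrative assumptions and are not part of the course material. The command strings themselves are copied from the list in this task.

```python
# Hypothetical helper: expands per-SVM command templates taken from Task 3.
PER_SVM_COMMANDS = [
    "vserver show -vserver {svm}",
    "network interface show -vserver {svm}",
    "network interface show -vserver {svm} -instance",
    "cifs domain discovered-servers show -vserver {svm}",
    "unix-user show -vserver {svm}",
    "unix-group show -vserver {svm}",
    "vserver name-mapping show -vserver {svm}",
]


def commands_for(svms):
    """Return each template expanded for every SVM, grouped by command."""
    return [
        template.format(svm=svm)
        for template in PER_SVM_COMMANDS
        for svm in svms
    ]


if __name__ == "__main__":
    # SVM names as used in this lab environment.
    for line in commands_for(["nassvm1", "nassvm2"]):
        print(line)
```

The script only prints text. It does not connect to the cluster, so you still run each command yourself and review the output.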



Task 4: Complete an NFS Health Check
Step Action
4-1. Run the following commands:
• cluster1::*> vserver show -vserver nassvm1
• cluster1::*> vserver show -vserver nassvm2
• cluster1::*> nfs status -vserver nassvm1
• cluster1::*> nfs status -vserver nassvm2
• cluster1::*> nfs show
• cluster1::*> nfs show -instance
• cluster1::*> network interface show -vserver nassvm1
• cluster1::*> network interface show -vserver nassvm2
• cluster1::*> network ping -node node3 -destination 192.168.6.20
• cluster1::*> network ping -node node4 -destination 192.168.6.20
• cluster1::*> volume show -fields junction-path, junction-parent, user, group, policy, unix-permissions, security-style, policy -vserver nassvm1
• cluster1::*> volume show -fields junction-path, junction-parent, user, group, policy, unix-permissions, security-style, policy -vserver nassvm2
• cluster1::*> export-policy rule show -policyname default
• cluster1::*> export-policy rule show -policyname default -instance
• cluster1::*> export-policy rule show -policyname policy1
• cluster1::*> export-policy rule show -policyname policy1 -instance



Task 5: Access the Cluster Using OnCommand System Manager
Step Action
5-1. Access System Manager using the cluster management LIF: https://<cluster-mgmt-ip>/

5-2. Log in using the cluster credentials.

5-3. Verify that System Manager is launched.

5-4. Review System Manager.



Task 6: View Log Files Using HTTPS
Step Action
6-1. Use one of the following knowledge base articles to complete this exercise:
• https://kb.netapp.com/support/index?page=content&id=1013814
• https://kb.netapp.com/support/s/article/ka31A0000000uYnQAI/how-to-enable-remote-access-to-a-nodes-root-volume-in-a-cluster?language=en_US

6-2. NOTE: In ONTAP 8.3 software, you do not need to enable the web services and HTTP.

6-3. Connect to the cluster1 management interface using the administrator account from the access
host. You might want to enable logging and save all your session output.

6-4. Identify the names of the nodes in the cluster.

6-5. Access the URLs to view the log directory on each node. You must log in using the cluster
administration credentials.
https://<cluster-mgmt-ip>/spi/<node_name>/etc/log/

6-6. Access the URLs to view the directory in which the core files are saved on each node.
https://<cluster-mgmt-ip>/spi/<node_name>/etc/crash/
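Steps 6-5 and 6-6 use the same URL pattern for every node. The following Python sketch builds the log and crash directory URLs for a list of node names. The function name spi_urls and the IP address 192.0.2.10 are placeholders chosen for illustration. Substitute the cluster management IP address and the node names that you identified in Step 6-4.

```python
from urllib.parse import quote


def spi_urls(cluster_mgmt_ip, node_names):
    """Build the per-node log and crash directory URLs from Steps 6-5 and 6-6."""
    urls = {}
    for node in node_names:
        # quote() guards against characters that are not URL-safe in a node name.
        base = f"https://{cluster_mgmt_ip}/spi/{quote(node)}/etc"
        urls[node] = {"log": f"{base}/log/", "crash": f"{base}/crash/"}
    return urls


if __name__ == "__main__":
    # 192.0.2.10 is a documentation placeholder, not a lab address.
    for node, paths in spi_urls("192.0.2.10", ["node1", "node2"]).items():
        print(node, paths["log"], paths["crash"])
```

Open each printed URL in a browser and log in with the cluster administration credentials, as described in Step 6-5.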

End of Exercise



Module 1: Troubleshooting the Management Component
In this exercise, you create a cluster backup, identify issues that cause mgwd not to function properly, and solve those issues.

Objectives
This exercise focuses on enabling you to do the following:
• Recover a replicated database (RDB) configuration
• Resolve RDB replication problems
• Perform cluster and node backups
• Resolve an issue with /mroot

Task 1: Backup and Recovery of an RDB Configuration


Step Action
1-1. List two methods (or types) of recovery for a system configuration.

1-2. List two frequent reasons that could cause a cluster configuration backup to fail.

1-3. A customer sees the following error message on the console:
cluster_backup_job::check_for_node_backups - Node: %s Backup Errored.
Explain what this error message tells you.

1-4. Identify a knowledge base article to resolve the error message in Step 1-3.

1-5. List the command to display the default backup schedule.

1-6. List the command that verifies that the scheduled backups were created and distributed within
the cluster.

1-7. List the command that you can use to recover a node’s configuration.



Task 2: Create a Cluster Backup
Step Action
2-1. List the current backups on node1 and the command that you run.

2-2. Specify the location of the backup files.

2-3. On node1, start a job to create a system configuration backup of the entire cluster and note the
job ID number.

2-4. Before the job finishes, review the job that you have created:
cluster1::*> job show
cluster1::*> job show -id <ID#>
(You use the job ID from the backup create command.)
cluster1::*> job show -id <ID#> -fields uuid
cluster1::*> job show -uuid UUID_from_the_previous_command



Task 3: Resolve an Unhealthy Node
Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Scenario: A customer reports an unhealthy node.
Step Action
3-1. Answer the following questions:
• What is the current state of the cluster?
• Is the cluster healthy?
• Which command did you use to check the health of the cluster?
3-2. List the status of the RDB.

3-3. Answer this question:


Does a reboot of the unhealthy node fix the issue?

3-4. Recover the node.



Task 4: Resolve an Issue with /mroot
Step Action
4-1. Log in to the node management interface for node2, and then enter the systemshell.

4-2. Identify the process ID of the management gateway process.

4-3. Stop the management gateway process.

4-4. Explain what happened and the reason.

4-5. Log in to the node management interface for node2 again, and answer these questions:
• Are you able to log in?
• Why or why not?
4-6. Log back in to the systemshell of node2.

4-7. From the systemshell of node2, unmount mroot by using the following commands:
% cd /etc
% sudo ./netapp_mroot_unmount
% exit
4-8. Log in to the cluster management session, and check the cluster health.

4-9. Attempt to modify the volume nassvm1_nfs:


cluster1::*> vol modify -vserver nassvm1 -volume nassvm1_nfs -size 2G
Volume modify successful on volume nassvm1_nfs of Vserver nassvm1.

4-10. Attempt to modify the volume nassvm2_nfs:


cluster1::*> vol modify -vserver nassvm2 -volume nassvm2_nfs -size 2G
Volume modify successful on volume nassvm2_nfs of Vserver nassvm2.

Info: Node node2 that hosts aggregate aggrnas2 is offline

4-11. Check the cluster health again, and answer these questions:
• Do you see a difference?
• If so, why?
• What is nonoperational?
4-12. Fix this problem, and answer this question:
How did you verify that /mroot is mounted?

End of Exercise



Module 2: Troubleshooting Scale-Out Networking
In this exercise, you resolve customer issues that relate to networking problems.

Objectives
This exercise focuses on enabling you to do the following:
• Identify the network component and the data component interaction
• Outline the networking implications of upgrading to ONTAP 8.3 software
• Use network triage tools
• Describe the implications of vifmgr going Out of Quorum (OOQ)

Task 1: Given a Scenario, Explain the Implications of Vifmgr Going Out of Quorum (Paper Lab)
Step Action
1-1. A customer called in and provided the following output. Examine the output.
cluster::*> cluster ring show
Node       UnitName Epoch    DB Epoch DB Trnxs Master     Online
---------- -------- -------- -------- -------- ---------- ---------
cluster-01 mgmt     4        4        136154   cluster-02 secondary
cluster-01 vldb     4        4        76       cluster-02 secondary

Error: rdb_ring_info: RDB ring state query of 127.0.0.1 for vifmgr failed on
RPC connect: clnttcp_create: RPC: Remote system error - Connection refused
cluster-01 vifmgr   -        -        -        -          -
cluster-01 bcomd    4        4        14       cluster-02 secondary
cluster-01 crs      1        1        79       cluster-02 secondary
cluster-02 mgmt     4        4        136154   cluster-02 master
cluster-02 vldb     4        4        76       cluster-02 master
cluster-02 vifmgr   4        4        13220    cluster-02 master
cluster-02 bcomd    4        4        14       cluster-02 master
cluster-02 crs      1        1        79       cluster-02 master
cluster-03 mgmt     4        4        136154   cluster-02 secondary
cluster-03 vldb     4        4        76       cluster-02 secondary
cluster-03 vifmgr   4        4        13220    cluster-02 secondary
cluster-03 bcomd    4        4        14       cluster-02 secondary
cluster-03 crs      1        1        79       cluster-02 secondary
cluster-04 mgmt     4        4        136154   cluster-02 secondary
cluster-04 vldb     4        4        76       cluster-02 secondary
cluster-04 vifmgr   4        4        13220    cluster-02 secondary
cluster-04 bcomd    4        4        14       cluster-02 secondary
cluster-04 crs      1        1        79       cluster-02 secondary

20 entries were displayed.

cluster::*> net int show -role data
  (network interface show)
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
Vs          lif1       up/up      10.61.83.215/24    cluster-03    e0a     false
vs1         nfs_lif1   up/up      10.61.83.200/24    cluster-03    e0a     false
2 entries were displayed.



1-2. The customer says that these LIFs are normally home on the node cluster-01. Explain which vifmgr behavior might explain why these LIFs are now on node cluster-03.

1-3. A customer calls in and provides the following output. Examine the output.
cluster::*> cluster ring show
Node       UnitName Epoch    DB Epoch DB Trnxs Master     Online
---------- -------- -------- -------- -------- ---------- ---------

Error: rdb_ring_info: RDB ring state query of 127.0.0.1 for vifmgr failed on
RPC connect: clnttcp_create: RPC: Remote system error - Connection refused

cluster-01 mgmt     4        4        136197   cluster-02 secondary
cluster-01 vldb     4        4        76       cluster-02 secondary
cluster-01 vifmgr   -        -        -        -          -
cluster-01 bcomd    4        4        15       cluster-02 secondary
cluster-01 crs      1        1        79       cluster-02 secondary
cluster-02 mgmt     4        4        136197   cluster-02 master
cluster-02 vldb     4        4        76       cluster-02 master
cluster-02 vifmgr   0        4        13256    -          offline
cluster-02 bcomd    4        4        15       cluster-02 master
cluster-02 crs      1        1        79       cluster-02 master
cluster-03 mgmt     4        4        136197   cluster-02 secondary
cluster-03 vldb     4        4        76       cluster-02 secondary
cluster-03 vifmgr   -        -        -        -          -
cluster-03 bcomd    4        4        15       cluster-02 secondary
cluster-03 crs      1        1        79       cluster-02 secondary
cluster-04 mgmt     4        4        136197   cluster-02 secondary
cluster-04 vldb     4        4        76       cluster-02 secondary
cluster-04 vifmgr   -        -        -        -          -
cluster-04 bcomd    4        4        15       cluster-02 secondary
cluster-04 crs      1        1        79       cluster-02 secondary

20 entries were displayed.

cluster1::*> net int show -role data
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
clintons
            lif1       up/-       10.61.83.215/24    cluster-01    e0a     true
            lif2       up/-       192.168.3.2/24     cluster-02    e0a     true
            lif3       up/-       192.168.3.3/24     cluster-03    e0a     true
            lif4       up/-       192.168.3.4/24     cluster-04    e0a     true
primary
            nfs_lif1   up/-       10.61.83.200/24    cluster-01    e0a     true

1-4. The customer says that the entire cluster is not serving data. The customer wants an explanation as to why the LIFs are home but not serving data. Identify the vifmgr behavior that explains this situation.
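When the cluster ring show output is long, it can help to filter it down to the rows that are neither master nor secondary. The following Python sketch does that for output pasted as text. It is an illustration only: the function name unhealthy_units is an assumption, the list of unit names is limited to the five units that appear in the outputs above, and the sample rows are copied from Step 1-3.

```python
import re

# One data row: node, unit, epoch, DB epoch, DB transactions, master, online.
# The unit names are limited to those that appear in the outputs in this task.
ROW = re.compile(
    r"^(?P<node>\S+)\s+(?P<unit>mgmt|vldb|vifmgr|bcomd|crs)\s+"
    r"(?P<epoch>\S+)\s+(?P<db_epoch>\S+)\s+(?P<trnxs>\S+)\s+"
    r"(?P<master>\S+)\s+(?P<online>\S+)\s*$"
)


def unhealthy_units(output):
    """Return (node, unit, online) for rows that are not master or secondary."""
    problems = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, separator, blank line, or error text
        if match["online"] not in ("master", "secondary"):
            problems.append((match["node"], match["unit"], match["online"]))
    return problems


SAMPLE = """\
cluster-01 mgmt 4 4 136197 cluster-02 secondary
cluster-01 vifmgr - - - - -
cluster-02 vifmgr 0 4 13256 - offline
cluster-02 bcomd 4 4 15 cluster-02 master
"""

if __name__ == "__main__":
    for node, unit, state in unhealthy_units(SAMPLE):
        print(f"{node}: {unit} reports '{state}'")
```

The filter only lists the rows that need attention. Explaining the vifmgr behavior behind them is still the point of Steps 1-2 and 1-4.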



Task 2: Given a Networking Configuration, Associate the FreeBSD Ports and the Ports That Are Defined in the RDB (ifconfig -a Compared to net interface show)
Step Action
2-1. Display the current network interface configuration for the entire cluster using the
following command:
cluster1::> network interface show

2-2. View the current networking interface configuration for node4 by entering the following
command:
cluster1::> net int show -curr-node node4

2-3. Log in to the systemshell of node4 by running the following command:


cluster1::*> systemshell -node node4

2-4. From the systemshell prompt of node4, view the status of the network ports on the
node by running the following command:
node4% ifconfig -a

2-5. Correlate the output from Step 2-2 and Step 2-4 to determine whether the interface
configuration, as reported by the management component, agrees with the interface
configuration of the FreeBSD networking layer.

2-6. Exit the systemshell on node4 by typing exit.

2-7. Administratively bring down an interface hosted on a port on node4.

2-8. Repeat Step 2-2 through Step 2-5 to observe that the action taken in Step 2-7 was correctly
passed on to the FreeBSD networking layer of node4.
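The correlation in Step 2-5 can be done by eye, but a small script makes differences easier to spot. The following Python sketch extracts IPv4 addresses from saved copies of the two outputs and reports addresses that appear in only one of them. It assumes that net int show prints addresses in address/mask-length form and that FreeBSD ifconfig prints them after the keyword inet. The function name compare_addresses and both sample outputs are hypothetical and are not taken from the lab.

```python
import re

# Address as printed by ifconfig on FreeBSD: "inet 192.168.6.115 netmask ..."
IFCONFIG_INET = re.compile(r"\binet\s+(\d{1,3}(?:\.\d{1,3}){3})")
# Address as printed in the Address/Mask column: "192.168.6.115/24"
LIF_ADDRESS = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})/\d{1,2}\b")


def compare_addresses(net_int_output, ifconfig_output):
    """Report IPv4 addresses that appear in only one of the two outputs."""
    rdb = set(LIF_ADDRESS.findall(net_int_output))
    bsd = {
        ip
        for ip in IFCONFIG_INET.findall(ifconfig_output)
        if not ip.startswith("127.")  # ignore loopback
    }
    return {
        "only_in_net_int_show": sorted(rdb - bsd),
        "only_in_ifconfig": sorted(bsd - rdb),
    }


# Hypothetical sample outputs, shortened for illustration.
NET_INT = """\
nassvm1 lif_a up/up 192.168.6.115/24 node4 e0c true
nassvm1 lif_b up/up 192.168.6.116/24 node4 e0d true
"""
IFCONFIG = """\
e0c: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST>
    inet 192.168.6.115 netmask 0xffffff00 broadcast 192.168.6.255
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST>
    inet 127.0.0.1 netmask 0xff000000
"""

if __name__ == "__main__":
    print(compare_addresses(NET_INT, IFCONFIG))
```

An address that appears in only one list is a starting point for investigation and not a diagnosis, because the two commands might not cover exactly the same set of interfaces.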



Task 3: Identify and Resolve Failures That Occur When You Create New LIFs
Your instructor prepares your lab environment for this exercise and notifies you when it is ready.

Scenario: A customer has called to report that the command to create LIFs fails.
Step Action
3-1. Log in to the cluster management interface, try to create a data LIF, and then answer the questions:
cluster1::*> net int create -vserver nassvm1 -lif task5 -role data -data-protocol nfs,cifs,fcache -home-node node2 -home-port e0d -address 192.168.81.150 -netmask 255.255.255.0
• What error message do you see?
• Is the error message valid?
• What command would you use to check?

3-2. Check the cluster connectivity from node2 to all the nodes in the entire cluster, and then answer the following questions:
• What command do you use?
• What do you see?
3-3. Check the interfaces and the ports on the problem node, node2, and list the command that you use.

3-4. Attempt the same command from another node, and then answer the following questions:
• What do you see?
• Is there any warning or error?
• What might be wrong?
3-5. Verify your hypothesis on the systemshell using rdb_dump and using ps to check the running
processes, and check the logs from the clustershell.
You might need to include vifmgr and mgwd by using the following command:
cluster1::*> debug log files modify -incl-files vifmgr, mgwd,
messages
3-6. The logs might be verbose, so you might need to use debug log show and parse a timestamp.

3-7. The knowledge base article at https://kb.netapp.com/support/index?page=content&id=1015631 shows an example of how to parse the timestamp.

3-8. Correct the problem using information that you learned in this module.

3-9. Log in to the cluster management interface, and again try to create the data LIF.

End of Exercise



Module 3: Troubleshooting Network Component and Security Services
In this exercise, you practice using the secd tool to exercise secd operations from the CLI.

Objectives
This exercise focuses on enabling you to troubleshoot using the diag secd commands.

Task 1: Given a Scenario, Diagram and Identify Network Component and Data Component Interaction (Includes Finding Which Node Has the Problem)
When you diagram and document I/O through an ONTAP system, the questions that you generally need to answer first are the following:
• Which storage virtual machine (SVM) is being used?
• Which client is being used?
• Which LIF is being used by the client to connect to the storage that the SVM provides, and on which node is it?
• Which volume is the client attempting to access, and on which node is it?
• Is multiprotocol involved? If the data protocol is NFS, do the volumes being accessed have NTFS security style? If the data protocol is CIFS, do the volumes being accessed have UNIX or NTFS security style?
Step Action
1-1. A customer calls in with a problem on its 4-node cluster. The customer states that the SVM vs3 is not serving data. The customer indicates that it is connecting to LIF_3, which is on node-3. The customer is trying to access the volume vol_cifs_homes using CIFS. The customer thinks that the volume, which is NTFS security style, is on node-3.
Answer the following questions:
• Is this scenario an example of local or remote I/O?
• Which node is doing the actual protocol work?
• Which node is doing the NetApp WAFL and storage work?
• Is multiprotocol processing involved?
1-2. A customer calls in with a problem on its 8-node cluster. The customer states that the SVM vs1 is not serving data. The customer indicates that it is connecting to LIF_5, which is on node-4. The customer is trying to access the volume vol_nfs_homes using NFS. The customer thinks that the volume, which is UNIX security style, is on an aggregate in node-3.
Answer the following questions:
• Is this scenario an example of local or remote I/O?
• Which node is doing the actual protocol work?
• Which node is doing the WAFL and storage work?
• Is multiprotocol processing involved?
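The first three questions in each scenario follow from two facts: which node hosts the LIF and which node hosts the volume. The following Python sketch records that reasoning, assuming the usual clustered ONTAP model in which the node that hosts the LIF handles the protocol work and the node that owns the volume's aggregate handles the WAFL and storage work. The class name IoPath and the node names node-a and node-b are hypothetical. Work through the scenarios above on paper before you compare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoPath:
    """Where a client request enters the cluster and where its data lives."""

    lif_node: str     # node that currently hosts the LIF the client uses
    volume_node: str  # node that owns the aggregate containing the volume

    @property
    def locality(self):
        # Local I/O: the request is served without crossing to another node.
        return "local" if self.lif_node == self.volume_node else "remote"

    @property
    def protocol_node(self):
        # Assumption: protocol work is done on the node that hosts the LIF.
        return self.lif_node

    @property
    def storage_node(self):
        # Assumption: WAFL and storage work is done on the volume's node.
        return self.volume_node


if __name__ == "__main__":
    # Hypothetical node names, not the ones used in the scenarios.
    path = IoPath(lif_node="node-a", volume_node="node-b")
    print(path.locality, path.protocol_node, path.storage_node)
```

The sketch deliberately leaves out the multiprotocol question, because that answer depends on the data protocol and the security style of the volume and not on node placement.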



Task 2: Use the network connections active Command to View Active Connections
Step Action
2-1. On your lab gear, connect to the cluster management interface, and log in as administrator.

2-2. Run the following command, and observe the output:


cluster1::> network connections active show

2-3. Check for specific protocol connections by running the following commands:
cluster1::> network connections active show -service nfs
cluster1::> network connections active show -service cifs-srv

2-4. NOTE: In your lab environment, there might be no data protocol connections. In this case, the table is empty.

2-5. To list all possible values for the -service argument, run the following command:
cluster1::> network connections active show -service ?

2-6. NOTE: The -service iscsi argument always returns empty results because the iSCSI service is not tracked here.

2-7. Run the following command, and observe the output:
cluster1::> network connections active show -fields cid, local-address, remote-ip, service

2-8. Enter diag mode using the following command:


cluster1::> set diag

2-9. Select a connection ID (CID) from the output in Step 2-7.

2-10. Display the properties of the selected CID by running the following command:
cluster1::*> network connections active show -cid <CID #> -instance

2-11. Terminate this connection by entering the following command:
cluster1::*> network connections active delete -node <node> -cid <CID #> -vserver <Vserver name>

2-12. Observe the status of this CID by running the following command:
cluster1::*> network connections active show -cid <CID #> -instance



Task 3: Using diag secd
Step Action
3-1. Log in to the node management interface of node4.

3-2. Type the following command, and then answer the questions:
cluster1::> diag secd
• Why does it fail?
• What do you need to do to use the diag secd command for troubleshooting?
3-3. Identify the UNIX user that the Windows user student1 maps to, and use diag secd to
find this mapping.

3-4. Explain how you query for a Windows security identifier (SID) of student1 using diag
secd.

3-5. Explain how you can test a CIFS login for the student1 user in diag secd. (Password for student1: P@ssw0rd)

3-6. Answer the following questions:
• How can you clear caches using diag secd?
• How can you clear more than one at a time?
3-7. List the equivalents of Data ONTAP 7G operating system’s cifs resetdc and cifs
testdc.

3-8. Explain how you show and set the current logging level in secd.

3-9. Explain how you enable tracing in secd to capture the logging level that is specified.

3-10. Explain how you check the CIFS server information in secd and compare it with what is in the RDB.

3-11. Explain how you can view and clear active CIFS connections in secd.

End of Exercise



Module 4: Troubleshooting NFS
In this exercise, you fix various issues during NFS access.

Objectives
This exercise focuses on enabling you to do the following:
• Resolve frequently seen mount issues
• Resolve access issues

Task 1: Mount Issues


Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Scenario: A customer cannot mount an NFS share.
Step Action
1-1. Log in to the Linux client using the following credentials:
• Username: root
• Password: P@ssw0rd

1-2. Issue the following commands, and then answer the question.
[root@catsp-cent ~]# mkdir /nassvm1
[root@catsp-cent ~]# mount -o nfsvers=3 192.168.6.115:/nassvm1_nfs /nassvm1
Does the command succeed?

1-3. Identify the node to which the mount request is going.

1-4. From that node, capture a packet trace while repeating the previous mount command, and then answer the following questions:
• Are you able to troubleshoot the issue using the packet trace?
• What is the issue?

1-5. Describe how to fix the issue.



Task 2: Mount and Access Issues
Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Scenario: The customer cannot mount due to access issues.
Step Action
2-1. The customer receives an error message when attempting to mount volume nassvm1_nfs.
Examine the error message.
[root@catsp-cent ~]# mount -o nfsvers=3 192.168.6.115:/nassvm1_nfs /nassvm1
mount.nfs: access denied by server while mounting (null)

2-2. Explain why the customer is denied access, and then fix the problem.

2-3. If you can mount now, cd into the mount point, and then answer the following questions:
• Can you cd into the mount point?
• If not, how do you resolve the issue?
• If you unmount and remount, does it still work?

2-4. Try to write a file into the /nassvm1 directory, and then answer this question:
Are you able to write the file?

2-5. After the write succeeds, view the permissions using ls -la, and then answer the following questions:
• What are the file permissions on the file that you wrote?
• Why are the permissions and owner set the way that they are?

2-6. Change the superuser and anon settings in the export policy rule for the volume to different values, write another file, check its permissions, and then answer this question:
What effect do these changes have?

2-7. Open a new Secure Shell (SSH) session to your Linux computer, log in as the user “cmodeuser” with the password “passwd”, and then answer these questions:
• Can you cd to the mount directory?
• If successful, can you write files to the mount?
• If you notice an issue, what is the reason?
• How do you resolve this issue?

End of Exercise



Module 5: Troubleshooting SMB
In this exercise, you fix various issues during CIFS access.

Objectives
This exercise focuses on enabling you to do the following:
• Identify LIFs that are involved in CIFS access
• Troubleshoot using the diag secd commands
• Troubleshoot domain controller login issues
• Troubleshoot SMB user-authentication issues
• Troubleshoot export policy issues

Task 1: Identify LIFs


Step Action
1-1. Try to access the share vol1 over SMB by mapping a network drive to the path
\\nassvm1\vol1 from the Windows host. Use the mRemoteNG application on your
desktop.

1-2. Explain whether you can access the share.

1-3. If there are issues, fix the issues.

1-4. Answer the following questions:
• Which node is serving the data?
• Which data LIF is the client using to access the share?
• What is the junction path that is represented by the CIFS share?
• Which SVM, volume, and aggregate are involved?

1-5. Check which data LIFs have active sessions.

1-6. Explain whether the most efficient network path to the volume is being used.
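The reasoning behind Step 1-6 can be sketched in a few lines of Python. This is a minimal illustration, assuming that a path counts as direct when the node that currently hosts the data LIF is also the node that owns the aggregate containing the volume; otherwise the request crosses the cluster interconnect. The LIF, aggregate, and node names below are hypothetical placeholders, not values from the lab; substitute the answers that you recorded in Step 1-4.

```python
from dataclasses import dataclass


@dataclass
class AccessPath:
    """Facts gathered in Step 1-4 (all values here are hypothetical)."""
    lif_name: str
    lif_current_node: str      # node that currently hosts the data LIF
    volume_name: str
    aggregate_owner_node: str  # node that owns the volume's aggregate

    def is_direct(self) -> bool:
        # Direct access: the LIF and the aggregate live on the same node.
        return self.lif_current_node == self.aggregate_owner_node


def describe(path: AccessPath) -> str:
    kind = "direct" if path.is_direct() else "indirect (via cluster interconnect)"
    return f"{path.lif_name} -> {path.volume_name}: {kind}"


# Hypothetical example values; replace them with your own lab observations.
remote = AccessPath("lif_a", "node1", "vol1", "node2")
local = AccessPath("lif_b", "node2", "vol1", "node2")
print(describe(remote))
print(describe(local))
```

If the path is indirect, the client still reaches the data, but each request takes an extra hop between nodes.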



Task 2: Domain Controller Login Issues
Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Scenario: The customer cannot access a CIFS share.
Step Action
2-1.
Close all SMB connections, and from the Windows host, attempt to access the SMB share
\\nassvm1\vol1 using the following credentials (you use the mRemoteNG application):
Username: student1
Password: P@ssw0rd

2-2. Explain the errors that you received.

2-3.
Instead of using the host name, use the LIF IP to access the CIFS share, and then answer these
questions:
Can you access the share?
Why or why not?

2-4. Analyze the issues, and use related commands to troubleshoot and to fix the issues.
Hint: Run the command cifs session show -instance both when you map using the
vserver name and when you map using the IP address, and compare the protocol that is
used for authentication in each case.



Task 3: Authentication Issues
Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Scenario: The customer still cannot access the share.
Step Action
3-1. From the Windows client, log in with the following credentials (you use the mRemoteNG
application):
Username: student1
Password: P@ssw0rd

3-2. Click Start > Run > \\nassvm1, and then describe the error message that you see.

3-3. Use the diag secd commands to check whether the user name is valid.

3-4. Run the following command to view the logs, and then answer the question.
cluster1::> event log show
What do the logs show?

3-5. Use a diag secd command to verify the issue, and then answer this question:
Which other commands can you run to view configurations that verify the issue?

3-6. Go to the systemshell of the node that reported the error, look at the appropriate log file,
and then answer these questions:
• Is the issue logged there?
• Can you identify the root cause?
• How do you fix it?
• Are you able to access the share through SMB now?

3-7. If you still cannot access the share through SMB, check whether the user mapping is still a
problem.

3-8. If the user mapping problem still exists, fix it.

3-9. Given that the user mapping succeeds, but you are still unable to access the share, explain
what the issue could be.

3-10. List the commands that are available to review security settings, such as permissions and
security style on volumes, shares, and so on.

3-11. From the clustershell, use vserver security file-directory show to view
permissions on the volumes that you are trying to access, and then answer this question:
Should the user have access to these volumes?

3-12. Explain how you resolve this issue.

3-13. Change the security style of the volume to NTFS, and see whether you can access the volume
now.



Task 4: Export Policies
Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Scenario: The customer still cannot access the share.
Step Action
4-1. Log in to the Windows client using the following credentials (you use the mRemoteNG
application):
Username: student1
Password: P@ssw0rd

4-2. Try to access \\nassvm1\vol1, and describe the error that you see.

4-3. Answer the following questions:
• Do the event logs show any errors?
• What about the secd log?

4-4. Run the following command to check whether the permissions on the volume
should allow access.
cluster1::*> vserver security file-directory show -vserver nassvm1 -path /nassvm1_cifs

4-5. Answer the following questions:
• What do you think could be the issue?
• How do you fix the issue?

End of Exercise



Module 6: Troubleshooting Scalable SAN
In this exercise, you fix various issues during SAN access.

Objectives
This exercise focuses on enabling you to do the following:
• Use standard Linux commands to evaluate a Linux host in a NetApp scalable SAN environment
• Use standard Linux commands to identify SAN disks in a NetApp scalable SAN environment
• Use standard Linux commands to verify connectivity in a NetApp scalable SAN environment
• Use standard Linux log files to evaluate the iSCSI subsystem in a NetApp scalable SAN environment
• Troubleshoot a Linux host in a NetApp scalable SAN environment
• Troubleshoot a Windows host in a NetApp scalable SAN environment
• Restore LUN connectivity

Task 1: Evaluate a Linux Host in a Scalable SAN Environment


Verify your environment to make sure that it is stable enough for the remaining labs.
Step Action
1-1. Log in to the Linux system ots-cent as root, run the following commands to evaluate a Linux
host, and record the results in the space provided.
• Determine the IP address of the host: # ifconfig eth0
• Verify that the iSCSI initiator is installed: # rpm -qa | grep iscsi
• Verify that the host is logged in to the iSCSI array (target): # iscsiadm -m session
NOTE: The IP addresses and iSCSI Qualified Names (IQNs) that are listed belong to the
targets.
tcp: [10] 192.168.6.131:3260,1037 iqn.1992-08.com.netapp:sn.140668517d5511e5ac18005056bf03f8:vs.16
• From ONTAP software, list the IQNs and IP addresses of the targets that are shown in the
output of the previous command:
::> net int show -vserver sansvm*
::> iscsi show -instance
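When you compare the host-side session list with the ONTAP output, it can help to split each session line into its fields. The following Python sketch is one way to do that, assuming the line layout shown in the sample output of this step: transport, session ID in brackets, portal address and port, the target portal group tag after the comma, and the target IQN. The function and field names are illustrative choices, and the sample line is the one from this step written on a single line.

```python
import re
from typing import NamedTuple


class IscsiSession(NamedTuple):
    transport: str
    sid: int
    portal_ip: str
    port: int
    tpgt: int          # target portal group tag (number after the comma)
    target_iqn: str


# Layout assumed from the sample output: "tcp: [10] IP:PORT,TPGT IQN"
SESSION_RE = re.compile(
    r"^(?P<transport>\w+):\s+\[(?P<sid>\d+)\]\s+"
    r"(?P<ip>[\d.]+):(?P<port>\d+),(?P<tpgt>\d+)\s+"
    r"(?P<iqn>iqn\.\S+)"
)


def parse_session_line(line: str) -> IscsiSession:
    """Split one line of `iscsiadm -m session` output into named fields."""
    match = SESSION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized session line: {line!r}")
    return IscsiSession(
        transport=match["transport"],
        sid=int(match["sid"]),
        portal_ip=match["ip"],
        port=int(match["port"]),
        tpgt=int(match["tpgt"]),
        target_iqn=match["iqn"],
    )


sample = (
    "tcp: [10] 192.168.6.131:3260,1037 "
    "iqn.1992-08.com.netapp:sn.140668517d5511e5ac18005056bf03f8:vs.16"
)
session = parse_session_line(sample)
print(session.portal_ip, session.tpgt, session.target_iqn)
```

The parsed portal IP and IQN are the values to look for in the net int show and iscsi show -instance output.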

1-2. Type the following command to identify the SAN disks that are attached to a Linux host.
[root@ots-cent ~]# fdisk -l
Disk /dev/sdb: 209 MB, 209715200 bytes
7 heads, 58 sectors/track, 1008 cylinders
Units = cylinders of 406 * 512 = 207872 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000

Disk /dev/sdd: 104 MB, 104857600 bytes
4 heads, 50 sectors/track, 1024 cylinders
Units = cylinders of 200 * 512 = 102400 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000

Disk /dev/sdc: 209 MB, 209715200 bytes
7 heads, 58 sectors/track, 1008 cylinders
Units = cylinders of 406 * 512 = 207872 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000

Disk /dev/sde: 104 MB, 104857600 bytes
4 heads, 50 sectors/track, 1024 cylinders
Units = cylinders of 200 * 512 = 102400 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000
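The fdisk listing above is long, so a short script can summarize it. The sketch below, written in Python, extracts each "Disk /dev/...:" header line and groups the devices by their size in bytes. It assumes the header format shown in the sample output. Two devices with the same size might be the same LUN seen through two paths, but equal size alone does not prove that; treat the grouping as a hint to verify with other commands.

```python
import re
from collections import defaultdict

# Matches header lines such as "Disk /dev/sdb: 209 MB, 209715200 bytes"
DISK_RE = re.compile(r"^Disk (?P<dev>/dev/\w+): .*?, (?P<bytes>\d+) bytes")


def disks_by_size(fdisk_output: str) -> dict:
    """Return {size_in_bytes: [device, ...]} for every disk header line."""
    groups = defaultdict(list)
    for line in fdisk_output.splitlines():
        match = DISK_RE.match(line.strip())
        if match:
            groups[int(match["bytes"])].append(match["dev"])
    return {size: sorted(devs) for size, devs in groups.items()}


# Header lines copied from the sample output in this step.
sample_output = """\
Disk /dev/sdb: 209 MB, 209715200 bytes
Disk identifier: 0x00000000
Disk /dev/sdd: 104 MB, 104857600 bytes
Disk identifier: 0x00000000
Disk /dev/sdc: 209 MB, 209715200 bytes
Disk identifier: 0x00000000
Disk /dev/sde: 104 MB, 104857600 bytes
Disk identifier: 0x00000000
"""

groups = disks_by_size(sample_output)
for size, devices in sorted(groups.items()):
    print(size, devices)
```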

1-3. Type the following command to determine the state of the Linux iSCSI service, and start the
service if it is not already started.
[root@ots-cent ~]# service iscsi status
iSCSI Transport Class version 2.0-870
version 2.0-872.41.el6
Target: iqn.1992-08.com.netapp:sn.140668517d5511e5ac18005056bf03f8:vs.16
Current Portal: 192.168.6.131:3260,1037
Persistent Portal: 192.168.6.131:3260,1037
**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1994-05.com.redhat:ots-cent
Iface IPaddress: 192.168.6.20
Iface HWaddress: <empty>
Iface Netdev: <empty>
SID: 10
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
************************
Attached SCSI devices:
************************
Host Number: 12 State: running
scsi12 Channel 00 Id 0 Lun: 0
Attached scsi disk sdb State: running
scsi12 Channel 00 Id 0 Lun: 1
Attached scsi disk sdd State: running

1-4. Use the output of the service iscsi status command that is displayed in Step 1-3 to
answer the following questions:
• List the Iface initiatorname: ____________________________________
• List the iSCSI connection state: ________________________________
• List the disks that are attached to scsi12 Channel 00: ______________________
• List the state of each disk: ____________________________________
• List the current portal: ________________________________________
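To check your answers about the attached disks, you could pair each "Lun:" line of the status output with the "Attached scsi disk" line that follows it. The Python sketch below does this under the assumption that the output keeps the two-line layout shown in Step 1-3. The function name and the tuple layout are illustrative choices.

```python
import re

LUN_RE = re.compile(r"^(?P<host>scsi\d+) Channel \d+ Id \d+ Lun: (?P<lun>\d+)")
DISK_RE = re.compile(r"^Attached scsi disk (?P<disk>\w+)\s+State: (?P<state>\w+)")


def attached_disks(status_output: str) -> list:
    """Return (scsi_host, lun, disk, state) tuples from the status output."""
    disks = []
    current = None  # (scsi_host, lun) from the most recent "Lun:" line
    for raw in status_output.splitlines():
        line = raw.strip()
        lun_match = LUN_RE.match(line)
        if lun_match:
            current = (lun_match["host"], int(lun_match["lun"]))
            continue
        disk_match = DISK_RE.match(line)
        if disk_match and current is not None:
            disks.append((*current, disk_match["disk"], disk_match["state"]))
            current = None
    return disks


# Lines copied from the sample output in Step 1-3.
sample_status = """\
Host Number: 12 State: running
scsi12 Channel 00 Id 0 Lun: 0
Attached scsi disk sdb State: running
scsi12 Channel 00 Id 0 Lun: 1
Attached scsi disk sdd State: running
"""

disks = attached_disks(sample_status)
for entry in disks:
    print(entry)
```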

1-5. Type the following command to verify connectivity between the host and target:
[root@cats-cent ~]# netstat -pant | grep iscsi
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address        Foreign Address      State        PID/Program name
tcp        0      0 192.168.6.20:40412   192.168.6.131:3260   ESTABLISHED  1382/iscsid
tcp        0      0 192.168.6.20:39372   192.168.6.135:3260   ESTABLISHED  1382/iscsid
tcp        0      0 192.168.6.20:52005   192.168.6.136:3260   ESTABLISHED  1382/iscsid
tcp        0      0 192.168.6.20:47742   192.168.6.132:3260   ESTABLISHED  1382/iscsid

1-6. Confirm that the state of each active internet connection between the host (local address)
and the target (foreign address) is ESTABLISHED.
NOTE: The Linux host records events about the iSCSI subsystem in the system messages
file, /var/log/messages.
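The connectivity check in Steps 1-5 and 1-6 can also be expressed as a small script. The Python sketch below lists the target portals that have an ESTABLISHED TCP connection on the iSCSI port. It assumes that each connection appears on a single line, as netstat prints it on a wide terminal, with the foreign address in the fifth column and the state in the sixth. The port default of 3260 comes from the sample output; the function name is an illustrative choice.

```python
def established_portals(netstat_output: str, port: int = 3260) -> list:
    """Return the sorted target IPs with an ESTABLISHED connection on `port`."""
    portals = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(fields) < 6 or fields[0] != "tcp" or fields[5] != "ESTABLISHED":
            continue
        ip, _, remote_port = fields[4].rpartition(":")
        if remote_port.isdigit() and int(remote_port) == port:
            portals.append(ip)
    return sorted(portals)


# Connection lines from the sample output in Step 1-5, one per line.
sample_netstat = """\
tcp 0 0 192.168.6.20:40412 192.168.6.131:3260 ESTABLISHED 1382/iscsid
tcp 0 0 192.168.6.20:39372 192.168.6.135:3260 ESTABLISHED 1382/iscsid
tcp 0 0 192.168.6.20:52005 192.168.6.136:3260 ESTABLISHED 1382/iscsid
tcp 0 0 192.168.6.20:47742 192.168.6.132:3260 ESTABLISHED 1382/iscsid
"""

portals = established_portals(sample_netstat)
print(len(portals), portals)
```

A count lower than the number of target LIFs that you expect is a sign that one or more sessions are down.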

1-7. Run the following commands to view log entries that are related to the iSCSI subsystem.
• Observe the current date and time:
[root@cats-cent ~]# date
Thu Oct 29 04:14:47 EDT 2015
• Stop the iSCSI service and observe the event that is recorded in the /var/log/messages file:
[root@cats-cent ~]# service iscsi stop && tail -f /var/log/messages
• Press Ctrl+C to exit the log file view.
• Observe that the service shutdown event for each connection has been recorded in the log:
Oct 29 04:14:55 cats-cent iscsid: Connection21:0 to [target: iqn.1992-08.com.netapp:sn.140668517d5511e5ac18005056bf03f8:vs.16, portal: 192.168.6.132,3260] through [iface: default] is shutdown.
• Correlate the date and time that you observed with the date command to the date and time of the log entries.
• Start the iSCSI service and observe the event that is recorded in the /var/log/messages file:
[root@cats-cent ~]# service iscsi start && tail -f /var/log/messages
• Correlate the date and time that you observed with the date command to the date and time of the log entries.
• Observe that the log entries record the start-up event. Each disk is enumerated (sdb, sdc,
sdd, sde) and attached:
Oct 29 04:15:03 cats-cent kernel: sd 28:0:0:0: [sdb] Attached SCSI disk
• Observe that each connection to the targets is enumerated and listed as operational:
Oct 29 04:15:04 cats-cent iscsid: Connection28:0 to [target: iqn.1992-08.com.netapp:sn.1dd67cf37d5511e5ac18005056bf03f8:vs.17, portal: 192.168.6.136,3260] through [iface: default] is operational now
• Press Ctrl+C to exit the log file view.

1-8. NOTE: The “&&” operator runs two or more commands in succession; each command runs only
if the previous command succeeds.
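Correlating the output of the date command with log timestamps is simple arithmetic, which the following Python sketch makes explicit. It assumes the date format shown in this step (weekday, month, day, time, time zone, year) and the standard syslog prefix (month, day, time). Syslog lines carry no year, so the sketch borrows the year from the date output, and it ignores the time-zone field on the assumption that both values come from the same host. The function names are illustrative choices.

```python
from datetime import datetime

STAMP_FORMAT = "%b %d %H:%M:%S %Y"


def parse_date_output(text: str) -> datetime:
    """Parse `date` output such as 'Thu Oct 29 04:14:47 EDT 2015'."""
    _weekday, month, day, clock, _zone, year = text.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", STAMP_FORMAT)


def parse_syslog_stamp(line: str, year: int) -> datetime:
    """Parse the leading 'Oct 29 04:14:55' of a /var/log/messages line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{month} {day} {clock} {year}", STAMP_FORMAT)


def seconds_after(date_output: str, log_line: str) -> int:
    """Seconds between the `date` reading and the log entry."""
    reference = parse_date_output(date_output)
    logged = parse_syslog_stamp(log_line, reference.year)
    return int((logged - reference).total_seconds())


# Values copied from the sample output in this step.
date_output = "Thu Oct 29 04:14:47 EDT 2015"
shutdown_line = (
    "Oct 29 04:14:55 cats-cent iscsid: Connection21:0 to [target: "
    "iqn.1992-08.com.netapp:sn.140668517d5511e5ac18005056bf03f8:vs.16, "
    "portal: 192.168.6.132,3260] through [iface: default] is shutdown."
)
startup_line = "Oct 29 04:15:03 cats-cent kernel: sd 28:0:0:0: [sdb] Attached SCSI disk"

shutdown_delay = seconds_after(date_output, shutdown_line)
startup_delay = seconds_after(date_output, startup_line)
print(shutdown_delay, startup_delay)
```

Small positive offsets like these indicate that the log entries belong to the stop and start commands that you just ran, not to an earlier event.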



Task 2: The Linux Host Has Lost iSCSI Connections
Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Scenario: A customer reports that it has lost all connections to SANSVM1.
The instructor breaks the lab. If the problem is not visible right away, restart the iSCSI service.
Step Action
2-1. Log in to the NetApp storage environment as an administrator.

2-2. Evaluate the storage environment.

2-3. Log in to the Linux host, cats-cent, as root.

2-4. Type the following command to verify connectivity between the host and target, and then
answer the following questions:
[root@cats-cent ~]# netstat -pant | grep iscsi
• Do you see four connections in the ESTABLISHED state?
• If not, what could be the issue?

2-5. Fix the issue.



Task 3: All LUNs Are Missing From the Windows Host
Your instructor prepares your lab environment for this exercise and notifies you when it is ready.
Scenario: A customer reports that there are no visible SAN disks attached to the Windows host. You evaluate
the NetApp scalable SAN environment, and restore LUN connectivity.
Step Action
3-1. Log in to the Windows host, and check the firewall configuration.
3-2. If the firewall is enabled, disable it to see whether the LUN connectivity can be restored.
3-3. Log in to the NetApp cluster as an administrator.
3-4. Verify the configuration of the NetApp cluster.
3-5. Log in to the Windows host as an administrator.
3-6. Verify the configuration of the Windows host.
3-7. Verify that the IQN of the Windows host is used in the SAN configurations of the cluster.
3-8. Restore the SAN disks.



Task 4: The LUNs Are Not Visible Through All LIFs of an SVM
Lab Scenario: In your lab, MPIO is not configured on the Windows host.
Step Action
4-1. Log in to the Windows host, and then click Start > Administrative Tools and open iSCSI
Initiator.
4-2. Disconnect from the vserver sansvm2 if it is connected. (This is the connection to the target that
has an IQN that ends with vs.8.)
4-3. Select the target vserver sansvm1. (This is the connection to the target that has an IQN that ends
with vs.7.)
4-4. Click Properties.
4-5. Note the target portal group of the session you see in the Properties window, and find the
corresponding LIF by using the following clustershell command:
cluster1::> iscsi portal show
Vserver    Logical         Status                     Curr        Curr
           Interface  TPGT Admin/Oper IP Address      Node        Port Enabled
---------- ---------- ---- ---------- --------------- ----------- ---- -------
sansvm1    sansvm1_data1
                      1034 up/up      192.168.6.131   node3       e0d  true
sansvm1    sansvm1_data2
                      1035 up/up      192.168.6.132   node4       e0d  true
sansvm2    sansvm2_data1
                      1036 up/up      192.168.6.135   node3       e0d  true
sansvm2    sansvm2_data2
                      1037 up/up      192.168.6.136   node4       e0d  true
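Finding the LIF that belongs to a target portal group tag (TPGT) is a lookup in the table above. The Python sketch below builds that lookup from the command output. It assumes the wrapped layout shown in this step, where a line holding the vserver and LIF name is followed by a line that starts with the TPGT. The function name and dictionary keys are illustrative choices.

```python
def portals_by_tpgt(portal_output: str) -> dict:
    """Return {tpgt: {"vserver", "lif", "ip", "node", "port"}} from the table."""
    portals = {}
    pending = None  # (vserver, lif) waiting for its continuation line
    for line in portal_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and not fields[0].startswith("-"):
            pending = (fields[0], fields[1])
        elif len(fields) == 6 and fields[0].isdigit() and pending is not None:
            vserver, lif = pending
            portals[int(fields[0])] = {
                "vserver": vserver,
                "lif": lif,
                "ip": fields[2],
                "node": fields[3],
                "port": fields[4],
            }
            pending = None
    return portals


# Data rows copied from the sample output in this step.
sample_portals = """\
sansvm1 sansvm1_data1
1034 up/up 192.168.6.131 node3 e0d true
sansvm1 sansvm1_data2
1035 up/up 192.168.6.132 node4 e0d true
sansvm2 sansvm2_data1
1036 up/up 192.168.6.135 node3 e0d true
sansvm2 sansvm2_data2
1037 up/up 192.168.6.136 node4 e0d true
"""

portals = portals_by_tpgt(sample_portals)
print(portals[1035]["lif"], portals[1035]["ip"])
```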

4-6. Add a second session:
• From the iSCSI Initiator Properties window, click Properties while sansvm1 is still
selected.
• Click Add Session.
• In the <Connect To Target> pop-up window, select the check box that is labeled Enable
multi-path.
• Click Advanced.
• Use the pull-down menu and set the following values:
o Local adapter: Microsoft iSCSI Initiator
o Initiator IP: 192.168.6.11
o Target Portal IP: Choose the IP of the second LIF of sansvm1

4-7. Answer this question:
How many disks do you see?
Use Disk Management to check this. (Remember that MPIO is not set for the iSCSI
Initiator.)
4-8. To identify the session through which you see the disk, disconnect from each session and note
the session and the target portal tag through which you see the disk.
4-9. Explain the behavior.

End of Exercise
