Troubleshooting Cluster Administration
MapR 5.2 is at End of Life (EOL) and no longer supported. Please see the latest documentation. This documentation
is not being updated.
The URL reported by YARN for tracking job details does not load.
This URL uses the output of the hostname -f command, which must be the fully qualified domain name (FQDN) for the node. On
Ubuntu, make sure that the /etc/hostname file is configured with the node's FQDN. On CentOS/Red Hat, make sure that the
/etc/sysconfig/network file is configured with the node's FQDN, then restart the node.
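To confirm the setting, you can check what the node reports; the host name shown here is only a placeholder:
$ hostname -f
node01.example.com        # should print the node's FQDN, not a short host name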
The ResourceManager does not start.
If the ResourceManager does not come up, check the following:
Check that you supplied the correct ResourceManager hostname or IP address in the -RM parameter when running
configure.sh on each node at installation time. If you are not sure, you can re-run configure.sh to correct the problem (see the
sample invocation after these checks). Do not specify a ResourceManager port with the hostname or IP address in the -RM
parameter; there is no <port> option.
Make sure that you specified the same ResourceManager hostname or IP address on all nodes when running
configure.sh.
For more information about what might be causing a problem, check the ResourceManager logs in
/opt/mapr/hadoop/hadoop-<version>/logs.
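For reference, a re-run of configure.sh that sets the ResourceManager host might look like the following; the CLDB, ZooKeeper, and ResourceManager host names are placeholders for your own nodes, and no port is given with -RM:
/opt/mapr/server/configure.sh -C cldb01.example.com -Z zk01.example.com,zk02.example.com,zk03.example.com -RM rm01.example.com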
The NodeManager does not start.
Make sure that the fileserver role is installed on the node by looking in the /opt/mapr/roles directory.
Make sure that the fileserver service is running, using either the service list command or the MapR Control System (see the
example after these checks).
For more information about what might be causing a problem, check the NodeManager logs in /opt/mapr/hadoop/hadoop-
2.3.0/logs.
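A quick way to run the service list check mentioned above from the command line; the node name is a placeholder:
maprcli service list -node node01.example.com
The output lists each service configured on that node along with its current state.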
The HistoryServer does not start.
Make sure that the HistoryServer role is installed on the desired node by looking in the /opt/mapr/roles directory (see the
example after these checks). Note that only one node in the cluster can have the HistoryServer role.
Make sure the HistoryServer is running on the desired node, using either the service list command or the MapR Control
System.
Check that you supplied the correct HistoryServer hostname or IP address in the -HS parameter when running
configure.sh on each node at installation time. If you are not sure, you can re-run configure.sh to correct the problem.
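For example, listing the roles directory on the node that should host the HistoryServer; the exact file names depend on the packages installed, but a historyserver entry should be present:
ls /opt/mapr/roles        # look for a historyserver entry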
Make sure the application is running in YARN and not as a local application (check for app_local or job_local in the
application output).
Check the class path on which the application was invoked, and make sure that /opt/mapr/hadoop/hadoop-
<version>/etc/hadoop is included in the class path (see the example after these checks).
Make sure that you are running in the correct version of Hadoop. Example:
juser@Techpubs-rawnode01:~$ ls -l /usr/bin/hadoop
lrwxrwxrwx 1 root root 40 Mar 4 11:38 /usr/bin/hadoop -> /opt/mapr/hadoop/hadoop-2.3.0/bin/hadoop
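To verify the class-path check above from the command line, you can print the class path that the hadoop command uses and look for the configuration directory:
hadoop classpath | tr ':' '\n' | grep etc/hadoop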
Check that the application jar is correctly packaged with the required class.
Make sure that you are running in the correct version of Hadoop. Example:
juser@Techpubs-rawnode01:~$ ls -l /usr/bin/hadoop
lrwxrwxrwx 1 root root 40 Mar 4 11:38 /usr/bin/hadoop -> /opt/mapr/hadoop/hadoop-2.3.0/bin/hadoop
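To confirm the jar-packaging check above, you can list the contents of the application jar; the jar and class names here are hypothetical:
jar tf myapp.jar | grep MyMapper.class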
When you run the configure.sh script with the -F option, the ZooKeeper and Warden services start up on the primary node first,
then as other nodes are installed, services are automatically started on those nodes. However, because of this timing issue,
Warden may fail to communicate with ZooKeeper, and the cluster may fail to come up.
If you encounter this problem, do not use the -F option. Instead, stop all ZooKeeper and Warden services on all nodes, then
start the ZooKeeper services on all of the ZooKeeper nodes (that is, the nodes where the ZooKeeper packages are installed).
Finally, start the Warden services on all nodes.
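On MapR 5.x these services are typically controlled through their init scripts, so a manual sequence following the order just described might look like this (run each step on the nodes indicated):
# on every node: stop Warden, then ZooKeeper (where installed)
service mapr-warden stop
service mapr-zookeeper stop
# on the ZooKeeper nodes only: start ZooKeeper first
service mapr-zookeeper start
# on every node: start Warden once ZooKeeper is up
service mapr-warden start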
Out of memory.
When the aggregated memory used by MapReduce tasks exceeds the memory reserve on a TaskTracker node, tasks can fail or
be killed. MapR attempts to prevent out-of-memory exceptions by killing MapReduce tasks when memory becomes scarce. If
you allocate too little Java heap for the expected memory requirements of your tasks, an exception can occur. The following
steps can help configure MapR to avoid these problems:
If a particular job encounters out-of-memory conditions, the simplest way to solve the problem might be to reduce the
memory footprint of the map and reduce functions, and to ensure that the partitioner distributes map output to reducers
evenly.
If it is not possible to reduce the memory footprint of the application, try increasing the Java heap size (-Xmx) in the client-
side MapReduce configuration (see the example after these steps).
If many jobs encounter out-of-memory conditions, or if jobs tend to fail on specific nodes, it may be that those nodes are
advertising too many TaskTracker slots. In this case, the cluster administrator should reduce the number of slots on the
affected nodes.
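For classic MapReduce jobs, the task JVM heap is usually controlled by mapred.child.java.opts in the client-side configuration. A hypothetical mapred-site.xml entry raising the heap to 2 GB might look like the following; the value is only an example and must fit within the node's task memory reserve:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m</value>
</property>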
To find the serverid, use the maprcli node list command, which lists information about all nodes in a cluster. The id field is
the value to use for serverid.
For example:
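A minimal sketch of the command, assuming you only need the id and hostname fields (output omitted because it varies by cluster):
maprcli node list -columns id,hostname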
You can also get this listing as a JSON object by using the -json option. For example:
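A sketch of the command (the JSON output itself depends on your cluster):
maprcli node list -columns id,hostname -json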
Error 'mv Failed to rename maprfs...' when moving files across volumes.
Prior to version 2.1, you cannot move files across volume boundaries in the MapR Data Platform. You can move files within a
volume using the hadoop fs -mv command, but attempting to move files to a different volume results in an error of the form
"mv: Failed to rename maprfs://<source path> to <destination path>".
As a workaround, you can copy the file(s) from source volume to destination volume, and then remove the source files.
The example below shows the failure occurring. In this example directories /a and /b are mount-points for two distinct
volumes.
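A reconstructed sketch of the failure; the cluster name in the error is a placeholder, and the exact message may differ slightly:
$ hadoop fs -mv /a/testfile /b/testfile
mv: Failed to rename maprfs://my.cluster.com/a/testfile to /b/testfile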
The example below shows the work-around, moving a file /a/testfile to directory /b, and then removing the source file.
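The copy-then-remove equivalent:
$ hadoop fs -cp /a/testfile /b/testfile
$ hadoop fs -rm /a/testfile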
(The path given in this message is relative to /opt/mapr/, which might be misleading.)
As a work-around after upgrading, to continue working with mirror volumes created in v1.2, duplicate any lines with upper-
case letters in mapr-clusters.conf, converting all letters to lower case.
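For example, a hypothetical mapr-clusters.conf entry and its added lower-case duplicate; the cluster name and CLDB host are placeholders:
MyCluster cldb01.example.com:7222
mycluster cldb01.example.com:7222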
To resolve this issue, add the following properties to core-site.xml in the /opt/mapr/hadoop/hadoop-0.20.2/etc/ directory:
<property>
<name>hadoop.proxyuser.mapr.groups</name>
<value>*</value>
<description>Allow the superuser mapr to impersonate any member of any
group</description>
</property>
<property>
<name>hadoop.proxyuser.mapr.hosts</name>
<value>*</value>
<description>The superuser can connect from any host to impersonate a
user</description>
</property>
When YARN container logs are not aggregated, the YARN container logs are retained for 3 hours on each node. To update the
duration, edit the value of yarn.nodemanager.log.retain-seconds in the yarn-site.xml file.
When YARN container log aggregation is enabled, the aggregated logs are not deleted by default. However, this setting can be
overridden in yarn-site.xml. To update the duration, edit the value of yarn.log-aggregation.retain-seconds in the yarn-site.xml file.
You must consider how long you want the log to remain past the amount of time that the application takes to run. For
example, if you expect most applications to take 20 seconds to run, do not set the value of this property to 20 seconds, because
the log may be deleted before the application completes.
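As a point of reference, hypothetical yarn-site.xml entries for the two retention properties discussed above might look like the following; the values are in seconds and are examples only (10800 seconds is the 3-hour default, 604800 seconds is one week):
<property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>10800</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>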
MapReduce jobs and YARN applications fail because /tmp subdirectories have been deleted.
Some RHEL and CentOS platforms include the tmpwatch service by default. This service cleans up the /tmp directory on a
regular basis. However, this operation causes the deletion of directories that are needed for jobs and applications to run (for
example, nm-local-dir for YARN and hadoop-<user> directories for MapReduce jobs). The running NodeManager and
TaskTracker processes do not re-create these missing directories, causing jobs and applications to fail.
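One approach is to exclude the affected directories from the tmpwatch cron job using its -x (exclude) option. The excerpt below is a hypothetical edit; adjust the paths to match the nm-local-dir and hadoop-<user> locations actually used on your nodes:
# hypothetical excerpt from /etc/cron.daily/tmpwatch
/usr/sbin/tmpwatch -x /tmp/hadoop-mapr -x /tmp/hadoop-mapr/nm-local-dir 240 /tmp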