Hadoop Administration Commands


Any Hadoop administrator worth their salt must master a comprehensive set of commands for cluster
administration. The following reference summarizes the most important commands. Know them, and you
will advance a long way along the path to Hadoop wisdom.

Each entry lists the command, what it does, its syntax, and an example.

balancer
What it does: Runs the cluster-balancing utility. The specified threshold value, which
represents a percentage of disk capacity, overrides the default threshold value (10 percent).
To stop the rebalancing process, press Ctrl+C.
Syntax: hadoop balancer [-threshold <threshold>]
Example: hadoop balancer -threshold 20
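
As a rough illustration of what the threshold means, the sketch below treats a cluster as balanced when every DataNode's disk utilization is within the threshold (in percentage points) of the overall cluster utilization. This is a simplified reading of the balancer's goal, not its actual implementation; the function name and the (used, capacity) input format are assumptions made for the example.

```python
def is_balanced(nodes, threshold=10.0):
    """Return True if every node is within `threshold` points of cluster utilization.

    nodes: list of (used_bytes, capacity_bytes) tuples, one per DataNode (hypothetical input).
    threshold: allowed deviation in percentage points (10 mirrors the default above).
    """
    total_used = sum(used for used, _ in nodes)
    total_capacity = sum(capacity for _, capacity in nodes)
    cluster_pct = 100.0 * total_used / total_capacity
    # Compare each node's own utilization with the cluster-wide figure.
    return all(
        abs(100.0 * used / capacity - cluster_pct) <= threshold
        for used, capacity in nodes
    )


if __name__ == "__main__":
    skewed = [(10, 100), (90, 100)]
    print(is_balanced(skewed))                # default threshold of 10
    print(is_balanced(skewed, threshold=40))  # looser threshold
```

A larger threshold tolerates more skew between nodes, so the balancer has less data to move.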

daemonlog
What it does: Gets or sets the log level for each daemon (also known as a service). Connects to
http://host:port/logLevel?log=name and prints or sets the log level of the daemon that’s running
at host:port. Hadoop daemons generate log files that help you determine what’s happening on the
system, and you can use the daemonlog command to temporarily change the log level of a Hadoop
component when you’re debugging the system. The change takes effect immediately and lasts until
the daemon restarts.
Syntax: hadoop daemonlog -getlevel <host:port> <name>;
hadoop daemonlog -setlevel <host:port> <name> <level>
Example: hadoop daemonlog -getlevel 10.250.1.15:50030 org.apache.hadoop.mapred.JobTracker;
hadoop daemonlog -setlevel 10.250.1.15:50030 org.apache.hadoop.mapred.JobTracker DEBUG
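
The sketch below shows how the daemonlog arguments map onto the URL pattern described above and onto the two command forms. It only builds strings and argument lists and does not contact a daemon; the helper names are hypothetical, and only the get-level URL from the description is reproduced.

```python
def loglevel_url(host_port, name):
    """Build the get-level URL in the form http://host:port/logLevel?log=name."""
    return "http://{}/logLevel?log={}".format(host_port, name)


def daemonlog_argv(host_port, name, level=None):
    """Return the argument list for -getlevel, or -setlevel when a level is given."""
    if level is None:
        return ["hadoop", "daemonlog", "-getlevel", host_port, name]
    return ["hadoop", "daemonlog", "-setlevel", host_port, name, level]


if __name__ == "__main__":
    daemon = "10.250.1.15:50030"
    logger = "org.apache.hadoop.mapred.JobTracker"
    print(loglevel_url(daemon, logger))
    print(" ".join(daemonlog_argv(daemon, logger)))
    print(" ".join(daemonlog_argv(daemon, logger, "DEBUG")))
```

An argument list like this could be passed to subprocess.run on a machine where the hadoop command is installed.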

datanode
What it does: Runs the HDFS DataNode service, which coordinates storage on each slave node. If
you specify -rollback, the DataNode is rolled back to the previous version. Stop the DataNode and
distribute the previous Hadoop version before using this option.
Syntax: hadoop datanode [-rollback]
Example: hadoop datanode -rollback

dfsadmin
What it does: Runs a number of Hadoop Distributed File System (HDFS) administrative operations.
Use the -help option to see a list of all supported options. The generic options are a common set
of options supported by several commands.
Syntax: hadoop dfsadmin [GENERIC_OPTIONS]
[-report]
[-safemode enter | leave | get | wait]
[-refreshNodes]
[-finalizeUpgrade]
[-upgradeProgress status | details | force]
[-metasave filename]
[-setQuota <quota> <dirname>...<dirname>]
[-clrQuota <dirname>...<dirname>]
[-restoreFailedStorage true|false|check]
[-help [cmd]]
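
Several dfsadmin options accept only a fixed set of values. As a small sketch, the hypothetical helper below builds a -safemode invocation and rejects any action other than the four listed in the syntax above. It returns an argument list and does not run the command.

```python
# The four actions listed for -safemode in the syntax above.
SAFEMODE_ACTIONS = ("enter", "leave", "get", "wait")


def safemode_argv(action):
    """Return the argument list for `hadoop dfsadmin -safemode <action>`."""
    if action not in SAFEMODE_ACTIONS:
        raise ValueError(
            "unknown safemode action {!r}; expected one of {}".format(
                action, ", ".join(SAFEMODE_ACTIONS)
            )
        )
    return ["hadoop", "dfsadmin", "-safemode", action]


if __name__ == "__main__":
    print(" ".join(safemode_argv("get")))
```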

mradmin
What it does: Runs a number of MapReduce administrative operations. Use the -help option to see a
list of all supported options. Again, the generic options are a common set of options that are
supported by several commands. The options work as follows:
-refreshServiceAcl reloads the service-level authorization policy file (JobTracker reloads the
authorization policy file).
-refreshQueues reloads the queue access control lists (ACLs) and state (JobTracker reloads the
mapred-queues.xml file).
-refreshNodes refreshes the hosts information at the JobTracker.
-refreshUserToGroupsMappings refreshes user-to-groups mappings.
-refreshSuperUserGroupsConfiguration refreshes superuser proxy groups mappings.
-help [cmd] displays help for the given command or for all commands if none is specified.
Syntax: hadoop mradmin [GENERIC_OPTIONS]
[-refreshServiceAcl]
[-refreshQueues]
[-refreshNodes]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-help [cmd]]
Example: hadoop mradmin -help -refreshNodes

jobtracker
What it does: Runs the MapReduce JobTracker node, which coordinates the data processing system
for Hadoop. If you specify -dumpConfiguration, the configuration that’s used by the JobTracker and
the queue configuration in JSON format are written to standard output.
Syntax: hadoop jobtracker [-dumpConfiguration]
Example: hadoop jobtracker -dumpConfiguration

namenode
What it does: Runs the NameNode, which coordinates the storage for the whole Hadoop cluster. The
options work as follows:
-format: the NameNode is started, formatted, and then stopped.
-upgrade: the NameNode starts with the upgrade option after a new Hadoop version is distributed.
-rollback: the NameNode is rolled back to the previous version (remember to stop the cluster and
distribute the previous Hadoop version before using this option).
-finalize: the previous state of the file system is removed, the most recent upgrade becomes
permanent, rollback is no longer available, and the NameNode is stopped.
-importCheckpoint: an image is loaded from the checkpoint directory (as specified by the
fs.checkpoint.dir property) and saved into the current directory.
Syntax: hadoop namenode [-format] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint]
Example: hadoop namenode -finalize

secondarynamenode
What it does: Runs the secondary NameNode. If you specify -checkpoint, a checkpoint on the
secondary NameNode is performed if the size of the EditLog (a transaction log that records every
change that occurs to the file system metadata) is greater than or equal to fs.checkpoint.size;
specify -checkpoint force and a checkpoint is performed regardless of the EditLog size; specify
-geteditsize and the EditLog size is printed.
Syntax: hadoop secondarynamenode [-checkpoint [force]] | [-geteditsize]
Example: hadoop secondarynamenode -geteditsize
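
The checkpoint rule described above reduces to a single comparison. The sketch below restates it as a function; the name, the parameters, and the sizes in the usage lines are illustrative assumptions, not values read from a real cluster or from Hadoop's configuration defaults.

```python
def should_checkpoint(edit_log_size, checkpoint_size, force=False):
    """Decide whether -checkpoint would perform a checkpoint.

    edit_log_size: current EditLog size in bytes.
    checkpoint_size: the configured fs.checkpoint.size in bytes.
    force: True when `force` is given, which skips the size comparison.
    """
    return force or edit_log_size >= checkpoint_size


if __name__ == "__main__":
    limit = 1000  # hypothetical fs.checkpoint.size, chosen for readability
    print(should_checkpoint(999, limit))              # below the limit
    print(should_checkpoint(1000, limit))             # equal to the limit
    print(should_checkpoint(10, limit, force=True))   # forced
```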

tasktracker
What it does: Runs a MapReduce TaskTracker node.
Syntax: hadoop tasktracker
Example: hadoop tasktracker
