
Hadoop's daemons expose a handful of ports over TCP. Some of these ports are used by Hadoop's daemons to communicate amongst themselves (to schedule jobs, replicate blocks, etc.). Other ports listen directly for users, either via an interposed Java client, which communicates via internal protocols, or via plain old HTTP.

This post summarizes the ports that Hadoop uses; it's intended to be a quick reference guide both for
users, who struggle with remembering the correct port number, and systems administrators, who need to
configure firewalls accordingly.

Web UIs for the Common User

The default Hadoop ports are as follows:

Daemon                        Default Port   Configuration Parameter
HDFS Namenode                 50070          dfs.http.address
HDFS Datanodes                50075          dfs.datanode.http.address
HDFS Secondarynamenode        50090          dfs.secondary.http.address
HDFS Backup/Checkpoint node * 50105          dfs.backup.http.address
MR Jobtracker                 50030          mapred.job.tracker.http.address
MR Tasktrackers               50060          mapred.task.tracker.http.address

* Replaces secondarynamenode in 0.21.
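If you just need to confirm which of these web UIs are reachable from a given machine (say, while checking firewall rules), a quick probe of the ports in the table above is enough. The sketch below is only illustrative: the hostnames are placeholders for your own cluster, and it assumes the defaults have not been overridden in your configuration.

    # Probe the default Hadoop web UI ports from the table above.
    # Hostnames are placeholders for your own machines; ports assume the
    # defaults have not been overridden in your configuration.
    import urllib.request

    WEB_UI_PORTS = {
        "namenode":          ("namenode.example.com", 50070),
        "datanode":          ("datanode1.example.com", 50075),
        "secondarynamenode": ("namenode.example.com", 50090),
        "jobtracker":        ("jobtracker.example.com", 50030),
        "tasktracker":       ("datanode1.example.com", 50060),
    }

    for daemon, (host, port) in WEB_UI_PORTS.items():
        url = "http://%s:%d/" % (host, port)
        try:
            urllib.request.urlopen(url, timeout=5)
            print("reachable    %-18s %s" % (daemon, url))
        except OSError as exc:
            print("unreachable  %-18s %s (%s)" % (daemon, url, exc))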

Hadoop daemons expose some information over HTTP. All Hadoop daemons expose the following:

/logs
Exposes, for downloading, log files in the Java system property hadoop.log.dir.
/logLevel
Allows you to dial up or down log4j logging levels. This is similar to hadoop daemonlog on the
command line.
/stacks
Stack traces for all threads. Useful for debugging.
/metrics
Metrics for the server. Use /metrics?format=json to retrieve the data in a structured form.
Available in 0.21.
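As a small illustration of the /metrics endpoint, the sketch below pulls the JSON form from a namenode and pretty-prints it. The host and port are placeholders (any daemon's HTTP port from the table above serves /metrics), and the layout of the returned JSON varies by version.

    # Fetch structured metrics from a daemon's /metrics endpoint (0.21+).
    # The host and port below are placeholders.
    import json
    import urllib.request

    url = "http://namenode.example.com:50070/metrics?format=json"
    with urllib.request.urlopen(url, timeout=5) as resp:
        metrics = json.loads(resp.read().decode("utf-8"))

    # The JSON layout varies by version, so just pretty-print whatever came back.
    print(json.dumps(metrics, indent=2, sort_keys=True))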
Individual daemons expose extra daemon-specific endpoints as well. Note that these are not necessarily
part of Hadoop's public API, so they tend to change over time.

The Namenode exposes:

/
Shows information about the namenode as well as the HDFS. There's a link from here to browse
the filesystem, as well.
/dfsnodelist.jsp?whatNodes=(DEAD|LIVE)
Shows lists of nodes that are disconnected from (DEAD) or connected to (LIVE) the namenode.
/fsck
Runs the fsck command. Not recommended on a busy cluster.
/listPaths
Returns an XML-formatted directory listing. This is useful if you wish (for example) to poll HDFS to
see if a file exists (see the sketch after this list). The URL can include a path (e.g., /listPaths/user/philip) and can take
optional GET arguments: /listPaths?recursive=yes will return all files on the file
system; /listPaths/user/philip?filter=s.* will return all files in the home directory that
start with s; and /listPaths/user/philip?exclude=.txt will return all files except text
files in the home directory. Beware that filter and exclude operate on the directory listed in the
URL, and they ignore the recursive flag.
/data and /fileChecksum
These forward your HTTP request to an appropriate datanode, which in turn returns the data or
the checksum.
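To make the /listPaths polling use case concrete, here is a minimal sketch that asks the namenode whether a given path exists. The namenode address and the path are placeholders, and since the exact element names of the XML listing aren't spelled out here, the check simply looks for any element whose path attribute matches the query.

    # Poll the namenode's /listPaths endpoint to check whether a path exists.
    # The namenode address and the path are placeholders for your cluster.
    import urllib.error
    import urllib.request
    import xml.etree.ElementTree as ET

    NAMENODE = "http://namenode.example.com:50070"

    def path_exists(path):
        url = NAMENODE + "/listPaths" + path
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                listing = ET.fromstring(resp.read())
        except urllib.error.HTTPError:
            # A nonexistent path typically comes back as an HTTP error.
            return False
        # Rather than depend on exact element names in the XML listing,
        # look for any element whose 'path' attribute matches the query.
        return any(el.get("path") == path for el in listing.iter())

    print(path_exists("/user/philip/some-file.txt"))   # hypothetical path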
Datanodes expose the following:

/browseBlock.jsp, /browseDirectory.jsp, /tail.jsp, /streamFile, /getFileChecksum


These are the endpoints that the namenode redirects to when you are browsing filesystem
content. You probably wouldn't use these directly, but this is what's going on underneath.
/blockScannerReport
Every datanode verifies its blocks at configurable intervals. This endpoint provides a listing of that
check.
The secondarynamenode exposes a simple status page with information including which namenode it's
talking to, when the last checkpoint was, how big it was, and which directories it's using.

The jobtracker's UI is commonly used to look at running jobs, and, especially, to find the causes of failed
jobs. The UI is best browsed starting at /jobtracker.jsp. There are over a dozen related pages
providing details on tasks, history, scheduling queues, jobs, etc.

Tasktrackers have a simple page (/tasktracker.jsp), which shows running tasks. They also
expose /taskLog?taskid= to query logs for a specific task. They use /mapOutput to serve the output
of map tasks to reducers, but this is an internal API.
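If you prefer to pull a task's logs programmatically rather than through the browser, a plain HTTP request against the tasktracker that ran the attempt does the job. In the sketch below, the hostname and the task attempt ID are placeholders; the port is the tasktracker web UI default from the table above.

    # Pull the logs for one task attempt from the tasktracker that ran it.
    # The hostname and the task attempt ID are placeholders.
    import urllib.request

    tasktracker = "http://datanode1.example.com:50060"
    taskid = "attempt_201001010000_0001_m_000000_0"   # hypothetical attempt ID

    url = "%s/taskLog?taskid=%s" % (tasktracker, taskid)
    with urllib.request.urlopen(url, timeout=10) as resp:
        print(resp.read().decode("utf-8", errors="replace"))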

Under the Covers for the Developer and the System Administrator

Internally, Hadoop mostly uses Hadoop IPC to communicate amongst servers. (Part of the goal of
the Apache Avro project is to replace Hadoop IPC with something that is easier to evolve and more
language-agnostic; HADOOP-6170 is the relevant ticket.) Hadoop also uses HTTP (for the
secondarynamenode communicating with the namenode and for the tasktrackers serving map outputs to
the reducers) and a raw network socket protocol (for datanodes copying around data).

The following table presents the ports and protocols (including the relevant Java class) that Hadoop uses.
This table does not include the HTTP ports mentioned above.

Daemon       Default Port    Configuration Parameter              Protocol                                          Used for
Namenode     8020            fs.default.name *                    IPC: ClientProtocol                               Filesystem metadata operations
Datanode     50010           dfs.datanode.address                 Custom Hadoop Xceiver: DataNode and DFSClient     DFS data transfer
Datanode     50020           dfs.datanode.ipc.address             IPC: InterDatanodeProtocol,                       Block metadata operations and recovery
                                                                  ClientDatanodeProtocol, ClientProtocol
Backupnode   50100           dfs.backup.address                   Same as namenode                                  HDFS metadata operations
Jobtracker   Ill-defined †   mapred.job.tracker                   IPC: JobSubmissionProtocol, InterTrackerProtocol  Job submission, tasktracker heartbeats
Tasktracker  127.0.0.1:0 ‡   mapred.task.tracker.report.address   IPC: TaskUmbilicalProtocol                        Communicating with child jobs

* This is the port part of hdfs://host:8020/.
† Default is not well-defined. Common values are 8021, 9001, or 8012. See MAPREDUCE-566.
‡ Binds to an unused local port.
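For firewall configuration, these are the ports (together with the HTTP ports in the first table) that need to be open between the right machines, and a raw TCP connect test is usually enough to verify that. The sketch below uses placeholder hostnames and the default port numbers; the jobtracker entry is a guess at 8021 and must be adjusted to whatever mapred.job.tracker actually specifies. The tasktracker's report address is bound to localhost, so there is nothing to check remotely.

    # Verify TCP reachability of Hadoop's internal ports. Hostnames are
    # placeholders, and the jobtracker port must match whatever your
    # mapred.job.tracker setting actually says.
    import socket

    CHECKS = [
        ("namenode.example.com",   8020,  "namenode IPC (fs.default.name)"),
        ("datanode1.example.com",  50010, "datanode data transfer"),
        ("datanode1.example.com",  50020, "datanode IPC"),
        ("jobtracker.example.com", 8021,  "jobtracker IPC (site-specific)"),
    ]

    for host, port, role in CHECKS:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(5)
        try:
            sock.connect((host, port))
            print("open    %s:%d  %s" % (host, port, role))
        except OSError as exc:
            print("closed  %s:%d  %s (%s)" % (host, port, role, exc))
        finally:
            sock.close()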
