Guide to Running MapReduceShortestPath

DISTRIBUTED PROCESSING WITH MAPREDUCE
Shortest Path with MapReduce Using Parallel Breadth-First Search (BFS)
1. The MapReduce algorithm

1.1 Data organization
Consider the graph shown in Figure 1.
Figure 1. A graph with 10 vertices and 20 edges
The source node is the vertex from which we look for the shortest path to every other vertex.
a. The initial adjacency list is organized as follows:
{Node Number, Node Weight, Node Status} TAB {Adjacent Node, Edge Weight} TAB {Adjacent Node, Edge Weight} ...
Where:
• Node Weight = 0 for the source node.
• Node Weight = INFINITY for all other nodes.
• Node Status = UNDISCOVERED for all nodes.
• Adjacent Node: a node adjacent to Node Number.
• Edge Weight: the weight of the edge from Node Number to Adjacent Node.
For the graph above, the adjacency list is therefore initialized as follows:
Node Count: 10
Edge Count: 20
Adjacency List:
{Node Number, Node Weight, Node Status} {Adjacent Node, Edge Weight} {Adjacent Node, Edge Weight} ...
{1,0,UNDISCOVERED} {2,40} {3,8} {4,10}
{2,INFINITY,UNDISCOVERED} {5,6} {7,10}
{3,INFINITY,UNDISCOVERED} {2,4} {4,12} {6,2}
{4,INFINITY,UNDISCOVERED} {6,1}
b. The intermediate adjacency list is organized as follows:

{Node Number, Node Weight, Node Status, Path} TAB {Adjacent Node, Edge Weight} TAB {Adjacent Node, Edge Weight} ...
Where:
• Node Number = taken from the vertices of the graph.
• Node Weight = 0 for the source node
              = the smallest weight found so far from the source node to this node (for all other nodes)
• Node Status = UNDISCOVERED (for nodes that have not been expanded yet)
              = DISCOVERED (for nodes that have been expanded)
• Path = the path from the source node to this node.
• Adjacent Node: a node adjacent to Node Number.
• Edge Weight: the weight of the edge from Node Number to Adjacent Node.
{Node Number, Node Weight, Node Status, Path} {Adjacent Node, Edge Weight} {Adjacent Node, Edge Weight} ...
{1,0,DISCOVERED,1} {2,40} {3,8} {4,10}
{2,12,UNDISCOVERED,1-3} {5,6} {7,10}
{3,8,DISCOVERED,1} {2,4} {4,12} {6,2}
{4,10,DISCOVERED,1} {6,1}
{5,46,UNDISCOVERED,1-2} {3,2} {6,2} {7,4}
{6,10,UNDISCOVERED,1-3} {8,4} {9,3}
{7,50,UNDISCOVERED,1-2} {8,20} {10,1}
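To make the record layout concrete, here is a minimal sketch of how one line of the adjacency list could be split into its fields. RecordParser and its methods are hypothetical helpers written for this illustration only; they are not part of the program in section 2, and the sketch assumes that the key and the adjacent nodes are separated by TAB characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the original program) that splits one
// adjacency-list line into its key fields and its {node,weight} edges.
class RecordParser {
    // Strips the surrounding braces and splits on commas, for example
    // "{2,12,UNDISCOVERED,1-3}" gives [2, 12, UNDISCOVERED, 1-3].
    static String[] fields(String braced) {
        return braced.substring(1, braced.length() - 1).split(",");
    }

    // Returns the edges of a line as int pairs {adjacentNode, edgeWeight}.
    static List<int[]> edges(String line) {
        String[] parts = line.split("\t");
        List<int[]> result = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {      // parts[0] is the key
            String token = parts[i].trim();
            if (token.isEmpty()) {
                continue;                             // tolerate stray TABs
            }
            String[] e = fields(token);
            result.add(new int[] { Integer.parseInt(e[0]), Integer.parseInt(e[1]) });
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "{2,12,UNDISCOVERED,1-3}\t{5,6}\t{7,10}";
        String[] key = fields(line.split("\t")[0]);
        System.out.println("node=" + key[0] + " weight=" + key[1]
                + " status=" + key[2] + " path=" + key[3]);
        for (int[] e : edges(line)) {
            System.out.println("edge to " + e[0] + " with weight " + e[1]);
        }
    }
}
```

A key in the initial format has three fields and a key in the intermediate format has four, so code that reads the Path field should check the array length first.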
1.2 The MapReduce algorithm

• The Map - Reduce process is repeated until no Key-Value pair with an UNDISCOVERED node is found any more.
• The output of the current iteration is used as the input of the next iteration.
Initial Adjacency List Format:

{Node Number,Node Weight,Node Status} TAB {Adjacent Node, Edge Weight} TAB {Adjacent Node, Edge Weight} ...
Node Number = From Node Number (For all Nodes)
Node Weight = 0 (For the Source Node),
              INFINITY (For other Nodes)
              This is the minimum weight it takes to reach from the Source Node to the From Node
Node Status = UNDISCOVERED (For all Nodes)
Adjacent Node = Adjacent Node Number with an Edge from the From Node
Edge Weight = Weight of the Edge from the From Node to the Adjacent Node
Intermediate Adjacency List Format:
{Node Number,Node Weight,Node Status,Path} TAB {Adjacent Node, Edge Weight} TAB {Adjacent Node, Edge Weight} ...
Node Number = From Node Number (For all Nodes)
Node Weight = 0 (For the Source Node)
              Minimum weight it takes in reaching from the Source Node to the From Node (For other Nodes)
Node Status = UNDISCOVERED (For Nodes that are not Expanded)
              DISCOVERED (For Nodes that are Expanded)
Path = The path from the Source Node to the From Node
Adjacent Node = Adjacent Node Number with an Edge from the From Node
Edge Weight = Weight of the Edge from the From Node to the Adjacent Node
Mapper Pseudo Code:

Input Key: {Node Number,Node Weight,Node Status} OR {Node Number,Node Weight,Node Status,Path}
Input Value: {Adjacent Node, Edge Weight} TAB {Adjacent Node, Edge Weight} …
For Every From Node [Key-Value Pair]
    If From Node Is Not Discovered And Has A Valid Weight
        Updated Path = From Node Path + From Node
        Emit An Updated From Node Key-Value Pair With
            From Node Status = DISCOVERED
            From Node Path = Updated Path
        For Every Adjacent Node
            Emit A New Adjacent Node Key-Value Pair With
                Adjacent Node Weight = From Node Weight + Edge Weight
                Adjacent Node Status = UNDISCOVERED
                Adjacent Node Path = Updated Path
    Else
        Emit The Received From Node Key-Value Pair
End
The Map process

1) If the current node (the From Node in the key) is valid, that is, its Node Weight is not INFINITY and its Node Status is not DISCOVERED:
   Build the current path:
   currentPath += "" if there is no path yet.
   currentPath += "-" if a path already exists, then append the node number from the key.
   Emit the updated record (Node Status = DISCOVERED) to the mapper output.
   2) If the value is not empty:
      Iterate over all adjacent edges:
      a) New Node Weight = old Node Weight + weight of the adjacent edge (Edge Weight)
      b) Emit a mapper output whose key is the node adjacent to the current node.
      c) Mark the emitted node as still to be processed: Node Status = UNDISCOVERED
Otherwise, emit the received (key, value) pair unchanged.
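The map step described above can be sketched in plain Java without any Hadoop types. MapStep is a hypothetical class used only for illustration; it returns the emitted pairs as a list instead of writing them to an OutputCollector, and it compares the status strings case-sensitively, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the map step (no Hadoop types). MapStep and its
// method name are illustrative assumptions, not the program's real classes.
class MapStep {
    // Returns the emitted pairs as {key, value} string arrays.
    static List<String[]> map(String key, String value) {
        List<String[]> out = new ArrayList<>();
        String[] node = key.substring(1, key.length() - 1).split(",");
        boolean expandable = node.length >= 3
                && !node[1].equals("INFINITY")
                && !node[2].equals("DISCOVERED");
        if (!expandable) {
            out.add(new String[] { key, value });     // pass the record through unchanged
            return out;
        }
        // Extend the path with the current node number.
        String path = (node.length == 4 ? node[3] + "-" : "") + node[0];
        out.add(new String[] {
            "{" + node[0] + "," + node[1] + ",DISCOVERED," + path + "}", value });
        if (!value.trim().isEmpty()) {
            int weight = Integer.parseInt(node[1]);
            for (String edge : value.trim().split("\t")) {
                String[] e = edge.substring(1, edge.length() - 1).split(",");
                int candidate = weight + Integer.parseInt(e[1]);
                // Adjacent nodes are emitted with an empty value.
                out.add(new String[] {
                    "{" + e[0] + "," + candidate + ",UNDISCOVERED," + path + "}", "" });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : map("{1,0,UNDISCOVERED}", "{2,40}\t{3,8}\t{4,10}")) {
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

For the source record of the example graph, the sketch emits the DISCOVERED record for node 1 followed by one UNDISCOVERED key for each of the nodes 2, 3 and 4.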
Output Key Comparator Pseudo Code:

Input Key / WritableComparable: {Node Number,Node Weight,Node Status,Path}
If From Node Number == To Node Number
    Determine If From Node Weight Is Greater / Equal / Lesser Than To Node Weight
Else
    Determine If From Node Number Is Greater / Lesser Than To Node Number
End
Output Value Grouping Comparator Pseudo Code:

If From Node Number == To Node Number
    They Belong To The Same Group
Else
    Determine If From Node Number Is Greater / Lesser Than To Node Number
End
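The two comparison rules can be tried out on plain strings. The sketch below uses java.util.Comparator as a stand-in for Hadoop's WritableComparator; KeyOrdering and its members are hypothetical names, and treating a non-numeric weight such as INFINITY as Integer.MAX_VALUE is an assumption chosen so that such keys sort last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative stand-in for the sort and grouping comparators, using
// java.util.Comparator on key strings instead of Hadoop's WritableComparator.
class KeyOrdering {
    // Parses field `index` of a key as an int; non-numeric values such as
    // INFINITY are treated as Integer.MAX_VALUE so that they sort last.
    static int field(String key, int index) {
        String[] parts = key.substring(1, key.length() - 1).split(",");
        try {
            return Integer.parseInt(parts[index]);
        } catch (NumberFormatException e) {
            return Integer.MAX_VALUE;
        }
    }

    // Sort order: node number first, then node weight.
    static final Comparator<String> SORT =
            Comparator.<String>comparingInt(k -> field(k, 0))
                      .thenComparingInt(k -> field(k, 1));

    // Grouping: keys with the same node number belong to the same group.
    static final Comparator<String> GROUP = Comparator.comparingInt(k -> field(k, 0));

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>(Arrays.asList(
                "{2,40,DISCOVERED,1}", "{3,8,DISCOVERED,1}",
                "{2,12,UNDISCOVERED,1-3}", "{1,0,DISCOVERED,1}"));
        keys.sort(SORT);
        System.out.println(keys);
        // Both keys of node 2 compare as equal for grouping purposes.
        System.out.println(GROUP.compare(keys.get(1), keys.get(2)) == 0);
    }
}
```

Because the lowest weight for a node sorts first within its group, the reducer sees the best candidate key for each node before the others.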
Partitioner Pseudo Code:

Use The Node Number To Determine The Partition
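As a small illustration of this rule, the sketch below hashes only the node number of a key, so every key for the same node lands in the same partition regardless of its weight, status or path. NodePartition is a hypothetical name, and the code works on plain strings rather than on Hadoop's Text type.

```java
// Sketch of the partitioning rule: hash only the node number so that every
// key for the same node reaches the same reducer. NodePartition is a
// hypothetical name used for illustration.
class NodePartition {
    static int partition(String key, int numReduceTasks) {
        String nodeNumber = key.substring(1, key.length() - 1).split(",")[0];
        // Masking with Integer.MAX_VALUE clears the sign bit of the hash,
        // so the remainder is never negative.
        return (nodeNumber.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int a = partition("{2,12,UNDISCOVERED,1-3}", 4);
        int b = partition("{2,40,DISCOVERED,1}", 4);
        System.out.println(a == b);   // both keys belong to node 2
    }
}
```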
Reducer Pseudo Code:

Input Key: {Node Number,Node Weight,Node Status,Path}
Input Values: {Adjacent Node, Edge Weight} TAB {Adjacent Node, Edge Weight} …
For Every From Node [Key]
    New Value = Append Each Adjacent Node [Value] Separated by a TAB
    Emit the Key-New Value Pair
The Reduce function aggregates the <key, value> pairs produced by the Map function. Keys with the same node number are grouped and handled by the same Reduce call.
1) Initialize a newValue.
2) For each key, while there are values left:
   append the value to newValue.
3) Emit the key together with newValue to the reducer output.
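The reduce step can be sketched on plain strings as follows. ReduceStep is a hypothetical name; skipping empty values is an assumption of this sketch, based on the fact that the mapper emits adjacent nodes with an empty value, so normally only one value per group carries an adjacency list.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the reduce step on plain strings. ReduceStep is a hypothetical
// name; the real reducer works on Hadoop Text objects.
class ReduceStep {
    // Joins the non-empty values of one group with TAB characters.
    static String reduce(List<String> values) {
        StringBuilder newValue = new StringBuilder();
        for (String value : values) {
            String trimmed = value.trim();
            if (trimmed.isEmpty()) {
                continue;                 // keys emitted for adjacent nodes carry no value
            }
            if (newValue.length() > 0) {
                newValue.append('\t');
            }
            newValue.append(trimmed);
        }
        return newValue.toString();
    }

    public static void main(String[] args) {
        // Group for node 2: one candidate key without a value and the
        // original record with its adjacency list.
        String merged = reduce(Arrays.asList("", "{5,6}\t{7,10}"));
        System.out.println("{2,12,UNDISCOVERED,1-3}\t" + merged);
    }
}
```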
1.3 Illustration of the computation steps
File input.txt (graph illustration not shown):

{1,0,UNDISCOVERED} {2,40} {3,8} {4,10}
Note: there must be no space after a comma (,).
Iteration 1:
Mapper Output [before sorting and grouping] (graph illustration not shown):
{1,0,DISCOVERED,1} {2,40} {3,8} {4,10}
{2,40,UNDISCOVERED,1}
{7,INFINITY,UNDISCOVERED} {8,20} {10,1}
Mapper Output [after sorting and grouping]:

{1,0,DISCOVERED,1} {2,40} {3,8} {4,10}
{3,INFINITY,UNDISCOVERED} {2,4} {4,12} {6,2}
Iteration 2:
Reducer Input / Output: (graph illustration not shown)
Mapper Input:
{1,0,DISCOVERED,1} {2,40} {3,8} {4,10}
{2,40,UNDISCOVERED,1} {5,6} {7,10}
{3,8,UNDISCOVERED,1} {2,4} {4,12} {6,2}
{4,10,UNDISCOVERED,1} {6,1}
Mapper Output [after sorting and grouping]:

{1,0,DISCOVERED,1} {2,40} {3,8} {4,10}
{2,12,UNDISCOVERED,1-3}
{2,40,DISCOVERED,1} {5,6} {7,10}
{3,8,DISCOVERED,1} {2,4} {4,12} {6,2}
{7,INFINITY,UNDISCOVERED} {8,20} {10,1}
Iteration 3:
Reducer Input / Output: (graph illustration not shown)

Mapper Input:
{1,0,DISCOVERED,1} {2,40} {3,8} {4,10}
{2,12,UNDISCOVERED,1-3} {5,6} {7,10}
{3,8,DISCOVERED,1} {2,4} {4,12} {6,2}
{5,46,UNDISCOVERED,1-2} {3,2} {6,2} {7,4}
{6,10,UNDISCOVERED,1-3} {8,4} {9,3}
{7,50,UNDISCOVERED,1-2} {8,20} {10,1}
Reducer Input / Output:
Final Output (graph illustration not shown):

{1,0,DISCOVERED,1} {2,40} {3,8} {4,10}
{2,12,DISCOVERED,1-3-2} {5,6} {7,10}
{3,8,DISCOVERED,1-3} {2,4} {4,12} {6,2}
{4,10,DISCOVERED,1-4} {6,1}
{5,14,DISCOVERED,1-3-6-8-5} {3,2} {6,2} {7,4}
{6,10,DISCOVERED,1-3-6} {8,4} {9,3}
{7,18,DISCOVERED,1-3-6-8-5-7} {8,20} {10,1}
{8,14,DISCOVERED,1-3-6-8} {5,0} {10,20}
{9,13,DISCOVERED,1-3-6-9} {4,6} {10,2}
{10,15,DISCOVERED,1-3-6-9-10}
The Path in each Key-Value Pair is the Weighted Shortest Path from the Source Node (node 1) to the Node in the Key.
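The whole iteration can be replayed in memory to check the final output above. The following is a sketch under stated assumptions, not the Hadoop job itself: BfsSimulation and its helpers are hypothetical names, the adjacency lists of nodes 5 to 10 are taken from the Final Output listing, shuffle and reduce are collapsed into a single merge helper, and the tie-breaking rule (on equal weights a DISCOVERED key is preferred) is an assumption added so that the loop is guaranteed to stop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the iterative job (no Hadoop). BfsSimulation and its
// helpers are hypothetical names used only for this illustration.
class BfsSimulation {
    // Initial adjacency list; the key and the adjacent nodes are TAB-separated.
    static final String[] INPUT = {
        "{1,0,UNDISCOVERED}\t{2,40}\t{3,8}\t{4,10}",
        "{2,INFINITY,UNDISCOVERED}\t{5,6}\t{7,10}",
        "{3,INFINITY,UNDISCOVERED}\t{2,4}\t{4,12}\t{6,2}",
        "{4,INFINITY,UNDISCOVERED}\t{6,1}",
        "{5,INFINITY,UNDISCOVERED}\t{3,2}\t{6,2}\t{7,4}",
        "{6,INFINITY,UNDISCOVERED}\t{8,4}\t{9,3}",
        "{7,INFINITY,UNDISCOVERED}\t{8,20}\t{10,1}",
        "{8,INFINITY,UNDISCOVERED}\t{5,0}\t{10,20}",
        "{9,INFINITY,UNDISCOVERED}\t{4,6}\t{10,2}",
        "{10,INFINITY,UNDISCOVERED}"
    };

    static String[] parts(String braced) {
        return braced.substring(1, braced.length() - 1).split(",");
    }

    static int weight(String key) {
        String w = parts(key)[1];
        return w.equals("INFINITY") ? Integer.MAX_VALUE : Integer.parseInt(w);
    }

    static boolean discovered(String key) {
        return parts(key)[2].equals("DISCOVERED");
    }

    // Stand-in for shuffle + reduce: per node, keep the lowest-weight key
    // (preferring DISCOVERED on ties) and the non-empty adjacency list.
    static void merge(Map<Integer, String[]> groups, String key, String value) {
        int node = Integer.parseInt(parts(key)[0]);
        String[] group = groups.get(node);
        if (group == null) {
            groups.put(node, new String[] { key, value });
            return;
        }
        int cmp = Integer.compare(weight(key), weight(group[0]));
        if (cmp < 0 || (cmp == 0 && discovered(key) && !discovered(group[0]))) {
            group[0] = key;
        }
        if (!value.isEmpty()) {
            group[1] = value;
        }
    }

    // Repeats map + merge until an iteration emits no adjacent node.
    static List<String> run(String[] input) {
        List<String> records = new ArrayList<>(Arrays.asList(input));
        int emitted;
        do {
            emitted = 0;
            Map<Integer, String[]> groups = new TreeMap<>();
            for (String record : records) {
                int tab = record.indexOf('\t');
                String key = tab < 0 ? record : record.substring(0, tab);
                String value = tab < 0 ? "" : record.substring(tab + 1);
                String[] n = parts(key);
                if (n[1].equals("INFINITY") || n[2].equals("DISCOVERED")) {
                    merge(groups, key, value);        // pass through unchanged
                    continue;
                }
                String path = (n.length == 4 ? n[3] + "-" : "") + n[0];
                merge(groups, "{" + n[0] + "," + n[1] + ",DISCOVERED," + path + "}", value);
                if (value.isEmpty()) {
                    continue;
                }
                for (String edge : value.split("\t")) {
                    String[] e = parts(edge);
                    int w = Integer.parseInt(n[1]) + Integer.parseInt(e[1]);
                    merge(groups, "{" + e[0] + "," + w + ",UNDISCOVERED," + path + "}", "");
                    emitted++;
                }
            }
            records = new ArrayList<>();
            for (String[] group : groups.values()) {
                records.add(group[1].isEmpty() ? group[0] : group[0] + "\t" + group[1]);
            }
        } while (emitted > 0);
        return records;
    }

    public static void main(String[] args) {
        for (String record : run(INPUT)) {
            System.out.println(record);
        }
    }
}
```

On this graph the sketch ends with the same weights and paths as the Final Output listing, for example weight 18 and path 1-3-6-8-5-7 for node 7.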
2. The MapReduce program

The program is organized into 3 files.
2.1 File ShortestPath.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import java.io.IOException;
import java.util.Iterator;
import java.util.Arrays;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Partitioner;
class AlgorithmMapper extends MapReduceBase implements Mapper<Text, Text, Text, Text>
{
    private static final String DISCOVERED = "DISCOVERED";
    private static final String UNDISCOVERED = "UNDISCOVERED";
    private static final String INFINITY = "INFINITY";
    private static final int MINIMUM_SOURCE_NODE_TOKENS = 3;
    private static final int MAXIMUM_SOURCE_NODE_TOKENS = 4;
    public static final int INDEX_SOURCE_NODE_NUMBER = 0;
    public static final int INDEX_SOURCE_NODE_WEIGHT = 1;
    private static final int INDEX_SOURCE_NODE_STATUS = 2;
    private static final int INDEX_SOURCE_NODE_PATH = 3;
    private Text emptyText = new Text();

    @Override
    public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException
    {
        // Strip the surrounding braces and split the key into its fields.
        String keyText = key.toString();
        String[] sourceNodeDetails = keyText.substring(1, keyText.length() - 1).split(",");

        // Expand only nodes that have a finite weight and are not DISCOVERED yet.
        if (sourceNodeDetails.length >= MINIMUM_SOURCE_NODE_TOKENS
                && !sourceNodeDetails[INDEX_SOURCE_NODE_WEIGHT].equalsIgnoreCase(INFINITY)
                && !sourceNodeDetails[INDEX_SOURCE_NODE_STATUS].equalsIgnoreCase(DISCOVERED))
        {
            // Extend the path (if any) with the current node number.
            String currentPath = "";
            if (sourceNodeDetails.length == MAXIMUM_SOURCE_NODE_TOKENS)
            {
                currentPath = sourceNodeDetails[INDEX_SOURCE_NODE_PATH];
            }
            currentPath += (currentPath.length() == 0 ? "" : "-")
                    + sourceNodeDetails[INDEX_SOURCE_NODE_NUMBER];

            // Re-emit the current node as DISCOVERED together with its adjacency list.
            output.collect(new Text("{" + sourceNodeDetails[INDEX_SOURCE_NODE_NUMBER] + ","
                    + sourceNodeDetails[INDEX_SOURCE_NODE_WEIGHT] + "," + DISCOVERED + ","
                    + currentPath + "}"), value);

            if (value.toString().trim().length() > 0)
            {
                // Each token has the form {Adjacent Node,Edge Weight}.
                String[] tokens = value.toString().trim().split("\t");
                String[][] adjacentNodeDetails = new String[tokens.length][2];
                for (int index = 0; index < tokens.length; index++)
                {
                    adjacentNodeDetails[index] =
                            tokens[index].substring(1, tokens[index].length() - 1).split(",");
                }
                int sourceNodeWeight =
                        Integer.parseInt(sourceNodeDetails[INDEX_SOURCE_NODE_WEIGHT]);
                for (int index = 0; index < tokens.length; index++)
                {
                    // Candidate weight of the adjacent node when reached through this node.
                    int number = sourceNodeWeight
                            + Integer.parseInt(adjacentNodeDetails[index][1]);
                    output.collect(new Text("{" + adjacentNodeDetails[index][0] + ","
                            + Integer.toString(number) + "," + UNDISCOVERED + ","
                            + currentPath + "}"), emptyText);
                    // The driver keeps iterating while this counter is greater than zero.
                    reporter.incrCounter(ShortestPath.CUSTOM_COUNTERS,
                            ShortestPath.NUMBER_OF_UNDISCOVERED_NODES_TO_BE_PROCESSED, 1);
                }
            }
        }
        else
        {
            // Pass every other record through unchanged.
            output.collect(key, value);
        }
    }
}//AlgorithmMapper
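class AlgorithmReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text>
{
    @Override
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException
    {
        // Join the non-empty values of the group with TAB. Keys emitted for
        // adjacent nodes carry an empty value, so normally only the original
        // record of the node contributes its adjacency list.
        StringBuilder newValue = new StringBuilder();
        while (values.hasNext())
        {
            String current = values.next().toString().trim();
            if (current.length() == 0)
            {
                continue;
            }
            if (newValue.length() > 0)
            {
                newValue.append("\t");
            }
            newValue.append(current);
        }
        output.collect(key, new Text(newValue.toString()));
    }
}//AlgorithmReducer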
class AlgorithmPartitioner implements Partitioner<Text, Text>
{
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks)
    {
        // Partition on the node number only, so that all keys of one node
        // reach the same reducer.
        String compositeKey = key.toString();
        String[] nodeDetails = compositeKey.substring(1, compositeKey.length() - 1).split(",");
        int returnValue = (nodeDetails[AlgorithmMapper.INDEX_SOURCE_NODE_NUMBER].hashCode()
                & Integer.MAX_VALUE) % numReduceTasks;
        return returnValue;
    }

    public void configure(JobConf job) { }
}//AlgorithmPartitioner
public class ShortestPath extends Configured implements Tool
{
    public static final String CUSTOM_COUNTERS = "Custom Counters";
    public static final String NUMBER_OF_UNDISCOVERED_NODES_TO_BE_PROCESSED =
            "NUMBER_OF_UNDISCOVERED_NODES_TO_BE_PROCESSED";

    @Override
    public int run(String[] args) throws Exception
    {
        int iteration = 0;
        long toBeProcessed = 0;
        boolean finalRun = false;
        String inputPath = args[0];
        String outputPath = args[1];
        do
        {
            JobConf conf = new JobConf(getConf(), ShortestPath.class);
            conf.setJobName(this.getClass().getName());
            FileInputFormat.setInputPaths(conf, new Path(inputPath));
            // Intermediate iterations write to <output>-<n>; the final run writes to <output>.
            FileOutputFormat.setOutputPath(conf,
                    new Path(finalRun ? outputPath : outputPath + "-" + iteration));
            conf.setInputFormat(KeyValueTextInputFormat.class);
            conf.setMapperClass(AlgorithmMapper.class);
            conf.setOutputKeyComparatorClass(AlgorithmOutputKeyComparator.class);
            conf.setOutputValueGroupingComparator(AlgorithmOutputValueGroupingComparator.class);
            conf.setPartitionerClass(AlgorithmPartitioner.class);
            conf.setReducerClass(AlgorithmReducer.class);
            if (finalRun)
            {
                // A single reducer collects the final result in one part file.
                conf.setNumReduceTasks(1);
            }
            conf.setMapOutputKeyClass(Text.class);
            conf.setMapOutputValueClass(Text.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);
            RunningJob job = JobClient.runJob(conf);
            // The output of this iteration becomes the input of the next one.
            inputPath = outputPath + "-" + iteration;
            iteration++;
            toBeProcessed = job.getCounters().findCounter(CUSTOM_COUNTERS,
                    NUMBER_OF_UNDISCOVERED_NODES_TO_BE_PROCESSED).getValue();
            // Once nothing is left to expand, schedule exactly one final run.
            finalRun = (toBeProcessed == 0 && !finalRun);
        }
        while (toBeProcessed > 0 || finalRun);
        return 0;
    }

    public static void main(String[] args) throws Exception
    {
        int exitCode = ToolRunner.run(new ShortestPath(), args);
        System.out.print("Ket thuc"); // "Ket thuc" means "Finished"
        System.exit(exitCode);
    }
}
2.2 File AlgorithmOutputKeyComparator.java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class AlgorithmOutputKeyComparator extends WritableComparator
{
    public AlgorithmOutputKeyComparator()
    {
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable wc1, WritableComparable wc2)
    {
        String compositeKey1 = ((Text) wc1).toString().trim();
        String compositeKey2 = ((Text) wc2).toString().trim();
        String[] nodeDetails1 = compositeKey1.substring(1, compositeKey1.length() - 1).split(",");
        String[] nodeDetails2 = compositeKey2.substring(1, compositeKey2.length() - 1).split(",");

        // Node numbers; a value that cannot be parsed sorts last.
        Integer node1 = null;
        Integer node2 = null;
        try
        {
            node1 = Integer.valueOf(nodeDetails1[AlgorithmMapper.INDEX_SOURCE_NODE_NUMBER]);
        }
        catch (Exception e)
        {
            node1 = Integer.valueOf(Integer.MAX_VALUE);
        }
        try
        {
            node2 = Integer.valueOf(nodeDetails2[AlgorithmMapper.INDEX_SOURCE_NODE_NUMBER]);
        }
        catch (Exception e)
        {
            node2 = Integer.valueOf(Integer.MAX_VALUE);
        }

        // Node weights; INFINITY cannot be parsed and therefore sorts last.
        Integer weight1 = null;
        Integer weight2 = null;
        try
        {
            weight1 = Integer.valueOf(nodeDetails1[AlgorithmMapper.INDEX_SOURCE_NODE_WEIGHT]);
        }
        catch (Exception e)
        {
            weight1 = Integer.valueOf(Integer.MAX_VALUE);
        }
        try
        {
            weight2 = Integer.valueOf(nodeDetails2[AlgorithmMapper.INDEX_SOURCE_NODE_WEIGHT]);
        }
        catch (Exception e)
        {
            weight2 = Integer.valueOf(Integer.MAX_VALUE);
        }

        // Order by node number first, then by weight.
        int returnValue = node1.compareTo(node2);
        if (returnValue == 0)
        {
            returnValue = weight1.compareTo(weight2);
        }
        return returnValue;
    }
}
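2.3 File AlgorithmOutputValueGroupingComparator.java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class AlgorithmOutputValueGroupingComparator extends WritableComparator
{
    public AlgorithmOutputValueGroupingComparator()
    {
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable wc1, WritableComparable wc2)
    {
        String compositeKey1 = ((Text) wc1).toString().trim();
        String compositeKey2 = ((Text) wc2).toString().trim();
        String[] nodeDetails1 = compositeKey1.substring(1, compositeKey1.length() - 1).split(",");
        String[] nodeDetails2 = compositeKey2.substring(1, compositeKey2.length() - 1).split(",");

        // Only the node number decides which group a key belongs to.
        Integer node1 = null;
        Integer node2 = null;
        try
        {
            node1 = Integer.valueOf(nodeDetails1[AlgorithmMapper.INDEX_SOURCE_NODE_NUMBER]);
        }
        catch (Exception e)
        {
            node1 = Integer.valueOf(Integer.MAX_VALUE);
        }
        try
        {
            node2 = Integer.valueOf(nodeDetails2[AlgorithmMapper.INDEX_SOURCE_NODE_NUMBER]);
        }
        catch (Exception e)
        {
            node2 = Integer.valueOf(Integer.MAX_VALUE);
        }
        int returnValue = node1.compareTo(node2);
        return returnValue;
    }
}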
3. Compiling and running the MapReduce program

• Open a command prompt with Administrator privileges.
• Start the Hadoop daemons by running start-all.cmd.
• Change to the working directory:
cd C:\hadoop\ShortestPath
• Set the environment variables that point to the Hadoop libraries:

set HADOOP_LIB=c:\hadoop\hadoop-2.6.0\share\hadoop
set HADOOP_CLASSPATH=%HADOOP_LIB%\mapreduce\hadoop-mapreduce-client-core-2.6.0.jar;%HADOOP_LIB%\common\hadoop-common-2.6.0.jar;
• Compile the Java sources into classes\*.class files:

C:\hadoop\ShortestPath> md classes
C:\hadoop\ShortestPath> javac -classpath %HADOOP_CLASSPATH% -d classes *.java
C:\hadoop\ShortestPath> dir classes
• Delete the *.jar file if one already exists:

C:\hadoop\ShortestPath> del ShortestPathXYZ.jar
• Package the *.jar file:

C:\hadoop\ShortestPath> jar -cvf ShortestPathXYZ.jar -C classes .
• Delete the /inputXYZ directory if it already exists, create it again, and copy the data into it:
C:\hadoop\ShortestPath> hadoop fs -rm -r /inputXYZ
C:\hadoop\ShortestPath> hadoop fs -mkdir /inputXYZ
C:\hadoop\ShortestPath> hadoop fs -put input.txt /inputXYZ
C:\hadoop\ShortestPath> hadoop fs -ls /inputXYZ
• Delete the output directories (/outputXYZ and the intermediate /outputXYZ-N) if they already exist:

C:\hadoop\ShortestPath> hadoop fs -rm -r /output*
• Run the MapReduce program on the data in /inputXYZ; the result is written to /outputXYZ:
C:\hadoop\ShortestPath> hadoop jar ShortestPathXYZ.jar ShortestPath /inputXYZ /outputXYZ
The Map/Reduce process is repeated several times. Within each iteration the output of Map is the input of Reduce, and the output of one iteration (written to /outputXYZ-0, /outputXYZ-1, ...) is the input of the next.
• View the result in /outputXYZ:
c:\hadoop\MapReduceShortestPath>hadoop fs -ls /
Found 10 items
drwxr-xr-x - Administrator supergroup 0 2020-06-20 09:41 /inputXYZ
drwxr-xr-x - Administrator supergroup 0 2020-06-20 10:15 /outputXYZ
drwxr-xr-x - Administrator supergroup 0 2020-06-20 10:12 /outputXYZ-0
drwx------ - Administrator supergroup 0 2020-05-30 20:45 /tmp
C:\hadoop\ShortestPath> hadoop fs -ls /outputXYZ

C:\hadoop\ShortestPath> hadoop fs -cat /outputXYZ/part-00000
{1,0,DISCOVERED,1} {2,40} {3,8} {4,10}

{2,12,DISCOVERED,1-3-2} {5,6} {7,10}
{3,8,DISCOVERED,1-3} {2,4} {4,12} {6,2}
{4,10,DISCOVERED,1-4} {6,1}
{5,14,DISCOVERED,1-3-6-8-5} {3,2} {6,2} {7,4}
{6,10,DISCOVERED,1-3-6} {8,4} {9,3}
{7,18,DISCOVERED,1-3-6-8-5-7} {8,20} {10,1}
{8,14,DISCOVERED,1-3-6-8} {5,0} {10,20}
{9,13,DISCOVERED,1-3-6-9} {4,6} {10,2}
{10,15,DISCOVERED,1-3-6-9-10}
4. Exercise
Describe how the MapReduce algorithm processes several other data sets, following these steps:
a) Describe the data structures used in the program.
b) Organize the input data.
c) Describe the Map/Reduce process, illustrated with intermediate results and graphs.