An alternative time-optimal distributed sorting algorithm on a line network


R. Rajendra Prasath
Department of Computer and Information Science,
Norwegian University of Science and Technology,
NO-7491 Trondheim, Norway.

Abstract—In this paper, we consider the sorting problem with n elements distributed over a number of processing entities in a distributed system. We have derived an alternative, efficient algorithm with a worst case lower bound of (n − 1) rounds for distributed sorting on a line network, where n is the number of processors. The proposed distributed sorting algorithm improves the performance of each processor without creating copies of (n − 2) elements at intermediate processors and reduces the execution time of Sasaki's time-optimal algorithm [A. Sasaki, A time-optimal distributed sorting algorithm on a line network, Inform. Process. Lett., 83 (2002) pp. 21-26]. Moreover, not all processors need to perform the disjoint comparison-exchange operations, and simulation results show that the proposed algorithm results in better execution time with the identity of processors. This algorithm could also be extended for sorting the distributed elements on the linear embedding obtained from a general network.

Index Terms—Distributed algorithms, distributed sorting, computational complexity, line network.

I. INTRODUCTION

We consider the problem of sorting a set of elements distributed over processors on a line network. The traditional lower bound for the distributed sorting problem has been considered to be n rounds, because the disjoint comparison-exchange operations have required n rounds for parallel sorting on a linear array [1], [10]. This bound was reduced to (n − 1) rounds by Sasaki's algorithm [20], in which copies of elements are created at intermediate processors with restricted local memory. We have now further reduced the execution time of Sasaki's algorithm with exactly n elements, and the creation of copies of elements is avoided. This makes the proposed algorithm fast and robust for distributed environments.

In the design and analysis of algorithms, sorting is one of the most fundamental problems in Computer Science, and it has been extensively investigated in distributed contexts. To cooperate in solving a problem, the processors in a distributed computing system must communicate among themselves by exchanging messages via a network. The potential efficiency of a distributed system is inherent in the design of an effective algorithm that minimizes the number of message exchanges as well as the computation time [13].

There exist several algorithms for distributed sorting in various topologies. Loui [12] presented a simple sorting algorithm on rings. Rotem et al. [19] studied the static and dynamic versions of the sorting problem where each node contains a subset of elements. Marberg and Gafni [15] developed a sorting method for a multi-channel broadcast network. Zaks [22] presented a sorting algorithm for a tree network and then extended it to general networks. McMillin and Ni [16] described the distributed sorting problem with an unreliable network. In contrast, we deal with a static sorting problem on a reliable network [1], [7], [9], [11], [12], [15], [19], [22], which forms a basis for all sorting problems. Recently in [2], a self-stabilizing scheme for a synchronization problem on asynchronous oriented chains (Algorithm SSDS) has been proposed to solve local mutual exclusion and distributed sorting (where each process holds a single value and the values to be sorted are distinct).

(This work was carried out when Rajendra was associated with the Department of Computer Science, University of Madras, India, and a part of this work was carried out during the tenure of an ERCIM "Alain Bensoussan" Fellowship programme. E-mail: rajendra@idi.ntnu.no; drrprasath@gmail.com)

Authorized licensed use limited to: Ingrid Nurtanio. Downloaded on October 01,2020 at 02:21:38 UTC from IEEE Xplore. Restrictions apply.

An optimal sorting algorithm on the LARPBS model that runs in O(log N) time using O(N) processors, based on Cole's CREW PRAM pipelined merge sort [3], is presented in [17].

An important observation from these results is the need for a strategy that minimizes the amount of communication. For example, Gerstel and Zaks [7] showed that for every network with a tree topology T, every sorting algorithm must send at least Ω(ΔT log(L/N)) bits in the worst case, where {1, 2, · · · , L} is the set of possible initial values and ΔT is the sum of distances from all values to a median of the tree. Pan et al. [18] presented a parallel quicksort computational model on a linear array with a reconfigurable pipelined bus system. Hofstee et al. [8] designed a time-optimal algorithm for a sorting problem on a line network with restricted local memory. However, in this algorithm each processor has to hold at least two elements, and the algorithm fails when each processor has exactly one element. Recently, Sasaki [20] proposed a time-optimal distributed sorting algorithm with a strict lower bound of (n − 1) rounds on a line network by creating copies of elements at intermediate processors; thus the number of elements used for sorting is (2n − 2). Even though Sasaki's algorithm is based on the odd-even transposition sort, it does not ensure that each processor always has two elements of the same value at the final round. Hence, we propose an algorithm for distributed sorting using median based exchanges in a line network without creating copies of the elements at any processor.

In the next section, we describe the distributed sorting problem. In Section III, we present earlier work on distributed sorting. Section IV presents the proposed alternative time-optimal distributed sorting algorithm for a line network. In Section V, simulation results are presented in comparison with recent work on distributed sorting on a line network. We then describe a technique for sorting n elements on the linear embedding obtained from a general network in Section VI. Finally, concluding remarks in Section VII complete the paper.

II. THE PROBLEM AND THE COMPUTATIONAL MODEL

The definition of the distributed sorting problem considered in this paper is as follows: at the initial state, each processor Pi has its element ui for sorting. Then, the position of each element is rearranged so as to satisfy the condition ∀ i, 1 ≤ i < n: ui ≤ ui+1 at the final state.

Next we describe the line network, the underlying computational model used for distributed sorting. A line network is defined as a linear collection of n processors P1, P2, · · · , Pn where each Pi, 1 < i < n, is bidirectionally connected to Pi−1 and Pi+1. Without loss of generality, we assume that P1 is the end point on the left of the network and Pn is the end point on the right. We also assume that each processor knows its neighbors only by the local names left and right, with the orientation consistent along the line. Each processor Pi is equipped with a restricted local memory and is capable of holding a constant number of elements. In the line network, each processor communicates with its direct neighbor(s) only.

In the communication paradigm, there exist two models, namely the synchronous and the asynchronous model [14]. This paper is particularly focused on algorithms designed for synchronous models. The analysis of the proposed algorithms uses two distinct measures, namely time complexity, which is measured in terms of the number of rounds, and communication complexity, which is measured in terms of the total number of message exchanges.

III. EARLIER WORK

The distributed sorting problem is similar to a parallel sorting problem on a linear array, which can be solved by using the odd-even transposition sort on a synchronous model. First, we briefly describe the operations of the odd-even transposition sort [1], [10], [20]. At an odd-numbered step, a processor Pi whose suffix i is odd exchanges its element ui with Pi+1's element ui+1 if ui is larger than ui+1. At an even-numbered step, a processor Pi whose suffix is odd exchanges its element ui with Pi−1's element ui−1 if ui is not larger than ui−1. An example of the execution of the odd-even transposition sort can be found in [1], [10], [20]. Each processor executes the above operations in n steps.

To apply the odd-even transposition sort, Pi has to know its global position, because the difference between the execution of an odd-numbered step and an even-numbered step depends on i.


While adopting this type of communication, each processor always has two copies of the same element at the final state. Consequently, the time complexity is n rounds, since sorting is executed with 2n elements. To avoid additional (n − 1) overhead rounds to learn the global position, Sasaki adopted a strategy that does not require information on the global position, i.e., a strategy in which each processor communicates with both neighbors simultaneously, as in Hofstee et al.'s algorithm [8]. The improvement in the algorithm has been achieved by cancelling the creation of copies in the leftmost and rightmost processors, i.e., each of P1 and Pn has only one element in each round. This implies that the number of elements used for sorting is 2(n − 1) and the time complexity is (n − 1) rounds. However, this improved algorithm does not ensure that each processor always has two copies of the element at the final state [10], [20]. Hence, in this paper, we present a new median based exchanging algorithm, entirely different from the traditional odd-even transposition sort, for the distributed sorting problem, in which each processor always has only one element at the end of each round.

IV. AN ALTERNATIVE TIME-OPTIMAL ALGORITHM

We have proposed an alternative, efficient (n − 1) round time-optimal algorithm for the distributed sorting problem on a line network. Sasaki's algorithm can be improved by implementing median based exchanges and cancelling the creation of copies at intermediate processors, i.e., each intermediate processor Pi always has only one element in every round. This explicitly implies that the number of elements used for sorting is exactly n. Accordingly, the time complexity of (n − 1) rounds is derived from the exchanges of neighboring elements at the median processors, whose mark is 1; this exchange is entirely different from the conventional odd-even transposition sort.

First we explain the proposed algorithm with the example shown in Fig. 1. The initial state resembles the one in the n round algorithm as well as in Sasaki's algorithm. But in the final state, we do not require a separate rule, as in [20], for processor Pi to select an element as the solution. In fact, the element stored by processor Pi after (n − 1) rounds is itself the correctly sorted element, even in the worst case. As in Hofstee et al.'s algorithm [8], in which a processor communicates with both neighbors simultaneously, we communicate with both neighbors simultaneously only at median processors, and not at the other non-median processors, whose mark is either 0 or 2. A non-median processor can communicate only with its corresponding neighboring median processor. There exist efficient distributed algorithms for determining medians [4], [6].

Fig. 1. An example of the alternative (n − 1) round algorithm with n = 5 elements. Each entry shows an element with its mark Ti in parentheses:

    Initial (Round 0):  5 (0)   4 (1)   3 (2)   2 (0)   1 (1)
    Round 1:            3 (2)   4 (0)   5 (1)   1 (2)   2 (0)
    Round 2:            3 (1)   1 (2)   4 (0)   5 (1)   2 (2)
    Round 3:            1 (0)   3 (1)   2 (2)   4 (0)   5 (1)
    Round 4:            1       2       3       4       5      (the solution)

In the sequel, we describe how to select an adjacent median processor in each round. First we fix a mark Ti for each processor i, 1 ≤ i ≤ n, in such a way that Ti ≡ i (mod 3) in the initial round. Here a processor neither creates nor holds a copy of the element in each round. Then, for the subsequent rounds, the mark moves two steps forward, so as to move the smallest element towards the leftmost processor and the largest element towards the rightmost processor. So after the initial round, the mark Ti of each processor Pi is set as Ti ≡ (Ti + 2) (mod 3) to achieve the two-step forward operation. Thus the status of becoming a median processor is determined by the value of the mark Ti. This mark also helps a processor Pi to select the direction in which its element has to be exchanged with its median neighbor.


It is quite interesting to notice that, among the intermediate processors, only the median processors are involved in performing two receipt and two send operations, whereas all other non-median intermediate processors, as well as the end processors (which may be median or non-median), need to perform only one send and one receipt operation simultaneously. In contrast, in [20] each intermediate processor necessarily performs two receipt and two send operations, and the end processors need only one receipt and one send operation. This greatly reduces the computational time and increases the speed of the proposed distributed sorting algorithm.

Thus, in each round, each non-median processor Pi whose mark is 0 communicates its element for comparison with the median processor Pi+1 and obtains the smallest element contributed by that median processor. Similarly, each non-median processor Pi whose mark is 2 communicates its element with the median processor Pi−1 and obtains the largest element contributed by the median processor. Now it is clear that each median processor Pi receives elements from its adjacent neighbors, exchanges the received elements and its own element to preserve the partial order < of the three elements, and then sends back the smallest one to the processor whose mark is 0 and the largest element to the processor whose mark is 2. The same process is repeated for (n − 1) rounds. After the execution of (n − 1) rounds, the resulting sequence is itself in sorted order, and, as in [20], we do not require a special rule for selecting the solution. Also, the process of ensuring that each processor has two elements of the same value at the final round is not required.

Note: Instead of using the two-step forward incremental operation, if we use Ti ≡ (Ti + 1) (mod 3), then the worst case number of rounds exceeds (n − 1) rounds in some cases.

Next we describe the practical operations necessary for the simulation of the proposed distributed sorting algorithm on a line network. It is assumed that each processor knows the value of n for termination detection.

An alternative (n − 1) round algorithm:

1. Definitions of the basic primitive internal operations for Pi:
   x := the message that contains the element.
   send(x, p) - sends message x to processor p.
   receive(x, p) - receives message x from processor p.
   exchange(a, b, c) - exchanges the elements a, b and c into the order small < medium < large.
   swap(a, b) - swaps the elements a and b.
   STOP - completes the execution.

2. Local variables at Pi:
   ui - a variable or an element to be sorted.
   vi - a variable or an element for sorting, initially the initial element.
   s, m, l - variables used for exchanging the elements during sorting, initially undefined.
   rd - time (round), initially 0.
   Ti - mark of the processor Pi. This mark decides the median processor, whose mark is 1.
   n - number of processors in the line network.

3. Operations for Pi in round 0:
   Begin
     ui = vi; Ti = i (mod 3);
     if Ti = 0 then send(ui, right);
     else if Ti = 2 then send(ui, left);
     else
       s = receive(ui−1, left); m = ui;
       l = receive(ui+1, right);
     endif
     rd = rd + 1
   End

4. Operations for Pi after the initial round:
   Begin
     if rd < n − 1 then
       if i = 0 then s = 0;
         if Ti = 0 or 2 then l = 0;
         else
           m = ui;
           l = receive(ui+1, right);
           if m > l then
             swap(m, l); ui = m;
             send(l, right);
           endif
         endif
         Ti = (Ti + 2) (mod 3);
       else if i = n − 1 then l = 0;
         if Ti = 0 or 2 then s = 0;
         else
           s = receive(ui−1, left);
           m = ui;


           if s > m then
             swap(s, m);
             ui = m; send(s, left);
           endif
         endif
         Ti = (Ti + 2) (mod 3);
       else
         if Ti = 0 or 2 then s = l = 0;
         else
           s = receive(ui−1, left);
           m = ui;
           l = receive(ui+1, right);
           exchange(s, m, l);
           send(s, left); ui = m;
           send(l, right);
         endif
         Ti = (Ti + 2) (mod 3);
       endif
       rd = rd + 1;
     else /* Prints the solution; no special rule is needed */
       vi = ui;
     endif
     STOP
   End

V. SIMULATION RESULTS

We have developed a discrete event simulator in C under Red Hat Enterprise Linux to test the performance of the proposed distributed sorting algorithm. Even though the processors are concurrent and the simulation is not related to the actual execution time, the number of message exchanges per round should be accounted for when calculating the overall message exchanges. However, we list the computational time needed to process the requests for various numbers of processors in comparison with Sasaki's algorithm [20]. We have also noticed that the amount of communication to be spent in each round for exchanging the elements in the proposed algorithm is 4*Num(Pi) + 2*(n − Num(Pi)), where Num(Pi) is the number of intermediate median processor(s) and (n − Num(Pi)) is the number of remaining processors, including the end processors (which may either be median or non-median processors).

For sorting n elements, the amount of communication needed in Sasaki's algorithm amounts to 4*Num(Pi) + 2*Num(n − Pi), where Num(Pi) is the number of intermediate processors (≈ n/3) and Num(n − Pi) is the number of end processors. Compared with the amount of communication spent in each round by Sasaki's algorithm, which is based on the conventional odd-even transposition sort, the proposed algorithm is a much improved result with a smaller number of elements.

We have depicted the simulation results of the proposed algorithm in a performance comparison with Sasaki's algorithm [20]. From the simulation results (Figure 2), it is clear that our algorithm performs well even for the largest sequences of elements.

[Fig. 2. Performance comparison of the proposed algorithm with Sasaki's algorithm: execution time (in sec.) versus the number of processors n, for n up to 35000.]

VI. SORTING ON THE LINEAR EMBEDDING FROM A GENERAL NETWORK

In this section, we propose an algorithm for the distributed sorting problem on an embedded linear array derived from a depth first search ordered general network. The depth first search ordered nodes of the network are embedded into a linear array in such a way that the indices of the nodes in the general network are mapped to the indices of the linear array. The edges joining successive depth first search ordered nodes of the network are then mapped to a single edge joining the corresponding successive nodes in the linear array. The edges in the path between two successive depth first search ordered nodes are grouped into a single edge having bidirectional communication capability, together with a path record. This is the idea behind the embedding of a general network into a linear array. Note that the path stored from processor Pi to Pi+1 will be useful for message traversals carrying elements.

After embedding the general network into a linear array, we compute all path records between two successive nodes. A path record consists of the details of all intermediate processors, and it is assumed that within a time round, the element of one processor will be exchanged with the element of the successive processor via the computed path record. Then we can implement the proposed algorithm on the embedded linear array. The proposed distributed sorting algorithm takes (n − 1) rounds to complete distributed sorting.

VII. CONCLUSION

In this paper, we have proposed an alternative distributed sorting algorithm to minimize the amount of message exchanges needed for sorting n elements distributed over n processors on a line network. The algorithm reduces the number of message exchanges needed for sorting n elements without creating copies of elements at intermediate processors. Each median processor performs two receipt and two send operations to exchange the elements of both its neighbors, and all other processors perform only one send and one receipt operation. The idea is entirely different from the approach of the odd-even transposition sort, and the simulation results show that the proposed algorithm is faster than Sasaki's time-optimal sorting algorithm. Even though the proposed algorithm takes (n − 1) rounds in the worst case, it is still fast and robust. Also, this technique for distributed sorting could be extended to the linear embedding of a general network.

REFERENCES

[1] S.G.Akl, The Design and Analysis of Parallel Algorithms, Prentice-Hall, Englewood Cliffs, New Jersey, 1989.
[2] D.Bein, A.K.Datta and L.L.Larmore, Self-stabilizing synchronization algorithms on oriented chains, 4th International Conference on Intelligent Computer Communication and Processing (ICCP 2008), (2008) pp. 303-306.
[3] R.Cole, Parallel merge sort, SIAM Journal on Computing, 17 (4) (1988) pp. 770-785.
[4] G.N.Frederickson, Tradeoffs for selection in distributed networks, in: Proc. 2nd ACM Sympos. on Principles of Distributed Computing, (1983) 154-160, Assoc. Comput. Mach., New York.
[5] E.Gafni and D.Bertsekas, Distributed algorithms for generating loop-free routes in networks with frequently changing topology, IEEE Trans. on Communications, C-29 (1981) 11-18.
[6] R.G.Gallager, P.A.Humblet and P.M.Spira, A distributed algorithm for minimum-weight spanning trees, ACM Trans. Prog. Lang. Systems, 5 (1983) 66-77.
[7] O.Gerstel and S.Zaks, The bit complexity of distributed sorting, Algorithmica, 18 (1997) 405-416.
[8] H.P.Hofstee, A.J.Martin and J.L.A. van de Snepscheut, Distributed sorting, Sci. Comput. Programming, 15 (1990) 119-133.
[9] E.Horowitz, S.Sahni and S.Rajasekaran, Computer Algorithms, Galgotia Publications, New Delhi, 2001.
[10] F.T.Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Mateo, CA, 1992.
[11] F-C. Leu, Y-T. Tsai and C.Y.Tang, An efficient external sorting algorithm, Inform. Process. Lett., 75 (2000) 159-163.
[12] M.C.Loui, The complexity of sorting on distributed systems, Inform. and Control, 60 (1984) 70-85.
[13] W.S.Luk and F.Ling, An analytical/empirical study of distributed sorting on a local area network, IEEE Trans. Software Engg., 15 (1989) 575-586.
[14] N.A.Lynch, Distributed Algorithms, Morgan Kaufmann, San Francisco, CA, 1996.
[15] J.M.Marberg and E.Gafni, Distributed sorting algorithms for multi-channel broadcast networks, Theoret. Comput. Sci., 52 (1987) 193-203.
[16] B.M.McMillin and L.M.Ni, Reliable distributed sorting through the application-oriented fault tolerance paradigm, IEEE Trans. Parallel Distrib. Systems, 3 (1992) 411-420.
[17] M.He, X.Wu and S.Q.Zheng, An optimal and processor efficient parallel sorting algorithm on a linear array with a reconfigurable pipelined bus system, Comput. Electr. Eng., 35 (6) (2009) pp. 951-965.
[18] Y.Pan, M.Hamdi and K.Li, Efficient and scalable quicksort on a linear array with a reconfigurable pipelined bus system, Future Generation Computer Systems, 13 (1997/98) 501-513.
[19] D.Rotem, N.Santoro and J.B.Sidney, Distributed sorting, IEEE Trans. on Computers, C-34 (1985) 372-376.
[20] A.Sasaki, A time-optimal distributed sorting algorithm on a line network, Inform. Process. Lett., 83 (2002) 21-26.
[21] J.E.Walter, J.L.Welch and N.H.Vaidya, A mutual exclusion algorithm for ad hoc mobile networks, Wireless Networks, 9 (2001) 585-600.
[22] S.Zaks, Optimal distributed algorithms for sorting and ranking, IEEE Trans. on Computers, C-34 (1985) 376-379.

