Searching Algorithms


World Applied Programming, Vol (1), No (2), June 2011. 105-109
ISSN: 2222-2510
© 2011 WAP journal. www.waprogramming.com

Searching algorithms

C. Canaan *
Information institute
Chiredzi, Zimbabwe
canaancan@gmail.com

M. S. Garai
Information institute
Chiredzi, Zimbabwe
mat.s.g@mail.com

M. Daya
Information institute
Chiredzi, Zimbabwe
d2020m@yahoo.com

Abstract: This paper presents an introduction to searching algorithms. We first discuss the general purposes of search algorithms: searching virtual search spaces, searching for sub-structures of a given structure, and searching on quantum computers. We then introduce some simple and popular search algorithms, namely Linear search, Selection search and Binary search, so that the reader becomes familiar with how searching algorithms are implemented.
Key words: search algorithms, linear search, selection search, binary search
I. INTRODUCTION

In computer science, a search algorithm, broadly speaking, is an algorithm for finding an item with
specified properties among a collection of items. The items may be stored individually as records in a
database; or may be elements of a search space defined by a mathematical formula or procedure, such as
the roots of an equation with integer variables; or a combination of the two, such as the Hamiltonian
circuits of a graph [1].
II. VIRTUAL SEARCH SPACES

Algorithms for searching virtual spaces are used in constraint satisfaction problems, where the goal is
to find a set of value assignments to certain variables that will satisfy specific mathematical equations and
inequations. They are also used when the goal is to find a variable assignment that will maximize or
minimize a certain function of those variables. Algorithms for these problems include the basic
brute-force search (also called "naive" or "uninformed" search), and a variety of heuristics that try to exploit
partial knowledge about the structure of the space, such as linear relaxation, constraint generation, and
constraint propagation.
An important subclass is the local search methods, which view the elements of the search space as the
vertices of a graph, with edges defined by a set of heuristics applicable to the case, and scan the space by
moving from item to item along the edges, for example according to the steepest descent or best-first
criterion, or in a stochastic search. This category includes a great variety of general metaheuristic
methods, such as simulated annealing, tabu search, A-teams, and genetic programming, that combine
arbitrary heuristics in specific ways.
This class also includes various tree search algorithms, which view the elements as vertices of a tree
and traverse that tree in some special order. Examples include exhaustive methods such as
depth-first search and breadth-first search, as well as various heuristic-based search tree pruning methods
such as backtracking and branch and bound. Unlike general metaheuristics, which at best work only in a
probabilistic sense, many of these tree-search methods are guaranteed to find the exact or optimal
solution, if given enough time.
Another important sub-class consists of algorithms for exploring the game tree of multiple-player
games, such as chess or backgammon, whose nodes consist of all possible game situations that could
result from the current situation. The goal in these problems is to find the move that provides the best
chance of a win, taking into account all possible moves of the opponent(s). Similar problems occur when
humans or machines have to make successive decisions whose outcomes are not entirely under one's
control, such as in robot guidance or in marketing, financial or military strategy planning. This kind of
problem has been extensively studied in the context of artificial intelligence. Examples of algorithms for
this class are the minimax algorithm, alpha-beta pruning, and the A* algorithm [2].

III. SUB-STRUCTURES OF A GIVEN STRUCTURE

The name combinatorial search is generally used for algorithms that look for a specific sub-structure
of a given discrete structure, such as a graph, a string, a finite group, and so on. The term combinatorial
optimization is typically used when the goal is to find a sub-structure with a maximum (or minimum)
value of some parameter. (Since the sub-structure is usually represented in the computer by a set of
integer variables with constraints, these problems can be viewed as special cases of constraint satisfaction
or discrete optimization; but they are usually formulated and solved in a more abstract setting where the
internal representation is not explicitly mentioned.)
An important and extensively studied subclass is the graph algorithms, in particular graph traversal
algorithms, for finding specific sub-structures in a given graph, such as subgraphs, paths, circuits, and
so on. Examples include Dijkstra's algorithm, Kruskal's algorithm, the nearest neighbour algorithm, and
Prim's algorithm.
Another important subclass of this category is the string searching algorithms, which search for
patterns within strings. Two famous examples are the Boyer-Moore and Knuth-Morris-Pratt algorithms;
several other algorithms are based on the suffix tree data structure.
IV. QUANTUM COMPUTERS

There are also search methods designed for (currently non-existent) quantum computers, like
Grover's algorithm, that are theoretically faster than linear or brute-force search even without the help of
data structures or heuristics.
V. SIMPLE SEARCH ALGORITHMS

In this section we introduce some simple and popular searching algorithms: Linear search,
Selection search and Binary search.
Linear search
In computer science, linear search or sequential search is a method for finding a particular value in a
list that consists of checking every one of its elements, one at a time and in sequence, until the desired
one is found [2].
Linear search is the simplest search algorithm; it is a special case of brute-force search. Its worst case
cost is proportional to the number of elements in the list; and so is its expected cost, if all list elements are
equally likely to be searched for. Therefore, if the list has more than a few elements, other methods (such
as binary search or hashing) will be faster, but they also impose additional requirements.
For a list with n items, the best case is when the value is equal to the first element of the list, in which
case only one comparison is needed. The worst case is when the value is not in the list (or occurs only
once at the end of the list), in which case n comparisons are needed.
If the value being sought occurs k times in the list, and all orderings of the list are equally likely, the
expected number of comparisons is n when k = 0, and (n + 1)/(k + 1) when 1 ≤ k ≤ n.

For example, if the value being sought occurs once in the list, and all orderings of the list are equally
likely, the expected number of comparisons is (n + 1)/2. However, if it is known that it occurs exactly once,
then at most n - 1 comparisons are needed, and the expected number of comparisons is (n + 2)(n - 1)/(2n).
(For example, for n = 2 this is 1, corresponding to a single if-then-else construct.)


Either way, asymptotically the worst-case cost and the expected cost of linear search are both O(n).
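The expected comparison count for a value that occurs k times works out to (n + 1)/(k + 1) over all equally likely orderings, and this can be checked exhaustively for small lists. The following sketch is our own code (not from the paper); it averages the comparison count over all distinct orderings:

```python
from fractions import Fraction
from itertools import permutations

def avg_comparisons(n, k):
    """Exact average number of comparisons linear search makes to find a
    value that occurs k times among n items, over all distinct orderings."""
    items = [1] * k + [0] * (n - k)           # 1 marks the sought value
    orderings = set(permutations(items))      # each distinct ordering once
    total = 0
    for perm in orderings:
        # comparisons = position of the first match, or n if there is none
        total += next((i + 1 for i, v in enumerate(perm) if v == 1), n)
    return Fraction(total, len(orderings))
```

For instance, avg_comparisons(5, 1) gives 3 = (5 + 1)/2 and avg_comparisons(5, 2) gives 2 = (5 + 1)/3, matching the formula.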
The following pseudocode describes a typical variant of linear search, where the result of the search
is supposed to be either the location of the list item where the desired value was found, or an invalid
location Λ, to indicate that the desired element does not occur in the list.

For each item in the list:
    if that item has the desired value,
        stop the search and return the item's location.
Return Λ.

In this pseudocode, the last line is executed only after all list items have been examined with none
matching.
If the list is stored as an array data structure, the location may be the index of the item found (usually
between 1 and n, or 0 and n - 1). In that case the invalid location Λ can be any index before the first
element (such as 0 or -1, respectively) or after the last one (n + 1 or n, respectively).
If the list is a simply linked list, then the item's location is its reference, and Λ is usually the null
pointer.
Linear search can also be described as a recursive algorithm:

LinearSearch(value, list)
    if the list is empty, return Λ;
    else
        if the first item of the list has the desired value, return its location;
        else return LinearSearch(value, remainder of the list)

Selection algorithm
In computer science, a selection algorithm is an algorithm for finding the kth smallest number in a list
(such a number is called the kth order statistic). This includes the cases of finding the minimum,
maximum, and median elements. There are selection algorithms that run in worst-case linear time, O(n).
Selection is a subproblem of more complex problems like the nearest neighbour problem and shortest path problems.
The term "selection" is used in other contexts in computer science, including the stage of a genetic
algorithm in which genomes are chosen from a population for later breeding.
Selection can be reduced to sorting by sorting the list and then extracting the desired element. This
method is efficient when many selections need to be made from a list, in which case only one initial,
expensive sort is needed, followed by many cheap extraction operations. In general, this method requires
O(n log n) time, where n is the length of the list.
Linear time algorithms to find minimums or maximums work by iterating over the list and keeping
track of the minimum or maximum element so far.
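That single-pass idea can be sketched as follows (our own minimal version, which assumes a non-empty list):

```python
def find_min(items):
    """Return the minimum by scanning once, tracking the smallest so far."""
    it = iter(items)
    best = next(it)        # start with the first element (list must be non-empty)
    for v in it:
        if v < best:       # found a new minimum so far
            best = v
    return best
```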
Using the same ideas used in minimum/maximum algorithms, we can construct a simple, but
inefficient general algorithm for finding the kth smallest or kth largest item in a list, requiring O(kn) time,
which is effective when k is small. To accomplish this, we simply find the most extreme value and move
it to the beginning until we reach our desired index. This can be seen as an incomplete selection sort. Here
is the minimum-based algorithm:
function select(list[1..n], k)
    for i from 1 to k
        minIndex = i
        minValue = list[i]
        for j from i+1 to n
            if list[j] < minValue
                minIndex = j
                minValue = list[j]
        swap list[i] and list[minIndex]
    return list[k]
Other advantages of this method are:
- After locating the jth smallest element, it requires only O(j + (k - j)^2) time to find the kth smallest
element, or only O(k) for k ≤ j.
- It can be done with linked list data structures, whereas the algorithm based on partitioning requires random
access.
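A direct Python transcription of the minimum-based pseudocode above might look like this (naming is ours; the input is copied so the caller's list is left unchanged):

```python
def select_kth_smallest(values, k):
    """Return the kth smallest element (1-based k) by partial selection sort.

    Runs in O(k*n) time: a sketch of the simple but inefficient general
    algorithm from the text, not a worst-case linear-time selection."""
    lst = list(values)                    # work on a copy
    n = len(lst)
    for i in range(k):                    # place the (i+1)th smallest at index i
        min_index = i
        for j in range(i + 1, n):
            if lst[j] < lst[min_index]:
                min_index = j
        lst[i], lst[min_index] = lst[min_index], lst[i]
    return lst[k - 1]
```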
Binary search
In computer science, a binary search or half-interval search algorithm locates the position of an item
in a sorted array [3] [4]. Binary search works by comparing an input value to the middle element of the
array. The comparison determines whether that element is equal to, less than, or greater than the input.
When the middle element equals the input, the search stops and typically returns the position of the element.
Otherwise, depending on whether the input is less than or greater than the middle element, the algorithm
starts over, searching only the lower or upper subset of the array's elements. If the input is not located
within the array, the algorithm will usually output a unique value indicating this. Binary search algorithms
typically halve the number of items to check with each successive iteration, thus locating the given item (or
determining its absence) in logarithmic time. A binary search is a dichotomic divide and conquer search
algorithm.
It is useful to find where an item is in a sorted array. For example, to search an array for contact
information, with people's names, addresses, and telephone numbers sorted by name, binary search could
be used to find out a few useful facts: whether the person's information is in the array, what the person's
address is, and what the person's telephone number is.
Binary search will take far fewer comparisons than a linear search, but there are some downsides.
Binary search can be slower than using a hash table. If items are changed, the array will have to be
re-sorted so that binary search will work properly, which can take so much time that the savings from using
binary search aren't worth it. If you can tell ahead of time that a few items are disproportionately likely to
be sought, putting those items first and using a linear search could be much faster.
With each test that fails to find a match at the probed position, the search is continued with one or
other of the two sub-intervals, each at most half the size. More precisely, if the number of items, N, is odd
then both sub-intervals will contain (N - 1)/2 elements, while if N is even then the two sub-intervals
contain N/2 - 1 and N/2 elements.
If the original number of items is N then after the first iteration there will be at most N/2 items
remaining, then at most N/4 items, at most N/8 items, and so on. In the worst case, when the value is not
in the list, the algorithm must continue iterating until the span has been made empty; this will have taken
at most ⌊log2(N)⌋ + 1 iterations, where the ⌊ ⌋ notation denotes the floor function that rounds its
argument down to an integer. This worst case analysis is tight: for any N there exists a query that takes
exactly ⌊log2(N)⌋ + 1 iterations. When compared to linear search, whose worst-case behavior is N
iterations, we see that binary search is substantially faster as N grows large. For example, to search a list
of one million items takes as many as one million iterations with linear search, but never more than
twenty iterations with binary search. However, a binary search can only be performed if the list is in
sorted order.
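The twenty-iteration figure follows directly from the bound: for N = 1,000,000, ⌊log2(N)⌋ + 1 = 20. For positive integers this quantity equals the bit length of N, which gives a one-line check (the helper name is our own):

```python
def worst_case_probes(n):
    """Worst-case binary search iterations for n items: floor(log2(n)) + 1.
    For n >= 1 this equals the number of bits needed to represent n."""
    return n.bit_length()
```

For example, worst_case_probes(1_000_000) returns 20, while a linear search may need the full 1,000,000 iterations.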
The following incorrect (see notes below) algorithm is slightly modified (to avoid overflow) from
Niklaus Wirth's in standard Pascal [5]:

min := 1;
max := N; {array size: var A : array [1..N] of integer}
repeat
    mid := min + (max - min) div 2;
    if x > A[mid] then
        min := mid + 1
    else
        max := mid - 1;
until (A[mid] = x) or (min > max);

Note 1: In the programming language of the code above, array indexes start from 1. For languages that
use 0-based indexing (e.g. most modern languages), min and max should be initialized to 0 and N-1,
respectively.
Note 2: The code above does not return a result, nor indicates whether the element was found or not.
Note 3: The code above will not work correctly for empty arrays, because it attempts to access an element
before checking to see if min > max.
This code uses inclusive bounds and a three-way test (for early loop termination in case of equality),
but with two separate comparisons per iteration. It is not the most efficient solution.
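A variant that addresses the three notes above (0-based indexing, an explicit not-found result, and safety on empty arrays) might be written in Python as follows (our own sketch, not Wirth's code):

```python
def binary_search(a, x):
    """Return an index i with a[i] == x in the sorted list a, or -1 if absent."""
    low, high = 0, len(a) - 1           # inclusive bounds; Note 1 and Note 3
    while low <= high:                  # empty span checked before any access
        mid = low + (high - low) // 2   # midpoint without overflow
        if a[mid] == x:
            return mid                  # found: report the position (Note 2)
        elif a[mid] < x:
            low = mid + 1               # continue in the upper sub-interval
        else:
            high = mid - 1              # continue in the lower sub-interval
    return -1                           # not found: unique sentinel (Note 2)
```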
VI. CONCLUSION

In this paper, we looked into the searching problem and investigated different solutions. We discussed
the general purposes of search algorithms: virtual search spaces, sub-structures of a given structure, and
quantum computers. We then described the most popular simple algorithms for searching lists: Linear
search, Selection search and Binary search, indicated their computational complexity in the worst, average
and best cases, and presented implementation code.
REFERENCES
[1] Wikipedia. Address: http://www.wikipedia.com
[2] Donald Knuth (1997). The Art of Computer Programming. Vol. 3: Sorting and Searching (3rd ed.). Addison-Wesley. pp. 396-408. ISBN 0-201-89685-0.
[3] Introduction to Algorithms, available at: http://en.wikipedia.org/wiki/Introduction_to_Algorithms.
[4] http://mathworld.wolfram.com/BinarySearch.html.
[5] Niklaus Wirth: Algorithms + Data Structures = Programs. Prentice-Hall, 1975. ISBN 0-13-022418-9.
