Data Structures Summary Topics 7-12


DATA STRUCTURES

Linear data structures

In some places you can read about abstract and concrete data structures. We haven't gone
into this in class because the literature can be confusing and often contradicts itself. In many
cases I don't even know myself when a structure is abstract or not.

But for you to understand me, let's say that an abstract data structure is an idea: a theoretical image created to try to optimize how data is organized and operated on. A concrete data structure, on the other hand, is one with which abstract structures are implemented, that is, one that really exists in the world of programming.

Arrays and Linked Lists can therefore be considered concrete data structures with which we
can implement other abstract structures such as a hash table. Remember that to implement
a hash table we needed an array and a hash function and that a hash table is not the same
as a Python dictionary.

Both arrays and linked lists are linear data structures, that is, data structures where the elements are arranged sequentially, each one attached to its previous and next adjacent element. In a linear data structure only a single level is involved, so we can traverse all the elements in a single run. Linear data structures are easy to implement because computer memory is arranged in a linear way.

So in arrays and linked lists each element can only be linked to the previous or the next element, and using them we can implement other abstract data structures that are also considered linear: Stacks and Queues.

Stacks and Queues are designed for very specific tasks: a Queue powers something like the Spotify play queue, and a Stack is what allows the undo function in many of the programs we use.

In this way we can think of Stacks and Queues as arrays or linked lists restricted to a small set of operations so that they are more efficient at specific tasks. To implement them we can use object-oriented programming, or we can rely on existing libraries.

Let's see them.

Stacks

A stack is a collection of objects that are inserted and removed according to the last-in,
first-out (LIFO) principle. A user may insert objects into a stack at any time, but may only
access or remove the most recently inserted object that remains.
Remember the call stack that the computer uses to save the functions in memory? We saw it
in the Recursion session.

A stack is an abstract data structure whose main operations are ‘push’ and ‘pop’. When you use the undo function in any program, the program is popping the last operation registered in the stack where it stores them.

The stack is the simplest of all abstract data structures, but it is really important for many more sophisticated algorithms and data structures.

Formally, it is an Abstract Data Type that supports the following methods:

● stack.push(e): add element e to the top of the stack
● stack.pop(): remove and return the top element from the stack
● stack.top(): return a reference to the top element of the stack without removing it
● stack.is_empty(): return True if the stack is empty
● len(stack): return the number of elements in the stack

Therefore, the best way to implement a Stack is using an array, pushing and popping at its end. That way, all operations have a runtime of O(1) (amortized for push, since the underlying array occasionally has to grow).
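As a minimal sketch (the method names follow the ADT above; this is just one possible implementation), a Stack can be built on top of a Python list acting as the underlying array:

class Stack:
    """Array-based stack: push and pop happen at the end of a Python list."""

    def __init__(self):
        self._data = []               # underlying array

    def push(self, e):
        self._data.append(e)          # add e to the top, O(1) amortized

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from an empty stack")
        return self._data.pop()       # remove and return the top element, O(1)

    def top(self):
        if self.is_empty():
            raise IndexError("top of an empty stack")
        return self._data[-1]         # read the top without removing it, O(1)

    def is_empty(self):
        return len(self._data) == 0

    def __len__(self):
        return len(self._data)


undo_stack = Stack()
undo_stack.push("type 'hello'")
undo_stack.push("delete line 3")
print(undo_stack.pop())               # 'delete line 3' -> the last action is undone first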
Queues

A queue is a collection of objects that are inserted and removed according to the first-in,
first-out (FIFO) principle. Elements can be inserted at any time, but only the element that
has been in the queue the longest can be next removed.

A queue then, is an abstract data structure whose main operations are ‘enqueue’ and
‘dequeue’.

In a real life scenario, Call Center phone systems use Queues to hold people calling them in
an order, until a service representative is free.

Formally, a queue defines a collection that keeps objects in a sequence, where element
reading and deletion are restricted to the first element in the queue, and element insertion is
restricted to the back of the sequence.

The queue Abstract Data Type supports the following methods:

● queue.enqueue(e): adds element e to the back of the queue
● queue.dequeue(): removes and returns the first element from the queue
● queue.first(): returns a reference to the element at the front of the queue without removing it
● queue.is_empty(): returns True if the queue is empty
● len(queue): returns the number of elements in the queue

The most efficient way to implement a Queue is using a linked list that keeps references to both its head and its tail. That way, all operations have a runtime of O(1).
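As a minimal sketch, a Queue can be implemented with a singly linked list that keeps a reference to its head (front) and its tail (back), so that enqueue and dequeue are both O(1). (Python's collections.deque offers the same behavior ready-made.)

class _Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class Queue:
    """Linked-list queue: enqueue at the tail, dequeue at the head."""

    def __init__(self):
        self._head = None        # front of the queue (dequeued first)
        self._tail = None        # back of the queue (where we enqueue)
        self._size = 0

    def enqueue(self, e):
        node = _Node(e)
        if self._tail is None:   # the queue was empty
            self._head = node
        else:
            self._tail.next = node
        self._tail = node
        self._size += 1

    def dequeue(self):
        if self._head is None:
            raise IndexError("dequeue from an empty queue")
        node = self._head
        self._head = node.next
        if self._head is None:   # the queue became empty
            self._tail = None
        self._size -= 1
        return node.value

    def first(self):
        if self._head is None:
            raise IndexError("first of an empty queue")
        return self._head.value

    def is_empty(self):
        return self._size == 0

    def __len__(self):
        return self._size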

Non-linear data structures

Okay, so these were some linear data structures. If there are linear ones, it is because there are also non-linear ones.

That's right: data structures where the elements are not arranged sequentially or linearly are called non-linear data structures. In a non-linear data structure more than a single level is involved, so we can't traverse all the elements in a single run. Non-linear data structures are not as easy to implement as linear ones, but they use computer memory more efficiently. The main non-linear data structures are trees and graphs.

Also, these structures come from theoretical images of data organization such as a social network, a map, an organization chart of a company, a family tree, etc. But in many cases they are also concrete data structures (although more complex than an array or a linked list). For this reason we will not insist on the distinction between abstract and concrete data structures unless it is very clear.

Let's see some non-linear data structures.

Trees

A family tree or an organization chart of a company are also ways to organize data. In addition, they share a characteristic: the data is organized in a hierarchical way.

Well, this is very useful for many situations in the world of computers and that is why this
theoretical image of 'tree' has been incorporated into the world of programming and
computer science.

Let’s see how

What is a Tree in computer science?

Tree structures allow us to implement many algorithms much faster than using linear data
structures, such as array-based lists or linked lists. One reason to use trees might be
because you want to store information that naturally forms a hierarchy. For example, the file
system on a computer.

As we said, the relationships in a tree are hierarchical, with some objects being above and some below others. Actually, the main terminology for tree data structures comes from family trees, and that’s why we use terms such as “parent”, “child”, “ancestor” and “descendant”.
The structure of a tree is quite intuitive and is formed by nodes and edges. The node is
where the data is stored and the edge is what connects one node to another.

Each node is named in one way or another depending on its relationship with the rest of the nodes. This way we find leaves, children, siblings, parents and the root of our tree.

To better understand the shape of a tree we can measure its height, width and the depth and
height of its nodes.

● The depth of a node is the number of edges from the node to the tree's root node. A
root node will have a depth of 0.
● The height of a node is the number of edges on the longest path from the node to a
leaf. A leaf node will have a height of 0.
● The height of a tree would be the height of its root node, or equivalently, the depth
of its deepest node.
● The diameter (or width) of a tree is the number of nodes on the longest path between any two leaf nodes. For example, if the longest leaf-to-leaf path passes through 5 nodes, the diameter is 5.
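As a rough sketch of these definitions (the Node class and its children list are just illustrative), height and depth can be computed recursively by walking the edges:

class Node:
    def __init__(self, data):
        self.data = data
        self.children = []        # edges to the child nodes


def height(node):
    """Number of edges on the longest path from node down to a leaf."""
    if not node.children:         # a leaf has height 0
        return 0
    return 1 + max(height(child) for child in node.children)


def depth(target, root):
    """Number of edges from the root down to target, or None if not found."""
    if target is root:            # the root has depth 0
        return 0
    for child in root.children:
        d = depth(target, child)
        if d is not None:
            return 1 + d
    return None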

Well then, it seems that there are a lot of possibilities with trees:

❖ Which one should I choose?
❖ How do I translate it to code?

Binary Trees

Under the premise of maintaining a tree structure, we can create an infinite number of different trees, each with a different number of branches and leaves, and with different heights, widths, etc.

However, if we create trees without any type of limitation, it becomes very difficult to design algorithms that work correctly for all of them. This is why there are certain types of trees with a defined structure, according to the needs we have when storing data.

The tree structure most used in this context is the binary tree, since a binary structure is very practical for classifying elements: it lets us classify elements by whether they fulfill a condition or not (yes-no) and compare one element with another (higher, lower, equal...). Therefore, the properties of binary trees make them very practical for storing data and for designing algorithms that operate on them.

Let’s see them

A Binary Tree is a tree, so it has the properties of a general tree, with some constraints:

● Every node has at most two children
● Each child node is labeled as being either a left child or a right child
● A left child precedes a right child in the order of children of a node
● The subtree rooted at a left or right child of a node is called the left subtree or right subtree, respectively, of that node
● A binary tree in which every node has either zero or two children is called a full binary tree

Therefore, a binary tree is a tree data structure composed of nodes, each of which has at most two children, referred to as the left and right children. Since a binary tree is a tree, it starts off with a single node known as the root.

Each node in the binary tree contains the following:

● Data
● Pointer to the left child
● Pointer to the right child

Now that we have a more concrete tree structure, it is easier to translate the idea into code
and thus create our own binary tree with the main methods/algorithms/operations of any
data structure: Reading, Insertion and Deletion.

Insertion

Elements may be inserted into a binary tree in any order. The very first insertion operation
creates the root node. Each insertion that follows iteratively searches for an empty location
at each level of the tree.

Upon finding an empty left or right child, the new element is inserted. By convention, the
insertion always begins from the left child node.
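A minimal sketch of this insertion, assuming a simple TreeNode class and a level-order (breadth-first) scan for the first empty slot:

from collections import deque


class TreeNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def insert(root, data):
    """Insert data into the first empty left/right slot found level by level."""
    new_node = TreeNode(data)
    if root is None:                    # the very first insertion creates the root
        return new_node
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.left is None:           # by convention we try the left child first
            node.left = new_node
            return root
        queue.append(node.left)
        if node.right is None:
            node.right = new_node
            return root
        queue.append(node.right)


root = None
for value in [1, 2, 3, 4, 5]:
    root = insert(root, value)          # fills the tree level by level, left to right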
Deletion

An element may also be removed from the binary tree. Since there is no particular order among the elements, upon deletion of a particular node it is replaced with the deepest, right-most node, which is then removed from its original position.

One important thing to note here is that there is no specific way to organize data in a binary tree. That's why, without a defined order, the search operation works like a simple linear search over the nodes, which is not very efficient.

Also, being a non-linear structure, there is no single way to go through its elements (nodes). For this reason, a series of algorithms were designed to iterate over the elements of non-linear structures: Traversal algorithms. With the Traversal algorithms we can read, print or perform operations on each of the nodes of a tree. We will talk about them later.

So, having this tree structure and its well-defined operations, more efficient and useful binary
tree types arise for specific applications, like, for example, Heaps and Binary Search Trees.
Let's see these types and their operations in some more depth.
Heaps and Priority Queues

A Heap is a special Tree-based data structure in which the tree is a binary tree. Generally,
Heaps can be of two types:

● Max-Heap: In a Max-Heap the value present at the root node must be greatest
among the values present at all of its children. The same property must be
recursively true for all sub-trees in that Binary Tree.
● Min-Heap: In a Min-Heap the value present at the root node must be minimum
among the values present at all of its children. The same property must be
recursively true for all sub-trees in that Binary Tree.

This way, the root node holds the smallest/greatest value in the heap, and accessing it takes O(1).

Apart from these, a heap has another specific property that differentiates it from other binary trees:

● It has to be complete: when we construct a heap, we fill it from left to right across each row. This way, every node of the heap has two children, a left one and a right one; this property has to be fulfilled in all the rows of the heap except the last one, where the nodes can have only one child or none at all and are placed as far to the left as possible.

The main application of Heaps is to implement Priority Queues.

A priority queue is an abstract data type similar to a regular queue or stack data structure in which each element additionally has a "priority" associated with it.

● Every item has a priority value associated with it.
● An element with high priority is served before an element with low priority.
● If two elements have the same priority, they are served according to their order in the queue.

As with Stacks and Queues, the main and most used operations of Priority Queues are well defined. They are:

● Insert with priority: first of all, we want to be able to insert new elements with their associated priority level.
● Get the highest priority element: then, we want to be able to obtain and remove the element with the highest priority.
● Peek: it is also useful to just see which element has the highest priority without removing it.
● Sorting: finally, we can combine the main operations of the Priority Queue to sort an unsorted array according to the priority of its elements.

A Priority Queue could be implemented with an array, but its most efficient implementation uses a Heap. Below we will see how long it takes a Heap to complete each Priority Queue operation.

Imagine that we have a list of items, each of them with a priority value associated: [(17,
‘item 0’), (7, ‘item 1’), (36, ‘item 2’), (1, ‘item 3’), (19, ‘item 4’), (3, ‘item 5’), (2, ‘item 6’), (25,
‘item 7’), (100, ‘item 8’)]

1. Heapify: First, we have to turn that list into a heap (a max-heap in this case). A heap is implemented with an underlying array, so we insert the values of the original list into the heap while keeping its properties; the heap's underlying array will therefore be arranged differently from the original list. This process takes O(n).

2. Insertion with priority: We first insert the element with its priority value at the end of the Heap (O(1)) and, after this, if the properties of the Heap have been violated, we reorganize the values (moving the new element up) until we have a valid Heap again. This reconstruction can take O(logn), so the whole insertion operation has a runtime of O(logn).

3. Get the highest priority element: Something similar happens here. Popping the element with the highest priority takes O(1), since it is always at the root of the Heap and we don't have to look for it. However, removing this element breaks the structure of the Heap, so we have to rebuild it, and the whole operation can take O(logn).
4. Peek: In this operation we simply read the element with the highest priority, which we know is at index 0, so the operation takes O(1).

5. Sorting (heapsort): In this operation we simply pop the element with the highest priority and append it to a new array, which ends up completely sorted. This is done n times, so → Get the highest priority element n times = O(nlogn)
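As a sketch of this walkthrough using Python's standard heapq module (heapq builds min-heaps, so priorities are negated here to simulate the max-heap of the example; the extra 'item 9' insertion is just illustrative):

import heapq

items = [(17, 'item 0'), (7, 'item 1'), (36, 'item 2'), (1, 'item 3'),
         (19, 'item 4'), (3, 'item 5'), (2, 'item 6'), (25, 'item 7'),
         (100, 'item 8')]

# 1. Heapify: O(n). We store negated priorities to turn the min-heap into a max-heap.
heap = [(-priority, name) for priority, name in items]
heapq.heapify(heap)

# 2. Insertion with priority: O(logn).
heapq.heappush(heap, (-50, 'item 9'))

# 4. Peek: O(1). The highest-priority element is always at index 0.
priority, name = heap[0]
print(-priority, name)                  # 100 item 8

# 3. Get the highest priority element: O(logn), pop plus rebuild.
top_priority, top_name = heapq.heappop(heap)

# 5. Heapsort: n pops of O(logn) each -> O(nlogn). The result is fully sorted.
sorted_items = []
while heap:
    p, n = heapq.heappop(heap)
    sorted_items.append((-p, n))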

Binary Search Trees

We have seen one type of Binary Tree, the Heap, which is the most efficient option for implementing Priority Queues. However, neither the Binary Trees we talked about first nor the Heaps impose a very strict order on their elements.

Imagine performing the search operation in a Binary Tree. As we said before, since the elements are unordered, finding one of them can take O(n), so the tree would be no more efficient than a linked list.

However, if we give a strict order to the items in a Binary Tree using the logic of the Binary Search algorithm, we can create a structure more versatile and efficient than those we have seen before: Binary Search Trees.

Therefore, if we have a very large collection of elements and we only want to access or read
elements and insert or remove the last one, using an array is a good option to store them. If,
on the other hand, our priority is to have a structure that allows us to insert or delete
elements in any position or read or access the first one of them, using a linked list is a good
option to store them.

However, if we want to perform these three operations efficiently when we have a large
collection of items, our best option is to use a hash table, which allows us to perform all three
operations in O(1) on average.

What are the problems with hash tables? One problem is that their elements do not have a position that we can access. We can access and operate on them through their keys, but there is no first item, second item, third item...

This makes it hard to iterate over the elements in any meaningful order. If we want to loop over a Python dictionary in a given order, we end up doing extra work: extracting all the keys, building an array of keys and iterating over that array. This extra effort leads to poor performance when applying algorithms that iterate over the structure, like binary search or quicksort.

In addition, your hash table may have a hash function that is not perfect or a high load factor,
which may cause collisions that slow down the performance of operations, reaching a
runtime of O(n) for the three main operations in the worst case scenario.

This is why binary search trees are really good. If they are balanced, they have a predictable
running time for every kind of operation, and they are going to perform all these operations
quite fast, in O(logn) time.

It doesn't matter whether the user is mostly going to search, or mostly going to insert or remove elements: with a balanced binary search tree, every operation takes O(logn) time (worst case).

Here we find the most important characteristic of these structures: their predictable runtime. When a binary search tree is balanced, it gives an O(logn) runtime (worst case) for all operations.

Having a predictable runtime is crucial when we handle a large amount of data and want to
do all the possible operations.

So, although hash tables give us better performance in the three main operations (on average), they do not offer us a sorted collection of elements or good resources to iterate over the structure. BSTs, on the other hand, are more versatile and maintain a hierarchical order that allows us to achieve good performance for all operations.

In order to obtain such a versatile and efficient structure we have to take into account two things:

● Whether it follows the properties of a Binary Search Tree
● Whether it is balanced or not

Properties

A Binary Search Tree is a Binary Tree, so it inherits the properties of a Binary Tree. What is
special about the Binary Search Tree are three rules that must always be followed:

● The left subtree of a node contains only nodes with values lesser than the node’s
value.
● The right subtree of a node contains only nodes with values greater than the node’s
value.
● The left and right subtree each must also be a binary search tree.
Why is this good? Because in each lookup we discard half of the remaining nodes, just like in the Binary Search algorithm. This way, the running time of ALL the main operations will be O(logn) on average, where logn is approximately the height of the tree.

● Insertion: we start at the root node. If the data we want to insert is greater than the current node we go to the right, if it is smaller, we go to the left. And so on, until we find an empty spot.
● Search: we start at the root node. If the data we want to find is greater than the current node we go to the right, if it is smaller, we go to the left, until we find it (or run out of nodes).
● Find the minimum: we start at the root node and keep going left; the left-most node holds the minimum value.
● Find the maximum: we start at the root node and keep going right; the right-most node holds the maximum value.
● Deletion: to complete this operation we have to do a search operation to find the item to be removed and then remove it and update the references.
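A minimal sketch of insertion, search and find-the-minimum on a Binary Search Tree (duplicate values are simply ignored in this sketch):

class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(node, value):
    """Smaller values go to the left subtree, greater values to the right."""
    if node is None:
        return BSTNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    return node                          # equal values are ignored here


def search(node, value):
    """Each comparison discards one of the two subtrees, like binary search."""
    if node is None:
        return False
    if value == node.value:
        return True
    if value < node.value:
        return search(node.left, value)
    return search(node.right, value)


def find_min(node):
    """The minimum value lives in the left-most node."""
    while node.left is not None:
        node = node.left
    return node.value


root = None
for value in [8, 3, 10, 1, 6, 14]:
    root = insert(root, value)
print(search(root, 6), find_min(root))   # True 1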

But what happens when the height of the tree is not approximately logn? This means that
the tree is not balanced, which can cause the runtime of all operations to be O(n) in the
worst case.
So we have to guarantee that our Binary Search Tree is balanced to ensure a 100%
predictable runtime. And, how do we do this?

To do this, we have Binary Search Trees that balance themselves: AVL Trees and Red
Black Trees.

Let's see them.

AVL Trees

AVL trees are self-balancing binary search trees. At every node, an AVL tree checks the heights of the left and right subtrees and makes sure that their difference is not more than 1. This difference is called the Balance Factor.

If the difference in height between the left and right subtrees becomes more than 1, the tree is rebalanced using rotation techniques.

To balance itself, an AVL tree may perform the following four kinds of rotations:

● Left rotation
● Right rotation
● Left-Right rotation
● Right-Left rotation

The first two are single rotations and the last two are double rotations. To have an unbalanced tree, we need a tree of height at least 2.
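As a sketch of a single rotation (the node class and helper are illustrative, not a complete AVL implementation; heights count edges, so a leaf has height 0), a left rotation around an unbalanced node z promotes its right child y:

class AVLNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.height = 0                  # a newly created node is a leaf (height 0)


def _height(node):
    return node.height if node else -1   # an empty subtree counts as height -1


def rotate_left(z):
    """Single left rotation: z's right child y becomes the new subtree root."""
    y = z.right
    z.right = y.left                     # y's left subtree becomes z's right subtree
    y.left = z
    # recompute heights bottom-up
    z.height = 1 + max(_height(z.left), _height(z.right))
    y.height = 1 + max(_height(y.left), _height(y.right))
    return y                             # the caller re-attaches y where z used to be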

Red Black Trees

Another solution to the problem of balancing binary search trees is Red Black Trees, which are also BSTs that balance themselves.

● Building a Red Black Tree takes less time than building an AVL Tree
● AVL trees provide faster searches/readings than Red Black Trees because they are
more strictly balanced.
● Red Black Trees provide faster insertion and removal operations than AVL trees as
fewer rotations are done due to relatively relaxed balancing.
Graphs

As we said before, besides the Trees, we have another type of non-linear data structures
called Graphs.

A graph is an abstract data structure made up of nodes or vertices and edges. A node can
be directly connected to many other nodes. Those nodes are called its neighbors.

Graphs are a more general structure than trees; in fact you can think of a tree as a special
kind of graph.

Graphs can be used to represent many interesting things about our world, including systems
of roads, airline flights from city to city, how the Internet is connected, or even the sequence
of classes you must take to complete a major in computer science.

A graph models a set of connections.

Okay, graphs and trees are both non-linear collections of linked nodes/vertices. In fact, in previous sessions we said that a tree is a constrained graph, because a tree is simply a hierarchical graph with a root node. Let's see their differences:

Relationships between nodes:

● The first node of a tree is the root node. Everything else flows hierarchically from this node, giving rise to parent-child relations between the nodes.
● Graph data structures don’t have root nodes and there are no parent-child relations. A tree has a hierarchical structure, whereas a graph has a network model.

Path between nodes:

● In a tree there exists only one path between any two nodes/vertices whereas a graph
can have more than one.

Directed or Undirected:

● Tree data structures are assumed to be directed, whereas graphs can be directed or undirected. In directed structures the edges have 'arrows' pointing from one node to another, so the relationship goes only one way. In an undirected graph the edges have no arrows, and the two nodes joined by an edge are each other’s neighbors.
During this course, we have seen the two main types of graphs: Unweighted and Weighted
Graphs.

If the edges in your graph have weights, the graph is said to be weighted; if they do not, it is said to be unweighted. A weight is a numerical value attached to each individual edge.


The main operation we do on Graphs is to find paths between nodes. To perform this operation we can use the Traversal algorithms, Dijkstra's algorithm or the Bellman-Ford algorithm, depending on the type of Graph we are working with and the objectives we have.

We'll see all of them in the algorithms section of this review.
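As a sketch of how a graph can be stored in code (the node names and weights below are made up), an adjacency list maps every node to its neighbors:

# Weighted, directed graph: each node maps to {neighbor: edge weight}.
weighted_graph = {
    'A': {'B': 5, 'C': 1},
    'B': {'D': 2},
    'C': {'B': 3, 'D': 7},
    'D': {},
}

# Unweighted, undirected graph: each node maps to the set of its neighbors.
unweighted_graph = {
    'A': {'B', 'C'},
    'B': {'A', 'C'},
    'C': {'A', 'B'},
}

print(list(weighted_graph['A']))      # ['B', 'C']  -> the neighbors of A
print(weighted_graph['A']['C'])       # 1           -> the weight of the edge A -> C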
