10 Intro Data StructuresUpdated

Data Structures and Algorithms
Lecture 10. Introduction to Data Structures

Reading and Acknowledgments
● Chapter 1 in Shaffer (2009), A Practical Introduction to Data Structures and Algorithm Analysis, 3rd edition
— http://courses.cs.vt.edu/cs3114/Spring09/book.pdf
● Lecture Notes
— http://www.cis.upenn.edu/~matuszek/cit594-2014/
Learning Outcomes
● Motivation behind the various types of data structures
Introduction to data structures
● Ways of organising data in a computer so that it can be processed efficiently by algorithms
● Data representations and their associated operations
— E.g. insert, delete, search, ...
● No single data structure works well for all purposes, so it is important to know the strengths and limitations of several of them
● Using the proper data structure can make the difference between a program running in a few seconds and one requiring many days
Need for data structures (1/2)
● An algorithm is said to be efficient if it solves the problem within the required resource constraints
— E.g. the total space available to store the data (main memory and disk)
— E.g. the time allowed to perform each subtask
● A solution is said to be efficient if it requires fewer resources than known alternatives (regardless of whether it meets any particular requirements)
Need for data structures (2/2)
● The cost of a solution is the amount of resources that the solution consumes.
● Most often, cost is measured in terms of one key resource such as time, with the implied assumption that the solution meets the other resource constraints.
● See lectures on Algorithm Analysis
Choosing a data structure to solve a problem
● We write programs to solve problems
● How would we select the right data structure to solve a particular problem?
● By first analyzing the problem to determine the performance goals
● Ignoring this analysis step and applying a data structure which is inappropriate to the problem may result in a slow program
Selecting the right data structure (1)
When selecting a data structure to solve a problem, you should follow these steps:
(1) Analyze your problem to determine the basic operations that must be supported. Examples of basic operations include:
● inserting a data item into the data structure
● deleting a data item from the data structure
● finding a specified data item in the data structure
(2) Quantify the resource constraints for each operation
(3) Select the data structure that best meets these requirements
Three concerns
— the data and the operations to be performed on them
— the representation for those data
— the implementation of that representation
● Resource constraints on certain key operations, such as search, insertion and deletion, drive the data structure selection process.
Examples of questions to ask
— Are all data items inserted into the data structure at the beginning, or are insertions interspersed with other operations?
— Can data items be deleted?
— Are all data items processed in some well-defined order, or is search for specific data items allowed?
● Interspersing insertions with other operations (such as deletion) and supporting search for specific data items require more complex data representations.
Example: A bank
A bank must support many types of transactions with its customers, but let's
consider a simple model where customers wish to open accounts, close
accounts, and add money to or withdraw money from accounts.
The typical customer opens and closes accounts far less often than he/she
accesses the account. Customers are willing to wait many minutes while
accounts are created or deleted but are typically not willing to wait more than a
brief time for individual account transactions such as a deposit or withdrawal.
These observations can be considered as informal specifications for the time
constraints on the problem.
It is common practice for banks to provide human tellers or ATMs to support
customer access to account balances and updates such as deposits and
withdrawals. Special service representatives are typically provided (during
restricted hours) to handle opening and closing accounts. Teller and ATM
transactions are expected to take little time. Opening or closing an account can
take much longer.
Example: A bank (2)
ATM transactions do not modify the database significantly. For simplicity,
assume that if money is added or removed, this transaction simply changes the
value stored in an account record.
Adding a new account to the database is allowed to take several minutes.
Deleting an account need have no time constraint, because from the customer’s
point of view all that matters is that all the money be returned (equivalent to a
withdrawal). From the bank’s point of view, the account record might be removed
from the database system after business hours, or at the end of the monthly
account cycle.
When considering the choice of data structure to use in the database system
that manages customer accounts, we see that a data structure that has little
concern for the cost of deletion, but is highly efficient for search and moderately
efficient for insertion, should meet the resource constraints imposed by this
problem. Records are accessible by unique account number.
One data structure that meets these requirements is the hash table.
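A minimal sketch of the account store described above, assuming account numbers are strings and balances are held in whole cents (both assumptions for illustration). `java.util.HashMap` is a hash table, so the frequent operations (deposit, withdrawal, balance lookup by unique account number) take expected constant time. The class name `AccountStore` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical account store: account number -> balance in cents.
// Backed by java.util.HashMap, a hash table with expected O(1) get/put.
class AccountStore {
    private final Map<String, Long> balances = new HashMap<>();

    // Opening an account is rare; it simply inserts a new record.
    void open(String accountNumber) {
        if (balances.putIfAbsent(accountNumber, 0L) != null) {
            throw new IllegalArgumentException("account exists: " + accountNumber);
        }
    }

    // Frequent operations: find the record by its key and update the value.
    void deposit(String accountNumber, long cents) {
        balances.put(accountNumber, balance(accountNumber) + cents);
    }

    void withdraw(String accountNumber, long cents) {
        long current = balance(accountNumber);
        if (cents > current) {
            throw new IllegalStateException("insufficient funds");
        }
        balances.put(accountNumber, current - cents);
    }

    long balance(String accountNumber) {
        Long b = balances.get(accountNumber);
        if (b == null) {
            throw new IllegalArgumentException("no such account: " + accountNumber);
        }
        return b;
    }

    // Closing returns the remaining money; its cost is not time-critical.
    long close(String accountNumber) {
        long remaining = balance(accountNumber);
        balances.remove(accountNumber);
        return remaining;
    }
}
```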
Primitive Types VS References
● Java’s types are divided into primitive types and reference types.
● The primitive types are boolean, byte, char, short, int, long, float and double.
● All non-primitive types are reference types.
● A primitive-type variable can store exactly one value of its declared type at a time.
● For example, an int variable can store one whole number (such as 7) at a time. When another value is assigned to that variable, its previous value is replaced.
● Primitive-type instance variables are initialized by default.
● Instance variables of types byte, char, short, int, long, float and double are initialized to 0 (0.0 for float and double, the null character for char), and instance variables of type boolean are initialized to false. Local variables are not initialized by default.
Primitive Types VS References
● Programs use variables of reference types to store the locations of objects in the computer’s memory.
● Such a variable is said to refer to an object in the program. Objects that are referenced may each contain many instance variables.
● Reference-type instance variables are initialized by default to the value null, a reserved word that represents a “reference to nothing.”
Data Types
● A type is a collection of values (e.g. boolean, integer)
● A data item is a piece of information whose value is drawn from a type. A data item is said to be a member of a type.
● Composite type
— E.g. A bank account record contains several pieces of information such as name, address, account number, and account balance.
— Such a record is an example of a composite type.
● A data type is a type together with a collection of operations to manipulate the type.
— E.g. An integer variable is a member of the integer data type. Addition is an example of an operation on the integer data type.
Abstract Data Type
● What does ‘abstract’ mean?
— From Latin: to ‘pull out’ the essentials
— To defer or hide the details
● Abstraction emphasizes essentials and defers the details, making engineering artifacts easier to use
● I don’t need a mechanic’s understanding of what’s under a car’s hood in order to drive it
— What’s the car’s interface?
— What’s the implementation?
● Hiding the details of implementation is called encapsulation (data hiding)
E.g. Float
● You don't need to know how floating-point arithmetic works in order to use float
● Indeed, the details can vary depending on the processor, or even a virtual coprocessor
● The compiler hides all the details from you: some numeric ADTs are built in
● All you need to know is the syntax and meaning of operators such as +, -, *, /
ADT = Properties + Operations
● An ADT describes a set of objects sharing the same properties and behaviors
● The properties of an ADT are its data (representing the internal state of each object)
— double d;
● The behaviors of an ADT are its operations or functions (operations on each instance)
— sqrt(d) / 2;
● Thus, an ADT couples its data and operations
● OOP emphasizes data abstraction
Abstract Data Type and Data Structure
●An abstract data type (ADT) is the realization of a data type as a software
component. The interface of the ADT is defined in terms of a type and a set
of operations on that type. The behavior of each operation is determined by
its inputs and outputs.
●An ADT does not specify how the data type is implemented. These
implementation details are hidden from the user of the ADT and protected
from outside access (a concept referred to as encapsulation).
● A data structure is the implementation for an ADT. In an object-oriented language such as Java, an ADT and its implementation together make up a class. Each operation associated with the ADT is implemented by a method. The variables that define the space required by a data item are referred to as data members. An object is an instance of a class, that is, something that is created and takes up storage during the execution of a computer program.
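To make the ADT/data-structure split concrete, here is a minimal sketch: a Java interface plays the role of the ADT (a type plus operations, defined by behaviour only) and a class supplies one data structure that implements it. The names `IntList` and `ArrayIntList`, and the growable-array representation, are illustrative assumptions.

```java
import java.util.Arrays;

// The ADT: a type plus operations, specified only by their behaviour.
interface IntList {
    void add(int value);     // append value at the end
    int get(int index);      // value at position index
    int size();              // number of stored values
}

// One data structure implementing the ADT: a growable array.
// Its fields (the data members) are private, i.e. encapsulated.
class ArrayIntList implements IntList {
    private int[] items = new int[4];
    private int count = 0;

    @Override
    public void add(int value) {
        if (count == items.length) {
            items = Arrays.copyOf(items, items.length * 2); // grow when full
        }
        items[count++] = value;
    }

    @Override
    public int get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return items[index];
    }

    @Override
    public int size() {
        return count;
    }
}
```

Client code written against `IntList` (for example `IntList xs = new ArrayIntList();`) would keep working if the array were replaced by, say, a linked list, because only the implementation changes, not the interface.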
Array
● An array
— is the fundamental contiguously allocated data structure
— is a fixed-size data structure
— consists of a collection of elements of the same data type (stored in adjacent memory locations), each identified by an integer index.
● When a program traverses an array, it simply moves from one memory location to the very next, in a sequential manner.
Arrays (contd)
● Advantages
— Quick insertion: we normally insert at the end of the array
— Constant-time access given the index: because the index of each element maps directly to a particular memory address, we can access arbitrary data items instantly provided we know the index
— Space efficiency: arrays consist purely of data, so no space is wasted with links or end-of-array information
● Disadvantages
— We cannot adjust the size in the middle of a program (but we can use the concept of a dynamic array to circumvent this problem)
— Slow search: in an unsorted array, every element may have to be examined
— Slow deletion: deleting an element from the middle of the array requires a lot of effort to readjust the positions of the elements that come after the deleted element
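The trade-offs listed above can be sketched on a plain `int[]` together with a separate count of used slots (that bookkeeping, and the class name `ArrayOps`, are assumptions for illustration). Access by index is a single step, while search and deletion touch many elements.

```java
// Illustrative helpers showing array costs; count = number of used slots.
class ArrayOps {
    // O(1): the index maps directly to a memory position.
    static int get(int[] a, int index) {
        return a[index];
    }

    // O(n) worst case: an unsorted array must be scanned element by element.
    static int indexOf(int[] a, int count, int target) {
        for (int i = 0; i < count; i++) {
            if (a[i] == target) {
                return i;
            }
        }
        return -1;
    }

    // O(n): every element after the deleted one shifts left by one position.
    // Returns the new number of used slots.
    static int deleteAt(int[] a, int count, int index) {
        System.arraycopy(a, index + 1, a, index, count - index - 1);
        return count - 1;
    }
}
```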
Strings
● List of characters
● E.g. DNA sequence
LinkedLists
● Linked lists are dynamic data structures in which each item in the list points to the next item in the list, using pointers.
● Pointers are connections that hold the pieces of a linked list together and represent the address of a location in memory.
● Example
John → Mary → Ann → null
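A minimal singly linked list that reproduces the John → Mary → Ann → null example. `ListNode` and `SinglyLinkedList` are illustrative names, not a library API; Java references play the role of the pointers.

```java
// One node: a value plus a reference ("pointer") to the next node.
class ListNode {
    String value;
    ListNode next; // null marks the end of the list

    ListNode(String value, ListNode next) {
        this.value = value;
        this.next = next;
    }
}

class SinglyLinkedList {
    ListNode head;

    // O(1): inserting at the front only rewires one reference.
    void addFirst(String value) {
        head = new ListNode(value, head);
    }

    // O(n): no random access, so search walks the chain from the head.
    boolean contains(String value) {
        for (ListNode n = head; n != null; n = n.next) {
            if (n.value.equals(value)) {
                return true;
            }
        }
        return false;
    }

    // Removes the first node holding value; unlinking is O(1) once the
    // predecessor is found, and no other elements are moved.
    boolean remove(String value) {
        if (head == null) return false;
        if (head.value.equals(value)) {
            head = head.next;
            return true;
        }
        for (ListNode prev = head; prev.next != null; prev = prev.next) {
            if (prev.next.value.equals(value)) {
                prev.next = prev.next.next;
                return true;
            }
        }
        return false;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (ListNode n = head; n != null; n = n.next) {
            sb.append(n.value).append(" -> ");
        }
        return sb.append("null").toString();
    }
}
```

Calling `addFirst("Ann")`, `addFirst("Mary")`, `addFirst("John")` in that order builds the slide's list, and `toString()` then returns `John -> Mary -> Ann -> null`.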
Linked lists
● Types of linked lists:
— (a) Singly (simple) linked list
— (b) Doubly linked list
Overview of Data Structures
LinkedLists (contd)
● Advantages (as compared to an array)
— We can never run out of space unless the memory is actually full
— Quick insertion (once the position is known)
— Quick deletion (once the position is known)
● Disadvantages
— Require extra space for storing pointer fields
— Do not allow efficient random access to items
— Slow search
Doubly-linked lists
Stacks
Stacks
● Supports retrieval in Last-In, First-Out (LIFO) order.
● A good analogy compares a stack data structure to a stack of pancakes. The first pancake cooked (first in) is put on a plate and then covered with other pancakes as they are done cooking. If you wait until they are all cooked and eat them one at a time, the first pancake is the last one to leave the plate.
● Can be implemented using an array or a linked list
● Advantages
— Provides efficient access to the most recently added item (last-in, first-out access)
● Disadvantages
— Slow access to other items
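An array-based stack, one of the two implementations mentioned above. This sketch assumes `int` items and grows its array when full (the growth policy and the name `ArrayStack` are assumptions); `push` and `pop` only touch the top, so both are O(1) amortised.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Array-based stack of ints; items[top - 1] is the top of the stack.
class ArrayStack {
    private int[] items = new int[8];
    private int top = 0; // number of items currently stored

    void push(int value) {
        if (top == items.length) {
            items = Arrays.copyOf(items, items.length * 2); // grow when full
        }
        items[top++] = value;
    }

    int pop() {
        if (top == 0) throw new NoSuchElementException("stack is empty");
        return items[--top]; // last in, first out
    }

    int peek() {
        if (top == 0) throw new NoSuchElementException("stack is empty");
        return items[top - 1];
    }

    boolean isEmpty() {
        return top == 0;
    }
}
```

In pancake terms, pushing pancakes 1, 2 and 3 as they are cooked and then popping returns 3, 2, 1: the first pancake cooked is the last one eaten.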
Queues
Queue
● A linear data structure in which data can be added at one end and retrieved from the other.
● Just like a queue in the real world, the data that goes into the queue first is the first to be retrieved (first-in, first-out, FIFO).
● Appropriate for applications (like certain simulations) where the order is important
● Examples include queues at counters in any real-life application, where the person joining the queue first is served first
● Can be implemented using an array or a linked list
● Advantages
— Provides efficient access through first-in, first-out access.
● Disadvantages
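A linked-list queue sketch: items join at the tail and leave from the head, so both operations are O(1). `LinkedQueue` is an illustrative name and `String` items are an assumption; in everyday Java code, `java.util.ArrayDeque` with `offer` and `poll` provides the same FIFO behaviour.

```java
import java.util.NoSuchElementException;

// FIFO queue built from singly linked nodes.
class LinkedQueue {
    private static final class QNode {
        final String value;
        QNode next;
        QNode(String value) { this.value = value; }
    }

    private QNode head; // front: next item to leave
    private QNode tail; // back: most recently added item
    private int size;

    void enqueue(String value) {
        QNode n = new QNode(value);
        if (tail == null) {
            head = n;          // queue was empty
        } else {
            tail.next = n;
        }
        tail = n;
        size++;
    }

    String dequeue() {
        if (head == null) throw new NoSuchElementException("queue is empty");
        String value = head.value;
        head = head.next;
        if (head == null) tail = null; // queue became empty
        size--;
        return value;
    }

    int size() { return size; }
}
```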
Priority queues
Priority Queue
● Models applications in which tasks are processed in a specific order (neither LIFO nor FIFO).
● Useful in simulations, particularly for maintaining a set of future events ordered by time so that we can quickly retrieve the next event to happen.
— They are called "priority" queues because they let you retrieve items not by insertion time (as in a stack or queue), but by which item has the highest priority.
● Implemented using an array
● Advantages
— Rapid insertion and deletion
— Provides a fast sorting method
— Fast access to the largest item
● Disadvantages
— As implemented using an array, it has a fixed size
● The most popular implementation is known as a heap.
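A sketch of the heap implementation mentioned above: a binary max-heap stored in an array, where the children of index i sit at 2i+1 and 2i+2. Insertion and removal of the largest item take O(log n), reading the largest item takes O(1), and repeatedly removing the maximum yields the items in descending order (the idea behind heapsort). `MaxHeap` is an illustrative name, and unlike the slide's fixed-size version this one grows its array. The standard library's `java.util.PriorityQueue` is also heap-based but returns the smallest element first by default.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Array-based binary max-heap of ints.
class MaxHeap {
    private int[] heap = new int[8];
    private int size = 0;

    void insert(int value) {
        if (size == heap.length) heap = Arrays.copyOf(heap, size * 2);
        int i = size++;
        heap[i] = value;
        // Sift up: swap with the parent while larger than it.
        while (i > 0 && heap[(i - 1) / 2] < heap[i]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    int max() {
        if (size == 0) throw new NoSuchElementException("heap is empty");
        return heap[0];
    }

    int removeMax() {
        int top = max();
        heap[0] = heap[--size]; // move the last item to the root
        // Sift down: swap with the larger child while smaller than it.
        int i = 0;
        while (true) {
            int left = 2 * i + 1, right = left + 1, largest = i;
            if (left < size && heap[left] > heap[largest]) largest = left;
            if (right < size && heap[right] > heap[largest]) largest = right;
            if (largest == i) break;
            swap(i, largest);
            i = largest;
        }
        return top;
    }

    boolean isEmpty() { return size == 0; }

    private void swap(int a, int b) {
        int t = heap[a];
        heap[a] = heap[b];
        heap[b] = t;
    }
}
```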
Deques
Maps
Sets
● An abstract data type that stores values in no particular order and with no repeated values
● A computer implementation of the mathematical concept of a finite set
● Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in the set
● The main operations are:
— union(S,T): returns the union of sets S and T
— intersection(S,T): returns the intersection of sets S and T
— difference(S,T): returns the difference of sets S and T (the elements of S that are not in T)
— subset(S,T): a predicate that tests whether the set S is a subset of set T
● Implementations
— Arrays, linked lists, trees, or hash tables
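The four operations listed above map directly onto bulk methods of `java.util.Set` (`addAll`, `retainAll`, `removeAll`, `containsAll`). The sketch below wraps them so that each operation returns a new `HashSet` (a hash-table implementation) and leaves its arguments unchanged; the class name `SetOps` is illustrative.

```java
import java.util.HashSet;
import java.util.Set;

// Set operations from the slide, built on java.util.Set.
class SetOps {
    static <E> Set<E> union(Set<E> s, Set<E> t) {
        Set<E> result = new HashSet<>(s);
        result.addAll(t);
        return result;
    }

    static <E> Set<E> intersection(Set<E> s, Set<E> t) {
        Set<E> result = new HashSet<>(s);
        result.retainAll(t); // keep only elements also in t
        return result;
    }

    static <E> Set<E> difference(Set<E> s, Set<E> t) {
        Set<E> result = new HashSet<>(s);
        result.removeAll(t); // elements of s that are not in t
        return result;
    }

    static <E> boolean subset(Set<E> s, Set<E> t) {
        return t.containsAll(s); // every element of s is in t
    }
}
```

Membership testing, the typical set query, is `s.contains(x)`; duplicates are rejected automatically, since `add` returns `false` when the value is already present.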
Hash Table
● A data structure in which keys are mapped to table positions (array positions) by a hash function (a mathematical function)
● Allows insertions, lookups, and deletions to be performed in O(1) time on average (in the worst case, when many keys collide, an operation can take O(n) time)
● Lookup is fast because the position of a record is computed by applying the same hash function to its key that was used when the record was inserted
● Implemented using an array
● Advantages
— Very fast access if the key is known
— Fast insertion
● Disadvantages
— Slow deletion
— Access is slow if the key is not known
— Inefficient memory usage
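A minimal hash table using separate chaining, where each array slot holds a "bucket" list of entries whose keys hashed to that slot. The class name `ChainedHashTable`, the `String` keys and `int` values, and the fixed capacity without resizing are all simplifying assumptions. The same hash function locates the bucket for insert, lookup and delete, which is why access by key is fast; finding an entry by its value instead would require scanning every bucket.

```java
import java.util.LinkedList;

// Hash table with separate chaining; capacity must be positive.
class ChainedHashTable {
    private static final class Entry {
        final String key;
        int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    // Hash function: maps a key to an array position in [0, capacity).
    private LinkedList<Entry> bucketFor(String key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    void put(String key, int value) {
        LinkedList<Entry> bucket = bucketFor(key);
        for (Entry e : bucket) {
            if (e.key.equals(key)) { e.value = value; return; } // update existing
        }
        bucket.add(new Entry(key, value));
    }

    Integer get(String key) {
        for (Entry e : bucketFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // key absent
    }

    boolean remove(String key) {
        return bucketFor(key).removeIf(e -> e.key.equals(key));
    }
}
```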
Hash Tables (buckets)
Hash Maps
Hash Maps (open addressing)
Binary Trees
Binary Search Trees
● Represented graphically in the form of an upside-down tree, with the root at the top and branches below
Binary Search Trees (contd)
● A dynamic data structure, which means that its size is limited only by the amount of free memory in the operating system, and the number of elements may vary during the program run.
● Labels each node with a single key, such that for any node labeled x, all nodes in its left subtree have keys < x, and all nodes in its right subtree have keys >= x.
● Advantages
— Rapid search
— Easy addition (if the tree remains balanced)
● Disadvantages
— Deletion is complex
● Balanced versions of the BST include top-down 2-3-4 trees and red-black trees
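A minimal binary search tree that follows the ordering rule above (smaller keys to the left, keys greater than or equal to the node's key to the right). Search and insertion take time proportional to the tree's height: O(log n) when the tree stays balanced, O(n) in the worst case. `BinarySearchTree` is an illustrative name, and deletion is deliberately omitted because it is the complex case noted on the slide.

```java
import java.util.List;

// Unbalanced BST of ints; duplicates go to the right subtree.
class BinarySearchTree {
    private static final class TreeNode {
        final int key;
        TreeNode left, right;
        TreeNode(int key) { this.key = key; }
    }

    private TreeNode root;

    void insert(int key) {
        root = insert(root, key);
    }

    private TreeNode insert(TreeNode node, int key) {
        if (node == null) return new TreeNode(key);
        if (key < node.key) node.left = insert(node.left, key);
        else node.right = insert(node.right, key); // key >= node.key
        return node;
    }

    // Each comparison discards one whole subtree.
    boolean contains(int key) {
        TreeNode node = root;
        while (node != null) {
            if (key == node.key) return true;
            node = key < node.key ? node.left : node.right;
        }
        return false;
    }

    // In-order traversal (left, node, right) visits keys in sorted order.
    void inOrder(List<Integer> out) {
        inOrder(root, out);
    }

    private void inOrder(TreeNode node, List<Integer> out) {
        if (node == null) return;
        inOrder(node.left, out);
        out.add(node.key);
        inOrder(node.right, out);
    }
}
```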
Heaps
Trees
Parse trees
Tries
Graphs
● A specialized data structure that, unlike the previously discussed data structures, is not linear
● A fundamental data structure in the world of programming, consisting of nodes (vertices) and edges
● Used to model a number of real-life problems, including:
— Road networks
— The Internet
● Can be used to solve problems where a shortest path is required
● The edges of a graph can be stored using an array (an adjacency matrix) or linked lists (adjacency lists)
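A sketch of the adjacency-list representation with a shortest-path query. Vertices are assumed to be numbered 0 to n-1, and breadth-first search is used, which finds a path with the fewest edges; weighted shortest paths, such as road distances, would need a different algorithm like Dijkstra's. `Graph` and `shortestPath` are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Undirected graph stored as one neighbour list per vertex.
class Graph {
    private final List<List<Integer>> adj = new ArrayList<>();

    Graph(int vertexCount) {
        for (int i = 0; i < vertexCount; i++) adj.add(new ArrayList<>());
    }

    void addEdge(int u, int v) {
        adj.get(u).add(v);
        adj.get(v).add(u); // undirected: record the edge in both lists
    }

    // Breadth-first search; returns the vertices on a fewest-edges path
    // from source to target, or an empty list if target is unreachable.
    List<Integer> shortestPath(int source, int target) {
        int[] parent = new int[adj.size()];
        Arrays.fill(parent, -1);
        boolean[] seen = new boolean[adj.size()];
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        seen[source] = true;
        queue.add(source);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            if (u == target) break;
            for (int v : adj.get(u)) {
                if (!seen[v]) {
                    seen[v] = true;
                    parent[v] = u; // remember how v was reached
                    queue.add(v);
                }
            }
        }
        if (!seen[target]) return new ArrayList<>();
        List<Integer> path = new ArrayList<>();
        for (int v = target; v != -1; v = parent[v]) path.add(0, v);
        return path;
    }
}
```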
Graphs (contd)
● Advantages
— Models real-world situations
● Disadvantages
— Some graph algorithms are slow and complex.
Directed graphs (Digraphs)
Undirected graphs
Typical Graph Applications
● Modelling a road network with vertices as towns and edge costs as distances.
● Modelling a water supply network. A cost might relate to current or be a function of capacity and length. As water flows in only one direction, from higher- to lower-pressure connections or downhill, such a network is inherently an acyclic directed graph.
● Modelling the recent contacts of someone who has become ill with a notifiable illness, e.g. SARS or meningitis. Edge costs might be a function of the probability that the contact resulted in an infection.
Typical Graph Applications (contd)
● Dynamically modelling the status of a set of routes by which traffic might be directed over the Internet.
● Modelling the connections between a number of potential witnesses or suspects who were reported or came forward as having been in the vicinity of a serious crime within an hour of when it occurred.
● Minimising the cost and time taken for air travel when direct flights don't exist between the starting and ending airports.
● Using a directed graph to map the links between pages within a website and to analyse ease of navigation between different parts of the site.
Data structures as tools
Arrays (or lists) alone
A more complete toolset