Data Structures and Algorithms

Lecture 10. Introduction to Data Structures


Reading and Acknowledgments
● Chapter 1 in Shaffer (2009), A Practical Introduction to Data Structures and Algorithm Analysis, 3rd edition
— http://courses.cs.vt.edu/cs3114/Spring09/book.pdf
● Lecture Notes
— http://www.cis.upenn.edu/~matuszek/cit594-2014/

Learning Outcomes
● Motivation behind the various types of data structures

Introduction to data structures
● Ways of organising data in a computer so that it can be processed efficiently by algorithms
● Data representations and their associated operations
— E.g. insert, delete, search, ...
● No single data structure works well for all purposes, so it is important to know the strengths and limitations of several of them
● Using the proper data structure can make the difference between a program running in a few seconds and one requiring many days

Need for data structures (1/2)
● An algorithm is said to be efficient if it solves the problem within the required resource constraints
— E.g. the total space available to store the data (main memory and disk)
— E.g. the time allowed to perform each subtask
● A solution is said to be efficient if it requires fewer resources than known alternatives (regardless of whether it meets any particular requirements)

Need for data structures (2/2)
● The cost of a solution is the amount of resources that the solution consumes.
● Most often, cost is measured in terms of one key resource such as time, with the implied assumption that the solution meets the other resource constraints.
● See lectures on Algorithm Analysis

Choosing a data structure to solve a problem
● We write programs to solve problems
● How would we select the right data structure to solve a particular problem?
— By first analyzing the problem to determine the performance goals
● Ignoring this analysis step and applying a data structure that is inappropriate to the problem may result in a slow program

Selecting the right data structure (1)
● When selecting a data structure to solve a problem, you should follow these steps:
(1) Analyze your problem to determine the basic operations that must be supported. Examples of basic operations include:
● inserting a data item into the data structure
● deleting a data item from the data structure
● finding a specified data item in the data structure
(2) Quantify the resource constraints for each operation
(3) Select the data structure that best meets these requirements

Selecting the right data structure (2)
● Three concerns
— the data and the operations to be performed on them
— the representation for those data
— the implementation of that representation
● Resource constraints on certain key operations, such as searching, inserting and deleting data, drive the data structure selection process.

Selecting the right data structure (3)
● Examples of questions to ask
— Are all data items inserted into the data structure at the beginning, or are insertions interspersed with other operations?
— Can data items be deleted?
— Are all data items processed in some well-defined order, or is search for specific data items allowed?
● Interspersing insertions with other operations (such as deletion), and supporting search for data items, require more complex data representations.

Example: A bank
A bank must support many types of transactions with its customers, but let's
consider a simple model where customers wish to open accounts, close
accounts, and add money or withdraw money from accounts.
The typical customer opens and closes accounts far less often than he/she
accesses the account. Customers are willing to wait many minutes while
accounts are created or deleted but are typically not willing to wait more than a
brief time for individual account transactions such as a deposit or withdrawal.
These observations can be considered as informal specifications for the time
constraints on the problem.
It is common practice for banks to provide human tellers or ATMs to support
customer access to account balances and updates such as deposits and
withdrawals. Special service representatives are typically provided (during
restricted hours) to handle opening and closing accounts. Teller and ATM
transactions are expected to take little time. Opening or closing an account can
take much longer.
(Continued)

Example: A bank (2)
ATM transactions do not modify the database significantly. For simplicity,
assume that if money is added or removed, this transaction simply changes the
value stored in an account record.
Adding a new account to the database is allowed to take several minutes.
Deleting an account need have no time constraint, because from the customer’s
point of view all that matters is that all the money be returned (equivalent to a
withdrawal). From the bank’s point of view, the account record might be removed
from the database system after business hours, or at the end of the monthly
account cycle.
When considering the choice of data structure to use in the database system
that manages customer accounts, we see that a data structure that has little
concern for the cost of deletion, but is highly efficient for search and moderately
efficient for insertion, should meet the resource constraints imposed by this
problem. Records are accessible by unique account number.
One data structure that meets these requirements is the hash table.
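A minimal Java sketch of this design, assuming a hypothetical `Account` record and using `java.util.HashMap` as the hash table: records are stored and found by their unique account number, deposits and withdrawals only update the stored balance, and closing an account is a removal whose cost we do not worry about. Class and method names are illustrative, not taken from the source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical account record: only the fields needed for this sketch.
class Account {
    final String number;
    long balanceCents; // money kept in cents to avoid floating-point rounding

    Account(String number, long balanceCents) {
        this.number = number;
        this.balanceCents = balanceCents;
    }
}

// Hypothetical bank "database": a hash table keyed by the unique account number.
class Bank {
    private final Map<String, Account> accounts = new HashMap<>();

    // Opening an account: an insertion (allowed to be moderately slow).
    void open(String number) {
        accounts.putIfAbsent(number, new Account(number, 0));
    }

    // Deposit and withdrawal: a fast lookup by key, then an in-place update of the record.
    void deposit(String number, long cents) {
        find(number).balanceCents += cents;
    }

    void withdraw(String number, long cents) {
        Account a = find(number);
        if (a.balanceCents < cents) {
            throw new IllegalStateException("insufficient funds");
        }
        a.balanceCents -= cents;
    }

    long balance(String number) {
        return find(number).balanceCents;
    }

    // Closing an account: hand back the remaining balance, then remove the record.
    long close(String number) {
        Account a = find(number);
        accounts.remove(number);
        return a.balanceCents;
    }

    private Account find(String number) {
        Account a = accounts.get(number);
        if (a == null) {
            throw new IllegalArgumentException("no such account: " + number);
        }
        return a;
    }
}
```

For example, after `open("1001")`, `deposit("1001", 5000)` and `withdraw("1001", 1500)`, `balance("1001")` returns 3500 (cents).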

Primitive Types VS References
● Java’s types are divided into primitive types and reference types
● The primitive types are boolean, byte, char, short, int, long, float and double
● All non-primitive types are reference types.
● A primitive-type variable can store exactly one value of its declared type at a time.
● For example, an int variable can store one whole number (such as 7) at a time. When another value is assigned to that variable, its previous value is replaced.
● Primitive-type instance variables are initialized by default.
● Variables of types byte, char, short, int, long, float and double are initialized to 0, and variables of type boolean are initialized to false.

Primitive Types VS References (contd)
● Programs use variables of reference types to store the locations of objects in the computer’s memory.
● Such a variable is said to refer to an object in the program. Objects that are referenced may each contain many instance variables.
● Reference-type instance variables are initialized by default to the value null, a reserved word that represents a “reference to nothing.”

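The default-initialization rules above can be checked with a small Java sketch (class and field names are illustrative). It also contrasts a primitive variable, which holds its value directly, with reference variables, which hold the location of an object. Note that these defaults apply to instance variables (fields); local variables are not initialized by default and must be assigned before use.

```java
// Instance variables (fields) receive default values when the object is created.
class Defaults {
    int count;        // primitive: defaults to 0
    double ratio;     // primitive: defaults to 0.0
    boolean active;   // primitive: defaults to false
    char initial;     // primitive: defaults to the null character (numeric value 0)
    String name;      // reference: defaults to null
    int[] scores;     // arrays are objects, so this reference also defaults to null

    public static void main(String[] args) {
        Defaults d = new Defaults();
        System.out.println(d.count);          // 0
        System.out.println(d.ratio);          // 0.0
        System.out.println(d.active);         // false
        System.out.println((int) d.initial);  // 0
        System.out.println(d.name);           // null
        System.out.println(d.scores);         // null

        int x = 7;          // a primitive variable holds the value itself
        x = 42;             // assigning replaces the previous value
        String s = "hello"; // a reference variable holds the location of a String object
        String t = s;       // t now refers to the same object as s
        System.out.println(x + " " + (s == t)); // 42 true
    }
}
```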
Data Types
● A type is a collection of values (e.g. boolean, integer)
● A data item is a piece of information whose value is drawn from a type. A data item is said to be a member of a type.
● Composite type
— E.g. A bank account record contains several pieces of information such as name, address, account number, and account balance.
— Such a record is an example of a composite type.
● A data type is a type together with a collection of operations to manipulate the type.
— E.g. An integer variable is a member of the integer data type. Addition is an example of an operation on the integer data type.

Abstract Data Type
● What does ‘abstract’ mean?
— From Latin: to ‘pull out’ the essentials
— To defer or hide the details
● Abstraction emphasizes essentials and defers the details, making engineering artifacts easier to use
● I don’t need a mechanic’s understanding of what’s under a car’s hood in order to drive it
— What’s the car’s interface?
— What’s the implementation?
● Hiding the details of implementation is called encapsulation (data hiding)

E.g. Float
● You don't need to know much about how floating-point arithmetic works to use float
● Indeed, the details can vary depending on the processor, or even a virtual coprocessor
● But the compiler hides all the details from you; some numeric ADTs are built in
● All you need to know is the syntax and meaning of the operators +, -, *, /, etc.

ADT = Properties + Operations
● An ADT describes a set of objects sharing the same properties and behaviors
● The properties of an ADT are its data (representing the internal state of each object)
— double d;
● The behaviors of an ADT are its operations or functions (operations on each instance)
— sqrt(d) / 2;
● Thus, an ADT couples its data and operations
● OOP emphasizes data abstraction

Abstract Data Type and Data Structure
● An abstract data type (ADT) is the realization of a data type as a software component. The interface of the ADT is defined in terms of a type and a set of operations on that type. The behavior of each operation is determined by its inputs and outputs.
● An ADT does not specify how the data type is implemented. These implementation details are hidden from the user of the ADT and protected from outside access (a concept referred to as encapsulation).
● A data structure is the implementation for an ADT. In an object-oriented language such as Java, an ADT and its implementation together make up a class. Each operation associated with the ADT is implemented by a method. The variables that define the space required by a data item are referred to as data members. An object is an instance of a class, that is, something that is created and takes up storage during the execution of a computer program.

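One way to express this split in Java, as a sketch: a Java `interface` plays the role of the ADT (operations described only by inputs and outputs), and a class implementing it is one possible data structure, with its data members kept `private`. The names `IntCollection` and `ArrayIntCollection` are hypothetical, and the unsorted fixed-capacity array is just one choice among many.

```java
// The ADT: a type plus a set of operations, specified only by inputs and outputs.
interface IntCollection {
    void insert(int value);       // add a value
    boolean delete(int value);    // remove one occurrence; false if absent
    boolean find(int value);      // search: is the value present?
    int size();
}

// One data structure implementing the ADT: an unsorted, fixed-capacity array.
// Its data members are private, so client code depends only on the interface.
class ArrayIntCollection implements IntCollection {
    private final int[] items;
    private int count = 0;

    ArrayIntCollection(int capacity) {
        items = new int[capacity];
    }

    public void insert(int value) {
        if (count == items.length) {
            throw new IllegalStateException("collection is full");
        }
        items[count++] = value; // append at the end
    }

    public boolean delete(int value) {
        for (int i = 0; i < count; i++) {
            if (items[i] == value) {
                items[i] = items[--count]; // order does not matter: move the last item into the gap
                return true;
            }
        }
        return false;
    }

    public boolean find(int value) {
        for (int i = 0; i < count; i++) {
            if (items[i] == value) return true;
        }
        return false;
    }

    public int size() {
        return count;
    }
}
```

Client code written against `IntCollection c = new ArrayIntCollection(10);` would keep working if the implementing class were later swapped for, say, a hash-table-based one.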
Array
● An array
— is the fundamental contiguously-allocated data structure
— is a fixed-size data structure
— consists of a collection of elements of the same data type (stored in adjacent memory locations), each identified by an integer index.
● As a program retrieves the value of each element of an array, it simply moves from one memory location to the very next, in a sequential manner.

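A short Java illustration of the points above (names are illustrative). Java code cannot observe the JVM's actual memory layout, so the sketch only shows what the language guarantees: a length fixed at creation, access by integer index, and sequential traversal in index order.

```java
class ArrayDemo {
    // Sequential traversal: visits element 0, then 1, then 2, ... in index order.
    static int sum(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] scores = new int[5];          // fixed size: the length is set once, at creation
        scores[0] = 70;                     // each element is identified by an integer index
        scores[2] = 90;                     // writing by index takes constant time
        System.out.println(scores.length);  // 5
        System.out.println(scores[1]);      // 0: unassigned int elements default to 0
        System.out.println(sum(scores));    // 160
    }
}
```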
Arrays (contd)
● Advantages
— Quick insertion: we normally insert at the end of the array
— Constant-time access given the index: because the index of each element maps directly to a particular memory address, we can access arbitrary data items instantly provided we know the index
— Space efficiency: arrays consist purely of data, so no space is wasted with links or end-of-array information
● Disadvantages
— We cannot adjust the size in the middle of a program (but we can use the concept of a dynamic array to circumvent this problem)
— Slow search
— Slow deletion: deleting an element from the middle of the array requires a lot of effort to readjust the positions of the elements that come after the deleted element

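The trade-offs in this table can be seen in a minimal dynamic array sketch (the class name `DynamicIntArray` is hypothetical). Doubling the capacity when the array is full is one common growth policy and is an assumption here; the source only mentions the dynamic-array idea.

```java
import java.util.Arrays;

// Hypothetical minimal dynamic array of ints.
class DynamicIntArray {
    private int[] data = new int[2];
    private int size = 0;

    // Quick insertion at the end; when full, copy into an array twice as large.
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    // Constant-time access given the index.
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return data[index];
    }

    // Slow search: without an index, every element may have to be examined.
    int indexOf(int value) {
        for (int i = 0; i < size; i++) {
            if (data[i] == value) return i;
        }
        return -1;
    }

    // Slow deletion: every element after the removed one shifts one place left.
    int removeAt(int index) {
        int removed = get(index);
        System.arraycopy(data, index + 1, data, index, size - index - 1);
        size--;
        return removed;
    }

    int size() {
        return size;
    }
}
```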
Strings
● A string is a list (sequence) of characters
— E.g. a DNA sequence

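As a small illustration of treating a DNA sequence as a string of characters, here is a Java sketch (the class name and the GC-content helper are illustrative) that indexes single characters, searches for a short motif with `String.indexOf`, and scans the sequence character by character.

```java
class DnaDemo {
    // Fraction of bases that are G or C, computed by scanning the characters in order.
    static double gcContent(String dna) {
        int gc = 0;
        for (int i = 0; i < dna.length(); i++) {
            char base = dna.charAt(i);
            if (base == 'G' || base == 'C') gc++;
        }
        return dna.isEmpty() ? 0.0 : (double) gc / dna.length();
    }

    public static void main(String[] args) {
        String dna = "ATGCGCAT";                // a string is a sequence of characters
        System.out.println(dna.charAt(2));       // G: characters are indexed from 0
        System.out.println(dna.indexOf("GCA"));  // 4: position of the first match, or -1
        System.out.println(gcContent(dna));      // 0.5
    }
}
```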
LinkedLists
● Linked lists are dynamic data structures in which each item in the list points to the next item in the list, using pointers.
● Pointers are connections that hold the pieces of a linked list together and represent the address of a location in memory.
● Example
John → Mary → Ann → null

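The John → Mary → Ann example can be built in Java with a minimal node class, where the "pointer" is an ordinary object reference (the names `NameNode` and `LinkedListDemo` are hypothetical). The sketch also shows that inserting after a known node only rewires references; nothing is shifted.

```java
// Hypothetical minimal singly linked list node holding a name.
class NameNode {
    final String value;
    NameNode next; // reference to the next node, or null at the end of the list

    NameNode(String value, NameNode next) {
        this.value = value;
        this.next = next;
    }
}

class LinkedListDemo {
    // Follow the next references from the head until reaching null.
    static String describe(NameNode head) {
        StringBuilder sb = new StringBuilder();
        for (NameNode n = head; n != null; n = n.next) {
            sb.append(n.value).append(" -> ");
        }
        return sb.append("null").toString();
    }

    public static void main(String[] args) {
        // Build John -> Mary -> Ann -> null
        NameNode head = new NameNode("John", new NameNode("Mary", new NameNode("Ann", null)));
        System.out.println(describe(head)); // John -> Mary -> Ann -> null

        // Insert Bob after Mary by creating one node and changing one reference.
        NameNode mary = head.next;
        mary.next = new NameNode("Bob", mary.next);
        System.out.println(describe(head)); // John -> Mary -> Bob -> Ann -> null
    }
}
```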
Linked lists
● Types of linked lists:
— (a) Singly (simple) linked list
— (b) Doubly linked list

Overview of Data Structures

LinkedLists (contd)
● Advantages (as compared to an array)
— We can never run out of space unless the memory is actually full
— Quick insertion (once the position is known)
— Quick deletion (once the position is known)
● Disadvantages
— Require extra space for storing pointer fields
— Do not allow efficient random access to items
— Slow search

Doubly-linked lists
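In a doubly linked list each node also refers back to its predecessor, so a node can be removed in constant time without searching for the node before it, and the list can be traversed in both directions. The Java sketch below is a hypothetical minimal version; the header and trailer sentinel nodes are a design choice (an assumption, not from the slides) that avoids special cases at the ends of the list.

```java
// Hypothetical doubly linked list node: references to both neighbours.
class DNode {
    int value;
    DNode prev, next;

    DNode(int value) {
        this.value = value;
    }
}

// Sentinel-based doubly linked list: header and trailer are never removed.
class DoublyLinkedList {
    private final DNode header = new DNode(0);
    private final DNode trailer = new DNode(0);
    private int size = 0;

    DoublyLinkedList() {
        header.next = trailer;
        trailer.prev = header;
    }

    // Insert a new node between two adjacent nodes a and b.
    private DNode linkBetween(int value, DNode a, DNode b) {
        DNode n = new DNode(value);
        n.prev = a;
        n.next = b;
        a.next = n;
        b.prev = n;
        size++;
        return n;
    }

    DNode addFirst(int value) { return linkBetween(value, header, header.next); }

    DNode addLast(int value) { return linkBetween(value, trailer.prev, trailer); }

    // Given a (non-sentinel) node, removal is O(1): both neighbours are directly reachable.
    void remove(DNode n) {
        n.prev.next = n.next;
        n.next.prev = n.prev;
        size--;
    }

    String forward() {
        StringBuilder sb = new StringBuilder();
        for (DNode n = header.next; n != trailer; n = n.next) sb.append(n.value).append(' ');
        return sb.toString().trim();
    }

    String backward() {
        StringBuilder sb = new StringBuilder();
        for (DNode n = trailer.prev; n != header; n = n.prev) sb.append(n.value).append(' ');
        return sb.toString().trim();
    }

    int size() { return size; }
}
```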

Stacks

Stacks
● Supports retrieval in Last-In, First-Out (LIFO) order.
● A good analogy compares a stack data structure to a stack of pancakes. The first pancake cooked (first in) is put on a plate and then covered with other pancakes as they are done cooking. If you wait until they are all cooked and then eat them one at a time, the first pancake is the last one to leave the plate.
● Can be implemented using an array or linked list
● Advantages
— Provides efficient last-in, first-out access
● Disadvantages
— Slow access to other items

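A minimal array-based stack in Java, as a sketch (the class name `IntStack` is hypothetical): the top of the stack is the last used array slot, so push and pop touch only that end. Growing the array when it is full is an assumption added to avoid a fixed capacity. In everyday Java code, `java.util.ArrayDeque` already provides `push`, `pop` and `peek`.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Hypothetical array-based stack of ints.
class IntStack {
    private int[] items = new int[4];
    private int top = 0; // number of items; the top item is items[top - 1]

    void push(int value) {
        if (top == items.length) {
            items = Arrays.copyOf(items, items.length * 2); // grow when full
        }
        items[top++] = value;
    }

    int pop() {
        if (isEmpty()) throw new NoSuchElementException("stack is empty");
        return items[--top]; // last in, first out
    }

    int peek() {
        if (isEmpty()) throw new NoSuchElementException("stack is empty");
        return items[top - 1];
    }

    boolean isEmpty() {
        return top == 0;
    }
}
```

Pushing pancakes 1 to 5 and then popping until empty returns them in the order 5, 4, 3, 2, 1.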
Queues

Queue
● A linear data structure in which data can be added to one end and retrieved from the other.
● Just like a queue in the real world, the data that goes into the queue first is the first to be retrieved (First-In, First-Out, FIFO).
● Appropriate for applications (like certain simulations) where the order is important
● Examples include queues at counters in any real-life application, where the person joining the queue first is served first
● Can be implemented using an array or linked list
● Advantages
— Provides efficient first-in, first-out access
● Disadvantages
— Slow access to other items

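A linked-list queue keeps references to both ends: items are added at the tail and retrieved from the head, so both operations take constant time. The Java sketch below uses the hypothetical name `LinkedQueue`; in practice, `java.util.ArrayDeque` and `java.util.LinkedList` both implement the `java.util.Queue` interface (`offer`, `poll`).

```java
import java.util.NoSuchElementException;

// Hypothetical linked-list queue: add at the tail, remove from the head.
class LinkedQueue<T> {
    private static class Node<T> {
        final T value;
        Node<T> next;

        Node(T value) {
            this.value = value;
        }
    }

    private Node<T> head; // front: next item to be retrieved
    private Node<T> tail; // back: most recently added item
    private int size;

    void enqueue(T value) {
        Node<T> n = new Node<>(value);
        if (tail == null) {
            head = n; // queue was empty
        } else {
            tail.next = n;
        }
        tail = n;
        size++;
    }

    T dequeue() {
        if (head == null) throw new NoSuchElementException("queue is empty");
        T value = head.value;
        head = head.next;
        if (head == null) tail = null; // queue became empty
        size--;
        return value;
    }

    boolean isEmpty() { return size == 0; }

    int size() { return size; }
}
```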
Priority queues

Priority Queue
● Used to model applications whereby tasks are processed in a specific order (neither LIFO nor FIFO).
● Useful in simulations, particularly for maintaining a set of future events ordered by time so that we can quickly retrieve what the next thing to happen is.
— They are called “priority” queues because they enable you to retrieve items not by insertion time (as in a stack or queue), but by which item has the highest priority of retrieval.
● Implemented using an array
● Advantages
— Rapid insertion and deletion
— Provides a fast sorting method
— Fast access to the largest item
● Disadvantages
— Slow access to other items
— As implemented using an array, it has a fixed size
● The most popular implementation is known as a heap.

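A sketch of the heap implementation in Java, assuming `int` priorities where a larger value means higher priority (the class name `MaxHeap` is hypothetical). The heap lives in a fixed-capacity array, matching the fixed-size disadvantage above: the largest item is always at index 0, and insertion and removal each repair the heap along one root-to-leaf path. Note that `java.util.PriorityQueue`, the library's heap-based priority queue, retrieves the *smallest* element first under natural ordering.

```java
import java.util.NoSuchElementException;

// Hypothetical fixed-capacity binary max-heap of ints stored in an array.
// Children of index i are at 2i+1 and 2i+2; its parent is at (i-1)/2.
class MaxHeap {
    private final int[] heap;
    private int size = 0;

    MaxHeap(int capacity) {
        heap = new int[capacity];
    }

    // Insert: place at the end, then sift up while larger than the parent. O(log n).
    void insert(int value) {
        if (size == heap.length) throw new IllegalStateException("heap is full");
        int i = size++;
        heap[i] = value;
        while (i > 0 && heap[(i - 1) / 2] < heap[i]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // Fast access to the largest item: it is always at the root. O(1).
    int max() {
        if (size == 0) throw new NoSuchElementException("heap is empty");
        return heap[0];
    }

    // Remove the largest item: move the last item to the root and sift it down. O(log n).
    int removeMax() {
        int top = max();
        heap[0] = heap[--size];
        int i = 0;
        while (true) {
            int left = 2 * i + 1, right = left + 1, largest = i;
            if (left < size && heap[left] > heap[largest]) largest = left;
            if (right < size && heap[right] > heap[largest]) largest = right;
            if (largest == i) break;
            swap(i, largest);
            i = largest;
        }
        return top;
    }

    boolean isEmpty() {
        return size == 0;
    }

    private void swap(int a, int b) {
        int t = heap[a];
        heap[a] = heap[b];
        heap[b] = t;
    }
}
```

Inserting all items and then calling `removeMax` until the heap is empty yields them in descending order, which is the idea behind heapsort (the "fast sorting method" above).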
Deques

Maps

Sets
● A set is an abstract data structure that can store values without any particular order and with no repeated values
● It is a computer implementation of the mathematical concept of a finite set
● Unlike with most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in the set
● The main operations are:
— union(S,T): returns the union of sets S and T
— intersection(S,T): returns the intersection of sets S and T
— difference(S,T): returns the difference of sets S and T (the elements of S that are not in T)
— subset(S,T): a predicate that tests whether the set S is a subset of set T
● Implementations
— Arrays, linked lists, trees, or hash tables

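The four operations can be written in Java on top of `java.util.HashSet`, a hash-table implementation of a set, using the standard `addAll`, `retainAll`, `removeAll` and `containsAll` methods. The wrapper class `SetOps` is hypothetical; each operation copies its first argument so that S and T are left unchanged.

```java
import java.util.HashSet;
import java.util.Set;

// The main set operations, built on java.util.HashSet.
class SetOps {
    static <E> Set<E> union(Set<E> s, Set<E> t) {
        Set<E> result = new HashSet<>(s);
        result.addAll(t);      // add every element of T (duplicates are ignored)
        return result;
    }

    static <E> Set<E> intersection(Set<E> s, Set<E> t) {
        Set<E> result = new HashSet<>(s);
        result.retainAll(t);   // keep only elements also in T
        return result;
    }

    static <E> Set<E> difference(Set<E> s, Set<E> t) {
        Set<E> result = new HashSet<>(s);
        result.removeAll(t);   // drop elements that are in T
        return result;
    }

    static <E> boolean subset(Set<E> s, Set<E> t) {
        return t.containsAll(s); // every element of S is also in T
    }
}
```

Membership testing is `contains`, and `add` returns `false` when the value is already present, which is how a set enforces "no repeated values".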
Hash Table
● A data structure in which keys are mapped to table positions (array positions) by a hash function (a mathematical function)
● Allows insertions, lookups, and deletions to be performed in O(1) time on average (assuming the hash function spreads the keys evenly)
● Lookup is fast because the position of a record is calculated at insertion time by applying the hash function to its key; applying the same hash function to the key later leads straight back to that position
● Implemented using an array
● Advantages
— Very fast access if the key is known
— Fast insertion
● Disadvantages
— Slow deletion
— Access is slow if the key is not known
— Inefficient memory usage

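A minimal Java sketch of a hash table using separate chaining ("buckets"): an array of linked lists, where the hash function picks the bucket and keys that collide share a list. The class name `ChainedHashTable`, the String-to-int mapping, and the omission of resizing are all simplifying assumptions; `java.util.HashMap` is the library's production hash table.

```java
import java.util.LinkedList;

// Hypothetical hash table with separate chaining: an array of linked lists.
class ChainedHashTable {
    private static class Entry {
        final String key;
        int value;

        Entry(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    // The same hash function is used for insertion, lookup and deletion,
    // so a key is always looked for in the bucket where it was put.
    private int indexFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        for (Entry e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) { // key already present: update its value
                e.value = value;
                return;
            }
        }
        buckets[indexFor(key)].add(new Entry(key, value));
    }

    Integer get(String key) {
        for (Entry e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // key not present
    }

    boolean remove(String key) {
        return buckets[indexFor(key)].removeIf(e -> e.key.equals(key));
    }
}
```

With only a few keys per bucket, each operation scans a short list, which is where the average constant-time behaviour comes from.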
Hash Tables (buckets)

Hash Maps

Hash Maps (open addressing)

Binary Trees

Binary Search Trees
● Represented graphically in the form of an upside-down tree, with the root at the top and branches below

Binary Search Trees (contd)
● Dynamic data structure, which means that its size is limited only by the amount of free memory in the operating system, and the number of elements may vary during the program run.
● Labels each node with a single key, such that for any node labeled x, all nodes in the left subtree have values < x, and all nodes in the right subtree have values >= x.
● Advantages
— Rapid search
— Easy addition (if the tree remains balanced)
● Disadvantages
— Deletion is complex
● Balanced versions of BSTs include top-down 2-3-4 trees and red-black trees

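A minimal, unbalanced binary search tree in Java that follows the slide's ordering rule (left subtree < x, right subtree >= x, so equal keys go right). The class name `BST` is hypothetical, and deletion, which the slide notes is complex, is deliberately left out. Search follows a single root-to-leaf path, which is fast only while the tree stays reasonably balanced.

```java
// Hypothetical unbalanced binary search tree of ints.
class BST {
    private static class Node {
        final int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    private Node root;

    void insert(int key) {
        root = insert(root, key);
    }

    private Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else n.right = insert(n.right, key); // equal keys go right (>= x)
        return n;
    }

    // Search: one comparison per level, moving left or right.
    boolean contains(int key) {
        Node n = root;
        while (n != null) {
            if (key == n.key) return true;
            n = key < n.key ? n.left : n.right;
        }
        return false;
    }

    // In-order traversal (left, node, right) visits the keys in sorted order.
    String inOrder() {
        StringBuilder sb = new StringBuilder();
        inOrder(root, sb);
        return sb.toString().trim();
    }

    private void inOrder(Node n, StringBuilder sb) {
        if (n == null) return;
        inOrder(n.left, sb);
        sb.append(n.key).append(' ');
        inOrder(n.right, sb);
    }
}
```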
Heaps

Trees

Parse trees

Tries

Graphs
● A specialized data structure that, unlike linear structures such as arrays, linked lists, stacks and queues, is not linear
● A fundamental data structure in the world of programming, which consists of nodes (vertices) and edges
● Used to model a number of real-life problems, including:
— Road networks
— The Internet
● Can be used to solve problems where a shortest path is required
● The edges of a graph can be implemented using an array (adjacency matrix) or linked lists (adjacency lists)

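A sketch of an undirected, unweighted graph stored as adjacency lists in Java (the class name `Graph` is hypothetical). Breadth-first search is one standard way to answer the shortest-path question when every edge counts as one step: it visits vertices in order of distance from the source, so the first time a vertex is reached gives its shortest path length in edges. Weighted edges need a different algorithm.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical undirected, unweighted graph with vertices 0..n-1 stored as adjacency lists.
class Graph {
    private final List<List<Integer>> adj = new ArrayList<>();

    Graph(int vertices) {
        for (int v = 0; v < vertices; v++) adj.add(new ArrayList<>());
    }

    void addEdge(int u, int v) {
        adj.get(u).add(v);
        adj.get(v).add(u); // undirected: store the edge in both lists
    }

    // Breadth-first search from source; returns the number of edges on a shortest
    // path to each vertex, or -1 if the vertex cannot be reached.
    int[] shortestHops(int source) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);
        Queue<Integer> queue = new ArrayDeque<>();
        dist[source] = 0;
        queue.add(source);
        while (!queue.isEmpty()) {
            int u = queue.remove();
            for (int v : adj.get(u)) {
                if (dist[v] == -1) { // first visit is via a shortest path
                    dist[v] = dist[u] + 1;
                    queue.add(v);
                }
            }
        }
        return dist;
    }
}
```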
Graphs (contd)
● Advantages
— Models real-world situations
● Disadvantages
— Some graph algorithms are slow and complex.

Directed graphs (Digraphs)

Undirected graphs

Typical Graph Applications
● Modelling a road network with vertices as towns and edge costs as distances.
● Modelling a water supply network. A cost might relate to current or a function of capacity and length. As water flows in only one direction, from higher- to lower-pressure connections or downhill, such a network is inherently a directed acyclic graph.
● Modelling the recent contacts of someone who has become ill with a notifiable illness, e.g. SARS or meningitis. Edge costs might be a function of the probability that the contact resulted in an infection.

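For the road-network example, where edge costs are distances, a standard technique for finding shortest routes is Dijkstra's algorithm (not named in the slides). The Java sketch below is a minimal version under stated assumptions: towns are numbered 0..n-1, roads are undirected with non-negative distances, and the class name `RoadNetwork` is hypothetical. It uses `java.util.PriorityQueue` to always expand the closest unsettled town next, skipping stale queue entries instead of updating them in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical road network: towns are vertices, roads are undirected weighted edges.
class RoadNetwork {
    private final List<List<int[]>> roads = new ArrayList<>(); // each entry: {neighbour, distance}

    RoadNetwork(int towns) {
        for (int i = 0; i < towns; i++) roads.add(new ArrayList<>());
    }

    void addRoad(int a, int b, int distance) {
        roads.get(a).add(new int[] {b, distance});
        roads.get(b).add(new int[] {a, distance});
    }

    // Dijkstra's algorithm: shortest road distance from source to every town.
    long[] shortestDistances(int source) {
        long[] dist = new long[roads.size()];
        Arrays.fill(dist, Long.MAX_VALUE); // "infinity": not yet reached
        dist[source] = 0;
        // Queue entries are {tentative distance, town}, smallest distance first.
        PriorityQueue<long[]> pq = new PriorityQueue<>((x, y) -> Long.compare(x[0], y[0]));
        pq.add(new long[] {0, source});
        while (!pq.isEmpty()) {
            long[] top = pq.poll();
            int town = (int) top[1];
            if (top[0] > dist[town]) continue; // stale entry: a shorter route was already found
            for (int[] road : roads.get(town)) {
                long candidate = dist[town] + road[1];
                if (candidate < dist[road[0]]) {
                    dist[road[0]] = candidate;
                    pq.add(new long[] {candidate, road[0]});
                }
            }
        }
        return dist;
    }
}
```

Towns that cannot be reached keep the value `Long.MAX_VALUE`. The priority queue here is the same ADT described earlier in the lecture, used to retrieve the "next thing to happen".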
Typical Graph Applications (contd)
● Dynamically modelling the status of a set of routes by which traffic might be directed over the Internet.
● Modelling the connections between a number of potential witnesses or suspects who were reported or came forward as having been within the vicinity of a serious crime within an hour of when it occurred.
● Minimising the cost and time taken for air travel when direct flights don't exist between starting and ending airports.
● Using a directed graph to map the links between pages within a website and to analyse ease of navigation between different parts of the site.

Data structures as tools

Arrays (or lists) alone

A more complete toolset
