Graph Representation Part 02 - Adjacency Matrix - English

You might also like

Download as txt, pdf, or txt
Download as txt, pdf, or txt
You are on page 1of 13

So in our previous lesson, we discussed

one possible way of

storing and representing a graph in

which

we used two list. One to store the

vertices and another to store the

edges. A record in vertex list here

is name of a node

and a record in edge list is an

object

containing references to the two endpoints

of an edge and also the weight of that edge

because this example graph that I am showing

you here is a

weighted graph. We called this kind of

representation

edge list representation but we realised

that this kind of storage is not very

efficient in terms of

time cost of most frequently performed

operations

like finding nodes adjacent to a given

node

or finding if two nodes are

connected are not.

To perform any of these operations, we

need to scan the whole

edge list. We need to perform a

linear search on the edge list.

So the time complexity is big oh of number


of

edges and we know that number of edges

in the graph

can be really really large. In worst case

it can be close to square of number of

vertices.

In a graph, anything running in order

of number of

edges is considered very costly. We

often want to keep the cost

in order of number of vertices. So we

should think of some other efficient

design.

We should think of something better than

this. One more possible design is that

we can store the edges in a

two-dimensional array

or matrix. We can have a

two-dimensional matrix

or array of size V*V

where V is number of vertices.

As you can see, I have drawn an 8*8

array here because number of vertices

in my sample graph here

is 8. Let's name this array A.

Now if we want to store a graph that is

unweighted. Let's just remove the weights

from this sample graph here

and now our graph is unweighted and if we


have

of value or index between 0 and V-1

for each vertex which we have here

if we are storing the vertices in a

vertex list

than we have an index between 0 and V-1

for each vertex. We can say that A

is zeroth node,

B is 1th node, C is

2th

node and so on. We are picking up

indices from vertex list. Okay

so if the graph

is unweighted and each vertex has an

index between 0 and

V-1, then in this matrix

or 2d array. We can set ith row

and jth column that is A[i][j]

as 1 or boolean value

true. if there is an edge from i to j

0 or false otherwise. If I have

to fill this matrix for this example

graph here then I'll go vertex by vertex.

Vertex 0 is connected to Vertex 1

2 and 3. Vertex 1

is connected to 0, 4 and 5.

This is an undirected graph so if we

have and edge from 0 to 1,

we also have an edge from 1 to 0

so
1th row and 0th column should also be

set as 1.

Now let's go to nodes 2, it's connected

to 0

and 6, 3 is connected to 0 and 7,

4 is connected to 1 and 7,

5 once again is connected to 1 and 7,

6 is connected to

2 and 7 and 7 is connected

to 3, 4, 5 and 6.

All the remaining positions in

this array should be set as 0.

Notice that this matrix

is symmetric. For an undirected graph,

this matrix would be symmetric

because A[i][j] would be equal to A[j][i].

We would have two positions filled for

each edge.

In fact to see all the edges in the graph,

we need to go through only one of these

two halves.

Now this would not be true for our

directed graph. Only one position will be

filled for each

edge and we will have to go through

the entire matrix

to see all the edges. Okay,

now this kind of representation of a

graph in which
edges or connections are stored in a

matrix

or 2D array is called adjacency matrix

representation. This particular matrix that

I have drawn here

is an adjacency matrix. Now with this

kind of storage or representation,

what do you think would be the time cost

of finding

all nodes adjacent to a given node. Let's say

given this vertex list

and adjacency matrix, we want to find

all nodes adjacent to node named F.

If we are given name of a node than

we first need to know it's

index and to know the index, we will have to

scan the vertex list.

There is no other way. Once we figured out

index

like for F index is 5 then

we can go to the row with that index

in the adjacency matrix

and we can scan this complete row to

find all the

adjacent nodes. Scanning the vertex

list

to figure it out the index in worst case

will cost us time proportional to the

number of vertices

because in worst case we may have to


scan the whole list,

and scanning a row

in the adjacency matrix would once again

cost us time proportional to number of

what vertices because

in a row we would have exactly

V columns where V is number of a

vertices.

So overall time cost of this operation

is big oh of V. Now most of the time

while performing operations,

we must pass indices to avoid

scanning the vertex list all the time.

If we know an index, we can figure out

the name in constant time,

because in an array we can access element at

any index in constant time but if we know

a name

want to figure out index then it will

cost us big oh of V.

We will have to scan the vertex list.

wWe will have to perform linear search

on it. Okay moving on.

Now what would be the time cost of

finding if 2 nodes

are connected or not. Now once again the

two nodes can be given to us

as indices or names. If the nodes

would be passed test as indices


then we simply need to look at value in

a particular row and

particular column. We simply need to look

at

A[I][J] for some values of I and J

and this will cost us constant time.

You can look at Value in any cell in

a two-dimensional array in constant time.

So if

indices are given time complexity of

this operation would be big oh of 1

which simply means that we will

take constant time

but if names are given then we also need

to do the scanning

to figure out the indices which will

cost us big oh of V.

Overall time complexity would be

Big oh of V.

The constant time access would not mean

anything.

The scanning of vertex list all the

time to figure it out

indices can be avoided. We can use

some extra memory to create

a hash table with names and indices

as key value pairs and then the time

cost of finding

index from name would also be big oh

of 1 that is constant. Hash table is


a data structure

and I have not talked about it in any of

my lessons so far.

If you do not know about hash table, just

search online for

a basic idea of it. Okay, so as you can

see

with adjacency matrix representation

our time cost of some of the most

frequently performed operations

is in order of number of vertices

and not in order of number of

edges which can be as high as square of

number of vertices.

Okay now if we want to store

a weighted graph in adjacency matrix

representation

then A[i][j] in the matrix can be set as

weight of an edge. For non-existent ages we

can have

a default value like a really large

or maximum possible integer value

that is never expected to be an edge

weight. I have just filled in infinity

have to mean that

we can choose the default as infinity

minus infinity

or any other value that would never

ever be a valid
edge weight. Okay, now for further

discussion

I'll come back to an unweighted graph.

Ajacency matrix

looks really good so should we not use it

always.

Well, with this design we have improved

on

time, but we have gone really high on

memory usage

instead of using memory units exactly

equal to the number of edges

what we're doing with

edge list kind of storage.

Here we're using exactly V square

units of memory.

We are using big oh of V square space.

We are not just storing the information

that these two

nodes are connected, we are also storing not

of it

that is these two nodes side not connected

which probably is

redundant information. If a graph is

dense,

if the number of edges is really close

to V square

then this is good but if the graph is

sparse

that is if number of edges is lot lessser


than V square

then we are wasting a lot of

memory in storing the zeros.

Like for this example graph that I have

drawn here, in the edge list we were

consuming

10 units of memory we had ten rows

consumed in the edge list

but here we are consuming 64 unit.

Most graphs with

really large number of vertices would

not be very dense,

would not have number of edges anywhere

close to V sqaure

like for example, Let's say we are modeling

a social network like Facebook as a

graph such that a user in the network

is a node

and there is an undetected edge if two

users are friends.

Facebook has a billion users but I'm

showing only a few in my example graph

here because I'm short of space.

Let's just assume that we have a billion

users in our network,

so number of vertices in a graph is

10 to the power 9

which is billion. Now do you think number

of connections
in our social network can ever be close

to square of number of users

that will mean everyone in the network

is a friend of

everyone else. A user of our social

network will not be friend to all other

billion users.

We can safely assume that a user

on an average would not have more than

a thousand friends

with this assumption we would have

10 to the power 12

edges in our graph. Actually, this is an

undirected graph

so we should do a divide by 2 here. So

that we do not

count an edge twice. So if

average number of friends is 1000 then total

number of connections in my graph is

5 * 10 to power 11. Now this

is lot lesser than a square of number

of vertices.

So basically if you would use an adjacency

matrix for this kind of a graph,

we would waste a hell lot of space

and moreover

even if we are not looking in relative

terms 10 to the power 18

units of memory, even in absolute

sense
is alot. 10 to the power 18 bytes

would be about a 1000 petabytes.

Now this really is a lot of space. This

much data would never ever fit on one

physical disk.

5 into 10 to the power 11 byts on the other

hand

it's just 0.5 terabytes. A typical

personal computer these days would have this

much of storage.

So as you can see for something like a

large

social graph adjacency matrix

representation is not very efficient.

Agency matrix is good when a graph is

dense

that is when the number of edges is

close to square of number of vertices

or sometimes when total number of

possible connection that is V square

is so less that wasted space would not

even matter

but most real-world graphs would be

sparse

and adjacency matrix would not be a good

fit.

Let's think about another example. Let's

think about

world wide web as are directed graph.


If you can think of web pages as nodes

in a graph

and hyperlinks as directed edges

then a webpage would not have linked to

all other pages

and once again number of webpages

would be in order of millions.

A webpage would have link to only

the

a few other pages, so the graph would be

sparse.

Most real world graphs would be sparse

and adjacency matrix. Even though it's

giving us good running time for most

frequently performed

operations would not be a good fit

because it's not very efficient in terms

of space

so what should we do. Well there's

another

representation that gives us similar

or maybe even better running time than

adjacency matrix and does not consume so

much space

It's called adjacency list

representation and we will talk about it

in our next lesson.

This is it for this lesson.

Thanks for watching

You might also like