Topics
Learning by Doing
Matrix Operations at Once
Addition and Subtraction of Matrices
Multiplication of Matrices
Multiplication and Division of Matrices by a Scalar
Orthogonal Matrices
Transpose and Inverse Properties
Determinants
Tutorial Review
References
Learning by Doing
In Part 1 of this tutorial we introduced the reader to different types of matrices, digraphs, and
Markov chains. We used lots of graphics to help users visualize the concepts. Now it is time to
discuss matrix operations. As mentioned before, only the most common and basic operations
will be covered. Here we will use a learning-by-doing approach. Thus, rather than staring at
some equations, you must do your part.
The rules for addition, subtraction, multiplication, and division between matrices are as
follows. Let us first assume that matrices A and B are used to construct matrix Z. The
elements of Z then follow from the corresponding elements of A and B, as described below.
The rules for multiplication and division of a matrix by a scalar (a real number) are
simpler. If matrix Z is constructed by multiplying all elements of matrix A by a scalar c, then
its elements are zij = c*aij. In an analogous manner, dividing matrix A by c gives zij = (1/c)*aij.
All these operations are illustrated in Figure 1. Let's revisit these one by one.
Figure 1. Some matrix operations.
To add or subtract matrices, these must be of identical order. This just means that the matrices
involved must have the same number of rows and columns. If they don't have the same
number of rows and columns, we cannot add or subtract them.
The expression
zij = aij + bij
means "to the element in row i, column j of matrix A, add the element in row i, column j of matrix B".
If we do this with each element of A and B we end with matrix Z. An example is given in
Figure 2.
Figure 2. Addition operation.
The expression
zij = aij - bij
means "from the element in row i, column j of matrix A, deduct the element in row i, column j of matrix
B". If we do this with each element of A and B we end up with matrix Z. See Figure 3.
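As a quick hands-on check of these rules, here is a minimal NumPy sketch with two hypothetical 2x2 matrices (the values are ours, not taken from the figures):

```python
import numpy as np

# Two hypothetical matrices of identical order (same rows and columns)
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

Z_add = A + B   # zij = aij + bij
Z_sub = A - B   # zij = aij - bij
```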
Consider two matrices A and B with the following characteristics: the number of columns in
A equals the number of rows in B. These are conformable with respect to one another, and
they can be multiplied together to form a new matrix Z.
The expression
zij = ai1*b1j + ai2*b2j + ai3*b3j + ... + ain*bnj
means "add the products obtained by multiplying the elements in row i of matrix A by the
corresponding elements in column j of matrix B". Figure 4 illustrates what we mean by this statement.
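The row-by-column rule can be sketched as follows; the matrices are hypothetical, chosen only so that the number of columns of A matches the number of rows of B:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2 rows, 3 columns
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])     # 3 rows, 2 columns: conformable with A

# zij = ai1*b1j + ai2*b2j + ... + ain*bnj
Z = A @ B

# Note the catch: B @ A is 3x3, so here A*B and B*A differ even in shape
```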
Matrix multiplication has a catch, as we mentioned before. The order in which we multiply
terms does matter. The reason for this is that we need to multiply row elements by column
elements, one by one. Therefore A*B and B*A can produce different results. We say "can
produce" because there exist special cases in which the operation is commutative (order does
not matter). An example of this is when we deal with diagonal matrices. Diagonal matrices
were described in Part 1.
The rules for multiplication and division of a matrix by a scalar are similar. Since multiplying
a number x by 1/c is the same as dividing x by c, let's consider these operations at once.
If all elements of matrix A are multiplied by a scalar c to construct matrix Z, then zij = c*aij.
Similarly dividing matrix A by c gives zij = (1/c)*aij. The expression
zij = c*aij
means "multiply each element in row i, column j by c", and the expression
zij = (1/c)*aij
means "divide each element in row i, column j by c". These two operations are shown in
Figure 5, where c = 2.
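In code, with c = 2 and a hypothetical matrix A:

```python
import numpy as np

c = 2
A = np.array([[2, 4],
              [6, 8]])

Z_mul = c * A   # zij = c*aij
Z_div = A / c   # zij = (1/c)*aij
```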
Figure 6 shows that a scalar matrix is obtained when an identity matrix is multiplied by a
scalar. As we will see in Part 3 of this tutorial, deducting a scalar matrix from a regular matrix
is an important operation.
Orthogonal Matrices
A regular matrix (one whose determinant is not equal to zero) M is said to be orthogonal if,
when multiplied by its transpose, the identity matrix I is obtained; i.e., M*MT = I. Orthogonal
matrices have interesting properties. If M is orthogonal:
1. its transpose and inverse are identical: MT = M-1.
2. when multiplied by its transpose the product is commutative: M*MT = MT*M.
3. its transpose is also an orthogonal matrix.
4. when multiplied by an orthogonal matrix the product is an orthogonal matrix.
5. its determinant is +/- 1. The reverse is not necessarily true; i.e., not all matrices whose
determinant is +/- 1 are orthogonal.
6. the sum of the squares of the elements in a given row or column is equal to 1.
7. the sum of the products of corresponding elements in any two rows or columns (i.e., their
dot product) is equal to zero.
Conversely, a square matrix (one with same number of rows and columns) is orthogonal if
the following conditions both exist:
1. the sum of the squares of the elements in every row or column is equal to 1.
2. the sum of the products of corresponding elements in every pair of rows or columns
-i.e., dot products- is equal to zero.
As we can see, it is quite easy to determine if a regular or square matrix is orthogonal. Just
look for any of these properties.
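A 2x2 rotation matrix is a handy test case; this sketch checks a few of the listed properties numerically (the angle is arbitrary):

```python
import numpy as np

theta = np.pi / 4
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a classic orthogonal matrix

# Definition: M*MT = I
assert np.allclose(M @ M.T, np.eye(2))
# Property 1: MT = M^-1
assert np.allclose(M.T, np.linalg.inv(M))
# Property 5: determinant is +/- 1
assert np.isclose(abs(np.linalg.det(M)), 1.0)
# Properties 6-7: rows have unit length and zero dot products
assert np.allclose((M**2).sum(axis=1), 1.0)
assert np.isclose(M[0] @ M[1], 0.0)
```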
(ABC)T =CTBTAT
(ABC)-1 =C-1B-1A-1
A-1A = AA-1 = I = 1
Since matrix division is not defined, it is impossible to divide a matrix expression by a given
matrix. However, the desired effect is achieved by multiplying the expression by the inverse
of the given matrix (2).
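These reversal rules, and the multiply-by-the-inverse trick, can be verified with hypothetical invertible matrices:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
B = np.array([[0., 1.],
              [2., 1.]])

# (AB)T = BT*AT and (AB)^-1 = B^-1*A^-1
assert np.allclose((A @ B).T, B.T @ A.T)
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))

# "Dividing" A*B by B really means multiplying by B^-1
assert np.allclose((A @ B) @ np.linalg.inv(B), A)
```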
Determinants
In the figure, the second subscripts are all distinct, the number of terms is n! and v is the
number of inversions of the second subscripts. Thus, the determinant of a matrix of order n=2
has two terms and 1 negative sign and the determinant of a matrix of order n=3 has 6 terms
and 3 negative signs. Sample calculations are given in Figure 8.
If the determinant of a square matrix is not zero, the matrix is described as a regular matrix. If
the determinant is zero, the matrix is described as a singular matrix. The problem of
transforming a regular matrix into a singular matrix is referred to as the eigenvalue problem.
The eigenvalue problem and two important concepts, eigenvalues and eigenvectors will be
explained in Part 3 of this tutorial.
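A quick numerical illustration of regular versus singular, using hypothetical matrices:

```python
import numpy as np

regular = np.array([[1., 2.],
                    [3., 4.]])    # det = 1*4 - 2*3 = -2, nonzero
singular = np.array([[1., 2.],
                     [2., 4.]])   # second row = 2 * first row, so det = 0

det_r = np.linalg.det(regular)
det_s = np.linalg.det(singular)
```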
Tutorial Review
1. Create two different matrices A and B, both of order n = 2. Prove that A*B and B*A
produce different results.
2. Consider the m = n = 2 matrix with the following elements: a11 = -18; a12 = 29; a21 = 30;
a22 = 4. Calculate its trace and its determinant. Is this a regular or a singular matrix? Is
this an invertible or non-invertible matrix?
3. Calculate the transpose matrices for the matrices shown in Figure 7. Calculate the
determinants of the transposed matrices. Are these regular or singular matrices? Are
these invertible or non-invertible matrices?
References
1. Graphical Exploratory Data Analysis; S.H.C du Toit, A.G.W. Steyn and R.H. Stumpf,
Springer-Verlag (1986).
2. Handbook of Applied Mathematics for Engineers and Scientists; Max Kurtz, McGraw
Hill (1991).
Topics
In Part 1 of this three-part tutorial we defined different types of matrices. We covered digraphs,
stochastic matrices, and Markov chains. We also mentioned how some search engine
marketers have derived blogonomies out of these and similar concepts.
It is now time to put everything together, to demystify eigenvalues and eigenvectors, and
present some practical applications.
Equation 1: A - Z = A - c*I.
Equation 2: |A - c*I| = 0
and A has been transformed into a singular matrix. The problem of transforming a regular
matrix into a singular matrix is referred to as the eigenvalue problem.
However, deducting c*I from A is equivalent to subtracting the scalar c from the main
diagonal of A. For the determinant of the new matrix to vanish, the trace of A must be equal to
the sum of specific values of c. For which values of c?
Calculating Eigenvalues
In the figure we started with a matrix A of order n = 2 and deducted from this the Z = c*I
matrix. Applying the method of determinants for m = n = 2 matrices discussed in Part 2 gives
|A - c*I| = c^2 - 17*c + 42 = 0
c1 = 3 and c2 = 14.
Note that c1 + c2 = 17, confirming that these characteristic values must add up to the trace of
the original matrix A (13 + 4 = 17).
The polynomial expression we just obtained is called the characteristic equation and the c
values are termed the latent roots or eigenvalues of matrix A.
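The worked example can be reproduced numerically. The diagonal entries 13 and 4 come from the text (trace 17); the off-diagonal entries below are our assumption, chosen so that the determinant is 42 and the characteristic equation matches the one above:

```python
import numpy as np

# trace = 17, determinant = 42; off-diagonal values are assumed for illustration
A = np.array([[13., 5.],
              [2.,  4.]])

eigenvalues = np.sort(np.linalg.eigvals(A).real)   # roots of c^2 - 17c + 42 = 0
```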
There are many scenarios, as in Principal Component Analysis (PCA) and Singular Value
Decomposition (SVD), in which some eigenvalues are so small that they are ignored. Then the
remaining eigenvalues are added together to compute an estimated fraction. This estimate is
then used as a correlation criterion for the so-called Rank Two approximation.
SVD and PCA are techniques used in cluster analysis. In information retrieval, SVD is used
in Latent Semantic Indexing (LSI) while PCA is used in Information Space (IS). These will
be discussed in upcoming tutorials.
Now that the eigenvalues are known, these are used to compute the latent vectors of matrix
A. These are the so-called eigenvectors.
Eigenvectors
Equation 3: A - ci*I
Multiplying by a column vector Xi with the same number of rows as A and setting the result to
zero leads to
Equation 4: (A - ci*I)*Xi = 0
At this point it might be a good idea to highlight several properties of eigenvalues and
eigenvectors. The following pertain only to the matrices we are discussing here.
• the absolute value of a determinant (|detA|) is the product of the absolute values of the
eigenvalues of matrix A
• c = 0 is an eigenvalue of A if A is a singular (noninvertible) matrix
• If A is an nxn triangular matrix (upper triangular, lower triangular) or a diagonal
matrix, the eigenvalues of A are the diagonal entries of A.
• A and its transpose matrix have the same eigenvalues.
• Eigenvalues of a symmetric matrix are all real.
• Eigenvectors of a symmetric matrix are orthogonal, but only for distinct eigenvalues.
• The dominant or principal eigenvector of a matrix is an eigenvector corresponding
to the eigenvalue of largest magnitude (for real numbers, largest absolute value) of
that matrix.
• For a transition matrix, the dominant eigenvalue is always 1.
• The smallest eigenvalue of matrix A is the same as the inverse (reciprocal) of the
largest eigenvalue of A-1; i.e. of the inverse of A.
If we know an eigenvalue its eigenvector can be computed. The reverse process is also
possible; i.e., given an eigenvector, its corresponding eigenvalue can be calculated.
Let's use the example of Figure 1 to compute an eigenvector for c1 = 3. From Equation 2 we
write
Note that c1 = 3 gives a set with an infinite number of eigenvectors. For the other eigenvalue, c2
= 14, we obtain
Figure 3. Eigenvectors for eigenvalue c2 = 14.
As shown in Figure 4, plotting these vectors confirms that eigenvectors that correspond to
different eigenvalues are linearly independent of one another. Note that each eigenvalue
produces an infinite set of eigenvectors, all being multiples of a normalized vector. So, instead
of plotting candidate eigenvectors for a given eigenvalue one could simply represent an entire
set by its normalized eigenvector. This is done by rescaling coordinates; in this case, by taking
coordinate ratios. In our example, the coordinates of these normalized eigenvectors are:
Mathematicians love to normalize eigenvectors in terms of their Euclidean length (L), so all
vectors are unit length. To illustrate, in the preceding example the coordinates of the two
eigenvectors are (0.5, -1) and (1, 0.2). Their lengths are
You can do the same and normalize eigenvectors to your heart's content, but it is time
consuming (and boring). Fortunately, software packages will return unit eigenvectors for you
by default.
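For instance, normalizing the candidate eigenvector (0.5, -1) from the text to unit Euclidean length:

```python
import numpy as np

v = np.array([0.5, -1.0])      # a candidate eigenvector from the example
length = np.linalg.norm(v)     # sqrt(0.5**2 + (-1)**2)
unit_v = v / length            # rescaled so the vector has length 1
```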
This is a lot easier to do. First we rearrange Equation 4. Since I = 1 we can write the general
expression
Equation 5: A*X = c*X
Now to illustrate calculations let's use the example given by Professor C.J. (Keith) van
Rijsbergen in chapter 4, page 58 of his great book The Geometry of Information Retrieval (3),
which we have reviewed already.
This result can be confirmed by simply computing the determinant of A and calculating the
latent roots. This should give two latent roots or eigenvalues, c = 4^(1/2) = +/- 2. That is, one
eigenvalue must be c1 = +2 and the other must be c2 = -2. This also confirms that c1 + c2 =
trace of A which in this case is zero.
An alternate method for computing eigenvalues from eigenvectors consists in calculating the
so-called Rayleigh Quotient, where
For the example given in Figure 5, XT*A*X = 36 and XT*X = 18; hence, 36/18 = 2.
Rayleigh Quotients give you eigenvalues in a straightforward manner. You might want to use
this method instead of inspection, or as a double-checking method. You can also use it in
combination with other iterative methods like the Power Method.
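A sketch of the Rayleigh Quotient on a hypothetical matrix-eigenvector pair (not the book's example; here X = (5, 1) is an eigenvector of the matrix we assumed earlier for the c = 3, 14 example):

```python
import numpy as np

A = np.array([[13., 5.],
              [2.,  4.]])
X = np.array([5., 1.])             # satisfies A*X = 14*X

rayleigh = (X @ A @ X) / (X @ X)   # XT*A*X / XT*X recovers the eigenvalue
```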
For example, if the eigenvalues of a matrix are:
• 5, 8, -7; then |8| > |-7| > |5| and 8 is the dominant eigenvalue.
• 0.2, -1, 1; then |1| = |-1| > |0.2| and since |1| = |-1| there is no dominant eigenvalue.
One of the simplest methods for finding the largest eigenvalue and eigenvector of a matrix is
the Power Method, also called the Vector Iteration Method. The method fails if there is no
dominant eigenvalue.
1. Assign to the candidate matrix an arbitrary starting vector with at least one element
being nonzero.
2. Compute a new eigenvector.
3. Normalize this vector, taking the normalization scalar as an initial estimate of the
eigenvalue.
4. Multiply the original matrix by the normalized eigenvector to calculate a new
eigenvector.
5. Normalize this eigenvector, taking the normalization scalar as a new estimate of the
eigenvalue.
6. Repeat the entire process until the absolute relative error between successive
eigenvalues satisfies an arbitrary tolerance (threshold) value.
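The steps above can be sketched as a short function. The normalization scalar here is the largest absolute entry of the iterated vector, and the test matrix is the one we assumed earlier; note that this simple normalization loses the sign, which is fine for a positive dominant eigenvalue:

```python
import numpy as np

def power_method(A, tol=1e-10, max_iter=500):
    """Vector iteration for the dominant eigenvalue/eigenvector (assumes one exists)."""
    x = np.zeros(A.shape[0])
    x[0] = 1.0                               # seed: first element 1, rest zero
    eigenvalue = 0.0
    for _ in range(max_iter):
        y = A @ x                            # multiply matrix by current vector
        new_eigenvalue = np.max(np.abs(y))   # normalization scalar = eigenvalue estimate
        x = y / new_eigenvalue               # normalize the new vector
        if abs(new_eigenvalue - eigenvalue) < tol:
            break                            # successive estimates agree within tolerance
        eigenvalue = new_eigenvalue
    return new_eigenvalue, x

A = np.array([[13., 5.],
              [2.,  4.]])
c, v = power_method(A)    # converges to c = 14, v proportional to (1, 0.2)
```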
It cannot get any easier than this. Let's take a look at a simple example.
Figure 6. Power Method for finding an eigenvector with the largest eigenvalue.
What we have done here is repeatedly apply a matrix to an arbitrarily chosen vector. The
result converges nicely to the largest eigenvalue of the matrix; i.e.
Figure 7 provides a visual representation of the iteration process obtained through the Power
Method for the matrix given in Figure 3. As expected, for its largest eigenvalue the iterated
vector converges to an eigenvector of relative coordinates (1, 0.20).
Figure 7. Visual representation of vector iteration.
It can be demonstrated that guessing an initial eigenvector in which its first element is 1 and
all others are zero produces in the next iteration step an eigenvector with elements being the
first column of the matrix. Thus, one could simply choose the first column of a matrix as an
initial seed.
Whether or not you try a matrix column as an initial seed, keep in mind that the rate of
convergence of the Power Method actually depends on the nature of the eigenvalues. For
closely spaced eigenvalues, the rate of convergence can be slow. Several methods for
improving the rate of convergence have been proposed (Shifted Iteration, Shifted Inverse
Iteration, or transformation methods). I will not discuss these at this time.
There are different methods for finding subsequent eigenvalues of a matrix. I will discuss only
one of these: The Deflation Method. Deflation is a straightforward approach. Essentially, this
is what we do:
1. First, use the Power Method to find the largest eigenvalue and eigenvector of
matrix A.
2. Multiply this eigenvector by its transpose and then by the largest eigenvalue.
This produces the matrix Z* = c*X*(X)T.
3. Compute a new matrix A* = A - Z* = A - c*X*(X)T.
4. Apply the Power Method to A* to compute its largest eigenvalue. This in turn should
be the second largest eigenvalue of the initial matrix A.
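A sketch of deflation on a hypothetical symmetric matrix (for symmetric matrices the eigenvectors are orthogonal, which is what makes the simple Z* = c*X*(X)T correction work cleanly); NumPy's eig stands in for the Power Method here:

```python
import numpy as np

A = np.array([[5., 2.],
              [2., 2.]])               # eigenvalues 6 and 1 (hypothetical example)

# Step 1: largest eigenvalue c and unit eigenvector X (stand-in for the Power Method)
vals, vecs = np.linalg.eig(A)
i = int(np.argmax(np.abs(vals)))
c, X = vals[i], vecs[:, i]             # eig returns unit-length eigenvector columns

# Steps 2-3: Z* = c*X*XT, then A* = A - Z*
A_star = A - c * np.outer(X, X)

# Step 4: the largest eigenvalue of A* is the second largest eigenvalue of A
vals2 = np.linalg.eig(A_star)[0]
second = vals2[int(np.argmax(np.abs(vals2)))]
```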
Figure 8 shows deflation in action for the example given in Figures 1 and 2. After a few
iterations the method converges smoothly to the second largest eigenvalue of the matrix.
Neat!
Figure 8. Finding the second largest eigenvalue with the Deflation Method.
Note. We want to thank Mr. William Cotton for pointing out an error in the original
version of this figure, which was then compounded in the calculations. These have since been
corrected. After the corrections, deflation was still able to reach the right second
eigenvalue of c = 3. Results can be double-checked using Rayleigh Quotients.
We can use deflation to find subsequent eigenvector-eigenvalue pairs, but there is a point
wherein rounding error reduces the accuracy below acceptable limits. For this reason other
methods, like Jacobi's Method, are preferred when one needs to compute many or all
eigenvalues of a matrix.
Armed with this knowledge, you should be able to better understand articles that discuss link
models like PageRank, their advantages and limitations, and when these succeed or fail and why.
The assumption behind these models is that surfing the web by jumping from link to link is
like a random walk describing a Markov chain process over a set of linked web pages.
The matrix is considered the transition probability matrix of the Markov chain, with
elements strictly between zero and one. For such matrices the Perron-Frobenius Theorem tells
us that the largest eigenvalue of the matrix is equal to one (c = 1) and that the corresponding
eigenvector, which satisfies the equation
Equation 7: A*X = X
does exist and is the principal eigenvector (state vector) of the Markov chain, with elements
of X being the pageranks. Thus, according to theory, iteration should enable one to compute
the largest eigenvalue and this principal eigenvector, whose elements are the pagerank of the
individual pages.
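A toy illustration with a hypothetical 3-page link matrix (column-stochastic: each column sums to 1). Iterating toward A*X = X converges to the principal eigenvector at c = 1:

```python
import numpy as np

# Hypothetical transition matrix: column j gives the probabilities of
# jumping from page j to each page; every column sums to 1
A = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

x = np.array([1.0, 0.0, 0.0])   # arbitrary starting distribution
for _ in range(100):            # power iteration toward A*X = X
    x = A @ x
    x = x / x.sum()             # keep x a probability vector
```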
If you are interested in reading how PageRank is computed, stay away from speculators,
especially from search engine marketers. It is hard to find accurate explanations in SEO or
SEM forums or from those that sell link-based services. I suggest instead that you read university
research articles from those who have conducted serious research work on link graphs and
PageRank-based models. Great explanations are all over the place. However, some of these
are derivative work and might not reflect how Google actually implements PageRank these
days (only those at Google know or should know this, or whether PageRank has been phased out for
something better). Still, these research papers are based on experimentation and their
results are verifiable.
There is a scientific paper I would like readers to at least consider: Link Analysis,
Eigenvectors and Stability, from Ng, Zheng and Jordan from the University of California,
Berkeley (5). In this paper the authors use many of the topics herein described to explain the
HITS and PageRank models. Regarding the latter, they write:
Figure 9. PageRank explanation, according to Ng, Zheng and Jordan from University of
California, Berkeley
Note that the last equation in Figure 9 is of the form A*X = X as in Equation 7; that is, p is
the principal eigenvector (p = X) and can be obtained through iterations.
After completing this 3-part tutorial you should be able to grasp the gist of this paper. The
group even made an interesting connection between HITS and LSI (latent semantic indexing).
If you are a student looking for a good term paper on Perron-Frobenius Theory and
PageRank computations, I recommend the term paper by Jacob Miles Prystowsky and
Levi Gill, Calculating Web Page Authority Using the PageRank Algorithm (6). This paper
discusses PageRank and some how-to calculations involving the Power Method we have
described.
How many iterations are required to compute PageRank values? Only Google knows.
According to this Perron-Frobenius review from Professor Stephen Boyd from Stanford (7),
the original paper on Google claims that for 24 million pages 50 iterations were required. A
lot of things have changed since then, including methods for improving PageRank and new
flaws discovered in this and similar link models. These flaws have been the result of the
commercial nature of the Web. Not surprisingly, models that work well under controlled
conditions and free from noise often fail miserably when transferred to a noisy environment.
These topics will be discussed in detail in upcoming articles.
Meanwhile, if you are still thinking that the entire numerical apparatus validates the notion
that on the Web links can be equated to votes of citation importance or that the treatment
validates the link citation-literature citation analogy a la Eugene Garfield's Impact Factors,
think again. This has been one of the biggest fallacies around, promoted by many link
spammers, a few IRs, and several search engine marketers with vested interests.
Literature citation and Impact Factors are driven by editorial policies and peer reviews. On the
Web anyone can add/remove/exchange links at any time for any reason whatsoever. Anyone can
buy/sell/trade links for any sort of vested interest or overwrite links at will. In such a noisy
environment, far from the controlled conditions observed in a computer lab, peer review and
citation policies are almost absent or at best contaminated by commercialization. Evidently,
under such circumstances, the link citation-literature citation analogy, or the notion that a link
is a vote of citation importance for the content of a document, cannot be sustained.
Tutorial Review
Prove that these are indeed the three eigenvalues of the matrix. Calculate the
corresponding eigenvectors.
4. Use the Power Method to calculate the largest eigenvalue of the matrix given in
Exercise 3.
5. Use the Deflation Method to calculate the second largest eigenvalue of the matrix
given in Exercise 3.
References
1. Graphical Exploratory Data Analysis; S.H.C du Toit, A.G.W. Steyn and R.H. Stumpf,
Springer-Verlag (1986).
2. Handbook of Applied Mathematics for Engineers and Scientists; Max Kurtz, McGraw
Hill (1991).
3. The Geometry of Information Retrieval; C.J. (Keith) van Rijsbergen, Cambridge
(2004).
4. Lecture 8: Eigenvalue Equations; S. Xiao, University of Iowa.
5. Link Analysis, Eigenvectors and Stability; Ng, Zheng and Jordan from the University
of California, Berkeley.
6. Calculating Web Page Authority Using the PageRank Algorithm; Jacob Miles
Prystowsky and Levi Gill; College of the Redwoods, Eureka, CA (2005).
7. Perron-Frobenius Stephen Boyd; EE363: Linear Dynamical Systems, Stanford
University, Winter Quarter (2005-2006).
Topics
This tutorial introduces matrices, eigenvalues, and eigenvectors to IR students and search
engine marketers. In Part 1 we go through some definitions and familiarize readers with
different types of matrices. Emphasis is given to stochastic matrices. In Part 2 we stop
momentarily to explain some basic matrix operations. Part 3 demystifies eigenvalues and
eigenvectors, showing how to calculate these.
We hope that presenting the material in this order, i.e., visualization of matrices first, followed
by matrix operations, might help students to associate math operations with what they have
visualized already. Currently, many matrix tutorials intermingle execution with visualization,
forcing students to stop and do a one-by-one mapping between text and graphics before
processing new material. In our opinion, that approach injects into the discourse an unnecessary
level of difficulty.
By separating visualization from execution, by the end of this tutorial the reader will be able
to discriminate between different types of matrices. Students will be able to identify key
concepts such as the rank of a matrix, digraphs, and Markov chains without resorting to
math operations.
We do not pretend to make a comprehensive review out of this tutorial. Rather, the material is
limited to what we think might be relevant to link models and cluster structures. Applications
and examples are provided.
Most of the material and examples are taken from two great books (1, 2) I read way back
while in grad school, before the inception of commercial search engines (Google, Yahoo,
MSN, etc.) on the Web scene:
1. Graphical Exploratory Data Analysis; S.H.C du Toit, A.G.W. Steyn and R.H. Stumpf,
Springer-Verlag (1986).
2. Handbook of Applied Mathematics for Engineers and Scientists; Max Kurtz, McGraw
Hill (1991).
Why did we write this tutorial for an audience consisting of IR students and search marketers?
Well, there are plenty of reasons. Consider this:
Let us first define what a matrix is and go through some basic definitions.
A matrix is just a rectangular array of rows (m) and columns (n); that is, a table. Thus, tabular
data entered into an Excel spreadsheet can be viewed as a matrix. If you run a mom-n-pop
business and for some reason you have arranged numbers or letters in rows and columns, you
have handled matrices already.
If a matrix has the same number of rows (m) and columns (n), it is termed a square matrix; i.e.,
m = n. The matrix is said to be of the nth order, or of order n. Thus, an array consisting of two
rows and two columns is a square matrix of order m = n = 2 and an array consisting of three
rows and three columns is a square matrix of order m = n = 3.
Elements of a matrix are identified by assigning subscripts to rows and columns. Thus, for
matrix A its elements are aij. For instance, a32 means element in row 3 column 2.
The diagonal extending from the upper-left corner to the lower-right corner of a square matrix
is termed the principal. The elements of the principal are termed the principal elements or
diagonal elements. The sum of the principal elements is the trace of the matrix. The trace is an
important concept, as we will see in Part 2 and Part 3 of this tutorial. These concepts are
illustrated in Figure 1.
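For example, with a hypothetical 2x2 matrix:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

trace = np.trace(A)   # sum of the principal (diagonal) elements: 1 + 4
```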
A one-row matrix is a called a row vector. Similarly, a one-column matrix is termed a column
vector. A null matrix is one with all elements being zero.
A matrix in which all nondiagonal elements have zero value is a diagonal matrix. If all
elements of a diagonal matrix are equal, we call this a scalar matrix. If all elements of a
scalar matrix are 1 this is termed a unit matrix or an identity matrix, I.
A transpose matrix AT is obtained by converting rows into columns and columns into rows.
Some of these definitions are illustrated in Figure 2.
A matrix in which all elements above or below the principal have zero value is a triangular
matrix. Moreover, a triangular matrix is classified as lower-triangular or upper-triangular,
respectively, according to whether the zero elements lie above or below the principal.
The rank of a matrix is equal to the number of linearly independent rows or linearly
independent columns it contains, whichever of these two numbers is smaller. Accordingly, the
rank of a square matrix is equal to the number of nonzero rows in its upper-triangular matrix
or the number of nonzero columns in its equivalent lower-triangular matrix, whichever of
these two numbers is smaller.
Figure 3 shows a square matrix and its equivalent triangular matrix. The latter was obtained by
subjecting the matrix to elementary column operations. Don't worry for now about
transforming a square matrix into a triangular matrix. What is important is the following: since
B contains 3 nonzero columns, A is of rank 3.
Figure 3. Rank of a square matrix.
Another way of computing the rank of a matrix involves the use of singular values. This will
be discussed in an upcoming tutorial on Singular Value Decomposition (SVD).
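Incidentally, NumPy's rank routine is built on singular values; a quick check with a hypothetical matrix containing one dependent row:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],    # = 2 * first row: linearly dependent
              [0., 1., 1.]])

rank = np.linalg.matrix_rank(A)   # computed from the singular values of A
```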
Sometimes search engine marketers and those that sell links quote papers about link models,
not knowing that the term "rank" of a link graph is used in those articles in reference to the
rank of a matrix and not in reference to any web page ranks (i.e., positioning of search
results). The next thing one reads from these marketers is what we call a bunch of blogonomies.
We call a "blogonomy" the dissemination of false knowledge through blogs or public forums,
and "blogorrhea" when a false concept is promoted for a profit.
If all elements of a matrix are nonnegative, we can normalize rows by adding row elements
together and dividing each element by the corresponding row total. Obviously, the normalized
row elements then add up to 1. In general, a matrix whose sum of all row
elements (or column elements) equals 1 is called a stochastic matrix. Elements of a
stochastic matrix can be zero if their row totals (or column totals) equal 1. See Figure 4.
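Row normalization in code, with a hypothetical nonnegative matrix:

```python
import numpy as np

M = np.array([[1., 3.],
              [2., 2.]])
row_totals = M.sum(axis=1, keepdims=True)
S = M / row_totals   # row-stochastic: each row now sums to 1
```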
A directed graph or digraph consists of a number of points (nodes) linked together by arrows
or lines, also called edges. Arrows indicate the direction of the relationship between two
nodes. The number of arrows leading from a specific node is called the outdegree of the node,
and the number of arrows ending at it is called the indegree.
To illustrate these concepts, let me use the example presented by the authors of Graphical
Exploratory Data Analysis (1) from 1986.
Here they represented the friendship between six individuals as a digraph (any similarity with
link graphs flying around?). The direction of the arrows says it all. 1, 3 and 6 consider 2 a
friend, but 2 is friendly with 3, only.
The following array describes how the nodes are related. Note that row totals give outdegrees
and column totals indegrees.
Figure 6. Indegrees and outdegrees for the friendship between six persons.
When these types of relationships are represented in matrix notation, the resultant array is
called an adjacency matrix. Dividing each row element of the adjacency matrix by the
corresponding outdegree yields a row-stochastic matrix.
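The same bookkeeping in code, for a hypothetical 4-node digraph:

```python
import numpy as np

# Hypothetical adjacency matrix: entry (i, j) = 1 if node i links to node j
adj = np.array([[0, 1, 0, 1],
                [0, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 1, 0, 0]])

outdegrees = adj.sum(axis=1)   # row totals: arrows leading from each node
indegrees = adj.sum(axis=0)    # column totals: arrows ending at each node

P = adj / outdegrees[:, None]  # row-stochastic matrix of the digraph
```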
Now that we have the basic ideas clarified, let's move forward and talk about random
processes.
A random process is a process or series of events that occur by chance. If the process evolves
in time, it is called a Markov chain. Looking at some of the stochastic matrices we have
derived, if instead of mere numbers the elements represent probabilities pij, these are called
transition probabilities. The corresponding matrix is termed a transition matrix.
Therefore, it can be said that a Markov chain is just a random process evolving in time
according to the transition probabilities of the Markov chain.
SEO Blogonomies: The Search Engine Markov Chain
Many blogonomies are promoted by well-known SEO and SEM specialists. These folks are
called "experts" by their followers and pose as such in their SEM conferences. They often
quote each other or call each other "experts". Many of these folks like to walk the fine line of
fallacy, producing material where false concepts are decorated with scientific terms and
"fat" words. They are also experts in damage control and in saving face.
Some SEOs have written -giving the impression to readers- that search engines use a mythical
Markov Chain to find patterns in search results or sites, as if such a chain were a special kind
of detection instrument, tool, or technique that is applied to find keyword patterns in a web
page or to detect how the document was optimized. This is pure nonsense.
There is no such thing as a mythical Search Engine Markov Chain, which only exists in the
mind of these folks and followers, who often misquote research articles. A markov chain is
simply a random process that occurs over time according to some transition probabilities.
Suppose we run an experiment that has N possible results (states). Suppose that we keep
repeating the experiment and that the probability of each of the results or states occurring on
the (n+1)th repetition depends only on the result of the nth repetition of the experiment. This
is called a Markov chain.
Thus a Markov chain is not an instrument, technique, tool, or the like that is allegedly used by
search engines to rank web pages or to find word patterns in documents. True, there is a lot of
research in which things have been modeled as Markov processes in an attempt to better
understand behaviors and link graphs, but the analogy stops there.
True, there is something called an absorbing Markov chain, but this is a specific case
involving random walks with absorbing states. Perhaps it might be a good idea to write a
tutorial on regular Markov chains and absorbing Markov chains or, better, recommend that
readers take a look at the book by James T. Sandefur, Discrete Dynamical Systems, Theory and
Applications (Oxford University Press; Chapter 6, Absorbing Markov Chains) (3). If you like
fractals, chaos, and iterations, this book is for you.
Meanwhile, if while drunk you walked randomly from one point to another, chances are that
you have "markov-chained yourself" already.
What's Next?
What does all this discourse have to do with web links (linked web pages)? Well, consider a
random walk over a set of linked pages. This can be defined by a transition matrix, which in
this case is the links matrix. The largest eigenvector of the transition matrix tells us the
probabilities of the walk ending on the candidate pages. To understand the significance of this
statement we first need to define what we mean by the largest eigenvector and how it is
computed. This, and the calculations involved, will be explained step by step in Part 2 and
Part 3 of this tutorial.
Tutorial Review
References
1. Graphical Exploratory Data Analysis; S.H.C du Toit, A.G.W. Steyn and R.H. Stumpf,
Springer-Verlag (1986).
2. Handbook of Applied Mathematics for Engineers and Scientists; Max Kurtz, McGraw
Hill (1991).
3. Discrete Dynamical Systems, Theory and Applications; James T. Sandefur, Oxford
University Press; Chapter 6 Absorbing Markov Chains (1990).