
Google PageRank Algorithm

And
Search Engine Optimisation

Group Members:

Muhammad Salman EE-126


Usama Khalid EE-129
Sheikh Talha EE-112
Kashaf Zain EE-113

Section C
Abstract:
The purpose of this report is to study and analyse the Google PageRank algorithm and its
relation to Search Engine Optimisation (SEO). PageRank is an algorithm used by Google
Search to rank sites in its search results. It is a method for estimating the importance of web
pages. Search engine optimisation is a set of tactics used to increase the number of visitors
to a site by obtaining a high-ranking position in the search engine results page (SERP) of a
search engine such as Google, Yahoo or Bing. We will also evaluate the pitfalls of Google
PageRank and possible ways to improve it, and we will discuss how it affects our daily lives. [1]

Introduction
Problems with earlier Search Engines and the need for Page Rank:

The Internet is an important part of our everyday lives, and everything is just a click away.
Just open your search engine, type in the words, and the search engine will display the pages
relevant to your search. But on what basis does the search engine actually work?
At first it seems reasonable to assume that a search engine keeps an index of
all web pages, and when a user types in a query, the engine browses through its index
and counts the occurrences of the keywords in every web document. The winners are the
pages with the highest number of occurrences of the keywords. These are then presented back to
the user. [2]
This was the situation in the early years, when the first search engines used
primarily text-based ranking systems to decide which pages were most relevant to a given
query. However, there were certain problems with this method. A search for a common term
such as "net" became problematic: the first page displayed by early search engines was in Chinese,
with the word "net" repeated continuously and no useful information about the
searched term. Suppose we create a website consisting only of the word "net" repeated about a
million times and nothing else. It would be pointless for a search engine to display this website
first. But if a search engine only counts the number of times the searched word or phrase is
repeated, then this is exactly what will happen. [2]
A search engine is reliable only if the results it displays are relevant to the searched
information. There may of course be millions of web pages that contain a particular
word or phrase, but some of them will be more relevant, popular, and authoritative than others.
No user has the time or energy to scan through all the pages to find what they are searching for.
Relevant pages are expected to appear among the top results returned by the search
engine; hence the need for PageRank arose. [2]
The Google Page Rank Algorithm:

Present-day search engines employ ranking techniques that present the "quality" results
first, which is more useful than simple text ranking.
The algorithm used by the Google search engine, known as the PageRank algorithm, is one of the
best-known and most influential algorithms for computing the relevance of web pages. Larry Page
and Sergey Brin pioneered this algorithm while they were graduate students at Stanford.
The idea behind it was that the importance of any web page can be judged by
looking at the pages that link to it. [2]
PageRank is a link analysis algorithm that assigns a numerical weight to every
element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of
"measuring" its relative importance within the set. The algorithm can be applied to any collection
of entities with reciprocal quotations and references. The numerical weight that it assigns to any
given element E is referred to as the PageRank of E and denoted by PR(E).
A PageRank results from an algorithm based entirely on the webgraph, which has all World
Wide Web pages as nodes and hyperlinks as edges, taking into account authority hubs such as
cnn.com or usa.gov. The rank indicates the importance of a specific page. A link to a web page
counts as a vote in its favour. The PageRank of a page is defined recursively and depends on the
number and PageRank of all pages that link to it ("incoming links"). A web page that is linked to
by many pages with high PageRank receives a high rank itself. [3]

Simplified PageRank algorithm:

The rank is calculated using the following formula:

\[ PR(X) = (1 - d) + d \sum_{i} \frac{PR(Y_i)}{C(Y_i)} \]

where

PR(X) is the PageRank of page X,

PR(Yi) is the PageRank of each page Yi that links to page X,

C(Yi) is the number of outbound links on page Yi,

d is a damping factor with a value between 0 and 1.
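
The formula can be tried out on a tiny web. The sketch below assumes the commonly cited form PR(X) = (1 - d) + d * sum of PR(Yi)/C(Yi), a damping factor d = 0.85 (a frequently quoted value, not one given in this report), and a hypothetical three-page web; the function name pagerank and the page names are illustrative only.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(X) = (1 - d) + d * sum(PR(Y) / C(Y))
    over all pages Y that link to X."""
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # start every page at rank 1
    for _ in range(iterations):
        new_pr = {}
        for x in pages:
            # Each page Y linking to X passes on PR(Y) split over its C(Y) links.
            incoming = sum(pr[y] / len(links[y]) for y in pages if x in links[y])
            new_pr[x] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical three-page web: A -> B, A -> C, B -> C, C -> A
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

Page C, which receives links from both other pages, ends up with the highest rank, and page B, which receives only half of A's vote, ends up with the lowest.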


The connections among pages are represented by a directed graph. A node represents a web page,
and an arrow from page A to page B means that there is a link from page A to page B. The
number of outgoing links is an important parameter. We use the term "out-degree of a
node" for the number of outgoing links contained in a web page. This graph is
usually called the web graph. Each node in the graph is identified with a web page, so we
use the terms "node" and "page" interchangeably. We let L(p) be the
number of outgoing links in a page p. [4]
Example 1: Suppose there are 4 pages. Page A contains a link to page B, a link to
page C, and a link to page D. Page B contains a single link to page D. Page
C points to pages A and D, and page D points to pages A and C. These links form a directed
graph with edges A→B, A→C, A→D, B→D, C→A, C→D, D→A and D→C. We have L(A) = 3, L(B) = 1
and L(C) = L(D) = 2.

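Let N be the total number of pages. We create an N × N matrix A by defining the (i, j)-entry as

\[ a_{ij} = \begin{cases} 1/L(j) & \text{if there is a link from page } j \text{ to page } i, \\ 0 & \text{otherwise.} \end{cases} \]

In Example 1, with the pages ordered A, B, C, D, the matrix A is the 4 × 4 matrix

\[ A = \begin{pmatrix} 0 & 0 & 1/2 & 1/2 \\ 1/3 & 0 & 0 & 0 \\ 1/3 & 0 & 0 & 1/2 \\ 1/3 & 1 & 1/2 & 0 \end{pmatrix} \]

Note that the sum of the entries in every column is equal to 1.
In general, a matrix is said to be column-stochastic if its entries are non-negative and the sum
of the entries in every column is equal to one. The matrix A is by design a
column-stochastic matrix, provided that every page contains at least one outgoing link. [4]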
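
The column-stochastic property can be checked mechanically. The following sketch builds the matrix for the four-page example from its link lists, assuming the (i, j)-entry is 1/L(j) when page j links to page i and 0 otherwise (the definition consistent with the column sums described above). The helper name link_matrix is illustrative, and exact fractions are used so that the column sums can be compared with 1 without rounding error.

```python
from fractions import Fraction

# Links of the four-page example: page -> pages it links to
links = {"A": ["B", "C", "D"], "B": ["D"], "C": ["A", "D"], "D": ["A", "C"]}
pages = sorted(links)  # fixed order A, B, C, D


def link_matrix(links, pages):
    """Return A with A[i][j] = 1/L(j) if page j links to page i, else 0."""
    return [
        [Fraction(1, len(links[src])) if dst in links[src] else Fraction(0)
         for src in pages]
        for dst in pages
    ]


A = link_matrix(links, pages)

# Column j sums the shares that page j hands out to the pages it links to.
column_sums = [sum(A[i][j] for i in range(len(pages))) for j in range(len(pages))]
is_column_stochastic = (
    all(entry >= 0 for row in A for entry in row)
    and all(total == 1 for total in column_sums)
)
print(is_column_stochastic)
```

A page with no outgoing links would make len(links[src]) zero, which is exactly the case excluded by the condition that every page has at least one outgoing link.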
The simplified PageRank algorithm is: initialize x to an N × 1 column vector with non-negative
components, and then repeatedly replace x by the product Ax until it
converges.
We call the vector x the PageRank vector. Usually, we initialize it to a column vector whose
components are all equal.
We can think of a bored surfer who clicks links at random. If there are k links
on the page, he or she simply picks one of the links at random and goes to the
selected page. After a sufficiently long time, the N components of the PageRank vector are
directly proportional to the number of times this surfer visits the N web pages.
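
The random-surfer picture can be simulated directly. This is a minimal sketch using the link lists of the four-page example; the function name random_surfer, the starting page, the number of steps and the fixed seed are all assumptions made for illustration.

```python
import random

# Links of the four-page example: page -> pages it links to
links = {"A": ["B", "C", "D"], "B": ["D"], "C": ["A", "D"], "D": ["A", "C"]}


def random_surfer(links, steps, start="A", seed=0):
    """Follow a uniformly random outgoing link at every step and count visits."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    visits = {page: 0 for page in links}
    page = start
    for _ in range(steps):
        page = rng.choice(links[page])
        visits[page] += 1
    return visits


visits = random_surfer(links, steps=200_000)
total = sum(visits.values())
for page in sorted(visits):
    print(page, round(visits[page] / total, 3))
```

With enough steps the visit fractions settle near the normalized PageRank vector of this graph, so the most visited page is also the page with the highest rank.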
In Example 1, we let the components of the vector x be xA, xB, xC and xD. We initialize x to be
the all-one column vector, i.e.,

\[ x = (x_A, x_B, x_C, x_D)^T = (1, 1, 1, 1)^T. \]

We observe that the algorithm converges quickly in this example. Within 10 iterations, we can see
that page D has the highest rank. In fact, page D has 3 incoming links, while the others
have either 1 or 2 incoming links. This agrees with the rationale of the PageRank algorithm
that a page with more incoming links has greater importance. [4]
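
The repeated update of x by Ax can be sketched in a few lines. The matrix below is written out from the link structure of Example 1, assuming pages are ordered A, B, C, D and that entry (i, j) equals 1/L(j) when page j links to page i; the function name power_iteration is illustrative.

```python
# Column-stochastic link matrix of Example 1 (rows and columns ordered A, B, C, D)
A = [
    [0,     0, 1 / 2, 1 / 2],
    [1 / 3, 0, 0,     0],
    [1 / 3, 0, 0,     1 / 2],
    [1 / 3, 1, 1 / 2, 0],
]


def power_iteration(A, x, iterations):
    """Replace x by the matrix-vector product A x, `iterations` times."""
    n = len(x)
    for _ in range(iterations):
        x = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


# Start from the all-one vector and iterate 10 times.
x = power_iteration(A, [1.0, 1.0, 1.0, 1.0], iterations=10)
for name, value in zip("ABCD", x):
    print(name, round(value, 3))
```

Because every column of A sums to 1, the components of x keep summing to 4 throughout, and after 10 iterations page D carries the largest component.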

Pitfalls in the Google PageRank:


Spamming:
Spamming is an important problem for PageRank. A method used to manipulate search engine
indexes is known as 'spamdexing'. Exploiting the rules of the PageRank algorithm, some web pages
use spam links to boost the rank of certain pages. If a page receives a link from a page that has
a higher PageRank, the PageRank of the receiving page also grows. Hence spammers
try to obtain such links to boost the PageRank of their pages. This
shows that page rank can effectively be shared between pages. [7]

Storage Problem:
Whatever the type of page, the number of web pages is increasing non-stop. Storing the
data is therefore a major pitfall in PageRank. As the matrix can exceed the capacity of main
memory, compressing the data may be the answer. In such cases, a modified version of PageRank is
used. Otherwise, I/O-efficient computations are applied. The time complexity of the algorithm is
measured by the number of data-processing steps, such as additions and multiplications. But when
huge quantities of data, larger than main memory, have to be processed, the
computational problem becomes more complicated. [7]

Improvements in PageRank:
Preventing Spams:
'Personalization' is an approach that researchers are focusing on. Since the mathematical
calculations are hard, it still has a long way to go. TrustRank is a good method that relies
only on good pages. A fixed number of good pages are trusted. The idea is to take
trust from the good pages and propagate it recursively along their outgoing links. However, the
problem that arises here is that intruders can leave bad links somewhere on a trusted page. The
solution is to set a threshold value: if a page is below the threshold, it is
considered spam. [8]
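
The trust-propagation idea can be illustrated with a simplified sketch. It assumes that trust spreads like PageRank whose (1 - d) share is returned only to a hand-picked seed set, and that a page is flagged when its score falls below a threshold. The function name trust_scores, the page names, d = 0.85 and the threshold 0.05 are hypothetical choices for illustration, not values from the cited work.

```python
def trust_scores(links, seeds, d=0.85, iterations=50):
    """Propagate trust from seed pages along outgoing links.

    Each round a page passes d times its trust, split evenly over its
    outgoing links; the remaining (1 - d) share goes back to the seeds only.
    """
    pages = list(links)
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)  # all initial trust sits on the seed pages
    for _ in range(iterations):
        new_trust = {}
        for x in pages:
            incoming = sum(trust[y] / len(links[y]) for y in pages if x in links[y])
            new_trust[x] = (1 - d) * seed_share[x] + d * incoming
        trust = new_trust
    return trust


# Hypothetical web: "good1" and "good2" are trusted seeds,
# "spam1" and "spam2" link only to each other.
links = {
    "good1": ["good2", "blog"],
    "good2": ["good1"],
    "blog": ["good1"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
scores = trust_scores(links, seeds={"good1", "good2"})

threshold = 0.05  # hypothetical cut-off
flagged = sorted(page for page, score in scores.items() if score < threshold)
print(flagged)
```

Pages that cannot be reached from any seed never receive trust, so the two spam pages stay at zero, while the "blog" page earns trust through the link from a seed.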

Preventing Storage Problem:


Storage can be reduced with the gap method and with reference encoding, which compares the
adjacency lists of similar pages. If page 'a' serves as the reference for page 'b', the first
vector of the reference encoding, aligned with the adjacency list of 'a', contains a 1 in the
i-th position if the corresponding adjacency list entry i is shared
by 'a' and 'b'. The second vector in the reference encoding is a list of all entries
in the adjacency list of 'b' that are not found in the adjacency list of its reference
'a'. However, we have the problem of finding which web page should serve as the reference page
for any other page. Since the PageRank vector itself is large and completely dense, with an entry
for each of over eight billion pages, a method to compress the PageRank vector is also
recommended. This encoding of the PageRank vector aims to keep the ranking data cached in main
memory, thus speeding up data processing. [7]
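
The two vectors of the reference encoding can be sketched as follows. The adjacency lists are illustrative page identifiers, and the names reference_encode, reference_decode, sharing and extras are assumptions introduced for this example.

```python
def reference_encode(ref_list, target_list):
    """Encode target_list relative to the adjacency list of a reference page.

    sharing[i] is 1 if entry i of the reference list also occurs in the target;
    extras holds the target entries that the reference list does not contain.
    """
    ref = set(ref_list)
    target = set(target_list)
    sharing = [1 if entry in target else 0 for entry in ref_list]
    extras = [entry for entry in target_list if entry not in ref]
    return sharing, extras


def reference_decode(ref_list, sharing, extras):
    """Rebuild the (sorted) target adjacency list from the two vectors."""
    shared = [entry for entry, bit in zip(ref_list, sharing) if bit]
    return sorted(shared + extras)


# Illustrative adjacency lists (page identifiers) of reference 'a' and page 'b'
adj_a = [5, 7, 12, 89, 101, 190, 390]
adj_b = [5, 6, 12, 50, 101, 190]

sharing, extras = reference_encode(adj_a, adj_b)
print(sharing)  # [1, 0, 1, 0, 1, 1, 0]
print(extras)   # [6, 50]
print(reference_decode(adj_a, sharing, extras) == adj_b)  # True
```

The saving comes from storing one bit per reference entry plus a short list of extras instead of the full adjacency list, which only pays off when the two pages link to many of the same targets.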

General Improvements:
Proposals have been made to reduce the amount of work in every iteration.
Adaptive PageRank is the name given to one such improved PageRank. It exploits the
fact that convergence comes more quickly for some pages. What this method does is leave the
converged pages out of the later computations. It was shown to give a speedup of 17%, which is
achieved because redundant calculations for already converged pages are not carried out. It was
also proposed to divide nodes into dangling and non-dangling nodes and use aggregation. This
division reduces the problem to a z × z problem, where z is the number of non-dangling nodes on
the web. [7]
BlockRank improves on PageRank by requiring fewer iterations. In this method, grouping
pages by host is the main idea. The link structure within each host is used, and the ranking is
first carried out locally. After that, the local vectors of the hosts are weighted by
their importance. Here, a stationary vector is used to determine how much
time the random surfer spends on each host. A speedup of about a factor of two was observed.
Time is also an important factor for improvement, as days can be required for the computation
because of the large number of web pages. In another algorithm, inner/outer iterations are
used, because the problem is easier to solve when the damping factor is smaller. An iterative
scheme is applied in which each iteration requires solving another linear system, which is
similar in its algebraic structure to the original one, but with a lower damping factor. [7]

Search Engine Optimisation:


Search Engine Optimisation (SEO) is the activity of making sure a website can be
found in search engines for words and phrases relevant to what the
site is offering. In many respects it is simply quality control for websites. [5]

The SEO of your website essentially depends on how easily and quickly your potential
customers can find your site through a search engine
query. If your PageRank is high, it is more likely that a customer will find you,
because the search results will place your page higher on the list. Therefore time,
effort, and care need to be given to attaining a high Google PageRank. [6]

Improving your PageRank via SEO:

How can you improve your Google PageRank? As with any business, your
website should have a proper marketing strategy. For PageRank, this means you must
get noticed not just by paying customers, but also by other websites from
which you want a backlink. [6]

With your website already cleaned up and shining with high-quality content
and internal SEO techniques (keywords, meta tags, and so on), you need a strategy to get
your website noticed by others. [6]

Here are a few link-building techniques that can be effective in getting a better
Google PageRank:

● List your website on valuable and popular directories like Yahoo, Dmoz, or
CitySearch. Avoid "link farms" that add very little value to your link.
● Get your website reviewed by a popular and authoritative review site. Unpaid
reviews are best because they offer value to customers and are not swayed by
cash incentives.
● Study other business websites and encourage them to link to your website.
Joint ventures are a good way to exchange links and benefit each other's
business as well.
● Join popular online forums and post valuable comments and discussions with
links back to your website. This may get others to do the same if they like what they
see.
● Submit articles to popular article submission websites with links to your
site. Associated Content and EzineArticles are good examples.
● Write "linkbait" articles that attract the attention of social networks. These articles
may be valuable resources, controversial, or humorous. If one of the articles or blog posts on
your website is picked up by the social networks, you not only gain a
great backlink, but also a large increase in traffic. [6]

Effects on our daily lives:


The PageRank algorithm has major effects on society, as it carries a social
influence. It is studied and analysed not only for its scientific contribution to the field of
search engines, but also for its societal impact. [3]

● The mathematics of PageRank is entirely general and applies to any graph or
network in any domain. As a consequence, PageRank is now regularly used in
bibliometrics, social and information network analysis, and for link prediction and
recommendation. It is even used for systems analysis of road networks, as well as in biology,
chemistry, neuroscience, and physics. [3]
● In the medical field, the PageRank of a neuron in a neural network has been found to
correlate with its relative firing rate.
● Personalised PageRank is used by Twitter to present users with other accounts they may
wish to follow.
● In any ecosystem, a modified version of PageRank may be used to determine species
that are essential to the continuing health of the environment. [3]
● PageRank is also a useful tool for the analysis of protein networks in biology. [3]
● Overall, most of our time on the internet revolves around Google Search: searching for
directions, people, hobbies, recipes, jobs, restaurants, news and even shopping.

As you can see from the examples above, Google PageRank, search engine optimisation and internet
marketing are an inseparable part of our daily routine. We are constantly searching for
things and being marketed and advertised to. Google and SEO aim to make
our daily lives easier. They cater to the searcher, so that you are
provided with only the best and most relevant results. [9]

Conclusion:
So far we have seen what Google PageRank actually is, the problems present in the
algorithm and their respective solutions. Most importantly, it is crucial to be aware of the
different ways in which PageRank is related to Search Engine Optimisation. Both Google
PageRank and SEO have become very significant in this era, and because of them our lives have
become a lot easier.
