Project Part 2
PAGE RANK ALGORITHM

ABSTRACT
How web pages are ranked and displayed within a search is not a mystery. It involves applied mathematics and solid computer science for a correct implementation, expressed through vectors, matrices, and related mathematical notation. The matrices encode the link structure of the web and the movement of a web surfer. As links are added every day and the number of websites grows into the billions, changes in the web's link structure affect PageRank, so search algorithms need continual improvement. This report examines various algorithms for calculating the PageRank of a webpage and presents a simple model of a page using Python.
CHAPTER 1
INTRODUCTION
Introduction:
CHAPTER 2
PAGERANK
History:
PageRank was developed at Stanford University by Larry Page and Sergey Brin as part of a research project about a new kind of search engine. While just one of many factors that determine the ranking of Google search results, PageRank continues to provide the basis for all of Google's web search tools.
PageRank was influenced by citation analysis, first developed by Eugene
Garfield in the 1950s at the University of Pennsylvania, and by Hyper Search, developed by
Massimo Marchiori at the University of Padua. In the same year PageRank was introduced
(1998), Jon Kleinberg published his important work on HITS. Google’s founders cite
Garfield, Marchiori, and Kleinberg in their original paper.
In the early 90's, the first search engines used text-based ranking systems that decided which pages were relevant based on their text. There are many drawbacks to this approach. For example, a search with the keyword "Internet" can be problematic: the surfer might get a page containing the keyword "Internet" that carries no actual information about the Internet.
Moreover, suppose we wanted to find some information about Google. We
type in the word "Google" and expect that "www.google.in" would be the most relevant site
to our query. However, there may be millions of pages on the web using the word Google,
and “www.google.in” may not be the one that uses it most often. Suppose we decided to write
a web site that contains the word "google" a billion times and nothing else. Would it then
make sense for our web site to be the first one displayed by a search engine? The answer is
obviously no.
If a search engine simply counted occurrences of the query words, it would not make sense for the page with the most occurrences to be displayed first. There might be millions of web pages containing the searched word, and when the search engine returns all those pages, the result is useless to the surfer, who does not have the patience to go through every page containing the word to reach the one he or she is searching for. Usually, the user expects the relevant page to appear within the top 20-30 results provided by the search engine. A modern search engine therefore provides the best, most appropriate results first, unlike the older text-ranking method.
One of the most influential algorithms is the PageRank algorithm, used by the Google search engine. The main idea behind the PageRank algorithm is that the importance of a web page is predicted by the pages linking to it. If we create a web page i that has a hyperlink to page j, then page j is considered important. On the other hand, if page j has a backlink from page k (like www.google.com), we can say that k transfers some of its importance to j (i.e., k asserts that j is important). We can iteratively assign a rank to each page based on the number of pages that point to it.
Quoting from the original Google paper, PageRank is defined like this:
We assume page A has pages T1,…,Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also, C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.
PageRank, or PR(A), can be calculated using a simple iterative algorithm, and corresponds to a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back", but who eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank, and the damping factor d is the probability at each page that the "random surfer" will get bored and request another random page. One important variation is to only add the damping factor d to a single page, or a group of pages.
This allows for personalization and can make it nearly impossible to deliberately mislead the
system in order to get a higher ranking. Another intuitive justification is that a page can have
a high PageRank if there are many pages that point to it, or if there are some pages that point
to it and have a high PageRank. Intuitively, pages that are well cited from many places
around the web are worth looking at. Also, pages that have perhaps only one citation from
something like the Yahoo! homepage are also generally worth looking at. If a page was not
high quality, or was a broken link, it is quite likely that Yahoo's homepage would not link to
it. PageRank handles both these cases and everything in between by recursively propagating
weights through the link structure of the web.
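As a sketch of how this iterative calculation can be carried out, the following Python snippet applies the PR formula above to a small, made-up four-page web (the link structure and the iteration count are illustrative assumptions, not part of the original paper):

```python
# A minimal sketch of the iterative PageRank update described above, using the
# original (non-normalized) formula PR(A) = (1-d) + d * sum(PR(T)/C(T)).
# The four-page link structure below is a made-up example.
links = {
    "A": ["B", "C"],   # page A links out to pages B and C
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, d=0.85, iterations=50):
    pages = list(links)
    pr = {p: 1.0 for p in pages}          # initial guess for every page
    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            # sum PR(T)/C(T) over every page T that links to p
            incoming = sum(pr[t] / len(links[t]) for t in pages if p in links[t])
            new_pr[p] = (1 - d) + d * incoming
        pr = new_pr
    return pr

ranks = pagerank(links)
print(ranks)  # page C, with the most inlinks, ends up with the highest score
```

Page D receives no inlinks, so its score settles at the minimum value 1-d = 0.15, exactly as the damping-factor discussion below predicts.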
Damping Factor:
From this we can see that the click-through probability is a good way to prevent spam pages and pages without outgoing links from absorbing the PageRank of others. If we use the value 1, the endless link-clicking process will end up in spam sites; if 0, we have a random restart at every step and a uniform distribution. In short, whatever the number of inlinks, there is a probability (1 - damping factor) that guarantees every page a minimum PageRank. A value between 0.85 and 0.9 therefore balances accurate calculation against convergence speed and keeps the PageRank from growing without bound.
Fig 2 Graphical Demonstration of a five-page web
Usually, the most important pages will have more inlinks, and inlinks from important pages have more effect on a page's PageRank than inlinks from marginal pages. The calculation of PageRank is recursive. The method for evaluating PageRank starts by representing the directed web graph as a square matrix of dimension n × n, called the adjacency matrix A, where n is the number of webpages. If webpage i has l_i ≥ 1 links to other webpages and links to webpage j, then A_ij = 1/l_i; otherwise A_ij = 0.
A = [ 0    0.5  0    0.5  0   ]
    [ 0    0    1    0    0   ]
    [ 1    0    0    0    0   ]
    [ 0    0    0    0    1   ]
    [ 0    0    0    0    0   ]
The number of links equals the number of non-zero elements in the adjacency (hyperlink) matrix.
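As a sketch, the matrix above can be built in Python from the outlink lists of the five-page graph (the 0-indexed outlinks dictionary below mirrors the matrix, where page 4 is the dangling node):

```python
# Building the hyperlink matrix A for the five-page example:
# A[i][j] = 1/l_i if page i (with l_i outlinks) links to page j, else 0.
outlinks = {0: [1, 3], 1: [2], 2: [0], 3: [4], 4: []}  # page 4 is dangling
n = 5
A = [[0.0] * n for _ in range(n)]
for i, targets in outlinks.items():
    for j in targets:
        A[i][j] = 1.0 / len(targets)   # each outlink gets an equal share

for row in A:
    print(row)
# first row: [0.0, 0.5, 0.0, 0.5, 0.0] — page 0 splits its weight between pages 1 and 3
```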
Dangling Nodes:
Web pages with no outlinks are called dangling nodes; the remaining web pages, having at least one outlink, are non-dangling nodes. To execute PageRank we must decide how to deal with dangling nodes, and this decision influences the PageRank we compute. For the graph above, a dangling node can be handled by replacing its row of the hyperlink matrix A with a probability distribution vector w: a stochastic n-dimensional row vector whose entries sum to 1, where n is the number of web pages (nodes). The resulting matrix has the form
P = A + dw,
where d is the dangling-node indicator, an n-dimensional column vector with
d_i = 1 if l_i = 0, and d_i = 0 otherwise.    (1)
The most common choice for w is the uniform row vector:
w = [1/n  1/n  ...  1/n] = [1/5  1/5  1/5  1/5  1/5]
This new matrix P contains no zero row, meaning every node has at least one outlink. On this ground, the matrix P for the 5-page directed graph above can be written as:
P = A + dw = [ 0    0.5  0    0.5  0   ]
             [ 0    0    1    0    0   ]
             [ 1    0    0    0    0   ]
             [ 0    0    0    0    1   ]
             [ 1/5  1/5  1/5  1/5  1/5 ]
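A sketch of this dangling-node patch in Python, using the entries of the five-page example above: every all-zero row of the hyperlink matrix is replaced by the uniform vector w = [1/n, ..., 1/n].

```python
# The dangling-node fix P = A + dw: replace each all-zero row of the
# hyperlink matrix with the uniform distribution w = [1/n, ..., 1/n].
n = 5
A = [
    [0, 0.5, 0, 0.5, 0],
    [0, 0,   1, 0,   0],
    [1, 0,   0, 0,   0],
    [0, 0,   0, 0,   1],
    [0, 0,   0, 0,   0],   # dangling node: no outlinks
]
w = [1.0 / n] * n
P = [w[:] if not any(row) else row[:] for row in A]
print(P[4])  # the dangling row becomes [0.2, 0.2, 0.2, 0.2, 0.2]
```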
CHAPTER 3
GENERATING PAGE RANK
Suppose that page P_j has l_j links. If one of those links is to page P_i, then P_j will pass on 1/l_j of its importance to P_i. The importance ranking of P_i is then the sum of all the contributions made by pages linking to it. That is, if we denote the set of pages linking to P_i by B_i, then
I(P_i) = Σ_{P_j ∈ B_i} I(P_j)/l_j    (3)
This leads to the matrix A with entries
A_ij = 1/l_j if P_j ∈ B_i, and A_ij = 0 otherwise.    (4)
Notice that A has some special properties. First, its entries are all nonnegative. Also, the sum of the entries in a column is one unless the page corresponding to that column has no links. Matrices in which all the entries are nonnegative and the sum of the entries in every column is one are called stochastic. We will also form a vector I = [I(P_1), ..., I(P_n)] whose components are the PageRanks, that is, the importance rankings, of all the pages. The condition above defining the PageRank may be expressed as
I = AI    (5)
In other words, the vector I is an eigenvector of the matrix A with eigenvalue 1. It is called the stationary vector of A.
Consider the following graph with different nodes and links connecting them:
Figure 3: a directed web graph on the nodes A, B, C, D, and E.
The matrix A for the directed graph, with the entry in row i and column j describing the link from page i to page j, is given as below:
A = [ 0    1/3  1/3  0    1/3 ]
    [ 1/3  0    0    1/3  1/3 ]
    [ 1/2  0    0    0    1/2 ]
    [ 0    1/3  1/3  0    1/3 ]
    [ 0    0    0    0    0   ]
The table below shows the number of inlinks for each node.
Node               A  B  C  D  E
Number of inlinks  2  2  2  1  4
To illustrate the calculation of the PageRank of Fig 3, we use the following steps:
Step 1: Take 0.85 * a page's PageRank, and divide it by the number of outlinks on the page.
Step 2: Add that amount onto a new total for each page it's being passed to.
Step 3: Add 0.15 to each of those totals.
As we start at zero, we have 0.85*0 = 0, so each page begins with 0.15, since 0.15 + 0 = 0.15. But we still have to account for the importance passed along links. So now the calculations become:
Page A links to pages B, C and E. Page A's PageRank is 0.15, so it will add 0.85 * 0.15 = 0.1275 to the new PageRank scores of the pages it links to. There are three of them, so each gets 0.0425.
Page B links to pages A, D and E. Page B's PageRank is 0.15, so it will also add 0.85 * 0.15 = 0.1275 to the new PageRank scores of the pages it links to; since it links to pages A, D and E, each gets 0.0425.
Page C links to pages A and E, so each gets 0.06375. Page D links to pages B, C and E, so each gets 0.0425. Page E links to none.
As a result:
Page C: 0.15 (base) + 0.0425 (from Page A) + 0.0425 (from Page D) = 0.235
Page D: 0.15 (base) + 0.0425 (from Page B) = 0.1925
Page E: 0.15 (base) + 0.0425 (from A) + 0.0425 (from B) + 0.06375 (from C) + 0.0425 (from D) = 0.34125
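The three steps above can be sketched in Python. The outlink lists below are chosen so that one pass reproduces the totals computed in the text; they are a reading of the example, not part of the original:

```python
# One pass of the three-step calculation above, starting every page at 0.15.
# Assumed outlink lists: A->B,C,E; B->A,D,E; C->A,E; D->B,C,E; E->none.
outlinks = {"A": "BCE", "B": "ADE", "C": "AE", "D": "BCE", "E": ""}
pr = {p: 0.15 for p in outlinks}

new_pr = {p: 0.15 for p in outlinks}             # Step 3: the 0.15 base
for page, targets in outlinks.items():
    if targets:
        share = 0.85 * pr[page] / len(targets)   # Step 1: split 0.85 * PR over outlinks
        for t in targets:
            new_pr[t] += share                   # Step 2: pass it on
print(new_pr)  # Page C ends near 0.235, Page D near 0.1925, Page E near 0.34125
```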
Computing I:
There are different ways of calculating eigenvectors, but the challenge here is that the hyperlink matrix H is a 45.3 billion × 45.3 billion matrix! Studies show that a web page has on average 10 outgoing links, meaning almost all but 10 entries in each column are 0.
Let us consider the power method for calculating the eigenvector. In this method, we begin by choosing a vector I^(0) (commonly the uniform vector) as a candidate for I and then produce a sequence of vectors I^(K) such that
I^(K+1) = A I^(K)
There are issues regarding the convergence of the sequence of vectors (I^(n)): the matrix under consideration must satisfy certain conditions.
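A minimal sketch of the power method in Python; the two-page column-stochastic matrix used to exercise it is an illustrative assumption:

```python
# The power method: repeatedly apply the column-stochastic matrix A
# to a starting vector until successive iterates stop changing.
def power_method(A, n_iter=100, tol=1e-10):
    n = len(A)
    I = [1.0 / n] * n                    # start from the uniform vector
    for _ in range(n_iter):
        new_I = [sum(A[i][j] * I[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_I, I)) < tol:
            return new_I                 # converged
        I = new_I
    return I

# Illustrative two-page matrix: page 1 sends all its weight to page 2,
# while page 2 splits its weight between both pages.
A = [[0.0, 0.5],
     [1.0, 0.5]]
I = power_method(A)
print(I)  # converges to about [1/3, 2/3]
```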
Figure 4: a two-page web in which page 1 links to page 2 and page 2 has no outlinks. Its matrix is
A = [ 0  0 ]
    [ 1  0 ]
Applying the algorithm defined above, the measure of importance of both pages converges to zero, indicating nothing about the relative importance of these pages. The problem arises because page 2 has no links going out. Consequently, page 2 takes some of the importance from page 1 in each iterative step but does not pass it on to any other page, draining all the importance from the web. Pages with no outgoing links are called dangling nodes, and there are, of course, many of them in the real web. We'll now modify A.
Probabilistic interpretation of A:
Assume that we are on a particular web page and we randomly follow one of its links to another page; i.e., if we are on page P_j with l_j links, one of which takes us to page P_i, the probability that we next end up on page P_i is 1/l_j.
As we surf randomly, let T_j be the fraction of time that we spend on page P_j. Then the fraction of time that we spend on page P_i coming from its link on page P_j is T_j/l_j. If we end up on page P_i, we must have come from some page linking to it, which means
T_i = Σ_{P_j ∈ B_i} T_j/l_j
From the equation we defined for the PageRank rankings, we see that I(P_i) = T_i, which can be understood as: a web page's PageRank is the fraction of time a random surfer spends on that page. Notice that, given this interpretation, it is natural to require that the sum of the entries in the PageRank vector I be 1, since we are considering fractions of time spent on each page.
There is a problem with the above description: if we surf randomly, at some point we might end up at a dangling node. To overcome this, we pretend that a dangling node has a link to every page in the web. The hyperlink matrix A is then modified by replacing each column of zeroes (if any) with a column in which each entry is 1/n, where n is the total number of web pages. Let this matrix be denoted by S.
For the two-page web of Figure 4,
S = [ 0  1/2 ]        and        I = [ 1/3 ]
    [ 1  1/2 ]                       [ 2/3 ]
meaning P2 has twice the measure of importance of P1, which seems reasonable now.
Note: S is also a column-stochastic matrix.
Let B be a matrix (of the same size as A) whose entries are all zero except for the columns corresponding to the dangling nodes, in which each entry is 1/n; then S = A + B.
Fig 5: a three-page web in which page 1 links to page 3, page 3 links to page 2, and page 2 links to page 1.
Here
S = [ 0  1  0 ]
    [ 0  0  1 ]
    [ 1  0  0 ]
and letting I^(0) = [1, 0, 0]' and using the power method, we see that
I^(1) = [0, 0, 1]'   I^(2) = [0, 1, 0]'   I^(3) = [1, 0, 0]'   I^(4) = [0, 0, 1]'   …
The iterates cycle forever. In this case the power method fails because S has other eigenvalues of the same magnitude as 1, so the sequence never converges.
Fig 6: a five-page web in which pages 3, 4, and 5 form a sub-web with links coming in but none going out.
Here
S = [ 0    1/2  0    0    0   ]
    [ 1/2  0    0    0    0   ]
    [ 0    1/2  0    1/2  1/2 ]
    [ 0    0    1/2  0    1/2 ]
    [ 1/2  0    1/2  1/2  0   ]
and letting I^(0) = [1, 0, 0, 0, 0]', the power method gives
I^(1) = [0, 0.5, 0, 0, 0.5]'
I^(2) = [0.25, 0, 0.5, 0.25, 0]'
I^(3) = [0, 0.125, 0.125, 0.25, 0.5]'
…
I^(13) = [0, 0.0001, 0.3325, 0.3332, 0.3341]'
I^(14) = [0, 0, 0.3337, 0.3333, 0.3328]'
I^(15) = [0, 0, 0.3331, 0.3333, 0.3335]'
Hence I = [0, 0, 0.333, 0.333, 0.333]', where the PageRanks assigned to page 1 and page 2 are 0, which is unsatisfactory, as page 1 and page 2 have links coming in and going out of them.
The problem here is that this web contains a smaller web inside it, i.e., pages 3, 4, 5 form a web of their own. Links come into this sub-web formed by pages 3, 4, 5, but none go out. Just as in the example of the dangling node, these pages form an "importance sink" that drains the importance out of the other two pages. In mathematical terms, the power method does not work here because S is not irreducible.
CHAPTER 4
GOOGLE MATRIX
Google matrix:
We will modify S to get a matrix which is irreducible and has 1 as a simple eigenvalue. As it stands now, our movement while surfing randomly is determined by S: either we follow one of the links on the current page or, if we are at a page with no links, we randomly choose any other page to move to. To make our modification, we first choose a parameter α with 0 < α < 1.
Now suppose we move in a slightly different way: with probability α we are guided by S, and with probability 1 − α we choose the next page at random. We obtain the Google matrix
G = αS + (1 − α)(1/n)J
where J is the matrix all of whose entries are 1.
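As a sketch, the Google matrix can be formed and iterated in Python. The matrix S below is the five-page example with the sink among pages 3-5 (as read from Fig 6), and α = 0.85 is the usual choice:

```python
# Forming G = alpha*S + (1-alpha)*(1/n)*J and finding its stationary vector
# by power iteration. S is the five-page matrix whose sink (pages 3,4,5)
# defeated the plain power method.
alpha = 0.85
S = [
    [0,   0.5, 0,   0,   0  ],
    [0.5, 0,   0,   0,   0  ],
    [0,   0.5, 0,   0.5, 0.5],
    [0,   0,   0.5, 0,   0.5],
    [0.5, 0,   0.5, 0.5, 0  ],
]
n = len(S)
# (1-alpha)/n is the contribution of the all-ones matrix J to every entry
G = [[alpha * S[i][j] + (1 - alpha) / n for j in range(n)] for i in range(n)]

I = [1.0 / n] * n
for _ in range(100):
    I = [sum(G[i][j] * I[j] for j in range(n)) for i in range(n)]
print([round(x, 3) for x in I])
# pages 3-5 still rank highest, but pages 1 and 2 now receive nonzero rank
```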
Parameter α:
The rate of convergence of the power method is governed by the ratio |λ2|/|λ1|, where λ1 is the eigenvalue with maximum magnitude and λ2 is the eigenvalue closest in magnitude to λ1. Hence the power method converges slowly if |λ2| is close to |λ1|.
CHAPTER 5
Regardless of being good pages or spam pages, the number of web sites keeps growing. For this reason, memory is a real issue in PageRank: the matrix can exceed the capacity of main memory, so compressing the data may be the solution. In such cases a modified version of PageRank is used, or else I/O-efficient computations are implemented. Generally, the time complexity of an algorithm is measured by the number of data-processing steps, such as additions and multiplications. But when the data are huge, larger even than main memory, the computational problem becomes more complex; it is then the number of disk accesses, rather than the running time, that matters. Cached data is much faster to access than data in main memory, so the algorithms should be cache-friendly. There is also the alternative of compressing the data to fit in main memory, but that too requires a modification of the PageRank algorithm. As the PageRank vector has to be consulted for each query to be processed, speeding up the process requires the help of cache memory. The best-known technique is the gap method. The idea here is that a page tends to have inlinks from pages labelled close to it; for example, a page labelled 50 will most probably have inlinks from pages labelled 49 and 51. This is the locality principle.
Another idea is reference encoding, which compares the similar adjacency lists of two pages. The first vector contains a 1 in the i-th position if the corresponding adjacency list entry i is shared by x and y; the second vector lists all entries in the adjacency list of y that are not found in the adjacency list of its reference x. This is not widely used, as it raises the problem of finding which page should serve as the reference page for another. Since the PageRank vector itself is large and completely dense, covering over 8 billion pages, and must be consulted to process each user query, a technique to compress the PageRank vector has also been suggested. This encoding of the PageRank vector aims to keep the ranking information cached in main memory, thus speeding query processing.
Advances:
Google Panda is a change to Google's search-results ranking algorithm that was first released on February 23, 2011. The change aimed to lower the rank of low-quality or thin sites and return higher-quality sites near the top of the search results. CNET reported a surge in the rankings of news websites and social networking sites, and a drop in rankings for sites containing large amounts of advertising. This change reportedly affected the rankings of almost 12 percent of all search results. Soon after the Panda rollout, many websites, including Google's webmaster forum, became filled with complaints of scrapers/copyright infringers getting better rankings than sites with original content. At one point, Google publicly asked for data points to help detect scrapers better. Google's Panda has received several updates since the original rollout in February 2011, and the effect went global in April 2011. To help affected publishers, Google published an advisory on its blog, giving some direction for self-evaluation of a website's quality. Google has provided a list of 23 bullet points on its blog answering the question "What counts as a high-quality site?" that is supposed to help webmasters step into Google's mind-set.
Google Panda was built through an algorithm update that used artificial
intelligence in a more sophisticated and scalable way than previously possible. Human
quality testers rated thousands of websites based on measures of quality, including design,
trustworthiness, speed and whether or not they would return to the website. Google’s new
Panda machine-learning algorithm, made possible by and named after engineer Navneet
Panda, was then used to look for similarities between websites people found to be high
quality and low quality.
Google Penguin is the code name for a Google algorithm update that was first announced on April 24, 2012, the day it went live. The update is aimed at decreasing the search-engine rankings of websites that violate Google's Webmaster Guidelines by using black-hat SEO techniques, such as keyword stuffing, cloaking, participating in link schemes, and deliberate creation of duplicate content.
In January 2012, the so-called page layout algorithm update was released, which targeted websites with little content above the fold. The strategic goal that Panda, Penguin, and the page layout update share is to display higher-quality websites at the top of Google's search results. However, the sites that were down-ranked as the result of these updates have different sets of characteristics.
CHAPTER 6
PYTHON
Python:
Python is a widely used general-purpose, high-level programming language. It was initially designed by Guido van Rossum in 1991 and is developed by the Python Software Foundation. It was designed with an emphasis on code readability, and its syntax allows programmers to express concepts in fewer lines of code. Python is a programming language that lets you work quickly and integrate systems more efficiently.
                     Java                                   Python
Compilation          Compiled programming language          Interpreted programming language
Code length          Longer code as compared to Python      3-5 times shorter than equivalent Java programs
Syntax               Blocks delimited by curly braces;      No semicolons or curly braces;
                     statements end with a semicolon        indentation defines blocks
Typing               Strongly typed: the exact datatype     Dynamically typed: no need to declare
                     of variables must be declared          the exact datatype of variables
Execution speed      Much faster than Python                Expected to run slower than Java programs
Multiple             Partially supported, through           Both single and multiple
inheritance          interfaces                             inheritance supported
Uses of Python:
Python is easy to use, powerful, and versatile, making it a great choice for beginners
and experts alike. Python’s readability makes it a great first programming language — it
allows us to think like a programmer and not waste time with confusing syntax. For instance,
look at the following code to print “hello world” in Java and Python.
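The comparison the text refers to looks like this (a sketch; the Java version is shown as a comment for contrast):

```python
# Python: one line of real code, no boilerplate.
message = "hello world"
print(message)

# The equivalent Java program needs a full class and a main method:
# public class Hello {
#     public static void main(String[] args) {
#         System.out.println("hello world");
#     }
# }
```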
Instead of focusing on how to get our code to even run, we'll be able to focus on learning actual programming concepts. And once we have those tools under our belt, if we move on to other languages, we'll be able to easily understand a given piece of code.
But don't think that because Python is easy to use, it's a wimpy language. Python is incredibly powerful: there's a reason companies like Google, Dropbox, Spotify, and Netflix use it.