PageRank Algorithm Documentation

Overview

This C++ program implements the PageRank algorithm using an adjacency list data structure. PageRank is an algorithm used by search engines to rank webpages in their search results. It measures the importance of each webpage from the link structure of the web graph: a page ranks higher when other pages link to it.

Program Structure

AdjacencyList Class

The AdjacencyList class represents the web graph using an adjacency list. It has the following private member variables:

std::map<std::string, std::vector<std::string>> graph: Represents the graph as an adjacency list.
std::map<std::string, int> idMapping: Maps webpage URLs to unique IDs.
std::vector<std::string> idToURL: Maps unique IDs back to webpage URLs.
std::vector<double> pageRank: Stores the PageRank values for each webpage.

Member Functions

void addEdge(const std::string& from, const std::string& to)

This function adds an edge between two webpages in the graph. It takes two parameters: from, the source webpage, and to, the destination webpage. The function creates a directed edge that points from the source webpage to the destination webpage.
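A minimal sketch of how addEdge could work on the graph member described above. GraphSketch is a hypothetical stand-in for the relevant part of AdjacencyList, and registering the destination as a node of its own is an assumption, made so that pages without outgoing links still appear in the graph.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the graph-related part of AdjacencyList.
struct GraphSketch {
    // Source URL -> list of destination URLs (outgoing links).
    std::map<std::string, std::vector<std::string>> graph;

    void addEdge(const std::string& from, const std::string& to) {
        // Record the directed edge from -> to.
        graph[from].push_back(to);
        // Assumption: make sure `to` exists as a node even if it never links out.
        // emplace leaves an existing entry untouched.
        graph.emplace(to, std::vector<std::string>{});
    }
};
```

With this layout, the out-degree of a page is simply the size of its vector, which is what a rank update needs when it divides a page's rank among its links.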

void mapUrlsToIds()

This function maps URLs to unique IDs and initializes the PageRank vector. It iterates through the graph's nodes and assigns a unique ID to each webpage. The URL-to-ID mappings are stored in idMapping, allowing efficient lookup of a webpage's ID, and the reverse mappings are stored in idToURL. Additionally, the function initializes the pageRank vector so that the initial PageRank is distributed uniformly: each of the |V| webpages starts at 1/|V|.
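The ID assignment and uniform initialization described above might look like the following sketch. It is written as a free function returning a hypothetical IdTables struct rather than as the class's real member function, and it assumes that pages appearing only as link targets should also receive an ID.

```cpp
#include <map>
#include <string>
#include <vector>

using Graph = std::map<std::string, std::vector<std::string>>;

// Hypothetical bundle of the three lookup structures named in the text.
struct IdTables {
    std::map<std::string, int> idMapping;  // URL -> ID
    std::vector<std::string> idToURL;      // ID -> URL
    std::vector<double> pageRank;          // ID -> current rank
};

IdTables mapUrlsToIds(const Graph& graph) {
    IdTables t;
    // Give a URL the next free ID the first time it is seen.
    auto assign = [&t](const std::string& url) {
        const int nextId = static_cast<int>(t.idToURL.size());
        if (t.idMapping.emplace(url, nextId).second) {
            t.idToURL.push_back(url);
        }
    };
    for (const auto& entry : graph) {
        assign(entry.first);
        // Assumption: pages that only appear as link targets also get an ID.
        for (const auto& target : entry.second) assign(target);
    }
    // Uniform start: every page gets 1 / |V|.
    if (!t.idToURL.empty()) {
        t.pageRank.assign(t.idToURL.size(), 1.0 / t.idToURL.size());
    }
    return t;
}
```

Storing ranks in a vector indexed by ID keeps each update step free of string lookups; the URLs are only needed again when printing.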

void computePageRank(int powerIterations)

This function computes the PageRank of webpages after a given number of power iterations. In each iteration, the function recomputes every webpage's PageRank from its incoming links: a page collects a share of the rank of each page that links to it. The powerIterations parameter determines how many iterations are performed. Running more iterations generally brings the values closer to a stable distribution, but a fixed iteration count does not by itself guarantee convergence. In the example later in this document, powerIterations = 2 yields the values obtained from one update of the uniform initial ranks.
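The iteration can be sketched as follows. Two points are assumptions rather than facts taken from the program: each page splits its rank equally among its outgoing links, and powerIterations = 1 returns the initial ranks, so p - 1 updates are run. Both were chosen because they reproduce the sample output in the Example section. The sketch works directly on URLs instead of integer IDs to stay short.

```cpp
#include <map>
#include <string>
#include <vector>

using Graph = std::map<std::string, std::vector<std::string>>;

// Returns URL -> rank. Assumes every page appears as a key in `graph`
// (pages without outgoing links map to an empty vector).
std::map<std::string, double> computePageRank(const Graph& graph, int powerIterations) {
    std::map<std::string, double> rank;
    if (graph.empty()) return rank;

    // Uniform start: 1 / |V| for every page.
    for (const auto& entry : graph) rank[entry.first] = 1.0 / graph.size();

    // Assumption: powerIterations = 1 means "no update", so p - 1 updates run.
    for (int step = 1; step < powerIterations; ++step) {
        std::map<std::string, double> next;
        for (const auto& entry : graph) next[entry.first] = 0.0;

        for (const auto& entry : graph) {
            const std::vector<std::string>& targets = entry.second;
            if (targets.empty()) continue;  // no outgoing links, nothing to pass on
            // Each outgoing link carries an equal share of the source's rank.
            const double share = rank[entry.first] / targets.size();
            for (const auto& to : targets) next[to] += share;
        }
        rank = next;
    }
    return rank;
}
```

Each update costs time proportional to the number of pages plus the number of edges, so the whole computation scales with powerIterations times the size of the graph.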

void printPageRank()

This function prints the PageRank of all webpages in ascending alphabetical order. It first constructs a vector of pairs, where each pair contains a webpage URL and
its corresponding PageRank. This vector is sorted alphabetically based on the URLs. Then, the function iterates through the sorted vector and prints each webpage's
URL and PageRank, formatted to two decimal places.
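The sort-then-print steps above could be written roughly like this. Passing the two vectors and an output stream as parameters is an assumption made to keep the sketch self-contained; the documented member function takes no arguments and presumably writes to standard output.

```cpp
#include <algorithm>
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <string>
#include <utility>
#include <vector>

// idToURL[i] and pageRank[i] describe the same page.
void printPageRank(const std::vector<std::string>& idToURL,
                   const std::vector<double>& pageRank,
                   std::ostream& out) {
    // Pair each URL with its rank.
    std::vector<std::pair<std::string, double>> rows;
    const std::size_t n = std::min(idToURL.size(), pageRank.size());
    for (std::size_t i = 0; i < n; ++i) rows.emplace_back(idToURL[i], pageRank[i]);

    // Sort by URL only, in ascending order.
    std::sort(rows.begin(), rows.end(),
              [](const std::pair<std::string, double>& a,
                 const std::pair<std::string, double>& b) { return a.first < b.first; });

    // Fixed notation with two digits after the decimal point, e.g. 0.30.
    out << std::fixed << std::setprecision(2);
    for (const auto& row : rows) out << row.first << ' ' << row.second << '\n';
}
```

Calling printPageRank(idToURL, pageRank, std::cout) would write to standard output. Note that std::fixed and std::setprecision stay in effect on the stream after the call.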

These member functions collectively allow the AdjacencyList class to manage the graph structure, compute PageRank using the specified algorithm, and display
the final PageRank values in a readable format.

Main Function

The main() function reads the number of input lines (one edge per line) and the number of power iterations. It then reads the edges between webpages and calculates their PageRank using the AdjacencyList class.

Usage
To run the program, input the number of edge lines, the number of power iterations, and the webpage connections when prompted. The program will output the PageRank values for each webpage.

Input Format
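Line 1: Number of edge lines (numLines) and number of power iterations (powerIterations). In the example below, numLines is 7 while the graph contains 5 unique webpages.
Lines 2 to numLines + 1: Pairs of webpage URLs separated by a space. Each pair represents a directed edge from the first URL to the second.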

Output Format
The program outputs the PageRank of webpages after the specified power iterations. Each line contains a webpage URL and its corresponding PageRank value.

Constraints
1 <= powerIterations <= 10,000
1 <= numLines <= 10,000
1 <= Number of unique webpages ( |V| ) <= 10,000

Example

Input

7 2
google.com gmail.com
google.com maps.com
facebook.com ufl.edu
ufl.edu google.com
ufl.edu gmail.com
maps.com facebook.com
gmail.com maps.com

Output

facebook.com 0.20
gmail.com 0.20
google.com 0.10
maps.com 0.30
ufl.edu 0.20

Output Explanation
Each line represents a webpage URL followed by its PageRank value after 2 power iterations. The PageRank values are calculated based on the input connections
between webpages and the specified power iterations. Webpages are listed in ascending alphabetical order.

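PageRank Calculation Details:

The values below are consistent with every page starting at 1/5 = 0.20, followed by one update in which each page splits its current rank equally among its outgoing links.

1. google.com (PageRank: 0.10):

Connections: google.com links to gmail.com and maps.com. It receives a backlink from ufl.edu.
After 2 power iterations, its PageRank is computed as 0.10 (0.20 / 2 from ufl.edu).

2. gmail.com (PageRank: 0.20):

Connections: gmail.com links to maps.com. It receives backlinks from google.com and ufl.edu.
After 2 power iterations, its PageRank is computed as 0.20 (0.20 / 2 from google.com plus 0.20 / 2 from ufl.edu).

3. maps.com (PageRank: 0.30):

Connections: maps.com links to facebook.com. It receives backlinks from google.com and gmail.com.
After 2 power iterations, its PageRank is computed as 0.30 (0.20 / 2 from google.com plus 0.20 / 1 from gmail.com).

4. ufl.edu (PageRank: 0.20):

Connections: ufl.edu links to google.com and gmail.com. It receives a backlink from facebook.com.
After 2 power iterations, its PageRank is computed as 0.20 (0.20 / 1 from facebook.com).

5. facebook.com (PageRank: 0.20):

Connections: facebook.com links to ufl.edu. It receives a backlink from maps.com.
After 2 power iterations, its PageRank is computed as 0.20 (0.20 / 1 from maps.com).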

These PageRank values are the result of applying the PageRank algorithm for 2 iterations on the given web graph. Each webpage's PageRank represents its
importance in the network, calculated based on the links between webpages.
