Anatomy of A Search Engine
Submitted by:
Pradipta Kumar Rout
0805227040
MCA 4th Sem
CVRCA
11/04/21 12:44 PM
Topics to cover
• Introduction
• Google Architecture
INTRODUCTION
• A search engine is a software program that searches for
sites based on the words that you designate as search
terms.
1. Web crawling
2. Indexing
3. Searching
Web crawling
1. What is a crawler and crawling.
2. How it works
• The crawler starts with heavily used servers and very popular pages.
• For each page it records the words within the page and where the words were found.
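The crawling steps above can be sketched roughly as follows. This is a minimal illustration, not how any real crawler is built: the in-memory `TOY_WEB` dictionary, its URLs, and the `crawl` function are all hypothetical stand-ins for fetching pages over the network.

```python
import re
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP instead.
TOY_WEB = {
    "a.example/": ("search engines crawl the web", ["b.example/"]),
    "b.example/": ("the web is large", ["a.example/", "c.example/"]),
    "c.example/": ("crawl politely", []),
}


def crawl(seeds, web):
    """Breadth-first crawl that records each word and where it was found."""
    seen = set(seeds)
    queue = deque(seeds)
    found = {}  # URL -> list of (word, position within the page)
    while queue:
        url = queue.popleft()
        text, links = web[url]
        words = re.findall(r"\w+", text.lower())
        found[url] = [(word, pos) for pos, word in enumerate(words)]
        # Follow links to pages that have not been visited yet.
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return found


# Start from a seed page, standing in for a "very popular page".
pages = crawl(["a.example/"], TOY_WEB)
```

Here `pages["c.example/"]` holds the words of that page together with their positions, which is the information the indexing step needs.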
Indexing
1. What is indexing.
2. How it is done.
• Weights.
• Hashing.
• docID.
• wordID.
• The hash table contains the hashed number along with a pointer to the actual data.
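The hash table described above can be sketched as follows. This is only an illustration under stated assumptions: the `HashIndex` class, its bucket count, and the use of CRC-32 as the hash function are choices made for this example, not details from the slides.

```python
import zlib


class HashIndex:
    """Toy index: each bucket stores (hashed number, pointer into self.data)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]
        self.data = []  # the "actual data": (word, list of docIDs)

    def _hash(self, word):
        # CRC-32 is an assumption; any stable hash function would do.
        return zlib.crc32(word.encode("utf-8"))

    def add(self, word, doc_id):
        h = self._hash(word)
        bucket = self.buckets[h % len(self.buckets)]
        for stored_hash, pointer in bucket:
            if stored_hash == h and self.data[pointer][0] == word:
                if doc_id not in self.data[pointer][1]:
                    self.data[pointer][1].append(doc_id)
                return
        # New word: append the data and keep only a pointer in the table.
        self.data.append((word, [doc_id]))
        bucket.append((h, len(self.data) - 1))

    def lookup(self, word):
        h = self._hash(word)
        for stored_hash, pointer in self.buckets[h % len(self.buckets)]:
            if stored_hash == h and self.data[pointer][0] == word:
                return self.data[pointer][1]
        return []


index = HashIndex()
index.add("search", 1)
index.add("engine", 1)
index.add("search", 2)
```

A lookup hashes the word, scans one small bucket, and follows the pointer to the stored list of docIDs, so `index.lookup("search")` finds documents 1 and 2 without scanning every word.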
Searching
1. How it works.
Google Architecture
1. URL server
2. Crawler
3. Store Server
4. Repository
5. Indexer
6. Barrels
7. Anchors
8. URL Resolver
9. Links
10. Doc Index
11. Page Rank
12. Sorter
13. Lexicon
• Storeserver: The web pages that are fetched are then sent to the
storeserver. The storeserver then compresses and stores the web pages into a
repository.
• URL Resolver: The URL resolver reads the anchors file and converts relative URLs
into absolute URLs.
• Sorter: The sorter takes the barrels, which are sorted by docID, and re-sorts them
by wordID to generate the inverted index.
• Repository: The repository contains the full HTML of every web page in
compressed form. To find the docID of a URL, the URL's checksum is computed and a
binary search is performed on the checksums file.
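The components above can be sketched in a few small functions. This is a rough illustration under assumptions: the function names, the use of zlib for compression, and CRC-32 as the checksum are choices made for this example and are not taken from the slides.

```python
import bisect
import zlib
from urllib.parse import urljoin


def store(repository, doc_id, html):
    """Storeserver: compress a fetched page and keep it in the repository."""
    repository[doc_id] = zlib.compress(html.encode("utf-8"))


def resolve(base_url, href):
    """URL resolver: turn a relative URL into an absolute one."""
    return urljoin(base_url, href)


def sort_barrels(forward_hits):
    """Sorter: re-sort (docID, wordID) pairs by wordID.

    Returns an inverted index mapping wordID -> list of docIDs.
    """
    inverted = {}
    for word_id, doc_id in sorted((w, d) for d, w in forward_hits):
        inverted.setdefault(word_id, []).append(doc_id)
    return inverted


def build_checksum_file(url_to_doc_id):
    """Build a list of (checksum, docID) pairs sorted by checksum."""
    return sorted(
        (zlib.crc32(url.encode("utf-8")), doc_id)
        for url, doc_id in url_to_doc_id.items()
    )


def doc_id_for(url, checksum_file):
    """Binary-search the checksum file for the docID of a URL."""
    target = zlib.crc32(url.encode("utf-8"))
    i = bisect.bisect_left(checksum_file, (target,))
    if i < len(checksum_file) and checksum_file[i][0] == target:
        return checksum_file[i][1]
    return None


repository = {}
store(repository, 1, "<html>hi</html>")
absolute = resolve("http://example.com/a/b.html", "../c.html")
inverted = sort_barrels([(1, 20), (1, 10), (2, 10)])
checksums = build_checksum_file({"http://example.com/": 1, absolute: 2})
```

In this sketch the barrel entries hold one (docID, wordID) pair each; a real barrel would also carry position and weight information for every hit.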
Repository
Indexer
Page Rank
1. Every page starts with a page rank of 1/(number of pages) = 1/4.
2. Each page gets its page rank updated based on incoming links.
[Figure: four pages A, B, C and D, each with an initial page rank of 0.25]
Page Rank
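• Links are weighted based on the number of outgoing links of the page they come from.
• Now: PR(A) = 0.25/2 + 0.25/1 + 0.25/3
[Figure: pages A, B, C and D, each with a page rank of 0.25; the links are labeled with weights such as 0.25/1 and 0.25/3]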
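The update for PR(A) can be checked with a short sketch. The link structure below is an assumption chosen to match the weights on the slide (B has two outgoing links, C has one, D has three), and the function performs one simplified update without the damping factor used in the full algorithm.

```python
def pagerank_step(ranks, links):
    """One simplified update: each page splits its rank evenly
    over its outgoing links (no damping factor)."""
    new_ranks = {page: 0.0 for page in ranks}
    for page, targets in links.items():
        for target in targets:
            new_ranks[target] += ranks[page] / len(targets)
    return new_ranks


# Assumed link structure, chosen to reproduce PR(A) = 0.25/2 + 0.25/1 + 0.25/3.
links = {
    "A": [],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}

# Every page starts with 1/(number of pages) = 0.25.
ranks = {page: 1 / len(links) for page in links}
updated = pagerank_step(ranks, links)
```

After one step, `updated["A"]` is 0.125 + 0.25 + 0.0833..., which is about 0.458.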