Copyright:

Attribution Non-Commercial (BY-NC)


Anatomy of a Search Engine

Submitted by:
Pradipta Kumar Rout
0805227040
MCA 4th Sem

CVRCA

11/04/21 12:44 PM

Topics to cover
• Introduction
• History of search engines
• Working of a search engine
• Google Architecture
INTRODUCTION
• A search engine is a software program that searches for sites based on the words that you designate as search terms.

• Search engines look through their own databases of information to find what you are looking for.

• “Search engine” is the popular term for an Information Retrieval (IR) system.

History of Web Search Engines

• 1993: W3Catalog (University of Geneva)
• 1994: World Wide Web Worm (University of Colorado)
• 1995: AltaVista
• 1996: Yahoo
• 1998: Google
• 2004: MSN (now Bing)

Working of a search engine

1. Web crawling
2. Indexing
3. Searching

Web crawling
1. What is a crawler and crawling.
2. How it works
   • Starts from heavily used servers and very popular pages.
   • Records the words within each page and where the words were found.
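The crawling steps above can be sketched roughly as follows. This is a toy illustration, not any real crawler: the in-memory WEB dictionary, the example URLs, and the crawl function are all assumptions made for the example, and a real crawler would fetch pages over HTTP.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would download each page instead of reading a dict.
WEB = {
    "a.example/": ("search engines crawl the web", ["b.example/", "c.example/"]),
    "b.example/": ("crawlers follow links", ["a.example/"]),
    "c.example/": ("the web is large", []),
}

def crawl(seeds):
    """Breadth-first crawl from seed URLs, recording each word and its position."""
    queue = deque(seeds)
    seen = set(seeds)
    found = {}  # url -> list of (word, position)
    while queue:
        url = queue.popleft()
        if url not in WEB:
            continue  # link points outside this toy web
        text, links = WEB[url]
        found[url] = [(word, pos) for pos, word in enumerate(text.split())]
        for link in links:
            if link not in seen:  # visit every page at most once
                seen.add(link)
                queue.append(link)
    return found

pages = crawl(["a.example/"])
```

The seed list plays the role of the "very popular pages": everything else is discovered by following links outward from it.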

Indexing

1. What is indexing.

2. How it is done.
   • Weights.
   • Hashing.
   • docID.
   • wordID.
   • The hash table contains the hashed number along with a pointer to the actual data.
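A minimal sketch of how these pieces might fit together, assuming a simple weighting scheme (title words count 3, body words count 1) that is invented here for illustration. The names lexicon, doc_ids, postings, get_id, and index_page are hypothetical; Python dictionaries serve as the hash tables.

```python
lexicon = {}   # word -> wordID (a dict is a hash table)
doc_ids = {}   # URL -> docID
postings = {}  # wordID -> list of (docID, weight)

def get_id(table, key):
    """Return the ID for key, assigning the next integer on first sight."""
    return table.setdefault(key, len(table))

def index_page(url, title, body):
    """Weight every word of one page and file it under its wordID."""
    doc_id = get_id(doc_ids, url)
    weights = {}
    # Assumed weighting: a title word counts 3, a body word counts 1.
    for word in title.lower().split():
        weights[word] = weights.get(word, 0) + 3
    for word in body.lower().split():
        weights[word] = weights.get(word, 0) + 1
    for word, weight in weights.items():
        postings.setdefault(get_id(lexicon, word), []).append((doc_id, weight))

index_page("a.example/", "Search engines", "search engines crawl the web")
index_page("b.example/", "Crawlers", "crawlers follow links on the web")
```

Looking up a word is then two hash-table reads: the lexicon gives the wordID, and the postings table gives the documents and weights stored under it.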

Searching
1. How it works.

Working of a search Engine


Google Architecture
1. URL server
2. Crawler
3. Store Server
4. Repository
5. Indexer
6. Barrels
7. Anchors
8. URL Resolver
9. Links
10. Doc Index
11. Page Rank
12. Sorter
13. Lexicon

• URL server: sends lists of URLs to be fetched to the crawlers.

• Storeserver: the web pages that are fetched are sent to the storeserver, which compresses them and stores them in a repository.

• Indexer: reads the repository, uncompresses the documents, and parses them. Hits record the word, position in document, an approximation of font size, and capitalization. The anchors file stores important information about each link.

• Barrels: store the index data sorted by docID (the forward index).

• URL Resolver: reads the anchors file and converts relative URLs into absolute URLs.

• Sorter: takes the barrels, which are sorted by docID, and resorts them by wordID to generate the inverted index.

• Repository: contains the full HTML of every web page in compressed form. To find the docID of a URL, the URL's checksum is computed and a binary search is performed on the checksums file.
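Two of the steps in this list, the URL resolver and the sorter, lend themselves to a short sketch. The example URLs, the tiny forward_index, and the invert function are assumptions made for illustration; only urljoin is a real standard-library call.

```python
from urllib.parse import urljoin

# URL resolver step: base page URL + relative link -> absolute URL.
absolute = urljoin("http://example.com/docs/index.html", "../about.html")

# Forward index (barrels): docID -> list of (wordID, position) hits.
# The IDs below are made up for the example.
forward_index = {
    0: [(7, 0), (3, 1)],
    1: [(3, 0), (9, 1)],
}

def invert(forward):
    """Sorter step: regroup hits by wordID -> list of (docID, position)."""
    inverted = {}
    for doc_id in sorted(forward):
        for word_id, position in forward[doc_id]:
            inverted.setdefault(word_id, []).append((doc_id, position))
    return dict(sorted(inverted.items()))

inverted_index = invert(forward_index)
```

The forward index answers "which words are in this document?", while the inverted index answers "which documents contain this word?", which is the question a search query asks.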

Repository
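As a rough sketch of the repository idea (pages stored compressed, docIDs found by binary search over sorted URL checksums), the following uses zlib for compression and CRC-32 as a stand-in checksum. The URLs, the list-based storage, and the functions doc_id_for and fetch are assumptions for the example, not the real on-disk format.

```python
import bisect
import zlib

urls = ["http://a.example/", "http://b.example/", "http://c.example/"]

# Repository: compressed HTML per page; the docID is the list position here.
repository = [zlib.compress(f"<html>{u}</html>".encode()) for u in urls]

# Checksums file: (checksum, docID) pairs sorted by checksum.
# CRC-32 is a stand-in; the checksum actually used is not given in the slides.
checksums = sorted((zlib.crc32(u.encode()), doc_id) for doc_id, u in enumerate(urls))
keys = [checksum for checksum, _ in checksums]

def doc_id_for(url):
    """Binary-search the sorted checksums for the URL's docID, or return None."""
    target = zlib.crc32(url.encode())
    i = bisect.bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return checksums[i][1]
    return None

def fetch(url):
    """Look up the docID, then decompress the stored page."""
    doc_id = doc_id_for(url)
    return None if doc_id is None else zlib.decompress(repository[doc_id]).decode()
```

Keeping the checksums sorted is what makes the binary search possible: each lookup takes on the order of log2(n) comparisons instead of a scan over every URL.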

Indexer

Page Rank
1. Every page starts with a page rank of 1/(number of pages) = ¼ = 0.25.
2. Each page gets its page rank updated based on incoming links.

In this case, the page rank of A, PR(A), is:
PR(A) = 0.25 + 0.25 + 0.25 = 0.75

[Figure: pages A, B, C, and D, each labeled with page rank 0.25.]

Page Rank
• Links are weighted based on number of outgoing links

The page rank is divided by the number of outgoing links a site has (why?)
e.g., D’s links are worth 0.25/3 because it has 3 outgoing links.

Now:
PR(A) = 0.25/2 + 0.25/1 + 0.25/3

[Figure: pages A, B, C, and D, each labeled with page rank 0.25; links are labeled 0.25/2 (two links), 0.25/1 (one link), and 0.25/3 (three links).]
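The two PR(A) calculations can be checked with a short sketch. The link structure below (B links to A and C, C links to A, D links to A, B, and C) is an assumption chosen to match the out-link counts in the formula; the update function is hypothetical and performs a single update step with no damping factor.

```python
# Assumed link structure: page -> pages it links to.
links = {
    "A": [],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}

def update(ranks, links, weighted=True):
    """One update step: a page's new rank is the sum of what its in-links pass on."""
    new_ranks = {page: 0.0 for page in ranks}
    for page, targets in links.items():
        for target in targets:
            # Weighted: split the rank across out-links. Unweighted: pass it whole.
            share = ranks[page] / len(targets) if weighted else ranks[page]
            new_ranks[target] += share
    return new_ranks

initial = {page: 1 / len(links) for page in links}   # 0.25 each
naive = update(initial, links, weighted=False)       # first slide's rule
weighted = update(initial, links, weighted=True)     # second slide's rule
```

Under the unweighted rule A receives 0.75. Under the weighted rule it receives 0.25/2 + 0.25/1 + 0.25/3, about 0.458, so a page that links out to many others contributes less through each individual link.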
