Search Engines
Presented by: Rasik Mevada, Vishal Dabhi, Vimal Nair, Ravi Mathai
INTRODUCTION
HISTORY
TYPES OF SEARCH ENGINE
CONCLUSION
Introduction
A search engine is a software program that searches for sites based on the words you designate as search terms. Search engines look through their own databases of information to find what you are looking for. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.
Introduction (Contd.)
A web search engine is designed to search for information on the World Wide Web. The search results are generally presented in a list of results, often referred to as search engine results pages (SERPs). The results may consist of web pages, images, and other types of files. 'Search engine' is the popular term for an Information Retrieval (IR) system.
History
In 1990, the first tool for searching on the Internet was Archie. The program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of file names. However, Archie did not index the contents of these sites, since the amount of data was so limited it could be readily searched manually.
The rise of Gopher led to two new search programs, Veronica and Jughead. Like Archie, they searched the file names and titles stored in Gopher index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided a keyword search of most Gopher menu titles in the entire Gopher listings. Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) was a tool for obtaining menu information from specific Gopher servers.
In June 1993, Matthew Gray produced what was probably the first web robot, the Perl-based World Wide Web Wanderer, and used it to generate an index called 'Wandex'. The web's second search engine, Aliweb, appeared in November 1993. Aliweb did not use a web robot, but instead depended on being notified by website administrators of the existence at each site of an index file in a particular format. One of the first "full text" crawler-based search engines was WebCrawler, which came out in 1994.
Types of Search Engines
1. Crawler-Based Search Engines
2. Human-Powered Directories
3. Hybrid Search Engines
1. Crawler-Based Search Engines
A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner or in an orderly fashion. Other terms for Web crawlers are ants, automatic indexers, bots, Web spiders, Web robots, or (especially in the FOAF community) Web scutters.
Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches.
A crawler-based engine has three major elements. 1) The crawler: Also called the spider, it visits a web page, reads it, and then follows links to other pages within the site. The spider returns to the site on a regular basis, such as every month or every fifteen days, to look for changes.
2) The index: Everything the spider finds goes into the second part of the search engine, the index. The index will contain a copy of every web page that the spider finds. If a web page changes, then the index is updated with the new information. 3) The search engine software:
This is the software program that accepts the user-entered query, interprets it, and sifts through the millions of pages recorded in the index to find matches, ranking them in order of what it believes is most relevant and presenting them in a customizable manner to the user.
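To make the index and query steps concrete, here is a minimal, hypothetical sketch in Python (not the implementation of any particular search engine): it builds an inverted index that maps each word to the pages containing it, and ranks results by how many query terms a page matches. The page names and contents are invented purely for illustration.

```python
from collections import defaultdict

# Invented mini-corpus standing in for pages fetched by the spider.
pages = {
    "page1.html": "search engines index web pages",
    "page2.html": "a web crawler downloads pages for the index",
    "page3.html": "directories are edited by human editors",
}

# The index: each word maps to the set of pages that contain it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

def search(query):
    """Return pages ranked by how many query terms they contain."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url in index.get(term, ()):
            scores[url] += 1
    return sorted(scores, key=scores.get, reverse=True)

print(search("crawler index"))  # -> ['page2.html', 'page1.html']
```

A real engine would store far richer data per page (term positions, link structure, freshness) and use a more sophisticated relevance score, but the update step is the same idea: when a page changes, its entries in the index are rebuilt.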
In general, a crawler starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
The large volume implies that the crawler can only download a limited number of Web pages within a given time, so it needs to prioritize its downloads. The high rate of change implies that by the time the crawler revisits a page, it might already have been updated or even deleted.
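The seed-and-frontier loop described above can be sketched roughly as follows, in Python using only the standard library. The seed URL and the page cap are placeholders; capping the number of downloads stands in for the prioritization a real crawler needs, and a production crawler would also honour robots.txt, rate-limit its requests, and deduplicate content.

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seeds, max_pages=20):
    """Visit URLs breadth-first from the seeds; return url -> HTML."""
    frontier = deque(seeds)   # the crawl frontier
    visited = {}              # pages downloaded so far
    while frontier and len(visited) < max_pages:  # cap downloads
        url = frontier.popleft()
        if url in visited:
            continue
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue          # skip pages that fail to download
        visited[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            frontier.append(urljoin(url, link))  # grow the frontier
    return visited

# Placeholder seed; any reachable site would do.
pages = crawl(["https://example.com/"])
print(len(pages), "pages downloaded")
```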
Crawlers can also be used to extract specific kinds of information from Web pages, such as harvesting e-mail addresses (usually for sending spam).
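As an illustration of extracting specific information from downloaded pages, the sketch below pulls e-mail-like strings out of HTML with a simple regular expression; the pattern is deliberately simplistic and is shown only to explain the idea.

```python
import re

# Simplistic pattern; real address syntax (RFC 5322) is far more permissive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest_emails(html):
    """Return the unique e-mail-like strings found in a page's text."""
    return sorted(set(EMAIL_RE.findall(html)))

print(harvest_emails("Contact admin@example.com or sales@example.org."))
# -> ['admin@example.com', 'sales@example.org']
```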
2. Human-Powered Directories
A human-edited directory is created and maintained by editors who add links based on the policies particular to that directory. Some directories may prevent search engines from rating a displayed link by using redirects, nofollow attributes, or other techniques. Many human-edited directories, including the Open Directory Project, Salehoo and the World Wide Web Virtual Library, are edited by volunteers, who are often experts in particular categories. These directories are sometimes criticized for long delays in approving submissions, and for rigid organizational structures and disputes among volunteer editors.
Some directories have adopted wiki technology to allow broader community participation in editing the directory (at the risk of introducing lower-quality, less objective entries).
Another direction taken by some web directories is the paid-for-inclusion model. This method enables the directory to offer timely inclusion for submissions, and generally results in fewer listings under the paid model. Paid directories often offer extra listing options to enhance a listing; these options typically have an additional fee.
Directory submission is considered a common SEO (search engine optimization) technique to get back-links for the submitted web site. One distinctive feature of directory submission is that it cannot be fully automated, unlike search engine submission.
3. Hybrid Search Engines
In the web's early days, a search engine typically presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favour one type of listings over the other.
For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it also presents crawler-based results (as provided by Inktomi), especially for more obscure queries.