Types of Search Engines and How They Work
Internet Searching
Search Engine
History
Examples
Internet
An interconnected network of thousands of networks and millions of computers linking businesses, educational institutions, government agencies, and individuals together.
Searching
A large amount of information makes a site huge and complex, and makes navigation difficult.
Search is the user's lifeline for mastering complex websites.
A search feature is essential for users who revisit a site looking for specific information.
Types of Searching
A search can be of various types:
Internet search: Search engines like Yahoo and Infoseek crawl the web, gathering web pages or information on web pages, index them, and retrieve them when the specific term is found.
Database search: Databases store their information neatly organized into fields, and a search interface is provided for querying them.
SEARCH ENGINE
A tool designed to search for information on the World Wide Web. The information may consist of web pages, images, and other types of files.
A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits.
Learning about search engines and searching helps us explore, to a greater extent, the wonderful world that the Internet creates. Search engines help to minimize the time required to find information and the amount of information that must be consulted. Searching is one of the most common actions on the Internet. Search engines, as instruments of searching, are special sites on the Web designed to help people find information stored on other sites.
This includes external engines such as Google, Yahoo, MSN, AOL, and Live.
The first tool for searching was Archie. Then the rise of Gopher (1991) led to two new search programs, Veronica and Jughead. Until 1993, no search engine existed for the web. The web's first primitive search engine was W3Catalog (1993).
History Cont.
The first all-text crawler-based search engine was WebCrawler (1994).
Google adopted the idea of selling search terms from a small company named goto.com, which began doing so in 1998.
Search engines were among the brightest stars in the Internet investing frenzy.
Google rose to prominence (2000).
Microsoft's first search engine, MSN Search, used search results from Inktomi.
History Cont.
Microsoft rebranded its search engine as Bing, which launched on June 1, 2009.
a chrome application.
Crawler-Based Search Engines
Crawler-based search engines create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found.
If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy, and other elements all play a role.
Cont.
Crawler-based search engines are good when you have a specific search topic in mind, and they can be very efficient at finding relevant information in this situation.
Examples: Google, AllTheWeb, and AltaVista.
Human-Powered Directories
A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description of your entire site to the directory, or editors write one for sites they review. A search looks for matches only in the submitted descriptions. Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.
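To illustrate why editing your pages does not change a directory listing, here is a minimal Python sketch, assuming a directory is simply a mapping from each site's URL to its submitted description. The URLs, descriptions, and the function name directory_search are hypothetical. The search reads only the descriptions, never the pages themselves.

```python
# Hypothetical directory: site URL -> the short description submitted for it.
DIRECTORY = {
    "https://example.org/garden": "Tips on growing vegetables and herbs at home",
    "https://example.com/recipes": "Vegetarian recipes and cooking guides",
}


def directory_search(query: str, listings: dict[str, str]) -> list[str]:
    """Return the URLs whose description contains every word of the query.

    Only the submitted descriptions are searched; the sites' own pages are
    never read, so editing a page cannot change its directory listing.
    """
    words = query.lower().split()
    hits = []
    for url, description in listings.items():
        description_words = set(description.lower().split())
        if all(word in description_words for word in words):
            hits.append(url)
    return hits


print(directory_search("vegetables herbs", DIRECTORY))
```

Exact word matching is deliberately simple here; "Vegetarian" does not match "vegetables", which is one reason directory descriptions need to be written carefully.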
Cont.
Human-powered directories are good when you are interested in a general topic of search. In this situation, a directory can guide you, help you narrow your search, and get refined results. Therefore, search results found in a human-powered directory are usually more relevant to the search topic and more accurate. However, this is not an efficient way to find information when a specific search topic is in mind.
Examples: Yahoo Directory, Open Directory, and LookSmart.
beneficial if you are on the go and are using a service such as ChaCha or KGB, which lets you ask your question and get the answer via text message.
Sometimes standard search engines don't know what you're talking
Unanswered questions: While some sites may take days, other sites may
Human Error: We all know and trust Google to deliver our answers, but we have no idea who is answering our questions on human-powered sites, or what their qualifications are. Would you trust just anyone? I certainly don't.
categorize, and sub-subcategorize your questions, which takes the simplicity out of these human-powered search engines.
Hybrid search engines present both crawler-based results and directory results. More and more search engines these days are moving to a hybrid-based model.
It is extremely common for both types of results to be presented.
Usually, a hybrid search engine will favor one type of listing over another.
For example, MSN Search is more likely to present human-powered
integrated, duplicates can be eliminated and additional features such as clustering by subjects within the search results can be implemented by meta-search engines.
one place and sparing the need to use and learn several separate search engines.
But since meta-search engines do not allow many search variables to be entered, their best use is to find hits on obscure items or to see whether something can be found on the Internet at all.
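The merging step of a meta-search engine can be sketched as follows. This is a minimal example, assuming each underlying engine returns a ranked list of URLs; the engine names and URLs are hypothetical, and the scoring rule (reciprocal-rank fusion, where a URL earns 1/rank from every engine that returns it) is just one common way to combine lists, not necessarily what any particular meta-search engine uses. Duplicates collapse into a single entry automatically.

```python
def metasearch(result_lists: dict[str, list[str]]) -> list[str]:
    """Merge ranked URL lists from several engines into one list.

    Every URL earns 1/rank from each engine that returned it
    (reciprocal-rank fusion). Because scores are keyed by URL, a page
    returned by several engines appears only once in the merged list.
    """
    scores: dict[str, float] = {}
    for urls in result_lists.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # sorted() is stable, so ties keep the order in which URLs were first seen.
    return sorted(scores, key=lambda url: scores[url], reverse=True)


# Hypothetical results from two engines for the same query.
merged = metasearch({
    "engine_a": ["a.example", "b.example", "c.example"],
    "engine_b": ["b.example", "d.example"],
})
print(merged)  # ['b.example', 'a.example', 'd.example', 'c.example']
```

"b.example" wins because both engines returned it, which is the main benefit of integrating several result lists.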
How it works
1. Index ahead of time
   Find files or records
   Open each one and read it
   Store each word in a searchable index
2. Search the index
   Match the query terms with words in the index
   Sort documents by relevance
3. Display results
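The three steps above can be sketched in a few lines of Python. This is a toy version under stated assumptions: the documents live in memory, relevance is simply how often the query words occur, and the names build_index, search, and display are illustrative rather than taken from any real engine.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words (letters and digits)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Step 1: read every document once and store word -> {doc_id: count}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][doc_id] = count
    return index


def search(index: dict[str, dict[str, int]], query: str) -> list[str]:
    """Step 2: match query words against the index and sort by relevance.

    Relevance here is just the total number of times the query words occur.
    """
    scores: Counter = Counter()
    for word in tokenize(query):
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]


def display(results: list[str], docs: dict[str, str]) -> None:
    """Step 3: print each hit with the start of its text as a description."""
    for position, doc_id in enumerate(results, start=1):
        print(f"{position}. {doc_id}: {docs[doc_id][:40]}")


# Hypothetical documents.
docs = {
    "page1": "Search engines crawl the web and index pages.",
    "page2": "A web directory is edited by humans.",
    "page3": "Spiders crawl the web; the web is huge.",
}
index = build_index(docs)
display(search(index, "crawl web"), docs)
```

The expensive work (reading every document) happens once in step 1, so each query in step 2 only touches the index entries for its own words.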
To find information on the many pages that exist on the Web, a search engine employs special software robots, called spiders, to build lists of the words found on Web sites.
A spider is a program that automatically fetches Web pages. Spiders are used to feed pages to search engines. It's called a spider because it crawls over the Web. Another term for these programs is web crawler.
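A spider's basic loop (fetch a page, record its words, follow its links) can be sketched as a breadth-first crawl. To keep the example self-contained, the Web is replaced by a small hypothetical dictionary, and the hypothetical fetch function only pretends to download pages; a real spider would issue HTTP requests instead.

```python
from collections import deque

# Hypothetical stand-in for the Web: URL -> (page text, outgoing links).
WEB = {
    "a.example": ("home page", ["b.example", "c.example"]),
    "b.example": ("about page", ["a.example"]),
    "c.example": ("news page", ["d.example"]),
    "d.example": ("archive page", []),
}


def fetch(url: str) -> tuple[str, list[str]]:
    """Pretend to download a page; a real spider would make an HTTP request."""
    return WEB.get(url, ("", []))


def crawl(seed: str, max_pages: int = 100) -> dict[str, list[str]]:
    """Breadth-first crawl from `seed`, returning URL -> list of words found."""
    seen = {seed}
    queue = deque([seed])
    word_lists: dict[str, list[str]] = {}
    while queue and len(word_lists) < max_pages:
        url = queue.popleft()
        text, links = fetch(url)
        word_lists[url] = text.split()
        for link in links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return word_lists


print(crawl("a.example"))
```

The `seen` set matters because pages link back to each other ("b.example" links to "a.example"); without it the spider would loop forever.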
Cont.
Spiders store the word lists in the engine's database.
The engine's indexing software builds an index of words. Information is matched against the query input and retrieved (processing algorithm).
File name / URL / record ID
Title or equivalent
Size, date, MIME type
Product name, picture ID
Category, topic, or subject
Other attributes, for relevance ranking and display
Cont.
Once the spiders have completed the task of finding information on Web pages, the search engine must store the information in a way that makes it useful.
In the simplest case, a search engine could just store the word and the URL where it was found. In reality, this would make for an engine of limited use, since there would be no way of telling whether the word was used in an important or a trivial way on the page.
Cont.
A ranked list that tries to present the most useful pages at the top of
values assigned to words as they appear near the top of the document, in subheadings, in links, in the meta tags, or in the title of the page.
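One way to record how a word is used, rather than just where it appears, is to store a weight with each (word, page) pair. The sketch below assumes made-up weights per location (title, heading, meta tags, body); search engines do not publish their actual values, so these numbers and the function name index_page are purely illustrative.

```python
from collections import defaultdict

# Assumed weights per location; real search engines do not publish theirs.
FIELD_WEIGHTS = {"title": 5.0, "heading": 3.0, "meta": 2.0, "body": 1.0}


def index_page(url: str, fields: dict[str, str],
               index: dict[str, dict[str, float]]) -> None:
    """Add one page to a word -> {url: weight} index.

    `fields` maps a location on the page (title, heading, meta, body) to
    its text. Each occurrence of a word adds the weight of its location,
    so a word in the title counts more than the same word in the body.
    """
    for field, text in fields.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0.0) + weight


index: dict[str, dict[str, float]] = defaultdict(dict)
index_page("p1.example", {"title": "jaguar cars", "body": "fast cars"}, index)
index_page("p2.example", {"title": "big cats", "body": "the jaguar is a big cat"}, index)
print(index["jaguar"])  # {'p1.example': 5.0, 'p2.example': 1.0}
```

Both pages mention "jaguar", but only p1 uses it in the title, so a ranking based on these weights would list p1 first.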
An index has a single purpose: it allows information to be found as quickly as possible. There are quite a few ways for an index to be built, but one of the most effective is to build a hash table. In hashing, a formula is applied to attach a numerical value to each word.
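Python's built-in dict is already a hash table, but the hashing step described above can be made visible with a small hand-written version. This sketch assumes a simple polynomial string hash (multiply by 31 and add each character code) and resolves collisions by chaining; the names word_hash and WordIndex are hypothetical.

```python
def word_hash(word: str, num_buckets: int) -> int:
    """A simple polynomial hash: the 'formula' that turns a word into a number."""
    value = 0
    for ch in word:
        value = (value * 31 + ord(ch)) % num_buckets
    return value


class WordIndex:
    """A tiny hash table from words to sets of URLs, resolving collisions by chaining."""

    def __init__(self, num_buckets: int = 64) -> None:
        # Each bucket holds a list of (word, set_of_urls) pairs.
        self.buckets: list[list[tuple[str, set[str]]]] = [[] for _ in range(num_buckets)]

    def _bucket(self, word: str) -> list[tuple[str, set[str]]]:
        return self.buckets[word_hash(word, len(self.buckets))]

    def add(self, word: str, url: str) -> None:
        bucket = self._bucket(word)
        for stored_word, urls in bucket:
            if stored_word == word:
                urls.add(url)
                return
        bucket.append((word, {url}))

    def lookup(self, word: str) -> set[str]:
        # Only one bucket is scanned, no matter how many words are indexed.
        for stored_word, urls in self._bucket(word):
            if stored_word == word:
                return set(urls)  # copy, so callers cannot modify the index
        return set()


idx = WordIndex()
idx.add("search", "a.example")
idx.add("search", "b.example")
idx.add("engine", "a.example")
print(sorted(idx.lookup("search")))  # ['a.example', 'b.example']
```

The speed comes from jumping straight to one bucket; a real engine would also grow the table as it fills so that buckets stay short.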
A more complex query requires the use of Boolean operators, which allow you to refine and extend the terms of the search.
Boolean operators: AND, OR, NOT, FOLLOWED BY, NEAR, etc.
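A minimal sketch of how these operators can be evaluated, assuming a positional index that records where each word occurs in each document. AND, OR and NOT become set operations on document IDs, while NEAR and FOLLOWED BY compare word positions. The function names, default distances, and sample documents are assumptions for illustration; real engines use their own query syntax.

```python
from collections import defaultdict


def build_positional_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map word -> {doc_id: [positions where the word occurs]}."""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return index


def docs_with(index, word: str) -> set[str]:
    """IDs of documents containing `word` (without adding it to the index)."""
    return set(index[word]) if word in index else set()


def and_query(index, a: str, b: str) -> set[str]:
    return docs_with(index, a) & docs_with(index, b)


def or_query(index, a: str, b: str) -> set[str]:
    return docs_with(index, a) | docs_with(index, b)


def not_query(index, a: str, b: str) -> set[str]:
    """a AND NOT b: documents with `a` but without `b`."""
    return docs_with(index, a) - docs_with(index, b)


def near_query(index, a: str, b: str, k: int = 3) -> set[str]:
    """Documents where `a` and `b` occur within k words of each other."""
    return {
        doc for doc in and_query(index, a, b)
        if any(abs(i - j) <= k for i in index[a][doc] for j in index[b][doc])
    }


def followed_by_query(index, a: str, b: str, k: int = 1) -> set[str]:
    """Documents where `b` occurs 1..k words after `a`."""
    return {
        doc for doc in and_query(index, a, b)
        if any(0 < j - i <= k for i in index[a][doc] for j in index[b][doc])
    }


docs = {
    "d1": "apple pie recipe",
    "d2": "apple computer store",
    "d3": "pie chart in a computer report",
}
index = build_positional_index(docs)
print(sorted(not_query(index, "apple", "pie")))  # ['d2']
```

Storing positions costs extra space, but without them NEAR and FOLLOWED BY could not be answered from the index alone.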
Cont.
Most search engines use caching to dramatically reduce the time cost of searching for common words like "Amazon". If the site receives a query whose result is stored in the cache, it returns the result from the cache without posting a query request to the main database.
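A query-result cache can be sketched as follows, assuming a least-recently-used (LRU) eviction policy and a hypothetical backend function, slow_database_search, standing in for the main database. Queries are normalized (lowercased, extra spaces removed) so that "Amazon" and "amazon" share one cache entry. The class name QueryCache is also hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class QueryCache:
    """Cache of query -> results in front of a slower search backend.

    Uses least-recently-used (LRU) eviction: when the cache is full, the
    query that has gone longest without being asked is dropped.
    """

    def __init__(self, backend: Callable[[str], list[str]], capacity: int = 1000) -> None:
        self.backend = backend
        self.capacity = capacity
        self._entries: OrderedDict = OrderedDict()
        self.backend_calls = 0

    def search(self, query: str) -> list[str]:
        key = " ".join(query.lower().split())  # "Amazon" and "amazon " share an entry
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.backend_calls += 1
        results = self.backend(key)
        self._entries[key] = results
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return results


def slow_database_search(query: str) -> list[str]:
    """Hypothetical stand-in for the main database."""
    return [f"result for {query}"]


cache = QueryCache(slow_database_search, capacity=2)
cache.search("Amazon")
cache.search("amazon")       # served from the cache
print(cache.backend_calls)   # 1
```

Popular queries keep getting moved to the end of the order, so they tend to stay cached while rare ones are evicted.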
3. Display results
After the search engine receives the result from the main database or cache, the site has to display the result to the user. The listing of results is usually quite simple: just a list of the web pages that were hit, with a description of each site. However, the order of the list is important, yet difficult to judge by pure computation.
Page rank
Once the search engine has found web pages for the given query, in what order should the links be presented?
Some pages are more important than others, so if two pages match a query, the link to the more important page should come first. Ordering is based on the page rank, which primarily checks whether a page is an authoritative page, meaning that a lot of other pages link to it.
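The idea that pages with many incoming links are more important is captured by the textbook PageRank formulation, sketched below with power iteration. This is a simplified sketch, assuming a damping factor of 0.85, a fixed number of iterations, and a small hypothetical link graph; it is not a description of how any search engine ranks pages today.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration.

    `links` maps each page to the pages it links to. A page's score is the
    chance that a random surfer is on it: with probability `damping` they
    follow a random outgoing link, otherwise they jump to a random page.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "c"; "c" links back to "a".
graph = {"a": ["c"], "b": ["c"], "d": ["c"], "c": ["a"]}
scores = pagerank(graph)
print(sorted(scores, key=scores.get, reverse=True))
```

"c" ends up first because three pages point to it, and "a" comes second because it is linked from the highly ranked "c": a link from an important page is worth more than a link from an obscure one.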
Cont.
Similarly, a hub is a page that has a lot of outgoing links and may represent a good starting point.
Advertising can also affect the order in which pages are offered.
Advertisers pay search engine sites to place their links before others, or in special areas of the web page. If you go to Google and search for computers, you get links for Dell, Apple, Staples, and others near the top and to the right of the page. Why?
They paid to be there! Best Buy didn't pay as much, so it is located lower down!
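Hubs and authorities are formalized by the HITS algorithm (hyperlink-induced topic search): a good hub links to good authorities, and a good authority is linked from good hubs. The sketch below iterates those two rules on a small hypothetical link graph; it is not claimed that any particular engine uses HITS, so treat this as an illustration of the hub idea only.

```python
import math


def hits(links: dict[str, list[str]],
         iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Return (hub, authority) scores using the HITS update rules.

    authority(p) = sum of hub scores of pages that link to p
    hub(p)       = sum of authority scores of pages that p links to
    Both score sets are rescaled to unit length after every update.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 0.0 for p in pages}
    for _ in range(iterations):
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}

        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: "portal" links to many pages, "blog" links to one.
graph = {"portal": ["a", "b", "c"], "blog": ["a"]}
hub, auth = hits(graph)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # portal a
```

"portal" scores as the best hub because of its many outgoing links, while "a" scores as the best authority because both linking pages point to it.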
THANK YOU