
Internet

The Internet is a global system of interconnected computer networks that use the standard Internet protocol suite
Begun in the 1960s by the United States government together with private commercial interests
It is a network of networks made up of millions of private, public, academic, business, and government networks, of local to global scope, that are linked by electronic, wireless and optical networking technologies
In 2011, more than 2.2 billion people, about a third of Earth's population, used the Internet

World Wide Web - net


The World Wide Web (WWW, W3, or the
Web) is a system of interlinked
hypertext documents accessed via the
Internet
At CERN, Tim Berners-Lee and Robert Cailliau
proposed in 1990 to use hypertext to link
and access information of various kinds
as a web of nodes in which the user can
browse at will

e-search
E-search: a web search engine is
designed to search for information on
the World Wide Web
Results are presented on search engine results pages (SERPs)
E-metasearch: a metasearch
engine is a search tool that sends
user requests to several other search
engines and/or databases and
aggregates the results into a single
list, searching multiple data sources
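
The aggregation step of a metasearch engine can be sketched roughly as follows. This is only an illustration under stated assumptions: `engine_a` and `engine_b` are hypothetical stand-ins for real search engines, the URLs are made up, and the reciprocal-rank scoring is one simple way to merge ranked lists, not the method of any particular metasearch engine.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for real search engines (assumption):
# each takes a query and returns a ranked list of result URLs.
def engine_a(query: str) -> List[str]:
    return ["http://a.example/1", "http://shared.example/x"]

def engine_b(query: str) -> List[str]:
    return ["http://shared.example/x", "http://b.example/2"]

def metasearch(query: str, engines: List[Callable[[str], List[str]]]) -> List[str]:
    """Send the query to every engine and merge the results into a single list."""
    scores: Dict[str, float] = {}
    for engine in engines:
        for rank, url in enumerate(engine(query)):
            # Reciprocal-rank scoring: a result ranked higher, or returned
            # by several engines, accumulates a larger score.
            scores[url] = scores.get(url, 0.0) + 1.0 / (rank + 1)
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

merged = metasearch("search engines", [engine_a, engine_b])
print(merged)
```

Here the result returned by both engines ends up first in the merged list, because its scores from the two engines add up.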

web search engines


history
1990: Alan Emtage, Bill Heelan and J. Peter Deutsch, Montreal students: Archie
(created a searchable database of file names)
1991: Mark McCahill, University of Minnesota: Gopher protocol, followed by the Gopher search programs Veronica and Jughead
1993: Matthew Gray, MIT, produced the first
web robot, the Perl-based
World Wide Web Wanderer, which was used to build the index Wandex
1994: WebCrawler let users search for any
word in any web page; it was the first engine widely known
by the public

web search engines


classification
Crawler-based (traditional, common)
search engines: Google, Yahoo,
Bing
Directories (mostly human-edited
catalogs): DMOZ (Open Directory Project)
Hybrid engines (META engines and
those using other engines' results):
Dogpile, Clusty

types of search engines

Spider-based search engines: find
information on the Internet and store
it for search results in giant
databases or indexes
Directory-based search engines
Link-based search engines
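
How a spider-based engine finds pages and stores them in an index can be sketched in miniature as below. Everything here is an assumption for illustration: `PAGES` is a hypothetical in-memory stand-in for the Web (no network access), and the functions `crawl`, `build_index` and `search` are invented names, not part of any real search engine.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical in-memory "web" (assumption): URL -> (page text, outgoing links).
PAGES = {
    "home": ("welcome to the web", ["about", "news"]),
    "about": ("about search engines", ["home"]),
    "news": ("web search news", []),
    "orphan": ("unlinked page", []),  # no other page links here
}

def crawl(seed: str) -> List[str]:
    """Follow links breadth-first from the seed, as a spider would."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in PAGES[url][1]:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls: List[str]) -> Dict[str, Set[str]]:
    """Inverted index: each word maps to the set of pages containing it."""
    index: Dict[str, Set[str]] = {}
    for url in urls:
        for word in PAGES[url][0].split():
            index.setdefault(word, set()).add(url)
    return index

def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy, so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results

crawled = crawl("home")
index = build_index(crawled)
print(crawled)
print(search(index, "web search"))
```

Because the spider only follows links, the page `orphan` is never crawled and therefore never appears in any search result.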

web search vs. metasearch engines

web search engines:
look in their own garden

web metasearch engines:
look in many gardens, asking
many other search engines

web search engines


Baidu (Chinese, Japanese)
Bing
Blekko
Google
Sogou (Chinese)
Soso.com (Chinese)

Volunia
WireDoo
Yahoo!
Yandex.com (Russian)
Yebol
Yodao (Chinese)

web metasearch engines


Yippy (formerly Clusty)
DeeperWeb
Dogpile
Excite
Harvester42
HotBot
Info.com
Ixquick

Kayak
Mamma
Metacrawler
MetaLib (not a public search engine)
Mobissimo
SideStep
WebCrawler

Special web search engines

Accountancy
Business
Enterprise
Food/Recipes
Job
Legal
Medical
Mobile/Handheld
News
People
Real estate / property
Television
Video Games

Special web search engines


Geographically limited scope
Accoona, China/United States
Alleba, Philippines
Ansearch, Australia/United States/United Kingdom/New Zealand
Biglobe, Japan
Daum, Korea
Goo, Japan
Guruji.com, India
Leit.is, Iceland

Naver, Korea
Onkosh, Arab World
Rambler, Russia
Rediff, India
SAPO, Portugal/Angola/Cabo Verde/Mozambique
Search.ch, Switzerland
Sesam, Norway, Sweden
Seznam, Czech Republic
Walla!, Israel
Yandex.ru, Russia
Yehey!, Philippines
ZipLocal, Canada/

Special web search engines


Food/Recipe

RecipeBridge: vertical search engine for recipes
Yummly: semantic recipe search

Mobile/Handheld

Taganode Local Search Engine
Taptu: taptu mobile/social search

Special web search engines

Accountancy

IFACnet

Business

Business.com
GenieKnows (US, Canada)
GlobalSpec
Nexis (Lexis Nexis)
Thomasnet (United States)

Special web search engines


Enterprise

AskMeNow: S3 - Semantic Search Solution
Concept Searching Limited: concept
Coveo: Coveo Enterprise Search platform, Coveo Expresso
Dieselpoint: Search & Navigation
dtSearch: dtSearch Engine (SDK),
Endeca: Information Access Platform
Exalead: exalead one:enterprise
Expert System S.p.A.: Cogito
Fast Search & Transfer: Enterprise Search Platform (ESP), RetrievalWare (formerly Convera)
Funnelback: Funnelback Search
IBM: OmniFind Enterprise Edition
Inbenta: Inbenta Semantic Search Engine
ISYS Search Software: ISYS:web, ISYS:sdk
Jumper 2.0: powered by Enterprise bookmarking
Microsoft: SharePoint Search Services
Northern Light
Open Text: Hummingbird Search Server, Livelink Search
Oracle Corporation: Secure Enterprise Search 10g
SAP: TREX
TeraText: TeraText Suite
Vivisimo: Vivisimo Clustering Engine

Special web search engines


Job

Bixee.com (India)
CareerBuilder.com (USA)
Craigslist (by city)
Dice.com (USA)
Eluta.ca (Canada)
Hotjobs.com (USA)

Incruit (Korea)
Indeed.com (USA)
LinkUp.com (USA)
Monster.com (USA, India)
Naukri.com (India)
Yahoo! HotJobs

Special web search engines


Legal

Google Scholar
Lexis (Lexis Nexis)
Manupatra
Quicklaw
WestLaw

Medical

Bing Health
Bioinformatic Harvester
Entrez (includes Pubmed)
GenieKnows
GoPubMed
Healia
Healthline
Nextbio (Life Science Search Engine)
PubGene
Quertle (Semantic search)
Searchmedica
VADLO (Life Sciences Search Engine)
WebMD

Special web search engines


Real estate / property

Fizber.com
HotPads.com
Realtor.com
Redfin
Rightmove
Zillow.com

Television

TV Genius

Video Games

Wazap (Japan)

Special web search engines


News

Bing News

Daylife

Google News

MagPortal

Newslookup

Nexis (Lexis Nexis)


Topix.net

Yahoo! News

People

Comfibook
Ex.plode.us
InfoSpace
PeekYou
Spock
Spokeo
Wink
Worldwide Helpers
Zabasearch.com
ZoomInfo

Visible/invisible web
Visible web (Surface Web): the part of the
World Wide Web that is indexable by standard
search engines.
Deep Web (also called the
Deepnet, the Invisible Web, the
Undernet or the hidden Web): the
World Wide Web content that is not
part of the Surface Web

Invisible web

Dynamic content: dynamic pages which are returned in
response to a submitted query
Unlinked content: pages which are not linked to by other
pages
Private Web: sites that require login (password protected).
Contextual Web: pages with content varying for different
access contexts
Limited access content: sites that limit access in a
technical way
Scripted content: pages that are only accessible through
links produced by JavaScript as well as content
dynamically downloaded from Web servers via Flash or
Ajax solutions.
Non-HTML/text content: textual content encoded in
multimedia

Visible < invisible web
Using the Deep Web, Steve Gruchawka,
11/14/2011

Visible web = 1%
Invisible web = 99%
Wikipedia:
open web is 167 terabytes
Invisible Web is estimated at
91,000 terabytes.

(The Library of Congress, in 1997: 3,000 terabytes!)
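
As a quick check on these proportions, the shares can be worked out from the two terabyte estimates quoted on this slide. This is only arithmetic on those quoted figures; the variable names are made up for the sketch, and the estimates themselves are taken as given.

```python
# Estimates quoted on the slide, in terabytes.
open_web_tb = 167
invisible_web_tb = 91_000

total_tb = open_web_tb + invisible_web_tb
visible_share = open_web_tb / total_tb
invisible_share = invisible_web_tb / total_tb
ratio = invisible_web_tb / open_web_tb  # how many times larger the invisible web is

print(f"visible web:   {visible_share:.2%}")
print(f"invisible web: {invisible_share:.2%}")
print(f"invisible web is about {ratio:.0f} times larger")
```

On these figures the visible web comes out well under 1% of the total, so the 1% / 99% split above is a rounded way of stating the same imbalance.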

Difficulties in e-search
The characteristics of information
Quantity
- use of specific query terms
Certification
- use of scientific search engines
- site provider
Visibility/accessibility (access/finding it)
- use of specific query terms
- use of a specific search engine
- pay
- login necessity
