Deep Web
Amy Brunvand
Kate Holvoet
Peter Kraus
David Morrison
What is the Deep Web?
The Deep Web is the hidden part of the
Web, containing a huge volume of content
that is inaccessible to conventional search
engines, and consequently, to most users.
How big is the Deep Web?
• 550 billion documents
• 500 times the content of the surface Web
• Google has identified 1.2 billion
documents
• An Internet search typically covers only 0.03%
(about 1 in 3,000) of available content.
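The "1 in 3,000" figure is a rounding of the 0.03% share quoted above. A minimal sketch of that conversion, where the helper name `percent_to_one_in` is a hypothetical chosen for illustration and is not part of the original slides:

```python
from fractions import Fraction


def percent_to_one_in(percent: str) -> Fraction:
    """Return N such that the given percentage equals "1 in N".

    Hypothetical helper for illustration only. The percentage is passed
    as a string so that Fraction can parse it exactly.
    """
    share = Fraction(percent) / 100  # 0.03% becomes 3/10000
    return 1 / share


one_in = percent_to_one_in("0.03")
# The exact value is 10000/3, which the slide rounds to "1 in 3,000".
print(f"0.03% is about 1 in {round(one_in):,}")
```

Exact fractions are used here so the result is not affected by floating-point rounding.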
What’s in the Deep Web?
• Searchable databases
• Downloadable files & spreadsheets
• Image and multimedia files
• Data sets
• Various file formats such as .pdf
• Lots of government information
Why use the Deep Web?
• Higher quality sources
– Selected and organized by subject experts
• Dynamic display
• Customized data sets
• Some data is visual and not
word-searchable
• Regular search engines miss vast
resources available in the Deep Web
Why are we talking about
Government Sites in the Deep
Web?
• Governments have the mandate and the
capacity to gather information that
individuals don’t
• Most government information is
copyright-free
• Government information is authoritative
• Governments have the financial and
human resources to maintain Deep Web
sites
The Deep Web for Federal
Information
Peter L. Kraus
Federal Documents Librarian
Marriott Library – University of Utah
The Web Today
• Federal government Web sites occupy only
about 1% of the entire global Web.
However, they hold 85% of the Deep Web.
• The content of these Web sites includes items
in .html or .pdf format (reports, records,
data sets, etc.): a diverse range of files with
little standardization or uniformity. The common
term for this content is “Grey Literature”.
Definition of “Grey Literature”
• “That which is produced on all levels of
government, academics, business and
industry in print and electronic formats, but
which is not controlled by commercial
publishers”
Growth and Life of Federal
Information
• On federal Web sites, the amount of
information grew 13-fold between 1992
and 2003
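To put the 13-fold figure in yearly terms, it can be converted into an average annual growth rate. This is only a rough sketch: it assumes steady compound growth over the 11 years from 1992 to 2003, which the slide does not claim, and the function name `annual_growth_rate` is a hypothetical used for illustration.

```python
def annual_growth_rate(fold: float, years: int) -> float:
    """Average yearly growth rate implied by a total fold increase.

    Assumes steady compound growth (an assumption, not a claim made
    in the slides).
    """
    return fold ** (1 / years) - 1


# 13-fold growth between 1992 and 2003, treated as 11 years of compounding.
rate = annual_growth_rate(13, 2003 - 1992)
print(f"Implied average growth: {rate:.1%} per year")
```

Under those assumptions the implied rate is a little over a quarter per year; actual growth was likely uneven from year to year.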
http.cs.berkeley.edu/~daf/newo2.ps.Z
[Figure: graph showing the number of citations to “Finding Naked People”]
[Image: Arches National Park, NASA Landsat 7, 10/3/99]
[Screenshot: search for "University of Utah", displaying records 1 - 25 of a total of 27]