
Exploring the Deep Web

University of Utah Government Documents Librarians

Amy Brunvand
Kate Holvoet
Peter Kraus
David Morrison
What is the Deep Web?
The deep Web is the hidden part of the
Web, containing a huge volume of content
that is inaccessible to conventional search
engines, and consequently, to most users.
How big is the Deep Web?
• 550 billion documents
• 500 times the content of the surface Web
• Google has identified 1.2 billion
documents
• An Internet search typically searches 0.03%
(about 1/3000) of available content.
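A quick arithmetic check of the figures on this slide, as a minimal sketch. The variable names are illustrative, and the numbers are simply the ones quoted above, not independent measurements.

```python
from fractions import Fraction

# Figures quoted on the slide (not independently verified here).
deep_web_docs = 550_000_000_000   # 550 billion documents
deep_to_surface_ratio = 500       # deep Web is 500 times the surface Web

# Surface Web size implied by the two figures above.
implied_surface_docs = deep_web_docs // deep_to_surface_ratio

# 0.03% written as an exact fraction, then inverted to get "1 in N".
share_searched = Fraction(3, 10_000)
one_in_n = 1 / share_searched

print(f"{implied_surface_docs:,}")   # 1,100,000,000
print(round(one_in_n))               # 3333
```

The implied surface Web of about 1.1 billion documents is close to the 1.2 billion documents attributed to Google, and 0.03% works out to roughly 1 in 3,300, which the slide rounds to 1/3000.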
What’s in the Deep Web?
• Searchable databases
• Downloadable files & spreadsheets
• Image and multi-media files
• Data sets
• Various file formats such as .pdf
• Lots of government information
Why use the Deep Web?
• Higher quality sources
– Selected and organized by subject experts
• Dynamic display
• Customized data sets
• Some data is visual and not
word-searchable
• Regular search engines miss vast
resources available in the Deep Web
Why are we talking about
Government Sites in the Deep
Web?
• Governments have the mandate and the
capacity to gather information that
individuals don’t
• Most government information is copyright
free
• Government information is authoritative
• Governments have the financial and
human resources to maintain Deep Web
sites
The Deep Web for Federal
Information
Peter L. Kraus
Federal Documents Librarian
Marriott Library – University of Utah
The Web Today
• Federal government web sites occupy only
about 1% of the entire global web, yet they
hold 85% of “The Deep Web”.
• The content of these web sites includes items
in .html or .pdf format (reports, records,
data sets, etc.): a diversity of files with little
standardization or uniformity. A common term
for this content is “Grey Literature”.
Definition of “Grey Literature”
• “That which is produced on all levels of
government, academics, business and
industry in print and electronic formats, but
which is not controlled by commercial
publishers”
Growth and Life of Federal
Information
• On federal web sites the amount of
information grew 13-fold between 1992
and 2003
• The average life expectancy of a federal
web resource is 4 months (2003)
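To put the 13-fold figure in yearly terms, here is a small sketch that converts it to an average annual growth rate. It assumes steady compound growth over the 11 years from 1992 to 2003, which the slide does not state; only the two endpoints come from the source.

```python
# Endpoints from the slide; steady compounding is an assumption.
growth_factor = 13          # 13-fold growth
years = 2003 - 1992         # 11 years

# Annual multiplier g such that g ** years == growth_factor.
annual_factor = growth_factor ** (1 / years)
annual_rate = annual_factor - 1

print(f"{annual_rate:.1%} per year")
```

Under that assumption the amount of information grew by roughly 26% per year.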
What can libraries do?
• LOCKSS-DOCS archival project (BYU and UU
are members)
• Cooperative efforts in specific subject
areas (Western Waters Digital Library)
• Individual institutional initiatives, such as
institutional repositories that reflect the
institution's research productivity
(information often funded by federal
grants)
The Deep Web for Health and
Science Information
Amy Brunvand – Government
Information Librarian
Marriott Library – University of Utah
Finding Naked People - Forsyth, Fleck (1996)
(54 citations)

This paper demonstrates an automatic
system for telling whether there are naked
people present in an image. The approach
combines color and texture properties to
obtain a mask for skin regions, which is
shown to be effective for a wide range of
shades and colors of skin.

http.cs.berkeley.edu/~daf/newo2.ps.Z
Graph showing number of citations to
“Finding Naked People”
Arches National Park : NASA Landsat 7 10/3/99
Searching for "University of Utah":
displaying records 1 - 25 of a total of 27

Development and Evaluation of Stitched
Sandwich Panels
Larry E. Stanley; Daniel O. Adams
NASA Langley Research Center
NASA/CR-2001-211025 , June 2001; 20010702
….. test panels were produced initially at the
University of Utah and later at NASA Langley
Research Center……
http://techreports.larc.nasa.gov/ltrs/PDF/2001/cr/NASA-2001-cr211025.pdf
Marriott Library, Salt Lake City, Utah,
United States 9/18/2003 (TerraServer)
Utah Seismic Hazards (National Atlas)
The Deep Web for
International Information
Kate Holvoet – Interim Head,
Government Documents and
Microforms
Marriott Library – University of Utah
International Deep Web Resources
• International organizations collect an
amazing amount of data
• Statistical data is often best organized in
database and spreadsheet format
• Like the US Government, individual
countries post data files and databases
• This information may not be available in
print sources in schools and libraries
United Nations Official Documents
System
• http://documents.un.org/
Why use the ODS?
• Full-text Official United Nations
Documents (1993 -) online, free
• Retrospective digitization in process
• Highly relevant material for almost any
international topic
• Timely and authoritative
United Nations Statistical
Databases
• Value of the information:
– Authoritative
– Comparative
– Time series
– Compact
• Database topics include:
– Commodity trade
– Demographics
– Disability statistics
– Social indicators
– Statistics on men and women
http://unstats.un.org/unsd/databases.htm
Individual Country Statistics
• http://www.census.gov/main/www/stat_int.html
Why use this kind of information?
• Aggregate statistical sources are often not
as up-to-date as individual country sources
• Individual countries are often more
specific in their indicators than aggregate
sources
• Information in databases, spreadsheets,
and downloadable files is usually NOT
searchable by web crawlers
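The last point can be illustrated with a minimal sketch of a link-following crawler. The page below is hypothetical, and real search-engine crawlers are far more sophisticated; the sketch only shows that following links never reaches records that sit behind a search form.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href targets the way a simple link-following crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form":
            # The form is noticed, but a link-only crawler never submits it,
            # so the database records behind it are never fetched.
            self.forms += 1


# Hypothetical page: one static link plus a search form in front of a database.
PAGE = """
<html><body>
  <a href="/about.html">About this database</a>
  <form action="/search" method="post">
    <input type="text" name="country">
    <input type="submit" value="Search">
  </form>
</body></html>
"""

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)   # ['/about.html']
print(collector.forms)   # 1
```

The crawler finds the static "About" page but has nothing to follow into the database itself, because results appear only after a person types a query into the form.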
Patents, Trademarks
and the Deep Web
Dave Morrison
Documents and Microforms Division
Marriott Library - University of Utah
For Further Information
• USPTO Information Line
800-PTO-9199
• Marriott Library, University of Utah
801-581-8394
www.lib.utah.edu/documents
Any Questions?
Thanks!
