The Deep Web: 21493 Shyneil Dsouza 21496 Aditya Upadhyay 21502 Poornank Shirke

The Deep
Web
21493 Shyneil Dsouza
21496 Aditya Upadhyay
21502 Poornank Shirke
Surface Web
 The surface Web is that portion of the World Wide Web that is indexable by conventional search
engines.
 It is also known as the Clearnet, the visible Web or indexable Web.
 Eighty-five percent of Web users use search engines to find needed information, but nearly as
high a percentage cite the inability to find desired information as one of their biggest frustrations.
 A traditional search engine sees only a small amount of the information that's available -- a
measly 0.03 % [source: OEDB].
Deep Web - Introduction
 The Deep Web is World Wide Web content that is not part of the Surface Web, which is
indexed by standard search engines.
 It is also called the Deepnet, Invisible Web or Hidden Web.
 Largest growing category of new information on the Internet.
 400-550X more public information than the Surface Web.
 Total quality 1000-2000X greater than the quality of the Surface Web.
History
 Jill Ellsworth used the term invisible Web in 1994 to refer to websites that were not
registered with any search engine.
 Mike Bergman cited a January 1996 article by Frank Garcia:

“It would be a site that's possibly reasonably designed, but they didn't bother to
register it with any of the search engines. So, no one can find them! You're hidden. I call that
the invisible Web”.
 Another early use of the term Invisible Web was by Bruce Mount and Matthew B. Koll of
Personal Library Software in 1996.
 The first use of the specific term Deep Web, now generally accepted, occurred in the
aforementioned 2001 Bergman study.
How search engines work
 Search engines construct a database of the Web by using programs called spiders or Web
crawlers that begin with a list of known Web pages.
 The spider gets a copy of each page and indexes it, storing useful information that will let
the page be quickly retrieved again later.
 Any hyperlinks to new pages are added to the list of pages to be crawled.
 Eventually all reachable pages are indexed, unless the spider runs out of time or disk space.
 The collection of reachable pages defines the Surface Web.

How search engines work
Contents
 Dynamic Content
 Unlinked content
 Private Web
 Contextual Web
 Limited access content
 Non-Scripted content
 Non-HTML/text content;
 Dynamic content
• Dynamic pages which are returned in response to a submitted query or accessed
only through a form
• especially if open-domain input elements (such as text fields) are used
• such fields are hard to navigate without domain knowledge
 Unlinked Content
• Pages which are not linked to by other pages
• Which may prevent web crawling programs from accessing the content
• This content is referred to as pages without backlinks (or inlinks).
 Private Web: sites that require registration and login (password-protected
resources).
 Contextual Web: pages with content varying for different access contexts (e.g.,
ranges of client IP addresses or previous navigation sequence).
 Limited access content: sites that limit access to their pages in a technical
way (e.g., using the Robots Exclusion Standard, CAPTCHAs, or no-cache Pragma
HTTP headers which prohibit search engines from browsing them and creating
cached copies.
 Scripted content
pages that are only accessible through links produced by JavaScript as well as
content dynamically downloaded from Web servers via Flash or Ajax solutions.
 Non-HTML/text content
textual content encoded in multimedia (image or video) files or specific file
formats not handled by search engines.
Deep Potential
 The deep Web is an endless repository for a mind-reeling amount of information.
 It's powerful. It unleashes human nature in all its forms, both good and bad.
 There are engineering databases, financial information of all kinds, medical papers, pictures,
illustrations ... the list goes on, basically, forever.
 For example, construction engineers could potentially search research papers at multiple universities in
order to find the latest and greatest in bridge-building materials.
 Doctors could swiftly locate the latest research on a specific disease.
 The potential is unlimited. The technical challenges are daunting. That's the draw of the deep Web.
Shadow Land
 The deep Web may be a shadow land of untapped potential.
 The bad stuff, as always, gets most of the headlines.
 You can find illegal goods and activities of all kinds through the dark Web.
 That includes illicit drugs, child pornography, stolen credit card numbers, human trafficking, weapons, exotic
animals, copyrighted media and anything else you can think of.
 Theoretically, you could even, say, hire a hit man to kill someone you don't like.
 But you won't find this information with a Google search.
 These kinds of Web sites require you to use special software, such as The Onion Router, more commonly known
as Tor.
The Onion Router (TOR)
 Tor is software that installs into your browser and sets up the specific connections you need to access
dark Web sites.
 Critically it is free software for enabling online anonymity and censorship resistance.
 Onion routing refers to the process of removing encryption layers from Internet communications,
similar to peeling back the layers of an onion.
 Using Tor makes it more difficult to trace Internet activity, including "visits to Web sites, online posts,
instant messages, and other communication forms", back to the user.
 It is intended to protect the personal privacy of users, as well as their freedom and ability to conduct
confidential business by keeping their internet activities from being monitored.
Cont…
 Instead of seeing domains that end in .com or .org, these hidden sites end in .onion.
 The most infamous of these onion sites was the now-defunct Silk Road, an online marketplace
where users could buy drugs, guns and all sorts of other illegal items.
 The FBI eventually captured Ross Ulbricht, who operated Silk Road, but copycat sites like Black
Market Reloaded are still readily available.
 Tor is the result of research done by the U.S. Naval Research Laboratory, which created Tor for
political dissidents and whistleblowers, allowing them to communicate without fear of reprisal.
 Tor was so effective in providing anonymity for these groups that it didn't take long for the
criminally-minded to start using it as well.
U.S. authorities shut down Silk after the
Silk Road Website alleged owner of the site Ross William Ulbricht
was arrested.
Money-related transactions
 You may wonder how any money-related transactions can happen when sellers and buyers can't
identify each other.
 That's where Bitcoin comes in.
 Bitcoin, it's basically an encrypted digital currency.
 Like regular cash, Bitcoin is good for transactions of all kinds, and notably, it also allows for
anonymity; no one can trace a purchase, illegal or otherwise.
 When paired properly with TOR, it's perhaps the closest thing to a foolproof way to buy and sell
on the web.
The Brighter Side of Darkness
 The deep Web is home to alternate search engines, e-mail services, file storage, file sharing,
social media, chat sites, news outlets and whistleblowing sites, as well as sites that provide a safer
meeting ground for political dissidents and anyone else who may find themselves on the fringes
of society.
 In an age where NSA-type surveillance is omnipresent and privacy seems like a thing of the past,
the dark Web offers some relief to people who prize their anonymity.
 Bitcoin may not be entirely stable, but it offers privacy, which is something your credit card
company most certainly does not.
 For citizens living in countries with violent or oppressive leaders, the dark Web offers a more
secure way to communicate with like-minded individuals.
Invisible Web Search Tools
• A List of Deep Web Search Engines – Purdue Owl’s Resources to Search the Invisible Web
• Art – Musie du Louvre
• Books Online – The Online Books Page
• Economic and Job Data – FreeLunch.com
• Finance and Investing – Bankrate.com
• General Research – GPO’s Catalog of US Government Publications
• Government Data – Copyright Records (LOCIS)
• International – International Data Base (IDB)
• Law and Politics – THOMAS (Library of Congress)
• Library of Congress – Library of Congress
• Medical and Health – PubMed
• Transportation – FAA Flight Delay Information
Future
 The lines between search engine content and the deep Web have begun to blur, as search
services start to provide access to part or all of once-restricted content.
 An increasing amount of deep Web content is opening up to free search as publishers and
libraries make agreements with large search engines.
 In the future, deep Web content may be defined less by opportunity for search than by access
fees or other types of authentication.
Conclusion
 The deep web will continue to perplex and fascinate everyone who uses the internet.
 It contains an enthralling amount of knowledge that could help us evolve technologically and as
a species when connected to other bits of information.
 And of course, its darker side will always be lurking, too, just as it always does in human nature.
 The deep web speaks to the fathomless, scattered potential of not only the internet, but the
human race, too.
References
 http://computer.howstuffworks.com/internet/basics/how-the-deep-web-works5.ht
m
 http://oedb.org/ilibrarian/invisible-web/
 http://en.wikipedia.org/wiki/Deep_Web
 http://money.cnn.com/infographic/technology/what-is-the-deep-web/?iid=EL
 http://en.wikipedia.org/wiki/Surface_Web
Thank You

The Deep Web: 21493 Shyneil Dsouza 21496 Aditya Upadhyay 21502 Poornank Shirke

Uploaded by

Copyright:

Available Formats

You might also like

The Deep Web: 21493 Shyneil Dsouza 21496 Aditya Upadhyay 21502 Poornank Shirke

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

The Deep Web: 21493 Shyneil Dsouza 21496 Aditya Upadhyay 21502 Poornank Shirke

Uploaded by

Copyright:

Available Formats

The Deep

 It is also known as the Clearnet, the visible Web or indexable Web.

 It is also called the Deepnet, Invisible Web or Hidden Web.

 Largest growing category of new information on the Internet.

 400-550X more public information than the Surface Web.

 Mike Bergman cited a January 1996 article by Frank Garcia:

 The collection of reachable pages defines the Surface Web.

 Limited access content

 Doctors could swiftly locate the latest research on a specific disease.

 The bad stuff, as always, gets most of the headlines.

 But you won't find this information with a Google search.

 That's where Bitcoin comes in.

 Bitcoin, it's basically an encrypted digital currency.

You might also like