
MAHARAJA SURAJMAL INSTITUTE

CYBER ETHICS

PROJECT
Impact of Deep Web on Cyber Security Research

SUBMITTED BY: VINAY SALWAN (BCA 2B)


ROLL NO: 09314902018
Impact of Deep Web on Cyber Security Research
Abstract
The surface Web, which people use routinely, consists of data
that search engines can find and then offer up in response to
queries. This is only the tip of the iceberg — a traditional search
engine sees about 0.03 percent of the information that is
available (Bergman 2001). Much of the rest is submerged in
what is called the deep Web. Also known as the “Undernet,”
“invisible Web” and the “hidden Web,” it consists of data that
cannot be located with a simple Google search.
Introduction
In order to formulate comprehensive strategies and
policies for governing the Internet, it is important to
consider insights on its farthest reaches — the deep
Web and, more importantly, the dark Web. This paper
endeavors to provide a broader understanding of the
dark Web and its impact on our lives. The dark Web is
the portion of the deep Web that has been intentionally
hidden and is inaccessible through standard Web
browsers. Dark Web sites serve as a platform
for Internet users for whom anonymity is essential, since
they not only provide protection from unauthorized users
but also usually include encryption to prevent
monitoring.

Objectives of Study
1. The deep web is huge, and no single party can see all of it.
2. Law-abiding people use the deep and dark web, too.
3. The deep web can host command-and-control
infrastructure for malware.
4. Seizing criminal marketplaces has limited impact because of
the deep web.

Methodology of Deep Web


The term deep Web denotes a class of content on the
Internet that, for various technical reasons, is not indexed by
search engines. The dark Web is the part of the deep Web that
has been intentionally hidden and is inaccessible through
standard Web browsers. A relatively well-known source of
content residing on the dark Web is the Tor network.
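Since dark Web services are reached through Tor rather than a standard browser, the sketch below shows one common way to fetch a hidden-service page programmatically. It assumes a local Tor client listening on its default SOCKS port (9050) and the Python requests library installed with SOCKS support; the .onion address is a placeholder, not a real service.

```python
# A minimal sketch of fetching a dark Web (.onion) page through the Tor network.
# Assumptions: a local Tor client is running with its SOCKS proxy on 127.0.0.1:9050,
# and the `requests` library is installed with SOCKS support (pip install requests[socks]).
# The .onion address below is a placeholder, not a real service.
import requests

TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",   # socks5h lets Tor resolve the .onion name
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_onion_page(onion_url: str) -> str:
    """Fetch a hidden-service page through the Tor SOCKS proxy."""
    response = requests.get(onion_url, proxies=TOR_PROXIES, timeout=60)
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    html = fetch_onion_page("http://exampleonionaddressxxxxxxxxxxxx.onion/")
    print(html[:500])
```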

Literature
One common misconception about the dark web and the deep
web is that these two terms are interchangeable. This is simply
not true. Take, for example, this sentence in Business Insider:
“The Dark (or Deep) Web, which refers to areas of the Internet
normally inaccessible to users without special anonymizing
software, first came to prominence with the Silk Road trial.”
While, yes, both the deep and dark web have been featured in
news stories about Silk Road, this writer is clearly referring
specifically to the dark web, which is just a tiny portion of
the deep web where users employ masked IP addresses to
conceal their identity.

A similar mistake is made by comedian Hannibal Buress in a Funny or Die spoof of the Webby Awards called “The Deep
Webbys.” Awards include “Most Stolen Identity,” “Leakiest
Wiki,” and “The People’s Choice Award for the Trafficking of the
Choicest People.” Though his delivery is pristine, this is another
example of the deep web being confused with the dark web. The sorts of
illegal activities and documents named in these awards are
bought, sold, and hosted via the dark web. The “Deep Webbys” is
too broad a title; the awards should technically be called the
“Dark Webbys,” a distinction that Buress, Funny or Die, and
the A.V. Club, which reported on Buress’ video, missed.

In Garner’s Modern American Usage, Bryan Garner advises avoiding “skunked terms” like decimate if we don’t want to
distract readers or listeners who happen to have strong biases
either way. Is that what we should do here? Perhaps,
though deep web and dark web are still young terms; those
who currently confuse the two don’t have any strong biases
either way, so these terms are far from skunked. It’s possible
that, as people become more aware of the different entities
they describe, confusion will die down. But is this likely? The
deep technical nature of these terms might mean that the
majority of English speakers remain forever in the dark.

The most astonishing subset of the Deep Web is a collection of dark alleys called the Dark Web. The Dark Web is generally
thought of as a collection of criminal elements intent on
subverting the law, stealing our money, and possibly
kidnapping our daughters.
― John McAfee

“There's a compounding and unravelling chaos that is perpetually in motion in the Dark Web's toxic underbelly.”
― James Scott
“The dark web is a world of power and freedom: of expression,
of creativity, of information, of ideas. Power and freedom
endow our creative and our destructive faculties. The dark web
magnifies both, making it easier to explore every desire, to act
on every dark impulse, to indulge every neurosis.”
― Jamie Bartlett

Conclusion

With the Internet Corporation for Assigned Names and Numbers’ contract with the United States Department of
Commerce due to expire in 2015, the international debate on
Internet governance has been re-ignited. However, much of the
debate has been over aspects of privacy and security on the
visible Web and there has not been much consideration of the
governance of the “deep Web” and the “dark Web.”

Like any technology, from pencils to cell phones, anonymity can be used for both good and bad. Users who fear economic
or political retribution for their actions turn to the dark Web for
protection. But there are also those who take advantage of this
online anonymity to use the dark Web for illegal activities such
as controlled substance trading, illegal financial transactions,
identity theft and so on.

Considering that the dark Web differs from the visible Web, it is
important to develop tools that can effectively monitor it.
Limited monitoring can be achieved today by mapping the
hidden services directory, customer data monitoring, social site
monitoring, hidden service monitoring and semantic analysis.
The deep Web has the potential to host an increasingly high
number of malicious services and activities. The global multi-
stakeholder community needs to consider its impact while
discussing the future of Internet governance.

An ever-increasing amount of information on the Web today is available only through search interfaces: users must type a set of keywords into a search form in order to access the pages
from certain Web sites. These pages are often referred to as
the Hidden Web or the Deep Web. Since there are no static
links to the Hidden Web pages, search engines cannot discover
and index such pages and thus do not return them in the
results.

However, according to recent studies, the content provided by many Hidden Web sites is often of very high quality and can be
extremely valuable to many users.

Since the only “entry point” to a Hidden Web site is a query interface, the main challenge that a Hidden Web crawler has to
face is how to automatically generate meaningful queries to
issue to the site.
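Because the crawler can only reach content by querying the form, query generation is usually an iterative loop: issue a seed keyword, harvest new candidate terms from the returned pages, and feed the best unused term back into the form. The sketch below illustrates that loop under simplifying assumptions that are not taken from the sources discussed here: the site exposes a GET search form whose keyword parameter is named "q", and the next keyword is chosen greedily by term frequency.

```python
# A minimal sketch of the query-generation loop a hidden-Web crawler might use.
# Assumptions (illustrative only): the site has a GET search form with a "q"
# parameter, and new keywords are chosen greedily by frequency in pages seen so far.
import re
from collections import Counter

import requests

def crawl_hidden_site(search_url: str, seed_keyword: str, max_queries: int = 20) -> list[str]:
    """Issue queries against a search form and return the result pages retrieved."""
    issued, pages = {seed_keyword}, []
    term_counts: Counter[str] = Counter()
    next_keyword = seed_keyword

    for _ in range(max_queries):
        page = requests.get(search_url, params={"q": next_keyword}, timeout=30).text
        pages.append(page)

        # Harvest candidate keywords from the newly retrieved content.
        term_counts.update(w.lower() for w in re.findall(r"[a-zA-Z]{4,}", page))

        # Greedily pick the most frequent term not yet issued as the next query.
        candidates = [t for t, _ in term_counts.most_common() if t not in issued]
        if not candidates:
            break
        next_keyword = candidates[0]
        issued.add(next_keyword)

    return pages
```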

Many believe a Google search can identify most of the information available on the Internet on a given subject. But
there is an entire online world – a massive one – beyond the
reach of Google or any other search engine. Policymakers
should take a cue from prosecutors – who just convicted one of
its masterminds – and start giving it some attention.

The scale of the Internet’s underworld is immense. The number of non-indexed websites, known as the Deep Web, is
estimated to be 400 to 500 times larger than the surface web of
indexed, searchable websites. And the Deep Web is where the
dark side of the Internet flourishes.

While there are plenty of law-abiding citizens and well-intentioned individuals (such as journalists, political
dissidents, and whistle-blowers) who conduct their
online activities below the surface, the part of the Deep Web
known as the Darknet has become a conduit for illegal and often
dangerous activities.

This policy brief outlines what the Deep Web and Darknet
are, how they are accessed, and why we should care about
them. For policymakers, the continuing growth of the Deep Web
in general and the accelerated expansion of the Darknet, in
particular, pose new policy challenges.

The response to these challenges may have profound implications for civil liberties, national security, and the global
economy at large.

The obtained estimates, together with the proposed sampling technique, could be useful for further studies of how to handle data in the deep Web.
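The specific sampling technique of the cited study is not reproduced here, so the following is only a generic illustration of how sampling-based estimation of a deep Web parameter can work: draw a uniform random sample of hosts, test each for a searchable database, and extrapolate the observed proportion to the whole population. The host list and the form-detection predicate are assumed inputs, not implemented.

```python
# A generic illustration (not the technique from the cited study) of estimating a
# deep-Web parameter by sampling: sample hosts uniformly at random, test each for a
# searchable database, and extrapolate the observed proportion to the population.
import random

def estimate_deep_web_sites(all_hosts: list[str], sample_size: int,
                            has_search_form) -> float:
    """Estimate how many hosts in `all_hosts` expose a deep-Web search interface.

    `has_search_form` is a caller-supplied predicate (e.g., it fetches the host's
    front page and looks for an HTML form); it is assumed here, not implemented.
    """
    sample = random.sample(all_hosts, sample_size)
    hits = sum(1 for host in sample if has_search_form(host))
    proportion = hits / sample_size
    return proportion * len(all_hosts)   # extrapolate from the sample to the population
```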

The deep Web, the huge part of the Web consisting of web
pages accessible via web search forms (or search interfaces),
is poorly crawled and thus invisible to current-day web search
engines.

Though the problems with crawling dynamic web content hidden behind form-based search interfaces were evident as
early as 2000, the deep Web is still not adequately
characterized and its key parameters (e.g., the total number of
deep web sites and web databases, the overall size of the deep
Web, the coverage of the deep Web by conventional search
engines, etc.) can only be guessed.

Deep Web databases, whose content is presented as dynamically generated Web pages hidden behind forms, have mostly been left unindexed by search engine crawlers. In order
to automatically explore this mass of information, many current
techniques assume the existence of domain knowledge, which
is costly to create and maintain. In this article, we present a new
perspective on form understanding and deep Web data
acquisition that does not require any domain-specific
knowledge.

Unlike previous approaches, we do not perform the various steps in the process (e.g., form understanding, m-record
identification, attribute labelling) independently but integrate
them to achieve a complete understanding of deep Web
sources.

Through information extraction techniques and using the form itself for validation, we reconcile input and output schemas in a
labelled graph which is further aligned with a generic ontology.
The impact of this alignment is threefold: first, the resulting
semantic infrastructure associated with the form can assist Web
Crawlers when probing the form for content indexing; second,
attributes of response pages are labelled by matching known
ontology instances, and relations between attributes are
uncovered; and third, we enrich the generic ontology with facts
from the deep Web.

The deep Web consists of dynamically-generated Web pages that are reachable by issuing queries through HTML forms. A
form is a section of a document with special control elements
(e.g., checkboxes, text inputs) and associated labels. Users
generally interact with a form by modifying its controls (entering
text, selecting menu items) before submitting it to a Web server
for processing.
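Because every interaction with a deep Web source starts from such a form, a crawler's first step is typically to enumerate the form's control elements. The sketch below does this with only the Python standard library; it merely lists the controls and their attributes, whereas full form understanding would also recover the human-readable labels and their semantics.

```python
# A minimal sketch of enumerating a form's control elements from an HTML page
# using only the standard library. Real form-understanding systems also recover
# the associated labels and field semantics; this only lists the controls.
from html.parser import HTMLParser

class FormControlExtractor(HTMLParser):
    """Collect the control elements (inputs, selects, textareas) found inside forms."""

    def __init__(self) -> None:
        super().__init__()
        self.in_form = False
        self.controls: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag in ("input", "select", "textarea"):
            self.controls.append({"tag": tag, **dict(attrs)})

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

# A small, made-up example form for demonstration.
html_page = """
<form action="/search" method="get">
  <label for="q">Title</label> <input type="text" name="q" id="q">
  <input type="checkbox" name="in_stock"> In stock only
  <input type="submit" value="Search">
</form>
"""

parser = FormControlExtractor()
parser.feed(html_page)
for control in parser.controls:
    print(control)
```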

Recently, there has been increased interest in the retrieval and integration of hidden Web data with a view to leveraging high-
quality information available in online databases. Although
previous works have addressed many aspects of the actual
integration, including matching form schemata and automatically
filling out forms, the problem of locating relevant data sources
has been largely overlooked. Given the dynamic nature of the
Web, where data sources are constantly changing, it is crucial to
automatically discover these resources.

However, considering the number of documents on the Web (Google already indexes over 8 billion documents),
automatically finding tens, hundreds or even thousands of forms
that are relevant to the integration task is really like looking for a
few needles in a haystack. Besides, since the vocabulary and
structure of forms for a given domain are unknown until the forms
are actually found, it is hard to define exactly what to look for.

We propose a new crawling strategy to automatically locate hidden-Web databases, which aims to achieve a balance
between the two conflicting requirements of this problem: the
need to perform a broad search while at the same time avoiding
the need to crawl a large number of irrelevant pages.

The proposed strategy does that by focusing the crawl on a given topic; by judiciously choosing links to follow within a topic
that are more likely to lead to pages that contain forms; and by
employing appropriate stopping criteria.
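A minimal sketch of that strategy follows: the crawl stays on topic by scoring links with a simple keyword heuristic (standing in for the learned link classifiers real systems use), follows the most promising links first via a priority queue, and stops once too many pages in a row yield no searchable form. The page-fetching, link-extraction, and form-detection helpers are assumed, not implemented.

```python
# A minimal sketch of focused crawling for searchable forms: prioritize promising
# links and stop after too many consecutive pages without a form. The link-scoring
# heuristic is a naive stand-in for the learned classifiers used in real systems.
import heapq
from urllib.parse import urljoin

PROMISING_WORDS = ("search", "advanced", "query", "database", "browse")

def score_link(anchor_text: str, href: str) -> int:
    """Higher scores for links whose text or URL suggests a search interface."""
    text = (anchor_text + " " + href).lower()
    return sum(word in text for word in PROMISING_WORDS)

def focused_crawl(start_url, fetch_page, extract_links, page_has_form,
                  max_pages_without_form: int = 50):
    """Crawl from `start_url`, following the most promising links first.

    `fetch_page`, `extract_links`, and `page_has_form` are caller-supplied helpers
    (assumed, not implemented here): download a URL, list its (anchor_text, href)
    pairs, and test whether a page contains a search form.
    """
    frontier = [(0, start_url)]            # min-heap over negated scores
    seen, forms, misses = {start_url}, [], 0

    while frontier and misses < max_pages_without_form:
        _, url = heapq.heappop(frontier)
        page = fetch_page(url)
        if page_has_form(page):
            forms.append(url)
            misses = 0                     # reset the stopping counter on success
        else:
            misses += 1
        for anchor_text, href in extract_links(page):
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score_link(anchor_text, href), link))
    return forms
```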
