The Deep Web: A Tool For Good and Evil


Ekloan Dreshaj Advanced Computer Technology

Every technology that we discover has the potential to bring progress and human
benefit but also the potential to be used for destructive purposes. The Internet is definitely
something that fits this description. We are all quite familiar with the typical range of
websites accessible through Google: places where we can shop online, chat with
friends, or read news articles. Yet what we are able to find using Google
is only a tiny portion of the whole Internet. The hidden parts of the Internet are called the
Deep Web, which is much larger and has much more data than the Internet that we are
accustomed to, called the Surface Web. Although the Deep Web contains a lot of useful
information worth finding, it also has many opportunities for anonymous illegal activity
to take place.
Compared to the size of the Surface Web, the Deep Web is enormous. The Deep
Web is said to contain up to 500 times more data than the Surface Web. [8] More
specifically, the Deep Web consists of about 550 billion individual documents measuring a
total of 7,500 TB of data, compared to the Surface Web's 1 billion documents taking up
about 19 TB of space. [3] Some studies even estimate that the Deep Web may
contain up to 91,850 TB of data. [2] These numbers are important because they show
how much information is out there on the web. With new search engines and techniques,
many people have started finding ways to access this data and make it available for us to
search, just as we have always done for easier-to-reach static HTML pages.
Because data in the Deep Web cannot be found using standard methods,
researchers have been working on a variety of ways to compile and index data from these
hidden sites. As explained by He et al., deep websites each have backend databases that
are searchable by using HTML forms. They write, "To access a web database, we must
first find its entrances: query interfaces." [4]
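The "entrances" He et al. describe are ordinary HTML forms. As a rough illustration (the page snippet and field names below are invented for this sketch), a crawler's first step is simply to locate each form on a page and the query fields it exposes, which Python's standard html.parser can show:

```python
# Sketch: locating a web database's "entrance" by finding its HTML
# query forms. The sample page is made up; html.parser is stdlib.
from html.parser import HTMLParser

class FormFinder(HTMLParser):
    """Collect each form's action URL and the names of its input fields."""
    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({"action": attrs.get("action"), "fields": []})
        elif tag == "input" and self.forms:
            self.forms[-1]["fields"].append(attrs.get("name"))

page = """
<html><body>
  <form action="/search">
    <input name="title"><input name="author">
  </form>
</body></html>
"""
finder = FormFinder()
finder.feed(page)
print(finder.forms)  # [{'action': '/search', 'fields': ['title', 'author']}]
```

Once the crawler knows a form's action URL and field names, it can start submitting queries against the backend database behind it.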
The major problem that these researchers are trying
to overcome is that there are so many different forms on the web, and the only way to get
the hidden data is to fill out these forms and see what comes back. The two most common
approaches to searching the Deep Web are surfacing, which involves spidering as many web
forms as possible and stockpiling the results, and mediating, which involves brokering a
search query in real time across multiple sites. [8]
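The contrast between the two strategies can be sketched with toy in-memory "sites" standing in for real backend databases (all names and data here are invented): surfacing builds one stockpiled index ahead of time, while mediating fans the query out at query time.

```python
# Toy contrast of the two Deep Web search strategies. The "sites"
# are plain dictionaries standing in for real backend databases.
SITE_A = {"tor": "anonymity network", "crawler": "form-finding bot"}
SITE_B = {"tor": "onion router", "jaccard": "set similarity"}

def surface(sites):
    """Surfacing: spider every site ahead of time and stockpile
    all results in one local index."""
    index = {}
    for site in sites:
        for term, doc in site.items():
            index.setdefault(term, []).append(doc)
    return index

def mediate(query, sites):
    """Mediating: broker the query across all sites in real time
    and merge whatever each one returns."""
    return [site[query] for site in sites if query in site]

index = surface([SITE_A, SITE_B])
print(index["tor"])                      # ['anonymity network', 'onion router']
print(mediate("tor", [SITE_A, SITE_B]))  # same answer, fetched live
```

The trade-off mirrors the real systems: surfacing pays the crawling cost up front and may serve stale results, while mediating returns fresh results but is only as fast as the slowest site it queries.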
In their paper, Ajoudanian and Jazi present a new algorithm to increase the speed of
web mining by using a Jaccard measure to find correlations between attributes. They
explain that there are two types of attributes, grouping and synonym attributes: grouping
attributes are seen together (like first name and last name), while synonym attributes are
not (like number of tickets and number of passengers). [1]
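The intuition behind the Jaccard measure is easy to demonstrate. In this sketch (the attribute-to-form data is invented, and this is not the authors' exact algorithm), each attribute is mapped to the set of query forms it appears in; grouping attributes share many forms, while synonyms almost never co-occur:

```python
# Sketch of using a Jaccard measure to correlate form attributes,
# in the spirit of Ajoudanian and Jazi [1]. Data is illustrative.

def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

# Which query forms (by id) each attribute was observed in.
forms_with = {
    "first name":           {1, 2, 3, 4},
    "last name":            {1, 2, 3, 5},
    "number of tickets":    {6, 7},
    "number of passengers": {8, 9},
}

# Grouping attributes appear together, so their score is high.
grouping = jaccard(forms_with["first name"], forms_with["last name"])
# Synonym attributes rarely co-occur, so their score is near zero.
synonym = jaccard(forms_with["number of tickets"],
                  forms_with["number of passengers"])
print(grouping)  # 0.6  (3 shared forms out of 5 distinct forms)
print(synonym)   # 0.0  (never seen together)
```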
Another line of research, led by Barbosa
and Freire, looked at how to locate the searchable forms that can then be filled out.
They developed a form crawler that can focus a search on a given topic, using a
page classifier to guide the crawler, which can learn to keep track of promising links.
They claim that their technique can learn to identify promising links and use appropriate
stop criteria that avoid unproductive searches. [2]
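A heavily simplified sketch of that idea (the link graph, scoring rule, and stop criterion below are invented stand-ins, not Barbosa and Freire's actual classifier): score each outgoing link for topical promise, always follow the best one, and stop after too many consecutive pages without a searchable form.

```python
# Toy focused form crawler: a "classifier" scores link anchor text
# and the crawl stops once it goes unproductive. All data invented.

TOPIC_WORDS = {"search", "database", "query", "form"}

def link_score(anchor_text):
    """Toy classifier: fraction of anchor-text words on topic."""
    words = anchor_text.lower().split()
    return sum(w in TOPIC_WORDS for w in words) / len(words)

def crawl(start, links, pages_with_forms, patience=2):
    """Follow the best-scoring unvisited link from each page; stop
    after `patience` consecutive pages without a searchable form."""
    found, misses, visited, page = [], 0, set(), start
    while page is not None and misses < patience:
        visited.add(page)
        if page in pages_with_forms:
            found.append(page)
            misses = 0
        else:
            misses += 1
        candidates = [(link_score(text), nxt)
                      for nxt, text in links.get(page, [])
                      if nxt not in visited]
        page = max(candidates)[1] if candidates else None
    return found

links = {
    "home":   [("about", "company history"), ("search", "query database form")],
    "search": [("news", "old news archive"), ("adv", "advanced search form")],
    "adv":    [],
}
print(crawl("home", links, {"search", "adv"}))  # ['search', 'adv']
```

The real crawler learns its link classifier from data rather than using a fixed word list, but the control loop, prioritized links plus a stop criterion, is the same shape.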
Stanford researchers Raghavan and
Garcia-Molina present a Layout-based Information Extraction Technique (LITE). [7]
This technique focuses on using the physical layout of a page to extract content,
instead of just looking at the underlying HTML text.
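A toy version of the layout idea (coordinates and helper names are invented; LITE itself is far more sophisticated): given rendered (x, y) positions for text fragments, label a form field with the fragment physically nearest to it, rather than with whatever happens to precede it in the HTML source.

```python
# Toy layout-based extraction: pick a field's label by physical
# proximity on the rendered page, not by HTML source order.

fragments = [
    {"text": "Search our catalog", "x": 10, "y": 10},   # page heading
    {"text": "Author:",            "x": 10, "y": 50},   # real label
    {"text": "Site footer",        "x": 10, "y": 400},  # unrelated
]
field = {"name": "author", "x": 80, "y": 52}  # the input box position

def nearest_label(field, fragments):
    """Return the fragment closest (squared distance) to the field."""
    return min(fragments,
               key=lambda f: (f["x"] - field["x"]) ** 2
                           + (f["y"] - field["y"]) ** 2)

print(nearest_label(field, fragments)["text"])  # Author:
```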
Although there is a wealth of useful information contained in these hidden Deep
Web pages, a number of illicit activities also go on in some of these sites.
There are websites and underground marketplaces that are hidden from the Internet on
purpose, and they can only be accessed with special software. The most well-known software
for this is called Tor, which was originally developed by the U.S. Naval Research
Laboratory. [6] Using this software, people can access deep websites anonymously, which
leads to very little accountability in terms of following the law. There are people who
use these deep web structures to sell illegal high-powered weapons, advertise contract-killer
services, and sell a huge variety of illegal drugs. [6]
Unfortunately, these illegal
marketplaces are relatively easy to access, and recently a black
market search engine called Grams has even been developed for these underground markets. This
search engine indexes eight markets, and it can easily allow people to find places to
purchase illegal drugs. [5] Despite the abundance of illegal activity, the dark net is also used
by dissidents living under oppressive governments and by journalists who receive
classified information from anonymous sources. [6]

The Deep Web clearly contains an extremely varied set of data, ranging from
hidden pages on mainstream ecommerce websites to illicit drug operations. Many
researchers are focusing on how to index these sites, so that ordinary users will be able to
find much more information than they ever could by searching Google. On the other
hand, no matter how much hidden data these researchers uncover, it is unlikely that this
will stop the illegal activity going on in hidden websites. These anonymous sites have
ways to get around laws prohibiting weapons sales, sex trafficking, and the drug trade,
and people should still be held accountable for those crimes. Regardless, the Deep
Web is a fascinating and mysterious feature of the Internet, and we should definitely
continue researching and exploring what it has to offer.

Works Cited

1. Ajoudanian, Shohreh, and Mohammad Davarpanah Jazi. "Deep Web Content
Mining." Proceedings of World Academy of Science:
Engineering & Technology (2009). Web.

2. Barbosa, Luciano, and Juliana Freire. "Searching for Hidden-Web Databases."
Eighth International Workshop on the Web and Databases (2005). Web.

3. Bergman, Michael K. "White Paper: The Deep Web: Surfacing Hidden
Value." Journal of Electronic Publishing 7.1 (2001). Web.

4. He, Bin, et al. "Accessing the Deep Web."
Communications of the ACM 50.5 (2007): 94-101. Web.

5. Jeffries, Adrianne. "The Darknet Just Got Its First Black Market Search Engine."
The Verge. July 2014. Web.

6. Powell, Tom. "What Is the Dark Net? FOX28's Tom Powell Reports."
Fox 28. July 2014. Web.

7. Raghavan, Sriram, and Hector Garcia-Molina. "Crawling the Hidden Web."
27th International Conference on Very Large Data Bases (2001). Web.

8. Wright, Alex. "Searching the Deep Web."
Communications of the ACM 51.10 (2008): 14. Web.
