
Scraper site

A scraper site is a website that copies content from other websites using web scraping. The content is then
mirrored with the goal of creating revenue, usually through advertising and sometimes by selling user data.
Scraper sites come in various forms. Some provide little, if any, material or information and are intended to
obtain user information, such as e-mail addresses, to target with spam e-mail. Price aggregation and
shopping sites access multiple listings of a product and allow a user to rapidly compare the prices.
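
The comparison step of a price aggregation site can be illustrated with a minimal sketch. It assumes the listings have already been gathered and share one currency; the `Listing` type, the `compare_prices` function, and the store names are hypothetical and are not taken from any real shopping site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    store: str    # hypothetical store name
    product: str  # product identifier shared across stores (an assumption)
    price: float  # price in a single shared currency (an assumption)


def compare_prices(listings: list[Listing], product: str) -> list[Listing]:
    """Return every listing for `product`, cheapest first."""
    matches = [item for item in listings if item.product == product]
    return sorted(matches, key=lambda item: item.price)


# Made-up sample data standing in for listings gathered from several shops.
SAMPLE_LISTINGS = [
    Listing("shop-a.example", "usb-cable", 7.99),
    Listing("shop-b.example", "usb-cable", 5.49),
    Listing("shop-c.example", "usb-cable", 6.25),
    Listing("shop-a.example", "hdmi-cable", 11.00),
]

if __name__ == "__main__":
    for listing in compare_prices(SAMPLE_LISTINGS, "usb-cable"):
        print(f"{listing.store}: {listing.price:.2f}")
```

In practice, matching the same product across stores is usually harder than an exact identifier comparison; the sketch sidesteps that problem by assuming a shared product key.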

Examples of scraper websites


Search engines such as Google could be considered a type of scraper site. Search engines gather content
from other websites, save it in their own databases, index it and present the scraped content to their search
engine's own users. The majority of content scraped by search engines is copyrighted.[1]
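
The gather, save, index, and present cycle described above can be sketched with a toy inverted index. This is only an illustration under simplifying assumptions: the pages are made-up in-memory strings rather than fetched content, and the `tokenize`, `build_index`, and `search` helpers are hypothetical names, not the design of any actual search engine.

```python
import re
from collections import defaultdict

# Made-up pages standing in for content gathered from other websites.
PAGES = {
    "https://a.example/": "Fresh apple pie recipe",
    "https://b.example/": "Apple orchard opening hours",
    "https://c.example/": "Pie crust tips",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the URLs that contain every word of the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


if __name__ == "__main__":
    index = build_index(PAGES)
    print(search(index, "apple pie"))
```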

The scraping technique has been used on various dating websites as well. These sites often combine their
scraping activities with facial recognition.[2][3][4][5][6][7][8][9][10][11]

Scraping is also used on general image recognition websites, and websites specifically made to identify
images of crops with pests and diseases.[12][13]

Made for advertising


Some scraper sites are created to make money through advertising programs. In such cases, they are called
Made for AdSense (MFA) sites. This derogatory term refers to websites that have no redeeming value
except to lure visitors to the website for the sole purpose of clicking on advertisements.[14]

Made for AdSense sites are considered search engine spam that dilutes the search results with
less-than-satisfactory entries. The scraped content is redundant to that which the search engine would have
shown under normal circumstances, had no MFA website appeared in the listings.

Some scraper sites link to other sites in order to improve their search engine ranking through a private blog
network. Prior to Google's update to its search algorithm known as Panda, a type of scraper site known as
an auto blog was quite common among black-hat marketers who used a method known as spamdexing.

Legality
Scraper sites may violate copyright law. Even taking content from an open content site can be a copyright
violation if it is done in a way that does not respect the license. For instance, the GNU Free Documentation
License (GFDL)[15] and Creative Commons ShareAlike (CC-BY-SA)[16] licenses used on Wikipedia[17]
require that a republisher of Wikipedia content inform its readers of the conditions of these licenses and give
credit to the original author.

Techniques
The methods used to target websites differ depending on the objective of the scraper. For example,
sites with large amounts of content, such as those of airlines, consumer electronics retailers, and department
stores, might be routinely scraped by their competitors just to stay abreast of pricing information.

Another type of scraper pulls snippets and text from websites that rank high for the keywords it has
targeted. Its operators hope to rank highly in the search engine results pages (SERPs) this way, piggybacking on
the original page's page rank. RSS feeds are vulnerable to scrapers.
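
One reason RSS feeds are easy to scrape is that they already publish titles, links, and summaries in a machine-readable structure. The following minimal sketch parses a made-up RSS 2.0 document with Python's standard library; the feed content and the `extract_items` helper are hypothetical, and nothing is fetched over the network.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed used purely for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://blog.example/first</link>
      <description>Opening paragraph of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://blog.example/second</link>
      <description>Opening paragraph of the second post.</description>
    </item>
  </channel>
</rss>"""


def extract_items(feed_xml: str) -> list[dict[str, str]]:
    """Return the title, link, and description snippet of each feed item."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "snippet": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in extract_items(SAMPLE_FEED):
        print(entry["title"], "->", entry["link"])
```

Because the publisher has already separated each post into labelled fields, no page-specific parsing is needed, which is what makes feeds a convenient source for this kind of copying.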

Other scraper sites consist of advertisements and paragraphs of words randomly selected from a dictionary.
Often a visitor will click on a pay-per-click advertisement on such a site because it is the only comprehensible
text on the page. Operators of these scraper sites gain financially from these clicks. Advertising networks
claim to be constantly working to remove these sites from their programs, although these networks benefit
directly from the clicks generated at this kind of site. From the advertisers' point of view, the networks do not
seem to be making enough effort to stop this problem.

Scrapers tend to be associated with link farms and are sometimes perceived as the same thing when
multiple scrapers link to the same target site. A frequently targeted site might be accused of link-farm
participation because of the artificial pattern of incoming links it receives from multiple scraper
sites.
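
The incoming-link pattern described above can be made concrete with a small sketch that computes, for each target site, the share of its distinct linking sites that are known scrapers. The link records, the `KNOWN_SCRAPERS` set, and the `scraper_link_share` function are all hypothetical; this is not how any particular search engine is known to evaluate links.

```python
from collections import defaultdict

# Hypothetical link records: (linking_site, target_site).
LINKS = [
    ("scraper1.example", "victim.example"),
    ("scraper2.example", "victim.example"),
    ("scraper3.example", "victim.example"),
    ("news.example", "victim.example"),
    ("news.example", "other.example"),
    ("blog.example", "other.example"),
]

# Hypothetical list of sites already identified as scrapers.
KNOWN_SCRAPERS = {"scraper1.example", "scraper2.example", "scraper3.example"}


def scraper_link_share(
    links: list[tuple[str, str]], known_scrapers: set[str]
) -> dict[str, float]:
    """For each target, return the fraction of its distinct linking sites
    that appear in `known_scrapers`."""
    sources: dict[str, set[str]] = defaultdict(set)
    for source, target in links:
        sources[target].add(source)
    return {
        target: len(linking & known_scrapers) / len(linking)
        for target, linking in sources.items()
    }


if __name__ == "__main__":
    for target, share in scraper_link_share(LINKS, KNOWN_SCRAPERS).items():
        print(f"{target}: {share:.0%} of linking sites are known scrapers")
```

A high share shows only that scrapers link to the site, not that the site took part in a link farm, which is the ambiguity the paragraph above describes.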

Domain hijacking

Some programmers who create scraper sites may purchase a recently expired domain name to reuse its
SEO power in Google. Whole businesses exist that focus on tracking expired domains and using them
for their historical ranking ability. Doing so allows SEOs to exploit the backlinks already established
to the domain name. Some spammers may try to match the topic of the expired site or copy its
former content from the Internet Archive to maintain the apparent authenticity of the site so that the
backlinks are not dropped. For example, an expired website about a photographer may be re-registered to
create a site about photography tips, or the domain name may be used in a private blog network to power the
spammer's own photography site.

Some expired domain name registration agents offer services both to find these expired domains and to
retrieve the HTML that was previously served on the domain's website.

See also
Scraping
Contact scraping
Domain parking
Web scraping
Blog scraping
Multi-protocol messengers: can connect to several networks but require an account on each
of them, so they do not violate the networks' terms
Content farm

References
1. Google 'illegally took content from Amazon, Yelp, TripAdvisor,' report finds (https://www.theguardian.com/technology/2015/mar/20/google-illegally-took-content-from-amazon-yelp-tripadvisor-ftc-report)
2. This App Lets You Find People On Tinder Who Look Like Celebrities (https://www.buzzfeednews.com/article/katienotopoulos/this-app-lets-you-find-people-on-tinder-who-look-like)
3. Dating app boss sees ‘no problem’ on face-matching without consent (https://nakedsecurity.sophos.com/2017/06/23/dating-app-boss-sees-no-problem-on-face-matching-without-consent/)
4. Dating.ai App Matches You With Celebrity Look-alikes (https://www.elitedaily.com/dating/dating-ai-app-celebrities/1998422)
5. Facial recognition app matches strangers to online profiles (https://www.cnet.com/news/facial-recognition-app-matches-strangers-to-online-profiles/)
6. NameTag: Facial recognition app criticized as creepy and invasive (http://www.cbc.ca/newsblogs/yourcommunity/2014/01/nametag-facial-recognition-app-criticized-as-creepy-and-invasive.html)
7. Swipe Buster (https://www.vanityfair.com/news/2016/04/check-tinder-cheater-swipe-buster)
8. Stalker-friendly app, NameTag, uses facial recognition to look you up online (https://nakedsecurity.sophos.com/2014/01/09/stalker-friendly-app-nametag-uses-facial-recognition-to-look-you-up-online/)
9. This Smart (but Unsettling) App Lets You Point Your Phone at People to Find Out Who They Are (https://www.inc.com/minda-zetlin/this-unbelievable-app-is-like-shazam-for-faces.html)
10. Truly.am Uses Facial Recognition To Help You Verify Your Online Dates (https://techcrunch.com/2013/10/27/truly-am-uses-facial-recognition-to-help-you-verify-your-online-dates/)
11. 3 Fascinating Search Engines That Search for Faces (https://www.makeuseof.com/tag/3-fascinating-search-engines-search-faces/)
12. Wolfram has created a website that will identify any image you throw at it (https://www.theverge.com/2015/5/13/8603531/wolfram-image-identification-site-trained-by-chewbacca)
13. Machine Learning Helps Small Farmers Identify Plant Pests And Diseases (https://www.fastcompany.com/40468146/machine-learning-helps-small-farmers-identify-plant-pests-and-diseases)
14. Made for AdSense (http://google.about.com/od/m/g/mfadef.htm)
15. Text of the GNU Free Documentation License (https://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License)
16. Creative Commons Attribution-ShareAlike 3.0 Unported License (https://creativecommons.org/licenses/by-sa/3.0/legalcode)
17. Wikipedia:Reusing Wikipedia content (https://en.wikipedia.org/wiki/Wikipedia:Reusing_Wikipedia_content)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Scraper_site&oldid=1153151525"
