
Introduction

Web Analytics – 4th course. Degree in Data Science and Engineering

Patricia Callejo (pcallejo@inst.uc3m.es)


Angel Cuevas (acrumin@it.uc3m.es)
Rubén Cuevas (rcuevas@it.uc3m.es)
What is the World Wide Web (WWW)?
“It is an information system where documents and other web
resources are identified by Uniform Resource Locators (URLs, such
as https://example.com/), which may be interlinked by hyperlinks, and
are accessible over the Internet. The resources of the Web are
transferred via the Hypertext Transfer Protocol (HTTP), may be
accessed by users by a software application called a web browser, and
are published by a software application called a web server.”
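The definition names three pieces: a URL identifies the resource, a web server publishes it, and HTTP carries it to the browser. As an illustration (a sketch, not part of the original slides), the Python snippet below splits a URL with the standard `urllib.parse.urlsplit` and builds the text of a minimal HTTP/1.1 GET request of the kind a browser sends. It only constructs the request string; it does not open a connection or perform the TLS handshake that `https` requires, and the helper names `describe_url` and `http_get_request` are hypothetical.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts a browser needs to contact a web server."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    target = parts.path or "/"           # resource requested on the server
    if parts.query:
        target += "?" + parts.query      # keep the query string, e.g. ?q=web
    return {
        "scheme": parts.scheme,          # protocol, e.g. "https"
        "host": parts.hostname,          # web server to contact
        "port": parts.port or default_port,
        "target": target,
    }


def http_get_request(url: str) -> str:
    """Build (but do not send) a minimal HTTP/1.1 GET request for the URL."""
    info = describe_url(url)
    return (
        f"GET {info['target']} HTTP/1.1\r\n"
        f"Host: {info['host']}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


print(describe_url("https://example.com/"))
print(http_get_request("https://example.com/"))
```

The server answers such a request with a status line, headers and the document itself, which the browser then renders.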

“The World Wide Web is not synonymous with the Internet, which pre-dated
the Web in some form by over two decades and upon which
technologies the Web is built.”
Source: Wikipedia
WWW’s history (the origin)

• Tim Berners-Lee is recognized as the inventor of the WWW in 1989. He wrote the first web browser in 1990

• In 1991, Berners-Lee released his software

• In 1993, Mosaic, the first popular graphical browser, is released

• By the end of 1994, the Web had 10,000 servers and 10 million users
WWW’s history (the development)
• Search engines:
  • AltaVista → 1995
  • BackRub (Google’s founders’ initial steps into search engines) → 1996
  • Google → 1998
• Mobile Web:
  • WAP standard (Nokia, Ericsson, Unwired Planet) → 1997
  • First smartphones connected to the Internet over 3G connections → 2001
  • iPhone → 2007
  • LTE (high-speed mobile connections) → 2012
• Social Networks:
  • IRC (a rudimentary, non-web-based chat service) → 1988
  • AOL Instant Messenger → 1997
  • Yahoo Messenger, MSN Messenger → 1999
  • Friendster (first social network) → 2002
  • LinkedIn, Myspace, Skype → 2003
  • Facebook, Orkut (Google’s first failure) → 2004
  • Twitter → 2006
  • Snapchat, Google+ (Google’s second failure) → 2011
  • Twitch → 2011
  • Tinder → 2012
  • TikTok → 2017
WWW’s history (today)
The open, the deep and the dark web
Open Web (or Surface Web): “the body of content hosted on web
servers that can be accessed through any sort of web browser (e.g.,
Google Chrome). Additionally, this content can be indexed by search
engines, like Google or Bing, as the site operators have not specified
directions in the “robots.txt” for search engines not to display the
webpage.”

Source: https://cj.msu.edu/_assets/pdfs/cina/CINA-White_Papers-Holt_Open_Deep_Dark.PDF
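The robots.txt file mentioned in the definition is a plain-text list of rules telling crawlers which paths they should not fetch; honoring it is voluntary on the crawler’s side. The sketch below checks such rules with Python’s standard `urllib.robotparser` module. The robots.txt content (a single `/private/` rule), the URLs and the helper name `is_crawlable` are hypothetical, chosen only for illustration, and the rules are parsed from an inline string rather than downloaded.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site: every crawler ("*")
# is asked not to fetch anything under /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse inline text instead of downloading it
    return parser.can_fetch(user_agent, url)


print(is_crawlable(ROBOTS_TXT, "https://example.com/index.html"))           # True
print(is_crawlable(ROBOTS_TXT, "https://example.com/private/report.html"))  # False
```

Pages that no rule blocks remain eligible for crawling and, in turn, for appearing in search results, which is what places them in the Open Web.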
Deep Web: content hosted and accessible through the Open or Surface
Web that may not be accessible through search engines for one of
several reasons:

1) the content is proprietary, involves personally identifiable information (PII), or is regulated by law to restrict access (such as email accounts, tax records, payment systems, etc.)
2) the information is password protected (as with a forum that requires users to register to view content)
3) the content is behind a paywall (as with scientific journals and media content)
4) the site operators have disabled features that allow the URL to be cached in search engine results
Source: https://cj.msu.edu/_assets/pdfs/cina/CINA-White_Papers-Holt_Open_Deep_Dark.PDF
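One way site operators express reason 4 is the robots meta tag in a page’s HTML: the `noarchive` value asks search engines not to show a cached copy (support for individual directives varies between search engines). The sketch below, which assumes the page HTML is already available as a string, collects those directives with Python’s standard `html.parser`; the class `RobotsMetaParser`, the function `robots_directives` and the sample page are illustrative names and data.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # content is a comma-separated list, e.g. "noindex, noarchive"
        for directive in (attributes.get("content") or "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, noarchive"></head><body>...</body></html>'
print("noarchive" in robots_directives(page))  # True: the page asks not to be cached
```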
Dark Web: a portion of the Internet that can only be accessed via the
use of specialized encryption software and browser protocols.
Individuals typically access the Dark Web through a
service/browser called Tor, which stands for The Onion Router.
• Upside: it allows activists in non-democratic regimes to carry out
their online activity anonymously
• Downside: it allows criminals to conceal their activity (child
pornography trading, drug black markets, etc.).

Source: https://cj.msu.edu/_assets/pdfs/cina/CINA-White_Papers-Holt_Open_Deep_Dark.PDF
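The “onion” in The Onion Router refers to layered encryption: the client wraps its message in one layer per relay (Tor circuits typically use three relays), and each relay removes only its own layer, which reveals just the next hop. The toy Python model below illustrates only that layering. It performs no encryption at all; a `Layer` merely stands in for a layer only the named relay could open, and the relay names, destination and helpers `build_onion` and `route` are made up for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple, Union

# Toy model only: NO encryption happens here. A Layer stands in for one
# encrypted layer that only the named relay could open in real onion routing.


@dataclass
class Layer:
    relay: str                  # the only relay able to "open" this layer
    next_hop: str               # where that relay forwards the inner payload
    inner: Union[Layer, str]    # the next layer, or the final message


def build_onion(message: str, relays: List[str], destination: str) -> Layer:
    """Wrap the message in one layer per relay, starting from the innermost."""
    next_hops = relays[1:] + [destination]
    payload: Union[Layer, str] = message
    for relay, next_hop in reversed(list(zip(relays, next_hops))):
        payload = Layer(relay, next_hop, payload)
    return payload


def route(onion: Layer) -> Tuple[List[Tuple[str, str]], str]:
    """Peel layers one relay at a time; each layer reveals only its next hop."""
    hops: List[Tuple[str, str]] = []
    current: Union[Layer, str] = onion
    while isinstance(current, Layer):
        hops.append((current.relay, current.next_hop))
        current = current.inner
    return hops, current


onion = build_onion("hello", ["guard", "middle", "exit"], "example.com")
print(route(onion))
```

In this model the first relay sees who sent the onion but not its destination, and the last relay sees the destination but not the sender, which is the property that lets users conceal their activity.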
What is web analytics?
- The WWW is the largest source of data that has ever existed in the history of humanity

- How do we extract data from it?

Data Collection → Block 2

- Scrapers and APIs for Wikipedia, social networks, news outlets, housing services, etc.
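The scraping tools themselves are the subject of Block 2; as a first taste, the sketch below shows one core step of a simple scraper: extracting the hyperlinks of a page with Python’s standard `html.parser` and resolving relative links with `urllib.parse.urljoin`. The HTML snippet and base URL are made-up stand-ins for a downloaded page (in practice the page could be fetched with `urllib.request.urlopen`), and `LinkExtractor` and `extract_links` are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL
                self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str) -> list:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


SAMPLE_HTML = '<p><a href="/wiki/Web_analytics">Web analytics</a> <a href="https://example.org/">ext</a></p>'
BASE_URL = "https://en.wikipedia.org/wiki/World_Wide_Web"
print(extract_links(SAMPLE_HTML, BASE_URL))
```

Each extracted link is a directed edge from the page to its target, which is the graph view of the web that Graph Theory works with.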
What is web analytics?
The web is a giant graph that links documents of any sort (webpages, photos, data files, databases, etc.)

How do we analyze it?

Graph Theory → Block 3

What is at the core of Google’s search engine? How do social media platforms identify influencers? How can I identify clusters of dangerous websites?
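A classic answer to the first question is PageRank, the link-analysis algorithm published by Google’s founders: a page is important if important pages link to it. The sketch below computes it by power iteration on a small directed graph. The damping factor 0.85 is the value commonly cited for PageRank, the four-page graph is made up, and this is in no way Google’s production ranking, which combines many other signals.

```python
from __future__ import annotations


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Pages without outgoing links ("dangling") share their score with everyone
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page splits its current score evenly among its out-links
                new_rank[target] += damping * rank[page] / len(targets)
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once scores no longer change noticeably
            break
    return rank


# Made-up four-page web: C receives the most links
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

The scores always sum to 1, so they can be read as the probability that a random surfer, who follows links and occasionally jumps to a random page, is on each page.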
What is web analytics?
We have to share our findings with others…

How do I present my findings about the web service I analyzed?

Data Visualization → Block 4

