Option C - Web Science - Youtube
Web Science
IB Computer Science
Intro
- Exhausting information dump
- Most difficult part is PHP/HTML/mySQL
- Overlaps with Topic 3
- HL is Random
- Why did I do this to my students?
- Study Guide out soon
The Internet vs. The WWW
Internet:
- Can be accessed via various devices such as computers, mobile devices, and tablets.
- Includes all connected devices and networks worldwide.
- Uses standardized protocols such as TCP/IP and DNS.
- Email and FTP (File Transfer Protocol) are considered part of the internet, but not the WWW.
WWW:
- Can be accessed via the internet using web browsers on compatible devices.
- Includes all websites and web pages accessible via the internet.
- Uses standardized protocols such as HTTP and HTML.
How does the internet work? (Topic 3)
- Every website is stored on a server, which is just a
computer. Every computer has an IP address.
(192.168.0.1)
- An IP address is the most direct way to reach a
server.
- When you access a website from your web browser,
you send a request to that server for its contents.
- Data is transmitted across the internet using data
packets through all the switches, routers and hubs
(the internet’s plumbing) that make up the internet.
What are protocols? (Topic 3)
- a set of rules and guidelines that
govern how data is transmitted,
received, and processed across a
network
- define the format, timing,
sequencing, and error control of
data transmitted over the network
- Examples: TCP/IP, HTTP, FTP, SMTP (Simple Mail Transfer Protocol)
Why are protocols important? (Topic 3)
1) Allow successful communication to take place
2) Ensure data integrity processes such as error checking
3) Regulate the speed of data packet flow (flow control); prevent a device from sending at a faster
rate than the receiving device can receive
4) Manage packet switching
URL (Uniform Resource Locator)
- A URL is like the address you type into your browser. Ex: www.google.com
- A URL has a protocol, domain name, and web page/file name.
URI (Uniform Resource Identifier)
- All URLs are URIs, but not all URIs are URLs
- URLs allow you to access a resource, while URIs simply identify a resource
- In the following example, #date need not be included to access the resource.
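The pieces of a URL/URI can be pulled apart with Python's standard urllib module (the URL below is a made-up example):

```python
from urllib.parse import urlparse

# A hypothetical URI with a fragment; the fragment (#history) identifies a
# location within the page but is not needed to fetch the resource itself.
uri = "https://www.example.com/articles/web-science.html#history"

parts = urlparse(uri)
print(parts.scheme)    # protocol: "https"
print(parts.netloc)    # domain name: "www.example.com"
print(parts.path)      # web page/file name: "/articles/web-science.html"
print(parts.fragment)  # identifier only: "history"
```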
Web Browser (Client)
- software application that is used to access and display
content on the World Wide Web
- allows users to view web pages, download files, and
interact with web-based applications
- use various technologies to render web pages, such as
HTML, CSS, and JavaScript
- allow users to manage bookmarks, history, and
preferences
- communicate with web servers using the Hypertext
Transfer Protocol (HTTP) or its secure variant, HTTPS
- Examples: Google Chrome, Safari, Mozilla Firefox
Cookies
- small text files that are stored on a user's computer or mobile device by a website
that the user visits
- allowing websites to remember user preferences, login information, and other data
- contains a unique identifier that allows the website to recognize the user's browser
on subsequent visits
Process
1) Website sends a cookie to the user's browser
2) Cookie is stored on the user's computer or mobile device
3) Website uses the cookie to keep track of the user's preferences, such as language settings,
font size, or display options
4) On subsequent visits, the website uses the cookie's unique identifier to recognize the user and serve content to them
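The set/read round trip above can be sketched with Python's standard http.cookies module (the cookie names and values are made up):

```python
from http.cookies import SimpleCookie

# Steps 1-2: the website builds a cookie and sends it as a Set-Cookie
# header; the browser stores it on the user's device.
cookie = SimpleCookie()
cookie["user_id"] = "abc123"          # unique identifier
cookie["lang"] = "en"                 # a stored preference
set_cookie_headers = cookie.output()  # what the server sends to the browser

# Steps 3-4: on a later visit the browser sends the cookie back, and the
# website parses it to recognize the user and their preferences.
returned = SimpleCookie()
returned.load("user_id=abc123; lang=en")
print(returned["user_id"].value)  # "abc123"
print(returned["lang"].value)     # "en"
```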
User Implications of Cookies
- Can be used to track user behavior
across multiple websites
- This information can then be used
to deliver targeted advertising
- Can lead to privacy concerns
URLs and DNS (Domain Name System)
1) When you type a URL into your web browser and hit enter, it gets sent to a DNS
server.
2) The DNS server checks if the URL's domain exists in its list of domain names and, if so, it
returns the corresponding IP address to you, the user. Your web browser then
automatically sends a request using the IP address.
3) If not, it passes the request on to other DNS servers in the hierarchy until
there is a match, at which point the IP address is sent back.
4) Once found, the IP address is returned to the original DNS server and then to the
user's web browser, which makes the request.
The DNS Hierarchy
- DNS servers are a common resource,
regulated by the global, non-governmental
Internet Corporation for Assigned Names
and Numbers (ICANN), Internet Engineering
Task Force (IETF), and World Wide Web
Consortium (W3C)
- DNS servers are operated and managed by
ISPs (Internet Service Providers) and Domain
Name Registrars (GoDaddy, etc.)
- The DNS system has no fixed number of
DNS servers globally, but it is a massive
network of servers operated
collaboratively.
HTTP (Hypertext Transfer Protocol)
- A protocol for transmitting
information on the WWW
- Anytime you request a webpage, you
are using HTTP
- When you send a request to that web
server using an IP, you do it using a
protocol called hypertext transfer
protocol (HTTP).
- All this means is that information that
you send to the server using an IP
address must be written and stored in
a very specific way, following the
rules of HTTP.
HTTP Request Cycle Steps (1)
1. DNS resolution: Client sends a DNS (Domain Name System) request to a DNS
server, which returns the IP address of the server.
2. Connection establishment: Once the IP address of the server is known, the
client establishes a TCP (Transmission Control Protocol) connection with the
server.
3. Request sending: The client sends an HTTP request to the server over the
established TCP connection.
4. Request processing: The server receives the request and processes it.
HTTP Request Cycle Steps (2)
5. Response generation: Once the server has processed the request and
located the requested resource, it generates an HTTP response containing
the requested resource.
6. Response transmission: The server sends the HTTP response back to the
client over the established TCP connection.
7. Rendering: The client receives the HTTP response and renders the content,
typically using HTML, CSS, and JavaScript.
8. Connection termination: Once the client has received the response, the
TCP connection is closed.
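The cycle above can be sketched end-to-end with Python's standard http.server and http.client modules; the server here is a made-up local stand-in so the example is self-contained (a real request would go to a remote server on port 80):

```python
import http.server
import http.client
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):                        # step 4: server processes the request
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)              # step 5: response generation
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)               # step 6: response transmission
    def log_message(self, *args):            # silence request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)  # step 2: TCP connection
conn.request("GET", "/index.html")           # step 3: request sending
response = conn.getresponse()                # step 7: client receives the response
html = response.read().decode()
conn.close()                                 # step 8: connection termination
server.shutdown()

print(response.status)  # 200
print(html)
```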
HTTPS (Hypertext Transfer Protocol Secure)
- This is a secure version of HTTP. All data sent and received via HTTPS is encrypted.
- This encryption follows a certain standard, which can be either SSL (Secure
Sockets Layer) or TLS (Transport Layer Security).
- It uses a digital certificate, which a website must obtain from a
certificate authority that checks whether the website is authentic and
trustworthy.
- Your browser will check and validate this certificate before accepting data
from the server.
HTTPS Request Cycle Steps (1)
1. DNS resolution: Client sends a DNS (Domain Name System) request to a DNS
server, which returns the IP address of the server.
2. Connection establishment: Once the IP address of the server is known, the
client establishes a TCP (Transmission Control Protocol) connection with the
server, which is then secured using the Transport Layer Security (TLS) or
Secure Sockets Layer (SSL) protocol.
HTTPS Request Cycle Steps (2)
3. Handshake: During the connection establishment phase, the client and server
perform a TLS/SSL handshake to negotiate the encryption algorithms and
exchange cryptographic keys that will be used to secure the communication
between them. The server also sends its digital certificate to the client during
this phase.
4. Certificate verification: The client (browser) checks the digital certificate sent
by the server to verify that it is valid and issued by a trusted certificate
authority (CA). This involves checking the certificate chain, verifying the digital
signature on the certificate, and checking the validity period of the certificate.
HTTPS Request Cycle Steps (3)
5. Request sending: Once the client has verified the server's digital certificate, it sends an
HTTP request to the server over the established secure TCP connection.
6. Request processing: The server receives the request and processes it.
7. Response generation: Once the server has processed the request and located the
requested resource, it generates an HTTP response containing the requested resource.
8. Response transmission: The server sends the HTTP response back to the client over the
established TCP connection.
9. Rendering: The client receives the HTTP response and renders the content, typically using
HTML, CSS, and JavaScript.
10. Connection termination: Once the client has received the response, the TCP connection
is closed.
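Python's standard ssl module shows the client-side defaults that correspond to step 4 (certificate verification): a default context both verifies the certificate chain against trusted CAs and checks that the certificate matches the hostname.

```python
import ssl

# The default client-side context used for HTTPS connections.
context = ssl.create_default_context()

print(context.check_hostname)                    # True: hostname must match the certificate
print(context.verify_mode == ssl.CERT_REQUIRED)  # True: a valid CA-signed certificate is required
```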
Ports
- Communication endpoint used by a network protocol to
identify a specific process or service running on a device.
- Ports are identified by numbers ranging from 0 to 65535,
and each number is associated with a specific protocol or
service.
- Examples:
- port 80 is commonly used for HTTP web traffic
- port 443 is used for HTTPS secure web traffic
- Important for allowing multiple network services to run
simultaneously on a device, each with its own unique
identifier to receive data
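As a toy sketch (the "services" here are plain Python functions, not real network listeners), ports let one device route incoming data to the right service:

```python
# Each port number maps to the service "listening" on it.
services = {
    80:  lambda: "HTTP web traffic",
    443: lambda: "HTTPS secure web traffic",
    25:  lambda: "SMTP email",
}

def deliver(port):
    """Route incoming data to the service on the given port, if any."""
    handler = services.get(port)
    return handler() if handler else "no service on this port"

print(deliver(80))    # "HTTP web traffic"
print(deliver(443))   # "HTTPS secure web traffic"
print(deliver(8080))  # "no service on this port"
```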
What is HTML (Hypertext Markup Language)?
- used to create web pages and other types of
documents that are intended for display in a
web browser
- used to structure content on web pages and
provides a way to add text, images,
multimedia, links, and other types of content
to a web page
- based on a set of tags that are used to
indicate how content such as headings,
paragraphs, lists, tables, and forms should be
displayed on the page
- Used in conjunction with CSS and Javascript
What is Client-Side Scripting?
- the process of using scripting or
programming languages to add
interactive or dynamic behavior to
web pages and web applications
- Client-side scripts (code) are
executed in the web browser
- Technologies used in
client-side scripts include HTML,
CSS, and JavaScript
- Javascript is the most commonly
used language to add interactivity
What is Server-Side Scripting?
- the process of using scripts or programming languages to generate web pages
on the server-side before sending them to the client (i.e., the user's web
browser)
- Scripts are typically written in languages like PHP, C#, Java, Ruby, etc.
Process
1) The user uses their web browser to request a web page from a web server
2) A script (some code) is executed on the server, which can access data in a
database, manipulate that or other data, and generate some
HTML/CSS/JavaScript
3) The generated output is returned to client-side to be displayed in the user’s
browser.
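A minimal sketch of the three-step process in Python (the database contents and function name are made up; real sites would typically use PHP or a web framework):

```python
# Step 1: the browser requests a page for a particular user.
# This dict stands in for a real database.
database = {"alice": {"visits": 3}}

def generate_page(username):
    """Step 2: runs on the server; reads the database and emits HTML."""
    visits = database[username]["visits"]
    return (f"<html><body><p>Welcome back, {username}! "
            f"Visit #{visits + 1}.</p></body></html>")

# Step 3: this generated HTML is what gets returned to the client's browser.
html = generate_page("alice")
print(html)
```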
CGI (Common Gateway Interface)
- A program that functions as an intermediary for executing server-side scripts
on web servers
- Sort of a middle-man, accepting an incoming request, executing the
corresponding script, and returning the result to the client
- Enables web servers to execute server-side scripts (in PHP, C#, etc.) to produce
dynamic content
- Largely been replaced by language-specific solutions for executing code on
web servers and returning the result to the user
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="description" content="My first webpage">
<meta name="author" content="Your Name">
<title>My Webpage</title>
</head>
<body>
<header>
<h1>Welcome to My Webpage</h1>
</header>
<main>
<p>This is my first webpage. I'm so excited!</p>
</main>
<footer>
<p>Copyright © 2022 My Webpage. All rights
reserved.</p>
</footer>
</body>
</html>
example_1.html
<!DOCTYPE html>
<html>
<head>
<title>My Form Example</title>
</head>
<body>
<h1>Enter Your Details</h1>
<form>
<label for="name">Name:</label>
<input type="text" id="name" name="name"><br>
<label for="email">Email:</label>
<input type="email" id="email" name="email"><br>
<label for="age">Age:</label>
<select id="age" name="age">
<option value="18">18</option>
<option value="19">19</option>
<option value="20">20</option>
<option value="21">21</option>
<option value="22">22</option>
</select><br>
<input type="submit" value="Submit">
</form>
</body>
</html>
// PHP/MySQL example; the connection details below are placeholders
$conn = new mysqli("servername", "username", "password", "database");
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}
Static vs. Dynamic Web Pages
Static Web Pages:
- No server-side processing
- Limited client-side processing, mostly basic scripts
- Minimal interactivity
- Fast loading times
- Easy to maintain, no need for frequent updates
Dynamic Web Pages:
- Server-side processing
- Rich client-side processing using JavaScript and AJAX
- High level of interactivity
- Slower loading times
- Requires more maintenance due to frequent updates and changes
How do search engines work? (1)
1) Crawling - the search engine's crawler scans the web, following links from page to page to
collect content
2) Indexing - Once the crawler has collected the content, it is organized and stored in a searchable
index, along with information about the page, such as the title, URL, and other metadata.
3) Ranking - When a user enters a search query, the search engine's algorithm retrieves the most
relevant pages from the index and ranks them based on a number of factors, including the
relevance of the content, the quality of the website, the popularity of the page, and other
factors.
How do search engines work? (2)
4) Deliver Results - The search engine returns a list of results based on the user's
search query, with the most relevant pages appearing at the top of the list.
5) Refinement - Users can refine their search results using a variety of tools,
including filters, sorting, and advanced search options.
- Search engines use complex algorithms to determine the relevance and quality
of content and to rank pages accordingly.
- These algorithms are constantly evolving to provide better search results and
to stay ahead of attempts to manipulate search rankings.
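The crawl/index/rank pipeline above can be sketched as a toy in Python (the pages, text, and link counts are all made up; real engines use far more signals):

```python
# Crawling would collect page content; here it is given directly.
pages = {
    "a.com": "web science and graph theory",
    "b.com": "cooking recipes",
    "c.com": "intro to web science",
}
links_to = {"a.com": 2, "b.com": 0, "c.com": 1}  # incoming-link counts

# Indexing: map each word to the set of pages containing it.
index = {}
for url, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(url)

def search(word):
    """Ranking: return matching pages, most-linked-to first."""
    matches = index.get(word, set())
    return sorted(matches, key=lambda u: links_to[u], reverse=True)

print(search("science"))  # ['a.com', 'c.com']
```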
Typical Search Engine Metrics
1) Relevance - the relevance of a web page to the user's query, taking into account factors such as
the title tag, headings, content, and meta descriptions.
2) Authority - the number and quality of links pointing to it. Pages with more high-quality links are
generally considered to be more authoritative.
3) User experience - factors such as page load time, mobile-friendliness, and ease of navigation
to determine the user experience of a web page.
4) Freshness - more recently updated pages are often given a higher ranking.
5) Engagement - how users engage with a web page, taking into account factors such as
click-through rate, bounce rate, and time spent on the page. Pages with high engagement are
generally considered to be more valuable to users and are given a higher ranking.
Metatags
- HTML tags that provide information about a web page to search engines and
other applications that may access the page.
- Typically placed in the head section of an HTML document and provide
information such as the page's title, description, keywords, author, and
encoding.
- help search engines and other web services to identify and categorize the
contents of a webpage
Metatags Example
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>My Website</title>
<meta name="description" content="This is a website about web
development">
<meta name="keywords" content="web development, HTML, CSS,
JavaScript">
<meta name="author" content="John Doe">
</head>
<body>
<!-- page content goes here -->
</body>
</html>
Surface Web
- Refers to content accessible by
standard search engines
- Includes publicly available content
viewable in web browsers
- Majority of content exists in the
deep web
Deep Web
- Refers to parts of the web not indexed by
standard search engines
- Includes private networks, dynamically
generated content, non-public databases,
etc.
- Includes password-protected pages
- Includes pages with no incoming links
Dark Web
- A subset of the deep web
intentionally hidden from search
engines
- Requires specialized software to
access, such as TOR, a web browser
designed to ensure anonymity
- Typically associated with illicit
activity
Search Engine Algorithms: PageRank
- Analyzes links between web pages to rank relevant web pages in terms of
importance
- A web page is considered more credible if other pages link to it
- Each web page is assigned a score between 0 and 1 based on the quantity and quality
of incoming links
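A minimal sketch of the PageRank iteration in Python, on a made-up four-page graph (`links` maps each page to the pages it links to; 0.85 is the commonly cited damping factor):

```python
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
damping = 0.85
rank = {p: 1 / len(pages) for p in pages}  # start with equal scores

for _ in range(50):  # iterate until the scores settle
    new_rank = {}
    for p in pages:
        # Sum the rank flowing in from every page that links to p; each
        # page splits its rank evenly among its outgoing links.
        incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new_rank[p] = (1 - damping) / len(pages) + damping * incoming
    rank = new_rank

best = max(rank, key=rank.get)
print(best)  # "C" — most pages link to C, so it scores highest
```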
Considerations
Pros:
- Increased convenience and efficiency
- Greater connectivity and information access
- Ability to adapt to user needs
- Potential for improving quality of life
- Enables seamless integration of technology into daily life
Cons:
- Privacy concerns
- Security risks
- Cost of implementing and maintaining technology
- Potential for technology dependence
- Potential for loss of human interaction
Cloud Computing
Pros:
- Scalability: The cloud offers scalability, meaning businesses can easily add or remove resources as needed.
- Cost savings: Cloud computing can offer cost savings as businesses only pay for what they use, instead of having to purchase and maintain their own infrastructure.
- Flexibility: Cloud computing offers flexibility, allowing businesses to work from anywhere with an internet connection.
Cons:
- Downtime: Because the cloud is hosted by a third party, businesses may experience downtime if the cloud service goes down.
- Security concerns: Storing data in the cloud can raise security concerns, as businesses have less control over their data and may be at risk of breaches.
- Dependency on third-party providers: Businesses may become overly reliant on third-party cloud providers, making it difficult to switch providers or move data back in-house.
Cloud Computing vs. Grid Computing
Grid Computing:
- relies on a distributed network of computers that are owned by different organizations
- more difficult to secure, since it relies on a decentralized network of computers that may have different security policies and configurations
Cloud Computing:
- relies on centralized data centers that are owned and managed by a single organization
- more secure, since it is managed by a single organization that can enforce consistent security policies
Interoperability
- Interoperability in the context of computing refers to the ability of different
systems, applications, devices, and services to work together seamlessly.
- Interoperable systems are compatible with each other, meaning they can
communicate and exchange data without any issues.
- Standardization: Interoperability often relies on the use of standardized
protocols, data formats, and interfaces to ensure that different systems can
communicate effectively.
- Flexibility: Interoperable systems are designed to be flexible and adaptable,
allowing them to work with a wide range of other systems and technologies.
Standards
- Established guidelines or specifications that determine how certain
technologies or processes should operate.
- Developed to ensure compatibility and interoperability between different
systems, applications, and devices.
- Examples: TCP/IP protocol for internet communication, the HTML and CSS
standards for web development, and the ISO 9001 standard for quality
management systems.
What are open standards?
- Open standards provide a publicly
available specification for a specified
task
- Agreed set of rules or methods that
allow interoperability between different
devices
- Everyone knows what the standards are
- Examples:
- HTML (Hypertext Markup Language)
- TCP/IP (Transmission Control Protocol/Internet
Protocol)
- CSS (Cascading Style Sheets)
Benefits of Open Standards
1) Interoperability - Various devices can communicate with each other
2) Encourages Innovation - no fees to make your devices compatible with others,
which means that more people can create new and unique, compatible
devices
3) Longevity - Open standards typically developed and maintained by a large,
diverse community - means they are likely to be used for a long time
4) Security - because the standards are open, more people can analyze them and
make recommendations to make software better and more secure
Standards vs. Protocols
- standard - document that specifies a set of requirements, specifications, or
guidelines that must be followed in order to achieve a certain level of quality
or compatibility.
- protocol - set of rules or guidelines that define how devices or systems
should communicate with each other
- Example: a standard might specify the physical characteristics of a network,
such as the types of cables and connectors that can be used, while a protocol
might define how devices on the network should encode and transmit data.
What is the ISO?
- The International Organization for
Standardization (ISO) is an independent,
non-governmental organization that
develops and publishes standards for
various industries, including computing.
- Develops standards for software and
hardware to ensure compatibility and
interoperability across different systems
and platforms
- Establishes standards for network
protocols, security, and communication
technologies
Creative Commons License
- A Creative Commons license is a type
of copyright license that allows
creators to share their work with
others while retaining some rights
over it.
- Used when creators want to make
their work available to the public but
still maintain control over how it is
used.
- Allows others to use and share the
work without seeking explicit
permission, as long as they follow the
terms of the license.
HL
Graph Theory + WWW
- Graph theory is a branch of mathematics used extensively in the context of the
World Wide Web (WWW)
- The Web can be represented as a graph, with web pages as nodes and
hyperlinks as edges between the nodes.
- Graph theory algorithms are used to analyze the structure and properties of
this web graph.
- Examples: PageRank & HITS Algorithms
Directed Graph (aka Web Graph)
- a mathematical representation of the
World Wide Web, where web pages are
represented as nodes and hyperlinks
as arrows.
- The arrows indicate the direction of
the link from one web page to
another.
- The Bowtie Model is a specific way to
represent the WWW as a Directed
Graph
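A directed web graph can be represented as an adjacency list; in this Python sketch (the pages are made up), each key is a page (node) and its list holds the pages it links to (arrows):

```python
web_graph = {
    "home.html":  ["about.html", "news.html"],
    "about.html": ["home.html"],
    "news.html":  [],
}

# Out-degree: number of links leaving a page.
out_degree = {page: len(targets) for page, targets in web_graph.items()}

# In-degree: number of links pointing at a page.
in_degree = {page: 0 for page in web_graph}
for targets in web_graph.values():
    for t in targets:
        in_degree[t] += 1

print(out_degree["home.html"])  # 2
print(in_degree["home.html"])   # 1
```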
Bowtie Model Components
1) The core: This includes the central and most densely connected part of the
web. It is made up of a small number of highly interconnected pages that form
the backbone of the web.
2) IN-Nodes: These are pages that link to the core but have no links
from the core back to themselves. They may be important resources or services, but
they are not central to the structure of the web.
3) OUT-Nodes: These are the pages that the core links to, but that do not link
back to the core. They may include sites that are dependent on the core for
traffic, such as e-commerce or news sites.
Bowtie Model
4) Tendrils: These are the pages that can be reached from the in- or
out-components, but cannot reach the core. They may be small, specialized
sites or pages that are not well connected to the rest of the web.
5) Tubes: These are the pages that link the different components together. They
may include directories, search engines, and other pages that provide
navigation and connections between different parts of the web.
Sub-Graph
- A sub-graph is a subset of a larger
graph that contains only a portion
of its nodes and edges.
- It can be created by selecting a
specific set of nodes and edges
from the larger graph and isolating
them into a smaller graph.
- Sub-graphs are commonly used in
graph theory and network analysis
to simplify complex graphs and to
focus on specific areas of interest
within a larger network.
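Extracting a sub-graph can be sketched in Python (the graph is made up): keep only a chosen set of nodes, along with the edges whose endpoints both survive:

```python
graph = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["A"],
    "D": [],
}

def sub_graph(graph, keep):
    """Return the sub-graph induced by the nodes in `keep`."""
    return {n: [t for t in targets if t in keep]
            for n, targets in graph.items() if n in keep}

small = sub_graph(graph, {"A", "B", "C"})
print(small)  # {'A': ['B', 'C'], 'B': ['C'], 'C': ['A']} — the edge B->D is dropped
```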
Ambient Intelligence
- refers to the integration of technology in the environment to create
smart spaces that can interact with humans in a natural and intuitive
way.
- involves the use of sensors, wireless networks, and other technologies to
create smart environments that can monitor and respond to changes in
the user's environment, such as their location, activities, and social
context.
- Example: a smart home system that uses sensors, machine learning
algorithms, and other technologies to adjust lighting, temperature, and
other environmental factors based on user behavior and preferences.
Collective Intelligence
- refers to the ability of a group of individuals to collaborate and pool their knowledge,
skills, and experiences to solve problems, make decisions, or create new ideas.
- based on the idea that the intelligence of a group can be greater than that of any
individual member, and that through collaboration and communication, groups can
achieve outcomes that are superior to those of even the most knowledgeable
individuals.
- Collective intelligence can be facilitated by technology, such as social media platforms,
wikis, or crowdsourcing tools, which allow groups to share and collaborate on
information in real-time.
- Example: Wikipedia
Power Law Distribution (“Power Laws”) + WWW
- Power laws describe a relationship
between two variables where one
variable's frequency or magnitude is
inversely proportional to its rank or
size
- power laws describe the distribution
of links to web pages, where a small
number of pages (known as hubs)
have a disproportionately large
number of links pointing to them,
while the majority of pages have
relatively few links.
Power laws have been used to explain the structure of the World Wide Web, which
has a highly skewed distribution of incoming and outgoing links among its pages. The
power law distribution suggests that there are a few highly connected nodes (known
as "hubs" or "super-nodes") that are much more connected than most of the other
nodes in the network. This is often referred to as the "rich get richer" effect.
In the context of the WWW, power laws help explain why some websites become
incredibly popular and attract large numbers of incoming links, while most websites
have relatively few links. This is because sites that are already popular and
highly connected are more likely to attract new incoming links, increasing
their popularity and connectivity in a self-reinforcing cycle. This
leads to a highly skewed distribution of incoming links and creates the bowtie shape
of the web as described by the Bowtie Model.
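The "rich get richer" effect can be sketched with a toy preferential-attachment simulation in Python (all parameters are arbitrary): each new page links to an existing page chosen with probability proportional to how many links that page already has.

```python
import random

random.seed(42)     # fixed seed so the run is repeatable
in_links = {0: 1}   # page 0 starts with one link

for new_page in range(1, 2000):
    # Pick a target, weighted by its current number of incoming links.
    targets = list(in_links)
    weights = [in_links[t] for t in targets]
    target = random.choices(targets, weights=weights)[0]
    in_links[target] += 1
    in_links[new_page] = 1  # every new page also starts with one link

counts = sorted(in_links.values(), reverse=True)
# A few "hub" pages end up with far more links than the typical page.
print(counts[0], counts[len(counts) // 2])
```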