
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, Pilani

Pilani Campus
AUGS/AGSR Division

SECOND SEMESTER 2021-2022


Course Handout (Part II)
Date: 15 Jan 2022

In addition to Part-I (General handout for all courses appended to the timetable) this portion gives
further specific details regarding the course:

COURSE NO. : CS F303


COURSE TITLE : COMPUTER NETWORKS
INSTRUCTOR-In-Charge : VIRENDRA SINGH SHEKHAWAT
E-mail: vsshekhawat@pilani.bits-pilani.ac.in
Instructor(s) : Ashutosh Bhatia
E-mail: ashutosh.bhatia@pilani.bits-pilani.ac.in

Course page: http://nalanda-aws.bits-pilani.ac.in and Microsoft Teams

Scope and Objectives


This course gives you a breakdown of the applications, communication protocols, and network services
that make a computer network work. We will closely follow the top-down approach to computer networking
given in the textbook: you first study the most visible part, i.e., the applications, and then see,
progressively, how each layer is supported by the next layer down. Most of the time our example
network will be the Internet. A chapter on wireless and mobile networks is also covered, since
users now access the Internet wirelessly from offices, from homes, on the move, and from public places.
Laboratory sessions provide practical skills using a network simulator (NS-2), a network protocol
analyzer (Wireshark), and TCP/IP socket programming.

TEXT BOOK
[T1] James F. Kurose, and Keith W. Ross: Computer Networking: A Top-Down Approach Featuring the
Internet, Sixth Edition, Pearson Education, India, 2017. (Fifth Edition is also fine)

[T2] L. Peterson and B. Davie, Computer Networks: A Systems Approach, Fifth Edition, Elsevier, 2012

REFERENCE BOOKS
[R1] Andrew S. Tanenbaum & David J. Wetherall: Computer Networks, 5th Edition, Pearson, New Delhi,
2014.
[R2] Douglas E. Comer: Hands-on Networking, Pearson, New Delhi, 2015.
[R3] W. R. Stevens, UNIX Network Programming, Vol I, Networking APIs: Sockets and XTI, Pearson
Education, 3rd Edition.

Please Do Not Print Unless Necessary



Modules: Topics and Learning Objectives

M1. Internet Architecture and Computer Network Primitives
Topics: Overview of computer network building blocks, Internet architecture, protocol layers
Learning objectives:
• To know about elements of computer network design
• To understand the Internet Design Philosophy and layered architecture

M2. Network Applications (Application Layer)
Topics: Principles of network applications (e.g., HTTP, FTP, e-mail, P2P, DNS etc.), Creating network applications using socket programming
Learning objectives:
• To understand working of various network applications
• To learn network application creation process using socket programming

M3. End to End Data Transfer (Transport Layer)
Topics: Data transport services: Connectionless (UDP), Connection oriented (TCP), Reliable data transfer protocol design, Congestion control and resource allocation principles, TCP congestion control and performance measurement
Learning objectives:
• To understand end-to-end data transfer mechanism used in the Internet
• To understand congestion control and resource allocation principles used in the Internet on end-to-end basis

M4. Data Routing and Forwarding (Network Layer)
Topics: IP addressing (IPv4 and IPv6) for host and network devices, Network segmentation using subnets, IP Routing algorithms and protocols to move datagrams in the Internet (One to one, One to all, One to many)
Learning objectives:
• To understand how to assign addresses to the communicating nodes in the IP network
• To understand IP addressing mechanism to segregate a network into multiple subnetworks for scalability
• To understand data routing and forwarding mechanisms used in the Internet

M5. Access Networks & LANs (Link Layer)
Topics: Hop by Hop data transmission using link layer frames, Multiple access links and protocols: Point-to-Point and Broadcast link (LANs), Node addressing in switched LANs (Ethernet), Link Virtualization (MPLS)
Learning objectives:
• To understand how data moves from one hop to another hop between two end points
• To learn about local area network design and performance issues
• To understand different channel access protocols

M6. Wireless and Mobile Networks
Topics: Wireless links and network characteristics, Wi-Fi (802.11) networks, Node mobility management in wireless networks (Mobile IP)
Learning objectives:
• To understand the challenges faced by IP network due to mobile communicating nodes
• To understand wireless network access in IP networks


PLAN OF STUDY
Lect. Topics References
No.
M1: Internet Architecture and Computer Network Primitives
1-3 Internet Architecture, Network Hardware: The Network Edge, The Network Core, T1: 1.1 – 1.5
ISPs and Internet Backbones, Delay, Loss and Throughput in Packet Switched
Networks, Protocol Layers and their Service Models (TCP/IP)

M2: Network Applications (Application Layer)


4-5 Principles of Network Applications, Hypertext Transfer Protocol (HTTP): T1: 2.1 – 2.3
Persistent vs. Non-persistent connections, Cookies, Web Caching, File Transfer
Protocol: FTP
6-7 Mail Transfer Protocols (SMTP, POP3, IMAP), HTTP 1.0 and HTTP 2.0, The T1: 2.4 – 2.5
Internet Directory: Domain Name Systems (DNS), DNS services,
8-9 Peer to Peer (P2P) File distribution: BitTorrent, Distributed Hash Tables (DHTs) T1: 2.6
M3: End to End Data Transfer (Transport Layer)
10-12 Transport layer services: Connection oriented vs. Connectionless, Multiplexing, T1: 3.1 – 3.4
De-multiplexing, UDP, Principles of Reliable Data Transfer (Go-Back-N, and
Selective Repeat).
13-14 Introduction to Socket Programming; TCP, UDP, Creating simple Client Server T1: 2.7
Applications
15-18 Connection oriented transport using TCP: TCP connection management, RTT T1: 3.5 – 3.7
Estimation and Retransmission Timeout, TCP Flow Control. TCP Error Control
and Congestion control algorithms (Slow start, Congestion avoidance, Fast
Recovery, Fast Retransmit), TCP Fairness
M4: Data Routing and Forwarding (Network Layer)
19-21 Virtual Circuits Networks vs. Datagram Networks, Internal Architecture of T1: 4.1 – 4.4
Router, Forwarding and Addressing in the Internet (IP). IPv4 Addressing, Internet
Control Message Protocol (ICMP), IPv6 Addressing
22-24 Routing Algorithms: Shortest Path Routing, Flooding, Link State, Distance T1: 4.5
Vector, and Hierarchical Routing
25-27 Routing in the Internet: Intra-domain routing (RIP, OSPF), Inter-domain routing T1: 4.6 – 4.7
(BGP): BGP policy and attributes, Multicast routing algorithms: Source based
multicast tree vs. group based multicast tree, IP Multicast routing (DVMRP,
IGMP)
M5: Access Networks & LANs (Link Layer)
28-29 Services, Error Detection and Correction Techniques (Parity Checks, Checksums, T1: 5.1 – 5.2
CRC).
30-31 Multiple Access Protocol: TDM, FDM, Slotted ALOHA, Pure ALOHA, CSMA, T1: 5.3

CSMA/CD
32-34 Local Area Networks, Link Layer addressing: MAC addresses, Address T1: 5.4
Resolution Protocol (ARP), Dynamic Host Configuration Protocol (DHCP), Ethernet,
Link Layer switches, Virtual Local Area Networks (VLANs)
35 Link Virtualization: Multi-Protocol Label Switching (MPLS) T1: 5.5
36-37 The theoretical basis for data communication (Bandwidth Limited Signals, R1: 2.1,
Maximum Data Rate of a Channel), Guided physical media. Line coding Class Notes
Schemes: NRZ, RZ, Manchester, Differential Manchester.
M6: Wireless and Mobile Networks
38-40 Wireless Links and Network Characteristics, Wi-Fi: 802.11 Wireless LAN T1: 6.2 – 6.6
Architecture and Protocol, Mobility management: addressing and routing, Mobile
IP

EVALUATION SCHEME

S. No. | Component | Duration | Weightage | Date and Time | Nature of component
1. | Quiz (2 nos) | TBA | 20% | TBA | TBA
2. | Mid Semester Test | 1.5 hrs | 25% | TBA | TBA
3. | Lab Test/Assignment(s) | TBA | 20% | TBA | TBA
4. | Comprehensive Exam | 3 hrs | 35% | 19-05-2022 | TBA

Notices: All course notices will be displayed on the NALANDA LMS/Microsoft Teams
Make-up Policy: Make-ups shall be allowed only in genuine cases, on a case-by-case basis. Prior
permission from the I/C is a must.
Chamber Consultation Hour: Monday, Wednesday @ 5:00 PM – 6:00 PM
Instructor-In-Charge
CS F303



Computer Networks (CS F303)
BITS Pilani Virendra Singh Shekhawat
Department of Computer Science and Information Systems
Pilani Campus
Today’s Agenda

• Course Overview
• Course Administration
• What is a network?
• What is the Internet?
• Network Structure
– Edge, Access Network (Physical Media), Network Core
• Circuit Switching and Packet Switching

2
Computer Networks CS F303 BITS Pilani, Pilani Campus
Course Objective

• To get familiar with the principles and working of state-of-the-art networking
– Routing, Transport protocols, addressing, naming etc.
– Design of network and services
• Learn how communication networks are put together
–Mechanisms, Algorithms, Technology components
• To understand network internals in a hands-on way
– Writing simple network applications, understanding and analyzing
working principles of protocols
Course Overview

• Internet Architecture and Computer Network Primitives


• Network Applications (Application Layer)
• End to End Data Transfer (Transport Layer)
• Data Routing and Forwarding (Network Layer)
• Access Networks & LANs (Link Layer)
• Communication Channels (Physical Layer)
• Wireless and Mobile Networks

Course Administration
• Instruction delivery
– Lecture classes
• 12:00 – 12:50 pm [Tue, Th] and 5:00 – 5:50 PM [Fri]
– Lab classes
• Start from the first week of Feb (detail will be posted on MS Teams)
• Course page Information
– Lectures and course material will be available at MS Teams
– For assessments NALANDA will be used (https://nalanda-aws.bits-pilani.ac.in)
• Evaluation Plan
– Mid Semester Test @25%
– Quiz (Two) @20% [10% each]
– Lab Assessment @20%
– Comprehensive exam @35%
What is a Network?

• An infrastructure (shared) that allows users


(distributed) to communicate with each
other
– People, devices, …
– By means of voice, video, text, …
– ex., Telephone n/w, Cable TV Network,
Satellite network, military n/w etc. …

• Basic building blocks are


– Nodes (Hosts and Forwarding nodes) and Links


What is Internet?

• The Internet is a network of networks…
– Interconnected Networks → Internet


How Internet is different from other
Networks?
• Enable communication between diverse applications on
diverse devices over diverse infrastructure

Network Structure

• Network edge: applications and hosts
• Network core: interconnected routers
• Access networks: the network that physically connects a host to the rest of the network
• Physical media: wired, wireless communication links
Physical Media-Guided

• Twisted pair
– Two insulated copper wires
– Transmission rates supported are 100 Mbps, 1 Gbps, 10 Gbps
• Coaxial Cable
– Two concentric copper conductors
– Multiple channels on cable
• Fiber Optic Cable
– Glass fiber carrying light pulses, each pulse a bit
– High speed operation (10 Gbps to 100 Gbps)
– Low error rate
Physical Media-Unguided

• Radio link types:
– Terrestrial microwave
• e.g., up to 45 Mbps channels
– LAN (e.g., WiFi)
• 11 Mbps, 54 Mbps
– Wide-area (e.g., cellular)
• 3G cellular: ~ few Mbps
• 4G cellular: ~100 Mbps
– Satellite
• Kbps to 45 Mbps channel (or multiple smaller channels)
[Figure: Access networks example]
Internet structure: network of networks

Question: given millions of access ISPs, how to connect them together?

• Option: connect each access ISP to every other access ISP?
– Connecting each access ISP to every other directly doesn’t scale: O(N^2) connections.
• Option: connect each access ISP to a global transit ISP?
– Customer and provider ISPs have an economic agreement.
• A single global ISP does not scale; there are multiple global ISPs (ISP A, ISP B, ISP C)
• Multiple global ISPs must be interconnected
– Internet exchange points (IXPs) and peering links
• … and regional networks may arise to connect access nets to ISPs
• … and content provider networks (e.g., Google, Microsoft, Akamai) may run their own network, to bring services and content close to end users

[Figure: Tier 1 ISPs and a content provider network (Google) at the top, IXPs between them, Regional ISPs below, and access ISPs at the bottom]

• At center: small # of well-connected large networks
– “tier-1” commercial ISPs (e.g., Level 3, Sprint, AT&T, NTT), national & international coverage
– content provider network (e.g., Google): private network that connects its data centers to the
Internet, often bypassing tier-1 and regional ISPs


Tier-1 ISP: e.g., Sprint

[Figure: Sprint backbone. Each POP (point-of-presence) has links to/from the backbone, peering links, and links to/from customers]


The Network Core

• Mesh of interconnected routers
• How is data transferred through the network?
– Circuit switching: dedicated circuit per call, e.g., telephone network
– Packet switching: data sent through the network in discrete “chunks”, e.g., Internet
Network Core: Circuit Switching

End-to-end resources reserved for “call”
• Dedicated resources: no sharing
• Circuit-like (guaranteed) performance
• Call setup required
• Link bandwidth is divided into “pieces”
– Frequency division
– Time division
Circuit Switching: FDM and TDM
[Figure: FDM and TDM example with 4 users; axes: frequency vs. time]
Circuit Switch: Numerical example

• How long does it take to send a file of 640,000 bits from host A to host B
over a circuit-switched network?
– All links are 1.536 Mbps
– Each link uses TDM with 24 slots
– 500 msec to establish end-to-end circuit
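The numerical example can be worked as a short sketch. It assumes each circuit gets one of the 24 TDM slots, so its share of the link is 1/24 of the link rate; the function and parameter names are illustrative only.

```python
def circuit_transfer_time_s(file_bits, link_bps, tdm_slots, setup_s):
    """Time to send a file over one TDM circuit, including circuit setup."""
    # Assumption: the circuit owns exactly one slot, i.e. 1/tdm_slots of the link rate.
    circuit_bps = link_bps / tdm_slots
    return setup_s + file_bits / circuit_bps

# Values from the example: 640,000 bits, 1.536 Mbps links, 24 slots, 500 msec setup.
total_s = circuit_transfer_time_s(
    file_bits=640_000, link_bps=1_536_000, tdm_slots=24, setup_s=0.5
)
print(total_s, "seconds")
```

Under these assumptions the circuit rate is 64 kbps, so the file takes 10 s to transmit plus 0.5 s of setup. The number of links on the path does not appear, because a circuit has no store-and-forward delay at intermediate switches.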

Network Core: Packet Switching

Host sending function:
• Takes application message
• Breaks it into smaller chunks, known as packets, of length L bits
• Transmits packets into the access network at transmission rate R (aka bandwidth)
• Store and forward

Transmission delay = L (bits) / R (bits/sec)

[Figure: host sending two packets, L bits each, over a link with transmission rate R]
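A minimal sketch of the transmission delay formula L/R. The store-and-forward helper assumes one packet crossing several links of equal rate and ignores propagation, processing, and queueing delay; the packet size and link rate in the example are made-up values.

```python
def transmission_delay_s(packet_bits, rate_bps):
    """Time to push all L bits of a packet onto a link of rate R."""
    return packet_bits / rate_bps

def store_and_forward_delay_s(packet_bits, rate_bps, num_links):
    """One packet over num_links equal-rate links.

    Each router must receive the whole packet before forwarding it,
    so the packet is transmitted once per link.
    """
    return num_links * transmission_delay_s(packet_bits, rate_bps)

# Hypothetical values: an 8000-bit packet on 2 Mbps links.
one_hop_s = transmission_delay_s(8_000, 2_000_000)
three_hops_s = store_and_forward_delay_s(8_000, 2_000_000, num_links=3)
print(one_hop_s, three_hops_s)
```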


End-to-End Delay

[Figure: packet path from A to B; the four delay components are nodal processing, queueing, transmission, and propagation]
Caravan Analogy [.1]
[Figure: ten-car caravan at the first toll booth; the second toll booth is 100 km further along the road]
• Cars “propagate” at 100 km/hr
• Toll booth takes 12 sec to service a car (car transmission time)
• Car is analogous to bit; caravan is analogous to packet
• Question:
– How long until caravan is lined up before 2nd toll booth?
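One way to work the question out, as a sketch: the toll booth "transmits" the ten cars one after another, and the last car then "propagates" 100 km to the second booth. The function name and parameters are illustrative, not from the slides.

```python
def caravan_lineup_time_s(cars, service_s_per_car, distance_km, speed_kmh):
    """Seconds until the whole caravan is lined up before the next toll booth."""
    # "Transmission" time: the booth services the cars one at a time.
    push_time_s = cars * service_s_per_car
    # "Propagation" time: the last car drives to the next booth.
    travel_time_s = distance_km * 3600 / speed_kmh
    return push_time_s + travel_time_s

total_s = caravan_lineup_time_s(cars=10, service_s_per_car=12,
                                distance_km=100, speed_kmh=100)
print(total_s / 60, "minutes")
```

With the slide's numbers this is 120 s at the booth plus 3600 s on the road, i.e. 62 minutes.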

Caravan analogy [..2]
[Figure: ten-car caravan at the first toll booth; the second toll booth is 100 km further along the road]

• Cars now “propagate” at 1000 km/hr


• Toll booth now takes 1 min to service a car
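With these new numbers, a natural question is whether the first car reaches the second booth before the whole caravan has left the first one. The sketch below checks this; the helper names are illustrative and the question itself is an assumption about what the slide is asking.

```python
def first_car_arrival_min(service_min_per_car, distance_km, speed_kmh):
    """Minutes until the first car reaches the second toll booth."""
    travel_min = distance_km * 60 / speed_kmh
    return service_min_per_car + travel_min

def cars_still_at_first_booth(cars, service_min_per_car, at_time_min):
    """Cars not yet fully serviced by the first booth at the given time."""
    serviced = min(cars, int(at_time_min // service_min_per_car))
    return cars - serviced

arrival_min = first_car_arrival_min(service_min_per_car=1,
                                    distance_km=100, speed_kmh=1000)
waiting = cars_still_at_first_booth(cars=10, service_min_per_car=1,
                                    at_time_min=arrival_min)
print(arrival_min, waiting)
```

The first car arrives after 7 minutes while three cars are still at the first booth. In network terms, the first bits of a packet can reach the next router before the sender has finished transmitting the packet.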

Queuing Delay
• R=link bandwidth (bps)
• L=packet length (bits)
• a=average packet arrival rate

traffic intensity = La/R

• La/R > 1: more “work” arriving than can be serviced, average delay infinite!
• La/R → 1: delays become large
• La/R ~ 0: average queueing delay small
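A small sketch of the traffic intensity computation. The packet length, arrival rates, and link bandwidth in the example are made-up values chosen only to land in the three regimes listed above.

```python
def traffic_intensity(packet_bits, arrival_rate_pps, link_bps):
    """Traffic intensity La/R: bits arriving per second over bits served per second."""
    return packet_bits * arrival_rate_pps / link_bps

# Hypothetical link: R = 1 Mbps, L = 1000 bits, three arrival rates (packets/sec).
light = traffic_intensity(1_000, 10, 1_000_000)       # close to 0
heavy = traffic_intensity(1_000, 990, 1_000_000)      # approaching 1
overload = traffic_intensity(1_000, 1_200, 1_000_000) # above 1
print(light, heavy, overload)
```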
“Real” Internet delays and routes

• What do “real” Internet delay & loss look like?


• Traceroute program: provides delay measurement from source to router along end-to-end
Internet path towards destination. For all i:
– Sends three packets that will reach router i on path towards destination
– Router i will return packets to sender
– Sender times interval between transmission and reply.
– Read RFC 1393 for more detail !!!

• http://traceroute.org

[Figure: source sends 3 probes to each router along the path]
Packet switching versus circuit switching

Packet switching allows more users to use network!


Example:
• 1 Mb/s link
• Each user:
– 100 kb/s when “active”
– active 10% of time

• Circuit switching:
– How many users are supported?
• Packet switching:
– With 35 users, the probability that more than 10 are active at the same time is approximately 0.0004

Exercise: How did we get the value 0.0004?

[Figure: N users sharing a 1 Mbps link]
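The exercise can be checked with a binomial calculation. This sketch assumes the users are independent and each is active with probability 0.1; the function name is illustrative.

```python
from math import comb

def prob_more_than(n_users, p_active, k):
    """P(more than k of n_users are active), users independent (binomial model)."""
    return sum(
        comb(n_users, i) * p_active**i * (1 - p_active) ** (n_users - i)
        for i in range(k + 1, n_users + 1)
    )

# Circuit switching: each user reserves 100 kb/s of the 1 Mb/s link.
circuit_users = 1_000_000 // 100_000

# Packet switching: 35 users, trouble only when more than 10 are active at once.
p_overload = prob_more_than(n_users=35, p_active=0.1, k=10)
print(circuit_users, p_overload)
```

Circuit switching supports 10 users, while packet switching carries 35 users with only a small chance that demand exceeds the link rate.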


Performance Measure Parameters of
Networks
• Delay

• Packet Loss

• Throughput
– Amount of bits transferred in a unit time
• Instantaneous throughput
– e.g., P2P file sharing applications displays instantaneous throughput during downloads
• Average throughput
Layered (Modular) Network Model (OSI)

Each layer performs specific operations


Implementation of a layer can change by keeping interfaces intact

Layering of Airline Functionality

[Figure: airline functionality in layers, from the departure airport through intermediate air-traffic control centers to the arrival airport]
• ticket: purchase (departure), complain (arrival)
• baggage: check (departure), claim (arrival)
• gate: load (departure), unload (arrival)
• runway (takeoff/landing): takeoff (departure), land (arrival)
• airplane routing: at the departure airport, the intermediate air-traffic control centers, and the arrival airport

Layers: Each layer implements a service
– Via its own internal-layer actions
– Relying on services provided by the layer below
Internet Hourglass Architecture

• Need to interconnect many existing networks
• Hide underlying technology from applications
• Decisions:
– Network provides minimal functionality
– “Narrow waist”
– Best Effort Service
– Tradeoff: no assumptions, no guarantees

[Figure: hourglass. Applications at the top (email, WWW, phone, ...) over SMTP, HTTP, RTP, ..., then TCP, UDP, ...; IP at the narrow waist; below it ethernet, PPP, ..., then CSMA, async, sonet, ..., and the technology at the bottom (copper, fiber, radio, ...)]
Encapsulation

[Figure: a message travels from the source to the destination through a switch (link, physical) and a router (network, link, physical)]
• application: message M
• transport: segment Ht M
• network: datagram Hn Ht M
• link: frame Hl Hn Ht M
• physical
Summary

• Network and its components


• Internet Structure
• Internet Core
– Packet Switching
– Circuit Switching
• Network Delays
• Network performance measure parameters
• Layered architecture of the Internet


Computer Networks (CS F303)
BITS Pilani Virendra Singh Shekhawat
Department of Computer Science and Information Systems
Pilani Campus
What is a Network Application?

• Programs that run on different end systems and communicate over a network
– e.g., Web: Web server software communicates with browser software
• Network core devices do not run user application code
• Applications on end systems allow for rapid application development

[Figure: protocol stack (application, transport, network, data link, physical) on each end system]
Application architectures

• Client-server
• Peer-to-Peer (P2P)
• Hybrid of client-server and P2P

Client-Server Architecture

Server:
– “always-on” host
– Permanent IP address
– For scaling, data center is used to create
large powerful virtual server

Clients:
– Communicate with server
– May be intermittently connected
– May have dynamic IP addresses
– Clients do not communicate directly
with each other
Pure P2P Architecture

• No “always-on” server
• Arbitrary end systems directly
communicate
• Peers are intermittently connected and change IP addresses
– example: Freenet and BitTorrent (File Sharing
Apps)

Highly scalable but difficult to manage!!!

Hybrid of client-server and P2P

Skype
– Internet telephony application
– Finding address of remote party: centralized server(s)
– Client-client connection is direct (not through server)

Instant messaging
– Chatting between two users is P2P
– Presence detection/location centralized:
• User registers its IP address with central server when it comes online
• User contacts central server to find IP addresses of buddies

How Network Applications
Communicate?
• A process sends/receives messages to/from its socket
– The socket is the interface between the application layer and the transport layer within the host
• Within the same host, two processes communicate using inter-process communication
• Processes in different hosts communicate by exchanging messages

[Figure: on each host or server, a process (controlled by the app developer) uses a socket on top of TCP with buffers and variables (controlled by the OS); the hosts are connected through the Internet]
How to identify a process running on a
machine?
• To receive messages, a process must have an identifier
• The IP address of the host on which the process runs is not sufficient for identifying the process. Why?
• Process identifier = IP address + port number
– e.g., HTTP server: 80, Mail server (SMTP): 25
– List of well-known port numbers is available at http://www.iana.org

[Figure: processes P1 to P4 on two hosts or servers, each with its own socket on top of TCP with buffers and variables, connected through the Internet]
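As a small illustration of why the port number is needed, the sketch below creates two UDP sockets on the same host. They share one IP address (the loopback address is used here as an assumption, so the example stays on one machine) and are told apart only by their port numbers, which the OS picks because port 0 is requested.

```python
import socket

def make_udp_endpoint(host="127.0.0.1"):
    """Create a UDP socket bound to host; port 0 asks the OS for any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))
    sock.settimeout(2.0)  # avoid blocking forever if the datagram is lost
    return sock

p1 = make_udp_endpoint()
p2 = make_udp_endpoint()
addr1 = p1.getsockname()  # (IP address, port) identifying the first socket
addr2 = p2.getsockname()  # same IP address, different port

# The sender addresses the receiving socket by IP address and port number.
p1.sendto(b"hello P2", addr2)
data, sender = p2.recvfrom(1024)
print(addr1, addr2, data, sender)

p1.close()
p2.close()
```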
What transport service does an app need?

• Data loss
– Some apps (e.g., audio, video) can tolerate some loss
– Other apps (e.g., file transfer, telnet) require 100% reliable data transfer
• Bandwidth
– Some apps (e.g., multimedia) require minimum amount of bandwidth to be
“effective”
– Other apps (“elastic apps”) make use of whatever bandwidth they get
– ex. E-mail, File Transfer
• Timing
– Some apps (e.g., Internet telephony, interactive games) require low delay to be
“effective”
Web and HTTP [1994]
Web page consists of objects
• Object can be HTML file, JPEG image, Java applet,
audio file,…
• Web page consists of base HTML-file which
includes several referenced objects
• Each object is addressable by a URL
• Example URLs:
https://www.bits-pilani.ac.in/pilani/computerscience/ProgrammesOffered
https://www.bits-pilani.ac.in/pilani/computerscience/Faculty

HTTP Overview [.1]

• Types of messages exchanged
– e.g., request, response
• Message syntax:
– What fields are in messages & how fields are delineated
• Message semantics
– Meaning of information in fields
• Rules for when and how processes send & respond to messages

[Figure: a PC running the Firefox browser and an iPhone running the Safari browser communicate with a server running the Apache Web server]
HTTP Overview [..2]

Uses TCP:
• Client initiates TCP connection (creates socket) to server at port 80
• Server accepts TCP connection from client
• HTTP messages exchanged between browser (HTTP client) and Web server (HTTP server)
• TCP connection closed

[Figure: client/server timeline. One RTT to initiate the TCP connection, one RTT to request the file, then the time to transmit the file until the file is received]
HTTP Request Message

• Request line (GET, POST, HEAD commands), followed by header lines
• Each line ends with a carriage return character (\r) and a line-feed character (\n)
• A carriage return and line feed at the start of a line indicates the end of the header lines

GET /index.html HTTP/1.1\r\n
Host: www-net.cs.umass.edu\r\n
User-Agent: Firefox/3.6.10\r\n
Accept: text/html,application/xhtml+xml\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
\r\n
data data data data data ...
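The request format can be sketched in a few lines of string handling. This is an illustration of the message layout only, not a full HTTP implementation; the helper names `build_request` and `parse_request` are made up for this example.

```python
CRLF = "\r\n"

def build_request(method, path, headers, version="HTTP/1.1"):
    """Assemble a request line and header lines, ending with an empty line."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join(lines) + CRLF + CRLF

def parse_request(raw):
    """Split a request into (method, path, version, headers, body)."""
    # The first empty line (CRLF at the start of a line) ends the header lines.
    head, _, body = raw.partition(CRLF + CRLF)
    request_line, *header_lines = head.split(CRLF)
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, path, version, headers, body

raw = build_request("GET", "/index.html",
                    {"Host": "www-net.cs.umass.edu", "Connection": "keep-alive"})
parsed = parse_request(raw)
print(repr(raw))
```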
Response Message
• Status line (protocol, status code, status phrase), followed by header lines
• After the empty line: data, e.g., the requested HTML file

HTTP/1.1 200 OK\r\n
Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n
Server: Apache/2.0.52 (CentOS)\r\n
Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n
ETag: "17dc6-a5c-bf716880"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 2652\r\n
Keep-Alive: timeout=10, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: text/html; charset=ISO-8859-1\r\n
\r\n
data data data data data ...
HTTP Response status Codes
200 OK
– request succeeded, requested object later in this msg
301 Moved Permanently
– requested object moved, new location specified later in this msg (Location:)
400 Bad Request
– request msg not understood by server
404 Not Found
– requested document not found on this server
505 HTTP Version Not Supported
– the HTTP version used in the request is not supported by the server.
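Python's standard library already knows these codes, which gives a quick way to look up a status phrase. The `status_class` helper is an illustrative addition that groups codes by their first digit.

```python
from http import HTTPStatus

def status_class(code):
    """Coarse meaning of a status code, decided by its first digit."""
    classes = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    return classes.get(code // 100, "other")

# The codes listed above, with the phrase the standard library reports for each.
codes = [200, 301, 400, 404, 505]
phrases = {code: HTTPStatus(code).phrase for code in codes}

for code in codes:
    print(code, phrases[code], "->", status_class(code))
```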

How is a Web page transferred?

• Let’s assume a web page consists of a base HTML file and 5 JPEG images.
– https://www.bits-pilani.ac.in/Pilani/SustainableEnvironment

HTTP Connections

Non-persistent HTTP
• At most one object is sent over a TCP connection
• HTTP/1.0 uses non-persistent HTTP

Persistent HTTP
• Multiple objects can be sent over single TCP
connection between client and server.
• Persistent with Pipeline vs. Persistent without
Pipeline
• HTTP/1.1 uses persistent connections in default
mode
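To compare the connection types, here is a rough count of round-trip times for a page with a base HTML file and 5 JPEG images (6 objects). It is a sketch under simplifying assumptions: object transmission time is ignored, non-persistent connections are opened one after another rather than in parallel, and a new TCP connection costs one RTT before the request can be sent.

```python
def non_persistent_rtts(num_objects):
    """One TCP handshake RTT plus one request/response RTT per object."""
    return 2 * num_objects

def persistent_rtts(num_objects, pipelined):
    """One handshake RTT, then requests over the same TCP connection."""
    if num_objects == 0:
        return 0
    if pipelined:
        # Base file first, then all referenced objects requested back to back.
        referenced = 1 if num_objects > 1 else 0
        return 1 + 1 + referenced
    return 1 + num_objects

objects = 1 + 5  # base HTML file + 5 JPEG images
print(non_persistent_rtts(objects),
      persistent_rtts(objects, pipelined=False),
      persistent_rtts(objects, pipelined=True))
```

Under these assumptions the page needs 12 RTTs with non-persistent connections, 7 with a persistent connection without pipelining, and 3 with pipelining.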
HTTP Method Types

HTTP/1.0:
• GET
• POST
• HEAD
– asks server to leave requested object out of response

HTTP/1.1:
• GET, POST, HEAD
• PUT
– uploads file in entity body to path specified in URL field
• DELETE
– deletes file specified in the URL field
State in HTTP using “Cookies”
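[Figure: cookie exchange between a client and the Amazon server]
• Client's cookie file initially holds: ebay 8734
• Client sends usual http request msg; the Amazon server creates ID 1678 for the user and creates an entry in its backend database
• Server sends usual http response with set-cookie: 1678; the client's cookie file now holds: ebay 8734, amazon 1678
• Client sends usual http request msg with cookie: 1678; the server accesses the backend database, takes a cookie-specific action, and sends the usual http response msg
• One week later: the client sends usual http request msg with cookie: 1678; the server again accesses the backend database and takes a cookie-specific action before sending the usual http response msg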
Web Caches (aka Proxy Server)

[Figure: two clients, a proxy server, and two origin servers]
Conditional GET
• Goal: don’t send the object if the cache has an up-to-date cached version
• Cache: specify the date of the cached copy in the HTTP request
If-modified-since: <date>
• Server: response contains no object if the cached copy is up-to-date:
HTTP/1.0 304 Not Modified

[Figure: two exchanges between client and server]
• Object not modified since <date>: HTTP request msg with If-modified-since: <date>; HTTP response HTTP/1.0 304 Not Modified
• Object modified after <date>: HTTP request msg with If-modified-since: <date>; HTTP response HTTP/1.0 200 OK with <data>
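The server-side decision can be sketched as a single comparison of dates. This is a simplified model, not real server code: the function name is made up, dates are passed as `datetime` objects rather than parsed from header text, and the example dates are borrowed from the response message shown earlier.

```python
from datetime import datetime, timezone

def conditional_get(last_modified, if_modified_since, body):
    """Return (status code, status phrase, body) for a possibly conditional GET."""
    # No object is sent when the cached copy is at least as new as the server's copy.
    if if_modified_since is not None and last_modified <= if_modified_since:
        return 304, "Not Modified", b""
    return 200, "OK", body

page = b"<html>...</html>"
last_modified = datetime(2007, 10, 30, 17, 0, 2, tzinfo=timezone.utc)

# Cache copy fetched in 2010: still up to date.
fresh = conditional_get(last_modified,
                        datetime(2010, 9, 26, 20, 9, 20, tzinfo=timezone.utc), page)
# Cache copy from early 2007: the object changed afterwards.
stale = conditional_get(last_modified,
                        datetime(2007, 1, 1, tzinfo=timezone.utc), page)
print(fresh, stale)
```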
Proxy Server Example [.1]

Assumptions:
• avg object size: 1000 Kbits
• avg request rate from browsers to origin servers: 15 req/sec
• avg data rate to browsers: 15 Mbps (15 req/sec × 1000 Kbits)
• RTT from institutional router to any origin server: 2 sec
• access link rate: 15 Mbps

[Figure: institutional network with a 100 Mbps LAN, connected by a 15 Mbps access link to the public Internet and the origin servers]
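From these assumptions one can estimate how heavily each link is loaded when there is no proxy. The sketch below computes utilization as offered load divided by link rate; the function names are illustrative.

```python
def offered_load_bps(object_bits, requests_per_sec):
    """Average traffic the browsers ask for, in bits per second."""
    return object_bits * requests_per_sec

def utilization(load_bps, link_bps):
    """Fraction of the link rate consumed by the offered load."""
    return load_bps / link_bps

# Values from the assumptions: 1000 Kbit objects, 15 requests/sec.
load = offered_load_bps(object_bits=1_000_000, requests_per_sec=15)
lan_utilization = utilization(load, 100_000_000)    # 100 Mbps LAN
access_utilization = utilization(load, 15_000_000)  # 15 Mbps access link
print(load, lan_utilization, access_utilization)
```

The LAN is lightly loaded (about 15%), while the access link runs at 100% utilization, so queueing delay on the access link dominates. This is the motivation for serving part of the requests from a proxy inside the institutional network.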
HTTP/2 [Proposed in 2015]
• Limitations of HTTP/1.1
– It processes only one outstanding request per TCP connection,
forcing browsers to use multiple TCP connections to process multiple requests
simultaneously
– HTTP/1.x exchanges text-based commands, which are slower to process
• Motivation
– To improve internet user experience and effectiveness
– Webpages comprise resource-intensive multimedia content
– To make it more secure, reliable with improved performance

• Compatibility with existing applications


– HTTP/2 modifies how the data is formatted (framed) and transported between the client and
server, and hides all the complexity from applications within the new framing layer.
– It is an extension of its predecessor, not a replacement for it
HTTP/2 Feature: Stream Multiplexing

• What is a stream?
– A bi-directional sequence of binary-encoded frames exchanged between the server and
the client over an HTTP/2 connection

• HTTP/1 is capable of transmitting only one stream at a time
– Receiving a large amount of media content via individual streams sent one by one is
inefficient and resource consuming
Binary Framing Layer

• HTTP/2 allows transmission of parallel multiplexed requests and responses


– HTTP/2 breaks down the HTTP protocol communication into an exchange of binary-encoded frames,
which are then mapped to messages that belong to a particular stream, all of which are multiplexed
within a single TCP connection.
– This is the foundation that enables all other features and performance optimizations provided by the
HTTP/2 protocol.

HTTP/2.0 Connection
• Stream: A bidirectional flow of bytes
within an established connection, which
may carry one or more messages.
• Message: A complete sequence of
frames that map to a logical request or
response message.
• Frame: The smallest unit of
communication in HTTP/2, each
containing a frame header, which at a
minimum identifies the stream to which
the frame belongs.
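To make the frame header concrete, here is a minimal C sketch that decodes the fixed 9-byte HTTP/2 frame header (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit and a 31-bit stream identifier, as laid out in RFC 7540). The struct and function names are assumptions made for this example, not an existing API.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative type: decoded view of one HTTP/2 frame header. */
struct h2_frame_header {
    uint32_t length;     /* payload length, 24 bits on the wire          */
    uint8_t  type;       /* frame type, e.g. 0x0 = DATA, 0x1 = HEADERS   */
    uint8_t  flags;      /* type-specific flags                          */
    uint32_t stream_id;  /* 31 bits; identifies the stream of this frame */
};

/* Returns 0 on success, -1 if fewer than 9 bytes are available. */
int h2_parse_frame_header(const uint8_t *buf, size_t len,
                          struct h2_frame_header *out)
{
    if (len < 9)
        return -1;
    out->length = ((uint32_t)buf[0] << 16) | ((uint32_t)buf[1] << 8) | buf[2];
    out->type = buf[3];
    out->flags = buf[4];
    /* Mask off the reserved top bit before assembling the stream id. */
    out->stream_id = (((uint32_t)buf[5] & 0x7F) << 24) |
                     ((uint32_t)buf[6] << 16) |
                     ((uint32_t)buf[7] << 8) | buf[8];
    return 0;
}
```

Because every frame names its stream, a receiver can interleave frames of many streams on one TCP connection and still reassemble each message.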

Domain Name System (DNS)

• The domain name system maps the name people use to locate a website to
the IP address that a computer uses to locate a website.

• Why do we need the mapping between host name and IP address?

• Application-layer protocol: hosts, name servers communicate to resolve names (address/name translation)

DNS Structure – Distributed Hierarchical Database

Root DNS servers
    com DNS servers
        yahoo.com DNS servers
        amazon.com DNS servers
    org DNS servers
        pbs.org DNS servers
    edu DNS servers
        poly.edu DNS servers
        umass.edu DNS servers

Client wants IP for www.amazon.com; 1st approx:
• Client queries root server to find com DNS server
• Client queries .com DNS server to get amazon.com DNS server
• Client queries amazon.com DNS server to get IP address for www.amazon.com

List of all top level domain servers is available at: https://www.icann.org/resources/pages/tlds-2012-02-25-en

Root Name Servers

• Root name servers:
 – 13 root name “servers” worldwide (labelled a to m), mostly located in North America
 – Each server is actually a network of replicated servers

a. Verisign, Los Angeles, CA (5 other sites)
b. USC-ISI, Marina del Rey, CA
c. Cogent, Herndon, VA (5 other sites)
d. U Maryland, College Park, MD
e. NASA, Mt View, CA
f. Internet Software C., Palo Alto, CA (and 48 other sites)
g. US DoD, Columbus, OH (5 other sites)
h. ARL, Aberdeen, MD
i. Netnod, Stockholm (37 other sites)
j. Verisign, Dulles, VA (69 other sites)
k. RIPE, London (17 other sites)
l. ICANN, Los Angeles, CA (41 other sites)
m. WIDE, Tokyo (5 other sites)

DNS Services

• Hostname to IP address translation
 – Host name to IP address mapping
• Host aliasing
 – Alias name(s) to canonical name mapping
• Mail server aliasing
 – Host name to mail server mapping
• Load distribution
 – Replicated Web servers: many IP addresses correspond to one name

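DNS Query Processing - Recursive

[Figure: requesting host cis.poly.edu resolves gaia.cs.umass.edu; messages are numbered 1 to 8]
1. Requesting host (cis.poly.edu) → local DNS server (dns.poly.edu)
2. Local DNS server → root DNS server
3. Root DNS server → TLD DNS server
4. TLD DNS server → authoritative DNS server (dns.cs.umass.edu)
5. Authoritative DNS server → TLD DNS server
6. TLD DNS server → root DNS server
7. Root DNS server → local DNS server
8. Local DNS server → requesting host
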
DNS Query Processing - Iterative

[Figure: requesting host cis.poly.edu resolves gaia.cs.umass.edu; messages are numbered 1 to 8]
1. Requesting host (cis.poly.edu) → local DNS server (dns.poly.edu)
2. Local DNS server → root DNS server
3. Root DNS server → local DNS server
4. Local DNS server → TLD DNS server
5. TLD DNS server → local DNS server
6. Local DNS server → authoritative DNS server (dns.cs.umass.edu)
7. Authoritative DNS server → local DNS server
8. Local DNS server → requesting host

• TLD server may know only of an intermediate DNS server for the hostname, which in turn knows the authoritative DNS server for the hostname.
• DNS responses are usually cached to improve the delay performance and to reduce the number of DNS messages
 – e.g., Local DNS server caches the TLD server information

DNS Records
DNS: distributed database for storing resource records (RR)
RR format: (name, value, type, ttl)

type=A
• name is hostname
• value is IP address

type=NS
• name is domain (e.g., foo.com)
• value is hostname of authoritative name server for this domain

type=CNAME
• name is alias name for some “canonical” (the real) name
• www.ibm.com is really servereast.backup2.ibm.com
• value is canonical name

type=MX
• value is name of mailserver associated with name (host name, i.e., mailserver alias)

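The way A and CNAME records combine during a lookup can be sketched with a small in-memory table. Everything here is illustrative: the type and function names are assumptions, and a real resolver would also honour the TTL and query remote servers.

```c
#include <stddef.h>
#include <string.h>

enum rr_type { RR_A, RR_NS, RR_CNAME, RR_MX };

/* One resource record in the (name, value, type, ttl) format. */
struct rr {
    const char *name;
    const char *value;
    enum rr_type type;
    unsigned ttl;            /* seconds the record may be cached */
};

/* Follows CNAME records until an A record is found.
 * Returns the IP address string, or NULL if the name cannot be resolved. */
const char *rr_resolve(const struct rr *db, size_t n, const char *name)
{
    for (int hops = 0; hops < 8; hops++) {      /* bound the alias chain */
        const char *next = NULL;
        for (size_t i = 0; i < n; i++) {
            if (strcmp(db[i].name, name) != 0)
                continue;
            if (db[i].type == RR_A)
                return db[i].value;             /* hostname -> IP address */
            if (db[i].type == RR_CNAME)
                next = db[i].value;             /* alias -> canonical name */
        }
        if (next == NULL)
            return NULL;
        name = next;
    }
    return NULL;
}
```

Looking up an alias such as www.ibm.com first yields the canonical name from the CNAME record and then the address from that name's A record.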
DNS Messages
• Query and reply messages, both with same message format
• Explore DNS protocol in Lab Session #2

msg header (six fields of 2 bytes each):
    identification    | flags
    # questions       | # answer RRs
    # authority RRs   | # additional RRs
• identification: 16 bit # for query, reply to query uses same #
• flags:
 – query or reply
 – recursion desired
 – recursion available
 – reply is authoritative

msg body:
    questions (variable # of questions)
    answers (variable # of RRs)
    authority (variable # of RRs)
    additional info (variable # of RRs)

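The 12-byte header can be built by hand, which is a useful check before inspecting DNS packets in Wireshark. The sketch below is an illustration under stated assumptions: the function names are invented for the example, while the flag bit positions follow RFC 1035.

```c
#include <stdint.h>

#define DNS_FLAG_QR 0x8000   /* 0 = query, 1 = reply  */
#define DNS_FLAG_AA 0x0400   /* reply is authoritative */
#define DNS_FLAG_RD 0x0100   /* recursion desired      */
#define DNS_FLAG_RA 0x0080   /* recursion available    */

/* Stores a 16-bit value most significant byte first (network byte order). */
static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

/* Writes the header of a query that carries exactly one question. */
void dns_write_query_header(uint8_t out[12], uint16_t id, int recursion_desired)
{
    put_u16(out + 0, id);                                   /* identification   */
    put_u16(out + 2, recursion_desired ? DNS_FLAG_RD : 0);  /* flags            */
    put_u16(out + 4, 1);                                    /* # questions      */
    put_u16(out + 6, 0);                                    /* # answer RRs     */
    put_u16(out + 8, 0);                                    /* # authority RRs  */
    put_u16(out + 10, 0);                                   /* # additional RRs */
}

/* A reply matches a query when it has the QR bit set and the same id. */
int dns_is_reply_to(const uint8_t hdr[12], uint16_t id)
{
    uint16_t got_id = (uint16_t)((hdr[0] << 8) | hdr[1]);
    uint16_t flags = (uint16_t)((hdr[2] << 8) | hdr[3]);
    return got_id == id && (flags & DNS_FLAG_QR) != 0;
}
```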
Inserting Records into DNS
• A newly created domain name should be first registered at a registrar
 – Internet Corporation for Assigned Names and Numbers (ICANN) accredits the registrars
 – Accredited registrar list is available at www.internic.net
 – Registrar is a commercial entity that verifies the uniqueness of the domain name.

FTP: File Transfer Protocol
[Figure: user at host works through the FTP user interface and FTP client (local file system); files are transferred to and from the FTP server (remote file system)]

• Transfer file to/from remote host
• Client/server model
 – Client: side that initiates transfer (either to/from remote)
 – Server: remote host
• ftp: RFC 959
• ftp server: port 21

FTP: Connections
[Figure: FTP client and FTP server joined by a TCP control connection (server port 21) and a TCP data connection (server port 20)]
• Control connection
 – Authorization, directory listing etc.
• When server receives file transfer command,
 – Server opens 2nd TCP data connection (for file) to client
• After transferring one file, server closes data connection

FTP Commands and Responses

Sample commands:
• Sent as ASCII text over control channel
• USER username
• PASS password
• LIST: return list of files in current directory
• RETR filename: retrieves (gets) file
• STOR filename: stores (puts) file onto remote host

Sample return codes:
• Status code and phrase (as in HTTP)
• 331 Username OK, password required
• 125 data connection already open; transfer starting
• 425 Can’t open data connection
• 452 Error writing file

eMail
Three major components:
• User agents
 – e.g., Outlook, Thunderbird
• Mail servers
 – Contains incoming messages for user (user mailbox)
 – Holds the outgoing message queue
• Simple mail transfer protocol: SMTP
[Figure: user agents attach to mail servers; mail servers exchange messages with each other using SMTP]

SMTP [RFC 5321, Original RFC 821]

• Uses TCP to reliably transfer email message from client to server, port 25
• Direct transfer: sender’s mail server to receiver’s mail server
• Three phases of transfer
– Handshaking (greeting)Transfer of messagesConnection Closure
• Command/response interaction (like HTTP, FTP)
– Commands: ASCII text
– Response: status code and phrase
• Messages must be in 7-bit ASCII
– Painful for multimedia data

41
Computer Networks CS F303 BITS Pilani, Pilani Campus
Mail Transfer Process

S: 220 hamburger.edu
C: HELO crepes.fr
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr... Sender ok
C: RCPT TO: <bob@hamburger.edu>
S: 250 bob@hamburger.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 hamburger.edu closing connection

Mail Access Protocols

• Mail access protocol: retrieval from server
 – POP3 [Port: 110]: Post Office Protocol [RFC 1939]: authorization, then download-and-keep or download-and-delete
   • User can create folders and move the messages into them locally.
   • Stateless across sessions
 – IMAP: Internet Mail Access Protocol [RFC 1730]: more features, including manipulation of stored msgs on server
   • Allows the user to create remote folders and maintains user state information across IMAP sessions
   • Permits a user agent to obtain components of messages. Good for low bandwidth connections.

POP3 Protocol
Authorization phase
• Client commands:
 – user: declare username
 – pass: password
• Server responses
 – +OK
 – -ERR

Transaction phase, client:
• list: list message numbers
• retr: retrieve message by number
• dele: delete
• quit

Example session:
S: +OK POP3 server ready
C: user alex
S: +OK
C: pass hungry
S: +OK user successfully logged on
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: <message 1 contents>
S: .
C: dele 1
C: retr 2
S: <message 2 contents>
S: .
C: dele 2
C: quit
S: +OK POP3 server signing off

Web based E-Mail

• Hotmail introduced Web-based access in the 1990s

Computer Networks (CS F303)
BITS Pilani Virendra Singh Shekhawat
Department of Computer Science and Information Systems
Pilani Campus
Today’s Agenda

• Peer to Peer Applications and Protocols
 – P2P File Distribution, BitTorrent Protocol
• Database Implementation Protocol in P2P Networks
 – Distributed Hash Tables (DHTs)
 – Chord Protocol

Peer to Peer (P2P) Architecture

• No always-on server
• Arbitrary end systems directly communicate
• Peers are intermittently connected
• Examples
– File distribution (BitTorrent)
– Streaming (KanKan)
– VoIP (Skype)

File Distribution: P2P vs CS
How much time is required to distribute a file (size F) from one server to N peers?
 – peer upload/download capacity is limited resource

• us: server upload capacity
• ui: peer i upload capacity
• di: peer i download capacity
[Figure: a server holding the file of size F (upload capacity us) and N peers (upload ui, download di) attached to a network with abundant bandwidth]

File Distribution Time – Client Server
• Server transmission: must sequentially send (upload) N file copies:
 – Time to send N copies: NF/us
• Client: each client must download file copy
 – dmin = min client download rate
 – Slowest client download time: F/dmin

Time to distribute F to N clients using client-server approach:
    $D_{c\text{-}s} \geq \max\{ NF/u_s,\ F/d_{min} \}$

File Distribution Time - Peer to Peer
• Server transmission: must upload at least one copy
 – Time to send one copy: F/us
• Client: each client must download file copy
 – Slowest client download time: F/dmin
• Clients: as aggregate must download NF bits
 – Max upload rate (limiting max download rate) is us + Σ ui

Time to distribute F to N clients using P2P approach:
    $D_{P2P} \geq \max\{ F/u_s,\ F/d_{min},\ NF/(u_s + \sum_i u_i) \}$

Exercise

• Distributing a File F = 15 Gbits to 10 peers


• Server upload rate is us = 30 Mbps
• Each peer download rate is di = 2 Mbps
• Each peer upload rate is u = 300 Kbps
• Question
– Calculate minimum distribution time for both CS and P2P
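One way to check the exercise is to code the two lower bounds directly. The sketch below is illustrative: the function names are assumptions, sizes are in Mbits and rates in Mbps, and 15 Gbits is taken as 15000 Mbits (decimal units, an assumption).

```c
static double max2(double a, double b) { return a > b ? a : b; }

/* Client-server bound: max{ N*F/us, F/dmin } */
double dist_time_cs(double F, int N, double us, double dmin)
{
    return max2(N * F / us, F / dmin);
}

/* P2P bound: max{ F/us, F/dmin, N*F/(us + sum of peer upload rates) } */
double dist_time_p2p(double F, int N, double us, double dmin, double sum_ui)
{
    return max2(max2(F / us, F / dmin), N * F / (us + sum_ui));
}
```

With F = 15000, N = 10, us = 30, dmin = 2 and a total peer upload rate of 10 × 0.3 = 3, both bounds evaluate to 7500 seconds, because the slowest download (F/dmin) dominates. The architectures only separate for larger N: with N = 100 the client-server bound grows to 50000 seconds while the P2P bound is 25000 seconds.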

CS vs P2P: Example

client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us

[Figure: minimum distribution time (0 to 3.5 hours) plotted against N (0 to 35) for the Client-Server and P2P approaches]

P2P File Distribution: BitTorrent

• File divided into chunks of 64 KB to 1 MB (typically 256 KB)
• Peers in torrent send/receive file chunks
 – At any given time, each peer will have a subset of chunks from the file
 – A peer asks its neighbors for the list of chunks they have and gets a list from each
 – A peer must decide:
   • Which chunks should it request first from its neighbors?
   • To which of its neighbors should it send requested chunks?

The lookup problem

[Figure: peers N1 to N6 connected through the Internet; a publisher stores Key = “data item”, Value = video lecture; a client issues Lookup(“data item”)]

Decentralized network with several peers (servers/clients)
How to find specific peer that hosts desired data within this network?

P2P Protocols

Napster

Gnutella
Kazaa (Skype is based on Kazaa)
Distributed Hash Table (DHT)

• Each peer holds a small subset of the total (key, value) pairs
• Any peer can query the distributed database with a particular key
 – The distributed DB locates the peers that have the corresponding (key, value) pairs and returns the pairs to the querying peer
 – Each peer only knows about a small number of other peers
 – Any peer can insert new (key, value) pairs into the DB
 – Robust to peers coming and going (churn)

DHT Implementation [.1]

• Randomly scatter the (key, value) pairs across all the peers
• Each peer maintains a list of the IP addresses of all peers
• The querying peer sends its query to all other peers
• The peers containing (key, value) pairs that match the key respond with the matching pairs
• This approach is not scalable. Why?

DHT Implementation [..2]: Circular DHT

• Hash function assigns each “node” and “key” an m-bit identifier using a base hash function such as SHA-1
 – Node_ID = hash(IP, Port)
 – Key_ID = hash(original key)
• ID space: 0 to 2^m - 1; here m = 6, so the range is 64 IDs (0 to 63)
• Assign each (key, value) pair to the peer that has the closest ID (in Chord, the key's successor: the first peer whose ID is equal to or follows the key on the circle)
[Figure: identifier circle with nodes N2, N10, N20, N40, N50, N60, N63 and keys k7, k11, k16, k25, k39, k46, k58]

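The key-to-peer assignment on the circle can be sketched as a successor search over the sorted node IDs. This is an illustration that assumes the Chord convention (a key belongs to the first node whose ID is equal to or follows it); the function name is an assumption.

```c
#include <stddef.h>

/* nodes[] must be sorted in increasing ID order.
 * Returns the ID of the node responsible for identifier id. */
unsigned chord_successor(const unsigned *nodes, size_t n, unsigned id)
{
    for (size_t i = 0; i < n; i++)
        if (nodes[i] >= id)
            return nodes[i];
    return nodes[0];          /* past the largest ID: wrap around the ring */
}
```

For the ring in the figure (nodes 2, 10, 20, 40, 50, 60, 63), key 7 is stored at N10, keys 11 and 16 at N20, keys 25 and 39 at N40, key 46 at N50 and key 58 at N60.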
Chord Protocol: Lookup Operation Example

Predecessor: pointer to the previous node on the id circle
Successor: pointer to the succeeding node on the id circle

• Ask node n to find the successor of id
• If id is between n and its successor, return successor
• Else forward query to n's successor, and so on

=> #messages linear in #nodes

Scalable node localization
• Each node n contains a routing table with up to m entries (m: number of bits of the identifier) => finger table
• The ith entry in the table at node n contains the first node s that succeeds n by at least 2^(i-1)
 – s = successor(n + 2^(i-1))
 – s is called the ith finger of node n

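A finger table can be computed directly from the definition s = successor(n + 2^(i-1)). The sketch below is illustrative: the function names are assumptions, the addition is taken modulo 2^m so that it wraps around the ring, and the example ring (nodes 2, 10, 20, 40, 50, 60, 63 with m = 6) is an assumption chosen for the usage note.

```c
#include <stddef.h>

/* nodes[] sorted in increasing order; first node with ID >= id, with wrap-around. */
unsigned chord_successor(const unsigned *nodes, size_t count, unsigned id)
{
    for (size_t i = 0; i < count; i++)
        if (nodes[i] >= id)
            return nodes[i];
    return nodes[0];
}

/* Fills finger[0..m-1]; finger[i-1] holds the ith finger of node n. */
void chord_fingers(const unsigned *nodes, size_t count,
                   unsigned n, unsigned m, unsigned *finger)
{
    unsigned ring = 1u << m;                        /* size of the ID space */
    for (unsigned i = 1; i <= m; i++) {
        unsigned start = (n + (1u << (i - 1))) % ring;
        finger[i - 1] = chord_successor(nodes, count, start);
    }
}
```

On that ring, node 10 gets the fingers 20, 20, 20, 20, 40, 50 (for start points 11, 12, 14, 18, 26, 42). The later entries jump far around the circle, which is why a lookup can roughly halve the remaining distance at each hop.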
The Chord algorithm – Scalable node localization
• Search the finger table for the node that most immediately precedes the key
• Invoke find_successor from that node

Number of messages: O(log N)

Failure Recovery (Peer Churn)

• Key step in failure recovery is maintaining correct successor pointers


• To achieve this, each node maintains a successor-list of its r nearest
successors on the ring
• If node n notices that its successor has failed, it replaces it with the first
live entry in the list
• The stabilize routine corrects finger table entries and successor-list entries pointing to a failed node
• The stabilization protocol should be invoked at a rate based on the frequency of nodes leaving and joining

Next…

• Creating network Applications


– Socket Programming
• TCP vs. UDP Sockets
• Transport Layer
– Transport Layer Services
• Multiplexing/Demultiplexing
– Connectionless and Connection Oriented
» TCP and UDP
• Reliable data transfer (Protocol design)
• Flow control
• Congestion control
Socket Programming [.1]
• What is a socket?
– To the kernel, a socket is an endpoint of communication.
– To an application, a socket is a file descriptor that lets the application read/write from/to the network.
• Remember: All Unix I/O devices, including networks, are modeled as files.

• Clients and servers communicate with each other by reading from and writing to socket
descriptors.

[Figure: two hosts connected through the Internet; in each host the application process sits above a socket. The process is controlled by the app developer; the transport, network, link and physical layers below the socket are controlled by the OS]

Socket Programming [..2]

Two socket types for two different transport services:


– UDP: unreliable datagram
– TCP: reliable, byte stream-oriented

Application Example:
1. Client reads a line of characters (data) from its keyboard and sends the
data to the server.
2. The server receives the data and converts characters to uppercase.
3. The server sends the modified data to the client.
4. The client receives the modified data and displays the line on its screen.

Socket Programming with UDP

UDP: no “connection” between client & server


• No handshaking before sending data
• Sender explicitly attaches destination IP address and port # to each packet
• Receiver extracts sender IP address and port# from received packet

Note: Transmitted data may be lost or received out-of-order

Socket Programming with TCP

Client contacts server by:
• Creating TCP socket, specifying IP address, port number of server process
• Server must have created socket (door) that welcomes client’s contact
• Client TCP establishes connection to server TCP

• When contacted by client, server TCP creates new socket for server process to communicate with that particular client
 – Allows server to talk with multiple clients

Application viewpoint:
TCP provides reliable, in-order byte-stream transfer (“pipe”) between client and server.

Socket Structure [.1]

struct sockaddr
{
    unsigned short int sa_family;  // address family, AF_xxx
    char sa_data[14];              // 14 bytes of protocol address
};

• sa_family – this remains AF_INET for stream and datagram sockets
• sa_data – contains destination address and port number for the socket

Socket Structure [..2]
• Parallel structure to sockaddr
struct sockaddr_in
{
    short int sin_family;          // Address family (e.g., AF_INET)
    unsigned short int sin_port;   // Port number (e.g., htons(2240))
    struct in_addr sin_addr;       // Internet address
    unsigned char sin_zero[8];     // Padding: same size as sockaddr
};
struct in_addr
{
    uint32_t s_addr;               // 32-bit IPv4 address
};
• sin_zero pads the structure to the length of struct sockaddr and is set to all zeros with the function memset()
• Important – a pointer to struct sockaddr_in can be cast to a pointer to struct sockaddr and vice versa
• sin_family corresponds to sa_family and should be set to “AF_INET”.
• sin_port and sin_addr must be in network byte order (NBO)

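The points above can be combined into one small helper that fills a `struct sockaddr_in`. This is a sketch: the helper name `make_ipv4_addr` is an assumption, and it uses `inet_pton()` rather than `inet_addr()` because `inet_pton()` reports invalid input unambiguously.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fills *out for the given dotted-decimal IPv4 address and port.
 * Returns 0 on success, -1 if ip is not a valid IPv4 address. */
int make_ipv4_addr(struct sockaddr_in *out, const char *ip, uint16_t port)
{
    memset(out, 0, sizeof(*out));            /* also zeroes sin_zero      */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);             /* port in network byte order */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;                           /* address stored in NBO on success */
    return 0;
}
```

The filled structure is then passed to calls such as bind() or connect() with a cast, for example `(struct sockaddr *)&addr`.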
NBO & HBO Conversion Functions

• Two types that can be converted
 – short (two bytes)
 – long (four bytes)
• Primary conversion functions
 – htons() // host to network short
 – htonl() // host to network long
 – ntohs() // network to host short
 – ntohl() // network to host long
• Very Important: Even if your machine is big-endian, convert your bytes to network byte order (NBO) before putting them on the network, so that the code stays portable

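A quick way to see what the conversion functions do is to look at the bytes they produce. In this sketch the helper names are assumptions; `htons()` and `ntohs()` are the standard functions listed above.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Stores a 16-bit port into buf in network byte order (most significant byte first). */
void put_port_nbo(uint8_t buf[2], uint16_t port)
{
    uint16_t nbo = htons(port);
    memcpy(buf, &nbo, sizeof nbo);
}

/* Reads a 16-bit port that was stored in network byte order. */
uint16_t get_port_nbo(const uint8_t buf[2])
{
    uint16_t nbo;
    memcpy(&nbo, buf, sizeof nbo);
    return ntohs(nbo);
}
```

Port 5888 is 0x1700, so `put_port_nbo()` writes the bytes 0x17 then 0x00 on any machine, whether the host is little-endian or big-endian.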
Primary Socket System Calls

• socket() - create a new socket and return its descriptor


• bind() - associate a socket with a port and address
• listen() - establish queue for connection requests
• accept() - accept a connection request
• connect() - initiate a connection to a remote host
• recv() - receive data from a socket descriptor
• send() - send data to a socket descriptor
• close() - “one-way” close of a socket descriptor

Socket System Calls: Connectionless (e.g., UDP)

Socket System Calls: Connection-Oriented (e.g., TCP)

Socket System Calls [.1]

• SOCKET: int socket(int domain, int type, int protocol);
 – domain := AF_INET (IPv4 protocol)
 – type := (SOCK_DGRAM or SOCK_STREAM)
 – protocol := 0 (selects the default for the type: IPPROTO_UDP for SOCK_DGRAM, IPPROTO_TCP for SOCK_STREAM)
 – returned: socket descriptor (sockfd), -1 is an error

• BIND: int bind(int sockfd, struct sockaddr *my_addr, int addrlen);
 – sockfd: socket descriptor (returned from socket())
 – my_addr: socket address, struct sockaddr_in is used
 – addrlen := sizeof(struct sockaddr)

Socket System Calls [..2]

• LISTEN: int listen(int sockfd, int backlog);
 – backlog: how many connections we want to queue

• ACCEPT: int accept(int sockfd, struct sockaddr *addr, int *addrlen);
 – addr: here the socket address of the caller will be written
 – returned: a new socket descriptor (for the per-connection socket)

• CONNECT: int connect(int sockfd, struct sockaddr *serv_addr, int addrlen); // used by TCP client
 – parameters are same as for bind()

Socket System Calls […3]
• SEND: int send(int sockfd, const void *msg, int len, int flags);
– msg: message you want to send
– len: length of the message
– flags := 0
– returned: the number of bytes actually sent

• RECEIVE: int recv(int sockfd, void *buf, int len, unsigned int flags);
– buf: buffer to receive the message
– len: length of the buffer (“don’t give me more!”)
– flags := 0
– returned: the number of bytes received

Socket System Calls [….4]
• SEND (DGRAM-style): int sendto(int sockfd, const void *msg, int len, int flags, const struct sockaddr *to, int
tolen);
– msg: message you want to send
– len: length of the message
– flags := 0
– to: socket address of the remote process
– tolen: = sizeof(struct sockaddr)
– returned: the number of bytes actually sent

• RECEIVE (DGRAM-style): int recvfrom(int sockfd, void *buf, int len, unsigned int flags, struct sockaddr
*from, int *fromlen);
– buf: buffer to receive the message
– len: length of the buffer (“don’t give me more!”)
– from: socket address of the process that sent the data
– fromlen:= sizeof(struct sockaddr)
– flags := 0
– returned: the number of bytes received

• CLOSE: close (socketfd);


Byte ordering routines

#include <sys/types.h>
#include <netinet/in.h>

u_long htonl(u_long hostlong); /* host-to-network, long integer */

u_short htons(u_short hostshort); /* host-to-network, short integer */

u_long ntohl(u_long netlong); /* network-to-host, long integer */

u_short ntohs(u_short netshort); /* network-to-host, short integer */

Address conversion routines


#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

unsigned long inet_addr(char *ptr);
    Accepts a character string holding an IP address in dotted-decimal notation and returns the equivalent 32-bit integer in network byte order.

char *inet_ntoa(struct in_addr inaddr);
    Accepts an IP address expressed as a 32-bit quantity in network byte order and returns a string expressed in dotted-decimal notation.

Simple TCP Server
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define SERVER_PORT 5888

int main()
{
    int sockfd, connfd, n;
    socklen_t clilen;
    char buf[256];
    struct sockaddr_in servaddr, cliaddr;

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
    { printf("Server socket error\n"); exit(1); }

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERVER_PORT);
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);   /* accept on any local interface */

    if (bind(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0)
    { printf("Server Bind Error\n"); exit(1); }

    listen(sockfd, 5);

    for (;;) {
        clilen = sizeof(cliaddr);
        connfd = accept(sockfd, (struct sockaddr *)&cliaddr, &clilen);
        if (connfd < 0)
        { printf("Server Accept error\n"); exit(1); }

        printf("Client IP: %s\n", inet_ntoa(cliaddr.sin_addr));
        printf("Client Port: %hu\n", ntohs(cliaddr.sin_port));

        n = read(connfd, buf, sizeof(buf) - 1);
        if (n < 0) n = 0;
        buf[n] = '\0';                              /* terminate before printing with %s */
        printf("Server read: \"%s\" [%d chars]\n", buf, n);

        write(connfd, "Server Got Message", 18);
        close(connfd);
    }
}

Simple TCP Client
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define SERVER_PORT 5888

int main()
{
    int sockfd, n;
    char buf[256];
    struct sockaddr_in servaddr;

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) { printf("Client socket error\n"); exit(1); }

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERVER_PORT);
    servaddr.sin_addr.s_addr = inet_addr("172.24.2.4");

    if (connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0)
    { printf("Client connect error\n"); exit(1); }

    printf("Enter Message\n");
    if (fgets(buf, sizeof(buf), stdin) == NULL) exit(1);
    write(sockfd, buf, strlen(buf));

    n = read(sockfd, buf, sizeof(buf) - 1);
    if (n < 0) n = 0;
    buf[n] = '\0';                  /* terminate before printing with %s */
    printf("Client Received: %s\n", buf);
    close(sockfd);
    return 0;
}

Simple UDP Server
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#define SERVER_PORT 9988

int main()
{
    int sockfd, n;
    socklen_t clilen;
    char buf[256];
    struct sockaddr_in servaddr, cliaddr;

    sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd < 0) { printf("Server socket error\n"); exit(1); }

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERVER_PORT);
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0)
    { printf("Server Bind Error\n"); exit(1); }

    for (;;)
    {
        clilen = sizeof(cliaddr);
        /* recvfrom() fills cliaddr with the sender's address and port */
        n = recvfrom(sockfd, buf, sizeof(buf) - 1, 0, (struct sockaddr *)&cliaddr, &clilen);
        if (n < 0) continue;
        buf[n] = '\0';
        printf("Server Received: %s\n", buf);

        sendto(sockfd, "Server Got Message", 18, 0, (struct sockaddr *)&cliaddr, clilen);
    }
}

Simple UDP Client
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define SERVER_PORT 9988
#define SERVER_IPADDR "172.24.2.4"

int main()
{
    int sockfd, n;
    socklen_t len;
    char buf[256];
    struct sockaddr_in cliaddr, servaddr;

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERVER_PORT);
    servaddr.sin_addr.s_addr = inet_addr(SERVER_IPADDR);

    sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd < 0) { printf("Client socket error\n"); exit(1); }

    memset(&cliaddr, 0, sizeof(cliaddr));
    cliaddr.sin_family = AF_INET;
    cliaddr.sin_port = htons(0);                    /* 0: let the OS pick a free port */
    cliaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(sockfd, (struct sockaddr *)&cliaddr, sizeof(cliaddr));

    printf("Enter Message\n");
    if (fgets(buf, sizeof(buf), stdin) == NULL) exit(1);

    len = sizeof(servaddr);
    sendto(sockfd, buf, strlen(buf), 0, (struct sockaddr *)&servaddr, len);

    n = recvfrom(sockfd, buf, sizeof(buf) - 1, 0, NULL, NULL);
    if (n < 0) n = 0;
    buf[n] = '\0';                                  /* terminate before printing with %s */
    printf("Client Received: %s\n", buf);
    close(sockfd);
    return 0;
}

Topics

• Transport Layer
– Transport Layer Services
• Multiplexing/Demultiplexing
– Connectionless and Connection Oriented
» TCP and UDP
• Reliable data transfer (Protocol design)
• Flow control
• Congestion control

Transport Layer Services and Protocols

• Provides logical communication between app processes
 – App processes send msgs to each other using the logical communication
• Extends host-to-host delivery to process-to-process delivery

TP Layer vs. Network Layer
• Network layer: logical communication between hosts

• TP Layer: logical communication between processes

• TP layer services are constrained by the service model of the underlying network-layer protocol

• But certain services can be offered by the TP layer even when the network layer does not offer them
 – e.g., Reliable data transfer

Transport Layer Services

• Reliable in-order delivery (TCP)


– Congestion control
– Flow control
– Connection setup

• Unreliable, unordered delivery (UDP)


– Extension of “best-effort” IP

Process-to-Process Delivery Service
Multiplexing at sending time:
• Handle data from multiple sockets, add transport header

Demux at receiving time:
• Use header info to deliver received segments to correct socket

[Figure: three hosts, each with application, transport, network, link and physical layers; processes P1 to P4 attach to the transport layer through sockets]

Demultiplexing at Receiver

• Host receives IP datagrams
 – Each datagram has source IP address, destination IP address
 – Each datagram carries one transport-layer segment
 – Each segment has source, destination port number
• Host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (each row is 32 bits wide):
    source port # | dest port #
    other header fields
    application data (payload)

Connectionless (UDP) Demultiplexing

• When host receives UDP segment:


– Checks destination port # in segment and directs segment to socket with port #

• Recall: when creating datagram to send into UDP socket, must specify
• Destination IP address
• Destination port #

Example: Connectionless Demultiplexing
Server (process P1):
    DatagramSocket serverSocket = new DatagramSocket(6428);
Client host (process P3):
    DatagramSocket mySocket2 = new DatagramSocket(9157);
Client host (process P4):
    DatagramSocket mySocket1 = new DatagramSocket(5775);

Segments exchanged between P1 and P3:
• P1 → P3: source port: 6428, dest port: 9157
• P3 → P1: source port: 9157, dest port: 6428

Segments exchanged between P1 and P4 (left as a question on the slide):
• source port: ?, dest port: ?
• source port: ?, dest port: ?

Connection Oriented Demultiplexing

• TCP socket is identified by a 4-tuple:
 – Source IP address, Source port #, Destination IP address, Destination port #
 – Demultiplexing: receiver uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
 – Web servers have different sockets for each connecting client
 – e.g., non-persistent HTTP will have different socket for each request

Example: Connection Oriented Demux

[Figure: hosts with IP addresses A and C exchange segments with the server at IP address B, which runs processes P4, P5 and P6]

Segments shown:
• A → B: source IP,port: A,9157; dest IP,port: B,80
• B → A: source IP,port: B,80; dest IP,port: A,9157
• C → B: source IP,port: C,5775; dest IP,port: B,80
• C → B: source IP,port: C,9157; dest IP,port: B,80

Three segments, all destined to IP address B, dest port 80, are demultiplexed to different sockets

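The difference between the two demultiplexing rules can be sketched as two lookup functions. This is a simplified model, not kernel code: the struct and function names are assumptions, and IP addresses are represented as plain integers.

```c
#include <stddef.h>
#include <stdint.h>

struct segment {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* One established TCP connection socket at the receiving host. */
struct tcp_socket {
    uint32_t local_ip, remote_ip;
    uint16_t local_port, remote_port;
};

/* TCP: all four values must match. Returns the socket index, or -1. */
int tcp_demux(const struct tcp_socket *socks, size_t n, const struct segment *seg)
{
    for (size_t i = 0; i < n; i++)
        if (socks[i].local_ip == seg->dst_ip &&
            socks[i].local_port == seg->dst_port &&
            socks[i].remote_ip == seg->src_ip &&
            socks[i].remote_port == seg->src_port)
            return (int)i;
    return -1;
}

/* UDP: only the destination port is checked. Returns the socket index, or -1. */
int udp_demux(const uint16_t *bound_ports, size_t n, const struct segment *seg)
{
    for (size_t i = 0; i < n; i++)
        if (bound_ports[i] == seg->dst_port)
            return (int)i;
    return -1;
}
```

With TCP, the segments from (A, 9157), (C, 5775) and (C, 9157) reach three different sockets even though all are addressed to B port 80. With UDP, segments from different senders that share a destination port are all delivered to the same socket.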
Example: threaded server
[Figure: same hosts and segments as in the previous example, but the server at IP address B runs a single threaded process (P4) that serves all connections]

Segments shown:
• A → B: source IP,port: A,9157; dest IP,port: B,80
• B → A: source IP,port: B,80; dest IP,port: A,9157
• C → B: source IP,port: C,5775; dest IP,port: B,80
• C → B: source IP,port: C,9157; dest IP,port: B,80

User Datagram Protocol [RFC 768]

• Best effort service
 – UDP segment may be lost or delivered out of order to app
• Connectionless
 – No handshaking between sender and receiver
• Each UDP segment handled independently of others

UDP Segment Header
UDP segment format (each row is 32 bits wide):
• source port # | dest port #
• length | checksum
• application data (payload)
– length: in bytes, of the UDP segment, including header

Why is there a UDP?
• No connection establishment (which can add delay)
• Simple: no connection state at sender, receiver
• Small header size
• No congestion control: UDP can blast away as fast as desired
UDP Checksum

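• Sender treats segment contents (including header fields) as a sequence of 16-bit integers
– Sum all such 16-bit words in the segment (one's complement addition: any carry out of the top bit is wrapped around and added back)
– One's complement of the sum is put in the checksum field
• At the receiver, all 16-bit words are added (including the checksum) to detect errors in the segment
– If no error is detected, the result is all 1s (0xFFFF)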
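
A minimal sketch of the checksum computation described above. It assumes the segment is given as raw bytes and pads odd-length input with a zero byte; the function names are illustrative, and the pseudo-header that real UDP also covers is left out for brevity.

```python
def ones_complement_sum(data: bytes) -> int:
    """Sum 16-bit words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # wrap the carry around
    return total


def checksum(data: bytes) -> int:
    """Sender side: one's complement of the one's complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def verify(data: bytes, received_checksum: int) -> bool:
    """Receiver side: sum of all words plus checksum must be all 1s."""
    total = ones_complement_sum(data) + received_checksum
    total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


segment = b"\x12\x34\x56\x78"
segment_checksum = checksum(segment)
```

For the two words 0x1234 and 0x5678 the sum is 0x68AC, so the checksum is 0x9753; flipping any single bit of the segment makes `verify` fail.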

Principles of Reliable Data Transfer

• Important in application, transport, and link layers
• On the top-10 list of important networking topics!
• Characteristics of the unreliable channel determine the complexity of the reliable data transfer protocol (rdt)
Reliable Data Transfer: getting started

Send side:
• rdt_send(): called from above (e.g., by app.). Passes data to be delivered to the receiver's upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver

Receive side:
• rdt_rcv(): called when packet arrives on rcv-side of channel
• deliver_data(): called by rdt to deliver data to upper layer
Reliable Data Transfer: getting started

We will:
• Incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• Consider only unidirectional data transfer
– But control info will flow in both directions!
• Use finite state machines (FSM) to specify sender, receiver
– State: when in this "state", the next state is uniquely determined by the next event
– Each transition (e.g., from state 1 to state 2) is labeled with the event causing the state transition and the actions taken on the state transition
rdt1.0: reliable transfer over a reliable channel

• Underlying channel is perfectly reliable
– No bit errors, no loss of packets
• Separate FSMs for sender, receiver:
– Sender sends data into underlying channel
– Receiver reads data from underlying channel

Sender FSM (single state "Wait for call from above"):
– Event: rdt_send(data)
– Actions: packet = make_pkt(data); udt_send(packet)

Receiver FSM (single state "Wait for call from below"):
– Event: rdt_rcv(packet)
– Actions: extract(packet,data); deliver_data(data)
rdt2.0: channel with bit errors
• Underlying channel may flip bits in packet
– Checksum is used to detect bit errors
• The question: how to recover from errors?
– Acknowledgements (ACKs): receiver explicitly tells sender that pkt was received OK
– Negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
– Sender retransmits pkt on receipt of NAK
• New mechanisms in rdt2.0 (beyond rdt1.0):
– Error detection
– Receiver feedback: control msgs (ACK, NAK) from receiver to sender
rdt2.0: FSM Specification
Sender FSM:
• State "Wait for call from above"
– rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK"
• State "Wait for ACK or NAK"
– rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt); stay in "Wait for ACK or NAK"
– rdt_rcv(rcvpkt) && isACK(rcvpkt): Λ (no action); go to "Wait for call from above"

Receiver FSM:
• State "Wait for call from below"
– rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
rdt2.0: Operation with no Errors
[Figure: the rdt2.0 FSM with the error-free path highlighted.]
• Sender: rdt_send(data) → sndpkt = make_pkt(data, checksum); udt_send(sndpkt); wait for ACK or NAK
• Receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) → extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
• Sender: rdt_rcv(rcvpkt) && isACK(rcvpkt) → Λ; back to "Wait for call from above"
rdt2.0: Error Scenario
[Figure: the rdt2.0 FSM with the error path highlighted.]
• Sender: rdt_send(data) → sndpkt = make_pkt(data, checksum); udt_send(sndpkt); wait for ACK or NAK
• Receiver: rdt_rcv(rcvpkt) && corrupt(rcvpkt) → udt_send(NAK)
• Sender: rdt_rcv(rcvpkt) && isNAK(rcvpkt) → udt_send(sndpkt) (retransmission); keep waiting for ACK or NAK
• Receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) → extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
rdt2.0 Has a fatal flaw!

• What happens if an ACK/NAK is corrupted?
– Sender doesn't know what happened at receiver!
– Simple fix: just retransmit, but this may deliver a duplicate pkt to the receiver
• How to handle duplicates?
– Sender adds sequence number to each pkt
– Receiver discards (doesn't deliver up) duplicate pkt
rdt2.1: Sender, handles garbled ACK/NAKs

• State "Wait for call 0 from above"
– rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK 0"
• State "Wait for ACK or NAK 0"
– rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ): udt_send(sndpkt)
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ; go to "Wait for call 1 from above"
• State "Wait for call 1 from above"
– rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK 1"
• State "Wait for ACK or NAK 1"
– rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ): udt_send(sndpkt)
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ; go to "Wait for call 0 from above"
rdt2.1: Receiver, handles garbled ACK/NAKs

• State "Wait for 0 from below"
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to "Wait for 1 from below"
– rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
• State "Wait for 1 from below"
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to "Wait for 0 from below"
– rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
rdt2.1: Discussion

Sender:
• Seq # added to pkt
• Two seq. #'s (0,1) will suffice. Why?
• Must check if received ACK/NAK is corrupted
• Twice as many states
– State must "remember" whether "current" pkt has seq. # 0 or 1

Receiver:
• Must check if received packet is a duplicate
– State indicates whether 0 or 1 is the expected pkt seq #
– For a packet whose seq # is not the expected one (a duplicate), it re-sends an ACK without delivering the data again

• Note: Receiver cannot know if its last ACK/NAK was received OK at sender
rdt2.2: NAK Free Protocol

• Same functionality as rdt2.1, using ACKs only


• Instead of NAK, receiver sends ACK for last pkt received OK
– Receiver must explicitly include seq # of pkt being ACKed
rdt3.0: Channels with errors and loss

• New assumption: underlying channel can also lose packets (data or ACKs)
– Checksum, seq. #, ACKs, retransmissions will be of help, but not enough
• Approach: sender waits a "reasonable" amount of time for ACK
– Retransmits if no ACK received in this time
– If pkt (or ACK) is just delayed (not lost):
• Retransmission will be a duplicate, but use of seq. #'s already handles this
• Receiver must specify seq # of pkt being ACKed
– Requires countdown timer
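
The behavior above can be sketched as a small deterministic simulation. This is an illustration under stated assumptions rather than the protocol's reference FSM: the channel is modelled by two hypothetical sets, `lost_data` and `lost_acks`, holding the indices of transmissions to drop; a timeout is modelled simply as "no ACK came back, so retransmit"; and bit errors are not modelled.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Packet:
    seq: int
    data: str


class Receiver:
    """Alternating-bit receiver: the ACK carries the seq # being ACKed."""

    def __init__(self) -> None:
        self.expected = 0
        self.delivered: List[str] = []

    def rdt_rcv(self, pkt: Packet) -> int:
        if pkt.seq == self.expected:
            self.delivered.append(pkt.data)  # new data: deliver up
            self.expected ^= 1
        # For a duplicate, nothing is delivered; the packet is re-ACKed.
        return pkt.seq


def transfer(messages: List[str], lost_data: Set[int],
             lost_acks: Set[int]) -> Tuple[List[str], int]:
    """Stop-and-wait sender; returns (delivered data, total transmissions)."""
    receiver = Receiver()
    transmissions = 0
    seq = 0
    for data in messages:
        while True:
            attempt = transmissions
            transmissions += 1
            if attempt in lost_data:
                continue  # data lost: timer expires, retransmit
            ack = receiver.rdt_rcv(Packet(seq, data))
            if attempt in lost_acks:
                continue  # ACK lost: timer expires, retransmit
            if ack == seq:
                break  # correct ACK: move on to the next message
        seq ^= 1
    return receiver.delivered, transmissions


delivered, transmissions = transfer(["a", "b", "c"], lost_data={1}, lost_acks={3})
```

In this run the second transmission (data) and the ACK for the fourth transmission are lost, so five transmissions are needed, and the receiver still delivers each message exactly once.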

rdt 3.0 Sender and Receiver FSM

Computer Networks (CS F303)
BITS Pilani Virendra Singh Shekhawat
Department of Computer Science and Information Systems
Pilani Campus
Topics

• Transport Layer
• Reliable data transfer (Protocol design)
– Stop and Wait vs. Pipelining (Sliding Window)
– Go Back N and Selective Repeat Protocols
• Flow control
• Congestion control

rdt3.0 in action

rdt3.0 (Lost ACK and Premature Timeout)

rdt3.0: Performance
Timeline (sender on the left, receiver on the right):
• First packet bit transmitted, t = 0
• Last packet bit transmitted, t = L / R
• First packet bit arrives at receiver
• Last packet bit arrives, receiver sends ACK
• ACK arrives, sender sends next packet, t = RTT + L / R

Example: 1 Gbps link, 15 ms end-to-end prop. delay, 1 KB packet (L / R = 8 microseconds = 0.008 ms, RTT = 30 ms):

\[ U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027 \]
Pipelining: Increased Utilization
Timeline with three packets in flight (sender on the left, receiver on the right):
• First packet bit transmitted, t = 0
• Last bit transmitted, t = L / R
• First packet bit arrives at receiver
• Last packet bit arrives, receiver sends ACK
• Last bit of 2nd packet arrives, receiver sends ACK
• Last bit of 3rd packet arrives, receiver sends ACK
• ACK arrives, sender sends next packet, t = RTT + L / R

Increase utilization by a factor of 3!

\[ U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008 \]
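
The two utilization figures can be checked with a short calculation. The function name and parameters below are illustrative; the example takes the 1 KB packet as 8000 bits, and `window` is the number of packets sent back to back before waiting for an ACK.

```python
def sender_utilization(packet_bits: int, rate_bps: float, rtt_s: float,
                       window: int = 1) -> float:
    """Fraction of time the sender is busy: window * (L/R) / (RTT + L/R)."""
    transmit_time = packet_bits / rate_bps  # L / R, in seconds
    return min(1.0, window * transmit_time / (rtt_s + transmit_time))


# 1 Gbps link, RTT = 30 ms, 8000-bit packets
stop_and_wait = sender_utilization(8000, 1e9, 0.030)
pipelined = sender_utilization(8000, 1e9, 0.030, window=3)
```

With these inputs `stop_and_wait` is roughly 0.00027 and `pipelined` is three times larger, matching the slides.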
Pipelined Protocols

• Pipelining: Sender allows multiple, “in-flight”, yet-to-be-acked pkts


– Range of sequence numbers must be increased
– Buffering at sender and/or receiver

Pipelining Protocols Requirements

• The range of sequence numbers must be increased


– Multiple in-transit packets

• Packet Buffering is required at both sides. Why?

• Two basic approaches


– Go-Back-N (GBN)
– Selective Repeat (SR)

GBN Protocol
Sender window N = 4; pkt2 is lost in transit.

Sender:
1. send pkt0, pkt1, pkt2 (lost), pkt3, then wait (window full)
2. rcv ack0, send pkt4
3. rcv ack1, send pkt5
4. ignore duplicate ACK
5. pkt 2 timeout: send pkt2, pkt3, pkt4, pkt5

Receiver:
1. receive pkt0, send ack0
2. receive pkt1, send ack1
3. receive pkt3, discard, (re)send ack1
4. receive pkt4, discard, (re)send ack1
5. receive pkt5, discard, (re)send ack1
6. rcv pkt2, deliver, send ack2
7. rcv pkt3, deliver, send ack3
8. rcv pkt4, deliver, send ack4
9. rcv pkt5, deliver, send ack5
GBN Sender

• K-bit seq # in pkt header (modulo 2^K arithmetic)
• A "window" of up to N consecutive unACKed pkts is allowed
GBN Sender FSM
Initial transition (Λ): base = 0; nextseqnum = 0

State "Wait":

rdt_send(data)
    if (nextseqnum < base+N) { /* window not full: we are allowed to send */
        sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
        udt_send(sndpkt[nextseqnum])
        if (base == nextseqnum) /* there are no packets in flight */
            start_timer
        nextseqnum++
    }
    else
        refuse_data(data)

timeout
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    …
    udt_send(sndpkt[nextseqnum-1])

rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    Λ (no action)

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    base = getacknum(rcvpkt)+1 /* advance the left side of the window */
    if (base == nextseqnum)
        stop_timer
    else
        start_timer
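
A minimal Python sketch of the sender FSM above. Several simplifications are assumptions of this sketch: sequence numbers are unbounded integers (no modulo 2^K wraparound), the timer is a boolean flag driven by explicit `timeout()` calls, corrupt ACKs are not modelled, and `udt_send` is a caller-supplied callback. The `max()` guard against stale ACKs is an addition not present in the FSM.

```python
from typing import Callable, List, Tuple


class GBNSender:
    def __init__(self, window_size: int,
                 udt_send: Callable[[int, str], None]) -> None:
        self.N = window_size
        self.base = 0
        self.nextseqnum = 0
        self.sndpkt: List[str] = []  # packets sent so far, indexed by seq #
        self.timer_running = False
        self.udt_send = udt_send

    def rdt_send(self, data: str) -> bool:
        if self.nextseqnum >= self.base + self.N:
            return False  # window full: refuse_data
        self.sndpkt.append(data)
        self.udt_send(self.nextseqnum, data)
        if self.base == self.nextseqnum:
            self.timer_running = True  # first packet in flight
        self.nextseqnum += 1
        return True

    def timeout(self) -> None:
        self.timer_running = True
        for seq in range(self.base, self.nextseqnum):
            self.udt_send(seq, self.sndpkt[seq])  # resend whole window

    def rdt_rcv_ack(self, acknum: int) -> None:
        # Cumulative ACK; max() keeps a stale ACK from moving base backwards.
        self.base = max(self.base, acknum + 1)
        self.timer_running = self.base != self.nextseqnum


sent: List[Tuple[int, str]] = []
sender = GBNSender(4, lambda seq, data: sent.append((seq, data)))
accepted = [sender.rdt_send(f"p{i}") for i in range(5)]  # fifth is refused
sender.rdt_rcv_ack(0)
sender.rdt_send("p4")
sender.rdt_rcv_ack(1)
sender.rdt_send("p5")
sender.rdt_rcv_ack(1)  # duplicate ACK: base unchanged
sender.timeout()       # resends pkt2..pkt5
```

The usage lines replay the GBN scenario with N = 4 in which pkt2 is lost: the sender ends up transmitting pkt0 to pkt5 once and then pkt2 to pkt5 again after the timeout.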
GBN Receiver FSM
• Always send ACK for correctly-received pkt with highest in-order seq #
– Need only to remember “expectedseqnum”
• If out-of-order pkt arrived
– Discard it
– Re-ACK pkt with the highest in-order seq #

Initial transition (Λ): expectedseqnum = 0; sndpkt = make_pkt(expectedseqnum,ACK,chksum)

State "Wait":

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++

default
    udt_send(sndpkt)
Selective Repeat Protocol
Sender window N = 4; pkt2 is lost in transit.

Sender:
1. send pkt0, pkt1, pkt2 (lost), pkt3, then wait (window full)
2. rcv ack0, send pkt4
3. rcv ack1, send pkt5
4. record ack3 arrived
5. pkt 2 timeout: send pkt2
6. record ack4 arrived
7. record ack5 arrived

Receiver:
1. receive pkt0, send ack0
2. receive pkt1, send ack1
3. receive pkt3, buffer, send ack3
4. receive pkt4, buffer, send ack4
5. receive pkt5, buffer, send ack5
6. rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2

Q: what happens when ack2 arrives?
Selective Repeat (SR) Protocol

• Receiver individually acknowledges all correctly received pkts


– Buffers pkts, as needed, for eventual in-order delivery to upper layer

• Sender only resends pkts for which ACK not received


– Sender keeps timer for each unACKed pkt

• Retransmit only that unacked packet for which timer expires

SR Protocol: Windows

Events at Sender
• Data from above
• Timeout
• ACK(n) in [sendbase, sendbase+N-1]

Events at Receiver
• Pkt n in [rcvbase, rcvbase+N-1]
• Pkt n in [rcvbase-N, rcvbase-1]
• Otherwise
Selective Repeat Dilemma
[Figure: sequence numbers cycle through 0, 1, 2, 3, 0, 1, 2. The sender transmits pkt0, pkt1, pkt2, and the receiver's window advances after each arrival; transmissions marked X are lost. After a timeout the sender retransmits pkt0, and the receiver will accept the packet with seq number 0.]
Relation between Window Size and Sequence Number
• Sequence number range for K bits
– 0 to 2^K - 1
• What should be the window size N for
– Selective Repeat
– Go Back N
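
For reference, the commonly stated bounds that answer the question above can be written as follows. They are given here as a standard textbook result rather than something taken from the slide: a GBN receiver accepts only the next expected number, so one sequence number must stay unused, while an SR receiver window must never overlap the previous window's numbers.

```latex
\[
N_{\text{GBN}} \le 2^{K} - 1,
\qquad
N_{\text{SR}} \le 2^{K-1}
\]
```

For example, with K = 2 (sequence numbers 0 to 3), GBN can use a window of up to 3, but SR is unambiguous only for N ≤ 2.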

Topics

• Transport Layer
• TCP Protocol
– Connection Establishment
– TCP Segment Structure
– Reliable data transfer
– Flow control
– Congestion control

TCP [RFCs: 793, 1122, 1323, 2018, 2581]

• Point-to-point protocol
– One sender and one receiver
• Reliable in-order byte stream
– No message boundaries
• Pipelined
– Window size is set by congestion and flow control
• Full-duplex data
– Bi-directional data flow in same connection
• Connection oriented
– Handshaking (exchange of control msgs)
• Flow controlled
– Sender does not overwhelm receiver
TCP Segment Structure
TCP segment format (each row is 32 bits wide):
• source port # | dest port #
• sequence number
• acknowledgement number
• head len | not used | U A P R S F | receive window
• checksum | Urg data pointer
• options (variable length)
• application data (variable length)

Field notes:
– sequence number, acknowledgement number: counting by bytes of data (not segments!)
– URG: urgent data (generally not used)
– ACK: ACK # valid
– PSH: push data now (generally not used)
– RST, SYN, FIN: connection estab (setup, teardown commands)
– receive window: # bytes rcvr willing to accept
– checksum: Internet checksum (as in UDP)
TCP: Wireshark Capture

TCP Sequence Numbers and ACKs
• TCP views data as a stream of bytes
• Sequence number reflects the stream of transmitted bytes, not segments
• Sequence number of a segment: byte-stream number of the first byte in the segment

[Figure: an outgoing segment from the sender and an incoming segment to the sender, each with source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, and urg pointer fields; the incoming segment carries acknowledgement number A. The sender sequence number space, with window size N, is divided into: sent and ACKed; sent, not yet ACKed ("in-flight"); usable but not yet sent; not usable.]
Connection Management

• Before exchanging data, sender/receiver do a "handshake"
– Agree on connection parameters

Both the client application and the server application hold:
– connection state: ESTAB
– connection variables: seq # client-to-server and server-to-client; rcvBuffer size at server, client

Client side: Socket clientSocket = new Socket("hostname","port number");
Server side: Socket connectionSocket = welcomeSocket.accept();
2-way Handshake

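• Will a 2-way handshake always work in a network?

[Figure: two versions of a 2-way handshake. (1) One side says "Let's talk", the other answers "OK", and both enter ESTAB. (2) The client chooses x and sends req_conn(x); the server enters ESTAB and replies acc_conn(x); the client enters ESTAB.]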
TCP 3-way Handshake

Client states: LISTEN → SYNSENT → ESTAB. Server states: LISTEN → SYN RCVD → ESTAB.

1. Client chooses init seq num x and sends TCP SYN msg (SYNbit=1, Seq=x); enters SYNSENT
2. Server chooses init seq num y and sends TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); enters SYN RCVD
3. Client: received SYNACK(x) indicates server is live; sends ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; enters ESTAB
4. Server: received ACK(y) indicates client is live; enters ESTAB
Lost ACK Scenario

Premature Timeout

Cumulative ACK

Is TCP GBN or SR…?

1. Are out-of-order segments individually ACKed?
2. Are ACKs cumulative?
3. How many timers are maintained by the sender?
4. Does the TCP receiver accept out-of-order segments?
TCP Connection Close

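Client states: ESTAB → FIN_WAIT_1 → FIN_WAIT_2 → TIMED_WAIT → CLOSED. Server states: ESTAB → CLOSE_WAIT → LAST_ACK → CLOSED.

1. Client calls clientSocket.close() and sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
2. Server replies ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data)
3. Client enters FIN_WAIT_2 and waits for server close
4. Server sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
5. Client replies ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
6. Server receives the ACK and enters CLOSED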
TCP Connection States-Client and Server

TCP Flow Control

• Receiver advertises free buffer space by including rwnd value in TCP header
– RcvBuffer size set via socket options (typical default is 4096 bytes)
– Many operating systems auto-adjust RcvBuffer
• Sender limits amount of unACKed ("in-flight") data to receiver's rwnd value

[Figure: receiver-side buffering. TCP segment payloads arrive into RcvBuffer, which holds buffered data on its way to the application process; rwnd is the free buffer space.]
TCP Timeout

• How to set TCP Timeout value?


– Must be longer than RTT
– Too short vs. too long

• How to estimate RTT?


– RTT: measured time from segment transmission until ACK receipt

RTT Estimation
EstimatedRTT = (1-α)*EstimatedRTT + α*SampleRTT
– Influence of past sample decreases exponentially fast
– Typical value of α = 0.125

[Figure: SampleRTT and EstimatedRTT in milliseconds (axis from 100 to 350) plotted against time in seconds, for the path gaia.cs.umass.edu to fantasia.eurecom.fr.]
Timeout Interval

• Timeout Interval
– EstimatedRTT + "safety margin"
– Large variation in EstimatedRTT → larger safety margin

DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|
(typically, β = 0.25)

TimeoutInterval = EstimatedRTT + 4*DevRTT
(estimated RTT plus "safety margin")
Example: Timeout Interval

• Consider three RTT samples (in ms): 150, 200 and 210 in that order. Assume
initial estimated RTT= 200 ms, initial DevRTT = 50 ms, β = 0.25 and α = 0.125
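
One way to work through this example is sketched below. The slides do not say whether DevRTT should use the EstimatedRTT from before or after the current sample is folded in; this sketch assumes DevRTT is updated first, using the old EstimatedRTT, so the numbers differ slightly if the other order is chosen. The function names are illustrative.

```python
from typing import List, Tuple

ALPHA = 0.125
BETA = 0.25


def update(estimated_rtt: float, dev_rtt: float,
           sample_rtt: float) -> Tuple[float, float, float]:
    """Return (EstimatedRTT, DevRTT, TimeoutInterval) after one sample."""
    # Assumption: DevRTT uses the EstimatedRTT from before this sample.
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    return estimated_rtt, dev_rtt, estimated_rtt + 4 * dev_rtt


def run(samples: List[float], estimated_rtt: float = 200.0,
        dev_rtt: float = 50.0) -> List[Tuple[float, float, float]]:
    history = []
    for sample in samples:
        estimated_rtt, dev_rtt, timeout = update(estimated_rtt, dev_rtt, sample)
        history.append((estimated_rtt, dev_rtt, timeout))
    return history


history = run([150.0, 200.0, 210.0])
```

Under that assumption, the TimeoutInterval after the three samples is 393.75 ms, about 350.78 ms, and about 329.12 ms.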

Next…

• Transport Layer
• TCP Protocol
– Connection Establishment
– TCP Segment Structure
– Reliable data transfer
– Flow control
– Timeout Estimation
– Congestion control

Network Congestion…?

[Figure: network with links of 10 Mbps, 1.5 Mbps, and 100 Mbps.]

• Why is it a problem?
– Different sources compete for resources inside the network
– Sources are unaware of the current state of resources
– Sources are unaware of each other
– In many situations this will result in < 1.5 Mbps of throughput (congestion collapse)
Congestion Control

• What is congestion
– Too many sources sending too much data too fast for network to handle

• Congestion results in
– Packet losses
– Packet delays
– Throughput reduction

Causes/Cost of Congestion [.1]

• Two senders and two receivers
• One router with infinite buffers
• Output link capacity R
• No retransmission

[Figure: two plots against the input rate λ_in, each with R/2 marked on the λ_in axis: the output rate λ_out (with R/2 marked on its axis) and the delay.]
Causes/Cost of Congestion [..2]

• One router, finite buffers
• Sender retransmission of timed-out packet
– Application-layer input = application-layer output: λ_in = λ_out
– Transport-layer input includes retransmissions: λ'_in > λ_in
Causes/Cost of Congestion […3]

• Packets can be dropped at router due to


full buffers
• Sender only resends if packet known to be
lost (Tricky Assumption…)

• Cost of congestion
– Retransmission for dropped packets

Causes/Cost of Congestion [….4]

• Packets can be lost, dropped at router due to full buffers


– Sender times out prematurely, sending two copies, both of which are
delivered

Causes/Cost of Congestion […..5]

When a packet is dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches Towards Congestion Control

• Network Assisted Congestion control


– Routers provide feedback to end systems

• End-to-end Congestion control


– No explicit feedback from network

TCP Congestion Control

• Approach
– Sending rate is a function of perceived congestion
• Three important questions arise
– How does the sender perceive congestion on the path?
– What algorithm should be used to change its sending rate?
– How does the sender limit the sending rate?
TCP Congestion Control

• Sender limits transmission
– LastByteSent – LastByteAcked <= min(cwnd, rwnd)

[Figure: sender sequence number space, showing the last byte ACKed, the bytes sent but not yet ACKed ("in-flight"), and the last byte sent; cwnd spans the in-flight region.]

• What is TCP Sending Rate/Throughput?
Switching from Slow Start to CA

FSM Description of TCP Congestion Control

Example: TCP Congestion Control
a) Identify the intervals of time when TCP slow start is operating.
b) Identify the intervals of time when TCP congestion avoidance is operating.
c) What is the ssthresh value between transmission rounds 7 and 10?
d) What is the congestion window value at transmission round 11?
e) How many segments have been sent up to and including transmission round 11?
f) Identify the intervals of time when TCP fast retransmit and fast recovery are used.
TCP Sawtooth Behavior

[Figure: congestion window versus time. After the initial slow start, the window follows a sawtooth shaped by fast retransmit and recovery; timeouts may still occur, after which slow start is used to pace packets.]
Congestion Control Objectives

• Key to congestion avoidance is the "control function" that sources use to increase or decrease their sending window
– Distributedness
– Efficiency: X_knee = Σ x_i(t)
– Fairness: (Σ x_i)^2 / (n Σ x_i^2)
– Convergence: control system must be stable and reach the goal state quickly from any starting state
Linear Control
• Many different possibilities for reaction to congestion and probing
– Examine simple linear controls
– W(t + 1) = a + b*W(t)
– Different a_i/b_i for increase and a_d/b_d for decrease
• Supports various reactions to signals
– Increase/decrease additively
– Increase/decrease multiplicatively
– Which of the four combinations is optimal?
Phase plots (Vector Representation)

• What are desirable properties?

[Figure: phase plot with User 1's allocation x1 on the horizontal axis and User 2's allocation x2 on the vertical axis. The fairness line and the efficiency line cross at the optimal point; one side of the efficiency line is labeled overload and the other underutilization.]
Additive Increase/Decrease
• Both x1 and x2 increase/decrease by the same amount over time
– The additive increase/decrease policy of changing both users' allocations by a_I corresponds to moving along a 45° line

[Figure: phase plot (User 1's allocation x1 vs. User 2's allocation x2) with the fairness and efficiency lines; the allocation moves from T0 to T1 along a 45° line.]
Multiplicative Increase/Decrease

• Both x1 and x2 increase/decrease by the same factor over time
– Moves along the line extended from the origin: constant fairness

[Figure: phase plot (User 1's allocation x1 vs. User 2's allocation x2) with the fairness and efficiency lines; the allocation moves from T0 to T1 along a line through the origin.]
What is the Right Choice?

• Constraints limit us to AIMD
– AIMD moves towards the optimal point

[Figure: phase plot (User 1's allocation x1 vs. User 2's allocation x2) with the fairness and efficiency lines; successive allocations x0, x1, x2 move toward the optimal point.]
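
The convergence claim can be illustrated with a small two-user simulation. It is a sketch under simplifying assumptions: both users see the same binary congestion signal (total allocation above capacity), they update in lockstep, and the increase step, decrease factor, capacity, and starting point are arbitrary illustrative values. The fairness function is the index (Σ x_i)^2 / (n Σ x_i^2) from the objectives slide.

```python
from typing import List, Tuple


def fairness_index(x: List[float]) -> float:
    """(sum x_i)^2 / (n * sum x_i^2); equals 1.0 for equal allocations."""
    return sum(x) ** 2 / (len(x) * sum(v * v for v in x))


def aimd(x1: float, x2: float, capacity: float, add: float = 1.0,
         mult: float = 0.5, rounds: int = 200) -> Tuple[float, float]:
    """Additive increase while under capacity, multiplicative decrease above."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 * mult, x2 * mult  # both back off by the same factor
        else:
            x1, x2 = x1 + add, x2 + add    # both probe by the same amount
    return x1, x2


start = (5.0, 80.0)
final = aimd(start[0], start[1], capacity=100.0)
```

Each additive step keeps the gap between the two users unchanged while each multiplicative step halves it, so starting from the very unequal point (5, 80) the allocations end up nearly equal.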


TCP Modeling

• Given the congestion control behavior of TCP, can we predict what type of performance we should get?
• Important factors that affect performance:
– Loss rate
• Affects how often window is reduced
– RTT
• Affects increase rate and relates BW to window
– RTO
• Affects performance during loss recovery
– MSS
• Affects increase rate
Simple TCP Model

• Some assumptions
– Fixed RTT
– No delayed ACKs

TCP Performance

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Throughput in terms of loss rate L:

\[ \text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}} \]

• Requires an average congestion window of 83,333 segments
• L = 2 × 10^-10: to get 10 Gbps throughput, TCP can afford only one loss event for every 5,000,000,000 segments
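
The numbers in this example can be reproduced from the throughput formula with a short calculation. The function names are illustrative, and MSS is converted from bytes to bits.

```python
def required_window_segments(throughput_bps: float, rtt_s: float,
                             mss_bytes: int) -> float:
    """Average window (in segments) needed to sustain the throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float,
                       mss_bytes: int) -> float:
    """Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


window = required_window_segments(10e9, 0.1, 1500)
loss_rate = required_loss_rate(10e9, 0.1, 1500)
segments_per_loss = 1 / loss_rate
```

This gives a window of about 83,333 segments and a loss rate of about 2.1 × 10^-10, i.e. roughly one loss per 4.7 billion segments, which the slide rounds to 2 × 10^-10 and 5,000,000,000.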
TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R.]
Fairness and UDP

• Multimedia apps often do not use TCP


– Do not want rate throttled by congestion control
• Instead use UDP
– Send audio/video at constant rate, tolerate packet loss
• Multimedia applications running over UDP are not being fair.
Why???

Fairness and Parallel TCP Connections

• Web browsers often use multiple parallel TCP connections


– To transfer multiple objects within a Web page

• Application level fairness with multiple parallel TCP


connections???

TCP Fairness with different RTT

• Flows sharing bottleneck link with different RTT do not get


same bandwidth. Why???
– BW is proportional to 1/RTT

Goals for TCP Fairness and Throughput

• Fast window growth
– Slow start and additive increase are too slow when bandwidth is large
– Want to converge more quickly
• Maintain fairness with other TCP variants
– Window growth cannot be too aggressive
• Improve RTT fairness
– TCP Tahoe/Reno flows are not fair when RTTs vary widely
Compound TCP Implementation
• Default TCP implementation in the Windows 2008 TCP stack
– Good for connections with large "bandwidth-delay" products
– Makes congestion decisions that reduce the transmission rate based on RTT variations
• Key idea: split cwnd into two separate windows
• swnd = min(cwnd + dwnd, awnd)
– cwnd (congestion window) is controlled by AIMD
– dwnd (delay window) is the delay-based window
• Rules for adjusting dwnd:
– If RTT is increasing, decrease dwnd (dwnd >= 0)
– If RTT is decreasing, increase dwnd
– Increases/decreases are proportional to the rate of change
Compound TCP Example
[Figure: cwnd versus time. After slow start, cwnd grows faster while RTT is low and slower while RTT is high; timeouts are marked along the curve.]

• Aggressiveness corresponds to changes in RTT
• Advantage: fast ramp up, more fair to flows with different RTTs
• Disadvantage: must estimate a precise value of RTT, which is challenging


TCP Cubic

• \( W(t) = C\,(t - K)^3 + W_{max} \), with \( K = \sqrt[3]{W_{max}\,\beta / C} \)

- W_max = cwnd before last reduction
- β is the multiplicative decrease factor
- C is a CUBIC parameter
- t is the time elapsed since the last window reduction
- K is the time period that the function takes to increase W to W_max (when there is no further loss)
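
The window function above can be evaluated directly. This is a sketch of the formula only, not of a full CUBIC implementation; the parameter values (W_max = 100 segments, β = 0.2, C = 0.4) are illustrative assumptions.

```python
import math


def cubic_k(w_max: float, beta: float, c: float) -> float:
    """K = cube root of (W_max * beta / C)."""
    return (w_max * beta / c) ** (1.0 / 3.0)


def cubic_window(t: float, w_max: float, beta: float, c: float) -> float:
    """W(t) = C * (t - K)^3 + W_max, with t measured since the last reduction."""
    return c * (t - cubic_k(w_max, beta, c)) ** 3 + w_max


W_MAX, BETA, C = 100.0, 0.2, 0.4
k = cubic_k(W_MAX, BETA, C)
w_at_0 = cubic_window(0.0, W_MAX, BETA, C)
w_at_k = cubic_window(k, W_MAX, BETA, C)
```

With these values the window starts at (1 - β) * W_max = 80 right after the reduction, flattens out as it reaches W_max = 100 at t = K, and then accelerates again to probe for more bandwidth.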

TCP CUBIC Example
[Figure: cwnd versus time under the CUBIC function. After slow start and a timeout, cwnd ramps up fast, flattens in a stable region near cwnd_max, and then slowly accelerates to probe for bandwidth.]

• Less wasted bandwidth due to fast ramp up
• Stable region and slow acceleration help maintain fairness
– Fast ramp up is more aggressive than additive increase
– To be fair to Tahoe/Reno, CUBIC needs to be less aggressive
TCP Congestion Control

• TCP uses a loss-based congestion control strategy
– Poor performance with high-BW links and large buffer sizes
– Large buffers lead to long RTTs and delayed congestion notification
• Reference
– Congestion-Based Congestion Control: BBR [N. Cardwell, 2016]
Advanced Computer Networks CS G525 BITS Pilani, Pilani Campus
BBR Congestion Control

• BBR provides "queueless" congestion control
• A flow should ideally have data in flight equal to the Bandwidth Delay Product (BDP)
– BDP = RTprop x BtlBw
– At this point, a connection completely saturates the bottleneck link with no buffer (queue) occupancy
• What is congestion?
– In-flight data greater than the BDP (for a long duration) is considered congestion
BBR

• RTprop and BtlBw cannot be measured simultaneously. Why?
– Measuring RTprop requires operating to the left of the BDP point, while measuring BtlBw requires operating to the right.

What’s Next…

• Network Layer
– Network layer service models (Internet and ATM)
– Forwarding versus Routing
– How a router works?
– IPv4 Datagram and Fragmentation
– IPv4 Addressing
• Hierarchical Addressing
• NAT, Sub Netting, IPv4 to IPv6 translation, ICMP
– Routing Algorithms and Protocols
• Inter-domain Routing and Intra-domain routing
– Multicast Routing
58
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Silly Window Syndrome[.1]

• Should the sender transmit a half-MSS segment or wait for the window to open to a full MSS?
– Early implementations of TCP allowed transmission of a half MSS
– This strategy can lead to silly window syndrome

• What is silly window syndrome?
– Either the sender transmits a small segment or the receiver opens the window by a small amount

59
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Silly Window Syndrome[..2]

• It is not possible to outlaw sending small segments. Why?

• But we can keep the receiver from introducing small “containers”
– After advertising a zero window, the receiver must wait for space equal to an MSS before it advertises again
– The receiver can delay ACKs and send a combined ACK
• But it has no clue how long it should wait!

60
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Nagle’s Algorithm

• When does the TCP sender decide to transmit a segment?
– How long should the sender wait?

When the app produces data to send:
    if both the available data and window >= MSS
        send a full segment
    else
        if there is unACKed data in flight
            buffer the new data until an ACK arrives
        else
            send all the new data now
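
The decision rule above can be sketched as a pure function. The function name, argument names and returned labels are hypothetical and only mirror the three branches of the pseudocode; a real TCP stack tracks far more state.

```python
def nagle_decision(available: int, window: int, mss: int, unacked_in_flight: bool) -> str:
    """Return which branch of Nagle's rule applies when the app hands over new data."""
    if available >= mss and window >= mss:
        return "send full segment"
    if unacked_in_flight:
        # Small data while an earlier segment is unACKed: hold it back.
        return "buffer until ACK"
    # Nothing in flight, so a small segment may go out immediately.
    return "send now"


if __name__ == "__main__":
    print(nagle_decision(available=2000, window=4000, mss=1460, unacked_in_flight=True))
    print(nagle_decision(available=10, window=4000, mss=1460, unacked_in_flight=True))
    print(nagle_decision(available=10, window=4000, mss=1460, unacked_in_flight=False))
```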

61
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Computer Networks (CS F303)
BITS Pilani Virendra Singh Shekhawat
Department of Computer Science and Information Systems
Pilani Campus
What’s Next…

• Network Layer
– Network layer service models (Datagram, MPLS and ATM)
– Forwarding versus Routing
– How a router works?
– IPv4 Datagram and Fragmentation
– IPv4 Addressing
• Hierarchical Addressing
• NAT, Sub Netting, IPv4 to IPv6 translation, ICMP
– Routing Algorithms and Protocols
• Inter-domain Routing and Intra-domain routing
– Multicast Routing
2
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Network layer

• Transport segment from sending to receiving host
• On sending side, encapsulates segments into datagrams
• On receiving side, delivers segments to transport layer
• Router examines header fields in all IP datagrams passing through it
[Figure: end hosts run the full application/transport/network/data link/physical stack; routers along the path implement only the network, data link and physical layers.]

3
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Network Layer Connection and Connection-less
Service

• Datagram network
– Network-layer connectionless service
• VC Network
– Network-layer connection service

• Analogous to the transport-layer services, but


– Service: host-to-host
– No choice: network provides one or the other
– Implementation: in network core
4
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Datagram Networks

• No call setup at network layer


• Routers: no state about end-to-end connections
– No network-level concept of “connection”
• Packets forwarded using destination host address
– Packets between same src-dest pair may take different paths

[Figure: two end hosts, each with an application/transport/network/data link/physical stack: 1. Send data, 2. Receive data.]

5
Computer Networks (CS F303) BITS Pilani, Pilani Campus
IP Datagram Format
Header fields, laid out in 32-bit rows:
• ver: IP protocol version number
• head. len: header length (bytes)
• type of service: “type” of data
• length: total datagram length (bytes)
• 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
• time to live: max number of remaining hops (decremented at each router)
• upper layer: upper-layer protocol to deliver payload to
• header checksum: detects bit errors in datagram headers
• 32-bit source IP address
• 32-bit destination IP address
• options (if any): e.g., timestamp, record route taken, specify list of routers to visit
• data: variable length, typically a TCP or UDP segment

How much overhead?
• 20 bytes of TCP
• 20 bytes of IP
• = 40 bytes + app layer overhead
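
To make the field layout concrete, here is a minimal sketch that unpacks the fixed 20-byte IPv4 header with Python's `struct` module. The function name, the dictionary keys and the sample header are hypothetical; options and checksum verification are left out.

```python
import socket
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len_bytes": (ver_ihl & 0x0F) * 4,  # field counts 32-bit words
        "type_of_service": tos,
        "total_length": total_len,
        "identifier": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,    # in units of 8 bytes
        "ttl": ttl,
        "upper_layer": proto,                      # 6 = TCP, 17 = UDP
        "header_checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hand-built sample header: 20 bytes of IP carrying a 20-byte TCP header (40 bytes total).
SAMPLE = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0x1234, 0, 64, 6, 0,
                     socket.inet_aton("223.1.1.1"), socket.inet_aton("223.1.2.2"))

if __name__ == "__main__":
    print(parse_ipv4_header(SAMPLE))
```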
6
Computer Networks (CS F303) BITS Pilani, Pilani Campus
IPv4 Addressing

• An IPv4 address is a 32-bit address
• It uniquely and universally defines the connection of a device (for example, a computer or a router) to the Internet
• The address space of IPv4 is 2^32 = 4,294,967,296 addresses

7
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Datagram Forwarding Table
The routing algorithm computes the local forwarding table:

dest address       output link
address-range 1    3
address-range 2    2
address-range 3    2
address-range 4    1

• There are 4 billion IP addresses, so rather than list individual destination addresses, list ranges of addresses (aggregate table entries)
• The router looks up the IP destination address in the arriving packet’s header
[Figure: router with output links 1, 2 and 3.]

8
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Forwarding Table

Destination Address Range                       Link Interface
11001000 00010111 00010000 00000000 through
11001000 00010111 00010111 11111111             0
11001000 00010111 00011000 00000000 through
11001000 00010111 00011000 11111111             1
11001000 00010111 00011001 00000000 through
11001000 00010111 00011111 11111111             2
otherwise                                       3
9
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Longest prefix matching

Prefix Match Link Interface


11001000 00010111 00010 0
11001000 00010111 00011000 1
11001000 00010111 00011 2
otherwise 3
Examples
DA: 11001000 00010111 00010110 10100001 Which interface?

DA: 11001000 00010111 00011000 10101010 Which interface?
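
The two lookups above can be checked with a small sketch of longest prefix matching over bit strings. The table is copied from the slide; the names `TABLE`, `DEFAULT_INTERFACE` and `lookup` are hypothetical, and real routers use tries or TCAMs rather than a linear scan.

```python
# (prefix bits, link interface) pairs copied from the slide's table.
TABLE = [
    ("11001000 00010111 00010", 0),
    ("11001000 00010111 00011000", 1),
    ("11001000 00010111 00011", 2),
]
DEFAULT_INTERFACE = 3  # the "otherwise" row


def lookup(destination: str) -> int:
    """Return the interface of the longest table prefix that matches the address."""
    bits = destination.replace(" ", "")
    best_len, best_iface = -1, DEFAULT_INTERFACE
    for prefix, iface in TABLE:
        p = prefix.replace(" ", "")
        if bits.startswith(p) and len(p) > best_len:
            best_len, best_iface = len(p), iface
    return best_iface


if __name__ == "__main__":
    print(lookup("11001000 00010111 00010110 10100001"))  # matches only the first prefix
    print(lookup("11001000 00010111 00011000 10101010"))  # matches two prefixes; the 24-bit one wins
```

The first address is forwarded to interface 0. The second matches both the 21-bit and the 24-bit prefix, so the longer one decides and it goes to interface 1.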

10
Computer Networks (CS F303) BITS Pilani, Pilani Campus
IP Addressing [.1]
223.1.1.1 = 11011111 00000001 00000001 00000001

Q: How are interfaces actually connected?
• Wired Ethernet interfaces are connected by Ethernet switches
• Wireless WiFi interfaces are connected by a WiFi base station

[Figure: a router joining three groups of interfaces: 223.1.1.1 to 223.1.1.4; 223.1.2.1, 223.1.2.2, 223.1.2.9; and 223.1.3.1, 223.1.3.2, 223.1.3.27.]
11
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Subnets
• IP address:
– Subnet part: high-order bits
– Host part: low-order bits
• What’s a subnet?
– Device interfaces with the same subnet part of the IP address
– Can physically reach each other without an intervening router
[Figure: the network from the previous slide with each subnet highlighted: 223.1.1.x, 223.1.2.x and 223.1.3.x.]

12
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Subnets: Subnet Mask
[Figure: the same network with each subnet labeled by its address and mask: 223.1.1.0/24, 223.1.2.0/24 and 223.1.3.0/24.]
13

Computer Networks (CS F303) BITS Pilani, Pilani Campus


Subnets
[Figure: three routers interconnecting host subnets 223.1.1.x, 223.1.2.x and 223.1.3.x, with router-to-router links addressed from 223.1.7.x, 223.1.8.x and 223.1.9.x.]

14
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Example: Subnetting
ISP's block 11001000 00010111 00010000 00000000 200.23.16.0/20

Organization 0 11001000 00010111 00010000 00000000 200.23.16.0/23


Organization 1 11001000 00010111 00010010 00000000 200.23.18.0/23
Organization 2 11001000 00010111 00010100 00000000 200.23.20.0/23
... ….. …. ….
Organization 7 11001000 00010111 00011110 00000000 200.23.30.0/23
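
The split of the ISP's /20 into eight /23 blocks can be reproduced with Python's standard `ipaddress` module. This is only a sketch of the arithmetic; the variable names are hypothetical.

```python
import ipaddress

# The ISP's block from the slide, divided into equal /23 blocks.
isp_block = ipaddress.ip_network("200.23.16.0/20")
organizations = list(isp_block.subnets(new_prefix=23))

if __name__ == "__main__":
    for number, block in enumerate(organizations):
        print(f"Organization {number}: {block} ({block.num_addresses} addresses)")
```

Going from /20 to /23 frees 3 bits, hence 2^3 = 8 organizations with 2^9 = 512 addresses each.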

15
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Hierarchical Addressing: Route Aggregation

Hierarchical addressing allows efficient advertisement of routing information

[Figure: Organizations 0 to 7 (200.23.16.0/23, 200.23.18.0/23, 200.23.20.0/23, ..., 200.23.30.0/23) connect to Fly-By-Night-ISP, which advertises to the Internet: “Send me anything with addresses beginning 200.23.16.0/20”. ISPs-R-Us advertises: “Send me anything with addresses beginning 199.31.0.0/16”.]
16
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Hierarchical Addressing: More Specific Routes

ISPs-R-Us has a more specific route to Organization 1

[Figure: Organizations 0, 2, ..., 7 (200.23.16.0/23, 200.23.20.0/23, ..., 200.23.30.0/23) remain with Fly-By-Night-ISP, which still advertises: “Send me anything with addresses beginning 200.23.16.0/20”. Organization 1 (200.23.18.0/23) is now connected to ISPs-R-Us, which advertises: “Send me anything with addresses beginning 199.31.0.0/16 or 200.23.18.0/23”.]
17
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Exercise

• An organization is given the block 17.12.14.0/26
– How many total addresses are in the block?
– What is the range of addresses?
– The organization has four departments and wants to divide the addresses into four sub-blocks of equal size.
– What is the address of each subnet?
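
One way to check an answer to this exercise is with the standard `ipaddress` module. The sketch below is a worked illustration, not an official solution; the variable names are hypothetical.

```python
import ipaddress

block = ipaddress.ip_network("17.12.14.0/26")

total_addresses = block.num_addresses        # 2^(32-26)
first_address, last_address = block[0], block[-1]
# Four equal sub-blocks need 2 extra prefix bits: /26 -> /28.
departments = list(block.subnets(prefixlen_diff=2))

if __name__ == "__main__":
    print(total_addresses, first_address, last_address)
    for dept in departments:
        print(dept, "mask", dept.netmask)
```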

18
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Example: Different Size Subnets

• The organization has three departments and wants to divide the addresses into three sub-blocks of 32, 16 and 16 addresses.

• Subnet mask for each subnet?

• Range of addresses in each subnet?
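
A possible worked answer, assuming the block is the same 17.12.14.0/26 as in the previous exercise (the slide does not restate it). The department names are hypothetical, and other assignments of the halves are equally valid.

```python
import ipaddress

block = ipaddress.ip_network("17.12.14.0/26")  # assumed block, 64 addresses

# Halve the block: the first /27 (32 addresses) goes to the large department,
# the second /27 is halved again into two /28 blocks (16 addresses each).
first_half, second_half = block.subnets(prefixlen_diff=1)
dept_a = first_half
dept_b, dept_c = second_half.subnets(prefixlen_diff=1)

if __name__ == "__main__":
    for name, net in (("A", dept_a), ("B", dept_b), ("C", dept_c)):
        print(f"Dept {name}: {net}  mask {net.netmask}  range {net[0]} to {net[-1]}")
```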

19
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Classful Addressing

Find the class of each address:


a. 00000001 00001011 00001011 11101111
b. 11000001 10000011 00011011 11111111
c. 14.23.120.8
d. 252.5.15.111
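
The class is determined by the leading bits of the first octet (0 = A, 10 = B, 110 = C, 1110 = D, 1111 = E). A minimal sketch, with a hypothetical helper name, that accepts either the binary or the dotted-decimal notation used above:

```python
def address_class(address: str) -> str:
    """Classify an IPv4 address written as dotted decimal or as four binary octets."""
    if "." in address:
        first_octet = int(address.split(".")[0])
    else:
        first_octet = int(address.split()[0], 2)
    if first_octet < 128:
        return "A"   # leading bit 0
    if first_octet < 192:
        return "B"   # leading bits 10
    if first_octet < 224:
        return "C"   # leading bits 110
    if first_octet < 240:
        return "D"   # leading bits 1110 (multicast)
    return "E"       # leading bits 1111 (reserved)


if __name__ == "__main__":
    for addr in ("00000001 00001011 00001011 11101111",
                 "11000001 10000011 00011011 11111111",
                 "14.23.120.8",
                 "252.5.15.111"):
        print(addr, "->", address_class(addr))
```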

20
Computer Networks (CS F303) BITS Pilani, Pilani Campus
What is Inside a Router?
• Forwarding/Switching a datagram
– The actual transfer of datagram from a router’s incoming links to the
appropriate outgoing links at the router

21
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Router architecture overview

Two key router functions:
• Run routing algorithms/protocols (RIP, OSPF, BGP)
• Forward datagrams from incoming to outgoing link

[Figure: router input ports and output ports joined by a high-speed switching fabric (forwarding data plane, hardware). A routing processor handles routing and management (control plane, software); forwarding tables are computed there and pushed to the input ports.]


22
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Input Port Functions

Physical layer: bit-level reception
Data link layer: e.g., Ethernet
Decentralized switching:
• Given datagram dest., look up output port using forwarding table in input port memory
• Goal: complete input port processing at ‘line speed’

23
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Router Architecture Overview

24
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Switching Fabric [.1]
• Switching rate: rate at which packets can be transferred from
inputs to outputs
• N inputs: switching rate N times line rate is desirable

25
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Switching Fabric [..2]

26
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Input Ports

• Queuing: fabric slower than input ports combined
• Output port contention: only one red datagram can be transferred; the lower red packet is blocked
• One packet time later: the green packet experiences HOL blocking
• Head-of-Line (HOL) blocking: a queued datagram at the front of the queue prevents others in the queue from moving forward
27
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Output Port Queuing

Suppose Rswitch is N times faster than Rline
[Figure: the switch fabric at time t, when packets move from input to output, and one packet-time later.]

28
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Output Ports

• Buffering is required when datagrams arrive from the fabric faster than the transmission rate
• A scheduling discipline chooses among queued datagrams for transmission (FIFO, FQ, WFQ, RED)
• Queuing (delay) and loss occur due to output port buffer overflow!
29
Computer Networks (CS F303) BITS Pilani, Pilani Campus
What’s Next…

• Network Layer
– Network layer service models (Datagram, Virtual Circuit)
– Forwarding versus Routing
– How a router works?
– IPv4 Datagram and Fragmentation
– IPv4 Addressing
• Sub netting, Hierarchical Addressing
– NAT, IPv4 to IPv6 translation, ICMP
– Routing Algorithms and Protocols
• Inter-domain Routing and Intra-domain routing
– Multicast Routing
– Virtual Circuit Networks
• MPLS, ATM
2
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Network Address Translation (NAT)

• Motivation: the local network uses just one IP address as far as the outside world is concerned
– Can change addresses of devices in the local network without notifying the outside world
– Can change ISP without changing addresses of devices in the local network
– Devices inside the local net are not explicitly addressable or visible by the outside world (a security plus)
12
Computer Networks (CS F303) BITS Pilani, Pilani Campus
NAT: Motivation???

[Figure: the rest of the Internet on one side of a NAT router (138.76.29.7 on the WAN side, 10.0.0.4 on the LAN side); on the other side the local network (e.g., home network) 10.0.0.0/24 with hosts 10.0.0.1, 10.0.0.2 and 10.0.0.3.]

• All datagrams leaving the local network have the same single source NAT IP address, 138.76.29.7, but different source port numbers
• Datagrams with source or destination in this network have a 10.0.0.0/24 address for source, destination (as usual)
13
Computer Networks (CS F303) BITS Pilani, Pilani Campus
How it Works???
NAT translation table
WAN side addr         LAN side addr
138.76.29.7, 5001     10.0.0.1, 3345
……                    ……

1: Host 10.0.0.1 sends datagram to 128.119.40.186, 80
   S: 10.0.0.1, 3345        D: 128.119.40.186, 80
2: NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table
   S: 138.76.29.7, 5001     D: 128.119.40.186, 80
3: Reply arrives, dest. address: 138.76.29.7, 5001
   S: 128.119.40.186, 80    D: 138.76.29.7, 5001
4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345
   S: 128.119.40.186, 80    D: 10.0.0.1, 3345
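
The four steps above can be sketched as a toy translation table. The class and method names are hypothetical, ports are handed out sequentially starting at 5001 only to match the slide, and timeouts, protocols and checksum rewriting are ignored.

```python
class NatRouter:
    """Toy NAT: maps (LAN addr, port) pairs to ports on a single WAN address."""

    def __init__(self, wan_ip: str, first_port: int = 5001):
        self.wan_ip = wan_ip
        self.next_port = first_port
        self.lan_to_wan = {}  # (lan_ip, lan_port) -> wan_port
        self.wan_to_lan = {}  # wan_port -> (lan_ip, lan_port)

    def outbound(self, src, dst):
        """Steps 1-2: rewrite the source of a datagram leaving the LAN."""
        if src not in self.lan_to_wan:
            self.lan_to_wan[src] = self.next_port
            self.wan_to_lan[self.next_port] = src
            self.next_port += 1
        return (self.wan_ip, self.lan_to_wan[src]), dst

    def inbound(self, src, dst):
        """Steps 3-4: rewrite the destination of a reply; None if no table entry."""
        wan_ip, wan_port = dst
        if wan_ip != self.wan_ip or wan_port not in self.wan_to_lan:
            return None
        return src, self.wan_to_lan[wan_port]


if __name__ == "__main__":
    nat = NatRouter("138.76.29.7")
    print(nat.outbound(("10.0.0.1", 3345), ("128.119.40.186", 80)))
    print(nat.inbound(("128.119.40.186", 80), ("138.76.29.7", 5001)))
```

An unsolicited inbound datagram finds no table entry and is dropped in this sketch.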

14
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Facts about NAT

• 16-bit port-number field:


– How many devices can be connected?

• NAT is controversial:
– Routers should only process up to layer 3
– Violates end-to-end argument
– Address shortage should instead be solved by IPv6

15
Computer Networks (CS F303) BITS Pilani, Pilani Campus
NAT Traversal Problem

• Client wants to connect to server with address 10.0.0.1
– Server address 10.0.0.1 is local to the LAN (client can’t use it as destination address)
– Only one externally visible NATed address: 138.76.29.7
[Figure: an external client, the NAT router (138.76.29.7 outside, 10.0.0.4 inside) and the server 10.0.0.1 behind it.]

16
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Solutions [.1]

• Statically configure NAT to forward incoming connection requests at a given port to the server
– e.g., (138.76.29.7, port 25000) always forwarded to 10.0.0.1 port 25000

• Universal Plug and Play (UPnP) Internet Gateway Device (IGD) Protocol allows a NATed host to:
– Learn the public IP address (138.76.29.7)
– Add/remove port mappings (with lease times)
– e.g., a BitTorrent application in the host asks the NAT to create a hole that maps (10.0.0.1, 3345) to (138.76.29.7, 5001)
17
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Solutions [..2]

• Relaying (used in Skype)
– NATed client establishes connection to relay
– External client connects to relay
– Relay bridges packets between the two connections
[Figure: 1. connection to relay initiated by NATed host (10.0.0.1, behind NAT router 138.76.29.7); 2. connection to relay initiated by client; 3. relaying established.]

18
Computer Networks (CS F303) BITS Pilani, Pilani Campus
IP Fragmentation & Reassembly [.1]

• Fragmentation: in: one large datagram; out: 3 smaller datagrams
• Reassembly
20
Computer Networks (CS F303) BITS Pilani, Pilani Campus
IP Fragmentation & Reassembly [..2]

Example:
• 4000 byte datagram
• MTU = 1500 bytes
One large datagram becomes several smaller datagrams:

                    length   ID   fragflag   offset
original datagram   4000     x    0          0
fragment 1          1500     x    1          0
fragment 2          1500     x    1          185
fragment 3          1040     x    0          370

• 1480 bytes in the data field of each full fragment
• offset = 1480/8 = 185
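
The numbers in this example can be reproduced with a short sketch. It assumes a 20-byte header with no options on every fragment; the function name and dictionary keys are hypothetical.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20) -> list:
    """Split a datagram of total_length bytes (header included) to fit the MTU."""
    remaining = total_length - header_len
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    fragments, data_sent = [], 0
    while remaining > 0:
        data = min(max_data, remaining)
        remaining -= data
        fragments.append({
            "length": data + header_len,
            "fragflag": 1 if remaining > 0 else 0,  # "more fragments" bit
            "offset": data_sent // 8,               # in 8-byte units
        })
        data_sent += data
    return fragments


if __name__ == "__main__":
    for frag in fragment(4000, 1500):
        print(frag)
```

The 3980 data bytes are carried as 1480 + 1480 + 1020, which gives the lengths 1500, 1500 and 1040 and the offsets 0, 185 and 370 shown above.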

21
Computer Networks (CS F303) BITS Pilani, Pilani Campus
The Internet network layer
Host, router network layer functions (below the transport layer: TCP, UDP; above the link layer and physical layer):
• Routing protocols: path selection; RIP, OSPF, BGP
• Forwarding table
• IP protocol: addressing conventions, datagram format, packet handling conventions
• ICMP protocol: error reporting, router “signaling”

22
Computer Networks (CS F303) BITS Pilani, Pilani Campus
How does a host get IP address?

• Hard-coded by system admin in a file
– Windows: control-panel->network->configuration->tcp/ip->properties
– UNIX: /etc/rc.config

• DHCP (Dynamic Host Configuration Protocol): dynamically get an address from a server
– Automates the process of connecting a host to a network (“plug-and-play”)
– Works between a client (arriving host) and a server (DHCP server)
– Useful where hosts join and leave the network frequently
23
Computer Networks (CS F303) BITS Pilani, Pilani Campus
DHCP: Dynamic Host Configuration Protocol

Goal: allow host to dynamically obtain its IP address from network server when it joins
network
– Can renew its lease on address in use
– Allows reuse of addresses (only hold address while connected/“on”)
– Support for mobile users who want to join network (more shortly)
DHCP overview:
– Host broadcasts “DHCP discover” msg
– DHCP server responds with “DHCP offer” msg
– Host requests IP address: “DHCP request” msg
– DHCP server sends address: “DHCP ack” msg
24
Computer Networks (CS F303) BITS Pilani, Pilani Campus
DHCP Client-Server Scenario

[Figure: the three-subnet network (223.1.1.0/24, 223.1.2.0/24, 223.1.3.0/24) with a DHCP server attached; an arriving DHCP client needs an address in this network.]
25
Computer Networks (CS F303) BITS Pilani, Pilani Campus
DHCP
DHCP server: 223.1.2.5; arriving client

DHCP discover (client to server)
  src: 0.0.0.0, 68
  dest: 255.255.255.255, 67
  yiaddr: 0.0.0.0
  transaction ID: 654

DHCP offer (server to client)
  src: 223.1.2.5, 67
  dest: 255.255.255.255, 68
  yiaddr: 223.1.2.4
  transaction ID: 654
  lifetime: 3600 secs

DHCP request (client to server)
  src: 0.0.0.0, 68
  dest: 255.255.255.255, 67
  yiaddr: 223.1.2.4
  transaction ID: 655
  lifetime: 3600 secs

DHCP ACK (server to client)
  src: 223.1.2.5, 67
  dest: 255.255.255.255, 68
  yiaddr: 223.1.2.4
  transaction ID: 655
  lifetime: 3600 secs
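
The exchange above can be written down as plain data to see which fields change from message to message. This is only a sketch of the slide's example: the function name and dictionary layout are hypothetical, no packets are sent, and the transaction ID step from 654 to 655 simply mirrors the slide.

```python
BROADCAST = "255.255.255.255"


def dhcp_exchange(server_ip: str, offered_ip: str, xid: int, lifetime: int = 3600) -> list:
    """Return the discover/offer/request/ACK messages as (src, dst, fields) records."""
    client = ("0.0.0.0", 68)   # the client has no address yet
    server = (server_ip, 67)
    return [
        {"type": "discover", "src": client, "dst": (BROADCAST, 67),
         "yiaddr": "0.0.0.0", "xid": xid},
        {"type": "offer", "src": server, "dst": (BROADCAST, 68),
         "yiaddr": offered_ip, "xid": xid, "lifetime": lifetime},
        {"type": "request", "src": client, "dst": (BROADCAST, 67),
         "yiaddr": offered_ip, "xid": xid + 1, "lifetime": lifetime},
        {"type": "ack", "src": server, "dst": (BROADCAST, 68),
         "yiaddr": offered_ip, "xid": xid + 1, "lifetime": lifetime},
    ]


if __name__ == "__main__":
    for message in dhcp_exchange("223.1.2.5", "223.1.2.4", 654):
        print(message)
```

Every message is broadcast because the client cannot yet be addressed directly; the client keeps using source 0.0.0.0 until the ACK arrives.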
Computer Networks (CS F303) BITS Pilani, Pilani Campus
IPv6 Motivation

• Initial motivation: 32-bit address space soon to be completely allocated
• Additional motivation:
– Header format helps speed processing/forwarding
– Header changes to facilitate QoS
• IPv6 datagram format:
– Fixed-length 40 byte header
– No fragmentation allowed (at intermediate routers)
• IPv6 deployment status
– https://en.wikipedia.org/wiki/IPv6_deployment

27
Computer Networks (CS F303) BITS Pilani, Pilani Campus
IPv4 vs IPv6

[Figure: IPv4 vs IPv6 comparison. Source: www.cisco.com]
28
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Transition from IPv4 to IPv6

• Not all routers can be upgraded simultaneously
– No “flag days”
– How will the network operate with mixed IPv4 and IPv6 routers?
• Tunneling: IPv6 datagram carried as payload in IPv4 datagram among IPv4 routers
[Figure: an IPv4 datagram (IPv4 header fields, IPv4 source and dest addr) whose payload is a complete IPv6 datagram (IPv6 header fields, IPv6 source and dest addr, UDP/TCP payload).]
29
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Tunneling [.1]

Logical view:  A (IPv6) - B (IPv6) = IPv4 tunnel connecting IPv6 routers = E (IPv6) - F (IPv6)
Physical view: A (IPv6) - B (IPv6) - C (IPv4) - D (IPv4) - E (IPv6) - F (IPv6)

30
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Tunneling [..2]
Logical view:  A (IPv6) - B (IPv6) = IPv4 tunnel connecting IPv6 routers = E (IPv6) - F (IPv6)
Physical view: A (IPv6) - B (IPv6) - C (IPv4) - D (IPv4) - E (IPv6) - F (IPv6)

• A-to-B: IPv6 (flow: X, src: A, dest: F, data)
• Inside the tunnel (B-to-C through D-to-E): IPv6 inside IPv4 (IPv4 src: B, dest: E; the payload is the IPv6 datagram with flow: X, src: A, dest: F, data)
• E-to-F: IPv6 (flow: X, src: A, dest: F, data)
31
Computer Networks (CS F303) BITS Pilani, Pilani Campus
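
The tunneling flow above amounts to wrapping and unwrapping a datagram. A minimal sketch with dictionaries standing in for datagrams; the function names and field names are hypothetical, and 41 is the IPv4 protocol number assigned to encapsulated IPv6.

```python
IPV6_IN_IPV4 = 41  # IPv4 "upper layer protocol" value for an encapsulated IPv6 datagram


def encapsulate(ipv6_datagram: dict, tunnel_src: str, tunnel_dst: str) -> dict:
    """At the tunnel entry (router B): carry the whole IPv6 datagram as IPv4 payload."""
    return {"version": 4, "src": tunnel_src, "dst": tunnel_dst,
            "protocol": IPV6_IN_IPV4, "payload": ipv6_datagram}


def decapsulate(ipv4_datagram: dict) -> dict:
    """At the tunnel exit (router E): recover the original IPv6 datagram."""
    if ipv4_datagram["protocol"] != IPV6_IN_IPV4:
        raise ValueError("not an IPv6-in-IPv4 datagram")
    return ipv4_datagram["payload"]


if __name__ == "__main__":
    original = {"version": 6, "flow": "X", "src": "A", "dst": "F", "data": "data"}
    in_tunnel = encapsulate(original, tunnel_src="B", tunnel_dst="E")
    print(in_tunnel)                # what the IPv4 routers C and D see
    print(decapsulate(in_tunnel))   # what E forwards on to F
```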
Dual Stack Approach

32
Computer Networks (CS F303) BITS Pilani, Pilani Campus
ICMP Protocol
• The Internet Control Message Protocol (ICMP) is a helper protocol that supports IP with facilities for
– Error reporting and simple queries
– Used by hosts and routers to communicate network-layer information to each other
• ICMP lies just above IP
– ICMP messages are encapsulated in IP datagrams
• When a host receives an IP packet with ICMP specified as the upper-layer protocol, it de-multiplexes the packet to ICMP, just as it would de-multiplex a packet to TCP/UDP

ICMP message format (32 bits per row):
bits 0-7: type | bits 8-15: code | bits 16-31: checksum
next 32 bits: additional information, or unused (0x00000000)

ICMP error message layout:
IP header | ICMP header (type, code, checksum, unused 0x00000000) | IP header and first 8 bytes of payload from the IP datagram that triggered the error
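
As an illustration of the type/code/checksum layout, the sketch below builds an ICMP echo request (type 8, code 0) and computes the standard Internet checksum over it. For echo messages the 32 bits after the checksum carry an identifier and a sequence number; the function names are hypothetical and nothing is sent on the network.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of the data, complemented."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """ICMP echo request: type 8, code 0, checksum, identifier, sequence, payload."""
    without_checksum = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence) + payload
    checksum = internet_checksum(without_checksum)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


if __name__ == "__main__":
    packet = build_echo_request(identifier=1, sequence=1, payload=b"ping")
    print(packet.hex())
    # A receiver recomputes the checksum over the whole message and expects 0.
    print(internet_checksum(packet))
```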

33
Computer Networks (CS F303) BITS Pilani, Pilani Campus
ICMP Message Types

Type  Message Type             Description
3     Destination Unreachable  Packet could not be delivered
11    Time Exceeded            Time to live field hit 0
12    Parameter Problem        Invalid header field
4     Source Quench            Choke packet
5     Redirect                 Teach a router about geography
8     Echo Request             Ask a machine if it is alive
13    Timestamp Request        Same as Echo Request, but with timestamp
0     Echo Reply               Ping utility

Type  Code  Definition
3     0     Net Unreachable
3     1     Host Unreachable
3     2     Protocol Unreachable
3     3     Port Unreachable
3     4     Fragmentation needed & Don’t Fragment was set
3     5     Source Route failed
3     6     Destination Network Unknown
3     7     Destination Host Unknown
11    0     TTL Expired
8     0     Echo Request
0     0     Echo Reply

34
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Traceroute and ICMP
• Source sends series of UDP segments to dest
– First set has TTL = 1
– Second set has TTL = 2, etc.
– Unlikely port number
• When the nth set of datagrams arrives at the nth router:
– Router discards the datagrams
– and sends the source ICMP messages (type 11, code 0)
– ICMP messages include the name of the router & IP address
• When the ICMP messages arrive, the source records RTTs

Stopping criteria:
• UDP segment eventually arrives at destination host
• Destination returns ICMP “port unreachable” message (type 3, code 3)
• Source stops

[Figure: source sending 3 probes per TTL value toward successive routers.]
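
The probing logic above can be simulated without touching the network. In this sketch the path is a plain list of hypothetical router names, one probe is sent per TTL instead of three, and RTT measurement is omitted.

```python
def simulate_traceroute(routers: list, destination: str) -> list:
    """Return (ttl, responder, ICMP reply) for each probe until the destination answers."""
    hops = []
    ttl = 1
    while True:
        if ttl <= len(routers):
            # TTL reaches 0 at the ttl-th router, which discards the probe.
            hops.append((ttl, routers[ttl - 1], "time exceeded (type 11, code 0)"))
            ttl += 1
        else:
            # The probe reaches the host; nobody listens on the unlikely port.
            hops.append((ttl, destination, "port unreachable (type 3, code 3)"))
            return hops


if __name__ == "__main__":
    for hop in simulate_traceroute(["r1", "r2", "r3"], "dest-host"):
        print(hop)
```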
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Routing

• Typically a host is attached directly to one router, called its default router

• The default router of the source host is called the source router

• The problem of routing a packet from the source host to the destination host boils down to routing the packet from the source router to the destination router

36
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Routing Algorithm Taxonomy [.1]

• Global or Centralized Routing Algorithms


– Each node has the complete information about connectivity and link costs
(Link State Algorithms)

• Decentralized Routing Algorithms


– Each node begins with only the knowledge of the costs of its own directly
attached links
– Then uses an iterative process of calculation to find the least cost paths to
a set of destinations or all
37
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Routing Algorithm Taxonomy [..2]

• Static Routing Algorithms


– Routes change very slowly over time
– Forwarding tables are changed manually

• Dynamic Routing Algorithms


– Routes change due to the traffic load and/or change in topology

• Load sensitive Vs. Load-insensitive


– Link cost changes due to change in congestion level of the link

38
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Routing Abstraction
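[Figure: graph of six routers u, v, w, x, y, z with link costs c(u,v) = 2, c(u,x) = 1, c(u,w) = 5, c(v,x) = 2, c(v,w) = 3, c(x,w) = 3, c(x,y) = 1, c(w,y) = 1, c(w,z) = 5, c(y,z) = 2.]

Graph: G = (N,E)

N = set of routers = { u, v, w, x, y, z }

E = set of links = { (u,v), (u,x), (u,w), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }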

39
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Dijkstra’s Algorithm
Initialization:
    N' = {u}
    for all nodes p
        if p adjacent to u
            then D(p) = c(u,p)
        else D(p) = ∞

Loop
    find q not in N' such that D(q) is a minimum
    add q to N'
    update D(p) for all p adjacent to q and not in N':
        D(p) = min( D(p), D(q) + c(q,p) )
    /* new cost to p is either old cost to p or known
       shortest path cost to q plus cost from q to p */
until all nodes in N'
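
The pseudocode above can be sketched in Python with a priority queue. The link costs in `GRAPH` are an assumption: they are the costs of the standard six-node textbook graph that this example appears to use, chosen to be consistent with the link-state table on the next slide. The names `GRAPH` and `dijkstra` are hypothetical.

```python
import heapq

# Assumed link costs for the six-router example graph (undirected).
GRAPH = {
    "u": {"v": 2, "x": 1, "w": 5},
    "v": {"u": 2, "x": 2, "w": 3},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"x": 1, "w": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}


def dijkstra(graph: dict, source: str):
    """Return (D, p): least cost to each node and its predecessor on that path."""
    dist = {source: 0}
    prev = {}
    done = set()                  # the set N' of finished nodes
    heap = [(0, source)]
    while heap:
        d, q = heapq.heappop(heap)
        if q in done:
            continue              # stale heap entry
        done.add(q)
        for p, cost in graph[q].items():
            if p not in done and d + cost < dist.get(p, float("inf")):
                dist[p] = d + cost        # D(p) = min(D(p), D(q) + c(q,p))
                prev[p] = q
                heapq.heappush(heap, (dist[p], p))
    return dist, prev


if __name__ == "__main__":
    D, p = dijkstra(GRAPH, "u")
    for node in sorted(D):
        print(node, D[node], p.get(node, "-"))
```

The heap replaces the linear "find q not in N' such that D(q) is a minimum" scan; the distances it produces are the same.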
40
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Link State Routing: Example

Step  N’      D(v),p(v)  D(w),p(w)  D(x),p(x)  D(y),p(y)  D(z),p(z)
0     u       2,u        5,u        1,u        ∞          ∞
1     ux      2,u        4,x                   2,x        ∞
2     uxy     2,u        3,y                              4,y
3     uxyv               3,y                              4,y
4     uxyvw                                               4,y
5     uxyvwz
41
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Route Oscillations

• Link cost is equal to the load carried on the link
• Link costs are not symmetric
[Figure: four routers A, B, C, D in a ring, with traffic of 1, 1 and e units entering. Starting from the initial routing, three successive recomputations ("given these costs, find new routing…, resulting in new costs") swing the traffic back and forth between the two directions around the ring, with link costs alternating among 0, e, 1, 1+e and 2+e.]

42
Computer Networks (CS F303) BITS Pilani, Pilani Campus
OSPF Protocol
• “open”: publicly available
• Uses link state algorithm
– LS packet dissemination
– Topology map at each node
– Route computation using Dijkstra’s algorithm
• OSPF advertisement carries one entry per neighbor
• Advertisements flooded to entire AS
– Carried in OSPF messages directly over IP (rather than TCP or UDP)
– Link state broadcast and reliable message transfer must be implemented in the
OSPF itself
– Broadcasts an LSA whenever there is a change in a link’s state, and also sends periodic updates (every 30 minutes)
43
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Hierarchical OSPF Routing
• An AS can be configured hierarchically into areas
• Each area runs its own OSPF link state routing algorithm
[Figure: a backbone area containing a boundary router, a backbone router and area border routers; the border routers connect to areas 1, 2 and 3, each of which contains internal routers.]
44
Computer Networks (CS F303) BITS Pilani, Pilani Campus
OSPF Messages
• HELLO
– Checks whether links are operational or not
• Database Description
– Contains descriptions of the topology of the AS or area
• Link State Request
– Used by one router to request updated information about a portion of the Link State Database (LSDB) from another router
• Link State Update
– Contains information about an updated portion of the LSDB. These messages are sent in response to a Link State Request message
• Link State Acknowledgement
– Acknowledges a Link State Update message
45
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Routing Algorithm Taxonomy [..2]

• Static Routing Algorithms


– Routes change very slowly over time
– Forwarding tables are changed manually

• Dynamic Routing Algorithms


– Routes change due to the traffic load and/or change in topology
– Susceptible to problems such as routing loops and oscillations

• Load sensitive vs. Load-insensitive


– Link cost changes due to change in congestion level of the link

5
Computer Networks (CS F303) BITS Pilani, Pilani Campus
Link State Routing: Example

Step  N'      D(v),p(v)  D(w),p(w)  D(x),p(x)  D(y),p(y)  D(z),p(z)
0     u       2,u        5,u        1,u        ∞          ∞
1     ux      2,u        4,x                   2,x        ∞
2     uxy     2,u        3,y                              4,y
3     uxyv               3,y                              4,y
4     uxyvw                                               4,y
5     uxyvwz
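The link-state computation above can be sketched in Python. This is a minimal illustration, not the course's reference code: the names `COSTS`, `build_graph` and `dijkstra` are made up here, the link costs are read from the example figure, and a priority queue replaces the linear "find s with minimum D(s)" scan.

```python
import heapq

# Link costs of the six-node example graph (undirected), as read from the figure.
COSTS = {
    ("u", "v"): 2, ("u", "x"): 1, ("u", "w"): 5,
    ("v", "x"): 2, ("v", "w"): 3, ("x", "w"): 3,
    ("x", "y"): 1, ("w", "y"): 1, ("w", "z"): 5, ("y", "z"): 2,
}

def build_graph(costs):
    """Turn the edge list into an adjacency map: node -> {neighbor: cost}."""
    graph = {}
    for (a, b), c in costs.items():
        graph.setdefault(a, {})[b] = c
        graph.setdefault(b, {})[a] = c
    return graph

def dijkstra(graph, source):
    """Return (D, p): least cost to every node and its predecessor on that path."""
    dist = {node: float("inf") for node in graph}
    pred = {node: None for node in graph}
    dist[source] = 0
    done = set()                      # plays the role of N'
    heap = [(0, source)]
    while heap:
        d, s = heapq.heappop(heap)    # s not in N' with minimum D(s)
        if s in done:
            continue
        done.add(s)
        for r, cost in graph[s].items():
            if r not in done and d + cost < dist[r]:
                dist[r] = d + cost    # D(r) = min(D(r), D(s) + c(s,r))
                pred[r] = s
                heapq.heappush(heap, (dist[r], r))
    return dist, pred

dist, pred = dijkstra(build_graph(COSTS), "u")
```

With source u, the final values agree with the last entry of each column in the table: D(v)=2 via u, D(x)=1 via u, D(y)=2 via x, D(w)=3 via y, D(z)=4 via y.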
Route Oscillations

• Link cost is equal to the load carried on the link


• Link costs are not symmetric

[Figure: four-node ring A, B, C, D shown in four stages, with link costs taking the values 0, e, 1, 1+e and 2+e: the initial routing, followed by three rounds of "given these costs, find new routing… resulting in new costs"]

OSPF Protocol
• “open”: publicly available
• Uses link state algorithm
– LS packet dissemination
– Topology map at each node
– Route computation using Dijkstra’s algorithm
• OSPF advertisement carries one entry per neighbor
• Advertisements flooded to entire AS
– Carried in OSPF messages directly over IP (rather than TCP or UDP)
– Link state broadcast and reliable message transfer must be implemented in
OSPF itself
– Broadcasts an LSA whenever a link's state changes, and also sends
periodic updates (every 30 minutes)
OSPF: Advanced Features

• Link state updates can be authenticated


• Allows multiple same cost paths
• Support for hierarchy within a single routing domain
• Integrated support for multicast and unicast routing

Hierarchical OSPF Routing
An AS can be configured hierarchically into areas.
Each area runs its own OSPF link state routing algorithm.

[Figure: hierarchical OSPF AS with a backbone area and areas 1, 2 and 3; labels: boundary router, backbone router, area border routers, internal routers]
OSPF Messages
• HELLO
– To check whether links are operational or not
• Database Description
– contains descriptions of the topology of the AS or area
• Link State Request
– used by one router to request updated information about a portion of the Link
State Database (LSDB) from another router
• Link State Update
– contains information about an updated portion of the LSDB. These messages
are sent in response to a Link State Request message
• Link State Acknowledgement
– acknowledges a Link State Update message
OSPF Packet Format

Distance Vector (DV) Routing

• Distributed
– Node receives some information from one or more of its neighbors
• Iterative
– Process continues until no more info is exchanged
• Asynchronous
– Does not require nodes to operate in lockstep manner
• Self terminating!

Each node repeats:
wait for (change in local link cost or msg from neighbor)
→ recompute estimates
→ if DV to any dest has changed, notify neighbors
Distance Vector Routing Algorithm

At each node x:

• From time to time, each node sends its own distance vector estimate to neighbors
– DV contains estimate of its cost to all destinations in the network
• When x receives a new DV estimate from a neighbor, it updates its own DV using
the Bellman-Ford equation:

$D_x(y) \leftarrow \min_v \{ c(x,v) + D_v(y) \}$ for each node $y \in N$

• Eventually, the estimate $D_x(y)$ converges to the actual least cost $d_x(y)$
Bellman-Ford Example

[Figure: the six-node example graph with link costs]

$d_v(z) = 5$, $d_x(z) = 3$, $d_w(z) = 3$

Cost of least-cost path from u to z: $d_u(z)$ = ???
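A worked answer to the question above, assuming u's link costs are c(u,v) = 2, c(u,x) = 1 and c(u,w) = 5 as in the example graph. The Bellman-Ford equation takes the minimum over u's three neighbors:

```latex
d_u(z) = \min\{\, c(u,v) + d_v(z),\; c(u,x) + d_x(z),\; c(u,w) + d_w(z) \,\}
       = \min\{\, 2 + 5,\; 1 + 3,\; 5 + 3 \,\} = 4
```

The minimum is reached through x, so x would be the next hop from u toward z.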

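[Figure: three-node network with link costs c(x,y) = 2, c(y,z) = 1, c(x,z) = 7]

Node x recomputes its own row after receiving the distance vectors of y and z:

$D_x(y) = \min\{c(x,y) + D_y(y),\ c(x,z) + D_z(y)\} = \min\{2+0,\ 7+1\} = 2$
$D_x(z) = \min\{c(x,y) + D_y(z),\ c(x,z) + D_z(z)\} = \min\{2+1,\ 7+0\} = 3$

Node x table (rows: from, columns: cost to)

         initial        after 1st exchange
from     x  y  z        x  y  z
x        0  2  7        0  2  3
y        ∞  ∞  ∞        2  0  1
z        ∞  ∞  ∞        7  1  0

Node y table (initial)

from     x  y  z
x        ∞  ∞  ∞
y        2  0  1
z        ∞  ∞  ∞

Node z table (initial)

from     x  y  z
x        ∞  ∞  ∞
y        ∞  ∞  ∞
z        7  1  0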
[Figure: three-node network with link costs c(x,y) = 2, c(y,z) = 1, c(x,z) = 7]

$D_x(y) = \min\{c(x,y) + D_y(y),\ c(x,z) + D_z(y)\} = \min\{2+0,\ 7+1\} = 2$
$D_x(z) = \min\{c(x,y) + D_y(z),\ c(x,z) + D_z(z)\} = \min\{2+1,\ 7+0\} = 3$

Distance tables over time (rows: from, columns: cost to)

Node x table
         initial        after 1st exchange    after 2nd exchange
from     x  y  z        x  y  z               x  y  z
x        0  2  7        0  2  3               0  2  3
y        ∞  ∞  ∞        2  0  1               2  0  1
z        ∞  ∞  ∞        7  1  0               3  1  0

Node y table
         initial        after 1st exchange    after 2nd exchange
from     x  y  z        x  y  z               x  y  z
x        ∞  ∞  ∞        0  2  7               0  2  3
y        2  0  1        2  0  1               2  0  1
z        ∞  ∞  ∞        7  1  0               3  1  0

Node z table
         initial        after 1st exchange    after 2nd exchange
from     x  y  z        x  y  z               x  y  z
x        ∞  ∞  ∞        0  2  7               0  2  3
y        ∞  ∞  ∞        2  0  1               2  0  1
z        7  1  0        3  1  0               3  1  0
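The table evolution above can be reproduced with a short simulation. This is a sketch under a simplifying assumption: all nodes exchange vectors in synchronous rounds, whereas the real algorithm is asynchronous. The names `LINKS`, `initial_vectors`, `exchange` and `run_until_stable` are illustrative only.

```python
INF = float("inf")

# Link costs of the three-node example: c(x,y)=2, c(y,z)=1, c(x,z)=7.
LINKS = {"x": {"y": 2, "z": 7}, "y": {"x": 2, "z": 1}, "z": {"x": 7, "y": 1}}
NODES = sorted(LINKS)

def initial_vectors():
    """Each node starts with the cost to itself (0) and to its direct neighbors."""
    return {n: {d: 0 if d == n else LINKS[n].get(d, INF) for d in NODES}
            for n in NODES}

def exchange(vectors):
    """One synchronous round: every node applies Bellman-Ford to its neighbors' vectors."""
    new = {}
    for x in NODES:
        new[x] = {}
        for y in NODES:
            if y == x:
                new[x][y] = 0
            else:
                new[x][y] = min(LINKS[x][v] + vectors[v][y] for v in LINKS[x])
    return new

def run_until_stable():
    """Repeat exchanges until no vector changes; return vectors and rounds that changed something."""
    vectors = initial_vectors()
    rounds = 0
    while True:
        updated = exchange(vectors)
        if updated == vectors:
            return vectors, rounds
        vectors, rounds = updated, rounds + 1

final_vectors, changed_rounds = run_until_stable()
```

Only the first exchange changes any node's own vector (x and z each learn the cost-3 path through y). The second exchange merely copies those updated rows into the neighbors' tables, which is why the last two columns of the slide differ only in the rows received from other nodes.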
Distance Vector: Link Cost Changes [.1]
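Link cost changes:
• Node detects local link cost change
• Updates the distance table
• If the cost of a least-cost path changes, notify neighbors

[Figure: three-node network X, Y, Z with c(Y,Z) = 1 and c(X,Z) = 50; the cost of link X-Y changes from 4 to 1]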

Distance Vector: Link Cost Changes [..2]
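Link cost changes:
• Bad news travels slowly: the “count to infinity” problem!

[Figure: three-node network X, Y, Z with c(Y,Z) = 1 and c(X,Z) = 50; the cost of link X-Y changes from 4 to 60]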

Poison Reverse and Split Horizon

• Split Horizon rule states that a route can't be advertised out of the
interface if the next hop for the advertised route is found on that
interface.

• Poison Reverse rule states that routes received via one interface
have to be advertised back out from that interface with an
unreachable metric

Split Horizon with Poison Reverse

• If Z routes through Y to get to X:


– Z tells Y its (Z’s) distance to X is infinite (so Y won’t route to X via Z)
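The rule can be sketched as a function that builds the vector a node advertises to one particular neighbor. The function name, the `routes` layout and the neighbor `W` are hypothetical, chosen only for illustration; the cost 5 assumes Z reaches X through Y over links of cost 1 and 4.

```python
INF = float("inf")

def advertisement(routes, neighbor, poison_reverse=True):
    """Build the distance vector sent to `neighbor`.

    routes maps destination -> (cost, next_hop).
    """
    vector = {}
    for dest, (cost, next_hop) in routes.items():
        if next_hop == neighbor:
            if poison_reverse:
                vector[dest] = INF    # poisoned: advertised as unreachable
            # plain split horizon: the route is simply left out
        else:
            vector[dest] = cost
    return vector

# Z routes to X through Y at total cost 5.
z_routes = {"X": (5, "Y")}
to_y = advertisement(z_routes, "Y")                               # what Z tells Y
to_y_plain = advertisement(z_routes, "Y", poison_reverse=False)   # split horizon only
to_w = advertisement(z_routes, "W")                               # any other neighbor
```

Because Y hears an infinite cost from Z, it never tries to reach X via Z, which removes the two-node loop between Y and Z. Loops that involve three or more nodes are not prevented by this technique.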

RIP (Routing Information Protocol)

• Distance vector algorithm


• Included in BSD-UNIX Distribution in 1982
• Distance metric: # of hops (max = 15 hops)

• Distance vectors: exchanged among neighbors every 30 sec via Response


Message (also called advertisement)
• Each advertisement: list of up to 25 destination nets within AS

RIP Table processing

• RIP routing tables are managed by an application-level process called routed (the route daemon)
• Advertisements are sent in UDP packets, periodically repeated.

[Figure: two hosts, each with the stack routed / transport (UDP) / network (IP) with forwarding table / link / physical; the two routed processes exchange advertisements]
RIP: Example
[Figure: routers A, D, B and C interconnecting networks w, x, y and z]

Routing table in D:

Destination Network   Next Router   Num. of hops to dest.
w                     A             2
y                     B             2
z                     B             7
x                     --            1
….                    ….            ....
RIP: Example
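Advertisement from A to D:

Dest   Next   hops
w      -      -
x      -      -
z      C      4
….     …      ...

[Figure: routers A, D, B and C interconnecting networks w, x, y and z]

Routing table in D (after the advertisement, the route to z changes from next router B, 7 hops, to next router A, 5 hops):

Destination Network   Next Router   Num. of hops to dest.
w                     A             2
y                     B             2
z                     A             5
x                     --            1
….                    ….            ....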
RIP: Link Failure and Recovery

If no advertisement is heard after 180 sec, the neighbor/link is declared dead
– Routes via neighbor invalidated
– New advertisements sent to neighbors
– Neighbors in turn send out new advertisements (if tables changed)
– Link failure info quickly propagates to entire net
– Poison reverse used to prevent ping-pong loops (infinite distance = 16 hops)

Weaknesses of RIP

• INFINITY defined as 16
– thus RIP cannot be used in networks where routes are more than 15 hops

• Difficulty in supporting multiple metrics (default metric: # of hops)


– The potential range for such metrics as bandwidth, throughput, delay, and reliability
can be large
– Thus the value for INFINITY should be large; but this can result in slow convergence of
RIP due to count-to-infinity problem

Internet Routing System:
Two Tier
• Inter-domain routing: Between ASes
– Routing policies are based on business relationships
– No common metrics, and limited cooperation
– BGP: policy-based, path-vector routing protocol
• Intra-domain routing: Within an AS
– Shortest-path routing based on link metrics
– Routers are managed by a single institution
– OSPF and IS-IS: link-state routing protocols
– RIP and EIGRP: distance-vector routing protocols
Next…
• BGP
– ASes, Policies
– BGP Attributes
– BGP Path Selection
– I-BGP vs. E-BGP

The BIG Picture

Autonomous Systems (ASes)
• Autonomous system
– An AS is an actual entity that participates in routing
– Has a unique 16-bit ASN (extended to 32 bits by RFC 4893, 2007) assigned to it and
typically participates in inter-domain routing

• Examples:
– MIT: 3, CMU: 9
– AT&T: 7018, 6341, 5074, …
– UUNET: 701, 702, 284, 12199, …
– Sprint: 1239, 1240, 6211, 6242, …

Let’s Find out…

• How do ASes interconnect to provide global connectivity?

• How does routing information get exchanged?

Interconnected ASes

[Figure: three interconnected ASes: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c). A router's forwarding table is configured by both the intra-AS and the inter-AS routing algorithm]

AS Categories [Stub/Multi-homed/Transit]

[Figure: AS 1 connected to AS 2 and AS 3, shown for each category]

• Stub AS: AS 1 carries ONLY local traffic
• Multi-homed AS: traffic NEVER flows from AS 2 through AS 1 to AS 3
• Transit AS: traffic flows from AS 2 through AS 1 to AS 3
Inter-domain Routing in the Internet

• Link state or distance vector?

• Problems with distance-vector:


– Bellman-Ford algorithm may not converge

• Problems with link state:


– Metric used by routers not the same – loops
– LS database too large – entire Internet
– May expose policies to other AS’s

Solution: Distance Vector with Path

• Each routing update carries the entire path


• Loops are detected as follows:
– When AS gets route, check if AS already in path
• If yes, reject route
• If no, add self and (possibly) advertise route further
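The loop check can be sketched in a few lines. The function name `process_route` is hypothetical, and the AS numbers and prefix are illustrative values borrowed from examples elsewhere in these slides; real BGP speakers apply import and export policy around this step.

```python
def process_route(local_as, prefix, as_path):
    """Return the route to re-advertise, or None if it must be rejected."""
    if local_as in as_path:
        return None                        # our own AS is already in the path: loop
    return prefix, [local_as] + as_path    # add self before advertising further

# AS 7018 receives a route whose path is [1239, 3]: no loop, so it is accepted.
accepted = process_route(7018, "15.33.50.0/24", [1239, 3])

# The same prefix arrives with 7018 already in the path: rejected.
rejected = process_route(7018, "15.33.50.0/24", [1239, 7018, 3])
```

Carrying the whole path makes loop detection a local membership test, so no count-to-infinity behaviour is needed to discover a loop.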

BGP-4
• BGP = Border Gateway Protocol
• Is a Policy-Based routing protocol
• It is the EGP of today’s global Internet
• Relatively simple protocol, but configuration is complex

1989 : BGP-1 [RFC 1105]


– Replacement for EGP (1984, RFC 904)

1990 : BGP-2 [RFC 1163]


1991 : BGP-3 [RFC 1267]
1995 : BGP-4 [RFC 1771]
2006: BGP-4 [RFC 4271]
– Support for Classless Interdomain Routing
(CIDR) , Route Aggregation
BGP Operations

BGP message types:
• Open: Establishes a peering session.
• Keep Alive: Handshake at regular intervals.
• Notification: Shuts down a peering session.
• Update: Announces new routes or withdraws previously announced routes.

Route announcement = prefix + attribute values

[Figure: AS1 and AS2 establish a BGP session on TCP port 179; while the connection is ALIVE they exchange route UPDATE messages]
Fundamental Rules: BGP

• BGP advertises to neighbors only those routes that it uses


– Consistent with the hop-by-hop Internet paradigm
• No need for periodic refresh - routes are valid until withdrawn,
or the connection is lost
• Incremental updates are possible

Policy Decisions
• BGP provides capability for enforcing various policies

• BGP enforces policies by choosing paths from multiple alternatives and


controlling advertisement to other ASes

• Import policy
– What to do with routes learned from neighbors?

• Export policy
– What routes to be announced to neighbors?
Distributing Path Information

[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c), with AS1 and AS2 also attached to other networks]

Export Policy

• Once the route is announced the AS is willing to transit


traffic on that route

• To Customers: Announce all routes learned from peers,


providers and customers, and self-origin routes

• To Providers and Peers: Announce routes learned from


customers and self-origin routes
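The two export rules above can be written as a small predicate. This is a sketch only: the function name and the string labels for business relationships are assumptions made for illustration, and real routers express such policy through vendor-specific configuration.

```python
def should_export(learned_from, neighbor):
    """Decide whether a route may be announced to a neighbor.

    learned_from: 'customer', 'peer', 'provider' or 'self' (self-originated)
    neighbor:     'customer', 'peer' or 'provider'
    """
    if neighbor == "customer":
        return True                                # customers get every route
    return learned_from in ("customer", "self")    # peers and providers get only these

# Which route sources are announced to each kind of neighbor.
export_matrix = {
    nbr: [src for src in ("customer", "peer", "provider", "self")
          if should_export(src, nbr)]
    for nbr in ("customer", "peer", "provider")
}
```

The effect is that an AS carries transit traffic only when one end of the path is a customer or the AS itself, which matches the statement that announcing a route signals willingness to carry traffic on it.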

How to implement export policies?

• BGP Attributes
– Local Preference
– AS-Path Length
– MED (Multi Exit Discriminator)
– NEXT-HOP

Local Preference

• Used to choose outbound external BGP path.


[Figure: example topology with AT&T, Sprint, a Tier-2 provider, a Tier-3 provider and Yale; the candidate paths are labelled Local-pref = 90 and Local-pref = 100]

NEXT-HOP

The next hop IP address that is going to be used to reach a certain destination.

Example: 15.33.50.0/24, AS-PATH: AS3, NEXT-HOP: IP address of the 3a-1c interface

[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c), with AS1 and AS2 also attached to other networks]

Internal BGP vs. External BGP

• R3 and R4 can learn routes by using BGP
• How do R1 and R2 learn routes?
• Option 1: Inject routes in IGP
– Only works for small routing tables
• Option 2: Use I-BGP

Different rules about re-advertising prefixes in I-BGP:
a prefix learned from an E-BGP neighbor can be advertised to an I-BGP neighbor and vice versa, but a prefix learned from one I-BGP neighbor cannot be advertised to another I-BGP neighbor.

[Figure: AS1 contains routers R1, R2 and R3 with Subnet-1 and Subnet-2; R3 has an E-BGP session with R4 in AS2]

Route Selection Process

Thank You!

Computer Networks
CS F303

DATA LINK LAYER

Ashutosh Bhatia
Department of Computer Science and Information Systems
BITS Pilani Birla Institute of Technology and Science
Pilani|Dubai|Goa|Hyderabad Pilani Campus, Pilani
Data-Link Layer
• Frame-by-Frame next-hop delivery
– Frame: Block of data exchanged at link layer
• Uses services of PHY layer (which delivers bits) to
deliver frames

BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956


Link Layer Protocols
• Link could be point-to-point or broadcast
– Broadcast: Many nodes connected to same communication
channel (e.g. wireless)
• Protocol:
– Define format of frames to be exchanged over the link
– In response to frames, action to be taken by nodes
– Examples: Ethernet, Token-Ring, WiFi, PPP etc



Services
• Logical Link Control (LLC): Interface between Network layer and
MAC sub-layer
– Multiplexing
– Error Detection
– Error Recovery (optional)
– Flow Control (optional)
• Media Access Control (MAC): Controls access to physical media
(Broadcast Channels)
• Switching (Interconnecting LANs)



Data Link Layer
Error Control
Error Detection
• What causes errors?
– Distortion of signals due to frequency-dependent
attenuation, noise (PHY layer)
– Random single-bit vs Bursty errors
• Why detect errors?
– Data fidelity, prevent wastage of resources



What next?
• After Detection:
– Drop Frame
• Higher layers (e.g. TCP) will recover, or a few losses don't hurt
applications (e.g. audio)
– Recover Frame
• Error Correction: Frame carries enough information to correct
errors
• Retransmission: Receiver signals sender on error, sender
retransmits the frame



Error correction vs Retransmission

• Error correction requires more redundant bits per frame


than error detection
– Redundancy bits are sent all the time (every frame)
• Retransmission requires another copy to be transmitted
– Copy sent only on error



Usage
• Error correction useful when
– Error rate is high (e.g. wireless)
– Cost (e.g. latency) of retransmission is too high (e.g. satellite
link)



Framework
• Add redundant information to a frame to detect or correct errors
• At Sender: Add k bits of redundant data to an m-bit message
– The k bits are derived from the original message through some algorithm
• At Receiver: Reapply same algorithm as sender to detect errors;
take corrective action if necessary
• Examples:
– Detection: k << m; k = 32; m ~12,000 for Ethernet
– Correction: Code Rate: m/(m+k)
• WiFi code rates range from 1/2 to 5/6



Hamming Distance
• Code word: n=m+k bits
• Hamming distance between two codewords:
• Number of bits they differ in
– XOR the two codewords
• Example:
– Codewords: 01110110, 00011101
– Distance is 5



Hamming Distance of a Code
• Number of possible code words is $2^n$
• Legal code words = $2^m$ (determined by the algorithm)
• Among the list of legal code words, find the smallest
hamming distance between two code words
– This is the hamming distance of a code (=d)



Rules
• The error detection/correction capabilities are a
function of the code’s hamming distance
• Error Detection: Can detect up to d-1 errors
– Changing one codeword into another requires at least d bit
changes
• Error Correction: Can correct up to ⌊(d-1)/2⌋ errors
– The received codeword (in error) is still closer to the original
codeword than any other codeword



Example

• Repetition code
– 0 -> 000; 1 -> 111
– m=1, k=2; n=3
– Hamming distance is 3, code rate is 1/3
• Can detect up to 2 errors and correct up to 1 error
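The Hamming-distance definitions and the detection/correction rules can be checked with a few lines of Python. The function names below are illustrative, and codewords are assumed to be given as equal-length bit strings.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have the same length")
    diff = int(a, 2) ^ int(b, 2)      # XOR leaves a 1 wherever the bits differ
    return bin(diff).count("1")

def code_distance(codewords) -> int:
    """Smallest Hamming distance between any two distinct legal codewords."""
    words = list(codewords)
    return min(hamming_distance(a, b)
               for i, a in enumerate(words) for b in words[i + 1:])

def capabilities(d: int):
    """(detectable errors, correctable errors) for a code of distance d."""
    return d - 1, (d - 1) // 2

pair_distance = hamming_distance("01110110", "00011101")
repetition_d = code_distance(["000", "111"])
detect, correct = capabilities(repetition_d)
```

For the repetition code this gives distance 3, so up to 2 errors are detectable and 1 is correctable, in line with the example slide.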



Design Considerations of a Code

• Reduce k to achieve high code rate


• For given values of n and k, maximize d
• Easy encoding and decoding
–Minimal memory and processing time



Focus

• Error Detection
• Reliable Transfer (retransmissions)
• Error Correction
– E.g. Reed-Solomon codes, convolutional codes, Turbo codes



Summary
• Important to detect errors in frames
• Error Recovery: FEC or Retransmission
– Inherent tradeoffs
• Framework (Overview)
– Hamming distance and error detection/correction
capabilities
– Design considerations
• Going forward: Error detection (in detail)
General Approach

• Add redundant information to a frame


• At Sender:
– Add k bits of redundant data to an m-bit message
– k << m; k = 32; m = 12,000 for Ethernet
– k derived from original message through some algorithm
• At Receiver:
– Reapply same algorithm as sender to detect errors; take corrective action
if necessary



Parity Bit

• Even Parity: 1100, send 11000


• Detects any odd number of bit errors (an even number of errors goes unnoticed)
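A minimal sketch of even parity on bit strings, with illustrative function names. It reproduces the 1100 → 11000 example and shows why a double-bit error slips through.

```python
def add_even_parity(bits: str) -> str:
    """Append one bit so that the total number of 1s is even."""
    return bits + str(bits.count("1") % 2)

def parity_ok(frame: str) -> bool:
    """A received frame passes the check if it holds an even number of 1s."""
    return frame.count("1") % 2 == 0

sent = add_even_parity("1100")           # two 1s already, so the parity bit is 0
one_error_detected = not parity_ok("10000")    # second bit flipped
two_errors_detected = not parity_ok("10100")   # second and third bits flipped
```

One flipped bit changes the count of 1s from even to odd and is caught. Two flipped bits restore an even count, so the frame passes the check despite being corrupted.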



Two Dimensional Parity
• Used by BISYNC protocol for
ASCII characters

• “N + 8” bits of redundancy for


“N” ASCII characters (character
is 7 bits)

• Catches all 1, 2, 3 bit errors and


most 4 bit errors



Internet Checksum

• Used at the network layer (IP header)


• Algorithm:
– View data to be transmitted as a sequence of 16-bit integers.
– Add the integers using 16 bit one's complement arithmetic.
– Take the one's complement of the result – this result is the
checksum
– Receiver performs same calculation on received data and
compares result with received checksum
Example
• Sender: IPv4 header in hexadecimal
– 4500 0073 0000 4000 4011 c0a8 0001 c0a8 00c7 (16-bit words, checksum field excluded)
– Sum up the words (can use 32 bits): 0002 479c
– Add the carry to the 16-bit sum: 479e
– Take the complement: b861 (this is the checksum)
• Receiver:
– Sum up the words including the checksum (use 32 bits): 2fffd
– Add the carry to the 16-bit sum: ffff (= 0 in 1’s complement), so no error
was detected
(example values from Wikipedia)
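The computation in this example can be sketched as follows. The function names are illustrative; `HEADER` holds the nine 16-bit words listed above, with the checksum field left out.

```python
def ones_complement_sum(words):
    """Add 16-bit words, folding any carry back into the low 16 bits."""
    total = sum(words)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def internet_checksum(words):
    """One's complement of the one's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF

HEADER = [0x4500, 0x0073, 0x0000, 0x4000, 0x4011,
          0xC0A8, 0x0001, 0xC0A8, 0x00C7]

folded_sum = ones_complement_sum(HEADER)                   # sender: 0x2479c folds to 0x479e
checksum = internet_checksum(HEADER)                       # sender: complement of the sum
receiver_sum = ones_complement_sum(HEADER + [checksum])    # receiver: all ones if intact
```

The receiver accepts the header when the folded sum over all words, checksum included, is 0xffff.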
Internet Checksum

• Not very strong in detecting errors


– E.g. a pair of single-bit errors goes undetected if one increments a word and the other
decrements another word by the same amount
• Why is it still used?
– Very easy to implement in software
– Majority of errors picked by CRC at link-level (implemented
in hardware)



Cyclic Redundancy Check (CRC)

• Used by many link-level protocols: HDLC, DDCMP,


Ethernet, Token-Ring
• Uses powerful math based on finite fields
• Background: Polynomial Arithmetic



Polynomial Arithmetic

• Represent an m-bit message with a polynomial of degree m-1
– 11000101 = $x^7 + x^6 + x^2 + 1$
• Arithmetic over the field of integers modulo 2
(coefficients are 1 or 0)
• Addition or subtraction are identical: XOR



Polynomial Arithmetic

• Polynomial division (very similar to integer


division)
– Dividing X by Y gives X = q*Y + r
– For integers: 0 <= r < Y
– For polynomials: the degree of r (remainder polynomial)
is less than the degree of the divisor polynomial



Cyclic Redundancy Check (CRC)
• Message polynomial M(x): m-bit message represented with a
polynomial of degree m-1
– 11000101 = $x^7 + x^6 + x^2 + 1$
• Sender and receiver agree on a divisor polynomial C(x) of degree k
– k: Number of redundancy bits
– E.g. C(x) = $x^3 + x^2 + 1$ (degree k = 3)
– Choice of C(x) significantly affects error detection and is derived carefully
based on observed error patterns
– Ethernet uses a CRC of 32 bits; HDLC and DDCMP use 16 bits
– Ethernet: $x^{32}+x^{26}+x^{23}+x^{22}+x^{16}+x^{12}+x^{11}+x^{10}+x^8+x^7+x^5+x^4+x^2+x+1$



Idea
• Sender sends m+k bits => Transmitted message P(x)
• Contrive to make P(x) exactly divisible by C(x)
• Received message R(x)
– No errors: R(x) = P(x), exactly divisible by C(x)
– Errors: R(x) ≠ P(x); likely not divisible by C(x)



Generate P(x)

• You have M(x) and C(x). Generate P(x)
• Multiply M(x) by $x^k$ to get T(x)
– Add k zeros at the end of the message
• Divide T(x) by C(x) to get the remainder R(x) (here R(x) denotes the remainder, not the received message)
• Subtract the remainder R(x) from T(x) to get P(x)
• P(x) is now exactly divisible by C(x)



Details

• $T(x) = x^k M(x) = Q(x) C(x) + R(x)$
• $P(x) = x^k M(x) - R(x) = x^k M(x) + R(x) = Q(x) C(x)$ (subtraction and addition coincide in modulo-2 arithmetic)
– Coefficients of R(x) are the redundant bits
– Transmitted bits: message bits (m), followed by redundant bits (k)



Example

• Message (M): 11001011

• Divisor (C): 1101

• T: 11001011000

• Remainder (R): 101

• Transmitted Bits (P): 11001011101
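The long division behind this example can be sketched with bit strings. The function names are illustrative, and the divisor is assumed to start with a 1 (a polynomial of degree k = len(divisor) - 1). Hardware implementations use shift registers instead.

```python
def _divide(bits, div):
    """In-place modulo-2 long division; leaves the remainder in the last k bits."""
    k = len(div) - 1
    for i in range(len(bits) - k):
        if bits[i]:                          # leading 1: subtract (XOR) the divisor
            for j, d in enumerate(div):
                bits[i + j] ^= d
    return bits[-k:]

def crc_remainder(message: str, divisor: str) -> str:
    """Remainder R of T(x) = x^k * M(x) divided by C(x)."""
    k = len(divisor) - 1
    bits = [int(b) for b in message] + [0] * k      # T: message followed by k zeros
    return "".join(map(str, _divide(bits, [int(b) for b in divisor])))

def crc_encode(message: str, divisor: str) -> str:
    """Transmitted bits P: the message followed by the k redundant bits."""
    return message + crc_remainder(message, divisor)

def crc_check(frame: str, divisor: str) -> bool:
    """True if the received frame is exactly divisible by the divisor."""
    remainder = _divide([int(b) for b in frame], [int(b) for b in divisor])
    return not any(remainder)

remainder = crc_remainder("11001011", "1101")
frame = crc_encode("11001011", "1101")
```

Flipping any single bit of `frame` makes `crc_check` return False for this divisor, since its highest and lowest terms are both non-zero.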



Error Detection

• Received polynomial = P(x) + E(x)


• E(x) captures bit map of the positions of errors
• Cannot detect errors if E(x) is also divisible by C(x)
• Goal: Design C(x) such that for anticipated error
patterns, E(x) is not divisible by C(x)



Example
• Detect all instances of odd number of bit errors
• E(x) contains odd number of terms with coefficient of ‘1’
– Implies E(1) = 1

• If C(x) were a factor of E(x), then C(1) would also have to be 1


• If C(1) = 0, we can conclude C(x) does not divide E(x)
• If C(x) has some factor of the form x^i+1, then C(1)=0



Capabilities
• All single-bit errors, if the $x^k$ and $x^0$ terms have non-zero coefficients
• All double-bit errors, if C(x) has a factor with at least three terms
• Any odd number of bit errors, if C(x) contains the factor (x + 1)
• Any burst of length <= k, if C(x) includes a constant term (the $x^0$ term)
• CRC is easily implementable on shift registers



Data Link Layer :
Media Access Control
(MAC)
Problem
• Status: Can transfer data reliably between two point-to-point
nodes
• Next: How to make a few tens of nodes talk?

?
Outline
• How to interconnect nodes? Network Topology
• How to mediate access among the nodes? Media Access
Control (MAC)
– Categorize and discuss some popular MAC protocols
• Overview, merits and demerits



Network Topologies

Note: Shared Wire or Medium is broadcast


Types of Transmission
• Unicast:
– Packet is intended for one node only
• Broadcast
– Packet intended for everyone
• Multicast
– Packet intended for a subset.



Media Access Control (MAC)

• Two or more simultaneous transmissions by nodes → interference (collision)
• MAC: Protocol that determines how nodes share
channel
– Determine when a node can transmit
– Communication about channel sharing must use channel
itself!



Ideal MAC
• Broadcast channel of rate R bps
– When one node wants to transmit, it can send at rate R.
– When M nodes want to transmit, each can send at average
rate R/M
– Simple and easy to implement
– Fault tolerant



Human Analogy

• Speed Dating Party: Couples want to talk with each other
– Assumption: Everyone talks loudly, so everyone can hear
everyone else in the room → if two speakers talk at the same
time, no one can understand what was said
• How would you facilitate meaningful conversations (no
interference)?



Channel Partitioning Protocols

Divide resource into smaller “pieces”.


Allocate piece to node for exclusive use.
Time Division Multiplexing
• Allocate couples different time slots → Time Division
Multiplexing (TDM)
• Time divided into time frames. Time frames divided into N
time slots. Each sender allocated one time slot.
• Disadvantage: Sender limited to R/N even when other senders
are idle, channel access delay



Frequency Division Multiplexing
• Move couples to different rooms → Frequency Division
Multiplexing (FDM)
• Spectrum divided into frequency bands
– Sender/Receivers tune in to assigned frequency band
– If there are N senders, each sender gets R/N bandwidth
• Disadvantages:
– A sender limited to R/N even when other senders are idle
– Sender-Receiver channel coordination



Frequency Division Multiplexing



Human Analogy

• Speed Dating Party: Couples need to talk with each


other
– Assumption: Everyone talks loudly, so everyone can hear
everyone else in the room
• How would you facilitate meaningful conversations (no
interference)?



Random Access Protocols

• Polite Speaker: Listen. If it's quiet, start talking. If this
clashes with others, back off and try again.
• No a priori coordination among nodes
• Sender transmits at full rate. If two or more transmit at the
same time → Collision



Random Access Protocols

• Specify:
– How to detect collisions?
– How to recover from collisions?
• Disadvantages:
– High load leads to too many collisions and wastage of
resources



Taking Turns Protocols
• Quickly poll to see who wants to talk, give time slots to only
speakers
• Channel partitioning MAC protocols: efficient and fair at high
load, inefficient at low load
• Random access MAC protocols: efficient at low load, inefficient
at high load
• Taking Turns protocols: Make the best of both worlds!



Polling (Centralized)
• A central coordinator polls nodes in a round robin fashion
• Disadvantages:
– Polling overhead (single user will get rate < R)
– Single point of failure (coordinator)



Token Passing (Decentralized)
• Control token passed from one node to next in certain order
• Concerns
– Token overhead
– Single point of failure (token)



Summary
• Many nodes sharing a link → Need Media Access Control
• Three broad classes of Protocols:
– Channel Partitioning: Divide resource into smaller “pieces” (time
slots, frequency, code); Allocate piece to node for exclusive use
– Random Access: Allow full access to resource but provide means to
recover from collisions
– Taking turns: Take turns using the resource, but nodes with more
need get longer turns
• Next: Explore some popular technologies along with their
corresponding MAC



Aloha
Background
• 1970’s : Wireless computer network developed at University of
Hawaii to interconnect Hawaiian islands
– First operational packet radio network
• Inspiration to many standards: Ethernet, WiFi, Cellular (random
access channels)
• Simple and relatively easy to analyze



Pure Aloha
• Senders transmit whenever they have a packet to send
• Sender can determine status of packet (intact or collision) at
end of transmission
• If collision, sender waits a random amount of time and tries
again



Efficiency
• What is the efficiency of ALOHA?
– What is the probability that a transmitted frame does not suffer
collision?



Assumptions
• Frames are of equal length
• Probability of k transmission attempts per frame time (old
retransmissions and new) is Poisson with mean G per frame
time.
– $\Pr[k] = G^k e^{-G} / k!$

(Infinite user population generating new frames with a Poisson
distribution with mean rate less than 1 per frame time)



Throughput

• Throughput S = G * Ps
– Ps: probability that a frame is successful i.e. did not suffer
collision
• Determine Ps
– Under what conditions will a frame not suffer collision?



Vulnerable Period



Analysis
• Consider the sequence of successive transmission attempts on
the channel.
• For some given i, let $T_i$ be the time interval between the i-th and
the (i+1)-th transmission attempt
• The i-th attempt will be successful if both $T_i$ and $T_{i-1}$ exceed the frame
time
– Intervals are independent



Analysis

• From the Poisson distribution, the inter-arrival time between attempts is exponential
– $\Pr(T > \text{frame time}) = e^{-G}$
– Prob of success = $P_s = e^{-G} \cdot e^{-G} = e^{-2G}$
– $S = G P_s = G e^{-2G}$
• Maximum Throughput: at G = 0.5, S = 1/(2e) ≈ 0.184 (18.4%)
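The maximum can be confirmed numerically. This is a small sketch that scans a grid of offered loads rather than differentiating; the function and variable names are illustrative.

```python
import math

def pure_aloha_throughput(G: float) -> float:
    """S = G * e^(-2G): offered load times the probability of no collision."""
    return G * math.exp(-2 * G)

# Scan offered loads 0.01, 0.02, ..., 2.00 and keep the one with the highest throughput.
loads = [i / 100 for i in range(1, 201)]
best_G = max(loads, key=pure_aloha_throughput)
best_S = pure_aloha_throughput(best_G)
```

Analytically, setting dS/dG = (1 - 2G) e^{-2G} to zero gives G = 0.5 and S = 1/(2e), which the scan reproduces.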



Slotted Aloha

• Time divided into discrete intervals (slots)


– Slot interval corresponds to frame time
• Nodes can transmit frames only at beginning of slots
– Nodes are time synchronized
• Vulnerable period reduced by half



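Vulnerable Period (Slotted Aloha)
• A frame can collide only with frames sent in the same slot, so the
vulnerable period is 1 frame time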



Analysis
• $S = G e^{-G}$
• Maximum throughput: at G = 1, S = 1/e ≈ 0.368 (36.8%)
• At G = 1, about 37% of slots are empty, 37% carry a successful frame and
26% are collisions
• Higher values of G decrease empty slots, but increase collisions
exponentially
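
The slot percentages quoted above follow directly from the Poisson assumption. A minimal sketch (the function name is made up for illustration) that splits slots into empty, successful and collided:

```python
import math

def slotted_aloha_slot_mix(G: float):
    """Return (empty, success, collision) slot fractions under Poisson load G."""
    empty = math.exp(-G)          # Pr[0 attempts in the slot]
    success = G * math.exp(-G)    # Pr[exactly 1 attempt], equal to throughput S
    collision = 1.0 - empty - success
    return empty, success, collision

empty, success, collision = slotted_aloha_slot_mix(1.0)
print(f"empty={empty:.3f} success={success:.3f} collision={collision:.3f}")
# empty=0.368 success=0.368 collision=0.264
```

Calling the function with larger G (for example 2.0 or 3.0) shows the empty fraction shrinking while the collision fraction grows.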



Another Method
• N nodes with many frames to send
• A node transmits with probability p in a slot
• Prob that a given node succeeds = $p(1-p)^{N-1}$
• Prob that a slot is a success = E(N,p) = prob that any one node succeeds =
$Np(1-p)^{N-1}$
• For maximum efficiency, find the p that maximizes $Np(1-p)^{N-1}$
• p* turns out to be 1/N
• Efficiency = $(1-1/N)^{N-1}$, which tends to 1/e in the limit N -> infinity
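
To see the convergence to 1/e numerically, the sketch below evaluates E(N, p) at p = 1/N for a few values of N. The function name and the chosen values of N are illustrative.

```python
import math

def slot_success_prob(N: int, p: float) -> float:
    """E(N, p) = N * p * (1 - p)**(N - 1): exactly one of N nodes transmits."""
    return N * p * (1 - p) ** (N - 1)

# Efficiency at the optimal p = 1/N for growing N, compared with 1/e.
for N in (2, 10, 100, 1000):
    print(N, round(slot_success_prob(N, 1 / N), 4))
print("1/e =", round(1 / math.e, 4))
```

The efficiency starts at 0.5 for N = 2 and decreases toward 1/e from above as N grows.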



Theory vs Practice
• Assumptions very important
• Reality can be very different from theory
• Example:
– Poisson arrivals not true
– Fixed packet size not true
– Infinite population not true
– Other parameters, buffering, slotting



Summary
• Looked at two simple random access protocols – Pure Aloha
and Slotted Aloha
• Looked at how such protocols can be theoretically evaluated
• Maximum efficiency of both is rather poor
• Another class of random access protocols: CSMA (Carrier Sense
Multiple Access)



Outline
• CSMA class of protocols
– Persistent and Non-persistent
• Ethernet MAC : CSMA/CD
– Applicable for Bus or Star topology in shared mode



Problems with Aloha

• What causes collisions?


– Pure Aloha: transmissions are made without regard for
channel state
– Slotted Aloha: multiple arrivals in the previous slot ->
collision in the current slot (greedy channel access)



CSMA
• ‘Listen before Talk’ – Carrier Sense
• If a node has a frame to send, listen to channel first
– If busy, don’t send frame --- don’t disrupt ongoing transmission
– If idle, send frame
• Two categories: persistent and non-persistent



1-Persistent CSMA
• Employed by Ethernet
• If a node has a frame to send:
– If channel busy, wait till it becomes idle, then transmit
– If channel idle, transmit
– If collision, wait a random amount of time and start over
• Better than Pure Aloha, but is it better than Slotted Aloha?



Non Persistent CSMA
• Used in 802.15.4 (Zigbee/Sensor technology)
• If a node has a frame to send:
– If channel busy, do not sense anymore. Wait a random amount of
time and try again
– If channel idle, transmit
– If collision, wait a random amount of time and start over
• Better channel utilization than 1-persistent but longer delays



P-persistent
• Employed by 802.11 (WiFi)
• Assumes a slotted system
• If a sender has a frame to send:
– If channel idle, transmit with probability p (defer to next slot with
probability q=1-p). Repeat till frame sent or channel busy due to
another transmission
– If channel busy, wait till idle. Repeat above.
– If collision, wait a random time and try again
• Good Tradeoff between non-persistent and 1-persistent
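
The three CSMA variants differ only in what a node does after sensing the channel. The sketch below puts the rules from the last three slides side by side. The function name and the action labels are invented for illustration, and collision handling is left out.

```python
import random

def csma_decision(variant, channel_idle, p=0.5, rng=random):
    """Return the next action for a node that has a frame queued.

    variant: "1-persistent", "non-persistent" or "p-persistent".
    Actions: "transmit", "keep_sensing" (wait until idle), "random_backoff"
    (stop sensing, retry after a random time) or "defer_slot" (wait one slot).
    """
    if variant == "1-persistent":
        return "transmit" if channel_idle else "keep_sensing"
    if variant == "non-persistent":
        return "transmit" if channel_idle else "random_backoff"
    if variant == "p-persistent":
        if not channel_idle:
            return "keep_sensing"
        # Transmit with probability p, otherwise defer to the next slot.
        return "transmit" if rng.random() < p else "defer_slot"
    raise ValueError(f"unknown variant: {variant}")

print(csma_decision("1-persistent", channel_idle=False))    # keep_sensing
print(csma_decision("non-persistent", channel_idle=False))  # random_backoff
```

With p = 1 the p-persistent rule behaves like 1-persistent on an idle channel, which is one way to read the "tradeoff" remark above.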



QUIZ
• In which of the following protocols can a channel be idle at the end of a transmission even when there are
nodes with traffic to send?
– Slotted aloha
– 1-persistent
– Non-persistent
– P-persistent

• Which of the following protocols does better in terms of channel utilization at high loads?
– Slotted aloha
– 1 persistent
– Non persistent
– 0.5 persistent

• Which among the following is a slotted system?


– Pure aloha
– 1 persistent CSMA
– Non persistent CSMA
– P persistent CSMA



Ethernet MAC
• CSMA/CD: Carrier Sense Multiple Access (1-persistent) with
Collision Detection
– ‘Listen before talk’
– If two stations talk simultaneously, both stop talking, which reduces wasted channel time
• Following explanation applicable to 10Mbps Ethernet



Collision Detection

• Cases under which collision occurs?


– Two stations waiting for channel to become idle
– Two stations attempting transmission at same time on an idle
channel
– Two stations attempting transmission at slightly different times on an
idle channel
• Effect of propagation delay



Collision Detection

• Collision detection done by hardware


• Propagation delay affects efficiency
– The longer the propagation delay, the higher the chance of collision
• Worst case delay of detecting collision?
– One RTT



Collision Detection

• On detecting a collision, send a jamming signal of 32 bits
• Why a jamming signal?
– A sender that detects a collision right away still puts at least 96 bits
on the wire (64-bit preamble + 32-bit jam)
– Jamming extends the collided transmission so that all stations can
detect the collision



Frame Size
• Minimum frame size is 64 bytes (512 bits)
– 46 bytes of payload (plus 18 bytes of header and CRC)
• Why this restriction?
– A host must transmit for one RTT to detect all collisions
– This RTT for a 2500 m long cable with 4 repeaters is about 51.2 µs
(at 10 Mbps -> 512 bits)
• Maximum number of hosts: 1024 in a collision domain
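
The 512-bit figure is just RTT multiplied by the bit rate. A small sketch of that arithmetic follows. The function name is hypothetical, and nanosecond integers are used only to keep the calculation exact.

```python
def min_frame_bits(rtt_ns: int, rate_bps: int) -> int:
    """Smallest frame (in bits) that keeps a sender transmitting for one RTT."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-(rtt_ns * rate_bps) // 1_000_000_000)

bits = min_frame_bits(rtt_ns=51_200, rate_bps=10_000_000)
print(bits, "bits =", bits // 8, "bytes")   # 512 bits = 64 bytes
```

The same function can be reused for any other cable length once the round-trip time is known.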



CSMA/CD

• Adaptor has a frame to send:


– If channel idle (for 96 bit times), start transmission. If busy,
wait until channel idle (+96 bit times) and then transmit.
– If no collision detected, done
– If collision detected, stop transmission, send jamming signal.
Enter exponential backoff
• (For 10Mbps, bit time is 0.1us)



Exponential Backoff

• When retransmitting a frame after its nth collision
– Wait for K*512 bit times and return to step 1
– K chosen at random from {0, 1, 2, ..., 2^m - 1}, where m = min(n, 10)
– 1st collision: choose K from {0,1}
– 2nd collision: choose K from {0,1,2,3}
– From the 10th collision onwards: choose K from {0,1,2,3,4,…,1023}
– Maximum number of transmissions of a frame: 16 (15
retransmissions)
– The range of K doubles after each collision (up to the 10th)
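
The backoff rule above can be sketched in a few lines. This is a minimal illustration, not adapter firmware: the constant and function names are invented here, and the only behaviour modelled is the choice of K.

```python
import random

SLOT_BIT_TIMES = 512      # one backoff slot, in bit times
MAX_ATTEMPTS = 16         # the frame is given up after the 16th failed attempt

def backoff_bit_times(n, rng=random):
    """Wait time (in bit times) after the n-th collision of the same frame."""
    if not 1 <= n < MAX_ATTEMPTS:
        raise ValueError("n must be 1..15; the frame is dropped after 16 attempts")
    m = min(n, 10)
    K = rng.randrange(2 ** m)   # uniform over {0, 1, ..., 2**m - 1}
    return K * SLOT_BIT_TIMES

# Example: possible waits after the 3rd collision are 0, 512, ..., 7 * 512 bit times.
print(backoff_bit_times(3))
```

At 10 Mbps one bit time is 0.1 µs, so each unit of K corresponds to 51.2 µs of waiting.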



Exponential Backoff

• Why exponential backoff?


– Adapts to current load
– Not very fair (Capture effect)
• Why 512 bit time?
– Ensures that if a node chose a lower value of K than any
other node, it can transmit without collision



Efficiency
• Long run fraction of ‘useful’ time on the channel
– Large number of nodes with large number of frames to transmit
– Efficiency = $1 / (1 + 5\,T_{prop}/T_{tx})$

• $T_{prop}$ = max propagation time between 2 nodes in the LAN
• $T_{tx}$ = time to transmit a frame
• As $T_{prop}$ approaches 0 or $T_{tx}$ becomes large, efficiency
approaches 1
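
A short sketch of the efficiency formula makes the dependence on frame size concrete. The numbers in the example are assumptions chosen for illustration: a one-way propagation time of 25.6 µs and frames of 64 and 1518 bytes at 10 Mbps.

```python
def csma_cd_efficiency(t_prop: float, t_tx: float) -> float:
    """Efficiency = 1 / (1 + 5 * t_prop / t_tx); both times in the same unit."""
    return 1.0 / (1.0 + 5.0 * t_prop / t_tx)

BIT_TIME_US = 0.1          # 10 Mbps
T_PROP_US = 25.6           # assumed one-way propagation time

for frame_bytes in (64, 1518):
    t_tx = frame_bytes * 8 * BIT_TIME_US
    print(frame_bytes, "bytes:", round(csma_cd_efficiency(T_PROP_US, t_tx), 3))
```

Longer frames amortize the contention overhead over more useful bits, which is why efficiency rises with frame size.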



Exercise
• Suppose you want to employ CSMA/CD (not the Ethernet
standard as such) on a bus topology of length 2 km and rate
10 Mbps. What minimum frame size in bytes would you use?
Assume the signal propagation speed is 2*10^8 m/s.

Ans. 25
• The propagation delay is (2*10^3 m) / (2*10^8 m/s) = 10 µs, so RTT = 20 µs.
Bits that can be sent during 20 µs at 10 Mbps = 200 bits = 25 bytes.
Hence we need a frame size of at least 25 bytes.



Ethernet Capture Effect
• Two stations A, B employing Ethernet standard are competing for access. Suppose,
both stations have many frames to send (labelled A1, A2, ... for A and B1, B2, .... for
B). Suppose A and B simultaneously transmit their respective first frames and
collide. After their first collision, A chooses k=0, while B chooses k=1. This ensures
that transmission of A1 is successful while B1 is yet to meet with success. At the end
of A1's transmission, A attempts to send A2 and B attempts to send B1. What is the
probability that A2's transmission succeeds over B1's?

Ans. 0.625.
A2 and B1 collide. This is the first collision for A2 but the second for B1, so A
chooses k from {0,1}, while B chooses k from {0,1,2,3}. The number
of possible combinations is 2 * 4 = 8. If A chose k=0, it wins if B chose
{1,2,3} = 3 choices. If A chose k=1, it wins if B chose {2,3} = 2 choices. Hence the
probability that A2 succeeds is 5/8 = 0.625.
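
The count above can be checked by enumerating every (A, B) pair of backoff choices. This is a small sketch with an invented function name. Ties count as "A does not win", since equal choices lead to another collision.

```python
from fractions import Fraction
from itertools import product

def win_probability(a_choices, b_choices):
    """Probability that A's K is strictly smaller than B's (both uniform)."""
    outcomes = list(product(a_choices, b_choices))
    wins = sum(1 for ka, kb in outcomes if ka < kb)
    return Fraction(wins, len(outcomes))

p = win_probability(range(2), range(4))
print(p, float(p))   # 5/8 0.625
```

If A wins again and the next pair of frames collides, B picks from {0,...,7} while A is back to {0,1}, and the same function gives 13/16. This growing bias toward the previous winner is the capture effect.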



Exercise
• Suppose two nodes A and B both transmit packets at the same time and
collide. As soon as they collide, they send the jamming signal and enter
exponential backoff. Assume the time at which both A and B finish
transmitting the last bit of the jamming signal is set to 0, and the one way
propagation delay between the nodes is 250 bit times. Also suppose that A
chose a value of k=0 and B chose a value of k=1 (waits 512 bit times).
1. When does node A begin transmission? Express the answer in terms of
bit times.
2. When does node A's first bit reach B relative to time 0? Express the answer
in bit times.
3. When does node B attempt to transmit its first bit?



Computer Networks
CS F303

Obtaining IP Addresses
Ashutosh Bhatia
Department of Computer Science and Information Systems
Birla Institute of Technology and Science, Pilani Campus, Pilani


Problem Statement

• IP layer forwarding is based on IP addresses

• Next-hop delivery based on Link addresses (MAC)

• Need to perform IP to MAC address translation

• Answer: Address Resolution Protocol (ARP)



Address Resolution Protocol (ARP)
• Operates at Link layer (Frame type = 0x0806)

• Based on broadcast: What is the MAC address corresponding
to the given IP address?
– Host with matching IP address replies

• Each host maintains a cache with IP to MAC translations
– Entries in cache are timed out periodically (15 min)
– arp -a shows all the ARP cache entries



Address Resolution Protocol (ARP)

• Originator: Add entry to cache corresponding to target

• Target: Add entry to cache corresponding to the originator
(sender)

• Intermediate hosts: Refresh existing entries

• When forwarding a datagram, check cache; if no mapping,
invoke ARP
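
The cache-then-ARP behaviour can be sketched as follows. This is a toy model under stated assumptions: `ArpCache`, `next_hop_mac` and the `send_arp_request` callback are hypothetical names, and the 15-minute lifetime is taken from the earlier slide rather than from any particular operating system.

```python
import time

class ArpCache:
    """Toy IP -> MAC cache whose entries expire after a fixed lifetime."""

    def __init__(self, lifetime_s=15 * 60, clock=time.monotonic):
        self.lifetime_s = lifetime_s
        self.clock = clock
        self._entries = {}            # ip -> (mac, time stored)

    def learn(self, ip, mac):
        self._entries[ip] = (mac, self.clock())

    def lookup(self, ip):
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, stored_at = entry
        if self.clock() - stored_at > self.lifetime_s:
            del self._entries[ip]     # entry has timed out
            return None
        return mac

def next_hop_mac(cache, ip, send_arp_request):
    """Check the cache first; invoke ARP only when there is no mapping."""
    mac = cache.lookup(ip)
    if mac is None:
        mac = send_arp_request(ip)    # hypothetical: broadcast, wait for the reply
        cache.learn(ip, mac)
    return mac
```

Passing the clock in as a parameter keeps the expiry logic easy to exercise without waiting 15 minutes.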



ARP Packet Format

(Figure omitted: ARP packet layout. The numbers in brackets apply to the case of
mapping IP addresses to Ethernet addresses.)



Proxy ARP
• A router answers ARP requests on behalf of a host that is on the network
attached to another of its interfaces
• The sender of the ARP request thinks that the router is the destination host
• The router acts as proxy agent for the destination host, and relays its packets
• Motivation
– Can hide a number of machines
– All packets for these machines have to pass through the router running
Proxy ARP, where the packets can be examined
– The sender does not know that its packets are passing through a machine
and are being checked



How Does Proxy ARP Work?
• Host A (172.16.10.100) on Subnet A needs to send
packets to Host D (172.16.20.200) on Subnet B.

• As Host A has a /16 subnet mask, it believes that it is
directly connected to Host D, so it broadcasts an ARP request
for Host D. The broadcast does not reach Host D, because the
router does not forward it to Subnet B.

• The router sends a Proxy ARP reply to Host A, giving its
own MAC address as Host D's MAC address.

• Upon receipt of this ARP reply, Host A updates its ARP
table

• Later, whenever the router receives a packet for Host D at
interface e0 from anybody in Subnet A, it relays the
packet to interface e1
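
The reason Host A ARPs for Host D directly is purely a matter of its subnet mask. The check below uses Python's standard `ipaddress` module with the addresses from this slide. The /24 comparison is an added assumption, included only to show what Host A would conclude with a longer mask.

```python
import ipaddress

host_a = ipaddress.ip_interface("172.16.10.100/16")
host_d = ipaddress.ip_address("172.16.20.200")

# With a /16 mask, Host D appears to be on Host A's own network,
# so Host A sends an ARP request for Host D itself.
print(host_d in host_a.network)          # True

# With a /24 mask (assumed here for contrast), Host D is off-link and
# Host A would send the packet to its default gateway instead.
host_a_24 = ipaddress.ip_interface("172.16.10.100/24")
print(host_d in host_a_24.network)       # False
```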


Gratuitous ARP
• Wireshark/Tcpdump Output
– arp who-has IP_x tell IP_y
– Source protocol address: IP_y
– Target protocol address: IP_x

• Sometimes, one sees IP_x = IP_y
– The sender knows its address; yet it issues a request asking to resolve its address
– Hence “gratuitous”
– Does not expect a reply

• But if a reply arrives: misconfigured system!



Gratuitous ARP

• A feature of ARP
– Host H has an entry in its ARP cache for IP address X
– It receives an ARP request from IP address X for some
address Y
– Even though H does not reply to the ARP request, it updates
its ARP cache with X’s hardware address (contained in the
ARP request)
– “Latest” hardware address is maintained



Uses of Gratuitous ARPs
• Backup server taking over from a failed server

• After detecting that the primary server has failed, the backup
server

– Issues a gratuitous ARP request, with the primary server’s IP address and
its own hardware address

– Causes all machines to update their ARP cache entries, so that the
backup’s hardware address is noted

– Henceforth, all traffic is directed to the backup server
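
The failover case combines gratuitous ARP with the cache-refresh rule described earlier: a host that already has an entry for the sender's IP address overwrites it with the hardware address carried in the request. A minimal sketch follows, in which the caches are plain dictionaries and all addresses are made-up examples.

```python
def on_arp_request(cache, sender_ip, sender_mac):
    """Refresh an existing entry for the sender; never create a new one here."""
    if sender_ip in cache:
        cache[sender_ip] = sender_mac

# Hypothetical addresses for a primary/backup server pair.
SERVICE_IP = "192.0.2.10"
PRIMARY_MAC = "02:00:00:00:00:01"
BACKUP_MAC = "02:00:00:00:00:02"

# Two clients have talked to the primary; a third has no entry for it.
client_caches = [{SERVICE_IP: PRIMARY_MAC}, {SERVICE_IP: PRIMARY_MAC}, {}]

# Backup takes over: its gratuitous ARP carries the service IP and its own MAC.
for cache in client_caches:
    on_arp_request(cache, SERVICE_IP, BACKUP_MAC)

print(client_caches)
```

After the broadcast, the two clients that knew the primary now map the service IP to the backup's hardware address, while the third client's cache stays empty until it issues its own ARP request.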

