
CS549: Performance Analysis of Computer Networks

Instructor: Dr. Sreelakshmi Manjunath

Lab Assignment 2 : Performance Analysis using CPUHog and WGET

Name : Sandeep N Kundalwal Date of Submission : 07/04/2023


Roll No.: T22051

Experiment 01

Aim : To compare three methods of time measurement for a program titled ‘cpuhog’.
Description : Create a program ‘cpuhog’. In this experiment, we calculate the inverse tangent and tangent of a
number (8) using Java. The number of iterations is varied so that the execution time ranges from a few seconds to
about a minute. The three techniques used to measure the execution time are:
(i) Using a stopwatch on a mobile phone
(ii) Using the time command on the command line
(iii) Using the time() & getrusage() functions before/after the loop (in Java, currentTimeMillis(); see the Note below)

Machine Used : Dell Inspiron 15 3000 series

Procedure : Measure the execution time using the three methods mentioned in the description.
➔ The following readings were observed using a stopwatch on a mobile phone:

Number of Iterations   Round 01 (seconds)   Round 02 (seconds)   Round 03 (seconds)   Average (seconds)
30                     11.93                11.85                11.90                11.89
60                     24.90                24.42                23.09                24.13
90                     30.47                34.84                35.87                33.72
120                    37.67                38.02                38.13                37.94
150                    47.58                47.28                47.49                47.45
180                    57.64                58.73                56.60                57.65
210                    65.08                65.67                68.53                66.42
240                    75.84                83.90                76.29                78.67
270                    83.87                83.01                83.95                83.61
300                    97.07                95.80                93.37                95.41


➔ The following readings were observed using the time command on the command line:

No. of Iterations   Round 01 (seconds)   Round 02 (seconds)   Round 03 (seconds)   Average Execution Time (seconds)
30                  9.38                 9.38                 9.38                 9.38
60                  18.70                18.70                18.69                18.69
90                  28.02                27.99                28.03                28.01
120                 37.30                37.32                37.31                37.31
150                 46.67                46.63                46.61                46.63
180                 55.89                55.89                55.93                55.90
210                 65.33                65.21                65.18                65.24
240                 74.61                74.67                74.68                74.65
270                 83.94                83.24                83.96                83.71
300                 93.24                93.25                93.27                93.25

➔ The following readings were observed using the currentTimeMillis() method:

No. of Iterations   Round 01 (seconds)   Round 02 (seconds)   Round 03 (seconds)   Average Execution Time (seconds)
30                  8.89                 8.68                 8.86                 8.81
60                  21.82                21.38                20.01                21.07
90                  27.66                32.01                31.62                30.43
120                 34.84                35.06                34.99                34.96
150                 44.02                44.38                43.53                43.97
180                 54.56                55.68                53.17                54.47
210                 61.92                62.52                65.69                63.37
240                 72.80                79.98                73.64                75.47
270                 81.08                80.01                80.70                80.59
300                 93.15                92.96                90.46                92.19


Note:
➔ time() & getrusage() are C (and Python) functions. In Java, currentTimeMillis() is used to obtain the current time
in milliseconds.
➔ The equivalent of the time command on Windows (PowerShell) is:
> Measure-Command {java <program-name> <argument-list>}
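For reference, a minimal Python sketch of method (iii) is given below. It wraps the same workload as cpuhog with time() for wall-clock time and getrusage() for CPU time; the resource module is POSIX-only, and the iteration count here is just one of the levels used above.

import math
import time
import resource  # POSIX-only module; provides getrusage()

ITERATIONS = 30  # one of the iteration levels used in the tables above

def calculate_tan(base_number):
    # Same workload as cpuhog: sum atan(x) * tan(x) for x = base_number^7 down to 0
    result = 0.0
    x = float(base_number ** 7)
    while x >= 0:
        result += math.atan(x) * math.tan(x)
        x -= 1
    return result

wall_start = time.time()
usage_start = resource.getrusage(resource.RUSAGE_SELF)

for _ in range(ITERATIONS):
    calculate_tan(8)

wall_end = time.time()
usage_end = resource.getrusage(resource.RUSAGE_SELF)

print("Wall-clock time :", round(wall_end - wall_start, 2), "s")
print("User CPU time   :", round(usage_end.ru_utime - usage_start.ru_utime, 2), "s")
print("System CPU time :", round(usage_end.ru_stime - usage_start.ru_stime, 2), "s")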

Fig. Line Chart: No. of Iterations v/s Execution Time

Program cpuhog:

public class cpuhog {
    public static void main(String[] args) {
        // number of outer iterations, taken from the command line
        int iterations = Integer.parseInt(args[0]);

        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            double result = calculateTan(8);
        }
        long end = System.currentTimeMillis();

        System.out.println("Execution Time: " + ((end - start) / 1000f) + "s");
    }

    // CPU-bound workload: sums atan(x) * tan(x) for x = baseNumber^7 down to 0
    public static double calculateTan(int baseNumber) {
        double result = 0;
        double baseNumberPowered = Math.pow(baseNumber, 7);
        while (baseNumberPowered >= 0) {
            result += Math.atan(baseNumberPowered) * Math.tan(baseNumberPowered);
            baseNumberPowered--;
        }
        return result;
    }
}
Observations :
1. The execution time measured using the currentTimeMillis() method is the least among the three methods,
whereas the execution time measured using the stopwatch is the maximum in every round of the experiment.
2. The time command gives a nearly uniform execution time for every round of a particular iteration count, i.e., the
execution time does not vary much across rounds; the same is not true for the other two methods.
3. The time command measures the total amount of CPU time used by the program, which includes the time spent
by the program in system calls and in any child processes it executes.

Inferences:
1. Execution time: Stopwatch > time command ≈ currentTimeMillis()
2. Since there is always a margin of human error when measuring execution time with a stopwatch, the results are
not as accurate as with the other two methods.
3. All the methods used for measuring execution time depict similar behaviour, i.e., the graph grows in a linear
fashion, with the time-command measurements being the most linear.
4. As the number of iterations increases, the execution time also increases for all the methods (a quick line-fit
check is sketched below). So we can conclude that
Execution time ∝ No. of iterations
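As a minimal check of this proportionality, the sketch below fits a straight line to the time-command averages reported above; an R² close to 1 supports the linear relationship. (This uses NumPy; the data points are copied from the second table.)

import numpy as np

# Iteration counts and average execution times (s) from the time-command table above
iterations = np.array([30, 60, 90, 120, 150, 180, 210, 240, 270, 300])
avg_time_s = np.array([9.38, 18.69, 28.01, 37.31, 46.63, 55.90, 65.24, 74.65, 83.71, 93.25])

# Least-squares fit of a straight line: time = slope * iterations + intercept
slope, intercept = np.polyfit(iterations, avg_time_s, deg=1)
predicted = slope * iterations + intercept
r_squared = 1 - np.sum((avg_time_s - predicted) ** 2) / np.sum((avg_time_s - avg_time_s.mean()) ** 2)

print("slope =", round(slope, 4), "s/iteration, intercept =", round(intercept, 4), "s, R^2 =", round(r_squared, 5))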
Experiment 02

Aim : To measure network throughput using the wget command.


Description: Using a fractional factorial design, measure the network throughput using the wget command. Use the
Ranking method and the Range method to analyse the impact of the factors and identify the two factors that have the
greatest impact. Then analyse the effects of these two factors using the allocation of variation technique, running a
few more experiments with different levels of these two factors if needed.

Factors given:

Below are the given factors and their possible levels.

Fig. Given factors with possible levels

As per the given experiment, we have a total of four factors, and each factor is assumed to have four possible levels.
Under a full factorial design, the total number of experiments to be conducted would be

n^k (n = levels per factor, k = number of factors) = 4^4 = 256 experiments

Since we are using a fractional factorial design, the number of experiments can be reduced by running only a fraction
of the possible level combinations. Restricting each factor to the three levels listed in Table 2.1 and taking a
3^(k-p) design with k = 4 factors and p = 2, the total number of experiments to be conducted is

3^(k-p) = 3^(4-2) = 9 experiments (with 3 repetitions each, 9 × 3 = 27 runs)

Procedure: Measure the throughput using the chosen factors and their respective levels.

For each round,

Throughput = (Sum of the throughputs of all file downloads in the round) / (Number of file downloads in the round)
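As a minimal illustration of this averaging (the values below are placeholders, not measured data):

def average_throughput(per_download_mbps):
    # Average throughput (MB/s) of one round, given the per-download throughputs reported by wget
    return sum(per_download_mbps) / len(per_download_mbps)

# Example round with three hypothetical per-download throughputs
print(average_throughput([0.95, 1.10, 1.05]), "MB/s")  # -> about 1.03 MB/s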

Table 2.1 - Factors with their respective levels considered for the Fractional Factorial Design

Symbol   Factor                        Level-I    Level-II   Level-III
A        File Size                     100KB      1MB        100MB
B        No. of Concurrent Downloads   10         25         50
C        Download Speed Limit          1 MB/s     10 MB/s    20 MB/s
D        Time of the Day               08:00 AM   01:00 PM   07:00 PM

Table 2.2 - Measured data for the Network Throughput Study

Exp   File    No. of Concurrent   Download      Time of    Round 1   Round 2   Round 3   Average Throughput
No.   Size    Downloads           Speed Limit   the Day    (MB/s)    (MB/s)    (MB/s)    (MB/s)
1     100KB   10                  20 MB/s       01:00 PM   1.30      1.20      1.15      1.21
2     100KB   25                  10 MB/s       07:00 PM   0.68      0.73      0.76      0.72
3     100KB   50                  1 MB/s        08:00 AM   1.75      1.78      1.62      1.71
4     1MB     10                  20 MB/s       07:00 PM   0.48      0.56      0.52      0.52
5     1MB     25                  10 MB/s       01:00 PM   0.27      0.28      0.21      0.25
6     1MB     50                  1 MB/s        08:00 AM   0.64      0.68      0.64      0.65
7     100MB   10                  20 MB/s       08:00 AM   0.58      0.61      0.58      0.59
8     100MB   25                  10 MB/s       01:00 PM   0.14      0.11      0.13      0.12
9     100MB   50                  1 MB/s        07:00 PM   0.03      0.06      0.05      0.04


Note:
Various considerations have been employed while arranging the data in the table:
➔ File Size & No. of Concurrent Downloads are arranged in ascending order.
➔ Download Speed Limit is arranged in descending order.
➔ Time of the Day is arranged in random order.

Note:
The link to the autologs of the three repetitions of the above experiment is given below:
➔ https://drive.google.com/drive/folders/1qiDxJoZVpqNZzXJtHn0XKK9rPywLy9Go?usp=share_link

Using the above data, we plot the Network Throughput vs. No. of Concurrent File Downloads graph (in MB/s) for the
various file sizes, namely 100KB, 1MB and 100MB.

Fig. Line Chart: Various Factors v/s Network Throughput

Note:
To compute the above graphs, the average of all values at each level has been taken as a single point.
Observations from the Line Charts:
● As the File Size increases, the network throughput decreases, i.e., File Size is inversely related to the Network
Throughput.
● The behaviour regarding Time of the Day is somewhat similar to that of File Size, i.e., as the day progresses (from
morning to evening), the network throughput decreases.
● The behaviour of the remaining two factors is similar, i.e., the graph dips as the factor is increased from its first
to its second level and then increases again almost linearly.

Python Code : Below is the Python script that runs the wget command to download the given files concurrently using a
thread pool (ThreadPoolExecutor). After the wget command finishes, the method “run_wget_cmd” returns the wget log,
from which we extract the network throughput of each downloaded file.
import numpy as np
import re, os, subprocess, concurrent.futures


# function to get the path of a directory inside the current directory
def getCurrentDirectoryPath(directoryName):
    base_dirc = os.path.dirname(os.path.abspath(__file__))
    path = os.path.join(base_dirc, directoryName)
    return path


# function to create a directory
def createDirectory(directoryName):
    print(directoryName)
    try:
        path = getCurrentDirectoryPath(directoryName)
        if not os.path.exists(path):
            os.makedirs(path)
    except OSError:
        print('Error: Creating directory. ', directoryName)


# runs the wget command and returns its log (wget writes its progress to stderr)
def run_wget_cmd(wget_cmd, verbose=False, *args, **kwargs):
    process = subprocess.Popen(
        wget_cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        shell=True
    )
    std_out, std_err = process.communicate()
    return std_err


# Downloads the files concurrently using a thread pool.
# @repetition               -> repetition number
# @downloadSize             -> size of the file to be downloaded
# @link                     -> URL from which the file is downloaded
# @logFileName              -> file in which the wget output is logged
# @noOfConcurrentDownloads  -> number of times the file is downloaded concurrently
# @downloadLimit            -> index into downloadLimits for the --limit-rate value
def runConcurrentDownloads(repetition, downloadSize, link, logFileName, noOfConcurrentDownloads, downloadLimit):
    downloadLimits = ['20M', '10M', '10M']  # --limit-rate values for the 20 MB/s and 10 MB/s levels used here
    directory_name = str(downloadSize) + "_concurrentdownload_" + str(noOfConcurrentDownloads)
    createDirectory(directory_name)
    path = getCurrentDirectoryPath(directory_name)
    wget = 'wget --limit-rate=' + downloadLimits[downloadLimit] + ' --directory-prefix=' + path + ' ' + link

    with concurrent.futures.ThreadPoolExecutor(max_workers=noOfConcurrentDownloads) as threadExecutor:
        throughputs = []
        throughputValues = []
        futureToUrl = {threadExecutor.submit(run_wget_cmd, wget, verbose=False): degree
                       for degree in range(noOfConcurrentDownloads)}
        for future in concurrent.futures.as_completed(futureToUrl):
            wgetLog = future.result()
            throughput = regex.search(wgetLog)
            if throughput is not None:
                splittedThroughput = throughput.group(0).split(" ")
                throughputValue = float(splittedThroughput[0])
                throughputMetric = splittedThroughput[1].strip()
                # if the throughput is in KB/s, convert it into MB/s
                if throughputMetric == "KB/s":
                    throughputValue = round((throughputValue / 1024), 2)
                    throughputMetric = "MB/s"
                throughputs.append(str(throughputValue) + " " + throughputMetric)
                throughputValues.append(throughputValue)

            with open(logFileName, 'a') as logFile:
                print(wgetLog, file=logFile)
                print("#########################################################################################",
                      file=logFile)

    # pad missing readings (downloads whose log had no throughput line) with the current average
    extraValuesNeeded = noOfConcurrentDownloads - len(throughputValues)
    initialAverage = round(np.mean(throughputValues), 2)
    for x in range(extraValuesNeeded):
        throughputValues.append(initialAverage)

    dataLogName = (str(repetition) + "_downloadLimit" + downloadLimits[downloadLimit] + "_" +
                   str(downloadSize) + "File_" + str(noOfConcurrentDownloads) + ".txt")
    with open(dataLogName, 'w') as dataLog:
        print(throughputs, file=dataLog)
        print("Average Throughput: " + str(round(np.mean(throughputValues), 2)) + "MB/s", file=dataLog)


# Start point of the execution of the program
if __name__ == '__main__':
    noOfConcurrentDownloads = [10, 25, 25]
    sizes = ["100KB", "1MB", "100MB"]
    links = ["https://cloud.iitmandi.ac.in/f/5d31e8769b954109be61/?dl=1",
             "https://cloud.iitmandi.ac.in/f/07109a50545d4930a714/?dl=1",
             "https://cloud.iitmandi.ac.in/f/c200476d3ce247e08c87/?dl=1"]
    # regex to match the format (12.0 MB/s) to get the throughput from the wget output
    regex = re.compile(r'(\d{1,4}\.?\d{0,4}\ [KM][Bb]/s)')

    for repetition in range(1, 2):
        logFileName = "Autolog-" + str(repetition) + "-1PM.txt"
        with open(logFileName, 'w') as logFile:
            print("Repetition No: " + str(repetition))

        i = 0
        for link in links:
            runConcurrentDownloads(repetition, sizes[i], link, logFileName, noOfConcurrentDownloads[i], i)
            i += 1
Analysis Methods: Three different methods have been employed for the performance analysis of the above experiment,
namely:
1. Ranking Method
2. Range Method
3. Allocation of Variation

➔ Ranking Method: The ranking method is similar to simple observation, except that the experiments are listed in
order of increasing or decreasing response, so that the experiment with the best response comes first and the one
with the worst response comes last. The factor columns are then inspected to find the levels that consistently
produce good or bad results.

In the sign table below, each factor level is coded as follows: Level-I = -1, Level-II = 0, Level-III = +1.

Table 2.3 - Sign Table for the Ranking Method, in order of decreasing throughput

Exp No.    A    B    C    D    Throughput (MB/s)
3         -1    1   -1   -1    1.71
1         -1   -1    1    0    1.21
2         -1    0    0    1    0.72
6          0    1   -1   -1    0.65
7          1   -1    1   -1    0.59
4          0   -1    1    1    0.52
5          0    0    0    0    0.25
8          1    0    0    0    0.12
9          1    1   -1    1    0.04

From the above sign table, we can observe that:

● File Size shows more impact than any of the other factors: it is at level -1 (100KB) for the experiments with the
highest throughput and at level +1 (100MB) for those with the lowest, i.e., File Size is inversely related to the
network throughput. The experiments in order of decreasing throughput are:

3 > 1 > 2 > 6 > 7 > 4 > 5 > 8 > 9

● We cannot infer much about the other factors, as their effect is not clear from this ordering.

As per the ranking method, we arrange the factors in decreasing order of impact:

File Size > Download Speed Limit = No. of Concurrent Downloads = Time of the Day
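A minimal sketch of this ranking step is shown below; the coded levels and average throughputs are copied from Tables 2.2 and 2.3.

# Coded levels (A, B, C, D) and average throughput (MB/s) of each experiment, from Table 2.3
experiments = {
    1: ((-1, -1,  1,  0), 1.21),
    2: ((-1,  0,  0,  1), 0.72),
    3: ((-1,  1, -1, -1), 1.71),
    4: (( 0, -1,  1,  1), 0.52),
    5: (( 0,  0,  0,  0), 0.25),
    6: (( 0,  1, -1, -1), 0.65),
    7: (( 1, -1,  1, -1), 0.59),
    8: (( 1,  0,  0,  0), 0.12),
    9: (( 1,  1, -1,  1), 0.04),
}

# List the experiments in order of decreasing throughput along with their coded factor levels
ranking = sorted(experiments.items(), key=lambda item: item[1][1], reverse=True)
print("Exp     A   B   C   D   Throughput")
for exp_no, (levels, throughput) in ranking:
    a, b, c, d = levels
    print(f"{exp_no:>3} {a:>5} {b:>3} {c:>3} {d:>3}   {throughput:.2f}")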

➔ Range Method: In the Range Method, we find the average response corresponding to each level of a factor and take
the difference between the maximum and minimum of these averages. This difference is called the range. A factor
with a large range is considered important.
Table 2.4 - Factor Averages and Range for the Network Throughput Study

Factor                        Level 1   Level 2   Level 3   Range of Averages
File Size                     1.21      0.47      0.25      0.96
No. of Concurrent Downloads   0.77      0.36      0.80      0.44
Download Speed Limit          0.80      0.36      0.77      0.44
Time of the Day               0.98      0.52      0.42      0.56

As per the Range method, we arrange the factors in decreasing order of impact:

File Size > Time of the Day > Download Speed Limit = No. of Concurrent Downloads
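A minimal sketch of the range-method computation from the Table 2.2 averages is given below (the experiments belonging to each level follow the layout of Table 2.2).

import numpy as np

# Average throughput (MB/s) of experiments 1-9, from Table 2.2
avg_throughput = {1: 1.21, 2: 0.72, 3: 1.71, 4: 0.52, 5: 0.25,
                  6: 0.65, 7: 0.59, 8: 0.12, 9: 0.04}

# Experiments belonging to each level of each factor, as laid out in Table 2.2
levels = {
    "File Size":                   {"100KB": [1, 2, 3], "1MB": [4, 5, 6], "100MB": [7, 8, 9]},
    "No. of Concurrent Downloads": {"10": [1, 4, 7], "25": [2, 5, 8], "50": [3, 6, 9]},
    "Download Speed Limit":        {"1 MB/s": [3, 6, 9], "10 MB/s": [2, 5, 8], "20 MB/s": [1, 4, 7]},
    "Time of the Day":             {"08:00 AM": [3, 6, 7], "01:00 PM": [1, 5, 8], "07:00 PM": [2, 4, 9]},
}

for factor, groups in levels.items():
    # average response at each level of this factor, and the range of these averages
    level_means = {lvl: float(np.mean([avg_throughput[e] for e in exps])) for lvl, exps in groups.items()}
    factor_range = max(level_means.values()) - min(level_means.values())
    print(factor, {lvl: round(m, 2) for lvl, m in level_means.items()}, "range =", round(factor_range, 2))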

Conclusion:
As per the results from the Ranking method and the Range method, the two factors that have the greatest
impact are:

File Size & Time of the Day

➔ Allocation of Variation: In order to analyse the effect of the two most important factors found using the Ranking and
Range methods, consider a design with r replications of each of the ab experiments corresponding to the a levels
of factor A (File Size) and the b levels of factor B (Time of the Day). The model in this case is

yijk = μ + αj + βi + 𝛾ij + eijk

Here,
yijk = response (observation) in the kth replication of the experiment with factor A at level j and factor B at level i
μ = mean response
αj = effect of factor A at level j
βi = effect of factor B at level i
𝛾ij = effect of the interaction between factor A at level j and factor B at level i
eijk = experimental error

The effects are computed so that their sum is zero:

Σj αj = 0 and Σi βi = 0

The interactions are computed so that their row as well as column sums are zero:

Σi 𝛾ij = 0 for every j, and Σj 𝛾ij = 0 for every i

The errors in each experiment add to zero:

Σk eijk = 0 for every cell (i, j)

Table 2.5 - Data for the Network Throughput Study with Replications (throughput in MB/s)

File Size   08:00 AM            01:00 PM            07:00 PM
100KB       1.23, 1.19, 1.28    0.95, 1.01, 1.06    0.91, 0.69, 0.79
1MB         0.85, 1.66, 0.58    0.79, 0.64, 0.66    0.55, 0.27, 0.53
100MB       0.21, 0.21, 0.20    0.19, 0.17, 0.18    0.15, 0.15, 0.18

Note:
While collecting the data for the above table, the factors mentioned below were kept constant:
● Number of Concurrent Downloads = 25
● Download Speed Limit = 10 MB/s

Computation of Effects: The expressions for the effects can be obtained in a manner similar to that for two-factor
designs without replications. The observations are arranged in ab cells, forming a matrix with the a levels of File
Size along the rows and the b levels of Time of the Day along the columns. Each cell contains the r observations
belonging to the replications of one experiment. Averaging the observations in each cell produces the cell means;
similarly, averaging across rows, across columns and over all observations produces the row means, the column
means and the overall mean. From these, we obtain the following expressions for the effects:

μ = overall mean of all observations
αj (row effect, File Size) = row mean − μ
βi (column effect, Time of the Day) = column mean − μ

Table 2.6: Computation of Effects for the Network Throughput Study with Replications

                08:00 AM   01:00 PM   07:00 PM   Row Sum   Row Mean   Row Effect
100KB           1.2333     1.0067     0.7967     9.1100    1.0122     0.3722
1MB             1.0300     0.6967     0.4500     6.5300    0.7255     0.0855
100MB           0.2067     0.1800     0.1600     1.6400    0.1822     -0.4577
Column Sum      2.4700     1.8834     1.4067     5.7601
Column Mean     0.8233     0.6278     0.4689     0.6400
Column Effect   0.1839     -0.0122    -0.1711

The interaction (or cell effect) for the (i, j)th cell is computed by subtracting μ + αj + βi from the cell mean ȳij.
The computation can be verified by checking that the row as well as column sums of the interactions are zero.

Table 2.7: Interactions in the Network Throughput Study with Replications

         08:00 AM   01:00 PM   07:00 PM
100KB    0.0377     0.0066     -0.0444
1MB      0.1211     -0.0166    -0.1044
100MB    -0.1588    0.0099     0.1488
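The effects and interactions above can be reproduced with the short Python sketch below (the replication data are copied from Table 2.5).

import numpy as np

# r = 3 replications per cell; rows = File Size (A), columns = Time of the Day (B), from Table 2.5
data = np.array([
    [[1.23, 1.19, 1.28], [0.95, 1.01, 1.06], [0.91, 0.69, 0.79]],   # 100KB
    [[0.85, 1.66, 0.58], [0.79, 0.64, 0.66], [0.55, 0.27, 0.53]],   # 1MB
    [[0.21, 0.21, 0.20], [0.19, 0.17, 0.18], [0.15, 0.15, 0.18]],   # 100MB
])

cell_means = data.mean(axis=2)                              # cell means ȳij
mu = cell_means.mean()                                      # overall mean μ
alpha = cell_means.mean(axis=1) - mu                        # row effects αj (File Size)
beta = cell_means.mean(axis=0) - mu                         # column effects βi (Time of the Day)
gamma = cell_means - mu - alpha[:, None] - beta[None, :]    # interactions 𝛾ij

print("overall mean  :", round(mu, 4))
print("row effects   :", np.round(alpha, 4))    # File Size
print("column effects:", np.round(beta, 4))     # Time of the Day
print("interactions  :")
print(np.round(gamma, 4))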

The total variation of 𝑦 can be allocated to the two factors, the interaction between them, and the experimental errors.
SSY = SSO + SSA + SSB + SSAB + SSE

The total variation (SST) is,

SST = SSY - SSO = SSA + SSB + SSAB + SSE

The sums of squares for the data are:

SSY = (1.23)² + (1.19)² + (1.28)² + … + (0.18)² = 15.2450
SSO = 3 * 3 * 3 * (0.6400)² = 11.0593
SSA = 3 * 3 * [(0.3722)² + (0.0855)² + (-0.4577)²] = 3.1988
SSB = 3 * 3 * [(0.1832)² + (-0.0122)² + (-0.1711)²] = 0.5672
SSAB = 3 * [(0.0377)² + (0.0066)² + … + (0.1488)²] = 0.2303
SST = SSY - SSO = 4.1856
SSE = SSY - SSO - SSA - SSB - SSAB = 0.1891

The percentage of variation explained by a factor or an interaction can be used to measure the importance of the
corresponding effect:

Explained by File Size (A) = 100 * (SSA/SST) = 76.4253%
Explained by Time of the Day (B) = 100 * (SSB/SST) = 13.5513%
Explained by Interaction (AB) = 100 * (SSAB/SST) = 5.5044%
Unexplained = 100 * (SSE/SST) = 4.5188%
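Continuing the sketch above, the allocation of variation can be computed as follows (r = 3 replications and 3 levels per factor; the figures obtained this way depend on the precision of the recorded observations and need not match the reported values exactly).

# Continues from the previous sketch: data, cell_means, mu, alpha, beta, gamma are already defined
r = data.shape[2]                      # replications per cell
a = alpha.size                         # levels of File Size
b = beta.size                          # levels of Time of the Day

SSY = np.sum(data ** 2)                # sum of squares of all observations
SSO = a * b * r * mu ** 2              # sum of squares of the overall mean
SSA = b * r * np.sum(alpha ** 2)       # variation due to File Size
SSB = a * r * np.sum(beta ** 2)        # variation due to Time of the Day
SSAB = r * np.sum(gamma ** 2)          # variation due to the interaction
SST = SSY - SSO                        # total variation
SSE = SST - SSA - SSB - SSAB           # unexplained (error) variation

for name, ss in [("File Size (A)", SSA), ("Time of the Day (B)", SSB),
                 ("Interaction (AB)", SSAB), ("Unexplained", SSE)]:
    print(name, "explains", round(100 * ss / SST, 2), "% of the variation")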

Conclusions:

1. From the above experiment, we can conclude the following:
a. File Size has the greatest impact on the Network Throughput amongst all the factors, explaining ~76.43%
of the variation.
b. Time of the Day does impact the Network Throughput, but not as significantly. We can say that the morning
is a much better time to download large files.
c. The unexplained (error) variation is a bit high at ~4.52%, which may be due to an unstable internet connection.
2. The Allocation of Variation method provides a significant amount of information about the factors that affect
the Network Throughput, but it is slightly more complex than the other methods.
