
Answers Lab1 R

Data-Intensive Computing (City University of Hong Kong)


Lab 1 – R Language
Name:
ID:
1. Download the “yearly_sales.csv” file from CANVAS
2. Open the csv file using a spreadsheet program. What is the csv file?

The csv file contains sales information for individual customers: the customer ID, total sales
amount, number of sale orders, and customer gender.

3. Open the program “RStudio”


4. Load the previously downloaded csv file into R with the command:
sales <- read.csv([your csv file path])
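A minimal sketch of what this load step does, using a few made-up rows instead of the real file. The column names sales_total and num_of_orders come from the later steps of this lab; cust_id, gender, and all the values are assumptions for illustration only.

```r
# Hypothetical sample rows; the real yearly_sales.csv is not reproduced here.
csv_text <- "cust_id,sales_total,num_of_orders,gender
100001,800.64,3,F
100002,217.53,3,F
100003,74.58,2,M
100004,498.60,3,M"

# read.csv() takes a file path, or an in-memory string via the `text` argument.
sales_demo <- read.csv(text = csv_text)

# Check the shape and the column types before doing anything else.
dim(sales_demo)
sapply(sales_demo, class)
```

With a real file, the path goes in quotes, and forward slashes work on every platform.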

5. Type “head(sales)”. What can you observe?

The first six rows of the customer sales data, shown with the column names.

6. Type “summary(sales)”. What can you observe?

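A summary of each column of the sales data; for numeric columns this is the minimum, first quartile, median, mean, third quartile, and maximum.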
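The two inspection commands above can be tried on a small stand-in data frame. This is only a sketch: the values below are invented, and only the column names are taken from the lab.

```r
# Hypothetical stand-in for the sales data (values are made up).
sales_demo <- data.frame(
  sales_total   = c(800.64, 217.53, 74.58, 498.60, 723.11, 69.43),
  num_of_orders = c(3L, 3L, 2L, 3L, 4L, 2L)
)

# head() shows the first six rows by default; n changes that.
first_rows <- head(sales_demo, n = 3)
print(first_rows)

# summary() on a numeric column returns six named statistics.
s <- summary(sales_demo$sales_total)
print(names(s))

# str() is a compact alternative that also shows each column's type.
str(sales_demo)
```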

7. Type the following command. What can you observe?


plot(sales$num_of_orders,sales$sales_total)

A scatterplot of total sales amount (y-axis) against the number of sale orders (x-axis).

8. Type the following commands. What have you done to the “sales” data?
sales$per_order <- sales$sales_total/sales$num_of_orders
head(sales)

A new column, per_order, has been added to the data: each customer's average sales amount per order.
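The division in this step is vectorised: R divides the two columns row by row. A small sketch with invented values shows the result, plus a check that is worth running on real data, since a customer with zero orders would produce Inf or NaN instead of a number.

```r
# Hypothetical stand-in for the sales data (values are made up).
sales_demo <- data.frame(
  sales_total   = c(800.64, 217.53, 74.58),
  num_of_orders = c(3L, 3L, 2L)
)

# Element-wise division: one result per row.
sales_demo$per_order <- sales_demo$sales_total / sales_demo$num_of_orders
print(sales_demo)

# TRUE when every row produced an ordinary number (no Inf, NaN, or NA).
all_finite <- all(is.finite(sales_demo$per_order))
print(all_finite)
```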

9. Type the following command. What have you done?


write.table(sales,"sales_modified.txt", sep="\t", row.names=FALSE)

A tab-delimited text file, "sales_modified.txt", containing the modified sales data (without row names) has been written to the current working directory.
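One way to confirm that the export worked is to read the file back and compare it with the original. The sketch below assumes an invented data frame and writes to a temporary directory rather than the working directory, so it leaves no files behind in the lab folder.

```r
# Hypothetical stand-in for the modified sales data (values are made up).
sales_demo <- data.frame(
  sales_total   = c(800.64, 217.53, 74.58),
  num_of_orders = c(3L, 3L, 2L)
)

# Write a tab-delimited file; tempdir() is used here only to keep the demo tidy.
out_path <- file.path(tempdir(), "sales_modified.txt")
write.table(sales_demo, out_path, sep = "\t", row.names = FALSE)

# read.delim() is read.table() preset for tab-delimited files with a header.
round_trip <- read.delim(out_path)

# TRUE when the data read back matches what was written.
same <- isTRUE(all.equal(round_trip, sales_demo))
print(same)
```

In the lab itself, getwd() shows which directory the file was written to.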

10. Type the following commands. What have you done?


jpeg(file="sales_hist.jpeg")
hist(sales$num_of_orders)
dev.off()

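A histogram of the number of sale orders has been saved as the image file "sales_hist.jpeg" in the current working directory. jpeg() opens the file as the plotting device, hist() draws into it, and dev.off() closes the device and finishes writing the file.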
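To see the numbers behind the picture, hist() can be asked to compute the bins without drawing anything. This is a sketch on invented order counts; the explicit breaks are an assumption chosen so that each bin covers one order count, whereas the lab command lets R choose the breaks.

```r
# Hypothetical order counts (made up for illustration).
orders <- c(3, 3, 2, 3, 4, 2, 1, 1, 2, 5)

# plot = FALSE returns the histogram object instead of drawing it.
# With breaks 0:5 the bins are (0,1], (1,2], (2,3], (3,4], (4,5].
h <- hist(orders, breaks = 0:5, plot = FALSE)

print(h$breaks)  # bin edges
print(h$counts)  # number of observations in each bin
```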

11. Type the following commands. What have you done?


x <- sales$sales_total
y <- sales$num_of_orders

The total sales amount column has been stored as the vector x, and the number of sale orders column
has been stored as the vector y.

12. Type the commands in the leftmost column and fill in the following table:

R Command    Return Value     Semantic Meaning (hint: use the help panel in RStudio)
cor(x,y)     0.7508015        Computes the correlation coefficient (Pearson by default) between x and y
cov(x,y)     345.2111         Computes the sample covariance between x and y
IQR(x)       215.21           Computes the interquartile range of x (third quartile minus first quartile)
mean(x)      249.4557         Computes the arithmetic mean of x
median(x)    151.65           Computes the median of x
range(x)     30.02 7606.09    Returns the minimum and maximum of x
sd(x)        319.0508         Computes the sample standard deviation of x
var(x)       101793.4         Computes the sample variance of x
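Several of the functions in the table are related to one another, which gives a quick sanity check on the answers. For instance, 319.0508 squared is about 101793.4, matching var(x). The sketch below verifies these relationships on invented vectors, not on the lab data.

```r
# Hypothetical vectors (values are made up for illustration).
x <- c(800.64, 217.53, 74.58, 498.60, 723.11, 69.43)
y <- c(3, 3, 2, 3, 4, 2)

# Variance is the square of the standard deviation.
check_var <- all.equal(var(x), sd(x)^2)

# Correlation is covariance scaled by both standard deviations.
check_cor <- all.equal(cor(x, y), cov(x, y) / (sd(x) * sd(y)))

# Sample covariance uses the n - 1 denominator.
check_cov <- all.equal(
  cov(x, y),
  sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)
)

# IQR is the distance between the first and third quartiles.
check_iqr <- all.equal(IQR(x), unname(diff(quantile(x, c(0.25, 0.75)))))

# range() is simply the minimum followed by the maximum.
check_range <- all.equal(range(x), c(min(x), max(x)))

print(c(check_var, check_cor, check_cov, check_iqr, check_range))
```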

13. Apply the R knowledge you have learned in the previous steps to the dataset “zipIncome.csv” in
CANVAS and state your data analytics insights in fewer than 50 words.

(Open Answers)

14. This is the end of the lab; please hand in your lab sheet to the tutor or person-in-charge so that your lab participation can be recorded.
Once your lab sheet is accepted, please feel free to leave quietly.
