
Week 7

a) Reading different types of data sets (.txt, .csv) from the Web or disk and
writing them to a file in a specific disk location.

Aim: To read different types of data sets (.txt, .csv) from the Web or disk and write them to
a file in a specific disk location.

Description:

Reads a file in table format and creates a data frame from it, with cases corresponding to lines
and variables to fields in the file.

Usage:

read.table(file, header = FALSE, sep = "", quote = "\"'",dec = ".", row.names, col.names,
na.strings = "NA", colClasses = NA, nrows = -1, encoding = "unknown", text)

read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", ...)

read.csv2(file, header = TRUE, sep = ";", quote = "\"", dec = ",",...)

Arguments:

file        the name of the file which the data are to be read from. Each row of
            the table appears as one line of the file.
header      a logical value indicating whether the file contains the names of the
            variables as its first line.
sep         the field separator character. Values on each line of the file are
            separated by this character.
quote       the set of quoting characters. To disable quoting altogether, use
            quote = "".
dec         the character used in the file for decimal points.
row.names   a vector of row names.
col.names   a vector of optional names for the variables.
nrows       integer: the maximum number of rows to read in.
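
The arguments above can be tried without any file on disk by passing the data through the text argument listed in the usage. The following is a minimal sketch; the inline data and the variable names (txt, people, first_two, people_csv) are made up for illustration.

```r
# Hypothetical inline data standing in for a file on disk.
txt <- "name age gender
rani 21 f
raju 21 m
latha 21 f"

# header = TRUE takes the first line as the column names;
# sep = "" (the default) splits fields on any run of white space.
people <- read.table(text = txt, header = TRUE)
str(people)

# nrows limits how many data rows are read.
first_two <- read.table(text = txt, header = TRUE, nrows = 2)

# read.csv is read.table with header = TRUE and sep = "," preset.
csv_txt <- "name,age\nrani,21\nraju,21"
people_csv <- read.csv(text = csv_txt)
print(people_csv)
```

Replacing text = txt with a file path or a URL gives the usual call for data stored on disk or on the Web.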

write.table:

write.table prints its required argument x (after converting it to a data frame if it is neither
a data frame nor a matrix) to a file or connection.

Usage:

write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ", eol = "\n", na = "NA",
dec = ".", row.names = TRUE, col.names = TRUE, fileEncoding = "")

write.csv(...)

write.csv2(...)

Arguments:

x           the object to be written, preferably a matrix or data frame.
file        either a character string naming a file or a connection open for
            writing. "" indicates output to the console.
append      logical. Only relevant if file is a character string. If TRUE, the
            output is appended to the file. If FALSE, any existing file of the
            name is destroyed.
quote       a logical value or a numeric vector. If TRUE, any character or factor
            columns will be surrounded by double quotes. If FALSE, nothing is
            quoted.
sep         the field separator string. Values within each row of x are separated
            by this string.
eol         the character(s) to print at the end of each line (row).
dec         the string to use for decimal points in numeric or complex columns:
            must be a single character.
row.names   either a logical value indicating whether the row names of x are to
            be written along with x, or a character vector of row names to be
            written.
col.names   either a logical value indicating whether the column names of x are
            to be written along with x, or a character vector of column names to
            be written. See the 'CSV files' section of the write.table help page
            for the meaning of col.names = NA.
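
As a small illustration of how sep, quote and row.names shape the written file, the sketch below writes a made-up data frame to a temporary file and reads it back. The data frame df and the names out, lines_written and back are assumptions for this example only.

```r
# Hypothetical data frame used only for this illustration.
df <- data.frame(name = c("rani", "raju"), age = c(21, 21))

# Write to a temporary file so nothing on disk is overwritten.
out <- tempfile(fileext = ".csv")

# sep = "," gives comma-separated fields, quote = FALSE leaves strings
# unquoted, and row.names = FALSE drops the 1, 2, ... row labels.
write.table(df, file = out, sep = ",", quote = FALSE, row.names = FALSE)

# Inspect the raw lines that were written.
lines_written <- readLines(out)
print(lines_written)

# Read the file back with the matching separator.
back <- read.csv(out)
print(back)
```

Using the same separator for writing and reading is what makes the round trip return the original columns.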

Source code:

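> # Read a white-space-separated text file from disk.
> rt <- read.table("c:/Users/Dell/Documents/sample.txt", header = TRUE)
> print(rt)
> # Write it back out. The default separator is a single space and row names are included.
> write.table(rt, "c:/Users/Dell/Documents/s2.txt", quote = FALSE)
> # Read a tab-delimited file directly from the Web.
> my_data <- read.delim("http://www.sthda.com/upload/boxplot_format.txt")
> head(my_data)
> # s2.txt is space-separated, so with sep = ',' each line is read as a single column.
> rt2 <- read.table("c:/Users/Dell/Documents/s2.txt", header = TRUE, sep = ',')
> print(rt2)
> head(rt)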

Output:

   name age gender company salary
1  rani  21      f     ibm    10k
2  raju  21      m     ibm    10k
3 latha  21      f     tcs    20k
4 nandu  20      f     tcs    20k
5  shiv  20      m    info    18k

   Nom variable Group
1 IND1       10     A
2 IND2        7     A
3 IND3       20     A
4 IND4       14     A
5 IND5       14     A
6 IND6       12     A

  name.age.gender.company.salary
1            1 rani 21 f ibm 10k
2            2 raju 21 m ibm 10k
3           3 latha 21 f tcs 20k
4           4 nandu 20 f tcs 20k
5           5 shiv 20 m info 18k

   name age gender company salary
1  rani  21      f     ibm    10k
2  raju  21      m     ibm    10k
3 latha  21      f     tcs    20k
4 nandu  20      f     tcs    20k
5  shiv  20      m    info    18k
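
The third output block above has a single column named name.age.gender.company.salary because the file was written with spaces but read back with sep = ','. The sketch below reproduces the effect on a small made-up data frame and a temporary file; the names f, wrong and right are assumptions for this example.

```r
# Hypothetical two-row data frame standing in for rt.
rt <- data.frame(name = c("rani", "raju"), age = c(21, 21))

# Write with the defaults: space-separated fields, row names included.
f <- tempfile(fileext = ".txt")
write.table(rt, f, quote = FALSE)

# Reading with sep = "," finds no commas, so every line becomes one field
# and the header is turned into a single syntactic name.
wrong <- read.table(f, header = TRUE, sep = ",")
print(names(wrong))

# Reading with the default separator splits on white space. The header has
# one field fewer than the data lines, so the first field is used as row names.
right <- read.table(f, header = TRUE)
print(right)
```

Matching the separator used by write.table (or passing sep = "," to both calls) avoids the collapsed column.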

b) Reading Excel data sheet in R.

Aim: To read Excel data sheet in R.

Description:

read_excel() imports an Excel file (.xls or .xlsx) into R. It belongs to the readxl package,
which must be installed and then loaded with library(readxl) before the function can be called.

Usage:

read_excel(path, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL, na = "",
n_max = Inf, guess_max = min(1000, n_max), .name_repair = "unique")

read_xls(path, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL, na = "",
trim_ws = TRUE, n_max = Inf, guess_max = min(1000, n_max), .name_repair = "unique")

read_xlsx(path, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL, na = "",
trim_ws = TRUE, n_max = Inf, guess_max = min(1000, n_max), .name_repair = "unique")

Arguments:

path           Path to the xls/xlsx file.
sheet          Sheet to read. Either a string (the name of a sheet), or an integer
               (the position of the sheet).
range          A cell range to read from, as described in cell-specification.
col_names      TRUE to use the first row as column names, FALSE to get default
               names, or a character vector giving a name for each column.
col_types      Either NULL to guess all from the spreadsheet or a character vector
               containing one entry per column from these options: "skip",
               "guess", "logical", "numeric", "date", "text" or "list". If exactly
               one col_type is specified, it will be recycled.
na             Character vector of strings to interpret as missing values. By
               default, readxl treats blank cells as missing data.
n_max          Maximum number of data rows to read. Trailing empty rows are
               automatically skipped, so this is an upper bound on the number of
               rows in the returned tibble.
guess_max      Maximum number of data rows to use for guessing column types.
.name_repair   Handling of column names. Passed along to tibble::as_tibble().
               readxl's default is .name_repair = "unique", which ensures column
               names are not empty and are unique.
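
A minimal sketch of read_excel() in use is given below. To stay self-contained it reads the example workbook shipped with readxl (located with readxl_example()) instead of a workbook of your own, and it is skipped when the package is not installed. The variable names path, sheets, first_rows and as_text are assumptions for this example.

```r
# Run only if readxl is available; install.packages("readxl") otherwise.
if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook bundled with the package.
  path <- readxl::readxl_example("datasets.xlsx")

  # List the sheet names in the workbook.
  sheets <- readxl::excel_sheets(path)
  print(sheets)

  # Read at most five data rows from the first sheet;
  # col_names = TRUE uses the first row as column names.
  first_rows <- readxl::read_excel(path, sheet = 1, col_names = TRUE, n_max = 5)
  print(first_rows)

  # A single col_types value is recycled to every column.
  as_text <- readxl::read_excel(path, sheet = 1, col_types = "text", n_max = 5)
  print(as_text)
}
```

For your own data, replace path with the location of the .xls or .xlsx file and sheet with the sheet name or position.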

Source code:

> # Reading Excel data sheet in R.
> install.packages("readxl")
> # Load the library into R workspace.
> library("readxl")
> # The calls below read the data from a .csv file rather than from an .xls/.xlsx workbook.
> data1 <- read.csv(file.choose(), header = TRUE)
> data1
> data2 <- read.table(file.choose(), header = TRUE, sep = ",")
> data2
> d <- read.csv("E:/R Programming 2023/products.csv")
> d

Output:
        PRODUCT PRICE  X
1                  NA NA
2 Refriegerator  1200 NA
3          oven   750 NA
4    Dishwasher   600 NA
5    Cofeemaker   300 NA

c) Reading XML data set into R.

Aim: To read XML data set into R.

Description:

XML (Extensible Markup Language) is a text-based file format used to store and share data on
the World Wide Web, intranets, and elsewhere. Like HTML, it contains markup tags. Unlike HTML,
where the markup tags describe the structure of the page, in XML the markup tags describe the
meaning of the data contained in the file.
You can read an XML file in R using the "XML" package. This package can be installed using
the following command.
install.packages("XML")

Input Data
Create an XML file by copying the XML listing shown under Output below into a text editor such
as Notepad. Save the file with a .xml extension, choosing the file type as All Files (*.*).

Source code:

> install.packages("XML")
> # Load the package required to read XML files.
> library("XML")
> # Also load the other required package.
> library("methods")
> # Give the input file name to the function.
> result <- xmlParse(file = "C:/Users/Dell/Documents/input.xml")
> # Print the result.
> print(result)
> # Convert the same file to a data frame and print it.
> d <- xmlToDataFrame("C:/Users/Dell/Documents/input.xml")
> print(d)

Output:

<?xml version="1.0"?>
<RECORDS>
<EMPLOYEE>
<ID>1</ID>
<NAME>Rick</NAME>
<SALARY>623.3</SALARY>
<STARTDATE>1/1/2012</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>2</ID>
<NAME>Dan</NAME>
<SALARY>515.2</SALARY>
<STARTDATE>9/23/2013</STARTDATE>
<DEPT>Operations</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>3</ID>
<NAME>Michelle</NAME>
<SALARY>611</SALARY>
<STARTDATE>11/15/2014</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>4</ID>
<NAME>Ryan</NAME>
<SALARY>729</SALARY>
<STARTDATE>5/11/2014</STARTDATE>
<DEPT>HR</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>5</ID>
<NAME>Gary</NAME>
<SALARY>843.25</SALARY>
<STARTDATE>3/27/2015</STARTDATE>
<DEPT>Finance</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>6</ID>
<NAME>Nina</NAME>
<SALARY>578</SALARY>
<STARTDATE>5/21/2013</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>7</ID>
<NAME>Simon</NAME>
<SALARY>632.8</SALARY>
<STARTDATE>7/30/2013</STARTDATE>
<DEPT>Operations</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>8</ID>
<NAME>Guru</NAME>
<SALARY>722.5</SALARY>
<STARTDATE>6/17/2014</STARTDATE>
<DEPT>Finance</DEPT>
</EMPLOYEE>
</RECORDS>
Output:

  ID     NAME SALARY  STARTDATE       DEPT
1  1     Rick  623.3   1/1/2012         IT
2  2      Dan  515.2  9/23/2013 Operations
3  3 Michelle    611 11/15/2014         IT
4  4     Ryan    729  5/11/2014         HR
5  5     Gary 843.25  3/27/2015    Finance
6  6     Nina    578  5/21/2013         IT
7  7    Simon  632.8  7/30/2013 Operations
8  8     Guru  722.5  6/17/2014    Finance
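
The values returned by xmlToDataFrame() are typically text, so numeric fields such as SALARY usually need converting before they can be used in calculations. The sketch below parses a two-record excerpt of the data above from a string (asText = TRUE) rather than from input.xml, and is skipped when the XML package is not installed. The names xml_txt, doc, emp and high are assumptions for this example.

```r
# Run only if the XML package is available; install.packages("XML") otherwise.
if (requireNamespace("XML", quietly = TRUE)) {
  # Two records taken from the employee data shown above.
  xml_txt <- "<RECORDS>
  <EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY></EMPLOYEE>
  <EMPLOYEE><ID>2</ID><NAME>Dan</NAME><SALARY>515.2</SALARY></EMPLOYEE>
</RECORDS>"

  # asText = TRUE tells xmlParse that the argument is XML content, not a file name.
  doc <- XML::xmlParse(xml_txt, asText = TRUE)

  # One row per EMPLOYEE node, one column per child tag.
  emp <- XML::xmlToDataFrame(doc)

  # Convert the text columns to numbers (as.character guards against factors).
  emp$ID <- as.integer(as.character(emp$ID))
  emp$SALARY <- as.numeric(as.character(emp$SALARY))

  # Numeric comparison now works as expected.
  high <- emp[emp$SALARY > 600, ]
  print(high)
}
```

The same conversion steps apply to the data frame d read from input.xml.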
