Week 7

a) Reading different types of data sets (.txt, .csv) from the Web or disk and
writing them to a file in a specific disk location.
Aim: To read different types of data sets (.txt, .csv) from the Web or disk and write them to a
file in a specific disk location.
Description:
read.table reads a file in table format and creates a data frame from it, with cases
corresponding to lines and variables to fields in the file.
Usage:
read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".", row.names, col.names,
na.strings = "NA", colClasses = NA, nrows = -1, encoding = "unknown", text)
read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", ...)
read.csv2(file, header = TRUE, sep = ";", quote = "\"", dec = ",", ...)
Arguments:
file       the name of the file which the data are to be read from. Each row of the
           table appears as one line of the file.
header     a logical value indicating whether the file contains the names of the
           variables as its first line.
sep        the field separator character. Values on each line of the file are
           separated by this character.
quote      the set of quoting characters. To disable quoting altogether, use
           quote = "".
dec        the character used in the file for decimal points.
row.names  a vector of row names.
col.names  a vector of optional names for the variables.
nrows      integer: the maximum number of rows to read in.
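The arguments above can be tried out without any file on disk. The following is a minimal sketch, assuming made-up sample data written to a temporary file; the names tmp, df, first_two and renamed are illustrative and not part of the exercise.

```r
# Hypothetical sample data written to a temporary file, so the example
# does not depend on any particular path on disk.
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,age,salary",
             "rani,21,10.5",
             "raju,21,12.0",
             "latha,20,20.25"), tmp)

# read.csv defaults to header = TRUE and sep = ",".
df <- read.csv(tmp)
print(df)

# read.table needs header and sep spelled out for the same file;
# nrows limits how many data rows are read.
first_two <- read.table(tmp, header = TRUE, sep = ",", nrows = 2)
print(first_two)

# col.names replaces the names taken from the header line.
renamed <- read.table(tmp, header = TRUE, sep = ",",
                      col.names = c("employee", "years", "pay"))
print(names(renamed))

unlink(tmp)
```

Writing to tempfile() keeps the sketch independent of the c:/Users/... paths used in the source code of this exercise.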
write.table:
write.table prints its required argument x (after converting it to a data frame if it is not one
nor a matrix) to a file or connection.
Usage:
write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ", eol = "\n", na = "NA",
dec = ".", row.names = TRUE, col.names = TRUE, fileEncoding = "")
write.csv(...)
write.csv2(...)
Arguments:
x          the object to be written, preferably a matrix or data frame.
file       either a character string naming a file or a connection open for writing.
           "" indicates output to the console.
append     logical. Only relevant if file is a character string. If TRUE, the output
           is appended to the file. If FALSE, any existing file of the name is
           destroyed.
quote      a logical value or a numeric vector. If TRUE, any character or factor
           columns will be surrounded by double quotes. If FALSE, nothing is quoted.
sep        the field separator string. Values within each row of x are separated by
           this string.
eol        the character(s) to print at the end of each line (row).
dec        the string to use for decimal points in numeric or complex columns: must
           be a single character.
row.names  either a logical value indicating whether the row names of x are to be
           written along with x, or a character vector of row names to be written.
col.names  either a logical value indicating whether the column names of x are to be
           written along with x, or a character vector of column names to be
           written. See the section on ‘CSV files’ for the meaning of col.names = NA.
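To see how quote, sep, row.names, col.names and append shape the written file, the sketch below writes a small data frame to temporary files and reads the raw lines back. The data frame staff and the file names are hypothetical and used only for illustration.

```r
# Hypothetical data frame used only for this illustration.
staff <- data.frame(name = c("rani", "raju"),
                    age = c(21, 21),
                    company = c("ibm", "ibm"))

txt_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")

# Tab-separated text, no quotes around strings, no row-name column.
write.table(staff, file = txt_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# write.csv fixes sep = "," and dec = "."; row.names can still be set.
write.csv(staff, file = csv_file, row.names = FALSE)

# Inspect the raw lines that were written.
txt_lines <- readLines(txt_file)
csv_lines <- readLines(csv_file)
print(txt_lines)
print(csv_lines)

# append = TRUE adds rows to an existing file; col.names = FALSE avoids
# writing the header a second time.
write.table(staff, file = txt_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE, append = TRUE)
appended_lines <- readLines(txt_file)
print(length(appended_lines))
```

Reading the files with readLines rather than read.table shows exactly which characters each argument put on disk.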
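Source code:
> # Read a white-space-separated text file from disk; the first line holds the column names.
> rt <- read.table("c:/Users/Dell/Documents/sample.txt", header = TRUE)
> print(rt)
> # Write the data frame to a new file (space-separated, strings unquoted).
> write.table(rt, "c:/Users/Dell/Documents/s2.txt", quote = FALSE)
> # Read a tab-delimited file directly from the Web.
> my_data <- read.delim("http://www.sthda.com/upload/boxplot_format.txt")
> head(my_data)
> # s2.txt is space-separated, so sep = ',' puts each whole line into a single column.
> rt2 <- read.table("c:/Users/Dell/Documents/s2.txt", header = TRUE, sep = ',')
> print(rt2)
> head(rt)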
Output:
   name age gender company salary
1  rani  21      f     ibm    10k
2  raju  21      m     ibm    10k
3 latha  21      f     tcs    20k
4 nandu  20      f     tcs    20k
5  shiv  20      m    info    18k
   Nom variable Group
1 IND1       10     A
2 IND2        7     A
3 IND3       20     A
4 IND4       14     A
5 IND5       14     A
6 IND6       12     A
  name.age.gender.company.salary
1            1 rani 21 f ibm 10k
2            2 raju 21 m ibm 10k
3           3 latha 21 f tcs 20k
4           4 nandu 20 f tcs 20k
5           5 shiv 20 m info 18k
   name age gender company salary
1  rani  21      f     ibm    10k
2  raju  21      m     ibm    10k
3 latha  21      f     tcs    20k
4 nandu  20      f     tcs    20k
5  shiv  20      m    info    18k
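The third table in the output above has a single column named name.age.gender.company.salary. That is what read.table returns when sep does not match the separator actually used in the file. The sketch below reproduces the effect on a small made-up data frame (emp and the temporary file are assumptions for illustration) and then reads the same file with the matching default separator.

```r
# Hypothetical data, mirroring the shape of the table above.
emp <- data.frame(name = c("rani", "raju"), age = c(21, 21))

f <- tempfile(fileext = ".txt")
# Default sep = " " and row.names = TRUE, as in the source code above.
write.table(emp, f, quote = FALSE)

# Wrong separator: the file has no commas, so each line is one field.
wrong <- read.table(f, header = TRUE, sep = ",")
print(wrong)

# Matching separator: the default sep = "" splits on white space, and the
# first column becomes the row names because the header line has one
# fewer field than the data lines.
right <- read.table(f, header = TRUE)
print(right)

unlink(f)
```

Checking ncol() or names() right after a read is a quick way to catch a separator mismatch like this.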
b) Reading an Excel data sheet in R.
Aim: To read an Excel data sheet in R.
Description:
The read_excel() function imports an Excel file (.xls or .xlsx) into R. It is available only
after the readxl package has been installed and loaded.
Usage:
read_excel(path, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL,
na = "", n_max = Inf, guess_max = min(1000, n_max), .name_repair = "unique")
read_xls(path, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL,
na = "", trim_ws = TRUE, n_max = Inf, guess_max = min(1000, n_max), .name_repair = "unique")
read_xlsx(path, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL,
na = "", trim_ws = TRUE, n_max = Inf, guess_max = min(1000, n_max), .name_repair = "unique")
Arguments:
path          Path to the xls/xlsx file.
sheet         Sheet to read. Either a string (the name of a sheet), or an integer
              (the position of the sheet).
range         A cell range to read from, as described in cell-specification.
col_names     TRUE to use the first row as column names, FALSE to get default
              names, or a character vector giving a name for each column.
col_types     Either NULL to guess all from the spreadsheet or a character vector
              containing one entry per column from these options: "skip", "guess",
              "logical", "numeric", "date", "text" or "list". If exactly one
              col_type is specified, it will be recycled.
na            Character vector of strings to interpret as missing values. By
              default, readxl treats blank cells as missing data.
n_max         Maximum number of data rows to read. Trailing empty rows are
              automatically skipped, so this is an upper bound on the number of
              rows in the returned tibble.
guess_max     Maximum number of data rows to use for guessing column types.
.name_repair  Handling of column names. Passed along to tibble::as_tibble().
              readxl's default is .name_repair = "unique", which ensures column
              names are not empty and are unique.
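As a rough illustration of path, sheet, n_max and range, the sketch below reads from the example workbook datasets.xlsx that ships with readxl. It assumes readxl is installed and skips itself otherwise; the workbook is not a file from this exercise, and the names sheets, cars and corner are illustrative.

```r
# Assumes the readxl package is installed; the block is skipped otherwise.
if (requireNamespace("readxl", quietly = TRUE)) {
  # Example workbook shipped with readxl (not a file from this exercise).
  path <- readxl::readxl_example("datasets.xlsx")

  # List the sheet names, then read one sheet by name.
  sheets <- readxl::excel_sheets(path)
  print(sheets)

  # n_max caps the number of data rows; the result is a tibble.
  cars <- readxl::read_excel(path, sheet = "mtcars", n_max = 5)
  print(cars)

  # range restricts the read to a rectangle of cells; the first row of
  # the rectangle is used as column names.
  corner <- readxl::read_excel(path, sheet = "mtcars", range = "A1:C4")
  print(corner)
}
```

To read a workbook of your own, replace path with the location of the .xls or .xlsx file.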
Source code:
> # Reading Excel data sheet in R.
> install.packages("readxl")
> # Load the library into R workspace.
> library("readxl")
> # The calls below read CSV files with base R functions; file.choose() opens a file picker.
> data1 <- read.csv(file.choose(), header = TRUE)
> data1
> data2 <- read.table(file.choose(), header = TRUE, sep = ",")
> data2
> d <- read.csv("E:/R Programming 2023/products.csv")
> d
Output:
        PRODUCT PRICE  X
1                  NA NA
2 Refriegerator  1200 NA
3          oven   750 NA
4    Dishwasher   600 NA
5    Cofeemaker   300 NA
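The all-NA column X and the empty first row in the output above are the kind of result read.csv gives when every line of a CSV file ends with a trailing comma and the first data line is empty. The contents of products.csv are not shown in this exercise, so the text below is an assumed reconstruction for illustration, not the original file.

```r
# Assumed CSV contents (hypothetical): each line ends with a trailing comma
# and the first data line holds only commas.
csv_text <- "PRODUCT,PRICE,
,,
oven,750,
Dishwasher,600,"

d <- read.csv(text = csv_text)
print(d)

# The unnamed third column is called X and contains only NA,
# so columns with no observed values can be dropped.
d_clean <- d[, colSums(!is.na(d)) > 0]
print(d_clean)
```

If the real file has a different cause for the extra column, the same colSums() filter still removes any column that is entirely NA.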
c) Reading an XML data set into R.
Aim: To read an XML data set into R.
Description:
XML (Extensible Markup Language) is a plain-text file format for storing and sharing
structured data on the World Wide Web, intranets, and elsewhere. Like HTML, it contains
markup tags. Unlike HTML, where the markup tags describe the structure of the page, in XML
the markup tags describe the meaning of the data contained in the file.
You can read an XML file in R using the "XML" package. This package can be installed using
the following command.
install.packages("XML")
Input Data
Create an XML file by copying the XML data (printed under Output below) into a text editor
such as Notepad. Save the file with a .xml extension, choosing the file type "All files (*.*)".
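The input file can also be created from R instead of a text editor. This is a minimal sketch, assuming a temporary directory in place of the Documents folder and only the first two EMPLOYEE records; xml_lines and xml_file are illustrative names.

```r
# Two records in the RECORDS/EMPLOYEE layout used in this exercise.
xml_lines <- c(
  "<RECORDS>",
  "  <EMPLOYEE>",
  "    <ID>1</ID>",
  "    <NAME>Rick</NAME>",
  "    <SALARY>623.3</SALARY>",
  "    <STARTDATE>1/1/2012</STARTDATE>",
  "    <DEPT>IT</DEPT>",
  "  </EMPLOYEE>",
  "  <EMPLOYEE>",
  "    <ID>2</ID>",
  "    <NAME>Dan</NAME>",
  "    <SALARY>515.2</SALARY>",
  "    <STARTDATE>9/23/2013</STARTDATE>",
  "    <DEPT>Operations</DEPT>",
  "  </EMPLOYEE>",
  "</RECORDS>"
)

# A temporary location stands in for the real destination of input.xml.
xml_file <- file.path(tempdir(), "input.xml")
writeLines(xml_lines, xml_file)

# Confirm that the file was written.
print(file.exists(xml_file))
```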
Source code:
> install.packages("XML")
> # Load the package required to read XML files.
> library("XML")
Warning message:
package ‘XML’ was built under R version 4.2.3
> # Also load the other required package.
> library("methods")
> # Give the input file name to the function.
> result <- xmlParse(file = "C:/Users/Dell/Documents/input.xml")
> # Print the result.
> print(result)
> # Convert the XML records to a data frame and print it.
> d <- xmlToDataFrame("C:/Users/Dell/Documents/input.xml")
> print(d)
Output:
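<?xml version="1.0"?>
<RECORDS>
  <EMPLOYEE>
    <ID>1</ID>
    <NAME>Rick</NAME>
    <SALARY>623.3</SALARY>
    <STARTDATE>1/1/2012</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>2</ID>
    <NAME>Dan</NAME>
    <SALARY>515.2</SALARY>
    <STARTDATE>9/23/2013</STARTDATE>
    <DEPT>Operations</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>3</ID>
    <NAME>Michelle</NAME>
    <SALARY>611</SALARY>
    <STARTDATE>11/15/2014</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>4</ID>
    <NAME>Ryan</NAME>
    <SALARY>729</SALARY>
    <STARTDATE>5/11/2014</STARTDATE>
    <DEPT>HR</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>5</ID>
    <NAME>Gary</NAME>
    <SALARY>843.25</SALARY>
    <STARTDATE>3/27/2015</STARTDATE>
    <DEPT>Finance</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>6</ID>
    <NAME>Nina</NAME>
    <SALARY>578</SALARY>
    <STARTDATE>5/21/2013</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>7</ID>
    <NAME>Simon</NAME>
    <SALARY>632.8</SALARY>
    <STARTDATE>7/30/2013</STARTDATE>
    <DEPT>Operations</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>8</ID>
    <NAME>Guru</NAME>
    <SALARY>722.5</SALARY>
    <STARTDATE>6/17/2014</STARTDATE>
    <DEPT>Finance</DEPT>
  </EMPLOYEE>
</RECORDS>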
Output:
  ID     NAME SALARY  STARTDATE       DEPT
1  1     Rick  623.3   1/1/2012         IT
2  2      Dan  515.2  9/23/2013 Operations
3  3 Michelle    611 11/15/2014         IT
4  4     Ryan    729  5/11/2014         HR
5  5     Gary 843.25  3/27/2015    Finance
6  6     Nina    578  5/21/2013         IT
7  7    Simon  632.8  7/30/2013 Operations
8  8     Guru  722.5  6/17/2014    Finance
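xmlToDataFrame returns every column as text, so numeric fields such as SALARY usually need converting before any calculation. The sketch below assumes the XML package is installed (it skips itself otherwise) and parses a shortened, inline copy of the records rather than input.xml; xml_text, emp, it_staff and names_only are illustrative names.

```r
# Assumes the XML package is installed; the block is skipped otherwise.
if (requireNamespace("XML", quietly = TRUE)) {
  # Shortened inline copy of the records (three employees, four fields).
  xml_text <- "<RECORDS>
  <EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY><DEPT>IT</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>2</ID><NAME>Dan</NAME><SALARY>515.2</SALARY><DEPT>Operations</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>3</ID><NAME>Michelle</NAME><SALARY>611</SALARY><DEPT>IT</DEPT></EMPLOYEE>
</RECORDS>"

  # asText = TRUE tells xmlParse that the argument is XML content, not a path.
  doc <- XML::xmlParse(xml_text, asText = TRUE)
  emp <- XML::xmlToDataFrame(doc, stringsAsFactors = FALSE)

  # Every column comes back as text, so numeric fields need converting.
  emp$SALARY <- as.numeric(emp$SALARY)
  emp$ID <- as.integer(emp$ID)

  # Ordinary data-frame operations then apply.
  it_staff <- emp[emp$DEPT == "IT", ]
  print(it_staff)

  # XPath can pull out single fields without building a data frame.
  names_only <- XML::xpathSApply(doc, "//EMPLOYEE/NAME", XML::xmlValue)
  print(names_only)
}
```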