Programming and scripting

3. Loading and saving data in R

Create a folder ex3 in your folder on the local disk. Open RStudio and create a new R script inside this folder. Remember to save your script from time to time during the class! At the end of the exercises, save the script and the environment image (any additional things to save will be mentioned in the instructions).
1. Read CSV (comma-separated values) files
CSV files are one of the most common formats for saving data. These files can contain numeric, character and other data types, but they are just plain text files that can be opened in any text editor. To read this type of data you can use the read.csv() function. Check the read.csv() help page.

Programming and scripting 3. Loading and saving data in R, author: Wojciech Ciężkowski
You will see that the help page for the read.table() function opens: read.csv() is a wrapper[1] for read.table(). This means that it is the same function, but with some arguments set to specific default values. For CSV files the default column separator is a comma (sep = ","), the decimal delimiter is a period (dec = "."), the first row is treated as column names (header = TRUE), and a few less important arguments are also preset. For both functions you need to specify file: a file name (if the file is in your working directory), a path (to read a file from the local disk) or a URL (to read a file from a website). These instructions are based on Jared P. Lander's book[2]. Hence, some data from his website[3] will be used in this (and other) exercises.
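Since read.csv() only presets some of the read.table() arguments, both calls should return the same data frame when the file really is comma-separated. The minimal offline sketch below checks this on a small made-up CSV passed through the text argument (instead of a file name or URL); the data are for illustration only.

```r
# A tiny made-up CSV stored as a string instead of a file
csv_text = "Round,Tomato,Price
1,Simpson SM,3.99
1,Tuttorosso (blue),2.99"

# read.table() needs the header and the separator spelled out ...
via_table = read.table(text = csv_text, header = TRUE, sep = ",")

# ... while read.csv() uses header = TRUE and sep = "," by default
via_csv = read.csv(text = csv_text)

identical(via_table, via_csv)
```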
> theUrl = "http://www.jaredlander.com/data/TomatoFirst.csv"
> tomato = read.table(file = theUrl, header = TRUE, sep = ",")
> tomato2 = read.csv(file = theUrl)
> head(tomato)
Round Tomato Price Source Sweet Acid Color Texture Overall Avg.of.Totals Total.of.Avg
1 1 Simpson SM 3.99 Whole Foods 2.8 2.8 3.7 3.4 3.4 16.1 16.1
2 1 Tuttorosso (blue) 2.99 Pioneer 3.3 2.8 3.4 3.0 2.9 15.3 15.3
3 1 Tuttorosso (green) 0.99 Pioneer 2.8 2.6 3.3 2.8 2.9 14.3 14.3
4 1 La Fede SM DOP 3.99 Shop Rite 2.6 2.8 3.0 2.3 2.8 13.4 13.4
5 2 Cento SM DOP 5.49 D Agostino 3.3 3.1 2.9 2.8 3.1 14.4 15.2
6 2 Cento Organic 4.99 D Agostino 3.2 2.9 2.9 3.1 2.9 15.5 15.1
> head(tomato2)
Round Tomato Price Source Sweet Acid Color Texture Overall Avg.of.Totals Total.of.Avg
1 1 Simpson SM 3.99 Whole Foods 2.8 2.8 3.7 3.4 3.4 16.1 16.1
2 1 Tuttorosso (blue) 2.99 Pioneer 3.3 2.8 3.4 3.0 2.9 15.3 15.3
3 1 Tuttorosso (green) 0.99 Pioneer 2.8 2.6 3.3 2.8 2.9 14.3 14.3
4 1 La Fede SM DOP 3.99 Shop Rite 2.6 2.8 3.0 2.3 2.8 13.4 13.4
5 2 Cento SM DOP 5.49 D Agostino 3.3 3.1 2.9 2.8 3.1 14.4 15.2
6 2 Cento Organic 4.99 D Agostino 3.2 2.9 2.9 3.1 2.9 15.5 15.1

Be careful when reading CSV files! Depending on the computer's settings, the default column separator can be ";" and the decimal delimiter "," (the standard in Polish settings). If your file is read in incorrectly, open it in Notepad and see what it looks like. Look at the read.table() help page: is there a quick option to read data that uses this standard? (answer in the script)
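To see the separators without leaving R, you can print the first raw lines of a file with readLines() and then give read.table() the matching sep and dec values explicitly. This sketch writes a small made-up file in the semicolon / decimal-comma convention to a temporary location; the file name and values are assumptions for illustration.

```r
# Write a small made-up file that uses ";" between columns and "," as decimal mark
pl_file = tempfile(fileext = ".csv")
writeLines(c("Round;Price", "1;3,99", "2;5,49"), pl_file)

# Look at the raw text first, as you would in Notepad
readLines(pl_file, n = 2)

# Tell read.table() which column separator and decimal mark the file uses
pl_prices = read.table(pl_file, header = TRUE, sep = ";", dec = ",")
str(pl_prices)
```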
After reading in the data, the head() function was used to display the first rows. In our case read.table() and read.csv() gave the same result, but as you can see, with read.csv() you don't need to specify the header and sep arguments. Now you can use the rm() function to remove tomato2.
To see some details about the data table, use the str() function, which compactly displays the structure of an arbitrary R object.
> str(tomato)
'data.frame': 16 obs. of 11 variables:
$ Round : int 1 1 1 1 2 2 2 2 3 3 ...
$ Tomato : Factor w/ 16 levels "365 Whole Foods",..: 12 15 16 7 4 3 8 9 13 5 ...
$ Price : num 3.99 2.99 0.99 3.99 5.49 4.99 3.99 3.99 4.53 NA ...
$ Source : Factor w/ 10 levels "D Agostino","Eataly",..: 10 6 6 8 1 1 8 3 7 5 ...
$ Sweet : num 2.8 3.3 2.8 2.6 3.3 3.2 2.6 2.1 3.4 2.6 ...
$ Acid : num 2.8 2.8 2.6 2.8 3.1 2.9 2.8 2.7 3.3 2.9 ...
$ Color : num 3.7 3.4 3.3 3 2.9 2.9 3.6 3.1 4.1 3.4 ...
$ Texture : num 3.4 3 2.8 2.3 2.8 3.1 3.4 2.4 3.2 3.3 ...
$ Overall : num 3.4 2.9 2.9 2.8 3.1 2.9 2.6 2.2 3.7 2.9 ...
$ Avg.of.Totals: num 16.1 15.3 14.3 13.4 14.4 15.5 14.7 12.6 17.8 15.3 ...
$ Total.of.Avg : num 16.1 15.3 14.3 13.4 15.2 15.1 14.9 12.5 17.7 15.2 ...

[1] https://stat.ethz.ch/pipermail/r-help/2008-March/158393.html (accessed 12.11.2019)
[2] Lander, J. P. (2017). R for everyone: advanced analytics and graphics (2nd Edition). Pearson Education
[3] https://www.jaredlander.com/datasets/ (accessed 12.11.2019)

The console output shows that the object type is data.frame and that it has 16 observations (rows) and 11 variables (columns). Most of the columns are numeric (the first one is integer), and two (Tomato and Source) are factors, because in the R version used to produce this output character strings were read in as factors by default. (Since R 4.0.0 the default is stringsAsFactors = FALSE, so in a current R these two columns are read in as character.) If you want to be sure that the data are read in as strings, set the argument as.is = TRUE or stringsAsFactors = FALSE. Without creating a new object, check the differences in the data.frame after adding these parameters when reading in the data.
> str(read.csv(file = theUrl, as.is = TRUE))
> str(read.csv(file = theUrl, stringsAsFactors = FALSE))
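The effect of these arguments can also be checked offline. A minimal sketch on a made-up CSV string; because stringsAsFactors = FALSE is the default since R 4.0.0, the factor version is requested explicitly here:

```r
csv_text = "Tomato,Price
Simpson SM,3.99
Cento Organic,4.99"

# Text columns converted to factors (the default before R 4.0.0)
as_factors = read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_factors$Tomato)

# Text columns kept as character: both arguments give the same result here
as_chars1 = read.csv(text = csv_text, as.is = TRUE)
as_chars2 = read.csv(text = csv_text, stringsAsFactors = FALSE)
class(as_chars1$Tomato)
identical(as_chars1, as_chars2)
```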

The read.table() help page also lists the read.delim() wrapper. Its column separator is "\t", which is used for tab-delimited files; "\t" is the special character R uses to denote a tab.
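For example, the same kind of table stored with tab characters between the columns can be read with read.delim(). A sketch with made-up data, where "\t" inside the string is the tab escape described above:

```r
tsv_text = "Round\tTomato\tPrice
1\tSimpson SM\t3.99
2\tCento SM DOP\t5.49"

# read.delim() presets header = TRUE and sep = "\t"
tomato_tab = read.delim(text = tsv_text)

# The equivalent call written out with read.table()
tomato_tab2 = read.table(text = tsv_text, header = TRUE, sep = "\t")

identical(tomato_tab, tomato_tab2)
```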
2. read_delim() and fread()
read.table() is the general-purpose reading function in base R, and it is enough for typical usage. However, when working with big data sets it can take a while to read in large CSV or tab-delimited files. This problem is addressed by the read_delim() function from the readr package and the fread() function from the data.table package.
Install the required packages with the commands below in the console, or using the RStudio interface.
> install.packages("readr")
> install.packages("data.table")

Now use the read_delim() function:

> library(readr) # remember to load the installed package first
> tomato2 = read_delim(file = theUrl, delim = ",")
Parsed with column specification:
cols(
Round = col_double(),
Tomato = col_character(),
Price = col_double(),
Source = col_character(),
Sweet = col_double(),
Acid = col_double(),
Color = col_double(),
Texture = col_double(),
Overall = col_double(),
`Avg of Totals` = col_double(),
`Total of Avg` = col_double()
)

While read_delim() runs, the column names and their types are displayed in the console. Use the str() function on the tomato2 object.
> str(tomato2)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 16 obs. of 11 variables:
$ Round : num 1 1 1 1 2 2 2 2 3 3 ...
$ Tomato : chr "Simpson SM" "Tuttorosso (blue)" "Tuttorosso (green)" "La Fede SM DOP" ...
$ Price : num 3.99 2.99 0.99 3.99 5.49 4.99 3.99 3.99 4.53 NA ...
$ Source : chr "Whole Foods" "Pioneer" "Pioneer" "Shop Rite" ...
$ Sweet : num 2.8 3.3 2.8 2.6 3.3 3.2 2.6 2.1 3.4 2.6 ...
$ Acid : num 2.8 2.8 2.6 2.8 3.1 2.9 2.8 2.7 3.3 2.9 ...
$ Color : num 3.7 3.4 3.3 3 2.9 2.9 3.6 3.1 4.1 3.4 ...
$ Texture : num 3.4 3 2.8 2.3 2.8 3.1 3.4 2.4 3.2 3.3 ...
$ Overall : num 3.4 2.9 2.9 2.8 3.1 2.9 2.6 2.2 3.7 2.9 ...
$ Avg of Totals: num 16.1 15.3 14.3 13.4 14.4 15.5 14.7 12.6 17.8 15.3 ...
$ Total of Avg : num 16.1 15.3 14.3 13.4 15.2 15.1 14.9 12.5 17.7 15.2 ...
- attr(*, "spec")=
.. cols(
.. Round = col_double(),
.. Tomato = col_character(),
.. Price = col_double(),
.. Source = col_character(),
.. Sweet = col_double(),
.. Acid = col_double(),
.. Color = col_double(),
.. Texture = col_double(),
.. Overall = col_double(),
.. `Avg of Totals` = col_double(),
.. `Total of Avg` = col_double()
.. )

You will see different classes than for tomato, and once again information about the data types in the columns. The read_delim() function creates a tibble[4] object, which is an extended data.frame with some added or removed features. You can see some of the differences between these two object types by displaying both objects in the console.

[4] https://tibble.tidyverse.org/ (accessed 12.11.2019)

> print(tomato) #or simply tomato
> tomato2
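One difference is already visible in the two str() outputs above: read.csv() turned the header "Avg of Totals" into Avg.of.Totals, while read_delim() kept the spaces (hence the backticks). In base R this renaming is controlled by the check.names argument; a sketch on a made-up one-column CSV:

```r
csv_text = "Avg of Totals
16.1
15.3"

# Default check.names = TRUE makes the names syntactically valid
names(read.csv(text = csv_text))

# check.names = FALSE keeps the header exactly as written in the file
raw_names = read.csv(text = csv_text, check.names = FALSE)
names(raw_names)

# Names containing spaces must be wrapped in backticks when used with $
raw_names$`Avg of Totals`
```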

Now use the fread() function to read in the same data.

> library(data.table) # remember to load the installed package first
> tomato3 = fread(input = theUrl, sep = ",", header = TRUE) # by default fread() guesses the separator and whether there is a header row
trying URL 'http://www.jaredlander.com/data/TomatoFirst.csv'
Content type 'text/csv' length 1107 bytes
downloaded 1107 bytes

This function also prints some information about the file in the console after it runs. Use the str() function on tomato3 and write down in the script what you noticed.
Both read_delim() and fread() are fast and flexible functions for reading in data. Their outputs are two different object types, connected with different packages for data manipulation: tibbles are commonly used with the dplyr package and data.tables with the data.table package (both will be shown later in the course). You can also use the read.table() function and work with standard data.frames.
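To compare the three object types side by side, the sketch below writes a small made-up CSV to a temporary file and reads it with read.csv(), read_delim() and fread(). It assumes readr and data.table are installed and simply skips a package that is missing; the packages are called with :: so no library() call is needed.

```r
# Write a small made-up CSV to a temporary file
csv_file = tempfile(fileext = ".csv")
writeLines(c("Round,Tomato,Price", "1,Simpson SM,3.99", "2,Cento SM DOP,5.49"),
           csv_file)

# Base R: a plain data.frame
from_base = read.csv(csv_file)
print(class(from_base))

# readr: a tibble (suppressMessages hides the column specification message)
if (requireNamespace("readr", quietly = TRUE)) {
  from_readr = suppressMessages(readr::read_delim(csv_file, delim = ","))
  print(class(from_readr))
}

# data.table: a data.table
if (requireNamespace("data.table", quietly = TRUE)) {
  from_dt = data.table::fread(csv_file, sep = ",", header = TRUE)
  print(class(from_dt))
}
```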
3. Reading Excel files
Excel is one of the most common tools for data analysis, and despite its disadvantages (and thanks to some advantages), sooner or later every R user will need to read Excel files. One simple way is to open a file in Excel and save it as a CSV file. To avoid this additional step, you can use the readxl package and its main function read_excel(). This function cannot read files from the internet, so you first need to download the file. You can do this in a browser or inside R using the download.file() function.
download.file(url = "http://www.jaredlander.com/data/ExcelExample.xlsx", destfile = "ExcelExample.xlsx",
method = "curl") # the file is saved in your working directory under the name given in destfile

You can open the file in Excel to see its content, or use the excel_sheets() function to check whether the file has multiple sheets. If you have problems with this function, copy the file from the disk indicated by the lecturer and try again.
By default, the read_excel() function reads in only the first sheet of the file:
> library(readxl) # remember to load the installed package first
> tomatoXL = read_excel("ExcelExample.xlsx")

What object type is tomatoXL? (answer in script)


To read a different sheet from an xlsx file, add the sheet argument with either the name of the sheet or its number.

> wine = read_excel("ExcelExample.xlsx", sheet = 2)
> head(wine)
# A tibble: 6 x 14
Cultivar Alcohol `Malic acid` Ash `Alcalinity of ~ Magnesium `Total phenols` Flavanoids
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 14.2 1.71 2.43 15.6 127 2.8 3.06
2 1 13.2 1.78 2.14 11.2 100 2.65 2.76
3 1 13.2 2.36 2.67 18.6 101 2.8 3.24
4 1 14.4 1.95 2.5 16.8 113 3.85 3.49
5 1 13.2 2.59 2.87 21 118 2.8 2.69
6 1 14.2 1.76 2.45 15.2 112 3.27 3.39
# ... with 6 more variables: `Nonflavanoid phenols` <dbl>, Proanthocyanins <dbl>, `Color
# intensity` <dbl>, Hue <dbl>, `OD280/OD315 of diluted wines` <dbl>, Proline <dbl>
> ACS = read_excel("ExcelExample.xlsx", sheet = "ACS")
There were 50 or more warnings (use warnings() to see the first 50)
> head(ACS)
# A tibble: 6 x 18
Acres FamilyIncome FamilyType NumBedrooms NumChildren NumPeople NumRooms NumUnits
<dttm> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 2015-01-10 00:00:00 150 Married 4 1 3 9 Single ~
2 2015-01-10 00:00:00 180 Female He~ 3 2 4 6 Single ~
3 2015-01-10 00:00:00 280 Female He~ 4 0 2 8 Single ~
4 2015-01-10 00:00:00 330 Female He~ 2 1 2 4 Single ~
5 2015-01-10 00:00:00 330 Male Head 3 1 2 5 Single ~
6 2015-01-10 00:00:00 480 Male Head 0 3 4 1 Single ~
# ... with 10 more variables: NumVehicles <dbl>, NumWorkers <dbl>, OwnRent <chr>, YearBuilt <chr>,
# HouseCosts <dbl>, ElectricBill <dbl>, FoodStamp <chr>, HeatingFuel <chr>, Insurance <dbl>,
# Language <chr>
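If the download does not work, selecting sheets by number or by name can also be practised on an example workbook that is installed together with readxl. This sketch assumes a readxl version that provides readxl_example() and its "datasets.xlsx" file, and it skips everything if readxl is not installed:

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook shipped with readxl
  xl_path = readxl::readxl_example("datasets.xlsx")

  # List the sheet names stored in the workbook
  sheet_names = readxl::excel_sheets(xl_path)
  print(sheet_names)

  # Select a sheet by its position ...
  second_sheet = readxl::read_excel(xl_path, sheet = 2)

  # ... or by its name; both calls read the same sheet
  same_sheet = readxl::read_excel(xl_path, sheet = sheet_names[2])
  print(identical(second_sheet, same_sheet))
}
```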

4. Importing time series
In the previous exercise the date data type was introduced. Time series are a common type of data, and proper handling of date formats is crucial during analysis. One useful package for handling dates is lubridate (some functions from this package will be used during the course); another is openair (for more information see the openair project website[5] and the openair manual[6]). It contains functions that simplify the analysis of meteorological data and its graphical representation, but a large part of its functions can be used for time series of any environmental parameter.
To see how to import data using the openair package, copy the file meteo_data.csv to your working directory. First read it into R using the read.csv() function and look at the object's structure and first rows (to see the last rows, use the tail() function, which works like head()).

[5] http://www.openair-project.org/ (accessed 13.11.2019)
[6] http://www.openair-project.org/PDF/OpenAir_Manual.pdf (accessed 13.11.2019)

> meteo = read.csv("meteo_data.csv", as.is = TRUE)
> str(meteo)
'data.frame': 172800 obs. of 31 variables:
$ date : chr "2017-06-01 00:00:00" "2017-06-01 00:00:15" "2017-06-01 00:00:30" "2017-06-01 00:00:45" ...
$ RECORD : int 233186 233187 233188 233189 233190 233191 233192 233193 233194 233195 ...
$ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
$ Month : int 6 6 6 6 6 6 6 6 6 6 ...
$ DOM : int 1 1 1 1 1 1 1 1 1 1 ...
$ Hour : int 0 0 0 0 0 0 0 0 0 0 ...
$ Minute : int 0 0 0 0 1 1 1 1 2 2 ...
$ Kd_Avg : num -0.956 -0.971 -0.994 -0.956 -0.986 ...
$ Ku_Avg : num 0.282 0.275 0.289 0.289 0.268 ...
$ CG3Up_Avg : num -21.3 -21.3 -21.1 -21.2 -21.3 ...
$ CG3Dn_Avg : num -5.96 -5.65 -5.63 -5.7 -5.4 ...
$ CNR4TC_Avg : num 13.4 13.4 13.4 13.4 13.4 ...
$ Ld_Avg : num 361 361 361 361 361 ...
$ Lu_Avg : num 376 376 376 376 377 ...
$ Albedo_Avg : num -0.295 -0.283 -0.29 -0.302 -0.272 ...
$ BilRad_Avg : num -16.5 -16.9 -16.8 -16.8 -17.1 ...
$ PAR_Avg : num -0.0591 0.0591 -0.0591 0.1772 0.1181 ...
$ PAR2_Avg : num -0.0338 -0.0338 0 0 -0.0338 ...
$ PAR3_Avg : num -0.1013 0.1013 -0.0338 0 0.0338 ...
$ PAR4_Avg : num -0.203 0.473 -0.135 -0.27 1.824 ...
$ t_50cm_Avg : num 13.4 13.4 13.1 13.1 13.1 ...
$ RH_50cm_Avg : num 74.5 74.3 73.9 73.4 73.4 ...
$ t_1m_Avg : num 13.5 13.2 13.5 13.8 13.2 ...
$ RH_1m_Avg : num 73.9 73.6 73.7 73.6 72.8 ...
$ t_2m_Avg : num 13.5 13.5 13.5 13.5 13.5 ...
$ RH_2m_Avg : num 66.4 66.6 65.6 66.2 66 ...
$ t_350cm_Avg : num 13.6 13.6 13.6 13.6 13.6 ...
$ RH_350cm_Avg : num 64.4 64.5 63.9 64.3 64.1 ...
$ Hux002986_Avg: num -1.73 -1.73 -1.74 -1.72 -1.74 ...
$ IR_temp : num -62 -62 -62 -62 -62 ...
$ Batt_Volt_Min: num 13 12.9 13.1 13 13 ...
> head(meteo)
date RECORD Year Month DOM Hour Minute Kd_Avg Ku_Avg CG3Up_Avg CG3Dn_Avg
1 2017-06-01 00:00:00 233186 2017 6 1 0 0 -0.9555723 0.2818218 -21.25717 -5.960277
2 2017-06-01 00:00:15 233187 2017 6 1 0 0 -0.9709833 0.2749476 -21.31627 -5.654272
3 2017-06-01 00:00:30 233188 2017 6 1 0 0 -0.9941016 0.2886949 -21.12113 -5.627661
4 2017-06-01 00:00:45 233189 2017 6 1 0 0 -0.9555727 0.2886956 -21.23944 -5.700847
5 2017-06-01 00:01:00 233190 2017 6 1 0 1 -0.9863980 0.2680746 -21.27493 -5.401505
6 2017-06-01 00:01:15 233191 2017 6 1 0 1 -0.9478669 0.3093168 -21.35771 -5.567808
CNR4TC_Avg Ld_Avg Lu_Avg Albedo_Avg BilRad_Avg PAR_Avg PAR2_Avg PAR3_Avg PAR4_Avg
1 13.36784 360.8541 376.1510 -0.2949246 -16.53430 -0.05905434 -0.03377911 -0.10133730 -0.2026747
2 13.37120 360.8129 376.4749 -0.2831641 -16.90795 0.05905397 -0.03377891 0.10133670 0.4729047
3 13.36229 360.9605 376.4539 -0.2904079 -16.77627 -0.05905395 0.00000000 -0.03377889 -0.1351156
4 13.36671 360.8658 376.4044 -0.3021179 -16.78284 0.17715920 0.00000000 0.00000000 -0.2702272
5 13.36479 360.8200 376.6935 -0.2717712 -17.12789 0.11810660 -0.03377853 0.03377853 1.8240410
6 13.35963 360.7097 376.4996 -0.3263294 -17.04707 -0.05905372 -0.10133630 0.10133630 -0.1688938
t_50cm_Avg RH_50cm_Avg t_1m_Avg RH_1m_Avg t_2m_Avg RH_2m_Avg t_350cm_Avg RH_350cm_Avg
1 13.43 74.48 13.50 73.87 13.50 66.37 13.56 64.36
2 13.42 74.33 13.21 73.60 13.51 66.58 13.59 64.53
3 13.14 73.93 13.47 73.70 13.50 65.58 13.58 63.86
4 13.11 73.38 13.78 73.60 13.51 66.21 13.56 64.26
5 13.11 73.38 13.22 72.82 13.52 65.97 13.57 64.07
6 13.12 73.38 13.21 72.66 13.52 66.05 13.57 64.28
Hux002986_Avg IR_temp Batt_Volt_Min
1 -1.725 -61.95 13.01
2 -1.728 -61.97 12.90
3 -1.735 -61.97 13.09
4 -1.721 -62.01 13.01
5 -1.738 -61.95 13.00
6 -1.725 -62.03 13.09

You can see that this is quite a big file, with 31 columns (parameters) and 172800 rows (observations). These are real data from the WULS meteorological station, covering only one month. Focus on the first column, date. It is read in as character because we set the parameter as.is = TRUE (in R versions before 4.0.0 the default settings would have made it a factor). You should notice that the data have a 15-second time step. Now we can try to convert this column to a date type, or we can use the columns Year, Month, DOM (day of the month), Hour and Minute to create a date column (this is essentially how the date column was created, since the original data did not have one). Alternatively, we can use the import() function from the openair package.
> library(openair)
> meteo2 = import("meteo_data.csv", file.type = "csv", date.format = "%Y-%m-%d %H:%M:%S")
date1 date2 RECORD Year Month DOM Hour Minute
"POSIXct" "POSIXt" "integer" "integer" "integer" "integer" "integer" "integer"
Kd_Avg Ku_Avg CG3Up_Avg CG3Dn_Avg CNR4TC_Avg Ld_Avg Lu_Avg Albedo_Avg
"numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
BilRad_Avg PAR_Avg PAR2_Avg PAR3_Avg PAR4_Avg t_50cm_Avg RH_50cm_Avg t_1m_Avg
"numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
RH_1m_Avg t_2m_Avg RH_2m_Avg t_350cm_Avg RH_350cm_Avg Hux002986_Avg IR_temp Batt_Volt_Min
"numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
> str(meteo2)
'data.frame': 172800 obs. of 31 variables:
$ date : POSIXct, format: "2017-06-01 00:00:00" "2017-06-01 00:00:15" "2017-06-01 00:00:30" "2017-06-01 00:00:45" ...
$ RECORD : int 233186 233187 233188 233189 233190 233191 233192 233193 233194 233195 ...
$ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
$ Month : int 6 6 6 6 6 6 6 6 6 6 ...
$ DOM : int 1 1 1 1 1 1 1 1 1 1 ...
$ Hour : int 0 0 0 0 0 0 0 0 0 0 ...
$ Minute : int 0 0 0 0 1 1 1 1 2 2 ...
$ Kd_Avg : num -0.956 -0.971 -0.994 -0.956 -0.986 ...
$ Ku_Avg : num 0.282 0.275 0.289 0.289 0.268 ...
$ CG3Up_Avg : num -21.3 -21.3 -21.1 -21.2 -21.3 ...
$ CG3Dn_Avg : num -5.96 -5.65 -5.63 -5.7 -5.4 ...
$ CNR4TC_Avg : num 13.4 13.4 13.4 13.4 13.4 ...
$ Ld_Avg : num 361 361 361 361 361 ...
$ Lu_Avg : num 376 376 376 376 377 ...
$ Albedo_Avg : num -0.295 -0.283 -0.29 -0.302 -0.272 ...
$ BilRad_Avg : num -16.5 -16.9 -16.8 -16.8 -17.1 ...
$ PAR_Avg : num -0.0591 0.0591 -0.0591 0.1772 0.1181 ...
$ PAR2_Avg : num -0.0338 -0.0338 0 0 -0.0338 ...
$ PAR3_Avg : num -0.1013 0.1013 -0.0338 0 0.0338 ...
$ PAR4_Avg : num -0.203 0.473 -0.135 -0.27 1.824 ...
$ t_50cm_Avg : num 13.4 13.4 13.1 13.1 13.1 ...
$ RH_50cm_Avg : num 74.5 74.3 73.9 73.4 73.4 ...
$ t_1m_Avg : num 13.5 13.2 13.5 13.8 13.2 ...
$ RH_1m_Avg : num 73.9 73.6 73.7 73.6 72.8 ...
$ t_2m_Avg : num 13.5 13.5 13.5 13.5 13.5 ...
$ RH_2m_Avg : num 66.4 66.6 65.6 66.2 66 ...
$ t_350cm_Avg : num 13.6 13.6 13.6 13.6 13.6 ...
$ RH_350cm_Avg : num 64.4 64.5 63.9 64.3 64.1 ...
$ Hux002986_Avg: num -1.73 -1.73 -1.74 -1.72 -1.74 ...
$ IR_temp : num -62 -62 -62 -62 -62 ...
$ Batt_Volt_Min: num 13 12.9 13.1 13 13 ...
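For comparison, the first option mentioned earlier (converting the character date column yourself) can be sketched with base R's as.POSIXct(), which also makes it easy to check the 15-second step. The timestamps below are a few made-up values in the same format as meteo$date, and the "UTC" time zone is an assumption, since the station's time zone is not stated:

```r
# A few timestamps in the same text format as meteo$date
date_text = c("2017-06-01 00:00:00", "2017-06-01 00:00:15",
              "2017-06-01 00:00:30", "2017-06-01 00:00:45")

# Parse the text into date-time objects (time zone assumed to be UTC)
dates = as.POSIXct(date_text, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(dates)

# Differences between consecutive observations, expressed in seconds
steps = as.numeric(diff(dates), units = "secs")
steps
```

With the object read in earlier, the same call would be applied as meteo$date = as.POSIXct(meteo$date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC").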

You only need to specify the date format and the file type (the default file type is csv, so this argument can be omitted). The date format must be given in R's strptime notation. For example, a date such as 1/11/2000 12:00 (day/month/year hour:minute) has the format "%d/%m/%Y %H:%M". See the strptime() help page for the codes used for each part of a date.
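The example format from the paragraph above can be checked directly with strptime(); the time zone is again assumed to be UTC:

```r
# "1/11/2000 12:00" written as day/month/year hour:minute
parsed = strptime("1/11/2000 12:00", format = "%d/%m/%Y %H:%M", tz = "UTC")

# Print it back in ISO order to confirm that 1 is the day and 11 the month
format(parsed, "%Y-%m-%d %H:%M")
```

The same format string is what would be passed to import() as date.format for a file written this way.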
5. Different data formats
R can import data from many other formats and types. The foreign package contains functions that read data from popular statistical tools such as SAS, SPSS or Octave. It can also read dbf (Data Base File) files.
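As a small illustration of the foreign package, this sketch writes a made-up data frame to a dBase (.dbf) file in a temporary folder with write.dbf() and reads it back with read.dbf(). foreign is one of R's recommended packages, so it is normally installed together with R:

```r
library(foreign)

# A small made-up table
stations = data.frame(id = 1:3, temp = c(13.4, 13.2, 13.5))

# Write it as a dBase file and read it back
dbf_file = tempfile(fileext = ".dbf")
write.dbf(stations, dbf_file)
stations_back = read.dbf(dbf_file)
str(stations_back)
```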
Besides typical numeric data in tables, R can handle almost any type of data. For example, GIS (Geographic Information System) data can be read into R for spatial analysis. Both vector data (not to be confused with the vector data structure in R) and raster data can be easily analysed with packages such as rgdal, sp or raster (rgdal has since been retired from CRAN; the sf and terra packages now cover its functionality).
Some other formats and data types will be covered later in this course. If you want to use R for some specific data, try to find help on the internet (there are packages and help for very specific uses) or ask the lecturer.
6. Saving and exporting data
You already know how to save data in R binary files. This is an easy way to share data with other R users, but sometimes you need to export data as a simple table readable by people who do not use R (of course, you can also export other data types; these will be explained later if they are covered in this course).
A table is exported simply with the write.table() function (which, like read.table(), has similar wrappers).
write.csv(tomato, file = "tomat.csv")
write.csv(tomato, file = "tomato2.csv", row.names = FALSE)

After writing the CSV files, look at the difference between the file written without the argument row.names = FALSE and the one written with it. Write the answer in the script.
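Like read.table(), write.table() accepts a sep argument, so a table can also be exported, for example, as a tab-delimited text file. A sketch with made-up data, written to a temporary file and read back with the matching wrapper to check the round trip:

```r
tomato_prices = data.frame(Tomato = c("Simpson SM", "Cento Organic"),
                           Price = c(3.99, 4.99), stringsAsFactors = FALSE)

# Tab-separated, without row names and without quotes around text
tsv_file = tempfile(fileext = ".txt")
write.table(tomato_prices, file = tsv_file, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read it back with the matching wrapper and compare
tomato_prices_back = read.delim(tsv_file, stringsAsFactors = FALSE)
all.equal(tomato_prices, tomato_prices_back)
```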
At the end, save your script with the answers and the environment image; you can also save the history if you like.
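Both kinds of saving can also be done from code instead of the RStudio menu. A minimal sketch, assuming the files go to a temporary folder (in class you would save them in your ex3 folder instead) and using a hypothetical object tomato_demo in place of your own data:

```r
# Hypothetical object standing in for the data created during the exercise
tomato_demo = data.frame(Round = c(1L, 2L), Price = c(3.99, 5.49))

# Save selected objects to an R binary file
rdata_file = file.path(tempdir(), "ex3_objects.RData")
save(tomato_demo, file = rdata_file)

# Save the whole global environment (the environment image)
image_file = file.path(tempdir(), "ex3.RData")
save.image(file = image_file)

# Load the saved objects into a fresh environment to check the round trip
restored = new.env()
load(rdata_file, envir = restored)
ls(restored)
```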
