11/14/2016

A Brief Introduction to R

Dr. Norberto E. Milla

What is R?
• R is a language and environment for statistical
computing and graphics
• R is a free, open-source implementation of the S language
(S-PLUS is the commercial implementation)
• Initially developed by Robert Gentleman and Ross
Ihaka at the University of Auckland (early 1990s)
• R is written by statisticians for statisticians (and the
rest of us)
• An environment: a huge library of algorithms for data
access, data manipulation, analysis and graphics
• A community
–Thousands of contributors, 2 million users
–Resources and help in every domain

Awesome thing #1: It's FREE!


• Open Source, licensed under GPL (like Linux!)
–Free as in freedom
• Flexible and runs on a wide array of platforms,
including Windows, Unix, and Mac OS X
• Open for integration
–Data (SAS, SPSS, STATA, Excel, …)
• Broad user-base
–De-facto standard for data analysis and
teaching statistics

Awesome thing #2: Language


• Programming, not dialogs or cell formulas
–Freedom to combine methods
–Repeatable results
–Reliable and reusable
• Language designed for data analysis
–Object-oriented: vector, matrix, model, …
–Built-in library of algorithms
• Get more done, faster

Awesome thing #3: Graphics


• Functions for standard graphs
–Scatterplot, boxplot, histogram, smoothing
–Bar plot, pie chart, dot chart, …
–Image plot, 3-D surface, map, …
• Customize without limits
–Combine graph types
–Create entirely new graphics
– Use of colors

Awesome thing #4: Statistics


• All standard statistical methods built in
–Mean, median, covariance, distributions, …
–Regression, ANOVA, cross-tabulations, …
–Survival, nonlinear mixed effects, GLM, …
–Neural networks, trees, GAM, …
• Object-oriented functions
–Access all parts of the analysis results
–Combine analytic methods
• Over 3,000 contributed packages for specialized
applications (as of 2011)

Caveat

“Using R is a bit akin to smoking. The beginning is difficult, one may
get headaches and even gag the first few times. But in the long run,
it becomes pleasurable and even addictive.”
-- Francois Pinard

Downloading and installing R


Step 1: Go to the R homepage: http://www.r-project.org

Step 2: Select a CRAN mirror site
Step 3: Select appropriate installer based on OS

Step 4: Select “base” installer

Step 5: Download R installer

Step 6: Double-click the downloaded R installer

Step 7: On the pop-up menu, click OK.

Step 8: Click Next on the next pop-up window and continue
through the remaining windows until you reach the FINISH
window.

The R console

Data types in R
• R has varied data types: scalars, vectors, matrices,
data frames and lists
• A vector is an ordered collection of elements of the
same type (numeric, character, or logical)
• A matrix is a vector with a dimension attribute, indexed
by two indices (row and column)
• Data frames are matrix-like structures, in which the
columns can be of different types.
• Data frames are ‘data matrices’ with one row per
observational unit but with (possibly) both
numerical and categorical variables
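
The data types listed above can be inspected directly in the console. The following is a minimal sketch; the object names and values are hypothetical and chosen only for illustration.

```r
# Hypothetical example objects, one per data type named on the slide
v  <- c(1.5, 2, 3)                              # numeric vector
m  <- matrix(1:6, nrow = 2)                     # 2 x 3 matrix
df <- data.frame(id = 1:2, grp = c("a", "b"))   # columns of different types
l  <- list(v, m, df)                            # a list can hold objects of any type

is.numeric(v)      # TRUE
dim(m)             # 2 3
is.data.frame(df)  # TRUE
length(l)          # 3
length(5)          # 1: a scalar is simply a vector of length one
```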

Vector
• R is case-sensitive
• Assignment operators in R: <-, =
a<-c(1, 2, 5, 3, 6, -2, 4) # numeric vector
b=c("one", "two", "three") # character vector
c=c(TRUE, FALSE, TRUE, TRUE) # logical vector
• Elements of a vector can be referred to using
subscripts
• The following command will display the 2nd and 4th
elements of vector a
a[c(2, 4)]
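
Subscripts can be positions, ranges, negative positions, or logical conditions. A short sketch using the vector a from this slide:

```r
a <- c(1, 2, 5, 3, 6, -2, 4)

a[3]         # third element: 5
a[c(2, 4)]   # 2nd and 4th elements: 2 3
a[2:4]       # 2nd through 4th elements: 2 5 3
a[-1]        # everything except the first element
a[a > 3]     # elements greater than 3: 5 6 4
```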

Matrix
• All columns in a matrix must have the same mode
(numeric, character, etc.) and the same length
mymatrix=matrix(vector, nrow=r, ncol=c, byrow=FALSE,
    dimnames=list(char_vector_rownames, char_vector_colnames))

Example:
matrix1=matrix(1:20, 4, 5) #generates a 4x5 matrix

x=c(1:9)
rnames=c("r1","r2","r3")
cnames=c("c1","c2","c3")
matrix2=matrix(x, 3, 3, byrow=TRUE,
    dimnames=list(rnames,cnames)) # fills row by row, with row and column names
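
Matrix elements are addressed as [row, column]; leaving one index empty selects a whole row or column. A minimal sketch that rebuilds the 3x3 example in a single call (the object name m is hypothetical):

```r
# Same 3 x 3 matrix as the example above, filled row by row
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

m[2, 3]     # element in row 2, column 3: 6
m[, 1]      # entire first column: 1 4 7
m["r3", ]   # entire third row, selected by name: 7 8 9
dim(m)      # 3 3
```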

Data Frame
• In a data frame different columns can have different
modes
• Similar to SAS and SPSS data sets
• Example:
x=c(1,2,3,4)
y=c("red", "white", "red", NA)
z=c(TRUE, TRUE, FALSE, FALSE)
mydata=data.frame(x,y,z) # creates the data frame mydata
names(mydata)=c("ID", "Color", "Passed") # sets the column names of mydata
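
Once a data frame exists, its columns and rows can be pulled out by name or position. A brief sketch using the same values as the example, built in one call:

```r
mydata <- data.frame(ID     = c(1, 2, 3, 4),
                     Color  = c("red", "white", "red", NA),
                     Passed = c(TRUE, TRUE, FALSE, FALSE))

mydata$Color               # one column, selected by name
mydata[2, ]                # the second observation (row)
nrow(mydata)               # 4 observations
ncol(mydata)               # 3 variables
sum(is.na(mydata$Color))   # 1 missing value in Color
```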

R built-in data editor


• One can enter data interactively into R using its
built-in spreadsheet
mydata=data.frame() # creates an empty data frame
mydata=edit(mydata) # opens the spreadsheet for data entry
• Example:

Importing data from Excel


• For Excel 2003 or earlier, save the file in csv format
and use either of the following commands to import
the file into R
mydata=read.csv("D:/DMPS/R Training/QUICK-R/import1.csv",header=TRUE,sep=",")
or,

mydata=read.table("D:/DMPS/R Training/QUICK-R/import1.csv",header=TRUE,sep=",")
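
To try the csv import without the training files, one can write a small file first and read it back. This is only a sketch: the temporary file and its contents are hypothetical stand-ins for import1.csv.

```r
# Write a small, made-up data set to a temporary csv file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(90, 85.5, 70)),
          tmp, row.names = FALSE)

# Import it the same way as on the slide
imported <- read.csv(tmp, header = TRUE, sep = ",")
str(imported)   # shows the column names and types that were read

unlink(tmp)     # remove the temporary file
```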

• For Excel 2007 or 2010, first load the xlsx package
(install it once with install.packages("xlsx")) using the
following command
library(xlsx)

• Then use the following command to import the file
into R
mydata=read.xlsx("D:/DMPS/R Training/QUICK-R/import2.xlsx",sheetIndex=1)
or, simply
mydata=read.xlsx("D:/DMPS/R Training/QUICK-R/import2.xlsx",1)

Importing data from SPSS


• There are two packages which can be used to
import SPSS data sets into R: foreign and Hmisc
• Load the foreign package
library(foreign)
• Use the following command to import the data into R
myspssdata=read.spss("D:/DMPS/R Training/QUICK-R/ched_complete.sav",
    use.value.labels=TRUE, to.data.frame=TRUE)

• Alternatively, save the SPSS data set in portable (*.por) format
• Load the Hmisc package
library(Hmisc)
• Use the following command to import the data into R
myspssdata=spss.get("D:/DMPS/R Training/QUICK-R/ched_complete.por",
    use.value.labels=TRUE, to.data.frame=TRUE)

Importing data from STATA


• Load the foreign package
library(foreign)
• Use the following command to import the data into R
mystatadata=read.dta("D:/DMPS/R Training/QUICK-R/statadata.dta",
    convert.factors=TRUE)

Variable labels
• Using the edit() function we can specify the
variable labels in the R spreadsheet

• An alternative is to use the following command:

names(mydata)[3]="age" # assigns "age" as the name of the 3rd column of mydata

Value labels
• Use the factor() function for nominal data and the
ordered() function for ordinal data
• Suppose the variable v1 is coded 1, 2 or 3 and we
want to attach value labels 1=red, 2=blue and
3=green

mydata$v1=factor(mydata$v1, levels=c(1,2,3),
    labels=c("red", "blue", "green"))

• Suppose the variable y is coded 1, 3 or 5 and we
want to attach value labels 1=Low, 3=Medium, and
5=High

mydata$y=ordered(mydata$y, levels=c(1,3,5),
    labels=c("Low", "Medium", "High"))
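
The effect of factor() and ordered() is easiest to see on a few values. In the sketch below the coded vectors are hypothetical; the levels and labels are the ones used in the examples.

```r
# Nominal variable: codes 1, 2, 3 become red, blue, green
v1 <- factor(c(1, 3, 2, 1), levels = c(1, 2, 3),
             labels = c("red", "blue", "green"))
as.character(v1)   # "red" "green" "blue" "red"
table(v1)          # frequency count per label

# Ordinal variable: codes 1, 3, 5 become Low < Medium < High
y <- ordered(c(5, 1, 3), levels = c(1, 3, 5),
             labels = c("Low", "Medium", "High"))
y < "High"         # FALSE TRUE TRUE: ordered factors support comparisons
```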

Creating new variables


• There are three ways to create new variables from
existing variables in an R data set
• Suppose the R data set mydata has two variables x1
and x2 and we want to create two new variables, the
sum and mean of x1 and x2
• One way to accomplish this is as follows:
attach(mydata)
mydata$sum=x1+x2
mydata$mean=(x1+x2)/2
detach(mydata)
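
The slide mentions three ways but shows only the attach() version. The sketch below shows two other common approaches, on the assumption that these are the ones meant: spelling out mydata$ in full, and transform(). The data values are hypothetical.

```r
mydata <- data.frame(x1 = c(1, 2, 3), x2 = c(10, 20, 30))

# Way 1: refer to each variable through the data frame, no attach() needed
mydata$sum  <- mydata$x1 + mydata$x2
mydata$mean <- (mydata$x1 + mydata$x2) / 2

# Way 2: transform() evaluates the expressions inside the data frame
mydata2 <- transform(mydata[c("x1", "x2")],
                     sum  = x1 + x2,
                     mean = (x1 + x2) / 2)

mydata$sum     # 11 22 33
mydata2$mean   # 5.5 11.0 16.5
```

Both avoid attach(), so there is no risk of forgetting detach() or of picking up a same-named variable from the workspace.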

Recoding variables
• Suppose we want to categorize age as follows:
<=45 = Young, >45 to 75 = Middle Aged, and >75 = Old
• This can be done as follows:
attach(mydata)
mydata$agecat[age<=45]="Young"
mydata$agecat[age>45 & age<=75]="Middle Aged"
mydata$agecat[age>75]="Old"
detach(mydata)
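
The same recoding can also be written in one step with cut(), which is a different technique from the one on the slide. A minimal sketch with hypothetical ages:

```r
age <- c(30, 45, 46, 75, 76)   # made-up ages, including the boundary values

# Intervals are closed on the right by default: (-Inf,45], (45,75], (75,Inf)
agecat <- cut(age, breaks = c(-Inf, 45, 75, Inf),
              labels = c("Young", "Middle Aged", "Old"))

as.character(agecat)
# "Young" "Young" "Middle Aged" "Middle Aged" "Old"
```

The result is a factor, so the categories keep their order in tables and plots.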

Renaming variables
• There are many ways to do this
• The simplest is using the fix() function
fix(mydata) # opens the data editor; results are saved on close
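
Renaming can also be done without the interactive editor, which keeps the step repeatable in a script. A small sketch; the data frame and the new name are hypothetical:

```r
mydata <- data.frame(v1 = 1:2, v2 = 3:4, v3 = 5:6)

# Rename the column called v2 to age, leaving the others untouched
names(mydata)[names(mydata) == "v2"] <- "age"

names(mydata)   # "v1" "age" "v3"
```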

Merging data sets


• We can merge data sets horizontally using the
merge() function
newdata=merge(data1,data2,by="id") # assuming id is common to data1 and data2
• Vertical merging can be done using the rbind()
function
newdata=rbind(data1,data2) # assuming data1 and data2 have the same variables
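
A small worked example shows what each kind of merge returns. The two data sets below are hypothetical; by default merge() keeps only the ids found in both, and all=TRUE keeps every id.

```r
data1 <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
data2 <- data.frame(id = c(2, 3, 4), y = c(20, 30, 40))

# Horizontal merge: only ids present in both data sets (2 and 3)
merged <- merge(data1, data2, by = "id")

# Keep all ids; unmatched cells are filled with NA
all_rows <- merge(data1, data2, by = "id", all = TRUE)

# Vertical merge: the added rows must have the same variables
stacked <- rbind(data1, data.frame(id = 4, x = "d"))

merged$id        # 2 3
nrow(all_rows)   # 4
nrow(stacked)    # 4
```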

Selecting variables
• The following command can be used to select
variables
newdata=mydata[c("v1","v3","v15")] # selects variables v1, v3, and v15 in mydata

Or,

newdata=mydata[c(5:10)] # selects the 5th through the 10th variables in mydata

Excluding/removing variables
• The following command can be used to exclude
variables from the analysis
newdata=mydata[c(-1, -3)] # removes the 1st and 3rd variables in mydata

Or,

mydata$v1=mydata$v3=NULL # deletes the variables v1 and v3 in mydata
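
Negative subscripts work only with positions, not with names. To exclude variables by name, one option is a logical test on names(), sketched here with a hypothetical four-variable data frame:

```r
mydata <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9, v4 = 10:12)

# Keep every column whose name is NOT in the drop list
drop    <- c("v1", "v3")
newdata <- mydata[!(names(mydata) %in% drop)]

names(newdata)             # "v2" "v4"
names(mydata[c(-1, -3)])   # same result using positions
```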

Selecting observations
• Use the following commands to select observations
newdata=mydata[1:5,] # selects the first 5 observations in mydata

attach(mydata)
newdata=mydata[which(gender=="male" & age>=65),] # selects males aged 65 and over
detach(mydata)
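
The subset() function offers the same selection without attach(). This is an alternative to the command shown, sketched on a hypothetical data set:

```r
mydata <- data.frame(gender = c("male", "female", "male", "male"),
                     age    = c(70, 80, 40, 65))

# Conditions are evaluated inside mydata, so no attach() is needed
newdata <- subset(mydata, gender == "male" & age >= 65)

# Equivalent selection with explicit indexing
same <- mydata[which(mydata$gender == "male" & mydata$age >= 65), ]

newdata$age   # 70 65
```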
