A Brief Introduction to R
What is R?
• R is a language and environment for statistical
computing and graphics
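• R is an open-source implementation of the S language
(S-PLUS is the commercial implementation); it is free
software under the GNU GPL, not public domain
• Initially developed by Robert Gentleman and Ross
Ihaka at the University of Auckland (early 1990s)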
• R is written by statisticians for statisticians (and the
rest of us)
• An environment: a huge library of algorithms for data
access, data manipulation, analysis and graphics
• A community
– Thousands of contributors, 2 million users
– Resources and help in every domain
11/14/2016
Caveat
The R console
Data types in R
• R has several data structures: vectors, matrices,
data frames and lists (a scalar is simply a vector of
length one)
• A vector is an ordered collection of elements of a
single type (numeric, character or logical)
• A matrix is a vector with a dimension attribute,
indexed by two indices (row and column); arrays
extend this to more than two indices
• Data frames are matrix-like structures in which the
columns can be of different types
• Data frames are 'data matrices' with one row per
observational unit but with (possibly) both
numerical and categorical variables
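The structures listed above can be sketched side by side as follows. This is a minimal illustration; the object names (v, m, df, l) and the values are made up for the example.

```r
# A vector holds elements of a single type
v <- c(1.5, 2, 3)

# A matrix is a vector arranged in rows and columns
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame allows a different type in each column
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 ok = c(TRUE, FALSE, TRUE))

# A list can hold objects of any kind, including the ones above
l <- list(v, m, df)

# Inspect what each object is
str(v)
dim(m)
str(df)
length(l)
```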
Vector
• R is case-sensitive
• Assignment operators in R: <-, =
a <- c(1, 2, 5, 3, 6, -2, 4) # numeric vector
b = c("one", "two", "three") # character vector
c = c(TRUE, FALSE, TRUE, TRUE) # logical vector
• Elements of a vector can be referred to using
subscripts
• The following command will display the 2nd and 4th
elements of vector a
a[c(2, 4)]
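A few other common forms of subscripting, sketched on the same numeric vector. The selections shown are illustrative choices, not part of the original slide.

```r
a <- c(1, 2, 5, 3, 6, -2, 4)

second_and_fourth <- a[c(2, 4)]  # elements at positions 2 and 4
second_to_fourth  <- a[2:4]      # a range of positions
without_first     <- a[-1]       # a negative index drops that position
greater_than_3    <- a[a > 3]    # a logical condition keeps matching elements

second_and_fourth
second_to_fourth
without_first
greater_than_3
```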
Matrix
• All columns in a matrix must have the same mode
(numeric, character, etc.) and the same length
mymatrix=matrix(vector, nrow=r, ncol=c, byrow=FALSE,
dimnames=list(char_vector_rownames, char_vector_colnames))
Example:
matrix1=matrix(1:20, 4, 5) #generates a 4x5 matrix
x=c(1:9)
rownames=c("r1","r2","r3")
colnames=c("c1","c2","c3")
matrix2=matrix(x, 3, 3, byrow=TRUE,
dimnames=list(rownames,colnames)) # fills the 3x3 matrix row by row
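Once a matrix exists, its elements can be reached by position or by dimension name. The sketch below rebuilds the two example matrices; the names row_names and col_names are chosen here to avoid masking the built-in rownames() and colnames() functions.

```r
matrix1 <- matrix(1:20, 4, 5)          # filled column by column (the default)

row_names <- c("r1", "r2", "r3")
col_names <- c("c1", "c2", "c3")
matrix2 <- matrix(1:9, 3, 3, byrow = TRUE,
                  dimnames = list(row_names, col_names))

elem_by_position <- matrix1[2, 3]      # row 2, column 3
elem_by_name     <- matrix2["r2", "c3"]
first_column     <- matrix2[, "c1"]    # a whole column
second_row       <- matrix2[2, ]       # a whole row

matrix2
```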
Data Frame
• In a data frame different columns can have different
modes
• Similar to SAS and SPSS data sets
• Example:
x=c(1,2,3,4)
y=c("red", "white", "red", NA)
z=c(TRUE, TRUE, FALSE, FALSE)
mydata=data.frame(x,y,z) # creates the data frame mydata
names(mydata)=c("ID", "Color", "Passed") # sets the column labels for mydata
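A short sketch of how the resulting data frame might be inspected and how single columns can be pulled out. It repeats the example above so that it runs on its own.

```r
x <- c(1, 2, 3, 4)
y <- c("red", "white", "red", NA)
z <- c(TRUE, TRUE, FALSE, FALSE)
mydata <- data.frame(x, y, z)
names(mydata) <- c("ID", "Color", "Passed")

str(mydata)                      # one line per column with its type
passed <- mydata$Passed          # a column by name with $
ids    <- mydata[["ID"]]         # a column by name with [[ ]]
n_missing_color <- sum(is.na(mydata$Color))  # count missing values
```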
read.table("D:/DMPS/R Training/QUICK-R/import1.csv", header=TRUE, sep=",")
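Because the file path above exists only on the presenter's machine, the following sketch writes a small CSV to a temporary file and reads it back with the same read.table() arguments. The data frame contents and the temporary file are assumptions made for illustration.

```r
# Hypothetical data written to a temporary CSV file
original <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
csv_path <- tempfile(fileext = ".csv")
write.csv(original, csv_path, row.names = FALSE)

# header=TRUE takes column names from the first line; sep="," sets the delimiter
imported <- read.table(csv_path, header = TRUE, sep = ",")

unlink(csv_path)  # remove the temporary file
imported
```

For comma-separated files, read.csv() is a wrapper around read.table() with header=TRUE and sep="," as defaults.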
Variable labels
• The edit() function opens a data frame in R's
spreadsheet-like data editor, where the variable
labels can be specified
Value labels
• Use the factor() function for nominal data and the
ordered() function for ordinal data
• Suppose the variable v1 is coded 1, 2 or 3 and we
want to attach value labels 1=red, 2=blue and
3=green
mydata$v1=factor(mydata$v1, levels=c(1,2,3),
labels=c("red", "blue", "green"))
Value labels
• Suppose the variable y is coded 1, 3 or 5 and we
want to attach value labels 1=Low, 3=Medium, and
5=High
mydata$y=ordered(mydata$y, levels=c(1,3,5),
labels=c("Low", "Medium", "High"))
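The two kinds of value labels can be compared in a small self-contained sketch. The coded vectors v1 and y below are made-up stand-ins for the columns of mydata.

```r
# Nominal data: factor() attaches labels with no implied order
v1 <- c(1, 3, 2, 1)
v1_labelled <- factor(v1, levels = c(1, 2, 3),
                      labels = c("red", "blue", "green"))

# Ordinal data: ordered() also records that Low < Medium < High
y <- c(5, 1, 3)
y_labelled <- ordered(y, levels = c(1, 3, 5),
                      labels = c("Low", "Medium", "High"))

as.character(v1_labelled)
levels(y_labelled)
high_above_low <- y_labelled[1] > y_labelled[2]  # comparisons work on ordered factors
```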
Recoding variables
• Suppose we want to categorize age as follows:
>75 = Old, >45 and <=75 = Middle Aged, <=45 = Young
• This can be done as follows:
attach(mydata)
mydata$agecat[age<=45]="Young"
mydata$agecat[age>45 & age<=75]="Middle Aged" # & is the element-wise AND
mydata$agecat[age>75]="Old"
detach(mydata)
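The same recoding can also be sketched with cut(), which avoids attach() and handles all three ranges in one call. This is an alternative to the approach on the slide, not the slide's own method, and the ages are made up for the example.

```r
# Hypothetical data frame with an age column
mydata <- data.frame(age = c(30, 45, 46, 75, 76))

# Intervals are closed on the right by default:
# (-Inf, 45], (45, 75], (75, Inf)
mydata$agecat <- cut(mydata$age,
                     breaks = c(-Inf, 45, 75, Inf),
                     labels = c("Young", "Middle Aged", "Old"))

mydata
```

Note that cut() returns a factor, whereas the assignments above produce a character column.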
Renaming variables
• There are many ways to do this
• The simplest is using the fix() function
fix(mydata) # opens the data editor; results are saved on close
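Since fix() is interactive, renaming inside a script is usually done through names(). A minimal sketch, with hypothetical column names:

```r
# Hypothetical data frame with terse column names
mydata <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Rename one column by matching its current name
names(mydata)[names(mydata) == "x"] <- "ID"

# Rename a column by position
names(mydata)[2] <- "Group"

names(mydata)
```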
Selecting variables
• The following command can be used to select
variables
# selects variables v1, v3 and v15 in mydata
newdata=mydata[c("v1","v3","v15")]
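Columns can be selected by name or by position. The sketch below uses a small made-up data frame with columns v1 to v4, since the mydata of the slide is not available.

```r
# Hypothetical data frame with four variables
mydata <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9, v4 = 10:12)

by_name     <- mydata[c("v1", "v3")]   # select by column name
by_position <- mydata[c(1, 3)]         # select by column position

by_name
```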
Excluding/removing variables
• The following command can be used to exclude
variables from the analysis
# removes the 1st and 3rd variables in mydata
newdata=mydata[c(-1, -3)]
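Variables can also be excluded by name rather than by position, which keeps working if the column order changes. A sketch on a made-up data frame; the vector name excluded is an assumption for the example.

```r
# Hypothetical data frame with three variables
mydata <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9)

# Keep every column whose name is not in the exclusion list
excluded <- c("v1", "v3")
newdata <- mydata[!(names(mydata) %in% excluded)]

# Same result using negative positions, as on the slide
newdata_by_position <- mydata[c(-1, -3)]

names(newdata)
```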
Selecting observations
• Use the following commands to select observations
newdata=mydata[1:5,] # selects the first 5 observations in mydata
attach(mydata)
# selects males aged 65 and over
newdata=mydata[which(gender=="male" & age>=65),]
detach(mydata)
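The same row selection can be sketched without attach(), either by referring to the columns through mydata$ or with subset(). The data frame below is made up for the example.

```r
# Hypothetical data frame with gender and age
mydata <- data.frame(gender = c("male", "female", "male", "male"),
                     age = c(70, 80, 50, 65))

# Index with a logical condition built from the columns
males_65_plus <- mydata[which(mydata$gender == "male" & mydata$age >= 65), ]

# subset() evaluates the condition inside the data frame
males_65_plus_subset <- subset(mydata, gender == "male" & age >= 65)

males_65_plus
```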