Introduction to R
SAMSON LETA
samiwude@gmail.com
samson.leta@aau.edu.et
We’ll Cover
What is R
Packages in R
Based on S
Programming Environment
Strengths
Free and Open Source
Strong User Community
Highly extensible, flexible
Implementation of high end statistical methods
Flexible graphics and intelligent defaults
Weaknesses
Steep learning curve
Slow for large datasets
Installing R
RStudio (http://www.rstudio.com/products/rstudio/download/)
is an integrated development environment (IDE) for R.
It includes a console, a syntax-highlighting editor that
supports direct code execution, and tools for plotting,
history, debugging and workspace management.
RStudio
Basics
Highly Functional
Everything is an object
“<-” is an assignment operator
“X <- 5”: X GETS the value 5
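A minimal sketch of the points above: assignment with `<-`, and the idea that numbers, strings and functions are all objects. The names `Y` and `squares` are made up for illustration.

```r
# Assign the value 5 to the name X ("X gets 5")
X <- 5
X            # typing a name prints the object

# Objects can be reused in later expressions
Y <- X * 2

# Everything is an object, including functions
class(X)     # "numeric"
class("a")   # "character"
class(sum)   # "function"

# Because functions are objects, they can be passed to other functions
squares <- sapply(1:3, function(v) v^2)
squares
```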
Getting Help in R
From Documentation:
?WhatIWantToKnow
help("WhatIWantToKnow")
help.search("WhatIWantToKnow")
help.start()
getAnywhere("WhatIWantToKnow")
example("WhatIWantToKnow")
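As an illustration, the same calls can be tried on a real function. The choice of `mean` is only an example; results are stored in variables here (hypothetical names `h`, `hits`, `ga`) so the sketch also runs as a script, whereas in an interactive session you would normally just type `?mean`.

```r
# Find the help page for mean(); printing h opens the page
h <- help("mean")          # equivalent to ?mean

# Search all installed documentation for a keyword
hits <- help.search("mean")

# Locate an object wherever it is defined, even if it is not exported
ga <- getAnywhere("mean")
ga$where                   # where the object was found

# Run the examples from the help page of mean()
example("mean")
```

Note that R code needs straight quotes ("...") around strings; curly quotes pasted from slides or word processors cause a syntax error.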
Familiarizing with R
R comes with extensive documentation
help.start()
R objects - Data Structures
read.table()
Read Excel files with read_excel() from the readxl package
(base R has no read.excel() function)
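A small sketch of a data frame and a read.table() round trip. The data and the names `animals`, `animals2` and `path` are invented for illustration; a temporary file stands in for a real data file.

```r
# A data frame: columns of equal length, possibly of different types
animals <- data.frame(
  id     = 1:3,
  weight = c(250.5, 310.0, 275.2),
  sex    = c("female", "male", "female")
)

# Write it to a temporary tab-delimited text file
path <- tempfile(fileext = ".txt")
write.table(animals, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back; header = TRUE takes column names from the first line
animals2 <- read.table(path, header = TRUE, sep = "\t")

str(animals2)    # structure: column names and types
head(animals2)   # first rows
```

Excel files need an add-on package; one common option, assuming it is installed, is `readxl::read_excel("file.xlsx")`.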
Operators and Expressions
Logical Operators
Operator    Description
<           less than
<=          less than or equal to
>           greater than
>=          greater than or equal to
==          exactly equal to
!=          not equal to
!x          not x
x | y       x OR y
x & y       x AND y
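A short sketch of these operators applied to a vector. The vector `x` and the result names are made up for illustration; each comparison is applied element by element.

```r
x <- c(1, 5, 10)

lt   <- x < 5            # TRUE FALSE FALSE
le   <- x <= 5           # TRUE TRUE FALSE
eq   <- x == 5           # FALSE TRUE FALSE
ne   <- x != 5           # TRUE FALSE TRUE
neg  <- !(x > 5)         # TRUE TRUE FALSE
either <- x < 3 | x > 8  # TRUE FALSE TRUE
both   <- x > 3 & x < 8  # FALSE TRUE FALSE

# Logical vectors are often used to select elements
big <- x[x >= 5]         # 5 10
```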
Functions
R has a large number of functions; here are a few
frequently used mathematical functions:
abs(x)      the absolute value of x
exp(x)      the exponential function of x
log(x)      the natural logarithm of x (for x > 0);
            R has no ln() function
log10(x)    the log base 10 of x (for x > 0)
round(x)    x rounded to the nearest whole number
sqrt(x)     the square root of x (for x >= 0)
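A quick sketch of these functions on simple inputs. The input values and result names are chosen only for illustration.

```r
a <- abs(-3.2)      # 3.2
e <- exp(1)         # Euler's number, about 2.718
l <- log(exp(2))    # natural log undoes exp(), so about 2
t <- log10(1000)    # 3
r <- round(3.7)     # 4
s <- sqrt(16)       # 4

# These functions are vectorised: they work on whole vectors at once
sqrt(c(1, 4, 9))    # 1 2 3
```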
Statistical Functions
Descriptive Statistics
Statistical Modeling
Regressions:
Survival
Time series
Multivariate Functions
> summary()
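A minimal sketch of descriptive statistics. It assumes the built-in `mtcars` dataset as example data, which is not part of these slides.

```r
# Five-number summary plus the mean for one variable
s <- summary(mtcars$mpg)
s

# summary() on a whole data frame summarises every column
summary(mtcars[, c("mpg", "wt", "hp")])

# Individual descriptive statistics
mean(mtcars$mpg)
median(mtcars$mpg)
sd(mtcars$mpg)
table(mtcars$cyl)   # frequency table for a categorical variable
```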
Displaying data using plotting functions
> plot()
> hist()
> boxplot()
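The three plotting functions can be sketched as follows, again assuming the built-in `mtcars` dataset as example data. The names `h` and `b` are only used to keep the values that hist() and boxplot() return.

```r
# Scatter plot of two continuous variables
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Histogram of one continuous variable
h <- hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# Box plots of a continuous variable split by a grouping variable
b <- boxplot(mpg ~ cyl, data = mtcars,
             xlab = "Cylinders", ylab = "Miles per gallon")
```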
Statistical Modeling
Regressions
Simple
Multiple
How to model
Specify your model like this:
y ~ xi+ci, where
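A minimal sketch of the formula notation with lm(). The slide does not spell out what xi and ci stand for, so this example simply assumes they are explanatory variables and uses columns of the built-in `mtcars` dataset in their place.

```r
# Simple regression: one response, one explanatory variable
fit_simple <- lm(mpg ~ wt, data = mtcars)

# Multiple regression: add further terms with +
fit_multi <- lm(mpg ~ wt + hp, data = mtcars)

# Estimated intercept and slopes
coef(fit_simple)
coef(fit_multi)
```

The left side of `~` is the response; the right side lists the explanatory terms.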
Modeling -- object oriented
Model simplification
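One possible way to illustrate both points, assuming the built-in `mtcars` dataset: a fitted model is itself an object that other functions can inspect, and it can be simplified by dropping terms and comparing the fits. The model and variable choices are only for illustration.

```r
# A fitted model is an object of class "lm"
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)
class(full)

# Generic functions work on the model object
summary(full)        # coefficients, standard errors, R-squared
coef(full)           # estimates only
head(residuals(full))

# Simplify: drop one term and compare the nested models
reduced <- update(full, . ~ . - qsec)
cmp <- anova(reduced, full)   # F-test for the dropped term
cmp
AIC(full, reduced)            # lower AIC suggests the better trade-off

# Automated backward selection by AIC
stepped <- step(full, trace = 0)
formula(stepped)
```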
Model diagnosis
plot() - general
Normality - hist(), qqnorm()/qqline(), shapiro.test()
Homoscedasticity - plot the standardised residuals
against the predicted values
ncvTest() - library(car)
Linearity - plot each continuous variable against the
residuals
plot(age, res)
lines(lowess(age, res))
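The checks above can be sketched with base R as follows. The built-in `mtcars` dataset is assumed as example data, so `wt` stands in for the `age` variable on the slide; `res` and `pred` are the standardised residuals and fitted values of a hypothetical model.

```r
fit  <- lm(mpg ~ wt + hp, data = mtcars)
res  <- rstandard(fit)   # standardised residuals
pred <- fitted(fit)      # predicted values

# Normality of residuals
hist(res)
qqnorm(res)
qqline(res)
sw <- shapiro.test(res)  # small p-value suggests non-normal residuals
sw

# Homoscedasticity: residuals should scatter evenly around zero
plot(pred, res, xlab = "Predicted values", ylab = "Standardised residuals")
abline(h = 0, lty = 2)

# Linearity: the smoothed line should stay roughly flat
plot(mtcars$wt, res)
lines(lowess(mtcars$wt, res))

# ncvTest() needs the car package; run it only if car is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::ncvTest(fit))
}
```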
Regressions
influencePlot() - library(car)
Generalized linear models - glm()
Model                 Family /            Link       Explanatory variables /
                      random component               systematic component
Linear Regression     Normal              Identity   Continuous
Logistic Regression   Binomial            Logit      Mixed
Poisson Regression    Poisson             Log        Mixed
?family
binomial(link = "logit")
gaussian(link = "identity")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
poisson(link = "log")
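A minimal sketch of glm() with three of these families, assuming the built-in `mtcars` dataset as example data. The response variables are chosen only because they have a suitable type (binary, count, continuous), not because the models are meaningful.

```r
# Logistic regression: binary response (am is 0/1)
logit_fit <- glm(am ~ wt, data = mtcars,
                 family = binomial(link = "logit"))

# Poisson regression: count response (carb is a small integer count)
pois_fit <- glm(carb ~ wt, data = mtcars,
                family = poisson(link = "log"))

# Gaussian family with identity link: same estimates as lm()
gauss_fit <- glm(mpg ~ wt, data = mtcars,
                 family = gaussian(link = "identity"))

summary(logit_fit)
exp(coef(logit_fit))   # odds ratios for the logistic model
```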
For more resources, check out…
R home page
http://www.r-project.org
R discussion group
http://www.stat.math.ethz.ch/mailman/listinfo/r-help