
Introduction to R

SAMSON LETA
samiwude@gmail.com
samson.leta@aau.edu.et
We’ll Cover

What is R

How to obtain and install R

Packages in R

How to read and export data

How to do basic statistical analyses

LM and GLM models in R


What is R

Software for Statistical Data Analysis

- written by Robert Gentleman and Ross Ihaka

Based on S

Programming Environment

Data Storage, Analysis, Graphing


Brief Introduction to R

Available at www.r-project.org
R is Free and Open Source Software
Runs on a wide variety of platforms:
 UNIX, Windows and MacOS.
R allows you to carry out statistical analyses in an interactive mode, as well as allowing simple programming
Current Version: R-3.5.0
Strengths and Weaknesses

Strengths
 Free and Open Source
 Strong User Community
 Highly extensible, flexible
 Implementation of high end statistical methods
 Flexible graphics and intelligent defaults
Weaknesses
 Steep learning curve
 Slow for large datasets
Installing R

 To use R, you first need to install the R program on your computer.
 Installing R on a Windows PC – from the Comprehensive R Archive Network (CRAN): http://cran.r-project.org


Starting R

On Windows, double-click the R desktop icon


R Working Area

This is the area where all commands are issued, and where non-graphical output is shown when R is run interactively
Installing an R package

Sometimes we need additional functionality beyond that offered by the core R library.

You can install an additional package from CRAN
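As a minimal sketch of the step above: a package is installed once with install.packages() and then loaded in each session with library(). The package names used here (car, MASS) are only examples, and the install line is commented out because it needs an internet connection.

```r
# Install a contributed package from CRAN (run once; needs internet access).
# install.packages("car")

# Load a package for the current session. MASS is a "recommended" package
# that ships with standard R installations.
library(MASS)

# Check whether a package is installed. This loads its namespace if present
# but does not attach it to the search path.
car_installed <- requireNamespace("car", quietly = TRUE)
print(car_installed)
```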
Installing RStudio

RStudio (http://www.rstudio.com/products/rstudio/download/) is an integrated development environment (IDE) for R.
It includes a console, a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
RStudio
Basics

 Highly Functional

 Everything done through functions


 Arguments are matched by name or by position
 Argument names may be abbreviated; T is a shorthand for TRUE, but spelling out TRUE is safer because T can be reassigned
 Object Oriented

 Everything is an object
 “<-” is an assignment operator
 “X <- 5”: X GETS the value 5
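The points on this slide can be tried directly at the console. The following is a small sketch; the variable names (x, a, b) are arbitrary.

```r
# Assignment: x gets the value 5.
x <- 5

# Everything is an object, and every object has a class.
class(x)       # "numeric"
class(mean)    # "function"

# Everything is done through functions. Arguments can be supplied
# by name or by position; both calls below build the same sequence.
a <- seq(from = 1, to = 10, by = 3)
b <- seq(1, 10, 3)
print(a)
print(b)
```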
Getting Help in R

 From Documentation:

 ?WhatIWantToKnow
 help("WhatIWantToKnow")
 help.search("WhatIWantToKnow")
 help.start()
 getAnywhere("WhatIWantToKnow")
 example("WhatIWantToKnow")
Familiarizing with R
R comes with extensive documentation
help.start()
R objects - Data Structures

 Supports virtually any type of data

 Numbers, characters, logicals (TRUE/ FALSE)

 Arrays of virtually unlimited sizes

 Simplest: Vectors and Matrices

 Lists: Can Contain mixed type variables

 Data Frame: Rectangular Data Set
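A short sketch of the data structures listed above. The values and names (animals, info, and so on) are made up for illustration.

```r
# Vectors hold values of a single type.
weights <- c(250.5, 45.2, 38.7)            # numeric
species <- c("cattle", "sheep", "goat")    # character
sick    <- c(TRUE, FALSE, TRUE)            # logical

# A matrix is a two-dimensional arrangement of one type (filled column-wise).
m <- matrix(1:6, nrow = 2, ncol = 3)

# A list can contain elements of mixed types and lengths.
info <- list(name = "herd A", size = 120L, vaccinated = TRUE)

# A data frame is a rectangular data set: one row per observation,
# one column per variable, and columns may differ in type.
animals <- data.frame(species = species, weight = weights, sick = sick)

str(animals)
dim(m)
info$size
```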


In an R Session…. A to Z

First, read data from other sources

 Use packages, libraries, and functions


 Write functions wherever necessary
Conduct Statistical Data Analysis
• Save outputs to files, write tables
Save R workspace if necessary (exit prompt)
Reading data into R

 R not well suited for data preprocessing

 Preprocess data elsewhere (Excel, SPSS, etc)

 Easiest form of data to input: text, csv file

 Read from other systems:

 Use the library “foreign”: library(foreign)


 Can import from SAS, SPSS, Epi Info and STATA
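A hedged sketch of the foreign package in use. To stay self-contained it writes a small Stata file to a temporary location and reads it back; the data and the file names in the commented lines are placeholders. Note that read.dta() only handles older Stata file versions, so recent Stata files may need a different package.

```r
library(foreign)  # recommended package, included with standard R installations

# Hypothetical example data.
animals <- data.frame(id = c(1, 2, 3), weight = c(250.5, 310.0, 45.2))

# Write a Stata file to a temporary location, then read it back.
dta_file <- tempfile(fileext = ".dta")
write.dta(animals, dta_file)
from_stata <- read.dta(dta_file)
print(from_stata)

# Readers for other systems follow the same pattern
# (the file names below are placeholders):
# spss_data <- read.spss("survey.sav", to.data.frame = TRUE)
# sas_data  <- read.xport("study.xpt")
# epi_data  <- read.epiinfo("records.rec")
```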
Reading Data into R

Read TXT files with read.delim() or read.table()

Read Comma Delimited files with read.csv() or read.table()

Read Excel files with read_excel() from the readxl package
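The readers above can be sketched as a round trip: write a small data frame to temporary text files, then read it back. The data are invented, and the Excel line is commented out because it assumes the readxl package is installed and that a file with that (placeholder) name exists.

```r
# Hypothetical example data.
animals <- data.frame(id = 1:3,
                      species = c("cattle", "sheep", "goat"),
                      weight = c(250.5, 45.2, 38.7))

# Comma-delimited file: export with write.csv(), import with read.csv().
csv_file <- tempfile(fileext = ".csv")
write.csv(animals, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file)

# Tab-delimited text file: export with write.table(), import with read.delim().
txt_file <- tempfile(fileext = ".txt")
write.table(animals, txt_file, sep = "\t", row.names = FALSE, quote = FALSE)
from_txt <- read.delim(txt_file)

# read.table() is the general reader; state the header and separator explicitly.
from_table <- read.table(csv_file, header = TRUE, sep = ",")

# Excel file (requires the readxl package; placeholder file name):
# from_xlsx <- readxl::read_excel("animals.xlsx")

print(from_csv)
```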
Operators and Expressions

The following tables show the standard arithmetic, relational and logical operators you may use in expressions:

Arithmetic Operators
Operator    Description
+           addition
-           subtraction
*           multiplication
/           division
^ or **     exponentiation
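A few lines to try at the console. The heights vector is an invented example showing that arithmetic operators work element by element on vectors.

```r
7 + 2     # 9
7 - 2     # 5
7 * 2     # 14
7 / 2     # 3.5
7 ^ 2     # 49
7 ** 2    # 49 (the parser treats ** as ^)

# Operators are vectorised: the division is applied to every element.
heights_cm <- c(150, 165, 180)
heights_m  <- heights_cm / 100
print(heights_m)
```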
Operators and Expressions

Relational and Logical Operators
Operator    Description
<           less than
<=          less than or equal to
>           greater than
>=          greater than or equal to
==          exactly equal to
!=          not equal to
!x          Not x
x|y         x OR y
x&y         x AND y
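These operators return logical vectors, which are often used to select elements. A small sketch with an invented age vector:

```r
age <- c(2, 5, 8, 11)

age > 5                # FALSE FALSE  TRUE  TRUE
age == 5               # FALSE  TRUE FALSE FALSE
age != 5               #  TRUE FALSE  TRUE  TRUE
!(age > 5)             #  TRUE  TRUE FALSE FALSE
age < 3 | age > 10     #  TRUE FALSE FALSE  TRUE
age >= 5 & age <= 8    # FALSE  TRUE  TRUE FALSE

# A logical vector can be used to subset another vector.
mid_ages <- age[age >= 5 & age <= 8]
print(mid_ages)        # 5 8
```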
Operators and Expressions

Functions
R has a large number of functions; here are a few frequently-used mathematical functions:
abs(x)      the absolute value of x
exp(x)      the exponential function of x
log(x)      the natural logarithm of x (for x > 0); R has no ln() function
log10(x)    the log base 10 of x (for x > 0)
round(x)    x rounded to the nearest whole number
sqrt(x)     the square root of x (for x >= 0)
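The functions above can be checked at the console with a few simple values. The digits argument of round() is shown as an extra.

```r
abs(-3.2)                     # 3.2
exp(1)                        # 2.718282
log(exp(2))                   # 2
log10(1000)                   # 3
round(3.7)                    # 4
round(3.14159, digits = 2)    # 3.14
sqrt(16)                      # 4
```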
Statistical Functions

 Descriptive Statistics

 Statistical Modeling

 Regressions:
 Survival
 Time series
 Multivariate Functions

 Inbuilt Packages, contributed packages


Descriptive Statistics

 Has functions for all common statistics

 summary() gives the minimum, first quartile, median, mean, third quartile and maximum for numeric variables


 table() gives tabulation of categorical variables
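A brief sketch using mtcars, a data set that ships with R (chosen here only for illustration; it is not part of the original slides).

```r
data(mtcars)

# Six-number summary of a numeric variable (miles per gallon).
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Frequency table of a categorical variable (number of cylinders).
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# Cross-tabulation of two categorical variables.
table(mtcars$cyl, mtcars$am)
```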
Data description

> summary()
 Displaying data using plotting functions
> plot()
> hist()
> boxplot()
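The plotting functions above, sketched with the built-in mtcars data (an assumption for illustration). When run non-interactively, R writes the plots to a file instead of opening a window.

```r
# Scatter plot of two numeric variables.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Histogram of one numeric variable; the return value holds the bin counts.
h <- hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# Box plots of a numeric variable by group, using a formula.
b <- boxplot(mpg ~ cyl, data = mtcars,
             xlab = "Cylinders", ylab = "Miles per gallon")
```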
Statistical Modeling

Over 400 functions

 lm, glm, aov, t.test


Numerous libraries & packages

 lattice, MASS, survival, …


Regressions

Linear models (lm)

Generalized linear models (glm)


Regressions

Fitting linear model

 Simple

 Multiple
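A minimal sketch of simple and multiple linear regression with lm(), using the built-in mtcars data as a stand-in for your own data set.

```r
# Simple linear regression: one explanatory variable.
simple_fit <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: several explanatory variables.
multiple_fit <- lm(mpg ~ wt + hp, data = mtcars)

coef(simple_fit)
summary(multiple_fit)
```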
Regressions

How to model
Specify your model like this:

 y ~ xi + ci, where
 y = outcome variable, xi = main explanatory variables, ci = covariates
 Operators have special meanings
 + = add terms, : = interactions, / = nesting, and so on…
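To see how R expands the formula operators, the term labels of a formula can be inspected without fitting a model. labels_of() is a hypothetical helper defined here for illustration, and the variable names are placeholders. The * operator is not on the slide; it is included because it is a common shorthand for main effects plus interaction.

```r
# Hypothetical helper: return the terms that a formula expands to.
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(y ~ x + c1)           # "x" "c1"
labels_of(y ~ x + c1 + x:c1)    # "x" "c1" "x:c1"
labels_of(y ~ x * c1)           # "x" "c1" "x:c1"  (x * c1 means x + c1 + x:c1)
labels_of(y ~ a / b)            # "a" "a:b"        (b nested within a)
```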
Regressions

How to model
 Modeling -- object oriented

 each modeling procedure produces objects


 classes and functions for each object
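A short sketch of what "object oriented" means in practice: the fitted model is an object with a class, and generic functions such as summary(), coef() and predict() know how to handle it. The mtcars data and the new values passed to predict() are chosen only for illustration.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

class(fit)              # "lm"

# Generic functions dispatch on the class of the object.
coef(fit)               # estimated coefficients
head(residuals(fit))    # residuals
head(fitted(fit))       # fitted values
fit_summary <- summary(fit)
print(fit_summary)

# Prediction for a new observation.
predict(fit, newdata = data.frame(wt = 3, hp = 110))
```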
Regressions

Model simplification

 Comparing nested models (anova)


 Stepwise variable elimination (stepAIC) – library MASS
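Both approaches can be sketched as follows. The choice of variables from mtcars is arbitrary and only meant to show the calls.

```r
library(MASS)

full    <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)

# F-test comparing two nested models: does the full model fit better?
comparison <- anova(reduced, full)
print(comparison)

# Stepwise elimination by AIC, starting from the full model.
# With no scope given, stepAIC() only removes terms (backward selection).
selected <- stepAIC(full, trace = FALSE)
formula(selected)
```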
Regressions

Model diagnosis
 plot() – general
 Normality – hist(), qqnorm()/qqline(), shapiro.test()
 Homoscedasticity – plot the standardised residuals against the predicted values; ncvTest() in library(car)
 Linearity – plot each continuous variable against the residuals
plot(age, res)
lines(lowess(age, res))
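A hedged sketch of these checks on a model fitted to the built-in mtcars data, with wt standing in for the continuous variable age on the slide. The car package is used only if it is installed.

```r
fit  <- lm(mpg ~ wt + hp, data = mtcars)
res  <- rstandard(fit)   # standardised residuals
pred <- fitted(fit)      # predicted values

# Normality of residuals.
hist(res)
qqnorm(res)
qqline(res)
normality <- shapiro.test(res)
print(normality)

# Homoscedasticity: standardised residuals against predicted values.
plot(pred, res)
abline(h = 0, lty = 2)
if (requireNamespace("car", quietly = TRUE)) {
  print(car::ncvTest(fit))
}

# Linearity: residuals against a continuous predictor, with a smoother.
plot(mtcars$wt, res)
lines(lowess(mtcars$wt, res))

# General: the default diagnostic plots for an lm object.
par(mfrow = c(2, 2))
plot(fit)
```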
Regressions

Assessment of individual observations


Outliers – outlierTest(), qqPlot() (library car)

Leverage – observations with extreme X values

Influential observations – cooks.distance(), influencePlot() (library car)
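A sketch of these checks on a model fitted to mtcars. hatvalues() is not named on the slide; it is the base R function for leverage and is included here as an assumption about how leverage would be inspected. The car functions run only if that package is installed.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Influence: Cook's distance for every observation; show the three largest.
cooks <- cooks.distance(fit)
head(sort(cooks, decreasing = TRUE), 3)

# Leverage: diagonal of the hat matrix (large values = unusual X values).
lev <- hatvalues(fit)
head(sort(lev, decreasing = TRUE), 3)

# Outlier test and plots from the car package, if available.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::outlierTest(fit))
  car::qqPlot(fit)
  car::influencePlot(fit)
}
```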
Generalized linear models -glm

Model                  Family (random component)   Link       Explanatory variables (systematic component)
Linear Regression      Normal                      Identity   Continuous
ANOVA                  Normal                      Identity   Categorical
Logistic Regression    Binomial                    Logit      Mixed
Poisson Regression     Poisson                     Log        Mixed
Generalized linear models -glm

?family
o binomial(link = "logit")
o gaussian(link = "identity")
o Gamma(link = "inverse")
o inverse.gaussian(link = "1/mu^2")
o poisson(link = "log")
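A minimal sketch of glm() with two of the families listed above, again using mtcars purely for illustration: am (0/1 transmission type) as a binary outcome and carb (number of carburettors) as a count.

```r
# Logistic regression: binary outcome, binomial family, logit link.
logit_fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
summary(logit_fit)

# Coefficients are on the log-odds scale; exponentiate for odds ratios.
exp(coef(logit_fit))

# Poisson regression: count outcome, poisson family, log link.
pois_fit <- glm(carb ~ wt, family = poisson(link = "log"), data = mtcars)
summary(pois_fit)
```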
For more resources, check out…

R home page
http://www.r-project.org
R discussion group
http://www.stat.math.ethz.ch/mailman/listinfo/r-help

Search Google for R and Statistics


The End
THANK YOU
