This document provides an introduction to programming in R for data science. It discusses installing R and RStudio, basic R syntax including comments and help documentation, and different data types and objects in R like vectors, matrices, factors, lists, and data frames. Key points covered include the history of R's development, downloading and setting up the coding environment, writing and running R scripts, and the basic atomic classes of numeric, integer, character, logical, and raw objects.
Week 1 - Introduction to R
• History and overview
• Getting started with R
  o Install R
  o Basic R syntax
  o Prepare the coding environment in R
  o R script
  o Help and documentation
• Data types and objects in R

History and Overview
• R was developed from the S language (in his book “R Programming for Data Science”, Roger Peng writes that “R is a dialect of S”).
• R is named after the first letter of the first names of its two authors, Robert Gentleman and Ross Ihaka, from the University of Auckland.
• The R system is divided into two parts: the base system, downloaded from CRAN, and everything else.
• CRAN = Comprehensive R Archive Network. It is a collection of sites which carry identical material, consisting of the R distribution(s), the contributed extensions, documentation for R, and binaries.
• The CRAN master site at WU (Vienna University of Economics and Business) in Austria can be found at https://cran.r-project.org/ and is mirrored daily to many sites around the world. Please use the CRAN site closest to you to reduce network load (there are 2 CRAN sites in the UK, one at Univ. Bristol and one at Imperial College London).
• R functionality: base packages plus packages for domain usage

Getting Started with R - R installation
• Browser → Internet search for ‘R’ → CRAN (Comprehensive R Archive Network) → pick the appropriate link for download
  o In Windows
  o In MacOS
  o In Linux
• The R installation process is self-explanatory.
• A slick visual interface for R (RStudio) can be downloaded and installed once R is ready on your PC/laptop. This is the software that the majority of R users use.
• Jupyter Notebook also supports R and can be used for interactively developing and presenting data science projects using R.

R & RStudio Environment
[Figure: RStudio window, with the text editor for writing scripts and the console labelled]

Basic R Syntax
• R command prompt
  o all commands are typed at the R prompt >
  o the end of a command is indicated by the return key ↵
• Comments
  o explanatory text in an R program, ignored by the R interpreter
  o a single-line comment starts with #
Example at the command prompt:

    # comment
    myString <- "Welcome to Module COMP1832"   # an R expression
    print(myString)                            # another R expression

R creates a variable called myString and replaces any existing variable with the same name.

Getting Started with R - Write & execute scripts in R
• R script
  o is a series of commands that you can execute at one time
  o is a plain text file with R commands in it
• A script is a good way to keep track of what you're doing. If you have a long analysis and want to be able to recreate it later, it is a good idea to type it into a script.
• A text editor is embedded in R/RStudio for writing scripts, or you can use any text editor to prepare your script.
• Different ways to execute scripts in R
  o Save the script as a .R file in your working directory and load it in the R console
  o Copy and paste code from the text editor into your R console
  o Run the script (whole, or partially by selection) directly in RStudio
  o Save the script as a .txt file in your working directory and type source(file = "script_name.txt") in the R console to read it

Getting Started with R - Prepare coding environment
• The working directory is where R looks for files you ask it to load and where R will put any files that you ask it to save.
  o check working directory: getwd()
  o set working directory: setwd()
• How to set up a working directory (check this week’s supportive reading for details) in
  o Windows
  o Mac (equivalent commands apply on Linux, since both Mac and Linux are based on Unix)
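The script and working-directory commands above can be tried together. The following is a minimal sketch, assuming a throwaway script written to R's temporary directory. The file name script_name.txt follows the slide; the variable names (old_wd, script_path, script_exists) and the script contents are made up for illustration.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Use R's per-session temporary directory as a scratch working directory
# (an assumption for this sketch; normally this would be your project folder).
setwd(tempdir())

# Write a two-line script as a plain text file. The file name is the one
# used on the slide; the contents are hypothetical.
script_path <- "script_name.txt"
writeLines(c(
  "# a comment inside the script",
  "myString <- \"Welcome to Module COMP1832\""
), con = script_path)

# The file should now appear among the files in the working directory.
script_exists <- script_path %in% list.files()

# Run every command in the script; myString is created in the workspace.
source(file = script_path)
print(myString)

# Tidy up: delete the scratch file and return to the original directory.
file.remove(script_path)
setwd(old_wd)
```

Saving the old directory first and restoring it at the end keeps the experiment from changing where later commands look for files.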
• Check objects in the workspace: ls()
• Check files in the working directory: list.files()
• Delete everything in the workspace to clean it up: rm(list = ls())

Getting Started with R - Help & Documentation
• Get help in R
  o Type ?function_name or ?'operator' in the R console for details of functions and operators
  o https://www.r-project.org/help.html
  o https://www.rdocumentation.org
• Get help from the Internet
  o Google it
• Ask others for help by following the rules
  o Moodle discussion forum
  o Stack Overflow https://stackoverflow.com
• Questions to answer when asking for help
  o What version of R/packages are you using?
  o What operating system?
  o Can the problem be reproduced?
  o What steps will reproduce the problem?
  o What is the expected output?
  o What do you see instead?
• Get help in RStudio
  o Use the help/plot panel
  o Your help search record is saved, so you can review the list of topics you have looked up

Getting Started with Statistics - Help & Documentation
• Get help in Statistics from Stack Exchange
  https://stats.stackexchange.com/
• Get help from TalkStats.com
  http://www.talkstats.com
• Get help from Stack Overflow
  https://stackoverflow.com/

Object types and structures
• Everything that we encounter in R is called an object.
• Objects can be of different types.
• Basic classes: 6 atomic classes of objects
  o Numeric (real number, decimal number)
  o Integer (0L, 2L, 23L, 999L)
  o Complex (e.g., 2+5i)
  o Character ('a', "1", "23", "TRUE")
  o Logical (TRUE/FALSE)
  o Raw (holds bytes; used to process data byte by byte)
  https://www.tutorialkart.com/r-tutorial/r-data-types/
• Vectors
  o the most basic and simplest R objects
  o elements of only one class; otherwise coercion happens
• Lists
  o a special type of vector that can contain elements of different classes
• Matrices
  o created from a vector by adding the dimension attribute
  o elements of only one class
• Factors
  o represent categorical data/nominal variables (e.g., male, female)
  o can be thought of as an integer vector with labels
  o an ordered factor represents an ordinal variable
• Data Frames
  o a special type of list for storing tabular data
  o can contain elements of different classes
• Arrays
  o generalise matrices to store data in more than 2 dimensions
  o elements of only one class

R’s base object structures can be organised by their dimensionality (1d, 2d, or nd) and whether they’re homogeneous (all elements must be of the same type) or heterogeneous (elements can be of different types):
Dimension   Homogeneous      Heterogeneous
1d          Atomic vector    List
            Factor
2d          Matrix           Data frame
nd          Array
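
To make the classes and structures above concrete, here is a minimal sketch using only base R. All object names (num, int, cpx, chr, lgl, byt, mixed, m, lst, sizes, df, arr) and their values are hypothetical examples, not taken from the slides.

```r
# The six atomic classes, one example value each.
num <- 3.14            # numeric
int <- 23L             # integer (note the L suffix)
cpx <- 2+5i            # complex
chr <- "TRUE"          # character, even though it looks like a logical
lgl <- TRUE            # logical
byt <- charToRaw("A")  # raw: one byte holding the code for "A"

atomic_classes <- c(class(num), class(int), class(cpx),
                    class(chr), class(lgl), class(byt))
print(atomic_classes)

# A vector holds one class only, so mixing classes triggers coercion:
# here the number and the logical are both converted to character.
mixed <- c(1, "a", TRUE)
print(class(mixed))   # "character"

# A matrix is a vector with a dimension attribute (2 rows, 3 columns).
m <- 1:6
dim(m) <- c(2, 3)

# A list can hold elements of different classes.
lst <- list(1L, "a", TRUE)

# A factor stores categories as integer codes plus labels (levels);
# ordered = TRUE makes it an ordinal variable.
sizes <- factor(c("low", "high", "low"),
                levels = c("low", "high"), ordered = TRUE)

# A data frame is a list of equal-length columns that may differ in class.
df <- data.frame(id = 1:3, name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# An array extends the matrix idea to more than two dimensions.
arr <- array(1:24, dim = c(2, 3, 4))
```

Calling class(), dim(), or str() on any of these objects at the console is a quick way to check which cell of the table an object belongs to.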