
Programming Fundamentals for Data Science
COMP1832

Dr. Jia Wang


Week 1 - Introduction to R
• History and overview
• Getting started with R
o Install R
o Basic R syntax
o Prepare the coding environment in R
o R script
o Help and documentation
• Data types and objects in R
History and Overview
§ R was developed from the S language (Roger Peng
said in his book “R Programming for Data Science”
that “R is a dialect of S”).

§ R is named after the first letter of the first names of its two
authors (Robert Gentleman and Ross Ihaka) from the
University of Auckland.
History and Overview
§ The R system is divided into two parts: the base system
(downloaded from CRAN) and everything else
§ CRAN = Comprehensive R Archive Network
It is a collection of sites which carry identical material,
consisting of the R distribution(s), the contributed extensions,
documentation for R, and binaries.
§ The CRAN master site at WU (Vienna University of Economics
and Business) in Austria can be found at the URL
https://cran.r-project.org/ and is mirrored daily to many sites
around the world. Please use the CRAN site closest to you to
reduce network load (2 CRAN sites in the UK, one at Univ. Bristol
and one at Imperial College London).
History and Overview
§ R functionality (base packages plus packages for domain
usage)
Getting Started with R
- R installation
§ Browser → Internet search for ‘R’ → CRAN (Comprehensive R
Archive Network) → pick the appropriate link for download
o In Windows
o In macOS
o In Linux
§ The R installation process is self-explanatory
Getting Started with R
- R installation
§ A slick visual interface for R (RStudio) can be
downloaded and installed once R is ready on your
PC/laptop. This is the software the majority of R users use.

§ Jupyter Notebook also supports R and can be used for
interactively developing and presenting data science
projects using R
R & RStudio Environment

Text editor for writing scripts
Console
Basic R Syntax
§ R command prompt
o all commands are typed at the R prompt >
o the end of a command is indicated by the return key ↵
§ Comments
o explanatory text in an R programme, ignored by the R interpreter
o a single-line comment is written using #
Basic R Syntax

# marks a comment; > is the command prompt

myString <- "Welcome to Module COMP1832" is an R expression

print(myString) is another R expression

R creates a variable called myString and replaces any
old one with the same name.
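A minimal sketch of the two expressions above, as they could be typed at the prompt or saved in a script. R is case-sensitive, so the function is written print, in lower case. The second assignment is an added illustration of how an existing variable with the same name is replaced.

```r
# This line is a comment and is ignored by the R interpreter.
myString <- "Welcome to Module COMP1832"   # create the variable myString
print(myString)                            # display its value

# Assigning to the same name again replaces the old value.
myString <- "Week 1 - Introduction to R"
print(myString)
```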
Getting Started with R
- Write & execute scripts in R
§ R script
o is a series of commands that you can execute at one time
o is a plain text file with R commands in it.

§ A script is a good way to keep track of what you're doing. If you
have a long analysis, and you want to be able to recreate it
later, a good idea is to type it into a script.

§ A text editor is embedded in R/RStudio for writing scripts, or you
can use any text editor to prepare your script.
Getting Started with R
- Write & execute scripts in R
§ Different ways to execute scripts in R
✓ Save the script as a .R file in your working directory and load it in the R console
✓ Copy and paste code from a text editor into your R console
✓ Run scripts (whole or partially by selection) directly (in RStudio)
✓ Save the script as a .txt file in your working directory and type
source(file = "script_name.txt") in the R console to read it
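The source() route can be sketched as follows. To keep the example self-contained it first writes a small script to a temporary folder; the file contents and the use of tempdir() are assumptions for illustration, and in practice the file would sit in your working directory.

```r
# Create a two-line script file (hypothetical contents) in a temporary folder.
script_path <- file.path(tempdir(), "script_name.txt")
writeLines(c("x <- 1 + 2", "y <- x * 10"), script_path)

# Execute every command in the file, as if each had been typed at the console.
source(file = script_path)

print(y)   # [1] 30
```

source() accepts any plain text file containing R commands, so the .txt and .R extensions behave the same way here.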
Getting Started with R
- Prepare coding environment
§ The working directory is where R looks for files you ask it to load and
where R will put any files that you ask it to save.
o check the working directory: getwd()
o set the working directory: setwd()

§ How to set up a working directory (check this week’s supporting
reading for details) on
o Windows
o Mac (equivalent commands on Linux, since both Mac and Linux are
based on Unix)

§ Check objects in the workspace: ls()
§ Check files in the working directory: list.files()
§ Delete everything in the workspace to clean it up: rm(list=ls())
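A short sketch of these commands used together. The folder (tempdir()), the file name and the object names a and b are placeholders, not part of the module material. Only the two example objects are removed here; rm(list=ls()) would clear the whole workspace, including old_wd.

```r
# Remember the current working directory so it can be restored at the end.
old_wd <- getwd()

# Use a scratch folder as the working directory (stands in for a project folder).
setwd(tempdir())

# Files are saved to, and listed from, the working directory.
writeLines("hello", "example.txt")
"example.txt" %in% list.files()   # TRUE

# Objects live in the workspace, which is separate from files on disk.
a <- 1
b <- "two"
ls()        # the listing now includes "a" and "b"
rm(a, b)    # remove just these two objects

# Go back to where we started.
setwd(old_wd)
```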
Getting Started with R
- Help & Documentation
§ Get help in R
o Type ?function_name or ?'operator' in the R console for details of
functions and operators
o https://www.r-project.org/help.html
o https://www.rdocumentation.org
§ Get help from the Internet
o Google it
§ Ask others for help by following the rules
o What version of R/packages are you using?
o What operating system?
o Can the problem be reproduced?
o What steps will reproduce the problem?
o What is the expected output?
o What do you see instead?
§ Where to ask
o Moodle discussion forum
o Stack Overflow https://stackoverflow.com
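As a sketch, the built-in help can be called as shown below. The last two lines are one way to collect the version and operating system details that are useful when asking others for help. mean and + are arbitrary examples.

```r
# Open the documentation page for a function.
?mean
help("mean")    # equivalent long form

# Operators must be quoted.
?"+"

# Details worth including in a question to others.
r_version <- R.version.string   # the version of R in use
r_version
sessionInfo()                   # operating system and loaded package versions
```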
Getting Started with R
- Help & Documentation
§ Get help in RStudio
o Use the help/plot panel
o Your help search history is saved, so you can review all the help
pages you have looked up
Getting Started with Statistics
- Help & Documentation
§ Get help from Statistics Stack Exchange
https://stats.stackexchange.com/

§ Get help from TalkStats.com
http://www.talkstats.com

§ Get help from Stack Overflow
https://stackoverflow.com/
Object types and structures
§ We call everything that we encounter in R objects.
§ Objects can be of different types.
§ Basic classes: there are 6 atomic classes of objects
o Numeric (real numbers, decimal numbers)
o Integer (0L, 2L, 23L, 999L)
o Complex (e.g., 2+5i)
o Character ("a", "1", "23", "TRUE")
o Logical (TRUE/FALSE)
o Raw (holds bytes; used to process data byte by byte)
https://www.tutorialkart.com/r-tutorial/r-data-types/
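The six atomic classes can be checked with class(). The values below are arbitrary examples; charToRaw() is used only as a convenient way to obtain a raw value.

```r
num <- 3.14             # numeric
int <- 23L              # integer (note the L suffix)
cpx <- 2+5i             # complex
chr <- "TRUE"           # character, because of the quotes
lgl <- TRUE             # logical
rw  <- charToRaw("a")   # raw: the byte that encodes "a"

class(num)   # "numeric"
class(int)   # "integer"
class(cpx)   # "complex"
class(chr)   # "character"
class(lgl)   # "logical"
class(rw)    # "raw"
```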
Object types and structures
§ Vectors
o the most basic and simplest R objects
o hold elements of only one class; otherwise coercion
happens
§ Lists
o a special type of vector that can contain elements of
different classes
§ Matrices
o created from a vector by adding a dimension attribute
o hold elements of only one class
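A small sketch of the three structures, using made-up values: mixing types in a vector triggers coercion, a list keeps each element's class, and a dim attribute turns a vector into a matrix.

```r
# Vector: mixed inputs are coerced to a single class (here, character).
v <- c(1, "a", TRUE)
class(v)             # "character"
v                    # "1"    "a"    "TRUE"

# List: each element keeps its own class.
lst <- list(1, "a", TRUE)
sapply(lst, class)   # "numeric"   "character" "logical"

# Matrix: a vector with a dimension attribute (2 rows, 3 columns).
m <- 1:6
dim(m) <- c(2, 3)
is.matrix(m)         # TRUE
m[2, 3]              # 6 (the matrix is filled column by column)
```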
Object types and structures
§ Factors
o represent categorical data/nominal variables (e.g.,
male, female)
o can be considered an integer vector with labels
o an ordered factor represents an ordinal variable
§ Data Frames
o a special type of list for storing tabular data
o columns can contain elements of different classes
§ Arrays
o a generalisation of matrices that can store data in more than 2
dimensions
o hold elements of only one class
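The following sketch illustrates each structure with invented data; the variable names, the level names and the people in the data frame are all hypothetical.

```r
# Factor: integer codes with labels (levels are sorted alphabetically by default).
sex <- factor(c("male", "female", "female"))
levels(sex)          # "female" "male"
as.integer(sex)      # 2 1 1

# Ordered factor: the levels have a meaningful order.
size <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"), ordered = TRUE)
size[1] < size[2]    # TRUE, because low < high

# Data frame: a list of equal-length columns, each with its own class.
df <- data.frame(name = c("Ann", "Bob"), age = c(31, 45),
                 stringsAsFactors = FALSE)
is.list(df)          # TRUE
sapply(df, class)    # name: "character", age: "numeric"

# Array: like a matrix, but with more than 2 dimensions.
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)             # 2 3 4
```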
Object types and structures
R’s base object structures can be organised by their
dimensionality (1d, 2d, or nd) and whether they’re homogeneous
(all elements must be of the same type) or heterogeneous
(elements can be of different types):

Dimension   Homogeneous             Heterogeneous
1d          Atomic vector, Factor   List
2d          Matrix                  Data frame
nd          Array
