Statistics with R

Some quotes on data

• Data is the new oil. – Clive Humby (the famous phrase was later embraced by
the World Economic Forum in 2011)
• Data is the new oil. We need to find it, extract it, distribute it and monetize it. –
David Buckingham
• The world’s most valuable resource is no longer oil, but data. – The Economist,
May 2017
• The data is not only new oil, but also new soil. – Mukesh Ambani (Hindustan
Times Leadership Summit, 2017)
• Fundamentals of R
• Overview of the language
• Input and Output of Data
• Operators in R
• Variables in R
History

R is an implementation of the S programming language. S was created by John
Chambers in 1976, while at Bell Labs. There are some important differences, but
much of the code written for S runs unaltered. R was created by Ross Ihaka and Robert
Gentleman at the University of Auckland, New Zealand, and is developed by the R
Development Core Team. R is named partly after the first names of the first two R
authors and partly as a play on the name of S. The project was conceived in 1992,
with an initial version released in 1995 and the first stable version (v1.0) on 29
February 2000. (from Wikipedia)

Official website: https://www.r-project.org/


The work was originally published in a research paper
• According to TIOBE Index Oct 2022, ‘R’ ranks 12th in popularity. The list is headed
by Python.
• For data scientists, R and Python are the two most preferred programming languages.
• R is often the first choice among academics and data miners.
R programming language

• is free and open source
• has a great community
• has more than 9000 packages
• is a language for statistical computing, data manipulation and graphics
• is a language for data scientists
• provides a GUI
• is commonly used through an IDE such as RStudio (R itself is a language, not an IDE)
Cont.

• At its heart, R is a functional programming language, but the R system also
includes some support for object-oriented programming (OOP).
• R is an interpreted language. When you enter expressions into the R console (or
run an R script in batch mode), a program within the R system, called the
interpreter, executes the actual code that you wrote. Unlike C, C++, and Java,
there is no need to compile your programs into an object language before running them.
• R is a high-level programming language.
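The functional style mentioned above can be illustrated with a minimal sketch. It only assumes base R; the helper name square is hypothetical and chosen for illustration.

```r
# Functions are ordinary objects: they can be stored in variables
# and passed as arguments to other functions.
square <- function(v) v^2            # hypothetical helper for this sketch

# sapply() calls 'square' once for each element of 1:5.
squares <- sapply(1:5, square)
print(squares)

# An anonymous function (one without a name) works the same way.
doubled <- sapply(1:5, function(v) 2 * v)
print(doubled)
```

Because the interpreter runs each line as it is entered, these lines can be typed one at a time at the console with no compile step.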
R Interface

• When you start the R system, the main window (R GUI) appears with a sub-window
(R Console).
• In the ‘Console’ window, the cursor waits for you to type in R commands.
R Introduction

• Results of calculations can be stored in objects using the assignment operators:
• An arrow (<-) formed by a less-than character and a hyphen, with no space between them.
• The equals character (=).

• These objects can then be used in other calculations. To print an object, just
enter its name. There are some restrictions when giving an object a name:
• Object names cannot contain ‘strange’ symbols like !, +, -, #.
• A dot (.) and an underscore (_) are allowed. A name may start with a dot (unless
the dot is followed by a digit), but not with an underscore.
• Object names can contain a number but cannot start with a number.
• R is case sensitive: X and x are two different objects, as are temp and temP.
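The naming rules above can be tried out with a short sketch. The object names are made up for illustration; make.names() is a base R function that shows how R would repair a name that breaks the rules.

```r
# Valid names: letters, digits, dots and underscores.
temp_1   <- 10
my.value <- 20
.hidden  <- 30     # starts with a dot; ls() does not list it by default

# Names are case sensitive: temp and temP are different objects.
temp <- 1
temP <- 2
same_value <- (temp == temP)
print(same_value)

# make.names() converts invalid names into valid ones.
fixed <- make.names(c("2abc", "a-b", "ok_name"))
print(fixed)
```

Typing an invalid name directly, for example 2abc <- 1, stops with a syntax error instead of creating an object.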
A few important points:

1. > is the command prompt and need not be typed.
2. R is case sensitive.
3. <- (or =) is the assignment operator. A command in R is made up of objects and
functions; separate commands go on separate lines or are separated by a semicolon (;).
4. Pressing Ctrl+L clears the console. However, the variables already defined remain in
memory.
5. Use the command rm(variable) to remove a variable. This removes it from
memory (the environment), not just from the console display.
Some fundamental commands

How to know the date?


• date()
• Sys.Date()
• Sys.timezone()
• Sys.time()
How to see or get the current working directory?
• getwd()
How to set the working directory to a desired folder?
• setwd("C:/Users/brain/OneDrive/Desktop/R")
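A minimal sketch of the date and working-directory commands above. To keep it runnable on any machine, tempdir() stands in for a folder of your choice; that substitution is an assumption of this example, not part of the original slides.

```r
today <- Sys.Date()       # current date, an object of class "Date"
now   <- Sys.time()       # current date and time, class "POSIXct"
print(today)
print(now)
print(Sys.timezone())     # name of the time zone in use, if R can detect it
print(date())             # date and time as a plain character string

old_dir <- getwd()        # remember the current working directory
setwd(tempdir())          # switch to another folder (here: R's temporary one)
print(getwd())
setwd(old_dir)            # switch back to where we started
```

On Windows, write paths with forward slashes (or doubled backslashes) inside the quotes.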
To input data in R
• Assignment operators: <- and =
• Example: x <- 5, y = 4
To see all the objects in the Environment
• ls()
To remove a variable/object
• rm(x) to remove object ‘x’ from the environment
• rm(x,y) to remove objects ‘x’ and ‘y’ from the environment
• rm(list=ls()) to remove all the objects from the environment
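The assignment, listing and removal commands above fit together as in this sketch. The object z is an extra name added for illustration, and the final rm(list = ls()) is left commented out so that running the example does not wipe your workspace.

```r
x <- 5                  # assignment with the arrow
y = 4                   # assignment with the equals sign
z <- x + y              # objects can be used in further calculations

objs_before <- ls()     # names of the objects currently defined
print(objs_before)

rm(x)                   # remove a single object
x_exists <- exists("x", inherits = FALSE)
print(x_exists)

rm(y, z)                # remove several objects at once
# rm(list = ls())       # uncomment to remove every object in the environment
```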
To see the output
• Type the name of the object, e.g. x, and press Enter
• print(x)
To see the output using the cat command
• cat(v)
• cat("hello, how are you?")
• print("hello, how are you?") shows the same text, but wrapped in quotes and prefixed
with the index [1]
• print("the sum of 2 and 3 is", 5) produces only ‘the sum of 2 and 3 is’ and not the 5,
because print treats its second argument as a formatting option (the number of digits),
not as something to display
• cat("the sum of 2 and 3 is", 5) produces the output ‘the sum of 2 and 3 is 5’
• print("sum of a and b is", a+b) produces only ‘sum of a and b is’; the value of a+b is
not displayed. (Objects a and b are assumed to be predefined in the environment.)
• cat("sum of a and b is", a+b) produces ‘sum of a and b is 5’ when a+b equals 5, because
cat displays every argument it is given. This makes cat more convenient than print for
mixing text and values.
• paste(a), paste(a,b,c) and paste("sum of a and b is", a+b) build a single character
string from their arguments; the result can be used with, or in place of, cat or print.
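The difference between print, cat and paste can be seen side by side in a short sketch. The values a = 2 and b = 3 are assumed here so that a + b equals 5, matching the outputs quoted above.

```r
a <- 2
b <- 3

# print() displays one object; text is shown in quotes with an index prefix.
print("sum of a and b is")

# cat() displays all of its arguments separated by spaces, without quotes.
# The "\n" adds the line break that cat() does not add on its own.
cat("sum of a and b is", a + b, "\n")

# paste() does not display anything itself: it returns a character string.
msg <- paste("sum of a and b is", a + b)
print(msg)
cat(msg, "\n")
```

Storing the result of paste() in an object, as with msg here, is useful when the same message is needed more than once.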
Types of Data Variables/Data Structures:

1. Logical
• a1=TRUE or a1=T
• a2=FALSE or a2=F
2. Numeric
• a3=4
• a4=7.6
3. Integer
• a5=10L (the L suffix makes it an integer; a plain 10 is stored as numeric)
4. Complex
• z=3+4i or z=complex(real=3,imaginary=4)
5. Character or String
• char="Hello, how are you?"
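The type of each object can be checked with class(), as in this minimal sketch. It reuses the object names from the list above; note that an integer is written with the L suffix, since a plain 10 is stored as numeric.

```r
a1   <- TRUE                     # logical
a3   <- 4                        # numeric
a5   <- 10L                      # integer (note the L suffix)
z    <- 3 + 4i                   # complex
char <- "Hello, how are you?"    # character

types <- c(class(a1), class(a3), class(a5), class(z), class(char))
print(types)

print(class(10))                 # without the L suffix the value is numeric
print(Re(z))                     # real part of z
print(Im(z))                     # imaginary part of z
```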
Types of Operators:

• Assignment Operators (=),(<-)


• Mathematical or Arithmetic Operators
• Addition (+), x+y
• Subtraction (-), x-y
• Multiplication (*), x*y
• Division (/), x/y
• Exponent (^), x^y
• Modulus/Remainder (%%), x%%y
• Integer Division (%/%), x%/%y
• Relational Operators
• Equal to (==), x==y
• Not equal to (!=), x!=y
• Less than (<), x<y
• Greater than (>), x>y
• Less than or equal to (<=), x<=y
• Greater than or equal to (>=), x>=y
• Logical Operators
• Element-wise Logical AND (&), x&y
• Element-wise Logical OR (|), x|y
• Logical AND (&&), x&&y {for single (length-one) values; older versions of R used only
the first element of longer vectors, while R 4.3.0 and later report an error}
• Logical OR (||), x||y {same restriction as &&}
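A short sketch of the operators listed above. The values 17 and 5 and the two logical vectors are chosen only for illustration.

```r
x <- 17
y <- 5

remainder <- x %% y        # modulus: what is left after dividing 17 by 5
quotient  <- x %/% y       # integer division: how many whole times 5 fits in 17
power     <- x ^ 2         # exponent
print(c(remainder, quotient, power))

is_greater <- x > y        # relational operators return TRUE or FALSE
print(is_greater)

# & and | compare vectors element by element.
u <- c(TRUE, FALSE, TRUE)
v <- c(TRUE, TRUE, FALSE)
both   <- u & v
either <- u | v
print(both)
print(either)

# && and || are meant for single TRUE/FALSE values, e.g. inside if().
single <- (x > 10) && (y < 10)
print(single)
```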
Datasets

• R has some built-in datasets in the package named ‘datasets’.
• data(), command to see the list of these built-in datasets.
• View(iris), command to view a dataset named ‘iris’.
• dim(iris), command to see the dimensions (rows and columns) of the dataset.
• names(iris), command to see the names of all the columns in the dataset.
• summary(iris), command to summarise a dataset.
• mean(trees$Height), command to compute the mean of the ‘Height’ column in
the ‘trees’ dataset.
• median(trees$Height), command to compute the median.
• plot(trees$Height,trees$Volume), command to create a scatterplot.
• hist(trees$Height), command to create a histogram.
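The non-graphical commands above can be run together as in this sketch. View(), plot() and hist() are left out because they open a viewer or graphics window; head() is an extra base R function, not mentioned in the slides, used here to show the first few rows.

```r
iris_dim <- dim(iris)            # number of rows and number of columns
print(iris_dim)

iris_cols <- names(iris)         # column names
print(iris_cols)

print(head(iris, 3))             # first three rows of the dataset
print(summary(trees))            # per-column summary of the 'trees' dataset

h_mean <- mean(trees$Height)     # the $ sign selects one column by name
h_med  <- median(trees$Height)
print(c(mean = h_mean, median = h_med))
```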
Packages

• How to install a new package?
• install.packages("moments"), command to install a package called ‘moments’.
• How to load a newly installed package?
• library(moments), command to load an installed package. Installing a package is not
enough: it only adds the package to our library of packages. We need to load it in each
session before we can use it.
• Another way to load an installed package is to tick it under the ‘Packages’ tab. This
tab is available in the third window of the interface.
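The install-then-load sequence can be sketched as follows. To avoid downloading anything unexpectedly, this example only checks whether ‘moments’ is already installed; the variable names pkg and loaded are hypothetical, and skewness() is a function provided by the ‘moments’ package.

```r
pkg <- "moments"

# requireNamespace() returns TRUE if the package is installed.
if (requireNamespace(pkg, quietly = TRUE)) {
  library(moments)                 # load the package for this session
  print(skewness(trees$Height))    # use a function from the package
  loaded <- TRUE
} else {
  message("Package '", pkg, "' is not installed yet. ",
          "Run install.packages(\"", pkg, "\") once, then library(", pkg, ").")
  loaded <- FALSE
}
```

Installation is needed only once per machine, while library() has to be called again in every new R session.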
