
Dear readers, these R Interview Questions have been designed to acquaint
you with the kinds of questions you may encounter during an interview on
R programming. In my experience, good interviewers rarely plan to ask any
particular question. They usually start with a basic concept of the subject
and continue based on the discussion and on what you answer.
What is R Programming?
R is a programming language for statistical analysis and for creating graphs. Instead of data
types, it has data objects which are used for calculations. It is used in fields such as data
mining, regression analysis and probability estimation, using the many packages available
for it.
What are the different data objects in R?
The frequently used data objects in R are vectors, lists, matrices, arrays, factors and data frames.
What makes a valid variable name in R?
A valid variable name consists of letters, numbers and the dot or underscore characters. The
variable name starts with a letter, or with a dot that is not followed by a number.
What is the main difference between an Array and a matrix?
A matrix is always two-dimensional, as it has only rows and columns. An array can have any
number of dimensions. For example, a 3x3x2 array holds 2 matrices, each of dimension 3x3.
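As a small illustration of the 3x3x2 case, the sketch below builds a matrix and an array and compares their dimensions. The values 1:9 and 1:18 are arbitrary, chosen only for the example.

```r
# A matrix has exactly two dimensions.
m <- matrix(1:9, nrow = 3, ncol = 3)
dim(m)

# An array can have any number of dimensions; this one stacks two 3x3 matrices.
a <- array(1:18, dim = c(3, 3, 2))
dim(a)

# Each slice along the third dimension is itself a 3x3 matrix.
first_slice <- a[, , 1]
is.matrix(first_slice)
```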
Which data object in R is used to store and process categorical data?
The Factor data objects in R are used to store and process categorical data in R.
How can you load and use csv file in R?
A csv file can be loaded using the read.csv function. R creates a data frame on reading the csv
files using this function.
How do you get the name of the current working directory in R?
The command getwd() gives the current working directory in the R environment.
What is R Base package?
This is the package which is loaded by default when R environment is set. It provides the basic
functionalities like input/output, arithmetic calculations etc. in the R environment.
How is R used in logistic regression?
Logistic regression models the probability of a binary response variable. In R,
the function glm() with family = binomial is used to fit a logistic regression.
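A minimal sketch of such a fit, assuming the built-in mtcars data set and taking its 0/1 column am as the binary response. The choice of wt as the predictor is only for illustration.

```r
# mtcars ships with R; am is a 0/1 variable (transmission type).
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probabilities that am == 1, one per car.
prob <- predict(fit, type = "response")
head(round(prob, 3))
```

Calling summary(fit) would show the estimated coefficients on the log-odds scale.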
How do you access the element in the 2nd column and 4th row of a matrix named M?
The expression M[4,2] gives the element at 4th row and 2nd column.
What is recycling of elements in a vector? Give an example.
When two vectors of different lengths are involved in an operation, the elements of the shorter
vector are reused to complete the operation. This is called element recycling. Example - if v1 <-
c(4,1,0,6) and v2 <- c(2,4), then v1*v2 gives (8,4,0,24). The elements 2 and 4 are repeated.
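The example above can be run directly; the sketch below uses the same two vectors.

```r
v1 <- c(4, 1, 0, 6)
v2 <- c(2, 4)

# v2 is recycled to c(2, 4, 2, 4) before the multiplication.
product <- v1 * v2
print(product)

# If the longer length is not a multiple of the shorter one,
# R still recycles but emits a warning.
```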
What are different ways to call a function in R?
We can call a function in R in 3 ways. The first method is to call it using the position of the
arguments. The second method is to call it using the names of the arguments, and the third method
is to call it with its default arguments.
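The three calling styles can be sketched with a small function. The function area() and its default values are hypothetical, made up for this example.

```r
# A hypothetical function with default values for both arguments.
area <- function(width = 2, height = 3) {
  width * height
}

by_position <- area(4, 5)                    # arguments matched by position
by_name     <- area(height = 5, width = 4)   # arguments matched by name, any order
by_default  <- area()                        # both defaults are used
```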
What is lazy function evaluation in R?
The lazy evaluation of a function means, the argument is evaluated only if it is used inside the
body of the function. If there is no reference to the argument in the body of the function then it
is simply ignored.
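A short sketch of this behaviour: the function f() below is hypothetical and never touches its second argument, so the expression passed for it is never evaluated.

```r
# b is never referenced in the body, so it is never evaluated.
f <- function(a, b) {
  a * 2
}

# The stop() call would raise an error if it were evaluated; it is not.
result <- f(5, stop("this is never evaluated"))
print(result)
```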
How do you install a package in R?
To install a package in R we use the below command.

install.packages("packageName")

Name an R package which is used to read XML files.


The package named "XML" is used to read and process the XML files.
Can we update and delete any of the elements in a list?
We can update any element of a list, and we can delete any element by assigning NULL to it.
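In R, assigning NULL to a list element removes it, wherever it sits in the list. A minimal sketch with a made-up three-element list:

```r
x <- list(a = 1, b = "two", c = TRUE)

x$b <- "TWO"        # update an element in place
x[["a"]] <- NULL    # delete the first element, not just the last one

names(x)
length(x)
```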
Give the general expression to create a matrix in R.
The general expression to create a matrix in R is - matrix(data, nrow, ncol, byrow, dimnames)
Which function is used to create a boxplot graph in R?
The boxplot() function is used to create boxplots in R. It takes a formula and a data frame as
inputs to create the boxplots.
In doing time series analysis, what does frequency = 6 mean in the ts() function?
Frequency 6 indicates the time interval for the time series data is every 10 minutes of an hour.

Different Time Intervals


The value of the frequency parameter in the ts() function decides the time intervals at which the
data points are measured. A value of 12 indicates that the time series is for 12 months. Other
values and their meanings are listed below:
• frequency = 12 pegs the data points for every month of a year.
• frequency = 4 pegs the data points for every quarter of a year.
• frequency = 6 pegs the data points for every 10 minutes of an hour.
• frequency = 24*6 pegs the data points for every 10 minutes of a day.
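The frequency values can be tried out on small series. In the sketch below the data values and the 2012 start date are invented for illustration.

```r
# Twelve monthly observations starting in January 2012.
monthly <- ts(c(5, 7, 9, 6, 8, 10, 12, 11, 9, 7, 6, 5),
              start = c(2012, 1), frequency = 12)

# Four quarterly observations starting in the first quarter of 2012.
quarterly <- ts(c(20, 25, 31, 18), start = c(2012, 1), frequency = 4)

frequency(monthly)
frequency(quarterly)
end(monthly)    # year and period of the last observation
```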

What is reshaping of data in R?


In R the data objects can be converted from one form to another. For example we can create a
data frame by merging many lists. This involves a series of R commands to bring the data into
the new format. This is called data reshaping.
What is the output of runif(4)?
It generates 4 random numbers between 0 and 1.
How to get a list of all the packages installed in R ?
Use the command

installed.packages()

What is expected from running the command - strsplit(x,"e")?


It splits the strings in vector x into substrings at the position of letter e.
Give an R script to extract all the unique words in uppercase from the string - "The quick brown
fox jumps over the lazy dog".

x <- "The quick brown fox jumps over the lazy dog"
# Split the sentence on spaces; strsplit() returns a list.
split.string <- strsplit(x, " ")
extract.words <- split.string[[1]]
# Convert to uppercase first so that "The" and "the" count as one word.
result <- unique(toupper(extract.words))
print(result)

Vector v is c(1,2,3,4) and list x is list(5:8), what is the output of v*x[1]?


Error in v * x[1] : non-numeric argument to binary operator
Vector v is c(1,2,3,4) and list x is list(5:8), what is the output of v*x[[1]]?
[1]  5 12 21 32
What does unlist() do?
It converts a list to a vector.
Give the R expression to get the probability of 26 or fewer heads in 51 tosses of a coin using pbinom.
x <- pbinom(26,51,0.5)

print(x)

x is the vector c(5,9.2,3,8.51,NA). What is the output of mean(x)?


NA
How do you convert the data in a JSON file to a data frame?
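First read the file into a list with a JSON package (for example, fromJSON() from the rjson
package), then convert the result using the function as.data.frame().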
Give a function in R that replaces all missing values of a vector x with the sum of elements
of that vector?

function(x) { x[is.na(x)] <- sum(x, na.rm = TRUE); x }

What is the use of apply() in R?


It applies the same function over the rows or columns (the margins) of a matrix or array. For
example, finding the mean of every row.
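A minimal sketch of apply() on a small matrix. The second argument selects the margin: 1 for rows, 2 for columns. The matrix values are arbitrary.

```r
# matrix() fills by column, so the rows are (1, 3, 5) and (2, 4, 6).
m <- matrix(1:6, nrow = 2)

row_means <- apply(m, 1, mean)   # one mean per row
col_sums  <- apply(m, 2, sum)    # one sum per column
```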
Is an array a matrix or a matrix an array?
Every matrix can be called an array but not the reverse. Matrix is always two dimensional but
array can be of any dimension.
How to find the help page on missing values?
?NA
How do you get the standard deviation for a vector x?
sd(x, na.rm=TRUE)
How do you set the path for current working directory in R?
setwd("Path")
What is the difference between "%%" and "%/%"?
"%%" gives the remainder of dividing the first vector by the second, while "%/%" gives the
integer quotient of that division.
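Both operators work element by element on vectors. A short sketch with arbitrary numbers:

```r
remainder <- 17 %% 5     # what is left after taking out whole fives
quotient  <- 17 %/% 5    # how many whole fives fit into 17

# Applied to a vector, each element is divided separately.
vec_remainder <- c(10, 11, 12) %% 4
vec_quotient  <- c(10, 11, 12) %/% 4
```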
What does max.col(x) do?
It finds, for each row of a matrix, the index of the column that holds the maximum value.
Give the command to create a histogram.
hist()
How do you remove a vector from the R workspace?
rm(x)
List the data sets available in package "MASS"
data(package = "MASS")
List the data sets available in all available packages.

data(package = .packages(all.available = TRUE))

What is the use of the command - install.packages(file.choose(), repos=NULL)?


It is used to install an R package from a local directory by browsing and selecting the file.
Give the command to check if the element 15 is present in vector x.

15 %in% x

Give the syntax for creating scatterplot matrices.

pairs(formula, data)

Where formula represents the series of variables used in pairs and data represents the data set
from which the variables will be taken.
What is the difference between subset() function and sample() function in R?
The subset() function is used to select variables and observations. The sample() function is
used to choose a random sample of size n from a dataset.
How do you check if "m" is a matrix data object in R?
is.matrix(m) should return TRUE.
What is the output for the below expression all(NA==NA)?
[1] NA
How to obtain the transpose of a matrix in R?
The function t() is used for transposing a matrix. Example - t(m) , where m is a matrix.
What is the use of "next" statement in R?
The "next" statement in R programming language is useful when we want to skip the current
iteration of a loop without terminating it.
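A small sketch of "next" inside a for loop. The loop skips even numbers and keeps the odd ones. The variable names are made up for the example.

```r
kept <- c()
for (i in 1:6) {
  if (i %% 2 == 0) {
    next              # skip the rest of this iteration, continue with the next i
  }
  kept <- c(kept, i)
}
print(kept)
```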

R - Data Types
Generally, while doing programming in any programming language, you
need to use various variables to store various information. Variables are
nothing but reserved memory locations to store values. This means that,
when you create a variable you reserve some space in memory.
You may like to store information of various data types like character, wide
character, integer, floating point, double floating point, Boolean etc. Based
on the data type of a variable, the operating system allocates memory and
decides what can be stored in the reserved memory.

In contrast to other programming languages like C and Java, in R the
variables are not declared as some data type. The variables are assigned
with R-Objects and the data type of the R-object becomes the data type of
the variable. There are many types of R-objects. The frequently used ones
are:

• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames

The simplest of these objects is the vector object and there are six data
types of these atomic vectors, also termed as six classes of vectors. The
other R-Objects are built upon the atomic vectors.

Data Type: Logical
Example: TRUE, FALSE
Verify:

v <- TRUE
print(class(v))

It produces the following result:

[1] "logical"

Data Type: Numeric
Example: 12.3, 5, 999
Verify:

v <- 23.5
print(class(v))

It produces the following result:

[1] "numeric"

Data Type: Integer
Example: 2L, 34L, 0L
Verify:

v <- 2L
print(class(v))

It produces the following result:

[1] "integer"

Data Type: Complex
Example: 3 + 2i
Verify:

v <- 2+5i
print(class(v))

It produces the following result:

[1] "complex"

Data Type: Character
Example: 'a', "good", "TRUE", '23.4'
Verify:

v <- "TRUE"
print(class(v))

It produces the following result:

[1] "character"

Data Type: Raw
Example: "Hello" is stored as 48 65 6c 6c 6f
Verify:

v <- charToRaw("Hello")
print(class(v))

It produces the following result:

[1] "raw"

Business Analytics with R Interview Questions And Answers

1) What is R Programming?
A) R is a language and environment for statistical computing and graphics. It is a GNU project which
is similar to the S language and environment which was developed at Bell Laboratories.

 R can be considered as a different implementation of S. There are some important differences, but
much code written for S runs unaltered under R.

2) What are the advantages of using R for business analytics?


A) R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests,
time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.
The S language is often the vehicle of choice for research in statistical methodology, and R provides
an Open Source route to participation in that activity.

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,
including mathematical symbols and formulae where needed. Great care has been taken over the
defaults for the minor design choices in graphics, but the user retains full control.

3) What operating systems can R support?


A) R is available as Free Software under the terms of the Free Software Foundation’s GNU General
Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and
similar systems (including FreeBSD and Linux), Windows and MacOS.

4) Explain the R environment?


A) R is an integrated suite of software facilities for data manipulation, calculation and graphical
display. It includes

• an effective data handling and storage facility,
• a suite of operators for calculations on arrays, in particular matrices,
• a large, coherent, integrated collection of intermediate tools for data analysis,
• graphical facilities for data analysis and display either on-screen or on hardcopy, and
• a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output facilities.

5) What are vectors in R?


A) R operates on named data structures. The simplest such structure is the numeric vector, which is
a single entity consisting of an ordered collection of numbers.

6) What are logical vectors in R?


A) As well as numerical vectors, R allows manipulation of logical quantities. The elements of a
logical vector can have the values TRUE, FALSE, and NA.

7) What are the types of objects in R?


A) Vectors are the most important type of object in R, but there are several others.

matrices or more generally arrays are multi-dimensional generalizations of vectors. In fact, they are
vectors that can be indexed by two or more indices and will be printed in special ways.

factors provide compact ways to handle categorical data.


lists are a general form of vector in which the various elements need not be of the same type, and
are often themselves vectors or lists. Lists provide a convenient way to return the results of a
statistical computation.

data frames are matrix-like structures, in which the columns can be of different types. Think of data
frames as ‘data matrices’ with one row per observational unit but with (possibly) both numerical and
categorical variables. Many experiments are best described by data frames: the treatments are
categorical but the response is numeric.

functions are themselves objects in R which can be stored in the project’s workspace. This provides
a simple and convenient way to extend R.

8) What are the concatenation functions in R?


A) cbind() and rbind() are concatenation functions in R.
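A brief sketch of the two functions on a pair of made-up vectors: cbind() places them side by side as columns, rbind() stacks them as rows.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

by_col <- cbind(a, b)   # 3 rows, 2 columns named a and b
by_row <- rbind(a, b)   # 2 rows named a and b, 3 columns

dim(by_col)
dim(by_row)
```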

9) What are Data frames in R?


A) A data frame is a list with class "data.frame".

R Programming Interview Questions And Answers

10) What are attach(), search() and detach() functions in R?


A) The attach() function in R can be used to make objects within data frames accessible in R with
fewer keystrokes.

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
names(ds)
attach(ds)
mean(cesd)
[1] 32.84768
The search() function can be used to list attached objects and packages. Let's see what is there,
then detach() the dataset to clean up after ourselves.
> search()
 [1] ".GlobalEnv"        "ds"                "tools:RGUI"        "package:stats"
 [5] "package:graphics"  "package:grDevices" "package:utils"     "package:datasets" 
 [9] "package:methods"   "Autoloads"         "package:base"    
detach(ds)

11) What is the read.table() function in R?


A) To read an entire data frame directly, the external file will normally have a special form.

The first line of the file should have a name for each variable in the data frame.
Each additional line of the file has as its first item a row label and the values for each variable.
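The file layout described above can be sketched without an external file by passing the contents through the text argument of read.table(). The column names, row labels and values below are invented for illustration. Because the header line has one field fewer than the data lines, read.table() treats the first line as the header and the first item on each later line as the row label.

```r
# Hypothetical file contents: a header line with three variable names,
# then one line per observation starting with a row label.
txt <- "Price Floor Area
h1 52.00 111 830
h2 54.75 128 710
h3 57.50 101 1000"

houses <- read.table(text = txt)

names(houses)
rownames(houses)
houses["h2", "Price"]
```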

12) What are the generic functions for extracting model information in R?
A) The value of lm() is a fitted model object; technically a list of results of class "lm". Information
about the fitted model can then be displayed, extracted, plotted and so on by using generic functions
that orient themselves to objects of class "lm". These include

add1    deviance   formula   predict     step
alias   drop1      kappa     print       summary
anova   effects    labels    proj        vcov
coef    family     plot      residuals

13) What is anova(object_1, object_2)?


A) The anova() function compares a submodel with an outer model and produces an analysis of
variance table.

14) What is coef(object)?


A) The coef() function extracts the regression coefficients (as a matrix where appropriate).

Long form: coefficients(object).

15) What is deviance(object)?


A) The deviance() function returns the residual sum of squares, weighted if appropriate.

R Interview Questions For Data Analyst

16) What is formula(object)?


A) The formula() function extracts the model formula.

17) What is plot(object)?


A) The plot() function produces four plots, showing residuals, fitted values and some diagnostics.

18) What is predict(object, newdata=data.frame)?


A) The predict() function returns predictions for new data. The data frame supplied must have
variables specified with the same labels as the original. The value is a vector or matrix of
predicted values corresponding to the determining variable values in data.frame.

19) What is print(object)?


A) The print() function prints a concise version of the object. It is most often used implicitly.

20) What is residuals(object)?
A) The residuals() function extracts the (matrix of) residuals, weighted as appropriate.

Short form: resid(object).

21) What is step(object)?


A) The step() function selects a suitable model by adding or dropping terms and preserving
hierarchies. The model with the smallest value of AIC (Akaike's An Information Criterion)
discovered in the stepwise search is returned.

22) What is summary(object)?


A) The summary() function prints a comprehensive summary of the results of the regression analysis.
23) What is vcov(object)?
A) vcov() returns the variance-covariance matrix of the main parameters of a fitted model object.
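Several of the generic functions from questions 12 to 23 can be tried on a small fitted model. The sketch below assumes the built-in mtcars data set. The model formulas and the new weights passed to predict() are chosen only for illustration.

```r
# Two nested linear models on the built-in mtcars data.
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_big   <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit_small)             # regression coefficients
formula(fit_small)          # the model formula
deviance(fit_small)         # residual sum of squares
vcov(fit_small)             # variance-covariance matrix of the coefficients
anova(fit_small, fit_big)   # compare the submodel with the larger model

# predict() needs a data frame whose column names match the predictors.
new_cars <- data.frame(wt = c(2.5, 3.5))
pred <- predict(fit_small, newdata = new_cars)
```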

24) What are Families in R?


A) The class of generalized linear models handled by facilities supplied in R includes gaussian,
binomial, poisson, inverse gaussian and gamma response distributions and also quasi-likelihood
models where the response distribution is not explicitly specified. In the latter case the variance
function must be specified as a function of the mean, but in other cases this function is implied by
the response distribution.

25) What is the glm() function in R?


A) Since the distribution of the response depends on the stimulus variables through a single linear
function only, the same mechanism as was used for linear models can still be used to specify the
linear part of a generalized model. The family has to be specified in a different way.

The R function to fit a generalized linear model is glm() which uses the form

> fitted.model <- glm(formula, family=family.generator, data=data.frame)
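A concrete instance of this form, assuming the built-in warpbreaks data set and a Poisson family for its count response. The choice of data set and predictors is only an illustration, not part of the original answer.

```r
# breaks is a count, so a Poisson family is a natural first choice.
fitted.model <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# The family object records the response distribution and its link function.
family(fitted.model)$family
family(fitted.model)$link
```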

4 tricky SAS questions commonly asked in interview
TAVISH SRIVASTAVA, NOVEMBER 24, 2013 

Introduction

While working extensively on SAS-EG, I lost touch with coding in Base SAS. I had to
brush up my Base SAS before appearing for my first lateral interview. SAS is highly
capable of data triangulation, and what distinguishes SAS from other such languages is
how simple it is to code.

There are some very tricky SAS questions, and handling them can be overwhelming for
some candidates. I strongly feel the need for a common thread that collects the tricky
SAS questions asked in interviews. This article starts such a thread by covering 4 of
these questions with relevant examples. It is the first part of a series on tricky SAS
questions. Please note that the content of these articles is based on information I
gathered from various SAS sources.


1. Merging data in SAS :

Merging datasets is the most important step for an analyst. Merging data can be done
through both the DATA step and PROC SQL. People usually ignore the difference in the
method used by SAS in the two steps, because generally there is no difference in the
output created by the two routines. Let's look at the following example:

Problem Statement : In this example, we have 2 datasets. The first table gives the product
holding for a particular household. The second table gives the gender of each customer in
these households. What you need to find out is whether the product is Male biased or
neutral. A Male biased product is a product bought by males more than females. You
can assume that the product bought by a household belongs to each customer of that
household.

Thought process: The first step of this problem is to merge the two tables. We need a
Cartesian product of the two tables in this case. After getting the merged dataset, all
you need to do is summarize the merged dataset and find the bias.

Code 1

/* Sort both tables by the merge key */
Proc sort data = PROD out = A1; by household; run;
Proc sort data = GENDER out = A2; by household; run;

/* Keep only households present in both tables */
Data MERGED;
  merge A1(in=a) A2(in=b);
  by household;
  if a AND b;
run;

Code 2 :

PROC SQL;

Create table work.merged as

select t1.household,  t1.type,t2.gender

from prod as t1, gender as t2

where t1.household = t2.household;

quit; 

Will both the codes give the same result?

The answer is NO. As you might have noticed, the two tables have a many-to-many
mapping. To get a Cartesian product, we can only use PROC SQL. Apart from
many-to-many tables, the results of merging with the two steps will be exactly the
same.
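For readers who also work in R, the within-household Cartesian product that PROC SQL returns can be reproduced with merge(). This is only an analogue, not SAS code, and the two small tables below are invented for illustration.

```r
# Household 1 holds two products and has two customers;
# household 2 holds one product and has one customer.
prod   <- data.frame(household = c(1, 1, 2), type = c("A", "B", "A"))
gender <- data.frame(household = c(1, 1, 2), gender = c("M", "F", "M"))

# merge() pairs every product row with every customer row of the same household.
merged <- merge(prod, gender, by = "household")
nrow(merged)   # 2 x 2 rows for household 1, plus 1 x 1 for household 2
```

A row-by-row pairing within each household, which is how the DATA step MERGE treats such tables, would give only 3 rows here instead of 5.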

Why do we use DATA – MERGE step at all?

The DATA-MERGE step is usually much faster than PROC SQL. For big data sets, except
those with a many-to-many mapping, prefer DATA-MERGE.

2. Transpose data-sets :

When working on transactions data, we frequently transpose datasets to analyze data.


There are two kinds of transposition. First, transposing from wide structure to narrow
structure. Consider the following example :

Following are the two methods to do this kind of transposition :

a. DATA STEP :

data transposed;
  set base;
  array Qtr{3} Q1-Q3;
  do i = 1 to 3;
    Period = cat('Qtr', i);
    Amount = Qtr{i};
    /* write a row only when the amount is not missing */
    if Amount ne . then output;
  end;
  drop Q1-Q3 i;
run;

b. PROC TRANSPOSE :

proc transpose data = base out = transposed

(rename=(Col1=Amount) where=(Amount ne .)) name=Period;

by cust; run; 

In this kind of transposition, both methods are equally good. PROC TRANSPOSE,
however, takes less time because it uses indexing to transpose.

Second, narrow to wide structure. Consider an opposite of the last example.

For this kind of transposition, data step becomes very long and time consuming.
Following is a much shorter way to do the same task,

Proc transpose data=transposed out=base (drop=_name_) prefix=Q;
  by cust;
  id period;
  var amount;
run;

3. Passing values from one routine to other:

Imagine a scenario in which we want to compare the total marks scored by two classes.
The final output should simply be the name of the class with the higher score. The
scores of the two classes are stored in two separate tables.

There are two ways to answer this question. The first is to append the two tables and sum
the total marks for each of the classes. But if the number of students is very large,
appending the two tables only multiplies the operation time. Hence, we need a method to
pass a value from one table to another. Try the following code:

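/* Accumulate the total marks of class 1 and store it in a macro variable */
DATA _null_;
  set class_1;
  total + marks;
  call symputx('class1_tot', total);
run;

/* Same for class 2 */
DATA _null_;
  set class_2;
  total + marks;
  call symputx('class2_tot', total);
run;

/* Compare the two totals: 1 = class 1 is better, 2 = class 2 is better, 0 = tie */
DATA results;
  if &class1_tot > &class2_tot then better_class = 1;
  else if &class1_tot < &class2_tot then better_class = 2;
  else better_class = 0;
run;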

The function symputx creates a macro variable which can be passed between various
routines and thus gives us an opportunity to link data-sets.

4. Using where and if : 

“Where” and “if” are both used for sub-setting. Most of the times where and if can be
used interchangeably in data step for sub-setting. But, when sub-setting is done on a
newly created variable, only if statement can be used. For instance, consider the
following two programs,

Code 1 :

data a; set b;
  z = x + y;
  if z < 10;
run;

Code 2 :

data a; set b;
  z = x + y;
  where z < 10;
run;

Code 2 will give an error in this case, because where cannot be used for sub-setting
data based on a newly created variable.

End Notes : 

These codes come directly from my cheat sheet. What is special about these 4 codes is
that, together, they give me a quick glance at almost all the statements and options
used in SAS. If you were able to solve all the questions covered in this article, we think
you are up for the next level. You can read the second part of this article here
( https://www.analyticsvidhya.com/blog/2014/04/tricky-base-sas-interview-questions-part-ii/ ).
The second part has tougher and lengthier questions than those covered in this article.
