R Interview
These questions are meant to get you acquainted with the nature of questions
you may encounter during an interview on R programming. In my experience,
good interviewers rarely plan to ask any particular question during an
interview. Questions normally start with some basic concept of the subject
and then continue based on the discussion and on what you answer.
What is R Programming?
R is a programming language meant for statistical analysis and for creating graphs.
Rather than declaring variables with data types, you assign data objects to them, and
these objects are used for calculations. R is used in fields such as data mining, regression
analysis and probability estimation, with the help of the many packages available for it.
What are the different data objects in R?
There are 6 data objects in R. They are vectors, lists, matrices, arrays, factors and data frames.
What makes a valid variable name in R?
A valid variable name consists of letters, numbers and the dot or underscore characters. The
name must start with a letter, or with a dot that is not followed by a number.
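The rule above can be checked with a minimal sketch. The helper name is_valid_name is hypothetical, and the check assumes that base R's make.names() leaves an already valid name unchanged (it also rejects reserved words such as if).

```r
# Hypothetical helper: a name counts as valid when make.names()
# does not need to alter it.
is_valid_name <- function(x) {
  make.names(x) == x
}

candidates <- c("var_name", ".var", "var.name2", "2var", ".2var", "_var")
valid <- is_valid_name(candidates)
print(data.frame(name = candidates, valid = valid))
```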
What is the main difference between an Array and a matrix?
A matrix is always two dimensional, as it has only rows and columns. An array can have any
number of dimensions. For example, a 3x3x2 array can be viewed as 2 matrices, each of
dimension 3x3.
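A short sketch of the difference, using arbitrary integer data chosen only for illustration:

```r
# A matrix has exactly two dimensions.
m <- matrix(1:9, nrow = 3, ncol = 3)
print(dim(m))

# An array can have any number of dimensions; here 3 x 3 x 2.
a <- array(1:18, dim = c(3, 3, 2))
print(dim(a))

# Fixing the third index leaves an ordinary 3 x 3 matrix.
first_slice <- a[, , 1]
print(is.matrix(first_slice))
```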
Which data object in R is used to store and process categorical data?
Factor objects are used to store and process categorical data in R.
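As an illustration, a factor can be built from a character vector. The category names below are made up for the example.

```r
# Store categorical values as a factor with an explicit level order.
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

print(levels(sizes))   # the distinct categories
print(table(sizes))    # counts per category
```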
How can you load and use a csv file in R?
A csv file can be loaded using the read.csv() function, which returns the contents of the
file as a data frame.
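A self-contained sketch: it first writes a small temporary csv file (the column names and values are invented for the example) and then reads it back.

```r
# Create a throwaway csv file so the example runs anywhere.
path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ann,31", "Bob,45"), path)

# read.csv() returns a data frame; the first line supplies the column names.
df <- read.csv(path)
print(class(df))
print(df)

unlink(path)  # clean up the temporary file
```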
How do you get the name of the current working directory in R?
The command getwd() gives the current working directory in the R environment.
What is the R Base package?
This is the package which is loaded by default when the R environment is set up. It provides
basic functionality such as input/output and arithmetic calculations in the R environment.
How is R used in logistic regression?
Logistic regression models the probability of a binary response variable. In R,
the function glm() with family = binomial is used to fit a logistic regression model.
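A minimal sketch of such a fit, assuming the built-in mtcars data set, with transmission type (am) as the binary response and weight (wt) as the only predictor. The choice of variables is purely illustrative.

```r
# Fit a logistic regression: probability of a manual transmission given weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(fit))

# Predicted probabilities on the response scale (between 0 and 1).
p <- predict(fit, type = "response")
print(head(p))
```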
How do you access the element in the 2nd column and 4th row of a matrix named M?
The expression M[4,2] gives the element at 4th row and 2nd column.
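For example, with a small matrix filled column by column (the values are arbitrary):

```r
# 4 x 3 matrix; column 1 holds 1:4, column 2 holds 5:8, column 3 holds 9:12.
M <- matrix(1:12, nrow = 4, ncol = 3)

# Row index first, column index second.
elem <- M[4, 2]
print(elem)  # 8
```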
What is recycling of elements in a vector? Give an example.
When two vectors of different length are involved in an operation, the elements of the shorter
vector are reused to complete the operation. This is called element recycling. Example - if v1 <-
c(4,1,0,6) and v2 <- c(2,4), then v1*v2 gives (8,4,0,24). The elements 2 and 4 are repeated.
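The same example as runnable code. Note that R is case-sensitive, so both vectors are named in lowercase here.

```r
v1 <- c(4, 1, 0, 6)
v2 <- c(2, 4)

# v2 is recycled to c(2, 4, 2, 4) before the element-wise multiplication.
product <- v1 * v2
print(product)  # 8 4 0 24
```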
What are different ways to call a function in R?
We can call a function in R in 3 ways. The first method is to call it using the position of the
arguments. The second method is to call it using the names of the arguments, and the third method
is to call it relying on the default argument values.
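The three calling styles can be sketched with a small function. The name power_sum and its defaults are hypothetical, chosen only for the example.

```r
# Hypothetical function with two defaulted arguments.
power_sum <- function(a, b = 2, c = 1) {
  a^b + c
}

by_position <- power_sum(3, 2, 1)              # arguments matched by position
by_name     <- power_sum(c = 1, a = 3, b = 2)  # arguments matched by name, any order
by_default  <- power_sum(3)                    # b and c fall back to their defaults

print(c(by_position, by_name, by_default))     # 10 10 10
```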
What is lazy function evaluation in R?
Lazy evaluation means that an argument is evaluated only if it is used inside the
body of the function. If the body never refers to the argument, it is never evaluated
and is simply ignored.
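A small sketch of this behaviour. The functions f and g are hypothetical names used only for illustration.

```r
# y is never used in the body, so it is never evaluated.
f <- function(x, y) {
  x * 2
}
unused <- f(5, stop("this is never evaluated"))
print(unused)  # 10

# Here y is used, so calling g() without it fails at the point of use.
g <- function(x, y) {
  x + y
}
result <- tryCatch(g(5), error = function(e) "error: y is missing")
print(result)
```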
How do you install a package in R?
To install a package in R we use the below command.
install.packages("package Name")
installed.packages()
x <- "The quick brown fox jumps over the lazy dog"
print(result)
print(x)
15 %in% x
pairs(formula, data)
The pairs() function draws a matrix of scatterplots. Here formula represents the series of
variables used in pairs and data represents the data set from which the variables will be taken.
What is the difference between subset() function and sample() function in R?
The subset() function is used to select variables and observations. The sample() function is
used to choose a random sample of size n from a dataset.
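A sketch of both functions, assuming the built-in mtcars data set. The filter condition and the sample size are arbitrary.

```r
# subset(): keep rows that satisfy a condition and only the chosen columns.
heavy <- subset(mtcars, wt > 4, select = c(mpg, wt))
print(heavy)

# sample(): draw 5 random row numbers, then use them to pick rows.
set.seed(1)  # fixed seed so the draw is repeatable
rows <- sample(nrow(mtcars), 5)
picked <- mtcars[rows, ]
print(picked)
```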
How do you check if "m" is a matrix data object in R?
is.matrix(m) should return TRUE.
What is the output of the expression all(NA==NA)?
[1] NA
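The result is NA because comparing two unknown values gives an unknown result. A short sketch, which also shows is.na() as the usual way to test for missing values:

```r
cmp <- NA == NA       # comparing unknowns yields NA, not TRUE
res <- all(cmp)       # all() of a single NA is NA
print(res)

# To test for missingness, use is.na() rather than ==.
print(is.na(NA))      # TRUE
```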
How to obtain the transpose of a matrix in R?
The function t() is used for transposing a matrix. Example - t(m) , where m is a matrix.
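For instance, with an arbitrary 2x3 matrix:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
tm <- t(m)       # rows become columns

print(dim(m))    # 2 3
print(dim(tm))   # 3 2
```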
What is the use of "next" statement in R?
The "next" statement in R programming language is useful when we want to skip the current
iteration of a loop without terminating it.
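A minimal sketch: the loop below skips even numbers with next and keeps running.

```r
odds <- c()
for (i in 1:6) {
  if (i %% 2 == 0) {
    next            # skip the rest of this iteration, continue with the next i
  }
  odds <- c(odds, i)
}
print(odds)  # 1 3 5
```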
R - Data Types
Generally, while doing programming in any programming language, you
need to use various variables to store various information. Variables are
nothing but reserved memory locations to store values. This means that,
when you create a variable you reserve some space in memory.
You may like to store information of various data types like character, wide
character, integer, floating point, double floating point, Boolean etc. Based
on the data type of a variable, the operating system allocates memory and
decides what can be stored in the reserved memory.
Vectors
Lists
Matrices
Arrays
Factors
Data Frames
The simplest of these objects is the vector object and there are six data
types of these atomic vectors, also termed as six classes of vectors. The
other R-Objects are built upon the atomic vectors.
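v <- TRUE
print(class(v))
[1] "logical"
v <- 23.5   # assumed value; the assignment is missing in the source
print(class(v))
[1] "numeric"
v <- 2L
print(class(v))
[1] "integer"
v <- 2+5i
print(class(v))
[1] "complex"
v <- "TRUE"
print(class(v))
[1] "character"
v <- charToRaw("Hello")   # assumed value; the assignment is missing in the source
print(class(v))
[1] "raw"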
1) What is R Programming?
A) R is a language and environment for statistical computing and graphics. It is a GNU project which
is similar to the S language and environment which was developed at Bell Laboratories.
R can be considered as a different implementation of S. There are some important differences, but
much code written for S runs unaltered under R.
One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,
including mathematical symbols and formulae where needed. Great care has been taken over the
defaults for the minor design choices in graphics, but the user retains full control.
matrices or more generally arrays are multi-dimensional generalizations of vectors. In fact, they are
vectors that can be indexed by two or more indices and will be printed in special ways. See Arrays
and matrices.
data frames are matrix-like structures, in which the columns can be of different types. Think of data
frames as ‘data matrices’ with one row per observational unit but with (possibly) both numerical and
categorical variables. Many experiments are best described by data frames: the treatments are
categorical but the response is numeric.
functions are themselves objects in R which can be stored in the project’s workspace. This provides
a simple and convenient way to extend R.
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
names(ds)
attach(ds)
mean(cesd)
[1] 32.84768
The search() function can be used to list attached objects and packages. Let's see what is there,
then detach() the dataset to clean up after ourselves.
search()
[1] ".GlobalEnv" "ds" "tools:RGUI" "package:stats"
[5] "package:graphics" "package:grDevices" "package:utils" "package:datasets"
[9] "package:methods" "Autoloads" "package:base"
detach(ds)
The first line of the file should have a name for each variable in the data frame.
Each additional line of the file has as its first item a row label and the values for each variable.
12) What are the generic functions for extracting model information in R?
A) The value of lm() is a fitted model object; technically a list of results of class "lm". Information
about the fitted model can then be displayed, extracted, plotted and so on by using generic functions
that orient themselves to objects of class "lm". These include
20) What is residuals(object)?
A) The residuals() function extracts the (matrix of) residuals, weighted as appropriate.
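A minimal sketch of extracting information from a fitted model, assuming the built-in mtcars data set and an illustrative regression of mpg on wt:

```r
# Fit a simple linear model; the result is an object of class "lm".
fit <- lm(mpg ~ wt, data = mtcars)

# Generic functions dispatch on the class of the fitted object.
print(coef(fit))        # estimated coefficients
r <- residuals(fit)     # one residual per observation
print(head(r))

# With an intercept in the model, the residuals sum to (numerically) zero.
print(sum(r))
```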
The R function to fit a generalized linear model is glm() which uses the form
Introduction
While working extensively on SAS-EG, I lost touch with coding in Base SAS. I had to
brush up my Base SAS before appearing for my first lateral interview. SAS is highly
capable of data triangulation, and what distinguishes SAS from other such languages is
how simple it is to code.
There are some very tricky SAS questions, and handling them might become
overwhelming for some candidates. I strongly feel the need for a common thread which
collects all the tricky SAS questions asked in interviews. This article will kick-start
such a thread. It covers 4 such questions with relevant examples and is the first part
of a series on tricky SAS questions. Please note that the content of
these articles is based on the information I gathered from various SAS sources.
1. Merging data-sets :
Merging datasets is the most important step for an analyst. Merging data can be done
through both the DATA step and PROC SQL. People usually ignore the difference in the
methods SAS uses in the two steps, because there is generally no
difference in the output created by the two routines. Let's look at the following example :
Problem Statement : In this example, we have 2 datasets. The first table gives the product
holding for a particular household. The second table gives the gender of each customer in
these households. What you need to find out is whether each product is male biased or
neutral. A male biased product is a product bought by males more than females. You
can assume that the product bought by a household belongs to each customer of that
household.
Thought process: The first step of this problem is to merge the two tables. We need a
Cartesian product of the two tables in this case. After getting the merged dataset, all
you need to do is summarize the merged dataset and find the bias.
Code 1
Data MERGED;
by household;
if a AND b;
run;
Code 2 :
PROC SQL;
quit;
The answer is NO. As you might have noticed, the two tables have a many-to-many
mapping. To get a Cartesian product, we can only use PROC SQL. Apart from
many-to-many tables, the results of merging with the two steps will be exactly
the same.
The DATA-MERGE step is much faster than PROC SQL. For big data sets, except
those with a many-to-many mapping, always use DATA-MERGE.
2. Transpose data-sets :
a. DATA STEP :
drop Q1:Q3;
if Amount ne .;
run;
b. PROC TRANSPOSE :
by cust; run;
In this kind of transposition, both methods are equally good. PROC TRANSPOSE,
however, takes less time because it uses indexing to transpose.
For this kind of transposition, data step becomes very long and time consuming.
Following is a much shorter way to do the same task,
by cust;
id period;
var amount;
run;
Imagine a scenario in which we want to compare the total marks scored by two classes. The
output should simply be the name of the class with the higher score. The scores of
the two classes are stored in two separate tables.
There are two ways of approaching this question. The first is to append the two tables and sum the
total marks for each of the classes. But if the number of students is very
large, appending the two tables will just multiply the operation time. Hence, we
need a method to pass a value from one table to another. Try the following code:
total + marks;
run;
total + marks;
call symputx ('class2_tot',total);
run;
DATA results;
else better_class = 0;
run;
The symputx routine creates a macro variable which can be passed between various
routines and thus gives us an opportunity to link data-sets.
“Where” and “if” are both used for sub-setting. Most of the time, where and if can be
used interchangeably in a data step for sub-setting. But when sub-setting is done on a
newly created variable, only the if statement can be used. For instance, consider the
following two programs,
Code 1 : Code 2 :
Code 2 will give an error in this case, because where cannot be used for sub-setting
data based on a newly created variable.
End Notes :
These codes come directly from my cheat sheet. What is special about these 4 codes is
that, in aggregate, they give me a quick glance at almost all the statements and options
used in SAS. If you were able to solve all the questions covered in this article, we think
you are ready for the next level. You can read the second part of this article here
( https://www.analyticsvidhya.com/blog/2014/04/tricky-base-sas-interview-questions-part-ii/ ) .
The second part of the article has tougher and lengthier questions than
those covered in this article.