
1) Logistic Regression: Technique to predict a binary outcome from a linear combination of predictor variables.
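A minimal sketch in R using the built-in glm() function; the binary transmission variable am in the bundled mtcars data serves as an illustrative outcome:

    fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)
    summary(fit)
    probs <- predict(fit, type = "response")   # predicted probabilities
    head(probs)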


2) Recommender Systems: Technique to predict the ratings or preferences that
a user would give to a product. Subclass of information filtering system.
3) Why does data cleaning play a pivotal role in analysis?
a. Cleaning converts the data collected from various sources into a
format that data analysts/scientists can work on.
b. It increases the efficiency of the algorithm.
4) What are Univariate, Bivariate and Multivariate analysis?
a. These are descriptive statistical analysis techniques differentiated
by the no. of variables involved at a given point of time: one variable
(univariate), two (bivariate), or more than two (multivariate).
5) What is ML?
ML is the science of getting computers to act without being explicitly
programmed. ML is a branch of AI which focuses on the use of data and
algorithms to imitate the way humans learn, gradually improving accuracy.
6) What is Supervised ML?
Supervised ML is used when the output variable in the problem statement
can be classified or predicted. E.g.: KNN, Naïve Bayes, SVM, Decision Tree,
Random Forest, Neural Network.
7) What is Unsupervised ML?
In this, there is no output variable to be predicted. Instead, the
algorithm learns the patterns in the data. E.g.: Segmentation, SVD,
PCA, Market Basket Analysis, Recommender Systems
8) Classification Modelling
When the observations must be classified into discrete categories rather
than predicted as continuous values. E.g.: Cancerous and non-cancerous
tumours, Carpool.
9) What are Linkages in hierarchical clustering?
Linkage is the criterion used to calculate the distance between clusters
(and of individual observations to clusters) when merging them.
Single: Distance b/w two clusters is the shortest dist. b/w any two
points in those clusters.
Complete: Distance b/w two clusters is the longest dist. b/w any two
points in those clusters.
Average: Distance b/w two clusters is the average of all pairwise
distances b/w points in those clusters.
Centroid: Distance b/w two clusters is the distance b/w the centroids
of the two clusters.
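A minimal sketch in base R, clustering the numeric columns of the bundled iris data under each linkage:

    d <- dist(iris[, 1:4])                  # Euclidean distance matrix
    hc_single   <- hclust(d, method = "single")
    hc_complete <- hclust(d, method = "complete")
    hc_average  <- hclust(d, method = "average")
    hc_centroid <- hclust(d, method = "centroid")
    plot(hc_complete)                       # dendrogram
    cutree(hc_complete, k = 3)              # cut into 3 clusters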
10) Which function is used to get the same result when the program is
re-run? set.seed()
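For instance, in R:

    set.seed(42)
    sample(1:10, 3)   # some random draw

    set.seed(42)
    sample(1:10, 3)   # identical draw on every re-run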
11) Which is the non-parametric algorithm?
KNN is called non-parametric algorithm since it makes no assumption about
the underlying data.
12) How to choose the value of k in KNN?
a. sqrt(n/2), where n is the no. of observations
b. Scree plot/Elbow plot (see the sketch below)
c. K-fold cross validation.
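A sketch of (a) and (b) in R, assuming the class package (which provides knn()):

    library(class)
    set.seed(123)
    idx   <- sample(nrow(iris), 100)
    train <- iris[idx, 1:4];    test  <- iris[-idx, 1:4]
    cl    <- iris$Species[idx]; truth <- iris$Species[-idx]

    round(sqrt(nrow(train) / 2))   # rule-of-thumb k

    ks  <- 1:15                    # elbow plot of validation error
    err <- sapply(ks, function(k) mean(knn(train, test, cl, k = k) != truth))
    plot(ks, err, type = "b", xlab = "k", ylab = "validation error")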
13) What is Probability?
No. of favourable events/Total no. of possible events
14) What is Joint Probability?
Probability of two events occurring at the same time.
15) What is Bayes’ Theorem?
Bayes’ theorem finds the probability of an event occurring given the
probability of another event that has already occurred.
P(A|B) = P(B|A) * P(A)/P(B)
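A worked example in R with hypothetical numbers (a test with 99% sensitivity and a 5% false positive rate for a condition with 1% prevalence):

    p_A      <- 0.01    # P(A): prevalence
    p_B_A    <- 0.99    # P(B|A): positive test given condition
    p_B_notA <- 0.05    # P(B|~A): false positive rate
    p_B   <- p_B_A * p_A + p_B_notA * (1 - p_A)   # total probability of B
    p_A_B <- p_B_A * p_A / p_B                    # Bayes' theorem
    p_A_B                                         # about 0.17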
16) What is the assumption of Naive Bayes Classifier?
The fundamental assumption is that each predictor variable
independently and equally contributes to the outcome.
17) What is SVM?
In SVM, we plot each data point in n-dimensional space. Then we perform
classification by finding the hyperplane that best separates the classes.
18) What are the tuning parameters in SVM?
Kernel, Regularization, Gamma and Margin
19) What is Kernel in SVM?
Kernel tricks are transformations applied to the input variables to turn
non-linearly separable data into separable data.
20) What is Regularization parameter in SVM?
The regularization parameter (often called C) tells the training model how
much to avoid misclassifying each training observation: a larger value
penalizes misclassification more, at the cost of a smaller margin.
21) What is Gamma parameter in SVM?
Gamma is the kernel coefficient in the RBF, Polynomial and Sigmoid kernel
tricks. A higher Gamma makes the model more complex and tends to overfit it.
22) What is Margin in SVM?
Margin is the distance b/w the separating hyperplane and the closest data
points of each class. The larger the margin width, the better the
classification.
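A minimal sketch in R, assuming the e1071 package (a common interface to libsvm), showing the tuning parameters named above:

    library(e1071)
    # cost = regularization parameter (C); gamma = RBF kernel coefficient
    fit <- svm(Species ~ ., data = iris,
               kernel = "radial", cost = 1, gamma = 0.25)
    table(predicted = predict(fit, iris), actual = iris$Species)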
23) What is Decision Tree?
DT is a SML algorithm used for regression and classification analysis. It is
a tree-like structure in which each internal node represents a test on an
attribute, each branch represents an outcome of the test and each leaf node
represents a class label.
24) What are rules in DT?
A path from root node to leaf node represents classification rules.
25) What is impurity in DT?
Impurity is a measure of how mixed the labels are at a node. For
classification problems, we use Gini Impurity and Entropy, whereas for
regression we use Variance.
26) What is Pruning in DT?
Pruning is the process of removing sub-nodes that contribute little
predictive power to the DT model.
27) Advantages of pruning:
Pruning reduces the complexity of the DT model, which in turn reduces
overfitting.
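A sketch in R, assuming the rpart package; prune() cuts a grown tree back at a chosen complexity parameter:

    library(rpart)
    fit <- rpart(Species ~ ., data = iris, method = "class",
                 control = rpart.control(cp = 0.001, minsplit = 5))
    printcp(fit)                     # cross-validated complexity table
    pruned <- prune(fit, cp = 0.05)  # remove weak sub-nodes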
28) Difference between Entropy and Information Gain?
Entropy is a probabilistic measure of uncertainty or impurity, whereas
Information Gain is the reduction in entropy achieved by a split.
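A small helper in base R makes the relationship concrete:

    entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }
    entropy(c(0.5, 0.5))   # 1 bit: maximum uncertainty for two classes
    entropy(c(0.9, 0.1))   # ~0.47: purer node, lower entropy
    # Information gain = entropy(parent) - weighted entropy(children)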
29) What is Random Forest?
RF is an Ensemble Classifier. RF builds many DTs and combines the outputs
of all DTs to give a stable output.
RF adds randomness to the model: unlike a DT, which searches for the best
feature among all input variables to make each split, RF searches for the
best feature among a random subset of input variables. This generally
results in greater accuracy.
30) Pros and Cons of RF
Pros: Less prone to overfitting, provides reliable accuracy, can work on
large datasets without deleting input variables, and handles outliers and
missing values well.
Cons: Due to the large number of trees, the algorithm may be slow for
real-time predictions.
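A minimal sketch in R, assuming the randomForest package:

    library(randomForest)
    set.seed(1)
    # mtry = size of the random feature subset tried at each split
    fit <- randomForest(Species ~ ., data = iris,
                        ntree = 500, mtry = 2, importance = TRUE)
    fit                # OOB error estimate and confusion matrix
    importance(fit)    # variable importance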
31) What is Neural Network?
NN is a SML algorithm that tries to recognize the underlying relationships
in a set of data through a process similar to how the human brain operates.
It consists of an input layer, hidden layers and an output layer. Various
types of NN are Artificial NN, Recurrent NN, Convolutional NN etc.
32) What is the use of Activation Function in NN?
The Activation Function converts the input signal of a node into an output
signal, which is used as the input to the next layer in the stack.
33) Different types of Activation Functions?
Sigmoid or Logistic, tanh or Hyperbolic Tangent, ReLU or Rectified Linear
Units, Leaky ReLU.
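The four functions listed above are one-liners in base R:

    sigmoid    <- function(x) 1 / (1 + exp(-x))   # squashes to (0, 1)
    tanh_fn    <- function(x) tanh(x)             # squashes to (-1, 1)
    relu       <- function(x) pmax(0, x)          # max(0, x) element-wise
    leaky_relu <- function(x, a = 0.01) ifelse(x > 0, x, a * x)
    curve(sigmoid, -5, 5)                         # visualise one of them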
34) What is Probability Distribution?
Probability Distribution is a function that provides the probabilities of
occurrence of the different possible outcomes of an experiment. In a
probability distribution, the random variable is plotted on the X-axis and
the associated probabilities on the Y-axis.
Types of Probability Distribution are:
Discrete PD: random variable is discrete
Continuous PD: random variable is continuous.
35) What is Expected Value (EV)?
EV or mean is the expected outcome of a given experiment, calculated as
the weighted average of all possible values of the random variable,
weighted by their probabilities.
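E.g., the EV of a fair six-sided die in R:

    x <- 1:6
    p <- rep(1/6, 6)
    sum(x * p)   # 3.5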
36) What is Datatype?
Datatypes are classifications that specify what type of values a variable
can hold and what type of mathematical, relational or logical operations
can be applied to it.
37) Classifications of Datatypes?
In a broader sense, we can classify Datatypes as Quantitative and
Qualitative. Again, they are further classified as Nominal, Ordinal, Interval
and Ratio type.
Nominal – Name or Label. E.g., Name on a Jersey
Ordinal – It has some natural ordering associated. E.g., Shirt size,
Educational Qualification
Interval – numerical scales with order and exact difference b/w two values,
but don’t have a ‘true zero’. E.g., Temp, Date
Ratio – numerical scales with order, exact difference b/w two values and a
true zero. E.g., Height and Weight of students in a class.
38) Output of all(NA==NA): NA. In R, any comparison with NA evaluates to
NA, so all() returns NA.
39) What is Binomial Distribution?
Binomial Distribution describes the probability of a given number of
successes in a fixed number of independent trials, each with the same
probability of success.
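In base R:

    dbinom(3, size = 10, prob = 0.5)   # P(exactly 3 successes in 10 trials)
    pbinom(3, size = 10, prob = 0.5)   # P(at most 3 successes)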
40) What is Normal Distribution?
Normal Distribution or bell curve is a probability function that describes
how the values of a variable are distributed by its mean and std deviation.
41) Uniform Distribution – Distribution that has Constant Probability.
42) Bar Plot – when the variables are categorical in nature.
43) What is the probability associated with a single value of continuous
random variable – zero.
44) What is Empirical Rule?
The Empirical Rule, also known as the 68-95-99.7 rule, states that in a
normal distribution 99.7% of the data occurs within three standard
deviations of the mean. Specifically, 68% of the observed data falls within
one standard deviation, 95% within two standard deviations, and 99.7%
within three standard deviations. The empirical rule predicts the
probability distribution for a set of outcomes. If µ represents the mean
and σ the std devn, then 68% of the data lies within µ ± σ, 95% within
µ ± 2σ and 99.7% within µ ± 3σ.
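The rule can be verified with the standard normal CDF in base R:

    pnorm(1) - pnorm(-1)   # ~0.6827
    pnorm(2) - pnorm(-2)   # ~0.9545
    pnorm(3) - pnorm(-3)   # ~0.9973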
45) What is the min no. of observations reqd to come up with a Linear
Regression output? 2, since the output of Linear Regression is a straight
line and two points determine a line.
46) What is Standard Normal Distribution?
Z = (x - µ) / σ
Standard Normal Distribution is a Normal Distribution with a mean of zero
and a standard deviation of one. We obtain the std normal distn by using
the Z transformation technique. Without standardising, comparisons would
be biased towards variables measured on larger scales; the Std Normal
Distribution avoids this by putting everything on a common scale.

47) What is a Q-Q plot?
A Q-Q or Quantile-Quantile plot is used to find whether the data follows a
normal distribution or not. It is plotted with the sample quantiles of the
variable on the Y-axis and the theoretical normal quantiles on the X-axis.
Hence it is also called a Normal Q-Q plot.
The reference line in the Q-Q plot indicates the Normal Distribution. If
most of the data points fall along the reference line, we say the
distribution of the data follows the Normal Distribution.
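In base R:

    set.seed(7)
    x <- rnorm(200, mean = 50, sd = 10)
    qqnorm(x)   # sample quantiles vs theoretical normal quantiles
    qqline(x)   # reference line; points near it suggest normality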
48) Explain Central Limit Theorem (CLT).
CLT explains the distribution of sample means: the distribution of the
sample mean approximates a normal distribution irrespective of the
population distribution, if the sample size is sufficiently large.
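A quick simulation in base R, drawing sample means from a heavily skewed (exponential) population:

    set.seed(99)
    sample_means <- replicate(5000, mean(rexp(50, rate = 1)))
    hist(sample_means, breaks = 40)   # approximately bell-shaped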
49) When do we go for t-distribution?
When the population std deviation is unknown and the size of sample is less
than 30.
50) What do you mean by Hypothesis Testing?
It is a way of testing results of an experiment to check whether they are
valid or happened by chance.
51) What is True Positive Rate / Recall?
True Positive Rate also called Recall = TP/(TP+FN)
52) Diff b/w Covariance and Correlation?
Covariance is the measure of how two variables change together. It is
difficult to compare across variable pairs because it depends on the scales
of the variables. Correlation is the standardised form of Covariance. It
has a value b/w -1 and +1 irrespective of scale.
53) Diff b/w Gini Index and Entropy?
Gini Index – the probability of an observation being classified
incorrectly if it were labelled randomly according to the class
distribution at the node. Value varies from 0 to 0.5 (binary case).
Entropy – measure of randomness/impurity in the information being
processed. Value varies from 0 to 1.
54) What is Log loss?
Log loss or Cross Entropy loss measures the performance of a classification
model whose output is a probability value b/w 0 and 1. Generally, Log loss
is used for binary classes and cross entropy loss for multi-class
classifications.
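A binary log loss helper in base R (eps guards against log(0)):

    logloss <- function(actual, predicted, eps = 1e-15) {
      p <- pmin(pmax(predicted, eps), 1 - eps)
      -mean(actual * log(p) + (1 - actual) * log(1 - p))
    }
    logloss(c(1, 0, 1), c(0.9, 0.1, 0.8))   # low: confident and correct
    logloss(c(1, 0, 1), c(0.1, 0.9, 0.2))   # high: confident and wrong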
55) Similarities b/w KNN and KMeans Clustering?
Both are trial-and-error methods: we try different values of K to find the
best fit. Another similarity is that both algorithms use distance measure
calculations.
56) In Logistic Regression, what are the measures we can use to decide the
goodness of the model?
Other than accuracy, Sensitivity (Recall), Specificity, the ROC curve, the
confusion matrix and AIC values are the other measures we can rely on.
57) Why are ensemble models more preferred than other classification
models?
Ensemble models ensure the reliability of the accuracy. They use the
classification techniques intelligently, taking subsets of features and
parameters to do the classification. They average out biases, reduce
variance and are less likely to overfit.
58) Pre-processing steps before building a Recommendation system?
Missing value imputation, normalization, SVD or PCA or Clustering,
similarity measures.
59) Apriori algorithm uses → Affinity Analysis
60) How is User based collaborative filtering different from Item based
collaborative filtering?
In Item based filtering, we try to find the similarity among items, which
can be computationally complex when the no. of items is large. User based
filtering, on the other hand, tries to find the similarity among users, and
hence connects directly to users' tastes.
61) What is the first thing we do after getting a dataset?
First, we must look for any missing/NA values. NA values should be treated
by missing value imputation rather than by deleting those observations.
62) What is sampling bias?
It is a statistical error that causes a bias in the sampling portion of the
experiment. This error causes one sampling group to be selected more often
than others, which leads to inaccurate conclusions.
63) What is ROC Curve? How it works? What is AUC-ROC?
ROC Curve stands for Receiver Operating Characteristics Curve; it is a
graphical representation of the True Positive Rate (TPR) against the False
Positive Rate (FPR) at various thresholds. AUC or Area Under the Curve
measures how well a model can distinguish between classes.
64) Define Recall and Precision?
Recall, also known as the true positive rate, is the proportion of actual
positives that the model correctly identifies: TP/(TP+FN). Precision, known
as the positive predictive value, is the proportion of the model's positive
claims that are actually positive: TP/(TP+FP).
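With hypothetical confusion-matrix counts in R:

    TP <- 40; FP <- 10; FN <- 5; TN <- 45
    recall    <- TP / (TP + FN)   # ~0.889: coverage of actual positives
    precision <- TP / (TP + FP)   # 0.800: accuracy of positive claims
    c(recall = recall, precision = precision)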
65) What are Type – I and Type – II errors?
Type – I error is false positive whereas Type – II error is false negative.
66) What is Over-fitting? How to avoid Over-fitting?
Over-fitting occurs when a model studies the training data to such an
extent that it negatively affects the performance of the model on new data.
To avoid over-fitting, we can use the following methods:
a. Keep the model simple by using fewer variables and parameters, thereby
reducing variance and noises in the data.
b. Train the model on varied samples of the data using techniques such as
k-fold cross-validation.
c. Use regularization techniques like LASSO that penalize certain model
parameters if they're likely to cause overfitting.
d. Use ensemble techniques.
67) What evaluation techniques do we use to gauge the effectiveness of a
ML model? How would you check the effectiveness of a ML model?
We can split the dataset into training and testing sets, or use
cross-validation techniques to segment the dataset into train and test
sets. Then implement a selection of performance metrics such as F1 score,
accuracy and the confusion matrix, alongside error measures relevant to
the problem.
68) What is kernel trick and how is it useful?
The kernel trick involves kernel functions that compute inner products in
a higher-dimensional feature space without explicitly transforming the
data. Non-linear data can become linearly separable in the higher
dimension, so the kernel trick enables us to effectively run algorithms in
a higher dimension while working with lower-dimensional data.
69) How is kNN different from k-means clustering?
k-means clustering partitions the dataset into clusters such that each
cluster formed is homogeneous while maintaining enough separability between
clusters. Because it is unsupervised, there will not be any labels for the
clusters. kNN is a classification algorithm which classifies an unlabelled
observation based on its k surrounding neighbours.
70) Explain PCA?
PCA or Principal Component Analysis is a method for transforming the
features in a dataset into uncorrelated linear combinations. These new
features, the principal components, sequentially capture the variance from
max to min. Hence, we can use PCA for dimensionality reduction.
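In base R, with scale. = TRUE standardising the variables first:

    pca <- prcomp(iris[, 1:4], scale. = TRUE)
    summary(pca)         # proportion of variance per component
    head(pca$x[, 1:2])   # data projected onto the first two components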
71) What are GD (Gradient Descent) and SGD (Stochastic Gradient Descent)?
GD and SGD are methods to minimize an error or loss function. GD computes
the gradient over the entire dataset for each parameter update, whereas SGD
updates the parameters using one randomly chosen observation (or a small
batch) at a time, which is faster on large datasets.
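A minimal sketch of GD in base R, fitting y = b0 + b1*x by minimising mean squared error on simulated data:

    set.seed(3)
    x <- runif(100); y <- 2 + 3 * x + rnorm(100, sd = 0.1)
    b0 <- 0; b1 <- 0; lr <- 0.1
    for (i in 1:2000) {
      resid <- (b0 + b1 * x) - y
      b0 <- b0 - lr * mean(resid)       # gradient step for the intercept
      b1 <- b1 - lr * mean(resid * x)   # gradient step for the slope
    }
    c(b0, b1)   # close to the true values 2 and 3

For SGD, each update would use a single randomly sampled observation instead of the full-data means above.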
