Data Science Interview Q's - V
Hi there, thanks for the continued support of my previous articles. Today we
will continue from our previous article, “Data Science Interview Q’s — IV” PART-IV,
with more of the commonly asked essential questions that interviewers use to gauge
root-level knowledge of DS, rather than going for fancy advanced questions.
The decision boundary is a line (or, more generally, a surface) that separates the
target variable into different classes. A decision boundary can be either linear or
nonlinear. In the case of a logistic regression model, the decision boundary is a
straight line.
The logistic regression model has the form log-odds = α + β1X1 + β2X2 + … + βkXk,
which is linear in the features, so the boundary where the log-odds equal zero is a
straight line. Logistic regression is only suitable in cases where a straight line is
able to separate the different classes. If a straight line cannot do this, then
nonlinear algorithms should be used to achieve better results.
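As a quick illustration, here is a minimal sketch (assuming scikit-learn and a synthetic two-feature dataset) that fits a logistic regression and prints the equation of its linear decision boundary:

```python
# Minimal sketch: fit logistic regression on synthetic data and inspect
# the linear decision boundary (assumes scikit-learn is available).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42)
clf = LogisticRegression().fit(X, y)

# The boundary is the set of points where the log-odds equal zero:
# alpha + b1*x1 + b2*x2 = 0, i.e. a straight line in feature space.
alpha = clf.intercept_[0]
b1, b2 = clf.coef_[0]
print(f"Boundary: {alpha:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2 = 0")
```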
The likelihood function is the joint probability of observing the data. For example,
let’s assume that a coin is tossed 100 times and we want to know the probability of
getting 60 heads from the tosses. This example follows the binomial distribution
formula.
p = the probability of heads in a single coin toss
n = 100 (the number of coin tosses)
x = 60 (the number of heads — successes)
n − x = 40 (the number of tails)
Pr(X = 60 | n = 100, p)
The likelihood function is the probability of observing 60 heads in a trial of 100
coin tosses, where the probability of heads in each coin toss is p. Here the
coin-toss result follows a binomial distribution.
This can be reframed as follows:
Pr(X = 60 | n = 100, p) = c × p^60 × (1 − p)^(100−60) = c × p^60 × (1 − p)^40
c = a constant (the binomial coefficient C(100, 60), which does not depend on p)
p = the unknown parameter
The likelihood function gives the probability of observing the results using unknown
parameters.
The MLE chooses the set of unknown parameters (the estimator) that maximises the
likelihood function. The standard method of finding the MLE is calculus: set the
derivative of the (log-)likelihood function with respect to the unknown parameter
to zero, and solving it will give the MLE. For a binomial model this is easy (here
it gives the intuitive estimate p̂ = x/n = 60/100 = 0.6), but for a logistic model
the calculations are complex, so computer programs are used for deriving the MLE
for logistic models.
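As a cross-check, here is a minimal numeric sketch (assuming SciPy) that evaluates the binomial likelihood of the coin-toss example over a grid of candidate values of p and picks the maximiser:

```python
# Minimal sketch: maximise the binomial likelihood over a grid of p.
import numpy as np
from scipy.stats import binom

n, x = 100, 60
p_grid = np.linspace(0.01, 0.99, 999)
likelihood = binom.pmf(x, n, p_grid)   # Pr(X=60 | n=100, p) for each candidate p
p_hat = p_grid[np.argmax(likelihood)]
print(f"MLE of p: {p_hat:.2f}")        # ~0.60, matching the analytic result x/n
```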
(Here’s another approach to answering the question.)
MLE is a statistical approach to estimating the parameters of a mathematical model.
MLE and ordinary least squares (OLS) estimation give the same results for linear
regression if the errors of the dependent variable are assumed to be normally
distributed. MLE does not assume anything about the distribution of the
independent variables.
4. What are the different methods of MLE and when is each method
preferred?
In the case of logistic regression, there are two approaches to MLE: the conditional
and the unconditional method. These are algorithms that use different likelihood
functions. The unconditional formula employs the joint probability of positives
(for example, churn) and negatives (for example, non-churn). The conditional
formula is the ratio of the probability of the observed data to the probability
of all possible configurations.
The unconditional method is preferred when the number of parameters is low
compared to the number of instances. If the number of parameters is high relative
to the number of instances, then conditional MLE is preferred. Statisticians
suggest using conditional MLE when in doubt, as it tends to give less biased
results.
7. Why can’t we use Mean Square Error (MSE) as a cost function for
logistic regression?
In logistic regression, the sigmoid applies a nonlinear transformation to obtain
probabilities. Plugging that transformation into MSE yields a non-convex cost
function with local minima, so gradient descent is not guaranteed to find the
global minimum. Instead, log loss (cross-entropy), which is convex for logistic
regression, is used as the cost function.
8. Why is accuracy not a good measure for classification problems?
Accuracy is not a good measure for classification problems because it gives equal
importance to false positives and false negatives. However, this may not be the
case in most business problems. For example, in the case of cancer prediction,
declaring a cancer benign is far more serious than wrongly informing the patient
that he is suffering from cancer. Accuracy gives equal importance to both cases
and cannot differentiate between them.
9. What are the true positive rate (TPR), true negative rate (TNR),
false-positive rate (FPR), and false-negative rate (FNR)?
TPR is the proportion of actual positives that are correctly predicted as positive.
In simple words, it is the frequency of correctly predicted positive labels.
TPR = TP / (TP + FN)
TNR is the proportion of actual negatives that are correctly predicted as negative.
It is the frequency of correctly predicted negative labels.
TNR = TN / (TN + FP)
FPR is the proportion of actual negatives that are incorrectly predicted as
positive. It is the frequency of incorrectly predicted positive labels.
FPR = FP / (FP + TN)
FNR is the proportion of actual positives that are incorrectly predicted as
negative. It is the frequency of incorrectly predicted negative labels.
FNR = FN / (FN + TP)
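Here is a minimal sketch (assuming scikit-learn and toy labels, used purely for illustration) that derives all four rates from a confusion matrix:

```python
# Minimal sketch: compute TPR, TNR, FPR, FNR from a confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # toy ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # toy predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TPR:", tp / (tp + fn))  # sensitivity / recall
print("TNR:", tn / (tn + fp))  # specificity
print("FPR:", fp / (fp + tn))  # 1 - TNR
print("FNR:", fn / (fn + tp))  # 1 - TPR
```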
The F-measure is the harmonic mean of precision and recall. In some cases, there
is a trade-off between precision and recall; in such cases, the F-measure will
drop. It is high only when both precision and recall are high. Depending on the
business case at hand and the goal of the data analysis, an appropriate metric
should be selected.
F-measure = 2 × (Precision × Recall) / (Precision + Recall)
Lift is the improvement in model performance (increase in true positive rate)
compared to random performance. Random performance means that if 50% of the
instances are targeted, the model is expected to detect 50% of the positives. If
a model performs better than random, its lift is greater than 1.
In a lift curve, lift is plotted on the Y-axis and the percentage of the population
(sorted by predicted score in descending order) on the X-axis. At a given
percentage of the target population, a model with a higher lift is preferred.
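A minimal sketch of the idea (with hypothetical labels and scores, assuming NumPy): lift at the top decile is the positive rate in the targeted slice divided by the overall positive rate:

```python
# Minimal sketch: lift at the top 10% of the score-ranked population.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)               # toy labels
scores = y_true * 0.3 + rng.random(1000) * 0.7  # toy scores correlated with labels

order = np.argsort(scores)[::-1]       # sort population by score, descending
top10 = order[: len(order) // 10]      # target the top 10% of the population
lift = y_true[top10].mean() / y_true.mean()
print(f"Lift at 10%: {lift:.2f}")      # > 1 means better than random targeting
```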
Logistic regression will find a linear boundary if one exists, but it will shift
that boundary to accommodate outliers. An SVM, by contrast, is insensitive to
individual samples: there is no major shift of the linear boundary to accommodate
an outlier. SVM also comes with built-in complexity controls that take care of
overfitting, which is not true in the case of logistic regression.
11. How will you deal with the multiclass classification problem
using logistic regression?
The best-known method of dealing with multiclass classification using logistic
regression is the one-vs-all (one-vs-rest) approach. Under this approach, the
number of models trained equals the number of classes. The models work in a
specific way: the first model classifies a data point as belonging to class 1 or
some other class; the second model classifies it into class 2 or some other class;
and so on. This way, each data point is checked against all the classes, and the
class whose model produces the highest score is chosen.
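A minimal sketch (assuming scikit-learn; the iris dataset stands in for a real multiclass problem) of the one-vs-rest approach:

```python
# Minimal sketch: one-vs-all logistic regression on a 3-class dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(len(ovr.estimators_))   # 3 binary models, one per class
print(ovr.predict(X[:5]))     # each point is scored by all 3 models
```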
12. Explain the use of ROC curves and the AUC of an ROC Curve.
An ROC curve plots the true positive rate (TPR) against the false-positive rate
(FPR) at every possible classification threshold. The AUC (area under the ROC
curve) summarises how well the model separates the classes across all thresholds:
an AUC of 1 indicates perfect separation, while 0.5 is no better than random
guessing.
Regularisation adds a penalty on large coefficients to the cost function in order
to control overfitting. Two common techniques are:
· L1 or LASSO regularisation: the absolute values of the coefficients are added to
the cost function, i.e. the penalty term is λ × Σ|βj|. This regularisation
technique gives sparse results, which leads to feature selection as well.
· L2 or Ridge regularisation: the squares of the coefficients are added to the
cost function, i.e. the penalty term is λ × Σβj².
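A minimal sketch contrasting the two penalties (assuming scikit-learn; note that the C parameter there is the inverse of λ):

```python
# Minimal sketch: L1 tends to zero out coefficients (sparse feature
# selection), L2 only shrinks them. Note: C in scikit-learn = 1/λ.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           random_state=0)
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)
print("L1 zero coefficients:", (l1.coef_ == 0).sum())
print("L2 zero coefficients:", (l2.coef_ == 0).sum())
```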
Selecting the regularisation parameter is a tricky business. If the value of λ is
too high, it will shrink the regression coefficients β towards extremely small
values, which will lead to the model underfitting (high bias, low variance). On
the other hand, if the value of λ is 0 or very small, the model will tend to
overfit the training data (low bias, high variance).
There is no single correct way to select the value of λ. One approach is to take a
sub-sample of the data and run the algorithm multiple times with different values
of λ; the analyst then has to decide how much variance can be tolerated. Once
satisfied with the variance, that value of λ can be chosen for the full dataset.
Note, however, that the λ selected this way is optimal for that subset, not
necessarily for the entire training data.
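In practice, λ is often chosen by cross-validation instead. A minimal sketch of that variant (assuming scikit-learn, where Ridge calls the parameter alpha):

```python
# Minimal sketch: pick λ (alpha) by 5-fold cross-validation with RidgeCV.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=200, n_features=10, noise=10, random_state=0)
model = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
print("Chosen λ (alpha):", model.alpha_)
```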
One can use linear regression for time series analysis, but the results are not
promising, so it is generally not advisable to do so. The reasons are:
1. Time series data is mostly used for predicting the future, and linear
regression seldom gives good results for future prediction, as it is not meant for
extrapolation.
2. Time series data usually has patterns, such as spikes during peak hours or
festive seasons, which would most likely be treated as outliers in a linear
regression analysis.
17. What value is the sum of the residuals of a linear regression close to? Justify.
The sum of the residuals of a linear regression (with an intercept term) is 0.
Linear regression assumes that the errors (residuals) have a mean of 0, and
fitting the intercept by ordinary least squares forces the residuals to sum to
exactly zero.
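A minimal numeric check of this claim (assuming scikit-learn and synthetic data):

```python
# Minimal sketch: with an intercept, OLS residuals sum to (numerically) zero.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3 + rng.normal(0, 0.1, 100)

model = LinearRegression().fit(X, y)   # fits an intercept by default
residuals = y - model.predict(X)
print(residuals.sum())                 # ~0 up to floating-point error
```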
18. You run your regression on different subsets of your data, and in each subset,
the beta value for a certain variable varies wildly. What could be the issue here?
This case implies that the dataset is heterogeneous. So, to overcome this problem,
the dataset should be clustered into different subsets, and then separate models
should be built for each cluster. Another way to deal with this problem is to use non-
parametric models, such as decision trees, which can deal with heterogeneous data
quite efficiently.
19. Your linear regression doesn’t run and communicates that there is an infinite
number of best estimates for the regression coefficients. What could be wrong?
The likely culprit is perfect multicollinearity: two or more predictors are exact
linear combinations of each other, so the design matrix is not of full rank and
the OLS solution is not unique. Removing or combining the redundant variables
resolves the problem.
20. What do you mean by adjusted R2? How is it different from R2?
Adjusted R², just like R², indicates how closely the points lie around the
regression line, that is, how well the model fits the training data.
One drawback of R² is that it always increases with the addition of a new feature,
whether the new feature is useful or not. Adjusted R² overcomes this drawback: it
is computed as 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of
observations and p the number of predictors, and its value increases only if the
newly added feature plays a significant role in the model.
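A minimal sketch of the computation (assuming scikit-learn for the plain R²):

```python
# Minimal sketch: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=5, noise=20, random_state=0)
r2 = LinearRegression().fit(X, y).score(X, y)
n, p = X.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"R2={r2:.3f}, adjusted R2={adj_r2:.3f}")  # adjusted value is never higher
```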
The residual-vs-fitted-value plot is used to check whether the predicted values
and the residuals are correlated. If the residuals are scattered symmetrically
around zero with a constant variance across the range of fitted values, the model
is working fine; otherwise, there is some issue with the model.
The most common problem found when training a model over a large range of a
dataset is heteroscedasticity (explained in the answer below). The presence of
heteroscedasticity can easily be seen by plotting the residual-vs-fitted-value
curve.
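A minimal sketch of such a plot (assuming matplotlib and scikit-learn; the synthetic noise deliberately grows with X, so the plot shows the tell-tale funnel shape):

```python
# Minimal sketch: residual-vs-fitted plot; a funnel shape suggests
# heteroscedasticity.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((200, 1)) * 10
y = 3 * X[:, 0] + rng.normal(0, X[:, 0], 200)  # noise grows with X

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
plt.scatter(fitted, y - fitted, s=10)
plt.axhline(0, color="red")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```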
22. What is heteroscedasticity? What are the consequences, and how can you
overcome it?
Heteroscedasticity means that the variance of the errors is not constant across
observations. The coefficient estimates remain unbiased, but their standard
errors, and hence the significance tests built on them, become unreliable. Two
common remedies are:
1. Transforming the variables: for example, taking the logarithm of the dependent
variable often stabilises the variance.
2. Using weighted linear regression: here, the OLS method is applied to weighted
values of X and Y. One way is to attach weights inversely related to the variance
of the dependent variable.
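A minimal sketch of weighted least squares (assuming statsmodels; the 1/x weighting below is just an illustrative variance proxy):

```python
# Minimal sketch: WLS down-weights high-variance observations.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.random(200) * 10 + 1
y = 3 * x + rng.normal(0, x, 200)          # variance grows with x

X = sm.add_constant(x)
wls = sm.WLS(y, X, weights=1.0 / x).fit()  # weights ~ inverse of a variance proxy
print(wls.params)                          # intercept and slope estimates
```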
Hypothesis testing can be carried out in linear regression for the following purposes:
1. To check whether a predictor is significant for the prediction of the target
variable. Two common methods for this are the t-test on the coefficient (via its
p-value) and checking whether the coefficient’s confidence interval contains zero.
4. To check whether the calculated regression coefficients are good estimators of
the actual coefficients.
Before fitting the model, one must be well aware of the data: the trends,
distribution, skewness, etc. of the variables. Graphs such as histograms, box
plots, and dot plots can be used to observe the distribution of the variables.
Apart from this, one must also analyse the relationship between the dependent and
independent variables, which can be done with scatter plots (in the case of
univariate problems), rotating plots, dynamic plots, etc.
The generalised linear model (GLM) is an extension of the ordinary linear
regression model. GLM is more flexible in terms of the residuals and can be used
where linear regression does not seem appropriate: it allows the distribution of
the residuals to be other than normal. It generalises linear regression by
connecting the linear model to the target variable through a link function. Model
estimation is done using maximum likelihood estimation.
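A minimal sketch of a GLM (assuming statsmodels and synthetic data): a binomial family with its default logit link reproduces logistic regression, fitted by maximum likelihood:

```python
# Minimal sketch: GLM with a binomial family and logit link.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.random((200, 2)))
y = (X @ [0.5, 2.0, -1.5] + rng.normal(0, 0.5, 200) > 1).astype(float)

glm = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(glm.params)  # coefficients on the log-odds scale
```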
26. You will see two statements listed below. Read both of them carefully and then
choose the correct option. The question is: choose the statements which are true
about bagging trees.
1. In a bagging tree, the individual trees are not at all dependent on each other.
2. To improve the overall performance of the model, the aggregate is taken over
weak learners. This method is known as bagging trees.
Ans. The correct answer is C, because for a bagging tree both of these statements
are true. In bagging trees, or bootstrap aggregation, the main goal of the
algorithm is to reduce the variance of the decision trees. The mechanism for
creating a bagging tree is to draw, with replacement, a number of subsets from the
training sample.
Each of these smaller subsets of data is then used to train a separate decision
tree. Since the data fed into each tree is unique, the likelihood of any tree
having an impact on another is very low. The results from all these trees are
collected and aggregated to produce the output. Thus, the second statement is also
true.
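A minimal sketch of bagging (assuming scikit-learn; BaggingClassifier uses a decision tree as its default base learner):

```python
# Minimal sketch: each tree trains on a bootstrap sample drawn with
# replacement; predictions are aggregated across trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, random_state=0)
bag = BaggingClassifier(n_estimators=50, bootstrap=True,
                        random_state=0).fit(X, y)
print(bag.score(X, y))
```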
27. You will see two statements listed below. Read both of them carefully and then
choose the correct option. The question is: choose the statements which are true
about boosting trees.
1. The individual trees in a boosted tree are independent of each other.
2. The weak learners’ performance is all collected and aggregated to improve the
boosted tree’s overall performance.
Ans. If you understand how the boosting of trees is done, you will be able to tell
the correct statement from the false one. A boosted tree is created when many weak
learners are connected in series; each tree in the sequence has one sole aim: to
reduce the error made by its predecessor.
Since the trees are connected in this fashion, they cannot be independent of each
other, which renders the first statement false. The second statement is true,
mainly because aggregating the weak learners is precisely the method applied in a
boosted tree to improve the overall performance of the model. The correct option
is B: only statement number two is TRUE, and statement number one is FALSE.
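A minimal sketch of boosting (assuming scikit-learn): trees are grown sequentially, each fitted to the errors left by its predecessors:

```python
# Minimal sketch: gradient boosting adds trees in series to correct the
# errors of the current ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=0).fit(X, y)
print(gbm.score(X, y))
```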
28. You will see four statements listed below. Read all of them carefully and then
choose the correct option. The question is: choose the statements which are true
about random forests and the gradient boosting ensemble method.
1. Both random forest and gradient boosting ensemble methods can be used to
perform classification.
2. Random forests can be used to perform classification tasks, whereas gradient
boosting can only perform regression.
4. Both random forest and gradient boosting ensemble methods can be used to
perform regression.
Ans. Both random forest and gradient boosting can perform classification as well
as regression, so statements one and four are true and statement two is false.
29. You will see four statements listed below. Read all of them carefully and then
choose the correct option. The question is: consider a random forest of trees;
what will be true about each or any of the trees in the random forest?
1. Each tree that constitutes the random forest is built on a subset of all the
features.
3. Each of the trees in a random forest is built on a subset of all the
observations present.
4. Each of the trees in a random forest is built on the full observation set.
Ans. The generation of random forests is based on the concept of bagging. To build
a random forest, a subset is taken of both the observations and the features; each
such subset is fed into an individual decision tree, and the values from all the
decision trees are then aggregated to make the final decision. The only correct
statements are therefore one and three.
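A minimal sketch (assuming scikit-learn): bootstrap=True resamples the observations for each tree, and max_features restricts the features considered at each split, matching statements one and three:

```python
# Minimal sketch: a random forest subsamples observations (bootstrap) and
# features (max_features, applied at each split).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=100, bootstrap=True,
                            max_features="sqrt", random_state=0).fit(X, y)
print(rf.score(X, y))
```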
30. You will see four statements listed below. Read all of them carefully and then
choose the correct option. The question is: select the correct statements about
the hyperparameter known as “max_depth” of the gradient boosting algorithm.
3. If we increase the value of this hyperparameter, the chances of the model
overfitting the data increase.
4. If we increase the value of this hyperparameter, the chances of the model
underfitting the data increase.
Ans. The hyperparameter max_depth controls how deeply gradient boosting models the
data presented to it. If you keep increasing the value of this hyperparameter,
the model is bound to overfit, so statement number three is correct. Moreover,
given the same scores on the validation data, we generally prefer the model with
the lower depth. So, statements number one and three are correct.
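A minimal sketch of the overfitting effect (assuming scikit-learn): as max_depth grows, the gap between the training and validation scores widens:

```python
# Minimal sketch: sweep max_depth and watch the train/validation gap grow.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_informative=5, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

for depth in (1, 3, 6, 10):
    gbm = GradientBoostingClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, gbm.score(X_tr, y_tr), gbm.score(X_va, y_va))
```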
31. You will see four methods listed below. Read all of them carefully and then
choose the correct option. The question is: which of the following methods do not
have a learning rate as one of their tunable hyperparameters?
1. Extra Trees
2. AdaBoost
3. Random Forest
4. Gradient boosting
Ans. Only Extra Trees and Random Forest do not have a learning rate among their
tunable hyperparameters, so the correct choices are methods one and three.
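This is easy to verify directly (assuming scikit-learn): the boosting classes expose a learning_rate parameter, while the bagging-style forests do not:

```python
# Minimal sketch: check which ensembles expose a learning_rate parameter.
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)

for cls in (ExtraTreesClassifier, AdaBoostClassifier,
            RandomForestClassifier, GradientBoostingClassifier):
    print(cls.__name__, "learning_rate" in cls().get_params())
```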
32. Choose the option that is true.
1. Only the random forest algorithm can handle real values by making them
discrete.
2. Only the gradient boosting algorithm can handle real values by making them
discrete.
3. Both random forest and gradient boosting can handle real values by making them
discrete.
Ans. Both of these algorithms are capable ones: they can easily handle features
that contain real values. So the answer is C.
33. Choose one option from the list below. The question is: choose the algorithm
which is not an ensemble learning algorithm.
1. Gradient boosting
2. AdaBoost
3. Extra Trees
4. Random Forest
5. Decision Trees
Ans. The answer is 5, Decision Trees: a single decision tree is a standalone
learner, whereas the other four methods all combine many trees into an ensemble.
34. You will see two statements listed below. Read both of them carefully and then
choose the correct option. The question is: which of the following is true in the
paradigm of ensemble learning?
1. Having a larger number of trees in the ensemble is beneficial.
2. You will still be able to interpret what is happening even after you implement
the random forest algorithm.
Ans. Since any ensemble learning method is based on coupling a large number of
decision trees (each of which, on its own, is a weak learner), it is generally
beneficial to have more trees in your ensemble. However, the random forest
algorithm is like a black box: you will not know what is happening inside the
model, so you are bound to lose most of the interpretability after you apply it.
The correct answer is therefore A, because statement number one is the only true
one.
35. Answer only TRUE or FALSE. Does the bagging algorithm work best for models
which have high variance and low bias?
Ans. True. Bagging is indeed most favourable for high-variance, low-bias models.
36. You will see two statements listed below. Read both of them carefully and then
choose the correct option. The question is: choose the right ideas for gradient
boosting trees.
1. In every stage of boosting, the algorithm introduces another tree to compensate
for the current model’s issues.
2. The gradient descent algorithm is used to minimise the loss function.
Ans. The answer to this question is C, meaning both statements are TRUE. The first
statement describes exactly how the boosting algorithm works: the new trees
introduced into the model augment the performance of the existing ensemble. The
second is also correct: gradient descent is the algorithm applied to reduce the
loss function.
37. In the gradient boosting algorithm, which of the statements below is correct
about the learning rate?
4. The learning rate you set should be high, but not extremely high.
Ans. The learning rate should be low, but not very low, so statement four is false
and the answer is option C.
I hope you found these questions useful for your career. Credit goes to upGrad,
from which I was able to gather this set of interview questions for you.
Next, we will walk through more advanced topics of data science, such as comparing
two machine learning models.
Thanks again for your time. If you enjoyed this short article, there are tons of
topics on advanced analytics, data science, and machine learning available in my
Medium repo.
https://medium.com/@bobrupakroy