Dats501 2021 Spring Week14 Exam2

12/29/21, 6:16 PM dats501_2021_spring_week14_exam2
dats501_2021_spring_week14_exam2
30 questions
1. What is the median of the following set of numbers? { 10, 5, 14, 11, 8, 6, 5, 6, 10, 12, 9 }
A. 6 B. 10
C. 9 D. 8
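As a worked check for question 1, the median can be found by sorting the values and taking the middle one. The sketch below uses only the numbers given in the question; the variable names are illustrative.

```python
from statistics import median

values = [10, 5, 14, 11, 8, 6, 5, 6, 10, 12, 9]

# With an odd number of values (11 here), the median is the middle
# element of the sorted list.
ordered = sorted(values)
middle = ordered[len(ordered) // 2]

print(ordered)
print(middle, median(values))
```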
2. Which of the below are not measures of variability?
i. median
ii. range
iii. mode
iv. coefficient of variation

A. ii, iv B. i, iv
C. i, iii D. ii, iii
3. If a distribution is right-skewed, which of the below is/are correct?
i. The mean value is greater than the median value
ii. The mode is generally smaller than the median value
iii. It is also called positively-skewed

A. i, ii, iii B. i, iii
C. i, ii D. ii, iii
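To make the statements in question 3 concrete, the sketch below computes the mode, median, and mean of a small sample with a long right tail. The sample is hypothetical and chosen only for illustration; real right-skewed data need not order these three measures so cleanly.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: most values are small, one is large.
sample = [1, 1, 2, 3, 10]

m_mode = mode(sample)
m_median = median(sample)
m_mean = mean(sample)

# The single large value pulls the mean to the right of the median.
print(m_mode, m_median, m_mean)
```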
4. How many of the below are assumptions of linear regression?
i. linearity
ii. independence of errors
iii. normality of errors
iv. equal variance
v. nominal inputs
vi. narrow variance

A. 4 B. 2
C. 3 D. 1
5. When an important variable is not available, another variable may wrongly absorb its effect. This is called
omitted variable bias. Which of the below situations requires checking for such a case?
i. Coefficients of the variables contradict the business owner's experience
ii. Adding new variables changes the current coefficients of the variables dramatically
A. i B. ii
C. none D. i, ii
6. Which of the below are correct while converting categorical data to numeric?
i. Each category in the variable is expressed as a dummy variable
ii. These dummy variables consist of boolean values
iii. A boolean value shows whether that category is present for that data point or not
iv. Converting a variable with 2 categories results in 2 new dummy variables and 1 of them is redundant since
the second one is the mirror of the first one
A. iii, iv B. i, ii, iii, iv
C. i, ii, iii D. i, ii, iv
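The dummy-variable encoding described in question 6 can be sketched in plain Python. The `one_hot` helper and the `payment` column are hypothetical names introduced for illustration, not part of any library.

```python
def one_hot(values):
    """Return one boolean column per category (hypothetical helper)."""
    categories = sorted(set(values))
    return {c: [v == c for v in values] for c in categories}

# Hypothetical categorical variable with exactly two categories.
payment = ["cash", "card", "card", "cash"]
dummies = one_hot(payment)

# With two categories, each column is the element-wise negation of the
# other, so one of them carries no extra information.
mirror = [not flag for flag in dummies["card"]]

print(dummies)
print(mirror == dummies["cash"])
```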
https://app.quizalize.com/quiz/preview/Q29udGVudDo4NTVjMGE5Ny05MTE4LTQ5YmUtYmI5MS04NzliYTI1ZDZmYzk= 1/5
7. Which of the below are potential methods while creating splits in a single decision tree?
i. variance
ii. entropy
iii. gini
A. ii, iii B. i, ii, iii
C. i, ii D. i, iii
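For reference, the three quantities named in question 7 can be computed as follows. This is a minimal sketch with hypothetical helper names; Gini impurity and entropy are usually applied to class labels, and variance to a numeric target.

```python
from collections import Counter
from math import log2
from statistics import pvariance

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits of the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

mixed = ["a", "a", "b", "b"]   # hypothetical node with two balanced classes
pure = ["a", "a", "a", "a"]    # hypothetical node with a single class

print(gini(mixed), entropy(mixed))
print(gini(pure), entropy(pure))
print(pvariance([3.0, 3.0, 3.0]), pvariance([1.0, 3.0, 5.0]))
```

A split is typically scored by how much it lowers the chosen quantity in the child nodes relative to the parent.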
8. Which of the below statements are correct for ensembling?
i. Ensembling reduces interpretability
ii. Random forest and boosting use the ensembling idea
iii. The ensembling idea is applicable beyond tree algorithms
iv. Ensembling generally increases the overall model stability

A. i, ii, iii, iv B. i, ii, iii
C. iii, iv D. i, ii, iv
9. The fractional chance that the relationships between the target variable and the input variables are
random in linear regression is shown by
A. p-value B. t-statistic
C. f-statistic D. standard error
10. A machine learning model created with fewer variables is preferred whenever the success does not
change significantly. Which methods are used to penalize extra variables?
i. residuals
ii. degrees of freedom
iii. lasso
iv. adjusted r-square
v. ridge
A. i, ii, iii, iv B. ii, iii, iv, v
C. iii, iv, v D. i, ii, iii, iv, v
11. A new variable is added to a linear regression model. The p-values of a few previously included variables
change. Which of the below is always correct?
A. Some of the old variables overfit B. Some of the old variables are problematic
C. The newly added variable is redundant D. The newly added variable is not fully uncorrelated
12. What supervises the model in supervised learning?

A. independent variable B. dependent variable
C. model parameters D. coefficient of variation
13. Which of the below can be said about the k-means algorithm without additional info?
i. always converges to the same final clusters
ii. is not widely used in practice
iii. is better than DBSCAN
iv. uses actual data points as centers

A. None B. iii, iv
C. only iii D. i, ii
14. Which of the below is not related to the DBSCAN algorithm?
A. distance point B. core point
C. noise point D. border point
15. Which of the below statements are correct regarding the DBSCAN algorithm?
i. epsilon is a distance
ii. epsilon is decided by the data scientist
iii. core point threshold is decided by the data scientist
iv. there is no practical limit to the number of border points around a core point
A. ii, iii B. i, ii, iii
C. i, ii, iii, iv D. i, iv
16. How many of the below are correct regarding the DBSCAN algorithm?
i. Within an epsilon distance, the algorithm counts the points that are near a point
ii. If the number of neighbor data points is above a threshold (manually given), that point is accepted as a
core point
iii. Border points are close to core points; however, they do not have enough neighbors
iv. Core points and border points form a cluster together
v. Points that are neither core points themselves nor close to a core point are accepted as noise
A. 2 B. 3
C. 5 D. 4
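The point roles described in question 16 can be sketched as a small classification routine. This is an illustrative sketch, not a full DBSCAN implementation: it only labels points and does not build clusters. The function name and sample points are hypothetical, and counting a point as its own neighbor with a "greater than or equal" threshold is an assumed convention.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise' (hypothetical helper)."""
    # Neighbors within eps; each point counts itself (assumed convention).
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")   # near a core point, too few neighbors
        else:
            labels.append("noise")    # neither core nor near a core point
    return labels

# Hypothetical 2-D data: a dense square, one point on its edge, one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)
```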
17. Which of the below are correct for connectivity-based algorithms?
i. In the agglomerative, "bottom-up" approach, each observation starts in its own cluster, and pairs of clusters are
merged as one moves up the hierarchy
ii. In the divisive, "top-down" approach, all observations start in a single unified cluster, and splits are performed
recursively to create smaller clusters as one moves down the hierarchy
A. i B. i, ii
C. ii D. none
18. Which one is the complete linkage in agglomerative hierarchical clustering when the cities of
Turkey and Azerbaijan are assumed to form country-based clusters?
A. The farthest distance; Izmir - Baku B. The average distance; center of Turkey - center of Azerbaijan
C. The closest distance; Igdir - Karabag
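The linkage criteria contrasted in question 18 differ only in how the pairwise distances between two clusters are summarized. The sketch below uses hypothetical coordinates rather than real city locations, and it takes "average" to mean the mean of all pairwise distances (average linkage), which is an assumption about the intended definition.

```python
from itertools import product
from math import dist

def linkage_distances(cluster_a, cluster_b):
    """Summarize all pairwise distances between two clusters (hypothetical helper)."""
    pairwise = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    return {
        "single": min(pairwise),                   # closest pair
        "complete": max(pairwise),                 # farthest pair
        "average": sum(pairwise) / len(pairwise),  # mean over all pairs
    }

# Hypothetical points on a line standing in for cities of two countries.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(3, 0), (5, 0)]
result = linkage_distances(cluster_a, cluster_b)
print(result)
```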
19. Which of the below are correct for isolation forest?
i. It is a tree method
ii. It is used for grouping data based on similarities
iii. It is used for outlier detection

A. i, iii B. ii, iii
C. i, ii D. i, ii, iii
20. A model with higher bias and higher variance is ....

A. Inferior B. None
C. Equivalent D. Superior
21. Which of the below does not provide a penalty for too many variables?
A. BIC B. Cp
C. AIC D. R-square
22. Which of the below are correct for principal component analysis (PCA)?
i. It can be used if we have logical reasons to combine inputs
ii. It improves prediction performance most of the time
iii. Applying it to the whole dataset directly is generally a good approach
iv. It is not a prediction method

A. i, iv B. ii, iii
C. i, ii D. i, iii
23. Which of the below is not correct for optimization?

A. Non-negativity constraint requires each variable to be greater than or equal to zero
B. The non-linear model has linear objective function and nonlinear constraints
C. Set of all points that satisfy all of the problem's resource restrictions is called the feasible region
D. A special case when the objective function can be made infinitely large without violating any of the constraints is called unboundedness
24. Assume x and y are decision variables and the optimization problem is solved with integer linear
programming. What is the value of x in the constraint 2x + 12y = 33?
A. 0 B. 5
C. 4 D. None
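Whether the constraint in question 24 admits integer values at all can be checked directly. A linear equation ax + by = c has integer solutions only if gcd(a, b) divides c; the sketch below applies that test and also runs a brute-force search over non-negative integers. The search bounds are an assumption, chosen so that neither term alone can exceed 33.

```python
from math import gcd

a, b, c = 2, 12, 33

# Divisibility test: integer solutions exist only if gcd(a, b) divides c.
solvable = c % gcd(a, b) == 0

# Brute-force check over non-negative integers (assumed bounds: a*x <= c, b*y <= c).
solutions = [
    (x, y)
    for x in range(c // a + 1)
    for y in range(c // b + 1)
    if a * x + b * y == c
]

print(solvable, solutions)
```

Here the left-hand side is always even while 33 is odd, which is what the divisibility test detects.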
25. Which option is not related to the mediocre performance of the decision tree algorithm?
A. Its being made up of a single tree B. Its ability to work with numeric and categoric targets
C. All D. Its greedy split approach
26. If we fail to reject the null hypothesis, which of the below is considered correct?
A. The null hypothesis is true B. The null hypothesis is false
C. There is insufficient evidence to claim the null hypothesis is false D. The alternative hypothesis is true
27. Hypothesis testing is performed with the purpose of understanding a characteristic of

A. a sample taken from a population B. a population
28. Suppose I say the US Dollar / Turkish Lira exchange rate dropped to 1 within an hour, which is obviously not a correct statement.
Which of the below cognitive biases can emerge in people's minds after this exaggeration?
A. Framing effect B. Gambler's fallacy
C. Endowment effect D. Anchoring
29. Which of the below is correct?
i. While solving a linear optimization problem graphically, the corner-point solution method can only be
used to solve maximization problems
ii. Without exception, all of the functions in graphical LP are either straight lines or families of straight lines
iii. The objective function is a set of non-negativity conditions
A. i B. ii
C. iii D. none
30. The term "odds" is defined as the ratio of the tested outcomes to all other outcomes. Example: For a standard
six-sided die, the odds of rolling the side "3" are 1/5. The probability of a particular customer paying back
his loan is 0.50. What are the odds of default (not paying back)?
A. 2 B. 1
C. 0.5 D. 0.25
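The definition of odds in question 30 corresponds to p / (1 - p) for an event with probability p. The sketch below reproduces the die example from the question and applies the same formula to the default event; the `odds` function name is illustrative.

```python
from fractions import Fraction

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

# Die example from the question: P(rolling a 3) = 1/6.
die_odds = odds(Fraction(1, 6))

# Default is the complement of paying back, so P(default) = 1 - 0.50.
p_default = 1 - Fraction(1, 2)
default_odds = odds(p_default)

print(die_odds, default_odds)
```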
