
12/29/21, 6:16 PM dats501_2021_spring_week14_exam2

dats501_2021_spring_week14_exam2
30 questions

1. What is the median of the following set of numbers? { 10, 5, 14, 11, 8, 6, 5, 6, 10, 12, 9 }
A. 6 B. 10
C. 9 D. 8
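
The arithmetic in question 1 can be checked by sorting the values and taking the middle one. A minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# The eleven values from question 1.
values = [10, 5, 14, 11, 8, 6, 5, 6, 10, 12, 9]

# With an odd count, the median is the middle element of the sorted list.
ordered = sorted(values)
middle = ordered[len(ordered) // 2]

# statistics.median applies the same rule (and averages the two middle
# values when the count is even).
median_value = statistics.median(values)

print(ordered)
print(middle, median_value)
```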

2. Which of the below are not measures of variability?

i. median

ii. range

iii. mode

iv. coefficient of variation


A. ii, iv B. i, iv
C. i, iii D. ii, iii

3. If a distribution is right-skewed, which of the below is/are correct?

i. The mean value is greater than the median value

ii. The mode is generally smaller than the median value

iii. It is also called positively-skewed


A. i, ii, iii B. i, iii
C. i, ii D. ii, iii
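
The typical ordering of mode, median and mean in a right-skewed distribution can be illustrated on a small sample. This is only a sketch: the sample below is hypothetical, and the ordering it shows is the usual pattern rather than a guarantee for every skewed data set.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
sample = [1, 2, 2, 3, 4, 10, 20]

mean_value = statistics.mean(sample)      # pulled upward by the long right tail
median_value = statistics.median(sample)  # middle of the sorted values
mode_value = statistics.mode(sample)      # most frequent value

print(mode_value, median_value, mean_value)
```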

4. How many of the below are assumptions of linear regression?

i. linearity

ii. independence of errors

iii. normality of errors

iv. equal variance

v. nominal inputs

vi. narrow variance


A. 4 B. 2
C. 3 D. 1

5. When an important variable is missing from the model, another variable may wrongly absorb its effect. This is
called omitted variable bias. Which of the situations below calls for checking for it?

i. The coefficients of the variables contradict the business owner's experience

ii. Adding new variables changes the existing coefficients dramatically
A. i B. ii
C. none D. i - ii

6. Which of the below are correct while converting categorical data to numeric?

i. Each category in the variable is expressed as a dummy variable

ii. These dummy variables consist of boolean values

iii. A boolean value shows whether that category is available for that data point or not

iv. Converting a variable with 2 categories results in 2 new dummy variables and 1 of them is redundant since
the second one is the mirror of the first one
A. iii, iv B. i, ii, iii, iv
C. i, ii, iii D. i, ii, iv
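
The dummy-variable idea in question 6 can be sketched in plain Python. The `colors` data and the `one_hot` helper are hypothetical names introduced for illustration; libraries such as pandas offer their own encoders, which are not used here.

```python
# Hypothetical two-category variable; the values are illustrative only.
colors = ["red", "blue", "blue", "red", "blue"]

def one_hot(values):
    """Return {category: [0/1 flags]} with one dummy column per category."""
    categories = sorted(set(values))
    return {c: [int(v == c) for v in values] for c in categories}

dummies = one_hot(colors)
print(dummies)

# With two categories, each column is the mirror of the other,
# so one of them carries no extra information and can be dropped.
mirror = [1 - flag for flag in dummies["blue"]]
print(mirror == dummies["red"])
```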

https://app.quizalize.com/quiz/preview/Q29udGVudDo4NTVjMGE5Ny05MTE4LTQ5YmUtYmI5MS04NzliYTI1ZDZmYzk= 1/5

7. Which of the below are potential methods while creating splits in a single decision tree?

i. variance

ii. entropy

iii. gini
A. ii, iii B. i, ii, iii
C. i, ii D. i, iii

8. Which of the below statements are correct for ensembling?

i. Ensembling reduces the interpretability

ii. Random forest and boosting use the ensembling idea

iii. The ensembling idea is applicable beyond tree algorithms

iv. Ensembling generally increases the overall model stability


A. i, ii, iii, iv B. i, ii, iii
C. iii, iv D. i, ii, iv

9. The fractional chance that the relationships between the target variable and the input variables are
random in linear regression is shown by
A. p-value B. t-statistic
C. f-statistic D. standard error

10. A machine learning model created with fewer variables is preferred whenever the success does not
change significantly. Which methods are used to penalize extra variables?

i. residuals

ii. degree of freedom

iii. lasso

iv. adjusted r-square

v. ridge
A. i, ii, iii, iv B. ii, iii, iv, v
C. iii, iv, v D. i, ii, iii, iv, v
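
One of the listed quantities, adjusted R-square, has a closed form that makes the penalty for extra variables visible. A minimal sketch of the standard formula; the function name and the numbers are hypothetical.

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical numbers: same R^2 and sample size, more predictors.
few = adjusted_r_squared(0.80, n_samples=50, n_predictors=3)
many = adjusted_r_squared(0.80, n_samples=50, n_predictors=10)
print(few, many)
```

With the plain R-square held fixed, the adjusted value falls as the number of predictors grows.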

11. A new variable is added to a linear regression model. The p-values of a few previously included variables
change. Which of the below is always correct?
A. Some of the old variables overfit B. Some of the old variables are problematic
C. The newly added variable is redundant D. The newly added variable is not fully uncorrelated

12. What supervises the model in supervised learning?


A. independent variable B. dependent variable
C. model parameters D. coefficient of variation

13. Which of the below can be said about the k-means algorithm without additional info?

i. always converges to the same final clusters

ii. is not widely used in practice

iii. is better than DBSCAN

iv. uses actual data points as centers


A. None B. iii, iv
C. only iii D. i, ii


14. Which of the below is not related to the DBSCAN algorithm?
A. distance point B. core point
C. noise point D. border point

15. Which of the below statements are correct regarding DBSCAN algorithm?

i. epsilon is a distance

ii. epsilon is decided by the data scientist

iii. core point threshold is decided by the data scientist

iv. there is no practical limit to the number of border points around a core point
A. ii, iii B. i, ii, iii
C. i, ii, iii, iv D. i, iv

16. How many of the below are correct regarding the DBSCAN algorithm?

i. In an epsilon distance, the algorithm checks the number of points that are near a point

ii. If the number of neighbor data points is above a threshold (manually given), that point is accepted as a
core point

iii. Border points are close to core points, however, they do not have enough neighbors

iv. Core points and border points form a cluster together

v. Points that are neither core points themselves nor close to a core point are accepted as noise
A. 2 B. 3
C. 5 D. 4
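
The core, border and noise terminology used in question 16 can be sketched as a small classifier. This is an illustration, not a full DBSCAN implementation: `classify_points` is a hypothetical helper, the data are made up, and it assumes a point counts as its own neighbor and is core when it has at least `min_pts` neighbors within `eps`.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' in the DBSCAN sense.

    Assumption: a point counts as its own neighbor, and a point is core
    when it has at least min_pts neighbors within distance eps.
    """
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, near in enumerate(neighbors) if len(near) >= min_pts}

    labels = []
    for i, near in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in near):
            labels.append("border")   # near a core point, too few neighbors
        else:
            labels.append("noise")    # neither core nor near a core point
    return labels

# Hypothetical 2-D data: a tight group, one point on its edge, one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
print(classify_points(points, eps=1.0, min_pts=3))
```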

17. Which of the below are correct for connectivity-based algorithms?

i. In the agglomerative, "bottom-up" approach, each observation starts in its own cluster, and pairs of clusters are
merged as one moves up the hierarchy

ii. In the divisive, "top-down" approach, all observations start in a single unified cluster, and splits are performed
recursively to create smaller clusters as one moves down the hierarchy
A. i B. i, ii
C. ii D. none

18. Which option describes complete linkage in agglomerative hierarchical clustering, assuming the cities of
Turkey and Azerbaijan form country-based clusters?
A. The farthest distance; Izmir - Baku B. The average distance; center of Turkey - center of Azerbaijan
C. The closest distance; Igdir - Karabag
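
The distance notions named in the options of question 18 can be compared side by side on toy data. A minimal sketch: the coordinates are hypothetical and are not real city locations, and the variable names are illustrative.

```python
import math
from itertools import product

# Hypothetical 2-D coordinates for two clusters; not real city locations.
cluster_a = [(0, 0), (1, 0), (2, 0)]
cluster_b = [(4, 0), (6, 0)]

def centroid(cluster):
    """Return the mean point of a cluster."""
    xs, ys = zip(*cluster)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# All pairwise distances between a point in one cluster and a point in the other.
pair_distances = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

single_linkage = min(pair_distances)      # closest pair
complete_linkage = max(pair_distances)    # farthest pair
average_linkage = sum(pair_distances) / len(pair_distances)  # mean over all pairs
centroid_distance = math.dist(centroid(cluster_a), centroid(cluster_b))

print(single_linkage, complete_linkage, average_linkage, centroid_distance)
```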

19. Which of the below are correct for isolation forest?

i. It is a tree method

ii. It is used for grouping data based on similarities

iii. It is used for outlier detection


A. i, iii B. ii, iii
C. i, ii D. i, ii, iii

20. A model with higher bias and higher variance is ....


A. Inferior B. None
C. Equivalent D. Superior


21. Which of the below does not provide a penalty for too many variables?
A. BIC B. Cp
C. AIC D. R-square

22. Which of the below are correct for principal component analysis (PCA)?

i. It can be used if we have logical reasons to combine inputs

ii. It improves prediction performance most of the time

iii. Applying it to the whole dataset directly is generally a good approach

iv. It is not a prediction method


A. i, iv B. ii, iii
C. i, ii D. i, iii

23. Which of the below is not correct for optimization?


A. Non-negativity constraint requires each variable to be greater than or equal to zero
B. The non-linear model has linear objective function and nonlinear constraints
C. Set of all points that satisfy all of the problem's resource restrictions is called the feasible region
D. A special case when the objective function can be made infinitely large without violating any of the constraints is called unboundedness

24. Assume x and y are decision variables and the optimization problem is solved with integer linear
programming. What is the value of x in the constraint 2x + 12y = 33?
A. 0 B. 5
C. 4 D. None
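
The constraint in question 24 can be explored with a brute-force search over integer pairs. A minimal sketch, assuming the decision variables are non-negative (the question does not state this); the search bounds follow from that assumption.

```python
# Search non-negative integer pairs (x, y) with 2x + 12y = 33.
# Assumption: decision variables are non-negative, as is usual in such models.
solutions = [
    (x, y)
    for x in range(0, 17)   # 2x <= 33 implies x <= 16
    for y in range(0, 3)    # 12y <= 33 implies y <= 2
    if 2 * x + 12 * y == 33
]
print(solutions)

# Parity explains the result: 2x + 12y = 2 * (x + 6y) is always even,
# while 33 is odd.
print((2 * 5 + 12 * 2) % 2, 33 % 2)
```

The parity argument does not depend on the non-negativity assumption, since 2 * (x + 6y) is even for any integers x and y.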

25. Which option is not related to the mediocre performance of the decision tree algorithm?
A. Its being made up of a single tree B. Its ability to work with numeric and categoric targets
C. All D. Its greedy split approach

26. If we fail to reject the null hypothesis which of the below is considered correct?
A. The null hypothesis is true B. The null hypothesis is false
C. There is insufficient evidence to claim the null hypothesis is false D. The alternative hypothesis is true

27. Hypothesis testing is performed with the purpose of understanding a characteristic of


A. a sample taken from a population B. a population

28. Suppose I say the US Dollar / Turkish Lira ratio dropped to 1 within an hour, which is obviously not a correct
statement. Which of the cognitive biases below can emerge in people's minds after this exaggeration?
A. Framing effect B. Gambler's fallacy
C. Endowment effect D. Anchoring


29. i. When solving a linear optimization problem graphically, the corner-point solution method can only be
used to solve maximization problems

ii. Without exception, all of the functions in graphical LP are either straight lines or families of straight lines

iii. The objective function is a set of non-negativity conditions

Which is correct?
A. i B. ii
C. iii D. none

30. The "odds" of an outcome are defined as the chance of that outcome over the chance of all other outcomes.
Example: for a standard six-sided die, the odds of rolling a "3" are 1/5. The probability of a particular customer
paying back his loan is 0.50. What are the odds of default (not paying back)?
A. 2 B. 1
C. 0.5 D. 0.25
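
The definition of odds in question 30 translates directly into a one-line function. A minimal sketch using exact fractions; the `odds` helper and variable names are hypothetical.

```python
from fractions import Fraction

def odds(probability):
    """Odds of an event = P(event) / P(not event)."""
    return probability / (1 - probability)

# The die example from the question: one favorable side against five others.
print(odds(Fraction(1, 6)))   # 1/5

# Odds of default when the probability of paying back is 0.50.
p_default = 1 - Fraction(1, 2)
print(odds(p_default))
```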
