Dats501 2021 Spring Week11 Exam1
40 questions
1. Forecasting the state of the weather for tomorrow as rainy or notRainy is part of....
A. Predictive Analytics B. Prescriptive Analytics
C. Diagnostic Analytics D. Descriptive Analytics
2. Which term best describes the data of an insurance company that uses a single powerful server with 512
gigabytes of RAM and 48 cores for its analytics operations?
A. Medium data B. Small data
C. Big data D. Extreme data
A: I used to think correlation implied causation. Then I took a statistics class. Now I don't
A: ....
A. Not really, I just changed my mind B. Definitely, it helped
C. Yeah, the professor is convincing D. Well, maybe
4. What is the mean and mode of the following set of numbers? { 4, 9, 8, 8, 2, 16, 4, 4, 8, 9, 6, 8 }
A. mean, mode: 7, 8 B. mean, mode: 7, 4
C. mean, mode: 8, 9 D. mean, mode: 6, 8
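The mean and mode for question 4 can be checked with Python's standard `statistics` module. This is only a sketch: the variable names are illustrative, and the answer options appear to round the mean to the nearest integer, which is an assumption about the exam's intent.

```python
from statistics import mean, multimode

# Data set copied from question 4.
values = [4, 9, 8, 8, 2, 16, 4, 4, 8, 9, 6, 8]

sample_mean = mean(values)   # arithmetic mean: sum of values / number of values
modes = multimode(values)    # list of the most frequent value(s)

print(round(sample_mean, 2), modes)  # 7.17 [8]
```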
i. Q1 - 1.5 * IQR is close to - 2.7 sigma for a normally distributed and normalized data
iii. In general practice of box plot, outliers are found beyond 2 IQR distances from quartiles
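Statement i can be checked numerically for a standard normal distribution. The sketch below uses `statistics.NormalDist` from the standard library; the 1.5 * IQR fence is the common box plot convention and is assumed here.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

q1 = std_normal.inv_cdf(0.25)   # first quartile
q3 = std_normal.inv_cdf(0.75)   # third quartile
iqr = q3 - q1                   # interquartile range

# Conventional box plot fences at 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(round(lower_fence, 2), round(upper_fence, 2))  # -2.7 2.7
```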
i. Pie charts may deceive the human eye regarding the position of the slice
ii. Bubble charts are similar to the ones used to show the number of COVID-19 pandemic patients by country
iv. GIS (geographic information systems) is not useful to municipalities and the general directorate of
highways
A. i, ii B. only ii
C. iii, iv D. i, ii, iv
https://app.quizalize.com/quiz/preview/Q29udGVudDo5ZjkzZDU1My0wMDczLTQ2OWYtOGZmYi05Y2ZiNzRkZjFjNWQ= 1/7
12/29/21, 6:13 PM dats501_2021_spring_week11_exam1
7. Order the options below according to how well they explain bimodal data for the heights of a group
of animals.
iii. 1 herd of the same species with two different sexes without sexual dimorphism
iv. 2 herds of the same species from different geographies with environmental variation
A. iii, i, ii, iv B. ii, iv, i, iii
C. ii, iv, iii, i D. ii, iv, i, iii
8. An investment company asks one of its portfolio managers to find optimal solutions for two of its
customers. The customers each give $100 M to the investment manager. There are 3 portfolios to invest in
with a bankruptcy risk, meaning the loss of all the money.
Customer A is a risk-taker who sees bankruptcy only as a financial loss and penalizes it with the expected loss.
Customer B is risk-averse and penalizes bankruptcy with the square of the financial loss (i.e. losing 3 million
feels like losing 9 million).
Note: The penalty is calculated from the invested amount and the probability of bankruptcy.
Calculate the utilities of the portfolios for each customer. Then choose the optimal one for each of them.
A. "Customer A: Portfolio 3" - "Customer B: Portfolio 2"
B. "Customer A: Portfolio 2" - "Customer B: Portfolio 1"
C. "Customer A: Portfolio 1" - "Customer B: Portfolio 3"
D. "Customer A: Portfolio 2" - "Customer B: Portfolio 3"
iii. Offering a deal for two pairs of socks for the price of one
iv. Halving the price of the internet for the newcomers for a limited amount of time
A. iii, iv B. i, ii
C. ii, iii D. i, iv
ii. When people have to choose between an option framed in terms of a gain and an option framed in terms of
a loss, most people choose the option framed in terms of loss
iii. Tversky & Kahneman claim that when the frame is positive, people are more likely to take risks
iv. When something is framed in a positive way, people are more likely to go for the safest option
A. i, ii, iv B. ii, iv
C. only iii D. i, iii
11. In the movie Moneyball, Brad Pitt's character is a baseball team's director who tries to compete
with rich teams by using low-cost but statistically effective players and strategies. A Yale economics graduate
helps him in this cause. The value of a player is scored with a formula similar to "Score = (Hits + Walks -
Caught Stealing) * (Total Bases + 0.7 * Stolen Bases) / (At Bats + Walks + Caught Stealing)". What would be the
best description of this scoring process if the scoring is performed for the end of the next season, using the
currently unrealized statistics from today to the next season's end?
A. Prescriptive analytics followed by predictive and descriptive analytics
B. Simulated solution with the important player features based on values predicted with unsupervised learning
C. Predicting the scores regarding the next season followed by utility optimization for the team state at the end of the next season
D. Descriptive value assignment based on the utilities provided by each important feature
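The scoring formula quoted in question 11 can be written as a small function. This is a sketch only: the function name, parameter names, and the sample stat line are hypothetical and do not come from the exam or from real player data.

```python
def player_score(hits, walks, caught_stealing, total_bases, stolen_bases, at_bats):
    """Score = (H + BB - CS) * (TB + 0.7 * SB) / (AB + BB + CS), as quoted in question 11."""
    on_base_part = hits + walks - caught_stealing
    advancement_part = total_bases + 0.7 * stolen_bases
    opportunities = at_bats + walks + caught_stealing
    return on_base_part * advancement_part / opportunities


# Hypothetical season totals, for illustration only.
score = player_score(hits=150, walks=60, caught_stealing=5,
                     total_bases=250, stolen_bases=10, at_bats=500)
print(round(score, 2))
```

If the inputs were forecasts for next season rather than observed totals, the same function would be scoring predicted values.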
12. Toss a coin 3 times. Let A = 'at least 2 tails' and B = 'second toss is heads'. What are P(A|B) and P(B|A)?
A. 1/3, 1/4 B. 1/5, 1/3
C. 1/4, 1/4 D. 1/4, 1/3
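The conditional probabilities in question 12 can be verified by enumerating all eight equally likely outcomes of three tosses. The helper names below are illustrative, and a fair coin is assumed.

```python
from fractions import Fraction
from itertools import product

# All 2**3 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))


def prob(event):
    """Probability of an event given as a predicate over outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


def event_a(outcome):
    return outcome.count("T") >= 2   # at least 2 tails


def event_b(outcome):
    return outcome[1] == "H"         # second toss is heads


p_a_and_b = prob(lambda o: event_a(o) and event_b(o))
p_a_given_b = p_a_and_b / prob(event_b)   # P(A|B) = P(A and B) / P(B)
p_b_given_a = p_a_and_b / prob(event_a)   # P(B|A) = P(A and B) / P(A)

print(p_a_given_b, p_b_given_a)  # 1/4 1/4
```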
13. X is a random variable with values { 3, 5, 6, 8 } and corresponding cdf values { 0.25, 0.45, 0.70, 1.00 }.
What are P( X <= 5 ) and P( X = 7 )?
A. 0.70, 0.00 B. 0.45, 0.30
C. 0.70, 0.30 D. 0.45, 0.00
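For a discrete variable, the pmf can be recovered from the cdf by differencing consecutive cdf values, and any value outside the support has probability zero. A minimal sketch using the numbers from question 13 (the helper `cdf_at` is hypothetical):

```python
values = [3, 5, 6, 8]
cdf = [0.25, 0.45, 0.70, 1.00]

# pmf(x_i) = cdf(x_i) - cdf(x_{i-1}); rounding hides floating-point noise.
pmf = {}
previous = 0.0
for x, c in zip(values, cdf):
    pmf[x] = round(c - previous, 10)
    previous = c


def cdf_at(t):
    """P(X <= t): the cdf value of the largest support point not exceeding t."""
    return max((c for x, c in zip(values, cdf) if x <= t), default=0.0)


p_le_5 = cdf_at(5)         # P(X <= 5)
p_eq_7 = pmf.get(7, 0.0)   # 7 is not in the support, so P(X = 7) = 0

print(pmf)             # {3: 0.25, 5: 0.2, 6: 0.25, 8: 0.3}
print(p_le_5, p_eq_7)  # 0.45 0.0
```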
- 5-sided dice. These side values are { 2, 4, 8, 16, 16 }. The sides have realization probability inversely
proportional to the face values
15. - PMF (probability mass function) is commonly used when there are a small number of unique & discrete
values
The pmf of a discrete random variable X is given as follows:
{ x; P( X = x ) } = { ( -5, -1, 1, 4 ); ( 0.3, 0.25, 0.05, 0.4 ) }. Compute E(X)
A. -0.1 B. -0.4
C. 0.3 D. 0.0
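The expected value of a discrete random variable is the probability-weighted sum of its values, E(X) = sum of x * P(X = x). A short sketch with the pmf from question 15 (variable names are illustrative):

```python
xs = [-5, -1, 1, 4]
ps = [0.3, 0.25, 0.05, 0.4]

# A valid pmf must sum to 1 (checked up to floating-point tolerance).
pmf_is_valid = abs(sum(ps) - 1.0) < 1e-9

# E(X) = sum over all values of x * P(X = x).
expected_value = sum(x * p for x, p in zip(xs, ps))

print(pmf_is_valid, round(expected_value, 10))  # True -0.1
```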
16. X is a variable with the following values and pmf, respectively: { 3, 4, 5 } and { 0.2, 0.6, 0.2 }. Compute the variance
A. 0.2 B. 0.6
C. 0.8 D. 0.4
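The variance of a discrete random variable is Var(X) = sum of (x - E(X))^2 * P(X = x). A minimal sketch with the values from question 16 (variable names are illustrative):

```python
xs = [3, 4, 5]
ps = [0.2, 0.6, 0.2]

# Mean first, then the probability-weighted squared deviations from it.
mean_x = sum(x * p for x, p in zip(xs, ps))
variance = sum((x - mean_x) ** 2 * p for x, p in zip(xs, ps))

print(round(mean_x, 10), round(variance, 10))  # 4.0 0.4
```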
17. Which correlation coefficient shows the weakest relationship between two variables?
A. r = + 0.187 B. r = - 0.874
C. r = - 0.193 D. r = + 0.843
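The strength of a linear relationship is given by the absolute value of r; the sign only indicates direction. The sketch below picks the option whose |r| is smallest (the dictionary keys mirror the option letters in question 17):

```python
# Option letter -> correlation coefficient, copied from question 17.
coefficients = {"A": 0.187, "B": -0.874, "C": -0.193, "D": 0.843}

# Weakest relationship = smallest |r|; strongest = largest |r|.
weakest = min(coefficients, key=lambda option: abs(coefficients[option]))
strongest = max(coefficients, key=lambda option: abs(coefficients[option]))

print(weakest, strongest)  # A B
```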
i. linearity
v. nominal inputs
21. A fitted regression equation is given by Y-hat = 150 + 7X. What is the sum of the residuals' absolute values
at points ( X1=20, Y1=300 ) and ( X2=30, Y2=360 )?
A. -10 B. 10
C. 0 D. 20
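A residual is the observed value minus the fitted value, e = Y - Y-hat. The sketch below applies this to the two points in question 21; the function name `predict` is illustrative.

```python
def predict(x):
    """Fitted regression line from question 21: Y-hat = 150 + 7X."""
    return 150 + 7 * x


points = [(20, 300), (30, 360)]   # (X, Y) pairs from the question

# Residual = observed Y minus fitted Y-hat.
residuals = [y - predict(x) for x, y in points]
sum_abs_residuals = sum(abs(r) for r in residuals)

print(residuals, sum_abs_residuals)  # [10, 0] 10
```

Because absolute values are summed, the result can never be negative.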
22. Which of the below directly shows the significance of a linear regression model?
A. f statistic B. standard error
C. p value D. t statistic
23. The p-value of a variable can change when another variable is included in the model. Which of the below
is always correct?
A. The newly added variable is redundant B. The old variable overfits
C. The old variable is problematic D. The newly added variable is not fully uncorrelated
24. When an important variable is not available, another variable can wrongly try to explain its effect. This is
omitted variable bias. How many of the below may be a reliable way to catch such a problem?
27. The term "odds" is defined as the ratio of the tested outcome to all other outcomes. Example: For a standard
six-sided die, the odds of rolling the side "3" are 1/5.
The probability of a particular customer paying back on his loan is 0.50. What are the odds of default (not
paying back)?
A. 0.25 B. 2
C. 1 D. 0.5
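Under the definition given in question 27, odds can be computed from a probability p as p / (1 - p). A minimal sketch using exact fractions (the function name `odds` is illustrative):

```python
from fractions import Fraction


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


# Die example from the question: P(rolling a 3) = 1/6.
odds_of_three = odds(Fraction(1, 6))

# Loan example: P(paying back) = 0.50, so P(default) = 1 - 0.50.
p_default = 1 - Fraction(1, 2)
odds_of_default = odds(p_default)

print(odds_of_three, odds_of_default)  # 1/5 1
```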
Which of the following options are possible regarding the above information?
C. People with higher fast-food consumption are more likely to suffer from obesity
D. Young people consume more fast food compared to the average of non-young people
A. A, D B. A, C
C. B, C D. B, D
29. Which of the following statements are more likely for good models relative to the models that overfit?
30. A dataset is created by using a 2-degree polynomial function. Then some noise data points are added. We
try to understand the y values by using a polynomial function with degree 4 as a model. What features are
expected for this model in terms of variance and bias?
A. Low bias, low variance B. High bias, low variance
C. High bias, low variance D. Low bias, high variance
31. Which is not correct if the regularization parameter = 0 for the lasso regression?
A. Small coefficients are penalized B. Overfitting problems are not dealt with
C. Large coefficients are not penalized D. The loss function is the same as the ordinary least
squares loss function
32. Which is not correct if the regularization parameter is very high for the ridge regression?
A. May lead to coefficients of the variables becoming zero
B. Large coefficients are significantly penalized
C. May lead to a model that is too simple and ends up underfitting the data
D. May lead to worse performance than ordinary least squares
b. RF uses random thresholds for each feature rather than searching for the best possible thresholds.
37. Which of the below best describes the functionality of the list "append" method?
A. Adds new elements as a single element B. Adds an element at the specified position
C. Adds new elements D. Returns an error if the element is not in the list
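The difference between `append` and the related built-in list methods `extend` and `insert` can be seen in a short sketch (the list contents are arbitrary examples):

```python
# append: adds its argument as ONE element, even if the argument is a list.
appended = [1, 2]
appended.append([3, 4])

# extend: adds each element of the iterable separately.
extended = [1, 2]
extended.extend([3, 4])

# insert: adds one element at the specified position.
inserted = [1, 2]
inserted.insert(0, 9)

print(appended)  # [1, 2, [3, 4]]
print(extended)  # [1, 2, 3, 4]
print(inserted)  # [9, 1, 2]
```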
38. You want jupyter notebook to show all the data instances while displaying data. Which code will you use?
A. pd.set_option("display.max.rows", None) B. pd.set_option('display.max_colwidth', -1)
C. pd.set_option("display.max.columns", None) D. pd.set_option('max_info_columns', 199)
ii. If we use print() as the user-defined function (udf) output and assign udf to variable (var_x), var_x points
to "NoneType" in memory
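One reading of the statement above is that a function which only prints, and has no `return` statement, is called and its result is assigned to `var_x`. Under that assumption, the behaviour can be sketched as follows (the function name `show_total` is hypothetical):

```python
def show_total(a, b):
    """Prints the sum but has no return statement, so it returns None."""
    print(a + b)


# The call prints 5, and the implicit return value None is assigned.
var_x = show_total(2, 3)

print(var_x is None, type(var_x).__name__)  # True NoneType
```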
ii. If x is a list then x[::-1] and x.reverse() give me the same result
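The two expressions in the statement above behave differently: slicing builds a new reversed list, while `reverse()` modifies the list in place and returns None. A short sketch with an arbitrary example list:

```python
x = [1, 2, 3]

# Slicing returns a NEW reversed list and leaves x unchanged.
sliced = x[::-1]
x_after_slice = list(x)

# reverse() reverses x IN PLACE and returns None.
result = x.reverse()

print(sliced, x_after_slice)  # [3, 2, 1] [1, 2, 3]
print(result, x)              # None [3, 2, 1]
```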