Amazon ML Summer School India Previous Year Questions Detailed Solutions Step by Step by Vikram Gaur
The selection test will have two parts – Part A will consist of 20 MCQ on basic ML
concepts and math fundamentals on topics such as probability, statistics and linear
algebra. Part B will consist of two Programming questions. The overall test duration
will be 75 minutes.
1. Two squares are chosen at random on a chessboard. What is the probability that
they have a side in common?
- a) 8/13
- b) 17/18
- c) 5/13
- d) 1/18
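● Count adjacent pairs: Each of the 4 corner squares shares a side with 2 squares, each of the 24 non-corner edge squares shares a side with 3 squares, and each of the 36 interior squares shares a side with 4 squares. That gives 4*2 + 24*3 + 36*4 = 224 neighbor relations; each adjacent pair is counted twice, so there are 224 / 2 = 112 pairs of squares with a common side.
● Count all pairs: Two squares can be chosen from 64 in C(64, 2) = 2016 ways.
● Probability: 112 / 2016 = 1/18.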
Therefore, answer choice (d) 1/18 is correct.
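The count can be double-checked by brute force. This is a minimal sketch; the function name `prob_shared_side` and the board-size parameter `n` are illustrative and not part of the original solution.

```python
from fractions import Fraction
from itertools import combinations


def prob_shared_side(n: int = 8) -> Fraction:
    """Probability that two distinct random squares of an n x n board share a side."""
    squares = [(r, c) for r in range(n) for c in range(n)]
    adjacent = 0
    total = 0
    for (r1, c1), (r2, c2) in combinations(squares, 2):
        total += 1
        # Two squares share a side when they differ by exactly one step in one direction.
        if abs(r1 - r2) + abs(c1 - c2) == 1:
            adjacent += 1
    return Fraction(adjacent, total)


print(prob_shared_side())  # 1/18
```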
2. The police plan to enforce speed limits during morning rush hour on four different
routes into the city. The traps on routes A, B, C, and D are operated 40%, 30%, 20%,
and 30% of the time, respectively. Biff always speeds to work, and he has probability
0.2, 0.1, 0.5, and 0.2 of using those routes. What is the probability that he'll get a
speeding ticket?
- a) 0.27
- b) 0.93
- c) 0.73
- d) 0.07
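Since Biff always speeds, he gets a ticket exactly when the route he takes has an active speed trap. By the law of total probability, we sum P(route) * P(trap on that route) over the four routes.
Probability breakdown:
● Biff uses route A: 0.2 (route probability) * 0.4 (speed trap probability) = 0.08
● Biff uses route B: 0.1 * 0.3 = 0.03
● Biff uses route C: 0.5 * 0.2 = 0.10
● Biff uses route D: 0.2 * 0.3 = 0.06
Total: 0.08 + 0.03 + 0.10 + 0.06 = 0.27
The probability that he will receive a speeding ticket is 0.27, so the answer is a) 0.27.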
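The same law-of-total-probability sum, written as a small sketch with exact fractions so that no floating-point rounding creeps in. The dictionary names are illustrative.

```python
from fractions import Fraction

# Route-choice and speed-trap probabilities from the question, as exact fractions.
route_prob = {"A": Fraction("0.2"), "B": Fraction("0.1"), "C": Fraction("0.5"), "D": Fraction("0.2")}
trap_prob = {"A": Fraction("0.4"), "B": Fraction("0.3"), "C": Fraction("0.2"), "D": Fraction("0.3")}


def ticket_probability(route_prob, trap_prob):
    """Law of total probability: sum over routes of P(route) * P(trap on route)."""
    return sum(route_prob[r] * trap_prob[r] for r in route_prob)


print(float(ticket_probability(route_prob, trap_prob)))  # 0.27
```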
1. Given:
- Var(X) = 1
- Var(Y) = 4
- Var(2X - 3Y) = 34
2. Use the formula for the variance of a linear combination of random variables:
Var(aX + bY) = a^2 * Var(X) + b^2 * Var(Y) + 2ab * Cov(X, Y)
With a = 2 and b = -3:
34 = 4 * 1 + 9 * 4 + 2 * 2 * (-3) * Cov(X, Y) = 4 + 36 - 12 * Cov(X, Y)
34 - 40 = -12 * Cov(X, Y)
-6 = -12 * Cov(X, Y)
Cov(X, Y) = -6 / -12
Cov(X, Y) = 1/2
3. Compute the correlation: ρ(X, Y) = Cov(X, Y) / (σ_X * σ_Y), where σ_X = √1 = 1 and σ_Y = √4 = 2:
ρ(X, Y) = (1/2) / (1 * 2)
ρ(X, Y) = 1/4
So, the correlation between X and Y is 1/4.
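Solving the variance identity for the covariance and then dividing by the standard deviations can be sketched as follows; the helper name `cov_and_corr` and its parameter names are illustrative.

```python
import math
from fractions import Fraction


def cov_and_corr(var_x, var_y, a, b, var_combo):
    """Solve Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y) for Cov, then rho."""
    cov = Fraction(var_combo - a**2 * var_x - b**2 * var_y, 2 * a * b)
    rho = float(cov) / math.sqrt(var_x * var_y)  # rho = Cov / (sigma_X * sigma_Y)
    return cov, rho


cov, rho = cov_and_corr(var_x=1, var_y=4, a=2, b=-3, var_combo=34)
print(cov, rho)  # 1/2 0.25
```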
4. Probability that the chosen coin is the fair one, given two heads:
To solve this problem, we can use Bayes' theorem. Let's define the events:
A: Picking the fair coin
B: Getting heads twice
We want to find the probability of event A given event B, P(A|B).
According to Bayes' theorem, we have:
P(A|B) = (P(B|A) * P(A)) / P(B)
P(A) is the probability of picking the fair coin, which is 1/2 since we pick one of the two coins at random.
P(B) is the probability of getting heads twice. This can be calculated by considering
both cases: picking the fair coin and getting heads twice, and picking the biased coin
and getting heads twice.
For the fair coin, the probability of getting heads twice is P(B|A) = (1/2)^2 = 1/4.
For the biased coin, the probability of getting heads twice is P(B|not A) = (3/4)^2 = 9/16.
So, P(B) = P(B|A) * P(A) + P(B|not A) * P(not A) = (1/4) * (1/2) + (9/16) * (1/2) = 1/8 +
9/32 = 13/32.
Now, we can substitute these values into Bayes' theorem:
P(A|B) = (P(B|A) * P(A)) / P(B) = (1/4) * (1/2) / (13/32) = 1/8 / (13/32) = 1/8 * (32/13)
= 4/13.
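The Bayes computation above as a short sketch with exact fractions. The parameter names `p_heads_biased` and `heads` are illustrative; the biased coin's heads probability of 3/4 is taken from the solution.

```python
from fractions import Fraction


def posterior_fair(p_heads_biased=Fraction(3, 4), heads=2):
    """P(fair coin | `heads` heads in a row), each coin picked with probability 1/2."""
    prior_fair = Fraction(1, 2)
    like_fair = Fraction(1, 2) ** heads        # P(B | A)
    like_biased = p_heads_biased ** heads      # P(B | not A)
    evidence = like_fair * prior_fair + like_biased * (1 - prior_fair)  # P(B)
    return like_fair * prior_fair / evidence


print(posterior_fair())  # 4/13
```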
Question 5: PCA
- (i) We must standardize the data before applying PCA: This is generally true because
PCA is sensitive to the scale of the features. Standardizing (subtracting the mean and
dividing by the standard deviation) puts every feature on a comparable scale, so no
feature dominates the components merely because of its units.
- (ii) We should select the principal components which explain the highest
variance: This is true. PCA is designed to find the directions (principal components)
that maximize the variance in the data. Therefore, the principal components with the
highest variance are the most important.
- (iii) We should select the principal components which explain the lowest
variance: This is false. We generally discard components that explain little variance,
as they do not contribute much to the data's structure.
- (iv) We can use PCA for visualizing the data in lower dimensions: This is true. One
of the key applications of PCA is to reduce the dimensionality of the data for
visualization, usually in 2D or 3D.
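The four statements can be seen together in a short NumPy sketch, assuming a hypothetical 100 x 4 dataset with features on very different scales; the helper name `pca` is illustrative. It standardizes first (i), orders components by explained variance and keeps the top ones (ii, iii), and projects to 2-D for plotting (iv).

```python
import numpy as np


def pca(X, k):
    """Standardize X, then project it onto its top-k principal components."""
    # (i) Standardize so features measured on large scales do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # (ii) highest-variance directions first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained_ratio = eigvals / eigvals.sum()
    return Z @ eigvecs[:, :k], explained_ratio


# Hypothetical data: 100 samples, 4 features on very different scales.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])
X[:, 1] += 5 * X[:, 0]  # make two features correlated

projected, ratio = pca(X, k=2)  # (iv) 2-D coordinates suitable for a scatter plot
print(projected.shape, ratio.round(3))
```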
2x+y−z=4
x−2y+z=−2
−x+2y−z=−2
We can represent the system of equations in a matrix form using the coefficient
matrix and constant vector. The coefficient matrix A will contain the coefficients of the
variables, and the constant vector b will contain the constants on the right side of the
equations.
Follow Vikram Gaur : https://www.linkedin.com/in/vikram-gaur-0252aa185
For this system, we have:
| 2 1 -1 | | x | | 4 |
| 1 -2 1 | * | y | = | -2 |
| -1 2 -1 | | z | | -2 |
A * X = b
where:
A = [[2, 1, -1], [1, -2, 1], [-1, 2, -1]]
X = [x, y, z]
b = [4, -2, -2]
Step 2: Analyze the determinant of the coefficient matrix
The determinant of the coefficient matrix (det(A)) plays a crucial role in determining
how many solutions the system has:
● If det(A) ≠ 0: This indicates that the rows of the matrix are linearly
independent. In this case, there exists a unique solution for the system of
equations.
● If det(A) = 0: The rows are linearly dependent, so the system has either
infinitely many solutions or no solution.
Solving the system
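Now, let's apply these steps to the given system:
1. Rows 2 and 3 of A, [1, -2, 1] and [-1, 2, -1], are negatives of each other, so the
rows of A are linearly dependent and det(A) = 0.
2. Since det(A) = 0, the system might have either infinitely many solutions or no
solutions. Gaussian elimination decides which: if it produces a row of zeros with a
non-zero constant on the right side, there are no solutions; if it produces a row of
zeros with a zero constant on the right side, there are infinitely many solutions.
3. Adding equation 2 to equation 3 gives 0x + 0y + 0z = -4, a row of zeros with a
non-zero constant on the right side, so the system has no solution.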
Conclusion
By analyzing the determinant and using Gaussian elimination, we can conclude that the
system has no solution: there is an inconsistency in the system that prevents any
solution from satisfying all three equations simultaneously.
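A quick numerical check of this conclusion, sketched with NumPy. Instead of the determinant alone, it compares rank(A) with the rank of the augmented matrix [A | b] (the Rouché-Capelli criterion), which distinguishes "no solution" from "infinitely many"; the helper name `classify_system` is illustrative.

```python
import numpy as np

A = np.array([[2, 1, -1], [1, -2, 1], [-1, 2, -1]], dtype=float)
b = np.array([4, -2, -2], dtype=float)


def classify_system(A, b):
    """Rouche-Capelli: compare rank(A) with the rank of the augmented matrix [A | b]."""
    rank_a = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_a < rank_aug:
        return "no solution"
    if rank_a == A.shape[1]:
        return "unique solution"
    return "infinitely many solutions"


print(round(np.linalg.det(A), 10), classify_system(A, b))
```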
● Weights: 1, 4, 3
● Inputs: 4, 8, 5
● Transfer function is linear with a constant of proportionality of 3
First, calculate the weighted sum of the inputs:
Weighted sum=(1×4)+(4×8)+(3×5)
Weighted sum=4+32+15
Weighted sum=51
Since the transfer function is linear with a proportionality constant of 3, the output
will be:
Output=3×Weighted sum
Output=3×51
Output=153
b) 153
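The same neuron computation as a tiny sketch; the function name `linear_neuron` and the parameter `k` (the proportionality constant of the linear transfer function) are illustrative names.

```python
def linear_neuron(weights, inputs, k=3):
    """Output of a neuron with a linear transfer function: k * (weighted sum of inputs)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum


print(linear_neuron([1, 4, 3], [4, 8, 5]))  # 153
```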
- Choosing a lower p-value (b): This would make the test more stringent, decreasing
false positives but potentially increasing false negatives, which is not what we want.
- Choosing the same p-value (c): This would not address the concern of minimizing
false negatives.
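- False negatives do not depend on p-value (d): This is incorrect. False negatives are
directly related to the choice of p-value threshold (significance level), because the
threshold sets the trade-off between the sensitivity and specificity of the test.
Therefore, to minimize false negatives, we would choose a higher p-value threshold,
which makes the test less stringent.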
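To see the direction of the effect, here is a minimal sketch for a one-sided z-test, assuming a hypothetical true effect of 2 standard errors (`effect=2.0`); neither the test nor the effect size comes from the question. The false negative rate is β = Φ(z_(1-α) - effect), which shrinks as the threshold α grows.

```python
from statistics import NormalDist


def false_negative_rate(alpha, effect=2.0):
    """Type II error of a one-sided z-test with significance threshold alpha.

    `effect` is a hypothetical true shift of the test statistic (in standard
    errors) under the alternative hypothesis.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)  # reject H0 when Z > z_crit
    return NormalDist().cdf(z_crit - effect)   # P(Z + effect <= z_crit)


for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(false_negative_rate(alpha), 3))
```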
- a) 1, Yes: The value 1 represents a perfect prediction where the probability of the
predicted class is 1. This value is attainable in theory for the likelihood of a single
observation.
The upper bound of the likelihood for a single observation is 1, but for the combined
likelihood of all observations, achieving 1 is generally not feasible, because it would
require every observation to be predicted with probability exactly 1.
Therefore, the answer is:
- c) 1, No
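As an illustration, assuming a logistic-regression-style model (the question text is not shown here), the joint likelihood of even perfectly separable data creeps toward 1 as the weights grow but never reaches it for finite weights. The scores, labels, and `scale` values below are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def likelihood(scores, labels, scale):
    """Joint likelihood of binary labels under a logistic model with logits scale * score."""
    total = 1.0
    for s, y in zip(scores, labels):
        p = sigmoid(scale * s)
        total *= p if y == 1 else 1.0 - p
    return total


# Hypothetical, perfectly separable scores: positive for label 1, negative for label 0.
scores, labels = [2.0, 1.0, -1.5, -0.5], [1, 1, 0, 0]
for scale in (1, 2, 5):
    print(scale, likelihood(scores, labels, scale))
```

Every factor is strictly below 1, so the product stays below 1; it only approaches 1 in the limit of infinitely large weights.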
We'll analyze each statement and identify the correct answer.
Statement 1:
The sample variance (S^2) is calculated as the sum of squared deviations
from the sample mean divided by n-1 (n is the sample size). Dividing by n-1
rather than n compensates for measuring deviations from the sample mean
instead of the unknown population mean. It can be proven mathematically that
E(S^2) = σ^2, where E denotes the expected value, so S^2 is an unbiased
estimator of σ^2. Likewise, the sample mean (X̄) is an unbiased estimator of
the population mean (μ).
Statement 2:
This statement is false. Here's why:
Multiplying S^2 by (n-1)/n gives the estimator that divides the sum of squared
deviations by n instead of n-1. Its expected value is
E[((n-1)/n) * S^2] = ((n-1)/n) * σ^2, which is smaller than σ^2. So
((n-1)/n) * S^2 is a biased estimator that systematically underestimates σ^2, and
the bias is largest for small sample sizes. Dividing by n-1 (Bessel's correction)
is exactly what removes this bias.
Conclusion:
Based on the analysis, only Statement 1 (S^2 is an unbiased estimator of σ^2) is
true. Therefore, the correct answer is a) 1 only.
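A Monte Carlo sketch makes the bias visible: averaged over many samples, S^2 lands near σ^2 while ((n-1)/n) * S^2 lands near ((n-1)/n) * σ^2. The sample size n = 5, σ = 2, number of trials, and seed are all arbitrary assumptions chosen for illustration.

```python
import random


def average_estimates(n=5, sigma=2.0, trials=20000, seed=0):
    """Monte Carlo averages of S^2 (divide by n-1) and ((n-1)/n) * S^2 (divide by n)."""
    rng = random.Random(seed)
    s2_sum = 0.0
    scaled_sum = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        s2 = ss / (n - 1)
        s2_sum += s2
        scaled_sum += (n - 1) / n * s2
    return s2_sum / trials, scaled_sum / trials


s2_avg, scaled_avg = average_estimates()
# Expect values near sigma^2 = 4.0 and (n-1)/n * sigma^2 = 3.2 respectively.
print(s2_avg, scaled_avg)
```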
Analysis:
From property 1, rank(AB) can't be greater than the smaller of rank(A) and rank(B), i.e. rank(AB) ≤ min(rank(A), rank(B)).
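A numerical illustration of the rank inequality with hypothetical random matrices: A is built as a product through a 2-dimensional space so that its rank is 2, and B is a generic 4 x 3 matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical matrices: A is 5x4 but built through a 2-dimensional space, so rank(A) = 2.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
B = rng.standard_normal((4, 3))  # a random 4x3 matrix, rank 3 with probability 1

rank = np.linalg.matrix_rank
print(rank(A), rank(B), rank(A @ B))  # rank(AB) <= min(rank(A), rank(B))
```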
Solution Steps:
[1 1 1 | 1]
[a -a 3 | 5]
[5 -3 a | 6]
● R2 = R2 - a*R1:
[1 1 1 | 1]
[0 -2a 3-a | 5 - a]
[5 -3 a | 6]
● R3 = R3 - 5*R1:
[1 1 1 | 1]
[0 -2a 3-a | 5 - a]
[0 -8 a-5 | 1]
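For infinite solutions, the second and third rows must be proportional, including the
constants on the right side. Setting the ratios of the y and z coefficients equal,
(-2a)/(-8) = (3-a)/(a-5), gives:
a(a-5) = 4(3-a)
a^2 - 5a = 12 - 4a
a^2 - a - 12 = 0
(a-4)(a+3) = 0
a = 4 or a = -3
The constants must also be in the same ratio: (-2a)/(-8) = (5-a)/1, i.e. a = 4(5-a), so a = 4.
● a = 4: rows 2 and 3 both become [0 -8 -1 | 1], so the system has infinitely many solutions.
● a = -3: rows 2 and 3 become [0 6 6 | 8] and [0 -8 -8 | 1]; the coefficients are
proportional but the constants are not, so the system has no solution.
Answer:
a = 4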
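A rank-based check of both cases, sketched with NumPy. This assumes the system is x + y + z = 1, ax - ay + 3z = 5, 5x - 3y + az = 6, i.e. a right-hand side of 5 in the second equation, which is what the reduced row [0 -2a 3-a | 5 - a] implies after R2 = R2 - a*R1; the helper name `classify` is illustrative.

```python
import numpy as np


def classify(a):
    """Classify the (assumed) system x+y+z=1, ax-ay+3z=5, 5x-3y+az=6 by comparing ranks."""
    A = np.array([[1, 1, 1], [a, -a, 3], [5, -3, a]], dtype=float)
    b = np.array([1, 5, 6], dtype=float)
    rank_a = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_a < rank_aug:
        return "no solution"
    if rank_a == 3:
        return "unique solution"
    return "infinitely many solutions"


for a in (4, -3, 0):
    print(a, classify(a))
```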
Analysis:
Given:
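● m=50 examples
● n=200000 features
● The normal equation involves calculating the inverse of a very large matrix (n x n,
i.e. 200000 x 200000), which costs on the order of n^3 operations.
Therefore, the correct answer is: a) Gradient descent, since inverse(XᵀX) will be very
expensive to compute; moreover, with only m = 50 examples, XᵀX has rank at most 50
and is not invertible at all.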
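A scaled-down sketch of both points, using hypothetical sizes m = 5 and n = 20 in place of 50 and 200000: XᵀX is n x n but has rank at most m, while batch gradient descent only needs matrix-vector products. The learning rate and step count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 20  # scaled-down stand-in for m = 50 examples, n = 200000 features
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# With m < n, X^T X is n x n but has rank at most m, so it is singular.
print(np.linalg.matrix_rank(X.T @ X))


def gradient_descent(X, y, lr=0.01, steps=500):
    """Batch gradient descent on mean squared error; needs no matrix inverse."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad
    return theta


def mse(theta):
    return np.mean((X @ theta - y) ** 2)


theta = gradient_descent(X, y)
print(mse(np.zeros(n)), mse(theta))
```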
Given:
● A 4x4 square matrix with 0's as diagonal elements and 1's as off-diagonal
elements.
[0 1 1 1]
[1 0 1 1]
[1 1 0 1]
[1 1 1 0]
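This matrix equals J - I, where J is the 4x4 matrix of all ones. Each row sums to 3, so
the all-ones vector is an eigenvector with eigenvalue 3. J has rank 1, so its eigenvalues
are 4, 0, 0, 0, and subtracting I shifts every eigenvalue down by 1.
Therefore, the eigenvalues are 3, -1, -1, -1. (The values 4, 0, 0, 0 in option d) are the
eigenvalues of J, not of this matrix.)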
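A one-line NumPy check of these eigenvalues; `eigvalsh` is used because the matrix is symmetric, and it returns the eigenvalues in ascending order.

```python
import numpy as np

n = 4
M = np.ones((n, n)) - np.eye(n)      # 0 on the diagonal, 1 elsewhere (J - I)
eigenvalues = np.linalg.eigvalsh(M)  # symmetric matrix, eigenvalues in ascending order
print(eigenvalues.round(6))          # -1, -1, -1, 3 (up to rounding)
```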
Relying on external libraries like NumPy might not be ideal during a timed test, so the eigenvalues can also be found by hand:
1. Find the characteristic equation: det(A - λI) = λ² + λ - 2 = 0
2. Solve for Eigenvalues: Solve the characteristic equation for λ. In this case, λ
= 1 or λ = -2.
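The by-hand step amounts to the quadratic formula applied to the characteristic equation. A minimal sketch, where the helper name `quadratic_eigenvalues` is illustrative and only real roots are handled:

```python
import math


def quadratic_eigenvalues(p, q):
    """Real roots of the characteristic equation lambda^2 + p*lambda + q = 0."""
    disc = p * p - 4 * q
    if disc < 0:
        raise ValueError("complex eigenvalues")
    root = math.sqrt(disc)
    return (-p + root) / 2, (-p - root) / 2


print(quadratic_eigenvalues(1, -2))  # (1.0, -2.0)
```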
(absolute value) might be sufficient. So, the answer choices with the closest
1. Take logarithms: Take the equation y = x^x and apply the natural logarithm to both sides:
ln(y) = ln(x^x) = x * ln(x)
2. Differentiate both sides: Differentiate with respect to x, using the chain rule on the
left and the product rule on the right:
(1/y) * dy/dx = ln(x) + 1
dy/dx = y * (ln(x) + 1) = x^x * (ln(x) + 1)
3. Evaluate at x = 2: From the original equation, y = 2^2 = 4, so the expression becomes
4(1 + log 2), where log is the natural logarithm.
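The derived formula can be checked against a central-difference approximation; the step size `h` is an arbitrary choice.

```python
import math


def f(x):
    return x ** x


def derivative_formula(x):
    """d/dx x^x = x^x * (ln x + 1), from logarithmic differentiation."""
    return x ** x * (math.log(x) + 1)


h = 1e-5
numeric = (f(2 + h) - f(2 - h)) / (2 * h)  # central difference approximation
print(derivative_formula(2), numeric)      # both about 6.7726
```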
The characteristic equation of a matrix A is a polynomial equation in terms of a
variable (here t) whose roots are the eigenvalues of A. For this matrix, the characteristic
equation is:
t² - t - 1 = 0
We need to solve this equation to find the eigenvalues (λ).
λ = (1 ± √(1 + 4 * 1)) / 2
λ = (1 ± √5) / 2
This equation tells us that A has two eigenvalues: λ₁ = (1 + √5) / 2 and λ₂ = (1 - √5) /
2.
If an eigenvalue is zero, the determinant of the matrix (which can be obtained from the
characteristic equation) will also be zero. An inverse cannot exist for a matrix with a
determinant of zero.
In our case, both eigenvalues (λ₁ and λ₂) are non-zero, so det(A) = λ₁λ₂ = -1 ≠ 0 and
A is invertible.
The inverse can in fact be determined from the characteristic equation. By the
Cayley-Hamilton theorem, A satisfies its own characteristic equation: A² - A - I = 0.
Rearranging gives A(A - I) = I, so A⁻¹ = A - I.
Therefore, A⁻¹ exists and equals A - I; the claim in option b) that A⁻¹ cannot be
determined from the data is incorrect.
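A numerical check of the Cayley-Hamilton argument, using the hypothetical example matrix [[1, 1], [1, 0]], which has trace 1 and determinant -1 and therefore the characteristic equation t² - t - 1 = 0. Any matrix with that characteristic equation would serve.

```python
import numpy as np

# Hypothetical example: trace 1 and determinant -1 give t^2 - t - 1 = 0.
A = np.array([[1.0, 1.0], [1.0, 0.0]])
I = np.eye(2)

print(np.poly(A))                             # characteristic polynomial coefficients
print(np.allclose(A @ A - A - I, 0))          # Cayley-Hamilton: A^2 - A - I = 0
print(np.allclose(np.linalg.inv(A), A - I))   # hence A^{-1} = A - I
```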
Given:
● Fitting data from a cubic function corrupted by standard Gaussian noise using
linear and 5th-degree polynomial models.
Options:
Explanation:
● A 5th-degree polynomial model (M₅) is more complex than a linear model (M₁),
so it has the potential to capture more intricate patterns in the data, leading to
lower bias. Since every cubic is also a 5th-degree polynomial, M₅ can represent the
true function exactly, while M₁ cannot. Therefore, Bias(M₁) ≥ Bias(M₅).
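A small simulation of this setting, assuming a hypothetical ground truth f(x) = x³ sampled at 30 points on [-2, 2] with standard Gaussian noise. Each model is refit on many noisy datasets; squared bias compares the average fit to f, and variance measures how much individual fits scatter around that average.

```python
import numpy as np


def bias_variance(degree, trials=200, seed=0):
    """Squared bias and variance of a polynomial fit, averaged over the grid points."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-2, 2, 30)
    f_true = x ** 3  # hypothetical cubic ground truth
    preds = np.empty((trials, x.size))
    for t in range(trials):
        y = f_true + rng.standard_normal(x.size)  # standard Gaussian noise
        coeffs = np.polyfit(x, y, degree)
        preds[t] = np.polyval(coeffs, x)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f_true) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


for degree in (1, 5):
    print(degree, bias_variance(degree))
```

The linear model shows much larger squared bias, while the 5th-degree model pays for its flexibility with higher variance.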