
ANALYTICS PREPBOOK

Lateral Placements 2019

ANALYTICS SOCIETY, IIM Bangalore


analytics@iimb.ac.in
Contents
Overview
Concepts
    Analytical Concepts
Guesstimates
    Fundamentals
    Sample Guesstimates
Sample Puzzles
Interview Recommendations
    American Express
    EXL
    VISA
    United Health Group
    Uber
    Gartner
    Goldman Sachs Strats
Industry Applications of Data Science

ANALYTICS SOCIETY, IIM BANGALORE 1


Overview

Dear Reader,
The domain of analytics is quickly gaining popularity across corporate firms worldwide. From a
subject which was prominently deployed in research projects in the 19th and 20th centuries to the
Cambridge Analytica scandal and its demonstrated power to turn the tides of elections around the
world, analytics has surely come a long way. Its importance has scaled to such an extent that firms
which do not utilize its capabilities in any of their operations are bound to lose out in the long run.
Hence corporates are either building their own analytics capabilities or leveraging the services of
prominent analytics firms.
Intuition is slowly taking a backseat and every decision gets scrutinized through the lens of analytics.
As a result, managers who have the technical know-how of analytics combined with the grit to lead
capability development activities in this field are in great demand today. This demand would grow
exponentially as more and more firms push themselves to develop such capabilities. Increase in the
number of analytics profiles visiting top B-Schools is hence not a mere coincidence. With this thought,
the Analytics Society aims to keep students of IIMB ahead of the curve by inculcating the motivation
to learn analytics and provide the requisite tools at all stages.
As an effort in this direction, we present to you a compendium that can help you prepare for your
placement interviews. It is a three-part booklet. The first part focuses on prominent analytics
definitions and terminology that are a must-know for any analytics interview, along with techniques
for solving guesstimates and sample scenarios from the same interviews. The second part
discusses Summer Insights collected through years of interview experiences of IIM Bangalore
students. We would like to thank all the PGP1 students who shared their interview experiences with
us. We hope you find this useful. All the best for your interviews!

Regards,
The Analytics Society of IIM Bangalore



Concepts
Analytical Concepts

1. Difference between supervised and unsupervised Learning

Supervised Learning

Supervised learning is where you have input variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output.
Y = f(X)
Further grouped into regression and classification problems.
1. Classification: A classification problem is when the output variable is a category, such as “red” or “blue” or “disease” and “no disease”
2. Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”
Few use cases of supervised learning
• Predicting the price of a house based on attributes like sq. foot of house, no of rooms, locality etc
• Image classification - Classifying whether an image is a cat image
• Spam email classification - Identifying and classifying whether an email is a spam or not
Few supervised learning algorithms
• Linear Regression for regression problems
• Random Forest for classification and regression problems
• Support Vector Machines for classification problems

Unsupervised Learning

Unsupervised learning is where you only have input data (X) and no corresponding output variables.
Further grouped into clustering and association problems.
1. Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour
2. Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y
Few use cases of unsupervised learning
• Grouping customers into different buckets based on their purchasing pattern
• Image categorization - Categorize image into cat image, dog image, or lion image
Few unsupervised learning algorithms
• k-means for clustering problems
• Apriori algorithm for association rule learning problems



2. Difference between Data Mining and Data Analysis

Data Mining | Data Analysis
A hypothesis is not required for data mining. | Data analysis begins with a hypothesis.
Data mining demands clean and well-documented data. | Data analysis involves data cleaning.
Results of data mining are not always easy to interpret. | Data analysts interpret the results and present them to the stakeholders.
Data mining algorithms automatically develop equations. | Data analysts have to develop their own equations.

3. Steps in an Analytics Project

1. PROJECT DEFINITION: Describing and recording business intentions for predictive analytics
2. DATA COLLECTION: Data detection and assessment to assess your data’s readiness for predictive analytics
3. DATA ANALYSIS: Planning and structuring analytics data cubes
4. DATA MODELING: Developing modeling method, procedure and/or environment
5. MODEL VALIDATION: Testing the model on a new dataset to ensure consistent performance
6. DEPLOYMENT: Executing deployment procedures for implanting insights from predictive models into your business

4. What is data exploration?

Data exploration is done to become familiar with the data. This step is especially important when
dealing with new data. There are a number of things you will want to do in this step –
• What is there in the data – look at the list of all the variables in the data set. Understand the meaning
of each variable using the data dictionary. Go back to the business for more information in case of
any confusion.
• How much data is there – look at the volume of the data (how many records) and at the time frame
of the data (last 3 months, last 6 months etc.)
• Quality of the data – how much information is missing, and the quality of data in each variable. Are all
fields usable? If a field has data for only 10% of the observations, then maybe that field is not usable.
You will also identify some important variables and may do a deeper investigation of these, such as
looking at averages, min and max values, and maybe the 10th and 90th percentiles as well.
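The checks above can be sketched for a single numeric field. This is a minimal illustration using only the Python standard library; the `monthly_spend` column and the `summarize` helper are hypothetical names, and `None` is assumed to mark a missing value.

```python
import statistics

def summarize(values):
    """Summarize one numeric column; None marks a missing value."""
    present = sorted(v for v in values if v is not None)
    # quantiles(n=10) returns 9 cut points: the 10th, 20th, ..., 90th percentiles
    deciles = statistics.quantiles(present, n=10)
    return {
        "records": len(values),
        "missing_share": 1 - len(present) / len(values),
        "min": present[0],
        "max": present[-1],
        "mean": statistics.mean(present),
        "p10": deciles[0],
        "p90": deciles[-1],
    }

# Hypothetical column of monthly spend with two missing entries
monthly_spend = [120, 80, None, 200, 150, 95, None, 300, 110, 130]
summary = summarize(monthly_spend)
print(summary)
```

A field whose `missing_share` is very high (say 0.9) would be a candidate for exclusion, in line with the 10% rule of thumb above.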
5. What does Data Preparation entail?



In data preparation, you prepare the data for the next stage, i.e. the modelling stage. What you
do here is influenced by the choice of technique you use in the next stage.
But some things are done in most cases, for example identifying missing values and treating them,
identifying outlier values (unusual values) and treating them, transforming variables, and creating
binary variables if required.
This is also the stage where you partition the data, i.e. create training data (used to build the model)
and validation data (used to validate it).
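A minimal sketch of the partitioning step, using only the standard library. The 70/30 split and the fixed seed are illustrative assumptions, not values prescribed by the text, and `train_validation_split` is a hypothetical helper name.

```python
import random

def train_validation_split(records, validation_share=0.3, seed=42):
    """Shuffle a copy of the records and cut it into training and validation parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    cut = int(round(len(shuffled) * (1 - validation_share)))
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))                  # stand-in for 100 rows of data
train, validation = train_validation_split(records)
print(len(train), len(validation))
```

Shuffling before cutting avoids a validation set made up only of the most recent or last-loaded rows.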
6. How are missing values treated?

The first step is to identify variables with missing values. Assess the extent of missing values. A few
possible methods are –

Handling Missing Data
• Deletion
  ✓ Deleting Rows
  ✓ Deleting Columns
  ✓ Pairwise Deletion
• Imputation
  ✓ Mean, Median, Mode, Random Sample Imputation
  ✓ Linear Regression
  ✓ Logistic Regression
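As a rough illustration of two of these options, the sketch below shows row deletion and simple mean or median imputation on one column. The `ages` data and the `impute` helper are hypothetical; regression-based imputation is not shown.

```python
import statistics

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    pick = {"mean": statistics.mean, "median": statistics.median, "mode": statistics.mode}
    fill = pick[strategy](observed)
    return [fill if v is None else v for v in values]

ages = [25, None, 31, 40, None, 28]                   # hypothetical column with gaps
complete_rows = [v for v in ages if v is not None]    # deletion: drop incomplete rows
mean_filled = impute(ages, "mean")
median_filled = impute(ages, "median")
print(complete_rows, mean_filled, median_filled)
```

Deletion shrinks the data set, while imputation keeps every row at the cost of understating the variable's spread.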

7. How are outliers treated?

You can identify outliers using graphical analysis and univariate analysis. If there are only a few
outliers, you can assess them individually. If there are many, you may want to substitute the outlier
values with the 1st percentile or the 99th percentile values.
If there is a lot of data, you may decide to ignore records with outliers.
Not all extreme values are outliers. Not all outliers are extreme values.
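The percentile substitution described above can be sketched as follows. This is an illustrative implementation with the standard library; the `incomes` data is made up, and `cap_outliers` is a hypothetical helper name.

```python
import statistics

def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Clamp values outside the chosen percentiles to those percentile values."""
    cuts = statistics.quantiles(values, n=100)   # 99 cut points: 1st ... 99th percentile
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]

incomes = list(range(1, 200)) + [10_000]         # hypothetical data with one extreme value
capped = cap_outliers(incomes)
print(max(incomes), max(capped))
```

Capping keeps every record, which matters when the data set is small; with plenty of data, dropping the affected records is the simpler alternative mentioned above.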
8. Correlation
Correlation analysis is a method of statistical evaluation used to study the strength of linear
relationship between two, numerically measured, continuous variables.
9. Covariance
Covariance is the expected value of the product of the deviations of two random variates from their
expected values. It is a measure of how changes in one variable are associated with changes in a
second variable. A positive covariance means that higher values of one variable are associated with
higher values of the other variable.
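The two measures are closely related: correlation is covariance rescaled by the two standard deviations, so it always lies between -1 and 1. A small sketch, assuming the sample (n - 1) form and made-up `hours` and `score` data:

```python
import math

def covariance(xs, ys):
    """Sample covariance: sum of products of deviations from the means, divided by n - 1."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

hours = [1, 2, 3, 4, 5]           # hypothetical hours studied
score = [52, 55, 61, 64, 70]      # hypothetical test scores
cov = covariance(hours, score)
corr = correlation(hours, score)
print(cov, corr)
```

Rescaling `hours` (for example to minutes) changes the covariance but leaves the correlation unchanged, which is why correlation is the easier number to compare across variable pairs.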
10. What is Hypothesis Testing?
Hypothesis testing is the use of statistics to judge whether the observed data are consistent with a
given hypothesis. The usual process of hypothesis testing consists of four steps.
i. Formulate the null hypothesis H0 (commonly, that the observations are the result of pure
chance) and the alternative hypothesis Ha (commonly, that the observations show a real
effect combined with a component of chance variation)
ii. Identify a test statistic that can be used to assess the truth of the null hypothesis
iii. Compute the P-value, which is the probability that a test statistic at least as extreme as the
one observed would be obtained assuming that the null hypothesis were true. The smaller
the P-value, the stronger the evidence against the null hypothesis
iv. Compare the P-value to an acceptable significance value alpha (sometimes called an alpha
value). If p <= alpha, the observed effect is statistically significant and the null hypothesis is
rejected in favour of the alternative hypothesis
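The four steps can be walked through for a coin suspected of favouring heads. The sketch below assumes H0: P(head) = 0.5, uses the number of heads as the test statistic, and computes an exact one-sided P-value; the 60-heads-in-100-tosses data and alpha = 0.05 are illustrative choices.

```python
from math import comb

def binomial_p_value(heads, tosses, p_null=0.5):
    """One-sided P-value: P(at least `heads` heads in `tosses` tosses) under H0."""
    return sum(
        comb(tosses, k) * p_null**k * (1 - p_null) ** (tosses - k)
        for k in range(heads, tosses + 1)
    )

alpha = 0.05
p_value = binomial_p_value(heads=60, tosses=100)   # step iii
reject_null = p_value <= alpha                     # step iv
print(round(p_value, 4), reject_null)
```

Here the P-value is below alpha, so the null hypothesis of a fair coin would be rejected at the 5% level.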

11. R² value
The coefficient of determination is the proportion of the variance of the dependent variable that can
be predicted from the independent variable. In regression, the R² coefficient of determination is a
statistical measure of how well the regression predictions approximate the real data points. An R² of
1 indicates that the regression predictions perfectly fit the data.
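A short sketch of the calculation, using the common definition R² = 1 - SS_res / SS_tot (residual sum of squares over total sum of squares). The `actual`, `perfect` and `rough` lists are made-up values for illustration.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [3.0, 5.0, 7.0, 9.0]
perfect = [3.0, 5.0, 7.0, 9.0]     # predictions that match exactly
rough = [2.5, 5.5, 6.5, 9.5]       # predictions that are off by 0.5 each
print(r_squared(actual, perfect), r_squared(actual, rough))
```

The perfect predictions give R² = 1, and the rough ones give 0.95, meaning 95% of the variance in `actual` is accounted for.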
12. P value
A probability that provides a measure of the evidence against the null hypothesis given by the sample.
Smaller values indicate more evidence against H0.
13. What is Bias and Variance in a model? What is Bias Variance Trade off?

Bias refers to model error and variance refers to the consistency in predictive accuracy of models
applied to other data sets. The best models have low bias (low error, high accuracy) and low variance
(consistency of accuracy from data set to data set).
Unfortunately, there is always a tradeoff between these two when building predictive models. You can
achieve low bias on training data, but may suffer from high variance on held-out data because the
models were overfit.
Error due to bias: The error due to bias is the difference between the expected (or average)
prediction of our model and the correct value which we are trying to predict. Imagine you could
repeat the whole model building process more than once: each time you gather new data and run a
new analysis, creating a new model. Due to randomness in the underlying datasets, the resulting
models will have a range of predictions. Bias measures how far off in general these models'
predictions are from the correct value. A high bias error means we have an under-fitting model which
keeps on missing important patterns.
Error due to variance: The error due to variance is taken as the variability of a model prediction for
a given data point. Again, imagine you can repeat the entire model building process multiple times.
The variance is how much the predictions for a given point vary between different iterations of the
model. A high variance model will over-fit on your training population and perform very badly on any
observation outside the training data.
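The "repeat the whole model building process" thought experiment can be simulated directly. The sketch below is an illustration under made-up assumptions: the true relationship is y = x² plus noise, a constant (mean-only) model stands in for an under-fitting model, and a 1-nearest-neighbour model stands in for an over-fitting one.

```python
import random
import statistics

def true_value(x):
    return x ** 2

def make_dataset(rng, n=20, noise_sd=0.3):
    """Draw a fresh noisy sample of (x, y) pairs."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_value(x) + rng.gauss(0, noise_sd)) for x in xs]

def constant_model(data, x0):
    """Ignores x entirely: predicts the average y (under-fits)."""
    return statistics.mean(y for _, y in data)

def nearest_neighbour_model(data, x0):
    """Copies the y of the closest training point (over-fits)."""
    return min(data, key=lambda point: abs(point[0] - x0))[1]

def bias_and_variance(model, x0=0.9, repeats=500, seed=0):
    """Rebuild the model on many fresh datasets and study its predictions at x0."""
    rng = random.Random(seed)
    predictions = [model(make_dataset(rng), x0) for _ in range(repeats)]
    bias = statistics.mean(predictions) - true_value(x0)
    return bias, statistics.pvariance(predictions)

bias_const, var_const = bias_and_variance(constant_model)
bias_nn, var_nn = bias_and_variance(nearest_neighbour_model)
print(bias_const, var_const)
print(bias_nn, var_nn)
```

In this setup the constant model is far off on average but stable from sample to sample, while the nearest-neighbour model is right on average but swings with every new sample.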



The diagram below might oversimplify things (e.g. models with just one or two features), but it helps
to conceptually understand what it means for a model to have high bias (underfitting) or high
variance (overfitting). The concept is much the same, just much harder to visualize, when there are
many features.

14. What is the difference between Linear and Logistic Regression?


Linear Regression uses a linear function to map input variables to continuous response/dependent
variables. Once fitted, a Linear Regression model can be used to predict the values of
response/dependent variables for new values of the input variables. An example application might
be to predict the total value of trades that will occur by end of day based on the number and
size of orders that have been submitted so far. The output of Linear Regression is a continuous
value, and due to the use of a straight line to map the input variables to the dependent
variables, the output can be any one of an infinite number of possibilities. This means that
outputs can be positive or negative, with no maximum or minimum bounds.
Logistic Regression uses a logistic function to map the input variables to categorical
response/dependent variables. In contrast to Linear Regression, Logistic Regression outputs a
probability between 0 and 1. In essence, Logistic Regression estimates the probability
of a binary outcome, rather than predicting the outcome itself. Logistic Regression is
typically used for binary classification problems, where the output is the probability of the
given input belonging to a categorical target class. For example, threat detection for
cybersecurity aims to distinguish between benign and suspicious patterns of host and network
activity; this is a binary classification problem that can potentially be tackled using logistic
regression (among other techniques).
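The difference in output range is easy to see by pushing the same linear score through both models. The intercept and slope below are arbitrary illustrative coefficients, not fitted values.

```python
import math

def linear_output(x, intercept, slope):
    """Linear regression prediction: unbounded in both directions."""
    return intercept + slope * x

def logistic_output(x, intercept, slope):
    """Logistic regression prediction: the linear score squashed into (0, 1)."""
    return 1 / (1 + math.exp(-(intercept + slope * x)))

for x in (-10, -1, 0, 1, 10):
    print(x, linear_output(x, 0.0, 2.0), round(logistic_output(x, 0.0, 2.0), 4))
```

To turn the logistic probability into a class label, a cut-off such as 0.5 is commonly applied; the choice of cut-off is a separate decision from fitting the model.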
15. What is a confusion matrix?

A confusion matrix is a summary of prediction results on a classification problem. The number of
correct and incorrect predictions are summarized with count values and broken down by each class.
The confusion matrix shows the ways in which your classification model is confused when it makes
predictions. It gives us insight not only into the errors being made by a classifier but, more
importantly, into the types of errors that are being made.
Definition of the Terms:

• Positive (P): Observation is positive (for example: is an apple)
• Negative (N): Observation is not positive (for example: is not an apple)
• True Positive (TP): Observation is positive, and is predicted to be positive
• False Negative (FN): Observation is positive, but is predicted negative
• True Negative (TN): Observation is negative, and is predicted to be negative
• False Positive (FP): Observation is negative, but is predicted positive
Classification Rate/Accuracy: (TP + TN) / (TP + TN + FP + FN)
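The four counts and the accuracy can be computed from a list of actual and predicted labels. This sketch uses the standard definition of accuracy, correct predictions divided by all predictions; the fruit labels are made-up data.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FN, TN and FP for one class treated as 'positive'."""
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts

def accuracy(counts):
    """Share of all predictions that were correct."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())

actual    = ["apple", "apple", "apple", "other", "other", "other", "other", "apple"]
predicted = ["apple", "other", "apple", "other", "apple", "other", "other", "apple"]
counts = confusion_counts(actual, predicted, positive="apple")
print(counts, accuracy(counts))
```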

16. What are Decision Trees?


A decision tree is a type of supervised learning algorithm (having a pre-defined target variable)
that is mostly used in classification problems. It works for both categorical and continuous input
and output variables. In this technique, we split the population (or sample) into two or more
homogeneous sets (or sub-populations) based on the most significant splitter / differentiator in
the input variables.
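One common way to pick the "most significant splitter" is to choose the split that leaves the purest sub-populations. The sketch below assumes Gini impurity as the purity measure (other criteria such as entropy are also used) and searches thresholds on a single numeric input; the `ages` and `bought` data are made up.

```python
def gini(labels):
    """Gini impurity: 0 for a pure group, higher for a mixed one."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try each midpoint threshold and return (threshold, weighted impurity) of the best."""
    best = None
    points = sorted(set(values))
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

ages = [22, 25, 30, 35, 40, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)
print(threshold, impurity)
```

A full tree would apply the same search recursively to each resulting sub-population and across all input variables.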
17. What are Model Ensembles?

Model Ensembles, or simply “ensembles,” are combinations of two or more predictions from
predictive models into a single composite score. Advantages of ensembling -

• Improved model accuracy (low error) - Ensembles nearly always improve model predictive
accuracy and rarely predict worse than single models
• Improved model robustness (less overfitting) - averaging multiple models into a single
prediction, no single model dominates the final predicted value of the models, reducing the
likelihood that a flaky prediction will be made (so long as all the models don’t agree on the
flaky prediction)
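The averaging effect can be illustrated with a small simulation. It assumes three hypothetical models whose errors are independent random noise around the truth, which is the favourable case; models that make the same mistakes gain much less from averaging.

```python
import random
import statistics

rng = random.Random(7)
truth = [rng.uniform(0, 100) for _ in range(1000)]

def noisy_model(values, noise_sd=10):
    """Stand-in for a model: the true value plus independent random error."""
    return [v + rng.gauss(0, noise_sd) for v in values]

def mean_squared_error(actual, predicted):
    return statistics.mean((a - p) ** 2 for a, p in zip(actual, predicted))

model_predictions = [noisy_model(truth) for _ in range(3)]
# Ensemble: average the three predictions for each observation
ensemble = [statistics.mean(trio) for trio in zip(*model_predictions)]

single_errors = [mean_squared_error(truth, preds) for preds in model_predictions]
ensemble_error = mean_squared_error(truth, ensemble)
print(single_errors, ensemble_error)
```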



18. What are the common Statistical Data Distributions?
Distributions provide a parameterized mathematical function that can be used to calculate the
probability for any individual observation from the sample space. This function, which describes the
grouping or the density of the observations, is called the probability density function. We can also
calculate the likelihood of an observation having a value equal to or less than a given value. A
summary of these relationships between observations is called a cumulative distribution function.
Binomial distribution – This measures the probabilities of the number of successes over a given
number of trials with a specified probability of success in each try. In the simplest scenario of a coin
toss (with a fair coin), where the probability of getting a head with each toss is 0.50 and there are a
hundred trials, the binomial distribution will measure the likelihood of getting anywhere from no
heads in a hundred tosses (very unlikely) to 50 heads (the most likely) to 100 heads (also very
unlikely). The binomial distribution in this case will be symmetric, reflecting the even odds; as the
probabilities shift from even odds, the distribution will get more skewed.
Poisson distribution - This distribution measures the likelihood of a number of events occurring
within a given time interval, where the key parameter that is required is the average number of
events in the given interval (λ). The resulting distribution looks similar to the binomial, with the
skewness being positive but decreasing with λ.
Negative Binomial distribution - Returning again to the coin toss example, assume that you hold the
number of successes fixed at a given number and estimate the number of tries you will have before
you reach the specified number of successes. The resulting distribution is called the negative binomial
and it very closely resembles the Poisson. In fact, the negative binomial distribution converges on the
Poisson distribution, but will be more skewed to the right (positive values) than the Poisson
distribution with similar parameters.
Geometric distribution - Consider again the coin toss example used to illustrate the binomial. Rather
than focus on the number of successes in n trials, assume that you were measuring the likelihood of
when the first success will occur. For instance, with a fair coin toss, there is a 50% chance that the
first success will occur at the first try, a 25% chance that it will occur on the second try and a 12.5%
chance that it will occur on the third try. The resulting distribution is positively skewed and looks as
follows for three different probability scenarios
Normal distribution - The normal distribution is a probability function that describes how the values
of a variable are distributed. It is a symmetric distribution where most of the observations cluster
around the central peak and the probabilities for values further away from the mean taper off equally
in both directions. Extreme values in both tails of the distribution are similarly unlikely. It has the
following properties –

• symmetric bell shape
• mean and median are equal; both located at the centre of the distribution
• ~68% of the data falls within 1 standard deviation of the mean
• ~95% of the data falls within 2 standard deviations of the mean
• ~99.7% of the data falls within 3 standard deviations of the mean
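Several of the figures quoted in this section can be checked numerically. The sketch below uses the standard formulas for the binomial and geometric probabilities and the standard library's `NormalDist` for the normal distribution; the helper names are illustrative.

```python
from math import comb
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(first success occurs on trial k)."""
    return (1 - p) ** (k - 1) * p

# Fair coin, 100 tosses: 50 heads is the most likely count
most_likely_heads = max(range(101), key=lambda k: binomial_pmf(k, 100, 0.5))

# Fair coin: chance the first head appears on toss 1, 2, 3
first_head = [geometric_pmf(k, 0.5) for k in (1, 2, 3)]

# Share of a normal distribution within 1, 2 and 3 standard deviations of the mean
standard_normal = NormalDist(mu=0, sigma=1)
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

print(most_likely_heads, first_head, within)
```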



Guesstimates
Fundamentals
Since most analytics interviews concentrate on guesstimates, let’s revise these concepts.
Points to consider while solving guesstimates:

• Demand side/ supply side


To determine which approach to use, see which of the two sides is constrained. Use that side to
estimate the number. For example, to calculate number of people travelling in Delhi metro,
supply side (capacity of metro) is constrained and therefore the guesstimate has to be done from
supply side. However, if we need to plan the capacity of a new metro, demand needs to be
gauged.

• Commercial/ Household/ Individual


Commercial/ Household/ Individual needs should be considered to determine the extent of usage
of the product. For example, cars can be purchased for both commercial and household purposes.
This can be clarified with the interviewer. However, cigarettes are purchased mostly on an
individual basis.

• Replaceable & Reusable products


From the demand side, a product can be replaced every few years. This can change the demand
of the product. For example, car tires can be replaced. So, a replacement factor can be used to
determine the demand of tires. A product can be reusable, which decreases the demand. For
example, taxis can be used throughout the day, which needs to be discounted before calculating
the total number of cabs needed in a day.

• Segmentation
As the demand varies across various segments, segmentation needs to be done while solving the
guesstimate. Examples of segments are income bracket, age, rural/ urban, gender.
To determine the number of cars, income bracket could be used as the higher income people use
cars more. To determine the number of cigarettes, gender segmentation could be useful. On the
same lines, to determine number of burgers made in India, rural/urban segmentation could prove to
be helpful.

• Other factors to be considered:


o Peak vs non-peak occupancy rates
The utilization of a product need not be 100% and it can vary across the time. For example,
the occupancy rate of a taxi in peak time would be 100%, whereas in the non-peak times,
it could be 80%. However, the occupancy rate depends on other factors also like weekday
& weekend.



o Conversion rates
This is useful in running marketing campaigns, to determine the target number of
customers.
Some basic figures that would help if kept at your fingertips are:

Population of India | 120 crores (can take as 100 crores for ease of calculation)
Rural : Urban split (India) | 70:30
Average family size | 5 people
Male : Female ratio | 50:50
Upper : Middle : Lower class split (India) | 10:40:50

Sample Guesstimates

1. Estimate the market for car tires in India.


This can be done from demand side. Let’s start by estimating car market in India.

Cars
    Commercial - No
    Household - 30 crore families (from the 120 crore population)
        Upper - 10% (2 per family & 50% have cars): 3 crores
        Middle - 40% (1 per family & 10% have cars): 1.2 crores
        Lower - 50% (No)



After estimating the number of cars in India to be about 4.2 crores, the tire market needs to be
estimated. The demand for tires can come from both new cars and replacements.
Suppose the lifetime of a tire is 3 years and the lifetime of a car is 10 years.

Demand for tires
    Replacements - Demand = 1 tire per car: (1 * 4.2 crores / 3 years) = 1.4 crores
    New cars - Demand = 5 tires per car: (5 * 4.2 crores / 10 years) = 2.1 crores

Hence, the total demand for tires in India is 3.5 crores per year.
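Laying the estimate out as a short calculation makes each assumption easy to change in an interview. The sketch below simply reproduces the figures used above (30 crore families, the class splits and ownership rates, and the assumed tire and car lifetimes); all quantities are in crores.

```python
# All quantities are in crores; every input is an assumption from the estimate above
families = 30
cars_upper = families * 0.10 * 0.50 * 2    # upper class: 10% of families, 50% own, 2 cars each
cars_middle = families * 0.40 * 0.10 * 1   # middle class: 40% of families, 10% own, 1 car each
total_cars = cars_upper + cars_middle

tire_life_years = 3
car_life_years = 10
replacement_demand = 1 * total_cars / tire_life_years   # 1 tire per car every 3 years
new_car_demand = 5 * total_cars / car_life_years        # 5 tires per new car, cars last 10 years
total_tire_demand = replacement_demand + new_car_demand

print(round(total_cars, 2), round(replacement_demand, 2),
      round(new_car_demand, 2), round(total_tire_demand, 2))
```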

2. Estimate the number of flights taking off from Bangalore in a day


There are 2 types of flights flying from Bangalore: domestic and international.
Let’s assume we need to estimate only domestic flights from Bangalore. There is higher traffic for
Tier-1 cities and lower traffic for Tier-2 cities. Hence, the segmentation is done accordingly. It is
assumed that there are about 10 tier-2 cities to which Bangalore has direct connectivity. Since
Delhi and Mumbai have higher capacity, tier-1 cities are again divided into 2 categories.
The occupancy rates depend on the time of day. Suppose there are 2 busy periods, in the morning
and in the evening, each lasting 3 hours. Further, in the first 4 hours of the day (12 AM - 4 AM)
airports have less traffic, and it is safe to treat these as non-operational hours. Hence, the
remaining 14 hours are non-peak hours.



Flights from Bangalore
    Domestic - Yes
        Tier 1 cities - 5
            Delhi, Mumbai
                Peak time: 2 * 4/hour * 6 = 48
                Non-peak time: 2 * 1/hour * 14 = 28
            Kolkata, Hyderabad, Chennai
                Peak time: 3 * 2/hour * 6 = 36
                Non-peak time: 3 * 1/2 hours * 14 = 21
        Tier 2 cities - 10
            Peak time: 10 * 1/2 hours * 6 = 30
            Non-peak time: 10 * 1/5 hours * 14 = 28
    International - No

The numbers are taken as per general observations.
For example, there are generally 4 flights every hour in peak time (for 4 major airline players) for
Delhi.
In total, there are 76 (~75) flights for Delhi and Mumbai, and a further 57 (~60) flights for the
remaining 3 tier-1 cities. So, in a day there are about 135 flights flying from Bangalore to tier-1
cities. For tier-2 cities, there are about 58 (~60) flights flying from Bangalore.
So, in a day there are about 190 domestic flights taking off from Bangalore airport.

3. Ola cab services are starting in Vizag. Estimate the number of cabs required on first week.
Assumptions:

• Population of Vizag is 20 lakhs
• Uniform demand across the day from 6AM-12PM
• Carpooling is not there in first week
We need to first look at demand side for potential customers for the cab service



There are multiple factors to be considered while solving this:
1) Age group
2) Income group
3) Gender - Women (half of the population) are assumed to use cabs at half the rate of men, so
they contribute 1/4 against 1/2 for men. Hence, a conversion factor of ¾ is used

Population of Vizag - 20 lakh
    High Income - 10%: 30% potential customers, 0.22 lakh
    Middle Income - 40%: 50% potential customers, 1.5 lakh
    Lower Income - No

There are 1.72 lakh potential customers for cabs. Considering public transportation and auto
rickshaws, it can be assumed that about 50% of the middle class and 30% of the upper class (who own
more cars) can be converted to use cabs.
Suppose 5% of them can be converted to customers of Ola in the first week. This gives scope for
almost 10K Ola customers in the first week. Considering the size of Vizag, it can be assumed that each
customer spends on average half an hour per day travelling by cab. Add a 50% factor for waiting
time, and assume that on average each cab driver works 9 hours a day.
Cabs required = 10000 * (1/2) * (1 + 1/2) / (9 hours) = 833 cabs
So, about 800 cabs need to be acquired in the first week
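The last step of the estimate can be written as a small function so the effect of each assumption is visible. The inputs below are the ones used above (10K customers, half an hour of travel each, a 50% waiting-time factor, 9 working hours per driver); `cabs_required` is a hypothetical helper name.

```python
def cabs_required(customers, ride_hours=0.5, waiting_factor=0.5, driver_hours=9):
    """Total cab-hours demanded per day divided by the hours one cab can supply."""
    cab_hours_needed = customers * ride_hours * (1 + waiting_factor)
    return cab_hours_needed / driver_hours

potential_customers = (0.22 + 1.5) * 100_000          # 1.72 lakh potential customers
first_week_customers = potential_customers * 0.05     # about 8,600, rounded up to 10K above
cabs = cabs_required(10_000)
print(round(first_week_customers), round(cabs))
```

Doubling the assumed working hours per driver halves the number of cabs, which shows how sensitive the answer is to that single input.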

4. How many square feet of pizza are eaten in the United States each month?
Start with a figure of 300 million people in America. How many people eat pizza?
Let’s say 200 million. Now let’s say the average pizza-eating person eats pizza twice a month and
eats two slices at a time. That’s four slices a month. If the average slice of pizza is perhaps six inches
at the base and 10 inches long, then the slice is 30 square inches of pizza. So, four pizza slices would
be 120 square inches. Since one square foot equals 144 square inches, let’s assume that each
person who eats pizza eats one square foot per month. Since there are 200 million pizza-eating
Americans, 200 million square feet of pizza are consumed in the US each month.
To summarize:
300 million people in America
200 million eat pizza
Average slice of pizza is six inches at the base and 10 inches long = 30 square inches (height x half
the base)
Average pizza eater eats four slices of pizza a month
Four slices x 30 square inches = 120 square inches (one square foot is 144 square inches), so let’s
assume one square foot per person
1 square foot x 200 million people = 200 million square feet a month

5. To find the number of books in IIM Bangalore

Clarify the type of books (here, notebooks, xerox study material and magazines are excluded)
Number of books with faculty (for instance 10 areas, 10 profs each, average 5 years’ experience, 5
books added per year: 10 * 10 * 5 * 5 = 2,500 books)
Library (1 floor operational currently, 10 racks per floor, 5 shelves per rack, 30 books per shelf:
1 * 10 * 5 * 30 = 1,500 books)

6. What is the size of the market for disposable diapers in India?


How many people live in India? A billion. Because the population of India is young, a full 600
million of those inhabitants might be of child-bearing age. Half are women, so there are about 300
million Indian women of child-bearing age. Now, the average family size in India is restricted, so it
might be 1.5 children, on average, per family. Let’s say two-thirds of Indian women have children.
That means that there are about 300 million children in India. How many of those kids are under
the age of two? About a tenth, or 30 million. So there are at least 30 million possible consumers of
disposable diapers.
To summarize:
1 billion people x 60% childbearing age = 600,000,000 people
600,000,000 people x 1/2 are women = 300,000,000 women of child-bearing age
300,000,000 women x 2/3 have children = 200,000,000 women with children
200,000,000 women x 1.5 children each = 300,000,000 children
300,000,000 children x 1/10 under age 2 = 30 million

Sample Puzzles

1. Bag of Coins
You have 10 bags full of coins. Each bag holds an unlimited number of coins. But one bag is full of
forgeries, and you can’t remember which one. You do know that genuine coins weigh 1 gram, but
forgeries weigh 1.1 grams. You have to identify that bag in the minimum number of readings. You are
provided with a digital weighing machine.
Answer
1 reading
Explanation - Take 1 coin from the first bag, 2 coins from the second bag, 3 coins from the third bag
and so on. Eventually, we’ll get 55 (1+2+3…+9+10) coins. Now, weigh all the 55 coins together.
Depending on the resulting weighing machine reading, you can find which bag has the forged coins
such that if the reading ends with 0.4 then it is the 4th bag, if it ends with 0.7 then it is the 7th bag
and so on.
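The decoding step can be checked by simulating the single weighing for every possible forged bag. This is an illustrative sketch; `find_forged_bag` is a hypothetical helper that builds the reading and then works backwards from the excess weight.

```python
def find_forged_bag(forged_bag, bags=10, genuine=1.0, forged=1.1):
    """Simulate the one weighing (i coins from bag i) and decode the forged bag."""
    reading = sum(i * (forged if i == forged_bag else genuine)
                  for i in range(1, bags + 1))
    all_genuine = genuine * bags * (bags + 1) / 2   # 55 g if every coin were genuine
    excess = reading - all_genuine                  # 0.1 g for each forged coin weighed
    return round(excess / (forged - genuine))

for bag in range(1, 11):
    print(bag, find_forged_bag(bag))
```

If the 10th bag is the forged one, the excess is a full gram, so the reading is 56.0 rather than ending in a non-zero decimal.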
2. Prisoners and hats
There are 100 prisoners all sentenced to death. One night before the execution, the warden gives
them a chance to live if they all work on a strategy together. The execution scenario is as follows –
On the day of execution, all the prisoners will be made to stand in a straight line such that one
prisoner stands just behind another and so on. All prisoners will be wearing a hat either blue or red
in color. The prisoners don’t know what color of hat they are wearing. The prisoner who is standing
at the last can see all the prisoners in front of him (and what color of hat they are wearing). A prisoner
can see all the hats in front of him. The prisoner who is standing in the front of the line cannot see
anything.



The executioner will ask each prisoner what color of hat they are wearing one by one, starting from
the last in the line. The prisoner can only speak “Red” or “Blue”. He cannot say anything else. If he
gets it right, he lives otherwise he is shot instantly. All the prisoners standing in front of him can hear
the answers and gunshots.
Assuming that the prisoners are intelligent and would stick to the plan, what strategy would the
prisoners make over the night to minimize the number of deaths?
Answer
The strategy is that the last person will say ‘red’ if the number of red hats in front of him is odd and
‘blue’ if it is even. He survives with 50% probability, but his answer tells everyone else the parity.
Now, the 99th prisoner counts the red hats in front of him. If the parity he sees matches the one
announced, his own hat is blue; if it differs, his hat is red. Every prisoner after him does the same,
also counting the red hats already called out behind him, so the other 99 prisoners are all saved.
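A minimal Python sketch of the parity strategy, for readers who want to verify it. The list layout (index 0 is the front of the line, the last index speaks first) and the function name `play` are assumptions of this sketch.

```python
import random


def play(hats):
    """hats[i] is 'R' or 'B'; index 0 is the front, the last index speaks first.

    Returns the list of answers, aligned with `hats`.
    """
    n = len(hats)
    guesses = [None] * n
    # The last prisoner encodes the parity of red hats he sees in front of him.
    parity = sum(h == "R" for h in hats[:n - 1]) % 2
    guesses[n - 1] = "R" if parity == 1 else "B"
    reds_heard = 0  # red hats announced by prisoners between the speaker and the last one
    for i in range(n - 2, -1, -1):
        reds_seen = sum(h == "R" for h in hats[:i])
        # Own hat is red exactly when the known hats do not explain the parity.
        own_is_red = (reds_seen + reds_heard) % 2 != parity
        guesses[i] = "R" if own_is_red else "B"
        reds_heard += guesses[i] == "R"
    return guesses


if __name__ == "__main__":
    hats = [random.choice("RB") for _ in range(100)]
    answers = play(hats)
    print(sum(a == h for a, h in zip(answers, hats)), "prisoners survive")
```

Only the last prisoner's survival depends on chance; the other 99 answers are always correct.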
3. Blind games
You are in a dark room where a table is kept. There are 50 coins placed on the table, out of which 10
coins are showing tails and 40 coins are showing heads. You cannot see or feel which side is up. The task is to divide this set of 50 coins into
2 groups (not necessarily of the same size) such that both groups have the same number of coins showing
tails.
Answer
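Divide the coins into two groups of 40 coins and 10 coins. Flip all coins of the group with 10 coins.
If the group of 10 contained k tails, the group of 40 contains 10 – k tails; after flipping, the group of 10 also shows 10 – k tails.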
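The argument can be checked with a small simulation. This is only a sketch; the function name and the 'H'/'T' encoding are illustrative choices.

```python
import random


def split_and_flip(coins, tails_total):
    """Take any `tails_total` coins as one group and flip every coin in it."""
    chosen, rest = coins[:tails_total], coins[tails_total:]
    flipped = ["H" if c == "T" else "T" for c in chosen]
    return flipped, rest


if __name__ == "__main__":
    coins = ["T"] * 10 + ["H"] * 40
    random.shuffle(coins)          # we do not know where the tails are
    small, large = split_and_flip(coins, 10)
    print(small.count("T"), large.count("T"))
```

Whatever the shuffle, both printed counts are equal.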
4. Time and tide wait for none
You have two sand timers, which can show 4 minutes and 7 minutes respectively. Use both the sand
timers (at a time or one after other or any other combination) and measure a time of 9 minutes.
Answer

• Start the 7 minute sand timer and the 4 minute sand timer together
• Once the 4 minute sand timer ends (at 4 minutes), turn it upside down instantly
• Once the 7 minute sand timer ends (at 7 minutes), turn it upside down instantly
• When the 4 minute sand timer ends again (at 8 minutes), turn the 7 minute sand timer upside down (it has now
1 minute of sand in it)
So effectively 8 + 1 = 9
5. Chaotic Bus
There is a bus with 100 labeled seats (labeled from 1 to 100). There are 100 persons standing in a
queue. Persons are also labelled from 1 to 100.
People board the bus in sequence from 1 to 100. The rule is, when person ‘i’ boards the bus, he checks
if seat ‘i’ is empty. If it is empty, he sits there, else he randomly picks an empty seat and sits there.
Given that the 1st person picks a seat randomly, find the probability that the 100th person sits in his own place i.e.
the 100th seat.

Answer
The probability that the last person ends up in his proper seat is exactly 1/2
Explanation - First, observe that the fate of the last person is determined the moment either the first
or the last seat is selected! This is because the last person will either get the first seat or the last seat.
Any other seat will necessarily be taken by the time the last person gets to ‘choose’. Since at each random choice
step the first and the last seat are equally likely to be taken, the last person will get either the first or last seat with
equal probability: 1/2
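A Monte Carlo sketch in Python can be used to sanity-check the 1/2 result. The function names, the number of trials and the seed are arbitrary choices for illustration.

```python
import random


def last_person_gets_own_seat(n, rng):
    """Simulate one boarding; return True if person n finds seat n free."""
    taken = [False] * (n + 1)            # seats are 1..n; index 0 is unused
    taken[rng.randint(1, n)] = True      # person 1 picks uniformly at random
    for person in range(2, n):           # persons 2 .. n-1
        if not taken[person]:
            taken[person] = True         # own seat is free
        else:
            free = [s for s in range(1, n + 1) if not taken[s]]
            taken[rng.choice(free)] = True
    return not taken[n]


def estimate(n=100, trials=20_000, seed=0):
    """Fraction of simulated boardings in which the last person gets seat n."""
    rng = random.Random(seed)
    hits = sum(last_person_gets_own_seat(n, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(estimate())
```

The estimate should hover around 0.5 for any number of seats from 2 upwards.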
6. Shooters in a circle
N persons are standing in a circle. They are labelled from 1 to N in clockwise order. Every one of them
is holding a gun and can shoot the next living person in the circle. Starting from person 1, they start shooting in
order e.g. for N=100, person 1 shoots person 2, then person 3 shoots person 4, then person 5 shoots
person 6……..then person 99 shoots person 100, then person 1 shoots person 3, then person 5 shoots
person 7……and it continues till all are dead except one. What’s the index of that last person?
Answer
Write 100 in binary, which is 1100100, and take the bitwise complement, which is 0011011, i.e. 27. Subtract
the complement from the original number. So 100 – 27 = 73.
Try it out for 50 people. 50 = 110010 in binary.
Complement is 001101 = 13. Therefore, 50 – 13 = 37.
For a number of the form 2^n, it will be the first person. Let’s take an example:
64 = 1000000
Complement = 0111111 = 63.
64 – 63 = 1.
You can apply this for any N
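The complement rule can be compared against a direct simulation. The following Python sketch does both; the function names are illustrative.

```python
from collections import deque


def last_survivor_simulated(n):
    """Simulate the circle: each shooter kills the next living person."""
    circle = deque(range(1, n + 1))
    while len(circle) > 1:
        circle.rotate(-1)   # the shooter moves to the back of the queue
        circle.popleft()    # the person now at the front is shot
    return circle[0]


def last_survivor_formula(n):
    """n minus the bitwise complement of n (taken over n's own bit length)."""
    bits = n.bit_length()
    complement = ((1 << bits) - 1) ^ n
    return n - complement


if __name__ == "__main__":
    for n in (50, 64, 100):
        print(n, last_survivor_simulated(n), last_survivor_formula(n))
```

For N = 50, 64 and 100 both functions give 37, 1 and 73, matching the worked examples above.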
7. Lazy people need to be smart
Four glasses are placed on the corners of a square Lazy Susan (a square plate which can rotate about
its center). Some of the glasses are upright (up) and some upside-down (down).
A blindfolded person is seated next to the Lazy Susan and is required to re-arrange the glasses so
that they are all up or all down, either arrangement being acceptable (which will be signalled by, say,
the ringing of a bell).
The glasses may be rearranged in turns, subject to the following rules: Any two glasses may be
inspected in one turn and after feeling their orientation the person may reverse the orientation of
either, neither or both glasses. After each turn the Lazy Susan is rotated through a random angle.
The puzzle is to devise an algorithm which allows the blindfolded person to ensure that all glasses
have the same orientation (either up or down) in a finite number of turns. (The algorithm must be
deterministic, i.e. non-probabilistic)
Answer
This algorithm guarantees that the bell will ring in at most five turns:

• On the first turn, choose a diagonally opposite pair of glasses and turn both glasses up
• On the second turn, choose two adjacent glasses. At least one will be up as a result of the
previous step. If the other is down, turn it up as well. If the bell does not ring, then there are
now three glasses up and one down
• On the third turn, choose a diagonally opposite pair of glasses. If one is down, turn it up and
the bell will ring. If both are up, turn one down. There are now two glasses down, and they
must be adjacent
• On the fourth turn, choose two adjacent glasses and reverse both. If both were in the same
orientation then the bell will ring. Otherwise there are now two glasses down and they must
be diagonally opposite
• On the fifth turn, choose a diagonally opposite pair of glasses and reverse both. The bell will
ring

8. The Red wedding


A bad king has a cellar of 1000 bottles of delightful and very expensive wine. A neighbouring queen plots
to kill the bad king and sends a servant to poison the wine.
Fortunately (or say unfortunately) the bad king’s guards catch the servant after he could poison only
one bottle. Alas, the guards don’t know which bottle, but know that the poison is so strong that even
if diluted 100,000 times it would still kill the king.
Furthermore, it takes one month to have an effect. The bad king decides he will get some of the
prisoners in his vast dungeons to drink the wine. Being a clever bad king, he knows that he needs to
murder no more than 10 prisoners – believing he can fob off such a low death rate – and will still be
able to drink the rest of the wine (999 bottles) at his wedding party in 5 weeks’ time.
Explain what the king has in mind. How will he be able to do so? (He may use only 10
prisoners.)
Answer
Number the bottles 1 to 1000. Now, write each number in binary format. We can write it as:
bottle 1 = 0000000001 (10 digit binary)
bottle 2 = 0000000010
.
.
bottle 500 = 0111110100
bottle 1000 = 1111101000

Now, take 10 prisoners and number them 1 to 10. Let prisoner 1 take a sip from every bottle that has
a 1 in its least significant bit, prisoner 2 from every bottle that has a 1 in its second bit, and so on
up to prisoner 10 and the most significant bit. For example:
Prisoner = 10 9 8 7 6 5 4 3 2 1
Bottle 924 = 1 1 1 0 0 1 1 1 0 0
For instance, bottle no. 924 would be sipped by 10,9,8,5,4 and 3. That way if bottle no. 924 was the
poisoned one, only those prisoners would die.
After four weeks, line the prisoners up in their bit order and read each living prisoner as a 0 bit and
each dead prisoner as a 1 bit. The number that you get is the bottle of wine that was poisoned. We
know 1000 is less than 1024 (2^10), so 10 bits give every bottle a distinct pattern. If there were more
than 1024 bottles of wine, it would take more than 10 prisoners.
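A short Python sketch of the encoding and decoding steps. The function names `drinkers` and `poisoned_bottle` are made up for illustration.

```python
def drinkers(bottle, prisoners=10):
    """Prisoners (numbered from 1 = least significant bit) who sip this bottle."""
    return {p for p in range(1, prisoners + 1) if (bottle >> (p - 1)) & 1}


def poisoned_bottle(dead):
    """Read dead prisoners as 1 bits to recover the bottle number."""
    return sum(1 << (p - 1) for p in dead)


if __name__ == "__main__":
    print(sorted(drinkers(924), reverse=True))
    print(poisoned_bottle(drinkers(924)))
```

Because every bottle from 1 to 1000 has a distinct 10-bit pattern, the set of dead prisoners pins down the bottle uniquely.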
9. Weighing balls
You have 12 balls that all weigh the same except one, which is either slightly lighter or slightly heavier.
The only tool you have is a balance scale that can only tell you which side is heavier. Using only three
weighings, how can you deduce, without a shadow of a doubt, which is the odd one out, and if it is
heavier or lighter than the others?
Answer
First we weigh {1,2,3,4} on the left and {5,6,7,8} on the right. There are three scenarios which can
arise from this:
If they balance, then we know 9, 10, 11 or 12 is fake. Weigh {8, 9} and {10, 11} (Note: 8 is surely not
fake). If they balance, we know 12 is the fake one. Just weigh it with any other ball and figure out if it
is lighter or heavier.
If {8, 9} is heavier, then either 9 is heavy or 10 is light or 11 is light. Weigh {10} and {11}. If they balance,
9 is fake (heavier). If they don’t balance then whichever one is lighter is fake (lighter).
If {8, 9} is lighter, then either 9 is light or 10 is heavy or 11 is heavy. Weigh {10} and {11}. If they
balance, 9 is fake (lighter). If they don’t balance then whichever one is heavier is fake (heavier).
If {1,2,3,4} is heavier, we know that either one of {1,2,3,4} is heavier or one of {5,6,7,8} is lighter, and it is
guaranteed that {9,10,11,12} are not fake. This is where it gets really tricky, so watch carefully. Weigh
{1,2,5} and {3,6,9} (Note: 9 is surely not fake).
If they balance, then either 4 is heavy or 7 is light or 8 is light. Following the last step from the previous
case, we weigh {7} and {8}. If they balance, 4 is fake(heavier). If they don’t balance then whichever
one is lighter is fake (lighter).
If {1,2,5} is heavier, then either 1 is heavy or 2 is heavy or 6 is light. Weigh {1} and {2}. If they balance,
6 is fake (lighter). If they don’t balance then whichever one is heavier is fake (heavier).
If {3,6,9} is heavier, then either 3 is heavy or 5 is light. Weigh {5} and {9}. If {5} is
lighter, 5 is fake (lighter). If they balance, 3 is fake (heavier).

If {5,6,7,8} is heavier, it is the same situation as if {1,2,3,4} was heavier. Just perform the same steps
with the roles of the two groups swapped. For completeness, the steps are spelled out
here. Weigh {5,6,1} and {7,2,9} (Note: 9 is surely not fake).
If they balance, then either 8 is heavy or 3 is light or 4 is light. Following the last step from the previous
case, we weigh {3} and {4}. If they balance, 8 is fake(heavier). If they don’t balance then whichever
one is lighter is fake (lighter).
If {5,6,1} is heavier, then either 5 is heavy or 6 is heavy or 2 is light. Weigh {5} and {6}. If they balance,
2 is fake (lighter). If they don’t balance then whichever one is heavier is fake (heavier).
If {7,2,9} is heavier, then either 7 is heavy or 1 is light. Weigh {1} and {9}. If they balance, 7 is fake
(heavier). If they don’t balance then 1 is fake (lighter).
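The decision tree above can be checked exhaustively against all 24 scenarios (12 balls, each either heavy or light). The Python sketch below encodes the same weighings; `solve`, `make_scale` and `check_all` are illustrative names, and the scale returns 'L' when the left pan is heavier, 'R' when the right pan is heavier and '=' when they balance.

```python
def make_scale(fake, delta, log):
    """Balance with one hidden fake ball; delta is +1 (heavy) or -1 (light)."""
    def scale(left, right):
        log.append((left, right))
        diff = (sum(delta for b in left if b == fake)
                - sum(delta for b in right if b == fake))
        return "=" if diff == 0 else ("L" if diff > 0 else "R")
    return scale


def solve(scale):
    """Return (ball, 'heavy' or 'light') using at most three weighings."""
    first = scale([1, 2, 3, 4], [5, 6, 7, 8])
    if first == "=":
        second = scale([8, 9], [10, 11])          # 8 is known to be genuine
        if second == "=":
            return (12, "heavy" if scale([12], [1]) == "L" else "light")
        third = scale([10], [11])
        if second == "L":                          # 9 heavy, or 10/11 light
            if third == "=":
                return (9, "heavy")
            return (11, "light") if third == "L" else (10, "light")
        if third == "=":                           # 9 light, or 10/11 heavy
            return (9, "light")
        return (10, "heavy") if third == "L" else (11, "heavy")
    # a = group that went down (possible heavy ball), b = group that went up.
    a, b = ([1, 2, 3, 4], [5, 6, 7, 8]) if first == "L" else ([5, 6, 7, 8], [1, 2, 3, 4])
    second = scale([a[0], a[1], b[0]], [a[2], b[1], 9])   # 9 is genuine
    if second == "=":
        third = scale([b[2]], [b[3]])
        if third == "=":
            return (a[3], "heavy")
        return (b[3], "light") if third == "L" else (b[2], "light")
    if second == "L":
        third = scale([a[0]], [a[1]])
        if third == "=":
            return (b[1], "light")
        return (a[0], "heavy") if third == "L" else (a[1], "heavy")
    third = scale([b[0]], [9])                     # a[2] heavy or b[0] light
    return (a[2], "heavy") if third == "=" else (b[0], "light")


def check_all():
    """True if every scenario is identified correctly within three weighings."""
    for fake in range(1, 13):
        for delta, kind in ((1, "heavy"), (-1, "light")):
            log = []
            if solve(make_scale(fake, delta, log)) != (fake, kind) or len(log) > 3:
                return False
    return True


if __name__ == "__main__":
    print(check_all())
```

The sketch exploits the symmetry between the two outcomes of the first weighing by relabelling the groups, rather than duplicating the branch.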
10. The Age Game
Two IIMB grads bump into each other at Kempegowda Airport. They haven’t seen each other in over
20 years.
The first grad says to the second: “How have you been?”
Second: “Great! I got married and I have three daughters now.”
First: “Really? How old are they?”
Second: “Well, the product of their ages is 72, and the sum of their ages is the same as the number
on that building over there.”
First: “Right, ok... oh wait... hmm, I still don’t know.”
Second: “Oh sorry, the oldest one just started to play the piano.”
First: “Wonderful! My oldest is the same age!”
How old are the daughters?
Answer
We know that there are 3 daughters whose ages multiply to 72. Taking a look at the possibilities -
Ages        Sum of ages
1  1  72    74
1  2  36    39
1  3  24    28
1  4  18    23
1  6  12    19
1  8   9    18
2  2  18    22
2  3  12    17
2  4   9    15
2  6   6    14
3  3   8    14
3  4   6    13
After looking at the building number the first man still can’t figure out what their ages are, so that
means that the sum of the ages (or building number) must be 14, since that is the only sum that has
more than one possibility. Finally the man discovers that there is an oldest daughter. That rules out
the “2 6 6” possibility since the two oldest would be twins. Therefore, the daughters’ ages must be “3
3 8”
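The same elimination can be carried out mechanically. This Python sketch enumerates the age triples and applies the two clues; the function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def age_triples(product=72):
    """All triples a <= b <= c of positive integers with a*b*c == product."""
    return [t for t in combinations_with_replacement(range(1, product + 1), 3)
            if t[0] * t[1] * t[2] == product]


def solve(product=72):
    by_sum = defaultdict(list)
    for t in age_triples(product):
        by_sum[sum(t)].append(t)
    # Clue 1: knowing the sum is not enough, so the sum must be ambiguous.
    ambiguous = [ts for ts in by_sum.values() if len(ts) > 1]
    # Clue 2: there is a single oldest daughter.
    return [t for ts in ambiguous for t in ts if t[2] > t[1]]


if __name__ == "__main__":
    print(age_triples())
    print(solve())
```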

Interview Recommendations

American Express

Recommendations based on interviews


1. Candidates preferred - Involves analytics and some amount of finance. Having work experience
or projects in ML or AI could be useful. A certain amount of coding is required in the Modelling
Team but not in the Strategy Division
2. Process – Resume, Application Form, GD and Interview
3. Company based preparation – Some basic information about AmEx and its differentiation from
competitors. One of the GD topics was on how to make the customer experience better for
AmEx card users (it included everything from better service ideas to fraud detection). Focus
on case studies: they will not ask cases, but case practice builds an idea of what affects what
and leads to structured answers.
4. Interview focus - Thinking process, on the spot application. A lot of counter questions. Giving
analytical logical answers is important. Candidate should have an interest, basic analytics skills
and business acumen
5. Skills Required
a. Basic DS1 and DS2 questions - Hypothesis testing, Type 1 and Type 2 errors, linear
regression, collinearity, R-squared, significance
b. If you have any knowledge of ML, AI, read up on that. Some basic finance.
c. Estimation of markets
6. Resources for preparations
a. Data analytics related case studies
b. Guestimates - Know general numbers for guestimates (Approach is important)
c. Basic current affairs and new happenings in finance and analytics
7. Internship/ Placement
a. Nature of work
i. Workload- Very Reasonable (1pm-9pm) from Mon – Thurs, Friday 9-5pm
ii. Nature of Projects - 1 major project with 2 reviews
iii. Team - Size, Composition: 4-6 members. 1 manager, 1 mentor and 3-4 other
employees
b. Chances of PPO - About 50% based on work
c. Role & Growth Opportunities - Involving data analysis and mainly data cleaning.
d. Exit options - Fields like market research and analytics, project management. Domain
switching flexible

Sample Questions

Question When do you generally use linear regression, and can you explain the basic
steps?
Interviewee A simple linear regression is used to predict the value of a dependent variable
using a single independent variable. In case of multiple linear regression, the
value of the dependent variable is predicted using two or more independent
variables.
Steps of the process were also explained.
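For reference, the core fitting step of simple linear regression can be sketched in a few lines of Python using the ordinary least squares formulas. The function name and the sample data are illustrative and do not come from the interview.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up data lying roughly on y = 2x + 1
    xs = [0, 1, 2, 3, 4]
    ys = [1.1, 2.9, 5.2, 6.8, 9.1]
    print(fit_line(xs, ys))
```

In an interview, the fit itself is usually followed by checking significance, R-squared and the residuals.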
Question What do you know about Clustering and Principal Component Analysis? Can you
talk about various types of Clustering?

(Here the interviewee had reasonable experience in analytics industry)
Interviewee Clustering is used to figure out which groups the data points fall into, to gain
insights from the given data set.
There are mainly 5 types of clustering – K-means clustering, Mean-shift
clustering, Density-based spatial clustering of applications with noise (DBSCAN), expectation
maximisation clustering using Gaussian mixture models and Agglomerative
hierarchical clustering
Question Imagine that you are the HR in a leading company (L&T). You are assigned a task
to estimate when a person is going to resign, basically figure out the attrition rate
of the company. Also, what factors would you consider for the same?
Interviewee The attrition of people depends on various factors – duration of work ex, last
promotion, last raise, payment difference, feedback. Different weightages will be
assigned to individual factors to reach an optimal valuation.
Question Two dice are rolled, and the possible sums lie in the range of 1 to 12, with all
values having equal probability of showing up. Tell me the
numbers on both the dice
Interviewee Since there are 36 possibilities and 12 sums, each sum must come from 3 combinations of
numbers. With 1, the only case is the combination of 1 and 0.
Hence I took one normal die with 1 to 6 and the other die with three sides
having 0 (covering the cases with the sum lying in the range of 1 to 6). To be
symmetrical, the other three sides had 6 dots (covering the sums 7 to 12).
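The proposed pair of dice can be verified by enumerating all 36 outcomes, as in this short Python sketch (the function and variable names are illustrative).

```python
from collections import Counter
from itertools import product


def sum_distribution(die_a, die_b):
    """Count how many of the face pairs produce each sum."""
    return Counter(a + b for a, b in product(die_a, die_b))


if __name__ == "__main__":
    normal_die = [1, 2, 3, 4, 5, 6]
    special_die = [0, 0, 0, 6, 6, 6]
    for total, ways in sorted(sum_distribution(normal_die, special_die).items()):
        print(total, ways)
```

Every sum from 1 to 12 appears in exactly 3 of the 36 outcomes, i.e. with probability 1/12.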
Question I assume you are familiar with the “people you may know” feature in
Facebook. Guess the key factors that were considered to develop the algorithm.
Interviewee Factors like location, work/college, mutual friends, recent interactions, part of
same groups etc
(and mentioned 4-5 more factors)
Question Let’s say Airtel can share all the customer information with American Express.
Assume there is no regulation restricting this. Suggest at least 5 parameters
which American Express can use to identify prospective premium customers.
Interviewee Yes, we can look at the locality in which the customer lives, places he travels
to, online purchases he makes, the bank account balance and the proximity to
other premium customers
(Initially answered historical defaults in payments, which the interviewer shot
down as validating the credibility of the customer)
Question Fermat’s little theorem:
a^(p-1) ≡ 1 (mod p)
where p is a prime and a is a natural number not divisible by p
Fermat’s last theorem:
x^n + y^n = z^n

has no solution in positive integers x, y and z for any integer n > 2

Each interviewer asked the candidate to model a problem
1) Amex wants to subsidize the education of children in Bannerghatta. Can you build
me a model that I should follow?
2) Build a model to check a transaction prior to credit card default.

Question Basics of Excel such as the pivot table etc.


Basic Questions from Financial Accounting course. Questions include:
a. What is the difference between the cash flow statement and the profit and loss
statement for a company?
General Economic Discussion – Questions include:
a. What do you feel is the current major reason for slowdown in the economy?
b. Do you feel the government is doing enough to tackle the slowdown?
Microeconomics Questions – Prisoner’s Dilemma

Contributors
1. Srijit Mondal 6. Vignesh S 11. Shouvik Das
2. Ravindra 7. Uday Kumar 12. Abhishek Kumar Sachan
3. Ayush Singh 8. Ramya Kolli 13. Ayush Singh
4. Gouthami N 9. Arpan Ghosh 14. Avinash Deka
5. Kunal Kumar 10. Nilesh Paliwal

EXL

Recommendations based on interviews

1. Candidates preferred - Work experience compulsory for Analytics.
2. Process – Interviews - 2 technical and one HR round
3. Company based preparation – General company overview and some knowledge about major
domains
4. Interviewer expectations – On-the-spot thinking and logical assumption making
5. Interview focus - Analytics-based concepts and a pure analytics focus, using SAS & SQL.
Prepare well for common HR questions
6. Skills Required
a. Basic Concepts
b. Data analytics related case studies and case questions
For example, one candidate was asked to analyze why a particular company was suffering
from declining profits in the last one year and provide recommendations.
Another case asked was on the pricing of a new bulb that a company was releasing. The
bulb had a lifetime of 3 years with no substitutes.
c. Guestimates - Know general numbers for guestimates e.g. number of people who travel by
metro in Delhi, total number of street lights in Bangalore, etc.
7. Internship/ Placement
a. Role & Growth Opportunities: Standard policy of promotion after 2 years, performance
based – 1/1.5 year
b. Exit options: Analytics opportunities much better – SAS & Python

Contributors

1. Uday Kumar 4. Vishnu Pradeep
2. Nikhil Bichukale 5. Sameera Puli
3. Fung Rojee 6. Rahul Nagla

VISA

Recommendations based on interviews


1. Candidates preferred - Experience and technology
2. Company based preparation – Payment Gateway industry and competitors
3. Interviewer expectations – Analytical problem solving skills and applications of probabilistic
concepts
4. Interview focus – Ask clarifying questions, business acumen and knowledge and applications of
analytics. Be thorough with what you have written in your resume. Keep an example ready to
explain the concept. They just want to see if you know how to interpret the output.
5. Skills Required
a. Skill based: Regression, multiple regression
b. RMD: Understanding customer using data analytics
6. Resources for preparations
a. Data analytics related case studies
b. Guestimates - Know general numbers for guestimates
c. Skill based: Analytics DS-I,II
7. Internship/ Placement
a. Nature of work
i. Workload (hours): 8hrs/day
ii. Nature of Projects: Unstructured projects, personalised on user behaviour,
Improve Conversion Rate, ROI
iii. Team - Size, Composition: individual, 4-5 Product Managers
b. Role & Growth Opportunities: Based on your Performance
c. Exit options: Product Managers, Domain- Tech consult, strategy consult

Contributors
1. Kedar Mahamuni
2. Lakshya Verma

United Health Group

Recommendations based on interviews


1. Candidates preferred - People with analytics background
2. Process – Resume, Application Form, GD (general topics), HR, Managerial and Technical
Interview
3. Company based preparation – Learn about the healthcare industry
4. Interviewer expectations – practical and general industry knowledge
5. Resources for preparations
a. Data analytics related case studies
b. Case book from placement team
c. Guestimates - Know general numbers for guestimates
d. Knowledge about basic analytics tools and big data
6. Internship/ Placement
a. Nature of work
i. Workload (hours): Relatively chill, friendly people
ii. Nature of Projects: Live projects, Analytics projects – finding the NPS score, analysing
feedback, data cleaning, simulations
iii. Culture: Free, you can talk directly with employees, good work culture
b. Team - Size, Composition: Worked under an associate director, team size very big, everyone
worked on analytics. Composition – software data analytics – L1, MBA, people from a
mathematics background
c. Chances of PPO: Company gives PPO
d. Exit options: Another analytics company (majorly)

Contributors
1. Anusmita Saha
2. Kausik Tamuli
3. Abhijit Khonde
4. Dr Nisha Sharma
5. Nishant Jaiswal

Uber

Recommendations based on interviews


1. Candidates preferred - No such preference
2. Process – Resume, Application Form, written test, GD (general knowledge and opinions),
interview
One GD topic was on the difference between start-ups in India and US and why are we copying
US start-ups more.
Another one was, ‘Era of Donald Trump’
3. Company based preparation –
(a) Current company affairs, and developments.
(b) Be thorough with what the company is doing in the market (new initiatives, existing
initiatives/programs)
(c) Knowledge of competitors: Uber’s shortfalls, what competitors are doing better.
(to be backed up by data/concrete examples)
(d) Conflicts with driver partners: Uber’s operations & business model from driver’s
perspectives. How Uber is helping create value for driver partners.
4. Interviewer expectations – They are looking at how you structure the problem
and go about solving it. In Uber the problems are abstract, so they want to see how you quantify something
abstract.
5. Internship/ Placement
a. Nature of work
i. Workload (hours) - Normal. No one would force you to work more. It is more
self-driven
ii. Nature of Projects - Candidates are usually given projects on which the team is currently
working. So not some boring market analysis stuff. Real-time, important
projects were given
iii. Team Size and Composition - Varies
b. Role & Growth Opportunities - Uber in a way is a data company. You have a lot of data.
Analysing it and getting insights from it and implementing them is the typical job profile.
SQL knowledge would be good to pull data. Other folks in internship had to learn SQL in
their initial days. Growth opportunities would be in other start-ups in terms of
operations and also analyst roles
c. Exit options - Another analytics company (majorly) and start-ups in operational roles

Sample Question

Question Why could a driver cancel a trip? How would you solve it?
Interviewee The driver may cancel the trip because the customer is too late. To solve this
issue, I would suggest incentivizing the driver.
(The interviewee then went on to approximate the amount that might be paid to
the driver as incentive)
Question Interview was based on problems faced by Uber and how we can tackle them.
Interviewers generally look for specific solutions and not generic ones. E.g.
rather than saying improve the UI for booking a ride, one can pinpoint an exact issue

like location mismatch and tell how it can be solved procedurally. My interviews
revolved around:
• Round 1:
o GTM for launching Intercity from Bangalore to Mysore
• Round 2:
o Pros and Cons for Drivers in Uber
o How can Uber perform better than its competitors?
Question Based on what Uber should do to enhance its operations or enter a new
market. More questions on strategy formation. Uber’s fare estimation. How can
we utilise data analytics to enhance Uber’s operations?

2nd round – Major challenges faced by Uber and ways to circumvent them. Also, two
strategies that should be followed by Uber to increase its market share. How
does it differentiate itself from Ola?

Contributors
1. Nishanth Buddhavarapu
2. Amrit Satsangi
3. Kandalam Sai Madhuri
4. Navjot Kaur
5. Vikas Attri

Gartner

Recommendations based on interviews


1. Candidates preferred - The people selected from IIMB had 2-3 years of prior work experience.
Resume should have spikes.
2. Process – Resume, written test (quantitative aptitude test), interview
3. Company based preparation – Not much was required, but it is
always good to use company-specific jargon that you learn during buddy talks. It gets you
brownie points and helps establish a connection with the interviewer; it worked very well for me.
Learn about the industry. Clients include Amazon, British Petroleum, Samsung, and other big
companies.
4. Interviewer expectations – Gartner doesn’t expect you to have a lot of the technical know-how.
Their major focus is if you have a logical mindset and a structured way of thinking. Build a
product management aptitude during placement preparation and also learn to structure your
thoughts as is required in a consulting interview. They were looking for big, audacious ideas;
anything less than that wasn’t taken up. How you would go about executing those
ideas was also discussed.
5. Placement
a. Nature of work
i. Workload (hours): Fixed working hours, 11 am – 8 pm
ii. Nature of Projects: Ideation and research projects. Example – increasing
engagement for the low engaged clients through Google Chrome extensions and
emails. This involves analysing the usage statistics of these channels
b. Team Size and Composition: 3 people team directly reporting to the director
c. Role & Growth Opportunities: The new recruits are put into a leadership development
program of 3 years. The program provides excellent opportunities for moving to the
upper management of the company.
d. Exit options: Consulting and Product Management

Sample Question

Question Tell us the name of 3 startups you have heard about/ you know about.
Interviewee a. Udaan
b. WeWork
c. Neuralink
They chose WeWork and asked me to come up with a new product that WeWork
could potentially offer. Firstly, I was asked to define the criteria I would use to
determine the type of product and the demand for the product. I used the
CIRCLES approach and came up with 4-5 product ideas.
Then I was asked to select one of the presented ideas. I did a pros and cons
analysis of all the options and selected 2 options based on ease of
implementation and expected revenues.
Question Second part of interview was a guesstimate to estimate the number of people
flying in and out of Bangalore per day.

Interviewee I divided the problem into weekdays and weekends. Additionally, I also divided
the air traffic into domestic and international. Assuming an average flight
occupancy of 70% and number of domestic and international flights in a day, I
calculated the number.

Contributors
1. Shubhankar Sarkar
2. Siddharth Gupta
3. Rishabh Garg

Goldman Sachs Strats

Recommendations based on interviews


1. Candidates preferred - The people selected from IIMB seem to have a quantitative or strong
mathematical background. There did not seem to be any particular preference for work
experience.
2. Process – Resume, interview
3. Company based preparation – Learn a bit about the industry. Based upon the experience, the
candidate may be asked basic finance questions. A few examples –
a. What is EBITDA?
b. Difference between Amortisation and Depreciation
c. Does Depreciation have any impact on the actual cash flow?
d. How does a bond’s price change as it approaches maturity? Draw the diagram.
e. What is the difference between an expense and capital expenditure?
f. What is goodwill?
g. Which bonds get more affected with interest rate changes? Long-term or short term?
Why?
4. Interviewer expectations – The interviewer expects you to have good analytical skills and may
ask you a number of quantitative questions. They would test your logical and structured thinking
ability and your communication skills. The questions were basically focused on the kind of
experience the candidate had. Candidates with experience in computer science were asked about
data structures (arrays, stacks, queues) and easy level coding questions. Candidates with a
finance background, however, were asked basic finance questions. Apart from this, they would
ask basic HR and resume-based questions including –
a. What do you know about work culture at GS?
b. How eager are you to join GS?
c. What do you understand about the role that is being offered? Are your skills in alignment
with the role? If there are any shortcomings, how do you plan to make up for it?
A few examples of quantitative questions asked have been illustrated in sample questions below.
5. Resources for preparations
a. Puzzles
b. Geeksforgeeks

Sample Questions

Question 100 prisoners in jail are standing in a queue facing in one direction. Each prisoner
is wearing a hat of color either black or red. A prisoner can see hats of all
prisoners in front of him in the queue, but cannot see his hat and hats of
prisoners standing behind him.
The jailer is going to ask the color of each prisoner’s hat starting from the last
prisoner in the queue. If a prisoner tells the correct color, then he is saved, otherwise
executed. How many prisoners can be saved at most if they are allowed to
discuss a strategy before the jailer starts asking the colors of their hats?

Interviewee Solution given at https://www.geeksforgeeks.org/puzzle-13-100-prisoners-with-redblack-hats/
Question Situation: If Goldman Sachs at US decides to buy a Nigerian Government bond,
what are the factors that it has to keep in mind before making the deal?
Interviewee Answered
Question Find the number of paths from the top left to the bottom right of an m*n grid. The
interviewer wanted a generalized formula.
Interviewee Explained the logic of how to find paths
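One common reading of this question, assumed here because the source does not state the allowed moves, is that you step between the cells of an m*n grid moving only right or down. Under that assumption the count is C(m+n-2, m-1), which the Python sketch below cross-checks against dynamic programming; the function names are illustrative.

```python
from math import comb


def count_paths_formula(m, n):
    """Paths across an m x n grid of cells using only right and down moves."""
    # m-1 down moves and n-1 right moves, arranged in any order.
    return comb(m + n - 2, m - 1)


def count_paths_dp(m, n):
    """Same count by dynamic programming, row by row."""
    row = [1] * n                      # one way to reach any cell in the top row
    for _ in range(1, m):
        for j in range(1, n):
            row[j] += row[j - 1]       # paths from above plus paths from the left
    return row[-1]


if __name__ == "__main__":
    print(count_paths_formula(3, 7), count_paths_dp(3, 7))
```

If the interviewer counts grid lines instead of cells, the same reasoning gives C(m+n, m).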
Question Estimate the value of pi using a random number generator.
Interviewee Explained the answer using Monte Carlo simulation
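A minimal sketch of the Monte Carlo idea in Python: draw points uniformly in the unit square and count the fraction that falls inside the quarter circle of radius 1, which has area pi/4. The function name, sample size and seed are arbitrary choices.

```python
import random


def estimate_pi(samples, seed=0):
    """Estimate pi from the share of random points inside the quarter circle."""
    rng = random.Random(seed)
    inside = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                 for _ in range(samples))
    return 4 * inside / samples


if __name__ == "__main__":
    print(estimate_pi(1_000_000))
```

The error shrinks roughly as one over the square root of the number of samples, so each extra digit of accuracy costs about 100 times more draws.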
Question Given two players and also an infinitely large sheet of paper. There is a
probability distribution given by f(x). One player uses the distribution once, the
other one uses it twice. Let’s call player one as A and other player as B.

The game proceeds as follows


A invokes the distribution and gets a number X. He/She then cuts a square of side
X from that sheet of paper.
Similarly B gets numbers Y and Z. He/She then cuts a rectangle having sides Y and
Z from the sheet of paper.

If both of them repeat the process n times, which player has the larger
expected average area?
Interviewee Player A.
Let the random variables representing A’s n trials be X1, X2, etc.
Let the random variables representing B’s trials be (Y1, Z1), (Y2, Z2), etc. (B uses
the distribution twice)

The average area for A is given by

Area = (X21+X22+…X2n)/n

E(Area) = E(X2) ……. (All Xi’s follow same distribution)

Similarly, Average area of B is given by

Area = (Y1Z1 + Y2Z2 + … + YnZn)/n

E(Area) = E(YZ) = E(Y)E(Z)


(Call out the assumption that Y and Z are independent; the interviewer was fine with
it.)

Now Y and Z are drawn from the same distribution as X, so

E(X) = E(Y) = E(Z), and therefore E(Y)E(Z) = [E(X)]^2

Hence, the difference in the expected average area of A and B becomes

E(X^2) - [E(X)]^2 = Var(X) ≥ 0, with strict inequality unless X is constant. So A has the larger expected average area.
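The result can be checked numerically. The sketch below assumes, purely for illustration, that f(x) is the uniform distribution on [0, 1); the source does not specify the distribution. In that case E(X^2) = 1/3 and E(Y)E(Z) = 1/4, so the gap should be close to Var(X) = 1/12. The function name and the `draw` parameter are hypothetical.

```python
import random


def average_areas(num_rounds, draw, seed=0):
    """Simulate the game and return (A's average area, B's average area).

    `draw` takes a random.Random instance and returns one sample from f(x).
    """
    rng = random.Random(seed)
    total_a = 0.0
    total_b = 0.0
    for _ in range(num_rounds):
        x = draw(rng)                # A draws once and cuts a square
        y, z = draw(rng), draw(rng)  # B draws twice, independently
        total_a += x * x
        total_b += y * z
    return total_a / num_rounds, total_b / num_rounds


# Illustrative choice of distribution: uniform on [0, 1)
avg_a, avg_b = average_areas(100_000, lambda rng: rng.random())
print(avg_a, avg_b, avg_a - avg_b)  # roughly 1/3, 1/4 and 1/12
```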
Question 100 people are standing in a row, all facing in the same direction wearing either a
white or a black cap. Every person can see the color of the caps of only those
standing in front of him.
You are the last (100th person), the leader. Devise a strategy to save as many
people as possible. (Ans: Designed a probabilistic approach with 50 saved with
100% probability and the remaining saved with 50% probability each)
Interviewee Answer given above in the puzzles section

Contributors

1. Abhishek Joshi GS
2. Adithya PSS
3. Kritika Saini
4. Saumya Dixit
5. Madhur Gupta
6. Nikitha Hegde

Industry Applications of Data Science

1. Finance
With the introduction of quantitative analytical models, large financial institutions and hedge funds
have shifted from manual trading to trading backed by technology. These models can be used to
analyse big data to

• Make accurate entry/exit trade decisions
• Mitigate risk
• Gauge market sentiment using opinion mining
For example, the Securities and Exchange Commission (SEC) is using big data to monitor financial
market activity. It uses network analytics and natural language processing to
catch illegal trading activity in the financial markets.
2. Banking
The amount of data in the banking sector is growing rapidly. Proper study and analysis of
this data can help detect illegal activities and support other business objectives, such as

• Misuse of credit/debit cards


• Venture credit hazard treatment
• Business clarity
• Money laundering
• Reduce NPA and increase profitability
For example, banks could use predictive analytics to calculate makeshift “credit scores” for people
that don’t have a credit history based on behavioural traits such as social media posts and spending
habits. They could use these scores to determine whether or not to lend to someone.
3. Communications, Media and Entertainment
• Create content for different target audiences
• Recommend content on demand
• Measure content performance
• Effective targeting of the advertisements
• Optimized or on-demand scheduling of media streams in digital media distribution
platforms
For example, Spotify, an on-demand music service, uses Hadoop big data analytics to collect data
from its millions of users worldwide and then uses the analysed data to give informed music
recommendations to individual users.
4. Retail
Retail data, derived from customer loyalty cards, POS scanners, RFID, etc., is now widely used
to improve the overall customer experience. Analytics in retail can be used for

• Optimized staffing through data from shopping patterns, local events, and so on
• Reduced fraud
• Timely analysis of inventory
Demand forecasting is another application of big data. For example, retailers like Walmart and
Walgreens regularly analyse changes in weather to see any patterns in product demand.
5. Transportation
Since the rise of big data, it has been used in various ways to make transportation more efficient
and easy. Following are some of the areas where big data contributes to transportation.

• Route planning - Big data can be used to understand and estimate users’ needs on different
routes and on multiple modes of transportation and then utilize route planning to reduce
their wait time
• Congestion management and traffic control - Using big data, real-time estimation of
congestion and traffic patterns is now possible. For example, people use Google Maps
to find the least congested routes
• Safety level of traffic - Using the real-time processing of big data and predictive analysis to
identify accident-prone areas can help reduce accidents and increase the safety level of
traffic

6. Telecom
Customer retention/improving customer loyalty
There is an increased focus on improving user experience. To do so, analysts are creating
sophisticated 360-degree profiles assembled from a number of sources, such as voice, SMS and data
usage patterns, video choices, customer care history, social media, consumer demographics, service
usage, etc. This allows telecom companies to offer personalized services or products at every step
of the purchasing process. Businesses can tailor messages to appear on the right channels (e.g.,
mobile, web, call centre, in-store), in the right areas and in the right words and images.
Network optimization
Telecommunication companies depend on the smooth functioning of their customer engagement
processes and internal channels. Network management and
optimization make it possible to pinpoint weak points in operations and identify the root
causes of complications. Analysing historical data to predict possible future problems
or, conversely, favourable scenarios is a significant benefit for telecom providers.
7. Health care
A few examples of applying big data and predictive analytics in healthcare are

• Big data can be used to predict negative health events that seniors could experience from
home-care. At AlayaCare, the analysis reduced hospitalizations and ER visits by 73 percent,
and 64 percent amongst chronically ill patients

• Historical big data from healthcare providers can be used to identify and analyse certain risk
factors in patients. This is useful for earlier detection of diseases, allowing doctors and their
patients to take action sooner
• Big data can identify disease trends as a whole based on demographics, geographies, socio-
economics, and other factors

8. Manufacturing
• Supply chain management and big data go hand-in-hand, which is why manufacturing is one
of the top industries to benefit from the use of big data.
• Monitoring the performance of production sites is more efficient with big data analytics. The
use of analytics is also extremely useful for quality control, especially in large-scale
manufacturing projects.
• Big data analytics plays a key role in tracking and managing overhead and logistics across
multiple sites. For example, being able to accurately measure the cost of shop floor tasks
can help reduce labour costs.
• Then there’s predictive analytics software, which uses big data from sensors attached to
manufacturing equipment. Early detection of equipment malfunctions can save sites from
costly repairs capable of paralyzing production.
