Big Data & Business Analytics 2021 Q&A

Third Semester
Faculty of Management Science
Master of Business Administration
2019 Admission Onwards

Answer key

Part A

1. What is nominal data?

Ans: Data used only for naming, labelling or categorizing, with no inherent order. E.g.: gender, marital status.

2. What are decision trees?

Ans: A collection of predictive analytics techniques that use tree-like graphs for predicting the
value of a target variable based on the values of explanatory variables.

3. What is a symmetric distribution?

Ans: A distribution is symmetric when the data on either side of the centre (mean/median) are
mirror images of each other. For a symmetric distribution with a single peak, the mean, mode
and median all fall at the same point.
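
The property can be checked on a small sample. The sketch below uses hypothetical values chosen to be symmetric about 5; it is an illustration, not part of the expected answer.

```python
from statistics import mean, median, mode

# Hypothetical sample, chosen to be symmetric about 5.
data = [3, 4, 4, 5, 5, 5, 6, 6, 7]

centre = mean(data)

# In a symmetric sample every value has a mirror image at the same
# distance on the other side of the centre.
mirrored = sorted(2 * centre - x for x in data)
is_symmetric = mirrored == sorted(data)

print("mean:", mean(data), "median:", median(data), "mode:", mode(data))
print("symmetric:", is_symmetric)
```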

4. What are the uses of a scatter plot?

Ans: Scatter plots are used to visually represent the relationship between variables; each dot
represents one data point.

5. Differentiate the tests to be conducted in multiple linear regression modelling to check the
statistical significance of an individual variable and overall model validation at a given

Ans: The t-test is used to check the statistical significance of the relationship between the
response variable and an individual explanatory variable.
The F-test is used to check the overall validity of the model.
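
A minimal sketch of how the two statistics are computed, assuming hypothetical data and, for brevity, a model with a single explanatory variable. In a multiple regression there is one t-statistic per coefficient and one F-statistic for the whole model. Comparing the statistics with critical values at the chosen significance level is left out.

```python
from math import sqrt

# Hypothetical data: advertising spend (x) and sales (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained variation

k = 1                    # number of explanatory variables
mse = sse / (n - k - 1)  # mean squared error
msr = ssr / k            # mean square due to regression

t_stat = slope / sqrt(mse / sxx)  # t-test: is this coefficient different from zero?
f_stat = msr / mse                # F-test: does the model as a whole explain y?

print("t =", round(t_stat, 2), "F =", round(f_stat, 2))
```

With only one explanatory variable the two tests coincide (F equals t squared); they give different information once the model has several explanatory variables.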

6. What are the advantages of hierarchical clustering?
1. Easy to understand and implement
2. No need to pre-specify the number of clusters
3. Clusters are easy to decide from the dendrogram
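
The points above can be seen in a small sketch of agglomerative clustering. It assumes hypothetical one-dimensional data and single linkage (one of several possible linkage rules). No number of clusters is specified in advance, and the recorded merge history is the information a dendrogram displays.

```python
# Hypothetical 1-D observations (e.g. customer spend in thousands).
points = [1.0, 1.5, 5.0, 5.4, 9.0]


def single_linkage(a, b):
    """Distance between two clusters = distance between their closest members."""
    return min(abs(p - q) for p in a for q in b)


clusters = [[p] for p in points]  # start: every point is its own cluster
merges = []                       # merge history, as drawn by a dendrogram

while len(clusters) > 1:
    # Find the closest pair of clusters.
    pairs = [
        (single_linkage(clusters[i], clusters[j]), i, j)
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    ]
    dist, i, j = min(pairs)
    merges.append((clusters[i], clusters[j], dist))
    merged = clusters[i] + clusters[j]
    clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]

for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 2))
```

Cutting the merge history at a chosen distance gives the clusters; for example, stopping before the two large-distance merges leaves three groups.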

7. Calculation of advertisement effectiveness will be easier with the help of analytics. Justify
your answer with analytical solution.
Ans: Some points that can be considered are:
1. Demographics of customers who view advertisement
2. Sales forecast using ads
3. Understanding conversion rate
4. Social media analytics
(5 × 2 = 10 Marks)
Part B
Answer any five questions. Each question carries 6 marks.

8. What is data science? Explain the data science life cycle.

Ans:
a. Data science: an interdisciplinary field that uses scientific methods, processes, algorithms
and systems to extract knowledge and insights from structured and unstructured data. (2 marks)
b. Life cycle (4 marks)
Stage 1: Identify problems/improvement opportunities
Stage 2: Identify sources of data required for the problem identified in Stage 1
Stage 3: Preprocess the data for missing/incorrect data and generate new variables if
Stage 4: Split the data into training and validation data
Stage 5: Develop the models and select the models for deployment based on performance
Stage 6: Deploy solution/decision
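
Stage 4 can be sketched as follows. The 80/20 ratio, the fixed seed and the record ids are assumptions made for illustration; the answer key does not prescribe a ratio.

```python
import random

# Hypothetical pre-processed records (output of Stage 3); here just row ids.
records = list(range(100))

random.seed(42)        # fixed seed so the split is reproducible
shuffled = records[:]  # copy, so the original order is kept
random.shuffle(shuffled)

split_at = int(0.8 * len(shuffled))  # assumed 80/20 split
training = shuffled[:split_at]       # used to develop the models (Stage 5)
validation = shuffled[split_at:]     # used to compare model performance

print(len(training), "training rows,", len(validation), "validation rows")
```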

9. What is the SPSS package? State the advantages and limitations of using the SPSS package.
Ans: SPSS (Statistical Package for the Social Sciences) is a statistical package used for data
entry, coding and analysis. (2 marks)
Advantages: easy to use, results are easy to interpret, data entry is similar to Excel, many
tests under one roof, add-ins for advanced analysis are available, good GUI.
Limitations: expensive and proprietary software, charts are not very convenient to work with,
not easy to customize. (4 marks)

10. Justify the need for a measure of central tendency. State the requisites for an ideal measure
of central tendency.
Ans: Measures of central tendency provide a summary measure that attempts to
describe a whole set of data with a single value that represents the middle or
centre of its distribution. They help in understanding how the data points cluster
around a central point. The measures are mean, median and mode. (3 marks)
Requisites: (3 marks)
a. Should be clearly defined
b. Should be easy to understand and calculate
c. Should take into consideration all observations
d. Should be suitable for further analysis
e. Should be least affected by fluctuations in sampling
f. Extreme observations/outliers must not have a great effect.
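
Requisite (f) can be illustrated with a short comparison of the mean and the median. The salary figures are hypothetical.

```python
from statistics import mean, median

# Hypothetical monthly salaries (in thousands).
salaries = [30, 32, 35, 36, 40]
with_outlier = salaries + [400]  # one extreme observation added

print("without outlier: mean", mean(salaries), "median", median(salaries))
print("with outlier:    mean", mean(with_outlier), "median", median(with_outlier))
```

A single extreme value moves the mean a long way while the median barely changes, which is why no one measure satisfies every requisite equally well.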

11. What will be the impact on a model due to the presence of multicollinearity?
Ans: Multicollinearity refers to predictors that are correlated with other predictors in the
model. It reduces the precision of the estimated coefficients, which weakens the statistical
power of the regression model. The coefficients become very sensitive to small changes in
the model, and the p-values can no longer be trusted to identify the independent variables
that are statistically significant.
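
The loss of precision is commonly quantified with the variance inflation factor (VIF), a standard diagnostic that this answer key does not itself mention. A minimal sketch for the two-predictor case, using hypothetical data:

```python
from math import sqrt

# Hypothetical predictors: x2 is almost a linear copy of x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


r = pearson_r(x1, x2)

# With exactly two predictors, the R-squared from regressing one on the
# other equals r squared, so VIF = 1 / (1 - r^2).
vif = 1 / (1 - r ** 2)

print("correlation:", round(r, 4), "VIF:", round(vif, 1))
```

A VIF far above 10 is often taken, as a rule of thumb, to signal that the variance of the coefficient estimates is seriously inflated.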

12. Explain the significance of the Receiver Operating Characteristic (ROC) curve.
Ans: A receiver operating characteristic curve, or ROC curve, is a graphical plot used to
understand the overall worth of a logistic regression model. It plots sensitivity on the
vertical axis against 1 - specificity on the horizontal axis, one point per classification cut-off.
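
A minimal sketch of how the points of the curve are obtained, assuming hypothetical class labels and predicted probabilities. The area under the curve (AUC), computed at the end, is a common single-number summary and is an addition to the answer above.

```python
# Hypothetical actual classes (1 = positive) and predicted probabilities.
actual = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]


def roc_points(actual, scores):
    """Return (1 - specificity, sensitivity) pairs, one per cut-off."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        predicted = [1 if s >= cutoff else 0 for s in scores]
        tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
        fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
        points.append((fp / negatives, tp / positives))
    return points


points = roc_points(actual, scores)

# Area under the curve by the trapezoidal rule.
auc = sum(
    (x2 - x1) * (y1 + y2) / 2
    for (x1, y1), (x2, y2) in zip(points, points[1:])
)

for fpr, tpr in points:
    print("1 - specificity:", fpr, "sensitivity:", tpr)
print("AUC:", auc)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two classes.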

13. Explain unsupervised predictive analytics.

Ans: When the data has only predictor variables and not the outcome variable, unsupervised
learning algorithms are used.
• Techniques that find patterns in unlabeled data, or data that lacks a defined response
• Help the analyst identify data-driven patterns that may warrant further investigation
• E.g.: a bit-mapped photograph, a series of comments from social media, etc.
Method: clustering

14. Explain the steps used for formulating a problem as a linear programming problem.
Ans:
1. Identify the decision variables
2. Identify the objective function
3. Identify the constraints
4. Identify the implicit constraints (e.g. non-negativity)
5. Solve the problem
6. Perform sensitivity analysis
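
Steps 1 to 5 can be sketched on a small two-variable problem. The coefficients are hypothetical, and the corner-point search is used only because it is short; real problems are normally solved with the simplex method or a solver. Sensitivity analysis (step 6) is not shown.

```python
from itertools import combinations

# Step 1: decision variables x and y (units of two hypothetical products).
# Step 2: objective function, maximise 3x + 5y.
objective = (3.0, 5.0)

# Steps 3 and 4: constraints written as a*x + b*y <= c.
# The last two rows are the implicit constraints x >= 0 and y >= 0.
constraints = [
    (1.0, 0.0, 4.0),
    (0.0, 2.0, 12.0),
    (3.0, 2.0, 18.0),
    (-1.0, 0.0, 0.0),
    (0.0, -1.0, 0.0),
]


def feasible(x, y, tol=1e-9):
    """True if the point satisfies every constraint."""
    return all(a * x + b * y <= c + tol for a, b, c in constraints)


# Step 5: the optimum of a bounded LP lies at a corner point, so check every
# intersection of two constraint boundaries and keep the best feasible one.
best = None
for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue  # parallel boundaries never intersect
    x = (c1 * b2 - c2 * b1) / det  # Cramer's rule
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y):
        value = objective[0] * x + objective[1] * y
        if best is None or value > best[0]:
            best = (value, x, y)

print("optimal value, x, y:", best)
```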

(5 × 6 = 30 Marks)
Part C
Answer any two questions. Each question carries 10 marks.
Question number 17 is compulsory.
15. Explain the reason behind calculating standardized regression coefficient and method to
calculate the same with an example.
Ans: Standardized coefficients are used to compare the impact of different explanatory
variables that have different units of measurement. Hence normalization has to be done.
When a regression model is built on a standardized dependent variable and standardized
independent variables, the regression coefficients are known as standardized
regression coefficients.
For a one standard deviation change in the explanatory variable, the standardized regression
coefficient captures the number of standard deviations by which the response variable
will change.
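
A worked example, assuming hypothetical data and a single explanatory variable. It computes the standardized coefficient in two equivalent ways: by regressing standardized values, and by rescaling the ordinary coefficient b by the ratio of standard deviations, b × (s_x / s_y).

```python
from statistics import mean, pstdev

# Hypothetical data in different units: ad spend (thousands) and sales (units).
spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [120.0, 150.0, 210.0, 220.0, 300.0]


def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def standardize(values):
    """Subtract the mean and divide by the standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


b = slope(spend, sales)  # sales units per thousand spent
beta_from_z = slope(standardize(spend), standardize(sales))
beta_from_formula = b * pstdev(spend) / pstdev(sales)

print("unstandardized b:", round(b, 3))
print("standardized beta:", round(beta_from_z, 4), round(beta_from_formula, 4))
```

Here a one standard deviation increase in spend goes with an increase of a little under one standard deviation in sales, a statement that no longer depends on the units of either variable.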

16. Briefly explain the importance of R and MS Excel in Data Analytics.
Ans: Excel starts off easier to learn and is frequently cited as the go-to program for
reporting, thanks to its speed and efficiency. R is designed to handle larger data sets, to be
reproducible, and to create more detailed visualizations. R is open source while Excel is
Compulsory Question
17. Explain the roadmap for analytics capability building.
a. Define analytics strategy
   a. Develop long-term plan
   b. Identify key functional areas to kick-start the analytical process
   c. Communicate analytics strategy across the organization
b. Build talent
   a. Plan a recruitment strategy
   b. Get the right team
c. Build infrastructure
   a. Hardware & software
   b. Cloud option for IT infrastructure
d. Identify sources of data and develop data collection plan
   a. Identify all relevant data
   b. Automate data collection process
e. Analytics implementation
   a. Start with simple applications targeting small improvements
   b. Innovate
   c. Build effective communication strategy for analytics output
   d. Calculate ROI

(2 × 10 = 20 Marks)

