Big Data & Business Analytics 2021 Q&A
Third Semester
Faculty of Management Science
Master of Business Administration
Core Courses - MB010301 - BIG DATA & BUSINESS ANALYTICS
2019 Admission Onwards
97E1FF23
Answer key
Part A
Answer any five questions. Each question carries 2 marks.
1. What is nominal data?
Ans: Data used only for naming, labelling or categorizing, with no inherent order among the categories. E.g.: gender, marital status.
Ans: A collection of predictive analytics techniques that use tree-like graphs to predict the value of a target variable from the values of explanatory variables.
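To make the definition concrete, here is a minimal sketch of the step a decision tree repeats at every node: searching for the split of an explanatory variable that best separates the target classes. It assumes a single numeric explanatory variable and uses Gini impurity as the split criterion (one common choice; other tree algorithms use entropy or variance reduction). The data and the function names `gini` and `best_split` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(x, y):
    """Return (threshold, weighted_gini) for the best rule 'x <= threshold'."""
    best = None
    # Every observed value except the largest is a candidate threshold,
    # so both child nodes are always non-empty.
    for t in sorted(set(x))[:-1]:
        left = [label for value, label in zip(x, y) if value <= t]
        right = [label for value, label in zip(x, y) if value > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: monthly income (in thousands) and whether the customer defaulted.
income = [20, 25, 30, 45, 50, 60]
default = ["yes", "yes", "yes", "no", "no", "no"]

threshold, impurity = best_split(income, default)
print(threshold, impurity)  # 30 0.0 -> rule: income <= 30 predicts "yes"
```

A full tree applies the same search recursively to each child node until the nodes are pure or a stopping rule is met.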
3. What is a symmetric distribution?
Ans: A distribution is symmetric when the data on one side of the centre (mean/median) is a mirror image of the data on the other side. For a symmetric distribution the mean and median fall at the same point, and if the distribution is unimodal the mode falls there too.
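A small numeric check of the definition: for a sample that mirrors itself around its centre, the mean, median and mode coincide. The sketch also computes the moment coefficient of skewness (m3 / m2^1.5), which is zero for a symmetric sample; skewness is an addition here, not part of the original answer, and the sample is hypothetical.

```python
from statistics import mean, median, mode


def moment_skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    m = mean(data)
    n = len(data)
    m2 = sum((v - m) ** 2 for v in data) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in data) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical sample that is a mirror image around 4.
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6]
print(mean(sample) == median(sample) == mode(sample))  # True
print(moment_skewness(sample))                         # 0.0
```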
Ans: Scatter plots are used to visually represent the relationship between two variables; each data point is shown as a dot.
5. Differentiate the tests to be conducted in multiple linear regression modelling to check the statistical significance of individual variables and overall model validity at a given significance level.
Ans: The t-test is used to check the statistical significance of the relationship between the response variable and each individual explanatory variable (null hypothesis: that variable's coefficient is zero).
The F-test is used to check overall model validity (null hypothesis: all slope coefficients are zero).
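To show where the two statistics come from, here is a minimal from-scratch sketch of ordinary least squares with an intercept. It assumes the explanatory variables are not perfectly collinear, and the helper names `invert` and `regression_tests` and the data are hypothetical. Each t statistic (coefficient divided by its standard error) is compared with the t distribution with n - p - 1 degrees of freedom, and the F statistic with the F(p, n - p - 1) distribution, at the chosen significance level; those critical values or p-values come from statistical tables or a statistics package and are not computed here.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regression_tests(X, y):
    """OLS fit of y on the columns of X (intercept added).

    Returns (coefficients, t statistics, F statistic)."""
    n, p = len(y), len(X[0])
    design = [[1.0] + [float(v) for v in row] for row in X]  # intercept column first
    k = p + 1
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # regression sum of squares
    mse = sse / (n - p - 1)
    # t-test (H0: beta_j = 0): coefficient divided by its standard error.
    t_stats = [beta[j] / (mse * xtx_inv[j][j]) ** 0.5 for j in range(k)]
    # F-test (H0: all slopes = 0): explained variance per predictor / residual variance.
    f_stat = (ssr / p) / mse
    return beta, t_stats, f_stat


# Hypothetical data with two explanatory variables.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [10, 7, 18, 19, 30, 27, 38, 39]
beta, t_stats, f_stat = regression_tests(list(zip(x1, x2)), y)
print([round(b, 6) for b in beta])  # [1.0, 2.0, 3.0]
print(t_stats, f_stat)
```

With a single explanatory variable the two tests coincide (F equals the square of the slope's t statistic); with several variables they answer different questions, which is why both are reported.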
6. What are the advantages of hierarchical clustering?
Ans:
1. Easy to understand and implement
2. No need to pre-specify the number of clusters
3. Easy to decide on the clusters using the dendrogram
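Advantages 2 and 3 can be seen in a minimal sketch of agglomerative (bottom-up) clustering: the full merge history is built once, and the number of clusters is chosen afterwards by cutting it at a given level. The sketch assumes one-dimensional points and single linkage (distance between the closest members); other linkage rules (complete, average, Ward) are common too. The function name `agglomerate` and the data are hypothetical.

```python
def agglomerate(points):
    """Single-linkage agglomerative clustering on 1-D points.

    Returns the merge history as (merge_height, partition) pairs, starting
    with every point in its own cluster and ending with one cluster."""
    clusters = [[p] for p in points]
    history = [(0.0, [list(c) for c in clusters])]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append((d, [list(c) for c in clusters]))
    return history


points = [1.0, 1.5, 5.0, 5.2, 9.0]  # hypothetical 1-D data
history = agglomerate(points)

# The merge heights are what a dendrogram draws on its vertical axis.
for height, partition in history:
    print(round(height, 2), partition)

# The number of clusters is chosen after building the tree: k clusters = history[n - k].
k = 3
print(history[len(points) - k][1])
```

A large jump between successive merge heights (here from 0.5 to 3.5) is the usual visual cue on a dendrogram for where to cut.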
9. What is the SPSS package? State the advantages and limitations of using the SPSS package.
Ans: SPSS (originally the Statistical Package for the Social Sciences) is a statistical software package used for data entry, coding and analysis. (2 marks)
Advantages: easy to use; results are easy to interpret; data entry is spreadsheet-like, as in Excel; many tests under one roof; add-ins are available for advanced analysis; good GUI.
Limitations: expensive, proprietary software; charting is not very convenient; not easy to customize. (4 marks)
11. What will be the impact on the model of the presence of multicollinearity?
Ans: Multicollinearity refers to predictors that are correlated with other predictors in the model. It inflates the standard errors of the estimated coefficients, which reduces their precision and weakens the statistical power of the regression model. The coefficients become very sensitive to small changes in the data or in the model, and the p-values can no longer be trusted to identify which independent variables are statistically significant.
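A common way to measure this effect is the variance inflation factor (VIF), which the answer does not mention: for each predictor, VIF = 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the other predictors, and the square root of the VIF is the factor by which the coefficient's standard error is inflated. The minimal sketch below assumes only two predictors, in which case that R^2 is simply the squared Pearson correlation between them; the function names and data are hypothetical. A commonly quoted rule of thumb treats a VIF above 5 or 10 as a warning sign.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 is the squared correlation."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical, strongly correlated predictors.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
vif = vif_two_predictors(x1, x2)
print(round(vif, 4))         # 5.5125
print(round(vif ** 0.5, 3))  # 2.348 (standard errors roughly 2.3 times larger)
```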
12. Explain the significance of the Receiver Operating Characteristic (ROC) curve.
Ans: A receiver operating characteristic curve, or ROC curve, is a graphical plot used to understand the overall worth of a logistic regression model across all classification cut-off values. It plots sensitivity on the vertical axis against 1 - specificity on the horizontal axis. The closer the curve lies to the top-left corner, the better the model separates the two classes.
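The sketch below traces the curve by lowering the cut-off through every distinct predicted score and recording (1 - specificity, sensitivity) at each step; the area under the curve (AUC) is then approximated with the trapezoidal rule. It assumes labels coded 1 for the positive class and 0 otherwise, and scores such as a logistic regression's predicted probabilities; the function names and data are hypothetical.

```python
def roc_points(labels, scores):
    """(1 - specificity, sensitivity) pairs for every cut-off, from strictest to loosest."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # cut-off above every score: nothing predicted positive
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cut and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cut and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical model output: actual class and predicted probability of class 1.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pts = roc_points(labels, scores)
print(pts)
print(round(auc(pts), 4))  # 0.8889
```

An AUC of 0.5 corresponds to the diagonal (no better than random), and 1.0 to perfect separation.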
14. Explain the steps used to formulate a problem as a linear programming problem.
Ans:
1. Identify the decision variables
2. Identify the objective function
3. Identify the constraints
4. Identify the implicit constraints (e.g. non-negativity of the decision variables)
5. Solve the problem
6. Perform sensitivity analysis
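The six steps can be walked through on a small, hypothetical product-mix problem: maximize Z = 3x + 5y subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0. The sketch solves it by evaluating the objective at every corner point of the feasible region, which only works for two decision variables and a bounded, non-empty feasible region; real problems are solved with the simplex method or a solver such as Excel Solver. The names `OBJECTIVE`, `CONSTRAINTS` and `solve_2var_lp` are hypothetical.

```python
from itertools import combinations

# Steps 1-4: decision variables x, y (units of two products); each constraint
# is written as (a, b, c), meaning a*x + b*y <= c.
OBJECTIVE = (3, 5)  # maximize Z = 3x + 5y
CONSTRAINTS = [
    (1, 0, 4),   # resource 1: x <= 4
    (0, 2, 12),  # resource 2: 2y <= 12
    (3, 2, 18),  # resource 3: 3x + 2y <= 18
    (-1, 0, 0),  # implicit constraint: x >= 0
    (0, -1, 0),  # implicit constraint: y >= 0
]


def solve_2var_lp(objective, constraints):
    """Step 5: evaluate Z at every feasible corner point and keep the best.

    Valid only for two variables and a bounded, non-empty feasible region."""
    cx, cy = objective
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines do not meet in a single corner
        # Intersection of the two boundary lines (Cramer's rule).
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + 1e-9 for a, b, c in constraints):
            z = cx * x + cy * y
            if best is None or z > best[0]:
                best = (z, x, y)
    return best


z, x, y = solve_2var_lp(OBJECTIVE, CONSTRAINTS)
print(z, x, y)  # 36.0 2.0 6.0

# Step 6 (sensitivity analysis): add one unit of resource 3 and re-solve.
relaxed = [(3, 2, 19) if con == (3, 2, 18) else con for con in CONSTRAINTS]
z_new, _, _ = solve_2var_lp(OBJECTIVE, relaxed)
print(round(z_new - z, 6))  # 1.0 -> shadow price of resource 3
```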
(5 × 6 = 30 Marks)
Part C
Answer any two questions. Each question carries 10 marks.
Question number 17 is compulsory.
15. Explain the reason for calculating the standardized regression coefficient and the method to calculate it, with an example.
Ans: Standardized coefficients are needed to compare the impact of different explanatory variables that have different units of measurement; hence the variables have to be normalized (standardized) first.
When a regression model is built on the standardized dependent variable and standardized independent variables, the regression coefficients are known as standardized regression coefficients. Equivalently, a standardized coefficient can be obtained from the unstandardized coefficient b_i as b_i × (s_xi / s_y), where s_xi and s_y are the standard deviations of the explanatory variable and the response variable.
For a one standard deviation change in an explanatory variable, the standardized regression coefficient captures the number of standard deviations by which the response variable will change.
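A worked example of both methods: regress the z-scores of the response on the z-scores of the explanatory variable, or rescale the ordinary slope by the ratio of standard deviations. The sketch uses simple regression for brevity; the data (advertising spend in thousands and units sold) and the helper names `ols_slope` and `standardize` are hypothetical.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardize(values):
    """Convert to z-scores: (value - mean) / standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data: advertising spend (in thousands) and units sold.
spend = [10, 12, 15, 18, 20, 25]
sales = [110, 118, 135, 150, 152, 180]

b = ols_slope(spend, sales)                                   # units sold per thousand spent
beta_std = ols_slope(standardize(spend), standardize(sales))  # method 1: regress z-scores
beta_conv = b * stdev(spend) / stdev(sales)                   # method 2: rescale b
print(round(b, 3), round(beta_std, 4), round(beta_conv, 4))
```

Both methods give the same number. In simple regression it also equals the correlation coefficient; in multiple regression the same rescaling b_j × s_xj / s_y is applied to each coefficient, which is what makes variables measured in different units comparable.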
16. Briefly explain the importance of R and MS Excel in data analytics.
Ans: Excel is easier to learn at first and is frequently cited as the go-to program for reporting, thanks to its speed and efficiency. R is designed to handle larger data sets, to make analyses reproducible, and to create more detailed visualizations. R is open source, while Excel is proprietary.
Compulsory Question
17. Explain the roadmap for analytics capability building.
Ans:
a. Define analytics strategy
   i. Develop a long-term plan
   ii. Identify key functional areas to kick-start the analytical process
   iii. Communicate the analytics strategy across the organization
b. Build talent
   i. Plan a recruitment strategy
   ii. Get the right team
c. Build infrastructure
   i. Hardware and software
   ii. Cloud option for IT infrastructure
d. Identify sources of data and develop a data collection plan
   i. Identify all relevant data
   ii. Automate the data collection process
e. Analytics implementation
   i. Start with simple applications targeting small improvements
   ii. Innovate
   iii. Build an effective communication strategy for analytics output
   iv. Calculate ROI
(2 × 10 = 20 Marks)