Question bank

Artificial intelligence and data science (Anna University)


AD3491 – Fundamentals of Data Science and Analytics

Semester : IV Regulation 2021

Unit 1

2 marks:

1. Define data science.
2. What is big data?
3. What is machine learning?
4. Define data mining.
5. List the characteristics of big data.
6. Mention the categories of data.
7. List some of the application domains of data science.
8. What is structured data? Give some examples.
9. Define unstructured data. Give examples.
10. What is machine-generated data?
11. State the importance of setting the research goal.
12. List the phases involved in the data science process.
13. What is meant by data cleaning?
14. What is a project charter?
15. Identify the important contents of a project charter.
16. List some of the visualization techniques.
17. Name some problems associated with real-world data.
18. Define data warehouse, data mart and data lake.
19. List some of the factors involved in selecting the modeling technique.
20. What is a dummy variable?
21. What do you mean by exploratory data analysis?
22. List the methods for combining data from different tables.
23. Why do we need to build a model?
24. On what factors is the modelling technique selected?
25. Why does data need to be cleaned?

10 mark Questions:

1. Discuss the applications of data science and big data with suitable examples.
2. Illustrate the overview of the data science process.
3. Elaborate on any five application domains of data science.
4. Describe the categories of data for data mining.
5. Discuss the significance of setting the research goal for the data science project.
6. Discuss the categories involved in retrieving relevant data from different sources of
data.
7. Explain the different stages of the data preparation phase.
8. Elucidate the techniques involved in data cleansing.
9. Illustrate the steps involved in combining data from different data sources.
10. Explain the impact of variable reduction on a data science project, highlighting its pros
and cons.
11. Elaborate on the steps involved in model building with suitable diagrams.

UNIT II Descriptive Analytics

2 marks:
1. What is meant by frequency distribution?
2. What is meant by qualitative data? Give examples.
3. What is meant by quantitative data? Give examples.
4. Differentiate between qualitative and quantitative data.


5. Compare discrete and continuous variables.


6. State the difference between nominal and ordinal data.
7. Mention the types of frequency distribution.
8. Define an outlier.
9. What is percentile rank?
10. Provide the equation for percentile rank.
11. State the differences between a histogram and a bar graph.
12. Give the measures of central tendency.
13. Define mode.
14. Define median.
15. Define positively skewed distribution.
16. What is a negatively skewed distribution?
17. Define variance.
18. Define standard deviation.
19. What is a normal curve?
20. Define z score.
21. Give the equation for the z score.
22. How will you convert a z score to the original score?
23. Define correlation.
24. Mention the types of correlation.
25. Define scatterplot.
26. What is a curvilinear relationship?
27. List the key properties of the correlation coefficient r.
28. Define regression.
29. Give the types of regression models.
30. Define restricted range.
31. What is a regression line?
32. What is the interpretation of r²?
33. What is the standard error of estimate?
34. Give the least squares regression equation.
35. State the desirable property of least squares regression.
36. State the multiple regression equation.
37. When does the regression fallacy occur?
38. How is the standard error of estimate calculated?
39. Give the general form of the linear regression model.
40. Provide the difference between correlation and regression.

10 Marks Questions:
1. Explain the different types of frequency distribution with suitable examples
and diagrams.
2. Elaborate on the different ways to describe or represent data using tables with
suitable examples.
3. Explain the various ways in which data can be represented or described using
graphs with suitable examples.
4. Elaborate on the different measures of central tendency and describe the
suitable measures for the different types of data distribution.
5. Construct the frequency table and draw a bar graph and stem-and-leaf display for
the following data:


6. The following data are the shoe sizes of 50 male students. The sizes are
discrete data since shoe size is measured in whole and half units only.
Construct a histogram and calculate the width of each bar or class interval.
Suppose you choose six bars.

9; 9; 9.5; 9.5; 10; 10; 10; 10; 10; 10; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5;
10.5;
11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11.5; 11.5; 11.5; 11.5; 11.5;
11.5; 11.5;
12; 12; 12; 12; 12; 12; 12; 12.5; 12.5; 12.5; 12.5; 14

7. The following data are the heights (in inches to the nearest half inch) of 100
male semiprofessional soccer players. The heights are continuous data, since
height is measured.

60; 60.5; 61; 61; 61.5; 63.5; 63.5; 63.5; 64; 64; 64; 64; 64; 64; 64; 64.5; 64.5;
64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 66; 66; 66; 66; 66; 66; 66; 66; 66; 66; 66.5;
66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 67; 67; 67; 67; 67;
67; 67; 67; 67; 67; 67; 67; 67.5; 67.5; 67.5; 67.5; 67.5; 67.5; 67.5; 68; 68; 69;
69; 69; 69; 69; 69; 69; 69; 69; 69; 69.5; 69.5; 69.5; 69.5; 69.5; 70; 70; 70; 70;
70; 70; 70.5; 70.5; 70.5; 71; 71; 71; 72; 72; 72; 72.5; 72.5; 73; 73.5; 74

8. Compute the mean, median and mode for the following data sets.
I ) 9, 10, 12, 13, 13, 13, 15, 15, 16, 16, 18, 22, 23, 24, 24, 25

9. Explain the various measures of variability with suitable examples.


10. Using the computation formula for the sum of squares, calculate the
population standard deviation and sample standard deviation for the scores:
• 1, 3, 7, 2, 0, 4, 3, 7
• 10, 8, 5, 0, 1, 7, 9, 2, 1
11. Elaborate in detail on the significance of correlation and the various types of
correlation.
12. What are scatterplots? Illustrate the various types with suitable examples.
13. Elaborate on the correlation coefficient r. Compare the various correlation
coefficients.
14. Calculate and analyze the correlation coefficient for the following table:

Subject   Age x   Glucose Level y
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81

15. What is the significance of r²? Give a detailed interpretation of r².
16. Discuss the importance of regression. Elaborate on the types of regression.
Calculate the regression coefficient and obtain the lines of regression for the
following data.
17. Explain the significance of the regression line and the least squares regression
equation.
18. Find the standard error for the sample data: 10, 20, 30, 40, 45.
19. Elaborate on multiple regression equations.
20. Elucidate regression towards the mean. Explain regression fallacy and state
how it can be avoided.
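
The computational questions in this unit can be checked numerically. The following is a minimal Python sketch, not a model answer. It applies the computation formula for the sum of squares, SS = ΣX² − (ΣX)²/n, to the two score sets in question 10, and the Pearson formula r = SP_xy / sqrt(SS_x · SS_y) to the age and glucose table in question 14. The function and variable names are illustrative assumptions, and the data are taken exactly as printed in the questions.

```python
import math


def sum_of_squares(scores):
    """Computation formula: SS = sum(X^2) - (sum(X))^2 / n."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


def population_sd(scores):
    """Population standard deviation: sqrt(SS / n)."""
    return math.sqrt(sum_of_squares(scores) / len(scores))


def sample_sd(scores):
    """Sample standard deviation: sqrt(SS / (n - 1))."""
    return math.sqrt(sum_of_squares(scores) / (len(scores) - 1))


def pearson_r(xs, ys):
    """Pearson correlation: r = SP_xy / sqrt(SS_x * SS_y)."""
    n = len(xs)
    sp_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sp_xy / math.sqrt(sum_of_squares(xs) * sum_of_squares(ys))


# Question 10: the two score sets as printed in the question.
set_a = [1, 3, 7, 2, 0, 4, 3, 7]
set_b = [10, 8, 5, 0, 1, 7, 9, 2, 1]

# Question 14: age (x) and glucose level (y) for the six subjects.
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

for name, scores in (("set a", set_a), ("set b", set_b)):
    print(name, round(population_sd(scores), 3), round(sample_sd(scores), 3))
print("r =", round(pearson_r(ages, glucose), 4))
```

With the table as printed, r is about 0.53, which would usually be read as a moderate positive relationship between age and glucose level.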

Unit III : Inferential Statistics

2 Mark Questions:

1. Define population. Give an example.
2. What is a real population?
3. List the different types of population.
4. What is a hypothetical population?
5. Define sample.
6. List the categories of sample.
7. What is random sampling?
8. Mention the types of random sampling.
9. Differentiate between population and sample.
10. List the types of non-probability sampling.
11. Define snowball sampling.
12. Differentiate between non-probability and probability sampling.
13. Give the optimal sample size.
14. What is systematic sampling?
15. Define cluster sampling.
16. Mention the advantages of random sampling.
17. Define consecutive sampling.
18. Provide the standard error of the mean.
19. Give the level of confidence.
20. Compare two-tailed and one-tailed tests.

10 Mark Questions:

1. Discuss population and samples with suitable examples.
2. Discuss the different types of random sampling techniques.
3. Elaborate on the different types of non-probability sampling techniques.
4. Illustrate hypothesis testing with an example.
5. Explain the procedure of the z-test with an example.
6. A teacher claims that the mean score of students in the class is greater than
80 with a standard deviation of 20. If a sample of 75 students was selected
with a mean score of 90 then check if there is enough evidence to support this
claim at a 0.05 significance level.


7. An online food delivery company claims that the mean delivery time is less
than 30 minutes with a standard deviation of 10 minutes. Is there enough
evidence to support this claim at a 0.05 significance level if 49 orders were
examined with a mean of 20 minutes?
8. A company wants to improve the quality of products by reducing defects and
monitoring the efficiency of assembly lines. In assembly line A, there were 9
defects reported out of 100 samples and in line B, 25 defects out of 600
samples were identified. Check if there is a difference in the procedures at a
0.05 alpha level.
9. Explain in detail estimation and the significance of point estimates.
10. Elaborate on confidence intervals and the level of confidence.
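
Questions 6 to 8 above are z-test problems, and their test statistics can be checked with a short script. The sketch below is one possible approach, not a prescribed solution. It assumes the stated standard deviations are population values, uses a one-sample z-test for questions 6 and 7, and uses a pooled two-proportion z-test (one common choice) for question 8. The helper names are hypothetical. Critical values come from `statistics.NormalDist`, available in Python 3.8 and later.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, claimed_mean, sigma, n):
    """z = (sample mean - claimed mean) / (sigma / sqrt(n))."""
    return (sample_mean - claimed_mean) / (sigma / math.sqrt(n))


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


alpha = 0.05
z_one_tail = NormalDist().inv_cdf(1 - alpha)      # one-tailed critical value
z_two_tail = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value

# Question 6: claim is "mean > 80", so the test is right-tailed.
z_scores = one_sample_z(90, 80, 20, 75)
reject_scores = z_scores > z_one_tail

# Question 7: claim is "mean < 30", so the test is left-tailed.
z_delivery = one_sample_z(20, 30, 10, 49)
reject_delivery = z_delivery < -z_one_tail

# Question 8: "is there a difference" is a two-tailed test.
z_defects = two_proportion_z(9, 100, 25, 600)
reject_defects = abs(z_defects) > z_two_tail

print(round(z_scores, 3), reject_scores)
print(round(z_delivery, 3), reject_delivery)
print(round(z_defects, 3), reject_defects)
```

Under these assumptions the null hypothesis is rejected in all three cases, so the data support each claim at the 0.05 level.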

Unit IV – Analysis of Variance

2 Mark Questions:

1. Define categorical variable. Give an example.
2. Mention the types of categorical variable.
3. Give the difference between one-way and two-way ANOVA.
4. What is a t-test?
5. Give the measures of the t-test.
6. When is the t-test used?
7. Provide the difference between a one-sample t-test and a paired t-test.
8. Can the t-test be used to measure the difference among several groups?
9. Define the chi-square test and write its formula.
10. Specify the purpose of the chi-square test.
11. How is the chi-square test interpreted?
12. What is an acceptable value in the chi-square method?
13. Define F-test.
14. Write the decision criteria for a right-tailed F-test.
15. Give the critical value for the F-test.
16. Why does ANOVA use the F-test?
17. Is a negative F statistic possible in an F-test?
18. How is the F-test differentiated from the t-test?
19. Differentiate one-way ANOVA from two-way ANOVA.
20. How is ANOVA's statistical significance determined?
21. What is factorial ANOVA?
22. Where is the chi-square test used?

10 Mark Questions:

1. Alex timed 21 people in the sprint race, to the nearest second:


59, 65, 61, 62, 53, 55, 60, 70, 64, 56, 58, 58, 62, 62, 68, 65, 56, 59, 68, 61, 67
Find the mean, median and mode.
2. The table shows the star rating for 20 hotels.
What is the mode star rating?


3. Find the Variance of the Frequency Table
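
Question 1 in this list supplies its data, so its answer can be checked with Python's standard `statistics` module. This is a minimal sketch. The variable names are illustrative, and `multimode` is used rather than `mode` so that a tie between several modes would show up instead of being silently resolved.

```python
import statistics

# Sprint times from question 1, to the nearest second (21 people).
times = [59, 65, 61, 62, 53, 55, 60, 70, 64, 56, 58,
         58, 62, 62, 68, 65, 56, 59, 68, 61, 67]

mean_time = statistics.mean(times)      # arithmetic mean
median_time = statistics.median(times)  # middle value of the sorted data
modes = statistics.multimode(times)     # list of all most frequent values

print(round(mean_time, 2), median_time, modes)
```

For these times the mean is about 61.38 seconds, the median is 61 and the only mode is 62.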

Unit 5 – Predictive Analysis

2 Mark Questions:
1. How do you calculate least squares?
2. List the methods available to calculate least squares.
3. Define the principle of least squares.
4. Define least squares.
5. What is least squares curve fitting?
6. Why do we need time series analysis?
7. Give some examples of time series analysis.
8. Mention the types of time series analysis.
9. Mention the applications of time series analysis.
10. Give the limitations of time series analysis.
11. List the data types of time series.
12. What does goodness of fit mean?
13. Why is goodness of fit important?
14. Provide the most common goodness-of-fit tests.
15. Why do we test goodness of fit?
16. Define multiple linear regression.
17. How is the error calculated in a linear regression model?
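
Several questions above ask how least squares and the regression error are calculated. The sketch below illustrates one standard approach for a single predictor: the slope is Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², the intercept is ȳ − slope · x̄, and the error is summarised as the standard error of estimate, sqrt(SSE / (n − 2)). The data points are hypothetical values made up for illustration, since the question bank gives none, and the function names are assumptions as well.

```python
import math


def fit_line(xs, ys):
    """Least squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def standard_error_of_estimate(xs, ys, slope, intercept):
    """sqrt(SSE / (n - 2)), where SSE is the sum of squared residuals."""
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


# Hypothetical data, not taken from the question bank.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
error = standard_error_of_estimate(xs, ys, slope, intercept)
print(round(slope, 3), round(intercept, 3), round(error, 4))
```

For this made-up data the fitted line is y = 2.2 + 0.6x, the sum of squared residuals is 2.4 and the standard error of estimate is about 0.894.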
