Week_1&2_Mock_Solution_May24


Statistics for Data Science-1

Weekly Mock: Week-1 & 2

1. A teacher asked students in a class to do a project. The goal of the project is to find
the proportion of the specific chemical contaminant in water of the different states of
India. A student, Ramya, did a survey on the randomly selected five states of North
India and gave her report to the teacher. Identify the sample and population. [2
marks]

(a) The sample consists of all the states of India and the population consists of the
five states of India.
(b) The sample consists of all the states of North India and the population consists of
all the states of India.
(c) The sample consists of the five states of India and the population consists of all
the states of North India.
(d) The sample consists of five selected states of North India and the population
consists of all the states of India.

Answer: d
Solution:
By definition, the population is the entire collection of elements we are interested in.
Here, the goal of the project is to find the proportion of the specific chemical contaminant
in the water of the different states of India. Hence, the population is all the states of
India.
A sample is the subset of the population that is actually studied. Since Ramya surveyed
five randomly selected states of North India, the sample is those five selected states.
Thus, the sample consists of five selected states of North India and the population
consists of all the states of India.
Hence, option (d) is correct.

2. The data of decadal birth rate in India from 1901 to 2011 is collected. Based on this,
choose the correct option: [2 marks]

(a) It is a time series data.
(b) It is a cross-sectional data.

Answer: a
Solution:
Since the data of decadal birth rate in India are recorded over a period of time, the
data collected is time series data.
Hence, option (a) is correct.

The data of the five districts of various states of India was collected and shown in Table
1.1.M. The government used this data to plan the upcoming term of five years. Based
on the information given, answer the questions (3), (4) and (5).

District    Area (in km²)   Population   Population Growth (since 2001)   Number of Villages
Varanasi    1535.2          3676841      17.1 %                           835
Ujjain      6091.8          1986864      25.34 %                          132
Patna       3202.68         5838465      8.21 %                           1395
Mirzapur    4521.27         2496970      4.97 %                           1967
Jaipur      11152           6626178      −0.6 %                           73

Table 1.1.M

3. Which of the following statements is/are true? [2 marks]

(a) Ujjain is a variable
(b) Mirzapur is a case
(c) District is a case
(d) Population is a variable

Answer: b, d
Solution:
Here, data on several attributes of five districts from various states of India are collected.
Each attribute (a column of the table), i.e. District, Area (in km²), Population,
Population Growth (since 2001) and Number of Villages, is a variable.
A case (observation) is an individual unit for which the data are recorded.
Here, each district (a row of the table) is a case.
Thus, Mirzapur is a case and Population is a variable. Ujjain is a case, not a variable,
and District is a variable, not a case.
Hence, options (b) and (d) are correct.
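
As a minimal sketch of the case/variable distinction, the rows of Table 1.1.M can be stored as a list of records in Python. The key names below are labels chosen for this illustration and are not part of the original table. Each record is a case and each key is a variable.

```python
# Table 1.1.M as a list of records: one record (row) per district.
# Key names are illustrative labels, assumed for this sketch.
table = [
    {"District": "Varanasi", "Area_km2": 1535.2, "Population": 3676841,
     "Growth_pct": 17.1, "Villages": 835},
    {"District": "Ujjain", "Area_km2": 6091.8, "Population": 1986864,
     "Growth_pct": 25.34, "Villages": 132},
    {"District": "Patna", "Area_km2": 3202.68, "Population": 5838465,
     "Growth_pct": 8.21, "Villages": 1395},
    {"District": "Mirzapur", "Area_km2": 4521.27, "Population": 2496970,
     "Growth_pct": 4.97, "Villages": 1967},
    {"District": "Jaipur", "Area_km2": 11152.0, "Population": 6626178,
     "Growth_pct": -0.6, "Villages": 73},
]

# Each row is a case; here the district name identifies the case.
cases = [row["District"] for row in table]

# Each column (key) is a variable.
variables = list(table[0].keys())

print("Cases:", cases)
print("Variables:", variables)
```

Under this layout, "Mirzapur" and "Ujjain" appear among the cases, while "District" and "Population" appear among the variables.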

4. Which of the following statements is/are true? [2 marks]

(a) Population Growth is a continuous numerical variable.
(b) District and Area are numerical variables.
(c) There are 5 numerical variables in the dataset.
(d) Number of villages is a discrete numerical variable.

Answer: a, d
Solution:
Population Growth has numeric properties and arithmetic operations can be performed
on it, so it is a numerical variable. Moreover, it can take any value within a range,
not just countable values. Therefore, Population Growth is a continuous numerical variable.
Hence, option (a) is correct.
From the table, District is a categorical variable and there are 4 numerical
variables, i.e., Area, Population, Population Growth and Number of Villages. Thus,
options (b) and (c) are incorrect.
Number of Villages also has numeric properties and arithmetic operations can be
performed on it, so it is a numerical variable.
Moreover, it can take only countable values. Therefore, Number of Villages is a discrete
numerical variable.
Hence, option (d) is correct.
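
The classification above can be sketched in Python with a rough heuristic. The assumption here (not a general rule) is that text values are categorical, integer counts are discrete numerical, and decimal values are continuous numerical. The function `classify` and the key names are hypothetical and chosen only for this illustration.

```python
# One case from Table 1.1.M (Varanasi); key names are illustrative.
case = {
    "District": "Varanasi",
    "Area (km2)": 1535.2,
    "Population": 3676841,
    "Population Growth (%)": 17.1,
    "Number of Villages": 835,
}


def classify(value):
    """Rough heuristic assumed for this sketch, based on the Python type."""
    if isinstance(value, str):
        return "categorical"
    if isinstance(value, int):
        return "discrete numerical"
    return "continuous numerical"


# Map every variable name to its kind.
kinds = {name: classify(value) for name, value in case.items()}

# Variables that are numerical (discrete or continuous).
numerical = [name for name, kind in kinds.items() if kind != "categorical"]

for name, kind in kinds.items():
    print(f"{name}: {kind}")
print("Number of numerical variables:", len(numerical))
```

The heuristic relies on how the values happen to be stored, so it is only a convenience for this table and does not replace reasoning about what each variable measures.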

5. Choose the incorrect option(s): [2 marks]

(a) Area has a ratio scale of measurement.
(b) Population growth has an interval scale of measurement.
(c) Number of villages has an interval scale of measurement.
(d) Population has a ratio scale of measurement.

Answer: c
Solution:
Number of Villages has numerical values that can be added, subtracted, multiplied
or divided. It also has an absolute zero. Thus, it comes under the ratio scale
of measurement. The statement that Number of Villages has an interval scale of
measurement is therefore incorrect.
Hence, option (c) is the correct choice.

6. In an exam, there are 5 multiple select questions (more than one option can be correct).
For every question, the student will be awarded 0.3 marks for the selection of each
correct option, while 0.2 marks will be deducted for every wrong input. What is the
scale of measurement of the final score obtained by a student in the exam? [3 marks]

(a) Ordinal
(b) Nominal
(c) Ratio
(d) Interval

Answer: d
Solution:
The differences between final scores are meaningful. However, the final score does not
have an absolute zero, since it can also be negative. Therefore, it comes under the
interval scale of measurement.
Hence, option (d) is correct.
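
To see that the final score can be negative, here is a small Python sketch of the marking scheme in question 6. The function name `final_score` and the example selection counts are hypothetical and used only for illustration.

```python
MARKS_PER_CORRECT = 0.3   # awarded for each correct option selected
PENALTY_PER_WRONG = 0.2   # deducted for each wrong option selected


def final_score(correct_selected, wrong_selected):
    """Total score over the exam for the given selection counts."""
    return (MARKS_PER_CORRECT * correct_selected
            - PENALTY_PER_WRONG * wrong_selected)


# Hypothetical student: 1 correct selection and 3 wrong selections.
score = final_score(1, 3)
print("Score is negative:", score < 0)
```

Because a score of zero does not mean "no performance" (scores below zero are possible), ratios of scores are not meaningful, which is consistent with the interval scale.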

Table 1.2.M represents the way students commute to school on a regular basis.

Travel Method       Number of Students   Relative frequency
Public Transport    x
Car                 y
Cycle               z
Bike                w
Walk                25                   0.2

Table 1.2.M

Based on the given information, answer questions (7) and (8).


7. What is the total number of students? [2 marks]
Answer : 125
Solution:
By the definition of relative frequency, the relative frequency of the i-th category
is:

$$Rf_i = \frac{f_i}{N} \implies N = \frac{f_i}{Rf_i}, \quad i = 1, 2, \ldots, n$$

Therefore, the total number of students is

$$N = \frac{25}{0.2} = 125$$

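
The same calculation can be checked with a short Python sketch. The variable names are chosen here for illustration; the numbers come from the "Walk" row of Table 1.2.M.

```python
# Known row of Table 1.2.M: the "Walk" category.
walk_count = 25
walk_relative_frequency = 0.2

# Rf_i = f_i / N  =>  N = f_i / Rf_i.
# round() is used because N is a head count and must be a whole number.
total_students = round(walk_count / walk_relative_frequency)

print("Total number of students:", total_students)
```
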
8. If 75 students are going to school by Public Transport, Car and Cycle, then how many
students are going to school by Bike alone? [3 marks]
Answer: 25
Solution:
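We are given that x + y + z = 75, and we have N = 125.
Since the frequencies of all categories add up to N,

$$
\begin{aligned}
f_1 + f_2 + f_3 + f_4 + f_5 &= 125 \\
\implies x + y + z + w + 25 &= 125 \\
75 + w + 25 &= 125 \\
w + 100 &= 125 \\
w &= 125 - 100 \\
\implies w &= 25
\end{aligned}
$$

Therefore, 25 students are going to school by Bike alone.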
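
As a quick check of this arithmetic, a minimal Python sketch (variable names are illustrative) subtracts the known frequencies from the total:

```python
total_students = 125      # N, from question 7
public_car_cycle = 75     # x + y + z, given in question 8
walk = 25                 # from Table 1.2.M

# All category frequencies add up to N, so Bike is what remains.
bike = total_students - public_car_cycle - walk

print("Students going by Bike:", bike)
```
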

9. The data on the number of test matches played by Sourav Ganguly in different countries
is given in Table 1.3.M.

Country         Number of test matches played
Australia       20
Sri Lanka       17
West Indies     12
South Africa    16
New Zealand     10

Table 1.3.M: Number of test matches played by Sourav Ganguly in different countries

Based on the given information, which of the following statements is/are incorrect? [2
marks]

(a) Median can be computed for the given data.
(b) Pareto chart can be plotted for the given data.
(c) Mode can be computed for the given data.
(d) Bar chart is most appropriate to represent the percentage of test matches played
by Sourav Ganguly in different countries.

Answer: a, d
Solution:
The data is recorded for the different countries in which Sourav Ganguly played
test matches, and the countries cannot be ordered. The median can be calculated only
for data that can be ordered, so it cannot be computed for the given data.
Statement (a) is therefore incorrect.
Hence, option (a) is an answer.
A pie chart is the most appropriate way to represent the percentage of test
matches played by Sourav Ganguly in different countries. Statement (d), which says
that a bar chart is the most appropriate, is therefore incorrect.
Hence, option (d) is an answer.
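
As an illustrative sketch, the mode and the percentage share of each country (the quantities a pie chart would display) can be computed from Table 1.3.M in Python. The variable names are chosen here for illustration.

```python
# Table 1.3.M: number of test matches played in each country.
matches = {
    "Australia": 20,
    "Sri Lanka": 17,
    "West Indies": 12,
    "South Africa": 16,
    "New Zealand": 10,
}

total = sum(matches.values())

# Mode: the category with the highest frequency.
mode_country = max(matches, key=matches.get)

# Percentage share of each category.
shares = {country: 100 * count / total for country, count in matches.items()}

print("Mode:", mode_country)
for country, share in shares.items():
    print(f"{country}: {share:.1f}%")
```

No median is computed here because the categories (countries) have no natural order.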

10. Which of the following statements is/are true? [2 marks]

(a) To represent the share of a particular category, bar chart is the most appropriate
graphical representation.
(b) If the categorical variable is ordinal, then the bar chart must preserve the order.
(c) A bar chart is used to get the count of the corresponding categories in the data.
(d) A bar chart cannot be plotted vertically.

Answer: b, c
Solution:
To show the share of a particular category, a pie chart is the most appropriate graphical
representation. Hence, option (a) is incorrect.
By the definition and properties of a bar chart, options (b) and (c) are
correct.
Also, a bar chart can be plotted horizontally as well as vertically.
Hence, option (d) is incorrect.

The data of production of rods (in 1,000 tonnes) by a company is given in the figure
1.1.M.

Figure 1.1.M: Production of rods (in 1,000 tonnes) over the years

Based on the given information, answer questions (11) and (12).

11. In which year is the production of rods by the company the highest? [1 Mark]

(a) 1997
(b) 1999
(c) 1998
(d) 1995

Answer: c
Solution:
The highest bar in the given chart corresponds to the year 1998. Therefore,
the production of rods by the company is highest in 1998.
Hence, option (c) is correct.

12. Which of the following is/are the appropriate Pareto chart for the Figure 1.1.M? [2
Marks]
(a)

(b)

(c)

(d)

Answer: b, d
Solution:
A bar chart whose categories are sorted by frequency is called a Pareto
chart.
Arranged in ascending order, the frequencies are 5, 10, 15, 20, 25, 30 with
corresponding years 1994, 1996, 1999, 1995, 1997, 1998.
Thus, option (b) is correct.
Arranged in descending order, the frequencies are 30, 25, 20, 15, 10, 5
with corresponding years 1998, 1997, 1995, 1999, 1996, 1994.
Thus, option (d) is correct.
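
The two orderings can be reproduced with a short Python sketch. The production values per year are taken from the frequencies and years listed in this solution (the figure itself is not reproduced here), and the variable names are illustrative.

```python
# Production of rods (in 1,000 tonnes) per year, as listed in the solution.
production = {1994: 5, 1995: 20, 1996: 10, 1997: 25, 1998: 30, 1999: 15}

# Sort the years (categories) by their frequency.
ascending = sorted(production, key=production.get)
descending = sorted(production, key=production.get, reverse=True)

print("Ascending order of years: ", ascending)
print("Descending order of years:", descending)
```
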
