22BC285 - Ayush Jain - ITDA - Sem4

1) Differentiate between a discrete and continuous variable. Give examples.
Category | Discrete Variable | Continuous Variable
Definition | Variables that can only take on specific, distinct values, often countable and finite or countably infinite. | Variables that can take on any value within a certain range, typically uncountable and infinite.
Nature | Countable and typically finite or countably infinite. | Uncountable and infinite.
Measurement | Usually measured in whole numbers. | Measured in decimal or fractional values.
Representation | Can be represented by discrete points on a graph. | Represented by a continuous line or curve on a graph.
Precision | Discrete variables have finite precision, meaning there are gaps between adjacent values. | Continuous variables have infinite precision, with values existing at any point within the range.
Probability | Probability distributions for discrete variables are represented by probability mass functions (PMFs). | Probability distributions for continuous variables are represented by probability density functions (PDFs).
Examples | a. Number of children in a family, number of goals scored in a soccer match. b. Number of students in a classroom (1, 2, 3, ...) | a. Weight of a person, time taken to complete a task. b. Height of students (e.g., 150 cm, 150.1 cm, 150.11 cm, etc.)
2) What do you understand by a categorical variable? Differentiate between nominal and ordinal variable.
Categorical Variable
A categorical variable is a type of variable used in statistics and data analysis that represents
qualitative characteristics or attributes rather than numerical values. Categorical variables can take on
a limited, predefined set of categories or groups, and each observation or data point is assigned to one
of these categories. These categories can represent characteristics such as types, labels, or attributes.
A categorical variable has values that can be put into a countable number of distinct groups based on
a characteristic. When the categories have no natural order, the variable is a nominal variable; when
the categories have a natural order, it is an ordinal variable. Categorical variables are
also called qualitative variables or attribute variables.
For example, college major is a categorical variable that can have values such as psychology, political
science, engineering, biology, etc.
Categorical variables can be represented in various ways:
• Text Labels: Categories are often represented using descriptive text labels, such as "Male,"
"Female," "Married," "Single," etc.
• Numerical Codes: In some cases, categories may be assigned numerical codes for computational
purposes. These codes do not imply any numerical relationship between the categories but are
simply used as identifiers.
Differentiate between Nominal Variable and Ordinal Variable
Category | Nominal Variable | Ordinal Variable
Definition | Variables where categories have no inherent order or ranking. | Variables where categories have a clear order or ranking.
Examples | Gender (Male, Female), Eye Color (Brown, Blue, Green) | Educational Qualification (High School, Bachelor's, Master's)
Nature | Categories are mutually exclusive with no inherent order. | Categories have a specific order or ranking.
Measurement | Cannot be measured numerically; assigned labels or codes. | Assigned numerical values to reflect order or ranking.
Examples of Coding | Gender: Male (1), Female (2); Eye Color: Brown (1), Blue (2), Green (3) | Educational Qualification: High School (1), Bachelor's (2), Master's (3)
Comparison | Categories are treated equally; no ranking is implied. | Categories are ordered, allowing for comparison and ranking.
Example Analysis | Frequency counts, mode calculation. | Frequency counts, mode & median calculation.
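The difference in permitted analysis can be sketched in a few lines of Python. The sample values below are hypothetical and invented purely for illustration; the rank codes are the ones from the coding row of the table. The mode works for both kinds of variable, while the median only makes sense once the categories can be sorted.

```python
from collections import Counter

# Hypothetical sample data, invented for illustration only.
eye_color = ["Brown", "Blue", "Brown", "Green", "Brown"]      # nominal
education = ["Bachelor's", "High School", "Master's",
             "Bachelor's", "High School"]                      # ordinal

# Ordinal coding from the table: the codes carry the ranking.
EDUCATION_RANK = {"High School": 1, "Bachelor's": 2, "Master's": 3}


def mode_of(values):
    """Most frequent category; valid for nominal and ordinal data."""
    return Counter(values).most_common(1)[0][0]


def ordinal_median(values, rank):
    """Middle category after sorting by rank; only valid for ordinal data."""
    ordered = sorted(values, key=rank.__getitem__)
    return ordered[len(ordered) // 2]  # exact middle for an odd-sized sample


nominal_mode = mode_of(eye_color)
ordinal_med = ordinal_median(education, EDUCATION_RANK)
print(nominal_mode, ordinal_med)
```

Sorting eye colors by their codes would run without error, but the result would be meaningless, because the codes for a nominal variable are only identifiers.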
3) Illustrate the different methods of collecting primary data.
Meaning of Primary Data

Primary data refers to data collected first-hand, directly from the original source, for the purpose of
the study. It is data that has not been collected or used before. Because the researcher controls how
it is gathered, primary data is generally regarded as the most reliable kind of data in research.
The methods of collecting primary data can be further divided into quantitative data collection
methods (deals with factors that can be counted) and qualitative data collection methods (deals with
factors that are not necessarily numerical in nature).
Quantitative Data Collection Methods

These methods are based on mathematical calculations and use formats such as close-ended questions,
correlation and regression methods, and measures such as the mean, median or mode. They are typically
cheaper than qualitative data collection methods and can be applied in a short duration of time.
Qualitative Data Collection Methods

These methods do not involve any mathematical calculations and are closely associated with elements
that are not quantifiable. They include interviews, questionnaires, observations, case studies, etc.
The main methods are:
A. Observation Method
Observation method is used when the study relates to behavioural science. This method is
planned systematically. It is subject to many controls and checks. The different types of
observations are:
• Structured and unstructured observation
• Controlled and uncontrolled observation
• Participant, non-participant and disguised observation
B. Interview Method
In this method, data is collected in the form of verbal responses. It is carried out in two ways:
• Personal Interview: In this method, a person known as an interviewer asks questions
face to face to the other person. The personal interview can be structured or
unstructured, a direct investigation, a focused conversation, etc.
• Telephonic Interview: In this method, an interviewer obtains information by contacting
people on the telephone and asking for their answers or views verbally.
C. Questionnaire Method
In this method, a set of questions is mailed to the respondent, who reads them, replies and
then returns the questionnaire. The questions are printed in a definite order on the form.
A good questionnaire should have the following features:
• Short and simple
• Should follow a logical sequence
• Provide adequate space for answers
• Avoid technical terms
• Should have good physical appearance, such as colour and quality of the paper, to attract the
attention of the respondent
D. Schedules
This method is similar to the questionnaire method with a slight difference: enumerators are
specially appointed for the purpose of filling in the schedules. The enumerator explains the aims and
objects of the investigation and can remove any misunderstandings that have come up. Enumerators
should be trained to perform their job with hard work and patience.
E. Experiments
Experimental methods involve manipulating one or more variables to observe the effect on
another variable. Experiments are conducted in controlled settings where researchers can control
and manipulate variables. Experiments allow for establishing cause-and-effect relationships,
precise control over variables, and replication of findings. Experimental designs require careful
planning, randomization, and control to minimize confounding variables and ensure internal
validity.
4) List the important sources of secondary data.
Meaning of Secondary Data

Secondary data are second-hand pieces of information: data that have already been collected by
someone else, rather than gathered from the original source as primary data are. Because the
researcher has no control over how they were collected, they are generally considered less reliable
than primary data.
Secondary data are usually used when the time available for the enquiry is short and some compromise
on accuracy is acceptable. They can be gathered from different sources, which fall into two
categories:
A. Published Sources
a. Government Agencies
• National statistics bureaus (e.g., U.S. Census Bureau, UK Office for National Statistics)
provide published reports and datasets on demographic, economic, and social indicators.
• Health departments publish statistics on diseases, mortality rates, healthcare access, and
public health interventions.
• Other government agencies publish reports, white papers, and statistical bulletins on topics
such as education, labor, transportation, crime, and the environment.
b. Academic Institutions
• Universities and research institutions publish academic journals, research reports, theses,
dissertations, and conference papers covering various disciplines.
• Institutional repositories provide access to scholarly works produced by faculty, researchers,
and students, including published research articles and reports.
c. Non-Governmental Organizations (NGOs)
• NGOs and international organizations (e.g., World Bank, UNICEF, WHO) publish reports,
research studies, and evaluations related to their areas of focus, such as poverty, health,
education, and human rights.
• Reports, publications, and databases from NGOs are often made available to the public
through their websites or online platforms.
d. Market Research Firms
• Market research companies (e.g., Nielsen, GfK, Ipsos) publish syndicated reports, market
studies, and consumer surveys on consumer behavior, market trends, product sales,
advertising effectiveness, and industry performance.
• Commercial databases offer subscription-based access to published market intelligence
reports and datasets.
e. Media Outlets
• Newspapers, magazines, television networks, and online news websites publish articles,
reports, and multimedia content covering current events, social issues, and trends.
• Archives of news articles and documentaries serve as published sources of information on
historical events, social movements, and cultural phenomena.
B. Unpublished Sources
a. Academic Institutions
• Universities and research institutions maintain unpublished research data, survey responses,
field notes, and raw data collected by researchers for ongoing or completed studies.
• Institutional repositories may include unpublished manuscripts, working papers, technical
reports, and datasets that have not yet been formally published.
b. Market Research Firms
• Market research companies conduct proprietary research studies, surveys, and analyses for
clients, generating unpublished reports, datasets, and insights tailored to specific business
needs.
• Custom research projects and consulting engagements may result in unpublished deliverables
shared exclusively with clients.
c. Trade Associations and Professional Organizations
• Industry-specific associations and professional bodies collect unpublished data on market
trends, standards, regulations, and best practices within their respective sectors.
• Membership directories, industry surveys, and proprietary research conducted by associations
may contain unpublished information available to members only.
d. Social Media and Online Communities
• Social media platforms, forums, and online communities generate vast amounts of
user-generated content (e.g., posts, comments, reviews) that may remain unpublished but can be
accessed and analyzed for research purposes.
• Data scraping tools and APIs enable researchers to collect and analyze unpublished social
media data for insights into consumer behavior, public opinion, and social trends.
5) Prepare a frequency distribution by inclusive method taking a class interval of 7 from
the following data.
28, 17, 15, 22, 29, 21, 23, 27, 18, 12, 7, 2, 9, 4, 1, 8, 3, 10, 5, 20, 16, 12, 8, 4, 33, 27, 21,
15, 3, 36, 27, 18, 9, 2, 4, 6, 32, 31, 29, 18, 14, 13, 15, 11, 9, 7, 1, 5, 37, 32, 28, 26, 24, 20,
19, 25, 19, 20, 6, 9
Frequency Distribution:
Class Interval Frequency
1-7 15
8-14 12
15-21 15
22-28 10
29-35 6
36-42 2
Total 60
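The tally above can be reproduced with a short Python sketch. The function name `inclusive_frequency` and the choice to start the first class at 1 are assumptions made for this illustration; the data and the class width of 7 come from the question.

```python
data = [28, 17, 15, 22, 29, 21, 23, 27, 18, 12, 7, 2, 9, 4, 1, 8, 3, 10, 5, 20,
        16, 12, 8, 4, 33, 27, 21, 15, 3, 36, 27, 18, 9, 2, 4, 6, 32, 31, 29, 18,
        14, 13, 15, 11, 9, 7, 1, 5, 37, 32, 28, 26, 24, 20, 19, 25, 19, 20, 6, 9]

WIDTH = 7  # class interval given in the question


def inclusive_frequency(values, start, width):
    """Count values in inclusive classes start..start+width-1, and so on."""
    counts = {}
    lower = start
    while lower <= max(values):
        upper = lower + width - 1          # both limits belong to the class
        counts[(lower, upper)] = sum(lower <= v <= upper for v in values)
        lower = upper + 1                  # next class starts right after
    return counts


table = inclusive_frequency(data, start=1, width=WIDTH)
for (lo, hi), f in table.items():
    print(f"{lo}-{hi}\t{f}")
print("Total\t", sum(table.values()))
```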
6) Create a histogram for the following data.
Class Frequency
75-89 10
90-104 11
105-119 23
120-134 26
135-149 31
150-164 23
165-179 9
180-194 9
195-209 6
210-224 2
Convert the Inclusive Series into the Exclusive Series:
Class Intervals Frequency

74.5-89.5 10
89.5-104.5 11
104.5-119.5 23
119.5-134.5 26
134.5-149.5 31
149.5-164.5 23
164.5-179.5 9
179.5-194.5 9
194.5-209.5 6
209.5-224.5 2
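The conversion shifts every class limit by half the gap between consecutive classes (here the gap is 90 - 89 = 1, so the correction is 0.5). A minimal sketch, with `to_exclusive` as a hypothetical helper name:

```python
inclusive = [(75, 89), (90, 104), (105, 119), (120, 134), (135, 149),
             (150, 164), (165, 179), (180, 194), (195, 209), (210, 224)]


def to_exclusive(classes):
    """Turn inclusive class limits into continuous class boundaries."""
    gap = classes[1][0] - classes[0][1]    # lower limit of 2nd class - upper limit of 1st
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in classes]


exclusive = to_exclusive(inclusive)
for lo, hi in exclusive:
    print(f"{lo}-{hi}")
```

With continuous boundaries the histogram bars touch each other, which is why the conversion is done before plotting.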
Histogram:
7) Prove that sum of deviations from the mean is 0. Use Equation Editor.
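To prove:
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]
Proof:
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = (x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + \dots + (x_n - \bar{x}) \]
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = (x_1 + x_2 + x_3 + \dots + x_n) - n\bar{x} \]
Since
\[ \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \bar{x}, \quad \text{therefore} \quad x_1 + x_2 + x_3 + \dots + x_n = n\bar{x} \]
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = n\bar{x} - n\bar{x} \]
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]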
8) Find the weighted arithmetic mean of first n natural numbers, the weights being the
numbers themselves.
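\[ \text{Weighted Mean} = \frac{\sum wx}{\sum w} \]
\[ \text{Weighted Mean} = \frac{(1 \times 1) + (2 \times 2) + (3 \times 3) + \dots + (n \times n)}{1 + 2 + 3 + \dots + n} \]
\[ \text{Weighted Mean} = \frac{1^2 + 2^2 + 3^2 + \dots + n^2}{1 + 2 + 3 + \dots + n} \]
Since
\[ \text{Sum of squares of first } n \text{ natural numbers} = \frac{n(n+1)(2n+1)}{6} \]
and
\[ \text{Sum of first } n \text{ natural numbers} = \frac{n(n+1)}{2} \]
So,
\[ \text{Weighted Mean} = \frac{\dfrac{n(n+1)(2n+1)}{6}}{\dfrac{n(n+1)}{2}} \]
\[ \text{Weighted Mean} = \frac{1}{3}(2n+1) \]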
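As a quick sanity check (not a proof), the closed form (2n+1)/3 can be compared with a direct computation for a few values of n. The function name is hypothetical, and `Fraction` is used so the comparison is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction


def weighted_mean_first_n(n):
    """Weighted mean of 1..n where each number is its own weight."""
    numbers = range(1, n + 1)
    weighted_sum = sum(w * x for w, x in zip(numbers, numbers))  # 1^2 + ... + n^2
    return Fraction(weighted_sum, sum(numbers))


for n in (1, 2, 5, 10, 100):
    # Direct computation agrees with the derived formula (2n + 1) / 3.
    assert weighted_mean_first_n(n) == Fraction(2 * n + 1, 3)
```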
9) From the following table showing the wage distribution in a certain factory, determine
a) The mean wage
b) The median wage
c) The modal wage
d) The wage limits for the middle 50% of the wage earners
e) The percentage of workers who earned between Rs. 75 and Rs. 125
f) The percentage of workers who earned more than Rs. 150 per week
g) The percentage of workers who earned less than Rs. 100 per week
Weekly Wages (in Rs.) No. of employees

20-40 8
40-60 12
60-80 20
80-100 30
100-120 40
120-140 35
140-160 18
160-180 7
180-200 5
Frequency Distribution:
Weekly Wages (in Rs.) | No. of employees (f) | X | fx | cf
20-40 | 8 | 30 | 240 | 8
40-60 | 12 | 50 | 600 | 20
60-80 | 20 | 70 | 1400 | 40
80-100 | 30 | 90 | 2700 | 70
100-120 | 40 | 110 | 4400 | 110
120-140 | 35 | 130 | 4550 | 145
140-160 | 18 | 150 | 2700 | 163
160-180 | 7 | 170 | 1190 | 170
180-200 | 5 | 190 | 950 | 175
Total | 175 | | 18730 |
a. Mean
Mean= 107.03
b. Median
N/2= 87.5
Cf= 70
f= 40
l=100
h=20
Median= 108.75
c. Mode
f1= 40
f0= 30
f2= 35
l= 100
h= 20
Mode= 113.33
d. Middle 50% of Wage Earners

Calculate Q1
1(N/4) = 43.75
Cf= 40
F= 30
L= 80
H= 20
Q1= 82.50
Calculate Q3
3(N/4) = 131.25
Cf= 110
F= 35
L= 120
H= 20
Q3= 132.14
Wage Limit of 50% of Wage Earners = 82.50 – 132.14
e. Percentage of Workers who earned between 75 and 125
Target Class Target Frequency

75-80 5
80-100 30
100-120 40
120-125 8.75
83.75
So, Percentage of Workers who earned between 75 and 125 = 47.86%
f. Percentage of Workers who earned more than 150

150-160 9
160-180 7
180-200 5
21
So, Percentage of Workers who earned more than 150 = 12%
g. Percentage of Workers who earned less than 100

20-40 8
40-60 12
60-80 20
80-100 30
70
So, Percentage of Workers who earned less than 100 = 40%
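The answers to parts a to g can be cross-checked with a small Python sketch. It assumes the usual grouped-data conventions: class mid-points stand in for individual wages, and workers are spread uniformly within each class when interpolating. All function names are hypothetical.

```python
# (lower limit, upper limit, number of employees) from the question
classes = [(20, 40, 8), (40, 60, 12), (60, 80, 20), (80, 100, 30), (100, 120, 40),
           (120, 140, 35), (140, 160, 18), (160, 180, 7), (180, 200, 5)]
N = sum(f for _, _, f in classes)


def mean():
    """Grouped mean using class mid-points."""
    return sum((lo + hi) / 2 * f for lo, hi, f in classes) / N


def quantile(position):
    """Wage below which `position` workers fall: l + (position - cf) / f * h."""
    cf = 0
    for lo, hi, f in classes:
        if cf + f >= position:
            return lo + (position - cf) / f * (hi - lo)
        cf += f
    raise ValueError("position exceeds total frequency")


def mode():
    """Grouped mode: l + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    i = max(range(len(classes)), key=lambda k: classes[k][2])
    lo, hi, f1 = classes[i]
    f0 = classes[i - 1][2] if i > 0 else 0
    f2 = classes[i + 1][2] if i + 1 < len(classes) else 0
    return lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)


def count_below(x):
    """Estimated number of workers earning less than x."""
    total = 0.0
    for lo, hi, f in classes:
        if x >= hi:
            total += f                          # whole class lies below x
        elif x > lo:
            total += f * (x - lo) / (hi - lo)   # proportional share of the class
    return total


def percent_between(a, b):
    return (count_below(b) - count_below(a)) / N * 100


print(round(mean(), 2), round(quantile(N / 2), 2), round(mode(), 2))
print(round(quantile(N / 4), 2), round(quantile(3 * N / 4), 2))
print(round(percent_between(75, 125), 2),
      round(100 - count_below(150) / N * 100, 2),
      round(count_below(100) / N * 100, 2))
```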
10) Draw Ogives and hence estimate the median.

Class Interval Frequency
0-9 8
10-19 32
20-29 142
30-39 216
40-49 240
50-59 206
60-69 143
70-79 13
Frequency Table
Lower Limit | Upper Limit | Frequency | cf | Less Than: Upper Limit | Less Than: cf | More Than: Lower Limit | More Than: cf
-0.5 | 9.5 | 8 | 8 | 9.5 | 8 | -0.5 | 1000
9.5 | 19.5 | 32 | 40 | 19.5 | 40 | 9.5 | 992
19.5 | 29.5 | 142 | 182 | 29.5 | 182 | 19.5 | 960
29.5 | 39.5 | 216 | 398 | 39.5 | 398 | 29.5 | 818
39.5 | 49.5 | 240 | 638 | 49.5 | 638 | 39.5 | 602
49.5 | 59.5 | 206 | 844 | 59.5 | 844 | 49.5 | 362
59.5 | 69.5 | 143 | 987 | 69.5 | 987 | 59.5 | 156
69.5 | 79.5 | 13 | 1000 | 79.5 | 1000 | 69.5 | 13
Total | | 1000 | | | | |
Ogives:
[Figure: Less Than & More Than Ogives. x-axis: class boundary (Less Than / More Than); y-axis: cumulative frequency]
Calculation of Median
N/2= 500
cf= 398
Median Class= 39.5-49.5
l= 39.5
f= 240
h= 10
\[ \text{Median} = l + \frac{\frac{N}{2} - cf}{f} \times h \]
Median = 43.75
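The two cumulative-frequency columns and the interpolated median can be recomputed as follows. This is a sketch of the arithmetic only (it does not draw the ogives), and the variable and function names are chosen for this illustration.

```python
from itertools import accumulate

boundaries = [-0.5, 9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
freq = [8, 32, 142, 216, 240, 206, 143, 13]
N = sum(freq)

# Less-than cf is plotted against upper boundaries, more-than cf against lower ones.
less_than = list(accumulate(freq))
more_than = [N - c for c in [0] + less_than[:-1]]


def grouped_median():
    """Median = l + (N/2 - cf) / f * h, with cf taken just before the median class."""
    half = N / 2
    for i, cf_after in enumerate(less_than):
        if cf_after >= half:
            cf_before = cf_after - freq[i]
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + (half - cf_before) / freq[i] * width


median = grouped_median()
print(less_than, more_than, median)
```

On the graph, the same value is read off as the x-coordinate of the point where the less-than and more-than ogives intersect.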
11) Given below is the distribution of 140 candidates, obtaining marks X or higher in an
examination. (All marks are given in whole numbers). Calculate the mean, median and
mode of the distribution.
X C.F.
10 140
20 133
30 118
40 100
50 75
60 45
70 25
80 9
90 2
100 0
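Frequency Table:
Since all marks are whole numbers, "marks X or higher" gives the classes 10-19, 20-29, ..., 90-99,
with class boundaries 9.5-19.5, 19.5-29.5, ..., 89.5-99.5 and mid-points 14.5, 24.5, ..., 94.5.
Class | cf (more than) | f | Mid-point (x) | fx | cf (less than)
10-19 | 140 | 7 | 14.5 | 101.5 | 7
20-29 | 133 | 15 | 24.5 | 367.5 | 22
30-39 | 118 | 18 | 34.5 | 621 | 40
40-49 | 100 | 25 | 44.5 | 1112.5 | 65
50-59 | 75 | 30 | 54.5 | 1635 | 95
60-69 | 45 | 20 | 64.5 | 1290 | 115
70-79 | 25 | 16 | 74.5 | 1192 | 131
80-89 | 9 | 7 | 84.5 | 591.5 | 138
90-99 | 2 | 2 | 94.5 | 189 | 140
Total | | 140 | | 7100 |
N=140
Calculation of Mean:
\[ \text{Mean} = \frac{\sum fx}{N} = \frac{7100}{140} \]
Mean = 50.714
Calculation of Median:
\[ \frac{N}{2} = \frac{140}{2} = 70 \]
The first cf (less than) greater than 70 is 95, so the median class is 50-59 (boundaries 49.5-59.5),
with l = 49.5, cf = 65, f = 30 and h = 10.
\[ \text{Median} = l + \frac{\frac{N}{2} - cf}{f} \times h = 49.5 + \frac{70 - 65}{30} \times 10 \]
Hence,
Median = 51.17
Calculation of Mode:
Highest frequency is 30, so the modal class is 50-59, with l = 49.5, f1 = 30, f0 = 25, f2 = 20 and h = 10.
\[ \text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h = 49.5 + \frac{30 - 25}{60 - 25 - 20} \times 10 \]
Hence,
Mode = 52.83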
12) The following numbers give the weights of 55 students of a class. Prepare a suitable
frequency table.
a) Draw the histogram and frequency polygon of the above data

b) For the above weights, prepare a cumulative frequency table and draw the less than
ogive
Frequency Table:
Class Interval | Frequency | Less than | Cumulative Frequency
40-50 | 7 | 50 | 7
50-60 | 7 | 60 | 14
60-70 | 10 | 70 | 24
70-80 | 16 | 80 | 40
80-90 | 7 | 90 | 47
90-100 | 4 | 100 | 51
100-110 | 3 | 110 | 54
110-120 | 1 | 120 | 55
Total | 55 | |
Histogram:
Frequency Polygon:
Less Than Ogive:
[Figure: Less Than Ogive. x-axis: Less Than (50 to 120); y-axis: cumulative frequency; plotted values 7, 14, 24, 40, 47, 51, 54, 55]
13) Find out the missing figures:

a) Mean =? (3Median-Mode)
b) Mean-Mode =? (Mean-Median)
c) Median = Mode+? (Mean-Mode)
d) Mode = Mean-? (Mean-Median)
a. 1/2
b. 3
c. 2/3
d. 3
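These values follow from the empirical relationship between the three averages for a moderately skewed distribution, Mode = 3 Median - 2 Mean. The answer does not state this relation explicitly, so it is taken here as the assumed starting point, and each part is a rearrangement of it:

```latex
\begin{align*}
\text{Mode} &= 3\,\text{Median} - 2\,\text{Mean} \\
\text{(a)}\quad \text{Mean} &= \tfrac{1}{2}\,(3\,\text{Median} - \text{Mode}) \\
\text{(b)}\quad \text{Mean} - \text{Mode} &= 3\,\text{Mean} - 3\,\text{Median}
  = 3\,(\text{Mean} - \text{Median}) \\
\text{(c)}\quad \text{Median} - \text{Mode} &= 2\,(\text{Mean} - \text{Median})
  = \tfrac{2}{3}\,(\text{Mean} - \text{Mode})
  \;\Rightarrow\; \text{Median} = \text{Mode} + \tfrac{2}{3}\,(\text{Mean} - \text{Mode}) \\
\text{(d)}\quad \text{Mode} &= \text{Mean} - 3\,(\text{Mean} - \text{Median})
\end{align*}
```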
14) Find the mean and variance of first n natural numbers.
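\[ \text{Mean} = \frac{\sum X}{n} \]
Here, \( \sum X = 1 + 2 + 3 + \dots + n \) and the number of observations is \( n \).
Also,
\[ \text{Sum of first } n \text{ natural numbers} = \frac{n(n+1)}{2} \]
\[ \text{Mean} = \frac{1 + 2 + 3 + \dots + n}{n} = \frac{\dfrac{n(n+1)}{2}}{n} \]
\[ \text{Mean} = \frac{n+1}{2} \]
Calculation of Variance:
\[ \text{Variance} = \frac{1}{n}\sum x^2 - (\text{Mean})^2 \]
\[ \text{Variance} = \frac{1}{n} \times \frac{n(n+1)(2n+1)}{6} - \left(\frac{n+1}{2}\right)^2 \]
\[ \text{Variance} = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{(n+1)(n-1)}{12} \]
\[ \text{Variance} = \frac{n^2 - 1}{12} \]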
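Both results can be spot-checked numerically. The sketch below assumes the population variance (dividing by n, not n - 1), which is what the formula (n^2 - 1)/12 corresponds to; the function name is hypothetical.

```python
from fractions import Fraction


def mean_and_variance(n):
    """Exact mean and population variance of the numbers 1..n."""
    xs = range(1, n + 1)
    mean = Fraction(sum(xs), n)
    variance = Fraction(sum(x * x for x in xs), n) - mean ** 2
    return mean, variance


for n in (1, 2, 3, 10, 50):
    mean, variance = mean_and_variance(n)
    assert mean == Fraction(n + 1, 2)            # (n + 1) / 2
    assert variance == Fraction(n * n - 1, 12)   # (n^2 - 1) / 12
```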
15) Find the mean and standard deviation of the following distribution
x F
2.5-7.5 12
7.5-12.5 28
12.5-17.5 65
17.5-22.5 121
22.5-27.5 175
27.5-32.5 198
32.5-37.5 176
37.5-42.5 120
42.5-47.5 66
47.5-52.5 27
52.5-57.5 9
57.5-62.5 3
Frequency Table:
Class | f | Mid-point (x) | fx | x-mean | (x-mean)^2 | f(x-mean)^2
2.5-7.5 | 12 | 5 | 60 | -25.01 | 625.25 | 7503.00
7.5-12.5 | 28 | 10 | 280 | -20.01 | 400.20 | 11205.60
12.5-17.5 | 65 | 15 | 975 | -15.01 | 225.15 | 14634.75
17.5-22.5 | 121 | 20 | 2420 | -10.01 | 100.10 | 12112.10
22.5-27.5 | 175 | 25 | 4375 | -5.01 | 25.05 | 4383.75
27.5-32.5 | 198 | 30 | 5940 | -0.01 | 0.00 | 0.00
32.5-37.5 | 176 | 35 | 6160 | 5.00 | 24.95 | 4391.20
37.5-42.5 | 120 | 40 | 4800 | 10.00 | 99.90 | 11988.00
42.5-47.5 | 66 | 45 | 2970 | 15.00 | 224.85 | 14840.10
47.5-52.5 | 27 | 50 | 1350 | 20.00 | 399.80 | 10794.60
52.5-57.5 | 9 | 55 | 495 | 25.00 | 624.75 | 5622.75
57.5-62.5 | 3 | 60 | 180 | 30.00 | 899.70 | 2699.10
Total | 1000 | | 30005 | | | 100174.98
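\[ \text{Mean} = \frac{\sum fx}{N} = \frac{30005}{1000} \]
Mean = 30.005
Calculation of Standard Deviation:
\[ \text{Standard Deviation} = \sqrt{\frac{\sum f(x - \text{mean})^2}{N}} = \sqrt{\frac{100174.98}{1000}} \]
Standard Deviation = 10.009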
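The same calculation in Python, as a minimal sketch that uses the class mid-points and divides by N = Σf (the population form of the standard deviation used in this answer):

```python
from math import sqrt

midpoints = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
freq = [12, 28, 65, 121, 175, 198, 176, 120, 66, 27, 9, 3]

N = sum(freq)
mean = sum(f * x for f, x in zip(freq, midpoints)) / N
# Sum of f * (x - mean)^2, divided by the total frequency N.
variance = sum(f * (x - mean) ** 2 for f, x in zip(freq, midpoints)) / N
sd = sqrt(variance)

print(N, round(mean, 3), round(sd, 3))
```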
16) The following data gives the arithmetic averages and standard deviations of three
groups. Calculate the arithmetic average and standard deviation of the whole group.
Sub-group No. of men Average wages (in Standard deviation
Rs.) (in Rs.)
A 50 61 8
B 100 70 9
C 120 80.5 10
Calculation of Combined Mean

\[ \text{Combined Mean} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3} \]
Combined Mean= 73
Calculation of Combined Standard Deviation
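\[ \text{Combined SD} = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2) + n_3(\sigma_3^2 + d_3^2)}{n_1 + n_2 + n_3}} \]
where \( d_i \) is the difference between the combined mean and the mean of sub-group \( i \):
D1= 12
D2= 3
D3= -7.5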
Combined SD= 11.89
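A short Python sketch of the combined mean and combined standard deviation for the three sub-groups. Only the squares of the deviations enter the formula, so the sign convention chosen for the deviations does not affect the result.

```python
from math import sqrt

# (number of men, average wage, standard deviation) for sub-groups A, B, C
groups = [(50, 61, 8), (100, 70, 9), (120, 80.5, 10)]

N = sum(n for n, _, _ in groups)
combined_mean = sum(n * m for n, m, _ in groups) / N

# Each group contributes its own variance plus its squared distance from the combined mean.
combined_var = sum(n * (s ** 2 + (m - combined_mean) ** 2) for n, m, s in groups) / N
combined_sd = sqrt(combined_var)

print(combined_mean, round(combined_sd, 2))
```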
17) Define and provide formulas for Coefficient of Variation and Coefficient of Dispersion.
What is the use of the following measures?
The Coefficient of Variation (CV) and the Coefficient of Dispersion (CD) are statistical measures
used to assess the variability or spread of a dataset relative to its mean.
A. Coefficient of Variation (CV)

The Coefficient of Variation is a dimensionless measure of relative variability. It is calculated as
the ratio of the standard deviation (𝜎) to the mean (μ) of a dataset, expressed as a percentage:
\[ CV = \frac{\sigma}{\mu} \times 100\% \]
Where:
• σ is the standard deviation of the dataset.
• μ is the mean of the dataset.
The CV provides a standardized measure of dispersion that allows comparison of variability
between datasets with different units of measurement. A lower CV indicates less variability
relative to the mean, while a higher CV suggests greater variability.
B. Coefficient of Dispersion (CD)

The Coefficient of Dispersion is another measure of relative variability, particularly used for
frequency distributions. It is calculated as the ratio of the standard deviation (σ) to the mean
(μ) of a dataset:
\[ CD = \frac{\sigma}{\mu} \]
Where:
• σ is the standard deviation of the dataset.
• μ is the mean of the dataset.
The CD indicates how much the values in a dataset deviate from the mean. A CD greater than 1
suggests that the data are more dispersed or spread out compared to the mean, while a CD less
than 1 indicates less dispersion.
Uses:
A. Coefficient of Variation (CV):
• It is commonly used in fields such as finance, economics, and biology to compare the
variability of datasets with different units or scales.
• It helps in assessing the risk associated with an investment portfolio by comparing the
volatility (standard deviation) to the expected return (mean).
• It aids in evaluating the consistency of processes or products in manufacturing and
quality control.
B. Coefficient of Dispersion (CD):
• It is used in descriptive statistics to quantify the spread or dispersion of data around the
mean.
• It helps in comparing the variability of datasets within the same scale or units.
• In economics, it can be used to assess income inequality by measuring the dispersion of
income distribution among a population.
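As an illustration of how the CV is used for comparison, the sketch below applies it to the three wage sub-groups from question 16 (means 61, 70 and 80.5 with standard deviations 8, 9 and 10). Reusing that data here is a choice made for this example, and the function name is hypothetical.

```python
# name: (mean wage, standard deviation), taken from the table in question 16
groups = {"A": (61, 8), "B": (70, 9), "C": (80.5, 10)}


def coefficient_of_variation(mean, sd):
    """CV as a percentage: (sd / mean) * 100."""
    return sd / mean * 100


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in groups.items()}
most_consistent = min(cv, key=cv.get)  # lowest CV = least relative variability

for name, value in cv.items():
    print(name, round(value, 2))
print("Most consistent group:", most_consistent)
```

Group C has the largest standard deviation in absolute terms but the smallest CV, which shows why a relative measure is needed when the means differ.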
18) What is a positively skewed distribution? Illustrate graphically.
In statistics, a positively skewed (or right-skewed) distribution is a type of distribution in which most
values are clustered around the left tail of the distribution while the right tail of the distribution is
longer. The positively skewed distribution is the direct opposite of the negatively skewed distribution.
Unlike with normally distributed data where all measures of the central tendency (mean, median, and
mode) equal each other, with positively skewed data, the measures are dispersed. The general
relationship among the central tendency measures in a positively skewed distribution may be
expressed using the following inequality:
Mean > Median > Mode
In contrast to a negatively skewed distribution, in which the mean is located to the left of the peak
of the distribution, in a positively skewed distribution the mean is found to the right of the
distribution's peak. However, not all positively skewed distributions follow this rule. You may
encounter exceptions in real data that violate it.
19) Define and illustrate through an example leptokurtic, platykurtic and mesokurtic
distributions.
Kurtosis is a statistical measure that describes the shape of the distribution of data points in a dataset
relative to the normal distribution. A normal distribution has a kurtosis of 3, and distributions with
higher kurtosis are called leptokurtic, while those with lower kurtosis are called platykurtic.
Mesokurtic distributions have kurtosis equal to 3, similar to the normal distribution.
1. Leptokurtic Distribution:
• Definition: Leptokurtic distributions have a higher peak and heavier tails compared to the
normal distribution, indicating more extreme values or outliers.
• Example: A distribution of stock returns during a period of high market volatility may
exhibit leptokurtic behavior due to frequent large gains or losses.
• Illustration: In a leptokurtic distribution, the data points cluster tightly around the mean,
with heavier (fatter) tails compared to the normal distribution. The peak of the distribution
is higher, indicating a greater concentration of values near the mean, while the tails extend
further outward, suggesting the presence of outliers.
2. Platykurtic Distribution:
• Definition: Platykurtic distributions have a flatter peak and lighter tails compared to the
normal distribution, indicating fewer extreme values or outliers.
• Example: A distribution of test scores spread fairly evenly across the whole range, as in a
uniform distribution, with no sharp cluster around the mean and no extreme outliers, exhibits
platykurtic behavior.
• Illustration: In a platykurtic distribution, the data points are spread out more evenly across
the range of values, resulting in a lower peak and shorter tails compared to the normal
distribution. The distribution appears flatter, with less clustering around the mean and fewer
extreme values.
3. Mesokurtic Distribution:
• Definition: Mesokurtic distributions have kurtosis equal to 3, similar to the normal
distribution, indicating a moderate concentration of data points around the mean with tails
similar to the normal distribution.
• Example: The heights of adult males in a population often follow an approximately mesokurtic
distribution, with most individuals clustered around the average height and fewer outliers at the
extremes.
• Illustration: In a mesokurtic distribution, the shape closely resembles the normal
distribution, with a moderate peak and tails extending to the left and right. The data points are
symmetrically distributed around the mean, and the distribution displays neither excessive
peakedness nor flatness compared to the normal distribution.
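The classification can be sketched numerically using the moment definition of kurtosis, m4 / m2^2, which equals 3 for a normal distribution. The two small datasets and the tolerance of 0.1 around 3 are hypothetical choices made only for this illustration.

```python
def kurtosis(values):
    """Population kurtosis m4 / m2**2 (equals 3 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    return m4 / m2 ** 2


def classify(k, tolerance=0.1):
    """Label a kurtosis value; the tolerance band around 3 is an arbitrary assumption."""
    if k > 3 + tolerance:
        return "leptokurtic"
    if k < 3 - tolerance:
        return "platykurtic"
    return "mesokurtic"


flat = list(range(1, 11))        # hypothetical: values spread evenly over the range
peaked = [0] * 8 + [-10, 10]     # hypothetical: tight cluster plus two extreme values

for name, sample in (("flat", flat), ("peaked", peaked)):
    k = kurtosis(sample)
    print(name, round(k, 3), classify(k))
```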
20) In terms of Normal Distribution, answer the following questions:

a) What is the shape of the normal curve?
b) What is the PDF of a Normal Distribution?
c) What is a Standard Normal Distribution?
d) A normal random variable X has the following PDF.
\[ f(x) = \frac{1}{\sqrt{8\pi}}\, e^{-\frac{(x-1)^2}{8}}, \quad -\infty < x < \infty \]
Then \( \int_{1}^{\infty} f(x)\,dx \) is equal to?
a. Shape of Normal Curve

The shape of the normal curve is bell-shaped, symmetric about its mean, and characterized by its
mean (μ) and standard deviation (σ).
b. PDF of a Normal Distribution
The Probability Density Function (PDF) of a Normal Distribution is given by the formula:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
Here, 𝜇 is the mean of the distribution and 𝜎 is the standard deviation.
c. Standard Normal Distribution

The standard normal distribution, also known as the z-distribution, is a special type of normal
distribution with a mean of 0 and a standard deviation of 1.
Its PDF simplifies to:
\[ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}} \]
d. Calculation of \( \int_{1}^{\infty} f(x)\,dx \)
Given,
\[ f(x) = \frac{1}{\sqrt{8\pi}}\, e^{-\frac{(x-1)^2}{8}}, \quad -\infty < x < \infty \]
Comparing with the general normal PDF gives \( 2\sigma^2 = 8 \), so \( \mu = 1 \) and \( \sigma = 2 \).
So,
\[ \int_{1}^{\infty} f(x)\,dx = \int_{1}^{\infty} \frac{1}{\sqrt{8\pi}}\, e^{-\frac{(x-1)^2}{8}}\,dx \]
The given PDF is already normalized (it integrates to 1 over the entire real line). Therefore, the
integral from 1 to infinity is the complement of the cumulative distribution function (CDF)
evaluated at 1:
\[ \int_{1}^{\infty} f(x)\,dx = 1 - \int_{-\infty}^{1} f(x)\,dx \]
Since the Normal Distribution is symmetric about its mean \( \mu = 1 \), the area to the left of
x = 1 equals the area to the right of x = 1, so each is one half:
\[ \int_{-\infty}^{1} f(x)\,dx = \frac{1}{2} \]
Therefore,
\[ \int_{1}^{\infty} f(x)\,dx = 1 - \frac{1}{2} = \frac{1}{2} \]
That is, the probability of observing a value greater than 1 in this Normal Distribution is 0.5.
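The value of the integral can be checked numerically with the error function from Python's standard library. The sketch reads μ = 1 and σ = 2 off the given PDF (since 2σ² = 8); the function name `normal_tail` is hypothetical.

```python
from math import erf, sqrt


def normal_tail(x, mu, sigma):
    """P(X > x) for X ~ Normal(mu, sigma), computed via the error function."""
    return 0.5 * (1 - erf((x - mu) / (sigma * sqrt(2))))


MU, SIGMA = 1, 2                  # read off f(x) = exp(-(x-1)^2 / 8) / sqrt(8*pi)
tail = normal_tail(1, MU, SIGMA)  # integral of f from 1 to infinity
print(tail)
```

Because the lower limit of the integral coincides with the mean, the result is exactly one half regardless of the value of σ.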
