
Introduction to Statistics (4485)

Semester: Spring, 2023

ASSIGNMENT NO: 01

Maqbool Ahmed

Allama Iqbal Open University Islamabad

Q. 1 (a) What do you understand by the term statistics? Give its chief
characteristics.
(b) Give a brief account of the importance of statistics in different fields.
Ans:

Give a brief account of the importance of statistics in different fields.


Statistics plays a crucial role in various fields by providing valuable tools and
techniques for data analysis and decision-making. Here are some brief accounts
of the importance of statistics in different domains:

Business and Economics:


Statistics helps businesses make informed decisions by analyzing market trends,
consumer behavior, and sales data. It enables companies to forecast future
demand, optimize production processes, and assess the effectiveness of
marketing strategies. In economics, statistics is used to study economic
indicators, measure economic growth, and analyze the impact of policies.

Medicine and Healthcare:


Statistics is vital in medical research and healthcare. It is used to design and
conduct clinical trials, analyze patient data, and evaluate the effectiveness of
treatments. Statistical methods help in identifying risk factors for diseases,
studying epidemiological patterns, and making evidence-based decisions for
patient care and public health interventions.

Social Sciences:
Statistics provides researchers in social sciences with tools to gather and
analyze data on human behavior and social phenomena. It enables sociologists,
psychologists, and political scientists to study public opinion, social trends, and
conduct surveys and experiments. Statistics helps in making sense of large
datasets and drawing meaningful conclusions from complex social data.

Environmental Sciences:
Environmental scientists use statistics to analyze environmental data, such as
climate patterns, pollution levels, and biodiversity. Statistical methods help in
identifying trends, modeling environmental processes, and assessing the impact
of human activities on ecosystems. It also plays a crucial role in environmental
risk assessment and policy-making.

Education:
Statistics is essential in educational research and assessment. It is used to
analyze test scores, evaluate the effectiveness of teaching methods, and identify
factors influencing student performance. Statistics also helps in designing
experiments, conducting surveys, and measuring educational outcomes.

Finance and Risk Management:


In finance, statistics is used for portfolio analysis, risk assessment, and asset
pricing. It helps in analyzing stock market trends, modeling financial data, and
making investment decisions. Statistical methods are also crucial in risk
management, including insurance and actuarial sciences, where probabilities
and predictive models are used to estimate and manage risks.

Quality Control and Manufacturing:


Statistics plays a key role in quality control processes in manufacturing and
industry. It is used to monitor production processes, analyze product defects,
and ensure consistent product quality. Statistical quality control methods help in
identifying and reducing variations, improving efficiency, and meeting quality
standards.

These examples highlight the importance of statistics in various fields,
demonstrating its ability to provide valuable insights, support decision-making,
and enable evidence-based practices.

(b) Give brief account of the importance of statistics in different fields.


Statistics plays a crucial role in various fields by providing valuable insights and
enabling informed decision-making. Here's a brief account of the importance of
statistics in different areas:

Business and Economics:
Statistics helps in market research, forecasting, and analyzing consumer
behavior. It aids in making strategic business decisions, optimizing processes,
and identifying trends and patterns in economic data. It also assists in financial
analysis, risk management, and determining the success of business initiatives.

Healthcare and Medicine:


Statistics is essential for conducting clinical trials and medical research. It
enables the analysis of patient data to assess the effectiveness of treatments,
study disease patterns, and identify risk factors. Statistical methods help in
understanding public health issues, analyzing epidemiological data, and
evaluating the impact of interventions.

Social Sciences:
Statistics plays a vital role in social sciences such as psychology, sociology, and
political science. It helps researchers design surveys, collect and analyze data,
and draw meaningful conclusions. Statistical analysis aids in studying social
trends, measuring public opinion, and testing hypotheses in various social
contexts.

Environmental Science:
Statistics is utilized in environmental research to analyze data related to climate
change, pollution levels, and ecological systems. It helps in understanding the
impact of human activities on the environment, identifying trends, and
developing models to predict future scenarios.

Education:
Statistics is essential in educational research for analyzing student performance,
evaluating teaching methods, and assessing the effectiveness of educational
programs. It helps in designing experiments, conducting assessments, and
identifying factors that influence learning outcomes.

Engineering and Technology:


Statistics is applied in engineering and technology fields to analyze
experimental data, evaluate product reliability, and perform quality control. It
assists in making informed decisions during the design and manufacturing
processes and helps identify patterns and anomalies in large datasets.

Sports and Entertainment:


Statistics plays a significant role in analyzing sports performance, player
evaluation, and game strategies. It helps in identifying key performance
indicators, assessing team and player efficiency, and predicting outcomes. In the
entertainment industry, statistics are used for audience measurement, market
analysis, and optimizing content distribution.

These are just a few examples of how statistics is important across various
fields. In general, statistics provides a systematic approach to collecting,
analyzing, and interpreting data, enabling professionals to make evidence-based
decisions and gain a deeper understanding of complex phenomena.

Q. 2 (a) Define the terms “Classification” and “Tabulation”. Outline the main
steps in Tabulation. What do you mean by captions, stubs, title and prefatory
notes?

(b) Explain the method of constructing histograms when the class intervals are
unequal.

Ans:
(a) "Classification" refers to the process of organizing and categorizing data or
objects into distinct groups or classes based on certain criteria or characteristics.
It involves grouping similar items together and separating them from dissimilar
items. Classification is commonly used in various fields, such as statistics, data
analysis, information organization, and machine learning, to make data more
manageable and understandable.
"Tabulation" refers to the systematic arrangement of data in the form of a table
or matrix. It involves organizing data into rows and columns to facilitate easy
analysis and comparison. Tabulation is a common method used to present and
summarize data, making it more accessible and digestible.

Outline of the main steps in tabulation:

Identify the purpose:


Determine the objective of tabulating the data. This could be to analyze trends,
compare variables, summarize information, or present findings.

Define variables:
Identify the variables or factors that need to be tabulated. These variables are
typically represented as column headings in the table.

Collect and organize data:


Gather the relevant data required for tabulation. Ensure that the data is properly
organized and structured, with each observation or data point assigned to the
appropriate variable.

Design the table structure:


Determine the layout and structure of the table. Decide on the number of
columns, the order of variables, and the overall format. This includes deciding
whether the table will have captions, stubs, titles, and prefatory notes.

Create captions:
Captions are short titles or labels that provide a description for each column in
the table. Captions help readers understand the content of each column and its
corresponding variable.

Create stubs:
Stubs are labels or headings that describe the rows of the table. They typically
represent different categories or groups associated with the variables being
tabulated.

Title the table:


The title provides an overall description or summary of the table's content. It
helps readers understand the purpose and context of the table.

Add prefatory notes:


Prefatory notes are additional explanatory statements or instructions that
provide further details or context about the table. They can include information
about data sources, definitions of terms, or explanations of any abbreviations
used in the table.

Enter the data:


Populate the table with the collected data, placing each observation in the
appropriate cell based on the corresponding variable and category.

Format and present the table:


Format the table in a clear and organized manner, ensuring that it is easy to read
and interpret. Use appropriate formatting options such as borders, shading, font
styles, and alignment to enhance the table's visual appeal.

Review and validate:


Double-check the accuracy of the data entered in the table and ensure that it
accurately represents the intended information. Cross-verify the data against
the original sources, if available.

Interpret and analyze:


Once the tabulated data is ready, analyze and interpret the information presented
in the table. Look for patterns, trends, relationships, or insights that can be
derived from the data.

Overall, the main steps in tabulation involve identifying the purpose, defining
variables, collecting and organizing data, designing the table structure, creating
captions and stubs, titling the table, adding prefatory notes, entering the data,
formatting the table, reviewing and validating the data, and finally, interpreting
and analyzing the tabulated information.

(b) Explain the method of constructing histograms when the class intervals are
unequal.

When constructing histograms with unequal class intervals, the process is
slightly different from that for equal class intervals. Here's how you can
construct a histogram with unequal class intervals:

Determine the range of the data:


Find the minimum and maximum values in the dataset to determine the overall
range.

Decide on the number and width of the class intervals:


Unlike equal class intervals where the width is constant, with unequal class
intervals, you have the flexibility to choose different widths for each interval.
Consider the range of the data, the distribution, and the nature of the data to
determine suitable interval widths. It's important to ensure that each data point
falls into exactly one interval.

Create a table:
Create a table with columns to represent the intervals, the frequency (or count)
of data points falling within each interval, and optionally, additional columns
for cumulative frequency, relative frequency, etc.

Divide the range into intervals:


Starting from the minimum value, divide the range into intervals based on the
chosen widths. The intervals should be non-overlapping and cover the entire
range of the data.

Count the frequency:


Examine each data point and determine which interval it falls into. Increment
the frequency count for the corresponding interval in the table.

Display the histogram:


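Because the class widths differ, first compute the frequency density of each
class, that is, its frequency divided by its class width. The histogram is then
drawn as adjacent rectangles, with the x-axis representing the class intervals
(each bar as wide as its class) and the y-axis representing frequency density.
In this way the area of each bar, not its height, is proportional to the
frequency of data points falling within that interval. Plotting raw frequencies
as heights would exaggerate the wider classes.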

Additional considerations:
It's important to label the axes and provide a clear title for the histogram. You
might also want to include additional statistical information, such as the total
count, cumulative frequency, relative frequency, etc.

It's worth noting that constructing histograms with unequal class intervals
requires careful consideration of the data and the purpose of the histogram. The
choice of intervals can influence the appearance and interpretation of the
histogram, so it's important to select them appropriately.
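
The frequency-density step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `frequency_densities` and the example classes (0-10, 10-20, 20-40, 40-80) with their counts are hypothetical and are not taken from the assignment.

```python
def frequency_densities(boundaries, frequencies):
    """Return frequency / class width for each class.

    boundaries  -- class boundaries in ascending order
                   (one more entry than there are classes)
    frequencies -- number of observations in each class
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than frequencies")
    densities = []
    for lower, upper, freq in zip(boundaries, boundaries[1:], frequencies):
        width = upper - lower
        if width <= 0:
            raise ValueError("boundaries must be strictly increasing")
        # Bar height = frequency density, so bar area = frequency.
        densities.append(freq / width)
    return densities


# Hypothetical data: classes 0-10, 10-20, 20-40, 40-80
boundaries = [0, 10, 20, 40, 80]
frequencies = [5, 8, 12, 8]
print(frequency_densities(boundaries, frequencies))  # [0.5, 0.8, 0.6, 0.2]
```

In this made-up example the classes 10-20 and 40-80 have the same frequency (8), but the wider class gets a bar one quarter as tall, which is the behaviour an unequal-interval histogram needs.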

Q. 3 (a) What are the different measures of central tendency? Describe the
manner of computation of any three of them with suitable illustrations.
Ans: The measures of central tendency are statistical measures used to
describe the center or average of a data set. The three commonly used measures
of central tendency are the mean, median, and mode.

Mean:
The mean is calculated by summing up all the values in a data set and dividing
the sum by the total number of values. It is the most commonly used measure of
central tendency.

For example, let's consider the following set of numbers: 2, 4, 6, 8, 10. To
calculate the mean, we add up all the numbers (2 + 4 + 6 + 8 + 10 = 30) and
divide by the total count (5). The mean in this case is 30/5 = 6.

Median:
The median is the middle value of a data set when it is arranged in ascending or
descending order. If the data set has an odd number of values, the median is the
middle number. If the data set has an even number of values, the median is the
average of the two middle numbers.

For example, let's consider the following set of numbers: 3, 5, 7, 9, 11. The
median is the middle value, which is 7 in this case.

If we have another set of numbers: 2, 4, 6, 8, 10, 12. The median is the average
of the two middle numbers: (6 + 8) / 2 = 7.

Mode:
The mode is the value that appears most frequently in a data set. It is possible
for a data set to have one mode, multiple modes, or no mode at all.

For example, let's consider the following set of numbers: 2, 4, 6, 4, 8, 4. The
mode is 4 because it appears more frequently than any other value.

In some cases, a data set may have multiple modes. For instance, in the set of
numbers: 2, 4, 6, 4, 8, 6, both 4 and 6 appear with the same frequency, so the
data set has two modes.
If there are no repeating values in a data set, it is said to have no mode.

These measures of central tendency provide valuable insights about the data set
and help summarize its characteristics in terms of the center or average value.
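
The three illustrations above can be checked with Python's standard `statistics` module. This is a small sketch: the helper `modes` is a hypothetical name introduced here so that a data set with no repeated value reports no mode, following the convention used in this answer.

```python
import statistics
from collections import Counter


def modes(values):
    """Return every most-frequent value, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no repeated value: treated as "no mode"
    return [value for value, count in counts.items() if count == top]


mean_value = statistics.mean([2, 4, 6, 8, 10])          # (2+4+6+8+10) / 5
median_odd = statistics.median([3, 5, 7, 9, 11])        # middle value
median_even = statistics.median([2, 4, 6, 8, 10, 12])   # (6 + 8) / 2

print(mean_value, median_odd, median_even)
print(modes([2, 4, 6, 4, 8, 4]))   # [4]
print(modes([2, 4, 6, 4, 8, 6]))   # [4, 6]
print(modes([2, 4, 6, 8]))         # []
```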

(b) The following distribution shows kilowatt-hours of electricity used in one
month by 75 residential consumers in a certain locality of Islamabad.

Consumption (kWh)   5-24   25-44   45-64   65-84   154-162   85-104   105-124
No. of consumers    3      5       9       12      5         4        2

Estimate mean and median.


To estimate the mean and median from the given distribution, we use the
midpoint of each class for the mean and interpolate within the median class for
the median. Note that the frequencies in the table as printed add up to 40, not
the 75 consumers stated in the question, and that the class 154-162 appears out
of order. The working below uses the table as printed, with the classes sorted
in ascending order.

Class      Midpoint (x)   f     f * x
5-24       14.5           3     43.5
25-44      34.5           5     172.5
45-64      54.5           9     490.5
65-84      74.5           12    894.0
85-104     94.5           4     378.0
105-124    114.5          2     229.0
154-162    158.0          5     790.0
Total                     40    2997.5

Mean = Sum of (Midpoint * Number of Consumers) / Total consumers
Mean = 2997.5 / 40
Mean ≈ 74.94

To estimate the median, we need to find the position of the median in the
cumulative frequency distribution.

Cumulative Frequency:
Up to 24: 3
Up to 44: 3 + 5 = 8
Up to 64: 8 + 9 = 17
Up to 84: 17 + 12 = 29
Up to 104: 29 + 4 = 33
Up to 124: 33 + 2 = 35
Up to 162: 35 + 5 = 40

Since the total number of consumers is n = 40, the median is located at
position n/2 = 20. The first cumulative frequency that reaches 20 is 29, so the
median falls within the class 65-84.

Because the class limits are whole numbers with a gap of 1 between classes, the
lower class boundary of the median class is 64.5 and the class width is
84.5 - 64.5 = 20. The median is then:

Median = Lower boundary of the median class + [(n/2 - cumulative frequency of
the previous class) / frequency of the median class] * class width

Median = 64.5 + [(20 - 17) / 12] * 20
Median = 64.5 + (3/12) * 20
Median = 64.5 + 5
Median = 69.5

Therefore, the estimated mean is approximately 74.94 kWh and the estimated
median is approximately 69.5 kWh.

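
As a cross-check, the grouped mean and the interpolated median can be computed directly from the table as printed in the question. This is a sketch under stated assumptions: the function names are hypothetical, the classes are sorted before accumulating frequencies, and class boundaries are formed by assuming a gap of 1 between one class's upper limit and the next class's lower limit.

```python
def grouped_mean(classes):
    """Mean estimated from class midpoints.

    classes -- list of (lower_limit, upper_limit, frequency)
    """
    total = sum(f for _, _, f in classes)
    weighted = sum((lo + hi) / 2 * f for lo, hi, f in classes)
    return weighted / total


def grouped_median(classes, gap=1):
    """Median estimated by linear interpolation inside the median class.

    gap -- assumed distance between adjacent class limits (24 -> 25 is 1),
           used to turn class limits into class boundaries.
    """
    ordered = sorted(classes)
    total = sum(f for _, _, f in ordered)
    if total == 0:
        raise ValueError("no observations")
    half = total / 2
    cumulative = 0
    for lo, hi, f in ordered:
        if f > 0 and cumulative + f >= half:
            lower_boundary = lo - gap / 2
            width = (hi + gap / 2) - lower_boundary
            return lower_boundary + (half - cumulative) / f * width
        cumulative += f


# Table exactly as printed in the question (frequencies sum to 40).
table = [(5, 24, 3), (25, 44, 5), (45, 64, 9), (65, 84, 12),
         (154, 162, 5), (85, 104, 4), (105, 124, 2)]

print(grouped_mean(table))    # 74.9375
print(grouped_median(table))  # 69.5
```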
Q. 4 (a) Discuss the empirical relationship between mean, median
and mode.

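Ans: The mean, median, and mode are all measures of central tendency used to
describe the distribution or average value of a set of data. While they provide
different perspectives on the data, there can be empirical relationships among
them depending on the shape of the distribution. For unimodal distributions
that are only moderately skewed, the relationship usually quoted is
Mean - Mode ≈ 3 (Mean - Median), or equivalently Mode ≈ 3 Median - 2 Mean. It
is an approximation observed in practice, not an exact identity.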

Symmetrical Distribution:
In a symmetrical distribution, where the data is evenly distributed around a
central value, the mean, median, and mode tend to be very close or even equal
to each other. This is commonly observed in normal distributions. For example,
in a perfectly symmetrical normal distribution, the mean, median, and mode will
all coincide at the exact center of the distribution.

Skewed Distribution:
In skewed distributions, where the data is not evenly distributed and exhibits a
tail on one side, the mean, median, and mode may differ from each other.

- Positively Skewed Distribution (Right-skewed): In a positively skewed
distribution, the tail extends towards the higher values. In such cases, the
mode is generally the smallest of the three measures, the median lies in
between, and the mean is the largest (Mode < Median < Mean).

- Negatively Skewed Distribution (Left-skewed): In a negatively skewed
distribution, the tail extends towards the lower values. Here, the mode is
generally the largest of the three measures, the median lies in between, and
the mean is the smallest (Mean < Median < Mode).

Bimodal or Multimodal Distribution:


In cases where the data has multiple modes, meaning there are multiple peaks or
high-frequency values, the mean, median, and mode can be distinct from each
other. The mean may not necessarily represent any of the high-frequency
values.

No Mode or Equal Frequencies:


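If the data has no mode, meaning all values occur with equal frequencies, the
mean and median are the only usable measures of the three. They coincide when
the values are spread symmetrically, as in a uniform distribution where each
value has the same probability of occurrence, but they can differ when the
values are unevenly spaced (for example 1, 2, 10 has mean 4.33 and median 2).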

It's important to note that while the mean, median, and mode provide
information about the central tendency of a dataset, they do not provide a
complete description of the entire distribution. Other statistical measures like
variance, standard deviation, and range are necessary to understand the spread
and variability of the data.
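
The empirical relation commonly quoted for moderately skewed, unimodal data, Mode ≈ 3 Median - 2 Mean, can be sketched as a one-line function. The function name `empirical_mode` and the input values are hypothetical, chosen only to show the arithmetic. The result is an approximation, not an exact mode.

```python
def empirical_mode(mean, median):
    """Approximate mode from the empirical relation Mode ~ 3*Median - 2*Mean."""
    return 3 * median - 2 * mean


# Hypothetical right-skewed summary: mean 52, median 50.
print(empirical_mode(52, 50))   # 46 -> Mode < Median < Mean

# In a symmetrical distribution mean == median, so the estimate coincides.
print(empirical_mode(10, 10))   # 10
```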

(b) Calculate mode from the following data:

X   22   24   26   28    30    32    34    36    38   40   42   44
f   3    13   43   102   175   220   204   139   69   25   6    1

To calculate the mode from the given data, we need to find the value(s) that
occur most frequently. In this case, we have the following data:

X: 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44
f: 3, 13, 43, 102, 175, 220, 204, 139, 69, 25, 6, 1

The value with the highest frequency is the mode.

From the given data, we can see that the frequency reaches its peak at X = 32,
with a frequency (f) of 220. Therefore, the mode of the data set is 32.
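
Reading the mode off a frequency table amounts to picking the value with the largest frequency. A short sketch follows, with `modes_from_table` as a hypothetical helper name; it returns a list so that ties would be reported as well.

```python
def modes_from_table(values, freqs):
    """Return the value(s) whose frequency is the largest in the table."""
    top = max(freqs)
    return [value for value, freq in zip(values, freqs) if freq == top]


x = [22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44]
f = [3, 13, 43, 102, 175, 220, 204, 139, 69, 25, 6, 1]

print(modes_from_table(x, f))  # [32]
print(sum(f))                  # 1000 observations in total
```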

Q. 5 (a) Discuss in detail the criteria of a suitable average.


Ans: When it comes to analyzing data or summarizing a set of values, an
average is a commonly used statistical measure. However, not all averages are
suitable for every situation. The choice of a suitable average depends on the
characteristics of the data and the objective of the analysis. Here are some
criteria to consider when determining the suitability of an average:

Arithmetic Mean:
The arithmetic mean is the most widely used average. It is calculated by
summing up all the values in a dataset and dividing it by the number of
observations. The arithmetic mean is suitable when the data is numeric,
continuous, and symmetrically distributed. It is sensitive to extreme values,
making it less suitable for skewed or heavy-tailed distributions.

Weighted Mean:
The weighted mean takes into account the importance or contribution of each
value in the dataset. It assigns weights to each value and calculates the mean
based on the weighted values. This average is suitable when certain values have
more significance or when dealing with data from different sources with
varying reliability.

Median:
The median represents the middle value in a dataset when it is arranged in
ascending or descending order. It is suitable for data that is skewed or contains
outliers. The median is not influenced by extreme values and provides a better
measure of central tendency in such cases. It is particularly useful when dealing
with ordinal or interval data.

Mode:
The mode represents the most frequently occurring value in a dataset. It is
suitable for categorical or discrete data, such as nominal variables. The mode is
useful for identifying the most common category or identifying peaks in the
data distribution. It can be used alongside other averages to provide a
comprehensive understanding of the dataset.

Geometric Mean:
The geometric mean is suitable for data involving ratios, growth rates, or
exponential processes. It is calculated by taking the nth root of the product of n
values. This average is commonly used in finance, biology, and other fields
where multiplicative factors play a significant role.

Harmonic Mean:
The harmonic mean is suitable when dealing with rates, averages of rates, or
inversely proportional values. It is calculated by dividing the number of
observations by the sum of the reciprocals of the values. This average gives
more weight to smaller values, making it suitable for situations where the focus
is on the reciprocal relationship between variables.

Trimmed Mean:
The trimmed mean is calculated by removing a certain percentage of extreme
values from both ends of the dataset and then calculating the mean of the
remaining values. It is suitable when the data contains outliers that can unduly
influence the arithmetic mean. The trimmed mean provides a compromise
between the robustness of the median and the efficiency of the mean.

In summary, the choice of a suitable average depends on the nature of the data,
the presence of outliers, the distributional properties, and the specific objective
of the analysis. It is important to consider these criteria to ensure the chosen
average accurately represents the central tendency of the data and provides
meaningful insights.
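
To make the differences between these averages concrete, the sketch below computes several of them on small made-up inputs. `statistics.mean`, `statistics.geometric_mean` and `statistics.harmonic_mean` are standard-library functions (Python 3.8+); `weighted_mean` and `trimmed_mean` are hypothetical helpers written for this illustration, and all data values are invented.

```python
import statistics


def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations dropped per end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("proportion too large: nothing left to average")
    return statistics.mean(kept)


rates = [2, 8]  # hypothetical values
arithmetic = statistics.mean(rates)
geometric = statistics.geometric_mean(rates)
harmonic = statistics.harmonic_mean(rates)
print(arithmetic, round(geometric, 6), round(harmonic, 6))

print(weighted_mean([70, 80], [1, 3]))        # weight 3 pulls towards 80
print(trimmed_mean([1, 2, 3, 4, 100], 0.2))   # outlier 100 is dropped
```

For the pair 2 and 8 the arithmetic mean is 5, the geometric mean is about 4 and the harmonic mean is about 3.2, illustrating the general ordering harmonic ≤ geometric ≤ arithmetic for positive values.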

(b) Calculate the geometric mean and harmonic mean from the given
distribution.

Classes   4-6   6-8   8-10   10-12   12-14   14-16
f         13    11    182    105     19      7

To calculate the geometric mean and harmonic mean from the given
distribution, we first need to find the midpoint x of each class interval and
then weight each midpoint by its frequency f. For a frequency distribution the
formulas are:

Geometric Mean:
GM = (x₁^f₁ * x₂^f₂ * ... * xₖ^fₖ)^(1/n), i.e. log GM = Σ(f * log x) / n

Harmonic Mean:
HM = n / Σ(f / x)

Where n = Σf is the total frequency, and x₁, x₂, ..., xₖ are the midpoints of
the k class intervals.

Let's calculate the geometric mean and harmonic mean step by step (logarithms
are to base 10):

Classes | Midpoint (x) | f   | f * log x | f / x
---------------------------------------------------
4-6     | 5            | 13  | 9.0866    | 2.6000
6-8     | 7            | 11  | 9.2961    | 1.5714
8-10    | 9            | 182 | 173.6722  | 20.2222
10-12   | 11           | 105 | 109.3463  | 9.5455
12-14   | 13           | 19  | 21.1649   | 1.4615
14-16   | 15           | 7   | 8.2326    | 0.4667
Total   |              | 337 | 330.7987  | 35.8673

Step 1: Find the total frequency.
n = 13 + 11 + 182 + 105 + 19 + 7 = 337

Step 2: Calculate the geometric mean.
log GM = Σ(f * log x) / n = 330.7987 / 337 ≈ 0.9816
GM = antilog(0.9816) ≈ 9.59

Step 3: Calculate the harmonic mean.
HM = n / Σ(f / x) = 337 / 35.8673 ≈ 9.40

Therefore, the geometric mean of the given distribution is approximately 9.59,
and the harmonic mean is approximately 9.40. As a check, both lie within the
range of the data (4 to 16) and satisfy HM ≤ GM ≤ arithmetic mean, where the
arithmetic mean is Σ(f * x) / n = 3287 / 337 ≈ 9.75.
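
For a frequency distribution, the geometric and harmonic means weight each class midpoint by its frequency: log GM = Σ(f · log x) / Σf and HM = Σf / Σ(f / x). The sketch below applies these formulas to the table from the question; the function names are hypothetical, and natural logarithms are used, which gives the same geometric mean as base-10 logarithms.

```python
import math


def grouped_geometric_mean(midpoints, freqs):
    """exp of the frequency-weighted average of log(midpoint)."""
    n = sum(freqs)
    log_sum = sum(f * math.log(x) for x, f in zip(midpoints, freqs))
    return math.exp(log_sum / n)


def grouped_harmonic_mean(midpoints, freqs):
    """Total frequency divided by the sum of f / x over all classes."""
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(midpoints, freqs))


midpoints = [5, 7, 9, 11, 13, 15]       # midpoints of 4-6, 6-8, ..., 14-16
freqs = [13, 11, 182, 105, 19, 7]       # frequencies as printed

gm = grouped_geometric_mean(midpoints, freqs)
hm = grouped_harmonic_mean(midpoints, freqs)
print(round(gm, 2), round(hm, 2))
```

Both results should fall between the smallest and largest midpoints (5 and 15), with the harmonic mean no larger than the geometric mean; a value outside that range signals that the frequencies were not applied correctly.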
