Unit I: Statistical Data and Descriptive Statistics - Measurement of Central Tendency

Central tendency is a measure that identifies a central value or a typical value around which data tends to cluster. The three main
measures of central tendency are the mean, median, and mode.

1.1 Measurement of Central Tendency:

a) Mathematical Average including Arithmetic Mean, Geometric Mean, and Harmonic Mean:
• Arithmetic Mean: This is the most common measure, calculated by adding up all values and dividing by the number of
values. Arithmetic Mean = (1/n) Σᵢ₌₁ⁿ Xᵢ
• Geometric Mean: Useful for datasets with multiplicative relationships, it is the nth root of the product of all values.
Geometric Mean = (X₁ · X₂ · … · Xₙ)^(1/n)
• Harmonic Mean: Represents the reciprocal of the arithmetic mean of the reciprocals of the values.
Harmonic Mean = n / (1/X₁ + 1/X₂ + … + 1/Xₙ)

Properties and Applications:


• These measures are used to summarize and describe the central tendency of a dataset.
• The arithmetic mean is sensitive to extreme values, while the geometric mean is more stable in such cases.
• The harmonic mean is often used in situations where rates are involved, like speed or efficiency.
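These three averages can be computed directly with Python's standard library; the sketch below uses a small hypothetical dataset and also illustrates the HM ≤ GM ≤ AM ordering noted later in this unit:

```python
import statistics

data = [2.0, 4.0, 8.0, 16.0]  # hypothetical sample

am = statistics.mean(data)            # (2 + 4 + 8 + 16) / 4 = 7.5
gm = statistics.geometric_mean(data)  # (2 * 4 * 8 * 16) ** (1/4)
hm = statistics.harmonic_mean(data)   # 4 / (1/2 + 1/4 + 1/8 + 1/16)

# For any positive dataset: harmonic <= geometric <= arithmetic
print(hm, gm, am)
```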

b) Positional Average:
• Mode: The mode is the value that occurs most frequently in a dataset.
• Median: The median is the middle value when the data is arranged in ascending or descending order.
• Quartiles, Deciles, and Percentiles: These are positional averages that divide the dataset into quarters, tenths, and
hundredths, respectively.

Graphic Determination:
• Graphical methods, such as histograms or cumulative frequency curves, can aid in determining modes, medians,
quartiles, deciles, and percentiles.

Understanding and calculating these measures of central tendency are crucial in summarizing data and gaining insights into its
typical characteristics.
The sections below cover the positional averages (mode, median, quartiles, deciles, and percentiles) along with their graphic
determination. Let's explore each of these concepts:
1. Positional Average:
• A positional average is a measure determined by the position of a value in the ordered dataset,
rather than by a computation over all values.
• The median, quartiles, deciles, and percentiles are all positional averages.
2. Mode:
• The mode is the value that appears most frequently in a dataset.
• A dataset can be unimodal (one mode), bimodal (two modes), or multimodal (more than two
modes).
3. Median:
• The median is the middle value of a dataset when it is ordered.
• If the dataset has an even number of values, the median is the average of the two middle values.
4. Quartiles:
• Quartiles divide a dataset into four equal parts.
• The first quartile (Q1) is the median of the lower half of the data.
• The second quartile (Q2) is the overall median.
• The third quartile (Q3) is the median of the upper half of the data.
5. Deciles:
• Deciles divide a dataset into ten equal parts.
• The first decile (D1) represents the 10th percentile, the second decile (D2) the 20th percentile, and
so on.
6. Percentiles:
• Percentiles divide a dataset into 100 equal parts.
• The p-th percentile is the value below which p percent of the data falls.
7. Graphic Determination:
• Graphical representations, such as box plots or cumulative distribution plots, can be used to
visually identify quartiles, percentiles, and other distribution characteristics.

For example, a box plot typically displays the median, quartiles, and potential outliers. A cumulative distribution
plot can show the percentage of data below a certain value.
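As a sketch of how these positional measures fall out of a small hypothetical dataset, using Python's statistics module (whose quantiles function defaults to the "exclusive" interpolation method, so values can differ slightly from hand methods):

```python
import statistics

data = [12, 15, 17, 19, 22, 25, 28, 31, 35, 40]  # hypothetical, already ordered

median = statistics.median(data)              # (22 + 25) / 2 = 23.5
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles; Q2 equals the median
deciles = statistics.quantiles(data, n=10)    # the nine cut points D1..D9

print(q1, q2, q3)
```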
a) Mathematical Average including Arithmetic Mean, Geometric Mean, and Harmonic Mean:
1. Arithmetic Mean:
• Formula: Arithmetic Mean = (1/n) Σᵢ₌₁ⁿ Xᵢ
• Properties:
• Sensitive to extreme values (outliers).
• The sum of the deviations from the mean is always zero.
• Easy to compute and widely used in various applications.
• Applications:
• Commonly used in business and economics for financial analysis.
• Provides a representative value for a set of observations.
• Used in inferential statistics for hypothesis testing and confidence intervals.
2. Geometric Mean:
• Formula: Geometric Mean = (X₁ · X₂ · … · Xₙ)^(1/n)
• Properties:
• Suitable for multiplicative relationships and exponential growth rates.
• Less sensitive to extreme values than the arithmetic mean.
• The product of the ratios of the values to the geometric mean is always one.
• Applications:
• Used in financial analysis for calculating average returns on investments.
• Applied in biology, physics, and other fields for rates of growth or decay.
• Useful when dealing with percentages and ratios.
3. Harmonic Mean:
• Formula: Harmonic Mean = n / (1/X₁ + 1/X₂ + … + 1/Xₙ)
• Properties:
• Suitable for situations where rates are involved.
• The harmonic mean is always less than or equal to the geometric mean, which is less than or
equal to the arithmetic mean.
• Applications:
• Used in physics for calculations involving speed and velocity.
• Applicable in economics for average rates.
• Useful in situations where reciprocal values need to be considered.

Overall Considerations:
• The choice of the mathematical average depends on the nature of the data and the specific goals of
analysis.
• Arithmetic mean is widely used but can be affected by outliers.
• Geometric mean is appropriate for proportional growth rates.
• Harmonic mean is suitable for situations involving rates or ratios.
1. Range:
• Range is the simplest measure of variation and is the difference between the maximum and
minimum values in a dataset.
• Formula:
Range = Maximum Value − Minimum Value
2. Quartile Deviation:
• Quartile Deviation is the half of the interquartile range (IQR) and is a measure of the spread around
the median.
• Formula: QD = (Q3 − Q1) / 2, where Q1 is the first quartile and Q3 is the third quartile.
3. Mean Deviation:
• Mean Deviation is the average of the absolute differences between each data point and the mean
of the dataset.
• Formula: Mean Deviation = (1/n) Σᵢ₌₁ⁿ |xᵢ − Mean|
4. Standard Deviation:
• Standard Deviation is a widely used and more sophisticated measure of variation. It considers the
squared differences from the mean.
• Formula: σ = √[(1/n) Σᵢ₌₁ⁿ (xᵢ − Mean)²], where σ is the standard deviation.
5. Coefficient of Variation (CV):
• CV is a relative measure of variation, expressing the standard deviation as a percentage of the
mean.
• Formula: CV = (Standard Deviation / Mean) × 100
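A minimal Python sketch of these five measures on a hypothetical dataset (population formulas; statistics.quantiles uses its default "exclusive" method for the quartiles):

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical sample

rng = max(data) - min(data)                     # Range = 9 - 3 = 6
q1, _, q3 = statistics.quantiles(data, n=4)
qd = (q3 - q1) / 2                              # Quartile Deviation
m = statistics.mean(data)                       # 6.0
md = sum(abs(x - m) for x in data) / len(data)  # Mean Deviation
sd = statistics.pstdev(data)                    # population standard deviation
cv = sd / m * 100                               # Coefficient of Variation, in %

print(rng, qd, md, round(sd, 3), round(cv, 1))
```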

Properties of Standard Deviation/Variation:


• Sensitivity to Outliers: Standard deviation is sensitive to extreme values in the dataset.
• Squared Differences: It involves squaring the differences from the mean, which helps in penalizing larger
deviations more heavily.
• Units: Standard deviation is in the same units as the original data.
• Additivity: The variance (the square of the standard deviation) is additive for independent
variables, which makes it convenient in many statistical methods.
• Normal Distribution: In a normal distribution, about 68% of the data falls within one standard deviation,
95% within two, and 99.7% within three.
Skewness:

Skewness is a measure of the asymmetry or skew of a probability distribution. In a symmetrical distribution, the
skewness is zero. A positive skewness indicates a distribution that is skewed to the right (tail on the right), while a
negative skewness indicates a distribution that is skewed to the left (tail on the left).

Measurement of Skewness:
1. Karl Pearson's Coefficient of Skewness:
• Pearson's coefficient measures skewness through the relationship between the mean, the
median, and the standard deviation.
• Formula: Skewness = 3(Mean − Median) / Standard Deviation
• Interpretation:
• Positive skewness (> 0): Right-skewed distribution.
• Negative skewness (< 0): Left-skewed distribution.
• Skewness close to zero: Approximately symmetrical distribution.
2. Bowley's Coefficient of Skewness:
• Bowley's skewness coefficient is based on quartiles and is also known as the quartile skewness
coefficient.
• Formula: Skewness = (Q1 + Q3 − 2·Median) / (Q3 − Q1)
• Interpretation:
• Positive skewness (> 0): Right-skewed distribution.
• Negative skewness (< 0): Left-skewed distribution.
• Skewness close to zero: Approximately symmetrical distribution.

Differences between Pearson and Bowley's Measures:


1. Basis for Calculation:
• Pearson's skewness is based on the mean, median, and standard deviation.
• Bowley's skewness is based on quartiles (Q1, Q3) and the median.
2. Sensitivity to Outliers:
• Pearson's skewness can be sensitive to outliers due to its reliance on the mean.
• Bowley's skewness is less affected by extreme values since it involves quartiles.
3. Formula Complexity:
• Pearson's formula involves the mean, median, and standard deviation, making it slightly more
complex.
• Bowley's formula is simpler, relying only on quartiles.
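Both coefficients can be sketched in a few lines of Python on a hypothetical right-skewed dataset (statistics.quantiles defaults to the "exclusive" quartile method, so Bowley's value may differ slightly from hand calculations):

```python
import statistics

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]  # right-skewed by the outlier 14

mean = statistics.mean(data)      # 5.0
median = statistics.median(data)  # 4.0
sd = statistics.pstdev(data)

# Karl Pearson's coefficient: 3(Mean - Median) / SD
pearson_sk = 3 * (mean - median) / sd

# Bowley's coefficient: (Q1 + Q3 - 2*Median) / (Q3 - Q1)
q1, q2, q3 = statistics.quantiles(data, n=4)
bowley_sk = (q1 + q3 - 2 * q2) / (q3 - q1)

print(pearson_sk > 0, bowley_sk > 0)  # both positive: right-skewed
```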

2.1 Theory of Probability:


• Definition: Probability is a measure of the likelihood that a particular event will occur.
• Classical Approach: Assumes that all outcomes are equally likely (like rolling a fair die).
• Relative Frequency Approach: Based on observed frequencies in past events.
• Subjective Approach: Involves personal judgment or belief about the likelihood of an event.

2.2 Calculation of Event Probabilities:


• Probability of an Event (P): A number between 0 and 1 representing the likelihood of an event.
• Complementary Probability: P(not A) = 1 − P(A).
• Addition Law: P(A∪B)=P(A)+P(B)−P(A∩B) for two events A and B.
• Multiplication Law: P(A∩B)=P(A)×P(B∣A) for the joint probability of A and B.

2.3 Conditional Probability and Bayes' Theorem:


• Conditional Probability (P(B|A)): The probability of event B occurring given that event A has occurred.
• Multiplication Law for Conditional Probability: P(A∩B)=P(A)×P(B∣A).
• Bayes' Theorem: P(A∣B) = P(B∣A) × P(A) / P(B).
• Bayes' Theorem helps update probabilities based on new information.

In simpler terms, probability helps us quantify uncertainty. Classical probability assumes equal likelihood, relative
frequency uses past data, and subjective probability involves personal judgment.
When events are not mutually exclusive, the addition law considers their overlap. The multiplication law accounts
for the joint probability of two events.

Conditional probability is about adjusting probabilities based on known information. Bayes' Theorem is a tool for
updating probabilities when new evidence is considered.
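A classic way to see Bayes' Theorem at work is a diagnostic-test sketch; all numbers below are assumed purely for illustration:

```python
# Prior: P(A), probability a person has the condition (assumed)
p_a = 0.01
# Sensitivity: P(B|A), probability of a positive test given the condition (assumed)
p_b_given_a = 0.95
# False-positive rate: P(B|not A) (assumed)
p_b_given_not_a = 0.05

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_a_given_b, 3))  # the posterior is far below the 0.95 sensitivity
```

The low posterior shows how Bayes' Theorem updates a small prior probability even after strong positive evidence.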

3.1 Correlation Analysis:


• Meaning of Correlation:
• Correlation measures the strength and direction of a linear relationship between two variables.
• Simple, Multiple, and Partial Correlation:
• Simple Correlation: Examines the relationship between two variables.
• Multiple Correlation: Involves more than two variables.
• Partial Correlation: Studies the relationship between two variables while controlling for the
influence of one or more other variables.
• Linear and Nonlinear Correlation:
• Linear Correlation: Represents a straight-line relationship.
• Nonlinear Correlation: Involves curved relationships between variables.
• Correlation and Causation:
• Correlation does not imply causation; it only indicates a statistical association between variables.
• Scatter Diagram:
• A graphical representation of paired data points on a Cartesian plane to visually observe the
relationship between variables.
• Pearson’s Coefficient of Correlation:
• Measures the strength and direction of a linear relationship.
• Formula: r = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / √[Σ(Xᵢ − X̄)² · Σ(Yᵢ − Ȳ)²]
• Correlation and Probable Error:
• Involves assessing the reliability of the correlation coefficient.
• Rank Correlation:
• A non-parametric measure of correlation, like Spearman's rank correlation coefficient.

3.2 Regression Analysis:


• Principles of Least Squares and Regression Lines:
• Least Squares: A method to minimize the sum of squared differences between observed and
predicted values.
• Regression Line: Represents the best-fit line through the data points.
• Regression Equation and Estimation:
• Regression Equation: Represents the mathematical relationship between the dependent and
independent variables.
• Estimation: Predicting values based on the regression equation.
• Properties of Regression Coefficients:
• Intercept (a): The value of the dependent variable when the independent variable is zero.
• Slope (b): The rate of change of the dependent variable for a unit change in the independent
variable.
• Relationship between Correlation and Regression Coefficient:
• The correlation coefficient (r) is the ratio of the covariance of the variables to the product of
their standard deviations; it is also the geometric mean of the two regression coefficients, r = ±√(bxy · byx).
• Standard Error of Estimate:
• Measures the average distance between observed and predicted values.
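The least-squares slope and intercept follow directly from the principles above; a minimal sketch on hypothetical data:

```python
import statistics

x = [1, 2, 3, 4, 5]             # hypothetical independent variable
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical dependent variable

xbar, ybar = statistics.mean(x), statistics.mean(y)

# Slope: b = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
     / sum((xi - xbar) ** 2 for xi in x))
a = ybar - b * xbar  # intercept: the line passes through (xbar, ybar)

def predict(xi):
    """Estimate Y from the fitted regression equation Y = a + b*x."""
    return a + b * xi

print(round(a, 2), round(b, 2))  # 0.05 1.99
```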
4.1 Index Numbers:
• Meaning and Use of Index Numbers:
• Index numbers are statistical measures used to represent changes in a group of related variables
over time.
• They help in comparing changes in various economic, financial, or social phenomena.
• Constructions of Index Numbers:
• Fixed Base and Chain Base:
• Fixed Base Index: Uses a single base period for comparison.
• Chain Base Index: Uses the previous period as the base for comparison in the next period.
• Aggregative and Average of Relatives:
• Aggregative Index: Compares the sum of current-period prices with the sum of base-period
prices, with or without weights.
• Average of Relatives Index: Computes the average of the individual price relatives (each
current price expressed as a percentage of its base price), with or without weights.
• Simple and Weighted Index Numbers:
• Simple Index: Each item in the index has equal importance.
• Weighted Index: Assigns different weights to different items based on their relative importance.
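A minimal sketch contrasting a simple aggregative index with a weighted aggregative (Laspeyres-style, using base-year quantities as weights) index; all prices and quantities are hypothetical:

```python
p0 = [10, 20, 5]     # base-year prices of three goods
p1 = [12, 24, 5]     # current-year prices
q0 = [100, 50, 200]  # base-year quantities, used as weights

# Simple aggregative index: (sum of current prices / sum of base prices) * 100
simple_index = sum(p1) / sum(p0) * 100

# Weighted aggregative (Laspeyres) index: sum(p1*q0) / sum(p0*q0) * 100
laspeyres = (sum(p * q for p, q in zip(p1, q0))
             / sum(p * q for p, q in zip(p0, q0)) * 100)

print(round(simple_index, 1), round(laspeyres, 1))  # 117.1 113.3
```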
4.2 Tests of Adequacy of Index Numbers:
• Ensure reliability and accuracy of index numbers through tests like the Time Reversal Test,
Factor Reversal Test, and Circular Test.

Base Shifting, Splicing, and Deflating:


• Base Shifting: Changing the reference base period of an index to reflect current conditions.
• Splicing: Combining two index series with different base years into a single series.
• Deflating: Adjusting nominal values for inflation to obtain real values.

Problems in the Construction of Index Numbers:


• Challenges include selecting an appropriate base year, handling new products, changes in
quality, and accurately representing the basket of goods.

In summary, tests ensure index reliability, base shifting adjusts for relevance, splicing combines
indices, and deflating accounts for inflation. Problems involve selecting a representative base year
and addressing changes in products and quality.
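Deflating in particular is a one-line calculation: real value = (nominal value / price index) × 100. A sketch with hypothetical wages and CPI figures:

```python
nominal_wages = [200, 220, 250]  # hypothetical money wages over three years
cpi = [100, 110, 125]            # consumer price index, base year = 100

# Real (deflated) wage = nominal wage / CPI * 100
real_wages = [w / p * 100 for w, p in zip(nominal_wages, cpi)]

print(real_wages)  # [200.0, 200.0, 200.0] -> purchasing power was flat
```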
4.3 Constructions of Consumer Price Indices:
• Consumer Price Index (CPI): Measures the average change over time in the prices paid by
urban consumers for a market basket of consumer goods and services.
• Construction: Based on the expenditures of a typical household, tracking changes in the
prices of goods and services they commonly purchase.

Important Share Price Indices:


• BSE SENSEX (Bombay Stock Exchange Sensitive Index):
• Represents the performance of the 30 largest and most actively traded stocks on the
BSE.
• Calculated using the free-float market capitalization-weighted methodology.
• NSE NIFTY (National Stock Exchange NIFTY 50):
• Represents the performance of the 50 largest and most liquid Indian stocks listed on the
NSE.
• Also follows the free-float market capitalization-weighted methodology.
5.1 Components of Time Series:

Time series analysis involves studying data points collected or recorded over time. The components of a
time series can be broken down into:
1. Trend:
• Represents the long-term movement or direction in the data.
• It shows whether the data is increasing, decreasing, or staying relatively constant over time.
2. Seasonal Variation:
• Refers to regular and predictable fluctuations that occur at specific intervals within a time period,
often influenced by external factors.
• Seasonal variation repeats over a fixed period, such as yearly, quarterly, or monthly.
3. Cyclical Variation:
• Represents long-term oscillations or waves that are not as regular as seasonal patterns.
• Cyclical variations are influenced by economic conditions, business cycles, or other non-seasonal
factors.
4. Irregular or Random Fluctuations (Residual):
• Unpredictable and irregular fluctuations in the time series data.
• They are often caused by unexpected events, errors in data collection, or other unpredictable
factors.

Additive and Multiplicative Models:


1. Additive Model:
• Equation: Yt=Tt+St+Ct+It
• Components are added together.
• Suitable when the magnitude of seasonal fluctuations or trends remains relatively constant over
time.
2. Multiplicative Model:
• Equation: Yt=Tt×St×Ct×It
• Components are multiplied together.
• Appropriate when the magnitude of seasonal fluctuations or trends varies with the level of the
series.
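The difference between the two models is easiest to see on a single observation with assumed component values; multiplicative seasonal, cyclical, and irregular components are ratios around 1, while additive components share the units of Y:

```python
# Multiplicative model: Yt = Tt * St * Ct * It (S, C, I are ratios around 1)
T, S, C, I = 100.0, 1.2, 1.05, 0.98
y_mult = T * S * C * I  # 20% seasonal lift, 5% cyclical lift, 2% random dip

# Additive model: Yt = Tt + St + Ct + It (components in the units of Y)
T_a, S_a, C_a, I_a = 100.0, 20.0, 5.0, -2.0
y_add = T_a + S_a + C_a + I_a

print(round(y_mult, 2), y_add)
```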

Summary:
• Time series analysis involves understanding and modeling the components of data collected over time.
• The components include trend, seasonal variation, cyclical variation, and irregular fluctuations.
• Additive and multiplicative models are two approaches used to represent the interaction of these
components in time series data.
5.2 Trend Analysis:

Trend Analysis:
• Trend is the long-term movement or direction observed in a time series.

Fitting of Trend Line using Principle of Least Squares:


• Linear Trend Line:
• Uses the equation Yt=a+bt.
• a is the intercept, and b is the slope.
• Second Degree Parabola (Quadratic Trend):
• Yt = a + bt + ct².
• Captures a curvilinear trend.
• Exponential Trend:
• Yt = a·bᵗ.
• Suitable for data with exponential growth or decay.
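Fitting the linear trend line by least squares simplifies nicely when time is coded so that Σt = 0; then a is the mean of Y and b = Σ(tY)/Σ(t²). A sketch on hypothetical annual data:

```python
years = [2019, 2020, 2021, 2022, 2023]  # hypothetical series
y = [120, 132, 145, 160, 176]

n = len(y)
t = [i - (n - 1) / 2 for i in range(n)]  # coded time: -2, -1, 0, 1, 2 (sum = 0)

# With sum(t) = 0 the normal equations reduce to:
a = sum(y) / n                                                       # intercept
b = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)  # slope

trend = [a + b * ti for ti in t]  # fitted trend values Yt = a + b*t
print(round(a, 1), b)  # 146.6 14.0
```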

Conversion of Annual Linear Trend Equation:


• To convert an annual linear trend equation Yt = a + bt to a quarterly or monthly equation,
divide the intercept a by 4 (quarterly) or 12 (monthly) and the slope b by 16 (quarterly) or
144 (monthly), since both the level of the series and the unit of time change.

Moving Averages:
• Simple Moving Average:
• Calculates the average of a specified number of data points.
• Smooths fluctuations, highlighting trends.
• Weighted Moving Average:
• Assigns different weights to different data points.
• Useful when recent values are considered more important.
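Both kinds of moving average are short list computations; a sketch on hypothetical sales figures (the weighted version below assumes the last weight applies to the most recent value in each window):

```python
def simple_moving_average(data, window):
    """Mean of each consecutive window-sized slice."""
    return [sum(data[i:i + window]) / window
            for i in range(len(data) - window + 1)]

def weighted_moving_average(data, weights):
    """Weighted mean of each slice; later weights hit more recent values."""
    total = sum(weights)
    k = len(weights)
    return [sum(d * w for d, w in zip(data[i:i + k], weights)) / total
            for i in range(len(data) - k + 1)]

sales = [10, 12, 11, 15, 14, 18, 17]  # hypothetical
sma = simple_moving_average(sales, 3)
wma = weighted_moving_average(sales, [1, 2, 3])  # recent values weighted most

print(sma[0], round(wma[0], 2))  # 11.0 11.17
```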

5.3 Seasonal Variation:

Seasonal Variation:
• Seasonal Variation refers to regular and predictable fluctuations that occur at specific intervals within a
time period.

Calculation of Seasonal Indices:


1. Simple Averages:
• Calculate the average for each season across multiple years.
• Seasonal index = (Seasonal Average / Overall Average) * 100.
2. Ratio to Trend:
• Divide the observed value by the corresponding trend value.
• Seasonal index = (Observed Value / Trend Value) * 100.
3. Ratio to Moving Averages:
• Divide the observed value by the moving average for the corresponding season.
• Seasonal index = (Observed Value / Moving Average) * 100.
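The simple-averages method above can be sketched in a few lines; the quarterly figures below are hypothetical, and by construction the four indices average to 100:

```python
# Quarterly values for three years: rows = years, columns = Q1..Q4
data = [
    [20, 30, 40, 30],
    [22, 33, 44, 33],
    [24, 36, 48, 36],
]

# Average each quarter across the years, then express it
# as a percentage of the overall (grand) average.
seasonal_avg = [sum(year[q] for year in data) / len(data) for q in range(4)]
overall_avg = sum(seasonal_avg) / 4
indices = [s / overall_avg * 100 for s in seasonal_avg]

print([round(i, 1) for i in indices])  # [66.7, 100.0, 133.3, 100.0]
```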

Using Seasonal Indices:


• Adjust the observed values for seasonal effects using the calculated indices.
• Helps in deseasonalizing data, making it easier to identify underlying trends and patterns.

Familiarization with Software:


• Utilize statistical software to automate and streamline calculations.
• Functions in software tools facilitate the formation of frequency distributions, calculation of averages,
measures of variation, and correlation and regression coefficients.
