This document provides an overview of basic statistics concepts, including measures of central tendency (average) and measures of variation (dispersion). It discusses the mean, median and mode as measures of central tendency, together with their properties. For measures of variation, it covers the range, mean deviation, variance and standard deviation, explaining the definition, merits and demerits of each. It also describes how to compute the standard deviation using a shortcut method and how to calculate the mean and standard deviation of two combined data sets.
Copyright: Attribution Non-Commercial (BY-NC)
COMP Department, HKUST

Statistics
Statistics is the science dealing with the collection, organization, analysis and interpretation of numerical data. We will focus on the analysis of data. Two topics will be discussed: measures of central tendency (average) and measures of variation (dispersion).

Measures of central tendency
Motivation: We have a large amount of data in hand. How can we reduce it to a form that humans can understand? All values lie between two extreme values, the smallest and the largest. Can we use a single number to represent such a data set?
Goal: Compute the "average" of the data set.

Types of averages
There are several ways to measure central tendency, each using a different average: the arithmetic mean (or simply, the mean), the median and the mode.

Arithmetic mean
Definition: Suppose we have N numbers x1, x2, …, xN. The arithmetic mean is defined as
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Merits and demerits of the mean
Merits: The mean is well understood by most people, and it is easy to compute.
Demerits: It is sensitive to extreme values. For example, for X = {1, 1, 1, 1, 2, 9}, mean(X) = 2.5, which does not reflect the actual central tendency of this set, since five of the six values are at most 2.

Median
Definition: The median divides the numbers into two halves, so that the number of items below it is the same as the number of items above it. Suppose we have n numbers x1, x2, …, xn.
Sorting them so that $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, the median is defined as
$$\text{Median} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even} \end{cases}$$

Merits and demerits of the median
Merits: It is another widely used measure of central tendency, and it is not influenced by extreme values. For X = {1, 1, 1, 1, 2, 9}, the median is 1, no matter how large the last value is.
Demerits: When the number of items is small, the median may not be representative, because it is a positional average.

Mode
Definition: The mode is the most frequent value in a set of numbers. Example: X = {1, 1, 3, 3, 3, 3, 4, 5, 6}, Mode(X) = 3.

Merits and demerits of the mode
Merits: It represents the most typical value in the distribution.
Demerits: It may not be uniquely defined. Example: X = {1, 1, 2, 2}, Mode(X) = 1 or 2.

Measures of variation (dispersion)
Motivation: Measures of central tendency alone are sometimes not enough, because two data sets that look very different from each other may have the same average. For example, {5, 5, 5} and {0, 5, 10} both have mean 5. How can we distinguish them?
Goal: Compute a value, called the "variation" or "dispersion", that characterizes how the data spread on each side of the average.

Types of variation (dispersion)
There are several ways to measure the variation of a data set: the range, the mean deviation, and the variance and standard deviation.

Range
Definition: The range is the difference between the largest and the smallest numbers. For N numbers X = {x1, x2, …, xN}, Range(X) = max(X) - min(X).
Merits: It is meaningful in some scenarios, easy to compute and well understood.
Demerits: It is greatly affected by extreme values.

Mean deviation
Definition: First, sum the absolute differences between every item and the mean of the distribution. Then divide the sum by the number of items. For N numbers x1, x2, …, xN with mean $\bar{x}$, the mean deviation is defined as
$$\text{MD} = \frac{1}{N}\sum_{i=1}^{N} \lvert x_i - \bar{x} \rvert$$

Merits and demerits of the mean deviation
Merits: It is relatively easy to understand, and it is less affected by extreme values than the standard deviation.
Demerits: There is no shortcut method to compute it. Given only the means and mean deviations of two sets, we cannot compute the mean deviation of the combined set.
We need to know every item in the combined set.

Variance and standard deviation
Definition: Suppose we have N numbers x1, x2, …, xN with mean $\bar{x}$. The variance σ² is defined as
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$$
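The measures defined so far (mean, median, mode, range, mean deviation and variance) can be tried on the example sets from these slides with a minimal Python sketch. It assumes the population forms with N as the denominator, as in the definitions here; the function names (`mean`, `median`, `modes`, `value_range`, `mean_deviation`, `variance`) are illustrative and not taken from the original material.

```python
# Minimal sketch of the measures above (population forms, denominator N).
# Function names are illustrative, not from the original slides.
from collections import Counter
from math import fsum


def mean(xs):
    """Arithmetic mean: the sum of the values divided by N."""
    return fsum(xs) / len(xs)


def median(xs):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def modes(xs):
    """All most frequent values, as a sorted list, since the mode may not be unique."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def value_range(xs):
    """Range(X) = max(X) - min(X); named to avoid shadowing the built-in range."""
    return max(xs) - min(xs)


def mean_deviation(xs):
    """Average absolute difference between each item and the mean."""
    m = mean(xs)
    return fsum(abs(x - m) for x in xs) / len(xs)


def variance(xs):
    """Average squared difference between each item and the mean (sigma squared)."""
    m = mean(xs)
    return fsum((x - m) ** 2 for x in xs) / len(xs)


if __name__ == "__main__":
    skewed = [1, 1, 1, 1, 2, 9]
    print(mean(skewed), median(skewed))        # 2.5 1.0
    print(modes([1, 1, 3, 3, 3, 3, 4, 5, 6]))  # [3]
    print(modes([1, 1, 2, 2]))                 # [1, 2]
    # Same mean (5), very different spread:
    print(variance([5, 5, 5]), variance([0, 5, 10]))
```

Returning a list from `modes` makes the "may not be uniquely defined" demerit explicit instead of silently picking one value. For real work, Python's standard `statistics` module already provides `mean`, `median`, `multimode` and `pvariance`.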
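The square root of the variance, σ, is called the standard deviation. In the literature, N - 1 is sometimes used as the denominator in the computation of σ, instead of N.

Merits and demerits of the standard deviation
Merits: The standard deviation is the most common measure of variation, and there is a shortcut method to compute it.
Demerits: It is more affected by extreme values than the mean deviation. The reason is that each deviation is squared, so a single large deviation contributes disproportionately to σ².

Standard deviation: shortcut method
Instead of computing σ by definition, there is a shortcut method. How can it be derived? Expand the square and use $\sum_{i=1}^{N} x_i = N\bar{x}$, where $\bar{x}$ is the mean:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \frac{1}{N}\left(\sum_{i=1}^{N} x_i^2 - 2\bar{x}\sum_{i=1}^{N} x_i + N\bar{x}^2\right) = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar{x}^2$$
so $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar{x}^2}$, which needs only the sums of $x_i$ and of $x_i^2$.

Combined standard deviation
Let x1, x2, …, xm and y1, y2, …, yn be two sets of data with means $\bar{x}$ and $\bar{y}$ and standard deviations σx and σy respectively. Then the combined mean $\bar{z}$ and the combined standard deviation σz can be computed as follows:
$$\bar{z} = \frac{m\bar{x} + n\bar{y}}{m + n}$$
$$\sigma_z^2 = \frac{m\left(\sigma_x^2 + (\bar{x} - \bar{z})^2\right) + n\left(\sigma_y^2 + (\bar{y} - \bar{z})^2\right)}{m + n}$$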
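The shortcut method for σ and the combined mean and standard deviation can be checked numerically against direct computation. The sketch below assumes the population forms (denominator N) used in these slides; the function names are illustrative, and the second data set `ys` is made up for the example (its mean is 5 and its σ is 2).

```python
# Sketch: shortcut standard deviation and combining two data sets.
# Population forms (denominator N); function names are illustrative.
from math import fsum, isclose, sqrt


def std_by_definition(xs):
    """sigma = sqrt((1/N) * sum((x_i - mean)^2))."""
    n = len(xs)
    m = fsum(xs) / n
    return sqrt(fsum((x - m) ** 2 for x in xs) / n)


def std_shortcut(xs):
    """sigma = sqrt((1/N) * sum(x_i^2) - mean^2), needing only sum(x) and sum(x^2)."""
    n = len(xs)
    s1 = fsum(xs)
    s2 = fsum(x * x for x in xs)
    m = s1 / n
    # Rounding can make the difference slightly negative when sigma is ~0.
    return sqrt(max(s2 / n - m * m, 0.0))


def combine(m, mean_x, std_x, n, mean_y, std_y):
    """Combined mean and std from each set's size, mean and std only."""
    mean_z = (m * mean_x + n * mean_y) / (m + n)
    var_z = (m * (std_x ** 2 + (mean_x - mean_z) ** 2)
             + n * (std_y ** 2 + (mean_y - mean_z) ** 2)) / (m + n)
    return mean_z, sqrt(var_z)


if __name__ == "__main__":
    xs = [1, 1, 1, 1, 2, 9]          # example set from the slides
    ys = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up second set (mean 5, sigma 2)
    print(std_by_definition(xs), std_shortcut(xs))  # agree up to rounding
    mean_z, std_z = combine(len(xs), fsum(xs) / len(xs), std_by_definition(xs),
                            len(ys), fsum(ys) / len(ys), std_by_definition(ys))
    # The combined result matches computing directly on all 14 items.
    assert isclose(mean_z, fsum(xs + ys) / len(xs + ys))
    assert isclose(std_z, std_by_definition(xs + ys))
    print(mean_z, std_z)
```

The shortcut needs only one pass over the data, but subtracting two large, nearly equal quantities can lose precision when the mean is large compared with σ, so the definition-based form is the safer default in floating point. In Python's standard `statistics` module, `pstdev` uses the N denominator and `stdev` uses N - 1.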