This document provides an introduction to basic statistical concepts taught in a Business Statistics course. It includes definitions of key terms like population, sample, qualitative and quantitative variables, and scales of measurement. It also outlines common measures of central tendency like mean, median and mode, and their properties. The document discusses descriptive statistics and presents a case study example to demonstrate how to construct a frequency table and relative frequency distribution for a set of vehicle selling price data. Finally, it mentions common graphs like histograms that can be used to present frequency distributions.

Lecture 1 & 2

Contents:
Basic Statistical Concepts
Summarisation of Data
Frequency Distribution
Measures of Central Tendency
Measures of Dispersion
Relative Dispersion, Skewness

Shwetank Rewatkar

Using Statistics

Malcolm Forbes, a businessman and a keen hot-air balloon enthusiast, lost his way and landed in the middle of a cornfield. He saw a man running towards him and had the following conversation:

Forbes: Sir, can you tell me where I am?
Man: Certainly, you are in a basket in a field of corn.
Forbes: Sir, you must be a statistician.
Man: That's amazing. How did you know?
Forbes: Easy. Your information is concise, precise and absolutely useless!

A GOOD STUDENT of Statistics should ensure that the information resulting from a good statistical analysis is always CONCISE, often PRECISE and never USELESS.

Types of Variables

A. Qualitative or attribute variable: the characteristic being studied is generally nonnumeric.
EXAMPLES: Gender, religious affiliation, type of automobile owned, state of birth, eye color.
B. Qualitative variables can also be described by numbers, although the numbering is arbitrary.
EXAMPLES: Car registration number; state of birth coded as 1, 2, 3, 4, etc.
C. Quantitative variable: can be described by a number for which arithmetic operations such as averaging make sense.
EXAMPLES: Balance in your mobile account, minutes remaining in class, or number of children in a family.
D. A quantitative variable can be either discrete or continuous.

Summary of Types of Variables (figure)

Four Scales of Measurement (1 = Weakest, 4 = Strongest)

1 Nominal scale - data that is classified into categories and cannot be arranged in any particular order. Numbers are just labels for groups or classes. Nominal stands for NAME.
EXAMPLES: eye color, gender, religious affiliation, Platform number.
2 Ordinal scale involves data arranged in some order according to their relative size or quality. The differences between data values cannot be determined or are meaningless. We know one is better than the other but how much better is not known.
EXAMPLE: During a taste test of 4 soft drinks, Coca Cola was ranked number 1, Sprite number 2, Seven-up number 3, and Orange Mirinda number 4.
3 Interval scale - similar to the ordinal scale, with the additional property that meaningful amounts of differences between data values can be determined. There is no natural zero point.
EXAMPLE: Time of day. 10:00 a.m. is not twice 5:00 a.m., but the interval between 00:00 and 10:00 a.m. is twice the interval between 00:00 and 5:00 a.m.
4 Ratio scale - the interval scale with an inherent zero starting point. Differences and ratios are meaningful for this level of measurement.
EXAMPLES: Monthly income of surgeons, or distance traveled by manufacturers' representatives per month.

Population v/s Sample

A population is a collection of all possible individuals, objects, or measurements of interest. The population is also called the UNIVERSE. Greek letters, like μ or σ, are used for the population and are termed population parameters.
A sample is a portion, or part, or subset of measurements selected from the population of interest. Roman letters, like x and s, are used to describe sample statistics.

Types of Statistics: Descriptive Statistics

Data and Data Collection: A set of measurements obtained on some variable is called a data set.
Descriptive Statistics - methods of organizing, summarizing, and presenting data in an informative way. Generally when the entire population space is considered, tabulating & presenting the data is a challenge.
Inferential Statistics: A decision, estimate, prediction, or generalization about a population, based on a sample.

Problems To Be Solved

Percentiles & Quartiles.
Measures of Central Tendency: Mean (Arithmetic, Geometric, Harmonic).
Mean for individual, discrete, continuous distribution.
Mean from assumed mean.
Median for individual, discrete, continuous distribution.
Mode for individual, discrete, continuous distribution.
Measures of Dispersion: Range. Mean Deviation. Standard Deviation. Coefficient of Variation. Combined Standard Deviation.
Skewness, Test for Skewness.

Requisites of a Good Measure of Central Tendency

It should be rigidly defined, which means that it should be calculated and interpreted in the same way by everyone
It should be based on all values of the data
It should not be unduly affected by the extreme values
It should be amenable for further algebraic treatment
It should be amenable to sampling, by which we mean that the results obtained by various samples should be similar
It should be simple to compute.

Some Measures of Central Tendency

Arithmetic Mean: It is a mathematical average and is obtained by dividing the sum of the observations by the number of observations.
Median: It is the VALUE of the middle observation of the ordered array and is a positional average.
Quartiles, Deciles, Percentiles: These are also positional measures; they divide the series into four parts, ten parts and 100 parts respectively.
Mode: The value of the data that occurs most frequently.
Geometric Mean: It is a specialized average and is applicable when the quantities to be averaged are drawn from situations following an exponential law of growth or decline.
Harmonic Mean: It is used to average rates.
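
The measures listed above can be tried out with Python's standard `statistics` module. This is only a minimal sketch: the observations, growth factors and speeds below are hypothetical values invented for illustration, not data from the lecture.

```python
import statistics

# Hypothetical observations, invented purely for illustration.
observations = [2, 4, 4, 5, 7, 9, 11]

arithmetic_mean = statistics.mean(observations)       # sum of values / number of values
median = statistics.median(observations)              # middle value of the sorted data
mode = statistics.mode(observations)                  # most frequent value
quartiles = statistics.quantiles(observations, n=4)   # three cut points: Q1, Q2, Q3

# Geometric mean: average growth factor over three periods (hypothetical factors).
growth_factors = [1.10, 1.20, 1.50]
geometric_mean = statistics.geometric_mean(growth_factors)

# Harmonic mean: average speed over two equal distances (hypothetical speeds).
speeds = [40, 60]
harmonic_mean = statistics.harmonic_mean(speeds)

print("Arithmetic mean:", arithmetic_mean)
print("Median:", median)
print("Mode:", mode)
print("Quartiles:", quartiles)
print("Geometric mean:", round(geometric_mean, 4))
print("Harmonic mean:", harmonic_mean)
```

Note that the harmonic mean of the two speeds is lower than their arithmetic mean of 50, which is why it is the appropriate average when equal distances are covered at different rates.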

Arithmetic Mean

Merits
Easy to understand and simple to calculate
It is based on all items of the series
Rigidly defined by a mathematical formula
It is capable of further algebraic treatment
It has sampling stability and is least affected by sampling fluctuations
Arrangement of items is not required

Demerits
It is affected by extreme values; thus, for distributions where values are concentrated at the small or large end, the mean is not an ideal representative
For open ended distributions mean cannot be calculated with accuracy
Mean is not useful for studying qualitative phenomena like beauty, intelligence, honesty, etc
Mean does not have a life of its own: it may not match any actual observation. For example, an average of 3.6 children per family in India describes no real family
Mean averages out the positive and negative deviations, which can hide how widely the values are spread.

Median

Merits
Useful in open-ended series as it is based on position and not on the values.
Easier to compute than the mean in case of unequal class intervals.
It is not affected by extreme values.
Suitable in case of qualitative data.
It minimises the total absolute deviations.

Demerits
Requires arrangement of data.
It is not based on all the items of the series.
Incapable of any algebraic treatment, and combined medians cannot be obtained.
The assumption that values are uniformly distributed within the median class is not always true.

Mode

Merits
In certain situations the mode is the only suitable average, e.g. size of shoes, garments, wages, etc.
It is not affected by extreme values.
It can be used for qualitative phenomena.
It indicates the point of maximum concentration in case of highly skewed distributions.

Limitations
In case of bimodal or multimodal series, the mode cannot be uniquely defined.
It is incapable of further algebraic treatment.
It is not based on all the items of the series.
It is not rigidly defined because different formulae will give different answers.
Its value is affected by the size of the class interval.

Case Study: Descriptive Statistics

Ms. Kathryn Ball of AutoUSA wants to develop tables, charts, and graphs to show the typical selling price on various dealer lots. The table on the right reports only the price of the 80 vehicles sold last month at Whitner Autoplex.

Constructing a Frequency Table - Example

Step 1: Decide on the number of classes. A useful recipe to determine the number of classes (k) is the "2 to the k rule": choose the smallest k such that 2^k > n. There were 80 vehicles sold, so n = 80. If we try k = 6, which means we would use 6 classes, then 2^6 = 64, somewhat less than 80. Hence, 6 is not enough classes. If we let k = 7, then 2^7 = 128, which is greater than 80. So the recommended number of classes is 7.

Step 2: Determine the class interval or width. The formula is i ≥ (H - L)/k, where i is the class interval, H is the highest observed value, L is the lowest observed value, and k is the number of classes. ($35,925 - $15,546)/7 = $2,911. Round up to some convenient number, such as a multiple of 10 or 100. Use a class width of $3,000.
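
Steps 1 and 2 can be checked with a short Python sketch. The function names and the `round_to` parameter are illustrative assumptions; rounding up to the next multiple of 1,000 is one convenient choice that reproduces the $3,000 width used in the example.

```python
import math


def number_of_classes(n):
    """Return the smallest k with 2**k > n (the '2 to the k' rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def class_width(highest, lowest, k, round_to=1000):
    """Return the raw width (H - L) / k and the width rounded up.

    round_to is an assumed 'convenient number'; the lecture only says
    to round up to something convenient.
    """
    raw = (highest - lowest) / k
    rounded = math.ceil(raw / round_to) * round_to
    return raw, rounded


k = number_of_classes(80)                        # 80 vehicles sold
raw_width, width = class_width(35925, 15546, k)  # H and L from the case study

print("Number of classes:", k)
print("Raw class width:", round(raw_width))
print("Class width used:", width)
```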
Step 3: Set the individual class limits.
Step 4: Tally the vehicle selling prices into the classes. Step 5: Count the number of items in each class.
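
Steps 3 to 5 can be sketched in Python as follows. The 80 actual selling prices are not reproduced in this text, so the short price list is hypothetical (only its lowest and highest values, $15,546 and $35,925, come from the case study). The lower limit of $15,000 for the first class is also an assumption; the class width of $3,000 and the 7 classes come from Steps 1 and 2.

```python
from collections import Counter

# Hypothetical selling prices in dollars, invented for illustration.
# Only the minimum (15546) and maximum (35925) come from the case study.
prices = [15546, 16900, 19500, 20100, 21750, 22400, 23900, 24800, 27300, 35925]

LOWER = 15000   # assumed lower limit of the first class
WIDTH = 3000    # class width from Step 2
K = 7           # number of classes from Step 1


def class_index(price):
    """Index of the class 'lower up to upper' that contains the price."""
    return (price - LOWER) // WIDTH


# Step 4: tally each price into its class.
tally = Counter(class_index(p) for p in prices)

# Step 5: count the number of items in each class.
frequency_table = [
    (LOWER + i * WIDTH, LOWER + (i + 1) * WIDTH, tally.get(i, 0))
    for i in range(K)
]

for lower, upper, frequency in frequency_table:
    print(f"${lower:,} up to ${upper:,}: {frequency}")
```

Each class includes its lower limit and excludes its upper limit, so a price of exactly $18,000 would fall in the second class.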

Relative Frequency Distribution

To convert a frequency distribution to a relative frequency distribution, each of the class frequencies is divided by the total number of observations.

Graphic Presentation of a Frequency Distribution

The three commonly used graphic forms are:
Histograms
Frequency polygons
Cumulative frequency distributions
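
As a small sketch of the relative frequency conversion and of the cumulative frequency distribution named above, the Python lines below work on a list of class frequencies. The frequencies are hypothetical, since the actual frequency table is not reproduced in this text.

```python
from itertools import accumulate

# Hypothetical class frequencies for 7 classes, invented for illustration.
class_frequencies = [2, 2, 3, 1, 1, 0, 1]

total = sum(class_frequencies)

# Relative frequency: class frequency divided by the total number of observations.
relative_frequencies = [f / total for f in class_frequencies]

# Cumulative frequency: running total of the class frequencies.
cumulative_frequencies = list(accumulate(class_frequencies))

for f, rel, cum in zip(class_frequencies, relative_frequencies, cumulative_frequencies):
    print(f"frequency={f}  relative={rel:.2f}  cumulative={cum}")
```

The relative frequencies always add up to 1, and the last cumulative frequency always equals the total number of observations.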

Histogram

A histogram for a frequency distribution based on quantitative data is very similar to the bar chart showing the distribution of qualitative data. The classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars.

Histogram Using Excel (figure)

Frequency Polygon

A frequency polygon also shows the shape of a distribution and is similar to a histogram. It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies.
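
The points that a frequency polygon connects can be computed directly from the class limits. The sketch below assumes classes starting at $15,000 with a width of $3,000 and uses hypothetical class frequencies; it also prints a rough text histogram in which each `#` stands for one observation.

```python
LOWER = 15000   # assumed lower limit of the first class
WIDTH = 3000    # class width from the frequency table example

# Hypothetical class frequencies, invented for illustration.
class_frequencies = [2, 2, 3, 1, 1, 0, 1]

# Class midpoint = lower class limit + half the class width.
midpoints = [LOWER + i * WIDTH + WIDTH / 2 for i in range(len(class_frequencies))]

# A frequency polygon joins these (midpoint, frequency) points with line segments.
polygon_points = list(zip(midpoints, class_frequencies))

# Rough text histogram: bar length represents the class frequency.
for midpoint, frequency in polygon_points:
    print(f"{midpoint:>8,.0f} | " + "#" * frequency)
```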

Cumulative Frequency Distribution (figure)

Standard Deviation

Merits
It is based on all items of the distribution.
It is amenable to algebraic treatment.
It is least affected by fluctuations in sampling.
It facilitates the calculation of the combined standard deviation of two or more groups.
It provides a unit of measurement for the normal distribution.

Demerits
It cannot be used for comparing the variability of two or more series of observations given in different units.
It is difficult to compute as compared with other measures of dispersion.
It is strongly affected by extreme values, because values far from the mean are given more weight than values near it.
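
The first demerit can be illustrated with a short Python sketch. The series below is hypothetical, and the second series is simply the same data expressed in a unit 100 times smaller. The standard deviation changes with the unit, while the coefficient of variation (standard deviation divided by the mean, a measure of relative dispersion) does not. The population standard deviation (`statistics.pstdev`) is used here as an assumption; the lecture does not say which form it intends.

```python
import statistics

# Hypothetical series, invented for illustration.
series_a = [2, 4, 4, 4, 5, 5, 7, 9]

# The same observations expressed in a unit 100 times smaller.
series_b = [value * 100 for value in series_a]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unit-free)."""
    return statistics.pstdev(values) / statistics.mean(values)


sd_a = statistics.pstdev(series_a)
sd_b = statistics.pstdev(series_b)
cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)

print("Standard deviations:", sd_a, sd_b)
print("Coefficients of variation:", cv_a, cv_b)
```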