What Is Statistics?: A Science of
A science of:
Collecting numerical information (data)
Evaluating the numerical information (classify, summarize, organize, analyze)
Information
Information: Knowledge communicated concerning some particular fact.
Sample
A sample is a set of data drawn from the population (part of a population).
Potentially very large, but smaller than the population.
E.g. a sample of 1000 voters exit polled on election day.
Variable
A characteristic that is observed or manipulated
Can take on different values
Independent variables
Precede dependent variables in time
Are often manipulated by the researcher
The treatment or intervention that is used in a study
Dependent variables
What is measured as an outcome in a study
Values depend on the independent variable
[Diagram: a sample is a subset of the population; a parameter describes the population, a statistic describes the sample.]
Populations
A population is the group from which a sample is drawn
e.g., headache patients in a chiropractic office; automobile crash victims in an emergency room
In research, it is usually not practical to include all members of a population
Thus, a sample (a subset of a population) is taken
Evidence-based Chiropractic
Random samples
Subjects are selected from a population so that each individual has an equal chance of being selected
Random samples tend to be representative of the source population
Non-random samples may not be representative
They may be biased regarding age, severity of the condition, socioeconomic status, etc.
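The equal-chance selection described above can be sketched in a few lines of Python. This is only an illustration: the population of 500 patient IDs and the sample size of 50 are made-up values, not figures from these slides.

```python
import random

# Hypothetical population: 500 patient IDs (made up for illustration).
population = list(range(1, 501))

# A fixed seed makes the draw repeatable; in practice the seed would vary.
rng = random.Random(42)

# Simple random sample without replacement: every patient has the same
# chance of being selected, and nobody is selected twice.
sample = rng.sample(population, k=50)

print(len(sample), "patients selected out of", len(population))
```

Because the draw is made without replacement, the sample is always a subset of the population, which matches the definition of a sample given earlier.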
Descriptive Statistics
Descriptive statistics are methods of organizing, summarizing, and presenting data in a convenient and informative way.
These methods include graphical techniques (Chapter 2) and numerical techniques (Chapter 4).
The actual method used depends on what information we would like to extract. Are we interested in:
Your weight each Monday when you are on a 6-month diet?
The amount of medication in blood pressure pills?
The starting salaries for business students from TCU, SMU, and UTA?
Others?
Descriptive Statistics (cont.)
Central tendency describes the location of the middle of the data
Variability is the extent to which values are spread above and below the middle values
a.k.a. dispersion
Descriptive Statistics
Measures of Central Tendency
Mean, Median, Mode
Measures of Dispersion
Range, Variance, Standard Deviation
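As a rough sketch, all six measures listed above can be computed with Python's standard `statistics` module. The weekly weights below are hypothetical values chosen for illustration, and the variance shown is the sample variance (dividing by n - 1), which is an assumption about which definition is intended.

```python
import statistics

# Hypothetical weekly weights in kg (made up for illustration).
weights = [82, 81, 81, 80, 79, 79, 79, 78]

# Measures of central tendency
mean = statistics.mean(weights)
median = statistics.median(weights)
mode = statistics.mode(weights)

# Measures of dispersion
data_range = max(weights) - min(weights)
variance = statistics.variance(weights)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(weights)      # square root of the sample variance

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", data_range, "variance:", variance, "std dev:", std_dev)
```

For a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by N instead.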
Mean:
Most representative
Arithmetic mean or average
Examples: mean birth weight, mean DBP (diastolic blood pressure)
Merits:
Easy to Understand and compute
Limitations:
Affected by extreme values
Not useful for the study of qualities like intelligence, honesty and character
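The sensitivity of the mean to extreme values can be seen in a small sketch. The salary figures are hypothetical (in thousands) and were picked only to make the effect obvious.

```python
import statistics

# Hypothetical starting salaries in thousands (made up for illustration).
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [500]  # add one extreme value

mean_before = statistics.mean(salaries)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(salaries)
median_after = statistics.median(with_outlier)

# The single extreme value pulls the mean far upward,
# while the median barely moves.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```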
$$\bar{X} = \frac{\sum X}{n} \qquad \mu = \frac{\sum X}{N}$$
$\bar{X}$ (X bar) refers to the mean of a sample and $\mu$ refers to the mean of a population
$\sum X$ is a command that adds all of the X values
$n$ is the total number of values in the series of a sample and $N$ is the same for a population
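The formula for the mean translates directly into code: add all of the X values, then divide by how many there are. The function name `mean_from_formula` and the birth weights (in grams) are hypothetical, introduced only for this sketch.

```python
def mean_from_formula(values):
    """Sum all X values and divide by the number of values."""
    if not values:
        raise ValueError("the mean of an empty series is undefined")
    total = 0
    for x in values:   # the summation step
        total += x
    return total / len(values)  # divide by n (sample) or N (population)


# Hypothetical birth weights in grams (made up for illustration).
birth_weights = [3100, 3400, 2900, 3600]
sample_mean = mean_from_formula(birth_weights)
print("mean birth weight:", sample_mean)
```

The arithmetic is the same for a sample and a population; only the symbol and the set of values being summed differ.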
Median:
Arrange the data in ascending or descending order.
The middle value is the median.
Not influenced by extreme values
If the number of cases in the sample is an even number, there will be two middle scores and the median is halfway between these two middle scores (e.g. for 1 2 3 4 5 6, the middle scores are 3 and 4, so the median is 3.5).
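The procedure above (sort, then take the middle value or the halfway point between the two middle values) might be written as follows. The function name `median_of` is hypothetical; Python's built-in `statistics.median` gives the same result.

```python
def median_of(values):
    """Return the middle value of the data after sorting."""
    if not values:
        raise ValueError("the median of an empty series is undefined")
    ordered = sorted(values)       # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                 # odd count: a single middle value
        return ordered[mid]
    # even count: halfway between the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([1, 2, 3, 4, 5, 6]))  # even number of cases
print(median_of([5, 1, 3]))           # odd number of cases
```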
Mode:
Most commonly occurring observation.
Not unique: a data set can have more than one mode.
Not very frequently used.
Used in investigation of an epidemic.
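To see why the mode is not unique, consider a data set in which two values tie for most frequent. The sketch below uses `statistics.multimode` (available from Python 3.8); the data values are made up for illustration.

```python
import statistics

# Hypothetical observations in which 2 and 3 both occur twice.
observations = [1, 2, 2, 3, 3, 4]

# multimode returns every value that ties for the highest frequency.
modes = statistics.multimode(observations)
print("modes:", modes)
```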
Inferential Statistics
Descriptive statistics describe the data set that's being analyzed, but don't allow us to draw any conclusions or make any inferences about the data, other than visual "It looks like ..." type statements.
Hence we need another branch of statistics: inferential statistics.
Inferential statistics is also a set of methods, but it is used to draw conclusions or inferences about characteristics of populations based on data from a sample.
Statistical Inference
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
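[Diagram: a sample is drawn from the population; a statistic computed from the sample is used to make an inference about the population parameter.]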
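The relationship between a statistic and a parameter can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the population is 10,000 synthetic values drawn from a normal distribution with mean 70 and standard deviation 10, and the sample size is 200.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population (synthetic values, made up for illustration).
population = [rng.gauss(70, 10) for _ in range(10_000)]

# Parameter: the mean of the whole population (usually unknown in practice).
parameter = statistics.fmean(population)

# Statistic: the mean of a random sample, used to estimate the parameter.
sample = rng.sample(population, k=200)
statistic = statistics.fmean(sample)

print("population mean (parameter):", round(parameter, 2))
print("sample mean (statistic):    ", round(statistic, 2))
```

The two numbers are close but not identical, which is why inference comes with measures of reliability rather than certainty.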
Statistical Inference
Rationale:
Large populations make investigating each member impractical and expensive, and even observing 100% of a population has been shown to be imperfect.
It is easier and cheaper to take a sample and make inferences about the population from the sample.
However:
Such conclusions and estimates are not always going to be correct.
For this reason, we build measures of reliability into statistical inference, namely the confidence level and the significance level.
When the purpose of the statistical inference is to test a claim about a population parameter, the significance level measures how frequently a true claim is accidentally rejected.
E.g. a 5% significance level means that, in the long run, a true claim will be rejected 5% of the time.
Coin flips should result in 50% heads, on average.
A 5% significance level implies that we run a 5% risk of concluding that heads do not occur 50% of the time, on average (even though everyone in this room most likely believes that heads do occur 50% of the time).
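The long-run meaning of a 5% significance level can be checked by simulation. The sketch below flips a fair coin many times and counts how often the true claim "heads occur 50% of the time" is rejected. The specific test is an assumption introduced for illustration, not something from these slides: a two-sided z-test using the normal approximation with a critical value of 1.96. The function name `rejects_fair_coin` and the choices of 400 flips and 1,000 repetitions are likewise hypothetical.

```python
import math
import random


def rejects_fair_coin(heads, flips, z_critical=1.96):
    """Two-sided z-test (normal approximation) of the claim P(heads) = 0.5."""
    expected = flips * 0.5
    std_dev = math.sqrt(flips * 0.5 * 0.5)
    z = abs(heads - expected) / std_dev
    return z > z_critical


rng = random.Random(1)  # fixed seed so the sketch is repeatable
flips, trials = 400, 1000

rejections = 0
for _ in range(trials):
    # The coin really is fair, so the claim being tested is true.
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    if rejects_fair_coin(heads, flips):
        rejections += 1

rejection_rate = rejections / trials
print("share of experiments that rejected the true claim:", rejection_rate)
```

The printed share should land near 5%, though not exactly on it, since the head count is discrete and the number of repetitions is finite.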