
What is Statistics?

A science of:
Collecting numerical information (data)
Evaluating the numerical information (classify, summarize, organize, analyze)
Drawing conclusions based on evaluation

What is Statistics? Where does this Data come from?

Statistics is a way to get information from data


Data -> Statistics -> Information
Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.

Statistics is a tool for creating new understanding from a set of numbers.


Definitions: Oxford English Dictionary


Key Statistical Concepts


Population: A population is the group of all items of interest to a statistics practitioner. It is frequently very large, sometimes infinite.
E.g. all 5 million Florida voters who voted in today's election.

Sample: A sample is a set of data drawn from the population (part of a population). It is potentially very large, but smaller than the population.
E.g. a sample of 1000 voters exit-polled on election day.

Common statistical terms


Data
Measurements or observations of a variable

Variable
A characteristic that is observed or manipulated
Can take on different values

Independent variables
Precede dependent variables in time
Are often manipulated by the researcher
The treatment or intervention that is used in a study

Dependent variables
What is measured as an outcome in a study
Values depend on the independent variable

Key Statistical Concepts


Parameter: A descriptive measure of a population.
E.g. the true percent of Florida voters who will vote for Mary Poppins.

Statistic: A descriptive measure of a sample.
E.g. of the 1000 voters exit-polled, 550 indicated that they voted for Mary Poppins, i.e. 550/1000 = 0.55 or 55%.

Parameters: Summary data from a population
Statistics: Summary data from a sample

Key Statistical Concepts


[Diagram: the Sample is a subset of the Population; the Population has a Parameter, the Sample has a Statistic]

Populations have Parameters, Samples have Statistics.



Populations
A population is the group from which a sample is drawn
e.g., headache patients in a chiropractic office; automobile crash victims in an emergency room

In research, it is usually not practical to include all members of a population
Thus, a sample (a subset of a population) is taken

Evidence-based Chiropractic

Random samples
Subjects are selected from a population so that each individual has an equal chance of being selected
Random samples tend to be representative of the source population
Non-random samples may not be representative
They may be biased regarding age, severity of the condition, socioeconomic status, etc.
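
Simple random selection can be sketched in Python. The population of 200 patient ID numbers and the sample size of 10 are made-up values for illustration; `random.sample` from the standard library draws without replacement, so every individual has the same chance of being selected.

```python
import random

# Hypothetical population: ID numbers for 200 headache patients (made up for illustration).
population = list(range(1, 201))

# Fix the seed so the sketch is reproducible; remove this line for a fresh draw each run.
random.seed(42)

# Draw 10 distinct patients; each patient has the same chance of being selected.
sample = random.sample(population, k=10)

print(sorted(sample))
```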


Descriptive Statistics
Descriptive statistics are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:
Graphical Techniques (Chapter 2), and
Numerical Techniques (Chapter 4).
The actual method used depends on what information we would like to extract. Are we interested in:
Your weight each Monday when you are on a 6-month diet?
The amount of medication in blood pressure pills?
The starting salaries for business students from TCU, SMU, and UTA?
Others?

Descriptive statistics help to answer these questions.



Descriptive statistics (DSs)


A way to summarize data from a sample or a population
DSs illustrate the shape, central tendency, and variability of a set of data
The shape of data has to do with the frequencies of the values of observations


DSs (cont.)
Central tendency describes the location of the middle of the data
Variability is the extent to which values are spread above and below the middle values
a.k.a. dispersion

DSs can be distinguished from inferential statistics


DSs are not capable of testing hypotheses


Descriptive Statistics
Measures of Central Tendency
Mean Median Mode

Measures of Dispersion
Range Variance Standard Deviation
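
As a rough illustration of the dispersion measures listed above, the sketch below computes them with Python's standard `statistics` module. The data values are made up. Two forms are shown as an assumption about what a reader may need: the population forms (`pvariance`, `pstdev`) divide by N, while the sample forms (`variance`, `stdev`) divide by n - 1.

```python
import statistics

# Made-up set of eight observations (illustrative values only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: distance between the largest and smallest observation.
data_range = max(values) - min(values)

# Treating the data as a whole population (divide by N).
population_variance = statistics.pvariance(values)
population_sd = statistics.pstdev(values)

# Treating the data as a sample from a larger population (divide by n - 1).
sample_variance = statistics.variance(values)
sample_sd = statistics.stdev(values)

print(data_range, population_variance, population_sd)
print(sample_variance, sample_sd)
```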

Mean:

A single value that describes the characteristics of the entire data set

Most representative
Arithmetic mean or average
E.g. mean birth weight, mean DBP (diastolic blood pressure)

Merits:
Easy to Understand and compute

Based on the value of every item in the series

Limitations:
Affected by extreme values

Not useful for the study of qualities like intelligence, honesty and character

Computing Mean - Sample Problem


Consider the number of children in 6 families. In the first family there are 4 children, in the second there are 2, in the third 5, in the fourth and fifth 3 each, and in the sixth 4.
Find the average number of children per family.
Step 1: Sum the scores, i.e., 4 + 2 + 5 + 3 + 3 + 4 = 21
Step 2: Divide by the number of families, i.e., 21 ÷ 6 = 3.5
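
The two steps of this sample problem can be checked with a few lines of Python. The variable names are chosen here for illustration only.

```python
# Number of children in each of the 6 families, taken from the problem above.
children = [4, 2, 5, 3, 3, 4]

# Step 1: sum the scores.
total = sum(children)

# Step 2: divide by the number of families.
mean = total / len(children)

print(total, mean)  # prints: 21 3.5
```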

Measures of central tendency


Mean (a.k.a., average)
The most commonly used DS

To calculate the mean


Add all values of a series of numbers and then divide by the total number of elements


Formula to calculate the mean


Mean of a sample: $\bar{X} = \frac{\sum X}{n}$
Mean of a population: $\mu = \frac{\sum X}{N}$

$\bar{X}$ (X bar) refers to the mean of a sample and $\mu$ refers to the mean of a population
$\sum X$ is a command that adds all of the X values
n is the total number of values in the series of a sample and N is the same for a population

Measures of central tendency (cont.)


Mode
The most frequently occurring value in a series
The modal value is the highest bar in a histogram

[Figure: histogram with the mode marked at the highest bar]


Median:
Arrange the data in ascending or descending order. The middle value is the median.
Not influenced by extreme values

Unique and easy to calculate

More appropriate when the measure is duration (survival), age, etc.
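
A small sketch of why the median suits measures such as survival time: one extreme value shifts the mean noticeably but leaves the median unchanged. The survival times (in months) are made-up numbers.

```python
import statistics

# Made-up survival times in months; the second list replaces 7 with one extreme value.
typical = [3, 4, 5, 6, 7]
with_outlier = [3, 4, 5, 6, 60]

mean_typical = statistics.mean(typical)
mean_outlier = statistics.mean(with_outlier)
median_typical = statistics.median(typical)
median_outlier = statistics.median(with_outlier)

# The mean jumps from 5 to 15.6, while the median stays at 5.
print(mean_typical, mean_outlier)
print(median_typical, median_outlier)
```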

Computing the Median


To compute the median, we sort the values from low to high. The median is the middle score. If the number of cases in the sample is an odd number, the middle case is the case above and below which the same number of cases occur. (E.g. for 1 2 3 4 5, the median is 3.)

If the number of cases in the sample is an even number, there will be two middle scores and the median is halfway between these two middle scores. (E.g. for 1 2 3 4 5 6, the median is 3.5.)
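
The procedure above can be sketched as a short function. The name `median_of` is a hypothetical helper used only for illustration; Python's standard `statistics.median` does the same job.

```python
def median_of(values):
    """Return the median: sort, then take the middle score or average the two middle scores."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    middle = n // 2
    if n % 2 == 1:
        # Odd number of cases: the single middle score.
        return ordered[middle]
    # Even number of cases: halfway between the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([1, 2, 3, 4, 5]))     # prints: 3
print(median_of([1, 2, 3, 4, 5, 6]))  # prints: 3.5
```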

Mode:
Most commonly occurring observation.
Not unique. Not very frequently used. Used in the investigation of an epidemic.
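
Because the mode is not unique, a data set can have more than one. A minimal sketch using `statistics.multimode` (available in Python 3.8 and later); the ages are made-up values for a hypothetical outbreak.

```python
import statistics

# Made-up data: age in years of each reported case in a hypothetical outbreak.
ages = [4, 5, 5, 6, 7, 7, 9]

# multimode returns every value tied for the highest frequency, in first-seen order.
modes = statistics.multimode(ages)

print(modes)  # prints: [5, 7]
```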

Measures of central tendency (cont.)


Median
The value that divides a series of values in half when they are all listed in order
When there is an odd number of values
The median is the middle value

When there is an even number of values
Count from each end of the series toward the middle and then average the 2 middle values


Measures of central tendency (cont.)


Each of the three methods of measuring central tendency has certain advantages and disadvantages
Which method should be used?
It depends on the type of data being analyzed (e.g., categorical, continuous) and the level of measurement involved


Inferential Statistics
Descriptive statistics describe the data set that's being analyzed, but don't allow us to draw any conclusions or make any inferences about the data, other than visual "It looks like ..." type statements. Hence we need another branch of statistics: inferential statistics. Inferential statistics is also a set of methods, but it is used to draw conclusions or inferences about characteristics of populations based on data from a sample.


Statistical Inference
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
[Diagram: a Sample is drawn from the Population; the Sample's Statistic is used to make an Inference about the Population's Parameter]

What can we infer about a Population's Parameters based on a Sample's Statistics?



Statistical Inference
Rationale:
Large populations make investigating each member impractical and expensive. In addition, it has been shown that observing 100% of a population is not perfect.
It is easier and cheaper to take a sample and make inferences about the population from the sample.

However:
Such conclusions and estimates are not always going to be correct. For this reason, we build measures of reliability into the statistical inference, namely the confidence level and the significance level.


Confidence & Significance Levels


The confidence level is the proportion of times that an interval estimate for a population parameter will be correct.
E.g. a confidence level of 95% means that interval estimates based on this form of statistical inference will be correct 95% of the time.
"I am 95% confident that the TRUE mean IQ of female business students at UTA is between 120 and 122."

When the purpose of the statistical inference is to test a claim about a population parameter, the significance level measures how frequently a true claim is accidentally rejected.
E.g. a 5% significance level means that, in the long run, a true claim will be rejected 5% of the time.
Coin flips should result in 50% heads, on average. A 5% significance level implies that we run a 5% risk of concluding that heads do not occur 50% of the time, on average (even though everyone in this room most likely believes that heads do occur 50% of the time).
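
The coin-flip example can be simulated as a rough sketch. The number of flips, the number of repeated experiments, and the use of a normal-approximation cutoff of 1.96 are all assumptions made here for illustration, not part of the original slides. Because head counts are discrete, the observed rejection rate lands near 5% rather than exactly on it.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

FLIPS = 100      # flips per experiment (assumed for illustration)
TRIALS = 2000    # number of repeated experiments (assumed)
Z_CUTOFF = 1.96  # two-sided normal cutoff for a 5% significance level

rejections = 0
for _ in range(TRIALS):
    # Flip a fair coin, so the claim "heads occur 50% of the time" is true.
    heads = sum(random.random() < 0.5 for _ in range(FLIPS))
    # Standardize the head count: mean FLIPS / 2, standard deviation sqrt(FLIPS) / 2.
    z = (heads - FLIPS / 2) / (FLIPS ** 0.5 / 2)
    if abs(z) > Z_CUTOFF:
        rejections += 1  # the true claim was rejected by chance

rejection_rate = rejections / TRIALS
print(rejection_rate)  # roughly 0.05
```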

Confidence & Significance Levels


We use $\alpha$ (Greek letter alpha) to represent the significance level when testing a claim about a population parameter, and $1 - \alpha$ to represent the confidence level when we wish to estimate a population parameter.
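
A minimal simulation of what a confidence level of $1 - \alpha$ means in the long run. Everything numeric here is an assumption for illustration: a normally distributed population with a made-up mean of 121 and a known standard deviation of 5, samples of size 50, and a z-based interval. With $\alpha = 0.05$, about 95% of the intervals should contain the true mean.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

ALPHA = 0.05               # significance level; the confidence level is 1 - ALPHA
TRUE_MEAN, SIGMA = 121, 5  # assumed population parameters (made up)
N, TRIALS = 50, 2000       # sample size and number of repeated samples (assumed)

# z value leaving ALPHA / 2 in each tail of the standard normal (about 1.96).
z = statistics.NormalDist().inv_cdf(1 - ALPHA / 2)
half_width = z * SIGMA / N ** 0.5

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    sample_mean = statistics.fmean(sample)
    # Count the interval as correct when it contains the true population mean.
    if sample_mean - half_width <= TRUE_MEAN <= sample_mean + half_width:
        covered += 1

coverage = covered / TRIALS
print(coverage)  # roughly 0.95
```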
