What Is Statistics?: A Science of

What is Statistics?
A science of:
Collecting numerical information (data)
Evaluating the numerical information (classify, summarize, organize, analyze)
Drawing conclusions based on the evaluation
What is Statistics? Where does this Data come from?
Statistics is a way to get information from data

[Diagram: Data -> Statistics -> Information]
Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.
Statistics is a tool for creating new understanding from a set of numbers.

Definitions: Oxford English Dictionary
Key Statistical Concepts

Population: The group of all items of interest to a statistics practitioner. Frequently very large; sometimes infinite.
E.g., all 5 million Florida voters who voted in today's election.
Sample: A set of data drawn from the population (part of a population). Potentially very large, but smaller than the population.
E.g., a sample of 1,000 voters exit-polled on election day.
Common statistical terms

Data
Measurements or observations of a variable
Variable
A characteristic that is observed or manipulated
Can take on different values
Independent variables
Precede dependent variables in time
Are often manipulated by the researcher
The treatment or intervention that is used in a study
Dependent variables
What is measured as an outcome in a study
Values depend on the independent variable

Parameter
A descriptive measure of a population.
E.g., the true percentage of Florida voters who will vote for Mary Poppins.
Statistic
A descriptive measure of a sample.
E.g., of the 1,000 voters exit-polled, 550 indicated that they voted for Mary Poppins: 550/1000 = 0.55, or 55%.
Parameters: summary data from a population
Statistics: summary data from a sample

[Diagram: a sample is a subset of the population; the population has a parameter, the sample has a statistic.]
Populations have Parameters, Samples have Statistics.
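The difference between a parameter and a statistic can be sketched with a small simulation. Every number below is an assumption made for illustration: the population is scaled down to 100,000 simulated voters, and the "true" share of 53% is invented so that the parameter is known and can be compared with the statistic. In a real election the parameter is unknown.

```python
import random

# Hypothetical setup: a scaled-down population of simulated voters.
# TRUE_SHARE is an invented value, not taken from the slides.
POPULATION_SIZE = 100_000
TRUE_SHARE = 0.53
SAMPLE_SIZE = 1_000

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# 1 = voted for the candidate, 0 = voted for someone else
population = [1 if rng.random() < TRUE_SHARE else 0 for _ in range(POPULATION_SIZE)]

# Parameter: a descriptive measure of the whole population
parameter = sum(population) / len(population)

# Statistic: the same measure computed from a sample (the exit poll)
sample = rng.sample(population, SAMPLE_SIZE)
statistic = sum(sample) / len(sample)

print(f"parameter (population share): {parameter:.3f}")
print(f"statistic (sample share):     {statistic:.3f}")
```

The two printed values should be close but not identical; that gap is what statistical inference has to account for.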

Populations
A population is the group from which a sample is drawn
e.g., headache patients in a chiropractic office; automobile crash victims in an emergency room
In research, it is usually not practical to include all members of a population
Thus, a sample (a subset of a population) is taken
Evidence-based Chiropractic
Random samples
Subjects are selected from a population so that each individual has an equal chance of being selected
Random samples tend to be representative of the source population
Non-random samples may not be representative
They may be biased regarding age, severity of the condition, socioeconomic status, etc.
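A minimal sketch of how a non-random sample can be biased on age. The patient ages are simulated and the "first 100 records" convenience sample is a hypothetical selection rule chosen only to make the bias visible; neither comes from the slides.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical patient ages, sorted youngest-first as a clinic list might be.
ages = sorted(rng.randint(18, 90) for _ in range(1_000))

# Random sample: random.sample gives every patient the same chance of selection.
random_sample = rng.sample(ages, 100)

# Non-random (convenience) sample: simply the first 100 records on the list.
convenience_sample = ages[:100]

population_mean = mean(ages)
random_mean = mean(random_sample)
convenience_mean = mean(convenience_sample)

print(f"population mean age:  {population_mean:.1f}")
print(f"random sample:        {random_mean:.1f}")
print(f"convenience sample:   {convenience_mean:.1f}")
```

The random sample's mean age should land near the population's, while the convenience sample contains only the youngest patients and so understates it.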
Descriptive Statistics
Descriptive statistics are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:
Graphical techniques (Chapter 2), and
Numerical techniques (Chapter 4).
The actual method used depends on what information we would like to extract. Are we interested in:
Your weight each Monday while you are on a 6-month diet?
The amount of medication in blood pressure pills?
The starting salaries for business students from TCU, SMU, and UTA?
Others?
Descriptive statistics help to answer these questions.
Descriptive statistics (DSs)

A way to summarize data from a sample or a population
DSs illustrate the shape, central tendency, and variability of a set of data
The shape of data has to do with the frequencies of the values of observations
DSs (cont.)
Central tendency describes the location of the middle of the data
Variability is the extent to which values are spread above and below the middle values
a.k.a. dispersion
DSs can be distinguished from inferential statistics
DSs are not capable of testing hypotheses
Measures of Central Tendency
Mean Median Mode
Measures of Dispersion
Range Variance Standard Deviation
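The slides only name the measures of dispersion, so the sketch below assumes their standard definitions: range as largest minus smallest value, variance as the average squared deviation from the mean, and standard deviation as its square root. The six values are illustrative (they match the family sizes used in this deck's mean example), and the functions come from Python's standard `statistics` module.

```python
import statistics

values = [4, 2, 5, 3, 3, 4]  # illustrative data

# Range: largest value minus smallest value
value_range = max(values) - min(values)

# Variance: average squared deviation from the mean
pop_variance = statistics.pvariance(values)    # population form, divides by N
sample_variance = statistics.variance(values)  # sample form, divides by n - 1

# Standard deviation: square root of the variance
sample_sd = statistics.stdev(values)

print(value_range)                 # 3
print(round(pop_variance, 3))      # 0.917
print(round(sample_variance, 3))   # 1.1
print(round(sample_sd, 3))         # 1.049
```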
Mean:
A single value that describes the characteristics of the entire data set
Most representative
Arithmetic mean or average
E.g., mean birth weight, mean diastolic blood pressure (DBP)
Merits:
Easy to Understand and compute
Based on the value of every item in the series
Limitations:
Affected by extreme values
Not useful for the study of qualities like intelligence, honesty and character
Computing Mean - Sample Problem

Consider the number of children in 6 families. In the first family there are 4 children, in the second 2, in the third 5, in the fourth and fifth 3 each, and in the sixth 4.
Find the average number of children per family.
Step 1: Sum the scores, i.e., 4 + 2 + 5 + 3 + 3 + 4 = 21
Step 2: Divide by the number of families, i.e., 21 ÷ 6 = 3.5
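The two steps above can be checked with a few lines of Python. The variable names are illustrative only.

```python
# Number of children in each of the 6 families
children_per_family = [4, 2, 5, 3, 3, 4]

# Step 1: sum the scores
total_children = sum(children_per_family)

# Step 2: divide by the number of families
mean_children = total_children / len(children_per_family)

print(total_children)  # 21
print(mean_children)   # 3.5
```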
Measures of central tendency

Mean (a.k.a., average)
The most commonly used DS
To calculate the mean

Add all values of a series of numbers and then divide by the total number of elements
Formula to calculate the mean

Mean of a sample: $\bar{X} = \frac{\sum X}{n}$
Mean of a population: $\mu = \frac{\sum X}{N}$
$\bar{X}$ (X bar) refers to the mean of a sample and $\mu$ refers to the mean of a population
$\sum X$ is a command that adds all of the X values
n is the total number of values in the series of a sample and N is the same for a population
Measures of central tendency (cont.)

Mode
The most frequently occurring value in a series
The modal value is the highest bar in a histogram
Median:
Arrange the data in ascending or descending order; the middle value is the median.
Not influenced by extreme values
Unique and easy to calculate
More appropriate when the measure is a duration (e.g., survival time), age, etc.
Computing the Median

To compute the median, we sort the values from low to high. The median is the middle score. If the number of cases in the sample is odd, the middle case is the one with the same number of cases above and below it (e.g., for 1 2 3 4 5 the median is 3).
If the number of cases in the sample is even, there are two middle scores and the median is halfway between them (e.g., for 1 2 3 4 5 6 the median is 3.5).
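The procedure above can be sketched as a short function. The name `median` and the choice to raise an error on empty input are illustrative decisions, not part of the slides.

```python
def median(values):
    """Return the median: sort, then take the middle score (or average the two)."""
    if not values:
        raise ValueError("median of an empty series is undefined")
    ordered = sorted(values)          # sort from low to high
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd number of cases: one middle score
        return ordered[mid]
    # even number of cases: halfway between the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 3, 4, 5]))     # 3
print(median([1, 2, 3, 4, 5, 6]))  # 3.5
```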
Mode:
Most commonly occurring observation.
Not unique: a data set can have more than one mode.
Not very frequently used.
Used in the investigation of an epidemic.
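A brief sketch of why the mode is not unique, using `statistics.multimode` from Python's standard library (available from Python 3.8). The first data set reuses the family sizes from the mean example; the second is made up for contrast.

```python
from statistics import multimode

# Family sizes from the mean example: 3 and 4 each occur twice
children_per_family = [4, 2, 5, 3, 3, 4]
modes = multimode(children_per_family)
print(modes)  # [4, 3]

# A made-up series with a single most frequent value
single_mode = multimode([1, 2, 2, 3])
print(single_mode)  # [2]
```

`multimode` returns every value tied for the highest frequency, in the order each first appears in the data.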

Median
The value that divides a series of values in half when they are all listed in order
When there is an odd number of values
The median is the middle value
When there is an even number of values
Count from each end of the series toward the middle and then average the 2 middle values

Each of the three methods of measuring central tendency has certain advantages and disadvantages
Which method should be used?
It depends on the type of data being analyzed (e.g., categorical or continuous) and the level of measurement involved
Inferential Statistics
Descriptive statistics describe the data set that's being analyzed, but don't allow us to draw any conclusions or make any inferences about the data, other than visual "It looks like ..." type statements. Hence we need another branch of statistics: inferential statistics.
Inferential statistics is also a set of methods, but it is used to draw conclusions or inferences about characteristics of populations based on data from a sample.
Statistical Inference
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
[Diagram: a sample is drawn from the population; inference runs from the sample's statistic to the population's parameter.]
What can we infer about a population's parameters based on a sample's statistics?
Statistical Inference
Rationale:
Large populations make investigating each member impractical and expensive, and it has been shown that even observing 100% of a population is not perfect. It is easier and cheaper to take a sample and make inferences about the population from the sample.
However:
Such conclusions and estimates are not always going to be correct. For this reason, we build measures of reliability into the statistical inference, namely the confidence level and the significance level.
Confidence & Significance Levels

The confidence level is the proportion of times that an interval estimate for a population parameter will be correct.
E.g., a confidence level of 95% means that interval estimates based on this form of statistical inference will be correct 95% of the time.
"I am 95% confident that the TRUE mean IQ of female business students at UTA is between 120 and 122."
When the purpose of the statistical inference is to test a claim about a population parameter, the significance level measures how frequently a true claim is accidentally rejected.
E.g., a 5% significance level means that, in the long run, a true claim will be rejected 5% of the time.
Coin flips should result in 50% heads, on average. A 5% significance level implies that we run a 5% risk of concluding that heads do not occur 50% of the time, on average [even though everyone in this room most likely believes that heads do occur 50% of the time].
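Both "in the long run" statements can be illustrated by simulation. This is a sketch under stated assumptions, not a method taken from the slides: the population mean of 121 and standard deviation of 15 are invented, the standard deviation is treated as known, and both parts use the normal critical value 1.96.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is repeatable
TRIALS = 2_000
Z_95 = 1.96  # normal critical value for 95% confidence / 5% significance

# --- Confidence level: how often does a 95% interval contain the true mean? ---
# Hypothetical population: normally distributed scores with a known standard deviation.
TRUE_MEAN, KNOWN_SD, N = 121.0, 15.0, 50
half_width = Z_95 * KNOWN_SD / N ** 0.5

covered = 0
for _ in range(TRIALS):
    sample_mean = sum(rng.gauss(TRUE_MEAN, KNOWN_SD) for _ in range(N)) / N
    # The interval sample_mean +/- half_width contains TRUE_MEAN exactly when:
    if abs(sample_mean - TRUE_MEAN) <= half_width:
        covered += 1
coverage = covered / TRIALS

# --- Significance level: how often is a true claim (a fair coin) rejected? ---
FLIPS = 100
rejections = 0
for _ in range(TRIALS):
    heads = sum(rng.random() < 0.5 for _ in range(FLIPS))
    # z-test of "heads occur 50% of the time"; the count of heads has SD sqrt(n)/2
    z = (heads - FLIPS / 2) / (FLIPS ** 0.5 / 2)
    if abs(z) > Z_95:
        rejections += 1
rejection_rate = rejections / TRIALS

print(f"intervals containing the true mean: {coverage:.1%}")
print(f"fair coins wrongly rejected:        {rejection_rate:.1%}")
```

Coverage should come out near 95%. The rejection rate should come out near 5% but not exactly, because the number of heads is a whole number and the normal approximation is only approximate for 100 flips.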
Confidence & Significance Levels

We use $\alpha$ (Greek letter alpha) to represent the significance level when testing a claim about a population parameter, and $1 - \alpha$ to represent the confidence level when we wish to estimate a population parameter.
