Bus Math Chapter 5

Tables, charts and graphs are all ways of
representing data, and they can be used for
two broad purposes.
The first is to support the collection,
organization and analysis of data as part of
the process of a scientific study. The
second is to help present the conclusions
of a study to a wider audience.
The choices of how to represent data are
influenced by:
• the nature of the data
• the kinds of questions about the data that
are of interest.
When gathering data, you collect different
types of information, depending on what
you hope to investigate or find out. For
example, if you wanted to analyze the
spending habits of people living in Tokyo,
you might send out a survey to 500 people
asking questions about their income, their
exact location, their age, and how much
they spend on various products and
services. These are your variables: data
that can be measured and recorded, and
whose values will differ from one individual
to the next.
A variable is a symbol and placeholder for
any mathematical object.
There are two types of variables:
1. Categorical Variable (or qualitative
variable)
A categorical variable has values that can
be put into a countable number of distinct
groups based on a characteristic.
Sometimes categorical variables take
numerical values, but those numbers are
only labels and carry no mathematical
meaning.
There are two types of categorical
variable: nominal and ordinal.
1.A. Nominal Variable
The nominal scale simply categorizes
variables according to qualitative labels (or
names). These labels and groupings don’t
have any order or hierarchy to them, nor do
they convey any numerical value.
Example: Pizza Toppings
1.B. Ordinal Variable
The ordinal scale also categorizes
variables into labeled groups, and these
categories have an order or hierarchy to
them.
Example: At a coffee shop, you might
choose between a small, medium or large
cup
2. Numerical Variable (or quantitative
variable)
Numerical variables have values that
describe a measurable quantity as a
number, like 'how many' or 'how much’.
There are two types of numerical variable:
discrete and continuous variable
2.A. Discrete Variable
A discrete variable can only take certain
distinct and separate values. It is a
variable whose value is obtained by
counting.
Example: Number of transactions done by
a customer on a particular day. It can be
0, 1, 2, and so on.
2.B. Continuous Variable
A continuous variable is defined as a
variable which can take an infinite set of
values; that is, it can assume any value
between two values or within a certain
range. It is a variable whose value is
obtained by measuring.
There are two types of continuous variable:
interval and ratio variables.
Interval Variable is a numerical level of
measurement which, like the ordinal scale,
places variables in order. Unlike the ordinal
scale, however, the interval scale has a
known and equal distance between each
value on the scale.
Unlike the ratio scale, interval data has no
true zero; in other words, a value of zero on
an interval scale does not mean the
variable is absent. This is best explained
using temperature as an example. A
temperature of zero degrees Fahrenheit
doesn’t mean there is “no temperature” to
be measured—rather, it signifies a very low
or cold temperature.
The difference between the two values of
the interval variable is meaningful.
Example: The temperature in Baguio City is
21°C while in Quezon City it is 28°C. We can
conclude that it is hotter by 7°C in Quezon
City by taking the difference between the
two temperatures.
Ratio Variable is a numerical level of
measurement with equal intervals between
each point and has a true zero. That is, a
value of zero on a ratio scale means that
the variable being measured is absent. For
example, population. If you have a
population count of zero people, this means
there are no people.
Both the difference and the ratio between
two values of a ratio variable are meaningful.
Example: The width of road A is 120 feet,
while the width of road B is 60 feet.
We can conclude that road A is wider by 60
ft. by taking the difference between the two
widths. We can also conclude that road A is
twice as wide as road B by taking the
ratio between the two widths (120/60 = 2).
A ratio variable can be discrete or
continuous.
Presenting and Analyzing Business Data
in Tabular Form
A table is a display of information (numerical
or textual) in rows and columns. The data
are organized to give detailed information.
It can be used to both store and display
data in a structured format.
Example 1: You are evaluating the effect of
different types of fertilizers on plant growth.
You plant 12 tomato plants and divide them
into three groups, where each group
contains four plants. After three weeks, you
measure the growth of each plant in
centimeters (cm) and calculate the average
growth for each type of fertilizer.
Table I – The effect of different brands of fertilizer on tomato
plant growth over three weeks (plant growth in cm)

Fertilizer Used | Plant 1 | Plant 2 | Plant 3 | Plant 4 | Average
None            | 10      | 12      | 8       | 9       | 9.75
Fertilizer ‘A’  | 15      | 16      | 14      | 12      | 14.25
Fertilizer ‘B’  | 22      | 25      | 21      | 27      | 23.75

Parts of a table: table number and title, column headings,
row headings, body, and source note and footnote (if any).
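The Average column of Table I can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the dictionary name growth_cm is an assumption made for illustration, and the numbers are copied from the table.

```python
# Growth measurements (cm) copied from Table I.
growth_cm = {
    "None": [10, 12, 8, 9],
    "Fertilizer A": [15, 16, 14, 12],
    "Fertilizer B": [22, 25, 21, 27],
}

# The Average column is each row's total divided by the number of plants.
averages = {name: sum(values) / len(values) for name, values in growth_cm.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.2f} cm")
```

The printed values should match the Average column: 9.75, 14.25 and 23.75.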
Tables are also used to support the
processing of raw data in various ways.
One example is when further columns are
added to a table to carry out calculations on
existing columns.
The figure shows a table in which two of
the columns (mass and volume) are used
to collect measured values, while the final
column (density) contains calculated
values.
FAQs:
How do we present data in a graph?
What type of graph should we use?
What should we put on the X-axis and the
Y-axis? Typically, the independent
variable is shown on the X-axis and
the dependent variable is shown on
the Y-axis.
An independent variable is manipulated
or changed by the researcher to investigate
the effect on a dependent variable.
The dependent variable is the variable
whose occurrence or frequency depends on
the conditions and the manipulation of the
independent variable. It is called the
dependent variable because its value
depends on and varies with the value of the
independent variable.
Presenting and Analyzing Business Data
in Graphical Form
A chart is a graphical representation of data
(it organizes a set of numerical or qualitative
data, adorned with extra information,
constructed for a special purpose) in which
“the data is represented by symbols”.
Charts are often used to ease
understanding of large quantities of data
and of the relationships between parts of
the data.
What is the difference between a chart
and a graph?
Charts present information in the form of
graphs, diagrams or tables.
Graphs show the mathematical relationship
between sets of data. Graphs are one type
of chart, but not the only type of chart; in
other words, all graphs are charts, but
not all charts are graphs.
General Rules
1. Clarity and simplicity are key.
Remember to keep things simple: let the
data speak for itself. You don’t need neon
colors or myriad thematic icons to get a
point across. Data visualizations should be
a combination of visual appeal and clearly
represented information, but if you have to
choose, be simple.
If you find that your chart is getting overly
complicated, think about splitting it up into
multiple charts. This can make the
information easier to read and absorb.
2. Make it easy to read and interpret.
Help your readers understand the point you
are trying to make with your data. Start by
giving your visualization an informative title.
Provide a legend and labels: make it clear
what symbols, colors, and sizes mean, and
be consistent in their usage. Emphasize
the units you are using. You can even use
arrows and concise phrases to call
attention to important elements of your
chart.
When dealing with information sorted into
categories, organize values in a meaningful
order (such as ascending or descending in
terms of their values) to make it easy for
others to compare values.
When using colors, use hues that stand out
from one another or use a saturation
spectrum (going from very light to very
dark) of a single color, making sure your
reader can easily distinguish between
hues. Avoid using color combinations that
are hard to distinguish by the readers.
3. Respect visual and mathematical
principles.
When using shapes to convey data, size
them proportionally according to their area,
rather than their length or diameter.
Separate your data into variables. For
example, if you are creating a bar chart
comparing the total populations of different
countries, the variable you’re looking at is
population (and the numbers for each
country are the different values).
Keep things in two dimensions, preferably:
3D shapes are difficult to read and
compare. The perspective that is used to
create the illusion of three dimensions can
also be confusing for readers by
accidentally making some items feel larger
or smaller than they really are.
A lot of visualizations include icons, or
small pictures, as decoration. Consider
leaving these out. Even when they match
your data, they can distract from the point
you are trying to make. They often make it
more difficult to make comparisons and
assess differences. Stick with plain
representative shapes instead.
4. Play around with your data!
It’s easy to test out a couple different charts
and see which ones do a good job
showcasing your data — and which ones
do not: play around with the tools at your
disposal to get an idea for what feels right
for visualizing an individual dataset. Excel
and Google Sheets are good starting
points: you can switch from chart to chart at
the click of a button, and it’s easy to
customize general elements.
You might find things you hadn’t noticed
before, (trends, patterns, outliers — or
even typos or errors in the data) and you’ll
definitely get a good sense of what charts
and graphs are a good fit for your data.
5. Cite your sources.
Finally, always give the source of your data
so others can investigate for themselves.
It’s like providing a bibliography at the end
of a paper: it’s good scholarly practice, and
it lets your readers know your data comes
from a legitimate source.
1. Line Graph uses line segments to
connect data points. It is useful in showing
the trends or in determining relationships
between two variables and analyzing how
data has changed over time. Line graphs
have an x-axis and a y-axis.
Uses of line graphs:
• When you want to show trends. For
example, how house prices have
increased over time.
• When you want to make predictions
based on a data history over time.
• When comparing two or more different
variables, situations, and information
over a given period.
• To illustrate changes of continuous
variables (over time)
[Figure: parts of a line graph: title of the chart, figure number and title, source note, intervals, data points (connected to form a line), and axis labels]
In Figure 6, the line chart answers
questions like:
• How did the unemployment rate evolve
over time?
• When in this period of time was the
unemployment rate highest? And when
was it lowest?
From this chart, we can see how the
unemployment rate often rises and falls by
small amounts from month to month. The
big spike in early 2008 can be explained
using some background knowledge: that is
when the recession hit. It could be helpful
for this chart to add an annotation to
explain this sudden climb, since the cause
is known.
2. Pie Chart shows how much of the whole
each part makes up or the part-whole
relationships. Each slice of the pie is written
as a percentage.
The earliest recorded example of a pie chart
is by William Playfair in his book 'A
Statistical Breviary' published in 1801.
\[ \text{percentage of a slice} = \frac{\text{amount of an item}}{\text{total amount of all items}} \times 100\% \]
Note: The sum of all parts is equal to 1
(that is, 100%).
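As a small sketch of the slice calculation, the snippet below converts amounts into percentages. The budget categories and amounts are hypothetical, invented only to illustrate the arithmetic.

```python
# Hypothetical monthly budget amounts, invented for illustration.
amounts = {"Rent": 500, "Food": 300, "Transport": 200}

total = sum(amounts.values())

# Each slice is (amount of an item / total amount of all items) x 100%.
percentages = {item: amount * 100 / total for item, amount in amounts.items()}

for item, pct in percentages.items():
    print(f"{item}: {pct:.1f}%")

# The slices together make up the whole pie.
print("Sum of all slices:", sum(percentages.values()), "%")
```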
Pie Chart Uses:
• When you want to create and represent
the composition of something.
• When comparing areas of growth or
illustrating the differences in
categories (limited to 2 – 8 categories)
[Figure: parts of a pie chart: title of the chart, figure number and title, source note, data legend or key, and pie slices]
The pie charts in Figure 1 showcase the
breakdown by gender of the number of
faculty members at institutions of higher
education in the United States in two
different years, 1987 and 2011.
If x is the variable representing the number
of men in the chart, and y is the variable
representing the number of women, what
do you notice? What information does the
chart communicate?
Answer: These pie charts tell us that, while women made up one
third of faculty members in the United States in 1987, in 2011 they
made up almost one half of the total number of faculty members.
Together, these two charts tell a more complex story than they
would separately, because they show an evolution in time.
3. Bar Graph displays values assigned to
individual categories. Each bar represents
an entire, exact value for a variable in
question.
Bar Charts Uses:
• To compare data among different
categories
• Bar charts can also show large data
changes over time.
• Ideal for visualizing the distribution of
data for more than three categories.
[Figure: parts of a bar graph: title of the chart, figure number and title, source note, legend, intervals, data (depends on the type of data or variable), and axis labels (include units)]
Figure 3 shows the number of male and
female faculty members at institutions of
higher education in the U.S. between 1987
and 2011. Each year gets two bars: one for
the number of women and one for the
number of men. What do you think about
this chart? How does it convey information
differently than the Figure 1 pie chart?
Answer: The chart in Figure 3 tells an interesting story. While both
grow, the number of female faculty grows at a more rapid rate than
the number of male faculty: between 1987 and 2011, the number of
female faculty has almost tripled. This chart helps you compare this
information more effectively than a pie chart for each year would,
since you can compare each bar to all the other bars. These bar
charts provide a bigger picture than the pie charts in Figure 1: here,
we see both the ratio of men to women, by comparing the two bars
for a given year, and the raw numbers that show how much the
number of faculty has grown between 1987 and 2011.

Note that the bar chart in Figure 3 showcases data that is
continuous: the years depicted have a sequential order, so you can
talk about an upward trend, or growth, in faculty members as years
go by and you can observe an evolution from one set of bars to
another. But bar charts do not necessarily have to showcase
continuous data: they can also showcase data for distinct
categories. In a bar chart showing the total populations of different
countries, each country is a separate entity: you can compare the
values associated with them, but you can’t chart an evolution
between them.
4. Histogram is a graphical representation
of the frequency distribution. It may look
like a bar chart, but it displays numeric
rather than categorical data. A histogram
groups values into consecutive numeric
ranges or intervals, also known as bins: the
more values from a dataset fall within a
particular range, the bigger its bar.
The ranges are continuous, so bars do not
usually have much space between them
(unlike bar charts, which use the spaces
between bars to distinguish between
categories).
A histogram answers questions like:
• What are the patterns in my data?
• In what intervals do data points have the
highest frequency (i.e., in what intervals are
data points most concentrated)?
• What is the distribution of my data? Does
it skew a certain way?
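The grouping of values into bins can be sketched in a few lines of Python. The tip amounts and the $1 bin width below are hypothetical, chosen only to show how each value lands in an interval; they are not the data behind any figure in this chapter.

```python
from collections import Counter
import math

# Hypothetical tip amounts in dollars, invented for illustration.
tips = [1.00, 1.50, 2.00, 2.00, 2.25, 2.50, 3.00, 3.00, 3.50, 4.00, 5.00, 7.50]
bin_width = 1.0  # assumed bin width of $1

# Assign each value to the bin [k * width, (k + 1) * width).
counts = Counter(math.floor(t / bin_width) for t in tips)

# Print a text histogram: one '#' per value in the bin.
for k in sorted(counts):
    low, high = k * bin_width, (k + 1) * bin_width
    print(f"${low:.2f} to ${high:.2f}: {'#' * counts[k]}")
```

Changing bin_width changes how coarse or detailed the picture is, which is why the same data can look different under different bin widths.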
The two histograms in Figure 9 both
showcase the same data: tips given in a
restaurant. But the sizes of the intervals (the
bins) are different. The histogram at the top
has a $1 bin width. And the histogram at the
bottom has a 10¢ bin width: this allows you
to see the data in greater detail. What do
the two different histograms tell you about
the data?
The two bin widths reveal different
patterns in the data. The
histogram with the $1 bin width
demonstrates very clearly that the data
is skewed to the right: most tips are
concentrated at the smaller amounts, where
the highest frequencies are on the graph,
with a tail extending toward larger tips. It
shows that the range with the highest
frequency is $1.50 to $2.50.
The histogram with the 10¢ bin width shows
an interesting pattern: tips that are round
dollar amounts have higher frequencies. It
also shows more precisely what range has
the highest frequency: it is the $1.95 to
$2.05 range.
A Pareto chart identifies the most frequent
defects, complaints, or any other factor you
can count and categorize. The chart takes
its name from Vilfredo Pareto, originator of
the "80/20 rule," which postulates that 80%
of problems and events happen because of
20% of the causes and resources.
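The numbers behind a Pareto chart are the categories sorted from most to least frequent, together with a running cumulative percentage. The sketch below uses hypothetical complaint counts, invented for illustration.

```python
# Hypothetical complaint counts by category, invented for illustration.
complaints = {
    "Late delivery": 45,
    "Damaged item": 25,
    "Wrong item": 15,
    "Billing error": 10,
    "Other": 5,
}

total = sum(complaints.values())

# Sort categories from most to least frequent.
ranked = sorted(complaints.items(), key=lambda item: item[1], reverse=True)

# Build rows of (category, count, cumulative percentage).
cumulative = 0
pareto_rows = []
for category, count in ranked:
    cumulative += count
    pareto_rows.append((category, count, 100 * cumulative / total))

for category, count, cum_pct in pareto_rows:
    print(f"{category:<14} {count:>3} {cum_pct:6.1f}%")
```

In this made-up data set, the two most frequent categories already account for 70% of all complaints.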
5. A pictograph displays numerical
information using pictures or
symbols to represent quantitative data.
Each symbol corresponds to a specific
quantity or number of units.
Pictograph Uses:
• When your audience prefers, and
better understands, displays that include
icons and illustrations.
• In infographics, where pictograms are
commonly used.
• When you want to compare two points
in an emotionally powerful way.
[Figure: pictograph titled “Proportion of men and women who experienced workplace harassment in the past 12 months, 2016”, with its parts labeled: title of the chart, category, data, and key (legend: one symbol = 10 persons)]
Analyzing Business Data Using
Statistical Tools
A measure of central tendency is a
summary statistic that
represents the center point or typical value
of a data set or of a probability distribution.
The measures of central tendency include
the mean, median and mode.
1. Mean is the arithmetic average.
The mean of n numbers $a_1, a_2, a_3, \ldots, a_n$ is
(sum of all the data values) ÷ (number of
data values). In formula form,
\[ \frac{a_1 + a_2 + a_3 + \cdots + a_n}{n} = \frac{1}{n} \sum_{i=1}^{n} a_i \]
2. The median is the middle observation
when the data is written in increasing (or
decreasing) order. If the number of
observations is odd, it is the single middle
observation; if the number is even, it is the
mean of the two middle observations.
The median can be obtained
for quantitative data; qualitative data has no
median.
3. The mode is the observation with the
highest frequency. The mode uses the
frequencies and hence a mode can be
obtained for both quantitative and
qualitative data.
How to interpret the results?
Many questions related to mean, mode and
median merely test a pupil’s ability to recall
a formula, to substitute the values into the
formula and to compute an arithmetically
correct answer (operational or instrumental
understanding) while pupils lack a
relational or functional understanding of
these measures of central tendency.
Many questions in data handling deal only
with arithmetical aspects and not real
statistical questions. How to compute
measures for central tendency is of limited
value when the pupil does not know how
to interpret the values.
For example: A train is to leave the station
at 8.30 each morning. The departing time
on Monday was 35 minutes late due to a
fire in the restaurant car. On Tuesday, the
train was delayed 5 minutes, on
Wednesday, 3 minutes, Thursday, 3
minutes and Friday, 4 minutes. What was
the average number of minutes that the
train was late leaving?
Is this ‘average’ a good figure to use to
represent the week’s data? Why or why
not? Explain.
Answer: The average is (35 + 5 + 3 + 3 + 4) ÷ 5 = 10 minutes.
No, it is not a good figure, because the delay on Monday is an exceptional
instance, and hence it should NOT be included in the
average; the 10 minutes are not a reflection of what
passengers might expect.
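The effect of the exceptional Monday delay can be checked with Python's standard statistics module. This is a sketch only; the variable names are assumptions, and the delays are the ones given in the example.

```python
from statistics import mean, median

# Delays in minutes, Monday to Friday, from the train example.
delays_min = [35, 5, 3, 3, 4]

mean_all = mean(delays_min)                  # includes the exceptional Monday
mean_without_monday = mean(delays_min[1:])   # Tuesday to Friday only
median_all = median(delays_min)              # middle value of the sorted delays

print("Mean of all five days:", mean_all)
print("Mean without Monday:", mean_without_monday)
print("Median of all five days:", median_all)
```

The mean drops from 10 minutes to 3.75 minutes once Monday is left out, while the median of all five days (4 minutes) is barely affected by the one extreme value.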
The concept of the mean
1. The mean is located between the
extreme values.
For example: The number of pupils present
in class during a week are as follows:
Monday – 26
Tuesday – 18
Wednesday – 24
Thursday – 29
Friday – 28
The mean is (26 + 18 + 24 + 29 + 28) ÷ 5
= 25
This is between the extreme values of 18
and 29.
2. The sum of the deviations from the
mean is zero.
Using the same data as in the example
above, the deviations from the mean 25
are: +1, -7, -1, +4, +3. Their sum is
zero.
3. The average is influenced by values
that deviate from the average.
Using the above data and assuming that
on Saturday 28 pupils attended, the mean
attendance over the six days will be
different from 25. However, if on Saturday
25 pupils attended (the mean over the first
five days) the mean over the six days will
remain 25.
4. The average can be a fractional value
with no counterpart in reality.
The example in 3 shows that the mean
over the six days (153 ÷ 6 = 25.5) can be a decimal with no real object
it can refer to in reality: 25.5 pupils do not
exist.
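The first four properties can be verified on the attendance data with a short Python sketch. The variable names are assumptions made for illustration; the numbers come from the examples above.

```python
from statistics import mean

# Pupils present Monday to Friday, from the example above.
attendance = [26, 18, 24, 29, 28]

m = mean(attendance)

# Property 1: the mean lies between the extreme values.
lies_between = min(attendance) <= m <= max(attendance)

# Property 2: the deviations from the mean sum to zero.
deviations = [x - m for x in attendance]
deviation_sum = sum(deviations)

# Property 3: a value different from the mean shifts it,
# while a value equal to the mean leaves it unchanged.
mean_with_28 = mean(attendance + [28])
mean_with_25 = mean(attendance + [25])

print("Mean:", m)
print("Deviations:", deviations, "sum:", deviation_sum)
print("Mean with Saturday = 28:", mean_with_28)  # property 4: a fractional value
print("Mean with Saturday = 25:", mean_with_25)
```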
5. The average value is representative of
the values that were averaged.
This is an important property used when
interpreting data. The mean represents all
the data in the data set.
6. In computing an average, a value of
zero, if it appears, is to be taken into
account.
Some pupils have the misunderstanding
that 0 is ‘nothing’ and hence need not
be included in the calculation of the mean.
However, 0 is a legitimate numerical
value.
For example, somebody might have 0
brothers and sisters, the temperature might
be 0° C.
Which is the best average to use?
The best average to use depends on the
situation and what you want to use the
average for.
The mean is the most commonly used
measure of central tendency as it is the
only one of the three averages using all the
data. It takes all the data in the distribution
into account. The mean—being
arithmetically based—can be combined
with the means of other groups on the
same variable. The median and the mode,
not being arithmetically based, do not have
such a property. However, using the mean
can give a rather distorted picture of the
data if there are outliers, or if the mean is
not meaningful in the given context.
Examples:
1. The leader of a youth club can get
discounts on cans of drinks if she buys all
one size. She took a vote on which size the
members of the club wanted.
Size of can (mL) | 100 | 200 | 330 | 500
Number of votes  | 9   | 12  | 19  | 1
Mode = 330 mL, median = 200 mL and
mean = 245.6 mL. Which size should she
buy?
Answer: The mean is clearly of no use: cans of size 245.6
mL do not exist. The median would be possible as 200 mL
cans are for sale. However only 12 out of the 41 club
members want this size. In this case, the mode is the best
average to use as it is the most popular one among the club
members.
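The three averages in this example can be recomputed from the vote table. The sketch below expands the frequency table into one entry per vote; the names votes and sizes are assumptions made for illustration.

```python
from statistics import mean, median, mode

# Votes per can size (mL), from the table above.
votes = {100: 9, 200: 12, 330: 19, 500: 1}

# Expand the frequency table into one value per vote (41 values in total).
sizes = [size for size, count in votes.items() for _ in range(count)]

can_mean = mean(sizes)
can_median = median(sizes)
can_mode = mode(sizes)

print(f"Mean: {can_mean:.1f} mL")
print(f"Median: {can_median} mL")
print(f"Mode: {can_mode} mL")
```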
2. In a small butchery, the four laborers each
earn $400 per month, the supervisor earns
$1200 and the manager $2600. Which
average best represents the monthly
wages earned?
Answer: The mean ($900) is misleading as it is more than
twice the salary earned by most workers. The median
($400) is representative. The other appropriate average to
use is the mode: it gives the wages of most of the workers.
When datasets are skewed to one side, like wages or house
prices, the median and mode are more realistic than the
mean.
3. The time taken (in hours) by 6 pupils to
complete their project was 20, 25, 31, 35,
87, 87. Which average best represents the
time spent by the students in completing
their project?
Answer: The mean is 47.5 hours, but most pupils worked less than
that on their project. The mode is 87 hours, which is also misleading
since only two pupils took that long.
The median is 33 hours, which is the best to use as it tells us that
half of the pupils needed less than 33 hours and half
needed more.
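Examples 2 and 3 can be checked the same way. This is a minimal sketch using the wages and project times given above; the variable names are assumptions.

```python
from statistics import mean, median, mode

# Example 2: monthly wages in the butchery (four laborers, supervisor, manager).
wages = [400, 400, 400, 400, 1200, 2600]
wage_summary = (mean(wages), median(wages), mode(wages))

# Example 3: hours taken by six pupils to complete their project.
hours = [20, 25, 31, 35, 87, 87]
hour_summary = (mean(hours), median(hours), mode(hours))

print("Wages (mean, median, mode):", wage_summary)
print("Hours (mean, median, mode):", hour_summary)
```

In both data sets a few large values pull the mean well above what most individuals earn or spend, which is why the median is the more representative figure here.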
The mean is generally used if the data is
more or less symmetrically grouped about
a central point, i.e., the data do not contain
outliers. If further calculation is required
(e.g., measures of dispersion) or
comparison with a similar measure on
another group is intended, or the (sample)
mean is to be used in estimating
parameters of the population then the
mean is to be used as mode and median
cannot be used in ‘further’ calculations.
A distribution with outliers is frequently best
described by using the median.
The mode is used when the context
suggests ‘most usual’ or ‘typical’ value.
Measures of Variability define how far
away (spread) the data points tend to fall
from the center (mean) or the discrepancy
or difference between the data.
The measures of variability are important
for the following purposes:
• Used to test the extent to which an
average represents the characteristics of
a data set. If the variation is small, then it
indicates high uniformity of values in the
distribution and the average represents
the characteristics of the data well. On the
other hand, if the variation is large, then it
indicates a lower degree of uniformity and
a less reliable average.
• Help in identifying the nature and cause
of variation. Such information can be
useful to control the variation.
• Help in the comparison of the spread in
two or more sets of data with respect to
their uniformity or consistency.
• Facilitate the use of other statistical
techniques such as correlation, regression
analysis, and so on.
Example: A math teacher is interested to
know the performance of two groups (A
and B) of her students. She gives them a
test of 40 points. The marks obtained by
the students of groups A and B in the test
are as follows:
Marks of Group A: 5, 4, 38, 38, 20, 36, 17,
19, 18, 5
Marks of Group B: 22, 18, 19, 21, 20, 23,
17, 20,18, 22
The mean score of both groups is 20, so
as far as the mean goes there is no
difference in the performance of the two
groups. But there is a difference in the
performance of the two groups in terms of
how much the individual students’ marks
vary from one another. For
instance, the test scores of group A
range from 4 to 38, while the test
scores of group B range from 17 to 23.
It means that some of the students of group
A are doing very well, some are doing very
poorly and performance of some of the
students is falling at the average level.
On the other hand, the performance of all
the students of the second group is falling
within and near about the average (mean)
that is 20. It is evident from this that the
measures of central tendency give us
an incomplete picture of a set of data. They
provide an insufficient basis for the comparison
of two or more sets of scores.
1. Range is the difference between the
highest and lowest entries.
Range = highest entry – lowest entry
The calculation of range is based only on
two extreme values in the data set and
does not consider other values of the data
set.
A single extreme score (unusually large or
small score) may also increase the range
disproportionately.
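Applied to the marks of groups A and B from the earlier example, the range can be computed as follows. This is a small sketch; the helper name value_range is an assumption made for illustration.

```python
from statistics import mean

# Marks of the two groups, from the earlier example.
group_a = [5, 4, 38, 38, 20, 36, 17, 19, 18, 5]
group_b = [22, 18, 19, 21, 20, 23, 17, 20, 18, 22]


def value_range(data):
    """Range = highest entry - lowest entry."""
    return max(data) - min(data)


range_a = value_range(group_a)
range_b = value_range(group_b)

print("Group A: mean", mean(group_a), "range", range_a)
print("Group B: mean", mean(group_b), "range", range_b)
```

Both groups have the same mean of 20, but the range of group A (34) is far larger than that of group B (6).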
2. Quartile is a statistical term describing a
division of observations into four defined
intervals based upon the values of the data.
There are three quartiles: Q1, Q2 and Q3.
The value of the quartile deviation is based on
the middle 50 percent of the values; it is not
based on all the observations.
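As a sketch, the quartiles of the two groups of marks from the earlier example can be computed with statistics.quantiles from Python's standard library. Textbooks and software use slightly different conventions for locating quartiles, so other tools may give slightly different values; the function's default method is used here. The quartile deviation is taken as half the distance between Q3 and Q1, which is its usual definition but is not spelled out in this chapter.

```python
from statistics import quantiles

# Marks of the two groups, from the earlier example.
group_a = [5, 4, 38, 38, 20, 36, 17, 19, 18, 5]
group_b = [22, 18, 19, 21, 20, 23, 17, 20, 18, 22]

# n=4 splits the data into four parts and returns the three cut points.
q1_a, q2_a, q3_a = quantiles(group_a, n=4)
q1_b, q2_b, q3_b = quantiles(group_b, n=4)

# Quartile deviation: half the spread of the middle 50 percent of values.
quartile_deviation_a = (q3_a - q1_a) / 2
quartile_deviation_b = (q3_b - q1_b) / 2

print("Group A quartiles:", q1_a, q2_a, q3_a, "quartile deviation:", quartile_deviation_a)
print("Group B quartiles:", q1_b, q2_b, q3_b, "quartile deviation:", quartile_deviation_b)
```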
3. Standard Deviation uses the mean of the
distribution as a reference point and
measures variability by computing the
average distance of all the scores from
the mean.
The standard deviation of population is
denoted by ‘σ’ (Greek letter sigma) and that
for a sample is ‘s’.
Standard deviation shows how much
variation there is, from the mean. SD is
calculated from the mean only. If standard
deviation is low, it means that the data is
close to the mean. A high standard deviation
indicates that the data is spread out over a
large range of values. Standard deviation
may serve as a measure of uncertainty.
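As a minimal sketch, the standard deviation of the two groups of marks from the earlier example can be computed with Python's standard library. The population form (pstdev, corresponding to σ) is used here on the assumption that each group is treated as a complete population; statistics.stdev would give the sample form (s) instead.

```python
from statistics import mean, pstdev

# Marks of the two groups, from the earlier example.
group_a = [5, 4, 38, 38, 20, 36, 17, 19, 18, 5]
group_b = [22, 18, 19, 21, 20, 23, 17, 20, 18, 22]

# Population standard deviation of each group.
sd_a = pstdev(group_a)
sd_b = pstdev(group_b)

print(f"Group A: mean {mean(group_a)}, standard deviation {sd_a:.2f}")
print(f"Group B: mean {mean(group_b)}, standard deviation {sd_b:.2f}")
```

Group B's much smaller standard deviation reflects marks that sit close to the mean of 20, while group A's marks are spread over a wide range.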
If you want to test a theory, or in other
words, want to decide whether
measurements agree with a theoretical
prediction, the standard deviation provides
the information. If the mean of the
measurements is many standard deviations
away from the prediction,
then the theory being tested probably needs
to be revised. A mean with a smaller
standard deviation is more reliable than
a mean with a large standard deviation. A
smaller standard deviation shows the
homogeneity of the data. The value of
standard deviation is based on every
observation in a set of data. It is the only
measure of dispersion capable of algebraic
treatment; therefore, standard deviation is
used in further statistical analysis.
4. The term variance was introduced to describe
the square of the standard deviation by
R.A. Fisher in 1918. The concept of
variance is of great importance in advanced
work where it is possible to split the total
into several parts, each attributable to one
of the factors causing variation in the
original series.
Variance is a measure of the dispersion of a
set of data points around their mean value.
It is the mathematical expectation of the
squared deviations from the mean.
Example: Two machines, A and B, in a
factory produce pens which are on average
10 inches long. A sample of 11 pens is
selected from each machine.
Machine A: 6, 6, 6, 8, 8, 10, 12, 12, 14, 14, 14
Machine B: 6, 8, 8, 10, 10, 10, 10, 10, 12, 12, 14

Which of the two machines produces pens
that are close to the average pen length?
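One way to answer the question is to compare the variance and standard deviation of the two samples. The sketch below uses the sample forms (statistics.variance and statistics.stdev) on the assumption that the 11 pens from each machine are a sample of its output; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

# Pen lengths (inches) sampled from each machine, from the example above.
machine_a = [6, 6, 6, 8, 8, 10, 12, 12, 14, 14, 14]
machine_b = [6, 8, 8, 10, 10, 10, 10, 10, 12, 12, 14]

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
var_a = variance(machine_a)
var_b = variance(machine_b)

print(f"Machine A: mean {mean(machine_a)}, variance {var_a:.1f}, SD {stdev(machine_a):.2f}")
print(f"Machine B: mean {mean(machine_b)}, variance {var_b:.1f}, SD {stdev(machine_b):.2f}")
```

Both machines average 10 inches, but machine B has the smaller variance, so its pens are closer to the average length.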
In Statistics, tests of significance are the
method of reaching a conclusion to reject or
support the claims based on sample data.
Tests for statistical significance are used to
address the question: what is the
probability that what we think is a
relationship between two variables is
really just a chance occurrence?
A test of significance is a formal procedure
for comparing observed data with a claim
(also called a hypothesis), the truth of
which is being assessed.
• The claim is a statement about a
parameter, like the population proportion p
or the population mean µ.
• The results of a significance test are
expressed in terms of a probability that
measures how well the data and the claim
agree.
In order to determine if two numbers are
significantly different, a statistical test must
be conducted to provide evidence. (The
difference could possibly be attributed to
chance or to sampling error.) Researchers
cannot rely on subjective interpretations.
Researchers must collect statistical evidence
to make a claim, and this is done by
conducting a test of statistical significance.
The first step in conducting a test of
statistical significance is to state the
hypothesis.
The claim tested by a statistical test is called
the null hypothesis, H0 .
The test is designed to assess the strength
of the evidence against the null hypothesis.
The term null is used because this
hypothesis assumes that there is no
difference between the two means or
that the recorded difference is not
significant.
The claim about the population that
evidence is being sought for is the
alternative hypothesis (Ha ).
The alternative is one-sided if it states that a
parameter is larger or smaller than the null
hypothesis value.
It is two-sided if it states that the parameter
is different from the null hypothesis value
(either smaller or larger).
The alternative hypothesis is the claim that
researchers are actually trying to establish.
They do so indirectly, by gathering evidence
that the null hypothesis is false. If
the null hypothesis is rejected, then its
opposite, the alternative hypothesis, is
accepted.
When using logical reasoning, it is much
easier to demonstrate that a statement is
false, than to demonstrate that it is true.
This is because proving something false
only requires one counterexample.
Proving something true, however, requires
proving the statement is true in every
possible situation.
When conducting a significance test, the
goal is to provide evidence to reject the null
hypothesis.
• If the evidence is strong enough to reject
the H0 , then Ha can automatically be
accepted.
• However, if the evidence is not strong
enough, researchers fail to reject the H0 .
The next step is to compute the test statistic.
Type of Test | Purpose | Example
Z-Test | Testing the difference of a sample mean with a known population mean (n > 30, SD known) | Do babies born at a certain hospital weigh more than the city average?
1 Sample T-Test | Testing the difference of a sample mean with a known population mean (n < 30, SD unknown) | Is the average height of male college students greater than 6.0 feet?
Paired T-Test | Testing whether the average of the differences between paired or dependent samples is equal to a target value | Weigh a set of people, put them on a diet plan, then weigh them after. Is the average weight loss significant enough to conclude the diet works?
2 Sample T-Test | Testing whether the difference between the averages of two independent populations is equal to a target value | Is the average speed of cyclists during rush hour greater than the average speed of drivers?
After computing the test statistic, the next
step is to find the p-value: the
probability of obtaining a score at least
this extreme when the
null hypothesis is true.
It represents the chance
of getting this sample mean score
if the sample is actually no different from the
population.
When the p-value is very small,
researchers can say they have strong
evidence that the null hypothesis is false.
This is because if the p-value is very small,
it means that the probability of obtaining a
score that is so extreme or even higher is
very small.
It is important to know how small the p-
value needs to be in order to reject the null
hypothesis.
The cutoff value for p is called alpha, or
the significance level.
The researcher establishes the value of
alpha prior to beginning the statistical
analysis.
In social sciences, alpha is typically set at
0.05 (or 5%). This represents the amount
of acceptable error, or the probability of
rejecting a null hypothesis that is in fact
true.
Once the alpha level has been selected
and the p-value has been computed:
• If the p-value is larger than alpha, fail to
reject the null hypothesis; the evidence is
not strong enough to support the
alternative hypothesis.
• If the p-value is smaller than alpha, reject
the null hypothesis and accept the
alternative hypothesis.
In summary,
1. State the null and alternative hypotheses.
2. Calculate the test statistic.
3. Find the p-value (using a table or
statistical software).
4. Compare the p-value with α and decide
whether the null hypothesis should be
rejected or not.
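The four steps can be sketched for a one-sided Z-test, in the spirit of the hospital birth-weight example in the table of tests. Every number below is hypothetical, invented only to illustrate the procedure, and the standard normal distribution is taken from Python's statistics.NormalDist.

```python
from math import sqrt
from statistics import NormalDist

# All figures are hypothetical, chosen only to illustrate the steps.
population_mean = 3.3  # city average birth weight in kg (the value under H0)
population_sd = 0.5    # population standard deviation in kg, assumed known
sample_mean = 3.5      # average weight of the babies sampled at the hospital
n = 36                 # sample size (n > 30)
alpha = 0.05           # significance level, set before the analysis

# Step 1: H0: mean = 3.3 kg; Ha: mean > 3.3 kg (one-sided).

# Step 2: test statistic, the distance of the sample mean from the
# hypothesized mean measured in standard errors.
z = (sample_mean - population_mean) / (population_sd / sqrt(n))

# Step 3: p-value, the probability of a z-score at least this large if H0 is true.
p_value = 1 - NormalDist().cdf(z)

# Step 4: compare the p-value with alpha.
reject_h0 = p_value < alpha

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

With these made-up figures the p-value is well below 0.05, so the null hypothesis would be rejected. A two-sided alternative would double the tail probability in step 3.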
