LIV-STATS 2

18/3/24
OBSERVATION/PATTERN
HYPOTHESIS
PREDICTION
TEST THE PREDICTION (study/experimental design, data collection, data analysis/stats)
SUPPORT HYPOTHESIS (interpret results)
COMMUNICATE FINDINGS (report)
REVISE HYPOTHESIS OR TEST OTHER PREDICTIONS (do it all over again)
SAMPLING, STATISTICAL ANALYSIS AND GRAPHING
1) Learn how to determine what analytical approach is appropriate for different types of data.
2) Learn how to compute descriptive statistics for a sample. These statistics include the mean,
standard deviation, standard error, and 95% confidence intervals.
3) Learn to compare 2 continuous variables with correlation and regression.
4) Learn to create bar graphs (with error bars), scatterplots (with best-fit lines), and other types
of graphs.
SAMPLING
• Biologists can rarely determine the true mean of a variable, because that would require measuring ALL of the individuals in the world
• Instead, we measure only a fraction of the total population and compute a mean from those individuals (SAMPLE MEAN), then base our inferences and interpretations on it
• Assumption: the sample mean is a good predictor of the true mean for the population.
With statistics, we can make inferences about the population from the sample
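As a rough illustration of the assumption above, the sketch below simulates a population and compares its true mean with the mean of a small random sample. The population values (mean 50, spread 10), the sample size of 100, and the seed are all arbitrary choices made for this example; none of them come from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Simulated "population" of 10,000 hypothetical measurements (not real data).
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

# In practice we can only measure a fraction: here, a random sample of 100.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"true mean:   {true_mean:.2f}")
print(f"sample mean: {sample_mean:.2f}")
```

The two printed values should be close but not identical, which is the gap the rest of these notes try to quantify.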
Choosing the correct analytical approach: variable type
• How is each variable measured (is it grouped, ranked, counted, in units of length, etc.)?
• If the variable has a continuous string of numbers without any groupings, it is numeric
• CONTINUOUS
• DISCRETE
• If the variable is ranked or in groupings, it is categorical
• RANKS
• DISCRETE CATEGORIES

Choosing the correct analytical approach: variable use/function
Determine how each variable is used, or its function, in the analysis.
• The independent variable is the predictor variable. It is usually manipulated by the researcher in order to elicit a response. You are predicting that this variable is responsible for the variation in the data set.
• The dependent variable is the response variable. Its value depends on the value or grouping of the independent variable. It represents the pattern that is “going to be explained” (Figure 2.4).
Choosing the correct analytical approach: summary
Knowing the types of variables will help you choose the correct analytical approach.
- ANOVA/t-test
- Simple linear regression
How do you choose which type of analysis to use? Base the choice on your predictions and on the types of the independent and dependent variables (for example, numerical or categorical).
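The choice described above can be sketched as a small lookup. The function name `choose_analysis` and its arguments are hypothetical, and the mapping only covers the combinations these notes mention (categorical predictor with numeric response, and numeric predictor with numeric response); it is not a complete decision tree.

```python
def choose_analysis(independent, dependent, n_groups=None):
    """Suggest an analysis from the variable types ('numeric' or 'categorical').

    n_groups is the number of groups of a categorical independent variable.
    """
    if independent == "categorical" and dependent == "numeric":
        if n_groups == 2:
            return "t-test"
        if n_groups is not None and n_groups > 2:
            return "ANOVA"
        return "t-test or ANOVA (depends on the number of groups)"
    if independent == "numeric" and dependent == "numeric":
        return "simple linear regression"
    # Other combinations are outside the scope of this sketch.
    return "not covered here"


print(choose_analysis("categorical", "numeric", n_groups=2))
print(choose_analysis("numeric", "numeric"))
```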
DESCRIPTIVE STATISTICS
• Measures of central tendency: reflect the distribution of the data (response/dependent variable)
• SAMPLE MEAN (average)
• MEDIAN (middle value)
• MODE (most common value)
• The most obvious way to assess the difference between two groups is to compute the sample mean or average (Σxi/n) of each group
• Sample means are based on only a portion of the whole population
• A sample mean represents an estimate of the true mean for the whole plot*assuming…
• RELIABILITY OF THE SAMPLE MEAN?
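A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module. The seed counts are made-up values chosen only so the arithmetic is easy to follow.

```python
import statistics

# Hypothetical seed counts for one sample (illustrative values only).
seed_counts = [12, 15, 15, 18, 20]

mean = sum(seed_counts) / len(seed_counts)  # sample mean: Σxi / n
median = statistics.median(seed_counts)     # middle value of the sorted data
mode = statistics.mode(seed_counts)         # most common value

print(mean, median, mode)  # 16.0 15 15
```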
MEASURES OF VARIABILITY
• If the means are different, is it because of the treatment, because of normal differences due to other variables, or simply the nature of the data?
• There is some degree of uncertainty in stating that means are different due to a specific variable
• We must take this variability into account when assessing differences between two samples
• The first step is to quantify variability in a sample
STANDARD DEVIATION
Represents a measure of variability in a data set. The standard deviation is the square root of the variance.
1) Compute the deviation: the difference between the mean and the value for each replicate.
Do this for each sample.
-- these values can be + or -, but we only care about their size
2) Take the square of each deviation value (this gets rid of the negative signs): (xi - mean)²
This is a measure of how different each replicate is from the mean.
3) Sum the squared deviations (statisticians call this the sum of squares).
Now we have a single value that measures the variability within a sample, but it depends on how many replicates are in the sample (few replicates: smaller; many replicates: larger).
4) Divide the sum of squares by the number of replicates minus one (n - 1).
This value is approximately the average squared deviation. It is known as the variance (s²).
5) Take the square root of the variance to get back to the original units of the sample. The result is the standard deviation (s).
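The five steps above can be followed one at a time in code. This is a sketch on five hypothetical replicate values; the final line cross-checks the hand computation against `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

values = [8, 9, 10, 11, 12]  # hypothetical replicates (illustrative only)

mean = sum(values) / len(values)               # 10.0
deviations = [x - mean for x in values]        # step 1: -2, -1, 0, 1, 2
squared = [d ** 2 for d in deviations]         # step 2: 4, 1, 0, 1, 4
sum_of_squares = sum(squared)                  # step 3: 10.0
variance = sum_of_squares / (len(values) - 1)  # step 4: 10 / 4 = 2.5
sd = math.sqrt(variance)                       # step 5: about 1.58

print(f"variance = {variance}, sd = {sd:.2f}")
print(math.isclose(sd, statistics.stdev(values)))  # same result as the library
```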
STANDARD ERROR
A statistic that indicates how precisely the sample mean estimates the true mean of the whole population. It is computed from the standard deviation (sd) and the number of replicates (n):
se = sd/√n
https://www.youtube.com/watch?v=A82brFpdr9g
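A short sketch of the standard error formula, se = sd/√n. The helper name `standard_error` and the replicate values are assumptions made for illustration.

```python
import math
import statistics


def standard_error(values):
    """se = sd / sqrt(n), where sd is the sample standard deviation (n - 1)."""
    return statistics.stdev(values) / math.sqrt(len(values))


values = [8, 9, 10, 11, 12]  # hypothetical replicates (illustrative only)
se = standard_error(values)  # sqrt(2.5) / sqrt(5), about 0.71

print(f"se = {se:.2f}")
```

Because n sits in the denominator, adding replicates with similar spread makes the standard error smaller.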
RELIABILITY OF THE SAMPLE MEAN
NUMBER OF RATTLESNAKES PER HECTARE IN TWO PROTECTED AREAS. ARE THERE DIFFERENCES?
Notice that the sample means for the Homochitto and Kisatche are exactly the same, 10.0.
However, there is much more variability in the number of rattlesnakes per hectare in the Kisatche (a range of 1-22, versus only 8-12 for the Homochitto).
This high variability relative to the Homochitto sample is reflected in the standard deviations and standard errors.
Because there is so much variability among replicates for the Kisatche, its sample mean is not likely to be a very reliable predictor of the true mean.
CONFIDENCE INTERVAL
• The reliability of a sample mean, as a predictor of the true mean, can be quantified by computing confidence intervals (CI).
• A CI is a range surrounding the sample mean.
• You define the % confidence you will accept for your data; commonly we use 95%.
• It is derived from the standard error.
• The 95% CI of the mean is approximately the mean ± 2 times the standard error (2*se).
• How to interpret this: we can be 95% confident that the TRUE MEAN for the population is somewhere between these two values.
https://www.youtube.com/watch?app=desktop&v=w3tM-PMThXk
https://www.youtube.com/watch?v=yDEvXB6ApWc
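The sketch below applies the ± 2*se rule to two hypothetical samples that share the same mean (10.0) but differ in spread. The values are invented for illustration and are not the data behind any table in these slides; the helper name `ci95` is also an assumption.

```python
import math
import statistics


def ci95(values):
    """Approximate 95% CI of the mean: mean ± 2 * se."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - 2 * se, mean + 2 * se


low_variability = [8, 9, 10, 11, 12]   # hypothetical, mean = 10
high_variability = [1, 4, 10, 13, 22]  # hypothetical, mean = 10

narrow = ci95(low_variability)
wide = ci95(high_variability)

print(f"low variability:  {narrow[0]:.2f} to {narrow[1]:.2f}")
print(f"high variability: {wide[0]:.2f} to {wide[1]:.2f}")
```

Both intervals are centred on 10, but the second is much wider, so its sample mean is a less reliable predictor of the true mean.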
HYPOTHESIS TESTING: Was I right or not? 6 steps
• (1) Null hypothesis. In a statistical analysis, the null hypothesis (no effect) is implicit in the analysis.
Example: Seed production will be the same in removal and control plots.
• (2) Alternative hypothesis. Essentially, it tests whether there are differences between groups in the measured variable (an effect of the independent variable on the dependent variable).
Example: Big bluestem seed production will differ between removal and control plots.
The simplest tests for comparing means:
Between two groups: t-test*
More than two groups: ANOVA*
* For parametric data (normal distribution)
https://meta-calculator.com/blog/how-to-interpret-t-test-results/
• (3) Assumptions
• Data are randomly collected and independent
• Data within each group must be normally distributed (there are tests that check this: *NORMALITY TESTS*)
• Each group must have similar variance (homoscedasticity)
Here we will assume that the data meet these assumptions…
And if they do not, we have alternatives:
• Data transformation (log, z, etc.)
• Non-parametric analyses (more flexible, less precise)
https://medium.com/mytake/understanding-different-types-of-distributions-you-will-encounter-as-a-data-scientist-27ea4c375eec
(4) Computation of the test statistic: t-test
• t is the difference between the means divided by the combined standard error of both groups
• It measures the difference between the means of the two groups you are comparing (numerator), while taking into account the variation within the two groups (denominator)
• t is high when the difference between groups is large and when the variation within groups is small
• But how do we know whether t is large enough?… We have to compare it against a critical value…
SE = sd/√n
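A sketch of the t computation on hypothetical seed counts for removal and control plots. The slides do not spell out how the "combined standard error" is calculated; the pooled-variance form used here is an assumption, chosen because it matches the equal-variance assumption of the standard two-sample t-test. All data values are invented.

```python
import math
import statistics

# Hypothetical seed counts (illustrative only, not data from the study).
removal = [20, 22, 24, 26, 28]
control = [14, 16, 18, 20, 22]

n1, n2 = len(removal), len(control)
mean1, mean2 = statistics.mean(removal), statistics.mean(control)
var1, var2 = statistics.variance(removal), statistics.variance(control)

# Assumed form of the combined standard error: pooled variance of both groups.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
combined_se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t = difference between the means / combined standard error
t_value = (mean1 - mean2) / combined_se

print(f"difference = {mean1 - mean2}, combined se = {combined_se:.2f}, t = {t_value:.2f}")
```

With these numbers the difference between means is 6 and the combined standard error is 2, so t is 3.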
(5) Determination of the critical value
• You need to know the degrees of freedom (df) and alpha (α), and have a table of critical t-values.
• Degrees of freedom in a t-test are the total sample size (both groups added together) minus two:
• df = (n1 + n2) - 2
Critical value
• Alpha (α) is the probability of making a Type I error: concluding, based on the data, that the two groups are different when in fact (in the real world) they are not.
• You want the probability of making this kind of mistake to be very small, so set alpha at 0.05. This means that, when the groups are in fact not different, 5 in 100 times (5%) you would wrongly conclude that they are.
Critical value
• Critical value: the value of t associated with alpha (a probability)
• One-tailed: the prediction has a direction (all 0.05 in that direction)
• Two-tailed: the difference could be larger or smaller (divide 0.05 by 2, so 0.025 in each tail)
• t table
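Looking up a critical value can be sketched with a small excerpt of a t table. The dictionaries below hold standard critical t-values for alpha = 0.05 and df 1 to 10 only; the function names are hypothetical, and a full table or a statistics library would be needed for other df or alpha values.

```python
# Excerpt of a standard t table for alpha = 0.05 (df 1-10 only).
T_CRIT_TWO_TAILED = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
                     6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}
T_CRIT_ONE_TAILED = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
                     6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812}


def degrees_of_freedom(n1, n2):
    """df for a two-sample t-test: total sample size minus two."""
    return n1 + n2 - 2


def critical_t(df, tails=2):
    """Critical t at alpha = 0.05 for a one- or two-tailed test."""
    table = T_CRIT_TWO_TAILED if tails == 2 else T_CRIT_ONE_TAILED
    if df not in table:
        raise ValueError("df outside this excerpt; consult a full t table")
    return table[df]


df = degrees_of_freedom(5, 5)   # two groups of five plants -> 8
print(df, critical_t(df), critical_t(df, tails=1))  # 8 2.306 1.86
```

The two-tailed value is larger because the 0.05 is split between both tails.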
HYPOTHESIS TESTING
(6) Outcome and interpretation
Report the test statistic (t-value), the df, and the critical value.
If the absolute t-value > critical value, reject the null hypothesis and support the alternative hypothesis: there is an effect.
If the absolute t-value < critical value, retain the null hypothesis and do not support the alternative: no effect was detected.
https://www.tdistributiontable.com/
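The decision rule can be written as a small function. The name `interpret` and the example numbers (t = 3.0, df = 8, two-tailed critical value 2.306 at alpha = 0.05) are illustrative assumptions; the "do not reject" wording for the second outcome is this sketch's choice of phrasing.

```python
def interpret(t_value, critical_value):
    """Compare the absolute t-value with the critical value."""
    if abs(t_value) > critical_value:
        return "reject the null hypothesis (there is an effect)"
    return "do not reject the null hypothesis (no effect detected)"


# Hypothetical result: t = 3.0 with df = 8; two-tailed critical t is 2.306.
t_value, df, critical_value = 3.0, 8, 2.306
print(f"t = {t_value}, df = {df}, critical value = {critical_value}")
print(interpret(t_value, critical_value))

# The sign of t does not matter, only its absolute value.
print(interpret(-1.2, critical_value))
```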
Effect of the presence or absence of leafy spurge on seed production in big bluestem grass.
https://invasivespecies.idaho.gov/leafy-spurge-factsheet
https://www.youtube.com/watch?app=desktop&v=qhC8ihrdxHs
How we go about it
Once we have stated our objectives, hypothesis and predictions, we can start:
• First, we must design a study to test our predictions. How do we manipulate or model the independent variable in order to detect changes in the dependent variable (response variable) for our predicted outcome?
• Experimental design: Compare big bluestem seed production on 5 plants from plots with and without leafy spurge.
• Second, we must determine the location and apply treatments.
• Application of treatments: two 25 m² plots of land; one with leafy spurge removed (treatment plot) and one with leafy spurge present (control plot).
• Third, we must collect data.
• Sample: Collect five plants from each plot and count the total number of seeds in each plot.
• Fourth, we must analyze the data using statistical procedures and generate graphs and tables (focus of the next chapter).
• Statistical test: Are seed counts per stem different in the two plots (t-test and bar graph)?
• Fifth, based on the statistical results, we can answer whether the leafy spurge plots did or did not produce more big bluestem seeds than the plots without leafy spurge (whether or not our hypothesis is supported), and of course explain why.