LIV-STATS 2


18/3/24

OBSERVATION/PATTERN
HYPOTHESIS
PREDICTION
TEST THE PREDICTION (study/experimental design, data collection, data analysis, stats)
SUPPORT HYPOTHESIS (interpret results)
COMMUNICATE FINDINGS (report)
REVISE HYPOTHESIS OR TEST OTHER PREDICTIONS (do it all over again)


SAMPLING,
STATISTICAL ANALYSIS
AND GRAPHING
1) Learn how to determine what analytical approach is appropriate for different types of data.
2) Learn how to compute descriptive statistics for a sample. These statistics include the mean,
standard deviation, standard error, and 95% confidence intervals.
3) Learn to compare 2 continuous variables with correlation and regression.
4) Learn to create bar graphs (with error bars), scatterplots (with best-fit lines), and other types
of graphs.

SAMPLING
• Biologists can rarely determine the true mean of a variable, because doing so would require
measuring ALL of the individuals in the population.
• Instead, we measure only a fraction of the total population, compute a mean from those
individuals (SAMPLE MEAN), and base our inferences and interpretations on it.
• Assumption: the sample mean is a good predictor of the true mean for the population.

With statistics, we can make inferences about the population from the sample.


Choosing the correct analytical approach: variable type

• How is each variable measured (is it grouped, ranked, counted, measured in units of length, etc.)?

• If the variable is measured as a string of numbers without any groupings, it is numeric:
• CONTINUOUS
• DISCRETE

• If the variable is ranked or placed in groupings, it is categorical:
• RANKS
• DISCRETE CATEGORIES

Choosing the correct analytical approach: variable use/function
Determine how each variable is used, or its function, in the analysis.
• The independent variable is the predictor variable. It is usually manipulated by the researcher in order
to elicit a response. You are predicting that this variable is responsible for the variation in the data
set.
• The dependent variable is the response variable. Its value depends on the value or grouping of the
independent variable. It represents the pattern that is “going to be explained” (Figure 2.4).


Choosing the correct analytical approach: summary
Knowing the types of variables will help you choose the correct analytical
approach:
- ANOVA/t-test
- Simple linear regression
How do you choose which type of analysis to use? Base the choice on your
predictions and on the types of the independent and dependent
variables (for example, numeric or categorical).
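The choice described above can be sketched as a small lookup. This is only a rule of thumb built from the analyses named in these notes (t-test, ANOVA, simple linear regression); the function name and the string labels are hypothetical.

```python
def choose_analysis(independent_type: str, dependent_type: str, n_groups: int = 0) -> str:
    """Suggest an analysis from the variable types (simplified rule of thumb)."""
    if dependent_type != "numeric":
        return "not covered in these notes"
    if independent_type == "numeric":
        # Two numeric variables: look at the relationship between them.
        return "simple linear regression"
    if independent_type == "categorical":
        # A numeric response compared across groups.
        if n_groups == 2:
            return "t-test"
        if n_groups > 2:
            return "ANOVA"
    return "not covered in these notes"


print(choose_analysis("categorical", "numeric", n_groups=2))
print(choose_analysis("categorical", "numeric", n_groups=3))
print(choose_analysis("numeric", "numeric"))
```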

DESCRIPTIVE STATISTICS
• Measures of central tendency – reflect the distribution of the data
(response/dependent variable)
• SAMPLE MEAN (average)
• MEDIAN (middle one)
• MODE (most common)

• The most obvious way to assess the difference between two groups is to compute the sample mean or average
(Σxi/n) of each group:
• sample means are based on only a portion of the whole population
• a sample mean represents an estimate of the true mean for the whole plot (*assuming…)
• RELIABILITY OF THE SAMPLE MEAN?
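As a minimal sketch, the three measures of central tendency can be computed with Python's standard `statistics` module. The seed counts below are hypothetical values chosen only for illustration.

```python
from statistics import mean, median, mode

# Hypothetical seed counts for one sample (illustrative values only).
seed_counts = [12, 15, 15, 18, 20]

sample_mean = mean(seed_counts)      # sum of the values divided by n
sample_median = median(seed_counts)  # middle value of the sorted data
sample_mode = mode(seed_counts)      # most common value

print("mean:", sample_mean)
print("median:", sample_median)
print("mode:", sample_mode)
```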


MEASURES OF VARIABILITY
• If the means are different, is it because of the treatment, because of normal differences due to
other variables, or simply the nature of the data?

• There is some degree of uncertainty in stating that means are different due to a specific variable.

• We must take this variability into account when assessing differences between two samples.

• The first step is to quantify variability in a sample.


MEASURES OF VARIABILITY:
STANDARD DEVIATION
Represents a measure of variability in a data set. The standard deviation is the
square root of the variance.
1) Compute the deviations: the difference between each replicate's value and the sample mean.
Do this for each sample.

-- these values can be + or -, but we only care about their magnitude

2) Square each deviation (this gets rid of the negative signs): (xi - mean)²
This is a measure of how different each replicate is from the mean.

3) Sum the squared deviations (statisticians call this the sum of squares).
Now we have a single value that measures the variability within a sample, but it depends on
how many replicates are in the sample (few replicates: smaller; many replicates: larger).

4) Divide the sum of squares by the number of replicates minus one (n - 1).
This value is approximately the average squared deviation. It is known as the variance (s²).

5) Take the square root of the variance to get back to the original units of the sample.
This is the standard deviation (s).
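The five steps can be followed one by one in code. This is a sketch using only the standard library; the function name and the replicate values are hypothetical.

```python
import math


def sample_sd(values):
    """Sample standard deviation, computed step by step."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # step 1: deviation of each replicate
    squared = [d ** 2 for d in deviations]    # step 2: square each deviation
    sum_of_squares = sum(squared)             # step 3: sum of squares
    variance = sum_of_squares / (n - 1)       # step 4: variance (divide by n - 1)
    return math.sqrt(variance)                # step 5: back to the original units


# Hypothetical replicates (illustrative values only).
replicates = [8, 9, 10, 11, 12]
print(sample_sd(replicates))
```

For these values the mean is 10, the sum of squares is 10, and the variance is 10/4 = 2.5, so the result is the square root of 2.5.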


MEASURES OF VARIABILITY:
STANDARD ERROR
A statistic that indicates how precisely the sample mean estimates the true mean
of the whole population. It is computed from the standard deviation (sd)
and the number of replicates (n):

se = sd/√n
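A short sketch of this formula, using `statistics.stdev` for the sample standard deviation. The function name and the replicate values are hypothetical.

```python
import math
from statistics import stdev


def standard_error(values):
    """Standard error of the mean: se = sd / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


# Hypothetical replicates (illustrative values only).
replicates = [8, 9, 10, 11, 12]
print(standard_error(replicates))
```

Because n is in the denominator, the standard error shrinks as more replicates are collected, even when the standard deviation stays the same.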

https://www.youtube.com/watch?v=A82brFpdr9g


RELIABILITY OF THE SAMPLE MEAN


NUMBER OF RATTLESNAKES PER HECTARE IN TWO PROTECTED AREAS. ARE THERE DIFFERENCES?

Notice that the sample means for the Homochitto and Kisatche are exactly the same, 10.0.

However, there is a lot of variability in the number of rattlesnakes per hectare in the Kisatche
(a range of 1-22, versus only 8-12 for the Homochitto). This high variability relative to the
Homochitto sample is reflected in the standard deviations and standard errors.

Because there is a lot of variability among replicates for the Kisatche, its sample mean
is not likely to be a very reliable predictor of the true mean.


MEASURES OF VARIABILITY:
CONFIDENCE INTERVAL
• The reliability of a sample mean, as a predictor of the true mean, can be
quantified by computing confidence intervals (CI).
• A CI is a range surrounding the sample mean.
• You define the % confidence you will accept in your data; 95% is the most common choice.
• It is derived from the standard error.
• The 95% CI of the mean is approximately ± 2 times the standard error (mean ± 2*se).
• How to interpret this: we can be 95% confident that the TRUE MEAN for
the population lies somewhere between these two values.
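A sketch of the ± 2*se rule follows. The two samples are hypothetical: they share a mean of 10 but differ in spread, so they are not the data from any table in these notes. The factor 2 is the rounded rule of thumb used here. For small samples, a critical t-value from a t table is the more exact multiplier.

```python
import math
from statistics import mean, stdev


def approx_ci95(values):
    """Approximate 95% CI of the mean: mean ± 2 * se."""
    m = mean(values)
    se = stdev(values) / math.sqrt(len(values))
    return (m - 2 * se, m + 2 * se)


# Hypothetical samples with the same mean (10) but different variability.
low_variability = [8, 9, 10, 11, 12]
high_variability = [1, 4, 10, 13, 22]

print(approx_ci95(low_variability))
print(approx_ci95(high_variability))
```

The more variable sample gives a much wider interval, which is another way of saying that its sample mean is a less reliable predictor of the true mean.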

https://www.youtube.com/watch?app=desktop&v=w3tM-PMThXk
https://www.youtube.com/watch?v=yDEvXB6ApWc



HYPOTHESIS TESTS: Was I right or not? 6 steps
• (1) Null hypothesis. In a statistical analysis, the null hypothesis
(no effect) is implicit in the analysis.
Example: Seed production will be the same in removal and control plots.

• (2) Alternative hypothesis. Basically, the test asks whether there really are differences
between groups in the measured variable (an effect of the independent variable on the dependent variable).
Example: Big bluestem seed production will differ
between removal and control plots.

The simplest tests for comparing means:
Between two groups: t-test*
More than two groups: ANOVA*
* For parametric data (normal distribution)

HYPOTHESIS TESTS: Was I right or not?

https://meta-calculator.com/blog/how-to-interpret-t-test-results/


HYPOTHESIS TESTS: Was I right or not?
• (3) Assumptions (there are tests that check these, e.g. NORMALITY TESTS)
• Data are randomly collected and independent
• Data within each group must be normally distributed
• Each group must have similar variance (homoscedasticity)

Here we will assume that the assumptions are met.

And if they are not, we have alternatives:
• Data transformation (log, z, etc.)
• Non-parametric analyses (more flexible, less precise)

https://medium.com/mytake/understanding-different-types-of-distributions-you-will-encounter-as-a-data-scientist-27ea4c375eec


HYPOTHESIS TESTS: Was I right or not?
(4) Computation of the test statistic (t-test)
• t is the difference between the means divided by the
combined standard error of both groups
• it measures the difference between the means of the
two groups you are interested in comparing
(numerator), while taking into account the variation
within the two groups (denominator)
• t is high when the difference between groups is large and
when the variation within groups is small

• But how do we know whether t is large enough? We have to compare it against a critical value.

SE = sd/√n
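The t statistic can be sketched as follows. This assumes the classic two-sample (Student's) t-test with a pooled variance, which is the version that goes with similar variances in both groups. The function name and the seed counts are hypothetical.

```python
import math
from statistics import mean, variance


def t_statistic(group1, group2):
    """Difference between the means divided by the combined standard error."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Combined standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se_diff


# Hypothetical seed counts per plant (illustrative values only).
removal_plot = [14, 15, 16, 17, 18]
control_plot = [9, 10, 11, 12, 13]

print(t_statistic(removal_plot, control_plot))
```

Here the means differ by 5 and the combined standard error is about 1, so t is about 5. Making the groups more variable, or the means closer together, lowers t.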


HYPOTHESIS TESTS: Was I right or not?
(5) Determination of the critical value
• You need to know the degrees of freedom (df) and alpha (α), and
have a table of critical t-values.

• The degrees of freedom in a t-test are the total sample size
(both groups added together) minus two:
• df = (n1 + n2) - 2
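A sketch of this lookup: compute df, then read the critical value from a table. The dictionary below holds a small excerpt of standard two-tailed critical t-values for α = 0.05, covering only df 1 to 10. The names are hypothetical.

```python
# Excerpt of a standard t table: two-tailed critical values for alpha = 0.05.
T_CRITICAL_TWO_TAILED_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}


def degrees_of_freedom(n1: int, n2: int) -> int:
    """df for a two-sample t-test: total sample size minus two."""
    return (n1 + n2) - 2


def critical_value(n1: int, n2: int) -> float:
    """Look up the critical t-value (raises KeyError outside the excerpt)."""
    return T_CRITICAL_TWO_TAILED_05[degrees_of_freedom(n1, n2)]


# Two groups of five replicates each.
print(degrees_of_freedom(5, 5))  # 8
print(critical_value(5, 5))      # 2.306
```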


HYPOTHESIS TESTS: Was I right or not?
(5) Determination of the critical value
• Alpha (α) is the probability of making a Type I error: concluding,
based on the data, that the two groups are different when in fact
(in the real world) they are not.
• You want the probability of making this kind of mistake to be
very small, so set alpha at 0.05. This means accepting that
5 times in 100 (5%) you will conclude there is a difference when there is none.


HYPOTHESIS TESTS: Was I right or not?
(5) Determination of the critical value

• Critical value: the value of t associated with alpha (a probability).
• One-tailed: the prediction has a direction (the full 0.05 goes in
that direction).
• Two-tailed: the difference could be larger or
smaller (0.05 is split in two, 0.025 in each tail).

• t table


HYPOTHESIS TESTS
(6) Outcome and interpretation

Report the test statistic (t-value), the df, and the critical value.

If the absolute t-value > critical value, reject the null hypothesis and support the
alternative hypothesis: there is an effect.

If the absolute t-value < critical value, fail to reject the null hypothesis:
the data do not show an effect.
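The decision rule can be written as a single comparison. This is a sketch with a hypothetical function name. The critical value is passed in as read from a t table (2.306 is the standard two-tailed value for α = 0.05 and df = 8), and the second outcome uses the cautious wording "fail to reject".

```python
def interpret_t_test(t_value: float, critical_value: float) -> str:
    """Compare the absolute t-value with the critical value."""
    if abs(t_value) > critical_value:
        return "reject the null hypothesis (there is an effect)"
    return "fail to reject the null hypothesis (no effect detected)"


# The sign of t does not matter, only its absolute value.
print(interpret_t_test(-5.0, 2.306))
print(interpret_t_test(1.2, 2.306))
```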

https://www.tdistributiontable.com/


Effect of the presence or absence of leafy spurge on seed production in big bluestem grass.

https://invasivespecies.idaho.gov/leafy-spurge-factsheet
https://www.youtube.com/watch?app=desktop&v=qhC8ihrdxHs


How we go about it
Once we have stated our objectives, hypothesis and predictions, we can start:
• First, we must design a study to test our predictions: how do we manipulate or model the
independent variable in order to detect changes in the dependent (response) variable for
our predicted outcome?
• Experimental design: Compare big bluestem seed production on 5 plants from plots with
and without leafy spurge.
• Second, we must determine the location and apply treatments.
• Application of treatments: two 25 m² plots of land; one with leafy spurge removed
(treatment plot) and one with leafy spurge present (control plot).
• Third, we must collect data.
• Sample: Collect five plants from each plot and count the total number of seeds on each plant (stem).
• Fourth, we must analyze data using statistical procedures and generate graphs and tables (focus
of the next chapter).
• Statistical test: Are seed counts per stem different in the two plots (t-test and bar graph)?
• Fifth, based on the statistical results, we can answer whether the leafy spurge plots did or did not
produce more big bluestem seeds than the plots without leafy spurge (whether or not our
hypothesis is supported), and of course explain why.
