
1

The mapped distribution of British birds corresponded very closely to the distribution of pubs, an example of sampling bias.


Likewise, the size of a net's mesh greatly affects the size of the fish sampled.

Model = analysis

Each requires its own set of statistical techniques, so it is important to know what you are dealing with.

Continuous, discrete, or categorical data? The type of data determines the type of test, i.e. you can't talk in terms of an average slug colour.

We'll now focus on this last point.

10

11

12

Growth rate = realized r (which we have already discussed)


Here, in a statistical analysis context, R is entirely different. We are referring to the amount of variation explained by the BEST FIT LINE between the two variables, which in this case is 23% (R2 = 0.23): 23% of the variation in moose population growth rate is attributable to the mean age of moose in the population, while the remaining 77% of the variation is caused by other factors (including random chance). If age, and only age, explained the differences in population growth rate, all the data points would fall on the line and R2 = 1.0 (or 100%).
Define best fit line in simple terms: a line that minimizes the overall (squared) distance of the data points from the line. In this example we have chosen a linear relationship (a straight line). Best fit lines can take a number of different shapes (logistic, exponential, normal, etc.); linear relationships are the simplest to describe.
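
To make this concrete, here is a minimal sketch (with invented numbers, not the actual moose data) of fitting a least-squares line and computing R2 as the fraction of variation explained:

```python
# Minimal sketch: least-squares best fit line and R^2.
# The numbers below are invented for illustration; they are not the moose data.
import numpy as np

mean_age = np.array([3.1, 4.2, 4.8, 5.5, 6.0, 6.9, 7.4, 8.2])             # hypothetical predictor
growth_rate = np.array([0.21, 0.18, 0.20, 0.15, 0.16, 0.12, 0.14, 0.10])  # hypothetical response

# Fit a straight line: growth_rate = slope * mean_age + intercept
slope, intercept = np.polyfit(mean_age, growth_rate, 1)
predicted = slope * mean_age + intercept

ss_res = np.sum((growth_rate - predicted) ** 2)            # variation left over (residual)
ss_tot = np.sum((growth_rate - growth_rate.mean()) ** 2)   # total variation in growth rate
r_squared = 1 - ss_res / ss_tot

print(f"R2 = {r_squared:.2f}")  # fraction of variation explained by the best fit line
```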

13

14

Confidence in the relationship is the degree to which points fall on the best fit line. Here 81% of the variation in fish abundance is explained by the size of the pothole.
The average distance of points from the line is the variation.
Why are all points not on the line?
There are real, biological reasons, but also artifacts of how we measure. We want to eliminate the latter as best we can, leaving only the former. We do this by optimizing our experimental design (measuring as many of the biological drivers as possible) and using analysis tools that can tease these various drivers apart.

15

All of these plots have low variation, but plots B, C, and D are not good models of the data.

16

17

18

Was every grizzly bear enumerated? No. Therefore we must use samples to estimate the actual (i.e. parameter) abundance of the population.
This example is a simplified interpretation of the Artelle et al. paper.

19

Since these are estimates, we must also indicate the amount of likely variation (i.e. the error) associated with each.
The lines around the estimates of the mean (the dots) may be standard errors, standard deviations, or confidence intervals; each means something very different, and knowing those differences is critical when interpreting scientific data.
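
As a rough illustration of how differently these spread measures behave, here is a small sketch with simulated survey counts (the values are invented, not taken from the figure):

```python
# Sketch: standard deviation (SD), standard error (SE) and a 95% CI for one sample.
# The 25 "survey counts" are simulated placeholders, not real data.
import numpy as np

rng = np.random.default_rng(5)
counts = rng.normal(200.0, 40.0, 25)       # hypothetical counts from 25 survey plots

mean = counts.mean()
sd = counts.std(ddof=1)                    # spread of the individual observations
se = sd / np.sqrt(len(counts))             # uncertainty in the estimated mean
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se   # approximate 95% confidence interval

print(f"mean={mean:.1f}  SD={sd:.1f}  SE={se:.1f}  95% CI=({ci_low:.1f}, {ci_high:.1f})")
```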

20

Interpretation: there is a 95% chance the parameter (the actual, real abundance) lies within this interval. Conversely, there is a 5% chance the parameter value lies outside this interval (i.e. there is a 5% chance of being wrong).
Another way to look at this: if the experiment were run 100 times, the abundance statistic would fall within this interval about 95 times. The CI is an estimate of confidence in the statistic.
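
One way to see the "run the experiment 100 times" reading is to simulate it; the true mean, spread, and sample size below are arbitrary assumptions:

```python
# Sketch: how often a 95% CI from a sample actually contains the true (parameter) mean.
import numpy as np

rng = np.random.default_rng(1)
true_mean = 50.0        # the parameter we pretend to know
n_samples = 30          # sample size per "experiment"
n_experiments = 100

covered = 0
for _ in range(n_experiments):
    sample = rng.normal(true_mean, 10.0, n_samples)
    se = sample.std(ddof=1) / np.sqrt(n_samples)            # standard error of the mean
    low, high = sample.mean() - 1.96 * se, sample.mean() + 1.96 * se
    covered += low <= true_mean <= high

print(f"{covered} of {n_experiments} intervals contained the true mean")  # typically ~95
```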

21

22

Overlapping CIs indicate the real parameter value may be captured by both intervals, and therefore, statistically, they cannot be said to be significantly different. In other words, because each interval defines the range of values within which we are 95% certain the real value lies, we have to accept the possibility that the real population abundances for GOV and NGO are the same value.

23

The far-right estimate has the narrowest interval within which we are 95% certain the real value exists.

24

25

…and therefore reflects the confidence in your parameter estimates.

26

27

Here the mean is 10, but of equal interest is the dispersion of points around 10. The greater the dispersion (aka variance), the less meaningful the mean of 10 is.
This plot shows the bell curve or, more correctly, the normal distribution. In a normal distribution the dispersion of data points around the mean (i.e. in the positive and negative directions) is roughly equal and symmetric. As a result, when data are normally distributed, the mean, mode and median are the same value (or very nearly so).
Put another way: the number of extremely small individuals is roughly equivalent to the number of really large individuals.
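
A quick simulated check of this symmetry (the mean of 10 and SD of 2 are arbitrary choices, not values from the plot):

```python
# Sketch: for normally distributed data the mean and median (and roughly the mode) coincide.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, 100_000)   # simulated measurements, mean 10, SD 2

print(round(x.mean(), 2), round(np.median(x), 2))   # both come out very close to 10.0
```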

28

These are all normal distributions.

29

The number of species plotted for different abundance intervals, each interval being twice the preceding one. The overall pattern is normally distributed. An interesting twist: the portion of the graph (red) to the left of Preston's veil line is theoretical, depicting those species that are expected to be present but whose low abundance prevents them from being represented in the sample. Note that the x-axis is on a log scale.

30

From a very interesting paper that is directly relevant to your tutorial projects.
The relevance here is that the various factors affecting detectability are all normally distributed around a mean value.

31

The power of the normal (aka Gaussian) distribution.


With standard deviation we can estimate how dispersed the data are without
actually sampling all individuals.

32

33

Based on our samples, 95% of individuals from the sampled population will fall between (mean - 2 SD) and (mean + 2 SD).
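
A small simulated check of this rule of thumb, using invented, normally distributed body sizes:

```python
# Sketch: roughly 95% of normally distributed values fall within mean +/- 2 SD.
import numpy as np

rng = np.random.default_rng(2)
sizes = rng.normal(100.0, 15.0, 10_000)   # hypothetical body sizes

mean, sd = sizes.mean(), sizes.std(ddof=1)
inside = np.mean((sizes > mean - 2 * sd) & (sizes < mean + 2 * sd))
print(f"{inside:.1%} of individuals fall within mean +/- 2 SD")   # ~95%
```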

34

35

Here we can sample every individual in our population (i.e. we are not trying to infer anything about ALL dogs, just the 5 dogs we have here). In this special, rare case, we divide by n (which here is 5). In future examples, where we use samples to make inferences about a larger population (almost always the situation we face), we would use n - 1 as the denominator in the variance calculation (here that would be 4).
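
A minimal sketch of the two denominators; the five masses below are placeholders, not the slide's actual values:

```python
# Sketch: population variance (divide by n) vs. sample variance (divide by n - 1).
import numpy as np

dog_masses = np.array([12.0, 18.0, 20.0, 25.0, 30.0])   # hypothetical masses (kg) of the 5 dogs

pop_variance = dog_masses.var(ddof=0)      # all 5 dogs measured: divide by n = 5
sample_variance = dog_masses.var(ddof=1)   # inferring about a larger population: divide by n - 1 = 4

print(pop_variance, sample_variance)
```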

36

Having an estimate is only the start of an analysis. We also need to know how precise that estimate is; this is an estimate of the power of the estimate or, put another way, the confidence in the estimate.

37

If two researchers measure the abundance of a particular bird species…

38

The simplest of examples.

39

What is an example hypothesis here?
What is the null hypothesis (H0)?
What is the alternative hypothesis (HA)?
What is the prediction?
How many tosses are necessary? Let's say each toss (replicate) costs money; here we introduce a real-life consideration: data cost money (and/or time, risk, etc.).

40

41

The answer is: the ability to detect a fixed coin increases with the number of tosses.

42

As sample size increases, the test becomes more precise (in other words, our power to detect a fixed coin increases).
By increasing sample size you can overcome random runs of heads or tails because the confidence limits shrink. Low sample sizes may not allow confidence limits small enough to differentiate overlapping distributions.
The confidence interval is telling us that we are 95% certain that the real population mean is within these bounds. Recall that we are using a sample to infer facts about the population.
The deviation from 1:1 in heads:tails that we can expect from a fair coin, with 5% error (see the sketch after this list):
2 tosses = 0 to 2
10 tosses = 2 to 8
50 tosses = 18 to 32
100 tosses = 40 to 60
1000 tosses = 470 to 530
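
The list above can be reproduced, approximately, by asking the binomial distribution for the central 95% range of heads from a fair coin; the endpoints may differ by one from the slide depending on how the tails are cut:

```python
# Sketch: 95% range of heads expected from a FAIR coin, for different numbers of tosses.
from scipy.stats import binom

for n in (2, 10, 50, 100, 1000):
    low, high = binom.ppf(0.025, n, 0.5), binom.ppf(0.975, n, 0.5)
    print(f"{n:4d} tosses: {int(low)} to {int(high)} heads")
```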

43

Therefore the mean estimate is meaningless unless the variance around that mean is also reported. The GOV and NGO estimates may be dramatically different, or the same number, depending on the variance. The take-home here is that it is NOT the mean that should be the focus, but the confidence interval, because the real value (the parameter) can be anywhere in this interval.

44

Remember that we are not talking about "biologically significant" here. At issue is NOT whether the size of the difference is significant or not. It is that smaller average differences in size will be harder to detect (given there will be variation around the means) when the means are not greatly different.

45

When the effect size is large enough and the experiment is well designed, you don't need statistics.

46

…but more often the differences are more subtle, and statistics are used to determine if differences are large enough to be considered significant, aka scientifically valid.

47

The two means could be VERY similar and the difference could still be confidently detected, even with a modest sample size, if the variance is very low.
Think about how low vs. high sample size may hurt or help in differentiating the two populations in the lower panel.
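
A small simulated illustration of this point, using two invented populations whose means differ by only 0.2 but whose variance is very low:

```python
# Sketch: a tiny difference in means is still easy to detect when variance is very low.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
pop_a = rng.normal(10.0, 0.1, 30)   # mean 10.0, SD 0.1, n = 30
pop_b = rng.normal(10.2, 0.1, 30)   # mean 10.2, SD 0.1, n = 30

t_stat, p_value = ttest_ind(pop_a, pop_b)
print(f"p = {p_value:.2g}")   # very small p despite the means differing by only 0.2
```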

48

Imagine one coin always landed heads 100% of the time. The ability to spot it relative to a normal coin would be high (fewer tosses needed).
Imagine one coin always landed heads and the other always landed tails; we would need very few tosses to confidently declare both as fixed.
Now imagine one coin is fixed, but less so: it lands heads 60% of the time. How many tosses are needed to detect it?

49

A priori (or prospective) power analysis: how many samples do I need to detect a difference at a given effect size?
Post hoc (or after the fact): can I detect a difference at a particular threshold given my sample size?
These considerations have very real economic implications in terms of the time and money required to gather data.
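
As a sketch of a prospective power analysis for the 60%-heads coin, the simulation below estimates how often the bias would be detected; the toss counts, number of repeats, and 0.05 threshold are assumptions for illustration:

```python
# Sketch: simulated power to detect a coin that lands heads 60% of the time.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(4)

for n_tosses in (20, 50, 100, 200, 500):
    detections = 0
    for _ in range(1000):                        # repeat the "experiment" 1000 times
        heads = rng.binomial(n_tosses, 0.6)      # the coin really is biased
        if binomtest(heads, n_tosses, 0.5).pvalue < 0.05:   # test against a fair coin
            detections += 1
    print(f"{n_tosses:3d} tosses: power ~ {detections / 1000:.2f}")
```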

50

51

52

53

equal to or less than

54

55

56

Recall that standard deviation (SD) is the square root of variance.
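
A one-line check of this relationship on arbitrary example numbers:

```python
# Sketch: SD is just the square root of the variance.
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # arbitrary example values
print(np.sqrt(x.var(ddof=1)), x.std(ddof=1))              # identical results
```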

57

95% Confidence Limits: the interval in which we are 95% certain the real mean resides.

58
