Benthic Macroinvertebrate Community Assemblage in Bear Run Watershed


Independent Analysis II

Alexa Hershberger

4/26/2021

Benthic Macroinvertebrate Community Assemblage in the Bear Run Watershed

Objective

The data selected for this study are from my undergraduate pilot study in the Bear Run Watershed. This project is funded by the Susquehanna River Basin Commission. Acid mine drainage (AMD) is a result of abandoned coal mines and refuse piles. Bear Run Watershed is littered with abandoned coal mines that were not properly treated once the company left. The untreated mining soil releases heavy trace metals into the stream that are detrimental to wildlife. In 2005, the Susquehanna River Basin Commission conducted a nine-phase AMD restoration project in the Watershed. The restoration improved the water quality and increased fish species richness, but the recovery of the macroinvertebrate community has been slow, perhaps due to the poor habitat quality. This pilot study examined chlorophyll a abundance, environmental variables, and macroinvertebrate abundance, but I will only focus on the macroinvertebrate abundance for this analysis. Our goal was to determine how substrate type and site impairment affect macroinvertebrate recovery in the Bear Run Watershed. We predicted that the recovery would be stronger on substrates lacking an iron residue and other precipitates, and thus that abundance would be higher on the artificial substrate and at the control sites.

Methods

Sample Design

This study was conducted in the Bear Run Watershed in northern Indiana County. It is one of the largest AMD impact zones to the headwaters section of the Susquehanna River. The original data had six sites. Of these six sites, three were AMD sites and three were control/remediated sites. Each site also had three Hester-Dendy samplers to replicate the artificial substrate and three Hess sampling collections to replicate the natural substrate. To avoid pseudo-replication, I combined the three subsamples for each substrate type into two measures per site. So, there was one measurement for the artificial substrate macroinvertebrate community and one measurement for the natural substrate macroinvertebrate community for each site.

Figure 1. Macroinvertebrate sampling methods. (Hester-Dendy sampler on left and Hess sampler on right)

The sample locations were chosen by the project manager at the Susquehanna River Basin Commission and Dr. Janetski. The sites were based on those used by a previous thesis student in Dr. Janetski's lab. The environmental variables collected at each site were: substrate type, impairment, pH, water temperature in Celsius, and conductivity. The collected individuals were stored in 80% ethyl alcohol and identified to the order taxonomic level.
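
The pooling step described under Sample Design can be sketched in base R as follows. This is a minimal illustration, not the code used in the study: the data frame `raw`, its column names, and all counts are hypothetical, and it assumes that "combined" means the three subsample counts were summed (the text does not say whether they were summed or averaged).

```r
# Hypothetical raw counts: 2 sites x 2 substrates x 3 subsamples each.
# Names and numbers are invented for illustration only.
raw <- data.frame(
  SITE       = rep(c("S1", "S2"), each = 6),
  SUBSTRATE  = rep(rep(c("artificial", "natural"), each = 3), times = 2),
  Diptera    = c(4, 6, 5, 1, 0, 2, 9, 7, 8, 3, 3, 1),
  Plecoptera = c(2, 1, 0, 0, 0, 1, 5, 4, 6, 2, 2, 2)
)

# Assumption: subsamples are pooled by summing, so that each
# site x substrate combination contributes exactly one row.
pooled <- aggregate(cbind(Diptera, Plecoptera) ~ SITE + SUBSTRATE,
                    data = raw, FUN = sum)

# Sort for readability and reset the row names.
pooled <- pooled[order(pooled$SITE, pooled$SUBSTRATE), ]
rownames(pooled) <- NULL
print(pooled)
```

With six sites and two substrate types, the same step would yield twelve rows, one per site and substrate combination.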

Analysis

I based my analysis on the AW09, AW10, and AW11 worksheets. To first understand my dataset, I wanted to conduct a community analysis examining the patterns within the entire species assemblage. In the species assemblage data, there were six locations in Bear Run and a total of ten species. I also wanted to explore the beta diversity because it compares changes in macroinvertebrate communities between sampled locations. The result of the beta diversity will be used to analyze the second part of my hypothesis by comparing community structure between AMD and control sites. The third analysis workshop will link my community results to my environmental data. This will be used to look at the effect of substrate type on macroinvertebrate community structure. The R packages used in this analysis were: readr, here, dplyr, tidyr, tibble, ggplot2, betapart, and vegan.

Results

The first step in my analysis was to complete a species assemblage diversity analysis. The main intention of this part of the analysis was to examine different patterns throughout the entire species assemblage. Diversity, species richness, and species evenness were all analyzed in this portion. The macroinvertebrate counts were from six locations in the Bear Run Watershed. There was a total of 10 species. The most abundant species were Diptera (123), Plecoptera (114), Coleoptera (26), and Annelida (22). The least abundant species were Trichoptera (19), Megaloptera (30), Tipuloidea (2), Pterygota (1), and Copepod (1).

Figure 2. Histogram of overall abundance across sites.

The histogram for overall abundance is strongly skewed to the right. There are two outlier species that have a higher frequency compared to the others (Figure 2). The community is even except for these two species. I would expect to see a dominant species in this community, which I would assume to be Diptera and Plecoptera, because they were listed as the most abundant orders.

I then looked at species occurrence in the population. The total occurrence of the most abundant and least abundant species is shown in Table 1.

Table 1. Species Occurrence in the Bear Run Watershed.

Species          Species Occurrence
Plecoptera       12
Ephemeroptera    11
Trichoptera      10
Diptera          10
Annelida         9
Coleoptera       8
Megaloptera      3
Tipuloidea       2
Pterygota        1
Copepod          1

I then used the computed Shannon Diversity Index to determine the biological significance of differences in diversity. I chose the Shannon Diversity Index because it is the default index in vegan's diversity() function. This index reflects the balance between species richness and species evenness.

Figure 3. Histogram of effective species unit from Shannon Diversity Index.

The effective number of species across the sample locations ranges from approximately 3 to 5 (Figure 3).

The third part of this initial analysis was to examine the rarefaction curve to look at how the species diversity varies among sites. This may have been skewed due to the difference in water quality between the AMD and control streams. I added the species rarefaction analysis to address any issues with the detection probability of low-abundance species in my study.

Figure 4. Rarefaction curve of species richness.

The results from the rarefaction curve suggest that the minimum sample size from our data set should be approximately 20 individuals. The trend does not appear to fully stabilize once all the species have been encountered (Figure 4). This does raise some concern, but I am not sure how to address it. I would assume that if I increase my sample size, I should have a more representative sample. This should stabilize the species rarefaction curve.

Next, I looked at the gamma diversity and species accumulation curves. The species accumulation curve looks at site-level variation, which should provide a more adequate comparison between AMD and control sites than the rarefaction curve. The gamma diversity is the overall observed species richness/diversity based on the six sites sampled. Based on the species accumulation plot, I should encounter more species if I sampled an additional 6 locations (Figure 5). The curve appears to flatten out the most at approximately 12 sites. It will be interesting to see how the species accumulation curves compare when I have 16 sites.
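
The Shannon index, the effective number of species, and Pielou's evenness referred to in this section can be computed by hand, which may help when interpreting the range of 3 to 5 effective species. The sketch below uses base R only. The function name `shannon_summary` and both count vectors are hypothetical and are not data from this study. It uses the natural logarithm, which is also the default in vegan's `diversity()`.

```r
# Hand computation of richness, Shannon H, effective species number exp(H),
# and Pielou's evenness J = H / log(S) for a single sample.
shannon_summary <- function(counts) {
  counts <- counts[counts > 0]      # drop absent taxa so log(p) is defined
  p <- counts / sum(counts)         # relative abundances
  H <- -sum(p * log(p))             # Shannon index (natural log)
  S <- length(counts)               # observed richness
  c(richness = S, shannon = H, effective = exp(H), pielou = H / log(S))
}

# Two invented samples with the same richness but different evenness.
even   <- shannon_summary(c(Diptera = 10, Plecoptera = 10,
                            Coleoptera = 10, Annelida = 10))
uneven <- shannon_summary(c(Diptera = 37, Plecoptera = 1,
                            Coleoptera = 1, Annelida = 1))

print(round(rbind(even, uneven), 3))
```

In the perfectly even sample the effective number of species equals the richness (4) and evenness is 1. In the sample dominated by one taxon the effective number falls well below the richness, even though four taxa are present in both.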

Figure 5. Species accumulation plot.

I then computed the beta diversity for the dataset because I wanted to compare the macroinvertebrate species assemblages between sites. The nested patterns for beta diversity were tested to detect whether species loss was random or occurred in a predictable pattern. This is especially important in AMD sites.

Figure 6. Nested temperature plot.

In the nested temperature plot, most of the red appears on the left side of the curve. This shows that there is a strong nested pattern. There are also a few rare species indicated on the right side of the curve (Figure 6). I then compared the observed community to a pattern from the random simulation of species across the sites, where the number of species occurrences and the number of species per site were both held constant.

Table 2. Observed Community Simulation.

Statistic    SES     Mean    Prob.
22.6         21.2    21.2    0.27

The p-value of the observed community simulation is not significant, so the observed pattern cannot be distinguished from a random structure (Table 2). If the p-value were statistically significant, it would indicate a nested design, because the null model is random.

The second portion of the second large analysis was reviewing the unconstrained ordination. The principal component analysis relies on a Euclidean distance.

Principal component one accounts for approximately 40% of the variation in the original dataset. The proportion explained drops drastically after PC4. PC1 through PC4 account for approximately 90% of the variation.

Table 3. Kaiser-Guttman Criterion

PC1      PC2      PC3
0.114    0.061    0.026

According to the Kaiser-Guttman criterion, PC1, PC2, and PC3 should be retained (Table 3).
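
The Kaiser-Guttman criterion applied to the principal components in this section keeps every axis whose eigenvalue is larger than the mean eigenvalue. The sketch below shows the rule on a made-up eigenvalue vector. The values in `ev` are hypothetical and are not the eigenvalues from this analysis, since only the three retained values are reported in Table 3.

```r
# Hypothetical eigenvalues for five principal components (illustration only).
ev <- c(PC1 = 5.0, PC2 = 3.0, PC3 = 1.5, PC4 = 0.3, PC5 = 0.2)

# Kaiser-Guttman: retain the axes whose eigenvalue exceeds the mean eigenvalue.
retained <- ev[ev > mean(ev)]

# Proportion of variation explained by each axis, and the running total.
prop_explained <- ev / sum(ev)

print(mean(ev))
print(retained)
print(round(cumsum(prop_explained), 2))
```

Here the mean eigenvalue is 2, so only PC1 and PC2 would be retained. The same comparison against the mean is what the expression `macro.hel.eig[macro.hel.eig > mean(macro.hel.eig)]` in the Code Chunks section performs on the real eigenvalues.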

Figure 7. Biplot of community assemblage based on PC1 and PC2 with a scaling of 2.

The plotted PCA was based on scaling 2. The species most strongly represented by PC1 appears to be Diptera (~0.5). The sites that appear the most different are site 7 and site 12. They are both control sites but differ in substrate (artificial versus natural). The species most strongly represented by PC2 was Annelida (0.4). The two sites that appear the most different are site 3 and site 5. This is interesting because both are AMD sites on natural substrate. Based on the eigenvalues, CA1, CA2, and CA3 should be included in the correspondence analysis (Figure 8).

Figure 8. Correspondence analysis of eigenvalues.

To work with the rank order of the dissimilarities rather than their actual values, I conducted a non-metric multidimensional scaling (NMDS), which is a non-parametric form of ordination. The stress plot showed that the NMDS does an adequate job of characterizing the original variation in the dataset. The points are fairly close to the line, so it is a decent fit (Figure 9).

Figure 9. Stress plot of non-metric multidimensional scaling.

Next, I added categorical environmental descriptions to the unconstrained ordination plot.

Figure 10. Unconstrained ordination plot with environmental variables.

The two most distinct taxa in the ordination are Pterygota and Copepod. This may be skewed because there was only one Copepod found in the samples. The two most similar taxa are Diptera and Annelida because they overlap. This makes sense because both are pollution-tolerant (Figure 10).
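
The NMDS in this report is fitted with `metaMDS(..., distance = "bray")` (see Code Chunks), so it works from the rank order of Bray-Curtis dissimilarities between samples rather than from the dissimilarity values themselves. The sketch below computes Bray-Curtis by hand for three invented samples and then ranks the pairs. The matrix `comm` and the helper `bray` are hypothetical and serve only to illustrate the calculation.

```r
# Three invented samples (rows) by three taxa (columns); illustration only.
comm <- rbind(A = c(10, 0, 5),
              B = c(8, 2, 5),
              C = c(0, 9, 1))

# Bray-Curtis dissimilarity: sum of absolute differences over total abundance.
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Dissimilarity for every pair of samples.
pairs <- combn(rownames(comm), 2)
d <- apply(pairs, 2, function(p) bray(comm[p[1], ], comm[p[2], ]))
names(d) <- apply(pairs, 2, paste, collapse = "-")

print(round(d, 3))
print(rank(d))   # NMDS tries to preserve this ordering in the ordination
```

Samples A and B share most of their individuals and are the closest pair, while A and C share few and are the farthest apart. A stress plot such as Figure 9 shows how well the ordination distances preserve that ordering.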
Lastly, I conducted a post hoc comparison of the environmental variables to the unconstrained ordination via non-metric multidimensional scaling.

Figure 11. Post hoc comparison of environmental variables to unconstrained ordination.

The variables that are almost perfectly aligned with NMDS axis 2 are substrate type and temperature. The variable that appears to be the strongest when explaining differences in species communities is substrate type. The variables that appear to change most similarly across the sites are conductivity and pH. I then used a redundancy analysis, which is a form of multivariate linear regression. The amount of variation explained by the constrained portion of the ordination was approximately 50% (r-squared = 0.73, adjusted r-squared = 0.5). The variables with high variance inflation factors are water temperature and conductivity.

There was no difference in the r-squared values between the a priori models and the redundancy analysis. To determine which variables to include in the model, I used a step-wise variable selection approach. I ran forward selection, backward selection, and forward-backward selection. The outcome of each selection approach was the same. They all resulted in the variables of substrate type, temperature, and pH. The r-squared value was also the same for all three approaches. This indicated that it did not matter which approach I chose.

The ordination of this dataset was examined by the variance, eigenvalues, and accumulated constrained eigenvalues. The proportion of the original variation in the data set that is captured by the first two constrained components is about 0.55 (RDA1 + RDA2). The additional information provided by the third constrained component is approximately 0.035, which is not much.

I then made a triplot that displays my sites, species, and environmental variables from my redundancy analysis model with contours.

Figure 12. RDA plot of environmental variables.

Substrate type and pH appear to be most strongly related to the first axis because they are the most parallel to the axis. All the variables have a non-linear relationship with the contour curves (Figure 12).

Conclusion

The results of this analysis support my prediction that substrate type influences the macroinvertebrate community. Temperature and pH both have an influence on the community assemblage, which is expected. It is interesting that pH has an influence while site type does not. I would predict that there would be a different community present at the AMD sites. A literature search indicated that there should be a difference. It is possible that my sample size was not large enough to pick up the difference in site type. I will increase my site number for my thesis, so this should solidify my results. I will also add a sediment analysis to my thesis to better understand how the substrate plays a role in the macroinvertebrate community.

Code Chunks

easypackages::libraries("readr", "here", "dplyr", "tidyr", "tibble",
                        "ggplot2", "vegan")

#Species Assemblage Data
macro_BRW.dat <- read_csv(here("Original Data", "macro_specR.csv"))
View(macro_BRW.dat)

##Summarizing Species Assemblage Data
# Total abundance of each taxon, summed across samples
tot.abund <- macro_BRW.dat %>%
  colSums() %>%
  sort(decreasing = TRUE) %>%
  data.frame

tot.abund %>%
  head(5) # 5 most abundant species
tot.abund %>%
  tail(5) # 5 least abundant species

tot.abund[,1] %>%
  hist(breaks=20) # Display histogram of overall abundance across sites

# Occurrence of each taxon: number of samples in which it was found
tot.occur <- macro_BRW.dat %>%
  summarise(across(everything(), ~ sum(. > 0))) %>%
  t %>%
  data.frame %>%
  arrange(desc(.))

tot.occur %>%
  head(5) # 5 most frequently occurring species
tot.occur %>%
  tail(5) # 5 least frequently occurring species

tot.occur[,1] %>%
  hist(breaks=10) # Display histogram of occurrence across sites

##Initial Diversity Indices
# specnumber() returns species richness per sample (stored here as macro.Shannon)
macro.Shannon <- specnumber(macro_BRW.dat)
hist(macro.Shannon, breaks=10)

macro.H <- diversity(macro_BRW.dat) #Basic Shannon Diversity Index
hist(macro.H, breaks=10)

macro.simpson <- diversity(macro_BRW.dat, index="simpson") #Basic Simpson's Diversity Index
hist(macro.simpson, breaks=10)

macro.J <- macro.H/log(macro.Shannon) #Pielou's evenness
hist(macro.J)

##Effective Species Unit
macro.Hsp <- exp(macro.H) ##Effective species number
hist(macro.Hsp)

#Species Rarefaction
macro.min <- min(rowSums(macro_BRW.dat)) # smallest sample size
macro.Srare <- rarefy(macro_BRW.dat, sample = macro.min)
macro.Srare
# Observed richness against rarefied richness
plot(macro.Shannon, macro.Srare,
     xlab = "Observed No. of Species",
     ylab = "Rarefied No. of Species")

rarecurve(macro_BRW.dat, step = 20, sample = macro.min, col = "blue", cex = 0.6)

##Gamma Diversity
macro.sac <- specaccum(macro_BRW.dat)
plot(macro.sac, ci.type="polygon", ci.col="yellow")

macroBRW_specRich <- specpool(macro_BRW.dat)
macroBRW_specRich

##Beta Diversity
macro_dat.dist <- vegdist(macro_BRW.dat, method="jaccard") # Includes relative abundance in the calculation
macro_bin.dist <- vegdist(macro_BRW.dat, binary=TRUE, method="jaccard") # Calculation is based on presence-absence

betadiver(help=TRUE) # Displays different methods used to calculate beta diversity

macro.beta <- betadiver(macro_BRW.dat, method="r", order=TRUE)
macro.beta

#Nested Patterns for Beta Diversity
nestedchecker(macro_BRW.dat)
macro.temp <- nestedtemp(macro_BRW.dat)
plot(macro.temp) # temperature
plot(macro.temp, kind="incidence") # incidence

oecosimu(macro_BRW.dat, nestedchecker, "quasiswap")

#Beta Diversity
##Species Assemblage Data
macro_env.tdf <- read.csv(here("Original Data", "macro_envR.csv"))
macro_sp.tdf <- read.csv(here("Original Data", "macro_spR.csv"))
macro_sp.tdf
#macro_sp.tdf <- macro_sp.tdf %>%
#  mutate(SITE = as.factor(ï..SITE),
#         SUBSTRATE = as.factor(SUBSTRATE))

##Nested Temperature
oecosimu(macro_BRW.dat, nestfun=nestedtemp, "quasiswap")

##Similarity and Distance Measures
betadiver(help=TRUE) # Displays different methods used to calculate beta diversity

macro.beta <- betadiver(macro_BRW.dat, method="sor", order=TRUE)

macro_dat.dist <- vegdist(macro_BRW.dat, method="jaccard") # Includes relative abundance in the calculation
macro_bin.dist <- vegdist(macro_BRW.dat, binary=TRUE, method="jaccard") # Calculation is based on presence-absence
##Unconstrained Ordination
##Principal Component Analysis
view(macro_BRW.dat)
macro.hel <- decostand(macro_BRW.dat, method="hellinger") # Hellinger transformation
macro.hel.pca <- rda(macro.hel)

##Number of Relevant Axes
(macro.hel.eig <- macro.hel.pca$CA$eig)
macro.hel.eig[macro.hel.eig > mean(macro.hel.eig)] #Kaiser-Guttman

evplot <- function(ev)
{
  # Broken stick model (MacArthur 1957)
  n <- length(ev)
  bsm <- data.frame(j=seq_len(n), p=0)
  bsm$p[1] <- 1/n
  for (i in 2:n) bsm$p[i] <- bsm$p[i-1] + (1/(n + 1 - i))
  bsm$p <- 100*bsm$p/n
  # Plot eigenvalues and % of variation for each axis
  op <- par(mfrow=c(2,1))
  barplot(ev, main="Eigenvalues", col="bisque", las=2)
  abline(h=mean(ev), col="red")
  legend("topright", "Average eigenvalue", lwd=1, col=2, bty="n")
  barplot(t(cbind(100*ev/sum(ev), bsm$p[n:1])), beside=TRUE,
          main="% variation", col=c("bisque",2), las=2)
  legend("topright", c("% eigenvalue", "Broken stick model"),
         pch=15, col=c("bisque",2), bty="n")
  par(op)
}

##Plotting the PCA
plot(macro.hel.pca, scaling=1)
plot(macro.hel.pca, scaling=2)
biplot(macro.hel.pca, scaling=1)
biplot(macro.hel.pca, scaling=2)

##Correspondence Analysis
macro.ca <- cca(macro_BRW.dat)
summary(macro.ca)
macro.ca.eig <- macro.ca$CA$eig
macro.ca.eig

#Kaiser-Guttman
macro.ca.eig[macro.ca$CA$eig > mean(macro.ca$CA$eig)]

evplot(macro.ca.eig)

plot(macro.ca, scaling=1)
plot(macro.ca, scaling=2)

##Non-metric Multidimensional Scaling
macro.nmds <- metaMDS(macro_BRW.dat, k=2, distance="bray")
macro.nmds

stressplot(macro.nmds)

plot(macro.nmds, main="NMDS_Bray")

plot(macro.nmds, type = "n")
points(macro.nmds, display = "sites", cex = 0.8, pch=21, col="red", bg="yellow")
text(macro.nmds, display = "spec", cex=0.7, col="blue")

##Add categorical environmental descriptions to unconstrained ordination plots
View(macro_env.tdf)

##Ellipses
plot(macro.nmds, type = "n") # empty ordination frame
text(macro.nmds, display = "spec", cex=0.7, col="blue")

##Hulls
plot(macro.nmds, type = "n") # empty ordination frame
text(macro.nmds, display = "spec", cex=0.7, col="blue")

#Post Hoc Comparison of Environmental Variables to Unconstrained Ordination
##Non-metric multidimensional scaling

##Fit Selected Variables
macro.ef <- envfit(macro.nmds~SUBSTRATE+PH+TEMP+CONDUCT, data=macro_env.tdf,
                   permutations=1000)
macro.ef

plot(macro.nmds, display="sites")
plot(macro.ef) # adds the fitted vectors to the site plot

##Constrained Ordination
##Scale Data
macro.hel <- macro_BRW.dat %>%
  decostand(method="hellinger")
macro.hel

macro.e.scl <- macro_env.tdf %>%
  scale %>%
  data.frame
macro.e.scl

##Redundancy Analysis
macro.rda.all <- rda(macro.hel~., macro.e.scl)
(macro.rda.all)
RsquareAdj(macro.rda.all)
vif.cca(macro.rda.all)

##Reducing Explanatory Variables
##Apriori Models
macro.rda.KNPFe <- rda(macro.hel~CONDUCT+SUBSTRATE+TEMP+PH+ï..SITE, macro.e.scl) #include all variables; r-squared value drops drastically
#summary(pine.rda)
macro.rda.KNPFe

RsquareAdj(macro.rda.KNPFe)

vif.cca(macro.rda.KNPFe)

##Step-wise Variable Selection
macro.mod.all <- rda(macro.hel~ . , macro.e.scl) #Set maximum model size
macro.mod.0 <- rda(macro.hel~ 1 , macro.e.scl) #Set minimum model size

macro.step.fwd <- ordistep(macro.mod.0, scope=formula(macro.mod.all),
                           direction="forward",
                           permutations=how(nperm=1000))
vif.cca(macro.step.fwd)
RsquareAdj(macro.step.fwd)

macro.step.bwd <- ordistep(macro.mod.all, direction="backward",
                           permutations=how(nperm=1000))
vif.cca(macro.step.bwd)
RsquareAdj(macro.step.bwd)

macro.step.both <- ordistep(macro.mod.0, scope=formula(macro.mod.all),
                            direction="both",
                            permutations=how(nperm=1000))
vif.cca(macro.step.both)
RsquareAdj(macro.step.both)

##Examine Ordination
summary(macro.step.bwd)

coef(macro.step.bwd)
anova(macro.step.bwd, permutations=1000)

anova(macro.step.bwd, by="axis", permutations=1000) ###Tests significance of each axis

#coef(macro.rda)
macro.rda.all

##Evaluate variables of the constrained ordination
anova(macro.step.bwd, by="margin", permutations=1000) ####Marginal effects (Type III ANOVA where order should not matter)

coef(macro.step.bwd)

#Triplots
# Triplot with pH contours
plot(macro.step.bwd, scaling=2, main="Triplot RDA; Scaling 2", type="n")
points(macro.step.bwd, display="sites", scaling=2, cex=1.1, pch=21,
       col="red", bg="yellow")
text(macro.step.bwd, display="spec", cex=0.7, col="black", scaling=2)
text(macro.step.both, display="bp", cex=0.7, col="purple4", scaling=2)

with(macro_env.tdf,
     ordisurf(macro.step.fwd, PH, add=TRUE, col="green4"))

# Triplot with substrate contours
plot(macro.step.bwd, scaling=2, main="Triplot RDA; Scaling 2", type="n")
points(macro.step.bwd, display="sites", scaling=2, cex=1.1, pch=21,
       col="red", bg="yellow")
text(macro.step.bwd, display="spec", cex=0.7, col="black", scaling=2)
text(macro.step.both, display="bp", cex=0.7, col="purple4", scaling=2)

with(macro_env.tdf,
     ordisurf(macro.step.fwd, SUBSTRATE, add=TRUE, col="green4"))

# Triplot with temperature contours
plot(macro.step.bwd, scaling=2, main="Triplot RDA; Scaling 2", type="n")
points(macro.step.bwd, display="sites", scaling=2, cex=1.1, pch=21,
       col="red", bg="yellow")
text(macro.step.bwd, display="spec", cex=0.7, col="black", scaling=2)
text(macro.step.both, display="bp", cex=0.7, col="purple4", scaling=2)

with(macro_env.tdf,
     ordisurf(macro.step.fwd, TEMP, add=TRUE, col="green4"))