Statistics-Glossary of Technical Terms For Non-Technical Readers
cluster analysis: (noun) a rather loose collection of numerical methods that can be used
to assign objects to hierarchical groups (clusters) usually resulting in a dendrogram
(branching tree-like graphic, also called a tree) showing similarities or associations
between objects.
coefficient: (noun) a multiplier that measures some property of a particular set of data,
for which it is constant, while differing for different data sets, or a number by which a
variable is multiplied, as in 2x, where 2 is the coefficient.
correlation: a measure of association between two variables, similar to the idea of things
being proportional to each other.
Delta Score: (noun) a statistic of mensural reliability developed during this study. The
Delta score uses pairs of specimens taken from single plants to represent genetically
identical specimens grown in similar environments. Multiple sets of these pairs from
any number of plants are used in the analysis. The mean (average) is calculated for
each pair (see Figure 1). The absolute value of the difference between the mean and one
of the pair members (either one, since both lie the same distance from the mean) is
recorded as the delta (∆). The delta is divided by the mean to arrive at a Delta percent.
The distribution of the
whole set of Delta percents, most of which will be near zero, is examined to find the
median Delta Percent. This median is called the Delta Score.
[Figure 1 panels: calculation of Delta percent (∆ %); distribution of ∆ %s for leaf
length (median = 0.058 = 5.8%); distribution of ∆ %s for style tip length
(median = 0.0199 = 2%).]
FIGURE 1. CALCULATION OF THE DELTA SCORES. Delta score is the median delta percentage.
Histogram of style tip length was compressed to be at a similar scale to the leaf length histogram.
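The Delta Score steps described in this entry can be sketched in Python. This is a minimal illustration rather than the study's own code: the function name `delta_score` and the sample measurements are hypothetical, and the sketch assumes every measurement is positive so that the pair mean is never zero.

```python
from statistics import mean, median

def delta_score(pairs):
    """Return the median Delta percent for a list of (a, b) measurement pairs."""
    delta_percents = []
    for a, b in pairs:
        pair_mean = mean([a, b])        # mean of the two specimens
        delta = abs(pair_mean - a)      # same value whichever member is used
        delta_percents.append(delta / pair_mean)
    return median(delta_percents)

# Hypothetical leaf lengths (mm) for pairs of specimens taken from single plants
pairs = [(10.0, 12.0), (20.0, 20.0), (9.0, 11.0)]
score = delta_score(pairs)
```

A score near zero means the paired measurements agree closely; multiplying the result by 100 gives the percentage form used in Figure 1.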
eigenvalues: also known as characteristic roots or latent roots of a square matrix, these
values correspond to a stretch/shrink factor associated with the transformation that
the matrix represents. Eigenvalues measure the strength of an axis, that is, the amount
of the total sample variance accounted for by that axis. Higher eigenvalues in the first
components or axes indicate that more of the variance is accounted for in those axes;
this outcome is preferred to one where the eigenvalues gradually trail off.
eigenvectors: also known as characteristic vectors or latent vectors of a square matrix,
these vectors give the directions that are only stretched or shrunk, not turned, by the
transformation that the matrix represents. Eigenvectors are normally shown as columns
of coefficients showing the relative contribution of each original variable to the
component (a correlation between the axis in the new matrix and the original variable).
These are used to identify the important variables along the new axis.
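As a rough illustration of how eigenvalues and eigenvectors are read, the sketch below works through a two-variable case by hand. The function name `eigen_2x2_symmetric` and the covariance values are hypothetical, and the closed-form formula applies only to a symmetric 2 x 2 matrix; real analyses with many variables rely on a numerical library instead.

```python
from math import hypot, sqrt

def eigen_2x2_symmetric(a, b, c):
    """Eigenvalues and unit eigenvectors of the symmetric matrix [[a, b], [b, c]]."""
    if b == 0:
        # Already diagonal: the new axes are the original variables themselves
        first, second = ((1.0, 0.0), (0.0, 1.0)) if a >= c else ((0.0, 1.0), (1.0, 0.0))
        return [(max(a, c), first), (min(a, c), second)]
    mid = (a + c) / 2
    spread = sqrt(((a - c) / 2) ** 2 + b ** 2)
    result = []
    for lam in (mid + spread, mid - spread):    # largest eigenvalue first
        vx, vy = b, lam - a                     # solves (a - lam) * vx + b * vy = 0
        norm = hypot(vx, vy)
        result.append((lam, (vx / norm, vy / norm)))
    return result

# Hypothetical covariance matrix for two measured variables
(lam1, vec1), (lam2, vec2) = eigen_2x2_symmetric(4.0, 2.0, 3.0)
share_first_axis = lam1 / (lam1 + lam2)   # proportion of total variance on axis 1
```

Here the first axis carries roughly four fifths of the total variance, and the two numbers in `vec1` show how strongly each original variable contributes to that axis.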
histogram: a graphic for showing a frequency distribution. Usually the count of items in
an interval is shown by the height of a rectangle, but it could also be shown by stacked
icons, alpha-numeric characters, or even shades of a color. A pair of histograms is
illustrated under Delta Score.
leptokurtic: relative to the normal bell-shaped curve, leptokurtic implies that there are
more items in the center (the bell is taller and narrower) and that the "tails" are
heavier, so extreme values are more common than under the normal curve.
measurement error: the component of an observed score that accounts for the difference
between the observed score and the true score.
median: a measure of the center or middle of the dataset which is calculated by sorting
the data into ascending order and finding the observation that has an equal number of
observations on either side of it, thus the middlemost value in a distribution of data.
The median is not as influenced by outliers as the mean.
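A small sketch of why the median resists outliers, using Python's standard `statistics` module. The numbers are made up for illustration.

```python
from statistics import mean, median

values = [4, 5, 6, 7, 8]
with_outlier = [4, 5, 6, 7, 80]   # largest value replaced by an outlier

mean_plain, median_plain = mean(values), median(values)
mean_out, median_out = mean(with_outlier), median(with_outlier)
# The mean moves from 6 to 20.4, while the median stays at 6.
```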
nested: a condition of one thing being contained within another, as in cities nested
within states; here used for 1) a concept of hierarchical order wherein one level of
classification is subordinate to another level of classification, and 2) cases where one
physical structure is contained within a second physical structure.
nested variance components: variance components are, in general terms, the variance
accounted for by a source. They are estimated using the mean squares for random
effects and their expected values. In nested variance components, the sources of
variance are nested effects, so that in the city(state) model, there is an effect of being
in a state that all cities will be subject to, there is the effect of each city itself, and
there is also some variation that can't be assigned to either effect (model error).
outgroup: a group of organisms that is related to but removed from the study taxa.
outlier: an observation that is extremely small or large relative to the rest of the
observations. These may be due to errors of measurement, to recording errors, or to
aberrant individuals. If the sample is small, outliers can have a big effect on the
mean.
Pearson Correlation Coefficient: the most commonly used test for correlation, used
with continuous data. The assumptions for this test are that the data are normally
distributed and the variances are equal.
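For readers who want to see the arithmetic, here is a minimal sketch of the standard Pearson formula in plain Python. The function name `pearson_r` and the measurements are hypothetical, and the sketch computes only the coefficient, not a significance test.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical measurements: width is exactly twice the length
leaf_length = [1.0, 2.0, 3.0, 4.0, 5.0]
leaf_width = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson_r(leaf_length, leaf_width)                  # perfectly proportional
r_reversed = pearson_r(leaf_length, leaf_width[::-1])   # perfectly inverse
```

Values close to +1 or -1 indicate a strong straight-line relationship, and values near 0 indicate little or none.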
platykurtic: a non-normal distribution that has fewer items in the center of the
distribution and more toward the sides, giving a flatter curve with lighter "tails"
than the normal bell-shaped curve.
p-value: the probability of getting this result (or one more extreme) in the sample you
took, if there were really no differences or correlations between the things you are
studying. Typically a p-value of 0.05 or less is taken to indicate that some difference
or correlation exists.
quantile: points at which various percentages of the total sample are above or below.
The median (50% quantile) and the quartiles (25%, 50%, and 75% quantiles) are examples
of quantiles. Besides dividing the sample into halves or quarters, the quantile points
can be defined to divide the sample into any number of parts.
quartile: the set of quantiles that divides the items into four equal-sized sets; for
example, 25 percent of the items are between the 1st quartile and the median.
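Quartiles can be computed with the `quantiles` function in Python's standard `statistics` module, as in the sketch below. The data are made up, and statistical packages use slightly different conventions for quantiles, so results on small samples may differ a little between programs.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7]

# n=4 asks for the three cut points that split the data into quarters
q1, q2, q3 = quantiles(data, n=4)

# The second quartile is the median (the 50% quantile)
middle = median(data)
```

Passing a different `n` (for example `n=10`) returns the cut points that divide the sample into that many parts.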
reliability: the extent to which a measure represents the "true" measure; a measure of
consistency between repeated measurements of the same thing.
r-value: the correlation coefficient. This value shows the nature (positive or negative)
and size of the relationship between two variables.
standard deviation: a measure of variability equal to the square root of the variance. In
a normal distribution, approximately 68% of the sample will fall within 1 standard
deviation (above or below) of the mean. In non-normal distributions this measure is
less useful.
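The two statements in this entry can be checked with Python's standard `statistics` module. This is only a sketch: the data set is invented, and it uses the population forms of the variance and standard deviation.

```python
from statistics import NormalDist, pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = pvariance(data)   # population variance of the data
sd = pstdev(data)            # population standard deviation: its square root

# Share of a normal distribution lying within 1 standard deviation of the mean
standard_normal = NormalDist()   # mean 0, standard deviation 1
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
```

For these data the variance is 4 and the standard deviation is 2, and `within_one_sd` comes out at about 0.68.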
stem and leaf plot: (also called a stemplot) a graphic device for showing a frequency
distribution, but where the intervals are defined by the first digits of the numbers and
the items are shown as the final digits of the numbers. For example, if the items
ranged from 1 to 100, the intervals would be units of ten, labeled as 0, 1, 2,…, 10, and
the items would be shown as the last digit of each number; for example 73 would
show up as a 3 in the 7 interval.
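A stem and leaf plot is simple enough to build by hand, as the sketch below suggests. The function name `stem_and_leaf` and the data are hypothetical, and the code assumes non-negative whole numbers with intervals of ten, as in the example above.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem and leaf plot using intervals of ten."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)      # 73 -> stem 7, leaf 3
        leaves[stem].append(str(leaf))
    lines = []
    # Include empty stems so that gaps in the distribution stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem:>2} | {''.join(leaves.get(stem, []))}")
    return lines

data = [73, 75, 68, 81, 79, 62, 100]
plot = stem_and_leaf(data)
print("\n".join(plot))
```

The row for stem 7 lists the leaves 3, 5, and 9, standing for the items 73, 75, and 79.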
transform: to change the scale the observations occur on. Common transformations
include subtracting the mean from each item (which changes the observations to have
a mean of zero), dividing by the standard deviation (which makes them a proportion of
the standard deviation), or converting them to logarithmic values.
ultrametric distance: technically a distance function like a metric, but with an
additional property (a stronger form of the triangle inequality). Here the significance
is that this concept of distance allows a concise description of a branching pattern.
In morphometrics, ultrametric functions are used to quantify clustering dendrograms,
but it was suggested that this could also be used to compare vein branching patterns.
unordered pair: a pair of measurements of similar things for which there is no reason to
have one first and the other second.
weighting: using a coefficient to emphasize some part of the data set. Weighting is used
in ordination to control what variables are likely to fall on the major axis of
variation.
z-score: the numerical result of transforming the data by subtracting the mean from each
observation (thus centering the data on zero) and dividing the result by the standard
deviation (making it an expression of how many standard deviations away the
observation is from the mean).
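The z-score transformation can be sketched in a few lines of Python. The function name `z_scores` and the data are hypothetical, and the sketch uses the sample standard deviation (n - 1 denominator); some texts use the population form instead.

```python
from statistics import mean, stdev

def z_scores(values):
    """Center the values on zero and express them in standard deviation units."""
    m = mean(values)
    s = stdev(values)   # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]

z = z_scores([2.0, 4.0, 6.0, 8.0, 10.0])
```

After the transformation the values have a mean of zero and a standard deviation of one, so an item with a z-score of 1.5 lies one and a half standard deviations above the mean.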