Experiment 1. Data Management


Experiment 1

Data Management
K. Draheim, B. N. Estrella, K. M. L. Garcia, L. Guillermo
Department of Biological Sciences, College of Science, University of Santo Tomas, España
Street, Manila 108
Introduction
Data management is an essential part of research. Obtaining, handling, analyzing, and
presenting data are meticulous tasks that a scientist must learn to perform effectively. Not only
must data be well kept, they must also be managed in such a way that error is kept as
small as possible.
Prior to analysis, data are gathered in numerous ways. Traditional data recording using
pencil and paper is still the most common way of capturing data but more technologically
advanced methods are now being employed by some researchers. Results and observations must
be neatly documented. Data sheets should be easily understood and interpreted. Analysis can
then be performed.
Spreadsheet programs have become indispensable for this purpose. They store, graph,
and statistically analyze data. A common example is Microsoft Excel. It has been widely
used since version 5 in 1993 and has replaced Lotus 1-2-3 as the standard for spreadsheets
(Kaul, 2013). The Analysis ToolPak is an Excel add-in that must be activated prior to
statistical analysis (Weterings, 2010).
Statistical methods may be descriptive, parametric, or non-parametric. Descriptive
statistics are numbers that summarize and describe information collected from an
experiment or a survey (Hebl, 2012). They include the mean, median, mode, standard deviation,
standard error, and coefficient of variation, among others. Descriptive statistics allow a
researcher to consolidate data into simpler terms. Parametric and non-parametric statistics,
on the other hand, are distinguished by their levels of assumption and techniques of analysis.
Non-parametric tests may be used with actual observations or with transformed and ranked
observations. The data may be on a nominal, ordinal, or interval scale, the analyses are based
on comparing medians, and the tests are suited to count and derived data. Parametric tests,
however, are used only with actual observations that are strictly on an interval scale. Means
and variances are compared in the analyses, and count and derived data require transformation.
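The descriptive statistics listed above can be illustrated with a minimal Python sketch. It uses the ten Area I lengths from the first sampling in this report; the function name `describe` and the dictionary keys are assumptions made only for this illustration, and the standard deviation uses the sample (n - 1) denominator.

```python
import math
import statistics

# Lengths (cm) of the 10 Area I fish from the first sampling in this report.
area_1 = [83, 93, 92, 93, 88, 87, 86, 85, 85, 94]

def describe(values):
    """Return common descriptive statistics for one sample (illustrative helper)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "modes": sorted(statistics.multimode(values)),  # every most-frequent value
        "sd": sd,
        "se": sd / math.sqrt(n),        # standard error of the mean
        "cv_percent": 100 * sd / mean,  # coefficient of variation, in percent
    }

summary = describe(area_1)
for name, value in summary.items():
    print(name, value)
```

The same helper could be applied to any of the other area samples to compare their summaries side by side.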
The analysis of variance (ANOVA) is a parametric test that determines the degree to
which the means of two or more groups vary in an experiment. ANOVA may be one-way or two-way.
One-way ANOVA is used in experiments where there is only one independent variable. It allows
a researcher to assess the effect of only one factor. Multiple samples are compared but only one
factor is involved. Two-way ANOVA, on the other hand, is designed to analyze two factors
simultaneously. Hence, there are two independent variables that affect the dependent variable. It
also determines whether there is an interaction between the two factors (Lombardo, 2014).
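As a rough illustration of the one-way case, the sketch below computes the F ratio (between-group mean square over within-group mean square) for a single factor. It uses the first-sampling lengths from this report grouped by area; the function name `one_way_anova` is an assumption for this example, and this calculation is not part of the report's own analysis.

```python
# First-sampling lengths (cm) from this report, grouped by marine reserve.
groups = {
    "Area I":   [83, 93, 92, 93, 88, 87, 86, 85, 85, 94],
    "Area II":  [83, 83, 88, 86, 83, 86, 86, 87, 81, 81],
    "Area III": [84, 78, 84, 80, 82, 83, 85, 83, 88, 89],
    "Area IV":  [82, 74, 80, 75, 79, 88, 85, 80, 81, 80],
}

def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for s in samples for x in s]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

f_value, df_b, df_w = one_way_anova(list(groups.values()))
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

Only one factor (area) enters this calculation, which is what separates it from the two-way design, where a second factor and the interaction term are also partitioned out.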

Objective
1. To learn some of the principles and techniques of data management
2. To become familiar with the use of the computer and spreadsheet
3. To analyze and ascertain the proper statistical approach for a given set of data

Results and Discussion (Computation/formulas)


Problem set L: A marine biologist in charge of four marine reserves located on a small
island noticed that one of the marine reserves (Area I) was twice the size of the other areas
(II, III, and IV). Considering that all other aspects of the marine reserves were equal except
for size, the biologist wanted to find out if the size of the marine reserve had an effect on the
overall size of fish species living within them. To test this, he designated a single fish species
Acanthurus olivaceus as the test species, and collected 10 specimens of this fish in each of the
four marine reserves. He measured each fish (in cm) and tabulated the data below.

AREA I    AREA II    AREA III    AREA IV
83        83         84          82
93        83         78          74
92        88         84          80
93        86         80          75
88        83         82          79
87        86         83          88
86        86         85          85
85        87         83          80
85        81         88          81
94        81         89          80

After 6 months, he went back to the sites and again captured 10 fish from each site and took their
lengths. The data he obtained are as follows:

AREA I      90   93   91  100  105   92  103   91   94   99
AREA II     89   93   96  101   91   92   98   92   99  101
AREA III    96   95   92   89   91   93   97  101   88   90
AREA IV     95   92   90   86   89   91   87   85   95   82
Based on the data, several assumptions were made. First, the dependent variable (size of
fish) was measured at the continuous level. Second, there were two independent variables: the
marine reserve, with four independent groups (Areas I-IV), and the time of sampling (before and
after six months). Third, there was independence of observations, which means that there is no
relationship between the observations in each group or between the groups themselves, since
there were different participants (fish) in each group with no participant being in more than
one group. Lastly, no significant outliers were observed. Given these assumptions, Two-Way
ANOVA was selected as the appropriate test. The following formulas were used:
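Table 1. Two-Way ANOVA Table
Source of Variation   df           SS                               MS                                F
A                     a-1          SSA                              MSA = SSA / (a-1)                 MS(A) / MS(W)
B                     b-1          SSB                              MSB = SSB / (b-1)                 MS(B) / MS(W)
AB                    (a-1)(b-1)   SS(AB)                           MS(AB) = SS(AB) / [(a-1)(b-1)]    MS(AB) / MS(W)
Error (Within)        ab(n-1)      SSE = SST - SSA - SSB - SS(AB)   MS(W) = SSE / [ab(n-1)]
Total                 N-1          SST (Total SS)

Here a and b are the numbers of levels of factors A and B, n is the number of observations
per cell, and N = abn is the total number of observations.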
The data were then computed as follows:

Table 2. Analysis of data with the use of Two-Way ANOVA


Source of Variation                 SS          MS
Sample (Area/Marine Reserve)        574.40      191.47
Columns (Fish Size)                 1,656.20    1,656.20
Interaction (Area X Fish Size)      34.20       11.40
Within                              1,208.00    16.78
Total                               3,472.80
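The sums of squares, mean squares, and F values reported for Table 2 can be cross-checked with a short Python sketch of a balanced two-way ANOVA with replication. It is not the Excel procedure used in this report. It assumes that the second data set lists ten consecutive values per area, and that factor A is the area and factor B is the sampling period (the "Columns" row). The names `cells` and `two_way_anova` are hypothetical.

```python
# Fish lengths (cm) from this report. cells[i][j] holds the 10 fish for
# area i (I-IV) and sampling j (0 = initial, 1 = after six months).
cells = [
    [[83, 93, 92, 93, 88, 87, 86, 85, 85, 94],
     [90, 93, 91, 100, 105, 92, 103, 91, 94, 99]],   # Area I
    [[83, 83, 88, 86, 83, 86, 86, 87, 81, 81],
     [89, 93, 96, 101, 91, 92, 98, 92, 99, 101]],    # Area II
    [[84, 78, 84, 80, 82, 83, 85, 83, 88, 89],
     [96, 95, 92, 89, 91, 93, 97, 101, 88, 90]],     # Area III
    [[82, 74, 80, 75, 79, 88, 85, 80, 81, 80],
     [95, 92, 90, 86, 89, 91, 87, 85, 95, 82]],      # Area IV
]

def mean(values):
    return sum(values) / len(values)

def two_way_anova(cells):
    """Balanced two-way ANOVA with replication; returns SS, MS and F."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([x for row in cells for cell in row for x in cell])
    a_means = [mean([x for cell in row for x in cell]) for row in cells]
    b_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]
    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    # Variation among the a*b cell means, then remove both main effects.
    ss_cells = n * sum((mean(c) - grand) ** 2 for row in cells for c in row)
    ss_ab = ss_cells - ss_a - ss_b
    # Variation of each observation around its own cell mean.
    ss_w = sum((x - mean(c)) ** 2 for row in cells for c in row for x in c)
    ms_a, ms_b = ss_a / (a - 1), ss_b / (b - 1)
    ms_ab = ss_ab / ((a - 1) * (b - 1))
    ms_w = ss_w / (a * b * (n - 1))
    return {
        "SS": {"A": ss_a, "B": ss_b, "AB": ss_ab, "W": ss_w,
               "Total": ss_cells + ss_w},
        "MS": {"A": ms_a, "B": ms_b, "AB": ms_ab, "W": ms_w},
        "F": {"A": ms_a / ms_w, "B": ms_b / ms_w, "AB": ms_ab / ms_w},
    }

result = two_way_anova(cells)
for key in ("SS", "MS", "F"):
    print(key, {k: round(v, 2) for k, v in result[key].items()})
```

Under these assumptions the sketch gives the same sums of squares as the Excel output, with the within-group mean square computed on ab(n-1) = 72 degrees of freedom.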

For interaction:
H0 : There is no interaction between area (marine reserve) and sampling period in their effect on fish size.
Ha : There is an interaction between area (marine reserve) and sampling period in their effect on fish size.
The rejection rule is to reject H0 when F > Fcrit, or when the p-value < 0.05. Since
F = 0.68 < Fcrit = 2.73, the null hypothesis is not rejected. There is no significant
interaction between marine reserve and sampling period.

For Factor: Marine Reserve


H0 : There are no differences in mean fish size among the areas or marine reserves.
Ha : At least two of the areas or marine reserves differ.
The rejection rule is to reject H0 when F > Fcrit, or when the p-value < 0.05. Since
F = 11.41 > Fcrit = 2.73, reject the null hypothesis. At least two of the areas or marine reserves differ.

For Factor: Fish size (columns: before and after six months)


H0 : There is no difference in mean fish size between the two sampling periods.
Ha : Mean fish size differs between the two sampling periods.
The rejection rule is to reject H0 when F > Fcrit, or when the p-value < 0.05. Since
F = 98.71 > Fcrit = 3.97, reject the null hypothesis. Mean fish size differs between the two sampling periods.
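The decision rule applied in the three tests above can be written as a small Python sketch. The F statistics and critical values are the ones quoted in this report (alpha = 0.05); the function name `decide` and the dictionary labels are assumptions for illustration.

```python
# F statistics and critical values quoted in this report (alpha = 0.05).
tests = {
    "Interaction": (0.68, 2.73),
    "Marine reserve": (11.41, 2.73),
    "Fish size (columns)": (98.71, 3.97),
}

def decide(f_value, f_crit):
    """Apply the rejection rule: reject H0 only when F exceeds the critical value."""
    return "reject H0" if f_value > f_crit else "fail to reject H0"

decisions = {name: decide(f, crit) for name, (f, crit) in tests.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```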

Skewness was determined using the SKEW function in Microsoft Excel. The fish sizes from both
the initial collection and the collection after six months have a positive skew, 0.120621722 and
0.280911909 respectively, and are therefore skewed to the right.
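For readers without Excel, the following Python sketch computes the adjusted sample skewness that Excel documents for SKEW, n / ((n-1)(n-2)) times the sum of cubed standardized deviations, on the two pooled samples of 40 fish. The function name `excel_skew` is hypothetical, and the second list assumes the after-six-months values are read in the order they appear in this report.

```python
import math

# Pooled lengths (cm) of all 40 fish per sampling, taken from this report.
before = [83, 93, 92, 93, 88, 87, 86, 85, 85, 94,
          83, 83, 88, 86, 83, 86, 86, 87, 81, 81,
          84, 78, 84, 80, 82, 83, 85, 83, 88, 89,
          82, 74, 80, 75, 79, 88, 85, 80, 81, 80]
after = [90, 93, 91, 100, 105, 92, 103, 91, 94, 99,
         89, 93, 96, 101, 91, 92, 98, 92, 99, 101,
         96, 95, 92, 89, 91, 93, 97, 101, 88, 90,
         95, 92, 90, 86, 89, 91, 87, 85, 95, 82]

def excel_skew(values):
    """Adjusted sample skewness, following the formula documented for Excel's SKEW."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))  # sample SD
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

skew_before = excel_skew(before)
skew_after = excel_skew(after)
print(round(skew_before, 4), round(skew_after, 4))
```

A positive result means the longer right-hand tail lies above the mean, which is the interpretation used in this report.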
Conclusion
Data management is necessary for the interpretation of data. There are many criteria against
which data can be checked so that an appropriate test can be chosen. Some of these methods of
analyzing data are the t-test, chi-square test, regression, correlation, and ANOVA. Results from
these tests can be presented using a histogram, scatter plot, or simple line graph, to name a few.
There are also various ways of transforming data without compromising the integrity of the
collected data. Many software packages specialized for statistical analysis are now accessible
for a more convenient and faster way of obtaining results. One of the most common is Microsoft
Excel. Other programs such as SPSS and Minitab are also available, but they are much more
sophisticated.
In the specific problem set, Two-Way ANOVA was used to determine the F values and,
subsequently, whether there were significant differences between the means for the factors
marine reserve and fish size, and whether the two factors interact. Another tool in Microsoft
Excel, the SKEW function, indicated a positive skewness for the given data, denoting that the
distribution is more stretched on the side above the mean.
Reference
Alferez, M. S. & Duro, M. C. A. 2006. Statistics and Probability. MSA: Quezon City.
Magurran, A. E. 2004. Measuring Biological Diversity. Blackwell Publishing: Australia.
Mendenhall, W., Beaver, R. J. & Beaver, B. M. Introduction to Probability and Statistics. Thomson Brooks/Cole: Singapore.
Odum, E. P. & Barrett, G. W. Fundamentals of Ecology. Thomson Brooks/Cole: Canada.
