
DEPARTMENT OF PLANT BIOLOGY AND BIOTECHNOLOGY

FACULTY OF LIFE SCIENCES


UNIVERSITY OF BENIN
BENIN CITY

(SEMINAR)

TOPIC:
PRINCIPAL COMPONENT ANALYSIS (PCA) AS AN IDEAL TOOL FOR ANALYSING
ON-FARM RESEARCH DATA

SPEAKER:
Odoligie IMARHIAGBE

DATE:
27th FEBRUARY, 2013.

PRINCIPAL COMPONENT ANALYSIS (PCA) AS AN IDEAL TOOL FOR ANALYSING ON-FARM RESEARCH DATA
by Odoligie IMARHIAGBE is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

INTRODUCTION
 On-Farm research trials have rapidly gained popularity in the past
few years with due consideration being given to the knowledge,
problems and priorities of farming ‘families’.

 Researchers, such as plant breeders and agronomists, who have
been trained in techniques of on-station research, are now under
pressure to move on-farm (Gomez and Gomez, 1984).

 During data collection in on-farm research, one encounters
situations where the dataset comprises a large number of variables.
One of the key steps in data analysis is finding ways to reduce
dimensionality without sacrificing accuracy.

 Principal component analysis (PCA) is a multivariate technique in
data analysis. It can be used for compressing higher dimensional
data sets to lower dimensional ones (Davis, 1986).
ON-FARM RESEARCH

 On-Farm Research is a set of procedures for adaptive research
whose purpose is to develop recommendations for representative
groups of farmers.

 In On-farm research, farmers participate in identifying problems and
their priorities, managing experiments and evaluating results (SSC,
1998).

 The objective of On-farm research is to identify existing inputs or
practices that might help to solve major problems of many farmers in
a defined study area (Wuest, 1999).
Plate 1: On-farm Tobacco Test to Evaluate and Compare Varietal Resistance to the
Bacterial Disease Granville Wilt.
Photo credit: Craven County Center, (2012).
REASON FOR ON-FARM RESEARCH

 On-farm research gives high quality results regarding the suitability of the
investigated technological innovations under small, medium and large scale
farming conditions (Ashby, 1986).

 Farmers are the adopters, the adapters, and often the innovators of
new farming techniques. It would therefore be unwise to undertake farm-based
research without involving farmers in the research process as much as
possible (Wuest et al., 1991).

 Using accepted methods of on-farm testing, farmers can achieve
experimental precision comparable to that of intensive university research
trials (Spencer, 1993).
STAGES OF ON-FARM RESEARCH

 Diagnosis

 Planning

 Experimentation

 Assessment / Evaluation of results

 Recommendation and diffusion


TYPES OF ON-FARM RESEARCH

 Researcher-designed and Managed Trial

 Researcher-designed and Farmer-managed Trials

 Farmer-designed and Managed Trials


Table 1: Three Types of On-farm Research

OBJECTIVES                                       TRIAL TYPE   TRIAL DESIGN     TRIAL MANAGEMENT

Biophysical feasibility                          1            Researcher-led   Researcher-led
Profitability, farmers’ assessment of prototype  2            Researcher-led   Farmer-led
Acceptability: farmers’ own innovations          3            Farmer-led       Farmer-led

Source: Rudebjer, (2001)


MEASUREMENTS OF ON-FARM
RESEARCH DATA
In on-farm trials we can distinguish between three types of
measurement:

 Measurements of the type taken in on-station trials. These are
usually yield components, time to flowering, milk yields, disease
scores, and so on.

 Measurements of concomitant variables. These can be at the plot
level, for example problems of waterlogging, or at the farm level,
for example rainfall or soil type. Some variables, such as dates of
sowing and weeding and other management practices, may be at
either level.

 Measurements of the farmers’ opinions. These are obtained from
informal discussions or questionnaires (SSC, 1998).
ANALYSIS OF ON-FARM DATA

Analysis of on-farm data can be approached in three ways:

 Analysis of questionnaire-type data: this set of data results from
interviews and other observations. This information is normally at
the farmer level, though some questions can relate to particular
plots.

 Analysis of yield-type data: this information is mainly at the plot
level, though with some observations at the farm level.

 Combination of the two above: using the results from interviews to
understand the variation in yield-type data.
STATISTICAL ANALYSIS

Statistical analysis refers to a collection of methods used to
process large amounts of data.

Source: SSC, (1998).


Principal Component Analysis

 Principal component analysis (PCA) is a mathematical procedure
that uses an orthogonal transformation to convert a set of
observations of possibly correlated variables into a set of values of
linearly uncorrelated variables called principal components (Davis,
1986).

 The number of principal components is less than or equal to the
number of original variables. The transformation is defined in such a
way that the first principal component has the largest possible
variance, and each succeeding component in turn has the highest
variance possible under the constraint that it be orthogonal to the
preceding components (Cattell, 1966).

 PCA was invented in 1901 by Karl Pearson. It is now mostly used as
a tool in exploratory data analysis and for making predictive models
(Davis, 1986).
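
The definition above can be illustrated with a minimal Python sketch, assuming NumPy is available. The data are randomly generated for illustration only and do not come from any on-farm trial, and the function name pca_scores is hypothetical. The sketch centres the data, uses the eigenvectors of the covariance matrix as the orthogonal transformation, and then checks that the component scores are uncorrelated with variances in decreasing order.

```python
import numpy as np

# Hypothetical data: 50 observations of 3 variables, two of them correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(50, 1)),
    2.0 * base + 0.1 * rng.normal(size=(50, 1)),
    rng.normal(size=(50, 1)),
])


def pca_scores(X):
    """Return eigenvalues (descending), eigenvectors and scores of centred data."""
    Xc = X - X.mean(axis=0)                    # move the centroid to the origin
    cov = np.cov(Xc, rowvar=False)             # covariance matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # re-order so component 1 comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                      # coordinates on the principal axes
    return eigvals, eigvecs, scores


eigvals, eigvecs, scores = pca_scores(X)

# Covariance of the scores: off-diagonal entries are (numerically) zero,
# and the diagonal holds the eigenvalues, i.e. the component variances.
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 6))
```

The number of components equals the number of original variables here (three); reducing dimensionality means keeping only the first few columns of the scores.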
APPLICATIONS OF PRINCIPAL
COMPONENT ANALYSIS
 PCA, as a multivariate technique, can be used to analyse
relationships among several quantitative variables.

 PCA can be used to analyse variables that are measured in
different units.

 PCA provides information about the relative importance of each
variable in characterizing the objects.

 PCA is used to reduce the number of variables in the data set
while retaining most of the original variability in the data. A small
number of these new variables will usually be sufficient to
describe the observational objects (Rencher, 2002).
WAYS OF ANALYSING DATA

PCA can be done in two ways:

 Eigenanalysis of the covariance matrix: here the data are
analysed without standardizing them.

 Eigenanalysis of the correlation matrix: here the data are
standardized. When the variables are measured in different
units, the correlation matrix must be used.
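
A small Python sketch (NumPy assumed) may help show why the correlation matrix is needed when units differ. The two variables, their names and their values are hypothetical and randomly generated; the helper explained is also an illustrative name. The sketch compares the share of variance on each component before and after re-expressing one variable in a different unit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical measurements on 40 plots: plant height (m) and yield (kg/ha).
height_m = rng.normal(1.5, 0.2, size=40)
yield_kg = 2000 * height_m + rng.normal(0, 300, size=40)
X = np.column_stack([height_m, yield_kg])


def explained(matrix):
    """Proportion of total variance on each component, largest first."""
    vals = np.linalg.eigvalsh(matrix)[::-1]   # eigvalsh returns ascending order
    return vals / vals.sum()


cov_prop = explained(np.cov(X, rowvar=False))
corr_prop = explained(np.corrcoef(X, rowvar=False))

# Same data, but yield re-expressed in tonnes per hectare.
X_t = X * np.array([1.0, 0.001])
cov_prop_t = explained(np.cov(X_t, rowvar=False))
corr_prop_t = explained(np.corrcoef(X_t, rowvar=False))

print("covariance,  kg:", np.round(cov_prop, 4))
print("covariance,  t :", np.round(cov_prop_t, 4))
print("correlation, kg:", np.round(corr_prop, 4))
print("correlation, t :", np.round(corr_prop_t, 4))
```

With the covariance matrix, the variable with the numerically largest variance dominates the first component, so the result changes with the choice of unit. With the correlation matrix, the result is the same whichever unit is used.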
THE PCA TECHNIQUES - STAGE 1

 The first stage in rotating the data cloud is to
standardize the data.

 The standardized axes are labeled S1, S2 and S3.

Source: Hotelling, (1993)
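
Stage 1 can be sketched in Python as follows, assuming NumPy. The raw values are hypothetical, and the function name standardize is illustrative. Each variable is centred on its mean and divided by its sample standard deviation, so that every standardized axis has mean 0 and variance 1.

```python
import numpy as np

# Hypothetical raw measurements: five plots (rows), three variables (columns).
raw = np.array([
    [4.1, 120.0, 0.8],
    [5.0, 150.0, 1.1],
    [3.6, 110.0, 0.7],
    [4.8, 160.0, 1.3],
    [4.4, 135.0, 0.9],
])


def standardize(X):
    """Centre each column and scale it to unit sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


S = standardize(raw)   # columns play the role of the axes S1, S2 and S3

print(np.round(S.mean(axis=0), 10))          # column means are 0
print(np.round(S.std(axis=0, ddof=1), 10))   # column standard deviations are 1
```

After this step the covariance matrix of the standardized data equals the correlation matrix of the raw data, which is why standardizing corresponds to eigenanalysis of the correlation matrix.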


THE PCA TECHNIQUES – STAGE 2
 PCA chooses the first PCA axis as the line that goes through the
centroid and along the direction of maximum variation in the data.

 The second PCA axis must also go through the centroid, follow the
direction of maximum remaining variation in the data, and be
“orthogonal” to PCA axis 1.

Source: Hotelling, (1993)


THE PCA TECHNIQUES – STAGE 3

 Rotating the coordinate frame so that PCA Axis 1 lies on the
X-axis and PCA Axis 2 on the Y-axis gives a scatter diagram of
the data on the new axes.

Source: Hotelling, (1993)
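
The rotation in Stage 3 can be sketched in Python with NumPy. The two-variable data cloud is hypothetical and randomly generated. Multiplying the standardized data by the eigenvector matrix gives the coordinates on the PCA axes. Because that matrix is orthogonal, the step is a rigid rotation (possibly combined with a reflection): distances between points and the total variance are unchanged, and only the frame of reference moves.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data cloud: two correlated variables on 30 plots, standardized.
s1 = rng.normal(size=30)
s2 = 0.8 * s1 + 0.6 * rng.normal(size=30)
S = np.column_stack([s1, s2])
S = (S - S.mean(axis=0)) / S.std(axis=0, ddof=1)

# Eigenvectors of the correlation matrix define the new axes.
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(S, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # PCA axis 1 first

# Column 0 is the X coordinate (PCA axis 1), column 1 the Y coordinate (PCA axis 2).
scores = S @ eigvecs

# The distance between the first two plots is the same in both frames.
dist_before = np.linalg.norm(S[0] - S[1])
dist_after = np.linalg.norm(scores[0] - scores[1])
print(round(dist_before, 6), round(dist_after, 6))
```

Plotting scores[:, 0] against scores[:, 1] would give the rotated scatter diagram, with the widest spread along the X-axis.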


HOW DO WE DETERMINE HOW MANY
PRINCIPAL AXES ARE WORTH
INTERPRETING?
 As many principal components can be computed as there are original
variables; however, only the most important ones are relevant for
further analysis. These can be found by checking the eigenvalues. Every
axis has an eigenvalue (also called a latent root) associated with it,
and the axes are ranked from the highest eigenvalue to the lowest. Each
eigenvalue, as a share of the total, gives the percentage of the
variation accounted for by that axis:

 PCA Axis 1: 63%

 PCA Axis 2: 33%

 PCA Axis 3: 4%

Source: Hotelling, (1993)
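
The link between eigenvalues and percentages can be shown with a short Python sketch. The eigenvalues below are assumed values, chosen only because they are consistent with the percentages quoted above for three standardized variables (whose eigenvalues sum to 3). They are not taken from the source.

```python
# Assumed eigenvalues for three standardized variables (illustrative only).
eigenvalues = [1.89, 0.99, 0.12]

total = sum(eigenvalues)
percent = [round(100 * value / total) for value in eigenvalues]

for axis, share in enumerate(percent, start=1):
    print(f"PCA Axis {axis}: {share}%")
```

Axes whose share is very small, such as the third one here, add little information and are usually not interpreted.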


HOW DO WE KNOW WHAT EACH
VARIABLE CONTRIBUTES TO THE
VARIOUS PRINCIPAL COMPONENT
AXES?

 We look at the component loadings (or "factor loadings"):
these show to what degree the different original
variables enter into the different components. The
component loadings are important when you try to interpret
the “meaning” of the components.
Table 2: Component Loadings

Variables   PCA 1     PCA 2    PCA 3

S1           0.9688   0.0664   -0.2387
S2           0.9701   0.0408    0.2391
S3          -0.1045   0.9945    0.0061

Source: Hotelling, (1993)
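
The loadings in Table 2 can be checked against the percentages given for the three axes with a short Python sketch (NumPy assumed). The sketch rests on an assumption that the source does not state: that the loadings come from a correlation-matrix PCA and are scaled as correlations between variables and components. Under that assumption, the sum of squared loadings in each column is that component's eigenvalue, and the sum of squared loadings in each row is 1.

```python
import numpy as np

# Component loadings from Table 2 (rows: S1, S2, S3; columns: PCA 1, 2, 3).
loadings = np.array([
    [0.9688, 0.0664, -0.2387],
    [0.9701, 0.0408, 0.2391],
    [-0.1045, 0.9945, 0.0061],
])

# Column sums of squares: eigenvalue of each component (under the assumption above).
eigenvalues = (loadings ** 2).sum(axis=0)

# Share of the total variance; the total equals the number of variables (3).
percent = 100 * eigenvalues / loadings.shape[0]

# Row sums of squares: share of each variable's variance captured by all components.
communality = (loadings ** 2).sum(axis=1)

print(np.round(eigenvalues, 3))   # [1.891 0.995 0.114]
print(np.round(percent))          # [63. 33.  4.]
print(np.round(communality, 3))   # [1. 1. 1.]
```

Read this way, S1 and S2 load almost entirely on PCA 1 and S3 almost entirely on PCA 2, which suggests that the first axis summarises the two correlated variables and the second axis represents S3.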


Using PCA to Analyse On-farm
Research Data
 Chemical and textural properties were measured on soil from 18
farmers’ fields in Yamrat, Bauchi State, Nigeria (Table 3). The
table has 18 observational units (fields), each with 11 measured
variables (soil characteristics).

The questions which arise are:

 Which soil properties are correlated (relationship)?

 Which soil properties contribute most to the overall variance in
soil characteristics?

 How can the number of variables be reduced without losing too
much information?

Source: Mutasaers et al., (1997).


Table 3: Soil Characteristics of
18 Farmers’ Fields

Source: Mutasaers et al., (1997).


Table 4: Correlation Coefficients of Soil
Characteristics of 18 Farmers' Fields

Source: Mutasaers et al., (1997).


Table 5: Eigenvalues of the Correlation Matrix and the Proportion and
Total of Variance Explained by the Five Largest Principal Components

Source: Mutasaers et al., (1997).


Table 6: Principal Components with Their Percentage Variability

Principal Component   Percentage Variability (%)

PRIN 1                43.4
PRIN 2                20.2
PRIN 3                12.9
PRIN 4                7.8
PRIN 5                7.1

Source: Mutasaers et al., (1997).
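
The percentages in Table 6 can be accumulated to see how many components are needed to reach a given share of the total variance. The Python sketch below uses only the standard library. The 75% threshold and the helper name components_needed are illustrative assumptions, not values from the source.

```python
from itertools import accumulate

# Percentage variability of the five largest components (Table 6).
percent = [43.4, 20.2, 12.9, 7.8, 7.1]

cumulative = [round(c, 1) for c in accumulate(percent)]
print(cumulative)   # [43.4, 63.6, 76.5, 84.3, 91.4]


def components_needed(cumulative, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    for count, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return count
    return None


print(components_needed(cumulative, 75.0))   # 3
```

The first three components together account for about 76.5% of the variance according to these rounded figures. That is close to the 76.6% quoted for the three new variables, and the small gap is presumably due to rounding of the individual percentages.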


Table 7: Eigenvectors of Principal Components Representing a Linear
Combination of the Original Variables

Source: Mutasaers et al., (1997).


Table 8: Standardized Principal Component Scores used as three New
Variables Representing 76.6% of the Variance from the Original 11 Soil
Characteristics.

Source: Mutasaers et al., (1997).


Some Statistical Software Used for
PCA

 GENSTAT - General Statistics

 AGSTATS - Agricultural Statistics

 PAST - Paleontological Statistics


Conclusion
 The correct design of experimental studies, the selection of the
appropriate statistical analysis of data and the efficient
presentation of results are key to the good conduct and
communication of science.

 On-farm research has been shown to be site-specific, broader and more
easily adopted than on-station research.

 During data collection in on-farm research, one encounters
situations where there are large numbers of variables. A good
statistical analysis is needed to draw valid conclusions from
such research.

 Principal component analysis is a powerful tool for reducing a
number of observed variables into a smaller number of variables
that account for most of the variance in the data set, and it allows
the use of variables that are not measured in the same units.
This makes it a good tool for analysing on-farm research data.
THANKS