Factor Analysis Nature Mechanism Uses in Social and Management Researches



Article · April 2009



2 authors, including:

Nimalathasan Balasundaram
University of Jaffna


Journal of Cost and Management Accountant, Bangladesh. 2009; XXXVII (2):15-25.

Factor Analysis: Nature, Mechanism and Uses in Social and Management Science



Factor Analysis (FA) attempts to simplify the complex and diverse relationships that exist among a set of observed variables by uncovering common dimensions, or factors, that link together seemingly unrelated variables, and consequently provides insight into the underlying structure of the data. Its technical complexity, however, can frustrate social and management science practitioners, and even some academicians, who find it difficult to comprehend and interpret the mechanism and results of FA. The study is therefore designed to assist those who need to read and comprehend research articles on FA, as well as those who may wish to use FA in their own work.

Keywords: Factor Analysis, Nature, Mechanism, Social and Management Science.

Professor of Management Studies, Department of Management Studies, University of
Lecturer, Department of Commerce, University of Jaffna, SriLanka & Ph.D Research
Fellow (SAARC), Department of Management Studies, University of Chittagong.


Thousands of variables are proposed to explain the complex situation and their

interconnections and interrelationships. In this regard the few basic variables and propositions

central to understanding remain to be determined. The systematic dependencies and

correlations among these variables are charted only on presence - absence or rank order

scales. Moreover, taking the data on any one variable at face value is questionable in terms of

validity, reliability and comparability. Being confronted with entangled behaviour,

unknown interdependencies, masses of qualitative and quantitative variables, and bad data,

many social scientists are turning towards Factor Analysis (FA) to uncover characteristic

features of major social and international phenomena.

FA can simultaneously manage over a hundred variables, compensate for random error and

invalidity, and disentangle complex interrelationships into their major and distinct

regularities. FA is not without cost, however: it is mathematically complicated and entails

diverse and numerous considerations in application. Its technical vocabulary includes new

terms such as ‘eigenvalues’, ‘rotate’, ‘simple structure’, ‘orthogonal’, ‘loadings’ and


Many studies have been conducted in the field of FA, but most of the articles address the concerns of western countries. A few abridged studies are available in Bangladesh and similar countries, but no detailed study has been found. The authors therefore sought to help fill this research gap. The study was undertaken to understand FA and its application in social and management science research.

Objectives of the Study

The following are the main objectives of the paper.

1. To explain the terminology of FA and its approaches.

2. To demonstrate its mechanism.

3. To focus on its application in social and management science research.

Conceptual Overview

Factor Analysis

FA is a part of the General Linear Model (GLM) family of procedures and bears the same assumptions as multiple regression, e.g. linear relationships, interval or near-interval data, latent variables, proper specification including relevant variables and excluding extraneous ones, lack of high multicollinearity, and multivariate normality. It is useful for the purpose of testing the significance of research results. Further, FA is one of the most commonly used methods for summarizing and reducing data in social science research. FA assumes that underlying dimensions of factors can be used to explain complex


FA, like multiple regression analysis, is usually done by computer, since it is inconvenient to carry out manually even with a small number of variables and data. Statistical packages such as the Statistical Package for the Social Sciences (SPSS), Analysis of MOment Structures (AMOS), S+ and R are available for this.

The goal of FA is to identify not-directly-observable factors based on a larger set of observable or measurable indicators (variables). It attempts to identify underlying variables, or factors, that explain the patterns of correlations within a set of observed variables. FA is often used in data reduction to identify a small number of factors that explain most of the variance observed in a much larger number of manifest variables. FA can also be used to generate hypotheses regarding causal mechanisms or to screen variables for subsequent analysis (e.g. to identify collinearity prior to performing a linear regression analysis). It is a generic term for a family of statistical techniques concerned with the reduction of a set of observable variables in terms of a small number of latent factors. It has been developed primarily for analysing relationships among a number of measurable entities (such as survey items or test scores).

The underlying assumption of FA is that there exist a number of independent variables (or “latent variables”) that account for the correlations among the dependent variables, such that when the latent variables are held constant the partial correlations among the dependent variables all become zero. In other words, the latent variables determine the values of the dependent variables (The University of Texas at Austin, 1995). Each dependent variable (Y) can be expressed as a weighted composite of a set of latent variables (F) such as:

$$Y = \alpha_1 F_1 + \alpha_2 F_2 + \cdots + \alpha_n F_n$$

where

Y = Dependent variable

α = A constant (the weight attached to each latent variable)

F = Independent (latent) variable

n = Number of independent variables
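The weighted-composite form can be illustrated with a minimal sketch. The weights and factor values below are hypothetical numbers chosen only for illustration; they do not come from the paper, and the function name is an assumption.

```python
import numpy as np

def weighted_composite(weights, factors):
    """Return Y = a1*F1 + a2*F2 + ... + an*Fn for each observation (row)."""
    weights = np.asarray(weights, dtype=float)
    factors = np.asarray(factors, dtype=float)
    return factors @ weights

# Hypothetical example: two latent factors, three observations.
alpha = [0.8, 0.3]            # illustrative weights, not from the paper
F = [[1.0, 0.0],
     [0.0, 1.0],
     [2.0, -1.0]]
Y = weighted_composite(alpha, F)
print(Y)
```

Each row of `F` holds one observation's values on the latent variables, so the third observation gives 0.8 × 2.0 + 0.3 × (−1.0).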

On the other hand Dillon and Goldstein (1984) pointed out that FA is essentially a method of

meaningful reduction of data. It tries to simplify complex and diverse relationships that exist

among a set of observed variables by uncovering common dimensions or factors that link

together the seemingly unrelated variables and consequently provides insight into the

underlying structure of the data. FA has the ability to produce descriptive summaries of data

matrices, which aid in detecting the presence of meaningful patterns among a set of variables

(Dess and Davis, 1984).

FA has most frequently been used to study relationships among the attributes. This use of FA

is called ‘R-type’, and is well known in the social sciences. Examples exist in many areas

including intelligence (Guilford, 1967; Thurstone, 1938) and personality (Cattell, Eber, and

Tatsuoka, 1970; Costa and McCrae, 1992). When FA is used to identify relationships among

the entities it is called ‘Q-type’. Q-type FA has been employed with less frequency than

R-type, and is not usually used by social science researchers. It is of use in situations where

the profile of the attributes (the pattern of scores that an individual obtains over a relevant set

of measures) is of interest rather than focusing upon the analysis or description of a single

variable at a time. The use of FA may be demonstrated in two different data analysis

contexts. In one instance, the data analyst may have no theoretical hypothesis in mind when

using FA and is simply searching for a common structure underlying the data. The use of FA

in this way is called exploratory. On the other hand, FA may also be used in a case where the

data analyst may have some prior theoretical information on the common structure

underlying the data and wishes to confirm the hypothesized structure. The use of FA in this

way is called confirmatory.

Most applications of FA have been in psychology and in social and management sciences. For

example, suppose that information is collected from a wide range of people as to their

occupation, type of education, whether or not they own their own home, and so on. Then one

might ask if the concept of social class is multidimensional or if it is possible to construct a

single ‘index’ of class from the data.

Recommendations of the Sample Size

A wide range of recommendations regarding sample size in FA has been made. The

recommendations are usually stated in terms of either the minimum sample size (N) for a

particular analysis or the minimum ratio of N to the number of variables, p, i.e. the number of

survey items being subjected to FA (MacCallum, Widaman, Zhang, and Hong, 1999).

Gorsuch (1983) recommended five subjects per item, with a minimum of 100 subjects,

regardless of the number of items. Guilford (1954) argued that N ‘should be at least’ 200,

while Cattell (1978) recommended three to six subjects per item, with a minimum of 250.

Comrey and Lee (1992) provided the following guidance in determining the adequacy of sample size: 100 = poor, 200 = fair, 300 = good, 500 = very good, 1000 or more = excellent. A more demanding recommendation is that the sample should ideally number several hundred (Cureton and D’Agostino, 1983).
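The rules of thumb quoted above can be turned into a small checking helper. This is only a sketch: the function names are hypothetical, and treating each Comrey and Lee figure as the lower bound of its band is an assumption, since the source lists only the anchor values.

```python
def comrey_lee_rating(n):
    """Label a sample size using the Comrey and Lee (1992) anchors.

    Assumption: each anchor is treated as the lower bound of its band.
    """
    bands = [(1000, "excellent"), (500, "very good"), (300, "good"),
             (200, "fair"), (100, "poor")]
    for threshold, label in bands:
        if n >= threshold:
            return label
    return "below the lowest band"

def gorsuch_minimum(n_items):
    """Gorsuch (1983): five subjects per item, with a minimum of 100."""
    return max(5 * n_items, 100)

print(comrey_lee_rating(250), gorsuch_minimum(40))
```

For a 40-item questionnaire the Gorsuch rule asks for 200 subjects, which the Comrey and Lee anchors would rate only as fair.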

Mechanism of the Factor Analysis

Factor model

In practice, there are several factor models which differ in significant respects. A model most

often applied in psychology is called “common FA”. Indeed, psychologists usually reserve

the term “FA” for just this model. Common FA is concerned with defining the patterns of

common variations among a set of variables. Variation unique to a variable is ignored. In

contrast, another factor model called “component FA” is concerned with covering all the

variations in a set of variables, whether common or unique. Other factor models are “image

analysis”, “canonical analysis”, and “alpha analysis”. Image analysis has the same purpose

as common FA, but with more elegant mathematical properties. Canonical analysis defines
common factors for a sample of cases that are the best estimates of those for the population; it

enables tests of significance. Alpha analysis defines common factors for a sample of variables

that are the best estimates of those in a universe.

Factor loadings

The factor loadings, also called component loadings in Principal Component Analysis (PCA), are the correlation coefficients between the variables (rows) and factors (columns). PCA is the commonly used method for grouping the variables under a few unrelated factors. Variables with a factor loading higher than 0.5 are grouped under a factor. A factor loading is the correlation between the original variable and the specific factor, and is the key to understanding the nature of that particular factor (Debasish, 2004).


Communality

Communality refers to a measure of the percentage of a variable’s variation that is explained by the factors. It is the amount of variance an original variable shares with all other variables included in the analysis. A relatively high communality indicates that a variable has much in common with the other variables taken as a group (Islam and Mamun, 2005). Further, the communality measures the percentage of variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator.
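For orthogonal factors, the communality of a variable is the sum of its squared loadings across the factors. A minimal sketch, using a hypothetical loading matrix (rows are variables, columns are factors) that is not taken from the paper:

```python
import numpy as np

def communalities(loadings):
    """Sum of squared loadings per variable (row), assuming orthogonal factors."""
    L = np.asarray(loadings, dtype=float)
    return (L ** 2).sum(axis=1)

# Hypothetical loadings for three variables on two factors.
example_loadings = [[0.8, 0.1],
                    [0.6, 0.6],
                    [0.2, 0.9]]
h2 = communalities(example_loadings)
print(h2)
```

A value near 1 means the factors jointly account for almost all of that variable's variance.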

Eigen values (also called characteristic roots)

The Eigen value for a given factor measures the variance in all the variables which is

accounted for by that factor. The ratio of Eigen values is the ratio of explanatory importance

of the factors with respect to the variables. If a factor has a low Eigen value, then it is

contributing little to the explanation of variances in the variables and may be ignored as

redundant as compared to more important factors. Perhaps the most frequently used

extraction approach is the “root greater than one” criterion. Originally suggested by Kaiser
(1958), this criterion retains those components whose Eigen values are greater than 1. The

rationale for this criterion is that any component should account for more “variance” than any

single variable in the standardized test score space.
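The “root greater than one” rule can be sketched as follows. The correlation matrix is hypothetical, and the helper name is an assumption introduced for illustration.

```python
import numpy as np

def kaiser_retained(corr):
    """Return the eigenvalues (largest first) and how many exceed 1."""
    eigvals = np.linalg.eigvalsh(np.asarray(corr, dtype=float))[::-1]
    return eigvals, int((eigvals > 1.0).sum())

# Hypothetical correlation matrix for three standardized variables.
R_example = [[1.0, 0.6, 0.5],
             [0.6, 1.0, 0.4],
             [0.5, 0.4, 1.0]]
eigvals, n_keep = kaiser_retained(R_example)
print(eigvals, n_keep)
```

Because each standardized variable contributes one unit of variance, the eigenvalues of a correlation matrix sum to the number of variables; a component with an eigenvalue above 1 therefore explains more than any single variable does.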

Scree plots

Scree plots are formed by plotting the number of factors against their respective eigen value

(Hackett and Foxall, 1999). It is a graph of the eigen values against all the factors. The graph

is useful for determining how many factors to retain.

Figure 1: Plot of the Eigen values

As Figure 1 shows, the plot looks like the side of a mountain, and "scree" refers to the

debris fallen from a mountain and lying at its base. Therefore, the scree plot proposes to stop

analysis at the point the mountain ends and the debris (error) begins. In this instance, that

point coincides with the eigenvalue criterion.

Factor extraction

There are several ways to conduct FA (i.e., principal components; unweighted least squares;

generalized least squares; maximum likelihood; principal axis factoring; alpha factoring;

image factoring) and alternative choice of methods (i.e., correlation matrix or a covariance


Factor rotation

The interpretability of factors can be improved through rotation. Rotation maximizes the

loading of each variable on one of the extracted factors whilst minimizing the loading on all

other factors. Rotation works by changing the absolute values of the variables whilst

keeping their differential values constant. Varimax, quartimax and equimax are variants

of orthogonal rotation.

The varimax method is the most popular among these techniques and is often used in conjunction with principal components analysis (PCA). The procedure seeks to rotate factors so that the variance of the squared factor loadings for a given factor is made large. The choice of rotation depends largely on whether or not the researcher expects the underlying factors to be related; if they should be unrelated, one of the orthogonal rotations should be chosen (researchers generally recommend varimax). The quartimax method is an orthogonal alternative which minimizes the number of factors needed to explain each variable. This type of rotation often generates a general factor on which most variables are loaded to a high or medium degree. Such a factor structure is usually not helpful for the research purpose. Finally, the equimax method attempts to achieve simple structure with respect to both the rows and columns of the factor loading matrix.
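A varimax rotation can be sketched with the commonly used iterative SVD scheme. This is an illustrative implementation under stated assumptions, not the routine used by SPSS or by the authors; the unrotated loadings are hypothetical, and Kaiser row normalization is omitted for brevity.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate a loading matrix orthogonally toward the varimax criterion.

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Target matrix derived from the gradient of the varimax criterion.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt
        current = s.sum()
        if current < previous * (1 + tol):
            break  # no meaningful improvement, stop iterating
        previous = current
    return L @ rotation, rotation

# Hypothetical unrotated loadings: four variables on two factors.
unrotated = np.array([[0.7, 0.5], [0.8, 0.4], [0.6, -0.5], [0.7, -0.6]])
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation matrix is orthogonal, each variable's communality is unchanged; only the way the explained variance is distributed across the factors shifts.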

Oblimax, quartimin, covarimin, biquartimin, and oblimin are oblique rotation methods.

Oblimax seeks to rotate the factors so that the numbers of high and low loadings are

increased by decreasing those in the middle range; quartimin minimizes the sum of inner

products of the (reference) structure loadings; covarimin is the varimax analog of the oblique

rotation methods; biquartimin is a compromise algorithm falling somewhere between the

quartimin and covarimin methods; and oblimin is similar to the biquartimin method in that it

combines the quartimin and covarimin methods.

Factor scores

The methods of principal components and FA are both data reduction techniques.

Consequently, the researcher may want to calculate the projection of each observation on

each of the factors. Factor scores give the location of each observation in the space of the

common factors.

Statistical validity

The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy indicates whether or not the distribution of values is adequate for conducting FA. A measure of >0.9 is marvellous, >0.8 is meritorious, >0.7 is middling, >0.6 is mediocre, >0.5 is miserable and <0.5 is unacceptable. Bartlett’s test of sphericity tests whether the correlation matrix is an identity matrix; FA would be meaningless with an identity matrix. A significance value <0.05 indicates that the data do not produce an identity matrix and are thus approximately multivariate normal and acceptable for FA (George and Mallery, 2003).
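The overall KMO statistic compares the squared correlations with the squared partial correlations. A minimal sketch follows, assuming the usual definition of the overall measure; the correlation matrix is hypothetical and the function names are introduced only for illustration.

```python
import numpy as np

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure from a correlation matrix."""
    R = np.asarray(corr, dtype=float)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)          # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)    # mask for off-diagonal entries
    r2 = (R[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    return r2 / (r2 + a2)

def kmo_label(value):
    """Verbal label using the cut-offs quoted in the text."""
    for cutoff, label in [(0.9, "marvellous"), (0.8, "meritorious"),
                          (0.7, "middling"), (0.6, "mediocre"),
                          (0.5, "miserable")]:
        if value > cutoff:
            return label
    return "unacceptable"

# Hypothetical correlation matrix for three variables.
R_kmo = [[1.0, 0.6, 0.5],
         [0.6, 1.0, 0.4],
         [0.5, 0.4, 1.0]]
print(round(kmo(R_kmo), 3), kmo_label(kmo(R_kmo)))
```

When the partial correlations are small relative to the raw correlations, the measure approaches 1, which indicates that the variables share enough common variance for FA.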

Process of Factor Analysis

Norusis (1993) described the process of FA in the following ways: The first step in FA is to

produce a correlation matrix for all variables. Variables that do not appear to be related to

other variables can be identified from this matrix. The number of factors necessary to
represent the data and the method for calculating them must then be determined. Principal

Component Analysis (PCA) is the most widely used method of extracting factors. In PCA,

linear combinations of variables are formed. The first principal component is that which

accounts for the largest amount of variance in the sample, the second principal component is

that which accounts for the next largest amount of variance and is not correlated with the first.

Next, coefficients called ‘factor loadings’ that relate variables to identified factors are

calculated. Factor models are then often ‘rotated’ to ensure that each factor has non-zero

loadings for only some of the variables. Rotation makes the factor matrix more interpretable.

Following rotation, scores for each factor can be computed for each case in a sample. These

scores are often used in further data analysis.
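The steps described above (correlation matrix, extraction by principal components, loadings, scores) can be sketched in a few lines. This is a simplified illustration on synthetic data; the rotation step is left out, and the function name and data are assumptions rather than part of the procedure described by Norusis.

```python
import numpy as np

def pca_loadings_and_scores(data, n_factors):
    """Extract principal components from the correlation matrix.

    Returns the correlation matrix, the retained eigenvalues,
    the loadings, and the component scores for each case.
    """
    X = np.asarray(data, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize
    R = np.corrcoef(X, rowvar=False)                    # step 1: correlations
    eigvals, eigvecs = np.linalg.eigh(R)                # step 2: extraction
    order = np.argsort(eigvals)[::-1][:n_factors]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)               # step 3: loadings
    scores = Z @ eigvecs                                # step 4: scores per case
    return R, eigvals, loadings, scores

# Synthetic data: 50 cases, 4 variables, two of them deliberately correlated.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
data[:, 1] += data[:, 0]
R, eigvals, loadings, scores = pca_loadings_and_scores(data, n_factors=2)
print(np.round(loadings, 3))
```

The component scores are uncorrelated with one another, and the variance of each score column equals the corresponding eigenvalue.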

Empirical Evidence to Understand the Factor Analysis

Title: Determinants of Key Entrepreneurial Characteristics: A Study of Ready Made

Garments (RMGs) Entrepreneurs of Bangladesh

Objectives of the Study

The principal objectives of this study are delineated below.

1. To identify the characteristics influencing the RMG entrepreneurs of Bangladesh.

2. To examine the key entrepreneurial characteristics of RMG entrepreneurs.

Research Methodology

Sampling design

The sample was derived from the Bangladesh Garment Manufacturers and Exporters Association

(BGMEA). Twenty-five RMG entrepreneurs were selected by the convenience sampling method in


Data collection

Primary and secondary data were used for the study. Primary data were collected through the

written questionnaire following direct personal interviewing technique. The secondary data

were gathered from journals, books, magazines, etc.


The questionnaire was administered to RMG entrepreneurs in Chittagong port city. A seven-point Likert-type summated rating scale, ranging from strongly disagree (-3) to strongly agree (+3), was adopted to identify the characteristics.

Tool of Data Analysis

The present study used a sophisticated statistical method, FA with varimax rotation, to analyze the data collected. In order to obtain interpretable characteristics and simple structure solutions, the researchers subjected the initial factor matrices to varimax rotation procedures (Kaiser, 1958). The varimax-rotated factor matrix provides orthogonal common factors. Finally, the indicators were ranked on the basis of factor scores.

Reliability and Validity

The reliability value of our surveyed data was 0.787 for characteristics. Whether we compare this value with the standard alpha value of 0.7 advocated by Cronbach (1951), with the more accurate recommendation of Nunnally and Bernstein (1994), or with the standard value of 0.6 recommended by Bagozzi and Yi (1988), we find that the scales used are sufficiently reliable for data analysis. Regarding validity, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy indicates whether or not the distribution of values is adequate for conducting FA. As per the KMO measure, a value of >0.9 is marvellous, >0.8 is meritorious, >0.7 is middling, >0.6 is mediocre, >0.5 is miserable and <0.5 is unacceptable.
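A reliability coefficient of this kind (Cronbach's alpha) can be computed from the item variances and the variance of the summed scale. The sketch below uses hypothetical responses on the -3 to +3 scale; it does not reproduce the survey data, so it will not return 0.787.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a cases-by-items matrix of scale responses."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    item_variances = X.var(axis=0, ddof=1).sum()
    total_variance = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical responses: five respondents, three items, scale -3..+3.
responses = [[ 3,  2,  3],
             [ 1,  1,  2],
             [-1,  0, -1],
             [ 2,  3,  2],
             [-2, -1, -2]]
alpha_value = cronbach_alpha(responses)
print(round(alpha_value, 3))
```

Alpha rises as the items vary together: if every item gave identical responses, the coefficient would equal 1.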

Table 1: KMO and Bartlett’s test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy          .772
Bartlett’s Test of Sphericity    Approx. Chi-Square       170.743
                                 df                       45
                                 Significance             .000
Source: Survey data

According to Table 1, the data returned a sampling adequacy value of 0.772, which is middling. Bartlett’s test of sphericity is a measure of the multivariate normality of the set of distributions. It also tests whether the correlation matrix produced within the FA is an identity matrix. FA would be meaningless with an identity matrix. A significance value <0.05 indicates that the data do not produce an identity matrix and are thus approximately multivariate normal and acceptable for FA (George and Mallery, 2003). The data within this study returned a significance value of 0.000, indicating that the data were acceptable for FA.
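The Bartlett figures can be checked with a short sketch. The function implements the standard form of the sphericity statistic; the 5% critical value of 61.656 for 45 degrees of freedom is taken from standard chi-square tables and is an assumption added here, not a figure from the paper.

```python
import numpy as np

def bartlett_sphericity(corr, n_cases):
    """Bartlett's test statistic and degrees of freedom for a correlation matrix."""
    R = np.asarray(corr, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi_square, df

# Degrees of freedom for the ten characteristics analysed in the study.
n_variables = 10
df_reported = n_variables * (n_variables - 1) // 2

# Compare the reported statistic with a tabulated 5% critical value.
CRITICAL_05_DF45 = 61.656   # assumption: standard chi-square table value
reported_chi_square = 170.743
rejects_identity = reported_chi_square > CRITICAL_05_DF45
print(df_reported, rejects_identity)
```

With ten variables there are 10 × 9 / 2 = 45 distinct correlations, which gives the degrees of freedom, and the reported statistic lies well above the tabulated critical value.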

When the original ten characteristics were analysed by Principal Component Analysis (PCA) with varimax rotation, three groups of characteristics with Eigen values greater than 1 were extracted, which together explained 80.862 percent of the total variance. The result of the FA is presented in Table 2. The factor loadings ranged from 0.956 to 0.674. The higher a factor loading, the more strongly the variable reflects or measures the characteristic group. The characteristic with the highest loading gives its name to each group of characteristics, e.g. risk taking is the title of characteristics group I, and the like. Further, the present study has interpreted the characteristics loaded by variables having significant loadings of magnitude 0.50 and above (Pal, 1986; Pal and Bagi, 1987).

Table 2: Principal Component Analysis – Varimax Rotation of Characteristics

Name of the characteristic      Characteristics   Characteristics   Characteristics
                                group I           group II          group III
Risk taking                     .893
Information seeking             .888
Persistence                     .887
Systematic planning             .858
Commitment to work              .811
Persuasion and networking       .786
Self confidence                 .757
Goal setting                    .674
Demand for work contract                          .956
Opportunity seeking                                                 .936
Eigen value                     5.648             1.307             1.131
Proportion of variance          56.480%           13.068%           11.314%
Cumulative variance             56.480%           69.548%           80.862%

Source: Survey data
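The variance rows of Table 2 follow from the Eigen values: with ten standardized variables the total variance is 10, so each proportion is the Eigen value divided by 10. A minimal check, using the rounded Eigen values printed in the table (small differences in the third decimal place are rounding effects):

```python
# Eigen values as printed in Table 2 (rounded to three decimals).
table_eigenvalues = [5.648, 1.307, 1.131]
n_variables = 10  # ten characteristics were analysed

# Proportion of total variance explained by each group, in percent.
proportions = [100 * ev / n_variables for ev in table_eigenvalues]

# Running total of the proportions.
cumulative = []
running = 0.0
for share in proportions:
    running += share
    cumulative.append(running)

for ev, share, total in zip(table_eigenvalues, proportions, cumulative):
    print(f"{ev:.3f}  {share:.2f}%  {total:.2f}%")
```

The three groups together account for roughly 80.86% of the variance, in line with the cumulative figure reported in the table.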

Characteristics group I: Risk taking – This group was represented by eight characteristics with factor loadings ranging from .893 to .674. They were risk taking; information seeking; persistence; systematic planning; commitment to work; persuasion and networking; self confidence and goal setting. This group accounted for 56.480% of the rated variance.

Characteristics group II: Demand for work contract – One characteristic, demand for work contract, with a loading of .956, belonged to this group. It explained 13.068% of the rated variance.

Characteristics group III: Opportunity seeking – This group consisted of only one characteristic, opportunity seeking, with a loading of .936. A variance of 11.314% was explained by this characteristic.

Ranking of the above characteristics in order of their importance, along with factor score, is

shown in Table 3. The importance of these characteristics, as perceived by the respondents,

has been ranked on the basis of factor score.

Table 3: Ranking of Characteristics according to their importance

Key Characteristics                                    Factor score   Rank
Characteristics group I: Risk taking                   0.172          3
Characteristics group II: Demand for work contract     0.739          2
Characteristics group III: Opportunity seeking         0.771          1

Source: Survey data

As depicted in Table 3, the characteristics ‘Opportunity seeking’, ‘Demand for work contract’ and ‘Risk taking’ were ranked first, second and third respectively and constitute the key characteristics of RMG entrepreneurs.

Uses of FA in Social and Management Science Research

The following are the applications of FA relevant to various scientific and policy concerns.

Interdependency and pattern delineation: FA may be used to untangle the linear

relationships into their separate patterns. Each pattern will appear as a factor delineating a

distinct cluster of interrelated data.

Parsimony or data reduction: It can be useful for reducing a mass of information to

economical description.

Structure: FA may be employed to discover the basic structure of a domain.

Classification or description: It is a tool for developing an empirical typology. It can be

used to group interdependent variables into descriptive categories, such as ideology,

revolution, liberal voting, and authoritarianism. It can be used to classify nation profiles into

types with similar characteristics or behaviour.

Scaling: The scale may refer to such phenomena as political participation, voting behaviour,

or conflict. FA offers a solution by dividing the characteristics into independent sources of

variation (factors). Each factor then represents a scale based on the empirical relationship

among the characteristics.

Hypothesis testing: There are numerous hypotheses regarding dimensions of attitude,

personality, group, social behaviour, voting, and conflict. Since the meaning usually

associated with ‘dimension’ is that of a cluster or group of highly intercorrelated

characteristics or behaviour, FA may be used to test for their empirical existence. Which
characteristics or behaviour should, by theory, be related to which dimensions can be

postulated in advance and statistical tests of significance can be applied to the FA results.

Besides those relating to dimensions, there are other kinds of hypotheses that may be tested

e.g. if the concern is with a relationship between economic development and instability,

holding other things constant, a FA can be done of economic and instability variables along

with other variables that may affect (hide, mediate, depress) their relationship. The resulting

factors can be so defined (rotated) that the first several factors involve the mediating

measures (to the maximum allowed by the empirical relationships). A remaining independent

factor can be calculated to best define the postulated relationships between the economic and

instability measures. The magnitude of involvement of both variables in this pattern enables

the scientist to see whether an economic development instability pattern actually exists when

other things are held constant.

Data transformation: FA can be used to transform data to meet the assumptions of other

techniques. For example, application of the multiple regression technique assumes (if tests of

significance are to be applied to the regression coefficients) that the predictors, the so-called

independent variables, are statistically unrelated. If the predictor variables are correlated in

violation of the assumption, FA can be employed to reduce them to a smaller set of

uncorrelated factor scores. The scores may be used in the regression analysis in place of the

original variables, with the knowledge that the meaningful variation in the original data has

not been lost. Likewise, a large number of dependent variables can also be reduced through


Exploration: The unknown domain may be explored through FA. It can reduce complex

interrelationships to a relatively simple linear expression and it can uncover unsuspected,

perhaps startling, relationships.

Mapping: Besides facilitating exploration FA also enables a scientist to map the social

environment. Mapping means the systematic attempt to chart major empirical concepts and

sources of variation.


FA refers to a collection of statistical methods for reducing correlated data into a smaller

number of dimensions or factors. Often in the social or management sciences, indeed in many

sciences, it is of interest to investigate the structure of a particular domain by analysing

relevant measures from that domain. This paper focused primarily on the approaches to FA in

the perspective of nature, mechanism and uses in social and management sciences.

