Unit 4 Factor, Discriminant, Conjoint, Innovation-Diffusion
(UNIT 4)
BMS 4th Sem
Sugandha Jain
SAMPLING DESIGN &
DATA ANALYSIS
Measures of Relationship
1. Univariate population: A population consisting of measurements of only
one variable.
2. Bivariate population: If for every measurement of a variable, X, we have a
corresponding value of a second variable, Y, the resulting pairs of values
are called a bivariate population.
3. Multivariate population: We may also have a corresponding value of a
third variable, Z, or a fourth variable, W, and so on; the resulting sets of
values are called a multivariate population.
• In case of bivariate/multivariate population, we want to know the relation of the two or more
variables to one another. There are many methods of determining the relationship, but no
method can tell for certain that a correlation is indicative of causal relationship.
Visual Representation of Factor Analysis
[Figure: 15 balls of different colors]
Q1, Q2, Q3: Palpitation, Dry mouth, Sweating → Somatic component of anxiety
Q4, Q5, Q6: Worry, Apprehension, Nervousness → Affective component of anxiety
What is a factor?
• A construct that is not directly observable but that needs to be inferred from the input
variables.
• An underlying dimension that accounts for several observed variables.
• A linear combination of variables. Mathematical representation of a factor:
$$F_i = W_{i1}X^*_1 + W_{i2}X^*_2 + W_{i3}X^*_3 + \dots + W_{ik}X^*_k$$
where,
$X^*_j$ = jth standardized variable
$F_i$ = Estimate of ith factor
$W_{ij}$ = Weight or factor score coefficient of the jth standardized variable on the ith factor
$k$ = Number of variables
• There can be one or more factors, depending upon the nature of the study and the number of
variables involved in it.
• In an orthogonal solution, the factors are statistically independent (uncorrelated).
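As a minimal sketch of the linear combination above, the following computes one factor score per respondent as a weighted sum of standardized variables. The item scores and the weights are hypothetical values chosen for illustration; in practice the factor score coefficients come out of the factor analysis itself.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def factor_scores(columns, weights):
    """Return F_i for each respondent: the weighted sum of standardized variables.

    columns -- one list of raw scores per variable (all the same length)
    weights -- factor score coefficients W_i1..W_ik for ONE factor
    """
    z_columns = [standardize(col) for col in columns]
    n_respondents = len(columns[0])
    return [
        sum(w * z_col[r] for w, z_col in zip(weights, z_columns))
        for r in range(n_respondents)
    ]


# Hypothetical scores of 5 respondents on 3 somatic anxiety items.
palpitation = [1, 2, 3, 4, 5]
dry_mouth = [2, 2, 3, 5, 5]
sweating = [1, 3, 3, 4, 5]

# Hypothetical factor score coefficients (assumed, not estimated here).
weights = [0.4, 0.3, 0.3]

scores = factor_scores([palpitation, dry_mouth, sweating], weights)
for respondent, score in enumerate(scores, start=1):
    print(f"Respondent {respondent}: factor score = {score:.2f}")
```

Because every variable is standardized first, the factor scores are centred on zero: a positive score means the respondent is above average on the items that carry weight on this factor.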
Two types of factors
Reflective: e.g. symptoms
Formative: e.g. causes/sources
Factor loading
• Those values which explain how closely the variables are related to each one of the
factors discovered.
• It is the correlation coefficient of the extracted factor score with a variable. Also
known as factor-variable correlations.
• Factor-loadings work as key to understanding what the factors mean.
• It is the absolute size (rather than the signs, plus or minus) of the loadings that is
important in the interpretation of a factor.
• A factor matrix contains the factor loadings of all the variables on the factors.
Communality ($h^2$)
• Amount of variance a variable shares with all the other variables. This is the proportion
of variance explained by the common factors.
• Shows how much of a variable is accounted for by the underlying factors taken together.
• A high value of communality means that not much of the variable is left over after
whatever the factors represent is taken into consideration.
• It is worked out in respect of each variable as under:
$h^2$ of the ith variable = (ith factor loading of factor A)$^2$ + (ith factor loading of factor B)$^2$ + …
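The communality calculation can be sketched in a few lines. The loadings below are hypothetical numbers for four variables on two factors (A and B), used only to show the arithmetic.

```python
# Hypothetical factor loadings: variable -> (loading on factor A, loading on factor B)
loadings = {
    "worry": (0.80, 0.10),
    "nervousness": (0.70, 0.20),
    "sweating": (0.15, 0.85),
    "dry_mouth": (0.30, 0.60),
}


def communality(row):
    """Sum of squared loadings of one variable across all factors."""
    return sum(loading ** 2 for loading in row)


communalities = {var: communality(row) for var, row in loadings.items()}

for var, h2 in communalities.items():
    print(f"{var}: h^2 = {h2:.3f}")
```

In this illustration "sweating" has the highest communality (about 0.745), so the two factors together account for most of its variance, while "dry_mouth" (about 0.45) is less well explained.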
Eigen value (Latent root)
• When we take the sum of squared values of factor loadings relating to a factor,
then such sum is called eigen value or latent root.
• Eigen value is the total amount of variance explained by a factor.
• It indicates the relative importance of each factor in accounting for the particular
set of variables being analysed.
• Factors with eigen value higher than a specific value can be selected. This value is
generally taken as 1. The rationale is that a worthwhile component should explain
at least one variable's worth of the variability.
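A small sketch of the eigen value calculation and the "greater than 1" selection rule follows. The factor matrix is hypothetical (six variables, three factors); only the arithmetic is the point.

```python
# Hypothetical factor matrix: one row per variable, one column per factor.
factor_matrix = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.9, 0.1, 0.2],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.3],
    [0.1, 0.6, 0.2],
]

n_factors = len(factor_matrix[0])

# Eigen value of a factor = sum of squared loadings in that factor's column.
eigen_values = [
    sum(row[j] ** 2 for row in factor_matrix) for j in range(n_factors)
]

# Keep factors whose eigen value exceeds the usual cut-off of 1.
CUTOFF = 1.0
retained = [j + 1 for j, ev in enumerate(eigen_values) if ev > CUTOFF]

for j, ev in enumerate(eigen_values, start=1):
    print(f"Factor {j}: eigen value = {ev:.2f}")
print("Retained factors:", retained)
```

With these numbers the eigen values are approximately 2.00, 1.55 and 0.20, so the first two factors are retained and the third, which explains less than one variable's worth of variance, is dropped.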
Total sum of squares
• When eigen values of all factors are totaled, the resulting value is termed as the
total sum of squares.
• This value, when divided by the number of variables (involved in a study),
results in an index that shows how the particular solution accounts for what all
the variables taken together represent.
• If the variables are all very different from each other, this index will be low. If
they fall into one or more highly redundant groups, and if the extracted factors
account for all the groups, the index will then approach unity.
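The index described above can be written out as a worked equation. The symbols $\lambda_j$ (eigen value of factor j), $r$ (number of extracted factors) and $k$ (number of variables) are notation assumed here, and the numbers are hypothetical: three factors with eigen values 2.00, 1.55 and 0.20 extracted from six variables.

```latex
\text{Index} = \frac{\sum_{j=1}^{r} \lambda_j}{k}
\qquad\text{e.g.}\qquad
\frac{2.00 + 1.55 + 0.20}{6} = \frac{3.75}{6} = 0.625
```

In this illustration the three factors together account for 62.5 per cent of what the six variables represent.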
Rotation
• Just as different stains on a microscope slide reveal different structures in the tissue,
different rotations in factor analysis reveal different structures in the data.
• Though different rotations give results that appear to be entirely different, from
a statistical point of view all results are taken as equal, none superior or inferior to others.
• However, from the standpoint of making sense of the results of factor analysis, one
must select the right rotation.
• Independent factors: orthogonal rotation; Correlated factors: oblique rotation.
• Communality for each variable will remain undisturbed regardless of rotation but
the eigen values will change accordingly.
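The last point can be checked numerically. The sketch below applies an orthogonal rotation (here an arbitrary 45 degrees) to a hypothetical two-factor loading matrix and compares communalities and eigen values before and after. It is an illustration of the property only, not of how a rotation method such as varimax chooses its angle.

```python
import math

# Hypothetical unrotated loadings: (factor 1, factor 2) for four variables.
loadings = [(0.6, 0.6), (0.7, 0.5), (0.5, -0.5), (0.6, -0.6)]


def rotate(rows, degrees):
    """Orthogonally rotate a two-factor loading matrix by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(a * c + b * s, -a * s + b * c) for a, b in rows]


def communalities(rows):
    """Row-wise sum of squared loadings (one value per variable)."""
    return [a * a + b * b for a, b in rows]


def eigen_values(rows):
    """Column-wise sum of squared loadings (one value per factor)."""
    return [sum(r[j] ** 2 for r in rows) for j in range(2)]


rotated = rotate(loadings, 45)

print("Communalities before:", [round(h, 3) for h in communalities(loadings)])
print("Communalities after: ", [round(h, 3) for h in communalities(rotated)])
print("Eigen values before: ", [round(e, 3) for e in eigen_values(loadings)])
print("Eigen values after:  ", [round(e, 3) for e in eigen_values(rotated)])
```

The communalities are identical before and after, while the variance is redistributed between the two factors; the total of the eigen values stays the same.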
Factor Scores
• Composite scores estimated for each respondent on the derived factors.
• Factor score represents the degree to which each respondent gets high
scores on the group of items that load high on each factor.
• Factor scores can help explain what the factors mean.
Procedure of conducting factor analysis
1. Problem formulation and design
2. Checking appropriateness of data matrix
a. Sample size
b. Composition of data matrix
c. Independence of measures
3. Construction of correlation matrix
a. Measures of association
b. Significance of correlation matrix
4. Choosing method of factor analysis
5. Determining number of factors (a priori analysis)
6. Rotation of factors
7. Interpretation of factors
8. Validation of factor structure
Problem formulation & Design
• Stating objectives of the FA-based research (e.g. sources of well-being)
a. Data summarization (understanding the relationship)
b. Data reduction (dimension reduction and identifying variables linked with each
dimension)
• The theory behind the hypothesis
• Selecting the measures and measurement levels (metric scales are always
preferred)
Checking appropriateness of data matrix
• Sample size: the larger the better, or 5-10 times the number of variables
• Composition: Data matrices should not comprise submatrices from
different data sources
• Independence: Any dependence in the measurement of the variables
may artificially increase their correlations, thus causing them to
appear together on the same factor.
Construction of correlation matrix
• Association: Correlation coefficient is the most commonly used
measure of association, but it underestimates the strength of
curvilinear relations. Other measures of association are: distance
measures, cross-product indexes, covariance.
• Significance of the matrix: Assessed using Bartlett's test of sphericity and
the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy
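The point about curvilinear relations can be illustrated with a small sketch. The data are made up: one variable depends on x linearly, the other depends on x perfectly but through a curve (y = x squared), and the correlation coefficient is computed for both.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]   # perfectly linear in x
curved = [v ** 2 for v in x]      # perfectly determined by x, but curvilinear

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)

print(f"r (linear relation):      {r_linear:.2f}")
print(f"r (curvilinear relation): {r_curved:.2f}")
```

Both variables are completely determined by x, yet the coefficient is 1 for the linear case and 0 for the U-shaped one. A correlation matrix built from such variables would hide the relationship from the factor analysis.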
Methods of factor analysis
Type Descriptive Inferential
• This is bad because it ignores the useful information that gene Y provides. Same goes for
projecting the genes onto the Y-axis and ignoring information from the X-axis.
• LDA provides a better way…
Reducing a 2D graph to 1D graph with LDA
LDA uses information from both genes to create a new axis and projects the
data onto this new axis in a way that maximises the separation of the 2 categories.
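How does LDA create a new axis?
The new axis is created according to two criteria (considered simultaneously):
1. Maximise the distance between the means of the two categories.
2. Minimise the scatter (variation) within each category.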
Why both distance and scatter are important
What if we have more than 2 genes? LDA with 3 genes:
The process is the same. LDA creates an axis that maximises the distance
between the means for the two categories while minimising the scatter.
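A minimal sketch of this idea for two genes and two categories is given below. It uses the classical Fisher formulation, in which the new axis is w = Sw⁻¹(mean_a − mean_b) with Sw the pooled within-category scatter matrix; that formula is the standard textbook form and is assumed here rather than taken from the slides. The expression values are hypothetical.

```python
def mean_vec(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Within-category scatter (sxx, sxy, syy) around the category mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def lda_axis(class_a, class_b):
    """Direction that maximises mean separation relative to within-category scatter."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    a, b = scatter(class_a, ma), scatter(class_b, mb)
    sxx, sxy, syy = a[0] + b[0], a[1] + b[1], a[2] + b[2]  # pooled scatter Sw
    det = sxx * syy - sxy * sxy
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # w = Sw^{-1} (ma - mb), using the closed-form inverse of a 2x2 matrix
    return ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)


def project(points, w):
    """Project 2-D points onto the axis w, giving one number per point."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]


# Hypothetical (gene X, gene Y) expression levels for two categories.
category_a = [(1, 2), (2, 3), (3, 3), (2, 1)]
category_b = [(6, 5), (7, 8), (8, 6), (7, 6)]

w = lda_axis(category_a, category_b)
proj_a = project(category_a, w)
proj_b = project(category_b, w)
print("New axis:", tuple(round(v, 3) for v in w))
print("Category A on new axis:", [round(v, 2) for v in proj_a])
print("Category B on new axis:", [round(v, 2) for v in proj_b])
```

On the new one-dimensional axis the two categories do not overlap. With more than two genes the same computation applies, except that Sw becomes a larger matrix and is inverted with a linear algebra routine.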
• m (market size): provides the scale of the demand forecast.
• p (coefficient of innovation): probability that an innovator will adopt at time t.
External influence; advertising effect.
• q (coefficient of imitation): internal influence; word of mouth, social
contagion effects.
The parameters p and q determine the shape of the adoption curve.
p and q specify how fast or slow the adoption of a new product is expected to proceed.
To sum up, the adoption of a new product can proceed on a small or a large
scale (specified by m) and at a fast or slow pace (specified by p and q).
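The roles of m, p and q can be seen in a short simulation. The sketch below uses the standard discrete-time form of the Bass model, new adopters in period t = (p + q × N/m) × (m − N), where N is the cumulative number of adopters so far. This recursion is the usual textbook form and is assumed here, since the slides do not spell it out; the parameter values are hypothetical.

```python
def bass_adoptions(m, p, q, periods):
    """Return the number of new adopters in each period (discrete-time Bass model)."""
    cumulative = 0.0
    new_adopters = []
    for _ in range(periods):
        remaining = m - cumulative  # potential adopters who have not yet adopted
        n_t = (p + q * cumulative / m) * remaining
        new_adopters.append(n_t)
        cumulative += n_t
    return new_adopters


# Hypothetical parameter values, chosen only for illustration.
sales = bass_adoptions(m=100_000, p=0.03, q=0.38, periods=15)
peak_period = max(range(len(sales)), key=sales.__getitem__) + 1

for t, n_t in enumerate(sales, start=1):
    print(f"Period {t:2d}: {n_t:8.0f} new adopters")
print("Peak period:", peak_period)
```

In the first period only the innovation effect operates (p × m adopters). As the adopter base grows, the imitation term q × N/m takes over and pushes sales up to a peak, after which the shrinking pool of remaining adopters pulls them down again. Changing m rescales the whole curve; changing p and q changes how early and how sharply it peaks.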
[Figure: Bass models with various levels of innovation (p) and imitation (q)]
• For academic purposes, the recorded document presents a knowledge base on the topic,
and for the manager seeking help in taking more informed decisions, the report provides
the necessary guidance for taking appropriate action.
• As the report documents all the steps and analysis undertaken, it serves to authenticate
the quality of the work carried out and establishes the strength of the findings obtained.
Types of research reports (based on size)
1. Brief Reports: Not formally structured and are generally short, not running more than
4-5 pages. Information provided has limited scope and is prepared either for
immediate consumption or as a prelude to the formal structured report that would
subsequently follow. These reports could be designed in several ways.
a) Working papers/ Basic reports: Focus is on present study. Purpose is to collate the
process carried out in terms of the scope, framework, methodology and instrument,
result and findings of the study. However, interpretation of findings and study
background might be missing, as focus is not on past literature. These reports serve as
a reference point when writing the final report or when the researcher wants to revisit
the detailed steps followed in collecting the study-related information.
b) Survey reports: Focus here is to present findings in an easy-to-comprehend format that
includes figures and tables. The reader can then study the patterns in findings to arrive
at appropriate conclusions, essential for resolving the business dilemma. Advantage:
simple and easy to understand, and they present findings in a clear and usable format.
2. Detailed Reports: More formal and pedantic in their structure and are
essentially either academic, technical or business reports.
a) Technical Reports: Major documents that include all elements of the basic
report, as well as the interpretations and conclusions related to the
obtained results. They contain the complete problem background and any additional
past data/records that are essential for comprehending and interpreting the present
study output. All sources of data, the sampling plan, data collection instrument(s)
and data analysis outputs are formally and sequentially documented.
b) Business Reports: Lack the technical rigour and details of the technical
report. They are written in the language of, and include conclusions as understood
and required by, the business manager. The tables, figures and numbers of the
technical report are now shown pictorially as bars and graphs, and the reporting
tone is in business terms rather than conceptual or theoretical
terms. If needed, the tabular data may be attached in the appendix.
Process of Report Writing
Preliminary Section
• Title page
• Letter of Transmittal
• Letter of Authorization
• Table of Contents
• Executive Summary
• Acknowledgements
Background
• Problem Statement
• Study Introduction and Background
• Scope and Objectives of the Study
• Limitations of the Study
• Review of Literature
Methodology
• Research Design
• Sampling Design
• Data Collection
• Data Analysis
Findings
• Results
• Interpretation of Results
Conclusion
• Conclusions and Recommendations
Appendices, Bibliography, Glossary
Features of good report writing
• Clear report mandate: While writing the research problem statement and study
background, the writer needs to be focused, lucid, precise and explicit regarding what the
problem is, the background that provided the impetus to conduct the research and the
study domain. This is prepared on the assumptions that the writer will not be physically
present to clarify the research mandate and that the reader has no earlier insights into
the problem situation.
• Clearly designed methodology: Writer needs to be explicit in terms of the logical
justification for using the study methods and techniques. Language should be
non-technical and reader friendly and any technical explanations or details must be
provided in the appendix. Transparency regarding procedures used evokes confidence in
the findings and resulting conclusions.
• Clear representation of findings: The sample size for each analysis and any special
conditions or data treatment must be clearly mentioned as a footnote or endnote, so
that the reader takes this into account while interpreting and understanding the study
results. Complete honesty and transparency in stating the treatment and editing of missing
or contrary data is extremely critical.
• Representativeness of study finding: A good research report is also explicit in terms of
extent and scope of the results obtained, and in terms of the applicability of findings. This is
Guidelines for effective documentation
• Command over medium (correct and effective language of
communication)
• Phrasing protocol (no personal pronouns; neutral tone; avoid long
sentences and slang; proper citation, quotations, italics; maintain
sanctity of formal documentation)
• Simplicity of approach (avoid technical jargon)
• Report formatting and presentation (paper quality, margins, font
style, size, professional and uniform, graphs and figures for variation
and relief)
Guidelines for presenting tabular data
• Table identification details (number and title, short, no verbs or
articles)
• Data arrays (arrangement of data; ascending: either chronologically or
alphabetically, sub-categorization)
• Measurement unit
• Spaces, leaders….. and rulings
• Assumptions, details and comments (special definition or formula
needed to understand the data)
• Data sources (if data is secondary)
• Special mention (special figures can be bold, highlighted or
star-marked)
Guidelines for visual data
1. Line and curve graphs: To demonstrate trends and pattern in the data, a line
chart is the best option available. Can show growth patterns of different
sectors/industries in the same time period.
• The time units or the causal variable being studied are to be put on the X-axis
• To compare different series on the same chart, the lines should be of different
colours
• Too many lines not advisable on the same chart as data becomes cluttered;
ideal number would be 5 or less.
• The researcher also must take care to formulate the zero baseline in the chart
as otherwise, the data would seem to be misleading.
2. Area or stratum charts: Similar to line charts. However, here there are
multiple lines that are essentially components of the original composite data.
What is done is that the change in each of the components is individually shown
on the same chart and each of them is stacked one on top of the other. The
areas between the various lines indicate the scale or volume of the relevant
3. Pie charts: Another way of demonstrating the area or stratum or sectional
representation. The difference between a line and pie chart is that the pie chart cannot
show changes over time. Simply shows the cross-section of a single time period. The
sections or slices of the pie indicate the ratio of that section to the total area of the
parameter being displayed.
• The complete data must be shown as a 100 per cent area of the subject being
graphed.
• Display percentages within or above the pie rather than in the legend as then it is
easier to understand the magnitude of the section in comparison to the total.
• Showing changes over time is difficult through a pie chart, as stated earlier. However,
the change in the components at different time periods could be demonstrated.
4. Bar charts and histograms: Representation of magnitude of different objects on the
same parameter. The comparative position of objects becomes very clear.
• The usual practice is to formulate vertical bars; however, it is possible to use
horizontal bars as well if none of the variable is time related. Horizontal bars are
especially useful when one is showing both positive and negative patterns on the
same graph (bilateral bar charts).
• Another variation of the bar chart is the histogram. Here the bars are vertical and the
height of each bar reflects the relative or cumulative frequency of that particular
variable.
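The bar heights of such a histogram can be worked out as follows. This is a small sketch with hypothetical responses on a 5-point scale; it computes the relative and cumulative frequencies only and leaves the drawing to whatever charting tool is used.

```python
from collections import Counter

# Hypothetical responses on a 5-point scale (illustrative only).
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 3]

counts = Counter(responses)
n = len(responses)

# Relative frequency: share of all responses falling on each value.
relative = {value: counts[value] / n for value in sorted(counts)}

# Cumulative frequency: running total of the relative frequencies.
cumulative = {}
running = 0.0
for value, share in relative.items():
    running += share
    cumulative[value] = running

for value in relative:
    print(f"{value}: relative = {relative[value]:.2f}, cumulative = {cumulative[value]:.2f}")
```

The relative frequencies add up to 1, and the cumulative frequency of the last category is therefore 1 (100 per cent).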
5. Pictogram: Shows graphical representation of data. Most often used in magazines
and newspapers, as they are eye-catching and easy to comprehend by one and all.
They are not a very accurate or scientific representation of the actual data and, thus,
should be used with caution in an academic or technical report.
6. Geographic representation: Geographic or regional maps related to countries,
states, districts, territories can be used as a base to show occurrence of the studied
variable in various regions or to show comparative analysis about major brands or
industries or minerals. In case of comparative data, the researcher must provide the
legend in the displayed map, for example any map of the location may be given.
Oral presentation: Research briefings
• Should not go over 20 minutes
• Time for questioning
• Know your audience
• Purpose of the briefing
• Study background, findings, implications and recommendations
• Handouts/ brochures
• Visual aids: slides (more graphical/visual data as compared to textual),
chalkboard/flipcharts, video/audio tapes