Module - IV: Advanced Statistical Techniques
Q1. Characteristics of Correlation
INTRODUCTION: - In nature we find a number of variables that are interrelated. Examples include the
amount of rainfall and the production of paddy, the heat and volume of a gas, and the price of and
demand for a commodity in the market. Correlation theory aims at finding the degree of relationship
that exists between such variables. Correlation is the statistical tool with which we measure the degree
of relationship that exists between two or more variables.
3. A positive correlation indicates a positive linear association. The strength of the positive linear
association increases as the correlation becomes closer to +1.
4. A negative correlation indicates a negative linear association. The strength of the negative linear
association increases as the correlation becomes closer to -1.
5. A correlation of either +1 or -1 indicates a perfect linear relationship. This is hard to find with real
data.
6. A correlation of 0 indicates either that there is no linear relationship between the two variables,
and/or that the best straight line through the data is horizontal.
2. NEGATIVE CORRELATION: - If the two variables move in opposite directions, the correlation is called
negative, i.e. if one variable decreases, the other variable increases. For example, price and demand are
negatively correlated: if price increases, demand decreases, and if price decreases, demand
increases.
3. ZERO CORRELATION: - If there is no correlation between the two variables, the correlation is
called zero correlation. For example, the marks scored by a student in tests and the amount of
rainfall are uncorrelated.
4. LINEAR CORRELATION: - If the ratio of the amount of change in one variable to the amount of change
in the other variable is constant throughout, the correlation is said to be linear. This type of
correlation is typically found in scientific variables such as the heat and volume of a gas.
5. NON-LINEAR OR CURVILINEAR CORRELATION: - If the ratio of the amount of change in one variable
to the amount of change in the other variable is not constant throughout, the correlation is said to be
non-linear or curvilinear. Most variables other than scientific variables show non-linear
correlation.
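The distinction between linear and curvilinear correlation can be checked directly from the definition: compute the ratio of change between consecutive observations and see whether it stays constant. The sketch below is only an illustration. The function names and the data are made up for this example, and it assumes the x values are all distinct and the data are free of noise.

```python
def change_ratios(xs, ys):
    """Ratio of the change in y to the change in x between consecutive observations."""
    points = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


def is_linear(xs, ys, tol=1e-9):
    """True when every consecutive change ratio equals the first one (within tol)."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)


# Hypothetical observations (not taken from any real data set).
xs = [1, 2, 3, 4, 5]
linear_ys = [3, 5, 7, 9, 11]   # y grows by 2 for every unit of x
curved_ys = [1, 4, 9, 16, 25]  # y = x squared, so the ratio keeps changing

print(is_linear(xs, linear_ys))
print(is_linear(xs, curved_ys))
print(change_ratios(xs, curved_ys))
```

With real data the ratios are never exactly constant, so in practice a scatter diagram or a fitted line is used instead of this strict check.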
Merits:
2. Just by looking at the scatter of points we can have a rough idea about the existence of correlation.
Demerits:
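Karl Pearson gave a formula to determine the extent of correlation between two related variables.
This coefficient of correlation is computed by dividing the sum of the products of the deviations of each
pair of observations from their respective means by the product of the standard deviations of the two
variables and the number of items. Symbolically:
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{N \, \sigma_x \, \sigma_y}
where \bar{x} and \bar{y} are the means, \sigma_x and \sigma_y are the standard deviations, and N is the
number of pairs of observations.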
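A minimal sketch of Karl Pearson's coefficient, following the verbal definition above: the sum of the products of deviations divided by N times the two standard deviations. The function name `pearson_r` and the price/demand figures are assumptions made for illustration only. The standard deviations use N as the divisor, as the formula requires.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's coefficient: sum of deviation products / (N * sd_x * sd_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations from the respective means.
    dev_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Population standard deviations (divisor N).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return dev_products / (n * sd_x * sd_y)


# Made-up illustrative data: price rises while demand falls.
price = [10, 12, 14, 16, 18]
demand = [50, 46, 41, 37, 30]
r = pearson_r(price, demand)
print(round(r, 3))
```

A value close to -1 here matches the earlier description of price and demand as negatively correlated.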
Q6. Applications of Regression
i) Predictive Analysis: Predictive analysis, i.e. forecasting future opportunities and risks, is the most
prominent application of regression analysis in business. Demand analysis, for instance, predicts the
number of items which a consumer will probably purchase. As another example, we can forecast the
number of shoppers who will pass in front of a particular billboard and then use that data to estimate
the maximum to bid for an advertisement.
ii) Operational Efficiency: Regression models can also be used to optimize business processes. A factory
manager, for example, can create a statistical model to understand the impact of oven temperature on
the shelf life of the cookies baked in those ovens. In a call center, we can analyze the relationship
between the wait times of callers and the number of complaints.
iii) Supporting Decisions: Businesses today are overloaded with data on finances, operations and customer
purchases. Increasingly, executives are leaning on data analytics to make informed business
decisions, reducing their reliance on intuition and gut feel. Regression analysis can bring a scientific
angle to the management of any business.
iv) Correcting Errors: Regression is useful not only for lending empirical support to management decisions
but also for identifying errors in judgment. For example, a retail store manager may believe that
extending shopping hours will greatly increase sales. Regression analysis, however, may indicate that the
increase in revenue might not be sufficient to support the rise in operating expenses due to longer
working hours (such as additional employee labour charges). Hence, regression analysis can provide
quantitative support for decisions and prevent mistakes due to a manager's intuition.
v) New Insights: Over time, businesses have gathered a large volume of unorganized data that has the
potential to yield valuable insights. However, this data is useless without proper analysis. Regression
analysis techniques can find a relationship between different variables by uncovering patterns that were
previously unnoticed. For example, analysis of data from point-of-sale systems and purchase accounts
may highlight market patterns such as an increase in demand on certain days of the week or at certain
times of the year.
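To make the applications above concrete, here is a small sketch of simple linear regression by least squares, using the oven temperature and shelf life example. The numbers are invented for illustration and the function names `fit_line` and `predict` are hypothetical. A real study would use many more observations and check the fit before relying on a prediction.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted y for a given x on the fitted line."""
    return intercept + slope * x


# Hypothetical data: oven temperature (degrees C) and cookie shelf life (days).
oven_temp = [160, 170, 180, 190, 200]
shelf_life = [30, 28, 27, 24, 21]

intercept, slope = fit_line(oven_temp, shelf_life)
print(round(slope, 2))                               # change in shelf life per degree
print(round(predict(intercept, slope, 185), 1))      # forecast at 185 degrees
```

The sign and size of the slope are what the factory manager would read off: how many days of shelf life are gained or lost per degree of oven temperature.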
Factor analysis is a way to take a mass of data and shrink it to a smaller data set that is more
manageable and more understandable.
ii) Confirmatory factor analysis is used for verification, when you have a specific idea about the
structure of your data or about how many dimensions are in a set of variables.
The key concept of factor analysis is that multiple observed variables have similar patterns of responses
because they are all associated with a latent (i.e. not directly measured) variable.
For example, people may respond similarly to questions about income, education and occupation, which
are all associated with the latent variable socioeconomic status.
ii) Communality (h²): Communality, symbolized as h², shows how much of each variable is accounted for
by the underlying factors taken together.
iii) Eigen value (or latent root): When we take the sum of squared values of factor loadings relating to a
factor, then such sum is referred to as the Eigen value or latent root. The Eigen value indicates the
relative importance of each factor in accounting for the particular set of variables being analysed.
iv) Total sum of squares: When the Eigen values of all factors are totalled, the resulting value is termed
the total sum of squares.
v) Rotation: Rotation, in the context of factor analysis, is something like staining a microscope slide. Just
as different stains reveal different structures in the tissue, different rotations give results that
appear to be entirely different, but from a statistical point of view all the results are taken as equal,
none superior or inferior to the others. However, from the standpoint of making sense of the results of
factor analysis, one must select the right rotation. If the factors are independent, an orthogonal
rotation is done, and if the factors are correlated, an oblique rotation is made.
vi) Factor loadings: Factor loadings are those values which explain how closely the variables are related
to each one of the factors discovered. They are also known as factor-variable correlations.
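The terms above are simple sums over a table of factor loadings, which a short sketch can make concrete. The loadings below are hypothetical numbers, not the output of any real factor analysis. The variable names reuse the income, education and occupation example, and the function names are made up for this illustration.

```python
# Hypothetical loadings: each row is a variable, each column one of two factors.
loadings = {
    "income":     [0.8, 0.1],
    "education":  [0.7, 0.3],
    "occupation": [0.6, 0.2],
}


def communalities(table):
    """h2 per variable: sum of squared loadings across all factors (row-wise)."""
    return {var: sum(l ** 2 for l in row) for var, row in table.items()}


def eigen_values(table):
    """Eigen value per factor: sum of squared loadings down each column."""
    rows = list(table.values())
    n_factors = len(rows[0])
    return [sum(row[j] ** 2 for row in rows) for j in range(n_factors)]


h2 = communalities(loadings)
eig = eigen_values(loadings)
total_sum_of_squares = sum(eig)  # total of the Eigen values of all factors

print({var: round(value, 2) for var, value in h2.items()})
print([round(value, 2) for value in eig])
print(round(total_sum_of_squares, 2))
```

Note that the total sum of squares equals the sum of the communalities as well, because both add up the same squared loadings, once by column and once by row.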
ii) Parsimony or data reduction: Factor analysis can be useful for reducing a mass of information to an
economical description. For example, data on fifty characteristics for 300 nations are unwieldy to handle,
descriptively or analytically. Nations can be more easily discussed and compared on economic
development, size, and politics dimensions, for example, than on the hundreds of characteristics each
dimension involves.
iii) Structure: Factor analysis may be employed to discover the basic structure of a domain. As a case in
point, a scientist may want to uncover the primary independent lines or dimensions (such as size,
leadership, and age) of variation in group characteristics and behaviour.
iv) Classification or description: Factor analysis is a tool for developing an empirical typology. It can be
used to group interdependent variables into descriptive categories, such as ideology, revolution, liberal
voting and authoritarianism. It can be used to classify nation profiles into types with similar
characteristics or behaviour.
v) Scaling: A scientist often wishes to develop a scale on which individuals, groups, or nations can be
rated and compared. The scale may refer to such phenomena as political participation, voting
behaviour, or conflict. A problem in developing a scale is to weight the characteristics being combined.
Factor analysis offers a solution by dividing the characteristics into independent sources of variation
(factors). Each factor then represents a scale based on the empirical relationships among the
characteristics.
vi) Hypothesis testing: Hypotheses abound regarding dimensions of attitude, personality, group, social
behaviour, voting, and conflict. Since the meaning usually associated with "dimension" is that of a cluster
or group of highly intercorrelated characteristics or behaviours, factor analysis may be used to test for
their empirical existence.
vii) Data transformation: Factor analysis can be used to transform data to meet the assumptions of
other techniques. A large number of dependent variables can also be reduced through factor analysis.
viii) Exploration: In a new domain of scientific interest like peace research, the complex interrelations of
phenomena have undergone little systematic investigation. The unknown domain may be explored
through factor analysis. It can reduce complex interrelationships to relatively simple linear expressions.
ix) Mapping: Besides facilitating exploration, factor analysis also enables a scientist to map the social
terrain. The concepts charted in this way may then be used to describe a domain or to serve as inputs
to further research. Some social domains, such as international relations, family life, and public
administration, have yet to be charted. In some other areas, however, such as personality, abilities,
attitudes, and cognitive meaning, considerable mapping has been done.
Cluster analysis is a data exploration (mining) tool for dividing a multivariate dataset into "natural"
clusters (groups). We use the method to explore whether previously undefined clusters (groups) exist in
the dataset. For instance, a marketing department may wish to use survey results to sort its customers
into categories.
Cluster analysis is a multivariate method which aims to classify a sample of subjects, on the basis of a
set of measured variables, into a number of different groups such that similar subjects are placed in the
same group. An example where this might be used is in the field of psychiatry, where the characterisation
of patients on the basis of clusters of symptoms can be useful in the identification of an appropriate form
of therapy. In marketing, it may be useful to identify distinct groups of potential customers so that, for
example, advertising can be appropriately targeted.
Cluster analysis is an exploratory analysis that tries to identify structure within the data. Cluster
analysis is also called segmentation analysis or taxonomy analysis.
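The idea of placing similar subjects in the same group can be sketched with k-means, one common clustering algorithm. These notes do not name a specific algorithm, so k-means is chosen here purely for illustration. The customer data, the starting centroids and the function name `kmeans` are all assumptions. The starting centroids are fixed so that the run is repeatable.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its members, until the centroids stop changing."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid becomes the mean of its cluster members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical survey data: (age, monthly spend) for six customers.
customers = [(22, 30), (25, 35), (24, 28), (50, 90), (55, 95), (52, 85)]
initial = [customers[0], customers[3]]  # assumed starting centroids

labels, centroids = kmeans(customers, initial)
print(labels)
print(centroids)
```

In this made-up data the two groups are well separated, so the result does not depend much on the starting centroids. With real survey data the variables usually need to be put on comparable scales first, and the number of clusters has to be chosen by the analyst.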
1. Market research:
Cluster analysis is widely used in market research when working with multivariate data from surveys and
test panels. Market researchers use cluster analysis to partition the general population of consumers
into market segments and to better understand the relationships between different groups of
consumers/potential customers.
In the study of social networks, clustering may be used to recognize communities within large groups of
people.
In the process of intelligent grouping of files and websites, clustering may be used to create a more
relevant set of search results compared to normal search engines like Google.
4. Software evolution:
5. Recommender system:
Recommender systems are designed to recommend new items based on a user's tastes. They sometimes
use clustering algorithms to predict a user's preferences based on the preferences of other users in the
user's cluster.
6. Crime analysis:
Cluster analysis can be used to identify areas where there is a greater incidence of particular types of
crime, that is, distinct areas or "hot spots" where a similar crime has happened over a period of
time.
Cluster analysis is, for example, used to identify groups of schools or students with similar properties.
8. Climatology:
9. Petroleum geology :
Discriminant analysis is a regression-based statistical technique used in determining which particular
classification or group (such as 'ill' or 'healthy') an item of data or an object (such as a patient) belongs
to on the basis of its characteristics or essential features. It differs from group-building techniques such
as cluster analysis in that the classifications or groups to choose from must be known in advance.
Two objectives
(1) When we want to assess the adequacy of classification, given the group memberships of the objects
under study.
(2) When we wish to assign objects to one of a number of (known) groups of objects.
2) Operates on data sets for which pre-specified, well-defined groups already exist.
3) Assesses the dependence relationship between one set of discriminating variables and a single grouping
variable; an attempt is made to define the relationship between independent and dependent variables.
4) Extracts dominant, underlying gradients of variation (canonical functions) among groups of sample
entities (e.g. species, sites, observations, etc.) from a set of multivariate observations, such that
variation among groups is maximized and variation within groups is minimized along the gradient.
5) Reduces the dimensionality of a multivariate data set by condensing a large number of original
variables into a smaller set of new composite dimensions (canonical functions) with a minimum loss of
information.
6) Summarizes data redundancy by placing similar entities in proximity in canonical space and producing
a parsimonious understanding of the data in terms of a few dominant gradients of variation.
7) Describes maximum differences among pre-specified groups of sampling entities based on a suite of
discriminating characteristics (i.e., canonical analysis of discrimination).
8) Predicts the group membership of future samples, or samples from unknown groups, based on a suite
of classification characteristics (i.e., classification).
9) Extension of multiple regression analysis if the research situation defines the group categories as
dependent upon the discriminating variables, and a single random sample (N) is drawn in which group
membership is "unknown" prior to sampling.
10) Extension of multivariate analysis of variance if values on the discriminating variables are defined as
dependent upon the groups, and separate independent random samples (N1, N2, ...) of two or more
distinct populations (i.e. groups) are drawn in which group membership is "known" prior to sampling.
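The characteristics listed above (maximizing variation among groups relative to variation within groups, then predicting group membership) can be sketched with Fisher's linear discriminant for two known groups measured on two variables. This is a simplified illustration under stated assumptions: the measurements are invented, the function names are hypothetical, and the cut-off is placed at the midpoint between the group means, which implicitly assumes equal group sizes and equal misclassification costs.

```python
def mean_vector(rows):
    """Mean of each of the two variables across the rows of one group."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, mean):
    """2x2 within-group scatter: sum of outer products of deviations from the mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_discriminant(group_a, group_b):
    """Weights w = Sw^-1 (mean_a - mean_b) and the midpoint cut-off score."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("pooled within-group scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cut-off: discriminant score of the midpoint between the two group means.
    threshold = w[0] * (ma[0] + mb[0]) / 2 + w[1] * (ma[1] + mb[1]) / 2
    return w, threshold


def classify(x, w, threshold, labels=("healthy", "ill")):
    """Scores above the cut-off fall on group A's side, since w points towards A."""
    score = w[0] * x[0] + w[1] * x[1]
    return labels[0] if score > threshold else labels[1]


# Hypothetical measurements of two characteristics for patients in known groups.
healthy = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
ill = [(6.0, 7.0), (7.0, 6.0), (7.0, 8.0), (8.0, 7.0)]

w, threshold = fisher_discriminant(healthy, ill)
print(classify((2.5, 2.5), w, threshold))
print(classify((6.5, 7.5), w, threshold))
```

Unlike the clustering case, the groups here are supplied in advance; the analysis only learns the direction that separates them best and then assigns new cases to one of the known groups.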
1) Bankruptcy prediction: In bankruptcy prediction based on accounting ratios and other financial
variables, linear discriminant analysis was the first statistical method applied to systematically explain
which firms entered bankruptcy versus which survived.
2) Face recognition: In computerized face recognition, each face is represented by a large number of
pixel values. Linear discriminant analysis is primarily used here to reduce the number of features to a
more manageable number before classification. Each of the new dimensions is a linear combination of
pixel values, which forms a template. The linear combinations obtained using Fisher's linear discriminant
are called Fisherfaces, while those obtained using the related principal component analysis are called
eigenfaces.
3) Marketing: In marketing, discriminant analysis was once often used to determine the factors which
distinguish different types of customers and/or products on the basis of surveys or other forms of
collected data.
4) Biomedical studies: The main application of discriminant analysis in medicine is the assessment of
the severity state of a patient and the prognosis of disease outcome. For example, during retrospective
analysis, patients are divided into groups according to the severity of disease: mild, moderate and
severe forms.
The term scaling comes from psychometrics, where abstract concepts ("objects") are assigned numbers
according to a rule. For example, you may want to quantify a person's attitude to global warming. You
could assign a "1" to "doesn't believe in global warming", a "10" to "firmly believes in global warming"
and a scale of 2 to 9 for attitudes in between.
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a
dataset.
STEPS IN CONDUCTING MDS: Following are the steps in conducting MDS research:
1) Formulating the problem: What variables do you want to compare? How many variables do you want
to compare? What purpose is the study to be used for?
2) Obtaining input data: Respondents are asked a series of questions. For each product pair they are
asked to rate similarity (usually on a 7-point Likert scale from very similar to very dissimilar). The first
question could be for Coke/Pepsi, for example, the next for Coke/Hires Root Beer, the next for Pepsi/Dr
Pepper, the next for Dr Pepper/Hires Root Beer, etc.
3) Running the MDS statistical program: Software for running the procedure is available in many
statistical software packages.
4) Deciding the number of dimensions: The researcher must decide on the number of dimensions they
want the computer to create. The more dimensions, the better the statistical fit, but the more difficult it
is to interpret the results.
5) Mapping the results and defining the dimensions: The statistical program (or a related module) will
map the results. The map will plot each product (usually in two-dimensional space). The proximity of
products to each other indicates either how similar they are or how preferred they are, depending on
which approach was used.
6) Testing the results for reliability and validity: Compute R-squared to determine what proportion of
the variance of the scaled data can be accounted for by the MDS procedure.
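Step 6 can be sketched as follows: given the respondents' dissimilarity ratings and the coordinates that an MDS program has produced, compare the ratings with the distances on the map and report the squared correlation between them. The coordinates and ratings below are invented for illustration, the function names are hypothetical, and this squared-correlation reading of R-squared is one common choice rather than the only one a package might report.

```python
import math


def map_distances(coords, pairs):
    """Euclidean distance on the map for each pair of products."""
    return [math.dist(coords[a], coords[b]) for a, b in pairs]


def r_squared(observed, fitted):
    """Squared Pearson correlation between ratings and map distances."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(fitted) / n
    sxy = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, fitted))
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((f - mean_f) ** 2 for f in fitted)
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined when either series is constant")
    return sxy ** 2 / (sxx * syy)


# Hypothetical two-dimensional map, as if produced by an MDS program.
coords = {
    "Coke": (0.0, 0.0),
    "Pepsi": (1.0, 0.0),
    "Dr Pepper": (4.0, 3.0),
    "Hires Root Beer": (5.0, 4.0),
}

# Hypothetical ratings: 1 = very similar, 7 = very dissimilar.
ratings = {
    ("Coke", "Pepsi"): 1,
    ("Coke", "Dr Pepper"): 5,
    ("Coke", "Hires Root Beer"): 6,
    ("Pepsi", "Dr Pepper"): 5,
    ("Pepsi", "Hires Root Beer"): 6,
    ("Dr Pepper", "Hires Root Beer"): 2,
}

pairs = list(ratings)
fit = r_squared([ratings[p] for p in pairs], map_distances(coords, pairs))
print(round(fit, 2))
```

A value near 1 means the map distances reproduce the ratings closely; a low value suggests that more dimensions, or a different configuration, are needed.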
2) Brand Image Analysis: Many marketers use the technique to measure possible gaps between the
company's or brand's positioning and the consumer's perception of the brand image.
3) Development of New Products: MDS is one of the most powerful tools to be used at the idea
generation or concept testing stage. It helps us to identify quadrants that are less crowded and where a
clear product launch opportunity exists. If the product team has come up with more than one probable
concept, the preferences of the consumers regarding these can be placed on a spatial map to see which
concept finds higher acceptability on multiple dimensions.
4) Pricing Studies: The marketer can use spatial maps of the competing brands, drawn with and without
the criterion of price, to assess whether price is making a difference to the preference or demand for
the brand, i.e. whether the positioning of the brand is affected by price or not.
5) Assessing Communication Effectiveness: The brand manager could design a before-and-after study to
assess the placement of the brand before and after a specific repositioning or a new advertising
campaign, to see its impact on the brand perception.