
MODULE –IV: ADVANCED STATISTICAL TECHNIQUES

INTRODUCTION: In nature we find a number of variables that are interrelated. Examples include the
amount of rainfall and the production of paddy, the heat and volume of a gas, and the price and demand
of a commodity in the market. Correlation theory aims at finding the degree of relationship that
exists between such variables. The statistical tool with which we measure the degree of
relationship between two or more variables is technically called correlation.

Q1. CHARACTERISTICS OF CORRELATION


Below are some characteristics of correlation.

1. The correlation of a sample is represented by the letter r.

2. The range of possible values for a correlation is from -1 to +1.

3. A positive correlation indicates a positive linear association. The strength of the positive linear
association increases as the correlation becomes closer to +1.

4. A negative correlation indicates a negative linear association. The strength of the negative linear
association increases as the correlation becomes closer to -1.

5. A correlation of either +1 or -1 indicates a perfect linear relationship. This is hard to find with real
data.

6. A correlation of 0 indicates that there is no linear relationship between the two variables,
and/or that the best straight line through the data is horizontal.

Q2. TYPES OF CORRELATION


1. POSITIVE CORRELATION: If the two correlated variables move in the same direction, the
correlation is called positive, i.e. if one variable increases, the other variable also increases, and if one
variable decreases, the other variable also decreases. For example, demand and supply are positively
correlated: if demand increases, supply increases, and if demand decreases, supply decreases, i.e. both
variables move in the same direction.

2. NEGATIVE CORRELATION: If the two correlated variables move in opposite directions, the
correlation is called negative, i.e. if one variable decreases, the other variable increases. For example, price and
demand are negatively correlated: if price increases, demand decreases, and if price decreases, demand
increases.

3. ZERO CORRELATION: If there is no relationship between the two variables, the correlation is
called zero correlation. For example, the marks scored by a student in tests and the
amount of rainfall.

4. LINEAR CORRELATION: If the amount of change in one variable bears a constant ratio to the amount
of change in the other variable throughout, the correlation is said to be linear. This
type of correlation is found mainly in scientific variables, such as the heat and volume of a gas.

5. NON-LINEAR OR CURVILINEAR CORRELATION: If the amount of change in one variable does not bear
a constant ratio to the amount of change in the other variable throughout, the
correlation is said to be non-linear or curvilinear. Most variables other than scientific variables
show non-linear correlation.
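
The constant-ratio idea that separates linear from curvilinear correlation can be illustrated with a small sketch. The data and the function names below are hypothetical and chosen only for illustration. With real observations the ratios would be only approximately constant, so this shows the definition rather than a practical test.

```python
from fractions import Fraction

def change_ratios(xs, ys):
    """Ratio (change in y) / (change in x) for each consecutive pair of points."""
    return [
        Fraction(ys[i + 1] - ys[i], xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]

def is_linear(xs, ys):
    """True when every consecutive change ratio is identical."""
    ratios = change_ratios(xs, ys)
    return all(r == ratios[0] for r in ratios)

# Hypothetical data: y rises by 3 for every unit rise in x (constant ratio).
x = [1, 2, 3, 4, 5]
y_linear = [5, 8, 11, 14, 17]

# Hypothetical data: y = x squared, so the ratio keeps changing.
y_curved = [1, 4, 9, 16, 25]

print(is_linear(x, y_linear))   # constant ratio: linear
print(is_linear(x, y_curved))   # changing ratio: curvilinear
```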

Q. 3) Methods Of Studying Correlation?


Following are the important methods of studying correlation.

1. Scatter diagram method.

2. Karl Pearson’s method.

3. Rank correlation method.

4. Method of least squares.

1. Scatter diagram method: This is a non-mathematical method of studying the correlation between two
variables. It gives a rough idea of the degree of correlation as well as its direction. If the paired
observations in the data are plotted as co-ordinates on a graph, we get a scatter of points on the
plane. By studying the scatter of points, we can roughly estimate the extent of correlation.

Merits and Demerits:

Merits:

1. As it is a non-mathematical method, it can be understood very easily.

2. Just by looking at the scatter of points we can get a rough idea about the existence of correlation.

Demerits:

1. As it is a non-mathematical method, we cannot measure the exact degree of correlation.

2. Interpretation of the diagram depends on the subjective judgment of the person.

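2. Karl Pearson's coefficient of correlation

Karl Pearson gave a formula to determine the extent of correlation between two related variables.
The coefficient of correlation is computed by dividing the sum of the products of the deviations of each
pair of observations from their respective means by the product of the standard deviations of the two
variables and the number of items. Symbolically:

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N \, \sigma_X \, \sigma_Y} $$

where N is the number of paired observations, \bar{X} and \bar{Y} are the means, and \sigma_X and \sigma_Y are the
standard deviations of X and Y.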
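
The verbal definition of Karl Pearson's coefficient can be sketched in a few lines of Python. This is a minimal illustration, assuming standard deviations computed with divisor N. The function name `pearson_r` and the price and demand figures are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient: sum of products of deviations / (N * sd_x * sd_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of paired deviations from the respective means.
    sum_dev_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Standard deviations with divisor N (an assumption of this sketch).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum_dev_products / (n * sd_x * sd_y)

# Hypothetical data: as price rises, demand falls.
price = [10, 12, 14, 16, 18]
demand = [40, 38, 33, 29, 25]

r = pearson_r(price, demand)
print(round(r, 4))  # close to -1: strong negative correlation
```

A value near -1 here agrees with the earlier description of price and demand as negatively correlated.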

Q.4 Regression Analysis?


In "correlation", we studied how to find the extent of the relationship between two
variables X and Y. The theory of correlation gives only the degree of relationship between two variables,
not the nature of the relationship. That is, it does not tell which is the cause and which is the effect. This is
indicated by the study of regression. Regression is a statistical method with the help of which we can
estimate the value of one variable for a given value of the other variable. For example, if we know that the
two variables demand and supply are correlated, then with the help of regression theory we can estimate the
most probable value of demand for a given value of supply, or the most probable value of supply for a
given value of demand. The regression phenomenon was first noted by Sir Francis Galton.
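
As a rough sketch of how regression estimates one variable from the other, the example below fits a straight line by the method of least squares. The function name `regression_line` and the supply and demand figures are hypothetical, and a straight-line relationship is assumed.

```python
def regression_line(xs, ys):
    """Least-squares line ys = a + b * xs; returns the pair (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope (regression coefficient)
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical paired observations.
supply = [10, 20, 30, 40, 50]
demand = [12, 19, 33, 38, 48]

# Regression of demand on supply: estimate demand for a given supply.
a_ds, b_ds = regression_line(supply, demand)
estimated_demand = a_ds + b_ds * 35

# Regression of supply on demand: estimate supply for a given demand.
a_sd, b_sd = regression_line(demand, supply)
estimated_supply = a_sd + b_sd * 35

print(estimated_demand, estimated_supply)
```

Note that the two regression lines are different: the line used to estimate demand from supply is not simply the inverse of the line used to estimate supply from demand.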

Q.5. Distinguish between correlation and regression.

Q.6 Application of Regression?

i) Predictive Analysis: Predictive analysis, i.e. forecasting future opportunities and risks, is the most
prominent application of regression analysis in business. Demand analysis, for instance, predicts the
number of items which a consumer will probably purchase. For example, we can forecast the number of
shoppers who will pass in front of a particular billboard and then use that data to estimate the
maximum to bid for an advertisement.

ii) Operation Efficiency: Regression models can also be used to optimize business processes. A factory
manager, for example, can create a statistical model to understand the impact of oven temperature on
the shelf life of the cookies baked in those ovens. In a call center, we can analyze the relationship
between the wait times of callers and the number of complaints.

iii) Supporting Decisions: Businesses today are overloaded with data on finances, operations and customer
purchases. Increasingly, executives are leaning on data analytics to make informed business
decisions, reducing reliance on intuition and gut feel. Regression analysis can bring a scientific angle to
the management of any business.

iv) Correcting Errors: Regression is not only great for lending empirical support to management decisions
but also for identifying errors in judgment. For example, a retail store manager may believe that
extending shopping hours will greatly increase sales. Regression analysis, however, may indicate that the
increase in revenue might not be sufficient to support the rise in operating expenses due to longer
working hours (such as additional employee labour charges). Hence, regression analysis can provide
quantitative support for decisions and prevent mistakes due to a manager's intuition.

v) New Insights: Over time businesses have gathered a large volume of unorganized data that has the
potential to yield valuable insights. However, this data is useless without proper analysis. Regression
analysis techniques can find a relationship between different variables by uncovering patterns that were
previously unnoticed. For example, analysis of data from point-of-sale systems and purchase accounts
may highlight market patterns such as an increase in demand on certain days of the week or at certain times of
the year.

Q7) Factor Analysis


Introduction: Factor analysis is a method for modelling observed variables, and their covariance
structure, in terms of a smaller number of underlying unobservable (latent) "factors". The factors
are typically viewed as broad concepts or ideas that may describe an observed phenomenon. For
example, a basic desire to attain a certain social level might explain most consumption behaviour.

Factor analysis is a way to take a mass of data and shrink it to a smaller data set that is more
manageable and more understandable.

There are two types of factor analysis: exploratory and confirmatory.


i) Exploratory factor analysis is used when you do not have any idea about the structure of your data or how many
dimensions there are in the set of variables.

ii) Confirmatory factor analysis is used for verification when you have a specific idea about the
structure of your data or how many dimensions there are in the set of variables.

The key concept of factor analysis is that multiple observed variables have similar patterns of responses
because they are all associated with a latent (i.e. not directly measured) variable.

For example, people may respond similarly to questions about income, education and occupation, which
are all associated with the latent variable socio-economic status.

Q8) Basic Terms Relating To Factor Analysis


i) Factor: A factor is an underlying dimension that accounts for several observed variables.

ii) Communality (h²): Communality, symbolized as h², shows how much of each variable is accounted for
by the underlying factors taken together.

iii) Eigenvalue (or latent root): When we take the sum of the squared values of the factor loadings relating to a
factor, that sum is referred to as the eigenvalue or latent root. The eigenvalue indicates the relative
importance of each factor in accounting for the particular set of variables being analysed.

iv) Total sum of squares: When the eigenvalues of all factors are totalled, the resulting value is termed
the total sum of squares.

v) Rotation: Rotation, in the context of factor analysis, is something like staining a microscope slide. Just
as different stains reveal different structures in the tissue, different rotations give results that
appear to be entirely different, but from a statistical point of view all the results are taken as equal, none
superior or inferior to the others. However, from the standpoint of making sense of the results of factor
analysis, one must select the right rotation. If the factors are independent, an orthogonal rotation is done,
and if the factors are correlated, an oblique rotation is made.

vi) Factor loadings: Factor loadings are those values which explain how closely the variables are related to
each one of the factors discovered. They are also known as factor-variable correlations.
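
The terms above can be tied together with a small numerical sketch. The loading values below are invented for illustration and are not the output of any real factor analysis. The sketch assumes each communality is the sum of a variable's squared loadings across the factors, which matches the definition when the factors are uncorrelated.

```python
# Hypothetical factor loadings: one row per variable, one column per factor.
loadings = {
    "income":     [0.80, 0.10],
    "education":  [0.70, 0.30],
    "occupation": [0.60, 0.20],
    "age":        [0.10, 0.90],
}

def communalities(table):
    """Communality of each variable: sum of its squared loadings over all factors."""
    return {var: sum(l ** 2 for l in row) for var, row in table.items()}

def eigenvalues(table):
    """Eigenvalue of each factor: sum of squared loadings down that factor's column."""
    n_factors = len(next(iter(table.values())))
    return [sum(row[j] ** 2 for row in table.values()) for j in range(n_factors)]

h2 = communalities(loadings)
eig = eigenvalues(loadings)
total_sum_of_squares = sum(eig)

print(h2)
print(eig)
print(total_sum_of_squares)
```

In this sketch the total sum of squares equals the sum of the communalities, since both add up the same squared loadings, once by column and once by row.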

Q.9) Application of factor analysis


i) Interdependency and pattern delineation: If a scientist has a table of data (say, UN votes, personality
characteristics, or answers to a questionnaire) and suspects that these data are interrelated in a
complex fashion, then factor analysis may be used to untangle the linear relationships into their separate
patterns.

ii) Parsimony or data reduction: Factor analysis can be useful for reducing a mass of information to an
economical description. For example, data on fifty characteristics for 300 nations are unwieldy to handle,
descriptively or analytically. Nations can be more easily discussed and compared on economic
development, size, and politics dimensions, for example, than on the hundreds of characteristics each
dimension involves.

iii) Structure: Factor analysis may be employed to discover the basic structure of a domain. As a case in
point, a scientist may want to uncover the primary independent lines or dimensions of variation (such as size,
leadership, and age) in group characteristics and behaviour.

iv) Classification or description: Factor analysis is a tool for developing an empirical typology. It can be
used to group interdependent variables into descriptive categories, such as ideology, revolution, liberal
voting and authoritarianism. It can be used to classify nation profiles into types with similar
characteristics or behaviour.

v) Scaling: A scientist often wishes to develop a scale on which individuals, groups, or nations can be
rated and compared. The scale may refer to such phenomena as political participation, voting
behaviour, or conflict. A problem in developing a scale is how to weight the characteristics being combined.
Factor analysis offers a solution by dividing the characteristics into independent sources of variation
(factors). Each factor then represents a scale based on the empirical relationships among the
characteristics.

vi) Hypothesis testing: Hypotheses abound regarding dimensions of attitude, personality, group, social
behaviour, voting, and conflict. Since the meaning usually associated with "dimension" is that of a cluster
or group of highly intercorrelated characteristics or behaviours, factor analysis may be used to test for
their empirical existence.

vii) Data transformation: Factor analysis can be used to transform data to meet the assumptions of
other techniques. A large number of dependent variables can also be reduced through factor analysis.

viii) Exploration: In a new domain of scientific interest like peace research, the complex interrelations of
phenomena have undergone little systematic investigation. The unknown domain may be explored
through factor analysis. It can reduce complex interrelationships to a relatively simple linear expression.

ix) Mapping: Besides facilitating exploration, factor analysis also enables a scientist to map the social
terrain. The concepts so identified may then be used to describe a domain or to serve as inputs to further research.
Some social domains, such as international relations, family life, and public administration, have yet to be
charted. In some other areas, however, such as personality, abilities, attitudes, and cognitive meaning,
considerable mapping has been done.

Q10) Cluster Analysis


Introduction:

Cluster analysis is a data exploration (mining) tool for dividing a multivariate dataset into "natural"
clusters (groups). We use these methods to explore whether previously undefined clusters (groups) exist in
the dataset. For instance, a marketing department may wish to use survey results to sort its customers
into categories.

Cluster analysis is a multivariate method which aims to classify a sample of subjects, on the basis of a set of
measured variables, into a number of different groups such that similar subjects are placed in the same
group. An example where this might be used is in the field of psychiatry, where the characterisation of
patients on the basis of clusters of symptoms can be useful in the identification of an appropriate form
of therapy. In marketing, it may be useful to identify distinct groups of potential customers so that, for
example, advertising can be appropriately targeted.

Cluster analysis is an exploratory analysis that tries to identify structure within the data. Cluster
analysis is also called segmentation analysis or taxonomy analysis.
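
To make the idea of grouping similar subjects concrete, here is a minimal sketch using k-means, one common clustering algorithm chosen only for illustration (the text does not prescribe a particular algorithm). The customer data, the function name `k_means`, and the choice of the first k points as starting centres are all assumptions of this sketch.

```python
import math

def k_means(points, k, iterations=20):
    """Very small k-means: returns (labels, centres)."""
    # Assumption: the first k points serve as initial centres (simple, deterministic).
    centres = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centre.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres

# Hypothetical survey data: (store visits per month, average spend score).
customers = [(1, 1), (8, 9), (1.5, 2), (9, 8), (1, 0.5), (8.5, 9.5)]

labels, centres = k_means(customers, k=2)
print(labels)   # which cluster each customer falls into
print(centres)  # the centre (average profile) of each cluster
```

Here the occasional low spenders and the frequent high spenders end up in separate groups, even though no group labels were supplied in advance.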

Q11) APPLICATION OF CLUSTER ANALYSIS


On PET scans, cluster analysis can be used to differentiate between different types of tissue in a three-dimensional
image for many different purposes.

1. Market research:

Cluster analysis is widely used in market research when working with multivariate data from surveys and
test panels. Market researchers use cluster analysis to partition the general population of consumers into
market segments and to better understand the relationships between different groups of
consumers and potential customers.

2. Social network analysis:

In the study of social networks, clustering may be used to recognize communities within large groups of
people.

3. Search result grouping:

In the intelligent grouping of files and websites, clustering may be used to create a more
relevant set of search results compared to normal search engines like Google.

4. Software evolution:

Clustering is a form of restructuring and hence a way of direct preventative maintenance.

5. Recommender systems:

Recommender systems are designed to recommend new items based on a user's tastes. They sometimes
use clustering algorithms to predict a user's preferences based on the preferences of other users in the
user's cluster.

6. Crime analysis :

Cluster analysis can be used to identify distinct areas, or "hot spots", where there are greater incidences of
particular types of crime or where a similar crime has happened over a
period of time.

7. Educational data mining:

Cluster analysis is used, for example, to identify groups of schools or students with similar properties.

8. Climatology:

Cluster analysis is used to find weather regimes or preferred sea level pressure atmospheric patterns.

9. Petroleum geology :

Cluster analysis is used to reconstruct missing bottom hole core data.


10. Physical geography:

The clustering of chemical properties in different sample locations.

Q12) DISCRIMINANT ANALYSIS


Introduction

Discriminant analysis is a regression-based statistical technique used to determine which particular
classification or group (such as "ill" or "healthy") an item of data or an object (such as a patient) belongs to
on the basis of its characteristics or essential features. It differs from group-building techniques such as
cluster analysis in that the classifications or groups to choose from must be known in advance.

Discriminant analysis is a form of multivariate analysis in which the objective is to establish a
discriminant function. The function (typically a mathematical formula) discriminates between individuals
in the population and allocates each of them to a group within the population. The function is
established on the basis of a series of measurements or observations made on the individuals.
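
One way to picture a discriminant function is the sketch below, which builds Fisher's linear discriminant for two known groups measured on two variables. The patient measurements, the function names, and the choice of the midpoint between the projected group means as the cut-off are all assumptions made for illustration.

```python
def mean_vector(rows):
    """Mean of each of the two measurements."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter_matrix(rows, mean):
    """2x2 within-group scatter: sum of outer products of deviations."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def fit_discriminant(group_a, group_b):
    """Return weights w and a cut-off; scores above the cut-off point to group_b."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter_matrix(group_a, ma), scatter_matrix(group_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    # Fisher's direction: inverse pooled scatter times difference of group means.
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    # Assumed cut-off: midpoint between the two projected group means.
    cutoff = (dot(w, ma) + dot(w, mb)) / 2
    return w, cutoff

def classify(x, w, cutoff):
    return "ill" if dot(w, x) > cutoff else "healthy"

# Hypothetical measurements: (body temperature in Celsius, pulse rate).
healthy = [(36.6, 70), (36.8, 72), (36.5, 68), (37.0, 75)]
ill = [(38.5, 90), (39.0, 95), (38.2, 88), (38.8, 92)]

w, cutoff = fit_discriminant(healthy, ill)
print(classify((38.9, 93), w, cutoff))
print(classify((36.7, 69), w, cutoff))
```

The groups "healthy" and "ill" are supplied in advance, which is exactly what distinguishes this from cluster analysis.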

Two objectives

Discriminant analysis may be used for two objectives:

(1) When we want to assess the adequacy of a classification, given the group memberships of the objects under
study.

(2) When we wish to assign objects to one of a number of (known) groups of objects.

Q13.CHARACTERISTICS OF DISCRIMINANT ANALYSIS


1) Essentially a single technique consisting of a couple of closely related procedures.

2) Operates on data sets for which pre-specified, well-defined groups already exist.

3) Assesses the dependent relationship between one set of discriminating variables and a single grouping
variable; an attempt is made to define the relationship between independent and dependent variables.

4) Extracts dominant, underlying gradients of variation (canonical functions) among groups of sample
entities (e.g. species, sites, observations, etc.) from a set of multivariate observations, such that variation
among groups is maximized and variation within groups is minimized along the gradient.

5) Reduces the dimensionality of a multivariate data set by condensing a large number of original
variables into a smaller set of new composite dimensions (canonical functions) with a minimum loss of
information.

6) Summarizes data redundancy by placing similar entities in proximity in canonical space and producing
a parsimonious understanding of the data in terms of a few dominant gradients of variation.

7) Describes maximum differences among pre-specified groups of sampling entities based on a suite of
discriminating characteristics (i.e. canonical analysis of discrimination).

8) Predicts the group membership of future samples, or samples from unknown groups, based on a suite
of classification characteristics (i.e. classification).

9) Extension of multiple regression analysis if the research situation defines the group categories as
dependent upon the discriminating variables, and a single random sample (N) is drawn in which group
membership is "unknown" prior to sampling.

10) Extension of multivariate analysis of variance if values on the discriminating variables are defined as
dependent upon the groups, and separate independent random samples (N1, N2, ...) of two or more
distinct populations (i.e. groups) are drawn in which group membership is "known" prior to sampling.

Q14) APPLICATIONS OF DISCRIMINANT ANALYSIS


Applications of discriminant analysis are as follows:

1) Bankruptcy prediction: In bankruptcy prediction based on accounting ratios and other financial
variables, linear discriminant analysis was the first statistical method applied to systematically explain
which firms entered bankruptcy and which survived.

2) Face recognition: In computerized face recognition, each face is represented by a large number of
pixel values. Linear discriminant analysis is primarily used here to reduce the number of features to a
more manageable number before classification. Each of the new dimensions is a linear combination of
pixel values, which forms a template. The linear combinations obtained using Fisher's linear discriminant
are called Fisherfaces, while those obtained using the related principal component analysis are called
eigenfaces.

3) Marketing: In marketing, discriminant analysis was once often used to determine the factors which
distinguish different types of customers and/or products on the basis of surveys or other forms of
collected data.

4) Biomedical studies: The main application of discriminant analysis in medicine is the assessment of
the severity of a patient's condition and the prognosis of disease outcome. For example, during retrospective analysis,
patients are divided into groups according to severity of disease: mild, moderate and severe.

Q15) MULTIDIMENSIONAL SCALING


Multidimensional scaling is a visual representation of distances or dissimilarities between sets of
objects. "Objects" can be colors, faces, map coordinates, political persuasion, or any kind of real or
conceptual stimuli. Objects that are more similar (or have shorter distance) are closer together on the
graph than objects that are less similar (or have longer distances).

The term scaling comes from psychometrics, where abstract concepts ("objects") are assigned numbers
according to a rule. For example, you may want to quantify a person's attitude to global warming. You
could assign a "1" to "doesn't believe in global warming", a "10" to "firmly believes in global warming" and
a scale of 2 to 9 for attitudes in between.

Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a
dataset.

STEPS IN CONDUCTING MDS: Following are the steps in conducting MDS research:

1) Formulating the problem: What variables do you want to compare? How many variables do you want
to compare? What purpose is the study to be used for?

2) Obtaining input data: Respondents are asked a series of questions. For each product pair they are
asked to rate similarity (usually on a 7-point Likert scale from very similar to very dissimilar). The first
question could be for Coke/Pepsi, for example, the next for Coke/Hires root beer, the next for Pepsi/Dr
Pepper, the next for Dr Pepper/Hires root beer, etc.

3) Running the MDS statistical program: Software for running the procedure is available in many
statistical software packages.

4) Decide number of dimensions: The researcher must decide on the number of dimensions they want
the computer to create. The more dimensions, the better the statistical fit, but the more difficult it is to
interpret the results.

5) Mapping the results and defining the dimensions: The statistical program (or a related module) will
map the results. The map will plot each product (usually in two-dimensional space). The proximity of
products to each other indicates either how similar they are or how preferred they are, depending on
which approach was used.

6) Test the results for reliability and validity: Compute R-squared to determine what proportion of the
variance of the scaled data can be accounted for by the MDS procedure.
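
The final step can be sketched numerically. In the example below, both the similarity ratings and the two-dimensional co-ordinates are invented. The co-ordinates merely stand in for the output of an MDS program, and R-squared is taken, as one simple reading of the step, to be the squared correlation between the input dissimilarities and the distances on the map.

```python
import math

# Hypothetical dissimilarity ratings (1 = very similar, 7 = very dissimilar).
dissimilarity = {
    ("Coke", "Pepsi"): 1,
    ("Coke", "Dr Pepper"): 5,
    ("Coke", "Hires Root Beer"): 6,
    ("Pepsi", "Dr Pepper"): 5,
    ("Pepsi", "Hires Root Beer"): 6,
    ("Dr Pepper", "Hires Root Beer"): 3,
}

# Hypothetical 2-D map positions, standing in for the output of an MDS program.
coords = {
    "Coke": (0.0, 0.0),
    "Pepsi": (1.0, 0.0),
    "Dr Pepper": (3.0, 4.0),
    "Hires Root Beer": (6.0, 3.0),
}

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

pairs = list(dissimilarity)
ratings = [dissimilarity[p] for p in pairs]
# Straight-line distance between each pair of products on the map.
map_distances = [math.dist(coords[a], coords[b]) for a, b in pairs]

fit = r_squared(ratings, map_distances)
print(round(fit, 3))  # nearer to 1 means the map reproduces the ratings well
```

Products rated as similar (Coke and Pepsi) sit close together on the map, while products rated as dissimilar sit far apart, which is why the fit is high for this made-up configuration.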

Q16) APPLICATION OF MULTIDIMENSIONAL SCALING


1) Scale construction: MDS gives a composite picture of how the respondent views the object,
brand, city, etc. when compared to others in the category. This can be done using similarity or preference
data. The researcher then tries to name the dimensions that could have been the basis of the comparison.
For example, in an illustration about cities, the researcher may feel that the two dimensions used by
the respondent were (i) city culture and (ii) job opportunities.

2) Brand Image Analysis: Many marketers use the technique to measure the possible gaps between the
company's or brand's positioning and the consumer's perception of the brand image.

3) Development of New Product: MDS is one of the most powerful tools to be used at the idea
generation or concept testing stage. It helps us to identify quadrants that are less crowded and where a
clear product launch opportunity exists. If the product team has come up with more than one probable
concept, the preferences of the consumers regarding these can be placed on a
spatial map to see which concept finds higher acceptability on multiple dimensions.

4) Pricing studies: The marketer can use subjective maps to assess whether price is making a difference
to the preference for, or demand of, the brand by constructing a spatial map of the competing brands with and
without the criterion of price, to assess whether the positioning of the brand is affected by price or not.

5) Assessing Communication Effectiveness: The brand manager could design a before-and-after study to
assess the placement of the brand before and after a specific repositioning or a new advertising campaign,
to see its impact on brand perception.
