
Submitted by: Rabia

Roll No: 29.

Assumption of correlation
The assumption of correlation refers to the idea that two or more variables are related or

connected in some way. In statistical analysis and research, this assumption is crucial because

many statistical techniques, such as correlation and regression analysis, rely on the presence of a

correlation between variables.

There are different types of correlations, including:

1. Positive correlation: As one variable increases, the other variable also tends to increase.

2. Negative correlation: As one variable increases, the other variable tends to decrease.

3. No correlation: There is no systematic relationship between the variables.

Correlation can be measured using statistical coefficients, such as Pearson's r or Spearman's

rho, which range from -1 (perfect negative correlation) to 1 (perfect positive correlation).
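As a brief illustration (a minimal sketch; the synthetic data and the use of NumPy/SciPy are assumptions, not part of the original notes), the snippet below shows what positive, negative, and near-zero correlations look like numerically:

# A minimal sketch (assumes NumPy and SciPy are installed) illustrating
# positive, negative, and near-zero correlations on synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=200)

y_pos = 2 * x + rng.normal(scale=0.5, size=200)   # tends to increase with x
y_neg = -2 * x + rng.normal(scale=0.5, size=200)  # tends to decrease with x
y_none = rng.normal(size=200)                     # unrelated to x

for label, y in [("positive", y_pos), ("negative", y_neg), ("none", y_none)]:
    r, p = stats.pearsonr(x, y)
    print(f"{label:>8}: r = {r:+.2f}, p = {p:.3g}")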

Correlation does not imply causation, meaning that just because two variables are related, it

doesn't mean that one causes the other. Additional analysis and experimentation are often

necessary to establish causality.

Assuming correlation when none exists (or ignoring the possibility of no correlation) can lead to

incorrect conclusions and flawed decision-making. Therefore, it's essential to carefully evaluate

the relationship between variables and test for correlation using appropriate statistical methods.

Pearson's correlation:
Pearson correlation, also known as Pearson's r, is a statistical measure that assesses the strength

and direction of the linear relationship between two continuous variables. It is defined as the

covariance of the two variables divided by the product of their standard deviations.

The Pearson correlation coefficient (r) ranges from -1 to 1, where:

- r = 1 indicates a perfect positive linear relationship

- r = -1 indicates a perfect negative linear relationship

- r = 0 indicates no linear relationship
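
As an illustrative aside (a minimal sketch; the data values and the use of NumPy/SciPy are assumptions, not taken from the original notes), Pearson's r can be computed directly from its definition and checked against a library routine:

# A minimal sketch (hypothetical data) computing Pearson's r as the covariance
# of the two variables divided by the product of their standard deviations.
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.5, 3.8, 6.1, 7.9, 10.2])

r_manual = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
r_scipy, p_value = stats.pearsonr(x, y)

print(r_manual, r_scipy)  # the two values of r should agree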

Pearson's correlation coefficient (r) assumes:

1. Linearity: The relationship between the two variables should be linear.

2. Independence: Each data point should be independent of the others.

3. Homoscedasticity: The variance of the residuals should be constant across all levels of the

independent variable.

4. Normality: The data should follow a normal distribution or the sample size should be

sufficiently large (>30) to assume normality.

5. No or little multicollinearity: The two variables should not be highly correlated with each

other.

6. Continuous data: The data should be continuous or ordinal.

7. No outliers: There should be no extreme outliers in the data.

Violating these assumptions can lead to:


- Inaccurate correlation coefficients

- Inflated or deflated correlation coefficients

- Increased Type I error rate (false positives)

- Decreased power (ability to detect real relationships)

- Unreliable conclusions

Spearman's rank correlation:

Spearman's rank correlation coefficient (ρ) is a statistical measure that assesses the strength and

direction of the monotonic relationship between two variables. It is defined as the correlation

between the ranked values of the two variables.

Spearman's correlation is a non-parametric test, which means it doesn't assume normality or

equal intervals between the data points. It is used to evaluate the relationship between two

variables when the data is ordinal or continuous, but not necessarily normally distributed.

Spearman's correlation coefficient (ρ) ranges from -1 to 1, where:

- ρ = 1 indicates a perfect positive monotonic relationship

- ρ = -1 indicates a perfect negative monotonic relationship

- ρ = 0 indicates no monotonic relationship
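
As a hedged illustration (a minimal sketch with made-up data; SciPy is assumed), Spearman's ρ is simply Pearson's r applied to the ranks of the data, which makes it robust to an extreme value:

# A minimal sketch (hypothetical data) showing that Spearman's rho equals
# Pearson's r computed on the ranked values of the two variables.
import numpy as np
from scipy import stats

x = np.array([10, 20, 30, 40, 1000])   # contains one extreme value
y = np.array([1, 3, 2, 5, 4])

rho, p = stats.spearmanr(x, y)
r_on_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))

print(rho, r_on_ranks)  # identical: the rank-based measure ignores the outlier's magnitude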

Spearman's rank correlation coefficient (ρ) assumes:

1. Ordinal data: The data should be ordinal or continuous.

2. Independence: Each data point should be independent of the others.


3. Monotonic relationship: The relationship between the two variables should be monotonic

(either increasing or decreasing).

4. No ties or few ties: There should be no or few tied values in the data.

5. No outliers: There should be no extreme outliers in the data.

Note that Spearman's correlation does not assume:

- Normality

- Equal intervals between the data points

- Linearity

Spearman's correlation is a non-parametric test, which means it is less sensitive to outliers and

non-normality than Pearson's correlation.

Violating these assumptions can lead to:

- Inaccurate correlation coefficients

- Inflated or deflated correlation coefficients

- Increased Type I error rate (false positives)

- Decreased power (ability to detect real relationships)

- Unreliable conclusions

t-test for correlation

A t-test for correlation, also known as a t-test for Pearson's r, is a statistical test used to determine

whether the correlation between two continuous variables is significantly different from zero. It
examines whether the observed correlation coefficient (r) is statistically significant, indicating a

real relationship between the variables.

Assumptions of t-test for correlation:

1. Normality: The data should follow a bivariate normal distribution.

2. Independence: Each observation should be independent of the others.

3. Linearity: The relationship between the two variables should be linear.

4. Homoscedasticity: The variance of the residuals should be constant across all levels of the

independent variable.

5. No or little multicollinearity: The two variables should not be highly correlated with each

other.

6. Continuous data: The data should be continuous or ordinal.

7. No outliers: There should be no extreme outliers in the data.

- The t-test for correlation is sensitive to outliers and non-normality.

- The test assumes that the data is randomly sampled from the population.

- The test is used to examine the correlation between two variables, not the difference between

means.

Formula:

t = r√(n - 2) / √(1 - r²)

where:

t = t-statistic
r = correlation coefficient (Pearson's r)
n = sample size

- The t-test for correlation is used to test the significance of the correlation coefficient (r).

- The test produces a t-statistic and a p-value, which indicate the probability of observing the

correlation by chance.

- If the p-value is below a certain significance level (e.g., 0.05), the correlation is considered

statistically significant.
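
As an illustrative check (a minimal sketch with made-up data; NumPy/SciPy are assumed), the t-statistic from the formula above reproduces the p-value reported by a standard library routine:

# A minimal sketch applying t = r * sqrt(n - 2) / sqrt(1 - r**2) and comparing
# the resulting two-sided p-value with the one from scipy.stats.pearsonr.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.3, 8.8])

r, p_scipy = stats.pearsonr(x, y)
n = len(x)

t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)   # two-sided p-value with n - 2 degrees of freedom

print(t, p_manual, p_scipy)  # p_manual and p_scipy should agree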

ANOVA and correlation

ANOVA is used to compare the means of two or more groups to determine if there is a

significant difference between them. It analyzes the variance in a continuous outcome variable to

determine if the differences between groups are statistically significant.

Correlation, on the other hand, measures the strength and direction of the linear relationship

between two continuous variables. It analyzes the covariance between the variables to determine

if they tend to change together.
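
As a brief, hedged example (a minimal sketch with hypothetical group data; SciPy is assumed), a one-way ANOVA tests whether the group means differ:

# A minimal sketch running a one-way ANOVA on three hypothetical groups.
from scipy import stats

group_a = [23, 25, 27, 22, 26]
group_b = [30, 31, 29, 33, 32]
group_c = [24, 26, 25, 27, 23]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)  # p < 0.05 would suggest at least one group mean differs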

Types of ANOVA:

There are several types of ANOVA (Analysis of Variance), including:

1. One-way ANOVA: Used to compare means across three or more groups.

2. Two-way ANOVA: Used to examine the effects of two independent variables on a continuous

outcome variable.
3. Repeated measures ANOVA: Used to compare means across three or more time points or

conditions.

4. Mixed ANOVA: Combines elements of one-way and two-way ANOVA to examine the

effects of both between-subjects and within-subjects factors.

5. Factorial ANOVA: Used to examine the effects of multiple independent variables on a

continuous outcome variable.

6. Randomized block ANOVA: Used to compare means across three or more groups while

controlling for the effects of a blocking variable.

7. Split-plot ANOVA: Used to examine the effects of a whole-plot factor and a subplot factor on

a continuous outcome variable.

8. Latin square ANOVA: Used to examine the effects of multiple independent variables on a

continuous outcome variable in a balanced and efficient design.

9. Analysis of covariance (ANCOVA): Used to examine the effects of an independent variable

on a continuous outcome variable while controlling for the effects of a covariate.

ANOVA (Analysis of Variance) and correlation have some common assumptions, as well as

some unique assumptions. Here are the key assumptions for ANOVA and correlation:

Common Assumptions:

1. Equal variances: The variance of the data should be equal across all groups or levels of the

independent variable.

2. Random sampling: The samples should be randomly selected from the population.
3. No significant outliers: There should be no significant outliers in the data.

Assumptions specific to Correlation:

1. Linearity: The relationship between the variables should be linear.

2. Homoscedasticity: The variance of the data should be constant across all levels of the

independent variable.

3. No multicollinearity: The independent variables should not be highly correlated with each other.

Violating these assumptions can lead to inaccurate or misleading results. It's essential to check

these assumptions before conducting ANOVA or correlation analysis, and to take appropriate

measures to address any violations.

Some common ways to address assumption violations include:

- Transforming the data (e.g., logarithmic or square root transformation) to meet the normality

assumption

- Using non-parametric tests or robust statistical methods to accommodate non-normal data

- Using variance-stabilizing transformations to address unequal variances

- Removing or winsorizing outliers to meet the assumption of no significant outliers

- Using regularization techniques (e.g., ridge regression) to address multicollinearity.
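
As a hedged illustration of two of the remedies above (a minimal sketch; the data and cut-off choices are assumptions), a log transformation compresses a long right tail, and a simple manual winsorization caps extreme values at chosen percentiles:

# A minimal sketch (hypothetical skewed data) showing a log transformation and
# a simple percentile-based winsorization of an extreme value.
import numpy as np

data = np.array([1.2, 1.5, 1.9, 2.1, 2.4, 2.8, 3.1, 3.4, 3.7, 45.0])  # 45.0 is an outlier

log_data = np.log(data)  # compresses the long right tail toward normality

lower, upper = np.percentile(data, [5, 95])
winsorized = np.clip(data, lower, upper)  # caps values outside the 5th-95th percentile range

print(log_data.round(2))
print(winsorized)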


Regression

Regression is a statistical method used to establish a relationship between two or more variables,

where the goal is to predict the value of one variable (dependent variable) based on the values of

others (independent variables). It helps to identify the strength and direction of the relationship

between variables.
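
As an illustrative sketch (hypothetical data; SciPy is assumed, and the variable names are made up for the example), a simple linear regression predicts a dependent variable from one independent variable:

# A minimal sketch fitting a simple linear regression and using it for prediction.
import numpy as np
from scipy import stats

hours_studied = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)        # independent variable
exam_score = np.array([52, 55, 61, 64, 70, 73, 78, 84], dtype=float)   # dependent variable

fit = stats.linregress(hours_studied, exam_score)
print(fit.slope, fit.intercept, fit.rvalue**2)   # slope, intercept, R-squared

predicted = fit.intercept + fit.slope * 9        # predicted score for 9 hours of study
print(predicted)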

Types of Regression:

1. Simple Linear Regression: One independent variable is used to predict the dependent

variable.

2. Multiple Linear Regression: More than one independent variable is used to predict the

dependent variable.

3. Non-Linear Regression: The relationship between variables is modeled with a non-linear (e.g., curved) function.

4. Logistic Regression: Used for binary dependent variables (0/1, yes/no) to predict the

probability of an event occurring.

5. Polynomial Regression: The relationship between variables is modeled using a polynomial

equation.

6. Ridge Regression: A type of regularized linear regression that reduces the magnitude of

regression coefficients.
7. Lasso Regression: A type of regularized linear regression that sets some regression

coefficients to zero.

8. Elastic Net Regression: A combination of Ridge and Lasso regression.

9. Stepwise Regression: A method of selecting the most significant independent variables.

10. Hierarchical Regression: A method of analyzing the effects of independent variables in a

hierarchical order.

Here are the assumptions of regression analysis:

1. Linearity: The relationship between the independent variable(s) and the dependent variable

should be linear.

2. Independence: Each observation should be independent of the others.

3. Homoscedasticity: The variance of the residuals should be constant across all levels of the

independent variable(s).

4. Normality: The residuals should be normally distributed.

5. No or little multicollinearity: The independent variables should not be highly correlated with

each other.

6. No autocorrelation: The residuals should not be correlated with each other.

7. Constant variance: The variance of the residuals should be constant across all levels of the

independent variable(s).

8. Random sampling: The data should be randomly sampled from the population.
9. No omitted variable bias: All relevant independent variables should be included in the

model.

10. No measurement error: The independent and dependent variables should be measured

accurately.
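
As a hedged sketch of how a few of these assumptions can be checked in practice (hypothetical data; NumPy/SciPy are assumed), the residuals of a fitted model can be tested for normality and autocorrelation:

# A minimal sketch of residual checks: Shapiro-Wilk for normality of residuals
# and a Durbin-Watson statistic (computed directly) for autocorrelation.
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2.2, 4.1, 5.8, 8.3, 9.9, 12.2, 13.8, 16.1, 18.2, 19.7])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

print(stats.shapiro(residuals))   # p > 0.05: no strong evidence against normal residuals

dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(dw)                         # values near 2 suggest little autocorrelation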

