Assumption of correlation
The assumption of correlation refers to the idea that two or more variables are related or
connected in some way. In statistical analysis and research, this assumption is crucial because
many statistical techniques, such as correlation and regression analysis, rely on the presence of a
relationship between the variables. There are two main types of correlation:
1. Positive correlation: As one variable increases, the other variable also tends to increase.
2. Negative correlation: As one variable increases, the other variable tends to decrease.
The strength of a correlation is quantified with a correlation coefficient, such as Pearson's r or Spearman's
rho, which ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no correlation.
Importantly, correlation does not imply causation: just because two variables are related, it
doesn't mean that one causes the other. Additional analysis and experimentation are often
needed to establish a causal relationship.
Assuming correlation when none exists (or ignoring the possibility of no correlation) can lead to
incorrect conclusions and flawed decision-making. Therefore, it's essential to carefully evaluate
the relationship between variables and test for correlation using appropriate statistical methods.
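To make the idea concrete, here is a minimal Python sketch (using made-up synthetic data) showing one variable that tends to rise with x and one that tends to fall with it, along with the resulting correlation coefficients near +1 and -1:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y_pos = 2 * x + rng.normal(scale=0.5, size=100)   # tends to increase with x
y_neg = -2 * x + rng.normal(scale=0.5, size=100)  # tends to decrease with x

print(np.corrcoef(x, y_pos)[0, 1])  # close to +1 (positive correlation)
print(np.corrcoef(x, y_neg)[0, 1])  # close to -1 (negative correlation)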
Pearson's correlation:
Pearson correlation, also known as Pearson's r, is a statistical measure that assesses the strength
and direction of the linear relationship between two continuous variables. It is defined as the
covariance of the two variables divided by the product of their standard deviations.
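This definition can be checked numerically. The following sketch (with illustrative made-up numbers) computes Pearson's r from the covariance and the standard deviations, and compares it against NumPy's built-in calculation:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# r = cov(x, y) / (sd(x) * sd(y)); use matching ddof for the sample statistics
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(r)                        # manual calculation
print(np.corrcoef(x, y)[0, 1])  # built-in; the two values should agree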
Assumptions of Pearson's correlation:
1. Continuous variables: Both variables should be measured on an interval or ratio scale.
2. Linearity: The relationship between the two variables should be linear.
3. Homoscedasticity: The variance of the residuals should be constant across all levels of the
independent variable.
4. Normality: The data should follow a normal distribution, or the sample size should be
large enough for the central limit theorem to apply.
5. No or little multicollinearity: The two variables should not be highly correlated with each
other.
Violating these assumptions can lead to unreliable conclusions.
Spearman's correlation:
Spearman's rank correlation coefficient (ρ) is a statistical measure that assesses the strength and
direction of the monotonic relationship between two variables. It is defined as the correlation
between the ranked values of the two variables, so it does not require
equal intervals between the data points. It is used to evaluate the relationship between two
variables when the data is ordinal or continuous, but not necessarily normally distributed.
Assumptions of Spearman's correlation:
1. Ordinal or continuous data: The variables should be measured on at least an ordinal scale.
2. Monotonic relationship: The relationship between the variables should be monotonic.
3. Paired observations: Each value of one variable should be paired with a value of the other.
4. No ties or few ties: There should be no or few tied values in the data.
Unlike Pearson's correlation, Spearman's correlation does not require:
- Normality
- Linearity
Spearman's correlation is a non-parametric test, which means it is less sensitive to outliers and
does not depend on the data following a particular distribution. Even so, violating its assumptions
can lead to unreliable conclusions.
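The difference from Pearson's correlation shows up clearly with an outlier. In this sketch (synthetic data), the relationship is perfectly monotonic, so Spearman's rho is 1.0 even though the extreme value pulls Pearson's r well below 1:

import numpy as np
from scipy import stats

x = np.array([10, 20, 30, 40, 1000])  # last value is an extreme outlier
y = np.array([1, 2, 3, 4, 5])

pearson_r, _ = stats.pearsonr(x, y)
spearman_rho, _ = stats.spearmanr(x, y)

print(pearson_r)     # distorted by the outlier
print(spearman_rho)  # 1.0, since the relationship is perfectly monotonic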
t-test
A t-test for correlation, also known as a t-test for Pearson's r, is a statistical test used to determine
whether the correlation between two continuous variables is significantly different from zero. It
examines whether the observed correlation coefficient (r) is statistically significant, indicating a
genuine linear relationship in the population rather than a chance result in the sample.
Assumptions of the t-test for correlation:
1. Continuous variables: Both variables should be measured on an interval or ratio scale.
2. Linearity: The relationship between the two variables should be linear.
3. Normality: The data should be approximately normally distributed.
4. Homoscedasticity: The variance of the residuals should be constant across all levels of the
independent variable.
5. No or little multicollinearity: The two variables should not be highly correlated with each
other.
- The test assumes that the data is randomly sampled from the population.
- The test is used to examine the correlation between two variables, not the difference between
means.
Formula:
t = r√(n - 2) / √(1 - r^2)
where:
t = t-statistic
r = correlation coefficient (Pearson's r)
n = sample size
The statistic follows a t-distribution with n - 2 degrees of freedom.
- The t-test for correlation is used to test the significance of the correlation coefficient (r).
- The test produces a t-statistic and a p-value, which indicates the probability of observing a
correlation this strong by chance if the true correlation were zero.
- If the p-value is below a certain significance level (e.g., 0.05), the correlation is considered
statistically significant.
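Under these assumptions, the formula above can be applied directly. The following sketch uses a hypothetical r and n (both made up for illustration) to compute the t-statistic and its two-tailed p-value from the t-distribution with n - 2 degrees of freedom:

import numpy as np
from scipy import stats

r, n = 0.45, 30  # hypothetical correlation coefficient and sample size

t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p = 2 * stats.t.sf(abs(t), df=n - 2)  # two-tailed p-value

print(t, p)  # if p < 0.05, the correlation is considered statistically significant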
ANOVA and correlation
ANOVA is used to compare the means of two or more groups to determine if there is a
significant difference between them. It analyzes the variance in a continuous outcome variable to
determine how much of it is explained by group membership.
Correlation, on the other hand, measures the strength and direction of the linear relationship
between two continuous variables. It analyzes the covariance between the variables to determine
how closely they move together.
Types of ANOVA:
1. One-way ANOVA: Used to compare means across three or more groups based on a single
independent variable.
2. Two-way ANOVA: Used to examine the effects of two independent variables on a continuous
outcome variable.
3. Repeated measures ANOVA: Used to compare means across three or more time points or
conditions.
4. Mixed ANOVA: Combines between-subjects and within-subjects factors to examine the
effects of both in a single design.
6. Randomized block ANOVA: Used to compare means across three or more groups while
controlling for a blocking variable.
7. Split-plot ANOVA: Used to examine the effects of a whole-plot factor and a subplot factor on
a continuous outcome variable.
8. Latin square ANOVA: Used to examine the effects of multiple independent variables on a
continuous outcome variable while controlling for two blocking factors.
ANOVA (Analysis of Variance) and correlation have some common assumptions, as well as
some unique assumptions. Here are the key assumptions for ANOVA and correlation:
Common Assumptions:
1. Equal variances: The variance of the data should be equal across all groups or levels of the
independent variable.
2. Random sampling: The samples should be randomly selected from the population.
3. No significant outliers: There should be no significant outliers in the data.
Unique Assumptions:
1. Normality: The data should follow a normal distribution.
2. Homoscedasticity: The variance of the data should be constant across all levels of the
independent variable.
3. Linearity: For correlation, the relationship between the variables should be linear.
4. No multicollinearity: The independent variables should not be highly correlated with each
other.
Violating these assumptions can lead to inaccurate or misleading results. It's essential to check
these assumptions before conducting ANOVA or correlation analysis, and to take appropriate
remedial steps, such as:
- Transforming the data (e.g., logarithmic or square root transformation) to meet the normality
assumption
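Such checks are straightforward to run in practice. This sketch (synthetic, right-skewed data) tests equal variances with Levene's test and normality with the Shapiro-Wilk test, then shows how a log transformation can bring skewed data closer to normality:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.lognormal(mean=0.0, sigma=0.5, size=40)  # right-skewed data
group_b = rng.lognormal(mean=0.2, sigma=0.5, size=40)

print(stats.levene(group_a, group_b))  # equal-variances check
print(stats.shapiro(group_a))          # normality check on the raw data
print(stats.shapiro(np.log(group_a)))  # after a log transformation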
Regression:
Regression is a statistical method used to establish a relationship between two or more variables,
where the goal is to predict the value of one variable (dependent variable) based on the values of
others (independent variables). It helps to identify the strength and direction of the relationship
between variables.
Types of Regression:
1. Simple Linear Regression: One independent variable is used to predict the dependent
variable.
2. Multiple Linear Regression: More than one independent variable is used to predict the
dependent variable.
3. Non-Linear Regression: The relationship between variables is not linear, but rather curved or
non-linear.
4. Logistic Regression: Used for binary dependent variables (0/1, yes/no) to predict the
probability of an outcome.
5. Polynomial Regression: The relationship between the variables is modeled as an nth-degree
polynomial equation.
6. Ridge Regression: A type of regularized linear regression that reduces the magnitude of
regression coefficients.
7. Lasso Regression: A type of regularized linear regression that sets some regression
coefficients to zero.
8. Hierarchical Regression: Independent variables are entered into the model in a predetermined
hierarchical order.
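As a small worked example of simple linear regression, the following sketch (hypothetical hours-studied vs. exam-score data) fits a line with scipy.stats.linregress and uses it for prediction:

from scipy import stats

hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 64, 70, 72, 79, 83]

result = stats.linregress(hours, score)
print(result.slope, result.intercept)  # fitted line: score = slope * hours + intercept
print(result.rvalue ** 2)              # R^2: proportion of variance explained

# Predict the dependent variable for a new value of the independent variable
print(result.intercept + result.slope * 9)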
Assumptions of Regression:
1. Linearity: The relationship between the independent variable(s) and the dependent variable
should be linear.
2. Independence: The observations, and hence the residuals, should be independent of each
other.
3. Homoscedasticity: The variance of the residuals should be constant across all levels of the
independent variable(s).
4. Normality: The residuals should be approximately normally distributed.
5. No or little multicollinearity: The independent variables should not be highly correlated with
each other.
6. No autocorrelation: The residuals should not be correlated with one another across
observations.
7. Constant variance: The variance of the residuals should be constant across all levels of the
independent variable(s).
8. Random sampling: The data should be randomly sampled from the population.
9. No omitted variable bias: All relevant independent variables should be included in the
model.
10. No measurement error: The independent and dependent variables should be measured
accurately.
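Several of these assumptions can be checked from the residuals of a fitted model. This sketch (made-up data) fits a simple regression, then checks the residuals for normality (assumption 4) and compares their spread across the range of x as a rough look at constant variance (assumptions 3 and 7):

import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.3, 4.1, 6.2, 7.9, 10.1, 12.2, 13.8, 16.1])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

print(stats.shapiro(residuals))  # normality of residuals
# Compare residual spread in the lower vs. upper half of x
print(residuals[:4].std(ddof=1), residuals[4:].std(ddof=1))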