Data Analysis

DATA ANALYSIS
 STATISTICAL ANALYSIS
- Statistical analysis is the process of collecting large volumes of data and then using statistics and other data
analysis techniques to identify trends, patterns, and insights.
- Types of statistical analysis
o There are two main types of statistical analysis: descriptive and inferential.
 Descriptive statistics summarizes the information within a data set without drawing conclusions about
its contents.
 For example, if a business gave you a book of its expenses and you summarized the
percentage of money it spent on different categories of items, then you would be performing a
form of descriptive statistics.
When performing descriptive statistics, you will often use data visualization to present information in
the form of graphs, tables, and charts to clearly convey it to others in an understandable format.
Typically, leaders in a company or organization will then use this data to guide their decision making
going forward.
 Inferential statistics takes the results of descriptive statistics one step further by drawing conclusions
from the data and then making recommendations.
 For example, instead of only summarizing the business's expenses, you might go on to
recommend in which areas to reduce spending and suggest an alternative budget.
Inferential statistical analysis is often used by businesses to inform company decisions and in scientific
research to find new relationships between variables.
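The expense example above can be sketched in a few lines of Python. This is only an illustration of descriptive statistics: the ledger values and the function name `spending_shares` are made up, not taken from any real business.

```python
from collections import defaultdict


def spending_shares(expenses):
    """Return each category's percentage share of total spending."""
    totals = defaultdict(float)
    for category, amount in expenses:
        totals[category] += amount
    grand_total = sum(totals.values())
    return {cat: 100.0 * amt / grand_total for cat, amt in totals.items()}


# Hypothetical expense ledger: (category, amount) pairs, invented for illustration.
ledger = [("rent", 500.0), ("supplies", 150.0), ("rent", 500.0),
          ("travel", 250.0), ("supplies", 100.0)]

shares = spending_shares(ledger)
for category, pct in sorted(shares.items()):
    print(f"{category}: {pct:.1f}%")
```

The output only summarizes the data. Deciding where to cut spending would be the inferential step.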
 DESCRIPTIVE ANALYSIS
- Descriptive analysis is a type of data analysis that describes, shows, or summarizes data points in a useful way
so that patterns may emerge that satisfy all of the conditions of the data.
It is the technique of identifying patterns and links by using recent and historical data. Because it identifies patterns
and associations without going any further, it is frequently referred to as the most basic form of data analysis.
 CORRELATIONAL RESEARCH
- Spearman rank
o Spearman's correlation measures the strength and direction of monotonic association between two variables.
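A monotonic relationship is less restrictive than a linear one: the variables only need to move consistently in
the same (or opposite) direction, not at a constant rate.
 For example, a curve that always rises but flattens out as it goes is monotonic, but not linear.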
o What is the definition of Spearman's rank-order correlation?
 There are two methods to calculate Spearman's correlation depending on whether: (1) your data does
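not have tied ranks or (2) your data has tied ranks. The formula for when there are no tied ranks is:
 $\rho = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)}$
 where $d_i$ = difference in paired ranks and $n$ = number of cases.
 The formula to use when there are tied ranks is:
 $\rho = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$
 where $i$ = paired score, and $x_i$ and $y_i$ are the ranks of the two variables.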
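The two ways of calculating Spearman's correlation described above can be sketched as follows. This is a minimal illustration using only the standard library. The helper names (`average_ranks`, `spearman_no_ties`, `spearman_with_ties`) and the sample data are assumptions made for this example.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_no_ties(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); valid only without tied ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def spearman_with_ties(x, y):
    """Pearson correlation computed on the ranks; usable with or without ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_no_ties(x, y), spearman_with_ties(x, y))
```

When there are no ties, both functions should agree, so the second form is the safer default.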
- Chi square
o The Chi-Square test is a statistical procedure for determining the difference between observed and expected
data.
 This test can also be used to determine whether the categorical variables in our data are associated. It
helps to find out whether a difference between two categorical variables is due to chance or to a
relationship between them.
 Formula for the Chi-Square test:
$\chi^2_c = \sum \dfrac{(O_i - E_i)^2}{E_i}$
Where:
$c$ = Degrees of freedom
$O$ = Observed Value
$E$ = Expected Value
 The degrees of freedom in a statistical calculation represent the number of values that are free to vary in the
calculation. The degrees of freedom can be calculated to ensure that chi-square tests are statistically
valid. These tests are frequently used to compare observed data with data that would be expected to be
obtained if a particular hypothesis were true.
 The observed values are those you gather yourself.
 The expected values are the frequencies expected, based on the null hypothesis.
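A minimal sketch of the chi-square statistic, summing (O - E)^2 / E over all categories. The die-roll counts are invented for illustration, and the function name `chi_square_statistic` is an assumption of this example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical die rolled 60 times. Under the null hypothesis of a fair die,
# each face is expected 10 times.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

stat = chi_square_statistic(observed, expected)
dof = len(observed) - 1  # degrees of freedom for a goodness-of-fit test
print(stat, dof)
```

The statistic is then compared against a chi-square distribution with `dof` degrees of freedom to judge whether the difference could be due to chance.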
 EXPERIMENTAL AND CAUSAL-COMPARATIVE RESEARCH

- T – TEST
o A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing
to determine whether a process or treatment actually has an effect on the population of interest, or whether two
groups are different from one another.
o When to use a t test
 A t test can only be used when comparing the means of two groups (a.k.a. pairwise comparison). If you
want to compare more than two groups, or if you want to do multiple pairwise comparisons, use an
ANOVA test or a post-hoc test.
 The t test is a parametric test of difference, meaning that it makes the same assumptions about your
data as other parametric tests. The t test assumes your data:
 are independent
 are (approximately) normally distributed
 have a similar amount of variance within each group being compared (a.k.a. homogeneity of
variance)
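 If your data do not fit these assumptions, you can try a nonparametric alternative to the t test, such as
the Wilcoxon Signed-Rank test for paired data or the Mann-Whitney U test for independent groups. If only the
equal-variance assumption is violated, Welch's t test is a common alternative.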
o What type of t test should I use?
 When choosing a t test, you will need to consider two things: whether the groups being compared
come from a single population or two different populations, and whether you want to test the
difference in a specific direction.
One-sample, two-sample, or paired t test?
If the groups come from a single population (e.g., measuring before and after an experimental treatment),
perform a paired t test. This is a within-subjects design.
If the groups come from two different populations (e.g., two different species, or people from two separate
cities), perform a two-sample t test (a.k.a. independent t test). This is a between-subjects design.
If there is one group being compared against a standard value (e.g., comparing the acidity of a liquid to a neutral
pH of 7), perform a one-sample t test.
One-tailed or two-tailed t test?
If you only care whether the two populations are different from one another, perform a two-tailed t test.
If you want to know whether one population mean is greater than or less than the other, perform a one-tailed t
test.
o T test formula
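The formula for the two-sample t test (a.k.a. the Student’s t-test) is:
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
where $\bar{x}_1$ and $\bar{x}_2$ are the means of the two groups, $s^2$ is the pooled variance of the two groups,
and $n_1$ and $n_2$ are the number of observations in each group.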
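As a rough sketch, the two-sample (Student's) t statistic with a pooled variance can be computed as below. The function name `pooled_t_statistic` and the sample values are assumptions made for illustration, and the equal-variance assumption listed earlier is taken for granted.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic, assuming equal variances in both groups."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / standard_error
    return t, n1 + n2 - 2  # t statistic and degrees of freedom


# Made-up measurements from two independent groups.
group_a = [2, 4, 6, 8]
group_b = [1, 3, 5, 7]

t_value, dof = pooled_t_statistic(group_a, group_b)
print(t_value, dof)
```

The t value is then compared against a t distribution with `dof` degrees of freedom, using one or two tails depending on the question being asked.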
- ANOVA
o Developed by Ronald Fisher, ANOVA stands for Analysis of Variance.
 One-Way Analysis of Variance tells you if there are any statistical differences between the means of
three or more independent groups.
o You might use Analysis of Variance (ANOVA) as a marketer, when you want to test a particular hypothesis.
 You would use ANOVA to help you understand how your different groups respond, with a null
hypothesis for the test that the means of the different groups are equal. If there is a statistically
significant result, then it means that at least two of the group means are unequal (or different).
o What is the difference between one-way and two-way ANOVA tests?
 This is defined by how many independent variables are included in the ANOVA test.
 One-way means the analysis of variance has one independent variable.
 Two-way means the test has two independent variables.
o An example of this may be the independent variable being a brand of drink (one-way),
or the independent variables being the brand of drink and either how many calories it
has or whether it’s original or diet (two-way).
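A one-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes the F statistic by hand. The function name `one_way_anova_f` and the three groups of scores are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up scores for three drink brands (one independent variable, so one-way).
brands = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_value, df_b, df_w = one_way_anova_f(brands)
print(f_value, df_b, df_w)
```

A large F value suggests that at least one group mean differs. A post-hoc test is needed to find out which groups differ.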
- WILCOXON
o The Wilcoxon test, which can refer to either the rank sum test or the signed rank test version, is a
nonparametric statistical test that compares two groups. The rank sum test compares two independent groups,
while the signed rank test compares two paired groups: it calculates the difference between each set of pairs
and analyzes these differences to establish if they are statistically significantly different from one another.
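For the signed rank version, the pairwise differences are ranked by absolute size and the ranks are summed separately for positive and negative differences. The sketch below is one common way to do this (zero differences dropped, tied ranks averaged). The function name `wilcoxon_signed_rank` and the before/after scores are made up.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [b - a for a, b in zip(before, after) if b != a]
    # Rank the absolute differences, averaging the ranks of tied values.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical paired scores measured before and after a treatment.
before = [10, 12, 9, 15, 11]
after = [12, 11, 13, 15, 14]

w_plus, w_minus = wilcoxon_signed_rank(before, after)
print(w_plus, w_minus)
```

The smaller of the two sums is usually taken as the test statistic and compared against a critical value for the number of non-zero pairs.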
- U – TEST
o The Mann-Whitney U test is the non-parametric alternative to the independent-samples t-test. It is used to
compare two independent samples and to test whether they come from the same population, which is often framed as
testing whether two sample means are equal or not.
o Because the Mann-Whitney U test is non-parametric, it makes no assumptions about the distribution
of scores. It does, however, make the following assumptions:
 1. The sample drawn from the population is random.
 2. Independence within the samples and mutual independence is assumed. That means that an
observation is in one group or the other (it cannot be in both).
 3. Ordinal measurement scale is assumed.
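o Calculation of the Mann-Whitney U:
$U = N_1 N_2 + \dfrac{N_2 (N_2 + 1)}{2} - \sum R_i$
Where:
$U$ = Mann-Whitney U statistic
$N_1$ = sample size one
$N_2$ = sample size two
$R_i$ = rank of the $i$-th observation in sample two, summed over that sample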
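A minimal sketch of the U calculation, assuming the common rank-sum form U = N1*N2 + N2(N2 + 1)/2 - (sum of the ranks in sample two), with tied values given their average rank. The function name `mann_whitney_u` and the sample values are hypothetical.

```python
def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) from the rank sums of the pooled, ranked observations."""
    # Tag each value with its sample (0 or 1) and sort the pooled data.
    combined = [(v, 0) for v in sample1] + [(v, 1) for v in sample2]
    combined.sort(key=lambda pair: pair[0])
    rank_sums = [0.0, 0.0]
    start = 0
    while start < len(combined):
        end = start
        # Group tied values so they share the average rank.
        while end + 1 < len(combined) and combined[end + 1][0] == combined[start][0]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            rank_sums[combined[k][1]] += mean_rank
        start = end + 1
    n1, n2 = len(sample1), len(sample2)
    u1 = n1 * n2 + n2 * (n2 + 1) / 2 - rank_sums[1]
    u2 = n1 * n2 - u1  # the two U values always sum to n1 * n2
    return u1, u2


# Made-up samples in which every value in the first is below the second.
u1, u2 = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u1, u2)
```

The smaller of the two U values is typically compared against a table of critical values, or a normal approximation is used for larger samples.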
- Z – TEST
o A z-test is a statistical test used to determine whether two population means are different when the variances are known and
the sample size is large.
o The test statistic is assumed to have a normal distribution, and nuisance parameters such as standard deviation should be
known in order for an accurate z-test to be performed.
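A two-sample z-test can be sketched as below, assuming the population standard deviations (`sigma_a`, `sigma_b`) are known in advance, as the description requires. The function name `two_sample_z_test` and the data are invented for illustration. `NormalDist` is part of Python's standard `statistics` module.

```python
import math
from statistics import NormalDist, mean


def two_sample_z_test(a, b, sigma_a, sigma_b):
    """Two-tailed z-test for a difference in means with known population SDs."""
    standard_error = math.sqrt(sigma_a ** 2 / len(a) + sigma_b ** 2 / len(b))
    z = (mean(a) - mean(b)) / standard_error
    # Two-tailed p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical samples of 50 observations each; the known SD of 2 is assumed.
sample_a = [100 + (i % 5) for i in range(50)]
sample_b = [101 + (i % 5) for i in range(50)]

z_value, p_value = two_sample_z_test(sample_a, sample_b, sigma_a=2.0, sigma_b=2.0)
print(z_value, p_value)
```

When the population standard deviations are not known, a t test is generally the more appropriate choice.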
