
Name: Jazmine Ibarra 

Testing Data for Normality

 
Normally Distributed Data: bell-curve shape; the tails are the same length on both sides. This data is 
analyzed using parametric tests, which assume that the data follows a particular distribution (here, 
the normal distribution).  
Not Normally Distributed Data: often a skewed, asymmetric shape in which one tail is longer than the 
other. This data is analyzed using non-parametric tests, which do not assume a normal distribution.  
Skew: A positive skew value means the tail is longer on the right side, and the mean is usually higher 
than the median. A negative skew value means the tail is longer on the left side, and the mean is 
usually lower than the median. A skew of 0 means the data is symmetrical.  
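To make the sign convention concrete, here is a minimal Python sketch of sample skewness. It assumes the adjusted sample formula used by the SKEW function in Google Sheets and Excel; `sample_skew` and the two small example lists are illustrative, not part of the assignment. The right-tailed list should give a positive skew with a mean above its median, and its mirror image a negative skew of the same size.

```python
import statistics


def sample_skew(values):
    """Adjusted sample skewness (the SKEW spreadsheet formula):
    n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


right_tailed = [1, 2, 2, 3, 3, 3, 10]    # one long tail to the right
left_tailed = [-x for x in right_tailed]  # mirror image: long tail to the left

for data in (right_tailed, left_tailed):
    print(sample_skew(data), statistics.mean(data), statistics.median(data))
```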
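Kurtosis: How peaked the distribution is and how heavy its tails are compared with a normal 
distribution. A positive kurtosis value means a high, narrow peak with heavier tails. A negative 
kurtosis value means a low, flat peak and a wider shape. A kurtosis of 0 means the peak and tails 
resemble those of a normal distribution (spreadsheet functions such as KURT report this "excess" 
kurtosis, for which a normal distribution scores 0).  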
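A similar sketch for kurtosis, again assuming the adjusted sample formula behind the KURT spreadsheet function (which reports excess kurtosis, so a normal shape scores near 0). The function name and example lists are hypothetical. The evenly spread values should come out near -1.2 (flat), and the spiked list with two far-out values near 4.5 (peaked, heavy-tailed).

```python
import statistics


def sample_excess_kurtosis(values):
    """Adjusted sample excess kurtosis (the KURT spreadsheet formula);
    a normal distribution scores about 0."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    if s == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


flat = list(range(1, 11))        # evenly spread values: low, wide peak
peaked = [-10] + [0] * 8 + [10]  # tall central spike with two far-out values

print(sample_excess_kurtosis(flat))
print(sample_excess_kurtosis(peaked))
```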
P-Value: The probability, assuming the null hypothesis is true, of obtaining results at least as 
extreme as those observed. It is used in statistical analysis to decide whether to reject the null 
hypothesis, and is typically compared to a significance (alpha) level of 0.05.  
● For data normality using a Shapiro-Wilk test (whose null hypothesis is that the data is normally 
distributed), if the p-value is less than 0.05, then the data is NOT normally distributed. 
● For data normality using a Shapiro-Wilk test, if the p-value is greater than 0.05, then the test 
finds no significant departure from normality, and the data IS treated as normally distributed. 
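The decision rule in the bullets above can be written as a small helper. This is a sketch: `interpret_normality` is a hypothetical name, and the p-value is assumed to come from an external tool such as Stats Kingdom (or, in Python, `scipy.stats.shapiro`, which returns the W statistic and a p-value). The block itself uses only built-in Python and applies the rule to the two p-values reported in this worksheet.

```python
def interpret_normality(p_value, alpha=0.05):
    """Decision rule for a Shapiro-Wilk p-value (null hypothesis: the data
    come from a normal distribution)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "NOT normally distributed (normality rejected)"
    # p >= alpha: no significant departure from normality was detected.
    # An exact tie (p == alpha) is lumped in here; that edge case is a convention choice.
    return "treated as normally distributed (normality not rejected)"


# p-values reported in this worksheet (from Stats Kingdom)
for label, p in [("Watershed A & B", 0.840008), ("Pop vs. classical", 0.0000231228)]:
    print(f"{label}: p = {p} -> {interpret_normality(p)}")
```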
 
The following exercises are designed to reinforce the above concepts and to inspire you to think 
about the different ways in which data can be presented and interpreted. In a post below, report 
on the following activities/questions. Attach your spreadsheet with your work to your post AND 
include a histogram of your data within the spreadsheet.  

 
Practice Example (Google Sheets):  

Imagine you are comparing the size of hardwood trees in two different study watersheds to assess 
the successional stage of each forest. You choose to measure the DBH (diameter at breast height; 
a commonly used metric in forestry) of a sample of trees in each watershed. This results in the 
following data: 

a. DBH of trees in "Watershed A" (in cm): 84, 78, 51, 62, 55, 72, 34, 45, 89 
b. DBH of trees in "Watershed B" (in cm): 54, 118, 142, 5, 36, 115, 12, 14 
c. Enter the above data into Google Sheets and insert a Histogram chart for your 
data. Remember to combine “Watershed A & B” into one dataset: put the watershed label 
(A or B) in one column and the corresponding DBH values in a second column. Select 
Insert -> Chart and, under the chart type, scroll down to select Histogram. You can 
customize the bucket size of the histogram to get a better idea of the shape of the 
dataset. Use spreadsheet functions to calculate the mean (AVERAGE) and median (MEDIAN), 
as well as the skew (SKEW) and kurtosis (KURT) of your dataset.  
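Step c's two-column layout and bucket idea can be sketched in Python as a text histogram. The (watershed, DBH) tuples mirror the two spreadsheet columns; `bucket_counts` and the 20 cm bucket width are illustrative choices, and Google Sheets may place its bucket edges differently, so its bars need not match these counts exactly. The sketch assumes whole-number values no smaller than `start`.

```python
# DBH values (cm) from the practice example, stored as (watershed, DBH) rows
# to mirror the two spreadsheet columns described in step c.
rows = [("A", d) for d in (84, 78, 51, 62, 55, 72, 34, 45, 89)]
rows += [("B", d) for d in (54, 118, 142, 5, 36, 115, 12, 14)]

BUCKET_WIDTH = 20  # illustrative bucket size in cm


def bucket_counts(values, width, start=0):
    """Count whole-number values (all >= start) into half-open buckets
    [lo, lo + width), keeping empty buckets up to the largest value."""
    counts = {lo: 0 for lo in range(start, max(values) + 1, width)}
    for v in values:
        counts[start + (v - start) // width * width] += 1
    return counts


dbh = [d for _, d in rows]  # both watersheds pooled, as in the combined column
for lo, count in bucket_counts(dbh, BUCKET_WIDTH).items():
    print(f"{lo:>3}-{lo + BUCKET_WIDTH - 1:<3} cm | {'#' * count}")
```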

Watershed A & B Dataset (Google Sheets)  

● Mean: 62.705 
● Median: 55 
● Kurtosis: -0.332 
● Skew: 0.432 
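As a cross-check on the figures above, the same four statistics can be recomputed in Python for all 17 trees. This sketch assumes Google Sheets' SKEW and KURT use the adjusted sample formulas coded below; rounded to three decimals, the printed values should agree with the reported ones to within the last digit.

```python
import statistics

dbh = [84, 78, 51, 62, 55, 72, 34, 45, 89,  # Watershed A
       54, 118, 142, 5, 36, 115, 12, 14]    # Watershed B

n = len(dbh)
mean = statistics.mean(dbh)
median = statistics.median(dbh)
s = statistics.stdev(dbh)          # sample standard deviation (divides by n - 1)
z = [(x - mean) / s for x in dbh]  # standardized values

# Adjusted sample skewness and excess kurtosis (assumed SKEW / KURT formulas)
skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(f"Mean: {mean:.3f}  Median: {median}  Kurtosis: {kurt:.3f}  Skew: {skew:.3f}")
```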
 
1. Are the mean and median values the same or different?  
The mean and median values are different: the mean (62.705) is higher than the median (55), 
which is consistent with a positive skew. 
 
2. Are the kurtosis and skew values 0 or different?  
Both values differ from 0 (the kurtosis value is -0.332 and the skew value is 0.432).  
 
3. Based on this information, and the shape of your histogram, would you assume that your 
data is normally distributed or not normally distributed?  
Based on the above information and the shape of my histogram, I would assume that my 
data is not normally distributed, because the tail is longer on the positive (right) side. 
The histogram shows that the dataset is positively skewed. 
 
4. Copy and paste your histogram graph below.  
 
 
 
Watershed A & B Dataset (Stats Kingdom)  

Go to the Stats Kingdom link for testing normality with a Shapiro-Wilk test. Clear their 
example data and enter the combined watershed dataset. Make sure the alpha (significance) level 
is set to 0.05, and keep the outliers included. Select Calculate.  

● P-value: 0.840008 
 
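5. What is the outcome of the Shapiro-Wilk test? In other words, is the data normally or not 
normally distributed?  
 
The outcome of the Shapiro-Wilk test supports treating the data as normally distributed, 
because the p-value (0.840008) is greater than 0.05. The mild positive skew seen in the 
histogram is therefore not large enough for the test to reject normality. 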
 
Finding a Dataset Online: 

In the same spreadsheet, add a new sheet (using the tab bar at the bottom of Google Sheets) and 
report the results of a quick "Internet Investigation" study. For example, you might investigate 
which is more popular, country music or hip hop, by looking at the number of views on YouTube for 
the top ten videos for each. Be creative. Whatever you choose, compare at least two groups, include 
a brief description of what you looked at, and report the mean, median, range, and standard 
deviation for each dataset. Also include these data in the spreadsheet you attach to your post. 
Suggested data sources: 
● Google Dataset Search 
● Kaggle 
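The four summaries requested for each group can be computed with Python's `statistics` module. The song data behind the identified dataset is not listed here, so this sketch reuses Watershed A and Watershed B as stand-in groups; `describe` is a hypothetical helper, and it uses the sample standard deviation (dividing by n - 1), as the STDEV spreadsheet function does.

```python
import statistics


def describe(values):
    """Mean, median, range, and sample standard deviation of one group."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample SD (n - 1), like STDEV
    }


# Stand-in groups: the practice-example watersheds.
# Swap in the two groups from your own internet investigation.
groups = {
    "Watershed A": [84, 78, 51, 62, 55, 72, 34, 45, 89],
    "Watershed B": [54, 118, 142, 5, 36, 115, 12, 14],
}

for name, values in groups.items():
    summary = describe(values)
    print(name, {k: round(v, 2) for k, v in summary.items()})
```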
 
● What is your identified dataset?  
 
Dataset: Top 10 Pop Songs of 2019 and Top 10 Iconic Classical Pieces of 2019 
 
● Mean: 643,966,732 
● Median: 57,926,820 
● Kurtosis: 2.028 
● Skew: 1.757 
● P-Value (Stats Kingdom): 0.0000231228 
 
6. Are the mean and median values the same or different?  
The mean and median values are drastically different: the mean (643,966,732) is roughly 11 
times the median (57,926,820). 
 
7. Are the kurtosis and skew values 0 or different?  
Both values differ from 0 (the kurtosis value is 2.028 and the skew value is 1.757). 
 
8. Based on this information, and the shape of your histogram, would you assume that your 
data is normally distributed or not normally distributed?  
Based on the above information and the shape of my histogram, I would assume that the 
data is not normally distributed, because it is strongly positively skewed (skew value 1.757). 
 
9. Copy and paste your histogram graph below.  

 
 
10. Based on the outcome of the Shapiro-Wilk test in Stats Kingdom, is your identified 
dataset normally or not normally distributed?  
Based on the outcome of the Shapiro-Wilk test in Stats Kingdom, my identified dataset is 
not normally distributed, because the p-value (0.0000231228) is less than 0.05. 
