Daily Task 9 - Statistical Tests - Jupyter Notebook
I. T Test
1 Sample
2 Sample
Paired T Test
1 Sample proportion test
2 Sample proportion test
Anova Test
In [1]:
Out[1]:
30
In [2]:
import numpy as np
np.mean(minutes) # Population mean
Out[2]:
122.06666666666666
In [3]:
Out[3]:
In [4]:
Out[4]:
140.2
In [5]:
In [6]:
print(_)
print(pval)
0.7920233342827273
0.47266930756565606
In [7]:
# Level of significance: 10%. At the 10% level of significance, do we reject or not reject?
if pval < 0.1:
    print('We can reject the Null Hypothesis and we can claim that there is a significant d
else:
    print('We do not reject the Null Hypothesis and we can claim that there is no significa
We do not reject the Null Hypothesis and we can claim that there is no significant difference in the population mean and sample mean
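The one-sample test above can be sketched with `scipy.stats.ttest_1samp`. The notebook's own data and test call did not survive the export, so `population_mean` and `sample_minutes` below are hypothetical stand-ins and the choice of function is an assumption, not the author's confirmed method.

```python
from scipy import stats

# Hypothetical values: the notebook's original data is not available.
population_mean = 120.0
sample_minutes = [118, 131, 125, 140, 109, 122, 135, 128, 117, 126]

# Two-sided one-sample t-test:
# H0: the mean of the sampled population equals population_mean.
t_stat, pval = stats.ttest_1samp(sample_minutes, popmean=population_mean)

alpha = 0.1  # 10% level of significance, as used in the notebook
if pval < alpha:
    decision = "reject H0"
else:
    decision = "fail to reject H0"

print(t_stat, pval, decision)
```

Failing to reject H0 only means the sample gives no strong evidence of a difference; it does not prove that the two means are equal.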
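In this two-sample example, suppose we sample mango trees grown with two different fertilizers and compare the average number of mangos produced.
Our null hypothesis is: the average number of mangos is the same with organic and chemical fertilizer
Our alternative hypothesis is: the average number of mangos differs between organic and chemical fertilizer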
In [8]:
import numpy as np
from scipy import stats
In [9]:
np.mean(Organic_mangos)
Out[9]:
39.1
In [10]:
np.mean(Chemical_mangos)
Out[10]:
80.8
In [11]:
Out[11]:
3.5677940572369257e-07
In [12]:
We reject the Null Hypothesis and we can claim that there is a significant difference in the average nos of mangos with chemical fertilizer and organic fertilizer for production
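A comparison of two independent samples like this one can be sketched with `scipy.stats.ttest_ind`. The lists below are hypothetical mango counts (the notebook's `Organic_mangos` and `Chemical_mangos` values are not shown), and the use of `ttest_ind` is an assumption about what the missing cell ran.

```python
from scipy import stats

# Hypothetical mango counts per tree; not the notebook's data.
organic_mangos = [35, 42, 38, 41, 36, 44, 39, 37]
chemical_mangos = [78, 85, 80, 76, 88, 82, 79, 84]

# Independent two-sample t-test.
# By default equal variances are assumed; equal_var=False gives Welch's t-test.
t_stat, pval = stats.ttest_ind(organic_mangos, chemical_mangos)

alpha = 0.1  # 10% level of significance
if pval < alpha:
    decision = "reject H0: the average yields differ"
else:
    decision = "fail to reject H0"

print(t_stat, pval, decision)
```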
3. Paired T Test
Here we compare how long (in minutes) people can work with focus before and after a meditation program.
Our null hypothesis is: the average focused working time is the same before and after the meditation program
Our alternative hypothesis is: the average focused working time differs before and after the meditation program
In [13]:
pre_meditation_program = [480,420,400,360,300,320,340,500,380,420]
post_meditation_program = [540,600,720,650,700,630,560,670,800,740]
In [14]:
np.mean(pre_meditation_program)
Out[14]:
392.0
In [15]:
np.mean(post_meditation_program)
Out[15]:
661.0
In [16]:
Out[16]:
3.137229239240941e-05
In [17]:
# Level of significance: 10%. At the 10% level of significance, do we reject or not reject?
if pval < 0.1:
    print('We reject the Null Hypothesis and we can claim that there is a significant diffe
else:
    print('We do not reject the Null Hypothesis and we can claim that there is no significa
We reject the Null Hypothesis and we can claim that there is a significant difference in the average working hrs with & without meditation
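Because the same people are measured before and after the program, a paired test is the natural fit. The cell that computed the p-value is missing, so the sketch below assumes it was `scipy.stats.ttest_rel` applied to the two lists from In [13]; under that assumption the p-value should land near the value shown in Out[16].

```python
from scipy import stats

# Data copied from In [13] (minutes of focused work per person).
pre_meditation_program = [480, 420, 400, 360, 300, 320, 340, 500, 380, 420]
post_meditation_program = [540, 600, 720, 650, 700, 630, 560, 670, 800, 740]

# Paired (dependent-samples) t-test on the per-person differences.
t_stat, pval = stats.ttest_rel(pre_meditation_program, post_meditation_program)

print(t_stat, pval)
```

The negative t statistic reflects the argument order (pre minus post); swapping the arguments flips the sign but leaves the two-sided p-value unchanged.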
Example:
Null hypothesis is: 80% of the tests pass
Alternative hypothesis is: more than 80% of the tests pass
We sampled 500 tests, and found 410 passed
In [18]:
In [19]:
sample_success = 410
sample_size = 500
null_hypothesis = 0.80
In [20]:
Out[20]:
0.12220177493249235
In [21]:
# Level of significance: 10%. At the 10% level of significance, do we reject or not reject?
significance = 0.1
if p_value < significance:
    print("Reject the null hypothesis - suggest the alternative hypothesis is true")
else:
    print("Fail to reject the null hypothesis - we have nothing else to say")
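To make the one-sample proportion test concrete, here is a hand-computed z-test using only the standard library. It is a sketch, not the notebook's missing cell: it assumes a one-sided alternative (proportion greater than 0.80) and a standard error based on the sample proportion, which gives a p-value of about 0.122, in line with Out[20].

```python
from math import sqrt
from statistics import NormalDist

sample_success = 410
sample_size = 500
null_hypothesis = 0.80

p_hat = sample_success / sample_size  # observed proportion (0.82)

# Assumption: standard error uses the sample proportion.
# Using null_hypothesis in its place is another common convention.
std_err = sqrt(p_hat * (1 - p_hat) / sample_size)
z_stat = (p_hat - null_hypothesis) / std_err

# One-sided p-value for H1: proportion > 0.80
p_value = 1 - NormalDist().cdf(z_stat)

significance = 0.1
reject_null = p_value < significance

print(z_stat, p_value, reject_null)
```

Since the p-value is above 0.1, the null hypothesis that 80% of the tests pass is not rejected at the 10% level.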
In [22]:
In [23]:
sns.get_dataset_names()
Out[23]:
['anagrams',
'anscombe',
'attention',
'brain_networks',
'car_crashes',
'diamonds',
'dots',
'exercise',
'flights',
'fmri',
'gammas',
'geyser',
'iris',
'mpg',
'penguins',
'planets',
'taxis',
'tips',
'titanic']
In [24]:
tips_data = load_dataset('tips')
tips_data
Out[24]:
In [25]:
tips_data['smoker'].value_counts()
Out[25]:
No 151
Yes 93
In [26]:
per_yes = round(93/244*100,2)
print("per_yes is",per_yes,'%')
per_yes is 38.11 %
In [27]:
per_no = round(151/244*100,2)
print("per_no is",per_no,'%')
per_no is 61.89 %
In [28]:
Out[28]:
0.7278585473640354
In [29]:
Fail to reject the null hypothesis: the p-value (0.728) is far above the 10% significance level used in this notebook
In [30]:
import pandas as pd
import numpy as np
from statsmodels.stats.proportion import proportions_ztest
In [31]:
depression_data = pd.read_csv("Depression_status.csv")
depression_data.head(10)
Out[31]:
0 Yes No
1 Yes No
2 Yes No
3 Yes No
4 Yes No
5 Yes No
6 Yes No
7 Yes No
8 No No
9 Yes No
In [32]:
Out[32]:
Old_Therapy New_Therapy
0 Yes No
1 Yes No
2 Yes No
3 Yes No
4 Yes No
5 Yes No
6 Yes No
7 Yes No
8 No No
9 Yes No
In [33]:
depression_data.value_counts()
Out[33]:
Old_Therapy New_Therapy
Yes No 37
No No 7
Yes Yes 4
No Yes 2
dtype: int64
In [34]:
per_old_depression = 41/50*100
print("Old Therapy Depression level is",per_old_depression,"%")
In [35]:
per_new_depression = 6/50*100
print("New Therapy Depression level is",per_new_depression,"%")
In [36]:
In [37]:
In [38]:
Out[38]:
2.3387247876563156e-12
In [39]:
We reject the Null Hypothesis and we can claim that there is a significant difference in the depression level with old therapy & new therapy, So people can feel less depressed with new therapy
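The two-sample proportion comparison can be reproduced by hand from the counts in Out[33] (41 of 50 "Yes" under the old therapy, 6 of 50 under the new one). The sketch below assumes a two-sided test with a pooled standard error; the cells that ran the test are missing, so this is an illustration of the technique rather than the author's exact code.

```python
from math import erfc, sqrt

# Counts of "Yes" taken from Out[33]; 50 patients per therapy.
old_yes, new_yes = 41, 6
n_old, n_new = 50, 50

p_old = old_yes / n_old
p_new = new_yes / n_new

# Pooled proportion under H0: both therapies have the same "Yes" rate.
p_pooled = (old_yes + new_yes) / (n_old + n_new)
std_err = sqrt(p_pooled * (1 - p_pooled) * (1 / n_old + 1 / n_new))
z_stat = (p_old - p_new) / std_err

# Two-sided p-value; erfc keeps precision for very small tail probabilities.
p_value = erfc(abs(z_stat) / sqrt(2))

print(z_stat, p_value)
```

The resulting p-value is on the order of 1e-12, the same order as Out[38].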
Example: test whether the means of two or more independent data samples are significantly different.
Hypothesis Formulation
H0: the means of the samples are equal.
H1: at least one of the sample means is different.
In [40]:
In [41]:
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
In [42]:
Out[42]:
0.9083957433926546
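A one-way ANOVA on the three samples from In [41] can be sketched with `scipy.stats.f_oneway`. The cell that produced Out[42] is missing, so treating it as an `f_oneway` call is an assumption; the data is copied from the notebook.

```python
from scipy import stats

# Samples copied from In [41].
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]

# One-way ANOVA: H0 is that all group means are equal.
f_stat, pval = stats.f_oneway(data1, data2, data3)

alpha = 0.1  # 10% level of significance
reject_null = pval < alpha

print(f_stat, pval, reject_null)
```

With a p-value this large there is no evidence that the group means differ, so H0 is not rejected.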
In [43]:
In [44]:
In [45]:
sns.get_dataset_names()
Out[45]:
['anagrams',
'anscombe',
'attention',
'brain_networks',
'car_crashes',
'diamonds',
'dots',
'exercise',
'flights',
'fmri',
'gammas',
'geyser',
'iris',
'mpg',
'penguins',
'planets',
'taxis',
'tips',
'titanic']
In [46]:
penguins_data = load_dataset('penguins')
penguins_data
Out[46]:
In [47]:
import pandas as pd
observed_table = pd.crosstab(index = penguins_data['sex'], columns = penguins_data['species
observed_table
Out[47]:
species  Adelie  Chinstrap  Gentoo
sex
Female       73         34      58
Male         73         34      61
In [48]:
**************************************************************
P-val : 0.97599
Degree of Freedom : 2
Expected Table :
In [49]:
# Level of significance: 10%. At the 10% level of significance, do we reject or not reject?
if pval < 0.1:
    print('We can reject the Null Hypothesis and we can claim that there is an association b
else:
    print('We do not reject the Null Hypothesis and we can claim that there is no associati
We do not reject the Null Hypothesis and we can claim that there is no association between penguin sex and species.
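The chi-square test of independence on the sex-by-species table can be sketched with `scipy.stats.chi2_contingency`, using the observed counts from Out[47]. The cell that printed the p-value and degrees of freedom is missing, so the function choice is an assumption.

```python
from scipy.stats import chi2_contingency

# Observed counts from Out[47]: rows are sex (Female, Male),
# columns are the three penguin species.
observed = [[73, 34, 58],
            [73, 34, 61]]

# H0: sex and species are independent.
chi2_stat, pval, dof, expected = chi2_contingency(observed)

print("P-val :", round(pval, 5))
print("Degree of Freedom :", dof)
print("Expected Table :")
print(expected)
```

The expected table holds the counts that would be seen if sex and species were independent; the observed counts are very close to it, which is why the p-value is near 1.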
THE END!!