Professional Documents
Culture Documents
Fdsa Final
Fdsa Final
SOURCE CODE:
importio
from google.colab import files
uploaded =files.upload()
import pandas as pd
import numpy as np
d= pd.read_csv()
d.describe()
d.head()
d.head(6)
d.tail()
d.dtypes()
d.ndim
d.shape()
d.size()
d.coloumns()
d[ Age ]
a=d.sort_values(by= Age)
a1
a2
1
OUTPUT:
Out[1]:
Sl.No Employee Age Sex Salary
s
S
e
r
v
i
c
e
3
Out[2]:
Out[3]:
e
Officestaff 20.0 male 5.0 15000.0
e
Officestaff 30.0 femal 6.0 15000.0
4
Out[4]:
5
EX.NO:02 BASICPLOTS USING MATPLOTLIB
SOURCE CODE:
1) PLOT(X,Y)
#plot
fig, ax=plt.subplots()
ax.plot(x,y,linewidth=2.0)
ax.set(xlim=(0,8),xticks=np.arange(1,8), ylim=(0,8),yticks=np.arange(1,8))
plt.show()
OUTPUT:
6
2) SCATTER(X,Y)
plt.style.use('_mpl-gallery')
np.random.seed(3)
x=4+ np.random.normal(0,2,24)
y = 4 + np.random.normal(0, 2, len(x))
#plot
fig, ax=plt.subplots()
ax.scatter(x,y,s=sizes,c=colors,vmin=0,vmax=100)
ax.set(xlim=(0,8),xticks=np.arange(1,8), ylim=(0,8),yticks=np.arange(1,8))
plt.show()
OUTPUT:
7
3) BAR(X,HEIGHT)
import numpy as np
plt.style.use('_mpl-gallery')
#make data:
np.random.seed(3)
x=0.5+np.arange(8)
y=np.random.uniform(2,7,len(x))
#plot
ax.set(xlim=(0,8),xticks=np.arange(1,8), ylim=(0,8),yticks=np.arange(1,8))
plt.show()
OUTPUT:
4) STEM(X,Y)
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('_mpl-gallery')
# make data
8
np.random.seed(3)
x=0.5+np.arange(8)
y=np.random.uniform(2,7,len(x))
#plot
fig, ax=plt.subplots()
ax.stem(x,y)
ax.set(xlim=(0,8),xticks=np.arange(1,8),ylim=(0,8),yticks=np.arange(1,8))
plt.show()
OUTPUT:
9
EX.NO:3 FREQUENCY DISTRIBUTION, AVERAGES ,VARIABILITY
PROGRAM:
A histogram is a plot of the frequency distribution of numeric array by splitting it to small equal-sized bins.
1) FREQUENCEYDISTRIBUTION
import numpy as np
x=np.random.randint(low=0,high=100,size=100)
frequency,bins=np.histogram(x,bins=10,range=[0,100])
#Pretty Print
OUTPUT:
10.0* ** * ** *
20.0* * ** * ** ***
30.0* * ** * ** *** * **
40.0* ** * ** * *
50.0* * ** * ** **
60.0* * ** * ** **
70.0* * ** * ** *** * *
80.0* ** * ** *
90.0* * ** * ** *** * ** * * **
100.0* ** * ** * *
2) AVERAGE,VARIENCE,STANDARDDEVIATION
import numpy as np
# Original array
array=np.arange(10)
1
0
print(array)
1
1
r1 =np.average(array)
print("\nMean:",r1)
nstd:",r2)
r3 =np.mean((array- np.mean(array))**2)
print("\nvariance:",r3)
OUTPUT:
[0123456789]
Average:4.5
std:2.8722813232690143
variance:8.25
12
EX.NO:4 NORMAL CURVES, CORRELATION AND SCATTER
PLOTS,CORRELATIONCOEFFICENT
PROGRAM:
1) NORMAL CURVE
inlcude:
%matplotlib inline
# define constants
mu=998.8
sigma=73.10
x1=900
x2=1100
z1=(x1-mu) /sigma
z2=(x2-mu)/sigma
# range of x in spec
y=norm.pdf(x,0,1)
fig, ax = plt.subplots(figsize=(9,6))
14
ax.fill_between(x,y,0, alpha=0.3, color='b')
ax.fill_between(x_all,y2,0, alpha=0.1)
ax.set_xlim([-4,4])
ax.set_yticklabels([])
ax.set_title('Normal Curve')
plt.show()
OUTPUT:
2) SCATTERPLOT
import matplotlib.pyplot as plt
import numpy as np
x=range(50)
y=range(50)+np.random.randint(0,30,50)
plt.scatter(x,y)
plt.rcParams.update({ figure.figsize :(10,8), figure.dpi :100})plt.title( simpescatterplot )
plt.xlabel( xvalue
)
plt.ylabel( yvalue)
15
plt.show()
OUTPUT:
16
3) CORRELATION WITH SCATTER PLOT
import numpy as np
x=np.random.randn(100)
y1=x*5+9
y2=-5+x
y3=np.random.randn(100)plt.rcParams.update({'figure.figsize':(10,8),'figure.dpi':100})
plt.scatter(x,y1,label=f’y1,correlation={np.round(np.corrcoef(x,y1)[0,1],2)}’)
plt.scatter(x,y2,label=f’y2,correlation={np.round(np.corrcoef(x,y2)[0,1],2)}’) plt.scatter(x,y3,label=f’y3,
correlation={np.round(np.corrcoef(x,y3)[0,1],2)}’) plt.title('scatterplotandcorrelations')
plt.legend()
plt.show()
17
OUTPUT:
18
EX.NO:5 REGRESSION
SOURCE CODE:
import pandas as pd
# Create a sample
DataFrame
OUTPUT:
OUT[1]:
==============================================================================
Dep. Variable: Model:dependent_var
Method: Date: OLS R-squared:1.000
Time: Least Squares Thu, 30Adj.
MarR-squared:1.000
2023
No. Observations: Df Residuals:
08:40:16 F-statistic:7.606e+30 Prob (F-statistic):1.05e-46 Log-Likelihood:16
Df Model: -321.5
5 AIC: BIC:
3 -322.2
1
Covariance Type:nonrobust
19
====================================================================================
coefstd errtP>|t|[0.0250.975]
independent_var1 independent_var2
0.15383.17e-164.86e+14 0.000 0.1540.154
1.2308 9e-16 1.37e+15 0.0001.2311.231
independent_var30.3077 6.34e-164.86e+140.0000.3080.308
==============================================================================
Omnibus:nanDurbin-Watson:0.400
Prob(Omnibus):nanJarque-Bera (JB):0.770
Skew:-0.844Prob(JB):0.680
Kurtosis:2.078Cond. No4.43e+16
==============================================================================
OUT[2]:
20
EXNO: 6 IMPLEMENT Z-TEST USING STATSMODELS
SOURCE CODE:
ztest(data,value=100)
from statsmodels.stats.weightstats
import ztest as ztest
#enter IQ levels for20individuals from each city
cityA=[82,84,85, 89,91, 91,92,94, 99, 99,105,109,109,109,110,112,112,113,114,114]
21
OUTPUT:
(1.5976240527147705,0.1101266701438426)
22
EXNO:7 IMPLEMENTION OF T-TEST USING SCIPY
SOURCE CODE:
data=[14,14,16,13,12,17,15,14,15,13,15,14]
import scipy.stats as stats
stats.ttest_1samp(a=data, popmean=15)
import numpy as np
group1=np.array([14,15,15,16,13,8,14,17,16,14, 19,20,21,15,15,16,16,13,14,12])
group2=np.array([15,17,14,17,14,8,12,19,19,14, 17,22,24,16,13,16,13,18,15,13])
print(np.var(group1),np.var(group2))
stats.ttest_ind(a=group1,b=group2,equal_var=True)
23
OUTPUT:
TtestResult(statistic=-1.6848470783484626,pvalue=0.12014460742498101,df=11)
7.73,12.26
TtestResult(statistic=-0.6337,pvalue=0.53005)
24
EXNO:8 ANOVA TABLE
SOURCECODE:
f_oneway(group1, group2,group3)
import numpy as np
import pandas as pd
#create data
df=pd.DataFrame({'water':np.repeat(['daily','weekly'],15),'sun':np.tile(np.repeat(['low','med','high'],5),2),
'height':[6,6,6,5,6,5, 5,6,4,5,6,6,7, 8,7,3,4,4,4,5,4,4,4,4,4,5,6, 6,7,8]})
#view rows of data
df import statsmodels.api as sm
25
OUTPUT:
OUT[1]:
F_oneway Result (statistic=2.3575322551335636,pvalue=0.1138479534583721 8)
OUT[2]:
0 daily low 6
1 daily low 6
2 daily low 6
3 daily low 5
4 daily low 6
5 daily med 5
6 daily med 5
7 daily med 6
8 daily med 4
9 daily med 5
sum_sq df F PR(>F)
26
EXNO:9 BUILDING AND VALIDATING LINEAR MODELS
SOURCE CODE:
import numpy as np
from sklearn.linear_modelimport LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 6])
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
model = LinearRegression()
model.fit(x, y)
y_pred = model.predict(x)
# print results
27
OUTPUT:
28
EXNO:10 BUILDING AND VALIDATING LOGISTIC MODELS
SOURCE CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
# Sample data
data = {'age': [25, 32, 28, 45, 36, 52, 40, 55, 60, 48],'income': [50000, 75000, 60000, 55000,
70000, 80000, 65000, 90000, 95000, 90000],'default': [0, 0, 0, 1, 1, 1, 0, 1, 1, 1]}
# Create a DataFrame
df = pd.DataFrame(data)
# Data preprocessing
X = df[['age', 'income']] # Features
Y = df['default'] # Target variable
# Model evaluation
accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred)
recall = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)
confusion = confusion_matrix(Y_test, Y_pred)
29
OUTPUT:
Accuracy: 100.00%
Precision: 100.00%
Recall: 100.00%
F1-score: 100.00%
Confusion Matrix:
[[1 0]
[0 1]]
30
EXNO: 11 TIME SERIES ANALYSIS
PROGRAM:
Import pandas as pd
['1/2/2021','1/3/2021','1/4/2021','1/5/2021','1/6/2021','1/7/2021','1/8/2021'],
'value':[4,7,8,13,17,15,21]})
sns.lineplot(x='date',y='value',data=df)
linestyle='dashed').set(title='TimeSeries Plot')
#rotatex-axislabelsby15degrees
plt.xticks(rotation=15)
import pandas as pd
['1/1/2021','1/2/2021','1/3/2021','1/4/2021','1/1/2021','1/2/2021','1/3/2021','1/4/2021'
],
'sales':[4,7,8,13,17,15,21,28],'company':['A','A','A','A','B','B','B','B']})
32
OUTPUT:
OUT[1]:
<AxesSubplot:xlabel='date',ylabel='value'>
OUT[2]:
([0,1,2,3,4,5,6],
[Text(0,0,''),
Text(0, 0, ''),
Text(0, 0, ''),
Text(0, 0, ''),
Text(0, 0, ''),
Text(0, 0, ''),
Text(0, 0, '')])
33
OUT[3]:
<AxesSubplot:xlabel='date',ylabel='sales'>
34