EDA - Visualization - Ipynb - Colab


5/13/24, 6:38 PM EDA_Visualization.ipynb - Colab

1. Summary Analysis:


import pandas as pd
import matplotlib.pyplot as plt

# Reading the encoded Excel file (pandas needs the openpyxl package for .xlsx)
file_path = '/content/encoded_data.xlsx'
df = pd.read_excel(file_path)

# Displaying summary statistics (count, mean, std, min, quartiles, max) per column
summary_stats = df.describe()
print(summary_stats)

Parent Qualification DGIF_1 DGIF_2 DGIF_4 Background \
count 457.000000 457.000000 457.000000 457.000000 457.000000
mean 1.923414 2.382932 0.829322 2.562363 0.691466
std 0.929912 1.023825 0.514456 1.498883 0.462394
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 1.000000 2.000000 1.000000 1.000000 0.000000
50% 2.000000 3.000000 1.000000 3.000000 1.000000
75% 3.000000 3.000000 1.000000 4.000000 1.000000
max 3.000000 3.000000 2.000000 5.000000 1.000000

BIF_1 (Who influence your decision in subject selection) \
count 457.000000
mean 3.133479
std 1.556578
min 0.000000
25% 3.000000
50% 4.000000
75% 4.000000
max 4.000000

BIF_2 (Factor influencing decision) BIF_4 (Placements) \
count 457.000000 457.000000
mean 2.586433 2.332604
std 1.497685 1.259564
min 0.000000 0.000000
25% 1.000000 1.000000
50% 3.000000 2.000000
75% 4.000000 3.000000
max 5.000000 4.000000

SGIF_1 (Dependency) SGIF_2 (Frequency of Support) \
count 457.000000 457.000000
mean 2.708972 2.507659
std 1.823514 1.415936
min 0.000000 0.000000
25% 0.000000 1.000000
50% 3.000000 3.000000
75% 4.000000 4.000000
max 6.000000 5.000000

SGIF_3 (Confedence Level) IAIF_2 IAIF_3 ESIF_1 \
count 457.000000 457.000000 457.000000 457.000000
mean 1.702407 1.728665 1.838074 2.019694
std 1.490855 1.166373 1.494153 1.522365
min 0.000000 0.000000 0.000000 0.000000
25% 1.000000 1.000000 1.000000 1.000000
50% 1.000000 2.000000 1.000000 1.000000
75% 3.000000 2.000000 4.000000 4.000000
max 4.000000 5.000000 4.000000 4.000000

ESIF_2 (What Decide Future Goal) Overall
count 457.000000 457.000000
mean 1.590810 1.715536
std 1.352776 1.543961
min 0.000000 0.000000
25% 0.000000 0.000000
50% 2.000000 1.000000
75% 3.000000 3.000000
max 5.000000 4.000000
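Every column holds integer codes (the file is `encoded_data.xlsx`, and the column labels read like survey questions), so the means and standard deviations above describe the codes rather than the answers. For a 0/1 column such as Background, the mean 0.691466 is simply the share of rows coded 1 (about 69%); for columns with more levels, a count per code is usually easier to interpret. A minimal sketch, assuming small integer codes; the helper `frequency_table` and the toy data are hypothetical, not part of the original notebook:

```python
import pandas as pd


def frequency_table(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count and percentage of rows for each code in one column (hypothetical helper)."""
    counts = frame[column].value_counts().sort_index()  # one row per code, in code order
    percent = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "percent": percent})


# Hypothetical toy data standing in for a binary encoded column
toy = pd.DataFrame({"Background": [0, 1, 1, 1, 0, 1, 1, 0]})
table = frequency_table(toy, "Background")
print(table)
print("share coded 1 via the mean:", toy["Background"].mean())
```

On the notebook's data the call would be `frequency_table(df, 'Background')`; the mean of a 0/1 column and the percentage coded 1 always agree.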

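# Visualizing each column with a histogram.
# df.hist() creates its own figure (one subplot per column), so the size is passed
# via figsize; a separate plt.figure() call beforehand only produces an empty
# figure, which is where "<Figure size 1000x600 with 0 Axes>" comes from.
df.hist(bins=20, color='skyblue', edgecolor='black', linewidth=1.5,
        figsize=(10, 6), grid=False)
fig = plt.gcf()
fig.suptitle('Histogram of Data')
fig.supxlabel('Values')     # shared axis labels (matplotlib >= 3.4)
fig.supylabel('Frequency')
fig.tight_layout()
plt.show()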

https://colab.research.google.com/drive/1bPKboF2NkJBP5w7twnNxYEMTkWFwYXSY?authuser=0#scrollTo=faCn9ZSa821P&uniqifier=1&printM… 1/8

<Figure size 1000x600 with 0 Axes>
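With `bins=20`, a column whose codes run only from 0 to 3 is split into twenty narrow bins, so most bars are empty and the bar edges do not line up with the codes. One common alternative, sketched here under the assumption that every column holds integer codes, is to place bin edges halfway between consecutive integers so each code gets exactly one centred bar. `integer_bin_edges` is a hypothetical helper:

```python
import numpy as np


def integer_bin_edges(values) -> np.ndarray:
    """Bin edges at half-integers so each integer code gets its own bin."""
    values = np.asarray(values)
    low, high = int(values.min()), int(values.max())
    return np.arange(low - 0.5, high + 1.5, 1.0)


codes = [0, 1, 1, 3]  # hypothetical codes from one column
edges = integer_bin_edges(codes)
counts, _ = np.histogram(codes, bins=edges)
print(edges)   # bin boundaries
print(counts)  # number of rows for codes 0, 1, 2 and 3
```

Per column, the edges could then be passed to pandas, for example `df[col].hist(bins=integer_bin_edges(df[col]))`.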

Calculating measures of central tendency (mean, median and mode) for each column:

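# Calculating the mean of each numeric column
mean = df.mean(numeric_only=True)

# Calculating the median of each numeric column
median = df.median(numeric_only=True)

# Calculating the mode; df.mode() returns one row per tied mode (sorted
# ascending), so .iloc[0] keeps the smallest most-frequent value per column
mode = df.mode(numeric_only=True).iloc[0]

print("Mean:")
print(mean)
print("\nMedian:")
print(median)
print("\nMode:")
print(mode)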

Mean:
Parent Qualification 1.923414
DGIF_1 2.382932
DGIF_2 0.829322
DGIF_4 2.562363
Background 0.691466
BIF_1 (Who influence your decision in subject selection) 3.133479
BIF_2 (Factor influencing decision) 2.586433
BIF_4 (Placements) 2.332604
SGIF_1 (Dependency) 2.708972
SGIF_2 (Frequency of Support) 2.507659
SGIF_3 (Confedence Level) 1.702407
IAIF_2 1.728665
IAIF_3 1.838074
ESIF_1 2.019694
ESIF_2 (What Decide Future Goal) 1.590810
Overall 1.715536
dtype: float64

Median:
Parent Qualification 2.0
DGIF_1 3.0
DGIF_2 1.0
DGIF_4 3.0
Background 1.0
BIF_1 (Who influence your decision in subject selection) 4.0
BIF_2 (Factor influencing decision) 3.0
BIF_4 (Placements) 2.0
SGIF_1 (Dependency) 3.0
SGIF_2 (Frequency of Support) 3.0
SGIF_3 (Confedence Level) 1.0
IAIF_2 2.0
IAIF_3 1.0
ESIF_1 1.0
ESIF_2 (What Decide Future Goal) 2.0
Overall 1.0
dtype: float64

Mode:
Parent Qualification 1
DGIF_1 3
DGIF_2 1
DGIF_4 3
Background 1
BIF_1 (Who influence your decision in subject selection) 4
BIF_2 (Factor influencing decision) 4
BIF_4 (Placements) 2
SGIF_1 (Dependency) 4
SGIF_2 (Frequency of Support) 4
SGIF_3 (Confedence Level) 1
IAIF_2 1
IAIF_3 1
ESIF_1 1
ESIF_2 (What Decide Future Goal) 0
Overall 0
Name: 0, dtype: int64
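`df.mode()` returns one row per tied mode, sorted in ascending order and padded with NaN where a column has fewer modes, so `.iloc[0]` silently keeps the smallest of several equally frequent codes. A minimal sketch for checking whether that happened; `columns_with_tied_modes` and the toy data are hypothetical:

```python
import pandas as pd


def columns_with_tied_modes(frame: pd.DataFrame) -> list:
    """Names of columns that have more than one most-frequent value."""
    modes = frame.mode()  # one row per mode; NaN pads columns with fewer modes
    return [col for col in modes.columns if modes[col].notna().sum() > 1]


toy = pd.DataFrame({"a": [1, 1, 2, 2, 3], "b": [0, 0, 0, 1, 2]})
print(toy.mode())                    # column "a" has two modes, 1 and 2
print(toy.mode().iloc[0].tolist())   # first row only: the tie in "a" is hidden
print(columns_with_tied_modes(toy))  # ['a']
```

Running `columns_with_tied_modes(df)` on the notebook's data would show whether any of the reported modes hide a tie.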

# Visualizing the spread of each column with a box plot
plt.figure(figsize=(10, 6))
df.boxplot(rot=90, grid=False)  # rotate the long column names so they do not overlap
plt.ylabel('Values')
plt.title('Box Plot of Data')
plt.tight_layout()
plt.show()
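The box plot's whiskers follow matplotlib's default rule: they reach the most extreme data point within 1.5 x IQR of the box (IQR = Q3 - Q1), and anything beyond is drawn as an individual outlier point. With coarse integer codes this can be surprising: for DGIF_2 the summary table gives Q1 = Q3 = 1, so the IQR is 0 and every row coded 0 or 2 is drawn as an outlier. A minimal sketch that counts those points per column; `iqr_outlier_counts` and the toy data are hypothetical:

```python
import pandas as pd


def iqr_outlier_counts(frame: pd.DataFrame, whis: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - whis*IQR, Q3 + whis*IQR] in each column."""
    q1 = frame.quantile(0.25)  # linear interpolation, as in np.percentile
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    outside = (frame < lower) | (frame > upper)  # Series bounds align on column names
    return outside.sum()


toy = pd.DataFrame({
    "x": [1, 2, 2, 3, 3, 3, 4, 20],      # one extreme value
    "codes": [1, 1, 1, 0, 1, 2, 1, 1],   # IQR of 0, like DGIF_2
})
print(iqr_outlier_counts(toy))
```

Applied to the notebook's data as `iqr_outlier_counts(df)`, this shows how many points each box plot draws individually.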

2. Bivariate Analysis


import seaborn as sns

# Function to perform bivariate analysis of two columns
def bivariate_analysis(data, x, y):
    # Scatter plot of y against x
    plt.figure(figsize=(10, 6))
    sns.scatterplot(data=data, x=x, y=y)
    plt.title("Scatter Plot")
    plt.xlabel(x)
    plt.ylabel(y)
    plt.show()

    # Histogram (with KDE curve) of x on its own
    plt.figure(figsize=(10, 6))
    sns.histplot(data=data, x=x, kde=True)
    plt.title("Histogram")
    plt.xlabel(x)
    plt.show()

    # Line chart: seaborn draws the mean of y at each x value, with a confidence band
    plt.figure(figsize=(10, 6))
    sns.lineplot(data=data, x=x, y=y)
    plt.title("Line Chart")
    plt.xlabel(x)
    plt.ylabel(y)
    plt.show()

x_column = "IAIF_2"
y_column = "Overall"
bivariate_analysis(df, x_column, y_column)  # Any two columns can be compared this way
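Both IAIF_2 and Overall hold only a handful of integer codes, so in the scatter plot many rows land on the same point and each (x, y) combination appears as a single dot however many rows share it. The line chart, for its part, plots seaborn's default aggregate: the mean of y at each x value with a confidence band. A minimal sketch that recovers both as tables with `pd.crosstab` and `groupby`; `pair_summary` and the toy values are hypothetical:

```python
import pandas as pd


def pair_summary(frame: pd.DataFrame, x: str, y: str):
    """Row counts for every (x, y) combination and the mean of y per x value."""
    counts = pd.crosstab(frame[x], frame[y])  # rows: x codes, columns: y codes
    mean_y = frame.groupby(x)[y].mean()       # the line a default sns.lineplot draws
    return counts, mean_y


# Hypothetical toy values using the column names from the notebook
toy = pd.DataFrame({"IAIF_2": [0, 0, 1, 1, 1, 2], "Overall": [0, 1, 1, 1, 3, 4]})
counts, mean_y = pair_summary(toy, "IAIF_2", "Overall")
print(counts)
print(mean_y)
```

On the notebook's data this would be `pair_summary(df, x_column, y_column)`.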


3. Correlation Analysis:


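# Plotting the correlation matrix as a heatmap (Pearson correlation by default)
correlation_matrix = df.corr()
plt.figure(figsize=(10, 8))
# vmin/vmax fix the colour scale to the full correlation range [-1, 1]
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", vmin=-1, vmax=1)
plt.title('Correlation Matrix')
plt.show()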
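With 16 columns the heatmap holds 120 distinct pairs, which is a lot to scan by eye. A minimal sketch that lists only the pairs whose absolute correlation reaches a threshold; the 0.5 cut-off and the helper name `strongest_pairs` are arbitrary assumptions. Because the columns are integer codes, the helper also passes through pandas' `method="spearman"` option (rank-based), which is often preferred for ordered codes; for codes with no natural order, neither coefficient has a clear meaning.

```python
import pandas as pd


def strongest_pairs(frame: pd.DataFrame, method: str = "pearson", threshold: float = 0.5):
    """(column_a, column_b, r) for each pair with |r| >= threshold, strongest first."""
    corr = frame.corr(method=method)
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only: each pair once, no diagonal
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pairs


toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],  # exactly proportional to "a"
    "c": [5, 3, 4, 1, 2],   # tends to fall as "a" rises
})
for col_a, col_b, r in strongest_pairs(toy):
    print(f"{col_a} ~ {col_b}: {r:+.2f}")
```

On the notebook's data, `strongest_pairs(df)` and `strongest_pairs(df, method="spearman")` can be compared side by side.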

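# Plotting pairwise scatter plots, one panel per pair of columns
# (16 columns give a 16 x 16 grid, so this cell can take a while)
g = sns.pairplot(df)
# plt.title() would only label the last panel, so the title goes on the whole figure
g.figure.suptitle('Scatter Plots', y=1.02)
plt.show()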


4. Distribution Analysis



# Re-reading the same encoded file so this cell can run on its own
data = pd.read_excel('/content/encoded_data.xlsx')
# Extract the numeric columns
numeric_columns = data.select_dtypes(include=['number'])
# Plot every numeric column as an overlaid, semi-transparent histogram on one axes
plt.figure(figsize=(10, 6))
for col in numeric_columns.columns:
    sns.histplot(data[col], kde=True, bins=20, alpha=0.5, label=col)
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()
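Sixteen overlaid histograms on one set of axes are hard to tell apart. A numeric complement is each column's skewness (positive when the long tail is on the right, negative when it is on the left, near 0 for a symmetric distribution) and excess kurtosis, both built into pandas. A minimal sketch; `shape_summary` and the toy data are hypothetical:

```python
import pandas as pd


def shape_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Skewness and excess kurtosis of every numeric column, most skewed first."""
    numeric = frame.select_dtypes(include="number")
    summary = pd.DataFrame({"skew": numeric.skew(), "kurtosis": numeric.kurt()})
    return summary.sort_values("skew", key=lambda s: s.abs(), ascending=False)


toy = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5],
    "right_tail": [0, 0, 0, 1, 10],
})
print(shape_summary(toy))
```

On the notebook's data this would be `shape_summary(numeric_columns)`.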

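# Horizontal box plot of every numeric column
plt.figure(figsize=(10, 6))
sns.boxplot(data=numeric_columns, orient='h')
plt.title('Box Plot')
plt.xlabel('Value')
plt.tight_layout()  # leave room for the long column names on the y-axis
plt.show()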
