Professional Documents
Culture Documents
DAV Prac BHR
DAV Prac BHR
DAV Prac BHR
Exercise: 1
1. Write a program to find correlation between gender and Semester marks.
Input:
import pandas as pd
import numpy as np
df = pd.DataFrame(data)
corr_gender_semester
df['Gender'].str.get_dummies().corrwith(df['Semester_marks'])
print(corr_gender_semester)
Female 0.118725
Male -0.118725
dtype: float64
210020116055
Input:
import pandas as pd
import numpy as np
import random
data = {
df = pd.DataFrame(data)
210020116051
corr_location_semester =
pd.get_dummies(df['Geographical_location']).corrwith(df['Semester_marks'])
print(corr_location_semester)
print(corr_gender_semester)
print(corr_location_semester)
Output:
Location 1 0.158657
Location 10 0.178196
210020116055
Location 11 -0.144198
Location 12 0.100040
...
Location 95 -0.114890
Location 96 0.144003
Location 97 0.021884
Location 98 -0.114890
Location 99 0.051192
Female 0.118725
Male -0.118725
dtype: float64
Location 1 0.158657
Location 10 0.178196
Location 11 -0.144198
Location 12 0.100040
210020116051
...
Location 95 -0.114890
Location 96 0.144003
Location 97 0.021884
Location 98 -0.114890
Location 99 0.051192
Exercise:1(b)
1. Write a program to calculate correlation between difficulty level and subject marks.
Input:
df['Difficulty_level'] = df['Difficulty_level'].map(difficulty_map)
print(corr_gender_semester)
print(corr_location_semester)
if corr_difficulty_subject < 0:
else:
Output:
Female 0.118725
Male -0.118725
dtype: float64
Location 1 0.158657
Location 10 0.178196
Location 11 -0.144198
Location 12 0.100040
...
Location 95 -0.114890
Location 96 0.144003
Location 97 0.021884
Location 98 -0.114890
Location 99 0.051192
Length: 100,
210020116055
dtype: float64
Exercise:2
1. Write a program to find out mean and standard deviation.
Input:
import pandas as pd
import numpy as np
import random
college_data = {
college_df = pd.DataFrame(college_data)
college_mean = college_df['University_exam_score'].mean()
college_std = college_df['University_exam_score'].std()
gujarat_data = {
'College': ['College B', 'College C', 'College D', 'College E'] * 10,
gujarat_df = pd.DataFrame(gujarat_data)
gujarat_mean = gujarat_df.groupby('College')['University_exam_score'].mean()
gujarat_std = gujarat_df.groupby('College')['University_exam_score'].std()
print(gujarat_mean)
print(gujarat_std)
print("\nObservations:")
print("- The mean university exam score for College A is higher than the mean score
print("- The standard deviation of university exam scores for College A is lower than
print("- This suggests that the performance of students in College A is generally better
Output:
Mean university exam score for College A: 65.42
College
College B 65.4
College C 55.3
College D 66.3
College E 65.1
College
College B 21.582915
College C 22.837834
College D 22.514440
College E 26.320461
Exercise:3
1. Plot this time series Data. Analyze the trend as per time.
Input:
import pandas as pd
data = {
'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
df = pd.DataFrame(data)
plt.figure(figsize=(12, 6))
plt.xlabel('Month')
plt.ylabel('COVID Cases')
plt.legend()
plt.show()
print("Observations:")
print("- The trend for all cities shows an increasing trend in COVID cases over time.")
print("- The peak COVID cases occur in the months of May and June for all cities.")
print("- The trend for each city may vary based on local factors such as population size,
output:
Observations:
- The trend for all cities shows an increasing trend in COVID cases over time.
- The peak COVID cases occur in the months of May and June for all cities.
- The trend for each city may vary based on local factors such as population size,
Exercise:4
1. Write a program to apply random forest algorithm and suggest the best suited
Input:
'Student_Research_Experience', 'Student_Internship_Experience',
'Student_Scholarship']
college_data = {
college_df = pd.DataFrame(college_data)
X = college_df[features]
y = college_df['College']
rf_model.fit(X_train, y_train)
predictions = rf_model.predict(X_test)
print(f"Accuracy: {accuracy:.2f}")
student_data = {
'Student_SAT_score': [1800],
210020116055
'Student_TOEFL_score': [80],
'Student_GPA': [3.5],
'Student_Research_Experience': [1],
'Student_Internship_Experience': [1],
'Student_Scholarship': [1]
student_df = pd.DataFrame(student_data)
predicted_college = rf_model.predict(student_df[features])[0]
Output:
Accuracy: 0.00
Exercise:5
1. Visualize the data to show region wise results, branch wise results, subject wise
results.
Input:
import pandas as pd
student_data = pd.DataFrame({
"Civil Engineering"],
})
grouped_data = data.groupby(key)
averages = grouped_data['score'].mean().reset_index()
return average
# Region-wise Results
plt.figure(figsize=(10, 6))
plt.xlabel('Region')
plt.ylabel('Average Score')
plt.title('Region-wise Results')
plt.show()
# Branch-wise Results
plt.figure(figsize=(10, 6))
210020116055
plt.xlabel('Branch')
plt.ylabel('Average Score')
plt.title('Branch-wise Results')
plt.xticks(rotation=45, ha='right')
plt.show()
# Subject-wise Results
plt.figure(figsize=(10, 6))
plt.xlabel('Subject')
plt.ylabel('Average Score')
plt.title('Subject-wise Results')
plt.xticks(rotation=45, ha='right')
plt.show()
210020116055
output:
210020116055
210020116055
210020116055
Exercise:6
1. Take year wise population.
2. Show appropriate size circle for population as per year.
3. Fill color in circle.
4. Prepare bar chart and pie chart.
5. Explore other functionality of D3.js
Input:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>D3.js Population Visualization</title>
<script src="https://d3js.org/d3.v7.min.js"></script>
<style>
.bar {
fill: steelblue;
}
.bar:hover {
fi5ll: orange;
}
</style>
</head>
<body>
<h2>Year-wise Population</h2>
<svg id="population-svg" width="800" height="400"></svg>
<h2>Bar Chart</h2>
<svg id="bar-chart" width="800" height="400"></svg>
<h2>Pie Chart</h2>
<svg id="pie-chart" width="800" height="400"></svg>
210020116055
<script>
// Sample data
const populationData = [
{ year: 2020, population: 1000 },
{ year: 2021, population: 1200 },
{ year: 2022, population: 1500 },
{ year: 2023, population: 1800 },
{ year: 2024, population: 2000 }
];
// SVG for population circles
const svgPopulation = d3.select("#population-svg");
svgPopulation.selectAll("circle")
.data(populationData)
.enter().append("circle")
.attr("cx", (d, i) => 100 + i * 100)
.attr("cy", 100)
.attr("r", d => Math.sqrt(d.population) / 10)
.attr("fill", "steelblue");
// Bar Chart
const svgBar = d3.select("#bar-chart");
const xScale = d3.scaleBand()
.domain(populationData.map(d => d.year))
.range([0, 800])
.padding(0.1);
const yScale = d3.scaleLinear()
.domain([0, d3.max(populationData, d => d.population)])
.range([400, 0]);
svgBar.selectAll("rect")
.data(populationData)
.enter().append("rect")
210020116055
.attr("class", "bar")
.attr("x", d => xScale(d.year))
.attr("y", d => yScale(d.population))
.attr("width", xScale.bandwidth())
.attr("height", d => 400 - yScale(d.population));
// Pie Chart
const svgPie = d3.select("#pie-chart");
const pie = d3.pie()
.value(d => d.population)
.sort(null);
const arc = d3.arc()
.innerRadius(0)
.outerRadius(200);
const arcs = svgPie.selectAll("arc")
.data(pie(populationData))
.enter()
.append("g")
.attr("class", "arc")
.attr("transform", "translate(400,200)");
arcs.append("path")
.attr("fill", (d, i) => d3.schemeCategory10[i])
.attr("d", arc);
</script>
</body>
</html>
210020116055
Output:
210020116055
210020116055