
Week 3 Python Project Business Report

[1] To read the dataset from a CSV file, we import pandas and load the file into a DataFrame (df). The code is:
import pandas as pd
df=pd.read_csv('Laliga_scores.csv')
To replace dashes with 0 so that arithmetic operations can be performed, the code is: df = df.replace('-', 0)
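The loading and dash-replacement steps can be tried without the real file. This is a minimal sketch on a small hypothetical CSV (the team names and points are made up, not taken from Laliga_scores.csv). It also shows why a numeric conversion is still needed afterwards: the replaced column mixes text and integers.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for Laliga_scores.csv; '-' marks missing values,
# so pandas reads the Points column as text (dtype object).
csv_text = """Team,Points
Team A,120
Team B,-
Team C,45
"""
df = pd.read_csv(io.StringIO(csv_text))

# Replace the dash placeholder with 0 so every cell can be parsed as a number.
df = df.replace("-", 0)

# The column now mixes strings and ints, so it is still dtype object;
# pd.to_numeric turns it into an integer column.
df["Points"] = pd.to_numeric(df["Points"])
print(df["Points"].tolist())
```

The report itself converts with astype(str).astype(int), which gives the same integers here.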
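To check the distribution of the best position and report the top position, we use the describe() function. The code is:
df['BestPosition'].describe()
The result is given below. Because BestPosition is stored as text (dtype: object), describe() reports count, unique, top and freq, where top is the most frequent value. The top position is 1, which occurs 9 times.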
count 61
unique 18
top 1
freq 9
Name: BestPosition, dtype: object
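To confirm what "top" means for a text column, the sketch below runs describe() on a few hypothetical BestPosition values (made up for illustration) and checks the result against value_counts().

```python
import pandas as pd

# Hypothetical BestPosition values stored as strings, as they are after reading the CSV.
best = pd.Series(["1", "1", "2", "10", "1", "3"], name="BestPosition")

summary = best.describe()
print(summary)

# For a text column, 'top' is the most frequent value and 'freq' its count,
# which matches the first row of value_counts().
counts = best.value_counts()
assert summary["top"] == counts.index[0]
assert summary["freq"] == counts.iloc[0]
```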

[2] To print all the teams that started playing between 1930 and 1980, using the ‘Debut’ column, the code is:
df['Debut']=df['Debut'].astype(str)
df[df['Debut'].str[:4].between("1930","1980")][['Team', "Debut"]]
The result is given below.
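The string comparison above works because the first four characters are always a four-digit year. A variant that converts the year to an integer before comparing is sketched below on hypothetical data (the Debut values are assumed to look like '1934-35'; the teams are made up).

```python
import pandas as pd

# Hypothetical Debut values; real entries are assumed to look like '1929' or '1929-30'.
df = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C", "Team D"],
    "Debut": ["1929-30", "1934-35", "1980-81", "1995-96"],
})

# Take the first four characters as the debut year and convert them to int,
# so the range check compares numbers rather than strings.
debut_year = df["Debut"].astype(str).str[:4].astype(int)

# between() is inclusive on both ends by default.
selected = df[debut_year.between(1930, 1980)][["Team", "Debut"]]
print(selected)
```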

[3] To print the top 5 teams in terms of points, the code is:
df["Points"] = df["Points"].astype(str).astype(int)
df.sort_values('Points', ascending=False).head()['Team']
The result is given below:

1 Real Madrid
2 Barcelona
3 Atletico Madrid
4 Valencia
5 Athletic Bilbao
Name: Team, dtype: object
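pandas also offers nlargest(), which selects the highest rows directly. A minimal sketch on hypothetical points (made-up teams and values) shows that it gives the same list as sort_values(...).head() when there are no ties.

```python
import pandas as pd

# Hypothetical points table; values are made up for illustration.
df = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C", "Team D", "Team E", "Team F"],
    "Points": [300, 0, 950, 410, 120, 875],
})

# nlargest(5, col) returns the 5 rows with the highest values in that column.
top5 = df.nlargest(5, "Points")["Team"]
print(top5)
```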
[4] To compute the goal difference with a function, the code is:
df["GoalsFor"] = df["GoalsFor"].astype(str).astype(int)
df["GoalsAgainst"] = df["GoalsAgainst"].astype(str).astype(int)

def Goal_diff_count(a, b):
    # goals scored minus goals conceded
    return a - b

df['Goal_diff_count'] = df.apply(lambda f: Goal_diff_count(f['GoalsFor'],f['GoalsAgainst']), axis=1)


A new column named ‘Goal_diff_count’ is added to the dataframe.
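Because pandas subtracts whole columns element-wise, the same column can be built without apply(). This sketch uses a hypothetical three-team table (made-up numbers).

```python
import pandas as pd

# Hypothetical goals data; values are made up for illustration.
df = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C"],
    "GoalsFor": [150, 80, 95],
    "GoalsAgainst": [90, 110, 95],
})

# Column arithmetic is applied element-wise, so the row-by-row apply() with a
# helper function can be replaced by a single subtraction.
df["Goal_diff_count"] = df["GoalsFor"] - df["GoalsAgainst"]
print(df)
```

The vectorized form gives the same values as the apply() version and is usually faster on larger tables.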

[5] The code and results for finding the teams with the maximum and minimum goal difference are:
Code for maximum:
max_goal_diff=df['Goal_diff_count'].max()
df[df['Goal_diff_count']==max_goal_diff][['Team','Goal_diff_count']]

Result:

   Team           Goal_diff_count
1  Real Madrid    2807

Code for minimum:
min_goal_diff=df['Goal_diff_count'].min()
df[df['Goal_diff_count']==min_goal_diff][['Team','Goal_diff_count']]

Result:

   Team               Goal_diff_count
14 Racing Santander   -525
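An alternative lookup uses idxmax() and idxmin(), sketched here on a hypothetical table (made-up teams and values). Note the design difference: the boolean filter used above returns every team tied at the extreme, while idxmax()/idxmin() return only the first one.

```python
import pandas as pd

# Hypothetical goal differences; values are made up for illustration.
df = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C"],
    "Goal_diff_count": [60, -30, 0],
})

# idxmax()/idxmin() give the index label of the first row holding the extreme value;
# .loc then fetches that row's Team and Goal_diff_count.
best = df.loc[df["Goal_diff_count"].idxmax(), ["Team", "Goal_diff_count"]]
worst = df.loc[df["Goal_diff_count"].idxmin(), ["Team", "Goal_diff_count"]]
print(best)
print(worst)
```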

[6] To calculate each team's winning percentage and store it in a new column, the code is:
df["GamesPlayed"] = df["GamesPlayed"].astype(str).astype(int)
df["GamesWon"] = df["GamesWon"].astype(str).astype(int)
def Percentage_of_Winning(t, w):
    # t = games played, w = games won; return 0 when no games were played
    if t > 0:
        pw = w / t * 100
    else:
        pw = 0
    return pw
df['WinningPercent'] = df.apply(lambda f: Percentage_of_Winning(f['GamesPlayed'],f['GamesWon']), axis=1)
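The same calculation can be done on whole columns. This is a sketch on hypothetical data (made-up teams and counts) that mirrors the zero-games branch of Percentage_of_Winning by turning those rows into NaN and then filling them with 0.

```python
import pandas as pd

# Hypothetical match counts; values are made up for illustration.
df = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C"],
    "GamesPlayed": [100, 40, 0],
    "GamesWon": [50, 10, 0],
})

# where() keeps GamesPlayed only where it is positive and sets the rest to NaN,
# so those rows divide to NaN; fillna(0) then maps them to 0, like the else-branch.
played = df["GamesPlayed"].where(df["GamesPlayed"] > 0)
df["WinningPercent"] = (df["GamesWon"] / played * 100).fillna(0)
print(df)
```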

[7] To print the top 5 teams with the highest winning percentage, the code is:
df.sort_values('WinningPercent',ascending=False).head()['Team']
Result:
1 Real Madrid
2 Barcelona
3 Atletico Madrid
4 Valencia
5 Athletic Bilbao
Name: Team, dtype: object
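Printing the percentage next to each team makes the ranking easier to check. A minimal sketch on hypothetical percentages (made-up values):

```python
import pandas as pd

# Hypothetical winning percentages; values are made up for illustration.
df = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C", "Team D", "Team E", "Team F"],
    "WinningPercent": [50.0, 25.0, 0.0, 62.5, 40.0, 37.5],
})

# Keep Team and WinningPercent for the 5 highest rows; round(2) only trims display precision.
top5 = df.nlargest(5, "WinningPercent")[["Team", "WinningPercent"]].round(2)
print(top5)
```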
[8] To group teams by their best position and print the sum of their points for each position, the code is:
df.groupby('BestPosition')['Points'].sum()

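Result (positions are sorted as text, so values such as 10 and 11 come before 2, because BestPosition has dtype object):
BestPosition  Points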
1 27933
10 450
11 445
12 511
14 71
15 14
16 81
17 266
19 81
2 6904
20 34
3 5221
4 6563
5 1884
6 2113
7 1186
8 1134
9 96
Name: Points, dtype: int32
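To list positions in numeric order, the key column can be converted to integers before grouping. This is a sketch on hypothetical data (made-up positions and points).

```python
import pandas as pd

# Hypothetical data with BestPosition stored as text, as in the report.
df = pd.DataFrame({
    "BestPosition": ["1", "10", "2", "1", "3", "2"],
    "Points": [500, 40, 300, 450, 200, 150],
})

# As text, groupby would order the keys "1", "10", "2", "3".
# Converting the key to int first gives numeric order 1, 2, 3, 10.
df["BestPosition"] = df["BestPosition"].astype(int)
points_by_position = df.groupby("BestPosition")["Points"].sum()
print(points_by_position)
```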
