Matchdata - Ipynb - Colaboratory


10/6/23, 9:51 PM Untitled5.ipynb - Colaboratory

First, import the pandas library and load the CSV file into a DataFrame.

import pandas as pd
df=pd.read_csv('matchdata - Sheet1 (1).csv')
df

     Player                       Span       Mat  Inns   NO  Runs   HS    Ave    BF       SR     100  50
0    SR Tendulkar (IND)           1989-2012  463  452.0  41  18426  200*  44.83  21368.0  86.23  49   96
1    KC Sangakkara (Asia/ICC/SL)  2000-2015  404  380.0  41  14234  169   41.98  18048.0  78.86  25   93
2    RT Ponting (AUS/ICC)         1995-2012  375  365.0  39  13704  164   42.03  17046.0  80.39  30   82
3    ST Jayasuriya (Asia/SL)      1989-2011  445  433.0  18  13430  189   32.36  14725.0  91.20  28   68
4    V Kohli (IND)                2008-2023  281  269.0  41  13083  183   57.38  13950.0  93.78  47   66
...  ...                          ...        ...  ...    ... ...    ...   ...    ...      ...    ...  ...
89   PD Collingwood (ENG)         2001-2011  197  181.0  37  5092   120*  35.36  6614.0   76.98  5    26
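Before cleaning, it can help to look at the shape and the column dtypes of the loaded frame. The sketch below is only an illustration: it reads a tiny inline CSV (three rows and five columns copied from the output above) instead of the real file, and the names `csv_text` and `sample` are made up for this example. With the real data the same attributes can be read from `df`.

```python
import io
import pandas as pd

# Illustrative stand-in for the real CSV: three rows and five columns
# copied from the output above.
csv_text = """Player,Span,Mat,Runs,HS
SR Tendulkar (IND),1989-2012,463,18426,200*
KC Sangakkara (Asia/ICC/SL),2000-2015,404,14234,169
RT Ponting (AUS/ICC),1995-2012,375,13704,164
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # HS is read as text because of the '*' not-out marker
```

Columns such as HS that mix digits with markers like `*` are not numeric after loading, which matters if they are later used in calculations.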

Now rename the columns so they are easier to understand, using the .rename() function, then call df.head() to check the changes.

df = df.rename(columns = {'Mat':'Match','NO':'Not Outs','HS':'Highest Score','BF':'Balls Faced','Ave':'Average','SR':'Strike Rate','0':'Ducks','4s':'Fours(4)','6s':'Sixes(6)'})


df.head()

   Player                       Span       Match  Inns   Not Outs  Runs   Highest Score  Average  Balls Faced  Strike Rate
0  SR Tendulkar (IND)           1989-2012  463    452.0  41        18426  200*           44.83    21368.0      86.23
1  KC Sangakkara (Asia/ICC/SL)  2000-2015  404    380.0  41        14234  169            41.98    18048.0      78.86
2  RT Ponting (AUS/ICC)         1995-2012  375    365.0  39        13704  164            42.03    17046.0      80.39
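By default, rename silently skips any key in the mapping that is not an existing column, so a typo in an old column name goes unnoticed. A minimal sketch, assuming a small made-up frame named `sample`, of how the `errors="raise"` option surfaces such mistakes:

```python
import pandas as pd

# Small illustrative frame; the column names mirror the original CSV headers.
sample = pd.DataFrame({"Mat": [463], "NO": [41], "HS": ["200*"]})

new_names = {"Mat": "Match", "NO": "Not Outs", "HS": "Highest Score"}

# errors="raise" makes pandas complain if a key in the mapping is not a column.
renamed = sample.rename(columns=new_names, errors="raise")
print(list(renamed.columns))

# A misspelled source column now raises KeyError instead of being ignored.
try:
    sample.rename(columns={"Matches": "Match"}, errors="raise")
except KeyError as exc:
    print("missing column:", exc)
```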

Now check for null values.

df.isnull().any()

Player False
Span False
Match True
Inns True
Not Outs False
Runs False
Highest Score True
Average False
Balls Faced True
Strike Rate True
100 False
50 False
Ducks False
Fours(4) True
Sixes(6) False
dtype: bool
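The .any() check only says whether a column has at least one null. To see how many values are missing in each column, the nulls can be counted instead. The frame below is a small illustration built from rows shown in this notebook, not the full dataset, and `sample` and `null_counts` are names chosen for the example.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a few gaps; values are taken from rows in this notebook.
sample = pd.DataFrame({
    "Player": ["Inzamam-ul-Haq (Asia/PAK)", "R Dravid (Asia/ICC/IND)", "V Kohli (IND)"],
    "Balls Faced": [np.nan, 15285.0, 13950.0],
    "Strike Rate": [74.24, np.nan, 93.78],
})

null_counts = sample.isnull().sum()   # number of missing cells per column
print(null_counts[null_counts > 0])   # only the columns that need attention
```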

Some columns contain null values (the ones marked True), so these values need to be filled. First check which rows hold the null value in each column, using the .isna() method.

df[df['Match'].isna()==1]

Player  Span   Match  Inns  Not Outs  Runs  Highest Score  Average  Balls Faced  Strike Rate  100
Q de    2013-

One row has a null (NaN) Match value, so it has to be filled with some value. Here it is filled with 0 using the .fillna() method.

df['Match']=df['Match'].fillna(0)

Similarly, repeat this step for every column with missing values.

df[df['Inns'].isna()==1]

    Player              Span       Match  Inns  Not Outs  Runs  Highest Score  Average  Balls Faced  Strike Rate  100
18  PA de Silva (SL)    1984-2003  308    NaN   30        9284  145            34.90    11443.0      81.13        11
35  Shoaib Malik (PAK)  1999-2019  287    NaN   40        7534  143            34.55    9199.0       81.90        9

df['Inns']=df['Inns'].fillna(0)

https://colab.research.google.com/drive/1y3gYMvP6_blXC19LMusye-tcAmlWpfTi#scrollTo=oSZt2D7_1p2x&printMode=true 1/3
df[df['Highest Score'].isna()==1]

Player  Span   Match  Inns  Not Outs  Runs  Highest Score  Average  Balls Faced  Strike Rate  100
G       2003-

df['Highest Score']=df['Highest Score'].fillna(0)

df[df['Balls Faced'].isna()==1]

   Player                     Span       Match  Inns   Not Outs  Runs   Highest Score  Average  Balls Faced  Strike Rate
6  Inzamam-ul-Haq (Asia/PAK)  1991-2007  378    350.0  53        11739  137*           39.52    NaN          74.24
   S                          1994

df['Balls Faced']=df['Balls Faced'].fillna(0)

df[df['Strike Rate'].isna()==1]

    Player                   Span       Match  Inns   Not Outs  Runs   Highest Score  Average  Balls Faced  Strike Rate
9   R Dravid (Asia/ICC/IND)  1996-2011  344    318.0  40        10889  153            39.16    15285.0      NaN
15  Mohammad Yousuf          1998-2010  288    273.0  40        9720   141*           41.71    12942.0      NaN

df['Strike Rate']=df['Strike Rate'].fillna(0)
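Filling with 0 records a strike rate of zero, which no batter with thousands of runs actually has. Since strike rate is runs scored per 100 balls faced, a missing value can instead be derived whenever Runs and Balls Faced are both present. The sketch below shows that alternative on the two affected rows; it is not what the notebook does, and the `sample` and `derived` names are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Two rows from the output above, with the Strike Rate gaps left as NaN.
sample = pd.DataFrame({
    "Player": ["R Dravid (Asia/ICC/IND)", "Mohammad Yousuf"],
    "Runs": [10889, 9720],
    "Balls Faced": [15285.0, 12942.0],
    "Strike Rate": [np.nan, np.nan],
})

# Strike rate = runs per 100 balls faced.
derived = (sample["Runs"] / sample["Balls Faced"] * 100).round(2)

# Only the gaps are filled; any existing values would be left untouched.
sample["Strike Rate"] = sample["Strike Rate"].fillna(derived)
print(sample)
```

The same idea does not work for rows where Balls Faced is itself missing, so those gaps still need a separate decision.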

df[df['Fours(4)'].isna()==1]

    Player              Span       Match  Inns  Not Outs  Runs  Highest Score  Average  Balls Faced  Strike Rate  100
35  Shoaib Malik (PAK)  1999-2019  287    0.0   40        7534  143            34.55    9199.0       81.90        9

df['Fours(4)']=df['Fours(4)'].fillna(0)
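The column-by-column fills above can also be written as a single call, because fillna accepts a dictionary that maps column names to fill values. A minimal sketch on a made-up three-column frame (`sample` and `fill_values` are illustrative names):

```python
import numpy as np
import pandas as pd

# Illustrative frame using the renamed column names from this notebook.
sample = pd.DataFrame({
    "Match": [463.0, np.nan],
    "Inns": [452.0, np.nan],
    "Balls Faced": [np.nan, 9199.0],
})

# One fill value per column; columns not listed are left as they are.
fill_values = {"Match": 0, "Inns": 0, "Balls Faced": 0}
sample = sample.fillna(value=fill_values)

print(sample.isnull().any())
```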

Now check whether all the columns are filled.

df.isnull().any()

Player False
Span False
Match False
Inns False
Not Outs False
Runs False
Highest Score False
Average False
Balls Faced False
Strike Rate False
100 False
50 False
Ducks False
Fours(4) False
Sixes(6) False
dtype: bool

It shows that all the columns are filled.

Now check for duplications.

df.duplicated()

0 False
1 False
2 False
3 False
4 False
...
89 False
90 False
91 False
92 False
93 False
Length: 94, dtype: bool
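Scanning a long True/False listing by eye is error-prone. Counting the flagged rows, or showing every copy of a repeated row, gives a quicker answer. The frame below is a small illustration with one repeated row copied from this notebook; `sample`, `n_dupes` and `all_copies` are names chosen for the example.

```python
import pandas as pd

# Illustrative frame with one repeated row (values copied from the notebook output).
sample = pd.DataFrame({
    "Player": ["A Ranatunga (SL)", "V Kohli (IND)", "A Ranatunga (SL)"],
    "Runs": [7456, 13083, 7456],
})

# Rows that repeat an earlier row.
n_dupes = sample.duplicated().sum()

# keep=False marks every copy, including the first occurrence.
all_copies = sample[sample.duplicated(keep=False)]

print(n_dupes)
print(all_copies)
```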

df[df['Player'].duplicated()==1]

    Player            Span       Match  Inns   Not Outs  Runs  Highest Score  Average  Balls Faced  Strike Rate  100
44  A Ranatunga (SL)  1982-1999  269    255.0  47        7456  131*           35.84    9571.0       77.90

This shows that there are duplicated rows, so they have to be removed. First list every row for the two affected players.

df[df['Player'].isin(['A Ranatunga (SL)','BRM Taylor (ZIM)'])]


    Player            Span       Match  Inns   Not Outs  Runs  Highest Score  Average  Balls Faced  Strike Rate  100
36  A Ranatunga (SL)  1982-1999  269    255.0  47        7456  131*           35.84    9571.0       77.90
44  A Ranatunga (SL)  1982-1999  269    255.0  47        7456  131*           35.84    9571.0       77.90

Now we have to drop duplicate rows.

df=df.drop_duplicates()
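drop_duplicates keeps the first copy of each repeated row and leaves the original index labels in place, so the index has gaps afterwards. If a continuous index is wanted, it can be rebuilt with reset_index. This is an optional extra step sketched on a small made-up frame, not something the notebook does.

```python
import pandas as pd

# Illustrative frame with one repeated row.
sample = pd.DataFrame({
    "Player": ["A Ranatunga (SL)", "V Kohli (IND)", "A Ranatunga (SL)"],
    "Runs": [7456, 13083, 7456],
})

# keep="first" (the default) keeps the earliest copy of each repeated row.
deduped = sample.drop_duplicates(keep="first")
print(list(deduped.index))   # original labels survive the drop

# reset_index(drop=True) renumbers the rows from 0 without adding a column.
deduped = deduped.reset_index(drop=True)
print(list(deduped.index))
```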

df[df['Player'].isin(['A Ranatunga (SL)','BRM Taylor (ZIM)'])]

Player Span Match Inns Not Outs Runs Highest Score Average Balls Faced Strike Rate 100 50 Ducks Fours(4) Sixes(6)

36 A Ranatunga (SL) 1982-1999 269 255.0 47 7456 131* 35.84 9571.0 77.90 4 49 18 523+ 64+

51 BRM Taylor (ZIM) 2004-2021 205 203.0 15 6684 145* 35.55 8721.0 76.64 11 39 15 599 106

Again we check for duplications and it shows no duplicate data.

df[df['Player'].duplicated()==1]

Player Span Match Inns Not Outs Runs Highest Score Average Balls Faced Strike Rate 100 50 Ducks Fours(4) Sixes(6)

Now we have the final data, cleaned using the pandas library.

df

1 to 25 of 92 entries
index Player Span Match Inns Not Outs Runs Highest Score Average Balls Faced Strike Rate 100 50 Ducks Fours(4) Sixes(6)
0 SR Tendulkar (IND) 1989-2012 463 452.0 41 18426 200* 44.83 21368.0 86.23 49 96 20 2016 195
1 KC Sangakkara (Asia/ICC/SL) 2000-2015 404 380.0 41 14234 169 41.98 18048.0 78.86 25 93 15 1385 88
2 RT Ponting (AUS/ICC) 1995-2012 375 365.0 39 13704 164 42.03 17046.0 80.39 30 82 20 1231 162
3 ST Jayasuriya (Asia/SL) 1989-2011 445 433.0 18 13430 189 32.36 14725.0 91.2 28 68 34 1500 270
4 V Kohli (IND) 2008-2023 281 269.0 41 13083 183 57.38 13950.0 93.78 47 66 15 1226 142
5 DPMD Jayawardene (Asia/SL) 1998-2015 448 418.0 39 12650 144 33.37 16020.0 78.96 19 77 28 1119 76
6 Inzamam-ul-Haq (Asia/PAK) 1991-2007 378 350.0 53 11739 137* 39.52 0.0 74.24 10 83 20 971 144
7 JH Kallis (Afr/ICC/SA) 1996-2014 328 314.0 53 11579 139 44.36 15885.0 72.89 17 86 17 911 137
8 SC Ganguly (Asia/IND) 1992-2007 311 300.0 23 11363 183 41.02 15416.0 73.7 22 72 16 1122 190
9 R Dravid (Asia/ICC/IND) 1996-2011 344 318.0 40 10889 153 39.16 15285.0 0.0 12 83 13 950 42
10 MS Dhoni (Asia/IND) 2004-2019 350 297.0 84 10773 183* 50.57 12303.0 87.56 10 73 10 826 229
11 CH Gayle (ICC/WI) 1999-2019 301 294.0 17 10480 215 37.83 12019.0 87.19 25 54 25 1128 331
12 BC Lara (ICC/WI) 1990-2007 299 289.0 32 10405 169 40.48 13086.0 79.51 19 63 16 1042 133
13 TM Dilshan (SL) 1999-2016 330 303.0 41 10290 161* 39.27 11933.0 86.23 22 47 11 1111 55
14 RG Sharma (IND) 2007-2023 251 243.0 36 10112 264 48.85 11170.0 90.52 30 52 15 928 292
15 Mohammad Yousuf (Asia/PAK) 1998-2010 288 273.0 40 9720 141* 41.71 12942.0 0.0 15 64 15 785 90
16 AB de Villiers (Afr/SA) 2005-2018 228 218.0 39 9577 176 53.5 9473.0 101.09 25 53 7 840 204
17 M Azharuddin (IND) 1985-2000 334 308.0 54 9378 153* 36.92 12669.0 74.02 7 58 9 622+ 77+
18 PA de Silva (SL) 1984-2003 308 0.0 30 9284 145 34.9 11443.0 81.13 11 64 17 712+ 102+
19 Saeed Anwar (PAK) 1989-2003 247 244.0 19 8824 194 39.21 10938.0 80.67 20 43 15 938 97
20 S Chanderpaul (WI) 1994-2011 268 251.0 40 8778 150 41.6 0.0 70.74 11 59 6 722 85
21 Yuvraj Singh (Asia/IND) 2000-2017 304 278.0 40 8701 150 36.55 9924.0 87.67 14 52 18 908 155
22 DL Haynes (WI) 1978-1994 238 237.0 28 8648 152* 41.37 13707.0 63.09 17 57 13 768+ 53+
23 LRPL Taylor (NZ) 2006-2022 236 220.0 39 8607 181* 47.55 10330.0 83.32 21 51 9 713 147
24 MS Atapattu (SL) 1990-2007 268 259.0 32 8529 132* 37.57 12594.0 67.72 11 59 13 734 15
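Two columns in the cleaned table still hold text rather than numbers: Highest Score uses a trailing `*` for a not-out innings, and Span packs two years into one string. As a possible next step, they could be split into numeric columns. The sketch below uses three rows from the table above; the new column names (`HS Not Out`, `HS Runs`, `Start Year`, `End Year`) are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Three rows from the final table above.
sample = pd.DataFrame({
    "Player": ["SR Tendulkar (IND)", "KC Sangakkara (Asia/ICC/SL)", "MS Dhoni (Asia/IND)"],
    "Span": ["1989-2012", "2000-2015", "2004-2019"],
    "Highest Score": ["200*", "169", "183*"],
})

# Hypothetical helper columns (not in the original notebook).
hs = sample["Highest Score"].astype(str)
sample["HS Not Out"] = hs.str.endswith("*")          # True when the innings was not out
sample["HS Runs"] = hs.str.rstrip("*").astype(int)   # numeric part of the score

# Split "1989-2012" into numeric start and end years.
years = sample["Span"].str.split("-", expand=True).astype(int)
sample["Start Year"] = years[0]
sample["End Year"] = years[1]

print(sample)
```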