
Introduction to Data Science
Pandas

Generate Series

• import pandas as pd
• pd.Series(['a', 'b', 'c'])
• pd.Series([1, 2, 3])
• pd.Series(['a', 2, True])
• pd.Series(['a', 'b', 'c'], index=['x', 'y', 'z'])
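A minimal sketch of how the element types and the optional index shape the resulting Series. The variable names (numbers, mixed, labeled) are illustrative and not from the slides.

```python
import pandas as pd

numbers = pd.Series([1, 2, 3])        # all integers -> integer dtype
mixed = pd.Series(["a", 2, True])     # mixed types -> generic object dtype
labeled = pd.Series(["a", "b", "c"], index=["x", "y", "z"])  # custom labels

print(numbers.dtype)
print(mixed.dtype)
print(labeled["y"])                   # look up a value by its label
print(list(labeled.index))
```

Without an explicit index, pandas numbers the entries 0, 1, 2, and so on.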

Series Index and Slicing

• my_series = pd.Series(['ayam', 'bebek', 'cacing', 'domba', 'elang'],
  index=['a', 'b', 'c', 'd', 'e'])
• my_series.iloc[0]
• my_series.iloc[2]
• my_series['c']
• my_series[1:4]
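The difference between positional and label-based slicing is easy to miss, so here is a small sketch using the same Series. It assumes that explicit .iloc (positions) and .loc (labels) are preferred over bare brackets; the result names are illustrative.

```python
import pandas as pd

my_series = pd.Series(["ayam", "bebek", "cacing", "domba", "elang"],
                      index=["a", "b", "c", "d", "e"])

first = my_series.iloc[0]            # first element by position
by_position = my_series.iloc[1:4]    # positions 1, 2, 3 (end is excluded)
by_label = my_series.loc["b":"d"]    # labels b, c, d (end is included)

print(first)
print(by_position)
print(by_label)
```

Both slices select the same three items, but only the label slice includes its end point.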

Manipulate Series

• my_series = pd.Series(['ayam', 'bebek', 'cacing'])
• my_series_1 = pd.Series(['fox', 'giraffe'])
• my_series_2 = pd.concat([my_series, my_series_1])
• my_series_2.sort_index()
• my_series_2.reset_index(drop=True, inplace=True)
• my_series_2[3] = 'deer'
• my_series_2['x'] = 'hippo'
• my_series.drop(labels=0, inplace=True)
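A short sketch of why the index is reset after concatenation: pd.concat keeps each input's original labels, so duplicates appear. The names animals, more, combined, renumbered, and dropped are illustrative, and the non-inplace style is an assumption rather than the slides' choice.

```python
import pandas as pd

animals = pd.Series(["ayam", "bebek", "cacing"])
more = pd.Series(["fox", "giraffe"])

combined = pd.concat([animals, more])
print(list(combined.index))            # labels from both inputs are kept

# ignore_index=True renumbers the result in one step
renumbered = pd.concat([animals, more], ignore_index=True)
print(list(renumbered.index))

# Without inplace=True, drop returns a new Series and leaves the original alone
dropped = animals.drop(labels=0)
print(list(dropped))
```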
Generate DataFrame (1)

• dict_1 = {"Nama": "Ani", "Umur": 21, "WNI": True, "Gender": "Female", "GPA": 3.5}
• dict_2 = {"Nama": "Budi", "Umur": 20, "WNI": True, "Gender": "Male", "GPA": 3.3}
• dict_3 = {"Nama": "Charlie", "Umur": 23, "WNI": False, "Gender": "Male", "GPA": 3.2}
• dict_4 = {"Nama": "Devi", "Umur": 20, "WNI": True, "Gender": "Female", "GPA": 3.7}

• bio_ani = pd.Series(dict_1)
• bio_budi = pd.Series(dict_2)
• bio_charlie = pd.Series(dict_3)
• bio_devi = pd.Series(dict_4)

• df_bimbingan = pd.DataFrame([bio_ani,bio_budi,bio_charlie,bio_devi])

Generate DataFrame (2)

• dict_bimbingan = {"Nama": ['ani', 'budi', 'caca'],
  "Umur": [20, 22, 21],
  "Gender": ['F', 'M', 'F']}
• pd.DataFrame(dict_bimbingan)
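The two construction styles (one record per row versus one list per column) can be compared directly. This sketch reuses the values above; the names rows, columns, df_from_rows, and df_from_columns are illustrative.

```python
import pandas as pd

# Row-oriented: a list of records, one dict per row
rows = [
    {"Nama": "ani", "Umur": 20, "Gender": "F"},
    {"Nama": "budi", "Umur": 22, "Gender": "M"},
    {"Nama": "caca", "Umur": 21, "Gender": "F"},
]

# Column-oriented: one list per column
columns = {
    "Nama": ["ani", "budi", "caca"],
    "Umur": [20, 22, 21],
    "Gender": ["F", "M", "F"],
}

df_from_rows = pd.DataFrame(rows)
df_from_columns = pd.DataFrame(columns)

print(df_from_rows.equals(df_from_columns))
print(df_from_rows.shape)
```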

Indexing DataFrame

• df_bimbingan['Nama']
• df_bimbingan[['Nama', 'Umur']]

• df_bimbingan.iloc[1]
• df_bimbingan.iloc[1:3]

• df_bimbingan.iloc[1:4,2:5]
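A sketch of what each indexing form returns, using the four student records from the earlier slide. The short name df and the label-based .loc equivalent are additions for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {"Nama": "Ani", "Umur": 21, "WNI": True, "Gender": "Female", "GPA": 3.5},
    {"Nama": "Budi", "Umur": 20, "WNI": True, "Gender": "Male", "GPA": 3.3},
    {"Nama": "Charlie", "Umur": 23, "WNI": False, "Gender": "Male", "GPA": 3.2},
    {"Nama": "Devi", "Umur": 20, "WNI": True, "Gender": "Female", "GPA": 3.7},
])

one_column = df["Nama"]               # single label -> Series
two_columns = df[["Nama", "Umur"]]    # list of labels -> DataFrame
row = df.iloc[1]                      # second row, returned as a Series
block = df.iloc[1:4, 2:5]             # rows 1-3, columns at positions 2-4

# The same block selected by labels; label slices include both ends
same_block = df.loc[1:3, "WNI":"GPA"]

print(type(one_column).__name__, type(two_columns).__name__)
print(row["Nama"])
print(block)
```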
Query DataFrame

• df_bimbingan[df_bimbingan["Gender"]=="Male"]
• df_bimbingan[df_bimbingan["GPA"]>=3.5]

• df_bimbingan[(df_bimbingan["GPA"] >= 3.5) &
  (df_bimbingan["Gender"] == "Male")]
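When conditions are combined, each comparison needs its own parentheses because & binds more tightly than >= and ==. The sketch below uses the four student records from the earlier slide; the .query form is shown as an optional alternative, and the result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    {"Nama": "Ani", "Umur": 21, "WNI": True, "Gender": "Female", "GPA": 3.5},
    {"Nama": "Budi", "Umur": 20, "WNI": True, "Gender": "Male", "GPA": 3.3},
    {"Nama": "Charlie", "Umur": 23, "WNI": False, "Gender": "Male", "GPA": 3.2},
    {"Nama": "Devi", "Umur": 20, "WNI": True, "Gender": "Female", "GPA": 3.7},
])

# A single condition produces a boolean mask that filters the rows
high_gpa = df[df["GPA"] >= 3.5]

# Combined conditions: wrap each comparison in parentheses
male_high = df[(df["GPA"] >= 3.5) & (df["Gender"] == "Male")]

# The same kind of filter written as a query string
female_high = df.query("GPA >= 3.5 and Gender == 'Female'")

print(list(high_gpa["Nama"]))
print(len(male_high))
print(list(female_high["Nama"]))
```

With these four records no male student reaches a GPA of 3.5, so the combined filter returns an empty DataFrame.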

DataFrame Manipulation

• temp = {"Nama": "Eko", "Umur": 20, "WNI": True, "Gender": "Male", "GPA": 3.5}
• df_bimbingan = pd.concat([df_bimbingan, pd.DataFrame(temp,
index=[0])], ignore_index=True)
• df_bimbingan.loc[5] = temp
• df_bimbingan.drop(5, axis=0,inplace=True)
• df_bimbingan["Strata"] = ["S2", "S2", "S1", "S1", "S1"]
• df_bimbingan.loc[6] = ["Farah", 22, True, "Female", 3.9, "S2"]
• df_bimbingan.drop(['Strata'], axis=1, inplace=True)
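The same add-and-drop steps can be sketched without inplace=True, so that every step returns a new object. The two-column frame and the names new_row, without_column, and without_last are simplified stand-ins, not the slides' data.

```python
import pandas as pd

df = pd.DataFrame({"Nama": ["Ani", "Budi"], "Umur": [21, 20]})

# Append a row: wrap the record in a one-row DataFrame, then concatenate
new_row = pd.DataFrame([{"Nama": "Eko", "Umur": 20}])
df = pd.concat([df, new_row], ignore_index=True)

# Add a column: the list length must match the number of rows
df["Strata"] = ["S2", "S2", "S1"]

# Drop a column or a row; both calls return a new DataFrame
without_column = df.drop(columns="Strata")
without_last = df.drop(index=2)

print(df)
print(list(without_column.columns))
print(list(without_last.index))
```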
DataFrame Info and Stat

• df_bimbingan.info()
• df_bimbingan.Gender.unique()
• df_bimbingan.Gender.value_counts()
• df_bimbingan.Umur.max()
• df_bimbingan.GPA.mean()
• df_bimbingan.GPA.describe()
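A small sketch of the summary calls on the four student records from the earlier slide. The normalize=True option and the bracket-style column access are additions for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {"Nama": "Ani", "Umur": 21, "WNI": True, "Gender": "Female", "GPA": 3.5},
    {"Nama": "Budi", "Umur": 20, "WNI": True, "Gender": "Male", "GPA": 3.3},
    {"Nama": "Charlie", "Umur": 23, "WNI": False, "Gender": "Male", "GPA": 3.2},
    {"Nama": "Devi", "Umur": 20, "WNI": True, "Gender": "Female", "GPA": 3.7},
])

counts = df["Gender"].value_counts()                 # rows per category
shares = df["Gender"].value_counts(normalize=True)   # same, as fractions
oldest = df["Umur"].max()
mean_gpa = df["GPA"].mean()

print(counts)
print(shares)
print(oldest, mean_gpa)
print(df["GPA"].describe())    # count, mean, std, min, quartiles, max
```

Bracket access such as df["Gender"] also works for column names that contain spaces, where attribute access does not.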

DataFrame Pivot

• df_bimbingan.pivot(columns='Gender', values='Umur')
• df_bimbingan.pivot(columns='Gender', values='Umur').mean()
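To make the pivot result concrete, this sketch spreads the ages into one column per gender and then averages them. pivot_table is shown as an assumed alternative that reshapes and aggregates in one call; the data is a trimmed copy of the four student records.

```python
import pandas as pd

df = pd.DataFrame({
    "Nama": ["Ani", "Budi", "Charlie", "Devi"],
    "Umur": [21, 20, 23, 20],
    "Gender": ["Female", "Male", "Male", "Female"],
})

# One column per gender; each row keeps its age only in its own column,
# and the other column is left as NaN
wide = df.pivot(columns="Gender", values="Umur")
print(wide)

# Column means skip NaN, giving the average age per gender
mean_age = wide.mean()
print(mean_age)

# Alternative: reshape and aggregate in a single call
table = df.pivot_table(index="Gender", values="Umur", aggfunc="mean")
print(table)
```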

DataFrame Group by

• df_bimbingan.groupby('Gender').mean(numeric_only=True)
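Beyond a single mean, groupby can compute several statistics at once. This is a sketch using named aggregation on the four student records; the output column names mean_age, max_gpa, and n are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Nama": ["Ani", "Budi", "Charlie", "Devi"],
    "Umur": [21, 20, 23, 20],
    "Gender": ["Female", "Male", "Male", "Female"],
    "GPA": [3.5, 3.3, 3.2, 3.7],
})

# Each keyword becomes an output column: (source column, aggregation)
summary = df.groupby("Gender").agg(
    mean_age=("Umur", "mean"),
    max_gpa=("GPA", "max"),
    n=("Nama", "count"),
)
print(summary)
```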

DataFrame Join

• dosen = {'Gender': ['Female', 'Male'],
  'Group Name': ['Group A', 'Group B'],
  'Lecturer': ['Ms. Gina', 'Mr. Hasan']}
• df_dosen = pd.DataFrame(dosen)

• df_bimbingan = df_bimbingan.merge(df_dosen, how='left',
  left_on='Gender', right_on='Gender')
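A sketch of the left join on a reduced student table. Because the key column has the same name on both sides, on="Gender" is used as shorthand here; the validate argument is an optional safeguard added for illustration, and the names students and lecturers are hypothetical.

```python
import pandas as pd

students = pd.DataFrame({
    "Nama": ["Ani", "Budi", "Charlie"],
    "Gender": ["Female", "Male", "Male"],
})
lecturers = pd.DataFrame({
    "Gender": ["Female", "Male"],
    "Group Name": ["Group A", "Group B"],
    "Lecturer": ["Ms. Gina", "Mr. Hasan"],
})

# Left join: keep every student and attach the matching lecturer row.
# validate raises an error if a key appears more than once on the right.
merged = students.merge(lecturers, how="left", on="Gender",
                        validate="many_to_one")
print(merged)
```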

Read and Write Data

• df_bimbingan.to_csv('df_bimbingan.csv')
• df_bimbingan_edited = pd.read_csv('df_bimbingan.csv')
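By default to_csv also writes the row index, which read_csv then loads back as an extra unnamed column. The sketch below shows two ways to handle that. It writes to a temporary directory, and the two-column frame is a stand-in rather than the slides' data.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Nama": ["Ani", "Budi"], "GPA": [3.5, 3.3]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "df_bimbingan.csv")

    # Default: the index is written as the first, unnamed CSV column
    df.to_csv(path)
    with_index = pd.read_csv(path)
    restored = pd.read_csv(path, index_col=0)   # treat it as the index again

    # Alternative: do not write the index at all
    df.to_csv(path, index=False)
    clean = pd.read_csv(path)

print(list(with_index.columns))
print(list(restored.columns))
print(list(clean.columns))
```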

Exercise (1)

• url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
• df = pd.read_csv(url, index_col=0)
• df.head()

Exercise (2)

• Calculate the probability that a woman aboard the Titanic survived the accident
• Compare the level of passenger safety (survival rate) in each class
• Compute the probability of survival for first-class female passengers. Which of them did not survive?
• Compare the age distribution of survivors and non-survivors
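As a hint rather than a solution: when survival is coded as 0 or 1, the mean of that column within a group is the group's survival proportion. The sketch below uses a made-up four-row table, not the Titanic data, and assumes the column names Sex, Pclass, and Survived.

```python
import pandas as pd

# Made-up toy data for illustration only; not taken from the Titanic file
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male"],
    "Pclass": [1, 3, 1, 3],
    "Survived": [1, 0, 0, 0],
})

# The mean of a 0/1 column is the fraction of ones in each group
rate_by_sex = toy.groupby("Sex")["Survived"].mean()
print(rate_by_sex)

# Conditions can be combined to narrow down to one subgroup
first_class_women = toy[(toy["Sex"] == "female") & (toy["Pclass"] == 1)]
print(first_class_women["Survived"].mean())
```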
