Facebook - Jupyter Notebook

In [1]: import pandas as pd

In [5]: df=pd.read_csv(r"C:\Users\Admin\eclipse\Downloads\pseudo_facebook.csv\pseudo_f

In [6]: df.head()

Out[6]:     userid  age  dob_day  dob_year  dob_month  gender  tenure  friend_count  friendships_initi
        0  2094382   14       19      1999         11    male   266.0             0
        1  1192601   14        2      1999         11  female     6.0             0
        2  2083884   14       16      1999         11    male    13.0             0
        3  1203168   14       25      1999         12  female    93.0             0
        4  1733186   14        4      1999         12    male    82.0             0

In [38]: df.columns

Out[38]: Index(['userid', 'age', 'dob_day', 'dob_year', 'dob_month', 'gender', 'tenure',
       'friend_count', 'friendships_initiated', 'likes', 'likes_received',
       'mobile_likes', 'mobile_likes_received', 'www_likes',
       'www_likes_received'],
      dtype='object')

a. Create data subsets

In [39]: sub1=df['mobile_likes_received']  # one column label in single brackets -> pandas Series

In [40]: sub1

Out[40]: 0 0
1 0
2 0
3 0
4 0
...
98998 11887
98999 10592
99000 11462
99001 5760
99002 9530
Name: mobile_likes_received, Length: 99003, dtype: int64
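
Because `sub1` was selected with single brackets, it is a one-dimensional Series (hence the `Name:`/`Length:`/`dtype:` footer instead of a column header). The sketch below contrasts that with double-bracket selection on a small made-up frame; the column names are borrowed from the dataset and the values are invented for illustration.

```python
import pandas as pd

# Small made-up frame; column names borrowed from pseudo_facebook.csv, values invented
df = pd.DataFrame({
    "userid": [101, 102, 103],
    "gender": ["male", "female", "male"],
    "mobile_likes_received": [0, 5, 12],
})

# One label in single brackets -> 1-D Series (prints with a Name/Length/dtype footer)
sub1 = df["mobile_likes_received"]
print(type(sub1).__name__, sub1.shape)        # Series (3,)

# A list of labels (double brackets) -> 2-D DataFrame, even for a single column
sub1_df = df[["mobile_likes_received"]]
print(type(sub1_df).__name__, sub1_df.shape)  # DataFrame (3, 1)
```
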

In [10]: subsets=df[['dob_day','likes','gender','mobile_likes','friendships_initiated']]  # list of labels -> DataFrame

In [11]: subsets

Out[11]:        dob_day  likes  gender  mobile_likes  friendships_initiated
         0           19      0    male             0                      0
         1            2      0  female             0                      0
         2           16      0    male             0                      0
         3           25      0  female             0                      0
         4            4      0    male             0                      0
         ...        ...    ...     ...           ...                    ...
         98998        4   3996  female          3505                    341
         98999       12   4401  female          4399                   1720
         99000       10  11959  female         11959                   1524
         99001       11   4506  female          4506                    185
         99002       15   9410  female          9410                    768

99003 rows × 5 columns
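
The subset above keeps every row and picks columns. Subsets can also be taken by rows, with a boolean mask, or by rows and columns together, with `.loc`. A minimal sketch on a made-up four-row frame (column names borrowed from the dataset, values invented):

```python
import pandas as pd

# Made-up rows using some of the column names selected above
df = pd.DataFrame({
    "dob_day": [19, 2, 16, 25],
    "likes": [0, 30, 0, 44],
    "gender": ["male", "female", "male", "female"],
    "friendships_initiated": [0, 3, 0, 17],
})

# Row subset: a boolean mask keeps only rows where the condition is True
has_likes = df[df["likes"] > 0]
print(has_likes.index.tolist())   # [1, 3]

# Rows and columns together: .loc[row_mask, list_of_columns]
female_likes = df.loc[df["gender"] == "female", ["dob_day", "likes"]]
print(female_likes.shape)         # (2, 2)
```
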

b. Merge Data

In [12]: df2=pd.read_csv(r"C:\Users\Admin\Desktop\ml_data\startup_funding.csv")
In [14]: df2.head(10)

Out[14]:    SNo  Date        StartupName       IndustryVertical   SubVertical                                    CityLocation  InvestorsName
         0    0  01/08/2017  TouchKin          Technology         Predictive Care Platform                       Bangalore     Kae Capital
         1    1  02/08/2017  Ethinos           Technology         Digital Marketing Agency                       Mumbai        Triton Investment Advisors
         2    2  02/08/2017  Leverage Edu      Consumer Internet  Online platform for Higher Education Services  New Delhi     Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...
         3    3  02/08/2017  Zepo              Consumer Internet  DIY Ecommerce platform                         Mumbai        Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...
         4    4  02/08/2017  Click2Clinic      Consumer Internet  healthcare service aggregator                  Hyderabad     Narottam Thudi, Shireesh Palle
         5    5  01/07/2017  Billion Loans     Consumer Internet  Peer to Peer Lending platform                  Bangalore     Reliance Corporate Advisory Services Ltd
         6    6  03/07/2017  Ecolibriumenergy  Technology         Energy management solutions provider           Ahmedabad     Infuse Ventures, JLL
         7    7  04/07/2017  Droom             eCommerce          Online marketplace for automobiles             Gurgaon       Asset Management (Asia) Ltd, Digital Garage Inc
         8    8  05/07/2017  Jumbotail         eCommerce          online marketplace for food and grocery        Bangalore     Kalaari Capital, Nexus India Capital Advisors
         9    9  05/07/2017  Moglix            eCommerce          B2B marketplace for Industrial products        Noida         International Finance Corporation, Rocketship,...

In [17]: df3=pd.concat([df,df2],axis=1)  # axis=1: place df2's columns beside df's, aligning rows by index label
         # (axis=0 would instead stack df2's rows underneath df's)
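
A sketch of what the two `axis` settings do, using two tiny made-up frames of unequal length. With `axis=1`, `pd.concat` lines rows up by index label (an outer join on the index by default), so a row label present in only one frame gets `NaN` in the other frame's columns; this is also why `df3` ends up with the columns of both frames (25 in total).

```python
import pandas as pd

left = pd.DataFrame({"userid": [1, 2, 3]})           # 3 rows, index 0..2
right = pd.DataFrame({"StartupName": ["A", "B"]})    # 2 rows, index 0..1

# axis=1: columns placed side by side, rows aligned on index label
side = pd.concat([left, right], axis=1)
print(side.shape)                              # (3, 2)
print(side["StartupName"].isna().tolist())     # [False, False, True]

# axis=0: rows stacked; columns are unioned and missing cells become NaN
stacked = pd.concat([left, right], axis=0, ignore_index=True)
print(stacked.shape)                           # (5, 2)
```
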
In [18]: df3.head()

Out[18]:     userid  age  dob_day  dob_year  dob_month  gender  tenure  friend_count  friendships_initi
         0  2094382   14       19      1999         11    male   266.0             0
         1  1192601   14        2      1999         11  female     6.0             0
         2  2083884   14       16      1999         11    male    13.0             0
         3  1203168   14       25      1999         12  female    93.0             0
         4  1733186   14        4      1999         12    male    82.0             0

5 rows × 25 columns
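
Note that `pd.concat(..., axis=1)` pairs rows by index label (here both frames use the default 0, 1, 2, ... index, so effectively by position), not by any shared value; the Facebook and startup tables do not appear to share a key column, so row 0 of one is simply placed next to row 0 of the other. When two tables do share a key, `pd.merge` matches rows by that key instead. A minimal sketch with two hypothetical tables keyed on `userid`:

```python
import pandas as pd

# Two hypothetical tables that share the key column "userid"
users = pd.DataFrame({"userid": [1, 2, 3], "age": [14, 21, 35]})
activity = pd.DataFrame({"userid": [3, 1, 4], "likes": [10, 0, 7]})

# Inner merge: only userids found in both tables; rows matched by key value, not position
inner = pd.merge(users, activity, on="userid", how="inner")
print(inner["userid"].tolist(), inner["likes"].tolist())   # [1, 3] [0, 10]

# Left merge: keep every user; indicator=True adds a _merge column showing where each row came from
left_join = pd.merge(users, activity, on="userid", how="left", indicator=True)
print(left_join["_merge"].tolist())   # ['both', 'left_only', 'both']
```
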
 
c. Sort Data

In [22]: df3.sort_values(by='StartupName',ascending=False)

Out[22]:        userid  age  dob_day  dob_year  dob_month  gender  tenure  friend_count  friendships_
         56     1264260   14       11      1999          7    male    18.0             0
         2230   2008255   21       21      1992         10  female    25.0             1
         1173   1073170   35        1      1978          1    male     7.0             0
         526    2143083   23        5      1990          9    male   101.0             0
         878    1992445   28       10      1985          5    male   255.0             0
         ...        ...  ...      ...       ...        ...     ...     ...           ...
         98998  1268299   68        4      1945          4  female   541.0          2118
         98999  1256153   18       12      1995          3  female    21.0          1968
         99000  1195943   15       10      1998          5  female   111.0          2002
         99001  1468023   23       11      1990          4  female   416.0          2560
         99002  1397896   39       15      1974          5  female   397.0          2049

99003 rows × 25 columns
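
Rows of `df3` beyond the end of `df2` have `NaN` in `StartupName`, and `sort_values` places missing values last by default (`na_position='last'`); the tail of the output above (rows 98998 to 99002, still in their original order) appears to be such rows. String sorting is also case-sensitive: uppercase letters sort before lowercase ones. A sketch on made-up names showing both points, plus the `key` argument (available in pandas 1.1 and later) for a case-insensitive sort:

```python
import numpy as np
import pandas as pd

# Made-up names; np.nan stands in for rows with no startup data
df = pd.DataFrame({"StartupName": ["Zepo", np.nan, "droom", "Moglix"]})

# Default: case-sensitive comparison, NaN placed last (na_position="last")
desc = df.sort_values(by="StartupName", ascending=False)
print(desc["StartupName"].tolist())   # ['droom', 'Zepo', 'Moglix', nan]

# Move missing names to the top instead
nan_first = df.sort_values(by="StartupName", ascending=False, na_position="first")

# Case-insensitive: the key function is applied to the column before comparing
ci = df.sort_values(by="StartupName", key=lambda s: s.str.lower())
print(ci["StartupName"].tolist())     # ['droom', 'Moglix', 'Zepo', nan]
```
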


 

d. Transposing Data

In [23]: result = df3.transpose()  # swaps rows and columns: column labels become row labels and vice versa
In [24]: result.head()

Out[24]:                  0        1        2        3        4        5        6        7        8
         userid     2094382  1192601  2083884  1203168  1733186  1524765  1136133  1680361  1365174
         age             14       14       14       14       14       14       13       13       13
         dob_day         19        2       16       25        4        1       14        4        1
         dob_year      1999     1999     1999     1999     1999     1999     2000     2000     2000
         dob_month       11       11       11       12       12       12        1        1        1

5 rows × 99003 columns
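
On a small frame built from the first two rows of three columns shown in `df.head()`, the sketch below shows that transposing swaps the shape and turns column labels into the row index. Because each new column now mixes numbers and strings (`gender`), pandas stores it with `object` dtype. For `df3`, the same swap turns its 99003 rows into the 99003 columns reported above.

```python
import pandas as pd

# First two rows of three columns from the df.head() output
df = pd.DataFrame({
    "userid": [2094382, 1192601],
    "age": [14, 14],
    "gender": ["male", "female"],
})

result = df.transpose()            # equivalent to df.T
print(result.shape)                # (3, 2): three columns became three rows
print(result.index.tolist())       # ['userid', 'age', 'gender']

# Each new column mixes ints and strings, so pandas stores it as object dtype
print(all(dt == object for dt in result.dtypes))   # True
```
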


 

e. Shape and reshape Data

In [25]: df3.shape

Out[25]: (99003, 25)

In [36]: df.values.reshape((-1,1))  # flatten every value of df into a single column

Out[36]: array([[2094382],
[14],
[19],
...,
[9530],
[0],
[2913]], dtype=object)
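
`df.values` is a 2-D NumPy array (with `object` dtype here, because `gender` holds strings; pandas also offers `df.to_numpy()` for the same conversion), and `reshape((-1, 1))` asks for one column while `-1` lets NumPy infer the row count. Elements are read row by row (C order), which is why the output starts with row 0's `userid`, `age` and `dob_day` (2094382, 14, 19). With the 15 columns listed by `df.columns` and 99003 rows, the result should have 99003 × 15 = 1,485,045 rows. A sketch on a made-up two-row frame:

```python
import pandas as pd

# Made-up two-row frame with a string column, like 'gender' in pseudo_facebook.csv
df = pd.DataFrame({"userid": [11, 12], "age": [14, 15], "gender": ["male", "female"]})

arr = df.values                 # 2-D NumPy array, shape (2, 3); object dtype because of 'gender'
col = arr.reshape((-1, 1))      # one column; -1 lets NumPy infer 2 * 3 = 6 rows
print(arr.shape, col.shape)     # (2, 3) (6, 1)

# Elements are read row by row (C order): all of row 0, then all of row 1
print(col.ravel().tolist())
```
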
