Importing Data Into Pandas Dataframes

Statistics with Python
February 25, 2024
Assignment Webpage: immkavin-ranks.github.io/pd

Author: Kavin Manoharan kavinsde@skiff.com
Personal Website: kavinsde.netlify.app
1 Assignment - Importing Data into Pandas Dataframes

Pandas allows you to import data from a wide range of data sources directly into a
dataframe. These can be static files, such as CSV, TSV, fixed-width files, Microsoft
Excel, JSON, SAS and SPSS files, as well as a range of popular databases, such as
MySQL, PostgreSQL and Google BigQuery. You can even scrape tables directly from
web pages into Pandas dataframes.
When importing data into Pandas dataframes, you can also save time and write less
code by choosing which columns to import, renaming the columns, setting their data
types, defining the index, and more.
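As a minimal sketch of doing several of these steps in one read_csv() call, the example below reads a small CSV held in memory with io.StringIO (the column names and values are made up for illustration) and uses usecols, dtype and index_col to pick the columns, set a data type and define the index.

```python
import io

import pandas as pd

# Hypothetical CSV data, held in memory instead of in a file
csv_text = """id,city,temp,notes
1,Chennai,34,hot
2,Madurai,36,very hot
3,Ooty,18,cool
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "city", "temp"],  # import only these columns
    dtype={"temp": "float64"},       # store temperatures as floats
    index_col="id",                  # use the id column as the index
)

print(df)
print(df.dtypes)
```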
1.0.1 Load the Pandas Library

When importing the Pandas package, the convention is to use the command import
pandas as pd, which lets you call Pandas functions as pd.read_csv() rather than
pandas.read_csv().
[1]: import pandas as pd
1.1 Reading CSV with Pandas

Comma-Separated Values (CSV) files are likely to be the file format you will encounter
most often in data science. As the name suggests, these are plain text files in which
the values are separated (usually) by commas.
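Because the delimiter is not always a comma, read_csv() takes a sep argument. The sketch below assumes two small made-up strings, one tab-separated and one semicolon-separated, and shows that both parse to the same table once sep is set.

```python
import io

import pandas as pd

# Hypothetical tab-separated (TSV) and semicolon-separated data
tsv_text = "name\tscore\nAsha\t90\nRavi\t85\n"
semi_text = "name;score\nAsha;90\nRavi;85\n"

# The sep argument tells read_csv() which delimiter to split on
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
semi_df = pd.read_csv(io.StringIO(semi_text), sep=";")

print(tsv_df)
print(tsv_df.equals(semi_df))  # both parse to the same table
```

Pandas also provides read_table(), which behaves like read_csv() but uses a tab as its default separator.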
1.1.1 Reading a local CSV file

To import a CSV file into a Pandas dataframe, we use the read_csv() function, which
we call through the pd alias we created when we imported Pandas (pd.read_csv()). The
read_csv() function accepts many optional arguments, but at minimum you only need to
provide the path to the file you wish to read.
The Ice Cream Sales - temperatures.csv
Dataset: Temperature and Ice Cream Sales - A public dataset relating temperature and
ice cream revenue
[2]: df = pd.read_csv('Data/Ice Cream Sales - temperatures.csv')
Reading the data

• One of the most commonly used methods for getting a quick overview of a DataFrame is
the head() method.
• The head() method returns the column headers and the first n rows, starting from the
top. If you do not pass n, it defaults to 5.
[3]: df.head()
[3]:    Temperature  Ice Cream Profits
     0           39              13.17
     1           40              11.88
     2           41              18.82
     3           42              18.65
     4           43              17.02
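A small sketch of how the n argument changes what head() returns, using a made-up dataframe rather than the ice cream data; tail() works the same way but counts from the bottom.

```python
import pandas as pd

# A small made-up dataframe with 8 rows
df = pd.DataFrame({"temperature": range(39, 47), "profit": range(10, 18)})

first_two = df.head(2)   # first 2 rows
default = df.head()      # first 5 rows (the default)
last_three = df.tail(3)  # last 3 rows

print(first_two)
print(len(default), len(last_three))
```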
Renaming Column Headers on Import

To rename the columns on import, pass a list of the new names to the names argument of
read_csv(), and use skiprows=1 to skip the first row of the file, which contains the old
column names.
[4]: df = pd.read_csv('Data/Ice Cream Sales - temperatures.csv',
                 names=['Temp in Celsius', 'Ice Cream Sale Profits'], skiprows=1)
df.head(2)
[4]:    Temp in Celsius  Ice Cream Sale Profits
     0               39                   13.17
     1               40                   11.88
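Passing names together with skiprows=1 is one of several ways to rename columns. As a sketch on a tiny made-up CSV that mimics the ice cream file's header, passing header=0 together with names replaces the existing header row, and DataFrame.rename() renames the columns after loading; all three give the same dataframe here.

```python
import io

import pandas as pd

# Hypothetical CSV with the original header row
csv_text = "Temperature,Ice Cream Profits\n39,13.17\n40,11.88\n"
new_names = ["Temp", "Profit"]

# Option 1: names plus skiprows=1 (skip the old header line)
a = pd.read_csv(io.StringIO(csv_text), names=new_names, skiprows=1)

# Option 2: names plus header=0 replaces the existing header row
b = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

# Option 3: read normally, then rename the columns afterwards
c = pd.read_csv(io.StringIO(csv_text)).rename(
    columns={"Temperature": "Temp", "Ice Cream Profits": "Profit"}
)

print(a.equals(b) and a.equals(c))
```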
1.1.2 Reading a remote CSV file

You can also use read_csv() to read remote CSV files: instead of a local path, you pass
the full URL of the CSV file. Here, I’m loading a CSV file from my GitHub account.
Defining which fields to import

You can use the usecols argument to list the columns you want to import; all other
columns are skipped.
[5]: df = pd.read_csv('https://raw.githubusercontent.com/immkavin-ranks/pd/main/Data/spotify_long_tracks_2014_2024.csv',
                 usecols=['Name', 'Artists'])
df.head(3)
[5]:                                               Name           Artists
     0  Steady Rain in a Forest with Light Background …     Nature Sounds
     1                                 Soundarya Lahari  Mambalam Sisters
     2                Waves of Abundance & Fullfillment    Zen Life Relax
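Besides a list of column names, usecols also accepts zero-based column positions or a function that is called on each column name. This sketch uses a made-up, in-memory CSV (not the Spotify file) to show that all three forms select the same columns.

```python
import io

import pandas as pd

# Hypothetical track data with more columns than we need
csv_text = """Name,Artists,Duration (min),Popularity
Track A,Artist 1,12.5,40
Track B,Artist 2,15.0,55
"""

# usecols can take column names...
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Artists"])

# ...zero-based column positions...
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1])

# ...or a function that is called on each column name
by_function = pd.read_csv(
    io.StringIO(csv_text), usecols=lambda col: col in {"Name", "Artists"}
)

print(by_name.equals(by_position) and by_name.equals(by_function))
```

The selected columns come back in the order they appear in the file, not in the order they are listed in usecols.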
1.2 Reading Microsoft Excel files with Pandas

Pandas includes built-in functionality for reading Microsoft Excel spreadsheet files via
its read_excel() function. It works in much the same way as read_csv(), so it can be used
on local Excel documents as well as remote files. However, because Excel files are more
complex to parse than plain-text CSV files, reading them is usually slower. Reading .xlsx
files also requires the openpyxl package.
Specifying the number of rows to read

To restrict the number of rows that are read in, you can pass an integer to the nrows
argument of read_excel(); read_csv() accepts the same argument.
[6]: df = pd.read_excel('Data/team-players.xlsx', nrows=3)
df.head()
[6]:               Player     Role       Price
     0  Ambati Rayudu (R)  Batsman  2.20 crore
     1     Monu Kumar (R)  Batsman    20 lakhs
     2   Murali Vijay (R)  Batsman     2 crore
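The sketch below uses read_csv() on a made-up, in-memory CSV so that it runs without an Excel workbook or openpyxl; read_excel() accepts the same nrows and skiprows arguments. Combining the two lets you read a window of rows from the middle of a file.

```python
import io

import pandas as pd

# Hypothetical player data held in memory
csv_text = """Player,Role,Price
P1,Batsman,20
P2,Bowler,30
P3,Batsman,40
P4,All-Rounder,50
P5,Bowler,60
"""

# Read only the first 3 data rows (the header row is not counted)
first_three = pd.read_csv(io.StringIO(csv_text), nrows=3)

# Skip file lines 1 and 2 (line 0 is the header), then read 2 rows
middle = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2], nrows=2)

print(first_three)
print(middle)
```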
1.3 Reading HTML with Pandas

One other handy feature of Pandas is the read_html() function. This allows you to
parse HTML markup from remote web pages or local HTML documents and extract
any tables present.
The read_html() function returns a list of dataframes, one for each table it finds, so
if more than one table is present, you select the one you want by its list index, which
starts from zero.
[7]: data = pd.read_html('https://en.wikipedia.org/wiki/Comparison_of_programming_languages#General_comparison')
data[1].head()
[7]:                              Language  \
     0  1C:Enterprise programming language
     1                        ActionScript
     2                                 Ada
     3                               Aldor
     4                            ALGOL 58

                                        Original purpose  Imperative  \
     0  Application, RAD, business, general, web, mobile         Yes
     1                     Application, client-side, web         Yes
     2           Application, embedded, realtime, system         Yes
     3        Highly domain-specific, symbolic computing         Yes
     4                                       Application         Yes

        Object-oriented  Functional  Procedural  Generic  Reflective  \
     0               No         Yes         Yes      Yes         Yes
     1              Yes         Yes         Yes       No          No
     2           Yes[2]          No      Yes[3]   Yes[4]          No
     3              Yes         Yes          No       No          No
     4               No          No          No       No          No

                                  Other paradigms  \
     0  Object-based, Prototype-based programming
     1                            prototype-based
     2             Concurrent,[5] distributed,[6]
     3                                        NaN
     4                                        NaN

                                            Standardized?
     0                                                 No
     1    Yes 1999-2003, ActionScript 1.0 with ES3, Acti…
     2  Yes 1983, 2005, 2012, ANSI, ISO, GOST 27831-88[7]
     3                                                 No
     4                                                 No
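To see how the list returned by read_html() works without depending on Wikipedia, the sketch below parses a small made-up HTML snippet containing two tables. It assumes the lxml parser (or BeautifulSoup4 with html5lib) is installed, and wraps the HTML in io.StringIO because recent pandas versions deprecate passing literal HTML strings. The match argument keeps only the tables whose text matches a given string or regular expression.

```python
import io

import pandas as pd

# A small hypothetical HTML page containing two tables
html = """
<table>
  <tr><th>Language</th><th>Year</th></tr>
  <tr><td>Python</td><td>1991</td></tr>
  <tr><td>Java</td><td>1995</td></tr>
</table>
<table>
  <tr><th>City</th><th>Country</th></tr>
  <tr><td>Chennai</td><td>India</td></tr>
</table>
"""

# read_html() returns a list with one dataframe per table it finds
tables = pd.read_html(io.StringIO(html))
print(len(tables))  # 2
print(tables[1])    # the second table (list index 1)

# match= keeps only tables whose text matches the given string or regex
langs = pd.read_html(io.StringIO(html), match="Language")[0]
print(langs)
```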
1.4 Reading SQL Database with Pandas

You can use read_sql() to run an SQL query against a database and load the result into a
dataframe. To connect to an SQLite database file, we use Python’s built-in sqlite3 module.
[8]: import sqlite3

# Open a connection to the SQLite database file
conn = sqlite3.connect('Data/favorites.db')
# Read the data from the database into a DataFrame
df = pd.read_sql('SELECT * FROM shows', conn)
# Close the connection to the database
conn.close()
print(df)
      id                  title
0      1  How I Met Your Mother
1      2           The Sopranos
2      3    Friday Night Lights
3      4             Family Guy
4      5               New Girl
..     …                      …
152  153               Sherlock
153  154         Anne With An E
154  155            Money Heist
155  156             Succession
156  157         Silicon Valley

[157 rows x 2 columns]
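A self-contained sketch of read_sql() against a temporary in-memory SQLite database, so no favorites.db file is needed; the shows table and its rows here are made up for illustration. It passes a value through the params argument with a ? placeholder rather than formatting it into the SQL string, and uses contextlib.closing so the connection is closed even if an error occurs.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# closing() calls conn.close() when the with-block ends
with closing(sqlite3.connect(":memory:")) as conn:
    # Build a made-up "shows" table in the in-memory database
    conn.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO shows (title) VALUES (?)",
        [("Sherlock",), ("Money Heist",), ("Succession",), ("Silicon Valley",)],
    )
    conn.commit()

    # params fills the ? placeholder safely instead of formatting the SQL string
    df = pd.read_sql(
        "SELECT id, title FROM shows WHERE title LIKE ? ORDER BY id",
        conn,
        params=("S%",),
    )

print(df)
```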
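1.5 Requirements
• Pandas
• sqlite3 (part of the Python standard library)
• Openpyxl (needed by read_excel() for .xlsx files)
• lxml (or BeautifulSoup4 with html5lib), needed by read_html()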
1.6 References https://github.com/immkavin-ranks/pd

1. Datasets from Kaggle
• Temperature and Ice Cream Sales
• Spotify’s Long Hits (2014-2024)
• IPL-2020 Dataset
2. Dataset from Wikipedia - Comparison of Programming Languages
3. Dataset from CS50x 2023 Practice Problem - Favorites
4. An Article from Practical Data Science
5. W3Schools Pandas Tutorial
