Pandas 8

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 9

1/29/23, 12:53 PM 5 Methods for Filtering Strings with Python Pandas | by Soner Yıldırım | Towards Data Science

You are signed out. Sign in with your member account


(ja__@u__.gt) to view other member-only stories. Sign in

Published in Towards Data Science

You have 2 free member-only stories left this month. Sign up for Medium and get an extra one

Soner Yıldırım Follow

Aug 10, 2022 · 4 min read · · Listen

Save

5 Methods for Filtering Strings with Python


Pandas
Explained with examples

https://towardsdatascience.com/5-methods-for-filtering-strings-with-python-pandas-ebe4746dcc74 1/9
1/29/23, 12:53 PM 5 Methods for Filtering Strings with Python Pandas | by Soner Yıldırım | Towards Data Science

You are signed out. Sign in with your member account


(ja__@u__.gt) to view other member-only stories. Sign in

Open in app Sign up Sign In

Search Medium

(Photo by Neven Krcmarek on Unsplash)

We often deal with textual data, which requires much more cleaning than numeric
data. In order to extract usable and informative insights from textual data, we usually
need to perform several steps of preprocessing and filtering.

Pandas library has lots of functions and methods that make working with textual data
easy and simple. In this article, we will learn 5 different methods that can be used for
filtering textual data (i.e. strings):

1. Containing a sequence of characters

2. The length of a string

3. Starting or ending with a specific sequence of characters

4. Characters being numeric or alphanumeric

5. Number of occurrences of a specific sequence of characters

https://towardsdatascience.com/5-methods-for-filtering-strings-with-python-pandas-ebe4746dcc74 2/9
1/29/23, 12:53 PM 5 Methods for Filtering Strings with Python Pandas | by Soner Yıldırım | Towards Data Science

You are signed out. Sign in with your member account


(ja__@u__.gt) to view other member-only stories. Sign in
Let’s start with creating a DataFrame from a csv file.

import pandas as pd

df = pd.read_csv("example.csv")
df

160 1

df (image by author)

The DataFrame contains 6 rows and 4 columns. We will use different methods to filter
rows in this DataFrame. These methods are available via the str accessor of Pandas.

The first filtering operation we will do is to check if a string contains a particular word
or a sequence of characters. We can use the contains method for this task. The
following line of code finds the rows in which the description field contains “used car”.

df[df["description"].str.contains("used car")]

https://towardsdatascience.com/5-methods-for-filtering-strings-with-python-pandas-ebe4746dcc74 3/9
1/29/23, 12:53 PM 5 Methods for Filtering Strings with Python Pandas | by Soner Yıldırım | Towards Data Science

You are signed out. Sign in with your member account


(ja__@u__.gt) to view other member-only stories. Sign in

(image by author)

There is actually another row for a used car but the description does not contain the
“used car” expression. In order to find all the used cars in this DataFrame, we can look
for the words “used” and “car” separately.

df[df["description"].str.contains("used") &
df["description"].str.contains("car")]

(image by author)

The last row contains both “car” and “used” but not together.

We can also filter based on the length of a string. Let’s say we are only interested in the
description that are longer than 15 characters. We can do this operation by using the
built-in len function as follows:

df[df["description"].apply(lambda x: len(x) > 15)]

https://towardsdatascience.com/5-methods-for-filtering-strings-with-python-pandas-ebe4746dcc74 4/9
1/29/23, 12:53 PM 5 Methods for Filtering Strings with Python Pandas | by Soner Yıldırım | Towards Data Science

We write a lambda expression that includes checking the length with the len function
and apply it to each row in the description column. A more practical and efficient way
You are signed out. Sign in with your member account
of doing this operation is to
(ja__@u__.gt) to use the len
view other method stories.
member-only via theSign
strinaccessor.

df[df["description"].str.len() > 15]

The output will be the same:

(image by author)

We can filter based on the first or last letter of a string using the startswith and
endswith methods, respectively.

df[df["lot"].str.startswith("A")]

(image by author)

https://towardsdatascience.com/5-methods-for-filtering-strings-with-python-pandas-ebe4746dcc74 5/9
1/29/23, 12:53 PM 5 Methods for Filtering Strings with Python Pandas | by Soner Yıldırım | Towards Data Science

These methods are able to check the first n characters as well. For instance, we can
select rows in which the lot value starts with ‘A-0’:
You are signed out. Sign in with your member account
(ja__@u__.gt) to view other member-only stories. Sign in

df[df["lot"].str.startswith("A-0")]

(image by author)

Python has some built-in string functions, which can be used for filtering string values
in Pandas DataFrames. For instance, in the price column, there are some non-numeric
characters such as $ and k. We can filter these out using the isnumeric function.

df[df["price"].apply(lambda x: x.isnumeric()==True)]

(image by author)

If we are looking for alphanumeric characters (i.e. only letters and numbers), we can
use the isalphanum function.

https://towardsdatascience.com/5-methods-for-filtering-strings-with-python-pandas-ebe4746dcc74 6/9
1/29/23, 12:53 PM 5 Methods for Filtering Strings with Python Pandas | by Soner Yıldırım | Towards Data Science

The count method counts the number of occurrences of a single character or a


You are signed out. Sign in with your member account
sequence of characters. This
(ja__@u__.gt) might
to view othercome in handy
member-only forSign
stories. filtering
in in some cases. For
instance, you may be looking for strings that do not include a sequence of characters
more than once.

We can count the word “used” in the description column as follows:

df["description"].str.count("used")
# output
0 1
1 0
2 1
3 1
4 1
5 0
Name: description, dtype: int64

If you want to use this for filtering, just compare it to a value as follows:

df[df["description"].str.count("used") < 1]

We have covered 5 different ways of Pandas DataFrames based on string values. We


tend to give more importance to numeric data but textual data is just as important and
carries lots of valuable information. Therefore, being able to clean and preprocess
textual data is of crucial importance for data analysis and modeling.

You can become a Medium member to unlock full access to my writing, plus the rest of
Medium. If you already are, don’t forget to subscribe if you’d like to get an email whenever I
publish a new article.

https://towardsdatascience.com/5-methods-for-filtering-strings-with-python-pandas-ebe4746dcc74 7/9
1/29/23, 12:53 PM 5 Methods for Filtering Strings with Python Pandas | by Soner Yıldırım | Towards Data Science

Thank you for reading. Please let me know if you have any feedback.

You are signed out. Sign in with your member account


(ja__@u__.gt) to view other member-only stories. Sign in

Data Science Artificial Intelligence Machine Learning Programming

Tips And Tricks

Enjoy the read? Reward the writer.Beta


Your tip will go to Soner Yıldırım through a third-party platform of their choice, letting them know you appreciate their
story.

Give a tip

Sign up for The Variable


By Towards Data Science

Every Thursday, the Variable delivers the very best of Towards Data Science: from hands-on tutorials and cutting-edge
research to original features you don't want to miss. Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review
our Privacy Policy for more information about our privacy practices.

Get this newsletter

About Help Terms Privacy

Get the Medium app


https://towardsdatascience.com/5-methods-for-filtering-strings-with-python-pandas-ebe4746dcc74 8/9
1/29/23, 12:53 PM 5 Methods for Filtering Strings with Python Pandas | by Soner Yıldırım | Towards Data Science

You are signed out. Sign in with your member account


(ja__@u__.gt) to view other member-only stories. Sign in

https://towardsdatascience.com/5-methods-for-filtering-strings-with-python-pandas-ebe4746dcc74 9/9

You might also like