Search in A String - JetBrains Academy - Learn Programming by Building Your Own Apps

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

Computer science → Programming languages → Python → Working with data → Data types and operations → Strings

Search in a string
Theory Practice 18% completed, 0 problems solved

Theory 4 required topics


11 minutes reading
Declaring a function

Verify to skip Start practicing


Indexes

One of the essential skills when working with data is to be able to search it and Searching a substring
locate specific bits of information. Working with textual data in Python, you may
need to get some information about its content: whether it includes a specific If statement
substring (i.e. part of the string), where this substring is, or how many times it
occurs in the text. In this topic, we will learn how to do it. 1 dependent topic

find() under the hood

§1. A substring searching algorithm


Table of contents:
We'll start with a substring searching problem. Given two strings, text and pattern,
we need to identify whether there is at least one occurrence of the pattern in the ↑ Search in a string
text. The simplest and most natural way to solve this problem is to sequentially
consider all substrings of the text whose length is equal to the length of the §1. A substring searching algorithm
pattern and compare them with the pattern itself. If at least in one case all
§2. Membership testing
corresponding symbols match, the pattern is found. If none of such attempts were
successful, we should indicate that there is no pattern in the text. Here's how this §3. Boolean search in a string
simple algorithm can be implemented in Python:
§4. Element position
1 def contains(text, pattern):
§5. Element number
2 for i in range(len(text) - len(pattern) + 1):
3 found = True
§6. Summary
4
5 for j in range(len(pattern)): Discussion
6 if text[i + j] != pattern[j]:
7 found = False
8 break
9
1
0 if found:
1
1 return True
1
2
1
3 return False

The function named contains takes two strings, text and pattern , as input
and returns True if text contains pattern and False otherwise.

At each step of the outer for loop, we create a variable named found and
initialize it with True . Then, in the inner for loop, we start comparing pattern
with the current substring of text . If at least one of the corresponding symbols
doesn't match, we set the variable found to False and break the inner loop.
After the inner for loop is done, we check the state of the found variable. If it
remains True , this means that each symbol of pattern matches the current
substring. In this case, we return True indicating that pattern is found.
Otherwise, we move to the next iteration and start considering the next substring.
In case none of the comparisons were successful, that is, the outer for loop
finishes all iterations, the function returns False indicating that pattern is not
found.

Here is how this algorithm can be used:


1 contains("abacabad", "cab") # True
2 contains("abacabad", "abacabad") # True
3 contains("aba", "") # True
4 contains("abacabad", "hello") # False

Now we know that there is a substring searching algorithm and how to implement
it in Python. However, there are also built-in functions and operators that solve the
problem. Let's see what they are.

§2. Membership testing


Another way to define if there's a specific pattern in our string is called
membership testing, and it is implemented with the help of the operators in and
not in . When we write pattern in string , the left operand should be a
string, and membership test checks if string contains pattern as a substring.

If membership test returns True , this means that there exists a position in
string starting from which you can read the pattern in the string.

1 print("apple" in "pineapple") # True


2 print("milk" in "yogurt") # False

Interestingly, an empty string is considered to be a substring of any string.

1 print('' in '') # True


2 print('' not in "lemon") # False

§3. Boolean search in a string


Apart from knowing that a substring just occurs in the string, we can determine
that the string starts or ends with a specific pattern. Methods startswith() and
endswith() return True if the pattern is found and False otherwise.

1 email = "email_address@something.com"
2 print(email.startswith("www.")) # False
3 print(email.endswith("@something.com")) # True

Optional values for start and end that bound the search area can be added:
string.startswith(pattern, start, end) . When we specify only one
additional element, it's automatically considered as start .

1 email = "my_email@something.com"
2 print(email.startswith("email", 2)) # False
3 print(email.startswith("email", 3)) # True

In the example above, when we specified the start argument as 2, we limited


the search to the substring "_email@something.com" , which actually doesn't
start with "email" . Then we fixed this off-by-one mistake by setting start to 3.

Note that the substring bound by the start and end indexes does include the
character with the start index but does not include the element with the end index.

1 email = "my_email@something.com"
2 print(email.endswith("@", 5, 8)) # False
3 print(email.endswith("@", 5, 9)) # True

The substring defined for the search in the first case is "ail" , while in the
second one it's "ail@" .
§4. Element position
Now, as we know how to check if a string contains a substring, starts or ends with
it, let's learn how to define the exact position of the substring. We can use the
methods find() or index() to do so:

1 best = "friend"
2
3 print(best.find("i")) # 2
4 print(best.index("i")) # 2

They work absolutely the same except that the former returns -1 if it can't find
the given element, while the latter raises ValueError .

1 print(best.find("u")) # -1
2 print(best.index("u")) # ValueError

So, all the examples with find() below will work with index() as well.

We can search both for single characters and for longer substrings. In the latter
case, the index of the first character of the substring is returned.

1 print(best.find("end")) # 3

In the string friend , the substring end occupies positions from 3 to 5, and the
start index is returned. Keep in mind that both methods return only the index of
the first occurrence of the element we search for.

1 magic = "abracadabra"
2 print(magic.find("ra")) # 2

However, we can additionally specify an interval for searching, just as with the
boolean search: string.find(pattern, start, end) .

1 print(magic.find("ra", 5)) # 9
2 print(magic.find("ra", 5, 10)) # -1

Once again, the end index is not included in the search area.

Alternatively, we can use methods rfind() and rindex() to search backward


from the end of the string.

1 print(magic.rfind("ra")) # 9
2 print(magic.rindex("a")) # 10

§5. Element number


Finally, it's often useful to count how many times an element (a char or a
substring) occurs in the string, and for this, we can use the method count() .

1 magic = "abracadabra"
2
3 print(magic.count("abra")) # 2
4 print(magic.count("a")) # 5

§6. Summary
In this topic, we have examined different aspects of searching through a string and
learned how to locate specific patterns. Now you will be able:

To implement a substring searching algorithm in Python,


To test for membership in a text,
To check that the string starts or ends with a specific pattern,
To find the exact position of a substring,
To count how many times a pattern occurs in the text.

This knowledge will be helpful in real-world tasks, so let's practice it!


Report a typo

691 users liked this piece of theory. 9 didn't like it. What about you?

Start practicing Verify to skip

Comments (23) Useful links (0) Show discussion

Tracks About Become beta tester

Pricing Contribute Be the first to see what's new


For organizations Careers

Terms Support Made with by Hyperskill and JetBrains

You might also like