No Really, Pathlib Is Great


12/07/2022 19:59 No really, pathlib is great

Trey Hunner


Jan 15th, 2019 11:20 am

I recently published an article about Python’s pathlib module and how I think everyone should be using it.

I won some pathlib converts, but some folks also brought up concerns. Some folks noted that I seemed to be comparing pathlib to os.path in a disingenuous way.
Some people were also concerned that pathlib will take a very long time to be widely adopted because os.path is so entrenched in the Python community. And there
were also concerns expressed about performance.

In this article I’d like to acknowledge and address these concerns. This will be both a defense of pathlib and a sort of love letter to PEP 519.

Comparing pathlib and os.path the right way
Normalizing file paths shouldn’t be your concern
pathlib seems great, but I depend on code that doesn’t use it!
But my favorite third-party library X has a better Path object!
But Path objects and path strings don’t mix, do they?
pathlib is too slow
Improving readability with pathlib
Start using pathlib.Path objects

Comparing pathlib and os.path the right way


In my last article I compared this code which uses os and os.path:

import os
import os.path

os.makedirs(os.path.join('src', '__pypackages__'), exist_ok=True)
os.rename('.editorconfig', os.path.join('src', '.editorconfig'))

To this code which uses pathlib.Path:

from pathlib import Path

Path('src/__pypackages__').mkdir(parents=True, exist_ok=True)
Path('.editorconfig').rename('src/.editorconfig')

This might seem like an unfair comparison because I used os.path.join in the first example to ensure the correct path separator is used on all platforms but I didn’t
do that in the second example. But this is in fact a fair comparison because the Path class normalizes path separators automatically.

We can prove this by looking at the string representation of this Path object on Windows:

>>> str(Path('src/__pypackages__'))
'src\\__pypackages__'

No matter whether we use the joinpath method, a / in a path string, the / operator (which is a neat feature of Path objects), or separate arguments to the Path
constructor, we get the same representation in all cases:

>>> Path('src', '.editorconfig')
WindowsPath('src/.editorconfig')
>>> Path('src') / '.editorconfig'
WindowsPath('src/.editorconfig')
>>> Path('src').joinpath('.editorconfig')
WindowsPath('src/.editorconfig')
>>> Path('src/.editorconfig')
WindowsPath('src/.editorconfig')

That last expression caused some confusion from folks who assumed pathlib wouldn’t be smart enough to convert that / into a \ in the path string. Fortunately, it is!

With Path objects, you never have to worry about backslashes vs forward slashes again: specify all paths using forward slashes and you’ll get what you’d expect on all platforms.
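
If you don't have a Windows machine handy, one way to see this normalization for yourself is with the pure path classes, which never touch the filesystem and can be instantiated on any platform. This is a minimal sketch that reuses the file names from the examples above:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths do no I/O, so both flavours work on any operating system.
windows_path = PureWindowsPath('src/__pypackages__')
posix_path = PurePosixPath('src/__pypackages__')

print(str(windows_path))  # src\__pypackages__
print(str(posix_path))    # src/__pypackages__

# Every way of building the path yields an equal object.
same_via_operator = (
    PureWindowsPath('src', '.editorconfig') == PureWindowsPath('src') / '.editorconfig'
)
same_via_joinpath = (
    PureWindowsPath('src').joinpath('.editorconfig') == PureWindowsPath('src/.editorconfig')
)
print(same_via_operator, same_via_joinpath)  # True True
```

Path itself picks the Windows or POSIX flavour based on the system it runs on, which is why the earlier examples show WindowsPath.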

Normalizing file paths shouldn’t be your concern
If you’re developing on Linux or Mac, it’s very easy to add bugs to your code that only affect Windows users. Unless you’re careful to use os.path.join to build
your paths up or os.path.normcase to convert forward slashes to backslashes as appropriate, you may be writing code that breaks on Windows.

This is a Windows bug waiting to happen (we’ll get mixed backslashes and forward slashes here):

import sys
import os.path
directory = '.' if not sys.argv[1:] else sys.argv[1]
new_file = os.path.join(directory, 'new_package/__init__.py')

This just works on all systems:

import sys
from pathlib import Path
directory = '.' if not sys.argv[1:] else sys.argv[1]
new_file = Path(directory, 'new_package/__init__.py')

It used to be the responsibility of you the Python programmer to carefully join and normalize your paths, just as it used to be your responsibility in Python 2 land to
use unicode whenever it was more appropriate than bytes. This is the case no more. The pathlib.Path class is careful to fix path separator issues before they even
occur.
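
To make the difference concrete without a Windows machine, the sketch below uses ntpath (the module that os.path resolves to on Windows, importable on any platform) alongside PureWindowsPath. The 'project' directory name is a placeholder chosen for this illustration, not something from the example above.

```python
import ntpath  # the Windows implementation of os.path, importable anywhere
from pathlib import PureWindowsPath

directory = 'project'  # hypothetical directory name

# os.path.join on Windows only inserts a backslash between its arguments;
# the forward slash inside the second argument is left untouched.
joined = ntpath.join(directory, 'new_package/__init__.py')
print(joined)  # project\new_package/__init__.py

# PureWindowsPath normalizes every separator.
normalized = PureWindowsPath(directory, 'new_package/__init__.py')
print(normalized)  # project\new_package\__init__.py
```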

I don’t use Windows. I don’t own a Windows machine. But a ton of the developers who use my code likely use Windows and I don’t want my code to break on their
machines.

If there’s a chance that your Python code will ever run on a Windows machine, you really need pathlib.

Don’t stress about path normalization: just use pathlib.Path whenever you need to represent a file path.

pathlib seems great, but I depend on code that doesn’t use it!
You have lots of code that works with path strings. Why would you switch to using pathlib when it means you’d need to rewrite all this code?

Let’s say you have a function like this:

import os
import os.path

def make_editorconfig(dir_path):
    """Create .editorconfig file in given directory and return filename."""
    filename = os.path.join(dir_path, '.editorconfig')
    if not os.path.exists(filename):
        os.makedirs(dir_path, exist_ok=True)
        open(filename, mode='wt').write('')
    return filename

This function accepts a directory to create a .editorconfig file in, like this:

>>> import os.path
>>> make_editorconfig(os.path.join('src', 'my_package'))
'src/my_package/.editorconfig'

But our code also works with a Path object:

>>> from pathlib import Path
>>> make_editorconfig(Path('src/my_package'))
'src/my_package/.editorconfig'

But… how??

Well os.path.join accepts Path objects (as of Python 3.6). And os.makedirs accepts Path objects too.

In fact, the built-in open function accepts Path objects, as does shutil, and anything in the standard library that previously accepted a path string is now expected to work with both Path objects and path strings.

This is all thanks to PEP 519, which called for an os.PathLike abstract base class and declared that Python utilities that work with file paths should now accept either
path strings or path-like objects.
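
As a rough illustration of that interoperability, the following sketch passes the same Path object to os.path.join, open, and shutil.copy inside a temporary directory. The file names are made up for the demonstration.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / 'notes.txt'  # hypothetical file name

    # Path objects are instances of the os.PathLike abstract base class.
    is_path_like = isinstance(source, os.PathLike)

    # The built-in open function accepts a Path directly.
    with open(source, mode='wt') as f:
        f.write('hello')

    # os.path functions accept Path objects and return plain strings.
    backup_name = os.path.join(source.parent, 'backup.txt')

    # shutil functions accept Path objects (and strings) as well.
    shutil.copy(source, backup_name)
    backup_text = Path(backup_name).read_text()

print(is_path_like, type(backup_name).__name__, backup_text)  # True str hello
```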

But my favorite third-party library X has a better Path object!


You might already be using a third-party library that has a Path object which works differently than pathlib’s Path objects. Maybe you even like it better.

For example django-environ, path.py, plumbum, and visidata all have their own custom Path objects that represent file paths. Some of these pathlib alternatives predate pathlib and chose to inherit from str so they could be passed to functions that expected path strings. Thanks to PEP 519 both pathlib and its third-party alternatives can play nicely without needing to resort to the hack of inheriting from str.

Let’s say you don’t like pathlib because Path objects are immutable and you very much prefer using mutable Path objects. Well, thanks to PEP 519, you can create your own even-better-because-it-is-mutable Path that also has a __fspath__ method. You don’t need to use pathlib to benefit from it.

Any homegrown Path object you make or find in a third-party library now has the ability to work natively with the Python built-ins and standard library modules that expect Path objects. Even if you don’t like pathlib, its existence is a big win for third-party Path objects as well.
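
Here is a minimal sketch of what such a homegrown class might look like. MutablePath is a hypothetical name invented for this example (it is not part of any library); the only thing PEP 519 asks of it is a __fspath__ method that returns a str or bytes path.

```python
import os
import tempfile


class MutablePath:
    """Hypothetical mutable path type; only __fspath__ comes from PEP 519."""

    def __init__(self, *parts):
        self.parts = list(parts)

    def append(self, part):
        # Mutates the object in place, unlike pathlib's immutable Path.
        self.parts.append(part)

    def __fspath__(self):
        return os.path.join(*self.parts)


with tempfile.TemporaryDirectory() as tmp:
    path = MutablePath(tmp)
    path.append('example.txt')

    # Built-ins and os functions call __fspath__ for us.
    with open(path, mode='wt') as f:
        f.write('written through __fspath__')

    exists = os.path.exists(path)
    as_string = os.fspath(path)
    is_path_like = isinstance(path, os.PathLike)

print(exists, is_path_like)  # True True
```

Note that MutablePath never inherits from str or from pathlib.Path; defining __fspath__ is enough for os.PathLike to recognize it.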

But Path objects and path strings don’t mix, do they?

You might be thinking: this is really wonderful, but won’t this sometimes-a-string and sometimes-a-path-object situation add confusion to my code?

The answer is yes, somewhat. But I’ve found that it’s pretty easy to work around.

PEP 519 added a couple other things along with path-like objects: one is a way to convert all path-like objects to path strings and the other is a way to convert all
path-like objects to Path objects.

Given either a path string or a Path object (or anything with a __fspath__ method):

from pathlib import Path
import os.path
p1 = os.path.join('src', 'my_package')
p2 = Path('src/my_package')

The os.fspath function will now normalize both of these types of paths to strings:

>>> from os import fspath
>>> fspath(p1), fspath(p2)
('src/my_package', 'src/my_package')

And the Path class will now accept both of these types of paths and convert them to Path objects:

>>> Path(p1), Path(p2)
(PosixPath('src/my_package'), PosixPath('src/my_package'))

That means you could convert the output of the make_editorconfig function back into a Path object if you wanted to:

>>> from pathlib import Path
>>> Path(make_editorconfig(Path('src/my_package')))
PosixPath('src/my_package/.editorconfig')

Though of course a better long-term approach would be to rewrite the make_editorconfig function to use pathlib instead.
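
One way to keep the mixing manageable (a sketch, not the only approach) is to convert at the edges of a function: call Path() on whatever comes in and work with Path objects from then on. The config_path_for helper below is hypothetical and exists only to show the idea.

```python
import os
from pathlib import Path


def config_path_for(directory):
    """Return the .editorconfig path for a str, Path, or other path-like."""
    # Path() accepts strings and anything with __fspath__, so one call
    # normalizes every supported input type.
    return Path(directory) / '.editorconfig'


from_string = config_path_for(os.path.join('src', 'my_package'))
from_path = config_path_for(Path('src/my_package'))

# Both kinds of input end up as equal Path objects.
print(from_string == from_path)  # True

# os.fspath goes the other way when a caller needs a plain string.
as_string = os.fspath(from_path)
print(type(as_string).__name__)  # str
```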

pathlib is too slow


I’ve heard this concern come up a few times: pathlib is just too slow.

It’s true that pathlib can be slow. Creating thousands of Path objects can make a noticeable impact on your code.

I decided to test the performance difference between pathlib and the alternative on my own machine using two different programs that both look for all .py files
below the current directory.

Here’s the os.walk version:

from os import getcwd, walk


extension = '.py'
count = 0
for root, directories, filenames in walk(getcwd()):
    for filename in filenames:
        if filename.endswith(extension):
            count += 1
print(f"{count} Python files found")

Here’s the Path.rglob version:

from pathlib import Path


extension = '.py'
count = 0
for filename in Path.cwd().rglob(f'*{extension}'):
    count += 1
print(f"{count} Python files found")

Testing runtimes for programs that rely on filesystem accesses is tricky because runtimes vary greatly, so I reran each script 10 times and compared the best runtime of each.

Both scripts found 97,507 Python files in the directory I ran them in. The first one finished in 1.914 seconds (best out of 10 runs). The second one finished in 3.430 seconds (best out of 10 runs).

When I set extension = '' these find about 600,000 files and the differences spread a little further apart. The first runs in 1.888 seconds and the second in 7.485 seconds.

So the pathlib version of this program ran twice as slow for .py files and four times as slow for every file in my home directory. The pathlib code was indeed slower, much slower percentage-wise.

But in my case, this speed difference doesn’t matter much. I searched for every file in my home directory and lost 6 seconds to the slower version of my code. If I needed to scale this code to search 10 million files, I’d probably want to rewrite it. But that’s a problem I can get to if I experience it.

If you have a tight loop that could use some optimizing and pathlib.Path is one of the bottlenecks that’s slowing that loop down, abandon pathlib in that part of your code. But don’t optimize parts of your code that aren’t bottlenecks: it’s a waste of time and often results in less readable code for little gain.
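
If you want to check whether this matters for your own directory tree, a small timing harness along these lines may help. It is a sketch rather than the scripts used for the numbers above: it builds a tiny throwaway directory (the file names are invented), wraps both approaches in functions, and reports the best of several runs.

```python
import tempfile
import time
from os import walk
from pathlib import Path


def count_with_walk(top, extension='.py'):
    """Count files ending in extension using os.walk."""
    return sum(
        filename.endswith(extension)
        for root, directories, filenames in walk(top)
        for filename in filenames
    )


def count_with_rglob(top, extension='.py'):
    """Count paths matching *extension using Path.rglob."""
    return sum(1 for path in Path(top).rglob(f'*{extension}'))


def best_of(function, *args, repeat=10):
    """Return the fastest wall-clock time out of repeat calls."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        function(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


with tempfile.TemporaryDirectory() as tmp:
    # Build a small tree: 3 packages with 5 .py files and 1 .txt file each.
    for package in ('a', 'b', 'c'):
        package_dir = Path(tmp, package)
        package_dir.mkdir()
        for number in range(5):
            (package_dir / f'module{number}.py').touch()
        (package_dir / 'notes.txt').touch()

    walk_count = count_with_walk(tmp)
    rglob_count = count_with_rglob(tmp)
    walk_time = best_of(count_with_walk, tmp)
    rglob_time = best_of(count_with_rglob, tmp)

print(f"os.walk:    {walk_count} files, best of 10: {walk_time:.6f}s")
print(f"Path.rglob: {rglob_count} files, best of 10: {rglob_time:.6f}s")
```

On a tree this small the timings are mostly noise; point the two functions at a real directory to get numbers worth comparing.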

Improving readability with pathlib


I’d like to wrap up these thoughts by ending with some pathlib refactorings. I’ve taken a couple small examples of code that work with files and refactored these
examples to use pathlib instead. I’ll mostly leave these code blocks without comment and let you be the judge of which versions you like best.

Here’s the make_editorconfig function we saw earlier:

import os
import os.path


def make_editorconfig(dir_path):
    """Create .editorconfig file in given directory and return filename."""
    filename = os.path.join(dir_path, '.editorconfig')
    if not os.path.exists(filename):
        os.makedirs(dir_path, exist_ok=True)
        open(filename, mode='wt').write('')
    return filename

And here’s the same function using pathlib.Path instead:

from pathlib import Path


def make_editorconfig(dir_path):
    """Create .editorconfig file in given directory and return filepath."""
    path = Path(dir_path, '.editorconfig')
    if not path.exists():
        path.parent.mkdir(exist_ok=True, parents=True)
        path.touch()
    return path

Here’s a command-line program that accepts a string representing a directory and prints the contents of the .gitignore file in that directory if one exists:

import os.path
import sys


directory = sys.argv[1]
ignore_filename = os.path.join(directory, '.gitignore')
if os.path.isfile(ignore_filename):
    with open(ignore_filename, mode='rt') as ignore_file:
        print(ignore_file.read(), end='')

This is the same code using pathlib.Path:

from pathlib import Path
import sys


directory = Path(sys.argv[1])
ignore_path = directory / '.gitignore'
if ignore_path.is_file():
    print(ignore_path.read_text(), end='')

And here’s some code that prints all groups of files in and below the current directory which are duplicates:

from collections import defaultdict
from hashlib import md5
from os import getcwd, walk
import os.path


def find_files(filepath):
    for root, directories, filenames in walk(filepath):
        for filename in filenames:
            yield os.path.join(root, filename)


file_hashes = defaultdict(list)
for path in find_files(getcwd()):
    with open(path, mode='rb') as my_file:
        file_hash = md5(my_file.read()).hexdigest()
        file_hashes[file_hash].append(path)

for paths in file_hashes.values():
    if len(paths) > 1:
        print("Duplicate files found:")
        print(*paths, sep='\n')

This is the same code that uses pathlib.Path instead:


from collections import defaultdict
from hashlib import md5
from pathlib import Path


def find_files(filepath):
    for path in Path(filepath).rglob('*'):
        if path.is_file():
            yield path


file_hashes = defaultdict(list)
for path in find_files(Path.cwd()):
    file_hash = md5(path.read_bytes()).hexdigest()
    file_hashes[file_hash].append(path)

for paths in file_hashes.values():
    if len(paths) > 1:
        print("Duplicate files found:")
        print(*paths, sep='\n')

The changes here are subtle, but I think they add up. I prefer this pathlib-refactored version.
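
One caveat with both versions is that they read each file fully into memory before hashing it. If your files are large, a chunked variation like the following sketch may be preferable. The hash_file and find_duplicates helpers, the 64 KiB chunk size, and the sample file names are assumptions made for this sketch, not part of the examples above.

```python
import tempfile
from collections import defaultdict
from hashlib import md5
from pathlib import Path


def hash_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = md5()
    with Path(path).open(mode='rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(top):
    """Group the files below top by content hash; keep groups of 2 or more."""
    file_hashes = defaultdict(list)
    for path in Path(top).rglob('*'):
        if path.is_file():
            file_hashes[hash_file(path)].append(path)
    return [paths for paths in file_hashes.values() if len(paths) > 1]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files: two identical, one different.
    Path(tmp, 'one.txt').write_bytes(b'same contents')
    Path(tmp, 'two.txt').write_bytes(b'same contents')
    Path(tmp, 'three.txt').write_bytes(b'different contents')

    duplicate_names = [
        sorted(path.name for path in paths)
        for paths in find_duplicates(tmp)
    ]

print(duplicate_names)  # [['one.txt', 'two.txt']]
```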

Start using pathlib.Path objects


Let’s recap.

The / separators in pathlib.Path strings are automatically converted to the correct path separator based on the operating system you’re on. This is a huge feature that
can make for code that is more readable and more certain to be free of path-related bugs.

>>> path1 = Path('dir', 'file')
>>> path2 = Path('dir') / 'file'
>>> path3 = Path('dir/file')
>>> path3
WindowsPath('dir/file')
>>> path1 == path2 == path3
True

The Python standard library and built-ins (like open) also accept pathlib.Path objects now. This means you can start using pathlib, even if your dependencies
don’t!

from shutil import move

def rename_and_redirect(old, new):
    move(old, new)
    with open(old, mode='wt') as f:
        f.write(f'This file has moved to {new}')

>>> from pathlib import Path
>>> old, new = Path('old.txt'), Path('new.txt')
>>> rename_and_redirect(old, new)
>>> old.read_text()
'This file has moved to new.txt'

And if you don’t like pathlib, you can use a third-party library that provides the same path-like interface. This is great because even if you’re not a fan of pathlib
you’ll still benefit from the new changes detailed in PEP 519.

>>> from plumbum import Path
>>> my_path = Path('old.txt')
>>> with open(my_path) as f:
...     print(f.read())
...
This file has moved to new.txt

While pathlib is sometimes slower than the alternative(s), the cases where this matters are somewhat rare (in my experience at least) and you can always jump
back to using path strings for parts of your code that are particularly performance sensitive.

And in general, pathlib makes for more readable code. Here’s a succinct and descriptive Python script to demonstrate my point:

from pathlib import Path
gitignore = Path('.gitignore')
if gitignore.is_file():
    print(gitignore.read_text(), end='')

The pathlib module is lovely: start using it!



Posted by Trey Hunner Jan 15th, 2019 11:20 am python

Copyright © 2022 - Trey Hunner - Powered by Octopress
https://treyhunner.com/2019/01/no-really-pathlib-is-great/