Python List Directory, Subdirectory, and Files - Stack Overflow

12/07/2022 19:39 Python list directory, subdirectory, and files - Stack Overflow

Python list directory, subdirectory, and files


Asked 12 years, 1 month ago Modified 3 months ago Viewed 298k times

I'm trying to make a script to list all directories, subdirectories, and files in a given directory.
I tried this:

import sys, os

root = "/home/patate/directory/"
path = os.path.join(root, "targetdirectory")

for r, d, f in os.walk(path):
    for file in f:
        print(os.path.join(root, file))

Unfortunately, it doesn't work properly.


I get all the files, but not their complete paths.

For example, if the directory structure is:

/home/patate/directory/targetdirectory/123/456/789/file.txt

It would print:

/home/patate/directory/targetdirectory/file.txt

What I need is the first result. Any help would be greatly appreciated! Thanks.

python file path

edited Mar 19 at 8:07 by Martin Thoma; asked May 26, 2010 at 3:38 by thomytheyon

12 Answers

Use os.path.join to concatenate the directory and file name:

for path, subdirs, files in os.walk(root):
    for name in files:
        print(os.path.join(path, name))

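Note the usage of path and not root in the concatenation: path is the directory os.walk is
currently visiting, whereas root is always the top-level directory, so joining with root would
drop the intermediate subdirectories.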
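
To make the difference concrete, here is a minimal sketch that recreates the nested layout from the question inside a temporary directory. The helper name list_files is hypothetical and only used for illustration.

```python
import os
import tempfile


def list_files(top):
    """Return the full path of every file below `top`."""
    found = []
    for path, subdirs, files in os.walk(top):
        for name in files:
            # `path` is the directory currently being visited,
            # so joining it with `name` gives the file's real location.
            found.append(os.path.join(path, name))
    return found


# Recreate the nested layout from the question inside a temp directory.
with tempfile.TemporaryDirectory() as top:
    nested = os.path.join(top, "123", "456", "789")
    os.makedirs(nested)
    open(os.path.join(nested, "file.txt"), "w").close()

    for full_path in list_files(top):
        # Show the path relative to `top` so the nesting is easy to see.
        print(os.path.relpath(full_path, top))
```

Joining with the top-level directory instead of path would lose the 123/456/789 part, which is exactly the symptom described in the question.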

In Python 3.4, the pathlib module was added for easier path manipulations. So the equivalent
to os.path.join would be:

pathlib.PurePath(path, name)

The advantage of pathlib is that you can use a variety of useful methods on paths. If you use
the concrete Path variant, you can also make actual OS calls through it, such as checking whether
the path exists, deleting it, opening the file it points to, and much more.
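
As a rough sketch of how the concrete Path variant combines with os.walk, the hypothetical generator walk_as_paths below yields Path objects instead of strings. The temporary directory and file names are made up for the demonstration.

```python
import os
import tempfile
from pathlib import Path


def walk_as_paths(top):
    """Yield a pathlib.Path for every file found by os.walk under `top`."""
    for path, subdirs, files in os.walk(top):
        for name in files:
            # Path(path, name) joins the parts much like os.path.join does.
            yield Path(path, name)


with tempfile.TemporaryDirectory() as top:
    Path(top, "sub").mkdir()
    Path(top, "sub", "notes.txt").write_text("hello")
    for p in walk_as_paths(top):
        # Concrete Path objects can query the file system directly.
        print(p.name, p.suffix, p.stat().st_size)  # notes.txt .txt 5
```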

edited Dec 14, 2020 at 11:38 by Rui Vieira; answered May 26, 2010 at 3:46 by Eli Bendersky

This is the one and only useful answer for the many questions that have been asked concerning "how to
get all files recursively in python". – harrisonfooord Oct 12, 2018 at 14:37

List comprehension: all_files = [os.path.join(path, name) for path, subdirs, files in
os.walk(folder) for name in files] – Nir Aug 12, 2019 at 14:40

In Python 3, use parentheses for the print function: print(os.path.join(path, name)) – Ehsan Aug 7,
2020 at 11:31

Just in case... Getting all files in the directory and subdirectories matching some pattern (*.py
for example):
import os
from fnmatch import fnmatch

root = '/some/directory'
pattern = "*.py"

for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            print(os.path.join(path, name))
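
If more than one pattern is needed, the same loop can test each name against a list of patterns. This is only a sketch; find_matching and its parameters are hypothetical names, not part of the answer above.

```python
import os
from fnmatch import fnmatch


def find_matching(top, patterns):
    """Yield paths under `top` whose file name matches any of `patterns`."""
    for path, subdirs, files in os.walk(top):
        for name in files:
            # any() stops at the first pattern that matches this name.
            if any(fnmatch(name, pattern) for pattern in patterns):
                yield os.path.join(path, name)
```

For example, find_matching('/some/directory', ['*.py', '*.md']) would yield both Python and Markdown files. Note that fnmatch normalizes case according to the operating system's rules; fnmatch.fnmatchcase can be used when the comparison must always be case-sensitive.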

edited Mar 29 at 18:25; answered Nov 4, 2012 at 0:38 by Ivan Pirog

In Python 3, use parentheses for the print function: print(os.path.join(path, name)). You can also use
print(pathlib.PurePath(path, name)). – Ahmad Ismail Jul 5, 2021 at 9:28

Couldn't comment so writing answer here. This is the clearest one-line I have seen:

import os
[os.path.join(path, name) for path, subdirs, files in os.walk(root) for name in files]

answered Jan 4, 2019 at 1:15 by Mong H. Ng

This is the answer for all you googlers. – Matt Feb 23 at 9:58

Here is a one-liner:

import os

[val for sublist in [[os.path.join(i[0], j) for j in i[2]] for i in os.walk('./')] for val in sublist]
# Meta comment to ease selecting text

The outermost val for sublist in ... loop flattens the list to be one-dimensional. The j
loop collects a list of every file basename and joins it to the current path. Finally, the i loop
iterates over all directories and subdirectories.

This example uses the hard-coded path ./ in the os.walk(...) call; you can substitute any
path string you like.

Note: os.path.expanduser and/or os.path.expandvars can be used for path strings like ~/

Extending this example:

It's easy to add in file basename tests and directory name tests.

For example, testing for *.jpg files:

... for j in i[2] if j.endswith('.jpg')] ...

Additionally, excluding the .git directory:

... for i in os.walk('./') if '.git' not in i[0].split('/')]
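
The filter above still walks into .git and only discards those results afterwards. As an alternative sketch (walk_files and skip_dirs are hypothetical names), os.walk in its default top-down mode lets you edit the subdirectory list in place so that excluded directories are never entered:

```python
import os


def walk_files(top, skip_dirs=(".git",)):
    """Yield file paths under `top`, never descending into `skip_dirs`."""
    for path, subdirs, files in os.walk(top):  # topdown=True is the default
        # Assigning to the slice edits the very list os.walk recurses from,
        # so the removed directories are not visited at all.
        subdirs[:] = [d for d in subdirs if d not in skip_dirs]
        for name in files:
            yield os.path.join(path, name)
```

This tends to matter for large ignored trees, since the contents of the skipped directories are never listed.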

edited May 20, 2015 at 16:15; answered Sep 26, 2014 at 21:03 by ThorSummoner

It does work, but to exclude the .git directory you need to check that '.git' is NOT in the path. – Roman Rdgz
May 19, 2015 at 10:32
Yep. Should be if '.git' not in i[0].split('/')] – Roman Rdgz May 19, 2015 at 15:24

I would recommend os.walk over a manual dirlisting loop, generators are great, go use them.
– ThorSummoner Dec 4, 2016 at 19:18

A bit simpler one-liner:

import os
from itertools import product, chain

# `dir` stands for the top-level directory to walk; the result is a lazy iterator of paths.
chain.from_iterable([[os.sep.join(w) for w in product([i[0]], i[2])] for i in os.walk(dir)])

edited Apr 25, 2020 at 8:47 by Jean-François Fabre; answered Feb 21, 2018 at 8:44 by Daniel

how do I list each file ? – Aakash Gupta Sep 22, 2021 at 4:43

You can take a look at this sample I made. Beware: it uses the os.path.walk function, which is
deprecated in Python 2 and was removed in Python 3. It uses a list to store all the file paths.

import os
import fnmatch

root = "Your root directory"
ex = ".txt"
where_to = "Wherever you wanna write your file to"

def fileWalker(ext, dirname, names):
    '''
    checks files in names'''
    pat = "*" + ext[0]
    for f in names:
        if fnmatch.fnmatch(f, pat):
            ext[1].append(os.path.join(dirname, f))

def writeTo(fList):
    with open(where_to, "w") as f:
        for di_r in fList:
            f.write(di_r + "\n")

if __name__ == '__main__':
    li = []
    os.path.walk(root, fileWalker, [ex, li])  # Python 2 only

    writeTo(li)
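
Since os.path.walk does not exist in Python 3, a roughly equivalent sketch built on os.walk could look like the following. The names collect_files and write_to are hypothetical stand-ins for fileWalker and writeTo.

```python
import fnmatch
import os


def collect_files(root, ext):
    """Return paths of all files under `root` whose name ends with `ext`."""
    matches = []
    for dirname, subdirs, names in os.walk(root):
        # fnmatch.filter keeps only the names matching the pattern.
        for name in fnmatch.filter(names, "*" + ext):
            matches.append(os.path.join(dirname, name))
    return matches


def write_to(paths, where_to):
    """Write one path per line to the file `where_to`."""
    with open(where_to, "w") as f:
        for p in paths:
            f.write(p + "\n")
```

One difference to keep in mind: the names list passed to an os.path.walk callback contained both files and directories, whereas this sketch only looks at the file names that os.walk reports.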

answered May 4, 2013 at 23:02 by devsaw


Since every example here is just using walk (with join), I'd like to show a nice example and
comparison with listdir:
import os, time

def listFiles1(root):  # listdir
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0) + "/"; items = os.listdir(folder)  # items = folders + files
        for i in items:
            i = folder + i
            (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles2(root):  # listdir/join (takes ~1.4x as long) (and uses '\\' instead)
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0); items = os.listdir(folder)  # items = folders + files
        for i in items:
            i = os.path.join(folder, i)
            (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles3(root):  # walk (takes ~1.5x as long)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files:
            allFiles += [folder.replace("\\", "/") + "/" + file]  # folder+"\\"+file still ~1.5x
    return allFiles

def listFiles4(root):  # walk/join (takes ~1.6x as long) (and uses '\\' instead)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files:
            allFiles += [os.path.join(folder, file)]
    return allFiles

for i in range(100): files = listFiles1("src")  # warm up

start = time.time()
for i in range(100): files = listFiles1("src")  # listdir
print("Time taken: %.2fs" % (time.time() - start))  # 0.28s

start = time.time()
for i in range(100): files = listFiles2("src")  # listdir and join
print("Time taken: %.2fs" % (time.time() - start))  # 0.38s

start = time.time()
for i in range(100): files = listFiles3("src")  # walk
print("Time taken: %.2fs" % (time.time() - start))  # 0.42s

start = time.time()
for i in range(100): files = listFiles4("src")  # walk and join
print("Time taken: %.2fs" % (time.time() - start))  # 0.47s

So as you can see for yourself, in these timings the listdir version is the fastest (0.28s,
versus 0.42s to 0.47s for the walk versions), and join adds noticeable overhead.
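
These timings depend on the Python version and platform. Since Python 3.5, os.walk itself is built on os.scandir, so the gap may look different on a current interpreter. For comparison, here is a hedged sketch of the same manual traversal using os.scandir directly; the function name list_files_scandir is hypothetical and no timing claim is made for it.

```python
import os


def list_files_scandir(root):
    """Return all file paths under `root`, using an explicit stack and os.scandir."""
    all_files = []
    stack = [root]
    while stack:
        folder = stack.pop()
        with os.scandir(folder) as entries:
            for entry in entries:
                # DirEntry caches the file type, which usually avoids an extra stat call.
                # Symlinked directories are not followed here; they are listed as entries.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    all_files.append(entry.path)
    return all_files
```

Unlike listFiles1, which uses os.path.isdir and therefore follows symbolic links to directories, this sketch does not descend into them.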

edited Feb 1, 2019 at 7:13; answered Feb 1, 2019 at 6:37 by Puddle


This is just an addition: with it you can export the file paths to CSV format.

import sys, os
from fnmatch import fnmatch

try:
    import pandas as pd
except ImportError:
    os.system("pip3 install pandas")
    import pandas as pd

root = "/home/kiran/Downloads/MainFolder"  # it may have many subfolders and files inside
lst = []
pattern = "*.csv"  # I want to get only csv files
pattern = "*.*"  # Note: this overrides the line above and matches every file name containing a dot

for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            lst.append(os.path.join(path, name))

df = pd.DataFrame({"filePaths": lst})
df.to_csv("filepaths.csv")
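
If pandas is not otherwise needed, the same list can be written with the standard library's csv module. This is a sketch under that assumption; write_file_list and csv_path are hypothetical names.

```python
import csv
import os


def write_file_list(root, csv_path):
    """Write the path of every file under `root` to `csv_path`, one per row."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filePaths"])  # header row, same column name as above
        for path, subdirs, files in os.walk(root):
            for name in files:
                writer.writerow([os.path.join(path, name)])
```

Unlike DataFrame.to_csv with its default settings, this does not write an index column. If csv_path lies inside root, the output file will list itself.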

answered Feb 1, 2021 at 8:49 by kiran beethoju

Another option would be using the glob module from the standard lib:

import glob

my_path = "/home/patate/directory/targetdirectory/**"

for path in glob.glob(my_path, recursive=True):
    print(path)

If you need an iterator you can use iglob as an alternative:

for file in glob.iglob(my_path, recursive=True):
    # ...
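
A recursive glob returns directories as well as files. The following sketch keeps only regular files; glob_files is a hypothetical helper, and the pattern is built with os.path.join rather than taken from the answer.

```python
import glob
import os


def glob_files(top):
    """Return every regular file under `top` found via a recursive glob."""
    # "**" matches zero or more directory levels when recursive=True.
    pattern = os.path.join(top, "**", "*")
    return [p for p in glob.iglob(pattern, recursive=True) if os.path.isfile(p)]
```

By default, glob does not match names that begin with a dot, so hidden files and the contents of hidden directories are left out, unlike with os.walk.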

answered Nov 11, 2021 at 23:25 by Rotareti

Using any supported Python version (3.4+), you should use Path.rglob from pathlib to recursively
list the contents of the current directory and all subdirectories:

from pathlib import Path

def generate_all_files(root: Path, only_files: bool = True):
    for p in root.rglob("*"):
        if only_files and not p.is_file():
            continue
        yield p

for p in generate_all_files(Path("."), only_files=False):
    print(p)

If you want something copy-pasteable:

Example
Folder structure:

$ tree . -a
.
├── a.txt
├── bar
├── b.py
├── collect.py
├── empty
├── foo
│   └── bar.bz.gz2
├── .hidden
│   └── secrect-file
└── martin
    └── thoma
        └── cv.pdf

gives:

$ python collect.py
bar
empty
.hidden
collect.py
a.txt
b.py
martin
foo
.hidden/secrect-file
martin/thoma
martin/thoma/cv.pdf
foo/bar.bz.gz2
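
The order of the listing above is whatever the file system happens to return. When a reproducible order matters, one option is to sort the relative paths, as in this sketch (relative_listing is a hypothetical name):

```python
from pathlib import Path


def relative_listing(root):
    """Return sorted POSIX-style paths of everything under `root`, relative to it."""
    root = Path(root)
    # rglob("*") yields files and directories, including hidden ones.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```

Calling relative_listing(".") in the example folder would return the same entries as shown above, only in sorted order.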

answered Mar 19 at 8:20 by Martin Thoma

A pretty simple solution would be to run a couple of subprocess calls to export the files into
CSV format:


import subprocess

# Global variables for directory being mapped
location = '.'  # Enter the path here.
pattern = '*.py'  # Use this if you want to only return certain filetypes
rootDir = location.rpartition('/')[-1]
outputFile = rootDir + '_directory_contents.csv'

# Find the requested data and export to CSV, specifying a pattern if needed.
# The pattern is quoted so the shell does not expand it; -fprintf requires GNU find.
# %T+ is the last modification time, matching the ModifiedTime header used below.
find_cmd = ('find ' + location + ' -name "' + pattern + '" -fprintf ' +
            outputFile + ' "%Y%M,%n,%u,%g,%s,%T+,%P\n"')
subprocess.call(find_cmd, shell=True)

That command produces comma separated values that can be easily analyzed in Excel.

f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py

The resulting CSV file doesn't have a header row, but you can use a second command to add
one.

# Add headers to the CSV
headers_cmd = 'sed -i.bak 1i"Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath" ' + outputFile
subprocess.call(headers_cmd, shell=True)
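
The find and sed commands above rely on GNU tools and on shell=True, which can misbehave with paths containing spaces. As a portable alternative, the following sketch collects a subset of the same columns with os.lstat and the csv module. The name export_listing is hypothetical, and owner, group, and link count are deliberately left out.

```python
import csv
import os
import stat
import time


def export_listing(top, csv_path):
    """Write permissions, size, mtime and relative path of each file to a CSV."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Permissions", "Size", "ModifiedTime", "FilePath"])
        for path, subdirs, files in os.walk(top):
            for name in files:
                full = os.path.join(path, name)
                info = os.lstat(full)  # lstat: do not follow symlinks
                writer.writerow([
                    stat.filemode(info.st_mode),  # e.g. -rw-r--r--
                    info.st_size,
                    time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(info.st_mtime)),
                    os.path.relpath(full, top),
                ])
```

Because the header row is written first, no second pass over the file is needed.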

Depending on how much data you get back, you can massage it further using Pandas. Here
are some things I found useful, especially if you're dealing with many levels of directories to
look through.

Add these to your imports:

import numpy as np
import pandas as pd

Then add this to your code:

# To convert sizes to be more human readable (defined first so it exists when used below)
def convert_bytes(size):
    for x in ['b', 'K', 'M', 'G', 'T']:
        if size < 1024:
            return "%3.1f %s" % (size, x)
        size /= 1024
    return "%3.1f %s" % (size, 'P')

# Create DataFrame from the csv file created above.
df = pd.read_csv(outputFile)

# Format columns
# Get the filename and file extension from the filepath
df['FileName'] = df['FilePath'].str.rsplit("/", n=1).str[-1]
df['FileExt'] = df['FileName'].str.rsplit('.', n=1).str[1]

# Get the full path to the files. If the path doesn't include a "/" it's the root directory
df['FullPath'] = df["FilePath"].str.rsplit("/", n=1).str[0]
df['FullPath'] = np.where(df['FullPath'].str.contains("/"), df['FullPath'], rootDir)

# Split the path into columns for the parent directory and its children
df['ParentDir'] = df['FullPath'].str.split("/", n=1).str[0]
df['SubDirs'] = df['FullPath'].str.split("/", n=1).str[1]
# Account for NaN returns, indicates the path is the root directory
df['SubDirs'] = df['SubDirs'].fillna('')

# Determine if the item is a directory or file.
df['Type'] = np.where(df['Permissions'].str.startswith('d'), 'Dir', 'File')

# Split the time stamp into date and time columns
df[['ModifiedDate', 'Time']] = df.ModifiedTime.str.rsplit('+', n=1, expand=True)
df['Time'] = df['Time'].str.split('.').str[0]

# Show only files, output includes paths so you don't necessarily need to
# display the individual directories.
df = df[df['Type'].str.contains('File')]

# Set columns to show and their order.
# 'FileExt' is used here because no 'DocType' column is created above.
df = df[['FileName', 'ParentDir', 'SubDirs', 'FullPath', 'FileExt', 'ModifiedDate', 'Time', 'Size']]

# Convert the filesize from bytes to something more readable.
df['Size'] = df['Size'].apply(convert_bytes)

# Send the data to an Excel workbook with sheets by parent directory
with pd.ExcelWriter("scripts_directory_contents.xlsx") as writer:
    for directory, data in df.groupby('ParentDir'):
        data.to_excel(writer, sheet_name=directory, index=False)

answered Jun 1, 2021 at 4:34 by CMSG

And this is how to list the files if they are on SharePoint. Your path will probably start after
the "\teams\" part:

import os
root = r"\\mycompany.sharepoint.com@SSL\DavWWWRoot\teams\MyFolder\Policies and Procedures\Deal Docs\My Deals"
file_list = [os.path.join(path, name) for path, subdirs, files in os.walk(root) for name in files]
print(file_list)

answered Nov 5, 2021 at 5:03 by Chadee Fouad