How To Find Anything Under Linux: Find Largest or Newest Files

How to Find Anything Under Linux
The Linux find, grep, and awk commands are amazing power tools for fine-grained file searches, and for finding things inside files. With them you can find the largest and newest files on a system, fine-tune search parameters, search for text inside files, and perform some slick user management tricks.
Find Largest or Newest Files

The find command can do nearly anything, if you can figure out how. This example hunts down space hogs by finding the 10 largest files on your system and sorting them from smallest to largest in human-readable form:
# find / -type f -exec du {} \; 2>/dev/null | sort -n | tail -n 10 | xargs -n 1 du -h 2>/dev/null
1.2G    /home/carla/.local/share/Trash/files/download
1.3G    /home/carla/sda1/carla/.VirtualBox/Machines/ubuntu-hoary/Snapshots/{671041dd-700c-4506-68a8-7edfcd0e3c58}.vdi
2.2G    /home/carla/.local/share/Trash/files/dreamstudio.iso
[...]
These results remind me why I don't like having a Trash bin, because when I delete something I mean it, by cracky. This command is a brute-force search of the entire filesystem and may take a few minutes to run, so use it as an excuse to go have a quick healthy walk outside. Of course you can modify the command to search whatever directories you want; for example, use find /var/ to hunt down obese logfiles.
Let's dissect the command. find / -type f means search for regular files everywhere under the root directory. The -exec option is for incorporating other commands, in this case du, the disk usage command. -exec du {} \; means run du on every file to get its disk usage, which GNU du reports in 1,024-byte blocks (kilobytes) by default. 2>/dev/null sends all error messages to the bitbucket, so they don't clutter up your results. You can delete both 2>/dev/null occurrences and rerun the command if you're curious about what you're missing. sort -n puts all the files in numerical order by size, and tail -n 10 displays the last 10, which thanks to the sort are the largest. You could stop there, and then your output would look like this:
1206316 /home/carla/.local/share/Trash/files/download
2209784 /home/carla/.local/share/Trash/files/dreamstudio.iso
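xargs -n 1 du -h adds the final refinement, converting the sizes into an easy-to-read format. xargs splits each line at whitespace and runs du -h once per word. du -h therefore reports on each filename, and the size numbers, which are not filenames, produce errors that the final 2>/dev/null discards. For the same reason, filenames containing spaces will not survive this step.
You can easily find all files on your system that were modified in the last five minutes: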
# find / -mmin -5 -type f
This command finds all files modified between 10 and 20 minutes ago:
# find / -mmin +10 -mmin -20 -type f
+10 means more than 10 minutes ago, and -20 means less than 20. If you do not use a plus or minus, it means that number exactly. Use -mtime to search by 24-hour days. If you want to find directories, use -type d.
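You can try both searches from this section on a throwaway directory before pointing them at /. The following is a minimal sketch that assumes bash plus GNU findutils and coreutils (`sort -h`, and `touch -d` with relative dates). It also swaps in two GNU conveniences in place of the xargs round trip: `-exec du -h {} +` passes many files to each du run, and `sort -h` sorts human-readable sizes such as 12K and 1.2G directly. The function names `largest_files` and `modified_between` and the sandbox file names are hypothetical.

```shell
# Sketch assuming bash plus GNU findutils and coreutils
# (sort -h, and touch -d with relative dates such as '15 minutes ago').

# largest_files DIR N (hypothetical helper): print the N largest files under DIR,
# smallest first, with human-readable sizes. "{} +" passes many files to each
# du run, and sort -h understands the K/M/G suffixes that du -h prints.
largest_files() {
    find "$1" -type f -exec du -h {} + 2>/dev/null | sort -h | tail -n "$2"
}

# modified_between DIR MIN MAX (hypothetical helper): regular files under DIR
# last modified more than MIN and less than MAX minutes ago.
modified_between() {
    find "$1" -type f -mmin +"$2" -mmin -"$3"
}

# Throwaway sandbox so the demo never touches real data.
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT

# Random bytes, so a compressing filesystem cannot shrink what du reports.
head -c 10240  /dev/urandom > "$sandbox/small.bin"    # 10 KiB
head -c 204800 /dev/urandom > "$sandbox/medium.bin"   # 200 KiB
head -c 409600 /dev/urandom > "$sandbox/large.bin"    # 400 KiB
touch -d '15 minutes ago' "$sandbox/small.bin"
touch -d '30 minutes ago' "$sandbox/medium.bin"

largest_files "$sandbox" 2          # medium.bin, then large.bin
modified_between "$sandbox" 10 20   # small.bin only
```

To use it on a real system, call something like `largest_files /var 10`. Only large.bin is excluded from the time window, because it was modified just now.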
Searching Multiple Directories

You can list multiple arbitrary directories in which to search like this:
# find /etc /var /mnt /media -xdev -mmin -5 -type f
-xdev limits the search to the filesystem that each starting directory is on, so find will not descend into any other mounted filesystems. By default find does not follow symlinks, so -xdev is all you need to stay inside a filesystem rather than go wandering through network shares and removable devices.
Excluding Directories
You can narrow your searches by excluding directories with the -prune option. -prune is a little weird; you have to think backwards. This example searches the whole filesystem except for the /proc and /sys pseudo-filesystems:
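# find / \( -name proc -o -name sys \) -prune -o -type f -mmin -1 -print
First you name the directories to exclude, where -o means or, and escape the parentheses so the shell passes them to find. Then -prune -o means don't look in the previously named directories. The final -print matters: without it, find also prints the names of the pruned directories themselves. Note that -name proc matches any directory named proc, not just /proc; use -path /proc if you want only that one. I like to use -prune to exclude web browser caches, because they clutter the results. The following example does that, and also prints the date and time for each file: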
$ find / \( -name proc -o -name sys -o -name .mozilla -o -name chromium \) -prune -o -type f -mmin -10 -printf "%Ac\t%p\n"
Wed 28 Sep 2011 10:34:54 AM PDT /home/carla/.local/share/akonadi/db_data/ib_logfile0
Wed 28 Sep 2011 10:34:54 AM PDT /home/carla/.local/share/akonadi/db_data/ibdata1
Wed 28 Sep 2011 05:21:48 PM PDT /home/carla/articles/findgrep.html
The -printf option is print format. Use -printf when you want to control the formatting of your output. You get to specify newlines, date and time formatting, and file attributes such as permissions, ownership, and time stamps. %Ac prints the file's last access date and time in your locale's format. %Tc prints the last modification time instead, which is the time -mmin tests. \t inserts a tab, %p prints the full filename, and \n inserts a newline. As you can see, find has a lot of built-in functionality that people often reach for the ls command to get.
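The backwards logic of -prune and the -printf directives are both easier to see in a scratch tree. This is a minimal sketch assuming GNU find (-printf is a GNU extension) and GNU touch (-d); the directory names, file names, and date are all made up. It runs one prune expression three ways: with no action, with an explicit -print, and with -printf using modification-time directives.

```shell
# Sketch assuming GNU find and GNU touch; all names and the date are hypothetical.
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
mkdir "$sandbox/keep" "$sandbox/cache" "$sandbox/trash"
touch "$sandbox/cache/junk.tmp" "$sandbox/trash/old.tmp"
printf 'hello\n' > "$sandbox/keep/findgrep.html"            # 6 bytes
touch -d '2011-09-28 17:21' "$sandbox/keep/findgrep.html"   # known mtime, local zone

# 1. No explicit action: the implicit -print covers the whole expression,
#    so the pruned "cache" and "trash" directories are listed too.
find "$sandbox" \( -name cache -o -name trash \) -prune -o -type f

# 2. Explicit -print on the right-hand branch: only keep/findgrep.html.
find "$sandbox" \( -name cache -o -name trash \) -prune -o -type f -print

# 3. -printf instead of -print: modification date and time (%T...),
#    size in bytes (%s), and base name (%f), separated by tabs.
find "$sandbox" \( -name cache -o -name trash \) -prune -o -type f \
    -printf '%TY-%Tm-%Td %TH:%TM\t%s\t%f\n'
```

The third command prints `2011-09-28 17:21`, a tab, `6`, a tab, and `findgrep.html`.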
Finding File Types

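Searching by file extension is easy too. This example searches the current directory and everything beneath it for three different types of image files:
$ find . -type f \( -name "*.png" -o -name "*.jpg" -o -name "*.gif" \)
The escaped parentheses group the -name tests so that -type f applies to all three; without them, find would apply -type f only to the last pattern. Use the -name option to search on any part of a filename, either the extension or part of the name, using normal shell wildcards, and quote the pattern so the shell does not expand it first. For example, to find mysong.ogg you could search for "mys*" or "*song*". Use -iname for a case-insensitive search.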
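The grouping matters because find ANDs adjacent tests before it ORs them. This minimal sketch, using made-up file names in a scratch directory, shows a directory named like an image slipping through the ungrouped form, and -iname (supported by GNU and BSD find) also catching an uppercase extension.

```shell
# Sketch for GNU or BSD find; all file and directory names are hypothetical.
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
touch "$sandbox/a.png" "$sandbox/b.jpg" "$sandbox/c.gif" "$sandbox/d.txt" "$sandbox/E.PNG"
mkdir "$sandbox/album.png"   # a directory whose name looks like an image file

# Ungrouped: -type f binds only to the "*.gif" test, so the album.png directory slips in.
find "$sandbox" -name "*.png" -o -name "*.jpg" -o -name "*.gif" -type f

# Grouped: regular files only, matching any of the three patterns (a.png, b.jpg, c.gif).
find "$sandbox" -type f \( -name "*.png" -o -name "*.jpg" -o -name "*.gif" \)

# Case-insensitive: E.PNG matches as well.
find "$sandbox" -type f \( -iname "*.png" -o -iname "*.jpg" -o -iname "*.gif" \)
```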
Finding Duplicate Files

You can find duplicate files in a couple of ways. This command compares MD5 hashes:
$ find . -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 24
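This calculates an MD5 hash for every file, sorts the lines by hash, and prints only the lines whose hashes repeat, with a blank line separating each group of duplicates. -w 24 tells uniq to compare only the first 24 characters of each line. An MD5 hash is 32 hexadecimal characters, so use -w 32 to compare the whole hash. The second way is to match files by file size: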
$ find . -type f -printf "%p - %s\n" | sort -nr -k3 | uniq -D -f1
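This prints each file's path and size, sorts by the size in the third field, and uses uniq -D -f1 to skip the path and print every line whose size matches another's. Because fields are split at whitespace, filenames containing spaces throw it off. MD5 hashes are more accurate, since files of the same size are not necessarily identical, but matching file sizes is faster.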
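The accuracy difference shows up with two files that share a length but not their content. This is a minimal sketch assuming GNU coreutils (md5sum, uniq -w and -D) and GNU find (-printf), with made-up file names. It uses the full 32-character hash width and `-exec ... {} +` so md5sum runs on many files at once.

```shell
# Sketch assuming GNU coreutils and GNU find; file names are hypothetical.
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
printf 'same bytes\n'  > "$sandbox/one.txt"     # 11 bytes
printf 'same bytes\n'  > "$sandbox/two.txt"     # identical copy of one.txt
printf 'diff bytes\n'  > "$sandbox/four.txt"    # also 11 bytes, different content
printf 'other bytes\n' > "$sandbox/three.txt"   # 12 bytes

# Hash comparison over all 32 hex characters: reports only one.txt and two.txt.
find "$sandbox" -type f -exec md5sum {} + | sort | uniq -w 32 --all-repeated=separate

# Size comparison: also reports four.txt, which merely has the same length.
find "$sandbox" -type f -printf "%p - %s\n" | sort -nr -k3 | uniq -D -f1
```

A common compromise is to use sizes to find candidates and then hash only those.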
Finding Text Inside Files

The grep command is endlessly useful for searching inside text files to find things. Suppose you have a directory full of configuration files for a server, and you want to search all of them to find all of your test entries. If you were foresightful you used the word test in all of them, so this command will find them:
# grep -inR -A2 test /etc/fooserver/
This tells grep to do a case-insensitive (-i), recursive (-R) search for test in all the files in the /etc/fooserver/ directory, and to print the two lines following each matching line (-A2). The -n option prints line numbers, which is a nice bonus in large files.
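Here is the same grep run against a scratch copy of a config directory, so you can see how matches and context lines are labeled. This is a minimal sketch assuming GNU grep. The fooserver directory and its contents are made up, standing in for the hypothetical /etc/fooserver/.

```shell
# Sketch assuming GNU grep; the directory and config lines are hypothetical.
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
mkdir "$sandbox/fooserver"
printf '%s\n' 'port = 80' '# TEST entry' 'host = localhost' 'timeout = 5' 'log = on' \
    > "$sandbox/fooserver/main.conf"

# -i case-insensitive, -n line numbers, -R recursive, -A2 two trailing context lines.
# Matching lines are labeled "file:line:" and context lines "file-line-".
grep -inR -A2 test "$sandbox/fooserver"
```

The match on line 2 is printed with colons, and lines 3 and 4 follow as context with hyphens. When matches in the same file are more than two lines apart, grep separates the groups with a `--` line.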
Finding Blocks of Text

The awk command can find blocks of related text in a way that grep can't, using this simple syntax: awk '/start-pattern/,/stop-pattern/'. This prints every line from one that matches start-pattern through the next line that matches stop-pattern. Suppose you want to see expanded information from lspci for just your Ethernet device:
$ lspci -v | awk '/[Ee]thernet/,/^$/'
08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
    Subsystem: Lenovo Device 2131
    Flags: bus master, fast devsel, latency 0, IRQ 46
    I/O ports at 3000 [size=256]
    Memory at f2004000 (64-bit, prefetchable) [size=4K]
    Memory at f2000000 (64-bit, prefetchable) [size=16K]
    [virtual] Expansion ROM at f2020000 [disabled] [size=128K]
    Capabilities: <access denied>
    Kernel driver in use: r8169
    Kernel modules: r8169
You need to know the beginning and end of the block that you want to see, so it's a great tool for quickly snagging sections of configuration files. This example takes advantage of configuration blocks delimited with curly braces, and homes in on the listen directives in radiusd.conf:
# awk '/listen {/,/}/' /etc/freeradius/radiusd.conf
listen {
    ipaddr = *
    # ipv6addr = ::
    port = 0
    type = acct
    # interface = eth0
    # clients = per_socket_clients
}
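A range pattern fires once for every block it finds, not only the first one. This minimal sketch uses a made-up config file to show that. It writes the braces as bracket expressions, `[{]` and `[}]`, which match a literal brace in any POSIX awk. That avoids depending on how a particular awk treats a bare `{` in a regular expression.

```shell
# Sketch for any POSIX awk; the config file contents are hypothetical.
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cat > "$sandbox/demo.conf" <<'EOF'
modules {
    unix
}
listen {
    type = auth
}
listen {
    type = acct
}
EOF

# Print from each line matching "listen {" through the next line containing "}".
# The modules block is skipped because its opening line does not match.
awk '/listen [{]/,/[}]/' "$sandbox/demo.conf"
```

The output is both listen blocks, six lines in all. Note that a block containing a nested closing brace would end the range early.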
Managing Users and Files

Employees leave, and file ownership and permissions get messed up on an organization's systems, but don't worry: find can help you set things right quickly. You can find all files that belong to a specified username:
# find / -user carla
Or to a group:
# find / -group admins
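Before you point these at /, you can check how the ownership tests behave on a scratch directory you own. This is a minimal sketch for Linux with GNU find, run as an ordinary user. It assumes new files get your primary group, which is the usual Linux behavior outside setgid directories. The file names are made up, and it also previews the numeric -uid test.

```shell
# Sketch for Linux with GNU find, run as an ordinary user; file names are hypothetical.
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
touch "$sandbox/report.txt" "$sandbox/notes.txt"

me=$(id -un)        # current user name
my_uid=$(id -u)     # current numeric UID
my_group=$(id -gn)  # current primary group name

# All three searches should list the same two files.
# (-user also accepts a numeric ID; -uid accepts only a number.)
find "$sandbox" -type f -user "$me"
find "$sandbox" -type f -uid "$my_uid"
find "$sandbox" -type f -group "$my_group"
```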
You can also search by UID and GID with the -uid and -gid options. You can then move all of a user's files to another user by either username or UID:
# find / -uid 1100 -ok chown -v 1200 {} \;
# find / -user carla -ok chown -v steven {} \;
Of course this works for changing group membership as well:

# find / -group carla -ok chgrp -v admins {} \;
The -ok option asks you to confirm each and every change. Replace it with -exec if you're confident about your changes. When employees leave you may have a policy of deleting their files, which find can do with ease:
# find / -user 1100 -exec rm {} \;
Of course you want to be very sure you have it right, because find won't nag you and ask if you are sure. It will simply do what you tell it to. Running the same command with -print in place of -exec rm {} \; first shows you exactly what would be deleted. find, grep, and awk: with tools like these, and maybe a little help from their man pages, you can find just about anything on your Linux systems.
