How To Find Anything Under Linux: Find Largest or Newest Files
The Linux find, grep, and awk commands are amazing power tools for fine-grained file searches, and for finding things inside files. With them you can find the largest and newest files on a system, fine-tune search parameters, search for text inside files, and perform some slick user management tricks.
These results remind me why I don't like having a Trash bin, because when I delete something I mean it, by cracky. This command is a brute-force search of the entire filesystem and may take a few minutes to run, so use it as an excuse to go have a quick healthy walk outside. Of course you can modify the command to search whatever directories you want; for example, use find /var/ to hunt down obese logfiles.
Let's dissect the command. find / -type f means search all regular files in the entire root filesystem. The -exec option is for incorporating other commands, in this case du, the disk usage command. -exec du {} \; means run the du command on every file to get its size; note that GNU du reports this in 1,024-byte blocks by default, not in bytes. 2>/dev/null sends all error messages to the bitbucket, so they don't clutter up your results. You can delete both 2>/dev/null occurrences and rerun the command if you're curious about what you're missing. sort -n puts all the files in order by size, and tail -n 10 displays the last 10, which thanks to the sort are the largest. You could stop there, and then your output would look like this:
1206316 /home/carla/.local/share/Trash/files/download
2209784 /home/carla/.local/share/Trash/files/dreamstudio.iso
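The find, du, sort, and tail stages dissected above can be tried safely on a scratch directory before pointing them at a whole filesystem. The following is a minimal sketch, not the article's exact command: the directory and file names are made up for illustration, du is run in batches with {} + instead of once per file, and the GNU --apparent-size option is an assumption added so the numbers reflect file length rather than allocated blocks.

```shell
# Scratch directory so the sketch never scans the real filesystem.
demo=$(mktemp -d)
head -c 4096   /dev/urandom > "$demo/small.bin"
head -c 65536  /dev/urandom > "$demo/medium.bin"
head -c 262144 /dev/urandom > "$demo/large.bin"

# -exec du ... {} + runs du on batches of files instead of once per file.
# -k forces 1 KiB units; --apparent-size (GNU du) reports file length.
# sort -n orders by the leading size column; tail keeps the two largest.
largest=$(find "$demo" -type f -exec du -k --apparent-size {} + 2>/dev/null | sort -n | tail -n 2)
printf '%s\n' "$largest"

rm -rf "$demo"
```

Swapping "$demo" for a real directory such as /var gives the same kind of listing as the sample output above.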
xargs -n 1 du -h adds the final refinement, converting the raw sizes to an easy-to-read format.
You can easily find all files on your system that were changed in the last five minutes:
# find / -mmin -5 -type f
This command finds all files changed between 10 and 20 minutes ago:
# find / -mmin +10 -mmin -20 -type f
+10 means more than 10 minutes ago, and -20 means less than 20. If you do not use a plus or minus, it means that number exactly. Use -mtime to search by 24-hour days. If you want to find directories, use -type d.
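The plus and minus prefixes are easier to trust after seeing them on files of known age. This is a small sketch under a few assumptions: the file names are invented, and the ages are set with the relative-date form of GNU touch -d.

```shell
demo=$(mktemp -d)
# GNU touch -d accepts relative dates; these push modification times into the past.
touch -d '2 minutes ago'  "$demo/fresh.txt"
touch -d '15 minutes ago' "$demo/middle.txt"
touch -d '3 days ago'     "$demo/old.txt"

# Modified less than 5 minutes ago.
recent=$(find "$demo" -type f -mmin -5)
# Modified more than 10 but less than 20 minutes ago.
window=$(find "$demo" -type f -mmin +10 -mmin -20)
# Modified more than two full 24-hour periods ago.
aged=$(find "$demo" -type f -mtime +2)

printf 'recent: %s\nwindow: %s\naged:   %s\n' "$recent" "$window" "$aged"
rm -rf "$demo"
```

Each search should report exactly one of the three files: fresh.txt, middle.txt, and old.txt respectively.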
-xdev limits the search to the filesystem you are in and will not enter any other mounted filesystems. By default find does not follow symlinks, so you only need to include -xdev to stay inside a filesystem and not go wandering through network shares and removable devices.
Excluding Directories
You can narrow your searches by excluding directories with the prune option. prune is a little weird; you have to think backwards. This example searches the whole filesystem except for the /proc and /sys pseudo-directories:
# find / \( -name proc -o -name sys \) -prune -o -type f -mmin -1
First you name the directories to exclude, where -o means or, and escape the parentheses. Then -prune -o means don't look in the previously named directories. I like to use prune to exclude web browser caches, because they clutter the results. The following example does that, and also prints the date and time for each file:
$ find / \( -name proc -o -name sys -o -name .mozilla -o -name chromium \) -prune -o -type f -mmin -10 -printf "%Ac\t%p\n"
Wed 28 Sep 2011 10:34:54 AM PDT /home/carla/.local/share/akonadi/db_data/ib_logfile0
Wed 28 Sep 2011 10:34:54 AM PDT /home/carla/.local/share/akonadi/db_data/ibdata1
Wed 28 Sep 2011 05:21:48 PM PDT /home/carla/articles/findgrep.html
The -printf option is print format. Use -printf when you want to control the formatting of your output. You get to specify newlines, date and time formatting, and file attributes such as permissions, ownership, and time stamps. %Ac prints the date and time of the file's last access (use %Tc for the last modification instead), \t inserts a tab, %p prints the full filename, and \n inserts a newline. As you can see, find has a lot of built-in functionality that people often add the ls command for.
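One detail of -prune that is easy to miss: when the expression has no explicit action, find applies its default -print to the whole expression, so the pruned directory itself shows up in the results. A minimal sketch with invented directory names illustrates the difference an explicit -print makes.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/cache" "$demo/docs"
touch "$demo/cache/junk.tmp" "$demo/docs/notes.txt"

# No explicit action: the implied -print covers the whole expression,
# so the pruned directory itself is listed (its contents are not).
implicit=$(find "$demo" -name cache -prune -o -type f | sort)

# Explicit -print on the right-hand branch: only the matching files are listed.
explicit=$(find "$demo" -name cache -prune -o -type f -print | sort)

printf 'implicit:\n%s\nexplicit:\n%s\n' "$implicit" "$explicit"
rm -rf "$demo"
```

An explicit -printf, as in the browser-cache example, has the same effect as the explicit -print here.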
Use the -name option to search on any part of a filename; either the extension or part of the name. For example, to find mysong.ogg you could search for mys*, or any part of it, using normal shell wildcards. Use -iname for a case-insensitive search.
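When passing wildcards to -name or -iname, it is safest to quote the pattern so that find, rather than the shell, expands it. A short sketch, using made-up file names in a scratch directory:

```shell
demo=$(mktemp -d)
touch "$demo/mysong.ogg" "$demo/MySong.OGG" "$demo/notes.txt"

# Quoting keeps the shell from expanding the wildcard before find sees it.
exact=$(find "$demo" -type f -name 'mys*')
# -iname ignores case, so both spellings match.
anycase=$(find "$demo" -type f -iname 'mys*' | sort)
# Matching on the extension works the same way.
by_ext=$(find "$demo" -type f -iname '*.ogg' | wc -l)

printf 'exact:\n%s\nany case:\n%s\nogg files: %s\n' "$exact" "$anycase" "$by_ext"
rm -rf "$demo"
```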
This calculates an MD5 hash for all the files, sorts them by hash, displays them on separate lines, and matches the first 24 digits of each hash. The second way is to match files by file size:
$ find . -type f -printf "%p - %s\n" | sort -nr -k3 | uniq -D -f1
MD5 hashes are more accurate, but matching file sizes is faster.
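The hash-based approach can be sketched as follows. This is an illustration under stated assumptions, not necessarily the command the description above refers to: it compares the full 32-digit MD5 hash with GNU uniq -w 32 rather than a 24-digit prefix, and the file names are invented.

```shell
demo=$(mktemp -d)
printf 'same content\n' > "$demo/a.txt"
printf 'same content\n' > "$demo/b.txt"
printf 'different\n'    > "$demo/c.txt"

# md5sum prints "<32 hex digits>  <path>". Sorting groups equal hashes together;
# uniq -w 32 compares only the hash, and -D prints every line of each duplicate group.
dupes=$(find "$demo" -type f -exec md5sum {} + | sort | uniq -w 32 -D)
printf '%s\n' "$dupes"

rm -rf "$demo"
```

Only a.txt and b.txt should be listed, since c.txt has different contents.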
This tells grep to do a case-insensitive recursive search for test in all the files in the /etc/fooserver/ directory, and to print the next two lines following the line that matches the search. The -n option prints line numbers, which is a nice bonus in large files.
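A grep search of the kind described above can be sketched with the standard flags -i (ignore case), -r (recurse), -n (line numbers), and -A 2 (two lines of trailing context). The directory and file contents below are hypothetical stand-ins for /etc/fooserver/.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/conf.d"
printf 'alpha\nTest mode = on\nbeta\ngamma\ndelta\n' > "$demo/conf.d/server.conf"

# -i ignore case, -r recurse into directories, -n show line numbers,
# -A 2 also print the two lines after each match.
hits=$(grep -i -r -n -A 2 'test' "$demo")
printf '%s\n' "$hits"

rm -rf "$demo"
```

grep marks the matching line with colons around the line number and the context lines with hyphens, which makes the two easy to tell apart.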
You need to know the beginning and end of the block that you want to see, so it's a great tool for quickly snagging sections of configuration files. This example takes advantage of configuration blocks delimited with curly braces, and homes in on the listen directives in radiusd.conf:
# awk '/listen {/,/}/' /etc/freeradius/radiusd.conf
listen {
    ipaddr = *
    # ipv6addr = ::
    port = 0
    type = acct
    # interface = eth0
    # clients = per_socket_clients
}
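The same range-pattern idea can be tried on a throwaway file. In this sketch the sample configuration is invented for illustration and is not a real radiusd.conf; the patterns are anchored to the start of the line, and the brace is written as [{] so that every awk implementation treats it literally.

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
# hypothetical sample, not a real radiusd.conf
log {
    destination = files
}
listen {
    type = auth
    port = 0
}
EOF

# /start/,/end/ prints every line from a match of the first pattern
# through the next match of the second pattern, inclusive.
block=$(awk '/^listen [{]/,/^}/' "$conf")
printf '%s\n' "$block"

rm -f "$conf"
```

Only the listen block is printed; the log block is skipped because the range does not open until the first pattern matches.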
Or to a group:
# find / -group admins
You can also search by UID and GID with the -uid and -gid options. You can then move all of a user's files to another user by either username or UID:
# find / -uid 1100 -ok chown -v 1200 {} \;
# find / -user carla -ok chown -v steven {} \;
The -ok option requires you to verify each and every change. Replace it with -exec if you're confident about your changes. When employees leave you may have a policy of deleting their files, which find can do with ease:
# find / -user 1100 -exec rm {} \;
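One cautious habit before a deletion like the one above is to run the same search with -print first and review the list. The sketch below assumes a scratch directory and uses the current user's numeric ID as a stand-in for the departing account; both are illustrative choices, not part of the original commands.

```shell
demo=$(mktemp -d)
touch "$demo/report.txt" "$demo/draft.txt"
uid=$(id -u)   # the current user stands in for the departing account

# Step 1: list what would be removed, without touching anything.
preview=$(find "$demo" -type f -uid "$uid" -print | sort)
printf '%s\n' "$preview"

# Step 2: only after reviewing the list, run the same expression with rm.
find "$demo" -type f -uid "$uid" -exec rm -- {} +
remaining=$(find "$demo" -type f | wc -l)
printf 'files left: %s\n' "$remaining"

rm -rf "$demo"
```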
Of course you want to be very sure you have it right, because find won't nag you and ask if you are sure. It will simply do what you tell it to.
find, grep, and awk: with tools like these, and maybe a little help from their man pages, you can find just about anything on your Linux systems.