How to Find Those Large Files on Linux


Every so often we have to dig through a system to find out what's taking up space, and there are many ways to do it.

I recently had to go on this quest and found myself, once again, searching for the best solution. I found a few and then I combined them to make the perfect one.

This is to help anyone who may need to do this in future (and also as a post-it note for myself).

Here's one that's easy to use.

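This says: give me the sizes of everything under my current directory, sort them numerically, and display them largest to smallest. Oh, and by the way, only give me the top 10. Note that du reports disk usage (in 1 KiB blocks by default with GNU du), and that with -a the list contains directories as well as files.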

[root@virtualbox ~]# du -a . | sort -n -r | head -n 10

Breakdown of options:

-a (du) same as --all: report files too, not just directories

-n (sort) numerical sort

-r (sort) reverse order

-n 10 (head) number of lines to print

The method above works well for non-root users. Of course, you can also use a full path to search a specific directory such as du -a /home/user | sort -n -r | head -n 10 for example.
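If raw block counts are hard to read, a variant with human-readable sizes may help. This is only a sketch: it assumes GNU coreutils (the -h options of both du and sort), and the function name du_top is made up for illustration.

```shell
# du_top [DIR] [COUNT]: list the COUNT largest entries under DIR
# (defaults: current directory, 10 lines), with human-readable sizes.
# Assumes GNU du and GNU sort, which both understand -h.
du_top() {
    # du -a: include files as well as directories
    # du -h: print sizes such as 4.0K or 1.2G instead of block counts
    # sort -h: compare those suffixed sizes correctly; -r: largest first
    du -ah -- "${1:-.}" | sort -rh | head -n "${2:-10}"
}
```

For example, du_top /home/user 5 would show the five largest entries under /home/user. The first line is usually the directory itself, since du reports its total.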

Without making this too long, my favorite method so far is:

[root@virtualbox ~]# find . -printf '%s %p\n'| sort -nr | head -10

On the surface, this does much the same job as our previous du -a command. However, with a little adjustment, we can avoid one of my pet peeves when trying to search the entire system.

What pet peeve, you ask? Let me show you.

[root@virtualbox ~]# find / -printf '%s %p\n'| sort -nr | head -10
find: ‘/proc/85560/task/85560/fd/6’: No such file or directory
find: ‘/proc/85560/task/85560/fdinfo/6’: No such file or directory
find: ‘/proc/85560/fd/5’: No such file or directory
find: ‘/proc/85560/fdinfo/5’: No such file or directory
find: ‘/run/user/1000/doc’: Permission denied
find: ‘/run/user/1000/gvfs’: Permission denied
140737471590400 /proc/kcore
404947896 /var/cache/pkgfile/chaotic-aur.files
252453708 /var/cache/pkgfile/community.files
165642866 /usr/share/pycharm/lib/platform-impl.jar
165642866 /.snapshots/4/snapshot/usr/share/pycharm/lib/platform-impl.jar
162100138 /var/cache/pacman/pkg/pycharm-community-edition-2021.2.2-1-x86_64.pkg.tar.zst
155180807 /var/cache/pacman/pkg/garuda-wallpapers-extra-r9.ef73a85-1-any.pkg.tar.zst
144331808 /usr/lib/electron13/electron
144331808 /.snapshots/4/snapshot/usr/lib/electron13/electron
142212672 /usr/lib/jvm/java-11-openjdk/lib/modules

Can you see those "No such file or directory" and "Permission denied" at the top? Yes? They really annoy me for some reason. So how do I get rid of them?

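I tell find to exclude some paths when searching. For most systems, I only need to exclude the '/proc' path. In my example, I will exclude two paths: partly because, why not, but more importantly to show you how to do this in case it fits your scenario better. For each excluded path, -path matches the directory, -prune stops find from descending into it, and the surrounding -not keeps the directory itself out of the results.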

find / -not \( -path /proc -prune \) -not \( -path /run -prune \) -printf '%s %p\n'| sort -nr | head -10

Here's an example of what the output looks like with these additions.

[root@virtualbox ~]# find / -not \( -path /proc  -prune \) -not \( -path /run -prune \) -printf '%s %p\n'| sort -nr | head -10
404947896 /var/cache/pkgfile/chaotic-aur.files
252453708 /var/cache/pkgfile/community.files
165642866 /usr/share/pycharm/lib/platform-impl.jar
165642866 /.snapshots/4/snapshot/usr/share/pycharm/lib/platform-impl.jar
162100138 /var/cache/pacman/pkg/pycharm-community-edition-2021.2.2-1-x86_64.pkg.tar.zst
155180807 /var/cache/pacman/pkg/garuda-wallpapers-extra-r9.ef73a85-1-any.pkg.tar.zst
144331808 /usr/lib/electron13/electron
144331808 /.snapshots/4/snapshot/usr/lib/electron13/electron
142212672 /usr/lib/jvm/java-11-openjdk/lib/modules
142212672 /.snapshots/4/snapshot/usr/lib/jvm/java-11-openjdk/lib/modules
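There are other ways to quiet the same noise, shown here as a hedged sketch rather than a replacement for the approach above. The -xdev option tells find not to descend into directories on other filesystems, which on typical systems skips /proc and /run because both are separate mounts, and 2>/dev/null throws away whatever error messages remain. The function name largest_on_fs is hypothetical, and -type f is my addition to keep directories out of the list.

```shell
# largest_on_fs [DIR] [COUNT]: largest regular files under DIR (default /),
# without crossing into other mounted filesystems.
largest_on_fs() {
    # -xdev: do not descend into directories on other filesystems
    # -type f: regular files only
    # 2>/dev/null: discard "Permission denied" and similar messages from find
    find "${1:-/}" -xdev -type f -printf '%s %p\n' 2>/dev/null |
        sort -nr | head -n "${2:-10}"
}
```

Keep in mind that -xdev skips every other mount too, so a /home on its own partition would need a separate run such as largest_on_fs /home.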

I should point out that the size output is in bytes. We can also use %k instead of %s, which prints the disk space the file uses in 1 KiB blocks.

The logic of this command is straightforward(ish). We are using the find command to search a path. Then we are telling it to print each entry's size (%s) and path (%p), one per line. We also use sort here again to list the results in reverse numerical order (largest to smallest). And finally, we want only the top ten listed.
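Here is the same pipeline annotated piece by piece. The sketch adds two things that are not in the original command, both assumptions about what you might want: -type f, so that directories do not show up in the results, and numfmt (from GNU coreutils) to turn the byte counts into values with K, M or G suffixes. The function name top_files is made up.

```shell
# top_files [DIR] [COUNT]: COUNT largest regular files under DIR
# (defaults: current directory, 10 lines), with human-readable sizes.
top_files() {
    find "${1:-.}" -type f -printf '%s %p\n' |  # "<size in bytes> <path>" per file
        sort -nr |                               # numeric sort, largest first
        head -n "${2:-10}" |                     # keep only the top COUNT lines
        numfmt --to=iec --field=1                # rewrite column 1 with K/M/G units
}
```

Sorting happens before numfmt on purpose: the raw byte counts sort correctly with plain sort -n, and only the lines that survive head get converted.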

On most systems, you may only need to exclude the '/proc' directory, though. For that you could create a nice little alias to call on whenever you wish to hunt down any large files on the system.

Here's what I use for that.

alias search="find . -not \( -path /proc -prune \) -printf '%s %p\n'| sort -nr | head -10"

This means I can simply type search from any directory and immediately get the 10 largest entries under that folder. Directories are included in the list, since find prints them as well.

[root@virtualbox ~]# alias search="find . -not \( -path /proc -prune \) -printf '%s %p\n'| sort -nr | head -10"
[root@virtualbox ~]# search
453248 ./.cache/mirrorstatus.json
90233 ./.config/vlc/vlcrc
76033 ./.local/share/fish/generated_completions/mpv.fish
61544 ./.local/share/fish/generated_completions
33234 ./.local/share/fish/generated_completions/gpg.fish
27604 ./.local/share/fish/generated_completions/openvpn.fish
26650 ./.config/falkon/profiles/garuda/browsedata.db
25846 ./.local/share/fish/generated_completions/curl.fish
19364 ./.local/share/fish/generated_completions/yad.fish
18557 ./.local/share/fish/generated_completions/qmicli.fish

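Notice that the '/proc' exclusion is still part of the alias. One caveat: -path is matched against each path as find builds it from the starting point, so with a starting point of . every path begins with ./ and the pattern /proc never matches. In a home directory that makes no difference, but running search from / would still descend into /proc; writing -path ./proc in the alias covers that case.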
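A shell function is one alternative to the alias, sketched here under a few assumptions. Since -path is compared against each path exactly as find builds it from the starting point, a pattern like /proc can only match when the starting point is an absolute path, so this sketch changes into the directory and searches from "$PWD". The name bigfiles and its optional arguments are illustrative and not part of the original alias.

```shell
# bigfiles [DIR] [COUNT]: largest entries under DIR
# (defaults: current directory, 10 lines), skipping /proc.
# The body is wrapped in ( ... ) so it runs in a subshell and the cd
# does not change the caller's working directory.
bigfiles() (
    cd -- "${1:-.}" || exit 1
    # "$PWD" is absolute, so -path /proc matches when the search starts at /
    find "$PWD" -not \( -path /proc -prune \) -printf '%s %p\n' |
        sort -nr | head -n "${2:-10}"
)
```

Called as bigfiles it behaves much like the alias, except that paths are printed in absolute form; bigfiles / 20 would search the whole system and list 20 lines.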

One final thing to note is that, given how we defined the search alias, it only lasts for the current shell session: it will be gone once you close the terminal or log out, not just after a reboot. I will leave you with the task of researching how to make it permanent should you choose to do so.

Well, that's me done. I hope you find this helpful.