
How do I get the actual directory size, using UNIX/Linux standard tools?

Alternative question: How do I get du to show me the actual directory size (not disk usage)?

Since people seem to have different definitions of the term "size": my definition of "directory size" is the sum of the sizes of all regular files within that directory.

I do NOT care about the size of the directory inode or about how much space (blocks * block size) the files take up on the respective file system. A directory with 3 files, 1 byte each, has a directory size of 3 bytes (by my definition).

Calculating the directory size using du seems to be unreliable.
For example, mkdir foo && du -b foo reports "4096 foo", i.e. 4096 bytes instead of 0 bytes (see the session below). With very large directories, the directory size reported by du -hs can be off by 100 GB (!) or more (on a compressed file system).
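
The empty-directory case as a shell session (4096 is what du reported here; the exact value depends on the filesystem):

$ mkdir foo
$ du -b foo
4096    foo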

So what (tool/option) has to be used to get the actual directory size?

basic6
  • What filesystem is used in the new location — is it `xfs` by any chance? – Sergey Vlasov Jun 02 '13 at 16:32
  • The same question was asked here before: [Is there a way to force du to report a directory size (recursively) including only sizes of files?](http://superuser.com/q/369878/58777) – Sergey Vlasov Jun 02 '13 at 16:48
  • And if your new FS is really XFS, the greatly increased disk usage is probably due to [aggressive preallocation](http://serverfault.com/q/406069/63156), which decreases file fragmentation at the cost of disk usage. – Sergey Vlasov Jun 02 '13 at 16:51

5 Answers


Some versions of du support the --apparent-size option to show the apparent size instead of the disk usage. So your command would be:

du -hs --apparent-size

From the man pages for du included with Ubuntu 12.04 LTS:

--apparent-size
      print apparent sizes,  rather  than  disk  usage;  although  the
      apparent  size is usually smaller, it may be larger due to holes
      in (`sparse') files, internal  fragmentation,  indirect  blocks,
      and the like
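
Purely for illustration (the path and the sizes below are made up), on a filesystem where compression or preallocation skews block usage, the two numbers can differ substantially:

$ du -hs /data
210G    /data
$ du -hs --apparent-size /data
103G    /data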
Brian

Assuming you have du from GNU coreutils, this command calculates the total apparent size of all regular files inside a directory, without any arbitrary limit on the number of files:

find . -type f -print0 | du -scb --files0-from=- | tail -n 1

Add the -l option to du if there are some hardlinked files inside, and you want to count each hardlink separately (by default du counts multiple hardlinks only once).
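
A quick way to see the hardlink behaviour (a throwaway example in an otherwise empty directory; the file names are made up):

$ printf hello > a
$ ln a b
$ find . -type f -print0 | du -scb --files0-from=- | tail -n 1
5       total
$ find . -type f -print0 | du -lscb --files0-from=- | tail -n 1
10      total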

The most important difference from plain du -sb is that recursive du also counts the sizes of the directories themselves, which are reported differently by different filesystems; to avoid this, the find command is used to pass only regular files to du. Another difference is that symlinks are ignored (if they should be counted, the find command has to be adjusted).

This command will also consume more memory than plain du -sb, because using the --files0-from=FILE makes du store device and inode numbers of all processed files, as opposed to the default behavior of remembering only files with more than one hard link. (This is not an issue if the -l option is used to count hardlinks multiple times, because the only reason to store device and inode numbers is to skip hardlinked files which had been already processed.)

If you want to get a human-readable representation of the total size, just add the -h option (this works because du is invoked only once and calculates the total size itself, unlike some other suggested answers):

find . -type f -print0 | du -scbh --files0-from=- | tail -n 1

or (if you are worried that some effects of -b are then overridden by -h)

find . -type f -print0 | du -sc --apparent-size -h --files0-from=- | tail -n 1
Sergey Vlasov
  • Not sure what to do for FreeBSD — although `-b` could probably be replaced by `-A -B 1`, there is no equivalent for `--files0-from=-`, and using `xargs` will need some workarounds in case the file list is bigger than `ARG_MAX` (and some external solution for human-readable output). – Sergey Vlasov Jun 03 '13 at 19:59
  • Brilliant, this way I could check that the byte-size of a Linux folder matches a Windows folder. – Kar.ma Mar 01 '23 at 12:00
  • Thank you. After running rsync between 2 Linux machines that both use filesystem compression but run different filesystems, I kept getting slightly different sizes when comparing du options between them. I believe the issue was that the block counts of the folders differ between the 2 filesystems. With this method I see that the number of bytes is exactly the same. – drescherjm Mar 25 '23 at 16:28

Here is a script displaying a human-readable directory size using standard Unix (POSIX) tools.

#!/bin/sh
# Sum the sizes of all regular files under the given directory (default: .)
# and print the total in human-readable form.
find "${1:-.}" -type f -exec ls -lnq {} + | awk '
BEGIN {sum=0} # initialization for clarity and safety
function pp() {
  u="+Ki+Mi+Gi+Ti+Pi+Ei";
  split(u,unit,"+");   # unit[1]="" (plain bytes), unit[2]="Ki", ... unit[7]="Ei"
  v=sum;
  for(i=1;i<7;i++) {
    if(v<1024) break;
    v/=1024;
  }
  printf("%.3f %sB\n", v, unit[i]);
}
{sum+=$5}              # 5th column of ls -ln output is the file size in bytes
END{pp()}'

Example (with the script saved as ds and made executable):

$ ds ~        
72.891 GiB
jlliagre
  • And now I found another option which is missing in all suggested `ls` invocations here: `-q`. Without this option the script will break if some file name contains newline characters. Writing really reliable shell scripts is too hard… – Sergey Vlasov Jun 03 '13 at 20:17
  • @SergeyVlasov The script I posted shouldn't break with such files, it would merely ignore the extra lines. The only problem case would be a carefully crafted file name containing an extra line whose fifth column holds a numerical value. Your suggestion would indeed avoid that situation. Thanks for the tip, script updated. – jlliagre Jun 03 '13 at 20:47
  • Excellent answer. +1 to you sir – ehime May 29 '14 at 14:25
  • This is one of the most reliable solutions. It works with file names that have spaces or quotes in them and it prints a human-readable size. – basic6 May 02 '15 at 18:10
  • @KIAaze Thanks for reviewing and fixing my code! – jlliagre Jun 19 '17 at 12:46
  • You're welcome. :) But now that you added PiB, the for loop should be i<7. Or do you have a different awk version than me where split returns a zero-indexed array? – KIAaze Jun 19 '17 at 14:03
  • @KIAaze Now the loop uses `i<7` and supports Exbibytes ! :-) – jlliagre Oct 25 '18 at 21:59

Just an alternative, using ls:

ls -nR | grep -v '^d' | awk '{total += $5} END {print total, "Total"}'

ls -nR: -n is like -l, but lists numeric UIDs and GIDs; -R lists subdirectories recursively.

grep -v: Invert the sense of matching, to select non-matching lines (-v is specified by POSIX). '^d' excludes the directory entries.
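
For instance, in a directory containing only three 1-byte files (the example from the question), the pipeline should print:

$ ls -nR | grep -v '^d' | awk '{total += $5} END {print total, "Total"}'
3 Total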

ls command: http://linux.about.com/od/commands/l/blcmdl1_ls.htm

man grep: http://linux.die.net/man/1/grep

EDIT:

Edited as suggested by @Sergey Vlasov.

stderr
  • Using the `-n` option for `ls` instead of `-l` (show UID/GID numbers instead of names) is safer, because **user and group names can contain spaces** (e.g., if `winbind` or `sssd` is used to join the system to a Windows domain, you can get group names like `domain users`). It should also be faster due to not needing to lookup user and group names. – Sergey Vlasov Jun 03 '13 at 04:49
  • Thanks, this is MUCH faster than find -exec ls! – gpothier Jun 26 '18 at 21:44

If all you want is the size of the files, excluding the space the directories take up, you could do something like

find . -type f -print0 | xargs -0 du -scb | tail -n 1

@SergeyVlasov pointed out that this will silently give a wrong result if the file list exceeds ARG_MAX: xargs then runs du more than once and tail keeps only the last total. To avoid that you could use something like:

find . -type f -exec du -sb '{}' \; | gawk '{k+=$1}END{print k}'
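
A faster variant should also work (untested sketch): find's `+` terminator batches many files per du invocation, and since awk sums every per-file line, the total stays correct even when du has to run more than once:

find . -type f -exec du -sb {} + | awk '{k+=$1} END {print k}'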
terdon
  • This command will silently give a wrong result if the directory contains so many files that they don't fit in the limit on execve() arguments size — in this case `xargs` will invoke `du` multiple times, and each invocation will print a grand total just for its part of the complete file list, then `tail` will show just the total size of the last part. – Sergey Vlasov Jun 02 '13 at 17:05
  • @SergeyVlasov good point, I hadn't thought of that, thanks, answer updated. – terdon Jun 02 '13 at 17:19