I have two volumes: one is NTFS (15028048 14213120 51544, corresponding to 1K-blocks, used, available) and the other is LUKS + ext4 (14895184 13869752 245752). I checked the files on both with md5sum and they are identical. So how can it be that NTFS shows 14213120 used and the other 13869752? I understand that different formatting will take away a different amount of space, but a 1 MB file is the same 1 MB file in NTFS, vfat, ext3, ext4 or whatever format you use. What am I missing?

Quora Feans

2 Answers

TL;DR → the size of a file's contents is only part of the story; the way the file is stored in the filesystem also affects how much space it effectively uses.

A 1 MiB file may incur a different amount of "housekeeping" overhead (i.e. metadata) on each filesystem, and its data may be allocated across a different number of blocks.

The length of a file's contents should be identical no matter what filesystem is used. (A regular file has one or more directory entries, a.k.a. hard links, and one inode, which holds the file's metadata and points to the blocks containing its contents.) What can differ somewhat is the amount of space that the data blocks, the inode, and other metadata collectively make unavailable for other files to use.

E.g. a filesystem might allocate space in 4 KiB blocks, in which case a directory might take up 4 KiB even with only a single entry, and a 1-byte file could take up 4 KiB. A 4,097-byte file on a filesystem with 4 KiB blocks would effectively take 8 KiB of space.
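
The rounding described above can be sketched in a few lines of Python. This is only an illustration, assuming a filesystem that allocates whole fixed-size blocks and ignoring metadata, inline storage, and sparse regions; `allocated_bytes` is a hypothetical helper name, not part of any tool.

```python
def allocated_bytes(size: int, block_size: int = 4096) -> int:
    """Round a logical file size up to a whole number of allocation blocks.

    Assumption: the filesystem hands out space only in block_size units
    and stores nothing inline, so an empty file needs no data blocks.
    """
    if size == 0:
        return 0
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size


for size in (1, 4096, 4097):
    print(f"{size} bytes of content -> {allocated_bytes(size)} bytes allocated")
```

Real filesystems add per-file metadata on top of this, so treat the result as a lower bound for the block-allocation part only.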

stat .emacs

  File: ‘.emacs’
  Size: 36303           Blocks: 72         IO Block: 4096   regular file
Device: fd01h/64769d    Inode: 2097977     Links: 1

The stat output here shows that my home is on a filesystem with typical 4,096-byte (4 KiB) I/O blocks, so the roughly 35.5 KiB (36,303 B) file actually takes up a bit more space than it would have on a 1970s system with 512-byte I/O blocks.

The Blocks: figure is in traditional Unix 512-byte (½ KiB) blocks, so by dividing the IO Block size by 512 (4,096 ÷ 512 = 8), you'll find that the number of blocks is always a multiple of 8 on this filesystem. That allocation overhead accounts for wasted space of (72 × 512 = 36,864) - 36,303 = 561 bytes in this case. (On a filesystem with 512-byte blocks, only 71 blocks would be used, for an overhead of only 49 bytes.)
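
As a cross-check of that arithmetic, here is a small Python sketch that recomputes the Blocks: figure and the slack for the 36,303-byte file under both block sizes. The helper names `stat_blocks` and `slack` are made up for this illustration, and the model assumes whole-block allocation with no sparse regions.

```python
UNIX_BLOCK = 512  # unit of the "Blocks:" column in stat output


def stat_blocks(size: int, fs_block: int) -> int:
    """512-byte units a file of `size` bytes would occupy, assuming the
    filesystem allocates whole fs_block-sized blocks."""
    fs_blocks = -(-size // fs_block)  # ceiling division
    return fs_blocks * fs_block // UNIX_BLOCK


def slack(size: int, fs_block: int) -> int:
    """Bytes allocated beyond the file's logical size."""
    return stat_blocks(size, fs_block) * UNIX_BLOCK - size


size = 36303  # the .emacs file from the stat output
for fs_block in (4096, 512):
    print(f"block size {fs_block}: Blocks={stat_blocks(size, fs_block)}, "
          f"slack={slack(size, fs_block)} bytes")
```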

The ext2/3/4 filesystems, in particular, have some optimizations to make symbolic links and very small files take up less space, and handle "sparse" files (files with many blocks of nothing but zeroes) particularly efficiently. For example, an empty file takes up only its directory entry and its inode, with no data blocks:

  File: ‘emptyness’
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file                                                                 

Likewise, a short symlink might be stored without actually allocating any blocks of its own:

  File: ‘symlink’ -> ‘emptyness’
  Size: 9               Blocks: 0          IO Block: 4096   symbolic link                                                                      

The 9 bytes of the symlink target are stored in the symlink's inode itself (ext2/3/4 keep targets shorter than 60 bytes there), so it takes up 0 data blocks.

A much, much longer symlink target might need to allocate space:

  File: ‘long-symlink’ -> ‘a very very long symlink target name … …
  Size: 2000            Blocks: 8          IO Block: 4096   symbolic link

Note that 8 "Unix blocks" is the smallest amount this filesystem can allocate (4,096 ÷ 512), so the ~2 KB of contents take up 4 KiB.
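
To repeat these observations on your own filesystem, something like the following Python sketch should do. It assumes a POSIX system, runs in a temporary directory (which may live on a different filesystem than your home), and uses a made-up 200-character target for the long symlink; the allocation figures it prints depend entirely on the filesystem underneath.

```python
import os
import tempfile


def describe(path):
    """Return (logical size, bytes allocated) for the entry itself."""
    st = os.lstat(path)  # lstat inspects a symlink, not what it points to
    return st.st_size, st.st_blocks * 512  # st_blocks is in 512-byte units


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "emptyness"), "w").close()            # empty file
    os.symlink("emptyness", os.path.join(d, "symlink"))        # 9-byte target
    os.symlink("x" * 200, os.path.join(d, "long-symlink"))     # hypothetical long target
    results = {
        name: describe(os.path.join(d, name))
        for name in ("emptyness", "symlink", "long-symlink")
    }

for name, (size, allocated) in results.items():
    print(f"{name}: size={size} allocated={allocated}")
```

The logical sizes are fixed by the contents, while the allocated figures are the part that varies between filesystems.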

Here's a file with nothing in it but zeroes (created with truncate -s 49M sparse-file), which also has a logical size that doesn't match its disc footprint:

  File: ‘sparse-file’
  Size: 51380224        Blocks: 0          IO Block: 4096   regular file

Note that these kinds of files are often created by random-access programs like databases, or random-order downloading like BitTorrent.
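
A minimal Python sketch of the same experiment, assuming a POSIX system: it extends an empty file to 49 MiB without writing any data, then compares the logical size with the allocated size. Whether the file really ends up sparse depends on the filesystem holding the temporary directory, so no particular allocation figure is guaranteed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sparse-file")
    with open(path, "wb") as f:
        # Same logical size as `truncate -s 49M`; no data is written.
        f.truncate(49 * 1024 * 1024)
    st = os.stat(path)
    logical = st.st_size          # what ls -l and "Size:" report
    on_disk = st.st_blocks * 512  # what du and "Blocks:" reflect

print(f"logical size: {logical} bytes, allocated: {on_disk} bytes")
```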


There are also a number of options that you can set when you create ("format") the filesystem, which can make it more efficient for your particular work-load, if it's a big concern. (The mke2fs manual has some details.)

In addition to metadata that relates to individual files (including directories, regular files, symbolic links, and device special files), like file names, permissions, ACLs, extended attributes, and the like, there's also some metadata at the filesystem level itself, like the "superblock" (of which ext2/3/4 keeps several backup copies) and the journal inode. All of this counts against your effective usable (free) space.

And, finally, ext2/3/4 usually reserves a percentage of each volume (5% by default) for the superuser's (i.e. root's) use only. This space won't show up as "available" in e.g. df or most similar programs, but it's still unused space that only root can write to. This can be adjusted with tune2fs at any time (its -m option sets the percentage). I typically reduce it greatly on "documents" type volumes, but leave a certain amount reserved on / in case of an emergency of some kind.
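
One way to see that reservation from a program is to compare the two free-block counters that the kernel reports. The Python sketch below assumes a POSIX system where `os.statvfs` is available; `root_reserved_bytes` is a hypothetical helper name. On filesystems without a root reservation the result may simply be 0.

```python
import os


def root_reserved_bytes(path="."):
    """Free space that only the superuser may use on the filesystem
    containing `path`."""
    vfs = os.statvfs(path)
    # f_bfree counts all free blocks; f_bavail counts only those available
    # to unprivileged users. Both are in units of f_frsize bytes.
    return (vfs.f_bfree - vfs.f_bavail) * vfs.f_frsize


print(f"reserved for root: {root_reserved_bytes('.')} bytes")
```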

If you'd really like to see more detail than is usually useful, dumpe2fs will print a rather lengthy report of the various options in effect on your filesystem. E.g. a bit of that report might read:

Filesystem features:      has_journal ext_attr resize_inode dir_index f … …
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              128016
Block count:              512000
Reserved block count:     25600
Free blocks:              331865
Free inodes:              127631
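
The counters in that excerpt are enough to work out a few ratios. Here is a rough Python sketch that parses the "name: number" lines copied from the report above; `parse_counts` is a hypothetical helper, and since the excerpt does not include the block size, the results stay in blocks and percentages rather than bytes.

```python
REPORT = """\
Inode count:              128016
Block count:              512000
Reserved block count:     25600
Free blocks:              331865
Free inodes:              127631
"""


def parse_counts(text):
    """Collect 'name: integer' lines from a dumpe2fs-style report."""
    counts = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counts[key.strip()] = int(value)
    return counts


counts = parse_counts(REPORT)
reserved_pct = 100 * counts["Reserved block count"] / counts["Block count"]
free_pct = 100 * counts["Free blocks"] / counts["Block count"]
used_inodes = counts["Inode count"] - counts["Free inodes"]

print(f"reserved for root: {reserved_pct:.1f}% of blocks")
print(f"free blocks:       {free_pct:.1f}% of blocks")
print(f"inodes in use:     {used_inodes}")
```
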
BRPocock

I think this blurb from Doc's answer to the SU question "Is ext4 more expensive than ntfs?" covers it well:

Each file system implements its data structures differently. Something neat you can try is formatting a partition with various file systems and comparing. The "free" space will differ. Also, as you fill them up, there will be a difference in storage overhead, and so in a sense how much space is required to store the same file CAN differ depending on the file system. Usually, when you format a file system you can also select some parameters for how it's to be arranged, and this has an impact too.

Ƭᴇcʜιᴇ007