
As I understand it, both locate and find search for files, so why does locate run so much faster?

According to its documentation, locate:

DESCRIPTION
locate reads one or more databases prepared by updatedb(8) and writes file names matching at least one of the PATTERNs to standard output, one per line.
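To make "matching at least one of the PATTERNs" concrete, here is a small Python sketch. It assumes mlocate-style matching: a pattern with no glob characters behaves as if it were `*PATTERN*`, and patterns are matched against the whole path. The helper `matches_any` and the sample paths are hypothetical, and real locate implementations have further options (such as `--basename` and `--regex`) that this ignores.

```python
from fnmatch import fnmatchcase

# Characters that turn a pattern into a glob rather than a plain substring.
GLOB_CHARS = set("*?[")


def matches_any(path, patterns):
    """Return True if path matches at least one pattern (hypothetical helper)."""
    for pattern in patterns:
        # Assumption (mlocate-style): no glob characters means "*PATTERN*".
        if not GLOB_CHARS & set(pattern):
            pattern = "*" + pattern + "*"
        # fnmatchcase must match the whole string; its "*" also matches "/".
        if fnmatchcase(path, pattern):
            return True
    return False


database = ["/etc/updatedb.conf", "/usr/bin/locate", "/usr/bin/updatedb", "/var/log/syslog"]
hits = [p for p in database if matches_any(p, ["updatedb", "/var/*"])]
print("\n".join(hits))  # one name per line, like locate
```

Here `updatedb` acts as a substring search, while `/var/*` must match the full path from the start.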

What files are in that database, and is every created file in that database?

Peter Mortensen
Tiina
    Ultimately, spatial locality of the list of file names on disk. `locate` reads them from one contiguous file (which might have some compression like common-prefix removal), while `find` has to read them from the filesystem where each directory has its list of filenames in a different sector. Disk random access is slow, and having to make some open and getdents system calls for each directory isn't great either. – Peter Cordes Sep 14 '21 at 06:33
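The common-prefix removal mentioned in this comment works because a sorted list of paths has long shared prefixes between neighbours. Each entry can be stored as "reuse N leading characters from the previous entry" plus the remaining suffix, a technique often called front coding. This Python sketch only illustrates the idea; it is not the on-disk format of any particular locate implementation, and `front_encode`/`front_decode` are hypothetical names.

```python
def front_encode(paths):
    """Encode a sorted list of paths as (shared_prefix_len, suffix) pairs."""
    encoded = []
    prev = ""
    for path in paths:
        # Count the leading characters this path shares with the previous one.
        n = 0
        limit = min(len(prev), len(path))
        while n < limit and prev[n] == path[n]:
            n += 1
        encoded.append((n, path[n:]))
        prev = path
    return encoded


def front_decode(encoded):
    """Rebuild the original paths from (shared_prefix_len, suffix) pairs."""
    paths = []
    prev = ""
    for n, suffix in encoded:
        path = prev[:n] + suffix
        paths.append(path)
        prev = path
    return paths


paths = sorted(["/usr/bin/find", "/usr/bin/locate", "/usr/bin/updatedb", "/usr/lib/os-release"])
encoded = front_encode(paths)
original_chars = sum(len(p) for p in paths)
stored_chars = sum(len(suffix) for _, suffix in encoded)
print(encoded)
print(original_chars, stored_chars)
assert front_decode(encoded) == paths  # the encoding is lossless
```

A real format also has to spend some bytes storing each prefix length, but on a large tree, where thousands of entries share directory prefixes, the savings dominate. A smaller file also means less to read from disk on every search.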

1 Answer


As I understand it, both locate and find search for files, so why does locate run so much faster?

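find searches the file system itself. The file system is optimised for telling you everything about the file at a specific path (including its contents, which can be many gigabytes in size) and for being written to frequently. It is not laid out for scanning every file name at once, so find has to open and read each directory in turn.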

locate searches a database generated from previously indexing the file system. The database is optimised for the types of searches locate performs.
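The difference can be sketched in a few lines of Python. `find_like` walks the live tree, listing every directory again on every run. `build_index` does that walk once and writes all paths into a single flat file (a toy stand-in for updatedb), and `locate_like` then only reads that one file sequentially. All three names are hypothetical. The toy index is plain text with no compression, and both searches match on the base name only (roughly like `locate --basename`), unlike real locate's default whole-path matching.

```python
import os
import tempfile


def find_like(root, pattern):
    """Search the live tree, as find does: every run lists every directory again."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if pattern in name:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def build_index(root, index_file):
    """Walk the tree once and write every path to one flat file (a toy updatedb)."""
    with open(index_file, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                out.write(os.path.join(dirpath, name) + "\n")


def locate_like(index_file, pattern):
    """Search by reading the single index file sequentially; the tree is not touched."""
    matches = []
    with open(index_file, encoding="utf-8") as index:
        for line in index:
            path = line.rstrip("\n")
            if pattern in os.path.basename(path):
                matches.append(path)
    return sorted(matches)


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as db_dir:
    os.makedirs(os.path.join(root, "docs", "2021"))
    open(os.path.join(root, "docs", "2021", "report.txt"), "w").close()
    open(os.path.join(root, "docs", "notes.txt"), "w").close()

    index_file = os.path.join(db_dir, "index.txt")
    build_index(root, index_file)  # the "updatedb" step
    before = locate_like(index_file, "report")

    # Create a file after indexing: the index does not know about it yet.
    open(os.path.join(root, "docs", "report-v2.txt"), "w").close()
    stale = locate_like(index_file, "report")
    live = find_like(root, "report")

print(len(stale), len(live))  # 1 2
```

The demo also shows the trade-off discussed below: a file created after the index was built is not found through the index until it is rebuilt, while the live walk sees it immediately.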

What files are in that database, and is every created file in that database?

The database is populated by updatedb. The files in it are determined by the options passed to updatedb and by its configuration file. Files will be in it unless they are outside the areas being searched or if they have been created since updatedb last ran.

For example, my default install of Ubuntu has the following in the /etc/updatedb.conf file:

PRUNE_BIND_MOUNTS="yes"
PRUNEPATHS="/tmp /var/spool /media /var/lib/os-prober /var/lib/ceph /home/.ecryptfs /var/lib/schroot"
PRUNEFS="NFS afs autofs binfmt_misc ceph cgroup cgroup2 cifs coda configfs curlftpfs debugfs devfs devpts devtmpfs ecryptfs ftpfs fuse.ceph fuse.cryfs fuse.encfs fuse.glusterfs fuse.gvfsd-fuse fuse.mfs fuse.rozofs fuse.sshfs fusectl fusesmb hugetlbfs iso9660 lustre lustre_lite mfs mqueue ncpfs nfs nfs4 ocfs ocfs2 proc pstore rpc_pipefs securityfs shfs smbfs sysfs tmpfs tracefs udev udf usbfs"

So it indexes everything except for certain directories which shouldn't be indexed for various (but hopefully fairly obvious) reasons, and a number of file system types (typically ones containing secret data, data on remote file systems, and virtual file systems such as proc and sysfs that expose system APIs).
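As a rough illustration of how settings like these decide what gets indexed, the Python sketch below reads `NAME="value"` lines and skips a path if it lies under a `PRUNEPATHS` entry or on a mount whose type is listed in `PRUNEFS`. This is a simplified, hypothetical model, not how updatedb itself is implemented: the function names are invented, the mount table is hard-coded instead of read from /proc/mounts, file system types are compared exactly as written, and `PRUNE_BIND_MOUNTS` is ignored.

```python
# Abbreviated excerpt of the Ubuntu /etc/updatedb.conf quoted above.
CONF_TEXT = '''
PRUNE_BIND_MOUNTS="yes"
PRUNEPATHS="/tmp /var/spool /media"
PRUNEFS="NFS nfs nfs4 proc sysfs tmpfs"
'''


def parse_conf(text):
    """Read NAME="value" lines into a dict (a simplified, hypothetical parser)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip().strip('"')
    return settings


def is_under(path, directory):
    """True if path is the directory itself or lies somewhere beneath it."""
    directory = directory.rstrip("/")
    return path == directory or path.startswith(directory + "/")


def mount_fstype(path, mounts):
    """File system type of the longest mount point containing path, or None."""
    best_point, best_type = "", None
    for mount_point, fstype in mounts:
        if is_under(path, mount_point) and len(mount_point) > len(best_point):
            best_point, best_type = mount_point, fstype
    return best_type


def should_skip(path, settings, mounts):
    """Apply PRUNEPATHS and PRUNEFS; PRUNE_BIND_MOUNTS is ignored in this sketch."""
    if any(is_under(path, p) for p in settings.get("PRUNEPATHS", "").split()):
        return True
    # File system types are compared exactly as written in the config.
    return mount_fstype(path, mounts) in settings.get("PRUNEFS", "").split()


settings = parse_conf(CONF_TEXT)
# Hypothetical mount table: (mount point, file system type) pairs.
mounts = [("/", "ext4"), ("/proc", "proc"), ("/mnt/share", "nfs4")]
for path in ["/home/user/notes.txt", "/tmp/scratch.txt", "/proc/cpuinfo", "/mnt/share/a.txt"]:
    print(path, "skipped" if should_skip(path, settings, mounts) else "indexed")
```

Note that `is_under` checks for a trailing `/`, so a path such as `/tmpfile` is not mistaken for something inside `/tmp`.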

Quentin
    locate also has some delay in reflecting additions, deletions, or changes to the filesystem; updatedb has to run again before the change is seen. – mpez0 Sep 13 '21 at 13:31
    @mpez0 — Yes, that is why I said "or if they have been created since updatedb last ran" in this answer – Quentin Sep 13 '21 at 13:35
    Also worth noting that I have encountered systems where `locate` is installed, but `updatedb` is not enabled. Therefore, it is quite possible for `locate` to return no results at all or - worse, as it's harder to diagnose - very old results from the one time when `updatedb` was actually run. – HappyDog Sep 16 '21 at 08:33