
While taking a backup of a collection of folders containing source code (where I realized later that I could have excluded certain folders containing library files, like node_modules), I noticed that the file transfer speed slows to a crawl (a few KB/s, against the usual 60 MB/s that the backup drive allows).

I'd like to understand where the bottleneck is. Is it some computation that must be performed, interleaved with the pure I/O, that slows the whole thing down? Or does the filesystem index on the target drive have some central lock which must be acquired and released between files?

I'm using NTFS on the target backup drive, and it's a HDD.

Peeyush Kushwaha
  • There is overhead with each file transfer; the more files, the larger the overhead, which slows it down. This is normal with the NTFS file system. – Moab Jun 29 '20 at 19:52
  • @Moab Right. So this question is asking what the causes of that "overhead" are. – Peeyush Kushwaha Jun 29 '20 at 20:02
  • Related: [Is normal this speed of Reading an Writing in my SSD?](https://superuser.com/questions/1547302/is-normal-this-speed-of-reading-an-writing-in-my-ssd) – Mokubai Jun 29 '20 at 20:19
  • See https://superuser.com/questions/344534/why-does-copying-the-same-amount-of-data-take-longer-if-spread-across-many-separ/344860#344860 – sawdust Jun 29 '20 at 21:05
  • how are you copying the files? For example, robocopy on windows is much faster than the default ctrl+c ctrl+v copying. – GChuf Jul 19 '20 at 15:42

1 Answer


The problem is that the file-system catalog, which says where the files are situated on the hard disk, needs to be accessed multiple times.

For each file the copy needs to do:

  • Open the source file from the source catalog
  • Create a target file in the target catalog
  • Copy the file
  • Close the source file and update its catalog entry (e.g. its last-access time)
  • Close the target file and finalize its new catalog entry (timestamps, size).
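The per-file steps above can be sketched as a minimal copy loop in Python (illustrative only; `src_root` and `dst_root` are placeholder paths, and the real copy tool does the same catalog work under the hood):

```python
import os
import shutil

def copy_tree(src_root: str, dst_root: str) -> int:
    """Copy every file under src_root to dst_root, one file at a time."""
    copied = 0
    for dirpath, _dirs, files in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)    # write to the target catalog
        for name in files:
            src = os.path.join(dirpath, name)     # look up the source catalog
            dst = os.path.join(target_dir, name)  # new target catalog entry
            shutil.copyfile(src, dst)             # open, transfer data, close;
            copied += 1                           # closing updates both catalogs
    return copied
```

Even this simple loop forces the disk heads to alternate between the catalog area and the file data for every single file, which is where the per-file overhead comes from.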

This causes the heads of both source and target disks to switch from file metadata in the catalog to the file itself several times during each file copy.

On an SSD this wouldn't matter much, but on a HDD this can slow the copy of a large number of small files to a crawl. Basically, the HDD would spend most of its time moving the head(s), which is a much slower operation than sequential reads or writes.
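You can observe this directly by timing the copy of the same total number of bytes packaged as many small files versus one large file. A rough sketch (the file counts and sizes are arbitrary; on an HDD the small-file case is typically far slower, while on an SSD or when the OS cache absorbs the writes the gap shrinks):

```python
import os
import shutil
import tempfile
import time

def make_small_files(root: str, count: int, size: int) -> None:
    """Create `count` files of `size` bytes each under `root`."""
    os.makedirs(root, exist_ok=True)
    for i in range(count):
        with open(os.path.join(root, f"f{i:05d}.bin"), "wb") as f:
            f.write(b"\0" * size)

def timed_copytree(src: str, dst: str) -> float:
    """Return the seconds taken to copy the directory tree src -> dst."""
    t0 = time.perf_counter()
    shutil.copytree(src, dst)
    return time.perf_counter() - t0

# Same total bytes: 1000 files of 4 KiB vs a single 4000 KiB file.
base = tempfile.mkdtemp()
make_small_files(os.path.join(base, "small"), count=1000, size=4096)
make_small_files(os.path.join(base, "big"), count=1, size=1000 * 4096)
t_small = timed_copytree(os.path.join(base, "small"),
                         os.path.join(base, "small_copy"))
t_big = timed_copytree(os.path.join(base, "big"),
                       os.path.join(base, "big_copy"))
print(f"1000 x 4 KiB: {t_small:.3f}s   1 x 4000 KiB: {t_big:.3f}s")
```

The absolute numbers depend entirely on the drive and filesystem, so no expected output is given; the point is the per-file metadata cost, not the byte count.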

Windows also can't use RAM effectively as a cache, since closing a file causes it to be flushed to the disk.

harrymc
  • How could I get more information on what the file-system catalog is? I tried to search for "file-system catalog" and "ntfs file-system catalog", but the only hits I found were related to problems people posted, or documentation, with no explanation. – Peeyush Kushwaha Jun 29 '20 at 20:01
  • 1
    For [NTFS](https://en.wikipedia.org/wiki/NTFS) for example this would be the [Master File Table (MFT)](http://www.ntfs.com/ntfs-mft.htm). The name is different for each file-system version. – harrymc Jun 29 '20 at 20:07