
I just installed WinDirStat to figure out why a NAS seems to be filling up faster than expected. The NAS is mounted as a drive on my Windows 7 machine, and 7 TB of its 8 TB are full, so I want to find the biggest folders and files. It took WinDirStat about 2 hours to scan the NAS - in fact I was about to cancel the scan, but it finished while I was writing this. Is the long scan time to be expected?

KAE
    WinDirStat has not been updated for a very long time. I have no doubt its routines could be improved to support more modern hardware. In the end you are still attempting to perform an analysis on an 8 TB platter HDD, which is itself slow. A RAID array would make it slightly faster, but that involves its own pitfalls. I can't submit an answer since there really isn't an answer to provide, other than that 8 TB would take any program a while to process. – Ramhound Apr 25 '16 at 17:59
    Also, NAS access can be a fair amount slower than direct access, so that would increase WinDirStat's time compared to an 8 TB drive on a local connection (USB/SATA). – Aaron Ladd Apr 25 '16 at 18:01
    Each small access of a file or directory on the NAS has a minimum round-trip delay: probably 5 to 10 milliseconds for the packet to traverse your network, then a millisecond for the NAS to process the request, 10 milliseconds to get the drive head in position ready to read, then time to process, packetise, and send the reply back across the network. How the filesystem is formatted, what details WinDirStat requests, and whether the NAS OS buffers metadata can all affect the per-file scan time. 2 hours seems extreme, but we've no idea what the makeup of your files is. – Mokubai Apr 25 '16 at 18:11
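The comment above implies a simple back-of-envelope calculation: if every file costs one network round trip, the total scan time is roughly latency times file count. The numbers below are illustrative assumptions (10 ms per request, a million files), not measurements from the asker's setup:

```python
# Rough estimate of scan time dominated by per-file network latency.
# Both numbers below are assumptions for illustration only.
per_file_latency_s = 0.010   # ~10 ms round trip per metadata request (assumed)
num_files = 1_000_000        # hypothetical number of files on the share

total_hours = per_file_latency_s * num_files / 3600
print(f"~{total_hours:.1f} hours")  # a million small files already costs hours
```

Under these assumptions a million small files alone accounts for nearly three hours, so a 2-hour scan of a full 8 TB share is plausible.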

2 Answers


WinDirStat must stat every single file on the target volume to determine its size. For 7 TB of data, that can be a painfully slow process, especially if the filesystem is made up of many small files rather than a few large ones.

Even more latency is added when scanning a network drive, since file metadata comes over the SMB protocol rather than through local filesystem APIs.

Unfortunately, this is perfectly normal.
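To make the cost concrete, here is a minimal sketch (not WinDirStat's actual code) of what any scanner of this kind has to do: walk the whole tree and issue one metadata call per file. Over SMB, each of those calls is a separate network round trip.

```python
import os

def directory_size(path: str) -> int:
    """Sum file sizes by visiting every entry, as WinDirStat-style
    scanners must. Illustrative sketch, not WinDirStat's implementation."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda e: None):
        for name in files:
            try:
                # One stat() call per file; on a network share each call
                # is its own round trip, which is where the time goes.
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # skip files that vanish or are inaccessible mid-scan
    return total
```

The work is linear in the number of files, not in the number of terabytes, which is why many small files hurt far more than a few large ones.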

Wes Sayeed
  • Good to know. Still worth it to figure out why we filled up 5 TB in a relatively short time. Surprised Windows doesn't have built-in tools to look at this, but maybe the average user doesn't need them. – KAE Apr 25 '16 at 18:56
  • What's more surprising is that none of the major OSes provide this functionality. A variant of this tool exists for Macs and Linux systems as well. IMO, this should be a standard feature. – Wes Sayeed Apr 25 '16 at 19:01
  • @KAE - It actually does; it's called Windows Explorer. Outside of that, there are PowerShell scripts. As pointed out, a tool that generates a report on your disk usage is not a standard feature in most operating systems. – Ramhound Apr 25 '16 at 19:18
  • Windows Explorer functions exactly the same way. Right-clicking to get properties makes it traverse the folder tree to add up file sizes. *nix systems have the `du` command which has to do the same thing. And neither show a nice graphical representation that shows you at a glance where all your space is being used. – Wes Sayeed Apr 25 '16 at 20:56
  • @WesSayeed this functionality has been available for basically every OS since its inception. For DOS, it's called `dir`; for *nix it's called `ls`. – Keltari Nov 11 '18 at 10:21
  • @Keltari: Not sure what you're saying here. Just try running `dir /a /s` against a 7 TB share filled with 4 KB files and tell me how long that takes. – Wes Sayeed Nov 11 '18 at 16:11

It is not surprising, but it is actually not the number of terabytes that matters; it is the number of files and folders on the drive to scan. Network latency also has a high impact on speed: it is generally much higher than for direct-attached storage, and every request has to run through the entire network stack of the operating system.

In TreeSize we were able to speed things up by using multiple threads and lower-level APIs than the usual FindFirstFile(). Full disclosure: I am the developer of TreeSize.
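The multithreading idea can be sketched in a few lines. This is not TreeSize's code (TreeSize is a native Windows application using Win32/NTFS APIs); it only illustrates the general principle of scanning sibling subtrees concurrently so that per-file latency overlaps across threads instead of accumulating serially:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def parallel_tree_size(path: str, workers: int = 8) -> int:
    """Illustrative parallel scan: size each top-level subtree on its own
    thread so per-file round trips overlap. A sketch of the general idea,
    not TreeSize's actual implementation."""
    def subtree_size(p: str) -> int:
        total = 0
        for root, _dirs, files in os.walk(p, onerror=lambda e: None):
            for name in files:
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    pass
        return total

    with os.scandir(path) as it:
        entries = list(it)
    dirs = [e.path for e in entries if e.is_dir(follow_symlinks=False)]
    # Files sitting directly in the root are sized on the calling thread.
    root_files = sum(e.stat(follow_symlinks=False).st_size
                     for e in entries if e.is_file(follow_symlinks=False))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return root_files + sum(pool.map(subtree_size, dirs))
```

Threads help most when the scan is latency-bound (a network share); against a single local spinning disk, seek contention can limit the gains.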

Joachim Marder
  • For anyone who hasn't used it: TreeSize reads information directly from the MFT instead of querying each file individually, so it's significantly faster; currently it's the fastest solution. – phuclv Sep 24 '22 at 01:14