
I have heard that file system performance (on an NTFS partition) can start to decrease once the number of files in a single directory becomes very large (e.g. >= 10,000,000 items). Is this true?

If true, what is the recommended maximum number of files in a single directory?

EDIT:

About performance: I'm thinking of file operations inside that folder (read, write, create, delete) that could get slow.

tigrou
  • Duplicate of http://superuser.com/questions/6382/how-many-files-can-you-put-in-a-windows-folder, http://superuser.com/questions/453348/is-it-bad-if-millions-of-files-are-stored-in-one-ntfs-folder – James P Jul 25 '13 at 08:27
  • Yes. MSDN advises not to keep more than 20k files in a single directory. (Windows Vista, 2 GB RAM) - I have noticed that when it goes over 40k (Windows 7, 4 GB RAM) it grinds to a halt. Everything just hangs and stops working. But having 100k subdirectories does not affect speed at all :) – Piotr Kula Jul 25 '13 at 08:29

2 Answers


I'll answer my own question: yes, it's definitely slower.

I wrote a C# console application that creates many empty files in a folder and then randomly accesses them. Here are the results:

10 files in a folder        : ~26,000 operations/sec
1,000,000 files in a folder : ~6,000 operations/sec

Here is the source code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

List<string> files = new List<string>();

Console.WriteLine("creating files...");
for (int i = 0; i < 1000 * 1000; i++)
{
    // create an empty file named with a random GUID
    string filename = @"C:\test\" + Guid.NewGuid().ToString();
    File.Create(filename).Dispose();
    files.Add(filename);
}

Console.WriteLine("benchmark...");
Random r = new Random();
Stopwatch sw = new Stopwatch();
sw.Start();

// read randomly chosen files for 5 seconds and count the operations
int count = 0;
while (sw.ElapsedMilliseconds < 5000)
{
    string filename = files[r.Next(files.Count)];
    string text = File.ReadAllText(filename);  // force an actual read of the file
    count++;
}
Console.WriteLine("{0} operations/sec", count / 5);
tigrou
  • +1 for the code. I found that as long as there were more than 1000 files, the time was very similar; no difference between 1k and 300k. Under 1000 files it depended on the number of files. – wezten Feb 28 '17 at 13:57
  • To be useful, you need to *compare to some alternative* way to randomly store and access 1M files. E.g. make 1000 subfolders each containing 1000 files, then randomly access those 1M files (a sketch of this variant follows below). – ToolmakerSteve Mar 29 '19 at 06:11
  • I'm thinking Windows simply caches the ten files allowing that test to run 4x faster - after the first read. – bmiller Jul 20 '21 at 21:35
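
For comparison, below is a minimal sketch of the subfolder variant suggested in the comment above: the same one million empty files spread across 1,000 subfolders of 1,000 files each, then read at random for five seconds. The root path C:\test2 and the 1,000 x 1,000 split are illustrative assumptions, not something from the original benchmark.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

List<string> files = new List<string>();

Console.WriteLine("creating files in 1000 subfolders...");
for (int folder = 0; folder < 1000; folder++)
{
    // each subfolder holds 1000 empty files named with random GUIDs
    string dir = Path.Combine(@"C:\test2", folder.ToString());
    Directory.CreateDirectory(dir);
    for (int i = 0; i < 1000; i++)
    {
        string filename = Path.Combine(dir, Guid.NewGuid().ToString());
        File.Create(filename).Dispose();
        files.Add(filename);
    }
}

Console.WriteLine("benchmark...");
Random r = new Random();
Stopwatch sw = Stopwatch.StartNew();

// read randomly chosen files for 5 seconds and count the operations
int count = 0;
while (sw.ElapsedMilliseconds < 5000)
{
    string text = File.ReadAllText(files[r.Next(files.Count)]);
    count++;
}
Console.WriteLine("{0} operations/sec", count / 5);

Comparing the number this prints against the single-folder run above would show how much of the slowdown comes from directory size rather than from the total number of files.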

If you read this, you should get a pretty good understanding of how NTFS indexes files and folders.

Locally, indexing files and folders shouldn't be much of a hassle if you follow the guidelines in the link above, but it will need a lot of maintenance with that many files.
On a network it is another story: it will be slow. This is from my own experience at work, where we have folders containing thousands of folders, and they take quite some time to enumerate over the network.

Another thing that can help with that many files is to disable short-name generation. This stops Windows from creating a second directory entry for every file following the 8.3 (MS-DOS) file-naming convention, and it decreases the time needed to enumerate folders, because NTFS no longer has to look up the short names associated with the long names.

  • Go to Run in the Start menu
  • Type cmd and when the command prompt appears, right-click on it and select Run as administrator
  • At the command prompt, type fsutil behavior set disable8dot3 1 to disable short-name generation
  • Reboot

If you want to enable it again, type fsutil behavior set disable8dot3 0
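
As an aside (assuming Windows 7 / Server 2008 R2 or later), you can check the current setting before changing it, and note that disabling 8.3 generation only affects files created afterwards; for a folder that is already full of files you would also strip the short names that already exist (see the last comment below). The C:\test path is only an example:

fsutil behavior query disable8dot3
fsutil 8dot3name strip /s /v C:\test

Here /s recurses into subdirectories and /v prints what was stripped.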

Jesper Jensen
  • Not entirely true. Have you ever tried to access a folder with 80k files (say, a bad email folder on a server) without any tweaks? You can wait a day before it enumerates. – Piotr Kula Jul 25 '13 at 08:31
  • No, of course it's not true in all cases, but I still believe that if you do it right and maintain it regularly, you can have a working system. What do you mean by bad email folder? – Jesper Jensen Jul 25 '13 at 08:50
  • You clearly never had to deal with a mail server before :) You need to write in your answer that if it gets maintained well (about 80% of system admins don't do that) then there will be no problems. Besides, your answer does not really talk about read/write performance or how disabling 8dot3 affects performance. Nor are there hard facts that this helps. Sorry to be such a pain.. but your answer needs improvement. -1 till you do so. Let me know – Piotr Kula Jul 25 '13 at 08:57
  • I never said that I've dealt with mail servers or that the above is from my own experience (except the network part) :). Maintenance is in my answer: `but it will need a lot of maintenance with that many files`. But thanks for the criticism and I'll try to improve my answer a bit. – Jesper Jensen Jul 25 '13 at 09:16
  • See StephenR's comments on [this answer](https://stackoverflow.com/a/26205776/199364) - if you already have many files, then after disabling 8.3 you need to **strip existing 8.3 names** to get the speed improvement. – ToolmakerSteve Mar 29 '19 at 06:27