
I need a file storage solution that provides read/write access to files for a collection of web servers. The space demands are modest -- about 2 TiB right now, probably growing to twice that. NFS is what is used now, and it looked good until I saw that almost all the files are in one single directory. Considering there are about 15 million files right now and the total could grow to 20 or 30 million, I am worried that a Linux filesystem might have a problem with that many.

I proposed that the application be modified to split the files up across several sub-directories, but the powers-that-be say "no" to that. That seems to leave me with two options:

  1. NFS. This would be the simplest but I am not sure how well it can handle the number of files in the directory.

  2. Cloud storage -- here that means Azure. I don't know enough about cloud storage to have an opinion on expected performance. Also, I do not know what kind of rewriting will be necessary. Can object storage in the cloud be made to appear as part of the local file system, as I can do with NFS?

  • You can't control the file structure on NFS so that it saves to different folders? Each server having its own folder, or the logic that controls the copies/writes putting files into a folder time-stamped `YYYYMMDD`, etc.? Or is it the application-level code that copies files to the NFS share where you are saying the developers claim it's not possible? – Vomit IT - Chunky Mess Style Jul 22 '22 at 20:11
  • I suppose I could, but I do not think that would work in this case. These files are shared with all the servers in the pool, so all need to be able to find them. Node 1 may write a file and node 3 or 4 may need to pick it up later, sometimes months or years later. Sorry I was not clear on that. Any strategy I can devise to break up the files between several directories would require code changes, and the project manager is not willing to do anything beyond the most trivial of changes. He seems to think this is a systems problem and he is not entirely wrong. – Stephen Carville Jul 22 '22 at 22:32
  • He is not entirely right either. Programmers are meant to work *with* systems to ensure that their programs are performant, and just blasting millions of files into a single directory without caring about things like directory seek times or searchability is short sighted, irresponsible and frankly quite stupid. https://superuser.com/questions/623965/can-file-system-performance-decrease-if-there-is-a-very-large-number-of-files-in is very relevant (though for Windows). You can just throw faster disks and processors at it, but that will only go so far before poor design makes it unusable. – Mokubai Jul 23 '22 at 02:54
  • https://www.google.com/search?q=Millions%20of%20files%20in%20a%20single%20directory gives many links across [su] and [so] and also [Unix.se] that show that this is a problem programmers should care about. It is not just a "systems problem". – Mokubai Jul 23 '22 at 02:57
  • There is even evidence that on Linux using `ext4`, simply having had millions of files in a directory *in the past* can make it unusable: https://unix.stackexchange.com/questions/679176/listing-directory-takes-forever-on-a-folder-that-used-to-have-millions-of-files. Granted, you can use other filesystems, but that would mean a lot of work benchmarking each filesystem's behaviour and their pros and cons. – Mokubai Jul 23 '22 at 03:14
  • [Storing a million images in the filesystem](https://serverfault.com/q/95444/343888), [What are the performance implications for millions of files in a modern file system?](https://serverfault.com/q/796665/343888), [Performance associated with storing millions of files on NTFS](https://serverfault.com/q/622872/343888) – phuclv Jul 23 '22 at 08:20
  • [Can really big folder (more than one million files) slow Nginx down?](https://serverfault.com/q/1064833/343888) – phuclv Jul 23 '22 at 08:23
  • @StephenCarville Please don’t post your answer into the question text [as you did in a previous revision](https://superuser.com/posts/1733104/revisions). If you have self-solved this issue, please feel free to post your own answer and check it off as such. – Giacomo1968 Jul 28 '22 at 18:43
  • @Mokubai I've done tests with just 10 million files (each a Wikipedia article), and just as the theory tells me, it is all faster when using one directory. We are already working with B-trees. The only reason to split files into subdirectories is other limits; for example, there seems to be a Linux problem when hitting 16 million entries (a 24-bit border). I assume that is a mistake. 100 million files are nothing these days for an algorithm. The only problem is that we have no way to iterate subranges of a directory. It's an API limit, not a data structure limit. – Lothar Jul 23 '23 at 21:05

2 Answers


I just realized I never posted what I finally did to "solve" this.

I built a GlusterFS cluster consisting of four servers. Servers 1 & 2 mirror each other, and servers 3 & 4 mirror each other. New files are written alternately to the 1/2 and 3/4 pairs, sort of like RAID 10 for file storage. I think this is called a 2x2 cluster by the GlusterFS folks.

The volumes are managed by LVM and formatted as XFS.

So far it has held up well. We just passed the 25 million file mark and performance is still acceptable. It takes a while (about 3 hours) to get a listing, but I only have to do that once per day for statistical purposes. According to `df` we are using about 5.2T of 8.0T total, though bear in mind the actual storage used is twice that because of the mirroring.
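For what it's worth, a daily statistics pass like that can be done by streaming the directory rather than running a sorted `ls`, so nothing is buffered or sorted in memory. A minimal Python sketch, assuming the volume is mounted at a hypothetical path `/mnt/gv0`:

```python
# Minimal sketch, not the exact setup described above: stream the directory
# with os.scandir() and tally counts/sizes on the fly.
# The mount point /mnt/gv0 is a hypothetical example path.
import os

MOUNT = "/mnt/gv0"

count = 0
total_bytes = 0
with os.scandir(MOUNT) as entries:
    for entry in entries:
        if entry.is_file(follow_symlinks=False):
            count += 1
            # DirEntry.stat() caches its result on the entry object
            total_bytes += entry.stat(follow_symlinks=False).st_size

print(f"{count} files, {total_bytes / 2**40:.2f} TiB")
```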

Belated thanks to all who answered. It helped me arrive at a compromise that should hold us for a while.


> He seems to think this is a systems problem and he is not entirely wrong.

To some extent, yes: older filesystems really were bad at handling millions of files per directory; newer ones handle it differently. For example, ext2 and FAT use simple linear lists of directory entries, so the scalability problems were indeed problems with ext2 and FAT; their successors, ext4 and NTFS, switched to HTrees and B+trees respectively.

(Eventually, however, the design of the filesystem can only do so much – I suspect it's not easy to optimize a general-purpose filesystem to handle billions of files per directory on a server and still remain usable for tens of files per directory on a desktop computer without too much overhead...)

But the way you use the filesystem also matters a lot. Even if you have millions of files on e.g. XFS, chances are that direct lookups by exact path will remain reasonably fast, as they only involve reading a small part of the directory data; but trying to list the directory will be much slower in comparison. So your program should be designed to never need to list the entire directory, but to know exactly what files it needs.

(As an analogy, if you use a SQL database, you already know that the correct way to search for data is to let server-side queries of "SELECT WHERE this=that" do the job – you don't usually try to retrieve the entire table every single time and then blame it on your network being too slow.)
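To make the contrast concrete, here is a minimal Python sketch (the directory path and function names are purely illustrative): opening a known name touches only a small part of the directory index, while answering "which files match?" forces a full enumeration.

```python
# Illustrative sketch only; DATA_DIR and the function names are made up.
import os

DATA_DIR = "/srv/files"

def read_document(name: str) -> bytes:
    """Fast path: open one exact, known path; directory size barely matters."""
    with open(os.path.join(DATA_DIR, name), "rb") as f:
        return f.read()

def find_documents_slow(suffix: str) -> list[str]:
    """Slow path: forces the filesystem to enumerate every entry in the directory."""
    return [n for n in os.listdir(DATA_DIR) if n.endswith(suffix)]
```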

> NFS. This would be the simplest but I am not sure how well it can handle the number of files in the directory.

NFS doesn't store the directory listings itself; it only provides access to the capabilities of the remote "storage" filesystem.

So if all your operations only deal with exact paths (i.e. read this specific file, write this file) then the capabilities of NFS itself should be completely irrelevant to your problem, as NFS will never need to look at the complete list of files – it will only forward the exact requested paths to the fileserver, where the NFS server's on-disk filesystem (e.g. zfs or ext4) needs to worry about handling the whole directory listing.

In other words, you're only shifting the problem to a different machine, but it still remains the exact same problem there. (Though the NFS file server certainly could use a filesystem that handles many files better than the one used on the web server, but you can do that locally as well.)

> Any strategy I can devise to break up the files between several directories would require code changes and the project manager is not willing to do anything beyond the most trivial of changes.

The most trivial change would be to use part of the file name itself as the subdirectory name, as that makes it easy to find the files later – just apply the same transformation to the file name as you did when storing it.

Take a look at how .git/objects/ works. It can accumulate many object files (especially if you travel back in time to when Git didn't yet have packfiles), so they are separated into subdirectories based on the first 2 digits of the object ID.

For example, the Git object c813a148564a5.. is found at objects/c8/13a148564a5.., using one level of subdirectories with an 8-bit prefix – there are 256 possible subdirectories, and the number of files within each subdirectory is reduced roughly 256-fold (e.g. only ~40k files per directory in a 10-million-object repository) – and the software knows exactly where to find each object knowing only its name.

If you want to spread files out even more, you can use longer subdirectory names (e.g. 12-bit for 1/4096) or even create a second level of subdirectories.

This works best if the names are evenly distributed, like hash-based names usually are. If your file names tend to start with the same text over and over, you should hash the names to avoid that (and store the mapping of real name to hash name in a database).
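As an illustration, a minimal Python sketch of this kind of layout (the root path and the `shard_path`/`store`/`load` names are hypothetical, and SHA-256 is just one reasonable choice of hash):

```python
# Hypothetical sketch of a .git/objects-style layout; ROOT, shard_path(),
# store() and load() are made-up names, not part of any existing API.
import hashlib
import os

ROOT = "/srv/files"

def shard_path(name: str) -> str:
    """Map a file name to ROOT/<first 2 hex digits>/<full hash> (256 subdirectories)."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return os.path.join(ROOT, digest[:2], digest)

def store(name: str, data: bytes) -> None:
    path = shard_path(name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)

def load(name: str) -> bytes:
    # The same transformation finds the file again; no directory listing needed.
    with open(shard_path(name), "rb") as f:
        return f.read()
```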

u1686_grawity
  • 426,297
  • 64
  • 894
  • 966
  • this scheme has been suggested here [Storing a million images in the filesystem](https://serverfault.com/q/95444/343888) – phuclv Jul 23 '22 at 08:20