5

Is there a [virtual-]filesystem that can automatically split files in storage but present them to the user as single files?

For example:

The user sees: /user/files/10TBfile
But it is stored as: /backingstorage/user/files/10TBfile.{1..100}

Basically the same way a split archive works, but in real time. I feel like it should be possible, since it's basically storing each virtual disk block as a separate file, but I don't know of any existing solutions for it.

If you're curious, the end goal might be something similar to this question: Divide out local files in different servers with limited space with rsync. The difference is that I have single large files that need to be split, and the files need to be updated in real time, so a daily cron/rsync and a split tar are out of the question. I already have the remote drives mounted, so I just need a way to split the file and present it as a single file to the user.

Thanks!

arcyqwerty

4 Answers

6

What you want is chunkfs:

ChunkFS is a FUSE based filesystem that allows you to mount an arbitrary file or block device as a directory tree of files that each represent a chunk of user-specified size of the mounted file.

It was written for the same purpose as yours:

ChunkFS was originally written for making space-efficient incremental backups of encrypted filesystem images using rsync. Using the --link-dest option of rsync, you can create incremental backups from the ChunkFS-mounted image where any chunk that hasn't changed since the last backup will be a hard link to the corresponding chunk from the previous backup.
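
For illustration, the incremental-backup pattern described above might look roughly like this (paths and dates are made up, and the exact ChunkFS invocation is left as a placeholder; check its documentation):

    # Mount the big image as a directory of fixed-size chunks with ChunkFS
    # (the exact command line is not shown here; see the ChunkFS README).
    #   ... chunkfs mount of /backingstorage/user/files/10TBfile at /mnt/chunks ...

    # Back up the chunk directory; any chunk unchanged since the previous
    # backup becomes a hard link to that backup's copy instead of a new file.
    rsync -a --link-dest=/backups/2014-03-28 /mnt/chunks/ /backups/2014-03-29/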

Dan D.
  • Seems to be pretty much exactly what I was writing. Funny since I actually already have pretty much the same read-only functionality but am currently working on write. Thanks for pointing me to something that already works! – arcyqwerty Mar 29 '14 at 06:06
1

Typically you would do this at the block level. Some solutions to this include:

  • RAID 0 (see the sketch after this list)
  • DRBD (geared more toward mirroring, but works over the network)
  • ZFS (a higher level of abstraction)
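
As a rough block-level sketch (hypothetical device names, and it assumes two spare block devices rather than already-mounted file systems), striping them with RAID 0 yields one large device that you format and mount as usual:

    # Stripe two devices into a single larger block device (RAID 0)
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc

    # Put a file system on the striped device and mount it;
    # the 10 TB file then lives inside this file system
    mkfs.ext4 /dev/md0
    mount /dev/md0 /user/files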

From a file system perspective:

  • Manually store half of the file on one file system and half on the other, then present it as a single file through something like a custom FUSE file system (think complicated custom code).
  • Most file-system-level solutions focus on synchronization, not on partitioning data across stores.
  • Hadoop (data sharding, not a traditional FS)
Clarus
1

I found the answer here: https://unix.stackexchange.com/a/67995/6226

You create several container files, concatenate them as a device, format them with a file system, mount that file system and put your big file in it.
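
A minimal sketch of that approach, with illustrative file names and sizes (note that each container file needs its own loop device, as the comment below points out):

    # Create two sparse container files, one on each backing mount
    truncate -s 5T /backingstorage/a/10TBfile.part1
    truncate -s 5T /backingstorage/b/10TBfile.part2

    # Attach them as loop devices
    losetup /dev/loop0 /backingstorage/a/10TBfile.part1
    losetup /dev/loop1 /backingstorage/b/10TBfile.part2

    # Concatenate the loop devices into one linear device (no striping)
    mdadm --build /dev/md0 --level=linear --raid-devices=2 /dev/loop0 /dev/loop1

    # Format and mount; the big file then lives inside this file system
    mkfs.ext4 /dev/md0
    mount /dev/md0 /user/files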

f.ardelian
  • Sounds like it would work well, but with a large number of fragments it requires many loop devices (imagine mounting several large files at a time). – arcyqwerty Mar 29 '14 at 06:05
0

I'm not sure, but I think you could use striping (the technique of segmenting logically sequential data, such as a file, across multiple devices), with LVM for example.

Here is some info about it from RedHat:

2.3.2. Striped Logical Volumes

When you write data to an LVM logical volume, the file system lays the data out across the underlying physical volumes. You can control the way the data is written to the physical volumes by creating a striped logical volume. For large sequential reads and writes, this can improve the efficiency of the data I/O. Striping enhances performance by writing data to a predetermined number of physical volumes in round-robin fashion. With striping, I/O can be done in parallel. In some situations, this can result in near-linear performance gain for each additional physical volume in the stripe. (source)

Additional info here
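
A minimal sketch of what that could look like, assuming two backing devices with hypothetical names:

    # Register the devices as physical volumes and group them
    pvcreate /dev/sdb /dev/sdc
    vgcreate bigvg /dev/sdb /dev/sdc

    # Create a striped logical volume across both PVs:
    # -i = number of stripes, -I = stripe size in KiB
    lvcreate -L 9T -i 2 -I 64 -n biglv bigvg

    # Format and mount it; the large file is stored striped across the PVs
    mkfs.ext4 /dev/bigvg/biglv
    mount /dev/bigvg/biglv /user/files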

Boogy