
I would like to add a large number of files with different names from different folders to a single 7-Zip archive using 7za.exe. This should be simple, but it turned out to be a major pain.

I created a list file that contains the paths (`7za a out.7z @list.txt`), but once it holds too many files (~100), the command fails. Apparently the content of the list file is pushed onto the command line buffer, which is far too small for this (the number of files to add is more than one million). [Edit: This was likely misinformation on my part; either way, it was not the reason.]

Splitting the process up by adding the files one by one is not feasible due to the way 7za works: when adding the next file, it creates a copy of the archive, adds the file to the copy and finally replaces the original. This is terribly slow once the archive grows to a few hundred MB in size.

So far I am using a combination of the two approaches by adding a dozen files each time in a loop, but it is an unreliable hack and still very slow. Is there a better way to do it?

I tried to use 7-Zip wrapper DLLs (I'm a C# programmer), but none of them worked reliably, and I was repeatedly advised to just use 7za instead.

mafu
  • '-a are pushed onto the command line buffer' .. i tend to believe this claim is bogus, especially since the code uses the function `ReadNamesFromListFile()` from the file `ListFileUtils.cpp`. the '-a' flag does not exist at all, the command line should look more like `7za a out.7z @in.txt` – akira Sep 04 '12 at 15:30
  • i have created 50 folders with 100 (empty) text files in each. build a list of the files. fed that list into `7za`. worked like a charm. provide more information about how you create the list of files, how the files are organized (path wise), where your working directory is, etc etc. – akira Sep 04 '12 at 15:37
  • Could you run a command to copy all those files to a new folder, then zip that, or do you need to maintain the file structure? – SaintWacko Sep 04 '12 at 16:01
  • @akira: Yes, my mistake, I updated the question text. – mafu Sep 05 '12 at 09:16
  • @akira: Could you try that with an even larger number of files and longer paths? The command I use is `7za/7za.exe a "C:\foo/0000.7z" -mx1 -w"C:\foo/" -- "C:\temp\list.tmp"`. Working directory is c:\input and all paths in list.tmp are located in that directory. (The real directory names are longer and deeper.) – mafu Sep 05 '12 at 09:30
  • @SaintWacko: Good idea. Directory structure has to be maintained, but since I cut off the head of the path anyway I may as well recreate the tail of the directory structure and copy the files, then just feed the whole directory to 7za. Drawback would be that it's one additional copy step of millions of files, but I guess I can live with that. Far better than having 7z copy the arch file thousands of times. – mafu Sep 05 '12 at 09:32
  • @mafutrct: how long, how deep? what are your longest path names? depending on how 7za is written and which api it is using there might be an issue with "too long" path names. – akira Sep 05 '12 at 09:45
  • @akira: I'm not certain, I will check that. However, I know for sure that everything does work when I repeatedly feed smaller lists to 7za. With large lists, it always chokes at the same time (i.e. when running it with the same list again, it stops at the same file, though the file's path is not unusual in any way). – mafu Sep 05 '12 at 10:07
  • @akira: Working directory and exe path length is 92, longest path in the list file is 168, of which 29 are the archive root folder. The target zip file path is 56. The whole command for a single file would be of length 693 (and works for me). – mafu Sep 05 '12 at 10:20
  • @mafutrct - Does that solve your problem? Should I put it down as the answer? – SaintWacko Sep 05 '12 at 12:49
  • @mafutrct: 7za uses the unicode-api on win32, so it handles up to ~32k chars in the path. is 7za crashing after the nth entry in the list? just terminating normally? what's the exit code? what is a debugger (windbg) telling you? is it always the same entry in the list that makes 7za stop? why do you want to use 7za? for the compression only? – akira Sep 05 '12 at 16:49
  • @SaintWacko: Yes, please, it is a viable workaround. – mafu Sep 07 '12 at 08:39
  • @akira: I will come back with more information shortly. Yes, it is always the same entry. I'm using it both for compression and gathering several files in one file. – mafu Sep 07 '12 at 08:41
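
The copy-first workaround discussed in the comments above could be sketched in Python roughly as follows. This is only an illustration under assumptions: `stage_files`, `input_root` and `staging_root` are made-up names, and nothing here comes from 7za itself. It recreates the tail of each path below a staging directory, so that the whole directory can then be fed to 7za in one go.

```python
import os
import shutil


def stage_files(paths, input_root, staging_root):
    """Copy each file below staging_root, keeping its path relative to input_root.

    All names here are hypothetical; this is a sketch, not part of 7za.
    """
    staged = []
    for src in paths:
        # Cut off the head of the path and keep only the tail.
        rel = os.path.relpath(src, input_root)
        if rel == os.pardir or rel.startswith(os.pardir + os.sep):
            raise ValueError(f"{src} is not inside {input_root}")
        dst = os.path.join(staging_root, rel)
        # Recreate the tail of the directory structure before copying.
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)
        staged.append(dst)
    return staged
```

The price is the one additional copy step mentioned above, but the archive is then written only once instead of being rewritten for every small batch.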

1 Answer


Due to the stupid way I approached the problem, it took me a lot of time to figure out the real reason. It is a bug in 7zip, in a place I did not think of.

CLI version 4.57 reports a misleading error when a listfile is used. If the listfile contains absolute paths (relative paths seem to work) that point to files with the same name in different directories, a 'Duplicate filename' error comes up.

I was able to reproduce the problem using any two files of the same name in different directories with their absolute path in the listfile.
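
To check whether a given listfile contains such entries, something like the following Python sketch might help. It is my own illustration, not anything shipped with 7-Zip: `find_duplicate_names` is a made-up helper, and comparing names case-insensitively is an assumption based on Windows file name rules, not on 7za's actual check.

```python
import ntpath
from collections import defaultdict


def find_duplicate_names(paths):
    """Return {base name: [paths]} for every base name that occurs more than once.

    Hypothetical helper; names are lowercased on the assumption that
    Windows treats file names case-insensitively.
    """
    by_name = defaultdict(list)
    for raw in paths:
        path = raw.strip().strip('"')
        if not path:
            continue  # skip blank lines in the listfile
        # ntpath understands both \ and / separators on any platform.
        by_name[ntpath.basename(path).lower()].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}
```

Feeding it the lines of the listfile (for example via `open("list.txt").read().splitlines()`) yields the groups of paths that would collide on name alone.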

In the original question, adding only one or a few files at a time worked (the archive file itself was created without problems) because those small listfiles (accidentally) did not contain any such "duplicate filenames".

Officially, this is called a feature. I am not sure about that, but it is undocumented, and it certainly confused me and others.
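
One possible way to live with this behavior is to split the list so that no single listfile contains two files with the same name. The Python sketch below is only that, a sketch: `split_into_unique_name_batches` is a hypothetical helper, the case-insensitive comparison is an assumption, and I have not verified it against 7za 4.57.

```python
import ntpath
from collections import defaultdict


def split_into_unique_name_batches(paths):
    """Distribute paths so that each batch holds every base name at most once.

    The k-th occurrence of a base name goes into batch k, so the number of
    batches equals the highest number of files sharing one name.
    """
    seen = defaultdict(int)  # base name -> occurrences so far
    batches = []
    for path in paths:
        key = ntpath.basename(path).lower()  # assumption: case-insensitive names
        index = seen[key]
        seen[key] += 1
        if index == len(batches):
            batches.append([])
        batches[index].append(path)
    return batches
```

Each batch would then be written to its own listfile and passed to 7za with the `@` syntax from the question, one run per batch. Since every run rewrites the archive, the cost grows with the number of batches rather than with the number of files.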

mafu
  • so, deduplicating the list actually solves the problem? – akira Sep 07 '12 at 10:47
  • Oh, that would make sense, actually. 7zip doesn't actually store the file path as you would think of it. It just stores the files, and somewhere else it keeps track of the path for each file. This means that if you have two files with the same name, even if they have different paths, it will still see it as trying to put two identical files into the archive. – SaintWacko Sep 07 '12 at 12:39
  • @akira: Yes, it does - the "duplicate" files have to be added one at a time. – mafu Sep 10 '12 at 09:04
  • @SaintWacko: To clarify, in the present case 7z does store the path of the file in the archive. That's why it is possible to add files with the same name, one by one. In the listfile however, it only checks names, not paths, and thus fails. – mafu Sep 10 '12 at 09:08
  • @mafutrct - Oh, really? My bad, I must have misread something. Thanks for straightening me out :) – SaintWacko Sep 10 '12 at 12:31