
Please help me find a solution to the following problem:

I have a folder of saved web pages on my Windows 7 NTFS drive.

Files in this folder have been lost. I have found some of their contents scattered randomly across .chk files (I think this is the work of chkdsk on Windows 7).

Now the only remaining trace of these files seems to be the Xapian database of the Recoll index, which I updated some time ago. I can search for the contents of the lost files, but Recoll cannot open them because they no longer exist.

What I would like to do is compare the current contents of the folder with the files Recoll indexed before the data was lost. That way I would know exactly which other files were lost and could restore them by downloading them again from the web.

Many thanks for your input.

Joseph
  • OK, so you want to find files with specific contents, is that right? `grep 'sometext' /path/to/some/folder/*` can help you with that. I don't quite understand which files you're looking for, because you mention both web pages and .chk files. Which ones do you need? – Sergiy Kolodyazhnyy Jun 26 '15 at 20:35
  • Hello, and thank you for your reply. Let me clarify: I do not want to find files that are currently on my filesystem. The files are gone, and I want to use the record kept in the Recoll index (its Xapian database) to find out which file names have been deleted. – Joseph Jun 26 '15 at 21:16

1 Answer


At least with recent Recoll versions, a query language request like:

`dir:/path/to/dir`

should return all files which lived in the subtree under dir.

If you switch to the table mode display for the results, you can use the Results->Save as CSV menu entry to produce an easily processed listing.

If the data is very precious, there are also ways to retrieve a raw likeness of the text contents from the index data.

Hope this helps. I'm not too sure I'll get a notice if this page is updated, so don't hesitate to reach me through the contact on the Recoll web site if you need more help.

medoc
  • Because of a relatively old Recoll version and other issues, the solution in the end was to use a low-level Xapian command to extract the paths stored in the Recoll index. For the record, this does not work well with very long paths (over 150 characters), which Recoll truncates and hashes. Command used: `delve -a -1 ~/.recoll/xapiandb | egrep '^Q/path/to/the/dir'`. `delve -a -1` lists all terms in the index, one per line. Terms beginning with Q are Recoll's unique file identifiers (for completeness, this would be slightly different with a case-sensitive index). – medoc Jul 01 '15 at 07:36
  • Thank you, this worked exactly as needed and saved a lot of work! – Joseph Jul 01 '15 at 09:24
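To finish the comparison the question asks for, the term list from `delve` can be checked against what is still on disk. This is a minimal bash sketch, not part of medoc's answer. `list_missing` is a hypothetical helper name. The sketch assumes that each `Q` term is the file path followed by `|` and an optional internal path (for documents embedded in a file), so it drops everything from the first `|` onward. It must run on a machine where the indexed paths are valid, for example with the NTFS drive mounted at the same location as when it was indexed.

```shell
#!/usr/bin/env bash
# Sketch: report files recorded in a Recoll index that no longer exist on disk.
# Input: a text file with one Xapian term per line, e.g. produced by
#   delve -a -1 ~/.recoll/xapiandb | egrep '^Q/path/to/the/dir' > terms.txt
set -euo pipefail

# list_missing is a hypothetical helper, not part of Recoll or Xapian.
list_missing() {
    local terms_file=$1
    local term path
    while IFS= read -r term; do
        path=${term#Q}        # strip the "Q" unique-id prefix
        path=${path%%\|*}     # ASSUMPTION: strip the "|" separator and any internal path
        # Print the path only if nothing exists there any more.
        if [ ! -e "$path" ]; then
            printf '%s\n' "$path"
        fi
    done < "$terms_file" | sort -u   # embedded documents repeat the same file path
}

# Usage (file names are illustrative):
#   list_missing terms.txt > missing.txt
```

Paths longer than about 150 characters are stored truncated and hashed, as medoc noted, so they will always be reported as missing and have to be checked by hand. Cutting at the first `|` should be safe for files from a Windows volume, since Windows does not allow `|` in file names.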