I am trying to create a GUI for searching through a large number of huge configuration files (approximately 60,000 files, each between 20 KB and 50 MB in size). The files are also updated frequently (about 3 times a day).

So far I have found Solr and Sphinx, but I found no way to make them return the list of matching lines, including a line number, for each matching document.

Currently, we convert each text file to XML:

<xml>
   <line number="1">foobar</line>
   <line number="2">barfoo</line>
   ...
</xml>

and store the result in eXist-db. However, storing the documents is far too slow, so we need an alternative.
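For reference, the conversion described above can be sketched roughly as follows. This is only an illustration of the line-to-XML step, not the actual converter in use; the function name `lines_to_xml` is made up for this example, and the escaping of special characters is an assumption about what the real conversion would need.

```python
from xml.sax.saxutils import escape


def lines_to_xml(text: str) -> str:
    """Wrap every line of `text` in a <line number="N"> element.

    Hypothetical helper: mirrors the XML layout shown in the question.
    """
    parts = ["<xml>"]
    for number, line in enumerate(text.splitlines(), start=1):
        # Escape &, < and > so arbitrary configuration text stays well-formed.
        parts.append(f'   <line number="{number}">{escape(line)}</line>')
    parts.append("</xml>")
    return "\n".join(parts)


print(lines_to_xml("foobar\nbarfoo"))
```

Every update of a file means regenerating and re-storing the whole XML document, which is consistent with the slowness described.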

Any better ideas?

Oliver Salzburg
knipknap

1 Answer

Opinion: If you have large amounts of volatile text data you need fast access to, converting them to XML will make your problems much harder to solve.

Any better ideas?

Leave the files as text and use Lucene?

(I'm assuming that grep doesn't cut it)
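As a baseline for comparison, a grep-style scan that reports line numbers could look like the sketch below. It uses only the Python standard library; the names `grep_lines` and `grep_files` and the `*.conf` glob are assumptions made for this example. With about 60,000 files of up to 50 MB each, a full scan per query is likely too slow for an interactive GUI, which is presumably why plain grep does not suffice.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def grep_lines(lines: Iterable[str], pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line text) for every line matching `pattern`."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


def grep_files(root: str, pattern: str, glob: str = "*.conf") -> Iterator[Tuple[str, int, str]]:
    """Scan every file under `root` matching `glob` (hypothetical layout)."""
    for path in sorted(Path(root).rglob(glob)):
        # Read line by line so a 50 MB file is never held in memory at once.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            for number, text in grep_lines(handle, pattern):
                yield str(path), number, text


sample = ["foobar\n", "barfoo\n", "baz\n"]
for number, text in grep_lines(sample, r"foo"):
    print(number, text)
```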

RedGrittyBrick
  • "Some people, when confronted with a problem, think “I know, I'll use XML” Now they have two problems." – Paul Nov 11 '11 at 11:17
  • The only way to make Lucene return line numbers is by storing each line in a separate document. This, however, makes updating a document hard (and it is probably impossible to make updates fast). – knipknap Nov 14 '11 at 08:52
  • @knipknap: I can't work out if that's how [this example](http://www.tom-carden.co.uk/2007/08/01/a-quick-less-certain-note-on-using-lucene-in-processing/) does it. ([Applet with source](http://www.tom-carden.co.uk/p5/simple_lucene_demo/applet/)) – RedGrittyBrick Nov 14 '11 at 09:20
  • The demo only adds one file, but yes, it stores each line as a separate document using the loop at the comment "pull the data from our list and add it to the index" in SimpleLucene.java. Since they don't do updates, this is not a problem. – knipknap Nov 14 '11 at 11:19
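The "one document per line" idea discussed in the comments, together with the update problem it creates, can be illustrated with a small in-memory sketch. This is not Lucene and makes no claim about Lucene's API or performance; it is a standard-library stand-in, and the class name `LineIndex`, its methods, and the whitespace-and-punctuation tokenizer are all assumptions for this example. It shows that updating a file amounts to dropping every line entry for that file and re-adding them.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Posting = Tuple[str, int]  # (file name, 1-based line number)


def tokenize(line: str) -> Set[str]:
    """Lower-case word tokens of one line (simplistic, for illustration)."""
    return set(re.findall(r"\w+", line.lower()))


class LineIndex:
    """Toy inverted index in which every line is its own 'document'."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[Posting]] = defaultdict(set)
        self._lines: Dict[str, List[str]] = {}

    def remove(self, name: str) -> None:
        """Drop every line entry that belongs to file `name`."""
        old_lines = self._lines.pop(name, None)
        if old_lines is None:
            return
        for number, line in enumerate(old_lines, start=1):
            for token in tokenize(line):
                postings = self._postings.get(token)
                if postings is None:
                    continue
                postings.discard((name, number))
                if not postings:
                    del self._postings[token]

    def update(self, name: str, text: str) -> None:
        """Replace the file: delete all of its old lines, then re-add them."""
        self.remove(name)
        lines = text.splitlines()
        self._lines[name] = lines
        for number, line in enumerate(lines, start=1):
            for token in tokenize(line):
                self._postings[token].add((name, number))

    def search(self, term: str) -> List[Tuple[str, int, str]]:
        """Return (file, line number, line text) for each line containing `term`."""
        hits = sorted(self._postings.get(term.lower(), ()))
        return [(name, number, self._lines[name][number - 1]) for name, number in hits]


index = LineIndex()
index.update("a.conf", "host = alpha\nport = 8080")
index.update("b.conf", "host = beta")
print(index.search("host"))

# Re-indexing a.conf touches every one of its lines, even unchanged ones.
index.update("a.conf", "port = 8080\nhost = gamma")
print(index.search("host"))
```

The cost of `update` grows with the size of the file rather than with the size of the change, which is the difficulty the comments point out for large files that change several times a day.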