
I am trying to use regex to search from the start of one report to the start of the next report further down the same file, capture the report as a whole, then use that capture to search for duplicates and remove them.

The reports are separated by CRLFs, and I thought I was smart by using (\r\n).*(\r\n) to capture a report, find it, delete it, and repeat for the next report.

When I do (\r\n).*(\r\n) it captures from the next CRLF to the last CRLF in the file.

I cannot for the life of me figure out how to limit the search to just one instance of the first line of the report, the ~30 lines of the body, then the end of the report.

DavidPostill
  • Your problem is that dot is matching newline. Try unticking the 'dot matches newline' box, which in Notepad++ might not be hard to find. – barlop Jun 10 '17 at 08:53
  • Please [edit] and add some sample data if you want a specific answer. – DavidPostill Jun 10 '17 at 09:18
  • @DavidPostill It's possible that his understanding is at a level where he doesn't have to ask a question that is too specific to his particular case. And questions that are very specific to a person's case are often less helpful to others. It does look like he has gone some way to solving the problem himself and has just run into the dot matches new line issue. So he could get past that and if he still has problems then he could ask another question and he'd learn better that way. I wouldn't encourage him against doing that. – barlop Jun 10 '17 at 09:31

1 Answer


Your problem is that dot is matching newline. Try unticking the 'dot matches newline' box, which in Notepad++ might not be hard to find (see the bottom left-hand corner of Notepad++'s Edit > Find dialog box). I won't include a picture because you didn't put Notepad++ in your title, and I think it's better if the answer doesn't look unnecessarily Notepad++-centric. Other programs that support regex also have a 'dot matches newline' option that can be ticked or unticked.
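To see the effect outside Notepad++, here is a minimal sketch in Python, assuming its `re.DOTALL` flag is a fair stand-in for the 'dot matches newline' checkbox. The sample text and the variable names are made up for illustration; they are not the asker's data.

```python
import re

# Hypothetical sample: four CRLF-terminated lines standing in for a report file.
text = "header\r\nreport A\r\nreport B\r\nreport C\r\n"

pattern = r"\r\n.*\r\n"

# Box ticked (re.DOTALL): dot also matches line breaks, so the greedy .*
# runs from the first CRLF all the way to the last CRLF in the text.
with_dotall = re.search(pattern, text, flags=re.DOTALL).group(0)

# Box unticked (no flag): dot cannot cross the line feed, so the match
# stops at the next CRLF.
without_dotall = re.search(pattern, text).group(0)

print(repr(with_dotall))     # '\r\nreport A\r\nreport B\r\nreport C\r\n'
print(repr(without_dotall))  # '\r\nreport A\r\n'
```

One difference to keep in mind: without the flag, Python's dot still matches `\r` and only refuses `\n`, so the details may differ slightly from Notepad++'s engine even though the overall behaviour is the same here.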

You could experiment with other searches and see whether they work. Some work regardless of the dot setting, e.g. if they don't use dot at all, or if they use it as .*?, where the *? operator stops it from matching too much. Other patterns require 'dot matches newline' to be unticked. So you may as well untick it, and only tick it to see what difference, if any, it makes. You can try ^.*$ with 'dot matches newline' unticked, or your own pattern with it unticked.

Or see what happens with a pattern of the form [^X]*X. That is a good way of averting the problem where, if you write .*X, the .* can itself match X, which you don't want. Instead you specify everything that is not X, repeated with *, followed by X. Examples are \r\n[^\r\n]*\r\n or [^\r\n]*\r\n, or try ^[^\r\n]*\r\n. Note that a caret inside square brackets means "not", while ^ outside square brackets matches the position at the beginning of a line.

Another way is to use *?, specifically .*?, e.g. \r\n.*?\r\n. The .*? matches as few characters as possible, so .*?X matches as few characters as possible up to X.
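The two alternatives can be compared side by side in a short sketch, again assuming Python's `re` module behaves closely enough to Notepad++'s regex engine for these patterns. `re.DOTALL` plays the role of a ticked 'dot matches newline' box, `re.MULTILINE` makes ^ match at the start of every line, and the sample text is hypothetical.

```python
import re

# Hypothetical sample: four CRLF-terminated lines.
text = "header\r\nreport A\r\nreport B\r\nreport C\r\n"

# Lazy quantifier: even with dot matching newline, .*? takes as few
# characters as possible, so the match ends at the next CRLF.
lazy = re.search(r"\r\n.*?\r\n", text, flags=re.DOTALL).group(0)

# Negated class: [^\r\n]* can never consume a CR or LF, so the dot
# setting is irrelevant and the match cannot run past the next CRLF.
negated = re.search(r"\r\n[^\r\n]*\r\n", text).group(0)

# Anchored form: with MULTILINE, ^ matches at each line start, so
# findall returns every line together with its CRLF terminator.
lines = re.findall(r"^[^\r\n]*\r\n", text, flags=re.MULTILINE)

print(repr(lazy))     # '\r\nreport A\r\n'
print(repr(negated))  # '\r\nreport A\r\n'
print(lines)          # ['header\r\n', 'report A\r\n', 'report B\r\n', 'report C\r\n']
```

On this sample both approaches give the same match. The negated class has the advantage that it does not depend on how the dot option happens to be set.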

barlop
  • Thank you very much barlop! The [^\r\n].*?\r\n argument ended up being exactly what I need. The explanation of what '.' and '?' did really helped. – Justin Jarrett Jun 10 '17 at 16:39
  • @JustinJarrett What you've suggested is a kind of hybrid approach, not exactly one of my examples. The logic behind my examples is a bit clearer than whatever the logic is behind yours, but if it works then it works. – barlop Jun 11 '17 at 23:36