47

In Textpad or Notepad++ is there an option to export all the matches for a regular expression find, as a single list?

In a big text file, I am searching for tags (words enclosed in % %) using the regular expressions %\< and \>%. I want all the matches as a single list so that I can remove duplicates in Excel and get a list of unique tags.

Kiranshell
  • If you use the RegexExtract plugin for Notepad++, it can remove duplicates for you and there is no need to post-process with Excel. – R. Schreurs Aug 30 '18 at 09:26
  • Linked: [How to copy marked text in notepad++](https://stackoverflow.com/questions/2298962/how-to-copy-marked-text-in-notepad) – Ivan Chau Jun 25 '19 at 02:40
  • related to this https://stackoverflow.com/a/66330516/961631 – serge Feb 23 '21 at 10:56

5 Answers

84

You can achieve this by using backreferences and the Find and Mark functionality in Notepad++.

  1. With Search Mode set to Regular expression, find the matches with a regex (say %(.*?)% ) and replace each one with \n%\1%\n . Every target word now sits on a line of its own, so no line contains more than one match.

  2. Use Search --> Find --> Mark to mark each line matching the regex %(.*?)% , and remember to tick 'Bookmark Line' before marking the text.

  3. Select Search --> Bookmark --> Remove Unmarked Lines.
  4. Save the remaining text. It is the required list.
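
For readers who would rather script it, the steps above can be mirrored in a few lines of Python. This is only a sketch of the same idea, not something from the answer itself: `isolate_matches` and `sample` are hypothetical names, and the pattern is the %(.*?)% regex from step 1.

```python
import re

# Same pattern as in step 1; `isolate_matches` is a hypothetical helper name.
TAG = re.compile(r"%(.*?)%")


def isolate_matches(text: str) -> list[str]:
    # Step 1: rewrite every match as \n%\1%\n so it sits on its own line.
    separated = TAG.sub(r"\n%\1%\n", text)
    # Steps 2-3: keep only the lines that contain a match
    # (the "bookmarked" lines) and drop everything else.
    return [line for line in separated.split("\n") if TAG.search(line)]


# Made-up sample text, for illustration only.
sample = "Dear %name%, your %item% ships to %name%.\nNo tags here."
print("\n".join(isolate_matches(sample)))
```

As with the editor steps, duplicates are still present in the result, so a separate de-duplication pass is needed afterwards.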
Ankit
  • I have one more file with <> as tags instead of % %. I tried <(.*?)> and \n<\1>\n, but it's not working. Please help. – Kiranshell Sep 21 '12 at 19:27
  • You are welcome :) For me it works for <> too. Are there nested <>? Could you elaborate on what exactly is 'not working'? – Ankit Sep 22 '12 at 16:18
  • I am trying to make a list of tags like before, but these ones use <>. I am using <(.*?)> instead of %(.*?)% and \n<\1>\n instead of \n%\1%\n. This is a link to a sample file: http://wikisend.com/download/158050/tags.txt – Kiranshell Sep 22 '12 at 19:17
  • I tried it again with the provided text and <(.*?)>, and it works normally. I got the list of tags ..... and so on – Ankit Sep 22 '12 at 19:45
  • Please mention the exact error/problem you are having. It might sound silly, but remember to move the cursor to the top. I often make that mistake and the search returns no results... :) – Ankit Sep 22 '12 at 19:48
  • You were right :), I had set the search direction to up, lol. Thank you so much Lamb, this has really got me interested in regular expressions. – Kiranshell Sep 22 '12 at 23:23
  • The step-by-step format is great for discovering new possibilities in Notepad++. The regex expressions didn't work for me (I'm a regex noob). The '%' was not needed. I got mine working by testing it with regex101. – lode Jan 30 '20 at 19:40
11

Is doing this in Notepad++ a mandatory requirement?  Are you on Windows or some form of Unix?  If you’re on Windows, you can do it (partly) from the Command Prompt:

findstr /r "%[a-z].*[a-z]% %[a-z]%" your_file > new_file

findstr is vaguely inspired by grep, so this new_file will contain all lines matching your search criteria; you can then use Notepad++ to strip out the unwanted text (to the left of the first % and to the right of the second one).


And, of course, if you’re on Unix, you can do the equivalent task with sed.  And if you have GNU grep (i.e., if you’re on Linux), you can do it with grep -o.
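
Where GNU grep is not available, the effect of grep -o (printing only the matched text instead of the whole line) can be approximated in Python. The following is a minimal sketch under assumptions: `only_matching` is a hypothetical helper, and the default pattern %\w+% assumes that a tag is a run of word characters between two percent signs, which is not the pattern used in the findstr command above.

```python
import re
from typing import Iterable, Iterator


def only_matching(lines: Iterable[str], pattern: str = r"%\w+%") -> Iterator[str]:
    """Yield just the matched text from each line, one match at a time."""
    # Assumption: a tag is word characters wrapped in % signs.
    regex = re.compile(pattern)
    for line in lines:
        # finditer visits every non-overlapping match on the line.
        for match in regex.finditer(line):
            yield match.group(0)
```

Because an open file is itself an iterable of lines, passing `open("your_file", encoding="utf-8")` as `lines` and printing each yielded value would process a large file without loading it all into memory.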

4

There is a Notepad++ plugin, RegexExtract, that can copy the text matched by a regular expression to a new file in a new tab.

Because I didn't find any plugin for Notepad++ that can extract some text from current document or all files from a location with some additional settings (like case conversion), I decided to try to make it myself. (...) Plugin interface is pretty straightforward (...). (...) "Find", "Replace" and "Mask" fields use C++11 regex syntax. Extracting from files works right now only for those in UTF8.

Edit: dialog input tailored to the question

[Image: RegexExtract dialog filled in for this question]

In the image you can see how to fill in the dialog. I assume that a word does not contain spaces, etc., only characters matched by \w. Notably:

  • Use a pair of parentheses so that the word can be selected without the percent characters.
  • Choose the option Extract with replace to select the first match. Otherwise, you will get a columnar output of all $1, $2, etc.
  • Check Skip $& ... to leave out the complete matches.
  • Check Filter unique to report each match only once.
  • Click Extract to get the results. (Search only finds the matches; it does not report them.)
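
To show what these settings amount to, here is a rough Python analogue. It is not how the plugin is implemented; `unique_words` is a hypothetical name, and the pattern %(\w+)% follows the assumption above that a word consists only of characters matched by \w.

```python
import re


def unique_words(text: str) -> list[str]:
    # The parentheses form group 1, so findall returns the word
    # without the surrounding percent characters.
    words = re.findall(r"%(\w+)%", text)
    # dict.fromkeys drops repeats while keeping first-seen order,
    # roughly what "Filter unique" is described as doing.
    return list(dict.fromkeys(words))
```

The capture group plays the role of $1, and returning only the group rather than the whole match corresponds to skipping $&.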
Greck
2

In TextPad, bring up the Find box as usual, then use the Mark All button.

From there, use the Copy Bookmarked Lines function (Edit menu > Copy Other > Bookmarked Lines).

daveloyall
  • Personally, I do that exact operation so often that I've configured a keyboard shortcut for the Copy Bookmarked Lines function: Ctrl+Alt+c. – daveloyall Feb 16 '17 at 18:22
  • I came to this Question because I was searching for the Notepad++ question. After many years as a loyal and unpaying Textpad user, I'm switching to Notepad++ (GPL). – daveloyall Feb 16 '17 at 18:23
0

If anyone is interested in an online solution instead (since the Notepad++ plugin doesn't work on 64-bit), you can try Molbiotools. It can extract the regex matches on their own, without additional lines, or with them.