Remove XML node in Notepad++

Question

I have a large XML file with the structure below. I want to remove the <tuv xml:lang="en-GB"><seg>CONTENT</seg></tuv> nodes so that only the de-DE part (<tuv xml:lang="de-DE"><seg>CONTENT</seg></tuv>) remains in each unit. Is there a way to do this with Notepad++ or a different tool? I am not really into coding, so the simpler the better.

What I have:

<tu tuid="ID_0">
<tuv xml:lang="en-GB">
<seg>Hello!</seg>
</tuv>
<tuv xml:lang="de-DE">
<seg>Hallo!</seg>
</tuv>
</tu>
<tu tuid="ID_1">
<tuv xml:lang="en-GB">
<seg>This is a test content! :)</seg>
</tuv>
<tuv xml:lang="de-DE">
<seg>Das ist ein Testinhalt! :)</seg>
</tuv>
</tu>
<tu tuid="ID_2">
<tuv xml:lang="en-GB">
<seg>All your base are belong tu us ...</seg>
</tuv>
<tuv xml:lang="de-DE">
<seg>Och nö, echt jetzt?</seg>
</tuv>
</tu>

What I want:

<tu tuid="ID_0">
<tuv xml:lang="de-DE">
<seg>Hallo!</seg>
</tuv>
</tu>
<tu tuid="ID_1">
<tuv xml:lang="de-DE">
<seg>Das ist ein Testinhalt! :)</seg>
</tuv>
</tu>
<tu tuid="ID_2">
<tuv xml:lang="de-DE">
<seg>Och nö, echt jetzt?</seg>
</tuv>
</tu>

score 5 · Answer 1 · answered Aug 22 '12 at 15:34

In Notepad++, open the Replace dialog (Ctrl+H) and specify:

Find what: <tuv xml:lang="en-GB">.*?</tuv>

Replace with: (leave empty)

Then set the Search Mode to 'Regular expression' and check the '. matches newline' box.

Replace All should now remove all the en-GB blocks. Note: the trick is the ? following the *. It makes the match non-greedy, so each match ends at the first </tuv> after the opening en-GB tag instead of running on to the last </tuv> in the file.
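For anyone who wants to see the greedy/non-greedy difference outside the editor, here is a small Python sketch. It is only an illustration, not part of the Notepad++ procedure: it assumes Python's re module, with re.DOTALL standing in for the '. matches newline' checkbox, and uses a shortened copy of the sample from the question.

```python
import re

# Shortened copy of the sample data from the question.
sample = """<tu tuid="ID_0">
<tuv xml:lang="en-GB">
<seg>Hello!</seg>
</tuv>
<tuv xml:lang="de-DE">
<seg>Hallo!</seg>
</tuv>
</tu>
<tu tuid="ID_1">
<tuv xml:lang="en-GB">
<seg>This is a test content! :)</seg>
</tuv>
<tuv xml:lang="de-DE">
<seg>Das ist ein Testinhalt! :)</seg>
</tuv>
</tu>
"""

# re.DOTALL lets "." match newlines, like the ". matches newline" checkbox.
NON_GREEDY = re.compile(r'<tuv xml:lang="en-GB">.*?</tuv>', re.DOTALL)
GREEDY = re.compile(r'<tuv xml:lang="en-GB">.*</tuv>', re.DOTALL)

# Non-greedy: each match stops at the first </tuv>, so only en-GB blocks go.
cleaned = NON_GREEDY.sub("", sample)

# Greedy: one match runs from the first en-GB tag to the last </tuv>,
# which also swallows the de-DE blocks in between.
over_deleted = GREEDY.sub("", sample)

print(cleaned)
print(over_deleted)
```

With the pattern exactly as given, an empty line remains where each en-GB block used to be, which is harmless for XML.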

score 5 · Answer 2 · answered Aug 22 '12 at 15:39

Ctrl+H (Replace...)

Find what: <tuv xml:lang="en-GB">.*?</tuv>

Replace with: (leave empty)

Search mode: Regular expression

Checked: . matches newline

Then click Replace All.
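Since the question also allows "a different tool", the same removal can be sketched with an XML parser instead of a regex, in case a short script is acceptable. This is a minimal sketch using Python's standard xml.etree.ElementTree; the drop_language helper and the <body> wrapper around the sample are assumptions made for illustration, since a parser needs a single root element and the question shows only a fragment.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def drop_language(root, lang):
    """Remove every <tuv> child of a <tu> whose xml:lang equals lang."""
    for tu in root.iter("tu"):
        # findall returns a list, so removing inside the loop is safe.
        for tuv in tu.findall("tuv"):
            if tuv.get(XML_LANG) == lang:
                tu.remove(tuv)


# Hypothetical wrapper element <body>; the real file's root may differ.
sample = """<body>
<tu tuid="ID_0">
<tuv xml:lang="en-GB"><seg>Hello!</seg></tuv>
<tuv xml:lang="de-DE"><seg>Hallo!</seg></tuv>
</tu>
</body>"""

root = ET.fromstring(sample)
drop_language(root, "en-GB")
result = ET.tostring(root, encoding="unicode")
print(result)
```

For a real file, ET.parse and the tree's write method would replace the in-memory string. ElementTree does not preserve a DOCTYPE declaration, and by default drops comments, so it is worth comparing the output against the original before overwriting anything.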

m4573r
