
I have a large XML file with the structure below. I want to remove the <tuv xml:lang="en-GB"><seg>CONTENT</seg></tuv> nodes so that each unit keeps only its de-DE part (<tuv xml:lang="de-DE"><seg>CONTENT</seg></tuv>). Is there a way to do this with Notepad++ or a different tool? I am not really into coding, so the simpler the better.

What I have:

<tu tuid="ID_0">
<tuv xml:lang="en-GB">
<seg>Hello!</seg>
</tuv>
<tuv xml:lang="de-DE">
<seg>Hallo!</seg>
</tuv>
</tu>
<tu tuid="ID_1">
<tuv xml:lang="en-GB">
<seg>This is a test content! :)</seg>
</tuv>
<tuv xml:lang="de-DE">
<seg>Das ist ein Testinhalt! :)</seg>
</tuv>
</tu>
<tu tuid="ID_2">
<tuv xml:lang="en-GB">
<seg>All your base are belong tu us ...</seg>
</tuv>
<tuv xml:lang="de-DE">
<seg>Och nö, echt jetzt?</seg>
</tuv>
</tu>

What I want:

<tu tuid="ID_0">
<tuv xml:lang="de-DE">
<seg>Hallo!</seg>
</tuv>
</tu>
<tu tuid="ID_1">
<tuv xml:lang="de-DE">
<seg>Das ist ein Testinhalt! :)</seg>
</tuv>
</tu>
<tu tuid="ID_2">
<tuv xml:lang="de-DE">
<seg>Och nö, echt jetzt?</seg>
</tuv>
</tu>
Mithical
Robert Herzog

2 Answers


In Notepad++, open the Replace dialog and specify:

Find what: <tuv xml:lang="en-GB">.*?</tuv>

Replace with: (leave empty)

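Then set the Search Mode to 'Regular expression' and tick the '. matches newline' box.

Replace All should now remove all the en-GB blocks. Note: the trick is the ? following the *. It makes the match non-greedy, so each match stops at the first </tuv> after the opening tag instead of running on to the last </tuv> in the file. Each removed block leaves an empty line behind; Edit > Line Operations > Remove Empty Lines cleans those up.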
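If you would rather script it than click through the dialog, the same replacement can be sketched in Python. This is only an illustration under a few assumptions: the function name strip_en_gb and the sample string are made up for the example, re.DOTALL stands in for the '. matches newline' checkbox, and the trailing \n? is an addition that also swallows the line break left behind by each removed block.

```python
import re

# Same pattern as in the Replace dialog. re.DOTALL lets '.' match newlines,
# and '.*?' is non-greedy so each match ends at the first </tuv>.
# The optional '\n' at the end is an addition to avoid leftover empty lines.
EN_GB_BLOCK = re.compile(r'<tuv xml:lang="en-GB">.*?</tuv>\n?', re.DOTALL)


def strip_en_gb(text: str) -> str:
    """Return text with every en-GB <tuv> block removed."""
    return EN_GB_BLOCK.sub("", text)


# Small sample taken from the question.
sample = (
    '<tu tuid="ID_0">\n'
    '<tuv xml:lang="en-GB">\n'
    '<seg>Hello!</seg>\n'
    '</tuv>\n'
    '<tuv xml:lang="de-DE">\n'
    '<seg>Hallo!</seg>\n'
    '</tuv>\n'
    '</tu>\n'
)

print(strip_en_gb(sample))
```

For a real file you would read its contents into a string, pass it through strip_en_gb, and write the result to a new file so the original stays untouched.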

snowdude

Ctrl+H (Replace...)

Find what: <tuv xml:lang="en-GB">.*?</tuv>

Replace with: (leave empty)

Search mode: Regular expression

Checked: . matches newline
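As for "a different tool": if the file is well-formed XML, a parser can do the same job without depending on how the tags are laid out. A rough sketch with Python's standard xml.etree.ElementTree follows. It assumes the <tu> elements sit under a single root element (the <body> wrapper in the sample is an assumption, since the question does not show the root), and drop_language is just an illustrative name.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def drop_language(root: ET.Element, lang: str = "en-GB") -> ET.Element:
    """Remove every <tuv> child of a <tu> whose xml:lang equals lang."""
    # list(...) takes a snapshot so removing children does not disturb iteration.
    for tu in list(root.iter("tu")):
        for tuv in tu.findall("tuv"):
            if tuv.get(XML_LANG) == lang:
                tu.remove(tuv)
    return root


# Sample from the question, wrapped in an assumed <body> root.
sample = (
    '<body>'
    '<tu tuid="ID_0">'
    '<tuv xml:lang="en-GB"><seg>Hello!</seg></tuv>'
    '<tuv xml:lang="de-DE"><seg>Hallo!</seg></tuv>'
    '</tu>'
    '</body>'
)

root = drop_language(ET.fromstring(sample))
print(ET.tostring(root, encoding="unicode"))
```

For a file on disk, ET.parse(path) followed by tree.write(new_path, encoding="utf-8", xml_declaration=True) would be the usual pattern. Note that ElementTree may not preserve a DOCTYPE line or the original formatting exactly, so the regex replace above is still the quicker option for a one-off cleanup.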

m4573r