10

Is there a command line tool that can remove comments from an XML file? Or do I need to write a small program that makes use of an XML parser to do this?

Update: I'm not interested in solutions that only handle a subset of all possible XML files.

For instance, a regexp can't parse XML in the general case.

https://stackoverflow.com/questions/6751105/why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation-in-la
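
A minimal sketch of one case where a regexp goes wrong: text that merely looks like a comment inside a CDATA section is character data, not a comment. The sample document and the helper names `regex_strip` and `cdata_text` are made up for illustration; the parser is Python's standard `xml.dom.minidom`.

```python
import re
from xml.dom import minidom

# Hypothetical sample: one real comment, plus comment-like text inside CDATA.
SAMPLE = ('<root><!-- real comment -->'
          '<code><![CDATA[<!-- not a comment -->]]></code></root>')


def regex_strip(xml_text):
    """Naive approach: delete everything that looks like a comment."""
    return re.sub(r'<!--.*?-->', '', xml_text, flags=re.DOTALL)


def cdata_text(xml_text):
    """Return the character data of the <code> element as a parser sees it."""
    doc = minidom.parseString(xml_text)
    code = doc.getElementsByTagName('code')[0]
    return ''.join(node.data for node in code.childNodes)


if __name__ == '__main__':
    print(repr(cdata_text(SAMPLE)))               # data in the original document
    print(repr(cdata_text(regex_strip(SAMPLE))))  # data after the regex pass
```

The regex pass removes the real comment, but it also empties the CDATA section, so the document's data has changed.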

Erik Sjölund

3 Answers

18

I would do it this way:

sed -e '/<!--.*-->/d' -e '/<!--/,/-->/d' myfile.xml > cleaned.xml

Or:

awk 'in_comment&&/-->/{sub(/([^-]|-[^-])*--+>/,"");in_comment=0}
 in_comment{next}
 {gsub(/<!--+([^-]|-[^-])*--+>/,"");
  in_comment=sub(/<!--+.*/,"");
  print}'

Or:

xmlstarlet ed -d '//comment()' file.xml
Frantique
  • I know regex solutions can't handle XML parsing (without possible errors). But I guess the third solution is using a real XML parser under the hood? – Erik Sjölund Sep 19 '14 at 11:07
  • Erik, yes, xmlstarlet is designed for XML manipulation. – Frantique Sep 19 '14 at 11:26
  • Ok, I checked. xmlstarlet uses libxml2. If you remove the first *sed* and *awk* suggestions, which in my view are not correct, I'll accept the *xmlstarlet* suggestion as the accepted answer. – Erik Sjölund Sep 19 '14 at 12:04
  • It's disappointing to know that none of the big editors (Sublime, vim, Atom, Notepad++) have this feature built in. Sad, sad, sad. Thanks @Frantique – Lucas Pottersky Jun 28 '16 at 18:52
  • I would like to keep only the comments that contain "TODO". Any idea whether that is possible with xmlstarlet? – Aquarius Power Jul 25 '22 at 00:20
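
For the "keep only TODO comments" case, and for anyone without xmlstarlet, a small parser-based program is one option. The sketch below uses Python's standard `xml.dom.minidom`; the function name `strip_comments` and its `keep_containing` parameter are hypothetical names chosen for illustration. The document is re-serialized, so details such as the XML declaration and attribute quoting may differ from the input.

```python
from xml.dom import Node, minidom


def strip_comments(xml_text, keep_containing=None):
    """Remove comment nodes; optionally keep those containing a marker string.

    keep_containing is a hypothetical parameter: when set (e.g. "TODO"),
    comments whose text contains it are left in place.
    """
    doc = minidom.parseString(xml_text)

    def walk(node):
        # Copy the child list first, because removeChild mutates it.
        for child in list(node.childNodes):
            if child.nodeType == Node.COMMENT_NODE:
                if keep_containing is None or keep_containing not in child.data:
                    node.removeChild(child)
            else:
                walk(child)

    walk(doc)  # start at the document so comments outside the root are covered
    return doc.toxml()


if __name__ == '__main__':
    sample = '<a><!-- x --><b/><!-- TODO: y --></a>'
    print(strip_comments(sample))
    print(strip_comments(sample, keep_containing='TODO'))
```

With xmlstarlet itself, an XPath predicate such as `//comment()[not(contains(., "TODO"))]` in place of `//comment()` should express the same filter, though that is untested here.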
0

To expand on the top answer: if you only want to delete the comment and not the entire line, you should probably use:

sed 's/<!--.*-->//'

In my case, I had a minified XML file where the entire content was on a single line, and since the previous solution deletes the entire line containing a comment, it completely cleared out my file.

bezbos.
  • Quote from the question "_I'm not interested in solutions that only handle a subset of all possible XML files._". The difficult thing is finding a solution that can handle all XML files, not just a subset. – Erik Sjölund May 30 '22 at 13:42
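
One pitfall with the `sed 's/<!--.*-->//'` substitution on a single-line file is that `.*` is greedy: with two comments on the same line, everything between the first `<!--` and the last `-->` is removed. The sketch below reproduces this with Python's `re` module as a rough analogue of the sed command, not sed itself; the sample line is made up.

```python
import re

# Hypothetical minified input: two comments on one line, an element between them.
LINE = '<a><!-- one --><b>keep me</b><!-- two --></a>'

# Rough analogue of sed 's/<!--.*-->//': greedy match, first match only.
greedy = re.sub(r'<!--.*-->', '', LINE, count=1)

# Lazy variant applied to every match on the line.
lazy = re.sub(r'<!--.*?-->', '', LINE)

print(greedy)  # <a></a>
print(lazy)    # <a><b>keep me</b></a>
```

The lazy variant keeps the `<b>` element here, but it is still a regex and so still only handles a subset of XML files. As far as I know, sed's regex dialects have no lazy quantifier, which is why the awk answer spells out `([^-]|-[^-])*` instead.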
0

This works well for removing multiline comments (such as commented-out failed tests) from an XML file, except the ones you hand-picked because they are helpful to the end user (here, comments containing HELP or TODO):
perl -i -w -0777pe 's/<!--(.(?<!(HELP|TODO)))*?-->//sg' somefile.xml

More about the related regex: https://stackoverflow.com/a/1240293/1422630

If there is a way to obtain the same result using xmlstarlet, I would prefer it, as there may be edge cases that the regex does not handle, but for now this is what I have to use.

Aquarius Power
  • This solution only handles a subset of all possible XML files and is therefore not relevant to the question asked. (Regex solutions only handle a subset of all possible XML files.) See https://stackoverflow.com/questions/6751105/why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation-in-la – Erik Sjölund Jul 25 '22 at 08:19
  • I will ask a new question, thanks! – Aquarius Power Aug 01 '22 at 21:52
  • https://stackoverflow.com/questions/73200053/how-to-remove-all-comments-from-a-xml-file-least-the-ones-containing-the-help – Aquarius Power Aug 01 '22 at 22:07