I have around 350 XML files spread throughout the /abc directory. I would like to find all instances where the value of the alt attribute is exactly 'blah blah':

<image alt="blah blah" src="../webcontent/filename.png">
    <caption>
        Figure 1.1: Typical Components of Blah Blah
    </caption>
</image>

and replace the value of the alt attribute with the text enclosed by the caption, with newlines and surrounding whitespace removed:

<image alt="Figure 1.1: Typical Components of Blah Blah" src="../webcontent/filename.png">
    <caption>
        Figure 1.1: Typical Components of Blah Blah
    </caption>
</image>

I'm open to running a script on Ubuntu or Windows, or using any text editing tool.

It is not safe to assume that newlines and indentation are consistent. Also, not all images have a caption. All XML documents in the path are well-formed.

Is there a simple way to script this replacement in-place? I'd be open to something that works for a single file; I can extend it to run recursively.

Jedi
  • Possible duplicate of [How to search and replace a string in multiple xml files (within a directory) with Windows CMD](http://superuser.com/questions/887694/how-to-search-and-replace-a-string-in-multiple-xml-files-within-a-directory-wi) – RedGrittyBrick Jun 17 '16 at 14:06
  • @RedGrittyBrick, thanks for replying. It's less a question about doing it recursively; I'm okay with a solution that can do the replacement for a single file. The post you link to is a simpler string replacement; what I'm looking for requires more XML parsing. – Jedi Jun 17 '16 at 14:14

2 Answers

For a single file, the following XSLT stylesheet will do the job:

<t:transform version="1.0" xmlns:t="http://www.w3.org/1999/XSL/Transform">
  <t:template match="node()|@*">
    <t:copy>
      <t:apply-templates select="node()|@*"/>
    </t:copy>
  </t:template>
  <t:template match="image/@alt[. = 'blah blah']">
    <t:attribute name="alt">
      <t:value-of select="normalize-space(../caption)"/>
    </t:attribute>
  </t:template>
</t:transform>

To process multiple files, you can invoke the stylesheet once per file from a shell script, Ant script, or similar (or look at xmlsh). Alternatively, if you're using an XSLT 2.0 processor such as Saxon, you can script it within XSLT itself using the collection() function.
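
If no XSLT processor is at hand, the same replacement can be sketched in Python with only the standard library. This is a rough equivalent, not the stylesheet's exact behaviour: the names `fix_alt`, `process_tree` and `PLACEHOLDER` are made up for this example, images without a caption are skipped rather than given an empty alt, and `xml.etree.ElementTree` drops comments and any DOCTYPE when it rewrites a file, so try it on a copy first.

```python
import os
import xml.etree.ElementTree as ET

PLACEHOLDER = "blah blah"  # the alt value to replace, as given in the question


def fix_alt(root):
    """Set alt to the whitespace-normalized caption text; return the number of changes."""
    changed = 0
    for image in root.iter("image"):
        if image.get("alt") != PLACEHOLDER:
            continue
        caption = image.find("caption")
        if caption is None:
            continue  # assumption: images without a caption are left untouched
        # Rough equivalent of XPath normalize-space(): trim, then collapse whitespace runs.
        text = " ".join("".join(caption.itertext()).split())
        image.set("alt", text)
        changed += 1
    return changed


def process_tree(directory):
    """Rewrite every .xml file under directory in place; return the paths that changed."""
    modified = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            if not name.endswith(".xml"):
                continue
            path = os.path.join(dirpath, name)
            tree = ET.parse(path)
            # Only write the file back if at least one attribute was replaced.
            if fix_alt(tree.getroot()):
                tree.write(path, encoding="utf-8", xml_declaration=True)
                modified.append(path)
    return modified
```

Calling `process_tree("/abc")` would then walk the directory from the question and return the list of files it rewrote.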

Michael Kay
  • Worked like a charm with `xsltproc` for one file and then `find -exec` for all XMLs. – Jedi Jun 17 '16 at 15:05

You could also use xmlstarlet:

xmlstarlet ed -L -u '//image/@alt[.= "blah blah"]' -x "normalize-space(../caption/text())" file.xml
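
Note that in XPath 1.0, `normalize-space(../caption/text())` uses only the first text node of the caption, whereas `normalize-space(../caption)` uses the caption's whole string value. The two agree for plain-text captions like the one in the question, but would differ if a caption contained inline markup. The Python sketch below mimics both expressions; the `<b>` element and the `normalize_space` helper are hypothetical, added only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical caption with inline markup (not from the question).
caption = ET.fromstring(
    "<caption>\n  Figure 1.1: <b>Typical</b> Components\n</caption>"
)


def normalize_space(s):
    """Trim the string and collapse whitespace runs, roughly like XPath normalize-space()."""
    return " ".join(s.split())


# Mimics normalize-space(../caption/text()): only the first text node is used.
first_text_node = normalize_space(caption.text or "")

# Mimics normalize-space(../caption): all descendant text, in document order.
string_value = normalize_space("".join(caption.itertext()))

print(first_text_node)  # Figure 1.1:
print(string_value)     # Figure 1.1: Typical Components
```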

Michael Vehrs