Can anyone suggest a batch script that finds the string 'utf-16' in multiple XML files in a directory and replaces it with 'utf-8', without using find-and-replace tools? The replacement needs to be done in the existing files themselves.

barlop
user426493
  • possible duplicate of [How to search and replace a string in multiple text files (within a directory) with Windows CMD? Part 2](http://superuser.com/questions/682863/how-to-search-and-replace-a-string-in-multiple-text-files-within-a-directory-w) – DavidPostill Mar 10 '15 at 09:00
  • @DavidPostill did you not see the words utf-16 and utf-8? The question may or may not be a bit confused, but you are not addressing that by just pointing to that other question. – barlop Mar 10 '15 at 09:24
  • if you have a file that mixes utf-8 with (unnecessarily) utf-16, then isn't your file in a mixed-up format? Are you sure you have a file like that? – barlop Mar 10 '15 at 09:27
  • @barlop I read it as: replace the literal string `utf-16` with `utf-8`. If the OP wants to change the *encoding*, I don't think that's possible from cmd. – DavidPostill Mar 10 '15 at 09:29
  • @DavidPostill `msxsl source.xml UTF8stylesheet.xsl -o outputUTF8.xml` - it's all possible :) – STTR Mar 10 '15 at 09:36
  • @DavidPostill of course you can change encoding from the command line. You can from a GUI, and the command line is more powerful for that sort of thing than GUIs are. I think it's possible your interpretation is correct, though. BTW, how is replacing the literal string utf-8 any different from replacing the literal string abcdefg? – barlop Mar 10 '15 at 09:41
  • @DavidPostill for a start, xxd with sed can do almost anything to a file. Here is an example of xxd and sed playing with the encoding: http://superuser.com/questions/801419/how-can-i-make-changes-to-this-file-encoding And for a command-line tool that specialises in it, there is iconv: http://superuser.com/questions/16672/how-can-i-convert-multiple-files-to-utf-8-encoding-using-nix-command-line-tools I am amazed that you would pontificate (and wrongly, obviously) that something as technical as changing an encoding can't be done from the command line. – barlop Mar 10 '15 at 10:39
  • @barlop OP asked for a batch script (without using find-and-replace tools). He didn't ask how to use external tools to do the conversion. AFAIK it cannot be done with just a batch script without using external tools. In any case it is still unclear what the OP is trying to do. Please don't insult me using "pontificate" when the question is unclear ... – DavidPostill Mar 10 '15 at 11:28
  • @DavidPostill I'm aware that the OP said "batch script", but you wrote "If the OP wants to change encoding I don't think thats possible from cmd". cmd is not just batch scripts, and your sentence is not true. If you'd said you don't think it's possible via a batch script, then fine: it's reasonable to think it's not possible via a batch script with batch tools. But you used the term cmd, which is broader. If you meant a batch script specifically, you should've said so, rather than saying it can't be done from cmd (and thus implying it can't be done from batch). – barlop Mar 10 '15 at 15:19
  • @barlop let's not argue over small points ;) – DavidPostill Mar 10 '15 at 17:52

1 Answer

Use any XSLT processor, for example msxsl.

Command Line Transformation Utility

MSXML 4.0 Service Pack 2 (Microsoft XML Core Services)

zero.xsl is a style sheet that transforms test.xml into test2.xml.

<xsl:output method="xml" encoding="UTF-8" /> is the line that converts the XML to UTF-8.

Usage: Zeroxml test.xml

Zeroxml.cmd:

@echo off
rem Usage: Zeroxml test.xml   (writes test2.xml next to the input)
set "name=%~1"
msxsl.exe "%name%" zero.xsl -o "%name:~0,-4%2.xml"

zero.xsl:

<!-- The Identity Transformation -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Serialize as XML in UTF-8; the encoding attribute is what performs the conversion -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="no"/>

  <!-- Alternative output settings:
  <xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="yes"/>
  <xsl:output method="html" media-type="application/vnd.ms-excel" encoding="UTF-8" indent="yes" omit-xml-declaration="no"/>
  -->

  <!-- Whenever you match any node or any attribute -->
  <xsl:template match="node()|@*">
    <!-- Copy the current node -->
    <xsl:copy>
      <!-- Including any attributes it has and any child nodes -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Way 2: convert UTF-16 to UTF-8 from the command line.

Unicode: UTF-16 format, little-endian byte order.

powershell gc test.xml -encoding Unicode^|sc testUTF8.xml -encoding UTF8

BigEndianUnicode: UTF-16 format, big-endian byte order.

powershell gc test.xml -encoding BigEndianUnicode^|sc testUTF8.xml -encoding UTF8
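
Which of the two `-encoding` names applies depends on the byte order mark (BOM) at the start of the file. As a rough illustration only, written in Python rather than batch, the check could look like the sketch below. The function name `detect_powershell_encoding` is hypothetical, and the sketch assumes the file actually starts with a BOM; a file without one is reported as `unknown`.

```python
import codecs


def detect_powershell_encoding(head: bytes) -> str:
    """Map the leading bytes of a file to a PowerShell -Encoding name.

    Assumption: the file begins with a byte order mark. Files without
    a BOM are reported as "unknown" rather than guessed at.
    """
    # UTF-32 LE must be tested first: its BOM (FF FE 00 00) begins with
    # the same two bytes as the UTF-16 LE BOM (FF FE).
    if head.startswith(codecs.BOM_UTF32_LE):
        return "UTF32"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "Unicode"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "BigEndianUnicode"
    if head.startswith(codecs.BOM_UTF8):
        return "UTF8"
    return "unknown"
```

Only the first four bytes are needed, so a call such as `detect_powershell_encoding(open("test.xml", "rb").read(4))` is enough to pick the command to run.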

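Convert all UTF-16 XML files in the SourceDirXML directory and its subdirectories to UTF-8. The output directory must already exist. `ForEach-Object` is spelled out instead of its `%` alias because cmd consumes a bare `%` when the line runs from a batch file.

powershell $in='C:\SourceDirXML';$out='C:\OutputUTF8XML\';ls -Fo -r $in -Fi *.xml^| ForEach-Object {(gc $_.FullName -encoding Unicode^|sc ($out+$_.Name) -encoding UTF8)}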

Search and replace a string in multiple XML files (within a directory) from Windows CMD:

powershell $in='C:\SourceDirXML';$out='C:\OutputUTF8XML\';ls -Fo -r $in -Fi *.xml^| ForEach-Object {(gc $_.FullName^| ForEach-Object {$_ -replace 'oldstring','newstring'}^|sc ($out+$_.Name) -encoding UTF8)}
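
The question asks for the files to be changed in place, and re-encoding the text with `gc`/`sc` does not by itself update an `encoding="utf-16"` label in the XML declaration. The sketch below shows both steps together. It is written in Python rather than batch, so it sits outside the question's constraint and is meant only to illustrate the logic. The names `convert_in_place` and `convert_tree` are hypothetical, and the sketch assumes every UTF-16 file starts with a BOM; anything else is left untouched.

```python
import codecs
import re
from pathlib import Path

# Matches the encoding label inside the XML declaration only.
DECL_ENCODING = re.compile(
    r"""(<\?xml[^>]*?encoding\s*=\s*["'])utf-16(["'])""", re.IGNORECASE
)


def convert_in_place(path: Path) -> bool:
    """Rewrite one UTF-16 XML file as UTF-8; return True if it was changed."""
    raw = path.read_bytes()
    # Skip UTF-32 LE, whose BOM starts with the same bytes as UTF-16 LE.
    if raw.startswith(codecs.BOM_UTF32_LE):
        return False
    # Assumption: UTF-16 files carry a BOM. Files without one are skipped.
    if not raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return False
    text = raw.decode("utf-16")  # this codec reads the BOM and strips it
    text = DECL_ENCODING.sub(r"\g<1>utf-8\g<2>", text, count=1)
    # Bytes are written back directly, so line endings are preserved.
    path.write_bytes(text.encode("utf-8"))
    return True


def convert_tree(root: Path) -> list:
    """Convert every *.xml under root (recursively); return the changed paths."""
    return [p for p in sorted(root.rglob("*.xml")) if convert_in_place(p)]
```

Calling `convert_tree(Path(r"C:\SourceDirXML"))` would walk the same tree as the commands above. Unlike `sc -encoding UTF8` in Windows PowerShell, this sketch writes UTF-8 without a BOM.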
STTR
  • I guess all that code in zero.xsl is to get it to do it for many files? What code would do it for just one file? That's a lot of code just to change a literal string from one thing to another (if that's what he wants to do). – barlop Mar 10 '15 at 09:47
  • @barlop zero.xsl shows a transformation that does nothing to the text. To do the same for multiple files, you just need a loop: `FOR /R %%i IN (*.xml) DO`. Which lines do you want to replace in the xml file, and where? – STTR Mar 10 '15 at 09:51
  • so all that code in zero.xsl is just to replace one string with another string? – barlop Mar 10 '15 at 10:37
  • @barlop No, it's an empty conversion: the file is converted into itself, but with, for example, UTF-8 encoding specified in the output options. If you show an example of exactly what you want to convert, a specific string conversion can be written. It will differ for elements, for attributes, for a complete line, and for a substring. – STTR Mar 10 '15 at 10:46
  • thanks, you should clarify that all that does nothing and that you specify the conversion. You should even give an example of it converting a specific string. Then it'll be more useful for people. Your answer should be useful for people looking for the answer to the question, not just the OP. Interesting thing you mention there re powershell, too. Hopefully the OP will clarify what he wants to convert. But in the meantime, or even regardless, give a more useful example (one that works, that others can use and adapt) so that your example isn't a completely empty shell. – barlop Mar 10 '15 at 10:47
  • @barlop Updated. The last command works, but it destroys the meaning of the existing XML. – STTR Mar 10 '15 at 11:43