
I have a big log file, and I need to cut a part of it and paste it at the end of the same file. I created a script that finds the chunk of text I need (with sed), writes it to a temp file, and then deletes that same text from the original file. Then I cat both files together. The problem is that my files are all in UTF-8, but the temp file is created as ANSI (probably because it contains only "normal" characters), and when I join both files the original file ends up as ANSI, which garbles some of the text. I tried changing my locales, but they were already correct:

  en_GB.ISO-8859-1... up-to-date
  en_GB.UTF-8... up-to-date

I also tried an endless number of combinations in my scripts, but the result is always the same. I ran this simple test:

echo "test" >/tmp/test.txt 
echo "ção" >/tmp/test2.txt 

Then I ran:

file /tmp/test.txt  ----> it says ASCII
file /tmp/test2.txt ----> it says UTF-8 Unicode text

When I open the files in EditPlus and check the encoding, it reports ANSI for the first and UTF-8 for the second. But if I cat them together, the output is fine (UTF-8), no matter which order I choose. However, when I cat my big log file (already in UTF-8) together with the temp file, the result is always ANSI.

I thought there might be some kind of automatic encoding detection when no special characters are present, but I can't understand why it changes to ANSI or ASCII when I join them. I'm using a server running Ubuntu 14.04 Server and accessing it with PuTTY from a Windows machine.
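For reference, the workflow described above can be sketched roughly as follows. This is only a minimal illustration, not the actual script: the file names and the `BEGIN chunk` / `END chunk` markers are hypothetical, and the sample log is generated on the fly so the steps can be run in isolation.

```shell
#!/bin/sh
# Hypothetical scratch directory, file names, and markers; adjust to the real log.
work=$(mktemp -d)
log="$work/app.log"

# Small UTF-8 sample log. \303\247 and \303\243 are the UTF-8 bytes of "ç" and "ã".
printf 'first line \303\247\303\243o\nBEGIN chunk\nplain ascii line\nEND chunk\nlast line\n' > "$log"

# 1. Copy the wanted block to a temp file.
sed -n '/^BEGIN chunk$/,/^END chunk$/p' "$log" > "$work/chunk.tmp"

# 2. Write the log without that block to a second temp file.
sed '/^BEGIN chunk$/,/^END chunk$/d' "$log" > "$work/rest.tmp"

# 3. Join them (remainder first, moved block last) and replace the original.
cat "$work/rest.tmp" "$work/chunk.tmp" > "$work/new.log"
mv "$work/new.log" "$log"

cat "$log"
```

Neither sed nor cat converts anything here; they copy bytes, so the non-ASCII characters in the first line come through unchanged.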

  • Could you add an example to your post that does _not_ work? – PerlDuck Oct 06 '18 at 16:47
  • Maybe https://stackoverflow.com/q/27072558/5830574 – PerlDuck Oct 06 '18 at 16:51
  • I'm no specialist on encoding, but a file is identified as ASCII simply when it contains only characters in the ASCII range. Those characters are encoded with the same bytes in UTF-8, so in itself that should not be a problem. – Sergiy Kolodyazhnyy Oct 06 '18 at 17:04
  • @PerlDuck As crazy as it seems, I tried that too (or so I thought) and it didn't work... But I double-checked it before answering you and it just worked fine... Almost 3 days going nuts... Thanks for the help. – Lcross Portugal Oct 06 '18 at 17:07
  • The `file` command reads only a limited number of bytes in order to determine the type of the file. If the initial bytes are only ASCII, then the file type reported will be `ASCII` even if `UTF-8` characters show up later in the file. Linux file systems do not differentiate between `ASCII` and `UTF-8` files. – doneal24 Oct 06 '18 at 17:38
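The points made in the comments can be checked without relying on what `file` or an editor guesses. The sketch below is a rough illustration under a few assumptions: the scratch directory and file names are made up, and `iconv` is assumed to be installed. It joins an ASCII-only file with a UTF-8 file, then confirms that cat copied the bytes unchanged and that the result is still valid UTF-8.

```shell
#!/bin/sh
# Hypothetical scratch directory; any writable location works.
dir=$(mktemp -d)

printf 'test\n' > "$dir/ascii.txt"               # ASCII-only bytes
printf '\303\247\303\243o\n' > "$dir/utf8.txt"   # "ção" written as UTF-8 bytes

cat "$dir/ascii.txt" "$dir/utf8.txt" > "$dir/joined.txt"

# cat only copies bytes, so the joined size is the sum of the parts.
ascii_bytes=$(wc -c < "$dir/ascii.txt" | tr -d ' ')
utf8_bytes=$(wc -c < "$dir/utf8.txt" | tr -d ' ')
joined_bytes=$(wc -c < "$dir/joined.txt" | tr -d ' ')
echo "parts: $ascii_bytes + $utf8_bytes bytes, joined: $joined_bytes bytes"

# iconv exits non-zero if the input is not valid UTF-8.
if iconv -f UTF-8 -t UTF-8 "$dir/joined.txt" > /dev/null; then
    joined_is_utf8=yes
else
    joined_is_utf8=no
fi
echo "joined file is valid UTF-8: $joined_is_utf8"
```

If this check passes on the real log while an editor still shows "ANSI", the bytes are probably fine and only the detection is off.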
