I am experiencing strange behaviour that I don't know how to solve. Here is the scenario:

  • From a Python script I fetch JSON from a simple application hosted on Parse.
  • Once I have the text, I extract a sentence from it and save it to a local "txt" file encoded as ISO-8859-15.
  • Finally I send it to a text-to-speech processor, which expects its input in ISO-8859-15.

The weird thing is that once the Python script has run, if I run

file my_file.txt

The output is:

my_file.txt: ASCII text, with no line terminators

But if I open my_file.txt with vim, remove the last dot of the sentence, type it again, and save the file, then run this again:

file my_file.txt

now the output is:

my_file.txt: ASCII text

This solves some problems when the voice synthesizer processes the file. So how can I force this behaviour automatically, without the vim step? I have also made many attempts with iconv, with no success.

Any help would be much appreciated.

Edit:

pi@raspberrypi ~/main $ hexdump -C my_file.txt

00000000  73 61 6d 70 6c 65 20 61  6e 73 77 65 72 2e 2e     |sample answer..|
0000000f

pi@raspberrypi ~/main $ file my_file.txt
my_file.txt: ASCII text, with no line terminators
pi@raspberrypi ~/main $ vim my_file.txt
pi@raspberrypi ~/main $ file my_file.txt
my_file.txt: ASCII text
pi@raspberrypi ~/main $ hexdump -C my_file.txt

00000000  73 61 6d 70 6c 65 20 61  6e 73 77 65 72 2e 2e 0a  |sample answer...|
00000010

Sample file

Python code:

import json,httplib
from random import randint
import codecs

connection = httplib.HTTPSConnection('api.parse.com', 443)
connection.connect()
connection.request('GET', '/1/classes/XXXX', '', {
       "X-Parse-Application-Id": "xxxx",
       "X-Parse-REST-API-Key": "xxxx"
     })
result = json.loads(connection.getresponse().read())

pos = randint(0,len(result['results'])-1)
sentence = result['results'][pos]['sentence'].encode('iso-8859-15')
response = result['results'][pos]['response'].encode('iso-8859-15')

text_file = codecs.open("sentence.txt", "w","ISO-8859-15")
text_file.write("%s" % sentence)
text_file.close()

text_file = open("response.txt","w")
text_file.write("%s" % response)
text_file.close()
Arjan
cor
  • Can you upload the file with no line terminators? I would like to have a look at it. – Nidhoegger Oct 17 '15 at 09:23
  • 1
    Is it removing the 'dot', or does any edit fix it? It might be that editing the file adds the end of line marker, rather than the dot causing the problem. – Paul Oct 17 '15 at 09:24
  • So it's a single line in that text file? And *does* it have a line terminator? And are you sure you're only removing the dot? You can validate using `hexdump -C`. When typing in vim, lines always seem to end with `0x0a`, even though you cannot move the cursor to the next empty line. So I guess vim is indeed adding it when you remove the dot, or make any edit. – Arjan Oct 17 '15 at 09:24
  • many thanks! yes, you are all right, just opening and saving the file with vim is enough – cor Oct 17 '15 at 09:27
  • thank you @Arjan I edited the post with the command results – cor Oct 17 '15 at 09:40
  • @Nidhoegger I uploaded a file. It is in the edited question. Many thanks – cor Oct 17 '15 at 10:33
  • Please show the Python code: how you get the line and how you write it. I suspect that the newline is stripped when looping over the input, and that all you need to do is append it when writing the output file. Please also specify whether you are using Python 2 or 3, since Unicode handling changed a lot between those two versions. – Bram Oct 17 '15 at 10:42
  • Thanks @Bram, there it is. Using python 2.7.3. Writing to a file in two different ways, with same result. – cor Oct 17 '15 at 10:52
  • So that specific example even has *two* dots, right? `0x2e` is a dot, and that's in the example twice. But indeed, the `0x0a` is added by vim, even when you don't even remove anything, like you already saw now. – Arjan Oct 17 '15 at 11:01

2 Answers

8

The standard /bin/echo can be used to add that newline to the end of the file for you:

$ echo -n 'ssss'>test
$ file test
test: ASCII text, with no line terminators
$ hexdump -C test 
00000000  73 73 73 73                                       |ssss|
00000004
$ echo >> test
$ file test
test: ASCII text
$ hexdump -C test 
00000000  73 73 73 73 0a                                    |ssss.|
00000005
$ 

Another option would be to add it in your Python code:

text_file = open("response.txt","w")
text_file.write("%s" % response)
text_file.write("\n")  # <-- newline added here
text_file.close()
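
If the file has already been written, the `echo >> file` fix above can also be applied from Python. What follows is only a rough sketch: the helper name `ensure_trailing_newline` is made up for illustration, and the choice to leave an empty file untouched is an assumption (a plain `echo >>` would add a newline to it).

```python
import os


def ensure_trailing_newline(path):
    # Hypothetical helper: append a single newline byte to the file at
    # `path` unless its last byte is already a newline.
    # Returns True if a newline was added, False otherwise.
    with open(path, "rb+") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False  # assumption: an empty file is left alone
        handle.seek(-1, os.SEEK_END)
        if handle.read(1) == b"\n":
            return False  # already terminated, nothing to do
        handle.seek(0, os.SEEK_END)
        handle.write(b"\n")
        return True
```

Because the file is opened in binary mode, the existing bytes are never decoded, so the ISO-8859-15 content is not touched; only the final `0x0a` is added.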
4

The simplest solution is to append the newline in the write command:

text_file.write("%s\n" % sentence)

My sample program to demonstrate:

import codecs
sentence = 'something'
text_file = codecs.open("sentence.txt", "w","ISO-8859-15")
text_file.write("%s" % sentence)
text_file.close()
text_file = codecs.open("sentence2.txt", "w","ISO-8859-15")
text_file.write("%s\n" % sentence)
text_file.close()

And the result:

$ file sentence.txt 
sentence.txt: ASCII text, with no line terminators
$ file sentence2.txt 
sentence2.txt: ASCII text

The explanation is that the variable you are writing does not contain the newline, and write() writes exactly what you give it.
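
Since write() adds nothing of its own, one way to avoid forgetting the terminator is to wrap the write in a small function. This is a minimal sketch, not the code from the question: the name `write_line` is hypothetical, it uses `io.open` (available in both Python 2.7 and 3) instead of `codecs.open`, and it assumes `text` is a Unicode string that has not already been passed through `.encode()`.

```python
import io


def write_line(path, text, encoding="iso-8859-15"):
    # Hypothetical helper: write `text` to `path` in the given encoding,
    # ending the file with exactly one newline.
    # newline="\n" keeps the terminator as a single 0x0a byte on every platform.
    with io.open(path, "w", encoding=encoding, newline="\n") as handle:
        handle.write(text.rstrip(u"\n") + u"\n")


if __name__ == "__main__":
    write_line("sentence.txt", u"sample answer..")
```

Stripping any existing trailing newlines before adding one means the function can be called on text that is already terminated without producing a blank line at the end of the file.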

Arjan
Bram
  • Thank you, it works! Your answer could perfectly well be the accepted one, but Scott was quicker. – cor Oct 17 '15 at 11:13