
I have the following XML generated by Gnote:

<?xml version="1.0"?>
<note version="0.3" xmlns:link="http://beatniksoftware.com/tomboy/link" xmlns:size="http://beatniksoftware.com/tomboy/size" xmlns="http://beatniksoftware.com/tomboy"><title>things</title><text xml:space="preserve"><note-content version="0.1" xmlns:link="http://beatniksoftware.com/tomboy/link" xmlns:size="http://beatniksoftware.com/tomboy/size">things
<list><list-item dir="ltr">sheets
</list-item><list-item dir="ltr">test
</list-item><list-item dir="ltr">eval</list-item></list>
asd
</note-content>
</text><last-change-date>2023-02-19T12:20:06.551763Z</last-change-date><last-metadata-change-date>2023-02-19T12:20:06.553010Z</last-metadata-change-date><create-date>2023-02-19T10:40:01.309068Z</create-date><cursor-position>90</cursor-position><selection-bound-position>-1</selection-bound-position><width>649</width><height>282</height></note>

I want all of the text content inside <note-content></note-content> without extra newlines, including the text in the list/list-item elements. The output I want, including the formatting, is:

things
sheets
test
eval
asd

After much trial and error, parsing the XML with

xmllint --xpath "//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text() | //*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/*[local-name()='list']/*[local-name()='list-item'][@dir='ltr']/text()[normalize-space()]" a.xml

(with or without --noblanks) yields output separated by extra blank lines (there is supposed to be a blank line after asd as well, but the code block isn't showing it):

things

sheets

test

eval

asd

If I remove the newlines from the XML file, the same xmllint command produces the desired output with no extra blank lines, so I don't know whether Gnote is producing something non-standard.
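The blank lines can be traced to the text nodes themselves. The note body is marked xml:space="preserve", and the line breaks typed in Gnote are stored as part of the text: the first text node is things plus a newline, the list items hold sheets and test with trailing newlines, eval has none, and the text after </list> is a newline, asd, and another newline. The Python sketch below uses the standard library's xml.etree.ElementTree purely as a stand-in to list those nodes (it is not how xmllint works internally), on a trimmed copy of the note without the metadata elements. It then prints each node followed by a newline, which is an assumption about xmllint's output format based on the output shown above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the note from the question (metadata elements and the
# unused link/size namespace declarations dropped). The line breaks inside
# <note-content> are kept exactly as Gnote wrote them.
NOTE = """<note version="0.3" xmlns="http://beatniksoftware.com/tomboy"><title>things</title><text xml:space="preserve"><note-content version="0.1">things
<list><list-item dir="ltr">sheets
</list-item><list-item dir="ltr">test
</list-item><list-item dir="ltr">eval</list-item></list>
asd
</note-content>
</text></note>"""

NS = "{http://beatniksoftware.com/tomboy}"  # default namespace, Clark notation

root = ET.fromstring(NOTE)
content = root.find(f".//{NS}note-content")

# Every text node under <note-content>, in document order: the same five
# nodes that the union of text() steps in the question selects.
nodes = list(content.itertext())
print(nodes)
# ['things\n', 'sheets\n', 'test\n', 'eval', '\nasd\n']

# Assumption: xmllint prints each selected node followed by a newline.
xmllint_like = "".join(node + "\n" for node in nodes)
print(xmllint_like, end="")
```

Each line break appears only once in the nodes, but adding a separator newline after every node yields the blank line after each item, including the one after asd.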

I tried the suggestions in the comments and answers at https://stackoverflow.com/questions/11776910/xpath-expression-to-remove-whitespace/11777638, but without success. Some observations:

  1. When I tried running the following (notice the |):

     xmllint --xpath "normalize-space(//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text()) | normalize-space(//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/*[local-name()='list']/*[local-name()='list-item'][@dir='ltr']/text())" a.xml

     I get:

     XPath error : Invalid type

     XPath evaluation failure

  2. Even if I reworked my script to run multiple xmllint invocations, each one leaves me with a single string with the newlines removed, which is good, but which string I get has to be chosen manually beforehand. For example, here are both the normalize-space and translate(normalize-space, ' ', '') variations for the note-content element path:

     xmllint --xpath "normalize-space(//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text())" a.xml
     xmllint --xpath "translate(normalize-space(//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text()), ' ', '')" a.xml

     Both yield only one of the two text items in the note-content element, without newlines ([1] is things and [2] is asd). I can choose between them by appending [1] or [2] to text(), but that doesn't work when the number of items is unknown. (I don't know if there is a way to get all of the text items this way.)
  3. Some answers suggest using [normalize-space() = 'desiredtext'], but this doesn't work when I can't predict the text in the generated XML.
  4. If I just append [normalize-space()] after text():

     xmllint --noblanks --xpath "//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text()[normalize-space()] | //*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/*[local-name()='list']/*[local-name()='list-item'][@dir='ltr']/text()[normalize-space()]" a.xml

     I'm left with the same output I started with.
  5. Appending [not(.='')] after text() gives the same output.
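Two XPath 1.0 rules are likely behind the first two observations. normalize-space() takes a single string, so when it is given a node-set it uses only the string value of the first node in document order (hence things, unless [2] is added). It also returns a string rather than a node-set, and the | operator only accepts node-sets, which is most likely where the "Invalid type" error comes from. Normalizing each node separately would need something like XPath 2.0's //text()/normalize-space(), which xmllint cannot evaluate because libxml2 implements XPath 1.0. The Python sketch below only emulates what that per-node normalization would produce; normalize_space is a hypothetical helper written to follow the XPath 1.0 definition, and the note is a trimmed copy with the metadata elements dropped.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the note from the question (metadata elements dropped).
NOTE = """<note version="0.3" xmlns="http://beatniksoftware.com/tomboy"><title>things</title><text xml:space="preserve"><note-content version="0.1">things
<list><list-item dir="ltr">sheets
</list-item><list-item dir="ltr">test
</list-item><list-item dir="ltr">eval</list-item></list>
asd
</note-content>
</text></note>"""

NS = "{http://beatniksoftware.com/tomboy}"

# XPath 1.0 whitespace is only space, tab, CR and LF.
_XPATH_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(s: str) -> str:
    """Hypothetical helper mimicking XPath's normalize-space() on one string:
    strip leading/trailing whitespace, collapse inner runs to one space."""
    return _XPATH_WS.sub(" ", s).strip(" \t\r\n")


root = ET.fromstring(NOTE)
content = root.find(f".//{NS}note-content")

# XPath 1.0 normalize-space(.../note-content/text()) sees only the first
# text node of <note-content>.
first_only = normalize_space(content.text)

# Per-node normalization: one string per text node, whitespace-only
# nodes dropped.
items = [normalize_space(t) for t in content.itertext()]
items = [t for t in items if t]
print("\n".join(items))
```

first_only matches the single string xmllint returned in observation 2, while items holds one cleaned string per text node, which is the shape of result a single XPath 1.0 expression cannot return.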

The question: is this extra-blank-line behavior caused by incorrect xmllint/XPath commands, or by the way Gnote generated the XML? And what are the correct xmllint/XPath commands, if there are any? I don't want to use xmlstarlet because it doesn't appear to be maintained anymore. I'm also not asking for a way to pipe the output into a command that removes extra newlines.

Yetoo
  • I want all the text in <note-content> without an extra \n after each item. OK, let's say sheets has an embedded newline, but why is there a newline in the output after at least eval? I feel like whatever stops eval from being followed by an extra newline may also ignore the embedded newlines in the other parts of the file. – Yetoo Apr 25 '23 at 18:41
  • I don't think newlines in the XML matter here, as I'm under the impression that XML doesn't care about plain text line feeds, only tags: https://sbnwiki.astro.umd.edu/wiki/Welcome_to_the_SBN_Wiki https://stackoverflow.com/questions/35504890/how-to-add-a-newline-line-break-in-an-xml-file – Yetoo Apr 25 '23 at 18:49
  • @roaima When I used the initial XPath from the question with xmlstarlet, as in xmlstarlet sel -t -v "//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text() | //*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/*[local-name()='list']/*[local-name()='list-item'][@dir='ltr']/text()[normalize-space()]" -n a.xml, I got the same output with the extra newlines, so it seems the XPath needs to be adjusted. – Yetoo Apr 25 '23 at 19:27
  • I meant to say , but I updated the question. – Yetoo Apr 25 '23 at 20:10

1 Answer


Although you have expressed a preference to avoid xmlstarlet, it will do exactly what you want:

xmlstarlet sel -t -v '//_:note-content' -n xmlfile
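This works because xmlstarlet's -v prints the string value of the selected element, which in XPath is the concatenation of all of its descendant text nodes in document order, with nothing inserted between them. With Gnote's layout each line break appears exactly once in that concatenation (eval has no trailing newline, and the text after </list> starts with one), so the joined string is already the desired output. A small check in Python, using xml.etree.ElementTree as a stand-in for an XPath engine and a trimmed copy of the note (metadata elements dropped):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the note from the question (metadata elements dropped).
NOTE = """<note version="0.3" xmlns="http://beatniksoftware.com/tomboy"><title>things</title><text xml:space="preserve"><note-content version="0.1">things
<list><list-item dir="ltr">sheets
</list-item><list-item dir="ltr">test
</list-item><list-item dir="ltr">eval</list-item></list>
asd
</note-content>
</text></note>"""

NS = "{http://beatniksoftware.com/tomboy}"

root = ET.fromstring(NOTE)
content = root.find(f".//{NS}note-content")

# String value of the element: all descendant text nodes joined in
# document order, with no separator added.
string_value = "".join(content.itertext())
print(string_value, end="")
```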

Output

things
sheets
test
eval
asd

Using xmllint, I cannot avoid the blank lines, because the newlines are part of the element's text:

xmllint --xpath '//*[local-name()="note-content"]//text()' xmlfile

Output

things

sheets

test

eval

asd

After spending some time with xmllint, I would suggest that you simply remove the blank lines. (Not ideal, but certainly effective.)

xmllint … | grep .

Output

things
sheets
test
eval
asd
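grep . keeps only lines that contain at least one character, so it drops exactly the empty lines produced by the extra newline after each text node. A Python sketch of the same filter, applied to the xmllint output shown earlier in this answer (grep_dot is a hypothetical name used only for illustration):

```python
# xmllint output from this answer: each text node followed by a newline,
# which leaves an empty line after every item.
XMLLINT_OUTPUT = "things\n\nsheets\n\ntest\n\neval\n\nasd\n\n"


def grep_dot(text: str) -> str:
    """Keep only non-empty lines, like `grep .`.
    As with grep, a line holding only spaces would still be kept."""
    kept = [line for line in text.split("\n") if line]
    return "".join(line + "\n" for line in kept)


filtered = grep_dot(XMLLINT_OUTPUT)
print(filtered, end="")
```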
roaima
  • I would upvote this for the improved command if I had enough reputation. Funnily enough, during my trial and error I tried exactly that, but with double quotes, so I got "XPath set is empty" and was none the wiser. – Yetoo Apr 25 '23 at 21:32
  • @Yetoo if it solves your problem you can definitely accept it ✔ as your chosen answer – roaima Apr 25 '23 at 21:41