How can I extract two or three "parallel" XPaths on a web page to/in a format that can be imported into a spreadsheet?
Here is an example: on https://www.amazon.co.uk/s?rh=n%3A20606777031&language=en_GB&brr=1&pf_rd_i=3012216031&pf_rd_m=A3P5ROKL5A1OLE&pf_rd_p=7accc6fa-9287-4d54-804e-88e159c5cb45&pf_rd_r=P0TZ793D83HJK8616YKC&pf_rd_s=merchandised-search-top-2&pf_rd_t=101&rd=1&ref=uk_outsbcd_1 most items have an actual price and an RRP. I would like to extract all prices as well as RRPs to something that can be used in a spreadsheet.
An ugly XPath to get the RRP is
/html/body/div[1]/div[2]/div[1]/div[1]/div/span[1]/div[1]/div[11]/div/div/div/div/div[2]/div[3]/div/a/div/span[2]/span[2]/text()
and to get the (non-decimal part of the) price
/html/body/div[1]/div[2]/div[1]/div[1]/div/span[1]/div[1]/div/div/div/div/div/div[2]/div[3]/div/a/span/span[2]/span[2]/text()
The product name:
/html/body/div[1]/div[2]/div[1]/div[1]/div/span[1]/div[1]/div/div/div/div/div/div[2]/div[1]/h2/a/span
If I execute these XPaths with $x, expand the result, right click on the result here (this is a screenshot from the console in Firefox):
and select Copy message (in Firefox) I get something that looks like this:
Array(39) [ #text, #text, #text, #text, #text, #text, #text, #text, #text, #text, … ]
0: #text "2"
1: #text "7"
2: #text "8"
when pasted into a text editor.
I would like the output in a format that required less postprocessing before I can insert it into a spreadsheet. Furthermore, if I get many (100+? 1000+?) results Firefox doesn't list them all.
As a first step I would like
Array(39) [ #text, #text, #text, #text, #text, #text, #text, #text, #text, #text, … ]
0: #text "2"
1: #text "7"
2: #text "8"
to look like this:
2
7
8
That is, just the numbers/result, one on each row (which I then can paste into a spreadsheet, one column for each XPath, and the name will match with the price and the RRP).
The next step would be to make a csv/tsv with everything included, something like:
prod1,2,7
prod2,7,11
prod3,8,14
(I prefer tab to comma, but I used comma here because it made it clearer)
Which in a spreadsheet would come out like this:
| name | price | RRP |
|---|---|---|
| prod1 | 2 | 7 |
| prod2 | 7 | 11 |
| prod3 | 8 | 14 |
How can I do this? I imagine a Javascript solution that I execute in the browser console where the input is one¹ or more XPaths and that the output is a string that I can either copy from the console and paste into a spreadsheet (or maybe that string is inserted directly in the pasteboard?)
Since this is a hack that I will only use once at a specific site, I am prepared to do some manual cleanup or manipulation afterwards.
I suck at using the browser console as well as Javascript...That is why I ask this question.
(My final goal is to calculate the price change in percent)
¹ In the case of one XPath as input, I will execute this several times, and copy-paste into a spreadsheet after each execution. In the case of two or more XPaths as input, I would just execute it once.
