The -k option (or --convert-links) will convert the links in the downloaded pages to relative links after the download finishes, as the man page says:

After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
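
Just to illustrate what I mean (this is a made-up example, not the exact command I ran), a recursive download that converts links at the end would look something like:

```
# Made-up example: download recursively (-r), grab page requisites
# such as images and stylesheets (-p), and convert the links for
# local viewing (-k) once everything has been fetched.
wget -r -p -k http://example.com/
```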

So, if I didn't specify -k, can I run wget again after the download to fix that, and if so, what would be the proper command? My guess is wget -c [previous options used] [url], run in the same working directory the files were downloaded to.

  • you could certainly post-process the files after download, but i don't know if `wget` does this. your idea of trying it with `-c` is a good one. time to experiment! – quack quixote Dec 07 '09 at 21:08
  • Have a utility handy to convert the links, by any chance? Running on Windows, by the way... – Nathaniel Dec 07 '09 at 21:14
  • `perl` ... no prewritten script, but if i wanted a DIY solution that's what i'd use – quack quixote Dec 07 '09 at 21:48
  • Okay, thanks. Don't have Perl installed and it would take too long to grab it. Fortunately, I found how to make wget do the job. I posted an answer. – Nathaniel Dec 07 '09 at 21:52
  • btw, ActivePerl is around as a windows perl port; it's a fairly small installer, and i'm pretty sure most CPAN modules work with it. http://www.activestate.com/activeperl/ – quack quixote Dec 08 '09 at 15:47

1 Answer

Yes, you can make wget do it. I'd say use wget -nc -k [previous options] [previous url]. -nc is no-clobber. From the man page:

When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file.

And the -k option does the link conversion. So wget walks the remote server again, sees all the files you already have, refuses to re-download them, and then converts the HTML links to relative ones when it's done. Nice.
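
To make that concrete, here is roughly what the re-run would look like (the -r -p and the URL are placeholders for whatever options and address were used the first time around):

```
# Sketch of the re-run: -nc (no-clobber) refuses to re-download files
# that already exist locally, and -k converts the links once wget is
# done. Run it from the same working directory as the original download.
wget -nc -k -r -p http://example.com/
```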

  • No, this doesn't work for me. It downloads the first file (e.g. index.html), sees that it is already downloaded, and stops. If you want wget to work recursively you have to use the timestamping (-N) option, so wget requests the headers for each file to check whether it is newer or not. –  Jul 10 '11 at 21:51
  • GNU Wget 1.13.3 built on darwin11.1.0. Trying to use both options at the same time gives `Both --no-clobber and --convert-links were specified, only --convert-links will be used.` – Ludovic Kuty Dec 29 '11 at 04:05
  • didn't your question ask for without -k? – barlop Jan 21 '12 at 01:34
  • Cf. @LudovicKuty's comment -- as of wget 1.13 `--no-clobber` doesn't work with `--convert-links`. See [http://savannah.gnu.org/bugs/?31781](http://savannah.gnu.org/bugs/?31781) for details. – David Moles Feb 26 '13 at 20:37
  • In case anyone cares, I built a docker image for wget 1.12: https://hub.docker.com/r/berezovskyi/wget1.12/ – berezovskyi Dec 03 '17 at 11:54
  • @berezovskyi I appreciate the effort you put into that, but it "didn't work." I wish I could say more, but I'm not sure how to troubleshoot that docker container; it doesn't output anything to stderr – Daniel Kaplan Mar 24 '22 at 22:42
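
Given the comments above, on wget 1.13 and later --no-clobber and --convert-links refuse to work together, so a sketch of the workaround the comments point toward is to lean on timestamping instead (again, the recursive options and the URL are placeholders for whatever was used originally):

```
# Hypothetical workaround for wget >= 1.13, where -nc and -k conflict:
# -N (timestamping) only re-fetches files whose copy on the server is
# newer than the local one, and -k still converts the links at the end.
wget -N -k -r -p http://example.com/
```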