
I tried to make a copy of the site wiredhealthresources.net using the command:

wget -rpkl inf wiredhealthresources.net

But the command only downloaded 54 files! Most of the pages are missing, e.g. /topics-cardiology.html, despite being linked from /index.html.

What did I do wrong? Why is wget not downloading the whole site?

  • While I can't answer the question itself, I would suggest giving [HTTrack](http://www.httrack.com/page/1/en/index.html) a try, as I have had more success with that. – Unencoded Oct 27 '16 at 14:42

2 Answers


If you look at the page source, you won't see any link to topics-cardiology.html, because the sidebar is generated by JavaScript. wget only parses the static HTML it downloads; it never executes scripts, so it never discovers those links. To make a complete mirror you will need a headless browser that runs JavaScript, such as CasperJS, to crawl the rendered pages.
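For illustration, a minimal CasperJS sketch of that approach might look like the following. The URL comes from the question; everything else (the script name, the bare `a` selector, the wait) is an assumption for the example:

```javascript
// Sketch only: assumes CasperJS 1.1 on PhantomJS.
// Load the page, let the sidebar script run, then print every href
// so the list can be fed to wget.
var casper = require('casper').create();

casper.start('http://wiredhealthresources.net/index.html');

// Waiting on a bare "a" is a guess; on the real page you would wait
// for the sidebar's own selector so the script-built links exist.
casper.waitForSelector('a', function () {
    this.getElementsAttribute('a', 'href').forEach(function (href) {
        // Hrefs may be relative; resolve them against the base URL
        // before handing them to wget.
        casper.echo(href);
    });
});

casper.run();
```

Saved as, say, `list-links.js`, its output could then be piped back into wget with something like `casperjs list-links.js | wget -pk -i -` (`-i -` makes wget read the URL list from stdin).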


I'm reasonably sure you can't use `inf` to modify the depth; it only works with options like `--tries` or `--quota`. Have you tried using `-m` instead of `-r` and `-l`? It sounds like you want to mirror the site, and that's what `-m` is for. A sketch of such an invocation follows below.
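For example, a mirroring invocation along the lines of the original command (with `-p` and `-k` carried over from the question) might be:

wget -mpk wiredhealthresources.net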

Both `-l 99` and `wget -pkm` yield the same result: only 54 files downloaded. The man page says `-m` is equivalent to `-r -N -l inf --no-remove-listing`, which is where I got the `-l inf` from. – Zaz Oct 27 '16 at 16:44