
I tried to make a copy of the site wiredhealthresources.net using the command:

wget -rpkl inf wiredhealthresources.net

But the command only downloaded 54 files! Most of the pages are missing, e.g. /topics-cardiology.html, despite being linked from /index.html.

What did I do wrong? Why is wget not downloading the whole site?

  • While I can't answer the question itself, I would suggest giving [HTTrack](http://www.httrack.com/page/1/en/index.html) a try, as I have had more success with that. – Unencoded Oct 27 '16 at 14:42

2 Answers


If you look at the page source, you won't see any link to topics-cardiology.html, because the sidebar is generated by JavaScript. wget only parses the static HTML it downloads; it never executes scripts, so it never discovers those links. To make a complete mirror you will need a headless browser that runs JavaScript, such as CasperJS, to crawl the rendered pages.
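For illustration, a minimal CasperJS sketch of that approach might look like the following. The URL comes from the question; everything else (the script name, the bare `a` selector, the wait) is an assumption for the example:

```javascript
// Sketch only: assumes CasperJS 1.1 on PhantomJS.
// Load the page, let the sidebar script run, then print every href
// so the list can be fed to wget.
var casper = require('casper').create();

casper.start('http://wiredhealthresources.net/index.html');

// Waiting on a bare "a" is a guess; on the real page you would wait
// for the sidebar's own selector so the script-built links exist.
casper.waitForSelector('a', function () {
    this.getElementsAttribute('a', 'href').forEach(function (href) {
        // Hrefs may be relative; resolve them against the base URL
        // before handing them to wget.
        casper.echo(href);
    });
});

casper.run();
```

Saved as, say, `list-links.js`, its output could then be piped back into wget with something like `casperjs list-links.js | wget -pk -i -` (`-i -` makes wget read the URL list from stdin).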


I'm reasonably sure you can't use `inf` to modify the depth; it only works with options like `--tries` or `--quota`. Have you tried using `-m` instead of `-r` and `-l`? It sounds like you want to mirror the site, and that's what `-m` is for. A sketch of such an invocation follows below.
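For example, a mirroring invocation along the lines of the original command (with `-p` and `-k` carried over from the question) might be:

wget -mpk wiredhealthresources.net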

Both `-l 99` and `wget -pkm` yield the same result: only 54 files downloaded. The man page says `-m` is equivalent to `-r -N -l inf --no-remove-listing`, which is where I got the `-l inf` from. – Zaz Oct 27 '16 at 16:44