I am trying to mirror a blog, e.g. www.example.com, with wget.

I use wget with the following options (shell variables are substituted correctly):

wget -m -p -H -k -E -np \
    -w 1 \
    --random-wait \
    --restrict-file-names=windows \
    -P "$folder" \
    -Q"${quota}m" \
    -t 3 \
    --referer="$url" \
    -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4' \
    -e robots=off \
    -D "$domains" \
    -- "$url"

The blog contains images that reside on other domains.

Even though I have specified the -p option (download linked page assets), these images are not downloaded unless I list each of their domains explicitly in the -D option.

If I omit the -D option, wget follows every link outside www.example.com and downloads the whole internet.

Is it possible for wget to follow only the links under www.example.com and download each page's required assets, whether or not they reside on the same domain, without my having to specify each domain explicitly?

Giacomo1968
  • I'd love to find a good answer to this one also. I've run into the same situation and couldn't find a single wget invocation that did it. I ended up using `wget -N -E -H -k -K -p` first, and came up with a script to fetch missing linked images. – lemonsqueeze Oct 16 '14 at 16:52
  • According to [this one](http://superuser.com/questions/14403/how-can-i-download-an-entire-website/14436#14436), [httrack](http://www.httrack.com/) is a killer for this. I'll give it a shot next time instead of wget. – lemonsqueeze Oct 16 '14 at 16:58
  • Assuming your blog (minus the page assets) is not spanning multiple domains, try removing both the `-D $domains` as well as `-H`. Without `-H` it should stay within your domain but still retrieve the direct page assets, even when they are on a different domain. – blubberdiblub Dec 19 '15 at 16:06
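
The two-pass idea mentioned in the comments (mirror first, then deal with the missing images) could start from something like the sketch below. It is not lemonsqueeze's script, which was not posted; the function name `list_image_hosts` and the sample hosts are made up for illustration. It only handles double-quoted, absolute `src` URLs, and simply lists the distinct hosts that `<img>` tags point to, so they can be reviewed and passed to `-D`.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print the unique hostnames
# found in <img ... src="http(s)://host/..."> attributes, one per line.
list_image_hosts() {
    grep -oiE '<img[^>]+src="https?://[^/"]+' \
        | sed -E 's#.*://##' \
        | sort -u
}

# Demo on inline sample HTML (made-up hosts); relative src values are ignored.
printf '%s\n' \
    '<img src="http://img.example.net/a.png">' \
    '<img alt="x" src="https://cdn.example.org/b.jpg">' \
    '<img src="/local.png">' \
    | list_image_hosts
```

On a real mirror you would feed it the downloaded pages instead, for example with `cat` over the HTML files under the download folder.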

1 Answer

No. The only way is to specify the domains that you want wget to follow, using -D or --domains=[domain list], where the list is comma-separated.
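
As a minimal sketch of what that looks like in practice: the helper below joins host names with commas, which is the form `-D` expects. The function name `join_domains` and the hosts `img.example.net` and `cdn.example.org` are placeholders, not taken from the question. The script only prints the resulting command so it can be checked before running it.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas (the list form -D takes).
join_domains() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Placeholder hosts: the blog itself plus the hosts that serve its images.
domains=$(join_domains www.example.com img.example.net cdn.example.org)

# Print the command instead of running it, so the list can be reviewed first.
echo "wget -m -p -H -k -E -np -D $domains -- http://www.example.com/"
```

With `-H` and this `-D` list, wget spans hosts but stays within the listed domains.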

sparks