
Possible Duplicate:
How can I download an entire website

I frequently encounter webpages that offer manual pages or other info accessible only via a table of contents consisting of links to individual chapters or paragraphs. Often the individual leaf pages then consist of a few lines only, so traversing the entire tree is extremely cumbersome.

What I am seeking is a tool that pulls and combines all pages referenced by the links of a starting page into a single concatenated HTML document, so that one could, e.g., save that page or scroll linearly through all child pages without having to click and go back a thousand times. This would also make it possible to print the entire collection as a manual, search through it in one go, etc.

Does anyone know a good tool to achieve that? Ideally such a tool would offer some exclusion criteria, such as ignoring all "back" links or the help or home link that appears on every page.

2 Answers


You could use wget in mirror mode:

C:\MySites\> wget -m http://mymanuals.com/manuals/foobar

This would mirror the whole http://mymanuals.com/manuals/foobar site.
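If you want the kind of exclusion criteria mentioned in the question, wget's recursive options can be narrowed instead of mirroring everything; the URL and the reject pattern below are just placeholders to adapt:

C:\MySites\> wget -r -l 2 -np -k -p --reject-regex "(help|home|back)" http://mymanuals.com/manuals/foobar

Here -r -l 2 recurses two levels down from the starting page, -np keeps wget from climbing above it, -k rewrites the links so the local copy is browsable, -p fetches the images and stylesheets each page needs, and --reject-regex skips URLs matching the pattern (it needs a reasonably recent wget).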

The other thing I have used with quite good success is HTTrack which again mirrors a website for you, but with a nice GUI front-end.

Majenko

Use wget to get all the pages, then xhtml2pdf and pdftk to turn them into a single document.
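A rough sketch of that pipeline on a Unix-like shell (the URL and directory names are placeholders, and it assumes wget, xhtml2pdf and pdftk are installed):

# mirror the manual into ./manual/
wget -r -np -k -P manual http://mymanuals.com/manuals/foobar

# convert each mirrored HTML page into its own PDF
find manual -name '*.html' -exec sh -c 'xhtml2pdf "$1" "${1%.html}.pdf"' _ {} \;

# merge the per-page PDFs into one document
pdftk $(find manual -name '*.pdf' | sort) cat output manual.pdf

The sort step only gives a crude ordering by path; if the chapters have to appear in table-of-contents order you would need to list the PDFs explicitly.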

l0b0
  • I don't think this is a duplicate! I am NOT trying to duplicate an entire website. What I would rather like to see is a tool that lists a website's structure and pages, e.g. as a tree, from which one can conveniently select (e.g. by checking or circling) the pages one wants copied (i.e. concatenated and "flattened") into a single document. IMHO that's a different job from just duplicating a website locally. – Michael Moser Mar 18 '11 at 16:58