
I have a list of links here: https://docs.oracle.com/javase/tutorial/reallybigindex.html

I would like to download all of them. Does anyone know how to go about this?

  • What defines "nothing more"? – random May 28 '16 at 18:10
  • @random To me it seems strange to have this marked as a duplicate when the website in question **offers a zip file containing the needed pages** (see my answer). Why go for a general solution when there is a **specific solution** (which is **not** covered in the dupe) to the OP's question? – DavidPostill May 28 '16 at 22:58
  • Either it's a duplicate of how to download a site and all the links but not all the links (because that's still not clarified) or it's out of scope for wanting to download a specific resource @dav – random May 29 '16 at 00:35

3 Answers


You can download Wget for Windows and use that from cmd.exe:

wget -r -l 2 https://docs.oracle.com/javase/tutorial/reallybigindex.html

If you also want the images and CSS files for those pages, then add -p and also -k to change the links in the HTML so you can browse these pages offline.

This tutorial has some screenshots which may help.

The value of -l 2 will get that first page, and all the pages that it links to. You can increase the number to get deeper pages, but I fear it will follow some links away from the tutorials and around the Oracle website.
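Putting those flags together, a command along these lines should fetch the tutorial pages with their images and CSS while staying inside the tutorial directory. The `--no-parent` option tells wget not to ascend above the starting directory, which addresses the worry about wandering off around the Oracle website; treat this as a sketch of one reasonable invocation, not the only correct one:

```shell
# Recurse 2 levels deep (-r -l 2), grab page requisites such as images
# and CSS (-p), rewrite links for offline browsing (-k), and refuse to
# climb above the /javase/tutorial/ directory (--no-parent).
wget -r -l 2 -p -k --no-parent https://docs.oracle.com/javase/tutorial/reallybigindex.html
```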

Alternatively you could try VisualWget which has a UI!

Alternatively you might like to download the tutorials in ebook form.

joeytwiddle

> How can I download a website and the links it references
>
> I have a list of links here: https://docs.oracle.com/javase/tutorial/reallybigindex.html

Instead of downloading all the links in "The Really Big Index", it is easier to just download the latest Java Tutorials bundle.

It is available in a variety of formats: zip, ePub and mobi.

tutorial.zip includes reallybigindex.html and all of the referenced files.

Here are the top-level contents of the expanded zip file:

*(screenshot of the expanded tutorial.zip)*

DavidPostill

There are many ways to approach this. Not knowing your desired end product, I can't be very specific.

  • wget, as suggested by @joeytwiddle
  • curl (similar to wget)
  • Google Sheets
  • browser add-ons for Chrome or Firefox (search scraper)

I'll expand on Google Sheets (I use this for simple one-time projects):

  • create a new sheet
  • put this in cell A1: https://docs.oracle.com/javase/tutorial/reallybigindex.html
  • put this in cell B2: =IMPORTXML(A1, "//a[@href]/text()") (this retrieves the link text)
  • put this in cell E2: =IMPORTXML(A1, "//a[@href]/@href") (this retrieves the URL)

The second parameter of the function is an XPath expression. You'll need to adjust these expressions to get the results you want; there are many online XPath testers to help you do this.
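If you'd rather do the same extraction in a script, Python's standard-library `html.parser` can produce the same two columns (link text and href) that the IMPORTXML formulas return. This is a sketch run against a small inline HTML snippet standing in for the index page; to use it for real you would fetch reallybigindex.html first (e.g. with urllib) and feed that instead:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> element."""
    def __init__(self):
        super().__init__()
        self.links = []    # list of (link text, href) tuples
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

# Small inline sample standing in for reallybigindex.html
sample = ('<a href="java/index.html">Learning the Java Language</a> '
          '<a href="essential/io/index.html">Basic I/O</a>')
parser = LinkExtractor()
parser.feed(sample)
for text, href in parser.links:
    print(text, "->", href)
```

Running this prints each link's text next to its URL, mirroring the two IMPORTXML columns in the sheet.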

Paulb