
I have this setup:

  • The remote server is very unstable
  • The content it serves is pretty much static: as long as I know the URL, its content doesn't change much.
  • I'm using Firefox, Windows platform

"Unstable" means the server may return 5xx errors, take too long to reply, etc. So if I'm lucky enough to land on an actual page, I'd like to save it locally to read later when I have time. As for "how lucky": if I loaded the page once, I might be able to load it again in 1-2 minutes, or 1-2 hours, or days. I really don't know; that's outside my control, and I'd rather not focus on "trying to fix the server", since I don't own it.

The problem is that if I try to "Save Page", Firefox requests the content from the server again instead of saving the content I already have loaded in the tab. Since the server is unstable, more often than not it saves the 5xx error page rather than the actual content.

Since the content is mostly text, I'm fine even if I save just the page without styles, header/footer images, etc. But "View Page Source" (Ctrl+U) seems to do the same thing, i.e. it requests the content from the server again. How can I avoid that? How can I save just the content I already have loaded in memory, without asking the server to return it once again?

This answer implies that Firefox doesn't actually request the page a second time, but that contradicts what I see in my case. I clearly see the request in the network monitor, and the saved page shows up in Downloads as well. And of course, the fact that it "saves" the server's error pages confirms it.

EDIT: The bare minimum I'd like to have is being able to follow the links (articles can have references) and copy / paste the text. So screen capture is hardly an option. Besides, is it too much to expect to be able to save the content that's already loaded?

Alma Do
  • Perhaps taking a screenshot is a solution. See for example [Nimbus Screen Capture](https://addons.mozilla.org/en-US/firefox/addon/nimbus-screenshot/). – harrymc Feb 25 '21 at 17:55
  • It would work if it weren't for the references and other hyperlinks that articles often have. Those don't lead to the same server, and I won't be able to do anything with them on a screenshot – Alma Do Feb 25 '21 at 18:01
  • So you wish the saved page to still be functional (HTML)? – harrymc Feb 25 '21 at 18:08
  • The bare minimum I'd like to have is being able to follow the links and copy / paste the text. So it doesn't have to be fully functional html, even a trimmed source will work (i.e. without styles or pictures - I can live with copying reference links manually) – Alma Do Feb 25 '21 at 18:17

3 Answers


I guess Firefox reloads the page because the page as displayed may not be the same as the original. For example, it might have been modified by JavaScript code or by an installed extension such as Greasemonkey.

There is a way to get the raw HTML. It preserves the text and links, but not always the exact look, because it doesn't save the external CSS and JavaScript files along with the HTML.

Here it is:

  • With the page displayed, press Ctrl+U to display the page source
  • Use the menu File > Save Page As..., or the right-click context menu
  • Save the contents as an .html file.

This method should not cause Firefox (or any other browser) to access the website again.

harrymc
  • `With the page displayed, type Ctrl+U to display the page source` I do that and get the "source" of the HTTP 520 error page instead of the HTML for the page. In the question I also stated that Ctrl+U retries the request, which I want to avoid (to be more precise: even Ctrl+U itself causes the network request; I see the error page inside Firefox's source code viewer) – Alma Do Feb 25 '21 at 19:59
  • That's astounding - there's no reason for Firefox to re-access the page. Very bad programming. – harrymc Feb 25 '21 at 21:15

You can use the Web Developer Tools built into Firefox to do this.

  • Open the tools: Ctrl+Shift+I, or the menu Tools / Web Developer / Toggle Tools
  • In the tool area, click the "Inspector" tab. This shows the page source as Firefox is currently displaying it, including any changes made by JavaScript.
  • At the top, there should be a line starting with <html ... Right-click this line and select Copy / Outer HTML.
  • Paste the clipboard into an editor and save.

This will give you the complete HTML source of the page, as displayed. This also works for complex web applications, because the HTML will reflect any changes that scripts made, such as loading additional content via AJAX.

What will be missing are external files, such as images, CSS and scripts. If you want to include those, it's best to use a Firefox add-on, for example Save Page WE.
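One caveat for following links: when the copied outer HTML is saved to a local file and opened from disk, relative links such as `/wiki/Foo` resolve against the file's location rather than the original site, so references stop working. The Python script below is a minimal sketch of one way to post-process the saved file, assuming you still know the page's original URL. It rewrites quoted relative `href` values to absolute ones (leaving in-page `#...` anchors alone) and drops `<script>` elements so that opening the copy doesn't re-run page code that might contact the server. The names `offline_copy.py`, `make_offline_copy` and the other helpers are hypothetical. Because it uses regular expressions rather than a real HTML parser, it skips unquoted `href` values and would also rewrite `href="..."` text that happens to appear in the page content.

```python
"""Hypothetical helper: make a saved outer-HTML snapshot usable offline."""
import re
import sys
from urllib.parse import urljoin

# Quoted href="..." or href='...' attributes; the lookbehind skips names like data-href.
_HREF_RE = re.compile(r"""(?<![\w-])(href\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE | re.DOTALL)
# Whole <script>...</script> elements, matched non-greedily across line breaks.
_SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def absolutize_links(html: str, page_url: str) -> str:
    """Resolve relative href values against page_url; keep '#...' anchors and empty hrefs."""
    def fix(match: re.Match) -> str:
        prefix, quote, value = match.group(1), match.group(2), match.group(3).strip()
        if not value or value.startswith("#"):
            return match.group(0)  # in-page anchors must stay relative to keep working
        # urljoin returns absolute URLs (https:, mailto:, ...) unchanged.
        return f"{prefix}{quote}{urljoin(page_url, value)}{quote}"

    return _HREF_RE.sub(fix, html)


def strip_scripts(html: str) -> str:
    """Remove <script> elements so the offline copy does not run page code."""
    return _SCRIPT_RE.sub("", html)


def make_offline_copy(html: str, page_url: str) -> str:
    """Strip scripts first, then fix links in what remains."""
    return absolutize_links(strip_scripts(html), page_url)


def main(argv: list) -> int:
    if len(argv) != 4:
        print("usage: python offline_copy.py SAVED.html ORIGINAL_URL OUTPUT.html", file=sys.stderr)
        return 2
    src_path, page_url, out_path = argv[1], argv[2], argv[3]
    with open(src_path, encoding="utf-8") as src:
        html = src.read()
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(make_offline_copy(html, page_url))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python offline_copy.py saved.html https://example.com/article offline.html` (the URL is a placeholder for the page's real address) writes the cleaned copy to `offline.html`; nothing in the script contacts the server.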

sleske

You could:

  • Select (drag-select) what you want, or select everything on the page with Ctrl+A
  • Copy with Ctrl+C
  • Paste to Word, LibreOffice Writer, or some other application that shows html as a page.

This is a little cumbersome, but would accomplish your goal as stated. Links are still accessible, and text still savable.

Rick
  • Unfortunately Word will usually try to contact the remote server when you paste html into it – phuclv Feb 26 '21 at 00:55
  • @phuclv I did not see Word do this. Does it only happen with the page's source HTML, and not when copying/pasting the page as displayed? – Rick Feb 26 '21 at 16:42
  • It almost always happens when copying/pasting the page into Word. If you cancel the operation during that time, you'll get a page without images or some other scripts. Obviously, if you copy the source code you'll only get the source code instead of the page, so no connection is made – phuclv Feb 27 '21 at 13:30