27

I want to download HTML pages (example: http://www.brpreiss.com/books/opus6/) and join them into one HTML file or some other format that I can use on an ebook reader. Sites with free books don't have standard paging, and they're not blogs or forums, so I don't know how to do automatic crawling and merging.

Hrvoje Hudo
  • 552
  • 2
  • 7
  • 14

5 Answers

15

You can use Calibre for your ebook conversion needs. You can get it to make a single ebook out of multiple HTML files by linking to them from a single HTML file that you set up as a table of contents.
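As a rough illustration of the table-of-contents idea, here is a minimal Python sketch that generates such an index page. The function name `build_toc` and the chapter file names are hypothetical, and it assumes the pages are already saved locally next to the index file.

```python
from html import escape
from pathlib import PurePosixPath


def build_toc(chapter_files, title="Table of Contents"):
    """Return an HTML index page that links to each chapter file in order."""
    items = "\n".join(
        '    <li><a href="{href}">{label}</a></li>'.format(
            href=escape(str(path), quote=True),
            # Use the file name without its extension as the link text.
            label=escape(PurePosixPath(path).stem),
        )
        for path in chapter_files
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        '<head><meta charset="utf-8"><title>' + escape(title) + "</title></head>\n"
        "<body>\n"
        "  <h1>" + escape(title) + "</h1>\n"
        "  <ol>\n" + items + "\n  </ol>\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical chapter files; replace with the pages you actually saved.
    print(build_toc(["page1.html", "page2.html", "page3.html"], title="My Book"))
```

Save the output as something like `toc.html` in the same folder as the chapters and add that file to Calibre; the chapters should then be pulled in by following the links.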

Adam
  • 135
  • 6
Mr Alpha
  • 6,668
  • 2
  • 24
  • 26
  • I'm using Sigil for conversion to EPUB, but didn't know that Calibre can make one ebook from a bunch of linked HTML files. I'll try, thanks! – Hrvoje Hudo Mar 02 '11 at 10:30
  • 1
    You can use http://www.httrack.com/ to download the webpage(s), then use Calibre to convert them all to an ePub format. – 에이바 Mar 21 '11 at 18:47
  • 3
    My process is (using Chrome) to use the Instapaper Text bookmarklet to clean things up a bit, then right click -> Save As, choose to save as a single web page, HTML Only, then open this in Calibre, convert to EPub, then use the Edit Book functionality to tidy up any additional messy bits of markup that get pulled in. – El Yobo Jan 30 '15 at 11:24
10

The way I used to do this was Calibre.

That became too much of a pain though so I built a Chrome Extension to make it easier.

It's called EpubPress (http://epub.press).

It allows you to build an ebook from your Chrome tabs.

Hope that helps!

HaroldT
  • 129
  • 1
  • 3
  • 4
    The website in your link suggests that the packaging occurs on a 3rd-party server, so privacy is NOT guaranteed with this method. – Burgi May 01 '16 at 01:57
  • Do you have suggestions for changes that would make you feel more secure? I have done my best to only require the bare minimum information for creating a book, but I'm open to further feedback. If you look at any comparable service, you will find that any content you want to save is sent to a server. The difference is that those services also require an account and have all content associated to your name. They also don't provide source code for their websites to allow you to see what they collect. The extension is open source and I'm happy to answer any questions about that code. – HaroldT May 06 '16 at 20:03
  • What a great tool! Thank you very much for providing it to the community for free! – vonjd Apr 14 '18 at 06:13
  • It doesn't include images :( – Hai Feng Kao Jan 04 '21 at 12:31
  • Putting aside that OP didn't seem to query for any particular privacy/licensing requirement, EpubPress got its backend open-sourced in 2017. And (barring bugs) it should also be able to include pictures. The only [caveat](https://github.com/haroldtreen/epub-press-clients/issues/29) if any is that they should be publicly accessible since the conversion process happens independently of your browser credentials. – mirh Feb 26 '23 at 22:59
8

Pandoc can take a link to a page (or an HTML file) and convert it to PDF, EPUB, and other formats.

I'm not sure if it would crawl. If it doesn't, you could crawl the pages first with wget or something similar (or just collect the links) and pass them to pandoc.
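The "collect links, then hand them to pandoc" idea could look roughly like the Python sketch below. It only uses the standard library to pull links out of an index page and then builds a pandoc command line; the helper names are hypothetical, and it assumes every chapter is linked directly from the index page and lives under the same base URL. Whether pandoc merges the fetched pages cleanly will depend on the site's markup.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def chapter_urls(index_html, base_url):
    """Return absolute, de-duplicated links that stay under base_url."""
    parser = LinkCollector()
    parser.feed(index_html)
    seen, urls = set(), []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment part.
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url.startswith(base_url) and url != base_url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def pandoc_command(urls, output="book.epub"):
    """Build (but do not run) a pandoc call that reads the URLs as HTML."""
    return ["pandoc", "-f", "html", "-o", output, *urls]


if __name__ == "__main__":
    # Tiny inline index page with hypothetical chapter names.
    sample = '<a href="page1.html">One</a> <a href="page2.html#top">Two</a>'
    urls = chapter_urls(sample, "http://www.brpreiss.com/books/opus6/")
    print(" ".join(pandoc_command(urls)))
```

To use it on a real site, fetch the index page first (for example with `urllib.request.urlopen`) and pass the resulting command to `subprocess.run`.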

  • according to the man page it will: "Instead of a file, an absolute URI may be given. In this case pandoc will fetch the content using HTTP" – jopasserat May 18 '17 at 12:04
2

HTTrack is a good option for the download step: it mirrors a website to a local directory rather than building an ebook itself. It is available from https://www.httrack.com/. HTTrack "allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure."

You can then convert the HTML into EPUB, AZW3, or PDF using Calibre, or any other HTML-to-EPUB conversion software.
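For anyone who prefers to script the two steps, the sketch below builds the command lines for HTTrack's `httrack` tool and Calibre's `ebook-convert` tool and runs them only if the tools are installed. The helper names and the output directory are hypothetical, and the location of the entry page inside the mirror is an assumption, so check what HTTrack actually wrote before converting.

```python
import shutil
import subprocess


def mirror_command(url, dest_dir):
    """Command that mirrors url into dest_dir with HTTrack."""
    return ["httrack", url, "-O", dest_dir]


def convert_command(entry_html, output):
    """Command that converts an HTML entry page with Calibre's ebook-convert."""
    return ["ebook-convert", entry_html, output]


def run_if_available(cmd):
    """Run cmd only when its executable is on PATH; return whether it ran."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True
```

A typical sequence would be `run_if_available(mirror_command("http://www.brpreiss.com/books/opus6/", "mirror"))` followed by `run_if_available(convert_command("mirror/index.html", "book.epub"))`, where `mirror/index.html` is only a guess at the entry page path.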

A second option, which converts directly to EPUB, is EpubPress. It has extensions for Firefox (v44.0+ only) and Chrome. To use it, open a browser window in which each tab is essentially a 'chapter' of your ebook. Arrange the tabs in the desired order, then activate EpubPress: it will download the tabs and assemble them, in that order, into an .epub file. Hope this helps!

*However, note that EpubPress downloads discrete webpages, not a whole 'website' as HTTrack does. To download a website with EpubPress you must open each link on the website as a separate tab, then use EpubPress to collect these pages into .epub format.

str8arrow
  • 31
  • 2
2

You can use https://getpocket.com together with the Pocket recipe in Calibre, which is accessible via the "Fetch news" menu.

gagarine
  • 1,083
  • 1
  • 11
  • 20