
I'd like to back up the content of my blog, which is powered by posterous.com. I'd like to save all text and images to the local disk. The ability to browse it offline would be a plus.

What I've already tried:

wget

wget -mk http://myblogurl

It downloads the first page with the list of posts, then stops with a "20 redirections exceeded" message.

WinHTTrack

It downloads the first page, which redirects to the www.posterous.com home page instead of the real page content.

Edit: The URL of the site I'm trying to back up is blog.safabyte.net

Martin Vobr
  • I tried on a random user on posterous, and it worked without any problems. How about giving us the actual site url? – gorilla Jan 23 '10 at 00:45
  • Link added. See bottom of the question. – Martin Vobr Jan 23 '10 at 01:21
  • Just tried it; wget picked up your full blog contents – Sathyajith Bhat Jan 23 '10 at 05:33
  • Could you post the command line? In my case 'wget -mk http://blog.safabyte.com' gets only index.html. No images are downloaded. No pages with posts are downloaded. I'm using wget 1.11.3 from cygwin running on WinXP. – Martin Vobr Jan 23 '10 at 10:37
  • @Martin Vobr : `wget -mk http://blog.safabyte.net` GNU Wget 1.11.1 on openSUSE 11.0 – Sathyajith Bhat Jan 23 '10 at 17:03
  • Added a 'windows' tag as it seems to be OS-specific. After trying a few things I've found a solution. It looks like `wget -mk http://blog.safabyte.net` does not work on Windows. However, `wget -mk http://blog.safabyte.net/*` DOES work. – Martin Vobr Jan 23 '10 at 18:50
  • Thanks @Sathya and @gorilla. Your proof that it works for others made me fiddle with the parameters again and work out how to get it working. – Martin Vobr Jan 23 '10 at 18:52
  • @Martin: Glad to hear it worked out. You might want to post your comment as an answer and mark it as accepted; it would help others in the future. – Sathyajith Bhat Jan 24 '10 at 06:03

3 Answers


Posterous.com maintains an API that might help you. In particular, their http://posterous.com/api/reading API might be of use. You can use it to obtain an XML file containing all of your posts and their content.

For example, http://posterous.com/api/readposts?hostname=jasonpearce retrieves all 12 posts that I've made to Posterous.
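If you go this route, a minimal sketch for saving that XML locally with wget could look like the following (untested; "yourblog" is a placeholder for your own Posterous subdomain, and the output filename is arbitrary):

wget -O posterous-posts.xml "http://posterous.com/api/readposts?hostname=yourblog"

That gives you a single XML file with the post text, which you can keep alongside any media you download separately.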

Jason Pearce

This worked for me:

wget -r -l inf -k -E -p -nc http://blog.safabyte.net/

It seems that -m turns on -N (timestamping), and Posterous doesn't send Last-Modified headers, which upsets wget, so I used -r -l inf directly instead.

The options used are:

-r recursive download
-l inf infinite recursion depth
-k convert links in the downloaded files to point to the local copies
-E add an .html suffix to saved HTML files
-p download page requisites (images, stylesheets, etc.)
-nc don't re-download URLs that have already been saved

This command still doesn't download resources from other domains, which means it doesn't fetch the images, since they're hosted on a different CDN.
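If the images matter, one untested variant of the same command is to let wget span hosts while whitelisting only the domains mentioned in this thread (-H enables spanning hosts, -D restricts the crawl to the listed domains):

wget -r -l inf -k -E -p -nc -H -D blog.safabyte.net,posterous.com,files.posterous.com http://blog.safabyte.net/

Treat the domain list as an assumption and adjust it to wherever your blog's media is actually served from.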


I managed to download at least all of the HTML content. The following command seems to download all pages from the blog (using Wget 1.11.3 on Windows XP):

wget -mk http://blog.safabyte.net/*

Post images are still not downloaded, probably because they are stored on different domains.

The HTML content is on blog.safabyte.com/*, while the images are at http://posterous.com/getfile/files.posterous.com/cheated-by-safabyte/* and files.posterous.com.
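As a rough, untested follow-up, the image URLs could be pulled out of the already-mirrored HTML and fed back to wget. This sketch assumes the mirror sits in a blog.safabyte.net directory and that the image links look like the getfile/files.posterous.com URLs above:

grep -ohrE 'http://[^"]*files\.posterous\.com/[^"]+' blog.safabyte.net | sort -u > image-urls.txt
wget -x -nc -i image-urls.txt

Here -i reads the URL list from the file, -x recreates the host/directory structure on disk, and -nc skips files that were already downloaded.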

Martin Vobr