How to archive a blog


I care deeply about the preservation of self-expression. Archiving people’s labors of love and preserving them for posterity is good for society.

With commercial art, music and writing there are more avenues for preservation, but nobody cares about archiving blogs. In general, nobody cares about works that are available free of charge on the web. I am a strong believer in disseminating ideas free of charge. There are treasures out there. We have to find ways to preserve them.

Nothing is forever! No web platform is forever. I hope my current platform, WordPress, exists for a long time, but I know that everything ends or transforms eventually. I started writing on the Google-Knol platform in 2008. Google discontinued the Knol platform in 2012. To address a different audience, I started writing at the OpenSalon in 2010. When I stopped being active at the OpenSalon because of their never-ending technical problems, they deleted all my writings (OpenSalon was shut down in 2015). Starting in 2011, I slowly moved all my writing to the WordPress platform. I have been happy with the service here.

Nothing is forever! Books are not forever either. Many people have suggested that physical paper is the best method of preservation for our writing. I am not sure! Books go out of print. But I see the value in replication. If there are enough printed copies, someone, somewhere will have a copy even though most of them will be destroyed.

Can you imagine what a loss it would be not to have the Nag Hammadi texts today? Somebody bothered to preserve those writings more than 1,500 years ago.

Sergey Brin wrote a NYT opinion piece about the preservation of libraries and books. I respect and highly value the Google-Books platform but I no longer trust Google because of the bad taste in my mouth from the Knol experience.

After this long introduction, I will now show you a way to archive your blog. This method is based on internet archiving services. You might ask, “what if a particular internet archiving service goes out of service?” Well, the method I am describing below is independent of which internet archiving service is used. My hope is that at least one of them will survive long enough.

All you have to do is maintain a list of the URLs (internet addresses) of your writings in a blog post or page. For example, I keep my list in a blog page titled “index”.

I am assuming that you allow web crawlers to index your blog site. In other words, you don’t have a robots.txt file at the root of your blog that blocks them.

“The Wayback Machine collects every Web page it can find, unless that page is blocked; blocking a Web crawler requires adding only a simple text file, “robots.txt,” to the root of a Web site. The Wayback Machine will honor that file and not crawl that site, and it will also, when it comes across a robots.txt, remove all past versions of that site.” [1]
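If you are not sure whether a robots.txt file would turn crawlers away, the rules can be checked with Python's standard urllib.robotparser module. This is only a minimal sketch: the blog address and the sample rules are made up for illustration, and a real archiving crawler may interpret robots.txt differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, page_url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch page_url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


# Hypothetical blog page, used only as an example.
PAGE = "https://example.wordpress.com/index/"

# A robots.txt that blocks every crawler from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# A robots.txt that only hides one folder and leaves the rest open.
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"

if __name__ == "__main__":
    print(crawler_allowed(BLOCK_ALL, PAGE))
    print(crawler_allowed(BLOCK_PRIVATE, PAGE))
    print(crawler_allowed("", PAGE))
```

An empty robots.txt is treated here the same as having no restrictions at all.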

This means that most blog posts and pages have been archived at the Wayback Machine. If you know the URL (internet address) of a specific post of yours, then you can possibly see the archived versions of that post at the Wayback Machine. The problem is that you may not remember the URL. It is not practical to write down all the URLs in a notebook. That’s why I keep a list such as “index” that contains the URLs in the form of links, and I link to this “index” from the root of my site. For a blog site the root of the site is typically the “About” page.
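If you keep your titles and URLs in a plain list, the links for such an “index” page can be generated instead of typed by hand. Here is a small sketch in Python, assuming you paste the resulting HTML into the page yourself; the function name, the post titles and the example.wordpress.com addresses are all hypothetical.

```python
from html import escape


def build_index_html(posts):
    """Turn (title, url) pairs into an HTML list of links for an index page."""
    items = []
    for title, url in posts:
        # escape() protects characters such as & and " inside titles and URLs.
        items.append(f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Made-up posts, only to show the shape of the input.
SAMPLE_POSTS = [
    ("How to archive a blog", "https://example.wordpress.com/how-to-archive-a-blog/"),
    ("Books & blogs", "https://example.wordpress.com/books-and-blogs/"),
]

if __name__ == "__main__":
    print(build_index_html(SAMPLE_POSTS))
```

The point of producing real links rather than plain text is that crawlers follow links, so every post listed this way becomes reachable from the index page.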

In order to see the archived versions of my blog I go to the Wayback Machine

http://archive.org/web/

and type the following in their retrieval box and click the “BROWSE HISTORY” button

 sureshemre.wordpress.com

I then open the latest archived version of my blog site. This brings up my “About” page, which contains the link to the archived version of my “index”. Then I click on the “index” to see the list of my writings, and I click on the links to read the archived versions of my posts. There are no broken links so far, but some pictures are not visible. Also, the Wayback Machine is a few months behind. That’s expected, though.
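The same lookup can also be done by typing an address directly. As far as I understand, the Wayback Machine uses addresses of the form https://web.archive.org/web/ followed by a timestamp (or * for the list of all captures) and then the original URL, and it serves the capture closest to the timestamp you give. The sketch below only builds such addresses; the function names are my own, and the address pattern is an assumption that may change over time.

```python
WAYBACK_BASE = "https://web.archive.org/web"


def wayback_calendar_url(page_url: str) -> str:
    """Address of the page that lists all captures of page_url."""
    return f"{WAYBACK_BASE}/*/{page_url}"


def wayback_snapshot_url(page_url: str, timestamp: str) -> str:
    """Address of the capture closest to timestamp (digits in YYYYMMDDhhmmss order).

    A shortened timestamp such as "2015" or "20110221" is accepted here;
    the assumption is that the archive picks the nearest capture.
    """
    if not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 4 to 14 digits, e.g. 2015 or 20110221")
    return f"{WAYBACK_BASE}/{timestamp}/{page_url}"


if __name__ == "__main__":
    print(wayback_calendar_url("sureshemre.wordpress.com"))
    print(wayback_snapshot_url("sureshemre.wordpress.com", "2015"))
```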

Important point: If there are no links to your pages or posts, crawlers won’t find them (the crawler robots don’t enter queries in search boxes). This is why you have to have a link to your “index” from your “About” page which is the root of your blog site.
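To check that the “index” really is linked from the “About” page, you can list the links a crawler would see in the page's HTML. This is a rough sketch using Python's standard html.parser; the sample HTML and the example.wordpress.com addresses are invented, and a real crawler does much more than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the targets of all <a href="..."> tags on one page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as /index/ against the page address.
                self.links.add(urljoin(self.base_url, value))


def links_on_page(html_text: str, base_url: str) -> set:
    """Return the set of absolute URLs linked from html_text."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


# Invented "About" page content, for illustration only.
ABOUT_URL = "https://example.wordpress.com/about/"
ABOUT_HTML = '<p>See my <a href="/index/">index</a> of writings.</p>'

if __name__ == "__main__":
    found = links_on_page(ABOUT_HTML, ABOUT_URL)
    print("https://example.wordpress.com/index/" in found)
```

If the index address is missing from the collected links, a crawler starting at the “About” page has no path to it.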

Out of curiosity, I wanted to see my Knol articles. Remember, Knol does not exist anymore. I typed the following in the retrieval box and clicked the “BROWSE HISTORY” button

 http://knol.google.com/k/suresh-emre/

Then, I obtained the latest archived version and followed the links to read my old Knol articles. The Wayback Machine is not perfect. I saw many broken links. Some pictures did not show up either. These problems may be related to Knol, however.

Similarly, I typed the URL

 http://open.salon.com/blog/suresh_emre

and obtained the Feb 21, 2011 version, then followed the links (especially the links on the left-hand side in the “SURESH EMRE’s LINKS” section) to read my old OpenSalon articles. Again, I saw many problems (broken links, etc.). These problems may be related to OpenSalon.

[1] New Yorker article titled “The Cobweb” by Jill Lepore


About Suresh Emre

I have worked as a physicist at the Fermi National Accelerator Laboratory and the Superconducting Super Collider Laboratory. I am a volunteer for the Renaissance Universal movement. My main goal is to inspire the reader to engage in Self-discovery and expansion of consciousness.