SEO and the Internet Archive

September 9th, 2007

When attempting to diagnose SEO problems on a website, one potentially useful tool that is often overlooked is the Internet Archive, whose web collection is browsed through the Wayback Machine. The Internet Archive stores periodic snapshots of websites, including their source code and many of their images, allowing you to see what a site looked like on the dates it was captured. After reading an interesting article written by an employee at a reputable SEO firm, I realized that this archive has uses that go beyond simple curiosity about what websites looked like in the past.

The article described a client of the SEO firm who ran a business site in a fairly competitive industry and was suddenly losing search rankings. Before the drop, the site had ranked well for most relevant keywords and had been steadily building traffic over the previous three months. The firm, which had been hired to find the problem, could not determine the source of the decline through conventional methods. The site had clean code and appeared to be well optimized, and its content had not changed substantially during the period when traffic fell.

Just as they were about to give up, someone at the company thought to check the Internet Archive and discovered a small but important difference in the HTML of the site’s home page before and after the major traffic decline. Before the fall, the navigation links had title attributes that were well optimized for the main keywords. The archived pages from after the drop (including the then-current version of the page) showed that the links still existed and worked, but their title attributes had been removed, which effectively reduced the keyword density of the home page. This explained the subsequent decline in the rankings, so the link titles were quickly restored and the site gradually moved back up to where it had been before. It was discovered later that the site’s web designer had accidentally removed the link title code during a recent update and never realized what had happened.
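The kind of check described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method the SEO firm used: the function names and the two HTML fragments are hypothetical, and the fragments stand in for two archived copies of a home page that you would normally save from the archive yourself.

```python
from html.parser import HTMLParser


class LinkTitleCollector(HTMLParser):
    """Collect the title attribute (or None) of every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.titles = {}  # href -> title text, or None when the attribute is absent

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is not None:
            self.titles[href] = attr_map.get("title")


def link_titles(html):
    """Map each link target in the HTML to its title attribute."""
    collector = LinkTitleCollector()
    collector.feed(html)
    return collector.titles


def removed_titles(old_html, new_html):
    """Return {href: old_title} for links that kept their href but lost their title."""
    old, new = link_titles(old_html), link_titles(new_html)
    return {
        href: title
        for href, title in old.items()
        if title and href in new and not new[href]
    }


# Hypothetical snapshots standing in for two archived copies of a home page.
before = '<a href="/widgets" title="Blue widgets">Widgets</a> <a href="/about">About</a>'
after = '<a href="/widgets">Widgets</a> <a href="/about">About</a>'

print(removed_titles(before, after))  # {'/widgets': 'Blue widgets'}
```

Matching links by their href is a simplifying assumption; if a redesign also changed the link targets, comparing by link text or position would be needed instead.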

The incident inspired some thought about the possible uses of the Internet Archive for SEO and general website management. Here are ten ways in which the Wayback Machine can be used:

1. Detecting significant HTML code changes that may have affected a site’s rankings (as in the case above),

2. Analyzing the design and source code history of your sites or those of your competition,

3. Checking whether a particular website has ever been optimized in the past,

4. Studying the design and source code for other sites that are doing well at SEO, in the hopes that you may be able to replicate their success,

5. Noticing sites that may have used “black hat” optimization tactics in the past (such as keyword stuffing, hidden text, etc.),

6. Looking at what kinds of keywords your competitors targeted in the past and comparing these with the currently favored keywords,

7. Comparing the changes in design, interface, and the like at older, well-established sites to get some ideas of what might work for your site,

8. Resurrecting images and HTML for sites that have been hacked or wiped out when no recent backups are available,

9. Detecting duplicate content or violations of copyrighted material in cases where the site owner has already removed the questionable material, and

10. Determining the true age of a site (or at least the date of its earliest capture), including the various purposes for which it may have been used in previous years.
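Most of the uses above start with finding the archived copy closest to a given date. As a rough sketch, the Wayback Machine offers an availability lookup at archive.org/wayback/available that takes `url` and `timestamp` parameters and returns JSON. The helper names below are hypothetical, and the sample payload is an invented illustration of the response shape as I understand it, so check the service's current documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(site, timestamp):
    """Build a lookup URL for the capture closest to timestamp (YYYYMMDD)."""
    query = urlencode({"url": site, "timestamp": timestamp})
    return AVAILABILITY_ENDPOINT + "?" + query


def closest_snapshot(payload):
    """Return (timestamp, snapshot_url) from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def lookup(site, timestamp):
    """Query the service over the network (defined here but not called)."""
    with urlopen(availability_url(site, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))


# Invented payload illustrating the assumed response shape; not real data.
sample = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "timestamp": "20070601000000",
            "url": "http://web.archive.org/web/20070601000000/http://example.com/",
        }
    }
}

print(availability_url("example.com", "20070601"))
print(closest_snapshot(sample))
```

Running `lookup("example.com", "20070601")` would perform the real request. Fetching the returned snapshot URL for two different dates gives you the "before" and "after" pages to compare.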

Of course, there are other uses that are not SEO-specific, such as making fun of sites whose early designs and graphics look crude by today’s standards. It is also interesting to look at what some of the popular search engines looked like ten years ago and see how far they have come since then. For the SEO professional, however, the Internet Archive is an important resource that should not be overlooked, especially when other efforts to pinpoint a problem seem to have been exhausted.

