Kahle says that he and his cohorts have now unleashed on the Web-surfing public nothing less than the largest collection of human words ever, bigger even than the Library of Congress, which the Wayback Machine is affiliated with. Of course, those words are not limited to the great works of the literatures of the world, but include a mess of just about anything one could imagine, including millions of pieces of marketing literature from now-defunct companies and the early versions of uncounted confessional personal homepages.
So with a little digging, you can learn that on May 11, 2000, Pets.com offered this bit of wisdom: "Word to the wise: Beware -- pooper scooper laws vary." Or that May 5, 1999, was Jenni of JenniCam's first day at a new job: "Mostly I just watched videos," she mused in her journal. "On the Job Safety, Our Corporate Vision, How to Sell, stuff like that. I haven't worked retail since high school."
But as any cultural relativist or American studies major knows, it's just such banal ephemera that count, if you have enough of them. Beyond sheer novelty, there's a social value to preserving these cultural artifacts. In the future, perhaps they'll reveal more about us and the early Web than we could ever imagine. And while the bulk of the archive can be compared to a library of millions of digital brochures and scrapbooks, there are also featured "collections" of pages designed to show off the more serious artifacts: the Web news coverage as it happened on Sept. 11 and the way that some of the pioneering sites on the Web looked in 1996. There's certainly lots for researchers, scholars and pop culture fanatics to wade through. And there will surely be lots of writers and graphic designers searching for the remnants of their own hard work, now vanished because the Web sites that published it have gone missing.
Then there's the pure nostalgia factor. Old mouse-hands will grow misty as they contemplate the good old Ultimate Band List as it was in late 1996, or Amazon.com, that humble online bookstore, light-years before the smiley logo grinned its first evil grin on Jeff Bezos' quest for world domination of all "e-tail" everywhere.
At first, it's hard to see how anyone could object to such a historic preservation project. One of the maddening paradoxes of the Web has always been the odd reality that once something makes it online, it can't be taken back, and yet an individual page or site might disappear at any moment. But making the Wayback Machine freely available to all comers poses several legal problems.
The archivists are fond of using the metaphor of a library for the archive, and in its early years, it did function like a library, albeit one with closed stacks. It was a huge body of information that could only be accessed by sending a request to engineers who would pull out the relevant volumes. But now that it's online and free to search by anyone, the archive is more like a shadow Web, another version of the Web that's effectively republishing huge amounts of data from sites irrespective of copyright law.
It's a conflict that the founders of the archive are well aware of, one that led Lawrence Lessig, a law professor at Stanford, to rally the troops at the Internet Archive Wayback Machine's launch party with this amusing call to arms: "I join your fight against the students that I produce."
There are already some sites that refuse to be included in the archive. Sites can keep their pages off-limits through either password protection or robot exclusion, which simply means automatically rejecting the software robots that specialize in indexing the Web. For instance, although some of the New York Times home pages can be found and searched in the archive, the stories themselves, as they ran, are not available.
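For readers curious about what robot exclusion looks like in practice, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of how any particular site or the archive itself is configured: the robots.txt contents, the example.com URLs and the helper name may_fetch are all made up for the example, and "ia_archiver" is assumed to be the name the archive's crawler announces.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site might publish. The first rule turns away
# one named crawler entirely; the second keeps every other robot out of a
# single directory. The crawler name "ia_archiver" is an assumption.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/index.html"
    for agent in ("ia_archiver", "SomeOtherBot"):
        verdict = "allowed" if may_fetch(ROBOTS_TXT, agent, page) else "excluded"
        print(f"{agent}: {verdict}")
```

Note that robot exclusion is a request that well-behaved crawlers honor, not a lock: nothing in the file itself prevents a robot from ignoring it, which is why password protection is the stronger of the two options.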
Any Web page that is password protected is also inaccessible, which is likely to have an increasing impact on the future quality of the archive's collection as more and more commercial sites try to make money through subscriptions rather than advertising. But it's also likely to cause controversy with newspaper sites like the San Jose Mercury News, which offer free access to new stories, but make readers pay for archived material. Why pay up if you can already find the story for free at Archive.org?
"We're sure that there are going to be a lot of people who want to be excluded," says Kahle, although he notes that in the Internet Archive's five-year history, 90 percent of the complainers have become converts after hearing that the nonprofit's primary goal is simply to preserve history, not to profit from it. Kahle says it has typically been individuals, not companies, who are most concerned about protecting their intellectual property -- or future privacy.
The Internet Archive's nonprofit status may help it avoid some legal challenges, but it is still not immune from basic copyright concerns. The problems that arise aren't likely to be entirely solved by blocking access to individual sites within the archive. That's because the copyright to the content of any given site doesn't necessarily reside with the operator of that site. For instance, a wire service, such as the Associated Press, might balk when it discovers that thousands of its stories, published on other sites, can be freely visited in the Internet Archive Wayback Machine. The testy members of the National Writers Union may also view the archive as an unauthorized and uncompensated republishing of their work. There's also a tricky question: What happens if a settlement in a lawsuit requires that libelous material be removed from a Web site, yet the original lives on in the archive?
The Internet Archive Wayback Machine may be ready to take us on a mind-blowing sojourn into the digital past, but it may have less success delivering us to a less litigious future.
About the writer
Katharine Mieszkowski is a senior writer for Salon Technology.
