Research on the past relies on documentary evidence. When tomorrow's researchers look back and interpret our electronic era, what will they have to study?

Does the past have a future?

G. Beato

Streetcorner millennialists, boardroom oracles, and pathologically avant-garde Condé Nast editors all diligently map the future of the future, but what about the future of the past? While it's safe to assume that tomorrow's past will be as different from today's as a Nokia Communicator is from a ballpoint pen, that doesn't necessarily mean it will be a richer, more comprehensive place. After all, it's not just the quantity of information that we're producing that marks our age; increasingly, this information exists only in digital form. And while industry evangelists have been quick to position acid-free binary archives as the solution to the brittle book epidemic, we have concerns regarding the permanence of digital information as well.

Media decay and technological obsolescence [1] are the real-world cumuli that darken the blue-sky picture of efficient, hyper-accessible archives that unrealistically enthusiastic digital boosters like to paint. At the moment, obsolescence is a greater concern in the archival community: It doesn't really matter if a CD-ROM has an estimated lifespan of 50 years or 100 years if after only 10 years, say, the CD-ROM drive required to play the disk will no longer be available and the word-processing program used to create the disk's files will be nothing more than a bullet point on an aging programmer's résumé. From the archivist's perspective, then, it's much more important to implement a workable migration strategy, moving information from old technology to new technology on a periodic basis.

Even in best-case scenarios, this is likely to be a costly, time-consuming proposition. It also increases the opportunities for information loss. As Janet Gertz, director for preservation at Columbia's library system, suggests: "You're always going to have change. Because if I have a document that I have word-processed, and I did it in a New York Times 12 point font, and it migrates through four generations of software, maybe New York Times 12 point font is no longer available. The words will still be there, but it wouldn't be the document exactly as I created it. And that's the easy scenario. Databases will be much more complicated." In addition, files can be corrupted during transfer, or even inadvertently deleted. Of even more consequence is the impact such frequent maintenance will have on curatorial mercy. Paper documents of questionable archival value are often retained, reflecting a "just in case" perspective; they require relatively little maintenance for long stretches of time, and they may ultimately turn out to be valuable to future scholars. But since the expense of maintaining those same documents in digital format is harder to rationalize, they are more likely to be consigned to the ASCII heap of history.

Archivists deal with two kinds of digital information: that which they receive directly from individuals and institutions, and that which they create themselves from analog materials in their collections. So far, archives and libraries generally have the most experience with the latter type, leading to certain subtle assumptions that bear mentioning.

First, in declaring technical obsolescence more problematic than media decay, one makes an assumption that an archive or library starts with an intact set of digital media. That is to say, if you have a set of an author's manuscript drafts on a CD-ROM with an estimated lifespan of 50 to 100 years, then it's more important to think about how to migrate that information than to worry about media decay. But most individuals, and even many institutions, won't be saving their documents and records on CD-ROMs and won't be turning these materials over to archives and libraries immediately after creating them. Instead, people are more likely to save their papers on hard drives, floppy disks, or other backup media with shorter lifespans, and to keep these media for years after they've actually stopped using them. By the time archives and libraries take possession of the materials, media decay may indeed be a serious problem.

In addition, the concept of migration as an organized, manageable process is based on the assumption that these migrations happen at set times, and that the information involved is of a relatively uniform nature. In the case of digital documents created from an institution's own collection, this is likely to be true: All the text files will share a common format, as will image files, database files, and so on. But individuals and outside institutions will donate their material on an irregular, ongoing basis; some of it will be in need of immediate migration, and it will undoubtedly appear in a wide variety of formats. While institutions like the National Archives and Records Administration, which is in charge of collecting electronic records from federal agencies, have the mandate to determine standards for information collection, that same opportunity doesn't really exist for private institutions. "Once in a while, you might be in the fortunate position of having someone come to you and say, 'I'm thinking of donating my papers--what file formats make the most sense?'" says Gertz. "But for the most part, what's there is what you get."

The documents and records that remain accessible over the long term are likely to offer a different view of their creators than earlier paper-based archives have of theirs. "We can look at a Hemingway manuscript or a Tennessee Williams manuscript and see cross-outs and changes," says Jean Ashton, director of Columbia's Rare Book and Manuscript Library. "But with the new technologies, the whole process of creation has become less visible." Theoretically, digital archives can offer an even greater level of detail than paper-based ones, but that proposition hinges on the currently dubious assumption that people actually use Microsoft Word's "Mark Revisions" capabilities (or the equivalent), save successive drafts of their documents, or even hold onto e-mail for more than a few weeks.

At least a few people do these things, of course. One of them is Columbia historian Anders Stephanson, who suggests that e-mail may restore much of the historical record that the telephone has eliminated. "When people stopped writing letters," he comments, "historians lost an invaluable set of sources. Now, at least, we have in theory a remarkable amount of writing; though most of it might disappear, it might still probably be more than we had in the age of the telephone. This assumes, admittedly, that some people at least think as 'archivally' as I do." Alas, the computer makes it just as easy to discard information as it does to save it. Consider, for example, the mystery surrounding the first e-mail message, which was sent in 1964. Because no one saved it, no one knows exactly who sent it. (M.I.T., Cambridge University, and the Carnegie Institute of Technology are the three sites of suspects. [2])

Counteracting the potential loss of multiple manuscripts, edited drafts, notes, and other incidental documents and records, however, are the new forms of discourse that the computer makes possible. Online conferencing systems like the WELL stand as a completely new source of information about the work of writers and other historically important persons. For example, while Bruce Sterling, author of several influential books on cyberculture, doesn't save e-mail or drafts of his work, he does host a WELL-based conference called "Mirrorshades" that he considers his electronic commonplace book. The conference includes his comments, and those of hundreds of other participants, on subjects related to his work. "A true Sterling scholar (should such a thing come to exist) could probably retrace a lot of my thought patterns and creative obsessions by going through this thing chronologically," he explains.

Computers also make new kinds of remarkably detailed textual analysis possible. Five months before a forensic document examiner compared handwriting samples to determine that Joe Klein was indeed the author of Primary Colors, Vassar literature professor Don Foster's computer, analyzing adverb choices, colon propensity, thematic similarities, and a variety of other lexical signatures, had reached the same conclusion. While various forward-thinking bibliotechnophiles are undoubtedly using similar pattern-matching capabilities in their pursuit of innovative scholarship, Ph.D.s, tenure, and literary truth, the bias that Klein showed for older, more "human" technologies--it was only after the document examiner's testimony that he felt he could no longer maintain his charade--is one that many of us still share.

Indeed, computers are such cold, powerful, ostensibly efficient machines that they seem to have little capacity for quirk or nuance. But this isn't true at all. If anything, their complexity creates more quirks and nuances than paper could ever harbor. Instead of regretting the loss of various kinds of incidental information that were native to paper archives, perhaps scholars should look for their digital successors. Was Hemingway the secretly sentimental type who would have used emoticons in his private correspondence? Would Winston Churchill have subscribed to the Drudge Report? In labeling his works-in-progress, would T.S. Eliot have used standard file-naming procedures, or would he have named his various documents, haphazardly, after his cats? In the case of these three men, these are things we will never know. In the case of future public figures we study, such effluvia may become part of the materials we use to reconstruct their lives.

1. Graham, Peter S., "Building the Digital Research Library: Preservation and Access at the Heart of Scholarship," Leicester University, March 19, 1997.

2. Task Force on Archiving of Digital Information, Preserving Digital Information, The Commission on Preservation and Access and The Research Libraries Group Inc., 1996.

Related links:

  • "Preserving Digital Information: Report of the Task Force on Archiving of Digital Information," Commission on Preservation and Access and Research Libraries Group

  • Obsolete Media, Word

  • Media Stability Studies, National Technology Alliance

  • Online Computer Library Center

  • Conservation Online, Stanford University Libraries

  • Preserving Access to Digital Information, National Library of Australia

  • Council on Library and Information Resources, Washington, D.C.

  • "Primary Ethics," Freedom Forum Media Studies Center panel on Primary Colors

    G. BEATO contributes regularly to a variety of publications, including Salon, Wired, Spin, Newsday, Mother Jones, and SF Weekly. His observations on media culture also appear at his own online magazine Soundbitten.