Creating a digital archive that will last through the centuries depends not only on technology but also on administrative skills and policies. Before we choose a particular method of preserving digital documents, images, audio files, or data sets, we need to consider the ultimate goal. Is our objective to be able to read, hear, or play something a hundred years from now? Or is it to prove provenance in a court of law, for instance, to show that certain clinical test data really are the original data on which a particular drug was based?
The former is hard, but the latter is really hard. When you want to be able to read something in a hundred years, you probably don't mind if the background color and the fonts are a bit different as long as the words themselves haven't changed. In that case, a simple migration strategy should suffice: keep copying the file into newer formats that can be interpreted without any substantive changes by modern software.
But if you also need to know that what you're reading is authentic, in the sense that the billionaire's will you're reading is the real thing, then the data essentially need to be stamped with a digital signature to ensure that nothing's been changed since the item was stored. You'll also need to have the item tagged with a digital "paper trail" of all the changes the billionaire made to the will before finally storing it. DSpace supports both bit migration (transferring bits to new formats) and the tagging of a file with a record of what happens to it over time, which is what proving legal provenance requires.
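To make the idea of a tamper-evident "paper trail" concrete, here is a minimal sketch in Python. It is not DSpace's actual mechanism, and it is not a true digital signature: it uses plain SHA-256 hashes chained together, so that altering either the stored item or any earlier entry in the trail becomes detectable. A real archive would additionally sign the final entry with a private key. The names append_event and verify, and the layout of each entry, are our own assumptions for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def append_event(trail: list, action: str, content: bytes) -> dict:
    """Record one change to the item (hypothetical entry layout).

    Each entry stores the hash of the content at that moment and the
    hash of the previous entry, which links the entries into a chain.
    """
    prev_hash = trail[-1]["entry_hash"] if trail else GENESIS
    entry = {
        "action": action,
        "content_hash": sha256_hex(content),
        "prev_hash": prev_hash,
    }
    serialized = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["entry_hash"] = sha256_hex(serialized)
    trail.append(entry)
    return entry


def verify(trail: list, stored: bytes) -> bool:
    """Check that the chain is intact and matches the stored bytes."""
    prev_hash = GENESIS
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        serialized = json.dumps(body, sort_keys=True).encode("utf-8")
        if entry["prev_hash"] != prev_hash:
            return False  # an entry was removed or reordered
        if sha256_hex(serialized) != entry["entry_hash"]:
            return False  # an entry was edited after the fact
        prev_hash = entry["entry_hash"]
    # The last recorded content hash must match what is in storage now.
    return bool(trail) and trail[-1]["content_hash"] == sha256_hex(stored)


trail = []
append_event(trail, "draft", b"I leave everything to my cat.")
append_event(trail, "final", b"I leave everything to my dog.")
print(verify(trail, b"I leave everything to my dog."))
print(verify(trail, b"I leave everything to my dog!"))
```

The first check succeeds because the stored bytes match the last recorded hash; the second fails because a single character has changed. Note that a hash chain only shows that something was altered. Proving who stored the item, and when, requires a real signature and a trusted timestamp on top of it.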
Like most difficult challenges, data preservation is really a mix of simple and complex problems. At one end of the preservation continuum is a simple item, like an ASCII text document. Preserve the data by keeping the file on current media, provide some way to view it, and you're pretty much done. We'll call this the "save the bits" approach.
At the other end lie the harder cases, like these:
A compiled software program written in a custom-built programming language for which neither the language documentation nor the compiler has survived.
A complex geospatial data set developed for the U.S. Geological Survey in a proprietary system made by a company that went out of business 20 years ago.
A Hollywood movie created with state-of-the-art encryption to prevent piracy, for which the decryption keys were lost.
For these three items, we don't hold out much hope of being able to preserve the content forever. For the software program and the geospatial data set, the digital archeologists of the future probably won't have enough information about how the software and data set were created or the language they were created in—no Rosetta Stone, as it were, to translate the bits from lost languages to modern ones.
As for that encrypted movie, our archeologists might have read old reviews that raved about the special effects in Sin City, but this cinematic achievement will remain locked away until someone pays a lot of money to a master of ancient cryptology to crack the key.
Fortunately, many content types fall between these difficult cases and ASCII text. Usually, saving the bits using standard, well-documented data, video, and image formats, such as XML, MPEG, and TIFF, gets you halfway to an enduring digital archive. Put another way, the goal is to avoid formats that require proprietary software, such as AutoCAD or QuarkXPress, to play or render the data.
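One practical way to act on this advice is to identify each incoming file by its leading "magic" bytes and flag anything that is not in a well-documented format for closer review. The sketch below, in Python, illustrates the idea with a handful of widely published signatures. The signature table is deliberately tiny, the split between well-documented formats and everything else is our own assumption, and the function names identify and needs_review are hypothetical. A production archive would rely on a maintained format registry instead.

```python
# (magic bytes, format name) pairs; a small illustrative subset only.
SIGNATURES = [
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"<?xml", "XML"),
    (b"\x00\x00\x01\xba", "MPEG program stream"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),  # e.g. legacy Office files
]

# Assumption for this sketch: formats we treat as standard and well documented.
WELL_DOCUMENTED = {"TIFF", "XML", "MPEG program stream"}

UTF8_BOM = b"\xef\xbb\xbf"


def identify(header: bytes) -> str:
    """Guess a format name from the first bytes of a file."""
    if header.startswith(UTF8_BOM):
        header = header[len(UTF8_BOM):]  # XML files may begin with a BOM
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def needs_review(header: bytes) -> bool:
    """True if the file is not in a format we consider well documented."""
    return identify(header) not in WELL_DOCUMENTED


print(identify(b"II*\x00\x08\x00\x00\x00"))
print(needs_review(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"))
```

In use, you would read the first few bytes of each file (for example, with open(path, "rb").read(16)) and pass them to needs_review. Plain ASCII text has no signature, so it comes back as "unknown" and would need a separate rule.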
In some cases, even files created with proprietary software might survive. Take Microsoft Word. Microsoft does not guarantee backward compatibility over time, leaving it to you to resave documents into the latest version of Word. But the program is so popular that we can expect third parties to produce migration programs that help companies and governments salvage their valuable information assets.
Today, organizations looking to convert vast stores of Microsoft documents can turn to companies like ConverterTechnology, in North Sydney, Australia. The company uses a proprietary program to batch-process documents from one version of Microsoft Office to another and from other programs like Lotus 1-2-3 to Microsoft Excel [see sidebar, ].