The Problem of Data Archival

Recent federal regulations are requiring more companies to save more content for longer periods of time. What’s important here is the content. However, as technology changes, magnetic tapes, drives, and optical discs become obsolete, and companies are constantly faced with the issue of migrating their data to newer formats. Furthermore, proprietary or older encoding formats can prevent newer software from reading what already exists. There are several potential approaches to this problem. One is to store data in ASCII or another plain-text format. XML encoding is another, and Adobe has put forth a variant of its PDF technology, called PDF/A, as a third.

On a related topic, in case you haven’t heard yet, Google plans to index the collections of several libraries, as well as books from various publishers, and make them searchable via its search engine. I wonder how Google intends to tackle the problem of ever-changing data formats. Of course, they won’t really care if no one else can read their encoding or format, but I’m sure a new technology will come along that compels them to migrate.