The original proposal for the World Wide Web, written by Tim Berners-Lee in 1989, is an important piece of Internet history. It also cannot be opened on modern computers.
John Graham-Cumming, a British engineer and writer, attempted to open the Word document containing the proposal. Modern versions of Microsoft Word and Apple’s Pages both completely failed to open the file, as he pointed out in a blog post. The open source word processor LibreOffice did open it, albeit with some rough formatting. Graham-Cumming eventually found a PDF exported by CERN in 1998, which was the only way he could see the document as it existed in 1989.
It’s worrying that such an important piece of history, in such a common file format, could be almost entirely lost to time and software updates. Anyone with a collection of old digital documents, photos, and videos might wonder whether the same thing will happen to their files. That is the kind of question digital archivists face all the time, so I reached out to one of them.
“Twenty years in the digital world is history,” says Lance Stuchell, director of digital preservation services at the University of Michigan. His team is often tasked with recovering digital files from old computers and storage media. “We have a lab that can handle old media: floppy drives, CDs, old computers. We can extract these items from those types of media and transfer them into our preservation system while being careful not to damage them.”
But getting the files off the hard drive is just the first step. Then you have to open them and save them in a state that will keep them readable for decades. It’s a task that has given Stuchell reason to think about strategies for preserving documents for as long as possible. I asked him what those of us who are not professional archivists should do to ensure that our files last for decades.
Use open formats
The Word document I mentioned above could no longer be opened by Microsoft Word because the software has evolved over time. This is part of the challenge of archiving digital files.
“Physical documents last less the more they are consulted,” Stuchell says. “Digital documents are constantly facing obsolescence. Over time, the file loses information.”
Updates to software like Microsoft Word mean that files that opened properly in the 1980s no longer open in the 2020s. Part of the problem is that Microsoft, and only Microsoft, controls the file format and knows exactly how it works. For this reason, Stuchell says he encourages people to export files to an open file format, especially files they want to keep accessible long-term.
For documents, he recommends PDF/A, an open standard built on Adobe’s PDF format that includes everything the file needs to be opened, including the fonts used in the document. Microsoft Office, LibreOffice, and Adobe Acrobat all support exporting to PDF/A, which means it’s relatively easy to create such a file. Stuchell recommends archiving any documents you want to keep in this format.
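If you already have a folder of exported PDFs, a rough way to see which ones at least claim to be PDF/A is to look for the identification entry that PDF/A files carry in their embedded XMP metadata (`pdfaid:part`). The Python sketch below does only that. The function names and the folder path are hypothetical, and the check is an assumption-laden shortcut: it reports what a file declares about itself, not whether it truly conforms, which takes a dedicated validator.

```python
import re
from pathlib import Path

# PDF/A files identify themselves in XMP metadata, either in element form
# (<pdfaid:part>1</pdfaid:part>) or attribute form (pdfaid:part="1").
# \x22 is a double quote; this pattern accepts either quote style.
_PDFA_PART = re.compile(rb"pdfaid:part\s*(?:>|=\s*['\x22])\s*(\d)")


def claimed_pdfa_part(data: bytes):
    """Return the PDF/A part number the bytes declare (e.g. 1, 2, 3), or None."""
    match = _PDFA_PART.search(data)
    return int(match.group(1)) if match else None


def report(folder: str) -> dict:
    """Map each .pdf file name in `folder` to its declared PDF/A part (or None)."""
    return {
        path.name: claimed_pdfa_part(path.read_bytes())
        for path in sorted(Path(folder).glob("*.pdf"))
    }
```

Calling `report("archive")` on a hypothetical `archive` folder would return a dictionary of file names, with `None` marking the files that make no PDF/A claim and are worth re-exporting.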