Organizations and universities alike have been banding together to prevent important-yet-dated documents from disappearing entirely online. Digital preservation is a big problem, and it’s one that associations—with their significant content offerings—should do more to solve. Here’s how.
Of all the winners of the 2015 Webby Awards, the winner of the law category might have the most lasting effect. And not just because it’s a groundbreaking project.
Rather, Perma.cc got the nod for an effort that could help solve a major problem for legal analysts and academics: the tendency, over time, of a hyperlink to “rot,” or stop pointing to the content it originally referenced.
The problem is well known and well documented, and it is particularly troublesome where the linked material would be better encased in amber, as with sources cited in legal decisions. But it took the work of the Harvard Library Innovation Lab to do something significant about it.
The project emerged from the work of three Harvard Law School researchers—professors Jonathan Zittrain and Lawrence Lessig and student Kendra Albert—who noted that only half of all links used in recent Supreme Court decisions were still active at the time they published a 2013 paper on the topic.
“The link, a URL, points to a resource hosted by a third party,” the authors explained [PDF]. “That resource will only survive so long as the third party preserves it. And as websites evolve, not all third parties will have a sufficient interest in preserving the links that provide backwards compatibility to those who relied upon those links.”
Perma.cc, introduced in late 2013, was born of these concerns. The website helps keep this legal and academic data “vested,” ensuring that links won’t disappear from the web on a whim and can be safely cited by researchers and Supreme Court justices alike. Already, several state court systems are using Perma.cc’s technology.
“Libraries are in the forever business,” Harvard Library Innovation Lab Director Kim Dulin said in a news release. “We developed Perma.cc to allow our users to protect and preserve their sources, no matter where they originate.”
(It’s worth noting that the Internet Archive—itself a huge advocate of preserving old online content—is among the supporters of this effort.)
A Case for Digital Preservation
But for all the impressive work that Harvard is doing on this topic, it raises the question: What are owners of content doing to keep it in place?
Associations, particularly those with interests in research and academic knowledge, should be mindful of the technical needs of their underlying content—both old and new—as well as the effects that a redesign can have on a URL structure. Part of the problem can be traced back to decisions made early in the life of older content, when considerations regarding permalinks were led by the software driving content management systems, rather than by whether URLs would be human-readable.
This equation has started to change. News sites and blogs have become better at saving URLs (with some exceptions), and sites like The New York Times go out of their way to preserve print content in an easy-to-search digital format. But archiving is hard work, especially if stakeholders have to make up for information architecture decisions that fit poorly with the modern web.
Content preservation requires resources, time commitment, and prioritization—all of which can be difficult for organizations. But the sooner you create a structure for your information that is flexible and lasting, the easier these changes will be to make down the road.
A Link-Fixing Checklist
First, an important reminder: These kinds of information architecture shifts aren’t failings on your part. The web circa 1999 isn’t the same as the web circa 2015. Best practices have evolved, and online tools have adapted to modern needs. Just because you’re making changes later in the game does not mean earlier efforts were bad, just that there’s a better understanding of what digital preservation entails. Going back, you have the opportunity to do it the right way.
Talk with your vendors and developers about permalink structure. Your needs will differ significantly based on the project: an internal social network or forum, for example, has very different needs from a blog platform. Likewise, if your association uses a link-shortening tool, brevity takes priority over clarity. This kind of process will take time and money, but if done right, it can future-proof your content structure.
Pretty permalinks preferred. It’s 2015, and while gibberish-heavy URLs were never really in vogue, they look downright ancient compared to a well-considered URL structure. If you’re eyeing a restructure of your website’s links, creating a well-organized permalink structure that cites the basic subject matter and is self-explanatory is the way to go, if your content management platform allows you to do so. It’s better for search engines and users alike.
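For a sense of what a self-explanatory permalink looks like in practice, here is a minimal Python sketch. The function names (slugify, permalink) and the /section/year/title/ layout are assumptions made for illustration, not a standard or any particular content management system’s behavior; your platform may offer its own setting for this.

```python
import re
import unicodedata


def slugify(title: str, max_words: int = 8) -> str:
    """Turn a headline into a lowercase, hyphen-separated URL slug."""
    # Decompose accented characters and drop anything that is not ASCII,
    # so "Café" becomes "Cafe" instead of losing the letter entirely.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Remove straight apostrophes so "It's" becomes "its", not "it-s".
    ascii_text = ascii_text.replace("'", "")
    # Keep only runs of letters and digits; punctuation becomes a word break.
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    # Cap the length so long headlines do not produce unwieldy URLs.
    return "-".join(words[:max_words])


def permalink(section: str, year: int, title: str) -> str:
    """Build a hypothetical /section/year/title/ permalink."""
    return f"/{slugify(section)}/{year}/{slugify(title)}/"


print(permalink("News", 2015, "Know Your Redirects!"))
```

The point of the sketch is the shape of the result: a reader can guess the subject of the page from the address alone, and the structure does not depend on a database ID that might change in the next redesign.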
Know your redirects. If you’re changing your links, you need to ensure that your old links go to the right place. It can be a headache to redirect URLs, but it’s definitely possible. That said, the two primary strategies have different purposes. The 301 redirect, which is generally added at the server level, tells the browser that the page has moved permanently and forwards the visitor to the new address. This is most useful for users and is said to have no effect on search engine results. Canonical tags (rel="canonical"), often loosely called canonical redirects, are generally added to individual pages and do not move visitors anywhere; they are essentially messages to search engines, telling crawlers that the primary version of the page is somewhere else, something that might come in handy in cases of duplicate or republished content. Both have their uses, but they can easily be misapplied and are not silver bullets, as Search Engine People’s Carla Barker notes.
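To make the difference between the two approaches concrete, here is a small Python sketch. It assumes a hand-maintained lookup table of old and new paths; the table contents, the function names, and the example URLs are all hypothetical, and in a real deployment the same rules would more likely live in your web server or content management system configuration.

```python
from html import escape
from typing import Optional, Tuple

# Hypothetical mapping from retired URLs to their new homes.
REDIRECTS = {
    "/article.asp?id=1234": "/news/2015/know-your-redirects/",
    "/press/pr_0415.html": "/press/2015/annual-meeting-recap/",
}


def redirect_for(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, new_location) for a retired path, or None if no rule applies."""
    target = REDIRECTS.get(path)
    if target is None:
        return None
    # 301 is the HTTP status for "Moved Permanently"; the new address
    # is sent to the browser in the Location response header.
    return 301, target


def canonical_tag(primary_url: str) -> str:
    """Build the <link> element that names the primary copy of a page.

    This goes in the <head> of the duplicate page. It does not move
    visitors anywhere; it is only a hint for search engine crawlers.
    """
    return f'<link rel="canonical" href="{escape(primary_url, quote=True)}">'


print(redirect_for("/press/pr_0415.html"))
print(canonical_tag("https://example.org/news/2015/know-your-redirects/"))
```

Keeping the old-to-new mapping in one place, whatever form it takes, also gives you a list to test against after the redesign goes live.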
Make your old content web-friendly. Here’s a common occurrence for me when I’m researching a news story or looking for comment on an issue: I load up an association’s website, and the resource, usually a press release, is in PDF or Word format. Sometimes, the PDF will even be scanned in from a printed sheet of paper, meaning it can’t easily be cited. These formats have their benefits, but they’re based on the assumption that the document will be printed. (It probably won’t be.) If long-term preservation and readability are your goals, you should convert these to HTML, which will future-proof them for mobile, watches, or whatever content-delivery innovation is coming next. This isn’t a hard-and-fast rule (old magazines, for example, may make more sense in PDF format), but it should be a discussion point.
Automation is possible, but … If this seems like the kind of project you’d love to do with a script rather than a whole mess of grunt work, that’s understandable. It takes time to do this right. And there are ways to automate the process with plugins or other tools. (If you’re using WordPress, for example, the third-party tools by Yoast are an excellent starting point.) But you still need to double-check your work. Ensure your archives have working links both manually and by using products like Google Webmaster Tools.
Associations have big collections of data just sitting on servers somewhere. We know that massive value can be culled from this data—value that can be monetized, shared with members, or even highlighted for historical value. And the people linking to your old content are the ones who suffer the most when this data isn’t properly preserved.
The Supreme Court may not be linking to your website, but you should keep in mind the people who are.