In just a few days, Twitpic will likely shut down for good, taking hundreds of millions of photos with it. Online archivists who work in the nonprofit space—like the Internet Archive’s Jason Scott—want to make sure those photos remain available.
Update: Since this story was posted on October 21st, Twitpic founder Noah Everett announced that he had made an agreement with Twitter to keep the photos online for free, stating that “since Twitpic’s user base consists of Twitter users, it makes sense to keep this data with Twitter.” The service will otherwise shut down. The original blog post is below.
Before there was Twitpic, Twitter was largely a text medium.
The online service, which plugged into the social network, made it stupid-easy to publish photos to a wide audience, something that came in handy when Janis Krums happened to be near the Hudson River on a fateful day in January 2009 and got a chance to shoot this iconic news photo:
A lot has changed in the nearly six years since that photo, reporting the “Miracle on the Hudson,” first made its appearance, and for much of that time, Twitpic has been at the center of it. But the service, facing trademark issues from Twitter and failing to reach an acquisition deal, announced it was shutting down—and taking its massive array of user data with it.
If that sounds like a sad tale, Jason Scott would agree with you. The archivist, documentarian, and internet advocate has spent much of his time in the past few years working to save content on shuttered sites like Google Reader and Friendster from certain death.
It’s not an easy job—and it requires a lot in the way of volunteer help, hard-drive space, and bandwidth. But Scott has found that help through two organizations: his own Archive Team, which raises the alarm when a cloud-based online service announces a shutdown, and the Internet Archive, the nonprofit that hired Scott in 2011 and has offered technical backing for Archive Team’s data-recovery efforts.
“Over the years, Internet Archive has provided disk space and bandwidth for the sites that Archive Team has downloaded,” he told me over the weekend. “They’ve been provided in a format (WARC) that works with the Internet Archive’s Wayback Machine, so it’s a great partnership.”
I'm watching a ragtag set of volunteers in an archiveteam channel work harder to save copies of twitpic than any professional organization.
— Jason Scott (@textfiles) October 17, 2014
Twitpic has now shut off all outside access to the images on their system. The archive team machines are blocked out.
— Jason Scott (@textfiles) October 17, 2014
In some cases, twitpic (before removing all the image access) was banning entire ISPs to stop archive team backing it up.
— Jason Scott (@textfiles) October 17, 2014
Scott, who is currently working on the Twitpic efforts before the service’s October 25 shutdown, offered some insights on his data-rescue work, along with the pitfalls that come with using publicly available cloud services. Check out the Q&A below:
When something predicated on user-generated content is on the brink of death, your team of volunteers goes into action. What lessons have you learned over the years working on archive projects for sites like Geocities and Posterous?
We have definitely learned that it is extremely rare that a company will work at all to help us. Many of them want to forget something earlier or act like it never happened, so it’s in their best interest to hide the results of that work. When we go in, sometimes we deal with complaints or issues raised by the original owners, but they’re mostly meant to distract from the fact that they are voluntarily destroying all this data.
We usually spend a few days assessing the site, figuring out its capabilities, and finding the easiest way to grab the most data. It shouldn’t be surprising that a lot of companies at the end of their life will have severely reduced machine and network capability.
If I’m ever feeling down or concerned about how this all goes, sitting in that discussion channel while all of the engineering folks at Archive Team assess the site is truly inspiring.
Obviously tools like Twitpic are widely used and carry significant value—value that can disappear when a company’s funding well runs dry. What should users keep in mind when taking advantage of these kinds of cloud-based tools?
I think that, unfortunately, the trend has been to obfuscate and hide from users the engineering required to keep all of this data available. Costs for storage, network, and maintenance are all turned into some sort of hidden shadow budget. That means that if any of these increase or otherwise are not handled by things like advertising or budget, everyone will discover what a house of cards they’ve been living with.
Assuming that your photographs, writing, email, or other data is important to you … you should always be looking for an export function or a way to save a local backup. If a company claims it’s too hard, they are lying. If they claim that they have everything under control, ignore that and demand your export and local backup again.
As far as Twitpic goes, how do you wish the company and its staff would’ve handled things differently in this case?
I can’t think of a single thing they’ve done that’s right.
They have ignored people trying to rescue this data to at least make it statically available for historical reasons. They have been inconsistent in their message; they have misled people as to the current state of the company.
Every message out of the founder, Noah Everett, has been a single sentence or statement, with no background, no plans, and no deeper information. He has refused interviews, ignored phone calls, and not responded to any emails. This is no way to run a railroad.
(Editor’s note: Everett, who is currently focused on a new startup called Pingly, did not respond to requests for comment for this article.)
Beyond the current situation, what sort of barriers do you see holding back online preservation efforts like those that Archive Team and the Internet Archive are known for? And do you see any sort of long-term remedies to the problems that these kinds of projects face?
I think the problems are mostly cultural. People don’t recognize that this data is where all of our history is going now. We don’t take photographs with film, we don’t write letters or journals on paper, and we don’t tend to keep creative works in anything other than electronic form. I think once we recognize that this is where the majority of the media goes, and that it is no longer a transient fad, perhaps we will take it all a little more seriously and head off further disasters to some extent.
What role do you think the ad-driven business models of many online services play in these shutdowns—and do you think users would find themselves in less danger of losing their data if they paid for using a company’s services?
The whole system is a mess: there are multiple ways to run a company, and “give all your services for absolutely free by getting money elsewhere” has been shown to be a terrible, terrible business model. It has no future, and after a short time, those businesses tend to go down hard. The problem is basically this situation at the end, where users who stored data at this temporary party are finding out how little effort was made to think about the long term.
I pay for all “cloud” (networked) services I use, with the exception of Twitter and Facebook—and I’d happily pay if they switched to that model.
You’re known for having a strong opinion on cloud computing, most eloquently described in this 2009 essay you wrote (which, just a side note to our readers at home, has a profanity in the title). Given what’s happened over the past few years where the cloud has become more prevalent than ever—and, in some ways, required—what do you think organizations should keep in mind when relying on cloud services?
For a number of sad reasons mostly related to marketing, cloud computing has proudly made things obfuscated and hidden away. Instead of giving people direct information about the contents of their cloud setup, number of locations, and other aspects, end users are not allowed to know where their data is sitting, in what condition, and doing what for who.
I think in a certain design fashion, it’s sexy to have things feel like a 1970s sci-fi novel where a single simple interface just does everything you could ever want, but that sort of interface comes with endless complication beneath, and when things go wrong, it becomes more like Brazil.
I totally realize it’s nice to make it all seem simple, but I wish that it was required or the practice to allow a person to understand what the systems beneath were and what protections were in place, not unlike nutritional information or ingredients lists.
Frankly, having worked within the tech industry for years, the percentage of it that is a scam is remarkably large. People are sold a false bill of goods, and luck and handwaving allow hustlers to make it to another day. This approach is encouraged by rah-rah culture in tech, and while like most scams it works while it works, the whole thing can come crashing down in a week. So we’re fighting that cultural fact as well.