Playing devil’s advocate here: Why is it so important to collect data, software, and old websites? Aren’t we all drowning in information anyway? 

We are drowning in information more than ever before, but a lot of that “information” is endless feeds of ambient humanity, captured in digital—livestreams, music recordings, ad-fed clickbait articles and blogs, as well as endless social-media check-ins and clicks that are interesting but not critical. We swirl in this abundance. 

     But threaded throughout this tidal pool of information is a bunch of critical history, experience, and writing that used to all be offline, put into printed sheets, or placed in memento books and diaries. Anything that affects the spew is going to affect this stuff as well. 

     That’s what drives historians and archivists such as me to make sure data isn’t treated as 100% disposable, because a good percentage of it isn’t. The issue comes with where to draw the line. 

Many people have the idea that if something is online it will be there until the heat death of the universe. That’s not true, right? 

It’s so untrue I wonder what causes people to think it. Perhaps it’s the existence of memes, where an image lasts a very long time because people repurpose it for new things, like a discarded piece of metal. Or perhaps it’s the speed at which surviving data can be retrieved. If you can find something that has been around for 10 years in seconds, when finding it used to require a trip to the attic or the library, it can have an aura of being “here forever”. 

     But a glance at the last 20+ years of the “web” shows that many, many communities and collections are gone, wiped clean, no record of them outside old articles and the Wayback Machine. And archives such as the Wayback Machine won’t capture the search engines and other features those sites had. If you don’t know exactly what you’re looking for, such as the URL, you’ll have a hard time finding it. 

     The way I often put it is that the capability to keep something around, digitally and forever, is the best it has ever been. But without human intervention and a lot of caretaking, the chance of it disappearing, unlike, say, a binder of photographs or a forgotten letter, is huge. 


What’s your favorite example of crucial information that’s lost forever? 

Nobody seems to have been doing large-scale website downloading and archiving between 1993 and 1995, so evidence of that time is around here and there, but not in any meaningful amount. A lot of pre-web stuff, such as parts of Fidonet or material related to Usenet, is also going to be gone unless someone has a hard drive we haven’t heard of. All that Myspace music. All the Geocities stuff that others missed. 

     A lot of this missing data is a cultural loss, more than anything else. While one could argue whether it’s 100% necessary, it’s also a fragile thing and reflects where we came from. That’s the sort of stuff I’m sad about. When a floppy disk here or a zip file there arrives out of nowhere, there’s much rejoicing. 


If we look at the state of media technology today, it’s hard to get an overview of what’s going on. Certain new technologies are very present, but others don’t get a lot of spotlight. From your perspective as a historian, what are the interesting trends?

Glass half-full and glass half-empty rule the world right now. We’ve settled on 8-bit ASCII or Unicode for almost everything, which means less of a problem for later reading and decoding. And centralization, as problematic as it is in so many other ways, is a really solid way to ensure things are seen. For example, finding trends or mentions of information on Twitter. 

     But convenient file formats mean we assume everything is going to be in those formats, and anything lurking outside that format could be forgotten and lost. Also, those wonderfully dependable centralized systems are prone to losing or deleting piles of data due to economic, legal, and arbitrary reasons. 

     I’m very happy with having these wonderful little home computers in our pocket and doing all sorts of things with the world using them. Maybe the vision they provide is biased and broken, but home computers of the old days had the same issue and you didn’t get any vitamin D. 


What were the most important tech developments since the 1970s that people don’t know about? 

There are a lot of obvious ones, to be sure, but there’s one weird fact of computers and related life that didn’t hit everyone’s radar for how revolutionary it has been. That’s code control. This system allows various people to work on a huge online project, submit changes to the code, and have them tested, with problems noted and sent back, and with every change documented and reversible. We think of wikis as being similar, but only in terms of tracking many changes among the collaborators. Code control is different, and without it, a lot of our systems would fall apart, and fixes would be incredibly difficult to roll out. It has revolutionized our relationship with computers and code, and few people know it exists. 

      As a side note, some projects and people use a code-control system to track and maintain projects that aren’t code, since the restrictions and verification mechanisms are so strong. Lists of resources, maps, and documentation use this system. While it’s not always the greatest fit, it demonstrates how different our interaction with our machines has been. 


As an archivist, you probably have a different definition of trust than other people dealing with online info. What do you trust? How do you evaluate if something is real? 

There’s big, big money and big, big business in manufacturing information for the benefit of others. Not so much elsewhere. But I agree that we’ve set up mechanisms that make it easy to convince people to think in a way that might not be accurate, especially those who have no day-to-day experience with something. 

     I take low-stakes declarations of what something “is” more seriously than high-stakes declarations, even though I’ve been surprised when low stakes aren’t enough to stop someone from creating misleading or falsified information. 

     I’m sure I get tricked, too. Literacy and verification continue to be our biggest hurdles going forward. Trust and verification will become as important as security currently is, because they’re the new level of security. 

Bad Sectors

Companies such as Facebook try to commoditize trust, offering Facebook as a log-in method for other services, for example. How big is the trust business? 

Huge. And fundamental. Whoever corners the trust market will make billions of dollars, until they have a breach. Then, whoever steps in indicating that they have a better solution will make billions more dollars. Until they have a breach, too. 

     We’re depending more and more on a remote, online methodology of ensuring what we see on the screen is what we think it is, and no bad actors are in play. That’s a leap. It’s going to be up there with eye strain, bandwidth speeds, and privacy for what represents our baseline experience with our machines and each other. Trust is as important as the silicon and the electricity. 


As someone who grew up in the 1980s, surrounded by cyberpunk literature and tech-based sci-fi, what is the most astonishing thing for you about the real year 2020? 

Voice recognition. I was told voice recognition was never going to happen, but I can ask machines around my house to tell me what’s going on or for a basic answer to a question. It’s still mindboggling, and I get to use it anytime I want to! I’m not going to consider the privacy-destroying aspects at this time! 


Deepfakes are a new cultural phenomenon in the ever-growing ecosphere of meme culture. Could you put them into a historical perspective for us? 

We have always been disappointed with what reality reflects when we create images of it. From photographs to film to video, we have bought into the idea that we have captured a version of reality that is reality accurately reflected. But from the years of print advertising, political entities airbrushing out unwanted figures, and modifications and edits to newsreels, we’ve been manipulating these captured “realities” to various ends. 

     Pornography took the starting flag for deepfakes, but the technology, underlying and unavoidable, is the utilization of math and machines to change what moving images and photographs show into something else. Sometimes it’s something we want more. In some cases, the technology upscales old film footage, and in other cases, it misleads and adds stress and misery. But the same amount of processing used to create a deepfake can do a lot more. The realm is of great fascination to me, and I look forward to what comes next. As one does. 


Is there anything that we can do to make your job as an archivist easier? 

You know, don’t be afraid to keep everything about a place you worked at, a school you went to, or a project you worked on. Label it, and forget about it. That goes for digital or analog. Make a USB stick of things that matter to you, that you created, or that can’t be gotten anywhere else. 

     That’s the biggest deal. Make sure the things you do, that you work on, that you make or want to have value later can be found again. The value can be memory or nostalgia. Keeping the record is what matters. 

     And live life. That’s pretty important too.

JASON SCOTT is a digital historian and filmmaker, and he can be called the figurehead of the digital-archiving world. Jason runs, one of the most comprehensive collections of our digital past, and he shot a couple of excellent documentaries, such as BBS and Get Lamp. He also works for the Internet Archive.

Follow Jason on Twitter: @textfiles

Copyright 2021 TFLC