The internet is disappearing… but you can help save it
The Internet Archive turns 25
Disappearing act
People and organizations remove content from the web for a variety of reasons. Sometimes it’s a result of changing internet culture, such as the recent shutdown of Yahoo Answers.
It can also be a result of following best practices for website design. When a website is updated, for example, the previous version is overwritten – unless it was archived.
Web archiving is the process of collecting, preserving, and providing continued access to information on the internet. Often this work is done by librarians and archivists, with assistance from automated technology like web crawlers.
Web crawlers are programs that index web pages to make them available through search engines, or for long-term preservation. The Internet Archive, a nonprofit organization, uses thousands of computer servers to save multiple digital copies of these pages, requiring over 70 petabytes of data. It is funded through donations, grants, and payments for its digitization services. Over 750 million web pages are captured per day in the Internet Archive’s Wayback Machine.
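To make the idea of a crawler concrete, here is a minimal sketch of one step a crawler performs: reading a page's HTML and collecting the links it could visit next. It is an illustration only, not how the Internet Archive's crawlers work; the names `LinkCollector` and `extract_links` are hypothetical, and only Python's standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the targets of <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs linked from one page's HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    page = '<a href="/about">About</a> <a href="https://example.org/x">X</a>'
    print(extract_links(page, "https://example.com/"))
```

A real crawler would repeat this step across many pages, keep track of what it has already visited, and store a copy of each page it fetches.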
Why archive?
In 2018, President Donald Trump wrongly claimed via Twitter that Google had promoted on its homepage President Barack Obama’s State of the Union address, but not his own. Archived versions of the Google homepage proved that Google had, in fact, highlighted Trump’s State of the Union address in the same manner. Multiple news outlets use the Internet Archive’s Wayback Machine as the source for fact-checking these types of claims since screenshots alone can be easily altered.
A 2019 report from the Tow Center for Digital Journalism examined the digital archiving practices and policies of newspapers, magazines and other news producers. The interviews revealed that many news media staff either do not have the resources to devote to archiving their work or misunderstand digital archiving by equating it to having a backup version.
When a news story disappeared from the Gawker website a year after the publication shut down, the Freedom of the Press Foundation became concerned with what might happen when wealthy individuals purchase websites with the intent to delete or censor the archives. It partnered with the Internet Archive to launch a web archive collection focused on preserving the web archives of vulnerable news outlets, and on dissuading billionaires from purchasing such material to censor it.
Archiving websites that document social justice issues, such as Black Lives Matter, helps explain these movements to people of the present and the future.
Archiving government websites promotes transparency and accountability. These websites are especially vulnerable to deletion during times of transition, when control changes from one political party to another.
In 2017, the Library of Congress announced it would no longer archive every single tweet, because of Twitter’s growth as a communication tool. Twitter supplies the Library of Congress with the texts of tweets, not shared images or videos. Instead of comprehensive collecting, the Library of Congress now archives only tweets of significant national importance.
Archived websites that document the culture and history of the internet, like the Geocities Gallery, are not only fun to look at but also illustrate the ways early websites were created and used by individuals.
Citizen archivists
Archiving the internet is a monumental task, one that librarians and archivists cannot do alone. Anyone can be a citizen archivist and preserve history through the Internet Archive’s Wayback Machine. The “Save Page Now” feature allows anyone to freely archive a single, public web page. Bear in mind that some websites prevent web crawling and archiving through special coding or by requiring a login to the site. This may be due to sensitive content or the personal preference of the web developer.
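For readers comfortable with a little scripting, the sketch below builds the web addresses one might use to request a capture or look up an existing one. The URL patterns shown (`web.archive.org/save/`, `web.archive.org/web/<timestamp>/`, and the `archive.org/wayback/available` lookup) reflect our understanding of the Wayback Machine's public endpoints and should be checked against the Internet Archive's own documentation; the function names are hypothetical. The code only constructs addresses and does not contact any server.

```python
from urllib.parse import urlencode


def save_page_now_url(page_url):
    """Address that, when visited, asks the Wayback Machine to capture page_url.

    Assumption: Save Page Now accepts the target address appended to /save/.
    """
    return "https://web.archive.org/save/" + page_url


def snapshot_url(page_url, timestamp):
    """Address of the capture closest to timestamp (YYYYMMDD or YYYYMMDDhhmmss)."""
    return "https://web.archive.org/web/" + timestamp + "/" + page_url


def availability_url(page_url, timestamp=None):
    """Address of the lookup that reports whether a capture exists.

    Assumption: the lookup takes 'url' and an optional 'timestamp' parameter.
    """
    params = {"url": page_url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


if __name__ == "__main__":
    print(save_page_now_url("https://example.com/page"))
    print(snapshot_url("https://example.com/page", "20180829"))
    print(availability_url("example.com", "20180829"))
```

Opening the first address in a browser should have the same effect as pasting the page into the “Save Page Now” form, subject to the same limits on pages that block archiving or require a login.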
Local cultural heritage institutions, such as libraries, archives, and museums, are also actively archiving the internet. Over 800 institutions use Archive-It, a tool from the Internet Archive, to create archived web collections. At the University of Dayton, we curate collections related to our Catholic and Marianist heritage, from Catholic blogs to stories of the Virgin Mary in the news.
Through its Spontaneous Event collections, Archive-It partners with organizations and individuals to create collections of “web content related to a specific event, capturing at risk content during times of crisis.”
Similarly, it created the Community Webs program, in partnership with the Institute of Museum and Library Services, to help public libraries create collections of archived web content relevant to local communities.
The websites of today are the historical evidence of tomorrow, but only if they are archived. If they are lost, we will lose crucial information about corporate and government decisions, modern communication methods such as social media, and social movements with significant online presences, such as Black Lives Matter and #MeToo.
Together with librarians and archivists, you can help ensure the survival of this evidence and save internet history.
Article by Kayla Harris, Librarian/Archivist at the Marian Library, Associate Professor, University of Dayton; Christina Beis, Director of Collections Strategies & Services, Associate Professor, University Libraries, University of Dayton; and Stephanie Shreffler, Collections Librarian/Archivist and Associate Professor, University Libraries, University of Dayton
This article is republished from The Conversation under a Creative Commons license. Read the original article.
Story by The Conversation
An independent news and commentary website produced by academics and journalists.