Your code lives in the snow

Image Credit: Pixabay licence: viarami

Vanshika Dhyani delves into the complexities of storing code safely in the snow, and talks to Professor Cinnéide about GitHub and its Archive Program.

2020, TIME magazine’s nominee for one of "the worst years to be alive" in modern history, confronted people with their mortality in a way they had previously been indifferent to. It was also the year GitHub realised its own mortality and decided to store its data under a mountain in the Arctic, to satiate "historical curiosity" and prevent indispensable code from getting "abandoned, forgotten, or lost." The Arctic World Archive (AWA) provides data security to several countries, as well as to GitHub, the world's largest source code host.

AWA was the brainchild of Piql, a Norwegian data storage company, and Store Norske Spitsbergen Kulkompani (SNSK), a coal mining company based in the Svalbard archipelago. As a free economic zone and a demilitarized zone, Svalbard was the ideal choice for such a project. The archive was inaugurated in 2017 with records of important historical events from the Brazilian, Mexican and Norwegian governments.

GitHub, the literal hub for software developers, is an Internet hosting service that offers version control features to its users. Last year, the company archived all active public repositories, along with inactive ones that were considered vital, in the AWA. It decided to preserve vital software in case of a climate calamity, global catastrophe or apocalyptic situation, citing the missing Saturn V blueprints as a cautionary example.

Mel Ó Cinnéide, a professor at the UCD School of Computer Science, told the University Observer that all his UCD student projects are hosted and submitted on the service. "GitHub is a wonderful resource: it hosts everything from small one-person projects to massive projects like Linux with over 14,000 contributors worldwide."

The steel facility that stores the code is an underground vault. It is buried 250 meters below the Arctic surface and is safe from nuclear bombs and electromagnetic pulses. The contents are kept in shipping containers enclosed by a solid wall and a steel gate. The vault itself is secured by permafrost that could take decades to melt, preserving its contents at a temperature below freezing point for many years to come.

The snapshot and code deposit took place on 2 February 2020. GitHub announced that it would vault "every repo with any commits between the announcement at GitHub Universe on November 13th and 02/02/2020; every repo with at least 1 star and any commits from the year before the snapshot; and all repos with at least 250 stars."
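The three criteria quoted above can be read as a simple eligibility rule. The following is a minimal sketch of that rule, not GitHub's actual selection code. The `Repo` class and `qualifies_for_vault` function are hypothetical names, the announcement year is assumed to be 2019, and "the year before the snapshot" is assumed to mean the twelve months ending on the snapshot date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Dates taken from the quoted criteria. The year of the announcement
# (2019) is an assumption; the quote only says "November 13th".
ANNOUNCEMENT = date(2019, 11, 13)
SNAPSHOT = date(2020, 2, 2)
# Assumption: "the year before the snapshot" starts twelve months earlier.
YEAR_BEFORE_SNAPSHOT = date(2019, 2, 2)


@dataclass
class Repo:
    """Hypothetical, simplified view of a public repository."""
    stars: int
    # Date of the most recent commit on or before the snapshot,
    # or None if the repository has no commits at all.
    last_commit: Optional[date]


def qualifies_for_vault(repo: Repo) -> bool:
    """Return True if the repo meets any of the three quoted criteria."""
    # Criterion 3: at least 250 stars, regardless of activity.
    if repo.stars >= 250:
        return True
    if repo.last_commit is None:
        return False
    # Criterion 1: any commit between the announcement and the snapshot.
    if ANNOUNCEMENT <= repo.last_commit <= SNAPSHOT:
        return True
    # Criterion 2: at least one star and a commit in the preceding year.
    return (
        repo.stars >= 1
        and YEAR_BEFORE_SNAPSHOT <= repo.last_commit <= SNAPSHOT
    )
```

Under this reading, an unstarred repository last touched in December 2019 would be included, while an unstarred one last touched in mid-2019 would not.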

GitHub’s data is stored offline in the vault on digital film coated in microscopically small, light-sensitive silver halide crystals. The film is essentially improved darkroom photography technology and has a lifespan of 500 to 2,000 years.

The code is stored with the sole purpose of being serviceable to future generations. Thus every reel is accompanied by a human-readable directory that details how to extract information from the films. To ensure the code can be read by future Homo sapiens or others, a guide was composed in English, Hindi, Arabic, Spanish and Chinese. GitHub calls its guide the Tech Tree and describes it as "a selection of works intended to describe how the world makes and uses software today, as well as an overview of how computers work and the foundational technologies required to make and use computers." It is a public Git repository being developed as a joint online effort.

GitHub's 21TB of data is compactly stored as high-capacity 2D barcodes on film reels. A real-life example puts that volume in perspective: "If someone who types at about 60 words a minute sat down and tried to fill up all that space, it would take 111,300 years." Of the 186 reels, only one is needed to hold the code for the Linux and Android operating systems; the same reel also holds 6,000 additional open source applications.
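The typing comparison can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes decimal terabytes, an average of 6 bytes per typed word (five letters plus a space) and a 365-day year, none of which the quoted source states. The function names are hypothetical.

```python
# Figures from the article.
ARCHIVE_BYTES = 21 * 10**12      # 21TB, assuming decimal terabytes
WORDS_PER_MINUTE = 60
REELS = 186

# Assumptions not stated in the article.
BYTES_PER_WORD = 6               # five letters plus a space
MINUTES_PER_YEAR = 60 * 24 * 365


def years_to_type(total_bytes: int,
                  words_per_minute: int = WORDS_PER_MINUTE,
                  bytes_per_word: int = BYTES_PER_WORD) -> float:
    """Years of non-stop typing needed to produce total_bytes of text."""
    bytes_per_minute = words_per_minute * bytes_per_word
    return total_bytes / bytes_per_minute / MINUTES_PER_YEAR


def average_bytes_per_reel(total_bytes: int, reels: int = REELS) -> float:
    """Average amount of data per reel if it were spread evenly."""
    return total_bytes / reels


if __name__ == "__main__":
    print(f"{years_to_type(ARCHIVE_BYTES):,.0f} years of typing")
    print(f"{average_bytes_per_reel(ARCHIVE_BYTES) / 1e9:.0f} GB per reel on average")
```

With these assumptions the estimate comes out at roughly 111,000 years, in the same range as the quoted figure; a slightly different word length or year length would account for the gap. The per-reel average works out to about 113 GB, though the article does not say that data is spread evenly across reels.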

In its introduction to the Tech Tree, GitHub explains that the intention of the GitHub Archive Program is to "preserve open source software for future generations. This implies also preserving the knowledge of other technologies on which open-source software runs, along with a depiction of the open-source movement which brought this software into being."

Although Professor Cinnéide emphasises that "GitHub represents a fascinating and rich snapshot in the development of computation, a field which is still in its infancy," he believes that the Arctic Code Vault "is more of a stunt" and that it is "highly unlikely that the technology required to read this archive will be available in the distant (or even not-too-distant) future, and certainly not in a post-apocalyptic scenario where existing GitHub archives in Oxford, Stanford and Alexandria have been destroyed. So its purpose isn’t clear to me, other than to remind us all to make regular backups."

GitHub shares the reason for maintaining the archive: “As today’s vital code becomes yesterday’s historical curiosity, it may be abandoned, forgotten, or lost. Worse, albeit much less likely, in the case of global catastrophe, we could lose everything stored on modern media in a few generations. Archiving software across multiple organizations and forms of storage helps to ensure its long-term preservation.”

GitHub ensures its users are informed about their code's new home in the frigid zone. It issues virtual Arctic Code Vault Badges to millions of contributors across borders to "recognize and celebrate these contributions."

The next step for the GitHub Archive Program is to attempt to bank all the repositories in the Arctic Code Vault for 10,000 years. To achieve this goal, GitHub is collaborating on Project Silica and expects to imprint its data into the molecular structure of fused silica with the help of a high-precision laser.