Modern Antiquity: Untying the Gordian Knot of Data Preservation

codex.storage

I am passionate about the future of durable, decentralised storage infrastructure for preserving vital archival records, knowledge, and documents. Before joining the Codex organisation and the p2p storage ecosystem, I had a weak grasp of the importance and intricacies of data storage; I was blissfully ignorant.

Thanks to the team's wisdom, I had a zen-like flash of insight. I realised that powerful decentralised storage solutions are essential to combating censorship and internet capture. The world's data is centrally warehoused and controlled, making it a honeypot prone to attacks, outages, leaks, hacks, and censorship.

But that is not all. I had the epiphany that data, generally speaking, is not forever, even in the modern age. Computers, disc drives, and solid-state drives do not guarantee that our data will persist into the future; their mere existence does not imply archival preservation or longevity. Both analogue and digitally stored information tend to age, deteriorate, and evaporate over time, and remain subject to censorship, a problem I call 'modern antiquity.' More on that in a bit.

A solution (even if we cannot guarantee the persistence of data into infinity) begins with understanding the scope of the problem, grasping the solution space, and intuiting the potential of decentralisation and p2p tech. Let's talk briefly about the problem.

Decentralisation Theatre and The Great Unpersoning

The crypto ecosystem largely ignores the importance of data and data ownership. Many Web3 companies store most of their data with centralised platforms like Amazon Web Services and Google Cloud, which rely on warehousing for their cloud storage models. This contributes to the proliferation of 'decentralisation theatre' in the Web3 space, where projects cosplay decentralisation while the reality is more sobering. If a Web3 company promotes a product as decentralised but offloads user data to a cloud storage provider, its bar for decentralisation is low, and that compromises both the data and the user.

Furthermore, cloud storage models rely on 'replication' to protect data. Effectively, replication means that data is copy-pasted onto another server living in another warehouse. However, replication does not prevent censorship, protect against outages, or guarantee that data will persist. Data persists only if the data colonialists, i.e. Google, want it to persist (and even then, there is no guarantee).
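To see why replica counts alone are misleading, here is a minimal Python sketch. It is illustrative only, with made-up failure probabilities, and is not Codex code: three copies look robust against independent disk failures, but because every copy sits with the same provider, a single account-level decision can wipe all of them at once.

```python
# Illustrative only: made-up parameters, not Codex code.
REPLICAS = 3
P_DISK = 0.01       # assumed annual failure probability of a single server
P_PROVIDER = 0.001  # assumed annual probability the provider deletes,
                    # loses, or censors the account itself

# If replicas failed independently, data would be lost only when
# every copy failed in the same year.
p_disks_only = P_DISK ** REPLICAS

# But all replicas live with one provider, so a single account-level
# event (a ban, a policy change, a shutdown) destroys every copy.
# Approximate union bound, ignoring the negligible overlap:
p_total = p_disks_only + P_PROVIDER

print(f"loss from independent disk failures: {p_disks_only:.8f}")  # 0.00000100
print(f"loss including provider-level risk:  {p_total:.8f}")       # 0.00100100
```

The provider-level term dominates by three orders of magnitude: adding more replicas inside the same warehouse does nothing to shrink it.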

Recently, many people have lost their data or access to it, which amounts to losing cherished stories, sensitive information, and knowledge. The case studies are legion. In one example, author K Renee lost the rough draft of her romance novel. She had 200,000 words stored on Google Docs. She tried retrieving her work one day and was told: 'This content is inappropriate.' So the company threw it into an unmarked van and disappeared it, offering no further explanation. Her work has not been restored to date. Effectively, she was censored. Cory Doctorow referred to what is happening as 'The Great Unpersoning.' That is to say: when a person's data ceases to exist, the person ceases to exist.

For an example of mass unpersoning, consider the 2019 AT&T breach. Customer data from the hack surfaced on the dark web in March 2024, impacting 7.6 million current customers and 65.4 million former customers. The company said it had launched an investigation to stop the malware from spreading, but its remedy is palliative: it only stops the bleeding temporarily. The core problem is that AT&T holds a centralised, targetable trove of data.

The overall impact of unpersoning and data breaches is far-reaching. IBM's 2023 report put the average cost of a data breach at USD 4.45 million, a 15% increase over the last three years. The analytics company Surfshark estimated that 4.2 billion people worldwide have been affected by internet censorship. These numbers represent the real-world consequences of data storage breaches and single points of storage failure.


The Codex: History and Future of Data Storage

In this data climate, we live in what I call 'modern antiquity.' Even though our data is digital, it will not necessarily exist in perpetuity; we are 'modern,' but our tech is still in 'antiquity.' Let me explain. 

In antiquity, the most valuable information and content were written and stored as data in codices made of papyrus or vellum. A codex is an ancient handwritten manuscript, the ancestor of the modern book: a stack of papyrus or parchment leaves bound together at one edge. It did the job of storing information, but the material was not especially durable and could only preserve content for a limited number of years. It was fragile, but it was the chief technology the ancients had available at the time.

Indeed, much of the content written in codices did not survive to the present day. In this manner, bodies of work were lost in part or entirely. For example, Quintus Fabius Pictor was a Roman historian who wrote during the third century BC. His work has come down to us only in fragments. Some of it may have been lost to censorship, but mostly the work and its copies fell to the ravages of time. In those days, the job of a scribe was to copy ageing material into a new manuscript, a process which, aside from being tedious, was prone to errors, censorship, mistranslation, and ineptitude.

This concept of 'modern antiquity' suggests that we believe our data will exist forever thanks to the mere presence of new technologies: magnetic drives, flash memory, and cloud storage. However, data is temporal and content is deletion-prone. Hard drives and cloud storage devices act as 'hardware papyri': fragile and ephemeral. In some ways, our modern storage techniques are worse than ancient storage methods such as clay tablets, scrolls, and codices.

(This is also, IMO, why Codex takes its name from these humble origins of data persistence: codices were a novel and revolutionary form of large-quantity data preservation.)

Data: The Gordian Knot 

To solve this problem of data fragility and prevent censorship, we have to distribute data across peers on a network, i.e. peer-to-peer. But there is more. To extend the metaphor, we must make our papyrus or vellum more durable and robust. It can no longer be centralised and prone to failure or destruction. This means having a mechanism that enables data repairs. Imagine papyrus that could rewrite itself and keep itself legible.

You can still use flash memory or magnetic discs on a p2p network, but the data should carry coded redundancies that allow retrieval in the event of loss. These redundancies rely on techniques like erasure coding, but I digress. I won't get into all the technical weeds in this article because we have discussed the tech at length elsewhere. Suffice it to say, we have an engineering Gordian knot to untie.
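For a flavour of what 'coded redundancy' means, here is a minimal Python sketch. It is illustrative only and not Codex's actual scheme: a single XOR parity shard lets us rebuild any one lost shard from the survivors, whereas production systems use Reed-Solomon-style erasure codes that tolerate the loss of many shards at once.

```python
# Minimal illustration of coded redundancy (not Codex's actual scheme):
# one XOR parity shard lets us rebuild any ONE lost shard. Real systems
# use Reed-Solomon codes, which survive the loss of many shards.
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_shards(data: bytes, k: int) -> list:
    """Split data into k equal shards and append one XOR parity shard."""
    shard_len = -(-len(data) // k)               # ceiling division
    data = data.ljust(k * shard_len, b"\0")      # zero-pad to a multiple of k
    shards = [data[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor, shards)]        # parity = XOR of all shards

def repair(shards: list) -> list:
    """Rebuild a single missing shard (marked None) from the survivors."""
    missing = shards.index(None)
    shards[missing] = reduce(xor, (s for s in shards if s is not None))
    return shards

shards = make_shards(b"modern antiquity: preserve this text", k=4)
shards[2] = None                                 # a peer drops off the network
repaired = repair(shards)
print(b"".join(repaired[:4]))                    # original data recovered
```

The point of the metaphor holds: the 'papyrus' here rewrites itself, because any surviving majority of shards can regenerate what was lost, wherever those shards happen to live on the network.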

Worse than engineering such a system, though, are the denial, poor comprehension, and outright ignorance that surround data storage. This is why solving the engineering conundrum is not something we should take lightly or eagerly dismiss. Building this system is essential to developing the crypto economy and the global village we want to live in. Data is not a discardable piece of this picture; it is the whole picture. It is the Gestalt.


Are you a developer or a decentralised storage aficionado? To learn how Codex works and use it yourself, try our testnet today.

To stay informed, follow us on social media, join our Discord, and subscribe to the newsletter below: