Decentralising internet file storage, the why, what and how.

Ryan J Kris
12 min read · Nov 2, 2020

The mission to decentralise internet file storage is underway.

Decentralised storage platforms store data on peer-to-peer distributed networks. They are alternate file storage solutions to existing cloud storage services like Amazon S3.

Internet file storage is simply the retention of data on a computer or device. We store files on a daily basis, whether on our phone, in social media accounts, email or in web apps at work.

Behind the applications we use, the data we save, and the data that is collected about us, is now generally stored in the cloud. This is super convenient, but the cloud is highly centralised.

According to Gartner, five companies currently control 80% of the global cloud infrastructure market — Amazon, Google, Microsoft, Alibaba and Tencent. These companies manage content delivery networks that provide companies the infrastructure to deliver everything from critical business applications to doge memes.

That’s a small number of individual companies controlling the infrastructure for data storage on the Internet. This poses a number of risks that are driving people to develop decentralised alternatives. These include:

  • Limited competition. You must trust these few companies with your data
  • Data breaches are all too common.
  • Data may be censored at the whim of the controller
  • How data is collected and managed is opaque

Decentralised storage developers are seeking to challenge existing cloud service providers with open decentralised economic models.

Decentralised storage platforms are building trustless architectures with new economic models for the storage and retrieval of data on the internet.

Decentralised storage is a component of the emerging web3 stack. Web3 projects are creating an alternate open internet architecture where users own their data and identity. These shifts represent the emergence of a user-controlled internet away from the centralised web2 model we are familiar with today.

This article explores the development of decentralised storage networks and looks at a couple of live storage networks, Filecoin and Sia.

Before we dive in, a quick recap of web3.0.

What is web 3.0?

The vision of web3 is immense. It comes in response to the increasing control of data and centralised power that web2 tech giants have in the economy and the maturation of cryptography and blockchains that power it.

The web3 era is preceded by web1 and web2. In recap — Web1 from around the 1980s to the early 2000s was characterised by the development of open protocols used, governed and owned by communities of users.

Web2, from the mid 2000s to now, saw for-profit tech companies build software and services on top of these protocols. These services were much more sophisticated and developed faster than the open protocols that came before. This attracted users, data and profit, but led to centralised control by large tech companies, like Google, Amazon, Microsoft and Facebook.

Web2 has certainly provided some amazing benefits, but also raised issues around privacy, security, censorship and competition. Check Chris Dixon’s article ‘Why Decentralization Matters’ for a deeper dive.

An internet which is censorship resistant and not centralised around control by a few tech oligopolies forms the web3 vision.

Web3 technologies decentralise those power structures by giving participants partial ownership of the network directly. This is generally in the form of cryptographic tokens on blockchains. Incentivising and rewarding users of networks is how web3 aims to grow and develop services comparable to what web2 providers offer today.

In the decentralised web there is no single authority, no one to implicitly trust. Like Bitcoin and other blockchain projects, the goal is to build trustless open architectures where the rules are codified and transactions can be verified by users. This creates transparency across the network without having to ‘trust’ who you are connected with.

Web3 technologies are broadly aiming to:

  • provide self-sovereignty, giving users more control over their digital identities and data
  • build distributed decentralised networks to limit single points of control and failure
  • allow permissionless access to networks and applications
  • build dependable, trustless infrastructure that is censorship-resistant with verifiable interactions

Each web3 project has a more nuanced set of objectives but these are general foundational objectives. Web3 is a big vision so tradeoffs are inherent. What is ideologically desired may not be technically feasible (yet).

Achieving general availability of decentralised file storage is seen as a foundational pillar of the web3 vision.

For decentralised applications (dapps), which sit at the top of the stack, to meet their principles of being decentralised, they need to ensure the data they are capturing and storing is also decentralised.

If dapps use centralised storage, they are not decentralised anymore. This is where decentralised storage networks as middleware play a key role in the stack.

Decentralised Storage is just one component of the emerging Web3 stack.
Source: Multichain Capital. The red box calls out where Decentralised Storage fits in the Web3 stack.

How decentralised storage works

There are a number of projects building decentralised file storage solutions and they follow a similar vision. Each has its own approach, but they share some common principles, which are certainly different to those of existing cloud storage service providers.

Architecturally in web2, companies control closed databases and own user data. In web3 users own their data on open encrypted p2p networks.

I’ve captured a conceptual model in the diagram below to help visualise the workflows and services that exist between participants in a decentralised storage ecosystem.

Decentralised Storage Architecture, conceptual model to explain users, services and workflows.

Here are a few design principles:

  • The network makes use of the unused storage or hard drive capacity found on servers and in end user devices globally, creating a marketplace for data storage and retrieval over a peer-to-peer network.
  • Tokens are used to incentivise participation in the storage network. Miner nodes get rewarded for storing data on the storage they make available.
  • Users pay to enter into a storage contract to store their data. Contract details are stored on the blockchain. The buyer’s payment and the storage provider’s collateral are held until the storage contract has expired. The host then receives the amount of tokens specified in the original contract, from the buyer’s account.
  • Uploaded data is encrypted and chunked into blobs. Those blobs are hashed and linked together. Data is stored off-chain.
  • Data is replicated to multiple nodes across the network. This reduces centralisation and control of data storage (aka the web2 data monopoly concern) and increases redundancy.
  • As part of the blockchain consensus rules, miner storage nodes must prove that they are maintaining the storage of data they have agreed to store via a publicly-verifiable cryptographic proof.
  • Layer 2 services are built on top of the protocol to create marketplaces, storage applications and other novel solutions using the network and protocol. These L2 services provide friendlier user experiences when interacting with the underlying protocol.
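
The chunk, hash and link step from the list above can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular network does it: the 4-byte chunk size, the `build_manifest` helper and the way the root is derived are all assumptions made for the example, and the encryption step is left out because Python’s standard library has no suitable cipher.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size blobs (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(data: bytes) -> dict:
    """Hash every blob, then hash the list of blob hashes into one root ID.

    The root links the blobs together: changing any blob changes its hash,
    which in turn changes the root.
    """
    blobs = chunk(data)
    blob_hashes = [sha256_hex(b) for b in blobs]
    root = sha256_hex("".join(blob_hashes).encode())
    return {"root": root, "blob_hashes": blob_hashes, "blobs": blobs}


manifest = build_manifest(b"hello world!")
print(len(manifest["blobs"]))  # 3
print(manifest["root"])
```

Only the small root and the contract terms would need to live on-chain in a design like this; the blobs themselves stay off-chain with the storage nodes.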

Why decentralise storage, what are the benefits?

There are a number of reasons to use a decentralised storage network. The list is broad, but it covers:

  • No centralised owners of your data, nor data silos as content is distributed over many nodes
  • Reduced dependency on single internet backbones, single servers and certificate authorities in the content delivery pipeline
  • Immutable data structures ensure data integrity and protection against data destruction and manipulation
  • Increased data privacy as the protocol is blind to the encrypted data stored on its network
  • Reduced storage contract risk, such as changes in storage price or duration, as contracts cannot be changed once agreed.

The key to the viability of a trustless storage architecture is ensuring that the platforms provide data integrity. These solutions must ensure a user can rely on the data they receive from the network being the data they requested.

This is achieved through cryptography. When files are stored on the network they are hashed. When they are requested and received those hashes are checked for a match. If the hashes don’t match, then someone has tampered with the data.
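
As a rough sketch of that check, assuming SHA-256 as the hash function (individual networks may use something else) and with `fingerprint` and `verify` as made-up helper names:

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Hash recorded when the file is stored."""
    return hashlib.sha256(data).hexdigest()


def verify(received: bytes, expected_hash: str) -> bool:
    """Re-hash what came back from the network and compare."""
    return fingerprint(received) == expected_hash


original = b"quarterly-report-v1"
stored_hash = fingerprint(original)

print(verify(original, stored_hash))                # True
print(verify(b"quarterly-report-v2", stored_hash))  # False
```

The person requesting the file only needs to hold the hash, not a trusted copy of the file, to detect tampering.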

A cryptographic hash is a unique fingerprint of the file (or a segment of it). If you have used Bitcoin before you would be familiar with public key hashes that represent your unique wallet on the blockchain. It’s the same concept but the hash represents the unique file segment.

A hash function maps inputs of arbitrary size to an output of fixed size. The same data always produces the same hash, so it’s deterministic. If the data changes, the hash will change. It’s also computationally infeasible to invert, so the data cannot be reconstructed from the hash.
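
These properties are easy to see for yourself. The snippet below uses SHA-256 from Python’s standard library purely as an example of a cryptographic hash; the inputs are arbitrary.

```python
import hashlib

# Fixed-size output: one byte in, or a million bytes in, 32 bytes out.
tiny = hashlib.sha256(b"a").digest()
huge = hashlib.sha256(b"a" * 1_000_000).digest()
print(len(tiny), len(huge))  # 32 32

# Deterministic: the same input always gives the same hash.
first = hashlib.sha256(b"doge meme").hexdigest()
second = hashlib.sha256(b"doge meme").hexdigest()
print(first == second)  # True

# Sensitive to change: altering one character gives a different hash.
changed = hashlib.sha256(b"doge memes").hexdigest()
print(first == changed)  # False
```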

Verifiability is one reason for using immutable data structures. Data can be verifiably persistent (as long as the storage contract is renewed). This adds a native security guarantee against data manipulation and destruction. Additionally the contract itself can easily be verified on chain by all parties.

Data theft and breaches are a growing threat. In its 2019 report, the EU’s lead privacy regulator reported a 71% increase in valid data breaches compared to 2018. Juniper Research projected the cost of data breaches to reach $5trn by 2024. The lack of a centralised data store helps protect against data breach and theft.

Another benefit is the use of content addressing. Decentralised storage platforms use content addressing, not location addressing. When using the web, our browsers load data from a specific server via HTTP, which is known as location addressing.

With content addresses, instead of your browser looking up a central server for a filename, the decentralised file storage network is checked for the content of the file. You may be familiar with this concept if you have used BitTorrent and magnet links before.

This has the benefit of removing reliance on a single centralised server for content and also on the internet backbones that those servers depend on. Servers and backbones are not immune to attack, so distributing the data over a network of thousands of servers improves resiliency.

Content addressing also means the de-duplication of data on the internet. With location addresses, if a file is moved, all links pointing to that location need to be updated. Content addresses refer to the content anywhere on the network. If it exists then the link will always resolve.
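
To make the idea concrete, here is a toy content-addressed store held in memory. The `ContentStore` class is invented for this illustration and uses a SHA-256 hex digest as the address; real networks use their own identifier formats.

```python
import hashlib


class ContentStore:
    """Toy in-memory store where the address of a blob is the hash of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its content address."""
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data  # identical content lands on the same key
        return address

    def get(self, address: str) -> bytes:
        """Fetch data by address and check it really matches that address."""
        data = self._blobs[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
first = store.put(b"much wow")
second = store.put(b"much wow")  # the same file uploaded a second time

print(first == second)   # True
print(len(store))        # 1
print(store.get(first))  # b'much wow'
```

Because the address is derived from the bytes, two uploads of the same file collapse into one entry, and anyone holding the address can check that what they got back is what they asked for.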

Examining real world decentralised networks

As part of this article I looked deeper into two networks which are live today, Filecoin and Sia. Both share similar visions and ambitions, but have different levels of funding and are at differing stages of development. Let’s dive in.

Filecoin

Filecoin is a decentralised data storage network allowing users to sell their excess storage on an open platform. It acts as the incentive and security layer for IPFS (InterPlanetary File System), a peer-to-peer network for storing and sharing data files. Filecoin and IPFS are both developed by Protocol Labs in San Francisco.

Filecoin had a $257M token sale in the 2017 ICO boom period and, after a long period of development, recently launched on mainnet in October 2020. They have significant backing from VC firms including Andreessen Horowitz and Sequoia Capital.

Filecoin’s native token, FIL, is used to pay storage providers to store and distribute data on the network. FIL becomes the incentive for a host to make content available on the network persistently, creating a marketplace for decentralised storage.

For content delivery it has built both storage and retrieval markets. The storage market is for those offering high-capacity, archival storage, while the retrieval market is for those who need low-latency, fast retrieval of files. The marketplace optimises resources and helps users select providers that match their service needs, whilst helping drive the cost of storage down as the network grows.

Filecoin’s blockchain records details of storage deals between users and miners, transactions for sending and receiving FIL, along with proofs from storage miners that they are storing their files correctly. Content is not directly stored on the Filecoin blockchain, but by the miners themselves.

To get their rewards, it is up to the miners to prove they are storing and making the data available for users. This is achieved through Filecoin’s Proof-of-Replication and Proof-of-Spacetime consensus mechanisms. These proofs allow a miner to demonstrate that they have stored certain data locally, and that they are storing it over a period of time. In turn, the user pays a fee for the storage service provided by the network.
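
To give a feel for what proving storage can look like, here is a heavily simplified challenge-and-response sketch built on a Merkle tree. To be clear, this is not Filecoin’s Proof-of-Replication or Proof-of-Spacetime, which are far more involved. It is a generic illustration, and every name in it (`merkle_levels`, `prove`, `verify`) is made up for the example.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(chunks: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree: leaf hashes first, root level last."""
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd count: pad by repeating the last node
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Miner side: collect the sibling hashes from the leaf up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # the neighbour in the same pair
        index //= 2
    return path


def verify(root: bytes, chunk: bytes, index: int, path: list[bytes]) -> bool:
    """Verifier side: rebuild the root from one chunk plus its sibling path."""
    node = h(chunk)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


chunks = [b"blob-0", b"blob-1", b"blob-2", b"blob-3", b"blob-4"]
levels = merkle_levels(chunks)
root = levels[-1][0]  # the only value the verifier needs to keep

challenge = secrets.randbelow(len(chunks))  # verifier picks a random chunk
proof = prove(levels, challenge)            # miner answers with chunk + path
print(verify(root, chunks[challenge], challenge, proof))  # True
```

The verifier keeps only the root. A miner that has thrown the data away cannot answer a random challenge, because it can no longer produce the requested chunk. A production design would also need to separate leaf and internal hashes and handle odd-sized trees more carefully than this sketch does.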

Filecoin currently has 554PB of network storage across 642 hosts globally.

To learn more about Filecoin

Sia

Sia is a platform for decentralised cloud data storage built by Boston-based Nebulous Inc. It connects ‘hosts’, people who have underutilised hard drive capacity, with ‘renters’ in search of data storage.

Nebulous first launched the Sia blockchain in 2016 after running one of the early ICOs in 2015, raising $120,000. Since launch they have raised a further $9m in funding, including a $3.5m Series A in 2019, from Bain Capital and others and $3m in 2020 led by Paradigm. The latest funding round also saw Nebulous re-brand to Skynet Labs as part of their push on Skynet — see below.

The funding differential to Filecoin is enormous, and this team has achieved a lot following a more traditional startup approach of build, measure, learn. The Sia network has been in a production state for years now, which does give stronger assurance to new users.

The Sia blockchain uses proof-of-work mining for consensus like that of Bitcoin. Sia only maintains cryptographic storage contracts that are formed between the parties and paid with Siacoin (SC).

Storage contracts stipulate terms such as capacity, price and storage duration requirements to rent disk space. This is then recorded on the Sia blockchain. The host of the data will only get paid if they are still storing the data when the storage contract expires.
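
The pay-on-expiry rule can be modelled as a tiny escrow. This is a conceptual sketch and not Sia’s actual contract format: the field names are invented, and what happens when the host fails (renter refunded, collateral forfeited) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: terms cannot be changed once agreed
class StorageContract:
    capacity_gb: int
    price: int          # tokens the renter locks up front
    collateral: int     # tokens the host locks up front
    expiry_height: int  # block height at which the contract ends


def settle(contract: StorageContract, current_height: int,
           host_proved_storage: bool) -> dict[str, int]:
    """Release the escrowed funds once the contract has expired."""
    if current_height < contract.expiry_height:
        raise ValueError("contract has not expired yet")
    if host_proved_storage:
        # Host is paid and gets its collateral back.
        return {"host": contract.price + contract.collateral,
                "renter": 0, "forfeited": 0}
    # Assumed failure rule for this sketch: refund renter, forfeit collateral.
    return {"host": 0, "renter": contract.price,
            "forfeited": contract.collateral}


contract = StorageContract(capacity_gb=100, price=50, collateral=20,
                           expiry_height=1_000)
print(settle(contract, 1_000, host_proved_storage=True))
# {'host': 70, 'renter': 0, 'forfeited': 0}
print(settle(contract, 1_000, host_proved_storage=False))
# {'host': 0, 'renter': 50, 'forfeited': 20}
```

Holding both the payment and the collateral until expiry is what gives the host a reason to keep the data around for the full term.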

The team has followed Sia with the launch of Skynet in 2020 as a content distribution network (CDN). Skynet is a layer 2 application platform built on Sia’s network. It enables filesharing, dynamic websites, shared databases, and many more things that are not possible on Sia alone.

Skynet data is available through Skynet Web Portals, which are hosts running the Sia client. These portals serve as proxies to the Sia network, allowing for the storage and retrieval of files. Skylinks represent content-based links to files. Skynet data is unencrypted and designed to be publicly accessible, as opposed to the private encrypted data stored directly on Sia.

Uniswap, a popular decentralised exchange (DEX), is one key customer already using Skynet. Since Skynet uses content addressing, people opening Uniswap via a Skynet browser can be assured that the web page they are viewing has not been hijacked and is the actual Uniswap site. Check out other Skynet apps in the Skynet App Store now.

Sia is currently hosting around 2.14PB of data across 304 hosts globally.

To learn more about Sia and Skynet

Where to from here?

It’s hard to predict how projects like Sia and Filecoin will scale and whether they become go-to platforms of choice for file storage. What is promising is the level of innovation and experimentation happening, driven by some very determined individuals and teams developing these solutions.

There are challenges ahead for these platforms, and for others pursuing grand visions to challenge incumbent cloud storage providers, that should be noted.

The sheer technical complexity of building such networks may mean the user experience will not be comparable to what we have today. Users want confidence that the availability and redundancy of their data will match what existing service providers offer today.

Getting the token economics right on these networks is critical. Network growth can only be unlocked if the incentives for users to participate and store data are there.

Users also want a price-stable service that is secure, scalable and actually decentralised. Amazon’s S3 service is often used as a price comparison point, but price comparisons alone may not sway people away from the current service providers. As the crypto ecosystem pie grows larger, the bigger opportunity is becoming the go-to storage service for crypto-native companies.

These projects, like other open source web3 projects, must also foster a strong community of developers and users to ensure the networks’ continued growth and maturity. Continued hackathons, grants and community conferences all serve this goal, as does continued transparency around the product roadmap, network performance and analytics.

As a key middleware component of the web3 stack, these services must also consider standards and specifications development for interfacing with other web3 services. Difficult web3 problems like data permissions, interoperability and multi-party sharing are still to be resolved.

Decentralised file storage solutions represent an ambitious and disruptive alternative to today’s centralised cloud storage providers and a key component of the emerging web3 stack. Blockchains, token incentives, cryptographic proofs, p2p networks and spare hard drive capacity are pieced together in a novel way that re-architects storage on the Internet. A paradigm shift in data storage could be upon us.

Ryan J Kris

COO @ Verida.io — crypto + web3 + privacy. Economics, finance, history buff, travel & good food a must.