InterPlanetary File System (IPFS) :- The new internet!

Significance!

Do we really need a new internet? Maybe not. We certainly don’t feel the need for one right now. But did the generation before us ever feel the need for the internet in the first place? If you went back and asked an average person in the late ’70s or early ’80s whether they felt the need for a structure consisting of a web of hyper-linked texts (the internet), the answer would be a big “No”. That is why the internet was not an instant hit when it came out; it took years to become genuinely popular among ordinary, non-technical people.

Now we can clearly see why the internet was needed in the first place. Just look at people today enjoying a string of luxuries that, two decades ago, they did not even know could exist. We can feed a destination into a self-driving car, and it will find the best route there on its own with the help of GPS. If we don’t have time to go shopping for a friend’s wedding, we can order clothes on e-commerce websites like eBay. We can watch movies and TV shows online on Netflix and Amazon Prime Video, showcase our talent on YouTube, and video-chat with far-away friends on Skype. As Elon Musk said, “We’re already a cyborg. We have a digital version of ourselves, a partial version of ourselves online in the form of our emails, our social media, and all the things that we do. We already have small superpowers.” We have extensions of ourselves living virtually with us in the form of the apps on our smartphones.

The basic point is that we don’t have to feel the need for a luxury before the innovation behind it takes place. Innovation provides us with luxuries we never, even in our wildest dreams, felt the need for. Maybe virtual-reality devices will one day simulate houses we are interested in, so that we can live in the simulation for a while and decide more clearly whether to buy. We may not feel the need for a new version of the internet today, but it might give us some of the most important luxuries of the future.


Why a new internet might be crucial?

Look at the industrial revolution, whose basic driving forces were electricity and the fuels that machines ran on. Are electricity and fuel free? No, but they are available to everyone in open markets, and anyone with money can get access to them for their own purposes. The industrial revolution thus played out on a level playing field.

Currently, we are on the verge of a new revolution: the A.I. revolution. But is there a level playing field for this revolution? The answer is a big “No”. The driving force of the A.I. revolution will be data: the more data you have, the smarter your A.I. will be. So is data free? Of course it is, for the organizations whose services we use and hand our data to in return. Organizations like Google, Amazon, Microsoft, and Facebook collect billions of terabytes of data that their users provide for free. These organizations are called “Stacks”. Stacks have an abundance of data and are using it to feed their A.I.s, and with the amount of data they hold, their A.I.s will not be simple ones; they will be a superior breed, so-called super A.I.s. So there is no level playing field in the A.I. revolution, and it will only widen the already huge financial gap to another extreme. As we know, the wealth of just 3 percent of the people on earth currently equals the wealth of the other 97 percent. So what can we do to counter this situation?

We need a new internet.


Required attributes of the new internet!

  • Decentralized system :- The system should be decentralized. No node should control or order other nodes; every node should be of equal value.
  • Democratic system :- Majority consensus should be given importance.
  • Data security :- Data uploaded or created by users should be secure, and accessible to anyone else only if the user grants permission.
  • Wealth for data :- User data should have value, perhaps in cryptocurrency (Bitcoin) terms, so that an exchange of data is also an exchange of wealth.
  • Speed :- A faster version would be cool.
  • Shared memory :- All accumulated data should be replicated across every node’s memory, so that data is owned by everyone and yet by no one.

There is a paper titled “InterPlanetary File System” by Juan Benet which proposes such a system.

Introduction to the InterPlanetary File System!


There have been many attempts at constructing a global distributed file system; Project Xanadu is an early example. Among the academic attempts, AFS has succeeded widely and is still in use today. Outside of academia, the most successful systems have been peer-to-peer file-sharing applications, primarily geared toward large media (audio and video). Most notably, Napster, KaZaA, and BitTorrent deployed large file-distribution systems supporting over 100 million simultaneous users. Even today, BitTorrent maintains a massive deployment where tens of millions of nodes churn daily. These applications saw greater numbers of users and files distributed than their academic file-system counterparts, but they were not designed as infrastructure to be built upon. While there have been successful repurposings, no general file system has emerged that offers global, low-latency, decentralized distribution.

Perhaps this is because a “good enough” system for most use cases already exists: HTTP. By far, HTTP is the most successful “distributed system of files” ever deployed. Coupled with the browser, HTTP has had enormous technical and social impact and has become the de facto way to transmit files across the internet. Yet it fails to take advantage of dozens of brilliant file-distribution techniques invented in the last fifteen years. From one perspective, evolving Web infrastructure is near-impossible, given the number of backwards-compatibility constraints and the number of strong parties invested in the current model. But from another perspective, new protocols have emerged and gained wide use since HTTP appeared. What is lacking is upgraded design: enhancing the current HTTP web and introducing new functionality without degrading the user experience.

Industry has gotten away with using HTTP this long because moving small files around is relatively cheap, even for small organizations with lots of traffic. But we are entering a new era of data distribution with new challenges:

(a) Hosting and distributing petabyte datasets,

(b) Computing on large data across organizations,

(c) High-volume, high-definition on-demand or real-time media streams,

(d) Versioning and linking of massive datasets,

(e) Preventing accidental disappearance of important files, and more.

Many of these can be boiled down to “lots of data, accessible everywhere.”

Git, the distributed source-code version-control system, developed many useful ways to model and implement distributed data operations. The Git toolchain offers versatile versioning functionality that large file-distribution systems severely lack. New solutions inspired by Git are emerging, such as Camlistore, a personal file storage system, and Dat, a data collaboration toolchain and dataset package manager. Git has already influenced distributed filesystem design, as its content-addressed Merkle DAG data model enables powerful file-distribution strategies. What remains to be explored is how this data structure can influence the design of high-throughput file systems, and how it might upgrade the Web itself.

What is IPFS?


The InterPlanetary File System (IPFS) is a peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files. In some ways, IPFS is similar to the Web, but it could also be seen as a single BitTorrent swarm exchanging objects within one Git repository. In other words, IPFS provides a high-throughput, content-addressed block storage model with content-addressed hyperlinks. This forms a generalized Merkle DAG, a data structure upon which one can build versioned file systems, blockchains, and even a Permanent Web. IPFS combines a distributed hash table (DHT), an incentivized block exchange, and a self-certifying namespace. It has no single point of failure, and nodes do not need to trust each other.


Aim :- To help us move towards a permanent decentralized web.

Attributes of IPFS :-

  • Permanence :- A web whose links never die.
  • Decentralized :- No single entity controls your data, and if any node fails the system still works.
  • Secure :- Your data is secured and can’t be misused by anyone; all data is owned by everyone and yet by no one.
  • Fast :- Because data is distributed across several nodes, retrieval is quick.
  • Robust :- A file can be served by the nearest node that has it.
  • Credit permanence :- Credit is also permanent on IPFS, so the creative work of users is safe.

Working of IPFS :-

  • Upon downloading an IPFS client, a user is able to add any data to it,
  • and in return receives a hash.
  • The user can then access that data via its hash (see the sketch below).
  • IPFS uses a content-addressed system instead of an Internet Protocol (IP) based one.
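Concretely, the add-then-retrieve flow looks something like the sketch below. It assumes a local IPFS daemon is running and uses the community `ipfshttpclient` Python package; exact method names vary between client versions.

```python
# Minimal add-then-retrieve sketch (assumes a local IPFS daemon on the
# default API port and the third-party `ipfshttpclient` package).
import ipfshttpclient

# Connect to the local daemon's default API endpoint.
client = ipfshttpclient.connect()

# Adding data returns its content hash (a multihash-based address).
content_hash = client.add_str("Hello, permanent web!")
print("content hash:", content_hash)  # e.g. "Qm..."

# Any node on the network can now retrieve the data by that hash.
data = client.cat(content_hash)
print("retrieved:", data.decode())
```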

Content-addressed system :-

  • In an IP-addressed system, if a nameserver fails, effectively so does all of its data.
  • Content addressing is a much more efficient way of addressing data because it doesn’t rely on a single server’s uptime (see the toy sketch below).
  • When you request data by its content address, you’ll receive it faster than you would IP-addressed data, because it is routed from whoever holds a copy of that content closest to you.
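The toy sketch below captures the core idea: the address is derived from the content itself, so any peer holding the same bytes can serve the request. (Real IPFS wraps the digest in a multihash, which also records the hash function used.)

```python
# Toy content addressing: the address of data is a hash of the data,
# not a location. (IPFS uses multihashes, not bare SHA-256 digests.)
import hashlib

def content_address(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Two peers holding identical bytes produce the identical address,
# so a request can be served by whichever copy is closest.
assert content_address(b"the same document") == content_address(b"the same document")

# Change one byte and the address changes completely, which makes
# content-addressed links tamper-evident.
print(content_address(b"the same document"))
print(content_address(b"the same document!"))
```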


Architecture of IPFS :-

  • IPFS uses a DHT to store data. It’s based on the popular Kademlia DHT and borrows ideas from Chord and BitTorrent’s DHT.
  • When users upload data to IPFS, that data is copied to a certain number of other nodes, so even if one node fails the data remains available.
  • On top of that, and like BitTorrent, the more nodes that want the data, the more resilient it becomes, as each of them shares the copy it downloads.
  • Chord’s killer feature was its DHT circles, which created “chords” to speed up DHT lookups among nodes across the globe that were in close proximity to one another within larger chords. The globe would look like a series of increasingly larger chords, and lookups would benefit from this efficiency, hopping between chords where necessary.
  • IPFS uses a data structure called a Merkle DAG.


Distributed hash tables (DHTs) :-


  • Distributed Hash Tables (DHTs) have taken off in popularity in the past decade. They distribute not only copies of the data, but also the indexing functions that enable the data to be found, ensuring resiliency. Early peer-to-peer (P2P) file-sharing programs like KaZaA, Napster, and Gnutella used their own versions of DHTs with varying levels of decentralization. Some had centralized trackers to monitor the movement of all data, and some (like Napster) had central sources that all data had to go through, leaving them with a single point of failure (in this case, due to legal action).
  • The first DHT implementation to really take off was BitTorrent’s, and BitTorrent is still used by more than 300 million users. Despite having a decentralized data store (the BitTorrent Mainline DHT), it still depends on centralized index sites (like Pirate Bay) to navigate the network. Sites like Pirate Bay are regularly shut down by legal action, so even with BitTorrent’s data resiliency, it still has some points of failure. So wouldn’t it be great if we could use BitTorrent’s DHT to store our dapp’s data? BitTorrent doesn’t just offer a decentralized data store; it offers a data distribution protocol that maximizes bandwidth via a tit-for-tat strategy between seeders and leechers.
  • BitTorrent’s data transfer protocol is even faster than the Web’s, and as such it has become the de facto method of transferring large datasets like HD movies over the Web. The problem with using BitTorrent as a data store is that nodes have little incentive to store your data for the long term. The network is set up to prioritize files with high demand: people have to want your data for it to be replicated and continually stored in the network. In contrast, when using a reputable central provider like Amazon Web Services, you know your data will continue to exist even if you are its only user, because their reputation is at stake, they are contractually obligated to keep it, and they don’t depend on others wanting the data in order to store it.

Kademlia DHT :-


Kademlia DHT is a popular DHT that provides:

  •  Efficient lookup through massive networks: queries on average contact only ⌈log2(n)⌉ nodes (e.g. 20 hops for a network of 10,000,000 nodes), as illustrated in the sketch after this list.
  •  Low coordination overhead: it optimizes the number of control messages it sends to other nodes.
  •  Resistance to various attacks by preferring long-lived nodes.
  •  Wide usage in peer-to-peer applications, including Gnutella and BitTorrent, forming networks of over 20 million nodes.
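The sketch below illustrates the XOR distance metric behind those lookups. The helpers are invented for the demo; a real Kademlia node also maintains routing buckets keyed by distance prefix.

```python
# Toy Kademlia distance: "closeness" between a node ID and a key is the
# integer value of (id XOR key). A lookup repeatedly asks the k closest
# known peers for peers even closer to the key; the distance roughly
# halves per hop, which is why lookups take about log2(n) steps.
import hashlib

def make_id(name: str) -> int:
    # Hypothetical helper: derive a 256-bit ID from a name for the demo.
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    return a ^ b

peers = [make_id(f"peer-{i}") for i in range(1000)]
key = make_id("some-content-key")

# Keep the k closest peers (k = 3 here), as a node would per lookup step.
k_closest = sorted(peers, key=lambda p: xor_distance(p, key))[:3]
for p in k_closest:
    print(f"distance to key: {xor_distance(p, key).bit_length()} bits")
```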

[Figure 1: Locating a node in the Kademlia overlay]

Coral DSHT :-


While some peer-to-peer filesystems store data blocks directly in DHTs, this “wastes storage and bandwidth, as data must be stored at nodes where it is not needed”. The Coral DSHT extends Kademlia in three particularly important ways:

  • Kademlia stores values in nodes whose IDs are “nearest” (using XOR-distance) to the key. This does not take into account application data locality, ignores “far” nodes that may already have the data, and forces the “nearest” nodes to store it whether they need it or not. This wastes significant storage and bandwidth. Instead, Coral stores addresses of peers who can provide the data blocks.
  • Coral relaxes the DHT API from get_value(key) to get_any_values(key) (the “sloppy” in DSHT). This still works since Coral users only need a single (working) peer, not the complete list. In return, Coral can distribute only subsets of the values to the “nearest” nodes, avoiding hot-spots (overloading all the nearest nodes when a key becomes popular). See the sketch after this list.
  • Additionally, Coral organizes a hierarchy of separate DSHTs called clusters depending on region and size. This enables nodes to query peers in their region first, “finding nearby data without querying distant nodes”, greatly reducing the latency of lookups.
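The contrast between the strict and the “sloppy” get can be sketched as below. The keys and peer names are invented; as described above, the index maps a key to the addresses of peers that can provide the block, not to the block itself.

```python
# Toy DSHT index: key -> addresses of peers advertising the data block.
index = {
    "QmSomeKey": ["peer-3", "peer-17", "peer-42", "peer-99"],
}

def get_value(key: str) -> list[str]:
    # Strict DHT semantics: return the complete provider list.
    return index[key]

def get_any_values(key: str, want: int = 1) -> list[str]:
    # Sloppy semantics: any `want` working providers are enough. A node
    # can therefore answer from a partial subset, and popular keys do
    # not overload the nodes whose IDs are nearest to them.
    return index[key][:want]

print(get_any_values("QmSomeKey"))  # ['peer-3'] suffices to fetch the block
```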

S/Kademlia :-


S/Kademlia extends Kademlia to protect against malicious attacks in two particularly important ways:

  • S/Kademlia provides schemes to secure NodeId generation and prevent Sybil attacks. It requires nodes to create a PKI key pair, derive their identity from it, and sign their messages to each other. One scheme includes a proof-of-work crypto puzzle to make generating Sybils expensive (a toy version is sketched after this list).
  • S/Kademlia nodes look up values over disjoint paths, in order to ensure that honest nodes can connect to each other even in the presence of a large fraction of adversaries in the network. S/Kademlia achieves a success rate of 0.85 even with an adversarial fraction as large as half of the nodes.
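A toy version of such a proof-of-work identity puzzle is sketched below. It is illustrative only: the actual S/Kademlia scheme derives the NodeId from a signing key pair rather than from random bytes.

```python
# Toy proof-of-work identity puzzle: an ID is valid only if its hash has
# `difficulty` leading zero bits, so each identity costs real computation,
# making mass-produced Sybil identities expensive.
import hashlib
import os

def valid(node_id: bytes, difficulty: int = 16) -> bool:
    digest = hashlib.sha256(node_id).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty) == 0

def generate_node_id(difficulty: int = 16) -> bytes:
    # Search over random candidates until one satisfies the puzzle
    # (expected ~2**difficulty attempts).
    while True:
        candidate = os.urandom(32)
        if valid(candidate, difficulty):
            return candidate

nid = generate_node_id()
print("found valid node id:", nid.hex())
```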

IPFS Protocol Structure :-

The IPFS protocol is divided into a stack of sub-protocols, each responsible for a different piece of functionality (a structural sketch in code follows the list):

  • Identities – manage node identity generation and verification.
  • Network – manages connections to other peers, uses various underlying network protocols. Configurable.
  • Routing – maintains information to locate specific peers and objects. Responds to both local and remote queries. Defaults to a DHT, but is swappable.
  • Exchange – a novel block exchange protocol (BitSwap) that governs efficient block distribution. Modelled as a market, weakly incentivizes data replication. Trade Strategies swappable.
  • Objects – a Merkle DAG of content-addressed immutable objects with links. Used to represent arbitrary data structures, e.g. file hierarchies and communication systems.
  • Files – versioned file system hierarchy inspired by Git.
  • Naming – A self-certifying mutable name system.
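To make the layering concrete, here is a structural sketch in Python. All class and method names are illustrative, not the actual go-ipfs APIs; the point is that each layer is an interface whose implementation can be swapped.

```python
# Structural sketch of the sub-protocol stack: each layer is an interface
# with swappable implementations, composed into a node.
from dataclasses import dataclass
from typing import Protocol


class Routing(Protocol):
    def find_providers(self, key: str) -> list[str]: ...


class Exchange(Protocol):
    def fetch_block(self, key: str, peer: str) -> bytes: ...


class DictRouting:
    """Toy routing table: key -> peers known to hold the block."""
    def __init__(self, table: dict[str, list[str]]):
        self.table = table

    def find_providers(self, key: str) -> list[str]:
        return self.table.get(key, [])


class DictExchange:
    """Toy exchange: pretends to fetch a block from a given peer."""
    def __init__(self, blocks: dict[str, bytes]):
        self.blocks = blocks

    def fetch_block(self, key: str, peer: str) -> bytes:
        return self.blocks[key]


@dataclass
class Node:
    routing: Routing    # e.g. a Kademlia DHT, but swappable
    exchange: Exchange  # e.g. BitSwap, with pluggable trade strategies

    def get(self, key: str) -> bytes:
        # Locate peers that provide the block, then fetch from one of them.
        providers = self.routing.find_providers(key)
        return self.exchange.fetch_block(key, providers[0])


node = Node(DictRouting({"QmX": ["peer-1"]}), DictExchange({"QmX": b"block"}))
print(node.get("QmX"))  # b'block'
```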


Merkle DAG (Directed Acyclic Graph) :-


  • A Merkle DAG is a simple, flexible data structure that can be conceptualized as a series of nodes connected to each other: a directed acyclic graph (DAG) whose links are hashes.
  • A Merkle DAG can look like a linked list or a tree.
  • When adding data to the DHT, the system generates a SHA-256 multihash and a public-private key pair, and the user gets both.
  • Developers can link hashes programmatically to form their own mini Merkle DAGs (see the sketch below).
  • All data in IPFS forms one generalized Merkle DAG spanning all nodes.
  • All data on IPFS is public, so it is the users’ responsibility to encrypt their data accordingly.
  • The private keys, in addition to allowing access to the data, can prove ownership.
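A minimal content-addressed DAG can be sketched as below. It is a toy model: IPFS serializes objects with protobufs and links them by multihash, not JSON and bare SHA-256 digests.

```python
# Toy Merkle DAG: each node's identity is the hash of its serialized
# content, and links are hashes, so a node's hash transitively commits
# to everything reachable from it.
import hashlib
import json

store: dict[str, bytes] = {}  # hash -> serialized node

def put_node(data: str, links: list[str]) -> str:
    blob = json.dumps({"data": data, "links": sorted(links)},
                      sort_keys=True).encode()
    h = hashlib.sha256(blob).hexdigest()
    store[h] = blob
    return h

leaf_a = put_node("chunk A", [])
leaf_b = put_node("chunk B", [])
root = put_node("file manifest", [leaf_a, leaf_b])

# Changing any leaf changes its hash, which changes the root hash:
# the root address authenticates the whole graph.
print("root:", root)
```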

Filecoin (based on BitSwap) :-

  • Filecoin is used to pay miners (nodes that store data) through a novel value-for-data mechanism called BitSwap.
  • Cryptocurrency makes sense here: its value transfer is fast, and it allows for micropayments to pay for every correlated byte of storage.
  • Filecoin is currently in development.
  • Eventually, all uploads and downloads will require Filecoin.
  • Filecoin will most likely be an asset built directly on Bitcoin’s blockchain, so users can simply use their Bitcoin to pay for storage.


Git versioned Filesystem :-

  • IPFS borrows from Git’s version-control model to version all data.
  • Git uses a DAG to model versions of data and IPFS uses it to give structure to the entire system.
  • Users can see the version history of their data (or any data to which they have decrypted access).
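That model can be sketched over a content-addressed store as below; every helper and field name here is invented for the demo.

```python
# Toy Git-style versioning over a content-addressed store: each commit
# links to a data snapshot and to its parent commit, so the latest
# commit hash pins the entire, tamper-evident history.
import hashlib
import json

store: dict[str, bytes] = {}

def put(obj: dict) -> str:
    blob = json.dumps(obj, sort_keys=True).encode()
    h = hashlib.sha256(blob).hexdigest()
    store[h] = blob
    return h

v1 = put({"data": "report.txt v1"})
c1 = put({"commit": "initial", "snapshot": v1, "parent": None})

v2 = put({"data": "report.txt v2"})
c2 = put({"commit": "fix typos", "snapshot": v2, "parent": c1})

# Walking the parent links from the latest commit recovers the history.
node = json.loads(store[c2])
while node:
    print(node["commit"])
    node = json.loads(store[node["parent"]]) if node["parent"] else None
```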

Probable Use-cases :-

IPFS is designed to be used in a number of different ways.

  • As a mounted global filesystem, under /ipfs and /ipns.
  • As a mounted personal sync folder that automatically versions, publishes, and backs up any writes.
  • As an encrypted file or data sharing system.
  • As a versioned package manager for all software.
  • As the root filesystem of a Virtual Machine.
  • As the boot filesystem of a VM (under a hypervisor).
  • As a database: applications can write directly to the Merkle DAG data model and get all the versioning, caching, and distribution IPFS provides.
  • As a linked (and encrypted) communications platform.
  • As a new Permanent Web where links do not die.

Competitors :-

There are other notable contenders in the space, as well.

Ethereum Swarm :-


Ethereum is working to build a general-purpose (Turing-complete) blockchain computing platform, including decentralized storage. As of this writing (2016), their efforts are focused on securing the DAO (Decentralized Autonomous Organization), and storage has been put on the back burner.

StorJ :-


StorJ has garnered a lot of hype lately; it has pre-mined a lot of StorJcoins and has made some pretty designs. The designs are neat, it won an Austin hackathon, and the group seems to know what it’s talking about. Despite all of this, more than a year after the hackathon it is still vaporware.

Maidsafe :-

Maidsafe, like Ethereum, is trying to do many things. It does not use proof-of-work and aims to create a decentralized platform for computing, storage, and currency. The team has been working on the platform for six years, and the project still doesn’t seem to have gained much traction.


Conclusion :-

So, IPFS takes the best ideas from Git, DHTs, SFS, BitTorrent, and Bitcoin and combines them into a decentralized data-storage network. IPFS hopes to one day replace the Web’s http:// protocol with ipfs://, but the two can also work in unison in several ways that we’ll get into when we begin talking about implementation.

IPFS is the most robust, thought-through solution to decentralized storage out of all the cryptocurrency projects.

References :-

  • Juan Benet, “IPFS - Content Addressed, Versioned, P2P File System” (whitepaper).