A Network Analyst's View of the Blockchain

This article was published on CoinDesk.

The blockchain is a decentralized, consensus-driven ledger of every successful bitcoin transaction to date. As of the 300,000th block, the ledger includes over 38 million transactions.

Aside from being a monumental technical achievement, the blockchain is a fascinating dataset. We can use it to create a transaction network that models the flow of bitcoins from the creation of the genesis block to the present day.

In this network, every node represents a transaction and every (directed) edge represents a flow of bitcoins from an output of one transaction to an input of another. This large, complex network has over 38 million nodes and 85 million edges.
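To make the construction concrete, here is a minimal sketch in Python. The `Tx` record, the `build_edges` function and the toy data are illustrative assumptions, not a real blockchain parser or the format used for the analysis in this article. Each input is assumed to name the earlier transaction, and the output index, whose bitcoins it spends.

```python
from collections import namedtuple

# Hypothetical, simplified transaction record (an assumption for illustration).
# Each input is a pair (previous txid, output index) naming the output it spends.
Tx = namedtuple("Tx", ["txid", "inputs", "n_outputs"])


def build_edges(transactions):
    """Return one directed edge (spent txid, spending txid) per input."""
    edges = []
    for tx in transactions:
        for prev_txid, _output_index in tx.inputs:
            edges.append((prev_txid, tx.txid))
    return edges


# Toy data: "a" and "b" stand in for coin-generating transactions,
# modeled here as having no inputs.
toy_transactions = [
    Tx("a", [], 1),
    Tx("b", [], 2),
    Tx("c", [("a", 0), ("b", 0)], 2),
    Tx("d", [("c", 0)], 1),
]

toy_edges = build_edges(toy_transactions)
for source, target in toy_edges:
    print(source, "->", target)
```

In this sketch an output that has not yet been spent contributes no edge, because there is no input to connect it to.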

Network Science

Network science is the study of complex networks. It provides theories, techniques and tools that help us understand the structure and evolution of a network. The bitcoin transaction network is a prime example. Its basic building block, the transaction, can be combined to produce complex transfers of value. This is reflected in the topological structure of the transaction network.

The network as a whole is too large and complex for most network visualization tools. However, we can measure various structural properties of the network. For example, transactions can be characterized by their varying numbers of inputs and outputs. But how are these numbers distributed in practice? In the transaction network, this corresponds to the in- and out-degrees of the nodes, and we can plot the in- and out-degree distributions. They show, for each possible degree, the number of nodes in the network with that degree.

In both cases, we observe an inverse relationship between degree and frequency. The lower the degree, the more frequently nodes with that degree occur; the higher the degree, the less frequently they occur. There are many outliers. The outlier in the out-degree distribution at out-degree two reflects an abundance of transactions with exactly two outputs, typically a payment and the change returned to the sender.
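As a rough illustration of how such distributions can be tabulated, the following Python sketch counts in- and out-degrees from a list of directed edges. The function name `degree_distributions` and the toy edge list are assumptions made for the example. Note that it counts edges only, so an unspent output does not add to a node's out-degree here.

```python
from collections import Counter


def degree_distributions(edges):
    """Map each degree to the number of nodes that have that degree."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    # Every node that appears at either end of an edge.
    nodes = set(in_degree) | set(out_degree)
    # A Counter returns 0 for a missing key, so nodes with no incoming
    # (or no outgoing) edges are counted under degree 0.
    in_dist = Counter(in_degree[node] for node in nodes)
    out_dist = Counter(out_degree[node] for node in nodes)
    return in_dist, out_dist


# Toy edge list (illustrative only).
example_edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]
in_dist, out_dist = degree_distributions(example_edges)
print("in-degree distribution:", dict(in_dist))
print("out-degree distribution:", dict(out_dist))
```

Plotting these counts against degree, usually on logarithmic axes, gives the kind of distribution described above.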

Giant Connected Component

Suppose we were able to visualize the entire bitcoin transaction network. It would probably resemble a "hairball". Such visualizations suffer from so much clutter and over-plotting that they are unusable for most practical purposes. However, they do provide one key piece of information: whether we are dealing with one large connected component or several smaller ones.

A connected component is a group of nodes and edges that are all connected to each other, either directly or indirectly. If a network has a giant connected component, this means that almost every node is reachable from almost every other node. If we ignore the direction of the edges in the bitcoin transaction network, then it does indeed contain a giant connected component covering over 99.9% of all nodes. The second largest connected component has just 71 nodes.
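One way to find these components is a breadth-first search that treats every edge as undirected. The Python sketch below is illustrative only: the function name `connected_components` and the toy edges are assumptions, and a network with tens of millions of nodes would need a more memory-conscious approach. Nodes that take part in no edge at all are not seen by this version.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the components (as sets of nodes), largest first,
    ignoring the direction of the edges."""
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects its component.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy edge list (illustrative only): one component of four nodes, one of two.
example_edges = [("a", "c"), ("b", "c"), ("c", "d"), ("x", "y")]
components = connected_components(example_edges)
total_nodes = sum(len(component) for component in components)
largest_share = len(components[0]) / total_nodes
print("components:", len(components))
print("share of nodes in the largest component:", largest_share)
```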

Fourteen Degrees of Separation

Six degrees of separation is the theory that everyone on the planet is connected to everyone else through a chain of acquaintances with no more than six hops. In network science terminology, this translates to the theory that the social network of the human race has a diameter of at most six. Facebook reported that the effective diameter (covering 90% of all pairs of users) of its social network is five and is decreasing with time.

The equivalent number for the bitcoin transaction network is fourteen and is increasing with time. That is, for 90% of all pairs of transactions, the shortest path between them in the transaction network (ignoring directionality) is at most fourteen hops. The increasing value is likely due to the fact that, unlike the Facebook social network, there is no preferential attachment. New nodes can connect only to existing nodes whose corresponding transactions are not yet fully redeemed, that is, transactions that still have unspent outputs. In other words, the transaction network is growing at the frontier only.
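The 90% figure can be illustrated with a small Python sketch that runs a breadth-first search from every node and returns the smallest hop count covering the requested share of connected pairs. This is an assumption about one simple, integer-valued way to define the measure, not the method used to obtain the figure of fourteen. The function name `effective_diameter` and the toy path are hypothetical, and an exact all-pairs search like this is only practical on small networks.

```python
from collections import Counter, defaultdict, deque


def effective_diameter(edges, coverage=0.9):
    """Smallest number of hops d such that at least `coverage` of all
    connected pairs of nodes are within d hops, ignoring edge direction."""
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)

    # hop_counts[d] = number of ordered pairs at shortest-path distance d.
    hop_counts = Counter()
    for start in neighbors:
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbors[node]:
                if other not in distance:
                    distance[other] = distance[node] + 1
                    queue.append(other)
        for hops in distance.values():
            if hops > 0:
                hop_counts[hops] += 1

    total_pairs = sum(hop_counts.values())
    covered = 0
    for hops in sorted(hop_counts):
        covered += hop_counts[hops]
        if covered >= coverage * total_pairs:
            return hops
    return 0  # no connected pairs at all


# Toy network (illustrative only): a simple chain a - b - c - d.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(effective_diameter(path_edges))
print(effective_diameter(path_edges, coverage=0.8))
```

On a chain like this one, where each new node attaches only to the most recent node, distances keep growing as nodes are added, which is the intuition behind a network that grows at the frontier only.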

The First Currency with a Ledger

Surprisingly, Bitcoin is not the first currency with a ledger from which we can model the transfer of value. The Tomamae-cho community currency was introduced in Hokkaido Prefecture in Japan for a three-month period during 2004-05 in a bid to revitalize the local economy. The Tomamae-cho system involved gift certificates that were reusable and legally redeemable for yen. Each certificate had an entry space on the reverse for up to five recipients to record transaction dates, their names and addresses, and the purposes of use.

Researchers collected these certificates in order to derive a network structure that represented the flow of currency during the period. They showed, for example, that the network had small world properties.

The blockchain is a digital equivalent of the Tomamae-cho certificates. It does not contain information such as names and addresses or the purposes of use. However, it has other properties that make it suitable for analyzing the transfer of value, including its accuracy, size, and completeness.

The application of network analysis to the blockchain is an under-explored, yet fascinating area. There are a handful of academic studies but very little in the way of software and tools to open it up to a wider audience. QuantaBytes is developing a suite of tools for analyzing and visualizing Bitcoin's blockchain. By understanding the structure and evolution of the blockchain, we can better understand Bitcoin's usage patterns, economy, and the growth of the system as a whole.