Bitcoin Analytics: The Principles of Network Development, Part 1

When it first appeared in 2008, Bitcoin revolutionized the idea of digital currencies; today, it is accepted by top online retailers, including Overstock, and considered by many to be the currency of the future.

This five-part series from SoftServe’s Data Science Group offers a short history of the currency network, uses graph analysis to identify the most influential nodes, patterns, and changes over time, and explores statistical analysis that sheds light on current user behavior.

Before Bitcoin

David Chaum pioneered the idea of digital cash in 1982 in Blind Signatures for Untraceable Payments, but it took more than 25 years for a practical implementation of a truly decentralized currency to take hold. (Earlier systems still relied on a central authority, i.e. a bank, to distribute currency and maintain ownership records.)

After Chaum, the first major milestone came in 1998, when Wei Dai (who proposed “b-money”) and Nick Szabo (who coined the term BitGold in 2005) garnered widespread support for the idea of treating the solution to a cryptographic puzzle as something valuable, like a piece of precious metal. With this approach, everyone could become a “digital gold miner,” eliminating the need for a bank to act as the source of money emission. Still, this system required a central entity to maintain ownership records.

Early adopters realized that to completely eliminate the need for a central entity, the ledger that contains ownership records would also need to be distributed. However, this exposes the fundamental risk inherent in digital currencies in general and distributed digital currencies in particular: double spending.

Because digital currency is not a tangible asset and can be copied easily, someone could issue multiple transactions simultaneously, transferring the same coin to several recipients without detection. In a centralized digital currency system, the bank would usually prevent this kind of abuse, but it is difficult to detect in a distributed environment.
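To make the centralized case concrete, here is a minimal Python sketch of how a single bank-like ledger can reject a second attempt to spend the same coin. The `CentralLedger` class, its method names, and the coin and owner labels are all hypothetical and exist only for this illustration; no real system is being described.

```python
class CentralLedger:
    """Toy centralized ledger: one authority tracks who owns each coin."""

    def __init__(self):
        self.owner_of = {}  # coin id -> current owner

    def issue(self, coin_id, owner):
        """The central authority creates a coin and assigns its first owner."""
        self.owner_of[coin_id] = owner

    def transfer(self, coin_id, sender, recipient):
        """Move a coin only if the sender is still its recorded owner."""
        if self.owner_of.get(coin_id) != sender:
            return False  # the sender no longer owns the coin: double spend
        self.owner_of[coin_id] = recipient
        return True


ledger = CentralLedger()
ledger.issue("coin-1", "alice")
first = ledger.transfer("coin-1", "alice", "bob")     # accepted
second = ledger.transfer("coin-1", "alice", "carol")  # rejected: already spent
```

The check works only because every transfer passes through one place that sees them in order. In a distributed setting no single node has that view, which is the gap the rest of this post is about.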

In 1998, Nick Szabo (also referenced above) suggested employing quorum systems to address this issue. The idea assumes that as long as the majority of any chosen group of peers (a quorum) is honest, the correct outcome will prevail. Still, this approach can be circumvented if a malicious entity creates enough fake identities to control the group’s vote (a Sybil attack); it is also vulnerable to technical issues common in distributed systems, such as propagation delays.

These remaining issues were resolved by Satoshi Nakamoto, a person (or possibly a group of people) who introduced Bitcoin as we know it today.

Bitcoin Takes Shape

To appreciate Nakamoto’s contribution to the development of Bitcoin, we need to first understand how the currency changes hands.

Cryptography is deeply embedded in Bitcoin’s core functionality. Bitcoin uses digital signatures to establish ownership of an electronic coin, which in turn is defined as a chain of digital signatures (meaning every coin carries its ownership history). Each new transaction is made by “signing a coin”: the current owner uses their private key to sign a hash of the previous transaction in the chain of ownership together with the public key of the next owner. The owner then broadcasts a message containing the input and output addresses as well as the transaction amount.
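The linking part of this scheme can be sketched in a few lines of Python. This is a simplified illustration, not Bitcoin’s actual transaction format: the record layout, the function names, and the placeholder strings standing in for public keys are assumptions, and the signature step is deliberately left out because the Python standard library has no public-key signature scheme. What remains is the hash link that ties each transfer to the one before it.

```python
import hashlib
import json


def tx_hash(tx):
    """Hash a transfer record deterministically (SHA-256 over sorted JSON)."""
    return hashlib.sha256(json.dumps(tx, sort_keys=True).encode()).hexdigest()


def transfer(prev_tx, next_owner_pubkey):
    """Create a record that points at the previous transfer by its hash."""
    prev = tx_hash(prev_tx) if prev_tx is not None else None
    # A real transaction would also carry the previous owner's signature here.
    return {"prev_hash": prev, "owner_pubkey": next_owner_pubkey}


def chain_is_linked(history):
    """Check that every record references the hash of the one before it."""
    return all(
        later["prev_hash"] == tx_hash(earlier)
        for earlier, later in zip(history, history[1:])
    )


# Placeholder strings stand in for real public keys.
genesis = transfer(None, "pubkey-alice")
to_bob = transfer(genesis, "pubkey-bob")
to_carol = transfer(to_bob, "pubkey-carol")
history = [genesis, to_bob, to_carol]
```

Calling `chain_is_linked(history)` returns `True` for the untouched history. If any earlier record is altered, its hash changes and the next record no longer points at it, so the ownership history cannot be quietly rewritten.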

Every node in the network receives this message, collects it into a block along with other transactions, and works on finding a difficult proof-of-work for that block. Once a node finds one, it broadcasts the block to the other nodes. If all transactions inside are valid and none of the coins has already been spent, the block is accepted and added to the distributed public ledger called the blockchain. The next block is then built on top of its hash.
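A rough idea of what “finding a proof-of-work” means can be given with a toy Python example. It is a sketch under simplifying assumptions: Bitcoin itself hashes a binary block header twice with SHA-256 and compares the result against a numeric target, whereas here a single SHA-256 over a text payload must start with a given number of zero hex digits. The function names, the payload layout, and the sample transactions are made up for illustration.

```python
import hashlib


def block_digest(prev_hash, transactions, nonce):
    """SHA-256 over the previous block hash, the transactions, and a nonce."""
    payload = f"{prev_hash}|{'|'.join(transactions)}|{nonce}".encode()
    return hashlib.sha256(payload).hexdigest()


def mine(prev_hash, transactions, difficulty=4):
    """Try nonces in order until the digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while not block_digest(prev_hash, transactions, nonce).startswith(prefix):
        nonce += 1
    return nonce, block_digest(prev_hash, transactions, nonce)


def is_valid(prev_hash, transactions, nonce, difficulty=4):
    """Checking a proof takes one hash, however long the search took."""
    digest = block_digest(prev_hash, transactions, nonce)
    return digest.startswith("0" * difficulty)


genesis_hash = "0" * 64
nonce1, hash1 = mine(genesis_hash, ["alice->bob:1"])
# The second block commits to the first one by including its hash.
nonce2, hash2 = mine(hash1, ["bob->carol:1"])
```

The asymmetry is the point: `mine` has to try many nonces, while `is_valid` needs a single hash. Because the second block includes `hash1`, changing anything in the first block would invalidate both proofs.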

Proof-of-work effectively introduces a one-CPU-one-vote rule, and binding it to the blockchain is Bitcoin’s solution to the problem of double spending. Any discrepancies (i.e., blockchain forks) are resolved by consensus: nodes treat the longest chain as the valid one. As long as the majority of computing power in the network is controlled by honest nodes, the honest chain grows the fastest and outpaces any competing chain, so creating additional identities gives a Sybil attacker no extra voting power.
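The fork rule itself is small enough to write down. The Python sketch below assumes that a chain is simply a list of block labels and that “longest” means “most blocks”; real nodes compare accumulated proof-of-work rather than a raw block count. The `choose_chain` function and the sample chains are hypothetical.

```python
def choose_chain(candidates):
    """Return the candidate with the most blocks (ties keep the first seen)."""
    return max(candidates, key=len)


# Two views of the ledger that diverge after block "b1".
honest = ["genesis", "b1", "b2", "b3"]
attacker = ["genesis", "b1", "x2"]

best = choose_chain([attacker, honest])  # the honest chain has more blocks
```

Under this rule an attacker can only replace the accepted history by producing blocks faster than everyone else combined, which takes computing power rather than additional identities.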

The nodes, or miners, are essential for maintaining the Bitcoin network. In addition to processing the blockchain, they bring new coins into circulation, which they earn as a reward for successfully generating a block (this process is called mining).

Bitcoin network internals are hidden from regular users by software referred to as a wallet. A wallet can be a locally installed application, a web application, or even a sheet of paper with all the necessary data encoded in a QR code. The wallet does not contain actual coins.

Instead, it stores digital keys and, since there’s no limit on how many of those can be stored, a virtually unlimited number of Bitcoin addresses can be used in one wallet. Users should be careful not to lose their keys, though, since all the associated coins will be lost as well, both for the user and for the Bitcoin economy in general.

Bitcoin Data Analysis

Though Bitcoin has become more popular in recent years, research covering how the system actually works – and how people use it – still lags far behind. The purpose of this series is to analyze the state of the network using statistical analysis and graph theory. The data we used in our research was compiled in December 2015.

Our goal is to analyze the Bitcoin network user community based on their activity, wallet features, and connections while taking into account known groups, Bitcoin events, and general trends. Analysis like this helps us uncover behavior patterns from large data sets; in this case, it will help us better understand the hidden, underlying principles guiding the development of the Bitcoin network.

We used a WebBTC database dump as our primary data source and divided our research into four major parts, which will be covered in subsequent posts in the series:

  • Bitcoin Network Graph
  • Bitcoin Components
  • Anonymity in Bitcoin
  • Structures in Bitcoin Network

In our final post, we’ll also explore the conclusions we’ve drawn from our research, as well as how they may be applied in practice today and serve as the basis for additional study.
