Hands-On Blockchain for Python Developers

By Arjuna Sky Kok

About this book

Blockchain is seen as the main technological solution that works as a public ledger for all cryptocurrency transactions. This book serves as a practical guide to developing a full-fledged decentralized application with Python to interact with the various building blocks of blockchain applications.

Hands-On Blockchain for Python Developers starts by demonstrating how blockchain technology and cryptocurrency hashing work. You will understand the fundamentals and benefits of smart contracts such as censorship resistance and transaction accuracy. As you steadily progress, you'll go on to build smart contracts using Vyper, which has a similar syntax to Python. This experience will further help you unravel the other benefits of smart contracts, including reliable storage and backup, and efficiency. You'll also use web3.py to interact with smart contracts and leverage the power of both web3.py and the Populus framework to build decentralized applications that offer security and seamless integration with cryptocurrencies. As you explore later chapters, you'll learn how to create your own token on top of Ethereum and build a cryptocurrency wallet graphical user interface (GUI) that can handle Ethereum and Ethereum Request for Comments (ERC-20) tokens using the PySide2 library. This will enable users to seamlessly store, send, and receive digital money. Toward the end, you'll implement InterPlanetary File System (IPFS) technology in your decentralized application to provide a peer-to-peer filesystem that can store and expose media.

By the end of this book, you'll be well-versed in blockchain programming and be able to build end-to-end decentralized applications on a range of domains using Python.

Publication date: February 2019
Publisher: Packt
Pages: 450
ISBN: 9781788627856

 

Chapter 1. Introduction to Blockchain Programming

In this book, we'll learn blockchain programming so that you can become a force to be reckoned with when pursuing blockchain opportunities. To get there, you first need to understand blockchain technology and what it entails. In this chapter, we will learn what blockchain technology is and how it empowers Bitcoin and Ethereum. We will build an intuitive understanding of the technology and replicate some of the basic functions behind it.

The following topics will be covered in this chapter:

  • The rise of cryptocurrency and blockchain
  • Blockchain technology
  • Cryptography
  • The hashing function
  • Consensus
  • Coding on the blockchain
 

The rise of cryptocurrency and blockchain


Assuming that you didn't live a secluded life as a hermit on a mountain in 2017, you would have heard all about cryptocurrency, especially Bitcoin. You didn't have to look far to hear about the soaring popularity of this topic, its terminology, and its growth in value. At this point, other cryptocurrencies also began to grow, making way for headlines such as "Ethereum reaches $1,000!" During this craze, people discussed everything about cryptocurrency, from the swinging price to the technology behind it, which is blockchain.

 

Blockchain was regarded as the technology that would bring the dawn of a new era of justice and prosperity for mankind. It would democratize wealth. It would take the power away from the oligarchy and give it back to the people. It would protect the data of the people. Then came 2018, and cryptocurrency went down. The party was over. Bitcoin now sits at $6,000, while Ethereum sits at less than $400.

However, although the hype surrounding cryptocurrency has died down, it continues to be a regular point of discussion. Blockchain conferences and meetups are cropping up in many places, while investments keep pouring into blockchain startups. Andreessen Horowitz, a giant name in Silicon Valley, secured as much as $300 million from its limited partners for a dedicated blockchain fund. [1] The opportunities lie where the money flows. Katheryn Griffith Hill, a lead recruiter at Blockchain Developers, claims that there are currently fourteen blockchain developer positions available for every blockchain developer. [2] In addition, a friend of mine who attended a local blockchain event in Jakarta told me, "I could see around one hundred audience members, but there were only around four or five developers. 50% of the audience were investors." There are people who want to put money into blockchain, but far fewer people who are capable of developing the product.

Blockchain was first used as a payment solution without a middleman, namely Bitcoin. Then, people found out that blockchain has some other interesting properties. First, it is transparent, meaning people can audit it to check whether money laundering is going on. Second, it gives users a degree of privacy, which can be used to avoid profiling.

Then, after Ethereum was released, people suddenly became creative with how to apply blockchain in real life. The ideas range from creating a token that represents ownership of something, such as an autonomous organization, and payments with full privacy, to digital assets that cannot be duplicated (unlike MP3 files).

 

Blockchain technology


Most people know Bitcoin exists because of blockchain. But what is blockchain? It is an append-only database that consists of blocks that are linked by hashing. Here, each block contains many transactions of transferring value (but could be other things) between participants secured by cryptography; a consensus between many nodes that hold an identical database decides on which new block is to be appended next.

You don't have to understand the definition at this point; those are a lot of words to chew on! First, I'll explain blockchain to you so that you can adjust to this new knowledge as we move through this book.

Going back to the definition of blockchain, we can summarize the definition as an append-only database. Once you put something into the database, it cannot be changed; there is no Undo. We'll talk about the ramifications of this feature in Chapter 2, Smart Contract Fundamentals. This definition entails many things and opens up a whole new world.

So, what can you put into this append-only database? It depends on the cryptocurrency. For Bitcoin, you can store the transactions of transferring value. For example, Nelson sends one Bitcoin to Dian. However, we accumulate many transactions into one block before appending them to the database. For Ethereum, the things that you can put into the append-only database are richer. This not only includes the transaction of transferring value—it could also be a change of state. What I mean by state here is really general. For example, a queue for buying a ticket for a show can have a state. This state can be empty or full. Similarly to Bitcoin, in Ethereum, you gather all the transactions before appending them together in this append-only database.

To make it clearer, we put all these transactions into the block before appending them to the append-only database. Aside from the list of transactions, we store other things in this block, such as the time when we append the block into the append-only database, the target's difficulty (don't worry if you don't know about this), and the parent's hash (I'll explain this shortly), among many other things.

Now that you understand the block element of the blockchain, let's look at the chain element. As previously explained, aside from the list of transactions, we also put the parent's hash in the block. But for now, let's just use a simple ID to indicate the parent instead of using a hash. The parent id is just the id of the previous block. Think of a stack. In the beginning, there is no block. We then put in Block A, which has three transactions: Transaction 1, Transaction 2, and Transaction 3. Since Block A is the first block, it has no parent. Next, we append Block B, which consists of two transactions: Transaction 4 and Transaction 5. Block B is not the first block in this blockchain, so we set its parent section to the Block A id, because Block A is the parent of Block B. Then, we put Block C in the blockchain, which has two transactions: Transaction 6 and Transaction 7.

The parent section in Block C would be the Block B id, and so on. To simplify things, we increment the id by 1 for every new block.

Let's implement a database to record the history of what people like and hate. This means that when you said you like cats at one point in history, you won't be able to change that history. You may add new history when you change your mind (for example, if you then hate cats), but that won't change the fact that you liked them in the past. So, we can see that in the past you liked cats, but now you hate them. We want to make this database full of integrity and secure against cheating. Take a look at the following code block:

class Block:
    id = None
    history = None
    parent_id = None

block_A = Block()
block_A.id = 1
block_A.history = 'Nelson likes cat'

block_B = Block()
block_B.id = 2
block_B.history = 'Marie likes dog'
block_B.parent_id = block_A.id

block_C = Block()
block_C.id = 3
block_C.history = 'Sky hates dog'
block_C.parent_id = block_B.id

If you studied computer science, you will recognize this data structure, which is called a linked list. Now, there is a problem. Say Marie hates Nelson and wants to paint Nelson in a negative light. Marie can do this by changing the history of block A:

block_A.history = 'Nelson hates cat'

This is unfair to Nelson, who is a big fan of cats. So, we need to add a way in which only Nelson can write the history of his own preferences. The way to do this is by using a private key and a public key.

Signing data in blockchain

In blockchain, we use two keys to sign data, to authenticate a message and protect it from being altered by unauthorized users. The two keys are as follows:

  • Private key
  • Public key

The secrecy of the private key is guarded and it is not made known to the public. On the other hand, you let the public key be given out in public. You tell everyone, hey, this is my public key.

Let's generate the private key. To do this, we need openssl software. You can install this by doing the following:

$ sudo apt-get install openssl

So, Nelson generates the private key, which is the nelsonkey.pem file. He must keep this key secret. It is generated as follows:

$ openssl genrsa -out nelsonkey.pem 1024

From the private key, Nelson generates the public key:

$ openssl rsa -in nelsonkey.pem -pubout > nelsonkey.pub

Nelson can share this public key, nelsonkey.pub, with everyone. Now, in the real world we could set up a simple dictionary of the public key and its owner as follows:

{
'Nelson': 'nelsonkey.pub',
'Marie': 'mariekey.pub',
'Sky': 'skykey.pub'
}

We will now look at how Nelson can prove that he is the only one who can make changes to his history.

First, let's create a Python virtual environment:

$ python3 -m venv blockchain
$ source blockchain/bin/activate
(blockchain) $

Next, install the library:

(blockchain) $ pip install --upgrade pip
(blockchain) $ pip install wheel
(blockchain) $ pip install cryptography

This is the Python script that can be used to sign the message. Name this script verify_message.py (refer to the code file in the following GitLab link for the full code: https://gitlab.com/arjunaskykok/hands-on-blockchain-for-python-developers/blob/master/chapter_01/verify_message.py):

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import serialization

# Generate private key
#private_key = rsa.generate_private_key(
# public_exponent=65537,
# key_size=2048,
# backend=default_backend()
#)
...
...

# Message validation executed by other people
public_key.verify(
    signature,
    message,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256())

When you execute this script, it prints nothing, as expected: verify returns silently when the signature is valid. This means that the message has been verified against the signature using the public key. The signature can only be created by Nelson because you need the private key in order to create a signature. However, to verify the message with the signature, you only need the public key.

 

Let's take a look at a case in which Marie tries to falsify the facts with a script named falsify_message.py. Marie tries to put Nelson hates cat in the history database as follows:

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import serialization

message = b'Nelson hates cat'
signature = b'Fake Signature'

with open("nelsonkey.pub", "rb") as key_file:
    public_key = serialization.load_pem_public_key(
        key_file.read(),
        backend=default_backend())

public_key.verify(
    signature,
    message,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256())

Here's how the verify method works. Nelson calculates the hash of the message, then encrypts it with his private key. The result is the signature. Suppose Sky wants to verify the signature; he has the message and the signature. He calculates the hash of the message. Then, he decrypts the signature using the public key and compares the result to the hash of the message. If they are the same, all is well. If not, either the message has been altered or a different private key was used to sign it.

When Marie runs falsify_message.py, verification fails: instead of returning silently, public_key.verify raises an InvalidSignature exception (defined in cryptography.exceptions).

So, what does the signature look like? Go back to verify_message.py and append this line to the end of the file. Then, run the script again:

print(signature)

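The signature is printed as a long bytes literal of seemingly random data. Its length matches the size of the RSA key (128 bytes for a 1024-bit key).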

Every message has a different signature, and it's impossible for Marie to guess the signature in order to falsify the message. So, with the private key and the public key, we can verify whether or not the message is indeed from someone authorized, even if we communicate on an unsecured channel.

So, with the private key, Nelson can create a signature that is unique to the message he signs. Call them Message A and Signature A.

Everyone in the world who has Nelson's public key can verify that Nelson did indeed write Message A. Nelson can prove he wrote Message A by showing Signature A. Anyone can take those two inputs, together with the public key, and verify the truth.
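The hash-then-sign flow described above can be sketched with textbook RSA and nothing but the standard library. This is only an illustration, not what the cryptography library does internally: the two primes are small, well-known Mersenne primes chosen for convenience, there is no PSS padding, and the helper names (digest_as_int, sign, verify) are made up for this sketch.

```python
import hashlib

# Toy parameters, for illustration only: two known Mersenne primes.
# Real keys use primes with hundreds of digits plus a padding scheme such as PSS.
p = 2**61 - 1
q = 2**31 - 1
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 65537                      # public exponent
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes) -> int:
    """Signer's side: transform the hash with the private exponent."""
    return pow(digest_as_int(message), d, n)


def verify(message: bytes, signature: int) -> bool:
    """Verifier's side: undo the signature with the public exponent and compare."""
    return pow(signature, e, n) == digest_as_int(message)


signature = sign(b"Nelson likes cat")
print(verify(b"Nelson likes cat", signature))   # genuine message
print(verify(b"Nelson hates cat", signature))   # altered message
```

Only sign needs d, while verify needs only e and n. That is why Nelson can publish his public key freely and still be the only one able to produce a valid signature.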

So, to validate whether or not it is Nelson who wrote Nelson likes cat, input the following (refer to the code file in the following GitLab link for the full code: https://gitlab.com/arjunaskykok/hands-on-blockchain-for-python-developers/blob/master/chapter_01/validate_message.py): 

# validate_message.py
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import serialization

def fetch_public_key(user):
    with open(user + "key.pub", "rb") as key_file:
        public_key = serialization.load_pem_public_key(
           key_file.read(),
           backend=default_backend())
    return public_key

# Message coming from user
message = b"Nelson likes cat"

# Signature coming from user, this is very specific to public key.
# Download the public key from Gitlab repository of this code so this signature matches the message.
# Otherwise, you should generate your own signature.
signature = 
...
...
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256())

From linked list to blockchain

Now that we know that only Nelson can write Nelson likes cats or Nelson hates cats, we can be at peace. However, to keep the tutorial code short, we won't integrate the validation using the private key and the public key. We assume that only authorized people are able to write the history in the block. Take a look at the following code block:

>>> block_A.history = 'Nelson likes cat'

When that happens, we assume it's Nelson who wrote that history. So, what is the problem in recording data with a linked list?

 

The problem is that the data can be altered easily. Say Nelson wants to be a senator. If many people in his district don't like cats, they may not be happy with the fact that Nelson likes them. Consequently, Nelson wants to alter the history:

>>> block_A.history = 'Nelson hates cat'

Just like that, the history has been changed. We can avoid this way of cheating by recording all history in the block every day. So, when Nelson alters the database, we can compare the data in the blockchain today to the data in the blockchain yesterday. If it's different, we can confirm that something fishy is happening. That method could work, but let's see if we can come up with something better.

Let's upgrade our linked list to the blockchain. To do this, we add a new property in the Block class, which is the parent's hash:

import hashlib
import json

class Block:
    id = None
    history = None
    parent_id = None
    parent_hash = None

block_A = Block()
block_A.id = 1
block_A.history = 'Nelson likes cat'

block_B = Block()
block_B.id = 2
block_B.history = 'Marie likes dog'
block_B.parent_id = block_A.id
block_B.parent_hash = hashlib.sha256(json.dumps(block_A.__dict__).encode('utf-8')).hexdigest()

block_C = Block()
block_C.id = 3
block_C.history = 'Sky hates dog'
block_C.parent_id = block_B.id
block_C.parent_hash = hashlib.sha256(json.dumps(block_B.__dict__).encode('utf-8')).hexdigest()

Let's walk through what each part of that hashlib expression does:

>>> print(block_B.__dict__)
{'parent_hash': '880baef90c77ae39d49f364ff1074043eccb78717ecec85e5897c282482012f1', 'history': 'Marie likes dog', 'id': 2, 'parent_id': 1}
>>> print(json.dumps(block_B.__dict__))
{"parent_hash": "880baef90c77ae39d49f364ff1074043eccb78717ecec85e5897c282482012f1", "parent_id": 1, "history": "Marie likes dog", "id": 2}
>>> print(json.dumps(block_B.__dict__).encode('utf-8'))
b'{"id": 2, "parent_hash": "69a1db9d3430aea08030058a6bd63788569f1fde05adceb1be6743538b03dadb", "parent_id": 1, "history": "Marie likes dog"}'
>>> print(hashlib.sha256(json.dumps(block_B.__dict__).encode('utf-8')))
<sha256 HASH object @ 0x7f58518e3ee0>
>>> print(hashlib.sha256(json.dumps(block_B.__dict__).encode('utf-8')).hexdigest())
25a7a88637c507d33ae1402ba6b0ee87eefe9c90e33e75c43d56858358f1704e

Now, suppose we change the history of block_A like this:

>>> block_A.history = 'Nelson hates cat'

Again, the history has been changed just like that. However, this time there is a twist. We can verify that this change has occurred by printing the original parent's hash of block_C:

>>> print(block_C.parent_hash)
ca3d23274de8d89ada13fe52b6000afb87ee97622a3edfa3e9a473f76ca60b33

Now, let's recalculate the parent's hash of each block:

>>> block_B.parent_hash = hashlib.sha256(json.dumps(block_A.__dict__).encode('utf-8')).hexdigest()
>>> block_C.parent_hash = hashlib.sha256(json.dumps(block_B.__dict__).encode('utf-8')).hexdigest()
>>> print(block_C.parent_hash)
10b7d80f3ede91fdffeae4889279f3acbda32a0b9024efccc9c2318e2771e78c

These hashes are different. By looking at them, we can be very sure that the history has been altered. Consequently, Nelson would be caught red-handed. Now, if Nelson wants to alter the history without getting caught, it is not enough to change the history in block_A anymore. Nelson needs to change the parent_hash property in every block (except block_A, of course). This is tougher cheating. With only three blocks, Nelson needs to change two parent_hash properties. With 1,000 blocks, Nelson needs to change 999 parent_hash properties!
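The comparison we just did by hand can be automated. The following is a minimal sketch, not code from the book's repository: it stores blocks as plain dictionaries instead of Block instances, and it passes sort_keys=True to json.dumps so that the serialization (and therefore the hash) does not depend on key order. The function names are hypothetical.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's full contents; sort_keys keeps the serialization stable."""
    serialized = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()


def build_chain(histories: list) -> list:
    """Link each new block to the previous one by id and by hash."""
    chain = []
    for number, history in enumerate(histories, start=1):
        block = {"id": number, "history": history,
                 "parent_id": None, "parent_hash": None}
        if chain:
            block["parent_id"] = chain[-1]["id"]
            block["parent_hash"] = block_hash(chain[-1])
        chain.append(block)
    return chain


def first_broken_link(chain: list):
    """Return the id of the first block whose parent_hash is stale, else None."""
    for parent, child in zip(chain, chain[1:]):
        if child["parent_hash"] != block_hash(parent):
            return child["id"]
    return None


chain = build_chain(["Nelson likes cat", "Marie likes dog", "Sky hates dog"])
print(first_broken_link(chain))           # untouched chain: no broken link

chain[0]["history"] = "Nelson hates cat"  # tamper with the first block
print(first_broken_link(chain))           # block 2 now holds a stale parent_hash
```

To hide the change, Nelson would have to recompute the parent_hash of block 2, which changes the content of block 2 and so invalidates the parent_hash stored in block 3, and so on down the chain.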


Cryptography


The most popular use of blockchain is to create a cryptocurrency. As the word crypto is in cryptocurrency, you would expect that you need to master cryptography in order to become a blockchain programmer. That is not true. You only need to know two things about cryptography:

  • Private key and public key (asymmetric cryptography)
  • Hashing

These two have been explained in the previous part of this chapter. You don't need to know how to design a hashing algorithm or private key and public key algorithm. You only need to get an intuitive understanding of how they work and the implications of these technologies.

The implication of private keys and public keys is that it enables decentralized accounts. In a normal application, you have a username and password. These two fields enable someone to access their account. But having a private key and public key enables someone to have an account in a decentralized manner.

Hashing is a one-way function, meaning that given an input, you can get the output easily, but given an output, you cannot recover the input. A simple version of a one-way function would be addition: take two numbers and return their sum.

If I tell you that one of the outputs of this function is 999 and ask you what the inputs were, you couldn't work out the answer. It could be anything from 1 and 998 to 500 and 499. A hashing function is something like that. The algorithm is clear as day (you can read the algorithm of any hashing function on the internet), but it's hard to reverse.

So, all you need to know about hashing is this: given the input "input", you get this SHA-256 output (in hexadecimal): c96c6d5be8d08a12e7b5cdc1b207fa6b2430974c86803d8891675e76fd992c20. If you don't know the input, you cannot recover it from this output alone. And even if you do know the input, it is prohibitively hard to find another input that produces the same output. Such inputs must exist, because there are infinitely many possible inputs and only a finite number of outputs, but nobody knows a practical way to find one.

That is all you need to know about cryptography when you become a blockchain developer. But that's only true if you become a certain type of blockchain developer, who creates a program on top of Ethereum. 

 

 

Symmetric and asymmetric cryptography

Symmetric cryptography uses the same key between sender and receiver. This key is used to encrypt and decrypt a message. For example, say you want to create an encryption function to encrypt text. Symmetric cryptography could be as simple as adding 5 to the text to be encrypted. If A (or 65 in ASCII) is the text to be encrypted, then this encryption function will add 5 to 65. The encrypted text would be F (or 70 in ASCII). To decrypt it, you just subtract 5 from the encrypted text, F.
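The add-5 scheme can be written down in a few lines. This is a toy sketch to show that both sides use the same key; it offers no real security, and the function names are made up for illustration.

```python
KEY = 5  # the shared secret from the example above


def encrypt(plaintext: bytes, key: int = KEY) -> bytes:
    """Add the key to every byte, wrapping around at 256."""
    return bytes((byte + key) % 256 for byte in plaintext)


def decrypt(ciphertext: bytes, key: int = KEY) -> bytes:
    """Subtract the same key to recover the original bytes."""
    return bytes((byte - key) % 256 for byte in ciphertext)


secret = encrypt(b"A")
print(secret)           # b'F'  (65 + 5 = 70)
print(decrypt(secret))  # b'A'
```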

Asymmetric cryptography is a different beast. There are two keys: a public key and a private key. They are linked by a special mathematical relationship. If you encrypt a message with a public key, you can only decrypt it with the matching private key. If you encrypt a message with a private key, you can only decrypt it with the matching public key. There is no straightforward relationship between a public key and a private key, as there is with symmetric keys (adding and subtracting the same number). There are a couple of asymmetric cryptography algorithms. I'll explain the easiest one, the RSA algorithm.

Generate two prime numbers, called p and q. They should be really big numbers (with at least hundreds of digits), but for this example, we choose low numbers: 11 and 17. These are your private key. Don't let anyone know these numbers. Next, multiply them:

n = p x q

n is a composite number. In our case, n is 187.

Then, we find a number e that is relatively prime to (p-1) x (q-1):

(p-1) x (q-1) = 160

Relatively prime means that e and (p-1) x (q-1) share no common factor other than 1: there is no number other than 1 that divides both of them without a remainder. So, e can be 7, but it can be 11 as well. For this example, we choose 7 for e.

e and n are your public key. You can tell these numbers to strangers you meet on the bus, your grandma, your friendly neighbor, or your date.

Let's say the message we want to encrypt is A. In the real world, encrypting a short message like this is not safe. We have to pad the short message, so A would become something like xxxxxxxxxxxxxxxxxxxA. If you check the signing scripts earlier in this chapter, you will see that they use a padding function. But for this example, we will not pad the message.

 

The encryption function is this:

encrypted_message = message ^ e mod n

So, the encrypted_message would be 65 ** 7 % 187 = 142.

Before we are able to decrypt the message, we need to find the d number:

e x d = 1 (mod (p-1) x (q-1))

d is 23.

The decryption function is this:

decrypted_message = encrypted_message ^ d mod n

So, the decrypted_message would be 142 ** 23 % 187 = 65. 65 in ASCII is A.
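The whole worked example can be checked in a few lines of Python. This sketch uses the toy numbers from the text; pow with three arguments performs modular exponentiation, and pow(e, -1, phi) (available from Python 3.8) computes the modular inverse that the text calls d.

```python
from math import gcd

p, q = 11, 17
n = p * q                  # 187, part of the public key
phi = (p - 1) * (q - 1)    # 160

e = 7
assert gcd(e, phi) == 1    # e must share no factor with (p-1) x (q-1)

d = pow(e, -1, phi)        # solves e x d = 1 (mod 160)

message = ord("A")                  # 65
encrypted = pow(message, e, n)      # same as 65 ** 7 % 187
decrypted = pow(encrypted, d, n)    # same as 142 ** 23 % 187

print(n, phi, d)                    # 187 160 23
print(encrypted, chr(decrypted))    # 142 A
```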

As you can see, x ^ y mod n is easy to calculate, but finding the y-th root of an integer modulo n is really hard. We call this a trapdoor permutation. Factoring n to find p and q is really hard (that would be deriving the private key from the public key). But computing n from p and q is easy (deriving the public key from the private key). These properties enable asymmetric cryptography.

Compared to symmetric cryptography, asymmetric cryptography enables people to communicate securely without needing to exchange keys first. You have two keys (private key and public key). You throw the public key out to anyone. All you need to do is to protect the secrecy of the private key. The private key is like a password to your Bitcoin/Ethereum account. Creating an account in any cryptocurrency is just generating a private key. Your address (or your username in cryptocurrency) is derived from the public key. The public key itself can be derived from the private key. An example of Bitcoin's private key in Wallet Import Format (WIF) is this: 5K1vbDP1nxvVYPqdKB5wCVpM3y99MzNqMJXWTiffp7sRWyC7SrG.

It has 51 characters, but they are not hexadecimal: WIF uses the Base58 alphabet. The key it encodes is a 256-bit number, so the number of possible private keys is roughly 2 ^ 256, or about 1.16 x 10 ^ 77 (it's not exactly this amount, because a valid key must be smaller than the order of the secp256k1 curve, and the leading 5 and the trailing checksum characters of the WIF string are formatting rather than key material, but you get the idea). That is a huge number. So, the probability of someone generating a private key with a strong random process and landing on an account that already holds Bitcoin is very, very low. But the kind of account generated by a private key and public key does not have a reset password feature.

 

If someone sends Bitcoin to your address and you forget your private key, then it's gone for good. And although your public key is recorded on the blockchain that is kept in every Bitcoin node, nobody can derive the private key from it.

 

The hashing function


Hashing is a function that takes an input of any length and turns it into a fixed length output. So, to make this clearer, we can look at the following code example:

>>> import hashlib
>>> hashlib.sha256(b"hello").hexdigest()
'2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824'
>>> hashlib.sha256(b"a").hexdigest()
'ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb'
>>> hashlib.sha256(b"hellohellohellohello").hexdigest()
'25b0b104a66b6a2ad14f899d190b043e45442d29a3c4ce71da2547e37adc68a9'

As you can see, the length of the input can be 1, 5, or even 20 characters, but the output will always be the length of 64 hexadecimal numeric characters. The output looks scrambled and it appears that there is no apparent link between the input and the output. However, if you give the same input, it will give the same output every time:

>>> hashlib.sha256(b"a").hexdigest()
'ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb'
>>> hashlib.sha256(b"a").hexdigest()
'ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb'

If you change the input by even just a character, the output would be totally different:

>>> hashlib.sha256(b"hello1").hexdigest()
'91e9240f415223982edc345532630710e94a7f52cd5f48f5ee1afc555078f0ab'
>>> hashlib.sha256(b"hello2").hexdigest()
'87298cc2f31fba73181ea2a9e6ef10dce21ed95e98bdac9c4e1504ea16f486e4'

Because the output has a fixed length, which is 64 hexadecimal characters in this case, while the input can be anything, there must of course be different inputs that have the same output.

 

Note

Here is the interesting thing: it is prohibitively hard to find two different inputs that give the same output with this hashing function. Mission impossible: even if you hijacked all the computers in the world and made them run the hashing computation, it is unlikely that you would ever find two different inputs with the same output.

Not all hashing functions are safe, though. SHA-1 was broken in practice in 2017, when researchers published two different files that have the same SHA-1 output. In this example, we will use SHA-256.
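To see why a fixed-length output guarantees collisions, and why a long output makes them impractical to find, we can deliberately weaken SHA-256 by keeping only its first four hexadecimal characters. The sketch below is an illustration under that assumption (short_hash and find_collision are made-up helper names); with only 65,536 possible outputs, a collision turns up almost immediately.

```python
import hashlib
from itertools import count


def short_hash(data: bytes, hex_chars: int = 4) -> str:
    """Keep only the first few hex characters of the SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]


def find_collision(hex_chars: int = 4):
    """Try inputs b"0", b"1", b"2", ... until two share a truncated hash."""
    seen = {}
    for i in count():
        candidate = str(i).encode("utf-8")
        fingerprint = short_hash(candidate, hex_chars)
        if fingerprint in seen:
            return seen[fingerprint], candidate
        seen[fingerprint] = candidate


first, second = find_collision()
print(first, second, short_hash(first), short_hash(second))
```

Each extra hexadecimal character multiplies the number of possible outputs by 16, so the same search against all 64 characters is far beyond what any computer can do.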

The output of the hashing function can be used as a digital fingerprint. Imagine you have a string with a length of 10 million (say you are writing a novel), and to make sure this novel is not tampered with, you tell all your potential readers that they have to check all 10 million characters in order to ensure that the novel hasn't been corrupted. Nobody would do that. But with hashing, you can publish the 64-character output (through Twitter, for example), and your potential readers can hash the novel that they buy or download and compare the two values to make sure that their copy is legit.

So, we add the parent's hash to the Block class. This way, we keep the fingerprint of the parent block in our block. This means that if we are ever naughty and change the content of any block, the parent's hash in its child block will become invalid, and we would get caught red-handed.

But can't you change the parent's hash in the child block if you want to alter the content of a block? You can, obviously. However, the process of altering the content becomes more difficult, because one change forces another. Imagine you have 10 blocks and you want to change the content of the first block:

  1. You have to change the parent's hash in its immediate child block. But, alas, this has an unforeseen ramification. Technically speaking, the parent's hash stored in the immediate child is part of that block's content. That means that the parent's hash in its own child (the grandchild of the first block) is now invalid.
  2. Now, you have to change the parent's hash in the grandchild, but this affects the subsequent block, and so on. In the end, you have to change the parent's hash in every later block. For ten blocks, that is ten steps in total: one content change plus nine parent's hashes. Using a parent's hash makes tampering much more difficult.

 

Proof of work

So, we have three participants in this case: Nelson, Marie, and Sky. But there is another type of participant too: the one who writes into the blockchain is called—in blockchain parlance—the miner. In order to put the transaction into the blockchain, the miner is required to do some work first.

Previously, we had three blocks (block_A, block_B, and block_C), but now we have a candidate block (block_D), which we want to add into the blockchain as follows:

block_D = Block()
block_D.id = 4
block_D.history = 'Sky loves turtle'
block_D.parent_id = block_C.id

But instead of adding block_D to the blockchain just like that, we first require the miner to do some puzzle work. We serialize the block and ask the miner to supply an extra string which, when appended to the serialized block and hashed, produces a hash output with at least five zeros at the front.

Those are a lot of words to chew on. First things first, we serialize the block:

import json
block_serialized = json.dumps(block_D.__dict__).encode('utf-8')
print(block_serialized)
b'{"history": "Sky loves turtle", "parent_id": 3, "id": 4}'

If the serialized block is hashed, what does it mean if we want the hash output to have at least five zeros at the front? It means that we want the output to look like this:

00000aa21def23ee175073c6b3c89b96cfe618b6083dae98d2a92c919c1329be

Alternatively, we want it to look like this:

00000be7b5347509c9df55ca35d27091b41a93acb2afd1447d1cc3e4b70c96ab

So, the puzzle is something like this:

string serialization + answer = hash output with (at least) 5 leading zeros

The miner needs to guess the correct answer. If this puzzle is converted to Python code, it would be something like this:

import hashlib

answer = b'?'  # unknown: this is the byte string the miner has to find
puzzle_input = b'{"history": "Sky loves turtle", "parent_id": 3, "id": 4}' + answer
output = hashlib.sha256(puzzle_input).hexdigest()
# output needs to be 00000???????????????????????????????????????????????????????????

So, how could the miner solve a problem like this? We can use brute force:

import hashlib

payload = b'{"history": "Sky loves turtle", "parent_id": 3, "id": 4}'
for i in range(10000000):
  nonce = str(i).encode('utf-8')
  result = hashlib.sha256(payload + nonce).hexdigest()
  if result[0:5] == '00000':
    print(i)
    print(result)
    break

The result would therefore be as follows:

184798
00000ae01f4cd7806e2a1fccd72fb18679cb07ede3a2a7ef028a0ecfd4aec153

This means that the answer is 184798: the hash output of {"history": "Sky loves turtle", "parent_id": 3, "id": 4}184798 has five leading zeros. In that simple script, we iterate from 0 to 9999999 and append each number to the input. This is a naive method, but it works. Of course, you could also append characters other than digits, such as a, b, or c.

Now, try to increase the number of leading zeros to six, or even ten. In this case, can you find the hash output? If there is no output, you could increase the range limit from 10000000 to an even higher number, such as 1000000000000. Once you get an appreciation of the hard work that goes into this, try to comprehend this: Bitcoin required around 18 leading zeros in the hash output at the time that this book was being written. The number of leading zeros is not static and changes according to the situation (but you don't need to worry about this).
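To experiment with the difficulty, the brute-force loop can be wrapped in a function that takes the number of leading zeros as a parameter. The mine function and its max_tries parameter are names made up for this sketch. Each extra leading hexadecimal zero multiplies the expected number of attempts by 16, so five zeros need roughly 16 ** 5 (about one million) tries on average.

```python
import hashlib


def mine(payload, zeros, max_tries=10000000):
    # Try 0, 1, 2, ... as the nonce until the hash starts with enough zeros.
    target = '0' * zeros
    for i in range(max_tries):
        nonce = str(i).encode('utf-8')
        result = hashlib.sha256(payload + nonce).hexdigest()
        if result.startswith(target):
            return i, result
    return None  # no answer found within max_tries


payload = b'{"history": "Sky loves turtle", "parent_id": 3, "id": 4}'
for zeros in range(1, 5):
    found = mine(payload, zeros)
    print(zeros, found, 'expected tries ~', 16 ** zeros)
```

Raising zeros one step at a time gives a feel for how quickly the work grows before you try six or more.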

So, why do we need proof of work? We need to take a look at the idea of consensus first.

 

Consensus


As we can see, the hashing function makes history tampering hard, but not too hard. Even if we have a blockchain that consists of 1000 blocks, it would be trivial on a recent computer to alter the content of the first block and change the 999 parent hashes in the other blocks. So, to ensure that bad people cannot alter the history (or at least to make it very hard), we distribute this append-only database to everyone who wants to keep it (let's call them miners). Say there are ten miners. In this case, you cannot just alter the blockchain in your copy, because the other nine miners would scold you, saying something like hey, our records say history A but your record says B. In this case, the majority wins.
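The majority rule can be sketched as a vote over the copies that the miners hold. The majority_history helper and the sample histories are hypothetical and only serve to illustrate the idea of nine miners outvoting one altered copy.

```python
from collections import Counter


def majority_history(copies):
    # Each copy is a tuple of histories; the most common copy wins.
    winner, votes = Counter(copies).most_common(1)[0]
    return winner, votes


history_a = ('first entry', 'second entry', 'third entry')
history_b = ('altered entry', 'second entry', 'third entry')

# Nine miners hold history A, one miner altered their copy to history B.
copies = [history_a] * 9 + [history_b]
winner, votes = majority_history(copies)
print(winner == history_a, votes)  # True 9
```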

However, consensus is not just a case of choosing which blockchain has been chosen by the majority. The problem starts when we want to add a new block to the blockchain. Where do we start? How do we do it? The answer is that we broadcast. When we broadcast the candidate block that contains a new transaction, it will not reach every miner at the same time. You may reach the miner that stands beside you, but it will require time for your message to reach the miner that stands far away from you.

Here's where it gets interesting: the miner that stands far away from you may receive another new candidate block first. So, how do we synchronize all these things and make sure that the majority will have the same blockchain? The simple rule is to choose the longest chain. So if you are a miner in the middle, you may receive two different candidate blocks at the same time, as shown in the following figure:

You get this from the West side:

block_E = Block()
block_E.id = 5
block_E.history = 'Sherly likes fish'
block_E.parent_id = block_D.id

And you get this from the East side:

block_E = Block()
block_E.id = 5
block_E.history = 'Johny likes shrimp'
block_E.parent_id = block_D.id

So, we will keep both versions of block_E. Our blockchain now has a branch. However, a short time later, more blocks arrive from the East side. Here is the situation now:

This is from the West side:

block_E = Block()
block_E.id = 5
block_E.history = 'Sherly likes fish'
block_E.parent_id = block_D.id

This is from the East side:

block_E = Block()
block_E.id = 5
block_E.history = 'Johny likes shrimp'
block_E.parent_id = block_D.id

block_F = Block()
block_F.id = 6
block_F.history = 'Marie hates shark'
block_F.parent_id = block_E.id

block_G = Block()
block_G.id = 7
block_G.history = 'Sarah loves dog'
block_G.parent_id = block_F.id

At this point, we can get rid of the West side version of the blockchain, because we choose the longer version.
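The chapter's simplified rule, keep the longest chain, fits in one line of Python. The choose_chain helper and the list-of-strings representation are assumptions for this sketch. The labels stand in for the Block objects shown above.

```python
def choose_chain(*chains):
    # Keep the candidate with the most blocks (the chapter's simplified rule).
    # On a tie, max() returns the first candidate it was given.
    return max(chains, key=len)


shared = ['block_A', 'block_B', 'block_C', 'block_D']
west = shared + ['Sherly likes fish']
east = shared + ['Johny likes shrimp', 'Marie hates shark', 'Sarah loves dog']

print(choose_chain(west, east) is east)  # True
```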

Here comes the problem. Say Sherly hates sharks but Sherly wants to get votes from a district where most people only vote for a candidate who loves sharks. To get more votes, Sherly broadcasts a block containing the following lie:

block_E = Block()
block_E.id = 5
block_E.history = 'Sherly loves shark'
block_E.parent_id = block_D.id

All is fine and dandy. The voting session takes one day. After one day has passed, the blockchain has gotten another two blocks:

block_E = Block()
block_E.id = 5
block_E.history = 'Sherly loves shark'
block_E.parent_id = block_D.id

block_F = Block()
block_F.id = 6
block_F.history = 'Lin Dan hates crab'
block_F.parent_id = block_E.id

block_G = Block()
block_G.id = 7
block_G.history = 'Bruce Wayne loves bat'
block_G.parent_id = block_F.id

The following figure illustrates the three blocks:

Now, Sherly needs to get votes from another district where most people only vote for candidates who hate sharks. So, how can Sherly tamper with the blockchain to make this work in her favor? Sherly could broadcast four blocks!

block_E = Block()
block_E.id = 5
block_E.history = 'Sherly hates shark'
block_E.parent_id = block_D.id

block_F = Block()
block_F.id = 6
block_F.history = 'Sherly loves dog'
block_F.parent_id = block_E.id

block_G = Block()
block_G.id = 7
block_G.history = 'Sherly loves turtle'
block_G.parent_id = block_F.id

block_H = Block()
block_H.id = 8
block_H.history = 'Sherly loves unicorn'
block_H.parent_id = block_G.id

The following figure illustrates the four blocks:

The miners will choose Sherly's blockchain, which is now the longer one (four new blocks against three), instead of the previous blockchain they kept, which contains the history Sherly loves shark. So, Sherly has been able to change the history. This is what we call a double-spending attack.

We can prevent this through proof of work (an incentive for adding blocks). We explained proof of work earlier in this chapter, but we haven't explained the incentive system yet. An incentive means that if the miner successfully adds a new block to the blockchain, the system gives them a digital reward. We can integrate it into the code as follows:

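import hashlib

miner_id = 'miner_1'    # hypothetical identifier for the miner doing the work
reward = {miner_id: 0}  # illustrative reward ledger, not a real incentive system

payload = b'{"history": "Sky loves turtle", "parent_id": 3, "id": 4}'
for i in range(10000000):
  nonce = str(i).encode('utf-8')
  result = hashlib.sha256(payload + nonce).hexdigest()
  if result[0:5] == '00000':
    # We made it, time to claim the prize
    reward[miner_id] += 1
    print(i)
    print(result)
    break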

If Sherly wants to alter the history (by replacing some blocks), she needs to spend some resources by solving four puzzles in a short time. By the time she finishes doing this, the blockchain kept by most miners would likely have added more blocks, making it longer than Sherly's blockchain.

This is the case because most miners want to get that reward we spoke of in the most efficient manner possible. To do this, they would get a new candidate block, work hard to find the answer in proof of work, and then add it to the longest chain as quickly as possible. But, why do they want to add it to the longest chain and not another chain? This is because it secures their reward.

Say we have two versions of the blockchain. One has three blocks, while the other has eight blocks. The most sensible way to add a new block is to add it to the blockchain that has eight blocks. If someone adds it to the blockchain that has three blocks, it is more likely to get discarded. Consequently, the reward would be taken away from the miner. The longest chain attracts the most miners anyway, and you want to be in the blockchain version that is kept by more people.

Some miners could persist in adding the block to the blockchain with three blocks, while other miners could also persist in adding the block to the blockchain with eight blocks. We call this a hard fork. Most of the time, miners will stick to the blockchain that has the longest chain.

To change the history, Sherly would need to outgun more than 50% of the miners, which is practically impossible. The older the block, the more secure the history in that block is. Say one person needs 5 minutes to do the puzzle work. In this case, to replace the last five blocks in the blockchain, Sherly needs at least 30 minutes (because Sherly needs at least six blocks, at 5 minutes each, to convince miners to replace the last five blocks in their blockchain). But in those 30 minutes, other miners would keep adding new blocks to the most popular blockchain. So when the 30 minutes have passed, the most popular blockchain would have gained additional blocks of its own and would still be longer than Sherly's! Maybe the miners take a nap for an hour and don't add any more blocks. In this case, Sherly could accumulate six blocks to tamper with the most popular blockchain. However, the incentive embedded in the blockchain keeps the miners awake 24/7, as they want to earn as many rewards as possible. Consequently, it's a losing battle for Sherly.

 

 

Coding on the blockchain


As this book is being written, the two most popular cryptocurrencies are Bitcoin and Ethereum (once in a while, Ripple will take second place). If you ask a simple question to someone who knows a lot about cryptocurrencies, you may get this answer: Bitcoin is just for sending money, but you can create a program on Ethereum. The program can be tokens, an auction, or an escrow, among many other things. But that is a half-truth. You can also create a program on Bitcoin. Usually, people call this program a script. In fact, it is a must to provide a script in a Bitcoin transaction. A transaction in Bitcoin can be mundane, so if I want to send you 1 BTC (a unit of currency in Bitcoin) and your Bitcoin address is Z, I need to upload a script like this into the Bitcoin blockchain:

What's your public key? If the public key is hashed, does it equal Z? If yes, could you provide a signature made with your private key to prove that you own this public key?

But it could be a little bit fancier. Let's say you want to require at least two signatures from four authorized signers to unlock this account; you can do that with Bitcoin script. Think creatively and you can come up with something like this:

This transaction is frozen until 5 years from now. Then business will be as usual: the spender must provide the public key and a signature made with the private key.

But a Bitcoin script is created with a simple programming language, incapable of even looping. It is stack-based. So, you put instructions: hash the public key, check a signature, and check the current time. Then, it will be executed on the Bitcoin node from left to right.
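To make the stack-based idea concrete, here is a toy interpreter in Python. It is not real Bitcoin Script: the opcode names DUP, HASH, EQUALVERIFY, and CHECKSIG are only loosely modeled on it, the hash is plain SHA-256, and the signature check is a stand-in that compares strings instead of verifying a digital signature. All names and sample values are assumptions for this sketch.

```python
import hashlib


def sha256_hex(data):
    return hashlib.sha256(data.encode('utf-8')).hexdigest()


def run_script(script):
    # Execute instructions from left to right. There is no loop or jump,
    # so the script always stops after len(script) steps.
    stack = []
    for item in script:
        if item == 'DUP':
            stack.append(stack[-1])
        elif item == 'HASH':
            stack.append(sha256_hex(stack.pop()))
        elif item == 'EQUALVERIFY':
            if stack.pop() != stack.pop():
                return False
        elif item == 'CHECKSIG':
            public_key = stack.pop()
            signature = stack.pop()
            # Stand-in check only: a real node verifies a digital signature.
            stack.append(signature == 'signed-by:' + public_key)
        else:
            stack.append(item)  # plain data is pushed onto the stack
    return bool(stack) and stack[-1] is True


address_z = sha256_hex('my-public-key')
unlock = ['signed-by:my-public-key', 'my-public-key']
lock = ['DUP', 'HASH', address_z, 'EQUALVERIFY', 'CHECKSIG']

print(run_script(unlock + lock))                        # True
print(run_script(['signed-by:x', 'other-key'] + lock))  # False
```

The lock part asks the same three questions as the plain-language script earlier: what is the public key, does its hash equal Z, and is there a valid signature for it.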

This means that you cannot create a fancy program, such as an auction, on Bitcoin. Bitcoin is designed just to store and transfer value (money), so it is purposely designed to avoid complex programs. In a Bitcoin node, every script is executed. Without a loop, a Bitcoin script stays simple, and you know when it will stop. But if you had a loop in a Bitcoin script, you would not know when it would stop. It could stop in the fourth iteration, or the millionth iteration, or in the faraway future.

Some people were not satisfied with this limitation, so Ethereum was created. The programming language that you are equipped with on the Ethereum blockchain is much more sophisticated than the programming language in Bitcoin (there is a while or for construct). Technically speaking, you could create a program that runs forever in the Ethereum blockchain.

 

On Ethereum, you can do what you can do in Bitcoin, which is store and transfer value. But there is so much more that you can do in Ethereum. You could create a voting program, an escrow service, an online auction, and even another cryptocurrency on top of it. So, people like to differentiate the currencies of Bitcoin (BTC) and Ethereum (ETH). BTC is like digital gold. ETH is like oil and gas. Both are valuable, if we take that analogy. But you can use oil and gas to create a whole new world, such as by creating plastics, fuel, and so on. On the other hand, what you can do with gold is quite limited, other than creating jewelry.

Creating a cryptocurrency on top of Ethereum is very easy. All you need is a weekend if you are a skilled programmer. You just inherit a class, and set your token's name and supply limit. Then, you compile it and launch it on the Ethereum production blockchain, and you have your own cryptocurrency. Prior to this, creating another cryptocurrency meant forking Bitcoin. The skill level required to do that is quite deep (C++, CMake, and replacing many parts of files in Bitcoin Core).

Other types of blockchain programmers

This chapter is intended to give you an intuitive understanding of how blockchain works. However, it does not cover the full scope of how it works. My explanation differs quite a lot from how Bitcoin works (and even Ethereum).

Ethereum does not use SHA-256 for hashing; it commonly uses the Keccak-256 algorithm. In our case, we only put one history/transaction/payload in one block, but Bitcoin can save more than 1,000 transactions in one block. We generated a private key and public key by using RSA cryptography, while Bitcoin and Ethereum use elliptic curve cryptography. In our case, the payload is history (who likes/loves/hates an animal), but in Bitcoin it's a transaction that has a dependency on the previous payload. In Ethereum, it's the state of programs. So, if you have a variable a equal to the integer 5, the payload could be something like change variable a to integer 7.

In the Bitcoin consensus, we choose the blockchain that has the most hashing power behind it, not the one that has the most blocks. For example, blockchain A has two blocks, but each block has an answer that solves the puzzle with 12 leading zeros, while blockchain B has ten blocks, but each block has an answer that solves the puzzle with only five leading zeros. In this situation, blockchain A has the most hash rate power.
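The comparison between blockchain A and blockchain B can be checked with a rough calculation. This sketch assumes that finding a hash with n leading hexadecimal zeros takes about 16 ** n attempts on average and sums that over the blocks. The chain_work helper is a name made up here, and Bitcoin itself derives the work from a numeric target rather than by counting zeros.

```python
def chain_work(leading_zeros_per_block):
    # Expected number of hash attempts for n leading hexadecimal zeros
    # is about 16 ** n; add that up over all blocks in the chain.
    return sum(16 ** zeros for zeros in leading_zeros_per_block)


blockchain_a = [12, 12]  # two blocks, 12 leading zeros each
blockchain_b = [5] * 10  # ten blocks, 5 leading zeros each

print(chain_work(blockchain_a))  # 562949953421312
print(chain_work(blockchain_b))  # 10485760
print(chain_work(blockchain_a) > chain_work(blockchain_b))  # True
```

Even with five times as many blocks, blockchain B represents only a tiny fraction of the work behind blockchain A.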

Now, we go back to the following questions: what does it mean to be a blockchain programmer? How many types of Blockchain programmers are there? What is the scope of this book?

 

Blockchain programming could mean that you are working on improving the state of Bitcoin or creating a fork of Bitcoin, such as Bitcoin Cash. You need C++ and Python. If you are creating a Bitcoin fork, such as Bitcoin Gold, you need to dig deeper into cryptography. In Bitcoin Gold, the developers changed the proof of work hashing function from SHA-256 to Equihash because Equihash is ASIC resistant. ASIC resistance means you cannot create a specific machine to do the hashing. You need a computer with a GPU to do the Equihash hashing function, but this book will not discuss that.

Furthermore, Blockchain programming could mean that you are working on improving the Ethereum Virtual Machine. You need Go, C++, or Python. You need to understand how to interact with low-level cryptographic library functions. An intuitive understanding of how basic cryptography works is not enough, but this book will not discuss that either.

Blockchain programming could mean that you are writing a program on top of Ethereum. You need Solidity or Vyper for this, which this book will discuss. You only need an intuitive understanding of how basic cryptography works, because the low-level cryptography is abstracted away from you. Once in a while, you might use a hashing function in a program you write, but nothing fancy.

Blockchain programming could mean that you are writing a program to interact with the program on top of Ethereum, which sounds meta. But what you will need for this depends on the platform. If it is a mobile app, you need Kotlin, Java, Swift, Obj-C, or even C++. If it is a web frontend, you will most likely need JavaScript. Only an intuitive understanding of how basic cryptography works is needed. This book will discuss some of this.

This is the same as if I asked you, what does it entail when someone wants to become a web developer? The answer is quite diverse. Should I learn Ruby, Java, PHP, or Python? Should I learn Ruby on Rails, Laravel, or Django?

This book is going to teach you how to build a program on top of Ethereum (not to be confused with building Ethereum itself). Comparing this with web development, this is like saying that this book is going to teach you how to build a web application using Ruby on Rails, but the book does not teach you how to dissect the Ruby on Rails framework itself. This does not mean that the internals of Ruby on Rails are not important, it just means that most of the time, you don't need them.

This book will teach you to use the Python programming language, assuming that you have basic knowledge of Python already. But why Python? The answer is a cliché: Python is one of the easiest and most popular programming languages. It lowers the barrier to entry for someone who wants to jump into blockchain.

 

 

 

Summary


In this chapter, we looked into the technology behind cryptocurrencies such as Bitcoin and Ethereum. This technology enables the decentralization of storing values or code. We also covered cryptography by using private and public keys to secure the integrity of any data. Further on, we learned about hash functions, proof of work, consensus, and the basic concepts of blockchain programming.

In the next chapter, we will learn about smart contracts, a kind of program that lives in Ethereum. A smart contract is different from a program that lives on a server, such as an application written with Ruby on Rails, Laravel, or Django. The differences are more than just the syntax; the concept is radically different from that of a normal web application.

 

About the Author

  • Arjuna Sky Kok

    Arjuna Sky Kok has more than 10 years of experience as a software engineer. He has developed web applications using Symfony, Laravel, Ruby on Rails, and Django. He has also built mobile applications on top of the Android and iOS platforms.

    Currently, he is researching Ethereum technology. Other than that, he teaches Android and iOS programming to students.

    He graduated from Bina Nusantara University with majors in Computer Science and Applied Mathematics. He always strives to become a holistic person by enjoying leisure activities, such as dancing Salsa, learning French, and playing StarCraft 2. He lives quietly in the bustling city of Jakarta.

    In loving memory of my late brother, Hengdra Santoso (1979-2011).
