Discover the Secrets of the Bitcoin Blockchain

Alex Leishman

August 10th, 2015

In a previous post Deploy Toshi Bitcoin Node with Docker on AWS you learned how to setup and run Toshi, a Bitcoin node built with Ruby, PostgreSQL and Redis. We walked through how to provision our AWS server, run our Docker containers and access Toshi via the built-in web interface.

As you might notice, it takes a very long time (~two weeks) for the blockchain to sync and for the Toshi database to be fully populated. However, Toshi sees new transactions in the Bitcoin network once it starts running, which allows us to start playing with the data soon after it is up and running.

In this post we will walk through a few simple ways to analyze our new blockchain data to gain insights into and uncover some mysteries of the Blockchain.

First we need a way to access our database in the toshi_db Docker container. I’ve found that the easiest way to do that is to setup a new container running ‘psql’, the PostgreSQL interactive terminal.

Run the command below, which will take you to the psql command line inside a new container.

toshi@ip-172-31-62-77:~$ sudo docker run -it --link toshi_db:postgres 
postgres sh -c 'exec psql -h "172.17.0.3" -p "5432" -U postgres' psql (9.3.5) Type "help" for help. postgres=#

Remember to replace my Postgres IP with your own.

Any commands run from here are actually being done IN the psql container. Once we quit psql (‘\q’), the container will be stopped.

Let’s explore the data!

schema.txt

There are 37 tables in Toshi. Let’s find out what some of them have to offer:

1_query.txt

Here we queried the first two entries in the addresses table. We can see that Toshi stores the ’total_received’ and ‘total_sent’ values from each address. This allows us to calculate the balance of any bitcoin address with ease, something not possible with bitcoind.

We also queried the count of entries in the ‘addresses’ table, this value will continually change as your data syncs and new transactions are created in the network.

Another interesting column in this table is ‘address_type’, which tells us whether an address is a traditional Pay to PubKey Hash (P2PKH) address or a Pay to Script Hash (P2SH) address. In the ‘addresses’ table, P2PK addresses have ‘address_type = 0’ and P2SH addresses have ‘address_type=1’. Querying the amount of P2SH addresses can help us estimate the percentage of Bitcoin users taking advantage of multi-signature addresses. To learn more about different address types, look here.

If we want to learn more about a specific table in the Toshi DB, we can use the same ‘\d’ command followed by the table name. For example:

2_query.txt

This lists the columns, indices and relations of the ‘unspent_outputs’ table.

Now, let’s look for something a bit more fun – hidden messages in the blockchain.

The ‘op code’ OP_RETURN allows us to create bitcoin transactions with zero value outputs used to store a string of data. The hexadecimal code for OP_RETURN is ‘\x6a’.

Therefore, we can search for OP_RETURN outputs using the following query:

3_query.txt

If we convert each of the script hex values to UTF-8 (online tool), we get some interesting results. Most of them are just noise (they may be encrypted), but there are a few decoded values that stand out:

 (CaMarche!ޛKꈶ-誗�bَˆګ㟟ǒo⁏ˀ
"OAu=https://cpr.sm/aNSAyIRJSr
(DOCPROOFv㒶nۗ䧴꠿کRS:Cw-J䙻$:_䌽
(Test0004嬟zN畆oﱵ联؞¬⋈ǦQ<㺏֪怀

The first result includes the French expression “Ça marche”, meaning “ok, that works”. The second result includes a URL that leads to the JSON description of a colored coin asset. To learn more about colored coins, check out Coinprism. The third result includes the text ‘DOCPROOF’, which indicates that the output was used for proof of existence, allowing a user to cryptographically prove existence of a document at a point in time. The last result looks like somebody just wanted to play around and test out OP_RETURN.

Lastly, if we want to export the results of a query from our container we can copy it to SDOUT and then export it from the container log afterwards.

4_query.txt

If we quit psql (‘\q’), we find the name of our previously used psql container (now stopped):

ubuntu@ip-172-31-29-91:~$ sudo docker ps -a
CONTAINER ID        IMAGE                  COMMAND                CREATED             STATUS                      PORTS                    NAMES
c66179a35dbe        postgres:latest        "/docker-entrypoint.   2 minutes ago       Exited (0) 15 seconds ago                            sad_shockley

Then we can export the log into a CSV file:

sudo docker logs sad_shockly > data.csv

You can now ‘scp’ this CSV file back to your local machine for further analysis. Note that this file will include all commands and outputs from your psql container. It may require some manual touch-ups. You can always start a new psql container for a fresh log.

So, in just a few minutes we were able to create a new psql Docker container, allowing us to explore blockchain data in ways that are impossible or very difficult to do with bitcoind. We discovered messages that people have left in the blockchain and learned how to export any queries we make into a CSV file. We have only scratched the surface, there are many insights yet to be discovered. Happy querying!

About the Author

Alex Leishman is a software engineer who is passionate about Bitcoin and other digital currencies. He works at MaiCoin.com where he is helping to build the future of money.