Python for Finance Cookbook - Second Edition

By Eryk Lewinson

About this book

Python is one of the most popular programming languages in the financial industry, with a huge collection of accompanying libraries. In this new edition of the Python for Finance Cookbook, you will explore classical quantitative finance approaches to data modeling, such as GARCH, CAPM, factor models, as well as modern machine learning and deep learning solutions. You will use popular Python libraries that, in a few lines of code, provide the means to quickly process, analyze, and draw conclusions from financial data. In this new edition, more emphasis was put on exploratory data analysis to help you visualize and better understand financial data. While doing so, you will also learn how to use Streamlit to create elegant, interactive web applications to present the results of technical analyses. Using the recipes in this book, you will become proficient in financial data analysis, be it for personal or professional projects. You will also understand which potential issues to expect with such analyses and, more importantly, how to overcome them.

Publication date: December 2022
Publisher: Packt
Pages: 740
ISBN: 9781803243191
Download code from GitHub

Acquiring Financial Data

The first chapter of this book is dedicated to a very important (some may say the most important) part of any data science or quantitative finance project: gathering data. In line with the famous adage “garbage in, garbage out,” we should strive to obtain data of the highest possible quality and then correctly preprocess it for later use with statistical and machine learning algorithms. The reason is simple: the results of our analyses depend heavily on the input data, and no sophisticated model can compensate for poor inputs. That is also why we should use our (or someone else’s) understanding of the economic and financial domain to justify the choice of data, for example, when modeling stock returns.

One of the most frequently reported issues among the readers of the first edition of this book was getting high-quality data. That is why in this chapter we spend more time exploring different sources of financial data. While quite a few of these vendors offer similar information (prices, fundamentals, and so on), they also offer additional, unique data that can be downloaded via their APIs. An example could be company-related news articles or pre-computed technical indicators. That is why we will download different types of data depending on the recipe. However, be sure to inspect the documentation of the library/API, as most likely its vendor also provides standard data such as prices.

Additional examples are also covered in the Jupyter notebooks, which you can find in the accompanying GitHub repository.

The data sources in this chapter were selected intentionally not only to showcase how easy it can be to gather high-quality data using Python libraries but also to show that the gathered data comes in many shapes and sizes.

Sometimes we will get a nicely formatted pandas DataFrame, while other times it might be in JSON format or even bytes that need to be processed and then loaded as a CSV. Hopefully, these recipes will sufficiently prepare you to work with any kind of data you might encounter online.
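As a minimal sketch of the two less convenient cases, the snippet below parses a JSON string and a bytes payload containing CSV text using only the standard library. The payloads, the ticker `XYZ`, and the field names are made up for illustration; a real vendor response will have its own schema.

```python
import csv
import io
import json

# Hypothetical payloads standing in for what an API might return.
json_payload = '{"ticker": "XYZ", "prices": [{"date": "2021-01-04", "close": 100.0}]}'
csv_bytes = b"date,close\n2021-01-04,100.0\n2021-01-05,101.5\n"

# JSON text -> nested Python dictionaries and lists
parsed = json.loads(json_payload)
json_rows = parsed["prices"]

# Raw bytes -> text -> one dictionary per CSV row
csv_text = csv_bytes.decode("utf-8")
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV values arrive as strings, so numeric columns need explicit conversion.
closes = [float(row["close"]) for row in csv_rows]
print(json_rows[0]["close"], closes)
```

With pandas, the same bytes could be read in one step with `pd.read_csv(io.BytesIO(csv_bytes))`.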

Something to bear in mind while reading this chapter is that data differs among sources. This means that the prices downloaded from two vendors will most likely differ, as those vendors also get their data from different sources and might use different methods to adjust the prices for corporate actions. The best practice is to find the source you trust the most for a particular type of data (based on, for example, opinions on the internet) and then use it to download the data you need. One additional thing to keep in mind is that when building algorithmic trading strategies, the data we use for modeling should align with the live data feed used for executing the trades.

This chapter does not cover one important type of data—alternative data. This could be any type of data that can be used to generate some insights into predicting asset prices. Alternative data can include satellite images (for example, tracking shipping routes, or the development of a certain area), sensor data, web traffic data, customer reviews, etc. While there are many vendors specializing in alternative data (for example, Quandl/Nasdaq Data Link), you can also get some by accessing publicly available information via web scraping. As an example, you could scrape customer reviews from Amazon or Yelp. However, those are often bigger projects and are unfortunately outside of the scope of this book. Also, you need to make sure that web scraping a particular website is not against its terms and conditions!

Using the vendors mentioned in this chapter, you can get quite a lot of information for free. But most of those providers also offer paid tiers. Remember to do thorough research on what the data suppliers actually provide and what your needs are before signing up for any of the services.

In this chapter, we cover the following recipes:

Getting data from Yahoo Finance
Getting data from Nasdaq Data Link
Getting data from Intrinio
Getting data from Alpha Vantage
Getting data from CoinGecko

Getting data from Yahoo Finance

One of the most popular sources of free financial data is Yahoo Finance. It contains not only historical and current stock prices in different frequencies (daily, weekly, and monthly), but also calculated metrics, such as the beta (a measure of the volatility of an individual asset in comparison to the volatility of the entire market), fundamentals, earnings information/calendars, and many more.

For a long period of time, the go-to tool for downloading data from Yahoo Finance was the pandas-datareader library. The goal of the library was to extract data from a variety of sources and store it in the form of a pandas DataFrame. However, after some changes to the Yahoo Finance API, this functionality was deprecated. It is still good to be familiar with this library, as it facilitates downloading data from sources such as FRED (Federal Reserve Economic Data), the Fama/French Data Library, or the World Bank. Those might come in handy for different kinds of analyses, and some of them are presented in the following chapters.

As of now, the easiest and fastest way of downloading historical stock prices is to use the yfinance library (formerly known as fix_yahoo_finance).

For the sake of this recipe, we are interested in downloading Apple’s stock prices from the years 2011 to 2021.

How to do it…

Execute the following steps to download data from Yahoo Finance:

Import the libraries:

```
import pandas as pd
import yfinance as yf
```

Download the data:

```
df = yf.download("AAPL",
                 start="2011-01-01",
                 end="2021-12-31",
                 progress=False)
```

Inspect the downloaded data:
```
print(f"Downloaded {len(df)} rows of data.")
df
```
Running the code generates the following preview of the DataFrame:

Figure 1.1: Preview of the DataFrame with downloaded stock prices

The result of the request is a pandas DataFrame (2,769 rows) containing daily Open, High, Low, and Close (OHLC) prices, as well as the adjusted close price and volume.

Yahoo Finance automatically adjusts the close price for stock splits, that is, when a company divides the existing shares of its stock into multiple new shares, most frequently to boost the stock’s liquidity. The adjusted close price takes into account not only splits but also dividends.
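To make the split adjustment concrete, here is a minimal sketch of back-adjusting close prices for a single split. The prices, the split date, and the 4-for-1 ratio are invented for illustration, and `adjust_for_split` is a hypothetical helper rather than Yahoo Finance's actual procedure, which also accounts for dividends in the adjusted close.

```python
from datetime import date


def adjust_for_split(prices, split_date, ratio):
    """Divide every close recorded before split_date by the split ratio."""
    return {
        day: (close / ratio if day < split_date else close)
        for day, close in prices.items()
    }


# Hypothetical closes around a 4-for-1 split effective on 2020-08-31
raw_closes = {
    date(2020, 8, 27): 400.0,
    date(2020, 8, 28): 404.0,
    date(2020, 8, 31): 102.0,
}

adjusted = adjust_for_split(raw_closes, split_date=date(2020, 8, 31), ratio=4)
for day, close in adjusted.items():
    print(day, close)
```

After the adjustment, the pre-split closes are on the same scale as the post-split ones, so the series no longer shows an artificial drop on the split date.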

How it works…

The download function is very intuitive. In the most basic case, we just need to provide the ticker (symbol), and it will try to download all available data since 1950.

In the preceding example, we downloaded daily data from a specific range (2011 to 2021).

Some additional features of the download function are:

We can download information for multiple tickers at once by providing a list of tickers (["AAPL", "MSFT"]) or multiple tickers as a string ("AAPL MSFT").
We can set auto_adjust=True to download only the adjusted prices.
We can additionally download dividends and stock splits by setting actions='inline'. Those actions can also be used to manually adjust the prices or for other analyses.
Specifying progress=False disables the progress bar.
The interval argument can be used to download data in different frequencies. We could also download intraday data as long as the requested period is shorter than 60 days.
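As a rough sketch of the last point, the helper below checks the requested window before asking for intraday data. The helper names, the `1h` interval, and the example dates are illustrative choices, and the 60-day limit is simply the figure quoted above. yfinance is imported lazily so that the date check can be used on its own.

```python
from datetime import date

MAX_INTRADAY_DAYS = 60  # limit quoted in the text, taken as an assumption here


def intraday_window_ok(start, end, max_days=MAX_INTRADAY_DAYS):
    """Return True if the ISO-formatted start/end dates span fewer than max_days."""
    span = date.fromisoformat(end) - date.fromisoformat(start)
    return 0 <= span.days < max_days


def download_intraday(tickers, start, end, interval="1h"):
    """Download intraday prices for one or more tickers (hypothetical helper)."""
    if not intraday_window_ok(start, end):
        raise ValueError(
            f"Intraday data requires a window shorter than {MAX_INTRADAY_DAYS} days."
        )
    import yfinance as yf  # imported lazily; only needed for the actual download

    return yf.download(tickers, start=start, end=end,
                       interval=interval, progress=False)


print(intraday_window_ok("2022-01-03", "2022-01-31"))
print(intraday_window_ok("2011-01-01", "2021-12-31"))
```

Passing a list such as `["AAPL", "MSFT"]` as `tickers` requests both symbols in a single call.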

There’s more…

yfinance also offers an alternative way of downloading the data—via the Ticker class. First, we need to instantiate the object of the class:

```
aapl_data = yf.Ticker("AAPL")
```

To download the historical price data, we can use the history method:

```
aapl_data.history()
```

By default, the method downloads the last month of data. We can use the same arguments as in the download function to specify the range and frequency.

The main benefit of using the Ticker class is that we can download much more information than just the prices. Some of the available methods and attributes include:

info—outputs a JSON object containing detailed information about the stock and its company, for example, the company’s full name, a short business summary, which exchange it is listed on, as well as a selection of financial metrics such as the beta coefficient
actions—outputs corporate actions such as dividends and splits
major_holders—presents the names of the major holders
institutional_holders—shows the institutional holders
calendar—shows the incoming events, such as the quarterly earnings
earnings/quarterly_earnings—shows the earnings information from the last few years/quarters
financials/quarterly_financials—contains financial information such as income before tax, net income, gross profit, EBIT, and much more

Please see the corresponding Jupyter notebook for more examples and outputs of those methods.

Getting data from Nasdaq Data Link

Alternative data can be anything that is considered non-market data, for example, weather data for agricultural commodities, satellite images that track oil shipments, or even customer feedback that reflects a company’s service performance. The idea behind using alternative data is to get an “informational edge” that can then be used for generating alpha. In short, alpha is a measure of performance describing an investment strategy’s, trader’s, or portfolio manager’s ability to beat the market.

Quandl was the leading provider of alternative data products for investment professionals (including quant funds and investment banks). Recently, it was acquired by Nasdaq and is now part of the Nasdaq Data Link service. The goal of the new platform is to provide a unified source of trusted data and analytics. It offers an easy way to download data, also via a dedicated Python library.

A good starting place for financial data would be the WIKI Prices database, which contains stock prices, dividends, and splits for 3,000 US publicly traded companies. The drawback of this database is that as of April 2018, it is no longer supported (meaning there is no recent data). However, for purposes of getting historical data or learning how to access the databases, it is more than enough.

We use the same example that we used in the previous recipe—we download Apple’s stock prices for the years 2011 to 2021.

Getting ready

Before downloading the data, we need to create an account at Nasdaq Data Link (https://data.nasdaq.com/) and then authenticate our email address (otherwise, an exception is likely to occur while downloading the data). We can find our personal API key in our profile (https://data.nasdaq.com/account/profile).

How to do it…

Execute the following steps to download data from Nasdaq Data Link:

Import the libraries:

```
import pandas as pd
import nasdaqdatalink
```

Authenticate using your personal API key:
```
nasdaqdatalink.ApiConfig.api_key = "YOUR_KEY_HERE"
```
You need to replace YOUR_KEY_HERE with your own API key.

Download the data:

```
df = nasdaqdatalink.get(dataset="WIKI/AAPL",
                        start_date="2011-01-01",
                        end_date="2021-12-31")
```

Inspect the downloaded data:
```
print(f"Downloaded {len(df)} rows of data.")
df.head()
```
Running the code generates the following preview of the DataFrame:

Figure 1.2: Preview of the downloaded price information

The result of the request is a DataFrame (1,818 rows) containing the daily OHLC prices, the adjusted prices, dividends, and potential stock splits. As we mentioned in the introduction, the data is limited and is only available until April 2018; the last observation actually comes from March 27, 2018.

How it works…

The first step after importing the required libraries was authentication using the API key. When providing the dataset argument, we used the following structure: DATASET/TICKER.

We should keep the API keys secure and private, that is, not share them in public repositories, or anywhere else. One way to make sure that the key stays private is to create an environment variable (how to do it depends on your operating system) and then load it in Python. To do so, we can use the os module. To load the NASDAQ_KEY variable, we could use the following code: os.environ.get("NASDAQ_KEY").
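A minimal sketch of that approach follows. The `load_api_key` helper and the dummy value are illustrative; in practice, the variable would be set outside of Python, in the shell or the operating system's settings.

```python
import os


def load_api_key(variable_name="NASDAQ_KEY"):
    """Read an API key from an environment variable, failing loudly if unset."""
    api_key = os.environ.get(variable_name)
    if not api_key:
        raise RuntimeError(f"Environment variable {variable_name} is not set.")
    return api_key


# For demonstration only: provide a dummy value if the variable is missing.
os.environ.setdefault("NASDAQ_KEY", "dummy-key-for-illustration")

api_key = load_api_key()
print(f"Loaded a key of length {len(api_key)}.")
```

The returned value can then be assigned to `nasdaqdatalink.ApiConfig.api_key` instead of a hard-coded string.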

Some additional details on the get function are:

We can specify multiple datasets at once using a list such as ["WIKI/AAPL", "WIKI/MSFT"].
The collapse argument can be used to define the frequency (available options are daily, weekly, monthly, quarterly, or annually).
The transform argument can be used to carry out some basic calculations on the data prior to downloading. For example, we could calculate row-on-row change (diff), row-on-row percentage change (rdiff), or cumulative sum (cumul) or scale the series to start at 100 (normalize). Naturally, we can easily do the very same operation using pandas.
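To make the four transformations concrete, the sketch below implements them in plain Python on a short, made-up series. The definitions follow the descriptions above. How the API treats the first row is not stated in the text, so dropping it in `diff` and `rdiff` is an assumption. In pandas, the rough equivalents are `Series.diff()`, `Series.pct_change()`, `Series.cumsum()`, and dividing the series by its first value and multiplying by 100.

```python
import itertools


def diff(values):
    """Row-on-row change: y[t] - y[t-1]."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def rdiff(values):
    """Row-on-row percentage change: (y[t] - y[t-1]) / y[t-1]."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


def cumul(values):
    """Cumulative sum of the series."""
    return list(itertools.accumulate(values))


def normalize(values):
    """Scale the series so that the first observation equals 100."""
    return [value * 100 / values[0] for value in values]


prices = [50.0, 55.0, 44.0]  # hypothetical prices
print(diff(prices))
print(rdiff(prices))
print(cumul(prices))
print(normalize(prices))
```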

There’s more...

Nasdaq Data Link distinguishes two types of API calls for downloading data. The get function we used before is classified as a time-series API call. We can also use the tables API call with the get_table function.

Download the data for multiple tickers using the get_table function:

```
COLUMNS = ["ticker", "date", "adj_close"]
df = nasdaqdatalink.get_table("WIKI/PRICES",
                              ticker=["AAPL", "MSFT", "INTC"],
                              qopts={"columns": COLUMNS},
                              date={"gte": "2011-01-01",
                                    "lte": "2021-12-31"},
                              paginate=True)
df.head()
```

Running the code generates the following preview of the DataFrame:

Figure 1.3: Preview of the downloaded price data

This function call is a bit more complex than the one we did with the get function. We first specified the table we want to use. Then, we provided a list of tickers. As the next step, we specified which columns of the table we were interested in. We also provided the range of dates, where gte stands for greater than or equal to, while lte is less than or equal to. Lastly, we also indicated we wanted to use pagination. The tables API is limited to 10,000 rows per call. However, by using paginate=True in the function call we extend the limit to 1,000,000 rows.

Pivot the data from long format to wide:
```
df = df.set_index("date")
df_wide = df.pivot(columns="ticker")
df_wide.head()
```
Running the code generates the following preview of the DataFrame:

Figure 1.4: Preview of the pivoted DataFrame

The output of the get_table function is in the long format. However, to make our analyses easier, we might be interested in the wide format. To reshape the data, we first set the date column as an index and then used the pivot method of a pd.DataFrame.

Please bear in mind that this is not the only way to do so, and pandas contains at least a few helpful methods/functions that can be used for reshaping the data from long to wide and vice versa.
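For readers less familiar with the two layouts, the following plain-Python sketch shows what the reshaping does to the records themselves. The tickers and prices are invented, and the two helper functions are illustrative stand-ins. In pandas, `pivot`, `pivot_table`, and `unstack` go from long to wide, while `melt` and `stack` go the other way.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per (date, ticker) pair
long_rows = [
    {"date": "2021-01-04", "ticker": "AAA", "adj_close": 10.0},
    {"date": "2021-01-04", "ticker": "BBB", "adj_close": 20.0},
    {"date": "2021-01-05", "ticker": "AAA", "adj_close": 11.0},
    {"date": "2021-01-05", "ticker": "BBB", "adj_close": 19.0},
]


def long_to_wide(rows, index, columns, values):
    """Build one entry per index value, with one key per distinct column value."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[index]][row[columns]] = row[values]
    return dict(wide)


def wide_to_long(wide, index, columns, values):
    """Flatten the nested mapping back into one record per (index, column) pair."""
    return [
        {index: idx, columns: col, values: val}
        for idx, by_column in wide.items()
        for col, val in by_column.items()
    ]


wide = long_to_wide(long_rows, "date", "ticker", "adj_close")
print(wide)
print(wide_to_long(wide, "date", "ticker", "adj_close") == long_rows)
```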

Getting data from Intrinio

Another interesting source of financial data is Intrinio, which offers access to its free (with limits) database. The following list presents just a few of the interesting data points that we can download using Intrinio:

Intraday historical data
Real-time stock/option prices
Financial statement data and fundamentals
Company news
Earnings-related information
IPOs
Economic data such as the Gross Domestic Product (GDP), unemployment rate, federal funds rate, etc.
30+ technical indicators

Most of the data is free of charge, with some limits on the frequency of calling the APIs. Only the real-time price data of US stocks and ETFs requires a different kind of subscription.

In this recipe, we follow the preceding example of downloading Apple’s stock prices for the years 2011 to 2021. That is because the data returned by the API is not simply a pandas DataFrame and requires some interesting preprocessing.

Getting ready

Before downloading the data, we need to register at https://intrinio.com to obtain the API key.

Please see the following link (https://docs.intrinio.com/developer-sandbox) to understand what information is included in the sandbox API key (the free one).

How to do it…

Execute the following steps to download data from Intrinio:

Import the libraries:

```
import intrinio_sdk as intrinio
import pandas as pd
```

Authenticate using your personal API key, and select the API:
```
intrinio.ApiClient().set_api_key("YOUR_KEY_HERE")
security_api = intrinio.SecurityApi()
```
You need to replace YOUR_KEY_HERE with your own API key.

Request the data:

```
r = security_api.get_security_stock_prices(
    identifier="AAPL",
    start_date="2011-01-01",
    end_date="2021-12-31",
    frequency="daily",
    page_size=10000
)
```

Convert the results into a DataFrame:

```
df = (
    pd.DataFrame(r.stock_prices_dict)
    .sort_values("date")
    .set_index("date")
)
```

Inspect the data:

```
print(f"Downloaded {df.shape[0]} rows of data.")
df.head()
```

The output looks as follows:

Figure 1.5: Preview of the downloaded price information

The resulting DataFrame contains the OHLC prices and volume, as well as their adjusted counterparts. However, that is not all, and we had to cut out some additional columns to make the table fit the page. The DataFrame also contains information such as the split ratio, dividend, change in value, percentage change, and the 52-week rolling high and low values.

How it works…

The first step after importing the required libraries was to authenticate using the API key. Then, we selected the API we wanted to use for the recipe—in the case of stock prices, it was the SecurityApi.

To download the data, we used the get_security_stock_prices method of the SecurityApi class. The parameters we can specify are as follows:

identifier—stock ticker or another acceptable identifier
start_date/end_date—these are self-explanatory
frequency—which data frequency is of interest to us (available choices: daily, weekly, monthly, quarterly, or yearly)
page_size—defines the number of observations to return on one page; we set it to a high number to collect all the data we need in one request with no need for the next_page token

The API returns a JSON-like object. We accessed the dictionary form of the response, which we then transformed into a DataFrame. We also set the date as an index using the set_index method of a pandas DataFrame.
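When a request does not fit on a single page, the next_page token mentioned above has to be followed until it runs out. The sketch below shows that loop in a library-agnostic way: `collect_all_pages`, `fake_fetch`, and the page contents are hypothetical stand-ins. With the real SDK, the fetch function would wrap a call to get_security_stock_prices, but how the token is passed and returned is not shown in this recipe and should be checked in Intrinio's documentation.

```python
def collect_all_pages(fetch_page):
    """Call fetch_page(next_page) until no further page token is returned."""
    items = []
    next_page = None
    while True:
        page_items, next_page = fetch_page(next_page)
        items.extend(page_items)
        if not next_page:
            return items


# Stand-in for an API: maps a page token to (rows, token of the following page).
FAKE_PAGES = {
    None: ([{"date": "2021-01-04"}, {"date": "2021-01-05"}], "token-2"),
    "token-2": ([{"date": "2021-01-06"}], None),
}


def fake_fetch(next_page):
    """Return the rows and the next token for the requested page."""
    return FAKE_PAGES[next_page]


rows = collect_all_pages(fake_fetch)
print(len(rows))
```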

There’s more...

In this section, we show some more interesting features of Intrinio.

Not all information is included in the free tier. For a more thorough overview of what data we can download for free, please refer to the following documentation page: https://docs.intrinio.com/developer-sandbox.

Get Coca-Cola’s real-time stock price

You can use the previously defined security_api to get the real-time stock prices:

```
security_api.get_security_realtime_price("KO")
```

The output of the snippet is the following JSON:

```
{'ask_price': 57.57,
 'ask_size': 114.0,
 'bid_price': 57.0,
 'bid_size': 1.0,
 'close_price': None,
 'exchange_volume': 349353.0,
 'high_price': 57.55,
 'last_price': 57.09,
 'last_size': None,
 'last_time': datetime.datetime(2021, 7, 30, 21, 45, 38, tzinfo=tzutc()),
 'low_price': 48.13,
 'market_volume': None,
 'open_price': 56.91,
 'security': {'composite_figi': 'BBG000BMX289',
              'exchange_ticker': 'KO:UN',
              'figi': 'BBG000BMX4N8',
              'id': 'sec_X7m9Zy',
              'ticker': 'KO'},
 'source': 'bats_delayed',
 'updated_on': datetime.datetime(2021, 7, 30, 22, 0, 40, 758000, tzinfo=tzutc())}
```

Download news articles related to Coca-Cola

One of the potential ways to generate trading signals is to aggregate the market’s sentiment on the given company. We could do it, for example, by analyzing news articles or tweets. If the sentiment is positive, we can go long, and vice versa. Below, we show how to download news articles about Coca-Cola:

```
r = intrinio.CompanyApi().get_company_news(
    identifier="KO",
    page_size=100
)

df = pd.DataFrame(r.news_dict)
df.head()
```

This code returns the following DataFrame:

Figure 1.6: Preview of the news about the Coca-Cola company
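Returning to the idea of turning sentiment into a trading signal, the toy sketch below maps a list of per-article sentiment scores to a position. The scores are assumed to lie between -1 and 1 and to come from some sentiment model that is not covered in this recipe. The function name, the threshold, and the sample values are all illustrative.

```python
from statistics import mean


def sentiment_signal(scores, threshold=0.0):
    """Map average sentiment to a position: 1 (long), -1 (short), or 0 (flat)."""
    if not scores:
        return 0
    average = mean(scores)
    if average > threshold:
        return 1
    if average < -threshold:
        return -1
    return 0


# Hypothetical scores for three articles about one company
article_scores = [0.6, 0.2, -0.1]
print(sentiment_signal(article_scores))
```

Raising `threshold` above zero creates a neutral band in which mildly positive or negative news does not trigger a position.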

Search for companies connected to the search phrase

Running the following snippet returns a list of companies that Intrinio’s Thea AI recognized based on the provided query string:

```
r = intrinio.CompanyApi().recognize_company("Intel")
df = pd.DataFrame(r.companies_dict)
df
```

As we can see, there are quite a few companies that also contain the phrase “intel” in their names, other than the obvious search result.

Figure 1.7: Preview of the companies connected to the phrase “intel”

Get Coca-Cola’s intraday stock prices

We can also retrieve intraday prices using the following snippet:

```
response = (
    security_api.get_security_intraday_prices(identifier="KO",
                                              start_date="2021-01-02",
                                              end_date="2021-01-05",
                                              page_size=1000)
)
df = pd.DataFrame(response.intraday_prices_dict)
df
```

The snippet returns the following DataFrame containing intraday price data:

Figure 1.8: Preview of the downloaded intraday prices

Get Coca-Cola’s latest earnings record

Another interesting usage of the security_api is to recover the latest earnings records. We can do this using the following snippet:

```
r = security_api.get_security_latest_earnings_record(identifier="KO")
print(r)
```

The output of the API call contains quite a lot of useful information. For example, we can see what time of day the earnings call happened. This information could potentially be used for implementing trading strategies that act when the market opens.

Figure 1.9: Coca-Cola’s latest earnings record

Getting data from Alpha Vantage

Alpha Vantage is another popular data vendor providing high-quality financial data. Using their API, we can download the following:

Stock prices, including intraday and real-time (paid access)
Fundamentals: earnings, income statement, cash flow, earnings calendar, IPO calendar
Forex and cryptocurrency exchange rates
Economic indicators such as real GDP, Federal Funds Rate, Consumer Price Index, and consumer sentiment
50+ technical indicators

In this recipe, we show how to download a selection of crypto-related data. We start with historical daily Bitcoin prices, and then show how to query the real-time crypto exchange rate.

Getting ready

Before downloading the data, we need to register at https://www.alphavantage.co/support/#api-key to obtain the API key. Access to the API and all the endpoints is free of charge (excluding the real-time stock prices) within some bounds (5 API requests per minute; 500 API requests per day).
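The per-minute bound mentioned above can be respected with a small client-side throttle. The following is only a minimal sketch: the RateLimiter class is a hypothetical helper (it is not part of alpha_vantage), it enforces only the 5-requests-per-minute limit, and it does not track the daily quota.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls within any `period`-second window."""

    def __init__(self, max_calls=5, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._calls = deque()    # timestamps of the most recent calls

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Forget calls that have already left the sliding window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)
```

Calling limiter.wait() right before each API request would then pause the script whenever a sixth request falls within the same minute.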

How to do it…

Execute the following steps to download data from Alpha Vantage:

Import the libraries:

from alpha_vantage.cryptocurrencies import CryptoCurrencies

Authenticate using your personal API key and select the API:

ALPHA_VANTAGE_API_KEY = "YOUR_KEY_HERE"
crypto_api = CryptoCurrencies(key=ALPHA_VANTAGE_API_KEY,
                              output_format="pandas")

Download the daily prices of Bitcoin, expressed in EUR:
```
data, meta_data = crypto_api.get_digital_currency_daily(
    symbol="BTC", 
    market="EUR"
)
```
The meta_data object contains some useful information about the details of the query. You can see it below:
```
{'1. Information': 'Daily Prices and Volumes for Digital Currency',
 '2. Digital Currency Code': 'BTC',
 '3. Digital Currency Name': 'Bitcoin',
 '4. Market Code': 'EUR',
 '5. Market Name': 'Euro',
 '6. Last Refreshed': '2022-08-25 00:00:00',
 '7. Time Zone': 'UTC'}
```
The data DataFrame contains all the requested information. We obtained 1,000 daily OHLC prices, the volume, and the market capitalization. What is also noteworthy is that all the OHLC prices are provided in two currencies: EUR (as we requested) and USD (the default one).

Figure 1.10: Preview of the downloaded prices, volume, and market cap

Download the real-time exchange rate:
```
crypto_api.get_digital_currency_exchange_rate(
    from_currency="BTC", 
    to_currency="USD"
)[0].transpose()
```
Running the command returns the following DataFrame with the current exchange rate:

Figure 1.11: BTC-USD exchange rate

How it works…

After importing the alpha_vantage library, we had to authenticate using the personal API key. We did so while instantiating an object of the CryptoCurrencies class. At the same time, we specified that we would like to obtain output in the form of a pandas DataFrame. The other possibilities are JSON and CSV.
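Since the key is personal, one option is to keep it out of the notebook altogether and read it from an environment variable. This is a sketch under assumptions: load_api_key is a hypothetical helper, and the variable name ALPHA_VANTAGE_API_KEY is simply our choice, not something the library requires.

```python
import os


def load_api_key(env_var="ALPHA_VANTAGE_API_KEY"):
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the recipe."
        )
    return key


# Example usage (assumes the variable was exported in the shell beforehand):
# ALPHA_VANTAGE_API_KEY = load_api_key()
```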

In Step 3, we downloaded the daily BTC prices using the get_digital_currency_daily method. Additionally, we specified that we wanted to get the prices in EUR. By default, the method will return the requested EUR prices, as well as their USD equivalents.
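If only one of the two currencies is needed, the extra columns can be dropped after the download. The sketch below works on a small hand-made DataFrame; the column labels (such as "1a. open (EUR)") are an assumption about how the output is labeled and should be checked against the actual columns, and select_currency is a hypothetical helper.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned in Step 3; the column
# labels and values are assumed and may differ from what the API returns.
sample = pd.DataFrame(
    {
        "1a. open (EUR)": [21000.0, 21500.0],
        "1b. open (USD)": [21100.0, 21600.0],
        "4a. close (EUR)": [21500.0, 21300.0],
        "4b. close (USD)": [21600.0, 21400.0],
        "5. volume": [1200.5, 980.2],
    },
    index=pd.to_datetime(["2022-08-24", "2022-08-25"]),
)


def select_currency(df, currency):
    """Keep the columns quoted in `currency` and shorten their labels."""
    suffix = f"({currency})"
    cols = [col for col in df.columns if col.endswith(suffix)]
    out = df[cols].copy()
    # "1a. open (EUR)" -> "open"
    out.columns = [
        col.split(". ", 1)[1].replace(suffix, "").strip() for col in cols
    ]
    return out


eur_prices = select_currency(sample, "EUR")
```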

Lastly, we downloaded the real-time BTC/USD exchange rate using the get_digital_currency_exchange_rate method.

There’s more...

So far, we have used the alpha_vantage library as a middleman to download information from Alpha Vantage. However, the functionalities of the data vendor evolve faster than the third-party library and it might be interesting to learn an alternative way of accessing their API.

Import the libraries:

import requests
import pandas as pd
from io import BytesIO

Download Ethereum’s intraday data:
```
AV_API_URL = "https://www.alphavantage.co/query"
parameters = {
    "function": "CRYPTO_INTRADAY",
    "symbol": "ETH",
    "market": "USD",
    "interval": "30min",
    "outputsize": "full",
    "apikey": ALPHA_VANTAGE_API_KEY
}
r = requests.get(AV_API_URL, params=parameters)
data = r.json()
df = (
    pd.DataFrame(data["Time Series Crypto (30min)"])
    .transpose()
)
df
```
Running the snippet above returns the following preview of the downloaded DataFrame:

Figure 1.12: Preview of the DataFrame containing Ethereum’s intraday prices

We first defined the base URL used for requesting information via Alpha Vantage’s API. Then, we defined a dictionary containing the additional parameters of the request, including the personal API key. In our function call, we specified that we want to download intraday ETH prices expressed in USD and sampled every 30 minutes. We also indicated we want a full output (by specifying the outputsize parameter). The other option is compact output, which downloads the 100 most recent observations.

Having prepared the request’s parameters, we used the get function from the requests library, passing the base URL and the parameters dictionary as arguments. After obtaining the response, we accessed its content in JSON format using the json method. Lastly, we converted the element of interest into a pandas DataFrame.
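A DataFrame built this way usually needs a little cleaning, because values parsed from JSON typically arrive as strings and the index holds timestamps as text. The sketch below uses a hypothetical payload shaped like the element of interest; the field names and numbers are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical payload shaped like the "Time Series Crypto (30min)" element;
# the field names and values are assumptions used only for illustration.
payload = {
    "2022-08-25 10:30:00": {
        "1. open": "1690.10", "4. close": "1695.40", "5. volume": "312",
    },
    "2022-08-25 10:00:00": {
        "1. open": "1688.00", "4. close": "1690.10", "5. volume": "287",
    },
}

df = pd.DataFrame(payload).transpose()
df.index = pd.to_datetime(df.index)     # text timestamps -> DatetimeIndex
df = df.astype(float).sort_index()      # string values -> floats, oldest first
df.columns = [col.split(". ", 1)[1] for col in df.columns]  # "1. open" -> "open"
```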

Alpha Vantage’s documentation shows a slightly different approach to downloading this data: creating one long URL with all the parameters written into it. That works as well, but passing a dictionary of parameters is a bit neater. To see the same request URL as presented in the documentation, you can run r.request.url.
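To see how the parameters dictionary maps onto such a long URL without sending any request, we can build the query string ourselves with the standard library. This is only an illustration with a placeholder key; the URL assembled by requests should look the same for simple parameters like these.

```python
from urllib.parse import urlencode

AV_API_URL = "https://www.alphavantage.co/query"
parameters = {
    "function": "CRYPTO_INTRADAY",
    "symbol": "ETH",
    "market": "USD",
    "interval": "30min",
    "outputsize": "full",
    "apikey": "YOUR_KEY_HERE",  # placeholder, not a real key
}

# Join the base URL and the encoded parameters into a single URL.
full_url = f"{AV_API_URL}?{urlencode(parameters)}"
print(full_url)
```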

Download the upcoming earnings announcements within the next three months:

AV_API_URL = "https://www.alphavantage.co/query"
parameters = {
    "function": "EARNINGS_CALENDAR",
    "horizon": "3month",
    "apikey": ALPHA_VANTAGE_API_KEY
}
r = requests.get(AV_API_URL, params=parameters)
pd.read_csv(BytesIO(r.content))

Running the snippet returns the following output:

Figure 1.13: Preview of a DataFrame containing the downloaded earnings information

While getting the response to our API request is very similar to the previous example, handling the output is quite different.

The output of r.content is a bytes object containing the result of the query as CSV-formatted text. To mimic a regular file in memory, we can use the BytesIO class from the io module. Then, we can load that in-memory file with the pd.read_csv function, just as we would load a file from disk.
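The same pattern can be tried without calling the API. In the sketch below, the content variable is a hypothetical stand-in for r.content; its column names and rows are made up for illustration.

```python
from io import BytesIO

import pandas as pd

# Hypothetical bytes standing in for r.content; the column names and the
# rows are made up for illustration.
content = (
    b"symbol,reportDate,estimate\n"
    b"AAA,2022-10-20,1.25\n"
    b"BBB,2022-11-03,0.40\n"
)

buffer = BytesIO(content)  # file-like object backed by memory
earnings = pd.read_csv(buffer, parse_dates=["reportDate"])
```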

In the accompanying notebook, we present a few more functionalities of Alpha Vantage, such as getting the quarterly earnings data, downloading the calendar of the upcoming IPOs, and using alpha_vantage's TimeSeries module to download stock price data.

Getting data from CoinGecko

The last data source we will cover is dedicated purely to cryptocurrencies. CoinGecko is a popular data vendor and crypto-tracking website, on which you can find real-time exchange rates, historical data, information about exchanges, upcoming events, trading volumes, and much more.

We can list a few of the advantages of CoinGecko:

Completely free, and no need to register for an API key
Aside from prices, it also provides updates and news about crypto
It covers many coins, not only the most popular ones

In this recipe, we download Bitcoin’s OHLC from the last 14 days.

How to do it…

Execute the following steps to download data from CoinGecko:

Import the libraries:

from pycoingecko import CoinGeckoAPI
from datetime import datetime
import pandas as pd

Instantiate the CoinGecko API:
```
cg = CoinGeckoAPI()
```

Get Bitcoin’s OHLC prices from the last 14 days:

ohlc = cg.get_coin_ohlc_by_id(
    id="bitcoin", vs_currency="usd", days="14"
)
ohlc_df = pd.DataFrame(ohlc)
ohlc_df.columns = ["date", "open", "high", "low", "close"]
ohlc_df["date"] = pd.to_datetime(ohlc_df["date"], unit="ms")
ohlc_df

Running the snippet above returns the following DataFrame:

Figure 1.14: Preview of the DataFrame containing the requested Bitcoin prices

In the preceding table, we can see that we have obtained the requested 14 days of data, sampled every 4 hours.

How it works…

After importing the libraries, we instantiated the CoinGeckoAPI object. Then, using its get_coin_ohlc_by_id method we downloaded the last 14 days’ worth of BTC/USD exchange rates. It is worth mentioning there are some limitations of the API:

We can only download data for a predefined number of days. We can select one of the following options: 1/7/14/30/90/180/365/max.
The OHLC candles are sampled with a varying frequency depending on the requested horizon. They are sampled every 30 minutes for requests of 1 or 2 days. Between 3 and 30 days they are sampled every 4 hours. Above 30 days, they are sampled every 4 days.

The output of the get_coin_ohlc_by_id is a list of lists, which we can convert into a pandas DataFrame. We had to manually create the column names, as they were not provided by the API.
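The sampling rule described above can be written down as a small lookup, which is handy when deciding which horizon to request. The function ohlc_candle_interval is a hypothetical helper that only encodes the limits as stated in this recipe; it does not query CoinGecko.

```python
def ohlc_candle_interval(days):
    """Return the candle spacing described above for a requested horizon."""
    if days == "max":
        return "4 days"
    days = int(days)  # the recipe passes the horizon as a string, e.g. "14"
    if days <= 0:
        raise ValueError("days must be positive")
    if days <= 2:
        return "30 minutes"
    if days <= 30:
        return "4 hours"
    return "4 days"


print(ohlc_candle_interval("14"))
```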

There’s more...

We have seen that getting the OHLC prices can be a bit more difficult using the CoinGecko API as compared to the other vendors. However, CoinGecko has additional interesting information we can download using its API. In this section, we show a few possibilities.

Get the top 7 trending coins

We can use CoinGecko to acquire the top 7 trending coins—the ranking is based on the number of searches on CoinGecko within the last 24 hours. While downloading this information, we also get the coins’ symbols, their market capitalization ranking, and the latest price in BTC:

trending_coins = cg.get_search_trending()
(
    pd.DataFrame([coin["item"] for coin in trending_coins["coins"]])
    .drop(columns=["thumb", "small", "large"])
)

Using the snippet above, we obtain the following DataFrame:

Figure 1.15: Preview of the DataFrame containing the 7 trending coins and some information about them

Get Bitcoin’s current price in USD

We can also extract current crypto prices in various currencies:

cg.get_price(ids="bitcoin", vs_currencies="usd")

Running the snippet above returns Bitcoin’s real-time price:

{'bitcoin': {'usd': 47312}}
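The result is a nested dictionary keyed first by coin and then by currency. As a minimal sketch, and assuming a request for several coins and currencies returns the same nested shape, the values can be extracted or flattened as follows (the numbers are made up):

```python
# Hypothetical response shaped like the dictionary above, extended to two
# coins and two currencies; the numbers are made up.
prices = {
    "bitcoin": {"usd": 47312, "eur": 40210},
    "ethereum": {"usd": 3301, "eur": 2805},
}

# A single price is reached by indexing with the coin id and the currency.
btc_usd = prices["bitcoin"]["usd"]

# Flatten the nested dictionary into (coin, currency, price) rows.
rows = [
    (coin, currency, price)
    for coin, quotes in prices.items()
    for currency, price in quotes.items()
]
```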

In the accompanying notebook, we present a few more functionalities of pycoingecko, such as getting the crypto price in currencies other than USD, downloading the entire list of coins supported on CoinGecko (over 9,000 coins), getting each coin’s detailed market data (market capitalization, 24h volume, the all-time high, and so on), and loading the list of the most popular exchanges.

Summary

In this chapter, we have covered a few of the most popular sources of financial data. However, this is just the tip of the iceberg. Below, you can find a list of other interesting data sources that might suit your needs even better.

Additional data sources are:

IEX Cloud (https://iexcloud.io/)—a platform providing a vast trove of different financial data. A notable feature that is unique to the platform is a daily and minutely sentiment score based on the activity on Stocktwits—an online community for investors and traders. However, that API is only available in the paid plan. You can access the IEX Cloud data using pyex, the official Python library.
Tiingo (https://www.tiingo.com/) and the tiingo library.
CryptoCompare (https://www.cryptocompare.com/)—the platform offers a wide range of crypto-related data via their API. What stands out about this data vendor is that they provide order book data.
Twelve Data (https://twelvedata.com/).
polygon.io (https://polygon.io/) is a vendor of real-time and historical data (stocks, forex, and crypto), trusted by companies such as Google, Robinhood, and Revolut.
Shrimpy (https://www.shrimpy.io/) and shrimpy-python—the official Python library for the Shrimpy Developer API.

In the next chapter, we will learn how to preprocess the downloaded data for further analysis.

Join us on Discord!

To join the Discord community for this book – where you can share feedback, ask questions to the author, and learn about new releases – follow the QR code below:

https://packt.link/ips2H

About the Author

Eryk Lewinson

Eryk Lewinson received his master's degree in Quantitative Finance from Erasmus University Rotterdam. In his professional career, he has gained experience in the practical application of data science methods while working in risk management and data science departments of two "big 4" companies, a Dutch neo-broker and most recently the Netherlands' largest online retailer. Outside of work, he has written over a hundred articles about topics related to data science, which have been viewed more than 3 million times. In his free time, he enjoys playing video games, reading books, and traveling with his girlfriend.
