You're reading from Hands-On Data Analysis with Pandas - Second Edition

Product type: Book
Published in: Apr 2021
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781800563452
Edition: 2nd Edition
Author: Stefanie Molin

Stefanie Molin is a data scientist and software engineer at Bloomberg LP in NYC, tackling tough problems in information security, particularly revolving around anomaly detection, building tools for gathering data, and knowledge sharing. She has extensive experience in data science, designing anomaly detection solutions, and utilizing machine learning in both R and Python in the AdTech and FinTech industries. She holds a B.S. in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science, with minors in economics, and entrepreneurship and innovation. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

Chapter 7: Financial Analysis – Bitcoin and the Stock Market

It's time to switch gears and work on an application. In this chapter, we will explore a financial application by performing an analysis of bitcoin and the stock market. This chapter builds upon everything we have learned so far: we will extract data from the Internet; perform some exploratory data analysis; create visualizations with pandas, seaborn, and matplotlib; calculate important metrics for analyzing the performance of financial instruments using pandas; and get a taste of building some models. Note that we are not trying to learn financial analysis here, but rather to walk through an introduction to how the skills we have learned in this book can be applied to financial analysis.

This chapter is also a departure from the standard workflow in this book. Up until this point, we have been working with Python as more of a functional programming language. However, Python also supports object-oriented...

Chapter materials

For this chapter, we will be creating our own package for stock analysis. Packaging the code makes it easy for us to distribute it and for others to use it. The final product of this package is on GitHub at https://github.com/stefmolin/stock-analysis/tree/2nd_edition. Python's package manager, pip, can install packages directly from GitHub as well as build them locally, which leaves us with two ways to proceed:

  • Install from GitHub if we don't plan on editing the source code for our own use.
  • Fork and clone the repository and then install it on our machine in order to modify the code.

If we wish to install from GitHub directly, we don't need to do anything here since this was installed when we set up our environment back in Chapter 1, Introduction to Data Analysis; however, for reference, we would do the following to install packages from GitHub:

(book_env) $ pip3 install \
git...

Building a Python package

Building packages is considered good coding practice since it encourages modular, reusable code. Modular code is written as many smaller pieces that can each be reused widely without the user needing to know the implementation details of everything involved in a task. For example, when we use matplotlib to plot something, we don't need to know exactly what the code inside the functions we call is doing; it suffices to know what the inputs and outputs will be in order to build on top of it.

Package structure

A module is a single file of Python code that can be imported; window_calc.py from Chapter 4, Aggregating Pandas DataFrames, and viz.py from Chapter 6, Plotting with Seaborn and Customization Techniques, were both modules. A package is a collection of modules organized into directories. Packages can also be imported, but when we import a package we have access to certain modules inside, so we don't have to import each one...
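To make the module/package distinction concrete, here is a minimal sketch that builds a throwaway package on disk and imports it. The package name toy_analysis, its modules utils.py and reader.py, and the label() helper are all hypothetical, chosen only for illustration; this is not the layout of stock_analysis. The __init__.py file re-exports a class, so importing the package alone gives access to it.

```python
"""Build a tiny, hypothetical package in a temporary directory and import it."""
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout (NOT the real stock_analysis layout):
#   toy_analysis/
#       __init__.py   <- runs on `import toy_analysis`
#       utils.py      <- a module
#       reader.py     <- a module that uses utils via a relative import
root = Path(tempfile.mkdtemp())
pkg = root / "toy_analysis"
pkg.mkdir()

(pkg / "utils.py").write_text(
    "def label(name):\n"
    "    return f'asset:{name}'\n"
)
(pkg / "reader.py").write_text(
    "from .utils import label\n"  # relative import within the package
    "\n"
    "class Reader:\n"
    "    def __init__(self, name):\n"
    "        self.name = label(name)\n"
)
# Re-export Reader so users can write `from toy_analysis import Reader`.
(pkg / "__init__.py").write_text("from .reader import Reader\n")

# Make the temporary directory importable, then import the package.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
toy_analysis = importlib.import_module("toy_analysis")

reader = toy_analysis.Reader("bitcoin")
print(reader.name)  # asset:bitcoin
```

Importing the package executed __init__.py, which in turn loaded reader.py and utils.py, so the caller never had to import those modules individually.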

Collecting financial data

Back in Chapter 2, Working with Pandas DataFrames, and Chapter 3, Data Wrangling with Pandas, we worked with APIs to gather data; however, there are other ways to collect data from the Internet. We can use web scraping to extract data from the HTML page itself, which pandas supports through the pd.read_html() function: it returns a list containing one dataframe for each HTML table it finds on the page. For economic and financial data, an alternative is the pandas_datareader package, which the StockReader class in the stock_analysis package uses to collect financial data.

Important note

In case anything has changed with the data sources that are used in this chapter or you encounter errors when using the StockReader class to collect data, the CSV files in the data/ folder can be read in as a replacement in order to follow along with the text; for example:

pd.read_csv('data/bitcoin.csv', index_col='date', parse_dates=True...
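As a sketch of what such a fallback read produces, the snippet below parses a small in-memory CSV with the same index_col='date' and parse_dates=True arguments. The column names and prices are made up for illustration and are not the contents of data/bitcoin.csv; io.StringIO stands in for the file path so the example runs without the repository.

```python
import io

import pandas as pd

# Made-up rows standing in for a file such as data/bitcoin.csv.
csv_text = """date,open,high,low,close,volume
2020-01-02,100.0,105.0,99.0,104.0,1000
2020-01-03,104.0,106.0,101.0,102.0,1200
2020-01-06,102.0,108.0,101.5,107.5,900
"""

# index_col='date' makes the date column the index;
# parse_dates=True converts that index to datetimes.
df = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=True)

print(type(df.index).__name__)                     # DatetimeIndex
print(df.loc[pd.Timestamp('2020-01-03'), 'close'])  # 102.0

# With a DatetimeIndex, date strings can be used for slicing.
print(df.loc['2020-01-03':])
```

Parsing the index as datetimes is what makes date-based selection and resampling work later in the analysis.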

Exploratory data analysis

Now that we have our data, we want to get familiar with it. As we saw in Chapter 5, Visualizing Data with Pandas and Matplotlib, and Chapter 6, Plotting with Seaborn and Customization Techniques, creating good visualizations requires knowledge of matplotlib and, depending on the data format and the end goal for the visualization, seaborn. Just as we did with the StockReader class, we want to make it easier to visualize both individual assets and groups of assets, so rather than expecting users of our package (and, perhaps, our collaborators) to be proficient with matplotlib and seaborn, we will create wrappers around this functionality. This means that users of this package only have to be able to use the stock_analysis package to visualize their financial data. In addition, we are able to set a standard for how the visualizations look and avoid copying and pasting large amounts of code for each new analysis we want to conduct, which brings consistency...
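The wrapper idea can be sketched with a small class that holds a dataframe and exposes a plotting method with the styling baked in. The class name MiniVisualizer and its moving_average() method are hypothetical and far simpler than the package's Visualizer family; the example assumes matplotlib is installed and uses the non-interactive Agg backend so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


class MiniVisualizer:
    """Hypothetical wrapper that standardizes how a price series is plotted."""

    def __init__(self, df):
        self.data = df

    def moving_average(self, column, window, ax=None):
        """Plot `column` and its rolling mean over `window` rows; return the Axes."""
        if ax is None:
            _, ax = plt.subplots(figsize=(10, 4))
        # Raw series plus its rolling mean, with consistent labels and styling.
        self.data[column].plot(ax=ax, label=column, alpha=0.5)
        self.data[column].rolling(window).mean().plot(
            ax=ax, label=f"{window}-period MA", linestyle="--"
        )
        ax.set_ylabel(column)
        ax.legend()
        return ax


# Usage with a synthetic random-walk series (not real market data).
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=60, freq="D")
prices = pd.DataFrame({"close": 100 + rng.normal(0, 1, 60).cumsum()}, index=dates)

ax = MiniVisualizer(prices).moving_average("close", window=7)
```

Because the method returns the Axes and accepts an optional ax, callers can still customize or combine plots, while every chart produced through the wrapper shares the same look.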

Technical analysis of financial instruments

With technical analysis of assets, metrics (such as cumulative returns and volatility) are calculated to compare various assets to each other. As with the previous two sections in this chapter, we will be writing a module with classes to help us. We will need the StockAnalyzer class for technical analysis of a single asset and the AssetGroupAnalyzer class for technical analysis of a group of assets. These classes are in the stock_analysis/stock_analyzer.py file.

As with the other modules, we will start with our docstring and imports:

"""Classes for technical analysis of assets."""
import math
from .utils import validate_df
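Two of the metrics mentioned above can be sketched directly with pandas. This assumes a series of closing prices and uses common textbook definitions: cumulative returns as the running product of (1 + daily percent change), minus 1, and annualized volatility as the standard deviation of daily returns scaled by sqrt(252), where 252 is an assumed number of trading days per year. The function names are hypothetical, and the stock_analysis package's own methods may define or parameterize these metrics differently.

```python
import math

import pandas as pd


def cumulative_returns(close):
    """Running return relative to the first close: prod(1 + r_t) - 1."""
    return (1 + close.pct_change()).cumprod() - 1


def annualized_volatility(close, periods_per_year=252):
    """Std. dev. of daily returns scaled to a year (252 trading days assumed)."""
    return close.pct_change().std() * math.sqrt(periods_per_year)


# Made-up closing prices for illustration only.
close = pd.Series(
    [100.0, 110.0, 99.0, 108.9],
    index=pd.date_range("2021-01-04", periods=4, freq="B"),
)

print(cumulative_returns(close).round(4).tolist())  # [nan, 0.1, -0.01, 0.089]
print(annualized_volatility(close))
```

The first cumulative return is NaN because there is no prior close to compute a change from; the last value matches 108.9 / 100 - 1, confirming that compounding the daily changes recovers the total return.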

The StockAnalyzer class

For analyzing individual assets, we will build the StockAnalyzer class, which calculates metrics for a given asset. The following UML diagram shows all the metrics that it provides:

Figure 7.19 – Structure of the StockAnalyzer...
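A stripped-down class in the same spirit might look like the sketch below; it is not the package's actual implementation. The class name MiniStockAnalyzer is hypothetical, and it uses the standard floor-trader pivot-point formulas for three levels of support and resistance, computed from the most recent row of OHLC data. The real StockAnalyzer may use different formulas, inputs, or method names.

```python
import pandas as pd


class MiniStockAnalyzer:
    """Hypothetical, stripped-down analyzer for one asset's OHLC data."""

    def __init__(self, df):
        missing = {"high", "low", "close"} - set(df.columns)
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        self.data = df
        # Use the most recent row; cast to plain floats for clean output.
        last = df.iloc[-1]
        self._high = float(last["high"])
        self._low = float(last["low"])
        self._close = float(last["close"])

    @property
    def pivot_point(self):
        """Classic pivot point: (high + low + close) / 3."""
        return (self._high + self._low + self._close) / 3

    def resistance(self, level=1):
        """Resistance levels 1-3 from the standard floor-trader formulas."""
        p, hi, lo = self.pivot_point, self._high, self._low
        levels = {1: 2 * p - lo, 2: p + (hi - lo), 3: hi + 2 * (p - lo)}
        return levels[level]

    def support(self, level=1):
        """Support levels 1-3 from the standard floor-trader formulas."""
        p, hi, lo = self.pivot_point, self._high, self._low
        levels = {1: 2 * p - hi, 2: p - (hi - lo), 3: lo - 2 * (hi - p)}
        return levels[level]


# Made-up single-day OHLC values for illustration.
day = pd.DataFrame({"high": [12.0], "low": [9.0], "close": [12.0]})
analyzer = MiniStockAnalyzer(day)
print(analyzer.pivot_point)                         # 11.0
print([analyzer.resistance(i) for i in (1, 2, 3)])  # [13.0, 14.0, 16.0]
print([analyzer.support(i) for i in (1, 2, 3)])     # [10.0, 8.0, 7.0]
```

Validating the columns in the constructor means every metric method can assume the data it needs is present, which is the same motivation behind the validate_df import above.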

Modeling performance using historical data

The goal of this section is to give us a taste of how to build some models; as such, the following examples are not meant to be the best possible model, but rather a simple and relatively quick implementation for learning purposes. Once again, the stock_analysis package has a class for this section's task: StockModeler.

Important note

To fully understand the statistical elements of this section and modeling in general, we need a solid understanding of statistics; however, the purpose of this discussion is to show how modeling techniques can be applied to financial data without dwelling on the underlying mathematics.

The StockModeler class

The StockModeler class will make it easier for us to build and evaluate some simple financial models without needing to interact directly with the statsmodels package. In addition, we will reduce the number of steps that are needed to generate a model with the methods we create. The following...
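StockModeler wraps statsmodels; as a dependency-light stand-in, the sketch below fits one simple kind of linear regression, regressing each day's close on the previous day's close, using numpy's least-squares solver instead of statsmodels. The helper names fit_lagged_regression and predict_next are hypothetical, and the single one-day lag is an assumption for illustration, not necessarily what the package does.

```python
import numpy as np
import pandas as pd


def fit_lagged_regression(close):
    """Fit close_t = intercept + slope * close_{t-1} by ordinary least squares.

    Returns (intercept, slope). Hypothetical helper for illustration.
    """
    # Pair each close with the previous day's close; drop the first row (no lag).
    frame = pd.DataFrame({"y": close, "x": close.shift(1)}).dropna()
    # Design matrix with a column of ones for the intercept.
    X = np.column_stack([np.ones(len(frame)), frame["x"].to_numpy()])
    coef, *_ = np.linalg.lstsq(X, frame["y"].to_numpy(), rcond=None)
    return float(coef[0]), float(coef[1])


def predict_next(close, intercept, slope):
    """One-step-ahead forecast from the last observed close."""
    return intercept + slope * close.iloc[-1]


# Synthetic, noise-free series following close_t = 1 + 0.9 * close_{t-1}.
values = [0.0]
for _ in range(20):
    values.append(1 + 0.9 * values[-1])
close = pd.Series(values, index=pd.date_range("2020-01-01", periods=21, freq="D"))

intercept, slope = fit_lagged_regression(close)
print(round(intercept, 6), round(slope, 6))  # 1.0 0.9
print(predict_next(close, intercept, slope))
```

Because the synthetic series was generated from a known relationship, recovering an intercept of 1 and a slope of 0.9 checks that the regression is set up correctly; real prices are noisy, so the fit would only approximate the relationship.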

Summary

In this chapter, we saw how building Python packages for our analysis applications can make it very easy for others to carry out their own analyses and reproduce ours, as well as for us to create repeatable workflows for future analyses.

The stock_analysis package we created in this chapter contained classes for gathering stock data from the Internet (StockReader); visualizing individual assets or groups of them (Visualizer family); calculating metrics for single assets or groups of them for comparisons (StockAnalyzer and AssetGroupAnalyzer, respectively); and time series modeling with decomposition, ARIMA, and linear regression (StockModeler). We also got our first look at using the statsmodels package in the StockModeler class. This chapter showed us how the pandas, matplotlib, seaborn, and numpy functionality that we've covered so far in this book has come together and how these libraries can work harmoniously with other packages for custom applications. I strongly...

Exercises

Use the stock_analysis package to complete the following exercises. Unless otherwise noted, use data from 2019 through the end of 2020. In case there are any issues collecting the data with the StockReader class, backup CSV files are provided in the exercises/ directory:

  1. Using the StockAnalyzer and StockVisualizer classes, calculate and plot three levels of support and resistance for Netflix's closing price.
  2. With the StockVisualizer class, look at the effect of after-hours trading on the FAANG stocks:

    a) As individual stocks

    b) As a portfolio using the make_portfolio() function from the stock_analysis.utils module

  3. Using the StockVisualizer.open_to_close() method, create a plot that fills the area between the FAANG stocks' opening price (as a portfolio) and its closing price each day in red if the price declined and in green if the price increased. As a bonus, do the same for a portfolio of bitcoin and the S&P 500.
  4. Mutual funds and exchange...
