You're reading from Machine Learning for Algorithmic Trading - Second Edition

Product type Book

Published in Jul 2020

Publisher Packt

ISBN-13 9781839217715

Pages 822 pages

Edition 2nd Edition

Languages

Python

Concepts

Machine Learning

Author

Stefan Jansen

Table of Contents (27 chapters)

Preface

1. Machine Learning for Trading – From Idea to Execution

2. Market and Fundamental Data – Sources and Techniques

3. Alternative Data for Finance – Categories and Use Cases

4. Financial Feature Engineering – How to Research Alpha Factors

5. Portfolio Optimization and Performance Evaluation

6. The Machine Learning Process

7. Linear Models – From Risk Factors to Return Forecasts

8. The ML4T Workflow – From Model to Strategy Backtesting

9. Time-Series Models for Volatility Forecasts and Statistical Arbitrage

10. Bayesian ML – Dynamic Sharpe Ratios and Pairs Trading

11. Random Forests – A Long-Short Strategy for Japanese Stocks

12. Boosting Your Trading Strategy

13. Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning

14. Text Data for Trading – Sentiment Analysis

15. Topic Modeling – Summarizing Financial News

16. Word Embeddings for Earnings Calls and SEC Filings

17. Deep Learning for Trading

18. CNNs for Financial Time Series and Satellite Images

19. RNNs for Multivariate Time Series and Sentiment Analysis

20. Autoencoders for Conditional Risk Factors and Asset Pricing

21. Generative Adversarial Networks for Synthetic Time-Series Data

22. Deep Reinforcement Learning – Building a Trading Agent

23. Conclusions and Next Steps

24. References

25. Index

Appendix: Alpha Factor Library

The ML4T Workflow – From Model to Strategy Backtesting

Now, it's time to integrate the various building blocks of the machine learning for trading (ML4T) workflow that we have so far discussed separately. The goal of this chapter is to present an end-to-end perspective of the process of designing, simulating, and evaluating a trading strategy driven by an ML algorithm. To this end, we will demonstrate in more detail how to backtest an ML-driven strategy in a historical market context using the Python libraries backtrader and Zipline.

The ultimate objective of the ML4T workflow is to gather evidence from historical data. This helps us decide whether to deploy a candidate strategy in a live market and put financial resources at risk. This process builds on the skills you developed in the previous chapters because it relies on your ability to:

Work with a diverse set of data sources to engineer informative factors
Design ML models that generate predictive...

How to backtest an ML-driven strategy

In a nutshell, the ML4T workflow, illustrated in Figure 8.1, is about backtesting a trading strategy that leverages machine learning to generate trading signals, select and size positions, or optimize the execution of trades. It involves the following steps, with a specific investment universe and horizon in mind:

Source and prepare market, fundamental, and alternative data
Engineer predictive alpha factors and features
Design, tune, and evaluate ML models to generate trading signals
Decide on trades based on these signals, for example, by applying rules
Size individual positions in the portfolio context
Simulate the resulting trades using historical market data
Evaluate how the resulting positions would have performed
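The signal-to-trade and position-sizing steps can be made concrete with a minimal pandas sketch. It assumes one simple rule among many possible ones: go long the assets with the highest predicted returns, short those with the lowest, and size all positions equally. The function `predictions_to_weights` and its parameters are hypothetical illustrations, not part of any library.

```python
import pandas as pd


def predictions_to_weights(predictions: pd.Series, n_long: int = 2, n_short: int = 2,
                           gross_exposure: float = 1.0) -> pd.Series:
    """Hypothetical rule: turn one date's return predictions into target weights.

    Longs the n_long highest and shorts the n_short lowest predictions, and
    splits gross exposure equally across all positions (equal-weight sizing).
    """
    if n_long < 0 or n_short < 0 or n_long + n_short == 0:
        raise ValueError("need at least one long or short position")
    if n_long + n_short > len(predictions):
        raise ValueError("more positions requested than assets available")

    ranked = predictions.sort_values()              # ascending by prediction
    shorts = ranked.index[:n_short]                 # lowest predictions
    longs = ranked.index[len(ranked) - n_long:]     # highest predictions
    per_position = gross_exposure / (n_long + n_short)

    weights = pd.Series(0.0, index=predictions.index)  # flat by default
    weights.loc[longs] = per_position
    weights.loc[shorts] = -per_position
    return weights


# Made-up predictions for five hypothetical tickers on a single date.
preds = pd.Series({"AAA": 0.012, "BBB": -0.004, "CCC": 0.007, "DDD": -0.010, "EEE": 0.001})
weights = predictions_to_weights(preds, n_long=2, n_short=2)
print(weights)
```

Equal weighting keeps the example transparent; position sizes could instead scale with prediction strength or with risk estimates for each asset.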

Figure 8.1: The ML4T workflow

When we discussed the ML process in Chapter 6, The Machine Learning Process, we emphasized that the model's learning should...

Backtesting pitfalls and how to avoid them

Backtesting simulates an algorithmic strategy based on historical data, with the goal of producing performance results that generalize to new market conditions. In addition to the generic uncertainty around predictions in the context of ever-changing markets, several implementation aspects can bias the results and increase the risk of mistaking in-sample performance for patterns that will hold out-of-sample.

These aspects are under our control and include the selection and preparation of data, unrealistic assumptions about the trading environment, and the flawed application and interpretation of statistical tests. The risks of false backtest discoveries multiply with increasing computing power, bigger datasets, and more complex algorithms that facilitate the misidentification of apparent signals in a noisy sample.
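A small simulation illustrates why trying many strategy variants on the same noisy sample is dangerous. The NumPy sketch below uses assumed parameters (1,000 variants, one year of daily returns, 1% daily volatility) to generate strategies whose expected return is exactly zero, then compares the median and the best annualized Sharpe ratio.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

n_strategies = 1_000   # assumed number of strategy variants tried
n_days = 252           # one year of daily returns
daily_vol = 0.01       # assumed 1% daily volatility

# Every column is a strategy with NO true edge: its returns are pure noise.
returns = rng.normal(loc=0.0, scale=daily_vol, size=(n_days, n_strategies))

# Annualized Sharpe ratio of each strategy (risk-free rate assumed to be zero).
sharpe = returns.mean(axis=0) / returns.std(axis=0, ddof=1) * np.sqrt(252)

print(f"Median Sharpe ratio: {np.median(sharpe):.2f}")
print(f"Best Sharpe ratio:   {sharpe.max():.2f}")
```

Because each estimated Sharpe ratio here has a standard error of roughly 1, the best of 1,000 pure-noise strategies will typically look impressive (around 3) even though none of them has any true edge.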

In this section, we will outline the most serious and common methodological mistakes. Please refer to the literature on...

How a backtesting engine works

Put simply, a backtesting engine iterates over historical prices (and other data), passes the current values to your algorithm, receives orders in return, and keeps track of the resulting positions and their value.
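This loop can be sketched in plain Python. The sketch is a toy under simplifying assumptions, not how backtrader or Zipline are implemented: orders submitted on one bar fill at the next bar's price, and there are no commissions, slippage, or partial fills. The names `Portfolio`, `run_backtest`, and `buy_and_hold` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Portfolio:
    cash: float
    positions: dict = field(default_factory=dict)  # ticker -> number of shares

    def value(self, prices: dict) -> float:
        """Mark all positions to market at the given prices."""
        return self.cash + sum(shares * prices[t] for t, shares in self.positions.items())


def run_backtest(price_history, algorithm, initial_cash=10_000.0):
    """Iterate over bars, pass prices to the algorithm, collect its orders,
    fill them on the next bar, and track the portfolio value."""
    portfolio = Portfolio(cash=initial_cash)
    pending_orders = {}
    equity_curve = []
    for prices in price_history:                  # one dict of prices per bar
        # 1. Fill orders from the previous bar at the current prices.
        for ticker, shares in pending_orders.items():
            portfolio.cash -= shares * prices[ticker]
            portfolio.positions[ticker] = portfolio.positions.get(ticker, 0) + shares
        # 2. Record the marked-to-market portfolio value.
        equity_curve.append(portfolio.value(prices))
        # 3. Pass the current data to the algorithm and receive new orders.
        pending_orders = algorithm(prices, portfolio)
    return equity_curve


def buy_and_hold(prices, portfolio):
    """Toy algorithm: buy 10 shares of a hypothetical ticker XYZ once, then hold."""
    if portfolio.positions.get("XYZ", 0) == 0:
        return {"XYZ": 10}
    return {}


prices = [{"XYZ": 100.0}, {"XYZ": 101.0}, {"XYZ": 103.0}, {"XYZ": 102.0}]
equity_curve = run_backtest(prices, buy_and_hold)
print(equity_curve)  # [10000.0, 10000.0, 10020.0, 10010.0]
```

Filling orders on the following bar is a deliberate choice: it prevents the algorithm from trading at the same price it used to make its decision, which is one source of look-ahead bias.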

In practice, there are numerous requirements for creating a realistic and robust simulation of the ML4T workflow that was depicted in Figure 8.1 at the beginning of this chapter. The difference between vectorized and event-driven approaches illustrates how the faithful reproduction of the actual trading environment adds significant complexity.

Vectorized versus event-driven backtesting

A vectorized backtest is the most basic way to evaluate a strategy. It simply multiplies a signal vector that represents the target position size with a vector of returns for the investment horizon to compute the period performance.
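The following pandas sketch shows this multiplication for a single asset with made-up returns and predictions. It assumes a simple sign-based rule (long when the prediction is positive, short when it is negative) and that each day's prediction refers to the next day's return.

```python
import numpy as np
import pandas as pd

# Made-up daily data for one asset: the prediction made at each day's close
# refers to the NEXT day's return.
dates = pd.date_range("2020-01-01", periods=6, freq="B")
returns = pd.Series([0.01, -0.02, 0.015, 0.005, -0.01, 0.02], index=dates)
predictions = pd.Series([0.004, 0.002, -0.003, 0.001, 0.006, -0.002], index=dates)

# Signal vector: target position of +1 (long) or -1 (short) from the sign.
signal = np.sign(predictions)

# Shift by one period so the position chosen on day t earns the return of
# day t + 1; multiplying the unshifted vectors would leak future information.
strategy_returns = signal.shift(1) * returns

# Compound the period returns (the first day has no position yet).
cumulative = (1 + strategy_returns.fillna(0)).prod() - 1
print(strategy_returns)
print(f"Cumulative return: {cumulative:.4%}")
```

Omitting `.shift(1)` would multiply each position by the very return it was meant to forecast, a common way for vectorized backtests to overstate performance.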

Let's illustrate the vectorized approach using the daily return predictions that we created using ridge...

backtrader – a flexible tool for local backtests

backtrader is a popular, flexible, and user-friendly Python library for local backtests with great documentation, developed since 2015 by Daniel Rodriguez. In addition to a large and active community of individual traders, there are several banks and trading houses that use backtrader to prototype and test new strategies before porting them to a production-ready platform using, for example, Java. You can also use backtrader for live trading with several brokers of your choice (see the backtrader documentation and Chapter 23, Conclusions and Next Steps).

We'll first summarize the key concepts of backtrader to clarify the big picture of the backtesting workflow on this platform, and then demonstrate its usage for a strategy driven by ML predictions.

Key concepts of backtrader's Cerebro architecture

backtrader's Cerebro (Spanish for "brain") architecture represents the key components of the backtesting...

Zipline – scalable backtesting by Quantopian

The backtesting engine Zipline powers Quantopian's online research, backtesting, and live (paper) trading platform. As a hedge fund, Quantopian aims to identify robust algorithms that outperform, subject to its risk management criteria. To this end, it uses competitions to select the best strategies and allocates capital to share profits with the winners.

Quantopian first released Zipline in 2012 as version 0.5, and the latest version, 1.3, dates from July 2018. Zipline works well with its sister libraries Alphalens, pyfolio, and empyrical, which we introduced in Chapter 4, Financial Feature Engineering – How to Research Alpha Factors, and Chapter 5, Portfolio Optimization and Performance Evaluation. It also integrates well with NumPy, pandas, and other numeric libraries, although it may not always support their latest versions.

Zipline is designed to operate at the scale of thousands of securities, and each can be associated with a...

Summary

In this chapter, we took a much closer look at how backtesting works, what challenges it presents, and how to manage them. We demonstrated how to use two popular backtesting libraries, backtrader and Zipline.

Most importantly, however, we walked through the end-to-end process of designing and testing an ML model, showed you how to implement trading logic that acts on the signals provided by the model's predictions, and saw how to conduct and evaluate backtests. Now, we are ready to continue exploring a much broader and more sophisticated array of ML models than the linear regressions we started with.

The next chapter will cover how to incorporate the time dimension into our models.