Machine Learning for Algorithmic Trading - Second Edition


Product type Book
Published in Jul 2020
Publisher Packt
ISBN-13 9781839217715
Pages 822 pages
Edition 2nd Edition
Author: Stefan Jansen

Table of Contents (27 chapters)

Preface
1. Machine Learning for Trading – From Idea to Execution
2. Market and Fundamental Data – Sources and Techniques
3. Alternative Data for Finance – Categories and Use Cases
4. Financial Feature Engineering – How to Research Alpha Factors
5. Portfolio Optimization and Performance Evaluation
6. The Machine Learning Process
7. Linear Models – From Risk Factors to Return Forecasts
8. The ML4T Workflow – From Model to Strategy Backtesting
9. Time-Series Models for Volatility Forecasts and Statistical Arbitrage
10. Bayesian ML – Dynamic Sharpe Ratios and Pairs Trading
11. Random Forests – A Long-Short Strategy for Japanese Stocks
12. Boosting Your Trading Strategy
13. Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning
14. Text Data for Trading – Sentiment Analysis
15. Topic Modeling – Summarizing Financial News
16. Word Embeddings for Earnings Calls and SEC Filings
17. Deep Learning for Trading
18. CNNs for Financial Time Series and Satellite Images
19. RNNs for Multivariate Time Series and Sentiment Analysis
20. Autoencoders for Conditional Risk Factors and Asset Pricing
21. Generative Adversarial Networks for Synthetic Time-Series Data
22. Deep Reinforcement Learning – Building a Trading Agent
23. Conclusions and Next Steps
24. References
25. Index
Appendix: Alpha Factor Library

The ML4T Workflow – From Model to Strategy Backtesting

Now, it's time to integrate the various building blocks of the machine learning for trading (ML4T) workflow that we have so far discussed separately. The goal of this chapter is to present an end-to-end perspective of the process of designing, simulating, and evaluating a trading strategy driven by an ML algorithm. To this end, we will demonstrate in more detail how to backtest an ML-driven strategy in a historical market context using the Python libraries backtrader and Zipline.

The ultimate objective of the ML4T workflow is to gather evidence from historical data. This helps us decide whether to deploy a candidate strategy in a live market and put financial resources at risk. This process builds on the skills you developed in the previous chapters because it relies on your ability to:

  • Work with a diverse set of data sources to engineer informative factors
  • Design ML models that generate predictive...

How to backtest an ML-driven strategy

In a nutshell, the ML4T workflow, illustrated in Figure 8.1, is about backtesting a trading strategy that leverages machine learning to generate trading signals, select and size positions, or optimize the execution of trades. It involves the following steps, with a specific investment universe and horizon in mind:

  1. Source and prepare market, fundamental, and alternative data
  2. Engineer predictive alpha factors and features
  3. Design, tune, and evaluate ML models to generate trading signals
  4. Decide on trades based on these signals, for example, by applying rules
  5. Size individual positions in the portfolio context
  6. Simulate the resulting trades using historical market data
  7. Evaluate how the resulting positions would have performed

Figure 8.1: The ML4T workflow
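As a minimal sketch of steps 4 and 5 of the workflow above, the following function applies one simple rule: go long the assets with the highest predicted returns and short those with the lowest, then size positions equally so that each side carries half of the gross exposure. The function name `long_short_weights`, the parameter `n`, and the sample predictions are hypothetical and chosen for illustration; in practice, the predictions would come from the model designed in step 3.

```python
import numpy as np


def long_short_weights(predictions: np.ndarray, n: int) -> np.ndarray:
    """Turn one period's predicted returns into portfolio weights.

    Rule (step 4): go long the n assets with the highest predictions and
    short the n assets with the lowest predictions.
    Sizing (step 5): equal weights with 0.5 gross exposure per side, so the
    portfolio is dollar-neutral with a total gross exposure of 1.
    """
    predictions = np.asarray(predictions, dtype=float)
    if 2 * n > predictions.size:
        raise ValueError("need at least 2 * n assets")
    order = np.argsort(predictions)       # ascending: lowest predictions first
    weights = np.zeros_like(predictions)
    weights[order[:n]] = -0.5 / n         # short the bottom n
    weights[order[-n:]] = 0.5 / n         # long the top n
    return weights


# Hypothetical one-day return predictions for six assets
preds = np.array([0.010, -0.020, 0.005, 0.030, -0.010, 0.000])
w = long_short_weights(preds, n=2)
print(w)  # longs on assets 0 and 3, shorts on assets 1 and 4
```

Equal weighting is only the simplest sizing choice; the same interface could return weights proportional to prediction strength or scaled by risk instead.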

When we discussed the ML process in Chapter 6, The Machine Learning Process, we emphasized that the model's learning should...

Backtesting pitfalls and how to avoid them

Backtesting simulates an algorithmic strategy based on historical data, with the goal of producing performance results that generalize to new market conditions. In addition to the generic uncertainty around predictions in the context of ever-changing markets, several implementation aspects can bias the results and increase the risk of mistaking in-sample performance for patterns that will hold out-of-sample.

These aspects are under our control and include the selection and preparation of data, unrealistic assumptions about the trading environment, and the flawed application and interpretation of statistical tests. The risks of false backtest discoveries multiply with increasing computing power, bigger datasets, and more complex algorithms, because they make it easier to mistake random patterns in a noisy sample for genuine signals.
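To make the last point concrete, the sketch below simulates many strategies whose daily returns are pure noise, so none has a real edge, and reports the best annualized Sharpe ratio among them. The return volatility, the one-year sample length, and the trial counts are arbitrary assumptions; the point is only that the best in-sample result tends to improve as more configurations are tried, even though the true Sharpe ratio of every strategy is zero.

```python
import numpy as np


def best_sharpe_of_noise(n_trials: int, n_days: int = 252, seed: int = 42) -> float:
    """Return the highest annualized Sharpe ratio among n_trials strategies
    whose daily returns are i.i.d. normal with zero mean (no true skill)."""
    rng = np.random.default_rng(seed)
    returns = rng.normal(loc=0.0, scale=0.01, size=(n_trials, n_days))
    # Daily Sharpe ratio per strategy, annualized with the usual sqrt(252) scaling
    sharpe = returns.mean(axis=1) / returns.std(axis=1, ddof=1) * np.sqrt(252)
    return float(sharpe.max())


for trials in (1, 10, 100, 1000):
    print(f"{trials:>5} trials -> best Sharpe {best_sharpe_of_noise(trials):.2f}")
```

With one year of daily data, the estimated annualized Sharpe ratio of a zero-skill strategy has a standard deviation of roughly 1, so the best of 1,000 such strategies typically looks like a Sharpe ratio around 3 in-sample.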

In this section, we will outline the most serious and common methodological mistakes. Please refer to the literature on...

How a backtesting engine works

Put simply, a backtesting engine iterates over historical prices (and other data), passes the current values to your algorithm, receives orders in return, and keeps track of the resulting positions and their value.
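The loop just described can be sketched in a few lines of plain Python. This is a deliberately simplified, hypothetical engine: `run_backtest`, the `algorithm` callback, and its signature are illustrative names rather than any library's API. The algorithm returns a target position in shares instead of individual orders, there is a single asset, and trades fill at the current price without commissions or slippage. Because the callback only sees prices up to the current bar, it cannot use future data.

```python
from typing import Callable, List


def run_backtest(prices: List[float],
                 algorithm: Callable[[List[float], int], int],
                 cash: float = 10_000.0) -> List[float]:
    """Minimal single-asset event loop (hypothetical, for illustration).

    For each bar, pass the price history seen so far and the current
    position to `algorithm`, which returns the target number of shares.
    Trades fill at the current price, without commissions or slippage.
    Returns the portfolio value after each bar.
    """
    position = 0
    values = []
    for t, price in enumerate(prices):
        history = prices[: t + 1]                 # no access to future bars
        target = algorithm(history, position)     # target shares from the algorithm
        trade = target - position                 # shares to buy (+) or sell (-)
        cash -= trade * price                     # fill at the current price
        position = target
        values.append(cash + position * price)    # mark the portfolio to market
    return values


def buy_after_up_day(history: List[float], position: int) -> int:
    """Toy rule: hold 10 shares after a price increase, otherwise stay flat."""
    if len(history) >= 2 and history[-1] > history[-2]:
        return 10
    return 0


equity = run_backtest([100.0, 101.0, 103.0, 102.0, 104.0], buy_after_up_day)
print(equity)  # [10000.0, 10000.0, 10020.0, 10010.0, 10010.0]
```

Everything a realistic engine adds, such as order types, partial fills, transaction costs, and corporate actions, plugs into this same loop, which is where the additional complexity of event-driven backtesting comes from.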

In practice, there are numerous requirements for creating a realistic and robust simulation of the ML4T workflow that was depicted in Figure 8.1 at the beginning of this chapter. The difference between vectorized and event-driven approaches illustrates how the faithful reproduction of the actual trading environment adds significant complexity.

Vectorized versus event-driven backtesting

A vectorized backtest is the most basic way to evaluate a strategy. It simply multiplies a signal vector that represents the target position size by a vector of returns over the investment horizon to compute the period performance.
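A minimal sketch of this multiplication with NumPy, assuming a daily position signal and the matching daily asset returns; the array names and values are illustrative. The one detail that matters is alignment: the position chosen at the close of day t earns the return of day t+1, so the signal is lagged by one period before multiplying. Otherwise, the backtest would trade on information it did not yet have.

```python
import numpy as np

# Hypothetical inputs: target positions (+1 long, -1 short, 0 flat) decided at
# the close of each day, and each day's asset return.
signal = np.array([1, 1, -1, 0, 1], dtype=float)
returns = np.array([0.01, -0.02, 0.015, -0.01, 0.005])

# Lag the signal by one day: today's return is earned by yesterday's position.
position = np.concatenate([[0.0], signal[:-1]])
strategy_returns = position * returns

# Compound the period returns into cumulative performance.
cumulative = np.cumprod(1 + strategy_returns) - 1
print(strategy_returns)  # [0, -0.02, 0.015, 0.01, 0]
print(cumulative[-1])    # about 0.004647
```

The whole history is processed in a single array operation, which makes this approach fast but also explains its limits: it cannot easily model order execution, costs, or path-dependent position sizing.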

Let's illustrate the vectorized approach using the daily return predictions that we created using ridge...

backtrader – a flexible tool for local backtests

backtrader is a popular, flexible, and user-friendly Python library for local backtests with excellent documentation, developed by Daniel Rodriguez since 2015. In addition to a large and active community of individual traders, several banks and trading houses use backtrader to prototype and test new strategies before porting them to a production-ready platform built, for example, in Java. You can also use backtrader for live trading with several brokers (see the backtrader documentation and Chapter 23, Conclusions and Next Steps).

We'll first summarize the key concepts of backtrader to clarify the big picture of the backtesting workflow on this platform, and then demonstrate its usage for a strategy driven by ML predictions.

Key concepts of backtrader's Cerebro architecture

backtrader's Cerebro (Spanish for "brain") architecture represents the key components of the backtesting...

Zipline – scalable backtesting by Quantopian

The backtesting engine Zipline powers Quantopian's online research, backtesting, and live (paper) trading platform. As a hedge fund, Quantopian aims to identify robust algorithms that outperform, subject to its risk management criteria. To this end, it uses competitions to select the best strategies and allocates capital to the winners, sharing the profits with them.

Quantopian first released Zipline in 2012 as version 0.5; the latest version, 1.3, dates from July 2018. Zipline works well with its sister libraries Alphalens, pyfolio, and empyrical, which we introduced in Chapter 4, Financial Feature Engineering – How to Research Alpha Factors, and Chapter 5, Portfolio Optimization and Performance Evaluation. It also integrates well with NumPy, pandas, and other numeric libraries, but may not always support their latest versions.

Zipline is designed to operate at the scale of thousands of securities, and each can be associated with a...

Summary

In this chapter, we took a much closer look at how backtesting works, which challenges it raises, and how to manage them. We demonstrated how to use two popular backtesting libraries, backtrader and Zipline.

Most importantly, however, we walked through the end-to-end process of designing and testing an ML model, showed you how to implement trading logic that acts on the signals provided by the model's predictions, and saw how to conduct and evaluate backtests. Now, we are ready to continue exploring a much broader and more sophisticated array of ML models than the linear regressions we started with.

The next chapter will cover how to incorporate the time dimension into our models.
