Machine Learning for Algorithmic Trading – Second Edition

By Stefan Jansen
Publisher: Packt | Published: July 2020 | ISBN-13: 9781839217715 | 822 pages | 2nd Edition
Table of Contents (27 chapters)

Preface
1. Machine Learning for Trading – From Idea to Execution
2. Market and Fundamental Data – Sources and Techniques
3. Alternative Data for Finance – Categories and Use Cases
4. Financial Feature Engineering – How to Research Alpha Factors
5. Portfolio Optimization and Performance Evaluation
6. The Machine Learning Process
7. Linear Models – From Risk Factors to Return Forecasts
8. The ML4T Workflow – From Model to Strategy Backtesting
9. Time-Series Models for Volatility Forecasts and Statistical Arbitrage
10. Bayesian ML – Dynamic Sharpe Ratios and Pairs Trading
11. Random Forests – A Long-Short Strategy for Japanese Stocks
12. Boosting Your Trading Strategy
13. Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning
14. Text Data for Trading – Sentiment Analysis
15. Topic Modeling – Summarizing Financial News
16. Word Embeddings for Earnings Calls and SEC Filings
17. Deep Learning for Trading
18. CNNs for Financial Time Series and Satellite Images
19. RNNs for Multivariate Time Series and Sentiment Analysis
20. Autoencoders for Conditional Risk Factors and Asset Pricing
21. Generative Adversarial Networks for Synthetic Time-Series Data
22. Deep Reinforcement Learning – Building a Trading Agent
23. Conclusions and Next Steps
24. References
25. Index
Appendix: Alpha Factor Library

Random Forests – A Long-Short Strategy for Japanese Stocks

In this chapter, we will learn how to use two new classes of machine learning models for trading: decision trees and random forests. We will see how decision trees learn rules from data that encode nonlinear relationships between the input and output variables. We will illustrate how to train decision trees and use them for prediction in both regression and classification problems, visualize and interpret the rules the model has learned, and tune its hyperparameters to optimize the bias-variance trade-off and prevent overfitting.
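As a preview of the tuning step, the following minimal sketch uses scikit-learn's cross-validated grid search to balance bias and variance; the synthetic data and the parameter grid are illustrative assumptions, not the book's settings.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a factor matrix and forward returns
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)

# Deeper trees and smaller leaves lower bias but raise variance
param_grid = {
    'max_depth': [2, 4, 8, None],
    'min_samples_leaf': [1, 10, 50],
}
search = GridSearchCV(DecisionTreeRegressor(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))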

Decision trees are not only important standalone models but are also frequently used as components in other models. In the second part of this chapter, we will introduce ensemble models that combine multiple individual models to produce a single aggregate prediction with lower prediction-error variance.

We will illustrate bootstrap aggregation, often called bagging, as...

Decision trees – learning rules from data

A decision tree is a machine learning algorithm that predicts the value of a target variable based on decision rules learned from data. The algorithm can be applied to both regression and classification problems by changing the objective function that governs how the tree learns the rules.
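To make this concrete, the following minimal sketch shows how changing the criterion switches the same algorithm between regression and classification; the synthetic factor data and parameter values are illustrative assumptions, not the book's code.

import numpy as np
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier

# Hypothetical factor matrix and forward returns
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
returns = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)

# Regression: minimize mean squared error within each leaf
reg_tree = DecisionTreeRegressor(criterion='squared_error', max_depth=4)
reg_tree.fit(X, returns)

# Classification: switch the objective to Gini impurity to predict direction
direction = (returns > 0).astype(int)
clf_tree = DecisionTreeClassifier(criterion='gini', max_depth=4)
clf_tree.fit(X, direction)
print(clf_tree.predict_proba(X[:3]))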

We will discuss how decision trees use rules to make predictions, how to train them to predict (continuous) returns as well as (categorical) directions of price movements, and how to interpret, visualize, and tune them effectively. See Rokach and Maimon (2008) and Hastie, Tibshirani, and Friedman (2009) for additional details and further background information.
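As a preview of the visualization tools, here is a hedged sketch using scikit-learn's export_text and plot_tree on a toy tree; the data and feature names are invented for illustration.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] - 0.5 * X[:, 1] ** 2 > 0).astype(int)

features = [f'factor_{i}' for i in range(5)]  # hypothetical factor names
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Text rendering: one line per node showing the learned split rule
print(export_text(tree, feature_names=features))

# Graphical rendering of the same rules
plot_tree(tree, feature_names=features, class_names=['down', 'up'], filled=True)
plt.show()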

How trees learn and apply decision rules

The linear models we studied in Chapter 7, Linear Models – From Risk Factors to Return Forecasts, and Chapter 9, Time-Series Models for Volatility Forecasts and Statistical Arbitrage, learn a set of parameters to predict the outcome...

Random forests – making trees more reliable

Decision trees are not only useful for their transparency and interpretability. They are also fundamental building blocks for more powerful ensemble models that combine many individual trees, while randomly varying their design to address the overfitting problems we just discussed.
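The following minimal sketch shows the two sources of randomness that de-correlate the trees in scikit-learn's RandomForestClassifier: bootstrap samples of the rows and random feature subsets at each split. The synthetic data and parameter values are illustrative assumptions, not the book's configuration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] - 0.5 * X[:, 1] ** 2 > 0).astype(int)

rf = RandomForestClassifier(
    n_estimators=500,     # number of de-correlated trees to average
    max_features='sqrt',  # random feature subset considered at each split
    bootstrap=True,       # each tree trains on a bootstrap sample of rows
    n_jobs=-1,
    random_state=42,
)
scores = cross_val_score(rf, X, y, cv=5, scoring='roc_auc')
print(f'AUC: {scores.mean():.3f} +/- {scores.std():.3f}')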

Why ensemble models perform better

Ensemble learning involves combining several machine learning models into a single new model that aims to make better predictions than any individual model. More specifically, an ensemble integrates the predictions of several base estimators, trained using one or more learning algorithms, to reduce the generalization error that these models produce on their own.

For ensemble learning to achieve this goal, the individual models must be:

  • Accurate: Outperform a naive baseline (such as the sample mean or class proportions)
  • Independent: Predictions are generated differently to produce different...
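To make the variance-reduction argument concrete, the following small simulation compares a single deep tree with a bagged ensemble of the same trees; the synthetic data and settings are assumptions for illustration. Averaging many de-correlated trees leaves the bias roughly unchanged but shrinks the variance of the aggregate prediction.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.5, size=500)

single = DecisionTreeRegressor(random_state=0)
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=200, random_state=0)

# The bagged ensemble typically scores higher and varies less across folds
for name, model in [('single tree', single), ('bagged trees', bagged)]:
    scores = cross_val_score(model, X, y, cv=5, scoring='r2')
    print(f'{name}: R^2 = {scores.mean():.3f} +/- {scores.std():.3f}')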

Long-short signals for Japanese stocks

In Chapter 9, Time-Series Models for Volatility Forecasts and Statistical Arbitrage, we used cointegration tests to identify pairs of stocks with a long-term equilibrium relationship in the form of a common trend to which their prices revert.

In this chapter, we will use the predictions of a machine learning model to identify assets that are likely to go up or down so that we can enter market-neutral long and short positions accordingly. The approach is similar to our initial trading strategy that used linear regression in Chapter 7, Linear Models – From Risk Factors to Return Forecasts, and Chapter 8, The ML4T Workflow – From Model to Strategy Backtesting.
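As an illustration of this idea, the following sketch ranks predicted returns and takes equal-weighted long and short positions in the extreme quantiles; the tickers, helper name, and quantile threshold are hypothetical, not the book's implementation.

import pandas as pd

def long_short_positions(predictions: pd.Series, quantile: float = 0.1) -> pd.Series:
    """predictions: predicted returns indexed by ticker for one rebalancing date."""
    lo, hi = predictions.quantile([quantile, 1 - quantile])
    positions = pd.Series(0.0, index=predictions.index)
    positions[predictions >= hi] = 1.0   # long the most bullish names
    positions[predictions <= lo] = -1.0  # short the most bearish names
    # equal gross weight on each side keeps the book market-neutral
    return positions / positions.abs().sum()

# Illustrative predictions for a few Japanese tickers
preds = pd.Series({'7203.T': 0.02, '6758.T': -0.01, '9984.T': 0.015, '8306.T': -0.02})
print(long_short_positions(preds, quantile=0.25))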

Instead of the scikit-learn random forest implementation, we will use the LightGBM package, which has been primarily designed for gradient boosting. One of several advantages is LightGBM's ability to efficiently encode categorical variables as numeric features rather...
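A hedged sketch of what LightGBM's random forest mode can look like in practice follows; the column names and parameter values are illustrative assumptions, not the book's configuration. Setting boosting_type='rf' requires row subsampling via bagging_fraction and bagging_freq, and pandas category columns are used directly without one-hot encoding.

import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(2)
X = pd.DataFrame({
    'momentum': rng.normal(size=1000),
    'value': rng.normal(size=1000),
    'sector': pd.Categorical(rng.choice(['auto', 'tech', 'banks'], size=1000)),
})
y = (X['momentum'] + rng.normal(size=1000) > 0).astype(int)

rf = lgb.LGBMClassifier(
    boosting_type='rf',    # random forest mode instead of gradient boosting
    n_estimators=500,
    bagging_freq=1,        # rf mode requires bagging...
    bagging_fraction=0.8,  # ...on a subsample of the rows
    feature_fraction=0.8,  # random feature subset per tree
)
rf.fit(X, y)               # 'sector' is split on directly as a categorical
print(rf.predict_proba(X.head())[:, 1])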

Summary

In this chapter, we learned about a new class of models capable of capturing non-linear relationships, in contrast to the classical linear models we had explored so far. We saw how decision trees learn rules that partition the feature space into regions, each of which yields a constant prediction for the data points that fall into it.

Decision trees are very useful because they provide unique insights into the relationships between features and target variables, and we saw how to visualize the sequence of decision rules encoded in the tree structure.

Unfortunately, a decision tree is prone to overfitting. We learned that ensemble models and the bootstrap aggregation method manage to overcome some of the shortcomings of decision trees and render them useful as components of much more powerful composite models.

In the next chapter, we will explore another ensemble method, boosting, which has come to be considered one of the most important machine learning algorithms.
