Word Embeddings for Earnings Calls and SEC Filings

In the two previous chapters, we converted text data into a numerical format using the bag-of-words model. The result is sparse, fixed-length vectors that represent documents in high-dimensional word space. This representation allows us to evaluate the similarity of documents and to create features for training a model to classify a document's content or rate the sentiment it expresses. However, these vectors ignore the context in which a term is used, so two sentences containing the same words in a different order would be encoded by the same vector, even if their meaning is quite different.

This chapter introduces an alternative class of algorithms that use neural networks to learn a vector representation of individual semantic units like a word or a paragraph. These vectors are dense rather than sparse, have a few hundred real-valued entries, and are called embeddings because they assign each semantic unit a location...

How word embeddings encode semantics

The bag-of-words model represents documents as sparse, high-dimensional vectors that reflect the tokens they contain. Word embeddings represent tokens as dense, lower-dimensional vectors so that the relative location of words reflects how they are used in context. They embody the distributional hypothesis from linguistics that claims words are best defined by the company they keep.

Word vectors are capable of capturing numerous semantic aspects; not only are synonyms assigned nearby embeddings, but words can have multiple degrees of similarity. For example, the word "driver" could be similar to "motorist" or to "factor." Furthermore, embeddings encode relationships among pairs of words like analogies (Tokyo is to Japan what Paris is to France, or went is to go what saw is to see), as we will illustrate later in this section.
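To make the analogy arithmetic concrete, here is a minimal sketch, assuming a hypothetical `vectors` dictionary that maps tokens to already-trained embeddings; it solves "a is to b as c is to ?" by searching for the token whose embedding lies closest to vec(b) - vec(a) + vec(c):

```python
import numpy as np

# Hypothetical lookup table mapping tokens to trained embedding vectors,
# e.g. vectors = {'Tokyo': np.array([...]), 'Japan': np.array([...]), ...}
def analogy(a, b, c, vectors, topn=1):
    """Solve 'a is to b as c is to ?' by finding the tokens whose
    embeddings are closest to vec(b) - vec(a) + vec(c)."""
    target = vectors[b] - vectors[a] + vectors[c]
    target = target / np.linalg.norm(target)
    scores = {
        word: vec @ target / np.linalg.norm(vec)  # cosine similarity
        for word, vec in vectors.items()
        if word not in (a, b, c)                  # exclude the query words
    }
    return sorted(scores, key=scores.get, reverse=True)[:topn]

# analogy('Japan', 'Tokyo', 'France', vectors)  # ideally returns ['Paris']
```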

Embeddings result from training a neural network to predict words from their context...

How to use pretrained word vectors

There are several sources for pretrained word embeddings. Popular options include Stanford's GloVe and spaCy's built-in vectors (refer to the using_pretrained_vectors notebook for details). In this section, we will focus on GloVe.

GloVe – Global vectors for word representation

GloVe (Global Vectors for Word Representation, Pennington, Socher, and Manning, 2014) is an unsupervised algorithm developed at the Stanford NLP lab that learns vector representations for words from aggregated global word-word co-occurrence statistics (see resources linked on GitHub). Vectors pretrained on the following web-scale sources are available:

  • Common Crawl with 42 billion or 840 billion tokens and a vocabulary of 1.9 million or 2.2 million tokens
  • Wikipedia 2014 + Gigaword 5 with 6 billion tokens and a vocabulary of 400,000 tokens
  • Twitter using 2 billion tweets, 27 billion tokens, and a vocabulary of 1.2 million tokens...
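These pretrained vectors can be loaded conveniently with Gensim's downloader API; the following is a minimal sketch (the specific model name, glove-wiki-gigaword-100, and the queries shown are assumptions for illustration):

```python
import gensim.downloader as api

# Download (on first use) and load 100-dimensional GloVe vectors trained
# on Wikipedia 2014 + Gigaword 5; returns a gensim KeyedVectors instance
glove = api.load('glove-wiki-gigaword-100')

# Nearest neighbors in the embedding space
print(glove.most_similar('stock', topn=3))

# Analogy: Japan is to Tokyo as France is to ?
print(glove.most_similar(positive=['tokyo', 'france'], negative=['japan'], topn=1))
# 'paris' should rank at or near the top
```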

Custom embeddings for financial news

Many tasks require embeddings of domain-specific vocabulary that models pretrained on a generic corpus may not be able to capture. Standard word2vec models are not able to assign vectors to out-of-vocabulary words and instead use a default vector that reduces their predictive value.

For example, when working with industry-specific documents, the vocabulary or its usage may change over time as new technologies or products emerge. As a result, the embeddings need to evolve as well. In addition, documents like corporate earnings releases use nuanced language that GloVe vectors pretrained on Wikipedia articles are unlikely to properly reflect.

In this section, we will train and evaluate domain-specific embeddings using financial news. We'll first show how to preprocess the data for this task, then demonstrate how the skip-gram architecture outlined in the first section works, and finally visualize the results. We will also introduce...
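As a preview of the training step, here is a minimal sketch of fitting a skip-gram model with Gensim; the `sentences` variable is assumed to be an iterable of token lists produced by the preprocessing step, and the hyperparameter values are illustrative rather than the settings used in the notebook:

```python
from gensim.models import Word2Vec

# `sentences` is assumed to be an iterable of token lists produced by the
# preprocessing step, e.g. [['fed', 'raises', 'rates'], ...]
model = Word2Vec(sentences=sentences,
                 sg=1,              # 1 = skip-gram, 0 = CBOW
                 vector_size=300,   # embedding dimensionality (illustrative)
                 window=5,          # max distance between target and context word
                 min_count=50,      # drop rare tokens (illustrative threshold)
                 negative=10,       # negative samples per positive example
                 workers=4,
                 epochs=5)

# Inspect domain-specific neighborhoods learned from the news corpus
print(model.wv.most_similar('earnings', topn=5))
```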

word2vec for trading with SEC filings

In this section, we will learn word and phrase vectors from annual SEC filings using Gensim to illustrate the potential value of word embeddings for algorithmic trading. In the following sections, we will combine these vectors as features with price returns to train neural networks to predict equity prices from the content of security filings.

In particular, we will use a dataset containing over 22,000 10-K annual reports from the period 2013-2016, filed by over 6,500 listed companies, that contain both financial information and management commentary (see Chapter 2, Market and Fundamental Data – Sources and Techniques).

For about 3,000 companies, corresponding to 11,000 filings, we have stock prices to label the data for predictive modeling. (See the sec_preprocessing notebook in the sec-filings folder for data source details, download instructions, and preprocessing code samples.)

Preprocessing – sentence detection...
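A minimal sketch of what the sentence-detection step could look like with spaCy; the pipeline name and the filtering thresholds are assumptions for illustration, not necessarily the choices made in the sec_preprocessing notebook:

```python
import spacy

# A small English pipeline provides the sentence boundaries we need here
nlp = spacy.load('en_core_web_sm', disable=['ner'])

def to_sentences(text, min_tokens=5):
    """Split a filing section into sentences and yield lowercased token
    lists that are long enough to provide useful context for word2vec."""
    doc = nlp(text)
    for sent in doc.sents:
        tokens = [t.text.lower() for t in sent
                  if not (t.is_punct or t.is_space or t.like_num)]
        if len(tokens) >= min_tokens:
            yield tokens

# sentences = [s for filing in filings for s in to_sentences(filing)]
```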

Sentiment analysis using doc2vec embeddings

Text classification requires combining multiple word embeddings. A common approach is to average the embedding vectors for each word in the document. This uses information from all embeddings and, via vector addition, arrives at a new location in the embedding space. However, relevant information about the order of words is lost.
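A minimal sketch of this averaging step, assuming word_vectors is a Gensim KeyedVectors instance such as the model.wv object from the previous section:

```python
import numpy as np

def doc_vector(tokens, word_vectors):
    """Average the embeddings of all in-vocabulary tokens to obtain a
    fixed-length document representation; word order is discarded."""
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        return np.zeros(word_vectors.vector_size)
    return np.mean(vectors, axis=0)

# Example: doc_vector(['revenue', 'beat', 'expectations'], model.wv)
```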

In contrast, the document embedding model doc2vec, developed by the word2vec authors shortly after publishing their original contribution, directly produces embeddings for pieces of text such as a paragraph or a product review. As with word2vec, there are two flavors of doc2vec:

  • The distributed bag of words (DBOW) model corresponds to the word2vec skip-gram model. The document vector is trained on the synthetic task of predicting words sampled from the document, without using the context word vectors.
  • The distributed memory (DM)...
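Either flavor can be trained with Gensim's Doc2Vec class. Here is a minimal sketch of the DBOW variant; the `reviews` variable (a list of (tokens, label) pairs) and the hyperparameter values are assumptions for illustration:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# `reviews` is assumed to be a list of (tokens, label) pairs, e.g. tokenized
# Yelp reviews paired with their star ratings
docs = [TaggedDocument(words=tokens, tags=[i])
        for i, (tokens, _) in enumerate(reviews)]

model = Doc2Vec(documents=docs,
                dm=0,             # 0 = DBOW, 1 = distributed memory (DM)
                vector_size=100,  # illustrative hyperparameters
                window=5,
                min_count=10,
                epochs=10,
                workers=4)

# The learned document vectors can serve as features for a sentiment classifier
X = [model.dv[i] for i in range(len(docs))]
y = [label for _, label in reviews]
```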

New frontiers – pretrained transformer models

Word2vec and GloVe embeddings capture more semantic information than the bag-of-words approach. However, they allow only a single, fixed-length representation of each token and therefore do not differentiate between context-specific usages. To address problems such as polysemy, where the same word carries multiple meanings, several new models have emerged that build on the attention mechanism (Vaswani et al., 2017) to learn more contextualized word embeddings. The key characteristics of these models are as follows:

  • The use of bidirectional language models that process text both left-to-right and right-to-left for a richer context representation
  • The use of semi-supervised pretraining on a large generic corpus to learn universal language aspects in the form of embeddings and network weights that can be used and fine-tuned for specific tasks (a form of transfer learning that we will discuss in more detail in...
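To see what contextualized embeddings look like in practice, here is a minimal sketch using the Hugging Face transformers library; the choice of the bert-base-uncased checkpoint and the example sentences are assumptions for illustration, not part of the book's code:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

sentences = ['rising rates drove the bank stock higher',
             'the river bank flooded after the storm']

for text in sentences:
    inputs = tokenizer(text, return_tensors='pt')
    with torch.no_grad():
        outputs = model(**inputs)
    # One vector per token *in context*: the embedding of "bank" differs
    # between the two sentences, unlike a static word2vec or GloVe vector
    token_embeddings = outputs.last_hidden_state.squeeze(0)
    print(text, '->', tuple(token_embeddings.shape))
```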

Summary

In this chapter, we discussed a new way of generating text features that uses shallow neural networks for unsupervised machine learning. We saw how the resulting word embeddings capture interesting semantic aspects beyond the meaning of individual tokens by encoding some of the context in which they are used. We also covered how to evaluate the quality of word vectors using analogies and linear algebra.

We used Keras to build the network architecture that produces these features and applied the more performant Gensim implementation to financial news and SEC filings. Despite the relatively small datasets, the word2vec embeddings did capture meaningful relationships. We also demonstrated how appropriate labeling with stock price data can form the basis for supervised learning.

We applied the doc2vec algorithm, which produces document rather than token vectors, to build a sentiment classifier based on Yelp business reviews. While this is unlikely to yield tradeable...
