You're reading from  Machine Learning for Algorithmic Trading - Second Edition

Product type: Book
Published in: Jul 2020
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781839217715
Edition: 2nd Edition
Author (1)
Stefan Jansen

Stefan is the founder and CEO of Applied AI. He advises Fortune 500 companies, investment firms, and startups across industries on data & AI strategy, building data science teams, and developing end-to-end machine learning solutions for a broad range of business problems. Before his current venture, he was a partner and managing director at an international investment firm, where he built the predictive analytics and investment research practice. He was also a senior executive at a global fintech company with operations in 15 markets, advised Central Banks in emerging markets, and consulted for the World Bank. He holds Master's degrees in Computer Science from Georgia Tech and in Economics from Harvard and Free University Berlin, and a CFA Charter. He has worked in six languages across Europe, Asia, and the Americas and taught data science at Datacamp and General Assembly.

Alternative Data for Finance – Categories and Use Cases

The previous chapter covered working with market and fundamental data, which have been the traditional drivers of trading strategies. In this chapter, we'll fast-forward to the recent emergence of a broad range of much more diverse data sources as fuel for discretionary and algorithmic strategies. Their heterogeneity and novelty have inspired the label of alternative data and created a rapidly growing provider and service industry.

Behind this trend is a familiar story: propelled by the explosive growth of the internet and mobile networks, digital data continues to grow exponentially amid advances in the technology to process, store, and analyze new data sources. The exponential growth in the availability of and ability to manage more diverse digital data, in turn, has been a critical force behind the dramatic performance improvements of machine learning (ML) that are driving innovation across industries...

The alternative data revolution

The data deluge driven by digitization, networking, and plummeting storage costs has led to profound qualitative changes in the nature of information available for predictive analytics, often summarized by the five Vs:

  • Volume: The amount of data generated, collected, and stored is orders of magnitude larger as the byproduct of online and offline activity, transactions, records, and other sources. Volumes continue to grow with the capacity for analysis and storage.
  • Velocity: Data is generated, transferred, and processed to become available near, or at, real-time speed.
  • Variety: Data is organized in formats no longer limited to structured, tabular forms, such as CSV files or relational database tables. Instead, new sources produce semi-structured formats, such as JSON or HTML, and unstructured content, including raw text, images, and audio or video data, adding new challenges to render data suitable for ML...
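
To make the variety challenge more concrete, the following minimal sketch flattens semi-structured JSON records into the kind of tabular rows that most ML tooling expects. The records, field names, and the `flatten` helper are hypothetical and chosen only for illustration; they are not taken from the book's code.

```python
import json

# Hypothetical semi-structured records; all field names are invented for illustration.
raw = '''
[
  {"ticker": "AAA", "review": {"stars": 5, "text": "Great"}, "tags": ["food", "service"]},
  {"ticker": "BBB", "review": {"stars": 1}}
]
'''


def flatten(record, prefix=""):
    """Recursively flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Join list items into one cell so each record stays a single row.
            flat[name] = "|".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat


records = [flatten(r) for r in json.loads(raw)]
# Use the union of all keys as columns, since records need not share the same fields.
columns = sorted({key for row in records for key in row})
# Missing fields become None, which a later step would need to impute or drop.
table = [[row.get(col) for col in columns] for row in records]

print(columns)
for row in table:
    print(row)
```

Records with missing fields produce empty cells, which hints at the extra cleaning work that semi-structured sources tend to require before modeling.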

Sources of alternative data

Alternative datasets are generated by many sources but can be classified at a high level as predominantly produced by:

  • Individuals who post on social media, review products, or use search engines
  • Businesses that record commercial transactions (in particular, credit card payments) or capture supply-chain activity as intermediaries
  • Sensors that, among many other things, capture economic activity through images from satellites or security cameras, or through movement patterns recorded by devices such as cell phone towers

The nature of alternative data continues to evolve rapidly as new data sources become available and sources previously labeled "alternative" become part of the mainstream. The Baltic Dry Index (BDI), for instance, assembles data from several hundred shipping companies to approximate the supply/demand of dry bulk carriers and is now available on the Bloomberg Terminal.

Alternative data includes raw data as...

Criteria for evaluating alternative data

The ultimate objective of alternative data is to provide an informational advantage in the competitive search for trading signals that produce alpha, namely positive, uncorrelated investment returns. In practice, the signals extracted from alternative datasets can be used on a standalone basis or combined with other signals as part of a quantitative strategy. Independent usage is viable if the Sharpe ratio generated by a strategy based on a single dataset is sufficiently high, but that is rare in practice. (See Chapter 4, Financial Feature Engineering – How to Research Alpha Factors, for details on signal measurement and evaluation.)
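
As a rough illustration of the Sharpe ratio criterion mentioned above, the sketch below computes an annualized Sharpe ratio from a short series of periodic returns. The function name, the sample returns, the zero risk-free rate, and the convention of 252 trading days per year are all assumptions made for this example rather than values from the chapter.

```python
import math
import statistics


def annualized_sharpe(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from periodic returns.

    risk_free_rate is expressed per period; 252 periods per year is an
    assumed convention for daily data.
    """
    excess = [r - risk_free_rate for r in returns]
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


# Hypothetical daily strategy returns, for illustration only.
daily_returns = [0.01, -0.005, 0.007, 0.002, -0.003, 0.004]
sharpe = annualized_sharpe(daily_returns)
print(f"Annualized Sharpe ratio: {sharpe:.2f}")
```

A realistic evaluation would use a much longer return history, since a handful of observations says very little about whether a signal is strong enough to trade on its own.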

Quant firms are building libraries of alpha factors that may be weak signals individually but can produce attractive returns in combination. As highlighted in Chapter 1, Machine Learning for Trading – From Idea to Execution, investment factors should be based on a fundamental and economic...

The market for alternative data

The investment industry spent an estimated $2-3 billion on data services in 2018, and this number is expected to grow at a double-digit rate per year in line with other industries. This expenditure includes the acquisition of alternative data, investments in related technology, and the hiring of qualified talent.

A survey by Ernst & Young shows significant adoption of alternative data in 2017; 43 percent of funds were using scraped web data, for instance, and almost 30 percent were experimenting with satellite data (see Figure 3.1). Based on the experience so far, fund managers considered scraped web data and credit card data to be most insightful, in contrast to geolocation and satellite data, which around 25 percent considered to be less informative:

Figure 3.1: Usefulness and usage of alternative data (Source: Ernst & Young, 2017)

Reflecting the rapid growth of this new industry, the market for alternative data providers...

Working with alternative data

We will illustrate the acquisition of alternative data using web scraping, targeting first OpenTable restaurant data, and then move on to earnings call transcripts hosted by Seeking Alpha.

Scraping OpenTable data

Typical sources of alternative data are review websites such as Glassdoor or Yelp, which convey insider insights using employee comments or guest reviews. Clearly, user-contributed content does not capture a representative view, but rather is subject to severe selection biases. We'll look at Yelp reviews in Chapter 14, Text Data for Trading – Sentiment Analysis, for example, and find many more very positive and negative ratings on the five-star scale than you might expect. Nonetheless, this data can be valuable input for ML models that aim to predict a business's prospects or market value relative to competitors or over time to obtain trading signals.
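
As a minimal sketch of how review ratings of this kind might be pulled out of a page's HTML, the example below uses Python's standard-library `html.parser` on a small, made-up snippet and then counts the ratings. The markup, the `review` class, and the `data-stars` attribute are hypothetical and do not reflect the structure of any real review site; the book's own examples may rely on different tools.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical markup: tag and attribute names are invented for this sketch.
SAMPLE_HTML = """
<div class="review" data-stars="5"><p>Excellent food</p></div>
<div class="review" data-stars="1"><p>Never again</p></div>
<div class="review" data-stars="5"><p>Loved it</p></div>
<div class="ad">Sponsored</div>
"""


class ReviewParser(HTMLParser):
    """Collect the star rating from every <div class="review"> element."""

    def __init__(self):
        super().__init__()
        self.stars = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div" and attributes.get("class") == "review":
            self.stars.append(int(attributes["data-stars"]))


parser = ReviewParser()
parser.feed(SAMPLE_HTML)

# Tally ratings to inspect how they are spread across the star scale.
distribution = Counter(parser.stars)
print(sorted(distribution.items()))
```

Tallying the ratings this way is one simple check for the polarized distributions that selection bias tends to produce.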

The data needs to be extracted from the HTML source, barring...

Summary

In this chapter, we introduced new sources of alternative data made available as a result of the big data revolution, including individuals, business processes, and sensors, such as satellites or GPS location devices. We presented a framework to evaluate alternative datasets from an investment perspective and laid out key categories and providers to help you navigate this vast and quickly expanding area that provides critical inputs for algorithmic trading strategies that use ML.

We also explored powerful Python tools you can use to collect your own datasets at scale, so that you can use web scraping to work toward a private informational edge as an algorithmic trader.

We will now proceed, in the following chapter, to the design and evaluation of alpha factors that produce trading signals and look at how to combine them in a portfolio context.
