6 The Machine Learning Process

This chapter starts Part 2 of this book, where we'll illustrate how you can use a range of supervised and unsupervised machine learning (ML) models for trading. We will explain each model's assumptions and use cases before we demonstrate relevant applications using various Python libraries. The categories of models that we will cover in Parts 2-4 include:

  • Linear models for the regression and classification of cross-section, time series, and panel data
  • Generalized additive models, including nonlinear tree-based models, such as decision trees
  • Ensemble models, including random forest and gradient-boosting machines
  • Unsupervised linear and nonlinear methods for dimensionality reduction and clustering
  • Neural network models, including recurrent and convolutional architectures
  • Reinforcement learning models

We will apply these models to the market, fundamental, and alternative data sources introduced in the first part of this book. We will build on the material covered so far by demonstrating how to embed these models in a trading strategy that translates model signals into trades, how to optimize portfolios, and how to evaluate strategy performance.

There are several aspects that many of these models and their applications have in common. This chapter covers these common aspects so that we can focus on model-specific usage in the following chapters. They include the overarching goal of learning a functional relationship from data by optimizing an objective or loss function. They also include the closely related methods of measuring model performance.
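To make the idea of learning a functional relationship by optimizing a loss function concrete, here is a minimal sketch. It assumes synthetic data generated from a known linear relationship (the slope, intercept, and noise level are made up for illustration) and uses NumPy's least-squares solver to find the parameters that minimize the mean squared error. The helper name `mse` is hypothetical and not taken from the book's code.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic data (assumption for illustration): y = 2x + 1 plus Gaussian noise
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=200)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])

# Least squares returns the parameters that minimize the squared-error loss
coef, *_ = np.linalg.lstsq(X, y, rcond=None)


def mse(params):
    """Mean squared error of the linear model with the given parameters."""
    return float(np.mean((y - X @ params) ** 2))


print("estimated intercept and slope:", coef)
print("training MSE at the optimum:", mse(coef))
print("training MSE after perturbing the parameters:", mse(coef + 0.1))
```

Perturbing the fitted parameters in any direction increases the training loss, which is what it means for the fit to be the minimizer of the objective.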

We'll distinguish between unsupervised and supervised learning and outline use cases for algorithmic trading. We'll contrast supervised regression and classification problems. We'll also contrast the use of supervised learning for statistical inference about relationships between input and output data with its use for predicting future outputs.

We'll also illustrate how prediction errors are due to the model's bias or variance, or because of a high noise-to-signal ratio in the data. Most importantly, we'll present methods to diagnose sources of errors like overfitting and improve your model's performance.
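The interplay of bias, variance, and noise can be previewed with a small experiment. The sketch below is an illustration under assumed conditions, not the chapter's own example: it draws noisy samples from a sine curve and fits polynomials of degree 1, 3, and 15 using NumPy's `Polynomial.fit`. The data-generating function, sample sizes, and noise level are all hypothetical choices.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(seed=1)


def true_fn(x):
    """Assumed ground-truth relationship, chosen only for illustration."""
    return np.sin(2 * np.pi * x)


# Small noisy training set and a larger test set from the same process
x_train = rng.uniform(0, 1, size=30)
y_train = true_fn(x_train) + rng.normal(scale=0.2, size=30)
x_test = rng.uniform(0, 1, size=1000)
y_test = true_fn(x_test) + rng.normal(scale=0.2, size=1000)

train_mse, test_mse = {}, {}
for degree in (1, 3, 15):
    # Least-squares polynomial fit of the given degree
    model = Polynomial.fit(x_train, y_train, deg=degree)
    train_mse[degree] = float(np.mean((y_train - model(x_train)) ** 2))
    test_mse[degree] = float(np.mean((y_test - model(x_test)) ** 2))
    print(f"degree {degree:>2}: train MSE {train_mse[degree]:.4f}, "
          f"test MSE {test_mse[degree]:.4f}")
```

The training error falls as the degree grows because each higher-degree model contains the lower-degree ones. The linear model underfits (high bias), so its test error stays well above that of the cubic. A very flexible model such as degree 15 typically fits the training noise (high variance), and the gap between its training and test error is the usual symptom of overfitting.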

In this chapter, we will cover the following topics relevant to applying the ML workflow in practice:

  • How supervised and unsupervised learning from data works
  • Training and evaluating supervised learning models for regression and classification tasks
  • How the bias-variance trade-off impacts predictive performance
  • How to diagnose and address prediction errors due to overfitting
  • Using cross-validation to optimize hyperparameters with a focus on time-series data
  • Why financial data requires additional attention when testing out-of-sample
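As a preview of the last two topics, the following sketch shows one way to respect time order when tuning a hyperparameter. It is a simplified illustration under stated assumptions: the data is synthetic, the helpers `walk_forward_splits` and `ridge_fit` are hypothetical names written for this example, and the expanding-window scheme is only one of several possible designs. scikit-learn's `TimeSeriesSplit` provides comparable splitting functionality.

```python
import numpy as np


def walk_forward_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) pairs in which training always precedes testing."""
    for i in range(n_splits):
        test_start = n_samples - (n_splits - i) * test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        # Expanding window: train on everything before the test block
        yield np.arange(test_start), np.arange(test_start, test_start + test_size)


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression without an intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# Synthetic, time-ordered data (assumption for illustration)
rng = np.random.default_rng(seed=0)
n, k = 300, 5
X = rng.normal(size=(n, k))
beta = np.array([0.5, -0.3, 0.0, 0.0, 0.2])
y = X @ beta + rng.normal(scale=1.0, size=n)

# Score each candidate penalty by its average out-of-sample error
alphas = [0.01, 1.0, 100.0]
cv_mse = {}
for alpha in alphas:
    fold_errors = []
    for train_idx, test_idx in walk_forward_splits(n, n_splits=5, test_size=40):
        coef = ridge_fit(X[train_idx], y[train_idx], alpha)
        fold_errors.append(np.mean((y[test_idx] - X[test_idx] @ coef) ** 2))
    cv_mse[alpha] = float(np.mean(fold_errors))

best_alpha = min(cv_mse, key=cv_mse.get)
print("cross-validated MSE by alpha:", cv_mse)
print("selected alpha:", best_alpha)
```

Unlike shuffled k-fold cross-validation, every test block here lies strictly after its training data, so the model is never evaluated on observations that precede what it was trained on.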

If you are already quite familiar with ML, feel free to skip ahead and dive right into learning how to use ML models to produce and combine alpha factors for an algorithmic trading strategy. This chapter's directory in the GitHub repository contains the code examples and lists additional resources.
