Chapter 2. Data Pipelines

The first chapter introduced some rudimentary concepts of data processing, clustering, and classification.

This chapter is dedicated to creating and maintaining a flexible end-to-end workflow for training models and classifying data. The first section introduces a data-centric (functional) approach to building number-crunching applications, followed by a description of a configurable workflow computation model. The chapter concludes with an overview of model validation techniques.

You will learn how to do the following:

  • Apply the concept of monadic design to create dynamic workflows
  • Leverage some of Scala's advanced patterns, such as the cake pattern, to build portable computational workflows
  • Take into account the bias-variance trade-off in selecting a model
  • Overcome overfitting in modeling
  • Break down data into training, test, and validation sets
  • Implement model validation in Scala using precision, recall, and F score