Chapter 2. Data Pipelines

The first chapter introduced some basic concepts of data processing, clustering, and classification.

This chapter is dedicated to creating and maintaining a flexible end-to-end workflow for training models and classifying data. The first section introduces a data-centric (functional) approach to building number-crunching applications, followed by a description of a configurable workflow computation model. The chapter concludes with an overview of model validation techniques.

You will learn how to do the following:

  • Apply the concept of monadic design to create dynamic workflows
  • Leverage some of Scala's advanced patterns, such as the cake pattern, to build portable computational workflows
  • Take into account the bias-variance trade-off in selecting a model
  • Overcome overfitting in modeling
  • Break down data into training, test, and validation sets
  • Implement model validation in Scala using precision, recall, and F score