  • Scala for Data Science
  • Pascal Bugnion
  • 337 words
  • 2021-07-23 14:33:02

Programming in data science

This book is not a book about data science. It is a book about how to use Scala, a programming language, for data science. So, where does programming come in when processing data?

Computers are involved at every step of the data science pipeline, but not necessarily in the same manner. The style of programs that we build will differ drastically depending on whether we are just writing throwaway scripts to explore data or building a scalable application that pushes data through a well-understood pipeline to continuously deliver business intelligence.

Let's imagine that we work for a company making games for mobile phones in which you can purchase in-game benefits. The majority of users never buy anything, but a small fraction is likely to spend a lot of money. We want to build a model that recognizes big spenders based on their play patterns.

The first step is to explore data, find the right features, and build a model based on a subset of the data. In this exploration phase, we have a clear goal in mind but little idea of how to get there. We want a light, flexible language with strong libraries to get us a working model as soon as possible.

Once we have a working model, we need to deploy it on our gaming platform to analyze the usage patterns of all the current users. This is a very different problem: we have a relatively clear understanding of the goals of the program and of how to get there. The challenge comes in designing software that will scale out to handle all the users and be robust to future changes in usage patterns.

In practice, the type of software that we write typically lies on a spectrum ranging from a single throwaway script to production-level code that must be proof against future expansion and load increases. Before writing any code, the data scientist must understand where their software lies on this spectrum. Let's call this the permanence spectrum.
