- Scala for Machine Learning(Second Edition)
- Patrick R. Nicolas
- 2021-07-08 10:43:05
Defining a methodology
Let's start by clarifying the roles of the data scientist, the software engineer, and the domain expert.
A domain or subject-matter expert is a person with authoritative or credited expertise in a particular area or topic. A chemist is an expert in the domain of chemistry and possibly related fields.
A data scientist solves problems related to data in a variety of fields such as biological sciences, health care, marketing, or finance. Data and text mining, signal processing, statistical analysis, and modeling using machine learning algorithms are some of the activities performed by a data scientist.
A software developer performs all the tasks related to creating software applications, including analysis, design, coding, testing, and deployment.
A data scientist has many options in selecting and implementing a classification or clustering algorithm.
Firstly, a mathematical or statistical model has to be selected to extract knowledge from the raw input data or from the output of an upstream data transformation. The selection of the model is constrained by the following parameters:
- Business requirements, such as accuracy of results or computation time
- Availability of training data, algorithms, and libraries
- Access to a domain or subject-matter expert, if needed
Secondly, the software engineer has to select a computational and deployment framework suited to the amount of data to be processed. The computational context is defined by the following parameters:
- Available resources, such as machines, CPU, memory, or I/O bandwidth
- Implementation strategy, such as iterative versus recursive computation or caching
- Requirement for responsiveness of the overall process, such as duration of computation or display of intermediate results
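To make the implementation-strategy parameter more concrete, here is a minimal sketch of the same computation written iteratively and recursively, followed by a cached result. The names `ComputationStrategies`, `sumSquaresIterative`, `sumSquaresRecursive`, and `Dataset` are hypothetical and chosen only for illustration; they are not taken from the book's code base.

```scala
import scala.annotation.tailrec

// Hypothetical example: all names here are illustrative assumptions.
object ComputationStrategies {

  // Iterative strategy: explicit loop with a mutable accumulator.
  def sumSquaresIterative(xs: Vector[Double]): Double = {
    var acc = 0.0
    var i = 0
    while (i < xs.length) {
      acc += xs(i) * xs(i)
      i += 1
    }
    acc
  }

  // Recursive strategy: tail recursion, which the compiler turns into a loop,
  // so the call stack does not grow with the size of the input.
  def sumSquaresRecursive(xs: Vector[Double]): Double = {
    @tailrec
    def loop(i: Int, acc: Double): Double =
      if (i >= xs.length) acc else loop(i + 1, acc + xs(i) * xs(i))
    loop(0, 0.0)
  }

  // Caching strategy: a lazy val is evaluated on first access and then reused.
  final class Dataset(val values: Vector[Double]) {
    lazy val norm: Double = math.sqrt(sumSquaresIterative(values))
  }
}
```

Both functions return the same value; which one fits depends on readability and on the constraints listed above, so the choice here is a matter of judgment rather than a rule.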
Thirdly, a domain expert has to tag or label the observations in order to generate an accurate classifier.
Finally, the model has to be validated against a reliable test dataset.
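The labeling and validation steps can be sketched as follows. This is a minimal illustration that assumes a classifier is simply a function from a feature vector to an integer label. The names `Validation`, `Labeled`, `Classifier`, `accuracy`, and `thresholdModel`, along with the toy data, are hypothetical and not part of the original text.

```scala
// Hypothetical example: types, names, and data are illustrative assumptions.
object Validation {

  // An observation together with the label assigned by the domain expert.
  final case class Labeled(features: Vector[Double], label: Int)

  // Assumption: a classifier maps a feature vector to a predicted label.
  type Classifier = Vector[Double] => Int

  // Fraction of test observations whose predicted label matches the expert's label.
  def accuracy(model: Classifier, testSet: Seq[Labeled]): Double =
    if (testSet.isEmpty) 0.0
    else testSet.count(obs => model(obs.features) == obs.label).toDouble / testSet.size

  // Toy classifier: predicts 1 when the features sum to more than 1.0.
  val thresholdModel: Classifier = xs => if (xs.sum > 1.0) 1 else 0

  // Small labeled test set, kept separate from any training data.
  val testSet: Seq[Labeled] = Seq(
    Labeled(Vector(0.2, 0.3), 0),
    Labeled(Vector(0.9, 0.8), 1),
    Labeled(Vector(0.6, 0.7), 0), // misclassified by thresholdModel
    Labeled(Vector(1.5, 0.1), 1)
  )
}
```

With this toy data, `Validation.accuracy(Validation.thresholdModel, Validation.testSet)` evaluates to 0.75, since three of the four observations are classified correctly.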
The following diagram illustrates the selection process to create a workflow:

[Figure: Statistical and computational modeling for machine learning applications]
The parameters of a data transformation may need to be reconfigured according to the output of the upstream data transformation. Scala's higher-order functions are particularly suitable for implementing configurable data transformations.
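As a rough sketch of this idea, the snippet below uses a higher-order function to build a downstream transformation whose parameter is derived from the output of the upstream one. The names `ConfigurableTransform`, `Transform`, `normalize`, `clipAbove`, `chain`, and `pipeline` are hypothetical and serve only to illustrate the technique; they do not reflect the book's own API.

```scala
// Hypothetical example: all names are illustrative assumptions.
object ConfigurableTransform {

  // A data transformation is modeled as a plain function.
  type Transform = Vector[Double] => Vector[Double]

  // Upstream transformation: min-max normalization to the range [0, 1].
  val normalize: Transform = xs =>
    if (xs.isEmpty) xs
    else {
      val lo = xs.min
      val hi = xs.max
      if (hi == lo) xs.map(_ => 0.0) else xs.map(x => (x - lo) / (hi - lo))
    }

  // Higher-order function: returns a transformation configured by `limit`.
  def clipAbove(limit: Double): Transform = _.map(x => math.min(x, limit))

  // Higher-order function: runs `upstream`, then uses its output to configure
  // the downstream transformation before applying it.
  def chain(upstream: Transform)(configure: Vector[Double] => Transform): Transform =
    xs => {
      val out = upstream(xs)
      configure(out)(out)
    }

  // Downstream clipping limit is the mean of the normalized values.
  val pipeline: Transform =
    chain(normalize)(out => clipAbove(out.sum / out.size))
}
```

For example, `ConfigurableTransform.pipeline(Vector(2.0, 4.0, 6.0, 10.0))` normalizes the input to `Vector(0.0, 0.25, 0.5, 1.0)`, computes the mean 0.4375, and returns `Vector(0.0, 0.25, 0.4375, 0.4375)`.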