- Scala for Machine Learning (Second Edition)
- Patrick R. Nicolas
- 2021-07-08 10:43:05
Defining a methodology
Let's start by clarifying the roles of the data scientist, the software developer, and the domain expert.
A domain or subject-matter expert is a person with authoritative or credited expertise in a particular area or topic. A chemist is an expert in the domain of chemistry and possibly related fields.
A data scientist solves data-related problems in a variety of fields such as the biological sciences, health care, marketing, or finance. Data and text mining, signal processing, statistical analysis, and modeling with machine learning algorithms are some of the activities a data scientist performs.
A software developer performs all the tasks related to creating software applications, including analysis, design, coding, testing, and deployment.
A data scientist has many options in selecting and implementing a classification or clustering algorithm.
Firstly, a mathematical or statistical model has to be selected to extract knowledge from the raw input data or from the output of an upstream data transformation. The selection of the model is constrained by the following parameters:
- Business requirements, such as accuracy of results or computation time
- Availability of training data, algorithms, and libraries
- Access to a domain or subject-matter expert, if needed
Secondly, the engineer has to select a computational and deployment framework suitable for the amount of data to be processed. The computational context is defined by the following parameters:
- Available resources, such as machines, CPU, memory, or I/O bandwidth
- Implementation strategy, such as iterative versus recursive computation or caching
- Requirement for responsiveness of the overall process, such as duration of computation or display of intermediate results
Thirdly, a domain expert has to tag or label the observations in order to generate an accurate classifier.
Finally, the model has to be validated against a reliable test dataset.
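The validation step can be illustrated with a minimal sketch. It assumes the model is a plain function from an observation to a label and that the test dataset is a sequence of labeled observations; the names `Validation` and `accuracy` are hypothetical and are not taken from the book's library.

```scala
object Validation {
  // Hypothetical helper: fraction of test observations for which the
  // model's prediction matches the label supplied with the observation.
  // Returns 0.0 for an empty test set to avoid a division by zero.
  def accuracy[T, L](model: T => L, testSet: Seq[(T, L)]): Double =
    if (testSet.isEmpty) 0.0
    else {
      val hits = testSet.count { case (obs, label) => model(obs) == label }
      hits.toDouble / testSet.size
    }
}

// Usage: a toy threshold classifier checked against four labeled points.
object ValidationExample {
  val model: Double => Int = x => if (x > 0.5) 1 else 0
  val testSet: Seq[(Double, Int)] =
    Seq((0.9, 1), (0.1, 0), (0.7, 0), (0.2, 0))
  val score: Double = Validation.accuracy(model, testSet)
}
```

Accuracy is only one possible metric; depending on the business requirements, precision, recall, or another measure may be more appropriate.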
The following diagram illustrates the selection process to create a workflow:

Statistical and computational modeling for machine learning applications
The parameters of a data transformation may need to be reconfigured according to the output of the upstream data transformation. Scala's higher-order functions are particularly suitable for implementing configurable data transformations.
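As a rough illustration of this idea, the following sketch expresses each transformation as a curried function that takes its configuration first and returns the actual transformation. The downstream step is then configured from the output of the upstream step. All names (`ConfigurableTransform`, `scale`, `filterAbove`, `pipeline`) are hypothetical and chosen for this example only.

```scala
object ConfigurableTransform {
  type Transform = Vector[Double] => Vector[Double]

  // Higher-order function: given a scaling factor, returns a transformation.
  val scale: Double => Transform =
    factor => xs => xs.map(_ * factor)

  // Higher-order function: given a threshold, returns a filtering transformation.
  val filterAbove: Double => Transform =
    threshold => xs => xs.filter(_ > threshold)

  // The threshold of the downstream filter is not fixed in advance;
  // it is computed from the output of the upstream scaling step
  // (here, the mean of the scaled values).
  def pipeline(input: Vector[Double], factor: Double): Vector[Double] = {
    val scaled = scale(factor)(input)
    if (scaled.isEmpty) scaled
    else {
      val threshold = scaled.sum / scaled.size
      filterAbove(threshold)(scaled)
    }
  }
}
```

For the input `Vector(1.0, 2.0, 3.0, 4.0)` and a factor of `2.0`, the scaled values have a mean of `5.0`, so only `6.0` and `8.0` pass the reconfigured filter.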