- Mastering Machine Learning with Spark 2.x
- Alex Tellez Max Pumperla Michal Malohlava
- 265字
- 2021-07-02 18:46:04
The machine learning algorithm using a distributed environment
Machine learning algorithms combine simple tasks into complex patterns, that are even more complicated in distributed environment. Let's take a simple decision tree algorithm (reference), for example. This particular algorithm creates a binary tree that tries to fit training data and minimize prediction errors. However, in order to do this, it has to decide about the branch of tree it has to send every data point to (don't worry, we'll cover the mechanics of how this algorithm works along with some very useful parameters that you can learn in later in the book). Let's demonstrate it with a simple example:

Consider the situation depicted in preceding figure. A two-dimensional board with many points colored in two colors: red and blue. The goal of the decision tree is to learn and generalize the shape of data and help decide about the color of a new point. In our example, we can easily see that the points almost follow a chessboard pattern. However, the algorithm has to figure out the structure by itself. It starts by finding the best position of a vertical or horizontal line, which would separate the red points from the blue points.
The found decision is stored in the tree root and the steps are recursively applied on both the partitions. The algorithm ends when there is a single point in the partition:

- 在最好的年紀學Python:小學生趣味編程
- ASP.NET MVC4框架揭秘
- Getting Started with CreateJS
- Python神經網絡項目實戰
- 編寫高質量代碼:改善C程序代碼的125個建議
- Data Analysis with Stata
- Unity Game Development Scripting
- Flutter跨平臺開發入門與實戰
- Python High Performance Programming
- 深入剖析Java虛擬機:源碼剖析與實例詳解(基礎卷)
- Solutions Architect's Handbook
- BeagleBone Robotic Projects(Second Edition)
- Instant Automapper
- Clojure High Performance Programming(Second Edition)
- Spring Data JPA從入門到精通