  • Machine Learning with Swift
  • Alexander Sosnovshchenko
  • 428 words
  • 2021-06-24 18:55:02

Calculating the distance

How do we calculate a distance? Well, that depends on the kind of problem. In two-dimensional space, we usually calculate the distance between two points, (x1, y1) and (x2, y2), as $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$, the Euclidean distance. But this is not how taxi drivers calculate distance, because in the city you can't cut corners and go straight to your goal. So, they use (knowingly or not) another distance metric: Manhattan distance or taxicab distance, also known as the l1-norm: $|x_1 - x_2| + |y_1 - y_2|$. This is the distance if we're only allowed to move along coordinate axes:

Figure 3.1: The blue line represents the Euclidean distance, the red line represents the Manhattan distance. Map of Manhattan by OpenStreetMap
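
As a minimal sketch of the two metrics above, assuming a hypothetical `Point` type with `Double` coordinates (the type and function names are illustrative and do not come from the text):

```swift
// Hypothetical helper type for this sketch.
struct Point {
    var x: Double
    var y: Double
}

// Euclidean distance: length of the straight line between two points.
func euclideanDistance(_ a: Point, _ b: Point) -> Double {
    let dx = a.x - b.x
    let dy = a.y - b.y
    return (dx * dx + dy * dy).squareRoot()
}

// Manhattan (taxicab) distance: sum of the absolute coordinate differences.
func manhattanDistance(_ a: Point, _ b: Point) -> Double {
    return abs(a.x - b.x) + abs(a.y - b.y)
}

let start = Point(x: 0, y: 0)
let goal = Point(x: 3, y: 4)
print(euclideanDistance(start, goal)) // 5.0
print(manhattanDistance(start, goal)) // 7.0
```

For the same pair of points, the taxicab route is never shorter than the straight line.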

Jewish German mathematician Hermann Minkowski proposed a generalization of both Euclidean and Manhattan distances. Here is the formula for the Minkowski distance:

$$d(p, q) = \left( \sum_{i=1}^{n} |p_i - q_i|^c \right)^{1/c}$$

where p and q are n-dimensional vectors (or coordinates of points in n-dimensional space if you wish). But what does c stand for? It is the order of the Minkowski distance: with c = 1, it gives the equation of the Manhattan distance, and with c = 2 it gives the Euclidean distance.
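
The order-c distance can be sketched in a few lines of Swift. The function name `minkowskiDistance` and the `order` label are assumptions made for this illustration, and the sketch only accepts orders of at least 1:

```swift
import Foundation

// Minkowski distance of order c between two vectors of equal dimension:
// the c-th root of the sum of |p[i] - q[i]| raised to the power c.
func minkowskiDistance(_ p: [Double], _ q: [Double], order c: Double) -> Double {
    precondition(p.count == q.count, "Vectors must have the same dimension")
    precondition(c >= 1, "The order must be at least 1")
    var sum = 0.0
    for (a, b) in zip(p, q) {
        sum += pow(abs(a - b), c)
    }
    return pow(sum, 1.0 / c)
}

let p = [0.0, 0.0]
let q = [3.0, 4.0]
print(minkowskiDistance(p, q, order: 1)) // Manhattan distance between p and q
print(minkowskiDistance(p, q, order: 2)) // Euclidean distance between p and q
```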

Vector operations, including the calculation of Manhattan and Euclidean distances, can be parallelized for efficiency. Apple's Accelerate framework provides APIs for fast vector and matrix computations.
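
A possible way to compute the Euclidean distance with Accelerate is the vDSP routine `vDSP_distancesq`, which returns the squared distance between two vectors. The following is only a sketch: the wrapper name `fastEuclideanDistance` is hypothetical, and the plain loop is a fallback added so that the code also compiles on platforms without Accelerate.

```swift
#if canImport(Accelerate)
import Accelerate
#endif

// Euclidean distance between two Float vectors of equal dimension.
func fastEuclideanDistance(_ p: [Float], _ q: [Float]) -> Float {
    precondition(p.count == q.count, "Vectors must have the same dimension")
    var squaredDistance: Float = 0
    #if canImport(Accelerate)
    // Sum of (p[i] - q[i])^2, reading both arrays with stride 1.
    vDSP_distancesq(p, 1, q, 1, &squaredDistance, vDSP_Length(p.count))
    #else
    // Fallback for platforms where Accelerate is not available.
    for (a, b) in zip(p, q) {
        squaredDistance += (a - b) * (a - b)
    }
    #endif
    return squaredDistance.squareRoot()
}

print(fastEuclideanDistance([0, 0, 0], [1, 2, 2])) // 3.0
```

For vectors this short the benefit is negligible; vectorized routines pay off on long arrays.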

In machine learning, we generalize the notion of distance to any kind of objects for which we can calculate how similar they are, using a function called a distance metric. In this way, we can define the distance between two pieces of text, two pictures, or two audio signals. Let's take a look at two examples.

When you deal with two pieces of text, you use an edit distance: the minimum number of edits needed to transform one string into another. For two strings of equal length, a simple example is the Hamming distance, which counts the positions at which the strings differ, that is, the minimum number of substitutions needed. More general edit distances that also allow insertions and deletions, such as the Levenshtein distance, are calculated using dynamic programming, an iterative approach where the problem is broken into small subproblems, and the result of each step is remembered for future computations. Edit distance is an important measure in applications that deal with text revisions; for example, in bioinformatics (see the following diagram):

Figure 3.2: Four pieces of DNA from different species aligned together: modern human, Neanderthal, gorilla, and cat. The Hamming edit distance from modern human to others is 1, 5, and 11 respectively.
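
The two kinds of edit distance can be sketched as follows. The Hamming version simply counts mismatching positions. The dynamic-programming version shown here is the Levenshtein distance, chosen for this illustration because it also allows insertions and deletions. Both function names and the sample strings are made up for the example and are not the sequences from the figure.

```swift
// Hamming distance: number of positions where two equal-length strings differ.
func hammingDistance(_ s: String, _ t: String) -> Int {
    precondition(s.count == t.count, "Hamming distance needs strings of equal length")
    var mismatches = 0
    for (a, b) in zip(s, t) where a != b {
        mismatches += 1
    }
    return mismatches
}

// Levenshtein distance: minimum number of substitutions, insertions, and
// deletions, computed row by row with dynamic programming.
func levenshteinDistance(_ s: String, _ t: String) -> Int {
    let source = Array(s)
    let target = Array(t)
    if source.isEmpty { return target.count }
    if target.isEmpty { return source.count }
    // previous[j] is the distance between the source prefix handled so far
    // and the first j characters of the target.
    var previous = Array(0...target.count)
    for i in 1...source.count {
        var current = [i] + Array(repeating: 0, count: target.count)
        for j in 1...target.count {
            let substitution = previous[j - 1] + (source[i - 1] == target[j - 1] ? 0 : 1)
            let deletion = previous[j] + 1
            let insertion = current[j - 1] + 1
            current[j] = min(substitution, deletion, insertion)
        }
        previous = current
    }
    return previous[target.count]
}

print(hammingDistance("GATTACA", "GACTATA"))    // 2
print(levenshteinDistance("kitten", "sitting")) // 3
```

Keeping only the previous row of the table is enough here, because each cell depends only on its left, upper, and upper-left neighbours.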

Often, we store different signals (audio, motion data, and so on) as arrays of numbers. How do we measure the similarity of two such arrays? We use a combination of Euclidean distance and edit distance, called dynamic time warping (DTW).
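
One way to read this combination is shown in the following sketch of a basic, unconstrained DTW for one-dimensional signals. The cost of matching two samples is their absolute difference (the Euclidean distance in one dimension), and the best alignment is found with the same kind of dynamic-programming table that edit distance uses. The function name `dtwDistance` is an assumption for this example, and practical implementations usually add windowing constraints that are omitted here.

```swift
// Dynamic time warping distance between two non-empty sequences of numbers.
func dtwDistance(_ a: [Double], _ b: [Double]) -> Double {
    precondition(!a.isEmpty && !b.isEmpty, "Both sequences must be non-empty")
    let n = a.count
    let m = b.count
    // cost[i][j] is the cheapest alignment of the first i samples of a
    // with the first j samples of b.
    var cost = Array(repeating: Array(repeating: Double.infinity, count: m + 1),
                     count: n + 1)
    cost[0][0] = 0
    for i in 1...n {
        for j in 1...m {
            // Local cost: distance between the two samples being matched.
            let local = abs(a[i - 1] - b[j - 1])
            // Extend the cheapest of the three possible previous alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
        }
    }
    return cost[n][m]
}

// The second signal is the first one with a repeated sample,
// so the two can be aligned at no cost.
print(dtwDistance([1, 2, 3], [1, 2, 2, 3])) // 0.0
```

Unlike the plain Euclidean distance, this measure accepts arrays of different lengths.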
