- Haskell Data Analysis Cookbook
- Nishant Shukla
- 2021-12-08 12:43:37
Computing the Euclidean distance
Defining a distance between two items allows us to easily interpret clusters and patterns. The Euclidean distance is one of the most geometrically natural forms of distance to implement. It uses the Pythagorean formula to compute how far away two items are, which is similar to measuring the distance with a physical ruler.
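
For reference, the Pythagorean formula mentioned above can be written out for two points with n coordinates each. The symbols p, q, and n are notation chosen here for illustration and do not come from the recipe:

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```

For example, with p = (2, 1) and q = (0, 0), this gives the square root of 2^2 + 1^2 = 5, which is roughly 2.236.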

We can use this distance metric to detect whether an item is unusually far away from everything else. In this recipe, we will detect outliers using the Euclidean distance. It is slightly more computationally expensive than measuring the Manhattan distance since it involves multiplication and square roots; however, depending on the dataset, it may provide more accurate results.
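
One way to turn "unusually far away from everything else" into code is to compare the distance to the nearest neighbor against a cutoff. The following is a minimal sketch, not the recipe's own code: the names euclideanDist, nearestDist, and isOutlier, and the idea of a fixed threshold, are assumptions made for illustration.

```haskell
-- Euclidean distance between two points given as coordinate lists.
-- zipWith stops at the shorter list, so the points should have equal length.
euclideanDist :: [Double] -> [Double] -> Double
euclideanDist p q = sqrt . sum $ zipWith (\x y -> (x - y) ^ 2) p q

-- Distance from a candidate to its nearest neighbor.
-- The list of points must be non-empty, because minimum fails on [].
nearestDist :: [Double] -> [[Double]] -> Double
nearestDist candidate = minimum . map (euclideanDist candidate)

-- Hypothetical rule: a candidate is an outlier when even its nearest
-- neighbor is farther away than the given threshold.
-- An empty list of points never flags anything.
isOutlier :: Double -> [[Double]] -> [Double] -> Bool
isOutlier threshold points candidate =
  not (null points) && nearestDist candidate points > threshold
```

With an arbitrary threshold of 5, the call isOutlier 5 [[0,0],[10,0]] [2,1] would not flag the candidate, since its nearest neighbor is about 2.236 away. A sensible threshold depends on the scale and density of the dataset.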
Getting ready
Create a file of comma-separated points, one point per line. We will compute the smallest distance between these points and a test point; a test point whose nearest neighbor is still far away is a candidate outlier.
$ cat input.csv
0,0
10,0
0,10
10,10
5,5
How to do it...
Create a new file, which we will call Main.hs, and perform the following steps:
- Import the CSV package:
import Text.CSV (parseCSV)
- Read in the following points:
main :: IO ()
main = do
  let fileName = "input.csv"
  input <- readFile fileName
  let csv = parseCSV fileName input
- Represent the data as a list of floating-point numbers, falling back to an empty list if the CSV fails to parse:
  let points = either (\e -> []) (map toPoint . myFilter) csv
- Define a couple of points to test out the function:
  let test1 = [2,1]
  let test2 = [-10,-10]
- Compute the Euclidean distance on each of the points and find the smallest result:
  if (not . null) points
    then do
      print $ minimum $ map (euclidianDist test1) points
      print $ minimum $ map (euclidianDist test2) points
    else putStrLn "Error: no points to compare"
- Create a helper function to convert a list of strings to a list of floating-point numbers:
toPoint :: [String] -> [Float]
toPoint record = map (read :: String -> Float) record
- Compute the Euclidean distance between two points:
euclidianDist p1 p2 = sqrt $ sum $ zipWith (\x y -> (x - y)^2) p1 p2
- Filter out records that are of incorrect size:
myFilter = filter (\x -> length x == 2)
- The output will be the shortest distance between the test points and the list of points:
$ runhaskell Main.hs
2.236068
14.142136
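
Note that read raises a runtime error when a field is not a valid number, so a single malformed record would abort the program. A more defensive variant of the conversion step might look like the following sketch. The names toPointSafe and parsePoints are hypothetical and are not part of the recipe; readMaybe comes from Text.Read in base.

```haskell
import Text.Read (readMaybe)

-- Parse one CSV record into a point.
-- Returns Nothing if any field is not a valid number.
toPointSafe :: [String] -> Maybe [Float]
toPointSafe = mapM readMaybe

-- Keep only records that have exactly two fields and parse cleanly;
-- everything else is silently skipped.
parsePoints :: [[String]] -> [[Float]]
parsePoints records =
  [p | r <- records, length r == 2, Just p <- [toPointSafe r]]
```

Under this assumption, parsePoints could stand in for map toPoint . myFilter in the step that builds points. Whether skipping bad records silently is acceptable depends on the dataset; reporting them may be preferable.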
See also
If a more computationally efficient distance calculation is required, then take a look at the previous recipe, Computing the Manhattan distance.
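
To make the cost difference concrete, the two metrics can be placed side by side. This is a minimal sketch for comparison only, not the code from the previous recipe; the names manhattanDist and euclideanDist are chosen here for illustration.

```haskell
-- Manhattan distance: sum of absolute coordinate differences.
-- Needs only subtraction, abs, and addition.
manhattanDist :: [Float] -> [Float] -> Float
manhattanDist p q = sum $ zipWith (\x y -> abs (x - y)) p q

-- Euclidean distance: square root of the sum of squared differences.
-- Needs a multiplication per coordinate plus one square root.
euclideanDist :: [Float] -> [Float] -> Float
euclideanDist p q = sqrt . sum $ zipWith (\x y -> (x - y) ^ 2) p q
```

For the points [2,1] and [0,0], the Manhattan distance is 3 while the Euclidean distance is about 2.236. The Manhattan distance is never smaller than the Euclidean distance for the same pair of points.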