Computing the Euclidean distance

Defining a distance between two items makes clusters and patterns easy to interpret. The Euclidean distance is one of the most geometrically natural distance measures. It applies the Pythagorean formula to compute how far apart two items are, much like measuring the distance with a physical ruler.
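
Written out for two n-dimensional points p and q, the Pythagorean formula sums the squared coordinate differences and takes the square root. The worked line applies it to the test point (2,1) and the input point (0,0) used later in this recipe.

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

d\big((2,1),\,(0,0)\big) = \sqrt{(2-0)^2 + (1-0)^2} = \sqrt{5} \approx 2.236
```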

We can use this distance metric to detect whether an item is unusually far away from everything else. In this recipe, we will detect outliers using the Euclidean distance. It is slightly more expensive to compute than the Manhattan distance because it involves squaring and a square root; however, depending on the dataset, it may better reflect how close the points really are in space.
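
The outlier idea above can be sketched as a nearest-neighbour test: a point is suspicious when even its closest neighbour is far away. This sketch assumes an in-memory list of 2-D points matching input.csv; the threshold of 5 and the names manhattanDist, nearestDist, and isOutlier are hypothetical choices for illustration, not part of the recipe. It also computes the Manhattan distance for comparison, which needs only subtractions, absolute values, and additions.

```haskell
-- A minimal sketch: score a query point by the distance to its nearest
-- neighbour and flag it as an outlier above a (hypothetical) threshold.

euclidianDist :: [Float] -> [Float] -> Float
euclidianDist p q = sqrt (sum (zipWith (\x y -> (x - y) ^ 2) p q))

-- Hypothetical helper: Manhattan distance, for comparison.
manhattanDist :: [Float] -> [Float] -> Float
manhattanDist p q = sum (zipWith (\x y -> abs (x - y)) p q)

-- Distance from a query point to the closest point in the dataset.
-- Assumes a non-empty dataset: minimum fails on an empty list.
nearestDist :: ([Float] -> [Float] -> Float) -> [[Float]] -> [Float] -> Float
nearestDist dist points q = minimum (map (dist q) points)

-- Hypothetical rule: an outlier is farther than `threshold` from every point.
isOutlier :: Float -> [[Float]] -> [Float] -> Bool
isOutlier threshold points q = nearestDist euclidianDist points q > threshold

main :: IO ()
main = do
  let points = [[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]]
  print (nearestDist euclidianDist points [2, 1])      -- 2.236068
  print (nearestDist manhattanDist points [2, 1])      -- 3.0
  print (map (isOutlier 5 points) [[2, 1], [-10, -10]]) -- [False,True]
```

A fixed threshold is the simplest possible rule; with real data it would need to be chosen to suit the scale of the coordinates.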

Getting ready

Create a file named input.csv containing comma-separated points, one per line. We will compute the smallest distance between these points and a test point.

$ cat input.csv

0,0
10,0
0,10
10,10
5,5

How to do it...

Create a new file, which we will call Main.hs, and perform the following steps:

  1. Import the CSV package:
    import Text.CSV (parseCSV)
  2. Read the points from input.csv:
    main :: IO ()
    main = do
      let fileName = "input.csv"
      input <- readFile fileName
      let csv = parseCSV fileName input
  3. Represent the data as a list of points, each a list of floating point numbers (a parse error yields an empty list):
      let points = either (\e -> []) (map toPoint . myFilter) csv 
  4. Define a couple of points to test out the function:
      let test1 = [2,1]
      let test2 = [-10,-10]
  5. Compute the Euclidean distance from each test point to every point and print the smallest result:
      if (not.null) points then do
        print $ minimum $ map (euclidianDist test1) points
        print $ minimum $ map (euclidianDist test2) points
      else putStrLn "Error: no points to compare"
  6. Create a helper function to convert a list of strings to a list of floating point numbers:
    toPoint :: [String] -> [Float]
    toPoint record = map (read :: String -> Float) record
  7. Compute the Euclidean distance between two points:
    euclidianDist p1 p2 = sqrt $ sum $ 
                          zipWith (\x y -> (x - y)^2) p1 p2
  8. Filter out records that are of incorrect size:
    myFilter = filter (\x -> length x == 2)
  9. Running the program prints the shortest distance from each test point to the points in the list:
    $ runhaskell Main.hs
    
    2.236068
    14.142136
    
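
For reference, the steps above can be collected into one program. This is a variation rather than the recipe's own listing: it swaps Text.CSV for a hand-rolled comma splitter (assuming simple CSV with no quoted fields) so that it runs with only the base library, and it uses readMaybe from Text.Read instead of read, so a malformed record is skipped instead of crashing the program. The helper names splitOnComma and parsePoint are hypothetical.

```haskell
-- A dependency-free sketch of Main.hs.
-- Assumption: no quoted fields, so splitting on commas replaces Text.CSV.
import Text.Read (readMaybe)

-- Hypothetical helper: split a line into fields at every comma.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnComma rest

-- Hypothetical helper: parse a record into a 2-D point. Returns Nothing
-- if the record has the wrong size or a field is not a number.
parsePoint :: [String] -> Maybe [Float]
parsePoint fields
  | length fields == 2 = mapM readMaybe fields
  | otherwise          = Nothing

euclidianDist :: [Float] -> [Float] -> Float
euclidianDist p1 p2 = sqrt $ sum $ zipWith (\x y -> (x - y) ^ 2) p1 p2

main :: IO ()
main = do
  input <- readFile "input.csv"
  -- Keep only the records that parsed successfully.
  let points = [p | Just p <- map (parsePoint . splitOnComma) (lines input)]
  if (not . null) points
    then do
      print $ minimum $ map (euclidianDist [2, 1]) points
      print $ minimum $ map (euclidianDist [-10, -10]) points
    else putStrLn "Error: no points to compare"
```

Run it the same way, with runhaskell Main.hs next to input.csv; for the sample file it should print the same two distances as the recipe.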

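The length check in step 8 matters because zipWith stops at the end of the shorter list, so euclidianDist never reports mismatched dimensions; it silently compares fewer coordinates. This standalone demonstration (not part of Main.hs) makes that visible using the definition from step 7 with an added type signature.

```haskell
-- Standalone demonstration: zipWith truncates to the shorter list,
-- so a mis-sized point is compared on fewer coordinates, with no error.
euclidianDist :: [Float] -> [Float] -> Float
euclidianDist p1 p2 = sqrt $ sum $ zipWith (\x y -> (x - y) ^ 2) p1 p2

main :: IO ()
main = do
  print (euclidianDist [2, 1] [5, 5])      -- both 2-D: sqrt (9 + 16) = 5.0
  print (euclidianDist [2, 1] [5])         -- only x is compared: 3.0
  print (euclidianDist [2, 1] [5, 5, 100]) -- third coordinate ignored: 5.0
```
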
See also

If a more computationally efficient distance calculation is required, then take a look at the previous recipe, Computing the Manhattan distance.
