官术网_书友最值得收藏!

Computing the Manhattan distance

Defining a distance between two items allows us to easily interpret clusters and patterns. The Manhattan distance is one of the easiest to implement and is used primarily due to its simplicity.

The Manhattan distance (or Taxicab distance) between two items is the sum of the absolute differences of their coordinates. So if we are given two points (1, 1) and (5, 4), then the Manhattan distance will be |1-5| + |1-4| = 4 + 3 = 7.

We can use this distance metric to detect whether an item is unusually far away from everything else. In this recipe, we will detect outliers using the Manhattan distance. The calculations merely involve addition and subtraction, and therefore, it performs exceptionally well for a very large amount of data.

Getting ready

Create a list of comma-separated points. We will compute the smallest distance between these points and a test point:

$ cat input.csv

0,0
10,0
0,10
10,10
5,5

How to do it...

Create a new file, which we will call Main.hs, and perform the following steps:

  1. Import the CSV and List packages:
    import Text.CSV (parseCSV)
  2. Read in the following points:
    main :: IO ()
    main = do
      let fileName = "input.csv"
      input <- readFile fileName
      let csv = parseCSV fileName input
  3. Represent the data as a list of floating point numbers:
      let points = either (\e -> []) (map toPoint . myFilter) csv
  4. Define a couple of points to test the function:
      let test1 = [2,1]
      let test2 = [-10,-10]
  5. Compute the Manhattan distance on each of the points and find the smallest result:
      if (not.null) points then do
        print $ minimum $ map (manhattanDist test1) points
        print $ minimum $ map (manhattanDist test2) points
      else putStrLn "Error: no points to compare"
  6. Create a helper function to convert a list of strings to a list of floating point numbers:
    toPoint record = map (read :: String -> Float) record
  7. Compute the Manhattan distance between two points:
    manhattanDist p1 p2 = 
      sum $ zipWith (\x y -> abs (x - y)) p1 p2
  8. Filter out records that are of incorrect size:
    myFilter = filter (\x -> length x == 2)
  9. The output will be the shortest distance between the test points and the list of points:
    $ runhaskell Main.hs 
    
    3.0
    20.0

See also

If the distance matches more closely to the traditional geometric space, then read the next recipe on Computing the Euclidean distance.

主站蜘蛛池模板: 鹿泉市| 上林县| 句容市| 天柱县| 济阳县| 垣曲县| 桃源县| 全椒县| 麻城市| 大兴区| 宜昌市| 伽师县| 阿尔山市| 麻栗坡县| 关岭| 吉木乃县| 建湖县| 临海市| 普陀区| 海淀区| 昭苏县| 博乐市| 黎城县| 盐城市| 宽城| 靖远县| 和静县| 北流市| 嵊泗县| 象山县| 当阳市| 台前县| 广水市| 淮北市| 遵义县| 吉木萨尔县| 尉氏县| 南丰县| 阳泉市| 炉霍县| 新宁县|