- Haskell Data Analysis Cookbook
- Nishant Shukla
- 2021-12-08 12:43:37
Computing the Manhattan distance
Defining a distance between two items allows us to easily interpret clusters and patterns. The Manhattan distance is one of the easiest to implement and is used primarily due to its simplicity.

The Manhattan distance (or Taxicab distance) between two items is the sum of the absolute differences of their coordinates. So if we are given two points (1, 1) and (5, 4), then the Manhattan distance will be |1-5| + |1-4| = 4 + 3 = 7.
We can use this distance metric to detect whether an item is unusually far away from everything else. In this recipe, we will compute the smallest Manhattan distance from a test point to a set of known points; a large value suggests that the test point is an outlier. The calculation involves only subtraction, absolute values, and addition, so it stays cheap to compute even on very large datasets.
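As a rough illustration of the outlier idea described above, the following sketch flags every point whose nearest neighbour is farther away than a cutoff. The names `others` and `outliers`, the cutoff of 15, the extra point (40, 40), and the use of `Double` are all assumptions made for this example and are not part of the recipe.

```haskell
import Data.List (inits, tails)

-- Manhattan distance: sum of absolute coordinate differences.
manhattanDist :: [Double] -> [Double] -> Double
manhattanDist p q = sum (zipWith (\x y -> abs (x - y)) p q)

-- Pair every element with the list of all the other elements.
-- (Hypothetical helper, not from the recipe.)
others :: [a] -> [(a, [a])]
others xs = [ (x, before ++ after)
            | (before, x : after) <- zip (inits xs) (tails xs) ]

-- Points whose nearest neighbour is farther away than the threshold.
-- The threshold is an assumption; a sensible value depends on the data.
outliers :: Double -> [[Double]] -> [[Double]]
outliers threshold pts =
  [ p
  | (p, rest) <- others pts
  , not (null rest)                                    -- a lone point has no neighbours
  , minimum (map (manhattanDist p) rest) > threshold
  ]

main :: IO ()
main = do
  let pts = [[0,0], [10,0], [0,10], [10,10], [5,5], [40,40]]
  print (outliers 15 pts)  -- prints [[40.0,40.0]]
```

Every point in the original cluster has a neighbour at distance 10, while (40, 40) is 60 away from its closest neighbour, so only that point exceeds the cutoff. This version compares every pair of points, so it is a sketch for small inputs rather than a scalable method.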
Getting ready
Create a list of comma-separated points. We will compute the smallest distance between these points and a test point:
$ cat input.csv
0,0
10,0
0,10
10,10
5,5
How to do it...
Create a new file, which we will call Main.hs, and perform the following steps:
- Import parseCSV from the CSV package:
import Text.CSV (parseCSV)
- Read in the following points:
main :: IO ()
main = do
  let fileName = "input.csv"
  input <- readFile fileName
  let csv = parseCSV fileName input
- Represent the data as a list of floating point numbers:
  let points = either (\e -> []) (map toPoint . myFilter) csv
- Define a couple of points to test the function:
  let test1 = [2,1]
  let test2 = [-10,-10]
- Compute the Manhattan distance on each of the points and find the smallest result:
  if (not . null) points
    then do
      print $ minimum $ map (manhattanDist test1) points
      print $ minimum $ map (manhattanDist test2) points
    else putStrLn "Error: no points to compare"
- Create a helper function to convert a list of strings to a list of floating point numbers:
toPoint record = map (read :: String -> Float) record
- Compute the Manhattan distance between two points:
manhattanDist p1 p2 = sum $ zipWith (\x y -> abs (x - y)) p1 p2
- Filter out records that are of incorrect size:
myFilter = filter (\x -> length x == 2)
- The output will be the shortest distance between the test points and the list of points:
$ runhaskell Main.hs
3.0
20.0
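The recipe's toPoint relies on read, which raises a runtime error on a field that is not a number, and manhattanDist relies on zipWith, which silently truncates to the shorter list when two points differ in dimension. A more defensive variant might look like the sketch below. The names `toPointSafe` and `manhattanDistSafe` and the use of `Double` are assumptions for illustration; `readMaybe` comes from `Text.Read` in base.

```haskell
import Text.Read (readMaybe)

-- Parse a CSV record into a point.
-- Returns Nothing if any field fails to parse as a number.
toPointSafe :: [String] -> Maybe [Double]
toPointSafe = mapM readMaybe

-- Manhattan distance that rejects points of different dimension
-- instead of silently ignoring the extra coordinates.
manhattanDistSafe :: [Double] -> [Double] -> Maybe Double
manhattanDistSafe p q
  | length p == length q = Just (sum (zipWith (\x y -> abs (x - y)) p q))
  | otherwise            = Nothing

main :: IO ()
main = do
  print (toPointSafe ["10", "0"])              -- Just [10.0,0.0]
  print (toPointSafe ["10", "abc"])            -- Nothing
  print (manhattanDistSafe [2, 1] [0, 0])      -- Just 3.0
  print (manhattanDistSafe [2, 1] [0, 0, 5])   -- Nothing
```

Returning Maybe pushes the decision about bad records to the caller, who can drop them or report them; whether that is preferable to failing fast depends on how clean the input is expected to be.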
See also
If you need a distance that matches traditional geometric (straight-line) space more closely, read the next recipe, Computing the Euclidean distance.