- Haskell Data Analysis Cookbook
- Nishant Shukla
- 2021-12-08 12:43:37
Comparing sparse data using cosine similarity
When a data set has many empty fields, comparing records with the Manhattan or Euclidean metric can give skewed results. Cosine similarity instead measures how closely two vectors are aligned in direction. For example, the vectors (82, 86) and (86, 82) point in nearly the same direction. In fact, their cosine similarity equals that of (41, 43) and (43, 41). A cosine similarity of 1 corresponds to vectors that point in exactly the same direction, and 0 corresponds to vectors that are orthogonal to each other.

Whenever the angle between two vectors is the same, their cosine similarity is the same, regardless of magnitude. A distance metric such as the Manhattan or Euclidean distance does not share this property: under either metric, (82, 86) and (86, 82) are twice as far apart as (41, 43) and (43, 41).
The cosine similarity between the two vectors is the dot product of the two vectors divided by the product of their magnitudes.
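
Written out for two n-dimensional vectors, that definition reads as follows. The symbols a, b, and θ are chosen here for illustration and do not appear in the recipe itself.

```latex
\[
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}
\]
% Worked check with the example vectors from the introduction:
\[
\frac{82\cdot 86 + 86\cdot 82}{82^{2} + 86^{2}} = \frac{14104}{14120}
\qquad\text{and}\qquad
\frac{41\cdot 43 + 43\cdot 41}{41^{2} + 43^{2}} = \frac{3526}{3530}
\]
```

Both fractions reduce to 1763/1765, roughly 0.9989, which is why the two pairs have the same cosine similarity.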

How to do it...
Create a new file, which we will call Main.hs, and perform the following steps:
- Implement main to compute the cosine similarity between two lists of numbers.

    main :: IO ()
    main = do
      let d1 = [3.5, 2, 0, 4.5, 5, 1.5, 2.5, 2]
      let d2 = [  3, 0, 0,   5, 4, 2.5,   3, 0]

- Compute the cosine similarity.

      let similarity = dot d1 d2 / (eLen d1 * eLen d2)
      print similarity

- Define the dot product and Euclidean length helper functions.

    dot a b = sum $ zipWith (*) a b
    eLen a = sqrt $ dot a a

- Run the code to print the cosine similarity.

    $ runhaskell Main.hs
    0.924679432210068

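
The helpers in the steps above assume two lists of equal length, neither of which is all zeros: zipWith silently stops at the end of the shorter list, and a zero vector leads to 0/0, which evaluates to NaN. A minimal sketch of a guarded variant might look like the following; safeCosine is a hypothetical name introduced here, not part of the recipe.

```haskell
module Main where

-- Dot product of two lists of numbers (same definition as in the recipe).
dot :: [Double] -> [Double] -> Double
dot a b = sum (zipWith (*) a b)

-- Euclidean length (magnitude) of a vector.
eLen :: [Double] -> Double
eLen a = sqrt (dot a a)

-- Hypothetical helper: returns Nothing when the similarity is undefined
-- instead of silently truncating the input or yielding NaN.
safeCosine :: [Double] -> [Double] -> Maybe Double
safeCosine a b
  | length a /= length b = Nothing  -- zipWith would drop the extra elements
  | la == 0 || lb == 0   = Nothing  -- a zero vector has no direction
  | otherwise            = Just (dot a b / (la * lb))
  where
    la = eLen a
    lb = eLen b

main :: IO ()
main = do
  print (safeCosine [1, 0] [0, 1])     -- orthogonal vectors: Just 0.0
  print (safeCosine [1, 2] [1, 2, 3])  -- length mismatch: Nothing
  print (safeCosine [0, 0] [1, 2])     -- zero vector: Nothing
```

Returning a Maybe value pushes the decision about undefined cases to the caller, which can matter when sparse records turn out to be entirely empty.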
See also
If the data set is not sparse, consider using the Manhattan or Euclidean distance metrics instead, as detailed in the recipes Computing the Manhattan distance and Computing the Euclidean distance.
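
To see how the three metrics behave side by side, here is a minimal sketch that applies them to the two example pairs from the introduction. The function names manhattan, euclidean, and cosine are chosen for illustration and are not necessarily those used in the referenced recipes.

```haskell
module Main where

-- Manhattan distance: sum of absolute coordinate differences.
manhattan :: [Double] -> [Double] -> Double
manhattan a b = sum (zipWith (\x y -> abs (x - y)) a b)

-- Euclidean distance: square root of the summed squared differences.
euclidean :: [Double] -> [Double] -> Double
euclidean a b = sqrt (sum (zipWith (\x y -> (x - y) * (x - y)) a b))

-- Cosine similarity: dot product divided by the product of the magnitudes.
cosine :: [Double] -> [Double] -> Double
cosine a b = dot a b / (eLen a * eLen b)
  where
    dot u v = sum (zipWith (*) u v)
    eLen u  = sqrt (dot u u)

-- Print all three metrics for one pair of vectors.
report :: ([Double], [Double]) -> IO ()
report (a, b) = do
  putStrLn ("vectors:   " ++ show a ++ " and " ++ show b)
  putStrLn ("manhattan: " ++ show (manhattan a b))
  putStrLn ("euclidean: " ++ show (euclidean a b))
  putStrLn ("cosine:    " ++ show (cosine a b))

main :: IO ()
main = mapM_ report
  [ ([82, 86], [86, 82])
  , ([41, 43], [43, 41])
  ]
```

Both distances halve from the first pair to the second (the Manhattan distance goes from 8 to 4), while the cosine similarity stays at roughly 0.9989 for both pairs.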