- Haskell Data Analysis Cookbook
- Nishant Shukla
- 395字
- 2021-12-08 12:43:39
Searching a string using the Rabin-Karp algorithm
The Rabin-Karp algorithm finds a pattern in a body of text by matching a unique representation of the pattern against a moving window. The unique representation, or hash, is computed by considering a string as a number written in an arbitrary base of 26 or greater.
The advantage of Rabin-Karp is in searching for many needles in a haystack. It's not very efficient to search for just a single string. After the initial preprocessing of the corpus, the algorithm can quickly find matches.
Getting ready
Install the Data.ByteString.Search
library from Cabal as follows:
$ cabal install stringsearch
How to do it...
- Use the
OverloadedStrings
language extension to facilitate theByteString
manipulations in our code as follows. It essentially allows polymorphic behavior for strings so that the GHC compiler may infer it as aByteString
type when necessary:{-# LANGUAGE OverloadedStrings #-}
- Import the Rabin-Karp algorithms as follows:
import Data.ByteString.Search.KarpRabin (indicesOfAny) import qualified Data.ByteString as BS
- Define a couple of patterns to find and obtain the corpus from a
big.txt
file, as shown in the following code snippet:main = do let needles = [ "preparing to go away" , "is some letter of recommendation"] haystack <- BS.readFile "big.txt"
- Run the Rabin-Karp algorithm on all the search patterns as follows:
print $ indicesOfAny needles haystack
- The code prints out all indices found for each needle as a list of tuples. The first element of the tuple is the position in the haystack that the needle was found. The second element of the tuple is a list of indices of the needles. In our recipe, we find one instance of "preparing to go away" and two instances of "is some letter of recommendation."
$ runhaskell Main.hs [(3738968,[1]),(5632846,[0]),(5714386,[0])]
How it works...
In Rabin-Karp, a fixed window moves from left to right, comparing the unique hash values for efficient comparisons. The hash function converts a string to its numerical representation. Here's an example of converting a string into base b equal to 256: "hello" = h' * b4 + e' * b3 + l' * b2 + l' * b1 + o' * b0 (which results in 448378203247), where each letter h' = ord h
(which results in 104), and so on.
See also
To see another efficient string searching algorithm, refer to the Searching a string using the Boyer-Moore-Horspool algorithm recipe.
- 多媒體CAI課件設(shè)計與制作導(dǎo)論(第二版)
- HoloLens Beginner's Guide
- JavaScript 網(wǎng)頁編程從入門到精通 (清華社"視頻大講堂"大系·網(wǎng)絡(luò)開發(fā)視頻大講堂)
- C語言程序設(shè)計
- SEO實戰(zhàn)密碼
- Python Data Analysis Cookbook
- Go語言精進(jìn)之路:從新手到高手的編程思想、方法和技巧(1)
- C#程序設(shè)計教程(第3版)
- Learning Modular Java Programming
- C#面向?qū)ο蟪绦蛟O(shè)計(第2版)
- 數(shù)字媒體技術(shù)概論
- Python數(shù)據(jù)科學(xué)實踐指南
- Clojure Web Development Essentials
- Mastering Unreal Engine 4.X
- Python程序設(shè)計教程