- Haskell Data Analysis Cookbook
- Nishant Shukla
- 375字
- 2021-12-08 12:43:39
Searching for a substring using Data.ByteString
There are many algorithms to search for a string within another string. This recipe will use an existing breakSubstring
function in the Data.ByteString
library to do most of the heavy lifting.
The ByteString
documentation establishes its merits by declaring the following claim:
"[A ByteString is] a time- and space-efficient implementation of byte vectors using packed Word8 arrays, suitable for high performance use, both in terms of large data quantities, or high speed requirements. Byte vectors are encoded as strict Word8 arrays of bytes, held in a ForeignPtr, and can be passed between C and Haskell with little effort."
More information and documentation can be obtained on the package web page at http://hackage.haskell.org/package/bytestring/docs/Data-ByteString.html.
How to do it...
- Import the
breakSubstring
function as well as theData.ByteString.Char8
package as follows:import Data.ByteString (breakSubstring) import qualified Data.ByteString.Char8 as C
- Pack the strings as a
ByteString
and feed them intobreakSubstring
which has the following type:ByteString -> ByteString -> (ByteString, ByteString)
. Then determine whether the string is found:substringFound :: String -> String -> Bool substringFound query str = (not . C.null . snd) $ breakSubstring (C.pack query) (C.pack str)
- Try out some tests in
main
as follows:main = do print $ substringFound "scraf" "swedish scraf mafia" print $ substringFound "flute" "swedish scraf mafia"
- Executing
main
will print out the following results:True False
How it works...
The breakSubstring
function recursively checks if the pattern is a prefix of the string. To lazily find the first occurrence of a string, we can call snd (breakSubstring pat str)
.
There's more...
Another elegant way to quickly find a substring is by using the isInfixOf
function provided by both Data.List
and Data.ByteString
. Moreover, we can also use the OverloadedStrings
language extension to remove verbiage, as shown in the following code snippet:
{-# LANGUAGE OverloadedStrings #-} import Data.ByteString (isInfixOf) main = do print $ isInfixOf "scraf" "swedish scraf mafia" print $ isInfixOf "flute" "swedish scraf mafia"
See also
Depending on the length of the pattern we're trying to find and the length of the whole string itself, other algorithms may provide better performance. Refer to the Searching a string using the Boyer-Moore-Horspool algorithm and Searching a string using the Rabin-Karp algorithm recipes for more details.