
Lexing and parsing an e-mail address

An elegant way to clean data is to define a lexer that splits a string into tokens. In this recipe, we will parse an e-mail address using the attoparsec library, which naturally lets us ignore the surrounding whitespace.

Getting ready

Install the attoparsec parser combinator library:

$ cabal install attoparsec

How to do it…

Create a new file, which we will call Main.hs, and perform the following steps:

  1. Use the GHC OverloadedStrings language extension to more legibly use the Text data type throughout the code. Also, import the other relevant libraries:
    {-# LANGUAGE OverloadedStrings #-}
    import Data.Attoparsec.Text
    import Data.Char (isSpace, isAlphaNum)
  2. Declare a data type for an e-mail address:
    data Email = Email
      { user :: String
      , host :: String
      } deriving Show
  3. Define how to parse an e-mail address. This function can be as simple or as complicated as required:
    email :: Parser Email
    email = do
      skipSpace                              -- ignore leading whitespace
      userName <- many' $ satisfy isAlphaNum
      _ <- char '@'                          -- the separator itself is discarded
      hostName <- many' $ satisfy isAlphaNum
      _ <- char '.'
      domain <- many' (satisfy isAlphaNum)
      return $ Email userName (hostName ++ "." ++ domain)
  4. Parse an e-mail address to test the code:
    main :: IO ()
    main = print $ parseOnly email "nishant@shukla.io"
  5. Run the code to print out the parsed e-mail address:
    $ runhaskell Main.hs
    
    Right (Email {user = "nishant", host = "shukla.io"})
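
To see how the parser behaves on less tidy input, the following minimal sketch reruns the same grammar on a few strings and pattern matches on the Either value that parseOnly returns. The helper describe and the sample inputs are hypothetical and exist only for this illustration; the parser is restated so that the file runs on its own.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Data.Char (isAlphaNum)
import Data.Text (Text)

data Email = Email
  { user :: String
  , host :: String
  } deriving (Show, Eq)

-- Same grammar as the recipe: user '@' hostname '.' domain
email :: Parser Email
email = do
  skipSpace
  u <- many' (satisfy isAlphaNum)
  _ <- char '@'
  h <- many' (satisfy isAlphaNum)
  _ <- char '.'
  d <- many' (satisfy isAlphaNum)
  return $ Email u (h ++ "." ++ d)

-- Hypothetical helper: turn a parse result into a readable message
describe :: Text -> String
describe input =
  case parseOnly email input of
    Right e  -> "parsed: " ++ user e ++ " at " ++ host e
    Left err -> "rejected: " ++ err

main :: IO ()
main = mapM_ (putStrLn . describe)
  [ "nishant@shukla.io"        -- plain address
  , "   nishant@shukla.io   "  -- padded with whitespace
  , "not an address"           -- no '@', so the parser fails
  ]
```

The first two inputs should produce the same parsed address, because skipSpace consumes the leading spaces and parseOnly does not require the whole input to be consumed. The third input yields a Left value whose message comes from attoparsec.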
    

How it works…

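We create an e-mail parser by sequencing smaller parsers. After skipSpace discards any leading whitespace, the input must contain an alphanumeric username, followed by the 'at' sign (@), then an alphanumeric hostname, a period, and lastly the top-level domain. Because many' matches zero or more characters, this simple parser also accepts empty parts, and parseOnly ignores any input left over after the domain.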
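
If stricter behavior is wanted, one possible variation is sketched below. It is an assumption layered on top of the recipe, not the recipe's own parser: the name strictEmail is hypothetical, many1 replaces many' so that each part needs at least one character, and endOfInput rejects anything other than whitespace after the address.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Data.Char (isAlphaNum)

data Email = Email
  { user :: String
  , host :: String
  } deriving (Show, Eq)

-- Hypothetical stricter variant of the recipe's parser
strictEmail :: Parser Email
strictEmail = do
  skipSpace
  u <- many1 (satisfy isAlphaNum)  -- at least one character
  _ <- char '@'
  h <- many1 (satisfy isAlphaNum)
  _ <- char '.'
  d <- many1 (satisfy isAlphaNum)
  skipSpace                        -- allow trailing whitespace
  endOfInput                       -- but nothing else
  return $ Email u (h ++ "." ++ d)

main :: IO ()
main = do
  print $ parseOnly strictEmail "  nishant@shukla.io  "
  print $ parseOnly strictEmail "@."
```

With this variant, the padded address still parses, while "@." is rejected with a Left value. This is still far from a full e-mail grammar; it only tightens the simple alphanumeric rule described above.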

The various functions used from the attoparsec library can be found in the Data.Attoparsec.Text documentation, which is available at https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html.
