
Lexing and parsing an e-mail address

An elegant way to clean data is to define a lexer that splits a string into tokens. In this recipe, we will parse an e-mail address using the attoparsec library. This approach naturally lets us ignore the surrounding whitespace.
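To make the lexing idea in the paragraph above concrete, here is a minimal sketch, separate from the recipe's own code, that splits an address into tokens with attoparsec. The `Token` type, its constructors, and the names `oneToken` and `lexer` are hypothetical; `skipSpace`, `takeWhile1`, `char`, `many'`, `endOfInput`, and `parseOnly` come from `Data.Attoparsec.Text`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import Data.Char (isAlphaNum)
import Data.Text (Text)

-- Hypothetical token type for this sketch: runs of letters/digits, '@', and '.'.
data Token = Name Text | At | Dot
  deriving (Show, Eq)

-- Recognise exactly one token; each alternative consumes at least one character.
oneToken :: Parser Token
oneToken =  Name <$> takeWhile1 isAlphaNum
        <|> At   <$  char '@'
        <|> Dot  <$  char '.'

-- Skip whitespace before and after every token, then require that
-- the whole input has been consumed.
lexer :: Parser [Token]
lexer = skipSpace *> many' (oneToken <* skipSpace) <* endOfInput

main :: IO ()
main = print (parseOnly lexer "  nishant@shukla.io  ")
```

Running it prints the five tokens of `nishant@shukla.io`; the surrounding spaces never reach the token list, and an input containing an unknown character such as `!` produces a `Left`.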

Getting ready

Install the attoparsec parser combinator library:

$ cabal install attoparsec

How to do it…

Create a new file, which we will call Main.hs, and perform the following steps:

  1. Enable the GHC OverloadedStrings language extension so that string literals can be used directly as Text values throughout the code. Also, import the other relevant libraries:
    {-# LANGUAGE OverloadedStrings #-}
    import Data.Attoparsec.Text
    import Data.Char (isSpace, isAlphaNum)
  2. Declare a data type for an e-mail address:
    data Email = Email
      { user :: String
      , host :: String
      } deriving Show
  3. Define how to parse an e-mail address. This function can be as simple or as complicated as required:
    email :: Parser Email
    email = do
      skipSpace                               -- ignore leading whitespace
      userName <- many1 (satisfy isAlphaNum)  -- one or more alphanumeric characters
      _ <- char '@'                           -- the separating 'at' sign
      hostName <- many1 (satisfy isAlphaNum)
      _ <- char '.'
      domain <- many1 (satisfy isAlphaNum)    -- top-level domain, e.g. "io"
      return $ Email userName (hostName ++ "." ++ domain)
  4. Parse an e-mail address to test the code:
    main :: IO ()
    main = print $ parseOnly email "nishant@shukla.io"
  5. Run the code to print out the parsed e-mail address:
    $ runhaskell Main.hs
    
    Right (Email {user = "nishant", host = "shukla.io"})
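The following sketch runs the parser on a few more inputs to show how it treats surrounding whitespace and malformed addresses. It restates the parser from step 3 as a self-contained file, spelling the names `Email` and `email` (a Haskell identifier cannot contain a hyphen), using `many1` so that each part must be non-empty, and deriving `Eq` so results can be compared. The test strings are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Data.Char (isAlphaNum)

data Email = Email
  { user :: String
  , host :: String
  } deriving (Show, Eq)

-- The parser from step 3, with each part required to be non-empty.
email :: Parser Email
email = do
  skipSpace                               -- drop leading whitespace
  userName <- many1 (satisfy isAlphaNum)
  _ <- char '@'
  hostName <- many1 (satisfy isAlphaNum)
  _ <- char '.'
  domain <- many1 (satisfy isAlphaNum)
  return (Email userName (hostName ++ "." ++ domain))

main :: IO ()
main = mapM_ (print . parseOnly email)
  [ "   nishant@shukla.io"  -- leading spaces are consumed by skipSpace
  , "nishant@shukla.io   "  -- trailing spaces are left unread; parseOnly does not require end of input
  , "nishant.shukla.io"     -- no '@' after the user name, so the result is a Left
  ]
```

Only the leading whitespace is actually skipped. Trailing characters are simply never consumed, because `parseOnly` does not force a parser to read all of its input.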
    

How it works…

We build the e-mail parser by matching the input against a sequence of smaller parsers, one after another. After any leading whitespace is skipped, an e-mail address must contain an alphanumeric username, followed by the 'at' sign (@), then an alphanumeric hostname, a period, and lastly the top-level domain.
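These rules reject many real addresses, for example `first.last@mail.example.com`. As a hypothetical extension rather than part of the recipe, the sketch below accepts '.', '_' and '-' in the user name, allows any number of dot-separated host labels, and finishes with `endOfInput` so that trailing junk is rejected. The helper names `isUserChar` and `hostLabel`, and the character rules they encode, are assumptions made for this sketch; the result is still far simpler than the address grammar in RFC 5322.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Data.Char (isAlphaNum)
import Data.List (intercalate)

data Email = Email
  { user :: String
  , host :: String
  } deriving (Show, Eq)

-- Assumed rule for this sketch: the user part may also contain '.', '_' and '-'.
isUserChar :: Char -> Bool
isUserChar c = isAlphaNum c || c == '.' || c == '_' || c == '-'

-- Assumed rule for this sketch: a host label is letters, digits and '-'.
hostLabel :: Parser String
hostLabel = many1 (satisfy (\c -> isAlphaNum c || c == '-'))

email :: Parser Email
email = do
  skipSpace
  userName <- many1 (satisfy isUserChar)
  _ <- char '@'
  firstLabel <- hostLabel
  otherLabels <- many1 (char '.' *> hostLabel)  -- at least one more ".label"
  skipSpace
  endOfInput                                    -- reject any leftover text
  return (Email userName (intercalate "." (firstLabel : otherLabels)))

main :: IO ()
main = mapM_ (print . parseOnly email)
  [ "first.last@mail.example.com"
  , " nishant@shukla.io "
  , "nishant@shukla.io junk"
  , "nishant@localhost"
  ]
```

Adding `endOfInput` is the main design change: without it, `parseOnly` would return a result for `"nishant@shukla.io junk"` and silently drop the rest of the string.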

The various functions used from the attoparsec library can be found in the Data.Attoparsec.Text documentation, which is available at https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html.
