
Lexing and parsing an e-mail address

An elegant way to clean data is by defining a lexer to split up a string into tokens. In this recipe, we will parse an e-mail address using the attoparsec library. This will naturally allow us to ignore the surrounding whitespace.

Getting ready

Install the attoparsec parser combinator library:

$ cabal install attoparsec

How to do it…

Create a new file, which we will call Main.hs, and perform the following steps:

  1. Use the GHC OverloadedStrings language extension to more legibly use the Text data type throughout the code. Also, import the other relevant libraries:
    {-# LANGUAGE OverloadedStrings #-}
    import Data.Attoparsec.Text
    import Data.Char (isAlphaNum)
  2. Declare a data type for an e-mail address:
    data Email = Email
      { user :: String
      , host :: String
      } deriving Show
  3. Define how to parse an e-mail address. This function can be as simple or as complicated as required:
    email :: Parser Email
    email = do
      skipSpace                               -- ignore leading whitespace
      name <- many' $ satisfy isAlphaNum      -- username
      _ <- char '@'
      hostName <- many' $ satisfy isAlphaNum  -- hostname
      _ <- char '.'
      domain <- many' (satisfy isAlphaNum)    -- top-level domain
      return $ Email name (hostName ++ "." ++ domain)
  4. Parse an e-mail address to test the code:
    main :: IO ()
    main = print $ parseOnly email "nishant@shukla.io"
  5. Run the code to print out the parsed e-mail address:
    $ runhaskell Main.hs
    
    Right (Email {user = "nishant", host = "shukla.io"})
    

How it works…

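We create an e-mail parser by chaining several smaller parsers that each consume part of the input in turn. After skipSpace discards any leading whitespace, an e-mail address must contain an alphanumeric username, followed by the 'at' sign (@), then an alphanumeric hostname, a period, and lastly the top-level domain. Anything after the top-level domain, including trailing whitespace, is ignored because parseOnly does not require the parser to consume the whole input. Note that many' also succeeds on zero characters, so this simple parser accepts empty usernames and hostnames.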
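If empty parts or trailing junk should be rejected, the parser can be tightened. The following is a minimal sketch rather than part of the original recipe: the names word and strictEmail are hypothetical, and it assumes that takeWhile1 (one or more matching characters) and endOfInput from Data.Attoparsec.Text are acceptable substitutes for many'. It is still far from a full e-mail grammar.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Data.Char (isAlphaNum)
import qualified Data.Text as T

data Email = Email
  { user :: String
  , host :: String
  } deriving (Show, Eq)

-- Hypothetical helper: one or more alphanumeric characters.
-- Unlike many', takeWhile1 fails when nothing matches.
word :: Parser String
word = T.unpack <$> takeWhile1 isAlphaNum

-- Hypothetical stricter variant of the recipe's parser.
strictEmail :: Parser Email
strictEmail = do
  skipSpace            -- leading whitespace
  name <- word
  _ <- char '@'
  hostName <- word
  _ <- char '.'
  domain <- word
  skipSpace            -- trailing whitespace
  endOfInput           -- nothing else may follow
  return $ Email name (hostName ++ "." ++ domain)

main :: IO ()
main = do
  print $ parseOnly strictEmail "  nishant@shukla.io  "  -- a Right value
  print $ parseOnly strictEmail "@."                     -- a Left value
```

The first call succeeds with the same Email value as before, while the second now fails because the username is empty.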

The various functions used from the attoparsec library can be found in the Data.Attoparsec.Text documentation, which is available at https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html.
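One function worth looking up there is parseOnly, which returns an Either: a Left carrying an error message, or a Right carrying the parsed value. The sketch below shows one way to handle both outcomes. The describe helper is hypothetical, and the parser is repeated from the recipe only so that the example is self-contained.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Data.Char (isAlphaNum)
import Data.Text (Text)

data Email = Email
  { user :: String
  , host :: String
  } deriving (Show, Eq)

-- Same parser as in the recipe.
email :: Parser Email
email = do
  skipSpace
  name <- many' (satisfy isAlphaNum)
  _ <- char '@'
  hostName <- many' (satisfy isAlphaNum)
  _ <- char '.'
  domain <- many' (satisfy isAlphaNum)
  return $ Email name (hostName ++ "." ++ domain)

-- Hypothetical helper: turn the Either from parseOnly into a message.
describe :: Text -> String
describe input = case parseOnly email input of
  Left err -> "could not parse: " ++ err
  Right e  -> user e ++ " at " ++ host e

main :: IO ()
main = mapM_ (putStrLn . describe) ["nishant@shukla.io", "not an address"]
```

The second input fails at the '@' check, so describe reports the parser's error message instead of an address.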
