- Haskell Data Analysis Cookbook
- Nishant Shukla
- 2021-12-08 12:43:35
Lexing and parsing an e-mail address
An elegant way to clean data is to define a lexer that splits a string into tokens. In this recipe, we will parse an e-mail address using the attoparsec library. Working with a parser also makes it easy to skip whitespace around the address.
Getting ready
Install the attoparsec parser combinator library:
    $ cabal install attoparsec
How to do it…
Create a new file, which we will call Main.hs, and perform the following steps:
- Use the GHC OverloadedStrings language extension so that string literals can be used as Text values throughout the code. Also, import the other relevant libraries:
    {-# LANGUAGE OverloadedStrings #-}
    import Data.Attoparsec.Text
    import Data.Char (isAlphaNum)
- Declare a data type for an e-mail address:
    data Email = Email
      { user :: String
      , host :: String
      } deriving Show
- Define how to parse an e-mail address. This function can be as simple or as complicated as required:
    email :: Parser Email
    email = do
      skipSpace                                -- ignore leading whitespace
      userName <- many' $ satisfy isAlphaNum   -- part before the '@'
      _ <- char '@'
      hostName <- many' $ satisfy isAlphaNum   -- part between '@' and '.'
      _ <- char '.'
      domain <- many' (satisfy isAlphaNum)     -- top-level domain
      return $ Email userName (hostName ++ "." ++ domain)
- Parse an e-mail address to test the code:
    main :: IO ()
    main = print $ parseOnly email "nishant@shukla.io"
- Run the code to print out the parsed e-mail address:
    $ runhaskell Main.hs
    Right (Email {user = "nishant", host = "shukla.io"})
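The parser in the steps above is deliberately permissive. Because it uses many', an input such as "@shukla.io" is accepted with an empty user name, and because parseOnly does not require the whole input to be consumed, any text after the domain is silently ignored. The following is a minimal sketch of a stricter variant, not the book's code: strictEmail is a hypothetical name, and the choice of many1, a trailing skipSpace, and endOfInput is an assumption about what "stricter" should mean here.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Attoparsec.Text
import Data.Char (isAlphaNum)

data Email = Email
  { user :: String
  , host :: String
  } deriving (Show, Eq)

-- Hypothetical stricter variant of the recipe's parser (an assumption,
-- not part of the original recipe).
strictEmail :: Parser Email
strictEmail = do
  skipSpace                              -- ignore leading whitespace
  u <- many1 (satisfy isAlphaNum)        -- at least one character before '@'
  _ <- char '@'
  h <- many1 (satisfy isAlphaNum)        -- at least one character of hostname
  _ <- char '.'
  d <- many1 (satisfy isAlphaNum)        -- at least one character of domain
  skipSpace                              -- ignore trailing whitespace
  endOfInput                             -- reject anything left over
  return $ Email u (h ++ "." ++ d)

main :: IO ()
main = mapM_ (print . parseOnly strictEmail)
  [ "  nishant@shukla.io  "     -- surrounded by whitespace
  , "@shukla.io"                -- missing user name
  , "nishant@shukla.io extra"   -- trailing text after the address
  ]
```

With this sketch, the first input should parse to the same Email value as in the recipe, while the other two should produce a Left with an error message.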
How it works…
We create an e-mail parser by running a sequence of smaller parsers against the input, each of which must succeed in turn. An e-mail address must contain an alphanumeric username, followed by the 'at' sign (@), then an alphanumeric hostname, a period, and lastly the top-level domain.
The various functions used from the attoparsec library can be found in the Data.Attoparsec.Text documentation, which is available at https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html.
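As one example of what that module offers, takeWhile1 consumes a non-empty run of matching characters and returns it as Text, which avoids building a String one character at a time. The sketch below is an assumption-laden variation on the recipe rather than the author's code: EmailT, userT, hostT, and emailT are hypothetical names introduced only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Attoparsec.Text
import Data.Char (isAlphaNum)
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical Text-based counterpart of the recipe's data type.
data EmailT = EmailT
  { userT :: Text
  , hostT :: Text
  } deriving (Show, Eq)

-- Same sequence of steps as the recipe, but each field is read with
-- takeWhile1, so every part must be non-empty and is returned as Text.
emailT :: Parser EmailT
emailT = do
  skipSpace
  u <- takeWhile1 isAlphaNum
  _ <- char '@'
  h <- takeWhile1 isAlphaNum
  _ <- char '.'
  d <- takeWhile1 isAlphaNum
  return $ EmailT u (T.concat [h, ".", d])

main :: IO ()
main = print $ parseOnly emailT "nishant@shukla.io"
```

Whether String or Text is the better field type depends on how the parsed address is used afterwards; Text is usually the more natural fit when the input is already Text.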