
Coping with unexpected or missing input

Data sources often contain incomplete and unexpected data. One common approach to parsing such data in Haskell is using the Maybe data type.

Imagine designing a function to find the nth element in a list of characters. A naive implementation may have the type Int -> [Char] -> Char. However, if the function is asked for an index that is out of bounds, it should signal that an error has occurred instead of crashing.

A common way to deal with these errors is to wrap the output Char in a Maybe context. The type Int -> [Char] -> Maybe Char makes the possibility of failure explicit, so the caller has to handle it. Maybe has two constructors, Just a and Nothing, as we can confirm by running GHCi and checking their types:

$ ghci

Prelude> :type Just 'c'
Just 'c' :: Maybe Char

Prelude> :type Nothing
Nothing :: Maybe a
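As a sketch of the safer Int -> [Char] -> Maybe Char signature described above, the following function returns Nothing for an out-of-bounds index instead of failing the way (!!) does. The name safeNth is our own and does not come from any library:

```haskell
-- Hypothetical safe indexing for strings: Nothing instead of a runtime error.
safeNth :: Int -> [Char] -> Maybe Char
safeNth n xs
  | n < 0     = Nothing           -- negative indices are never valid
  | otherwise = case drop n xs of
      (c:_) -> Just c             -- the nth character exists
      []    -> Nothing            -- the index ran past the end of the list
```

After loading this into GHCi, safeNth 1 "laptop" evaluates to Just 'a', while safeNth 10 "laptop" evaluates to Nothing.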

In this recipe, each field of our data type will be a Maybe value, so whenever a field cannot be parsed, it is simply represented as Nothing. The recipe demonstrates how to read CSV data that contains faulty and missing information.

Getting ready

Create an input CSV file to read in. The first column holds laptop brands, the second column holds their models, and the third column holds the base cost. Leave some fields blank to simulate incomplete input. We will name the file input.csv.

Also, we must install the csv library:

$ cabal install csv

How to do it...

Create a new file, which we will call Main.hs, and perform the following steps:

  1. Import the CSV library:
    import Text.CSV
  2. Create a data type corresponding to the CSV fields:
    data Laptop = Laptop { brand :: Maybe String
                         , model :: Maybe String
                         , cost :: Maybe Float 
                         } deriving Show
  3. Define and implement main to read the CSV input and parse the relevant information:
    main :: IO ()
    main = do
      let fileName = "input.csv"
      input <- readFile fileName
      -- parseCSV uses its first argument only as the source name in error messages.
      let csv = parseCSV fileName input
      let laptops = parseLaptops csv
      print laptops
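  4. Convert the parsed records into a list of Laptop values, skipping any record that does not have exactly three fields:
    -- A parse error yields no laptops. Otherwise keep only well-formed
    -- records (a trailing newline, for example, produces a one-field
    -- record), preserving their order in the file.
    parseLaptops :: Either a CSV -> [Laptop]
    parseLaptops (Left _) = []
    parseLaptops (Right csv) =
      map parseLaptop (filter (\record -> length record == 3) csv)
    
    -- Indexing with (!!) is safe here: parseLaptops only passes three-field records.
    parseLaptop :: Record -> Laptop
    parseLaptop record = Laptop { brand = getBrand $ record !! 0
                                , model = getModel $ record !! 1
                                , cost  = getCost  $ record !! 2 }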
  5. Parse each field, producing Nothing if the item is missing or unexpected:
    -- An empty text field is treated as missing.
    getBrand :: String -> Maybe String
    getBrand str = if null str then Nothing else Just str
    
    getModel :: String -> Maybe String
    getModel str = if null str then Nothing else Just str
    
    -- Accept the cost only if the entire field parses as a Float; an empty
    -- field, non-numeric text, or leftover characters such as " USD" give Nothing.
    getCost :: String -> Maybe Float
    getCost str = case reads str :: [(Float, String)] of
      [(c, "")] -> Just c
      _         -> Nothing
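As an alternative sketch for step 5, not the recipe's own code, base's Text.Read.readMaybe can replace the hand-written reads pattern, and a single helper (the name nonEmpty is our own) can serve both text fields. One behavioral difference to be aware of: readMaybe tolerates surrounding whitespace, so " 1200 " parses, whereas the reads-based getCost above rejects it because of the trailing space.

```haskell
import Text.Read (readMaybe)

-- Hypothetical shared helper for brand and model: empty means missing.
nonEmpty :: String -> Maybe String
nonEmpty str = if null str then Nothing else Just str

-- readMaybe returns Just only when the whole string (ignoring surrounding
-- whitespace) parses as a Float; anything else gives Nothing.
getCost :: String -> Maybe Float
getCost = readMaybe
```

Which version to use depends on how strict you want the parser to be about stray whitespace in the input file.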

How it works...

A Maybe value is in one of two states: Just something or Nothing. This gives us a lightweight way to represent an error state. Each field of the Laptop type lives in a Maybe context: if a field is missing or cannot be parsed, we simply regard it as Nothing and move on, rather than discarding the whole record.
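To illustrate two ways of consuming these Maybe fields, here is a small sketch. The helpers label and knownCosts are hypothetical, and Laptop is redefined so that the block stands alone. In the do block, Maybe's monad instance returns Nothing as soon as any field is Nothing, while mapMaybe from Data.Maybe skips missing values instead:

```haskell
import Data.Maybe (mapMaybe)

-- Same shape as the recipe's Laptop type.
data Laptop = Laptop { brand :: Maybe String
                     , model :: Maybe String
                     , cost  :: Maybe Float
                     } deriving Show

-- All-or-nothing: Nothing if either the brand or the model is missing.
label :: Laptop -> Maybe String
label laptop = do
  b <- brand laptop
  m <- model laptop
  return (b ++ " " ++ m)

-- Best effort: collect the costs that are present, ignore the rest.
knownCosts :: [Laptop] -> [Float]
knownCosts = mapMaybe cost
```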

There's more...

If a more descriptive error state is desired, the Either monad may be more useful. It also has two states, Left something and Right something, but unlike Nothing, the failure case carries a value. By convention, Left describes the error, whereas Right holds the desired result. We can use Left to distinguish different kinds of errors instead of lumping them all into one behemoth Nothing.
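Here is a sketch of the Either approach for a single CSV record. The FieldError type, its constructors, and the tuple result are our own illustrative choices, not part of the csv library. Each constructor names a distinct failure, and because Either's monad instance stops at the first Left, the result reports the first problem found in the record:

```haskell
-- Hypothetical error type: one constructor per kind of failure.
data FieldError
  = WrongFieldCount Int      -- record did not have exactly three fields
  | MissingField String      -- name of the empty field
  | BadNumber String String  -- field name and the text that failed to parse
  deriving (Show, Eq)

requireField :: String -> String -> Either FieldError String
requireField name str
  | null str  = Left (MissingField name)
  | otherwise = Right str

parseCostE :: String -> Either FieldError Float
parseCostE str
  | null str  = Left (MissingField "cost")
  | otherwise = case reads str :: [(Float, String)] of
      [(c, "")] -> Right c
      _         -> Left (BadNumber "cost" str)

-- The do block short-circuits on the first Left.
parseRecordE :: [String] -> Either FieldError (String, String, Float)
parseRecordE [b, m, c] = do
  b' <- requireField "brand" b
  m' <- requireField "model" m
  c' <- parseCostE c
  return (b', m', c')
parseRecordE fields = Left (WrongFieldCount (length fields))
```

For example, parseRecordE ["Acme", "", "1200"] gives Left (MissingField "model"), which says exactly what went wrong rather than just Nothing.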

See also

To review CSV data input, see the Keeping and representing data from a CSV file recipe in Chapter 1, The Hunt for Data.
