- Haskell Data Analysis Cookbook
- Nishant Shukla
- 2021-12-08 12:43:35
Coping with unexpected or missing input
Data sources often contain incomplete and unexpected data. One common approach to parsing such data in Haskell is using the Maybe data type.
Imagine designing a function to find the nth element in a list of characters. A naïve implementation may have the type Int -> [Char] -> Char. However, if the function is asked for an index that is out of bounds, it should indicate that an error has occurred.
A common way to deal with these errors is to wrap the output Char in a Maybe context. The type Int -> [Char] -> Maybe Char allows for better error handling. The constructors for Maybe are Just a and Nothing, as you can see by running GHCi and testing the following commands:
$ ghci
Prelude> :type Just 'c'
Just 'c' :: Maybe Char
Prelude> :type Nothing
Nothing :: Maybe a
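The nth-element function described above could be sketched as follows. This is only an illustration: the name nthChar and the choice of 0-based indexing are assumptions, not something the recipe defines.

```haskell
-- Hypothetical helper (name and 0-based indexing are assumptions).
-- Returns Nothing instead of crashing when the index is out of bounds.
nthChar :: Int -> [Char] -> Maybe Char
nthChar _ [] = Nothing            -- ran off the end of the list
nthChar n (c:cs)
  | n < 0     = Nothing           -- negative indices are never valid
  | n == 0    = Just c            -- found the requested element
  | otherwise = nthChar (n - 1) cs
```

With this definition, nthChar 1 "abc" evaluates to Just 'b', while nthChar 5 "abc" evaluates to Nothing, so the caller has to handle the missing case explicitly.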
In this recipe, we will give each field of a parsed CSV record a Maybe type so that whenever a field cannot be parsed, it is simply represented as Nothing. The recipe demonstrates how to read CSV data with faulty and missing information.
Getting ready
We create an input CSV file to read in. The first column will be for laptop brands, the next column will be for their models, and the third column will be for the base cost. We should leave some fields blank to simulate incomplete input. We name the file input.csv.

Also, we must install the csv library:
$ cabal install csv
How to do it...
Create a new file, which we will call Main.hs, and perform the following steps:
- Import the CSV library:
import Text.CSV
- Create a data type corresponding to the CSV fields:
data Laptop = Laptop
  { brand :: Maybe String
  , model :: Maybe String
  , cost  :: Maybe Float
  } deriving Show
- Define and implement main to read the CSV input and parse relevant info:
main :: IO ()
main = do
  let fileName = "input.csv"
  input <- readFile fileName
  let csv = parseCSV fileName input
  let laptops = parseLaptops csv
  print laptops
- From a list of records, create a list of laptop data types:
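parseLaptops :: Either e CSV -> [Laptop]
parseLaptops (Left _) = []
-- Keep only records with exactly three fields. Consing onto the
-- accumulator in a left fold yields the laptops in reverse file order.
parseLaptops (Right csv) =
  foldl (\a record -> if length record == 3 then parseLaptop record : a else a) [] csv

parseLaptop :: Record -> Laptop
parseLaptop record = Laptop
  { brand = getBrand $ record !! 0
  , model = getModel $ record !! 1
  , cost  = getCost $ record !! 2
  }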
- Parse each field, producing Nothing if there is an unexpected or missing item:
getBrand :: String -> Maybe String
getBrand str = if null str then Nothing else Just str

getModel :: String -> Maybe String
getModel str = if null str then Nothing else Just str

getCost :: String -> Maybe Float
getCost str = case reads str :: [(Float, String)] of
  [(cost, "")] -> Just cost
  _            -> Nothing
How it works...
A value in the Maybe monad is in one of two states: Just something or Nothing. This provides a useful abstraction for representing an error state. Each field of the Laptop data type exists in a Maybe context. If a field is missing or cannot be parsed, we simply regard it as Nothing and move on.
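Once the fields are in a Maybe context, downstream code has to decide what to do with the Nothing cases. The following sketch shows two common options using standard functions from Data.Maybe: skipping missing values and substituting defaults. The function names knownCosts and describe, and the placeholder strings, are assumptions made for illustration.

```haskell
import Data.Maybe (fromMaybe, mapMaybe)

data Laptop = Laptop
  { brand :: Maybe String
  , model :: Maybe String
  , cost  :: Maybe Float
  } deriving Show

-- Hypothetical helper: collect only the costs that were parsed,
-- silently skipping laptops whose cost is Nothing.
knownCosts :: [Laptop] -> [Float]
knownCosts = mapMaybe cost

-- Hypothetical helper: render a laptop, substituting placeholder
-- text for any field that is Nothing.
describe :: Laptop -> String
describe l =
  fromMaybe "unknown brand" (brand l)
    ++ " / "
    ++ fromMaybe "unknown model" (model l)
    ++ ": "
    ++ maybe "no price" show (cost l)
```

For example, describe (Laptop (Just "BrandA") Nothing (Just 1000)) yields "BrandA / unknown model: 1000.0".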
There's more...
If a more descriptive error state is desired, the Either monad may be more useful. It also has two states, but both carry a value: Left something or Right something. By convention, the Left state describes the error, whereas the Right state holds the desired result. We can use the Left state to distinguish different kinds of errors instead of collapsing them all into a single Nothing.
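As a rough sketch of this idea, the cost parser could report why a field was rejected. The FieldError type, its constructors, and the name getCostE are all hypothetical and are not part of the recipe; the parsing itself reuses the same reads-based check as getCost.

```haskell
-- Hypothetical error type (an assumption for illustration).
data FieldError
  = MissingField String        -- name of the field that was empty
  | NotANumber String String   -- field name and the offending text
  deriving (Show, Eq)

-- Like getCost, but the Left state says what went wrong.
getCostE :: String -> Either FieldError Float
getCostE str
  | null str  = Left (MissingField "cost")
  | otherwise = case reads str :: [(Float, String)] of
      [(c, "")] -> Right c                       -- whole string parsed
      _         -> Left (NotANumber "cost" str)  -- leftover or no parse
```

With this definition, getCostE "" returns Left (MissingField "cost"), and getCostE "abc" returns Left (NotANumber "cost" "abc"), so a caller can tell a blank field apart from a malformed one.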
See also
To review CSV data input, see the Keeping and representing data from a CSV file recipe in Chapter 1, The Hunt for Data.