
Accumulating text data from a file path

One of the easiest ways to get started with processing input is by reading raw text from a local file. In this recipe, we will extract all the text from a specific file path. Then, to do something interesting with the data, we will count the number of words per line.

Tip

Haskell is a purely functional programming language, right? Sure, but obtaining input from outside the code introduces impurity. For elegance and reusability, we must carefully separate pure from impure code.

Getting ready

We will first create an input.txt text file with a couple of lines of text for the program to read. Keep this file in the directory from which you will run the program, because the code refers to it by a relative path. For example, the text file we're dealing with contains a six-line quote by Plato. Here's what our terminal prints when we issue the following command:

$ cat input.txt

And how will you inquire, Socrates,
into that which you know not? 
What will you put forth as the subject of inquiry? 
And if you find what you want, 
how will you ever know that 
this is what you did not know?

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. The code will also be hosted on GitHub at https://github.com/BinRoot/Haskell-Data-Analysis-Cookbook.

How to do it...

Create a new file to start coding. We call our file Main.hs.

  1. As with all executable Haskell programs, start by defining and implementing the main function, as follows:
    main :: IO ()
    main = do
    
  2. Use Haskell's readFile :: FilePath -> IO String function to extract data from the input.txt file path. Note that FilePath is just a type synonym for String. With the string in memory, pass it into a countWords function to count the number of words in each line, as follows:
      input <- readFile "input.txt"
      print $ countWords input
    
  3. Lastly, define our pure function, countWords, as follows:
    countWords :: String -> [Int]
    countWords input = map (length . words) (lines input)
    
  4. Run the program. It prints the number of words per line as a list of numbers:
    $ runhaskell Main.hs
    
    [6,6,10,7,6,7]
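
Assembled into a single file, the steps above give a program along the following lines. This is a sketch that assumes input.txt sits in the directory you run runhaskell from; the comments are additions, while the code itself follows the recipe.

```haskell
-- Main.hs: the recipe's steps assembled into one file.
-- Assumes input.txt is in the current working directory.

main :: IO ()
main = do
  -- Impure part: read the whole file into a String.
  input <- readFile "input.txt"
  -- Hand the String to the pure function and print the result.
  print $ countWords input

-- Pure part: split the text into lines, then count the
-- whitespace-separated words on each line.
countWords :: String -> [Int]
countWords input = map (length . words) (lines input)
```

Because countWords is pure, it can be checked on any String without touching the file system, for example countWords "a b\nc" evaluates to [2,1].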
    

How it works...

Haskell provides useful input and output (I/O) capabilities for reading input and writing output in different ways. In our case, we use readFile to specify the path of a file to be read. Using the do keyword in main indicates that we are sequencing several IO actions together. The result of readFile has type IO String, which means it is an I/O action that, when run, produces a String.
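
To make the sequencing concrete, here is a small sketch of a do block whose actions run top to bottom: a write, then a read of the same file, then a print. The file name scratch.txt and the helper lineCount are hypothetical, used only for this illustration.

```haskell
-- Hypothetical pure helper: how many lines a String contains.
lineCount :: String -> Int
lineCount = length . lines

main :: IO ()
main = do
  -- IO (): writes a small hypothetical scratch file, returns nothing useful.
  writeFile "scratch.txt" "one two\nthree\n"
  -- IO String: runs after the write and yields the file's text.
  contents <- readFile "scratch.txt"
  -- IO (): prints 2, the number of lines read back.
  print (lineCount contents)
```

Each line of the do block is an IO action, and the order in which they appear is the order in which they run.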

Now we're about to get a bit technical. Pay close attention. Alternatively, smile and nod. In Haskell, the IO type is an instance of the Monad type class. This allows us to use the <- notation inside a do block to draw the String out of this I/O action. We then use the string by feeding it into our countWords function, which counts the number of words in each line. Notice how we kept the pure countWords function separate from the impure main function.
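
The <- notation is shorthand for the bind operator (>>=) provided by the Monad class. The sketch below writes the recipe's main twice, once with do-notation and once with (>>=) spelled out. The helper countWordsIn is hypothetical, added only to show that the same bind pattern works in any monad, such as Maybe, not just IO.

```haskell
-- countWords, as defined in the recipe.
countWords :: String -> [Int]
countWords input = map (length . words) (lines input)

-- The recipe's main, with do-notation.
mainDo :: IO ()
mainDo = do
  input <- readFile "input.txt"
  print $ countWords input

-- The same program with (>>=) written out: the String produced by
-- readFile is passed to the lambda as `input`.
mainBind :: IO ()
mainBind = readFile "input.txt" >>= \input -> print (countWords input)

-- Hypothetical helper: (>>=) comes from the Monad class, so this
-- works for IO, Maybe, or any other monad.
countWordsIn :: Monad m => m String -> m [Int]
countWordsIn action = action >>= \input -> return (countWords input)

main :: IO ()
main = mainBind
```

With Maybe, countWordsIn (Just "a b\nc") gives Just [2,1], while countWordsIn Nothing short-circuits to Nothing.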

Finally, we print the output of countWords. The $ operator is function application with the lowest precedence, so it lets us avoid excessive parentheses in our code. Without it, the last line of main would look like print (countWords input).
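
The following sketch shows three equivalent spellings of the last line of main, plus a small arithmetic case where the low precedence of $ changes the result. The names withParens, withDollar, withCompose, and lowPrecedence are hypothetical, chosen for this illustration.

```haskell
countWords :: String -> [Int]
countWords input = map (length . words) (lines input)

-- Three equivalent ways to write the recipe's last line.
withParens, withDollar, withCompose :: String -> IO ()
withParens input  = print (countWords input)    -- explicit parentheses
withDollar input  = print $ countWords input    -- ($) applies print to everything on its right
withCompose input = (print . countWords) input  -- function composition

-- ($) binds more loosely than (+), so this is negate (2 + 3), i.e. -5.
-- Without ($), negate 2 + 3 would be (negate 2) + 3, i.e. 1.
lowPrecedence :: Int
lowPrecedence = negate $ 2 + 3

main :: IO ()
main = do
  let sample = "one two three\nfour"
  withParens sample   -- prints [3,1]
  withDollar sample   -- prints [3,1]
  withCompose sample  -- prints [3,1]
```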

See also

For simplicity's sake, this code is easy to read but very fragile. If input.txt does not exist, running the code immediately crashes the program. For example, the following command generates this error message:

$ runhaskell Main.hs

Main.hs: input.txt: openFile: does not exist…

To make this code fault tolerant, refer to the Catching I/O code faults recipe.
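
As a rough idea of what fault tolerance can look like, the sketch below wraps readFile in try from Control.Exception so that a missing file produces a message instead of a crash. It is a minimal example under stated assumptions, not the approach of the Catching I/O code faults recipe, and the report helper is hypothetical. It catches the missing-file error because readFile opens the file as soon as the action runs; errors raised later, while the contents are lazily read, are not covered.

```haskell
import Control.Exception (IOException, try)

countWords :: String -> [Int]
countWords input = map (length . words) (lines input)

-- Hypothetical helper: turn either outcome into a line of output.
-- Its type also fixes the exception type that try should catch.
report :: Either IOException String -> String
report (Left err)    = "Could not read input.txt: " ++ show err
report (Right input) = show (countWords input)

main :: IO ()
main = do
  -- try returns Left instead of letting the IOException escape main.
  result <- try (readFile "input.txt")
  putStrLn (report result)
```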
