
Accumulating text data from a file path

One of the easiest ways to get started with processing input is to read raw text from a local file. In this recipe, we will extract all the text from a file at a specific path. Then, to do something interesting with the data, we will count the number of words on each line.

Tip

Haskell is a purely functional programming language, right? Sure, but obtaining input from outside the code introduces impurity. For elegance and reusability, we must carefully separate pure from impure code.

Getting ready

We will first create an input.txt text file with a few lines of text to be read by the program. We keep this file in an easy-to-access directory because it will be referenced later. For example, the text file we're dealing with contains a six-line quote by Plato. Here's what our terminal prints when we issue the following command:

$ cat input.txt

And how will you inquire, Socrates,
into that which you know not? 
What will you put forth as the subject of inquiry? 
And if you find what you want, 
how will you ever know that 
this is what you did not know?

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. The code will also be hosted on GitHub at https://github.com/BinRoot/Haskell-Data-Analysis-Cookbook.

How to do it...

Create a new file to start coding. We call our file Main.hs.

  1. As with all executable Haskell programs, start by defining and implementing the main function, as follows:
    main :: IO ()
    main = do
    
  2. Use Haskell's readFile :: FilePath -> IO String function to read the contents of the input.txt file. Note that FilePath is just a type synonym for String. With the string in memory, pass it into a countWords function to count the number of words in each line, as shown in the following code:
    input <- readFile "input.txt"
    print $ countWords input
    
  3. Lastly, define our pure function, countWords, as follows:
    countWords :: String -> [Int]
    countWords input = map (length . words) (lines input)
    
  4. The program will print out the number of words per line represented as a list of numbers as follows:
    $ runhaskell Main.hs
    
    [6,6,10,7,6,7]
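
For reference, the fragments from the steps above can be assembled into one file. This is only a sketch of how Main.hs might look once complete; the comments and the spacing around the composition operator are additions, and it assumes that input.txt from the Getting ready section sits in the directory the program is run from.

```haskell
-- Main.hs: the fragments from the steps above, assembled into one file.
module Main where

-- Pure: the number of whitespace-separated words on each line.
countWords :: String -> [Int]
countWords input = map (length . words) (lines input)

-- Impure: read the file, then hand its contents to the pure function.
-- Assumes input.txt exists in the current working directory.
main :: IO ()
main = do
  input <- readFile "input.txt"
  print $ countWords input
```

Because countWords is pure, it can also be tried on its own in GHCi, without any file, by passing it a string literal.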
    

How it works...

Haskell provides useful input and output (I/O) capabilities for reading input and writing output in different ways. In our case, we use readFile and give it the path of the file to be read. The do keyword in main indicates that we are sequencing several IO actions. The result of readFile has the type IO String, which means it is an I/O action that produces a String when it is run.

Now we're about to get a bit technical. Pay close attention. Alternatively, smile and nod. In Haskell, the IO type is an instance of a type class called Monad. This allows us to use the <- notation inside a do block to draw the string out of the I/O action and bind it to a name. We then make use of the string by feeding it into our countWords function, which counts the number of words in each line. Notice how we kept the pure countWords function separate from the impure main function.
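
To make the <- notation less mysterious, here is a small sketch that shows the same program written twice: once with a do block, as in the recipe, and once with the Monad bind operator >>= from the Prelude, which is what a do block is translated into. The names wordCountsDo and wordCountsBind are hypothetical and exist only for this illustration; the sketch again assumes that input.txt is present.

```haskell
countWords :: String -> [Int]
countWords input = map (length . words) (lines input)

-- do-notation: <- binds the String produced by readFile to `input`.
-- (wordCountsDo is a hypothetical name used only in this sketch.)
wordCountsDo :: FilePath -> IO ()
wordCountsDo path = do
  input <- readFile path
  print (countWords input)

-- The same action written with >>=, which passes the result of the
-- action on its left to the function on its right.
wordCountsBind :: FilePath -> IO ()
wordCountsBind path = readFile path >>= \input -> print (countWords input)

-- Assumes input.txt exists in the current working directory.
main :: IO ()
main = do
  wordCountsDo "input.txt"
  wordCountsBind "input.txt"
```

Both versions perform the same I/O and print the same list; the do block is simply the more readable form once several actions are chained.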

Finally, we print the output of countWords. The $ operator is function application with very low precedence, so everything to its right is evaluated first and then passed to the function on its left. We use it to avoid excessive parentheses in our code. Without it, the last line of main would look like print (countWords input).
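
As a quick illustration of the $ operator, the following sketch prints the same result in three equivalent ways. The sample string is made up for this example, so no file is needed to run it.

```haskell
countWords :: String -> [Int]
countWords input = map (length . words) (lines input)

main :: IO ()
main = do
  let input = "one two\nthree"  -- made-up sample text, not the recipe's file
  print $ countWords input      -- with $, as in the recipe
  print (countWords input)      -- with explicit parentheses
  (print . countWords) input    -- with function composition
```

Each of the three lines prints [2,1]. Which form to use is a matter of style; $ tends to pay off when several applications are nested.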

See also

For simplicity's sake, this code is easy to read but very fragile. If input.txt does not exist, the program crashes as soon as it tries to open the file. For example, running it without the file produces the following error message:

$ runhaskell Main.hs

Main.hs: input.txt: openFile: does not exist…

To make this code fault-tolerant, refer to the Catching I/O code faults recipe.
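
As a taste of what a more robust version could look like, here is a minimal sketch that uses try from Control.Exception to turn a failed readFile into a value instead of a crash. This is one possible approach and is not necessarily the one taken in the referenced recipe; the wording of the error message is made up for illustration.

```haskell
import Control.Exception (IOException, try)

countWords :: String -> [Int]
countWords input = map (length . words) (lines input)

main :: IO ()
main = do
  -- try runs the action and returns Left on an IOException
  -- (for example, a missing file) instead of aborting the program.
  result <- try (readFile "input.txt") :: IO (Either IOException String)
  case result of
    Left err    -> putStrLn ("Could not read input.txt: " ++ show err)
    Right input -> print (countWords input)
```

Note that readFile reads lazily, so this sketch only guards against failures that occur while opening the file, such as the file not existing. Errors that surface later, while the contents are being consumed, would need separate handling.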
