官术网_书友最值得收藏!

Accumulating text data from a file path

One of the easiest ways to get started with processing input is by reading raw text from a local file. In this recipe, we will be extracting all the text from a specific file path. Furthermore, to do something interesting with the data, we will count the number of words per line.

Tip

Haskell is a purely functional programming language, right? Sure, but obtaining input from outside the code introduces impurity. For elegance and reusability, we must carefully separate pure from impure code.

Getting ready

We will first create an input.txt text file with a couple of lines of text to be read by the program. We keep this file in an easy-to-access directory because it will be referenced later. For example, the text file we're dealing with contains a seven-line quote by Plato. Here's what our terminal prints when we issue the following command:

$ cat input.txt

And how will you inquire, Socrates,
into that which you know not? 
What will you put forth as the subject of inquiry? 
And if you find what you want, 
how will you ever know that 
this is what you did not know?

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. The code will also be hosted on GitHub at https://github.com/BinRoot/Haskell-Data-Analysis-Cookbook.

How to do it...

Create a new file to start coding. We call our file Main.hs.

  1. As with all executable Haskell programs, start by defining and implementing the main function, as follows:
    main :: IO ()
    main = do
    
  2. Use Haskell's readFile :: FilePath -> IO String function to extract data from an input.txt file path. Note that a file path is just a synonym for String. With the string in memory, pass it into a countWords function to count the number of words in each line, as shown in the following steps:
    input <- readFile "input.txt"
    print $ countWords input
    
  3. Lastly, define our pure function, countWords, as follows:
    countWords :: String -> [Int]
    countWords input = map (length.words) (lines input)
    
  4. The program will print out the number of words per line represented as a list of numbers as follows:
    $ runhaskell Main.hs
    
    [6,6,10,7,6,7]
    

How it works...

Haskell provides useful input and output (I/O) capabilities for reading input and writing output in different ways. In our case, we use readFile to specify a path of a file to be read. Using the do keyword in main suggests that we are joining several IO actions together. The output of readFile is an I/O string, which means it is an I/O action that returns a String type.

Now we're about to get a bit technical. Pay close attention. Alternatively, smile and nod. In Haskell, the I/O data type is an instance of something called a Monad. This allows us to use the <- notation to draw the string out of this I/O action. We then make use of the string by feeding it into our countWords function that counts the number of words in each line. Notice how we separated the countWords function apart from the impure main function.

Finally, we print the output of countWords. The $ notation means we are using a function application to avoid excessive parenthesis in our code. Without it, the last line of main would look like print (countWords input).

See also

For simplicity's sake, this code is easy to read but very fragile. If an input.txt file does not exist, then running the code will immediately crash the program. For example, the following command will generate the error message:

$ runhaskell Main.hs

Main.hs: input.txt: openFile: does not exist…

To make this code fault tolerant, refer to the Catching I/O code faults recipe.

主站蜘蛛池模板: 潜山县| 大余县| 满洲里市| 固原市| 滨州市| 永春县| 砀山县| 焉耆| 托克托县| 溧水县| 台东市| 贵德县| 沭阳县| 陕西省| 巴林左旗| 罗源县| 隆林| 福建省| 同德县| 浑源县| 湘潭市| 辰溪县| 会东县| 柳林县| 德清县| 泰兴市| 沁阳市| 潼南县| 泸定县| 文登市| 唐山市| 皮山县| 罗平县| 潼关县| 百色市| 洪江市| 北宁市| 清河县| 宜君县| 璧山县| 鹤山市|