
Splitting a string on lines, words, or arbitrary tokens

Useful data is often interspersed between delimiters, such as commas or spaces, making string splitting vital for most data analysis tasks.

Getting ready

Create an input.txt file similar to the following one:

$ cat input.txt

first line
second line
words are split by space
comma,separated,values
or any delimiter you want

Install the split package using Cabal as follows:

$ cabal install split

How to do it...

  1. The only function we need beyond the Prelude is splitOn, which is imported as follows:
    import Data.List.Split (splitOn)
  2. First, we read the file and split its contents into lines, as shown in the following code snippet:
    main = do
      input <- readFile "input.txt"
      let ls = lines input
      print ls
  3. The lines are printed as a list (wrapped here for readability) as follows:
    [ "first line","second line"
    , "words are split by space"
    , "comma,separated,values"
    , "or any delimiter you want"]
    
  4. Next, we split the third line into words on whitespace as follows:
      let ws = words $ ls !! 2
      print ws
  5. The words are printed in a list as follows:
    ["words","are","split","by","space"]
    
  6. Next, we show how to split a string on an arbitrary value using the following lines of code:
      let cs = splitOn "," $ ls !! 3
      print cs
  7. The values are split on the commas as follows:
    ["comma","separated","values"]
    
  8. Finally, we split on a delimiter that is more than one character long, as shown in the following code snippet:
      let ds = splitOn "an" $ ls !! 4
      print ds
  9. The output is as follows:
    ["or ","y delimiter you w","t"]
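
The steps above can be assembled into one program. The following is a minimal sketch rather than the recipe's exact code: to stay runnable without input.txt or the split package, it inlines the sample text as sampleInput and uses a hand-written splitOnSketch in place of Data.List.Split.splitOn. Both names are hypothetical, and splitOnSketch assumes a non-empty delimiter.

```haskell
import Data.List (stripPrefix)

-- Hypothetical stand-in for input.txt, so no file is needed.
sampleInput :: String
sampleInput = unlines
  [ "first line"
  , "second line"
  , "words are split by space"
  , "comma,separated,values"
  , "or any delimiter you want"
  ]

-- Hypothetical helper, not the split package's implementation.
-- Splits a list at every occurrence of a non-empty delimiter.
splitOnSketch :: Eq a => [a] -> [a] -> [[a]]
splitOnSketch delim = go []
  where
    -- acc collects the current chunk in reverse order.
    go acc [] = [reverse acc]
    go acc rest@(x:xs) =
      case stripPrefix delim rest of
        Just after -> reverse acc : go [] after  -- delimiter found, close the chunk
        Nothing    -> go (x : acc) xs            -- ordinary element, keep collecting

main :: IO ()
main = do
  let ls = lines sampleInput
  print (words (ls !! 2))               -- ["words","are","split","by","space"]
  print (splitOnSketch "," (ls !! 3))   -- ["comma","separated","values"]
  print (splitOnSketch "an" (ls !! 4))  -- ["or ","y delimiter you w","t"]
```

With the split package installed, replacing splitOnSketch with splitOn and sampleInput with the result of readFile "input.txt" should give the same three lists.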
    