Splitting a string on lines, words, or arbitrary tokens

Useful data is often interspersed between delimiters, such as commas or spaces, making string splitting vital for most data analysis tasks.

Getting ready

Create an input.txt file similar to the following one:

$ cat input.txt

first line
second line
words are split by space
comma,separated,values
or any delimiter you want

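Install the split package using Cabal as follows (with cabal-install 3.0 or later, installing a library typically requires the --lib flag, that is, cabal install --lib split):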

$ cabal install split

How to do it...

  1. The only function we need beyond the Prelude's lines and words is splitOn, which is imported as follows:
    import Data.List.Split (splitOn)
  2. First, we split the string into lines, as shown in the following code snippet:
    main = do
      input <- readFile "input.txt"
      let ls = lines input
      print ls
  3. The lines are printed as a single list (wrapped here for readability) as follows:
    [ "first line","second line"
    , "words are split by space"
    , "comma,separated,values"
    , "or any delimiter you want"]
    
  4. Next, we separate a string on spaces as follows:
      let ws = words $ ls !! 2
      print ws
  5. The words are printed in a list as follows:
    ["words","are","split","by","space"]
    
  6. Next, we show how to split a string on an arbitrary value using the following lines of code:
      let cs = splitOn "," $ ls !! 3
      print cs
  7. The values are split on the commas as follows:
    ["comma","separated","values"]
    
  8. Finally, we split on a multi-character delimiter, as shown in the following code snippet:
      let ds = splitOn "an" $ ls !! 4
      print ds
  9. The output is as follows:
    ["or ","y delimiter you w","t"]
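
The steps above can be collected into one program. The following is a minimal sketch, not the recipe's original listing: it inlines the contents of input.txt as a string (sampleInput is a name introduced here for illustration) so that it runs without the file, and it assumes the split package is installed. The last two lines show edge cases that are easy to trip over: words collapses runs of whitespace, whereas splitOn keeps the empty string between two adjacent delimiters.

```haskell
import Data.List.Split (splitOn) -- provided by the split package

-- Contents of input.txt from "Getting ready", inlined so the sketch is
-- self-contained. sampleInput is a hypothetical name, not part of the recipe.
sampleInput :: String
sampleInput = unlines
  [ "first line"
  , "second line"
  , "words are split by space"
  , "comma,separated,values"
  , "or any delimiter you want"
  ]

main :: IO ()
main = do
  let ls = lines sampleInput        -- split on newlines
  print ls
  print (words (ls !! 2))           -- split on whitespace
  print (splitOn "," (ls !! 3))     -- split on a single-character delimiter
  print (splitOn "an" (ls !! 4))    -- split on a multi-character delimiter
  -- Edge cases: words drops empty fields, splitOn keeps them.
  print (words "a  b")              -- ["a","b"]
  print (splitOn "," "a,,b")        -- ["a","","b"]
```

To read from the file instead, replace the let binding with input <- readFile "input.txt" followed by let ls = lines input, as in step 2. Note that ls !! n fails at run time if the input has fewer than n + 1 lines, so real code may want to check the length first.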
    