An example project – word frequency

Many of the concepts and techniques covered so far in this book come together in this small project. Its aim is to read a text file, remove all characters that are not used in words, and count how often each word occurs in the remaining text. This can be useful, for example, when measuring the word density on a web page, the frequency of DNA sequences, or the number of hits on a website that came from various IP addresses. It takes only about ten lines of code. For example, when words1.txt contains the sentence "to be, or not to be, that is the question!", the program produces this output:

Word : frequency 
    
be : 2
is : 1
not : 1
or : 1
question : 1
that : 1
the : 1
to : 2

Here is the code with comments:

# code in chapter 5\word_frequency.jl:
# 1- read in text file:
str = read("words1.txt", String)
# 2- replace non alphabet characters from text with a space:
nonalpha = r"(\W\s?)" # define a regular expression
str = replace(str, nonalpha => ' ')
digits = r"(\d+)"
str = replace(str, digits => ' ')
# 3- split text in words:
word_list = split(str, ' ')
# 4- make a dictionary with the words and count their frequencies:
word_freq = Dict{String, Int64}()
for word in word_list
    word = strip(word)
    if isempty(word) continue end
    haskey(word_freq, word) ?
        word_freq[word] += 1 :
        word_freq[word] = 1
end
# 5- sort the words (the keys) and print out the frequencies:
println("Word : frequency \n")
words = sort!(collect(keys(word_freq)))
for word in words
    println("$word : $(word_freq[word])")
end
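
To see what steps 2 and 3 do, the following sketch applies the same regular expression and the same split call to the example sentence directly, rather than reading words1.txt. The names sentence, cleaned and pieces are illustrative only and do not appear in word_frequency.jl.

```julia
# The example sentence, used here in place of the contents of words1.txt
sentence = "to be, or not to be, that is the question!"

# \W matches one non-word character (a space, a comma or '!' here);
# \s? also consumes one whitespace character directly after it, if there is one,
# so ", " is replaced by a single space
cleaned = replace(sentence, r"(\W\s?)" => ' ')
println(repr(cleaned))   # prints "to be or not to be that is the question "

# Splitting on a space leaves an empty string after the trailing space
pieces = split(cleaned, ' ')
println(length(pieces))      # prints 11
println(repr(last(pieces)))  # prints ""
```

The empty last element is the reason the counting loop skips words for which isempty returns true.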

The strip() function removes whitespace from the beginning and the end of a string.
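
As a small illustration of strip and its one-sided relatives lstrip and rstrip, consider the sketch below. The sample string padded and the dictionary d are made up for this example. The last lines show that strip returns a SubString, which is converted to a String when it is stored as a key in a Dict{String, Int64}.

```julia
padded = "  question \n"

trimmed = strip(padded)          # trims both ends
println(repr(trimmed))           # prints "question"
println(repr(lstrip(padded)))    # prints "question \n"
println(repr(rstrip(padded)))    # prints "  question"

# strip does not copy the text; it returns a view into the original string
println(trimmed isa SubString)   # prints true

# Storing the view as a key in a Dict with String keys converts it to a String
d = Dict{String, Int64}()
d[trimmed] = 1
println(first(keys(d)) isa String)  # prints true
```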

The isempty() function is quite general and can be used on any collection, not only on strings.
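
A few calls, chosen here purely for illustration, show isempty on strings, arrays, dictionaries, sets and ranges:

```julia
println(isempty(""))                      # prints true
println(isempty(strip("   ")))            # prints true: nothing is left after stripping
println(isempty("to"))                    # prints false
println(isempty(Int64[]))                 # prints true
println(isempty(Dict{String, Int64}()))   # prints true
println(isempty(Set(["to", "be"])))       # prints false
println(isempty(1:0))                     # prints true: an empty range
```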

Try the code out with the example text files words1.txt or words2.txt. See the output in results_words1.txt and results_words2.txt.
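
To experiment without a text file, the same steps can be wrapped in a function that takes a string. The sketch below is one possible variant and is not the book's script: the function name count_words is hypothetical, and it uses get with a default of 0 in place of the haskey test.

```julia
# Hypothetical helper, not part of word_frequency.jl:
# applies the same cleaning steps to a string and returns the word counts.
function count_words(text::AbstractString)
    text = replace(text, r"(\W\s?)" => ' ')   # non-word characters become spaces
    text = replace(text, r"(\d+)" => ' ')     # runs of digits become spaces
    word_freq = Dict{String, Int64}()
    for word in split(text, ' ')
        word = strip(word)
        isempty(word) && continue
        # get returns 0 when the word is not yet a key in the dictionary
        word_freq[word] = get(word_freq, word, 0) + 1
    end
    return word_freq
end

freq = count_words("to be, or not to be, that is the question!")
for word in sort!(collect(keys(freq)))
    println("$word : $(freq[word])")
end
```

For the example sentence this prints the same eight word counts as shown at the start of this section.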
