Dictionaries for text analysis

A common use of dictionaries is to count the occurrences of like items in a sequence; a typical example is counting the occurrences of words in a body of text. The following code creates a dictionary in which each word in the text is a key and the number of times it occurs is the value. It uses the very common idiom of nested loops: the outer loop traverses the lines in a file, and the inner loop traverses the words on each line:

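def wordcount(fname):
    # Open the file; report the failure and stop if it cannot be opened
    try:
        fhand = open(fname)
    except OSError:
        print('File cannot be opened')
        exit()

    count = dict()
    for line in fhand:              # outer loop: each line in the file
        words = line.split()        # split the line on whitespace
        for word in words:          # inner loop: each word on the line
            if word not in count:
                count[word] = 1     # first time we have seen this word
            else:
                count[word] += 1
    fhand.close()
    return count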
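
The same counting pattern can be tried without a file on disk. The following is a minimal sketch, not the code used in the rest of this section: count_words and sample are hypothetical names introduced here for illustration, and dict.get with a default of 0 stands in for the explicit if/else test.

```python
def count_words(lines):
    """Count word occurrences across an iterable of text lines."""
    count = {}
    for line in lines:                  # outer loop: each line
        for word in line.split():       # inner loop: each word on the line
            # get() returns 0 for a word we have not seen yet
            count[word] = count.get(word, 0) + 1
    return count

# Hypothetical sample text, made up for illustration
sample = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
counts = count_words(sample)
print(counts["the"])  # 4
print(counts["sat"])  # 2
```

Because a file object is also an iterable of lines, a function written this way would accept an open file as well as a list of strings. The file-based wordcount function above is the one the following text refers to.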

The wordcount function returns a dictionary with an element for each unique word in the text file. A common task is to filter items such as these into subsets we are interested in. To run the code, you will need a text file saved in the directory you run it from. Here we have used alice.txt, a short excerpt from Alice in Wonderland. To obtain the same results, you can download alice.txt from davejulian.net/bo5630, or use a text file of your own. In the following code, we create another dictionary, filtered, containing a subset of the items from count:

count = wordcount('alice.txt')
filtered = {key: value for key, value in count.items() if 15 < value < 20}

Printing the filtered dictionary shows only the words that occur more than 15 and fewer than 20 times.

Note the dictionary comprehension used to construct the filtered dictionary. Dictionary comprehensions work in the same way as the list comprehensions we looked at in Chapter 1, Python Objects, Types, and Expressions, except that they produce key: value pairs inside braces rather than single values inside brackets.
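
To make the comparison concrete, here is a small sketch that applies the same filter condition with a dictionary comprehension and with a list comprehension. The word counts are hypothetical values made up for illustration; they are not taken from alice.txt, and frequent_words is a name introduced only for this example.

```python
# Hypothetical word counts, made up for illustration (not from alice.txt)
count = {"the": 30, "she": 18, "was": 16, "a": 20, "it": 15, "of": 19}

# Dictionary comprehension: key: value pairs inside braces
filtered = {key: value for key, value in count.items() if 15 < value < 20}

# List comprehension with the same condition: single values inside brackets
frequent_words = [key for key, value in count.items() if 15 < value < 20]

print(filtered)        # {'she': 18, 'was': 16, 'of': 19}
print(frequent_words)  # ['she', 'was', 'of']
```

Both forms share the same for and if clauses; only the expression at the front and the enclosing brackets differ.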
