
Dictionaries for text analysis

A common use of dictionaries is to count the occurrences of like items in a sequence; a typical example is counting the occurrences of words in a body of text. The following code creates a dictionary where each word in the text is used as a key and the number of occurrences as its value. This uses a very common idiom of nested loops: the outer loop traverses the lines in a file, and the inner loop traverses the words in each line, updating the dictionary as it goes:

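import sys

def wordcount(fname):
    # Try to open the file; report the problem and stop if that fails
    try:
        fhand = open(fname)
    except OSError:
        print('File cannot be opened')
        sys.exit()

    count = dict()
    for line in fhand:
        # Split the line on whitespace to get its words
        words = line.split()
        for word in words:
            if word not in count:
                count[word] = 1
            else:
                count[word] += 1
    fhand.close()
    return count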
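
The membership test in the inner loop can also be folded into a single call to dict.get, which returns a default value when the key is missing. The following is a minimal sketch rather than the chapter's own code: the helper name count_words and the sample lines are hypothetical, and it takes a list of strings instead of a file name so that it runs without a text file.

```python
def count_words(lines):
    """Count whitespace-separated words across an iterable of strings."""
    count = {}
    for line in lines:
        for word in line.split():
            # get() returns 0 for an unseen word, so no membership test is needed
            count[word] = count.get(word, 0) + 1
    return count


# Hypothetical sample text, used only for illustration
sample = ["the cat sat", "the cat ran"]
result = count_words(sample)
print(result)
```

Because an open file is itself an iterable of lines, the same function would also accept a file handle in place of the list.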

The wordcount function returns a dictionary with an entry for each unique word in the text file. A common task is to filter items such as these into the subsets we are interested in. To run the code, you will need a text file saved in the directory from which you run it. Here we have used alice.txt, a short excerpt from Alice in Wonderland. To obtain the same results, you can download alice.txt from davejulian.net/bo5630, or use a text file of your own. In the following code, we create another dictionary, filtered, containing a subset of the items in count:

count = wordcount('alice.txt')
filtered = {key: value for key, value in count.items() if 15 < value < 20}

When we print the filtered dictionary, we see only the words that occur between 16 and 19 times in the text.

Note the dictionary comprehension used to construct the filtered dictionary. Dictionary comprehensions work in the same way as the list comprehensions we looked at in Chapter 1, Python Objects, Types, and Expressions, except that they are written in braces and produce key: value pairs rather than single elements.
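
To make the parallel concrete, here is a small sketch that applies the same loop and the same test first as a list comprehension and then as a dictionary comprehension. The word counts are invented for illustration and stand in for the result of wordcount; they are not taken from alice.txt.

```python
# Hypothetical word counts standing in for the result of wordcount()
count = {"alice": 17, "the": 120, "rabbit": 16, "queen": 3, "said": 19}

# List comprehension: builds a list of the words that pass the test
words = [key for key, value in count.items() if 15 < value < 20]

# Dictionary comprehension: same loop and test, but emits key: value pairs
filtered = {key: value for key, value in count.items() if 15 < value < 20}

print(words)
print(filtered)
```

The only differences are the enclosing brackets and the expression before the for clause: a single value for the list, a key: value pair for the dictionary.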
