官术网_书友最值得收藏!

Ranking to find the best rules

Now that we can compute the support and confidence of all rules, we want to be able to find the best rules. To do this, we perform a ranking and print the ones with the highest values. We can do this for both the support and confidence values.

To find the rules with the highest support, we first sort the support dictionary. Dictionaries do not support ordering by default; the items() function gives us a list containing the data in the dictionary. We can sort this list using the itemgetter class as our key, which allows for the sorting of nested lists such as this one. Using itemgetter(1) allows us to sort based on the values. Setting reverse=True gives us the highest values first:

from operator import itemgetter 
sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)

We can then print out the top five rules:

sorted_confidence = sorted(confidence.items(), key=itemgetter(1),
reverse=True)
for index in range(5):
print("Rule #{0}".format(index + 1))
premise, conclusion = sorted_confidence[index][0]
print_rule(premise, conclusion, support, confidence, features)

The result will look like the following:

Rule #1 
Rule: If a person buys bananas they will also buy milk
- Support: 27
- Confidence: 0.474
Rule #2
Rule: If a person buys milk they will also buy bananas
- Support: 27
- Confidence: 0.519
Rule #3
Rule: If a person buys bananas they will also buy apples
- Support: 27
- Confidence: 0.474
Rule #4
Rule: If a person buys apples they will also buy bananas
- Support: 27
- Confidence: 0.628
Rule #5
Rule: If a person buys apples they will also buy cheese
- Support: 22
- Confidence: 0.512

Similarly, we can print the top rules based on confidence. First, compute the sorted confidence list and then print them out using the same method as before.

sorted_confidence = sorted(confidence.items(), key=itemgetter(1),
reverse=True)
for index in range(5):
print("Rule #{0}".format(index + 1))
premise, conclusion = sorted_confidence[index][0]
print_rule(premise, conclusion, support, confidence, features)

Two rules are near the top of both lists. The first is If a person buys apples, they will also buy cheese, and the second is If a person buys cheese, they will also buy bananas. A store manager can use rules like these to organize their store. For example, if apples are on sale this week, put a display of cheeses nearby. Similarly, it would make little sense to put both bananas on sale at the same time as cheese, as nearly 66 percent of people buying cheese will probably buy bananas -our sale won't increase banana purchases all that much.

Jupyter Notebook will display graphs inline, right in the notebook. Sometimes, however, this is not always configured by default. To configure Jupyter Notebook to display graphs inline, use the following line of code: %matplotlib inline

We can visualize the results using a library called matplotlib.

We are going to start with a simple line plot showing the confidence values of the rules, in order of confidence. matplotlib makes this easy - we just pass in the numbers, and it will draw up a simple but effective plot:

from matplotlib import pyplot as plt 
plt.plot([confidence[rule[0]] for rule in sorted_confidence])

Using the previous graph, we can see that the first five rules have decent confidence, but the efficacy drops quite quickly after that. Using this information, we might decide to use just the first five rules to drive business decisions. Ultimately with exploration techniques like this, the result is up to the user.

Data mining has great exploratory power in examples like this. A person can use data mining techniques to explore relationships within their datasets to find new insights. In the next section, we will use data mining for a different purpose: prediction and classification.

主站蜘蛛池模板: 商丘市| 桐梓县| 荆州市| 合川市| 视频| 隆德县| 新宾| 建水县| 揭阳市| 辉南县| 治多县| 化州市| 静安区| 新建县| 柳江县| 三亚市| 博罗县| 山西省| 达孜县| 开封市| 涟水县| 合江县| 天祝| 罗平县| 永清县| 临清市| 东源县| 清远市| 龙里县| 图片| 文水县| 伊春市| 乌鲁木齐市| 乐陵市| 郎溪县| 安阳市| 安多县| 三台县| 肥西县| 蚌埠市| 辽阳市|