
Ranking to find the best rules

Now that we can compute the support and confidence of all rules, we want to be able to find the best rules. To do this, we perform a ranking and print the ones with the highest values. We can do this for both the support and confidence values.

To find the rules with the highest support, we first sort the support dictionary. Dictionaries cannot be sorted by value directly; the items() method gives us the (key, value) pairs stored in the dictionary, and we can sort those. We use itemgetter as our key, which allows for the sorting of nested sequences such as this one. Using itemgetter(1) sorts on the second element of each pair, which is the value. Setting reverse=True gives us the highest values first:

from operator import itemgetter 
sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)
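To see what this sort does on a small scale, here is a minimal sketch using a made-up dictionary. The name toy_support and its values are hypothetical and are not taken from the dataset used in this chapter; the keys stand in for (premise, conclusion) pairs.

```python
from operator import itemgetter

# Hypothetical toy data: keys are (premise, conclusion) pairs, values are counts.
toy_support = {(0, 1): 14, (1, 0): 14, (3, 4): 27, (2, 3): 9}

# items() yields (key, value) pairs; itemgetter(1) selects the value as the sort key.
toy_sorted = sorted(toy_support.items(), key=itemgetter(1), reverse=True)

# Each entry is a ((premise, conclusion), value) pair, largest value first.
for rule, value in toy_sorted:
    print(rule, value)
```

Because each entry is a (key, value) pair, indexing an entry with [0] recovers the rule itself, which is why the printing code below uses sorted_support[index][0].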

We can then print out the top five rules:

for index in range(5):
    print("Rule #{0}".format(index + 1))
    # Each entry is ((premise, conclusion), value); [0] recovers the rule
    premise, conclusion = sorted_support[index][0]
    print_rule(premise, conclusion, support, confidence, features)

The result will look like the following:

Rule #1 
Rule: If a person buys bananas they will also buy milk
- Support: 27
- Confidence: 0.474
Rule #2
Rule: If a person buys milk they will also buy bananas
- Support: 27
- Confidence: 0.519
Rule #3
Rule: If a person buys bananas they will also buy apples
- Support: 27
- Confidence: 0.474
Rule #4
Rule: If a person buys apples they will also buy bananas
- Support: 27
- Confidence: 0.628
Rule #5
Rule: If a person buys apples they will also buy cheese
- Support: 22
- Confidence: 0.512

Similarly, we can print the top rules based on confidence. First, compute the sorted confidence list, and then print the rules using the same method as before:

sorted_confidence = sorted(confidence.items(), key=itemgetter(1),
                           reverse=True)
for index in range(5):
    print("Rule #{0}".format(index + 1))
    premise, conclusion = sorted_confidence[index][0]
    print_rule(premise, conclusion, support, confidence, features)

Two rules are near the top of both lists. The first is "If a person buys apples, they will also buy cheese", and the second is "If a person buys cheese, they will also buy bananas". A store manager can use rules like these to organize their store. For example, if apples are on sale this week, put a display of cheeses nearby. Similarly, it would make little sense to put bananas on sale at the same time as cheese: nearly 66 percent of people buying cheese will probably buy bananas, so our sale won't increase banana purchases all that much.
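Spotting rules that rank highly on both measures can also be done programmatically. The following is a minimal sketch, not part of the original code: the helper rules_in_both_top_n is a hypothetical name, and the toy dictionaries use made-up values and item names as keys purely for illustration.

```python
from operator import itemgetter

def rules_in_both_top_n(support, confidence, n=5):
    """Return the rules found in the top n by support and the top n by confidence."""
    by_support = sorted(support.items(), key=itemgetter(1), reverse=True)[:n]
    by_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)[:n]
    top_confidence_rules = {rule for rule, _ in by_confidence}
    # Keep the support ordering for the rules that appear in both lists.
    return [rule for rule, _ in by_support if rule in top_confidence_rules]

# Hypothetical toy data, keyed by (premise, conclusion).
toy_support = {("apples", "cheese"): 22, ("cheese", "bananas"): 27,
               ("bread", "milk"): 14, ("milk", "bread"): 14}
toy_confidence = {("apples", "cheese"): 0.61, ("cheese", "bananas"): 0.66,
                  ("bread", "milk"): 0.46, ("milk", "bread"): 0.30}

common = rules_in_both_top_n(toy_support, toy_confidence, n=2)
print(common)
```

With the real support and confidence dictionaries, the same call would list the rules shared by both top-five rankings.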

Jupyter Notebook can display graphs inline, right in the notebook. However, this is not always configured by default. To configure Jupyter Notebook to display graphs inline, use the following line of code: %matplotlib inline

We can visualize the results using a library called matplotlib.

We are going to start with a simple line plot showing the confidence values of the rules, in order of confidence. matplotlib makes this easy: we just pass in the numbers, and it draws a simple but effective plot:

from matplotlib import pyplot as plt 
plt.plot([confidence[rule[0]] for rule in sorted_confidence])

From the resulting graph, we can see that the first five rules have decent confidence, but the efficacy drops quite quickly after that. Using this information, we might decide to use just the first five rules to drive business decisions. Ultimately, with exploration techniques like this, the result is up to the user.
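Instead of fixing the number of rules at five, one alternative is to keep every rule whose confidence meets a chosen cutoff. The following is a minimal sketch under that assumption; the helper rules_above, the threshold of 0.5, and the toy values are all hypothetical and only illustrate the idea.

```python
from operator import itemgetter

def rules_above(confidence, threshold):
    """Return rules with confidence >= threshold, highest confidence first."""
    ranked = sorted(confidence.items(), key=itemgetter(1), reverse=True)
    return [rule for rule, value in ranked if value >= threshold]

# Hypothetical toy data: (premise, conclusion) feature indices mapped to confidence.
toy_confidence = {(0, 1): 0.9, (1, 2): 0.6, (2, 0): 0.2}

# The cutoff is a judgment call for the analyst, not a value from the text.
selected = rules_above(toy_confidence, threshold=0.5)
print(selected)
```

The choice between a fixed count and a cutoff remains up to the user, as with the rest of this exploration.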

Data mining has great exploratory power in examples like this. A person can use data mining techniques to explore relationships within their datasets to find new insights. In the next section, we will use data mining for a different purpose: prediction and classification.
