官术网_书友最值得收藏!

Ranking to find the best rules

Now that we can compute the support and confidence of all rules, we want to be able to find the best rules. To do this, we perform a ranking and print the ones with the highest values. We can do this for both the support and confidence values.

To find the rules with the highest support, we first sort the support dictionary. Dictionaries do not support ordering by default; the items() function gives us a list containing the data in the dictionary. We can sort this list using the itemgetter class as our key, which allows for the sorting of nested lists such as this one. Using itemgetter(1) allows us to sort based on the values. Setting reverse=True gives us the highest values first:

from operator import itemgetter 
sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)

We can then print out the top five rules:

sorted_confidence = sorted(confidence.items(), key=itemgetter(1),
reverse=True)
for index in range(5):
print("Rule #{0}".format(index + 1))
premise, conclusion = sorted_confidence[index][0]
print_rule(premise, conclusion, support, confidence, features)

The result will look like the following:

Rule #1 
Rule: If a person buys bananas they will also buy milk
- Support: 27
- Confidence: 0.474
Rule #2
Rule: If a person buys milk they will also buy bananas
- Support: 27
- Confidence: 0.519
Rule #3
Rule: If a person buys bananas they will also buy apples
- Support: 27
- Confidence: 0.474
Rule #4
Rule: If a person buys apples they will also buy bananas
- Support: 27
- Confidence: 0.628
Rule #5
Rule: If a person buys apples they will also buy cheese
- Support: 22
- Confidence: 0.512

Similarly, we can print the top rules based on confidence. First, compute the sorted confidence list and then print them out using the same method as before.

sorted_confidence = sorted(confidence.items(), key=itemgetter(1),
reverse=True)
for index in range(5):
print("Rule #{0}".format(index + 1))
premise, conclusion = sorted_confidence[index][0]
print_rule(premise, conclusion, support, confidence, features)

Two rules are near the top of both lists. The first is If a person buys apples, they will also buy cheese, and the second is If a person buys cheese, they will also buy bananas. A store manager can use rules like these to organize their store. For example, if apples are on sale this week, put a display of cheeses nearby. Similarly, it would make little sense to put both bananas on sale at the same time as cheese, as nearly 66 percent of people buying cheese will probably buy bananas -our sale won't increase banana purchases all that much.

Jupyter Notebook will display graphs inline, right in the notebook. Sometimes, however, this is not always configured by default. To configure Jupyter Notebook to display graphs inline, use the following line of code: %matplotlib inline

We can visualize the results using a library called matplotlib.

We are going to start with a simple line plot showing the confidence values of the rules, in order of confidence. matplotlib makes this easy - we just pass in the numbers, and it will draw up a simple but effective plot:

from matplotlib import pyplot as plt 
plt.plot([confidence[rule[0]] for rule in sorted_confidence])

Using the previous graph, we can see that the first five rules have decent confidence, but the efficacy drops quite quickly after that. Using this information, we might decide to use just the first five rules to drive business decisions. Ultimately with exploration techniques like this, the result is up to the user.

Data mining has great exploratory power in examples like this. A person can use data mining techniques to explore relationships within their datasets to find new insights. In the next section, we will use data mining for a different purpose: prediction and classification.

主站蜘蛛池模板: 双流县| 甘洛县| 麻江县| 宝清县| 巴林右旗| 龙江县| 永胜县| 长沙市| 敦化市| 贺兰县| 凤山县| 天祝| 广德县| 龙井市| 达州市| 宿松县| 浦江县| 读书| 新蔡县| 威宁| 大悟县| 吉木乃县| 阿城市| 德化县| 南平市| 义马市| 乐山市| 吉首市| 若羌县| 南投市| 巨鹿县| 潞城市| 宜兴市| 明水县| 灌云县| 且末县| 澄江县| 读书| 怀柔区| 鄯善县| 德钦县|