
Statistical learning approaches

Imagine that you need to design a spam-filtering algorithm starting from this initial (over-simplistic) classification based on two parameters: p1, the number of blacklisted words a message contains, and p2, the length of the message in characters.

We have collected 200 email messages (X) (for simplicity, we consider p1 and p2 mutually exclusive) and we need to find a couple of probabilistic hypotheses (expressed in terms of p1 and p2) to determine the probability of a message being spam given those hypotheses (and, complementarily, the probability of it being a regular message).

We also assume the conditional independence of both terms (meaning that hp1 and hp2 contribute jointly to spam in the same way as they would if each acted alone).

For example, we could think about rules (hypotheses) like: "If there are more than five blacklisted words" or "If the message is shorter than 20 characters", then "the probability of spam is high" (for example, greater than 50 percent). However, without assigning probabilities, it's difficult to generalize when the dataset changes (as in a real-world antispam filter). We also want to determine a partitioning threshold (such as green, yellow, and red signals) to help the user decide what to keep and what to trash, as sketched in the example below.
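The following minimal Python sketch (not the book's implementation) shows what such fixed-threshold rules look like; the blacklist, the per-rule contributions to the spam probability, and the green/yellow/red cut-offs are arbitrary assumptions used only for illustration.

# Hard-threshold rules on the two parameters (blacklisted-word count and
# message length), with a green/yellow/red partition of the spam score.
BLACKLIST = {"free", "winner", "credit", "offer", "prize"}  # hypothetical list

def spam_signal(message: str) -> str:
    blacklisted = sum(1 for word in message.lower().split() if word in BLACKLIST)
    p_spam = 0.0
    if blacklisted > 5:       # hypothesis hp1: more than five blacklisted words
        p_spam += 0.5
    if len(message) < 20:     # hypothesis hp2: shorter than 20 characters
        p_spam += 0.5
    if p_spam > 0.5:
        return "red"          # very likely spam
    if p_spam > 0.0:
        return "yellow"       # suspicious, let the user decide
    return "green"            # keep

print(spam_signal("free credit offer"))  # short message -> 'yellow'

The hard-coded 0.5 contributions make the weakness of the approach evident: as soon as the dataset changes, the thresholds have to be tuned again by hand.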

As the hypotheses are determined through the dataset X, we can also write (in a discrete form):
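The original discrete expression is not reproduced here; a plausible reading (an assumption, since all quantities refer to the 200 collected messages) is that each conditional probability is estimated as a relative frequency, for example:

P(Spam | hp_i, X) ≈ N_Spam(hp_i) / N(hp_i)

where N(hp_i) counts the messages in X that satisfy hypothesis hp_i and N_Spam(hp_i) counts those among them that are spam.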

In this example, it's quite easy to determine the value of each term. However, in general, it's necessary to introduce the Bayes formula (which will be discussed in Chapter 6, Naive Bayes):
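The formula itself is not reproduced here; in the proportional form that the next paragraphs discuss, it can be stated as:

P(h | X) ∝ P(X | h) P(h)

where P(h) is the a priori probability of the hypothesis, P(X | h) is the likelihood, and the omitted normalization factor is the marginal probability P(X), obtained by summing P(X | h) P(h) over all hypotheses.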

The proportionality is necessary to avoid the introduction of the marginal probability P(X), which acts only as a normalization factor (remember that, for a discrete random variable, the probabilities of all possible outcomes must sum to 1).

In the previous equation, the first term is called the a posteriori (which comes after) probability, because it's determined by a marginal a priori (which comes first) probability multiplied by a factor called the likelihood. To understand the philosophy of such an approach, it's useful to take a simple example: tossing a fair coin. Everybody knows that the marginal probability of each face is equal to 0.5, but who decided that? It's a theoretical consequence of logic and probability axioms (a good physicist would say that it's never exactly 0.5 because of several factors that we simply discard). After tossing the coin 100 times, we observe the outcomes and, surprisingly, we discover that the proportion of heads is slightly different (for example, 0.46). How can we correct our estimation? The term called likelihood measures how much our actual experiments confirm the a priori hypothesis and determines another probability (a posteriori) which reflects the actual situation. The likelihood, therefore, helps us correct our estimation dynamically, overcoming the problem of a fixed probability.
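To see the correction at work, here is a minimal Python sketch (not from the book): we place a uniform a priori distribution over a few hypothetical candidate values of the head probability and let the binomial likelihood of 46 heads in 100 tosses reweight them into an a posteriori distribution. The candidate values are arbitrary assumptions chosen only for illustration.

from math import comb

# Hypothetical candidate head probabilities and a uniform a priori belief
candidates = [0.40, 0.46, 0.50, 0.54, 0.60]
prior = [1.0 / len(candidates)] * len(candidates)

heads, tosses = 46, 100

# Binomial likelihood P(X | p) of observing 46 heads in 100 tosses for each p
likelihood = [comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)
              for p in candidates]

# A posteriori is proportional to likelihood times a priori,
# normalized by the marginal probability P(X)
unnormalized = [l * pr for l, pr in zip(likelihood, prior)]
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]

for p, post in zip(candidates, posterior):
    print(f"P(head probability = {p:.2f} | data) = {post:.3f}")

The a priori belief gives the same weight to 0.50 and 0.46, but after the experiment the likelihood shifts most of the probability mass toward 0.46 and 0.50, which is exactly the dynamic correction described above.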

In Chapter 6, Naive Bayes, dedicated to naive Bayes algorithms, we're going to discuss these topics in depth and implement a few examples with scikit-learn; however, it's useful to introduce here two statistical learning approaches that are very widely used. Refer to Russell S., Norvig P., Artificial Intelligence: A Modern Approach, Pearson, for further information.
