
Statistical learning approaches

Imagine that you need to design a spam-filtering algorithm starting from this initial (over-simplistic) classification based on two parameters: p1, the number of blacklisted words contained in the message, and p2, the length of the message in characters (both are used in the example rules discussed below).

We have collected 200 email messages (X) (for simplicity, we consider p1 and p2 mutually exclusive), and we need to find a couple of probabilistic hypotheses (expressed in terms of p1 and p2) to determine the probability that a message is spam given both of them:

$$P(\text{Spam} \mid h_{p_1}, h_{p_2})$$

We also assume the conditional independence of both terms (it means that hp1 and hp2 contribute jointly to spam in the same way as each of them would if it acted alone).
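
In formulas, one standard way to express this assumption (a sketch anticipating the naive Bayes factorization of Chapter 6; writing Spam for the event that a message is spam) is that the two conditions are independent once the class is known:

$$P(h_{p_1}, h_{p_2} \mid \text{Spam}) = P(h_{p_1} \mid \text{Spam}) \, P(h_{p_2} \mid \text{Spam})$$

so each hypothesis multiplies into the final spam probability independently of the other.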

For example, we could think about rules (hypotheses) like: "If there are more than five blacklisted words" or "If the message is less than 20 characters in length" then "the probability of spam is high" (for example, greater than 50 percent). However, without assigning probabilities, it's difficult to generalize when the dataset changes (as in a real-world anti-spam filter). We also want to determine a partitioning threshold (such as green, yellow, and red signals) to help the user decide what to keep and what to trash.
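
As a minimal sketch of such a rule-plus-threshold scheme (the blacklist, the probability values, and the green/yellow/red cut-offs below are hypothetical choices, not taken from the original example):

```python
# Illustrative sketch only: the rule parameters (more than five blacklisted
# words, fewer than 20 characters) come from the example above, while the
# blacklist, probability values, and traffic-light thresholds are hypothetical.
BLACKLIST = {"winner", "free", "credit", "offer"}

def spam_probability(message: str) -> float:
    """Assign a rough spam probability using the two example rules."""
    words = [w.strip(".,!?") for w in message.lower().split()]
    blacklisted = sum(1 for w in words if w in BLACKLIST)
    p = 0.1                  # hypothetical base rate
    if blacklisted > 5:
        p = max(p, 0.8)      # hypothetical "high probability" value
    if len(message) < 20:
        p = max(p, 0.6)      # hypothetical value for very short messages
    return p

def traffic_light(p: float) -> str:
    """Partition the probability range into three user-facing signals."""
    if p < 0.3:
        return "green"       # keep
    if p < 0.7:
        return "yellow"      # let the user review it
    return "red"             # trash

print(traffic_light(spam_probability("Free credit!!!")))   # yellow
```

The point of this section is precisely that the fixed values hard-coded here should be replaced by quantities estimated from the dataset.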

As the hypotheses are determined through the dataset X, we can also condition on X explicitly and express the same probability in a discrete form, where every term is estimated as a relative frequency over the collected messages.
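
For instance, a minimal sketch of this frequency-based estimation could look as follows (the toy dataset is hypothetical and stands in for the 200 collected messages; each record flags whether a message satisfies p1, satisfies p2, and is spam):

```python
# Minimal sketch of the discrete (frequency-based) estimation with a
# hypothetical toy dataset. Each record is (satisfies_p1, satisfies_p2, is_spam).
X = [
    (True,  False, True),
    (False, True,  True),
    (True,  False, True),
    (False, False, False),
    (False, True,  False),
    (False, False, False),
]

n_spam = sum(1 for _, _, s in X if s)

# P(Spam): relative frequency of spam in the dataset
p_spam = n_spam / len(X)

# P(hp1 | Spam) and P(hp2 | Spam): how often each condition holds among spam messages
p_hp1_given_spam = sum(1 for a, _, s in X if s and a) / n_spam
p_hp2_given_spam = sum(1 for _, b, s in X if s and b) / n_spam

print(p_spam, p_hp1_given_spam, p_hp2_given_spam)
```

With real data, ratios of counts like these replace the hard-coded values of the previous snippet.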

In this example, it's quite easy to determine the value of each term. However, in general, it's necessary to introduce Bayes' formula (which will be discussed in Chapter 6, Naive Bayes):

$$P(h \mid X) \propto P(X \mid h) \, P(h)$$

The proportionality is necessary to avoid introducing the marginal probability P(X), which acts only as a normalization factor (remember that for a discrete random variable, the probabilities of all possible outcomes must sum to 1).
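
Written in full, with the marginal expanded over all the competing hypotheses h' (a standard expansion, shown here only to make the role of the normalization factor explicit), the relation reads:

$$P(h \mid X) = \frac{P(X \mid h)\,P(h)}{P(X)}, \qquad P(X) = \sum_{h'} P(X \mid h')\,P(h')$$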

In the previous equation, the first term is called the a posteriori (which comes after) probability, because it's determined by a marginal a priori (which comes first) probability multiplied by a factor called the likelihood. To understand the philosophy of such an approach, it's useful to consider a simple example: tossing a fair coin. Everybody knows that the marginal probability of each face is equal to 0.5, but who decided that? It's a theoretical consequence of logic and probability axioms (a good physicist would say that it's never exactly 0.5, because of several factors that we simply discard). After tossing the coin 100 times, we observe the outcomes and, surprisingly, we discover that the frequency of heads is slightly different from the expected value (for example, 0.46). How can we correct our estimation? The term called likelihood measures how much our actual experiments confirm the a priori hypothesis and determines another probability (a posteriori) which reflects the actual situation. The likelihood, therefore, helps us correct our estimation dynamically, overcoming the problem of a fixed probability.
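
As a small worked example of this correction (the Beta prior used here is an illustrative assumption, not part of the original coin example): if the a priori belief in fairness is encoded as a $\mathrm{Beta}(50, 50)$ distribution over the head probability $\theta$, and we then observe 46 heads and 54 tails in 100 tosses, the posterior is $\mathrm{Beta}(96, 104)$, whose mean is

$$\mathbb{E}[\theta \mid X] = \frac{50 + 46}{(50 + 46) + (50 + 54)} = \frac{96}{200} = 0.48$$

so the likelihood pulls the estimate from the theoretical 0.5 towards the observed 0.46, and the strength of the pull grows with the amount of evidence collected.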

In Chapter 6, Naive Bayes, dedicated to naive Bayes algorithms, we're going to discuss these topics in depth and implement a few examples with scikit-learn; however, it's useful to introduce here two statistical learning approaches that are very widely used. Refer to Russell S., Norvig P., Artificial Intelligence: A Modern Approach, Pearson, for further information.
