- Machine Learning Algorithms
- Giuseppe Bonaccorso
Statistical learning approaches
Imagine that you need to design a spam-filtering algorithm starting from this initial (over-simplistic) classification based on two parameters:

- p1: the number of blacklisted words contained in the message
- p2: the length of the message (in characters)
We have collected 200 email messages (X) (for simplicity, we consider p1 and p2 mutually exclusive) and we need to find a couple of probabilistic hypotheses (expressed in terms of p1 and p2) to determine:

P(Spam|hp1, hp2)
We also assume the conditional independence of both terms (it means that hp1 and hp2 contribute to spam jointly in the same way as if each of them were acting alone).
For example, we could think about rules (hypotheses) like the following: "If there are more than five blacklisted words, or if the message is shorter than 20 characters, then the probability of spam is high" (for example, greater than 50 percent). However, without assigning probabilities, it's difficult to generalize when the dataset changes (as in a real-world antispam filter). We also want to determine a partitioning threshold (such as green, yellow, and red signals) to help the user decide what to keep and what to trash.
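To make this concrete, here is a minimal Python sketch (the toy dataset, rule thresholds, and signal cut-offs are all assumed for illustration and are not taken from the book) that estimates the empirical spam probability associated with each rule and maps a message to a hypothetical green/yellow/red signal:

    # Minimal sketch (assumed toy data, not the book's dataset): estimate how often
    # each rule co-occurs with spam and map a message to a green/yellow/red signal.

    # Each message is (n_blacklisted_words, length_in_chars, is_spam)
    dataset = [
        (7, 15, True), (0, 120, False), (6, 30, True), (1, 10, True),
        (0, 200, False), (8, 12, True), (2, 80, False), (0, 18, False),
    ]

    def p_spam_given(rule):
        """Empirical P(Spam | rule holds), computed by simple counting."""
        selected = [m for m in dataset if rule(m)]
        if not selected:
            return 0.0
        return sum(m[2] for m in selected) / len(selected)

    hp1 = lambda m: m[0] > 5   # "more than five blacklisted words"
    hp2 = lambda m: m[1] < 20  # "message shorter than 20 characters"

    p1, p2 = p_spam_given(hp1), p_spam_given(hp2)

    def signal(message):
        """Hypothetical partitioning threshold: green / yellow / red."""
        score = max(p1 if hp1(message) else 0.0, p2 if hp2(message) else 0.0)
        if score > 0.8:
            return "red"
        if score > 0.5:
            return "yellow"
        return "green"

    print(p1, p2, signal((6, 14, None)))  # e.g. 1.0 0.75 red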
As the hypotheses are determined through the dataset X, we can also write (in a discrete form):

In this example, it's quite easy to determine the value of each term. However, in general, it's necessary to introduce the Bayes formula (which will be discussed in Chapter 6, Naive Bayes), here in its proportional form:

P(h|X) ∝ P(X|h) P(h)
The proportionality is necessary to avoid the introduction of the marginal probability P(X), which acts only as a normalization factor (remember that, for a discrete random variable, the probabilities of all possible outcomes must sum to 1).
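As a small numerical sketch (all values are assumed purely for illustration), the update can be carried out explicitly by multiplying each prior by its likelihood and then normalizing, which is exactly the role played by P(X):

    # Sketch of a discrete Bayesian update (all numbers assumed for illustration).
    # Two competing hypotheses about a message, with assumed priors and likelihoods.
    priors = {"spam": 0.4, "ham": 0.6}         # P(h): a priori probabilities
    likelihoods = {"spam": 0.75, "ham": 0.10}  # P(X|h): how well h explains the data X

    # Unnormalized posteriors: P(h|X) is proportional to P(X|h) * P(h)
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}

    # P(X) is just the normalization factor that makes the posteriors sum to 1
    p_x = sum(unnormalized.values())
    posteriors = {h: v / p_x for h, v in unnormalized.items()}

    print(posteriors)  # {'spam': 0.833..., 'ham': 0.166...}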
In the Bayes formula above, the first term is called the a posteriori (which comes after) probability, because it's determined by a marginal a priori (which comes first) probability multiplied by a factor called the likelihood. To understand the philosophy of such an approach, it's useful to consider a simple example: tossing a fair coin. Everybody knows that the marginal probability of each face is equal to 0.5, but who decided that? It's a theoretical consequence of logic and probability axioms (a good physicist would say that it's never exactly 0.5, because of several factors that we simply discard). After tossing the coin 100 times, we observe the outcomes and, surprisingly, we discover that the frequency of heads is slightly different from the expected value (for example, 0.46). How can we correct our estimation? The term called the likelihood measures how much our actual experiments confirm the a priori hypothesis and determines another probability (the a posteriori one) which reflects the actual situation. The likelihood, therefore, helps us correct our estimation dynamically, overcoming the problem of a fixed probability.
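A compact way to see this correction at work is a conjugate Beta-Binomial update for the coin (a sketch under an assumed prior strength; this specific prior is not part of the original example): the prior encodes the belief that the coin is fair, the 46 observed heads in 100 tosses enter through the likelihood, and the posterior estimate lands between the two:

    # Sketch: correcting the a priori estimate of a coin with observed data.
    # Prior belief: fair coin, encoded as a Beta(a, b) distribution centred on 0.5.
    # The prior strength (a + b) is an assumption chosen only for illustration.
    a_prior, b_prior = 50.0, 50.0     # a priori: P(heads) believed to be 0.5

    heads, tosses = 46, 100           # observed outcomes (the likelihood information)
    tails = tosses - heads

    # Conjugate update: the posterior is Beta(a + heads, b + tails)
    a_post, b_post = a_prior + heads, b_prior + tails

    prior_mean = a_prior / (a_prior + b_prior)    # 0.5  (a priori estimate)
    posterior_mean = a_post / (a_post + b_post)   # 0.48 (a posteriori estimate)
    print(prior_mean, posterior_mean)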
In Chapter 6, Naive Bayes, dedicated to naive Bayes algorithms, we're going to discuss these topics in depth and implement a few examples with scikit-learn; however, it's useful to introduce here two widely used statistical learning approaches. Refer to Russell S., Norvig P., Artificial Intelligence: A Modern Approach, Pearson, for further information.