- Go Machine Learning Projects
- Xuanyi Chew
Classification - Spam Email Detection
What makes you you? I have dark hair, pale skin, and Asiatic features. I wear glasses. My facial structure is vaguely round, with extra subcutaneous fat in my cheeks compared to my peers. What I have done is describe the features of my face. Each of these features can be thought of as a point within a probability continuum. What is the probability of having dark hair? Among my friends, dark hair is a very common feature, and so are glasses (a remarkable statistic: of the 300 or so people I polled on my Facebook page, 281 require prescription glasses). The epicanthic folds of my eyes are probably less common, as is the extra subcutaneous fat in my cheeks.
Why am I bringing up my facial features in a chapter about spam classification? It's because the principles are the same. If I show you a photo of a human face, what is the probability that the photo is of me? We can say that the probability that the photo is of my face is a combination of the probability of having dark hair, the probability of having pale skin, the probability of having an epicanthic fold, and so on, and so forth. From a Naive point of view, we can think of each of these features as independently contributing to the probability that the photo is me: the fact that I have an epicanthic fold in my eyes is independent of the fact that my skin is of a yellow pallor. But, of course, with recent advancements in genetics, this has been shown to be patently untrue. These features are, in real life, correlated with one another. We will explore this in a future chapter.
Despite the real-life dependence between these probabilities, we can still take the Naive position and treat them as independent contributions to the probability that the photo is of my face.
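To make the Naive independence assumption concrete, here is a minimal sketch in Go (the language used throughout this book). Under the assumption, the score for "this photo is me" is simply a prior multiplied by each feature's likelihood. The feature probabilities below are invented purely for illustration, and the function name is my own, not from the book's codebase.

```go
package main

import "fmt"

// naiveScore combines per-feature likelihoods under the Naive assumption:
// each feature contributes independently, so the combined score is simply
// the prior multiplied by every individual likelihood.
func naiveScore(prior float64, likelihoods []float64) float64 {
	score := prior
	for _, p := range likelihoods {
		score *= p
	}
	return score
}

func main() {
	// Illustrative numbers only: P(dark hair), P(glasses), P(epicanthic fold),
	// first conditioned on "the photo is me", then on "the photo is not me".
	me := naiveScore(0.5, []float64{0.99, 0.95, 0.90})
	notMe := naiveScore(0.5, []float64{0.60, 0.80, 0.30})
	fmt.Printf("score(me) = %.4f, score(not me) = %.4f\n", me, notMe)
}
```

Whichever hypothesis ends up with the larger score is the one we pick. The same multiply-the-likelihoods idea is what the spam classifier in this chapter rests on, with words standing in for facial features.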
In this chapter, we will build an email spam classification system using a Naive Bayes algorithm, an approach that can be used well beyond email spam classification. Along the way, we will explore the very basics of natural language processing, and how probability is inherently tied to the very language we use. A probabilistic understanding of language will be built up from the ground with the introduction of term frequency-inverse document frequency (TF-IDF), which will then be translated into Bayesian probabilities, which are in turn used to classify the emails.
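As a preview of the TF-IDF idea, the following is a rough, self-contained sketch in Go rather than the chapter's actual code. The corpus, the function and variable names, and the smoothed IDF variant (adding 1 to the document frequency) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tfidf computes term frequency-inverse document frequency scores for one
// document against a small corpus. TF is the count of a term in the
// document; IDF down-weights terms that appear in many documents.
func tfidf(doc string, corpus []string) map[string]float64 {
	tf := map[string]float64{}
	for _, w := range strings.Fields(strings.ToLower(doc)) {
		tf[w]++
	}

	// Document frequency: how many documents contain each term at least once.
	df := map[string]float64{}
	for _, d := range corpus {
		seen := map[string]bool{}
		for _, w := range strings.Fields(strings.ToLower(d)) {
			if !seen[w] {
				seen[w] = true
				df[w]++
			}
		}
	}

	scores := map[string]float64{}
	n := float64(len(corpus))
	for w, f := range tf {
		// Add 1 to the document frequency so the logarithm never divides by zero.
		scores[w] = f * math.Log(n/(1+df[w]))
	}
	return scores
}

func main() {
	corpus := []string{
		"win a free prize now",
		"meeting agenda for tomorrow",
		"free lottery win claim now",
	}
	fmt.Println(tfidf("win a free prize now", corpus))
}
```

Terms such as prize that appear in only one document keep a positive weight, while words like free or now that appear in most of the corpus are pushed toward zero. Weights of this kind are what will later be folded into the Bayesian probabilities that separate spam from ham.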