
Chapter 4. Topic Modeling

In the previous chapter, we clustered texts into groups. Clustering is a very useful tool, but it is not always appropriate, because it assigns each text to exactly one cluster. This book is about machine learning and Python. Should it be grouped with other Python-related works or with machine learning-related works? In the age of paper books, a bookstore had to make this decision when deciding where to shelve it. In the age of the Internet store, however, the answer is that this book is about both machine learning and Python, and it can be listed in both sections. We will not, however, list it in the food section.

In this chapter, we will learn methods that do not assign each object to a single cluster, but instead let it belong to several of a small number of groups called topics. We will also learn how to differentiate between topics that are central to the text and others that are only vaguely mentioned (this book mentions plotting every so often, but plotting is not a central topic in the way machine learning is). The subfield of machine learning that deals with these problems is called topic modeling.
