
Chapter 4. Topic Modeling

In the previous chapter, we clustered texts into groups. This is a very useful tool, but it is not always appropriate. Clustering results in each text belonging to exactly one cluster. This book is about machine learning and Python. Should it be grouped with other Python-related works or with machine learning-related works? In the age of paper books, a bookstore would have had to make this decision when deciding where to shelve it. In the age of online stores, however, the answer is that this book is about both machine learning and Python, and it can be listed in both sections. We will not, however, list it in the food section.

In this chapter, we will learn methods that, rather than putting each text into exactly one cluster, assign it to a small number of groups called topics. We will also learn how to distinguish between topics that are central to a text and others that are only vaguely mentioned (this book mentions plotting every so often, but plotting is not a central topic the way machine learning is). The subfield of machine learning that deals with these problems is called topic modeling.
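To make the contrast between a cluster and a set of topics concrete, here is a minimal sketch in plain Python. It assumes a document has already been described by a weight per topic. The topic weights below are hypothetical numbers chosen only for illustration (a real topic model would estimate them from the text), and the cut-off values of 0.05 and 0.25 are arbitrary assumptions, not values from any particular method.

```python
# Hypothetical topic weights for one document (e.g. this book).
# These numbers are made up for illustration; a topic model would estimate them.
doc_topics = {
    "machine learning": 0.55,
    "python": 0.35,
    "plotting": 0.08,
    "food": 0.02,
}


def hard_cluster(weights):
    """Clustering view: the document belongs to exactly one group (the strongest)."""
    return max(weights, key=weights.get)


def topics_above(weights, threshold):
    """Topic view: every topic whose weight reaches the threshold, strongest first."""
    selected = [topic for topic, w in weights.items() if w >= threshold]
    return sorted(selected, key=lambda topic: -weights[topic])


# Assumed cut-offs: 0.05 for "mentioned at all", 0.25 for "central to the text".
print(hard_cluster(doc_topics))              # machine learning
print(topics_above(doc_topics, 0.05))        # ['machine learning', 'python', 'plotting']
print(topics_above(doc_topics, 0.25))        # ['machine learning', 'python']
```

The clustering view forces a single answer, while the topic view lists the book under both machine learning and Python, keeps plotting as a minor topic, and leaves out food entirely.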
