官术网_书友最值得收藏!

Classifying text with language models

Text classification is an application of classification algorithms. However, the text is a combination of words in a specific order. Hence, you can observe that a text document with a class variable is not similar to the dataset that we presented in the table in the Classification algorithms section. 

A text dataset can be represented as shown in the following table:

Table 2: Example of a Twitter dataset

 

For this chapter, we have built a dataset based on tweets from two different accounts. We also have provided code in the following sections so that you can create your own datasets to try this example. Our purpose is to build a smart application that is capable of predicting the source of a tweet just by reading the tweet text. We will collect several tweets by the United States Republican Party (@GOP) and the Democratic Party (@TheDemocrats) to build a model that can predict which party wrote a given tweet. In order to do this, we will randomly select some tweets from each party and submit them through the model to check whether the prediction actually matched reality.

主站蜘蛛池模板: 安顺市| 微山县| 南江县| 江华| 玉树县| 平武县| 东莞市| 历史| 小金县| 沂水县| 松桃| 娄底市| 庄浪县| 凌云县| 方山县| 明水县| 长宁县| 津南区| 阳朔县| 英吉沙县| 榆林市| 五常市| 广丰县| 五莲县| 海丰县| 泾源县| 卢龙县| 皮山县| 拉孜县| 临猗县| 武陟县| 合肥市| 密山市| 家居| 孟村| 泽库县| 绵竹市| 和林格尔县| 新巴尔虎左旗| 乡宁县| 察隅县|