
How it works...

Leveraging the dataset we built up in the Scraping GitHub for files of a specific type recipe, we place the files in different directories based on their file type, and then specify the paths in preparation for building our classifier (step 1). The code for this recipe assumes that the "JavascriptSamples" directory, and the others, contain the samples directly and have no subdirectories. We read all the files into a corpus and record their labels (step 2). We train-test split the data and prepare a pipeline that performs basic NLP on the files, followed by a random forest classifier (step 3). The choice of classifier here is for illustrative purposes, rather than to imply a best choice of classifier for this type of data. Finally, we perform the basic but important steps of creating a machine learning classifier: fitting the pipeline to the training data, and then assessing its performance on the testing set by measuring its accuracy and confusion matrix (step 4).
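Steps 2 through 4 can be sketched as follows. This is a minimal, self-contained illustration: a tiny inlined corpus stands in for the JavaScript and Python files read from disk, and the particular vectorizer choices and parameters (a hashing vectorizer followed by TF-IDF weighting) are assumptions for the sketch, not necessarily the recipe's exact settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Tiny synthetic corpus standing in for the file contents read in step 2.
corpus = [
    "function foo() { var x = 1; return x; }",
    "def foo():\n    x = 1\n    return x",
    "console.log('hello');",
    "print('hello')",
    "var y = document.getElementById('a');",
    "import os\nos.getcwd()",
] * 5
labels = ["javascript", "python"] * 3 * 5

# Step 3: train-test split, then a pipeline of basic NLP feature
# extraction followed by a random forest classifier.
X_train, X_test, y_train, y_test = train_test_split(
    corpus, labels, test_size=0.33, random_state=11
)
pipeline = Pipeline(
    [
        # alternate_sign=False keeps the hashed counts non-negative,
        # which is the usual pairing with a TfidfTransformer.
        ("vect", HashingVectorizer(n_features=2**12, alternate_sign=False)),
        ("tfidf", TfidfTransformer()),
        ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ]
)

# Step 4: fit on the training data, then evaluate on the testing set.
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(acc)
print(cm)
```

On real data, the confusion matrix shows per-class behavior that a single accuracy number hides, e.g. whether JavaScript samples are being mistaken for Python more often than the reverse.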
