
How it works...

Leveraging the dataset we built up in the Scraping GitHub for files of a specific type recipe, we place the files in different directories based on their file type, and then specify the paths in preparation for building our classifier (step 1). The code for this recipe assumes that the "JavascriptSamples" directory and the others contain the samples directly, with no subdirectories. We read all of the files into a corpus and record their labels (step 2). We train-test split the data and prepare a pipeline that performs basic NLP on the files, followed by a random forest classifier (step 3). The choice of classifier here is for illustrative purposes, rather than to imply a best choice of classifier for this type of data. Finally, we perform the basic but important steps in creating a machine learning classifier: fitting the pipeline to the training data and then assessing its performance on the testing set by measuring its accuracy and confusion matrix (step 4). A minimal sketch of these steps appears below.
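The following sketch ties the four steps together. It assumes the samples sit directly inside per-language directories; apart from "JavascriptSamples", the directory names, label values, and featurization parameters are illustrative placeholders rather than the recipe's exact code:

```python
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Step 1: directories of scraped samples; "PythonSamples" and
# "PowerShellSamples" are hypothetical names standing in for the
# other file types collected in the scraping recipe.
directories_with_labels = [
    ("JavascriptSamples", 0),
    ("PythonSamples", 1),
    ("PowerShellSamples", 2),
]

# Step 2: read every file into a corpus and record its label.
corpus, labels = [], []
for directory, label in directories_with_labels:
    for path in Path(directory).iterdir():
        corpus.append(path.read_text(errors="ignore"))
        labels.append(label)

# Step 3: train-test split, then a pipeline of basic NLP
# featurization followed by a random forest classifier.
X_train, X_test, y_train, y_test = train_test_split(
    corpus, labels, test_size=0.3, random_state=11
)
pipeline = Pipeline([
    ("vectorizer", HashingVectorizer(ngram_range=(1, 2))),
    ("tfidf", TfidfTransformer()),
    ("classifier", RandomForestClassifier(random_state=11)),
])

# Step 4: fit on the training data, then assess accuracy and the
# confusion matrix on the testing set.
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

A hashing vectorizer followed by TF-IDF weighting is one reasonable choice for featurizing raw source files, since it avoids holding a full vocabulary in memory; any standard text featurizer could be swapped in at that stage of the pipeline.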
