How it works...

Leveraging the dataset we built in the Scraping GitHub for files of a specific type recipe, we place files in separate directories based on their file type, and then specify the paths in preparation for building our classifier (step 1). The code for this recipe assumes that the "JavascriptSamples" directory, and the others, contain the samples and have no subdirectories. We read all files into a corpus and record their labels (step 2). We train-test split the data and prepare a pipeline that performs basic NLP on the files, followed by a random forest classifier (step 3). The choice of classifier here is for illustrative purposes, rather than to imply a best choice of classifier for this type of data. Finally, we perform the basic but important steps in creating a machine learning classifier: fitting the pipeline to the training data, and then assessing its performance on the testing set by measuring its accuracy and confusion matrix (step 4).
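The steps above can be sketched as follows. This is a minimal illustration, not the recipe's exact code: the tiny in-memory corpus stands in for the scraped file contents, and the choice of TfidfVectorizer as the "basic NLP" stage is an assumption; in practice, the corpus would be read from the sample directories, with each file's label taken from its directory name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Tiny synthetic corpus standing in for the scraped code samples (step 2).
# In the recipe, these strings come from reading every file in each
# file-type directory, and the labels from the directory names.
corpus = [
    "function foo() { var x = 1; return x; }",  # JavaScript-like sample
    "console.log('hello'); let y = 2;",         # JavaScript-like sample
    "def foo():\n    return 1",                 # Python-like sample
    "import os\nprint(os.getcwd())",            # Python-like sample
] * 10
labels = ["js", "js", "py", "py"] * 10

# Train-test split (step 3).
X_train, X_test, y_train, y_test = train_test_split(
    corpus, labels, test_size=0.25, random_state=42
)

# Pipeline: basic NLP featurization followed by a random forest (step 3).
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(token_pattern=r"\S+")),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
])

# Fit on the training data, then assess on the testing set (step 4).
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

Because the featurizer and classifier live in one Pipeline, the same object can later be pickled and applied to new files in a single `predict` call, which is the practical payoff of structuring the recipe this way.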