Setting parameters in Random Forests

The Random Forest implementation in scikit-learn is called RandomForestClassifier, and it has a number of parameters. As Random Forests use many instances of DecisionTreeClassifier, they share many of the same parameters such as the criterion (Gini Impurity or Entropy/information gain), max_features, and min_samples_split.
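As a minimal sketch of this overlap, the snippet below passes the same three settings to a single tree and to a forest. The synthetic dataset from make_classification, the specific parameter values, and the variable names are assumptions chosen only for illustration; they do not come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used here only so the example is self-contained
X, y = make_classification(n_samples=200, n_features=10, random_state=14)

# Parameters accepted by both estimators (values are illustrative)
shared = dict(criterion="entropy", max_features=3, min_samples_split=4,
              random_state=14)

tree = DecisionTreeClassifier(**shared).fit(X, y)
forest = RandomForestClassifier(n_estimators=10, **shared).fit(X, y)

# Each fitted member of the forest is itself a DecisionTreeClassifier
# that was built with the forest's tree-level settings
first_tree = forest.estimators_[0]
print(type(first_tree).__name__, first_tree.criterion)
```

The fitted trees are available through the estimators_ attribute, which makes it easy to check which settings each tree actually received.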

There are some new parameters that are used in the ensemble process:

  • n_estimators: This dictates how many decision trees should be built. A higher value takes longer to run but usually gives higher accuracy, with diminishing returns as more trees are added.
  • oob_score: If True, each tree is evaluated on the samples that were left out of the random subsample chosen to train it (the out-of-bag samples), which gives an accuracy estimate without a separate test set.
  • n_jobs: This specifies the number of cores to use when training the decision trees in parallel.

The scikit-learn package uses a library called Joblib for built-in parallelization. The n_jobs parameter dictates how many cores to use. By default, only a single core is used; if you have more cores, you can increase this value, or set it to -1 to use all cores.
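The ensemble parameters can be combined in a single call. The following is a sketch under assumed settings: the synthetic dataset, the choice of 100 trees, and the random_state value are illustrative and not taken from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used here only so the example is self-contained
X, y = make_classification(n_samples=300, n_features=10, random_state=14)

forest = RandomForestClassifier(
    n_estimators=100,  # number of decision trees to build
    oob_score=True,    # score each tree on the samples it did not train on
    n_jobs=-1,         # use all available cores
    random_state=14,
)
forest.fit(X, y)

print(len(forest.estimators_))  # number of fitted trees
print(forest.oob_score_)        # out-of-bag accuracy estimate
```

After fitting, the out-of-bag estimate is stored in the oob_score_ attribute. It relies on bootstrap sampling, which RandomForestClassifier uses by default.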
