- Mastering Machine Learning with Spark 2.x
- Alex Tellez Max Pumperla Michal Malohlava
- 2021-07-02 18:46:09
Random forest model
Now, let's try building a random forest using 10 decision trees.
val numClasses = 2
val categoricalFeaturesInfo = Map[Int, Int]()
val numTrees = 10
val featureSubsetStrategy = "auto"
val impurity = "gini"
val maxDepth = 5
val maxBins = 10
val seed = 42
val rfModel = RandomForest.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,
  numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed)
Just like our single decision tree model, we start by declaring the hyper-parameters, many of which should already be familiar from the decision tree example. The preceding code creates a random forest of 10 trees to solve a two-class problem. One key difference is the feature subset strategy, described as follows:
The featureSubsetStrategy parameter sets the number of features considered as split candidates at each node. It can be either a fraction (for example, 0.5) or a named function of the number of features in your dataset, such as sqrt or log2. The auto setting lets the algorithm choose this number for you; a common rule of thumb is to use the square root of the number of features, which is also what auto resolves to for a classification forest with more than one tree.
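To make the effect of this setting concrete, the following plain-Scala sketch estimates how many candidate features each node would see for a given strategy. The object FeatureSubsetSketch and the helper candidateFeatures are hypothetical names introduced only for illustration; they are not part of the Spark API. The sketch assumes that results are rounded up and handles only the named strategies and fractions mentioned above.

```scala
import scala.util.Try

// Hypothetical helper for illustration only; not part of the Spark API.
object FeatureSubsetSketch {
  // Approximate number of candidate features per node.
  // Assumption of this sketch: fractional results are rounded up.
  def candidateFeatures(strategy: String, numFeatures: Int, numTrees: Int,
                        classification: Boolean = true): Int = {
    // "auto" falls back to all features for a single tree; for a forest it
    // uses sqrt (classification) or one third (regression).
    val resolved = strategy match {
      case "auto" if numTrees == 1 => "all"
      case "auto" => if (classification) "sqrt" else "onethird"
      case other => other
    }
    resolved match {
      case "all" => numFeatures
      case "sqrt" => math.ceil(math.sqrt(numFeatures.toDouble)).toInt
      case "log2" =>
        math.max(1, math.ceil(math.log(numFeatures.toDouble) / math.log(2.0)).toInt)
      case "onethird" => math.ceil(numFeatures / 3.0).toInt
      case fraction =>
        // Anything else is read as a fraction in (0, 1], passed as a string.
        Try(fraction.toDouble).toOption.filter(f => f > 0.0 && f <= 1.0) match {
          case Some(f) => math.ceil(f * numFeatures).toInt
          case None =>
            throw new IllegalArgumentException(s"Unsupported strategy: $fraction")
        }
    }
  }
}
```

For example, with an assumed 28 input features and 10 trees, FeatureSubsetSketch.candidateFeatures("auto", 28, 10) evaluates to 6, while the fraction "0.5" gives 14.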
Now that we have trained our model, let's score it against our hold-out set and compute the total error:
def computeError(model: Predictor, data: RDD[LabeledPoint]): Double = {
  val labelAndPreds = data.map { point =>
    val prediction = model.predict(point.features)
    (point.label, prediction)
  }
  labelAndPreds.filter(r => r._1 != r._2).count.toDouble / data.count
}
val rfTestErr = computeError(rfModel, testData)
println(f"RF Model: Test Error = ${rfTestErr}%.3f")
The output is as follows:

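The total error computed by computeError is simply the fraction of hold-out points whose predicted label differs from the true label. The sketch below applies the same calculation to a small local collection instead of an RDD. The object ErrorRateSketch, the helper errorRate, and the example pairs are hypothetical and made up for illustration; they are not results from the model above.

```scala
// Hypothetical local-collection version of the error calculation.
object ErrorRateSketch {
  // Fraction of (label, prediction) pairs that disagree.
  def errorRate(labelAndPreds: Seq[(Double, Double)]): Double =
    labelAndPreds.count { case (label, pred) => label != pred }.toDouble /
      labelAndPreds.size

  // Made-up pairs: exactly one of the five predictions is wrong.
  val example: Seq[(Double, Double)] = Seq(
    (1.0, 1.0), (0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (1.0, 1.0)
  )
}
```

Calling ErrorRateSketch.errorRate(ErrorRateSketch.example) counts one mismatch out of five pairs, giving an error of 0.2.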
We can also compute the AUC using the computeMetrics method defined earlier:
val rfMetrics = computeMetrics(rfModel, testData)
println(f"RF Model: AUC on Test Data = ${rfMetrics.areaUnderROC}%.3f")
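As a reminder of what the AUC measures, it can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The plain-Scala sketch below computes it by direct pairwise comparison on a tiny made-up sample. It is not how computeMetrics is implemented; the object AucSketch and the helper auc are hypothetical names, and the quadratic loop is only suitable for small collections.

```scala
// Hypothetical pairwise AUC calculation for small local collections.
object AucSketch {
  // Input: (score, label) pairs with labels 1.0 (positive) or 0.0 (negative).
  def auc(scoreAndLabels: Seq[(Double, Double)]): Double = {
    val pos = scoreAndLabels.collect { case (s, l) if l == 1.0 => s }
    val neg = scoreAndLabels.collect { case (s, l) if l == 0.0 => s }
    require(pos.nonEmpty && neg.nonEmpty, "need at least one example per class")
    // Each positive/negative pair scores 1 if ranked correctly, 0.5 on a tie.
    val wins = for (p <- pos; n <- neg) yield {
      if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    }
    wins.sum / (pos.size.toLong * neg.size)
  }

  // Made-up scores: five of the six positive/negative pairs are ranked correctly.
  val example: Seq[(Double, Double)] = Seq(
    (0.9, 1.0), (0.8, 1.0), (0.7, 0.0), (0.6, 1.0), (0.2, 0.0)
  )
}
```

Here AucSketch.auc(AucSketch.example) evaluates to 5/6, since only the positive scored 0.6 is ranked below the negative scored 0.7.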

Our RF, with its hard-coded hyper-parameters, performs much better than our single decision tree with respect to both overall model error and AUC. In the next section, we will introduce the concept of a grid search, which lets us vary hyper-parameter values and combinations and measure their impact on model performance.