Model training and testing loop

Once our training data is in a form suitable for our model, we can proceed to the model's training and testing phase. During this phase, we are primarily concerned with model selection. This can mean choosing the best modeling approach for our task, choosing the best parameter settings for a given model, or both, since in many cases we will want to try out several models and select the best-performing one (with the best-performing parameter settings for each model). It is also common in this phase to explore combinations of different models (known as ensemble methods).

This is typically a fairly straightforward process of training our chosen model on a training dataset and testing its performance on a test dataset (that is, a set of data held out for evaluation that the model has not seen during training). Evaluating against a single held-out set is known as hold-out validation; repeating the process over several different train-test splits and averaging the results is referred to as cross-validation.
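As an illustration of the train-test loop, here is a minimal sketch of k-fold cross-validation used to pick a parameter setting. It is written in plain Python rather than against Spark's API, and everything in it is an assumption made for the example: the synthetic data, the toy one-dimensional ridge model, the candidate values in `param_grid`, and the helper names `k_fold_indices`, `fit_ridge_1d`, and `cross_validate`.

```python
import random

def k_fold_indices(n, k, seed=42):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_validate(xs, ys, lam, k=5):
    """Average held-out MSE over k folds for one parameter setting."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training folds...
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        # ...and score only on the fold the model has not seen.
        scores.append(mse(w, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return sum(scores) / k

# Hypothetical synthetic data: y = 2x plus a little Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]

# Model selection: keep the regularization value with the lowest average held-out error.
param_grid = [0.0, 0.1, 10.0, 1000.0]
cv_scores = {lam: cross_validate(xs, ys, lam) for lam in param_grid}
best_lam = min(cv_scores, key=cv_scores.get)
print("best regularization value:", best_lam)
```

The same fold assignment is reused for every parameter value, so the settings are compared on identical splits. With `k=1` replaced by a single fixed split, the loop reduces to plain hold-out evaluation.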

Sometimes the model overfits the training data or does not fully converge, depending on the type of dataset and the number of iterations used.
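The effect of the iteration count on convergence can be seen in a small sketch. This is plain Python, not Spark code, and it covers only the convergence half of the point above (not overfitting). The step size, the noise-free data, and the function name `gradient_descent_1d` are assumptions chosen for illustration.

```python
def gradient_descent_1d(xs, ys, iterations, step=0.1):
    """Fit y ~ w * x by minimizing mean squared error with plain gradient descent."""
    w = 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= step * grad
    return w

# Hypothetical noise-free data generated with a true weight of 3.0.
xs = [i / 10 for i in range(-10, 11)]
ys = [3.0 * x for x in xs]

w_few = gradient_descent_1d(xs, ys, iterations=2)     # stopped long before convergence
w_many = gradient_descent_1d(xs, ys, iterations=500)  # enough steps to settle near 3.0
print("after 2 iterations:", w_few)
print("after 500 iterations:", w_many)
```

Comparing the two fitted weights against the known true value shows why the number of iterations is itself a setting worth checking during the train-test loop.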

Ensemble methods such as gradient-boosted trees and random forests are techniques commonly used in machine learning, and available in Spark, to help reduce overfitting.

However, due to the large scale of data we are typically working with, it is often useful to carry out this initial train-test loop on a smaller representative sample of our full dataset or perform model selection using parallel methods where possible.
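One way to obtain a smaller sample that stays representative is to sample the same fraction from each class, so that rare labels are not lost. The following is a minimal plain-Python sketch of that idea, not Spark's own sampling API; the record layout, the 5 percent fraction, and the helper name `stratified_sample` are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(records, label_of, fraction, seed=42):
    """Draw roughly `fraction` of the records from each label group."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for record in records:
        by_label[label_of(record)].append(record)
    sample = []
    for group in by_label.values():
        # Keep at least one record per label so that no class disappears.
        n = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, n))  # sampling without replacement
    return sample

# Hypothetical imbalanced dataset: 10% "pos" records, 90% "neg" records.
records = [(i, "pos" if i % 10 == 0 else "neg") for i in range(1000)]
small = stratified_sample(records, label_of=lambda r: r[1], fraction=0.05)

n_pos = sum(1 for r in small if r[1] == "pos")
print(len(small), "records sampled,", n_pos, "of them positive")
```

Because each label group is sampled separately, the class proportions in the small dataset match those of the full one, which keeps evaluation metrics from the initial loop comparable to what the full dataset would give.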

For this part of the pipeline, Spark's built-in machine learning library, MLlib, is a perfect fit. We will focus most of our attention in this book on the model training, evaluation, and cross-validation steps for various machine learning techniques, using MLlib and Spark's core features.