
Data preparation

Almost there! This step has the following five tasks:

  1. Select the data
  2. Clean the data
  3. Construct the data
  4. Integrate the data
  5. Format the data

These tasks are relatively self-explanatory. The goal is to get the data ready to feed into the algorithms. This includes merging, feature engineering, and transformations. If imputation is needed, it happens here as well. Additionally, with R, pay attention to how the outcome needs to be labeled. If your outcome/response variable is Yes/No, it may not work in some packages and will require a transformed or new variable coded as 1/0. At this point, you should also split your data into the various subsets if applicable: train, test, or validate. This step can be an unforgiving burden, but most experienced people will tell you that it is where you can separate yourself from your peers. With this, let's move on to the money step.
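
As a rough illustration of two of the points above, recoding a Yes/No outcome as 1/0 and splitting the data, here is a minimal base R sketch. The data frame `df`, its columns `x` and `outcome`, the new column `y`, and the 70/30 split ratio are all hypothetical choices made for this example, not something prescribed by any particular package.

```r
# Hypothetical toy data; names and values are illustrative only.
set.seed(123)
df <- data.frame(
  x = c(2.1, 3.4, 2.9, 5.0, 4.2, 1.8, 3.9, 2.7, 4.8, 3.1),
  outcome = c("Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Recode the Yes/No response as a new 1/0 variable,
# keeping the original column untouched.
df$y <- ifelse(df$outcome == "Yes", 1, 0)

# Randomly assign 70% of the rows to train and the rest to test.
train_idx <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
train <- df[train_idx, ]
test <- df[-train_idx, ]
```

Some packages expect a factor rather than a numeric 0/1 response, so it is worth checking the documentation of the modeling function you plan to use before settling on a coding.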
