Data preparation

Almost there! This step has the following five tasks:

  1. Select the data
  2. Clean the data
  3. Construct the data
  4. Integrate the data
  5. Format the data

These tasks are relatively self-explanatory. The goal is to get the data ready to feed into the algorithms. This includes merging, feature engineering, and transformations. If imputation is needed, it happens here as well. Additionally, with R, pay attention to how the outcome needs to be labeled. If your outcome/response variable is Yes/No, it may not work in some packages and will require a transformed or new variable coded as 1/0. At this point, you should also split your data into the various subsets, if applicable: train, test, or validation. This step can be an unforgiving burden, but most experienced people will tell you that it is where you can separate yourself from your peers. With this, let's move on to the money step.
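To make the imputation, outcome recoding, and data splitting mentioned above concrete, here is a minimal sketch in base R. The data frame `df`, its columns `age` and `churn`, the new column `churn01`, and the 70/30 split ratio are all hypothetical, chosen only for illustration. Median imputation is likewise just one simple option, not a recommendation for every dataset.

```r
# Hypothetical toy data; the column names are illustrative only.
set.seed(123)  # make the random split reproducible
df <- data.frame(
  age = c(25, 31, NA, 47, 52, 38, 29, 44, 36, 50),
  churn = c("Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Impute the missing age with the median of the observed values
# (an assumed, deliberately simple imputation strategy).
df$age[is.na(df$age)] <- median(df$age, na.rm = TRUE)

# Recode the Yes/No outcome as a new 1/0 variable, keeping the original.
df$churn01 <- ifelse(df$churn == "Yes", 1, 0)

# Randomly assign roughly 70% of the rows to train and the rest to test.
train_idx <- sample(seq_len(nrow(df)), size = round(0.7 * nrow(df)))
train <- df[train_idx, ]
test <- df[-train_idx, ]
```

Keeping the original `churn` column alongside `churn01` lets you pass whichever form a given package expects. If you also need a validation set, the same indexing approach can be applied a second time to the rows in `train`.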
