Data preparation

Almost there! This step has the following five tasks:

  1. Selecting the data.
  2. Cleaning the data.
  3. Constructing the data.
  4. Integrating the data.
  5. Formatting the data.

These tasks are relatively self-explanatory. The goal is to get the data ready to feed into the algorithms. This includes merging, feature engineering, and transformations. If imputation is needed, it happens here as well. Additionally, with R, pay attention to how the outcome needs to be labeled. If your outcome/response variable is Yes/No, it may not work in some packages, which will require a transformed or recoded variable with 1/0. At this point, you should also split your data into the various subsets, if applicable: train, test, and validation. This step can be an unmitigated burden, but most experienced people will tell you that it is where you can separate yourself from your peers. With this, let's move on to the payoff, where you earn your money.
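
To make the outcome labeling, imputation, and data splitting concrete, here is a minimal base R sketch. The data frame `df`, its column names (`x1`, `x2`, `outcome`), the 70/30 split ratio, and the choice of median imputation are all assumptions made for illustration; they are not prescribed by the text, and your own data may call for different choices.

```r
# Hypothetical toy data; the column names are illustrative only.
set.seed(123)
df <- data.frame(
  x1 = rnorm(10),
  x2 = c(1.2, NA, 3.4, 2.2, NA, 5.1, 4.8, 3.3, 2.9, 1.7),
  outcome = c("Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Recode the Yes/No response as 1/0 for packages that expect a numeric outcome.
df$outcome_num <- ifelse(df$outcome == "Yes", 1, 0)

# Split into train and test sets (an assumed 70/30 split).
train_idx <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
train <- df[train_idx, ]
test <- df[-train_idx, ]

# Impute missing x2 values with the median computed on the training set only,
# so that no information from the test set leaks into the preparation step.
x2_median <- median(train$x2, na.rm = TRUE)
train$x2[is.na(train$x2)] <- x2_median
test$x2[is.na(test$x2)] <- x2_median
```

Splitting before imputing is a design choice in this sketch: the imputation value is learned from the training rows and then applied unchanged to the test rows. If you also need a validation set, the same indexing approach can be applied a second time to the training rows.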
