
Data preparation

Almost there! This step has the following five tasks:

  1. Selecting the data.
  2. Cleaning the data.
  3. Constructing the data.
  4. Integrating the data.
  5. Formatting the data.

These tasks are relatively self-explanatory. The goal is to get the data ready to feed into the algorithms. This includes merging, feature engineering, and transformations. If imputation is needed, it happens here as well. Additionally, with R, pay attention to how the outcome needs to be labeled. If your outcome/response variable is coded as Yes/No, it may not work in some packages, and you will need to transform it or create a new variable coded as 1/0. At this point, you should also split your data into the appropriate subsets, if applicable: training, test, and validation. This step can be an unmitigated burden, but most experienced people will tell you that it is where you can separate yourself from your peers. With this, let's move on to the payoff, where you earn your money.
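
As a rough illustration of the preparation tasks described above, the following base R sketch imputes a missing numeric value, adds a 1/0 version of a Yes/No outcome, and splits the rows into training and test sets. The data frame, its column names (age, outcome, outcome01), the use of median imputation, and the 70/30 split ratio are all assumptions made for this example; they are not taken from any particular dataset or package requirement.

```r
# Hypothetical toy data; the column names are illustrative only.
set.seed(123)  # make the random split reproducible
df <- data.frame(
  age     = c(34, 51, NA, 29, 46, 62, NA, 38, 55, 41),
  outcome = c("Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Impute missing ages with the median of the observed values
# (one simple option among many; assumed here for brevity).
df$age[is.na(df$age)] <- median(df$age, na.rm = TRUE)

# Create a new 1/0 outcome alongside the original Yes/No label,
# for packages that expect a numeric response.
df$outcome01 <- ifelse(df$outcome == "Yes", 1L, 0L)

# Split the rows into roughly 70% training and 30% test data.
train_idx <- sample(seq_len(nrow(df)), size = round(0.7 * nrow(df)))
train <- df[train_idx, ]
test  <- df[-train_idx, ]
```

Keeping the original outcome column next to the recoded one makes it easy to switch between packages that want a factor and those that want a numeric response. In practice, imputation values are often computed on the training set only and then applied to the test set; the order here is simplified for illustration.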
