
Defining your features

The second step in machine learning is defining your features. Think of features as components or attributes of the problem you wish to solve. In machine learning, and specifically when creating a new model, features have one of the biggest impacts on your model's performance. Thinking your problem statement through properly will yield an initial set of features that distinguish the examples in your dataset and, in turn, shape your model's results. Going back to the Mayor example in the preceding section, which data points about each citizen would you consider as features? Perhaps start by looking at the Mayor's competition and where the Mayor stands on issues in ways that differ from other candidates. These positions could be turned into features and then made into a poll for citizens of John Doe County to answer. Using these data points would create a solid first pass at features.

As with model building, expect to run several iterations of feature engineering and model training, especially as your dataset grows. After model evaluation, feature importance is used to determine which features are actually driving your predictions. Occasionally, you will find that features chosen on gut instinct turn out to be inconsequential after a few iterations of model training and feature engineering.
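To make the idea of feature importance concrete, the following is a minimal sketch in Python using only the standard library. Everything in it is assumed for illustration: the feature names (`agrees_on_transit`, `agrees_on_taxes`, `owns_pet`), the poll tallies, and the choice of a small naive Bayes classifier are hypothetical and not taken from the Mayor example itself. It estimates importance by permutation: shuffle one feature column at a time and measure how much the model's accuracy drops. For brevity, it scores the model on the same rows it was trained on, whereas a held-out set would normally be used.

```python
import math
import random

FEATURES = ["agrees_on_transit", "agrees_on_taxes", "owns_pet"]

# Made-up poll tallies (hypothetical data):
# (agrees_on_transit, agrees_on_taxes, votes_for_mayor, respondents)
POLL_COUNTS = [
    (1, 1, 1, 4), (1, 0, 1, 3), (1, 0, 0, 1),
    (0, 1, 1, 1), (0, 1, 0, 3), (0, 0, 0, 4),
]


def build_dataset():
    """Expand tallies into rows; owns_pet is split evenly, so it is unrelated to the vote."""
    rows, labels = [], []
    for transit, taxes, vote, count in POLL_COUNTS:
        for _ in range(count):
            for pet in (0, 1):
                rows.append([transit, taxes, pet])
                labels.append(vote)
    return rows, labels


def train_naive_bayes(rows, labels):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing; return a predict function."""
    log_prior, log_likelihood = {}, {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        log_prior[cls] = math.log(len(members) / len(rows))
        for j in range(len(FEATURES)):
            p_one = (sum(r[j] for r in members) + 1) / (len(members) + 2)
            log_likelihood[(cls, j, 1)] = math.log(p_one)
            log_likelihood[(cls, j, 0)] = math.log(1 - p_one)

    def predict(row):
        scores = {
            cls: log_prior[cls]
            + sum(log_likelihood[(cls, j, v)] for j, v in enumerate(row))
            for cls in (0, 1)
        }
        return max(scores, key=scores.get)

    return predict


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted vote matches the recorded vote."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, repeats=10, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importance = {}
    for j, name in enumerate(FEATURES):
        drops = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importance[name] = sum(drops) / repeats
    return importance


rows, labels = build_dataset()
model = train_naive_bayes(rows, labels)
importance = permutation_importance(model, rows, labels)
for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Because `owns_pet` is split evenly across voters by construction, shuffling it never changes a prediction and its importance comes out as zero, which is the kind of gut-instinct feature you would consider dropping in the next iteration. Real poll data would be noisier, so the importances would rarely be this clear-cut.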

In Chapter 11, Training and Building Production Models, we will take a deep dive into best practices for defining features, along with common approaches to complex problems, so that you can obtain a solid first pass at feature engineering.
