
Feature extraction and pipeline

Once you have obtained your features and datasets, the next step is feature extraction. Depending on the size of your dataset and on your features, feature extraction can be one of the most time-consuming parts of the model-building process.

For example, let's say that the aforementioned fictitious John Doe County Election Poll produced 40,000 responses, each captured from a web form and stored in a SQL database. Suppose you then run a SQL query and export all of the data to a CSV file, which you use to train your model. At a high level, this is your feature extraction and pipeline. In more complex scenarios, such as predicting malicious web content or classifying images, the extraction will include pulling specific bytes out of binary files. Storing the extracted data properly, so that you do not have to re-run the extraction, is crucial to iterating quickly (assuming the features have not changed).
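The poll scenario above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive pipeline: it uses SQLite from the standard library as a stand-in for whatever SQL database holds the responses, and the table name `poll_responses`, its columns, and the function name `extract_responses_to_csv` are all hypothetical. The point it shows is the caching step: if the CSV file already exists, the query is skipped.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def extract_responses_to_csv(db_path, csv_path, force=False):
    """Export the (hypothetical) poll_responses table to a CSV file.

    Returns the number of data rows written, or None when an existing
    CSV file was reused instead of re-running the query.
    """
    csv_path = Path(csv_path)
    if csv_path.exists() and not force:
        return None  # cached extraction found; skip the query

    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(
            "SELECT respondent_id, age, party, candidate FROM poll_responses"
        )
        header = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()

    # newline="" lets the csv module control line endings itself
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


# Demo with two made-up responses in a throwaway database.
with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "poll.db"
    csv_path = Path(tmp) / "responses.csv"

    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE poll_responses "
        "(respondent_id INTEGER, age INTEGER, party TEXT, candidate TEXT)"
    )
    conn.executemany(
        "INSERT INTO poll_responses VALUES (?, ?, ?, ?)",
        [(1, 34, "A", "Smith"), (2, 51, "B", "Jones")],
    )
    conn.commit()
    conn.close()

    first_run = extract_responses_to_csv(db_path, csv_path)   # runs the query
    second_run = extract_responses_to_csv(db_path, csv_path)  # reuses the CSV
    exported = csv_path.read_text().splitlines()
```

Passing `force=True` re-runs the query, which is what you would want whenever the features themselves change and the cached file is no longer valid.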

In Chapter 11, Training and Building Production Models, we will take a deep dive into ways to version your feature-extracted data and keep control of it, especially as your dataset grows in size.
