
Feature extraction and pipeline

Once your features and datasets have been obtained, the next step is to perform feature extraction. Depending on the size of your dataset and the number of features, feature extraction can be one of the most time-consuming parts of the model-building process.

For example, let's say that the aforementioned fictitious John Doe County Election Poll received 40,000 responses. Each response was captured from a web form and stored in a SQL database. Suppose you then ran a SQL query and exported all of the data to a CSV file on which your model can be trained. At a high level, this is your feature extraction and pipeline. For more complex scenarios, such as predicting malicious web content or classifying images, extraction will include reading specific bytes out of binary files. Storing the extracted data properly, so that you don't have to re-run the extraction, is crucial to iterating quickly (assuming the features have not changed).
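The SQL-to-CSV step described above, together with the idea of reusing stored extraction output, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's own pipeline: it assumes the poll responses live in a SQLite database (the text only says "a SQL database"), and the function name `extract_features` and its `force` parameter are hypothetical. If a CSV from a previous run already exists, the extraction is skipped.

```python
import csv
import sqlite3
from contextlib import closing
from pathlib import Path


def extract_features(db_path: str, csv_path: str, query: str,
                     force: bool = False) -> Path:
    """Export the rows returned by `query` to a CSV file with a header row.

    Hypothetical helper (assumes a SQLite source database). If `csv_path`
    already exists and `force` is False, the extraction is skipped and the
    cached file is reused, so repeated training runs do not pay the
    extraction cost again.
    """
    out = Path(csv_path)
    if out.exists() and not force:
        return out  # reuse the previously extracted features

    out.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so a failed run never leaves a
    # half-written CSV behind that a later call would mistake for a cache hit.
    tmp = out.with_name(out.name + ".tmp")
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute(query)
        header = [column[0] for column in cursor.description]
        with tmp.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(cursor)  # stream rows instead of loading them all into memory
    tmp.replace(out)  # move the finished file into place
    return out
```

For example, `extract_features("poll.db", "features/poll.csv", "SELECT age, candidate FROM responses")` (hypothetical file, table, and column names) writes the CSV on the first call and returns the existing file on later calls; pass `force=True` once the query or the features change. The cache check is deliberately simple: it only tests whether the file exists, so it cannot tell on its own whether the features are stale.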

In Chapter 11, Training and Building Production Models, we will take a deep dive into ways to version your feature-extracted data and maintain control over it, especially as your dataset grows in size.
