- Effective Amazon Machine Learning
- Alexis Perrier
- 2021-07-03 00:17:47
Engineering data versus model variety
Having a wide choice of algorithms for your predictions is always a good thing, but at the end of the day, domain knowledge and the ability to extract meaningful features from clean data are often what win the game.
Kaggle is a well-known platform for predictive analytics competitions, where the best data scientists across the world compete to make predictions on complex datasets. In these competitions, gaining a few decimals on your prediction score makes the difference between earning the prize and being just an extra line on the public leaderboard among thousands of other competitors. One thing Kagglers quickly learn is that choosing and tuning the model is only half the battle. Feature extraction, that is, how to extract relevant predictors from the dataset, is often the key to winning the competition.
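To make the idea of feature extraction concrete, here is a minimal sketch in Python. It assumes a hypothetical dataset in which each record carries a raw timestamp string; the function name `extract_features` and the three derived predictors are illustrative choices, not something taken from a particular competition or library.

```python
from datetime import datetime


def extract_features(raw_timestamp: str) -> dict:
    """Turn a raw ISO-style timestamp string into simple numeric predictors.

    Hypothetical helper for illustration only: a model cannot use the raw
    string directly, but it can use the hour, weekday, and weekend flag.
    """
    ts = datetime.fromisoformat(raw_timestamp)
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),           # Monday == 0, Sunday == 6
        "is_weekend": int(ts.weekday() >= 5),  # 1 for Saturday or Sunday
    }


weekday_features = extract_features("2016-03-23 14:30:00")
weekend_features = extract_features("2016-03-26 09:00:00")
print(weekday_features)
print(weekend_features)
```

The raw string carries the same information as the derived columns, but only the derived columns expose it in a form that most algorithms can exploit, whichever model is chosen afterwards.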
In real life, when working on business-related problems, the quality of the data processing phase and the ability to extract meaningful signal out of raw data are the most important and time-consuming parts of building an efficient predictive model. It is well known that "data preparation accounts for about 80% of the work of data scientists" (http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/). Model selection and algorithm optimization remain an important part of the work but are often not the deciding factor when implementation is concerned.
A solid and robust implementation that is easy to maintain and connects seamlessly to your ecosystem is often preferred to an overly complex model developed and coded in-house, especially when the scripted model produces only small gains compared to a service-based implementation.