
Data science - an iterative process

The process flow of many big data projects is iterative. That means a lot of back-and-forth: testing new ideas, trying new features, tweaking hyper-parameters, and so on, all with a fail-fast attitude. The end result of these projects is usually a model that can answer the question being posed. Notice that we didn't say accurately answer the question being posed! A common pitfall for data scientists today is building models that fail to generalize to new data. They have overfit their training data, so the model produces poor results when it sees new data. Accuracy is extremely task-dependent. It is usually dictated by the business needs, often with some sensitivity analysis to weigh the costs and benefits of the model's outcomes. However, there are a few standard accuracy measures that we will cover throughout this book, so that you can compare models and see how changes to a model affect the result.
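To make the overfitting pitfall concrete, here is a minimal Python sketch. It is not taken from this book, which works with H2O and Spark. It compares two toy models on a training set and a separate holdout set, using mean squared error as the accuracy measure. The synthetic data (y = 2x + 1 plus Gaussian noise) and the helper names `make_data`, `fit_memorizer`, `fit_line`, and `mse` are hypothetical, chosen only for illustration. The "memorizer" is a 1-nearest-neighbour model that reproduces its training data exactly.

```python
import random

def make_data(n, rng, noise=1.0):
    """Hypothetical toy data: y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, noise)) for x in xs]

def mse(model, data):
    """Mean squared error of model(x) against the observed y values."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def fit_memorizer(train):
    """1-nearest-neighbour model: predicts the y of the closest training x.
    It reproduces the training set exactly, so it overfits by construction."""
    def predict(x):
        nearest = min(train, key=lambda p: abs(p[0] - x))
        return nearest[1]
    return predict

def fit_line(train):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

# A fixed seed keeps the comparison reproducible.
rng = random.Random(42)
train = make_data(500, rng)
holdout = make_data(500, rng)  # new data the models never saw during fitting

for name, fit in [("memorizer", fit_memorizer), ("linear", fit_line)]:
    model = fit(train)
    print(f"{name:10s} train MSE={mse(model, train):.3f} "
          f"holdout MSE={mse(model, holdout):.3f}")
```

The memorizer has a perfect (zero) training error but should do noticeably worse on the holdout set than the simple line. A gap like this between training error and holdout error is the usual symptom of a model that does not generalize.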

H2O regularly gives meetup talks and invites others to give machine learning meetups around the US and Europe. The slides from each meetup or conference are available on SlideShare (http://www.slideshare.com/0xdata) or YouTube. Both sites are great sources of information, not only about machine learning and statistics but also about distributed systems and computation. For example, one of the most interesting presentations highlights the "Top 10 pitfalls in a data scientist job" (http://www.slideshare.net/0xdata/h2o-world-top-10-data-science-pitfalls-mark-landry).