
Analyzing the data and/or applying machine learning to the data

In this phase, a good deal of analysis takes place as the data scientist, driven by scientific curiosity and experience, attempts to shape a story from observations and from their understanding of the data so far. The data scientist continues to slice and dice the data, using analytics or BI packages such as Tableau or Pentaho, or an open source solution such as R or Python, to create a concrete data storyline. Based on these analysis results, the data scientist may again elect to go back to a prior phase, pulling new data, processing and reprocessing, and creating additional visualizations. At some point, when appropriate progress has been made, the data scientist may decide that the data has reached the point where machine learning can begin. Machine learning (defined further later in this chapter) has evolved from being largely an exercise in pattern recognition to now being defined as applying a selected statistical method to dig deeper, using the data and the results of the analysis in this phase to learn and to make predictions about the project data.

The ability of a data scientist to extract a quantitative result from data through machine learning and express it as something that everyone (not just other data scientists) can understand immediately is an invaluable skill, and one we will return to throughout this book.
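As a rough illustration of this phase, the following Python sketch slices a small dataset by group, applies one selected statistical method (ordinary least squares, chosen here only as an example), makes a prediction, and states the result in plain language. The records, the column meanings, and the function names `mean_by_group`, `fit_line`, and `predict` are all hypothetical and are not taken from any project discussed in this book.

```python
from collections import defaultdict

# Hypothetical project data: (region, advertising spend, units sold).
records = [
    ("north", 1.0, 3.0),
    ("north", 2.0, 5.0),
    ("south", 3.0, 7.0),
    ("south", 4.0, 9.0),
]


def mean_by_group(rows):
    """Slice the data: average units sold per region."""
    groups = defaultdict(list)
    for region, _spend, units in rows:
        groups[region].append(units)
    return {region: sum(v) / len(v) for region, v in groups.items()}


def fit_line(xs, ys):
    """Learn from the data: ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Make a prediction for a new value of x using the fitted line."""
    intercept, slope = model
    return intercept + slope * x


averages = mean_by_group(records)
model = fit_line([r[1] for r in records], [r[2] for r in records])
forecast = predict(model, 5.0)

# Express the quantitative result in terms a non-specialist can follow.
print(f"Average units sold by region: {averages}")
print(f"Each extra unit of spend is associated with about {model[1]:.1f} more units sold.")
print(f"At a spend of 5.0, the model predicts about {forecast:.1f} units sold.")
```

The final print statements are the part that matters most for communication: the fitted slope is reported as a sentence about spend and sales rather than as a bare coefficient.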
