
Descriptive analysis

The first problem to solve in almost any data science scenario is understanding the nature of the scenario itself. We need to know how the system works or what a dataset is describing. Without this analysis, our knowledge is too limited to make any assumption or formulate any hypothesis. For example, we can observe a chart of the average temperature in a city over several years. If we are unable to describe the time series by identifying its correlations, seasonalities, and trends, any other question remains unanswered. In our specific context, if we don't discover the similarities between groups of objects, we cannot find a way to summarize their common features. The data scientist has to employ specific tools for every particular problem, but, by the end of this stage, all possible (and helpful) questions must be answered.

Moreover, as this process must have clear business value, it's important to involve different stakeholders in order to gather their knowledge and convert it into a common language. For example, when working with healthcare data, a physician might talk about hereditary factors, but for our purpose, it's preferable to say that there's a correlation among some samples, so we are not fully justified in treating them as statistically independent elements. In general, the outcome of descriptive analysis is a summary containing all the metric evaluations and conclusions that are necessary to qualify the context and reduce uncertainty. In the example of the temperature chart, the data scientist should be able to report the autocorrelation, the periodicity of the peaks, the number of potential outliers, and the presence of trends.
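As a rough illustration of the temperature example, the following sketch computes the four quantities mentioned above using NumPy only. Everything in it is an assumption made for the sake of the example: the monthly series is synthetic (a sinusoidal seasonality, a linear trend, Gaussian noise, and two injected anomalies), the helper names are hypothetical, and the specific techniques (periodogram peak for the period, a line fitted to per-cycle means for the trend, a median-based seasonal profile with a robust z-score for outliers) are just one reasonable choice among many.

```python
import numpy as np


def linear_detrend(x):
    """Subtract the least-squares straight line from the series."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)


def autocorrelation(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    d = x - x.mean()
    return float(np.dot(d[:-lag], d[lag:]) / np.dot(d, d))


def dominant_period(x):
    """Period (in samples) of the strongest non-zero frequency."""
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0)
    k = 1 + int(np.argmax(power[1:]))  # skip the zero-frequency bin
    return int(round(1.0 / freqs[k]))


def trend_per_cycle(x, period):
    """Slope of a line fitted to per-cycle means (seasonality averages out)."""
    n_cycles = len(x) // period
    cycle_means = x[: n_cycles * period].reshape(n_cycles, period).mean(axis=1)
    slope, _ = np.polyfit(np.arange(n_cycles), cycle_means, 1)
    return float(slope)


def find_outliers(x, period, slope_per_cycle, threshold=3.5):
    """Indices whose residual has a robust (MAD-based) z-score above threshold."""
    t = np.arange(len(x))
    detrended = x - (slope_per_cycle / period) * t
    phase = t % period
    # Median seasonal profile: one typical value per position in the cycle.
    profile = np.array([np.median(detrended[phase == p]) for p in range(period)])
    resid = detrended - profile[phase]
    centered = resid - np.median(resid)
    mad = np.median(np.abs(centered))
    z = 0.6745 * centered / mad
    return np.flatnonzero(np.abs(z) > threshold)


# Synthetic data (assumption): 10 years of monthly average temperatures.
rng = np.random.default_rng(0)
t = np.arange(120)
temps = 15 + 10 * np.sin(2 * np.pi * t / 12) + 0.05 * t + rng.normal(0, 1, t.size)
temps[30] += 12  # injected anomalies, for illustration only
temps[77] -= 12

detrended = linear_detrend(temps)
period = dominant_period(detrended)
acf_at_period = autocorrelation(detrended, period)
slope = trend_per_cycle(temps, period)
outliers = find_outliers(temps, period, slope)

print("dominant period (months):", period)
print("autocorrelation at that lag:", round(acf_at_period, 2))
print("trend (degrees per cycle):", round(slope, 2))
print("potential outliers at indices:", outliers.tolist())
```

The trend is estimated from per-cycle means rather than from a straight line fitted to the raw series, because a strong seasonal component can bias a direct linear fit. On real data, each of these steps would need to be validated against domain knowledge before being included in the summary.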
