
Collecting data

This should be somewhat obvious: without at least some data, we cannot perform any of the subsequent steps. One might argue that inference is an exception, but that would be inappropriate. There is no magic in data science. We, as data scientists, don't make something from nothing. Inference (which we'll define later in this chapter) requires at least some data to begin with.

Some aspects of collecting data may be new: data can be collected from a wide range of sources, and the number and types of data sources continue to grow daily. In addition, how data is collected might require a perspective new to a data developer. Data for data science isn't always sourced from a relational database; it may instead come from machine-generated log files, online surveys, performance statistics, and so on. Again, the list is ever evolving.
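To make the idea of non-relational sources concrete, here is a minimal sketch in Python that gathers records from a machine-generated log and from an online survey export. The log layout, the survey columns, and the function names are all assumptions made up for illustration; real sources will look different.

```python
import csv
import io
import re

# Hypothetical sample inputs; a real pipeline would read files or call APIs.
LOG_TEXT = """\
2024-01-05 10:15:02 INFO user=alice action=login
2024-01-05 10:17:45 WARN user=bob action=timeout
"""

SURVEY_CSV = """\
user,satisfaction
alice,4
bob,2
"""

# Assumed log layout: date, time, level, then user= and action= fields.
LOG_PATTERN = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) (?P<level>\w+) "
    r"user=(?P<user>\w+) action=(?P<action>\w+)"
)


def collect_from_log(text):
    """Turn each matching log line into a record tagged with its source."""
    records = []
    for line in text.splitlines():
        match = LOG_PATTERN.match(line)
        if match:  # lines that do not fit the assumed layout are skipped
            records.append({"source": "log", **match.groupdict()})
    return records


def collect_from_survey(text):
    """Read survey rows from CSV text and convert the score to an integer."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "source": "survey",
            "user": row["user"],
            "satisfaction": int(row["satisfaction"]),
        })
    return records


# One combined pool of records, each labeled with where it came from.
collected = collect_from_log(LOG_TEXT) + collect_from_survey(SURVEY_CSV)
```

Tagging every record with its source keeps the origin visible once data from different places is pooled, which tends to matter in the later processing steps.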

Another point to ponder: collecting data also involves supplementation. For example, a data scientist might determine that he or she needs to add demographics to a particular pool of application data that was previously collected, processed, and reviewed.
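Supplementation of this kind can be sketched as a simple join. The example below assumes the application data and the demographics share an identifier; the field names, the sample values, and the `supplement` helper are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical application records that were collected and reviewed earlier.
applications = [
    {"applicant_id": 1, "product": "loan"},
    {"applicant_id": 2, "product": "card"},
    {"applicant_id": 3, "product": "loan"},
]

# Hypothetical demographics from a second source, keyed by the same id.
demographics = {
    1: {"age_band": "25-34", "region": "north"},
    2: {"age_band": "45-54", "region": "south"},
}


def supplement(records, extra, key):
    """Left join: keep every record and add extra fields when available."""
    enriched = []
    for record in records:
        additions = extra.get(record[key], {})  # no match means no new fields
        enriched.append({**record, **additions})  # originals stay unmodified
    return enriched


enriched = supplement(applications, demographics, "applicant_id")
```

Records with no matching demographics (applicant 3 here) are kept as they were, so supplementing the pool never discards data that was already collected.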
