
Processing data

The processing (or transformation) of data is where the data scientist's programming skills come into play (although data scientists often perform some processing during other steps as well, such as collecting, visualizing, or learning from data).

Keep in mind that processing within data science takes many forms. The most common are formatting (and reformatting), cleansing, and profiling. Formatting involves mechanical activities such as setting data types, aggregating values, and reordering or dropping columns. Cleansing addresses the quality of the data by resolving problems such as default or missing values and incomplete or inapposite values. Profiling adds context to the data by building a statistical understanding of it.
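
The three kinds of processing described above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a prescribed method: the sample records, field names, and helper functions (`format_records`, `cleanse_records`, `aggregate_sales`, `profile`) are hypothetical, and filling missing sales with the median is just one of several possible cleansing policies.

```python
from statistics import mean, median

# Hypothetical raw records, as they might arrive from a CSV file (every value is a string).
raw = [
    {"id": "1", "region": "north", "sales": "120.5", "notes": "ok"},
    {"id": "2", "region": "south", "sales": "", "notes": ""},
    {"id": "3", "region": "north", "sales": "80", "notes": "late"},
    {"id": "4", "region": "", "sales": "200", "notes": ""},
]

def format_records(rows):
    """Formatting: set data types, drop an unneeded column, and fix the field order."""
    out = []
    for row in rows:
        out.append({
            "id": int(row["id"]),
            "region": row["region"],
            # Empty strings become None so they can be treated as missing values.
            "sales": float(row["sales"]) if row["sales"] else None,
            # The free-text "notes" column is deliberately dropped.
        })
    return out

def cleanse_records(rows, default_region="unknown"):
    """Cleansing: fill missing regions with a default and missing sales with the median."""
    known_sales = [r["sales"] for r in rows if r["sales"] is not None]
    # Assumed policy: median of the known values, or 0.0 if no values are known.
    fill_value = median(known_sales) if known_sales else 0.0
    return [
        {
            "id": r["id"],
            "region": r["region"] or default_region,
            "sales": r["sales"] if r["sales"] is not None else fill_value,
        }
        for r in rows
    ]

def aggregate_sales(rows):
    """Formatting: aggregate sales totals per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["sales"]
    return totals

def profile(rows):
    """Profiling: summarize the sales column statistically."""
    values = [r["sales"] for r in rows]
    return {"count": len(values), "min": min(values), "max": max(values), "mean": mean(values)}

cleaned = cleanse_records(format_records(raw))
print(aggregate_sales(cleaned))  # {'north': 200.5, 'south': 120.5, 'unknown': 200.0}
print(profile(cleaned))          # {'count': 4, 'min': 80.0, 'max': 200.0, 'mean': 130.25}
```

Note that cleansing runs before aggregation and profiling, so the summary statistics reflect the filled-in values; a different fill policy would change both results.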

The processing required can be simple (for example, a manual task of making repetitive updates to data in an MS Excel worksheet), complex (for example, using programming languages such as R or Python), or more sophisticated still (for example, when the processing logic is coded into routines that can be scheduled and rerun automatically on new populations of data).
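
The most sophisticated case, processing logic packaged as a routine that can be rerun on each new batch of data, might look something like the following sketch. It assumes records arrive as lists of dictionaries with a text `amount` field; the step functions, the `run_pipeline` helper, and the sample batches are hypothetical names chosen for illustration. Scheduling itself would typically be handled outside the code, for example by a tool such as cron invoking the routine whenever new data lands.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]

def parse_types(rows: List[Record]) -> List[Record]:
    """Convert the hypothetical 'amount' field from text to float; blanks become None."""
    return [{**r, "amount": float(r["amount"]) if r["amount"] != "" else None} for r in rows]

def drop_missing(rows: List[Record]) -> List[Record]:
    """Discard records whose amount is missing (one simple cleansing policy)."""
    return [r for r in rows if r["amount"] is not None]

def run_pipeline(rows: List[Record], steps: List[Step]) -> List[Record]:
    """Apply each processing step in order and return the processed records."""
    for step in steps:
        rows = step(rows)
    return rows

# The same ordered list of steps is reused for every batch.
PIPELINE: List[Step] = [parse_types, drop_missing]

# Two hypothetical batches arriving on different days.
batch_monday = [{"amount": "10"}, {"amount": ""}, {"amount": "2.5"}]
batch_tuesday = [{"amount": "7"}, {"amount": "3"}]

for name, batch in [("monday", batch_monday), ("tuesday", batch_tuesday)]:
    result = run_pipeline(batch, PIPELINE)
    print(name, len(result), sum(r["amount"] for r in result))
# monday 2 12.5
# tuesday 2 10.0
```

Because each step returns new records rather than modifying its input, the same pipeline can be rerun safely on the same batch or on any new one.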
