
Summary

This chapter discussed various organizational processes used to prepare data for analysis. When used in computer programs, each data value is assigned a data type, which characterizes the data and defines the kind of operations that can be performed upon it.

When stored in a relational database, data is organized into tables in which each row corresponds to one data point and each column holds the values of a single field of a specified type. The values of the key field (or fields) are unique, which allows indexed searching.
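
As a rough illustration of the table model, the sketch below uses Python's standard sqlite3 module with an in-memory database. The table name, column names, and the two rows are made up for this example; they are assumptions, not data from the chapter.

```python
import sqlite3

# In-memory database; the "country" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE country ("
    " code TEXT PRIMARY KEY,"   # key field: values must be unique
    " name TEXT NOT NULL,"      # each column has one declared type
    " area REAL)"
)
conn.execute("INSERT INTO country VALUES (?, ?, ?)", ("AA", "Alpha", 10.5))

# A second row with the same key value violates the uniqueness constraint.
try:
    conn.execute("INSERT INTO country VALUES (?, ?, ?)", ("AA", "Other", 1.0))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Lookup by key can use the index that backs the primary key.
row = conn.execute(
    "SELECT name FROM country WHERE code = ?", ("AA",)
).fetchone()
```

Here the database itself enforces the unique key, so the duplicate insert fails and the original row is left unchanged.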

A similar approach organizes data into key-value pairs. As with the key fields of a relational database table, the keys must be unique. A hash table implements the key-value paradigm with a hash function that determines where each key's associated data is stored.
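
The hash table idea can be sketched in a few lines. This is a toy version that resolves collisions by separate chaining, which is only one of several possible strategies; the class name, method names, and sample entries are hypothetical and chosen for illustration.

```python
class ChainedHashTable:
    """Toy hash table using separate chaining (illustrative only)."""

    def __init__(self, capacity=8):
        # One list ("bucket") per slot; colliding keys share a bucket.
        self._buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        # The hash function decides which bucket holds the key's data.
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # keys are unique: overwrite
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ChainedHashTable()
table.put("k1", "first")
table.put("k2", "second")
table.put("k1", "replaced")  # same key again replaces the old value
```

In practice Python's built-in dict already provides this behavior; the sketch only makes the role of the hash function visible.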

Data files are formatted according to the specifications of their file type. The comma-separated values (CSV) format is one of the most common. Common structured data file types include XML and JSON.
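
To make the difference between these formats concrete, the following sketch reads a small CSV text with Python's standard csv module and writes the same records as JSON. The field names and the two records are invented for the example.

```python
import csv
import io
import json

# Hypothetical CSV content: a header line followed by two records.
csv_text = "name,score\nAnn,90\nBob,85\n"

# DictReader uses the header line as field names; every value is a string.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV carries no type information, so numeric fields are converted by hand.
records = [{"name": r["name"], "score": int(r["score"])} for r in rows]

# JSON keeps the distinction between strings and numbers.
json_text = json.dumps(records)
round_trip = json.loads(json_text)
```

The explicit int() conversion is the point of the example: a CSV file stores only text, while JSON preserves basic types such as numbers and strings.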

The information that describes the structure of the data is called its metadata. That information is required for the automatic processing of the data.

Specific data processes described here include data cleaning and filtering (removing erroneous data), data scaling (adjusting numeric values according to a specified scale), sorting, merging, and hashing.
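
Several of these processes fit in one short sketch. The readings, the sentinel value -999.0, and the choice of min-max scaling to the range [0, 1] are assumptions made for illustration; other cleaning rules and other scales are equally valid.

```python
import heapq

# Hypothetical raw readings: None and -999.0 stand for erroneous entries.
SENTINEL = -999.0
readings = [12.0, None, 18.0, SENTINEL, 15.0]

# Cleaning and filtering: drop missing and sentinel values.
clean = [x for x in readings if x is not None and x != SENTINEL]

# Scaling: min-max scaling maps the smallest value to 0 and the largest to 1.
lo, hi = min(clean), max(clean)
scaled = [(x - lo) / (hi - lo) for x in clean]

# Sorting: produce an ordered copy of the cleaned data.
ordered = sorted(clean)

# Merging: combine two already sorted sequences into one sorted sequence.
other = [13.0, 16.0]
merged = list(heapq.merge(ordered, other))
```

Note that this form of scaling assumes the cleaned data contains at least two distinct values; otherwise the denominator would be zero.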
