Developing a better approach to understanding data

Whether you are a data developer, systems analyst, programmer, data scientist, or another business or technology professional, you need to develop a comprehensive understanding of the data you are working with, or of the data for which you are designing an application or database schema.

Some practitioners rely on the data specifications provided as part of the overall project plan or requirements. Others (usually those with more experience) supplement their understanding by running some generic queries against the data. Either way, this is seldom enough.

In fact, in industry case studies, unclear, misunderstood, or incomplete requirements or specifications consistently rank among the top five reasons for project failure or added risk.

Profiling data is a process, characteristic of data science, aimed at establishing data intimacy (that is, a clearer and more concise grasp of the data and its internal relationships). Profiling also establishes context. There are several general contextual categories that can be used to increase the value and understanding of data for any purpose or project.

These categories include the following:

  • Definitions and explanations: These help you gain additional information or attributes about data points within your data
  • Comparisons: These help add a comparable value to a data point within your data
  • Contrasts: These help add an opposite to a data point to see whether it offers a different perspective
  • Tendencies: These are typical mathematical calculations, summaries, or aggregations, such as the mean, median, or mode, describing the center of a dataset
  • Dispersion: This includes mathematical calculations (or summaries) such as range, variance, and standard deviation, describing the spread of a dataset (or group within the data)
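
The last two categories are the easiest to illustrate with a few lines of code. The following is a minimal sketch using Python's standard `statistics` module. The `order_amounts` list is a made-up sample, not data from any real project, and the population forms of variance and standard deviation are assumed here for simplicity.

```python
import statistics

# Hypothetical sample: order amounts for one customer segment.
order_amounts = [12.0, 15.0, 15.0, 18.0, 20.0, 22.0, 80.0]

# Tendencies: where the values cluster.
mean_amount = statistics.mean(order_amounts)
median_amount = statistics.median(order_amounts)
mode_amount = statistics.mode(order_amounts)

# Dispersion: how widely the values spread around that center.
value_range = max(order_amounts) - min(order_amounts)
variance = statistics.pvariance(order_amounts)  # population variance
std_dev = statistics.pstdev(order_amounts)      # population standard deviation

print("mean:", mean_amount, "median:", median_amount, "mode:", mode_amount)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
```

In this sample, the single large value pulls the mean (26.0) well above the median (18.0), which is the kind of detail a specification document rarely mentions but a profile makes visible.
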

Think of data profiling as the process you may already have used when examining the data in a file and collecting statistics and other information about it. Those statistics most likely drove the logic you implemented in a program, or how you related data across the tables of a database.
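
As a rough illustration of examining a data file and collecting statistics about it, the sketch below counts rows, missing values, and distinct values for each column of a small CSV. The sample contents, the column names, and the `profile_columns` helper are all hypothetical, chosen only to show the idea. A real profile would typically read the file from disk and collect more measures.

```python
import csv
import io

# Hypothetical file contents; in practice this would be read from disk.
SAMPLE_CSV = """customer_id,region,order_total
1001,East,25.50
1002,West,
1003,East,40.00
1004,,15.25
"""


def profile_columns(text):
    """Collect simple per-column statistics from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    profile = {}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows]
        # Treat empty strings and absent fields as missing.
        present = [v for v in values if v not in ("", None)]
        profile[column] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return profile


profile = profile_columns(SAMPLE_CSV)
for column, stats in profile.items():
    print(column, stats)
```

Counts like these are often what decide practical questions, such as whether a column can serve as a key (no missing values, every value distinct) or needs a default before it is loaded into a table.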