Developing a better approach to understanding data

Whether you are a data developer, systems analyst, programmer, data scientist, or another business or technology professional, you need a thorough understanding of the data you are working with, or of the data for which you are designing an application or database schema.

Some rely on the data specifications provided as part of the overall project plan or requirements. Others (usually those with more experience) supplement their understanding by running some generic queries against the data. Either way, this is seldom enough.

In fact, in industry case studies, unclear, misunderstood, or incomplete requirements or specifications consistently rank in the top five as reasons for project failure or added risk.

Profiling data is a process, characteristic of data science, aimed at establishing data intimacy (a clearer and more concise grasp of the data and its internal relationships). Profiling also establishes context. There are several general contextual categories that can be used to increase the value and understanding of data for any purpose or project.

These categories include the following:

  • Definitions and explanations: These provide additional information or attributes about data points within your data
  • Comparisons: These add a comparable value to a data point within your data
  • Contrasts: These add an opposite to a data point, to see whether it suggests a different perspective
  • Tendencies: These are mathematical calculations, summaries, or aggregations, such as mean, median, and mode, describing the typical or average value of a dataset (or group within the data)
  • Dispersion: This includes mathematical calculations (or summaries) such as range, variance, and standard deviation, describing how widely the values of a dataset (or group within the data) are spread around its average

Think of data profiling as the process you may already have used to examine the data in a file and collect statistics and information about it. Those statistics most likely drove the logic you implemented in a program, or how you related data across the tables of a database.
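
As a rough illustration of the tendency and dispersion categories, the following minimal Python sketch collects a few profiling statistics for a single numeric column. The function name profile_column and the sample values are hypothetical, invented only for this example. It uses Python's standard statistics module and the population forms of variance and standard deviation, which is an assumption; a sample-based profile would use statistics.variance and statistics.stdev instead.

```python
import statistics


def profile_column(values):
    """Return basic tendency and dispersion statistics for a list of numbers.

    Hypothetical helper for illustration; not part of any library.
    """
    if not values:
        raise ValueError("cannot profile an empty column")
    lowest, highest = min(values), max(values)
    return {
        "count": len(values),
        # Tendencies: where the values typically sit
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Dispersion: how widely the values are spread
        "range": highest - lowest,
        "variance": statistics.pvariance(values),  # population variance (assumed)
        "std_dev": statistics.pstdev(values),      # population standard deviation
    }


# Made-up sample data, used only to exercise the function.
ages = [2, 4, 4, 4, 5, 5, 7, 9]
summary = profile_column(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A summary like this is only a starting point: it covers two of the five categories, and it says nothing about definitions, comparisons, or contrasts, which usually require knowledge of what each data point means.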