The nature of data

Data is the plural of datum, so it is always treated as plural. We can find data all around us, in structured or unstructured form and in continuous or discrete form: in weather records, stock market logs, photo albums, music playlists, and our Twitter accounts. In fact, data can be seen as the essential raw material of any kind of human activity. According to the Oxford English Dictionary:

Data are known facts or things used as basis for inference or reckoning.

As shown in the following figure, data can be divided into two distinct types: categorical and numerical.

Categorical data are values or observations that can be sorted into groups or categories. There are two types of categorical values: nominal and ordinal. A nominal variable has no intrinsic ordering to its categories; for example, housing is a categorical variable with two categories (own and rent). An ordinal variable has an established ordering; for example, age can be treated as a variable with three ordered categories (young, adult, and elder).
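The difference between nominal and ordinal variables shows up in which operations make sense on them. The following minimal sketch, using only Python's standard library, models the housing and age examples above as enumerations. The class names Housing and AgeGroup and the numeric ranks 1 to 3 are illustrative assumptions, not part of any particular dataset: counting is meaningful for both kinds, but only the ordinal categories can be sorted.

```python
from collections import Counter
from enum import Enum, IntEnum


class Housing(Enum):
    """Nominal variable (hypothetical model): categories have no intrinsic ordering."""
    OWN = "own"
    RENT = "rent"


class AgeGroup(IntEnum):
    """Ordinal variable (hypothetical model): categories have an established ordering."""
    YOUNG = 1
    ADULT = 2
    ELDER = 3


housing = [Housing.OWN, Housing.RENT, Housing.OWN, Housing.OWN]
ages = [AgeGroup.ELDER, AgeGroup.YOUNG, AgeGroup.ADULT, AgeGroup.YOUNG]

# For nominal data we can count how many observations fall in each group...
housing_counts = Counter(housing)

# ...but asking whether one category is "less than" another is meaningless,
# so a plain Enum refuses the comparison with a TypeError.
try:
    Housing.OWN < Housing.RENT
    nominal_is_orderable = True
except TypeError:
    nominal_is_orderable = False

# Ordinal categories carry a rank, so sorting them is meaningful.
sorted_ages = sorted(ages)

print(housing_counts[Housing.OWN], nominal_is_orderable, [a.name for a in sorted_ages])
# prints: 3 False ['YOUNG', 'YOUNG', 'ADULT', 'ELDER']
```

Using a plain Enum for the nominal variable and an IntEnum for the ordinal one lets the language itself enforce which comparisons are allowed.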

Numerical data are values or observations that can be measured. There are two kinds of numerical values: discrete and continuous. Discrete data are values or observations that can be counted and are distinct and separate; for example, the number of lines in a piece of code. Continuous data are values or observations that may take on any value within a finite or infinite interval; for example, an economic time series such as historic gold prices.
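A short sketch can make the discrete/continuous distinction concrete. It assumes a small hypothetical code snippet for the line count and two made-up price observations (illustrative values only, not real gold market data): a line count is always a whole number with nothing in between, while between any two prices another valid price always exists.

```python
# Discrete: counting the lines in a (hypothetical) piece of code
# always yields a whole number; there is no line count between 3 and 4.
source = """def greet(name):
    return f"Hello, {name}!"
print(greet("data"))
"""
line_count = len(source.splitlines())

# Continuous: prices can take any value within an interval.
# These two observations are hypothetical, not real market data.
price_day1 = 1250.40
price_day2 = 1251.15

# Between any two observed prices, another valid price always exists,
# for example their midpoint.
midpoint = (price_day1 + price_day2) / 2

print(line_count, price_day1 < midpoint < price_day2)
# prints: 3 True
```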

The kinds of datasets used in this book are as follows:

  • E-mails (unstructured, discrete)
  • Digital images (unstructured, discrete)
  • Stock market logs (structured, continuous)
  • Historic gold prices (structured, continuous)
  • Credit approval records (structured, discrete)
  • Social media friends and relationships (unstructured, discrete)
  • Tweets and trending topics (unstructured, continuous)
  • Sales records (structured, continuous)

For each of the projects in this book, we try to use a different kind of data, with the aim of giving the reader the ability to address different kinds of data problems.