
The nature of data

The word data is the plural of datum, so it is traditionally treated as plural. Data are found everywhere in the world around us, whether structured or unstructured, continuous or discrete: in weather records, stock market logs, photo albums, music playlists, and our Twitter accounts. In fact, data can be seen as the essential raw material of any kind of human activity. According to the Oxford English Dictionary:

Data are known facts or things used as basis for inference or reckoning.

As shown in the following figure, data can be classified into two distinct types, categorical and numerical:

Categorical data are values or observations that can be sorted into groups or categories. There are two types of categorical values, nominal and ordinal. A nominal variable has no intrinsic ordering to its categories. For example, housing is a categorical variable with two categories (own and rent). An ordinal variable has an established ordering. For example, age can be treated as a variable with three ordered categories (young, adult, and elder).
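The difference between nominal and ordinal values can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library; the sample lists and the names `AGE_ORDER` and `AGE_RANK` are assumptions made for the example, not data from this book.

```python
from collections import Counter

# Nominal: the categories have no intrinsic order (housing: own or rent),
# so counting is meaningful but sorting or comparing them is not.
housing = ["own", "rent", "rent", "own", "rent"]  # made-up sample
housing_counts = Counter(housing)

# Ordinal: the categories have an established order (young < adult < elder).
# The order is stated explicitly, because alphabetical order would be wrong.
AGE_ORDER = ["young", "adult", "elder"]
AGE_RANK = {label: rank for rank, label in enumerate(AGE_ORDER)}

ages = ["adult", "elder", "young", "adult"]  # made-up sample
ages_sorted = sorted(ages, key=AGE_RANK.__getitem__)
oldest_group = max(ages, key=AGE_RANK.__getitem__)

print(housing_counts)
print(ages_sorted)
print(oldest_group)
```

Sorting and taking a maximum only make sense for the ordinal variable. For the nominal one, frequency counts are the natural summary.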

Numerical data are values or observations that can be measured. There are two kinds of numerical values, discrete and continuous. Discrete data are values or observations that can be counted and are distinct and separate, for example, the number of lines in a piece of code. Continuous data are values or observations that may take on any value within a finite or infinite interval, for example, an economic time series such as historic gold prices.
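As a rough sketch of the same distinction in Python, a count of lines is naturally an integer, while prices are floating-point measurements. The code snippet and the gold prices below are hypothetical values chosen only for illustration.

```python
from statistics import mean

# Discrete: the number of lines in a piece of code is a count,
# so it can only take whole, separate values (0, 1, 2, ...).
source = "import math\nprint(math.pi)\n"  # hypothetical code sample
line_count = len(source.splitlines())

# Continuous: a price can take any value within an interval,
# so it is stored as a floating-point number.
gold_prices = [1250.25, 1251.80, 1249.95]  # hypothetical prices, not real data
average_price = mean(gold_prices)

print(type(line_count).__name__, line_count)
print(type(average_price).__name__, round(average_price, 2))
```

The average of a continuous series is itself a meaningful value. The average of a discrete count may fall between two values that the variable can never actually take.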

The kinds of datasets used in this book are as follows:

  • E-mails (unstructured, discrete)
  • Digital images (unstructured, discrete)
  • Stock market logs (structured, continuous)
  • Historic gold prices (structured, continuous)
  • Credit approval records (structured, discrete)
  • Social media friends and relationships (unstructured, discrete)
  • Tweets and trending topics (unstructured, continuous)
  • Sales records (structured, continuous)

Each project in this book uses a different kind of data, so that the reader learns to address different kinds of data problems.
