官术网_书友最值得收藏!

  • Machine Learning in Java
  • AshishSingh Bhatia Bostjan Kaluza
  • 464字
  • 2021-06-10 19:29:55

Measurement scales

The most common way to represent data is using a set of attribute-value pairs. Consider the following example:

Bob = { 
height: 185cm, 
eye color: blue, 
hobbies: climbing, sky diving 
} 

For example, Bob has attributes named height, eye color, and hobbies with the values 185cm, blue, climbing, and sky diving respectively.

A set of data can be presented simply as a table, where columns correspond to attributes or features and rows correspond to particular data examples or instances. In supervised machine learning, the attribute whose value we want to predict the outcome, Y, from the values of the other attributes, X, is denoted as the class or target variable, as shown in the following table:

The first thing we notice is how much the attribute values vary. For instance, height is a number, eye color is text, and hobbies are a list. To gain a better understanding of the value types, let's take a closer look at the different types of data or measurement scales. Stanley Smith Stevens (1946) defined the following four scales of measurement with increasingly expressive properties:

  • Nominal data consists of data that is mutually exclusive, but not ordered. Examples include eye color, marital status, type of car owned, and so on.
  • Ordinal data correspond to categories where order matters, but not the difference between the values, such as pain level, student letter grades, service quality ratings, IMDb movie ratings, and so on.
  • Interval data consists of data where the difference between two values is meaningful, but there is no concept of zero, for instance, standardized exam scores, temperature in Fahrenheit, and so on.
  • Ratio data has all of the properties of an interval variable and also a clear definition of zero; when the variable is equal to zero, this variable would be missing. Variables such as height, age, stock prices, and weekly food spending are ratio variables.

Why should we care about measurement scales? Well, machine learning depends heavily on the statistical properties of the data; hence, we should be aware of the limitations that each data type possesses. Some machine learning algorithms can only be applied to a subset of measurement scales.

The following table summarizes the main operations and statistics properties for each of the measurement types:

Furthermore, nominal and ordinal data correspond to discrete values, while interval and ratio data can correspond to continuous values as well. In supervised learning, the measurement scale of the attribute values that we want to predict dictates the kind of machine algorithm that can be used. For instance, predicting discrete values from a limited list is called classification and can be achieved using decision trees, while predicting continuous values is called regression, which can be achieved using model trees.

主站蜘蛛池模板: 绥滨县| 沽源县| 溧阳市| 安溪县| 容城县| 周口市| 高陵县| 大洼县| 十堰市| 水城县| 肇东市| 全南县| 宜州市| 临沧市| 健康| 灵石县| 高青县| 广丰县| 孝昌县| 舞阳县| 龙泉市| 通榆县| 泊头市| 前郭尔| 江门市| 邵阳县| 赣榆县| 宁阳县| 临西县| 夏河县| 岐山县| 桐庐县| 湟源县| 沭阳县| 华宁县| 龙泉市| 永胜县| 康保县| 闵行区| 云和县| 岑溪市|