Math and statistics

Statistics and other math skills are essential in several phases of a data science project. Even at the beginning of data exploration, you'll be dividing the features of your data observations into categories:

  • Categorical
  • Numeric:
    • Discrete 
    • Continuous 

Categorical values describe an item by naming one of its attributes. Imagine you have a dataset about cars: the car brand would be a typical categorical value, and the color would be another.

Numerical values, on the other hand, can be split into two categories: discrete and continuous. Discrete values are counts of observations, such as how many people purchased a product, so they can take only separate, countable values. Continuous values can take any of an infinite number of values within a range and are represented with real numbers, such as a car's weight or engine displacement. In a nutshell, discrete variables are like points plotted on a chart, whereas a continuous variable can be plotted as a line.
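As a rough illustration, the sketch below labels the columns of a tiny, made-up car dataset as categorical, discrete, or continuous by inspecting the Python types of their values. The column names, the data, and the type-based rule in `variable_kind` are assumptions for illustration only; it is a heuristic, not a standard method.

```python
from numbers import Integral, Real


def variable_kind(values):
    """Heuristic: guess the kind of a column from the Python types of its values."""
    if all(isinstance(v, str) for v in values):
        return "categorical"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, Integral) and not isinstance(v, bool) for v in values):
        return "discrete"
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        return "continuous"
    return "mixed/unknown"


# Hypothetical car dataset: two attributes, one count, one measurement.
cars = {
    "brand": ["Toyota", "Ford", "Toyota"],
    "color": ["red", "blue", "green"],
    "units_sold": [120, 85, 240],       # counts of purchases
    "engine_litres": [1.8, 2.0, 2.5],   # measured on a continuous scale
}

for column, values in cars.items():
    print(f"{column}: {variable_kind(values)}")
# brand: categorical
# color: categorical
# units_sold: discrete
# engine_litres: continuous
```

Storage types only hint at the real kind of a variable: a ZIP code stored as an integer is still categorical, and a weight rounded to whole kilograms is still continuous. In practice, you decide the kind from what the column means.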

Another way to classify data is by its level of measurement. From this point of view, we can split data into two primary categories:

  • Qualitative:
    • Nominal
    • Ordinal
  • Quantitative:
    • Interval
    • Ratio

Nominal variables can't be ordered; they only describe an attribute. An example would be the color of a product: it describes how the product looks, but you can't impose any ordering on colors, such as saying that red is bigger than green. Ordinal variables also describe a feature with a categorical value, but the values follow a natural order; for example, education level: elementary school, high school, university degree, and so on.
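A minimal sketch of the difference, assuming education is stored as plain strings: for an ordinal variable we record the order explicitly and sort by it, while for a nominal variable such as color we only count equal values. The `"doctorate"` level, the sample people, and the helper names are hypothetical, chosen to show that alphabetical order is not the same as the real ordering.

```python
from collections import Counter

# Ordinal: the levels have an agreed order, so we record it explicitly.
# "doctorate" is an assumed extra level, added to show that alphabetical
# order differs from the meaningful order.
EDUCATION_LEVELS = ["elementary", "high school", "university degree", "doctorate"]
EDUCATION_RANK = {level: rank for rank, level in enumerate(EDUCATION_LEVELS)}


def sort_by_education(people):
    """Sort (name, education) pairs from the lowest to the highest level."""
    return sorted(people, key=lambda person: EDUCATION_RANK[person[1]])


def count_colors(colors):
    """Nominal values support equality only: we can count them, not rank them."""
    return Counter(colors)


people = [("Ana", "doctorate"), ("Ben", "elementary"), ("Dee", "high school")]
print(sort_by_education(people))
# [('Ben', 'elementary'), ('Dee', 'high school'), ('Ana', 'doctorate')]
print(sorted(level for _, level in people))
# ['doctorate', 'elementary', 'high school']  (alphabetical, not meaningful)
print(count_colors(["red", "green", "red"]))
# Counter({'red': 2, 'green': 1})
```

Python will happily compare `"red" < "green"` as strings, but that comparison is alphabetical and says nothing about the colors themselves, which is exactly why nominal data supports only counting and equality checks.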

With quantitative values, it's a different story. Both interval and ratio values have meaningful, equal-sized differences; the major difference is that a ratio value has a true zero. Imagine the attribute is a length: if the length is 0, you know there's no length at all, and a 2 m board really is twice as long as a 1 m board. This does not apply to temperature in degrees Celsius or Fahrenheit, because 0 °C and 0 °F are arbitrary points on the scale rather than its beginning (absolute zero, the true beginning of the scale, is -273.15 °C or -459.67 °F). That is why 20 °C is not twice as hot as 10 °C, even though the 10-degree difference between them is meaningful. Measured in kelvins (K), temperature is a ratio value, since that scale really begins at 0 K. So, as you can see, the same quantity can be an interval or a ratio value; it depends on the scale and the context!
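To make the interval/ratio distinction concrete, the sketch below converts temperatures to kelvins using the absolute-zero values quoted above and compares differences and ratios on both scales. The function names are our own, chosen for illustration.

```python
# Absolute zero, the true beginning of the temperature scale, in degrees Celsius.
ABSOLUTE_ZERO_C = -273.15


def celsius_to_kelvin(celsius):
    """Shift a Celsius reading so that 0 means absolute zero."""
    return celsius - ABSOLUTE_ZERO_C


def fahrenheit_to_kelvin(fahrenheit):
    """Absolute zero is -459.67 °F; one kelvin spans 9/5 Fahrenheit degrees."""
    return (fahrenheit + 459.67) * 5 / 9


warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on an interval scale and survive the change of unit.
print(warm_c - cool_c)  # 10.0
print(round(celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c), 6))  # 10.0

# Ratios are not: the Celsius ratio suggests "twice as hot", the Kelvin ratio does not.
print(warm_c / cool_c)  # 2.0
print(round(celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c), 3))  # 1.035
```

The difference of 10 degrees is the same on both scales, but the ratio changes from 2.0 to about 1.035 once the zero point is moved to absolute zero. Only the ratio computed in kelvins is physically meaningful.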
