
Distributions

A distribution is a representation of how often values appear within a dataset. Let's say, for instance, that one thing you are tracking as a data scientist is the daily sales of a certain product or service, and you have a long list (which you could represent as a vector or part of a matrix) of these daily sales numbers. These sales numbers are part of our dataset, and they include one day with sales of $121, another day with sales of $207, and so on.

There will be one sales number that is the lowest of those we have accumulated and one that is the highest, and the rest of the sales numbers fall somewhere in between (at least if we assume no exact duplicates). The following image represents these low, high, and in-between values of sales along a line:

This is, thus, a distribution of sales, or at least one representation of the distribution of sales. Note that this distribution has areas where there are more numbers and areas where the numbers are a little sparse. Additionally, note that there seems to be a tendency for numbers to be near the center of the distribution.
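To make the idea of dense and sparse areas concrete, here is a minimal Python sketch that finds the lowest and highest values and counts how many sales numbers fall into fixed-width ranges. The `daily_sales` list, the `bin_counts` helper, and the bin width of 20 are all assumptions made up for illustration; only the $121 and $207 values come from the text above.

```python
from collections import Counter

# Hypothetical daily sales figures in dollars. Only 121 and 207 appear in
# the text; the remaining values are invented for illustration.
daily_sales = [121, 207, 154, 168, 149, 175, 160, 133, 190, 158, 163, 171]


def bin_counts(values, bin_width):
    """Count how many values fall into each bin of width bin_width.

    Each bin is keyed by its lower edge, e.g. 140 covers 140 through 159
    when bin_width is 20.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


lowest = min(daily_sales)
highest = max(daily_sales)
counts = bin_counts(daily_sales, 20)

print(f"lowest: {lowest}, highest: {highest}")
for start, n in counts.items():
    # One '#' per sales number that landed in this range.
    print(f"{start}-{start + 19}: {'#' * n}")
```

With this made-up data, the bins near the middle of the range hold the most values and the bins at the ends hold the fewest, which is the same tendency described above. The bin width is an arbitrary choice: a narrower width shows more detail, while a wider one smooths the picture.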
