Sample spaces

The sample space is the set of all possible outcomes of a measurement. For example, if a feature column in a dataset records the number of days last month on which a respondent watched television, then the sample space is the set of integers {0, 1, 2, ..., 31}. If a manufacturing tool measures the temperature difference before and after processing a widget, then the sample space is the continuous range [0, maxT], where maxT is the highest temperature that the tool can measure. Data outside the sample space can be a sign of misreporting or of a systematic misunderstanding of the problem statement, and should trigger further investigation.
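As a rough illustration, the two sample spaces above can be written as membership checks, and any record that fails its check can be flagged for review. This is a minimal sketch, not a prescribed method: the function names (`in_days_space`, `in_temp_diff_space`, `flag_outside`), the parameter `max_t` standing in for maxT, and the decision to treat missing values (None or NaN) as lying outside the space are all illustrative assumptions.

```python
import math
from typing import Callable, Iterable, List


def _is_missing(value) -> bool:
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def in_days_space(value) -> bool:
    """Days watched TV last month: an integer in {0, 1, ..., 31}."""
    if _is_missing(value):
        return False
    # Reject fractional values such as 3.5, then check the bounds.
    return float(value).is_integer() and 0 <= value <= 31


def in_temp_diff_space(value, max_t: float) -> bool:
    """Temperature difference: a real number in the interval [0, max_t].

    max_t is a hypothetical stand-in for the tool's maximum measurable temperature.
    """
    if _is_missing(value):
        return False
    return 0.0 <= value <= max_t


def flag_outside(values: Iterable, check: Callable[[object], bool]) -> List[int]:
    """Return the positions of values that fall outside the sample space."""
    return [i for i, v in enumerate(values) if not check(v)]


if __name__ == "__main__":
    days = [0, 12, 31, 45, -2, 3.5, None]
    print(flag_outside(days, in_days_space))  # [3, 4, 5, 6]

    temps = [0.0, 12.5, 250.0, -1.0, float("nan")]
    print(flag_outside(temps, lambda v: in_temp_diff_space(v, max_t=200.0)))  # [2, 3, 4]
```

Returning positions rather than silently dropping rows keeps the suspicious records available for the follow-up investigation the text recommends.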

The concept of a sample space seems trivial, but it is vital for good data mining practice. Not only does it help you identify outliers and missing or wrong data points, it also orients your mind to the task at hand and helps you understand what the data is meant to represent. Before you get started, ask yourself: "What is my sample space?"