- Python Data Mining Quick Start Guide
- Nathan Greeneltch
- 2021-06-24 15:19:46
Sample spaces
The sample space is the set of all possible outcomes of a measurement. For example, if a feature column in a dataset records the number of days last month that a respondent watched television, then the sample space is the set of integers {0, 1, 2, ..., 31}. If a manufacturing tool measures the absolute temperature difference before and after processing a widget, then the sample space is the continuous range from 0 to maxT, where maxT is the highest temperature that the tool can measure. Data outside the sample space can be a sign of misreporting or of a systematic misunderstanding of the problem statement, and should trigger further investigation.
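As a minimal sketch of that kind of check, the Python snippet below flags values that fall outside a bounded sample space. The helper name `outside_sample_space`, the sample readings, and the value of `max_t` are all hypothetical and chosen only for illustration; real data would likely call for more careful handling of missing values.

```python
import math


def outside_sample_space(values, low, high, integers_only=False):
    """Return (index, value) pairs that fall outside the sample space [low, high]."""
    flagged = []
    for i, v in enumerate(values):
        # Missing values (None or NaN) are not valid outcomes, so flag them too.
        if v is None or (isinstance(v, float) and math.isnan(v)):
            flagged.append((i, v))
        elif not (low <= v <= high):
            flagged.append((i, v))
        elif integers_only and v != int(v):
            # In range, but not a whole number (e.g. 7.5 days).
            flagged.append((i, v))
    return flagged


# Hypothetical survey column: days of television watched last month.
tv_days = [0, 12, 31, 45, -1, 7.5, None]
suspect_tv_days = outside_sample_space(tv_days, 0, 31, integers_only=True)

# Hypothetical tool readings; max_t = 500.0 is an assumed instrument limit.
max_t = 500.0
temp_diffs = [0.0, 120.4, 499.9, 612.3]
suspect_temp_diffs = outside_sample_space(temp_diffs, 0.0, max_t)

print(suspect_tv_days)
print(suspect_temp_diffs)
```

Returning the index alongside each value makes it easier to trace a suspect entry back to its row when investigating the cause.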
The concept of sample space seems trivial, but it's vital for good data mining practice. Not only does it help you identify outliers and missing or wrong data points, it also orients you to the task at hand and clarifies what the data is meant to represent. Ask yourself this question before you get started: "What is my sample space?"