Data review

When you have successfully loaded your data into Watson Analytics, you should review it and assess its quality.

The IBM Watson Analytics documentation describes data quality as:

Data quality assesses the degree to which a data set is suitable for analysis. A shorthand representation of this assessment is the data quality score. The score is measured on a scale of 0-100, with 100 representing the highest possible data quality.

Further:

The data quality score for a data set is computed by averaging the data quality score for every column in the data set. Several factors affect the data quality score for an individual field or column.
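The averaging rule can be shown in a few lines. This is a minimal sketch. Watson Analytics computes the per-column scores internally and does not expose them through any API used here, so the column names and scores below are hypothetical values on the documented 0-100 scale.

```python
# Hypothetical per-column scores (0-100). In Watson Analytics these come from
# the product's own assessment; the values here are invented for illustration.
column_scores = {"Age": 92.0, "Region": 78.0, "Revenue": 65.0}


def dataset_quality_score(scores: dict) -> float:
    """Return the data set score as the plain average of its column scores."""
    if not scores:
        raise ValueError("data set has no columns")
    return sum(scores.values()) / len(scores)


print(round(dataset_quality_score(column_scores), 2))
```

Because the overall score is an unweighted average, a single poor column (here the hypothetical "Revenue" at 65) pulls the data set score down as much as any other column would. It is worth checking the column-level scores, not just the headline number.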

The factors that can affect the data quality score include:

  • Missing values: Records for which no data are entered.
  • Constant values: Some fields have the same value recorded for every record.
  • Imbalance: Occurs in a categorical field when records are not equally distributed across categories.
  • Influential categories: Those categories that are significantly different from other categories.
  • Outliers: Extreme values.
  • Skewness: Skewness measures how symmetrically the values of a continuous field are distributed. Skewed fields have lower data quality scores.
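You can check several of these factors yourself before uploading a file. The following is a minimal sketch using common textbook measures. They are assumptions: the documentation quoted above does not say which formulas Watson Analytics applies. The sketch covers missing values, constant values, imbalance, outliers (Tukey's interquartile-range fences) and skewness (the moment coefficient). Influential categories are omitted because they depend on a target field. All function names are hypothetical, `None` stands in for a missing entry, and the numeric helpers expect lists without missing values.

```python
import statistics
from collections import Counter


def missing_fraction(values: list) -> float:
    """Share of records with no value entered (None marks a missing entry)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def is_constant(values: list) -> bool:
    """True when every non-missing record holds the same value."""
    return len({v for v in values if v is not None}) <= 1


def imbalance_ratio(values: list) -> float:
    """Largest category count divided by the smallest; 1.0 is perfectly balanced."""
    counts = Counter(v for v in values if v is not None)
    return max(counts.values()) / min(counts.values())


def outliers_iqr(values: list, k: float = 1.5) -> list:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def skewness(values: list) -> float:
    """Moment coefficient of skewness; 0 for a symmetric distribution."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # a constant field has no spread, so report no skew
    return m3 / m2 ** 1.5


# Hypothetical sample columns for illustration.
revenue = [10, 12, 11, 13, 12, 95]
region = ["N", "N", "N", "S"]
print(outliers_iqr(revenue), skewness(revenue) > 0, imbalance_ratio(region))
```

A long right tail, such as the single large value in the sample `revenue` column, shows up both as an outlier and as positive skewness. This matches the way the factors above tend to occur together and lower a field's score.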