Outliers

The simplest way to describe outliers is as data points that just don't fit the rest of your data. On inspection, any value that is very high, very low, or simply unusual (within the context of your project) is an outlier. As part of data cleansing, a data scientist would typically identify the outliers and then address them using a generally accepted method:

  • Delete the outlier values or even the actual variable where the outliers exist
  • Transform the values or the variable itself
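As a minimal sketch of the two options above, the following R snippet flags outliers in a small made-up vector and then either deletes or transforms them. The data, the variable names, and the choice of the 1.5 x IQR rule for flagging are all assumptions made for illustration; the text does not prescribe a particular detection rule.

```r
# Hypothetical values; the last one is deliberately extreme.
coin_in <- c(120, 135, 128, 142, 131, 125, 138, 2500)

# Flag values lying more than 1.5 * IQR beyond the quartiles
# (one common rule of thumb, assumed here for illustration).
q <- unname(quantile(coin_in, c(0.25, 0.75)))
fence <- 1.5 * IQR(coin_in)
lower <- q[1] - fence
upper <- q[2] + fence
is_outlier <- coin_in < lower | coin_in > upper

# Option 1: delete the outlier values.
coin_in_deleted <- coin_in[!is_outlier]

# Option 2: transform the values, here by capping them at the fences.
coin_in_capped <- pmin(pmax(coin_in, lower), upper)

# Option 2 (alternative): transform the whole variable, for example with a log.
coin_in_logged <- log(coin_in)

print(coin_in[is_outlier])
print(coin_in_deleted)
print(coin_in_capped)
```

Which option is appropriate depends on the project: deleting loses information, while capping or a log transform keeps every observation but changes its scale.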

Let's look at a real-world example of using R to identify and then address data outliers.

In the world of gaming, slot machines (gambling machines operated by inserting coins into a slot and pulling a handle, which determines the payoff) are quite popular. Most slot machines today are electronic and are programmed so that all their activity is continuously tracked. In our example, investors in a casino want to use this data (as well as various supplementary data) to drive adjustments to their profitability strategy. In other words, what makes for a profitable slot machine? Is it the machine's theme or its type? Are newer machines more profitable than older or retro machines? What about the physical location of the machine? Are lower denomination machines more profitable? We will try to find our answers using the outliers.

We are given a collection or pool of gaming data (formatted as a comma-delimited, or CSV, text file), which includes data points such as the location of the slot machine, its denomination, month, day, year, machine type, age of the machine, promotions, coupons, weather, and coin-in (which is the total amount inserted into the machine less pay-outs). The first step for us as data scientists is to review (sometimes called profile) the data to determine whether any outliers exist. The second step will be to address those outliers.
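The profiling step might look something like the sketch below. The actual file and its column names are not shown here, so the snippet writes a tiny stand-in CSV to a temporary file first. The column names (`Location`, `Denomination`, `Age`, `Coinin`) and every value are hypothetical, and `boxplot.stats()` is just one convenient way to list candidate outliers.

```r
# Build a small stand-in CSV; column names and values are assumptions.
sample_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "Location,Denomination,Age,Coinin",
  "Lobby,0.25,3,1200",
  "Lobby,0.25,5,1350",
  "Entrance,1,2,1280",
  "Entrance,1,7,1420",
  "Bar,0.05,4,1310",
  "Bar,0.05,6,1250",
  "Buffet,0.25,1,1380",
  "Buffet,1,9,25000"
), sample_csv)

# Read the file and review (profile) its structure and distribution.
gaming <- read.csv(sample_csv, stringsAsFactors = FALSE)
str(gaming)
print(summary(gaming$Coinin))

# boxplot.stats() returns, in $out, the points beyond the boxplot whiskers.
suspects <- boxplot.stats(gaming$Coinin)$out
print(gaming[gaming$Coinin %in% suspects, ])
```

With the real file, the same calls would run on the path of the supplied CSV instead of the temporary file, and each numeric column of interest could be reviewed in turn.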
