- Hands-On Data Science and Python Machine Learning
- Frank Kane
- 2021-07-15 17:15:09
Analyzing the effect of outliers
Just to prove a point, let's add an outlier. We'll take Donald Trump; I think he qualifies as one. I'm going to manually append his income to the incomes data using np.append, and let's say it's a billion dollars (which is obviously not the actual income of Donald Trump):
incomes = np.append(incomes, [1000000000])
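The reassignment in the line above matters: np.append does not modify the array in place, it returns a new array with the value added. A minimal sketch of that behaviour, using a made-up three-element array as a stand-in for incomes (the values and the names result and original_len are illustrative assumptions):

```python
import numpy as np

# Hypothetical stand-in for the incomes array; the values are made up.
incomes = np.array([25000.0, 27000.0, 31000.0])
original_len = len(incomes)

# np.append builds and returns a new array; the input is left untouched.
result = np.append(incomes, [1000000000])

print(len(incomes))  # the original array still has its original length
print(len(result))   # the returned array has one extra element

# Rebinding the name is what actually "adds" the value to incomes.
incomes = result
```

Without the assignment back to incomes, the outlier would be silently dropped and the later median and mean calls would run on the original data.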
What we're going to see is that this outlier doesn't really change the median much. It is still around the same value, $26,911, because one extra value at the far end barely moves where the middle point is, as shown in the following example:
np.median(incomes)
This will output the following:
Out[5]: 26911.948365056276
Now let's look at the mean:
np.mean(incomes)
The following is the output of the preceding code:
Out[5]: 127160.38252311043
Aha, so there you have it! It is a great example of how the median and the mean, although people tend to equate them in everyday language, can be very different and tell very different stories. That one outlier pushed the average income in this dataset to over $127,160 a year, but the more accurate picture is closer to $27,000 a year for the typical person in this dataset. The mean was skewed by one big outlier.
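To see why one value can move the mean so far while barely touching the median, here is a minimal sketch on a small, made-up sample. The five income values and all variable names are illustrative assumptions, not the dataset used above. Appending a value x to n existing values shifts the mean by exactly (x - old_mean) / (n + 1), while the median only moves to a neighbouring middle position.

```python
import numpy as np

# Hypothetical toy sample, not the incomes array from the text.
sample = np.array([20000.0, 25000.0, 27000.0, 30000.0, 33000.0])
outlier = 1_000_000_000.0

# np.append returns a new array with the outlier added at the end.
with_outlier = np.append(sample, [outlier])

mean_before = np.mean(sample)
mean_after = np.mean(with_outlier)
median_before = np.median(sample)
median_after = np.median(with_outlier)

# Shift in the mean predicted by (x - old_mean) / (n + 1).
predicted_shift = (outlier - mean_before) / (len(sample) + 1)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("predicted mean shift:", predicted_shift)
```

In this toy sample the median moves from 27,000 to 28,500, while the mean jumps from 27,000 to roughly 166.7 million. The shift shrinks as n grows, which is why the same billion-dollar outlier in a much larger dataset moves the mean by a smaller, though still misleading, amount.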
The moral of the story is: take anyone who talks about means or averages with a grain of salt if you suspect there might be outliers involved, and income distribution is definitely a case of that.