- Hands-On Data Science and Python Machine Learning
- Frank Kane
- 2021-07-15 17:15:08
The factor of outliers
Now in the preceding example of the number of kids in each household, the median and the mean were pretty close to each other because there weren't a lot of outliers. We had 0, 1, 2, or 3 kids, but we didn't have some wacky family that had 100 kids. That would have really skewed the mean, but it might not have changed the median too much. That's why the median is often a very useful thing to look at and often overlooked.
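To make that concrete, here is a minimal sketch using Python's standard `statistics` module. The household counts are made up for illustration (they are not the data from the preceding example), and the 100-kid family is the hypothetical outlier described above.

```python
from statistics import mean, median

# Hypothetical survey: number of kids in each of ten households.
kids = [0, 1, 2, 3, 1, 0, 2, 1, 3, 2]
print("no outlier:  ", "mean =", mean(kids), "median =", median(kids))

# The same households plus one wacky family with 100 kids.
kids_with_outlier = kids + [100]
print("with outlier:", "mean =", round(mean(kids_with_outlier), 2),
      "median =", median(kids_with_outlier))
```

With these made-up numbers, the single outlier pulls the mean from 1.5 up past 10, while the median only moves from 1.5 to 2.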
People sometimes use statistics to mislead. I'm going to keep pointing this out throughout the book wherever I can.
For example, you can talk about the mean, or average, household income in the United States. When I looked it up, that number for last year was $72,000 or so, but it doesn't really give an accurate picture of what the typical American makes. If you look at the median income, it's much lower, at $51,939. Why is that? Well, because of income inequality. There are a few very rich people in America, and the same is true in a lot of other countries. America's not even the worst, but those billionaires, those super-rich people who live on Wall Street or in Silicon Valley or some other super-rich place, skew the mean. But there are so few of them that they don't really affect the median much.
This is a great example of where the median tells a much better story about the typical person or data point than the mean does. Whenever someone talks about the mean, you have to think about what the data distribution looks like. Are there outliers that might be skewing that mean? If the answer is potentially yes, you should also ask for the median, because it often provides more insight than the mean.
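One way to act on that advice is to report the mean and the median together and look at how far apart they are. The sketch below does this with Python's standard `statistics` module. The helper name `mean_median_gap` and the income figures are invented for illustration; they are not real survey data and are not the numbers quoted above.

```python
from statistics import mean, median

def mean_median_gap(values):
    """Return (mean, median, relative gap of the mean over the median).

    Hypothetical helper for illustration; assumes the median is non-zero.
    """
    m = mean(values)
    med = median(values)
    return m, med, (m - med) / med

# Made-up incomes: nine ordinary households and one very rich one.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000,
           52_000, 55_000, 60_000, 70_000, 5_000_000]

m, med, gap = mean_median_gap(incomes)
print(f"mean = {m:,.0f}, median = {med:,.0f}, mean exceeds median by {gap:.0%}")
```

A large gap like this one is a hint, not proof, that a few extreme values are skewing the mean, and that the median is the better summary of the typical data point.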