
Introduction

In the real world, data rarely matches textbook definitions and examples. We have to deal with issues such as faulty hardware, uncooperative customers, and disgruntled colleagues. It is difficult to predict which issues you will run into, but it is safe to assume that they will be plentiful and challenging. In this chapter, I will sketch some common approaches to dealing with noisy data, which are based more on rules of thumb than on strict science. Luckily, the trial and error part of data analysis is limited.

Most of this chapter is about outlier management. Outliers are values that we consider to be abnormal. Of course, this is not the only issue that you will encounter, but it is a sneaky one. Another common issue is missing or invalid values, so I will briefly mention masked arrays and pandas features such as the dropna() function, which I have used throughout this book.
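As a quick illustration of the two tools just mentioned, here is a minimal sketch that hides missing and invalid values with a NumPy masked array and removes them with the pandas dropna() function. The readings and the -999 sentinel for an invalid measurement are made up for this example and are not data from the book.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings: NaN marks a missing value and
# -999.0 is an assumed sentinel for an invalid measurement.
readings = np.array([1.2, np.nan, 3.4, -999.0, 2.8])

# Masked array: flag the bad entries instead of deleting them.
invalid = np.isnan(readings) | (readings == -999.0)
masked = np.ma.masked_array(readings, mask=invalid)

# Statistics on a masked array skip the masked entries.
print("Valid count:", masked.count())
print("Mean of valid values:", masked.mean())

# pandas: convert the sentinel to NaN, then drop every NaN.
series = pd.Series(readings).replace(-999.0, np.nan)
cleaned = series.dropna()
print(cleaned)
```

The masked array keeps the original shape and positions, which helps when the values must stay aligned with other arrays. dropna() returns a shorter series that retains the original index labels.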

I have also written two recipes about using mpmath for arbitrary precision calculations. Because of the performance penalty, I don't recommend using mpmath unless you really have to. Usually we can work around numerical issues, so arbitrary precision libraries are rarely needed.
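To give a feel for what working around a numerical issue can look like, the following sketch compares a naive floating-point running sum with math.fsum() from the Python standard library, which tracks intermediate rounding errors. The choice of math.fsum() is my own illustration here and is not one of the mpmath recipes in this chapter.

```python
import math

# Ten copies of 0.1, which has no exact binary representation.
values = [0.1] * 10

# Naive accumulation: rounding errors build up at each step.
naive_total = 0.0
for value in values:
    naive_total += value

# math.fsum() compensates for the lost low-order bits.
compensated_total = math.fsum(values)

print("Naive total equals 1.0:", naive_total == 1.0)
print("fsum total equals 1.0:", compensated_total == 1.0)
```

The naive loop ends up slightly below 1.0, while math.fsum() returns exactly 1.0, all in ordinary double precision and without the cost of an arbitrary precision library.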
