Dataset preprocessing

When we first dive into data science, a common mistake is to expect all the data to be polished and well behaved from the very beginning. Alas, in a considerable percentage of cases it is not, for many reasons: null data, sensor errors that cause outliers and NaN values, faulty registers, instrument-induced bias, and all kinds of defects that lead to poor model fitting and must be dealt with before training.

The two key processes in this stage are data normalization and feature scaling. Both consist of applying simple transformations, called affine transformations, that map the current unbalanced data into a more manageable shape, maintaining its integrity while giving it better statistical properties and improving the model that is later fitted to it. Because an affine transformation only shifts and rescales the data, the common goal of the standardization techniques is to bring the location and scale of the data distribution closer to those of a standard normal distribution (zero mean, unit variance), with the following techniques:
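As a minimal sketch of what such an affine transformation looks like in practice, the following Python snippet standardizes and min-max scales a single feature using only the standard library. The function names `standardize` and `min_max_scale` and the `readings` sample are hypothetical, chosen for illustration only; they are not taken from the original text.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Affine map v -> (v - mean) / std, giving zero mean and unit variance."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Affine map v -> (v - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid division by zero for a constant feature.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample feature (illustrative values only).
readings = [12.0, 15.0, 11.0, 14.0, 13.0]

z_scores = standardize(readings)
scaled = min_max_scale(readings)
```

Both functions only shift and rescale the input, so the ordering and relative spacing of the values are preserved; which of the two is preferable depends on the model and on how sensitive the data is to outliers.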
