Summary

Quite a long chapter, isn't it? Still, this chapter forms the core of almost everything you will learn and implement in data science. Let us wrap up by summarizing the key takeaways:

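  • Data can be subsetted in a variety of ways: by selecting a column, selecting a few rows, or selecting a combination of rows and columns, using the .ix method and the [ ] method. New columns can also be created from existing ones. (Note that .ix has since been deprecated and removed from pandas; .loc and .iloc are its replacements.)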
  • Random numbers can be generated in a number of ways. There are methods such as randint() in the random library of numpy and randrange() in Python's built-in random module. There are also methods such as shuffle, which reorders a list in place, and choice, which randomly selects an element out of a list. randn() and uniform() generate random numbers following the normal and uniform probability distributions, respectively. Random numbers can be used to run simulations and to generate dummy data frames.
  • The groupby() method creates a groupby object on which aggregate, transform, and filter operations can be applied. This is a good way to summarize data for every level of a categorical variable at once.
  • A dataset must be split into training and testing datasets before any modelling is performed. The training dataset is the one on which the model equations are developed. The testing dataset is used to test the performance of the model by comparing the actual result (present in the testing dataset) with the model output. There are various ways to perform this split: one can use choice or shuffle, and scikit-learn has a ready-made method for it.
  • Two datasets can be merged just like two tables in a relational database. There are various kinds of joins: Inner, Left, Right, Outer, and so on. These joins are easier to understand if the datasets are treated as analogous to sets. Inner Join is then the Intersection, Outer Join is the Union, and Left and Right joins keep the entire left and the entire right data frame, respectively.
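
The takeaways above can be sketched in one short script. This is only a minimal illustration, not code from the chapter: the data frame, its column names (customer_id, segment, spend, score), the 80/20 split ratio, and the two small tables named left and right are all hypothetical, and .loc is used in place of .ix so that the example runs on current pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data frame built from seeded random numbers.
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "customer_id": np.arange(1, 11),
    "segment": rng.choice(["A", "B"], size=10),  # random pick from a list
    "spend": rng.uniform(10, 100, size=10),      # uniform distribution
    "score": rng.randn(10),                      # standard normal distribution
})

# Subsetting: one column, a few rows, rows and columns together, a new column.
spend = df["spend"]
first_rows = df[:3]
block = df.loc[0:2, ["customer_id", "spend"]]    # .loc includes the end label
df["spend_rounded"] = df["spend"].round()

# groupby: aggregate per category, and transform back to the original shape.
segment_means = df.groupby("segment")["spend"].mean()
df["spend_centered"] = df["spend"] - df.groupby("segment")["spend"].transform("mean")

# Train/test split by shuffling row positions (assumed 80/20 ratio).
positions = np.arange(len(df))
rng.shuffle(positions)
cut = int(0.8 * len(df))
train = df.iloc[positions[:cut]]
test = df.iloc[positions[cut:]]

# Merging: the four join types on two small hypothetical tables.
left = pd.DataFrame({"key": [1, 2, 3], "left_value": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_value": ["x", "y", "z"]})
inner = pd.merge(left, right, on="key", how="inner")       # intersection of keys
outer = pd.merge(left, right, on="key", how="outer")       # union of keys
left_join = pd.merge(left, right, on="key", how="left")    # every left row kept
right_join = pd.merge(left, right, on="key", how="right")  # every right row kept
```

Shuffling positions by hand keeps the sketch dependent on numpy and pandas only; the ready-made scikit-learn method mentioned above is train_test_split in sklearn.model_selection, which performs the same kind of split in one call.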

Wrangling data into the form you need is a big challenge that must be met before one proceeds to modelling. Once done, however, it opens up a plethora of insights and information to be discovered using predictive models. As Bob Marley said, "If it is easy, it won't be amazing; if it is amazing, it won't be easy."
