How it works...

In Step 1, we started by reading and describing our data, which gave us summary statistics for the dataset. In Step 2, we looked at the number of variables of each data type.
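As a rough illustration of these two steps, the sketch below uses a tiny hand-made DataFrame in place of the real dataset. The column names and values are assumptions made for the example, not the recipe's data.

```python
import pandas as pd

# Tiny stand-in for the housing data; names and values are illustrative only.
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "OverallQual": [7, 6, 7],
    "Street": ["Pave", "Pave", "Grvl"],
})

# describe() reports count, mean, std, min, quartiles, and max
# for the numeric columns by default.
summary = df.describe()

# Counting the entries of df.dtypes gives the number of columns per data type.
dtype_counts = df.dtypes.value_counts()

print(summary)
print(dtype_counts)
```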

In Step 3, we created two variables, numerical_features and categorical_features, to hold the names of the numerical and categorical variables respectively. We used these two variables in the later steps, where we worked with numerical and categorical features separately.
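One possible way to build two such lists is with select_dtypes(), shown below on a small illustrative DataFrame. This is a sketch under that assumption and may differ from the exact code used in the recipe.

```python
import pandas as pd

# Illustrative data; the column names are assumptions for this sketch.
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "SalePrice": [208500, 181500, 223500],
    "Street": ["Pave", "Pave", "Grvl"],
})

# Numeric columns: any integer or float dtype.
numerical_features = df.select_dtypes(include="number").columns.tolist()

# Everything that is not numeric is treated as categorical here.
categorical_features = df.select_dtypes(exclude="number").columns.tolist()

print(numerical_features)
print(categorical_features)
```

Keeping the names in plain lists makes it easy to pass either group to later plotting or correlation calls.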

In Step 4 and Step 5, we used the seaborn library to plot our charts. We also introduced the melt() function from pandas, which reshapes a DataFrame from wide to long format so that it can be passed to the FacetGrid() function of the seaborn library. This let us draw the distribution plots for all the numerical variables in a single pass. We used the same FacetGrid() function to plot the distribution of SalePrice by each categorical variable.

We generated the correlation matrix in Step 6 using the corr() function of the DataFrame object. However, we noticed that with too many variables, the display does not make it easy for you to identify the correlations. In Step 7, we plotted the correlation matrix heatmap by using the heatmap() function from the seaborn library.
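A minimal sketch of these two steps is shown below on random data. The column names, color map, and value range are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in data; the column names are illustrative.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)),
                  columns=["LotArea", "GrLivArea", "OverallQual", "SalePrice"])

# Pairwise correlation matrix of all numeric columns.
corr_matrix = df.corr()

# The heatmap colors each cell by its correlation value;
# fixing the range to [-1, 1] keeps the color scale symmetric around zero.
ax = sns.heatmap(corr_matrix, vmin=-1, vmax=1, cmap="coolwarm",
                 annot=True, fmt=".2f")
```

With many variables, the colored grid is usually easier to scan than the printed matrix, although annot=True may become cluttered and can be dropped.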

The corr() function computes the pairwise correlation of variables, excluding missing values. It uses the pearson method by default; you can also use the kendall or spearman methods, depending on your requirements. More information can be found at https://bit.ly/2CdXr8n.
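The small example below illustrates both points on made-up data: the row containing a missing value is left out of the calculation, and the method argument switches between correlation measures. Only pearson and spearman are shown here.

```python
import numpy as np
import pandas as pd

# y grows as the square of x, so the relationship is monotonic but not linear.
# The last row has a missing x and is excluded from the pairwise calculation.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, np.nan],
    "y": [1.0, 4.0, 9.0, 16.0, 25.0],
})

pearson = df.corr()                     # default: method="pearson"
spearman = df.corr(method="spearman")   # rank-based correlation

print(pearson.loc["x", "y"])
print(spearman.loc["x", "y"])
```

Because the Spearman method works on ranks, it reports a perfect correlation for any strictly increasing relationship, while the Pearson value stays slightly below 1 for this curved one.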

In Step 8, we saw how the numerical variables correlated with the sale prices of houses using a scatter plot matrix. We generated the scatter plot matrix using the regplot() function from the seaborn library. Note that we used a parameter, fit_reg=False, to remove the regression line from the scatter plots.
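The sketch below shows one way to lay out such scatter plots in a grid of subplots. The data, the feature list, and the figure size are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative data; names and values are made up for the sketch.
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "GrLivArea": [1710, 1262, 1786, 1717, 2198],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

features = ["LotArea", "GrLivArea"]
fig, axes = plt.subplots(1, len(features), figsize=(8, 3))

# One scatter plot per feature against the target.
# fit_reg=False suppresses the fitted regression line.
for ax, col in zip(axes, features):
    sns.regplot(x=df[col], y=df["SalePrice"], fit_reg=False, ax=ax)

fig.tight_layout()
```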

In Step 9, we repeated the analysis from Step 8 in numerical form, listing the correlation of each numerical variable with the sale price instead of drawing scatter plots. We also sorted the output in descending order by applying [::-1] to the sorted result; [::-1] is a slice that reverses the order, not an argument to the corr() function.
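A minimal sketch of this idea follows, assuming a target column named SalePrice and two made-up feature columns, A and B.

```python
import pandas as pd

# Made-up data: A rises with SalePrice, B tends to fall.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [5, 3, 4, 1, 2],
    "SalePrice": [10, 20, 30, 40, 55],
})

# Take the SalePrice column of the correlation matrix, sort it in
# ascending order, then reverse it with the [::-1] slice.
corr_with_price = df.corr()["SalePrice"].sort_values()[::-1]

print(corr_with_price)
```

Calling sort_values(ascending=False) is an equivalent way to get the descending order without the reversing slice.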
