官术网_书友最值得收藏!

How it works...

In Step 1, we started by reading and describing our data. This step provided us with summary statistics for our dataset. We looked at the number of variables for each datatype in Step 2.

In Step 3, we created two variables, namely, numerical_features and categorical_features, to hold the names of numerical and categorical variables respectively. We used these two variables in the steps when we worked with numerical and categorical features separately. 

In Step 4 and Step 5, we used the seaborn library to plot our charts. We also introduced the melt() function from pandas, which can be used to reshape our DataFrame and feed it to the FacetGrid() function of the seaborn library. Here, we showed how you can paint the distribution plots for all the numerical variables in one single go. We also showed you how to use the same FacetGrid() function to plot the distribution of SalesPrice by each categorical variable.

We generated the correlation matrix in Step 6 using the corr() function of the DataFrame object. However, we noticed that with too many variables, the display does not make it easy for you to identify the correlations. In Step 7, we plotted the correlation matrix heatmap by using the heatmap() function from the seaborn library.

The corr() function c omputes the pairwise correlation of variables, excluding the missing values. The pearson method is used as the default for computing the correlation. You can also use the kendall or spearman methods, depending on your requirements. More information can be found at https://bit.ly/2CdXr8n.

In Step 8, we saw how the numerical variables correlated with the sale prices of houses using a scatter plot matrix. We generated the scatter plot matrix using the regplot() function from the seaborn library. Note that we used a parameter, fit_reg=False, to remove the regression line from the scatter plots.

In Step 9, we repeated Step 8 to see the relationship of the numerical variables with the sale prices of the houses in a numerical format, instead of scatter plots. We also sorted the output in descending order by passing a [::-1] argument to the corr() function.

主站蜘蛛池模板: 东海县| 沙湾县| 宁蒗| 铅山县| 乌兰县| 贵阳市| 铅山县| 柯坪县| 云龙县| 泸州市| 泌阳县| 静乐县| 永寿县| 江北区| 乌审旗| 那坡县| 栾城县| 许昌市| 阜新| 旬阳县| 济宁市| 静海县| 宜丰县| 庐江县| 大冶市| 富顺县| 应城市| 茂名市| 石门县| 海阳市| 上栗县| 神池县| 手游| 新沂市| 呼玛县| 梁河县| 龙州县| 新野县| 龙江县| 宝坻区| 大渡口区|