官术网_书友最值得收藏!

How it works...

In Step 1, we started by reading and describing our data. This step provided us with summary statistics for our dataset. We looked at the number of variables for each datatype in Step 2.

In Step 3, we created two variables, namely, numerical_features and categorical_features, to hold the names of numerical and categorical variables respectively. We used these two variables in the steps when we worked with numerical and categorical features separately. 

In Step 4 and Step 5, we used the seaborn library to plot our charts. We also introduced the melt() function from pandas, which can be used to reshape our DataFrame and feed it to the FacetGrid() function of the seaborn library. Here, we showed how you can paint the distribution plots for all the numerical variables in one single go. We also showed you how to use the same FacetGrid() function to plot the distribution of SalesPrice by each categorical variable.

We generated the correlation matrix in Step 6 using the corr() function of the DataFrame object. However, we noticed that with too many variables, the display does not make it easy for you to identify the correlations. In Step 7, we plotted the correlation matrix heatmap by using the heatmap() function from the seaborn library.

The corr() function c omputes the pairwise correlation of variables, excluding the missing values. The pearson method is used as the default for computing the correlation. You can also use the kendall or spearman methods, depending on your requirements. More information can be found at https://bit.ly/2CdXr8n.

In Step 8, we saw how the numerical variables correlated with the sale prices of houses using a scatter plot matrix. We generated the scatter plot matrix using the regplot() function from the seaborn library. Note that we used a parameter, fit_reg=False, to remove the regression line from the scatter plots.

In Step 9, we repeated Step 8 to see the relationship of the numerical variables with the sale prices of the houses in a numerical format, instead of scatter plots. We also sorted the output in descending order by passing a [::-1] argument to the corr() function.

主站蜘蛛池模板: 新野县| 东平县| 长宁区| 海城市| 石河子市| 延长县| 彰武县| 永州市| 平遥县| 镇江市| 崇信县| 保定市| 上高县| 咸丰县| 沐川县| 青田县| 开阳县| 社会| 永和县| 武夷山市| 寻乌县| 寻乌县| 大余县| 大冶市| 井冈山市| 太康县| 岳阳县| 太康县| 讷河市| 丘北县| 芜湖市| 呼图壁县| 临高县| 林周县| 嘉荫县| 保定市| 天柱县| 全州县| 建德市| 灵宝市| 桐庐县|