- Python Machine Learning Blueprints
- Alexander Combs Michael Roman
- 2021-07-02 13:49:39
Statsmodels
The first library we'll cover is statsmodels (http://statsmodels.sourceforge.net/), a well-documented Python package for exploring data, estimating models, and running statistical tests. Let's use it here to build a simple linear regression model of the relationship between sepal length and sepal width for the setosa species.
First, let's visually inspect the relationship with a scatterplot:
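    # Assumes matplotlib.pyplot is imported as plt and df holds the iris data;
    # the first 50 rows are the setosa samples.
    fig, ax = plt.subplots(figsize=(7, 7))
    ax.scatter(df['sepal width (cm)'][:50], df['sepal length (cm)'][:50])
    ax.set_ylabel('Sepal Length')
    ax.set_xlabel('Sepal Width')
    ax.set_title('Setosa Sepal Width vs. Sepal Length', fontsize=14, y=1.02)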
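The preceding code generates a scatterplot titled "Setosa Sepal Width vs. Sepal Length", with sepal width on the x-axis and sepal length on the y-axis.
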
So, we can see that there appears to be a positive linear relationship; that is, as the sepal width increases, the sepal length does as well. We'll next run a linear regression on the data using statsmodels to estimate the strength of that relationship:
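    import statsmodels.api as sm

    # Use the column names your DataFrame actually has; the scatterplot
    # code above used the 'sepal length (cm)' style of name.
    y = df['sepal length'][:50]
    x = df['sepal width'][:50]
    X = sm.add_constant(x)  # add a constant column so the model has an intercept
    results = sm.OLS(y, X).fit()
    print(results.summary())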
The preceding code prints the OLS regression results summary for the fitted model.

The preceding output shows the results of our simple regression model. Since this is a linear regression, the model takes the form Y = β₀ + β₁X, where β₀ is the intercept and β₁ is the regression coefficient. Here, the fitted equation is Sepal Length = 2.6447 + 0.6909 * Sepal Width. We can also see that the R² for the model is a respectable 0.558, and the p-value (listed as Prob (F-statistic) in the summary) is highly significant, at least for this species.
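To make the fitted equation concrete, here is a minimal sketch in plain Python of how least squares arrives at an intercept and slope, and of how the equation is then used to predict. The helper names fit_simple_ols and predict, the toy data, and the 3.5 cm example width are hypothetical and are not part of statsmodels or the iris data; only the coefficients 2.6447 and 0.6909 come from the text.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of cross-deviations divided by the sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # Intercept: chosen so the line passes through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(x, intercept, slope):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


# Hypothetical toy data lying exactly on y = 1 + 2x, so the fit recovers it.
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_ols(toy_x, toy_y)
print(b0, b1)

# Coefficients quoted in the text, applied to a hypothetical setosa flower
# with a sepal width of 3.5 cm.
print(predict(3.5, intercept=2.6447, slope=0.6909))
```

With the book's coefficients, a sepal width of 3.5 cm gives a predicted sepal length of roughly 5.06 cm. statsmodels performs the same estimation for us and adds the diagnostics shown in the summary.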
Let's now use the results object to plot our regression line:
    fig, ax = plt.subplots(figsize=(7, 7))
    ax.plot(x, results.fittedvalues, label='regression line')
    ax.scatter(x, y, label='data point', color='r')
    ax.set_ylabel('Sepal Length')
    ax.set_xlabel('Sepal Width')
    ax.set_title('Setosa Sepal Width vs. Sepal Length', fontsize=14, y=1.02)
    ax.legend(loc=2)
The preceding code generates a plot of the data points in red with the regression line drawn through them, and a legend in the upper-left corner.

Because results.fittedvalues holds the model's predicted sepal length for each observed sepal width, plotting it against x draws the fitted regression line.
There are a number of other statistical functions and tests in the statsmodels package, and I invite you to explore them. It is an exceptionally useful package for standard statistical modeling in Python. Let's now move on to the king of Python machine learning packages: scikit-learn.