
How to do it...

Here, we have a dataset based on the properties of cancerous tumors. Using this dataset, we'll build multiple classification models with diagnosis as our response variable. The diagnosis variable has the values B and M, which indicate whether the tumor is benign or malignant. With multiple learners, we extract multiple predictions. The weighted-averaging technique takes a weighted average of the predicted values from all of the models for each sample.

In this example, we consider the predicted probabilities as the output and use the predict_proba() function of the scikit-learn algorithms to predict the class probabilities:

  1. Import the required libraries:
# Import required libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
  2. Create the response and feature sets:
# Create feature and response variable set
# We create train & test sample from our dataset
from sklearn.model_selection import train_test_split

# create feature & response variables
X = df_cancerdata.iloc[:,2:32]
Y = df_cancerdata['diagnosis']

We retrieved the feature columns using the iloc indexer of the pandas DataFrame, which is purely integer-location based indexing for selection by position. The indexer takes row and column selections as its parameters, in the form data.iloc[<row selection>, <column selection>]. The row and column selections can each be an integer list or a slice of rows and columns. For example, it might look as follows: df_cancerdata.iloc[2:100, 2:30].
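To illustrate position-based selection, here is a minimal sketch using a toy DataFrame (the column names and values are hypothetical stand-ins for df_cancerdata):

```python
import pandas as pd

# Toy DataFrame standing in for df_cancerdata (hypothetical values)
df = pd.DataFrame({
    'id': [1, 2, 3],
    'diagnosis': ['B', 'M', 'B'],
    'radius_mean': [11.2, 18.4, 12.1],
    'texture_mean': [14.3, 21.0, 15.8],
})

# All rows, columns from position 2 onwards (the feature columns)
features = df.iloc[:, 2:]
print(list(features.columns))  # ['radius_mean', 'texture_mean']

# Slice rows and columns by position at the same time
subset = df.iloc[0:2, 2:4]
print(subset.shape)  # (2, 2)
```

Note that iloc slices, like all Python slices, exclude the end position, so iloc[:, 2:32] selects the columns at positions 2 through 31.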
  3. We'll then split our data into training and testing sets:
# Create train & test sets
X_train, X_test, Y_train, Y_test = \
train_test_split(X, Y, test_size=0.20, random_state=1)
  4. Build the base classifier models:
# create the sub models
estimators = []

dt_model = DecisionTreeClassifier()
estimators.append(('DecisionTree', dt_model))

svm_model = SVC(probability=True)
estimators.append(('SupportVector', svm_model))

logit_model = LogisticRegression()
estimators.append(('Logistic Regression', logit_model))
  5. Fit the models on the training data:
dt_model.fit(X_train, Y_train)
svm_model.fit(X_train, Y_train)
logit_model.fit(X_train, Y_train)
  6. Use the predict_proba() function to predict the class probabilities:
dt_predictions = dt_model.predict_proba(X_test)
svm_predictions = svm_model.predict_proba(X_test)
logit_predictions = logit_model.predict_proba(X_test)
  7. Assign different weights to each of the models to get our final predictions:
weighted_average_predictions = (dt_predictions * 0.3
                                + svm_predictions * 0.4
                                + logit_predictions * 0.3)
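The weighted average itself is plain element-wise arithmetic on the probability arrays, and the final class for each sample is the one with the highest averaged probability. The following sketch uses small made-up probability arrays in place of the real model outputs, with the same weights as above (which sum to 1):

```python
import numpy as np

# Toy predicted-probability arrays for 4 samples and 2 classes (B, M);
# hypothetical stand-ins for dt_predictions, svm_predictions, logit_predictions
dt_predictions = np.array([[0.90, 0.10], [0.20, 0.80], [0.60, 0.40], [0.30, 0.70]])
svm_predictions = np.array([[0.80, 0.20], [0.10, 0.90], [0.70, 0.30], [0.40, 0.60]])
logit_predictions = np.array([[0.85, 0.15], [0.25, 0.75], [0.55, 0.45], [0.35, 0.65]])

# Weighted average of the class probabilities, weights summing to 1
weighted_average_predictions = (dt_predictions * 0.3
                                + svm_predictions * 0.4
                                + logit_predictions * 0.3)

# Pick the class with the highest averaged probability for each sample
final_classes = weighted_average_predictions.argmax(axis=1)
print(final_classes)  # [0 1 0 1]
```

Each row of weighted_average_predictions still sums to 1, so it can be read as the ensemble's class-probability estimate for that sample.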