
How to do it...

  1. We start by importing the libraries as follows:
import numpy as np 
import pandas as pd
from sklearn.model_selection import train_test_split

from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.optimizers import Adam

from sklearn.preprocessing import StandardScaler

SEED = 2017
  2. Load the dataset:
data = pd.read_csv('Data/winequality-red.csv', sep=';')
y = data['quality']
X = data.drop(['quality'], axis=1)
  3. Split the data into training and test sets:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)
  4. Print the average quality and the first rows of the training set:
print('Average quality training set: {:.4f}'.format(y_train.mean()))
X_train.head()

The following screenshot shows the first rows of the training data:

Figure 2-8: Training data
  5. An important next step is to normalize the input data:
scaler = StandardScaler().fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train))
X_test = pd.DataFrame(scaler.transform(X_test))
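As an optional sanity check, you can verify that the scaled training features now have roughly zero mean and unit variance:

# After scaling, means should be close to 0 and standard deviations close to 1
print(X_train.mean().round(2))
print(X_train.std().round(2))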
  6. Determine the baseline predictions:
# Predict the mean quality of the training data for each test input
print('MSE:', np.mean((y_test - ([y_train.mean()] * y_test.shape[0])) ** 2).round(4))
## MSE: 0.594
  7. Now, let's build our neural network by defining the network architecture:
model = Sequential()
# First hidden layer with 200 hidden units
model.add(Dense(200, input_dim=X_train.shape[1], activation='relu'))
# Second hidden layer with 25 hidden units
model.add(Dense(25, activation='relu'))
# Output layer
model.add(Dense(1, activation='linear'))
# Set optimizer
opt = Adam()
# Compile model
model.compile(loss='mse', optimizer=opt, metrics=['accuracy'])
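To double-check the architecture before training, you can print an overview of the layers and their parameter counts:

# Print the layer stack and the number of trainable parameters per layer
model.summary()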
  8. Let's define the callbacks for early stopping and for saving the best model:
callbacks = [
    EarlyStopping(monitor='val_acc', patience=20, verbose=2),
    ModelCheckpoint('checkpoints/multi_layer_best_model.h5', monitor='val_acc',
                    save_best_only=True, verbose=0)
]
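Note that ModelCheckpoint does not create the target directory for you; if the checkpoints folder does not exist yet, saving will fail, so create it up front:

import os
# Make sure the checkpoint directory exists before training starts
os.makedirs('checkpoints', exist_ok=True)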
  9. Run the model with a batch size of 64, 5,000 epochs, and a validation split of 20%:
batch_size = 64
n_epochs = 5000
model.fit(X_train.values, y_train, batch_size=batch_size, epochs=n_epochs,
          validation_split=0.2, verbose=2, callbacks=callbacks)
  10. We can now print the performance on the test set after loading the optimal weights:
best_model = model
best_model.load_weights('checkpoints/multi_layer_best_model.h5')
best_model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])

# Evaluate on test set
score = best_model.evaluate(X_test.values, y_test, verbose=0)
print('Test accuracy: %.2f%%' % (score[1]*100))

## Test accuracy: 66.25%
## Benchmark accuracy on dataset 62.4%
With a small dataset, it's advisable to retrain on the complete training set (without a validation set) and increase the number of epochs in proportion to the additional data. Another option is to use cross-validation and average the results when making predictions, as sketched below.
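The following is a minimal sketch of the cross-validation option, reusing the architecture from this recipe; the fold count and the fixed number of epochs per fold are illustrative assumptions, not values from the recipe:

from sklearn.model_selection import KFold

def build_model(n_inputs):
    # Same architecture as defined above
    model = Sequential()
    model.add(Dense(200, input_dim=n_inputs, activation='relu'))
    model.add(Dense(25, activation='relu'))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mse', optimizer=Adam())
    return model

kf = KFold(n_splits=5, shuffle=True, random_state=SEED)
test_preds = []
for train_idx, _ in kf.split(X_train):
    fold_model = build_model(X_train.shape[1])
    # Train on this fold's training portion only (epochs chosen for illustration)
    fold_model.fit(X_train.values[train_idx], y_train.values[train_idx],
                   batch_size=batch_size, epochs=200, verbose=0)
    test_preds.append(fold_model.predict(X_test.values).flatten())

# Average the per-fold predictions on the test set before scoring
y_pred = np.mean(test_preds, axis=0)
print('Cross-validated test MSE: {:.4f}'.format(np.mean((y_test.values - y_pred) ** 2)))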