
Loading data

We can load the data used in this chapter with the following function. It is very similar to the function we used in Chapter 2, but it is adapted for this dataset.

import pandas as pd
from sklearn.preprocessing import StandardScaler

# TRAIN_DATA, VAL_DATA, and TEST_DATA are assumed to be CSV file paths defined elsewhere.


def load_data():
    """Loads train, val, and test datasets from disk"""
    train = pd.read_csv(TRAIN_DATA)
    val = pd.read_csv(VAL_DATA)
    test = pd.read_csv(TEST_DATA)

    # we will use a dict to keep all this data tidy.
    data = dict()
    # pop() removes the target column 'y', so only the features are left to scale.
    data["train_y"] = train.pop('y')
    data["val_y"] = val.pop('y')
    data["test_y"] = test.pop('y')

    # we will use sklearn's StandardScaler to scale our data to 0 mean, unit variance.
    # the scaler is fit on the training set only, then reused for val and test.
    scaler = StandardScaler()
    train = scaler.fit_transform(train)
    val = scaler.transform(val)
    test = scaler.transform(test)

    data["train_X"] = train
    data["val_X"] = val
    data["test_X"] = test
    # it's a good idea to keep the scaler (or at least the mean/variance) so we can
    # scale new inputs or undo the scaling later
    data["scaler"] = scaler
    return data
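
To make the scaling step concrete, here is a minimal sketch of the arithmetic that standardization performs on a single feature column, written with the standard library only. It is not sklearn's implementation: the helper names fit_scaler, scale, and unscale and the toy numbers are hypothetical, and the sketch assumes the column is not constant (a zero standard deviation would divide by zero here). It uses the population standard deviation, which is also what StandardScaler uses.

```python
from statistics import fmean, pstdev


def fit_scaler(column):
    """Return (mean, std) for one feature column, using the population std."""
    return fmean(column), pstdev(column)


def scale(column, mean, std):
    """Standardize values using previously fitted statistics."""
    return [(x - mean) / std for x in column]


def unscale(column, mean, std):
    """Invert the standardization to recover the original units."""
    return [x * std + mean for x in column]


# Hypothetical toy feature column, split into train and validation parts.
train_col = [2.0, 4.0, 6.0, 8.0]
val_col = [5.0, 10.0]

# The statistics come from the training data only.
mean, std = fit_scaler(train_col)

train_scaled = scale(train_col, mean, std)
# Validation data reuses the training statistics rather than its own.
val_scaled = scale(val_col, mean, std)
# Keeping mean and std around lets us map scaled values back again.
val_restored = unscale(val_scaled, mean, std)
```

Fitting on the training column and reusing those statistics for validation and test data mirrors the fit_transform / transform split in load_data, and keeping the fitted values is what makes the inverse mapping possible.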