Getting ready

In Chapter 1, Get Closer to Your Data, we cleaned and prepared the data from the HousePrices.csv file and dealt with its missing values. In this example, we'll use that final dataset to demonstrate sampling and resampling techniques.

You can get the prepared dataset from GitHub.

We'll start by importing the required libraries and setting the working directory. After that, we'll read the data and take a look at the dimensions of our dataset:

# import os for operating system dependent functionalities
import os

# import other required libraries
import pandas as pd
from sklearn.model_selection import train_test_split

# Set the working directory to the folder that contains the dataset
os.chdir(".../Chapter 3/Resampling Methods")

# Confirm the current working directory
os.getcwd()

Let's read our data. We'll prefix the DataFrame name with df_ so that it's clear the variable holds a DataFrame:

df_housingdata = pd.read_csv("Final_HousePrices.csv")
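To check the dimensions of a DataFrame, we can use its shape attribute, which returns a (rows, columns) tuple. The following is a minimal sketch that uses a small stand-in DataFrame, because the contents of Final_HousePrices.csv aren't shown here. The name df_example and its columns and values are hypothetical and are for illustration only:

```python
import pandas as pd

# Hypothetical stand-in for df_housingdata; the column names and values
# are illustrative and are not taken from Final_HousePrices.csv
df_example = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# shape returns a (number of rows, number of columns) tuple
n_rows, n_cols = df_example.shape
print("Rows:", n_rows, "Columns:", n_cols)
```

Calling df_housingdata.shape on the real dataset works in the same way.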

In the next section, we'll look at how to use train_test_split() from sklearn.model_selection to split our data into random training and testing subsets.
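As a quick preview, the following sketch shows the general shape of a train_test_split() call. It is not the recipe's own code: the DataFrame df_demo, its column names, the 70/30 split, and the random_state value are all assumptions chosen for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data; not the housing dataset
df_demo = pd.DataFrame({
    "feature_a": range(10),
    "feature_b": range(10, 20),
    "target": range(100, 110),
})

# Separate the predictors from the response variable
X = df_demo.drop(columns="target")
y = df_demo["target"]

# Hold out 30% of the rows for testing; random_state makes the
# shuffle reproducible (both values are assumptions for this sketch)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

print(X_train.shape, X_test.shape)
```

With 10 rows and test_size=0.3, the split places 7 rows in the training subset and 3 rows in the testing subset.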
