Getting ready

In Chapter 1, Get Closer to your Data, we manipulated and prepared the data from the HousePrices.csv file and dealt with the missing values. In this example, we'll use that final, cleaned dataset to demonstrate sampling and resampling techniques.

You can get the prepared dataset from the book's GitHub repository.

We'll start by importing the required libraries and setting the working directory:

# import os for operating system dependent functionalities
import os

# import other required libraries
import pandas as pd
from sklearn.model_selection import train_test_split

# Set your working directory according to your requirement
os.chdir(".../Chapter 3/Resampling Methods")
os.getcwd()

Let's read our data. We'll prefix the DataFrame name with df_ so that it is easy to recognize as a DataFrame:

df_housingdata = pd.read_csv("Final_HousePrices.csv")
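Once the data is loaded, its dimensions can be checked with the DataFrame's shape attribute. The following is a minimal sketch: df_example is a small hypothetical stand-in with made-up column names and values, used only so that the snippet runs without Final_HousePrices.csv. With the real data, you would inspect df_housingdata.shape in the same way.

```python
import pandas as pd

# Hypothetical stand-in for the prepared dataset; the column names and
# values are illustrative only and are not taken from Final_HousePrices.csv
df_example = pd.DataFrame({
    "feature_a": [1, 2, 3, 4],
    "feature_b": [10.0, 20.0, 30.0, 40.0],
    "target": [100, 200, 300, 400],
})

# shape is a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = df_example.shape
print(f"rows: {n_rows}, columns: {n_cols}")
```

Knowing the number of rows up front helps when choosing how much data to hold out for testing.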

In the next section, we'll look at how to use train_test_split() from sklearn.model_selection to split our data into random training and testing subsets.
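As a preview, here is a minimal sketch of what such a split can look like. It is not the recipe's own code: the DataFrame, the column names, the 25% test size, and the random_state value are all assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 20-row dataset; column names and values are illustrative only
df_example = pd.DataFrame({
    "feature_a": range(20),
    "feature_b": range(20, 40),
    "target": range(100, 120),
})

# Separate the predictors from the response variable
X = df_example.drop(columns="target")
Y = df_example["target"]

# Hold out 25% of the rows for testing; random_state makes the shuffle repeatable
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)
```

Because the rows are shuffled before splitting, fixing random_state is what makes the same training and testing subsets come back on every run.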
