Getting ready

In Chapter 1, Get Closer to your Data, we manipulated and prepared the data from the HousePrices.csv file and dealt with the missing values. In this example, we're going to use the final dataset to demonstrate these sampling and resampling techniques.

You can get the prepared dataset from the book's GitHub repository.

We'll start by importing the required libraries and setting the working directory. After that, we'll read the data and take a look at the dimensions of our dataset:

# import os for operating system dependent functionalities
import os

# import other required libraries
import pandas as pd
from sklearn.model_selection import train_test_split

# Set the working directory to the folder that holds the dataset
os.chdir(".../Chapter 3/Resampling Methods")

# Confirm the current working directory
os.getcwd()

Let's read our data. We'll prefix the DataFrame name with df_ so that it is easy to recognize as a DataFrame:

df_housingdata = pd.read_csv("Final_HousePrices.csv")
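To look at the dimensions of a DataFrame, we can use its shape attribute, which returns a (rows, columns) tuple. The following is a minimal sketch on a small made-up DataFrame; df_example and its columns and values are hypothetical stand-ins, not the contents of Final_HousePrices.csv:

```python
import pandas as pd

# Hypothetical stand-in for the housing data; the column names and
# values here are invented purely for illustration
df_example = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# shape returns a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = df_example.shape
print(f"{n_rows} rows, {n_cols} columns")
```

On the real dataset, the same check would be df_housingdata.shape.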

In the next section, we'll look at how to use train_test_split() from sklearn.model_selection to split our data into random training and testing subsets.
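As a quick preview, here is a minimal sketch of such a split on made-up data. The df_toy DataFrame, its columns, the 70/30 ratio, and the random_state value are all assumptions chosen for illustration, not the settings used with the housing data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the housing dataset
df_toy = pd.DataFrame({
    "LotArea": range(1000, 1010),
    "SalePrice": range(200000, 200010),
})

# Separate the feature column from the target column
X = df_toy[["LotArea"]]
y = df_toy["SalePrice"]

# Hold out 30% of the rows for testing; random_state makes the
# shuffle reproducible (both values are illustrative choices)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)
```

Every row ends up in exactly one of the two subsets, so the training and testing indices do not overlap.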
