Getting ready

In Chapter 1, Get Closer to your Data, we prepared the data from the HousePrices.csv file and dealt with its missing values. In this example, we're going to use that final dataset to demonstrate sampling and resampling techniques.

You can get the prepared dataset from GitHub.

First, we'll import the required libraries and set the working directory. After that, we'll read the data and take a look at the dimensions of our dataset:

# import os for operating system dependent functionalities
import os

# import other required libraries
import pandas as pd
from sklearn.model_selection import train_test_split

# Set your working directory according to your requirement
os.chdir(".../Chapter 3/Resampling Methods")
os.getcwd()

Let's read our data. We'll prefix the DataFrame name with df_ to make it easier to understand:

df_housingdata = pd.read_csv("Final_HousePrices.csv")
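To check the dimensions once the data is loaded, the shape attribute of a DataFrame returns a (rows, columns) tuple. The following is a minimal sketch, not part of the original recipe: because Final_HousePrices.csv may not be at hand, it reads a tiny in-memory CSV whose column names and values are made up for illustration, and df_example is a hypothetical name. With the real file, the same calls would apply to df_housingdata:

```python
import io

import pandas as pd

# Hypothetical stand-in for Final_HousePrices.csv.
# The column names and values below are invented for illustration only.
csv_text = """LotArea,OverallQual,SalePrice
8450,7,208500
9600,6,181500
11250,7,223500
"""

df_example = pd.read_csv(io.StringIO(csv_text))

# shape is a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = df_example.shape
print(f"rows: {n_rows}, columns: {n_cols}")

# Count missing values per column to confirm that none remain
missing_per_column = df_example.isnull().sum()
print(missing_per_column)
```

Checking for remaining missing values at this point is a cheap sanity test that the prepared file is the one you meant to load.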

In the next section, we'll look at how to use train_test_split() from sklearn.model_selection to split our data into random training and testing subsets.