Preparing a dataset for machine learning with scikit-learn

The first step in implementing any machine learning algorithm with scikit-learn is data preparation. Scikit-learn imposes a set of implementation constraints, which are discussed later in this section. The dataset that we will use is based on mobile payments and is available on Kaggle, the world's most popular competitive machine learning website.

You can download the dataset from: https://www.kaggle.com/ntnu-testimon/paysim1.

Once the download is complete, start Jupyter Notebook by running the following command in Terminal (macOS/Linux) or Anaconda Prompt/PowerShell (Windows):

jupyter notebook

The goal with this dataset is to predict whether a mobile transaction is fraudulent. To do this, we first need a brief understanding of the contents of our data. To explore the dataset, we will use the pandas package in Python. You can install pandas by running the following command in Terminal (macOS/Linux) or PowerShell (Windows):

pip3 install pandas

On Windows machines, pandas can also be installed from an Anaconda Prompt by using the following command:

conda install pandas
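
To confirm that the installation worked before moving on, a quick check along the following lines can be run in a notebook cell or a Python shell. This is only a sketch: the variable name pandas_version is chosen here for illustration, and the version that is printed depends on your environment.

```python
# Sanity check: confirm that pandas can be imported in the current environment
import pandas as pd

# __version__ is a string; its exact value depends on the installed release
pandas_version = pd.__version__
print("pandas version:", pandas_version)
```

If the import fails with a ModuleNotFoundError, pandas was most likely installed into a different Python environment from the one that is running the notebook.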

We can now read the dataset into our Jupyter Notebook by using the following code:

# Package imports
import pandas as pd

# Reading in the dataset
df = pd.read_csv('PS_20174392719_1491204439457_log.csv')

# Viewing the first 5 rows of the dataset
df.head()

This produces an output as illustrated in the following screenshot: 
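
Beyond df.head(), a few other pandas calls give a quick first impression of a dataset: its size, whether any values are missing, and how balanced the target is. The following is a minimal sketch that uses a small hand-made DataFrame, so that it runs without the downloaded file. The column names type, amount, and isFraud are assumptions about the CSV's columns, and the values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded CSV: the column names are assumed
# and the values are made up for illustration only.
sample_df = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [9839.64, 181.00, 181.00, 1864.28],
    "isFraud": [0, 1, 1, 0],
})

# Number of rows and columns
n_rows, n_cols = sample_df.shape

# Count of missing values in each column
missing_per_column = sample_df.isnull().sum()

# How many rows fall into each class of the (assumed) target column
fraud_counts = sample_df["isFraud"].value_counts()

print("rows:", n_rows, "columns:", n_cols)
print(missing_per_column)
print(fraud_counts)
```

With the real data, the same calls would be applied to df instead of sample_df, after checking the actual column names with df.columns.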
