- Machine Learning with scikit-learn Quick Start Guide
- Kevin Jolly
- 2021-06-24 18:15:55
Preparing a dataset for machine learning with scikit-learn
The first step in implementing any machine learning algorithm with scikit-learn is data preparation. Scikit-learn imposes a set of constraints on the data it accepts, which are discussed later in this section. The dataset we will use is based on mobile payments and is available on Kaggle, the world's most popular competitive machine learning website.
You can download the dataset from: https://www.kaggle.com/ntnu-testimon/paysim1.
Once the download finishes, launch Jupyter Notebook by running the following command in Terminal (macOS/Linux) or Anaconda Prompt/PowerShell (Windows):
jupyter notebook
The goal with this dataset is to predict whether a mobile transaction is fraudulent. To do this, we first need a basic understanding of what the data contains. We will explore the dataset with the pandas package in Python, which you can install by running the following command in Terminal (macOS/Linux) or PowerShell (Windows):
pip3 install pandas
If you use Anaconda on Windows, you can instead install pandas from an Anaconda Prompt with the following command:
conda install pandas
We can now read the dataset into our Jupyter Notebook with the following code:
# Package imports
import pandas as pd
# Read in the dataset
df = pd.read_csv('PS_20174392719_1491204439457_log.csv')
# View the first 5 rows of the dataset
df.head()
This produces an output as illustrated in the following screenshot:
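Before modelling, it usually helps to check a few basic facts about the loaded DataFrame: its size, which columns are non-numeric, whether any values are missing, and how balanced the `isFraud` target is. The sketch below does this with standard pandas calls (`read_csv` and its `nrows` parameter, `isnull`, `value_counts`, `select_dtypes`). It assumes the PaySim column names shown in the sample header; the rows themselves are made up so the example runs without the downloaded file, and the helper name `summarize` is hypothetical.

```python
import io

import pandas as pd

# Made-up rows that follow an assumed PaySim-style column layout.
# They exist only so this sketch runs without the downloaded CSV.
SAMPLE_CSV = (
    "step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,"
    "nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud\n"
    "1,PAYMENT,100.0,C001,500.0,400.0,M001,0.0,0.0,0,0\n"
    "1,TRANSFER,250.0,C002,250.0,0.0,C101,0.0,0.0,1,0\n"
    "2,CASH_OUT,250.0,C003,250.0,0.0,C102,50.0,300.0,1,0\n"
    "2,PAYMENT,75.5,C004,80.0,4.5,M002,0.0,0.0,0,0\n"
    "3,DEBIT,40.0,C005,90.0,50.0,C103,10.0,50.0,0,0\n"
)


def summarize(df: pd.DataFrame, target: str = "isFraud") -> dict:
    """Hypothetical helper: gather quick facts about a transactions DataFrame."""
    return {
        # Number of rows and columns
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        # Count of missing values in each column
        "missing_per_column": df.isnull().sum().to_dict(),
        # Number of transactions in each target class (0 = not fraud, 1 = fraud)
        "class_counts": df[target].value_counts().to_dict(),
        # Columns that are not numeric (for example, string identifiers)
        "non_numeric_columns": df.select_dtypes(exclude="number").columns.tolist(),
    }


# nrows limits how many data rows read_csv loads (here the first 4 of 5)
sample = pd.read_csv(io.StringIO(SAMPLE_CSV), nrows=4)
info = summarize(sample)
print(info["n_rows"], info["n_cols"])
print(info["class_counts"])
print(info["non_numeric_columns"])
```

On the real file, `pd.read_csv('PS_20174392719_1491204439457_log.csv', nrows=100_000)` loads only the first 100,000 rows, which can speed up early exploration of a large CSV. Keep in mind that the first rows of a file are not a random sample; drop `nrows` to load everything.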