- Learning Data Mining with Python(Second Edition)
- Robert Layton
- 2021-07-02 23:40:03
Loading the dataset with NumPy
The dataset can be downloaded from the code package supplied with the book, or from the official GitHub repository at:
https://github.com/dataPipelineAU/LearningDataMiningWithPython2
Download the dataset file and save it on your computer, noting the path to it. It is easiest to put the file in the directory you will run your code from, but you can load the dataset from anywhere on your computer.
For this example, I recommend that you create a new folder on your computer to store your dataset and code. Then open Jupyter Notebook, navigate to this folder, and create a new notebook.
The dataset we are going to use for this example is a NumPy two-dimensional array, which is a format that underlies most of the examples in the rest of the book. The array looks like a table, with rows representing different samples and columns representing different features.
The cells represent the value of a specific feature of a specific sample. To illustrate, we can load the dataset with the following code:
import numpy as np
dataset_filename = "affinity_dataset.txt"
X = np.loadtxt(dataset_filename)
Enter the previous code into the first cell of your Jupyter Notebook. You can then run the code by pressing Shift + Enter, which also adds a new cell for the next section of code. After the code has run, the square brackets to the left of the first cell will show a number, which increments with each cell you run and tells you that this cell has completed. The first cell should look like the following:
[Figure: the first notebook cell after it has been run]
For code that takes longer to run, an asterisk appears in the square brackets to show that the code is either running or scheduled to run. The asterisk is replaced by a number when the code finishes, even if it finished because it failed.
This dataset has 100 samples and five features, which we will need to know for the later code. Let's extract those values using the following code:
n_samples, n_features = X.shape
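To make the row and column convention concrete, here is a small sketch that uses a hand-built array rather than the book's dataset. The name toy and its values are made up for illustration only.

```python
import numpy as np

# Hypothetical 3-sample, 2-feature array (not the book's dataset).
toy = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])

# shape is a (rows, columns) tuple, so it unpacks into two names.
toy_samples, toy_features = toy.shape
print(toy_samples, toy_features)  # 3 2

# toy[i, j] is the value of feature j for sample i.
first_sample = toy[0]        # every feature of the first sample
second_feature = toy[:, 1]   # one feature across all samples
```

The same unpacking applied to X gives the number of samples and features in the loaded dataset.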
If you stored the dataset somewhere other than the directory your Jupyter Notebook is in, change the dataset_filename value in the first cell to the file's actual location.
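One way to point at a file in another folder is to build the path with pathlib. The following is a minimal sketch, not the book's code: the temporary folder stands in for wherever you saved the file, and it writes two made-up rows so that it runs without the real download. The names data_folder, demo_filename, and X_demo are assumptions for illustration.

```python
from pathlib import Path
import tempfile

import numpy as np

# Hypothetical location: a temporary folder stands in for your own data folder.
data_folder = Path(tempfile.mkdtemp())
demo_filename = data_folder / "affinity_dataset.txt"

# Write two made-up rows so this sketch does not need the real file.
demo_filename.write_text("0 1 0 0 0\n1 1 0 0 0\n")

# np.loadtxt accepts a path object as well as a plain string.
X_demo = np.loadtxt(demo_filename)
print(X_demo.shape)  # (2, 5)
```

With the real dataset, you would replace data_folder with the folder you saved the file in and skip the write_text step.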
Next, we can show some of the rows of the dataset to get an understanding of the data. Enter the following line of code into the next cell and run it to print the first five rows of the dataset:
print(X[:5])
The result will show you which items were bought in the first five transactions listed:
[[ 0. 1. 0. 0. 0.]
 [ 1. 1. 0. 0. 0.]
 [ 0. 0. 1. 0. 1.]
 [ 1. 1. 0. 0. 0.]
 [ 0. 0. 1. 1. 1.]]
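As a rough illustration of how to read this output, the sketch below copies the five rows by hand and sums them along each axis. It assumes that a 1 means the item was bought in that transaction; the variable names are made up for this example.

```python
import numpy as np

# The five rows shown above, copied by hand.
first_five = np.array([[0, 1, 0, 0, 0],
                       [1, 1, 0, 0, 0],
                       [0, 0, 1, 0, 1],
                       [1, 1, 0, 0, 0],
                       [0, 0, 1, 1, 1]], dtype=float)

# Summing across a row counts the items bought in that transaction.
items_per_transaction = first_five.sum(axis=1)
# Summing down a column counts the transactions that include that item.
transactions_per_item = first_five.sum(axis=0)

print(items_per_transaction)  # [1. 2. 2. 2. 3.]
print(transactions_per_item)  # [2. 3. 2. 1. 2.]
```

Running the same sums on X instead of first_five gives the counts for all 100 transactions.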