- Machine Learning Algorithms
- Giuseppe Bonaccorso
- 2021-07-02 18:53:29
scikit-learn toy datasets
scikit-learn provides some built-in datasets that can be used for testing purposes. They're all available in the sklearn.datasets package and share a common structure: the data attribute contains the whole input set X, while target contains the class labels (for classification) or the target values (for regression). For example, considering the Boston house pricing dataset (used for regression; note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so this snippet requires an older release), we have:
>>> from sklearn.datasets import load_boston
>>> boston = load_boston()
>>> X = boston.data
>>> Y = boston.target
>>> X.shape
(506, 13)
>>> Y.shape
(506,)
In this case, we have 506 samples with 13 features and a single target value. In this book, we're going to use it for regression and the handwritten digit dataset (load_digits(), a small MNIST-like collection of 1,797 8×8 images rather than the full MNIST set) for classification tasks. scikit-learn also provides functions for creating dummy datasets from scratch: make_classification(), make_regression(), and make_blobs() (particularly useful for testing clustering algorithms). They're very easy to use and, in many cases, they're the most convenient way to test a model without loading more complex datasets.
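As a minimal sketch of the three generator functions mentioned above, the following snippet builds one dummy dataset of each kind. The sample counts, feature counts, noise level, and random_state values are arbitrary choices made for illustration and don't come from the book:

```python
from sklearn.datasets import make_blobs, make_classification, make_regression

# Binary classification: 200 samples, 5 features
# (3 informative + 2 redundant; the sizes are illustrative assumptions)
X_clf, y_clf = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=2, random_state=0
)

# Regression: 200 samples, 4 features, Gaussian noise added to the target
X_reg, y_reg = make_regression(
    n_samples=200, n_features=4, noise=10.0, random_state=0
)

# Clustering: 150 two-dimensional points drawn around 3 centers
X_blob, y_blob = make_blobs(
    n_samples=150, n_features=2, centers=3, random_state=0
)

print(X_clf.shape, y_clf.shape)    # (200, 5) (200,)
print(X_reg.shape, y_reg.shape)    # (200, 4) (200,)
print(X_blob.shape, y_blob.shape)  # (150, 2) (150,)
```

Each function returns a pair of NumPy arrays that plays the same role as the data and target attributes of the built-in datasets, so a model tested on them can be switched to a real dataset without changing the surrounding code.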