
  • Machine Learning Algorithms
  • Giuseppe Bonaccorso
  • 2021-07-02 18:53:29

scikit-learn toy datasets

scikit-learn provides some built-in datasets that can be used for testing purposes. They are all available in the sklearn.datasets package and share a common structure: the data attribute contains the whole input set X, while target contains the class labels (for classification) or the target values (for regression). For example, for the Boston house pricing dataset (used for regression), we have:

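>>> # load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
>>> # so this snippet requires an older release
>>> from sklearn.datasets import load_boston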

>>> boston = load_boston()
>>> X = boston.data
>>> Y = boston.target

>>> X.shape
(506, 13)
>>> Y.shape
(506,)

In this case, we have 506 samples with 13 features and a single target value. In this book, we're going to use it for regression, and the MNIST-like handwritten digit dataset (load_digits(), a small set of 1,797 8×8 images) for classification tasks. scikit-learn also provides functions for creating dummy datasets from scratch: make_classification(), make_regression(), and make_blobs() (particularly useful for testing clustering algorithms). They're very easy to use and, in many cases, are the best choice for testing a model without loading more complex datasets.
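As a minimal sketch of the three generator functions mentioned above, the following creates one dummy dataset of each kind. The sample counts, feature counts, noise level, and random_state value are arbitrary choices made for illustration, not values taken from the book.

```python
from sklearn.datasets import make_classification, make_regression, make_blobs

# Binary classification: 500 samples, 10 features (5 informative, 2 redundant).
# All sizes here are illustrative assumptions.
X_clf, y_clf = make_classification(n_samples=500, n_features=10,
                                   n_informative=5, n_redundant=2,
                                   random_state=1000)

# Regression: 500 samples, 10 features, Gaussian noise added to the target
X_reg, y_reg = make_regression(n_samples=500, n_features=10,
                               noise=5.0, random_state=1000)

# Clustering: 500 two-dimensional points drawn around 3 centers
X_blob, y_blob = make_blobs(n_samples=500, n_features=2,
                            centers=3, random_state=1000)

print(X_clf.shape, y_clf.shape)    # (500, 10) (500,)
print(X_reg.shape, y_reg.shape)    # (500, 10) (500,)
print(X_blob.shape, y_blob.shape)  # (500, 2) (500,)
```

Fixing random_state makes the generated data reproducible across runs, which is convenient when comparing models on the same synthetic problem.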

Visit http://scikit-learn.org/stable/datasets/ for further information.
The digit dataset provided by scikit-learn is limited in both size and image resolution. If you want to experiment with the original MNIST, refer to the website managed by Y. LeCun, C. Cortes, C. Burges: http://yann.lecun.com/exdb/mnist/. There you can download the full version, made up of 70,000 handwritten digits already split into training and test sets.
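To get a sense of how small the bundled digit set is compared with the 70,000-sample original, a quick inspection along these lines may help. It follows the same data/target structure described earlier; the variable names are illustrative.

```python
from sklearn.datasets import load_digits

# Load the small digit dataset bundled with scikit-learn
digits = load_digits()
X_digits = digits.data      # flattened images, one row per sample
Y_digits = digits.target    # digit label for each sample

print(X_digits.shape)                # (1797, 64)
print(digits.images.shape)           # (1797, 8, 8)
print(len(set(Y_digits.tolist())))   # 10 distinct classes
```

Each sample is an 8×8 grayscale image flattened into 64 features, whereas the original MNIST images are 28×28 pixels.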