- Machine Learning Algorithms
- Giuseppe Bonaccorso
- 2021-07-02 18:53:29
scikit-learn toy datasets
scikit-learn provides some built-in datasets that can be used for testing purposes. They're all available in the sklearn.datasets package and share a common structure: the data attribute contains the whole input set X, while target contains the labels for classification or the target values for regression. For example, considering the Boston house pricing dataset (used for regression; note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so this snippet needs an older release), we have:
>>> from sklearn.datasets import load_boston
>>> boston = load_boston()
>>> X = boston.data
>>> Y = boston.target
>>> X.shape
(506, 13)
>>> Y.shape
(506,)
In this case, we have 506 samples with 13 features and a single target value. In this book, we're going to use it for regression, and the handwritten digit dataset returned by load_digits() (1,797 8×8 images, a much smaller set than MNIST) for classification tasks. scikit-learn also provides functions for creating dummy datasets from scratch: make_classification(), make_regression(), and make_blobs() (particularly useful for testing clustering algorithms). They're very easy to use and, in many cases, they're the best choice for testing a model without loading more complex datasets.
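As a minimal sketch of the functions mentioned above, the following snippet loads the digit dataset and builds one dummy dataset with each generator. The sample counts, feature counts, noise level, and random_state values are illustrative assumptions, not settings taken from the book.

```python
from sklearn.datasets import (
    load_digits,
    make_blobs,
    make_classification,
    make_regression,
)

# Built-in dataset: same data/target structure as the other loaders
digits = load_digits()
X_digits, Y_digits = digits.data, digits.target
print(X_digits.shape, Y_digits.shape)

# Dummy binary classification problem (parameter values are arbitrary)
X_clf, Y_clf = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=5,
    n_redundant=2,
    n_classes=2,
    random_state=1000,
)

# Dummy regression problem with Gaussian noise added to the target
X_reg, Y_reg = make_regression(
    n_samples=500,
    n_features=10,
    noise=5.0,
    random_state=1000,
)

# Three isotropic Gaussian blobs, handy for testing clustering algorithms
X_blobs, Y_blobs = make_blobs(
    n_samples=500,
    n_features=2,
    centers=3,
    random_state=1000,
)

print(X_clf.shape, X_reg.shape, X_blobs.shape)
```

Each generator returns an (X, Y) tuple directly, while the loaders return an object exposing data and target. Fixing random_state makes the generated samples reproducible between runs.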