- Python Data Science Essentials
- Alberto Boschetti Luca Massaron
- 2021-08-13 15:19:37
The MLdata.org and other public repositories for open source data
The second type of example dataset that we will present can be downloaded directly from the mldata.org machine learning dataset repository or from the LIBSVM data website. Unlike the previous dataset, this one requires internet access.
The first of these, mldata.org, is a public repository for machine learning datasets that is hosted by TU Berlin and supported by Pattern Analysis, Statistical Modelling, and Computational Learning (PASCAL), a network funded by the European Union. You are free to download any dataset from this repository and experiment with it.
For example, suppose you want to download all the data on earthquakes since 1972, as reported by the United States Geological Survey, and analyze it for predictive patterns. You will find the dataset, together with a detailed description of the data, at http://mldata.org/repository/data/viewslug/global-earthquakes/.
Note that the directory that contains the dataset is global-earthquakes; you can directly obtain the data by using the following commands:
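In: # Note: fetch_mldata was deprecated in scikit-learn 0.20 and removed in 0.22,
# so this import only works on older releases.
from sklearn.datasets import fetch_mldata
# The argument is the dataset's directory name on mldata.org
earthquakes = fetch_mldata('global-earthquakes')
print(earthquakes.data)        # the array of predictive variables
print(earthquakes.data.shape)  # (number of examples, number of variables)
Out: (59209L, 4L)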
As with the Scikit-learn toy datasets, the returned object is a dictionary-like structure, where your predictive variables are in earthquakes.data and the target to be predicted is in earthquakes.target. Because this is real-world data, you get a large number of examples (59,209) but only a few variables (4).
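To make the idea of a dictionary-like structure more concrete, here is a minimal sketch that uses only the standard library. The DatasetBunch class and the shape helper are hypothetical stand-ins written for this illustration, not Scikit-learn's actual implementation, and the numbers are made up rather than taken from the earthquake data. The sketch only shows how one object can expose the same content both as a dictionary key and as an attribute:

```python
class DatasetBunch(dict):
    """Hypothetical stand-in for a dictionary-like dataset container.

    Keys can be read either as obj["key"] or as obj.key.
    """

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails,
        # so we fall back to the dictionary keys.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def shape(rows):
    """Return (n_examples, n_variables) for a list of equal-length rows."""
    return (len(rows), len(rows[0]) if rows else 0)


# Made-up values for illustration only; these are not real earthquake records.
toy = DatasetBunch(
    data=[
        [0.1, 0.2, 0.3, 0.4],
        [0.5, 0.6, 0.7, 0.8],
        [0.9, 1.0, 1.1, 1.2],
    ],
    target=[0, 1, 0],
)

# Attribute access and key access return the same object.
print(toy.data is toy["data"])

# Three examples with four variables each, and one target value per example.
print(shape(toy.data))
print(len(toy.target))
```

With a real dataset object, data would typically be a NumPy array, so you would read its .shape attribute directly instead of calling a helper such as shape.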