
  • Python Data Science Essentials
  • Alberto Boschetti, Luca Massaron
  • 2021-08-13 15:19:37

MLdata.org and other public repositories for open source data

The second type of example dataset that we will present can be downloaded directly from the mldata.org machine learning dataset repository or from the LIBSVM data website. Unlike the previous examples, in this case you will need internet access.
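The LIBSVM data website distributes its datasets as plain-text files in the sparse LIBSVM (svmlight) format: each line holds a label followed by index:value pairs, where feature indices start at 1 and omitted features are zero. In practice, Scikit-learn's `sklearn.datasets.load_svmlight_file` reads such files. Purely to illustrate the format, here is a minimal pure-Python sketch; `parse_libsvm_lines` is a hypothetical helper, and it does not handle the optional `qid:` field used for ranking data.

```python
def parse_libsvm_lines(lines, n_features=None):
    """Hypothetical helper: parse LIBSVM-format text lines into a list of
    labels and a dense list-of-lists feature matrix.
    Feature indices are 1-based; features absent from a line are zero."""
    labels, sparse_rows = [], []
    max_index = 0
    for line in lines:
        # Drop trailing comments, then skip lines that end up empty
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        label, *pairs = line.split()
        labels.append(float(label))
        row = {}
        for pair in pairs:
            index, value = pair.split(":")
            row[int(index)] = float(value)
            max_index = max(max_index, int(index))
        sparse_rows.append(row)
    # Use the requested width, or else the largest index seen
    width = n_features if n_features is not None else max_index
    dense = [[row.get(j, 0.0) for j in range(1, width + 1)] for row in sparse_rows]
    return labels, dense


sample = [
    "+1 1:0.5 3:2.0",
    "-1 2:1.5",
]
y, X = parse_libsvm_lines(sample)
print(y)  # [1.0, -1.0]
print(X)  # [[0.5, 0.0, 2.0], [0.0, 1.5, 0.0]]
```

Expanding every row to a dense list is fine for a toy example, but real LIBSVM files are often high-dimensional and mostly zeros, which is why `load_svmlight_file` returns a SciPy sparse matrix instead.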

First, mldata.org is a public repository of machine learning datasets hosted by the Technische Universität Berlin (TU Berlin) and supported by PASCAL (Pattern Analysis, Statistical Modelling, and Computational Learning), a network funded by the European Union. You are free to download any dataset from this repository and experiment with it.

For example, suppose you need all the data on earthquakes since 1972, as reported by the United States Geological Survey, in order to search it for predictive patterns. You will find the dataset at http://mldata.org/repository/data/viewslug/global-earthquakes/, together with a detailed description of the data.

Note that the name of the directory containing the dataset is global-earthquakes; you can obtain the data directly with the following commands:

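In: from sklearn.datasets import fetch_mldata
# Note: fetch_mldata was deprecated in scikit-learn 0.20 and removed in 0.22
# because mldata.org went offline; this code needs an older scikit-learn release
earthquakes = fetch_mldata('global-earthquakes')  # downloads (or reads a cached copy of) the dataset
print(earthquakes.data)
print(earthquakes.data.shape)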

Out: (59209L, 4L)

As with the Scikit-learn toy datasets, the returned object is a dictionary-like structure (a Bunch): the predictive variables are in earthquakes.data and the target to be predicted is in earthquakes.target. Since this is real data, you have quite a lot of examples (59,209) and just a few variables (four).
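To make the idea of a dictionary-like structure concrete, here is a minimal sketch of a `dict` subclass whose keys can also be read and written as attributes, similar in spirit to Scikit-learn's Bunch. The class name `DatasetBunch` and the sample values are hypothetical placeholders, not the real earthquake data.

```python
class DatasetBunch(dict):
    """Hypothetical minimal stand-in for Scikit-learn's Bunch: a dict
    whose keys can also be accessed as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods still work
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attributes as dictionary keys
        self[name] = value


# Made-up placeholder rows (not real earthquake records): 3 examples x 4 variables
dataset = DatasetBunch(
    data=[[0.1, 0.2, 0.3, 0.4],
          [0.5, 0.6, 0.7, 0.8],
          [0.9, 1.0, 1.1, 1.2]],
    target=[0, 1, 0],
)

# Attribute access and key access return the same object
assert dataset.data is dataset["data"]

n_examples = len(dataset.data)
n_variables = len(dataset.data[0])
print(n_examples, n_variables)  # 3 4
```

In recent Scikit-learn versions the real class is available as `sklearn.utils.Bunch`, so in practice you would use that rather than a hand-written equivalent.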
