- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 2021-07-14 10:51:24
Downloading open data
Before conducting any data analysis, an essential step is to collect high-quality, meaningful data. One important data source is open data, which is selected, organized, and freely available to the public. Most open data is published online either as text files or through APIs. Here, we introduce how to download an open data file in text format with the download.file function.
Getting ready
In this recipe, you need to prepare your environment with R installed and a computer that can access the Internet.
How to do it…
Please perform the following steps to download open data from the Internet:
- First, visit the http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices link to view the historical price of the S&P 500 in Yahoo Finance:
Figure 1: Historical price of S&P 500
- Scroll down to the bottom of the page, right-click and copy the link in Download to Spreadsheet (the link should appear similar to http://real-chart.finance.yahoo.com/table.csv?s=%5EGSPC&d=6&e=3&f=2015&g=d&a=0&b=3&c=1950&ignore=.csv):
Figure 2: Download to Spreadsheet
- Download this file with the download.file function:
> download.file('http://real-chart.finance.yahoo.com/table.csv?s=%5EGSPC&d=6&e=3&f=2015&g=d&a=0&b=3&c=1950&ignore=.csv', 'snp500.csv')
- You can now use the getwd function to determine the current directory, and then use list.files to search for the downloaded file:
> getwd()
> list.files('./')
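The steps above can be sketched as a small script. This is only an illustration under stated assumptions: download_if_missing is a hypothetical helper, not part of the recipe, and a temporary local CSV with made-up values (reached through a file:// URL) stands in for the Yahoo Finance link so that the sketch runs without network access. To try it on real data, substitute the link copied from the page.

```r
# Hypothetical helper (not from the recipe): download a file only if it is
# not already present, and stop with a clear message on failure.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(invisible(destfile))
  }
  # mode = "wb" writes the file in binary mode, avoiding newline translation on Windows
  status <- tryCatch(
    download.file(url, destfile, mode = "wb", quiet = TRUE),
    error = function(e) stop("Download failed: ", conditionMessage(e))
  )
  # download.file() returns 0 on success
  if (status != 0) stop("Download failed with status ", status)
  invisible(destfile)
}

# Stand-in for the remote file: a tiny CSV with made-up values
src <- tempfile(fileext = ".csv")
writeLines(c("Date,Close", "2015-07-01,100.5", "2015-07-02,101.25"), src)
src_url <- paste0("file:///", sub("^/", "", normalizePath(src, winslash = "/")))

dest <- file.path(tempdir(), "snp500.csv")
download_if_missing(src_url, dest)

# Confirm the file arrived, then load it into a data frame
print("snp500.csv" %in% list.files(tempdir()))
snp500 <- read.csv(dest)
str(snp500)
```

Checking for an existing file first avoids requesting the same data repeatedly while you iterate on an analysis.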
How it works…
In this recipe, we demonstrated how to download a file using download.file in R. First, we used Yahoo Finance to view the historical prices of the S&P 500. At the bottom of the page, we found a link with an http:// URL prefix. The http:// prefix indicates the Hypertext Transfer Protocol (HTTP), which is used to transmit and receive information over the Internet. Therefore, we can pass the link address to download.file, which sends a request to the remote server and saves the returned file in our local directory.
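To see what is actually being requested, it can help to take the link address apart. The following is a minimal sketch using only base R; it splits the recipe's URL into the resource path and its query parameters without assuming what each parameter means to the server.

```r
url <- "http://real-chart.finance.yahoo.com/table.csv?s=%5EGSPC&d=6&e=3&f=2015&g=d&a=0&b=3&c=1950&ignore=.csv"

# Split the address into the resource (before "?") and the query string (after)
parts <- strsplit(url, "?", fixed = TRUE)[[1]]

# Break the query string into key=value pairs
pairs <- strsplit(strsplit(parts[2], "&", fixed = TRUE)[[1]], "=", fixed = TRUE)

# Decode percent-encoded values, for example %5E becomes ^
params <- vapply(pairs, function(p) utils::URLdecode(p[2]), character(1))
names(params) <- vapply(pairs, function(p) p[1], character(1))

print(parts[1])
print(params)
```

Here the s parameter decodes to ^GSPC, the ticker symbol embedded in the original link.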
There's more…
Apart from using the download.file function, you can use the RCurl package to download a file from a URL with either an HTTP or an HTTPS prefix:
- First, go to the https://nycopendata.socrata.com/Social-Services/NYC-Wi-Fi-Hotspot-Locations/a9we-mtpn? link to explore the Wi-Fi hotspot location file in the NYC open data:
Figure 3: Wi-Fi hotspot location of NYC
- Next, click on Export and find the CSV download link:
Figure 4: Downloading the CSV format of the Wi-Fi hotspot location
- You can then install and load the RCurl package:
> install.packages("RCurl")
> library(RCurl)
- Finally, download the file from the HTTPS URL by using the getURL function:
> rows <- getURL("https://nycopendata.socrata.com/api/views/jd4g-ks2z/rows.csv?accessType=DOWNLOAD")
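Unlike download.file, getURL returns the response body as a character string instead of writing a file. One possible next step, sketched below, is to parse that string with read.csv. The sketch assumes the response is plain CSV text; the rows value here is a made-up stand-in with hypothetical column names, so it runs without RCurl or network access.

```r
# Stand-in for the string returned by getURL(); columns and values are made up
rows <- "id,name,borough\n1,Hotspot A,Manhattan\n2,Hotspot B,Brooklyn\n"

# read.csv() can parse CSV text directly through its `text` argument
wifi_hotspot <- read.csv(text = rows, stringsAsFactors = FALSE)
print(head(wifi_hotspot))

# Optionally write the data to disk so that it can be found with list.files()
out <- file.path(tempdir(), "wifi_hotspot.csv")
write.csv(wifi_hotspot, out, row.names = FALSE)
```

Parsing in memory suits small files; for large downloads, writing straight to disk with download.file may be the simpler choice.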