
Downloading open data

Before conducting any data analysis, an essential step is to collect high-quality, meaningful data. One important data source is open data, which is selected, organized, and freely available to the public. Most open data is published online either as text files or through APIs. Here, we introduce how to download an open data file in text format with the download.file function.

Getting ready

In this recipe, you need a computer with R installed and access to the Internet.

How to do it…

Please perform the following steps to download open data from the Internet:

  1. First, visit the http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices link to view the historical prices of the S&P 500 on Yahoo Finance:

    Figure 1: Historical price of S&P 500

  2. Scroll down to the bottom of the page, right-click the Download to Spreadsheet link, and copy its address (the link should look similar to http://real-chart.finance.yahoo.com/table.csv?s=%5EGSPC&d=6&e=3&f=2015&g=d&a=0&b=3&c=1950&ignore=.csv):

    Figure 2: Download to Spreadsheet

  3. Download this file with the download.file function:
    > download.file('http://real-chart.finance.yahoo.com/table.csv?s=%5EGSPC&d=6&e=3&f=2015&g=d&a=0&b=3&c=1950&ignore=.csv', 'snp500.csv')
    
  4. You can now use the getwd function to find the current working directory, where download.file saved snp500.csv, and then use list.files to confirm that the downloaded file is there:
    > getwd()
    > list.files('./')
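Steps 3 and 4 can be sketched as one short script. So that it runs without network access, this sketch assumes a local stand-in for the Yahoo Finance file: it writes a small CSV with made-up placeholder rows to a temporary directory and fetches it through a `file:///` URL. To reproduce the recipe itself, substitute the copied Download to Spreadsheet link for `src_url`. The names `src`, `src_url`, `dest`, and `prices` are illustrative, and the sketch saves into `tempdir()` rather than the working directory to avoid leaving files behind.

```r
# Assumption: a local file stands in for the Yahoo Finance CSV so that the
# sketch runs offline. Its rows are made-up placeholders, not real prices.
src <- file.path(tempdir(), "table.csv")
writeLines(c("Date,Close", "2015-01-02,100", "2015-01-05,101"), src)

# Build a file:/// URL for the stand-in (works on Unix-alikes and Windows).
src_url <- paste0("file:///", sub("^/", "", normalizePath(src, winslash = "/")))

# Step 3: download.file(url, destfile) fetches the URL and saves it locally.
# It returns 0 on success; quiet = TRUE suppresses the progress bar.
dest <- file.path(tempdir(), "snp500.csv")
status <- download.file(src_url, dest, quiet = TRUE)

# Step 4: confirm the file landed where expected, then read it back.
list.files(tempdir(), pattern = "^snp500\\.csv$")
prices <- read.csv(dest)
str(prices)
```

Because the destination in the recipe is the relative path 'snp500.csv', download.file writes it to whatever getwd() returns, which is why step 4 checks that directory.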
    

How it works…

In this recipe, we demonstrated how to download a file using download.file in R. First, we used Yahoo Finance to view the historical prices of the S&P 500. At the bottom of the page, we found a link with an http:// URL prefix. The http:// prefix stands for Hypertext Transfer Protocol (HTTP), which is used to transmit and receive information over the Internet. Because download.file can send HTTP requests, we passed it the link address; it requested the file from the remote server and saved the response as snp500.csv in our local working directory.
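download.file signals an error when the remote server cannot be reached and returns 0 when the transfer succeeds. A minimal sketch of checking that outcome, assuming you would rather get a TRUE/FALSE flag than have a script stop on a failed request. The helper name `safe_download` is hypothetical and not part of base R or any package; treating every warning as a failure is a design choice of this sketch.

```r
# Hypothetical helper, not part of base R or any package: it wraps
# download.file() and returns TRUE/FALSE instead of stopping on failure.
# Any warning raised during the download is also treated as a failure.
safe_download <- function(url, destfile) {
  status <- tryCatch(
    # download.file() returns 0 on success; mode = "wb" writes the bytes
    # unchanged (this matters on Windows).
    download.file(url, destfile, quiet = TRUE, mode = "wb"),
    error = function(e) {
      message("Download failed: ", conditionMessage(e))
      NA_integer_
    },
    warning = function(w) {
      message("Download failed: ", conditionMessage(w))
      NA_integer_
    }
  )
  ok <- isTRUE(status == 0)
  # Remove any partial file left behind by a failed request.
  if (!ok && file.exists(destfile)) unlink(destfile)
  ok
}
```

For example, `safe_download(url, "snp500.csv")`, with `url` set to the copied link, returns TRUE once the file has been saved and FALSE otherwise.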

There's more…

Apart from the download.file function, you can use the RCurl package to download a file from either an HTTP or an HTTPS URL:

  1. First, go to the https://nycopendata.socrata.com/Social-Services/NYC-Wi-Fi-Hotspot-Locations/a9we-mtpn? link to explore the Wi-Fi hotspot location file in NYC Open Data:

    Figure 3: Wi-Fi hotspot location of NYC

  2. Next, click on Export and find the CSV download link:

    Figure 4: Downloading the CSV format of the Wi-Fi hotspot location

  3. You can then install and load the RCurl package:
    > install.packages("RCurl")
    > library(RCurl)
    
  4. Finally, use the getURL function to download the file from its HTTPS URL; getURL returns the file's contents as a character string:
    > rows <- getURL("https://nycopendata.socrata.com/api/views/jd4g-ks2z/rows.csv?accessType=DOWNLOAD")
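Because getURL returns the whole response as one character string, the CSV still has to be parsed before analysis. A minimal sketch, assuming `rows` holds CSV text: read.csv can parse an in-memory string through its `text` argument. To run without network access, the sketch assigns a short made-up string to `rows`; the column names and values are hypothetical placeholders, not the real dataset's columns.

```r
# Stand-in for the string returned by getURL(); columns and values are made up.
rows <- "Name,Borough,Type\nHotspot A,Manhattan,Free\nHotspot B,Brooklyn,Limited Free\n"

# read.csv() parses text held in memory via its 'text' argument.
wifi <- read.csv(text = rows, stringsAsFactors = FALSE)
str(wifi)

# Count hotspots per (placeholder) borough.
table(wifi$Borough)
```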
    