
Case 3 – reading data from a URL

Often, we need to read data directly from a web URL. The URL might serve the data itself or point to a file that contains the data. For example, navigate to this website, http://winterolympicsmedals.com/, which lists the medals won by various countries in different sports during the Winter Olympics. Now type the following address in the URL address bar: http://winterolympicsmedals.com/medals.csv.

A CSV file will be downloaded automatically. Downloading it manually, saving it, and then specifying the directory path for the read_csv method is a time-consuming process. Instead, Python allows us to read such files directly from the URL. Apart from the significant saving in time, it also becomes easy to loop over the files when there are many of them to be downloaded and read in.

A simple read_csv statement is required to read the data directly from the URL:

import pandas as pd

# Read the CSV file directly from the URL into a DataFrame
medal_data = pd.read_csv('http://winterolympicsmedals.com/medals.csv')
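
Because pandas can read directly from a URL, looping over several remote files requires only one read_csv call per file. The following is a minimal sketch of that pattern; the base URL and file names used here are purely hypothetical and only illustrate the idea:

import pandas as pd

# Hypothetical base URL and file names, for illustration only
base_url = 'http://example.com/data/'
file_names = ['sales_2012.csv', 'sales_2013.csv', 'sales_2014.csv']

# Read each file directly from its URL and collect the resulting DataFrames
frames = [pd.read_csv(base_url + name) for name in file_names]

# Optionally, combine them into a single DataFrame
all_data = pd.concat(frames, ignore_index=True)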

Alternatively, to fetch data from URLs, one can use a couple of Python packages that we have not used so far: csv and urllib. Readers can refer to the documentation of these packages to learn more about them. It is sufficient to know that csv provides a range of methods to handle CSV files, while urllib is used to navigate to and access information from a URL. Here is how it can be done:

import csv
import io
import urllib.request

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
response = urllib.request.urlopen(url)

# The response is a stream of bytes; wrap it so that csv.reader receives text
cr = csv.reader(io.TextIOWrapper(response, encoding='utf-8'))

for row in cr:
    print(row)

The preceding code snippet works in the following two steps:

  1. The urlopen method of the urllib.request module opens the URL and returns a response object, which can then be read using the reader function of the csv library.
  2. The resulting reader instance is an iterator and can be iterated over row by row, as shown in the sketch below.
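
Since the reader is simply an iterator over lists of strings, its output can also be collected into a pandas DataFrame. The following minimal sketch assumes the usual iris layout of four measurements followed by the class label; the column names are chosen here only for illustration:

import csv
import io
import urllib.request
import pandas as pd

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
response = urllib.request.urlopen(url)
cr = csv.reader(io.TextIOWrapper(response, encoding='utf-8'))

# The file has no header row; skip any blank lines at the end
rows = [row for row in cr if row]

# Assumed column names for the iris dataset; values are read as strings,
# so numeric columns should be converted afterwards if needed
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
iris_data = pd.DataFrame(rows, columns=columns)
print(iris_data.head())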

The csv module is very helpful in dealing with CSV files. Among other things, it can be used to read a dataset row by row, that is, to iterate over it, and it can also be used to write to CSV files.
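
As an illustration of the writing side, here is a minimal sketch that writes a few rows to a CSV file; the output file name and the rows are hypothetical:

import csv

# Hypothetical rows to be written out
rows = [['country', 'medals'], ['Norway', 329], ['USA', 282]]

# newline='' prevents extra blank lines between rows on some platforms
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)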
