Collecting the data

We will use the NBA match history for the 2015-2016 season. The website http://basketball-reference.com hosts a large collection of resources and statistics from the NBA and other leagues. To download the dataset, perform the following steps:

  1. Navigate to http://www.basketball-reference.com/leagues/NBA_2016_games.html in your web browser.
  2. Click Share & more.
  3. Click Get table as CSV (for Excel).
  4. Copy the data, including the heading, into a text file named basketball.csv.
  5. Repeat this process for each of the other months of the season, appending the data to the same file but leaving out the heading.
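If you would rather save each month's export in full and combine them afterwards, the header handling in steps 4 and 5 can be sketched as follows. This is a minimal illustration, not part of the original procedure: the function name merge_monthly_csv is hypothetical, and it assumes every chunk of text starts with the same heading line.

```python
def merge_monthly_csv(chunks):
    """Join CSV text chunks, keeping the heading from the first chunk only.

    Assumption: each chunk is the full text of one month's export and
    begins with the same heading line.
    """
    merged = []
    for index, chunk in enumerate(chunks):
        # Drop blank lines so stray newlines from copy and paste do not survive.
        lines = [line for line in chunk.splitlines() if line.strip()]
        if index > 0:
            lines = lines[1:]  # skip the repeated heading
        merged.extend(lines)
    return "\n".join(merged) + "\n"
```

Writing the returned string to basketball.csv should give the same file as the manual copy-and-paste steps.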

This will give you a CSV file containing the result of every game of this NBA season. Your file should contain 1316 games, for a total of 1317 lines including the header line.
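A quick way to confirm the file has the expected number of games is to count its data rows. The helper below is a small sketch using Python's built-in csv module; the name count_games is hypothetical, and the function assumes the first line of the file is the heading.

```python
import csv


def count_games(path):
    """Return the number of data rows in a CSV file, excluding the heading."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the heading line, if there is one
        # Blank lines come back as empty lists, so they are not counted.
        return sum(1 for row in reader if row)
```

Calling count_games("basketball.csv") on a correctly assembled file should return 1316. A larger number usually means a heading line was copied more than once.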

CSV files are text files in which each line is one row and the values in a row are separated by commas (hence the name, comma-separated values). You can create a CSV file manually by typing into a text editor and saving with a .csv extension. CSV files can be opened in any program that reads text files, and Excel can open them as a spreadsheet. Excel (and other spreadsheet programs) can usually export a spreadsheet to CSV as well.
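To make the format concrete, the short sketch below parses a small piece of CSV text with Python's built-in csv module. The column names and team names are invented for illustration and are not the headings used in the downloaded file.

```python
import csv
import io

# Invented sample: one heading line followed by two rows.
sample_text = (
    "Date,Visitor,Home\n"
    "2015-10-27,Team A,Team B\n"
    "2015-10-27,Team C,Team D\n"
)

# csv.reader splits each line on commas and yields a list of strings.
rows = list(csv.reader(io.StringIO(sample_text)))
header, games = rows[0], rows[1:]

for game in games:
    # Pair each value with its column name to make the row easier to read.
    print(dict(zip(header, game)))
```

Every value arrives as a string, including dates and scores, so any type conversion is left to the caller.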

We will load the file with pandas, a library for loading and manipulating tabular data. Python also has a built-in library called csv that supports reading and writing CSV files. We use pandas instead because it provides more powerful functions, which we will need later in the chapter for creating new features.
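As a rough preview of why pandas is convenient here, the sketch below loads a small in-memory CSV and derives a new column in one line. The column names (VisitorPts, HomePts, HomeWin) and the scores are invented for illustration; the real file uses different headings.

```python
import io

import pandas as pd

# Invented sample data standing in for the downloaded file.
sample = io.StringIO(
    "Date,Visitor,VisitorPts,Home,HomePts\n"
    "2015-10-27,Team A,95,Team B,97\n"
    "2015-10-27,Team C,111,Team D,95\n"
)

# read_csv infers numeric columns; parse_dates converts the Date column.
df = pd.read_csv(sample, parse_dates=["Date"])

# A new feature computed for every row at once: did the home team win?
df["HomeWin"] = df["HomePts"] > df["VisitorPts"]
print(df)
```

With the csv module, the same step would need an explicit loop and manual conversion of the scores from strings to integers.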

For this chapter, you will need to install pandas. The easiest way to install it is with Anaconda's conda installer, as you did to install scikit-learn in Chapter 1, Getting Started with Data Mining:
$ conda install pandas
If you have difficulty installing pandas, head to the project's website at http://pandas.pydata.org/getpandas.html and read the installation instructions for your system.
