
Collecting the data

The data we will be using is the match history data for the NBA for the 2015-2016 season. The website http://basketball-reference.com contains a significant number of resources and statistics collected from the NBA and other leagues. To download the dataset, perform the following steps:

  1. Navigate to http://www.basketball-reference.com/leagues/NBA_2016_games.html in your web browser.
  2. Click Share & more.
  3. Click Get table as CSV (for Excel).
  4. Copy the data, including the heading, into a text file named basketball.csv.
  5. Repeat this process for each of the other months of the season, appending the rows to the same file but without copying the heading again.
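The copy-and-paste steps above can also be scripted once each month has been saved. The sketch below is a minimal illustration, assuming that each month's table was saved as its own file (hypothetical names such as october.csv and november.csv) with its heading on the first line, and that no field contains an embedded line break. It keeps the first heading and drops the repeats, which is what step 5 does by hand.

```python
from pathlib import Path
from typing import Iterable


def merge_monthly_csv(month_files: Iterable[Path], output: Path) -> int:
    """Concatenate monthly CSV exports into one file (hypothetical helper).

    Assumes every input file starts with the same header line and that no
    field contains an embedded line break. The header is written once, from
    the first non-empty file; later headers are dropped and blank lines are
    skipped. Returns the number of data rows written.
    """
    rows_written = 0
    header_written = False
    with output.open("w", encoding="utf-8", newline="") as out:
        for path in month_files:
            lines = [
                line
                for line in path.read_text(encoding="utf-8").splitlines()
                if line.strip()
            ]
            if not lines:
                continue  # nothing to copy from an empty export
            if not header_written:
                out.write(lines[0] + "\n")  # keep the heading only once
                header_written = True
            for line in lines[1:]:  # skip this file's own heading
                out.write(line + "\n")
                rows_written += 1
    return rows_written
```

Pass the monthly files in season order, for example `merge_monthly_csv([Path("october.csv"), Path("november.csv")], Path("basketball.csv"))`, so that the games stay in chronological order.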

This will give you a CSV file containing the results of every game in the 2015-2016 NBA season. Your file should contain 1316 games, for a total of 1317 lines including the header line.
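To confirm that no month was skipped or pasted twice, you can count the rows in the finished file. This is a minimal sketch using Python's built-in csv module; the function name `count_rows` is hypothetical, and it assumes exactly one header row and ignores blank lines.

```python
import csv
from pathlib import Path


def count_rows(path: Path) -> tuple[int, int]:
    """Return (total_rows, data_rows) for a CSV file with one header row.

    Blank lines are ignored. For a file without quoted line breaks,
    total_rows equals the number of non-blank lines in the file.
    """
    with path.open(encoding="utf-8", newline="") as f:
        # csv.reader yields an empty list for a blank line, so drop those.
        rows = [row for row in csv.reader(f) if row]
    total_rows = len(rows)
    return total_rows, max(total_rows - 1, 0)  # subtract the header row
```

On a complete file, `count_rows(Path("basketball.csv"))` should return `(1317, 1316)`, matching the counts given above.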

CSV files are text files in which each line is a row and the values within a row are separated by commas (hence the name, comma-separated values). CSV files can be created manually by typing into a text editor and saving with a .csv extension. They can be opened in any program that reads text files, and they can also be opened as a spreadsheet in Excel. Excel (and other spreadsheet programs) can usually save a spreadsheet as CSV as well.
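One detail the comma rule hides: a value that itself contains a comma is wrapped in double quotes, so simply splitting each line on commas can break a row apart incorrectly. The sketch below uses the standard library's `csv.DictReader` on a small in-memory sample. The column names, team names, and scores are hypothetical placeholders, not the real headings or data from basketball-reference.

```python
import csv
import io

# Hypothetical sample rows: column names, teams, and scores are placeholders.
SAMPLE = (
    "Date,Visitor,VisitorPts,Home,HomePts,Notes\n"
    "2015-10-27,Team A,95,Team B,106,\n"
    '2015-10-28,Team C,111,Team D,112,"OT, decided late"\n'
)


def parse_csv_text(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


rows = parse_csv_text(SAMPLE)
# The quoted field keeps its comma, and every value is read as a string:
# rows[1]["Notes"] is "OT, decided late" and rows[0]["HomePts"] is "106".

# Naive splitting breaks the quoted field into two pieces (7 instead of 6).
naive_fields = SAMPLE.splitlines()[2].split(",")
```

Note that the csv module returns every value as a string, so scores would still need converting to numbers before any arithmetic.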

We will load the file with the pandas library, which is extremely useful for manipulating data. Python's standard library also includes a module called csv that supports reading and writing CSV files. However, we will use pandas, because it provides more powerful functions that we will use later in this chapter to create new features.
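As a rough sketch of what loading with pandas looks like, the example below reads a small in-memory stand-in for basketball.csv with `pd.read_csv`. The column names (Visitor, VisitorPts, Home, HomePts) and the derived `HomeWin` column are hypothetical placeholders, not the export's actual headings; with the real file you would pass the path "basketball.csv" instead of the `StringIO` object.

```python
import io

import pandas as pd

# Hypothetical stand-in for basketball.csv so the sketch runs anywhere.
# With the real file: pd.read_csv("basketball.csv", parse_dates=[...]).
sample = io.StringIO(
    "Date,Visitor,VisitorPts,Home,HomePts\n"
    "2015-10-27,Team A,95,Team B,106\n"
    "2015-10-28,Team C,111,Team D,112\n"
)

# parse_dates converts the Date column to datetime64 values.
dataset = pd.read_csv(sample, parse_dates=["Date"])

# pandas infers integer types for the score columns, so whole columns can be
# compared at once. HomeWin is a hypothetical feature: True when the home
# team scored more points than the visitor.
dataset["HomeWin"] = dataset["HomePts"] > dataset["VisitorPts"]

print(dataset.dtypes)
```

Unlike the csv module, which returns every value as a string, `read_csv` infers column types, which is what makes column-wide operations like the comparison above possible.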

For this chapter, you will need to install pandas. The easiest way to install it is with Anaconda's conda installer, as you did in Chapter 1, Getting Started with Data Mining, to install scikit-learn:
$ conda install pandas
If you have difficulty installing pandas, head to the project's website at http://pandas.pydata.org/getpandas.html and read the installation instructions for your system.
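To check that the installation worked, you can ask Python whether pandas can be imported and which version is installed. This minimal sketch uses only the standard library (`importlib.util.find_spec` and `importlib.metadata.version`, available from Python 3.8); the function name `pandas_status` is hypothetical.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def pandas_status() -> str:
    """Report whether pandas can be imported, and its version if so."""
    if importlib.util.find_spec("pandas") is None:
        return "pandas is not installed"
    try:
        return f"pandas {version('pandas')} is installed"
    except PackageNotFoundError:
        # Importable, but installed without distribution metadata.
        return "pandas is importable (version metadata unavailable)"


print(pandas_status())
```

Using `find_spec` avoids actually importing pandas, so the check stays quick and does not raise an ImportError when pandas is missing.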
