
Getting and reading data

The first step is to retrieve a dataset and open it with a program capable of manipulating the data. The simplest way to retrieve a dataset is to find a data file. Python and R can both be used to open, read, modify, and save data stored in static files. In Chapter 3, Reading, Exploring, and Modifying Data - Part I, I will introduce the JSON data format and show how to use Python to read, write, and modify JSON data. In Chapter 4, Reading, Exploring, and Modifying Data - Part II, I will walk through how to use Python to work with data files in the CSV and XML formats. In Chapter 6, Cleaning Numerical Data - An Introduction to R and RStudio, I will introduce R and RStudio and show how to use R to read and manipulate data.
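As a minimal sketch of the kind of reading, modifying, and writing those chapters cover, the snippet below uses only Python's standard-library `json` and `csv` modules. The sample data is made up for illustration, and in-memory strings (via `io.StringIO`) stand in for files on disk; with real files you would pass an open file object to `json.load`, `json.dump`, or `csv.DictReader` instead.

```python
import csv
import io
import json

# A small JSON document held in a string (made-up sample data); with a real
# file you would use json.load(f) on a file opened with open(...).
raw_json = '{"city": "Springfield", "population": 30720, "tags": ["river", "rail"]}'

# json.loads parses a JSON string into Python dicts, lists, strings and numbers.
record = json.loads(raw_json)

# Modify the parsed data like any other Python dictionary.
record["population"] += 100
record["tags"].append("museum")

# json.dumps serializes the modified data back to a JSON string;
# json.dump(record, f) would write it to an open file object instead.
updated_json = json.dumps(record, indent=2)

# csv.DictReader reads CSV data row by row, using the header line as
# dictionary keys. io.StringIO stands in for an open file here.
raw_csv = "name,score\nada,91\ngrace,88\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Values read from a CSV file are strings, so convert them before doing arithmetic.
total = sum(int(row["score"]) for row in rows)

print(updated_json)
print(rows)
print(total)  # → 179
```

Note that `csv` hands back every field as a string, which is one reason type conversion is a recurring step in data cleaning.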

Larger data sources are often made available through web interfaces called application programming interfaces (APIs). APIs allow you to retrieve specific bits of data from a larger collection of data. Web APIs can be great resources for data that is otherwise hard to get. In Chapter 8, Getting Data from the Web, I will discuss APIs in detail and walk through using Python to extract data from them.
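To give a rough idea of what "retrieving specific bits of data" from a web API looks like, here is a minimal sketch using only Python's standard library. It assumes a typical JSON-over-HTTP API where query parameters select the subset of data you want; the endpoint `https://api.example.com/v1/records` and the parameters `q` and `limit` are hypothetical placeholders, since every real API defines its own URL, parameter names, and often an authentication scheme.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, params):
    """Append URL-encoded query parameters to a base endpoint URL."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """Send a GET request and parse the JSON body of the response."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and parameters, for illustration only: the query
# parameters ask the API for a small, specific slice of its data.
url = build_query_url("https://api.example.com/v1/records", {"q": "rainfall", "limit": 5})
print(url)  # → https://api.example.com/v1/records?q=rainfall&limit=5

# data = fetch_json(url)  # uncomment when pointing at a real API
```

Keeping URL construction separate from the network call makes it easy to check which request you are about to send before spending any of an API's rate limit on it.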

Another possible source of data is a database. I won't go into detail on the use of databases in this book, though in Chapter 9, Working with Large Datasets, I will show how to interact with a particular database using Python.

Databases are collections of data organized for fast retrieval. They can be particularly useful when we need to work incrementally on very large datasets, and they are, of course, a common source of data in their own right.
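As an illustration of querying a database from Python and working through the results incrementally, the sketch below uses the standard-library `sqlite3` module with an in-memory database. SQLite is an assumption made here for convenience, not necessarily the database used in Chapter 9; the table name, columns, and rows are hypothetical sample data.

```python
import sqlite3

# An in-memory SQLite database stands in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, station TEXT, value REAL)")

# Insert made-up sample rows using parameter placeholders.
rows = [(i, f"station_{i % 3}", i * 0.5) for i in range(10)]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
conn.commit()

# An index lets the database find one station's rows without scanning the whole table.
conn.execute("CREATE INDEX idx_station ON readings (station)")

# Ask only for the rows you need, then process them in batches with fetchmany
# so that a very large result never has to be held in memory all at once.
cursor = conn.execute(
    "SELECT id, value FROM readings WHERE station = ? ORDER BY id", ("station_0",)
)
batch_sizes = []
total = 0.0
while True:
    batch = cursor.fetchmany(2)  # at most 2 rows per batch
    if not batch:
        break
    batch_sizes.append(len(batch))
    total += sum(value for _, value in batch)

conn.close()
print(batch_sizes, total)  # → [2, 2] 9.0
```

The batch size of 2 is only for demonstration; with real data you would pick a size that balances memory use against the number of round trips.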