Getting and reading data

The first step is to retrieve a dataset and open it with a program capable of manipulating the data. The simplest way to retrieve a dataset is to find a data file. Python and R can be used to open, read, modify, and save data stored in static files. In Chapter 3, Reading, Exploring, and Modifying Data - Part I, I will introduce the JSON data format and show how to use Python to read, write, and modify JSON data. In Chapter 4, Reading, Exploring, and Modifying Data - Part II, I will walk through how to use Python to work with data files in the CSV and XML data formats. In Chapter 6, Cleaning Numerical Data - An Introduction to R and RStudio, I will introduce R and RStudio, and show how to use R to read and manipulate data.
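As a small preview of the open, read, modify, and save cycle, here is a minimal sketch that uses Python's built-in json module. The file name fruit.json and the sample records are made up for illustration, and the file is written to a temporary directory so the sketch cleans up after itself.

```python
import json
import os
import tempfile

# Hypothetical sample data; the file name and fields are illustrative only.
records = [{"name": "apple", "count": 3}, {"name": "pear", "count": 5}]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "fruit.json")

    # Save the data to a static file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)

    # Open and read the file back into Python objects.
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

    # Modify the data in memory, then save the modified version.
    for record in loaded:
        record["count"] += 1
    with open(path, "w", encoding="utf-8") as f:
        json.dump(loaded, f, indent=2)

    # Read the file once more to confirm the change was written.
    with open(path, encoding="utf-8") as f:
        final = json.load(f)

print(final)
```

The same pattern applies to other static file formats; only the module used to parse and serialize the data changes.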

Larger data sources are often made available through web interfaces called application programming interfaces (APIs). APIs allow you to retrieve specific pieces of data from a larger collection. Web APIs can be great resources for data that is otherwise hard to get. In Chapter 8, Getting Data from the Web, I will discuss APIs in detail and walk through the use of Python to extract data from APIs.
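To make the idea of retrieving specific pieces of data more concrete, here is a rough sketch of how a request to a web API is typically formed. The endpoint https://api.example.com/records and the city and limit parameters are hypothetical and do not belong to any real service. The fetch_records helper is defined but not called, since there is no real server behind the example URL.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; substitute the base URL of a real API.
BASE_URL = "https://api.example.com/records"


def build_query_url(base_url, **params):
    """Attach query parameters that select a subset of the collection."""
    return f"{base_url}?{urlencode(params)}"


def fetch_records(url):
    """Request the URL and decode the JSON body (assumes a JSON response)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Ask only for the records we care about instead of the whole collection.
url = build_query_url(BASE_URL, city="Boston", limit=10)
print(url)
```

With a real API, passing the resulting URL to fetch_records would return just the matching records as Python objects.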

Another possible source of data is a database. I won't go into detail on the use of databases in this book, though in Chapter 9, Working with Large Datasets, I will show how to interact with a particular database using Python.

Databases are collections of data organized for fast retrieval. They are particularly useful when we need to work incrementally on very large datasets.
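Working incrementally might look something like the following sketch. It uses SQLite through Python's built-in sqlite3 module purely as a stand-in, which is an assumption for illustration and not necessarily the database covered in Chapter 9. The readings table and its contents are made up. Rather than loading every row at once, the loop pulls a small batch at a time.

```python
import sqlite3

# An in-memory SQLite database stands in for a large external database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings (value) VALUES (?)",
    [(float(i),) for i in range(10)],
)
conn.commit()

# Process the rows in batches so only a few are held in memory at a time.
cursor = conn.execute("SELECT id, value FROM readings ORDER BY id")
total = 0.0
batches = 0
while True:
    rows = cursor.fetchmany(4)  # hypothetical batch size
    if not rows:
        break
    batches += 1
    total += sum(value for _id, value in rows)

conn.close()
print(batches, total)
```

With ten rows and a batch size of four, the loop runs three times. On a dataset too large to fit in memory, the same loop would keep memory use roughly constant.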