Getting and reading data

The first step is to retrieve a dataset and open it with a program capable of manipulating the data. The simplest way of retrieving a dataset is to find a data file. Python and R can be used to open, read, modify, and save data stored in static files. In Chapter 3, Reading, Exploring, and Modifying Data - Part I, I will introduce the JSON data format and show how to use Python to read, write, and modify JSON data. In Chapter 4, Reading, Exploring, and Modifying Data - Part II, I will walk through how to use Python to work with data files in the CSV and XML data formats. In Chapter 6, Cleaning Numerical Data - An Introduction to R and RStudio, I will introduce R and RStudio, and show how to use R to read and manipulate data.
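As a rough preview of the open, read, modify, and save cycle described above, here is a minimal sketch using Python's built-in json module. The records, the field names, and the sample.json filename are made up for illustration and are not taken from the book's datasets; the file is written to a temporary directory so nothing is left behind.

```python
import json
import os
import tempfile

# Hypothetical sample records (illustrative only, not data from the book).
records = [{"name": "Ada", "visits": 3}, {"name": "Grace", "visits": 5}]

# Work inside a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")  # assumed filename

    # Save: write the records to a static file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)

    # Open and read: load the file back into Python objects.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    # Modify: increment one field in every record.
    for row in data:
        row["visits"] += 1

    # Save again: overwrite the file with the modified data.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)

    # Read once more to confirm the change was stored.
    with open(path, encoding="utf-8") as f:
        reloaded = json.load(f)
```

The same pattern of reading into memory, changing values, and writing back applies to other static file formats, though the parsing step differs.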

Larger data sources are often made available through web interfaces called application programming interfaces (APIs). APIs allow you to retrieve specific pieces of data from a larger collection. Web APIs can be great resources for data that is otherwise hard to get. In Chapter 8, Getting Data from the Web, I will discuss APIs in detail and walk through the use of Python to extract data from APIs.
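One common way a web API lets you ask for a specific part of a larger collection is through query parameters in the request URL. The sketch below only builds such a URL and does not send a request. The endpoint https://api.example.com/v1/records, the helper build_request_url, and the parameter names city, year, and limit are all hypothetical; a real API documents its own URL and parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used purely for illustration.
BASE_URL = "https://api.example.com/v1/records"


def build_request_url(base_url, **filters):
    """Return base_url with the given filters encoded as a query string."""
    return f"{base_url}?{urlencode(filters)}"


# Ask for a small, specific slice of the collection (assumed parameter names).
url = build_request_url(BASE_URL, city="Boston", year=2015, limit=10)
print(url)
```

Sending the request and parsing the response are separate steps that depend on the particular API.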

Another possible source of data is a database. I won't go into detail on the use of databases in this book, though in Chapter 9, Working with Large Datasets, I will show how to interact with a particular database using Python.

Databases are collections of data that are organized for quick retrieval. They can be particularly useful when we need to work incrementally on very large datasets, and they can also serve as a source of data in their own right.
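To give a sense of what working incrementally can look like, the following sketch uses SQLite through Python's built-in sqlite3 module, reading rows in small batches instead of loading everything at once. SQLite is my choice for the illustration and is not necessarily the database used later in the book; the readings table, its columns, and the batch size of 4 are likewise assumptions.

```python
import sqlite3

# An in-memory database stands in for a large dataset (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings (value) VALUES (?)",
    [(float(i),) for i in range(10)],
)
conn.commit()

# Process the table a few rows at a time rather than all at once.
cursor = conn.execute("SELECT value FROM readings ORDER BY id")
total = 0.0
batches = 0
while True:
    rows = cursor.fetchmany(4)  # assumed batch size
    if not rows:
        break
    total += sum(value for (value,) in rows)
    batches += 1

conn.close()
print(total, batches)
```

With a genuinely large table, only one batch of rows needs to be held in memory at any moment, which is the main appeal of this approach.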