
Reading the data – variations and examples

Before we delve deeper into the realm of data, let us familiarize ourselves with a few terms that will appear frequently from now on.

Data frames

A data frame is one of the most common data structures used for data analysis in Python, and it is provided by the pandas library. Data frames are very similar to a spreadsheet or a SQL table. Structurally, a data frame can be thought of as a dictionary of Series objects that share a common index. Like a spreadsheet, a data frame has index labels (for the rows) and column labels (for the columns). It is the most commonly used pandas object: a 2D structure whose columns can hold the same or different data types. Most of the standard operations that can be applied to a spreadsheet or a SQL table, such as aggregation, filtering, and pivoting, can also be applied to data frames using pandas methods.
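The short sketch below makes the "dictionary of Series" view concrete. It builds a small data frame from two Series that share an index, inspects its index and column labels, and then applies a filter and an aggregation, the pandas counterparts of SQL's WHERE and AVG. The column names, index labels, and values are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical data: two Series sharing the same index labels.
ages = pd.Series([34, 28, 45], index=["r1", "r2", "r3"])
cities = pd.Series(["Pune", "Delhi", "Mumbai"], index=["r1", "r2", "r3"])

# A dictionary of Series: each key becomes a column label,
# and the shared index becomes the row (index) labels.
df = pd.DataFrame({"age": ages, "city": cities})

print(df.index.tolist())    # index labels: ['r1', 'r2', 'r3']
print(df.columns.tolist())  # column labels: ['age', 'city']
print(df.dtypes)            # columns may hold different types

# Selecting a single column gives back a Series.
print(type(df["age"]))

# Spreadsheet/SQL-style operations through pandas methods.
older = df[df["age"] > 30]  # filtering, like WHERE age > 30
print(older)
print(df["age"].mean())     # aggregation, like AVG(age)
```

Because every column is a Series aligned on the same index, pandas keeps row labels consistent across columns when rows are filtered or values are aggregated.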

The following figure illustrates a data frame. We will learn more about working with data frames as we progress through the chapter:

Fig. 2.1 A data frame

Delimiters

A delimiter is a special character that separates the columns of a dataset from one another. The most common delimiter is the comma (,), and it is so widespread that it can practically be considered the default. A .csv file gets its name from the comma-separated values it contains. However, a dataset can use any special character as its delimiter (tabs, semicolons, and pipes are common alternatives), and one needs to know how to handle these variations in order to carry out a thorough exploratory analysis and build a robust predictive model. Later in this chapter, we will learn how to do that.
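As a preview, here is a minimal sketch of how the delimiter affects reading data with pandas. It assumes the data is held in in-memory strings (wrapped with `io.StringIO` in place of real files), and the sample records are hypothetical. `pd.read_csv` uses a comma by default, so any other delimiter has to be passed explicitly through the `sep` parameter.

```python
import io
import pandas as pd

# Hypothetical sample data: the same records, comma-separated and pipe-separated.
csv_text = "name,age\nAsha,34\nRavi,28\nMeera,45\n"
pipe_text = "name|age\nAsha|34\nRavi|28\nMeera|45\n"

# read_csv assumes a comma delimiter by default.
df_csv = pd.read_csv(io.StringIO(csv_text))

# Any other delimiter must be given through sep (e.g. sep="\t" for tabs).
df_pipe = pd.read_csv(io.StringIO(pipe_text), sep="|")

print(df_csv.equals(df_pipe))  # → True

# Forgetting sep: the whole pipe-delimited line lands in a single column.
df_wrong = pd.read_csv(io.StringIO(pipe_text))
print(df_wrong.columns.tolist())  # → ['name|age']
```

The last read shows why the delimiter matters: with the wrong separator, pandas does not raise an error but silently returns a single-column frame, which is easy to miss until the analysis goes wrong.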
