Reading the data – variations and examples

Before we delve deeper into the realm of data, let us familiarize ourselves with a few terms that will appear frequently from now on.

Data frames

A data frame is one of the most common data structures used for data analysis in Python. It is provided by the pandas library rather than by the core language, and it is the most commonly used pandas object. A data frame closely resembles a sheet in a spreadsheet or a table in a SQL database. Structurally, it can also be thought of as a dictionary of series objects, in which each key is a column name and each value is a column. Like a spreadsheet, a data frame has index labels, which identify its rows, and column labels, which identify its columns. It is a 2D structure whose columns can hold the same or different data types. Most of the standard operations that can be applied to a spreadsheet or a SQL table, such as aggregation, filtering, and pivoting, can be applied to data frames using pandas methods.
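To make the "dictionary of series objects" description concrete, here is a minimal sketch that builds a small data frame from a dictionary and applies a filter and an aggregation. The column names and values are invented purely for illustration and do not come from any dataset used in this chapter.

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column label,
# and each Series becomes one column of the data frame.
columns = {
    "name": pd.Series(["Ann", "Bob", "Cara"]),
    "age": pd.Series([34, 28, 45]),
    "city": pd.Series(["Pune", "Oslo", "Lima"]),
}
df = pd.DataFrame(columns)

# Index labels identify rows; column labels identify columns.
row_labels = list(df.index)
column_labels = list(df.columns)

# Filtering: keep only the rows where age is greater than 30.
older = df[df["age"] > 30]

# Aggregation: the mean of a single column.
mean_age = df["age"].mean()

print(df)
print(older)
print(mean_age)
```

Because no index was supplied, pandas assigns the default integer row labels 0, 1, and 2. Note also that the columns hold different types here: text in name and city, integers in age.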

The following screenshot is an illustrative picture of a data frame. We will learn more about working with them as we progress in the chapter:

Fig. 2.1 A data frame

Delimiters

A delimiter is a special character that separates the columns of a dataset from one another. The most common delimiter, and the default in many tools, is the comma (,); a .csv file is so named because it contains comma-separated values. However, a dataset can use other characters as its delimiter, such as a tab, a semicolon, or a pipe (|). Reading such a file correctly is a prerequisite for any thorough exploratory analysis and for building a robust predictive model, so one needs to know how to handle different delimiters. Later in this chapter, we will learn how to do that.
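As a brief preview, the sketch below reads the same small table twice with the pandas read_csv function, once delimited by commas and once by pipes. The data is made up for illustration, and in-memory strings stand in for files on disk so that the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical data: the same table written with two different delimiters.
comma_text = "name,age\nAnn,34\nBob,28\n"
pipe_text = "name|age\nAnn|34\nBob|28\n"

# read_csv assumes a comma delimiter unless told otherwise.
df_comma = pd.read_csv(io.StringIO(comma_text))

# For any other delimiter, pass it through the sep parameter.
df_pipe = pd.read_csv(io.StringIO(pipe_text), sep="|")

# Reading pipe-delimited text with the default comma delimiter does not
# raise an error; each line simply ends up in a single column.
df_wrong = pd.read_csv(io.StringIO(pipe_text))

print(df_pipe)
print(df_wrong.shape)
```

The last read shows why the delimiter matters: a wrong delimiter usually does not fail loudly, so it is worth checking the shape of a data frame right after loading it.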