
Alignment via index labels

Aligning Series data by index labels is a fundamental concept in pandas and one of its most powerful. Alignment automatically matches related values across multiple Series objects by their index labels. This saves the error-prone effort of matching up data from multiple sets with standard procedural techniques.

To demonstrate alignment, let's add the values in two Series objects. We start with the following two Series objects, which represent two different samples of a set of variables (a and b):
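A minimal sketch of such a pair, assuming small integer values chosen purely for illustration. The second Series lists its labels in a different order so that matching by position and matching by label would give different answers:

```python
import pandas as pd

# Illustrative values (assumed): first sample of the variables 'a' and 'b'.
s1 = pd.Series([1, 2], index=['a', 'b'])

# Second sample of the same variables, with the labels in reverse order.
s2 = pd.Series([4, 3], index=['b', 'a'])

print(s1)
print(s2)
```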

Now suppose we would like to total the values for each variable. We can express this simply as s1 + s2:
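A minimal sketch of the addition, assuming two small illustrative Series (the values are assumptions for the example, and s2 deliberately lists its labels in a different order than s1):

```python
import pandas as pd

# Illustrative values (assumed), with the labels ordered differently.
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([4, 3], index=['b', 'a'])

# Values are paired by label, not by position:
# 'a' -> 1 + 3 = 4, 'b' -> 2 + 4 = 6
total = s1 + s2
print(total)
```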

pandas has matched the measurement for each variable in each Series, added those values, and returned the sum for each in one succinct statement.

It is also possible to combine a scalar value with a Series. The specified operation is applied between the scalar and each value in the Series:
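For instance, multiplying a Series by 2 doubles every value and leaves the index unchanged. The values here are assumptions chosen for illustration:

```python
import pandas as pd

# Illustrative values (assumed).
s1 = pd.Series([1, 2], index=['a', 'b'])

# The scalar is applied to each element: 'a' -> 2, 'b' -> 4
doubled = s1 * 2
print(doubled)
```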

Remember that we said earlier we would come back to creating a Series from a scalar value? An operation of this type behaves as though pandas performs the following actions:

The first step is the creation of a Series from the scalar value, but with the index of the target Series. The multiplication is then applied to the aligned values of the two Series objects, which perfectly align because the index is identical.
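The two steps can be sketched explicitly as follows. This is an illustration of the equivalence with assumed values, not a claim about how pandas implements scalar arithmetic internally:

```python
import pandas as pd

# Illustrative values (assumed).
s1 = pd.Series([1, 2], index=['a', 'b'])

# Step 1: build a Series from the scalar, reusing the index of the target.
t = pd.Series(2, index=s1.index)

# Step 2: multiply the two Series; the identical indexes align exactly.
result = s1 * t
print(result)

# The explicit form gives the same values as the scalar shorthand.
same_as_scalar = result.equals(s1 * 2)
print(same_as_scalar)
```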

The labels in the indexes are not required to align. Where a label is present in only one Series, pandas returns NaN as the result for that label:
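A minimal sketch, assuming two illustrative Series that share only the label 'b':

```python
import pandas as pd

# Illustrative values (assumed); only 'b' appears in both indexes.
s1 = pd.Series([1, 2], index=['a', 'b'])
s3 = pd.Series([5, 6], index=['b', 'c'])

# 'b' is matched (2 + 5); 'a' and 'c' have no partner and become NaN.
result = s1 + s3
print(result)
```

The result contains the union of the labels from both Series.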

NaN is, by default, the result of any pandas alignment where an index label in one Series has no match in the other. This is an important characteristic of pandas when compared to NumPy: labels that do not align do not raise an exception. This helps when some data is missing and that is acceptable. Processing continues, and pandas signals that there is an issue (though not necessarily a problem) by returning NaN.
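The contrast with NumPy can be sketched as follows, assuming illustrative arrays of different lengths. NumPy pairs values by position, so the mismatched shapes raise a ValueError, while the equivalent labeled pandas operation completes and marks the unmatched labels with NaN:

```python
import numpy as np
import pandas as pd

# Illustrative values (assumed): positional arrays of different lengths.
x = np.array([1, 2])
y = np.array([5, 6, 7])

try:
    x + y
    numpy_raised = False
except ValueError:
    # Shapes (2,) and (3,) cannot be broadcast together.
    numpy_raised = True

# The same values with labels attached; only 'b' is shared.
sx = pd.Series([1, 2], index=['a', 'b'])
sy = pd.Series([5, 6, 7], index=['b', 'c', 'd'])

# No exception: 'b' is summed, every other label yields NaN.
aligned = sx + sy
print(numpy_raised)
print(aligned)
```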

Labels in a pandas index do not need to be unique. For a duplicated label, the alignment operation forms a Cartesian product of the matching entries in the two Series. If there are n 'a' labels in the first Series and m 'a' labels in the second, the result will contain n*m rows labeled 'a'.

To demonstrate this, let's use the following two Series objects:
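A minimal sketch of two such objects, assuming illustrative values. The first Series has two 'a' labels and the second has three, and the labels 'b' and 'c' each appear in only one of them:

```python
import pandas as pd

# Illustrative values (assumed): two 'a' labels and one 'b'.
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'a', 'b'])

# Three 'a' labels and one 'c'.
s2 = pd.Series([4.0, 5.0, 6.0, 7.0], index=['a', 'a', 'c', 'a'])

print(s1)
print(s2)
```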

Adding these two Series results in 6 'a' index labels and NaN for 'b' and 'c':
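A minimal sketch of that addition, assuming illustrative Series with two 'a' labels in the first and three in the second, so that 2 * 3 = 6 rows labeled 'a' are produced:

```python
import pandas as pd

# Illustrative values (assumed).
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'a', 'b'])
s2 = pd.Series([4.0, 5.0, 6.0, 7.0], index=['a', 'a', 'c', 'a'])

# Every 'a' in s1 is paired with every 'a' in s2.
result = s1 + s2
print(result)

# Count the rows per label: six for 'a', one NaN row each for 'b' and 'c'.
a_count = int((result.index == 'a').sum())
print(a_count)
```

Each 'a' row holds the sum of one pairing, so the six values are 1 + 4, 1 + 5, 1 + 7, 2 + 4, 2 + 5, and 2 + 7.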
