Alignment via index labels

Alignment of Series data by index labels is a fundamental concept in pandas and one of its most powerful. Alignment automatically matches related values in multiple Series objects based on their index labels. This saves a lot of error-prone effort that would otherwise go into matching data across multiple sets with standard procedural techniques.

To demonstrate alignment, let's add the values in two Series objects. Let's start with the following two Series objects representing two different samples of a set of variables (a and b):
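A minimal sketch of two such Series follows. The values are illustrative assumptions; only the names `s1` and `s2` and the labels `a` and `b` come from the text. The second Series deliberately lists its labels in a different order, so position alone would not match them up.

```python
import pandas as pd

# Two samples of the variables 'a' and 'b' (values are assumed for illustration)
s1 = pd.Series([1, 2], index=['a', 'b'])

# Same labels, but stored in the opposite order
s2 = pd.Series([4, 3], index=['b', 'a'])

print(s1)
print(s2)
```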

Now suppose we would like to total the values for each variable. We can express this simply as s1 + s2:
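As a sketch, assuming illustrative values for `s1` and `s2` (the numbers are not from the source), the addition looks like this. Note that the labels in `s2` are in a different order from those in `s1`.

```python
import pandas as pd

# Assumed sample values; labels in s2 are in the opposite order
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([4, 3], index=['b', 'a'])

# Values are paired by label, not by position
total = s1 + s2
print(total)
# total['a'] is 1 + 3 = 4, total['b'] is 2 + 4 = 6
```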

pandas has matched the measurement for each variable in each series, added those values, and returned us the sum for each in one succinct statement.

It is also possible to apply a scalar value to a Series. The scalar is applied to each value in the Series using the specified operation:
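For example, assuming an illustrative Series `s1` and the scalar 2 (both chosen here for demonstration), multiplication applies to every value:

```python
import pandas as pd

# Assumed sample values
s1 = pd.Series([1, 2], index=['a', 'b'])

# The scalar is applied to each element; the index is preserved
doubled = s1 * 2
print(doubled)
# doubled['a'] is 2, doubled['b'] is 4
```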

Recall that we promised earlier to come back to creating a Series from a scalar value. When performing this type of operation, pandas conceptually performs the following actions:
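The steps can be sketched explicitly as follows. This illustrates the equivalent behavior rather than the library's internal code; `s1`, `t`, and the scalar 2 are assumed names and values.

```python
import pandas as pd

# Assumed sample values
s1 = pd.Series([1, 2], index=['a', 'b'])

# Step 1: build a Series from the scalar, reusing the index of s1
t = pd.Series(2, index=s1.index)

# Step 2: multiply the two Series; their indexes are identical, so every label aligns
explicit = s1 * t
print(explicit)

# Same result as the shorthand s1 * 2
print(explicit.equals(s1 * 2))
```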

The first step is the creation of a Series from the scalar value, but with the index of the target Series. The multiplication is then applied to the aligned values of the two Series objects, which perfectly align because the index is identical.

The labels in the two indexes are not required to match. Where a label appears in only one of the Series, pandas returns NaN for that label:
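A small sketch, assuming two illustrative Series that share only the label `b` (the name `s3` and all values are made up for this example):

```python
import pandas as pd

# Assumed sample values: only label 'b' is present in both Series
s1 = pd.Series([1, 2], index=['a', 'b'])
s3 = pd.Series([5, 6], index=['b', 'c'])

# 'a' and 'c' each exist in only one Series, so their results are NaN
partial = s1 + s3
print(partial)
# partial['b'] is 2 + 5 = 7.0; 'a' and 'c' are NaN
```

The result is a float Series, because NaN cannot be stored in an integer Series.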

By default, NaN is the result for any index label that is present in one Series but not in the other. This is an important characteristic of pandas when compared to NumPy: if labels do not align, no exception is thrown. This helps when some data is missing and that is acceptable. Processing continues, and pandas lets you know there is an issue (though not necessarily a problem) by returning NaN.

Labels in a pandas index do not need to be unique. When labels repeat, the alignment operation forms a Cartesian product of the matching labels in the two Series. If there are n 'a' labels in series 1 and m 'a' labels in series 2, then the result will contain n*m rows labeled 'a'.

To demonstrate this let's use the following two Series objects:
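One possible pair of Series with repeated labels is sketched below. The values are assumptions chosen so that `s1` has two `a` labels and `s2` has three, giving 2 * 3 = 6 combinations.

```python
import pandas as pd

# Assumed sample values: 'a' appears twice here, 'b' once
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'a', 'b'])

# 'a' appears three times here, 'c' once
s2 = pd.Series([4.0, 5.0, 6.0, 7.0], index=['a', 'a', 'c', 'a'])

print(s1)
print(s2)
```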

Adding them will result in six rows labeled 'a' and NaN for 'b' and 'c':
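A sketch of that result, assuming illustrative Series in which `s1` holds two `a` labels and `s2` holds three (the values are not from the source):

```python
import pandas as pd

# Assumed sample values with duplicate 'a' labels
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'a', 'b'])
s2 = pd.Series([4.0, 5.0, 6.0, 7.0], index=['a', 'a', 'c', 'a'])

# Every 'a' in s1 is paired with every 'a' in s2: 2 * 3 = 6 rows
combined = s1 + s2
print(combined)

# Count the rows produced for the label 'a'
a_rows = int((combined.index == 'a').sum())
print(a_rows)
```

The labels `b` and `c` each occur in only one of the two Series, so each contributes a single NaN row, for eight rows in total.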