
Mathematical operations allowed

Remember, at the interval level, we have addition and subtraction to work with. This is a real game-changer. With the ability to add values together, we may introduce two familiar concepts: the arithmetic mean (referred to simply as the mean) and the standard deviation. Both are available to us at the interval level. To see an example, let's pull in a new dataset, one about climate change:

import pandas as pd

# load in the data set
climate = pd.read_csv('../data/GlobalLandTemperaturesByCity.csv')
climate.head()

Let us have a look at the following table for a better understanding:

Let's see if we have any missing values with the following line of code:

climate.isnull().sum()

dt                               0
AverageTemperature               0
AverageTemperatureUncertainty    0
City                             0
Country                          0
Latitude                         0
Longitude                        0
year                             0
dtype: int64

# All good

The column in question is called AverageTemperature. Temperature is data at the interval level, and one consequence is that a bar or pie chart is impractical here because there are too many distinct values:

# show us the number of unique items
climate['AverageTemperature'].nunique()

111994

Plotting 111,994 distinct values as separate bars would be absurd, all the more so because we know the data is quantitative. The most common graph to use from this level onward is probably the histogram. This graph is a cousin of the bar graph: it groups quantities into buckets and shows the frequency of each bucket.
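To make the idea of buckets concrete, here is a minimal sketch that counts values per bucket by hand. The temperatures and the bucket edges are made up for illustration and are not taken from the climate dataset:

```python
import pandas as pd

# Hypothetical temperatures in °C (made-up values, not from the climate dataset)
temps = pd.Series([1.0, 2.5, 3.0, 7.5, 8.0, 9.0, 9.5, 14.0])

# Cut the range 0-15 into three equal-width buckets: (0, 5], (5, 10], (10, 15]
buckets = pd.cut(temps, bins=[0, 5, 10, 15])

# Count how many values fall into each bucket, keeping the buckets in order
counts = buckets.value_counts(sort=False)
print(counts)
```

A histogram draws one bar per bucket, with the bar height equal to these counts.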

Let's see a histogram for the AverageTemperature around the world, to see the distribution of temperatures in a very holistic view:

climate['AverageTemperature'].hist()

The following is the output of the preceding code:

Here, the average value appears to be around 20°C. Let's confirm this:

climate['AverageTemperature'].describe()

count    8.235082e+06
mean     1.672743e+01
std      1.035344e+01
min     -4.270400e+01
25%      1.029900e+01
50%      1.883100e+01
75%      2.521000e+01
max      3.965100e+01
Name: AverageTemperature, dtype: float64

We were close. The mean is around 17°C. Let's make this a bit more fun by adding new columns called year and century, and by subsetting the data to only the temperatures recorded in the US:

# Convert the dt column to datetime and extract the year
climate['dt'] = pd.to_datetime(climate['dt'])
climate['year'] = climate['dt'].map(lambda value: value.year)

# Subset the data to just the US (copy so that new columns can be added safely)
climate_sub_us = climate.loc[climate['Country'] == 'United States'].copy()

# Integer division keeps the century a whole number
climate_sub_us['century'] = climate_sub_us['year'].map(lambda x: x // 100 + 1)
# 1983 would become 20
# 1750 would become 18
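The century mapping can be checked on the two years mentioned in the comments. This is a minimal sketch, assuming whole-number centuries are intended. The helper name `year_to_century` is hypothetical and only used for illustration:

```python
def year_to_century(year):
    # Floor division keeps the result an integer;
    # in Python 3, a plain `/` would return a float such as 20.83
    return year // 100 + 1

print(year_to_century(1983))  # 20
print(year_to_century(1750))  # 18
```

Note that this convention places years ending in 00 (for example, 1900) in the following century.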

With the new column century, let's plot four histograms of temperature, one for each century:

climate_sub_us['AverageTemperature'].hist(by=climate_sub_us['century'],
                                          sharex=True, sharey=True,
                                          figsize=(10, 10),
                                          bins=20)

The following is the output of the preceding code:

Here, we have our four histograms, showing that the AverageTemperature is going up slightly. Let's confirm this:

climate_sub_us.groupby('century')['AverageTemperature'].mean().plot(kind='line')

The following is the output of the preceding code:

Interesting! Because differences are meaningful at this level, we can answer the question of how much, on average, the temperature has risen in the US since the 18th century. Let's first store the century averages as their own pandas Series object:

century_changes = climate_sub_us.groupby('century')['AverageTemperature'].mean()

century_changes

century
18    12.073243
19    13.662870
20    14.386622
21    15.197692
Name: AverageTemperature, dtype: float64

Now, let's use the index of the Series to subtract the 18th-century value from the 21st-century value, giving the difference in temperature:

# 21st century average temp in US minus 18th century average temp in US
century_changes[21] - century_changes[18]

# average difference in monthly recorded temperature in the US since the 18th century
3.12444911546
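The same subtraction can be applied between each pair of neighboring centuries. The following is a minimal sketch that rebuilds the Series from the century averages printed earlier and uses the pandas `diff` method. The variable names `step_changes` and `total_change` are introduced here for illustration:

```python
import pandas as pd

# Century means copied from the printed output of century_changes
century_changes = pd.Series(
    [12.073243, 13.662870, 14.386622, 15.197692],
    index=pd.Index([18, 19, 20, 21], name='century'),
    name='AverageTemperature',
)

# Difference between each century and the one before it (the first entry is NaN)
step_changes = century_changes.diff()
print(step_changes)

# The per-century steps add up to the overall change from the 18th to the 21st century
total_change = step_changes.sum()
print(round(total_change, 6))
```

Because the data is at the interval level, each of these differences is meaningful on its own, and their sum matches the overall rise computed directly.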