
  • Mastering matplotlib
  • Duncan M. McGreggor
  • 550 words
  • 2021-07-16 14:11:20

Setting up the interactive backend

As mentioned above, our notebooks will all start with the following, as does this preview notebook:

In [1]: import matplotlib
        matplotlib.use('nbagg')
        %matplotlib inline
In [2]: import matplotlib.pyplot as plt
        import seaborn as sns
        import numpy as np
        from scipy import stats
        import pandas as pd

These commands do the following:

  • Set up the interactive backend for plotting
  • Allow us to evaluate images in-line, as opposed to doing the same in a pop-up window
  • Provide the standard alias to the matplotlib.pyplot subpackage and import other packages that we will need
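
Outside a notebook, the %matplotlib magic is not available, so it can help to check which backend is actually in effect. The following is a minimal sketch, assuming a plain Python script and the non-interactive Agg backend (chosen here only so that the example runs without a display; it is not the nbagg backend used above):

```python
import io

import matplotlib

# Assumption: select the non-interactive Agg backend so that the script
# runs without a display; the notebooks in the text use 'nbagg' instead.
matplotlib.use("Agg")

import matplotlib.pyplot as plt

# Ask matplotlib which backend it will actually use.
backend = matplotlib.get_backend()
print("Active backend:", backend)

# Render a trivial figure to an in-memory PNG to confirm the backend works.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

In a notebook, the same matplotlib.get_backend() call can be run in a cell after the setup commands to see which backend ended up active.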

Joint plots with Seaborn

Our first preview example will take a look at the Seaborn package, an open source third-party library for data visualization and attractive statistical graphs. Seaborn depends upon not only matplotlib, but also NumPy and SciPy (among others). These were already installed for you when you ran make (pulled from the requirements.txt file).

We'll cover Seaborn palettes in more detail later in the book, so the following command is just a sample. Let's use a predefined palette with a moderate color saturation level:

In [3]: sns.set_palette("BuPu_d", desat=0.6)
        sns.set_context("notebook", font_scale=2.0)

Next, we'll generate two sets of random data (with a random seed of our choosing), one for the x axis and the other for the y axis. We're then going to plot the overlap of these distributions in a hex plot. The commands are as follows:

In [4]: np.random.seed(42424242)
In [5]: x = stats.gamma(5).rvs(420)
        y = stats.gamma(13).rvs(420)
In [6]: with sns.axes_style("white"):
            sns.jointplot(x, y, kind="hex", size=16);

The generated graph is as follows:

Scatter plot matrix graphs with Pandas

In the second preview, we will use Pandas to graph a matrix of scatter plots whose diagonal holds statistical graphs representing the kernel density estimation of each variable. We're going to go easy on the details for now; this is just to whet your appetite for more!

Pandas is a statistical data analysis library for Python that provides high-performance data structures, allowing one to carry out an entire scientific computing workflow in Python (as opposed to having to switch to something like R or Fortran for parts of it).

Let's take the seven columns (inclusive) from the baseball.csv data file between Runs (r) and Stolen Bases (sb) for players between the years of 1871 and 2007 and look at them at the same time in one graph:

In [7]: baseball = pd.read_csv("../data/baseball.csv")
In [8]: plt.style.use('../styles/custom.mplstyle')
        data = pd.scatter_matrix(
            baseball.loc[:,'r':'sb'],
            figsize=(16,10))

The generated graph is as follows:

Command 8 will take a few seconds longer than our previous plot since it's crunching a lot of data.
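
Two details of command 8 are easy to miss: a label-based .loc slice includes both end columns, and recent Pandas releases expose the function as pandas.plotting.scatter_matrix rather than pd.scatter_matrix. The sketch below illustrates both on a small synthetic table. The column names and random values are hypothetical stand-ins for baseball.csv, and diagonal="kde" is passed explicitly here to request kernel density estimates on the diagonal (this relies on SciPy being installed):

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook

import numpy as np
import pandas as pd

# Hypothetical stand-in for baseball.csv: six columns of random values.
rng = np.random.default_rng(0)
frame = pd.DataFrame(
    rng.normal(size=(200, 6)),
    columns=["year", "r", "h", "hr", "sb", "cs"],
)

# Label-based slicing with .loc is inclusive at both ends,
# so this keeps 'r', 'h', 'hr', and 'sb'.
subset = frame.loc[:, "r":"sb"]

# One scatter plot per pair of columns, with KDE curves on the diagonal.
axes = pd.plotting.scatter_matrix(subset, figsize=(8, 6), diagonal="kde")
print(list(subset.columns), axes.shape)
```

The function returns a square array of Axes objects, one row and one column per selected variable, so the seven columns in the text yield a 7 × 7 grid of plots.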

For now, the plot may look like something only a sabermetrician could read, but by the end of this book, complex graph matrices will be only one of many advanced topics in matplotlib that will have you reaching for new heights.

One last teaser before we close out the chapter: you may have noticed that the plots for the baseball data took a while to generate. Imagine doing 1,000 of these. Or 1,000,000. Traditionally, that's a showstopper for matplotlib projects, but in the latter half of this book, we will cover material that will not only show you how to overcome that limit, but also offer you several options to make it happen.

It's going to be a wild ride.
