- Big Data Analysis with Python
- Ivan Marin Ankit Shukla Sarang VK
- 2021-06-11 13:46:39
Types of Graphs and When to Use Them
Every analysis, whether on small or large datasets, involves a descriptive statistics step, where the data is summarized and described by statistics such as mean, median, percentages, and correlation. This step is commonly the first step in the analysis workflow, allowing a preliminary understanding of the data and its general patterns and behaviors, providing grounds for the analyst to formulate hypotheses, and directing the next steps in the analysis. Graphs are powerful tools to aid in this step, enabling the analyst to visualize the data, create new views and concepts, and communicate them to a larger audience.
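As a rough illustration of this descriptive step, the sketch below computes a mean, a median, a percentage, and a correlation with pandas. The dataset is synthetic and the column names (height_cm, weight_kg) are hypothetical, chosen only for this example; they are not part of the book's data.

```python
import numpy as np
import pandas as pd

# Synthetic data, generated only for illustration (not from the book).
rng = np.random.default_rng(seed=0)
height_cm = rng.normal(170, 10, size=200)
weight_kg = 0.5 * height_cm + rng.normal(0, 5, size=200)
df = pd.DataFrame({"height_cm": height_cm, "weight_kg": weight_kg})

# Mean and median of every column in one table.
summary = df.agg(["mean", "median"])

# Percentage of rows above a threshold (the threshold is an arbitrary choice).
share_tall = (df["height_cm"] > 180).mean() * 100

# Pearson correlation between the two columns.
corr = df["height_cm"].corr(df["weight_kg"])

print(summary)
print(f"Taller than 180 cm: {share_tall:.1f}%")
print(f"Correlation: {corr:.2f}")
```

Numbers like these give a first impression of the data. A graph of the same columns would then show whether the summary hides anything, such as outliers or a non-linear relationship.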
There is a vast body of literature on visualizing information. The classic book Envisioning Information, by Edward Tufte, demonstrates beautiful and useful examples of how to present information in graphical form. In another book, The Visual Display of Quantitative Information, Tufte enumerates a few qualities that any graph used for analysis and for transmitting information, including statistics, should have:
- Show the data
- Avoid distorting what the data has to say
- Make large datasets coherent
- Serve a reasonably clear purpose—description, exploration, tabulation, or decoration
Graphs must reveal information. We should keep these principles in mind whenever we create graphs for an analysis.
A graph should also be able to stand on its own, outside the analysis. Suppose you are writing an analysis report that grows extensive, and you now need to summarize it. A graph can make the points of the analysis clear, but it must support the summary without the full report behind it. For the graph to carry enough information to stand alone in the summary, we have to add context to it, such as a title and labels.
Exercise 8: Plotting an Analytical Function
In this exercise, we will create a basic plot using the Matplotlib library to visualize a relationship between two variables, y = f(x), where f(x) = x^2:
- First, create a new Jupyter notebook and import all the required libraries:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
- Now, let's generate a dataset using the following code:
x = np.linspace(-50, 50, 100)
y = np.power(x, 2)
- Use the following command to create a basic graph with Matplotlib:
plt.plot(x, y)
The output is as follows:
Figure 2.1: Basic plot of y = x^2
- Now, change the data generation function from x^2 to x^3, keeping the same interval of [-50, 50], and recreate the line plot:
y_hat = np.power(x, 3)
plt.plot(x, y_hat)
The output is as follows:

Figure 2.2: Basic plot of y = x^3
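The steps of this exercise can also be collected into one self-contained cell. The sketch below is a variation, not the book's code: it draws both functions side by side using plt.subplots, and the axes names ax_square and ax_cube and the figure size are choices made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same sampling as in the exercise: 100 evenly spaced points on [-50, 50].
x = np.linspace(-50, 50, 100)
y = np.power(x, 2)      # f(x) = x^2
y_hat = np.power(x, 3)  # f(x) = x^3

# One figure with two panels, so the two shapes can be compared directly.
fig, (ax_square, ax_cube) = plt.subplots(1, 2, figsize=(10, 4))
ax_square.plot(x, y)     # parabola, symmetric about x = 0
ax_cube.plot(x, y_hat)   # cubic, changes sign at x = 0
```

In a notebook with %matplotlib inline, the figure renders when the cell finishes. In a plain script, calling plt.show() or fig.savefig(...) afterwards would be needed to see it.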
As you can see, the shape of the function changed, as expected. The basic type of graph that we used was sufficient to show the difference between the y and y_hat values. But some questions remain. We plotted only a mathematical function, whereas the data that we collect generally has dimensions, such as length, time, and mass. How can we add this information to the plot? How do we add a title? Let's explore this in the next section.