官术网_书友最值得收藏!

Base graphics and ggplot2

There are lots of user-contributed graphics packages in R that can produce some wonderful graphics. You may wish to take a look for yourself at the CRAN task view cran.r-project.org/web/views/Graphics.html. We will have a very quick look at two approaches: base graphics, so-called because it is the default graphical environment within a vanilla installation of R, and ggplot2, a highly popular user-contributed package produced by Hadley Wickham, which is a little trickier to master than base graphics, but can very rapidly produce a wide range of graphical data summaries. We will cover two graphs familiar to all, the bar chart and the line chart.

Bar chart

Useful when comparing quantities across categories, bar charts are very simple within base graphics, particularly when combined with the table() command. We will use the mpg dataset which comes with the ggplot2 package; it summarizes different characteristics of a range of cars. First, let's install the ggplot2 package. You can do this straight from the console:

> install.packages("ggplot2")

Alternatively, you can use the built in package functions in IDEs like RStudio or RKWard. We'll need to load the package at the beginning of each session in which we wish to use this dataset or the ggplot2 package itself. From the console type the following command:

> library(ggplot2)

We will use the table() command to count the number of each type of car featured in the dataset:

> table(mpg$class)

This returns a table object (another special object type within R) that contains the following columns shown in the screenshot:

Producing a bar chart of this object is achieved simply like this:

> barplot(table(mpg$class), main = "Base graphics")

The barplot function takes a vector of frequencies. Where they are named, as here (the table()command returning named frequencies in table form), names are automatically included on the x axis. The defaults for this graph are rather plain; explore ?barplot and ?par to learn more about fine-tuning your graphics.

We've already loaded the ggplot2 package in order to use the mpg dataset, but if you have shut down R in between these two examples you will need to reload it by using the following command:

> library(ggplot2)

The same graph is produced in ggplot2 as follows:

> ggplot(data = mpg, aes(x = class)) + geom_bar() + ggtitle("ggplot2")

This ggplot call shows the three fundamental elements of ggplot calls—the use of a dataframe (data = mpg), the setup of aesthetics (aes(x = class)), which determines how variables are mapped onto axes, colors, and other visual features, and the use of + geom_xxx(). A ggplot call sets up the data and aesthetics, but does not plot anything. Functions such as geom_bar() (there are many others, see ??geom) tell ggplot what type of graph to plot, as well as taking optional arguments, for example, geom_bar() optionally takes a position argument which defines whether the bars should be stacked, offset, or stretched to a common height to show proportions instead of frequencies.

These elements are the key to the power and flexibility that ggplot2 offers. Once the data structure is defined, ways of visualizing that data structure can be added and taken away easily, not only in terms of the type of graphic (bar, line, or scatter) but also the scales and co-ordinate system (log10; polar coordinates) and statistical transformations (smoothing data, summarizing over spatial co-ordinates). The appearance of plots can be easily changed with pre-set and user-defined themes, and multiple plots can be added in layers (that is, adding to one plot) or facets (that is, drawing multiple plots with one function call).

Line chart

Line charts are most often used to indicate change, particularly over time. This time we will use the longley dataset, featuring economic variables between 1947 and 1962:

> plot(x = 1947 : 1962, y = longley$GNP, type = "l",
 xlab = "Year", main = "Base graphics")

The x axis is given very simply by 1947 : 1962, which enumerates all the numbers between 1947 and 1962, and the type = "l" argument specifies the plotting of lines, as opposed to points or both.

The ggplot call looks a lot like the bar chart except with an x and y dimension in the aesthetics this time. The command looks as follows:

> ggplot(longley, aes(x = 1947 : 1962, y = GNP)) + geom_line() +
 xlab("Year") + ggtitle("ggplot2")

Base graphics and ggplot versions of the bar chart are shown in the following screenshot for the purposes of comparison:

主站蜘蛛池模板: 保康县| 兰西县| 武川县| 巴中市| 交口县| 外汇| 崇礼县| 揭西县| 平江县| 建平县| 麻栗坡县| 哈密市| 元阳县| 常德市| 淳化县| 永康市| 玛曲县| 榆树市| 密山市| 辽中县| 保靖县| 福海县| 吉林省| 昌宁县| 乌恰县| 黄大仙区| 乌拉特中旗| 华容县| 远安县| 驻马店市| 昌宁县| 昭通市| 仁寿县| 泸定县| 昌图县| 陈巴尔虎旗| 青冈县| 海城市| 同江市| 乌鲁木齐县| 晋宁县|