
Grouping data points within a scatter plot

A basic scatter plot shows a set of points, each placed at the intersection of its values along the x and y axes. Sometimes, we might wish to further distinguish between these points based on another value associated with them. In this recipe, we will learn how to group data points using colors.

Getting ready

To try out this recipe, start R and type the recipe in the command prompt. You can also choose to save the recipe as a script so that you can use it again later on.

We will also need the lattice and ggplot2 packages. The lattice package ships with standard R installations as a recommended package, so it only needs to be loaded, but we will need to install the ggplot2 package. To do this, run the following command in the R prompt:

install.packages("ggplot2")
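If the recipe is saved as a script and run repeatedly, it can be convenient to install ggplot2 only when it is missing. The following is a minimal sketch of that pattern, assuming a CRAN mirror is configured for the session; it is one common approach, not a required step of the recipe:

```r
# Install ggplot2 only if it is not already available in the library path
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Attach the package for use in the rest of the session
library(ggplot2)
```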

How to do it...

As a first example, let's use the xyplot() function from the lattice package:

library(lattice)

xyplot(mpg ~ disp,
       data = mtcars,
       groups = cyl,
       auto.key = list(corner = c(1, 1)))

How it works...

In this example, we used the xyplot() function to plot mpg versus disp from the preloaded mtcars dataset. In the formula mpg~disp, the variable on the left of the tilde (mpg) goes on the y axis and the variable on the right (disp) goes on the x axis. We will understand this better if we look at the actual dataset. Type mtcars in the R prompt and hit the Enter key to print all of it. Let's look at a sample of the data in order to see the row names and first three columns:

mtcars[1:6,1:3] 
                       mpg   cyl   disp
Mazda RX4             21.0     6    160
Mazda RX4 Wag         21.0     6    160
Datsun 710            22.8     4    108
Hornet 4 Drive        21.4     6    258
Hornet Sportabout     18.7     8    360
Valiant               18.1     6    225

So, we plotted mpg against disp, but we also used the groups argument to group the data points by cyl. This tells xyplot() to draw the data points in different colors based on the number of cylinders (cyl) each car has. Finally, the auto.key argument is set to add a legend so that we know which value of cyl each color represents. The auto.key argument can take a list of settings. The only one we have provided here is the location given by the corner component, which we set to c(1,1), representing the top-right corner inside the plotting area. We can also simply set auto.key to TRUE, which draws the legend in the top margin outside the plotting area.
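To compare legend placements, the variants described above can be sketched as follows. The object names p_top and p_right are only illustrative, and the space and columns components are shown as two commonly used auto.key settings rather than an exhaustive list:

```r
library(lattice)

# auto.key = TRUE: legend drawn in the top margin, outside the plotting area
p_top <- xyplot(mpg ~ disp, data = mtcars, groups = cyl, auto.key = TRUE)

# Legend placed to the right of the panel, laid out in a single column
p_right <- xyplot(mpg ~ disp, data = mtcars, groups = cyl,
                  auto.key = list(space = "right", columns = 1))

# lattice plots are objects; printing them draws them on the current device
print(p_top)
print(p_right)
```

Because xyplot() returns an object instead of drawing immediately, the plot appears only when the object is printed. At the prompt this happens automatically, but inside a script or function an explicit print() call is needed.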

There's more...

The xyplot() function has slightly obscure arguments. If you look at the help file on xyplot() (by running ?xyplot), you will see that there are a lot of arguments that can be used to control many different aspects of the graph. A simpler alternative to xyplot() is using the functions from the ggplot2 package. Let's draw the same plot using ggplot2:

library(ggplot2)
qplot(disp, mpg, data = mtcars, col = as.factor(cyl))

First, we load the ggplot2 package and then we use the qplot() function to create the graph. We passed disp and mpg as the x and y variables, respectively (note that we can't use the y~x notation in qplot()). To group by cyl, all we had to do was pass cyl, wrapped in as.factor(), to the col argument. This tells qplot() that we want to group the points based on the values of cyl and represent them by different colors. The legend is automatically drawn to the right.

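Note that we set col to as.factor(cyl) and not just cyl. This makes sure that cyl is read as a factor (that is, a categorical variable), so each of its three values (4, 6, and 8) gets its own distinct color. If we just use cyl, the points stay in the same positions, but qplot() treats cyl as a numerical variable, so the colors come from a continuous gradient and the legend covers the whole range between 4 and 8 rather than only the three values present in the data.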
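The difference between the two color mappings can be seen side by side with the sketch below. It uses the full ggplot() interface instead of qplot(), partly because qplot() is deprecated as of ggplot2 3.4.0 (it still works but emits a warning). The object names p_factor and p_numeric are only illustrative:

```r
library(ggplot2)

# cyl as a factor: three discrete colors, one legend entry each for 4, 6, and 8
p_factor <- ggplot(mtcars, aes(x = disp, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(colour = "cyl")

# cyl as a number: a continuous color gradient spanning the range of cyl
p_numeric <- ggplot(mtcars, aes(x = disp, y = mpg, colour = cyl)) +
  geom_point()

# Printing a ggplot object draws it on the current graphics device
print(p_factor)
print(p_numeric)
```

The first plot should match the qplot() output above, while the second shows the gradient legend that results from leaving cyl numeric.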

Thus, for a grouped plot like this one, ggplot2 makes it easier and more intuitive to produce a good-looking graph, with the legend handled for us.

See also

We will use ggplot2 to group data points by size and symbol instead of color in the next recipe.
