Grouping data points within a scatter plot

A basic scatter plot has a set of points plotted at the intersection of their values along x and y axes. Sometimes, we might wish to further distinguish between these points based on another value associated with the points. In this recipe, we will learn how we can group data points using colors.

Getting ready

To try out this recipe, start R and type the recipe in the command prompt. You can also choose to save the recipe as a script so that you can use it again later on.

We will also need the lattice and ggplot2 packages. The lattice package ships with standard R distributions as a recommended package, so it only needs to be loaded, but we will need to install the ggplot2 package. To do this, run the following command at the R prompt:

install.packages("ggplot2")

How to do it...

As a first example, let's use the xyplot() function from the lattice package:

library(lattice)

xyplot(mpg ~ disp,
       data = mtcars,
       groups = cyl,
       auto.key = list(corner = c(1, 1)))

How it works...

In this example, we used the xyplot() function to plot mpg versus disp from the preloaded mtcars dataset. The formula mpg~disp puts mpg on the y axis and disp on the x axis. We will understand this better if we look at the actual dataset. Type mtcars at the R prompt and hit the Enter key to see all of it. Let's look at a sample of the data that shows the row names and the first three columns:

mtcars[1:6,1:3] 
                       mpg   cyl   disp
Mazda RX4             21.0     6    160
Mazda RX4 Wag         21.0     6    160
Datsun 710            22.8     4    108
Hornet 4 Drive        21.4     6    258
Hornet Sportabout     18.7     8    360
Valiant               18.1     6    225
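
Before grouping, it can help to check which distinct values the grouping variable takes. The following is a small sketch that uses the base R table() function on the built-in mtcars data; the variable name cyl_counts is only chosen for illustration.

```r
# Count how many cars fall into each cylinder group in the built-in mtcars data
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# The names of the table are the distinct group values; each one gets its own color
print(names(cyl_counts))
```

The cyl column takes only three distinct values (4, 6, and 8), so a plot grouped by cyl shows three colors.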

So, we plotted mpg against disp, but we also used the groups argument to group the data points by cyl. This tells xyplot() to draw the data points in different colors according to the number of cylinders (cyl) each car has. Finally, the auto.key argument adds a legend so that we know which value of cyl each color represents. The auto.key argument can take a list of settings. The only one we have provided here is the location given by the corner component, which we set to c(1,1), representing the top-right corner of the plotting area. We can also simply set auto.key to TRUE, which draws the legend in the top margin outside the plotting area.
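
To compare legend placements, the following sketch draws the same grouped plot with two other auto.key settings. It assumes the space, title, and columns components of the legend list; the object names p_top and p_right are only illustrative.

```r
library(lattice)

# Legend in the top margin, outside the plotting area (placement used by auto.key = TRUE)
p_top <- xyplot(mpg ~ disp, data = mtcars, groups = cyl, auto.key = TRUE)

# Legend in the right margin, with a title and a single column of entries
p_right <- xyplot(mpg ~ disp, data = mtcars, groups = cyl,
                  auto.key = list(space = "right", title = "cyl", columns = 1))

# lattice plots are objects; print() draws them (needed inside scripts and functions)
print(p_top)
print(p_right)
```

Storing the plot in a variable and printing it explicitly is useful when the recipe is saved as a script, because lattice plots are not drawn automatically inside functions or sourced files.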

There's more...

The xyplot() function has a large number of arguments, and some of them are not obvious. If you look at the help file on xyplot() (by running ?xyplot), you will see that many of them control fine details of the graph. An alternative to xyplot() that many users find simpler is the set of functions in the ggplot2 package. Let's draw the same plot using ggplot2:

library(ggplot2)
qplot(disp, mpg, data = mtcars, col = as.factor(cyl))

First, we load the ggplot2 package and then use the qplot() function to create the graph. We passed disp and mpg as the x and y variables, respectively (note that we can't use the y~x notation in qplot). To group by cyl, all we had to do was set the col argument to as.factor(cyl). This tells qplot to color the points according to the values of cyl. The legend is automatically drawn to the right.
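
Newer ggplot2 releases flag qplot() as deprecated, so it may be useful to know the equivalent call written with ggplot(). The following is a minimal sketch rather than part of the original recipe; the labs() call and the object name p are our own additions.

```r
library(ggplot2)

# Same scatter plot built with ggplot(): disp on x, mpg on y,
# point color mapped to cyl treated as a factor
p <- ggplot(mtcars, aes(x = disp, y = mpg, colour = as.factor(cyl))) +
  geom_point() +
  labs(colour = "cyl")  # shorter legend title than the default "as.factor(cyl)"

print(p)
```

Here, the mapping of variables to x, y, and color is stated explicitly inside aes(), and geom_point() asks for the points to be drawn.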

Note that we set col to as.factor(cyl) and not just cyl. This ensures that cyl is treated as a factor (a categorical variable). If we pass cyl as it is, the points are drawn in the same positions, but qplot treats cyl as a numeric variable, so the colors come from a continuous gradient and the legend becomes a color bar covering the whole range from 4 to 8.
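
The difference between the numeric and factor versions of cyl can be checked directly. The following sketch uses ggplot() rather than qplot() and builds both variants side by side; the object names cyl_factor, p_numeric, and p_factor are only illustrative.

```r
library(ggplot2)

# cyl is stored as a number in mtcars
print(class(mtcars$cyl))

# Converting it to a factor yields three discrete levels
cyl_factor <- as.factor(mtcars$cyl)
print(levels(cyl_factor))

# Numeric mapping: continuous color gradient and a color-bar legend
p_numeric <- ggplot(mtcars, aes(disp, mpg, colour = cyl)) + geom_point()

# Factor mapping: one distinct color and one legend entry per level
p_factor <- ggplot(mtcars, aes(disp, mpg, colour = as.factor(cyl))) + geom_point()

print(p_numeric)
print(p_factor)
```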

Thus, ggplot2 lets us produce a grouped plot with a legend in a single short call, with sensible default colors and legend placement.

See also

We will use ggplot2 to group data points by size and symbol instead of color in the next recipe.
