- R Data Visualization Cookbook
- Atmajitsinh Gohil
- 2021-08-06 19:21:09
Introducing a scatter plot
Scatter plots are used primarily to conduct a quick analysis of the relationships among different variables in our data. A scatter plot simply draws one point per observation, with one variable on the x-axis and another on the y-axis. Scatter plots help us detect whether two variables have a positive, negative, or no relationship. In this recipe, we will study the basics of plotting in R using scatter plots. The following screenshot is an example of a scatter plot:

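To make the three kinds of relationship concrete, here is a minimal sketch on simulated data (an assumption for illustration; it does not use Carseats). It draws one scatter plot per case and summarizes each with the correlation coefficient from cor().

```r
# Simulated data: an assumption for illustration, not the Carseats data
set.seed(42)
x <- rnorm(100)
y_pos  <- 2 * x + rnorm(100)   # rises with x: positive relationship
y_neg  <- -2 * x + rnorm(100)  # falls as x rises: negative relationship
y_none <- rnorm(100)           # generated independently of x: no relationship

# Draw the three cases side by side in one window
op <- par(mfrow = c(1, 3))
plot(x, y_pos,  pch = 20, main = "Positive")
plot(x, y_neg,  pch = 20, main = "Negative")
plot(x, y_none, pch = 20, main = "No relationship")
par(op)  # restore the previous layout

# Correlation coefficients summarize the direction of each relationship
round(c(positive = cor(x, y_pos),
        negative = cor(x, y_neg),
        none     = cor(x, y_none)), 2)
```

A clearly positive or negative cloud of points corresponds to a correlation near +1 or -1, while a shapeless cloud gives a value near 0.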
Getting ready
To implement a basic scatter plot in R, we will use the Carseats data available in the ISLR package.
How to do it…
We will also start this recipe by installing the necessary package with the install.packages() function and loading it in R with the library() function:
install.packages("ISLR")
library(ISLR)
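Calling install.packages() every time the script runs downloads the package again. One common pattern, sketched below with a hypothetical helper named ensure_package() (not part of ISLR or base R), installs a package only when requireNamespace() reports that it is missing, and then loads it.

```r
# Hypothetical helper (not part of ISLR or base R):
# install a package only if it is missing, then attach it.
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE instead of raising an error when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name stored in pkg
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}
```

For this recipe, the single call ensure_package("ISLR") would do the work of the two lines above.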
Next, we need to load the data in R. Many R packages, including ISLR, ship with datasets, and these datasets become available only after the package has been loaded with library(). We can view the list of datasets in the loaded packages, along with the package each one belongs to, by typing data() in the R console window. The attach() function then adds the data to the R search path for our session, which allows us to access its variables (columns) directly by name:
attach(Carseats)
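To list the datasets of one particular package rather than of every loaded package, data() accepts a package argument. The sketch below assumes the "datasets" package (part of every R installation) so that it runs without ISLR; data(package = "ISLR") works the same way once ISLR is installed.

```r
# "datasets" ships with every R installation, so this runs anywhere;
# data(package = "ISLR") lists ISLR's datasets in the same way.
info <- data(package = "datasets")

# info$results is a character matrix; its "Item" column holds dataset names
items <- info$results[, "Item"]
head(items)
c("mtcars", "iris") %in% items
```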
Once we attach the data, it's good practice to view it using head(Carseats). The head() function displays the first six rows of the dataset and shows us the exact column headings of the data:
head(Carseats)
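The sketch below shows head() and its companion tail() with an explicit row count. It assumes the built-in mtcars data frame as a stand-in for Carseats, so it runs even if ISLR is not installed.

```r
# mtcars ships with base R; it stands in for Carseats in this sketch
first6 <- head(mtcars)     # first six rows (the default n = 6)
first3 <- head(mtcars, 3)  # second argument: number of rows to show
last6  <- tail(mtcars)     # last six rows
names(mtcars)              # column headings only
dim(first3)                # 3 rows, 11 columns
```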
The data can be plotted in R by calling the plot() function. The plot() function comes with a variety of options, and the best way to learn about them is to type ?plot in the R console window:
plot(Income, Sales, col = c("black", "red")[Urban], pch = 20,
     main = "Sales of Child Car Seats",
     xlab = "Income (000's of Dollars)", ylab = "Unit Sales (in 000's)")
Because the points are drawn in two different colors, this plot needs a legend that explains what each color represents. In R, we can add a legend using the legend() function:
legend("topright",cex = 0.6, fill = c("red","black"), legend = c("Yes","No"))
How it works…
Readers who are new to R should definitely read the recipe Installing packages and getting help in R in Chapter 1, A Simple Guide to R. The install.packages() and library() functions are used in most of the recipes in this book.
The attach() function is a convenient way to reference the data, as it allows us to avoid the $ notation (for example, Carseats$Sales). The $ notation is another way to reference columns in data and is discussed in the next recipe. The head() function, which we used to preview the data after attaching it, takes the data as its first argument. To view fewer lines in the R console window, we can pass the number of rows as a second argument, as in head(Carseats, 3). Similarly, tail(Carseats) displays the last six entries of the dataset.
The first two arguments of the plot() function give the data displayed on the x-axis (Income) and the y-axis (Sales). The col argument assigns colors to the data points. In this case, we would like to use a qualitative data column (Urban) to color our points, so that stores in urban and rural locations are drawn in different colors. The default point color in R is black, but we can change it with an argument such as col = "blue". Please refer to the code file to learn about various other options. The pch argument selects the plotting symbol; pch = 20 draws small filled circles. To view all the available pch values, type ?points (or ?par) in the R console window. The main argument sets the title of the plot, for example main = "Sales", and the xlab and ylab arguments label the x and y axes.
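One way to turn a qualitative column into point colors is to index a color vector by the factor: R treats a factor index as its integer codes, so each level picks one color. The sketch below assumes a hypothetical data frame toy with made-up values and an Urban-like factor, and draws to a null PDF device so that no window or file is created.

```r
# Hypothetical stand-in for Carseats with an Urban-like factor (made-up values)
toy <- data.frame(
  Income = c(30, 55, 80, 45, 70, 95),
  Sales  = c(5.1, 7.3, 9.8, 6.0, 8.2, 10.4),
  Urban  = factor(c("No", "Yes", "Yes", "No", "Yes", "No"))
)

# Levels are sorted alphabetically ("No", "Yes"), so the factor indexes the
# color vector with codes 1 and 2: No -> "black", Yes -> "red"
point_cols <- c("black", "red")[toy$Urban]

pdf(NULL)  # null device: draw without writing a file
plot(toy$Income, toy$Sales, col = point_cols, pch = 20,
     main = "Sales", xlab = "Income", ylab = "Sales")
invisible(dev.off())

data.frame(Urban = toy$Urban, color = point_cols)
```

Spelling out the colors this way makes it easy to match them exactly in the legend.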
A legend is necessary for this scatter plot because we would like to differentiate between sales in urban and rural areas. The first argument of the legend() function sets the position of the legend, either as a keyword such as "topright" or "bottomleft" or as x and y coordinates. The cex argument sets the size of the text; its default value is 1, so cex = 0.6 shrinks the legend text. The fill argument fills the boxes with the specified colors, and the legend argument applies a label to each of the boxes.
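Typing the labels and fill colors by hand works, but they can drift out of step with the points if the data change. A minimal sketch of one alternative derives both the point colors and the legend from the same factor levels and palette. The vectors urban, income, and sales and the palette pal are hypothetical, with made-up values.

```r
# Hypothetical data: an Urban-like factor with made-up values
urban  <- factor(c("No", "Yes", "Yes", "No", "Yes"))
income <- c(30, 55, 80, 45, 70)
sales  <- c(5.1, 7.3, 9.8, 6.0, 8.2)

# One color per level, in level order ("No", "Yes")
pal <- c("black", "red")

pdf(NULL)  # null device so the sketch runs without opening a window
plot(income, sales, col = pal[urban], pch = 20)
# Labels come from levels(urban) and fills from pal, the same objects
# used to color the points, so the legend always matches the plot
leg <- legend("topright", cex = 0.6, fill = pal, legend = levels(urban),
              title = "Urban")
invisible(dev.off())

leg$text$x  # legend() invisibly returns the positions it used
```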