- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 783字
- 2021-07-14 10:51:26
Reading data from databases
As R reads data into memory, it is perfect for processing and analyzing small datasets. However, as an enterprise accumulates much more data than individuals in their daily lives, database documents are becoming more common for the purpose of storing and analyzing bigger data. To access databases with R, one can use RJDBC
, RODBC
, or RMySQL
as the communications bridge. In this section, we will demonstrate how to use RJDBC
to connect data stored in the database.
Getting ready
In this section, we need to prepare a MySQL environment first. If you have a MySQL environment installed on your machine (Windows), you can inspect server status from MySQL Notifier. If the local server is running, the server status should prompt localhost (Online), as shown in the following screenshot:

Figure 8: MySQL Notifier
Once we have our database server online, we need to validate whether we are authorized to access the database with a given username and password by using any database connection client. For example, you can use the MySQL command line client to connect to the database.
How to do it…
Please perform the following steps to connect R to MySQL with RJDBC
:
- First, we need to install and load the
RJDBC
package:> install.packages("RJDBC") > library(RJDBC)
- We can then download the JDBC driver for MySQL from the https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.0.8.zip link.
- Unzip the downloaded
mysql-connector-java-5.0.8.zip
and place the unzipped.jar
filemysql-connector-java-5.0.8-bin.jar
in the relevant location. For example, on the author's computer the extracted.jar
file was placed in theC:\Program Files\MySQL\
path. - Next, we can load the MySQL driver for the purpose of connecting to MySQL:
> drv <- JDBC("com.mysql.jdbc.Driver", + "C:\\Program Files\\MySQL\\mysql-connector-java-5.0.8-bin.jar" + )
- We can then establish a connection to MySQL using a registered MySQL driver:
> conn <- dbConnect(drv, + "jdbc:mysql://localhost:3306/finance", + "root", + "test")
- Use some basic operations to retrieve the table list from the connection:
> dbListTables(conn) [1] "majortrade"
- Obtain the data by using the
SELECT
operation:> trade_data <- dbGetQuery(conn, "SELECT * FROM majortrade")
- Finally, we can disconnect from MySQL:
> dbDisconnect(conn) [1] TRUE
How it works…
Two major standards that R can use to access databases are ODBC and JDBC. JDBC (also known as Java Database Connectivity) is made up of a set of Java implemented classes and interfaces, which enables communication between Java and the database. On the other hand, ODBC (also known as Open Database Connectivity) is a standard interface developed by Microsoft.
To compare these two standards, ODBC performs better in importing and exporting data; however, it is also platform dependent. In other words, before making your program work, you must configure the connectivity for different operating systems. In contrast, JDBC is platform independent, which means that your written program can work on any operating system.
To connect R to MySQL with RJDBC, we first need to install and load the RJDBC package from CRAN. RJDBC provides the ability to connect to a database through the JDBC interface. As JDBC is implemented in Java, you should install rJava
prior to RJDBC.
To proceed, we downloaded MySQL Connector/J, which is the official JDBC driver for MySQL, from the MySQL official download site. After extracting the .zip
file (or .tar
file), we placed the file in the proper file path (or you can add .jar
files to the classpath). We can now begin writing an R program to access the database.
In our R script, we first need to register and initialize the MySQL driver before issuing any query to the database. Here, we need to specify the driver's class name, com.mysql.jdbc.Driver
(different databases have different class names), and the .jar
file, mysql-connector-java-5.0.8-bin.jar
, where we can find the class. Next, we can establish the connection to the database with a registered driver. Here, we have to provide the connection string ("jdbc:mysql://localhost:3306/finance"
), the username ("root"
), and password ("test"
) to access the database. As our MySQL server is installed and running on localhost, we can craft the connection string into "jdbc:mysql://localhost:3306/finance"
. 3306
is the default port of MySQL and finance
is our target database.
After the connection is established, we can now issue some SQL queries to the database. We first list the table within the finance
database with the dbListTables
command. Next, we create the table named majortrade
, and insert records loaded from snp500.csv
to the majortrade
table with the insert
statement. We then obtain the data from the database using the select
statement. Finally, to release the connection, we need to disconnect by using the dbDisconnect
command.
There's more…
In R, you can also use RODBC
and RMySQL
to connect to a database. In this section, we illustrate how to access the database through RMySQL
. Follow the next few steps to install and load the RMySQL
package, and then issue some queries to the MySQL
database:
- First, we need to install and load the
RMySQL
package:> install.packages("RMySQL") > library(RMySQL)
- Next, we can access MySQL with a valid username and password:
> mydb <-dbConnect(MySQL(), user='root', password='test', host='localhost')
- At this point, we can send the query to the database and select trading data from the finance database:
> dbSendQuery(mydb, "USE finance") > fetch(dbSendQuery(mydb, "SELECT * FROM majortrade;"))