- Scala for Data Science
- Pascal Bugnion
- 1575字
- 2021-07-23 14:33:09
First steps with JDBC
Let's start by connecting to JDBC from the command line. To follow with the examples, you will need access to a running MySQL server. If you added the MySQL connector to the list of dependencies, open a Scala console by typing the following command:
$ sbt console
Let's import JDBC:
scala> import java.sql._ import java.sql._
We then need to tell JDBC to use a specific connector. This is normally done using reflection, loading the driver at runtime:
scala> Class.forName("com.mysql.jdbc.Driver") Class[_] = class com.mysql.jdbc.Driver
This loads the appropriate driver into the namespace at runtime. If this seems somewhat magical to you, it's probably not worth worrying about exactly how this works. This is the only example of reflection that we will consider in this book, and it is not particularly idiomatic Scala.
Connecting to a database server
Having specified the SQL connector, we can now connect to a database. Let's assume that we have a database called test
on host 127.0.0.1
, listening on port 3306
. We create a connection as follows:
scala> val connection = DriverManager.getConnection( "jdbc:mysql://127.0.0.1:3306/test", "root", // username when connecting "" // password ) java.sql.Connection = com.mysql.jdbc.JDBC4Connection@12e78a69
The first argument to getConnection
is a URL-like string with jdbc:mysql://host[:port]/database
. The second and third arguments are the username and password. Pass in an empty string if you can connect without a password.
Creating tables
Now that we have a database connection, let's interact with the server. For these examples, you will find it useful to have a MySQL shell open (or a MySQL GUI such as MySQLWorkbench) as well as the Scala console. You can open a MySQL shell by typing the following command in a terminal:
$ mysql
As an example, we will create a small table to keep track of famous physicists. In a mysql
shell, we would run the following command:
mysql> USE test; mysql> CREATE TABLE physicists ( id INT(11) AUTO_INCREMENT PRIMARY KEY, name VARCHAR(32) NOT NULL );
To achieve the same with Scala, we send a JDBC statement to the connection:
scala> val statementString = """ CREATE TABLE physicists ( id INT(11) AUTO_INCREMENT PRIMARY KEY, name VARCHAR(32) NOT NULL ) """ scala> val statement = connection.prepareStatement(statementString) PreparedStatement = JDBC4PreparedStatement@c983201: CREATE TABLE ... scala> statement.executeUpdate() results: Int = 0
Let's ignore the return value of executeUpdate
for now.
Inserting data
Now that we have created a table, let's insert some data into it. We can do this with a SQL INSERT
statement:
scala> val statement = connection.prepareStatement(""" INSERT INTO physicists (name) VALUES ('Isaac Newton') """) scala> statement.executeUpdate() Int = 1
In this case, executeUpdate
returns 1
. When inserting rows, it returns the number of rows that were inserted. Similarly, if we had used a SQL UPDATE
statement, this would return the number of rows that were updated. For statements that do not manipulate rows directly (such as the CREATE TABLE
statement in the previous section), executeUpdate
just returns 0
.
Let's just jump into a mysql
shell to verify the insertion performed correctly:
mysql> select * from physicists ; +----+--------------+ | id | name | +----+--------------+ | 1 | Isaac Newton | +----+--------------+ 1 row in set (0.00 sec)
Let's quickly summarize what we have seen so far: to execute SQL statements that do not return results, use the following:
val statement = connection.prepareStatement("SQL statement string") statement.executeUpdate()
In the context of data science, we frequently need to insert or update many rows at a time. For instance, we might have a list of physicists:
scala> val physicistNames = List("Marie Curie", "Albert Einstein", "Paul Dirac")
We want to insert all of these into the database. While we could create a statement for each physicist and send it to the database, this is quite inefficient. A better solution is to create a batch of statements and send them to the database together. We start by creating a statement template:
scala> val statement = connection.prepareStatement(""" INSERT INTO physicists (name) VALUES (?) """) PreparedStatement = JDBC4PreparedStatement@621a8225: INSERT INTO physicists (name) VALUES (** NOT SPECIFIED **)
This is identical to the previous prepareStatement
calls, except that we replaced the physicist's name with a ?
placeholder. We can set the placeholder value with the statement.setString
method:
scala> statement.setString(1, "Richard Feynman")
This replaces the first placeholder in the statement with the string Richard Feynman
:
scala> statement com.mysql.jdbc.JDBC4PreparedStatement@5fdd16c3: INSERT INTO physicists (name) VALUES ('Richard Feynman')
Note that JDBC, somewhat counter-intuitively, counts the placeholder positions from 1 rather than 0.
We have now created the first statement in the batch of updates. Run the following command:
scala> statement.addBatch()
By running the preceding command, we initiate a batch insert: the statement is added to a temporary buffer that will be executed when we run the executeBatch
method. Let's add all the physicists in our list:
scala> physicistNames.foreach { name => statement.setString(1, name) statement.addBatch() }
We can now execute all the statements in the batch:
scala> statement.executeBatch Array[Int] = Array(1, 1, 1, 1)
The return value of executeBatch
is an array of the number of rows altered or inserted by each item in the batch.
Note that we used statement.setString
to fill in the template with a particular name. The PreparedStatement
object has setXXX
methods for all basic types. To get a complete list, read the PreparedStatement
API documentation (http://docs.oracle.com/javase/7/docs/api/java/sql/PreparedStatement.html).
Reading data
Now that we know how to insert data into a database, let's look at the converse: reading data. We use SQL SELECT
statements to query the database. Let's do this in the MySQL shell first:
mysql> SELECT * FROM physicists; +----+-----------------+ | id | name | +----+-----------------+ | 1 | Isaac Newton | | 2 | Richard Feynman | | 3 | Marie Curie | | 4 | Albert Einstein | | 5 | Paul Dirac | +----+-----------------+ 5 rows in set (0.01 sec)
To extract this information in Scala, we define a PreparedStatement
:
scala> val statement = connection.prepareStatement(""" SELECT name FROM physicists """) PreparedStatement = JDBC4PreparedStatement@3c577c9d: SELECT name FROM physicists
We execute this statement by running the following command:
scala> val results = statement.executeQuery() results: java.sql.ResultSet = com.mysql.jdbc.JDBC4ResultSet@74a2e158
This returns a JDBC ResultSet
instance. The ResultSet
is an abstraction representing a set of rows from the database. Note that we used statement.executeQuery
rather than statement.executeUpdate
. In general, one should execute statements that return data (in the form of ResultSet
) with executeQuery
. Statements that modify the database without returning data (insert, create, alter, or update statements, among others) are executed with executeUpdate
.
The ResultSet
object behaves somewhat like an iterator. It exposes a next
method that advances itself to the next record, returning true
if there are records left in ResultSet
:
scala> results.next // Advance to the first record Boolean = true
When the ResultSet
instance points to a record, we can extract fields in this record by passing in the field name:
scala> results.getString("name") String = Isaac Newton
We can also extract fields using positional arguments. The fields are indexed from one:
scala> results.getString(1) // first positional argument String = Isaac Newton
When we are done with a particular record, we call the next
method to advance the ResultSet
to the next record:
scala> results.next // advances the ResultSet by one record Boolean = true scala> results.getString("name") String = Richard Feynman

A ResultSet object supports the getXXX(fieldName) methods to access the fields of a record and a next
method to advance to the next record in the result set.
One can iterate over a result set using a while
loop:
scala> while(results.next) { println(results.getString("name")) } Marie Curie Albert Einstein Paul Dirac
Tip
A word of warning applies to reading fields that are nullable. While one might expect JDBC to return null when faced with a null SQL field, the return type depends on the getXXX
command used. For instance, getInt
and getLong
will return 0
for any field that is null. Similarly, getDouble
and getFloat
return 0.0
. This can lead to some subtle bugs in code. In general, one should be careful with getters that return Java value types (int
, long
) rather than objects. To find out if a value is null
in the database, query it first with getInt
(or getLong
or getDouble
, as appropriate), then use the wasNull
method that returns a Boolean if the last read value was null:
scala> rs.getInt("field") 0 scala> rs.wasNull // was the last item read null? true
This (surprising) behavior makes reading from ResultSet
instances error-prone. One of the goals of the second part of this chapter is to give you the tools to build an abstraction layer on top of the ResultSet
interface to avoid having to call methods such as getInt
directly.
Reading values directly from ResultSet
objects feels quite unnatural in Scala. We will look, further on in this chapter, at constructing a layer through which you can access the result set using type classes.
We now know how to read and write to a database. Having finished with the database for now, we close the result sets, prepared statements, and connections:
scala> results.close scala> statement.close scala> connection.close
While closing statements and connections is not important in the Scala shell (they will get closed when you exit), it is important when you run programs; otherwise, the objects will persist, leading to "out of memory exceptions". In the next sections, we will look at establishing connections and statements with the loan pattern, a design pattern that closes a resource automatically when we finish using it.
- Go Web編程
- C#高級編程(第10版) C# 6 & .NET Core 1.0 (.NET開發經典名著)
- Spring 5.0 Microservices(Second Edition)
- Delphi程序設計基礎:教程、實驗、習題
- R語言游戲數據分析與挖掘
- Learning ArcGIS Pro
- Building Cross-Platform Desktop Applications with Electron
- HTML5+CSS3網頁設計
- Spring Boot企業級項目開發實戰
- NGINX Cookbook
- Mastering Git
- 新印象:解構UI界面設計
- RubyMotion iOS Develoment Essentials
- Mastering Apache Camel
- 視窗軟件設計和開發自動化:可視化D++語言