- Learning Concurrent Programming in Scala
- Aleksandar Prokopec
Processes and Threads
In modern, pre-emptive, multitasking operating systems, the programmer has little or no control over the choice of the processor on which the program will be executed. In fact, the same program might run on many different processors during its execution and sometimes even simultaneously on several processors. It is usually the task of the Operating System (OS) to assign executable parts of the program to specific processors—this mechanism is called multitasking, and it happens transparently for the computer user.
Historically, multitasking was introduced to operating systems to improve the user experience by allowing multiple users or programs to use the resources of the same computer simultaneously. In cooperative multitasking, programs were able to decide when to stop using the processor and yield control to other programs. However, this required a lot of discipline on the programmer's part, and programs could easily give the impression of being unresponsive. For example, a download manager that starts downloading a file must take care to periodically yield control to other programs; blocking the execution until a download finishes would completely ruin the user experience. Most operating systems today rely on pre-emptive multitasking, in which each program is repeatedly assigned slices of execution time on a specific processor. These slices are called time slices. Thus, multitasking happens transparently for the application programmer as well as the user.
The same computer program can be started more than once, or even simultaneously within the same OS. A process is an instance of a computer program that is being executed. When a process starts, the OS reserves a part of the memory and other computational resources, and associates them with a specific computer program. The OS then associates the processor with the process, and the process executes during one time slice. Eventually, the OS gives other processes control over the processor. Importantly, the memory and other computational resources of one process are isolated from the other processes: two processes cannot read each other's memory directly or simultaneously use most of the resources.
Most programs consist of a single process, but some programs run in multiple processes. In that case, different tasks within the program are expressed as separate processes. Since separate processes cannot access the same memory areas directly, it can be cumbersome to express multitasking using multiple processes.
Multitasking was important long before multicore computers became mainstream. Large programs such as web browsers are divided into many logical modules. A browser's download manager downloads files independently of rendering the web page or updating the HTML Document Object Model (DOM). While the user is browsing a social networking website, the file download proceeds in the background; but both independent computations occur as part of the same process. These independent computations occurring in the same process are called threads. In a typical operating system, there are many more threads than processors.
Every thread describes the current state of the program stack and the program counter during program execution. The program stack contains a sequence of method invocations that are currently being executed, along with the local variables and method parameters of each method. The program counter describes the position of the current instruction in the current method. A processor can advance the computation in some thread by manipulating the state of its stack or the state of the program objects, and by executing the instruction at the current program counter. When we say that a thread performs an action, such as writing to a memory location, we mean that the processor executing that thread performs that action. In pre-emptive multitasking, thread execution is scheduled by the operating system, and a programmer must not assume that the processor time assigned to their thread is biased in its favor relative to the other threads in the system.
OS threads are a programming facility provided by the OS, usually exposed through an OS-specific programming interface. Unlike separate processes, separate OS threads within the same process share a region of memory, and communicate by writing to and reading parts of that memory. Another way to define a process is to define it as a set of OS threads along with the memory and resources shared by these threads.
Based on the preceding discussion about the relationships between processes and threads, a summary of a typical OS is depicted in the following simplified illustration:

The preceding illustration shows an OS in which multiple processes are executing simultaneously. Only the first three processes are shown in the illustration. Each process is assigned a fixed region of computer memory. In practice, the memory system of the OS is much more complex, but this approximation serves as a simple mental model.
Each of the processes contains multiple OS threads, two of which are shown for each process. Currently, Thread 1 of Process 2 is executing on CPU Core 1, and Thread 2 of Process 3 is executing on CPU Core 2. The OS periodically assigns different OS threads to each of the CPU cores to allow the computation to progress in all the processes.
Having shown the relationship between the OS threads and processes, we turn our attention to see how these concepts relate to the Java Virtual Machine (JVM), the runtime on top of which Scala programs execute.
Starting a new JVM instance always creates only one process. Within the JVM process, multiple threads can run simultaneously. The JVM represents its threads with the java.lang.Thread class. Unlike runtimes for languages such as Python, the JVM does not implement its own custom threads. Instead, each Java thread is directly mapped to an OS thread. This means that Java threads behave in a very similar way to OS threads, and that the JVM depends on the OS and its restrictions.
Scala is a programming language that is by default compiled to the JVM bytecode, and the Scala compiler output is largely equivalent to that of Java from the JVM's perspective. This allows Scala programs to transparently call Java libraries, and in some cases, even vice versa. Scala reuses the threading API from Java for several reasons. First, Scala can transparently interact with the existing Java thread model, which is already sufficiently comprehensive. Second, it is useful to retain the same threading API for compatibility reasons, and there is nothing fundamentally new that Scala can introduce with respect to the Java thread API.
The rest of this section shows how to create JVM threads using Scala, how they can be executed, and how they can communicate. We will show and discuss several concrete examples. Java aficionados, already well-versed in this subject, might choose to skip the rest of the section.
Creating and starting threads
Every time a new JVM process starts, it creates several threads by default. The most important thread among them is the main thread, which executes the main method of the Scala program. We show this in the following program, which gets the name of the current thread and prints it to the standard output:
object ThreadsMain extends App {
  val t: Thread = Thread.currentThread
  val name = t.getName
  println(s"I am the thread $name")
}
On the JVM, thread objects are represented with the Thread class. The preceding program uses the static currentThread method to obtain a reference to the current thread object, and stores it to a local variable named t. It then calls the getName method to obtain the thread's name. If you are running this program from the Simple Build Tool (SBT) with the run command, as explained in Chapter 1, Introduction, you should see the following output:
[info] I am the thread run-main-0
Normally, the name of the main thread is just main. The reason we see a different name is that SBT started our program on a separate thread inside the SBT process. To ensure that the program runs inside a separate JVM process, we need to set SBT's fork setting to true:
> set fork := true
Invoking the SBT run command again should give the following output:
[info] I am the thread main
Every thread goes through several thread states during its existence. When a Thread object is created, it is initially in the new state. After the newly created thread object starts executing, it goes into the runnable state. After the thread is done executing, the thread object goes into the terminated state, and cannot execute any more.
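These transitions can be observed directly with the standard getState method on the Thread class. The following minimal sketch is our own, not one of the book's examples; the variable names are illustrative:

```scala
// A new Thread object starts in the NEW state; after it has run to
// completion and been joined, it is in the TERMINATED state.
val t = new Thread {
  override def run(): Unit = Thread.sleep(50)
}
val stateBeforeStart = t.getState  // Thread.State.NEW
t.start()
t.join()
val stateAfterJoin = t.getState    // Thread.State.TERMINATED
```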
Starting an independent thread of computation consists of two steps. First, we need to create a Thread object to allocate the memory for the stack and thread state. To start the computation, we need to call the start method on this object. We show how to do this in the following example application called ThreadsCreation:
object ThreadsCreation extends App {
  class MyThread extends Thread {
    override def run(): Unit = {
      println("New thread running.")
    }
  }
  val t = new MyThread
  t.start()
  t.join()
  println("New thread joined.")
}
When a JVM application starts, it creates a special thread called the main thread that executes the method called main in the specified class, in this case, ThreadsCreation. When the App class is extended, the main method is automatically synthesized from the object body. In this example, the main thread first creates another thread of the MyThread type and assigns it to t.
Next, the main thread starts t by calling the start method. Calling the start method eventually results in executing the run method from the new thread. First, the OS is notified that t must start executing. Exactly when the OS decides to assign the new thread to a processor is largely out of the programmer's control, but the OS must ensure that this eventually happens. After the main thread starts the new thread t, it calls its join method. This method halts the execution of the main thread until t completes its execution. We can say that the join operation puts the main thread into the waiting state until t terminates. Importantly, the waiting thread relinquishes its control over the processor, and the OS can assign that processor to some other thread.
Note
Waiting threads notify the OS that they are waiting for some condition and cease spending CPU cycles, instead of repetitively checking that condition.
In the meantime, the OS finds an available processor and instructs it to run the child thread. The instructions that a thread must execute are specified by overriding its run method. The t instance of the MyThread class starts by printing the "New thread running." text to the standard output and then terminates. At this point, the OS is notified that t has terminated and eventually lets the main thread continue its execution. The OS then puts the main thread back into the running state, and the main thread prints "New thread joined.". This is shown in the following diagram:

It is important to note that the two outputs "New thread running." and "New thread joined." are always printed in this order. This is because the join call ensures that the termination of the t thread occurs before the instructions following the join call.
When we run the program, it executes so fast that the two println statements occur almost simultaneously. Could the ordering of the println statements be just an artifact of how the OS chooses to execute these threads? To verify that the main thread really waits for t, and that the output is not just because the OS is biased to execute t first in this particular example, we can experiment by tweaking the execution schedule. Before we do that, we will introduce a shorthand for creating and starting a new thread, since the current syntax is too verbose. The new thread method simply runs a block of code in a newly started thread. This time, we will create the new thread using an anonymous thread class declared inline at the instantiation site:
def thread(body: =>Unit): Thread = {
  val t = new Thread {
    override def run() = body
  }
  t.start()
  t
}
The thread method takes a block of code body, creates a new thread that executes this block of code in its run method, starts the thread, and returns a reference to the new thread so that the clients can call join on it.
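As a quick sanity check, the helper can be exercised as follows. This is our own snippet, not one of the book's examples; it repeats the definition so that it is self-contained, and the message variable is an illustrative name:

```scala
// The thread helper: run a block of code in a newly started thread.
def thread(body: => Unit): Thread = {
  val t = new Thread { override def run() = body }
  t.start()
  t
}

var message: String = null
val t = thread { message = "Hello from a new thread." }
t.join()  // after join returns, the write to message is visible
```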
Creating and starting threads using the thread statement is much less verbose. To make the examples in this chapter more concise, we will use the thread statement from now on. However, you should think twice before using the thread statement in production projects. It is prudent to correlate the syntactic burden with the computational cost; lightweight syntax can be mistaken for a cheap operation, and creating a new thread is relatively expensive.
We can now experiment with the OS by making sure that all the processors are available. To do this, we will use the static sleep method on the Thread class, which postpones the execution of the currently executing thread for the specified number of milliseconds. This method puts the thread into the timed waiting state. The OS can reuse the processor for other threads while a thread sleeps. Still, we will need a sleep time much larger than the time slice on a typical OS, which ranges from 10 to 100 milliseconds. The following code depicts this:
object ThreadsSleep extends App {
  val t = thread {
    Thread.sleep(1000)
    log("New thread running.")
    Thread.sleep(1000)
    log("Still running.")
    Thread.sleep(1000)
    log("Completed.")
  }
  t.join()
  log("New thread joined.")
}
The main thread of the ThreadsSleep application creates and starts a new t thread that sleeps for one second, outputs some text, and repeats this two more times before terminating. The main thread calls join as before, and then prints "New thread joined.".
Note that we are now using the log method described in Chapter 1, Introduction. The log method prints the specified string value along with the name of the thread that calls it.
Regardless of how many times you run the preceding application, the last output will always be "New thread joined.". This program is deterministic: given a particular input, it will always produce the same output, regardless of the execution schedule chosen by the OS.
However, not all the applications using threads will always yield the same output if given the same input. The following code is an example of a nondeterministic application:
object ThreadsNondeterminism extends App { val t = thread { log("New thread running.") } log("...") log("...") t.join() log("New thread joined.") }
There is no guarantee whether the log("...") statements in the main thread occur before or after the log call in the t thread. Running the application several times on a multicore processor prints "..." before, after, or interleaved with the output of the t thread. By running the program, we get the following output:
run-main-46: ...
Thread-80: New thread running.
run-main-46: ...
run-main-46: New thread joined.
Running the program again results in a different order between these outputs:
Thread-81: New thread running.
run-main-47: ...
run-main-47: ...
run-main-47: New thread joined.
Most multithreaded programs are nondeterministic, and this is what makes multithreaded programming so hard. There are multiple possible reasons for this. First, the program might be too big for the programmer to reason about its determinism properties, and the interactions between threads could simply be too complex. But some programs are inherently nondeterministic. A web server has no idea which client will be the first to send a request for a web page. It must allow these requests to arrive in any possible order and respond to them as soon as they arrive. Depending on the order in which the clients send their requests, the web server can behave differently, even though the set of requests is the same.
Atomic execution
We have already seen one basic way in which threads can communicate: by waiting for each other to terminate. The information that the joined thread delivers is that it has terminated. In practice, however, this information alone is often insufficient; for example, a thread that renders one page in a web browser must inform the other threads that a specific URL has been visited.
It turns out that the join method on threads has an additional property. All the writes to memory performed by the thread being joined occur before the join call returns, and are visible to the thread that called the join method. This is illustrated by the following example:
object ThreadsCommunicate extends App {
  var result: String = null
  val t = thread {
    result = "\nTitle\n" + "=" * 5
  }
  t.join()
  log(result)
}
The main thread will never print null, as the call to join always occurs before the log call, and the assignment to result occurs before the termination of t. This pattern is a very basic way in which the threads can use their results to communicate with each other.
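The same join-based pattern extends naturally to several worker threads, each writing its partial result before terminating. The following sketch is our own, not one of the book's examples; join again guarantees that the workers' writes are visible to the main thread:

```scala
// Each worker writes one slot of the shared array before terminating.
// Joining every worker makes all of those writes visible to the main
// thread, which can then safely combine the partial results.
val results = new Array[Int](4)
val workers = for (i <- 0 until 4) yield new Thread {
  override def run(): Unit = results(i) = i * i
}
workers.foreach(_.start())
workers.foreach(_.join())
val total = results.sum  // 0 + 1 + 4 + 9 == 14
```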
However, this pattern only allows very restricted one-way communication, and it does not allow threads to mutually communicate during their execution. There are many use cases for an unrestricted two-way communication. One example is assigning unique identifiers, in which a set of threads must concurrently choose numbers such that no two threads produce the same number. We might be tempted to proceed as in the following example, which will not work correctly. We will start by showing the first half of the program:
object ThreadsUnprotectedUid extends App {
  var uidCount = 0L
  def getUniqueId() = {
    val freshUid = uidCount + 1
    uidCount = freshUid
    freshUid
  }
In the preceding code snippet, we first declare a uidCount variable that holds the last unique identifier picked by any thread. The threads call the getUniqueId method to compute the first unused identifier, and then update the uidCount variable. In this example, reading uidCount to initialize freshUid and assigning freshUid back to uidCount do not necessarily happen together. We say that the two statements do not happen atomically, since statements from the other threads can interleave with them arbitrarily. We next define a printUniqueIds method such that, given a number n, the method calls getUniqueId to produce n unique identifiers and then prints them. We use a Scala for-comprehension to map the range 0 until n to unique identifiers. Finally, the main thread starts a new t thread that calls the printUniqueIds method, and then calls printUniqueIds concurrently with the t thread as follows:
  def printUniqueIds(n: Int): Unit = {
    val uids = for (i <- 0 until n) yield getUniqueId()
    log(s"Generated uids: $uids")
  }
  val t = thread { printUniqueIds(5) }
  printUniqueIds(5)
  t.join()
}
Running this application several times reveals that the identifiers generated by the two threads are not necessarily unique; the application prints Vector(1, 2, 3, 4, 5) and Vector(1, 6, 7, 8, 9) in some runs, but not in others! The output of the program depends on the timing at which the statements in the separate threads get executed.
Note
A race condition is a phenomenon in which the output of a concurrent program depends on the execution schedule of the statements in the program.
A race condition is not necessarily an incorrect program behavior. However, if some execution schedule causes an undesired program output, the race condition is considered to be a program error. The race condition from the previous example is a program error, because the getUniqueId method is not atomic. The t thread and the main thread sometimes concurrently call getUniqueId. In the first line, they concurrently read the value of uidCount, which is initially 0, and both conclude that their own freshUid variable should be 1. The freshUid variable is a local variable, so it is allocated on the thread stack; each thread sees a separate instance of that variable. At this point, the two threads write the value 1 back to uidCount in some order, and both return the non-unique identifier 1. This is illustrated in the following figure:

There is a mismatch between the mental model that most programmers inherit from sequential programming and the execution of the getUniqueId method when it is run concurrently. This mismatch is grounded in the assumption that getUniqueId executes atomically. Atomic execution of a block of code means that the individual statements in that block of code executed by one thread cannot interleave with the statements executed by another thread. In atomic execution, the statements can only be executed all at once, which is exactly how the uidCount field should be updated. The code inside the getUniqueId function reads, modifies, and writes a value, which is not atomic on the JVM; an additional language construct is necessary to guarantee atomicity. The fundamental Scala construct that allows this sort of atomic execution is called the synchronized statement, and it can be called on any object. It allows us to define getUniqueId as follows:
def getUniqueId() = this.synchronized {
  val freshUid = uidCount + 1
  uidCount = freshUid
  freshUid
}
The synchronized call ensures that the subsequent block of code can only execute if there is no other thread simultaneously executing this synchronized block of code, or any other synchronized block of code called on the same this object. In our case, the this object is the enclosing singleton object, ThreadsUnprotectedUid, but in general, this can be an instance of the enclosing class or trait.
Two concurrent invocations of the getUniqueId method are shown in the following figure:

We can also call synchronized and omit the this part, in which case the compiler will infer what the surrounding object is, but we strongly discourage you from doing so: synchronizing on incorrect objects results in concurrency errors that are not easily identified.
Tip
Always explicitly declare the receiver for the synchronized statement; doing so protects you from subtle and hard-to-spot program errors.
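Following this advice, one common idiom is to synchronize on a dedicated, explicitly named lock object. The sketch below is our own illustration (the Counter object and its member names are hypothetical): four threads perform 1,000 increments each, and because every access goes through lock.synchronized, no update is lost:

```scala
// A shared counter guarded by a private, explicitly named lock object.
object Counter {
  private val lock = new AnyRef
  private var count = 0L
  def increment(): Long = lock.synchronized {
    count += 1   // read-modify-write now executes atomically
    count
  }
  def current: Long = lock.synchronized { count }
}

val workers = for (_ <- 0 until 4) yield new Thread {
  override def run(): Unit = for (_ <- 0 until 1000) Counter.increment()
}
workers.foreach(_.start())
workers.foreach(_.join())
// 4 threads x 1000 atomic increments leave count at exactly 4000.
```

Synchronizing on a private lock, rather than on this, has the added benefit that no external code can accidentally acquire the same monitor and interfere with the object's internal locking.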
The JVM ensures that the thread executing a synchronized statement invoked on some x object is the only thread executing any synchronized statement on that particular x object. If a T thread calls synchronized on x while another S thread is executing a synchronized statement on x, then the T thread is put into the blocked state. Once the S thread completes its synchronized statement, the JVM can choose the T thread to execute its own synchronized statement.
Every object created inside the JVM has a special entity called an intrinsic lock or a monitor, which is used to ensure that only one thread is executing a synchronized block on that object. When a thread starts executing a synchronized block on an x object, we can say that it gains ownership of the x monitor, or alternatively, acquires it. When the thread completes the synchronized block, we can say that it releases the monitor.
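A related property of the intrinsic lock, worth noting as an aside, is that it is reentrant on the JVM: a thread that already owns a monitor can enter another synchronized block on the same object without blocking itself. A minimal sketch (the Account class is our own illustration, not from the book's examples):

```scala
// withdrawTwice holds the monitor of this and then calls withdraw,
// which re-acquires the same monitor; reentrancy means the thread
// does not deadlock against itself.
class Account {
  private var balance = 100
  def withdraw(n: Int): Unit = this.synchronized { balance -= n }
  def withdrawTwice(n: Int): Int = this.synchronized {
    withdraw(n)  // nested acquisition of an already-owned monitor
    withdraw(n)
    balance
  }
}

val account = new Account
val remaining = account.withdrawTwice(30)  // 100 - 30 - 30 == 40
```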
The synchronized statement is one of the fundamental mechanisms for inter-thread communication in Scala and on the JVM. Whenever there is a possibility that multiple threads access and modify a field in some object, you should use the synchronized statement.
Reordering
The synchronized statement is not without a price: writes to fields such as uidCount, which are protected by the synchronized statement, are usually more expensive than regular unprotected writes. The performance penalty of the synchronized statement depends on the JVM implementation, but it is usually not large. You might be tempted to avoid using synchronized when you think that there is no bad interleaving of program statements, like the one we saw previously in the unique identifier example. Never do this! We will now show you a minimal example in which this leads to serious errors.
Let's consider the following program, in which two threads, t1 and t2, access a pair of Boolean variables, a and b, and a pair of Int variables, x and y. The t1 thread sets the variable a to true, and then reads the value of b. If the value of b is true, the t1 thread assigns 0 to y, and otherwise it assigns 1. The t2 thread does the opposite: it first assigns true to the variable b, and then assigns 0 to x if a is true, and 1 otherwise. This is repeated in a loop 100000 times, as shown in the following snippet:
object ThreadSharedStateAccessReordering extends App {
  for (i <- 0 until 100000) {
    var a = false
    var b = false
    var x = -1
    var y = -1
    val t1 = thread {
      a = true
      y = if (b) 0 else 1
    }
    val t2 = thread {
      b = true
      x = if (a) 0 else 1
    }
    t1.join()
    t2.join()
    assert(!(x == 1 && y == 1), s"x = $x, y = $y")
  }
}
This program is somewhat subtle, so we need to carefully consider several possible execution scenarios. By analyzing the possible interleavings of the instructions of the t1 and t2 threads, we can conclude that if both threads simultaneously assign to a and b, then they will both assign 0 to x and y. This outcome indicates that both threads started at almost the same time, and it is shown on the left in the following illustration:

Alternatively, let's assume that the t2 thread executes faster. In this case, the t2 thread sets the variable b to true, and proceeds to read the value of a. This happens before the assignment to a by the t1 thread, so the t2 thread reads the value false, and assigns 1 to x. When the t1 thread executes, it sees that the value of b is true, and assigns 0 to y. This sequence of events is shown on the right in the preceding illustration. Note that the case where the t1 thread starts first results in a similar assignment, where x = 0 and y = 1, so it is not shown in the illustration.
The conclusion is that regardless of how we reorder the execution of the statements in the t1 and t2 threads, the output of the program should never be such that x = 1 and y = 1 simultaneously. Thus, the assertion at the end of the loop should never throw an exception.
However, after running the program several times, we get the following output, which indicates that both x and y can be assigned the value 1 simultaneously:
[error] Exception in thread "main": assertion failed: x = 1, y = 1
This result is scary and seems to defy common sense. Why can't we reason about the execution of the program the way we did? The answer is that, by the Java Memory Model (JMM) specification, the JVM is allowed to reorder certain program statements executed by one thread as long as it does not change the serial semantics of the program for that particular thread. This is because some processors do not always execute instructions in the program order. Additionally, the threads do not need to write all their updates to the main memory immediately, but can temporarily keep them cached in registers inside the processor. This maximizes the efficiency of the program and allows better compiler optimizations.
How then should we reason about multithreaded programs? The error we made when analyzing the example is that we assumed that the writes by one thread are immediately visible to all the other threads. We always need to apply proper synchronization to ensure that the writes by one thread are visible to another thread.
The synchronized statement is one of the fundamental ways to achieve proper synchronization. Writes by any thread executing a synchronized statement on an x object are not only atomic, but also visible to the threads that execute synchronized on x. Enclosing each assignment in the t1 and t2 threads in a synchronized statement makes the program behave as expected.
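A sketch of that fix follows. This is our own variant, which uses a single shared lock object rather than synchronizing on this; every read and write of the shared variables goes through a synchronized block on the same lock, which rules out the x = 1, y = 1 outcome. The main thread's reads of x and y after join are safe because join itself guarantees visibility:

```scala
// Every access to the shared variables is guarded by the same monitor,
// so the four synchronized blocks execute in some total order and the
// reorderings that produced x == 1 && y == 1 can no longer occur.
val lock = new AnyRef
var a = false; var b = false
var x = -1; var y = -1
val t1 = new Thread {
  override def run(): Unit = {
    lock.synchronized { a = true }
    lock.synchronized { y = if (b) 0 else 1 }
  }
}
val t2 = new Thread {
  override def run(): Unit = {
    lock.synchronized { b = true }
    lock.synchronized { x = if (a) 0 else 1 }
  }
}
t1.start(); t2.start()
t1.join(); t2.join()
// The outcome x == 1 && y == 1 is now impossible.
```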
Tip
Use the synchronized statement on some object x when accessing (reading or modifying) state shared between multiple threads. This ensures that at most one T thread is executing a synchronized statement on x at any time. It also ensures that all the writes to memory by the T thread are visible to all the other threads that subsequently execute synchronized on the same object x.
In the rest of this chapter and in Chapter 3, Traditional Building Blocks of Concurrency, we will see additional synchronization mechanisms, such as volatile and atomic variables. In the next section, we will take a look at other use cases of the synchronized statement and learn about object monitors.