
Differentiating concurrency and parallelism

Concurrency and parallelism are two terms that you hear a lot about now when it comes to programming for multiprocessor machines. While the terms themselves and techniques behind the two are distinctly different, sometimes people will confuse one for the other. Because of this, I think it's important to quickly clear the air on these two terms before moving forward, as I will reference them in various places throughout the rest of the book.

Defining concurrency in computing

If you go and look up the word concurrent in the dictionary, you will see that it's a term that implies some form of competition. I think that's a good place to start when trying to understand how concurrency relates to computing and computer programming.

Back in the old days, when computers only had one CPU, that processor had to be smart about how it scheduled and executed the work that was being requested of it. Imagine you were on your old, single-CPU computer, typing out a document, and then decided to also download a file from the Internet. Your single CPU can only do one thing at a time. So, if it decided to pessimistically schedule that work serially, then your download would not start until you had completely finished using and closed your word-processor application.

In a model like that, there is no concurrency at all. Each task runs to completion sequentially, one after the other. At no point is progress being made on more than one task. This would be an extremely frustrating end-user experience, so thankfully, the people who built your CPU designed it to be able to properly schedule and handle multiple concurrently executing tasks. When you are not actively typing in your word processor, the CPU can switch over and start to make progress on your download request. If you start typing again, it can switch control back to the word-processor application and stop making progress on the download.

This type of back-and-forth handling of two different tasks by the CPU is referred to as time slicing. The execution of the two tasks never overlaps, but progress is still made on both (back and forth) as opposed to completely finishing one before starting the other (which would be serial). This concurrent execution process can be seen visually in the following diagram:

In the preceding diagram, the white/non-shaded line pieces represent the times when that task is not being executed by the CPU, and the colored/shaded line pieces represent the times when it is. You can see that there is no point at all at which the shaded portions of the two tasks overlap. Another way to refer to those tasks would be as threads of execution within the CPU. This is a good way to think of things moving forward, as you are probably already familiar with threads and multi-threaded programming within Scala/Java and the JVM.
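To make this concrete on the JVM, here is a minimal sketch of two threads making interleaved progress, standing in for the word-processor and download tasks from the earlier example. The object name, step counts, and sleep durations are arbitrary choices for illustration, and the exact interleaving you see is up to the scheduler, so the output order will vary from run to run:

```scala
object TimeSlicingExample extends App {
  // Simulates a task doing a few units of work; the sleep gives the
  // scheduler a chance to switch over to the other task between steps.
  def work(name: String): Unit =
    for (i <- 1 to 3) {
      println(s"$name: step $i")
      Thread.sleep(100)
    }

  val wordProcessor = new Thread(new Runnable { def run(): Unit = work("word-processor") })
  val download      = new Thread(new Runnable { def run(): Unit = work("download") })

  wordProcessor.start()
  download.start()

  wordProcessor.join()
  download.join()
}
```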

Defining parallelism

In defining concurrency, I used a dictionary definition to set the baseline understanding that concurrency is about competition. We can use the same approach here when defining parallelism.

If you look up parallel in the dictionary, it's basically defined as two independent lines that do not meet or intersect with each other. If the concurrent model of task execution never involves two tasks overlapping in execution, then parallelism is the opposite, in that the tasks overlap entirely in their execution. Parallelism is not something that can be leveraged on a single-CPU machine. You need multiple CPUs, each running one of the tasks, so that their execution can overlap.

The main goal of parallelism in programming is to get stuff done faster by breaking sequential steps up into tasks that can run in parallel. Say you have a task A, and in order for A to complete, it must complete subtasks 1, 2, and 3, each taking one second to complete. The naive approach would be to execute these three subtasks serially, back to back to back, as in the following diagram:

If we took this serial approach to task A's execution, then it would take three total seconds to complete. If we instead decided to execute the subtasks in parallel, then the picture should change to look like the following diagram:

If we decided to follow this parallel model for subtask execution, then the total execution time for task A would be one second. Even though each subtask takes one second to execute, because they all run at the exact same time (as opposed to sequentially), the total time to execute is still only one second.
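As a rough sketch of this idea in Scala, the snippet below runs three one-second subtasks as Futures on the default global execution context. The object and method names are made up for illustration, and the roughly one-second total only holds on a machine with enough cores (and pool threads) to run all three subtasks at the same time:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSubtasks extends App {
  // A subtask that takes roughly one second to complete.
  def subtask(n: Int): Future[Int] = Future {
    Thread.sleep(1000)
    n
  }

  val start = System.nanoTime()

  // All three subtasks are started immediately and run on separate threads,
  // so on a multi-core machine their execution overlaps.
  val taskA = Future.sequence(List(subtask(1), subtask(2), subtask(3)))

  val results   = Await.result(taskA, 5.seconds)
  val elapsedMs = (System.nanoTime() - start) / 1000000

  println(s"Completed subtasks $results in roughly ${elapsedMs}ms")
}
```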

If you compare the two pictures, you can think of the length dimension of the task A bubble as representing execution time and the height as representing processor or thread needs. If you want to get things done more quickly, in parallel, you need to have more CPUs available to run those tasks at the same time.

The dangers of concurrent programming

When defining concurrency, I gave a pretty low-level description. So how does this description relate to the code that you write on the JVM? Well, if the code you write is ever running inside a JVM where there is more than one Java thread running, then you need to be aware of, and think about, concurrent access to the state on your class instances. In a multi-threaded JVM, the CPU will start time slicing back and forth between those multiple threads, increasing the likelihood that these threads could get in each other's way and create inconsistent states within your in-memory data.

When you add multi-core CPUs to the mix, with their ability to run more than one thread at once, the likelihood increases that multiple threads will be competing to access the same data. The multi-core CPU world really ups the ante when it comes to writing safe, concurrency-aware code. As machines get more powerful, with more and more CPUs, it quickly becomes clear that concurrency is something programmers need to be aware of and account for when writing their code.
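The classic illustration of this danger is an unsynchronized counter shared by multiple threads. The following sketch is my own illustration (the object name, thread count, and iteration count are arbitrary); it usually prints a total well below the expected 400000, because the read-modify-write increments from different threads interleave and overwrite each other:

```scala
object RaceConditionExample extends App {
  // Shared, unsynchronized mutable state.
  var counter = 0

  val threads = (1 to 4).map { _ =>
    new Thread(new Runnable {
      // counter += 1 is a read-modify-write, so concurrent updates can
      // interleave and overwrite each other, losing increments.
      def run(): Unit = for (_ <- 1 to 100000) counter += 1
    })
  }

  threads.foreach(_.start())
  threads.foreach(_.join())

  // Expected 400000, but the printed value is usually lower.
  println(s"Final count: $counter")
}
```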

Because of the threat of concurrent access to things such as state, programmers have to protect their code by synchronizing access to that state. Doing so ensures that only one thread at a time is in certain blocks within your code. Unfortunately, this really slows things down, as threads are constantly competing for locks instead of executing their program code. Also, writing good, safe synchronized code can be difficult and error-prone, sometimes leading to the dreaded deadlock. Thankfully, safe concurrent code is what Akka actors are all about, and we will demonstrate that in the next section.
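Before moving on to actors, here is what that lock-based protection looks like for the counter from the previous sketch (again, a hypothetical example of my own; the names and counts are arbitrary). The synchronized block makes the count come out right, but every thread now has to queue up for the same lock:

```scala
object SynchronizedCounterExample extends App {
  private val lock = new Object
  var counter = 0

  // Only one thread at a time can be inside this synchronized block,
  // so no increments are lost, at the cost of threads waiting on the lock.
  def increment(): Unit = lock.synchronized {
    counter += 1
  }

  val threads = (1 to 4).map { _ =>
    new Thread(new Runnable {
      def run(): Unit = for (_ <- 1 to 100000) increment()
    })
  }

  threads.foreach(_.start())
  threads.foreach(_.join())

  println(s"Final count: $counter") // now reliably 400000
}
```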
