Book: Java 9 Programming Blueprints
Author: Jason Lee
Chapter length: 693 words
Updated: 2021-07-02 18:56:31

Building the library

The foundational piece of this project is the library, which both the CLI and the GUI will consume, so it makes sense to start here. When designing the library--its inputs, outputs, and general behavior--it helps to understand exactly what we want this system to do, so let's take some time to discuss the functional requirements.

As stated in the introduction, we'd like to be able to search for duplicate files in an arbitrary number of directories. We'd also like to be able to restrict the search and comparison to only certain files. If we don't specify a pattern to match, then we want to check every file.
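
As a rough illustration of this requirement, the following sketch collects candidate files from several directories and optionally filters them by a glob pattern. The class name CandidateFiles and the method collect are hypothetical and are not part of the library built in this chapter; treating a null or empty pattern as "check every file" is likewise an assumption about how the option might be modeled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not the book's API.
class CandidateFiles {

    /**
     * Walks each root directory and returns the regular files found.
     * A null or empty glob keeps every file; otherwise only files whose
     * name matches the glob (for example "*.jpg") are kept.
     */
    static List<Path> collect(List<Path> roots, String glob) {
        PathMatcher matcher = (glob == null || glob.isEmpty())
                ? path -> true
                : FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<Path> found = new ArrayList<>();
        for (Path root : roots) {
            // Files.walk holds directory handles open, so close the stream.
            try (Stream<Path> walk = Files.walk(root)) {
                walk.filter(Files::isRegularFile)
                    .filter(p -> matcher.matches(p.getFileName()))
                    .forEach(found::add);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return found;
    }
}
```

Matching against the file name alone, rather than the full path, keeps patterns such as *.jpg simple. Whether glob matching is case sensitive depends on the platform's file system provider.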

The most important part is how to identify a match. There are, of course, a myriad of ways in which this can be done, but the approach we will use is as follows:

1. Identify files that have the same filename. Think of those situations where you downloaded images from your camera to your computer for safekeeping and then, later, forgot that you had already done so and copied them again somewhere else. Obviously, you only want one copy, but is the file, for example, IMG_9615.JPG, in the temp directory the same as the one in your picture backup directory? By identifying files with matching names, we can test them to be sure.
2. Identify files that have the same size. The likelihood of a match here is smaller, but there is still a chance. For example, some photo management software, when it finds a file with the same name while importing images from a device, will modify the filename of the second file and store both, rather than stopping the import and requiring immediate user intervention. This can result in a large number of files such as IMG_9615.JPG and IMG_9615-1.JPG. This check will help identify these situations.
3. For each match above, to determine whether the files are actually a match, we'll generate a hash based on the file contents. If more than one file generates the same hash, the likelihood of those files being identical is extremely high. These files we will flag as potential duplicates.
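
The three steps above can be sketched in a few lines of Java. This is a minimal, single-threaded illustration and not the implementation developed in this chapter: the class DuplicateSketch and its methods are hypothetical, and SHA-256 is an assumed choice of hash, since the text does not name an algorithm here. The file contents are read through a small fixed-size buffer so that a large file is never loaded into memory all at once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch for illustration only; not the book's API.
class DuplicateSketch {

    /** Returns groups of two or more files whose contents hash identically. */
    static List<List<Path>> findDuplicates(List<Path> files) {
        // Steps 1 and 2: a file is a candidate if it shares a name
        // or a size with at least one other file.
        Set<Path> candidates = new LinkedHashSet<>();
        addSharedGroups(files, p -> p.getFileName().toString(), candidates);
        addSharedGroups(files, DuplicateSketch::size, candidates);

        // Step 3: hash only the candidates and keep hashes seen more than once.
        Map<String, List<Path>> byHash = candidates.stream()
                .collect(Collectors.groupingBy(DuplicateSketch::sha256));
        return byHash.values().stream()
                .filter(group -> group.size() > 1)
                .collect(Collectors.toList());
    }

    /** Adds to out every file whose key is shared with another file. */
    private static <K> void addSharedGroups(
            List<Path> files, Function<Path, K> key, Set<Path> out) {
        files.stream().collect(Collectors.groupingBy(key)).values().stream()
                .filter(group -> group.size() > 1)
                .forEach(out::addAll);
    }

    static long size(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Streams the file through SHA-256 (an assumed algorithm) in 8 KB chunks. */
    static String sha256(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Files with a unique name and a unique size are never hashed, which is the point of the first two steps: they narrow the expensive content comparison down to plausible candidates.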

It's a pretty simple algorithm and should be pretty effective, but we do have a problem, albeit one that's likely not immediately apparent. If you have a large number of files, especially a set with a large number of potential duplicates, processing all of them could take a very long time. We would like to mitigate that as much as possible, which leads us to some non-functional requirements:

- The program should process files in a concurrent manner so as to minimize, as much as possible, the amount of time it takes to process a large file set.
- This concurrency should be bounded so that the system is not overwhelmed by processing the request.
- Given the potential for a large amount of data, the system must be designed in such a way so as to avoid using up all available RAM and causing system instability.

With that fairly modest list of functional and non-functional requirements, we should be ready to begin. Like the last application, let's start by defining our module. In src/main/java, we will create this module-info.java:

    module com.steeplesoft.dupefind.lib { 
      exports com.steeplesoft.dupefind.lib; 
    }

Initially, the compiler--and the IDE--will complain that the com.steeplesoft.dupefind.lib package does not exist, and the project won't compile. That's fine for now, as we'll be creating that package next.

The use of the word concurrency in the non-functional requirements most likely brings to mind the idea of threads right away. We introduced the idea of threads in Chapter 2, Managing Java Processes, so if you are not familiar with them, review that section in the previous chapter.

Our use of threading in this project is different from that in the last: we have a body of work that needs to be done and, once it's finished, we want the threads to exit. We also need to wait for these threads to finish their work so that we can analyze the results. In the java.util.concurrent package, the JDK provides several options to accomplish this.
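
One of those options, shown here only as a sketch and not necessarily the one this chapter settles on, is a fixed-size thread pool from Executors. The pool size caps the concurrency, invokeAll blocks until every task has completed, and shutdown lets the worker threads exit once the work is done. The class BoundedWork and the method runAll are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration only; not the book's API.
class BoundedWork {

    /**
     * Runs every task on a pool of at most maxThreads threads, waits for
     * all of them to finish, and returns the results in task order.
     */
    static <T> List<T> runAll(List<Callable<T>> tasks, int maxThreads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll returns only after every task has completed.
            for (Future<T> future : pool.invokeAll(tasks)) {
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    throw new IllegalStateException("task failed", e.getCause());
                }
            }
            return results;
        } finally {
            // Without this, the idle worker threads would keep the JVM alive.
            pool.shutdown();
        }
    }
}
```

A caller might size the pool from Runtime.getRuntime().availableProcessors() so that the bound tracks the machine it runs on.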
