- Java 9 Programming Blueprints
- Jason Lee
- 2021-07-02 18:56:31
Building the library
The foundational piece of this project is the library, which both the CLI and the GUI will consume, so it makes sense to start here. When designing the library--its inputs, outputs, and general behavior--it helps to understand exactly what we want this system to do, so let's take some time to discuss the functional requirements.
As stated in the introduction, we'd like to be able to search for duplicate files in an arbitrary number of directories. We'd also like to be able to restrict the search and comparison to only certain files. If we don't specify a pattern to match, then we want to check every file.
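To make the search inputs concrete, here is a minimal sketch of gathering the files to compare. It is an illustration under stated assumptions, not the library's actual API: the class name CandidateFiles, the method collect, and the choice of a glob pattern matched against the file name are all hypothetical. Only the JDK calls (Files.walk and FileSystem.getPathMatcher) are real.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of the book's library.
class CandidateFiles {
    // Returns every regular file found under the given directories whose file
    // name matches the glob. A null or empty glob means "check every file".
    static List<Path> collect(List<Path> directories, String glob) {
        PathMatcher matcher = (glob == null || glob.isEmpty())
                ? path -> true
                : FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<Path> result = new ArrayList<>();
        for (Path dir : directories) {
            // Files.walk returns a lazy stream that must be closed.
            try (Stream<Path> walk = Files.walk(dir)) {
                walk.filter(Files::isRegularFile)
                    .filter(p -> matcher.matches(p.getFileName()))
                    .forEach(result::add);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return result;
    }
}
```

Calling collect(dirs, "*.jpg") would restrict the comparison to JPEG files, while collect(dirs, null) would consider everything under the given directories.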
The most important part is how to identify a match. There are, of course, a myriad of ways in which this can be done, but the approach we will use is as follows:
- Identify files that have the same filename. Think of those situations where you might have downloaded images from your camera to your computer for safekeeping, then, later, perhaps you forgot that you had already downloaded the images, so you copied them again somewhere else. Obviously, you only want one copy, but is the file, for example, IMG_9615.JPG, in the temp directory the same as the one in your picture backup directory? By identifying files with matching names, we can test them to be sure.
- Identify files that have the same size. The likelihood of a match here is smaller, but there is still a chance. For example, when some photo management software imports images from a device and finds a file with the same name, it modifies the filename of the second file and stores both, rather than stopping the import and requiring immediate user intervention. This can result in a large number of files such as IMG_9615.JPG and IMG_9615-1.JPG. This check will help identify these situations.
- For each match above, to determine whether the files are actually a match, we'll generate a hash based on the file contents. If more than one file generates the same hash, the likelihood of those files being identical is extremely high. These files we will flag as potential duplicates.
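The three steps above can be sketched in a few lines of Java. This is a rough, single-threaded illustration and not the implementation the project ends up with: DuplicateSketch, sharingKey, and findDuplicates are hypothetical names, and the hash function is passed in as a parameter so that the sketch stays independent of any particular digest algorithm.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of the matching algorithm described above.
class DuplicateSketch {
    static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Groups the files by the given key and keeps only those files whose key
    // is shared with at least one other file.
    static <K> Set<Path> sharingKey(Collection<Path> files, Function<Path, K> key) {
        return files.stream()
                .collect(Collectors.groupingBy(key))
                .values().stream()
                .filter(group -> group.size() > 1)
                .flatMap(List::stream)
                .collect(Collectors.toSet());
    }

    // Steps 1 and 2 select candidates by name and by size; step 3 hashes only
    // those candidates and reports groups of files with identical hashes.
    static List<List<Path>> findDuplicates(Collection<Path> files,
                                           Function<Path, String> hasher) {
        Set<Path> candidates = new HashSet<>();
        candidates.addAll(sharingKey(files, p -> p.getFileName().toString()));
        candidates.addAll(sharingKey(files, DuplicateSketch::sizeOf));
        return candidates.stream()
                .collect(Collectors.groupingBy(hasher))
                .values().stream()
                .filter(group -> group.size() > 1)
                .collect(Collectors.toList());
    }
}
```

Note that the comparatively expensive hash is computed only for files that already share a name or a size with another file, which is the point of the first two checks.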
It's a pretty simple algorithm and should be pretty effective, but we do have a problem, albeit one that's likely not immediately apparent. If you have a large number of files, especially a set with a large number of potential duplicates, processing all of them could take a very long time. We would like to mitigate that as much as possible, which leads us to some non-functional requirements:
- The program should process files in a concurrent manner so as to minimize, as much as possible, the amount of time it takes to process a large file set
- This concurrency should be bounded so that the system is not overwhelmed by processing the request
- Given the potential for a large amount of data, the system must be designed in such a way so as to avoid using up all available RAM and causing system instability
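The memory requirement in particular shapes how a file's hash should be computed: reading a whole file into a byte array is simple, but it fails badly for multi-gigabyte files. One way to keep memory use constant is to stream the file through a MessageDigest in fixed-size chunks, as in this sketch. The class name StreamingHash, the 8 KB buffer, and the choice of SHA-256 are assumptions made for illustration; the text above does not prescribe a particular algorithm.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hashes a file without loading it fully into memory.
class StreamingHash {
    static String sha256(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192]; // the only sizable allocation per call
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        // Render the digest as lowercase hexadecimal.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

With this approach, the memory needed per file stays at the buffer size regardless of how large the file is, so total memory use scales with the number of files being hashed at once rather than with their sizes.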
With that fairly modest list of functional and non-functional requirements, we should be ready to begin. Like the last application, let's start by defining our module. In src/main/java, we will create this module-info.java:
    module com.steeplesoft.dupefind.lib {
        exports com.steeplesoft.dupefind.lib;
    }
Initially, the compiler--and the IDE--will complain that the com.steeplesoft.dupefind.lib package does not exist and won't compile the project. That's fine for now, as we'll be creating that package next.
The use of the word concurrency in the non-functional requirements most likely brings to mind the idea of threads. We introduced threads in Chapter 2, Managing Java Processes, so if you are not familiar with them, review that section in the previous chapter.
Our use of threading in this project is different from that in the last, in that we will have a body of work that needs to be done, and, once it's finished, we want the threads to exit. We also need to wait for these threads to finish their work so that we can analyze it. In the java.util.concurrent package, the JDK provides several options to accomplish this.
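As one illustration of those options, a fixed-size thread pool from Executors gives us both bounded concurrency and a way to wait for the work to finish. The sketch below is a generic example and not necessarily the approach this project settles on: BoundedWorkers and processAll are hypothetical names, and the one-hour timeout is an arbitrary assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical helper, shown only to illustrate the pattern.
class BoundedWorkers {
    // Applies `work` to every item on at most `threads` worker threads and
    // blocks until all of them are done. `work` must not return null, since
    // ConcurrentHashMap rejects null values.
    static <T, R> Map<T, R> processAll(List<T> items, Function<T, R> work, int threads)
            throws InterruptedException {
        Map<T, R> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (T item : items) {
                pool.execute(() -> results.put(item, work.apply(item)));
            }
        } finally {
            // No new tasks are accepted, but queued tasks still run; once the
            // queue drains, the worker threads exit.
            pool.shutdown();
        }
        // Wait for the queued work to finish (arbitrary upper bound).
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            pool.shutdownNow();
            throw new IllegalStateException("Timed out waiting for workers");
        }
        return results;
    }
}
```

For example, processAll(files, someHashFunction, 4) would hash the files on four threads at most and return only after every file has been processed, where someHashFunction stands in for whatever hashing routine is used.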