- Mastering Java for Data Science
- Alexey Grigorev
- 384 words
- 2021-07-02 23:44:33
Collections
Data is the most important part of data science. When dealing with data, it needs to be efficiently stored and processed, and for this we use data structures. A data structure describes a way to store data efficiently to solve a specific problem, and the Java Collections API is the standard Java API for data structures. This API offers a wide variety of implementations that are useful in practical data science applications.
We will not describe the Collections API in full detail, but will concentrate on its most useful and important parts: the List, Set, and Map interfaces.
Lists are collections where each element can be accessed by its index. The go-to implementation of the List interface is ArrayList, which should be used in 99% of cases. It can be used as follows:
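import java.util.ArrayList;
import java.util.List;

List<String> list = new ArrayList<>();
list.add("alpha");
list.add("beta");
list.add("beta"); // lists allow duplicates
list.add("gamma");
System.out.println(list); // prints [alpha, beta, beta, gamma]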
There are other implementations of the List interface, such as LinkedList or CopyOnWriteArrayList, but they are rarely needed.
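Every list element has an index, so elements can be read, replaced, and searched by position. The following is a minimal sketch that uses only standard java.util.List methods (get, set, indexOf, lastIndexOf, size). The class name ListIndexDemo is only an illustrative choice.

```java
import java.util.ArrayList;
import java.util.List;

class ListIndexDemo {
    // Builds the same list as in the text: duplicates are kept, order is preserved
    static List<String> build() {
        List<String> list = new ArrayList<>();
        list.add("alpha");
        list.add("beta");
        list.add("beta");
        list.add("gamma");
        return list;
    }

    public static void main(String[] args) {
        List<String> list = build();
        System.out.println(list.get(0));              // alpha: indices start at 0
        System.out.println(list.indexOf("beta"));     // 1: first occurrence
        System.out.println(list.lastIndexOf("beta")); // 2: last occurrence
        list.set(3, "delta");                         // replace the element at index 3
        System.out.println(list);                     // [alpha, beta, beta, delta]
        System.out.println(list.size());              // 4
    }
}
```

For an ArrayList, get and set by index take constant time. That is one reason it is the usual default.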
Set is another interface in the Collections API; it describes a collection that allows no duplicates. The go-to implementation is HashSet if the order in which elements are inserted does not matter, or LinkedHashSet if it does. We can use it as follows:
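import java.util.HashSet;
import java.util.Set;

Set<String> set = new HashSet<>();
set.add("alpha");
set.add("beta");
set.add("beta"); // ignored: "beta" is already in the set
set.add("gamma");
System.out.println(set); // three elements; HashSet does not guarantee their order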
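In the snippet above, the second "beta" is silently dropped. The following minimal sketch makes that visible through the boolean returned by Set.add. It also contrasts HashSet with LinkedHashSet: only the latter guarantees that iteration follows insertion order. The class SetDemo and its helper fill are illustrative names, not part of any library.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class SetDemo {
    // Adds the words, in the given order, to the given set and returns it
    static Set<String> fill(Set<String> set, String... words) {
        for (String w : words) {
            boolean added = set.add(w); // false if w was already present
            if (!added) {
                System.out.println("duplicate ignored: " + w);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        String[] words = {"gamma", "alpha", "beta", "beta"};
        // LinkedHashSet keeps insertion order: [gamma, alpha, beta]
        System.out.println(fill(new LinkedHashSet<>(), words));
        // HashSet makes no ordering guarantee; only its contents are fixed
        Set<String> hashed = fill(new HashSet<>(), words);
        System.out.println(hashed.size()); // 3
    }
}
```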
List and Set both extend the Iterable interface (through Collection), which makes it possible to use them in a for-each loop:
for (String el : set) {
    System.out.println(el);
}
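The compiler turns a for-each loop over an Iterable into a loop that calls iterator(), hasNext(), and next(). The minimal sketch below writes that loop out by hand. The explicit form is mainly useful when elements must be removed during iteration, which Iterator.remove supports safely. The class IterationDemo and the method dropShort are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IterationDemo {
    // Removes every string shorter than minLength using an explicit Iterator,
    // the same mechanism a for-each loop uses behind the scenes
    static List<String> dropShort(List<String> input, int minLength) {
        List<String> copy = new ArrayList<>(input);
        Iterator<String> it = copy.iterator();
        while (it.hasNext()) {
            String el = it.next();
            if (el.length() < minLength) {
                // safe; calling copy.remove(el) here instead would typically
                // throw ConcurrentModificationException
                it.remove();
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("alpha", "beta", "gamma", "pi");
        System.out.println(dropShort(words, 5)); // [alpha, gamma]
    }
}
```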
The Map interface maps keys to values; in other languages, such a structure is sometimes called a dictionary or an associative array. The go-to implementation is HashMap:
import java.util.HashMap;
import java.util.Map;

Map<String, String> map = new HashMap<>();
map.put("alpha", "α");
map.put("beta", "β");
map.put("gamma", "γ");
System.out.println(map); // entry order is not guaranteed for a HashMap
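Besides put, the most common map operations are lookups. A typical data science use of a map is counting occurrences, for example word frequencies. The minimal sketch below uses get, getOrDefault, containsKey, merge, and entrySet, all standard java.util.Map methods since Java 8. The class MapDemo and the method countWords are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class MapDemo {
    // Counts how many times each token occurs; merge adds 1 to the existing
    // count, or stores 1 if the token is not in the map yet
    static Map<String, Integer> countWords(String... tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("alpha", "beta", "beta", "gamma");
        System.out.println(counts.get("beta"));              // 2
        System.out.println(counts.getOrDefault("delta", 0)); // 0: key absent
        System.out.println(counts.containsKey("alpha"));     // true
        // Iterate over key-value pairs; HashMap order is not guaranteed
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```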
If you need to keep the insertion order, use LinkedHashMap; if the map will be accessed from multiple threads, use ConcurrentHashMap.
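A short sketch shows the difference between these two alternatives. A LinkedHashMap iterates in insertion order, and a ConcurrentHashMap can be updated from several threads at once. Here a parallel stream stands in for "multiple threads", and the even/odd keys are purely illustrative. The Javadoc states that ConcurrentHashMap.merge is performed atomically, so no update is lost; the same code with a plain HashMap could lose counts. The Greek letters are written as Unicode escapes only to keep the source ASCII.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class MapVariantsDemo {
    // Keys come back in the order they were first inserted
    static Map<String, String> greekLetters() {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("gamma", "\u03b3"); // γ
        map.put("alpha", "\u03b1"); // α
        map.put("beta", "\u03b2");  // β
        return map;
    }

    // Counts even and odd numbers in [0, n) from a parallel stream;
    // ConcurrentHashMap.merge updates each key atomically
    static Map<String, Integer> parityCounts(int n) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        IntStream.range(0, n).parallel()
                .forEach(i -> counts.merge(i % 2 == 0 ? "even" : "odd", 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(greekLetters().keySet()); // [gamma, alpha, beta]
        System.out.println(parityCounts(1000));      // even and odd both map to 500
    }
}
```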
The Collections class provides several helper methods for working with collections, such as sorting, shuffling, or finding the minimum and maximum elements:
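import java.util.Collections;

// min and max use the natural (alphabetical) order of strings
String min = Collections.min(list);
String max = Collections.max(list);
System.out.println("min: " + min + ", max: " + max); // prints min: alpha, max: gamma
Collections.sort(list);    // sorts the list in place
Collections.shuffle(list); // randomly permutes the list in place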
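A few related helpers are also worth knowing. Collections.sort accepts a Comparator such as Collections.reverseOrder(), and Collections.frequency counts equal elements. Collections.shuffle accepts a java.util.Random, so a seeded generator makes a shuffle reproducible. That can be handy when splitting data for experiments. The minimal sketch below shows all three. The seed 42 is arbitrary, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class CollectionsHelpersDemo {
    static List<String> words() {
        return new ArrayList<>(Arrays.asList("beta", "alpha", "gamma", "beta"));
    }

    // Sorts a copy in descending (reverse natural) order
    static List<String> sortedDescending(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy, Collections.reverseOrder());
        return copy;
    }

    // Shuffles a copy with a seeded generator, so the same seed gives the same order
    static List<String> shuffledWithSeed(List<String> input, long seed) {
        List<String> copy = new ArrayList<>(input);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String> list = words();
        System.out.println(sortedDescending(list));              // [gamma, beta, beta, alpha]
        System.out.println(Collections.frequency(list, "beta")); // 2
        System.out.println(shuffledWithSeed(list, 42).equals(shuffledWithSeed(list, 42))); // true
    }
}
```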
There are other collections, such as Queue, Deque, Stack, and various thread-safe collections, but they are used less often and are not very important for data science.
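These collections matter less here, but one practical note applies if you do need stack or queue behaviour. The Javadoc of the legacy Stack class recommends the Deque interface and its implementations instead, such as ArrayDeque. The minimal sketch below uses one ArrayDeque as a FIFO queue and another as a LIFO stack. The class DequeDemo and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DequeDemo {
    // FIFO: offer adds at the tail, poll removes from the head
    static String firstFromQueue(String... items) {
        Deque<String> queue = new ArrayDeque<>();
        for (String item : items) {
            queue.offer(item);
        }
        return queue.poll();
    }

    // LIFO: push and pop both operate on the head
    static String firstFromStack(String... items) {
        Deque<String> stack = new ArrayDeque<>();
        for (String item : items) {
            stack.push(item);
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(firstFromQueue("alpha", "beta", "gamma")); // alpha
        System.out.println(firstFromStack("alpha", "beta", "gamma")); // gamma
    }
}
```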