- Mastering Java for Data Science
- Alexey Grigorev
- 2021-07-02 23:44:33
Collections
Data is the most important part of data science. To work with data, we need to store and process it efficiently, and for this we use data structures. A data structure describes a way to store data efficiently to solve a specific problem, and the Java Collections API is the standard Java API for data structures. This API offers a wide variety of implementations that are useful in practical data science applications.
We will not describe the Collections API in full detail, but concentrate on its most useful and important interfaces: List, Set, and Map.
Lists are collections where each element can be accessed by its index. The go-to implementation of the List interface is ArrayList, which should be used in 99% of cases. It can be used as follows:
List<String> list = new ArrayList<>();
list.add("alpha");
list.add("beta");
list.add("beta");
list.add("gamma");
System.out.println(list); // prints [alpha, beta, beta, gamma]
There are other implementations of the List interface, such as LinkedList and CopyOnWriteArrayList, but they are rarely needed.
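To illustrate the index-based access that defines a List, here is a minimal sketch. The class name ListIndexExample and the helper buildList are illustrative and not part of the original text; only standard java.util calls are used.

```java
import java.util.ArrayList;
import java.util.List;

class ListIndexExample {
    // Hypothetical helper: builds a list with the same values used in the text.
    static List<String> buildList() {
        List<String> list = new ArrayList<>();
        list.add("alpha");
        list.add("beta");
        list.add("beta");
        list.add("gamma");
        return list;
    }

    public static void main(String[] args) {
        List<String> list = buildList();
        System.out.println(list.get(0));          // element at index 0: alpha
        System.out.println(list.size());          // 4, because duplicates are kept
        System.out.println(list.indexOf("beta")); // 1, the index of the first occurrence
        list.set(2, "delta");                     // replace the element at index 2
        System.out.println(list);                 // [alpha, beta, delta, gamma]
    }
}
```

Indexes start at zero, and get or set with an index outside the range 0 to size() - 1 throws an IndexOutOfBoundsException.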
Set is another interface in the Collections API, and it describes a collection that allows no duplicates. The go-to implementation is HashSet if the order in which we insert elements does not matter, or LinkedHashSet if it does. We can use it as follows:
Set<String> set = new HashSet<>();
set.add("alpha");
set.add("beta");
set.add("beta"); // duplicate: the set is left unchanged
set.add("gamma");
System.out.println(set);
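The difference between the two implementations is easiest to see when removing duplicates. The sketch below assumes we want to keep the first-seen order, so it uses LinkedHashSet; the class name SetOrderExample and the method distinctInOrder are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetOrderExample {
    // Hypothetical helper: drops duplicates while keeping the order of first appearance.
    static List<String> distinctInOrder(List<String> input) {
        Set<String> seen = new LinkedHashSet<>(input);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("gamma", "alpha", "beta", "alpha");
        System.out.println(distinctInOrder(words)); // [gamma, alpha, beta]

        // A HashSet also removes duplicates, but its iteration order is unspecified.
        Set<String> set = new HashSet<>(words);
        System.out.println(set.size());           // 3
        System.out.println(set.contains("beta")); // true
        System.out.println(set.add("beta"));      // false, the element is already present
    }
}
```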
List and Set both extend the Iterable interface, which makes it possible to use the for-each loop with them:
for (String el : set) {
    System.out.println(el);
}
The Map interface allows mapping keys to values; in other languages this structure is sometimes called a dictionary or an associative array. The go-to implementation is HashMap:
Map<String, String> map = new HashMap<>();
map.put("alpha", "α");
map.put("beta", "β");
map.put("gamma", "γ");
System.out.println(map);
If you need to keep the insertion order, you can use LinkedHashMap; if you know that the map will be accessed from multiple threads, use ConcurrentHashMap.
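A common use of maps in data processing is counting occurrences. The following is a minimal sketch, not taken from the original text: the class WordCountExample and the method countWords are hypothetical, and LinkedHashMap is chosen here only so that the printed order matches the insertion order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WordCountExample {
    // Hypothetical helper: counts how many times each word occurs.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            // Insert 1 for a new key, otherwise add 1 to the existing value.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords(Arrays.asList("alpha", "beta", "beta", "gamma"));
        System.out.println(counts); // {alpha=1, beta=2, gamma=1}

        // Map itself is not Iterable, but its entry set is.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }

        // getOrDefault avoids a null result for a missing key.
        System.out.println(counts.getOrDefault("delta", 0)); // 0
    }
}
```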
The Collections class provides several helper methods for dealing with collections, such as sorting or extracting the minimum or maximum element:
String min = Collections.min(list);
String max = Collections.max(list);
System.out.println("min: " + min + ", max: " + max);
Collections.sort(list);    // sorts the list in place, in natural (alphabetical) order
Collections.shuffle(list); // randomly reorders the list in place
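These helpers also accept a Comparator when the natural order is not what we need. The sketch below is an illustration under that assumption; the class CollectionsHelpersExample and its methods sortedDescending and longest are invented names, while the Collections and Comparator calls are standard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class CollectionsHelpersExample {
    // Hypothetical helper: returns a sorted copy in reverse natural order.
    static List<String> sortedDescending(List<String> input) {
        List<String> copy = new ArrayList<>(input); // copy, so the caller's list is untouched
        Collections.sort(copy, Collections.reverseOrder());
        return copy;
    }

    // Hypothetical helper: returns the string with the greatest length.
    static String longest(List<String> input) {
        return Collections.max(input, Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("beta", "epsilon", "alpha");
        System.out.println(sortedDescending(list)); // [epsilon, beta, alpha]
        System.out.println(longest(list));          // epsilon
        System.out.println(list);                   // [beta, epsilon, alpha], unchanged
    }
}
```

Because Collections.sort and Collections.shuffle modify the list they are given, working on a copy is a reasonable default when the original order still matters.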
There are other collections as well, such as Queue, Deque, Stack, and various thread-safe collections. They are less frequently used and not very important for data science.