- Mastering Java for Data Science
- Alexey Grigorev
- 2021-07-02 23:44:33
Collections
Data is the most important part of data science. It has to be stored and processed efficiently, and for this we use data structures. A data structure describes a way of organizing data to solve a specific problem efficiently, and the Java Collections API is the standard Java API for data structures. It offers a wide variety of implementations that are useful in practical data science applications.
We will not describe the Collections API in full detail, but will concentrate on its most useful and important parts: the List, Set, and Map interfaces.
Lists are collections where each element can be accessed by its index. The go-to implementation of the List interface is ArrayList, which should be used in 99% of cases. It can be used as follows:
List<String> list = new ArrayList<>();
list.add("alpha");
list.add("beta");
list.add("beta"); // a list keeps duplicates
list.add("gamma");
System.out.println(list); // prints [alpha, beta, beta, gamma]
There are other implementations of the List interface, such as LinkedList and CopyOnWriteArrayList, but they are rarely needed.
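To make the "accessed by its index" part concrete, here is a minimal sketch of the index-based operations on the list from above. The class name `ListIndexDemo` and the helper `buildList` are made up for this illustration; the calls themselves (`get`, `set`, `indexOf`, `remove`, `size`) are standard `java.util.List` methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original text.
final class ListIndexDemo {

    // Builds the same list as in the text: duplicates are kept, order is insertion order.
    static List<String> buildList() {
        List<String> list = new ArrayList<>();
        list.add("alpha");
        list.add("beta");
        list.add("beta");
        list.add("gamma");
        return list;
    }

    public static void main(String[] args) {
        List<String> list = buildList();

        System.out.println(list.size());          // number of elements, duplicates included
        System.out.println(list.get(0));          // first element (indices start at 0)
        System.out.println(list.indexOf("beta")); // index of the first occurrence

        list.set(2, "delta");                     // replace the element at index 2
        list.remove(0);                           // remove by index; later elements shift left
        System.out.println(list);                 // prints [beta, delta, gamma]
    }
}
```

Note that `remove(0)` with an `int` argument removes by position; on a `List<Integer>` this is easy to confuse with removing the value 0.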
Set is another interface in the Collections API; it describes a collection that allows no duplicates. The go-to implementation is HashSet if the order in which we insert elements does not matter, or LinkedHashSet if it does. We can use a set as follows:
Set<String> set = new HashSet<>();
set.add("alpha");
set.add("beta");
set.add("beta"); // ignored: "beta" is already in the set
set.add("gamma");
System.out.println(set); // three elements; HashSet does not guarantee their order
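The difference between HashSet and LinkedHashSet is easiest to see when deduplicating a list. The following is a small sketch under that assumption; `SetOrderDemo` and `distinctInOrder` are illustrative names, not something from the original text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of the original text.
final class SetOrderDemo {

    // Removes duplicates while keeping the order of first appearance.
    static Set<String> distinctInOrder(List<String> values) {
        return new LinkedHashSet<>(values);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("gamma", "alpha", "beta", "beta", "alpha");

        Set<String> hashSet = new HashSet<>(values);     // no ordering guarantee
        Set<String> linkedSet = distinctInOrder(values); // insertion order preserved

        System.out.println(hashSet.size());           // 3 distinct elements
        System.out.println(hashSet.contains("beta")); // true
        System.out.println(linkedSet);                // prints [gamma, alpha, beta]
        System.out.println(linkedSet.add("beta"));    // false: the element was already present
    }
}
```

The boolean returned by `add` is a cheap way to check whether an element has been seen before.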
List and Set both extend the Iterable interface (through Collection), which makes it possible to use the for-each loop with them:
for (String el : set) {
    System.out.println(el);
}
The Map interface allows mapping keys to values; in other languages this structure is sometimes called a dictionary or an associative array. The go-to implementation is HashMap:
Map<String, String> map = new HashMap<>();
map.put("alpha", "α");
map.put("beta", "β");
map.put("gamma", "γ");
System.out.println(map);
If you need to keep the insertion order, you can use LinkedHashMap; if you know that the map will be accessed from multiple threads, use ConcurrentHashMap.
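A typical data science use of a map is counting. The sketch below counts tokens with a LinkedHashMap, so the counts come out in the order in which each token first appeared. The names `WordCountDemo` and `countTokens` are assumptions made for this example; `merge`, `entrySet`, and `getOrDefault` are standard `java.util.Map` methods (Java 8 and later).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original text.
final class WordCountDemo {

    // Counts occurrences of each token; LinkedHashMap keeps the order of first appearance.
    static Map<String, Integer> countTokens(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            // Insert 1 for a new key, otherwise add 1 to the existing count.
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countTokens(Arrays.asList("beta", "alpha", "beta", "gamma"));

        // A Map is not Iterable itself; iterate over its entries instead.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }

        System.out.println(counts.getOrDefault("delta", 0)); // 0 for a key that is absent
    }
}
```

Swapping `LinkedHashMap` for `HashMap` gives the same counts but no guarantee about the order in which they are printed.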
The Collections class provides several helper methods for working with collections, such as sorting, shuffling, or extracting the minimum or maximum element:
String min = Collections.min(list);
String max = Collections.max(list);
System.out.println("min: " + min + ", max: " + max);
Collections.sort(list);
Collections.shuffle(list);
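These helpers use the natural ordering of the elements by default, and `Collections.sort` modifies the list in place. As a sketch of how a custom ordering could be plugged in, the example below passes a `Comparator`; the class `SortDemo` and the helper `sortedCopy` are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of the original text.
final class SortDemo {

    // Returns a sorted copy so that the caller's list is left untouched.
    static List<String> sortedCopy(List<String> values, Comparator<String> order) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy, order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("beta", "gamma", "alpha", "pi");

        System.out.println(sortedCopy(list, Comparator.naturalOrder())); // [alpha, beta, gamma, pi]
        System.out.println(sortedCopy(list, Comparator.reverseOrder())); // [pi, gamma, beta, alpha]

        // Order by string length instead of alphabetically.
        Comparator<String> byLength = Comparator.comparing(String::length);
        System.out.println(sortedCopy(list, byLength).get(0));           // pi, the shortest string
        System.out.println(Collections.max(list, byLength).length());    // 5, the longest length
    }
}
```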
There are other collections, such as Queue, Deque, Stack, and the thread-safe collections. They are used less frequently and are not very important for data science.