
Collections

Data is the most important part of data science. It needs to be stored and processed efficiently, and for this we use data structures. A data structure describes a way to store data efficiently to solve a specific problem, and the Java Collections API is the standard Java API for data structures. This API offers a wide variety of implementations that are useful in practical data science applications.

We will not describe the Collections API in full detail, but will concentrate on its most useful and important interfaces: List, Set, and Map.

Lists are collections in which each element can be accessed by its index. The go-to implementation of the List interface is ArrayList, which is the right choice in the vast majority of cases. It can be used as follows:

List<String> list = new ArrayList<>(); 
list.add("alpha");
list.add("beta");
list.add("beta");
list.add("gamma");
System.out.println(list);
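
To make the index-based access mentioned above concrete, here is a minimal sketch that reads and replaces elements by position. The class name ListIndexExample and the helper buildList are hypothetical names introduced only for this illustration; the calls themselves (get, set, indexOf, size) are standard List methods.

```java
import java.util.ArrayList;
import java.util.List;

class ListIndexExample {
    // Hypothetical helper: builds the same four-element list used in the text.
    static List<String> buildList() {
        List<String> list = new ArrayList<>();
        list.add("alpha");
        list.add("beta");
        list.add("beta");
        list.add("gamma");
        return list;
    }

    public static void main(String[] args) {
        List<String> list = buildList();

        // Indices start at 0, so this reads the first element.
        System.out.println(list.get(0));

        // indexOf returns the position of the first matching element.
        System.out.println(list.indexOf("beta"));

        // set replaces the element at the given index in place.
        list.set(3, "delta");
        System.out.println(list);

        // A list keeps duplicates, so both "beta" entries are counted.
        System.out.println(list.size());
    }
}
```

Because ArrayList is backed by an array, get and set by index take constant time, which is one reason it is a reasonable default.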

There are other implementations of the List interface, such as LinkedList and CopyOnWriteArrayList, but they are rarely needed.

Set is another interface in the Collections API, and it describes a collection that allows no duplicates. The go-to implementation is HashSet if the order in which we insert elements does not matter, or LinkedHashSet if the order matters. We can use it as follows:

Set<String> set = new HashSet<>(); 
set.add("alpha");
set.add("beta");
set.add("beta");
set.add("gamma");
System.out.println(set);
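
The difference between the two implementations is easiest to see when the elements are inserted out of alphabetical order. The following is a small sketch, with the hypothetical class name SetOrderExample and helper insertionOrdered, showing that LinkedHashSet iterates in insertion order and that add reports whether the element was actually new.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SetOrderExample {
    // Hypothetical helper: inserts elements out of alphabetical order,
    // including one duplicate, into an insertion-ordered set.
    static Set<String> insertionOrdered() {
        Set<String> set = new LinkedHashSet<>();
        set.add("gamma");
        set.add("alpha");
        set.add("alpha"); // duplicate, silently ignored
        set.add("beta");
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = insertionOrdered();

        // LinkedHashSet iterates in insertion order: [gamma, alpha, beta]
        System.out.println(set);

        // add returns false when the element is already present.
        boolean added = set.add("gamma");
        System.out.println("added again: " + added);

        // Membership checks are the typical use of a set.
        System.out.println(set.contains("beta"));
    }
}
```

With a plain HashSet the same elements would be stored, but the iteration order would depend on the hash codes and should not be relied upon.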

Both List and Set extend the Iterable interface, which makes it possible to use the for-each loop with them:

for (String el : set) {
    System.out.println(el);
}

The Map interface maps keys to values; the same structure is sometimes called a dictionary or an associative array in other languages. The go-to implementation is HashMap:

Map<String, String> map = new HashMap<>(); 
map.put("alpha", "α");
map.put("beta", "β");
map.put("gamma", "γ");
System.out.println(map);

If you need to keep the insertion order, you can use LinkedHashMap; if you know that the map will be accessed from multiple threads, use ConcurrentHashMap.
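
A common data science use of maps is counting occurrences. The sketch below is one possible way to do it, not the only one: it uses LinkedHashMap so that keys come back in the order they were first seen, and the standard Map methods merge and getOrDefault. The class name WordCountExample and the helper countWords are hypothetical names chosen for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WordCountExample {
    // Hypothetical helper: counts how often each word occurs,
    // keeping keys in the order they were first seen.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            // merge stores 1 for a new key, otherwise adds 1 to the old value.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"beta", "alpha", "beta", "gamma", "beta"};
        Map<String, Integer> counts = countWords(words);

        // Iterate over key-value pairs with entrySet.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }

        // getOrDefault avoids a null result for a missing key.
        System.out.println(counts.getOrDefault("delta", 0));
    }
}
```

Swapping LinkedHashMap for HashMap would give the same counts, only without a guaranteed iteration order.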

The Collections class provides several helper methods for dealing with collections, such as sorting or extracting the minimum or maximum element:

String min = Collections.min(list); 
String max = Collections.max(list);
System.out.println("min: " + min + ", max: " + max);
Collections.sort(list);
Collections.shuffle(list);
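
Collections.sort also accepts a Comparator when the natural ordering is not what we need. The following sketch shows two such orderings on copies of the list, so the input is left untouched; the class name SortExample and both helper methods are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortExample {
    // Hypothetical helper: returns a copy sorted in reverse natural order.
    static List<String> sortedDescending(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy, Collections.reverseOrder());
        return copy;
    }

    // Hypothetical helper: returns a copy sorted by string length.
    // The sort is stable, so strings of equal length keep their relative order.
    static List<String> sortedByLength(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy, Comparator.comparing(String::length));
        return copy;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("alpha", "beta", "beta", "gamma");
        System.out.println(sortedDescending(list));
        System.out.println(sortedByLength(list));
        // The original list is unchanged because both helpers sort a copy.
        System.out.println(list);
    }
}
```

Sorting a copy is a design choice for this example; Collections.sort itself always sorts the list it is given in place.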

There are other collections, such as Queue, Deque, Stack, and the thread-safe collections. They are used less frequently and are not very important for data science.
