書名： Hands-On Data Analysis with Scala
作者名： Rajesh Gupta
本章字數： 1359字
更新時間： 2021-06-24 14:51:07

List

A List provides fast and constant time performance for head and tail operations in a collection. We can visualize a List as a collection of elements that are connected by some kind of link. Let's look at the Scala List functionality using Scala REPL, as shown in the following code:

scala> val persons = List(Person("Jon", "Doe", 21), Person("Alice", "Smith", 20), Person("Bob", "Crew", 27)) // construct a List of Person objects
persons: List[Person] = List(Person(Jon,Doe,21), Person(Alice,Smith,20), Person(Bob,Crew,27))

scala> val personHead = persons.head // first person in the collection
personHead: Person = Person(Jon,Doe,21)

scala> val personAtTwo = persons(2) // person at index 2 (this is same as apply operation)
personAtTwo: Person = Person(Bob,Crew,27)

scala> val personsTail = persons.tail // collection without the first person
personsTail: List[Person] = List(Person(Alice,Smith,20), Person(Bob,Crew,27))

scala> val personsByAge = persons.sortBy(p => p.age) // sort persons by age
personsByAge: List[Person] = List(Person(Alice,Smith,20), Person(Jon,Doe,21), Person(Bob,Crew,27))

scala> val personsByFname = persons.sortBy(p => p.fname) // sort persons by first name
personsByFname: List[Person] = List(Person(Alice,Smith,20), Person(Bob,Crew,27), Person(Jon,Doe,21))

scala> val (below25, above25) = persons.partition(p => p.age <= 25) // split persons by age
below25: List[Person] = List(Person(Jon,Doe,21), Person(Alice,Smith,20))
above25: List[Person] = List(Person(Bob,Crew,27))

scala> val updatePersons = persons.updated(0, Person("Jon", "Doe", 20)) // update first element
updatePersons: List[Person] = List(Person(Jon,Doe,20), Person(Alice,Smith,20), Person(Bob,Crew,27))

The following is a summary of the List operations and their associated performance characteristics:

As can be seen in the preceding table, the List enables very fast head, tail, and prepend operations. For the array type described earlier, we saw that tail was an expensive linear time operation. The apply operation for getting an element at the specified index is a linear time operation. This is because the desired element can only be located by traversing the links, starting from the head. This explains why an update is a slow operation for the List.

In a real-world scenario, constant time performance is the desired behavior and we want to avoid linear time performance, particularly for large datasets. Performance is an important factor in determining the most suitable data structure for the problem being solved. If constant time performance is not practical, we generally look for data structures and algorithms that provide Log Time performance O(log n): time proportional to the logarithm of the collection size. Note that there are many algorithms, such as sorting, with best performance times of O(n log n). When dealing with large datasets, a good understanding of the performance characteristics of the data structures and algorithms that are used goes a long way in solving problems effectively and efficiently.

Similar considerations hold true for memory usage, even though larger amounts of RAM are now becoming available at a cheaper price. This is because the growth in the size of data being produced is much higher than the drop in prices of RAM.

Let's now look at ListBuffer, which can be used for constructing a list more efficiently. This will be very useful, given that the append operation has significant performance overheads. As mentioned earlier, datasets are generally constructed once but are used multiple times during data analysis processes. Let's look at the following code:

scala> import scala.collection.mutable.ListBuffer
import scala.collection.mutable.ListBuffer

scala> val personsBuf = ListBuffer[Person]() // create ListBuffer of Person
personsBuf: scala.collection.mutable.ListBuffer[Person] = ListBuffer()

scala> personsBuf.append(Person("Jon", "Doe", 21)) // append a Person object at end

scala> personsBuf.prepend(Person("Alice", "Smith", 20)) // prepend a Person object at head

scala> personsBuf.insert(1, Person("Bob", "Crew", 27)) // insert a Person object at index 1

scala> val persons = personsBuf.toList // materialize into a List of Person
persons: List[Person] = List(Person(Alice,Smith,20), Person(Bob,Crew,27), Person(Jon,Doe,21))

scala> val personRemoved = personsBuf.remove(1) // remove Person object at index 1
personRemoved: Person = Person(Bob,Crew,27)

scala> val personsUpdated = personsBuf.toList // materialize into a List of Person
personsUpdated: List[Person] = List(Person(Alice,Smith,20), Person(Jon,Doe,21))

If we compare ArrayBuffer and ListBuffer, we can see that they both offer similar APIs. Their primary use is for constructing an array and list respectively, providing good performance characteristics.

The decision of choosing between array and list is dependent on how the dataset will be used. The following are some useful tips:

An array should generally be the first choice because of its storage efficiency. Array operations are somewhat limited compared to list operations, and the usage pattern becomes the determining factor.
If a tail operation is necessary, a list is the obvious choice. In fact, there are many recursive algorithms that make extensive use of this feature. Using an array instead of a list will result in a significant performance penalty.
If apply or update operations are desired, then an array is certainly a better choice.
If the prepend operation is needed or if a limited use of append is required, then a list is the only choice because an array does not support the prepend or append operations.

As you can see, there are many factors at play when it comes to selecting the appropriate data structure. This is often the case in any software design decision where there are conflicting needs and you need to decide how to make trade-offs. For example, you might decide in favor of using a list even though none of the non-array features of a list are required based on current usage patterns. This could be because of the list's fast tail operation, which could be beneficial for the recursive algorithms in future usage patterns.

Recursive algorithms play a central role in functional programming. In fact, Scala supports tail-recursion optimization out of the box, which facilitates practical usage of recursive algorithms with large datasets. With classic recursion, a significant amount of stack space is required, making it impractical to use on large datasets. With tail-recursion optimization, the Scala compiler eliminates the stack growth under the hood. Let's look at a classic recursion and tail-recursion example:

scala> import annotation.tailrec
import annotation.tailrec

scala> @tailrec def classicFactorial(n: Int): Int = { require(n > 0); if (n == 1) n else n * classicFactorial(n-1) } // this should fail
<console>:14: error: could not optimize @tailrec annotated method classicFactorial: it contains a recursive call not in tail position
       @tailrec def classicFactorial(n: Int): Int = { require(n > 0); if (n == 1) n else n * classicFactorial(n-1) }

scala> def classicFactorial(n: Int): Int = { require(n > 0, "n must be non-zero and positive"); if (n == 1) n else n * classicFactorial(n-1) } // this should work
classicFactorial: (n: Int)Int

scala> val classicResult = classicFactorial(5)
classicResult: Int = 120
scala> def tailRecFactorial(n: Int): Int = {
     | require(n > 0, "n must be non-zero and positive")
     | @tailrec def factorial(acc: Int, m: Int): Int = if (m == 1) acc else factorial(acc * m, m-1) // this should work as this recursive algorithm meets tail recursion requirements
     | factorial(1, n)
     | }
tailRecFactorial: (n: Int)Int

scala> val tailRecResult = tailRecFactorial(5)
tailRecResult: Int = 120

The preceding examples provides insight into recursive functions, and in particular demonstrates a tail-recursion variant. Let's look at the following example of a tail-recursion algorithm using List:

scala> val persons = List(Person("Jon", "Doe", 21), Person("Alice", "Smith", 20), Person("Bob", "Crew", 27))
persons: List[Person] = List(Person(Jon,Doe,21), Person(Alice,Smith,20), Person(Bob,Crew,27))

scala> @tailrec def minAgePerson(acc: Option[Person], lst: List[Person]): Option[Person] = {
     | if (lst.isEmpty) acc
     | else if (acc.isEmpty) minAgePerson(Some(lst.head), lst.tail)
     | else if (acc.get.age <= lst.head.age) minAgePerson(acc, lst.tail)
     | else minAgePerson(Some(lst.head), lst.tail)
     | }
minAgePerson: (acc: Option[Person], lst: List[Person])Option[Person]

scala> val youngest = minAgePerson(None, persons) // Person with minimum age
youngest: Option[Person] = Some(Person(Alice,Smith,20))

scala> val youngestEmpty = minAgePerson(None, Nil) // Nil == List(), an empty list
youngestEmpty: Option[Person] = None

The preceding code is a very simple example of finding a Person with the minimum age from a list of Person objects. This simple example, however, illustrates the following important and powerful points regarding Scala:

It is fairly straightforward to write a tail-recursive algorithm using a list in Scala that accumulates information. This algorithm can traverse the entire list without incurring the overhead of stack growth in a classic recursion.
Scala's option construct provides a convenient way of representing the presence or absence of an object.
List's head and tail operations come in handy in writing such recursive algorithms, and provide the desired constant time performance for both these operations.
The code is concise and works even on the empty list.
Using the accumulator is a commonly used pattern in turning a classic recursion algorithm into a tail-recursion algorithm.

官术网_书友最值得收藏!

Hands-On Data Analysis with Scala

List