書名： Hands-On Data Analysis with Scala
作者名： Rajesh Gupta
本章字數： 1330字
更新時間： 2021-06-24 14:51:06

Functional programming using Scala

In the functional programming paradigm, functions become the primary tool for modeling solutions to a problem. In the simplest form, we can think of a function as something that accepts one or more input and produces an output.

To illustrate this concept, let's define a function in Scala that accepts two sets of integer input and returns the sum of two integers plus one, as shown in the following code:

scala> val addAndInc = (a: Int, b: Int) => a + b + 1
addAndInc: (Int, Int) => Int = <function2>

scala> addAndInc(5, 10)
res0: Int = 16

In the preceding example, we have created an anonymous function that takes two sets of integer input and returns an integer output. This function increments the sum of two input numbers and returns the result. There is another way of defining a function in Scala as a named method. Let's look at that in the Scala REPL:

scala> def addAndIncMethod(a: Int, b: Int) = a + b + 1
addAndIncMethod: (a: Int, b: Int)Int

scala> addAndIncMethod(5, 10)
res1: Int = 16

scala> val methodAsFunc = addAndIncMethod
<console>:12: error: missing argument list for method addAndIncMethod
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `addAndIncMethod _` or `addAndIncMethod(_,_)` instead of `addAndIncMethod`.
       val methodAsFunc = addAndIncMethod

In the preceding example, we have defined a method that is bound to a name. The usage of the anonymous function and the named method is identical; however, there are some subtle differences between the two, as shown in the following list:

The signature of the anonymous function is (Int, Int) => Int and the signature of the named method is (a: Int, b: Int)Int
When we try to assign a method to a variable, we get an error

A method can be easily converted into an anonymous function in Scala by doing the following:

scala> val methodAsFunc = addAndIncMethod _ // turns method into function
methodAsFunc: (Int, Int) => Int = <function2>

scala> methodAsFunc(5, 10)
res2: Int = 16

As can be seen, after conversion, the signature changes to (Int, Int) => Int.

Anonymous functions and named methods are both useful in the context of Scala programming. In the previous section on Scala's object-oriented programming, we defined a scala class with some methods. These were all named methods, and would not be of much value from an object-oriented point of view if these were anonymous methods.

The object-oriented approach puts data and behavior together. Objects have mutable states that are manipulated by methods operating on those objects. From a purely functional approach, inputs are never mutated and new outputs are created when a function is applied to these inputs. There is a strong emphasis on immutability in the functional paradigm. Immutability has strong advantages when it comes to the reasoning behind the behavior of a program. You do not have to be concerned about the accidental corruption of a shared state as there are no shared states in the pure functional paradigm.

Functions can be defined within a function. Let's look at the following concrete example using Scala REPL to see why this is useful:

scala> def factorial(n: Int): Long = if (n <= 1) 1 else n * factorial(n-1)
factorial: (n: Int)Int

scala> factorial(5)
res0: Long = 120

The preceding code is an example of the factorial function, and it uses recursion to compute this value. We know that classic recursion requires a stack size proportional to the number of iterations. Scala provides tail recursion optimization to address this problem, and we can make use of an inner function to optimize this problem. In the following code, we'll define a function inside a function:

scala> import scala.annotation.tailrec
import scala.annotation.tailrec

scala> def optimizedFactorial(n: Int): Long = {
     | @tailrec
     | def factorialAcc(acc: Long, i: Int): Long = {
     | if (i <= 1) acc else factorialAcc(acc * i, i -1)
     | }
     | factorialAcc(1, n)
     | }
optimizedFactorial: (n: Int)Long

scala> optimizedFactorial(5)
res1: Long = 120

We can call the factorialAcc function an inner function and the main function, optimizedFactorial, an outer function. The outer function preserves the same interface as the factorial function defined earlier. The inner function uses an accumulator design pattern that allows tail recursion optimization to work. In each iteration, the results are accumulated into an accumulator variable that is not dependent on the iteration variable.

Recursion is a very useful programming construct and tail recursion is a special type of recursion that can be optimized at compile time. Let us first try to understand what recursion really is and what type of recursion is considered tail recursive. In simple terms, any function that calls or invokes itself one or more times is considered recursive. Our factorial examples in both forms are recursive. Let us look at the execution of the first version for a value of 5:

factorial(5)
5 * factorial(4)
5 * 4 * factorial(3)
5 * 4 * 3 * factorial(2)
5 * 4 * 3 * 2 * factorial(1)
5 * 4 * 3 * 2 * 1
120

For the second version:

optimizedFactorial(5)
factorialAcc(1, 5)
factorialAcc(1*5, 4)
factorialAcc(1*5*4, 3)
factorialAcc(1*5*4*3, 2)
factorialAcc(1*5*4*3*2, 1)
1*5*4*3*2
120

The most important difference between the two versions is in the last return statement of the function:

// output of next invocation must be multiplied to current number, so // the state (current number) has to preserved on stack frame
n * factorial(n-1)

// function being called already knows the current state being passed // as the first argument so it does not need preserved on stack frame
factorialAcc(acc * i, i -1)

Recursive functions become expensive due to the usage of stack frames proportional to the number times a function invokes itself. Just imagine running factorial for a large number, it would easily cause stack overflow to occur fairly quickly. Tail recursion property of a recursive algorithm allows us to perform some optimizations to reduce the stack usage as if it was an iterative algorithm.

The following is a screenshot of the preceding recursion examples in IntelliJ IDE. The IDE helps us clearly see which functions or methods are purely Recursive and which ones are Tail Recursive:

Please note the specific symbols next to factorial and optimizedFactorial. The two symbols are different, and if you hover over them, you can see the full description, listed as follows:

Method factorial is recursive
Method factorial is tail recursive

Let's use the following code to see whether we are able to apply tail recursion optimization to the original factorial function in Scala REPL:

scala> @tailrec
     | def factorial(n: Int): Long = if (n <= 1) 1 else n * factorial(n-1)
<console>:14: error: could not optimize @tailrec annotated method factorial: it contains a recursive call not in tail position
       def factorial(n: Int): Long = if (n <= 1) 1 else n * factorial(n-1)

As you can see from the error message, tail recursion optimization cannot be applied in this case. The Scala compiler is able to perform the tail recursion optimization of a recursive function when it follows certain constraints. The details of this are not within the scope of this book; however, this is a very important and useful feature provided by Scala. As we saw in the preceding code, by using an inner function and an accumulator design pattern, we are able to successfully achieve this optimization. As a result of this optimization, the stack usage requirements for the recursive function become equivalent to that of the iterative function.

Functional programming treats functions as first-class citizens. In fact, Scala's collection API heavily makes use of this feature. Functions can be defined within a function and these can be passed as parameters.

From a data analysis and processing perspective, functional programming provides a framework for massively parallel processing solutions. If the input data is immutable, transformations can be applied to the input any number of times and will always result in the same output.

Next, we will look at Scala's case classes and the collection API, which provide some major advantages in data analysis.

官术网_书友最值得收藏!

Hands-On Data Analysis with Scala

Functional programming using Scala