書名： Mastering Elixir
作者名： André Albuquerque Daniel Caixinha
本章字數： 829字
更新時間： 2021-08-05 10:42:48

Lazy processing with the stream module

We will now talk about a different way of processing collections, which, as functional programming, may require a shift in your mindset. Before talking about lazy processing, let's enumerate some of the shortcomings of working with the Enum module. The Enum module is referred to as being eager. This means that when processing a collection, this module will load the entire collection into memory. Furthermore, if you have a chain of functions you want to apply to a collection, the Enum module will iterate through your collection as many times as the functions are applying to it. Let's examine this further with an example:

iex> [1, 2, 3, 4, 5] \
...> |> Enum.map(&(&1 + 10)) \
...> |> Enum.zip(["a", "b", "c", "d", "e"])
[{11, "a"}, {12, "b"}, {13, "c"}, {14, "d"}, {15, "e"}]

The \ on the end of the first two lines is to stop our Elixir console from evaluating this line right away, and wait for a new line instead. This way, we can write these operations with the pipe operator on multiple lines, which makes them more readable.

We take our initial collection and iterate it to add 10 to each element inside it. This generates a new list, which is passed to our next function. This function will zip the two lists together, which will produce a new list, which is returned to us. In this simple example, we need to traverse our list twice to build the desired result.

This is where the Stream module, and lazy processing, becomes advantageous. When working with lazy enumerables, the entire collection never gets loaded into memory, and contrary to what we're accustomed to, the computations aren't made right away. The results are produced as they are needed. Let's see this same example with the Stream module:

iex> [1, 2, 3, 4, 5] \
...> |> Stream.map(&(&1 + 1)) \
...> |> Stream.zip(["a", "b", "c", "d", "e"])
#Function<66.40091930/2 in Stream.zip/1>

As you can see, we're not getting our final list back. When we feed our list to Stream.map, the list is not iterated. Instead, the functions that will be applied on it are saved into a structure (along with the collection we're working on). We can then pass this structure into the next function, which will further save a new function to be applied to our list. This is really cool! But how do we make it return the result we're expecting? Just treat it as a regular (eager) enumerable, by applying a function from the Enum module, and it will start to produce results.

To exemplify this, we'll use the Enum.take/2 function, which allows us to take a given number of items from an enumerable:

iex> [1, 2, 3, 4, 5] \
...> |> Stream.map(&(&1 + 10)) \
...> |> Stream.zip(["a", "b", "c", "d", "e"]) \
...> |> Enum.take(1)
[{11, "a"}]

As you can see, we're now getting the expected result back. Note that this is not a result of applying our computation to all the list and then just taking the first element. We've essentially only computed results for the first element, as that's all that was necessary. If you wanted to have the full list in the end, you could use the Enum.to_list/1 function.

Streams are a really nimble way to process large, or even infinite, collections. Imagine that you're parsing values from a huge CSV file, and then running some functions on them. If you're running your application on the cloud, as most of us are these days, you probably have a short amount of memory. Using lazy processing, you can avoid having to load the whole file, processing it line by line. If you're processing an infinite collection, such as an RSS feed, lazy processing is also a great solution, as you can process each element of the collection incrementally, as they arrive.

Note that while the Stream module is amazing, it will not replace your usage of the Enum module. It's certainly great for very large collections, or even if you have a big chain of functions being applied to a collection and only want to traverse it once. However, for small or even medium collections, the Stream module will perform worse, as you're adding a lot of overhead, for instance, by having to save the functions you'll apply instead of applying them right away. Always analyze your situation carefully and take this into account when choosing to use the Enum or the Stream module for a given task.

We'll be using functions from the Stream module in the application we'll build in this book. You'll learn more about the Stream module in Chapter 4, Powered by Erlang/OTP.

Elixir provides some functions that wrap most of the complex parts of building streams. If you want to build your own lazy stream, check out these functions from the Stream module: cycle, repeatedly, iterate, unfold, and resource. The full documentation for the Stream can be found at https://hexdocs.pm/elixir/Stream.html.

官术网_书友最值得收藏!

Mastering Elixir

Lazy processing with the stream module