- Apache Spark 2.x for Java Developers
- Sourav Gulati Sumit Kumar
- 215字
- 2021-07-02 19:01:58
Intermediate operations
Intermediate operations always return another stream and get lazily evaluated only when terminal operations are called. The feature of lazy evaluation optimizes intermediate operations when multiple operations are chained together as evaluation only takes place after terminal operation. Another scenario where lazy evaluation tends to be useful is in use cases of infinite or large streams as iteration over an entire stream may not be required or even possible, such as anyMatch, findFirst(), and so on. In these scenarios, short circuiting (discussed in the next section) takes place and the terminal operation exits the flow just after meeting the criteria rather than iterating over entire elements.
Intermediate operations can further be sub-divided into stateful and stateless operations. Stateful operations preserve the last seen value, as in methods such as sorted(), limit(), sequential(), and so on since they need them while processing the current record. For example, the limit() method needs to keep a check on the maximum number of elements allowed in that stream and it can only be achieved if we possess the count of records previously stored in the stream. A stateless operation has no such obligation to manage the state of stream and hence operations such as peek(), map(), filter(), and so on do not possess any state as such: