- Apache Spark 2.x for Java Developers
- Sourav Gulati, Sumit Kumar
- 331 words
- 2021-07-02 19:01:54
Lazy evaluation
Another important thing to understand about RDDs is lazy evaluation. Spark creates a DAG, also called the lineage graph, of all the operations you perform on an RDD. Execution of the graph starts only when an action is performed on the RDD. Let's consider an example of DAG operations on an RDD:

Here, an RDD is first created by reading data from stable storage. Two transformations are then applied to it, and finally an action is performed to get the result.
Looking at the previous diagram, one might infer the following sequence. RDD1 is created as soon as the Spark job reaches the step that reads it from the database. Each transformation is performed as soon as the job reaches it. Finally, the action is run to calculate the result. However, this is not what happens.
In reality, Spark only keeps adding steps to the DAG, without executing them, until it reaches a step that asks it to perform an action on the RDD. When it reaches this step, it starts executing the DAG from the first vertex.
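The same deferred-execution idea can be observed without Spark in the JDK's own `java.util.stream` API. Its intermediate operations such as `map` and `filter` are also lazy and run only when a terminal operation such as `collect` is invoked. The sketch below is an analogy, not Spark code. The class name `LazyStreamDemo` and its call counter are illustrative assumptions. It counts how often the two "transformation" functions run before and after the terminal operation:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only: java.util.stream is not Spark, but its intermediate
// operations are deferred in the same spirit as RDD transformations.
class LazyStreamDemo {

    // Counts how many times the "transformation" functions actually run.
    static final AtomicInteger calls = new AtomicInteger();

    static int callsBeforeAction;
    static List<Integer> result;

    static void run() {
        calls.set(0);
        // Stands in for data read from stable storage.
        List<Integer> source = List.of(1, 2, 3, 4, 5);

        // Two "transformations": the pipeline is only described, nothing executes yet.
        Stream<Integer> pipeline = source.stream()
                .map(x -> { calls.incrementAndGet(); return x * 10; })
                .filter(x -> { calls.incrementAndGet(); return x > 20; });

        // Still 0: no element has been processed so far.
        callsBeforeAction = calls.get();

        // The terminal operation plays the role of a Spark action and triggers execution.
        result = pipeline.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        run();
        System.out.println("calls before action: " + callsBeforeAction);
        System.out.println("calls after action:  " + calls.get());
        System.out.println("result: " + result);
    }
}
```

Running `main` should print `calls before action: 0`, then `calls after action:  10` (five elements through `map` and five through `filter`), then `result: [30, 40, 50]`. The sketch deliberately uses `collect` rather than `count`, because in recent JDKs `count` may skip the pipeline entirely when the size can be computed directly from the source. In Spark's Java API the shape is similar: `JavaRDD.map` and `JavaRDD.filter` return a new RDD immediately, and work is scheduled only when an action such as `collect` or `count` is called.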
The following are the benefits of this approach:
- Fault tolerance: The lineage graph of the operations on an RDD makes it fault tolerant. Spark knows the steps it needs to perform to create an RDD. When a failure occurs, it can therefore recalculate the RDD, or only the affected partitions, instead of repeating the whole process. For example, suppose a partition of the RDD derived from RDD2 is lost during processing. Spark can recompute just that partition from RDD2, provided RDD2's data is still available (for example, because it was cached). It does not need to read the data from the database again and repeat both transformations. This saves considerable time and resources when failures occur.
- Optimizing resource usage: Spark knows in advance all the steps needed to calculate the end result, and it can use this information to make the most efficient use of cluster resources. For example, it can pipeline consecutive narrow transformations (those that need no shuffle, such as map and filter) into a single stage, so that intermediate RDDs do not have to be materialized.
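Both benefits can be illustrated with a small, self-contained toy model. `ToyRdd` below is a hypothetical class, not part of Spark's API. It works as follows:

- It records its lineage, meaning the source partitions plus a chain of `map` functions.
- It computes nothing until the `collect` action is called.
- It applies the whole chain to each element in a single pass per partition, a simplified picture of pipelining.
- When one materialized partition is dropped with the hypothetical `losePartition` method, it rebuilds only that partition.

Unlike real Spark, which recomputes from the nearest ancestor whose data is still available (for example, a cached RDD), this toy always replays the lineage from the source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical toy model, NOT Spark's API: a dataset that records its lineage
// (source partitions plus a chain of map functions) and computes partitions on demand.
class ToyRdd {
    private final List<List<Integer>> sourcePartitions;             // stands in for stable storage
    private final List<Function<Integer, Integer>> lineage;         // recorded transformations
    private final Map<Integer, List<Integer>> computed = new HashMap<>(); // materialized partitions
    int sourceReads = 0;                                             // how often storage was read

    ToyRdd(List<List<Integer>> sourcePartitions, List<Function<Integer, Integer>> lineage) {
        this.sourcePartitions = sourcePartitions;
        this.lineage = lineage;
    }

    // Transformation: returns a new dataset with a longer lineage and computes nothing.
    ToyRdd map(Function<Integer, Integer> f) {
        List<Function<Integer, Integer>> next = new ArrayList<>(lineage);
        next.add(f);
        return new ToyRdd(sourcePartitions, next);
    }

    // Rebuilds one partition from lineage: reads its source slice and applies every
    // function to each element in a single pass (the transformations are pipelined).
    private List<Integer> computePartition(int p) {
        sourceReads++;
        List<Integer> out = new ArrayList<>();
        for (Integer x : sourcePartitions.get(p)) {
            Integer v = x;
            for (Function<Integer, Integer> f : lineage) {
                v = f.apply(v);
            }
            out.add(v);
        }
        return out;
    }

    // Action: only now are missing partitions computed; existing ones are reused.
    List<Integer> collect() {
        List<Integer> all = new ArrayList<>();
        for (int p = 0; p < sourcePartitions.size(); p++) {
            all.addAll(computed.computeIfAbsent(p, key -> computePartition(key)));
        }
        return all;
    }

    // Simulates losing one materialized partition, e.g. after an executor failure.
    void losePartition(int p) {
        computed.remove(p);
    }

    public static void main(String[] args) {
        ToyRdd rdd1 = new ToyRdd(List.of(List.of(1, 2), List.of(3, 4), List.of(5, 6)), List.of());
        ToyRdd rdd3 = rdd1.map(x -> x + 1).map(x -> x * 2); // two transformations, no work yet
        System.out.println("reads before action: " + rdd3.sourceReads);
        System.out.println("result: " + rdd3.collect());
        System.out.println("reads after action: " + rdd3.sourceReads);
        rdd3.losePartition(1);
        System.out.println("result after loss: " + rdd3.collect());
        System.out.println("reads after recovery: " + rdd3.sourceReads);
    }
}
```

Running `main` should report the following:

- zero source reads before the action;
- three reads after the first `collect`, one per partition;
- four reads after partition 1 is lost and the action is repeated.

The result is `[4, 6, 8, 10, 12, 14]` both times, which shows that only the lost partition was rebuilt.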