- Mastering Apache Spark 2.x (Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:32
Understanding the workings of the Catalyst Optimizer
So how does the optimizer work? The following figure shows the core components and how they are involved in a sequential optimization process:

First, note that it doesn't matter whether the DataFrame API, the Dataset API, or SQL is used: all three result in the same Unresolved Logical Execution Plan (ULEP). A QueryPlan is unresolved if its column names haven't been verified and its column types haven't been looked up in the catalog. Once that lookup has succeeded, the plan becomes a Resolved Logical Execution Plan (RLEP), which is then transformed multiple times until it results in an Optimized Logical Execution Plan. LEPs describe only what has to be computed, not how it is computed. The optimized LEP is transformed into multiple Physical Execution Plans (PEPs) using so-called strategies. Finally, a cost model that takes statistics about the Dataset to be queried into account selects the optimal PEP for execution. Note that the final execution takes place on RDD objects.
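To make the sequence of stages more concrete, here is a minimal toy sketch in plain Python. It is not Catalyst code and uses no Spark API. The plan classes (`Scan`, `Filter`, `Project`), the `CATALOG` and `STATS` dictionaries, the single `merge_filters` rule, and the cost numbers are all hypothetical and exist only to illustrate the flow from an unresolved plan to a resolved plan, an optimized plan, and a cost-based choice among physical candidates.

```python
from dataclasses import dataclass, replace

# Hypothetical catalog and statistics, invented for this sketch only.
CATALOG = {"clients": {"id": "int", "age": "int", "name": "string"}}
STATS = {"clients": {"rows": 1000, "indexed": {"age"}}}


@dataclass(frozen=True)
class Scan:
    table: str
    schema: tuple = ()  # empty until resolved against the catalog


@dataclass(frozen=True)
class Filter:
    column: str
    min_value: int  # keeps rows where column >= min_value
    child: object


@dataclass(frozen=True)
class Project:
    columns: tuple
    child: object


def output_columns(plan):
    """Column names a (resolved) plan node produces."""
    if isinstance(plan, Scan):
        return tuple(name for name, _ in plan.schema)
    if isinstance(plan, Project):
        return plan.columns
    return output_columns(plan.child)


def resolve(plan, catalog):
    """Unresolved -> resolved: verify names and look up types in the catalog."""
    if isinstance(plan, Scan):
        if plan.table not in catalog:
            raise LookupError(f"unknown table: {plan.table}")
        return replace(plan, schema=tuple(catalog[plan.table].items()))
    child = resolve(plan.child, catalog)
    used = plan.columns if isinstance(plan, Project) else (plan.column,)
    missing = [c for c in used if c not in output_columns(child)]
    if missing:
        raise LookupError(f"unknown columns: {missing}")
    return replace(plan, child=child)


def merge_filters(plan):
    """Toy rule: collapse stacked filters on the same column into one."""
    if isinstance(plan, Scan):
        return plan
    child = merge_filters(plan.child)
    if (isinstance(plan, Filter) and isinstance(child, Filter)
            and plan.column == child.column):
        return Filter(plan.column, max(plan.min_value, child.min_value), child.child)
    return replace(plan, child=child)


def optimize(plan, rules):
    """Apply the rules repeatedly until the plan stops changing."""
    while True:
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan
        plan = new_plan


def physical_candidates(plan, stats):
    """Toy 'strategies': list (description, estimated cost) alternatives."""
    scan = plan
    while not isinstance(scan, Scan):
        scan = scan.child
    table_stats = stats[scan.table]
    candidates = [("FullTableScan", float(table_stats["rows"]))]
    node = plan
    while not isinstance(node, Scan):
        if isinstance(node, Filter) and node.column in table_stats["indexed"]:
            # Assumed selectivity: an index lookup touches 10% of the rows.
            candidates.append((f"IndexScan({node.column})", table_stats["rows"] / 10))
        node = node.child
    return candidates


def choose_cheapest(candidates):
    """Cost model: pick the candidate with the lowest estimated cost."""
    return min(candidates, key=lambda c: c[1])


unresolved = Project(("name",), Filter("age", 18, Filter("age", 30, Scan("clients"))))
resolved = resolve(unresolved, CATALOG)
optimized = optimize(resolved, [merge_filters])
best = choose_cheapest(physical_candidates(optimized, STATS))
print(optimized)
print(best)
```

The logical plan classes carry no information about how rows are fetched, which mirrors the "what, not how" property of LEPs; only `physical_candidates` introduces execution alternatives. In Spark itself, calling `explain(True)` on a DataFrame prints the parsed, analyzed, and optimized logical plans together with the selected physical plan.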