Understanding the workings of the Catalyst Optimizer

So how does the optimizer work? The following figure shows the core components and how they are involved in a sequential optimization process:

First of all, it doesn't matter whether the DataFrame API, the Dataset API, or SQL is used: they all result in the same Unresolved Logical Execution Plan (ULEP). A QueryPlan is unresolved as long as its column names haven't been verified and its column types haven't been looked up in the catalog. Once both have been done, the plan becomes a Resolved Logical Execution Plan (RLEP), which is then transformed multiple times until it results in an Optimized Logical Execution Plan. LEPs don't describe how something is computed, only what has to be computed. The optimized LEP is transformed into multiple Physical Execution Plans (PEPs) using so-called strategies. Finally, a cost model that takes statistics about the Dataset to be queried into account selects the optimal PEP for execution. Note that the final execution takes place on RDD objects.
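To make the sequence of stages more concrete, the following is a minimal toy sketch in plain Python. It is not Catalyst's code and uses no Spark API: the names `CATALOG`, `ROW_COUNTS`, `LogicalPlan`, `PhysicalPlan`, `resolve`, `optimize`, and the two scan strategies are all hypothetical, and the cost model (rows multiplied by columns read) is an assumption chosen only for illustration. It mirrors the order described above: resolve against a catalog, apply rules repeatedly, generate candidate physical plans, and pick the cheapest one.

```python
from dataclasses import dataclass, replace

# Hypothetical catalog and statistics, standing in for what a real catalog provides.
CATALOG = {"people": {"name": "string", "age": "int", "city": "string"}}
ROW_COUNTS = {"people": 1_000_000}


@dataclass(frozen=True)
class LogicalPlan:
    """Says WHAT to compute: SELECT columns FROM table WHERE filter_column > threshold."""
    table: str
    columns: tuple
    filter_column: str
    threshold: object          # an int, or a nested ("+", left, right) expression
    column_types: tuple = ()   # stays empty while the plan is unresolved


@dataclass(frozen=True)
class PhysicalPlan:
    """Says HOW to compute it, together with an estimated cost."""
    name: str
    cost: int


def resolve(plan):
    """ULEP -> RLEP: verify column names and look up their types in the catalog."""
    schema = CATALOG[plan.table]
    for col in plan.columns + (plan.filter_column,):
        if col not in schema:
            raise ValueError(f"cannot resolve column {col!r}")
    return replace(plan, column_types=tuple(schema[c] for c in plan.columns))


def fold_once(expr):
    """Fold one level of a nested ("+", left, right) constant expression."""
    if not isinstance(expr, tuple):
        return expr
    _, left, right = expr
    if isinstance(left, int) and isinstance(right, int):
        return left + right
    return ("+", fold_once(left), fold_once(right))


def fold_constants(plan):
    """Toy optimization rule: simplify the constant in the filter by one level."""
    return replace(plan, threshold=fold_once(plan.threshold))


def optimize(plan, rules):
    """RLEP -> optimized LEP: reapply the rules until the plan stops changing."""
    while True:
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan
        plan = new_plan


def full_scan_strategy(plan):
    # Candidate 1: read every column of every row, then filter and project.
    return PhysicalPlan("FullScan", ROW_COUNTS[plan.table] * len(CATALOG[plan.table]))


def pruned_scan_strategy(plan):
    # Candidate 2: read only the columns the query actually needs.
    needed = set(plan.columns) | {plan.filter_column}
    return PhysicalPlan("PrunedScan", ROW_COUNTS[plan.table] * len(needed))


def plan_query(unresolved):
    resolved = resolve(unresolved)
    optimized = optimize(resolved, [fold_constants])
    candidates = [s(optimized) for s in (full_scan_strategy, pruned_scan_strategy)]
    # Toy cost model: choose the candidate with the lowest estimated cost.
    return optimized, min(candidates, key=lambda p: p.cost)


query = LogicalPlan("people", ("name",), "age", ("+", ("+", 10, 8), 3))
optimized, chosen = plan_query(query)
print(optimized.threshold, optimized.column_types, chosen.name)
# prints: 21 ('string',) PrunedScan
```

The nested constant needs two passes of the rule before the plan stops changing, which is the toy counterpart of the plan being "transformed multiple times". In Spark itself, calling `explain(True)` on a DataFrame prints the parsed, analyzed, and optimized logical plans followed by the physical plan, which is a convenient way to inspect the real stages.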
