
Understanding the workings of the Catalyst Optimizer

So how does the optimizer work? The following figure shows the core components and how they are involved in a sequential optimization process:

First of all, it doesn't matter whether the DataFrame API, the Dataset API, or SQL is used. They all result in the same Unresolved Logical Execution Plan (ULEP). A QueryPlan is unresolved if the column names haven't been verified and the column types haven't been looked up in the catalog. Once these lookups succeed, the plan becomes a Resolved Logical Execution Plan (RLEP), which is then transformed multiple times until it results in an Optimized Logical Execution Plan. LEPs don't describe how something is computed, only what has to be computed. The optimized LEP is transformed into multiple Physical Execution Plans (PEPs) using so-called strategies. Finally, a cost model, which takes statistics about the Dataset to be queried into account, selects the optimal PEP for execution. Note that the final execution takes place on RDD objects.
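The stages described above can be mimicked in a few lines of plain Python. This is only a toy sketch and not Spark code: the catalog, the plan classes, the single optimization rule, the two scan strategies, and the cost numbers are all hypothetical names and values made up for illustration. It only shows the order of the steps: resolve against a catalog, rewrite the logical plan, generate several physical candidates, and pick the cheapest one.

```python
from dataclasses import dataclass

# Hypothetical catalog: table name -> {column name: column type}
CATALOG = {"people": {"name": "string", "age": "int"}}


@dataclass(frozen=True)
class UnresolvedPlan:
    table: str
    columns: tuple     # column names as written by the user, not yet verified
    predicates: tuple  # filters as (column, operator, value) triples


@dataclass(frozen=True)
class ResolvedPlan:
    table: str
    columns: tuple     # (name, type) pairs looked up in the catalog
    predicates: tuple


def analyze(plan, catalog):
    """Resolve a plan: verify column names and look up their types."""
    schema = catalog.get(plan.table)
    if schema is None:
        raise ValueError(f"unknown table: {plan.table}")
    referenced = list(plan.columns) + [p[0] for p in plan.predicates]
    for col in referenced:
        if col not in schema:
            raise ValueError(f"unknown column: {col}")
    typed = tuple((c, schema[c]) for c in plan.columns)
    return ResolvedPlan(plan.table, typed, plan.predicates)


def optimize(plan):
    """Toy logical rule: drop duplicate predicates, keeping their order."""
    deduped = tuple(dict.fromkeys(plan.predicates))
    return ResolvedPlan(plan.table, plan.columns, deduped)


def physical_candidates(plan, stats):
    """Toy strategies: return (plan name, estimated cost) candidates."""
    table_stats = stats[plan.table]
    candidates = [("FullScan", table_stats["rows"])]
    if plan.predicates and table_stats.get("indexed"):
        cost = table_stats["rows"] * table_stats["selectivity"]
        candidates.append(("IndexScan", cost))
    return candidates


def choose(candidates):
    """Toy cost model: pick the candidate with the lowest estimated cost."""
    return min(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    query = UnresolvedPlan("people", ("name",),
                           (("age", ">", 30), ("age", ">", 30)))
    # Made-up statistics about the table being queried
    stats = {"people": {"rows": 1000, "indexed": True, "selectivity": 0.1}}
    optimized = optimize(analyze(query, CATALOG))
    print(choose(physical_candidates(optimized, stats)))
```

In Spark itself, the real plans for a query can be inspected by calling `explain` with extended output on a DataFrame, which prints the parsed, analyzed, and optimized logical plans followed by the physical plan.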
