Physical Execution Plan generation and selection

The resolved and optimized Logical Execution Plan (LEP) is used to generate a large set of Physical Execution Plan (PEP) candidates. PEPs are execution plans that have been completely resolved, which means that a PEP contains detailed instructions for generating the desired result. PEPs are generated by so-called strategies. Strategies are used, for example, to select join algorithms based on statistics. In addition, rules are executed, for example, to pipeline multiple operations on an RDD into a single, more complex operation. All of the generated PEPs return exactly the same result, so once the set of candidates is complete, the best one is chosen based on heuristics in order to minimize execution time.
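The generate-then-select idea can be sketched with a toy model in plain Python. This is not Apache Spark's implementation: the `JoinCandidate` class, the `candidate_plans` and `choose_plan` functions, the `broadcast_threshold` parameter, and all cost formulas are hypothetical and exist only to illustrate how several equivalent physical plans might be compared using row-count statistics.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinCandidate:
    """A toy physical plan: a join algorithm plus an estimated cost."""
    name: str
    cost: float


def candidate_plans(left_rows, right_rows, broadcast_threshold=10_000):
    """Generate toy join plans; every cost formula here is made up."""
    small, large = sorted((left_rows, right_rows))
    plans = [
        # Sort both sides (n log n each), then merge them in one pass.
        JoinCandidate(
            "sort_merge_join",
            left_rows * math.log2(max(left_rows, 2))
            + right_rows * math.log2(max(right_rows, 2))
            + left_rows + right_rows,
        ),
        # Shuffle both sides, then build and probe a hash table.
        JoinCandidate("shuffled_hash_join", 2 * (left_rows + right_rows)),
    ]
    # Broadcasting is only a candidate when one side is small enough.
    if small <= broadcast_threshold:
        plans.append(JoinCandidate("broadcast_hash_join", small + large))
    return plans


def choose_plan(plans):
    """Pick the candidate with the lowest estimated cost."""
    return min(plans, key=lambda plan: plan.cost)


best = choose_plan(candidate_plans(1_000_000, 500))
print(best.name)
```

Every candidate would produce the same join result; only the estimated cost differs. In this toy model, a small table on one side makes the broadcast variant the cheapest, which mirrors the general idea of using statistics to pick a join algorithm.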

If the data source supports it, operations are pushed down to the source, namely filtering of rows (predicate push-down) and selection of attributes (projection push-down). This concept is explained in detail in Chapter 2, Apache Spark SQL, in the section called Predicate push-down on smart data sources.

The main idea of predicate push-down is that parts of the abstract syntax tree (AST) are not executed by Apache Spark but by the data source itself. For example, filtering rows by column values can be done much more efficiently by a relational or NoSQL database, since it sits closer to the data and can therefore avoid data transfers between the database and Apache Spark. Likewise, the removal of unnecessary columns is a job done more effectively by the database.
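The effect of push-down on data transfer can be illustrated with a small, self-contained Python sketch. It does not use Apache Spark or a real database: `ToySource`, its `scan` method, the `values_transferred` counter, and the sample rows are all hypothetical stand-ins for a data source that can optionally evaluate a filter and a column selection itself.

```python
ROWS = [
    {"id": 1, "name": "Ada", "country": "UK", "age": 36},
    {"id": 2, "name": "Linus", "country": "FI", "age": 28},
    {"id": 3, "name": "Grace", "country": "US", "age": 45},
    {"id": 4, "name": "Alan", "country": "UK", "age": 41},
]


class ToySource:
    """Hypothetical data source that counts the values it hands over."""

    def __init__(self, rows):
        self._rows = rows
        self.values_transferred = 0

    def scan(self, columns=None, predicate=None):
        """Return rows, applying predicate/projection at the source if given."""
        result = []
        for row in self._rows:
            if predicate is not None and not predicate(row):
                continue  # filtered at the source, never transferred
            kept = row if columns is None else {c: row[c] for c in columns}
            self.values_transferred += len(kept)
            result.append(dict(kept))
        return result


def query_without_pushdown(source):
    """Fetch everything, then filter and project on the engine side."""
    rows = source.scan()
    return [{"name": r["name"]} for r in rows if r["age"] > 30]


def query_with_pushdown(source):
    """Let the source evaluate the filter and the column selection."""
    return source.scan(columns=["name"], predicate=lambda r: r["age"] > 30)


plain, pushed = ToySource(ROWS), ToySource(ROWS)
result_plain = query_without_pushdown(plain)
result_pushed = query_with_pushdown(pushed)
print(plain.values_transferred, pushed.values_transferred)
```

Both queries return the same names, but the variant without push-down moves every column of every row to the engine, while the pushed-down variant moves only the `name` values of the matching rows.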
