書名： Machine Learning with Spark（Second Edition）
作者名： Rajdeep Dua Manpreet Singh Ghotra Nick Pentreath
本章字?jǐn)?shù)： 113字
更新時間： 2021-07-09 21:07:56

Performance improvements in Spark ML over Spark MLlib

Spark 2.0 uses Tungsten Engine, which is built using ideas of modern compilers and MPP databases. It emits optimized bytecode at runtime, which collapses the query into a single function. Hence, there is no need for virtual function calls. It also uses CPU registers to store intermediate data. This technique has been called whole stage code generation.

Reference : https://databricks.com/blog/2016/05/11/apache-spark-2-0-technical-preview-easier-faster-and-smarter.htmlSource: https://databricks.com/blog/2016/05/11/apache-spark-2-0-technical-preview-easier-faster-and-smarter.html

The upcoming table and graph show single function improvements between Spark 1.6 and Spark 2.0:

Chart comparing Performance improvements in Single line functions between Spark 1.6 and Spark 2.0

Table comparing Performance improvements in Single line functions between Spark 1.6 and Spark 2.0.

官术网_书友最值得收藏!

Machine Learning with Spark（Second Edition）

Performance improvements in Spark ML over Spark MLlib