- Machine Learning with Spark (Second Edition)
- Rajdeep Dua, Manpreet Singh Ghotra, Nick Pentreath
- 214 words
- 2021-07-09 21:07:41
SchemaRDD
SchemaRDD combines an RDD with schema information. It also offers a rich, easy-to-use API (referred to here as the DataSet API). SchemaRDD was renamed DataFrame in Spark 1.3 and is not used in Spark 2.0, where its functionality is carried on internally by the DataFrame and Dataset APIs.
A schema describes how structured data is logically organized. Once the schema is known, the SQL engine can provide structured query capabilities over the corresponding data. The DataSet API takes over the role of the Spark SQL parser: rather than parsing a query string, it builds the initial logical plan tree directly from program calls. All subsequent processing steps reuse Spark SQL's core logic, so the DataSet API's processing functions can safely be considered equivalent to the corresponding SQL queries.
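To make the equivalence argument concrete, here is a minimal pure-Python sketch that needs no Spark installation. It assumes a hypothetical one-table catalog (`TABLES`), a hypothetical `SCHEMA` tuple, and toy plan nodes `Scan`, `Filter`, and `Project`; none of these are Spark classes. A tiny SQL front end and a direct, program-built path produce the same logical plan tree, and a single `execute` function, standing in for Spark SQL's shared core logic, uses the schema to resolve column names to tuple positions.

```python
import re
from dataclasses import dataclass
from typing import Union

# Hypothetical schema: ordered column names that describe how every tuple is organized.
SCHEMA = ("name", "age")
# Hypothetical one-table catalog holding raw, schema-less tuples.
TABLES = {"people": [("Ann", 31), ("Bob", 17), ("Cid", 45)]}


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    """Keeps rows whose `column` value is greater than `threshold`."""
    column: str
    threshold: int
    child: "Plan"


@dataclass(frozen=True)
class Project:
    """Keeps only `column`."""
    column: str
    child: "Plan"


Plan = Union[Scan, Filter, Project]


def parse_sql(query: str) -> Plan:
    """Toy SQL front end: accepts only SELECT <col> FROM <table> WHERE <col> > <int>."""
    m = re.fullmatch(r"SELECT (\w+) FROM (\w+) WHERE (\w+) > (\d+)", query.strip())
    if m is None:
        raise ValueError(f"unsupported query: {query!r}")
    col, table, where_col, value = m.groups()
    return Project(col, Filter(where_col, int(value), Scan(table)))


def execute(plan: Plan) -> list:
    """Shared 'core logic': uses SCHEMA to turn column names into tuple positions.

    Toy limitation: Project must be the outermost node, so rows below it still match SCHEMA.
    """
    if isinstance(plan, Scan):
        return list(TABLES[plan.table])
    rows = execute(plan.child)
    idx = SCHEMA.index(plan.column)
    if isinstance(plan, Filter):
        return [r for r in rows if r[idx] > plan.threshold]
    return [(r[idx],) for r in rows]


# SQL path: a query string goes through the (toy) parser.
sql_plan = parse_sql("SELECT name FROM people WHERE age > 18")
# DSL path: the same tree is built directly by program calls; no parser is involved.
dsl_plan = Project("name", Filter("age", 18, Scan("people")))

assert sql_plan == dsl_plan   # identical logical plan trees...
print(execute(dsl_plan))      # ...so identical results: [('Ann',), ('Cid',)]
```

Because both paths meet at the same plan tree, everything downstream of that point (here, `execute`) cannot tell which front end produced it.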
SchemaRDD is a subclass of RDD. Each call to the DataSet API creates a new SchemaRDD object whose logical plan attribute is formed by adding a new logical operator node on top of the original logical plan tree. Like RDD operations, DataSet API operations fall into two types: transformations and actions.
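The plan-building behavior described above can be sketched as follows. The classes `ToyRDD`, `ToySchemaRDD`, and `PlanNode` and the methods `where` and `select` are hypothetical; they imitate the idea, not Spark's actual SchemaRDD implementation. Each call returns a new object whose logical plan is the previous tree with one new operator node placed on top, leaving the original object untouched.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PlanNode:
    """One logical operator; `child` points at the rest of the tree (None for a leaf)."""
    op: str
    args: Tuple[str, ...]
    child: Optional["PlanNode"] = None

    def depth(self) -> int:
        return 1 if self.child is None else 1 + self.child.depth()


class ToyRDD:
    """Hypothetical stand-in for Spark's RDD base class."""


class ToySchemaRDD(ToyRDD):
    """Hypothetical SchemaRDD-like class: an RDD subclass carrying a logical plan."""

    def __init__(self, plan: PlanNode):
        self.logical_plan = plan

    def _with(self, op: str, *args: str) -> "ToySchemaRDD":
        # Never mutate self: put a new node on top of the existing tree, return a new object.
        return ToySchemaRDD(PlanNode(op, args, self.logical_plan))

    def where(self, predicate: str) -> "ToySchemaRDD":
        return self._with("Filter", predicate)

    def select(self, *columns: str) -> "ToySchemaRDD":
        return self._with("Project", *columns)


people = ToySchemaRDD(PlanNode("Relation", ("people",)))
adults = people.where("age > 18")
names = adults.select("name")

node = names.logical_plan
print(node.op, node.child.op, node.child.child.op)  # Project Filter Relation
assert node.child is adults.logical_plan  # earlier tree is reused, not copied
assert people.logical_plan.depth() == 1   # the original object is unchanged
```

Sharing the previous tree instead of copying it keeps each call cheap and leaves earlier objects (`people`, `adults`) valid for further reuse.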
APIs related to relational operations are transformations.
Operations associated with data output are actions. As with RDDs, a Spark job is triggered and submitted to the cluster for execution only when an action is called.
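The split between transformations and actions amounts to lazy evaluation. The following minimal sketch uses a hypothetical `LazyFrame` class (not a Spark API; only the action names echo Spark's `collect` and `count`): transformations merely record steps, and nothing runs until an action is called, at which point the toy counts one triggered "job".

```python
from typing import Callable, List, Tuple


class LazyFrame:
    """Hypothetical lazy collection: transformations record steps, actions execute them."""

    jobs_run = 0  # class-level count of "jobs" triggered by actions

    def __init__(self, source: List, steps: Tuple = ()):
        self._source = source
        self._steps = steps

    # Transformations: return a new LazyFrame and compute nothing.
    def filter(self, pred: Callable) -> "LazyFrame":
        return LazyFrame(self._source, self._steps + (("filter", pred),))

    def map(self, fn: Callable) -> "LazyFrame":
        return LazyFrame(self._source, self._steps + (("map", fn),))

    # Actions: run every recorded step against the source ("submit a job").
    def collect(self) -> List:
        LazyFrame.jobs_run += 1
        rows = list(self._source)
        for kind, f in self._steps:
            rows = [r for r in rows if f(r)] if kind == "filter" else [f(r) for r in rows]
        return rows

    def count(self) -> int:
        return len(self.collect())


calls = []


def is_adult(row):
    calls.append(row)  # records each time the predicate actually runs
    return row[1] > 18


people = LazyFrame([("Ann", 31), ("Bob", 17), ("Cid", 45)])
names = people.filter(is_adult).map(lambda r: r[0])  # transformations only
assert calls == [] and LazyFrame.jobs_run == 0        # nothing has executed yet

print(names.collect())  # ['Ann', 'Cid']
assert LazyFrame.jobs_run == 1 and len(calls) == 3
```

Each action in this toy recomputes the whole chain from the source, loosely mirroring an RDD that has not been cached, which is why a second action runs the predicate again.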