SchemaRDD

SchemaRDD combines an RDD with schema information and offers a rich, easy-to-use API (today's Dataset API). The SchemaRDD class itself is not used in Spark 2.0: it was renamed DataFrame in Spark 1.3, and the ideas described here carry over to the DataFrame and Dataset APIs.

A schema describes how structured data is logically organized. Once it has the schema information, the SQL engine can provide structured query capability over the corresponding data. The Dataset API stands in for the Spark SQL parser: instead of parsing a SQL string, the API calls themselves build the logical plan tree of the program. All subsequent processing steps reuse Spark SQL's core logic, so the processing functions of the Dataset API can safely be regarded as equivalent to SQL queries.
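To make the role of the schema concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark code: the `Field` type, the `column_index` and `select_where` helpers, and the sample rows are all hypothetical names invented for illustration. It only shows that once column names and types are known, a query can be expressed against names rather than tuple positions.

```python
from typing import NamedTuple

class Field(NamedTuple):
    name: str
    dtype: type

# Hypothetical schema and rows, invented for this illustration.
schema = [Field("name", str), Field("age", int)]
rows = [("Ann", 31), ("Bob", 17), ("Cid", 45)]

def column_index(schema, name):
    """Resolve a column name to its tuple position using the schema."""
    for i, field in enumerate(schema):
        if field.name == name:
            return i
    raise KeyError(f"unknown column: {name}")

def select_where(schema, rows, select, where_col, predicate):
    """Rough analogue of: SELECT <select> FROM rows WHERE predicate(<where_col>)."""
    out = column_index(schema, select)
    cond = column_index(schema, where_col)
    return [row[out] for row in rows if predicate(row[cond])]

adults = select_where(schema, rows, "name", "age", lambda age: age >= 18)
print(adults)  # ['Ann', 'Cid']
```

Without the schema, the caller would have to know that the age sits at position 1 of each tuple; with it, the same request can be phrased the way a SQL query would phrase it.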

SchemaRDD is a subclass of RDD. When a program calls the Dataset API, a new SchemaRDD object is created, and the logical plan of the new object is formed by adding a new logical operation node on top of the original logical plan tree. As with RDDs, operations of the Dataset API fall into two types: Transformation and Action.

APIs for relational operations, such as selection, filtering, and joins, are of the Transformation type.

Operations that output data are of the Action type. As with RDDs, a Spark job is triggered and submitted to the cluster for execution only when an Action is called.
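The two behaviours described above (each call wraps the previous plan in a new node, and nothing runs until an Action is called) can be sketched with a small toy model in plain Python. This is an illustration under stated assumptions, not Spark's implementation: `Plan`, `ToyDataset`, `from_rows`, and the `executions` counter are hypothetical names, and the "job" here is just a local function call.

```python
class Plan:
    """One node of a toy logical plan tree; nodes are never modified after creation."""
    def __init__(self, op, child=None, arg=None):
        self.op, self.child, self.arg = op, child, arg

    def describe(self):
        """Render the chain of nodes from the source up to this node."""
        prefix = self.child.describe() + " -> " if self.child else ""
        return prefix + self.op

class ToyDataset:
    executions = 0  # counts how many times a plan was actually run

    def __init__(self, plan):
        self.plan = plan

    @classmethod
    def from_rows(cls, rows):
        return cls(Plan("source", arg=list(rows)))

    # Transformations: return a NEW object whose plan wraps the old plan.
    def filter(self, predicate):
        return ToyDataset(Plan("filter", self.plan, predicate))

    def select(self, func):
        return ToyDataset(Plan("select", self.plan, func))

    # Action: the only place where the plan is evaluated.
    def collect(self):
        ToyDataset.executions += 1
        return self._run(self.plan)

    @staticmethod
    def _run(node):
        if node.op == "source":
            return list(node.arg)
        rows = ToyDataset._run(node.child)
        if node.op == "filter":
            return [r for r in rows if node.arg(r)]
        return [node.arg(r) for r in rows]  # "select"

base = ToyDataset.from_rows([1, 2, 3, 4])
evens = base.filter(lambda x: x % 2 == 0)
doubled = evens.select(lambda x: x * 10)

print(doubled.plan.describe())  # source -> filter -> select
print(ToyDataset.executions)    # 0
print(doubled.collect())        # [20, 40]
print(ToyDataset.executions)    # 1
```

Note that `base` and `evens` keep their own, shorter plans after `doubled` is built, which mirrors the description of a new object being created for every API call.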
