- Machine Learning with Spark (Second Edition)
- Rajdeep Dua, Manpreet Singh Ghotra, Nick Pentreath
- 214 words
- 2021-07-09 21:07:41
SchemaRDD
SchemaRDD is a combination of an RDD and schema information. It also offers a rich, easy-to-use API (the DataSet API). SchemaRDD is no longer used in Spark 2.0: it was renamed DataFrame in Spark 1.3, and its role is now filled by the DataFrame and Dataset APIs.
A schema describes how structured data is logically organized. Once the schema information is available, the SQL engine can provide structured query capabilities over the corresponding data. The DataSet API takes the place of the Spark SQL parser: rather than parsing a SQL string, it builds the initial logical plan tree directly from program calls. Subsequent processing steps reuse Spark SQL's core logic, so the DataSet API's processing can safely be treated as equivalent to that of the corresponding SQL query.
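To make the role of the schema concrete, here is a minimal plain-Python sketch (not Spark code). `Schema` and `select_where` are hypothetical names for illustration. The rows are bare tuples with no column names attached; only the schema lets a query refer to columns by name, which is roughly what an analyzer does when it resolves column references before any data is read.

```python
from dataclasses import dataclass


# Hypothetical toy schema: an ordered tuple of (column name, Python type) pairs.
# It plays the role of the "schema" half of a SchemaRDD.
@dataclass(frozen=True)
class Schema:
    fields: tuple  # e.g. (("name", str), ("age", int))

    def index_of(self, column):
        """Resolve a column name to its position, as an analyzer would."""
        for i, (name, _) in enumerate(self.fields):
            if name == column:
                return i
        raise KeyError(f"cannot resolve column '{column}'")

    def type_of(self, column):
        """Look up the declared type of a column by name."""
        return self.fields[self.index_of(column)][1]


def select_where(rows, schema, columns, column, predicate):
    """Toy equivalent of: SELECT <columns> FROM t WHERE predicate(<column>)."""
    # Resolve names to positions once, using only the schema.
    out_idx = [schema.index_of(c) for c in columns]
    cond_idx = schema.index_of(column)
    # Then scan the raw tuples using those positions.
    return [tuple(r[i] for i in out_idx) for r in rows if predicate(r[cond_idx])]


people_schema = Schema((("name", str), ("age", int)))
people = [("Alice", 34), ("Bob", 19), ("Carol", 27)]  # raw rows, no names

print(select_where(people, people_schema, ["name"], "age", lambda a: a > 25))
# [('Alice',), ('Carol',)]
```

Asking for a column the schema does not contain, such as `people_schema.index_of("salary")`, raises `KeyError`, loosely analogous to the analysis error Spark reports for an unresolvable column.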
SchemaRDD is a subclass of RDD. Each time a program calls the DataSet API, a new SchemaRDD object is created, and its logical plan is formed by adding a new logical operator node on top of the original logical plan tree. Like RDD operations, DataSet API operations come in two types: transformations and actions.
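The plan-building step can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's implementation: `PlanNode`, `ToySchemaRDD`, `where`, and `select` are hypothetical names, and the operator labels (`Relation`, `Filter`, `Project`) are only loosely modeled on the names Spark prints in `explain()` output. Each call leaves the original object untouched and returns a new one whose plan has the previous plan as its child.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical logical-plan node: operator name, a description, and one child.
@dataclass(frozen=True)
class PlanNode:
    op: str
    detail: str
    child: Optional["PlanNode"] = None

    def tree_string(self, depth=0):
        """Render the plan top-down, indenting each child one level deeper."""
        line = "  " * depth + f"{self.op} {self.detail}"
        if self.child is None:
            return line
        return line + "\n" + self.child.tree_string(depth + 1)


class ToySchemaRDD:
    """Toy stand-in for SchemaRDD: it only carries a logical plan."""

    def __init__(self, plan):
        self.plan = plan

    def _with(self, op, detail):
        # Each API call returns a NEW object; the old plan becomes the child.
        return ToySchemaRDD(PlanNode(op, detail, self.plan))

    def where(self, condition):
        return self._with("Filter", condition)

    def select(self, *columns):
        return self._with("Project", ", ".join(columns))


people = ToySchemaRDD(PlanNode("Relation", "people[name, age]"))
adults = people.where("age > 25")
names = adults.select("name")

print(names.plan.tree_string())
# Project name
#   Filter age > 25
#     Relation people[name, age]
```

Because every call builds a new object, `people` and `adults` still describe their own shorter plans; the tree for `names` simply reuses them as children.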
APIs for relational operations (such as selection, filtering, and joins) are transformations.
Operations that produce output, such as returning results to the driver or writing to a data source, are actions. As with RDDs, a Spark job is triggered and submitted to the cluster for execution only when an action is called.