- Mastering Apache Spark 2.x (Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:31
RDDs versus DataFrames versus Datasets
To be clear, we discourage you from using RDDs unless there is a strong reason to do so, for the following reasons:
- In terms of abstraction level, RDDs are the equivalent of assembler or machine code in systems programming
- RDDs express how to do something rather than what is to be achieved, which leaves no room for optimizers such as Catalyst
- RDDs have proprietary syntax; SQL is more widely known
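To make the "how versus what" distinction concrete, here is a minimal sketch, assuming Spark 2.x or later with Scala and a local master. It computes the same per-department average salary three ways: with RDD operations, with the DataFrame API, and with SQL. The object name, sample data, and column names are hypothetical and chosen only for illustration. In the RDD version, every step (pairing, summing, dividing) is spelled out in lambdas that Spark cannot look inside; in the DataFrame and SQL versions, the query is declared and Catalyst is free to choose the physical plan.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.avg

object RddVersusDataFrame {
  // Hypothetical sample data: (department, salary)
  val rows: Seq[(String, Double)] = Seq(("sales", 100.0), ("sales", 200.0), ("it", 300.0))

  // RDD style: we spell out HOW to compute the average, step by step.
  // The lambdas are opaque to Spark, so it cannot reorder or optimize them.
  def averageWithRdd(spark: SparkSession): Map[String, Double] =
    spark.sparkContext
      .parallelize(rows)
      .mapValues(salary => (salary, 1L))                 // (dept, (salary, 1))
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2)) // (dept, (sum, count))
      .mapValues(sumAndCount => sumAndCount._1 / sumAndCount._2)
      .collect()
      .toMap

  // DataFrame style: we declare WHAT we want; Catalyst plans the execution.
  def averageWithDataFrame(spark: SparkSession): DataFrame = {
    import spark.implicits._
    rows.toDF("dept", "salary")
      .groupBy("dept")
      .agg(avg("salary").as("avg_salary"))
  }

  // The same declarative query, written in SQL.
  def averageWithSql(spark: SparkSession): DataFrame = {
    import spark.implicits._
    rows.toDF("dept", "salary").createOrReplaceTempView("employees")
    spark.sql("SELECT dept, AVG(salary) AS avg_salary FROM employees GROUP BY dept")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-vs-dataframe")
      .master("local[*]")
      .getOrCreate()

    println(averageWithRdd(spark))
    val df = averageWithDataFrame(spark)
    df.explain() // prints the physical plan Catalyst chose for the declarative query
    df.show()
    averageWithSql(spark).show()
    spark.stop()
  }
}
```

The exact plan printed by `explain()` depends on the Spark version, but it is produced by the optimizer rather than written by hand, which is the point the list above makes.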
Whenever possible, use Datasets, because their static typing makes them faster. The Dataset API is only available in statically typed languages such as Java and Scala, so if you use one of those, you are fine. Otherwise, you have to stick with DataFrames.
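The following minimal sketch, assuming Spark 2.x or later with Scala, contrasts the untyped DataFrame API with a typed `Dataset`. The `Employee` case class, the object name, and the sample rows are hypothetical. It only illustrates what static typing buys you at compile time (field names checked by the Scala compiler, `collect()` returning domain objects); it does not measure or demonstrate any speed difference.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type, defined at top level so Spark can derive an Encoder for it.
case class Employee(name: String, dept: String, salary: Double)

object DatasetVersusDataFrame {
  // DataFrame (an alias for Dataset[Row]): columns are referenced by name and
  // resolved only at runtime. A typo such as col("salray") compiles, but fails
  // with an AnalysisException when Spark analyzes the query.
  def highEarnersDf(df: DataFrame): DataFrame =
    df.filter(col("salary") > 150.0)

  // Dataset[Employee]: fields are accessed on a Scala object, so a typo such as
  // e.salray is rejected by the compiler.
  def highEarnersDs(ds: Dataset[Employee]): Dataset[Employee] =
    ds.filter(e => e.salary > 150.0)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-vs-dataframe")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df: DataFrame = Seq(("alice", "sales", 100.0), ("bob", "it", 300.0))
      .toDF("name", "dept", "salary")
    val ds: Dataset[Employee] = df.as[Employee] // DataFrame -> typed Dataset

    highEarnersDf(df).show()
    // collect() on a Dataset[Employee] returns Array[Employee], not Array[Row]
    highEarnersDs(ds).collect().foreach(e => println(s"${e.name}: ${e.salary}"))
    spark.stop()
  }
}
```

Because `Employee` is an ordinary Scala class, the same conversion (`as[Employee]`) is not available from Python or R, which is why those languages are limited to DataFrames.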