- Mastering Apache Spark 2.x (Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:31
RDDs versus DataFrames versus Datasets
To be clear: unless you have a strong reason to use RDDs, we discourage it, for the following reasons:
- In terms of abstraction level, RDDs are to Spark what assembler or machine code is to system programming
- RDDs express how to do something and not what is to be achieved, leaving no room for optimizers
- RDDs have their own Spark-specific syntax, whereas SQL is far more widely known
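To make the "how" versus "what" distinction concrete, here is a minimal sketch that computes the same per-group average three ways. The object name `RddVsDataFrame`, the column names `dept` and `salary`, the view name `salaries`, and the sample rows are all made up for illustration, and a local master is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.avg

object RddVsDataFrame {

  // RDD version: every step of the computation is spelled out by hand
  // (pair each value with a count, sum both per key, then divide).
  def rddAverages(spark: SparkSession, rows: Seq[(String, Double)]): Map[String, Double] =
    spark.sparkContext.parallelize(rows)
      .mapValues(salary => (salary, 1))
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .mapValues { case (sum, count) => sum / count }
      .collect()
      .toMap

  // DataFrame version: only the desired result is declared;
  // Spark's optimizer decides how to execute it.
  def dfAverages(spark: SparkSession, rows: Seq[(String, Double)]): DataFrame = {
    import spark.implicits._
    rows.toDF("dept", "salary")
      .groupBy("dept")
      .agg(avg("salary").as("avg_salary"))
  }

  // SQL version: the same declarative query in plain SQL.
  def sqlAverages(spark: SparkSession, rows: Seq[(String, Double)]): DataFrame = {
    import spark.implicits._
    rows.toDF("dept", "salary").createOrReplaceTempView("salaries")
    spark.sql("SELECT dept, AVG(salary) AS avg_salary FROM salaries GROUP BY dept")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-vs-dataframe")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()

    // Hypothetical sample data: (department, salary)
    val rows = Seq(("eng", 100.0), ("eng", 120.0), ("ops", 80.0))

    println(rddAverages(spark, rows))
    dfAverages(spark, rows).explain() // prints the physical plan the optimizer chose
    dfAverages(spark, rows).show()
    sqlAverages(spark, rows).show()

    spark.stop()
  }
}
```

Calling `explain()` on the DataFrame result prints the plan that Spark generated; the RDD version has no comparable plan to inspect, because the execution steps are exactly the ones written in the code.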
Whenever possible, use Datasets, because their static typing catches type errors at compile time and can make them faster. The typed Dataset API is available only in statically typed languages such as Java and Scala. In dynamically typed languages such as Python or R, you have to stick with DataFrames.
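The difference static typing makes can be sketched as follows. The case class `Salary`, the object name `TypedVsUntyped`, and the sample data are hypothetical, and a local master is assumed.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type; Spark derives an encoder for it via spark.implicits._
case class Salary(dept: String, salary: Double)

object TypedVsUntyped {

  // Dataset: the field name and its type are checked by the Scala compiler.
  // Misspelling `salary` here would be a compile error.
  def highEarners(ds: Dataset[Salary], threshold: Double): Dataset[Salary] =
    ds.filter(_.salary > threshold)

  // DataFrame: the column is looked up by name when the query is analyzed.
  // Misspelling "salary" compiles fine and fails only at runtime.
  def highEarnersUntyped(df: DataFrame, threshold: Double): DataFrame =
    df.filter(df("salary") > threshold)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("typed-vs-untyped")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(Salary("eng", 100.0), Salary("eng", 120.0), Salary("ops", 80.0)).toDS()

    highEarners(ds, 90.0).show()
    highEarnersUntyped(ds.toDF(), 90.0).show()

    spark.stop()
  }
}
```

In Scala, `DataFrame` is simply an alias for `Dataset[Row]`, so moving between the two with `toDF()` and `as[Salary]` changes only how much the compiler knows about each record.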