- Mastering Apache Spark 2.x(Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:30
Applying SQL table joins
To examine table joins, we have created some additional test data based on a banking scenario. There is an account table stored in account.json and a client table stored in client.json. Let's take a look at the two JSON files.
First, let's look at client.json:

Next, let's look at account.json:

As you can see, clientId in account.json refers to id in client.json. We can therefore join the two files, but first we have to load them:
// Load each JSON file into a DataFrame (val, since the references are never reassigned)
val client = spark.read.json("client.json")
val account = spark.read.json("account.json")
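As a minimal sketch of what the load step does, the following writes a small stand-in file and reads it back. The contents of the real client.json are not reproduced here, so the rows are hypothetical: only the id field comes from the text, and the name field is an assumption made for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// In spark-shell a session named `spark` already exists; getOrCreate reuses it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("json-load-sketch")
  .getOrCreate()

// Hypothetical rows: `id` is named in the text, `name` is an assumed field.
val lines = java.util.Arrays.asList(
  """{"id": 1, "name": "Alice"}""",
  """{"id": 2, "name": "Bob"}"""
)

// Write one JSON object per line, which is the layout spark.read.json expects by default.
val path = Files.createTempFile("client", ".json")
Files.write(path, lines)

// Spark infers the schema from the JSON records.
val clientSketch = spark.read.json(path.toUri.toString)
clientSketch.printSchema()
```

If a file is pretty-printed across several lines instead, the multiLine reader option (available from Spark 2.2) is the usual way to load it.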
Then we register these two DataFrames as temporary views, so that they can be queried as tables in SQL:
client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")
Let's query these individually, client first:

Then follow it up with account:
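The two individual queries could look like the following sketch. It uses small in-memory DataFrames as hypothetical stand-ins for client.json and account.json: id and clientId are named in the text, while name and balance are assumed column names, and all values are invented.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("single-table-query-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for the two JSON files (assumed columns and values).
val client = Seq((1L, "Alice"), (2L, "Bob")).toDF("id", "name")
val account = Seq((10L, 1L, 100.0), (11L, 1L, 250.0), (12L, 2L, 75.0))
  .toDF("id", "clientId", "balance")

// Register both DataFrames so they can be referenced by name in SQL.
client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")

// Query each view on its own.
val clients = spark.sql("SELECT * FROM client")
val accounts = spark.sql("SELECT * FROM account")

clients.show()
accounts.show()
```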

Now we can join the two tables:
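A join on the client identifier might be written as in the sketch below. As before, the data is a hypothetical stand-in for the JSON files; only id and clientId are taken from the text, and the name and balance columns are assumptions.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("join-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for client.json and account.json.
val client = Seq((1L, "Alice"), (2L, "Bob")).toDF("id", "name")
val account = Seq((10L, 1L, 100.0), (11L, 1L, 250.0), (12L, 2L, 75.0))
  .toDF("id", "clientId", "balance")

client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")

// Inner join: each account row is paired with the client it belongs to.
val joined = spark.sql(
  """SELECT c.id AS clientId, c.name, a.id AS accountId, a.balance
    |FROM client c
    |INNER JOIN account a ON c.id = a.clientId""".stripMargin)

joined.show()
```

An inner join keeps only clients that have at least one account; a LEFT OUTER JOIN would also keep clients without accounts, with null account columns.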

Finally, let's aggregate the total amount of money that each client holds across all of their accounts:
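One way to express such an aggregation is to group the joined rows by client and sum the account amounts. This is a sketch under assumptions: the amount column is called balance here, which is a hypothetical name, and the rows are invented stand-ins for the JSON files.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("aggregation-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for client.json and account.json.
val client = Seq((1L, "Alice"), (2L, "Bob")).toDF("id", "name")
val account = Seq((10L, 1L, 100.0), (11L, 1L, 250.0), (12L, 2L, 75.0))
  .toDF("id", "clientId", "balance")

client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")

// Join, then sum the balances of all accounts belonging to each client.
val totals = spark.sql(
  """SELECT c.id, c.name, SUM(a.balance) AS total
    |FROM client c
    |INNER JOIN account a ON c.id = a.clientId
    |GROUP BY c.id, c.name
    |ORDER BY c.id""".stripMargin)

totals.show()
```

Every non-aggregated column in the SELECT list also appears in GROUP BY, which Spark SQL requires for this kind of query.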
