- Mastering Apache Spark 2.x (Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:30
Applying SQL table joins
To examine table joins, we have created some additional test data based on a banking scenario. We have an account table called account.json and a customer table called client.json. Let's take a look at the two JSON files.
First, let's look at client.json:

Next, let's look at account.json:

As you can see, clientId in account.json refers to id in client.json, so the two files can be joined. Before we can do this, we have to load them:
val client = spark.read.json("client.json")
val account = spark.read.json("account.json")
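To try the load step without the book's data files, a minimal sketch like the one below writes two small stand-in files and reads them back. Only the id and clientId fields come from the text; the name and balance fields, all values, and the temporary directory are assumptions made for illustration. By default, spark.read.json expects one JSON object per line, which is why each record sits on its own line.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session for experimenting; getOrCreate() reuses a session if one already exists.
val spark = SparkSession.builder().appName("load-sketch").master("local[*]").getOrCreate()

// Hypothetical records: only `id` and `clientId` are named in the text.
val clientLines = Seq(
  """{"id": 1, "name": "Alice"}""",
  """{"id": 2, "name": "Bob"}"""
)
val accountLines = Seq(
  """{"id": 10, "clientId": 1, "balance": 100.0}""",
  """{"id": 11, "clientId": 1, "balance": 250.0}""",
  """{"id": 12, "clientId": 2, "balance": 75.0}"""
)

// Write the stand-in files to a temporary directory, one JSON object per line.
val dir = Files.createTempDirectory("spark-join-sketch")
val clientPath = dir.resolve("client.json")
val accountPath = dir.resolve("account.json")
Files.write(clientPath, clientLines.mkString("\n").getBytes(StandardCharsets.UTF_8))
Files.write(accountPath, accountLines.mkString("\n").getBytes(StandardCharsets.UTF_8))

// Load both files as DataFrames; the schema is inferred from the JSON records.
val client = spark.read.json(clientPath.toUri.toString)
val account = spark.read.json(accountPath.toUri.toString)

// Inspect the inferred schemas before joining.
client.printSchema()
account.printSchema()
```

Checking the inferred schema at this point is a cheap way to confirm that the join keys have compatible types in both DataFrames.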
Then we register these two DataFrames as temporary views so that they can be queried with SQL:
client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")
Let's query these individually, client first:

Then follow it up with account:
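Both individual queries can be sketched as follows. The in-memory rows stand in for the JSON files; the name and balance columns and every value are hypothetical, and only id and clientId are taken from the text.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("query-sketch").master("local[*]").getOrCreate()

// Hypothetical stand-in data for client.json and account.json.
val client = spark.createDataFrame(Seq((1L, "Alice"), (2L, "Bob"))).toDF("id", "name")
val account = spark.createDataFrame(Seq((10L, 1L, 100.0), (11L, 1L, 250.0), (12L, 2L, 75.0))).toDF("id", "clientId", "balance")

// Register the DataFrames under the names used in the SQL statements.
client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")

// Query each view on its own and print the rows to the console.
val clientRows = spark.sql("SELECT * FROM client")
val accountRows = spark.sql("SELECT * FROM account")
clientRows.show()
accountRows.show()
```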

Now we can join the two tables:
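A join of this kind might look like the sketch below, which matches account.clientId against client.id. The sample rows, the name and balance columns, and the output aliases are assumptions made for illustration, not the book's data.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("join-sketch").master("local[*]").getOrCreate()

// Hypothetical stand-in data; the third client deliberately has no account.
val client = spark.createDataFrame(Seq((1L, "Alice"), (2L, "Bob"), (3L, "Carol"))).toDF("id", "name")
val account = spark.createDataFrame(Seq((10L, 1L, 100.0), (11L, 1L, 250.0), (12L, 2L, 75.0))).toDF("id", "clientId", "balance")
client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")

// Inner join: one output row per account, enriched with its owner's data.
val joined = spark.sql("""
  SELECT c.id AS clientId, c.name, a.id AS accountId, a.balance
  FROM client c
  JOIN account a ON a.clientId = c.id
""")
joined.show()
```

Because this is an inner join, a client without any account does not appear in the result; a LEFT JOIN from client would keep such clients with null account columns.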

Finally, let's aggregate the amount of money that each client holds across all of their accounts:
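One way to express such an aggregation is to group the joined rows by client and sum a balance column, as in this sketch. The column name balance, the name column, and the sample values are hypothetical; the text only states that the amounts are totalled per client.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("aggregate-sketch").master("local[*]").getOrCreate()

// Hypothetical stand-in data for client.json and account.json.
val client = spark.createDataFrame(Seq((1L, "Alice"), (2L, "Bob"))).toDF("id", "name")
val account = spark.createDataFrame(Seq((10L, 1L, 100.0), (11L, 1L, 250.0), (12L, 2L, 75.0))).toDF("id", "clientId", "balance")
client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")

// Join, then collapse each client's accounts into a single total.
val totals = spark.sql("""
  SELECT c.id, c.name, SUM(a.balance) AS totalBalance
  FROM client c
  JOIN account a ON a.clientId = c.id
  GROUP BY c.id, c.name
  ORDER BY c.id
""")
totals.show()
```

Every non-aggregated column in the SELECT list also appears in GROUP BY, which Spark SQL requires for a query of this shape.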
