
Applying SQL table joins

In order to examine table joins, we have created some additional test data based on a banking scenario: an account table stored in account.json and a customer table stored in client.json. Let's take a look at the two JSON files.

First, let's look at client.json:

Next, let's look at account.json:
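The file contents appear as screenshots in the original and are not reproduced here. Since Spark's JSON reader expects one JSON object per line (JSON Lines format), a minimal sketch of what the two files might contain is shown below; only the id and clientId fields are given in the text, so the name and amount fields are assumptions. First, client.json:

```json
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
```

And account.json, where each row's clientId references an id in client.json:

```json
{"id": 1, "clientId": 1, "amount": 1000}
{"id": 2, "clientId": 1, "amount": 250}
{"id": 3, "clientId": 2, "amount": 500}
```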

As you can see, clientId in account.json refers to id in client.json. Therefore, we are able to join the two files, but before we can do this, we have to load them:

val client = spark.read.json("client.json")
val account = spark.read.json("account.json")

Then we register these two DataFrames as temporary tables:

client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")

Let's query these individually, client first:
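The query result is shown as a screenshot in the original; the query itself can be sketched as follows, using the temporary view registered above:

```scala
// Query the client view and print its rows
spark.sql("SELECT * FROM client").show()
```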

Then follow it up with account:
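Again, the result appears as a screenshot in the original; the query is a sketch against the account view registered above:

```scala
// Query the account view and print its rows
spark.sql("SELECT * FROM account").show()
```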

Now we can join the two tables:
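The join statement is not reproduced here; a sketch using the clientId/id relationship described above might look like the following (the name and amount columns are assumptions, not given in the text):

```scala
// Inner join: match each account row to its owning client
// via account.clientId = client.id
spark.sql("""
  SELECT c.id, c.name, a.amount
  FROM client c
  JOIN account a ON a.clientId = c.id
""").show()
```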

Finally, let's calculate an aggregation: the total amount of money that each client holds across all of their accounts:
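A sketch of such an aggregation, grouping the joined rows by client and summing a hypothetical amount column:

```scala
// Total balance per client across all of their accounts
spark.sql("""
  SELECT c.id, c.name, SUM(a.amount) AS total_amount
  FROM client c
  JOIN account a ON a.clientId = c.id
  GROUP BY c.id, c.name
""").show()
```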
