
Applying SQL table joins

To examine table joins, we have created some additional test data based on a banking scenario. There is an account table stored in account.json and a customer table stored in client.json. Let's take a look at the two JSON files.

First, let's look at client.json:

Next, let's look at account.json:

As you can see, the clientId field of account.json refers to the id field of client.json, so the two files can be joined on these fields. Before we can do this, we have to load them:

val client = spark.read.json("client.json")
val account = spark.read.json("account.json")
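
Note that spark.read.json expects, by default, one complete JSON object per line. If the files were instead pretty-printed across several lines, a variant along the following lines might be needed. This is only a sketch, assuming Spark 2.2 or later (where the multiLine option is available) and the spark session provided by spark-shell:

```scala
// Assumption: client.json and account.json contain multi-line (pretty-printed) JSON.
// The multiLine option makes the reader parse each file as a whole
// instead of treating every line as a separate record.
val client = spark.read.option("multiLine", "true").json("client.json")
val account = spark.read.option("multiLine", "true").json("account.json")

// Print the inferred schemas to check that the fields were parsed as expected.
client.printSchema()
account.printSchema()
```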

Then we register these two DataFrames as temporary views, so that they can be queried like tables with SQL:

client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")

Let's query these individually, client first:
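
A minimal sketch of such a query, assuming the spark session of spark-shell and the client view registered above:

```scala
// Select every column of the client view and print up to 20 rows to the console.
spark.sql("SELECT * FROM client").show()
```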

Then follow it up with account:
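
The same kind of query could be run against the account view. Again, this is only a sketch under the assumption that the view was registered as shown earlier:

```scala
// Select every column of the account view and print up to 20 rows to the console.
spark.sql("SELECT * FROM account").show()
```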

Now we can join the two tables:
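
One way to express this join is sketched below. It relies only on the two fields named earlier, id in client and clientId in account, and assumes both views are registered in the current spark session:

```scala
// Inner join: keep only the rows where an account's clientId matches a client's id.
// SELECT * returns all columns of both views side by side.
val joined = spark.sql(
  """SELECT *
    |FROM client
    |JOIN account ON client.id = account.clientId""".stripMargin)

joined.show()
```

Because this is an inner join, clients without any account (and accounts pointing to a missing client) would not appear in the result; a LEFT JOIN could be used instead if such clients should be kept.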

Finally, let's calculate the total amount of money that each client holds across all of their accounts:
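
Such an aggregation might look like the following sketch. The column name balance is hypothetical, since the account fields are not listed here; replace it with whichever field of account.json holds the amount. The alias total is likewise just an illustrative name:

```scala
// Assumption: account has a numeric column named "balance" (hypothetical name).
// Join each client to their accounts, then sum the balances per client id.
val totals = spark.sql(
  """SELECT c.id, SUM(a.balance) AS total
    |FROM client c
    |JOIN account a ON c.id = a.clientId
    |GROUP BY c.id""".stripMargin)

totals.show()
```

Grouping by c.id produces one row per client, and SUM adds up the amounts of all accounts belonging to that client.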
