
Applying SQL table joins

To examine table joins, we have created some additional test data based on a banking scenario: an account table stored in account.json and a customer table stored in client.json. Let's take a look at the two JSON files.

First, let's look at client.json:
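The file contents are not reproduced here, but since spark.read.json expects newline-delimited JSON, a minimal client.json could look like the following sketch (the id field is implied by the join described below; the name field is purely illustrative):

{"id": 1, "name": "Max"}
{"id": 2, "name": "Erika"}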

Next, let's look at account.json:
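Again purely as an illustrative sketch, account.json could contain one record per account, where clientId references the owning client (this is the field the join below relies on) and accountId and amount are hypothetical names for the account number and balance:

{"accountId": 1, "clientId": 1, "amount": 1500.0}
{"accountId": 2, "clientId": 1, "amount": 450.0}
{"accountId": 3, "clientId": 2, "amount": 8000.0}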

As you can see, the clientId field in account.json refers to the id field in client.json. This allows us to join the two files, but before we can do so, we have to load them:

val client = spark.read.json("client.json")
val account = spark.read.json("account.json")

Then we register these two DataFrames as temporary views:

client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")

Let's query these individually, client first:
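A plain SELECT on the temporary view does the job; the columns shown in the output depend on the actual contents of client.json:

spark.sql("select * from client").show()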

Then follow it up with account:
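And the corresponding query for the account view:

spark.sql("select * from account").show()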

Now we can join the two tables:
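One way to express the join in Spark SQL, using the hypothetical column names introduced above (id, name, clientId, accountId, amount), is the following sketch:

spark.sql("""
  select c.id, c.name, a.accountId, a.amount
  from client c
  join account a on c.id = a.clientId
""").show()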

Finally, let's aggregate the amount of money that every client holds across all of their accounts:
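Sticking with the same hypothetical column names, such an aggregation can group the joined data by client and sum the account balances:

spark.sql("""
  select c.id, c.name, sum(a.amount) as totalAmount
  from client c
  join account a on c.id = a.clientId
  group by c.id, c.name
""").show()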
