- Mastering Apache Spark 2.x(Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:31
User-defined functions
To create a user-defined function (UDF) in Scala, we first need to look at the data in the previous Dataset. We will use the age property of the client entries in the previously introduced client.json. We plan to create a UDF that maps the age column onto a small set of age ranges. This is useful when the data feeds a machine learning model, because a smaller number of distinct values is sometimes easier to work with. This process is also called binning or categorization. This is the JSON file with the age property added:

Now let's define a Scala enumeration that converts ages into age range codes. If we use this enumeration across all our relations, we can ensure that these ranges are coded consistently:
object AgeRange extends Enumeration {
  // One value per decade, plus a catch-all value
  val Zero, Ten, Twenty, Thirty, Forty, Fifty, Sixty, Seventy, Eighty, Ninety, HundredPlus = Value

  // Map an age to its decade. `until` excludes the upper bound,
  // so each range starts exactly where the previous one ends.
  def getAgeRange(age: Integer) = {
    age match {
      case age if 0 until 10 contains age => Zero
      case age if 10 until 20 contains age => Ten
      case age if 20 until 30 contains age => Twenty
      case age if 30 until 40 contains age => Thirty
      case age if 40 until 50 contains age => Forty
      case age if 50 until 60 contains age => Fifty
      case age if 60 until 70 contains age => Sixty
      case age if 70 until 80 contains age => Seventy
      case age if 80 until 90 contains age => Eighty
      case age if 90 until 100 contains age => Ninety
      // Ages of 100 and above (and any negative value) end up here
      case _ => HundredPlus
    }
  }

  // String form of the range, convenient for use from SQL
  def asString(age: Integer) = getAgeRange(age).toString
}
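The decade boundaries behind this kind of binning can also be written as plain integer arithmetic, which is a handy cross-check when reviewing the ranges. The following is only an illustrative sketch and not part of the original listing; `AgeBins` and `binStart` are hypothetical names, and treating negative ages as invalid is an assumption made here:

```scala
// Hypothetical helper for illustration only; not part of the book's listing.
object AgeBins {
  // Lower bound of the ten-year bin an age falls into.
  // Everything from 100 upwards is collapsed into the bin starting at 100.
  def binStart(age: Int): Int = {
    require(age >= 0, "age must be non-negative") // assumption: negative ages are invalid
    math.min(age / 10, 10) * 10
  }
}

// Print the bin for a few boundary values
Seq(0, 9, 10, 25, 99, 100, 123).foreach { age =>
  println(s"$age -> ${AgeBins.binStart(age)}")
}
```

Integer division discards the remainder, so every age within the same decade yields the same lower bound, and each multiple of ten opens a new bin.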
We can now register the asString function of AgeRange using SparkSession in Scala so that it can be used in a SQL statement:
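A minimal sketch of such a registration is shown here. It assumes that the AgeRange object defined above is already in scope (for example, pasted into the same spark-shell session). The local SparkSession setup and the application name are assumptions for experimentation; in spark-shell a session named `spark` already exists and that step can be skipped:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for trying things out.
// In spark-shell, `spark` is predefined and this line is not needed.
val spark = SparkSession.builder()
  .appName("AgeRangeUdf") // hypothetical application name
  .master("local[*]")
  .getOrCreate()

// Register AgeRange.asString under the SQL name toAgeRange.
// The lambda takes a primitive Int and relies on Scala's implicit
// Int-to-Integer conversion when calling asString.
spark.udf.register("toAgeRange", (age: Int) => AgeRange.asString(age))
```

Wrapping the call in a lambda with an `Int` parameter keeps the SQL-facing signature simple, while the enumeration itself stays unchanged.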

The newly registered function, toAgeRange, can now be used in a SELECT statement. It takes age as a parameter and returns the age range as a string:
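A query along these lines might look as follows. This is a sketch under stated assumptions rather than the book's own listing: the rows, the column names and the view name `client` are hypothetical stand-ins for the contents of client.json, and the AgeRange object defined above is assumed to be in scope. The registration is repeated so that the snippet can be run on its own:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .appName("AgeRangeQuery") // hypothetical application name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Registering again simply replaces any earlier registration of the same name
spark.udf.register("toAgeRange", (age: Int) => AgeRange.asString(age))

// Hypothetical sample rows standing in for the client data
val clients = Seq(("Alice", 25), ("Bob", 34), ("Carol", 67)).toDF("name", "age")
clients.createOrReplaceTempView("client")

// Call the UDF like any built-in SQL function
val withRanges = spark.sql(
  "SELECT name, age, toAgeRange(age) AS ageRange FROM client")
withRanges.show()
```

The `ageRange` column holds the string form of the enumeration value, so it can be grouped, filtered, or joined on like any other string column.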
