- Learning Apache Spark 2
- Muhammad Asif Abbasi
- 2021-07-09 18:45:58
Passing functions to Spark (Scala)
As you saw in the previous example, passing functions is a critical feature of Spark. From the user's point of view, you pass the function in your driver program, and Spark works out where the data partitions live across the cluster, ships the function to them, and runs it in parallel. The exact syntax for passing functions differs by programming language. Since Spark is written in Scala, we'll discuss Scala first.
In Scala, the recommended ways to pass functions to the Spark framework are as follows:
- Anonymous functions
- Static singleton methods
Anonymous functions
Anonymous functions are used for short pieces of code. They are also referred to as lambda expressions, and are a concise and elegant feature of the language. They are called anonymous because they are not bound to a name; they are defined inline, exactly where they are used. The name of the input parameter is also arbitrary, so you can call it anything and the result is the same.
For example, the following code examples would produce the same output:
val words = dataFile.map(line => line.split(" "))
val words = dataFile.map(anyline => anyline.split(" "))
val words = dataFile.map(_.split(" "))

Figure 2.11: Passing anonymous functions to Spark in Scala
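The equivalence of the three forms can be checked with a minimal sketch. To keep it runnable without a cluster, a local Scala List stands in for the RDD dataFile (an assumption for illustration; RDD.map takes the same kind of function argument), and the sample lines are hypothetical.

```scala
object AnonymousFunctionForms {
  // Hypothetical sample data; a local List stands in for the RDD `dataFile`.
  val dataFile: List[String] = List("to be or", "not to be")

  val words1 = dataFile.map(line => line.split(" "))       // parameter named `line`
  val words2 = dataFile.map(anyline => anyline.split(" ")) // same function, different parameter name
  val words3 = dataFile.map(_.split(" "))                  // placeholder syntax

  // Arrays compare by reference, so convert each result to nested Lists before comparing.
  val asLists: Seq[List[List[String]]] = Seq(words1, words2, words3).map(_.map(_.toList))

  def main(args: Array[String]): Unit = {
    println(asLists.distinct.size) // 1
    println(asLists.head)          // List(List(to, be, or), List(not, to, be))
  }
}
```

The placeholder form only works when the parameter is used once in the body, because each _ stands for a separate parameter.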
Static singleton functions
While anonymous functions are handy for short snippets of code, they become hard to read and maintain when you need the framework to perform a complex data manipulation. Static singleton functions come to the rescue, with their own nuances, which we discuss in this section.
Note
In software engineering, the Singleton pattern is a design pattern that restricts instantiation of a class to one object. This is useful when exactly one object is needed to coordinate actions across the system.
Static methods belong to the class rather than to an instance of it. They usually take their input from their parameters, operate on it, and return the result. Scala has no static keyword; methods defined on a singleton object play this role.

Figure 2.12: Passing static singleton functions to Spark in Scala
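A minimal sketch of the singleton approach follows. The object TextUtils and its methods are hypothetical names chosen for illustration, and a local List again stands in for an RDD[String]; with Spark the same method references would be passed to dataFile.map.

```scala
// Hypothetical singleton object: Scala's counterpart to a class of static methods.
object TextUtils {
  // Trims the line, lower-cases it, and collapses runs of whitespace to one space.
  def normalize(line: String): String = line.trim.toLowerCase.replaceAll("\\s+", " ")

  // Splits a line on single spaces.
  def splitWords(line: String): Array[String] = line.split(" ")
}

object SingletonDemo {
  // Local stand-in for an RDD[String] such as dataFile.
  val dataFile: List[String] = List("  Hello   Spark ", "Passing FUNCTIONS")

  // The methods are passed by name; with an RDD the calls would read
  // dataFile.map(TextUtils.normalize).map(TextUtils.splitWords).
  val words: List[Array[String]] = dataFile.map(TextUtils.normalize).map(TextUtils.splitWords)

  def main(args: Array[String]): Unit = {
    // Prints "hello,spark" and then "passing,functions".
    words.foreach(ws => println(ws.mkString(",")))
  }
}
```

Keeping multi-step logic in named methods like these makes it easier to read and to unit-test outside Spark than a long inline lambda.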
Passing methods of a singleton object is the preferred approach. Technically, you can also create a class and pass a method of a class instance. For example:
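import org.apache.spark.rdd.RDD

class UtilFunctions {
  def split(inputParam: String): Array[String] = { inputParam.split(" ") }
  // map(split) refers to this.split, so the whole UtilFunctions instance travels with the function
  def operate(rdd: RDD[String]): RDD[Array[String]] = { rdd.map(split) }
}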
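You can pass a method of a class instance this way, but it has performance implications: because the method refers to the instance, the entire object is serialized and sent along with it. The class must also be serializable, otherwise Spark fails with a "Task not serializable" error.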
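Spark serializes closures with Java serialization before shipping them to executors. The sketch below uses plain java.io serialization as a stand-in for that step to show why the instance method drags its object along while a singleton method does not. It assumes Scala 2.12 or 2.13, where function literals are serializable; SplitFunctions, ClosureDemo and isSerializable are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Trimmed-down version of the UtilFunctions class above; it is not Serializable.
class UtilFunctions {
  def split(inputParam: String): Array[String] = inputParam.split(" ")

  // Same capture as rdd.map(split) inside the class: the function calls this.split,
  // so it holds a reference to the enclosing UtilFunctions instance.
  def splitter: String => Array[String] = (s: String) => split(s)
}

// Hypothetical singleton holding the same logic.
object SplitFunctions {
  def split(inputParam: String): Array[String] = inputParam.split(" ")
}

object ClosureDemo {
  // Calls the singleton method without capturing any instance.
  val fromSingleton: String => Array[String] = (s: String) => SplitFunctions.split(s)

  // Returns true if Java serialization of obj succeeds.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    println(isSerializable(new UtilFunctions().splitter)) // false
    println(isSerializable(fromSingleton))                // true
  }
}
```

Making UtilFunctions extend Serializable would let the first case succeed too, but then the whole instance, and everything it references, would be shipped with every task, which is the overhead described above.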