Scala Overview

Scala is a popular general-purpose, high-level programming language that typically runs on the Java Virtual Machine (JVM). The JVM is a time-tested platform that has proven itself in terms of stability and performance, and a large number of powerful libraries and frameworks have been built for it in Java. In data analysis, for instance, many Java libraries are available for handling data formats such as XML, JSON, and Avro. Because Scala interoperates with these well-tested libraries, a Scala programmer can reuse them rather than rewrite them, which greatly increases productivity.

Data analysis and processing usually involve a large number of data transformation tasks, such as mapping from one representation to another, filtering out irrelevant data, and joining one set of data with another. Solving such problems in the object-oriented paradigm often means writing a significant amount of boilerplate code for even a fairly simple task. Data problems are often easier to solve by thinking in terms of an input and the transformations to be applied to it. Scala's functional programming model provides features that make such code concise and expressive. Spark, a popular distributed data analytics engine, is written almost entirely in Scala, and there is a strong resemblance between the Scala collection API and the Spark API.
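The three transformation tasks named above (mapping, filtering, and joining) can be sketched with the standard Scala collection API. This is a minimal illustration, not a recommended design: the `Sale` and `Customer` types and all sample values are hypothetical and were invented for this example.

```scala
object TransformSketch {
  // Hypothetical record types, invented for this illustration.
  final case class Sale(customerId: Int, amount: Double)
  final case class Customer(id: Int, name: String)

  val sales: List[Sale] =
    List(Sale(1, 120.0), Sale(2, 15.0), Sale(1, 40.0), Sale(3, 75.0))
  val customers: List[Customer] =
    List(Customer(1, "Ana"), Customer(2, "Bo"))

  // Filtering: drop sales below an (arbitrary) threshold of 50.
  val largeSales: List[Sale] = sales.filter(_.amount >= 50.0)

  // Mapping: convert each remaining sale to another representation (cents).
  val amountsInCents: List[Long] = largeSales.map(s => math.round(s.amount * 100))

  // Joining: look up each sale's customer by id; sales without a
  // matching customer are dropped, as in an inner join.
  val customerById: Map[Int, Customer] = customers.map(c => c.id -> c).toMap
  val namedSales: List[(String, Double)] =
    for {
      sale     <- largeSales
      customer <- customerById.get(sale.customerId).toList
    } yield (customer.name, sale.amount)
}
```

Each step takes an immutable input and returns a new collection, so the pipeline reads as a sequence of transformations rather than as loops that update shared state.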

Most Java libraries can be used with relative ease from Scala code, and object-oriented and functional styles of programming can be mixed freely in the same Scala code base. This combination offers a simple path to high productivity. Some of the major benefits of using Scala are as follows:

  • Most Java libraries and frameworks can be reused from Scala. Scala code is compiled into Java bytecode and runs on the JVM, so existing Java code can be called seamlessly from a Scala program. In fact, it is not uncommon to have a mix of both Java and Scala code within a single project.
  • Scala's functional constructs can be used to write code that is simple, concise, and expressive. 
  • We can still use object-oriented features where they are a better fit.
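As a small sketch of the first two points, the following example builds a plain Java collection, converts it to a Scala collection, and processes it in a functional style while calling a Java library method. It assumes Scala 2.13 or later, where the converters live in `scala.jdk.CollectionConverters`; the object name and the sample dates are hypothetical.

```scala
import java.time.LocalDate
import java.util.{ArrayList => JArrayList}
import scala.jdk.CollectionConverters._ // assumption: Scala 2.13 or later

object InteropSketch {
  // A plain Java collection, populated through its ordinary Java API.
  val javaDates: JArrayList[String] = new JArrayList[String]()
  javaDates.add("2019-01-15")
  javaDates.add("2018-11-30")
  javaDates.add("2019-03-02")

  // Convert to a Scala List, then use functional operations together
  // with Java's LocalDate.parse and getYear.
  val datesIn2019: List[LocalDate] =
    javaDates.asScala.toList
      .map(text => LocalDate.parse(text))
      .filter(_.getYear == 2019)
}
```

No wrapper code is needed to call `java.time` or `java.util` classes; the only explicit step is the `asScala` conversion between the two collection libraries.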

Many useful data libraries and frameworks are built using Scala; these are summarized later in this chapter. Apache Spark deserves a special mention, as it has become a de facto standard for performing distributed data analysis at scale. Since Spark is written almost entirely in Scala, its integration with Scala is the most complete, even though it also supports Java, Python, and R. Spark's API has been heavily influenced by Scala's collection API. Its Dataset API also leverages Scala's case classes, which significantly reduces the boilerplate code that would otherwise be necessary in Java.
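To give a rough idea of the boilerplate a case class removes, the sketch below declares a record type in one line and uses the members the compiler generates for it. Spark is deliberately left out so that the block stays self-contained; the `Reading` type and its values are hypothetical.

```scala
object CaseClassSketch {
  // Hypothetical record type. This single line gives us a constructor,
  // field accessors, equals/hashCode, toString, copy, and pattern matching.
  final case class Reading(sensor: String, value: Double)

  val a: Reading = Reading("s1", 20.5)
  val b: Reading = Reading("s1", 20.5)

  val sameContent: Boolean = a == b           // structural equality
  val rendered: String = a.toString           // readable default rendering
  val updated: Reading = a.copy(value = 21.0) // modified copy; a is unchanged

  // Pattern matching extracts the fields without explicit getters.
  def describe(r: Reading): String = r match {
    case Reading(sensor, v) if v > 25.0 => s"$sensor: high"
    case Reading(sensor, _)             => s"$sensor: normal"
  }
}
```

An equivalent Java class would traditionally need a hand-written constructor, getters, `equals`, `hashCode`, and `toString` to offer the same behavior.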

The following topics will be covered in this chapter:

  • Installing and getting started with Scala
  • Object-oriented and functional programming overview
  • Scala case classes and the collection API
  • Overview of Scala libraries for data analysis