- Hands-On Data Analysis with Scala
- Rajesh Gupta
Scala Overview
Scala is a popular general-purpose, high-level programming language that typically runs on the Java Virtual Machine (JVM). The JVM is a time-tested platform that has proven itself in terms of stability and performance. A large number of very powerful libraries and frameworks have been built in Java. For instance, in the context of data analysis, there are many Java libraries available for handling different data formats, such as XML, JSON, and Avro. Scala's interoperability with such well-tested libraries greatly increases a Scala programmer's productivity.
When it comes to data analysis and processing, there is often an abundance of data transformation tasks to perform. Examples include mapping from one representation to another, filtering out irrelevant data, and joining one dataset with another. Trying to solve such problems in a purely object-oriented style often means writing a significant amount of boilerplate code, even for fairly simple tasks. Solving data problems usually requires thinking in terms of an input and the transformations to apply to it. Scala's functional programming model provides a set of features that make such code concise and expressive. Spark, a popular distributed data analytics engine, is written almost entirely in Scala, and there is a strong resemblance between the Scala collection API and the Spark API.
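The transformation tasks mentioned above can be sketched with the standard Scala collection API. This is a minimal illustration; the `User`/`Order` types and the sample data are made up for the example:

```scala
// Illustrative record types (hypothetical, for this sketch only)
case class User(id: Int, name: String)
case class Order(userId: Int, amount: Double)

val users  = List(User(1, "Asha"), User(2, "Bo"), User(3, "Caro"))
val orders = List(Order(1, 20.0), Order(1, 15.5), Order(3, 99.9))

// Mapping: from one representation to another
val names = users.map(_.name)

// Filtering: dropping irrelevant data
val bigOrders = orders.filter(_.amount > 18.0)

// Joining: combining one dataset with another on a shared key
val joined = for {
  u <- users
  o <- orders if o.userId == u.id
} yield (u.name, o.amount)
// List(("Asha", 20.0), ("Asha", 15.5), ("Caro", 99.9))
```

Each step is a single expression with no mutable state, which is exactly the style that carries over to Spark's API.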
Most of the Java libraries can be used with relative ease from Scala code. One can easily mix object-oriented and functional styles of programming in the same Scala code base. This ability provides a very simple pathway to a great deal of productivity. Some of the major benefits of using Scala are as follows:
- Most Java libraries and frameworks can be reused from Scala. Scala code is compiled into Java bytecode and runs on the JVM, which makes it seamless to call existing Java code from a Scala program. In fact, it is not uncommon to have a mix of Java and Scala code within a single project.
- Scala's functional constructs can be used to write code that is simple, concise, and expressive.
- We can still use object-oriented features where they are a better fit.
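As a small sketch of this interoperability, the following calls a standard Java library (`java.time`) directly from Scala and then applies a functional Scala transformation to the results. The date strings are arbitrary sample values:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd")
val raw = List("2021-06-24", "2021-06-25")

// Java objects flow through Scala's collection API unchanged
val dates    = raw.map(s => LocalDate.parse(s, fmt))
val nextDays = dates.map(_.plusDays(1)).map(_.format(fmt))
// List("2021-06-25", "2021-06-26")
```

No wrappers or bindings are needed; the Java classes are used as if they were Scala classes.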
There are many useful data libraries and frameworks built in Scala; these are summarized later in this chapter. Apache Spark deserves a special mention: it has become the de facto standard for distributed data analysis at scale. Since Spark is written almost entirely in Scala, its Scala integration is the most complete, even though it also supports Java, Python, and R. Spark's API has been heavily influenced by Scala's collection API, and its Dataset API leverages Scala's case classes, significantly reducing the boilerplate code that would otherwise be necessary in Java.
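A brief illustration of why case classes reduce boilerplate: the compiler generates `equals`, `hashCode`, `toString`, `copy`, and pattern-matching support, each of which would require hand-written code in plain Java. The `Person` type here is a made-up example:

```scala
// One line replaces what would be dozens of lines of Java boilerplate
case class Person(name: String, age: Int)

val p1 = Person("Asha", 30)
val p2 = p1.copy(age = 31)          // non-destructive update, for free

// Structural equality and pattern matching, also generated
val same = p1 == Person("Asha", 30)
val label = p1 match {
  case Person(n, a) => s"$n is $a"
}
```

It is precisely these compiler-generated methods that Spark's Dataset API relies on when mapping rows to typed records.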
The following topics will be covered in this chapter:
- Installing and getting started with Scala
- Object-oriented and functional programming overview
- Scala case classes and the collection API
- Overview of Scala libraries for data analysis