- Hands-On Data Analysis with Scala
- Rajesh Gupta
Scala Overview
Scala is a popular general-purpose, high-level programming language that typically runs on the Java Virtual Machine (JVM). The JVM is a time-tested platform that has proven itself in terms of stability and performance, and a large number of powerful libraries and frameworks have been built on it in Java. For instance, in the context of data analysis, there are many Java libraries available for handling different data formats, such as XML, JSON, and Avro. Scala's interoperability with these well-tested libraries greatly increases a Scala programmer's productivity.
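As a quick, hedged illustration of this interoperability, the following sketch calls the JDK's own Java DOM parser directly from Scala code to read a tiny XML snippet. The XmlInteropExample object and the sample payload are made up for illustration and are not taken from the book:

```scala
import javax.xml.parsers.DocumentBuilderFactory
import java.io.ByteArrayInputStream

object XmlInteropExample extends App {
  // A small XML payload; in practice this would come from a file or a service
  val xml = "<records><record id=\"1\"/><record id=\"2\"/></records>"

  // The JDK's built-in Java DOM parser, used directly from Scala
  val builder  = DocumentBuilderFactory.newInstance().newDocumentBuilder()
  val document = builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))

  // Java's NodeList API works as-is from Scala code
  val records = document.getElementsByTagName("record")
  println(s"Parsed ${records.getLength} records")  // Parsed 2 records
}
```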
When it comes to data analysis and processing, there is often an abundance of data transformation tasks to perform: mapping from one representation to another, filtering out irrelevant data, and joining one dataset with another. Trying to solve such problems in a purely object-oriented style often means writing a significant amount of boilerplate code, even for fairly simple tasks. Solving data problems usually requires thinking in terms of an input and the transformations to be applied to it, and Scala's functional programming model provides a set of features that make such code concise and expressive. Spark, a popular distributed data analytics engine, is written almost entirely in Scala, and there is a strong resemblance between the Scala collections API and the Spark API.
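To make these transformation tasks concrete, here is a minimal sketch using only the standard Scala collections API; the Customer and Order types and the sample data are invented purely for illustration:

```scala
object TransformExample extends App {
  // Hypothetical records used only to demonstrate map, filter, and join-style operations
  case class Customer(id: Int, name: String)
  case class Order(customerId: Int, amount: Double)

  val customers = List(Customer(1, "Alice"), Customer(2, "Bob"))
  val orders    = List(Order(1, 30.0), Order(1, 45.5), Order(2, 12.0), Order(3, 99.0))

  // Map: index customers by id for quick lookup
  val customersById = customers.map(c => c.id -> c.name).toMap

  // Filter: drop orders that do not belong to a known customer
  val validOrders = orders.filter(o => customersById.contains(o.customerId))

  // Join + aggregate: total amount spent per customer name
  val totals = validOrders
    .groupBy(o => customersById(o.customerId))
    .map { case (name, os) => name -> os.map(_.amount).sum }

  println(totals)  // Map(Alice -> 75.5, Bob -> 12.0)
}
```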
Most Java libraries can be used with relative ease from Scala code, and object-oriented and functional styles of programming can be freely mixed in the same Scala code base. This combination provides a very direct path to productivity. Some of the major benefits of using Scala are as follows:
- Most Java libraries and frameworks can be reused from Scala. Scala code compiles to Java bytecode and runs on the JVM, which makes it seamless to call existing Java code from a Scala program. In fact, it is not uncommon to have a mix of Java and Scala code within a single project.
- Scala's functional constructs can be used to write code that is simple, concise, and expressive.
- We can still use object-oriented features where they are a better fit, as the sketch after this list illustrates.
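The following sketch shows this mix of styles: a small object-oriented model of shapes, processed with functional collection operations. The Shape hierarchy is a made-up example, not code from the book:

```scala
// Object-oriented side: a trait with two concrete case class implementations
trait Shape { def area: Double }
case class Circle(radius: Double)          extends Shape { def area: Double = math.Pi * radius * radius }
case class Rectangle(w: Double, h: Double) extends Shape { def area: Double = w * h }

object MixedStyles extends App {
  val shapes: List[Shape] = List(Circle(1.0), Rectangle(2.0, 3.0), Circle(0.5))

  // Functional side: transform and aggregate the object-oriented model
  val totalArea   = shapes.map(_.area).sum
  val largeShapes = shapes.filter(_.area > 1.0)

  println(f"Total area: $totalArea%.2f, large shapes: ${largeShapes.size}")
}
```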
There are many useful data libraries and frameworks built with Scala; these are summarized later in this chapter. Apache Spark deserves a special mention: it has become a de facto standard for performing distributed data analysis at scale. Since Spark is written almost entirely in Scala, its Scala integration is the most complete, even though Java, Python, and R are also supported. Spark's API has been heavily influenced by Scala's collections API, and its Dataset API leverages Scala's case classes, which significantly reduces the boilerplate code that would otherwise be necessary in Java.
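As a rough sketch of how case classes pair with Spark's Dataset API, the example below builds a small typed Dataset and filters it. It assumes the spark-sql dependency is on the classpath, and the Person type and the sample data are invented for illustration:

```scala
import org.apache.spark.sql.SparkSession

// A case class gives the Dataset a typed schema without extra boilerplate
case class Person(name: String, age: Int)

object SparkDatasetSketch extends App {
  val spark = SparkSession.builder()
    .appName("dataset-sketch")
    .master("local[*]")        // run locally using all available cores
    .getOrCreate()

  import spark.implicits._     // brings in .toDS() and Encoders for case classes

  // A typed Dataset[Person] built from an in-memory collection
  val people = Seq(Person("Alice", 34), Person("Bob", 19)).toDS()

  // Collection-like, typed transformation
  val adults = people.filter(_.age >= 21)
  adults.show()

  spark.stop()
}
```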
The following topics will be covered in this chapter:
- Installing and getting started with Scala
- Object-oriented and functional programming overview
- Scala case classes and the collection API
- Overview of Scala libraries for data analysis