
Building Spark applications

Using Spark in interactive mode with the Spark shell is very good for quick prototyping; however, for developing applications we need an IDE. The choices for a Spark IDE have come a long way since the days of Spark 1.0. One can use an array of Spark IDEs for developing algorithms, data wrangling (that is, exploring data), and modeling analytics applications. As a general rule of thumb, iPython and Zeppelin are used as data exploration IDEs. The language of choice for iPython is Python, and Scala/Java for Zeppelin. This is a general observation; all of them can handle the major languages: Scala, Java, Python, and SQL. For developing in Scala and Java, the preferred IDEs are Eclipse and IntelliJ. We will mostly use the Spark shell (and occasionally iPython) in this book, as our focus is data wrangling and understanding the Spark APIs. Of course, deploying Spark applications requires compiling the Java or Scala code.

Building Spark jobs is a bit trickier than building a normal application, because all dependencies have to be available on all the machines in your cluster.

In this chapter, we will first look at iPython and Eclipse, then cover the process of building a Java or Scala Spark job with Maven, and finally learn to build Spark jobs with a non-Maven-aware build system. A reference website for building Spark is at http://spark.apache.org/docs/latest/building-spark.html.
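One common way to satisfy the requirement that dependencies be available on every node is to package the job and its libraries into a single "uber" JAR with Maven. The following is a minimal sketch of such a `pom.xml`; the `groupId`, `artifactId`, and version numbers are illustrative placeholders, not values prescribed by the text.

```xml
<!-- Minimal illustrative pom.xml for a Spark job; names and versions
     are hypothetical examples, adjust them to your project. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>spark-job</artifactId>
  <version>1.0.0</version>

  <dependencies>
    <!-- "provided" keeps Spark itself out of the assembled JAR,
         since the cluster already supplies the Spark runtime. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>1.6.0</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- The shade plugin bundles the remaining dependencies into one
           uber JAR, so they travel with the job to every node. -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals><goal>shade</goal></goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

Running `mvn package` would then produce a single JAR suitable for submission to the cluster, for example via `spark-submit`.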
