Spark shell

We will go back into our Spark folder, which is spark-2.3.2-bin-hadoop2.7, and start our PySpark binary by typing .\bin\pyspark.

We can see that we've started a shell session with Spark in the following screenshot:

Spark is now available to us as the spark variable. Let's try something simple in Spark. The first thing to do is to load a file. Each Spark installation includes a README.md Markdown file, so let's load it as follows:

text_file = spark.read.text("README.md")

When we call spark.read.text with README.md, we get a few warnings, but we shouldn't be too concerned about them at the moment, as we will see later how to fix them. The main point here is that we can use Python syntax to access Spark.

What we have done here is load README.md into Spark as text data, with each line of the file becoming one row. We can now use text_file.count() to get Spark to count how many lines are in our text file, as follows:

text_file.count()

From this, we get the following output:

103
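To make clear what this number measures, here is a minimal plain-Python sketch that does not need Spark at all. It assumes a short, made-up sample text (not the real README.md) and shows the difference between counting lines, which is what count() does on text read with spark.read.text, and counting characters:

```python
# Hypothetical sample text standing in for README.md (not the real file).
sample = (
    "# Apache Spark\n"
    "Spark is a unified analytics engine.\n"
    "It provides high-level APIs.\n"
)

# Each line of the text corresponds to one row in Spark.
lines = sample.splitlines()

line_count = len(lines)    # analogous to text_file.count()
char_count = len(sample)   # a character count, which is a different quantity
first_line = lines[0]      # analogous to the value held in text_file.first()

print(line_count)  # 3
print(first_line)  # # Apache Spark
```

The line count stays small even though the text holds many more characters, which is why the count for README.md is only 103.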

We can also see what the first line is with the following:

text_file.first()

We will get the following output:

Row(value='# Apache Spark')

We can now count the number of lines that contain the word Spark. First, we filter for those lines by doing the following:

lines_with_spark = text_file.filter(text_file.value.contains("Spark"))

Here, we have filtered the lines using the filter() function. Within filter(), we have specified the condition text_file.value.contains("Spark"), which keeps only the lines whose value contains the word "Spark", and we have stored the result in the lines_with_spark variable.

We can modify the preceding command and simply add .count(), as follows: 

text_file.filter(text_file.value.contains("Spark")).count()

We will now get the following output:

20

We can see that 20 lines in the text file contain the word Spark. This is just a simple example of how we can use the Spark shell.
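The filter-then-count pattern above can be mirrored in plain Python, which may help when checking what a condition is expected to match. The following is only a sketch under stated assumptions: the rows list is made-up sample data rather than the contents of README.md, and a list comprehension stands in for Spark's filter(). Like contains(), the in operator performs a case-sensitive substring match, so a lowercase "spark" is not counted:

```python
# Hypothetical sample lines standing in for the rows of README.md.
rows = [
    "# Apache Spark",
    "Spark is a fast and general cluster computing system.",
    "It provides high-level APIs in Scala, Java, Python, and R.",
    "You can find the latest spark documentation online.",
]

# Plain-Python analogue of text_file.filter(text_file.value.contains("Spark")):
# keep only the lines that contain the exact substring "Spark".
lines_with_spark = [row for row in rows if "Spark" in row]

# Analogue of calling .count() on the filtered result.
matching = len(lines_with_spark)

print(matching)  # 2
```

Only the first two sample lines match; the last one contains "spark" in lowercase and is therefore excluded.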