
Storage

If, during the execution of a job, the user persists/caches an RDD, information about that RDD can be found on this tab. It can be accessed at http://localhost:4040/storage/.

Let's launch the Spark shell again, read a file, and run an action on it. However, this time we will cache the file before running the action.

Initially, when you launch the Spark shell, the Storage tab appears blank.

Let's read the file using SparkContext, as follows:

scala> val file = sc.textFile("/usr/local/spark/examples/src/main/resources/people.txt")
file: org.apache.spark.rdd.RDD[String] = /usr/local/spark/examples/src/main/resources/people.txt MapPartitionsRDD[1] at textFile at <console>:24

This time we will cache this RDD. By default, it will be cached in memory:

scala> file.cache
res0: file.type = /usr/local/spark/examples/src/main/resources/people.txt MapPartitionsRDD[1] at textFile at <console>:24
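Note that cache is shorthand for persist with the default MEMORY_ONLY storage level. If the dataset might not fit in memory, you can pass an explicit storage level to persist instead. A minimal sketch (assuming the same sc from the shell session above):

```scala
import org.apache.spark.storage.StorageLevel

// cache is equivalent to persist(StorageLevel.MEMORY_ONLY).
// MEMORY_AND_DISK spills partitions that don't fit in memory to disk
// instead of dropping and recomputing them.
val bigFile = sc.textFile("/usr/local/spark/examples/src/main/resources/people.txt")
bigFile.persist(StorageLevel.MEMORY_AND_DISK)
```

An RDD's storage level can only be set once; calling persist again with a different level raises an error, so choose the level before the first action runs.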

As explained earlier, the DAG of transformations is only executed when an action is performed, so the cache step will also run when we perform an action on the RDD. Let's run collect on it:

scala> file.collect
res1: Array[String] = Array(Michael, 29, Andy, 30, Justin, 19)

Now, you can find information about the cached RDD on the Storage tab.

If you click on the RDD name, it shows information about the RDD's partitions along with the addresses of the hosts on which they are stored.
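The same information shown on the Storage tab can also be inspected from the shell. A minimal sketch, assuming the cached file RDD from above (exact output formatting varies by Spark version):

```scala
// Storage level of the cached RDD (memory/disk flags, replication factor)
file.getStorageLevel

// Summary of all cached RDDs: id, name, number of cached partitions,
// and memory/disk sizes -- the same data the Storage tab renders
sc.getRDDStorageInfo.foreach(println)

// When you are done, release the cached blocks; the RDD
// then disappears from the Storage tab
file.unpersist()
```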
