
Storage

If, during the execution of a job, the user persists/caches an RDD, then information about that RDD can be retrieved on this tab. It can be accessed at http://localhost:4040/storage/.

Let's launch the Spark shell again, read a file, and run an action on it. This time, however, we will cache the RDD before running the action.

Initially, when you launch Spark shell, the Storage tab appears blank.

Let's read the file using SparkContext, as follows:

scala> val file = sc.textFile("/usr/local/spark/examples/src/main/resources/people.txt")
file: org.apache.spark.rdd.RDD[String] = /usr/local/spark/examples/src/main/resources/people.txt MapPartitionsRDD[1] at textFile at <console>:24

This time we will cache this RDD. By default, it will be cached in memory:

scala> file.cache
res0: file.type = /usr/local/spark/examples/src/main/resources/people.txt MapPartitionsRDD[1] at textFile at <console>:24
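cache is shorthand for persisting with the default MEMORY_ONLY storage level. If the data is too large to fit in memory, you can persist with a different storage level instead. The following is a minimal sketch assuming the same people.txt path; fileOnDisk is just an illustrative name, and note that the storage level cannot be changed on an RDD that is already marked for caching:

scala> import org.apache.spark.storage.StorageLevel
scala> // read the file again into a separate RDD, since file is already marked MEMORY_ONLY
scala> val fileOnDisk = sc.textFile("/usr/local/spark/examples/src/main/resources/people.txt")
scala> // spill any partitions that do not fit in memory to local disk
scala> fileOnDisk.persist(StorageLevel.MEMORY_AND_DISK)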

As explained earlier, the DAG of transformations is only executed when an action is performed, so the caching will also happen only when we run an action on the RDD. Let's run a collect on it:

scala> file.collect
res1: Array[String] = Array(Michael, 29, Andy, 30, Justin, 19)
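Besides the web UI, you can also confirm from the shell that the RDD is marked as persistent. A minimal sketch using only standard RDD and SparkContext calls (exact output will vary):

scala> // storage level the RDD was cached with (MEMORY_ONLY after cache)
scala> file.getStorageLevel
scala> // all RDDs currently marked as persistent in this SparkContext, keyed by RDD id
scala> sc.getPersistentRDDs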

Now, you can find information about the cached RDD on the Storage tab.

If you click on the RDD name, it provides information about the partitions of the RDD, along with the addresses of the hosts on which they are stored.
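When you no longer need the cached data, you can release it from the shell; the RDD's entry then disappears from the Storage tab. A minimal sketch:

scala> // remove the RDD's cached blocks from memory
scala> file.unpersist()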
