
Storage

If, during the execution of a job, the user persists/caches an RDD, information about that RDD can be retrieved on this tab. It can be accessed at http://localhost:4040/storage/.

Let's launch Spark shell again, read a file, and run an action on it. However, this time we will cache the file before running an action on it.

Initially, when you launch Spark shell, the Storage tab appears blank.

Let's read the file using SparkContext, as follows:

scala> val file = sc.textFile("/usr/local/spark/examples/src/main/resources/people.txt")
file: org.apache.spark.rdd.RDD[String] = /usr/local/spark/examples/src/main/resources/people.txt MapPartitionsRDD[1] at textFile at <console>:24

This time we will cache this RDD. By default, it will be cached in memory:

scala> file.cache
res0: file.type = /usr/local/spark/examples/src/main/resources/people.txt MapPartitionsRDD[1] at textFile at <console>:24
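The cache method is shorthand for persist with the default MEMORY_ONLY storage level. As a sketch, if you wanted a different level you could call persist explicitly; note that Spark does not allow changing the storage level once it has been assigned, so this is shown on a fresh, hypothetical RDD (diskBackedFile):

scala> import org.apache.spark.storage.StorageLevel
scala> // hypothetical RDD; MEMORY_AND_DISK spills partitions to disk when they do not fit in memory
scala> val diskBackedFile = sc.textFile("/usr/local/spark/examples/src/main/resources/people.txt")
scala> diskBackedFile.persist(StorageLevel.MEMORY_AND_DISK)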

As explained earlier, the DAG of transformations is only executed when an action is performed, so the cache step will also run when we perform an action on the RDD. Let's run collect on it:

scala> file.collect
res1: Array[String] = Array(Michael, 29, Andy, 30, Justin, 19)
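If you want to confirm the storage level from the shell rather than the UI, one small sketch is to query the RDD directly:

scala> // reports MEMORY_ONLY here, since that is the level cache assigned
scala> file.getStorageLevel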

Now you can find information about the cached RDD on the Storage tab.

If you click on the RDD name, it provides information about the partitions of the RDD along with the address of the host on which each partition is stored.
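The same kind of information can also be inspected programmatically. As a sketch, SparkContext exposes getRDDStorageInfo (a developer API), which reports the cached RDDs along with their cached partition counts and memory/disk sizes:

scala> // prints one summary line per cached RDD (id, name, cached partitions, sizes)
scala> sc.getRDDStorageInfo.foreach(println)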
