- Apache Spark 2.x for Java Developers
- Sourav Gulati, Sumit Kumar
- 2021-07-02 19:02:02
Storage
If the user persists or caches an RDD during the execution of a job, information about that RDD can be viewed on this tab. It can be accessed at http://localhost:4040/storage/.
Let's launch the Spark shell again, read a file, and run an action on it. This time, however, we will cache the RDD before running the action.
When you first launch the Spark shell, the Storage tab is empty.

Let's read the file using SparkContext, as follows:
scala>val file=sc.textFile("/usr/local/spark/examples/src/main/resources/people.txt")
file: org.apache.spark.rdd.RDD[String] = /usr/local/spark/examples/src/main/resources/people.txt MapPartitionsRDD[1] at textFile at <console>:24
This time, we will cache this RDD. By default, cache stores it in memory only (the MEMORY_ONLY storage level):
scala>file.cache
res0: file.type = /usr/local/spark/examples/src/main/resources/people.txt MapPartitionsRDD[1] at textFile at <console>:24
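The sketch below makes the default storage level of cache() visible. It is written as a standalone Scala application rather than a shell session, and it assumes Spark 2.x on the classpath. It compares cache() with an explicit persist(StorageLevel.MEMORY_ONLY). The object name CacheVsPersist and its levels helper are hypothetical. The three records of people.txt are inlined with sc.parallelize so that no file is needed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical helper: shows that cache() and persist(MEMORY_ONLY) assign the same level.
object CacheVsPersist {
  def levels(sc: SparkContext): (StorageLevel, StorageLevel) = {
    // Same three records as people.txt, inlined so the sketch needs no file
    val people = Seq("Michael, 29", "Andy, 30", "Justin, 19")
    val viaCache = sc.parallelize(people).cache()
    val viaPersist = sc.parallelize(people).persist(StorageLevel.MEMORY_ONLY)
    // Both calls only mark the RDDs; no action has run, so nothing is stored yet
    (viaCache.getStorageLevel, viaPersist.getStorageLevel)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cache-vs-persist").setMaster("local[*]"))
    try {
      val (a, b) = levels(sc)
      println(s"cache(): $a, persist(MEMORY_ONLY): $b")
    } finally sc.stop()
  }
}
```

getStorageLevel reports the new level as soon as cache or persist is called. The data itself is stored only when an action computes the RDD.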
As explained earlier, the DAG of transformations is executed only when an action is performed, so the RDD is actually cached only when we run an action on it. Let's run collect on it:
scala>file.collect
res1: Array[String] = Array(Michael, 29, Andy, 30, Justin, 19)
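This lazy behavior can also be observed directly. The sketch below assumes Spark 2.x and uses a hypothetical LazyCacheDemo object. It counts how many records a map function evaluates by using a LongAccumulator. With caching, we would expect the count to be 0 right after cache() and 3 after the first action. It should stay at 3 after a second action, which reads the cached partitions. Without caching, the second action recomputes the partitions, so the count should reach 6.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo: counts evaluations of a map function with and without cache().
object LazyCacheDemo {
  // Returns the records evaluated (before any action, after collect, after count)
  def countEvaluations(sc: SparkContext, useCache: Boolean): (Long, Long, Long) = {
    val evaluated = sc.longAccumulator("records evaluated")
    val people = Seq("Michael, 29", "Andy, 30", "Justin, 19")
    val rdd = sc.parallelize(people).map { line =>
      // Accumulator updates inside transformations can be over-counted if tasks are retried
      evaluated.add(1)
      line
    }
    if (useCache) rdd.cache() // only marks the RDD; nothing is computed yet

    val before = evaluated.sum
    rdd.collect() // first action: computes the partitions (and stores them if cached)
    val afterFirst = evaluated.sum
    rdd.count() // second action: reads cached partitions or recomputes them
    val afterSecond = evaluated.sum
    (before, afterFirst, afterSecond)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lazy-cache-demo").setMaster("local[*]"))
    try {
      println(s"with cache:    ${countEvaluations(sc, useCache = true)}")
      println(s"without cache: ${countEvaluations(sc, useCache = false)}")
    } finally sc.stop()
  }
}
```

The accumulator is used here only as an illustration in local mode. Spark guarantees exactly-once accumulator updates only for updates made inside actions, not inside transformations such as map.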
The Storage tab now shows information about the cached RDD.

Clicking the RDD name shows information about its partitions, along with the address of the host on which the RDD is stored.
