- Machine Learning with Spark (Second Edition)
- Rajdeep Dua, Manpreet Singh Ghotra, Nick Pentreath
Creating RDDs
The simplest way to create an RDD is from an existing collection, for example, in the Scala Spark shell that you launched earlier:
val collection = List("a", "b", "c", "d", "e")
val rddFromCollection = sc.parallelize(collection)
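The parallelize method also accepts an optional second argument that sets the number of partitions the collection is split into. The following is a minimal standalone sketch; outside the shell we have to create the SparkContext ourselves (the app name and local[*] master are illustrative choices, since the shell normally provides sc for us):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// In the Spark shell, sc already exists; standalone we build a local one.
val conf = new SparkConf().setAppName("parallelize-example").setMaster("local[*]")
val sc = new SparkContext(conf)

val collection = List("a", "b", "c", "d", "e")
// Optional second argument: the number of partitions for the RDD.
val rddFromCollection = sc.parallelize(collection, 2)

val numPartitions = rddFromCollection.getNumPartitions
val total = rddFromCollection.count()
println(numPartitions) // 2
println(total)         // 5

sc.stop()
```

The partition count matters because Spark runs one task per partition, so it controls the parallelism of subsequent operations on this RDD.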
RDDs can also be created from Hadoop-based input sources, including the local filesystem, HDFS, and Amazon S3. A Hadoop-based RDD can utilize any input format that implements the Hadoop InputFormat interface, including text files, other standard Hadoop formats, HBase, Cassandra, Tachyon, and many more.
The following code is an example of creating an RDD from a text file located on the local filesystem:
val rddFromTextFile = sc.textFile("LICENSE")
The preceding textFile method returns an RDD where each record is a String object that represents one line of the text file. The output of the preceding command is as follows:
rddFromTextFile: org.apache.spark.rdd.RDD[String] = LICENSE
MapPartitionsRDD[1] at textFile at <console>:24
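To verify that each record really is one line of the file, we can run a couple of actions on the RDD. The following standalone sketch writes its own small sample file (the sample.txt name and the local SparkContext are assumptions for illustration; the book's example reads the LICENSE file from the Spark installation directory, and in the shell sc is already provided):

```scala
import java.nio.file.{Files, Paths}
import org.apache.spark.{SparkConf, SparkContext}

// Standalone sketch: the shell provides sc, but here we build a local one.
val conf = new SparkConf().setAppName("textfile-example").setMaster("local[*]")
val sc = new SparkContext(conf)

// Write a small sample file so the example is self-contained.
Files.write(Paths.get("sample.txt"), "first line\nsecond line\nthird line".getBytes)

val rddFromSample = sc.textFile("sample.txt")

// Each record is a String holding one line, so count() gives the line count
// and first() returns the first line of the file.
val lineCount = rddFromSample.count()
val firstLine = rddFromSample.first()
println(lineCount) // 3
println(firstLine) // first line

sc.stop()
```

Note that textFile is lazy: the file is not actually read until an action such as count() or first() is invoked.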
The following code is an example of how to create an RDD from a text file located on HDFS using the hdfs:// protocol:
val rddFromTextFileHDFS = sc.textFile("hdfs://input/LICENSE")
The following code is an example of how to create an RDD from a text file located on Amazon S3 using the s3n:// protocol:
val rddFromTextFileS3 = sc.textFile("s3n://input/LICENSE")