
Creating RDDs

RDDs can be created in a number of ways. One way is to parallelize an existing Scala collection in the Spark shell that you launched earlier:

val collection = List("a", "b", "c", "d", "e") 
val rddFromCollection = sc.parallelize(collection)
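As a brief sketch of what you can do next in the same shell session (reusing the `sc` SparkContext and the `collection` value above), `parallelize` also accepts an optional second argument specifying the number of partitions, and simple actions confirm the RDD's contents:

```scala
// Distribute the collection across four partitions (the second argument is optional)
val rddWithPartitions = sc.parallelize(collection, 4)

// Actions trigger computation and return results to the driver
rddWithPartitions.count     // returns 5
rddWithPartitions.collect   // returns Array(a, b, c, d, e)
```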

RDDs can also be created from Hadoop-based input sources, including the local filesystem, HDFS, and Amazon S3. A Hadoop-based RDD can utilize any input format that implements the Hadoop InputFormat interface, including text files, other standard Hadoop formats, HBase, Cassandra, Tachyon, and many more.
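To illustrate, a minimal sketch of reading a text file through the Hadoop InputFormat API directly, rather than via the `textFile` convenience method shown next, might look like this in the same shell (assuming a local `LICENSE` file exists):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Read the file as (byte offset, line) pairs using the new Hadoop API
val rddFromInputFormat =
  sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat]("LICENSE")

// Keep only the line contents as plain Strings
val lines = rddFromInputFormat.map { case (_, text) => text.toString }
```

Any other InputFormat implementation (for HBase, Cassandra, and so on) can be plugged in the same way by substituting the key, value, and format type parameters.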

The following code is an example of creating an RDD from a text file located on the local filesystem:

val rddFromTextFile = sc.textFile("LICENSE")

The preceding textFile method returns an RDD where each record is a String object that represents one line of the text file. The output of the preceding command is as follows:

rddFromTextFile: org.apache.spark.rdd.RDD[String] = LICENSE   
MapPartitionsRDD[1] at textFile at <console>:24
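Once created, this RDD behaves like any other. For instance, a small follow-up in the same shell:

```scala
// Each record is one line of the file; count is an action that returns a Long
val lineCount = rddFromTextFile.count

// first returns the first line of the file as a String
val firstLine = rddFromTextFile.first
```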

The following code is an example of how to create an RDD from a text file located on HDFS using the hdfs:// protocol:

val rddFromTextFileHDFS = sc.textFile("hdfs://input/LICENSE")

The following code is an example of how to create an RDD from a text file located on Amazon S3 using the s3n:// protocol:

val rddFromTextFileS3 = sc.textFile("s3n://input/LICENSE")
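Accessing S3 typically requires AWS credentials. One common approach, sketched below, is to set them on the SparkContext's Hadoop configuration before calling textFile; note that these particular keys apply to the older s3n connector, and the exact keys depend on your Hadoop version (newer setups generally use the s3a connector instead):

```scala
// Placeholder credentials: substitute your own AWS access and secret keys
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

val rddFromS3 = sc.textFile("s3n://input/LICENSE")
```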