
Creating RDDs

RDDs can be created from existing in-memory collections, for example in the Scala Spark shell that you launched earlier:

val collection = List("a", "b", "c", "d", "e") 
val rddFromCollection = sc.parallelize(collection)
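The parallelize method also accepts an optional second argument specifying how many partitions to split the collection into. A minimal sketch, assuming an active Spark shell where sc is already bound to the SparkContext:

```scala
// Assumes a running Spark shell, where `sc` is the SparkContext.
val collection = List("a", "b", "c", "d", "e")

// The second argument to parallelize is the target number of
// partitions for the resulting RDD (here, 2).
val rddWithPartitions = sc.parallelize(collection, 2)

// partitions.size reports how many partitions the RDD was split into.
println(rddWithPartitions.partitions.size) // 2
```

Controlling the partition count matters because Spark runs one task per partition, so it determines the parallelism of subsequent operations on the RDD.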

RDDs can also be created from Hadoop-based input sources, including the local filesystem, HDFS, and Amazon S3. A Hadoop-based RDD can utilize any input format that implements the Hadoop InputFormat interface, including text files, other standard Hadoop formats, HBase, Cassandra, Tachyon, and many more.

The following code is an example of creating an RDD from a text file located on the local filesystem:

val rddFromTextFile = sc.textFile("LICENSE")

The preceding textFile method returns an RDD where each record is a String object that represents one line of the text file. The output of the preceding command is as follows:

rddFromTextFile: org.apache.spark.rdd.RDD[String] = LICENSE   
MapPartitionsRDD[1] at textFile at <console>:24
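Note that at this point nothing has been read from disk yet; textFile is lazy. To verify the RDD's contents you can apply an action, which triggers the actual computation. A brief sketch, assuming the rddFromTextFile RDD created above (the resulting values depend on your local LICENSE file):

```scala
// Assumes `rddFromTextFile` was created in the Spark shell as above.

// count is an action: it forces the file to be read and returns
// the number of records (lines) in the RDD.
val numLines = rddFromTextFile.count

// first is another action: it returns the first record,
// that is, the first line of the text file.
val firstLine = rddFromTextFile.first
```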

The following code is an example of how to create an RDD from a text file located on HDFS using the hdfs:// protocol:

val rddFromTextFileHDFS = sc.textFile("hdfs://input/LICENSE")

The following code is an example of how to create an RDD from a text file located on Amazon S3 using the s3n:// protocol:

val rddFromTextFileS3 = sc.textFile("s3n://input/LICENSE")
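Whichever filesystem is used, textFile is not limited to a single file: it also accepts directories, wildcard patterns, and comma-separated lists of paths. A sketch, assuming hypothetical local paths (the /data/input directory below is a placeholder, not a path from this book):

```scala
// Assumption: these paths are placeholders for illustration only.

// Passing a directory reads every file it contains.
val rddFromDir = sc.textFile("/data/input/")

// Wildcard patterns select matching files.
val rddFromGlob = sc.textFile("/data/input/*.txt")

// Comma-separated lists combine several paths into one RDD.
val rddFromList = sc.textFile("/data/input/a.txt,/data/input/b.txt")
```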