
Creating RDDs

RDDs can be created from existing collections, for example, in the Scala Spark shell that you launched earlier:

val collection = List("a", "b", "c", "d", "e") 
val rddFromCollection = sc.parallelize(collection)
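
The `parallelize` method also accepts an optional second argument that sets the number of partitions the data is split into. A minimal sketch, assuming it is run in the same Spark shell where `sc` is already defined:

```scala
// parallelize takes an optional numSlices argument controlling partitioning.
val rddWithPartitions = sc.parallelize(collection, 4)
rddWithPartitions.partitions.size // returns 4
rddWithPartitions.count           // returns 5, the number of elements
```

Controlling the partition count matters because each partition is processed by one task, so it bounds the parallelism of subsequent operations on the RDD.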

RDDs can also be created from Hadoop-based input sources, including the local filesystem, HDFS, and Amazon S3. A Hadoop-based RDD can utilize any input format that implements the Hadoop InputFormat interface, including text files, other standard Hadoop formats, HBase, Cassandra, Tachyon, and many more.

The following code is an example of creating an RDD from a text file located on the local filesystem:

val rddFromTextFile = sc.textFile("LICENSE")

The preceding textFile method returns an RDD where each record is a String object that represents one line of the text file. The output of the preceding command is as follows:

rddFromTextFile: org.apache.spark.rdd.RDD[String] = LICENSE   
MapPartitionsRDD[1] at textFile at <console>:24
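
To confirm that each record is indeed one line of the file, you can apply a couple of basic actions to the RDD in the same shell session (the exact values depend on the contents of your `LICENSE` file, so no specific output is shown):

```scala
// Actions on a text-file RDD: each record is one line of the file.
val lineCount = rddFromTextFile.count  // total number of lines
val firstLine = rddFromTextFile.first  // the first line, as a String
```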

The following code is an example of how to create an RDD from a text file located on HDFS using the hdfs:// protocol:

val rddFromTextFileHDFS = sc.textFile("hdfs://input/LICENSE")

The following code is an example of how to create an RDD from a text file located on Amazon S3 using the s3n:// protocol:

val rddFromTextFileS3 = sc.textFile("s3n://input/LICENSE")