
Reading the training dataset

The dataset ships as an Excel file, Cryotherapy.xlsx, which contains the data along with data usage agreement text. I therefore copied just the data and saved it as a CSV file named Cryotherapy.csv. Let's start by creating a SparkSession, the gateway for accessing Spark:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "/temp")
  .appName("CryotherapyPrediction")
  .getOrCreate()

import spark.implicits._

Then, let's read the training set and take a glimpse of it:

var CryotherapyDF = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("data/Cryotherapy.csv")
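Schema inference is convenient, but it costs an extra pass over the file and can misjudge types. As a hedged alternative sketch, you can declare the schema explicitly; the column names below assume the published Cryotherapy dataset attributes and should be checked against your CSV header:

```scala
import org.apache.spark.sql.types._

// Explicit schema: skips the inference pass and fails fast on malformed rows.
// Column names and types assume the UCI Cryotherapy attribute list;
// verify them against the actual header of your Cryotherapy.csv.
val cryoSchema = StructType(Seq(
  StructField("sex", IntegerType, nullable = false),
  StructField("age", IntegerType, nullable = false),
  StructField("Time", DoubleType, nullable = false),
  StructField("Number_of_Warts", IntegerType, nullable = false),
  StructField("Type", IntegerType, nullable = false),
  StructField("Area", IntegerType, nullable = false),
  StructField("Result_of_Treatment", IntegerType, nullable = false)
))

// Usage with the reader from above:
// val df = spark.read.option("header", "true").schema(cryoSchema).csv("data/Cryotherapy.csv")
```

Passing a schema this way also means a row that does not match the declared types surfaces as an error (or null, depending on the reader's `mode` option) instead of silently becoming a string column.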

Let's check whether the preceding CSV reader managed to read the data properly, including the header and column types:

CryotherapyDF.printSchema()

As the printed output shows, the schema of the Spark DataFrame has been correctly inferred. Also, as expected, all the features for our ML algorithm are numeric (in other words, in integer or double format):
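Beyond eyeballing the printSchema() output, a small hedged sketch of a programmatic check can confirm that every column is numeric before the DataFrame is fed to an ML pipeline (this helper is my own convenience function, not part of the book's code):

```scala
import org.apache.spark.sql.types.{NumericType, StructType}

// Returns the names of any non-numeric columns in a schema.
// Spark ML feature assemblers expect numeric inputs, so an empty
// result means the DataFrame is safe to use as-is.
def nonNumericColumns(schema: StructType): Seq[String] =
  schema.fields
    .filterNot(_.dataType.isInstanceOf[NumericType])
    .map(_.name)
    .toSeq

// Usage against the DataFrame loaded above:
// require(nonNumericColumns(CryotherapyDF.schema).isEmpty,
//   s"Non-numeric columns: ${nonNumericColumns(CryotherapyDF.schema).mkString(", ")}")
```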

A snapshot of the dataset can be seen using the show() method, which lets us limit the number of rows displayed; here, let's say 5:

CryotherapyDF.show(5)

The output of the preceding line of code shows the first five samples of the DataFrame:
