官术网_书友最值得收藏!

Description of the dataset

We will use a recently added cryotherapy dataset from the UCI machine learning repository. The dataset can be downloaded from http://archive.ics.uci.edu/ml/datasets/Cryotherapy+Dataset+#.

This dataset contains information about wart treatment results of 90 patients using cryotherapy. In case you don't know, a wart is a kind of skin problem caused by infection with a type of human papillomavirus. Warts are typically small, rough, and hard growths that are similar in color to the rest of the skin.

There are two available treatments for this problem:

  • Salicylic acid: A type of gel containing salicylic acid used in medicated band-aids.
  • Cryotherapy: A freezing liquid (usually nitrogen) is sprayed onto the wart. It will destroy the cells in the affected area. After the cryotherapy, usually, a blister develops, which eventually turns into a scab and falls off after a week or so.

There are 90 samples or instances that were either recommended to go through cryotherapy or be discharged without cryotherapy. There are seven attributes in the dataset:

  • sex: Patient gender, characterized by 1 (male) or 0 (female).
  • age: Patient age.
  • Time: Observation and treatment time in hours.
  • Number_of_Warts: Number of warts.
  • Type: Types of warts.
  • Area: The amount of affected area.
  • Result_of_Treatment: The recommended result of the treatment, characterized by either 1 (yes) or 0 (no). It is also the target column.

As you can understand, it is a classification problem because we will have to predict discrete labels. More specifically, it is a binary classification problem. Since this is a small dataset with only six features, we can start with a very basic classification algorithm called logistic regression, where the logistic function is applied to the regression to get the probabilities of it belonging in either class. We will learn more details about logistic regression and other classification algorithms in Chapter 3, Scala for Learning Classification. For this, we use the Spark ML-based implementation of logistic regression in Scala.

主站蜘蛛池模板: 舟山市| 绥德县| 巴东县| 满洲里市| 四子王旗| 平江县| 沂源县| 呼和浩特市| 佛山市| 岳阳县| 同心县| 沙洋县| 云龙县| 潞西市| 五常市| 剑川县| 南涧| 邵阳市| 文水县| 广州市| 额尔古纳市| 桦甸市| 海原县| 绥芬河市| 仁怀市| 玉田县| 柞水县| 高淳县| 聊城市| 东兰县| 吉木萨尔县| 武城县| 冕宁县| 房产| 和政县| 中西区| 府谷县| 资中县| 东乡| 班戈县| 曲靖市|