官术网_书友最值得收藏!

How to do it...

Let's take a look at linear regression in supervised learning:

  1. Let's begin with a simple example of linear regression where we need to determine the relationship between men's height (in cms) and weight (in kgs). The following sample data represents the height and weight of 10 random men:
data <- data.frame("height" = c(131, 154, 120, 166, 108, 115,
158, 144, 131, 112),
"weight" = c(54, 70, 47, 79, 36, 48, 65,
63, 54, 40))
  1. Now, generate a linear regression model as follows:
lm_model <- lm(weight ~ height, data)
  1. The following plot shows the relationship between men's height and weight along with the fitted line:
plot(data, col = "red", main = "Relationship between height and
weight",cex = 1.7, pch = 1, xlab = "Height in cms", ylab = "Weight
in kgs")
abline(lm(weight ~ height, data))
Linear relationship between weight and height
  1. In semi-supervised models, the learning is primarily initiated using labeled data (a smaller quantity in general) and then augmented using unlabeled data (a larger quantity in general).

Let's perform K-means clustering (unsupervised learning) on a widely used dataset, iris.

  1. This dataset consists of three different species of iris (Setosa, Versicolor, and Virginica) along with their distinct features such as sepal length, sepal width, petal length, and petal width:
data(iris)
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
  1. The following plots show the variation of features across irises. Petal features show a distinct variation as against sepal features:
library(ggplot2)
library(gridExtra)
plot1 <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color =
Species))
geom_point(size = 2)
ggtitle("Variation by Sepal features")
plot2 <- ggplot(iris, aes(Petal.Length, Petal.Width, color =
Species))
geom_point(size = 2)
ggtitle("Variation by Petal features")
grid.arrange(plot1, plot2, ncol=2)
Variation of sepal and petal features by length and width
  1. As petal features show a good variation across irises, let's perform K-means clustering using the petal length and petal width:
set.seed(1234567)
iris.cluster <- kmeans(iris[, c("Petal.Length","Petal.Width")],
3, nstart = 10)
iris.cluster$cluster <- as.factor(iris.cluster$cluster)
  1. The following code snippet shows a cross-tab between clusters and species (irises). We can see that cluster 1 is primarily attributed with setosa, cluster 2 with versicolor, and cluster 3 with virginica:
> table(cluster=iris.cluster$cluster,species= iris$Species)
species
cluster setosa versicolor virginica
1 50 0 0
2 0 48 4
3 0 2 46

ggplot(iris, aes(Petal.Length, Petal.Width, color =
iris.cluster$cluster)) + geom_point() + ggtitle("Variation by
Clusters")
  1. The following plot shows the distribution of clusters:
Variation of irises across three clusters
主站蜘蛛池模板: 鹤庆县| 三穗县| 潼南县| 阿尔山市| 定南县| 柳河县| 漳州市| 江山市| 承德县| 哈密市| 南京市| 蒲江县| 自治县| 乌兰浩特市| 成武县| 乌兰浩特市| 特克斯县| 酒泉市| 武隆县| 门源| 兖州市| 鄂伦春自治旗| 尼木县| 乡宁县| 鹤峰县| 江油市| 兴和县| 柯坪县| 深泽县| 萍乡市| 赤峰市| 泰兴市| 新晃| 西丰县| 兴山县| 江川县| 怀柔区| 涿州市| 德令哈市| 金华市| 海晏县|