官术网_书友最值得收藏!

Scatter plot to show clusters

While we have seen a scatter plot of random dots before, scatter plots are most useful with representing discrete data points that show a trend or clusters. Each data series will be plotted in different color per plotting command by default, which helps us distinguish the distinct dots from each series. To demonstrate the idea, we will generate two artificial clusters of data points using a simple random number generator function in NumPy, shown as follows:

import matplotlib.pyplot as plt

# seed the random number generator to keep results reproducible
np.random.seed(123)

# Generate 10 random numbers around 2 as x-coordinates of the first data series
x0 = np.random.rand(10)+1.5

# Generate the y-coordinates another data series similarly
np.random.seed(321)
y0 = np.random.rand(10)+2
np.random.seed(456)
x1 = np.random.rand(10)+2.5
np.random.seed(789)
y1 = np.random.rand(10)+2
plt.scatter(x0,y0)
plt.scatter(x1,y1)

plt.show()

We can see from the following plot that there are two artificially created clusters of dots colored blue (approximately in the left half) and orange (approximately in the right half) respectively:

There is another way to generate clusters and plot them on scatter plots. We can generate clusters of data points for testing and demonstration more directly using the make_blobs() function in a package called sklearn, which is developed for more advanced data analysis and data mining, as shown in the following snippet. We can specify the colors according to the assigned feature (cluster identity):

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# make blobs with 3 centers with random seed of 23
blob_coords,features = make_blobs(centers=3, random_state=23)

# plot the blobs, with c value set to map colors to features
plt.scatter(blob_coords[:, 0], blob_coords[:, 1], marker='x', c=features)
plt.show()

As the make_blob function generates dots based on isotropic Gaussian distribution, we can see from the resultant plot that they are better clustered as three distinct blobs of dots centralized at three points:

scikit-learn is a powerful Python package that provides many different simple functions for data mining and data analysis. It has a versatile suite of algorithms for classification, regression, clustering, dimension reduction, and modeling. It also allows data preprocessing and pipelining multiple processing stages.

For getting familiar with the scikit-learn library, we can use the datasets preloaded in package, such as the famous iris petals, or generate datasets according to the specified distribution as shown before. Here we'll only use the data to illustrate a simple visualization using scatter plot, and we won't go into the details just yet. More examples are available and can be found by clicking on this link:

http://scikit-learn.org/stable/auto_examples/datasets/plot_random_dataset.html 

The previous example is a demonstration of an easier way to map dot color with labeled features, if any. The details of make_blobs() and other scikit-learn functions are out of our scope of introducing basic plots in this chapter.

We encourage our readers to seek resources and books on our Mapt platform to get a better understanding of the scikit-learn library usage. 

 Alternatively, readers can also read the scikit-learn documentation here: http://scikit-learn.org.

主站蜘蛛池模板: 麦盖提县| 富源县| 东乌| 凌云县| 绥化市| 平舆县| 渑池县| 阿瓦提县| 通州市| 芷江| 湖口县| 天水市| 泰兴市| 简阳市| 简阳市| 繁昌县| 连城县| 新源县| 久治县| 镇雄县| 松溪县| 辽宁省| 阿克| 河源市| 乌拉特中旗| 台中市| 尼玛县| 南岸区| 健康| 微山县| 吉木萨尔县| 新宁县| 广宗县| 商城县| 榆中县| 兴城市| 苍南县| 合江县| 顺义区| 赞皇县| 城口县|