Tweaking the parameters

So what about all the other parameters? Can we tweak them all to get better results?

Sure. We could tweak the number of clusters or play with the vectorizer's max_features parameter (you should try that!). We could also try different cluster center initializations. And there are more exciting alternatives to KMeans itself: for example, clustering approaches that let you use different similarity measures, such as cosine similarity, Pearson correlation, or the Jaccard index. It is an exciting field to explore.
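As a rough illustration of the knobs just mentioned, the following sketch loops over a few values for the number of clusters, max_features, and the initialization scheme. The toy posts, the helper name cluster_posts, and the specific values tried are assumptions made up for this example; they are not taken from the text above.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, invented purely for illustration.
posts = [
    "How to format a hard disk",
    "Hard disk format problems",
    "Imaging databases store images",
    "Imaging databases can get huge",
    "Most imaging databases save images permanently",
    "This is a toy post about machine learning",
]


def cluster_posts(texts, num_clusters, max_features, init):
    """Vectorize the texts, cluster them, and return (labels, vocabulary size)."""
    vectorizer = TfidfVectorizer(stop_words="english", max_features=max_features)
    X = vectorizer.fit_transform(texts)
    km = KMeans(n_clusters=num_clusters, init=init, n_init=10, random_state=3)
    labels = km.fit_predict(X)
    return labels, X.shape[1]


# Try every combination of the three parameters (hypothetical values).
results = {}
for num_clusters in (2, 3):
    for max_features in (5, None):
        for init in ("k-means++", "random"):
            labels, vocab_size = cluster_posts(posts, num_clusters, max_features, init)
            results[(num_clusters, max_features, init)] = (labels, vocab_size)
            print(num_clusters, max_features, init, vocab_size, list(labels))
```

Looking at the printed labels side by side shows how much the grouping shifts with each setting, but it does not yet tell us which setting is preferable.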

But before you go there, you will have to define what you actually mean by "better". scikit-learn has a complete package dedicated to exactly this question. The package is called sklearn.metrics, and it contains a full range of metrics for measuring clustering quality. Maybe that should be the first place to go now: right into the sources of the metrics package.
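To make "better" a little more concrete, here is a minimal sketch that scores clusterings for several cluster counts with three functions from sklearn.metrics. The toy posts and the hand-assigned true_topics labels are assumptions for illustration only. The silhouette score needs no ground truth, while the adjusted Rand index and the V-measure compare the clustering against known labels.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus and hand-assigned topic labels, both invented for illustration.
posts = [
    "How to format a hard disk",
    "Hard disk format problems",
    "My hard disk makes strange noises",
    "Replacing a broken hard disk",
    "Imaging databases store images",
    "Imaging databases can get huge",
    "Most imaging databases save images permanently",
    "Backing up large imaging databases",
]
true_topics = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical ground truth

X = TfidfVectorizer(stop_words="english").fit_transform(posts)

scores = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=3).fit_predict(X)
    scores[k] = {
        # Internal metric: uses only the data and the predicted labels.
        "silhouette": metrics.silhouette_score(X, labels),
        # External metrics: compare predicted labels with the known topics.
        "adjusted_rand": metrics.adjusted_rand_score(true_topics, labels),
        "v_measure": metrics.v_measure_score(true_topics, labels),
    }

for k, result in scores.items():
    print(k, {name: round(float(value), 3) for name, value in result.items()})
```

Which metric to trust depends on whether labeled data is available. Without labels, only internal measures such as the silhouette score apply.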
